Why choose a cloud service, and which one
This is the second part in a series of posts about how the MepSQL packages were built. In part 1 I evaluated OpenSuse Build System and Launchpad PPA and ended up concluding that running your own BuildBot system is the best choice, as those public services didn't provide any facility to test their packages.
This brings us to the next topic: As I don't possess any servers, should I buy one (or more) or should I try out the cloud services? If yes, should I use Amazon EC2 or something else?
Let's look at costs:
Alternative 1a: Cheap server
I could buy a single CPU server, with 1GB RAM for 350-500 EUR. I have an ADSL connection that even allows me to use public IP addresses. If I wanted to use the least possible amount of money, I'd choose this option.
This server is less powerful (but also much cheaper) than my laptop. Builds will take a long time. Development will slow down as each iteration of building and testing takes longer. (2x-5x longer, by estimate.)
Alternative 1b: Bigger server
A quad core server with 4 GB RAM would cost around 2500 EUR. This I could afford, but it is quite a large upfront investment in a project whose future I don't know. What if I later want to abandon MepSQL for something else? Or what if MepSQL becomes popular and someone offers to donate a server to the project? Then I'm sitting with a 2500 EUR server doing nothing in my basement.
Alternative 2: Cloud
In many ways, using a cloud service was ideal for this project:
- It's a good option to get started with minimal investment. You can always buy your own server later.
- You can provision a few servers in the cloud immediately, don't have to wait a few days for shipping, don't have to spend time installing Linux...
- Space considerations: Don't need to have a server in the living room / Don't need to draw cabling down into the basement.
- A build server is a batch job: If I had my own server, it would sit there doing nothing most of the time. In the cloud I can start and stop my servers as needed, and don't pay anything for the unused time.
- Can boost capacity when needed: I'll want to build and run tests on a number of platforms - a few dozen ultimately - to support all versions of Linux, Windows, Solaris on different hardware. In the cloud, I can run an "infinite" number of these in parallel without any extra cost. On my own server, I would have to run the virtual servers one after another, since I have only one server. (MariaDB does it this way - and in total it takes about 15 hours before the last batch finishes.)
- Not only can I run things in parallel, I can also choose how powerful each server instance should be every time I launch one. On Amazon I can pay 0.02 USD/hour for a small server with 1 CPU, or 17x more for a server with 4 CPUs that gets the job done in less than half the time. Depending on the situation, paying 17x more can be worth the time saved.
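To make the trade-off in the last point concrete, here is a minimal Python sketch of the arithmetic. The two prices come from the text above. The function name, the build time and the speedup factor are illustrative assumptions, and the sketch ignores any rounding the provider applies when billing.

```python
# Prices from the text: 0.02 USD/hour for the small instance,
# 17x that for the 4 CPU instance.
SMALL_RATE_USD = 0.02
LARGE_RATE_USD = SMALL_RATE_USD * 17


def compare_build(small_hours, speedup):
    """Compare one build on the small instance with the same build on the large one.

    small_hours: build time on the small instance (assumed input).
    speedup: how many times faster the large instance is (assumed input).
    """
    large_hours = small_hours / speedup
    small_cost = SMALL_RATE_USD * small_hours
    large_cost = LARGE_RATE_USD * large_hours
    return {
        "small_cost_usd": small_cost,
        "large_cost_usd": large_cost,
        "extra_cost_usd": large_cost - small_cost,
        "hours_saved": small_hours - large_hours,
    }


if __name__ == "__main__":
    # Hypothetical build: 2 hours on the small instance, 2x faster on the large one.
    for key, value in compare_build(small_hours=2.0, speedup=2.0).items():
        print(f"{key}: {value:.2f}")
```

For this hypothetical 2 hour build, the large instance costs about 0.34 USD instead of 0.04 USD and saves one hour of waiting. Whether 0.30 USD is a good price for that hour depends on who is waiting.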
Alternative 2a: Rackspace cloud
We of course all know about the Amazon cloud, but I wanted to compare it to something. Since Rackspace sponsors both Drizzle and OpenStack - and their service is based on OpenStack - I would have preferred to spend my money here. (Amazon is not open source, but the open source project OpenStack provides an EC2 compatibility layer for its own HTTP API.)
Pro: Cheaper plus more granularity in different options. The latter saves you money since you can pick precisely the kind of instance you need and not pay any extra.
Equal: The original Amazon S3 based "instance store" is difficult to manage. Essentially you can launch servers, and once you stop them they disappear. Saving any data between runs is complex. Rackspace Cloud Servers use traditional shared storage and can be stopped and rebooted without losing your data. This is much, much better. However, today Amazon also provides this; it is called EBS (for Elastic Block Store).
Con: At the time I started this, Rackspace only had a Web GUI for launching servers, but no programmable/scriptable REST API. I suspected this would be problematic - and indeed I do use this feature now on Amazon. Note that as of last month, Rackspace does have a REST API: http://www.rackspace.com/cloud/cloud_hosting_products/servers/api/.
Alternative 2b: Amazon EC2 cloud
As you already know, this is the one I ended up choosing. In addition to Rackspace lacking the REST API at the time, I figured that "it is what everyone else is using" was a powerful pro-Amazon argument too.
Indeed, Ubuntu for instance provides official images that you can just launch without spending any time on installation or configuration. They include cloud-friendly tweaks compared to the standard Ubuntu installation - for instance they are configured to use an APT repository inside Amazon's data center. This makes running "apt-get upgrade" amazingly fast: the download takes literally less than a second. Then of course unpacking the debs and running the installation scripts takes its own time...
So what does it cost?
I now have 3 months' worth of billing data from Amazon. So what does it really cost?
Well, the first month was less than 1 USD since I just used small t1.micro instances, which in the beginning are free thanks to an Amazon promotion. January and February have cost me roughly 200 USD each. So in 2 months I've spent the same money I could have spent on the cheap server in Alternative 1a. There are 2 things to say about that: 1) It is money well spent, as the builds are faster in the cloud than on a cheap single CPU server, and 2) I could have spent much less if I had optimized the money side of this.
There are several reasons why I spent much more money than I should have. The biggest culprit is bzr bug 367545. For some reason, when you branch the MySQL sources from Launchpad, Bzr eats up 800+ megs of RAM (essentially the whole bzr repository is stored in memory). Considering this is essentially just a download operation, it's really ridiculous for Bzr to use that much memory. If your computer has less memory - like a cheap t1.micro instance has - then bzr gets killed and the branch operation fails.
I mentioned above that Amazon doesn't offer as much granularity when choosing instances as Rackspace. For 64-bit platforms the next size after t1.micro is m1.large, and it costs 17x more! So this bzr bug has probably cost me around 380 dollars by now, as it was the main reason I had to use the m1.large instances so much!
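One way to arrive at a figure in that range is sketched below. It assumes, purely for illustration, that all of the roughly 400 USD from January and February went to m1.large hours and that the same number of hours on t1.micro would have cost 1/17 as much. Real usage was a mix, so treat this as a rough estimate and not an exact account.

```python
PRICE_MULTIPLIER = 17   # m1.large vs t1.micro, from the text
SPENT_USD = 2 * 200     # January and February, roughly 200 USD each


def extra_cost(spent_on_large_usd, multiplier=PRICE_MULTIPLIER):
    """Return how much more the large instances cost than the same hours on micro.

    Assumes every billed hour could have run on the cheaper instance type.
    """
    micro_equivalent_usd = spent_on_large_usd / multiplier
    return spent_on_large_usd - micro_equivalent_usd


if __name__ == "__main__":
    print(round(extra_cost(SPENT_USD)))  # prints 376
```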
I was also rather careless in my usage of the EC2 servers. While a build system is an excellent use case for the cloud, my usage of it was not. Being on paternity leave, I actually left the servers running idle for large amounts of time. I would code a little, start a build, go and take care of the kids, code a little more, go to sleep, come back 24 hours later to code a little... If I had done this as a work project, my use of EC2 would probably have been much more efficient, as I would have shut down the servers when not needed.
Towards the end I implemented a workaround for the bzr bug, which allowed me to use t1.micro instances again: I started keeping around a tar file with a shared MySQL repository in it. It also cut the build time by more than half (from 2 hours to 50 minutes), as prior to this most of the build time was spent waiting for bzr to download a new repository from Launchpad! But even though it was now possible to use the t1.micro instances, I decided to use the m1.large instance type anyway, as builds would complete 3-4 times faster. During development, this allowed me to progress faster as the wait time was reduced.
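The tar file workaround could look roughly like the Python sketch below. This is not the actual build script. The function names, the directory layout and the branch name are hypothetical, and the source URL is left as a parameter. The idea is to unpack the cached shared repository first and only then run "bzr branch" inside it, on the assumption that bzr then only needs to fetch the revisions the shared repository doesn't already have.

```python
import subprocess
import tarfile
from pathlib import Path


def unpack_shared_repo(tarball, workdir, repo_name):
    """Extract the cached shared repository and return its directory.

    repo_name is the top-level directory stored in the tar file (assumed layout).
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as archive:
        archive.extractall(workdir)
    repo_dir = workdir / repo_name
    if not repo_dir.is_dir():
        raise FileNotFoundError(f"{repo_name} not found in {tarball}")
    return repo_dir


def branch_command(source_url, repo_dir, branch_name):
    """Build the bzr command that creates a new branch inside the shared repo."""
    return ["bzr", "branch", source_url, str(Path(repo_dir) / branch_name)]


def fetch_sources(tarball, workdir, repo_name, source_url, branch_name="build"):
    """Unpack the cached repository, then branch into it with bzr."""
    repo_dir = unpack_shared_repo(tarball, workdir, repo_name)
    subprocess.run(branch_command(source_url, repo_dir, branch_name), check=True)
    return repo_dir / branch_name
```

The tar file itself has to be refreshed now and then, otherwise the number of revisions to download creeps up again.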
In a sense, you might say that the cloud allows you to flexibly choose whether to spend time to save money, or spend money to save time :-)
Updated Feb 23: Found out from Jay Pipes that OpenStack does provide an EC2 compatibility layer in its HTTP API, so I removed a reference to Eucalyptus and mention OpenStack instead.