Migrating virtual machines from Amazon EC2 to Google Compute Engine

My Amazon EC2 discount contract is almost up, and I’ve been playing with Google Compute Engine (GCE). Initial impressions are that it’s faster and costs less money, particularly if you don’t want to pay up-front for EC2 reserved instances. Google’s web console is more modern than Amazon’s, though slightly less sophisticated. Google’s CLI tools are much faster and don’t require Java. Google’s API uses JSON instead of XML.

In terms of capabilities, GCE is not as advanced as EC2, but it’s vastly more powerful than Linode, Digital Ocean, and the like. One exception is that Google doesn’t permit sending SMTP directly from GCE instances. They have a partnership with Sendgrid for that. I’m using Mandrill instead, and so far I’m very pleased with that choice.

Migration from EC2 to GCE without re-installation

It’s possible to migrate virtual machines from EC2 to GCE. This post explains how I migrated my production Ubuntu 12.04 LTS instance. It’s not a detailed guide. If you possess a good amount of Linux operations knowledge, I hope the information here will help you do your own migration quickly.

Assumptions

Important differences between EC2 and GCE

EC2 uses Xen for virtualization. GCE uses KVM.

Most EC2 instances are paravirtualized (PV). They do not emulate actual PC hardware, and depend on Xen support in the kernel. Most of the time, EC2 instances use PVGRUB to boot. PVGRUB is part of the Amazon Kernel Image (aki-xxxxxxxx) associated with your instance. PVGRUB basically parses a GRUB configuration file in your root filesystem, figures out what kernel you want to boot, and tells Xen to boot it. You never actually run GRUB inside your instance.
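For context, the file PVGRUB parses is a stock GRUB legacy config. A minimal one might look like this (the kernel version, root device, and console setting here are illustrative, not taken from any particular AMI):

```
default 0
timeout 1

title Ubuntu 12.04, kernel 3.13.11
    root (hd0)
    kernel /boot/vmlinuz-3.13.11-generic root=/dev/xvda1 ro console=hvc0
    initrd /boot/initrd.img-3.13.11-generic
```

PVGRUB reads this, loads the named kernel and initrd from your root filesystem, and hands them to Xen; no bootloader code from your disk ever executes.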

With KVM, you have a full hardware virtual machine that emulates a PC. It requires a functioning bootloader in your boot disk image. Without one, you won’t boot. Fixing this, and using a kernel with the proper support, are the two main obstacles in migrating a machine from EC2 to GCE.
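To give a concrete idea of what "fixing this" involves: once the data is on the target disk, GRUB has to be installed into its MBR. A sketch from inside the temporary GCE instance, assuming the migrated root filesystem is mounted at /mnt/target and the target disk appears as /dev/sdb (both placeholders for your own setup):

```shell
# Bind-mount the pseudo-filesystems the GRUB tools expect, then
# install GRUB 2 onto the target disk's MBR from inside a chroot.
sudo mount --bind /dev  /mnt/target/dev
sudo mount --bind /proc /mnt/target/proc
sudo mount --bind /sys  /mnt/target/sys
sudo chroot /mnt/target grub-install /dev/sdb
sudo chroot /mnt/target update-grub
```

The details vary with your distribution and GRUB version; the point is that, unlike on Xen PV, the disk itself must be bootable.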

Let’s get started.

On EC2:

Snapshot your system before you do anything else. If you’re paranoid, create the snapshot while your system isn’t running.
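With the modern AWS CLI this is a one-liner (the volume ID is a placeholder; the web console or the older ec2-api-tools work just as well):

```shell
# Sketch: snapshot the instance's root EBS volume before touching anything.
# vol-xxxxxxxx is a placeholder for your own root volume's ID.
aws ec2 create-snapshot \
    --volume-id vol-xxxxxxxx \
    --description "pre-GCE-migration backup"
```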

Install a recent kernel. The Ubuntu 12.04 LTS kernel images don’t have the virtio SCSI driver needed by GCE. I used HPA’s 3.13.11 generic kernel. (These days it isn’t necessary to use a “virtual” kernel image. The generic ones have all the paravirtualized drivers and Xen/KVM guest support.)

Make sure your EC2 system still boots! If it doesn’t boot on EC2, it won’t do much good on GCE.

On GCE:

Create and boot a new (temporary) instance on GCE using one of their existing distribution bundles.

Create a new volume large enough to hold the contents of your EC2 boot volume, and attach it to your temporary instance.
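With today's gcloud CLI this step looks roughly like the following (disk name, size, zone, and instance name are all placeholders):

```shell
# Create the target disk and attach it to the temporary instance.
gcloud compute disks create migration-target \
    --size 20GB --zone us-central1-a
gcloud compute instances attach-disk temp-instance \
    --disk migration-target --zone us-central1-a
```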

Create an MBR partition table on the target volume, partition it, and create a root filesystem.

Mount your new filesystem.
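The partition/format/mount steps above, sketched on the temporary instance and assuming the new disk shows up as /dev/sdb (check with `lsblk` first; the mount point is arbitrary):

```shell
# MBR partition table with one partition spanning the disk,
# then an ext4 root filesystem, mounted for the copy.
sudo parted /dev/sdb --script mklabel msdos mkpart primary ext4 1MiB 100%
sudo mkfs.ext4 /dev/sdb1
sudo mkdir -p /mnt/target
sudo mount /dev/sdb1 /mnt/target
```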

On EC2:

Copy data to your new GCE filesystem. Use any method you like; consider creating a volume on EC2 from the snapshot you just took and using that as your source, which ensures you copy device nodes and other files you might otherwise overlook. Remember to use a method that preserves hard links, sparse files, extended attributes, ACLs, and so on.

On GCE:

Verify you received your data on your target volume and everything looks OK.

Great work! I have software currently running on EC2 in a paravirtualized (PV) instance with a reservation that is about to expire. Before renewing the reservation, I wanted to test other offerings.

I used the steps outlined above to successfully migrate the instance to both GCE and an EC2 HVM instance so I could run some benchmarks and compare. I was surprised to learn my software (both CPU and I/O intensive) performed very similarly on both Amazon's and Google's platforms. In the end, EC2 (PV) performance was ever so slightly (~5%) better, but again, that is with my software running for a few days with the same data sets in all environments.

My biggest stumbling block was that I had no output from the EC2 console until I finally figured out how to make it work; basically, I was working blind. With GCE, the console output was there to help me figure out some small details.

Update to the test mentioned in the previous comment. The difference in performance between EC2’s PV and HVM instance may have been due to random factors such as neighbor activities. Further testing (running for almost 24 hours) has shown that I’m getting virtually identical performance out of both of the EC2 environments, which continue to be about 5% better than GCE.