Packaged but not virtualised

Since VMware started virtualising x86 hardware many years ago, many uses have been found for this approach of creating fully self-contained software environments. I’ve used VMware many years, and was pleased when Apple switched the Macs to “Intel inside”.

Nowadays, I can run numerous variants of Windows as well as Linux on a single laptop. I switched to Parallels as VM core, but the principle remains the same: a large file is used as virtual hard disk, with an operating system installed, and from then on the whole environment simply “runs” inside the Mac host (in my case).

Nowadays, we all take this for granted, and use virtual environments as ultra-low cost web servers without even giving it any thought. I have a number of virtual machines running on a Mac Mini for all the JeeLabs sites in fact (and some more). One piece of hardware, able to suspend and resume N completely independent systems at the click of a mouse.

But there’s a drawback: you end up duplicating a lot of the operating system and main services in each VM, and as a result this stuff not only consumes disk space but also leads to a lot of redundant backup data. I keep the VM’s very lean and mean here, but they still all eat up between 3 and 10 gigbytes of disk space. Doesn’t sound like much, but I’m pretty sure my own data uses less than 10% of that space…

Both Parallels and VMware (and probably also VirtualBox) support snapshotting, which means that in principle you could set up a full Linux environment with a LAMP stack, and then use snapshots to only store what’s different for each VM. But that’s not useful in the long run, because as soon as you get system updates, they’ll end up being duplicated again.

It’s a bit like git “forks”: set up a state, create a fork to create a variant, and then someone decides to add some features (or fix some bugs) to the original which you’d also like to adopt. For git, the solution is to merge the original changes into your fork, and use “fast-forwarding” (if I understand it correctly – it’s all still pretty confusing stuff).

So all in all, I think VM’s are really brilliant… but not always convenient. Before you know it, you end up maintaining and upgrading N virtual machines every once in a while.

I’ve been using TurnkeyLinux as basis of all my VM’s because they not only maintain a terrific set of ready-to-run VM images, but because they also make them self-updating out of the box, and because they’ve added a very convenient incremental daily backup mechanism which supports restarting from the back-up. You can even restart in the cloud, using Amazon’s EC2 VM “instances”.

This is a very interesting new development. Instead of full virtualisation, it uses “chroot” and overlay file systems to completely isolate code from the rest of the 64-bit Linux O/S it is running on. This lets you create a “box” which keeps every change inside.

My view on this, from what I understand so far, is that Docker lets you play git at the operating system level: you take a system state, make changes to it, and then commit those changes. Not once, but repeatedly, and nested to any degree.

The key is the “making changes” bit: you could take a standard Linux setup as starting point (Docker provides several base boxes), install the LAMP stack (if someone hasn’t done this already), commit it, and then launch that “box” as web server. Whereby launching could include copying some custom settings in, running a few commands, and then starting all the services.

The way to talk to these Docker boxes is via TCP/IP connections. There’s also a mechanism to hook into stdin/stdout, so that you can see logs and such, or talk to an interactive shell inside the box, for example.

The difference with a VM, is that these boxes are very lightweight. You can start and stop them as quickly as any other process. Lots of ‘em could be running at the same time, all within a single Linux O/S environment (which may well be a VM, by the way…).

Another example would be to set up a cross compiler for AVR’s/Arduino’s (or ARMs) as a box in Docker. Then a compile would be: start the box, feed it the source files, get the results out, and then throw the box away. Sounds a bit drastic, but it’s really the same as quitting an application – which throws all the state in RAM away. The point is that now you’re also throwing away all the uninteresting side-effects accumulated inside the box!

I haven’t tried out much so far. Docker is very new, and I need to wrap my mind around it to really understand the implications, but my hunch is that it’s going to be a game changer. With the same impact on computation and deployment as VMware had at the time.

As with VM’s, Docker boxes can be saved, shared, moved around, and launched on different machines and even different environments.

For more info, see this page which draws an analogy with freight containers, and how one standard container design has revolutionised the world of shipping and freight / cargo handling. I have this feeling that the analogy is a good one… a fascinating concept!

This is similiar to containers in Solaris. These came out around 2004 on Solaris , and as you say are a much more light weight OS virtualisation.

I have often wondered why it’s not been more widely used in Linux, back when I was at Sun, my experience was we would use it a lot, and only break out vmware when you actually needed a foreign OS (windows etc)

Interesting, Just a comment on some of your VMWare details.
Sanpshots are great for software development and roll back, but snapshots take an image and then work on a delta of the snapshot, so every change to disk is in fact the change plus some stored information referencing it back to the original image. This issue is it can get very slow and very big the longer snap shots are used.
They need to be deleted or collated as often as possible.
But your box idea does get around that.

Also in some cases mainly on SAN kit deduplication is used, to try to remove repeated files, which is very useful as you suggested in the moment where you have multiple of the same OS VM’s in the same area, technically you are only storing one, plus a reference to a file which is a lot smaller.