And now for something completely geeky

Now that my exams are over, I’ve gotten back to working towards launching my little startup. During the second half of my semester I couldn’t find time or energy for it, but I did occasionally give it some thinking time, some of which has borne fruit.

Beware, Troppo readers; here be nerds.

My technical architecture has a few moving parts. The heart of the operation is to gather data via HTTP requests, perform some computations, send a reply, and keep a record of the transaction. Originally I planned to do this in the conventional fashion: a web server, PHP and a database. My first proof-of-concept system used this architecture. It worked well.

But as I thought about my goals more, it became clear that two key technical ‘non-functional’ objectives are to minimise response time and to maximise requests handled per second.

The first objective is about maximising user satisfaction. You may be familiar with how some websites load slowly because they pull in images, scripts, advertisements etc from a wide range of slow-responding servers. I am determined that my service will not be one of those slowpokes.

The second objective is about maximising profitability. The more users I can support per server, the fewer servers I will need. The fewer servers, the lower my monthly expenses. In general the first and second technical objectives will go hand-in-hand, but in case of conflict it is important to have a clear ordering.

Now the original architecture looked like this:

That’s a pretty standard approach. It works well, is proven and well-supported by existing tools. Some simple tests established that on a modest virtual machine I could expect to handle about 700 requests per second at an average of 300ms each.

The major drawback with this architecture is that performance isn’t set by the web server, it’s set by the database. Most web applications are read-oriented, with users who expect their web interface to be up-to-date within seconds. But my system is actually write-oriented. In my situation there’s no firm requirement that the database be up-to-date in seconds. For all I care it could take hours for the data to pass from web server to database, just so long as it does.

So back to the drawing board. My second architecture modified the original by placing a queueing service between the web server and the database.

When the web server has data, it pushes it into the queue. The database can fetch the data at its own pace. Neither component is slowing down the other one.
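The idea can be sketched in a few lines of Python. This is a toy in-process stand-in (the real thing would be a separate queueing service), but it shows the decoupling: the “web server” returns immediately, while the “database” drains the queue at its own pace.

```python
import queue
import threading

# The queue decouples the two components.
q = queue.Queue()

def handle_request(record):
    q.put(record)      # constant-time; never waits on the database
    return "200 OK"    # reply to the client straight away

def database_worker(store):
    # Stand-in for the database side: drain items at its own pace.
    while True:
        item = q.get()
        if item is None:   # sentinel: shut down
            break
        store.append(item) # stand-in for a (slow) database write
        q.task_done()

store = []
t = threading.Thread(target=database_worker, args=(store,))
t.start()
for i in range(5):
    handle_request({"id": i})
q.put(None)
t.join()
```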

The use of queues to allow the front-end servers to respond faster has become very common. Developers can choose from many commercial and open-source off-the-shelf systems for fast queueing. In addition, Amazon even have “queueing as a service” with their Simple Queue Service. With SQS you pay an extremely small fee per message; Amazon guarantees that queue entries will persist for several days and allows unlimited queue entries.

But still I wasn’t satisfied with the front end. I wanted it to go faster, if at all possible. It so happens that I noticed another service offered to Amazon customers: Elastic Block Storage. Essentially Amazon allow you to have a virtual hard drive. You can attach it to virtual machines, read from and write to it, detach it, clone it, reattach it, or even shuffle it amongst different virtual machines.

That last part interested me most. With an architecture based on queueing, the bottleneck is still at the database server. It must continuously fetch items one at a time. This is quite fast, but nowhere near as fast as using database facilities for loading in bulk. Supposing I could ship thousands or even millions of datapoints to the database server at a time, my total performance would go up — fewer servers required, more profit.

This leads me to my third architecture:

In this design I am writing data directly to the virtual disk in the form of a simple log. Occasionally the disk is detached from the webserver and reattached to the database server. The database server performs a bulk load, then returns the virtual disk to the pool available for use by web servers.
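A rough sketch of the log-then-bulk-load cycle, using a local file and SQLite’s `executemany` as stand-ins for the EBS volume and the real database’s bulk loader (the record format and table names here are illustrative, not my actual schema):

```python
import json
import os
import sqlite3
import tempfile

LOG = os.path.join(tempfile.mkdtemp(), "requests.log")

def record(datapoint):
    """Web-server side: append one line to the log and return immediately."""
    with open(LOG, "a") as f:
        f.write(json.dumps(datapoint) + "\n")

def bulk_load(conn):
    """Database side: slurp the whole log in one bulk insert, then reset it."""
    with open(LOG) as f:
        rows = [(d["user"], d["value"])
                for d in (json.loads(line) for line in f)]
    conn.executemany("INSERT INTO hits (user, value) VALUES (?, ?)", rows)
    conn.commit()
    open(LOG, "w").close()  # stand-in for returning the disk to the pool
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (user INTEGER, value INTEGER)")
for i in range(1000):
    record({"user": i, "value": i * 2})
n = bulk_load(conn)
```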

This allows me two further optimisations.

The first is visible in the diagram. Instead of having my HTTP server talk to a PHP backend (or indeed any other backend), Lighttpd allows me to directly embed a Lua script using the mod_magnet plugin. Lua is a pleasant, lightweight language, and very fast. In some microbenchmarks I have seen it blow the doors off PHP for web serving performance.
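For the curious, wiring a Lua script into lighttpd is only a couple of lines of configuration, and a minimal handler is not much longer. The paths and the response here are illustrative, not my actual code:

```
# lighttpd.conf fragment
server.modules += ( "mod_magnet" )
magnet.attract-raw-url-to = ( "/etc/lighttpd/handler.lua" )
```

```lua
-- handler.lua: answer the request directly from Lua, no backend needed
lighty.header["Content-Type"] = "text/plain"
lighty.content = { "hello from mod_magnet\n" }
return 200
```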

I didn’t use embedded Lua in the original architecture due to a flaw in its database connection system, or in the second architecture because there are no Lua libraries for talking to Amazon’s SQS. In theory I could have rectified both of these drawbacks, but I am a great believer in simplifying my programming overhead whenever possible.

The second optimisation is to use a different file system. Most file systems are optimised to speed up seek or read times. Seek time is the measure of time it takes for the head of a hard drive to find the correct track to read data from; read time is a measure of how long it takes to pull a file off disk and into memory. File systems use a number of tricks and techniques to minimise these at the expense of write times. The basic thinking is that writing to disk happens less often than reading.

But my workload is the opposite: reading happens much less frequently. Write performance is my dominant consideration.

The latest Linux kernel includes a new file system from Japanese telco NTT called NILFS2. This is a “log-structured” file system. The basic upshot is that it is built around maximising write speed, which is exactly what I want. In a microbenchmark, NILFS2 improves overall performance by roughly 20% over Ext3, the default Linux file system.
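Trying NILFS2 on a spare volume is straightforward on a recent kernel; the userspace tools come from the nilfs-utils package. The device name is illustrative (on EC2, an attached EBS volume shows up as something like /dev/xvdf):

```shell
# format the volume as NILFS2 and mount it for the request log
mkfs.nilfs2 /dev/xvdf
mkdir -p /mnt/requestlog
mount -t nilfs2 /dev/xvdf /mnt/requestlog
```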

Between the embedded Lua script and NILFS2, my average response time is down to 122ms with a standard deviation of 655ms. 95% of requests are served in less than 250ms. Overall the server can spit out 2064 requests/second on a simple microbenchmark. Remember that my original design could only handle 700 requests per second. That’s an approximate tripling of performance. If nothing else it shows that good design rarely turns up on the first attempt. It’s very possible that I will think of something much better later on (and even more so that you, Humble Reader, will do so).

Getting it running correctly is more important than getting it running fast, especially when you have no customers :-)

Queuing cannot (even theoretically) speed up your average transaction rate. The best it can do is load-level, to ensure the peak burst transaction rate is increased, but only if there is time to clear up the queue during quiet periods. This is useful if you expect input transactions to come along in a bursty pattern.

MySQL is generally faster at simple operations than PostgreSQL. If write performance is the main issue then don’t bother indexing the data (or keep the indexes as minimal as you can). Also, for single-query-at-a-time database operations, make sure it is not re-parsing the SQL each time (use the long-winded prepare-and-execute process, which should be faster for repetitive operations because the prepare only happens once). It probably still won’t be as fast as a bulk insert.
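The prepare-once, execute-many point is easy to demonstrate. Here is a sketch using Python’s sqlite3 as a stand-in (the same pattern applies to MySQL’s prepared statements); the table is illustrative:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (user INTEGER, value INTEGER)")
rows = [(i, i * 2) for i in range(10000)]

# One-at-a-time: the statement text is handed over once per row.
t0 = time.perf_counter()
for row in rows:
    conn.execute("INSERT INTO hits VALUES (?, ?)", row)
one_at_a_time = time.perf_counter() - t0

conn.execute("DELETE FROM hits")

# Bulk: one statement, executed for every row in a single call.
t0 = time.perf_counter()
conn.executemany("INSERT INTO hits VALUES (?, ?)", rows)
bulk = time.perf_counter() - t0

count = conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
```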

If you are buying virtual machine space, is there any good reason to put the database server on a separate virtual machine? If you buy two VMs, they are just as likely going to end up as two processes on the same hardware anyhow so may as well plonk everything into the same VM (and reduce the communication and context-switch overhead).

There are a few tricks with ext3 that the Squid guys have documented, like disabling updates of access times and similar. Something to try.

Queuing cannot (even theoretically) speed up your average transaction rate. The best it can do is load-level, to ensure the peak burst transaction rate is increased, but only if there is time to clear up the queue during quiet periods. This is useful if you expect input transactions to come along in a bursty pattern.

It depends how we’re measuring transaction rates. In total system terms you’re right; in fact a queue adds overhead. But remember, my top priority is to minimise response time. Pushing data straight to a fast queue means the web server can immediately serve the next request rather than waiting on a database write.

Apart from evening out bursts (as you pointed out), the other reason queues get used a lot is to allow the web and database layers to be scaled independently without much jiggery-pokery.

MySQL is generally faster at simple operations than PostgreSQL.

Only on a single CPU, and only if you’re using MyISAM. I’d sooner trust my data to used toilet paper.

Postgres scales better, is much more reliable, has better inbuilt crypto, supports proper constraints, has decent VIEWs, allows me to use Lua as the language for stored procedures and is much more extensible in general. I haven’t yet decided whether to bulk load data and then process it, or process it and then do the bulk load. I will need to build prototypes for both and see which works best.

If you are buying virtual machine space, is there any good reason to put the database server on a separate virtual machine? If you buy two VMs, they are just as likely going to end up as two processes on the same hardware anyhow so may as well plonk everything into the same VM (and reduce the communication and context-switch overhead).

Firstly, it allows me to scale the web front end and the database separately. On the backend there will be a small pool of beefy VMs. On the front end a buzzing cloud of cheap, small VMs.

Secondly, and to me this is more important, it gives me an extra ringfence around the data. In a traditional setup (like my original arch above), you set up the DB servers so that they only respond to traffic from the web servers. In the second and third architectures you can go one better — they need not even know where the web servers are or vice versa, making it that much harder for an attacker to compromise the data.

Getting it running correctly is more important than getting it running fast, especially when you have no customers

True; but on the other hand, “measure twice, cut once”. Or should that be “design thrice, code once”?

And also, in my case, speed is a feature. It partly defines correctness for my requirements.
