This morning I was testing the new XML/XSLT-based engine I’m working on for the next version of Poseidon. Previously, I had only tested it on my local dev server at home, but this time I wanted to see how it would perform on wonko.com. I was shocked and dismayed to find that it performed horribly. In fact, it was almost 20 times slower than on the dev server.

After getting over the initial urge to destroy things (I’ve spent a lot of time over the last two months tweaking and optimizing my code to squeeze every last bit of performance out of it), I started benchmarking each individual component of the application to try to find the culprit. Actually, first I went on a picnic, then I started benchmarking. But I digress.

The first suspects, obviously, were the XML and XSLT routines. There’s some pretty heavy DOM manipulation going on, not to mention XSL transformations, and it had taken me quite a while to get things to a point on the dev server where I was satisfied with the performance. But my benchmarks showed that these routines weren’t the problem. In fact, they were consuming less than 1% of the total processing time.

What was the problem then, you ask? A single echo statement. All it did was echo the transformed HTML document (a string of about 37,000 bytes) to the browser. On a whim, I removed the echo and, instead, wrote the string to a temporary file and then used readfile() to dump the file’s contents to the browser. Absurdly, this brought the performance back up to the level I had expected.
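For illustration, the temp-file workaround might look roughly like the sketch below. This is not the actual code from the engine: the function name send_via_tempfile() and the "out" filename prefix are made up for the example, and it assumes the system temp directory is writable.

```php
<?php
// Hypothetical helper: write the document to a temp file, then stream it
// to the browser with readfile() instead of a single large echo.
function send_via_tempfile(string $html): void
{
    // tempnam() creates a uniquely named empty file and returns its path.
    $tmpFile = tempnam(sys_get_temp_dir(), 'out');
    if ($tmpFile === false) {
        // Fall back to a plain echo if no temp file could be created.
        echo $html;
        return;
    }

    file_put_contents($tmpFile, $html);
    readfile($tmpFile);   // sends the file's contents to the output
    unlink($tmpFile);     // clean up the temp file
}
```

It's an ugly hack, since it pays for a disk write on every request, but it was enough to show that the echo itself was the bottleneck.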

After a great deal of Googling and digging through PHP bug reports, I found this old bug report (which is, rather frustratingly, marked “bogus”). In short, using echo to send large strings to the browser results in horrid performance due to the way Nagle’s algorithm causes data to be buffered for transmission over TCP/IP. It wasn’t an issue on the dev server because that machine is on my LAN.

The solution? A simple three-line function that splits large strings into smaller chunks before echoing them:
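A minimal sketch of such a function, assuming PHP's str_split() does the chunking. The name echo_chunked() and the parameter names are illustrative, and the 8192-byte default is the buffer size discussed below.

```php
<?php
// Hypothetical helper: echo a large string in smaller pieces.
function echo_chunked(string $string, int $bufferSize = 8192): void
{
    // str_split() breaks the string into pieces of at most $bufferSize bytes.
    foreach (str_split($string, $bufferSize) as $chunk) {
        echo $chunk;
    }
}
```

Calling echo_chunked($html) in place of echo $html sends the same bytes in the same order, just in several smaller writes.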

Play around with the buffer size and see what works best for you. I found that 8192, apart from being a nice round number, seemed to be a good size. Certain other values work too, but I wasn’t able to discern a pattern after several minutes of tinkering, and there’s obviously some math at work that I have no desire to try to figure out.

By the way, the performance hit also happens when using PHP’s output control functions (ob_start() and friends) and when you enable output buffering in php.ini. I’m amazed more people haven’t noticed this, and I’m even more amazed that the PHP developers have just decided to ignore it since it’s not technically a bug in PHP.