The initial version installed with Homebrew is fine, but updating Node after installing via Homebrew is a real pain due to how Homebrew handles linking. Using NPM makes upgrades and version management simpler.

This is an FYI and a warning: be very careful with Haswell processors on RHEL/CentOS 6.6. There is a futex wait() bug that can cause waiting processes to never resume. A good description is on InfoQ.

“The impact of this kernel bug is very simple: user processes can deadlock and hang in seemingly impossible situations. A futex wait call (and anything using a futex wait) can stay blocked forever, even though it had been properly woken up by someone. Thread.park() in Java may stay parked. Etc. If you are lucky you may also find soft lockup messages in your dmesg logs. If you are not that lucky (like us, for example), you'll spend a couple of months of someone's time trying to find the fault in your code, when there is nothing there to find.”

I recently saw this with Dell R630s and Cassandra. A thread dump shows the threads in a BLOCKED state, and the stack traces show them as parked.

Cassandra's logs go completely silent and CPU utilization stays constant at some level (sometimes high, sometimes none). Interestingly, you can revive the process with kill -STOP <jvm_pid> followed by kill -CONT <jvm_pid>, which is much faster than a service restart.
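The stop/continue trick can be scripted; here's a minimal sketch in Python. The real target would be the hung JVM's pid, but the demo below uses a throwaway sleep child so it's safe to run:

```python
import os
import signal
import subprocess

def stop_cont(pid):
    # Equivalent of `kill -STOP <pid>` then `kill -CONT <pid>`:
    # suspends the process and immediately resumes it, which can
    # unstick threads lost in a bad futex wait.
    os.kill(pid, signal.SIGSTOP)
    os.kill(pid, signal.SIGCONT)

# Demo against a harmless child process instead of a real JVM.
child = subprocess.Popen(["sleep", "30"])
stop_cont(child.pid)
print(child.poll())  # None means the process is still running
child.terminate()
child.wait()
```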

To fix this, update CentOS 6.6 to the newest kernel in the updates repository, version 2.6.32-504.30.3.el6.x86_64.

Big thanks to Adam Hattrell, Simon Ashley, and Erick Ramirez from DataStax for the help figuring this out.

Something not often mentioned or tested is the impact of real-world latency on the operation and scalability of a website. The vast majority of load tests are run from a local load source, such as JMeter in the same availability zone. In that setup latency is incredibly low, probably sub-millisecond. In the real world your application will never see that kind of latency again; it will see anywhere from 50 to 500 ms depending on the global mix of traffic you receive. This can kill the performance of your application in surprising ways.

The time Apache spends waiting for a response on low-latency requests is small, which allows your servers to handle a much larger volume of traffic spread over a much smaller number of threads. This is further amplified if your application handles a lot of small, quick requests, say a web API. In the lab, a server might handle thousands of requests per second with only 30-100 threads active at any given time. Using so few threads is stellar for performance, since the box requires much less application concurrency. But a change in latency from 1 ms to 200 ms makes each transaction take 200 times as long by definition; if your application has a 1:1 ratio of threads to transactions, this also causes a 200-fold increase in concurrency. The box can run out of threads or memory in production before it ever reaches the performance levels seen during testing.
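The thread-count blowup falls straight out of Little's law (concurrency = throughput × time in system). A quick back-of-the-envelope calculation with assumed numbers:

```python
def concurrent_requests(requests_per_sec, latency_ms):
    # Little's law: average in-flight requests =
    # arrival rate * time each request spends in the system.
    return requests_per_sec * latency_ms / 1000.0

# The same 1,000 req/s workload at lab vs. real-world latency.
lab = concurrent_requests(1000, 1)     # ~1 ms in the same AZ
prod = concurrent_requests(1000, 200)  # ~200 ms over the internet
print(lab, prod)  # 1.0 200.0 -- a 200x jump in threads needed
```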

Latency issues can also highlight bottlenecks in your code where the application blocks while waiting on other threads. You can see this in your performance graphs by comparing context switching and system CPU usage between QA and production, since waiting on other threads often shows up at the kernel level.

What to do

So finally, what can we do about this? Load test from over the internet! You should mimic production latency in your performance testing environment; this ensures you not only test the raw performance of your application but also stress it at production-like concurrency levels. To do this, generate the load for your tests remotely in some form of cloud, like AWS. This raises a big question, though: where do you generate the load from? If your average visitors are fairly close geographically, you don't need to test from far away. But if you have a truly global customer base, you may want to generate load from the other side of the Atlantic. To decide, you really need a good average of your production latency, which is fairly hard to measure (I'm not about to ping every IP in my Apache access log, haha). Luckily, we can get this number in a roundabout way through testing!

If you are using Apache HTTPD, the first step is to enable server-status; if you want to see what this looks like, httpd.apache.org has server-status enabled by default, kudos to them. Next, test your app in QA: fire up enough threads to mirror the requests per second your production site sees, then measure the number of active threads ("requests currently being processed" in server-status). Using this you can compute the average latency your production site sees like so:

production_latency = local_latency * production_threads / local_threads
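As a sanity check, here is the same formula in Python with made-up numbers: 2 busy threads in QA at 1 ms of latency, and 120 busy threads in production at the same request rate.

```python
def production_latency(local_latency_ms, production_threads, local_threads):
    # At a fixed request rate, busy-thread counts scale linearly with
    # per-request latency, so the thread ratio recovers prod latency.
    return local_latency_ms * production_threads / local_threads

print(production_latency(1.0, 120, 2))  # 60.0 ms average in production
```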

To increase the accuracy of this measurement, increase the latency in your test environment, perhaps by generating the load from a nearby AWS region. You will still know the latency, but it won't be so close to zero; the difference between 0.08 ms and 0.07 ms is pretty significant in the final number while being hard to measure accurately.
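Reading the busy-thread count can be automated: mod_status also serves a machine-readable view at server-status?auto, which includes a "BusyWorkers" line (the same number as "requests currently being processed"). A sketch of parsing it, shown against a hypothetical sample rather than a live fetch:

```python
import re

def busy_workers(status_auto_text):
    # mod_status's ?auto output includes a line like "BusyWorkers: 42";
    # that is the count of requests currently being processed.
    match = re.search(r"^BusyWorkers:\s*(\d+)", status_auto_text, re.MULTILINE)
    return int(match.group(1)) if match else None

sample = "Total Accesses: 845\nUptime: 300\nBusyWorkers: 42\nIdleWorkers: 8\n"
print(busy_workers(sample))  # 42
```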

So then, armed with a production latency number, peruse different cloud providers and find one whose latency to your test site is near, or somewhat larger than, what you see in production. Then when you run tests, you can also test at application concurrency levels similar to what you experience in production!

Any comments, questions, concerns, or areas where I'm wrong that you'd like to troll?

I just set up my blog on Google's Webmaster Tools and saw that it wanted a sitemap. This led to the question of "how do I make one of those?" Luckily I found David Singer's blog post on building a sitemap! This is a direct paste of his code, which I found to work exactly as required; be sure to check out his blog.

All that is required is putting the sitemap code in the root directory of your blog, and then, on any pages you want to customize, adding the following (if you don't add this, the page will get the defaults):