Kyle Brandt

If you are not familiar with the concept, technical debt is essentially the idea that you build and program things quickly, skipping the niceties in order to ship, and then fix it later. By putting things off you build up debt that needs to be paid down later. One of the places this most commonly shows itself is in performance.

It works like this. Developers make features because the business and users want features. Performance is hard, and the benefits of good performance are not usually as obvious or concrete as the benefits of new features. Therefore, nobody really pays attention to performance, or it is intentionally skipped, until it gets so bad that people consciously notice it. Then the developers need to do a “feature freeze” and fix things until performance is at least “okay” again. If you don’t mind the cliche, the feature freeze is the “Rinse,” and then it all starts over again: “Repeat.” This is the cycle of technical debt.

At Stack Exchange I saw this happen: the developers had to stop working on features and fix performance because it got to the point where we were getting timeouts. However, here is where things get interesting: after that, it never happened again.

“Impossible!” No, it is not impossible. In reality, of course there are still things that slip by, but the overall macro cycle of technical debt, when it comes to performance, is avoidable. And if you order my VHS series for 19.95, I will tell you how.

In all seriousness, even if there is no one recipe, from my viewpoint Stack Exchange escaped the cycle through culture and by making the right performance investments. The culture that led to this consists of:

Good performance makes a system enjoyable to use, and everyone has to believe this idea. When development and operations are well integrated, the teams empower each other, and since performance takes both programming and systems knowledge, this integration is needed. Lastly, if good performance is an aspect of good craftsmanship, it becomes a source of pride.

These cultural aspects at Stack Exchange and the performance investments we have made reinforce each other. I don’t think we could have one without the other. But if there is a secret sauce, it feels to me like it is the performance investments. These investments follow a development pattern that results in instant feedback when it comes to performance:

The 3 Step Process to Good Performance Investments:

Step 1: Collect your data in a queryable way

I can’t emphasize enough how important this initial step is. Your performance data, such as logs and system data (e.g. CPU, memory, network), needs to be in a format that can easily be queried, extracted, aggregated, and molded in a way that leads to discovery. We use SQL Server for our logs and system data. It doesn’t have to be SQL, but I think that RRD, the common storage format used by systems like Cacti, does not fill this requirement: it is good for displaying time series graphs, but extracting data from it is difficult.
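To make "queryable" concrete, here is a minimal sketch of loading web log lines into a SQL table. It uses Python's built-in `sqlite3` module purely as a stand-in for SQL Server, and the log line format, the `web_log` table, and the `load_logs` helper are all assumptions made up for illustration, not the actual Stack Exchange setup.

```python
import sqlite3

# Hypothetical log format: "timestamp method path status duration_ms".
SAMPLE_LOG = """\
2011-05-01T12:00:00 GET /questions/1 200 120
2011-05-01T12:00:01 GET /users/7 200 45
2011-05-01T12:00:02 GET /questions/2 500 950
"""


def load_logs(conn, text):
    """Parse whitespace-separated log lines and insert them as rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS web_log ("
        "ts TEXT, method TEXT, path TEXT, status INTEGER, duration_ms INTEGER)"
    )
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip malformed lines rather than guess at their meaning
        ts, method, path, status, duration = parts
        rows.append((ts, method, path, int(status), int(duration)))
    conn.executemany("INSERT INTO web_log VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
inserted = load_logs(conn, SAMPLE_LOG)

# Once the data is in a table, an ad hoc question is a one-line query.
slow = conn.execute(
    "SELECT path, duration_ms FROM web_log WHERE duration_ms > 500"
).fetchall()
print(inserted, slow)
```

The point is less the storage engine than the shape of the data: once each request is a row, any new question is a query rather than a new parsing script.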

Step 2: Discover the Important Metrics

Once you have the data in a queryable format, you can then explore that data and discover what the important metrics are. Once we started capturing our web logs in SQL, we were able to add custom headers that tell us things like which route is being hit, and measure performance grouped by route. If your data isn’t queryable, the discovery process has too much friction.
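As a rough illustration of measuring performance grouped by route, the sketch below aggregates request durations per route with a single query. Again `sqlite3` stands in for SQL Server, and the table layout, route names, and `route_stats` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_log (route TEXT, duration_ms INTEGER)")
# Hypothetical rows; in practice the route would come from a custom header.
conn.executemany(
    "INSERT INTO web_log VALUES (?, ?)",
    [
        ("Questions/Show", 120),
        ("Questions/Show", 180),
        ("Users/Show", 45),
        ("Questions/List", 400),
        ("Questions/List", 600),
    ],
)


def route_stats(conn):
    """Return (route, hits, avg_ms, max_ms) rows, slowest average first."""
    return conn.execute(
        "SELECT route, COUNT(*) AS hits, AVG(duration_ms) AS avg_ms, "
        "MAX(duration_ms) AS max_ms "
        "FROM web_log GROUP BY route ORDER BY avg_ms DESC"
    ).fetchall()


stats = route_stats(conn)
for route, hits, avg_ms, max_ms in stats:
    print(f"{route:16} hits={hits} avg={avg_ms:.0f}ms max={max_ms}ms")
```

Swapping the `GROUP BY` column (status code, server, hour of day) is how the exploration usually proceeds until a metric turns out to matter.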

Step 3: Automate and Integrate the Important Data

Once you have found the important data by exploring it with various queries, those queries should be automated and integrated into your application. Then with every build (rapid integration or frequent building helps) you get instant feedback. At Stack Exchange we have a dashboard that includes graphs from log data, system data, profiling results, and exceptions. We can explore our web logs with a data explorer instance. Also, some of this, such as our profiler results, is part of every page load.
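One way to automate a query like this is to compare its latest results against a stored baseline on every build. The sketch below is an assumption about how such a check could look, not a description of the Stack Exchange dashboard; `find_regressions`, the 20% tolerance, and the sample numbers are all hypothetical.

```python
def find_regressions(baseline, current, tolerance=0.20):
    """Return routes whose average time grew by more than `tolerance`.

    Both arguments map a route name to an average duration in milliseconds.
    The result maps each regressed route to its fractional slowdown.
    """
    regressions = {}
    for route, now_ms in current.items():
        before_ms = baseline.get(route)
        if before_ms is None or before_ms <= 0:
            continue  # new route or unusable baseline: nothing to compare
        change = (now_ms - before_ms) / before_ms
        if change > tolerance:
            regressions[route] = round(change, 2)
    return regressions


# Hypothetical per-route averages from the previous and the current build.
baseline = {"Questions/Show": 150.0, "Users/Show": 45.0}
current = {"Questions/Show": 210.0, "Users/Show": 46.0, "Tags/List": 80.0}

report = find_regressions(baseline, current)
for route, change in report.items():
    print(f"{route} is {change:.0%} slower than the baseline")
```

Whether the report feeds a dashboard, a build log, or a failing check is a design choice; what matters is that nobody has to remember to run the query.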

This process leads to an instantaneous and effortless return of performance information. This eliminates the friction around discovering how your performance is changing. With this information readily available and in your face, it enables a culture where keeping up with performance becomes an aspect of good craftsmanship.

These tools we have created are performance investments. Investments are the opposite of debt. Investments give returns, whereas debt has interest. When you make these sorts of performance investments, the cycle of debt is broken and you start collecting the returns. For the most part, people in this world are either collecting returns or paying debt, and collecting returns feels damn good.