Wednesday, December 11, 2013

Performance tuning - measure don't guess

In my performance tuning career I have given the advice to measure and not guess more often than I can recall. And in many cases the target of this advice has given up after looking at the monolithic 500,000 LOC legacy application they are working.

In the post we are about to share some simple tools and concepts how to setup the initial measurement, so you can have the baseline to start with more fine-grained performance tuning.

Setting the goal

Pretty much any non-trivial application can be optimized forever. Both in bad and in good way. Bad examples include tweaking random parts of the applications and then praying for the best. Whatever might be the reason - I have seen tens and tens of developers “being pretty sure that exactly this line of code needs attention” without basing their gut feeling to any measurements.

The second category is almost as dangerous, as you can spend hundreds of man-years tuning your application “to be ready for the prime time”. The definition for prime time might vary, but are you really sure you are going to accommodate terabytes of media, tens of millions of records in your database and have to serve hundreds of thousands of concurrent users with less than 100ms latency? Unless you are aiming to put Google out of business, you most likely will not. Worse yet, spending so much time preparing for the primetime is a darn good way to assure the prime time never arrives. Instead of spending this time for squeezing out the extra 1ms in latency nobody is likely to notice, maybe the same time should have spent in ironing out this layout bug annoying the end users on Safari instead?

So how to set a meaningful target? For this you need to understand the business you are in. Real-time strategy games and First Person Shooters tend to have different requirements than online shopping carts. As last time I checked, Java was not going strong in the gaming industry, lets expect you are dealing with a typical Java EE application with web based front-end.

In this case, you should start now segmenting your application into different categories, which, based on the research are similar to the following:

0.1 seconds gives the feeling of instantaneous response — that is, the outcome feels like it was caused by the user, not the computer.

1 second keeps the user's flow of thought seamless. Users can sense a delay, and thus know the computer is generating the outcome, but they still feel in control of the overall experience and that they're moving freely rather than waiting on the computer. This degree of responsiveness is needed for good navigation.

10 seconds keeps the user's attention. From 1–10 seconds, users definitely feel at the mercy of the computer and wish it was faster, but they can handle it. After 10 seconds, they start thinking about other things, making it harder to get their brains back on track once the computer finally does respond.

So you might have functionality such as adding products to the shopping carts or browsing through the recommended items which you need to squeeze into the “instant” bucket to provide the best overall user experience resulting in better conversion rates. Initially I am pretty sure your application is nowhere near this criteria, so feel free to replace the 100ms and 1,000ms threshold with something you actually can achieve, such as 300 and 1,500ms for example.

On the other hand, you most likely have some operations which are expensive enough, such as the products search or user account registration which might fall into the second bucket.

And last, you have the functionality which you can toss into the last bucket, such as generating PDF from the invoice. Note that you also might end up with fourth category - for example if you are generating the bills of delivery for your warehouse, your business might be completely OK if batch processing the daily bills takes 20 minutes.

Pay attention that business persons have a tendency of tossing everything into the “instant” bucket. Now it is your task to explain the effort required and ask “if you had only three operations which are completing under our threshold, what should those be”.

Now you should have your application functionality categorized into different categories, such as the following:

Requirement

Category

Browse products

instant

Add product to a shopping cart

instant

Search product

seamless

Register new account

attention

Generate PDF invoice

attention

Generate daily bills of delivery

slow

Understanding the current situation

Your next goal is to understand where you will perform your measurements. For end users, the experience is complete when the page has rendered the result of the operation in their browser, which takes into account the network latency and browser DOM operations for example.
As this will be more complex to measure, let us assume your site has been well optimized against the Google Page Speed or YSlow recommendations and for simplicity’s sake lets focus on elements directly under your control.

On most cases, the last piece of infrastructure still under your control will be your web server, such as the Apache HTTPD or nginx. If you have logging enabled, you will have access to something similar to the following in your nginx access.log:

Lets investigate now what we have found in the log. The snippet we have extracted contains actions of one user completing a full transaction in our web site. The user has created an account, searched for products, browsed through the results, found a product he has liked, added it to a shopping cart, completed the purchase and generated a PDF of the invoice. Those actions can be detected in the "POST /searchProduct HTTP/1.1" column. Next important part is the last column containing the total request time it took for a particular request to complete. In the case of /searchProduct it took 3.008 seconds to complete the search.

Note that by default nginx does not have the request time logging enabled, so you might need to to modify your log_format by adding $request_time to the pattern.

Now, with a little bit of grep/sed/excel magic, you will have something similar to the following at your fingertips:

Requirement

Category

Mean

90%

Browse products

instant

0.734

0.902

Add product to a shopping cart

instant

2.422

3.490

Search product

seamless

2.800

3.211

Register new account

attention

0.428

0.480

Generate PDF invoice

attention

3.441

4.595

Generate daily bills of delivery

slow

-

-

Picking the target for optimization

From here you will get your list of optimization targets. The suspects are obvious - they fall in the categories where you violate the agreement, in the above case you have problems with three requirements - Browse products, Add product to a shopping cart and Search product all violate your performance requirements.

This article does not focus on actual optimisation tasks, but in order to give food for thought - what should you do next, knowing that adding products to the shopping cart takes way too much time?

Next steps are definitely more application specific than the prior tasks. Also, the tooling to be used can vary - you might now decide to go with APM or monitoring products. But if you do not have the budget or just do not wish to fight with procurement office, a simple solution would include adding logging trail in the component boundaries by using AOP means.

For example, you can add something like the following in your application code to monitor the time spent in the service layer.

Decomposing the request time further gives you the required insight to understand where the bottlenecks are hiding. More often than not you will quickly discover a violating query, faulty cache configuration or poorly decomposed functionality enabling you to net your first quick wins in a matter of days or even hours.