About Peter Lawrey

Lies, statistics and vendors

Overview

Reading performance results supplied by vendors is a skill in itself. It can be difficult to compare numbers from different vendors on a fair basis, and even more difficult to estimate how a product will behave in your system.

Lies and statistics

One of the few quotes from university that I remember goes roughly like this:

Peak Performance – A manufacturer’s guarantee not to exceed a given rating. — Computer Architecture: A Quantitative Approach (1st edition)

At first this appears rather cynical, but over the years I have come to the conclusion that it is unavoidable, and once you accept this you can trust the numbers you are given, provided you see them in a new light.

Why is it so hard to give a trustworthy performance number?

There are many challenges in giving good performance numbers. Most vendors try hard to give trustworthy numbers, but it is not as easy as it looks.

Latencies and throughputs don’t follow a normal distribution, which is the basis of most mathematically rigorous statistics. This means you are modelling something for which there isn’t a generally accepted mathematical model.
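To see why the usual mean-and-standard-deviation summary misleads here, consider a tiny, self-contained sketch (all the latency values are invented for illustration): a small tail of slow requests drags the mean well away from the median, which a normal distribution would not do.

```java
import java.util.Arrays;

// Hypothetical latency sample: mostly fast, with a small tail of slow
// outliers, as real systems tend to produce. All numbers are made up.
public class SkewedLatencies {
    public static void main(String[] args) {
        double[] latenciesMicros = new double[1000];
        Arrays.fill(latenciesMicros, 10);      // typical latency: 10 µs
        for (int i = 0; i < 10; i++)
            latenciesMicros[i] = 1000;         // 1% of requests take 1,000 µs

        double mean = Arrays.stream(latenciesMicros).average().orElse(0);
        Arrays.sort(latenciesMicros);
        double median = latenciesMicros[latenciesMicros.length / 2];

        // The mean is dragged well above the median by the tail; in a normal
        // distribution the two would sit together.
        System.out.printf("median: %.1f µs, mean: %.1f µs%n", median, mean);
        // prints "median: 10.0 µs, mean: 19.9 µs"
    }
}
```

Just 1% of slow requests has doubled the “average” latency, which is why summarising a latency distribution with a mean is so misleading.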

There are many different assumptions you can make, ways to test your solution and ways to represent the results.

You need to use benchmarks to measure something, but those benchmarks are a) not standard, b) not representative of your use case, or c) open to being optimised for in ways which don’t help you.

Vendors understand their products and sensibly select the best hardware for their product. This works best if you only have one product to consider. Multi-product systems may not have an optimal hardware solution for all the products, even if your organisation allowed you to buy the optimal hardware.

It is easy to report the best results tested and not include results which were not so good.

Any decent vendor will use their benchmarks to optimise their solution. The downside of this is that the solution will have been optimised more for the benchmarks they report than for use cases the vendor hasn’t tested, e.g. your use case.

BTW: I often find it interesting to see what use cases the vendor had in mind when they benchmark their solutions. This can be a good indication of a) what it is good for, b) the assumptions made in designing the solution, and c) how it is generally used already.

Should we ignore all benchmarks?

This can lead people to give up on micro-benchmarks and benchmarks in general because they have been “lied” to many times before.

However, used correctly, benchmarks can be a good guide even if they cannot give you definitive or completely reliable answers. As such, I suggest you be highly sceptical that small differences in performance give any indication of what you would expect to see, and only take note of wide variations in performance. By wide variations I mean differences of 3 to 10 times.

Percentiles for latency

Customers generally remember the worst service they ever got and take the average service for granted. When looking at the latency of your systems, it is generally the higher latencies which cause the most issues if not customer complaints.

A common approach for modelling the distribution of latencies is to sort all the latencies and report a sample of the worst.

| Percentile | One in N | Scale | Notes |
|---|---|---|---|
| 50% | “typical” | 1x | This is a good indication of what is possible. It is the most optimistic figure you could use. |
| 90% | one in ten | 2x-3x | This is a better indication of performance if tested on a real, complex system. |
| 99% | one in 100 | 4x-10x | For benchmarks of simplified systems, this is a better indication of what you can realistically expect to achieve. |
| 99.9% | one in 1,000 | 10x-30x | For benchmarks of simplified systems, this is a conservative indication of what you can expect. |
| 99.99% | one in 10,000 | 20x-100x | This number is nice to have but difficult to reproduce, even for the same benchmark, let alone for a different use case. See below. |
| 99.999% | one in 100,000 | varies | This number is almost impossible to reproduce between systems. See below. |

Generally speaking, the latencies escalate geometrically as you get into the higher percentiles. The very high percentiles have limited value, as you have to take ever more samples to get a reproducible number even on the same system from one day to the next. They can vary dramatically based on the use case or system.
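The “sort all the latencies and report a sample of the worst” approach can be sketched in a few lines of Java. The latency values here are synthetic, generated purely for illustration; a real test would record latencies from the system under measurement.

```java
import java.util.Arrays;
import java.util.Random;

// A minimal sketch of percentile reporting: sort all recorded latencies
// and read off the value at each percentile of interest.
public class PercentileReport {
    // Returns the latency at the given percentile (0.0 .. 1.0) of a sorted sample.
    static long percentile(long[] sortedLatencies, double fraction) {
        int index = (int) (fraction * sortedLatencies.length);
        if (index >= sortedLatencies.length)
            index = sortedLatencies.length - 1;
        return sortedLatencies[index];
    }

    public static void main(String[] args) {
        // Synthetic, right-skewed latencies in microseconds (seeded for repeatability).
        long[] latencies = new long[100_000];
        Random rand = new Random(1);
        for (int i = 0; i < latencies.length; i++)
            latencies[i] = 10 + (long) (rand.nextDouble() * rand.nextDouble() * 500);

        Arrays.sort(latencies);
        for (double p : new double[]{0.50, 0.90, 0.99, 0.999})
            System.out.printf("%.1f%%: %d µs%n", p * 100, percentile(latencies, p));
    }
}
```

In production code you would more likely use a histogram-based recorder than sorting the full sample, but the principle of reading off the worst percentiles is the same.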

A guide to the number of samples you need for reproducible numbers

Java has an additional feature: it gets faster as it warms up. In the past I have advocated removing these warm-up figures, but given that micro-benchmarks give overly optimistic figures, I am more inclined to include them, if for no other reason than that it is simpler. My rule of thumb for reproducible percentile figures is that for 1 in N, you need N^1.5 samples for simple micro-benchmarks and N^2 samples for complex systems.

| Percentile | One in N | Simple test samples | Complex test samples |
|---|---|---|---|
| 90% | one in ten | ~ 30 | ~ 100 |
| 99% | one in 100 | ~ 1,000 | ~ 10,000 |
| 99.9% | one in 1,000 | ~ 30,000 | ~ 1 million |
| 99.99% | one in 10,000 | ~ 1 million | ~ 100 million |
| 99.999% | one in 100,000 | ~ 30 million | ~ 10 billion |
| 99.9999% | one in 1,000,000 | ~ 1 billion | ~ 1 trillion |
| Maximum (100%) | never | infinite | infinite |
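The N^1.5 and N^2 rule of thumb above is easy to evaluate directly; the following few lines print the sample counts it implies for each percentile, which round to the figures in the table:

```java
// Evaluates the rule of thumb: for a 1-in-N percentile you need roughly
// N^1.5 samples in a simple micro-benchmark and N^2 in a complex system.
public class SampleCounts {
    public static void main(String[] args) {
        for (long n : new long[]{10, 100, 1_000, 10_000, 100_000, 1_000_000}) {
            double simple = Math.pow(n, 1.5);   // simple micro-benchmark
            double complex = Math.pow(n, 2.0);  // complex system
            System.out.printf("one in %,d -> simple: ~%,.0f, complex: ~%,.0f%n",
                    n, simple, complex);
        }
    }
}
```

For example, a reproducible 99.99% figure (one in 10,000) needs about a million samples in a simple test, which is why very high percentiles are so expensive to pin down.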

Based on this rule of thumb, I don’t believe a real maximum can be measured empirically. Nevertheless, not reporting it at all isn’t satisfactory either. Some benchmarks report the “worst in sample”, which is better than nothing but very hard to reproduce. To mitigate the cost of warm-up in real systems, I suggest latency-critical classes should be pre-loaded, if not warmed up, on startup of your application.
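A minimal sketch of what pre-loading and warming up on startup can look like. The classes and loop counts here are stand-ins for whatever actually sits on your latency-critical path; the JIT compilation threshold varies by JVM and settings, so treat the iteration count as an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: load latency-critical classes and exercise hot code paths before
// real traffic arrives, so the first request doesn't pay for class loading
// or interpreted execution.
public class Preloader {
    public static void main(String[] args) throws ClassNotFoundException {
        // Force class loading and static initialisation up front.
        // These class names are placeholders for your own critical-path classes.
        String[] criticalClasses = {
                "java.util.concurrent.ConcurrentHashMap",
                "java.nio.ByteBuffer",
        };
        for (String name : criticalClasses)
            Class.forName(name);

        // Optionally warm up hot code paths so the JIT compiles them early;
        // loop counts in the tens of thousands are a common rule of thumb.
        Map<String, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 20_000; i++) {
            map.put("key", "value");
            map.get("key");
        }
        System.out.println("warm-up complete");
    }
}
```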

In summary

If you are looking for a performance figure you can use, I suggest the 99th percentile as a good indication of what you can expect in a real system. If you want to be cautious, use the 99.9th percentile. If this number is not given, I would assume you might get about 10x the average or typical latency, and 1/10th of the throughput, that the vendor can achieve under ideal conditions. Usually this is still more than enough. If the vendor quotes performance figures close to what you need, or worse, doesn’t quote figures at all, beware! I am amazed how many vendors will say they are fast, quick, the fastest, efficient or high performance, but don’t quote any figures at all.
