QCon 2015 Broken Performance Tools

Talk for QConSF 2015: "Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of system performance tools, touring common problems with system metrics, monitoring, statistics, visualizations, measurement overhead, and benchmarks. This will likely involve some unlearning, as you discover that tools you have been using for years are, in fact, misleading, dangerous, or broken.

The speaker, Brendan Gregg, has given many popular talks on operating system performance tools. This is an anti-version of these talks, to focus on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice and methodologies for verifying new performance tools, understanding how they work, and using them successfully."

The caution signs were a little messed up on slideshare, and have been replaced with images. The original (higher resolution) PDF is: http://www.brendangregg.com/Slides/QCon2015_Broken_Performance_Tools.pdf

G'Day, I'm Brendan. I normally give talks about performance tools that work, but today I'm going to talk about those that don't: broken and misleading tools and metrics.

As a warning: when you use performance tools, you will get hurt. It's like a surface that's slippery when wet, or dry. It's like, always slippery.

Awesome place to work. CentOS and Ubuntu.

I'm not really going to talk about software bugs. I'm going to talk about things that appear to work that don't. Things that are counter-intuitive. Understanding those is what separates beginners from experts, and is the hardest part to learn.

Is this ok? Does that answer our questions? Pretty simple, right?

Imagine you went to another country and hired a car, and were told that "green" on a traffic light means "probably ok, but you might get T-boned"… That'd make the traffic lights pretty useless.

Why are you still running top?

"Maybe it's the network… Can you check with the network team if they have had dropped packets … or something?"

We're hiring


4.
This Talk
• Observability, benchmarking, anti-patterns, and lessons
• Broken and misleading things that are surprising
Note: problems with current implementations are discussed, which
may be fixed/improved in the future

16.
CPU Summary Statistics
• %Cpu row is from /proc/stat
• linux/Documentation/cpu-load.txt:
  "In most cases the `/proc/stat' information reflects
  the reality quite closely, however due to the nature
  of how/when the kernel collects this data
  sometimes it can not be trusted at all."
• /proc/stat is used by everything for CPU stats
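
A hedged sketch of what is behind that row (not from the slides): top-like tools read the cumulative "cpu" counters from /proc/stat and derive %Cpu from the delta between two readings.
  # First line of /proc/stat: cumulative jiffies since boot, in this order:
  #   user nice system idle iowait irq softirq steal guest guest_nice
  grep '^cpu ' /proc/stat
  sleep 1
  grep '^cpu ' /proc/stat
  # utilization over the interval ~= 100 * (1 - delta(idle) / delta(sum of all fields))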

19.
A CPU Mystery…
• As load increased, CPU ms per request lowered (blue)
– up to 1.84x faster
• Was it due to:
– Cache warmth? no
– Different code? no
– Turbo boost? no
• (Same test, but problem fixed, is shown in red)

21.
Out-of-order Execution
• CPUs execute uops out-of-order and in parallel across multiple functional units
• %CPU doesn't account for how many units are active
• Accounting for each cycle as "stalled" or "retiring" is a simplification
• Nowadays it's a lot of work to truly understand what CPUs are doing
https://upload.wikimedia.org/wikipedia/commons/6/64/Intel_Nehalem_arch.svg
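
A hedged way to check this on Linux (not from the slides; these PMC event names exist in perf, but availability varies by CPU and kernel):
  # Compare instructions with cycles: low IPC suggests many "busy" cycles were stalled
  perf stat -e cycles,instructions,stalled-cycles-frontend,stalled-cycles-backend -a sleep 10
  # instructions per cycle well below 1.0: mostly stalled (e.g. memory waits); higher: mostly retiring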

48.
Why Stacks are Broken
• On x86 (x86_64), hotspot uses the frame pointer register (RBP) as a general-purpose register
• This "compiler optimization" breaks stack walking
• Once upon a time, x86 had fewer registers, and this made much more sense
• gcc provides -fno-omit-frame-pointer to avoid doing this
– JDK8u60+ now has this as -XX:+PreserveFramePointer
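
A minimal sketch of applying these flags (program and service names are hypothetical), so frame-pointer-based stack walkers such as perf see full stacks:
  gcc -O2 -fno-omit-frame-pointer -o myapp myapp.c      # keep RBP as the frame pointer
  java -XX:+PreserveFramePointer -jar myservice.jar &   # JDK8u60+ equivalent for hotspot
  perf record -F 99 -g -p $(pgrep -f myservice) -- sleep 30   # sample stacks at 99 Hz for 30s
  perf script | head                                    # stacks should now walk beyond one frame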

63.
Valgrind
• A suite of tools including an extensive leak detector
• To its credit, it does warn the end user:
"Your program will run much slower
(eg. 20 to 30 times) than normal"
– http://valgrind.org/docs/manual/quick-start.html
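
For example, a minimal leak-check invocation (program name hypothetical), which is why that slowdown warning matters for anything latency-sensitive:
  valgrind --tool=memcheck --leak-check=full ./myprog   # expect roughly 20-30x slower than normal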

68.
Monitoring
• By now you should recognize these pathologies:
– Let's just graph the system metrics!
• That's not the problem that needs solving
– Let's just trace everything and post process!
• Now you have one million problems per second
• Monitoring adds additional problems:
– Let's have a cloud-wide dashboard update per-second!
• From every instance? Packet overheads?
– Now we have billions of metrics!

83.
"Most popular benchmarks are flawed"
Source: Traeger, A., E. Zadok, N. Joukov, and C. Wright.
“A Nine Year Study of File System and Storage
Benchmarking,” ACM Transactions on Storage, 2008.
Not only can a popular benchmark be broken, but so can
all alternates.

90.
Product Evaluations
• Benchmarking is used for product evaluations & sales
• The Benchmark Paradox:
– If your product’s chances of winning a benchmark are
50/50, you’ll usually lose
– To justify a product switch, a customer may run several
benchmarks, and expect you to win them all
– May mean winning a coin toss at least 3 times in a row (a 0.5³ = 12.5% chance)
– http://www.brendangregg.com/blog/2014-05-03/the-benchmark-paradox.html
• Solving this seeming paradox (and benchmarking):
– Confirm benchmark is relevant to intended workload
– Ask: why isn't it 10x?

91.
Active Benchmarking
• Root cause performance analysis while the
benchmark is still running
– Use observability tools
– Identify the limiter (or suspected limiter) and include it with the
benchmark results
– Answer: why not 10x?
• This takes time, but uncovers most mistakes
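
A hedged example of what this looks like in practice (tool choices are illustrative, not from the slides): start the benchmark, then watch the system with standard observability tools while it runs.
  fio --name=randread --rw=randread --bs=4k --size=1g --runtime=60 &   # the benchmark under test
  mpstat -P ALL 1      # CPU-bound? watch %usr, %sys, %idle per CPU
  iostat -xz 1         # disk-bound? watch r/s, await, %util
  # note the limiter (or suspected limiter) with the result, and ask: why not 10x?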

97.
Kitchen Sink Benchmarks
• Run everything!
– Mostly random benchmarks found on the Internet, where
most are broken or irrelevant
– Developers focus on collecting more benchmarks than
verifying or fixing the existing ones
• Myth that more benchmarks == greater accuracy
– No, use active benchmarking (analysis)

106.
UnixBench
• The original kitchen-sink micro benchmark from 1984,
published in BYTE magazine
• Innovative & useful for the time, but that time has passed
• More problems than I can shake a stick at
• Starting with…

109.
UnixBench Makefile
• "Fixing" the Makefile improved the first result, Dhrystone
2, by 64%
• Is everyone "fixing" it the same way, or not? Are they
using the same compiler version? Same OS? (No.)
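
A hedged illustration (bench.c is a hypothetical source file, not the actual UnixBench Makefile): compiler flags alone can swing a Dhrystone-style score dramatically, which is why uncontrolled "fixes" make results incomparable.
  gcc -O0 -o bench bench.c && ./bench                  # -O0 (the default): unoptimized build
  gcc -O3 -march=native -o bench bench.c && ./bench    # "fixed" build: a very different score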

110.
UnixBench Documentation
"The results will depend not only on your
hardware, but on your operating system,
libraries, and even compiler."
"So you may want to make sure that all your
test systems are running the same version of
the OS; or at least publish the OS and
compuiler versions with your results."

116.
Blame Someone Else Anti-Method
1. Find a system or environment component you are not
responsible for
2. Hypothesize that the issue is with that component
3. Redirect the issue to the responsible team
4. When proven wrong, go to 1

117.
Performance Tools Team
• Having a separate performance tools team, who creates
tools but doesn't use them (no production exposure)
• At Netflix:
– The performance engineering team builds tools and uses
tools for both service consulting and live production triage
• Mogul, Vector, …
– Other teams (CORE, traffic, …) also build performance tools
and use them during issues
• Good performance tools are built out of necessity

121.
Observability
• Trust nothing, verify everything
– Cross-check with other observability tools
– Write small "known" workloads, and confirm metrics match
– Find other sanity tests: e.g. check known system limits
– Determine how metrics are calculated, averaged, updated
• Find metrics to solve problems
– Instead of understanding hundreds of system metrics
– What problems do you want to observe? What metrics would
be sufficient? Find, verify, and use those. e.g., USE Method.
– The metric you want may not yet exist
• File bugs, get these fixed/improved
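
A hedged example of the "known workload" sanity test above (path and sizes are arbitrary): generate I/O of a known size and confirm the reported metrics roughly agree.
  dd if=/dev/zero of=/tmp/ddtest bs=1M count=100 oflag=direct &   # ~100 MB of direct writes
  iostat -xz 1 5      # do wkB/s and the write totals roughly match what dd is doing?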