Our results are disturbing because they indicate that profiler incorrectness is pervasive, occurring in most of our seven benchmarks and in two production JVMs, and significant: all four of the state-of-the-art profilers produce incorrect profiles. Incorrect profiles can easily cause a performance analyst to spend time optimizing cold methods that will have minimal effect on performance. We show that a proof-of-concept profiler that does not use yield points for sampling does not suffer from the above problems.

The conclusion of the paper is that we cannot really believe the results of profilers. But then, what is the alternative to using profilers? Should we go back to optimizing by gut feeling?

UPDATE: A point that seems to be missed in the discussion is the observer effect. Can we build a profiler that is really 'observer effect'-free?

5 Answers
5

First, I'm amazed that this is news. Second, the problem is not that profilers are bad, it is that some profilers are bad.
The authors built one that, they feel, is good, just by avoiding some of the mistakes they found in the ones they evaluated.
Mistakes are common because of some persistent myths about performance profiling.

But let's be positive.
If one wants to find opportunities for speedup, it is really very simple:

Sampling should be uncorrelated with the state of the program.
That means happening at a truly random time, regardless of whether the program is in I/O (except for user input), or in GC, or in a tight CPU loop, or whatever.

Sampling should read the function call stack,
so as to determine which statements were "active" at the time of the sample.
The reason is that every call site (point at which a function is called) has a percentage cost equal to the fraction of time it is on the stack.
(Note: the paper is concerned entirely with self-time, ignoring the massive impact of avoidable function calls in large software. In fact, the reason behind the original gprof was to help find those calls.)

Reporting should show percent by line (not by function).
If a "hot" function is identified, one still has to hunt inside it for the "hot" lines of code accounting for the time. That information is in the samples! Why hide it?
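The three points above can be sketched in a few lines of Python. This is only an illustration: the stack samples are hypothetical hand-written data, and the sketch assumes they were captured at random times by some other mechanism. `line_costs` is a made-up helper name; it estimates the cost of each call site as the fraction of samples whose stack contains it.

```python
from collections import Counter

def line_costs(samples):
    """Return {call_site: fraction of samples whose stack contains it}."""
    counts = Counter()
    for stack in samples:
        # Count a site at most once per sample, even if it appears twice (recursion).
        for site in set(stack):
            counts[site] += 1
    total = len(samples)
    return {site: n / total for site, n in counts.items()}

# Hypothetical samples: each is a call stack of "file:line" sites, outermost first.
samples = [
    ["main.py:10", "report.py:42", "fmt.py:7"],
    ["main.py:10", "report.py:42", "db.py:88"],
    ["main.py:10", "report.py:57"],
    ["main.py:10", "report.py:42", "fmt.py:7"],
]

# Report percent by line, largest first.
for site, frac in sorted(line_costs(samples).items(), key=lambda kv: -kv[1]):
    print(f"{frac:5.0%}  {site}")
```

Note that the report is by line, not by function, and that a line high on the stack (such as `report.py:42` here) gets credit for everything called beneath it.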

An almost universal mistake (that the paper shares) is to be concerned too much with accuracy of measurement, and not enough with accuracy of location.
For example, here is a case of performance tuning in which a series of performance problems were identified and fixed, resulting in a compounded speedup of 43 times.
It was not essential to know precisely the size of each problem before fixing it, but to know its location.
A phenomenon of performance tuning is that fixing one problem, by reducing the time, magnifies the percentages of remaining problems, so they are easier to find.
As long as any problem is found and fixed, progress is made toward the goal of finding and fixing all the problems.
It is not essential to fix them in decreasing size order, but it is essential to pinpoint them.
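To make the magnification effect concrete, here is a small sketch with invented numbers (problems A, B, C and a block of necessary work are assumptions for illustration only). Removing any one problem shrinks the total, so every remaining problem takes a larger share.

```python
def percentages(costs):
    """Convert absolute times into percent of total run time."""
    total = sum(costs.values())
    return {name: 100 * t / total for name, t in costs.items()}

# Hypothetical run: seconds attributable to each problem, plus unavoidable work.
costs = {"A": 30.0, "B": 20.0, "C": 10.0, "necessary": 40.0}
before = percentages(costs)

# Fix B first (the order does not matter); its time drops out of the total.
remaining = {name: t for name, t in costs.items() if name != "B"}
after = percentages(remaining)

for name in remaining:
    print(f"{name}: {before[name]:.1f}% -> {after[name]:.1f}%")
```

With these numbers A grows from 30% to 37.5% of the run, so it stands out more in the next round of samples.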

On the subject of statistical accuracy of measurement, if a call point is on the stack some percent of time F (like 20%), and N (like 100) random-time samples are taken, then the number of samples that show the call point is a binomial distribution, with mean = NF = 20, standard deviation = sqrt(NF(1-F)) = sqrt(16) = 4. So the percent of samples that show it will be 20% +/- 4%.
So is that accurate? Not really, but has the problem been found? Precisely.
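The arithmetic above can be checked with a short sketch. `sample_spread` is a hypothetical helper name; it just evaluates the binomial mean and standard deviation for N samples and a true on-stack fraction F.

```python
import math

def sample_spread(n, f):
    """Mean and standard deviation of the number of samples showing a call point."""
    mean = n * f
    sd = math.sqrt(n * f * (1 - f))
    return mean, sd

mean, sd = sample_spread(100, 0.20)
print(f"expected hits: {mean:.0f} +/- {sd:.0f} out of 100 samples")
```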

In fact, the larger a problem is, in terms of percent, the fewer samples are needed to locate it. For example, if 3 samples are taken, and a call point shows up on 2 of them, it is highly likely to be very costly.
(Specifically, it follows a beta distribution. If you generate 4 uniform(0,1) random numbers and sort them, the distribution of the 3rd one is the distribution of cost for that call point.
Its mean is (2+1)/(3+2) = 0.6, so that is the expected savings, given those samples.)
INSERTED: And the speedup factor you get is governed by another distribution, BetaPrime, and its average is 4. So if you take 3 samples, see a problem on 2 of them, and eliminate that problem, on average you will make the program four times faster.
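Both figures can be reproduced exactly, assuming a uniform prior on the cost (which is what the sorted-uniforms description implies). Under that assumption, seeing the call point on k of n samples gives a Beta(k+1, n-k+1) distribution for the cost x, with mean (k+1)/(n+2), and the mean of the speedup factor 1/(1-x) works out to (n+1)/(n-k). The function names below are made up for this sketch.

```python
from fractions import Fraction

def expected_cost(hits, samples):
    """Mean of Beta(hits+1, samples-hits+1): expected fraction of time saved."""
    return Fraction(hits + 1, samples + 2)

def expected_speedup(hits, samples):
    """Mean of 1/(1-x) for x ~ Beta(hits+1, samples-hits+1)."""
    misses = samples - hits
    if misses < 1:
        # With no misses the mean of 1/(1-x) diverges.
        raise ValueError("need at least one sample that does not show the call point")
    return Fraction(samples + 1, misses)

print(expected_cost(2, 3))     # expected savings for 2 hits in 3 samples
print(expected_speedup(2, 3))  # average speedup factor for the same samples
```

The average speedup is larger than 1/(1-0.6) = 2.5 because the distribution of 1/(1-x) has a long right tail.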

It's high time we programmers blew the cobwebs out of our heads on the subject of profiling.

Nice answer, although I don't entirely agree with this: "any problem is found and fixed, progress is made toward the goal of finding and fixing all the problems". Not all problems can be solved; sometimes a performance bottleneck is an inherent attribute of the application, which means the other problems won't be magnified. That's certainly a big problem.
– nanda, Dec 9 '10 at 6:51

2

@nanda: That's why I said "and fixed". Basically, if there are problems A, B, C, D, and E, regardless of their relative sizes, any one you find and fix, regardless of order, magnifies the others. If there's one you can't fix, it doesn't, but you can still move on to the others.
– Mike Dunlavey, Dec 9 '10 at 11:36

If I read it correctly, the paper only talks about sample-based profiling. Many profilers also do instrumentation-based profiling. It's much slower and has some other problems, but it should not suffer from the biases the paper talks about.

The conclusion of the paper is that we
cannot really believe the result of
profilers. But then, what is the
alternative of using profilers.

No. The conclusion of the paper is that current profilers' measuring methods have specific defects. They propose a fix. The paper is quite recent. I'd expect profilers to implement this fix eventually. Until then, even a defective profiler is still much better than "feeling".

How about the second reason, the "observer effect"? Any profiler will suffer from this problem, and the only way to remove the observer effect is to remove the observer, i.e. not use any profiler.
– nanda, Dec 8 '10 at 14:10

1

@nanda: But clearly, not using any profiler because it affects the performance is like not eating pie because it may turn out to taste horrible. It is not possible to learn about hotspots without any observation (except perhaps in contrived examples that don't depend on user input), but if you try to optimize without knowing where it has significant effects, your odds are pretty bad by the 80-20 rule.
– delnan, Dec 8 '10 at 15:10

Unless you are building bleeding-edge applications that need every CPU cycle, I have found that profilers are a good way to find the 10% slowest parts of your code. As a developer, I would argue that should be all you really care about in nearly all cases.

+1 I agree that finding the worst parts of your app usually helps improve performance to acceptable levels. Most performance increases are not achieved by making the small methods faster, but by not calling them at all thanks to optimized high-level code.
– Daniel, Dec 8 '10 at 13:50

1

@Daniel: the paper linked to makes a convincing case that profilers often don't identify the slowest parts of the code correctly.
– Michael Borgwardt, Dec 8 '10 at 13:57

@Michael: My fault! I wanted to write that finding the worst parts of your app, even with a profiler, WILL show you MOST of the slowest parts. My point was that solving the problems is often not a matter of a few milliseconds, but can most often be achieved by not calling the (maybe wrongly measured) methods at all.
– Daniel, Dec 8 '10 at 14:03

1

@Michael: I tried to cover that with "Profilers are like any other tool and they have their quirks." In practice I have found them to be "good enough".
– Andrew White, Dec 8 '10 at 14:10

1

"profilers are a good way to find the 10% slowest parts of your code". Does that mean you got a 10% speedup? That says one of two things. 1) The code was nearly optimal to begin with, if 10% was all there was to get, or 2) there are other problems in the code that the profiler didn't find. I've seen people assume 1.
– Mike Dunlavey, Dec 28 '10 at 14:08

If you don't trust profilers, then you can go into paranoia mode by using aspect-oriented programming: wrap every method in your application and use a logger to log every method invocation.

Your application will really slow down, but at least you'll have a precise count of how many times each method is invoked. If you also want to see how long each method takes to execute, wrap every method with perf4j.

After dumping all these statistics to text files, use some tools to extract all necessary information and then visualize it. I'd guess this will give you a pretty good overview of how slow your application is in certain places.
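As a rough illustration of the wrap-every-method idea, here is a Python analogue using a decorator. It is not AspectJ or perf4j, and `traced`, `stats`, `parse`, and `load` are names invented for this sketch. It records a call count and inclusive wall-clock time per function.

```python
import functools
import time
from collections import defaultdict

# Per-function statistics: number of calls and total inclusive seconds.
stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})

def traced(func):
    """Wrap func so that every invocation is counted and timed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = stats[func.__qualname__]
            entry["calls"] += 1
            entry["seconds"] += time.perf_counter() - start
    return wrapper

@traced
def parse(line):
    return line.split(",")

@traced
def load(lines):
    return [parse(line) for line in lines]

load(["a,b", "c,d", "e,f"])
for name, entry in stats.items():
    print(f"{name}: {entry['calls']} calls, {entry['seconds']:.6f} s")
```

The call counts are exact, but the timings include the wrapper's own overhead, which is the kind of distortion the comments below object to.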

@Daniel: it still is one alternative approach to using a profiler if you don't trust one.
– darioo, Dec 8 '10 at 13:50

Yes, but if you don't trust profilers about the performance results (method calls don't count here because they are still reliably measured by profilers), then the approach of using AspectJ in combination with perf4j is even more misleading.
– Daniel, Dec 8 '10 at 13:58

Actually, you are better off profiling at the database level. Most enterprise databases come with the ability to show the top queries over a period of time. Start working on those queries until the top ones are down to 300 ms or less, and you will have made great progress. Profilers are useful for showing behavior of the heap and for identifying blocked threads, but I personally have never gotten much traction with the development teams on identifying hot methods or large objects.