Take a look at some systems that enable you to trace the execution of applications and work out what they are doing without having to make any modifications to the source code, and even without having to stop and restart the application. See how with tracing alone, you can find and diagnose problems with just a few commands.

SUN's free and open DTrace is the probing instrument of choice. It is ported to Apple Mac OS X and FreeBSD and QNX. The unique thing about DTrace, is that it sees EVERYTHING occuring in the system. EVERYTHING. No other is not even close to do that.

"I looked at one customer's application that was absolutetly dependant of getting the best performance possible. Many people for many years had looked at the app using traditional tools. There was one particular function that was very "hot" - meaning that it was called several million times per second. Of course, everyone knew that being able to inline this function would help, but it was so complex that the compilers would refuse to inline.

Using DTrace, I instrumented every single assembly instruction in the function. What we found is that 5492 times to 1, there was a short circuit code path that was taken. We created a version of the function that had the short circuit case and then called the "real" function for other cases. This was completely inlinable and resulted in a 47 per cent performance gain.

Certainly, one could argue that if you used a debugger or analyzer you may have been able to come to the same conclusion in time. But who would want to sit and step through a function instruction by inctruction 5493 times? With DTrace, this took literally a ten second DTrace invocation, 2 minutes to craft the test case function, and 3 minutes to test. So in slightly over 5 minutes we had a 47 percent increase in performance.

Another case was one in which we were able to observe a high cross call rate as the result of running a particular application. Cross calls are essentially one CPU asking another to do something. They may or may not be an issue, but previously in was next to impossible (okay, really impossible) to determine their effecs with anything other than a debug version of the kernel. Being able to correlate the cross call directly to application was even more complex. If you had a room full of kernel engineers, each would have theories and plausible explanations, but no hard quantifiable data on what to do and what the impact to performance would be.

Enter DTrace.... With an exceedingly simple command line invocation of DTrace, we were able to quickly identify the line of code, the reason for the cross calls, and the impact on performance. The basic issue was that a very small region of a file was being mmap(2)'d, modified, msync(3C)'d, and then munmap(2)'d. This was basically being done to guarantee that the modified regoin was sync'd to disk.

The munmap(2) was the reason for the cross call and the application could get the same semantics by merely opening the file with O_DSYNC. This change was made and performance increased by almost double (not all from the cross calls, but they were the "footprint" that lead us down this path). So we went from an observable anomaly that previously had no means of analysis to a cause and remediation in less that 10 minutes."