This series of interviews spotlights Java Champions, individuals who have received special recognition from Java developers across industry, academia, Java User Groups (JUGs), and the larger community.

Bio: A Java Champion since September 2005, Kirk Pepperdine is a primary contributor to javaperformancetuning.com, which is widely regarded as the premier site for Java performance tuning information, and is the coauthor of Ant Developer's Handbook. He has been heavily involved in application performance since the beginning of his programming career and has tuned applications in a variety of languages: Cray Assembler, C, Smalltalk, and, since 1996, Java technology. He has also worked on building middleware for distributed applications.

Pepperdine began his career in biochemical engineering and has researched high-performance liquid chromatography for Ottawa University and the National Research Council of Canada. He worked with Cray supercomputers at the Canadian Department of Defense, as a consultant at Florida Power & Light, and has been a senior consultant with GemStone Systems. He is currently an independent consultant and an editor at TheServerSide.com.

JavaSunCom (JSC): What are the best shortcuts to eliminate guesswork in identifying performance bottlenecks?

Pepperdine: There are no all-purpose shortcuts. You need to be methodical if you want to find a "go fast" button in any particular situation. That's why I introduced the box, together with Dr. Heinz Kabutz, as a guide to explain how I approach tuning engagements.

Much of my work involves analyzing systems in an effort to solve critical performance problems. After working on a number of systems, I began to notice a trend. The development teams I assisted all had developers who were quite capable of understanding the underlying issue once it was presented to them, but they lacked a methodology that could help them identify the problem in the first place.

"Because we're trained to look at code, when something goes wrong, we look at code... Developers often fix things that have little or no impact on overall performance. I've seen teams literally waste months rewriting ugly code that had no impact on performance."

Kirk Pepperdine
Java Champion

Often, in programming as in life, we make educated guesses that are correct. But sometimes we guess wrong, depending on the quality of the information that we base our guess on.

Among developers, there's a behavior that is more damaging than guessing. Because we're trained to look at code, when something goes wrong, we look at code. And no matter how good our code is, we can always find something wrong or ugly that's begging to be fixed.

Finding ugly code will throw even the best developers off track -- because the code is ugly, they will guess that it's the source of the problem. So developers often fix things that have little or no impact on overall performance. I've seen teams literally waste months rewriting ugly code that had no impact on performance.

I'm not saying it's wrong to look into the code. Typically, that's exactly what we'll have to do. But we should delay looking at the code until we have a solid measurement or clue as to exactly which part of our application is responsible for the problem. It's amazing how, when developers are armed with a good measurement, they suddenly start looking past the ugly bits.
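A measurement doesn't have to be fancy to beat a hunch. As a minimal sketch (the method names here are invented for illustration), even crude timing around a suspect code path is a better starting point than intuition:

```java
// A crude but solid measurement around a suspect operation.
// Hypothetical sketch: all names here are invented for illustration.
public class Measure {
    public static void main(String[] args) {
        long start = System.nanoTime();
        suspectOperation();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("suspectOperation took " + elapsedMs + " ms");
    }

    // Stand-in for the code path under suspicion.
    static void suspectOperation() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In practice a profiler gives far better data, but even this level of evidence is often enough to steer a team past the ugly bits.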

I've attempted to summarize these observations and experiences in two instruments: the mantra "Measure, don't guess" and the box.

"Measure, don't guess" means don't do anything that hasn't been justified with a solid measurement that leads you directly to the problem. Unfortunately, getting a solid measurement isn't always so easy. Because of this, people work from hunches and assumptions, trying to find the smoking gun. This is where the box can help.

Figure 1. The Box

The box provides a guide for investigating problems. Feedback from those using it has been very positive. The box is a reminder that a system is much more than software. There is the hardware layer, which includes the operating system. On top of that, we have a VM for managed systems and then our Java application. The top layer in any system is the end users, or people, as the layer is labeled in the box.

Each layer is important to the overall performance of the system. If you change anything in any layer, you will change the performance profile. If you don't account for all of the layers, you run the risk of hiding performance bottlenecks or creating artificial ones. This misstep alone often results in teams wasting a lot of time.

Dumb Versus Complex Code

JSC: You wrote this in a 2006 Java Specialists' Newsletter article: "I have found that violating design principles or writing overly complex code is often the stumbling block to achieving good performance." Sun's Brian Goetz, in a similar vein, recommends that developers should write "dumb code," by which he means straightforward, clean code that follows the most obvious object-oriented principles, in order to get the best compiler optimization. He argues that clever, hacked-up, bit-banging code will get poorer results. Your thoughts?

Pepperdine: There are really two questions here: First, how does writing dumb code help with performance? And second, how does writing well-structured code help with performance? I'll answer the "dumb code" one first.

While we write code to run on a machine, the primary consumers of code are humans. Dumb code tends to be more readable and hence more understandable. If we can iron out the twists, then we have a better chance of avoiding the dumb mistakes that clever code may hide from us.

HotSpot, IBM's MMI, and the JIT are tools that work to optimize our code for us through dynamic profiling. Complex code tends to confuse these tools, so that they provide either suboptimal optimizations or no optimizations at all.

We can see this with a well-written microbenchmark. Most of the code in a well-written microbenchmark is there to keep the JIT from transforming the code under test into something that no longer measures the effect we're interested in. While a microbenchmark may be a pathological case, the same sorts of things can happen in our real application code.
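As a toy illustration (not from the article), here is the kind of defensive code a microbenchmark needs just to keep the JIT honest:

```java
// Toy sketch (not from the article): defensive code a microbenchmark
// needs so the JIT can't optimize the measured work away.
public class MicroBench {
    static long sink; // published result -- keeps the loop from becoming dead code

    public static void main(String[] args) {
        long start = System.nanoTime();
        long acc = 0;
        for (int i = 0; i < 10_000_000; i++) {
            acc += compute(i);
        }
        sink = acc; // consume the result; otherwise the JIT may drop the loop
        long elapsed = System.nanoTime() - start;
        System.out.println("elapsed ns: " + elapsed + ", checksum: " + sink);
    }

    static int compute(int i) {
        return i * 31;
    }
}
```

Without publishing the accumulated result to the `sink` field, the JIT is free to decide the loop produces nothing observable and eliminate it, leaving you timing an empty loop.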

Another reason to write dumb code is that most of the complexities are due to some optimization that everyone thinks is needed. In many cases, these optimizations are premature. While I'm all for performance planning, I'm dead set against premature optimizations. When is a plan a plan, and when is it premature? I guess it's a little like the difference between art and porn: You'll know it when you see it.

Which brings us to the second question -- how does well-structured code help performance? Most performance problems can only be solved by adding or altering code in the application. I've found that if the code is well structured and loosely coupled, the classes are cohesive, and the code uses delegation, I can avoid the whack-a-mole problem when I start to change code.

This problem can also be called shotgun refactoring -- if I make a change in one part of the application, other seemingly random parts of the application will break. And as I fix the breakage, I create a whole series of new breaks, and so on.

So how can we avoid this? First, follow the DRY -- Don't Repeat Yourself -- principle. Let's look at collections as an example. We would traditionally manage a query against a collection in Java by creating an iterator:
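The original code listing isn't reproduced here; a minimal sketch of the pattern being described, with invented names, might look like this:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical reconstruction -- the article's listing isn't shown here.
// The raw collection is handed to the client, which writes its own
// query loop with an iterator.
public class Client {
    public static List doStuff(Map customers) {
        List matches = new ArrayList();
        for (Iterator it = customers.values().iterator(); it.hasNext();) {
            String name = (String) it.next();
            if (name.startsWith("A")) { // the "query" this client happens to need
                matches.add(name);
            }
        }
        return matches;
    }
}
```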

Here's the trap. If another part of our application needs to doStuff(), it's likely that the code will get repeated, either as a cut-and-paste or simply rewritten. Either way, you've just violated DRY, which has numerous consequences. You've also neglected another design principle: Delegate, don't assume (responsibility).

By not delegating, you risk violating DRY. You certainly violate the principle of information hiding. Think of it this way: By doing a get, you've not only violated encapsulation but have tightly coupled your class to the callee. When you violate encapsulation by exporting state, you are forced to also export the application logic needed to manage that state and hence violate DRY. So you can see that this is wrong from many different perspectives.

Here's the big performance hit: Suppose the data structure that is being used is suboptimal, and suppose you recognize that it needs to be changed. If you've exported state and behavior using iterators or whatever, you've created the whack-a-mole problem. Take a look at what happens when we delegate:
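Again, the original listing isn't shown; a sketch of the delegating version, with invented names and, deliberately, no generics, might look like this:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the delegating wrapper described in the text.
// Customers owns the collection; clients never see the HashMap itself.
public class Customers {
    private final HashMap customers = new HashMap(); // keyed by customer id

    public void add(Customer customer) {
        customers.put(customer.getId(), customer);
    }

    public Customer findById(String id) {
        return (Customer) customers.get(id);
    }

    // A home for the queries we've anticipated -- the iteration logic
    // lives here once, so no client ever repeats it.
    public List withNameStartingWith(String prefix) {
        List matches = new ArrayList();
        for (Iterator it = customers.values().iterator(); it.hasNext();) {
            Customer c = (Customer) it.next();
            if (c.getName().startsWith(prefix)) {
                matches.add(c);
            }
        }
        return matches;
    }

    public int size() {
        return customers.size();
    }
}

class Customer {
    private final String id;
    private final String name;

    Customer(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() { return id; }
    String getName() { return name; }
}
```

Clients now ask Customers for what they need; the HashMap never escapes, so swapping it for a different data structure, or adding a secondary keyed collection, touches only this one class.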

Here we have more code. But this is often the case when you demonstrate something in a tiny contrived example. You only see the benefits in large code bases, which, of course, make for horrible examples.

That aside, we have a class that wraps our intended collection. Our new class is semantically in tune with our domain. How many problem domains contain the word HashMap, and how many domains contain Customers? So we've corrected our vocabulary.

Next, we've provided a home for our queries, at least the queries that we've anticipated. If we get other patterns of queries, it may be possible to add in a secondary collection that is keyed differently. Think of it as adding another index on a database table.

The beauty of it is that because I've encapsulated and delegated the calls, I'm free to make these changes unencumbered by what my clients may or may not know. In other words, no whack-a-mole. Furthermore, every client will realize the performance benefit of the optimization. So this is a win all the way round.

I intentionally didn't use generics. Using this development pattern, I don't need compile-time type checking on the collection(s), because the class API provides all the safety that's needed. I've always contended that it's a mistake to expose a raw collection in an API, which is why I've never felt that the primary use case for generics is justified. But that's another discussion.

So what happens if someone needs a query that we haven't provided? Closures might seem like a good solution, but they would have to be implemented very carefully or we could "closure" ourselves into a whack-a-mole problem.

The closure would have to access only those elements that could be classified as nonprimitive, by which I mean those elements that can function without knowledge of the underlying structure. This is in contrast to a primitive method, that is, a method that needs to understand the underlying data structure of the class. The point is to hide the implementation details from anything outside of our class.

To summarize, performance tuning often requires that I touch code, which is not much different from refactoring. All of the arguments that the Agile crowd put forth -- loose coupling, simple code, following good design patterns, having good unit testing in place, automated builds, and so on -- also apply to performance tuning.

Don't Trust Performance Tips

JSC: What advice do you have regarding Java technology performance tips?

Pepperdine: A while ago, I added to my presentations a slide that says, "Everything I'm about to tell you will be wrong." I say this because, as time marches on, tips grow stale and things need to be reassessed. Even scarier, some tips are just plain wrong to begin with.

Think of it this way: The prescription (drug) that your friend is taking may result in ill health if you were to take it. Same goes with performance tips.

Here's an example: A while back, I was asked to review a paper on what you could do in your code to help the garbage collector. The big tip in the article was that one should null out references as soon as you stop needing them. Littering your code with myObject = null statements just seemed wrong. I would contend that if you can null out an object with no ill effects, the object is improperly scoped or scoped too broadly. A better solution would be to narrow the scope of the object so that it goes away when the value is no longer needed. This is a case where clever code to help the garbage collector is really a code smell.
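As an illustrative sketch (the method and variable names are invented), compare the two styles:

```java
// Hypothetical sketch contrasting the two styles; method and
// variable names are invented for illustration.
public class Scoping {
    // The tip under review: null the reference to "help" the GC.
    static void withNulling() {
        byte[] buffer = new byte[1024 * 1024];
        consume(buffer);
        buffer = null; // a code smell -- a sign the variable is scoped too broadly
        doOtherWork();
    }

    // Better: narrow the scope so the reference dies on its own.
    static void withNarrowScope() {
        {
            byte[] buffer = new byte[1024 * 1024];
            consume(buffer);
        } // buffer is out of scope here; no explicit null needed
        doOtherWork();
    }

    static void consume(byte[] b) {
        b[0] = 1; // stand-in for real work on the buffer
    }

    static void doOtherWork() {
        // long-running work that should not keep the buffer reachable
    }
}
```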

Java Technology Pain Points

JSC: What are your greatest Java technology "pain points," and how do you cope with them?

Pepperdine: When I began working with Java programming after working a lot with Smalltalk, I was struck by how much more work I had to do because the Java language was strongly typed. I still don't like the fact that the language designers decided to directly expose primitive types. In Smalltalk, primitives were managed as immediate directs, which in effect says the value of the "object" is held in the pointer.

It seems to me that this decision tightly couples the implementation -- the language -- to the representation. The Java language ends up with two separate syntaxes that don't mix. This in turn motivated autoboxing, a solution to the syntax problem that I consider to be yet another code smell.

There is one pain point we can be thankful for: The Java language and platform lowered the bar on distributed programming. It allows average developers to create systems that would have taken a rock-star team of C/C++ developers to create in the past.

Still, there are certain realities that no language or platform can paper over. This makes the Java EE, Spring, and now grid frameworks both a blessing and a curse, as they encourage developers to swim with the sharks. This can be fun unless you are swimming with one that decides to attack.

Java EE, Spring, and grids can take a bite out of you in quite different ways, but the bites are all related to the same fundamentals: each has to deal with networks, with caching -- multiple copies of the same data that everyone expects to remain consistent -- and with locking of shared resources.

All of these issues still need to be dealt with, no matter which of these architectural styles you decide to use. It is remarkable what Java has enabled people to do, but it still can't save them from needing to understand and know how to deal with the fundamentals.

Performance Tuning Misconceptions

JSC: What are the major misconceptions you encounter in performance tuning?

Pepperdine: I can think of a few interesting situations where conventional wisdom fails. For example, I walked onto a client site at the beginning of a one-week engagement and was immediately invited into the CEO's office. The CEO wanted to get a sense of what I planned to do. He offered me a desk and complete access to the source, and you can imagine how surprised he looked when I said that I most likely wouldn't be looking at the code. Of course, he immediately asked what I planned to do instead.

This is misconception number one -- the assumption that I, or any consultant for that matter, can drop onto a work site for a week and start looking at code.

First, I'm not smart enough to read through an application's code base and find performance bottlenecks, no matter how much time you give me. And with only a week to work with, there are just too many details to absorb.

Second, the box tells us that performance problems exhibit themselves in a live, fully functional system, so it only makes sense to look for them in a live, fully functional system.

That said, I did eventually dive into the code, but by that time, I was completely focused on a very narrow aspect of the code. The misconception is quite obvious -- as developers, we are paid to look at, write, and manipulate code. Why would we ever do anything else?

Here's another example. In my performance tuning course, I use an exercise that began as a lesson on how to use a profiler. Jack Shirazi and I expected it to take about 30 minutes to complete.

The first time we presented the course, we were stunned that no one had identified the primary bottleneck after 30 minutes, yet everyone in the class was happily coding away. An entire group of people who had just been told how to profile, when faced with a problem and a deadline, abandoned all reason and just started hacking at the code.

We have presented the course countless times, and each time, developers continued to ignore profiling and just jump into the code. Jack and I scratched our heads for quite some time trying to figure out what was going on.

I finally got a clue when I taught the course to a group that included a tester. The tester had very little experience coding in Java. Yet he was the first person to complete the exercise within 30 minutes.

Since then, we've had a number of testers in the course, and overwhelmingly, it has been the testers who have been able to solve the problem in the requisite time span. They started by profiling and pretty much ignored the code. The conclusion seems obvious.

Database Interactions and Memory Problems

JSC: How do you approach problems in database interactions and memory management?

Pepperdine: Database and memory problems make up the vast majority of the performance issues that I face. Most of the database problems can be traced back to overutilization or poor structure. That includes too many table joins or lack of indexes -- simple things like that. Most database administrators (DBAs) are well equipped to recognize and squish these types of problems. I like working with DBAs because they are well aware of the importance of acting on a direct measurement.

The bigger problem is overutilization of the database by the application. If the team has been "dumb" enough -- and I say that in a very complimentary way -- to use the JPA and/or a mapping tool such as Hibernate or TopLink, then there's a chance to inject caching into the application. At the very least, code interacting with the database will most likely be fairly well structured. Sadly, I've run into a number of applications that I have deemed to be cache resistant.

What I mean by cache resistant is that there's no effective way that caching can be injected into the application. This can happen for several reasons. The first, less common nowadays with people using EJB, Spring, Guice, and so on, is the scattering of JDBC (Java Database Connectivity) logic throughout the application. To find and change all that logic is, to put it mildly, a major hassle. Second, I've seen this happen when applications use the database as an IPC channel or, worse, as a distributed locking mechanism. A third reason is putting key business logic into the database, and a fourth is denormalized data flowing into the database.

I'm not antidatabase. Databases are and will continue to be important pieces of technology. As much as database vendors may disagree with me, it is my opinion they've been abused. This wasn't a problem as long as clock speeds kept pace. But that hasn't been happening for the past couple of years. Having every thread of execution somehow end up in a database is just not going to work anymore unless you set the database up as a data grid. Which means, why not just code the application in SQL?

"I don't see memory databases as an option for most applications as long as the amount of data being kept expands faster than we are creating memory."

Kirk Pepperdine
Java Champion

I say this because we all know that users are demanding more and more performance, for systems to work harder, and do more faster. Sun, Intel, AMD, and others have continued to oblige by moving to multicore CPUs. All of a sudden, parallelism is much more important than it has been in the past, and now the abuses of the technology are coming home to roost.

I predict that we will roll back on how databases are used. I think they will be repurposed to secure corporate data and do less pure transactional work. As cheap and as plentiful as memory is, I don't see memory databases as an option for most applications as long as the amount of data being kept expands faster than we are creating memory. Of course, this prediction is like throwing a punch at a boxer and not expecting him to duck. DB vendors are ducking. But for now....

So getting back to diagnosis, the first step in recognizing overutilization is to count the number of interactions between the application and the database. That count should be tempered against the amount of work that you're doing. There are a number of tools out there that can give you this measurement; Glassbox and JAMon are a couple of open-source offerings, and most of the commercial vendors offer this capability. The quick fix is to add caching, as it's much easier to do that than to eliminate the excessive calls to the database.

Here's a takeaway: The cheapest call you can make is the one you didn't make.

The other trick is to bulk up the database interactions. Bulking up is a common optimization whenever you have to cross an expensive barrier. I lived in a building that had a large grocery store on the ground floor. While I could always quickly run down and get fresh food, it was more efficient to buy more, less often, and store the extra in the fridge or cupboard.

The same principle holds for databases or with connecting to any process that involves use of a network or other high-energy barrier.
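A toy simulation of the principle (invented names, not real database code) makes the arithmetic obvious:

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation (invented names, not real JDBC): each call to send()
// stands in for one trip across an expensive barrier such as a network.
public class Bulking {
    static int roundTrips = 0;

    static void send(List items) {
        roundTrips++; // pay the barrier cost once per call, however big the payload
    }

    public static void main(String[] args) {
        List work = new ArrayList();
        for (int i = 0; i < 100; i++) {
            work.add("row-" + i);
        }

        roundTrips = 0;
        for (int i = 0; i < work.size(); i++) { // naive: one trip per item
            List single = new ArrayList();
            single.add(work.get(i));
            send(single);
        }
        System.out.println("per-item trips: " + roundTrips); // prints 100

        roundTrips = 0;
        send(work); // bulked: the whole batch in one trip
        System.out.println("bulked trips: " + roundTrips); // prints 1
    }
}
```

With a real database, the same idea shows up as JDBC statement batching (PreparedStatement.addBatch() and executeBatch()) or as fetching rows in larger chunks.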

Solving Memory Problems

JSC: How do you go about solving memory problems?

Pepperdine: Memory leaks are the easiest to solve, and the best tool to use is the generations feature found in the NetBeans profiler. You can also use VisualVM, as it contains the same stuff.

Another problem is object loitering, and again, the NetBeans generations feature may be helpful in finding these cases. In the past, I've always let the system rest so I can see what times out. Now with generations, you can immediately see what may be causing you grief.

"Kudos to the NetBeans profiling team. They have altered the face of memory profiling. Prior to the generations feature in the profiler, finding leaks used to be quite tricky. Now it is mechanical."

Kirk Pepperdine
Java Champion

A third problem is high rates of object churn. I find that the allocation sites feature in the hprof profiler works wonders here. You can find the equivalent information in the NetBeans profiler. Kudos to the NetBeans profiling team. They have altered the face of memory profiling. Prior to the generations feature in the profiler, finding leaks used to be quite tricky. Now it is mechanical. One should be able to find most memory problems within minutes using these new tools.

If you're facing high rates of object churn, that will translate into very inefficient garbage-collection (GC) numbers. Sometimes the problem is simply that the JVM* doesn't have enough heap space. Monitoring GC activity will give you a hint that a general heap-sizing exercise could solve your problem.

HPJMeter is a free tool that will read Sun verbose GC logs and provide you with that measurement. Tagtraum also has a tool that will read the logs and calculate GC efficiency. I've often found that just playing with heap-space sizings can make a huge difference in GC efficiency, application response times, and throughput.

I guess I should plug another GC log viewer, gchisto. Tony Printezis, the creator, has allowed me to lend him a hand in adding functionality. Right now, it's pretty limited in functionality, but I expect that will change very shortly. The VisualVM team is very interested in integrating gchisto. I think it's a great fit.

Javaperformancetuning.com

JSC: Tell us about javaperformancetuning.com. How did the site come about?

Pepperdine: Jack Shirazi started the site after he wrote his very successful book, Java Performance Tuning, published by O'Reilly. The book is the result of our experiences working together.

In a cafe, we worked out a plan of action literally on the back of a napkin. That plan became an integral part of Jack's book. Jack is a lot more disciplined than I am. After he wrote the book, Tim (O'Reilly) encouraged him to start the web site, and shortly afterwards, I started contributing to the site.

The site is a comprehensive set of everything related to Java performance. The tips section offers a selection of all the performance tuning tips that appear on the Web. There's not enough time to vet the tips, so some may be questionable, but the sum total of knowledge on the site can't be matched.

Reducing Stress

JSC: What role does stress reduction play when you're attempting to correct an application that is performing poorly? Do you have any tips for helping people calm down and become more functional?

Pepperdine: Great question -- we mustn't forget that although we are working with machines, we ourselves are not machines. Emotions, feelings, stress, everything that makes us human also has a huge impact on how we perform.

A product company I worked for frequently sent me to their more difficult customers. I assumed it was some sort of punishment for something I'd done. I asked the sales guy what was up with that. He said, "Hey, I send you to noisy customers, and when you leave, they're no longer noisy."

What I'd discovered very early on is that the customers were often frustrated. They had tight deadlines, aggressive plans, and intense pressure to deliver -- and no matter what they did, things just didn't seem to be working. To make things worse, they "knew" the problem wasn't their fault. I let them rant and rave, and in the process of venting their frustration, they ended up explaining exactly what was wrong with the system. I could then parrot back what they had told me in the form of a diagnosis. They got to release all of their stress, and I ended up looking brilliant.

"I've used lots of release valves to calm stressed-out developers: I've rolled VMs through a cluster, neutered the HTTPSession object, used GC to slow down certain parts of the application to improve overall throughput, tuned memory to some very insane configuration so that the application would run for a working day, and on and on."

Kirk Pepperdine
Java Champion

I quickly learned that the more frustrated the customer, the more brilliant I was going to appear to be, which may sound a bit egotistical, but honestly, most of the teams I was helping were cleverer than I could ever hope to be. The only advantage I had was that I'd learned to turn frustration and stress into something useful.

Stress prevents us from learning. The first thing I look for in an SOS engagement is a pressure-relief valve, some hack or trick to reduce the level of stress in the room.

In one case, I put in a cron job that ran every 15 minutes, looking for any database transaction that had run for more than some threshold period of time. If it found one, it would kill that session. This is an ugly hack. The user whose transaction was killed certainly wasn't happy. But the hack stabilized the system enough so that most of the users who had customers in their faces got work done. It also took pressure off the developers.

Every time the system went south, which was quite often, the phones would start ringing -- think of a rat in a cage being buzzed at random times. You can imagine what a relief it was to get the phones to stop ringing. You could see the stress drain out of the room and the brains turn back on. It set up an environment where we could have a meaningful discussion about a permanent fix.

I've used lots of release valves to calm stressed-out developers: I've rolled VMs through clusters, neutered the HTTPSession object, used GC to slow down certain parts of the application to improve overall throughput, tuned memory to some very insane configuration so that the application would run for a working day, and on and on.

This is triage, and my only goal is to keep the patient alive to give developers time to start fixing what is broken.

_______ * As used on this web site, the terms "Java Virtual Machine" and "JVM" mean a virtual machine for the Java platform.