There are a lot of knobs and switches in the WebSphere Portal and Application Server software stack that you can play with when tuning the environment. We provide guidance and recommendations on what to pay attention to in our Performance and Tuning Guide, but the fact is that which values work best for you depends highly on the type of applications you run in Portal and the size and activity of your user population. The only way I know of to accurately arrive at the correct set of tuning values is to follow this basic testing methodology:

Install and configure WebSphere Portal on a single system with some simple default content and perform the requisite tuning as prescribed by the Performance and Tuning Guide.

Take a baseline performance measurement of this simple site, making adjustments to the infrastructure and software tuning until desirable results are obtained. Also get an idea of how many users the baseline system can sustain before it runs out of resources (more on this later).

Start adding content and repeat the analysis, adjusting the tuning as necessary. With the baseline in hand, if things go wrong as changes are made, you at least have an idea of which change may have instigated the problem.

Performance and capacity testing is a long, highly iterative process. It is also resource intensive, as it requires dedicated systems for days on end, as well as enough people to manage the test environment, observe test results, and tune the system. Oftentimes, this important process is the first thing cut from a deployment plan in jeopardy. In my experience, you either pay for this time up front, before you go live, or you pay for it after you go live. It must be done, and it is a lot more expensive the later you do it.

But all this being said, what should your goal be? Obviously, you go into this process with certain metrics in mind. For instance: I want to be able to handle 400 concurrent users with no worse than a 5-second response time, and maybe that is only during login. That's fine, and simple to measure, but there will be days when, for some reason, you have a lot more than 400 concurrent users, or you have to take systems down and the remaining systems must carry the load. It isn't enough to know whether your environment can handle the typical load; you need to know whether it can handle the atypical, or worst-case, scenario as well. You may not know what the worst-case scenario is up front. But what you do need to know is the maximum capacity of your portal environment, so that as usage approaches that number, you will know you are in trouble and need to add capacity.
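As a quick sanity check on targets like these, Little's Law (my addition here, not something from the tuning guide) relates concurrency, throughput, and response time: concurrent users ≈ requests per second × response time. A minimal sketch using the hypothetical numbers above:

```java
public class CapacityEstimate {
    public static void main(String[] args) {
        // Hypothetical targets: 400 concurrent users, 5-second worst-case response time.
        double concurrentUsers = 400.0;
        double responseTimeSec = 5.0;

        // Little's Law: L = lambda * W, so lambda = L / W.
        // The throughput the system must sustain to hold that concurrency:
        double requestsPerSec = concurrentUsers / responseTimeSec;
        System.out.println("Required throughput: " + requestsPerSec + " req/sec"); // 80.0

        // If response time degrades to 10 seconds under load, the same arrival
        // rate implies twice as many requests in flight:
        double degradedConcurrency = requestsPerSec * 10.0;
        System.out.println("Concurrency at 10s response time: " + degradedConcurrency); // 800.0
    }
}
```

The point of the arithmetic: if response time balloons under stress, concurrency balloons with it, which is exactly why the worst-case scenario needs its own test.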

To understand what your environment's capacity is, you need to drive utilization of your portal environment to the point of CPU exhaustion. Not memory exhaustion, or DB connection exhaustion - CPU exhaustion. The reason is that as CPU utilization approaches 100%, the system slows to the point of being nearly unresponsive, but it doesn't fail. Under this condition, the Web servers managing the load across a portal cluster will mark such a system as down and route traffic elsewhere, giving the server a chance to recover, which it should once the outstanding requests have been processed. If you run out of some other resource, like memory or DB connections, before running out of CPU, then things really start to fail, and you won't recover from that.
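The mark-down-and-retry behavior described here is controlled by the web server plug-in. As an illustration only, the relevant knobs live in plugin-cfg.xml; the values below are placeholders rather than recommendations, and the exact attributes available depend on your plug-in level:

```xml
<ServerCluster Name="PortalCluster" LoadBalance="Round Robin" RetryInterval="60">
  <!-- ConnectTimeout: seconds to wait for a TCP connection before the plug-in
       marks this server down; ServerIOTimeout: seconds to wait for a response. -->
  <Server Name="PortalNode1" ConnectTimeout="5" ServerIOTimeout="120">
    <Transport Hostname="portal1.example.com" Port="9080" Protocol="http"/>
  </Server>
</ServerCluster>
```

RetryInterval is the recovery window: how long the plug-in waits before sending traffic back to a server it has marked down.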

So, as you perform your maximum capacity tests, give the server instances a large enough Java heap size, and large enough request and DB connection pool sizes, to let enough traffic through to drive CPU to 100%. If you can't reach 100% CPU before running out of some other resource, then it is time to scale vertically (creating vertical cluster members on the same physical system) or reallocate processors to other systems. If you can configure the system to meet this goal, then you have found your "sweet spot".
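For the test configuration itself, the heap is set through the server's generic JVM arguments. A sketch of what I might start a capacity test with on the IBM JDK (the sizes are purely illustrative, and -Xverbosegclog is IBM-JDK-specific):

```
-Xms1792m -Xmx1792m
-verbose:gc
-Xverbosegclog:/tmp/portal_gc.log
```

Pinning -Xms to -Xmx avoids heap expansion and contraction during the run, so the measurements reflect steady state; the web container thread pool and data source maximum connections are raised separately in the administrative console.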

When asked whether WebSphere Portal supports 64-bit, what really needs to be asked is whether WebSphere Portal supports running in a 64-bit JVM. We have supported running in a 32-bit JVM on 64-bit hardware for quite some time. We introduced our first 64-bit JVM support on the iSeries platform with WP 5.0.2. We added 64-bit JVM support on zLinux with WP 6.0.0.1 (to overcome the 31-bit address space limitation and its small maximum heap sizes) and most recently added 64-bit JVM support with HP-UX on HP Integrity Servers with WP 6.0.1. We will be adding more and more platforms, especially with new releases, but I typically ask in return whether you are sure you really need 64-bit JVM support.

Sure, with 64-bit you can have very large heap sizes (many gigabytes instead of the maximum of 2GB on most UNIX systems), and thus allow a single application server instance to become CPU saturated before the heap is consumed, but that isn't necessarily a good thing. The larger the heap can grow, the longer garbage collections can take, especially full GCs, which pause the JVM while the heap is scanned and defragmented in search of the maximum amount of garbage to collect. The larger the heap, the greater the potential fragmentation, and thus the longer the full GC cycles. And with the sheer number of objects that are created and destroyed every second in a portal, heap fragmentation can happen more often than you might think. These pauses can amount to a poor user experience.
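To put numbers on those pauses, the standard java.lang.management API reports cumulative collection counts and times from inside any JVM (a generic sketch; on the IBM JDK, verbose:gc output is the more detailed source, and the collector bean names differ by vendor):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcCost {
    public static void main(String[] args) {
        // Churn some short-lived objects so at least one collection is likely.
        for (int i = 0; i < 200_000; i++) {
            byte[] garbage = new byte[1024];
        }

        List<GarbageCollectorMXBean> gcs = ManagementFactory.getGarbageCollectorMXBeans();
        long collections = 0, pauseMillis = 0;
        for (GarbageCollectorMXBean gc : gcs) {
            if (gc.getCollectionCount() > 0) {      // -1 means undefined for this collector
                collections += gc.getCollectionCount();
                pauseMillis += gc.getCollectionTime(); // cumulative, in milliseconds
            }
        }
        System.out.println(collections + " collections, " + pauseMillis + " ms total pause");
    }
}
```

Sampling these counters before and after a load-test interval gives a rough measure of how much wall-clock time the heap size you chose is costing you in GC.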

Personally, I haven't seen any specific data to suggest what the perfect maximum heap size is for a portal, and I'm certain that number will vary by implementation, but based on conversations I've had with performance specialists, I suspect it is somewhere in the 1.75GB to 2GB range.

WebSphere Portal's largest value proposition is that it aggregates content, and because of this, it also aggregates the lines of business, partners, people, technologies, and various middleware stacks that contribute to the overall IT landscape. The care and feeding of a WebSphere Portal deployment is a complicated process. I quite often get called in to speak to customers and at conferences on the subject of portal deployment and operations, and I find myself addressing the same questions over and over.

This blog of mine is an attempt to get some of this information recorded and centralized. Much of it, hopefully, will eventually find its way into product documentation, Redbooks, and white papers. But for now, I will vet it out here.