Memory Contention in J2EE Applications for Multiprocessor Platforms

Highly scalable J2EE applications in the enterprise depend on the parallel processing of threads on multi-processor platforms. The memory that these concurrently running threads require from the JVM heap has created performance and scalability bottlenecks in deployed J2EE applications. This article explores how threads are synchronized when they access memory within the JVM heap on a multi-processor platform, and what that means for a J2EE application.

Memory Requirement of J2EE Applications

J2EE applications deployed in the enterprise today serve thousands (if not millions) of customers who request data, and the processing of that data, every second. This volume of concurrent requests has created a demand for larger heap sizes, and hence more RAM. Now that more RAM has increased the heap size available to J2EE applications, and multiple processors are available to process threads simultaneously, the bottleneck has shifted to how these threads access memory and how much time they spend doing so.

Multi-Processor Platforms

As processors are added to address scalability, more and more threads are processed simultaneously within the system. These threads require memory for processing data, creating new Java objects, and other Java operations. Because threads running on different processors read from and write to memory at the same time, the system must maintain the coherency and integrity of the data. The threads therefore have to be synchronized so that they do not read incorrect data or overwrite each other's data. Figure 1 below contrasts how single-processor and multi-processor platforms access memory. On a single-processor platform, only one thread executes at any given moment, so this kind of synchronization is not needed. On a multi-processor platform, however, multiple threads execute simultaneously and their access to memory must be synchronized, which results in contention and bottlenecks.

The "Thread Local Area" attempts to solve this issue by pre-allocating a small amount of memory in the JVM heap for each Java thread, so that a thread can allocate from its own area without synchronizing with other threads. However, this space is often insufficient for J2EE applications with high memory requirements.
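The effect of object size on a thread-local area can be illustrated with a toy model. The sketch below is not how any particular JVM implements its allocator; the class names (SharedHeap, ThreadLocalArea, ThreadLocalAreaDemo) and the 2048-byte area size are assumptions made for illustration. It counts how often a single thread has to go back to the shared, contended heap.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of a heap shared by all threads (illustrative only, not real JVM code). */
final class SharedHeap {
    private final AtomicLong top = new AtomicLong();          // next free "address"
    private final AtomicLong sharedTrips = new AtomicLong();  // how often the shared path was used

    /** Contended path: every caller updates the same counter. */
    long allocate(long bytes) {
        sharedTrips.incrementAndGet();
        return top.getAndAdd(bytes);
    }

    long sharedTripCount() {
        return sharedTrips.get();
    }
}

/** Per-thread area: bumps a private pointer and returns to the shared heap only when needed. */
final class ThreadLocalArea {
    private final SharedHeap heap;
    private final long areaSize;
    private long next;   // next free address in the current area
    private long limit;  // end of the current area (0 until the first refill)

    ThreadLocalArea(SharedHeap heap, long areaSize) {
        this.heap = heap;
        this.areaSize = areaSize;
    }

    long allocate(long bytes) {
        if (bytes > areaSize) {
            // The object can never fit in an area, so it is allocated on the shared path.
            return heap.allocate(bytes);
        }
        if (next + bytes > limit) {
            // Current area is exhausted: reserve a fresh one from the shared heap.
            next = heap.allocate(areaSize);
            limit = next + areaSize;
        }
        long address = next;
        next += bytes;   // uncontended bump inside the thread's own area
        return address;
    }
}

final class ThreadLocalAreaDemo {
    /** Number of trips to the shared heap needed for `count` objects of `objectSize` bytes. */
    static long sharedTrips(long areaSize, long objectSize, int count) {
        SharedHeap heap = new SharedHeap();
        ThreadLocalArea area = new ThreadLocalArea(heap, areaSize);
        for (int i = 0; i < count; i++) {
            area.allocate(objectSize);
        }
        return heap.sharedTripCount();
    }

    public static void main(String[] args) {
        System.out.println("64-byte objects:   " + sharedTrips(2048, 64, 1000) + " shared trips");
        System.out.println("4096-byte objects: " + sharedTrips(2048, 4096, 1000) + " shared trips");
    }
}
```

In this model, small objects reach the shared heap only once per area, while objects larger than the area reach it on every allocation, which is the situation described above for threads with high memory requirements.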

Experiments and Observations

Experiments were done for a J2EE application with the code shown below on an eight-CPU platform.

For every test, the J2EE threads were asked to create a Java object of a particular memory size. A multi-user load was applied, and the response times, throughput, and resource utilization were observed. The JVM heap size was set to a large value so that the frequency of garbage collections was kept to a minimum. Even with increased loads, the ratio of the "GC Pause Time" to the total time the test ran was about 1:35, so its effect was considered small in this experiment. The results are shown in Figure 2.
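A test of this general shape can be sketched in plain Java. This is not the code used in the experiment, and it is a standalone program rather than a J2EE application; the class name AllocationLoad, the 10 KB object size, the thread counts, and the iteration count are all illustrative assumptions. Each worker thread repeatedly creates an object of a fixed size, and the program reports allocations per second as the number of threads grows.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical allocation load generator (a sketch, not the article's test code). */
final class AllocationLoad {

    /**
     * Runs `threads` workers that each allocate `iterations` byte arrays of `objectBytes`.
     * Returns { completed allocations, elapsed nanoseconds }.
     */
    static long[] run(int threads, int objectBytes, int iterations) {
        if (threads < 1 || objectBytes < 1 || iterations < 0) {
            throw new IllegalArgumentException("threads and objectBytes must be positive");
        }
        LongAdder completed = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < iterations; i++) {
                    byte[] payload = new byte[objectBytes];
                    payload[0] = (byte) i;               // touch the object so it is actually used
                    completed.add(payload[0] == (byte) i ? 1 : 0);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("load run did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load run was interrupted", e);
        }
        return new long[] { completed.sum(), System.nanoTime() - start };
    }

    public static void main(String[] args) {
        int objectBytes = 10 * 1024;  // assumed object size
        for (int threads = 1; threads <= 8; threads *= 2) {
            long[] result = run(threads, objectBytes, 50_000);
            double perSecond = result[0] / (result[1] / 1e9);
            System.out.printf("threads=%d allocations/s=%.0f%n", threads, perSecond);
        }
    }
}
```

Numbers from a sketch like this depend heavily on the JVM, its garbage collector settings, and the hardware, so they should not be compared directly with the figures reported here.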

It was observed that as the load was increased for a Java object of a particular memory size, throughput did not increase at the same rate as server CPU utilization. Also, the increase in response time was not linear, but exponential. These experiments show that the high memory requirements of these J2EE threads create contention in the system, which acts as a bottleneck in scaling J2EE applications. It was also observed that the time these threads take to access memory and create objects is added to processor utilization. This can be seen in the increase in service demand with increasing load, shown in the chart in Figure 3 for the creation of a Java object of a particular memory size.

Figure 3. Increase in service demand versus load for Java objects of different memory sizes
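Service demand can be derived from measured utilization and throughput using the Service Demand Law (demand = utilization / throughput). The sketch below applies it to a multi-CPU server by multiplying the average utilization by the number of CPUs. The class name ServiceDemand and the sample measurements are hypothetical and are not taken from the experiment.

```java
/** Service Demand Law applied to a multi-CPU server (illustrative figures only). */
final class ServiceDemand {

    /**
     * CPU seconds consumed per request.
     *
     * @param utilization         average CPU utilization across all CPUs, from 0.0 to 1.0
     * @param cpus                number of CPUs in the server
     * @param throughputPerSecond completed requests per second
     */
    static double cpuSecondsPerRequest(double utilization, int cpus, double throughputPerSecond) {
        if (throughputPerSecond <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return utilization * cpus / throughputPerSecond;
    }

    public static void main(String[] args) {
        // Hypothetical measurements on an eight-CPU server at two load levels.
        double lowLoad = cpuSecondsPerRequest(0.20, 8, 100.0);
        double highLoad = cpuSecondsPerRequest(0.60, 8, 200.0);
        System.out.printf("low load:  %.1f ms of CPU per request%n", lowLoad * 1000);
        System.out.printf("high load: %.1f ms of CPU per request%n", highLoad * 1000);
    }
}
```

In this made-up example, utilization triples while throughput only doubles, so the CPU time charged to each request rises. A service demand that grows with load in this way is the pattern the experiment reports.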

Scale Out

Because the memory contention is inherent in the system architecture, the best way to alleviate it is to add one more system box to the deployment configuration (scale out) with load balancing, as seen in the figure below. Scaling up will not help: adding more processors to the system only increases the memory contention by allowing more threads in your application to run in parallel, so it improves neither throughput nor response time.

Figure 4. Scale out of application server machine
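The load-balancing part of a scale-out deployment can be as simple as rotating requests across the available boxes. The following is a minimal round-robin sketch; the class name RoundRobinBalancer and the server names are hypothetical, and a production deployment would normally rely on a dedicated load balancer rather than application code like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal round-robin selection across application server boxes (illustrative only). */
final class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = new ArrayList<>(servers);  // defensive copy
    }

    /** Returns the server that should handle the next request. */
    String pick() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}
```

With two boxes, successive calls to pick() alternate between them, so each JVM sees roughly half of the threads that would otherwise contend for a single heap.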

Infrastructure Sizing

The time threads spend accessing memory is added to the processor time, and hence to processor utilization. With more threads on the multiple processors waiting to access memory, the processors are not utilized effectively. The usual response of adding more processors to increase output will not help in this case, and will lead to ineffective capacity planning for J2EE applications. Quantitative or simulation models for predicting performance and infrastructure need to take this into account.

The Memory Tuning of Your Applications

One of the most important steps in reducing this problem is to tune your J2EE application to eliminate unnecessary and loitering objects. A memory debugger such as JProbe can help you identify memory bottlenecks and optimize memory use. Memory debuggers provide a memory allocation map of the application, indicating where objects are being created and how much memory is being used. They also point out loitering objects, which contribute to memory leaks in the application.

Object reuse should also be considered as a memory optimization strategy. Caching frameworks such as Apache JCS can help here: they store the objects specified by the caching policy of your application for a particular period of time, and allow those objects to be reused.
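The idea of object reuse can be shown without any framework. The sketch below is a plain-Java stand-in, not the Apache JCS API; the class name BufferPool and its sizes are assumptions. It keeps a bounded set of idle buffers and hands them out again instead of allocating a new object for every request.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Bounded pool of reusable byte buffers (a stand-in for a caching framework). */
final class BufferPool {
    private final BlockingQueue<byte[]> idle;
    private final int bufferBytes;

    BufferPool(int capacity, int bufferBytes) {
        this.idle = new ArrayBlockingQueue<>(capacity);
        this.bufferBytes = bufferBytes;
    }

    /** Reuses an idle buffer if one is available, otherwise allocates a new one. */
    byte[] acquire() {
        byte[] buffer = idle.poll();
        return buffer != null ? buffer : new byte[bufferBytes];
    }

    /** Returns a buffer to the pool; it is simply dropped if the pool is already full. */
    void release(byte[] buffer) {
        if (buffer.length != bufferBytes) {
            return;                      // not one of ours, ignore it
        }
        Arrays.fill(buffer, (byte) 0);   // clear old data before the next user sees it
        idle.offer(buffer);
    }
}
```

A caller would pair each acquire() with a release() in a finally block. Pooling replaces allocation with contention on the pool's own queue, so it tends to pay off mainly for large or expensive objects, and it is worth measuring before adopting.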

Conclusion

The need to maintain the integrity of data in a multi-processor environment forces threads to synchronize their processing, which contributes to the bottleneck in J2EE applications with high memory requirements.
Some of the possible solutions to the memory contention include:

Scale out by adding one or more system boxes to the deployment configuration.