A couple of days back, I was asked to look into a web server issue at one of our partner's sites. According to them, they packaged and deployed the web application per the instructions in the Sun Java System Web Server (aka iPlanet Web Server) documentation -- yet they couldn't access the application from a web browser. They gave me a clue: they had noticed the following error during web server startup:

[23/Mar/2006:04:07:29] failure (11038): WEB4220: The web application [/mywebapp] is unavailable because of errors during startup. Please check the logs for errors

The first thing I did was check the actual log file (<webserver_root>/<server_instance>/logs/errors) for a more detailed error message, and I found the one I was looking for:

The real problem was the order of the welcome-file-list and mime-mapping elements in web.xml: the mime-mapping element must appear before welcome-file-list. Swapping the two elements fixed the issue, and the web application is now accessible through a web browser.
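For illustration, here is a minimal web.xml fragment with the two elements in the required order; the extension, MIME type, and welcome file shown are hypothetical:

    <web-app>
        ...
        <!-- mime-mapping must precede welcome-file-list -->
        <mime-mapping>
            <extension>pdf</extension>
            <mime-type>application/pdf</mime-type>
        </mime-mapping>
        <welcome-file-list>
            <welcome-file>index.jsp</welcome-file>
        </welcome-file-list>
        ...
    </web-app>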

The key is knowing the required order of all the elements we define in web.xml. Every web.xml must conform to the XML DTD for a Servlet web-app (war) module in order for the web server to load the application properly. The most recently published DTD is available on Sun Microsystems' web site at: XML DTD for a Servlet 2.3 web-app (war) module. The following piece of the DTD (the web-app element declaration, reproduced here from web-app_2_3.dtd) is what helped me resolve the issue:
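    <!ELEMENT web-app (icon?, display-name?, description?, distributable?,
        context-param*, filter*, filter-mapping*, listener*, servlet*,
        servlet-mapping*, session-config?, mime-mapping*, welcome-file-list?,
        error-page*, taglib*, resource-env-ref*, resource-ref*,
        security-constraint*, login-config?, security-role*, env-entry*,
        ejb-ref*, ejb-local-ref*)>

Note that mime-mapping* is declared ahead of welcome-file-list? -- exactly the ordering the web server enforces.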

In an effort to clean up outdated content, the SDN/Sun Studio team had all of the published articles reviewed one more time. Since two of the articles are under my name, they forwarded the new feedback and asked me to make the changes as I saw fit. The updated content is live now, and is available at the following URLs:

It is a known fact that multi-threaded applications do not scale well with the standard memory allocator, because the heap is a bottleneck. When multiple threads simultaneously allocate or deallocate memory, the allocator serializes them. As more threads are added, more threads end up waiting, the wait times grow longer, and execution becomes increasingly slower. Because of this behavior, programs that make intensive use of the allocator can actually slow down as the number of processors increases. In short, the standard malloc works well in single-threaded applications, but poses serious scalability issues for multi-threaded applications running on multi-processor (SMP) servers.
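To see the effect first-hand, here is a minimal sketch of a malloc-intensive microbenchmark; the thread count, iteration count, and allocation size are arbitrary choices, not from any particular workload:

    /* malloc_bench.c -- hammer the allocator from several threads */
    #include <pthread.h>
    #include <stdlib.h>

    #define NTHREADS 8          /* arbitrary */
    #define NITERS   1000000    /* allocations per thread */

    static void *worker(void *arg)
    {
        int i;
        void *p;

        for (i = 0; i < NITERS; i++) {
            p = malloc(64);     /* small, fixed-size request */
            if (p == NULL)
                abort();
            free(p);
        }
        return (NULL);
    }

    int main(void)
    {
        pthread_t tids[NTHREADS];
        int i;

        for (i = 0; i < NTHREADS; i++)
            pthread_create(&tids[i], NULL, worker, NULL);
        for (i = 0; i < NTHREADS; i++)
            pthread_join(tids[i], NULL);
        return (0);
    }

Timing this program (compiled with something like "cc -mt malloc_bench.c -o malloc_bench -lpthread") with the default allocator, and again with libumem preloaded as shown further below, makes the serialization visible.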

Solution: libumem, a userland slab allocator

Sun started shipping libumem, a userland slab (memory) allocator, with Solaris 9 Update 3. libumem provides faster and more efficient memory allocation through object caching, a strategy in which memory that is frequently allocated and freed is cached, so the overhead of recreating the same data structures is reduced considerably. In addition, per-CPU sets of caches (called magazines) improve libumem's scalability by allowing a far less contentious locking scheme when memory is requested from the system. Thanks to this object caching strategy, the application runs faster, with lower lock contention among multiple threads.
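Using libumem requires no code changes; link with it at build time, or preload it at run time. For example (the binary name myapp is hypothetical):

    cc -o myapp myapp.c -lumem              # link with libumem at build time

    LD_PRELOAD=libumem.so.1 ./myapp         # or preload it into an existing binary
    LD_PRELOAD_64=libumem.so.1 ./myapp64    # 64-bit counterpart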

libumem satisfies requests from caches of fixed buffer sizes. That means if a request is made to allocate 20 bytes, libumem rounds it up to the nearest cache size (24 bytes) and returns a pointer to a buffer from that cache; the slabs backing those caches are obtained from the VM system in page-sized chunks (the default page size is 8K on Solaris/SPARC). As these rounded-up requests add up, internal fragmentation results: the extra memory that the application did not request, but libumem allocated, is wasted. libumem also uses 8 bytes of every buffer it creates to keep metadata about that buffer. For the reasons outlined in this paragraph, there will be a slight increase in the per-process memory footprint.
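To see where that memory actually goes, the libumem mdb module can dump per-cache statistics from a live process; a quick sketch (the pid is hypothetical):

    mdb -p 1234          # attach to a process linked with, or preloading, libumem
    > ::umastat          # per-cache buffer sizes, buffers in use, memory consumed
    > ::quit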

Quick tip: Run "truss -c -p <pid>", and stop the data collection with Ctrl-C (^c) after some time, say 60 seconds. If you see a large number of system calls such as lwp_park, lwp_unpark, and lwp_mutex_timedlock, it is an indication that the application is suffering from lock contention, and hence may not scale well. Consider linking your application with the libumem library, or preloading libumem at run time, for better scalability.
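For example, to collect the 60-second summary non-interactively (the process name myapp is hypothetical):

    truss -c -p "$(pgrep -x myapp)" &    # count system calls in the background
    sleep 60
    kill -INT $!                         # same effect as ^c: truss prints the summary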

The following DTrace script really helped me nail down a lock contention issue that I was looking into at work. This simple script records the call stack, up to 60 frames deep, whenever a call is made to the lwp_*() interfaces, explicitly or implicitly. At the end (i.e., when we press ^C), it dumps all of the stack traces along with the number of times each call stack was taken. It also prints the duration (in seconds) over which the data was collected, and the IDs of the active LWPs.
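Here is a minimal sketch of such a script, assuming the target process is named "myapp"; the script I actually used may differ in detail:

    #!/usr/sbin/dtrace -s

    /* record the user stack, up to 60 frames, on every lwp_* system call */

    dtrace:::BEGIN
    {
        start = timestamp;
    }

    syscall::lwp_*:entry
    /execname == "myapp"/
    {
        @lwps[tid] = count();            /* active LWP IDs */
        @stacks[ustack(60)] = count();   /* call stacks, with hit counts */
    }

    dtrace:::END
    {
        printf("data collected for %d seconds\n",
            (timestamp - start) / 1000000000);
        printa("LWP %d: %@d lwp_* calls\n", @lwps);
        printa(@stacks);
    }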

This script can easily be modified to obtain the call stacks for any kind of function call, by replacing "lwp_*" with the actual function name; likewise, the name compared against "execname" has to be replaced with the actual process name.

Solaris Zones: Resource Controls - CPU explains the steps to control CPU resources on any server running Solaris 10 or later. It is also possible to restrict the physical memory usage of a process, or of all processes owned by a user. This can be done either in a local zone or in the global zone on Solaris 10 and later. Note that physical memory capping itself is available as far back as Solaris 9.

The goal of this blog entry is to show the simple steps for restricting the total physical memory utilization of all processes owned by a user called giri to 2G (total physical memory installed: 8G), in a local zone called v1280appserv.

To achieve the physical memory cap, we start by creating a project for the user giri. A project is a grouping of processes that are subject to a set of constraints. To define the physical memory cap for a project, add the rcap.max-rss attribute to the newly created project; rcap.max-rss indicates the total amount of physical memory, in bytes, available to all processes in the project. Project creation and setting the physical memory cap can be combined into one simple step, as shown below:
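A sketch of that combined step, followed by enabling the resource capping daemon (the project name appservproj is taken from the statistics below; the cap can also be given as a raw byte count instead of the scaled value 2GB):

    # create project appservproj, make giri a member, and cap its total RSS at 2G
    projadd -U giri -K "rcap.max-rss=2GB" appservproj

    # enable the resource capping daemon, rcapd
    rcapadm -E

    # run a workload under the project (alternative: make it giri's default project)
    newtask -p appservproj <command>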

That's about it. When the resident set size (RSS) of the collection of processes owned by user giri exceeds its cap, rcapd takes action and reduces the total RSS of the collection to 2G, paging the excess memory out to the swap device. The following run-time statistics indicate that the physical memory cap is effective -- observe the total RSS size under project appservproj, and also the paging activity in the vmstat output.
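To collect those statistics, something like the following works (the sampling intervals are arbitrary):

    rcapstat 5      # per-project RSS vs. cap, and rcapd paging activity
    prstat -J 5     # per-project process summary, including total RSS
    vmstat 5        # system-wide paging activity (sr, pi, po columns)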