Taxiing to optimization

I had a driver for a couple of years. It sounds more glamorous than it really
is, as Leonard, my driver, was actually a taxi driver I had made a deal with.
Leonard was courteous, reliable, and often knew my schedule better than I did. He
was also curious as to where I was off to now and what I was up to. The question
is, how does one explain to a layperson what performance tuning is all about? But
then I quickly realized that I was talking to a master of optimization. Here was a
person who understood the quickest way to get from point A to point B, and who knew
how to adjust that path based on expected traffic. So I came up with an explanation
built on his ability to optimize. The world is full of opportunities
for learning. And now let's see what we can learn from this month's roundup.

One poster wondered why GC kicked in repeatedly at 32M when he had specified -Xms32m -Xmx64m.
Was the -Xmx parameter being ignored? The sole answer suggested that the JVM was tuned
to believe that GC is cheaper than expanding the heap. So once the 32M initial heap is
filled, it will always try reclaiming space before asking the OS for more memory. In this
particular application the reclaims were always successful, allowing the JVM to stay at 32M.
Neither the JVM vendor nor the version was mentioned.
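You can watch this behaviour for yourself. Here is a minimal sketch (class name
hypothetical) that uses java.lang.Runtime to report the committed heap against
the -Xmx ceiling:

```java
public class HeapWatcher {
    static long mb(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Run with, say, -Xms32m -Xmx64m and call this periodically.
        System.out.println("committed heap: " + mb(rt.totalMemory()) + "M");
        System.out.println("-Xmx ceiling:   " + mb(rt.maxMemory()) + "M");
        // If the committed heap stays at the -Xms size while the app runs,
        // the JVM is choosing GC over heap expansion, as described above.
    }
}
```

If totalMemory() climbs towards maxMemory() under load, GC alone wasn't enough
and the JVM did grow the heap after all.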

Another question asked how to configure the JVM for production. The answer: use
-server, and set the -Xms and -Xmx heap parameters. There was no answer to the
follow-up question of how to choose the heap sizes. My answer is to check
http://www.JavaPerformanceTuning.com/tips/
(or the
2nd edition of Jack's book, which covers the heap tuning methodology).

The "Java vs C++ speed" troll post came up yet again. This time, not much heat
was generated, as all the answers sensibly said Java was faster for some things,
C++ faster for others. One poster suggested Java wasn't fast enough to
write games or real-time systems, which will come as a surprise to the gamers
over at JavaGaming.org, and to lots of embedded systems writers. (They should
actually be pleased; it's always nice to be told you are succeeding at
achieving the impossible.) The best answer:
'The real question should be "Is Java fast enough for the job at hand".'

A fascinating discussion about replacing multiple threads with NIO Select for
a multiplayer networked game server cropped up. This is a live game, with many
(over 100) players. The programmer found that despite the OS having free resources,
the JVM could not exceed about 1,000 threads on his Linux server (different JVMs
had different limits). He had correctly reset ulimit to allow unlimited threads
for the user, and had recompiled the kernel to allow 4096 open files per process
(up from the default of 1024). None of this seemed to help. The other posters
suggested switching to NIO, which he did, and then the thread limitation was no longer
an issue, as the NIO based server used only a few threads. However, a different
issue now arose. The NIO select call was taking 100% CPU, even though the
server seemed able to handle as many users as required. Instead of a proper
blocking select call, it seemed to be polling continuously. Although the discussion
never solved this problem, the code was posted. Reviewing the code, I could see that
whenever a new socket was accepted, it was registered with the selector in both
READ and WRITE modes. However, a new or unwritten socket is always ready to be
written to, so naturally the selector returned immediately each time it was
called, because there was always a ready socket to service. This was a subtle
bug in the code, difficult to spot if you haven't played with NIO selectors
before: the select loop never blocked, hence the 100% CPU utilization. Whenever
there was spare time in the system, the I/O service thread looped, but apart
from that one erroneously spinning thread it wasn't adding any real load to the
CPU, so the CPU could handle all the other game engine threads without a
noticeable decrease in performance. The solution is to avoid registering the
socket for WRITE mode except when it actually needs to write.
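The effect is easy to reproduce without any network at all: a Pipe's sink
channel, like a freshly accepted socket, is always writable while nothing is
backed up. A minimal sketch (class name hypothetical) contrasting the buggy
and fixed registrations:

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Shows why registering a fresh channel for OP_WRITE makes select() spin. */
public class SelectSpinDemo {

    /** The bug: express interest in WRITE from the moment of registration. */
    static int buggyReadyCount() throws IOException {
        Pipe pipe = Pipe.open();              // stands in for a client socket
        pipe.sink().configureBlocking(false);
        Selector selector = Selector.open();
        pipe.sink().register(selector, SelectionKey.OP_WRITE);
        // An empty pipe is always ready for writing, so there is always a
        // ready key and a real select() call would return immediately.
        return selector.selectNow();
    }

    /** The fix: register only for READ until there is data queued to send. */
    static int fixedReadyCount() throws IOException {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        Selector selector = Selector.open();
        pipe.source().register(selector, SelectionKey.OP_READ);
        // No data has arrived, so nothing is ready and select() would block.
        return selector.selectNow();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("OP_WRITE up front, ready keys: " + buggyReadyCount());
        System.out.println("OP_READ only, ready keys:      " + fixedReadyCount());
    }
}
```

In a real server you would add OP_WRITE to a key's interestOps() only when a
write queue becomes non-empty, and remove it again once the queue drains.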

Another poster found that using a custom ColorModel slowed painting down
enormously on one platform (MacOS X). The problem turned out to be that
ColorModels are optimized differently on each platform, so you should use the
default ColorModel, from GraphicsConfiguration.getColorModel() or
Toolkit.getColorModel(), to gain optimal performance (or, more accurately,
to maximize your chances of getting a ColorModel that performs optimally).
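A sketch of the lookup (class name hypothetical; the headless fallback is my
own addition so the example also runs on a display-less server):

```java
import java.awt.GraphicsEnvironment;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;

public class DefaultColorModelDemo {
    /** Returns the platform's preferred ColorModel, or a stock one if headless. */
    static ColorModel defaultColorModel() {
        if (GraphicsEnvironment.isHeadless()) {
            // No display available: fall back to a standard image's model.
            return new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB)
                    .getColorModel();
        }
        return GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getDefaultScreenDevice()
                .getDefaultConfiguration()
                .getColorModel();
    }

    public static void main(String[] args) {
        // Images built with the default model can avoid per-pixel conversion
        // when they are blitted to the screen.
        System.out.println(defaultColorModel());
    }
}
```

The same GraphicsConfiguration also offers createCompatibleImage(), which gives
you an image already in the screen's preferred format.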

Finally, there was an inconclusive discussion on whether boolean comparison
or integer comparison to 0 is faster. My guess, and that of the respondents,
was that it depends entirely on the JVM/OS/compiler combination and the
optimizations they apply.
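For anyone tempted to measure this themselves, here is a deliberately naive
timing sketch (class name hypothetical). A serious comparison needs careful
warm-up control and a proper benchmark harness, and the results will vary by
JVM, which is exactly why the discussion was inconclusive:

```java
public class FlagBench {
    static final int N = 10_000_000;

    static long timeBoolean(boolean flag) {
        long hits = 0, t0 = System.nanoTime();
        for (int i = 0; i < N; i++) if (flag) hits++;
        System.out.println(hits);   // keep the loop from being optimized away
        return System.nanoTime() - t0;
    }

    static long timeIntZero(int flag) {
        long hits = 0, t0 = System.nanoTime();
        for (int i = 0; i < N; i++) if (flag != 0) hits++;
        System.out.println(hits);
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        // Warm up so the JIT compiles both loops before we time them.
        timeBoolean(true);
        timeIntZero(1);
        System.out.println("boolean: " + timeBoolean(true) + " ns");
        System.out.println("int!=0:  " + timeIntZero(1) + " ns");
    }
}
```

Treat any difference smaller than the run-to-run variance as noise.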

One poster was seeing strange behavior from his clustered Weblogic BMP entity
beans. It looked like synchronization of data between the servers wasn't
working. The beans were strongly optimized, with database stores and accesses
happening only when the bean had changed. But Weblogic uses ejbLoad() to
synchronize by re-loading beans, so it seems the optimization may interact
with the synchronization to cause these problems.
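The store-only-when-changed optimization usually looks something like the
following dirty-flag sketch (all names hypothetical; Weblogic can be pointed
at such an is-modified method via its deployment descriptor). The danger is
that skipping stores also starves the ejbLoad()-based cluster synchronization
of fresh data:

```java
/** Dirty-flag sketch of the store optimization (names hypothetical). */
public class AccountBean {
    private double balance;
    private boolean dirty;   // set whenever the bean's state changes

    public void setBalance(double balance) {
        this.balance = balance;
        dirty = true;
    }

    /** Hook the container can consult before deciding to store. */
    public boolean isModified() {
        return dirty;
    }

    public void ejbStore() {
        if (!dirty) {
            return;          // skip the database write entirely
        }
        // ... JDBC update would go here ...
        dirty = false;
    }
}
```

The saved database writes are real, but so is the stale data on the other
cluster nodes, so the optimization has to be weighed against correctness.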

Another poster asked about running batch EJB jobs on millions of records. The only
answer pointed out that running the batch simplistically, treating each record
manipulation as if it were one user call, would be like simulating millions of
user requests: millions of very short transactions that would likely
bring the system (especially the database) to its knees. The answer suggested
throttling the requests, combining transactions, and controlling when commits occur.
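The transaction-combining advice boils down to chunked commits: process records
in batches and commit once per batch rather than once per record. A minimal
sketch (names and batch size hypothetical; the chunk consumer stands in for one
database transaction):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Processes record ids in chunks, committing once per chunk. */
public class BatchRunner {
    private final int batchSize;
    private final Consumer<List<Long>> commitChunk; // one transaction per chunk

    public BatchRunner(int batchSize, Consumer<List<Long>> commitChunk) {
        this.batchSize = batchSize;
        this.commitChunk = commitChunk;
    }

    /** Returns the number of commits issued for totalRecords records. */
    public int process(long totalRecords) {
        int commits = 0;
        List<Long> chunk = new ArrayList<>(batchSize);
        for (long id = 0; id < totalRecords; id++) {
            chunk.add(id);
            if (chunk.size() == batchSize) {
                commitChunk.accept(chunk);  // flush the whole chunk at once
                chunk.clear();
                commits++;
            }
        }
        if (!chunk.isEmpty()) {             // commit the final partial chunk
            commitChunk.accept(chunk);
            commits++;
        }
        return commits;
    }
}
```

With a batch size of 1000, a million records cost a thousand transactions
instead of a million; adding a small sleep between chunks throttles the load
further.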

Balletic I.T.

Last week I had a conversation with a sales guy. He commented that the real
technical people will always search until they find what they are looking for,
but the most successful businesses are those that also cater to the hobbyist.
The interesting point is that he was talking about a ballet store. It's funny
how some observations transcend specialities.