server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,388 WARN [org.jgroups.protocols.pbcast.GMS] [T:127] I (10.199.18.11:45393) am not a member of view [10.199.18.12:39800|9] [10.199.18.12:39800, 10.199.18.13:39310], shunning myself and leaving the group (prev_members are [10.199.18.12:34166, 10.199.18.13:60923, 10.199.18.11:45393, 10.199.18.12:39800, 10.199.18.13:39310], current view is [10.199.18.1

I'm not sure what caused the cluster split or why the node does not rejoin the cluster after the GC. I've seen cases where the network occasionally drops some multicast packets, which might not be a problem during normal operation but can be one when such a split happens.

You should keep an eye on it.

The important thing is to eliminate the long full GC pauses, and that will be hard work as well.

One option is to use the CMS incremental mode (-XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10); see [1] for more information.

You should also analyze the memory footprint. It often happens that objects survive enough minor GCs to be promoted into the old generation, only to die there almost immediately. In that case it might help to increase the young generation and survivor spaces so that these objects are collected before they are promoted.
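To get a feeling for how the young generation is split up, here is a small sketch of the arithmetic. HotSpot defines -XX:SurvivorRatio as the size of eden divided by the size of one survivor space, and there are two survivor spaces. The function name and the numbers are only illustrative, and the JVM may round the real sizes to its own alignment.

```python
def young_gen_layout(young_mb, survivor_ratio):
    """Estimate eden and survivor sizes (in MB) for a given young generation.

    SurvivorRatio = eden / one survivor space, and there are two survivor
    spaces, so: young = eden + 2 * survivor = (survivor_ratio + 2) * survivor.
    This is only an estimate; the JVM aligns the actual sizes.
    """
    survivor = young_mb / (survivor_ratio + 2)
    eden = young_mb - 2 * survivor
    return {"eden_mb": eden, "survivor_mb": survivor}


if __name__ == "__main__":
    # Hypothetical values: 1000 MB young generation, SurvivorRatio=8 vs. 2
    print(young_gen_layout(1000, 8))
    print(young_gen_layout(1000, 2))
```

A smaller SurvivorRatio gives larger survivor spaces at the cost of eden, so short-lived objects have more room to die in the young generation instead of being promoted.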

For the analysis you can use jstat, VisualVM or jconsole. Of these, jstat can be used in production without a big impact on the running VM.
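As a rough sketch of how jstat output could be evaluated, the following script parses the text printed by "jstat -gcutil" and reports the intervals in which the full GC counter (FGC) went up, together with the average time per event taken from FGCT. The function names are my own, the sample numbers are made up for illustration, and the code assumes every column is numeric.

```python
def parse_gcutil(text):
    """Parse 'jstat -gcutil' output into a list of dicts keyed by column name.

    Assumes the first non-empty line is the header and all values are numeric.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, (float(v) for v in row))) for row in rows]


def full_gc_events(samples):
    """Return (number of full GCs, average seconds each) per interval
    between two consecutive samples in which the FGC counter increased."""
    events = []
    for prev, cur in zip(samples, samples[1:]):
        count = cur["FGC"] - prev["FGC"]
        if count > 0:
            events.append((int(count), (cur["FGCT"] - prev["FGCT"]) / count))
    return events


if __name__ == "__main__":
    # Made-up sample in the column layout of a Java 6 jstat -gcutil
    sample = """
      S0     S1     E      O      P     YGC   YGCT   FGC   FGCT    GCT
      0.00  50.00  40.00  95.00  60.00  100   5.00    3    2.50    7.50
      0.00   0.00  10.00  30.00  60.00  100   5.00    4   12.50   17.50
    """
    for count, avg in full_gc_events(parse_gcutil(sample)):
        print("%d full GC(s), avg %.1f s each" % (count, avg))
```

You would feed it the output of something like "jstat -gcutil <pid> 1000", which prints one sample per second.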