Monthly Archives: July 2017

As I described previosely, in 2012 we were implementing a large Documentum project, and that time we were observing a lot of performance and stability issues (actually, it was bit weird to observe such issues in mature product maintained by talented team). And one of issues was following: it was not possible to restart JMS during business hours (from my perspective it is not a big deal to do that if your application is properly designed – business users do not consume JMS, so, bouncing JMS should not affect business users):

about 5 of 10 restarts ended with deadlocks in java

about 4 of 10 restarts ended with crash of java process with SIGBUS error in libzip.so, something like:

So, we filed a couple SRs and after a while EMC support started claiming that first two issues were addressed in recent Documentum patches. Frankly speaking, during last 4 years I haven’t seen JMS crashes with SIGBUS error in libzip.so, but now I’m not sure that talented team had really fixed the issue, because two days ago our DevOps engineer claimed that Documentum 7.2 is still affected and he was a witness of three succeeded crashes of JMS with SIGBUS error in libzip.so – I just have lost faith in talented team 😦

Well, let discuss this odd error and try to fix it 🙂 If you google for java + sigbus + libzip.so you will find a plenty of bugs in openjdkJIRA and all of them states something like:

it is expected that application receives SIGBUS error if mapped file gets overwritten:

SIGBUS Attempted access to a portion of the buffer that does not
correspond to the file (for example, beyond the end of the
file, including the case where another process has truncated
the file).

there is a -Dsun.zip.disableMemoryMapping=true system property which was designed to “prevent crashes”

So, JDK seems to “work as designed”, nobody is going to fix this behaviour (not true for JDK9, but for current JDK versions we must take this behaivour as given) and the real root cause of our problem is “somebody” changes jar files on JMS while they are in use and, hence, causes it crashes, and this “somebody” is an odd TBO implementation. Is it possible to fix it? Actually, I have no idea what talented team was doing four years ago, but I have found robust solution within four hours (if my memory serves me right they was trying to fix the issue during four months) – the good starting point here is to find/write a good testcase and I have found it in JDK-8017777 SIGBUS in java.util.zip.ZipFile.getEntry:

Well, JVM didn’t crash, but we got a plenty of errors, so, -Dsun.zip.disableMemoryMapping=true system property prevents crashes only and seems to be unusable 😦 Here we need to stop blaming JDK and start thinking about following: what does mean “mapped file gets overwritten”, i.e. are the following scenarios are doing the same or not: