We use Hudson to run these XTS crash recovery tests every hour, and each run builds JBoss AS with our JBossTM snapshot using the following steps:

#BUILD JBOSS-AS

cd ${WORKSPACE}

rm -rf jboss-as

git clone git://github.com/jbosstm/jboss-as.git

if [ "$?" != "0" ]; then

exit -1

fi

cd jboss-as

git checkout -t origin/4_16_BRANCH

if [ "$?" != "0" ]; then

exit -1

fi

git remote add upstream git://github.com/jbossas/jboss-as.git

git pull --rebase --ff-only upstream master

if [ "$?" != "0" ]; then

exit -1

fi

MAVEN_OPTS=-XX:MaxPermSize=256m ./build.sh clean install -DskipTests

if [ "$?" != "0" ]; then

exit -1

fi
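As an aside, the repeated exit-status checks above could be folded into a small helper. This is only a sketch: the name step is made up for illustration and is not part of our actual job. It exits with 1 rather than -1 because a negative exit status is not portable across shells (bash turns it into 255, some other shells reject it).

```shell
#!/bin/sh
# Hypothetical helper (not part of the original job): run a command and
# abort the whole script with status 1 if that command fails.
step() {
    "$@" || {
        echo "FAILED: $*" >&2
        exit 1
    }
}

# Intended use, mirroring the first lines of the build script:
#   step git clone git://github.com/jbosstm/jboss-as.git
#   step cd jboss-as
```

Because step is a function and not a subshell, a cd run through it still changes the directory of the calling script.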

#RUNNING XTS CRASH RECOVERY TESTS

They had always worked fine until the first error occurred at around #90 (2012-4-23 19:20:56), and it looks intermittent. We are investigating this issue and don't have much information to hand over yet.

That seems to be really confusing. Your master dates back 23 days. In the network I see that your 5_BRANCH builds on top of that master but has a lot of commits that come from the jbossas/master. Anyway, the top of your 5_BRANCH uses an OSGi layer that is prior to the Final versions that have been put in place 4 days ago.

I consider the Final versions stable and ready for EAP. If you still see this problem with those Final versions, there would indeed be an issue; otherwise I'd consider this report "out of date".

Perhaps you would like to consider a workflow like this:

* frequently pull the AS master from upstream and push to your master (never commit to your master)

* do your work on a feature branch that only contains your commits

* when you see relevant changes in master, rebase your branch onto master
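The workflow above might look roughly like this in shell form. The remote names (upstream for jbossas/jboss-as, origin for your fork) and the branch name you pass to rebase_feature are assumptions for illustration, not necessarily what your repository uses.

```shell
#!/bin/sh
# Sketch of the suggested workflow. Assumed remotes:
#   upstream = git://github.com/jbossas/jboss-as.git
#   origin   = your own fork

# Mirror upstream master into the fork's master. With --ff-only this
# fails if someone has committed directly to the local master.
sync_master() {
    git checkout master &&
    git pull --ff-only upstream master &&
    git push origin master
}

# Replay the commits of feature branch $1 on top of the refreshed master.
rebase_feature() {
    git checkout "$1" &&
    git rebase master
}

# Intended use (my-feature is a placeholder branch name):
#   sync_master && rebase_feature my-feature
```

After a rebase the feature branch has new commit ids, so pushing it to a fork that already holds the old ones needs a forced push.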

Just to be clear, our branch (5_BRANCH) doesn't do anything but include the latest version of Transactions, everything else should be as per master (at the time of our last rebase). That said, it seems like we could have been quite out of date when we first reported this issue.

So periodically what we do to our 5_BRANCH is:

git checkout 5_BRANCH

git up jbossas master

git push

I have nuked ce58ff5b4d83869e747df8a7620633588e363679 from our 5_BRANCH too, so now our only deviation from master should be 4e37ce2951dadaace3639192afc2a280ad902060 (the commit where we change the version of transactions - after my nuking it is now rev 5ac4867d0429065c442c2a14d581b71d12269bbe).

I should also point out that my comment about "periodically pulling the latest version" is somewhat superfluous since, as Amos pointed out, our build script actually does this:

git remote add upstream git://github.com/jbossas/jboss-as.git

git pull --rebase --ff-only upstream master

if [ "$?" != "0" ]; then

exit -1

fi

To condense the actual steps:

rm -rf jboss-as

git clone git://github.com/jbosstm/jboss-as.git

cd jboss-as

git checkout -t origin/5_BRANCH

git remote add upstream git://github.com/jbossas/jboss-as.git

git pull --rebase --ff-only upstream master

So we can be sure we are always using the latest version of the AS, regardless of what our GitHub variant suggests. For example, let's say our 5_BRANCH was pointing at r1 (mixing my SCM metaphors, but hopefully you will see where I am going): by virtue of the rebase we will always be at rN (where N is the latest commit), and our commit which rolls in the transaction subsystem update will always appear on top.
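To double check that claim after the pull, something like the following could list what sits on top of upstream master. It is a sketch: the function name is invented, and it fetches first so that the upstream/master remote-tracking ref is current.

```shell
#!/bin/sh
# Hypothetical check: list the commits on the current branch that are not
# in upstream master. Assumes the "upstream" remote added by the build script.
local_only_commits() {
    git fetch upstream &&
    git log --oneline upstream/master..HEAD
}
```

If the branch is as described, this should print a single commit, the one that changes the version of transactions.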

This happens only on certain (lower spec) VMs in our CI cluster, during a sequence of JUnit tests. The nodes it happens on are slightly lower spec than the ones on which the same tests pass. Amos is investigating all avenues (e.g. Arquillian) to try to find where the issue comes from.

We use the managed profile for arquillian.

Our test:

1. Starts AS (using arquillian)

2. Crashes AS (using byteman to do a JVM halt)

3. (Re)starts AS (using arq)

It is at stage 3 that we get the failure on low-spec boxes; note that subsequent tests (using the same pattern) do pass. Is there any kind of resident program spawned by OSGi, or any socket or file timestamp checking going on, that might explain why the MSC thinks there is already an instance registered and running? Here is the OSGi part of that stack; I am wondering what PersistentBundles are:

at org.jboss.as.osgi.service.PersistentBundlesIntegration.addService(PersistentBundlesIntegration.java:75)

at org.jboss.as.osgi.parser.OSGiSubsystemAdd$1.execute(OSGiSubsystemAdd.java:100)

It makes me think that somehow arq is using the initial instance of the AS spawned in step 1, but then why can the first two (for example) OSGi services be installed? Also, the crash in step 2 is a JVM halt, so I can't see it being an issue with the AS not being shut down properly.
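One way to rule out a leftover process from step 1 on the slower boxes would be to wait for the halted JVM's pid to disappear before step 3 starts. This is a minimal sketch, assuming the pid is known to the test harness and the AS runs as the same user; wait_for_exit is a hypothetical helper, not an Arquillian or Byteman feature.

```shell
#!/bin/sh
# Hypothetical diagnostic: wait until process $1 has gone, for at most
# roughly $2 seconds (default 30).
# Returns 0 once the process no longer exists, 1 if it is still alive.
# Note: kill -0 also fails for processes owned by another user, so this
# only gives a meaningful answer for processes we are allowed to signal.
wait_for_exit() {
    pid=$1
    remaining=${2:-30}
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Intended use (AS_PID is a placeholder for the pid of the crashed AS):
#   wait_for_exit "$AS_PID" 60 || echo "old AS instance still running" >&2
```

If this ever reports a surviving process between steps 2 and 3, that would point at the old instance rather than at OSGi.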

For confirmation, here is the list of OSGi versions in our pom.xml, which should match master as of today: