I'm evaluating mod_cluster with Apache 2.2 and JBoss EAP 4.3, and I'm getting some strange behavior. After configuring everything, all seems to work fine at first and my requests are properly balanced. But as soon as I start to stress-test the config with the included demo client, I find that after a while some requests fail with 404 errors. The failures occur in bursts (all failures happen at the very same moment, or at least within the same second). After thousands of correct responses I get some 404s on a load-balanced URL, and one second later everything is back to normal. This happens for about one second every 1-3 minutes. There is no sign of overload anywhere (OS, network, etc.). After turning on debug-level logging in Apache, I found some strange messages that seem to be related to the problem and that I can't explain:

Thanks for the tip, it didn't work though :-( Further filtering of the debug-level log messages shows an interesting symptom. It seems that at the time of the 404 errors the "update_workers_node" process works differently than at other times...

Thanks for the tip, but that didn't help either. The behavior is just the same.

What we have found is that setting the backgroundProcessorDelay parameter in server.xml to 10 seems to make the error less frequent, but it still occurs. As we're using JBoss EAP 4.3, the default value for this was -1 according to the documentation. We did not explicitly define it before. The pattern is always the same: normal behavior, then lots of 404s for a second, then everything is back to normal for minutes again. The length of the normal period seems to be random...

We've already tested lots of different configs (RHEL/Ubuntu, Apache 2.2.7/Apache 2.2.13, Apache prefork/worker MPM, 1/2 worker nodes, VMware/KVM/bare metal, etc.), but all of them show the same error.

The MaxRequestsPerChild setting was explicitly 0 (no max requests) in my config, but changing it to an extremely low (1), "normal" (10000) or extremely high but not infinite (1000000) value did not change the behavior. Using the demo client, the number of live client threads stabilizes at around 20-30; all the others drop out with a 404 within the first few minutes of the stress test. After some time (talking about hours) each of the clients gets a 404 and goes dead. My client settings are: 80 clients, 40 seconds startup time, 100 ms sleep time.
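To put rough numbers on that load, here is a back-of-the-envelope sketch in Java. It assumes the demo client spreads the client starts evenly over the startup time and that each client waits the full sleep time between requests; neither assumption is checked against the demo client's source, and the class and method names are made up for illustration.

```java
public class LoadEstimate {
    /** Milliseconds between client starts, ASSUMING starts are spread evenly over the startup window. */
    public static double startIntervalMs(int clients, int startupSeconds) {
        return startupSeconds * 1000.0 / clients;
    }

    /** Upper bound on requests per second: each client sends at most one request per sleep interval. */
    public static double maxRequestsPerSecond(int clients, int sleepMs) {
        return clients * (1000.0 / sleepMs);
    }

    public static void main(String[] args) {
        System.out.println(startIntervalMs(80, 40));        // prints 500.0 (one new client every 500 ms)
        System.out.println(maxRequestsPerSecond(80, 100));  // prints 800.0 (ceiling with all 80 clients alive)
        System.out.println(maxRequestsPerSecond(25, 100));  // prints 250.0 (ceiling with ~25 clients alive)
    }
}
```

These are ceilings only, since response time is ignored; the real request rate is lower.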

The probability of the errors seems to rise with the load, so it looks like a threading or performance issue(?)... The strange thing is that I don't see any bottlenecks with any of my monitoring tools (JBoss, Apache, OS and so on), so it's probably threading. The other unmistakable characteristic of the symptom is that it's bursty: lots of 404s arrive within the very same second.
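For anyone who wants to check the burstiness on their own setup, a minimal sketch like the following counts 404 responses per second from an Apache access log. It assumes the log uses the common/combined log format (timestamp in square brackets, status code right after the quoted request line) and Java 8 or later; the class name and the log path passed on the command line are placeholders.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Burst404Counter {
    // ASSUMED line layout (common log format):
    //   host ident user [dd/Mon/yyyy:HH:mm:ss zone] "request line" status bytes
    // Group 1 = timestamp down to the second (zone dropped), group 2 = status code.
    private static final Pattern LINE =
        Pattern.compile("\\[([^\\] ]+)[^\\]]*\\] \"[^\"]*\" (\\d{3})\\b");

    /** Returns a map from timestamp (one-second resolution) to the number of 404s in that second. */
    public static Map<String, Integer> count404PerSecond(List<String> lines) {
        // TreeMap orders keys as strings, which is chronological within a single day.
        Map<String, Integer> perSecond = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find() && m.group(2).equals("404")) {
                perSecond.merge(m.group(1), 1, Integer::sum);
            }
        }
        return perSecond;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the access log (placeholder, depends on the installation)
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        count404PerSecond(lines).forEach((second, n) -> System.out.println(second + " " + n));
    }
}
```

If the 404s really come in bursts, the output should show a handful of seconds with large counts separated by gaps of a minute or more, rather than a steady trickle.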

Thanks for the help so far, I'd be glad if you had more suggestions. I'd really like to put mod_cluster into production (my company is adopting lots of JBoss products), but this issue needs to be resolved first.