My application runs perfectly in both development and production mode on my local development machine. When deployed on the production server, loading the first GWT page fails with the following message in the browser's error console:

The effect is that the HTML and JavaScript load, as can be seen in the page source, but the page remains empty to the user.

The production environment is Jetty 6 on a machine in a DMZ, and we currently have port forwarders on the firewall to allow connectivity from the outside. I tried Apache mod_proxy too and had the same problem.

I've checked the basics: connectivity, paths, etc. Since there are no entries in the server-side log, I suspect that the handshake to the remote bus times out. Any ideas what could cause that, or how to get more debug information? Thanks!

I'm currently using Errai 2.0.0.Final, but this has been happening with all CR and Beta versions too.

Yes, it definitely appears that the connection to the server is timing out. You may need to look at the network requests in whatever browser you're working in to get more details as to what might be going wrong.

Thanks, the problem is intermittent (the worst kind). Everything worked until the next Jetty restart. This is what I see in the last request, which remains in a pending state, when connecting via an SSH port forwarder and bypassing the firewall on the production server:

Interestingly, the production server is a virtual machine. Restarting the host also helped. But the problem keeps recurring, and I would like to get to the bottom of it so that we can set up the environment properly.

Unfortunately, this looks like a tricky one. As you say, intermittent problems are the worst kind.

Here are some things you might want to try if you haven't already:

1. Try disabling long polling in the DefaultBlockingServlet by setting the system property org.jboss.errai.bus.do_long_poll=false. This option is really only intended for testing, but it's possible the proxies have introduced some sort of "pinch point" in the HTTP data path that's stalling the long polls.
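For example, the property can be passed on the Jetty command line (this sketch assumes a standalone Jetty 6 started via start.jar; adapt it to JAVA_OPTIONS in jetty.sh or your service wrapper as needed):

```shell
# Disable Errai long polling for diagnostic purposes.
# Assumes a standalone Jetty 6 launched via start.jar from the Jetty
# home directory; adjust the launch command to match your setup.
java -Dorg.jboss.errai.bus.do_long_poll=false -jar start.jar
```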

2. Check if there's something in your server-side application that's blocking for a long time (for example, a thread race condition leading to deadlock). When the app wedges up, get a thread dump from the VM by sending Jetty a SIGQUIT (or use the jstack program that comes with the JDK).
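The thread dump can be captured roughly like this (assuming a JDK, not just a JRE, is installed on the server; the pattern passed to awk is a guess and should match however your Jetty is launched):

```shell
# Find the Jetty JVM's PID and dump its threads with jstack.
PID=$(jps -l | awk '/start.jar/ {print $1}')   # adjust the pattern to your launcher
jstack "$PID" > /tmp/jetty-threads.txt

# Alternatively, SIGQUIT writes the thread dump to Jetty's stdout/stderr log
# without stopping the process:
kill -QUIT "$PID"
```

In the dump, look for threads stuck in BLOCKED or WAITING states on the same monitor, which would point at a deadlock or a long-held lock.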

3. Try staying connected to the app from various places (including locations inside the DMZ network) and see if they all freeze up at the same time, or if they lock up one at a time.

Sorry I don't have any specific insights here, but the more things you try, the more likely we can figure out what's going on.

Thanks Jonathan, the pointers you provided were exactly what I needed. At the moment - since the restart of the VMware host - the problem doesn't manifest. It's definitely something in our environment. I will keep this thread posted with developments.

We are seeing what looks like the same problem here, though I have some more details on the aberrant behavior which might help to figure out what is going on. Here is a synopsis:

environment: happens reproducibly in our demo environment, which is a Jetty server running on a weak single-core VM on an overtaxed machine --- however, similar behavior can be demonstrated on a developer workstation.

symptom: our client freezes at the "Loading JavaScript, please wait..." screen --- actually, under the covers, it has just sent the initial POST request to initialize the server (and logged "sending initial handshake to remote bus"), and is sitting around waiting for the response.

steps to reproduce:

1. Start the Jetty server.

2. Wait until it unpacks the WAR file, starts up the server side, and "[bus] buffer status [freebytes: ...]" is logged.

3. Try to connect a client.

We can get different, related behaviors after Step 3, however. I've got three interesting cases.

Case 1 - Single client connects to a fast(er) server:

Client issues initial POST request, and waits.

Some short time later (e.g. around a minute), server responds.

Client continues with initialization and runs as expected.

When subsequent clients (re)connect, the response to the initial POST takes about 100ms.

AND, reloading the client page (or any new client connection) after that generally no longer has the big delay after the initial POST, but it fails anyway, showing two instances of "received response from initial handshake" (etc.) in the client-side log.

I'll add a client-side log from "Case 3" to the end of this posting. The delay between the client's initial POST and the server's response is corroborated by packet captures on the server VM; i.e., I doubt that the VM networking has anything to do with the problem. I've only tested this with the DefaultBlockingServlet. (We switched away from the JettyContinuationsServlet a few months ago because we were experiencing other strange delays/latency in message handling.) The attached log is from a system using the AsyncDispatcher, although we see the same behavior with the SimpleDispatcher.

Here are my theories about this behavior:

1. The server side does some lengthy/complex initialization *after* it receives the first client request. On a slow machine, this causes it to sit and think for a long time before responding to those first clients.

2. If that delay is too long, something(s) time out on the client side and/or server side --- but other things don't. When the client finally receives a response, the original request handler still tries to handle it, but other components have already given up, so it does not succeed.

3. Something is not being locked properly on the server side, so that if multiple clients connect while the server is still doing its complex initialization, the state gets confused enough that it cannot properly handle future startup of new clients.

If true, 2 and 3 would be bugs of one sort or another; 1 would be an anti-feature --- if that's really what is going on, I think we'd prefer a setting (and/or default behavior) to do all the initialization at server startup, rather than deferring anything until the first client request comes in.

Here is a grab from the Firefox web console for a client in Case 3 (with some editing/notes):

The problem reproduces for me exactly as Alvin described. The time between Jetty startup and the client successfully initialising the Errai bus is 10-15 minutes on a not-too-overtaxed machine. This happens in production mode only, however; in development/hosted mode there is no significant delay.

We encounter the same problem. After a deployment on our Linux machine, the very first page request takes almost 10 minutes to complete. The first out.X-X.errai POST request takes a very long time to complete.

Okay, it sounds like the entropy pool on Linux is still too small even for our reduced demands on SecureRandom.

To answer your question, yes, IMO, 1024 bits of seed is more than necessary. We've just pushed a revised seed generator to both the 2.1 and 2.0.1 branches. Once the snapshot build is out, could you try it and see if it helps?

What I don't understand is why it takes so long to create a seed; what's the idea behind this seed algorithm? During that time, the CPU of the production system is hardly used. Do you have some references for me to read?
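To illustrate the mechanism (this is a generic sketch, not Errai's actual seeding code): on Linux, SecureRandom.generateSeed() typically draws from /dev/random, which blocks until the kernel's entropy pool has gathered enough randomness from hardware events. A headless, idle VM gathers entropy very slowly, so the call can stall for minutes while the CPU sits idle, which matches the symptoms above. Ordinary nextBytes() calls, by contrast, self-seed from a non-blocking source on most JVMs and return immediately. The class and method names below are hypothetical:

```java
import java.security.SecureRandom;

public class SeedDemo {

    // Fast path: nextBytes() self-seeds lazily from a non-blocking
    // source (/dev/urandom on most Linux JVMs) and returns quickly.
    static byte[] fastBytes(int n) {
        byte[] out = new byte[n];
        new SecureRandom().nextBytes(out);
        return out;
    }

    // Slow path: generateSeed() requests "real" entropy; on an idle,
    // headless VM this is the call that can block for a long time.
    // (The thread above discusses a 1024-bit seed, i.e. n = 128.)
    static byte[] blockingSeed(int n) {
        return new SecureRandom().generateSeed(n);
    }

    public static void main(String[] args) {
        System.out.println("nextBytes length: " + fastBytes(16).length);

        long start = System.nanoTime();
        byte[] seed = blockingSeed(16);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("generateSeed(16) took " + ms
                + " ms, length: " + seed.length);
    }
}
```

A common workaround on entropy-starved servers is to point the JVM at the non-blocking device, e.g. -Djava.security.egd=file:/dev/./urandom, at the cost of weaker seed entropy.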