In JPPFNode.perform(), we're missing exception handling for the case where a socket error occurs while writing the results of a job. When this happens, the class loader connection is not reinitialized and the node just hangs, because all subsequent class loading requests fail with an NPE. I've added the following:

After this, the node reconnects properly after the driver is restarted, and the job is resubmitted and ends normally.

There is, however, one remaining problem: the socket error is only detected when the results are written. This means the node may wait a very long time for the current job to finish executing before it notices the failure. A possible mitigation would be to use the recovery mechanism, but I feel this isn't very satisfying.

The problem is that, while the job is being executed in the node, we are neither reading from nor writing to the socket connection, so we can't detect that the connection is closed.

This is now fixed. I added a connection checker mechanism to the node, which checks whether the connection to the driver is still working while the tasks are executing. The checker is then suspended until the next job arrives and starts executing. Because this mechanism makes execution slightly slower (although the overhead is negligible for long-lived jobs), I made it optional via the configuration property "jppf.node.check.connection = false".
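
For readers who want a picture of how such a checker could work, here is a minimal sketch. It is not the code committed to JPPF: the class name ConnectionCheckerSketch and its methods are hypothetical, and it assumes the driver sends nothing to the node while tasks are executing, so a read with a short timeout can only time out (connection presumed alive), return end-of-stream, or fail (connection lost).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a node-side connection checker; not the actual JPPF class. */
class ConnectionCheckerSketch implements Runnable {
  private final Socket socket;
  private final int timeoutMillis;
  private final AtomicBoolean broken = new AtomicBoolean(false);
  // Both flags are guarded by the monitor of this object.
  private boolean suspended = true;
  private boolean stopped = false;

  ConnectionCheckerSketch(Socket socket, int timeoutMillis) {
    this.socket = socket;
    this.timeoutMillis = timeoutMillis;
  }

  /** Called when a job starts executing. */
  public synchronized void resume() { suspended = false; notifyAll(); }

  /** Called when the job's tasks have completed. */
  public synchronized void suspend() { suspended = true; }

  /** Terminates the checker thread. */
  public synchronized void stop() { stopped = true; notifyAll(); }

  /** True once the checker has seen end-of-stream or an I/O error. */
  public boolean isBroken() { return broken.get(); }

  @Override
  public void run() {
    try {
      socket.setSoTimeout(timeoutMillis);
      InputStream in = socket.getInputStream();
      while (true) {
        synchronized (this) {
          // Idle between jobs: the node's normal I/O detects errors then.
          while (suspended && !stopped) wait();
          if (stopped) return;
        }
        try {
          // Assumption: no data arrives during execution, so -1 means the peer closed.
          if (in.read() < 0) { broken.set(true); return; }
        } catch (SocketTimeoutException e) {
          // Nothing received within the timeout: connection presumed alive.
        }
      }
    } catch (IOException e) {
      broken.set(true); // reset, closed socket, or other I/O failure
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

In this sketch the node would call resume() when a job starts, poll isBroken() (or be notified) while the tasks run, and call suspend() once they complete. Note that suspend() does not interrupt a read already in progress, so a real implementation would have to make sure the checker has left read() before the node starts reading the next job from the same socket.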