Compute Server Design

Let's start with a big picture of the compute server and its relation to JavaSpaces. The typical space-based compute server looks like the figure below. A master process generates a number of tasks and writes them into a space. When we say task in this context, we mean an entry that both describes the specifics of a task and contains methods that perform the necessary computations. One or more worker processes monitor the space; these workers take tasks as they become available, compute them, and then write their results back into the space. Results are entries that contain data from the computation's output. For instance, in a ray-tracing compute server, each task would contain the ray-tracing code along with a few parameters that tell the compute process which region of the ray-traced image to work on. The result entry would contain the bytes that make up the ray-traced region.
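To make this concrete, here is a minimal sketch of what such task and result entries might look like. The names (RayTraceTask, RayTraceResult, the execute method) and the local Entry stand-in are illustrative assumptions; real JavaSpaces entries implement net.jini.core.entry.Entry, expose public fields, and provide a public no-argument constructor.

```java
import java.io.Serializable;

// Stand-in for net.jini.core.entry.Entry (an assumption for this sketch;
// real code would import the Jini interface instead).
interface Entry extends Serializable {}

// A task entry carries both its parameters and the code that performs
// the computation. The execute() convention is illustrative.
class RayTraceTask implements Entry {
    public Integer regionStart;  // first scan line of the image region
    public Integer regionEnd;    // last scan line of the image region

    public RayTraceTask() {}     // entries need a public no-arg constructor

    public RayTraceTask(int start, int end) {
        this.regionStart = start;
        this.regionEnd = end;
    }

    // Perform the computation and package the output as a result entry.
    public RayTraceResult execute() {
        int lines = regionEnd - regionStart + 1;
        byte[] pixels = new byte[lines];  // placeholder for rendered bytes
        return new RayTraceResult(regionStart, pixels);
    }
}

// A result entry holds the output of the computation.
class RayTraceResult implements Entry {
    public Integer regionStart;
    public byte[] pixels;

    public RayTraceResult() {}

    public RayTraceResult(int start, byte[] pixels) {
        this.regionStart = start;
        this.pixels = pixels;
    }
}

public class EntrySketch {
    public static void main(String[] args) {
        RayTraceResult r = new RayTraceTask(0, 99).execute();
        System.out.println("region " + r.regionStart + ", "
                + r.pixels.length + " bytes");
    }
}
```

Because the task carries its own execute method, a worker can compute it without knowing anything about ray tracing in particular.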

A space-based compute server (Illustration by James P. Dustin, Dustin Design)

There are some nice properties that contribute to compute servers' ubiquity in space-based systems. First of all, they scale naturally; in general, the more worker processes there are, the faster tasks will be computed. You can add or remove workers at runtime, and the compute server will continue as long as there is at least one worker to compute tasks. Second, compute servers balance loads well -- if one worker is running on a slower CPU, it will compute its tasks more slowly and thus complete fewer tasks overall, while faster machines will have higher task throughput. This model avoids situations in which slow machines become overwhelmed while fast machines starve for work; instead, the load is balanced because each worker takes a new task only when it has finished its current one.

In addition, the masters and workers in this model are decoupled; the workers don't have to know anything about the master or the specifics of the tasks -- they just grab tasks and compute them, returning the results to the space. Likewise, a given master doesn't need to worry about who computes its tasks or how, but just throws tasks into a space and waits. The space itself makes this very easy, as it provides a shared and persistent object store where objects can be looked up associatively (for more details on these aspects of spaces, please refer to November's column). In other words, the space allows a worker to simply request any task, and receive a task entry when one exists in the space. Similarly, the master can ask for the computational results of its (and only its) tasks -- without needing to know the specifics of where these results came from.
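The whole round trip can be sketched with a toy in-memory "space." This is a simulation, not the JavaSpaces API -- the ToySpace class, the predicate-based take, and the job ID field are all assumptions made for illustration (a real worker would call take with a template entry on a JavaSpace proxy) -- but it shows the decoupling: the worker takes any task, while the master matches only its own results.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.function.Predicate;

// A toy in-memory stand-in for a space (an assumption for illustration;
// a real application would use the JavaSpace interface's write and take).
class ToySpace {
    private final List<Object> entries = new LinkedList<>();

    public synchronized void write(Object entry) {
        entries.add(entry);
        notifyAll();
    }

    // Blocking, template-style take: removes and returns the first entry
    // matching the predicate, waiting until one appears.
    public synchronized <T> T take(Class<T> type, Predicate<T> template)
            throws InterruptedException {
        while (true) {
            for (Object e : entries) {
                if (type.isInstance(e) && template.test(type.cast(e))) {
                    entries.remove(e);
                    return type.cast(e);
                }
            }
            wait();
        }
    }
}

class Task {
    public String jobId; public int input;
    Task(String jobId, int input) { this.jobId = jobId; this.input = input; }
    Result execute() { return new Result(jobId, input * input); }
}

class Result {
    public String jobId; public int output;
    Result(String jobId, int output) { this.jobId = jobId; this.output = output; }
}

public class ComputeServerSketch {
    public static void main(String[] args) throws InterruptedException {
        ToySpace space = new ToySpace();

        // Worker: knows nothing about the master -- it just takes any
        // task, computes it, and writes the result back into the space.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Task t = space.take(Task.class, x -> true);
                    space.write(t.execute());
                }
            } catch (InterruptedException e) { /* shut down */ }
        });
        worker.setDaemon(true);
        worker.start();

        // Master: writes tasks, then collects only its own results by
        // matching on the job ID (associative lookup).
        for (int i = 1; i <= 3; i++) space.write(new Task("job-42", i));
        int sum = 0;
        for (int i = 0; i < 3; i++)
            sum += space.take(Result.class, r -> "job-42".equals(r.jobId)).output;
        System.out.println("sum of squares = " + sum);  // 1 + 4 + 9
    }
}
```

Note that adding a second worker thread would require no changes to the master -- this is the scaling property described above.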

Finally, the space provides a feature that makes this all work seamlessly: the ability to ship executable content around in objects. This is feasible because Java (and Jini's underlying RMI transport) makes dynamic class loading possible. How does this work? Basically, when a master (or any other process, for that matter) writes an object into a space, the object is serialized and transferred to that space. The serialized form of the object is annotated with a codebase, which identifies the network location where the object's class files reside. When a process (in this case, a worker) takes a serialized object from the space, it downloads the class from the appropriate codebase, assuming that its JVM has not loaded the class already. This is a very powerful feature, because it lets your applications ship new behaviors around and have those behaviors automatically installed in remote processes. This capability is a great improvement over previous space-based compute servers, which required you to precompile the task code into the workers.
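Plain Java serialization shows the first half of this mechanism: the stream records the class name of every object written, so the receiver knows which class it must have in order to reconstruct the object. The codebase annotation itself is added by RMI's marshalling (controlled by the java.rmi.server.codebase property) and is only noted in comments here; the class and field names below are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// A small serializable object standing in for an entry.
class Greeting implements Serializable {
    public String text;
    Greeting(String text) { this.text = text; }
}

public class SerializationSketch {
    public static void main(String[] args) throws Exception {
        // Writing into a space serializes the object...
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Greeting("hello, worker"));
        }

        // ...and the stream records the object's class name, so the
        // reader knows which class to load. (RMI marshalling goes a step
        // further: it annotates the stream with a codebase URL, from
        // which a class unknown to the reader's JVM can be downloaded.)
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Greeting g = (Greeting) in.readObject();
            System.out.println(g.text);
        }
    }
}
```

In the compute server, the worker's JVM plays the role of the reader: if it has never seen the task's class before, the codebase annotation tells it where to fetch the bytecode.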