Client-Side Map-Reduce with JavaScript

I’ve been doing a lot of scalability research lately, and the High Scalability website has been fairly valuable to this end. I’ve been thinking about alternative approaches to my application designs, mostly based on services. There was an interesting article about Amazon.com’s architecture that describes a little of how they put their services together.

I started thinking about an application that I work on and how it would work if every section of the application talked to the others through a web service or sockets, passing JSON or Protocol Buffers, rather than the current monolithic design that uses object method calls. Then I wondered: why limit your services to being deployed on a fixed set of machines? There’s only so much room to expand in that. What if we harnessed all of the unused power of the client machines that visit the site?

Anyone who’s done any serious work with ECMAScript (aka JavaScript) knows that you can do some pretty powerful things in that language. One of its more interesting features is the ability to turn plain text in JSON format into JavaScript objects using eval(). Now, eval() is dangerous because it will run any text given to it as if it were JavaScript. However, that is also what makes it powerful: the text can carry code as well as data.
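To make the eval() point concrete, here is a small sketch (runnable in Node.js or a browser console) contrasting eval() with JSON.parse(), the built-in parser that accepts only JSON data. The strings are made up for illustration.

```javascript
// Some JSON text, as it might arrive from a server.
const text = '{"id": 7, "values": [3, 1, 4, 1, 5]}';

// eval() treats the text as JavaScript source. Wrapping it in parentheses
// makes the braces parse as an object literal rather than a block.
const viaEval = eval("(" + text + ")");

// JSON.parse() accepts only JSON data and never executes code.
const viaParse = JSON.parse(text);

console.log(viaEval.values.length, viaParse.values.length); // prints "5 5"

// The flip side: eval() will run anything, not just data. Here the text
// sets a global flag before producing an object.
const sneaky = 'globalThis.ran = true, {"id": 8}';
const result = eval("(" + sneaky + ")");
console.log(globalThis.ran, result.id); // prints "true 8"

// JSON.parse() rejects the same string with a SyntaxError.
let rejected = false;
try {
  JSON.parse(sneaky);
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // prints "true"
```

That same property is what lets a server ship code to clients, and it is why a client should only run work units from a server it trusts.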

Imagine needing to do a fairly intensive series of computations on a large data set without a cluster to help you. You could have clients connect to a website that you’ve set up and have their browsers download data in JSON format along with executable JavaScript code. The basic JavaScript library can be designed so that a work unit is very generic: it can contain any set of data and the functions to perform on that data, along with a finishing AJAX call to upload the results. On the server side, when you have a Map-Reduce operation to perform, you can distribute work units, each containing a section of the data along with the code to execute on it, to any connected clients that have this library loaded and are sending AJAX polling requests asking for work.
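A minimal sketch of what such a generic work unit might look like. The field names (`id`, `data`, `code`) and the helpers `makeWorkUnit` and `runWorkUnit` are hypothetical, not part of any existing library. The envelope is parsed with JSON.parse() and only the function source is compiled, here with `new Function()`, which executes arbitrary code just like eval(). The AJAX upload is left out; `runWorkUnit` simply returns the object that would be sent.

```javascript
// Hypothetical work-unit format: an ID, a slice of data, and the source
// text of a function to apply to that data.
function makeWorkUnit(id, data, fn) {
  return JSON.stringify({ id: id, data: data, code: fn.toString() });
}

// Client side: turn the reply text back into a callable function and run it.
// new Function(), like eval(), executes whatever source it is given, so the
// client must trust the server that sent the work unit.
function runWorkUnit(replyText) {
  const unit = JSON.parse(replyText);
  const work = new Function("return (" + unit.code + ");")();
  // The returned object is what the finishing AJAX call would upload.
  return { id: unit.id, result: work(unit.data) };
}

// Example: a "map" step that squares every number in its slice.
const reply = makeWorkUnit("unit-1", [1, 2, 3, 4], function (xs) {
  return xs.map(function (x) { return x * x; });
});
const upload = runWorkUnit(reply);
console.log(JSON.stringify(upload)); // {"id":"unit-1","result":[1,4,9,16]}
```

One catch with shipping functions as source text: `toString()` captures only the function's own text, so the function can't rely on variables from the scope it was defined in. Everything it needs has to be in its arguments or in the library already loaded on the client.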

A work unit gets serialized into a JSON string and sent back as the AJAX reply. The client then evaluates the reply, which calls a JavaScript function that processes the data (most likely a map job). Once the data is processed, the JavaScript automatically makes another AJAX call to send the result data back, probably tagged with a work unit ID so the server can keep track of it, and requests the next work unit. The server then just coordinates between the various clients, setting a timeout for each work unit so that if someone closes their browser, the job can be handed out to another active client or to a back-end server.
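The server-side coordination can be sketched as a small in-memory queue with leases. This is an assumption about how one might do it, not a description of an existing system: `Coordinator`, `requestWork`, `submit`, and the lease length are all hypothetical, and time is passed in as a number of milliseconds instead of using real timers so the logic can run without any clients. The final `reduce` call stands in for the reduce phase.

```javascript
// Hypothetical server-side bookkeeping with a fixed lease time per unit.
class Coordinator {
  constructor(units, leaseMs) {
    this.pending = units.slice(); // work units not yet handed out
    this.leased = new Map();      // id -> { unit, expires }
    this.results = new Map();     // id -> uploaded result
    this.leaseMs = leaseMs;
  }

  // Put any unit whose lease has expired (e.g. the browser was closed)
  // back on the queue so another client can pick it up.
  reclaimExpired(now) {
    for (const [id, lease] of this.leased) {
      if (lease.expires <= now) {
        this.leased.delete(id);
        this.pending.push(lease.unit);
      }
    }
  }

  // Answer a client's polling request with the next unit, or null.
  requestWork(now) {
    this.reclaimExpired(now);
    const unit = this.pending.shift();
    if (unit === undefined) return null;
    this.leased.set(unit.id, { unit: unit, expires: now + this.leaseMs });
    return unit;
  }

  // Accept the first uploaded result for a unit; later duplicates are ignored.
  submit(id, result) {
    if (this.results.has(id)) return false;
    this.leased.delete(id);
    // The unit may have been reclaimed and re-queued; drop that copy too.
    this.pending = this.pending.filter((u) => u.id !== id);
    this.results.set(id, result);
    return true;
  }

  done() {
    return this.pending.length === 0 && this.leased.size === 0;
  }
}

const c = new Coordinator([{ id: "a", data: [1, 2] }, { id: "b", data: [3, 4] }], 30000);
c.requestWork(0);                    // client 1 takes "a", then closes its browser
const second = c.requestWork(0);     // client 2 takes "b"
c.submit(second.id, 7);              // client 2 uploads the sum of its slice
const retry = c.requestWork(30000);  // lease on "a" expired, so it is handed out again
c.submit(retry.id, 3);
// Reduce step: combine the per-unit results once everything is in.
const total = [...c.results.values()].reduce((s, r) => s + r, 0);
console.log(retry.id, c.done(), total); // prints "a true 10"
```

Accepting only the first upload for each ID keeps the reassignment harmless: if the original client turns out to be slow rather than gone, whichever copy finishes second is simply ignored.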

This will work a lot better for CPU-intensive processes than for memory-intensive ones. For example, sorting a large data set requires some CPU, but far more memory, because every element being sorted must be held in memory. Sending entire large lists to clients to sort and receiving the results would be slow due to bandwidth and latency constraints. However, there could be a benefit for large computations on a relatively small amount of data, such as what’s done with SETI@home or brute-force cryptography circumvention, where clients can send heartbeats of partial results back.
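As a sketch of the CPU-heavy, data-light case, here is a brute-force work unit that searches a range of candidate divisors for a factor of a number, calling `onHeartbeat` every so often with its progress. The unit carries only a few numbers yet can keep a client busy for a while. The unit shape, `searchFactor`, and the heartbeat callback are hypothetical; in a real deployment the callback would make the AJAX call.

```javascript
// Hypothetical CPU-bound work unit: look for a divisor of n in [start, end).
function searchFactor(unit, onHeartbeat) {
  let checked = 0;
  for (let d = unit.start; d < unit.end; d++) {
    if (unit.n % d === 0) {
      return { id: unit.id, factor: d, checked: checked + 1 };
    }
    checked++;
    // Periodically report partial progress so the server knows the client
    // is alive and where the search could resume.
    if (checked % unit.heartbeatEvery === 0) {
      onHeartbeat({ id: unit.id, checked: checked, next: d + 1 });
    }
  }
  return { id: unit.id, factor: null, checked: checked };
}

// 999983 and 1000003 are both prime, so their product has exactly one
// divisor in the range searched here.
const heartbeats = [];
const found = searchFactor(
  { id: "range-9", n: 999983 * 1000003, start: 900000, end: 1000000, heartbeatEvery: 25000 },
  (beat) => heartbeats.push(beat)
);
console.log(found.factor, heartbeats.length); // prints "999983 3"
```

If the client disappears mid-unit, the last heartbeat’s `next` value tells the server where a replacement client could resume instead of starting the range over.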

The limits, of course, will be how much memory your browser can allocate to JavaScript. Also, since this technique focuses on heavy computational functions, users will probably notice a fairly large hit on browsing speed in browsers that aren’t multithreaded or multiprocess. Naturally, from a non-scientific point of view, most people would be outraged if their computer’s resources were being used without their knowledge. However, for applications working on an internal network, or applications that state their intentions up front, users might be interested in leaving a browser open to a grid service to add their power to the system. This would probably be easier to encourage if each work unit were worth a set number of points and users had a running score, like a game.
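The points idea needs very little server-side state. A hypothetical sketch, assuming each client has some stable ID and the server decides how many points a completed unit is worth; `Scoreboard`, `award`, and `leaders` are illustrative names only.

```javascript
// Hypothetical scoring: each completed work unit carries a point value,
// and the server keeps a running total per volunteer.
class Scoreboard {
  constructor() {
    this.scores = new Map(); // clientId -> points
  }

  // Add points for a completed unit and return the client's new total.
  award(clientId, points) {
    const total = (this.scores.get(clientId) || 0) + points;
    this.scores.set(clientId, total);
    return total;
  }

  // Highest score first; ties broken by client ID so the order is stable.
  leaders(limit) {
    return [...this.scores.entries()]
      .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
      .slice(0, limit)
      .map(([id, points]) => ({ id: id, points: points }));
  }
}

const board = new Scoreboard();
board.award("alice", 10);
board.award("bob", 25);
board.award("alice", 20);
console.log(JSON.stringify(board.leaders(2))); // [{"id":"alice","points":30},{"id":"bob","points":25}]
```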