Legion: Build your own virtual super computer with Silverlight

Legion is a grid computing framework that uses the Silverlight CLR to execute user definable tasks. It provides grid-wide thread-safe operations for web clients. Client performance metrics, such as bandwidth and processor speed, may be used to tailor jobs. It also includes a WPF Manager application.

Introduction

I have always been a fan of distributed computing. With the advent of Silverlight, I felt there was an exciting opportunity to create a Grid computing framework utilizing the client-side Silverlight CLR. That is how Legion emerged.

Legion is a Grid Computing framework that uses the Silverlight CLR to execute user definable tasks. Legion uses an ASP.NET application and web services to download tasks, upload result data, and provide grid-wide thread-safe operations for web clients or agents. Multiple tasks can be hosted at once, with Legion managing the delegation of tasks to agents. Client performance metrics, such as bandwidth and processor speed, may be used to tailor jobs for clients. Legion provides a management service and WPF application that is used to monitor the Legion grid.

I have deployed Legion to a demonstration server here so you can see it in action.

Figure: Legion system overview.

Background

While Grid computing is not new, there has been a resurgence of interest in it, due mainly to organisations looking to capitalise on their underutilised IT infrastructure to perform computationally intensive, business-related tasks.

Aside from large organisations jumping on the bandwagon, public interest in Grid computing has of course been stimulated by large scientific projects, such as the Seti@home project. After performing some initial research for this article, I became enthused by what a volunteer computing grid is capable of achieving, and by what a framework like Legion might make possible. The results of some volunteer Grids have been remarkable. For example, the Smallpox Research Grid Project managed to find 44 strong treatment candidates for smallpox; an, as yet, incurable disease. Vaccination for smallpox ended in the 1980s, but there are fears that the virus may re-emerge. (Wikipedia, 2007; The Inquirer, 2003)

The allure of harnessing unused computing power is strong, and so too is the notion of many users contributing otherwise wasted clock cycles for the benefit of humanity. One reason why so many Grid projects have garnered so many participants is the feeling of belonging to a shared mission: it's free, and users gain a feeling of contributing to a good cause just by running an application in the background on their computer.

Could it be that we are on the cusp of a web paradigm shift, where web browsers share the load of the web server? It would open the door to smaller start-ups with little investment in infrastructure, providing near unlimited capacity to serve any number of clients; a capability that has traditionally been the domain of monolithic companies with massive data centres. What would that do? Well, it would radically transform the web, providing a level of democratisation never before seen. Taking this idea further, imagine, if you will, what a peer-to-peer WebTorrent protocol might bring: web servers and web browsers alike serving as seeds and distributing the load. We are venturing outside the scope of the article a little, but I raise these ideas to encourage in the reader a feeling of opportunity. For that is what having a CLR in the browser now affords us: an opportunity, not only to create a nice UI, but also to move to a less server-oriented application model; not merely a thin-client model, but rather a distributed processing model.

I believe security to be the main challenge in providing client-centric, peer-to-peer volunteer computing. Protecting a user's privacy is paramount, and it's risky sending sensitive information, which may be eavesdropped on or modified by some rogue client. Likewise, trusting the results from an inherently untrustworthy source presents an issue. Clearly, mechanisms are required to address these issues.

Likewise, providing secure cross-domain browser support will bring a lot of opportunity to decentralise and share services more readily between organisations. Microsoft and others have proposed some techniques. Yet it was inevitable that doing so within existing browser capabilities would require an assortment of hacks, such as using IFrames to communicate across domains. While preventing cross-domain browser communication seems reasonable, with today's ubiquity of web services we can quite easily send client data from server to server, and thus across domains, which effectively circumvents the protection mechanism anyway.

Unlike other distributed computing projects, Legion allows users to participate by simply viewing a webpage. The shift to cloud computing and web-based applications may have fostered a growing reluctance in users to download and install applications. Forcing users to download software to participate, while improving the dependability of the grid (we'll discuss the pitfalls of a browser-based Grid later), must surely decrease the overall participation rate. Users prefer not to install software; it's a hassle.

Legion, on the other hand, allows us to bring new 'volunteers' to our grid by providing enticing content. Thus, we are no longer dependent on a single motivating cause to attract participation. It provides developers with a system that will deploy and execute modules written in any .NET Desktop/Silverlight CLR compatible language. The beauty of Legion is that it does not require the user to install any Legion software; it relies solely on the Silverlight CLR in the user's browser.

I've heard many comparisons between Silverlight, Flash, Java applets, etc. Most comparisons seem to revolve around what an old project manager of mine called "pointy clicky things"; pure UI. Well, in this article my aim is to take the focus away from UI alone, and to look more at what having a .NET CLR in the browser might mean.

Long live Clog

Clog, unbeknownst to me, was a prelude to this project, and turned out to be invaluable in its development. Without Clog for Silverlight, working on the concurrency elements of this project would have been extremely difficult. I have, as it were, been eating my own dog food. It made debugging with multiple clients a breeze. Using the Log4NetLogStrategy, I had logging coming in from the website, the Agents, and even the WPF Manager at the same time. It would have been next to impossible to debug multiple clients executing concurrently otherwise, and using trace statements would have been awful. Clog is pretty cool, even if I do say so myself. I've made several important updates to its code base, which I will be publishing soon.

Legion Agent UI

On the client-side, Legion is hosted within a Silverlight module that automatically controls the downloading and execution of tasks.

Figure: Legion Agent pictured in browser.

The Agent visualizer that is provided may be customized to suit. I may provide a more discreet interface at a later stage. As can be seen, the Silverlight module is hosted on an HTML page, and will happily share this space with other content; be it HTML or perhaps some other interactive Silverlight animation to keep the user enthralled and the browser open.

The interface displays the task name (as defined in the server-side config), the number of tasks completed (stored on the client's machine in Isolated Storage), and the percentage complete of the currently executing task.

Legion Manager

The Legion Manager is a WPF application that allows monitoring of the grid. It provides a summary of the progress of all tasks, and of the current grid capacity, including available processing power in megaflops and total bandwidth for all connected clients.

Figure: Legion Grid Manager.

Components Overview

Legion consists of three main components. The GridComputing and GridComputing.Management components use the Desktop CLR. The third, GridComputing.Silverlight, is Legion's main client-side component, and is executed within the Silverlight CLR. The main types from each component are shown below.

Figure: Legion component overview.

The WPF Manager is a WPF application that communicates with the server-side GridComputing module to monitor the Legion grid. It consumes GridSummary instances that are created by the GridManager.

Agent and Client instances are passed to the grid web services as part of each request from the management and agent components; and they encapsulate the location data and performance metrics of callers.

The MasterTask and SlaveTask provide the respective server and client task implementations. Jobs are created by a MasterTask and dished out to SlaveTasks by the GridManager.

The GridMonitor is a tool that provides mutual exclusivity for Agents. A GridSync encapsulates the identity of a code region, and the GridMonitor uses a web service to request exclusive access to that region.

Agents communicate with Legion on the server-side using the JsonGridService. When working with Silverlight 1.1, we use JSON (JavaScript Object Notation). You can find a little more info on working with Silverlight and web services in one of my previous articles.

Figure: JsonGridService class diagram.

The Grid service is mostly a wrapper for the GridManager with some additional error handling thrown in. The following is a brief overview of the JsonGridService methods:

Register: Lets the server know that the calling Agent is ready to be connected to the network, and that it is available for new tasks. Returns a unique identifier that will be used in future calls to identify the Agent.

StartNewTask: Retrieves a TaskDescriptor containing information regarding a new SlaveTask and the Job data required to run the task. This is called periodically by the Agent on the completion of each task.

UpdateJobProgress: Lets the server know where the client is up to in its processing of a SlaveTask.

JoinTask: Called when the SlaveTask completes on the client-side. This allows the result of the task to be processed by the MasterTask and joined with other results from other clients. This is the endpoint for a client's SlaveTask.

Disconnect: Lets the server know that the Agent will no longer be processing tasks.

Download100KB: Used when measuring client bandwidth. It simply returns a large object that brings the response message size up to approximately 100 KB.

LockEnter, LockUpdate, LockExit: Used to provide mutual exclusion for client-side code blocks. When an Agent has exclusive access, it periodically polls using the LockUpdate method to inform the LockManager that it is still active (that the client browser has not been closed).
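Taken together, these methods imply a simple agent lifecycle: register, repeatedly fetch and process tasks, report progress, join results, and finally disconnect. The following Python sketch illustrates that call sequence against a stubbed, in-process service; all names and signatures here are illustrative, not Legion's actual API:

```python
import uuid

class StubGridService:
    """Illustrative, in-process stand-in for the JsonGridService endpoints."""
    def __init__(self, jobs):
        self.jobs = list(jobs)     # pending (lo, hi) ranges to hand out
        self.results = []          # joined results from agents

    def register(self):
        return str(uuid.uuid4())   # unique identifier for the Agent

    def start_new_task(self, agent_id):
        return self.jobs.pop(0) if self.jobs else None

    def update_job_progress(self, agent_id, percent):
        pass                       # server would record the progress

    def join_task(self, agent_id, result):
        self.results.append(result)

    def disconnect(self, agent_id):
        pass

def run_agent(service):
    """Register, process tasks until none remain, then disconnect."""
    agent_id = service.register()
    while True:
        job = service.start_new_task(agent_id)
        if job is None:
            break                  # idle: nothing left to do
        lo, hi = job
        service.update_job_progress(agent_id, 50)
        service.join_task(agent_id, sum(range(lo, hi)))  # toy "work"
        service.update_job_progress(agent_id, 100)
    service.disconnect(agent_id)

service = StubGridService(jobs=[(0, 10), (10, 20)])
run_agent(service)
print(service.results)  # → [45, 145]
```

The real Agent performs the same loop asynchronously, and adds the lock and keep-alive calls when mutual exclusion is needed.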

Grid Tasks

When creating a new Legion task, we create a server-side master task, and a client-side slave task. A custom task is created by extending the MasterTask class in the main GridComputing module, and by extending the SlaveTask class in the GridComputing.Silverlight module. The role of the MasterTask is to split the grid computing activity into bite size chunks, and hand these off to client-side Agents. The client-side SlaveTask is the point of concurrency; this is where we place the code that we want executed by multiple clients. When a SlaveTask completes its work, the result is sent back to the MasterTask to be merged with the results of other SlaveTasks.

Figure: MasterTask class diagram.

The client-side static TaskRunner is responsible for the instantiation, asynchronous execution, and provision of Job data for the SlaveTask. The TaskRunner is also the main point of interaction between the Silverlight UI and Legion.

In order to execute a client-side task (a SlaveTask) we require the SlaveTaskType name, and the Job data. The Job data is created by the MasterTask, while the TypeName is defined in the server-side config as slaveType. The TaskDescriptor and child Job instance are returned from the JsonGridService when the Agent makes a call to JsonGridService.StartNewTask(Agent agent).

Figure: TaskDescriptor and Job class diagram.

Configuration

Legion uses a configSection to define the tasks to be run. First we declare the section within the configSections element, as shown in the following excerpt.
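The original excerpt is not reproduced here, but such a section might look roughly like the following. Apart from the slaveType attribute, which the article describes, the element names, handler type, and task names are assumptions for illustration:

```xml
<configuration>
  <configSections>
    <!-- Illustrative names: only the slaveType attribute is described in the article. -->
    <section name="gridTasks" type="GridComputing.TasksConfigSectionHandler, GridComputing" />
  </configSections>
  <gridTasks>
    <task name="PrimeFinder"
          type="PrimeFinderTask.PrimeFinderMaster, PrimeFinderTask"
          slaveType="PrimeFinderTask.PrimeFinderSlave, PrimeFinderTask.Silverlight, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null" />
  </gridTasks>
</configuration>
```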

It is important to note that the slaveType attribute of each task must include the full strong name of the assembly. If we omit any part, such as the Version or Culture attributes, Silverlight will be unable to resolve the download path for the assembly.

Agent Performance Metrics

To measure the connection bandwidth we merely call a web service that returns a result message that is approximately 100 kilobytes. We do this once when the connection is established.
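The idea can be sketched in a few lines of Python; the download callable stands in for the Download100KB service call, and the timing logic is the whole trick:

```python
import time

def measure_bandwidth_kbps(download, payload_bytes=100 * 1024):
    """Estimate bandwidth by timing a single fixed-size download.

    `download` stands in for the Download100KB service call; it should
    return roughly `payload_bytes` of data.
    """
    start = time.perf_counter()
    data = download()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard tiny intervals
    return (len(data) / 1024.0) / elapsed             # KB per second

# In-memory "download" just to exercise the function; a real call would
# go over the wire, which is where the measurement becomes meaningful.
rate = measure_bandwidth_kbps(lambda: b"x" * (100 * 1024))
print(rate > 0)  # → True
```

As the article notes, one sample per client gives only a rough estimate; averaging several samples would improve accuracy at the cost of extra traffic.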

I used Fiddler to throttle the download rate by setting it to Simulate modem speeds in order to test the bandwidth measuring.

Figure: Setting Fiddler to simulate modem speeds.

We can see in the next capture that, following the throttling of the download rate, the result is 6 KB/s, which is about right. The accuracy is sufficient for our purposes. Of course we are only running the test once for each client, and so we must take the result as being just a rough estimate.

Figure: Debugging Agent with modem speed simulation.

To measure the client's processor speed, we have the client execute some floating point operations.
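A minimal sketch of such a benchmark in Python: run multiply-adds for a fixed interval and count the operations. The exact operation mix and interval below are illustrative choices, not Legion's:

```python
import time

def measure_mflops(duration=0.05):
    """Very rough floating-point throughput estimate: count
    multiply-add operations completed within a fixed interval.
    As in the article, one measurement per client, treated as
    an estimate only."""
    ops = 0
    x = 1.0000001
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        for _ in range(10_000):
            x = x * 1.0000001 + 0.0000001   # 2 floating-point ops
        ops += 20_000
    return ops / duration / 1_000_000       # megaflops

print(measure_mflops() > 0)  # → True
```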

When we prepare a job for an Agent in the server-side MasterTask we are then able to utilise the performance metrics in order to tailor a job best befitting the client. For example, if the client is on a slow connection or is unable to process a lot of operations, we can reduce the job size that we send to it.
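As a sketch of that policy in Python, we might scale a base job size by the client's metrics; the thresholds and the linear scaling rule below are assumptions for illustration, not Legion's actual policy:

```python
def tailor_job_size(base_size, bandwidth_kbps, mflops,
                    full_bandwidth=50.0, full_mflops=10.0):
    """Shrink a job for slow clients: scale linearly by whichever
    metric falls furthest below its 'full size' threshold.
    Thresholds and the scaling rule are illustrative assumptions."""
    scale = min(1.0,
                bandwidth_kbps / full_bandwidth,
                mflops / full_mflops)
    return max(1, int(base_size * scale))   # never hand out an empty job

print(tailor_job_size(1000, bandwidth_kbps=100, mflops=50))  # → 1000
print(tailor_job_size(1000, bandwidth_kbps=6, mflops=50))    # → 120
```

A fast client gets the full range; the throttled 6 KB/s client from the Fiddler test above would get a correspondingly smaller slice.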

Figure: Job tailored for Agent using performance metrics.

We can also make use of various other metrics, and client information. This is passed in the form of an IAgent instance to the MasterTask.GetRunData method.

Figure: IClient and IAgent interfaces. A MasterTask is able to use the client information to tailor a job for the Agent.

Prime Finder Task

Legion includes two demo tasks: Prime Finder, and Mutex Example. The first, Prime Finder, searches for prime numbers within a specified range. Each Agent is designated a range in which to test each number within the range for primality. When complete, the client-side SlaveTask returns the list of primes that were located, and the MasterTask inserts the results into a composite list. It's pretty simple.
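A Python sketch of the idea: the slave-side function tests a range for primality, and the master-side loop merges each range's result into the composite list (this sketch uses a square-root bound for trial division):

```python
def find_primes_in_range(lo, hi):
    """Slave-side work: test each number in [lo, hi) for primality
    by trial division up to sqrt(n)."""
    primes = []
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            primes.append(n)
    return primes

# Master-side join: merge each slave's result into a composite list.
composite = []
for lo, hi in [(2, 20), (20, 40)]:   # ranges handed to two Agents
    composite.extend(find_primes_in_range(lo, hi))
print(composite)  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
```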

Once the MasterTask has finished handing out search ranges to Agents, it continues to redispatch unjoined jobs until results are obtained for all of them.

Agent Task Execution

The SlaveTask is instantiated on the client-side using information present in the TaskDescriptor instance passed over the wire from the server-side GridManager. The TaskDescriptor contains a Job that is executed by the SlaveTask. The client-side TaskRunner is responsible for seeing that the task is downloaded and instantiated, and it is the TaskRunner that oversees the running of the SlaveTask.

Figure: Task runner fetching a job.

The process of executing a task is outlined in the next diagram in more detail. Here we see that when the Silverlight Agent begins execution in a browser, it automatically initiates a client side task. The Agent registers with the Legion server, downloads the TaskDescriptor and begins work on the job.

Figure: Task execution.

When the client-side SlaveTask is complete, the result is merged by the MasterTask, and the TaskRunner requests a new task. Following the completion of all MasterTasks, a Job with its Enabled property set to false is passed back to the Agent, indicating that there is nothing left to do. At this point the Agent enters an idle state, yet it will continue to periodically request tasks from the server in case new tasks become available.

Client-side Mutual Exclusion

In order to allow SlaveTasks to work safely with shared data, we require a mechanism for mutual exclusivity. Just as .NET provides various types, including the Monitor class, for thread concurrency, I have created the GridMonitor class. With the GridMonitor we are able to safely access code regions asynchronously. The GridMonitor class uses the JsonGridService to control access to regions that are identified using GridSync instances. Each GridSync identifies one or more code regions that must allow only one client access at a time.

Figure: Silverlight Agent module, GridMonitor class diagram.

When an Agent wishes to gain exclusive access to a code block, it must make a request to the server-side LockManager using a GridMonitor in conjunction with a GridSync. The purpose of the GridSync is to identify the region, or regions, of code to be synchronized. The ScopeTypeName property of the GridSync forms part of the identity of the GridSync, and will normally be the type of the class in which the GridSync is instantiated. The LocalName forms the second part of the identity, and is a name relative to the ScopeTypeName. The ScopeTypeName/LocalName combination should generally be unique; otherwise deadlocks may ensue. It's analogous to using the lock (objectInstance) {...} syntax, and behaves in the same manner. Once a Client has ownership of a lock, it may execute the code block from whatever thread it likes. Thus, multithreading on the client still requires caution when using a GridMonitor, and should employ the standard mutual exclusion techniques using Monitors etc. from the FCL.

Figure: Agent denied ownership of lock. GridMonitor will block until ownership is granted, or timeout occurs.

When using the GridMonitor, we wrap it in a using statement. When the executing thread leaves the using block, the GridMonitor is disposed, and will attempt to release the lock by calling its own Exit method. This produces an asynchronous call to the server-side LockManager, automatically relinquishing ownership of the lock.
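The using-block discipline maps naturally onto a context manager. The following Python sketch mirrors the enter/exit behaviour against an in-process lock manager stub; it omits the keep-alive polling, and all names are illustrative, not Legion's API:

```python
class StubLockManager:
    """In-process stand-in for the server-side LockManager."""
    def __init__(self):
        self.owners = {}                      # sync id -> owning agent id

    def enter(self, sync_id, agent_id):
        if self.owners.get(sync_id) not in (None, agent_id):
            return False                      # ownership denied
        self.owners[sync_id] = agent_id
        return True

    def exit(self, sync_id, agent_id):
        if self.owners.get(sync_id) == agent_id:
            del self.owners[sync_id]          # relinquish ownership

class GridMonitorSketch:
    """Mimics GridMonitor: acquire on enter, release on dispose."""
    def __init__(self, manager, sync_id, agent_id):
        self.manager, self.sync_id, self.agent_id = manager, sync_id, agent_id

    def __enter__(self):
        if not self.manager.enter(self.sync_id, self.agent_id):
            raise TimeoutError("lock not granted")
        return self

    def __exit__(self, *exc):
        # Leaving the block releases the lock, like Dispose calling Exit.
        self.manager.exit(self.sync_id, self.agent_id)

manager = StubLockManager()
with GridMonitorSketch(manager, "MutexExampleSlave/counter", "agent-1"):
    held_by = manager.owners["MutexExampleSlave/counter"]
print(held_by, manager.owners)  # → agent-1 {}
```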

While the GridMonitor has ownership of the lock, it will poll the server-side LockManager periodically, to let it know that the Agent is still alive, and that ownership of the lock should still remain with that Agent.

Figure: LockManager class diagram.

The following excerpt from the MutexExampleSlave task demonstrates how to create a mutually exclusive block using a GridMonitor.

The MutexExampleMaster task does very little apart from incrementing a task counter and dishing out ids for demonstration purposes on the client-side. The code is provided here for good measure.

/// <summary>
/// This is the server side implementation for a task
/// that demonstrates the use of the GridMonitor.
/// </summary>
class MutexExampleMaster : MasterTask
{
    long taskCounter;
    const int runTimes = 100;

    public MutexExampleMaster()
    {
        StepsGoal = runTimes;
    }

    /// <summary>
    /// Gets the run data for the <see cref="Agent"/> slave task.
    /// This data should encapsulate the task segment
    /// that will be worked on by the slave. <seealso cref="Job"/>
    /// </summary>
    /// <param name="agent">The agent requesting the run data.</param>
    /// <returns>The job for the agent to work on.</returns>
    public override Job GetRunData(IAgent agent)
    {
        Job job = new Job(++taskCounter);
        job.CustomData = agent.Id;
        if (taskCounter >= runTimes)
        {
            throw new InvalidOperationException("Task is complete.");
        }
        StepsCompleted = taskCounter;
        return job;
    }

    /// <summary>
    /// Joins the specified task result. This is called
    /// when a slave task completes its <see cref="Job"/>,
    /// after having called <see cref="GetRunData"/>;
    /// returning the results to be integrated
    /// by the associated <see cref="MasterTask"/>.
    /// </summary>
    /// <param name="taskResult">The task result.</param>
    public override void Join(TaskResult taskResult)
    {
        if (taskCounter > 100)
        {
            Completed = true;
        }
    }
}

Expiring Collections

The most difficult challenge in creating a volunteer Grid computing platform that uses web clients is the volatility of the network, and the transient nature of the clients. At no time can we ever be certain that a node is still connected. When a client connects, we assign it a connected window: a timespan within which the client must call home to be deemed still alive. If we are engaging in a concurrent activity, we must retake ownership of any locks held by a client that does not call home within the window, lest a deadlock occur. Also, if we don't free up the data relating to a connection, then we will have a memory leak. We therefore periodically time out Agents if they fail to call home. In order to accomplish this, I chose to implement a self-contained mechanism for expiring client data. This comes in the form of two expiring collections: an ExpiringDictionary and an ExpiringQueue. As it turned out, they proved to be very useful.

Figure: ExpiringDictionary and ExpiringQueue collections.

Both work the same as their counterparts in the System.Collections.Generic namespace, but with the addition of a timer, and a facility to touch items to renew their insertion date stamps. Both use an internal timer to eject expired items from their internal collections.
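The mechanics can be sketched in Python. The real collections eject items with an internal timer; for brevity this sketch evicts lazily on access, and uses an injectable clock so the expiry is deterministic. All names here are illustrative:

```python
import time

class ExpiringDictionarySketch:
    """Items carry an insertion timestamp; `touch` renews it, and
    stale items are ejected. The real ExpiringDictionary uses an
    internal timer; this sketch evicts lazily, on access."""
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl, self.clock = ttl_seconds, clock
        self._items = {}                     # key -> (value, stamp)

    def __setitem__(self, key, value):
        self._items[key] = (value, self.clock())

    def touch(self, key):
        value, _ = self._items[key]          # renew the date stamp
        self._items[key] = (value, self.clock())

    def _evict(self):
        now = self.clock()
        for key in [k for k, (_, t) in self._items.items()
                    if now - t > self.ttl]:
            del self._items[key]             # Agent failed to call home

    def __contains__(self, key):
        self._evict()
        return key in self._items

# A fake clock makes the expiry deterministic.
t = [0.0]
agents = ExpiringDictionarySketch(ttl_seconds=30, clock=lambda: t[0])
agents["agent-1"] = "job-state"
t[0] = 20; agents.touch("agent-1")           # called home: stamp renewed
t[0] = 45; print("agent-1" in agents)        # → True (45 - 20 < 30)
t[0] = 60; print("agent-1" in agents)        # → False (expired)
```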

Figure: ExpiringDictionary. Items removed periodically by timer.

The ExpiringDictionary is used by the GridManager to keep track of connected Agents and their Jobs' progress, and also by the LockManager to associate locks with their respective Agents. The ExpiringQueue is also used by the LockManager, which maintains an ExpiringDictionary of LockSync and ExpiringQueue pairs. We associate a LockSync with a queue of Agents, thereby periodically expiring both all requests for a given code block and individual requests for code blocks. This minimises the amount of lock request information that we need to maintain. For example, if a MasterTask completes, then we can forget about all locks associated with its SlaveTask. Or if a user closes his or her browser after having made a call to GridMonitor.Enter, then we can safely remove the request, and free up the lock, after a specified time.

SlaveTasks are executed client-side on a low priority thread, and this is overseen by the TaskRunner.

Saving and retrieving data using the Isolated Store

Legion stores the number of tasks that have been processed by a user, using the Isolated Store. The Isolated Store is a virtual file system that is specific to each user. Thus, with multiple concurrent browsers for the same user, within the same domain, we are able to share data. I have provided a static IsolatedStorageUtility class to aid in storing and retrieving application settings and data. Unfortunately, I was not able to provide all the functionality that I had hoped, because there isn't a Safe Critical facility to serialize objects as binary, XML, or JSON. Once there is, perhaps in Silverlight 2.0, I will come back to it.

How to write a Grid Task

So we've seen how Legion works internally, but what is probably of more interest to you, is how to create your own custom task. All we need to do is extend two classes, and add the task to the config. The two example tasks I have included with this release will provide you with the essentials.

First, create a new regular .NET class library project, and add a reference to the main GridComputing module. In it, create a new class that extends MasterTask, and provide overrides for the GetRunData and Join methods. GetRunData is responsible for providing a client-side Agent with what it needs to run a task. The Join method takes the output from a SlaveTask and combines it with the output of other SlaveTasks.

Next, create a new Silverlight class library project, and add a reference to the SilverlightAgent project. This will contain the client-side code for the task. Create a new class that extends SlaveTask. Include a Post Build Event for the slave project that copies the built assembly to your website's ClientBin directory. It is not necessary to reference the Silverlight slave task project from the SilverlightAgent project; in fact, I would recommend that you do not. The assemblies must simply be copied to the website's ClientBin directory, and Silverlight will take care of downloading them to the browsers.
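Such a post-build event might look like the following; the relative path to the website is an assumption, so adjust it to your solution layout:

```bat
xcopy /Y "$(TargetPath)" "$(SolutionDir)Website\ClientBin\"
```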

Finally, define a new config element for the Task in the web.config of the website. This should include the fully qualified type name of the SlaveTask.

Future enhancements

Provide a uniform shared storage facility for Agents.

Implement half or full duplex channelling.

Conclusion

Web based Volunteer Grid computing offers a tremendous opportunity to harness the unused resources of visitors to a web site. We are able to leverage this resource in the form of a virtual super computer, and it affords us the opportunity to provide services in a manner outside of the traditional client-server model. This article discussed the design and use of Legion, a Grid Computing framework that uses the Silverlight CLR to execute user definable tasks. We saw how Legion uses an ASP.NET application and web services to download tasks, upload result data, and provide grid-wide thread-safe operations for agents. We also looked at how client performance metrics, such as bandwidth and processor speed, may be used to tailor the workload of Agents. It is my hope that this article will help to further interest in Grid computing, and also awareness of Silverlight as more than just a tool for UI.

I hope you find this project useful. If so, then I'd appreciate it if you would rate it and/or leave feedback below. This will help me to make my next article better.

Special Thanks

To Andre de Cavaignac of decav.com and LAB 49 for kindly allowing the use of his excellent WPF Line Graph.

About the Author

Daniel Vaughan is a Microsoft MVP and co-founder of Outcoder, a Swiss software and consulting company dedicated to creating best-of-breed user experiences and leading-edge back-end solutions, using the Microsoft stack of technologies--in particular WPF, WinRT, Windows Phone, and also Xamarin.Forms.

Daniel is the author of Windows Phone 8 Unleashed and Windows Phone 7.5 Unleashed, both published by SAMS.

Daniel is the developer behind several acclaimed Windows Phone apps including Surfy, Intellicam, and Splashbox; and is the creator of a number of popular open-source projects including Calcium SDK, and Clog.

Would you like Daniel to bring value to your organisation? Please contact

Comments and Discussions

Very cool program, even if I don't have VS 2008. Looking at the PrimeFinder code, I noticed that you oopsed a little by going up to condition / 2 instead of sqrt(condition). A minor glitch in the grand scheme of things.

5!


The Silverlight CLR delegates thread scheduling to the operating system. So it is the OS that determines on which processor threads get to run. This is done, via SecurityCritical code, through native APIs, and is not immediately accessible to user (Transparent) code.
For example, we can see that the internal QueueUserWorkItem of the ThreadPool is marked as extern.

Indeed, the figure of the IClient and IAgent interfaces does list ProcessorCount as a property, but the Silverlight mscorlib property Environment.ProcessorCount is SecurityCritical, so as it turned out I couldn’t use it. I left it in with a comment, because I thought it might be something worth coming back to. On further consideration, however, I can’t see how it could be all that useful in tailoring Job size etc.

At any one time, a single SlaveTask executes within the client (Legion Agent), on its own low priority thread. Thus, we are not so interested in parallel execution on the client. But, of course, we can still use all the asynchronous mechanisms provided by the FCL for that.

This is a _great_ article well presented. Using Silverlight for something beyond presentation, like this, is an excellent example of thinking outside the box.

Now that you have the code-base in place, have you thought about how to put it to use? It seems to me that one minor problem with the SETI screensaver project is that you are never sure of what the project is actually doing. You know the system is cranking away -- but for all you know, it could simply be stuck in an infinite loop.

Are there finite projects that your grid computing platform can be put to? Also, is it always efficient, or is there a critical mass of clients at which point the cost of running the grid is less than the cost of simply running the algorithms being processed by the clients yourself?

The efficiency of the grid really comes down to the task implementation, as there is always scope to tailor jobs that are sent to clients, based on the number of clients, the processing power of the clients, or the load that the server is experiencing. There will be a point of optimal performance; one where the ratio of job size to Agent drop-out rate is minimized.
If it turned out that the Legion server was a bottleneck, then one could conceivably farm out the Legion servers.

Indeed whether one chooses to use a grid approach or not reminds me of that old analogy: Would you prefer to pull a plow with an ox or with a thousand chickens?

I have been looking for a way to contribute to an existing distributed computing project. I’m still looking for one which provides for participation via a web service.

I’ve also had some ideas, such as having clients serve content to other clients via Legion. There are, however, security issues there. But it sounds interesting, doesn’t it?

Another idea I had was to allow volunteers to utilize the grid by dragging and dropping their own modules onto the agent. Thus, not only would you have volunteers ‘working’ for the grid, you’d also have people utilizing the grid for their own projects, without having to setup the Legion server component.

Would you prefer to pull a plow with an ox or with a thousand chickens?

I've never heard this before, but I like it.

Daniel Vaughan wrote:

I’ve also had some ideas such as having clients serve content to other clients via Legion.

Very cool. I can imagine a game where the game gets better performance as more people join. Puts a new spin on "the wisdom of crowds."

Daniel Vaughan wrote:

Another idea I had was to allow volunteers to utilize the grid by dragging and dropping their own modules onto the agent. Thus, not only would you have volunteers ‘working’ for the grid, you’d also have people utilizing the grid for their own projects

Another great idea. People could contribute to projects they like by surrendering a bit of their computer power. It sounds like the basis for an interesting social experiment.

I think you might be sitting on something really transformative, here.

[P.S. I just noticed in your bio that you spend part of your time in Prague. I used to be a short order cook at Joe's Bar in Staro Myesta Malostrana in the early 90's when it first openned. Joe's was a great place back then, though I've heard it's slipped a bit since.]

I've been using Seti for a while, and always wondered about the architecture and code behind it. I think this article will give me a good insight, and maybe even make them rethink the next version of BOINC

I guess one of the main differences in the implementation of Boinc and Seti@home etc., when compared to Legion, is that Legion ‘Agents’ run directly in the web browser. There are pluses and minuses in both scenarios. Having a Desktop application/service running discreetly on a volunteer’s computer allows for a greater level of confidence in nodes remaining connected. But having a web browser based grid means that you don’t need to rely on participants downloading and installing software specifically for your cause. It also opens up the possibility of offloading server-side processing to clients. The sky's the limit!