Introduction

Many computational problems facing financial engineers and the scientific community today require massive amounts of computing power. There are several ways to deliver this computational performance. Symmetric multiprocessing systems or message-passing interfaces have been the standards for meeting high performance computing needs. However, using parallel web services, which are based on standard protocols and run on commodity x86-based servers, may be a simpler and less expensive way to deliver the computing horsepower needed.

This article demonstrates the power of asynchronous web services by explaining how to set up parallel web services with .NET 2.0 to tackle a Monte Carlo simulation that can approximate the value of Pi to several decimal places. Monte Carlo simulations use randomly generated numbers to evaluate complex problems that are not easily solved with more traditional mathematical methods. Monte Carlo methods are used for modeling a wide range of challenging financial problems and physical systems in scientific research.

Estimating Pi (π) using Monte Carlo simulation

The area of a circle is Pi*r^2, where r represents the circle's radius. To estimate the value of Pi, consider the quarter of the circle in the first quadrant, whose area is (Pi*r^2)/4 (see the figure below). The square in which the quarter circle is inscribed has an area of r^2. With the radius set to 1, the quarter circle's area is Pi/4 and the area of the square is simply 1. The ratio of the area of the quarter circle to the area of the square is (Pi/4)/1 = Pi/4.

Consequently, Pi equals 4 times the ratio of the area of the quarter circle to the area of the square. A Monte Carlo simulation can be used to estimate this ratio and therefore approximate the value of Pi. The ratio can be estimated by randomly choosing points (x,y) within the square and keeping track of the number of those points that fall within the quarter circle - those where Sqrt(x^2+y^2)<=1 - versus the total number of points tried. Once a good estimate of the ratio is established, Pi can be estimated by multiplying the ratio by 4. The quality of the Pi estimate is dependent on the number of points used. The more points used, the better the estimate of Pi. Depending on the degree of accuracy required, millions or billions of points may be tried. Distributing billions of point calculations across multiple servers running Monte Carlo web services will parallelize the process and generate accurate results quickly.
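
The procedure above can be sketched in a few lines. This is an illustrative Python sketch rather than the article's C# code; the name estimate_pi and its parameters are assumptions made for the example. It compares x^2 + y^2 against 1 directly, which is equivalent to the Sqrt test because both sides are non-negative.

```python
import random

def estimate_pi(tries, seed):
    """Estimate Pi from `tries` random points in the unit square."""
    rng = random.Random(seed)          # explicit seed makes the run repeatable
    in_circle = 0
    for _ in range(tries):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:       # same test as Sqrt(x^2 + y^2) <= 1
            in_circle += 1
    return 4.0 * in_circle / tries     # Pi = 4 * (quarter circle / square)

print(estimate_pi(100_000, seed=12345))
```

Because Python's generator differs from the .NET one, the same seed will not reproduce the figures quoted elsewhere in this article.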

Understanding web services

A web service is essentially a function or method that is available to other machines over a network. Because web services use standardized interfaces such as the Web Services Description Language (WSDL), SOAP, XML and HTTP that are integrated with development tools like Visual Studio 2005, much of the work required to take advantage of web services is already done. Calling a remote web service from an application is only slightly more difficult than calling a local function.

Calling web services or functions across a network may involve a significant amount of latency. Good parallel web service architectures will minimize the frequency of the calls across the network relative to the amount of processing done by each call. This is usually achieved by designing the main computational loop into the web service and designing the client application to specify the number of iterations needed when calling the web service. This type of design will keep network traffic low relative to the amount of processing performed by the cluster of servers.
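
A rough way to see why the loop belongs inside the web service is to compare time spent on network round trips with time spent computing. The sketch below is a deliberately crude Python model, not a measurement: the function name, the 10 ms latency and the 1e-7 seconds per point are all assumed values chosen for illustration, and the model ignores parallelism entirely.

```python
def network_overhead_fraction(total_tries, tries_per_call, latency_s, seconds_per_try):
    """Fraction of total work time spent on network round trips (crude serial model)."""
    calls = -(-total_tries // tries_per_call)      # ceiling division
    network_time = calls * latency_s
    compute_time = total_tries * seconds_per_try
    return network_time / (network_time + compute_time)

# Assumed figures: 10 ms per round trip, 1e-7 s per point, one billion points.
small_calls = network_overhead_fraction(10**9, 1_000, 0.01, 1e-7)
large_calls = network_overhead_fraction(10**9, 10_000_000, 0.01, 1e-7)
print(small_calls, large_calls)
```

Under these assumptions, calls of 1,000 points each spend almost all of their time on the network, while calls of 10 million points each spend about one percent there.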

In the Microsoft environment, web services effectively run on top of IIS. This means that multiple calls to a web service will automatically be multi-threaded at the server. Another benefit of web services is that they are easily clustered by simply deploying the web service on many different servers. Consequently, applications that use web services can parallel-process by spreading the computing requirement across many threads on many CPUs. This is particularly beneficial to "embarrassingly parallel" workloads, which run many parallel computations with essentially no dependency between those computations. Most Monte Carlo simulations fall into the embarrassingly parallel category.

Server Load Balancers make parallel web services even easier. A server load balancer (SLB) distributes multiple client requests across many different servers. By using a server load balancer in front of a cluster of servers, an application can point at the load balancer and treat it like a single server. If additional servers are needed to meet the computational requirements of the client application, they can be added behind the SLB without any modifications to the web service or the client application. SLBs have become extremely sophisticated. However, to parallelize web services, only the most basic features are needed. A "round-robin" distribution with no session affinity will work nicely for most Monte Carlo simulations. More sophisticated SLBs can monitor the network or CPU utilization of each server in the cluster and pass the next request to the least utilized server. Depending on the type of problem being solved, these features can be extremely useful, but are usually not required.

When configuring an SLB for a web service cluster, make sure that session affinity and session persistence are turned off. Session affinity and persistence route traffic from a client system to the same server every time. This is required if the SLB sits in front of servers that are handling web shopping cart applications, but it is exactly the opposite of what is needed for a parallel cluster of web services.

Client applications can also take advantage of a parallel web service architecture without a hardware server load balancer. The calling application can simply handle the distribution of requests across the pool of servers itself. Alternatively, Windows Server 2003 ships with Network Load Balancing, which effectively balances up to 32 servers without a physical SLB. There are also open source SLBs like "Balancer," which can turn a Linux-based server into a cheap SLB for a compute farm.

Architecting the Monte Carlo web service

The MonteLoop web service shown below generates random points and checks each point to see whether it falls in the quarter circle. MonteLoop requires the arguments Tries and Seed. Tries represents the number of random points to generate. Seed is the seed number for the random number generator.

When using random number generators on multiple systems or in multiple threads, it is important to make sure that each instance of the Random class uses a different seed. By default, the .NET Random class derives its seed from the system clock. When calls are spread across multiple threads or multiple systems, two instances created at nearly the same moment can receive the same clock-based seed and therefore generate identical random sequences. Consequently, seeds should be specified by the calling application.
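
The seeding concern can be illustrated with a short Python sketch (Python's random module stands in for the .NET Random class here; make_seeds and the master seed value are assumptions for the example). It shows that identical seeds yield identical sequences, and one way a caller might hand out distinct seeds.

```python
import random

def make_seeds(count, master_seed):
    """Return `count` distinct integer seeds chosen by the calling application."""
    master = random.Random(master_seed)
    # sample() draws without replacement, so no two calls share a seed
    return master.sample(range(1, 2**31 - 1), count)

# Two generators built from the same seed produce the same sequence...
a = random.Random(42)
b = random.Random(42)
same = [a.random() for _ in range(3)] == [b.random() for _ in range(3)]

# ...while caller-supplied distinct seeds give each call its own stream.
seeds = make_seeds(8, master_seed=2006)
print(same, len(set(seeds)) == len(seeds))   # prints: True True
```

Distinct seeds avoid outright duplicate streams; they do not by themselves guarantee statistically independent streams, which may matter for more demanding simulations.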

MonteLoop returns the object class Monte, which consists of InCircle, the number of points that fell inside the quarter circle, and Tries, the total number of points tested. As described above, these two outputs create the ratio needed to calculate Pi. Technically, Tries need not be returned since the calling application is already aware of the number of tries requested. However, to fully demonstrate the use of web services for Monte Carlo simulations, it is important to show how classes can be returned.
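
As a rough illustration of the returned result type, the Python sketch below models Monte as a small data class and shows how several returned results might be combined. The field names in_circle and tries and the helper pi_from are assumptions for the example, not the article's C# definitions; the sample values 3142 and 4000 are the ones the article reports for its test invocation.

```python
from dataclasses import dataclass

@dataclass
class Monte:
    in_circle: int   # points that fell inside the quarter circle
    tries: int       # total points tested

def pi_from(results):
    """Combine the results of several calls into one Pi estimate."""
    in_circle = sum(r.in_circle for r in results)
    tries = sum(r.tries for r in results)
    return 4.0 * in_circle / tries

print(pi_from([Monte(in_circle=3142, tries=4000)]))   # prints: 3.142
```

Returning tries alongside in_circle lets the caller sum both fields without tracking how many points each individual call was asked to run.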

Creating the web service in Visual Studio 2005

To create the MonteLoop web service in Visual Studio 2005, choose File, New Web Site, and ASP.NET Web Service. Make sure that the selected language is Visual C#. Edit the File System name so that it points to the correct directory and ends with a meaningful name. Then click OK. Copy the Monte class and the MonteLoop method shown above into the public class Service. If the workstation is configured with IIS and .NET 2.0, test the web service without debugging by pressing Ctrl+F5. Internet Explorer will open, showing the service description. Click on the MonteLoop link. Enter 4000 in Tries and 12345 in Seed and then click Invoke. The XML results will be shown in a new window. InCircle should be 3142 and Tries should be 4000.

Next, publish the web service by clicking on Build and then Publish Web Site. Now click the "…" icon under target location. To test on the localhost, choose the Local IIS icon, select "Default Web Site," and click the add directory icon. Enter the name of the new directory and click "Open" then "OK". Test to make sure everything worked correctly by opening Internet Explorer and entering http://localhost/{directory_name}/service.asmx.

An easy way to push the web service to multiple servers is simply to copy the files in the localhost directory to the appropriate servers' IIS directories, typically "C:\Inetpub\wwwroot\{directory name}", and then specify that those directories contain applications under IIS. To specify that a directory contains an application, open the Application Server control panel, click on IIS Manager, find the new directory that contains the web service, right click that directory, select Properties in the Directory Tab, click the Create button, and click OK. To test the web service on the server, simply open IE and enter http://{server IP address}/{directory name}/service.asmx.

Calling web services asynchronously

With Visual C# 2005 and .NET 2.0, asynchronous web service calls are slightly different than in prior versions. With .NET 2.0, the web service proxy uses methods ending in Async and Completed to make asynchronous calls. In order to use these asynchronous calls, a handler for the Completed event must be registered using the CompletedEventHandler. An asynchronous web service call begins with the Async method. Once the web service has finished execution, the Completed event fires, causing the specified event handler to execute. The specified event handler receives the web service results as event arguments in the second parameter. The next code sample demonstrates how this works.
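
The register-then-call flow described above can be mimicked in Python to show its shape. This is only an analogy, not the generated .NET proxy: LocalProxy, completed_handlers and monte_loop_async are hypothetical names, and the work is done on a local thread instead of over HTTP.

```python
import random
import threading

class LocalProxy:
    """Stand-in for a generated web service proxy; computes locally, not over HTTP."""
    def __init__(self):
        self.completed_handlers = []           # plays the role of the Completed event

    def monte_loop_async(self, tries, seed):
        """Start the work on a background thread and return at once."""
        def work():
            rng = random.Random(seed)
            in_circle = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                            for _ in range(tries))
            for handler in self.completed_handlers:
                handler(self, {"in_circle": in_circle, "tries": tries})
        thread = threading.Thread(target=work)
        thread.start()
        return thread

results = []

def on_completed(sender, args):                # results arrive in the second parameter
    results.append(args)

proxy = LocalProxy()
proxy.completed_handlers.append(on_completed)  # register the handler first
worker = proxy.monte_loop_async(10_000, seed=1)
worker.join()                                  # only so this demo waits before printing
print(results[0]["tries"])                     # prints: 10000
```

The key point is the ordering: the handler is registered before the asynchronous call starts, so no completion can be missed.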

In order to take advantage of the parallel nature of web services, the client application must be able to asynchronously call the web service many times before receiving any results. If the web service sits behind a server load balancer or if the calling application is going to make more than two calls at a time to a particular server, which is typical when using multicore or Hyper-Threaded processors, then the application must specify that more than 2 connections are allowed from the client to the same server address. The HTTP/1.1 specification (RFC 2616) recommends that a client maintain no more than 2 simultaneous connections to the same server, and .NET enforces this limit by default. To override the default, set maxconnection in the configuration section of the app.config file. One strategy is to set the maximum number of connections equal to the total number of CPUs or Hyper-Threaded CPUs available. However, a common rule of thumb is to set the maximum number of connections to 12 times the number of CPUs across which the web service is distributed. Add the following to the app.config file within the <configuration> tag:
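
The original listing is missing from this copy of the article. A fragment along the following lines follows the standard .NET connectionManagement schema; the value 24 is only an illustrative assumption (12 connections for each of 2 CPUs), not a figure taken from the article.

```xml
<system.net>
  <connectionManagement>
    <!-- address="*" applies the limit to every remote host -->
    <add address="*" maxconnection="24" />
  </connectionManagement>
</system.net>
```

A specific host name or IP address can be used in place of "*" to raise the limit only for the web service cluster.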

Designing the client application

The client application consists of two major parts: 1) the class created to call the web service asynchronously and 2) the main function. The class consists of two methods: Start and DoneCallBack.

The Start method configures the web service, sets up the event handler, calls the web service and increments the counter ActiveWebServiceCounter to track the number of called web services that have not yet returned results. The DoneCallBack method is called when the web service Completed event fires. This method adds the results of each individually called web service to the total number of points in the quarter circle and the total number of points tried. DoneCallBack then displays the current value of Pi and the total number of points tried so far. Showing the results as the web service calls return notifies the user that the program is still running and gives a sense of the current level of precision. DoneCallBack also decrements the ActiveWebServiceCounter. The main portion of the program will check this counter to determine if all of the web service calls have finished running before reporting the final results.

The main function of the program instantiates an object of the Cluster class for each web service call that is going to be made. It generates a list of random numbers that will be used to seed the random number generators of the web service calls, and then starts a loop to fire off the web service calls. Even though the calls are asynchronous, the loop is throttled by the maxconnection value specified in the app.config file. Consequently, once the number of outstanding requests reaches maxconnection, the application will wait for a web service call to complete before making another one. After the loop finishes making the predetermined number of web service calls, the program will wait for all of the web services to complete. ActiveWebServiceCounter decrements each time a web service call completes. When ActiveWebServiceCounter reaches zero, all of the web services are done and the totals have been updated. The program then outputs the final value of Pi and the total number of Tries.
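
The control flow described above (a throttled dispatch loop, a completion callback that accumulates totals, and a counter of active calls) can be sketched in Python as follows. This is an analogy under stated assumptions, not the article's C# client: monte_loop runs locally in place of the remote call, a semaphore plays the role of the maxconnection limit, and all names and default values are hypothetical.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor

def monte_loop(tries, seed):
    """Local stand-in for the remote MonteLoop call."""
    rng = random.Random(seed)
    hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(tries))
    return hits, tries

def run_client(calls=16, tries_per_call=20_000, max_connections=4):
    lock = threading.Lock()
    slots = threading.BoundedSemaphore(max_connections)   # like maxconnection
    totals = {"in_circle": 0, "tries": 0, "active": 0}

    def done_callback(future):
        """Fold one call's result into the running totals."""
        hits, tries = future.result()
        with lock:
            totals["in_circle"] += hits
            totals["tries"] += tries
            totals["active"] -= 1
        slots.release()                 # free a slot for the dispatch loop

    seeds = random.Random(2006).sample(range(1, 2**31 - 1), calls)
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        for seed in seeds:
            slots.acquire()             # blocks once max_connections calls are in flight
            with lock:
                totals["active"] += 1
            pool.submit(monte_loop, tries_per_call, seed).add_done_callback(done_callback)
    # Leaving the with-block waits for every call and its callback to finish.
    return 4.0 * totals["in_circle"] / totals["tries"], totals

pi, totals = run_client()
print(pi, totals["tries"], totals["active"])
```

The lock matters because callbacks can run on several worker threads at once; without it, updates to the shared totals could be lost.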

For testing purposes, the program records the starting and ending times and calculates the total elapsed time required for the Monte Carlo simulation. Note that PiConsoleApp is designed to use remote Web References. Pointing PiConsoleApp at the localhost may generate unexpected results. An unlimited number of connections are allowed to the localhost, regardless of the maxconnection setting, so the local system can quickly be swamped by the main loop.

Creating the application in Visual Studio 2005

To create the PiConsoleApp application, choose File and then New Project. Again ensure that Visual C# is selected. Choose the Console Application icon and enter the name of the application. The name entered will be used as the namespace. Next, copy the WSAsyncCall class, the Main function and the variables from above into the new application under the Program class. Create a Web Reference to the server or servers running the PiMonte web service by clicking the Project drop down and selecting Add Web Reference. In the URL text box, enter the complete URL of the web service, i.e. http://{server_IP_address}/{directory_name}/service.asmx. Remember NOT to use localhost. If the web service cluster is running behind a Server Load Balancer, enter the virtual server name or IP address followed by any server directories and the service name with the .asmx extension. To use the source code unaltered, enter "WSReference" as the name of the Web Reference and click Add Reference. Visual Studio does not always handle renaming references or namespaces correctly, so try to be consistent from the beginning. Next, edit the app.config file's maxconnection tag as described above. At this point, the application should be ready to run.

Parallel web services without a load balancer

In the sample code above, one Web Reference was created by pointing at the IP address of a server load balancer that distributes the requests to multiple servers. If you do not have a load balancer or do not wish to configure the Windows 2003 Network Load Balancer, you can still use web services to parallelize an application. By creating a web reference to each server and then creating a class like the WSAsyncCall class for each reference, you can distribute the web service calls across multiple servers in the main loop. Simply round-robin the web server requests through the number of servers available.
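
The round-robin idea can be sketched as follows in Python. The server URLs are hypothetical placeholders and assign_round_robin is an assumed helper name; a real client would issue the asynchronous call for each pair instead of printing it.

```python
from itertools import cycle

# Hypothetical server addresses; substitute the real hosts running the service.
SERVERS = ["http://10.0.0.1/monte/service.asmx",
           "http://10.0.0.2/monte/service.asmx"]

def assign_round_robin(seeds, servers):
    """Pair each call's seed with the next server in rotation."""
    rotation = cycle(servers)
    return [(next(rotation), seed) for seed in seeds]

for url, seed in assign_round_robin([11, 22, 33, 44], SERVERS):
    print(url, seed)
```

Simple rotation works well here because every call does the same amount of work; uneven workloads would favor a least-loaded policy instead.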

Parallel web service performance

The performance of this parallel web service-based application was tested using two servers, each with one dual-core 2.8GHz Xeon processor, running Microsoft Windows Server 2003 R2. The network was a 100Mb network with multiple hops between the client application and the servers. Each test consisted of 16 billion Monte Carlo points.

A server running a single-threaded version of the Monte Carlo simulation required 34 minutes and 34 seconds to complete. Two servers using the Network Load Balancing feature included with Server 2003 running parallel web services typically required 4 minutes and 30 seconds. The same two servers using an external Server Load Balancer typically required only 3 minutes and 50 seconds to complete 16 billion Monte Carlo tries. Parallelizing this type of application therefore cut the run time to roughly one-eighth to one-ninth of the single-threaded time.
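
The speedups implied by these timings can be worked out directly. The short Python calculation below uses only the three run times quoted above; the variable names are chosen for the example.

```python
def seconds(minutes, secs):
    """Convert a minutes-and-seconds timing to seconds."""
    return 60 * minutes + secs

single = seconds(34, 34)          # single-threaded run on one server
nlb = seconds(4, 30)              # two servers, Windows Network Load Balancing
external = seconds(3, 50)         # two servers, external load balancer

print(round(single / nlb, 1), round(single / external, 1))   # prints: 7.7 9.0
```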

Running 7 Monte Carlo simulations of 16 billion tries each, the values of Pi ranged between 3.1415728 and 3.141623 with an average of 3.14159295. The actual value of Pi to eight decimal places is 3.14159265, so the average is off by only about 0.0000003. The Monte Carlo result is not bad!
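
These figures can be compared with the statistical error one would expect. Treating each point as an independent hit-or-miss trial with hit probability Pi/4 (the standard binomial model, an assumption about the generator's quality), the one-sigma error of a single run follows from the Python sketch below; the function name is chosen for the example.

```python
import math

def pi_standard_error(tries):
    """One-sigma error of 4 * hits / tries when each point hits with probability Pi/4."""
    p = math.pi / 4
    return 4.0 * math.sqrt(p * (1.0 - p) / tries)

print(pi_standard_error(16_000_000_000))
```

For 16 billion tries this comes to roughly 0.000013, the same order of magnitude as the spread between the individual runs reported above.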

Conclusion

By distributing a Monte Carlo simulation across multiple servers and multiple threads using web services, performance can be greatly improved. Because web services are basically written like traditional functions, they can be easily parallelized without hand-coding a multi-threaded application, custom writing a message passing interface or using other high performance computing management software. Web services provide a relatively simple and straightforward method of distributing parallel problems across multiple compute platforms.

About the Author

Matt Stander earned an MBA with a concentration in Finance from New York University and a Bachelor of Science in Electrical Engineering from the University of Michigan. When he is not developing next-generation computer systems, he spends much of his time developing trading strategies and financial algorithms.

The two servers each had one Dual-Core 2.8GHz Xeon processor that supported Hyper-Threading. Consequently, there were 4 real cores (2 per server); however, they appeared like 8 (4 per server) due to the Hyper-Threading.

The simulation was running eight simultaneous threads. The load balancers did a good job distributing the eight threads evenly so that four threads ran on each server. While running the simulations, the servers reported full CPU utilization.

This article is interesting to me because I have implemented our way to do parallel computation. See the article "Network multiple computers/processors for scientific parallel computing" at the site http://www.udaparts.com/document/articles/snpisec.htm.

I believe that parallel computation will become very important in the near future.