Platform embiggens Symphony financial grids

Grid computing software maker Platform Computing has goosed its low-latency Symphony financial grid as well as some add-ons to its Load Sharing Facility (LSF) gridding wares for more traditional HPC parallel cluster grids.

Platform Symphony 5.1 may not have been given a new major release number, but it includes some important features that could arguably justify having given it the 6 moniker. "It is not a major release, but a significant one," Ken Hertzler, vice president of product management at the company, tells El Reg.

The LSF product is arguably the first commercially supported, enterprise-grade grid computing job scheduler that companies and academic research institutions – which do not have the resources of the big government labs around the world – felt comfortable deploying to schedule jobs on their parallel clusters. LSF is a batch-oriented product, great at job scheduling across clusters with very large nodes and at juggling lots of small jobs on fairly loosely coupled compute grids.

LSF was not built for speed, which is fine for a workload such as credit scoring, which you can do overnight, but will not do at all for jobs such as Monte Carlo simulations, risk analytics, and pricing algorithms for financial instruments. And so, nine years ago, Platform started from scratch and created Symphony in Java based on ideas from LSF, with the aim that it would not have to scale across hundreds or thousands of jobs, but would run a smaller number at much lower latency.
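To see why Monte Carlo pricing suits a grid that hands out lots of short tasks quickly, here is a minimal Python sketch. It is not Symphony code and uses none of Platform's APIs. The function names, the option parameters, and the one-seed-per-task scheme are all assumptions made up for illustration. Each call to `simulate_chunk` is an independent unit of work that a grid scheduler could, in principle, send to a different core; here the chunks simply run one after another.

```python
import math
import random


def simulate_chunk(seed, n_paths, spot, strike, rate, vol, years):
    """One independent task: sum of discounted call payoffs over n_paths.

    Hypothetical helper for illustration; a real grid would ship this
    work to a remote core rather than run it in-process.
    """
    rng = random.Random(seed)  # per-task generator so chunks do not share state
    drift = (rate - 0.5 * vol * vol) * years
    diffusion = vol * math.sqrt(years)
    discount = math.exp(-rate * years)
    total = 0.0
    for _ in range(n_paths):
        # Terminal price under geometric Brownian motion
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += discount * max(terminal - strike, 0.0)
    return total


def price_call(n_tasks, paths_per_task, **params):
    """Run every chunk (sequentially here) and average the payoffs."""
    partial_sums = [
        simulate_chunk(seed, paths_per_task, **params) for seed in range(n_tasks)
    ]
    return sum(partial_sums) / (n_tasks * paths_per_task)


if __name__ == "__main__":
    estimate = price_call(
        n_tasks=20, paths_per_task=5_000,
        spot=100.0, strike=100.0, rate=0.05, vol=0.2, years=1.0,
    )
    print(f"Estimated call price: {estimate:.2f}")
```

For these inputs the closed-form Black-Scholes value is roughly 10.45, so the estimate should land near that. The point of the sketch is the shape of the workload: many small, independent tasks whose results are cheap to combine, which is where scheduling latency, rather than raw batch throughput, starts to matter.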

Moreover, in financial services, because the workloads are unpredictable, another thing that customers want is to be able to scale up their grids quickly and then scale them back fast to save power and cooling costs. This is less of a factor in the academic and government HPC grids that use LSF to manage their batch jobs. The assumption here is that these HPC grids always have a lot of work lined up and they want to keep the utilization of the machinery as high as possible because server nodes are not being turned off. The Symphony product has therefore been designed to "flex up and flex down" quickly.

"Even though banks are going great guns now, their IT departments are still under great pressure to cut costs and squeeze resources," says Hertzler.

Symphony is also being tweaked to support the APIs used by the world's favorite big data chewer, MapReduce. Platform has just announced a variant of Symphony, called Platform Workload Manager for MapReduce, that can run Hadoop MapReduce applications on top of the Symphony grid. This tool does not use any of the MapReduce code but rather supports the MapReduce APIs implemented in Hadoop inside of Symphony.
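For readers who have not met the MapReduce programming model, a toy version in plain Python may help. This is neither Hadoop's API nor anything from Platform; the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical and exist only to show the three stages that any MapReduce-compatible runtime has to provide.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one input record into (key, value) pairs."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in-process over a list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["risk risk pricing", "Pricing grid"]))
```

In a real deployment the map and reduce calls run on many nodes and the runtime handles the shuffle. Supporting the same API on a different scheduler means the user-written map and reduce functions stay put while the machinery underneath changes.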

Platform does not give out specifics of its sales and installed base for LSF, Symphony, and the Infrastructure Sharing Facility (ISF) cloudy infrastructure control freak, which is another derivative of LSF that is designed to manage virtual servers in more traditional commercial settings. What Platform does say is that it has over 2,500 customers worldwide across all of its products – with many organizations buying more than one product. Hertzler says that 12 of the top 50 banks in the world are using Symphony for a variety of financial services applications, including Monte Carlo simulations, risk analytics, and product pricing.

This is not a huge market in terms of customer base – Hertzler says there are several hundred such grids running in the world, but the number is growing fast. In the past two quarters, in fact, Platform has more than tripled its revenues in the capital markets where Symphony, not LSF or ISF, is the lead product it peddles. This is a lot more growth than Platform Computing is getting out of its traditional HPC business.

A lot of tier-one banks use the code, but hedge funds, pension funds, and financial services companies that rely on ISV applications, rather than homegrown Java, C, C++, or C# code that they have tweaked to run atop the Symphony grid themselves, are now getting into the financial grid game. Over 20 key financial application ISVs have certified their applications to run on Symphony, which means these smaller shops can now do grids. Key ISVs include Algorithmics, Murex, QuIC Financial, SAS, Sungard, and Sybase.

With Symphony 5.1, the average latency for typical financial transactions is sub-millisecond. This is a little bit better than Symphony 5.0, says Hertzler, but the big improvement with Symphony 5.1 is cluster scalability. The number of cores that can now be allocated to a single cluster managed by Symphony has doubled to 40,000 cores, up from 20,000 for the prior release. The number of cores that can be allocated to a single application running on a Symphony grid has also been doubled, up to 10,000 cores. And the number of concurrent applications that can be managed by Symphony has been tripled, to 300 concurrent applications.
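Only one of the prior limits is spelled out above, so here is the arithmetic for the other two as a quick Python sketch. The per-application and concurrent-application figures for the previous release are inferred from the words "doubled" and "tripled" and are not numbers Platform has stated; the dictionary and function names are made up for the example.

```python
# New Symphony 5.1 limits as quoted above, paired with the stated growth factor.
LIMITS_5_1 = {
    "cores per cluster": (40_000, 2),        # prior value of 20,000 is stated
    "cores per application": (10_000, 2),    # prior value is inferred
    "concurrent applications": (300, 3),     # prior value is inferred
}


def implied_previous(new_limit, factor):
    """Back out the earlier limit from the new limit and its growth factor."""
    return new_limit // factor


if __name__ == "__main__":
    for name, (new_limit, factor) in LIMITS_5_1.items():
        old_limit = implied_previous(new_limit, factor)
        print(f"{name}: {old_limit:,} -> {new_limit:,} ({factor}x)")
```

On that reading, the earlier release topped out at 5,000 cores per application and 100 concurrent applications.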

This may sound like a lot, but a lot of Platform customers already have 25,000 to 50,000 cores in their grids, and a tier one bank is up around 100,000 cores across a couple of grids. And they are telling Platform that they want to more than double the size of their grids in the next year or two because of increased regulation (which causes them to run risk analytics on more stuff) and a desire to run deeper analysis on the data they are gathering from markets and customers.

The Symphony 5.1 release is also being tweaked so it can dispatch work to both CPUs and GPU co-processors in hybrid grids. Hertzler says that a lot of Platform's financial services customers are investigating how they might integrate Nvidia's Tesla GPU co-processors and CUDA programming environment for the GPUs into their financial apps.

This support is not quite ready yet – it should be done by the end of May, according to Hertzler. And a little further down the road, Platform will add support for the OpenCL parallel programming environment that is used by Advanced Micro Devices on its FireStream GPU co-processors. The update also offers compilation-free integration with C# applications, which will make it easier to run Windows applications atop Symphony.

Symphony 5.1 costs $250,000 for a 100-node cluster, and scales up to millions of dollars for licenses on larger machines.

In addition to the Symphony grid software update, Platform has revved its RTM Web-based management portal for its HPC cluster control freak, dubbed Platform HPC 2.1, appropriately enough. RTM is short for Report, Track, and Monitor; this tool spans the LSF job scheduler, cluster manager, and message passing interface (MPI) clustering code that comprise a traditional HPC cluster. RTM 8 adds multi-cluster monitoring support and is able to present data on cluster resource consumption by user, group, or team, not just at a cluster level as in the prior release. It also has automated alerts and exception handling to automate on-the-fly rejiggering of workload scheduling on the cluster so admins don't get shaken out of bed in the middle of the night.

The HPC stack at Platform also now has the companion Analytics 8 tool, a visualization tool for the jobs that run on LSF clusters for those of us who think visually. The tool can gather up and retain information from thousands of users and millions of jobs and present the data in preconfigured or customized dashboards. ®