Revolution R lets risk analysts improve processing performance through parallelization. Using Revolution R with the doRSR and doSMP packages reduces time to results and automates the management of computing resources. Revolution Analytics’ parallelization routines scale to the resources available.

Calculating operational risk is a relatively new discipline. While international agreements and federal statute provide guidelines, a great deal of the decision making about how to identify and measure datasets is left intentionally vague. Most risk analysts have settled on a Loss Distribution Approach (LDA) incorporating four data elements: Internal Loss Data, External Loss Data, Scenario Analysis Data, and Business Environment/Internal Control Factor data. Using these elements, analysts must determine whether a single line of business is distinct from the others and requires its own risk exposure estimate for use in the LDA. The biggest challenges for analysts here are the relative youth of operational risk practices and the relative paucity of data available for measurement.

Challenge

Once the heterogeneous datasets, now called “units of measure,” have been identified, a Poisson distribution is used to model the frequency of operational loss events. Understanding the severity of loss events is a much more difficult task, and there are several ways to do it; the method used is dictated by the nature of the dataset being fitted. This is the area where the youth of the practice is especially troublesome. The rules require organizations to estimate a 1-in-1,000-year event based on fewer than 15 years of operational loss data. In many cases, organizations have units of measure with only a small number of observations, which can lead to unidentified heterogeneity and/or heavily skewed loss distributions.
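As a rough illustration of the two fitting steps described above, the following Python sketch estimates a Poisson frequency by maximum likelihood and fits a severity distribution to observed loss amounts. The lognormal severity is an assumption made purely for illustration, since the appropriate method depends on the dataset being fitted. The function names and sample figures are hypothetical, and this is not the R code used in the project.

```python
import math
import statistics


def fit_poisson_rate(annual_counts):
    """Maximum-likelihood Poisson rate: the mean number of loss events per year."""
    if not annual_counts:
        raise ValueError("need at least one year of event counts")
    return sum(annual_counts) / len(annual_counts)


def fit_lognormal(losses):
    """Maximum-likelihood lognormal parameters (mu, sigma) for positive losses.

    Assumption: severity is lognormal. Real units of measure may need a
    different, possibly heavier-tailed, distribution.
    """
    logs = [math.log(x) for x in losses]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)  # population std dev is the MLE
    return mu, sigma


# Hypothetical data for one unit of measure: five years of event counts
# and a handful of individual loss amounts.
annual_counts = [12, 9, 15, 11, 8]
losses = [1_200.0, 5_400.0, 830.0, 22_000.0, 3_100.0, 960.0]

lam = fit_poisson_rate(annual_counts)
mu, sigma = fit_lognormal(losses)
print(f"frequency rate={lam:.2f}/year, severity mu={mu:.3f}, sigma={sigma:.3f}")
```

With so few observations, the fitted parameters are highly uncertain, which is exactly the difficulty the paragraph above describes.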

With loss frequency modeling and loss severity fitting complete for each unit of measure, the next step is to draw a set of random frequency observations and severities for use in Monte Carlo simulations. Each simulation provides a single point on the aggregate loss distribution.

Many simulations containing millions of iterations must be run to observe enough losses to reasonably assess what a 1-in-1,000-year event might look like. According to David Humke, "Doing a simulation on this type of data really could take days if you’re just using base R."
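The simulation step can be sketched in a few lines of Python. Each simulated year draws a Poisson number of loss events, draws a severity for each event, and sums them to get one point on the aggregate loss distribution; the 1-in-1,000-year loss is then read off as the 99.9th percentile. The lognormal severity, the parameter values, and all function names here are assumptions for illustration, not details taken from the Northern Trust work.

```python
import math
import random


def draw_poisson(rng, lam):
    """Count the arrivals of a rate-lam Poisson process within one unit of time."""
    count = 0
    t = rng.expovariate(lam)
    while t < 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count


def simulate_annual_losses(n_years, lam, mu, sigma, seed=0):
    """Return one aggregate loss per simulated year.

    Assumptions: Poisson(lam) frequency and lognormal(mu, sigma) severity.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_years):
        n_events = draw_poisson(rng, lam)
        totals.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n_events)))
    return totals


def empirical_quantile(values, q):
    """Smallest observed value with at least a fraction q of the sample at or below it."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, math.ceil(q * len(ordered)) - 1)
    return ordered[max(index, 0)]


# Hypothetical parameters; a production run would use far more iterations.
totals = simulate_annual_losses(n_years=20_000, lam=11.0, mu=8.0, sigma=1.5, seed=42)
print("estimated 1-in-1,000-year loss:", round(empirical_quantile(totals, 0.999)))
```

With 20,000 simulated years, only about 20 observations lie beyond the 99.9th percentile, which is why estimates of this kind need millions of iterations to stabilize.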

Solution

Revolution Analytics and Northern Trust Come Together

"Northern Trust went to Revolution Analytics and asked if we could explore the opportunity to parallelize our Monte Carlo simulations." David Humke

In addition to the time lost waiting for results, there is the management headache of trying to run parallelized simulations across different hardware with different operating systems. This can be a considerable resource drain on an analyst group.

Knowing there had to be a better way to parallelize their Monte Carlo simulations, Northern Trust and Revolution Analytics set up a series of tests to benchmark the performance of the doRSR and doSMP parallelization packages across different hardware configurations. They put together an environment that included both 32-bit and 64-bit operating systems. They also compared a single node with multiple processors against multiple nodes with multiple processors; the test hardware included a laptop with 4 cores, a server with 8 cores, and a 3-node high-performance cluster on Amazon with 8 cores per node.
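Monte Carlo simulations parallelize well because each simulated year is independent of the others: the work can be split into chunks, each chunk run on its own core, and the results concatenated. The Python sketch below shows that general pattern using the standard library's process pool. It is not the doSMP or doRSR API, and the function names, parameter values, and lognormal severity are assumptions made for illustration.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def simulate_chunk(task):
    """Simulate one independent chunk; task = (n_years, lam, mu, sigma, seed)."""
    n_years, lam, mu, sigma, seed = task
    rng = random.Random(seed)
    totals = []
    for _ in range(n_years):
        total = 0.0
        t = rng.expovariate(lam)  # Poisson arrivals within one year
        while t < 1.0:
            total += rng.lognormvariate(mu, sigma)  # assumed severity model
            t += rng.expovariate(lam)
        totals.append(total)
    return totals


def make_tasks(n_years, n_chunks, lam, mu, sigma, base_seed=0):
    """Split n_years across n_chunks, giving each chunk a distinct seed."""
    base, extra = divmod(n_years, n_chunks)
    return [
        (base + (1 if i < extra else 0), lam, mu, sigma, base_seed + i)
        for i in range(n_chunks)
    ]


def run_simulation(tasks, map_fn=map):
    """Run every chunk through map_fn (serial by default) and flatten the results."""
    return [x for chunk in map_fn(simulate_chunk, tasks) for x in chunk]


if __name__ == "__main__":
    tasks = make_tasks(n_years=40_000, n_chunks=8, lam=11.0, mu=8.0, sigma=1.5)
    with ProcessPoolExecutor(max_workers=4) as pool:
        totals = run_simulation(tasks, map_fn=pool.map)
    print(len(totals), "simulated years")
```

Passing the mapping function as a parameter keeps the simulation code identical whether it runs serially or on a pool, so only the caller needs to know what hardware is available. Distinct integer seeds per chunk are a simplification; a production run would want a generator designed to provide independent parallel streams.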

The metrics they used to evaluate the software and hardware in performing the simulations were: 1) Elapsed Time by Step and 2) Memory Usage.
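Both metrics are straightforward to capture in a benchmark harness. The Python sketch below times each named step and records its peak memory with the standard library; the `measure` helper and the two stand-in steps are hypothetical and are not the instrumentation used in the actual tests. Note that `tracemalloc` only sees memory allocated by the Python interpreter in the current process, so worker processes would need to be measured separately.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measure(step, results):
    """Record elapsed seconds and peak traced memory for one step.

    Not nestable: each use starts and stops tracemalloc.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results[step] = {"seconds": elapsed, "peak_bytes": peak}


# Stand-in steps; a real benchmark would wrap the frequency draws,
# severity draws, and aggregation of the simulation itself.
results = {}
with measure("draw_values", results):
    draws = [float(i) for i in range(100_000)]
with measure("aggregate", results):
    total = sum(draws)

for step, stats in results.items():
    print(f"{step}: {stats['seconds']:.4f} s, peak {stats['peak_bytes']} bytes")
```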

Results

Improved Performance and Easily Scalable

Revolution Analytics’ parallelization can be easily scaled up from a laptop or server to a cluster using Revolution Analytics’ distributed computing capabilities. As expected, parallelization greatly improved simulation performance. Performance improves with the number of cores and scales easily with the resources available within the cluster.

"Overall, the take away that we found from this work with Revolution Analytics is that they do have a good product offering for parallelization." David Humke

Parallelization allows analysts to spend less time waiting for results and more time analyzing them. For risk analysts working with products and services that are active in the marketplace and must meet regulatory requirements accurately and on time, time to result is a critical factor in their success. Effective parallelization routines are just as important for managing resources effectively and ensuring that a group takes advantage of all the computing resources available. The benchmark testing with Northern Trust demonstrated that Revolution Analytics’ parallelization packages, doRSR and doSMP, manage a diverse hardware environment much more efficiently than attempting to do so manually, and that the packages scale effectively to use all computing resources within that environment.

About Company

About Revolution Analytics

Revolution Analytics is the leading commercial provider of software and services based on the open source R project for statistical computing. The company brings high performance, productivity and enterprise readiness to R, the most powerful statistics language in the world. The company’s flagship Revolution R Enterprise product is designed to meet the production needs of large organizations in industries such as finance, life sciences, retail, manufacturing and media. Used by over two million analysts in academia and at cutting-edge companies such as Google, Bank of America and Acxiom, R has emerged as the standard of innovation in statistical analysis. Revolution Analytics is committed to fostering the continued growth of the R community through sponsorship of the Inside-R.org community site, funding worldwide R user groups and offering free licenses of Revolution R Enterprise to everyone in academia.

About Revolution Analytics

Revolution Analytics was founded in 2007 to foster the R community, as well as support the growing needs of commercial users. Our name derives from combining the letter "R" with the word "evolution." It speaks to the ongoing development of the R language from an open-source academic research tool into commercial applications for industrial use.

Through our Revolution R products, we aim to make the power of predictive analytics accessible to every type of user and budget. We provide free and premium software and services that bring high performance, productivity and ease of use to R, enabling statisticians and scientists to derive greater meaning from large sets of critical data in record time.

We also offer our full-featured production-grade software to the academic community for FREE, in order to support the continued spread of R's popularity to the next generation of analysts.