In-Memory Technology Unleashes Real-Time Analytics

Over the past 10 years, in-memory data grids (IMDGs) have rapidly gained popularity. Driven by the need to handle fast-growing application workloads, application architects have dramatically scaled performance by employing multiple servers and storing rapidly changing data in memory within an IMDG. Memory-based storage, shared access, and seamless elasticity make an IMDG an ideal repository for operational data such as reservations, shopping carts, financial portfolios, workflow state, user credentials, smart grid state, and much more. The grid ensures fast data access and can easily be expanded or contracted as workloads change.
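The usage pattern described above can be sketched with a few lines of code. The `Grid` class below is a hypothetical in-process stand-in for a real IMDG client library, and its method names are illustrative only; the point is that operational objects live in memory under a key and can be read and updated from any application server.

```python
class Grid:
    """In-process stand-in for a distributed in-memory data grid client.
    A real IMDG would partition this store across many servers and
    replicate it for high availability."""
    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value  # stored in memory, not on disk

    def get(self, key):
        return self._store.get(key)

grid = Grid()

# Store an operational object (a shopping cart) under a key.
grid.put("cart:alice", {"items": ["book", "lamp"], "total": 41.50})

# Any application server sharing the grid can read and update it.
cart = grid.get("cart:alice")
cart["items"].append("mug")
grid.put("cart:alice", cart)  # fast in-memory update as the cart changes
```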

While application scalability has been the main incentive for IMDG adoption to date, competitive pressures and the desire to optimize business results have created a growing need to analyze operational data even while it is rapidly changing. Business managers have realized that the ability to continuously analyze live data in their online systems can unlock important patterns and trends that otherwise cannot be spotted. For example, with real-time analysis, financial trading applications can rapidly respond to fluctuating market conditions as market data flows through trading systems. Likewise, smart grid monitoring systems can analyze telemetry from many sources to anticipate and respond to unexpected changes in a power grid, and reservations systems can quickly reroute travelers affected by cancellations. In all of these examples, the data sets hold live, fast-changing data in active, real-time business operations.

In many ways, the growing need for real-time analysis has been catalyzed by the vibrant activity around “big data” analytics. The promise of breakthrough insights from analyzing petabyte-sized data sets has refocused the efforts of BI analysts wanting to get a leg up on their competitors, and that enthusiasm is rapidly spreading to managers of smaller, operational data sets. Parallel computing techniques such as “map/reduce,” popularized by the open source Hadoop platform, have opened the door to examining very large data sets with dramatically reduced analysis times. However, Hadoop’s reliance on file-based storage and batch processing is not well suited to real-time applications and has restricted its utility to large, static data sets, with processing times measured in hours or even days. The emerging trend of integrating map/reduce analysis with in-memory data storage is now opening the door to analyzing operational data sets in real time.
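For readers unfamiliar with the model, a minimal map/reduce computation has three phases: a map step that emits key/value pairs from each record, a shuffle that groups the pairs by key, and a reduce step that aggregates each group. The toy example below counts shares traded per stock symbol; the data is invented purely for illustration.

```python
from collections import defaultdict
from functools import reduce

# Input records: (symbol, shares traded). Illustrative data only.
records = [("IBM", 100), ("AAPL", 50), ("IBM", 200), ("AAPL", 75)]

# Map: emit a (key, value) pair per record. Trivial here; in Hadoop this
# step runs in parallel across many input splits.
mapped = [(symbol, shares) for symbol, shares in records]

# Shuffle: group emitted values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: aggregate each group (here, sum the share counts).
totals = {key: reduce(lambda a, b: a + b, values)
          for key, values in groups.items()}
# totals == {"IBM": 300, "AAPL": 125}
```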

At its core, the ability of map/reduce to quickly analyze a complete data set offers important benefits for real-time analysis not previously available with techniques such as complex event processing, which focuses on examining an incoming data stream. When combined with in-memory data storage to avoid disk and network overheads, its fast, parallel processing precisely fits the needs of live systems handling fast-changing data. Having identified this synergy, companies like ScaleOut Software have recently integrated map/reduce capabilities directly into software-based IMDG products. This new technology enables fast map/reduce analysis of a grid’s in-memory data with minimal data movement. Because data motion is the enemy of performance, being able to execute analytics in place makes processing extremely fast. By using an IMDG, organizations can now store and analyze fast-changing, operational data in memory within a single environment, which offers the twin benefits of scalable application performance and built-in, real-time analytics.
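The in-place execution property described above can be sketched as follows. Each simulated "node" runs the map step over only its own in-memory partition, so the bulky data never moves; only the small partial results travel to a final merge step. The partitioning, thread-based parallelism, and function names here are illustrative assumptions, not a vendor API.

```python
from concurrent.futures import ThreadPoolExecutor

# Price histories partitioned across three grid nodes (symbol -> prices).
# Invented data for illustration.
partitions = [
    {"IBM": [185.0, 186.2], "HPQ": [28.1, 27.9]},
    {"AAPL": [470.0, 471.5]},
    {"GOOG": [880.0, 882.3], "MSFT": [32.4, 32.6]},
]

def map_partition(partition):
    # Conceptually runs on the node that already holds the partition;
    # returns only a small aggregate (the session high per symbol).
    return {symbol: max(prices) for symbol, prices in partition.items()}

# Threads stand in for the grid's parallel per-node execution.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_partition, partitions))

# Final merge: symbols are disjoint across partitions, so a union suffices.
session_highs = {}
for partial in partials:
    session_highs.update(partial)
```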

To demonstrate the IMDG’s ability to run continuous map/reduce analytics on fast-changing data, ScaleOut Software deployed its IMDG on a 75-server compute cluster within the Amazon Web Services EC2 cloud environment to perform real-time analysis within a financial services application. This application held a terabyte of partial price histories for a large pool of stock symbols (and another terabyte of replicated data for high availability). While the stock histories were continuously updated to simulate a market feed changing stock prices, the IMDG repeatedly executed map/reduce analytics on the data set to model an ongoing analysis of stock trading strategies during a trading session. Even with updates being performed at the rate of 1.1 gigabytes per second, the IMDG was able to complete map/reduce operations every 4.1 seconds, clearly demonstrating the viability of continuous analysis on operational data sets.
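As a back-of-the-envelope check, the benchmark figures imply the approximate per-server rates computed below (assuming the 1 TB primary data set is evenly distributed across the 75 servers; these derived numbers are our arithmetic, not figures from the benchmark report).

```python
# Published figures from the benchmark described above.
servers = 75
data_set_gb = 1024        # 1 TB of primary stock-history data
update_rate_gbps = 1.1    # aggregate update rate, GB/s
cycle_seconds = 4.1       # time per map/reduce pass

# Derived (approximate) rates, assuming even distribution.
update_per_server_mbps = update_rate_gbps * 1024 / servers  # ~15 MB/s/server
scan_rate_gbps = data_set_gb / cycle_seconds                # ~250 GB/s aggregate
scan_per_server_gbps = scan_rate_gbps / servers             # ~3.3 GB/s/server
```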

This innovative use of familiar big data techniques integrated with the scalability and in-memory speed of IMDGs paves the way for organizations to identify issues and opportunities in their business processes as they occur and make needed adjustments in real time. Doing so increases both customer satisfaction and operational efficiency, thereby substantially raising the bar for the competition.