Constructing a scalable risk analysis solution is a fascinating architectural challenge. If you come from Financial Services you are sure to appreciate that. But even architects from other domains are bound to find the challenges fascinating, and the architectural patterns of my suggested solution highly useful in other domains.

Recently I held a webinar on architecting scalable, near-real-time risk analysis solutions, based on experience gathered with Financial Services customers. Given the strong interest in the webinar, I would like to share the highlights with you here.

From an architectural point of view, risk analysis is both a data-intensive and a compute-intensive process, and it also has elaborate orchestration logic. Data volumes in this domain are massive and ever-increasing, as is the demand to reduce response time. These trends are aggravated by the global financial regulatory reforms that followed the late-2000s financial crisis, which mandate reducing exposure to risk by shortening risk settlement cycles.

Traditional architectures were based on overnight batch processing using compute grids and relational databases. But can the traditional architecture meet the new demand for near-real-time processing? Experience with many customers shows that it fails to meet the challenge: data becomes stale quickly; relational databases become bottlenecks due to disk and network constraints; predefined queries are too rigid; remote execution of pre-processing and post-processing on the data (such as formatting data inputs, or aggregating calculation results) is too slow; and implementing a home-grown orchestration layer is too cumbersome and inefficient.

Constructing a massively-scalable near-real-time risk analysis solution requires a new architectural approach. The correct way to look at the problem is as one of real-time analytics on big data. It's important to realize that intraday data changes very frequently but is of limited volume, whereas historical data changes much less frequently but is of much higher volume. A good architecture should accommodate these inherent differences, employing a multi-tiered architecture with an in-memory data grid for intraday data and a NoSQL database for historical data, plus a processing layer that unifies the two datastores, making them look as one for querying purposes.
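
To make the idea of a unifying processing layer concrete, here is a minimal sketch in Python. All names in it (Trade, IntradayStore, HistoricalStore, UnifiedView) are hypothetical stand-ins I made up for illustration: plain in-process objects play the role of the data grid and the NoSQL database, and the merge rule (intraday wins on a matching trade id) is an assumption, not something dictated by any product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Trade:
    trade_id: str
    book: str
    notional: float
    trade_date: date


class IntradayStore:
    """Stand-in for the in-memory data grid: small, frequently updated."""

    def __init__(self):
        self._trades = {}

    def upsert(self, trade):
        self._trades[trade.trade_id] = trade

    def query(self, book):
        return [t for t in self._trades.values() if t.book == book]


class HistoricalStore:
    """Stand-in for the NoSQL database: large, rarely updated."""

    def __init__(self, trades):
        self._trades = list(trades)

    def query(self, book):
        return [t for t in self._trades if t.book == book]


class UnifiedView:
    """Processing layer that makes the two datastores look like one."""

    def __init__(self, intraday, historical):
        self._intraday = intraday
        self._historical = historical

    def query(self, book):
        # Assumed merge rule: start from the historical results, then let
        # intraday results overwrite any entry with the same trade id.
        merged = {t.trade_id: t for t in self._historical.query(book)}
        merged.update({t.trade_id: t for t in self._intraday.query(book)})
        return sorted(merged.values(), key=lambda t: (t.trade_date, t.trade_id))
```

The caller only ever talks to UnifiedView.query, so which tier a given trade currently lives in stays an internal detail of the processing layer.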

Another challenge in risk calculations is streaming the results back to the clients as they arrive, and dispatching ticks and other events, which arrive at a high rate, back to the UI. I find Event Driven Architecture (EDA) highly suitable for handling these use cases. Support for asynchronous data fetch and the ability to treat a data mutation as an event that can be dispatched are some of the characteristics that I'd be looking for when implementing such architectures.
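
As a rough illustration of treating a data mutation as an event, consider the Python sketch below. ObservableStore and its subscribe/write methods are hypothetical names of my own, and the listeners are called synchronously to keep the example short; a real deployment would more likely hand the events to an asynchronous dispatcher.

```python
from collections import defaultdict


class ObservableStore:
    """Key-value store that turns every write into a dispatched event."""

    def __init__(self):
        self._data = {}
        self._listeners = defaultdict(list)

    def subscribe(self, key_prefix, callback):
        # callback(key, old_value, new_value) fires for keys under key_prefix.
        self._listeners[key_prefix].append(callback)

    def write(self, key, value):
        old_value = self._data.get(key)
        self._data[key] = value
        # The mutation itself is the event: notify every matching subscriber.
        for prefix, callbacks in self._listeners.items():
            if key.startswith(prefix):
                for callback in callbacks:
                    callback(key, old_value, value)
```

A UI client interested only in risk results would subscribe with a prefix such as "risk/" (an assumed key layout) and receive each new result as it is written, without polling the store.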

The risk calculations are usually accompanied by ETL pre-processing of the data, to align the data format to industry standards, and by post-processing of the calculation results, to aggregate them. This pre-processing and post-processing logic must run very efficiently given the high rate at which data is streamed. Remote invocation of this logic on the data is too cumbersome. Having the ability to execute this logic co-located with the data, preferably on the very same VM, is what I'm looking for in such architectures.
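
The difference between shipping data to the logic and shipping logic to the data can be sketched as follows. This is a simplified Python model, not any vendor's API: Partition, exposure_by_book and aggregate are hypothetical names, and each partition is just a list living in the same process.

```python
class Partition:
    """One slice of the data; tasks run here instead of rows being pulled out."""

    def __init__(self):
        self._rows = []

    def write(self, row):
        self._rows.append(row)

    def execute(self, task):
        # The task runs next to the rows; only its small result leaves.
        return task(self._rows)


def exposure_by_book(rows):
    """Post-processing step: sum exposure per book within one partition."""
    totals = {}
    for row in rows:
        totals[row["book"]] = totals.get(row["book"], 0.0) + row["exposure"]
    return totals


def aggregate(partitions, task):
    """Send the task to every partition and combine the partial results."""
    combined = {}
    for partition in partitions:
        for book, value in partition.execute(task).items():
            combined[book] = combined.get(book, 0.0) + value
    return combined
```

Calling aggregate(partitions, exposure_by_book) moves one small dictionary per partition across the wire instead of every row, which is the whole point of co-location.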

For such challenges I find Elastic Application Platforms to be a suitable tool: they provide both the in-memory data grid and the ability to execute business logic and messaging co-located with the data, with scalability achieved by sharding the data and high availability (HA) by redundant synchronous replicas. For my implementation I used GigaSpaces XAP, which in addition to all of that provided easy integration with the back-end NoSQL database holding the historical data, making it easy to host the end-to-end big-data solution as one cohesive solution.
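
For readers less familiar with sharding and synchronous replicas, the following toy Python model shows the two mechanisms side by side. It is a sketch under my own assumptions, not the XAP API: Shard, ShardedGrid, shard_for and fail_over are hypothetical names, routing uses a CRC32 hash of the key, and "replication" is simply a second dictionary updated in the same call.

```python
import zlib


class Shard:
    """One partition with a primary copy and a synchronously updated backup."""

    def __init__(self):
        self.primary = {}
        self.backup = {}

    def put(self, key, value):
        self.primary[key] = value
        # Synchronous replication: put() returns only after the backup is updated.
        self.backup[key] = value

    def get(self, key):
        return self.primary[key]

    def fail_over(self):
        # Promote the backup to primary and seed a fresh backup from it.
        self.primary = self.backup
        self.backup = dict(self.primary)


class ShardedGrid:
    """Routes each key to one shard, so capacity grows with the shard count."""

    def __init__(self, shard_count):
        self.shards = [Shard() for _ in range(shard_count)]

    def shard_for(self, key):
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self.shard_for(key).put(key, value)

    def get(self, key):
        return self.shard_for(key).get(key)
```

Because the backup is written before put() returns, losing a primary does not lose acknowledged writes: after fail_over() the same get() call still returns the value.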

This is of course just the basic architecture. There are many challenges in how to intersect the compute grid with the data grid, how to co-locate the orchestration logic with the data and scale it in a highly-available manner, how to extend the architecture to multi-site deployment, how to onboard such a system to the cloud, preferably keeping the solution vendor-agnostic, and so forth.

Reader Comments (1)

Hi Todd,

Great post, as usual, and another nice example of polyglot persistence. In capital markets alone there are many examples where high velocity, temporal data must be available for real-time analytics as well as to combine with historical data for deeper exploration. Real-time trade compliance is another example, where “fast” and “deep” data are needed to drive dashboards and alerting systems.

I’d suggest a couple of other requirements for your in-memory data grid spec – database durability and streaming export. Durability, of course, requires the database to be recoverable in the face of a full system crash. Streaming export requires a reliable, high-performance interface between your in-memory data grid and your historical database. A key feature of that interface is the ability to handle impedance mismatches between real-time and historical datastores (e.g., one datastore may be able to write faster than the other can read).

The real-time data aspects of your application are an excellent match for VoltDB’s partitioned, in-memory architecture and relational model. VoltDB can scale to millions of write operations per second; it co-locates data-intensive logic with the data; and it can do some fairly extensive (ETL-like) data enrichment operations in real-time using materialized views. VoltDB hits your scaling and HA specs on the money, and addresses pretty much everything in your last paragraph.

I’m in no way suggesting that GigaSpaces was the wrong way to go for your application. GigaSpaces is a fine choice, as it addresses your EDA needs along with your high velocity data needs. I’m just offering VoltDB as another technology that’s well-suited to the kinds of high velocity data problems raised in your post.