As described previously in this post, Value at Risk (VaR) is a popular risk measure used for risk management, margin calculation, regulatory reporting, capital charges, and pre-trade decision making. VaR is also often used for hedge optimization, portfolio construction, and for optimizing the tracking error of a portfolio against some benchmark.

VaR is defined as the predicted worst loss over a target horizon within a given confidence interval. For example, if you hold $1000 worth of GOOG shares and the one-day VaR at a 95% confidence level is $22, then on 95% of trading days the position is not expected to lose more than $22.
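For intuition, here is a minimal historical-simulation VaR sketch in Python. The scenario data is randomly generated purely for illustration; a real system would use actual historical P&L vectors.

```python
import random

def historical_var(pnl_scenarios, confidence=0.95):
    """One-day VaR via historical simulation: the loss threshold the
    position is not expected to exceed at the given confidence level."""
    losses = sorted(-p for p in pnl_scenarios)  # losses as positive numbers
    idx = int(confidence * len(losses)) - 1     # index of the quantile loss
    return max(losses[idx], 0.0)

# Hypothetical daily P&L scenarios for a ~$1000 position (illustrative only)
random.seed(7)
scenarios = [random.gauss(0, 13) for _ in range(500)]
var_95 = historical_var(scenarios, 0.95)
```

With 500 normally distributed scenarios of standard deviation 13, the 95% VaR lands near 1.645 × 13 ≈ $21, in line with the GOOG example above.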

VaR is used in many contexts across an organization, so it can carry different names for different use cases and thereby create various reporting needs. Here is a typical set of views:

Regardless of context, it’s important to note that VaR is not simply additive: the VaR of a portfolio containing assets A and B does not equal the sum of the VaR of asset A and the VaR of asset B. Hence, a reporting table that simply sums per-asset VaR values up a hierarchy would report incorrect numbers.
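This non-additivity is easy to demonstrate with a small simulation (illustrative data only): for imperfectly correlated assets, diversification pushes portfolio VaR below the sum of the standalone VaRs.

```python
import random

def var_95(pnl):
    """95% historical VaR of a list of P&L scenarios."""
    losses = sorted(-x for x in pnl)
    return losses[int(0.95 * len(losses)) - 1]

random.seed(42)
# Hypothetical daily P&L for two independent (hence imperfectly correlated) assets
a = [random.gauss(0, 10) for _ in range(1000)]
b = [random.gauss(0, 10) for _ in range(1000)]
portfolio = [x + y for x, y in zip(a, b)]

sum_of_vars = var_95(a) + var_95(b)   # naive "additive" VaR
portfolio_var = var_95(portfolio)     # true portfolio VaR, typically smaller
```

A SUM over per-asset VaR columns would report `sum_of_vars`, overstating the true portfolio risk.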

For this reason, traditional databases are of limited value when the VaR information being reported is not linearly aggregatable. (Many important risk measures beyond VaR, such as counterparty credit risk, fall into this category.) Consequently, most risk-reporting warehouses pre-aggregate all frequently used dimensions and use those pre-aggregated values to report VaR. This approach works to a degree, but the available views are limited and fixed: calculating VaR along any other dimension, or for a custom set of assets, requires a new aggregation job, forcing users to wait hours or even days for results.

There are other limitations as well:

Due to a shallow schema with limited analytical capabilities, only standard slice-and-dice operations and simple aggregation functions can be used for reporting.

Limited support is available for running user-defined functions (UDFs) or calling out to external analytical libraries; thus, risk measures are pre-aggregated along a fixed set of dimensions.

Schemas are fixed, so new analytics and aggregations require new views and schema changes.

Why Hadoop?

In contrast, Apache Hadoop ecosystem technologies such as Apache Spark, Apache Impala (incubating), and Apache Hive—combined with serialization formats such as Apache Avro, Thrift, Protocol Buffers, and Hive SerDes—can serve as the foundation of a “high-definition” analytic infrastructure with these attributes:

Extensibility: embedding custom analytics for aggregation and reporting via UDFs and UDAFs allows a domain-specific language (DSL) to be implemented as an extension to the built-in functionality. UDFs can call external analytical libraries, as well.
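As a concrete sketch of the UDAF idea, the following pure-Python class mimics the typical iterate/merge/terminate lifecycle of an aggregation function (the class and method names are illustrative, not any particular engine’s API): it sums per-position historical P&L vectors element-wise across a group, then takes a quantile at the end, which is exactly the aggregation VaR needs.

```python
class VarUdaf:
    """Sketch of a VaR aggregation function: accumulate per-position
    historical P&L vectors element-wise, then quantile the aggregate."""

    def __init__(self, confidence=0.95):
        self.confidence = confidence
        self.pnl = None  # running element-wise sum of P&L vectors

    def iterate(self, position_pnl):
        # Fold one position's historical P&L vector into the running total.
        if self.pnl is None:
            self.pnl = list(position_pnl)
        else:
            self.pnl = [a + b for a, b in zip(self.pnl, position_pnl)]

    def merge(self, other):
        # Combine partial aggregates from parallel tasks.
        if other.pnl is not None:
            self.iterate(other.pnl)

    def terminate(self):
        # VaR of the aggregated book at the configured confidence level.
        losses = sorted(-x for x in self.pnl)
        return losses[int(self.confidence * len(losses)) - 1]

agg = VarUdaf(0.95)
agg.iterate([-1, 2, -3, 4, -5, 6, -7, 8, -9, 10])
agg.iterate([-1, 2, -3, 4, -5, 6, -7, 8, -9, 10])
result = agg.terminate()
```

Because the P&L vectors (rather than a single pre-computed VaR number) are aggregated, VaR can be computed correctly along any grouping dimension at query time.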

Support for complex data: using persistence formats such as Avro, Thrift, Protocol Buffers, and Hive SerDes, you can model complex domain objects thanks to their support for rich types.
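For example, a hypothetical Avro schema for a trade record could embed the full vector of historical P&L values as an array field (the record and field names here are invented for illustration):

```json
{
  "type": "record",
  "name": "Trade",
  "fields": [
    {"name": "trade_id", "type": "string"},
    {"name": "book", "type": "string"},
    {"name": "notional", "type": "double"},
    {"name": "historical_pnl", "type": {"type": "array", "items": "double"}}
  ]
}
```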

With this approach, users can ask any question or run any report they like without requesting custom analytics jobs, making deeper insight available on demand.

Let’s review an example. Say that instead of calculating a single VaR number under fixed assumptions, you want to store a complete view of each trade using Spark SQL. Using Hive’s Array datatype, all the historical P&L values for the trade can be stored alongside it.
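A sketch of what such a table definition could look like in Hive DDL (the table and column names are illustrative):

```sql
CREATE TABLE trades (
  trade_id       STRING,
  book           STRING,
  notional       DOUBLE,
  historical_pnl ARRAY<DOUBLE>  -- full vector of historic P&L scenarios per trade
)
STORED AS AVRO;
```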

Spark SQL is not limited to querying data from HDFS; it also integrates with data sources such as Apache HBase, Apache Cassandra, Apache Kudu (incubating), and even relational databases. This support for polyglot persistence allows joins across data sources: for example, positions can come from HDFS, time series from HBase or Cassandra, and business hierarchies and reference data from Oracle Database.
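Such a cross-source query might look like the following Spark SQL sketch, where the table names and the `book_var` aggregation function are hypothetical:

```sql
-- positions from HDFS, P&L time series from HBase/Cassandra,
-- book hierarchy from Oracle, all exposed as Spark SQL tables
SELECT h.desk,
       book_var(t.historical_pnl) AS desk_var  -- hypothetical VaR UDAF
FROM positions p
JOIN pnl_timeseries t ON t.trade_id = p.trade_id
JOIN book_hierarchy h ON h.book = p.book
GROUP BY h.desk;
```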

Conclusion

Vendors have generally focused on the cost-reduction benefits of moving workloads onto the Apache Hadoop stack. But in some cases, this stack can also serve as the foundation of a smarter, higher-definition, more agile, and faster analytic infrastructure overall.

With this approach, you can not only free your engineering team to do things other than run reports, but also mine your data arbitrarily to uncover new opportunities for business optimization.