Database

olap4j: Online Analytical Processing for Java

By Luc Boudreau, July 28, 2011

Online analytical processing (OLAP) is capable of leveraging multiple processors, but it has long been hampered by the lack of standardized APIs...until now!

The advent of multicore and many-core processors on expensive desktops and servers has ushered in an era in which small companies can fairly easily perform data mining and analytical processing of large databases as part of an effort to optimize business performance.

Online analytical processing (OLAP) is capable of leveraging multiple processors, but it has long been hampered by the lack of standardized APIs, especially for Java. Not that nobody was working on them...

In fact, over the past two decades, many Java Community Process initiatives have tried to establish successful APIs, but all efforts have failed; in many cases because they relied on proprietary extensions. This article discusses a recently released API, supported by multiple vendors in the business intelligence (BI) sector, which makes it easy to leverage today's processors to glean insights into business data.

The API, known as olap4j, is the OLAP equivalent of Java database connectivity (JDBC) for relational data. Specifically, olap4j extends core classes of JDBC specifications 3 and 4 in order to bring OLAP data sources to the Java platform. Connections can be obtained by JDBC connection management facilities.

The API uses statements, which are provided over connections. The queries, formulated using the MDX language, can be sent textually to the connection. The olap4j API also includes a type system capable of representing any server-specific MDX grammar as a business object model.

The API makes heavy use of metadata, as most OLAP-related operations involve a user exploring the database and building queries interactively. Results of multi-dimensional queries are represented by a CellSet object. CellSet is to multi-dimensional data what ResultSet is to tabular data.

Getting Started

The olap4j API is distributed as four modules, which are shown in Table 1.

[Click image to view at full size]

Table 1.

For the purpose of this article, I will use both the core module and the XML/A driver. You can find those libraries at the olap4j website and in the Resources section at the end of this article.

Connecting to an OLAP Server

Connecting to an OLAP server is done in the same way you connect to a relational database. As I mentioned earlier, olap4j leverages JDBC driver management and connection facilities. A sample is shown in Listing One.

The code in Listing One will give you a connection object to a remote OLAP server using the XML/A driver. We use JDBC 4's connection unwrapping to obtain access to the underlying OLAP connection. Applications developed for Java 5 can cast the connection object directly.

Take note that the code in Listing One uses the XML/A driver's caching SPI. A few basic implementations of the cache are provided with olap4j, but integrators are encouraged to develop their own. In this case, the SOAP requests will be cached in memory and use a Least-Frequently-Used eviction policy, enforced for a maximum of 100 elements over 10 minutes.

More configuration options exist for both the XML/A driver or the cache SPI within the API documentation.

Exploring an OLAP Server

The metadata hierarchy of an OLAP server is different from what you might be accustomed to with relational databases. The information is organized according to the hierarchy illustrated in Figure 1.

[Click image to view at full size]

Figure 1: Layout of the olap4j metadata hierarchy.

The olap4j API exposes the metadata in two ways. It can be explored either through DatabaseMetaData, in the form of tabular data, or by exploring a hierarchy of business objects. Listing Two shows how this is done programmatically.

Listing Two: Exploring olap4j metadata

final OlapDatabaseMetaData meta =
oConn.getMetaData();
// We can explore the meta data in a tabular form
final ResultSet catalog = meta.getCatalogs();
// We can also explore using the object model
final Schema schema = oConn.getOlapSchema();
for (Cube cube : schema.getCubes()) {
System.out.println(cube.getName());
}

By convention, OLAP connections are bound to a database, a catalog, and a schema. Although a user can modify the binding easily by simply invoking setters or passing JDBC arguments in the connection's URL, connection implementations are expected to bind automatically to the first elements of the hierarchy encountered, should none be specified. Users of olap4j should keep this in mind when using an OLAP connection's sugar accessors, such as getOlapCatalog(). Retrieving a list of Catalogs implies a prior binding to a Database. If you have not set a specific database to use, the connection will bind itself to the first one it encounters, then return all catalogs within this database.

Parsing and Validating Queries

We are now ready to query the server for data. Let's write a first query and parse it (Listing Three). The parser will check it for correct grammar.

A SelectNode object is a representation of your query, as interpreted by your connection's specific grammar. The API does not endorse any singular MDX grammar (so as to keep the project independent of OLAP vendors). Instead, the connection itself has to expose a parser that is capable of bridging the grammar with olap4j's type system.

Once the textual query is parsed, you are left with an object representation of it, but there is no telling if this query, aside from being grammatically correct, is valid. To determine if the query is executable and will return data, the query validator (Listing Four) is used.

The validator is a helper object used to validate queries further. It looks up all members and hierarchy elements referenced in your query and makes sure that they exist on the target server. In our example, indeed, there is no such thing as cake time [alas! –Ed.].

Validating the query might seem like overhead, as executing the query straight away will indeed tell you whether the query is valid or not. But when a query is validated, only the metadata is evaluated against the server, and no data is computed or returned. This means that the server won't have to compute all the tuples of members and associated data. Validating a query is a very fast operation compared with executing it. If you are developing a GUI where a user sits in front of the screen until an answer is given, you should keep that in mind.

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!