An agile and relevant BI foundation built on SAP HANA

If you are active in the SAP ecosphere and you have somehow managed to never see or hear anything about SAP HANA, I can only conclude the following. You obviously live under a rock in the deepest darkest forest on a planet in a distant galaxy where SAP has yet to establish a sales territory. All joking aside, it is very unlikely that SAP HANA has not been at the forefront of most SAP related discussion in recent years. For those that know about SAP HANA, I would bet that your first thought is always related to one word. That word is “fast”. SAP HANA is an in-memory, columnar store and massively parallel analytical data processing engine. For the average business intelligence (BI) consumer this conglomeration of adjectives and technical terms mean one thing; it’s fast. If we accept that SAP HANA is fast, we have to ask ourselves what do I gain with speed? We then have to ask is speed alone a primary reason to purchase SAP HANA?

What do I gain with speed?

Below I will list a few of the more obvious reasons that speed maters. There are other reasons not listed but this will help start the thinking process.

Productivity

It is very easy to argue that we are more productive when we spend less time waiting for software to respond. In the golden oldie days of BI, we would often endure a query that required 30 + minutes to execute. That was 30 minutes of time we had to do something else. I like to call these queries the “coffee break queries”. This is because we often use this time to go get coffee. If we can reduce these queries to seconds with SAP HANA, users will be more likely to stay engaged and remain productive.

Depth

Speaking of 30 minute queries, can you imagine how unwilling a consumer would be to explore their data if every perspective change required a 30 minute wait time? My experience is that users quickly loose interest in exploring their data when the software is too slow to respond. If we could reduce these queries to seconds, users will be more likely to not only be productive but to also dig a little deeper into their data. Think of SAP BusinessObjects Explorer running on SAP HANA. Users can take a billion row dataset and explore it from multiple perspectives. Because of speed, we can now gain better depth into our data.

Data Loading

When we think of speed, we often focus on the consumer’s experience. However, daily IT processes can also be accelerated with SAP HANA due to the increased data loading capabilities. If your legacy data load process required 14+ hours, most SAP HANA solutions will likely reduce that by more than 50%. This gives IT departments more time to recover from data load failures or to extend the scope of the data set. It also provides the data consumer with a greater chance that their data will be available the next day.

Competitive Edge

As a general rule, if we can outthink and respond faster to a changing economic climate, we often gain a competitive edge over our competition. In some ways, SAP HANA can helps us do that. SAP HANA cannot think for us but it can help us discover trends, identify changes, understand our successes, identify our failures and accelerate other areas of our business. Most organizations can already do all of these things effectively with their current BI environments. However, imagine how much better it will be when it happens faster with larger and more complex datasets.

Customer satisfaction

Another competitive advantage for organizations is their ability to service their customers. Speed can help an organization better understand their customer by instantly analyzing past history and predicting future needs. This can be done quickly and on the fly when we have speed. With that advantage we can start embedding sophisticated analytics in our frontend applications. Take for example a call center. If we enter the customer’s details into a system, we can then have a series of analytics pop-up that help us understand the customer relationship to our organization. We can also suggest products and services they might be interested in based on these analytics. This is all achievable because of speed.

Is speed alone the main reason for purchasing SAP HANA?

Query and processing speeds alone are not the only reasons to purchase SAP HANA. There is more to SAP HANA than just speed. Speed allows an organization to achieve technical wonders but behind the speed is a process. There must be a process for obtaining data that is consistent, repeatable and reliable. For years we have known this as the Enterprise Information Management (EIM) process. Using processes and tools, organizations can obtain data, model it into usable structures and then load it into a database for querying. The process also helps other initiatives such as data governance, data quality and a “single version of the truth”.

The down side to the processes is that obtaining new data is often seen as slow and cumbersome by the data consumers. They often have to wait weeks or months before data is available to them. This happens for a variety of reasons. IT resources often have to first find the data, develop a processes to capture it on a recurring basis and then deal with any gaps or inconsistencies. That is not necessarily a problem that can be overcome with any level of speed or technology. The hope is that these complexities subside after a strong data governance program is instituted. One where the organization begins to manage data as an asset as opposed to a byproduct. In short, only processes and management can fix part of the EIM problem.

However, there is another side to EIM that has evolved around the needs of the data consumer. There is more to data than just storing it in a Kimball / Inman star schema of tables. Users have to be able to construct or access queries that can answer relevant business questions. If we solve these problem with the EIM process alone, we often end up with extra aggregate tables, custom fact tables or a variety of different tables formulated to help query processing. Unfortunately, these extra tables add more and more time to the recurring data update process. As the processing time and complexity of managing these processes increases, the data consumer becomes more and more impatient. With the traditional RDBMS, these extra tables are also sometimes required because of the inefficiencies of the row store and spinning disk. With SAP HANA, we can begin rethinking this strategy on all fronts. SAP HANA’s speed helps us reshape this strategy but there is more to this story than speed.

Let’s not forget the SAP HANA also has something special built-in to its data platform. SAP HANA has multidimensional models built directly into the database. By name they are called attribute views, analytic views and calculation views. In a generic since, they act as a semantic layer. They are often called information views or information models. They act as a layer that exists between the database tables and the data consumers. They provide easy, consistent and controlled access to the data.

In a traditional BI landscape, this semantic layer is typically separated from the RDBMS. However, with SAP HANA it is directly integrated. In some BI products, this layer acts as a separate data store. Take for example the traditional OLAP cube. OLAP cubes, using MOLAP storage, not only store the metadata but they also store the actual data. They act like a supercharged BI database but they require extra time to load. Then there are technologies such as the SAP BusinessObjects universe. It takes a ROLAP approach where data is kept in the database and only metadata is stored in the semantic layer. This cuts down on the data load times but the semantic layer interface is typically proprietary. This means its not universally possible for most BI tools to access the semantic layer. In most cases the same is true of the OLAP cube because not every BI tool has an interface to its data. The MOLAP and ROLAP methodologies both have multiple benefits which are beyond the scope of this posting. However, one thing is clear with the legacy semantic layer. It has traditionally been separated from the RDBMS and it often only works with a limited set of BI tools.

SAP HANA is different, though. To understand how it is different lets answer a few questions. One, how does SAP HANA help the overall EIM process? Two, how do we leverage the SAP HANA multidimensional information models to aid consumers?

SAP HANA helps the overall EIM process by eliminating the need to create special use tables and pre-aggregate tables. With SAP HANA, these tables can be converted into logical information views and accessed with SQL. By eliminating these persistent tables, we are reducing the time required to execute the recurring data update process. These same information views also help IT to make changes faster. By reducing the number of physical tables in the model, IT developers are able to make changes without the need to also update data found in downstream tables. The basic idea is that logical information views are more agile than maintaining the equivalent physical table.

If we assume that agility is an important benefit of SAP HANA, we must also consider relevancy. Getting content to the consumer quickly is part of the battle. We also have to make sure that the content is relevant. This means that we need to make sure that the content helps the consumer answer relevant questions. This is also an area where the information view helps the consumer. Developers can create custom logical models that contain relevant data attributes and measures. Because they are embedded into the RDBMS and centrally located, these views can also be accessed by any tool that supports the SAP HANA ODBC, JDBC or MDX drivers. This helps an organization to maintain a single version of the truth while simultaneously providing an agile and relevant foundation for the consumption of data.

The above figure helps to illustrate this benefit. Starting at the bottom we wrap the entire process into the Data Quality and Data Governance umbrella. This means that these processes govern everything we do in SAP HANA. We then focus on how the data gets into SAP HANA tables. This can be a daily batch Extract Translate Load (ETL) processes with SAP Data Services or a daily delta load using the BW extractors into a BW on HANA instance.

We also have to look at the Extract Load Translate (ELT) processes. This includes instances were SAP Landscape Transform (SLT), SAP Replication Services or SAP Event Stream Processor (ESP) is used to provision data, in real-time, into SAP HANA. As an added bonus we can also add BI solutions built into SAP Business Suite on HANA in this area.

The provisioning solution that is used at this layer all depends on the data consumer’s requirements and the organization’s needs. However, the goal at the data provisioning layer is to minimize the amount of data duplication and eliminate the need for subsequent processing into additional tables. Data should be stored at its lowest level of granularity in either a normalized or denormalized fashion. Ideally it is only stored in a single location that can be reused in any information model. For example, this means we can focus on developing the base FACT and DIMENSION tables, DSOs or replicating normalized ECC tables. Anything that is required above this layer should be logical. This is not to say that we can facilitate all needs with logical models, but it should be the preferred methodology to facilitate an agile foundation.

Above the physical table layer we also have what I am calling the “Business View” layer. This is the layer where we achieve two goals. One, we try to do the bulk of specialized transformations and calculations at this layer. The SAP HANA information views are logical and do not require the movement of data. This makes them agile when changes are required. Second, we try to construct views that facilitate the needs of the data consumer. By maintaining relevant views, data consumers will be satisfied productive decision makers.

At the top of the diagram we have the SAP BusinessObjects BI tools. Ideally we would use SAP BusinessObjects but other 3rd party tools can also be considered. Regardless of the type of tool, SAP HANA information views should be the primary point of access. This is because they should contain all of the relevant transformations, security and calculations. Again, they are centralized which helps foster a single version of the truth throughout the organization.

In summary, beyond speed there are two other fundamental benefits of SAP HANA. Agility and Relevancy are also benefits of SAP HANA. Regardless of the SAP HANA solution we choose, agility and relevancy can be leveraged. They might be implemented in slightly different ways but the goal should remain the same. As an added benefit we can also expect great performance (in other words “Speed” helps too).