Big Data: ½ PB in #BWonHANA

We have seen how bringing as much data into HANA, SAP’s in-memory platform, as is needed for real-time decision support – from any source – is revolutionizing interactive analysis of live data. HANA scales linearly across hundreds of nodes to analyze as much data as needed within a window of opportunity. Hasso’s demo of 4 billion records of retail data (starting at approx. 1:01:30) at SAPPHIRE and Vishal’s earlier announcement of 1 PB of raw data in main memory already demonstrate the scalability of HANA.

However, for information lifecycle reasons (e.g. archives, historical data, years of clickstream logs or detailed machine logs, corporate data retention policies such as retaining all data for 7+ years for legal purposes), customers often use cold storage strategies. The challenge has been to integrate such a cold storage area smoothly and transparently, so that all of HANA’s functions can still work on that data if required.

In the SAP TechEd keynote in Amsterdam, a dashboard demo was shown (starting at approx. 41:00) that worked on top of a BW system with HANA as the underlying DB, leveraging a new, innovative feature inside HANA, namely extended tables. These are tables that logically sit in HANA and can be used as if they were normal HANA tables. Physically, however, they sit in a Sybase IQ server that is closely tied to that HANA system. This makes it possible to provide an area for “cold data” – i.e. infrequently used but important data, e.g. in the corporate memory of an EDW – at an attractive price point, at the expense of somewhat reduced performance. The BW-HANA-IQ system holds ½ PB of data in total. This blog describes some of the background behind that demo and the feature it showcases.

Purpose: Cold Data Areas

Data warehouses typically have areas of cold data:

The acquisition or landing area receives the data as it arrives from the source systems. It is accessed once for refinement and harmonization, and then “waits” there until it is deleted or archived. The latter takes place only after some days, weeks or months, i.e. once it is guaranteed that the data has been successfully incorporated (harmonized, refined, transformed) into higher layers (e.g. reporting).

Similarly, a corporate memory area captures the complete history of the loaded data. It is used as a source for reconstructions without the need to access the sources again. For instance, there are internet companies that keep 10-15 years of clickstream history in their corporate memory.

Data sitting in such cold areas – in real-world scenarios, typically 40-60% of the data volume of a data warehouse falls into this category – does not need to occupy main memory or other resources in HANA. It makes sense to provide an area within HANA, without any functional restrictions, that matches the usage profile of that type of data. This is what is referred to as extended storage. Technically, it is implemented by leveraging the infrastructure of Sybase IQ; to the user, this is not visible.
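The hot/cold split can be pictured as a simple routing rule based on data age. A minimal Python sketch of the idea – the threshold and tier names are hypothetical, not part of HANA or BW:

```python
from datetime import date, timedelta

# Hypothetical retention threshold: records newer than this stay in-memory.
HOT_RETENTION_DAYS = 90

def storage_tier(record_date: date, today: date) -> str:
    """Route a record to in-memory ("hot") or extended ("cold") storage by age."""
    age = today - record_date
    return "hana_in_memory" if age <= timedelta(days=HOT_RETENTION_DAYS) else "iq_extended"

# Recent data lands in HANA main memory; old history goes to extended storage.
print(storage_tier(date(2013, 11, 1), date(2013, 11, 15)))   # recent record
print(storage_tier(date(2005, 3, 1), date(2013, 11, 15)))    # old history
```

In practice the split in BW is decided per object (see the extended table property below), not per record, but the age-based reasoning is the same.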

Extended Tables

So what is an extended table? In simple terms: it is a table definition sitting in the HANA catalog but actually pointing to a table in a connected Sybase IQ server. The latter acts as extended storage for HANA. An extended table is similar to a virtual table, but there is more to it, as it is more tightly integrated into HANA than a plain virtual table, e.g.:

optimized data transfer between HANA and IQ – e.g. type conversions

data processing is pushed to IQ

monitoring in HANA Studio

joint backup & recovery across HANA and IQ – i.e. as if it were one homogeneous DB instance
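Conceptually, an extended table is a local handle that pushes work down to where the data physically lives. A minimal Python sketch of that pushdown idea – all names here are illustrative, not HANA or IQ APIs:

```python
class ExtendedTable:
    """Sketch of a catalog entry that behaves like a local table but
    delegates storage and filtering to a remote cold store."""

    def __init__(self, name, remote_rows):
        self.name = name
        self._remote_rows = remote_rows  # stands in for the IQ-resident data

    def select(self, predicate):
        # The predicate is "pushed down": it is evaluated where the data
        # lives, and only qualifying rows are transferred to the caller.
        return [row for row in self._remote_rows if predicate(row)]

# The caller sees an ordinary table; the filtering happens "remotely".
sales_2005 = ExtendedTable("SALES_2005", [
    {"region": "EMEA", "amount": 10},
    {"region": "APJ", "amount": 7},
])
emea_rows = sales_2005.select(lambda r: r["region"] == "EMEA")
```

The point of the pushdown is that only matching rows cross the HANA-IQ boundary, rather than the whole cold table.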

The Demo Scenario

The demo runs on a BW-on-HANA system with an IQ system connected as extended storage. The IQ system holds ½ PB of raw data (≈ CSV file data). In the demo, a simple dashboard was shown that was built with Design Studio. The dashboard runs unchanged on both an iPad and a desktop browser. It uses a BW query that sits on top of a BW composite provider. The latter comprises 167 write-optimized DSOs, one for each fiscal period between January 2000 and November 2013. All write-optimized DSOs have been created with the extended table property in BW – see figure 3 – except the one for November 2013, which “lives” in HANA, i.e. in in-memory storage. Each write-optimized DSO holds approx. 2 billion rows, translating into approx. 320 billion rows in total for that composite provider.

Fig. 3: The extended table property for a write-optimized DSO in BW.

The dashboard can be seen in figures 4 and 5. The initial access (fig. 4) reads data only from the write-optimized DSO that sits completely in-memory (November 2013). There are bars that indicate how much data has been selected in each storage (HANA and IQ); in the first access, it is approx. 2.8 million rows in HANA and 0 rows in IQ. The second drill-down (fig. 5) incorporates the data from Nov 2012 to Oct 2013 and, thus, accesses the write-optimized DSO in IQ as well: approx. 288 million rows are selected from there. While the first drill-down takes less than 1 second, the second step takes about 9 seconds.

Fig. 4: Result of drill step 1 in the demo dashboard.

Fig. 5: Result of drill step 2 in the demo dashboard.
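The partitioning behind the dashboard can be sketched as follows: one partition per fiscal period, only the newest held in-memory, and a period-range query touching hot and cold storage separately (as the bars in the dashboard show). A Python illustration – the counts are per partition touched, standing in for the demo’s ~2 billion rows each, and none of the names are BW/HANA APIs:

```python
def make_partitions():
    """One entry per fiscal period, Jan 2000 .. Nov 2013 (167 periods).
    Only the newest period (Nov 2013) is "hot", i.e. in-memory."""
    periods = [(y, m) for y in range(2000, 2014) for m in range(1, 13)]
    periods = periods[:167]  # drop Dec 2013: Jan 2000 .. Nov 2013
    return {p: ("hot" if p == (2013, 11) else "cold") for p in periods}

def rows_selected(partitions, start, end):
    """Count partitions read per storage tier for a period-range query."""
    counts = {"hot": 0, "cold": 0}
    for period, tier in partitions.items():
        if start <= period <= end:  # tuples compare lexicographically
            counts[tier] += 1
    return counts

parts = make_partitions()
# Drill step 1: only Nov 2013 -> in-memory only.
print(rows_selected(parts, (2013, 11), (2013, 11)))
# Drill step 2: Nov 2012 .. Nov 2013 -> one hot plus twelve cold partitions.
print(rows_selected(parts, (2012, 11), (2013, 11)))
```

This mirrors the demo: the first drill step touches only the in-memory DSO, the second additionally pulls the twelve preceding periods from extended storage.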

Availability

The extended table feature is technically available with the following product versions:

Conclusions

So, what are the benefits? Fundamentally, this feature allows BW-on-HANA to handle PB-scale big data volumes at an attractive price point. This is especially important for catering to the cold data areas of an EDW, such as the staging (acquisition) layer or the corporate memory.

Comments

You bring up several good points – first, yes, IQ is a very cool product!

I have worked with IQ for several years now and am originally from Sybase. The great thing is that by utilizing BW on HANA with IQ and HANA extended tables, SAP is furthering its commitment to the awesome IQ product line.

To address the concern regarding extended-storage DSOs in IQ and the limitation on the length of technical names for BW objects: in the initial rollout, HANA extended tables in BW will be used for individual PSA tables and write-optimized DataStore object tables. BW naming conventions are in effect for this use case, but they apply to a relatively small set of objects. The extended table feature is projected to be available for native HANA in an upcoming HANA SP release, and extended tables will then follow the naming conventions that SAP HANA has for HANA tables.

Cost concerns: you will not be doubling your costs by using SAP HANA and SAP IQ together.

It is true that HANA and IQ are two different RDBMSs, but SAP IQ is included as part of HANA licensing for BW.

IQ will handle a large amount of cool/warm data that you would normally need to factor into the sizing of your HANA hardware. IQ therefore brings significant cost savings, since it provides disk-backed technology to offload and manage data that does not require the extreme low latency of HANA in-memory.
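The sizing argument can be made concrete with a back-of-the-envelope calculation – the numbers below are hypothetical, using the 40-60% cold-data share mentioned earlier in the post:

```python
def hana_memory_needed_tb(total_data_tb: float, cold_fraction: float) -> float:
    """RAM to size for when cold data is offloaded to extended storage (IQ):
    only the hot fraction must fit into HANA main memory."""
    return total_data_tb * (1.0 - cold_fraction)

# Hypothetical 100 TB warehouse with half of its volume cold:
# HANA needs to be sized for roughly 50 TB instead of the full 100 TB.
print(hana_memory_needed_tb(100.0, 0.5))
```

The exact savings depend on the actual hot/cold split of the installation; the point is only that cold data drops out of the in-memory sizing.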

Labor: it is all managed through the BW layer, which is transparent to the user and therefore does not require any additional training or special skills.

I hope this alleviates your concerns and will further confirm SAP’s commitment to making data warehousing “simpler” for our customers!

However, when you create an extended-table DSO, how do you then connect this extended table to one in Sybase IQ? When we connect to a Hadoop file system through Hive, we create a virtual table on top of a Hive table that is part of the remote source. How do the dots connect in the case of an extended table in HANA and a table stored in IQ?

Long story short: extended storage is in a pilot stage for the BW use case only. See OSS note 1983178 – http://service.sap.com/sap/support/notes/1983178. For native HANA, it is not supported yet; it is planned for HANA SP9. Have a look at the document “Big Data Management in HANA BW.pdf” provided in that note.

I’m a bit confused about Sybase IQ and HANA. If a company purchases a 500 GB HANA appliance, it comes with 5 TB of disk. Is this Sybase IQ, and does the extended table reside here? Or do I need to stand up another DB server with Sybase IQ?

For the pilot phase, a separate IQ instance is necessary, but for GA there will be a service – running on the HANA instance – that handles the extended storage. So, no separate IQ instance required! Some more details can be found in the docs attached to OSS note 1983178 – https://css.wdf.sap.corp/sap/support/notes/1983178.

Thanks for deciphering the “extended tables” concept for us.
The question I am pondering is: does that mean we would be duplicating the table/data – one copy in the extended table (IQ) and the other on SSD disk? If that is the case, then this contradicts SAP’s in-memory statement of “store once and access from anywhere”!