Shrink your Tables with SAP HANA SP08

Abani Pattanayak, SAP HANA COE (Delivery)

Jako Blagoev, SAP HANA COE (AGS)

Introduction:

Yes, with HANA SP08 you can significantly reduce the size of your FACT tables. Depending on the size of the primary key and the cardinality of the dataset, you can get significant (up to 40%) savings in static memory usage.

This saving in memory is relative to HANA SP07 or earlier revisions.

So what’s the catch?

There is no catch.

The saving comes from how the primary key of the table is stored in HANA. Check the biggest FACT table in your SP07 or SP06 HANA database: the primary key will typically account for around 30 – 40% of the total size of the table.

With HANA SP08, we can eliminate this 30 – 40% of memory taken by the primary key, and there is no negative impact on query performance.
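You can verify this on your own system: in the column store, the concatenated primary key is held in the internal column $trexexternalkey$, whose footprint shows up in the monitoring views. A sketch (schema and table names are placeholders for your own):

```sql
-- Memory used by the internal concatenated primary-key column of a table
-- ("$trexexternalkey$" is the internal key column; names are placeholders)
SELECT COLUMN_NAME,
       MEMORY_SIZE_IN_TOTAL
  FROM M_CS_ALL_COLUMNS
 WHERE SCHEMA_NAME = 'SAPSR3'
   AND TABLE_NAME  = 'MY_FACT_TABLE'
   AND COLUMN_NAME = '$trexexternalkey$';
```

Compare this value against the table's total size from M_CS_TABLES to see what fraction the key consumes.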

Show me the Money (What’s the trick)?

You need to recreate the primary key of the table with the INVERTED HASH option.

CREATE COLUMN TABLE "SAPSR3"."MY_FACT_TABLE" (
    "STORID" NVARCHAR(10),
    "ORDERID" NVARCHAR(15),
    "SEQ" NVARCHAR(10),
    "CALMONTH" NVARCHAR(6),
    "CALDAY" NVARCHAR(8),
    "COUNTRY" NVARCHAR(3),
    "REGION" NVARCHAR(3),
    ..
    ..
    PRIMARY KEY INVERTED HASH ("STORID", "ORDERID", "SEQ",
        "CALMONTH", "CALDAY", "COUNTRY", "REGION"))
WITH PARAMETERS ('PARTITION_SPEC' = 'HASH 8 STORID');

You can use the ALTER TABLE command to drop and recreate the primary key of the table.
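For an existing table, the drop-and-recreate could look like the sketch below (the constraint name is a placeholder; note that between the two statements the uniqueness constraint is gone, so run this in a maintenance window):

```sql
ALTER TABLE "SAPSR3"."MY_FACT_TABLE" DROP PRIMARY KEY;

ALTER TABLE "SAPSR3"."MY_FACT_TABLE"
  ADD CONSTRAINT "MY_FACT_TABLE~KEY"
  PRIMARY KEY INVERTED HASH ("STORID", "ORDERID", "SEQ",
                             "CALMONTH", "CALDAY", "COUNTRY", "REGION");
```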

However, if you have a scale-out system or a really BIG fact table with billions of records, we'd highly recommend creating a NEW table with an INVERTED HASH primary key, copying the data over to the new table, and then renaming the tables.
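A minimal sketch of that copy-over approach (table names are placeholders; the new table uses the same DDL as the CREATE statement above, just with a temporary name):

```sql
-- 1. Create the new table with the INVERTED HASH primary key
--    (same column list and PRIMARY KEY INVERTED HASH clause as above)
CREATE COLUMN TABLE "SAPSR3"."MY_FACT_TABLE_NEW" ( .. );

-- 2. Copy the data over
INSERT INTO "SAPSR3"."MY_FACT_TABLE_NEW"
  SELECT * FROM "SAPSR3"."MY_FACT_TABLE";

-- 3. Swap the tables (keep the old one until the copy is verified)
RENAME TABLE "SAPSR3"."MY_FACT_TABLE"     TO "MY_FACT_TABLE_OLD";
RENAME TABLE "SAPSR3"."MY_FACT_TABLE_NEW" TO "MY_FACT_TABLE";
```

After verifying row counts match, the old table can be dropped to reclaim the space.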

Result

The following is the result of updating the primary key in a customer project. As you can see below, the saving in static memory is around 531 GB out of 1980 GB.

So overall, there is a saving of at least 2 nodes (0.5 TB each) in a 9-node scale-out system.

The best part of this exercise: there is no negative impact on query performance.

Note: I’d suggest you review your existing system and evaluate if you can take advantage of this feature.

I think it is probably too early to say there are NO negatives for all use cases. For our scenario it worked out pretty well. I believe this is going to be tested further in the coming months by SAP applications (BW, ERP, etc.), so depending on the results, it may (or may not) become the default setting for certain cases.

Good point – even on traditional databases, SAP BW never used a primary key / primary index on fact tables (/BIC/F*, /BIC/E*). On the other hand, the INVERTED HASH approach should also help in other scenarios.

At some customers, SLT pretty much already handles uniqueness in a table: it decides whether the new data is an insert, update or delete and issues the appropriate command once it gets to the HANA box. So for sidecar scenarios using SLT, given that performance is the same, wouldn't it be reasonable to eliminate the PK altogether?

That's certainly a possibility as long as data integrity can be maintained (in this scenario SLT will take care of that). However, you have to evaluate what saving you actually get by eliminating the primary key. Using an inverted hash primary key, you are already getting pretty significant savings.

Sure, the happy path of code execution won’t likely lead to duplicate entries.

But how would you even know about duplicates when there’s no primary key constraint in place?

While it’s surely not necessary to have the PK on the replication to keep the single true record consistent (that happens in the source system) it would be still a nice thing to know when the data is not consistent in itself anymore.

And SLT did not have an “auto-fix” function, last time I looked…

As for the fact tables: correct, there is no actual need to have a primary key here as we can deal with duplicates during the aggregation.
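Without a primary key constraint, such duplicates would only surface through an explicit check. A sketch of a periodic duplicate check over the logical key columns (column names taken from the example fact table earlier):

```sql
-- Find key combinations that occur more than once
-- (the column list is a placeholder for the table's logical key)
SELECT "STORID", "ORDERID", "SEQ", COUNT(*) AS CNT
  FROM "SAPSR3"."MY_FACT_TABLE"
 GROUP BY "STORID", "ORDERID", "SEQ"
HAVING COUNT(*) > 1;
```

An empty result means the replicated data is still unique on those columns; any rows returned are candidates for investigation in the source system.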

I think that might be the wrong term. SLT "SHOULD" deliver data to the target table that will not violate the existing constraints of the source table – which of course the DDL should also match. That has nothing to do with actions that can happen outside the replication process, so technically SLT does not "take care" of consistency in the target table; I agree with you on that, Lars. There should, however, be reasonable assurance that without a PK, SLT should (there's that word again) act as a proxy for constraint handling.

SLT doesn’t have an auto-fix function, but I believe there is some functionality I have heard of where you can do a complete table compare between source and target, maybe Tobias Koebler would have more detail on that one.

"But how would you even know about duplicates when there's no primary key constraint in place?"

That’s the hard part and major risk of the trade off. The entire context of this original post was to minimize the amount of memory used for indices. My interest on behalf of a customer was a similar thing, but I was thinking even farther than the hash conversion.

To give an example, I took a standard SLT-replicated table, copied it and converted the PK index to hash, and then took another copy and dropped the PK altogether. After ensuring it was all loaded into memory, these are some of the results I got from a memory-size perspective. For reference, this table had about 900k entries.

So similar to Abani’s results, I saw a 35% reduction in index size with the hash conversion. Taking it further and dropping the PK took this to 68% reduction, or a total size that was 3.1x smaller than the original. That would be fairly significant with larger volumes.
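To reproduce such a comparison, the total in-memory footprint of each variant can be read from the monitoring views once the tables are fully loaded. A sketch (schema and table names are placeholders for the original, hash-PK and no-PK copies):

```sql
-- Ensure the tables are fully loaded before comparing
-- (repeat the LOAD for each of the three copies)
LOAD "MYSCHEMA"."TAB_ORIG" ALL;

-- Compare the in-memory footprint of the three variants
SELECT TABLE_NAME,
       MEMORY_SIZE_IN_TOTAL
  FROM M_CS_TABLES
 WHERE SCHEMA_NAME = 'MYSCHEMA'
   AND TABLE_NAME IN ('TAB_ORIG', 'TAB_HASH_PK', 'TAB_NO_PK');
```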

For the performance angle, I did a single-key search (WHERE "MATNR" = 'X'), a range search (WHERE "MATNR" BETWEEN 'X' AND 'Y') and an in-list search (WHERE "MATNR" IN ('X', 'Y')) to see what we got.

Surprisingly, all three methods performed more or less the same, at a consistent 15 ms response time. Given that this is such a low response time and the table has fewer than 1m records, the test needs to be scaled up to give more accurate results; this was just an initial look.

Now, I am not advocating or even SUGGESTING that PKs be dropped; this was simply an investigation into size savings and the trade-off with performance. The constraint benefit that a PK brings doesn't factor into the two topics above. The only reason we get into this is that HANA doesn't allow you to define a PK WITHOUT these indices.

I had similar results at a customer where we had a 1TB cluster table for pricing conditions. I rebuilt the table with a hash PK index and it was 35% smaller.

Now be very careful with table access times, because HANA’s single-user access times are not a good representation of access cost. In single-user mode, you don’t know how much of the NUMA memory throughput of the appliance is being used.

What you really need to do is load test the system with many concurrent threads. This will give you an idea of the attribute vector scan cost. With range partitions and a hash PK on just one table, we increased aggregate SD throughput by 3x, despite having very similar (1ms) single user access times.

In 7.40 Support Package 5, the storage type is changed to column store for the following tables:

EDIDC

EDIDS

INDX

REPOSRC

So if your system is on a lower version, these must still be ROW tables, not column tables.

I would only change the index type if you see significant memory use by the index; otherwise I would not attempt to change to inverted hash. But it is easy to change during the migration, so my question is: what indexes are defined on the REPOSRC table?

The inverted hash is one of two possible implementations for a primary key of a column store table.

Indexes are a different object and not connected to the inverted hash implementation.

It’s correct that it is generally not supported to change table storage parameters like ROW/COLUMN for SAP delivered tables, however constructing a technical reason for the error message out of it is a mistake.

SAP product management announced the inverted hash primary key implementation to be available as of SPS9, so it's definitely nothing to put into production in an earlier release (at least not without specific support for this from SAP).

As with most features, it's highly recommended to understand and try out a feature first (on a test system that can be rebuilt at any point in time without any negative impact).

Concerning the error message itself: I’d consider this a bug and recommend to have a support incident opened for this. There’s no conceptual reason why any ABAP process should have issues with the way that a primary key constraint is implemented on SAP HANA.

Let’s be very clear here: this is a relatively new feature and there’s no recommendation by SAP HANA development to go and implement it (yet).

A much better option to save on storage space is to get rid of concatenated attributes that had been created for joins in pre-SP8 versions.

Hi – just checking whether "INVERTED HASH" is also supported for design-time tables – i.e. a table created by generating an .hdbtable file and activating it? All of our tables are created using this method, and we do have some rather large primary keys we would like to test this out on, but I am unable to determine the correct syntax to use in this case, as the primary key definition in an .hdbtable file looks very much unlike the "create table" referenced in the example above. Something like:

The KB article discusses the potential downsides (everything is a trade-off) of switching to inverted hashes – in particular, hash collisions. What is the impact on the system if a table has a large number of hash collisions – additional performance impact? Less memory savings? The KB article points to http://service.sap.com/sap/support/notes/1969700 for SQL to detect collisions, but neither discusses what should be done when a table has a 'large' number.

Yes, hash collisions are a performance penalty and a rather minor additional space impact.

Since you cannot really change anything about the hash collisions, the information about them is really more of an informative nature right now.

It might be the missing piece of explanation why a certain primary key access is slower than expected.

As to what 'large' means: this completely depends on the hashing function used internally. At some point hashing functions tend to create collisions – but I don't know the actual hashing function used or the threshold at which collisions are to be expected.

Recommending the usage of the feature in very well-known scenarios like BW is definitely a good idea in order to save space.

I know these probably aren't the answers you were looking for, but that's what I can say on this.

What's the current official situation in SAP BW? In SPS9, SPS10, SPS11 or SPS XY? Is the inverted hash going to be the default setting, or merely an SAP-sanctioned alternative that you have to implement manually for every datamart?