But first, following is the setup for the various demos to come. I basically create one table called BIG_BOWIE that's about 1GB in size and then simply create another table called DWH_BOWIE, into which these contents are re-inserted a few times to get to about a 60GB table. The various columns have differing numbers of distinct values and differing data distributions.
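For those interested, the setup looked something along these lines. Note this is only a sketch: the exact columns (beyond ALBUM_ID and ARTIST_ID), row counts and number of doubling inserts are my assumptions, not the original script.

CREATE TABLE big_bowie (
  id        NUMBER,
  album_id  NUMBER,       -- relatively few distinct values
  artist_id NUMBER,       -- many distinct values
  details   VARCHAR2(80)
);

-- Populate BIG_BOWIE to roughly 1GB (generator sizes are assumptions)
INSERT /*+ APPEND */ INTO big_bowie
SELECT rownum,
       MOD(rownum, 100) + 1,
       CEIL(DBMS_RANDOM.VALUE(0, 100000)),
       RPAD('DAVID BOWIE', 70, 'X')
FROM   (SELECT 1 FROM dual CONNECT BY level <= 10000),
       (SELECT 1 FROM dual CONNECT BY level <= 1000);
COMMIT;

-- Create DWH_BOWIE and keep doubling it until it reaches about 60GB
CREATE TABLE dwh_bowie AS SELECT * FROM big_bowie;

INSERT /*+ APPEND */ INTO dwh_bowie SELECT * FROM dwh_bowie;
COMMIT;
-- ... repeat the above insert/commit a few more times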

I used an X2-2 half rack as my toy and yes, once people saw the table names, they instantly knew who created them :)

We see that execution times reduce significantly, down to just 4 seconds. So we manage to read a 60GB table via a FTS in under 4 seconds, not bad at all.
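The query itself was of this general form (the specific predicates and literal values here are assumptions for illustration, not the exact demo query):

SET TIMING ON

SELECT * FROM dwh_bowie
WHERE  album_id  = 42
AND    artist_id = 42;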

A hint that something different might have occurred here is the appearance of the “storage” predicate in the plan's predicate listing. This basically tells us that a smart scan “might” have occurred. Note though that the physical reads and consistent gets as reported by the database are basically the same as in the previous run. More on this in later posts.
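One easy way to see this storage predicate is to display the actual plan of the statement just executed in the session:

-- Shows the Predicate Information section, including any
-- storage(...) predicate, for the last statement run in the session
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

Remember though that the storage predicate only tells us the statement was eligible for offload; it doesn't by itself prove a smart scan actually took place.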

The reason for this improvement is due in large part to the use of Exadata Storage Indexes. As discussed previously, a Storage Index can automatically avoid having to read significant portions of data by determining areas of storage that can't possibly contain the data of interest. This is a classic example of Storage Indexes in operation, putting into practice the notion that the quickest way to do something is to avoid doing as much of the work as possible.

But how can we tell whether a Storage Index really did kick in, and how much work did it actually save?

The V$SQL view now has a number of additional columns that provide useful information:
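For example, a query along these lines pulls out the offload-related columns (the SQL_ID here is just a placeholder for the statement of interest):

SELECT sql_id,
       io_cell_offload_eligible_bytes,  -- bytes eligible for smart scan offload
       io_cell_offload_returned_bytes,  -- bytes actually returned by a smart scan
       io_interconnect_bytes            -- bytes shipped over the storage interconnect
FROM   v$sql
WHERE  sql_id = '&sql_id';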

There are also numerous new statistics beginning with ‘cell ‘ which I generally capture before and after a particular operation to see the storage-related workloads. Two statistics I find particularly useful are the following:

The ‘cell physical IO bytes saved by storage index‘ statistic denotes how much data did not have to be physically read thanks to Storage Indexes. So this tells us just how useful Storage Indexes have been in reducing physical IO operations. The ‘cell physical IO interconnect bytes returned by smart scan‘ statistic denotes how much data was actually returned to the database servers as a result of a smart scan.
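To capture them for the current session, something along these lines does the job; run it before and after the query of interest and compare the deltas:

-- Snapshot the two cell statistics for the current session
SELECT sn.name, ms.value
FROM   v$mystat ms, v$statname sn
WHERE  ms.statistic# = sn.statistic#
AND    sn.name IN ('cell physical IO bytes saved by storage index',
                   'cell physical IO interconnect bytes returned by smart scan');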

As the above numbers highlight (note this last query had been the only activity within the session), the Storage Indexes were highly effective: they allowed the vast majority of the table (59,414MB) to be physically skipped during the Full Table Scan operation, and only a relatively small amount of data (26MB) had to be returned to the database servers.

By not having to read most of the data, the resultant Full Table Scan on this relatively large table was completed not in minutes as previously, but in a matter of a few seconds.


Thanks a lot for sharing this with us.
In an attempt to contribute to your post, I'll place some considerations/questions here in order to find out whether or not you agree with them.

Would you agree that without performing previous queries against the dwh_bowie table, there would not be any benefit added by the storage indexes as they would not be available at all?

Also, a friend of mine noticed that when the query is executed for the second time (after setting cell_offload_processing=true), the buffer cache is not flushed. Some of the blocks could be in DRAM. Do you agree?

Previous queries with predicates on the album_id and artist_id columns need to have been run in order for the storage indexes to have been created (or partially created). So yes, there is a “warming up” process that needs to take place for storage indexes to be created. I'll discuss this in my next post.

No, I don't agree with your second comment that blocks are necessarily in DRAM. This is a “BIG” table and Oracle knows that storing a BIG table in DRAM is likely a waste of memory, so Oracle protects the buffer cache. One way it now does this is to perform a “direct read” for large FTS operations, reading data not into the SGA but into the session's PGA, so as not to flush useful shareable data out of the cache with such large scans.
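A quick way to confirm this behaviour is to watch the session's direct read statistics; with a large direct-path FTS, ‘physical reads direct‘ climbs in step with ‘physical reads‘ while the buffer cache is left largely untouched:

-- Compare before and after the FTS: for a direct-path scan,
-- 'physical reads direct' rises together with 'physical reads'
SELECT sn.name, ms.value
FROM   v$mystat ms, v$statname sn
WHERE  ms.statistic# = sn.statistic#
AND    sn.name IN ('physical reads', 'physical reads direct');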

If you re-run the first query again and again without the smart scan, you will notice that the physical I/Os don't decrease and the same data is physically re-read every time, because the 60GB of table data is not being stuffed into the buffer cache (not that it's big enough to store the whole table anyway, and the early table blocks would have been flushed out regardless).

Note that the smart scan is likewise performing a direct read and so also bypassing the buffer cache, but it's the Storage Indexes, by avoiding having to physically read vast amounts of the data, that make most of the difference here.

Awesome yet simple explanation of Storage Indexes. I had read these posts earlier and I can appreciate them even more now that I have started working on Exadata.
I was wondering why the physical reads and consistent gets reported by the database with the Storage Index are the same as in the run without the Storage Index. Should we not expect a reduction in physical reads, since with the Storage Index we have skipped reading a good portion of the table blocks from storage (the 59,414MB saving)?

The reason for this is that the database has no idea that Storage Indexes have potentially kicked in and skipped a portion of the table blocks at the storage layer. So the database reports the same reads as if the entire table had been read off disk.