Linux, Oracle, Hadoop.

Here’s a little known feature of Exadata – you can use a Bloom filter computed from a join column of a table to skip disk I/Os against another table it is joined to. This not the same as the Bloom filtering of the datablock contents in Exadata storage cells, but rather avoiding reading in some storage regions from the disks completely.

So, you can use storage indexes to skip I/Os against your large fact table, based on a bloom filter calculated from a small dimension table!

This is useful especially for dimensional star schemas, as your SQL statements might not have direct predicates on your large fact tables at all, all results will be determined by looking up relevant dimension records and then performing a hash join to the fact table (whether you should have some direct predicates against the fact tables, for performance reasons, is a separate topic for some other day :-)

Let me show an example using the SwingBench Order Entry schema. The first output is from Oracle 11.2.0.3 BP21 on Cellsrv 12.1.1.1.0:

Enkitec is the best consulting firm for hands on implementation, running and troubleshooting your Oracle based systems, especially the engineered systems like Exadata. We have a truly awesome group of people here; many are the best in their field (just look at the list!!!).

We all are now part of Accenture and this opens up a whole lot of new opportunities. I think this is BIG, and I will explain how I see the future (sorry, no Oracle Database internals in this post ;-)

In my opinion the single most important detail of this transaction is that both Enkitec and the folks at Accenture realize that the reason Enkitec is so awesome is that awesome techies want to work here. And we don’t just want to keep it that way – we must keep it that way!

The Enkitec group will not be dissolved into the Accenture. If it were, we would disappear, like a drop in the ocean and Accenture would have lost its investment. Instead we will remain an island in the ocean continuing to provide expert help for our existing and new customers – and in long term help Accenture build additional capability for the massive projects of their customers.

We will not have ten thousand people in our group. Instead we will continue hiring (and retaining) people exactly the way we’ve been – organic growth by having only the best, likeminded people. The main difference is, now with Accenture behind us, we can hire the best people globally, as we’ll have operations in over 50 countries. I understand that we won’t likely even double in size in the next few years – as we plan to stick to hiring only the best.

I think we will have a much, much wider reach now, showing how to do Oracle technology “our way” all around the world. With Accenture behind us, we will be navigating through even larger projects in larger businesses, influencing things earlier and more. And on a more personal note, I’m looking forward to all those 10 rack Exadata and 100TB In-Memory DB Option performance projects ;-)

There are a few interesting MOS articles about these files and how/when to get rid of those (don’t remove any files before reading the notes!), but none of these articles explain why these JOXSHM (and PESHM) files are needed at all:

If you haven’t read them – here are the previous articles in Oracle memory troubleshooting series: Part 1, Part 2, Part 3.

Let’s say you have noticed that one of your Oracle processes is consuming a lot of private memory. The V$PROCESS has PGA_USED_MEM / PGA_ALLOC_MEM columns for this. Note that this view will tell you what Oracle thinks it’s using – how much of allocated/freed bytes it has kept track of. While this doesn’t usually tell you the true memory usage of a process, as other non-Oracle-heap allocation routines and the OS libraries may allocate (and leak) memory of their own, it’s a good starting point and usually enough.

Then, the V$PROCESS_MEMORY view would allow you to see a basic breakdown of that process’es memory usage – is it for SQL, PL/SQL, Java, unused (Freeable) or for “Other” reasons. You can use either the smem.sql or pmem.sql scripts for this (report v$process_memory for a SID or OS PID):

From the above output we see that this session has allocated over 100MB of private memory for “SQL” reasons. This normally means SQL workareas, so we can break this down further by querying V$SQL_WORKAREA_ACTIVE that shows us all currently in-use cursor workareas in the instance. I’m using a script wrka.sql for convenience – and listing only my SID-s workareas:

After missing last year’s Hotsos Symposium (trying to cut my travel as you know :), I will present at and deliver the full-day Training Day at this year’s Hotsos Symposium! It will be my 10th time to attend (and speak at) this awesome conference. So I guess this means more beer than usual. Or maybe less, as I’m getting old. Let’s make it as usual, then :0)

I have (finally) sent the abstract and the TOC of the Training Day to Hotsos folks and they’ve been uploaded. So, check out the conference sessions and the training day contents here. I aim to keep my training day very practical – I’ll be just showing how I troubleshoot most issues that I hit, with plenty of examples. It will be suitable both for developers and DBAs. In the last part of the training day I will talk about some Oracle 12c internals and will dive a bit deeper to the lower levels of troubleshooting so we can have some fun too.

It’s long-time public knowledge that X$ fixed tables in Oracle are just “windows” into Oracle’s memory. So whenever you query an X$ table, the FIXED TABLE rowsource function in your SQL execution plan will just read some memory structure, parse its output and show you the results in tabular form. This is correct, but not the whole truth.

Check this example. Let’s query the X$KSUSE table, which is used by V$SESSION:

Now let’s check in which Oracle memory region this memory address resides (SGA, PGA, UGA etc). I’m using my script fcha for this (Find CHunk Address). You should probably not run this script in busy production systems as it uses the potentially dangerous X$KSMSP fixed table:

When the Smart Flash Cache was introduced in Exadata, it was caching reads only. So there were only read “optimization” statistics like cell flash cache read hits and physical read requests/bytes optimized in V$SESSTAT and V$SYSSTAT (the former accounted for the read IO requests that got its data from the flash cache and the latter ones accounted the disk IOs avoided both thanks to the flash cache and storage indexes). So if you wanted to measure the benefit of flash cache only, you’d have to use the cell flash cache read hits metric.

This all was fine until you enabled the Write-Back flash cache in a newer version of cellsrv. We still had only the “read hits” statistic in the V$ views! And when investigating it closer, both the read hits and write hits were accumulated in the same read hits statistic! (I can’t reproduce this on our patched 11.2.0.3 with latest cellsrv anymore, but it was definitely the behavior earlier, as I demoed it in various places).

Side-note: This is likely because it’s not so easy to just add more statistics to Oracle code within a single small patch. The statistic counters are referenced by other modules using macros with their direct numeric IDs (and memory offsets to v$sesstat array) and the IDs & addresses would change when more statistics get added. So, you can pretty much add new statistic counters only with new full patchsets, like 11.2.0.4. It’s the same with instance parameters by the way, that’s why the “spare” statistics and spare parameters exist, they’re placeholders for temporary use, until the new parameter or statistic gets added permanently with a full patchset update.

So, this is probably the reason why both the flash cache read and write hits got initially accumulated under the cell flash cache read hits statistic, but later on this seemed to get “fixed”, so that the read hits only showed read hits and the flash write hits were not accounted anywhere. You can test this easily by measuring your DBWR’s v$sesstat metrics with snapper for example, if you get way more cell flash cache read hits than physical read total IO requests, then you’re probably accumulating both read and write hits in the same metric.

This post also applies to non-Exadata systems as hard drives work the same way in other storage arrays too – just the commands you would use for extracting the disk-level metrics would be different.

I just noticed that one of our Exadatas had a disk put into “predictive failure” mode and thought to show how to measure why the disk is in that mode (as opposed to just replacing it without really understanding the issue ;-)