Somehow 12.1.0.2 has managed to get confused by the combination of “group by ts” and “group by trunc(ts,’DD’)” and has performed “group-by elimination” when it shouldn’t have. If you use the ‘outline’ option for dbms_xplan.display_cursor() you’ll find that the bad result reports the hint elim_groupby(@sel$1), which leads to an alternative solution to hinting the optimizer_features level. Start the code like this:
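The original listing is not reproduced here, so the following is only a sketch of the idea. The table t1 and the columns ts and val are hypothetical, and it assumes that the inverse hint no_elim_groupby is available in your version (v$sql_hint should confirm this):

```sql
-- Sketch only: t1, ts and val are hypothetical names.
-- The hint asks the optimizer NOT to eliminate the outer group by.
select
        /*+ qb_name(main) no_elim_groupby(@main) */
        trunc(ts,'DD')  ts_day,
        sum(max_val)    sum_val
from    (
        select
                /*+ qb_name(inline) */
                ts, max(val) max_val
        from    t1
        group by
                ts
        )
group by
        trunc(ts,'DD')
;
```

Naming the query block with qb_name() avoids depending on the generated name sel$1, which can change if the statement is edited.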

August 27, 2014

A comment on one of my early blogs about the 12c in-memory database option asked how Oracle would deal with read-consistency. I came up with a couple of comments outlining the sort of thing I would look for in a solution, and this note is an outline on how I started to tackle the question – with a couple of the subsequent observations. The data is (nearly) the same as the data I generated for my previous article on the in-memory database (and I’m running 12.1.0.2, of course):

In this case I’ve made the inmemory priority high and I haven’t set any column to “no inmemory” although I have made one column different from the rest (v$im_column_level doesn’t get populated unless there is some variation across columns). I have to say I couldn’t get very consistent behaviour in terms of when the data finally got into memory with this table creation – possibly something to do with using “alter table” rather than “create table” – but a second “alter table t1 inmemory;” seemed to do the trick if Oracle was playing hard to get.
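The table definition itself is not shown here. A minimal sketch of the sort of statements described, assuming a table t1 copied from all_objects and choosing object_type as the one column that differs from the rest, might look like this:

```sql
-- Sketch only: t1 is assumed to exist already (e.g. a copy of all_objects),
-- and the choice of object_type as the "different" column is an assumption.
alter table t1
        inmemory memcompress for query low priority high
        inmemory memcompress for query high (object_type)
;

-- Has the segment been populated yet?
select  segment_name, populate_status, bytes_not_populated
from    v$im_segments
where   segment_name = 'T1'
;

-- Per-column in-memory settings
select  segment_column_id, column_name, inmemory_compression
from    v$im_column_level
where   table_name = 'T1'
order by
        segment_column_id
;
```

If v$im_segments shows no row, or a non-zero bytes_not_populated, the population has not completed and the test figures would be misleading.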

Once I’d checked that the table was in memory I collected performance figures from v$mystat and v$session_event for the following query:

select
        /* Test Run */
        last_ddl_time
from
        t1
where
        t1.created > trunc(sysdate)
and     t1.object_type = 'TABLE'
and     t1.subobject_name is not null
;

Once I was satisfied that the in-memory option was working correctly, I went through the following steps:

Session 1: set transaction read only;

Session 1: run the query and collect performance figures

Session 2: do several small, committed, updates, modifying a total of 30 or 40 random rows

Session 2: Flush the buffer cache – so that we can see future block acquisition

Session 1: re-run the query and collect performance figures
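The steps above can be sketched as the following statements. The snapshot queries are only one possible way of collecting the figures (the choice of columns is an assumption), and you would normally take the difference between two snapshots:

```sql
-- Session 1: fix the read-consistent point for everything that follows
set transaction read only;

-- Session 1: run the test query, then snapshot the session's activity
select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   sn.statistic# = ms.statistic#
and     ms.value != 0
order by
        sn.name
;

select  event, total_waits, time_waited
from    v$session_event
where   sid = sys_context('userenv','sid')
order by
        event
;

-- Session 2: after the small, committed updates
alter system flush buffer_cache;

-- Session 1: re-run the query, snapshot again, then end the transaction
commit;
```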

The effect of the “set transaction read only;” was to force the session to do some extra work in the second execution of the query to make the data read-consistent back to the start of the “transaction”. The results were as follows (don’t forget that some of the numbers will relate to the action of collecting the performance figures):

There are quite a lot of stats that probably aren’t interesting – and there’s one important detail that doesn’t appear (at least not clearly): the table in question had about 6,800 blocks below its highwater mark.

So, what do the stats tell us? The most obvious change, of course, is that we had to do some physical reads to get a result set: 24 multiblock reads and 5 single block reads (the latter from the undo tablespace). This is echoed in the session stats as 853 “physical reads cache” from 29 “physical read IO requests”. We can then see the specific read-consistency work (in two ways – with a third close approximation):

We applied 48 undo change vectors to fix up 41 blocks to the correct point in time and used them to read 48 rows – the last pair of figures won’t necessarily match the first two pairs, but they do give us a measure of how much data we had to acquire from the cache when trying to do an in-memory scan.

The number 41 actually appears a couple more times: it’s “table scan blocks gotten” (which might seem a little odd since we got far more than 41 blocks by multiblock reads – but we only really wanted 41), and it’s also the change (downwards) in “session logical reads – IM”. Even when Oracle does a pure in-memory query it calculates the number of blocks it would have been reading and reports that number as “session logical reads” and “session logical reads – IM” – so there’s another way to get confused about buffer visits and another statistic to cross-check when you’re trying to work out how to calculate “the buffer hit ratio” ;)

After the first read the scattered reads all seemed to be 32 blocks of “intermittent” tablescan – perhaps this is a measure of the number of blocks that are compressed into a single in-memory chunk (for query low), but perhaps it’s a side effect of the “physical reads prefetch warmup” that Oracle may do when the cache has a lot of empty space. I’ll leave it as an exercise to the reader to refine the test (or think of a different test) to determine whether it’s the former or latter; it’s quite important to find this out because if Oracle is tracking change at the “in-memory chunk” rather than at the block level then a small amount of high-precision change to an in-memory table could result in a relatively large amount of “redundant” I/O as a long-running query tried to stay read-consistent.

August 24, 2014

The title of this piece is the name given to a new feature in 12.1.0.2, and since I’ve recently blogged about a limitation of the in-memory option I thought I’d pick this feature as the next obvious thing to blog about. This is a bit of a non sequitur, though, as the feature seems to have nothing whatsoever to do with the in-memory option; instead it’s a cunning mechanism combining aspects of the star-transformation (but without the bitmap indexes), Bloom filters, and “group-by” placement to minimise the cost of aggregation over high-volume joins.

I have a “large” table of people, and people can live in one town and work in another. Towns are in states and I’m interested in a report about people who live in one specific state but work in another (e.g. New Hampshire vs. Massachusetts). There are a couple of “padding” columns to represent the data associated with each town and person that I might want in a report. To keep things simple I haven’t extended the query out to select the name of the state. Here’s the query I might use to get the report I want:
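The original listing is not reproduced here. As a sketch of the shape of such a query, assuming tables called towns and people, and assuming the join columns are named id_work_town and id_home_town (all of these names are hypothetical):

```sql
-- Sketch only: table and column names are assumptions.
select
        wt.padding,
        ht.padding,
        max(pe.padding)
from
        towns   wt,     -- work town
        towns   ht,     -- home town
        people  pe
where
        wt.id_state     = 1
and     pe.id_work_town = wt.id
and     ht.id_state     = 2
and     pe.id_home_town = ht.id
group by
        wt.padding,
        ht.padding
;
```

The two references to towns act as the dimensions, and people is the fact table that gets filtered and aggregated.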

The order of operation (row source generation) is: 3,5,6,4,2,1 – we build a hash table from the towns in state 2; build a hash table from the towns in state 1; scan the people table and probe the state 1 hash table, any row that survives is used to probe the state 2 hash table, and the rows that survive the second probe are aggregated to produce the answer.

When you do this type of thing with very large data sets one of the potential performance threats comes from the volume of data you have to aggregate. As we’ve joined the three tables the row length grows significantly before we finally aggregate (admittedly my data set is small, and the number of rows we’re going to aggregate also appears to be very small according to the predictions). There’s also (in the early stages at least) the potential for passing a very large number of rows from the fact table through the first (and possibly subsequent) hash join, doing a lot of work to eliminate the rows you don’t need.

In 12c the optimizer can choose to minimise both these threat points using “vector transformation”. (The name may also reflect the possibility that the code path will take advantage of vector processing (SIMD) operations if they’re available in the CPU.) Here’s the execution path I got when I added the /*+ vector_transform(@sel$1) */ hint to my query – it’s not sensible for this tiny data set, of course, but the hint is a way of learning what Oracle can do:

There are three critical components to this plan: first, we create a couple of “Key Vectors” from the towns table, then we use those key vectors while scanning the people table and aggregate a minimal data set, finally we join back to the data associated with the key vectors. Reprising my introductory paragraph: the creation and use of the key vectors is similar to the Bloom filter approach; the final join-back is similar to the strategy used in Star Transformations (especially the ones where temp tables appear), and the key vector allows the high-volume fact data to be aggregated as much as possible before adding extra row-length from the dimensions.

In outline Oracle does the following:

scan the towns table to extract the id, and padding columns for id_state = 1 / work town – this produced 50 rows with my data set

manipulate the result to extract the distinct values of padding, and give each value a unique numeric identifier – this is the information that goes into the temp table (with one extra column) – this produced 10 rows

manipulate the result again to produce an in-memory array of (town.id, temp_table.identifier) – this is the key vector, containing 50 elements.

The second temp table and key vector for (id_state = 2 / work town) will be created in the same way.

As the fact table is scanned Oracle can apply the key vectors very efficiently (we hope) to pick out the people rows that would be involved in the final aggregate and associate with each relevant row the two padding identifiers that belong to that row (this step is a bit like doing 2 hash joins – but presumably much more efficient; Bloom filtering does something very similar). After selecting the minimum number of rows we can aggregate them on the two padding identifiers (an example of the “aggregate early”/”place group by” principle – aggregate before joining); finally we join back to the two temporary tables to translate the short padding identifiers into the long padding values (just as we do in star transformations with temporary table transformation).

Strangely we aggregate again after the join-back. I don’t think it’s necessary in this case because I’m fairly sure that the join back is on a unique set of columns – but perhaps this is a generic strategy allowing for variations in the mechanism, including such things as cases where the vector transform is only applied to a subset of the dimension tables.

Technically you could almost emulate this strategy in any version of Oracle (and I probably have at various times over the last few years) with the major limitation that the “KEY VECTOR USE” operations at lines 17 and 18 would have to be replaced with hash joins; no doubt, though, the major CPU saving of this approach is the difference between consecutive hash joins and what appears to be (from the execution stats) concurrent vector filtering. At some point – if a client needs the extra performance edge before they get to 12c – I’ll have to see if I can engineer an example in 11g that emulates the whole plan but uses Bloom filtering to approximate the key vector filtering.
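As a rough illustration of that kind of manual emulation, here is a sketch that replaces the key vectors with ordinary joins, aggregates on short numeric identifiers, and only then joins back to pick up the long padding values. The table and column names are assumptions, and dense_rank() is simply one convenient way of assigning an identifier to each distinct padding value:

```sql
-- Sketch only: names are hypothetical; hash joins stand in for KEY VECTOR USE.
with work_keys as (
        -- towns in the "work" state, each padding value given a small numeric id
        select  id, padding,
                dense_rank() over (order by padding) as pad_id
        from    towns
        where   id_state = 1
),
home_keys as (
        select  id, padding,
                dense_rank() over (order by padding) as pad_id
        from    towns
        where   id_state = 2
),
early_agg as (
        -- filter the fact table and aggregate on the short identifiers only
        select
                wk.pad_id       as work_pad_id,
                hk.pad_id       as home_pad_id,
                max(pe.padding) as max_padding
        from
                people          pe,
                work_keys       wk,
                home_keys       hk
        where
                pe.id_work_town = wk.id
        and     pe.id_home_town = hk.id
        group by
                wk.pad_id, hk.pad_id
)
select
        wd.padding      as work_padding,
        hd.padding      as home_padding,
        ea.max_padding
from
        early_agg       ea,
        (select distinct pad_id, padding from work_keys) wd,
        (select distinct pad_id, padding from home_keys) hd
where
        wd.pad_id = ea.work_pad_id
and     hd.pad_id = ea.home_pad_id
;
```

The optimizer is free to merge or rearrange these query blocks, so the execution plan would need checking (and possibly hinting) before assuming that the early aggregation really happens.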

August 15, 2014

I’ve been struggling to find time to have any interaction with the Oracle community for the last couple of months – partly due to workload, partly due to family matters and (okay, I’ll admit it) I really did have a few days’ holiday this month. So making my comeback with a bang – here’s a quick comment about the 12.1.0.2 in-memory feature, and how it didn’t quite live up to my expectation; but it’s also a comment about assumptions, tests, and inventiveness.

One of the 12.1.0.2 manuals tells us that the optimizer can combine the in-memory columnar storage mechanism with the “traditional” row store mechanisms – unfortunately it turned out that this didn’t mean quite what I had hoped; I had expected too much of the first release. Here’s a quick demo of what doesn’t happen, what I wanted to happen, and how I made it happen, starting with a simple definition (note – this is running 12.1.0.2 and the inmemory_size parameter has been set to enable the feature):
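The original listing is not reproduced here. A minimal sketch of a set-up along these lines, assuming a table built from all_objects and assuming that object_id and object_name are the excluded columns, could be:

```sql
-- Sketch only: the row limit and the choice of excluded columns are assumptions.
create table t1 nologging
as
select * from all_objects where rownum <= 50000;

alter table t1
        inmemory
        no inmemory (object_id, object_name)
;

-- Which columns are in memory, and how are they stored?
select  segment_column_id, column_name, inmemory_compression
from    v$im_column_level
where   table_name = 'T1'
order by
        segment_column_id
;

-- A query that references only in-memory columns
explain plan for
select
        last_ddl_time
from
        t1
where
        t1.created > trunc(sysdate)
and     t1.object_type = 'TABLE'
and     t1.subobject_name is not null
;

select * from table(dbms_xplan.display);
```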

All I’ve done at this point is create a table with most of its columns in-memory and a couple excluded from the columnar store. This is modelling a table with a very large number of columns where most queries are targeted at a relatively small subset of the data; I don’t want to have to store EVERY column in-memory in order to get the benefit of the feature, so I’m prepared to trade lower memory usage in general against slower performance for some queries. The query against v$im_column_level shows me which columns are in-memory, and how they are stored. The call to explain plan and dbms_xplan then shows that a query involving only columns that are declared in-memory could take advantage of the feature. Here’s the resulting execution plan:

Note that the table access full includes the inmemory keyword; and the predicate section shows the predicates that have taken advantage of in-memory columns. The question is – what happens if I add the object_id column (which I’ve declared as no inmemory) to the select list. Here’s the resulting plan:

There’s simply no sign of an in-memory strategy – it’s just a normal full tablescan (and I didn’t stop with execution plans, of course, I ran other tests with tracing, snapshots of dynamic performance views etc. to check what was actually happening at run-time).

In principle there’s no reason why Oracle couldn’t use the in-memory columns that appear in the where clause to determine the rowids of the rows that I need to select and then visit the rows by rowid but (at present) the optimizer doesn’t generate a plan to do that. There’s no reason, though, why we couldn’t try to manipulate the SQL to produce exactly that effect:

I’ve joined the table to itself by rowid, hinting to stop the optimizer from getting too clever and eliminating the join. In the join I’ve ensured that one reference to the table can be met completely from the in-memory columns, isolating the no inmemory columns to the second reference to the table. It is significant that the in-memory tablescan is vastly lower in cost than the traditional tablescan – and there will be occasions when this difference (combined with the knowledge that the target is a relatively small number of rows) means that this is a very sensible strategy. Note – the hints I’ve used happen to be sufficient to demonstrate method but I’d be much more thorough in a production system (possibly using an SQL baseline to fix the execution plan).
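The rewrite described above can be sketched as follows. The aliases are arbitrary, and the no_eliminate_join hints are just one way of keeping both references to the table in the plan:

```sql
-- Sketch only: t1a is satisfied entirely from in-memory columns,
-- t1b supplies the "no inmemory" column by rowid.
select
        /*+ no_eliminate_join(t1a) no_eliminate_join(t1b) */
        t1b.object_id,
        t1a.last_ddl_time
from
        t1      t1a,
        t1      t1b
where
        t1a.created > trunc(sysdate)
and     t1a.object_type = 'TABLE'
and     t1a.subobject_name is not null
and     t1b.rowid = t1a.rowid
;
```

The plan to hope for is an in-memory full scan of t1a driving a nested loop into t1b by user rowid; it is worth checking that the plan really takes that shape before relying on it.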

Of course, this method is just another example of the “visit a table twice to improve the efficiency” strategy that I wrote about a long time ago; and it’s this particular variant of the strategy that allows you to think of the in-memory columnar option as an implementation of OLTP bitmap indexes.

August 3, 2014

I suggested a little while ago that thinking about the new in-memory columnar store as a variation on the principle of bitmap indexes was quite a good idea. I’ve had a couple of emails since then asking me to expand on the idea because “it’s wrong” – I will follow that one up as soon as I can, but in the meantime here’s another angle for connecting old technology with new technology:

It is a feature of in-memory column storage that the default strategy is to store all columns in memory. But it’s quite likely that you’ve got some tables where a subset of the columns are frequently accessed and other columns are rarely accessed and it might seem a waste of resources to keep all the columns in memory just for the few occasional queries. So the feature allows you to de-select columns with the “no inmemory({list of columns})” option – it’s also possible to use different degrees of compression for different columns, of course, which adds another dimension to design and planning – but that’s a thought for another day.

So where else do you see an example of being selective about where you put columns ? Index Organized Tables (IOTs) – where you can choose to put popular columns in the index (IOT_TOP) segment, and the rest in the overflow segment, knowing that this can give you good performance for critical queries, but less desirable performance for the less important or less frequent queries. IOTs allow you to specify the (typically short) list of columns you want “in” – it might be quite nice if the same were true for the in-memory option, I can imagine cases where I would want to include a small set of columns and exclude a very large number of them (for reasons that bring me back to the bitmap index analogy).

July 27, 2014

So 12.1.0.2 is out with a number of interesting new features, of which the most noisily touted is the “in-memory columnar storage” feature. As ever the key to making best use of a feature is to have an intuitive grasp of what it gives you, and it’s often the case that a good analogy helps you reach that level of understanding; so here’s the first thought I had about the feature during one of the briefing days run by Maria Colgan.

Obviously the analogy isn’t perfect … but I think it’s very close: for each column stored you use a compression technique to pack the values for a large number of rows into a very small space, and for each stored row you can derive the rowid by arithmetic. In highly concurrent OLTP systems there’s still room for some contention as the session journals are applied to the globally stored compressed columns (but then, private redo introduces some effects of that sort anyway); and the “indexes” have to be created dynamically as tables are initially accessed (but that’s a startup cost, its timing can be controlled, and it’s basically limited to a tablescan).

Whatever the technical hand-waving it introduces – thinking of the in-memory thing as enabling real-time bitmaps ought to help you think of ways to make good use of the feature.

May 15, 2014

Prompted by a pingback on this post, followed in very short order by a related question (with a most gratifying result) on Oracle-L, I decided to write up a note about another little optimizer enhancement that appeared in 12c. Here’s a query that differs slightly from the query in the original article:

As a general principle, the “OR EXISTS” stops the optimizer from unnesting the subquery, so my original article suggested a workaround that required you to rewrite the query with a UNION ALL, using the lnnvl() function (where possible) as the easy way to eliminate accidental duplication. Take a look at the plans for my new query, though – first in 11.2.0.4, then in 12.1.0.1:

As expected, 11.2.0.4 has had to use a filter subquery approach – but 12.1.0.1 has found a different path. For this special “is null” case the optimizer has unnested the subquery and used a “null aware (NA) semi-join”. In this very small example there is no change in the reported cost, and the mechanics of the execution plan will be quite similar at run time – but in real systems there are bound to be cases where the new strategy is more efficient.
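For reference, the general shape of the UNION ALL workaround mentioned earlier is sketched below for an arbitrary first predicate. The tables t1 and t2, their columns, and the literal 99 are all hypothetical:

```sql
-- Original form: the "OR EXISTS" blocks subquery unnesting
select  *
from    t1
where   t1.n1 = 99
or      exists (select null from t2 where t2.id = t1.id)
;

-- Rewritten form: the second branch excludes rows already returned by the first.
-- lnnvl(n1 = 99) is true when n1 != 99 or n1 is null.
select  *
from    t1
where   t1.n1 = 99
union all
select  *
from    t1
where   lnnvl(t1.n1 = 99)
and     exists (select null from t2 where t2.id = t1.id)
;
```

Using lnnvl() rather than a simple "n1 != 99" matters when n1 can be null, because rows with a null n1 that satisfy the subquery would otherwise be lost.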

Unfortunately …

Bug 18650065 (fixed in 12.2) rears its ugly head: WRONG RESULTS ON QUERY WITH SUBQUERY USING OR EXISTS.
I can demonstrate this with the following code:

I’ve modified a few rows so that the “null-aware” bit of the new transformation matters, but I’ve now got a data set and transformation where I get the wrong results because I’ve defined a primary key (unique would have done) on a critical column in the query. If you check the execution plan you’ll find that the optimizer has switched from a null aware semi-join to a simple nested loop join.

There is a workaround for this problem – disable the relevant feature:
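The statement itself is not shown here. My assumption is that the relevant feature is controlled by the hidden parameter for null-aware joins, in which case the workaround would look something like the following; hidden parameters should only be set after checking with Oracle support:

```sql
-- Assumption: this hidden parameter is the one that controls the transformation.
alter session set "_optimizer_null_aware_antijoin" = false;
```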

May 4, 2014

Like the recent article on deleting histograms this is another draft that I rediscovered while searching for some notes I had written on a different topic – so I’ve finally finished it off and published it.

Here’s a quirky little detail of extended stats that came up in an OTN thread earlier on this week [ed: actually 8th Jan 2014]. When you create column group stats, Oracle uses an undocumented function sys_op_combined_hash() to create a hash value, and if you gather simple stats on the column (i.e. no histogram) you can get some idea of the range of values that Oracle generates through the hash function. For example:

I could have selected low_value and high_value from user_tab_cols, using utl_raw.cast_to_number() to display them in numeric format, but the view user_tab_histograms displays the low and high as a two-bucket histogram if there is no actual histogram data for the column in the histogram (histgrm$) table.
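The example is not reproduced here. A sketch of the sort of code described, assuming a table t1 with numeric columns n1 and n2, might be:

```sql
-- Sketch only: t1, n1 and n2 are assumed names.
select dbms_stats.create_extended_stats(user, 'T1', '(n1, n2)') name from dual;

begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'T1',
                method_opt => 'for all columns size 1'     -- no histograms
        );
end;
/

-- With no histogram, each column shows just its low and high values
select  column_name, endpoint_number, endpoint_value
from    user_tab_histograms
where   table_name = 'T1'
order by
        column_name, endpoint_number
;
```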

We probably don’t need to worry about what the low and high values might be because taking hash values destroys any meaning that a range might have (the optimizer can’t use column group stats in range-based predicates, only in equality predicates). However, we might collect a frequency histogram (or Top-N histogram in 12c) on the column group because there might be some data skew in the sets of values that we need to tell the optimizer about – so let’s gather a histogram with 2 buckets on our sample data set and see what we get:

The histogram values have changed! As one of the posters on the OTN thread points out, what Oracle has actually stored in this case is mod(sys_op_combined_hash() ,9999999999).
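One way to check this observation, again assuming a table t1 with columns n1 and n2, is to gather the histogram and compare the stored endpoint values with the hash values computed directly (sys_op_combined_hash() is undocumented, so treat this purely as an investigative sketch):

```sql
-- Gather a 2-bucket histogram on the column group only
begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'T1',
                method_opt => 'for columns (n1, n2) size 2'
        );
end;
/

-- Values to compare with endpoint_value in user_tab_histograms
select
        mod(sys_op_combined_hash(n1, n2), 9999999999)   stored_value,
        count(*)                                        row_count
from
        t1
group by
        mod(sys_op_combined_hash(n1, n2), 9999999999)
order by
        stored_value
;
```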

So far I haven’t done any exhaustive testing to work out whether the change in the strategy for storing numbers makes any difference to the typical optimizer arithmetic – but I do have at least one case (relating to “missing values” behaviour) where the presence or absence of a column group histogram does make a difference to the estimated cardinality in a way that seems inconsistent with other patterns of behaviour: I doubt if it’s actually due to the change in what’s stored, and one day I may come across a client where I actually need to work out what’s going on and how best to work with the anomaly.

Bonus thought:

In 12c Oracle collects column stats automatically as it loads data into an empty table; but not if it’s got extended stats defined on it. (This is bug number 18425876, labelled as fixed in 12.2 [ed: actually fixed in 12.1.0.2]). Here’s some code modelling a client scenario where we truncate and reload a table every day. In the first part of the demonstration I’ve loaded the table twice to show that after truncating and reloading I get new stats on the table – the first load is 10,000 rows, the second is 20,000 rows and the stats reflect this automatically. In the second part of the code, after adding a set of column group stats, truncating and loading 5,000 rows, the stats from the previous cycle are still in place. (The code is only relevant to 12c, of course.)

The workaround given in the bug is “add the extended stats after loading the table” – but if you’re constantly truncating and reloading that means you have to drop and add the extended stats and do a tablescan to gather the column group stats every time you reload.
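A sketch of what that reload cycle would look like, assuming a table t1 with a column group on (n1, n2) and a hypothetical staging table t1_stage as the source of the data:

```sql
-- Sketch only: t1_stage is a hypothetical source of the daily data.
truncate table t1;

-- Remove the column group so that the load can gather stats automatically
begin
        dbms_stats.drop_extended_stats(user, 'T1', '(n1, n2)');
end;
/

insert /*+ append */ into t1
select * from t1_stage;

commit;

-- Put the column group back, then scan the table to gather its stats
select dbms_stats.create_extended_stats(user, 'T1', '(n1, n2)') name from dual;

begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'T1',
                method_opt => 'for columns (n1, n2) size 1'
        );
end;
/
```

The final gather is the extra tablescan that the workaround costs on every reload.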

Just as a little aside – when I first wrote the demo script I forgot to put in the commit; after the insert/append – which meant I was trying to create column group stats on a table which should have given me Oracle error: “ORA-12838: cannot read/modify an object after modifying it in parallel”; instead this has been trapped by the dbms_stats package and shows up as a slightly confusing:

select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual
*
ERROR at line 1:
ORA-20001: Error when processing extension - resource busy and acquire with NOWAIT specified or timeout expired
ORA-06512: at "SYS.DBMS_STATS", line 12977
ORA-06512: at "SYS.DBMS_STATS", line 44967
ORA-06512: at "SYS.DBMS_STATS", line 44986

March 14, 2014

Just one of those little snippets to cover something new and remind you of something old. A single session can now have three (or more) temporary tablespaces in use at the same time for different reasons.

In 12c you can set parameter temp_undo_enabled to true, at which point the undo relating to global temporary tables (GTTs) will be written into the database default temporary tablespace, which means it won’t generate redo. As an interesting side effect this also means that you can do DML on temporary objects in a physical standby database. Currently the undo segment shows as type “UNDEFINED” in v$sort_usage. A detail to watch out for, though – it looks as if each session may get its own temporary undo segment – so be careful about specifying the extent size and tempfile size for the database default temporary tablespace.
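A minimal sketch of enabling the feature for a session and then looking at what the session is using in the temporary tablespaces. As far as I know the parameter has to be set before the session first uses any temporary objects, and the choice of columns reported here is just a suggestion:

```sql
alter session set temp_undo_enabled = true;

-- After doing some DML on a global temporary table:
select
        username, tablespace, segtype, contents, extents, blocks
from
        v$sort_usage
order by
        tablespace, segtype
;
```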

In 11g you can specify a tablespace (though not a tablespace group) in the ‘create global temporary table’ statement, which means you keep activity about GTTs separated from the I/O resulting from sorts and hash joins etc. spilling to disc; in fact you could specify a different temporary tablespace for every GTT if you wanted to – and I could imagine a case for having a couple of different temporary tablespaces to accommodate GTTs with very different usage characteristics. (Unfortunately you still can’t specify a tablespace in the dbms_lob.create_temporary() function). If you don’t specify a tablespace for a GTT it will go into the default temporary tablespace of the user who is using it (not the database default, and not the default for the definer of the GTT). If you create indexes on a GTT they will automatically go into the same tablespace as the table.
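For example, assuming a temporary tablespace called temp_gtt has already been created (the tablespace, table and column names here are all hypothetical):

```sql
-- Sketch only: temp_gtt must be an existing temporary tablespace.
create global temporary table gtt1 (
        id      number,
        padding varchar2(100)
)
on commit preserve rows
tablespace temp_gtt
;

-- The index follows the table into temp_gtt automatically
create index gtt1_i1 on gtt1(id);
```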

Finally, of course, there’s the default temporary tablespace for the user and this is where GTTs will go if they don’t have a tablespace specified, and where all the scratch data (sorts, hash tables, factored subqueries et al.) will go.

This combination means, of course, that you could manage to do a single “insert as select” writing a GTT to one temporary tablespace, with its undo going to a second temporary tablespace, and the spill from a sort or hash join in the select going to a third. The flexibility probably won’t make much difference to performance (for most people), but it’s possible that it will make it easier to monitor where the work is coming from if you’re ever in the position where your single temporary tablespace is subject to a lot of I/O.

Footnote:

In the past I’ve advised DBAs to set up a small number of tablespaces (or tablespace groups) so that they can allocate different classes of users – typically grouped by business function – to different temporary tablespaces. The ability to allocate GTTs to temporary tablespaces allows a further degree of refinement to this strategy.

March 5, 2014

One of the changes that appeared in 12c was a change to the BROADCAST distribution option for parallel execution. I mentioned this in a footnote to a longer article a couple of months ago; this note simply expands on that brief comment with an example. We’ll start with a simple two-table hash join – which I’ll first construct and demonstrate in 11.2.0.4:

February 6, 2014

Here’s a simple little demonstration of an enhancement to the optimizer in 12c that may result in some interesting changes in execution plans as cardinality estimates change from “guesses” to accurate estimates.

December 8, 2013

When you upgrade you often find that some little detail (of the optimizer) that didn’t receive a lot of attention in the “New Features” manuals introduces a few dramatic changes in execution plans. Here’s one example of a detail that is likely to catch a few unlucky people. We start with a very simple table which is just an id column with some padding, and then show the effect of a change in the handling of “constant subqueries”. Here’s my data set:

November 14, 2013

Oracle 12c has increased the maximum length of character-based columns to 32K bytes – don’t get too excited, they’re stored out of line (so similar in cost to LOBs) and need some modification to the parameter file and data dictionary (starting the database in upgrade mode) before you can use them.

Be cautious about enabling this option and test carefully – there are going to be a number of side effects, and some of them may require a significant investment in time to resolve. The first one that came to my mind was that if you’ve created a function-based index on a pl/sql function that returns a varchar2() type and haven’t explicitly created the index on a substr() of the return value then the data type of the function’s return value will change from the current default of varchar2(4000) to varchar2(32767) – which means the index will become invalid and can’t be rebuilt or recreated.

Obviously you can redefine the index to include an explicit substr() call – but then you have to find all the code that was supposed to use the index and modify it accordingly.
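A sketch of the explicit substr() approach, using a hypothetical function f_tag() and a table t1 with an object_name column; the length of 30 is an arbitrary choice:

```sql
-- Sketch only: f_tag, t1_f_tag and the length 30 are illustrative assumptions.
create or replace function f_tag(p_name in varchar2)
return varchar2 deterministic
as
begin
        return upper(p_name);
end;
/

-- The index key has a declared maximum length that does not depend
-- on the default size of the function's return value
create index t1_f_tag on t1(substr(f_tag(object_name), 1, 30));

-- Queries have to use exactly the same expression to be able to use the index
select  *
from    t1
where   substr(f_tag(object_name), 1, 30) = 'DUAL'
;
```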

November 6, 2013

I wrote a note about the 12c “In-Memory” option some time ago on the OTN Database forum and thought I’d posted a link to it from the blog. If I have I can’t find it now so, to avoid losing it, here’s a copy of the comments I made: