Fundamentally Irrelevant

There’s always a risk, when you go off testing something, that you’ll notice some side-effect or issue that turns out to be irrelevant to the main investigation.

I’ve been investigating a performance problem on an insert .. select statement.

For the last couple of days, I’ve had a physical copy of the production database from the day before, and I’ve been running this statement, rolling it back, running it again, rolling it back, and so on. I’ve been running it within a PL/SQL harness that makes before/after runstats calls to supplement the extended trace.

Comparing the first run of the insert statement with subsequent runs, I noticed significant differences in the statistics “leaf node splits” and “leaf node 90-10 splits”, and wondered how these might tie in to some of the differences in other statistics.

The actual values are unimportant, but on the first run there were roughly 3000 leaf node splits, of which 1200 were leaf node 90-10 splits, meaning that 1800 were 50-50 splits.

On any subsequent run, there were 1200 leaf node splits, all of which were leaf node 90-10 splits.

So, why the difference?

With a 50:50 leaf split, I want to insert a new entry somewhere in the middle of an existing, full index block, so Oracle takes that full block, gets a new block, moves half of the index entries from the full block into it, and updates the various linking references.

Whereas with a 90:10 split, I’ve got a value higher than the current entries in this full block. Typically you get 90:10 splits with sequence inserts, i.e. monotonically increasing values always going into the far right side of the index, each value bigger than the last.
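The difference between the two split types can be illustrated with a toy sketch in Python (my own simplified model of general B-tree leaf behaviour, not Oracle’s actual algorithm; the block capacity and keys are invented purely for illustration):

```python
import bisect

CAPACITY = 4  # entries per leaf block (tiny, purely for illustration)

def insert(blocks, key, stats):
    """Insert key into an ordered list of leaf blocks, splitting on overflow."""
    # Find the leftmost block whose highest key covers the new key;
    # anything beyond the current maximum goes into the rightmost block.
    i = next((j for j, b in enumerate(blocks) if key <= b[-1]), len(blocks) - 1)
    block = blocks[i]
    bisect.insort(block, key)
    if len(block) > CAPACITY:
        if i == len(blocks) - 1 and block[-1] == key:
            # New highest key landing in the rightmost block:
            # a 90-10 split -- the new block starts almost empty.
            blocks.append([block.pop()])
            stats["90-10"] += 1
        else:
            # Insert into the middle of a full block:
            # a 50-50 split -- half the entries move to the new block.
            half = len(block) // 2
            blocks.insert(i + 1, block[half:])
            del block[half:]
            stats["50-50"] += 1

def run(keys):
    blocks, stats = [[keys[0]]], {"90-10": 0, "50-50": 0}
    for key in keys[1:]:
        insert(blocks, key, stats)
    return stats

# Monotonically increasing keys, as from a sequence: only 90-10 splits.
print(run(list(range(1, 21))))   # {'90-10': 4, '50-50': 0}

# Odd keys first, then even keys filling the gaps: the second phase
# always lands mid-block, so it produces only 50-50 splits.
print(run(list(range(1, 40, 2)) + list(range(2, 39, 2))))
```

The sequence-style run never moves existing entries, which is exactly why right-hand-edge indexes grow with nearly full blocks, while scattered mid-range inserts leave blocks roughly half empty after each split.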

So why the different observations?

I raised the question on oracle-l, but sometimes just being a bit dim doesn’t translate effectively into an email or forum thread.

There was far too much information in my question when really I should have just asked “what am I not getting?”.

Sometimes it’s difficult to ask the right question when you don’t get what you’re missing.

Sometimes we can look at 2 + 2 and come up with 5.

Sometimes we’re just being vacant, staring at 2 + 2 and not realising we’re expected to add it up.

I understood why my insert would get mostly 50:50 splits on these indexes and 90:10 splits on those, but I simply didn’t get the correlation with the rollback and repeat.

Space management is a recursive operation, carried out in a recursive transaction that commits independently of mine, so it is effectively unaffected by my transaction rollback.

I could understand why there were no 50:50 splits in the subsequent runs.

So, even though my transaction rolled back, the effect of my transaction-that-never-was was still to split some of the index keys across more blocks than they used to occupy.

But then why wouldn’t the same be true for 90-10 splits?

Honestly, it was a painful wait for the penny to drop, and I prefer not to do my penny-drop waiting in public.

Umm… because those new blocks were filled with new data – those hundreds of thousands of new sequence numbers.

… and you rolled back

… so you left behind a bunch of completely empty blocks which went back on the freelist

… ready for you to do it all again when you repeated the insert.
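The rollback-and-repeat effect can be put into the same kind of toy sketch (again my own simplified model, not Oracle internals: here “rollback” removes the inserted keys but leaves the block boundaries in place, because the splits were recursive operations, and blocks left completely empty go back on the freelist, modelled by simply unlinking them):

```python
import bisect

CAPACITY = 4  # entries per leaf block (tiny, purely for illustration)

def insert(blocks, key, stats):
    """Insert key into an ordered list of leaf blocks, splitting on overflow."""
    i = next((j for j, b in enumerate(blocks) if key <= b[-1]), len(blocks) - 1)
    block = blocks[i]
    bisect.insort(block, key)
    if len(block) > CAPACITY:
        if i == len(blocks) - 1 and block[-1] == key:
            # Rightmost block, new highest key: 90-10 split.
            blocks.append([block.pop()])
            stats["90-10"] += 1
        else:
            # Mid-block insert: 50-50 split, half the entries move.
            half = len(block) // 2
            blocks.insert(i + 1, block[half:])
            del block[half:]
            stats["50-50"] += 1

def rollback(blocks, inserted):
    """Undo the row changes only; the splits themselves stay done.
    Blocks emptied entirely go back on the freelist (here: unlinked)."""
    for b in blocks:
        b[:] = [k for k in b if k not in inserted]
    blocks[:] = [b for b in blocks if b]

# Existing data: odd keys in five comfortably full leaf blocks.
blocks = [[1, 3, 5, 7], [9, 11, 13, 15], [17, 19, 21, 23],
          [25, 27, 29, 31], [33, 35, 37, 39]]

# The transaction inserts mid-range keys plus a run of new,
# ever-increasing "sequence" values off the right-hand edge.
txn = list(range(2, 39, 2)) + list(range(41, 61))

results = []
for attempt in (1, 2):
    stats = {"90-10": 0, "50-50": 0}
    for key in txn:
        insert(blocks, key, stats)
    results.append(stats)
    rollback(blocks, set(txn))

# First run: both kinds of split. Second run: the mid-range keys fit
# into the already-split blocks (no 50-50 splits), while the sequence
# keys refill reclaimed empty blocks and cause 90-10 splits all over again.
print(results)
```

In this model the second run reproduces the observation exactly: the 50-50 splits vanish because the mid-range blocks stayed split, while the 90-10 splits recur at the same rate because the right-hand blocks were emptied and reclaimed.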

Doh!

Something fundamental just overlooked.

And yet completely irrelevant to what I was investigating, brought about only because of the whole roll-back-and-repeat thing.


4 Responses to Fundamentally Irrelevant

You are not “dim” (maybe just a slight bit inexperienced :-)) ). You have simulated why simulations sometimes aren’t the real thing. The only real simulation is to take an exact physical image with exactly the same concurrent load/activity *EVERY TIME* you restart a simulation.
Note : Concurrent activity by other sessions also has an impact.

I don’t think I’m inexperienced so I’m rightly hard on myself for being slow to twig what I think should have been obvious to me.

You’re right about simulation and the potential impact of concurrent sessions, but that’s a luxury most people don’t have, or rather aren’t prepared to pay for.

This was really a simulation for me to investigate little areas and tangents around the real problem, establishing baselines from which to test potential fixes and supplementing information from things like 10046 trace files taken from production.

There’s an awful lot of investigation that we can do away from a production environment but the more we can make it production-like the better – volume, stats, etc, etc.

Anyway, tangents – even when irrelevant to what you’re meant to be looking at – are good.

And I’m trying to be open and not too defensive about revealing shortcomings.