I've already answered this twice: It is suboptimal if subsequent processes have to suffer from this "pristine" state due to increased work and if the problem could be solved otherwise without causing this additional work.

I'm simply questioning whether a weekly rebuild is really a reasonable solution to this particular problem.

Yup, that's exactly what you are doing.

I don't know you, but you must be quite the guru to openly question Hemant's judgement . . .

Don,

I'm quite sure that Hemant can speak for himself and address the questions I've raised. I'm not questioning his personal judgment, I'm just raising here a reasonable question in a technical discussion.

This is nothing personal, and Hemant and myself are fine and I'm sure he and I can discuss this openly on a purely technical basis. I know this because it was me who nominated him for the [Oracle ACE award|http://hemantoracledba.blogspot.com/2009/09/i-am-oracle-ace-officially.html] and made sure that he received it, so it's unlikely that I want to question Hemant's ability here in general.

Why artificially shrink an index that, due to its usage pattern, is supposed to be "large"?

Do I want a 450MB index on a 1.3MB table which has a peak of 16,000 to 17,000 rows? No, I do not.

Different runs against the table insert/delete from tens of rows to the 15000-17000 rows.
Unfortunately, the gather_stats runs do not capture the real row count. I cannot "insert" a gather_stats call into each program / execution that uses this table. It is a "temp" table used by Peoplesoft.
A GTT would be ideal but changes to the schema are not allowed (Peoplesoft doesn't support or implement GTTs either).
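For context, the kind of change being ruled out here would look something like the following sketch (the table name and columns are made up for illustration; ON COMMIT PRESERVE ROWS keeps rows for the life of the session, which is the behaviour a Peoplesoft-style scratch table would need):

{noformat}
-- Hypothetical replacement for the Peoplesoft "temp" table;
-- not permitted here because schema changes are disallowed.
CREATE GLOBAL TEMPORARY TABLE ps_my_tao_gtt (
  process_instance NUMBER       NOT NULL,
  setid            VARCHAR2(5)  NOT NULL,
  emplid           VARCHAR2(11)
) ON COMMIT PRESERVE ROWS;
{noformat}

With a GTT each session sees only its own rows, and the space (including index space) is released when the session ends, so no shared index would ever grow to hundreds of MB.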

To post code, statspack/AWR reports, execution plans or trace files, start and end the section with the tag {noformat} (lowercase, curly brackets, no spaces) so that the text appears in fixed format.
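For example, a statement posted between those tags (dummy query for illustration) would look like this in the editor:

{noformat}
select sysdate from dual;
{noformat}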
"There are three classes of people. Those who see; those who see when they are shown; those who do not see."
Leonardo da Vinci
Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
http://www.jlcomp.demon.co.uk
Edited by: jkestely on Sep 29, 2009 8:39 AM

It is suboptimal if subsequent processes have to suffer from this "pristine" state due to increased work

Are you saying that an index achieves a "steady state", based on the SQL workload?

Are you saying that the "old" index would be better for these "subsequent processes"?

If so, why?

Sorry, I don't get it . . .

Don,

I've already tried to explain several times why an index rebuild can cause additional work for the DML applied to the table after the rebuild.

Let me briefly summarize the scenario I'm talking about:

- You rebuild the index; assuming the underlying table is almost empty, the index is now very small

- The processes start to use the table again, and up to X sessions insert concurrently into it => the index has to grow significantly => numerous block splits are required => increased undo and redo activity due to the block splits => increased contention

- The index finally grows to a certain size required to accommodate the concurrent usage of the index *and should stay at that size*.

- Until you perform the next rebuild, and the same cycle starts from the beginning again

Now if the index is already at this "steady state", it's very likely that no further block splits are required, so no additional work is needed to grow the index again.
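One way to check whether this re-grow phase is actually generating extra work would be to watch the block-split statistics before and after a rebuild (instance-wide figures from v$sysstat; a session-level comparison could use v$sesstat instead):

{noformat}
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('leaf node splits',
                'leaf node 90-10 splits',
                'branch node splits');
{noformat}

A sharp rise in splits right after the weekly rebuild, tailing off as the index approaches its working size, would support the scenario described above.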

Again, my point is simply: Why rebuild an index again and again, only to generate additional work to have the index working towards its "steady state"?

The questions I've raised are: What is the actual impact of the larger index size? Do you really have to rebuild to address these issues? Is a rebuild the most reasonable way to address them, or are there other means that could help?

It all depends on the answers to these questions - if the rebuild is the most appropriate measure, that's fine, but potentially there are more suitable solutions with less impact.

Note that all this is under the assumption that above scenario applies, which according to Hemant's reply might not be the case here.

so you mean to say that you haven't seen the index reach a certain size and stay there? It keeps growing all the time?

Why artificially shrink an index that, due to its usage pattern, is supposed to be "large"?

Do I want a 450MB index on a 1.3MB table which has a peak of 16,000 to 17,000 rows? No, I do not.

Although I agree that it needs to be verified that this is a reasonable index size and not caused by some kind of bug, you again demonstrate the "index fast full scan" case with statistics suggesting that the index is very small.

Like I've already asked:
- What happens if you use index statistics reflecting the larger size of the index?
- What is the issue with having those more appropriate statistics?
- Is there a particular reason why you keep those 500 rows - few blocks index statistics in place?

I'm not suggesting gathering statistics for each run, but simply locking more appropriate index statistics that reflect the actual size of the index.
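A sketch of what I mean (owner, index/table names and figures are made up; the numbers would need to come from the index at its observed "steady state" size):

{noformat}
BEGIN
  dbms_stats.set_index_stats(
    ownname  => 'SYSADM',
    indname  => 'PS_MY_TAO_IDX',  -- hypothetical index name
    numrows  => 17000,
    numlblks => 57000             -- leaf blocks reflecting the ~450MB size
  );
  dbms_stats.lock_table_stats('SYSADM', 'PS_MY_TAO');
END;
/
{noformat}

Locking the statistics afterwards prevents the regular gather_stats runs from overwriting them with the tiny figures captured when the table happens to be nearly empty.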

Hemant K Chitale wrote:
Yes, I have been following David Kurtz's postings. A number of ideas on GTTs, Dynamic Sampling, Partitioning etc are available.
Implementing them in Peoplesoft
a. Isn't straightforward when you have to go through tight change controls
b. Isn't easy when there is concern about "breaking" a vendor-supplied schema and whether subsequent patches/upgrades from the vendor will break the changes

This isn't a home-grown application.

Of course, Peoplesoft is coming with its own particularities and standards, but I disagree on your points. Peoplesoft as an ERP does not mean you cannot do anything to improve performance/storage (or archiving...).
Partitioning is fully supported by Peoplesoft (more than useful for payroll processes and billion-row tables). A GTT is also a nice solution (for "temp" tables), and it is supported too, but only for non-restartable processes. Of course, you'll need to take care that applying patches does not overwrite your changes (as with all applications, isn't it?). And I don't know of one production Peoplesoft installation without any customizations.
In all cases, that makes your application customized, but sometimes we need to make choices between customization and performance, especially because Peoplesoft has not been designed for the Oracle database (which could explain Jonathan's comment earlier: "+This does not look like a mechanism designed by someone who understands Oracle, or indexing, let alone Oracle's indexing (or even concurrency).+").


Hemant K Chitale wrote:
Yes, I have been following David Kurtz's postings. A number of ideas on GTTs, Dynamic Sampling, Partitioning etc are available.
Implementing them in Peoplesoft
a. Isn't straightforward when you have to go through tight change controls
b. Isn't easy when there is concern about "breaking" a vendor-supplied schema and whether subsequent patches/upgrades from the vendor will break the changes

This isn't a home-grown application.

Hemant,

I know how hard it is to get permission (from local management and from Peoplesoft support) to implement a change like this - but it has been done before, and it's always helpful to be able to quote a precedent.

I think Randolf has raised two important points:
<ul>
<li>Rebuilding once per week may be better than nothing, but it may not help enough.</li>
<li>My "scenario 2" may not be a (complete) explanation of the phenomenon.</li>
</ul>

I'll make a couple of comments about those points a bit later.
Have you tested the effect of a stored outline to force a tablescan for the delete statement?

The problem with the delete walking through the index is that it has to walk through a lot of empty space.

If you make this delete do a tablescan, then your description of the issue suggests that "empty space" in the table would be much smaller (which is a good thing). On the other hand the run-time engine will be doing "row by row" deletes in the index rather than "block-optimised" deletes, which would be a less good thing. HOWEVER - because of the nature of the INSERT and the definition of the index, the table rows would probably be well-clustered with respect to the indexes - so the number of index block visits would probably be small.
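For reference, the basic mechanism for such a test might look like this (object names and predicate are made up; in practice you would capture the exact statement text, and you would create the outline in a session where the optimizer actually chooses the tablescan, e.g. with suitable statistics in place, so that the FULL plan is what gets recorded):

{noformat}
-- Hypothetical names for illustration only
CREATE OR REPLACE OUTLINE ps_del_outline
FOR CATEGORY psft_fixes
ON DELETE FROM ps_my_tao WHERE process_instance = :1;

-- Activate the category so sessions pick up the stored plan
ALTER SESSION SET use_stored_outlines = psft_fixes;
{noformat}

The attraction of an outline here is that it forces the plan without touching the vendor-supplied SQL or schema.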

Coming back to the rebuild: the net benefit of the rebuild depends on the typical pattern of growth. From scenario 2, the larger the index grows, the more likely it is to grow again as the delete takes longer and more processes can join the queue. If the first serious expansion occurs at three minutes past nine on Monday morning, then the rebuild doesn't help much; if the index only grows slowly during the week, then the rebuild may result in enough improvement for enough of the time to keep people happy. While Randolf's comments about the cost of "re-growing" the index are generally appropriate, I don't think they would be particularly significant in this case where you are basically serialising a number of batch jobs that access different parts of the index.

As far as scenario 2 being incomplete - you've reported your index growing to 450MB against a table of 1.3 MB. It's easy to make that sort of thing happen, and I've seen it in production systems before now; but the cause doesn't match your description, and I can only envisage the index growing to about 10 times the size of the table.

Agreed that P'Soft can be customised. We do have a number of customisations.

However, no customisations / alterations against standard schema objects and tables. (Hey, I can't make all the decisions, but must implement what can fit into existing frameworks).

Also, as you point out, Peoplesoft differentiates between restartable and non-restartable jobs. In other Peoplesoft applications, I see TRUNCATE statements against similar "temporary" tables. That, too, would have been a good choice but cannot be done for jobs accessing these tables -- so P'Soft uses INSERTs and DELETEs.

For the last 2,179 INSERTs (and DELETEs), I have from 0 to 14,026 rows, with an average of 840 and a stddev of 1,844 rows at each pass. There is significant skew in some columns (SETID has very few distinct values, some with very small counts, some very large, while PROCESS_INSTANCE is a single, always-increasing value at each pass).

The statistics for 500 rows are based on real live data in the table.

Hemant,

my point was not about the table or column statistics but specifically about the index statistics and the index size, e.g. the number of leaf blocks. I'm not sure if that was clear enough from my previous posts. Telling the optimizer that this index is 450MB in size would surely prevent your "index fast full scan" case from happening; I'm not sure about the impact on the DELETE statement - whether that alone would be sufficient to make it switch automatically to a full table scan instead.

I still doubt that it is reasonable to keep/lock index statistics that are way off from the actual state of the index. Obviously this depends then on the impact on other statements that might be affected by that index statistics change, as I've already mentioned previously.

I've already tried to explain several times why an index rebuild can cause additional work for the DML applied to the table after the rebuild

Yes you have.

I'm not trying to be argumentative, I truly don't understand your point.

I really want to know more about this phenomenon because I've never seen it happen.

*******************************************************************

cause additional work for the DML applied to the table after the rebuild.

Your argument is that an index in a fragmented state may cause less work for DML than the same index after a rebuild, right?

Does that assume that the DBA does not adjust PCTFREE for the rebuild to allow for subsequent expected DML?

Randolf, maybe the misunderstanding is about the nature of the "logical deletes", and the fact that dead, empty nodes remain in the tree, ready to receive new rows?

In my experience, a "sparse index" (or fragmented, unbalanced, browned, or exploded, whatever you call it) does not perform "less work" for subsequent DML than a brand new tree.

Why rebuild an index again and again, only to generate additional work to have the index working towards its "steady state"?

This is because many of the "dead nodes" have been placed back onto the freelist . . .

I've seen people who expect heavy subsequent DML rebuild the index with a high PCTFREE to build a large tree with pre-allocated free space, but their performance does not seem significantly different.
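That approach is just a one-liner at rebuild time (index name is hypothetical; PCTFREE 50 leaves roughly half of each leaf block empty for future inserts):

{noformat}
-- Pre-allocate free space in every leaf block during the rebuild
ALTER INDEX ps_my_tao_idx REBUILD PCTFREE 50;
{noformat}

Note that PCTFREE on an index only applies at (re)build time; it does not reserve space in blocks created by later splits.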

Plus, as Hemant notes, even steady-state indexes seem to grow rather than re-use deleted space . . .

The index finally will grow to a certain size required to accommodate the concurrent usage of the index and should stay at that size.

I'm not sure that this is a fair assumption.

Again, my point is simply: Why rebuild an index again and again, only to generate additional work to have the index working towards its "steady state"?

In my experience, I see sparse trees continue to grow, even though they appear to have lots of free space.

What is the actual impact of the larger index size?

Assuming that the space is going to be re-used, there is no space benefit.

True. There are spikes in the growth pattern. The index size may be stable for some 4 to 6 hours and then suddenly grow significantly in the next hour, although in theory there isn't any difference in the usage (meaning: the usage is supposed to be the same). Without capturing the set of rows to another table and then comparing them each time, I can't be sure. But getting permission to write an audit trigger to duplicate every row so that I can "discern the pattern of data being inserted" isn't going to be easy.
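Short of an audit trigger, one cheaper way to track what is happening inside the index over time would be to validate its structure periodically (index name is hypothetical; be aware that VALIDATE STRUCTURE without the ONLINE keyword locks the underlying table while it runs, and the INDEX_STATS view holds only the result of the last ANALYZE in the current session):

{noformat}
ANALYZE INDEX ps_my_tao_idx VALIDATE STRUCTURE;

SELECT height, lf_rows, del_lf_rows, lf_blks,
       used_space, btree_space, pct_used
FROM   index_stats;
{noformat}

Logging these figures before and after the growth spikes would at least show whether the growth comes from deleted-but-unreclaimed leaf rows or from genuinely new entries.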

If the first serious expansion occurs at three minutes past nine on Monday morning, then the rebuild
doesn't help much; if the index only grows slowly during the week, then the rebuild may result in
enough improvement for enough of the time to keep people happy

I don't dispute that.

but the cause doesn't match your description

So the information I have about the processes isn't complete. I know. And I have to live with it.

Apparently not. The two indexes do not grow to the same size -- although the first 5 columns of the indexes are the same.

But then, (see my preceding response to Jonathan), I don't have access to the actual values being inserted into the table by each session.
I can't be as good as you'd expect because I can't get as much information as I would like to get -- or as much as you would like to analyze!