Enhance Derby with EXPLAIN Functionality

Details

Description

This enhancement extends Derby with EXPLAIN functions. Users want more feedback than they're getting from the current RuntimeStatistics facility. This extension builds on the RuntimeStatistics/ResultSetStatistics functions and classes.

Felix Beyer
added a comment - 26/Mar/07 17:52 This file documents the new system procedures and their parameters.
An example will demonstrate the usage of the functions and the new explain functionality.

Bryan Pendleton
added a comment - 27/Mar/07 02:57 Hi Felix, thanks for posting the code. It looks great! There's a lot of code, so it may take a while to read.
Why did you choose to spell "explain" without the "e" in the code?

Bryan Pendleton
added a comment - 27/Mar/07 05:57 1) If multiple people are simultaneously using explain mode to store query plans into
the system catalogs, how do they distinguish their explain data from each other?
2) If I use explain mode for multiple SQL statements, how do I go back in after the
fact and find the particular explain data for the particular SQL statement that I'm interested in?

Felix Beyer
added a comment - 28/Mar/07 09:45 >Why did you choose to spell "explain" without the "e" in the code?
There's no special reason for this. Perhaps to distinguish the special Derby explain behaviour from the expectations of users who are familiar with the explain functions of commercial DBMSs.
The current explain implementation is optimized for ad-hoc queries and tool support. Furthermore, the explain data is quite extensive to analyze. I wanted a compromise between detailed explain information, which is almost unreadable by human users and has to be evaluated with the help of a tool, and a compact version of explain data, which is only suitable for rough investigations but is still browsable by human users.
>1) If multiple people are simultaneously using explain mode to store query plans into
>the system catalogs, how do they distinguish their explain data from each other?
The XPLAINStatements catalog does not have a user attribute to distinguish the explain data by user. Currently the only way to distinguish explain data is via its UUID and the session id attribute (see the physical schema). In network environments the DRDA id is additionally available to clarify the network connection and thus the user.
>2) If I use explain mode for multiple SQL statements, how do I go back in after the
>fact and find the particular explain data for the particular SQL statement that I'm interested in?
The easiest way to do this is to filter the explain data by the stmt_text attribute. If several statements have the same statement text, use the explain timestamp to pick the right one.
select stmt_text, stmt_id, xplain_time from sys.sysxplain_statements order by stmt_text, xplain_time;
With the desired stmt_id, step into the other catalogs to query more data.
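For example, the drill-down could look like this (a sketch only: sys.sysxplain_resultsets and its op_identifier/op_details/stmt_id columns are taken from the catalog schema discussed later in this issue, and the stmt_id literal is a placeholder to fill in from the first query):

```sql
-- 1) Locate the statement of interest.
select stmt_text, stmt_id, xplain_time
from sys.sysxplain_statements
order by stmt_text, xplain_time;

-- 2) Drill into the result sets captured for that statement,
--    substituting the stmt_id found above.
select rs.op_identifier, rs.op_details
from sys.sysxplain_resultsets rs
where rs.stmt_id = '<stmt_id from step 1>';
```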

A B
added a comment - 30/Mar/07 17:42 I have not looked at the code changes for this issue, nor have I actually tried out the new functionality. I plan to, but I have not yet had enough time.
I did, however, apply the full patch (it applied cleanly) and I ran Derby's two primary regression suites ("derbyall" and "suites.All") on Red Hat Linux with ibm142. A handful of tests failed in "derbyall" because of metadata differences: namely, the metadata now includes information for the new XPLAIN stored procedures. This means the expected results for those tests have to be updated to reflect the presence of the new procedures. Unfortunately, I accidentally deleted the derbyall results so I do not have a list of the tests which require updates. However, if you run
java org.apache.derbyTesting.functionTests.harness.RunSuite derbyall
on your machine, you should hopefully be able to see the failures for yourself. Note that the "derbyall" suite takes three to four hours to complete (on my test machine). If you are uncertain about how to fix the tests, please feel free to ask questions to derby-dev.
Aside from the metadata failures, derbyall ran cleanly. As for "suites.All" I saw 5 failures:
FAILURES!!!
Tests run: 5664, Failures: 5, Errors: 0
I have not had time to look into the cause of the failures; can you run this suite and investigate a bit more?
java junit.swingui.TestRunner -noloading org.apache.derbyTesting.functionTests.suites.All
This regression suite usually takes less than an hour to complete.
As for the patch itself, it is a very large patch. I count over 10,000 lines of code changes in all (including comments). As Bryan said earlier, it is probably going to take a while for people to read and review it. It might help if you were able to break the patch up into smaller logical "pieces", where each piece compiles cleanly and passes the regression suites. It is generally much easier to review smaller patches, and you are more likely to get comments from more people in a shorter time. See:
http://wiki.apache.org/db-derby/PatchAdvice
http://wiki.apache.org/db-derby/IncrementalDevelopment
Of course that is just a suggestion; you are not required to post incremental patches.
Also note the following, copied from the Derby wiki page:
http://wiki.apache.org/db-derby/DerbyContributorChecklist
"If you are attaching a large patch, you should sign an [WWW] ICLA, generally granting Apache license rights over your Derby contributions. Evidence that you filed an ICLA is supposed to appear in the [WWW] Contributors list. However, since Apache updates this list only sporadically, you may need to resubmit your ICLA or pester Apache to notice it. For more information, see the [WWW] Apache licensing info."
I think 10k lines counts as "a large patch", so it'd be great if you could sign an ICLA. The wiki page at the above URL has the necessary links. If you have already filed an ICLA, it might be helpful if you could indicate that in an email or as a comment on this issue.
And one final suggestion:
Don't be shy! If at any point you have any questions about Derby (whether they relate to this Jira or not), please feel free to ask them on derby-dev or on derby-user. And if you have input or answers for other people's questions and discussions, please chime in at any point. We welcome your participation.

Bryan Pendleton
added a comment - 01/Feb/09 16:10 What is the status of this patch? Was a CLA ever filed? If not, is that a blocker?
That is, if the community were to decide that we wanted to include this work, are
there legal issues that need to be overcome? Or is it simply a matter of digesting
the proposed implementation and deciding if we want to proceed with it?

Rainer Gemulla
added a comment - 04/Feb/09 05:35 This patch has been created as part of a diploma thesis. There are no legal issues of any kind. I will try to contact Felix Beyer (who did all the work) in order to file the CLA.

Bryan Pendleton
added a comment - 05/Feb/09 00:00 I've been working on trying to bring the patch up to date with the current trunk.
Since the time that this patch was originally posted, at least the following other
patches have touched related parts of the code: DERBY-3037, DERBY-2661,
DERBY-1734, DERBY-3147, DERBY-2775, DERBY-3570, DERBY-3137, DERBY-2998.
There are probably other interactions that have changed since the patch was first written.
I've made an attempt to revise the patch to bring it up to date so that it will apply
cleanly against the current trunk, and will at least compile, and have attached that
updated patch as 'incorporateTrunkChanges.diff'.
Although I think that this revised patch will at least compile, I very much doubt that
it will run; I simply had to make too many changes to the patch to give me much
confidence that I haven't broken things.
But the exercise did give me a bunch more familiarity with the patch.
And now I can start working with the code against the current trunk, stepping through
it and getting more familiar with it.

Bryan Pendleton
added a comment - 05/Feb/09 03:06 I messed up the argument type to syscs_set_xplain_style; it should be TypeDescriptor.INTEGER.
I'll correct that in a later patch, after I've done more testing and study.
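For context, the procedure under discussion would be invoked along these lines (a sketch based on this patch's procedure names, which may change before commit; the style values 1 and 0 are assumed example arguments):

```sql
-- Enable explain capture using the patch's system procedure
-- (argument declared as INTEGER, per the fix described above).
call syscs_util.syscs_set_xplain_style(1);

-- Run the statement whose plan should be captured.
select * from sys.systables;

-- Disable capture, then inspect the recorded plan data.
call syscs_util.syscs_set_xplain_style(0);
select stmt_text, xplain_time from sys.sysxplain_statements;
```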

Rainer Gemulla
added a comment - 05/Feb/09 03:43 Thanks, Bryan. My personal feeling is that these extensions can be of great value for both Derby developers and application developers using Derby. It's great to see that there is some progress.

Bryan Pendleton
added a comment - 05/Feb/09 15:25 In a complete run of the regression suite, there were a handful of failures. From a quick
scan, they were all related to the new system catalogs added by this patch. So it at least
seems like the new code does no harm, as far as is measured by the regression tests.
I'll look into each of the test failures in more detail.

Bryan Pendleton
added a comment - 06/Feb/09 14:18 Attached 'updateRegressionTests.diff' patch proposal includes changes
to the regression tests to incorporate the new system catalogs. With this
patch, I get a clean regression test run.

Bryan Pendleton
added a comment - 06/Feb/09 15:51 I hadn't properly done 'svn add' on all the relevant files and directories,
so my previous patch attachments were missing some of the new files.
Re-attaching 'updateRegressionTests.diff'.

Bryan Pendleton
added a comment - 06/Feb/09 15:58 I'm becoming more familiar with the patch, which is good.
For the next step, I'm thinking about starting to work on some
draft documentation, as I think the process of drafting documentation
will help me clarify some of the details of the patch. I think that the
documentation should include:
- an overview of what this feature is, how it can help you, and how it works
- reference material on the new system procedures
- reference material on the new system catalogs
- a number of examples of how to use the feature

Kathey Marsden
added a comment - 06/Feb/09 19:10 Bryan asked:
"What is the status of this patch? Was a CLA ever filed? If not, is that a blocker?"
I think it is a blocker for a patch of this size. I don't see Felix Beyer on the list at:
http://people.apache.org/~jim/committers.html

Bryan Pendleton
added a comment - 17/Feb/09 01:14 Brief status update:
- I've been in contact with Felix Beyer, and we're trying to get the ICLA filed.
- I've made good progress on drafting docs and on writing some new tests; I don't have any patches to post yet, but hopefully soon.
- While writing tests, I found an unrelated issue (I think) in DataValueFactory, and will file a separate issue to track that problem.

Bryan Pendleton
added a comment - 20/Feb/09 04:03 Attached 'startRegressionTest.diff' patch differs in two
meaningful respects from the previous patch:
it incorporates the workaround that Knut suggested
for DERBY-4062; namely, to directly construct SQLInteger
objects for the integer data values rather than calling
DataValueFactory.getDataValue(). This seems like a
reasonable approach to me, and means that this work won't
be blocked while we discuss the preferred approach to
take with DERBY-4062.
it includes an initial, incomplete, partial new regression test
to be run as part of the derbylang junit test suites. At this
point, I'm just writing the most simple tests of the new
feature, trying to ensure that in some straightforward
scenarios, reasonable values are captured and written
to the new system tables. So far, this testing seems to be
going well and I think it should be possible to continue
adding additional regression tests along these lines to this suite.
I haven't re-run the full regression suite in a while, so it's possible
that this patch breaks some of the tests, although hopefully not
since I haven't done much to the patch other than the two changes
mentioned above.

Knut Anders Hatlen
added a comment - 22/Feb/09 18:47 Hi Bryan,
I haven't looked at the patch yet, but I've had a brief look at the documentation that you posted on DERBY-4065 (great work, by the way!), and I noticed that the STMT_TEXT column in SYSXPLAIN_STATEMENTS was of type LONG VARCHAR (length 32700). Given that the maximum length of VARCHAR is just 28 characters shorter than LONG VARCHAR, do you think it would be a good idea to change the data type of that column to VARCHAR? That would allow comparisons on that column without a cast. Otherwise, I think a statement like "select * from sys.sysxplain_statements where stmt_text='select * from t'" would result in ERROR 42818: Comparisons between 'LONG VARCHAR (UCS_BASIC)' and 'CHAR (UCS_BASIC)' are not supported.
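Until such a change, the comparison would need an explicit cast to work around the type restriction (a sketch; the cast length 4000 is arbitrary, anything up to the VARCHAR maximum would do):

```sql
-- Fails with ERROR 42818 while stmt_text is LONG VARCHAR:
--   select * from sys.sysxplain_statements
--   where stmt_text = 'select * from t';

-- Casting the column to VARCHAR makes the comparison legal:
select *
from sys.sysxplain_statements
where cast(stmt_text as varchar(4000)) = 'select * from t';
```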

Bryan Pendleton
added a comment - 22/Feb/09 21:47 Thanks Knut, I think that's a good idea. I had noticed the awkwardness
of querying against the stmt_text field, and I agree with you that the
small amount of reduction in field size is worth it to be able to issue queries like:
select stmt_id, stmt_text from sys.sysxplain_statements where stmt_text like '% from countries%'
I've uploaded a new version of the patch as 'startRegressionTest.diff'.
I'm beginning to feel like the regression test is at least establishing broad
basic-level coverage of the new feature.
I think that as we get more experience with this feature, there will be a variety of
more advanced issues that we'll want to have tests for, but at least we now have
a test suite which verifies basic reasonableness of almost all the columns in
the new system tables.
The test development that I've done has identified a number of small items that
I want to follow up on, mostly noted by 'fixme' or 'todo' comments in either the
code itself or in the docs.
I think I'll turn my attention to these items next, making sure that we have
regression tests which explore these items in detail, and then trying to
figure out how to address the particular issue in either the code or the docs.

Bryan Pendleton
added a comment - 23/Feb/09 15:58 It seems like it would be nice if the result set number (a small integer, identifying this
particular result set within this statement) were included as a column in
sys.sysxplain_resultsets. It is sometimes, but not always, included in op_details currently.

Bryan Pendleton
added a comment - 24/Feb/09 02:48 I need to investigate how this should work in an upgrade scenario. Is there already
code which runs during upgrade which would detect that these system tables do
not yet exist, and would create them? Also, is there anything special needed so that,
if the system tables do not yet exist, then the feature is smart enough to not allow
itself to be turned on until the system tables are created (e.g., in "soft" upgrade)?

Mike Matrigali
added a comment - 24/Feb/09 20:55 I have not done a system catalog upgrade, but here are some places to look.
I assume the right thing would be to disable the feature in soft upgrade. I think this is usually done in the parser or the procedure call, whichever applies best for the feature. It looks like the update statistics procedure does this, and it has been added to the upgrade tests.
For pointers on adding system catalogs, maybe you can look at the code in
java/engine/org/apache/derby/impl/sql/catalog/DD_Version.java. It looks like a few
past versions dealt with new catalogs:
if (fromMajorVersionNumber <= DataDictionary.DD_VERSION_DERBY_10_3)
{
    // Add new system catalogs created for roles
    bootingDictionary.upgradeMakeCatalog(
        tc, DataDictionary.SYSROLES_CATALOG_NUM);
}
if (fromMajorVersionNumber <= DataDictionary.DD_VERSION_DERBY_10_1)
{
    // add catalogs 1st, subsequent procedure adding may depend on
    // catalogs.
    // Add new system catalogs created for grant and revoke
    bootingDictionary.upgradeMakeCatalog(
        tc, DataDictionary.SYSTABLEPERMS_CATALOG_NUM);
    bootingDictionary.upgradeMakeCatalog(
        tc, DataDictionary.SYSCOLPERMS_CATALOG_NUM);
    bootingDictionary.upgradeMakeCatalog(
        tc, DataDictionary.SYSROUTINEPERMS_CATALOG_NUM);
}
It looks like the 10.5 update statistics feature laid the groundwork for 10.5 upgrade
testing, so you should just be able to add appropriate test cases under:
java/testing/org/apache/derbyTesting/functionTests/tests/upgradeTests

Dag H. Wanvik
added a comment - 24/Feb/09 22:39 For roles, I added the above (Mike's quote) upgrade code as part of
DERBY-3191. To avoid confusion: 10.4 databases have the system table
SYS.SYSROLES, but the feature isn't enabled until 10.5.

Bryan Pendleton
added a comment - 26/Feb/09 05:55 Thanks Mike and Dag for the upgrade pointers. Those tests are awesome!
I was able to add tests for the new XPLAIN features, and, using those tests,
found and fixed a handful of upgrade issues, so I think upgrade now works.
Attached patch 'startUpgradeTests.diff' passes the complete regression
suite in my environment.
I'm still tracking down to-do's and fixme's, but would welcome
additional review of the patch by anyone who has time.

Bryan Pendleton
added a comment - 27/Feb/09 23:20 I suspect that the 'sourceDepth' column should be removed from SYSXPLAIN_RESULTSETS.
If I'm reading the source code correctly, this field is used during the text formatting
of the runtime statistics information to manage the indentation levels of the text display;
it is incremented by one in the initFormatInfo() method when a parent resultset
constructs its own text by calling upon its child result sets to produce their text.
I don't think that it has any value when the runtime statistics information is captured
into the SYSXPLAIN_RESULTSETS table.
I intend to remove the sourceDepth field from the code patch, and from the docs patch.

Bryan Pendleton
added a comment - 01/Mar/09 00:19 Attached is 'removeSourceDepth.diff', the latest version of the patch proposal.
The primary change is that the 'sourceDepth' field is no longer present in the
SYSXPLAIN_RESULTSETS table. @author tags are also removed.

Bryan Pendleton
added a comment - 03/Mar/09 04:56 I intend to remove the 'NO_ACCESSED_HEAP_COLUMNS' column from
SYSXPLAIN_SCAN_PROPS. This column is always being set to NULL
in the current code, and I can't think of any use for this column which
isn't already being met by the NO_FETCHED_COLUMNS column.
It's possible that this field was intended to be used to hold the information
from an IndexRowToBaseRow result set, since
RealIndexRowToBaseRowStatistics does compute the columns accessed
from the base heap, and makes that information available in the text
output of the runtimestatistics. However, this result set does not have
a corresponding scan props row, so we don't have a way to hold
the value there. There is a commented-out bit of code in the
XPLAINSystemTableVisitor class which looks like it is intended to deposit
this information into the op_details column for the IndexRowToBaseRow
result set, which might be one way to make it available:
    String op_details = "(" + statistics.resultSetNumber + "), " +
        statistics.tableName; // + ", " +
        // "ACCESSED HEAP COLUMNS: " + statistics.colsAccessedFromHeap;
For the time being, since I'm not sure what the column was supposed to
hold, and since having an always-NULL column doesn't seem useful,
I intend to remove the column from SCAN_PROPS.
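
For what it's worth, here is a minimal sketch (plain Java, outside Derby) of how that commented-out fragment could be completed. The field names resultSetNumber, tableName, and colsAccessedFromHeap come from the snippet above; the helper class and method themselves are hypothetical:

```java
// Hypothetical helper: builds the op_details string for an
// IndexRowToBaseRow result set, folding in the accessed-heap-columns
// information that the commented-out code hints at. When the column
// information is unavailable (null), it is simply omitted.
public class OpDetailsSketch {
    static String buildOpDetails(int resultSetNumber,
                                 String tableName,
                                 String colsAccessedFromHeap) {
        StringBuilder sb = new StringBuilder();
        sb.append('(').append(resultSetNumber).append("), ");
        sb.append(tableName);
        if (colsAccessedFromHeap != null) {
            sb.append(", ACCESSED HEAP COLUMNS: ").append(colsAccessedFromHeap);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildOpDetails(3, "EMPLOYEE", "{0, 2, 5}"));
    }
}
```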

Mamta A. Satoor
added a comment - 03/Mar/09 21:26 Bryan, great job in bringing the patch back up to date with the trunk and figuring out this large patch from the original submitter. I just have a couple of comments:
1) In RealAnyResultSetStatistics.java, RealDistinctScalarAggregateStatistics.java, and RealWindowResultSetStatistics.java, do we need to check that childResultSetStatistics is not null before we set the number of children to 1? If so, then we should also check that childResultSetStatistics is not null
before calling childResultSetStatistics.accept(visitor); this is what we do in the rest of the patch.
2) In RealCurrentOfStatistics.java, should we be calling visitor.visit(this); after setting the number of children to 0?
3) In XPLAINFactory.java, while trying to determine the appropriate XPLAINVisitor, if we get an exception, we print the stack trace. That should be fixed at some point. Should the fix be that if we run into an exception, we just use the default visitor?
java/engine/org/apache/derby/impl/sql/execute/xplain/XPLAINFactory.java
4) I noticed a couple of places in XPLAINSystemTableVisitor.java where we do printStackTrace on the exception. The error handling is probably already on your to-do list, but I thought I would point it out.

Bryan Pendleton
added a comment - 04/Mar/09 22:31 Mamta, your observation about the handling of childResultSetStatistics in the various
implementations of accept(XPLAINVisitor) is quite interesting.
This is a new accept() method added to most of the various ResultSetStatistics sub-types
as part of this patch. In the original patch, the implementation of the accept() method
was (mostly) written in a style of being careful about checking the childResultSetStatistics
pointer before using it. For example, here's the implementation from RealSortStatistics:
public void accept(XPLAINVisitor visitor) {
    int noChildren = 0;
    if (this.childResultSetStatistics != null) noChildren++;

    // inform the visitor
    visitor.setNumberOfChildren(noChildren);

    // pre-order, depth-first traversal
    // me first
    visitor.visit(this);
    // then my child
    if (childResultSetStatistics != null) {
        childResultSetStatistics.accept(visitor);
    }
}
Note that the code checks to see if childResultSetStatistics is null before using it.
This is the pattern in most of the new accept() methods, with the 3 exceptions that you noted.
However, the more I look at this code, the more I think that those 3 exceptions are actually
the desirable pattern, and most of the other accept() implementations are being too cautious.
For example, I think that it DOES make sense for RealDeleteResultSetStatistics.accept()
to have code which checks to see whether sourceResultSetStatistics is null or not, because
sourceResultSetStatistics is an optional part of RealDeleteResultSetStatistics. Sometimes it
is present, sometimes not. I'm not sure WHY the sourceResultSetStatistics is optional here,
but it clearly is, and the code elsewhere in RealDeleteResultSetStatistics is careful to check
it before dereferencing it.
On the other hand for RealUnionResultSetStatistics, for example, I think the code is being
too cautious. The current code added by this patch for this class looks like this:
public void accept(XPLAINVisitor visitor) {
    int noChildren = 0;
    if (this.leftResultSetStatistics != null) noChildren++;
    if (this.rightResultSetStatistics != null) noChildren++;

    // inform the visitor
    visitor.setNumberOfChildren(noChildren);

    // pre-order, depth-first traversal
    // me first
    visitor.visit(this);
    // then visit first my left child
    if (leftResultSetStatistics != null) {
        leftResultSetStatistics.accept(visitor);
    }
    // and then my right child
    if (rightResultSetStatistics != null) {
        rightResultSetStatistics.accept(visitor);
    }
}
But I think this is unnecessary. It is clear by reading elsewhere in
RealUnionResultSetStatistics that the leftResultSetStatistics and
rightResultSetStatistics are not optional, and must always be present,
so I think the new code would be simpler and clearer if it was instead:
public void accept(XPLAINVisitor visitor) {
    // inform the visitor
    visitor.setNumberOfChildren(2);

    // pre-order, depth-first traversal
    // me first
    visitor.visit(this);
    // then visit first my left child
    leftResultSetStatistics.accept(visitor);
    // and then my right child
    rightResultSetStatistics.accept(visitor);
}
So rather than changing the 3 Statistics classes that you mentioned to add "!= null"
checks for the childResultSetStatistics field, I'm tempted to instead go through
the other Statistics classes and to ensure that the new accept() method only
uses conditional logic for the child statistics objects if the children truly are optional.
Does that make sense?
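
To make the distinction concrete, here is a stripped-down, self-contained sketch of the two accept() styles: a node with mandatory children visits them unconditionally, while a node with an optional child counts it and guards the visit. The class names are illustrative stand-ins, not Derby's real classes:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of the visitor traversal discussed above. "Union" has two
// mandatory children and visits them unconditionally; "Delete" has an
// optional source child and guards it with a null check.
public class AcceptSketch {
    interface Visitor { void setNumberOfChildren(int n); void visit(Node n); }
    interface Node { void accept(Visitor v); }

    static class Union implements Node {
        final Node left, right;                 // mandatory children
        Union(Node l, Node r) { left = l; right = r; }
        public void accept(Visitor v) {
            v.setNumberOfChildren(2);           // always exactly two
            v.visit(this);                      // pre-order: me first
            left.accept(v);
            right.accept(v);
        }
    }

    static class Delete implements Node {
        final Node source;                      // optional child, may be null
        Delete(Node s) { source = s; }
        public void accept(Visitor v) {
            v.setNumberOfChildren(source != null ? 1 : 0);
            v.visit(this);
            if (source != null) source.accept(v);
        }
    }

    static class Leaf implements Node {
        public void accept(Visitor v) {
            v.setNumberOfChildren(0);
            v.visit(this);
        }
    }

    // Records the visit order so the traversal can be inspected.
    static List<String> traverse(Node root) {
        final List<String> order = new ArrayList<>();
        root.accept(new Visitor() {
            public void setNumberOfChildren(int n) { }
            public void visit(Node n) { order.add(n.getClass().getSimpleName()); }
        });
        return order;
    }

    public static void main(String[] args) {
        System.out.println(traverse(new Union(new Leaf(), new Leaf())));
        System.out.println(traverse(new Delete(null)));
    }
}
```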

Mamta A. Satoor
added a comment - 05/Mar/09 04:59 Bryan, I agree that we should not do unnecessary checks. So, it will be good to get rid of redundant checks for null rather than adding them to all the accept methods.

Knut Anders Hatlen
added a comment - 05/Mar/09 11:24 Some thoughts based on reading the discussion and the proposed
documentation:
I was wondering if there was a better place to store these data than
in the system tables. The problems I see with using the system tables
are:
1) The size of the seg0 directory in an empty database increases by
19% with this patch (712KB -> 848KB) which leads to increased
footprint and possibly increased database creation time even for those
who don't use the feature.
2) I guess these tables can grow quite big. How can we reclaim that
space? The reset procedure deletes the rows, but I don't think we can
compress system tables. And we cannot drop them.
3) System tables are shared by all users, so the approach doesn't work
very well in a multi-user environment. If two users collect data, they
may be confused by seeing the other user's data. And if one of them
calls the reset procedure, all the others who are collecting
statistics will lose their data too.
4) We'll need upgrade code if we change the tables.
I haven't given this much thought, but would it be possible to create
these tables on demand, possibly as temporary tables in the SESSION
schema? Then different users wouldn't interfere with each other, and
we wouldn't need to worry about disk footprint or upgrade of system
tables. Or perhaps one could pick table names (or at least their
prefix) when enabling XPLAIN mode?
call syscs_util.syscs_set_runtimestatistics(1);
call syscs_util.syscs_set_xplain_style(1, 'APP', 'PREFIX_');
...
select count(*) from prefix_xplain_statement_timings;
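
As a rough illustration of the naming scheme this example implies, the caller-chosen prefix could simply be prepended to a fixed set of base table names. The base names below are guesses modeled on the SYSXPLAIN_* tables discussed earlier in this issue, and the helper itself is hypothetical:

```java
// Hypothetical helper deriving per-user XPLAIN table names from a prefix,
// as in the syscs_set_xplain_style(1, 'APP', 'PREFIX_') example above.
// The base names are illustrative, not Derby's actual table set.
public class XplainNames {
    static final String[] BASE_NAMES = {
        "XPLAIN_STATEMENTS", "XPLAIN_RESULTSETS", "XPLAIN_SCAN_PROPS",
        "XPLAIN_SORT_PROPS", "XPLAIN_STATEMENT_TIMINGS", "XPLAIN_RESULTSET_TIMINGS"
    };

    static String[] tableNames(String prefix) {
        String[] names = new String[BASE_NAMES.length];
        for (int i = 0; i < names.length; i++) {
            // SQL identifiers are case-insensitive unless quoted, so normalize.
            names[i] = (prefix + BASE_NAMES[i]).toUpperCase();
        }
        return names;
    }

    public static void main(String[] args) {
        for (String n : tableNames("prefix_")) System.out.println(n);
    }
}
```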

Bryan Pendleton
added a comment - 05/Mar/09 22:27 Knut, thanks very much for the thoughts and suggestions. I have had many of the
same concerns, so it's very nice to have a chance to talk about them. Here are a couple
of my reactions:
storing the data in temporary session-scoped tables is intriguing, but it seems like
that would make it harder for people who want to intentionally include data from
multiple sessions and/or retain the data beyond the end of their session; they'd have
to copy the data out to their own tables to save it.
I agree that users could be confused by seeing multiple sessions' data in the same
tables, although there are fields that help them distinguish this data: session id,
transaction id, drda id, etc.
Your points about shortcomings with respect to:
- compressing/recovering space in the xplain tables
- reset deleting all the data, even including data they might want to keep
- having to upgrade the tables
all seem valid, but all seem like they could be addressed by additional follow-on features.
For example, we could in a future release provide a system procedure which compresses
the xplain tables, and could provide a modification to the reset procedure which allows
for resetting only a subset of the data (by session id, etc.).
Your point about not wanting to incur costs for this feature when the user isn't
using the feature is quite a strong point, and it definitely concerns me too. Adding
6 new system tables isn't something we should do casually. In addition to the
ideas you had (use temporary tables, use user tables), it seems like we could
have a few more possibilities:
- create the tables as system tables, but do it on demand, when the feature is first used.
- create the tables as system tables, but neither automatically nor on demand; instead,
have some additional system procedures which the user could call to:
a) create the system tables for the xplain feature, prior to using it
b) drop the system tables when they are not using the feature and want to avoid the overhead.
The idea about being able to use ordinary user tables for storing this data is also
very interesting, but it seems to raise some further questions:
- the user would presumably need to pre-create these tables and give them the
right structure. Or would we auto-create them for the user, following some pattern like you suggest?
- it seems like this would introduce some new potential error conditions and
opportunities for the user to confuse themselves:
  - tables already existed, or couldn't be created (security problems, resource problems)
  - tables existed, but the schema wasn't a perfect match (would we allow this anyway if
the schema was a superset of what we needed?)
  - tables unexpectedly disappeared or misbehaved in mid-flight.
I'm also not really sure what we'd have to do to the current implementation to use user tables;
that is, how does it affect the code? I guess we'd have to store the information about which
user tables to use, presumably in the LCC where the patch currently stores the xplain style and mode,
and then we'd have to change the visitor code to insert the data into the user tables. That code
is hopefully quite similar to the code which inserts the data into the system tables, except that
the system tables are quite easy to find from a code point of view, because we can just use the
internal system catalog numbers, whereas user tables would have to go through bind processing
to figure out which actual tables to use.
Thanks very much for raising these concerns; I'm very interested to hear
more about what you and others think about this issue.

Rick Hillegas
added a comment - 06/Mar/09 15:16 Thanks for working on this issue, Bryan. Like Knut and yourself, I am a little concerned that we are creating many system catalogs for one feature. I like the idea that the plan-capturing tables should be created on-demand in a user schema. You could imagine adding a new overload of syscs_util.syscs_set_runtimestatistics:
call syscs_util.syscs_set_runtimestatistics( 'myStatisticsSchema' )
This would
1) Create myStatisticsSchema (owned by the current user) if it didn't already exist.
2) Create the plan-bearing tables in myStatisticsSchema if they didn't already exist or if their shape is stale. I think that the contract should be that the layout of these tables can change between feature releases. That is, plans are not guaranteed to survive across upgrade. Derby reserves the right to drop and recreate these tables after upgrade. Regardless of where we put the statistics, I think that plans from previous releases should be flushed after upgrade--from painful previous experience at Sybase I can report that you don't want to be responsible for persisting plans across upgrade boundaries.
3) Stuff a reference to myStatisticsSchema into the LCC as you suggest. The contract is that plans are dumped into the latest schema specified by a syscs_set_runtimestatistics command issued in your session.
Garbage-collecting uninteresting statistics would then just be a matter of dropping a schema.
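
The three-step contract above can be sketched as follows, simulating schema and table creation with in-memory collections rather than real Derby catalogs; all names here are illustrative, not Derby internals:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the proposed contract: the procedure is idempotent, the last
// schema named in a session wins, and garbage collection is just dropping
// the schema. In-memory stand-ins replace the real catalog machinery.
public class SetStatsSchemaSketch {
    final Set<String> schemas = new HashSet<>();
    final Map<String, Set<String>> tables = new HashMap<>();
    String currentXplainSchema;   // stands in for the reference kept in the LCC

    void setRuntimeStatistics(String schema) {
        schemas.add(schema);                               // 1) create schema if absent
        Set<String> t = tables.computeIfAbsent(schema, k -> new HashSet<>());
        t.add("XPLAIN_STATEMENTS");                        // 2) create plan-bearing
        t.add("XPLAIN_RESULTSETS");                        //    tables (subset shown)
        currentXplainSchema = schema;                      // 3) remember in the "LCC"
    }

    // Garbage-collecting uninteresting statistics = dropping the schema.
    void dropSchema(String schema) {
        schemas.remove(schema);
        tables.remove(schema);
        if (schema.equals(currentXplainSchema)) currentXplainSchema = null;
    }

    public static void main(String[] args) {
        SetStatsSchemaSketch db = new SetStatsSchemaSketch();
        db.setRuntimeStatistics("MYSTATS");
        db.setRuntimeStatistics("MYSTATS");   // idempotent: no duplicate schema
        System.out.println(db.schemas + " " + db.currentXplainSchema);
    }
}
```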

Bryan Pendleton
added a comment - 07/Mar/09 15:04 Thanks everyone for the feedback. These are all good ideas. I intend to pursue
the notion of writing the data to ordinary user tables, using some blend of
the APIs that Knut and Rick suggested.
I'll be back later with the results of that investigation.

Bryan Pendleton
added a comment - 10/Mar/09 15:00 I'm trying to figure out how to prototype the suggested alternate approaches without
writing a ton of extra code. The system table implementation takes advantage of
various special-purpose APIs in the DataDictionary, such as:
- startWriting()
- getSystemSchemaDescriptor()
- addDescriptorArray()
- getCatalogRowFactory()
and so forth.
For the time being, at least, I'm a bit intimidated by the scope of trying to manage
update access to arbitrary user tables in arbitrary user schemas. How would I know
whether the tables are hidden behind synonyms or views? How would I figure out
if the tables had indexes or constraints on them? How could I make precise the
notion of checking if the table schema doesn't quite match – would we require a
perfect match of table names, column names, data types, etc? Or could the table
definitions be 'close' and then things like data type conversion would be expected to
make things work?
One idea would be to essentially construct the relevant INSERT statements at the
point when SYSCS_UTIL.SYSCS_SET_XPLAIN_STYLE() is called. Then, I'd
somehow run those INSERT statements through the SQL compiler, and save
the execution data structures that the compiler produces, and then later, when
the visitor code wants to capture the query plans to the tables, it would somehow
conceptually 'execute' those saved INSERT statements.
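
The compile-once/execute-later idea can be sketched in plain Java, with a trivial stand-in for the real SQL compiler; the classes and methods here are hypothetical, not Derby APIs:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of "build the INSERT statements when SYSCS_SET_XPLAIN_STYLE()
// is called, run them later from the visitor". compileFor() freezes the
// SQL text once; capture() reuses it per plan node, accumulating rows
// in a list that stands in for the target table.
public class SavedInsertSketch {
    static class SavedStatement {
        final String sql;
        final List<Object[]> rows = new ArrayList<>();   // stand-in for the table
        SavedStatement(String sql) { this.sql = sql; }
        void execute(Object... params) { rows.add(params); }  // "run" the INSERT
    }

    final Map<String, SavedStatement> saved = new HashMap<>();

    // Called once, when XPLAIN capture is enabled.
    void compileFor(String table, int columnCount) {
        StringBuilder sql = new StringBuilder("INSERT INTO " + table + " VALUES (");
        for (int i = 0; i < columnCount; i++) sql.append(i == 0 ? "?" : ", ?");
        sql.append(')');
        saved.put(table, new SavedStatement(sql.toString()));
    }

    // Called later by the visitor, once per captured plan node.
    void capture(String table, Object... values) {
        saved.get(table).execute(values);
    }

    public static void main(String[] args) {
        SavedInsertSketch s = new SavedInsertSketch();
        s.compileFor("MYSTATS.XPLAIN_RESULTSETS", 3);
        s.capture("MYSTATS.XPLAIN_RESULTSETS", 1, "TABLESCAN", "EMPLOYEE");
        System.out.println(s.saved.get("MYSTATS.XPLAIN_RESULTSETS").sql);
    }
}
```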

Rick Hillegas
added a comment - 10/Mar/09 15:37 Hi Bryan,
I would be tempted to not worry much about synonyms and views. It's true, those edge-cases could arise but I think the probability is very low, the negative consequences are slight, and the workaround is easy (turn off plan-gathering until you understand what's broken). I don't think you have to worry about schema mismatch for this first increment--the person who wants to change the shape of these tables can work through those issues later on.

Bryan Pendleton
added a comment - 30/Mar/09 05:01 Attached 'userSchemaPrototyping.diff' is a substantially modified patch. I've been
working on prototyping a new approach, placing the XPLAIN tables into a user schema.
This patch is not ready; it barely compiles, and definitely doesn't work. But I've
done a lot of investigation into the new approach, and wanted to post the patch
to keep the work on the community's radar.
The new code:
- renames the system procedures to SET_XPLAIN_SCHEMA and GET_XPLAIN_SCHEMA.
- SET_XPLAIN_SCHEMA now takes a schema name, and creates that schema if
it doesn't yet exist.
- the system procedure then creates the six XPLAIN tables in that schema.
I still need to do the work so that the captured XPLAIN data is stored into those tables.
Plus there's an enormous amount of error handling, testing, and cleanup that will need to be done.
But I'm pleased with the progress so far, and felt it was worthy of posting.

Rick Hillegas
added a comment - 30/Mar/09 19:41 Thanks for the patch, Bryan. I have a couple of comments:
XPLAINUtil.getStatementType():
In addition to upper-casing the sql text, I think that you should also trim() leading whitespace before you look for DML tokens.
Catalogs:
Maybe I'm mis-reading the patch or maybe I grabbed the wrong patch, but it seems to me that this patch still creates system tables to hold the plan statistics. I do like the intended approach of creating the tables in a user-specified schema.
LanguageConnectionContext.setXplainMode() and getXPlainMode():
I think it would be good to introduce some named constants for these modes and then use them instead of hard-coded numbers.
General approach:
There seem to be three jobs which the statistics collector has to perform:
1) Pick an order in which to walk the graph.
2) Find node-specific details.
3) Decide where and how to persist the details.
This work is divided between two authorities:
A) The swarm of node-specific statistics holders in package org.apache.derby.impl.sql.execute.rts
B) The visitor, XPLAINSystemTableVisitor
(For the record, I'm puzzled by the swarm of statistics holders in (A). That suggests to me an old mis-factoring of the problem--but that's an ancient design decision which falls outside the scope of this improvement.)
(A) is responsible for (1) and (B) is responsible for (3). That division makes sense to me. This structure will make it possible to write visitors which do something fancier, like fork the statistics stream to a monitoring process or feed the forked stream back into the optimizer so that the optimizer can learn from its mistakes.
However, I'm puzzled that (B) is also responsible for (2). As a consequence of this decision, we see a lot of overloads of visit() in (B). Anyone who wants to implement an alternative visitor will have a lot of work to do. I wonder if this means that (2) should be done by (A) and that we should add a recordDetails() method to the XPLAINable interface. Something like this:
public interface XPLAINable
{
    // determines the order in which to walk the graph
    public void accept(...);

    // find the details in this node which need to be recorded
    public void recordDetails(...);
}
Thanks,
-Rick
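
A hedged sketch of the trim-then-match logic being suggested for XPLAINUtil.getStatementType(); the single-letter return codes ("S", "I", "U", "D", and "" for everything else) are illustrative, not necessarily Derby's actual values:

```java
// Illustrative classifier: trim whitespace first, then upper-case, then
// look for the leading DML token. Without the trim(), statements with
// leading whitespace or newlines would all fall through to "".
public class StatementTypeSketch {
    static String getStatementType(String sql) {
        if (sql == null) return "";
        String s = sql.trim().toUpperCase();   // trim before token matching
        if (s.startsWith("SELECT")) return "S";
        if (s.startsWith("INSERT")) return "I";
        if (s.startsWith("UPDATE")) return "U";
        if (s.startsWith("DELETE")) return "D";
        return "";
    }

    public static void main(String[] args) {
        System.out.println(getStatementType("   select * from t"));
    }
}
```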

Bryan Pendleton
added a comment - 31/Mar/09 00:08 Thanks Rick for the review and comments – this is quite helpful!
I'm definitely in the midst of converting the code from the prior system-table approach
to the new user-table approach, so I'm not surprised you found that part of the patch
baffling. I'll continue to work on this, and will also give some thought to the other suggestions
you made, and hopefully I'll be back before too long with a more complete and working proposal.

Bryan Pendleton
added a comment - 07/Apr/09 03:37 I made some good progress on 'userSchemaPrototyping.diff' and wanted to update it again.
It now passes many of the tests, which is very cool!
I am having a bit of a problem with jdbc:default:connection; I'll post
a message to derby-dev trying to explain the problem and ask for help.

Bryan Pendleton
added a comment - 08/Apr/09 16:41 Rick, I'm getting ready to have a go at the re-factoring of the visitor pattern as you suggested.
I think this is the last major re-working of the patch that should be needed, after which we
should be ready for more detailed review. I hope to have that version of the patch ready
before too long.

Rick Hillegas
added a comment - 08/Apr/09 16:57 Thanks, Bryan. Let me know when you want me to look at the next patch. I'll try to be responsive although I'm a little mired in investigating some of the defects which have come up during 10.5 testing. Thanks.

Bryan Pendleton
added a comment - 11/Apr/09 18:06 Attached patch 'refactorVisitor.diff' represents the results of re-factoring the
interactions between the XPLAINSystemTableVisitor and the various
ResultSetStatistics classes, hopefully along the lines of what Rick was
suggesting in his feedback.
I think the re-factoring was successful; at least, the patch is 40% smaller
than it was a month ago, and I think that the code is cleaner because
the knowledge of the details of the various RS Statistics classes is moved
back into those classes where it feels better.
I believe that this patch reflects the completion of the major revisions suggested
by the reviews in March:
- capture the data into user tables in an indicated schema, rather than system catalogs
- encapsulate the result set statistics knowledge with the result sets
- repackage the table definitions so that they use a simpler representation
There is still substantial work to be done in the area of testing, and the
documentation work will also need to be somewhat revised. Also, the upgrade
processing is not ready yet; I can't really finalize the upgrade work until there is
an explicit 10.5 release whose jars I can check into the repository.
But please if you have a chance have a look at 'refactorVisitor.diff' and let me know
your comments and suggestions.

Bryan Pendleton
added a comment - 13/Apr/09 02:56 Attached slightly updated copy of refactorVisitor.diff:
- the patch now makes a first-order check for the correct 'shape' of the XPLAIN
  tables when the user calls SET_XPLAIN_SCHEMA.
- there's now a test case which verifies the behavior when a table with a
  non-matching shape is present when SET_XPLAIN_SCHEMA is called.
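For reference, usage from an ij session might look roughly like the sketch below. The MYXPLAIN schema, the sample query, and the SYSXPLAIN_STATEMENTS/STMT_TEXT names are assumptions for illustration; see the attached documentation for the exact procedure signatures:

```sql
-- direct captured plan data into a user schema
CALL SYSCS_UTIL.SYSCS_SET_XPLAIN_SCHEMA('MYXPLAIN');
CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);

-- run the statement to be explained
SELECT * FROM T WHERE ID = 1;

-- stop capturing
CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0);
CALL SYSCS_UTIL.SYSCS_SET_XPLAIN_SCHEMA('');

-- browse the captured plan data in the user schema
SELECT STMT_TEXT FROM MYXPLAIN.SYSXPLAIN_STATEMENTS;
```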

Bryan Pendleton
added a comment - 21/Apr/09 23:58 I'm concerned about the patch's proposed modification to iapi.sql.execute.RunTimeStatistics.
The patch proposes to add a reference to impl.sql.execute.rts.ResultSetStatistics.
This seems like an indication of a layering problem; the internal ResultSetStatistics
class should not need to be visible through this high-level interface.
The newly-added method getTopRSS() is only used in one place, in the XPLAINSystemTableVisitor.
I'm going to investigate whether I can change this API so that it doesn't expose the implementation class.
One idea was to replace the current API with a new one, something like:

    public void acceptFromTopRSS(XPLAINVisitor v)

which would be implemented in RunTimeStatisticsImpl.java roughly as:

    if (topResultSetStatistics != null)
        topResultSetStatistics.accept(v);
Then the external iapi.sql.execute.RunTimeStatistics interface would only need to be aware of
the iapi.sql.execute.xplain.XPLAINVisitor interface, which keeps the layering cleaner, I think.
I'll give this a try and see if it works.
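A toy model of the proposed layering, with every type reduced to an invented stand-in; only the acceptFromTopRSS() idea and the null-guarded delegation come from the comment above. The point is that the public interface depends only on the visitor type, and the statistics implementation class never crosses the interface boundary:

```java
// Stand-in for the visitor interface the public API layer may depend on.
interface PlanVisitor {
    void visit(String nodeDetails);
}

// Stand-in for iapi.sql.execute.RunTimeStatistics: exposes only the visitor
// entry point, not the internal statistics class.
interface RunTimeStatistics {
    void acceptFromTopRSS(PlanVisitor v);
}

// Stand-in for the internal impl.sql.execute.rts.ResultSetStatistics class.
class ResultSetStatistics {
    private final String details;
    ResultSetStatistics(String details) { this.details = details; }
    void accept(PlanVisitor v) { v.visit(details); }
}

// Stand-in for RunTimeStatisticsImpl: delegates to the top statistics node,
// guarding against a missing plan, as in the proposed implementation.
class RunTimeStatisticsImpl implements RunTimeStatistics {
    private final ResultSetStatistics topResultSetStatistics;
    RunTimeStatisticsImpl(ResultSetStatistics top) { topResultSetStatistics = top; }

    public void acceptFromTopRSS(PlanVisitor v) {
        if (topResultSetStatistics != null)
            topResultSetStatistics.accept(v);
    }
}

public class LayeringSketch {
    // A caller sees only RunTimeStatistics and PlanVisitor, never the impl class.
    public static String capture(RunTimeStatistics stats) {
        StringBuilder sb = new StringBuilder();
        stats.acceptFromTopRSS(sb::append);
        return sb.toString();
    }

    public static void main(String[] args) {
        RunTimeStatistics stats =
                new RunTimeStatisticsImpl(new ResultSetStatistics("TABLESCAN"));
        System.out.println(capture(stats));  // TABLESCAN
    }
}
```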

Bryan Pendleton
added a comment - 23/Apr/09 17:03 Attached 'finalReview.diff' is the patch I intend to commit over the weekend.
I've been through it and fixed a number of tab/space issues, a number of
javadoc issues, and tried to improve comments in a few places.
There's still a lot of work remaining, but I feel this is worthy of a commit to the trunk,
where we can gain further experience with the feature and continue to improve it.

Knut Anders Hatlen
added a comment - 26/Apr/09 08:15 Thanks for all the work on this issue, Bryan. I'm looking forward to testing the new functionality.
I noticed that there were some failures in jdbc4/TestDbMetaData after the check-in:
http://dbtg.thresher.com/derby/test/tinderbox_trunk16/jvm1.6/testing/testlog/SunOS-5.10_i86pc-i386/768597-org.apache.derbyTesting.functionTests.suites.All_diff.txt
The test is only run when you use Java 6, so it's easy to miss. I took the liberty to add the new XPLAIN functions SYSCS_GET_XPLAIN_MODE and SYSCS_GET_XPLAIN_SCHEMA to the list of expected functions in that test (see jdbc4tests.diff) and committed revision 768671.

Bryan Pendleton
added a comment - 15/May/09 02:07 The patches have been committed to the trunk, the various sub-tasks are complete, and the documentation is visible on the web site. I think we can mark this issue as resolved and deal with any subsequent work in separate issues.
Felix, thanks again for the contribution!