Activity

Closing this issue now, since I have put in all the changes I intended. I am sure there are more formatting improvements that could be made, though. If/when we feel the need, we could re-open this issue, or a file new one.

Dag H. Wanvik
added a comment - 31/Aug/09 23:37 Closing this issue now, since I have put in all the changes I intended. I am sure there are more formatting improvements that could be made, though. If/when we feel the need, we could re-open this issue, or a file new one.

Uploading aother patch for this issue, which re-introduces the changes that had to be rolled back.
derby-4087-statistics fixes the formatting errors seen by Knut and Bryan. Since I could not
use java.util.regexp, I hand-coded three patterns in StringUtil.java this time around.
The reason for all this text massaging is that the qualifiers are pre-formatted before the output indentation level is knows, which lead to the weird formatting seen. The new method
StringUtil.ensureIndent fixes that problem.

I had to modify many old harness test masters, but now they all look nicer at least

Dag H. Wanvik
added a comment - 30/Aug/09 00:54 - edited Uploading aother patch for this issue, which re-introduces the changes that had to be rolled back.
derby-4087-statistics fixes the formatting errors seen by Knut and Bryan. Since I could not
use java.util.regexp, I hand-coded three patterns in StringUtil.java this time around.
The reason for all this text massaging is that the qualifiers are pre-formatted before the output indentation level is knows, which lead to the weird formatting seen. The new method
StringUtil.ensureIndent fixes that problem.
I had to modify many old harness test masters, but now they all look nicer at least
Regressions passed ok.

Knut Anders Hatlen
added a comment - 28/Aug/09 11:59 This broke the Tinderbox build, as java.util.regex cannot be used in engine code (the package is not part of Foundation Profile 1.1).
http://dbtg.foundry.sun.com/derby/test/tinderbox_trunk16/UpdateInfo/808657-buildDetails.txt

Dag H. Wanvik
added a comment - 27/Aug/09 21:16 Temporarily rolled back changes to the result set statistics printing due to broken masters in old harness lang suite:
java/engine/org/apache/derby/impl/sql/execute/rts/RealDistinctScanStatistics.java
java/engine/org/apache/derby/impl/sql/execute/rts/RealHashScanStatistics.java
java/engine/org/apache/derby/impl/sql/execute/rts/RealTableScanStatistics.java
svn revision 808601. Will make a new patch with modified masters to match the printing improvements.

I am not sure this is has much value. The brief mode is enabled by
-Dderby.debug.true=DumpBrief. I'll take it out if people think this is
unnecessary cruft. I find it useful to get an overview sometimes. It
does show shared (aliased) nodes as in the normal mode.

Dag H. Wanvik
added a comment - 21/Aug/09 00:42 Thanks for looking at this guys.
Uploading a version "b" which:
fixes the wrong formatted Knut and Brian saw in the run-time
statistics. This touches:
RealDistinctScanStatistics.java,
RealHashScanStatistics.java,
RealTableScanStatistics.java
StringUtil
adds the comments Knut requested. And no, no special reason to use AbstractMap, replaced it with Map.
adds some more formatting to OrderByList, OrderByColumn and
IndexColumnOrder.
I forgot to mention that this patch also adds a "brief structure mode
dump" mode, which only prints the tree structure, e.g. thus:
org.apache.derby.impl.sql.compile.VirtualColumnNode@4fdf11
sourceColumn:
org.apache.derby.impl.sql.compile.ResultColumn@1be0369
expression:
org.apache.derby.impl.sql.compile.VirtualColumnNode@1d1fc02
sourceColumn:
org.apache.derby.impl.sql.compile.ResultColumn@f800db
expression:
org.apache.derby.impl.sql.compile.UnaryArithmeticOperatorNode@5364
operand:
org.apache.derby.impl.sql.compile.UnaryArithmeticOperatorNode@162f030
operand:
org.apache.derby.impl.sql.compile.ColumnReference@870e2f
source:
org.apache.derby.impl.sql.compile.ResultColumn@b44eb4
expression:
org.apache.derby.impl.sql.compile.VirtualColumnNode@19b2faf
sourceColumn:
org.apache.derby.impl.sql.compile.ResultColumn@157bf4c
expression:
:
I am not sure this is has much value. The brief mode is enabled by
-Dderby.debug.true=DumpBrief. I'll take it out if people think this is
unnecessary cruft. I find it useful to get an overview sometimes. It
does show shared (aliased) nodes as in the normal mode.

The strange newline handling in the formatting of the stop and start position values is definitely
present in other places, such as the RUNTIMESTATISTICS displays. I believe it's part of some
low-level shared code which constructs string versions of the underlying qualifier objects.

I think it would be nice to fix, but it's definitely present prior to this patch.

Bryan Pendleton
added a comment - 18/Aug/09 15:06 The strange newline handling in the formatting of the stop and start position values is definitely
present in other places, such as the RUNTIMESTATISTICS displays. I believe it's part of some
low-level shared code which constructs string versions of the underlying qualifier objects.
I think it would be nice to fix, but it's definitely present prior to this patch.

The sample log file looks a lot more readable than what I'm used to, so a +1 from me based on visual inspection of the output!

I looked at the patch briefly, and nothing caught my attention. But perhaps a short comment to explain printedObjectsMap and getPrintedObjectsMap() would be in order. (And was there a reason for using AbstractMap in the signature instead of Map?)

Looking at the sample log, I see that the output for start/stop position and qualifiers is a bit odd:

start position:
null

stop position:
null

qualifiers:
None

I don't think this was introduced by your patch, though, as I think I have seen similar oddities before.

Knut Anders Hatlen
added a comment - 18/Aug/09 13:58 The sample log file looks a lot more readable than what I'm used to, so a +1 from me based on visual inspection of the output!
I looked at the patch briefly, and nothing caught my attention. But perhaps a short comment to explain printedObjectsMap and getPrintedObjectsMap() would be in order. (And was there a reason for using AbstractMap in the signature instead of Map?)
Looking at the sample log, I see that the output for start/stop position and qualifiers is a bit odd:
start position:
null
stop position:
null
qualifiers:
None
I don't think this was introduced by your patch, though, as I think I have seen similar oddities before.

Attaching a bash script and a sample output which I find useful sometimes for looking at the query trees. It finds the dumped query tree(s) from a given query on derby.log and prints it on std out., and also adds helper vertical lines to help match up tree levels. If an existing database is not given, it will create a sample one on the fly with one table.

Dag H. Wanvik
added a comment - 11/Aug/09 17:31 Attaching a bash script and a sample output which I find useful sometimes for looking at the query trees. It finds the dumped query tree(s) from a given query on derby.log and prints it on std out., and also adds helper vertical lines to help match up tree levels. If an existing database is not given, it will create a sample one on the fly with one table.
Example:
dumpquery -pboie 'select * from t order by j' > sample.log

Uploading a patch that addresses these issues, derby-4087-a. I have tried to ensure correctness by
making lots of manual queries and inspecting the dumped trees for readability.
To me the trees look more consistent and readable with the patch, but there are many changes so its quite possible
I made some errors in the process. If people think it's OK, further tweaks can be done later now that we have a
more consistent use of the patterns.
Since this all concerns code used used for debugging, there are presently no tests.
The regressions ran OK at least (hardly surprising)

Dag H. Wanvik
added a comment - 08/Aug/09 21:27 Uploading a patch that addresses these issues, derby-4087-a. I have tried to ensure correctness by
making lots of manual queries and inspecting the dumped trees for readability.
To me the trees look more consistent and readable with the patch, but there are many changes so its quite possible
I made some errors in the process. If people think it's OK, further tweaks can be done later now that we have a
more consistent use of the patterns.
Since this all concerns code used used for debugging, there are presently no tests.
The regressions ran OK at least (hardly surprising)

I have found some further issues regarding toString, treePrint and
printSubNodes. The comments only apply to subclasses of
QueryTreeNode.

Some class variables' value are printed with a leading "<label>: "
in front of the value, others do not have a label, the value is
printed alone. Usually, the label is the same as the class
variable's name. Shouldn't this always be the case for uniformity?

Some class variable labels are printed even if the values is not,
in cases where the variables value is null, e.g. "preJoinFL: " in
SelectNode. I suggest not printing in such cases to save space.

Some toString methods call super.toString before formatting the
class's own member variables, others do it the other way around. Is
there a rationale for this I wonder? Or is it just accidental? It
seems the established pattern is that super be called after, except
for DDL operations, which always call super first, so I suggest
adjusting toString to those two patterns throughout.

Some usages of toString terminate with a newline, most do not. I
think they should either end with a newline (DDL) or newline + call
to super.toString (DML). At the root (qua anchor),
QueryTreeNode.toString just prints "", so effectively for DML, the
last thing printed will be the newline also.

I think all nodes should call super.toString. This could be because
the author "knew" they are right under QueryTreeNode, and so has
chosen to optimize the call away (QueryTreeNode.toString prints the
empty string), or it could be an omission. I think it would be good
to be consistent here, too.

Example of nodes that do not call super.toString at all:
GenerationClauseNode
OrderByColumn
TableName

TableName is possibly an allowable exception, I think, because its
toString will return a string even in insane builds, so I not sure
it can tolerate a terminating newline, so I intend to not touch
this.

Some nodes call toString of underlying nodes in their own toString,
but such subnodes should, according to the contract defined in
QueryTreeNode, be handled by PrintSubNodes/TreePrint:

Vectors are handled specially. Vectors with tree elements are
subclasses of QueryTreeNodeVector, which contains a generic toString
method which calls toString of the elements of the vector and
concatenates them. I suggest changing this to be printSubNodes and
use series of calls to treePrint instead to comply with the general
pattern. Subclasses which have no additional members need not
override toString either.

Some subclasses of QueryTreeNodeVector needlessly define their own
printSubNodes method. I propose to remove those:

ValueNodeList
SubqueryList
PredicateList
FromList

this now inherit QueryTreeNodeVector's printSubNodes.

One subclasses of QueryTreeNodeVector (TableElementList) rolls its
own toString to print the elements. I suggest to remove this an
dinstead rely on the general mechanism outlined above.

Some classes neglect to print own member variables:
ResultColumnList

Some subclasses of QueryTreeNodeVector define their own treePrint
method, e.g.

ResultColumnList

This seems to be a remnant from a time when ResultColumnList was not
a subclass of QueryTreeNode, c.f this comment in

Dag H. Wanvik
added a comment - 07/Aug/09 22:56 - edited I have found some further issues regarding toString, treePrint and
printSubNodes. The comments only apply to subclasses of
QueryTreeNode.
Some class variables' value are printed with a leading "<label>: "
in front of the value, others do not have a label, the value is
printed alone. Usually, the label is the same as the class
variable's name. Shouldn't this always be the case for uniformity?
Some class variable labels are printed even if the values is not,
in cases where the variables value is null, e.g. "preJoinFL: " in
SelectNode. I suggest not printing in such cases to save space.
Some toString methods call super.toString before formatting the
class's own member variables, others do it the other way around. Is
there a rationale for this I wonder? Or is it just accidental? It
seems the established pattern is that super be called after , except
for DDL operations, which always call super first , so I suggest
adjusting toString to those two patterns throughout.
Non-DDL classes that call super.toString first:
CoalesceFunctionNode
CurrentDatetimeOperatorNode
SpecialFunctionNode
ExtractOperatorNode
Some usages of toString terminate with a newline, most do not. I
think they should either end with a newline (DDL) or newline + call
to super.toString (DML). At the root (qua anchor),
QueryTreeNode.toString just prints "", so effectively for DML, the
last thing printed will be the newline also.
I think all nodes should call super.toString. This could be because
the author "knew" they are right under QueryTreeNode, and so has
chosen to optimize the call away (QueryTreeNode.toString prints the
empty string), or it could be an omission. I think it would be good
to be consistent here, too.
Example of nodes that do not call super.toString at all:
GenerationClauseNode
OrderByColumn
TableName
TableName is possibly an allowable exception, I think, because its
toString will return a string even in insane builds, so I not sure
it can tolerate a terminating newline, so I intend to not touch
this.
Some nodes call toString of underlying nodes in their own toString,
but such subnodes should, according to the contract defined in
QueryTreeNode, be handled by PrintSubNodes/TreePrint:
DistinctNode (formats childResult)
GroupByNode (formats childResult)
OrderByNode (formats childResult)
OrderByColumn (formats expression)
I suggest bringing such instances into line.
Vectors are handled specially. Vectors with tree elements are
subclasses of QueryTreeNodeVector, which contains a generic toString
method which calls toString of the elements of the vector and
concatenates them. I suggest changing this to be printSubNodes and
use series of calls to treePrint instead to comply with the general
pattern. Subclasses which have no additional members need not
override toString either.
Some subclasses of QueryTreeNodeVector needlessly define their own
printSubNodes method. I propose to remove those:
ValueNodeList
SubqueryList
PredicateList
FromList
this now inherit QueryTreeNodeVector's printSubNodes.
One subclasses of QueryTreeNodeVector (TableElementList) rolls its
own toString to print the elements. I suggest to remove this an
dinstead rely on the general mechanism outlined above.
Some classes neglect to print own member variables:
ResultColumnList
Some subclasses of QueryTreeNodeVector define their own treePrint
method, e.g.
ResultColumnList
This seems to be a remnant from a time when ResultColumnList was not
a subclass of QueryTreeNode, c.f this comment in
/**
This class needs a treePrint method, even though it is not a
descendant of QueryTreeNode, because its members contain tree..
so I intend to remove this.

I think truncating already seen subtrees is a good idea as it (a) reduces the size of the printed tree and (b) makes it easier to see that two trees are actually the same one. Doing it always is fine with me. As you mentioned, the current output is often inconsistent, and I find it unlikely that anyone rely on the exact output, so I don't see a need for a knob. It is debug output, after all. If someone does need a stable, parsable output, DERBY-791 sounds like a better fit.

Knut Anders Hatlen
added a comment - 10/Mar/09 14:40 I think truncating already seen subtrees is a good idea as it (a) reduces the size of the printed tree and (b) makes it easier to see that two trees are actually the same one. Doing it always is fine with me. As you mentioned, the current output is often inconsistent, and I find it unlikely that anyone rely on the exact output, so I don't see a need for a knob. It is debug output, after all. If someone does need a stable, parsable output, DERBY-791 sounds like a better fit.

Dag H. Wanvik
added a comment - 09/Mar/09 18:12 I could imagine printing already seen subtrees something like this:
resultColumns:
org.apache.derby.impl.sql.compile.ResultColumnList@1e1be92
org.apache.derby.impl.sql.compile.ResultColumn@1038de7
** truncated **
org.apache.derby.impl.sql.compile.ResultColumn@1490eb5
** truncated **
Is this helpful? Searching backward in the printout would then reveal these subtrees fully printed.
Do we want a new knob to enable this, or could it be done always?