This class implements the TreeReader interface to read Penn Treebank-style
files. The reader is implemented as a push-down automaton (PDA) that parses the
Lisp-style (parenthesized) format in which the trees are stored. It is compatible
with both PTB (Penn Treebank) and PATB (Penn Arabic Treebank) trees.
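To make the push-down automaton idea concrete, here is a minimal sketch in Java of a stack-based parser for the parenthesized format. It is not the PennTreeReader implementation: `SimpleTree` and `BracketParser` are hypothetical names, and the sketch assumes labels and leaves are whitespace-free atoms. An open parenthesis pushes a new node onto the stack, an atom becomes a leaf of the node on top, and a close parenthesis pops the finished node and attaches it to its parent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical minimal tree node (not CoreNLP's Tree class).
final class SimpleTree {
    final String label;
    final List<SimpleTree> children = new ArrayList<>();

    SimpleTree(String label) { this.label = label; }

    @Override
    public String toString() {
        if (children.isEmpty()) return label;          // a leaf prints as its word
        StringBuilder sb = new StringBuilder("(").append(label);
        for (SimpleTree c : children) sb.append(' ').append(c);
        return sb.append(')').toString();
    }
}

// Hypothetical stack-based (push-down) parser for bracketed trees.
final class BracketParser {
    // Split the input into "(", ")" and atom tokens.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder atom = new StringBuilder();
        for (char ch : s.toCharArray()) {
            if (ch == '(' || ch == ')' || Character.isWhitespace(ch)) {
                if (atom.length() > 0) { tokens.add(atom.toString()); atom.setLength(0); }
                if (!Character.isWhitespace(ch)) tokens.add(String.valueOf(ch));
            } else {
                atom.append(ch);
            }
        }
        if (atom.length() > 0) tokens.add(atom.toString());
        return tokens;
    }

    // Parses and returns the first complete tree; any later tokens are ignored.
    static SimpleTree parse(String s) {
        Deque<SimpleTree> stack = new ArrayDeque<>();
        List<String> tokens = tokenize(s);
        for (int i = 0; i < tokens.size(); i++) {
            String tok = tokens.get(i);
            if (tok.equals("(")) {
                // The label follows "(" unless a bracket comes first (unnamed node, label "").
                String label = "";
                String next = i + 1 < tokens.size() ? tokens.get(i + 1) : "(";
                if (!next.equals("(") && !next.equals(")")) { label = next; i++; }
                stack.push(new SimpleTree(label));
            } else if (tok.equals(")")) {
                if (stack.isEmpty()) throw new IllegalArgumentException("unbalanced ')'");
                SimpleTree done = stack.pop();
                if (stack.isEmpty()) return done;              // outermost node closed
                stack.peek().children.add(done);               // attach to parent
            } else {
                if (stack.isEmpty()) throw new IllegalArgumentException("leaf outside brackets");
                stack.peek().children.add(new SimpleTree(tok)); // leaf word
            }
        }
        throw new IllegalArgumentException("input ended before the tree was complete");
    }
}
```

For example, `BracketParser.parse("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")` rebuilds the tree, and an input wrapped in extra parentheses yields a root whose label is the empty string.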
One small detail to note is that the PennTreeReader silently replaces \* with *
and \/ with /. There were two possible designs: have the PennTreeReader always do
this, or leave it to the TreeNormalizers. We decided to put it in the
PennTreeReader class itself so that people writing new TreeNormalizers cannot
forget to include the unescaping.
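The unescaping rule amounts to two literal string replacements applied to each token. The sketch below assumes the reader sees one token at a time; `PtbUnescape` and its methods are hypothetical and do not mirror the PennTreeReader internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the \* and \/ unescaping rule.
final class PtbUnescape {
    // Undo the two escapes: "\*" -> "*" and "\/" -> "/".
    // String.replace(CharSequence, CharSequence) is a literal, non-regex replacement.
    static String unescape(String token) {
        return token.replace("\\*", "*").replace("\\/", "/");
    }

    // Apply the rule to every token as it is read, so that no later
    // normalization step has to remember to do it.
    static List<String> unescapeAll(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String t : tokens) out.add(unescape(t));
        return out;
    }
}
```

Doing this at read time is the design choice described above: every consumer of the tokens sees `1/2` rather than `1\/2`, regardless of which normalizer runs afterwards.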

Method Detail

readTree

Reads a single tree in standard Penn Treebank format from the
input stream. The method accepts additional parentheses around the
tree (an unnamed ROOT node) as long as they are balanced. If the token stream
ends before the current tree is complete, the method throws an
IOException.
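The end-of-input behavior can be illustrated with a hedged sketch that reads one tree at a time from a `java.io.Reader` by tracking bracket depth. Because only depth is tracked, balanced extra outer parentheses (an unnamed ROOT) are accepted automatically, and reaching end of input at a nonzero depth raises an `IOException`. `TreeChunkReader` is a hypothetical name, and returning `null` when the input runs out between trees is an assumption of this sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reads the tokens of one bracketed tree at a time.
final class TreeChunkReader {
    private final Reader in;
    private int pushback = -2; // -2 means "no pushed-back character"

    TreeChunkReader(Reader in) { this.in = in; }

    private int read() throws IOException {
        if (pushback != -2) { int c = pushback; pushback = -2; return c; }
        return in.read();
    }

    // Returns the next token ("(", ")" or an atom), or null at end of input.
    private String nextToken() throws IOException {
        int c = read();
        while (c != -1 && Character.isWhitespace(c)) c = read();
        if (c == -1) return null;
        if (c == '(' || c == ')') return String.valueOf((char) c);
        StringBuilder sb = new StringBuilder();
        while (c != -1 && c != '(' && c != ')' && !Character.isWhitespace(c)) {
            sb.append((char) c);
            c = read();
        }
        pushback = c; // keep the delimiter (or end of input) for the next call
        return sb.toString();
    }

    // Returns the tokens of the next complete tree, or null if the input ends
    // between trees. Throws IOException if the input ends inside a tree.
    List<String> readTreeTokens() throws IOException {
        String tok = nextToken();
        if (tok == null) return null;
        if (!tok.equals("(")) throw new IOException("expected '(' but found " + tok);
        List<String> tokens = new ArrayList<>();
        tokens.add(tok);
        int depth = 1;
        while (depth > 0) {
            tok = nextToken();
            if (tok == null) throw new IOException("input ended inside a tree at depth " + depth);
            if (tok.equals("(")) depth++;
            else if (tok.equals(")")) depth--;
            tokens.add(tok);
        }
        return tokens;
    }
}
```

Calling `readTreeTokens()` repeatedly on `((S (NN a))) (S (NN b))` yields the wrapped tree, then the plain one, then `null`; on `(S (NN c)` it throws because the outer bracket never closes.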

Note that the method will skip malformed trees and attempt to
read additional trees from the input stream. It is possible, however,
that a malformed tree will corrupt the token stream. In this case,
an IOException will eventually be thrown.
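The recovery policy can be sketched the same way. In this hypothetical `RecoveringTreeReader`, a tree that closes cleanly but contains an empty node `()` counts as malformed and is skipped; what counts as malformed in the real reader is not specified here, so that rule is an assumption. A missing close parenthesis is different: the sketch cannot tell where the broken tree ends, absorbs the following tree, and eventually hits end of input mid-tree, which raises an `IOException`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a skip-malformed-trees policy (not CoreNLP's code).
final class RecoveringTreeReader {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    RecoveringTreeReader(String text) {
        // Pad brackets with spaces, then split on whitespace.
        for (String t : text.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
    }

    // Returns the next well-formed tree as space-separated tokens, skipping
    // malformed ones. Returns null at end of input; throws IOException if the
    // input ends inside a tree.
    String readTree() throws IOException {
        while (pos < tokens.size()) {
            String tok = tokens.get(pos++);
            if (!tok.equals("(")) continue; // stray token between trees: skip it
            int start = pos - 1;
            int depth = 1;
            boolean malformed = false;
            while (depth > 0) {
                if (pos >= tokens.size()) throw new IOException("input ended inside a tree");
                String t = tokens.get(pos++);
                if (t.equals("(")) {
                    depth++;
                } else if (t.equals(")")) {
                    // An empty node "()" is this sketch's (assumed) malformedness test.
                    if (tokens.get(pos - 2).equals("(")) malformed = true;
                    depth--;
                }
            }
            if (malformed) continue; // skip this tree and try the next one
            return String.join(" ", tokens.subList(start, pos));
        }
        return null;
    }
}
```

On `(S (NN a)) (X ()) (S (NN b))` the sketch returns the first and third trees and skips the second; on `(S (NN a) (S (NN b))` the missing bracket makes the second tree part of the first, and the call ends in an `IOException`.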