Philip A. Bralich wrote:
> Thus, a lack of criticism should be interpreted as acceptance of
> these arguments.
Here it is:
1) Bralich's first mail is 80% an advertisement for his company's
products. The follow-up mails were 90%-100% advertisements.
2) Bralich's view is Anglocentric.
3) Bralich's view of syntax rests on historically grown traditions
that are not necessarily right.
4) Bralich lists the properties of his system and tries to define
those as the standard.
5) Bralich does not contribute to the field.
1) and 4) are obvious. I will try to explain 2), 3) and 5) a little
further.
> Of course, any theory of syntax, whatever its assumptions and
> methods, should be able to translate its structures into the Penn
> Treebank style if their work is thorough and complete. The ability
> to generate these labeled brackets and trees in itself constitutes a
> good test of a theory's maturity.
This statement proves the Anglocentric viewpoint of the author. What
about Russian or Chinese? What about theories that describe fragments
of languages other than English? Are the structures they produce
supposed to be mapped to the Penn Treebank? And who dares to say that
trees are the appropriate structure for language?
> THE STANDARDS: In addition to using the Penn Treebank II guidelines
> for the generation of trees and labeled brackets and a dictionary
> that is at least 35,000 words in size and works in real time and
> handles sentences up to 15 to 20 words in length, we suggest that
> NLP parsers should also meet standards in the following seven areas
> before being considered "complete." The seven areas are: 1) the
> structural analysis of strings, 2) the evaluation of acceptable
> strings, 3) the manipulation of strings, 4) question/answer,
> statement/response repartee, 5) command and control, 6) the
> recognition of the essential identity of ambiguous structures, and
> 7) lexicography.
The construction and maintenance of a lexicon is a costly process. So
does Bralich claim that a syntactic theory whose proponents cannot
afford to build a system with at least 35,000 lexical entries is a bad
theory? This question was raised by John Phillips (9.276), and Bralich
answered that one can buy a lexicon from the Linguistic Data
Consortium. Again, what about languages other than English?
> It is important to recognize that EAGLES and the MUC conferences,
> groups that are charged with the responsibility of developing
> standards for NLP do not mention any of the following criteria and
> instead limit themselves to largely general characteristics of user
> acceptance or vague categories such as "rejects ungrammatical input"
> rather than specific proposals detailed in terms of syntactic and
> grammatical structures and functions that are to be rejected or
> accepted.
> There is almost no reference to specific grammatical structures, the
> Penn Treebank II guidelines, or references to current working
> parsers as models (http://www.ilc.pi.cnr.it/EAGLES/home.html).
Why should there be? The purpose of a standard is to fix certain
things that a group of people agrees on. Has the right structure to
assign to sentences been agreed upon yet? So it is rather the other
way round: a standardization of grammatical structures at the present
time would be a big mistake.
And what would the Treebank predict about Chinese sentences?
5) The main problem that I had with Bralich's posting is that it does
not contribute anything to the field of linguistics or computational
linguistics.
His company has implemented a program that can perform certain tasks.
This is a good and an interesting thing. But what does this tell the
scientific community?
In the last century there was a machine that could play chess. It
turned out that a little dwarf was sitting inside the machine, making
reasonable moves.
Bralich can argue that there is no dwarf in his laptop. Okay, there is
another example. In this century a machine was built that really could
play chess. It even defeated the champion of the world. But what does
this tell us about the nature of chess? What did the moves the machine
made tell us about the algorithms used? Nothing. And one problem of
the competition has been that it was not possible to study the
behavior of the machine. The machine is top secret.
So what is the point of somebody turning up at a conference with a
black box saying this program can handle phenomenon XY? How do we know
that this program handles phenomenon AB as well? Shall we sit there
and try all sentences with Bralich's program to guess what the theory
behind this program is?
Following Bralich's logic there is no underlying linguistic theory,
because if there were, documentation of it could be found on their web
pages.
Unless he reveals his assumptions about linguistic theory, his mails
have to be regarded as pure advertisements for his products and should
be banned from the Linguist Mailing List.
Stefan Mueller
-
Language Technology Lab
DFKI GmbH Tel.: (+49 - 681) 302 - 5295
Stuhlsatzenhausweg 3 Fax: (+49 - 681) 302 - 5338
D-66123 Saarbruecken http://www.dfki.de/~stefan/
http://www.dfki.de/~stefan/Babel/Interaktiv/Babajava/

At 09:30 PM 2/25/98 +0000, Phil Bralich wrote:
>You are obviously not reading the standards. Take a look at them
>closely (appended below for convenience). They are very simple and
>you will note that most people have assumed that that much at least
>had already been accomplished by those theories and yet, in spite of
>the fact that there is nothing that prevents their formalisms from
>being programmed they cannot produce parsers that meet even those
>minimal standards.
[much snipped]
List of standards:
>1. At a minimum, from the point of view of the STRUCTURAL ANALYSIS
>OF STRINGS, the parser should: 1) identify parts of speech, 2)
>identify parts of sentence, 3) identify internal clauses (what they
>are and what their role in the sentence is as well as the parts of
>speech, parts of sentence and so on of these internal clauses), 4)
>identify sentence type (without using punctuation), 5) identify tense
>and voice in main and internal clauses, and 6) do 1-5 for internal
>clauses.
I strongly suspect (and in this I agree with Steven Spackman, in the
same post) that machine implementation of even such an elementary
analysis as _4) sentence type_ is beyond the ability of any parser
that lacks a knowledge of human culture sufficient to support complex
pragmatic analyses. I base this on examples like the following (which
I owe to a brilliant talk by the late Dwight Bolinger at a CLS some
years ago):
1) \I'm sure. (single falling tone)
2) \You're sure. (single falling tone)
As Bolinger pointed out, 1) is a statement, while 2) is a question.
His point was that intonation is only loosely coupled to syntax, and
that sentence type is equally so. Only some entity with the knowledge
of pragmatics that human beings have could keep track of this stuff.
It may well be that a parsing program could recognize the syntactic
import of utterances at the level at which people interact with
computers (`Open pod bay six, HAL'), but such utterances are a small
subset of overall language, and therefore it is unlikely that being
able to implement a particular parser will be a test of _real_
language parsing.
Geoff Nathan
Geoffrey S. Nathan
Department of Linguistics
Southern Illinois University at Carbondale,
Carbondale, IL, 62901 USA
Phone: +618 453-3421 (Office) FAX +618 453-6527
+618 549-0106 (Home)

There have been several more reactions to my post concerning whether
or not the ability of a theory of syntax to be implemented in a
programming language constitutes a fair, accurate, independent and
objective test of a theory's scope and efficiency. In order to save
bandwidth I will respond one last time to this thread and try to
cover the widest possible range of criticisms.
I am sorry to be the one to have to bring to you news of a serious
problem in your field, but the fact remains that the theories that you
have grown to know and love over the last 30+ years have a dirty
little secret: They cannot be programmed to save their lives. This
thread has taken on more of a life than I expected, so if all are
agreed I will make this the last post for this particular thread
(though not this subject I am sure). Please do not see this as an
opportunity to let your venom fly as I will respond to posts that I
feel must be responded to. I think it is easiest to frame this in
terms of arguments that are "out there" and my responses to them.
The garden path arguments and my responses:
1. The standards I have proposed have already been met. (They have
not). Not by a long shot. Just print out the standards, put a copy
of Ergo software in your pocket and then go and compare them with any
parsing system anywhere.
2. The standards I propose are idiosyncratic to Ergo's theory or they
are somehow unfair. Look at them yourself and ask whether you and most
of the field haven't believed they are commonplace expectations for
any theory or any parser.
3. Current problems with NLP have to do with working with the last
10%. That is, the pretense is they can already handle 90% of what
needs to be done but more is required. This is dead wrong. Parsers
outside of Ergo hardly begin to touch the standards we have proposed:
few of them doing anything more than part of speech analysis. If you
look at the output on speech rec systems you will see their NLP
abilities are well under 1% of the task (handling only a few hundred
commands). Ergo can improve that by another 60-80% increasing the
number of possible commands to many thousands, making the first spoken
language operating systems possible.
3. Parsing is not a good test of a theory even though there has never
been a theoretical mechanism proposed that in principle could not be
programmed. Note that other NLP researchers are not anxious to argue
that their theories are better BECAUSE they cannot be programmed.
That would end virtually any hope of funding that may exist for them
in the NLP arena. Thus, I believe it is safe to say that all other
syntactic theoreticians agree wholeheartedly that programming is a
good test of a theory. I have yet to see one theoretical syntactician
dispute this claim, though it does seem that there are those in the
field who believe parsing is not a good test. (Statisticians
probably--the last thing they would want is for a theory of syntax to
do better than their number crunching). Perhaps syntacticians with
other theories would like to take up the debate. Would a theory of
math that could not be programmed to make calculators then be a better
theory of math because it was using less mundane criteria than formal
consistency?
4. Statistics alone is sufficient to analyze the facts of human
language:
Wrong: statistics will never provide sufficient information about the
internal structure of strings to manipulate structures or to do
question/answer, statement/response repartee. (Aside: Does a vote for
Ergo equal a vote against statistics? Perhaps.)
5. People will not accept NLP until disfluencies and other gaps are
handled.
This is more than a little bizarre. By this logic speech recognition
should have sold nothing to date and even current products should be
stamped as not fit for human consumption. Believe me, when you can
type or speak the following to your search engine, people will forget
about the disfluencies and gaps.
Who was the eighth President of the United States?
Hey Mickey, what time is it?
6. Parsers are too cumbersome to be made readily available to the
general public. Again not true: Ours is a standard Windows 95 program
that fits on one disk (including the 75,000 word dictionary) and will
run on any 486 or better PC. If it is NOT superior to the others they
should be able to do the same.
7. There is something inherently wrong with the Penn Treebank
standard.
Doesn't matter: it is a true demonstration of a parser's ability to do
part of speech tagging as well as to do a thorough analysis of
internal structure. If this is done it shouldn't take more than a few
weeks for the programmers to convert their parser's output into the
Penn Treebank style. That is just not a big programming task.
Besides, the Penn Treebank II guidelines are the standards accepted by
this field. (Of course, we also need equivalent standards for other
languages.)
8. Changing one structure into another or doing q&a makes untenable
theoretical claims about the relationships between structures. Again
not so--if you have properly analyzed the internal structure of
strings you should be able to change a question to a statement and a
statement to a question whether or not you believe this is what goes
on in the brain. The structures are so totally predictable, one from
the other, that this too should only take a programmer a week or so
(if the analysis of internal structure has been done correctly in the
first place).
9. People could respond intelligently to my claims, they are just too
busy with other things or too put off by my arrogance (accuracy?).
Wrong: this is a written record respected in the community and as
available as a library book (just type my name in a Net Search if you
want to find these arguments): not to respond is to acquiesce.
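As an illustration of the statement-to-question conversion discussed in point 8 above, here is a minimal Python sketch. It is not Ergo's method and makes no claim about how any real parser works. The tuple-based tree format, the AUXILIARIES word list, and the function names are all assumptions made up for this example. It covers only the case where the verb phrase begins with an auxiliary; sentences that need do-support ("The police arrested John") are deliberately rejected.

```python
# Hypothetical tree format: (label, child, ...); a leaf is (tag, word).
# Small illustrative auxiliary list, not a complete inventory.
AUXILIARIES = {"is", "are", "was", "were", "has", "have", "had",
               "will", "can", "must"}

def leaves(node):
    """Return the (tag, word) pairs of a tree, left to right."""
    label, *rest = node
    if len(rest) == 1 and isinstance(rest[0], str):
        return [(label, rest[0])]
    return [leaf for child in rest for leaf in leaves(child)]

def to_yes_no_question(tree):
    """Front the auxiliary of an (S, NP, VP) tree to form a yes/no question."""
    label, subject, vp = tree
    if label != "S" or subject[0] != "NP" or vp[0] != "VP":
        raise ValueError("expected a tree of the form (S, NP, VP)")
    vp_leaves = leaves(vp)
    _, aux = vp_leaves[0]
    if aux.lower() not in AUXILIARIES:
        raise ValueError("no leading auxiliary; do-support is not handled")
    words = [aux.capitalize()]
    for i, (tag, word) in enumerate(leaves(subject)):
        # Lowercase the old sentence-initial word unless it is a proper noun or "I".
        keep = tag.startswith("NNP") or word == "I" or i > 0
        words.append(word if keep else word.lower())
    words.extend(word for _, word in vp_leaves[1:])
    return " ".join(words) + "?"

passive = ("S",
           ("NP", ("NNP", "John")),
           ("VP", ("VBD", "was"),
                  ("VP", ("VBN", "arrested"),
                         ("PP", ("IN", "by"),
                                ("NP", ("DT", "the"), ("NN", "police"))))))
print(to_yes_no_question(passive))  # Was John arrested by the police?
```

The sketch shows that the rearrangement itself is mechanical once a correct analysis exists; it says nothing about how hard it is to obtain that analysis in the first place.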
There is still a serious problem underlying the lack of response from
people who know this field. For syntacticians, if they say that
theories can be tested by their ability to be implemented as a parser
they have to produce a parser of at least equal quality to the Ergo
parser or concede ours is best; however, if they say that there are
more important issues than parsing (thereby demonstrating their theory
CANNOT be implemented in a parser) they must forever write off funds
for parsing until such time as they have amended their theory or their
opinion.
For statisticians, if they say that a theory of syntax can be parsed
at all, they are in danger of admitting there is no particular need
for statistical parsers. If they say that theories of syntax cannot
create parsers or cannot create parsers equal to statistical parsers
they must come up with a statistical parser that can meet or beat
those very ordinary standards that I have proposed. This is
especially difficult for them because there is no way that a
statistical parser will ever analyze internal structure to a
significant enough degree to do q&a or manipulate structures
(otherwise they would have developed a theory of syntax and would once
again remove the need for statistical parsers).
Finally, download a BracketDoctor (perhaps these arguments as well),
take it to classes or to presentations or to conferences, and ask
questions based on what it can do. If you are given straight answers
with evidence of better results from other parsers you will KNOW I am
wrong. If anything else occurs (e.g. dead silence, dirty looks,
accusations of political incorrectness, shunning, or whatever) you
know there is substance in my arguments. Gauge my arguments not by
the intellectualized cloudiness of responses, but by the lack or
presence of physical evidence (don't go by oral reports alone) from
other parsers that can meet the standards I have provided. I have
provided very ordinary standards (repeated below) such that anyone
should be able to judge this. Look closely at the standards; you will
see they are fair and relatively simple. Then, BracketDoctor and
arguments in hand, go out and find the physical evidence yourself.
Phil Bralich
THE STANDARDS: In addition to using the Penn Treebank II guidelines
for the generation of trees and labeled brackets and a dictionary that
is at least 35,000 words in size and works in real time and handles
sentences up to 15 to 20 words in length, we suggest that NLP parsers
should also meet standards in the following seven areas before being
considered "complete." The seven areas are: 1) the structural analysis
of strings, 2) the evaluation of acceptable strings, 3) the
manipulation of strings, 4) question/answer, statement/response
repartee, 5) command and control, 6) the recognition of the essential
identity of ambiguous structures, and 7) lexicography. (These same
criteria have been proposed for the coordination of animations with
NLP to the Virtual Reality Modeling Language Consortium, a consortium
(whose standards were recently accepted by the ISO) designed to
standardize 3D environments. See
http://www.vrml.org/WorkingGroups/NLP-ANIM.)
It is important to recognize that EAGLES and the MUC conferences,
groups that are charged with the responsibility of developing
standards for NLP do not mention any of the following criteria and
instead limit themselves to largely general characteristics of user
acceptance or vague categories such as "rejects ungrammatical input"
rather than specific proposals detailed in terms of syntactic and
grammatical structures and functions that are to be rejected or
accepted. The EAGLES site is made up of hundreds of pages of
introductory material that is very confusing and difficult to
navigate; however, once you actually find the few standards that are
being proposed you will find that they do not come close to the level
of precision and depth that is being proposed here and for that reason
should be rejected until such time as these higher and more demanding
levels of expectation of NLP systems are included there as well.
These are serious matters and a group like EAGLES should not ignore
extant NLP tools simply because they are not mainstream or because
mainstream parsers cannot meet these requirements (even though the
Ergo parser is better known than almost all other parsers). Just go
through their pages and try to find EXACTLY what a parser is expected
to do under these guidelines. There is almost no reference to
specific grammatical structures, the Penn Treebank II guidelines, or
references to current working parsers as models
(http://www.ilc.pi.cnr.it/EAGLES/home.html).
If the EAGLES' standards are ever to gain any credibility and respect
they are going to have to be far more specific about grammatical and
syntactic phenomena that a system can and cannot support. There
should also be some requirement that the systems being judged offer a
demonstration of their abilities to generate labeled brackets and
trees in the style of the Penn Treebank II guidelines. I suggest the
following as a far more exacting and far more demanding test of
systems than is offered by EAGLES or any of the MUC conferences.
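For readers unfamiliar with the labeled-bracket notation referred to here, the following Python sketch shows what such output looks like. It assumes a parser's result is already available as nested tuples; that representation and the function name to_brackets are invented for this illustration and are not taken from any particular parser. The category labels (S, NP, VP, DT, NN, VBD, NNP) are standard Penn Treebank tags.

```python
# Hypothetical tree format: (label, child, ...); a leaf is (tag, word).
def to_brackets(node):
    """Serialize a tuple tree as a Penn Treebank-style labeled bracketing."""
    label, *rest = node
    if len(rest) == 1 and isinstance(rest[0], str):
        return f"({label} {rest[0]})"          # preterminal: (TAG word)
    inner = " ".join(to_brackets(child) for child in rest)
    return f"({label} {inner})"

tree = ("S",
        ("NP", ("DT", "The"), ("NN", "police")),
        ("VP", ("VBD", "arrested"),
               ("NP", ("NNP", "John"))))

print(to_brackets(tree))
# (S (NP (DT The) (NN police)) (VP (VBD arrested) (NP (NNP John))))
```

The serialization step is simple. Whether a given theory's structures map cleanly onto Treebank categories is a separate question that this sketch does not address.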
HERE IS A BRIEF PRESENTATION OF STANDARDS IN THOSE SEVEN AREAS:
1. At a minimum, from the point of view of the STRUCTURAL ANALYSIS OF
STRINGS, the parser should: 1) identify parts of speech, 2) identify
parts of sentence, 3) identify internal clauses (what they are and
what their role in the sentence is as well as the parts of speech,
parts of sentence and so on of these internal clauses), 4) identify
sentence type (without using punctuation), 5) identify tense and voice
in main and internal clauses, and 6) do 1-5 for internal clauses.
2. At a minimum from the point of view of EVALUATION OF STRINGS, the
parser should: 1) recognize acceptable strings, 2) reject unacceptable
strings, 3) give the number of correct parses identified, 4) identify
what sort of items succeeded (e.g. sentences, noun phrases, adjective
phrases, etc), 5) give the number of unacceptable parses that were
tried, and 6) give the exact time of the parse in seconds.
3. At a minimum, from the point of view of MANIPULATION OF STRINGS,
the parser should: 1) change yes/no and information questions to
statements and statements to yes/no and information questions, 2)
change actives to passives in statements and questions and change
passives to actives in statements and questions, and 3) change tense
in statements and questions.
4. At a minimum, based on the above basic set of abilities, from the
point of view of QUESTION/ANSWER, STATEMENT/RESPONSE REPARTEE, the
parser should: 1) identify whether a string is a yes/no question,
wh-word question, command or statement, 2) identify tense (and
recognize which tenses would provide appropriate responses), 3)
identify relevant parts of sentence in the
question or statement and match them with the needed relevant parts in
text or databases, 4) return the appropriate response as well as any
sound or graphics or other files that are associated with it, and 5)
recognize the essential identity between structurally ambiguous
sentences (e.g. recognize that either "John was arrested by the
police" or "The police arrested John" are appropriate responses to
either, "Was John arrested (by the police)" or "Did the police arrest
John?").
5. At a minimum from the point of view of RECOGNITION OF THE
ESSENTIAL IDENTITY OF AMBIGUOUS STRUCTURES, the parser should
recognize and associate structures such as the following: 1)
existential "there" sentences with their non-there counterparts
(e.g. "There is a dog on the porch," "A dog is on the porch"), 2)
passives and actives, 3) questions and related statements (e.g. "What
did John give Mary" can be identified with "John gave Mary a book."),
4) Possessives should be recognized in three forms, "John's house is
big," "The house of John is big," "The house that John has is big," 5)
heads of phrases should be recognized as the same in non-modified and
modified versions ("the tall thin man in the office," "the man in the
office," "the tall man in the office" and "the tall thin man in the
office" should be recognized as referring to the same man (assuming
the text does not include a discussion of another, "short man" or "fat
man" in which case the parser should request further information when
asked simply about "the man")), and 6) others to be decided by the
group.
6. At a minimum from the point of view of COMMAND AND CONTROL, the
parser should: 1) recognize commands, 2) recognize the difference
between commands for the operating system and commands for characters
or objects, and 3) recognize the relevant parts of the commands in
order to respond appropriately.
7. At a minimum from the point of view of LEXICOGRAPHY, the parser
should: 1) have a minimum of 50,000 words, 2) recognize single and
multi-word lexical items, 3) recognize a variety of grammatical
features such as singular/plural, person, and so on, 4) recognize a
variety of semantic features such as +/-human, +/-jewelry and so on,
5) have tools that facilitate the addition and deletion of lexical
entries, 6) have a core vocabulary that is suitable to a wide variety
of applications, 7) be extensible to 75,000 words for more complex
applications, and 8) be able to mark and link synonyms.
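To make the EVALUATION OF STRINGS criteria (area 2 above) concrete, here is a small Python sketch of the kind of report they describe: whether a string is accepted, how many parses it has, and how long the parse took. It uses a textbook CKY chart over a toy grammar in Chomsky normal form. The grammar, the lexicon, and the names count_parses and evaluate are assumptions invented for this example. They have nothing to do with the Ergo parser, and the sketch does not report the number of failed parses that were tried.

```python
import time

# Hypothetical toy grammar in Chomsky normal form: (left, right) -> parent.
BINARY = {
    ("NP", "VP"): "S",
    ("DT", "NN"): "NP",
    ("NP", "PP"): "NP",
    ("VBD", "NP"): "VP",
    ("VP", "PP"): "VP",
    ("IN", "NP"): "PP",
}
# Toy lexicon; "john" maps straight to NP because CNF has no unary rules.
LEXICON = {
    "the": ["DT"], "police": ["NN"], "man": ["NN"], "office": ["NN"],
    "arrested": ["VBD"], "in": ["IN"], "john": ["NP"],
}

def count_parses(words, goal="S"):
    """Count distinct parse trees for `words` with a CKY chart."""
    n = len(words)
    chart = {}  # (start, end, label) -> number of analyses of that span
    for i, word in enumerate(words):
        for tag in LEXICON.get(word.lower(), []):
            chart[i, i + 1, tag] = chart.get((i, i + 1, tag), 0) + 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for (left, right), parent in BINARY.items():
                    ways = (chart.get((start, mid, left), 0)
                            * chart.get((mid, end, right), 0))
                    if ways:
                        key = (start, end, parent)
                        chart[key] = chart.get(key, 0) + ways
    return chart.get((0, n, goal), 0)

def evaluate(sentence):
    """Report acceptance, number of parses, and elapsed time in seconds."""
    started = time.perf_counter()
    parses = count_parses(sentence.split())
    elapsed = time.perf_counter() - started
    return {"accepted": parses > 0, "parses": parses, "seconds": elapsed}

print(evaluate("The police arrested the man in the office"))
```

With this toy grammar the example sentence has two parses, because the prepositional phrase can attach either to the noun phrase or to the verb phrase. A grammar this small accepts almost nothing, so the sketch illustrates only the reporting format, not coverage.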
Philip A. Bralich, Ph.D.
President and CEO
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822
Tel: (808)539-3920
Fax: (808)539-3924

I apologize if others have already made these points---perhaps I've
come to the NLP/syntax discussion mid-stream.
Given my perspective as a generative linguist, I would like to suggest
that Philip Bralich add a couple of items to his list of NLP parser
"tests"; such additions are related to various points about what
should constitute the "best" or most "mature" theory of syntax.
1.) A parser must hypothesize structural representations word-by-word,
i.e. the parser cannot wait until all words have been encountered in
order to assign parts of speech, assign brackets/constituent
structure, anticipate the correct analysis of potential ambiguities
(or any of the other tasks Bralich claims that a parser should do).
2.) As a corollary to (1), a parser should be "garden pathed" by the
same sentences which garden path the Natural Language Processors it is
meant to model (i.e. humans); furthermore, the parser should be able
to re-parse those sentences which humans can re-parse, and the parser
should utterly fail to re-parse the garden path sentences which
are---though technically grammatical---totally opaque to humans'
re-parsing. It should also be the case that the parser accepts those
ungrammatical sentences which humans seem to "parse" just fine even
though the sentence is ungrammatical, e.g. (the classic example)
"More people have been to Paris than I have"--- nothing about that
sentence is glaringly unparsable, but when you stop to think about
what it means, it makes no sense.
3.) The structure of the parser, particularly in the way it encodes
the structures/mechanisms of a given syntactic theory, should provide
an explanation as to why certain sentences are difficult for humans to
parse and others---though superficially more "complex"---are easy to
parse.
That last point constitutes the connection to syntactic theory,
otherwise the parser is nothing more than a technical re-description
of the problem. Allow me to offer the following additions to
Bralich's notion of a "mature" theory. A "mature" theory of syntax
should have within its mechanisms an inherent capacity to:
4.) provide a principled account of the range of variation in human
language (where "principled" is taken to mean that you don't simply
code variation in your programs, but rather the range of variation
follows as a natural consequence from the interactions of the
mechanisms of the theory);
5.) provide a principled account of the natural acquisition of
grammar; (for example, why is it that children learning Italian (a
language with a full verbal inflectional paradigm) acquire
subject-verb inversion earlier than children learning English (with
its impoverished inflectional paradigm) acquire subject-auxiliary
inversion;)
6.) provide a principled account of language change---especially the
means to delineate in a principled way those changes which are
sociological in nature from those which are fundamental changes in the
grammars of the speakers; (for example, the theory should be able to
explain why "Went John to London?" was replaced by "Did John go to
London?", and as a consequence of explaining that change, the theory
should also make clear whether the chronologically parallel loss of
quasi-double object constructions like "Mary gave to John a book" was
purely coincidental or was actually another surface manifestation of a
core change in the grammars;)
7.) make explicit the relationship between the issues in (5) and (6),
i.e. to the extent that a given theory of syntax provides an accurate
model of acquisition, the theory should thereby inherently provide a
model for how the recursive process of acquisition generates specific
changes in grammatical systems.
And finally, as an addendum to the specific metric Bralich defines for
evaluating syntactic theories (i.e. that they must be encodable in a
computer language), let's further stipulate that any implementation of
a syntactic theory in a computer language should be breakable in ways
which produce the same varieties of dysfunction which we see in
aphasia.
A tall order? No doubt. But here's the point (in case the
implicitness of my commentary has too thoroughly masked the message
for any one given person): proclaiming that a theory should do X, Y,
and Z, and then concluding that your theory is best because it does X,
Y, and Z better than anyone else's theory is, well (how to put
this....?), limited.
Frankly, hats off to Philip Bralich for creating a software
application which can do everything he says it can. (I, for one, have
about as much spare time as it takes to write this note; testing his
claims doesn't make the cut for things I make time for. (So why have
I taken the time to write this note? Because independently of the
accuracy of Bralich's specific claims, there's a much bigger issue
here---))
So hats off to the successes, but let's be clear about how a research
program unfolds: if "ability to be encoded in a computer language" is
one of the rubrics you use in developing your theory, then in all
likelihood you will make decisions about any number of various
components of the theory which are based, at least in part, on the
inherent structures and mechanisms which define computer systems.
If what you want out of your theory is something that will give you an
edge in a multi-billion dollar industry, then it doesn't matter if
your syntactic theory and resultant parser make use of computer
structures/mechanisms which humans don't possess.
But if you want a theory which provides insights into the mechanisms
of Natural Language Processing in the sense that you're modeling how
human beings do it, then it seems to me that computers can provide a
magnificently sophisticated and powerful blackboard which we can use
in the exploration of theoretical issues.
Crucially, such explorations can---and do---happen even though the
particular theory in question is not able to be encoded in such a way
that it produces a parser which meets Bralich's specs.
The bottom line is that it's a mistake of scope to equate
"encodability in a computer language, and the computerized parser
that encoding can produce" with "the best theory of syntax".
Respectfully,
Mark Arnold