Marpa resources

Thu, 29 Mar 2012

The question of parsing
English and other natural languages
has come up
in the course of my work on
the Marpa parser.
As in the case of Perl,
I first posed the question of whether any
algorithm running on a Turing machine can parse
the target language.
This post contains
what I hope the reader will find to be
a rigorous demonstration
that the syntax of the English language is undecidable.

When I say “undecidable”,
I mean that term in the strict sense.
Undecidability is not vagueness or
uncertainty -- undecidability is the certainty
that a “decision” of the matter
is not possible.
I will give a specific example of
an English-language sentence which is unsyntactic if and
only if it is syntactic.

The Demonstration

The sentence hinges on the distinction between sentences
in the passive voice,
and sentences which contain a copular verb
(in this case, “to be”)
and an adjectival complement.
For example,

(A) “The door was opened”

is a passive voice sentence.
In the sentence

(B) “The door was open”,

on the other hand, “open”
is an adjectival complement,
and “was” is a copular verb.
(While the passive voice of English
is the subject of considerable discussion,
the terminology and analysis
of this post
follows Quirk et al.,
A comprehensive grammar of the English language,
pp. 159-171.
and
Language
Log, and sticks to terrain which should not
be controversial, even for non-specialists.)

Now consider the following sentence.

(C) “The window was closed.”

Is this a sentence with a predicate in the
passive voice,
or is “was” a copular verb
with an adjectival complement?
In standard dictionaries, you will find
“closed” is listed both
as an adjective and as the past participle
of the verb
“close”, so that either could be
the case.
Further information would be needed to decide the syntax of (C).

Next I make two observations.
The first observation is that a sentence can be syntactically
correct, but not meaningful.
The traditional example is

(G) “Green dreams sleep furiously”.

Another example is

(H) “The first even prime greater than two crossed the
street symmetrically”.

Both these sentences allow easy syntactic analysis: each has a subject,
and a predicate.
Nouns, adjectives, adverbs and verbs can be readily
identified and given fixed locations in a formal syntactic structure.
But neither sentence has any meaning, except in a poetic or highly
figurative sense.

The second observation is that, while we can have correct syntax
without a semantics,
we cannot have meaning without syntax.
That is, sentences like

(I) “Airplane from to the and for of or pilot up”,

while they might contain words which could be selected
and rearranged to be meaningful,
do not mean anything.

We now return to the passive-versus-adjective decision,
where we observed that

(P) “The window was closed”

could be an example of either a verb in the passive voice
or of a copular verb with an adjectival complement.
In the sentence

(Q) “The window was closed, but the door was open”,

the question is resolved by coordination.
Since in (Q) “open” can only be an adjectival complement,
the expectation is that “closed” will also be.
Similarly, in the sentence

(R) “The window was closed, but the door was opened”,

coordination with
“opened” decides the matter in favor of the passive
voice.
In a sentence where the coordination is closer,
and includes a by-phrase to show agency for both verbs,

(S) “The window was closed and the door opened by the same person”,

the presumption that both verbs are in the passive voice becomes
a certainty.

We are now in a position to consider our undecidable sentence,

(U) “The window was closed and the door opened by the burglar after
he discovered that the window was in fact
a beautifully executed trompe d'oeil mural.”

The window is fake.
It therefore could not have been
closed by the burglar, which makes its syntax wrong.
The verb “was closed”, before semantic feedback
is taken in account, clearly is in the passive voice.
After the semantic feedback to the syntax is considered, however, it is clear
that
“was closed” cannot be in the passive voice,
and that therefore (U) is unsyntactic.

But if a sentence does not have correct syntax,
it cannot have a semantics.
Since the only problem with the syntax of (U) was a result
of the semantic feedback, if semantics is not considered (U)
has correct syntax.
So (U) is incorrect syntactically if and only if it has
correct syntax.

We cannot decide (U) to have incorrect syntax, because in that
case it becomes meaningless and the syntax becomes unobjectionable.
But neither can we decide (U) to have correct syntax,
because that makes (U) meaningful.
When (U) is meaningful,
it has incorrect syntax,
because the passive voice cannot be correct
given the meaning.

Our only choice is to say that the correctness of
the syntax of (U) cannot be decided.
It is not true
that (U) has correct syntax,
but neither is it true
that (U) has incorrect syntax.
This concludes the demonstration.

A Thought Problem

To a reader unconvinced
by the preceding demonstration,
I would urge that,
as a thought problem,
she consider
a computer program attempting to
parse English.
Such a program would have to know
at least some semantics, and
would have to feed this knowledge back
to the parsing process.
But the preceding demonstration shows that
such a feedback loop, if completely effective,
will encounter sentences like (U),
where it cannot decide whether the syntax of
the sentence is correct or incorrect.
I believe that,
by reflecting on this thought problem,
the candid reader will be able to convince herself
that a human being
parsing English sentences
like (U)
will have the same problems,
and for similar reasons.

About Syntax and Semantics

It needs to be emphasized that the diagonalization
in this post's demonstration
is not about semantic truth and has nothing to do
with semantic paradox.
This post is
about the feedback from semantics into syntax,
and the relationship between them.
By way of contrast,
consider these sentences:

(X) is false; (Y) attempts a contradiction but fails;
and (Z) achieves a semantic contradiction.
But in none of them is the semantics an issue for the
syntax,
and all three are correct syntactically.
In each case,
even when there are semantic issues,
the feedback that the semantics provide the
syntax is unproblematic.