Friday, October 23, 2009

Asking the right questions

When I said that the semantics of 'John has two sisters' was |{ x | SISTER(x, JOHN) }|=2, I wasn't quite correct. In fact, nothing in the text prevents John from having 5 or 42 sisters. It's the Maxim of Quantity that may limit the sister count to 2. Not being an absolute rule, this maxim can easily be flouted, and in the right context the sentence could actually mean that John has more than 2 sisters.

Things get even more interesting if we add just one word: 'John has two beautiful sisters'. There just isn't a default meaning here! John may have exactly 2 sisters, both of them beautiful, or he may have 2 beautiful sisters and another 3 who are not so beautiful.
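To make the two readings concrete, here is a minimal sketch that checks each of them against two small made-up worlds. Everything in it (the Person class, the function names, the sisters themselves) is my own illustration, and I assume for simplicity that both readings count beautiful sisters exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    beautiful: bool


def exhaustive_reading(sisters):
    """John has exactly two sisters, and both of them are beautiful."""
    return len(sisters) == 2 and all(s.beautiful for s in sisters)


def restrictive_reading(sisters):
    """Exactly two of John's sisters are beautiful; there may be others."""
    return sum(1 for s in sisters if s.beautiful) == 2


# Two hypothetical worlds, each given as the set { x | SISTER(x, JOHN) }.
world_a = frozenset({Person("Ann", True), Person("Beth", True)})
world_b = world_a | {
    Person("Cat", False),
    Person("Dee", False),
    Person("Eve", False),
}

for label, sisters in (("A", world_a), ("B", world_b)):
    print(label, exhaustive_reading(sisters), restrictive_reading(sisters))
# prints:
# A True True
# B False True
```

The sentence is acceptable in both worlds, yet the first reading rules out world B. Which reading is intended is exactly what the text alone doesn't tell us.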

The question is what a computer should do in such situations. Should it apply pragmatic knowledge and disambiguate everything immediately after syntactic analysis, using the whole context? Or should it maintain an intermediate semantic representation and pass it to a pragmatics module that could infer everything from the semantics? I clearly prefer modularization, i.e. the latter option. Of course, I don't assume any sequentiality: the modules may run in parallel and interact.

If we separate semantics from pragmatics, the representation problem arises again, and it's even harder now. The semantic structure should be very generic: it should be interpretable in all the ways that were possible with the original text (minus the resolved lexical and syntactic ambiguities). At the same time, there should be no way of understanding it in any other way. If we just replace = with >= in the meaning of 'John has two sisters', the pragmatics module still won't be able to apply the Quantity Maxim: such a meaning could equally well be produced from 'John has at least two sisters', which is unambiguous with respect to the sister count. So it should still be some kind of =2, but in a form open to interpretation. What format could that be? I don't know. Yet.
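The problem with the >= shortcut can be shown in a few lines. This is only a toy sketch: CountConstraint and naive_semantics are hypothetical names, and the hard-coded table stands in for a real parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountConstraint:
    relation: str  # e.g. "SISTER"
    of: str        # e.g. "JOHN"
    op: str        # comparison operator, "==" or ">="
    n: int


def naive_semantics(sentence: str) -> CountConstraint:
    # Hypothetical lookup table; a real system would parse the sentence.
    table = {
        "John has two sisters":
            CountConstraint("SISTER", "JOHN", ">=", 2),
        "John has at least two sisters":
            CountConstraint("SISTER", "JOHN", ">=", 2),
    }
    return table[sentence]


plain = naive_semantics("John has two sisters")
explicit = naive_semantics("John has at least two sisters")

# Both sentences collapse into the same structure, so a pragmatics
# module that sees only this structure cannot treat them differently.
print(plain == explicit)  # prints: True
```

Whatever the right format turns out to be, it has to keep these two sentences apart while still leaving the count in the first one open to interpretation.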