
Introducing the Contextors Parser

Abstract:
In this article we introduce the goals and notions that have guided the development of our syntactic parser, among them a flexible scheme for writing linguistic rules, transparency of every rule in the system, an advanced testing tool that is sensitive to the smallest changes, and a mechanism for retrieving syntactic and lexical features from every node within a given phrase. We also point the reader to some of the parser's applications.

Rule-based parser

The Contextors’ Parser assigns syntactic structure trees to strings of words in English. Developing the parser is a fresh attempt at teaching a machine rules about different linguistic aspects of English. Development is an ongoing process: we keep adding support for more linguistic phenomena, with the intention of covering all grammatical structures of the English language and getting closer and closer to the Perfect Parser™.

Theoretical research

In the process of adding rules to the parser and examining its solutions, some interesting theoretical issues arise. We research these cases and develop rules that prevent the generation of wrong solutions. You can find examples in our articles, such as the one about the different uses of of-preposition phrases.

Developed by linguists

The parser and its development environment were built from scratch in order to support, from the beginning, a scheme for writing grammar rules that makes it possible to incorporate linguistic insights. A lot of effort went into letting linguists develop and test rules themselves. You can read here about the concept of Language Engineering and the process of creating rules. The main principle we follow is to keep behind the scenes all methods that relate to the parser's internal model, and to let linguists express themselves with high-level methods that match, as closely as possible, the linguistic terminology they already use.

Overcoming the rule-based challenge

The parser's design, together with the testing and debugging tools we’ve developed, makes it possible to insert a new rule, or adjust an existing one, in accordance with all other relevant rules in the system. This allows us to avoid conflicts and stay in control of the development process. The output of the parser is always predictable because it is based on rules. And because our system is transparent, our linguists can examine each step of the parsing process and trace any parsing problem to the specific piece of code that is responsible for it. Our testing tools allow us to test any change against a large number of examples and see its impact.
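A minimal sketch of what such a regression check could look like, assuming (hypothetically) that each solution is stored as a bracketed string and that a baseline of previously accepted solutions is kept per example. The names `diff_against_baseline` and `toy_parser` are illustrative only and are not part of the Contextors tooling.

```python
from typing import Callable, Dict, List

# Hypothetical type: a parser maps an input string to its list of solutions,
# each solution written as a bracketed string.
Parser = Callable[[str], List[str]]


def diff_against_baseline(
    parser: Parser, baseline: Dict[str, List[str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Report, per example, the solutions gained and lost relative to the baseline."""
    report: Dict[str, Dict[str, List[str]]] = {}
    for sentence, expected in baseline.items():
        actual = parser(sentence)
        gained = sorted(set(actual) - set(expected))
        lost = sorted(set(expected) - set(actual))
        if gained or lost:  # unchanged examples are left out of the report
            report[sentence] = {"gained": gained, "lost": lost}
    return report


def toy_parser(sentence: str) -> List[str]:
    """Stand-in for a real parser: always returns a single flat solution."""
    return [f"(S {sentence})"]


baseline = {
    "dogs bark": ["(S dogs bark)"],
    "birds sing": ["(S birds sing)", "(NP birds sing)"],
}
report = diff_against_baseline(toy_parser, baseline)
print(report)
```

Running a check like this after every rule change shows exactly which examples gained or lost a solution, which is the kind of sensitivity to small changes described above.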

Linguistic programming language

In order to formalize different kinds of rules, we’ve developed various methods for expressing linguistic rules and principles. We extract different linguistic properties of the input and use them while parsing. Often, these methods serve as building blocks for additional, more complex rules. Moreover, wrong solutions are often eliminated by formulating principles that operate across several syntactic rules. All these methods combine to form a very rich and flexible linguistically oriented programming language.
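To make the idea of building blocks and cross-rule principles concrete, here is a small sketch under stated assumptions: the `Node` type, the predicate helpers, and the toy principle "every VP must contain a V" are all hypothetical and do not reflect the actual Contextors rule language.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical tree node: a category label, child nodes, and an optional word."""
    category: str
    children: Tuple["Node", ...] = ()
    word: str = ""


Predicate = Callable[[Node], bool]


def has_category(cat: str) -> Predicate:
    """Building block: true for nodes with the given category."""
    return lambda n: n.category == cat


def has_child(pred: Predicate) -> Predicate:
    """Building block: true if some direct child satisfies `pred`."""
    return lambda n: any(pred(c) for c in n.children)


def holds_everywhere(pred: Predicate) -> Predicate:
    """Lift a node-level check to a check on every node of a tree."""
    def check(root: Node) -> bool:
        return pred(root) and all(check(c) for c in root.children)
    return check


def vp_needs_verb(n: Node) -> bool:
    """Toy principle built from the blocks above: a VP must contain a V."""
    return n.category != "VP" or has_child(has_category("V"))(n)


def filter_solutions(solutions: List[Node], principles: List[Predicate]) -> List[Node]:
    """Keep only the solutions in which every principle holds at every node."""
    return [s for s in solutions
            if all(holds_everywhere(p)(s) for p in principles)]


good = Node("S", (Node("NP", (Node("N", word="dogs"),)),
                  Node("VP", (Node("V", word="bark"),))))
bad = Node("S", (Node("VP", (Node("N", word="dogs"),)),
                 Node("VP", (Node("V", word="bark"),))))
kept = filter_solutions([good, bad], [vp_needs_verb])
```

The principle is written once and applied to every candidate tree, regardless of which syntactic rule produced it.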

The parser accepts a sentence or a phrase as input. The output is a syntactic structure tree that combines the representations of syntactic categories and grammatical functions (read more about it here). We can choose the level of detail to visualize and highlight parts of the tree. The strategy we chose for the parser is to give all possible solutions for a given input. Depending on an application’s needs, we can reduce the level of detail and thus show fewer solutions.
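The sketch below illustrates, under assumptions, how a tree that carries both a category and a function on each node might be rendered at two levels of detail, and why a coarser level can yield fewer distinct solutions. The `TreeNode` type, the label names, and the bracketed format are hypothetical, not the parser's real output format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    """Hypothetical node carrying a syntactic category and a grammatical function."""
    category: str
    function: Optional[str] = None
    word: Optional[str] = None
    children: List["TreeNode"] = field(default_factory=list)


def render(node: TreeNode, show_functions: bool = True) -> str:
    """Render a tree as a bracketed string, optionally hiding function labels."""
    label = node.category
    if show_functions and node.function:
        label = f"{node.function}:{label}"
    if node.word is not None:
        return f"({label} {node.word})"
    inner = " ".join(render(c, show_functions) for c in node.children)
    return f"({label} {inner})"


def distinct_at_detail(solutions: List[TreeNode], show_functions: bool) -> List[str]:
    """Collapse solutions that look identical at the chosen level of detail."""
    return list(dict.fromkeys(render(s, show_functions) for s in solutions))


def make_tree(np_function: str) -> TreeNode:
    """Build a toy tree for 'dogs bark' with a given function label on the NP."""
    return TreeNode("S", children=[
        TreeNode("NP", np_function, children=[TreeNode("N", "Head", "dogs")]),
        TreeNode("VP", "Predicate", children=[TreeNode("V", "Head", "bark")]),
    ])


# Two toy solutions that differ only in the function assigned to the NP.
solutions = [make_tree("Subject"), make_tree("Adjunct")]
detailed = distinct_at_detail(solutions, show_functions=True)
coarse = distinct_at_detail(solutions, show_functions=False)
```

With function labels shown, the two toy solutions stay distinct; with labels hidden, they collapse into a single tree.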

Applications

As mentioned above, the parser’s coverage is constantly increasing. We have found that good coverage of certain areas of the language can support interesting products, such as the voice conjugator and other tools we’ve developed. The parser’s detailed analysis and the linguistic methods we use allow us to modify text while preserving its meaning and its basic structure.

How can you use the Contextors Parser?

We’ve opened an API for the parser. The API assigns to sentences their grammatical attributes (tense, polarity, voice, etc.) and main components (subject, verb, object, etc.). We are looking for beta users to start building interesting applications based on it. Apply for access here.
