Attempto Controlled English

I’ve known about it for some time, but I never fiddled with it because the standard implementation setup is rather elaborate. I wanted a nice, simple package in Haskell which would define a parser and a printer only, much like haskell-src-exts does. That way I can use ACE to parse some simple English for all sorts of purposes1, with a simple familiar API that I can peruse on Hackage. Partly it’s also a good learning experience.

So I went through the paper The Syntax of Attempto Controlled English to see whether it was comprehensive enough to write a parsec parser out of. It was! I first wrote a tokenizer in with Attoparsec and wrote some tests. From those tokens I produced a set of combinators for Parsec, then I wrote a parser. While writing the parser I produced a set of test-cases for each grammar production. Finally, I wrote a pretty printer, and wrote some tests to check that print . parse . print . parse = id.

Newbies to Haskell parsing might find it an interesting use-case because it tokenizes with Attoparsec (from Text) and then parses its own token type (Token) with Parsec. A common difficulty is to avoid parsing from String in Parsec, which most tutorials use as their demonstration.

The Hackage package is here. I find the documentation interesting to browse. I tried to include helpful examples for the production rules. You shouldn’t have to know syntax theory to use this library.

Here is an ACE sample. We can parse the sentence “a <noun> <intrans-verb>” like this:

Anything to do with vocabulary is written as <foo>. The parser actually takes a record of parsers so that you can provide your own parsers for each type of word. These words are not of interest to the grammar, and your particular domain might support different types of words.

I.e. we get back what we put in. I also wrote a HTML printer. A more complicated sentence demonstrates the output:

for each <noun> <var> if a <noun> that <trans-verb> some <noun> and <proper-name>’s <noun> <trans-verb> 2 <noun> then some <noun> <intrans-verb> and some <noun> <distrans-verb> a <intrans-adj> <noun> <proper-name>’s <noun> <adverb>.

Can be printed with

fmap (renderHtml . toMarkup) . parsed specification

and the output is:

for each<noun><var>if a<noun>that <trans-verb>some<noun> and <proper-name>'s<noun><trans-verb>2<noun> then some<noun><intrans-verb> and some<noun><distrans-verb>a<intrans-adj><noun><proper-name>'s<noun><adverb>.

The colors and parenthesizing embellishments are just to demonstrate what can be done. I’m not sure this output would actually be readable in reality.

This is a good start. I’m going to leave it for now and come back to it later. The next steps are: (1) write more tests, (2) add feature restrictions and related type information in the AST, (3) add a couple sample vocabularies, (4) implement the interrogative (useful for query programs) and imperative moods (useful for writing instructions, e.g. text-based games).

Specifically, I want to use this to experiment with translating it to logic-language databases and queries, and from that produce interactive tutorials, and perhaps experiment with a MUD-like game that utilizes it.↩