Jocelyn Paine

Dr. Dobb's Bloggers

An Online Budget Questionnaire, JavaCC, and the Three Ways of Putting Together

April 21, 2010

After the ash, the papers will again be full of post-Election austerity. Roads or railways, councils or care homes: where will the Chancellor cut to chop the UK's massive deficit? I and a colleague have put up a poll that surveys how people would trade-off these cuts, and how this correlates with voting
preference, at
www.sharesim.arachsys.com/login/budget10.jsp
and Facebook group
Balance
the Budget. We used a questionnaire-compiler which is a nice application of the parser-builder JavaCC.

After the ash, the papers will again be full of
post-Election austerity.
Roads or railways, councils or care homes: where will
the Chancellor cut to chop the UK's
massive deficit?
I and a colleague have put up a poll that
surveys how people would trade-off these
cuts, and how this correlates with voting
preference, at
www.sharesim.arachsys.com/login/budget10.jsp
and Facebook group
Balance
the
Budget.
We used a questionnaire-compiler
which is a nice application of the
parser-builder JavaCC.

The questionnaire was written by
my colleague John Green
of Research for Today.
It is based round
four grids, rendered as HTML tables.
Each grid has a row for each area of
public spending surveyed: affordable
housing; Trident nuclear
submarines; other defence; nationwide 40Mb
broadband; care homes.
Each row has cells for possible
levels of service: 20,000 more affordable
homes per year; 40,000; 60,000.
Cells are also equipped with a radio button
and a score stating the relative cost
of that level of service:

In each grid,
the person doing the poll has to
click buttons so that the costs sum
to a particular budget saving: £60 billion for
the first grid; £40 billion for
the second; £20 billion; as now. It turns
out to be surprisingly difficult — one
might say taxing — as you discover
just how much the UK cannot afford.

The questionnaire also contains
multiple-choice questions
having just one row or
column of radio buttons or
checkboxes. These ask for such characteristics
as sex,
age band, and political preference: Labour,
Tory, or Lib Dem.

These and the rest of the questionnaire
are
coded in a mini-language
that I implemented.
Here's an example:

By the way, I used the bar | as a string
delimiter rather than double quote, to make
it easier to include quotes in strings.

It is useful to design mini-languages
for tasks like this, but of course one needs
to specify their grammar. Here is a grammar for
the fragment above, written in
BNF. I've extended it with the +
operator, which by convention means "one or more
occurrences of":

To write a parser in JavaCC, one
transcribes BNF into
its grammar-description language. This
is easy: JavaCC's grammar-description
language is designed to be close to
BNF.

An important thing JavaCC needs to know is
which are the reserved words, or what
compiler-writers call "tokens", in
its input.
For my mini-language example, they
are asks,
horizontal,
layout,
options,
page break,
question,
text,
and
vertical.

One must also say how tokens
are separated. In common with
almost all programming languages —
except Fortran, Snobol, and I suppose, Python —
our mini-language is completely free-format. So
I tell JavaCC to ignore
spaces, tabs, and newlines and carriage
returns between tokens. One does
this by writing a regular expression
that specifies the characters to be
skipped.

I let questionnaire authors
put comments in their files. This
is good practice for any
mini-language. It allows the author
to annotate their file with
information about creation and revision
dates, why changes were made, the purpose
of a particular statement, and so on.
I decided to use C-style comments: either two
slashes up to the end of the line, or slash-star
up to star-slash. Again, one specifies
these with regular expressions.

Having told JavaCC about tokens, the gaps
between tokens, and comments, one can now
code a grammar. In JavaCC's
grammar-description language, each
rule looks
rather like a Java method.
For example, the rule for item

resembles a method named item,
with a body that says "either
call the
question rule, or, call
the non-question
rule".

It isn't enough
merely to recognise whether
a file holds a valid
questionnaire or not.
One wants the parser to build
a representation of the questionnaire.
To do this, one augments the
grammar with "semantic actions". That's
the computer scientist's name for
pieces of code that take the
strings recognised by the right-hand
side of a grammar rule and do
stuff with them, such as
storing them in the fields of
a data structure. In JavaCC, the
semantic actions are Java statements.

So a JavaCC file contains definitions
of tokens and of the spaces
between tokens, definitions of comments,
a grammar, and the semantic
actions inside
the grammar. One must
embed this in a class definition.
This can contain methods and
variables to
be called by the semantic actions.
One must also set up the
input stream that the parser will read
text from, and set some options.
Amongst other things, these control
the diagnostics to
be generated.
And then, one feeds the JavaCC file
through JavaCC, and out comes a
Java file. Compile this, and
you have your parser.

The above would apply, mutatis mutandis,
to any parser-builder, not just JavaCC.
Something else that should always apply is
the tight
connection between grammar, data structure,
and control structure. In sequential
programming, there are three
main control structures. These
are: sequencing, or doing one thing
and then another thing; alternation, or
doing one thing OR another thing;
and iteration, or doing
one thing over and over
again. Java does sequencing with
semicolons, alternation with "if"
and "switch", and iteration with loops.

Corresponding notions in a grammar are:
recognising one thing and then another thing;
recognising one thing OR another thing;
and recognising one thing over and
over again. The BNF notation I used
indicates the first by concatenation, the second
by the | operator, and the third
by the + operator.

And corresponding to these
in data structures are:
storing one thing and another thing;
storing one thing OR another thing;
and storing one thing over
and over again. Almost
every language uses multiple
fields for the first, and
arrays, lists, vectors and so on
for the third. A principled
language like Pascal uses "variant
types" for the second; some other languages
call these "unions".
Java doesn't quite have this notion,
but bodges it with
subclasses.

What is the connection of these "Three
Ways of Putting Together" with our
questionnaires?
I needed to
decide what kind of data structure
the semantic actions in my grammar
would build.
The answer is simple: data structures
that parallel the structure of
the grammar.

So I represented the entire questionnaire
as an instance of class Questionnaire,
with a field that was a vector of Items.
I defined class Item, and
its subclasses Question

and NonQuestion. A
Question had instance variables
holding the text
that it asks,
a vector of options,
and a LayoutSpecifier.
The LayoutSpecifier was a Java Enum
with two elements: HORIZONTAL
and VERTICAL. Class

NonQuestion

had subclasses Text
and PageBreak.
And a Text contained
one instance variable, a String.
Compare this with my BNF.

By the way, the previous paragraph, and my
grammar, are for the
questionnaire fragment I
showed earlier. The questionnaires that we
use in real life contain other kinds of
question, such as the grids. They also contain
other constructs. For example,
conditional jumps that
skip part of a questionnaire if the
respondent gives certain answers.

How does the structure of
questionnaires relate to the Web?
When somebody logs in to our budget poll,
my Java code looks up the name of the file
that contains the mini-language
definition for the budget10
questionnaire. It passes the name
to the parser instance I generated
with JavaCC. JavaCC has translated
my grammar rules into methods,
the top-level one being named
questionnaire. My code
calls this,
passing it
an input stream opened
onto this file. This method
calls the subordinate
methods corresponding to subordinate
rules, as well as the semantic actions.

The semantic actions build an
instance of class
Questionnaire. This then
has to be rendered as
HTML. And here we again see the
Three Ways of Putting Together.
The top-level render-as-HTML
method sequences through the fields of
the Questionnaire
instance.
It iterates over the
vector of Item

instances in one of these fields.
And it
has alternative
methods for outputting
an Item, depending
on whether this is a

Question
or
a NonQuestion.

So the
Three Ways of Putting Together
apply to grammar,
to data, and to control.
By designing these so that
one mirrors another, we can write
really clean code.

The same applies to processing
the answers to a questionnaire.
Once we have answers from enough
respondents, John will analyse them with
his SIMALTO ("Simultaneous, Multi-Attribute,
Level Trade-Off") algorithms, and
publish the results, correlating
with respondents' expressed voting preference.
But how do the answers get
to that stage? The respondent's
browser sends each page's answers
to our server
in the normal way,
as a string of
name=value parameters at the end of a
URL. The server passes these to
my Java code, which stores
them in an instance of my
class AnswersToQuestionnaire.
This
parallels
the question parts of
class Questionnaire.
(The non-question parts, namely text
and page breaks, don't generate answers, and therefore
are irrelevant.)
Class AnswersToQuestionnaire

contains a vector
of instances of class AnswerToQuestion,
analogous to my
Question class.

Having these parallel structures
makes it easy to associate answers with
the question they belong to. This
helps when saving the answers to file,
and when
displaying already-answered questions.
It is also useful in reporting errors,
as when the respondent's
budget expenditure falls outside the
permitted range. Luckily, on the Web, the
consequence is one of our
red error messages. If you are
Chancellor, the consequences of
inapposite budget expenditure are less
benign.

After the ash, the papers will again be full of post-Election austerity. Roads or railways, councils or care homes: where will the Chancellor cut to chop the UK's massive deficit? I and a colleague have put up a poll that surveys how people would trade-off these cuts, and how this correlates with voting
preference, at
www.sharesim.arachsys.com/login/budget10.jsp
and Facebook group
Balance
the Budget. We used a questionnaire-compiler which is a nice application of the parser-builder JavaCC.

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!