How to Create a Program: a Small Software Project in C++

Programming expert Bartosz Milewski presents an example of how to design and implement a small program (a command-line symbolic calculator) using many of the techniques applicable in more large-scale software development.

When you write a program, you don't ask, "How can I use a
particular language feature?" You ask, "What language feature will
help me solve my problem?"

Programmers are rarely totally satisfied with their code. After I wrote the
program for my
article on polymorphisma
simple parser-calculatorI immediately noticed that I could have done it
better. I felt that the top-level view of the program was not very clear. The
nodes of the parsing tree needed access to components such as the Store
(the calculator's "memory"), the symbol table (to print
meaningful error messages), and maybe even to the function table (to evaluate
built-in functions). I could have made all these objects global and thus known
to everybody, or passed references to them to all the nodes of the tree.

The order of construction of these objects was important, so I felt that
maybe they should be combined into one high-level object, the
Calculator. The temptation to redesign and rewrite the whole program
was strong. Maybe after the nth iteration I could come up with something
close to the ideal?

Then I decided that I shouldn't do it. Part of making a program is
coming up with not-so-good solutions and then improving on them. I would have
cheated if I had come up with the best, and then reverse-engineered it to fit
the top-down design and implementation process. So here it isthe actual,
real-life process of creating a program.

Design Specification

The purpose of the calculator program is to accept arithmetic expressions
from the user, evaluate them, and display the results. The expressions are to be
parsed by a simple top-down recursive descent parser (if you have no idea what
I'm talking about, don't panicit's just a name for what
I'm going to describe in detail later).

NOTE

In principle, the type of parser is an implementation detail and as such
would not normally be mentioned in the functional specification. In our case,
however, the type of parser (unfortunately) influences the choice of the
grammar, which is part of the functional specification.

The parser's goal is to convert the input string into an arithmetic
tree. The simple grammar accepted by the parser is defined as follows.

The parser is looking for an expression:

An expression is a term followed by a plus or a minus sign,
which is followed by another expression. If an expression doesn't
contain any plus or minus signs, it's equal to a term.

A term is a factor multiplied or divided by another
term. If a term doesn't contain any multiplication or division
operators, it's equal to a factor.

Figure 1 The parsing of an arithmetic expression. The arrows with numbers correspond to the rules of grammar that are used to make each parsing step.

This grammar is simple and natural, but it has one flawthe arithmetic
operators are right-associative instead of being left-associative. That means,
for example, that a - b + c will be parsed as a -
(b + c), which is not exactly what we'd expect. There are
standard ways of fixing this grammar or modifying the parser, but that's
beyond the scope of this section. (We'll revisit this issue later.)

Since we want our calculator to be able to use variables, we need a symbol
table. A symbol table remembers names of variables. We'll also need
to be able to assign values to variables. We can do it by expanding the
definition of expression to include the following clause:

An expression can also be a term followed by the equal sign
followed by another expression.

Not every term is appropriate on the left side of an assignment. It has to be
an lvalue something that can be on the left side of an assignment.
In our case, the only allowed type of lvalue will be an identifier corresponding
to a variable. Similarly, anything that can be put on the right side of the
assignment is called an rvalue. In our case, any expression is a valid
rvalue.

NOTE

You can also derive lvalue from "location value," since it
represents a location where a value can be stored, and rvalue from "read
value," since a value can be read from it.

We'll also introduce a scanner, which will convert the input string into
a series of tokens. It will recognize arithmetic operators, numbers, and
identifiers. This way, the parser will have an easier jobmatching
sequences of tokens to grammatical productions.