peek() returns a non-zero value if the next token is equal to the given string.
accept() consumes the next token and returns non-zero if it’s equal to the given
string; otherwise it returns 0. And expect() helps us check the language syntax
in places where the grammar requires a particular token.
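
Here is a minimal sketch of how these three helpers could fit together. It assumes a toy lexer that splits the input on whitespace; the names tok, src, readtok(), lexer_init() and syntax_errors are hypothetical stand-ins for the real lexer, and counting errors instead of aborting is also just an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAXTOK 32

static const char *src;      /* hypothetical: remaining input text */
static char tok[MAXTOK];     /* hypothetical: the next unread token */
static int syntax_errors;    /* hypothetical: number of failed expect() calls */

/* Toy lexer: a token is a run of non-space characters; "" means end of input. */
static void readtok(void) {
    size_t n = 0;
    while (isspace((unsigned char)*src)) src++;
    while (*src && !isspace((unsigned char)*src) && n < MAXTOK - 1)
        tok[n++] = *src++;
    tok[n] = '\0';
}

static void lexer_init(const char *text) {
    assert(text != NULL);
    src = text;
    syntax_errors = 0;
    readtok();               /* prime the one-token lookahead */
}

/* Non-zero if the next token equals s; consumes nothing. */
static int peek(const char *s) {
    return strcmp(tok, s) == 0;
}

/* Consume the next token and return 1 if it equals s; otherwise return 0. */
static int accept(const char *s) {
    if (!peek(s)) return 0;
    readtok();
    return 1;
}

/* Like accept(), but a mismatch is reported as a syntax error. */
static int expect(const char *s) {
    if (accept(s)) return 1;
    fprintf(stderr, "syntax error: expected '%s', got '%s'\n", s, tok);
    syntax_errors++;
    return 0;
}
```

With the input "int x ;", peek("int") succeeds without consuming anything, accept("int") moves on to "x", and a later expect("}") would fail and bump syntax_errors.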

the harder part

As you can see from the language grammar, statements and the various expression
types are strongly interconnected. This means we have to write all the parser
functions at once, keeping the recursion in mind. Let’s go from top to bottom
again. Here’s our top-level compiler() function:

It reads a type name, then an identifier. If that is followed by a semicolon,
it’s a variable declaration. If it’s followed by a paren, it’s a function.
For a function, it scans the arguments one by one. If the argument list is not
followed by a semicolon, it’s a definition (a function with a body); otherwise
it’s just a declaration (only the function name and prototype).

Here, typename() is a function that just skips a valid type name. We accept
only int and char and pointers to them (such as char *):
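
A minimal sketch of such a type-name skipper, assuming the tokens are already available as a NULL-terminated array of strings. The toks array, the pos cursor and parser_init() are hypothetical, and peek()/accept() are simplified to work on that array rather than on a real lexer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token stream: a NULL-terminated array of strings and a cursor. */
static const char **toks;
static int pos;

static void parser_init(const char **t) {
    assert(t != NULL);
    toks = t;
    pos = 0;
}

/* Non-zero if the next token exists and equals s. */
static int peek(const char *s) {
    return toks[pos] != NULL && strcmp(toks[pos], s) == 0;
}

/* Consume the next token if it equals s. */
static int accept(const char *s) {
    if (!peek(s)) return 0;
    pos++;
    return 1;
}

/* Skip "int" or "char" followed by any number of "*".
 * Returns 1 if a type name was skipped, 0 (consuming nothing) otherwise. */
static int typename(void) {
    if (!accept("int") && !accept("char")) return 0;
    while (accept("*"))
        ;                    /* each "*" adds one level of pointer */
    return 1;
}
```

For the tokens char * * s ; this skips the first three tokens and stops at s. Returning 0 without consuming anything lets the caller use typename() as a test for whether a declaration starts here.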

The most interesting part is the statement() function. It parses a single
statement, which can be a block, a local variable definition/declaration,
a return statement, etc. Here is how it should look:

So, if it’s a block { .. }, just read statements until the end of the block is
reached. If it starts with a type name, it’s a local variable. Conditional
statements (“if/then/else”) and loops are just stubs for now. Think of how you
would implement them according to the grammar we use.

Anyway, most statements contain expressions. So we need a function that parses
an expression. The expression parser is a recursive descent parser: a set of
functions that call each other recursively until a primary expression is
found. A primary expression, as we can see from the grammar, is a number
(constant) or an identifier (variable or function).

It’s a big piece of code, but don’t be afraid - it’s really simple.
Every function that parses an expression type first calls the parser for the
next higher-precedence expression. Then, if the expected operator is found,
it calls that higher-precedence parser again. Now it has parsed both
parts of a binary expression (like x+y, or x&y, or x==y), so it can perform
the operation and return. Some expressions can be “chained” (like a+b+c+d), so
we parse them with loops.
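
To make the pattern concrete, here is a minimal sketch with only three precedence levels. The tiny grammar in the comments, the function names (primary_expr(), term_expr(), sum_expr(), eq_expr(), to_postfix()) and the emit() buffer are all assumptions made for illustration, not the grammar or code of the real compiler. Each emit() call records an operand or operator at the moment it has been fully parsed.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *p;        /* hypothetical: cursor into the source text */
static char out[256];        /* hypothetical: what the parser has recorded */
static int errors;           /* hypothetical: count of missing operands */

/* Append n characters of s to out, separated by a space. */
static void emit(const char *s, size_t n) {
    size_t len = strlen(out);
    assert(len + n + 2 < sizeof out);      /* toy fixed-size buffer */
    if (len > 0) out[len++] = ' ';
    memcpy(out + len, s, n);
    out[len + n] = '\0';
}

/* Consume operator op if it comes next (after optional spaces). */
static int accept_op(const char *op) {
    size_t n = strlen(op);
    while (isspace((unsigned char)*p)) p++;
    if (strncmp(p, op, n) != 0) return 0;
    p += n;
    return 1;
}

/* primary: number | identifier */
static void primary_expr(void) {
    const char *start;
    while (isspace((unsigned char)*p)) p++;
    start = p;
    while (isalnum((unsigned char)*p) || *p == '_') p++;
    if (p > start) emit(start, (size_t)(p - start));
    else errors++;                         /* operand expected */
}

/* term: primary { "*" primary } */
static void term_expr(void) {
    primary_expr();
    while (accept_op("*")) { primary_expr(); emit("*", 1); }
}

/* sum: term { ("+" | "-") term }   (the loop handles chains like a+b+c+d) */
static void sum_expr(void) {
    term_expr();
    for (;;) {
        if (accept_op("+")) { term_expr(); emit("+", 1); }
        else if (accept_op("-")) { term_expr(); emit("-", 1); }
        else break;
    }
}

/* equality: sum [ "==" sum ] */
static void eq_expr(void) {
    sum_expr();
    if (accept_op("==")) { sum_expr(); emit("==", 2); }
}

/* Parse one expression and return what was recorded along the way. */
static const char *to_postfix(const char *text) {
    p = text;
    out[0] = '\0';
    errors = 0;
    eq_expr();
    return out;
}
```

Calling to_postfix("a + b * c") yields "a b c * +": sum_expr() asks term_expr() for its right-hand side, so the multiplication is recorded before the addition.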

We put debug output after every expression parser function. This will give us
an interesting result. For example, if we parse this piece of code:

All our expressions are written in postfix form (instead of 2+3 it’s 2 3
+). This is a natural form for stack machines: the operands are placed on
the stack, then an operation pops the operands, processes them and pushes
the result back on the stack.
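
To see why postfix suits a stack machine, here is a minimal sketch of an evaluator for space-separated postfix strings. The function name eval_postfix(), the fixed stack size and the restriction to integers with +, - and * are assumptions for illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STACK_MAX 64

/* Evaluate a space-separated postfix string of integers and + - * operators. */
static int eval_postfix(const char *code) {
    int stack[STACK_MAX];
    int sp = 0;                             /* number of values on the stack */
    char buf[256];
    char *t;

    assert(strlen(code) < sizeof buf);
    strcpy(buf, code);                      /* strtok() modifies its input */
    for (t = strtok(buf, " "); t != NULL; t = strtok(NULL, " ")) {
        if (strcmp(t, "+") == 0 || strcmp(t, "-") == 0 || strcmp(t, "*") == 0) {
            int a, b;
            assert(sp >= 2);                /* an operator needs two operands */
            b = stack[--sp];                /* right operand is on top */
            a = stack[--sp];
            stack[sp++] = (*t == '+') ? a + b : (*t == '-') ? a - b : a * b;
        } else {
            assert(sp < STACK_MAX);
            stack[sp++] = atoi(t);          /* operand: push it */
        }
    }
    assert(sp == 1);                        /* exactly one result remains */
    return stack[0];
}
```

For "2 3 +" the machine pushes 2, pushes 3, then the + pops both and pushes 5. No parentheses or precedence rules are needed at this stage, because the parser has already encoded the evaluation order in the order of the tokens.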

Though it might not be an optimal architecture for most modern CPUs, which are
register-based, it’s still very simple and fits our compiler needs.

symbols

Ok, we are good. We’ve got a lexer and a parser in less than 300 lines of code.
What we need now is to add some functions to work with symbols (like
variable names or functions). A compiler should have a symbol table to
quickly find their addresses, so when you write “i = 0”, it means: put zero
into the location at address 0x1234 in RAM (if symbol “i” has address 0x1234 in
memory).
Also, when you call “func()”, it means: jump to address 0x5678 (if symbol “func”
has the value 0x5678).

We use the following structure for symbols:

struct sym {
    char type;
    int addr;
    char name[];
};

Here type has a special meaning. We use single-letter codes to identify the
symbol type:

L - local variable. addr stores the variable’s location on the stack.

A - function argument. addr also stores the location on the stack.

U - undefined global variable. addr stores the absolute address in RAM.

D - defined global variable. Same as above.

So far, I’ve added two functions: sym_find(char *s) to find a symbol by its
name, and sym_declare() to add a new symbol.
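
Here is a minimal sketch of what these two functions might look like. The parameter list of sym_declare(), the fixed-size table of pointers, the MAXSYMBOLS limit and the newest-first search order are all assumptions; only struct sym and the sym_find(char *s) signature come from the text above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sym {
    char type;      /* 'L', 'A', 'U' or 'D' */
    int addr;
    char name[];    /* flexible array member: the name is stored inline */
};

#define MAXSYMBOLS 256                  /* hypothetical fixed capacity */
static struct sym *table[MAXSYMBOLS];   /* hypothetical storage */
static int nsyms;

/* Return the most recently declared symbol called s, or NULL if none. */
static struct sym *sym_find(char *s) {
    int i;
    for (i = nsyms - 1; i >= 0; i--)    /* newest first, so locals shadow globals */
        if (strcmp(table[i]->name, s) == 0)
            return table[i];
    return NULL;
}

/* Add a symbol; returns NULL if the table is full or allocation fails.
 * The (name, type, addr) parameters are an assumed signature. */
static struct sym *sym_declare(char *name, char type, int addr) {
    struct sym *s;
    assert(name != NULL);
    if (nsyms >= MAXSYMBOLS) return NULL;
    s = malloc(sizeof *s + strlen(name) + 1);   /* header + name + '\0' */
    if (s == NULL) return NULL;
    s->type = type;
    s->addr = addr;
    strcpy(s->name, name);
    table[nsyms++] = s;
    return s;
}
```

After sym_declare("i", 'U', 0x1234), a call to sym_find("i") returns a symbol whose addr is 0x1234. A linear search is slow for big programs, but it keeps the code short, and searching from the newest entry gives simple shadowing for free.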