Robert Myers <rmyers1400@attbi.com> wrote:
>I find myself in the position of needing to know more about compilers
>than I thought I would ever want to know.

Compiling has several conceptual phases, as follows:

1. Syntax analysis: Does the program follow the grammatical rules of
the language? Consider English:

The fat lady sang.

This is syntactically valid, while "lady the fat sang" isn't.

2. Semantic analysis: Does the sentence have any meaning? Consider:

The airplane stirs the crockpot menacingly.

This is syntactically valid (nouns, verbs, adverbs all in the right
place) but semantically invalid, as it has no meaning.

3. Code generation: Once a program is syntactically and semantically
valid, the compiler has to generate code for it. (An interpreter
instead executes some internal representation of the program
directly.)

Lexical analysis ("lexing", "scanning") is the process of grouping
input characters into "words" (known as "tokens" in the technical
jargon) of the language. For natural language, you do this process
essentially automatically, but if you've ever worked with a child
learning to read, you can see that it *is* a process.
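
To make the grouping of characters into tokens concrete, here is a
minimal scanner sketch in Python. The token names are the ones used in
the grammar examples in this post; the regular expressions (for
instance, what counts as an identifier) and the function name
tokenize are my own assumptions for illustration, not part of any
particular language or tool.

```python
import re

# Hypothetical token patterns, chosen only for illustration.
TOKEN_SPEC = [
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("MULT", r"\*"),
    ("DIV", r"/"),
    ("MOD", r"%"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),  # whitespace separates tokens but is not one itself
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Group the characters of text into (token_name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for name, lexeme in tokenize("total + count;"):
        print(name, repr(lexeme))
```

Note that a token such as IDENTIFIER covers many characters, while
SEMICOLON is always exactly one.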

Parsing is the syntax analysis phase: are the incoming "words" in the
right order? Do they form a valid sentence? Parsing is driven by a
grammar, which specifies how sentences (statements, in programming
languages) are built up from tokens.

Terminals in grammars are the names for tokens. For example, you might
have a rule:

expression := IDENTIFIER operator IDENTIFIER SEMICOLON

Here, all the uppercase words are terminals or tokens; the scanner
returns something to the parser indicating it has seen the given
token, which may consist of a single character, or of multiple
characters.

You then need another rule (known in the jargon as a "production") that
describes what an operator is:

operator := PLUS | MINUS | MULT | DIV | MOD

The vertical bar means "or".
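
One common way to turn rules like these into a program is to write
one function per rule (a "recursive descent" parser). The sketch below
does that for the two rules above. It assumes the scanner has already
run and hands over a plain list of token names; the function names
(expect, parse_operator, parse_expression, is_valid) are made up for
this example.

```python
OPERATORS = {"PLUS", "MINUS", "MULT", "DIV", "MOD"}


def expect(tokens, pos, name):
    """Consume the terminal `name` at position pos, or fail."""
    if pos < len(tokens) and tokens[pos] == name:
        return pos + 1
    found = tokens[pos] if pos < len(tokens) else "end of input"
    raise SyntaxError(f"expected {name}, found {found}")


def parse_operator(tokens, pos):
    """operator := PLUS | MINUS | MULT | DIV | MOD"""
    if pos < len(tokens) and tokens[pos] in OPERATORS:
        return pos + 1
    found = tokens[pos] if pos < len(tokens) else "end of input"
    raise SyntaxError(f"expected an operator, found {found}")


def parse_expression(tokens, pos=0):
    """expression := IDENTIFIER operator IDENTIFIER SEMICOLON"""
    pos = expect(tokens, pos, "IDENTIFIER")
    pos = parse_operator(tokens, pos)  # non-terminal: call its own function
    pos = expect(tokens, pos, "IDENTIFIER")
    pos = expect(tokens, pos, "SEMICOLON")
    return pos


def is_valid(tokens):
    """True if the whole token list is exactly one expression."""
    try:
        return parse_expression(tokens) == len(tokens)
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(is_valid(["IDENTIFIER", "PLUS", "IDENTIFIER", "SEMICOLON"]))
    print(is_valid(["IDENTIFIER", "IDENTIFIER", "PLUS", "SEMICOLON"]))
```

The second call is the programming-language version of "lady the fat
sang": all the words are legitimate, but they are in the wrong order.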

It is conventional that terminals be spelled in upper case and
non-terminals in lower (or mixed) case, but that is only convention.
Non-terminals are exactly that, not terminal: they can expand into
other sequences of terminals and/or non-terminals.

At some point, the grammar must make it possible for the non-terminals
to expand into a final sequence of terminals.
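
The expansion can be shown mechanically. The following sketch stores
the two rules above in a dictionary and keeps replacing the leftmost
non-terminal until only terminals are left. The names GRAMMAR and
derive, and the default of always picking the first alternative, are
assumptions made for this illustration.

```python
# Each non-terminal maps to its list of alternatives (the "|" choices).
GRAMMAR = {
    "expression": [["IDENTIFIER", "operator", "IDENTIFIER", "SEMICOLON"]],
    "operator": [["PLUS"], ["MINUS"], ["MULT"], ["DIV"], ["MOD"]],
}


def derive(symbols, choose=lambda alternatives: alternatives[0]):
    """Expand the leftmost non-terminal repeatedly; return every step."""
    current = list(symbols)
    steps = [current]
    while True:
        for i, sym in enumerate(current):
            if sym in GRAMMAR:  # a non-terminal: replace it by one alternative
                current = current[:i] + list(choose(GRAMMAR[sym])) + current[i + 1:]
                steps.append(current)
                break
        else:
            return steps  # nothing left to expand: only terminals remain


if __name__ == "__main__":
    for step in derive(["expression"]):
        print(" ".join(step))
```

If a rule could only ever expand into something containing itself,
this loop would never finish, which is the practical reason the
grammar has to bottom out in terminals.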