The parser has to check the query string (which arrives as
plain ASCII text) for valid syntax. If the syntax is correct a
parse tree is built up and handed
back; otherwise an error is returned. The parser and lexer are
implemented using the well-known Unix tools yacc and lex.

The lexer is defined in the file
scan.l and is responsible for
recognizing identifiers, the SQL key words etc. For every key word or
identifier that is found, a token is
generated and handed to the parser.

The parser is defined in the file gram.y and consists of a set of grammar rules and actions that are executed whenever a rule is
fired. The code of the actions (which is actually C code) is
used to build up the parse tree.

The file scan.l is transformed to
the C source file scan.c using the
program lex and gram.y is transformed to gram.c using yacc. After these transformations have
taken place a normal C compiler can be used to create the
parser. Never make any changes to the generated C files as they
will be overwritten the next time lex or yacc is called.

Note: The mentioned transformations and
compilations are normally done automatically using the
makefiles shipped with the
PostgreSQL source
distribution.

A detailed description of yacc or the grammar rules given in
gram.y would be beyond the scope of
this paper. There are many books and documents dealing with
lex and yacc. You should be familiar with
yacc before you start to study
the grammar given in gram.y otherwise
you won't understand what happens there.

The parser stage creates a parse tree using only fixed rules
about the syntactic structure of SQL. It does not make any
lookups in the system catalogs, so there is no possibility to
understand the detailed semantics of the requested operations.
After the parser completes, the transformation process takes the tree handed
back by the parser as input and does the semantic
interpretation needed to understand which tables, functions,
and operators are referenced by the query. The data structure
that is built to represent this information is called the
query tree.

The reason for separating raw parsing from semantic analysis
is that system catalog lookups can only be done within a
transaction, and we do not wish to start a transaction
immediately upon receiving a query string. The raw parsing
stage is sufficient to identify the transaction control
commands (BEGIN, ROLLBACK, etc), and these can then be correctly
executed without any further analysis. Once we know that we are
dealing with an actual query (such as SELECT or UPDATE), it
is okay to start a transaction if we're not already in one.
Only then can the transformation process be invoked.

The query tree created by the transformation process is
structurally similar to the raw parse tree in most places, but
it has many differences in detail. For example, a FuncCall node in the parse tree represents
something that looks syntactically like a function call. This
may be transformed to either a FuncExpr or Aggref node depending on whether the
referenced name turns out to be an ordinary function or an
aggregate function. Also, information about the actual data
types of columns and expression results is added to the query
tree.