The python grammar is built at startup from the pristine CPython grammar file
(see pypy/interpreter/pyparser/metaparser.py). The grammar builder first
represents the grammar as rules corresponding to a set of Nondeterministic
Finite Automatons (NFAs). It then converts them to a set of Deterministic
Finite Automatons (DFAs). The difference between a NFA and a DFA is that a NFA
may have several possible next states for any given input while a DFA may only
have one. DFAs are therefore more limiting, but far more efficient to use in
parsing. Finally, the assigns the grammar builder assigns each DFA state a
number and packs them into a list for the parser to use. The final product is
an instance of the Grammar class in pypy/interpreter/pyparser/parser.py.

The workhorse of the parser is the add_token method of the Parser class.
It tries to find a transition from the current state to another state based on
the token it receives as a argument. If it can’t find a transition, it checks
if the current state is accepting. If it’s not, a ParseError is
raised. When parsing is done without error, the parser has built a tree of
Node.

The glue code between the tokenizer and the parser as well as extra Python
specific code is in pypy/interpreter/pyparser/pyparse.py. The
parse_source method takes a string of Python code and returns the parse
tree. It also detects the coding cookie if there is one and decodes the source.
Note that __future__ imports are handled before the parser is invoked by
manually parsing the source in pypy/interpreter/pyparser/future.py.

Before writing bytecode, a symbol table is built in
pypy/interpreter/astcompiler/symtable.py. It determines if every name in the
source is local, implicitly global (no global declaration), explicitly global
(there’s a global declaration of the name in the scope), a cell (the name in
used in nested scopes), or free (it’s used in a nested function).

Bytecode is emitted in pypy/interpreter/astcompiler/codegen.py. Each
bytecode is represented temporarily by the Instruction class in
pypy/interpreter/astcompiler/assemble.py. After all bytecodes have been
emitted, it’s time to build the code object. Jump offsets and bytecode
information like the line number table and stack depth are computed. Finally,
everything is passed to a brand new PyCode object.