Compiler Construction/Semantic Analysis

This is roughly the equivalent of checking that some ordinary text written in a natural language (e.g. English) actually means something (whether or not that is what it was intended to mean).

The purpose of semantic analysis is to check that we have a meaningful sequence of tokens. Note that a sequence can be meaningful without being correct; in most programming languages, the phrase "x + 1" would be considered to be a meaningful arithmetic expression. However, if the programmer really meant to write "x - 1", then it is not correct.

Semantic analysis is the activity of a compiler to determine what the types of various values are, how those types interact in expressions, and whether those interactions are semantically reasonable. For instance, you can't reasonably multiply a string by class name, although no editor will stop you from writing

"abc" * MyClass

To do this, the compiler must first identify declarations and scopes, and typically records the result of this step in a set of symbol tables. This tells it what specific identifiers means in specific contexts. It must also determine the types of various literal constants; "abc" is a different type than 12.2e-5.

Then it must visit all locations where identifiers and literals are used, and verify that the use of the identifier/literal, and the results computed, are compatible with the language definition (as in the above example).

As to how this is done: typically the source code is parsed, some representation of the program is constructed (syntax trees are very popular), and that representation is walked ("visited") element by element to collect/validate the semantic information. The symbol table is usually just a set of hash tables associated with the syntax tree representing a scope, hashing from identifiers to structures containing type declarations.

In a statically typed language, immediately following the parsing phase is the type checking phase. This attempts to catch programming errors based on the theory of types. In practice this is checking things like a variable declared as a string is not used in an expression requiring an integer.

In a dynamically typed language no type checking is performed (it is actually deferred until runtime).