Compiler Construction
Part – A
UNIT – 1
Chapter 1
1. Introduction
1.1 Language Processors
A compiler is a program that can read a program in one language (source language) and
translate it into an equivalent program in another language (target language). If the target
program is an executable machine-language program, it can then be called by the user to
process inputs and produce outputs.
An interpreter is another common kind of language processor. Instead of producing a target
program as a translation, an interpreter appears to directly execute the operations specified in
the source program on inputs supplied by the user.
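For instance, here is a minimal interpreter sketch in Python; the ("op", left, right) tuple encoding of expressions and the variable environment are illustrative assumptions, not part of any particular system:

    import operator

    ops = {"+": operator.add, "*": operator.mul}

    def evaluate(node, env):
        # Directly execute the operations specified by the source
        # expression, instead of translating it to a target program.
        if isinstance(node, (int, float)):
            return node                      # literal
        if isinstance(node, str):
            return env[node]                 # identifier: look up its value
        op, left, right = node
        return ops[op](evaluate(left, env), evaluate(right, env))

    print(evaluate(("+", "initial", ("*", "rate", 60)),
                   {"initial": 0.0, "rate": 0.5}))   # 30.0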
The machine-language target program produced by a compiler is usually much faster than an
interpreter at mapping inputs to outputs. An interpreter, however, can usually give better error
diagnostics than a compiler, because it executes the source program statement by statement.
A language-processing system typically involves several tools in translating a source program into target machine code: a preprocessor, a compiler, an assembler, and a linker/loader.
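For example, with a typical C toolchain such as GCC, these stages can be invoked one at a time (the file names are illustrative):

    gcc -E prog.c -o prog.i   # preprocessor: expand #include and macros
    gcc -S prog.i -o prog.s   # compiler proper: translate to assembly
    gcc -c prog.s -o prog.o   # assembler: produce relocatable machine code
    gcc prog.o -o prog        # linker/loader: produce the executable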
1.2 The Structure of a Compiler
Analysis: source program to intermediate representation (front end)
Synthesis: intermediate representation to target program (back end)
The analysis part breaks up the source program into constituent pieces and imposes a
grammatical structure on them. It then uses this structure to create an intermediate
representation of the source program. If the analysis part detects that the source program is
either syntactically ill formed or semantically unsound, then it must provide informative
messages, so the user can take corrective action. The analysis part also collects information
about the source program and stores it in a data structure called a symbol table, which is
passed along with the intermediate representation to the synthesis part.
The synthesis part constructs the desired target program from the intermediate representation
and the information in the symbol table. The analysis part is often called the front end of the
compiler; the synthesis part is the back end.
The phases of a compiler are: lexical analyzer (scanning, or linear analysis), syntax analyzer (parsing, or hierarchical analysis), semantic analyzer, intermediate code generator, machine-independent code optimizer, code generator, and machine-dependent code optimizer.
Symbol-table manager and error handler are two independent modules that interact with all phases of compilation. A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. When the lexical analyzer detects an identifier in the source program, the identifier is entered into the symbol table.
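A minimal sketch of such a table in Python, assuming a dictionary keyed by identifier; the attribute fields shown are illustrative:

    symbol_table = {}

    def enter(identifier, **attributes):
        # Called when the lexical analyzer first sees an identifier;
        # later phases fill in or read attributes such as its type.
        if identifier not in symbol_table:
            symbol_table[identifier] = dict(attributes)

    enter("rate", first_line=1)
    enter("rate", first_line=3)              # already present: record kept
    symbol_table["rate"]["type"] = "float"   # e.g. set by semantic analysis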
Each phase can encounter errors. After detecting an error, a phase must somehow deal with that error so that compilation can proceed, allowing further errors in the source program to be detected.
The first phase of a compiler is called lexical analysis or scanning. The lexical analyzer reads
the stream of characters making up the source program and groups the characters into
meaningful sequences called lexemes. For each lexeme, the lexical analyzer produces as
output a token of the form
(token-name, attribute-value)
that it passes on to the subsequent phase, syntax analysis. In the token, the first component
token-name is an abstract symbol that is used during syntax analysis, and the second
component attribute-value points to an entry in the symbol table for this token. Information
from the symbol-table entry is needed for semantic analysis and code generation.
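A toy scanner sketch in Python; the token set and the sample statement position = initial + rate * 60 are illustrative assumptions, not a fixed part of any language:

    import re

    # Illustrative token set: identifiers, integers, and the operators = + *.
    token_spec = [
        ("id",     r"[A-Za-z_][A-Za-z0-9_]*"),
        ("number", r"[0-9]+"),
        ("assign", r"="),
        ("plus",   r"\+"),
        ("times",  r"\*"),
        ("skip",   r"\s+"),
    ]
    master = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in token_spec))

    def scan(source, symtab):
        # Group the character stream into lexemes and emit
        # (token-name, attribute-value) pairs.
        for m in master.finditer(source):
            kind, lexeme = m.lastgroup, m.group()
            if kind == "skip":
                continue
            if kind == "id":
                symtab.setdefault(lexeme, {})    # enter identifier
                yield ("id", lexeme)             # attribute names its entry
            elif kind == "number":
                yield ("number", int(lexeme))
            else:
                yield (kind, None)               # operators carry no attribute

    symtab = {}
    print(list(scan("position = initial + rate * 60", symtab)))
    # [('id', 'position'), ('assign', None), ('id', 'initial'),
    #  ('plus', None), ('id', 'rate'), ('times', None), ('number', 60)]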
The second phase of the compiler is syntax analysis or parsing. The parser uses the first
components of the tokens produced by the lexical analyzer to create a tree-like intermediate
representation that depicts the grammatical structure of the token stream. A typical
representation is a syntax tree in which each interior node represents an operation and the
children of the node represent the arguments of the operation.
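For the statement scanned above, such a tree can be sketched with nested ("op", left, right) triples, an illustrative encoding in which leaves are plain strings and * binds tighter than +:

    tree = ("=", "position",
                 ("+", "initial",
                       ("*", "rate", "60")))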
The semantic analyzer uses the syntax tree and the information in the symbol table to check
the source program for semantic consistency with the language definition. It also gathers type
information and saves it in either the syntax tree or the symbol table, for subsequent use
during intermediate-code generation.
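A sketch of one such check over the tuple trees above, with illustrative type rules: identifiers take the type recorded in the symbol table, digit strings are int, and mixing int with float widens the result to float:

    # Symbol-table types, simplified here to a name -> type mapping.
    symtab = {"position": "float", "initial": "float", "rate": "float"}

    def typeof(node):
        if isinstance(node, str):                  # leaf
            return "int" if node.isdigit() else symtab[node]
        op, left, right = node
        lt, rt = typeof(left), typeof(right)
        # A mixed int/float operation would get a conversion such as
        # inttofloat inserted during intermediate-code generation.
        return "float" if "float" in (lt, rt) else "int"

    print(typeof(("*", "rate", "60")))   # float: 60 needs a conversion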
In the process of translating a source program into target code, a compiler may construct one
or more intermediate representations, which can have a variety of forms. Syntax trees are a
form of intermediate representation; they are commonly used during syntax and semantic
analysis.
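One widely used form (not shown above) is three-address code, in which each instruction applies at most one operator. Here is a sketch that flattens the tuple trees into it; the temporary names t1, t2, ... are illustrative:

    def gen(node, code, counter=[0]):
        # Post-order walk: emit code for the operands, then the operator.
        if isinstance(node, str):
            return node
        op, left, right = node
        l, r = gen(left, code, counter), gen(right, code, counter)
        counter[0] += 1                     # mutable default as a counter
        t = f"t{counter[0]}"
        code.append(f"{t} = {l} {op} {r}")
        return t

    code = []
    gen(("+", "initial", ("*", "rate", "60")), code)
    print(code)   # ['t1 = rate * 60', 't2 = initial + t1']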
The machine-independent code-optimization phase attempts to improve the intermediate code
so that better target code will result. Usually "better" means faster, but other objectives may be
desired, such as shorter code, or target code that consumes less power.
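One simple machine-independent improvement is constant folding: instructions whose operands are all known at compile time are evaluated by the compiler itself. A sketch over the three-address form above:

    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

    def fold(code):
        consts, out = {}, []
        for instr in code:
            dest, expr = instr.split(" = ")
            a, op, b = (consts.get(p, p) for p in expr.split())
            if a.isdigit() and b.isdigit():          # both operands known
                consts[dest] = str(ops[op](int(a), int(b)))
            else:
                out.append(f"{dest} = {a} {op} {b}")
        return out

    print(fold(["t1 = 2 * 30", "t2 = rate * t1"]))   # ['t2 = rate * 60']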
The code generator takes as input an intermediate representation of the source program and
maps it into the target language. If the target language is machine code, registers or memory
locations are selected for each of the variables used by the program. Then, the intermediate
instructions are translated into sequences of machine instructions that perform the same task.
A crucial aspect of code generation is the judicious assignment of registers to hold variables.
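A naive sketch that maps the three-address instructions above onto a hypothetical load/store target; the opcode names are illustrative, and allocating one fresh register per result is the simplest possible assignment (a real code generator reuses registers carefully):

    opcodes = {"+": "ADD", "*": "MUL"}

    def emit(code):
        target = []
        for i, instr in enumerate(code):
            dest, expr = instr.split(" = ")
            a, op, b = expr.split()
            r = f"R{i + 1}"                    # fresh register per result
            target += [f"LD   {r}, {a}",       # load first operand
                       f"{opcodes[op]:4} {r}, {b}",
                       f"ST   {dest}, {r}"]    # store the result
        return target

    for line in emit(["t1 = rate * 60", "t2 = initial + t1"]):
        print(line)
    # LD   R1, rate / MUL  R1, 60 / ST   t1, R1 / LD   R2, initial / ...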
The different compiler-construction tools are: parser generators, scanner generators, syntax-directed translation engines, code-generator generators, data-flow analysis engines, and compiler-construction toolkits.
1.3 The Evolution of Programming Languages
The move to higher-level languages
The first step towards more people-friendly programming languages was the development of
mnemonic assembly languages in the early 1950's. Initially, the instructions in an assembly
language were just mnemonic representations of machine instructions. Later, macro
instructions were added to assembly languages so that a programmer could define
parameterized shorthands for frequently used sequences of machine instructions.
Impacts on compilers
Compilers can help promote the use of high-level languages by minimizing the execution
overhead of the programs written in these languages. Compilers are also critical in making
high-performance computer architectures effective on users' applications. In fact, the
performance of a computer system is so dependent on compiler technology that compilers are
used as a tool in evaluating architectural concepts before a computer is built.
1.4 The Science of Building a Compiler
A compiler must accept all source programs that conform to the specification of the language;
the set of source programs is infinite and any program can be very large, consisting of
possibly millions of lines of code. Any transformation performed by the compiler while
translating a source program must preserve the meaning of the program being compiled.
Compiler writers thus have influence over not just the compilers they create, but all the
programs that their compilers compile. This leverage makes writing compilers particularly
rewarding; however, it also makes compiler development challenging.
Modeling in compiler design and implementation
The study of compilers is mainly a study of how we design the right mathematical models
and choose the right algorithms, while balancing the need for generality and power against
simplicity and efficiency.
The science of code optimization
The term "optimization" in compiler design refers to the attempts that a compiler makes to
produce code that is more efficient than the obvious code. "Optimization" is thus a misnomer,
since there is no way that the code produced by a compiler can be guaranteed to be as fast as or faster than any other code that performs the same task.
Finally, a compiler is a complex system; we must keep the system simple to ensure that the
engineering and maintenance costs of the compiler are manageable. There is an infinite
number of program optimizations that we could implement, and it takes a nontrivial amount
of effort to create a correct and effective optimization. We must prioritize the optimizations,
implementing only those that lead to the greatest benefits on source programs encountered in
practice.
Thus, in studying compilers, we learn not only how to build a compiler, but also the general
methodology of solving complex and open-ended problems. The approach used in compiler
development involves both theory and experimentation. We normally start by formulating the
problem based on our intuitions about what the important issues are.
1.5 Applications of Compiler Technology
Implementation of high-level programming languages
A high-level programming language defines a programming abstraction: the programmer
expresses an algorithm using the language, and the compiler must translate that program to
the target language. Generally, higher-level programming languages are easier to program in, but are less efficient; that is, the target programs run more slowly. Programmers using a low-level language have more control over a computation and can, in principle, produce more efficient code. Unfortunately, lower-level programs are harder to write and, worse still, less portable, more prone to errors, and harder to maintain. Optimizing compilers include techniques to improve the performance of generated code, thus offsetting the inefficiency introduced by high-level abstractions.
Optimizations for computer architectures
The rapid evolution of computer architectures has also led to an insatiable demand for new
compiler technology. Almost all high-performance systems take advantage of the same two
basic techniques: parallelism and memory hierarchies. Parallelism can be found at several
levels: at the instruction level, where multiple operations are executed simultaneously, and at
the processor level, where different threads of the same application are run on different
processors. Memory hierarchies are a response to the basic limitation that we can build very
fast storage or very large storage, but not storage that is both fast and large.
Parallelism: All modern microprocessors exploit instruction-level parallelism. However,
this parallelism can be hidden from the programmer. Programs are written as if all
instructions were executed in sequence; the hardware dynamically checks for
dependencies in the sequential instruction stream and issues them in parallel when
possible. In some cases, the machine includes a hardware scheduler that can change the
instruction ordering to increase the parallelism in the program. Whether the hardware
reorders the instructions or not, compilers can rearrange the instructions to make
instruction-level parallelism more effective.
Memory Hierarchies: A memory hierarchy consists of several levels of storage with
different speeds and sizes, with the level closest to the processor being the fastest but
smallest. The average memory-access time of a program is reduced if most of its accesses
are satisfied by the faster levels of the hierarchy. Both parallelism and the existence of a
memory hierarchy improve the potential performance of a machine, but they must be
harnessed effectively by the compiler to deliver real performance on an application.
Design of new computer architectures
In the early days of computer architecture design, compilers were developed after the
machines were built. That has changed. Since programming in high-level languages is the norm, the performance of a computer system is determined not only by its raw speed but also by
how well compilers can exploit its features. Thus, in modern computer architecture
development, compilers are developed in the processor-design stage, and compiled code,
running on simulators, is used to evaluate the proposed architectural features.
Program translations
While we normally think of compiling as a translation from a high-level language to the
machine level, the same technology can be applied to translate between different kinds of
languages.