Goals of a compiler
- Code produced should be reasonably efficient and concise
- Example: compute the sum 2x1 + 2x2 + 2x3 + 2x4 + ... + 2x50000
  sum = 0.0
  for(i=0;i
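The loop on this slide is cut off in the source, so the following is only a sketch of one plausible reading of "efficient and concise": the same sum computed with one multiplication per element, and again with the multiplication by 2 moved out of the loop. The function names and the list `x` are assumptions made for illustration.

```python
def sum_naive(x):
    """Compute 2*x[0] + 2*x[1] + ... with one multiplication per element."""
    total = 0.0
    for value in x:
        total = total + 2.0 * value
    return total


def sum_factored(x):
    """Same result, but the multiplication by 2 happens once, after the loop."""
    total = 0.0
    for value in x:
        total = total + value
    return 2.0 * total


if __name__ == "__main__":
    x = [float(i) for i in range(1, 50001)]  # hypothetical data for x1..x50000
    print(sum_naive(x), sum_factored(x))
```

Both functions return the same value; the second performs 1 multiplication instead of 50000, which is the kind of saving a compiler would ideally produce on its own.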

General Structure of a Compiler

The Compilation Process
- Phase I: Lexical analysis
  The compiler examines the individual characters in the source program and groups them into syntactical units called tokens
- Phase II: Parsing
  The sequence of tokens formed by the scanner is checked to see whether it is syntactically correct
Diagram: source code -> Scanner -> groups of tokens; groups of tokens -> Parser -> correct / not correct

The Compilation Process
- Phase III: Semantic analysis and code generation
  The compiler analyzes the meaning of the high-level language statement and generates the machine language instructions to carry out these actions

Diagram: groups of tokens -> Code Generator -> machine language

The Compilation Process
- Phase IV: Code optimization
  The compiler takes the generated code and sees whether it can be made more efficient
Diagram: machine language -> Code Optimizer -> machine language

Phase I: Lexical Analysis
- Lexical analyzer
  - The program that performs lexical analysis
  - More commonly called a scanner
- Job of the lexical analyzer
  - Group input characters into tokens
    Tokens: syntactical units that are treated as single, indivisible entities for the purposes of translation
  - Classify tokens according to their type

Phase I: Lexical Analysis
- Input to a scanner: a high-level language statement from the source program
- Scanner's output:
  - A list of all the tokens in that statement
  - The classification number of each token found
Example: the scanner turns sum=sum+a[i]; into
  Token   Classification
  sum     1
  =       3
  sum     1
  +       4
  a       1
  [       12
  i       1
  ]       13
  ;       6
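A minimal scanner sketch for the statement on this slide. It assumes the classification numbers shown there (1 for identifiers, 3 for =, 4 for +, 6 for ;, 12 for [, 13 for ]); the function name `scan` and the restriction to these few symbols are assumptions for illustration, not a full lexical analyzer.

```python
import re

# Classification numbers follow the slide's example; the scheme is illustrative.
SYMBOL_CLASSES = {"=": 3, "+": 4, ";": 6, "[": 12, "]": 13}
IDENTIFIER_CLASS = 1

# An identifier, or one of the single-character symbols above.
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[=+;\[\]]")


def scan(statement):
    """Group characters into tokens and classify each one."""
    tokens = []
    pos = 0
    while pos < len(statement):
        if statement[pos].isspace():
            pos += 1  # whitespace separates tokens but is not a token itself
            continue
        match = TOKEN_PATTERN.match(statement, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {statement[pos]!r} at position {pos}"
            )
        lexeme = match.group(0)
        tokens.append((lexeme, SYMBOL_CLASSES.get(lexeme, IDENTIFIER_CLASS)))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for lexeme, classification in scan("sum=sum+a[i];"):
        print(lexeme, classification)
```

Note that `sum` is kept as one token rather than three characters, which is the "single, indivisible entity" idea from the previous slide.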

Phase II: Parsing
- Parsing phase
  - The compiler determines whether the tokens recognized by the scanner form a syntactically legal statement
  - Performed by a parser

Phase II: Parsing
- Output of a parser
  - A parse tree, if such a tree exists
  - An error message, if a parse tree cannot be constructed
- Successful construction of a parse tree is proof that the statement is correctly formed

Example
High-level language statement: a = b + c
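To make the parser's two possible outputs concrete, here is a small recursive-descent sketch for this statement. The grammar it follows is an assumption made for illustration (it is not taken from the slides): <assignment> ::= <variable> = <expression>, and <expression> ::= <variable> | <variable> + <variable>. The tree is returned as nested tuples, and `None` stands in for the error message.

```python
def parse_variable(tokens, pos):
    """<variable>: any identifier token."""
    if pos < len(tokens) and tokens[pos].isidentifier():
        return ("variable", tokens[pos]), pos + 1
    return None, pos


def parse_expression(tokens, pos):
    """<expression> ::= <variable> | <variable> + <variable>"""
    left, pos = parse_variable(tokens, pos)
    if left is None:
        return None, pos
    if pos < len(tokens) and tokens[pos] == "+":
        right, next_pos = parse_variable(tokens, pos + 1)
        if right is None:
            return None, pos
        return ("expression", left, "+", right), next_pos
    return ("expression", left), pos


def parse_assignment(tokens):
    """<assignment> ::= <variable> = <expression>; returns a tree or None."""
    target, pos = parse_variable(tokens, 0)
    if target is None or pos >= len(tokens) or tokens[pos] != "=":
        return None
    expression, pos = parse_expression(tokens, pos + 1)
    if expression is None or pos != len(tokens):
        return None  # leftover or missing tokens: no parse tree exists
    return ("assignment", target, "=", expression)


if __name__ == "__main__":
    print(parse_assignment("a = b + c".split()))
    print(parse_assignment("a = b +".split()))
```

The first call yields a tree, so the statement is correctly formed under this grammar; the second yields `None`, the case in which a real parser would report a syntax error.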

Grammars, Languages, and BNF
- Syntax: the grammatical structure of the language
- The parser must be given the syntax of the language
- BNF (Backus-Naur Form): the most widely used notation for representing the syntax of a programming language
  literal_expression ::= integer_literal | float_literal | string | character

Grammars, Languages, and BNF
- In BNF, the syntax of a language is specified as a set of rules (also called productions)
- A grammar: the entire collection of rules for a language
- Structure of an individual BNF rule
  left-hand side ::= definition

Grammars, Languages, and BNF
- BNF rules use two types of objects on the right-hand side of a production
- Terminals
  - The actual tokens of the language
  - Never appear on the left-hand side of a BNF rule
- Nonterminals
  - Intermediate grammatical categories used to help explain and organize the language
  - Must appear on the left-hand side of one or more rules
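The terminal/nonterminal distinction can be checked mechanically once a grammar is written down as data. The sketch below stores a small grammar as a dictionary from each left-hand side to its alternatives; the grammar itself and the helper names are assumptions chosen for illustration.

```python
# Hypothetical grammar: each key is a left-hand side, each value lists the
# alternative right-hand sides (the parts separated by | in BNF).
GRAMMAR = {
    "<assignment>": [["<variable>", "=", "<expression>"]],
    "<expression>": [["<variable>"], ["<variable>", "+", "<variable>"]],
    "<variable>": [["a"], ["b"], ["c"]],
}


def nonterminals(grammar):
    """Symbols that appear on the left-hand side of at least one rule."""
    return set(grammar)


def terminals(grammar):
    """Right-hand-side symbols that never appear on a left-hand side."""
    right_hand_symbols = {
        symbol
        for alternatives in grammar.values()
        for alternative in alternatives
        for symbol in alternative
    }
    return right_hand_symbols - set(grammar)


def undefined_nonterminals(grammar):
    """Angle-bracketed symbols used on a right-hand side but never defined."""
    return {s for s in terminals(grammar) if s.startswith("<") and s.endswith(">")}


if __name__ == "__main__":
    print(sorted(nonterminals(GRAMMAR)))
    print(sorted(terminals(GRAMMAR)))
    print(sorted(undefined_nonterminals(GRAMMAR)))
```

An empty result from `undefined_nonterminals` corresponds to the requirement that every nonterminal appear on the left-hand side of one or more rules.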

Grammars, Languages, and BNF
- Goal symbol
  - The highest-level nonterminal
  - The nonterminal object that the parser is trying to produce as it builds the parse tree
- All nonterminals are written inside angle brackets
- Java BNF

Parsing Concepts and Techniques
- Fundamental rule of parsing: by repeated applications of the rules of the grammar,
  - if the parser can convert the sequence of input tokens into the goal symbol, the sequence of tokens is a syntactically valid statement of the language
  - otherwise, the sequence of tokens is not a syntactically valid statement of the language

Parsing Concepts and Techniques
- Look-ahead parsing algorithms: intelligent parsers
- One of the biggest problems in building a compiler is designing a grammar that:
  - Includes every valid statement that we want to be in the language
  - Excludes every invalid statement that we do not want to be in the language

Parsing Concepts and Techniques
- Another problem in constructing a compiler: designing a grammar that is not ambiguous
- An ambiguous grammar allows the construction of two or more distinct parse trees for the same statement
- NOT GOOD: multiple interpretations
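Ambiguity can be demonstrated by counting parse trees. The sketch below assumes a deliberately ambiguous grammar chosen for illustration, <expr> ::= <expr> + <expr> | x, and counts how many distinct trees derive a given token sequence. The function name is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for tokens under <expr> ::= <expr> + <expr> | x."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(start, end):
        # Number of ways tokens[start:end] can be derived from <expr>.
        if end - start == 1:
            return 1 if tokens[start] == "x" else 0
        total = 0
        # Try every + as the top-level operator of <expr> + <expr>.
        for k in range(start + 1, end - 1):
            if tokens[k] == "+":
                total += count(start, k) * count(k + 1, end)
        return total

    return count(0, len(tokens)) if tokens else 0


if __name__ == "__main__":
    print(count_parse_trees("x + x".split()))
    print(count_parse_trees("x + x + x".split()))
```

For x + x + x the count is 2, corresponding to the groupings (x + x) + x and x + (x + x). With addition both readings give the same value, but with an operator such as subtraction the two trees would mean different things, which is why ambiguity is a problem.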

Phase III: Semantics and Code Generation
- Semantic analysis
  - The compiler makes a first pass over the parse tree to determine whether all branches of the tree are semantically valid
  - If they are valid, the compiler can generate machine language instructions
  - Otherwise, there is a semantic error and machine language instructions are not generated
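A sketch of what a semantic pass might check for the statement a = b + c. The tuple layout of the tree, the symbol table mapping names to types, and the specific rules (every variable must be declared, and all types must agree) are assumptions for illustration; real languages define their own rules.

```python
# Hand-written parse tree for a = b + c, using a hypothetical tuple layout.
TREE = ("assignment", ("variable", "a"), "=",
        ("expression", ("variable", "b"), "+", ("variable", "c")))


def check_assignment(tree, symbol_table):
    """Return a list of semantic errors; an empty list means the tree is valid."""
    _, (_, target), _, expression = tree
    operand_names = [node[1] for node in expression[1:] if isinstance(node, tuple)]

    errors = []
    for name in [target] + operand_names:
        if name not in symbol_table:
            errors.append(f"undeclared variable: {name}")
    if errors:
        return errors

    operand_types = {symbol_table[name] for name in operand_names}
    if len(operand_types) > 1:
        errors.append("operands of + have different types")
    elif symbol_table[target] not in operand_types:
        errors.append("type of expression does not match type of target")
    return errors


if __name__ == "__main__":
    print(check_assignment(TREE, {"a": "int", "b": "int", "c": "int"}))
    print(check_assignment(TREE, {"a": "int", "b": "int", "c": "string"}))
```

The statement is syntactically valid in both calls; only the second is rejected, which is the distinction between a syntax error and a semantic error.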

Phase III: Semantics and Code Generation
- Code generation
  - The compiler makes a second pass over the parse tree to produce the translated code
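A sketch of the second pass for the statement a = b + c. It assumes a hypothetical one-register machine with LOAD, ADD, and STORE instructions and a hand-written tuple layout for the parse tree; neither the mnemonics nor the tree layout come from the slides.

```python
# Hand-written parse tree for a = b + c, using a hypothetical tuple layout.
TREE = ("assignment", ("variable", "a"), "=",
        ("expression", ("variable", "b"), "+", ("variable", "c")))


def generate_code(tree):
    """Walk an assignment tree and emit instructions for a one-register machine."""
    _, (_, target), _, expression = tree
    first_operand = expression[1][1]
    code = [f"LOAD {first_operand}"]        # register <- first operand
    if len(expression) == 4:                # ("expression", left, "+", right)
        second_operand = expression[3][1]
        code.append(f"ADD {second_operand}")  # register <- register + second operand
    code.append(f"STORE {target}")          # target <- register
    return code


if __name__ == "__main__":
    for instruction in generate_code(TREE):
        print(instruction)
```

The order of the emitted instructions follows the shape of the tree: the expression subtree is evaluated first, and the assignment at the root becomes the final STORE.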

Phase IV: Code Optimization
- Two types of optimization
  - Local
  - Global
- Local optimization
  - The compiler looks at a very small block of instructions and tries to determine how it can improve the efficiency of this local code block
  - Relatively easy; included as part of most compilers
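One way to picture a local optimization is a pass that inspects two adjacent instructions at a time. The sketch below assumes a hypothetical one-register machine in which STORE leaves the register unchanged, so a LOAD of the variable that was just stored is unnecessary. The instruction names are illustrative, and the sketch ignores complications such as jump targets.

```python
def remove_redundant_loads(code):
    """Drop 'LOAD v' when it immediately follows 'STORE v'."""
    optimized = []
    for instruction in code:
        if optimized:
            prev_op, _, prev_arg = optimized[-1].partition(" ")
            op, _, arg = instruction.partition(" ")
            if prev_op == "STORE" and op == "LOAD" and arg == prev_arg:
                continue  # the value is already in the register
        optimized.append(instruction)
    return optimized


if __name__ == "__main__":
    # Hypothetical code for: a = b + c; e = a + d
    code = ["LOAD b", "ADD c", "STORE a", "LOAD a", "ADD d", "STORE e"]
    for instruction in remove_redundant_loads(code):
        print(instruction)
```

The pass only ever looks at a two-instruction window, which is what makes this kind of optimization comparatively easy to implement.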

Phase IV: Code Optimization
- Global optimization
  - The compiler looks at large segments of the program to decide how to improve performance
  - Much more difficult; usually omitted from all but the most sophisticated and expensive production-level optimizing compilers
- Optimization cannot make an inefficient algorithm efficient; it only makes an efficient algorithm more efficient

Summary
- A compiler is a piece of system software that translates high-level languages into machine language
- Goals of a compiler: correctness and the production of efficient and concise code
- Source program: high-level language program