Transcript

1.
LOVELY PROFESSIONAL UNIVERSITY

Term Paper of System Software (CAP-318)

TOPIC NAME: Source To Source Compiler

2.
SUBMITTED TO: MR. SANDEEP SHARMA
SUBMITTED BY: NAME - VISHAL KEDIA, CLASS - BCA-MCA (INT), R. NO. - RE37D1-A09, REGD NO - 3010070236

ACKNOWLEDGEMENT

It is very difficult for any term paper to be satisfactorily completed without cooperation and the benefit of advice from a large number of persons, whether they are engineers or experts in their field of specialization. We sincerely extend our gratitude to all those who helped us with this term paper; however, this is quite inadequate for the precious time they devoted to me. I am deeply thankful to Mr. Sandeep Sharma, our term paper guide, who gave us his immaculate support and guided us throughout the project work. The term paper would not have been possible without his moral support.

Vishal Kedia

3.
Certificate

This is to certify that this Minor Term Paper titled "Source To Source Compiler" has been submitted in partial fulfillment of the Minor Term Paper requirement in the course System Software. It is further certified that this Term Paper is an original work carried out by VISHAL KEDIA under the continuous guidance of MR. SANDEEP SHARMA, Lecturer, Lovely Professional University.

4.
Contents

1. Introduction (Project Name and Description) ........... 4
   Compiler - Native versus cross compiler ........... 11
   Compiler - One-pass versus multi-pass compilers ........... 11
   Compiler - Compiled versus interpreted languages ........... 12

1. Introduction (Project Name and Description)

Source to Source Compiler

This project is based on the working process of the source-to-source compiler. A source-to-source compiler is a type of compiler that takes a high-level programming language as its input and outputs a high-level language. For example, an automatic parallelizing compiler will frequently take in a high-level language program as input and then transform the code and annotate it with parallel code annotations (e.g., OpenMP) or language constructs (e.g., Fortran's DOALL statements).

Another purpose of source-to-source compiling is translating legacy code to use the next version of the underlying programming language, or an API that breaks backward compatibility. It performs automatic code refactoring, which is useful when the programs to refactor are outside the control of the original implementer (for example, converting programs from Python 2 to Python 3, or converting programs from an old API to a new API), or when the size of the program makes it impractical or time-consuming to refactor by hand.
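As a small illustration of the automatic refactoring described above (this sketch is my addition, not part of the original paper; the function names old_api and new_api are hypothetical), a source-to-source transform can be built with Python's standard ast module: parse the source into a tree, rewrite the tree, and emit new source.

```python
import ast

# A minimal source-to-source transform: rename calls to a hypothetical
# old_api() function to new_api(), leaving all other code untouched.
class RenameApiCalls(ast.NodeTransformer):
    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite any nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "old_api":
            node.func = ast.Name(id="new_api", ctx=ast.Load())
        return node

def translate(source):
    """Parse source, rewrite the AST, and emit new source code."""
    tree = ast.parse(source)
    tree = RenameApiCalls().visit(tree)
    return ast.unparse(tree)  # requires Python 3.9+

print(translate("result = old_api(1, old_api(2))"))
# result = new_api(1, new_api(2))
```

Real tools such as the Python 2-to-3 converter work on the same principle, only with many more rewrite rules.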

5.
What is a Compiler?

A compiler is a special type of computer program that translates a human-readable text file into a form that the computer can more easily understand. At its most basic level, a computer can only understand two things: a 1 and a 0. At this level, a human will operate very slowly and find the information contained in the long string of 1s and 0s incomprehensible. A compiler is a computer program that bridges this gap.

In the beginning, compilers were very simple programs that could only translate symbols into the bits, the 1s and 0s, the computer understood. Programs were also very simple, composed of a series of steps that were originally translated by hand into data the computer could understand. This was a very time-consuming task, so portions of it were automated or programmed, and the first compiler was written. This program assembled, or compiled, the steps required to execute the step-by-step program.

These simple compilers were used to write a more sophisticated compiler. With the newer version, more rules could be added to the compiler program to allow a more natural language structure for the human programmer to work with. This made writing programs easier and allowed more people to begin writing programs. As more people started writing programs, more ideas about writing programs were offered and used to make more sophisticated compilers. In this way, compiler programs continue to evolve, improve, and become easier to use.

6.
Compiler programs can also be specialized. Certain language structures are better suited to a particular task than others, so specific compilers were developed for specific tasks or languages. Some compilers are multistage, or multiple-pass. A first pass could take a very natural language and make it closer to a computer-understandable language. A second or even a third pass could take it to the final stage, the executable file.

The intermediate output in a multistage compiler is usually called pseudo-code, since it is not usable by the computer. Pseudo-code is very structured, like a computer program, not free-flowing and verbose like a more natural language. The final output is called the executable file, since it is what is actually executed or run by the computer. Splitting the task up like this made it easier to write more sophisticated compilers, as each subtask is different. It also made it easier for the computer to point out where it had trouble understanding what it was being asked to do.

Errors that prevent the compiler from understanding a program are called syntax errors. Errors in the way the program functions are called logic errors. Logic errors are much harder to spot and correct. Syntax errors are like spelling mistakes, whereas logic errors are a bit more like grammatical errors.

[Figure: Example of Execution of Java Source Code]
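The distinction between the two error kinds can be sketched in a few lines of Python (my illustration, not from the paper): a syntax error is rejected before the code ever runs, while a logic error compiles cleanly and simply computes the wrong answer.

```python
# A syntax error is caught at compile time, before any code runs.
try:
    compile("if x > 0 print(x)", "<example>", "exec")  # missing colon
except SyntaxError as err:
    print("syntax error:", err.msg)

# A logic error compiles and runs, but gives the wrong answer: this
# function is *meant* to average two numbers, yet the missing
# parentheses make it compute a + (b / 2) instead of (a + b) / 2.
def average(a, b):
    return a + b / 2  # logic error: should be (a + b) / 2

print(average(4, 8))  # prints 8.0, though the intended average is 6.0
```

The compiler flags the first problem for us; the second can only be found by testing the program's behavior.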

7.
What is a Cross Compiler?

8.
Cross-compiler programs have also been developed. A cross compiler allows a text file set of instructions that is written for one computer designed by a specific manufacturer to be compiled and run for a different computer by a different manufacturer. For example, a program that was written to run on an Intel computer can sometimes be cross-compiled to run on a computer developed by Motorola. This frequently does not work very well. At the level at which computer programs operate, the computer hardware can look very different, even if the machines look similar to you.

Cross compilation is different from having one computer emulate another computer. If a computer is emulating a different computer, it is pretending to be that other computer. Emulation is frequently slower than cross compilation, since two programs are running at once: the program that is pretending to be the other computer and the program that is actually being run. However, for cross compilation to work, you need both the original natural-language text that describes the program and a target computer that is sufficiently similar to the original for the program to run on it. This is not always possible, so both techniques are in use.

History of the Compiler

Several experimental compilers were developed in the 1950s (see, for example, the seminal work by Grace Hopper on the A-0 language), but the FORTRAN team led by John Backus at IBM is generally credited with having introduced the first complete compiler, in 1957. COBOL was an early language to be compiled on multiple architectures, in 1960. The idea of compilation quickly caught on, and most of the principles of compiler design were developed during the 1960s.

A compiler is itself a computer program written in some implementation language. Early compilers were written in assembly language.
The first self-hosting compiler, capable of compiling its own source code in a high-level language, was created for Lisp by Hart and Levin at MIT in 1962. The use of high-level languages for writing compilers gained added impetus in the early 1970s when Pascal and C compilers were written in their own languages. Building a self-hosting compiler is a bootstrapping problem: the first such compiler for a language must be compiled either by a compiler written in a different language, or (as in Hart and Levin's Lisp compiler) by running the compiler in an interpreter.

Compiler construction and compiler optimization are taught at universities as part of the computer science curriculum. Such courses are usually supplemented with the implementation of a compiler for an educational programming language. A well-documented example is the PL/0 compiler, which was originally used by Niklaus Wirth for teaching compiler construction in the 1970s. In spite of its simplicity, the PL/0 compiler introduced several concepts to the field which have since become established educational standards: 1. The use of program development by stepwise refinement

9.
 2. The use of a recursive descent parser 3. The use of EBNF to specify the syntax of a language 4. The use of P-code during generation of portable output code 5. The use of T-diagrams for the formal description of the bootstrapping problem

Types of Compiler

A compiler may produce code intended to run on the same type of computer and operating system ("platform") as the compiler itself runs on. This is sometimes called a native-code compiler. Alternatively, it might produce code designed to run on a different platform. This is known as a cross compiler. Cross compilers are very useful when bringing up a new hardware platform for the first time (see bootstrapping). A "source-to-source compiler" is a type of compiler that takes a high-level language as its input and outputs a high-level language. For example, an automatic parallelizing compiler will frequently take in a high-level language program as input and then transform the code and annotate it with parallel code annotations (e.g., OpenMP) or language constructs (e.g., Fortran's DOALL statements).

1. One-pass compiler, like early compilers for Pascal
   o The compilation is done in one pass, hence it is very fast.
2. Threaded-code compiler (or interpreter), like most implementations of FORTH
   o This kind of compiler can be thought of as a database lookup program. It just replaces given strings in the source with given binary code. The level of this binary code can vary; in fact, some FORTH compilers can compile programs that don't even need an operating system.
3. Incremental compiler, like many Lisp systems
   o Individual functions can be compiled in a run-time environment that also includes interpreted functions. Incremental compilation dates back to 1962 and the first Lisp compiler, and is still used in Common Lisp systems.
4.
Stage compiler that compiles to the assembly language of a theoretical machine, like some Prolog implementations
   o This Prolog machine is also known as the Warren Abstract Machine (or WAM). Byte-code compilers for Java, Python (and many more) are also a subtype of this.
5. Just-in-time compiler, used by Smalltalk and Java systems
   o Applications are delivered in byte code, which is compiled to native machine code just prior to execution.
6. A re-targetable compiler is a compiler that can relatively easily be modified to generate code for different CPU architectures. The object code produced by these is frequently of lesser quality

10.
than that produced by a compiler developed specifically for a processor. Re-targetable compilers are often also cross compilers. GCC is an example of a re-targetable compiler.
7. A parallelizing compiler converts a serial input program into a form suitable for efficient execution on a parallel computer architecture.
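The threaded-code idea in item 2 above, compilation as little more than a table lookup that replaces each source word with an operation, can be sketched in Python (my illustration in the FORTH spirit; the word set is invented, and real FORTH systems emit machine-level code rather than Python callables):

```python
# A toy threaded-code compiler: each word in the source is looked up
# in a dictionary and replaced by an operation on a data stack.
def add(stack): stack.append(stack.pop() + stack.pop())
def mul(stack): stack.append(stack.pop() * stack.pop())
def dup(stack): stack.append(stack[-1])

OPS = {"+": add, "*": mul, "dup": dup}

def compile_words(source):
    """'Compile' source text into a list of callable operations."""
    threaded = []
    for word in source.split():
        if word in OPS:
            threaded.append(OPS[word])
        else:  # anything else is treated as a number literal to push
            threaded.append(lambda stack, n=int(word): stack.append(n))
    return threaded

def run(threaded):
    stack = []
    for op in threaded:
        op(stack)
    return stack

print(run(compile_words("2 3 + dup *")))  # [25]
```

Compilation here is a single pass of dictionary lookups, which is why threaded-code compilers can be so simple and fast.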

11.
Compiler - Native versus cross compiler

Most compilers are classified as either native compilers or cross compilers.

A compiler may produce binary output intended to run on the same type of computer and operating system ("platform") as the compiler itself runs on. This is sometimes called a native-code compiler. Alternatively, it might produce binary output designed to run on a different platform. This is known as a cross compiler. Cross compilers are very useful when bringing up a new hardware platform for the first time (see bootstrapping). Cross compilers are necessary when developing software for microcontroller systems that have barely enough storage for the final machine code, much less a compiler. Compilers which are capable of producing both native and foreign binary output may be called either a cross compiler or a native compiler depending on the specific use, although it would be more correct to classify them as cross compilers.

Interpreters are never classified as native or cross compilers, because they don't output a binary representation of their input code.

Virtual machine compilers are typically not classified as either native or cross compilers. However, if need be, they can be classified as one or the other, especially in the less usual cases where a compiler is running inside the same VM (making it a native compiler), or where a compiler is capable of producing output for several different platforms, including a VM (making it a cross compiler).

Compiler - One-pass versus multi-pass compilers

All compilers are either one-pass or multi-pass.
1. One-pass compilers, like early compilers for the Pascal programming language.
   o The compilation is done in one pass over the program source, hence the compilation is completed very quickly.
2. Multi-pass compilers, like 2-pass compilers or 3-pass compilers
   o The compilation is done step by step. Each step uses the result of the previous step and creates another intermediate result.
This can improve final performance at the cost of compilation speed.

While the typical multi-pass compiler outputs machine code from its final pass, there are several other types:
• A "source-to-source compiler" is a type of compiler that takes a high-level language as its input and outputs a high-level language. For example, an automatic parallelizing compiler will frequently take in a high-level language program as input and then transform the code and annotate it with parallel code annotations (e.g., OpenMP) or language constructs (e.g., Fortran's DOALL statements).

12.
• Stage compiler that compiles to the assembly language of a theoretical machine, like some Prolog implementations
   o This Prolog machine is also known as the Warren Abstract Machine (or WAM). Byte-code compilers for Java, Python (and many more) are also a subtype of this.
• Just-in-time compiler, used by Smalltalk and Java systems, and also by Microsoft .NET's Common Intermediate Language (CIL)
   o Applications are delivered in byte code, which is compiled to native machine code just prior to execution.

Compiler - Compiled versus interpreted languages

Many people divide higher-level programming languages into compiled languages and interpreted languages. However, there is rarely anything about a language that requires it to be compiled or interpreted. Compilers and interpreters are implementations of languages, not languages themselves. The categorization usually reflects the most popular or widespread implementations of a language; for instance, BASIC is thought of as an interpreted language and C a compiled one, despite the existence of BASIC compilers and C interpreters.

There are exceptions; some language specifications assume the use of a compiler (as with C), or spell out that implementations must include a compilation facility (as with Common Lisp). Some languages have features that are very easy to implement in an interpreter but make writing a compiler much harder; for example, SNOBOL4 and many scripting languages are capable of constructing arbitrary source code at runtime with regular string operations, and then executing that code by passing it to a special evaluation function.

Compiler - Compiler Design

In the past, compilers were divided into many passes to save space. A pass in this context is a run of the compiler through the source code of the program to be compiled, resulting in the building up of the internal data of the compiler (such as the evolving symbol table and other assisting data).
When each pass is finished, the compiler can free the internal data space needed during that pass. This multi-pass method of compiling was the common compiler technology at the time, but was also due to the small main memories of host computers relative to the source code and data.

Many modern compilers share a common two-stage design. The front end translates the source language into an intermediate representation. The second stage is the back end, which works with the internal representation to produce code in the output language. The front end and back end may operate as separate passes, or the front end may call the back end as a subroutine, passing it the intermediate representation.

This approach mitigates complexity by separating the concerns of the front end, which typically revolve around language semantics, error checking, and the like, from the concerns of the back end, which concentrates on producing output that is both efficient and correct. It also has the advantage of allowing the use of a single back end for multiple source languages, and similarly allows the use of different back ends for different targets.
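The two-stage design just described can be sketched in miniature (my illustration, not from the paper; the one-statement source language and the three-instruction stack machine are both invented for the example): the front end produces an intermediate representation, and a separate back end turns that IR into target text.

```python
# Front end: translate a tiny source language (single additions such
# as "x = 1 + 2") into a list-of-tuples intermediate representation.
def front_end(source):
    target, expr = source.split("=")
    left, right = expr.split("+")
    return [("load", int(left)),
            ("load", int(right)),
            ("add",),
            ("store", target.strip())]

# Back end: walk the same IR and emit text for a hypothetical stack
# machine; a different back end could target another machine while
# reusing the front end unchanged, as described above.
def back_end(ir):
    lines = []
    for instr in ir:
        lines.append(" ".join(str(part) for part in instr))
    return "\n".join(lines)

print(back_end(front_end("x = 1 + 2")))
# load 1
# load 2
# add
# store x
```

The IR is the only interface between the two halves, which is what makes mixing front ends and back ends possible.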

13.
Often, optimizers and error checkers can be shared by both front ends and back ends if they are designed to operate on the intermediate language that a front end passes to a back end. This can let many compilers (combinations of front and back ends) reuse the large amount of work that often goes into code analyzers and optimizers.

Certain languages, due to the design of the language and certain rules placed on the declaration of variables and other objects used, and the pre-declaration of executable procedures prior to reference or use, are capable of being compiled in a single pass. The Pascal programming language is well known for this capability, and in fact many Pascal compilers are themselves written in the Pascal language, because of the rigid specification of the language and the capability to use a single pass to compile Pascal language programs.

Compiler Front End

The compiler front end itself consists of multiple phases, each informed by formal language theory:
1. Lexical analysis - breaking the source code text into small pieces (tokens or terminals), each representing a single atomic unit of the language, for instance a keyword, identifier, or symbol name. The token language is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it. This phase is also called lexing or scanning.
2. Syntax analysis - identifying the syntactic structure of the source code. It focuses only on the structure; in other words, it identifies the order of tokens and the hierarchical structures in the code. This phase is also called parsing.
3. Semantic analysis - recognizing the meaning of the program code and starting to prepare for output. In this phase, type checking is done, and most compiler errors show up.
4. Intermediate language generation - an equivalent of the original program is created in an intermediate language.
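Phase 1 above can be sketched concretely (my addition; the token set is invented for a toy expression language): because tokens form a regular language, the whole lexer can be one compiled regular expression, exactly as the text suggests.

```python
import re

# Token specification for a toy expression language: each token class
# is a regular expression, so the lexer is one compiled pattern.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def lex(source):
    """Break source text into (kind, text) tokens, dropping whitespace."""
    tokens = []
    for match in PATTERN.finditer(source):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens

print(lex("total = count + 42"))
# [('IDENT', 'total'), ('OP', '='), ('IDENT', 'count'),
#  ('OP', '+'), ('NUMBER', '42')]
```

The token stream produced here is exactly what the syntax-analysis phase (phase 2) would consume next.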

14.
Compiler Back End

While there are applications where only the compiler front end is necessary, such as static language verification tools, a real compiler hands the intermediate representation generated by the front end to the back end, which produces a functionally equivalent program in the output language. This is done in multiple steps:
1. Compiler analysis - the process of gathering program information from the intermediate representation of the input source files. Typical analyses are variable define-use and use-define chains, data dependence analysis, alias analysis, etc. Accurate analysis is the basis for any compiler optimization. The call graph and control flow graph are usually also built during the analysis phase.
2. Optimization - the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are inline expansion, dead code elimination, constant propagation, loop transformation, register allocation, and even automatic parallelization.
3. Code generation - the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory, and the selection and scheduling of appropriate machine instructions along with their associated addressing modes.
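Two of the optimizations named in step 2 can be sketched on a toy three-address IR (my illustration; the tuple-based IR format is invented): constant propagation/folding rewrites operations whose inputs are already known constants, and dead code elimination then drops assignments whose results are never read.

```python
# Toy IR instructions: ("assign", dest, value) or ("add", dest, a, b).
def optimize(ir, result_var):
    # Pass 1: constant propagation and folding.
    consts, folded = {}, []
    for instr in ir:
        if instr[0] == "assign":
            _, dest, value = instr
            consts[dest] = value
            folded.append(instr)
        else:  # ("add", dest, a, b): fold if both operands are known
            _, dest, a, b = instr
            if a in consts and b in consts:
                consts[dest] = consts[a] + consts[b]
                folded.append(("assign", dest, consts[dest]))
            else:
                consts.pop(dest, None)
                folded.append(instr)
    # Pass 2: dead code elimination, scanning backwards from the
    # result variable and keeping only instructions that feed it.
    needed, kept = {result_var}, []
    for instr in reversed(folded):
        if instr[1] in needed:
            kept.append(instr)
            needed.discard(instr[1])
            if instr[0] == "add":
                needed.update(instr[2:])
    return list(reversed(kept))

ir = [("assign", "a", 2),
      ("assign", "b", 3),
      ("add", "x", "a", "b"),    # folds to ("assign", "x", 5)
      ("assign", "unused", 99)]  # dead: its value is never read
print(optimize(ir, "x"))
# [('assign', 'x', 5)]
```

Note how the two passes reinforce each other: once the addition is folded, the assignments to a and b become dead as well, and the elimination pass removes them too.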

15.
Bibliography

• System Programming by John J. Donovan
• www.google.com