Recent Updates

01/08/2017.

The purpose of this part is to ensure you all have a working and compatible LLVM installation. To avoid compatibility issues caused by students using LLVM versions other than the expected one (3.9.1), we provide a Docker image with bare-bones Ubuntu 16.04 and a clean LLVM 3.9.1 installation.

In this guide, we provide instructions on how to install Docker and pull the LLVM image. In case you are not able to use Docker, you will have to install LLVM manually.

Warning! Read Carefully.

Using the provided Docker container requires installing Docker, 3 GB of free space, and root access on the host machine (admin rights on Windows).

If you are not able to use the provided container, you can install LLVM on your own. Make sure you install LLVM 3.9. We will expect your results to match ours.

Docker containers are not intended to store data. We highly recommend you develop your solutions locally and only use Docker to compile and run them. The following guides show you how to do that. Any results you obtain should be stored locally as well. If you develop within the container, you are at risk of losing your work. You have been warned.

When you work within the provided container (interactively or not) you are automatically logged in as root. If you delete the mounted directory containing your work, it will be deleted from the host system as well. Make sure your work is secure at all times. We recommend you use some sort of version control such as git.

Installing Docker

Installing Docker should be straightforward for Windows and Mac OS users: download it from the Docker website. Linux users will have to follow one of the following guides (link1, link2). The Linux guides essentially try to upgrade your system to a compatible version (for example, upgrading to Ubuntu 16.04). Be careful not to break your current system. If you are working with Linux, having an Ubuntu 16.04 system should result in an easier Docker installation.

For Windows users, Docker requires you to enable Hyper-V and restart your computer. Some Windows 10 versions do not have Hyper-V. If you face any issues with installing Docker on Windows, installing Docker Toolbox instead of Docker should be the easiest way out. As an added benefit, you will later be able to use our Linux scripts to start the Docker image.

Pulling the LLVM image.

After successful installation of Docker, open a command prompt or shell and execute the following command (Windows users should skip the "sudo" part):

Linux:

$ sudo docker pull prodromou87/llvm:3.9

Windows (PowerShell preferred):

$ docker pull prodromou87/llvm:3.9

Docker should automatically start downloading and extracting the provided LLVM image. If you skip this step, the image will be downloaded automatically the first time you attempt to start it. Once the download has finished, you can verify that you have the image by typing:

Linux:

$ sudo docker images

Windows:

$ docker images
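If you prefer a scripted check over reading the table by eye, the following is a minimal sketch. It assumes the docker client is on your PATH; the function name image_status is hypothetical and not part of the starter code. On Linux you may need to run it with sudo, as with the commands above.

```shell
#!/bin/sh
# Image name taken from the pull command above.
IMAGE=prodromou87/llvm:3.9

# Hypothetical helper: reports whether the image is present locally.
image_status() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found; cannot check for $IMAGE"
    elif [ -n "$(docker images -q "$IMAGE" 2>/dev/null)" ]; then
        # `docker images -q` prints only image IDs, so non-empty output means a match.
        echo "$IMAGE is available locally"
    else
        echo "$IMAGE not found locally; pull it with: docker pull $IMAGE"
    fi
}

image_status
```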

The Design of LLVM

At a high level, LLVM can be split conceptually into three parts:

1. A compiler frontend called Clang that takes C++ and translates it to a simplified program representation called LLVM IR,

2. a target-independent backend that performs various analyses and transformations on LLVM IR, and

3. a target-dependent backend that lowers LLVM IR to native machine code like x86.

In this class, we’ll only be adding code to (2). In particular, we will be writing additional passes that analyze and transform LLVM IR programs.

LLVM is architected such that a single run consists of a sequence of many passes run back to back. By keeping the logic of each pass relatively isolated from the others, LLVM achieves a nice modularity that makes it easy to add new passes. Additionally, you’ll probably find that LLVM IR is a refreshingly simple representation compared to, say, C++ or x86 assembly. This allows our analyses to avoid the complexity of reasoning about the idiosyncrasies of those more complex languages.

Getting started with LLVM in Docker.

This guide will provide information on how you can develop code locally and then compile and run it within the docker container.

Clone the "getting started" code

Start by cloning the CSE231_Project folder from GitHub:

$ git clone https://github.com/prodromou87/CSE231_Project.git

Take a minute to see what we prepared for you in this folder. You get a set of guides on how to generate IR code, how to compile and how to run a pass. All these guides are also covered in this page.

The mount_and_launch.sh script will automatically mount the other three folders to the appropriate mount points and start a shell in the Docker image. All you have to do is write your code under the Passes directory, run mount_and_launch.sh, and you are ready to compile and run. You will use the Output folder to store results from running your passes and the Tests folder to store the test cases your passes will run on. Spend some time familiarizing yourselves with the directory structure. Also take a look at the script's code so you understand exactly how it works, in case you need to modify it later on. We run the image with the "--rm" option so the container is automatically removed after you exit it, and with "-it" for an interactive session (opening a shell).
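To make the options concrete, here is a rough sketch of the kind of docker run invocation such a script would issue. This is an illustration, not the contents of the provided mount_and_launch.sh, which may differ. It assumes you run it from the root of the cloned folder, uses the mount points listed under "Understanding Docker mount points", and relies on the image's default command to open a shell. The function name launch_llvm_container is hypothetical.

```shell
#!/bin/sh
# Assumption: the current directory is the cloned CSE231_Project folder.
PROJECT_DIR=$(pwd)
IMAGE=prodromou87/llvm:3.9

# Hypothetical helper. Any arguments are placed in front of the docker
# command, so `launch_llvm_container echo` prints it instead of running it.
launch_llvm_container() {
    "$@" docker run --rm -it \
        -v "$PROJECT_DIR/Passes:/LLVM_ROOT/llvm/lib/Transforms/CSE231_Project" \
        -v "$PROJECT_DIR/Tests:/tests" \
        -v "$PROJECT_DIR/Output:/output" \
        "$IMAGE"
}

# Dry run: show the command without starting a container.
launch_llvm_container echo
```

Calling launch_llvm_container with no arguments would actually start the container (with sudo on Linux). Each -v option maps a host folder to a path inside the container, which is why edits on either side are visible on the other.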

To start a shell in the LLVM docker image, type:

Linux:

$ sudo ./mount_and_launch.sh

Windows:

Windows users will have to open the file named "windows_docker_command.txt", modify the paths in the command, and then copy-paste it into PowerShell. Some Windows machines forbid the execution of scripts in PowerShell for security purposes. It's up to you whether you wish to enable it and convert the provided file to a PowerShell script in the future.

Info!

If you installed Docker Toolbox instead of Docker, you should be able to use the mount_and_launch.sh script. Try not to have spaces in the path to the cloned "starter code" because they might lead to unpredictable errors.

Info!

Regardless of the host OS, after you start an interactive session in Docker you will all be working in the same system. The remaining guides in this page will be the same for Windows, Mac OS and Linux users.

Understanding Docker mount points

When you start a shell in Docker, you will find the LLVM source code under /LLVM_ROOT/llvm. The compiled LLVM is located in /LLVM_ROOT/build. The environment in the provided image comes pre-configured so you don't have to type in full paths every time you need to run an LLVM command.

The first thing you'll notice is a message saying that we are working on something. We configured the image to automatically move into the /LLVM_ROOT/build directory and run a cmake command, so that you can just go in and compile your passes without having to worry about anything else. This procedure should be quick.

Mount points:

The folders in the "getting started" code you cloned in the previous step will be mounted at the following points within the container. Verify this by moving into those mount points (cd command) and making sure the content of the folders is there.

Passes --> /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project

Tests --> /tests

Output --> /output

Spend some time exploring these folders and verify that changes from the host are immediately observable in the container and vice versa. Be careful when deleting files, because they will be deleted on the host as well.
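A quick way to confirm the three mount points from inside the container is a small loop like the sketch below. The helper name report_dir is hypothetical; the paths are the mount points listed above.

```shell
#!/bin/sh
# Hypothetical helper: prints whether a directory exists.
report_dir() {
    if [ -d "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Mount points from the list above.
for dir in /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project /tests /output; do
    report_dir "$dir"
done
```

If any line reports "missing", the corresponding folder was probably not mounted, which usually means the launch command was run from the wrong directory or with wrong paths.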

Info!

If you installed LLVM manually, you will have to create a folder under the lib/Transforms folder to store your passes, as well as modify the appropriate CMakeLists.txt files.

Generating IR code for a test case

Under the provided "Tests" folder you will find a "Hello World" program written in C++. We will use this as the first test case for your first LLVM pass. You will need to be able to convert source code to IR code for your passes to operate on. Feel free to write your own small program while following this guide. All you have to do is create a new folder under "Tests" and write your code in it. Otherwise, simply follow this guide to compile the Hello World program.

First, start the docker image by (sudo) executing the provided script. Then navigate (cd) to /tests and make sure the code is mounted properly. Then cd into the HelloWorld folder. You need to use clang to generate IR code by typing the following command:

# clang -O0 -S -emit-llvm HelloWorld.cpp

This will generate a new file called HelloWorld.ll. Since we are working in a mounted folder, the file is immediately present on your host machine as well. You can read the file to see what IR code looks like. You just compiled your first program into LLVM IR. Congratulations.
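Once you have several test cases, you may want to regenerate the IR for all of them in one go. The sketch below assumes each test lives in its own folder under /tests, as HelloWorld does, and uses the same clang flags as above plus -o to place the output next to the source. The helper name ll_name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: maps a .cpp path to the matching .ll path.
ll_name() {
    echo "${1%.cpp}.ll"
}

TESTS_DIR=/tests   # mount point inside the container

for src in "$TESTS_DIR"/*/*.cpp; do
    # If the glob matched nothing, $src is the literal pattern; skip it.
    [ -f "$src" ] || continue
    clang -O0 -S -emit-llvm "$src" -o "$(ll_name "$src")"
done
```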

First Look at LLVM IR

Go ahead and open HelloWorld.ll, which contains the human-readable LLVM IR produced from HelloWorld.cpp in the same directory. You will notice that LLVM IR looks a lot like assembly code. In fact, LLVM IR is an assembly language, with a few unusual features:

Writing and compiling your first LLVM pass

The LLVM pass code is provided for convenience. To understand exactly what each line of code means (or even better try to write it on your own), follow this guide.

For each new pass you create in the future, you will have to create a new directory under "Passes" with your code and the appropriate CMakeLists.txt file in it. Instructions on what to include in that file can be found in the link provided earlier. To make sure your new pass will be compiled, you have to include your new directory in the CMakeLists.txt file under the "Passes" folder, simply following the existing syntax.

You are now ready to compile your first pass. Start the Docker image and wait for cmake to complete. You only need to compile the passes under CSE231_Project instead of the entire LLVM. To do that, simply cd into /LLVM_ROOT/build/lib/Transforms/CSE231_Project and run make. If everything went well, you should be able to find your pass under /LLVM_ROOT/build/lib. Keep in mind that after exiting the Docker image, you will have to re-compile your passes. You might want to keep it running between compiling and running.

# cd /LLVM_ROOT/build/lib/Transforms/CSE231_Project
# make
(Pass can now be found under /LLVM_ROOT/build/lib)

Running your first LLVM pass

After compiling your first LLVM pass and generating the IR code for your test cases, all that's left is to run it. The provided test pass prints the message "TEST: ", followed by each function name in the IR code. The output goes to standard error because we used errs() in the source code of the pass (outs() would have sent it to standard output). Because of the pre-configured environment, you should be able to run the pass from any directory. You just have to make sure to store the output in the mounted /output folder.

We used 2> /output/test_pass_output.txt to redirect the standard error output to the mounted output directory. You can organize that directory in any way you want. We got rid of the standard output with > /dev/null because we have no use for it. Try to run the pass without it to see what exactly is printed out just for reference.
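Pieced together from the flags and redirections described in this section, the full command looks roughly like the sketch below. The input path /tests/HelloWorld/HelloWorld.ll is an assumption based on the Hello World test case; the surrounding check is only there so the snippet fails gracefully when run outside the container.

```shell
#!/bin/sh
IR_FILE=/tests/HelloWorld/HelloWorld.ll      # assumed input, generated earlier with clang
OUT_FILE=/output/test_pass_output.txt        # output file named in the text above

if command -v opt >/dev/null 2>&1 && [ -f "$IR_FILE" ]; then
    # stdout (the transformed program) is discarded; stderr (the pass's
    # messages) is saved to the mounted output folder.
    opt -load LLVMTestPass.so -TestPass < "$IR_FILE" > /dev/null 2> "$OUT_FILE"
else
    echo "opt or $IR_FILE not found; run this inside the container" >&2
fi
```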

Let’s dissect this command line:

opt is LLVM’s command line tool for executing passes.

-load LLVMTestPass.so causes opt to load the shared library that contains the TestPass pass. We don't need to specify the full path to the pass, since the Docker image's environment was configured to know where to search.

-TestPass tells opt that we wish to run the TestPass pass, which was bound to the -TestPass command line flag by the RegisterPass<TestPass> declaration in TestPass.cpp.

By default, the output of opt is a transformed program. Since the TestPass pass doesn’t perform any transformations, we just redirect the output to /dev/null to ignore it.

Analogous to -TestPass, opt has flags to enable each of the other passes provided by LLVM. If you’re curious, running opt -help dumps out a massive laundry list of all the passes that are built into LLVM.
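The two redirections can be tried in isolation with a stand-in for opt. In this sketch, fake_pass is a hypothetical shell function (not part of LLVM or the starter code) that writes one line to each stream, and a temporary file plays the role of /output/test_pass_output.txt.

```shell
#!/bin/sh
# Stand-in for opt: one line on stdout, one line on stderr.
fake_pass() {
    echo "transformed program (stdout)"
    echo "TEST: main (stderr)" >&2
}

OUT_FILE=$(mktemp)

# Same shape as the real command: discard stdout, save stderr to a file.
fake_pass > /dev/null 2> "$OUT_FILE"

# Only the stderr line ends up in the file.
cat "$OUT_FILE"
```

Dropping the "> /dev/null" part makes the stdout line appear on the terminal, which mirrors what happens when you run the real pass without it.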

Although the TestPass pass may be trivial, it contains a lot of the boilerplate that you will need to use and understand to write your own passes. Part of this assignment will be to read the documentation and learn how to use the existing infrastructure provided by LLVM.

Here is some high-level guidance for how to understand TestPass.cpp.

In LLVM, each pass is implemented in a separate C++ class that inherits from one of the subclasses of the Pass class: TestPass subclasses FunctionPass, which implements functionality for analyses and optimizations that only look at a single function at a time. There are analogous passes like ModulePass and BasicBlockPass that look at entire modules or single basic blocks, respectively.

For each kind of Pass, there is a corresponding entry point function called runOn<suffix>, where <suffix> depends on the kind of pass. For example, FunctionPass defines a virtual function called runOnFunction(Function &F) that you fill in with your pass’s implementation. You don’t worry about how this function gets called: you simply write the details of the function, and the system makes sure it gets called for each function in the program.

Idiomatic I/O in LLVM uses the output streams provided by raw_ostream.h rather than those in standard C++. errs() is the error output stream, and there is a corresponding outs() for standard output.

Read the Quick Start section of Writing An LLVM Pass, which goes line-by-line explaining Hello.cpp. Use the rest of the guide as a resource for navigating the LLVM APIs.

Installing LLVM manually

Info!

After you are done installing LLVM manually, follow the previous guides to get the "getting started" code, compile and run an LLVM pass. You will not need to use the docker commands. In most cases you will have to specify full paths to LLVM passes and tools, unless you configure the PATH variable.

Now we’re ready to compile everything. Doing so requires a C++ compiler like g++ and CMake, which LLVM 3.9 uses in place of the older configure script, so please install them now if they are not already available. If you are on Ubuntu, the easiest way to do this is via sudo apt-get install build-essential cmake.

Once you’re ready, compile like so:

$ cd llvm
$ mkdir build
$ cd build
$ cmake ../src
$ make

LLVM is a large codebase, so this will take a long time (~30 minutes). If all went well, you should see the following message: