COMP207P Compilers Guidelines Part 1: Lexing and Parsing

Published on February 17, 2017

Abstract

During my 2nd year as a computer scientist at UCL I got a chance to work on an amazing compilers coursework. It was part of the COMP207P Compilers module, and together with 2 of my teammates I faced the challenge of developing the compiler front-end for a fictitious $\tilde{Z}$ language.

In this article I'll be giving general guidelines on how to get started with this coursework. It probably won't make much sense to you if you're not a computer science student at UCL, but I do give some useful tips for using JFlex and CUP with IntelliJ Idea so you might wanna check that part out. Unfortunately I cannot talk about the actual implementation because of plagiarism concerns.

Gaps in knowledge can vary a lot from person to person so I'll cover the basics of pretty much everything that you might need in this coursework. Hopefully, if you happen to be my teammate some time in the future, you'll know all of this :^) . Keep in mind that I'm working on Linux so some of the guidelines here might not work for you, in which case I suggest you use one of the Linux machines in the CS labs.

Finally, keep in mind that I won't be the one marking your coursework and I take no responsibility whatsoever for whatever happens to be the outcome. That said, I've explicitly asked for Earl Barr's permission to share these guidelines and my test runner, which you'll find below.

This article talks about the first piece of coursework in the COMP207P course, the one concerned with the compiler's front-end. The article about the second part of the coursework can be found here.

Please feel free to comment below or let me know through other media if you have any concerns or suggestions about the guidelines on this page. I will try to fix them to the best of my ability.

Setting up the project

You'll most definitely be working in groups so you absolutely must use Git. I'm gonna talk about how to set up a Git repo for this coursework. At the end I'll mention continuous integration (CI) but it's up to you whether you're gonna use it or not.

Setting up Git

First of all, set up a Git repository. You can use GitHub, GitLab.com, or UCL CS GitLab, whichever you prefer - at the end of the day they all use Git. That said, remember that you'll most likely have to submit your coursework through UCL CS GitLab. Knowing this, I personally went with GitHub anyway, at least for the development phase.

Install Git and set up a repository for your project (following tutorials from the service you chose if needed), then git clone it to somewhere on your local machine. I strongly suggest you look at this guide if you lack experience with Git; your teammates will thank me later. Additionally, if you want a nice way to manage the collaboration process, take a look at this branching model.

From here onward I will assume that you have git installed and your repository is in a folder called comp207p (replace with whatever name you chose). That is, if you run git status inside the comp207p directory you should see something like this:

$ git status
On branch master
nothing to commit, working tree clean

To make life easier for yourself and your teammates, you'll need to add a .gitignore file. Unless you know how to make your own, I suggest using the one I put on GitHub Gist. Simply create a .gitignore file in your repo directory (e.g. comp207p/.gitignore) and paste the contents of the Gist in. This will make sure you don't clutter the repository by pushing temp/generated files.
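If you'd rather write the .gitignore by hand, a minimal sketch could look like the script below. The patterns are assumptions about a typical Java/IntelliJ project, not the contents of the Gist, and REPO_DIR and add_pattern are hypothetical names used only for illustration. Generated lexer/parser sources are deliberately left out because their location depends on the coursework Makefile.

```shell
#!/usr/bin/env bash
# Sketch only: the patterns below are assumed, not copied from the Gist.
set -eu

# Hypothetical variable: path to your repository (defaults to the current directory).
REPO_DIR="${REPO_DIR:-.}"
IGNORE_FILE="$REPO_DIR/.gitignore"

# Make sure the file exists so grep has something to read.
touch "$IGNORE_FILE"

# Append a pattern only if that exact line is not already present.
add_pattern() {
  grep -qxF -- "$1" "$IGNORE_FILE" || printf '%s\n' "$1" >> "$IGNORE_FILE"
}

add_pattern '*.class'   # compiled Java classes
add_pattern '.idea/'    # IntelliJ project settings
add_pattern '*.iml'     # IntelliJ module files
add_pattern 'out/'      # IntelliJ build output
```

Running it twice is harmless, since each pattern is only added once. Check the result with git status to confirm that generated files no longer show up as untracked.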

Getting the source code

First of all, grab project.zip from the Moodle page for this coursework (only available to UCL students). Unzip it into the folder with your Git repo (comp207p in my case). The structure you end up with should look something like this:

Technically, you can begin writing everything from scratch, but to get some code to start with you can download the Compiler demo from the COMP207P Moodle page, which will give you an archive called tool-demo.tar.gz. From this archive you'll only need 2 files, Lexer.lex and Parser.cup, both of which are located in the src/ folder. Copy these 2 files into the src/ folder of your project.

Warning: The Lexer.lex and Parser.cup we used accept a different language, so you will definitely need to change and delete some parts.

To make sure the project compiles you need to define a function called syntax_error and a boolean called syntaxErrors, as described in paragraphs 21 and 22 of the project.pdf supplied with the coursework source. Both definitions go somewhere between parser code {: and :} in your Parser.cup.

Finally, on line 23 in src/SC.java you might want to uncomment e.printStackTrace(); so that your parser reports Java errors (and not just Parser errors). Remember to comment it out again before submitting.

Preparing your IDE

While you're free to use whatever editor/IDE you like (especially if it's Vim :>) I'd strongly suggest using IntelliJ Idea by JetBrains. If you're a student you get a free license for almost all JetBrains software, including IntelliJ Idea Ultimate. The steps below only apply to this IDE.

I'm gonna assume you got yourself a license and installed Idea. To make working with JFlex and CUP easier you will need 2 Idea plugins - JFlex Support and Cup Support. Install them by going to File > Settings > Plugins > Browse repositories... and searching for each plugin by name. You'll also need BashSupport to run tests, and the .ignore plugin might prove useful for development in general.

After you restart Idea the plugins will begin to work. Syntax highlighting and auto-completion for .cup should turn on automatically. For .lex files, you might have to enable it manually by right-clicking on Lexer.lex in the Project browser, choosing Associate with File Type and picking JFlex from the list.

Testing and CI

Provided you have Java 8 installed, you can now run make and make test in your repo to run the test suite provided with the source of the coursework. Unsurprisingly, most of the tests will fail. Technically, this is enough to get you going and you can jump straight to the development process. That said, you might find my approach to testing a bit more convenient; I'll talk more about it below.

My approach to testing

I like pretty console output so I wrote a test runner which I quite modestly called tim-test.sh. It looks something like this:

It recursively iterates over the directories you specify and runs tests, using the tests/custom/ directory by default. It can also run a single test, which I'll talk more about below, but if you want to use it straight away you can find the script on GitHub Gist.

To use the test runner, paste the contents of the Gist mentioned above into a file called tim-test.sh (or any other name you want, really) in the root of your project. Make sure it's executable by running chmod +x tim-test.sh and you're good to go. Running ./tim-test.sh without any parameters will give you usage instructions.

To see tim-test.sh in action, let's create 3 very basic tests and put them into tests/custom/. By the way, all of this can be done without leaving IntelliJ Idea using the Project browser. First one, tests/custom/n-empty-main.s:

main {
};

Second, tests/custom/p-main.s:

main { print 0; };

And finally, third, tests/custom/i-p-main-read.s:

main { read test; };
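If you prefer the terminal, the three files above can also be created with a short script. This is just a convenience sketch: PROJECT_ROOT is a hypothetical variable that defaults to the current directory, so run it from the root of your repo or point it somewhere else.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical variable: root of your project (defaults to the current directory).
PROJECT_ROOT="${PROJECT_ROOT:-.}"
TEST_DIR="$PROJECT_ROOT/tests/custom"

mkdir -p "$TEST_DIR"

# Negative test: an empty main block, expected to fail parsing.
cat > "$TEST_DIR/n-empty-main.s" <<'EOF'
main {
};
EOF

# Positive test: expected to parse successfully.
printf 'main { print 0; };\n' > "$TEST_DIR/p-main.s"

# Ignored test: listed in the output but not run.
printf 'main { read test; };\n' > "$TEST_DIR/i-p-main-read.s"

# List what was created.
ls "$TEST_DIR"
```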

Now you can run tests by executing ./tim-test.sh all, ./tim-test.sh dir <path-to-test-dir> or ./tim-test.sh one <path-to-test-file>. You should get output similar to that seen below.

Test files with names beginning with p are expected to parse successfully, files beginning with n are expected to fail to parse, and files beginning with i will not be run but will still be displayed in the output. Keep in mind that tim-test.sh runs make clean and make before running tests, so you don't have to manually recompile the source.
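To make the naming convention concrete, here's a minimal sketch of how a file name could be mapped to its expected outcome. It is not the code from tim-test.sh; expected_outcome is a hypothetical helper written only to illustrate the prefix rule.

```shell
#!/usr/bin/env bash

# Hypothetical helper: decide what a test is expected to do from its name prefix.
#   p* -> parsing should succeed
#   n* -> parsing should fail
#   i* -> ignored (listed in the output, but not run)
expected_outcome() {
  local name
  name="$(basename "$1")"   # only the file name matters, not the directory
  case "$name" in
    p*) echo "pass" ;;
    n*) echo "fail" ;;
    i*) echo "ignored" ;;
    *)  echo "unknown" ;;
  esac
}

for f in tests/custom/n-empty-main.s tests/custom/p-main.s tests/custom/i-p-main-read.s; do
  printf '%s -> %s\n' "$f" "$(expected_outcome "$f")"
done
```

Note that only the first letter counts: i-p-main-read.s is treated as ignored even though a p follows the i.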

If you're using IntelliJ Idea and have the BashSupport plugin installed, you can automate the process by creating a Run configuration. Make sure you choose the Bash preset and configure it along the lines of the picture below. Once you're done, you can quickly execute tests using the Shift + F10 shortcut without ever leaving the editor.

Continuous integration

CI is a bit out of scope for this article so I'm not gonna talk much about it. I personally use Travis CI with GitHub, but there are other solutions like GitLab CI. If you're using tim-test.sh, here's an example .travis.yml config:

The only dependency is the JDK, because tim-test.sh only uses Shell scripting. If you want to use make and make test for CI, make sure to specify Python as a dependency.

Conclusion

So by now you should have a Git project set up, proper JFlex and Java CUP highlighting in IntelliJ Idea and a testing pipeline that uses tim-test.sh. Now you just have to develop a working lexer and parser for $\tilde{Z}$, which is the easy part :>.

Make sure you understand the basics of JFlex and Java CUP before you start. Also, please report any issues you find with this article or tim-test.sh so I can fix them before someone else gets affected.


Timur Kuzhagaliyev Author

I'm a Computer Science student at UCL, spending a year at Caltech. Check out my GitHub to see what I'm up to.