Makefile Tutorial
An Introduction to Compilation & Makefiles

In this tutorial, you will:

learn the basics of compiling C++

learn how to write a simple makefile

A short aside on compilers

In this class, we’re hoping to use (at least mainly) the clang C++ compiler.
Clang is a compiler frontend built on LLVM, a project that began here at UIUC,
and is generally considered more modern (with more informative error
messages), while being a mostly drop-in replacement for gcc. It is the default
C/C++ compiler on Apple systems, and is becoming increasingly popular in both
industry and general use. In previous semesters, we taught this course using
exclusively the gcc C++ compiler. This tutorial will be executed chiefly using
clang, and you are encouraged to follow suit. However, the alternative gcc
command will be provided as well, for historical reasons and your interest.

Take note of the difference between a compiler and the language itself: a
language is defined by a standard, and a compiler is one implementation of
that standard. (Fun fact: neither the gcc C++ compiler nor the clang C++
compiler is perfectly C++ standard compliant.) In practice in this class, the
differences should not overly concern you. However, if you run both versions
of one of the clang/gcc paired commands below, such as the pair which invokes
their respective preprocessors, you may find that they do in fact have
different internal behaviour.

(Make sure you’ve followed the directions on the Course Setup page to
check out your subversion repository first.)

Open up the file hello_world.cpp in your favourite text editor, and let’s
walk through what it’s doing.

The first line includes the library iostream, the standard input/output
(i/o) streams library. It’s not important that you understand it intimately,
but you’ll use it a lot in the near future. More relevantly, it’s useful for
the upcoming educational example on running the macro preprocessor.

Next, on line 3, we’re defining a function called main that returns an
int (in main’s case, that’s a return code that usually indicates if the
run was successful; we didn’t write a return statement, but in this case,
the return is implicit), and takes no parameters (the empty parens).

On line 4, there’s a helpful and informative one-line comment.

Line 5 is the line that actually does the work. std::cout is an output stream
object declared in the iostream header, and it lets us print something to the
standard out stream. << is the insertion operator (you’ll learn more about
operators later; all you need to know now is that this is print statement
syntax), and the string after that is what we’re printing to standard out.

All in all, a very bare-bones Hello, world implementation.

Let’s try compiling it manually:

clang++ hello_world.cpp -o hello

or (note that the syntax is the same):

g++ hello_world.cpp -o hello

The -o flag tells the compiler to give the executable an alternative name. Otherwise, the default name is a.out. Now run the executable:

./hello

The ./ simply tells your shell to search the current directory for the
executable, rather than its normal executable paths. If all goes well, you
should see Hello, world! printed as output. But now let’s try to get a little
more in-depth. You can get rid of the executable you made by typing:

rm hello

And an

ls

should verify its disappearance. Run the following command:

clang++ -save-temps hello_world.cpp -o hello

or:

g++ -save-temps hello_world.cpp -o hello

The flag -save-temps tells the compiler to retain the temporary files it
makes when we compile our program… so we can look at them! Listing the
contents of your current directory should yield four new files: naturally the
executable hello, but also hello_world.ii, hello_world.s, and
hello_world.o, the temporary files we asked the compiler to save, and our
guides into the slightly more technical aspects of basic compilation.
(Depending on your clang version, you may also see an LLVM bitcode file,
hello_world.bc; we won’t need it here.)

Running the macro preprocessor: What is hello_world.ii?

Run the following line:

clang++ -E hello_world.cpp -o preprocessed.ii

or

g++ -E hello_world.cpp -o preprocessed.ii

Then:

cat preprocessed.ii

If all goes well, your terminal will spit out a large amount of somewhat
unintelligible code, but at the bottom, there’s the code for our Hello, world
program (with the comment stripped out). So what did the preprocessor do?

All it really did for this program was replace our “include” directive
(#include <iostream>) with the actual text of the library we included (and, of
course, strip the comment out).

What does that actually mean? Well, if you were able to compile this program
at all, then somewhere on the machine (be it virtual, remote, or physically
present) that compiled it, there exists a file called iostream, which
contains the C++ code that implements the i/o streams library. If your
compiler is set up to use libc++ (libcxx), as clang is in this class, the file
lives in the directory where libc++ is installed. If you were using gcc, it’s
in the directory where libstdc++ (libstdcxx) is installed. Don’t worry about
the specific libraries; it doesn’t really matter, but if you were so inclined,
you would be able to find the code on your own machine. There is no magic
involved here.

Back to the preprocessed code. In this case, the only included library was
iostream, but the preprocessor would do exactly the same thing for any other
included library. If you had a million include directives, it would go through
all million of them, find each file you referenced, and paste its contents
into your program, so that when you referenced a function, object, or class
declared in one of those files, it would make sense to the compiler. std::cout
is the example in this case: it is an object declared in iostream that you
wouldn’t have been able to use without including the code. Of course the
preprocessor has plenty of other jobs as well, but we won’t cover them now.

Question: Why did we enclose the library name, iostream, in angle brackets?
It’s not just so our code looks cooler—we could have said #include
"iostream" too (feel free to try it out), so what’s the difference? The
difference (in clang and gcc) is that using angle brackets specifies that the
preprocessor should look in the standard compiler include paths, and quotes
tell it to search the current directory first, and via the standard paths only
if that fails. Note that the true standard definition is a little more
complicated than this: technically, both behave in an “implementation-defined
manner” (any implementation could treat that differently if it so wished) but
that’s not very important for us.

Now you can run:

cat hello_world.ii

Look familiar? That’s the output file the preprocessor dumped, and it is
identical to the output you saw when you ran the preprocessor yourself. This is
the file that the compiler really compiles—not your plain, unpreprocessed
source file.

If you want to be sure, try running:

diff hello_world.ii preprocessed.ii

diff returns no output if the files it’s comparing are identical. Make sure
that both hello_world.ii and preprocessed.ii were produced by the same
compiler, though!

The actual compilation step: What is hello_world.s?

Now let’s take a look at the next temporary file. Print the contents of hello_world.s:

cat hello_world.s

For those of you who have seen assembly code before, the output should be
recognisable. If you haven’t, assembly is the low-level intermediate between
normal, higher-level programming languages like C++, and the machine code that
your computer actually executes. In this case, the compiler (this is the step
of compilation that’s actually called compilation) has translated the
preprocessed source code from C++ to assembly, and dumped the output as
hello_world.s. Let’s ask our compiler to directly compile the code that we
preprocessed into assembly code:

clang++ -S preprocessed.ii -o compiled.s

or

g++ -S preprocessed.ii -o compiled.s

Use diff to verify that the files are the same (again, remember to make sure
that both hello_world.s and compiled.s were produced by the same compiler):

diff hello_world.s compiled.s

If you used gcc, there shouldn’t be any differences. With clang, the only line
that should be different is a line stating what preprocessed file the assembly
was generated from.

Question: Why don’t we just write everything in assembly language? Well, for
one, it’s kind of annoying to write all the time, and higher level ideas are
harder to keep abstract without our human-friendly programming languages.
Perhaps more importantly, assembly isn’t portable in the slightest. Assembly
languages are specific to a specific architecture, so what assembles and runs
on my machine may not run without alteration on yours. That’s pretty annoying,
and compilers work pretty well, so most people normally leave the assembly to
them.

Assembly: What is hello_world.o?

The next step is assembling the code—that just means translating the assembly
code from hello_world.s into machine-readable code. That’s known as object
code, and the standard suffix for object code is .o—and unlike .s, you’re
likely to see quite a few .o files as you continue in this course. That doesn’t
mean you have to read them, though. If you:

cat hello_world.o

you’ll quickly realise it would be a somewhat unrealistic expectation anyway.

If you want to ask your compiler to assemble your assembly code, you can do
this:

clang++ -c compiled.s -o assembled.o

or

g++ -c compiled.s -o assembled.o

Linking: Generating the final executable.

Linking is the final step, and arguably the most important and relevant to you.
It’s the part you’ll interact with most, and besides perhaps flat out failure
to compile at all, it’s the part of compiling you’ll be most confused by,
particularly at the beginning of this class, when you’re responsible for all of
your own compilation. Linking problems are some of the most notorious issues
people have early on in this class… so pay attention to it, and perhaps you
will be spared the “undefined reference” trauma.

Hint for the future

“Undefined reference” errors are pretty much always linking errors, and you
will probably have them. Remember this.

All a linker does is take all the object files tossed out by the assembling
step, and join them together into a single executable—in this case, the file
hello which you ran earlier. We only have one object file in our Hello, world
program, so this linking process is very uninteresting, but very soon (like,
later in this tutorial), you’ll be dealing with multiple object files.

Run the following, to have our compiler link our object file and output our
final executable, hello_manual:

clang++ assembled.o -o hello_manual

or

g++ assembled.o -o hello_manual

Feel free to verify that it does exactly the same thing as our original
executable, hello:

./hello_manual
./hello

Congratulations, you’ve just compiled your own miniature program!

Dealing with multiple object files

Let’s visit the example directory animals now.

cd ../animals/
ls

The files you’ll see listed are dog.hpp, dog.cpp, and main.cpp. Feel free
to check out the source code. dog.hpp is a C++ header file, what we’d call
the definition of the Dog class, and dog.cpp is a source file, the
implementation for said class. You’ll become more familiar with the details of
that relationship as the class moves on, but right now, just know that
together, they make the Dog class. main.cpp might look more familiar to
you. It’s a lot like hello_world.cpp from the last exercise, in that it has
some includes and it has an executable main function. In that main
function, it calls a constructor for the class Dog, and asks the object it
creates to do a number of things. But including the Dog header file doesn’t
actually make the source code available. First, compile the main object file:

clang++ -c main.cpp -o main.o

or

g++ -c main.cpp -o main.o

Then, try compiling dog_program:

clang++ main.o -o dog_program

or

g++ main.o -o dog_program

That’s what we did before for our Hello, world program, so what happened this
time? You got a bunch of “undefined reference” errors, and if you remember what
we said a few paragraphs up, “undefined reference” errors are pretty much
always linking errors. The linker is telling us that it can’t find the code for
the function Dog::bark() (or any Dog function), because main.o only contains
what was compiled from main.cpp, and the header it included merely declares
those functions. The solution is to compile a separate object file for the Dog
class. In general, you’ll have one object file per .cpp source file, compiled
together with its header file (.h or .hpp) and other necessary dependencies.
So let’s compile an object file for the Dog class.

clang++ -c dog.cpp

or

g++ -c dog.cpp

And then:

ls

You’ll see that it added a new file called dog.o, the object file for the
Dog class. (If you include the header in the compilation, you’ll also see a
.h.gch or .hpp.gch file. The .gch file is a precompiled header; all it means
is that in the future, when fulfilling an #include "dog.hpp" directive, the
precompiled header is used in preference.) So now if we wanted to link these
together, we would do this:

clang++ dog.o main.o -o dog_program

or

g++ dog.o main.o -o dog_program

And that should complete just fine. Try running it like so:

./dog_program

But what happens if we change something? If we change something in main.cpp,
like the Dog’s name, we have to recompile main.o and then run that final
linking command again. Likewise, if we change something in the Dog class
itself, like adding a new function, or changing an implementation, we have to
recompile the Dog object file, and then link it back to the main object file.
That may not seem like a big deal now, but it gets annoying extremely fast when
you have more than a single tiny class.

Introducing the program make

Those of you with some experience in compilation are probably aware of a common
Unix utility called make. It’s a program extremely widely used on Unix based
systems (Microsoft also has a Visual Studio spinoff called nmake), generally
to build executable program files from source files. (Don’t let the “expected
use” case fool you, though—make is not a program limited by the narrow
realm of compilation, as you’ll see before this tutorial is over.)

The best instruction is by example, so let’s build a basic Makefile for our
dog_program. Open a file called Makefile (make sure it’s titlecase—make
will recognise the lowercase makefile as well, but our autograder won’t, so
it’s good to get into the habit now) with your preferred text editor (mine is
emacs, yours may not be, so replace “emacs” with your editor of choice if you
disagree):

emacs Makefile

Note that you won’t see the new file in your directory until you save it.

Makefile rules are written in the format:

target : tgt_dependency1 tgt_dependency2 ...
	command

So if our target is dog.o, what are the dependencies (the files needed to make
the target)? They’re dog.cpp and dog.hpp, of course. And the command is the
same as the one we used to compile the object file to begin with. So our rule
for dog.o, the dog object file, will look like this:

dog.o : dog.cpp dog.hpp
	clang++ -c dog.cpp

Copy that into your new Makefile, and save it (for the makefile examples, I
won’t explicitly give you the gcc equivalents, but if you want to use gcc
instead, just replace all references to clang++ with g++). Now let’s write a
rule for main.o:

main.o : main.cpp
	clang++ -c main.cpp

Tabbing in makefiles

Remember: the tab is very important—if you don’t tab the second line of a
rule, you’ll get the error “*** missing separator. Stop.” Don’t forget your
tabs!

For the demonstration to have any real effect, first remove everything in the
directory besides dog.cpp, dog.hpp, main.cpp, and Makefile, and then execute
make.

rm dog.o main.o dog_program
make

If you ls now, you’ll see that it’s built the target dog.o (and left the
precompiled header as well). But what is make doing?

An aside about the order in which make interprets makefiles

When called, make will search the current directory for a file called Makefile
or makefile (again, for your sanity and grades, please only use Makefile,
titlecase, with a capitalised M). If it finds one, it will execute the first
rule in the file, and if one of the dependencies of the first target does not
yet exist, it will search for a rule that creates it. So for example, if I have
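a makefile like so (where command stands in for whatever each rule actually runs):

animal_assembly : moose goose cat
	command

moose :
	command

goose :
	command

cat :
	command

then make, when called with no arguments, will attempt to build the target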
animal_assembly. Assuming the dependencies moose, goose, and cat are
already available in the directory, it will completely ignore the rules for
them, and build animal_assembly from what’s present. If moose and cat are
available, but goose is not, it will note that moose is present, see that
goose is not present, look for a rule to build goose, find the rule, build
goose, and then note that cat is present and build animal_assembly. If
none of moose, goose, cat are present, it will have to build all of them
using the rules available.

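But what if the rule for moose came first instead?

moose :
	command

animal_assembly : moose goose cat
	command

goose :
	command

cat :
	command

Well, then if make is called with no arguments, it will make the target moose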
and stop. If you wanted it to make animal_assembly, you would then have to call
it like so:

make animal_assembly

So a good rule of thumb is to put the final and most important command (for our
purposes, the one that finally links the object files together into an
executable) at the top.

Now back to our dog example. For our dog program, what the above means is
that we should put the rule for the whole program at the top. How should we
write it? Well, perhaps as you’d expect at this point:

dog_program : dog.o main.o
	clang++ dog.o main.o -o dog_program

Put that at the top of your makefile, save it, and run make again.

make
ls

Now you should see the executable dog_program, which should behave as it has
in all previous post-compilation incarnations.

Now let’s do one final thing—in general, you should do this when writing your
own Makefiles, but it’s especially useful for instructive purposes: we’ll
write a clean rule.

clean :
	rm dog_program *.o

Add that to the bottom of your Makefile (as long as it’s not the top, it
doesn’t really matter, but in long Makefiles, you want to separate the
clean targets from real compilation-relevant targets for clarity), save it,
and run make again, passing clean as an argument to invoke the clean rule:

make clean
ls

What happened? We’ve deleted all of the executables and compilation byproducts
that we created, to clean up the directory. But the most notable thing about
this rule compared to the others we’ve seen is that it a.) lacks dependencies
and b.) doesn’t perform anything compilation-related in its command. Let’s talk
about those two things a bit.

The dependency list

The dependency list you write for a target exists so that make knows what
other targets to ensure you have before you run the command, but if the targets
are guaranteed to be present and make isn’t responsible for updating them,
make technically doesn’t need to check for anything. (It does not parse the
actual command you give it, so it will not know what files to look for based on
that.) Try deleting the dependency list of the target dog.o, and then
running:

make clean
make dog.o

Since dog.cpp and dog.hpp are present in the directory, and make doesn’t have
to build them itself (as it does for dog.o), make will never hit an error when
running that rule. The catch is that, without the dependency list, make no
longer rebuilds dog.o when dog.cpp or dog.hpp changes; as long as some dog.o
exists, it is considered up to date. But if you deleted the dependency list
for the target dog_program and ran:

make clean
make

make will output an error that the recipe for target 'dog_program' failed,
because dog.o was not in the dependency list, and make therefore did not
check to make sure it existed. As such, it didn’t bother to build it. As for
including dependencies that make will never have to build (such as .h/.hpp
and .cpp files), listing them is what lets make notice when a source file is
newer than its target, and rebuild only what is out of date. It’s also simply
good practice to document the dependencies of each target thoroughly. It’s
cleaner for other people to read, and it’s a good way for you to confirm that
you’re doing what you wanted to do, particularly late at night when the lines
start to blur together. And now onto point B.

make will run anything you ask it to, because it’s not as smart as you think it is

This is what we were referring to earlier, when we said make was not limited
to compilation-related commands. Let’s move over to a different directory, for
some make-related messing about.

cd ../file_meddling/
ls

As you can see, the Makefile is currently the only thing in this directory.
It’s a very small and simple one, so open it up with your favourite text
editor, and try guessing what it will do. It’s not compilation—it’s something
altogether much sillier. When you have your prediction, execute make:

make
ls

And now there’s a new file in the directory. The command

cat silly_file

will yield the somewhat accurate phrase “Hello, there is nothing important
here”—I say somewhat because while the file and indeed the phrase itself are
completely unimportant, the concept is, in fact, important. make is not a
magical program that intuits the mysterious delicacies of compilation by
parsing incomprehensible syntax and making anything more of it than what you
yourself put there. make is simply executing the command you gave it, and it
does so blindly, and without any particular personal interest in the results.
Feel free to execute the following now:

make move_file
ls

Now, when make executes the rule for the target move_file, it simply renames
the file silly_file to something even more ungainly. And finally:

make delete_file
ls

removes the file altogether. Usually a rule like this will be named clean,
and it’s very acceptable to stick to that convention for the rest of your life.
However to illustrate that there is nothing magical about the target name
clean (or indeed, any target name at all), in this Makefile, we are using the
clean target to populate our directory with junk. Try it:

make clean
ls

Note that there are now five empty junk files (the directory is not cleaner),
and feel free to remove them:

make really_clean

(For the future, it is recommended that this educational example not be taken
too deeply to heart. Conventions exist for a reason, and that reason is usually
to make everybody’s lives easier. It is always worth knowing, though, that
conventions are ultimately just that—conventions.)

Another important concept is understanding the control flow. In what order did
the commands have to run to create a new file and fill it with text?
Cheerfully, make will tell you which commands it’s executing as it executes
them, but don’t take that for granted. Walk through the Makefile yourself. In
fact, let’s do it together.

The first rule you hit is the rule for the target all. all is a phony target,
commonly used both in the real world and in CS225, placed at the top of a
Makefile, which, in its typical use case, will list all relevant targets
which produce executables as dependencies. This ensures that make will
compile all of the executables for which there are rules listed. In this case,
we’ve just put it at the top because we can. It, of course, is not currently
responsible for any executables.

When you read the rule for all, you see the dependency listed is
fill_file_with_nonsense. Obviously fill_file_with_nonsense doesn’t actually
exist in the directory, so we skip down to the rule for
fill_file_with_nonsense. The dependency listed is create_file, which also
isn’t a real file, so we skip to the rule for create_file, which tells us it
has no dependencies, and to touch silly_file. touch is a standard Unix
program that can create, as we have done here, an empty file.

Once that’s done, we can finish up the rule to “build” fill_file_with_nonsense,
which pipes the string “Hello, there is nothing important here” into the newly
created file silly_file.

Then we can finish up “building” the target all, for which the command is to
print the string “I have mostly created a lot of junk today!” to standard out.
And so it does. Take note that, of course, it “builds” none of the targets that
are not present in its direct control flow, so the unmentioned targets have to
be explicitly passed as arguments to make in order for it to build them.

Just to be really clear, let’s add another rule to our Makefile. Open the
Makefile in your text editor of choice, and write the rule open_file:

open_file :
	gedit another_silly_file

(If you do not have gedit installed, use another text editor.) Now run:

make open_file

and the gedit text editor will open another_silly_file. Feel free to make a
little change and run make open_file again. It will open the same file. And
because of our cleverly repetitive naming scheme, we can even delete it with

make delete_file

So hopefully now the basics are painfully clear. Let’s move on.

Marvelous macros

Now let’s gloss over a basic component of makefile syntax that we’ve so far
neglected to mention. Makefile syntax allows for a certain kind of variable
called a macro. Macros are useful in a standard makefile essentially for the
same reason that variables are useful in a normal program—they allow you to
quickly define parts of your program which will appear repeatedly, and if you
later decide to change that part of the program, well, it’s a single change,
rather than the countless changes that are possible in large makefiles. In this
class, you will never actually need macros to write an effective and mostly
unrepetitive makefile, but it’s not a bad habit to get into, so let’s see an
example.

cd ../macro_intro/

You may notice that our Hello, world example from ages ago has returned, and
now we have a makefile for it. Open up the Makefile. There’s some rather
strange syntax in here, so let’s try to break it down.

First, we’ve defined a macro called CXX. Unfortunately, this is a special
macro, so we’re going to ignore it briefly and jump to FLAGS. FLAGS is a
macro we defined to refer to the flags we’re passing our compiler; in this
case, the flag is -O, an optimisation option that turns on a series of other
flags which it’s not important for you to know right now (see the clang/gcc
documentation for that information). FLAGS of course isn’t restricted in
value to valid flags: we could have said FLAGS = some moose have large
antlers and make would have been perfectly happy with that, until the call
to clang++ failed later (you can try it out; make will actually try to
execute clang++ some moose have large antlers hello_world.cpp -o hello).

Now let’s talk about CXX. Not all macro names in the Makefile language are
completely without meaning—there is a certain set of names which do have a
default meaning. In this case, we’ve defined CXX = clang++. The CXX macro’s
default value is usually g++ on Linux systems, so if we never defined the
macro CXX, when we used it in the command to compile the executable, it would
have probably used g++ instead. Try running make right now, and you should
see the following output:

make
clang++ -O hello_world.cpp -o hello

But if you delete the line that says CXX = clang++, what happens?

make
g++ -O hello_world.cpp -o hello

Feel free to replace the line now.

When you call a macro, enclose it like so: $(MACRO). That’s simply makefile
language syntax. (You may have noticed that my example macro’s name was all
uppercase—as in fact, all of my macros thus far have been. This is not
syntactically required, but it is conventional.)

So that explains most of what’s going on in this file, but the strange symbols
$? and $@ remain, perhaps, mysteries. As you might guess, those are also
macros—they’re special predefined macros in the makefile language, with the
respective meanings “names of the dependencies (newer than the target)” and
“name of the target”, so in this case, $? refers to hello_world.cpp
(provided that you make clean before you make), and $@ refers to hello,
incidentally (purposefully) the name of the executable created as well. Using
shorthand like this is a good motivation to name targets after the file the
rule creates (this is, of course, also conventional, and increases the
readability of your Makefiles drastically). Special predefined macros aren’t
important for you to know—there are others we haven’t yet mentioned—but as
you go about life in CS225 and the real world, you are bound to come across
them.

Compiler and linker flags in CS225

For this class we are going to have a very standard set of flags to pass during
compilation and linking. We are going to define these as macros in each
assignment’s Makefile. Here is an example of what those look like (taken from
lab_intro):

# This defines our compiler and linker, as we've seen before.
CXX = clang++
LD = clang++
# These are the options we pass to the compiler.
# -std=c++1y means we want to use the C++14 standard (called 1y in this version of Clang).
# -stdlib=libc++ specifies that we want to use the standard library implementation called libc++
# -c specifies making an object file, as you saw before
# -g specifies that we want to include "debugging symbols" which allows us to use a debugging program.
# -O0 specifies to do no optimizations on our code.
# -Wall, -Wextra, and -pedantic tell the compiler to look out for common problems with our code. -Werror makes it so that these warnings stop compilation.
CXXFLAGS = -std=c++1y -stdlib=libc++ -c -g -O0 -Wall -Wextra -Werror -pedantic
# These are the options we pass to the linker.
# The first two are the same as the compiler flags.
# -l<something> tells the linker to look on the system for a pre-installed library to link with.
# Here we want to link with libpng (since we use it in our code), libc++abi (low-level support code that libc++ relies on), and pthread (the threading library). Remember libc++ is the standard library implementation.
LDFLAGS = -std=c++1y -stdlib=libc++ -lpng -lc++abi -lpthread

A final diversion: The makefile language is Turing complete?

Limited the uses may be for such information, but particularly thanks to its
support for lambda abstractions and combinators, the makefile language is
actually a complete functional programming language. Will you ever need to
write a Fibonacci number generator in the makefile language? Probably not, but
you certainly can.

cd ../functional_fun/
make

This will, of course, get quite slow as n gets large (the naive solution
takes exponential time), so I suggest you stop the process with a well timed
Ctrl-C as it begins to lag.

fin

That concludes the tutorial on compilation and Makefiles. If you have any
questions, please feel free to look up the concepts yourself, or take them to
the CS225 Piazza newsgroup, or ask your TAs or classmates for help.