Compiling a C++ file takes a very long time when compared to C# and Java. It takes significantly longer to compile a C++ file than it would to run a normal size Python script. I'm currently using VC++ but it's the same with any compiler. Why is this?

The two reasons I could think of were loading header files and running the preprocessor, but that doesn't seem like it should explain why it takes so long.

14 Answers

Header files

Every single compilation unit requires hundreds or even thousands of headers to be (1) loaded and (2) compiled.
Every one of them typically has to be recompiled for every compilation unit,
because the preprocessor means that the result of compiling a header can differ from one compilation unit to the next.
(A macro may be defined in one compilation unit which changes the content of the header).
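
To make the macro point concrete, here is a minimal sketch. The header name "ids.h", the macro USE_WIDE_IDS and the alias Id are all made up for illustration, and the header text is pasted inline (as the preprocessor would do) so the example fits in one file.

```cpp
#include <cstddef>
#include <type_traits>

// --- contents of a hypothetical header "ids.h", shown inline ---
#ifdef USE_WIDE_IDS
using Id = long long;   // a unit that defines USE_WIDE_IDS before the #include gets this
#else
using Id = int;         // every other unit gets this
#endif
// --- end of "ids.h" ---

// This translation unit never defined USE_WIDE_IDS, so it took the int branch.
static_assert(std::is_same<Id, int>::value, "Id is int in this unit");

// Size in bytes of an Id as seen by this translation unit.
constexpr std::size_t id_size() { return sizeof(Id); }
```

Since the same header text can produce different declarations depending on what was defined before the #include, the compiler cannot simply reuse the result from another compilation unit.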

This is probably the main reason, as it requires huge amounts of code to be compiled for every compilation unit,
and additionally, every header has to be compiled multiple times
(once for every compilation unit that includes it).

Linking

Once compiled, all the object files have to be linked together.
This is basically a monolithic process that can't very well be parallelized, and has to process your entire project.

Parsing

The syntax is extremely complicated to parse, depends heavily on context, and is very hard to disambiguate.
This takes a lot of time.

Templates

In C#, List<T> is the only type that is compiled, no matter how many instantiations of List you have in your program.
In C++, vector<int> is a completely separate type from vector<float>, and each one will have to be compiled separately.
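
A small sketch of what that means in practice. The helper names count_elements and demo_total are hypothetical; the point is only that each element type gets its own type and its own copy of the generated code.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Each distinct template argument produces a distinct class.
static_assert(!std::is_same<std::vector<int>, std::vector<float>>::value,
              "vector<int> and vector<float> are unrelated types");

// Hypothetical helper: the compiler instantiates one version per T that is used.
template <typename T>
std::size_t count_elements(const std::vector<T>& v) { return v.size(); }

// Two calls with different element types force two separate instantiations.
inline std::size_t demo_total() {
    std::vector<int> a{1, 2, 3};
    std::vector<float> b{1.0f, 2.0f};
    return count_elements(a) + count_elements(b);
}
```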

Add to this that templates make up a full Turing-complete "sub-language" that the compiler has to interpret,
and this can become ridiculously complicated.
Even relatively simple template metaprogramming code can define recursive templates that create dozens and dozens of template instantiations.
Templates may also result in extremely complex types, with ridiculously long names, adding a lot of extra work to the linker.
(It has to compare a lot of symbol names, and if these names can grow into many thousand characters, that can become fairly expensive).
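
As a rough illustration of a recursive template, here is the classic compile-time factorial (the name Factorial is just the usual textbook choice, not something from this answer). Asking for Factorial<10> makes the compiler instantiate eleven separate class types, Factorial<10> down to Factorial<0>.

```cpp
// Recursive case: Factorial<N> needs Factorial<N - 1> to be instantiated first.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Base case: an explicit specialization that stops the recursion.
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

// The whole computation happens inside the compiler.
static_assert(Factorial<10>::value == 3628800ULL, "evaluated at compile time");
```

None of these classes needs to exist in the final binary, but the compiler still has to create and then discard every one of them.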

And of course, they exacerbate the problems with header files, because templates generally have to be defined in headers,
which means far more code has to be parsed and compiled for every compilation unit.
In plain C code, a header typically only contains forward declarations, but very little actual code.
In C++, it is not uncommon for almost all the code to reside in header files.

Optimization

C++ allows for some very dramatic optimizations.
C# or Java don't allow classes to be completely eliminated (they have to be there for reflection purposes),
but even a simple C++ template metaprogram can easily generate dozens or hundreds of classes,
all of which are inlined and eliminated again in the optimization phase.

Moreover, a C++ program must be fully optimized by the compiler.
A C# program can rely on the JIT compiler to perform additional optimizations at load-time,
C++ doesn't get any such "second chances". What the compiler generates is as optimized as it's going to get.

Machine

C++ is compiled to machine code which may be somewhat more complicated than the bytecode Java or .NET use (especially in the case of x86).
(This is mentioned out of completeness only because it was mentioned in comments and such.
In practice, this step is unlikely to take more than a tiny fraction of the total compilation time).

Conclusion

Most of these factors are shared by C code, which actually compiles fairly efficiently.
The parsing step is a lot more complicated in C++, and can take up significantly more time, but the main offender is probably templates.
They're useful, and make C++ a far more powerful language, but they also take their toll in terms of compilation speed.

Regarding point 3: C compilation is noticeably faster than C++. It's definitely the frontend that causes the slowdown, and not the code generation.
– Tom Dec 7 '08 at 7:02

Regarding templates: not only must vector<int> be compiled separately from vector<double>, but vector<int> is also recompiled in each compilation unit that uses it. Redundant definitions are eliminated by the linker.
– David Rodríguez - dribeas Dec 31 '08 at 14:16

I haven't used Delphi or Kylix but back in the MS-DOS days, a Turbo Pascal program would compile almost instantaneously, while the equivalent Turbo C++ program would just crawl.

The two main differences were a very strong module system and a syntax that allowed single-pass compilation.

It's certainly possible that compilation speed just hasn't been a priority for C++ compiler developers, but there are also some inherent complications in the C/C++ syntax that make it more difficult to process. (I'm not an expert on C, but Walter Bright is, and after building various commercial C/C++ compilers, he created the D language. One of his changes was to enforce a context-free grammar to make the language easier to parse.)

Also, you'll notice that generally Makefiles are set up so that every file is compiled separately in C, so if 10 source files all use the same include file, that include file is processed 10 times.

It's interesting to compare Pascal, since Niklaus Wirth used the time it took the compiler to compile itself as a benchmark when designing his languages and compilers. There is a story that after carefully writing a module for fast symbol lookup, he replaced it with a simple linear search because the reduced code size made the compiler compile itself faster.
– Dietrich Epp Dec 17 '12 at 4:15

Parsing and code generation are actually rather fast. The real problem is opening and closing files. Remember, even with include guards, the compiler still has to open the .H file and read each line (and then ignore it).

A friend once (while bored at work), took his company's application and put everything -- all source and header files-- into one big file. Compile time dropped from 3 hours to 7 minutes.

Well, file access sure has a hand in this but as jalf said, the main reason for this will be something else, namely the repeated parsing of many, many, many (nested!) header files that completely drops out in your case.
– Konrad Rudolph Nov 25 '08 at 19:06

It is at that point that your friend needs to set up precompiled headers, break dependencies between different header files (try to avoid one header including another; forward declare instead) and get a faster HDD. That aside, a pretty amazing metric.
– Tom Leys Nov 25 '08 at 19:49

If the whole header file (except possible comments and empty lines) is within the header guards, gcc is able to remember the file and skip it if the correct symbol is defined.
– CesarB Nov 26 '08 at 1:03

Parsing is a big deal. For N pairs of similarly-sized source/header files with interdependencies, there are O(N^2) passes through header files. Putting all text into a single file is cutting down that duplicate parsing.
– Tom Dec 7 '08 at 7:07
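
The arithmetic behind that O(N^2) can be sketched as follows. The function names are hypothetical, and the model assumes the worst case the comment describes: N sources, N headers, and every source ends up including every header.

```cpp
// Separate compilation: each of the n source files re-reads all n headers.
constexpr unsigned long long passes_separate(unsigned long long n) { return n * n; }

// Everything in one file: include guards mean each header is parsed only once.
constexpr unsigned long long passes_combined(unsigned long long n) { return n; }

// For 100 source/header pairs: 10000 header passes versus 100.
static_assert(passes_separate(100) == 10000ULL, "quadratic in the worst case");
static_assert(passes_combined(100) == 100ULL, "linear when merged");
```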

Small side note: The include guards guard against multiple parsings per compilation unit. Not against multiple parsings overall.
– Marco van de Voort Jan 12 '12 at 11:52

The cost added by pre-processing is trivial. The major "other reason" for a slowdown is that compilation is split into separate tasks (one per object file), so common headers get processed over and over again. That's O(N^2) in the worst case, versus O(N) parsing time for most other languages.
– Tom Dec 7 '08 at 7:05

You could tell from the same argumentation that C, Pascal etc. compilers are slow, which is not true on average. It has more to do with C++'s grammar and the huge state that a C++ compiler has to maintain.
– Sebastian Mach Jun 10 '11 at 8:40

C is slow. It suffers from the same header parsing problem described in the accepted solution. E.g., take a simple Windows GUI program that includes windows.h in a few compilation units, and measure the compile performance as you add (short) compilation units.
– Marco van de Voort Dec 2 '14 at 12:18

Another reason is the use of the C pre-processor for locating declarations. Even with header guards, .h files still have to be parsed over and over, every time they're included. Some compilers support pre-compiled headers that can help with this, but they are not always used.
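
For readers who haven't seen a header guard, here is a minimal sketch. The header name "point.h" and the Point struct are hypothetical, and the header text is pasted twice to imitate two #include lines in the same unit.

```cpp
// First #include "point.h", as the preprocessor sees it:
#ifndef POINT_H
#define POINT_H
struct Point { int x; int y; };
#endif

// Second #include "point.h" in the same unit: the text is still read,
// but POINT_H is already defined, so everything up to #endif is discarded.
#ifndef POINT_H
#define POINT_H
struct Point { int x; int y; };
#endif

// Only one definition of Point survived, so this compiles.
inline int sum(const Point& p) { return p.x + p.y; }
```

The guard only prevents a second definition within one compilation unit; the next .cpp file starts from scratch and goes through the header again.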

I think you should bold the comment on precompiled headers to point out this IMPORTANT part of your answer.
– Kevin Nov 25 '08 at 18:37

If the whole header file (except possible comments and empty lines) is within the header guards, gcc is able to remember the file and skip it if the correct symbol is defined.
– CesarB Nov 26 '08 at 1:02

@CesarB: It still has to process it in full once per compilation unit (.cpp file).
– Sam Harwell Mar 25 '10 at 17:38

1) The infinite header reparsing. Already mentioned. Mitigations (like #pragma once) usually only work per compilation unit, not per build.

2) The fact that the toolchain is often separated into multiple binaries (make, preprocessor, compiler, assembler, archiver, impdef, linker, and dlltool in extreme cases) that all have to reinitialize and reload all state all the time for each invocation (compiler, assembler) or every couple of files (archiver, linker, and dlltool).

Note that John, the moderator of comp.compilers, seems to agree, and that this means it should be possible to achieve similar speeds for C too, if one integrates the toolchain fully and implements precompiled headers. Many commercial C compilers do this to some degree.

Note that the Unix model of factoring everything out to a separate binary is something of a worst-case model for Windows (with its slow process creation). It is very noticeable when comparing GCC build times between Windows and *nix, especially if the make/configure system also calls some programs just to obtain information.

A relatively large portion of software development time is not spent on writing, running, debugging or even designing code, but waiting for it to finish compiling.
In order to make things fast, we first have to understand what is happening when C/C++ software is compiled. The steps are roughly as follows:

Configuration

Build tool startup

Dependency checking

Compilation

Linking

We will now look at each step in more detail focusing on how they can be made faster.

Configuration

This is the first step when starting to build. It usually means running a configure script or CMake, Gyp, SCons or some other tool. This can take anything from one second to several minutes for very large Autotools-based configure scripts.

This step happens relatively rarely. It only needs to be run when the build configuration changes. Short of changing build systems, there is not much to be done to make this step faster.

Build tool startup

This is what happens when you run make or click on the build icon in an IDE (which is usually an alias for make). The build tool binary starts and reads its configuration files as well as the build configuration, which are usually the same thing.

Depending on build complexity and size, this can take anywhere from a fraction of a second to several seconds. By itself this would not be so bad. Unfortunately most make-based build systems cause make to be invoked tens to hundreds of times for every single build. Usually this is caused by recursive use of make (which is bad).

It should be noted that the reason Make is so slow is not an implementation bug. The syntax of Makefiles has some quirks that make a really fast implementation all but impossible. This problem is even more noticeable when combined with the next step.

Dependency checking

Once the build tool has read its configuration, it has to determine what files have changed and which ones need to be recompiled. The configuration files contain a directed acyclic graph describing the build dependencies. This graph is usually built during the configure step.

Build tool startup and dependency scanning happen on every single build. Their combined runtime determines the lower bound on the edit-compile-debug cycle. For small projects this time is usually a few seconds or so. This is tolerable.

There are alternatives to Make. The fastest of them is Ninja, which was built by Google engineers for Chromium.
If you are using CMake or Gyp to build, just switch to their Ninja backends. You don’t have to change anything in the build files themselves, just enjoy the speed boost. Ninja is not packaged on most distributions, though, so you might have to install it yourself.

Compilation

At this point we finally invoke the compiler. Cutting some corners, here are the approximate steps taken.

Merging includes

Parsing the code

Code generation/optimization

Contrary to popular belief, compiling C++ is not actually all that slow. The STL is slow and most build tools used to compile C++ are slow. However there are faster tools and ways to mitigate the slow parts of the language.

Using them takes a bit of elbow grease, but the benefits are undeniable. Faster build times lead to happier developers, more agility and, eventually, better code.

Especially true if BigClass happens to include 5 more files that it uses, eventually including all the code in your program.
– Tom Leys Nov 25 '08 at 19:50

This is perhaps one reason. But Pascal, for example, takes just a tenth of the compile time an equivalent C++ program takes. This is not because gcc's optimizations take longer but rather because Pascal is easier to parse and doesn't have to deal with a preprocessor. Also see the Digital Mars D compiler.
– Daniel O Mar 27 '09 at 10:20

It's not the easier parsing, it is the modularity that avoids reinterpreting windows.h and umpteen other headers for each compilation unit. Yes, Pascal parses easier (though mature ones, like Delphi are more complicated again), but that is not what makes the big difference.
– Marco van de Voort Nov 29 '13 at 15:34

The technique shown here which offers an improvement in compilation speed is known as forward declaration.
– DavidRR Apr 8 '15 at 13:51
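
As a rough illustration of forward declaration: BigClass is the name used in the comments above, while Holder and read_value are hypothetical. Everything is placed in one file here so the sketch compiles on its own; in a real project the pieces would be split across headers and .cpp files.

```cpp
class BigClass;  // forward declaration; stands in for #include "BigClass.h"

// Hypothetical user of BigClass. Because it only holds a pointer, the compiler
// does not need BigClass's full definition to compile this class.
class Holder {
public:
    explicit Holder(BigClass* big) : big_(big) {}
    BigClass* get() const { return big_; }
private:
    BigClass* big_;
};

// Normally only the .cpp files that really use BigClass would include its
// header. The full definition is written inline to keep the sketch in one file.
class BigClass {
public:
    int value = 42;
};

// Accessing a member requires the full definition, which is visible by now.
inline int read_value(const Holder& h) { return h.get()->value; }
```

A header written like Holder's does not drag BigClass.h (and whatever that includes) into every file that includes it.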

Writing classes in just one file... wouldn't that make for messy code?
– Fennekin Oct 22 '15 at 17:33

An easy way to reduce compilation time in larger C++ projects is to make a *.cpp include file that includes all the cpp files in your project and compile that. This way the header explosion is paid for only once. The advantage of this is that compilation errors will still reference the correct file.

For example, assume you have a.cpp, b.cpp and c.cpp.. create a file: everything.cpp:
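
Going by the description, the file would look something like this (a reconstruction from the text above, not a verbatim listing; it only compiles if a.cpp, b.cpp and c.cpp exist next to it):

```cpp
// everything.cpp
#include "a.cpp"
#include "b.cpp"
#include "c.cpp"
```

You would then compile only everything.cpp instead of the three individual files.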

I fail to see the objection to this method. Assuming you generate the includes from a script or Makefile, it is not a maintenance problem. It does in fact speed up compilation without obfuscating compilation issues. You could argue memory consumption on compilation but that is rarely an issue on a modern machine. So what is the objection to this approach (aside from the assertion that it's wrong)?
– rileyberton Mar 4 '13 at 1:33

@rileyberton (since someone upvoted your comment) let me spell it out: no it doesn't speed compilation up. In fact, it makes sure that any compile takes the maximum amount of time by not isolating translation units. The great thing about them is, that you don't need to recompile all .cpp-s if they didn't change. (That's disregarding stylistic arguments). Proper dependency management and perhaps precompiled headers are much much better.
– sehe Mar 4 '13 at 8:55

Sorry, but this can be a very efficient method for speeding up compilation, because you (1) pretty much eliminate linking, and (2) only have to process commonly used headers once. Also, it works in practice, if you bother to try it. Unfortunately, it makes incremental rebuilds impossible, so every build is completely from scratch. But a full rebuild with this method is a lot faster than what you'd get otherwise.
– jalf Mar 4 '13 at 9:00

@BartekBanachewicz sure, but what you said was that "it doesn't speed compilation up", with no qualifiers. As you said, it makes every compile take the maximum amount of time (no partial rebuilds), but at the same time, it dramatically reduces the maximum compared to what it'd otherwise be. I'm just saying it's a bit more nuanced than "don't do this"
– jalf Mar 4 '13 at 9:04

Have fun with static variables and functions. If I want a big compilation unit, I'll create a big .cpp file.
– gnasher729 Mar 27 '14 at 9:17

The trade-off you are getting is that the program runs a wee bit faster. That may be cold comfort to you during development, but it could matter a great deal once development is complete, and the program is just being run by users.

Most answers are a bit unclear on one point: C# will always run slower, because it pays at run time for work that C++ performs only once, at compile time. Performance is also affected by runtime dependencies (more things to load before the program can run), and C# programs will always have a higher memory footprint. As a result, performance is more closely tied to the capability of the available hardware. The same is true of other languages that are interpreted or depend on a VM.

There are two issues I can think of that might be affecting the speed at which your programs in C++ are compiling.

POSSIBLE ISSUE #1 - COMPILING THE HEADER: (This may or may not have already been addressed by another answer or comment.) Microsoft Visual C++ (A.K.A. VC++) supports precompiled headers, which I highly recommend. When you create a new project and select the type of program you are making, a setup wizard window should appear on your screen. If you hit the “Next >” button at the bottom of it, the window will take you to a page that has several lists of features; make sure that the box next to the “Precompiled header” option is checked. (NOTE: This has been my experience with Win32 console applications in C++, but this may not be the case with all kinds of programs in C++.)

POSSIBLE ISSUE #2 - THE LOCATION BEING COMPILED TO: This summer, I took a programming course, and we had to store all of our projects on 8GB flash drives, as the computers in the lab we were using got wiped every night at midnight, which would have erased all of our work. If you are compiling to an external storage device for the sake of portability/security/etc., it can take a very long time (even with the precompiled headers that I mentioned above) for your program to compile, especially if it’s a fairly large program. My advice for you in this case would be to create and compile programs on the hard drive of the computer you’re using, and whenever you want/need to stop working on your project(s) for whatever reason, transfer them to your external storage device, and then click the “Safely Remove Hardware and Eject Media” icon, which should appear as a small flash drive behind a little green circle with a white check mark on it, to disconnect it.

As already commented, the compiler spends a lot of time instantiating the same templates over and over again, to such an extent that there are projects that focus on that particular item, and claim an observable 30x speed-up in some really favorable cases. See http://www.zapcc.com.