Manage Physical Dependencies of a Project to reduce Compilation Time

How to manage Physical Dependencies of a Project to reduce Compilation Time

Introduction

In day-to-day work we split a project across several files rather than keeping it in a single file. We do this to reduce compilation time during development and to reuse code written in different files. Suppose a project has 10,000 lines of code in one file: if we change a single line, during development or afterwards, the compiler has to recompile all 10,000 lines. On today's computers that might not be a big problem, but it eventually becomes a nightmare as projects grow. If instead we split the project into several files, say 10 files of 1,000 lines each, then a change in one file should ideally not affect the other files.

During the development of a project we usually talk about the design of classes, discuss it in terms of design patterns, and describe the relationships among the classes. Most of the time, however, we are not concerned with the files in which those classes are written. It is not always worthwhile to study the physical design of a project, but when the project is very large it is inevitable.

Details

It is very common for a large-scale project to contain some general-purpose classes that are useful in other projects too. The natural way to reuse those classes in other projects is to put them in separate files. It is common practice among C++ users to make two files for one class: a definition file (header file) that contains the class definition, and an implementation file that contains the implementation of the class and its member functions. A Point class, for example, would be split into Point.H and Point.CPP.
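
The original listing for the Point class is not available, so the following is only a minimal sketch of the two-file layout described above. The member names (`GetX`, `GetY`, `Move`, `m_x`, `m_y`) and the include guard name are assumptions made for illustration. Both files are shown in one listing; in a real project Point.CPP would start with `#include "Point.H"`.

```cpp
// ---- Point.H : the definition (header) file ----
#ifndef POINT_H
#define POINT_H

class Point {
public:
    Point(int x, int y);
    int GetX() const;
    int GetY() const;
    void Move(int dx, int dy);  // shift the point by (dx, dy)

private:
    int m_x;
    int m_y;
};

#endif // POINT_H

// ---- Point.CPP : the implementation file ----
// In a real project this file would begin with: #include "Point.H"
Point::Point(int x, int y) : m_x(x), m_y(y) {}

int Point::GetX() const { return m_x; }
int Point::GetY() const { return m_y; }

void Point::Move(int dx, int dy) {
    m_x += dx;
    m_y += dy;
}
```

Code that only uses Point needs nothing but Point.H; the bodies in Point.CPP can change without touching the header.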

But if you did not program carefully, it is sometimes not possible to just include these two files in another project and use them. One of the most common problems is that you have to include other definition files in your project as well, files you would not otherwise need. Those files may in turn need still other files, so in the end you may have to include a bunch of files just to use one single class.

One example of this is a Database class that lives in some library or DLL: to use it you might also need to include the definition files of other classes in that library, such as RecordSet.H, DBFactory.H and DBException.H. The situation is even worse if you have to include the definition files of different database classes, such as OracleInterface.H, SQLInterface.H and SybaseInterface.H.

It is worth looking carefully at which files are included in which files, and especially at which files are included in a definition file (header file). If you change anything in a definition file, every file that includes it, whether a definition file or an implementation file, has to be recompiled. From the compiler's perspective, a CPP file with all preprocessor directives expanded is called a translation unit. In other words, a translation unit is an implementation file together with all the definition files it includes. Camera.CPP plus every header it pulls in, for example, forms one translation unit.

Now if you change anything in any definition file that is included in Camera.CPP, you have changed this translation unit, and it has to be recompiled. The situation becomes more serious if those definition files are included in more than one translation unit: a change in one definition file then forces all of those translation units to be recompiled.
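
The original listing of the translation unit is not available. As a rough sketch, this is what the compiler might see for Camera.CPP once every `#include` line has been replaced by the text of the header. The contents of the three classes (`origin`, `width`, `height`, `ViewArea`) are hypothetical and chosen only to keep the example short.

```cpp
// Sketch of the Camera.CPP translation unit after preprocessing.

// ---- text that came from Point.H ----
struct Point {
    int x;
    int y;
};

// ---- text that came from ViewPort.H ----
struct ViewPort {
    Point origin;   // holds a Point by value, so Point must be defined above
    int width;
    int height;
};

// ---- text that came from Camera.H ----
class Camera {
public:
    explicit Camera(const ViewPort& viewPort);
    int ViewArea() const;   // width * height of the view port

private:
    ViewPort m_viewPort;
};

// ---- Camera.CPP's own code ----
Camera::Camera(const ViewPort& viewPort) : m_viewPort(viewPort) {}

int Camera::ViewArea() const {
    return m_viewPort.width * m_viewPort.height;
}
```

Editing any of the three header sections changes this text, which is why the whole unit has to be compiled again.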

Changes in definition files can be minimized if we use them only for definitions, not for implementation.

"In other words, the implementation of a function should not be in a header file, even if it is only a one-line function. If performance is a concern, the function can be declared inline explicitly. If only the implementation of a function changes, the compiler then recompiles only that one translation unit".

However, when the body does sit in the header, as it must for a function that is meant to be inlined across translation units, a change to the implementation means recompiling every translation unit that includes this header file.
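
A minimal sketch of the two cases, with hypothetical member names (`GetX`, `DistanceSquared`). Both files are shown in one listing; in a real project Point.CPP would include Point.H.

```cpp
// ---- Point.H ----
class Point {
public:
    Point(int x, int y);
    int GetX() const;             // defined inline below, in the header
    int DistanceSquared() const;  // defined in Point.CPP

private:
    int m_x;
    int m_y;
};

// Explicitly inline: the body lives in the header, so editing it
// recompiles every translation unit that includes Point.H.
inline int Point::GetX() const { return m_x; }

// ---- Point.CPP ----
// In a real project this file would begin with: #include "Point.H"
Point::Point(int x, int y) : m_x(x), m_y(y) {}

// Out of line: editing this body recompiles only Point.CPP.
int Point::DistanceSquared() const {
    return m_x * m_x + m_y * m_y;
}
```

The trade-off is the one described above: the inline accessor may run faster, but every change to it is paid for in all translation units that include the header.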

If one header file is included in another header file, then a change to the first header file affects every file that includes either the first or the second. The situation becomes even worse when a header file includes another header file, which includes yet another, and so on. A change in one file may then force recompilation not of one file only but of the whole project. This diagram shows the concept clearly.

Even though your Camera class does not include Point.H or ViewPort.H directly, they are in fact part of the Camera translation unit. A change in the Point header file will therefore recompile not only the Camera translation unit but all translation units in this example.

The basic rule of thumb for minimizing physical dependencies is:

"Try to avoid including a header file within a header file unless you have no other option".

But how can we keep the compiler happy when we are not including the header file? To answer this question, we first need to understand in which cases we are forced to include a header file and in which cases we can avoid it.

You have to include the header file when you need the full detail of the class. In other words, you have to include the header file when you access a member function or variable of a class, inherit from a class, or aggregate its object in another class. We have already decided not to write implementation code in header files, so the first case is eliminated automatically. If a class uses another class only inside its member functions (for example by creating a local object of it), takes it as a parameter, or holds a pointer to it, then its header file does not need to include that class's header file; only the implementation file does. To keep the compiler happy, you can simply forward declare that class in the header file. Now we can restate our basic rule for minimizing physical dependencies:

"Use a forward declaration instead of including a header file wherever possible, that is, whenever you are not inheriting from a class or aggregating it in another class".

In this example we have to include the Point header file in the ViewPort header file, because ViewPort needs the full definition of Point.

You still have to include the ViewPort header file in the Transformation implementation file, because there is no way to avoid that. But the situation is a little better: a change in Point.H no longer propagates to all translation units. At least it has no effect on the translation units that include only Transformation.H.
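
The original listings for this example are not available, so the following is a sketch under assumptions: ViewPort is assumed to hold a Point by value, Transformation is assumed to take a ViewPort by reference, and all member names are hypothetical. Transformation.H is placed first on purpose, so that ViewPort really is an incomplete type where the forward declaration is used.

```cpp
// ---- Transformation.H ----
class ViewPort;   // forward declaration replaces #include "ViewPort.H"

class Transformation {
public:
    Transformation(int dx, int dy);
    void Apply(ViewPort& viewPort) const;  // a reference needs only the name

private:
    int m_dx;
    int m_dy;
};

// ---- Point.H ----
struct Point {
    int x;
    int y;
};

// ---- ViewPort.H ---- (a real ViewPort.H must #include "Point.H")
class ViewPort {
public:
    explicit ViewPort(const Point& origin);
    void MoveBy(int dx, int dy);
    Point Origin() const;

private:
    Point m_origin;   // by-value member: needs the complete Point type
};

// ---- ViewPort.CPP ---- (would #include "ViewPort.H")
ViewPort::ViewPort(const Point& origin) : m_origin(origin) {}
void ViewPort::MoveBy(int dx, int dy) { m_origin.x += dx; m_origin.y += dy; }
Point ViewPort::Origin() const { return m_origin; }

// ---- Transformation.CPP ----
// Would #include "Transformation.H" and "ViewPort.H": calling MoveBy
// requires the full ViewPort definition, but only in this one file.
Transformation::Transformation(int dx, int dy) : m_dx(dx), m_dy(dy) {}
void Transformation::Apply(ViewPort& viewPort) const {
    viewPort.MoveBy(m_dx, m_dy);
}
```

A file that includes only Transformation.H never sees Point.H, so editing Point.H does not force it to recompile.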

You can reduce the physical dependencies further by holding a pointer to a class rather than an object of the class. With a pointer the compiler does not need the full detail of the class in the header file, so the include can be eliminated from the header entirely.

But in this case you have to create and destroy the object yourself, and there is the extra overhead of a function call. In addition, this physical design might not fit your logical design very well: because you are not using inheritance, you cannot access the protected data of the class and cannot override its virtual functions. This technique is also known as the "Pointer to Implementation" principle, or "PImpl" principle for short.

There is one apparent way to avoid including a header within a header: include all the required header files in the CPP file before it includes its own header file. Take the example above, where ViewPort.H needs Point.H. Include Point.H in ViewPort.CPP before including ViewPort.H.
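
A minimal sketch of the pointer approach, assuming ViewPort owns a single Point. The member names are hypothetical, and copying is simply disabled here to keep the ownership rules short.

```cpp
// ---- ViewPort.H ----
class Point;   // forward declaration: Point.H is not included here

class ViewPort {
public:
    ViewPort(int x, int y);
    ~ViewPort();
    int OriginX() const;   // reaches the Point through the pointer
    int OriginY() const;

private:
    ViewPort(const ViewPort&);             // copying is disabled in this sketch
    ViewPort& operator=(const ViewPort&);  // (declared private, never defined)

    Point* m_origin;   // pointer member: an incomplete type is enough
};

// ---- Point.H ----
struct Point {
    int x;
    int y;
};

// ---- ViewPort.CPP ---- (would #include "ViewPort.H" and "Point.H")
ViewPort::ViewPort(int x, int y) : m_origin(new Point()) {
    m_origin->x = x;
    m_origin->y = y;
}

// The object was created by hand, so it must be destroyed by hand.
ViewPort::~ViewPort() { delete m_origin; }

int ViewPort::OriginX() const { return m_origin->x; }
int ViewPort::OriginY() const { return m_origin->y; }
```

Only ViewPort.CPP depends on Point.H now; the price is the manual `new`/`delete` and the extra indirection on every access.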

// ViewPort.CPP
#include "Point.h"
#include "ViewPort.h"

After preprocessing, the compiler sees the text of Point.H followed by the text of ViewPort.H, and it compiles this translation unit happily.

But there are two problems with this approach. First, you have to include the header files in the proper order; that is, you have to remember the dependencies among the header files, and the program will not compile if you include all the required header files but in the wrong order. The second problem is even more serious: if you want to use ViewPort.H in any other translation unit, that translation unit will not compile until you include Point.H as well. From a physical point of view you have not changed anything, yet you have created more problems by introducing dependencies among header files that are hard to remember. Here is one more rule of thumb for managing physical dependencies:

"Never make any files which are dependent on the order of header file inclusion."

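One way to follow this rule, sketched here under the assumption that each header carries an include guard and includes whatever it needs itself: the listing shows, by hand, what the preprocessor would produce when a CPP file includes ViewPort.H first and Point.H second. The guard names and the `Area` function are hypothetical.

```cpp
// ---- ViewPort.CPP, expanded by hand ----

// #include "ViewPort.H"
#ifndef VIEWPORT_H
#define VIEWPORT_H
// ViewPort.H includes what it needs itself: #include "Point.H"
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif // POINT_H
struct ViewPort {
    Point origin;
    int width;
    int height;
};
#endif // VIEWPORT_H

// #include "Point.H"   (included again, this time after ViewPort.H)
#ifndef POINT_H
#define POINT_H
struct Point {   // skipped: POINT_H is already defined
    int x;
    int y;
};
#endif // POINT_H

int Area(const ViewPort& viewPort) {
    return viewPort.width * viewPort.height;
}
```

Because the second copy of Point.H is skipped by its guard, the two includes can appear in either order, and ViewPort.H can be used on its own in any other translation unit.
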

Comments and Discussions

One way you can help keep your dependencies in check is to constantly monitor them - our company produces a product called IncludeManager which draws a graph of your header file includes in real time inside Visual Studio.

This allows you to keep tabs on your dependencies and keep them in check that much easier.

As the subject says, this is a good start. Your last statement regarding inclusion order dependency is especially true !

In my apps I take things a bit further. First, my include file guards are a bit different than most people's. I use a standard form that I first saw in Borland's headers and I see in some of Microsoft's. Here it is :

Having a standard format makes it very easy to remember. I totally despise the format used in MFC apps and changing them is one of the first things that I always do.

Then I make sure that every module can compile with a given header file without depending on another one or their order. This is similar to what you are saying. A specific example: say you have an object derived from a base class. The header for the derived class would look like this :

This way, you can include the derived class' header safely and easily.

I also do this with dialogs. Say the dialog's ID was IDD_MY_DIALOG. I would put this in the dialog's header :

#ifndef IDD_MY_DIALOG
#include "Resource.h"
#endif

I do not put an include of resource.h in the app's header. Including that should NOT be a requirement of every module.

By having a standard format for nested header file inclusion I find that I have very few problems when using this scheme. Using external guard protectors in the headers means they are always safe. I only see errors in the actual code modules and it is a simple matter to delete the offending line if it pops up because it is always easy to find. This format also improves the compilation speed considerably. I once converted a fairly large application to this format and I saw a huge reduction in the compile time.

Although I already knew everything that's said in the article (therefore it got a 4 instead of a 5), it's good to see an overview that explains everything carefully.

Although it's all true what is said in the article, I see lot of other approaches and suggestions:

Some people (including John Robbins of the excellent "Debugging Applications" book) suggest to put most of the seldom-changed include files in one header file. Then include this header file everywhere and use precompiled headers.

I noticed myself that using templates in a header file can drastically increase compile time (especially when using the Boost template classes).

I also noticed that the combination precompiled headers + full debug seems to make object files 4 times bigger (as compared to only having full debug and no precompiled headers). It also blows up the size of the .pdb file (500MB and more for a modest application).
This increase is not seen if the application is built without debug.

What does that mean?
Should we put everything (that is not regularly changed) in one precompiled header file? Should we keep templates out of that precompiled header file? Or should we have lots of small header files and no precompiled headers? ...
Is there a header-file-expert in the house?