C++ Programming/Chapter Beyond the Standard Print version

If you either print this page, choose "Print preview" in your browser, or click Printable version, you will see the page without this notice, without navigational elements to the left or top, and without the TOC boxes on each page.

For more information about the print version, including how to create a truly printable PDF File, see Help:Print versions.

This print version may present an alternative structure and may only contain a subsection of this book and/or those parts, which are well-developed and not disputed. You may add to this print version any chapters of the book that have improved sufficiently to be included under the appropriate heading. See the discussion page for discussion about this excerpt.

This work is licensed under the Creative CommonsAttribution-Share Alike 3.0 Unported license. In short: you are free to share and to make derivatives of this work under the conditions that you appropriately attribute it, and that you only distribute it under the same, similar or a compatible license. Any of the above conditions can be waived if you get permission from the copyright holder.

Unless otherwise noted media and source code and source code usable in stand alone form have their own copyrights and different licenses (but compatible with the Copyleft nature of the work). Media may be under Fair use, that is not recognized in some jurisdictions. Source code or scripts not listing copyrights or license information shall be considered in the public domain.

Beyond the Standard

Resource Acquisition Is Initialization (RAII)

The RAII technique is often used for controlling thread locks in multi-threaded applications. Another typical example of RAII is file operations, e.g. the C++ standard library's file-streams. An input file stream is opened in the object's constructor, and it is closed upon destruction of the object. Since C++ allows objects to be allocated on the stack, C++'s scoping mechanism can be used to control file access.

With RAII we can use, for instance, Class destructors to guarantee clean up, similar to the finally keyword in other languages. Doing this automates the task and so avoids errors but gives the freedom not to use it.

RAII is also used (as shown in the example below) to ensure exception safety. RAII makes it possible to avoid resource leaks without extensive use of try/catch blocks and is widely used in the software industry.

The ownership of dynamically allocated memory (memory allocated with new) can be controlled with RAII. For this purpose, the C++ Standard Library defines auto ptr. Furthermore, lifetime of shared objects can be managed by a smart pointer with shared-ownership semantics such as boost::shared_ptr defined in C++ by the Boost library or policy based Loki::SmartPtr from Loki library.

The following RAII class is a lightweight wrapper to the C standard library file system calls.

Without using RAII, each function using an output log would have to manage the file explicitly. For example, an equivalent implementation without using RAII is this:

void example_without_RAII(){// open file
std::FILE* file_handle = std::fopen("logfile.txt", "w+");if( file_handle ==NULL){throw open_error();}try{if( std::fputs("hello logfile!", file_handle)==EOF){throw write_error();}// continue writing to logfile.txt ... do not return// prematurely, as cleanup happens at the end of this function}catch(...){// manually close logfile.txt
std::fclose(file_handle);// re-throw the exception we just caughtthrow;}// manually close logfile.txt
std::fclose(file_handle);}

The implementation of file and example_without_RAII() becomes more complex if fopen() and fclose() could potentially throw exceptions; example_with_RAII() would be unaffected, however.

The essence of the RAII idiom is that the class file encapsulates the management of any finite resource, like the FILE* file handle. It guarantees that the resource will properly be disposed of at function exit. Furthermore, file instances guarantee that a valid log file is available (by throwing an exception if the file could not be opened).

There's also a big problem in the presence of exceptions: in example_without_RAII(), if more than one resource were allocated, but an exception was to be thrown between their allocations, there's no general way to know which resources need to be released in the final catch block - and releasing a not-allocated resource is usually a bad thing. RAII takes care of this problem; the automatic variables are destructed in the reverse order of their construction, and an object is only destructed if it was fully constructed (no exception was thrown inside its constructor). So example_without_RAII() can never be as safe as example_with_RAII() without special coding for each situation, such as checking for invalid default values or nesting try-catch blocks. Indeed, it should be noted that example_without_RAII() contained resource bugs in previous versions of this article.

This frees example_with_RAII() from explicitly managing the resource as would otherwise be required. When several functions use file, this simplifies and reduces overall code size and helps ensure program correctness.

example_without_RAII() resembles the idiom used for resource management in non-RAII languages such as Java. While Java's try-finally blocks allow for the correct release of resources, the burden nonetheless falls on the programmer to ensure correct behavior, as each and every function using file may explicitly demand the destruction of the log file with a try-finally block.

Garbage collection

Garbage collection is a form of automatic memory management. The garbage collector or collector attempts to reclaim garbage, or memory used by objects that will never be accessed or mutated again by the application.

Tracing garbage collectors require some implicit runtime overhead that may be beyond the control of the programmer, and can sometimes lead to performance problems. For example, commonly used Stop-The-World garbage collectors, which pause program execution at arbitrary times, may make garbage collecting languages inappropriate for some embedded systems, high-performance server software, and applications with real-time needs.

A more fundamental issue is that garbage collectors violate locality of reference, since they deliberately go out of their way to find bits of memory that haven't been accessed recently. The performance of modern computer architectures is increasingly tied to caching, which depends on the assumption of locality of reference for its effectiveness. Some garbage collection methods result in better locality of reference than others. Generational garbage collection is relatively cache-friendly, and copying collectors automatically defragment memory helping to keep related data together. Nonetheless, poorly timed garbage collection cycles could have a severe performance impact on some computations, and for this reason many runtime systems provide mechanisms that allow the program to temporarily suspend, delay or activate garbage collection cycles.

Despite these issues, for many practical purposes, allocation/deallocation-intensive algorithms implemented in modern garbage collected languages can actually be faster than their equivalents using explicit memory management (at least without heroic optimizations by an expert programmer). A major reason for this is that the garbage collector allows the runtime system to amortize allocation and deallocation operations in a potentially advantageous fashion. For example, consider the following program in C++:

A design pattern is neither a static solution, nor is it an algorithm. A pattern is a way to describe and address by name (mostly a simplistic description of its goal), a repeatable solution or approach to a common design problem, that is, a common way to solve a generic problem (how generic or complex, depends on how restricted the target goal is). Patterns can emerge on their own or by design. This is why design patterns are useful as an abstraction over the implementation and a help at design stage. With this concept, an easier way to facilitate communication over a design choice as normalization technique is given so that every person can share the design concept.

Depending on the design problem they address, design patterns can be classified in different categories, of which the main categories are:

Patterns are commonly found in objected-oriented programming languages like C++ or Java. They can be seen as a template for how to solve a problem that occurs in many different situations or applications. It is not code reuse, as it usually does not specify code, but code can be easily created from a design pattern. Object-oriented design patterns typically show relationships and interactions between classes or objects without specifying the final application classes or objects that are involved.

Each design pattern consists of the following parts:

Problem/requirement

To use a design pattern, we need to go through a mini analysis design that may be coded to test out the solution. This section states the requirements of the problem we want to solve. This is usually a common problem that will occur in more than one application.

Forces

This section states the technological boundaries, that helps and guides the creation of the solution.

Solution

This section describes how to write the code to solve the above problem. This is the design part of the design pattern. It may contain class diagrams, sequence diagrams, and or whatever is needed to describe how to code the solution.

Design patterns can be considered as a standardization of commonly agreed best practices to solve specific design problems. One should understand them as a way to implement good design patterns within applications. Doing so will reduce the use of inefficient and obscure solutions[citation needed]. Using design patterns speeds up your design and helps to communicate it to other programmers[citation needed].

Creational Patterns

In software engineering, creational design patterns are design patterns that deal with object creation mechanisms, trying to create objects in a manner suitable to the situation. The basic form of object creation could result in design problems or added complexity to the design. Creational design patterns solve this problem by somehow controlling this object creation.

In this section of the book we assume that the reader has enough familiarity with functions, global variables, stack vs. heap, classes, pointers, and static member functions as introduced before.

As we will see there are several creational design patterns, and all will deal with a specific implementation task, that will create a higher level of abstraction to the code base, we will now cover each one.

Builder

The Builder Creational Pattern is used to separate the construction of a complex object from its representation so that the same construction process can create different objects representations.

Problem

We want to construct a complex object, however we do not want to have a complex constructor member or one that would need many arguments.

Solution

Define an intermediate object whose member functions define the desired object part by part before the object is available to the client. Builder Pattern lets us defer the construction of the object until all the options for creation have been specified.

Factory

Definition: A utility class that creates an instance of a class from a family of derived classes

Abstract Factory

Definition: A utility class that creates an instance of several families of classes. It can also return a factory for a certain group.

The Factory Design Pattern is useful in a situation that requires the creation of many different types of objects, all derived from a common base type. The Factory Method defines a method for creating the objects, which subclasses can then override to specify the derived type that will be created. Thus, at run time, the Factory Method can be passed a description of a desired object (e.g., a string read from user input) and return a base class pointer to a new instance of that object. The pattern works best when a well-designed interface is used for the base class, so there is no need to cast the returned object.

Problem

We want to decide at run time what object is to be created based on some configuration or application parameter. When we write the code, we do not know what class should be instantiated.

Solution

Define an interface for creating an object, but let subclasses decide which class to instantiate. Factory Method lets a class defer instantiation to subclasses.

In the following example, a factory method is used to create laptop or desktop computer objects at run time.

Let's start by defining Computer, which is an abstract base class (interface) and its derived classes: Laptop and Desktop.

class Computer
{public:virtualvoid Run()=0;virtualvoid Stop()=0;virtual ~Computer(){};/* without this, you do not call Laptop or Desktop destructor in this example! */};class Laptop:public Computer
{public:virtualvoid Run(){mHibernating =false;};virtualvoid Stop(){mHibernating =true;};virtual ~Laptop(){};/* because we have virtual functions, we need virtual destructor */private:bool mHibernating;// Whether or not the machine is hibernating};class Desktop:public Computer
{public:virtualvoid Run(){mOn =true;};virtualvoid Stop(){mOn =false;};virtual ~Desktop(){};private:bool mOn;// Whether or not the machine has been turned on};

The actual ComputerFactory class returns a Computer, given a real world description of the object.

Let's analyze the benefits of this design. First, there is a compilation benefit. If we move the interface Computer into a separate header file with the factory, we can then move the implementation of the NewComputer() function into a separate implementation file. Now the implementation file for NewComputer() is the only one that requires knowledge of the derived classes. Thus, if a change is made to any derived class of Computer, or a new Computer subtype is added, the implementation file for NewComputer() is the only file that needs to be recompiled. Everyone who uses the factory will only care about the interface, which should remain consistent throughout the life of the application.

Also, if there is a need to add a class, and the user is requesting objects through a user interface, no code calling the factory may be required to change to support the additional computer type. The code using the factory would simply pass on the new string to the factory, and allow the factory to handle the new types entirely.

Imagine programming a video game, where you would like to add new types of enemies in the future, each of which has different AI functions and can update differently. By using a factory method, the controller of the program can call to the factory to create the enemies, without any dependency or knowledge of the actual types of enemies. Now, future developers can create new enemies, with new AI controls and new drawing member functions, add it to the factory, and create a level which calls the factory, asking for the enemies by name. Combine this method with an XML description of levels, and developers could create new levels without having to recompile their program. All this, thanks to the separation of creation of objects from the usage of objects.

Prototype

A prototype pattern is used in software development when the type of objects to create is determined by a prototypical instance, which is cloned to produce new objects. This pattern is used, for example, when the inherent cost of creating a new object in the standard way (e.g., using the new keyword) is prohibitively expensive for a given application.

Implementation: Declare an abstract base class that specifies a pure virtual clone() method. Any class that needs a "polymorphic constructor" capability derives itself from the abstract base class, and implements the clone() operation.

Here the client code first invokes the factory method. This factory method, depending on the parameter, finds out the concrete class. On this concrete class, the clone() method is called and the object is returned by the factory method.

This is sample code which is a sample implementation of Prototype method. We have the detailed description of all the components here.

Record class, which is a pure virtual class that has a pure virtual method clone().

CarRecord, BikeRecord and PersonRecord as concrete implementation of a Record class.

An enum RECORD_TYPE_en as one to one mapping of each concrete implementation of Record class.

RecordFactory class that has a Factory method CreateRecord(…). This method requires an enum RECORD_TYPE_en as parameter and depending on this parameter it returns the concrete implementation of Record class.

To implement the pattern, declare an abstract base class that specifies a pure virtual clone() member function. Any class that needs a "polymorphic constructor" capability derives itself from the abstract base class, and implements the clone() operation.

The client, instead of writing code that invokes the new operator on a hard-wired class name, calls the clone() member function on the prototype, calls a factory member function with a parameter designating the particular concrete derived class desired, or invokes the clone() member function through some mechanism provided by another design pattern.

A client of one of the concrete monster classes only needs a reference (pointer) to a CPrototypeMonster class object to be able to call the ‘Clone’ function and create copies of that object. The function below demonstrates this concept:

Now originalMonster can be passed as a pointer to CGreenMonster, CPurpleMonster or CBellyMonster.

Singleton

The Singleton pattern ensures that a class has only one instance and provides a global point of access to that instance. It is named after the singleton set, which is defined to be a set containing one element. This is useful when exactly one object is needed to coordinate actions across the system.

Check list

Define a private static attribute in the "single instance" class.

Define a public static accessor function in the class.

Do "lazy initialization" (creation on first use) in the accessor function.

Define all constructors to be protected or private.

Clients may only use the accessor function to manipulate the Singleton.

Let's take a look at how a Singleton differs from other variable types.

Like a global variable, the Singleton exists outside of the scope of any functions. Traditional implementation uses a static member function of the Singleton class, which will create a single instance of the Singleton class on the first call, and forever return that instance. The following code example illustrates the elements of a C++ singleton class, that simply stores a single string.

class StringSingleton
{public:// Some accessor functions for the class, itself
std::string GetString()const{return mString;}void SetString(const std::string&newStr){mString = newStr;}// The magic function, which allows access to the class from anywhere// To get the value of the instance of the class, call:// StringSingleton::Instance().GetString();static StringSingleton &Instance(){// This line only runs once, thus creating the only instance in existencestatic StringSingleton *instance =new StringSingleton;// dereferencing the variable here, saves the caller from having to use // the arrow operator, and removes temptation to try and delete the // returned instance.return*instance;// always returns the same instance}private:// We need to make some given functions private to finish the definition of the singleton
StringSingleton(){}// default constructor available only to members or friends of this class// Note that the next two functions are not given bodies, thus any attempt // to call them implicitly will return as compiler errors. This prevents // accidental copying of the only instance of the class.
StringSingleton(const StringSingleton &old);// disallow copy constructorconst StringSingleton &operator=(const StringSingleton &old);//disallow assignment operator// Note that although this should be allowed, // some compilers may not implement private destructors// This prevents others from deleting our one single instance, which was otherwise created on the heap
~StringSingleton(){}private:// private data for an instance of this class
std::string mString;};

Variations of Singletons:

To do:
Discussion of Meyers Singleton and any other variations.

Applications of Singleton Class:

One common use of the singleton design pattern is for application configurations. Configurations may need to be accessible globally, and future expansions to the application configurations may be needed. The subset C's closest alternative would be to create a single global struct. This had the lack of clarity as to where this object was instantiated, as well as not guaranteeing the existence of the object.

Take, for example, the situation of another developer using your singleton inside the constructor of their object. Then, yet another developer decides to create an instance of the second class in the global scope. If you had simply used a global variable, the order of linking would then matter. Since your global will be accessed, possibly before main begins executing, there is no definition as to whether the global is initialized, or the constructor of the second class is called first. This behavior can then change with slight modifications to other areas of code, which would change order of global code execution. Such an error can be very hard to debug. But, with use of the singleton, the first time the object is accessed, the object will also be created. You now have an object which will always exist, in relation to being used, and will never exist if never used.

A second common use of this class is in updating old code to work in a new architecture. Since developers may have used globals liberally, moving them into a single class and making it a singleton, can be an intermediary step to bring the program inline to stronger object oriented structure.

Note:
In the above example, the first call to Singleton::GetInstance will initialize the singleton instance. This example is for illustrative purposes only; for anything but a trivial example program, this code contains errors.

Structural Patterns

Adapter

Convert the interface of a class into another interface that clients expect. Adapter lets classes work together that couldn't otherwise because of incompatible interfaces.

Composite

Composite lets clients treat individual objects and compositions of objects uniformly. The Composite pattern can represent both the conditions. In this pattern, one can develop tree structures for representing part-whole hierarchies.

Decorator

The decorator pattern helps to attach additional behavior or responsibilities to an object dynamically. Decorators provide a flexible alternative to subclassing for extending functionality. This is also called “Wrapper”. If your application does some kind of filtering, then Decorator might be good pattern to consider for the job

Facade

The Facade Pattern hides the complexities of the system by providing an interface to the client from where the client can access the system on an unified interface. Facade defines a higher-level interface that makes the subsystem easier to use. For instance making one class method perform a complex process by calling several other classes.

/*Facade is one of the easiest patterns I think... And this is very simple example.
Imagine you set up a smart house where everything is on remote. So to turn the lights on you push lights on button - And same for TV,
AC, Alarm, Music, etc...
When you leave a house you would need to push a 100 buttons to make sure everything is off and are good to go which could be little
annoying if you are lazy like me
so I defined a Facade for leaving and coming back. (Facade functions represent buttons...) So when I come and leave I just make one
call and it takes care of everything...
*/#include <string>#include<iostream>usingnamespace std;class Alarm
{public:void alarmOn(){cout<<"Alarm is on and house is secured"<<endl;}void alarmOff(){cout<<"Alarm is off and you can go into the house"<<endl;}};class Ac
{public:void acOn(){cout<<"Ac is on"<<endl;}void acOff(){cout<<"AC is off"<<endl;}};class Tv
{public:void tvOn(){cout<<"Tv is on"<<endl;}void tvOff(){cout<<"TV is off"<<endl;}};class HouseFacade
{
Alarm alarm;
Ac ac;
Tv tv;public:
HouseFacade(){}void goToWork(){
ac.acOff();
tv.tvOff();
alarm.alarmOn();}void comeHome(){
alarm.alarmOff();
ac.acOn();
tv.tvOn();}};int main(){
HouseFacade hf;//Rather than calling 100 different on and off functions thanks to facade I only have 2 functions...
hf.goToWork();
hf.comeHome();}

Flyweight

The pattern for saving memory (basically) by sharing properties of objects. Imagine a huge amount of similar objects which all have most of their properties the same. It's natural to move these properties out of these objects to some external data structure and provide each object with the link to that data structure.

#include <iostream>#include <string>#include <vector>#define NUMBER_OF_SAME_TYPE_CHARS 3;/* Actual flyweight objects class (declaration) */class FlyweightCharacter;/*
FlyweightCharacterAbstractBuilder is a class holding the properties which are shared by
many objects. So instead of keeping these properties in those objects we keep them externally, making
objects flyweight. See more details in the comments of main function.
*/class FlyweightCharacterAbstractBuilder {
FlyweightCharacterAbstractBuilder(){}
~FlyweightCharacterAbstractBuilder(){}public:static std::vector<float> fontSizes;// lets imagine that sizes be may of floating point typestatic std::vector<std::string> fontNames;// font name may be of variable length (lets take 6 bytes is average)staticvoid setFontsAndNames();static FlyweightCharacter createFlyweightCharacter(unsignedshort fontSizeIndex,
unsignedshort fontNameIndex,
unsignedshort positionInStream);};
std::vector<float> FlyweightCharacterAbstractBuilder::fontSizes(3);
std::vector<std::string> FlyweightCharacterAbstractBuilder::fontNames(3);void FlyweightCharacterAbstractBuilder::setFontsAndNames(){
fontSizes[0]=1.0;
fontSizes[1]=1.5;
fontSizes[2]=2.0;
fontNames[0]="first_font";
fontNames[1]="second_font";
fontNames[2]="third_font";}class FlyweightCharacter {unsignedshort fontSizeIndex;// index instead of actual font sizeunsignedshort fontNameIndex;// index instead of font nameunsigned positionInStream;public:
FlyweightCharacter(unsignedshort fontSizeIndex, unsignedshort fontNameIndex, unsignedshort positionInStream):
fontSizeIndex(fontSizeIndex), fontNameIndex(fontNameIndex), positionInStream(positionInStream){}void print(){
std::cout<<"Font Size: "<< FlyweightCharacterAbstractBuilder::fontSizes[fontSizeIndex]<<", font Name: "<< FlyweightCharacterAbstractBuilder::fontNames[fontNameIndex]<<", character stream position: "<< positionInStream << std::endl;}
~FlyweightCharacter(){}};
FlyweightCharacter FlyweightCharacterAbstractBuilder::createFlyweightCharacter(unsignedshort fontSizeIndex, unsignedshort fontNameIndex, unsignedshort positionInStream){
FlyweightCharacter fc(fontSizeIndex, fontNameIndex, positionInStream);return fc;}int main(int argc, char** argv){
std::vector<FlyweightCharacter> chars;
FlyweightCharacterAbstractBuilder::setFontsAndNames();unsignedshort limit = NUMBER_OF_SAME_TYPE_CHARS;for(unsignedshort i =0; i < limit; i++){
chars.push_back(FlyweightCharacterAbstractBuilder::createFlyweightCharacter(0, 0, i));
chars.push_back(FlyweightCharacterAbstractBuilder::createFlyweightCharacter(1, 1, i +1* limit));
chars.push_back(FlyweightCharacterAbstractBuilder::createFlyweightCharacter(2, 2, i +2* limit));}/*
Each char stores links to it's fontName and fontSize so what we get is:
each object instead of allocating 6 bytes (convention above) for string
and 4 bytes for float allocates 2 bytes for fontNameIndex and fontSizeIndex.
That means for each char we save 6 + 4 - 2 - 2 = 6 bytes.
Now imagine we have NUMBER_OF_SAME_TYPE_CHARS = 1000 i.e. with our code
we will have 3 groups of chars whith 1000 chars in each group which will save us
3 * 1000 * 6 - (3 * 6 + 3 * 4) = 17970 saved bytes.
3 * 6 + 3 * 4 is a number of bytes allocated by FlyweightCharacterAbstractBuilder.
So the idea of the pattern is to move properties shared by many objects to some
external container. The objects in that case don't store the data themselves they
store only links to the data which saves memory and make the objects lighter.
The data size of properties stored externally may be significant which will save REALLY
huge ammount of memory and will make each object super light in comparisson to it's counterpart.
That's where the name of the pattern comes from: flyweight (i.e. very light).
*/for(unsignedshort i =0; i < chars.size(); i++){
chars[i].print();}
std::cin.get();return0;}

Proxy

The Proxy Pattern will provide an object a surrogate or placeholder for another object to control access to it. It is used when you need to represent a complex object with a simpler one. If creation of an object is expensive, it can be postponed until the very need arises and meanwhile a simpler object can serve as a placeholder. This placeholder object is called the “Proxy” for the complex object.

Curiously Recurring Template

This technique is known more widely as a mixin. Mixins are described in the literature to be a powerful tool for expressing abstractions[citation needed].

Interface-based Programming (IBP)

Interface-based programming is closely related with Modular Programming and Object-Oriented Programming, it defines the application as a collection of inter-coupled modules (interconnected and which plug into each other via interface). Modules can be unplugged, replaced, or upgraded, without the need of compromising the contents of other modules.

The total system complexity is greatly reduced. Interface Based Programming adds more to modular Programming in that it insists that Interfaces are to be added to these modules. The entire system is thus viewed as Components and the interfaces that helps them to co-act.

Interface-based Programming increases the modularity of the application and hence its maintainability at a later development cycles, especially when each module must be developed by different teams. It is a well-known methodology that has been around for a long time and it is a core technology behind frameworks such as CORBA.

This is particularly convenient when third parties develop additional components for the established system. They just have to develop components that satisfy the interface specified by the parent application vendor.

Thus the publisher of the interfaces assures that he will not change the interface and the subscriber agrees to implement the interface as whole without any deviation. An interface is therefore said to be a Contractual agreement and the programming paradigm based on this is termed as "interface based programming".

Behavioral Patterns

Chain of Responsibility

Chain of Responsibility pattern has the intent to avoid coupling the sender of a request to its receiver by giving more than one object a chance to handle the request. Chains the receiving objects and passes the requests along the chain until an object handles it.

Command

Command pattern is an Object behavioral pattern that decouples sender and receiver by encapsulating a request as an object, thereby letting you parameterize clients with different requests, queue or log requests, and support undo-able operations. It can also be thought as an object oriented equivalent of call back method.

Call Back: It is a function that is registered to be called at later point of time based on user actions.

Iterator

The 'iterator' design pattern is used liberally within the STL for traversal of various containers. The full understanding of this will liberate a developer to create highly reusable and easily understandable[citation needed] data containers.

The basic idea of the iterator is that it permits the traversal of a container (like a pointer moving across an array). However, to get to the next element of a container, you need not know anything about how the container is constructed. This is the iterators job. By simply using the member functions provided by the iterator, you can move, in the intended order of the container, from the first element to the last element.

Let us start by considering a traditional single dimensional array with a pointer moving from the start to the end. This example assumes knowledge of pointer arithmetic. Note that the use of "it" or "itr," henceforth, is a short version of "iterator."

constint ARRAY_LEN =42;int*myArray =newint[ARRAY_LEN];// Set the iterator to point to the first memory location of the arrayint*arrayItr = myArray;// Move through each element of the array, setting it equal to its position in the arrayfor(int i =0; i < ARRAY_LEN;++i){// set the value of the current location in the array*arrayItr = i;// by incrementing the pointer, we move it to the next position in the array.// This is easy for a contiguous memory container, since pointer arithmetic // handles the traversal.++arrayItr;}// Do not be messy, clean up after yourselfdelete[] myArray;

This code works very quickly for arrays, but how would we traverse a linked list, when the memory is not contiguous? Consider the implementation of a rudimentary linked list as follows:

class IteratorCannotMoveToNext{};// Error classclass MyIntLList
{public:// The Node class represents a single element in the linked list. // The node has a next node and a previous node, so that the user // may move from one position to the next, or step back a single // position. Notice that the traversal of a linked list is O(N), // as is searching, since the list is not ordered.class Node
{public:
Node():mNextNode(0),mPrevNode(0),mValue(0){}
Node *mNextNode;
Node *mPrevNode;int mValue;};
MyIntLList():mSize(0){}
~MyIntLList(){while(!Empty())
pop_front();}// See expansion for further implementation;int Size()const{return mSize;}// Add this value to the end of the listvoid push_back(int value){
Node *newNode =new Node;
newNode->mValue = value;
newNode->mPrevNode = mTail;
mTail->mNextNode = newNode;
mTail = newNode;++mSize;}// Remove the value from the beginning of the listvoid pop_front(){if(Empty())return;
Node *tmpnode = mHead;
mHead = mHead->mNextNode;delete tmpnode;--mSize;}bool Empty(){return mSize ==0;}// This is where the iterator definition will go, // but lets finish the definition of the list, firstprivate:
Node *mHead;
Node *mTail;int mSize;};

This linked list has non-contiguous memory, and is therefore not a candidate for pointer arithmetic. And we do not want to expose the internals of the list to other developers, forcing them to learn them, and keeping us from changing it.

This is where the iterator comes in. The common interface makes learning the usage of the container easier, and hides the traversal logic from other developers.

Let us examine the code for the iterator, itself.

/*
* The iterator class knows the internals of the linked list, so that it
* may move from one element to the next. In this implementation, I have
* chosen the classic traversal method of overloading the increment
* operators. More thorough implementations of a bi-directional linked
* list would include decrement operators so that the iterator may move
* in the opposite direction.
*/class Iterator
{public:
Iterator(Node *position):mCurrNode(position){}// Prefix incrementconst Iterator &operator++(){if(mCurrNode ==0|| mCurrNode->mNextNode ==0)throw IteratorCannotMoveToNext();e
mCurrNode = mCurrNode->mNextNode;return*this;}// Postfix increment
Iterator operator++(int){
Iterator tempItr =*this;++(*this);return tempItr;}// Dereferencing operator returns the current node, which should then // be dereferenced for the int. TODO: Check syntax for overloading // dereferencing operator
Node * operator*(){return mCurrNode;}// TODO: implement arrow operator and clean up example usage followingprivate:
Node *mCurrNode;};// The following two functions make it possible to create // iterators for an instance of this class.// First position for iterators should be the first element in the container.
Iterator Begin(){return Iterator(mHead);}// Final position for iterators should be one past the last element in the container.
Iterator End(){return Iterator(0);}

With this implementation, it is now possible, without knowledge of the size of the container or how its data is organized, to move through each element in order, manipulating or simply accessing the data. This is done through the accessors in the MyIntLList class, Begin() and End().

// Create a list
MyIntLList myList;// Add some items to the listfor(int i =0; i <10;++i)
myList.push_back(i);// Move through the list, adding 42 to each item.for(MyIntLList::Iterator it = myList.Begin(); it != myList.End();++it)(*it)->mValue +=42;

To do:

Discussion of iterators in the STL, and the usefulness of iterators within the algorithms library.

Iterators best practices

Warnings on creation of and usage of

When use of operator[] is better and simplifies understanding

Cautions about the impact of templates on generated code size (this could make for a good student research paper)

The following program gives the implementation of an iterator design pattern with a generic template:

Mediator

Define an object that encapsulates how a set of objects interact. Mediator promotes loose coupling by keeping objects from referring to each other explicitly, and it lets you vary their interaction independently.

Memento

Without violating encapsulation the Memento Pattern will capture and externalize an object’s internal state so that the object can be restored to this state later. Though the Gang of Four uses friend as a way to implement this pattern it is not the best design[citation needed]. It can also be implemented using PIMPL (pointer to implementation or opaque pointer). Best Use case is 'Undo-Redo' in an editor.

The Originator (the object to be saved) creates a snap-shot of itself as a Memento object, and passes that reference to the Caretaker object. The Caretaker object keeps the Memento until such a time as the Originator may want to revert to a previous state as recorded in the Memento object.

Observer

The Observer Pattern defines a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.

Problem

In one place or many places in the application we need to be aware about a system event or an application state change. We'd like to have a standard way of subscribing to listening for system events and a standard way of notifying the interested parties. The notification should be automated after an interested party subscribed to the system event or application state change. There also should be a way to unsubscribe.

Forces

Observers and observables probably should be represented by objects. The observer objects will be notified by the observable objects.

Solution

After subscribing the listening objects will be notified by a way of method call.

Template Method

By defining a skeleton of an algorithm in an operation, deferring some steps to subclasses, the Template Method lets subclasses redefine certain steps of that algorithm without changing the algorithm's structure.

Visitor

The Visitor Pattern will represent an operation to be performed on the elements of an object structure by letting you define a new operation without changing the classes of the elements on which it operates.

Model-View-Controller (MVC)

A pattern often used by applications that need the ability to maintain multiple views of the same data. The model-view-controller pattern was until recently[citation needed] a very common pattern especially for graphic user interlace programming, it splits the code in 3 pieces. The model, the view, and the controller.

The Model is the actual data representation (for example, Array vs Linked List) or other objects representing a database. The View is an interface to reading the model or a fat client GUI. The Controller provides the interface of changing or modifying the data, and then selecting the "Next Best View" (NBV).

Newcomers will probably see this "MVC" model as wasteful, mainly because you are working with many extra objects at runtime, when it seems like one giant object will do. But the secret to the MVC pattern is not writing the code, but in maintaining it, and allowing people to modify the code without changing much else. Also, keep in mind, that different developers have different strengths and weaknesses, so team building around MVC is easier. Imagine a View Team that is responsible for great views, a Model Team that knows a lot about data, and a Controller Team that see the big picture of application flow, handing requests, working with the model, and selecting the most appropriate next view for that client.

To do:
Erm, someone please come up with a better example than the following... I can not think of any

Perhaps a banking program for customer access to their accounts: A web UI on a traditional browser vs mobile app? And then handling checking/savings accounts vs loan vs line-of-credit accounts? fwiw

For example: A naive central database can be organized using only a "model", for example, a straight array. However, later on, it may be more applicable to use a linked list. All array accesses will have to be remade into their respective Linked List form (for example, you would change myarray[5] into mylist.at(5) or whatever is equivalent in the language you use).

Well, if we followed the MVC pattern, the central database would be accessed using some sort of a function, for example, myarray.at(5). If we change the model from an array to a linked list, all we have to do is change the view with the model, and the whole program is changed. Keep the interface the same but change the underpinnings of it. This would allow us to make optimizations more freely and quickly than before.

One of the great advantages of the Model-View-Controller Pattern is obviously the ability to reuse the application's logic (which is implemented in the model) when implementing a different view. A good example is found in web development, where a common task is to implement an external API inside of an existing piece of software. If the MVC pattern has cleanly been followed, this only requires modification to the controller, which can have the ability to render different types of views dependent on the content type requested by the user agent.

Libraries

Libraries allow existing code to be reused in a program. Libraries are like programs except that instead of relying on main() to do the work you call the specific functions provided by the library to do the work. Functions provide the interface between the program being written and the library being used. This interface is called Application Programming Interface or API.

Libraries should and tend to be domain specific as to permit greater mobility across applications, and provide extended specialization. Libraries that are not, are often header only distribution, intended for static linking as to permit the compiler and the application, only to use the needed bits of code.

What is an API?

To a programmer, an operating system is defined by its API. API stands for Application Programming Interface. An API encompasses all the function calls that an application program can communicate with the hardware or the operating system, or any other application that provides a set of interfaces to the programmer (i.e.: a library), as well as definitions of associated data types and structures. Most APIs are defined on the application Software Development Kit (SDK) for program development.

In simple terms the API can be considered as the interface through which the user (or user programs) will be able interact with the operating system, hardware or other programs to make them to perform a task that may also result in obtaining a result message.

Can an API be called a framework?

No, a framework may provide an API, but a framework is more than a simple API. By default a framework also defines how the code is written, it is a set of solutions, even classes, that as a group addresses the handling of a limited set of related problems and provides not only an API but a default functionality, well designed frameworks enable its interchangeability for a similar framework, striving to provides the same API.

As seen in the File organization Section, compiled libraries consists in C++ headers files that are included by the preprocessor and binary library files which are used by the linker to generate the resulting compilation. For a dynamically linked library, only the loading code is added to the compilation that uses them, the actual loading of the library is done in the memory at run-time.

Programs can make use of libraries in two forms, as static or dynamic depending on how the programmer decides to distribute its code or even due to the licensing used by third party libraries, the static and dynamic libraries section of this book will cover in depth this subject.

Note:
As we will see when covering multi-threading when selecting libraries. Remember to verify if they conform to the your requirements on that area.

Third party libraries

Additional functionality that goes beyond the standard libraries (like garbage collection) are available (often free) by third party libraries, but remember that third party libraries do not necessarily provide the same ubiquitous cross-platform functionality or an API style conformant with as standard libraries. The main motivation for their existence is for preventing one tho reinvent the wheel and to make efforts converge; too much energy has been spent by generations of programmers to write safe and "portable" code.

There are several libraries a programmer is expected to know about or have at least a passing idea of what they are. Time, consistency and extended references will make a few libraries pop-out from the rest. One notable example is the highly respected collection of Boost libraries that we will examine ahead.

Licensing on third party libraries

The programmer may also be limited by the requirements of the license used on external libraries that he has no direct control, for instance the use of the GNU General Public License (GNU GPL) code in closed source applications isn't permitted to address this issue the FSF provides an alternative in the form of the GNU LGPL license that permits such uses but only in the dynamically linked form, this is mirrored by several other legal requirements a programmer must attend and comply to.

Libraries come in two forms, either in source form or in compiled/binary form. Libraries in source-form must first be compiled before they can be included in another project. This will transform the libraries' cpp-files into a lib-file. If a program must be recompiled to run with a new version of a library, but does not need any further changes, the library is said to be source compatible. If a program does not need to be modified and recompiled to use a new version of a library, the library is then classified as being binary compatible.

Static and Dynamic Libraries

Code simplification (no version checks as required in dynamic libraries).

Will only compile the code that is used.

Disadvantages of using static binaries:

Waste of resources: Generates larger binaries, since the library is compiled into the executable. Wastes memory as the library cannot be shared (in memory) between processes (depending on the operating system).

Program will not benefit from bug fixes or extensions in the libraries without being recompiled.

Binary/Source Compatibility of libraries

A library is said to be binary compatible if the program that dynamically links to an earlier version of that library, continues to work using another versions of the same library. If a recompilation of the program is needed for it to run with each new version the library is said to be source compatible.

Producing binary compatible libraries is beneficial for distribution but harder to maintain by the programmer. It is often seen as a better solution to do static linking, if the library is only source compatible, since it will not cause problems to the end-user.

Binary compatibility saves a lot of trouble and is a signal that the library reached a status of stability. It makes it easier to distribute software for a certain platform. Without ensuring binary compatibility between releases, people will be forced to offer statically linked binaries.

header-only libraries

Another distinction that is commonly made about libraries are on how they are distributed (regarding structure and use). A library that is contained only on header files is considered header-only library. Often this means that they are simpler and easy to use, however this will not be the ideal solution for complex code, it will not only hamper readability but result in larger compile times. Also depending on the compiler and it's optimizing capabilities (or options) can, due to the resulting inlining, generate larger binaries. This may not be as important in libraries mostly implemented with templates. Header-only libraries will always contain the source code to the implementation, commercial is rare.

Example: Configuring MS Visual C++ to use external libraries

Note:Boost.org has a install guide named Getting Started on Windows, that points to an automatic installed provided by BoostPro Computing (commonly supporting the previous and older release versions), noting also that if used with the option “Source and Documentation” deselected (selected by default), it will not show the libs/ subdirectory. This will disable the user from rebuilding part of the libraries that aren't only header files. This makes installing it yourself as shown in this section the best option.

Considering you already have decompressed and have the binary part of the Boost library built. There the steps which have to be performed:

Include directory

Set up the include directory. This is the directory that contains the header files (.h/hpp), which describes the library interface:

Library directory

Set up the library directory. This is the directory that contains the pre-compiled library files (.lib):

Library files

Enter library filenames in additional dependencies for the libraries to use:

Some libraries (such as e.g. Boost) use auto-linking to automate the process of selecting library files for linking, based on which header-files are included. Manual selection of library filenames are not required for such libraries if your compiler supports auto-linking.

Dynamic libraries

In case of dynamically loaded (.dll) libraries, one also has to place the DLL-files either in the same folder as the executable, or in the system PATH.

Run-time library

The libraries also have to be compiled with the same run-time library as the one used in your project. Many libraries therefore come in different editions, depending on whether they are compiled for single- or multi-threaded runtime and debug or release runtime, as well as whether they contain debug symbols or not.

Many of Boost's founders are on the C++ standard committee and several Boost libraries have been accepted for incorporation into the Technical Report 1 of C++0x. Although Boost was begun by members of the C++ Standards Committee Library Working Group, participation has expanded to include thousands of programmers from the C++ community at large.

The emphasis is on libraries which work well with the C++ Standard Library. The libraries are aimed at a wide range of C++ users and application domains, and are in regular use by thousands of programmers. They range from general-purpose libraries like SmartPtr, to OS Abstractions like FileSystem, to libraries primarily aimed at other library developers and advanced C++ users, like MPL.

A further goal is to establish "existing practice" and provide reference implementations so that Boost libraries are suitable for eventual standardization. Ten Boost libraries will be included in the C++ Standards Committee's upcoming C++ Standard Library Technical Report as a step toward becoming part of a future C++ Standard.

pointer containers - Containers modeled after most standard STL containers that allow for transparent management of pointers to values

property map - Interface specifications in the form of concepts and a general purpose interface for mapping key values to objects

variant - A safe and generic stack-based object container that allows for the efficient storage of and access to an object of a type that can be chosen from among a set of types that must be specified at compile time.

hash - An implementation of the hash function object specified by the C++ Technical Report 1 (TR1). Can be used as the default hash function for unordered associative containers

lambda - In the spirit of lambda abstractions, allows for the definition of small anonymous function objects and operations on those objects at a call site, using placeholders, especially for use with deferred callbacks from algorithms.

ref - Provides utility class templates for enhancing the capabilities of standard C++ references, especially for use with generic functions

result_of - Helps in the determination of the type of a call expression

base from member idiom - Provides a workaround for a class that needs to initialize a member of a base class inside its own (i.e., the derived class') constructor's initializer list

checked delete - Check if an attempt is made to destroy an object or array of objects using a pointer to an incomplete type

next and prior functions - Allow for easier motion of a forward or bidirectional iterator, especially when the results of such a motion need to be stored in a separate iterator (i.e., should not change the original iterator)

noncopyable - Allows for the prohibition of copy construction and copy assignment

addressof - Allows for the acquisition of an object's real address, bypassing any overloads of operator&(), in the process

result_of - Helps in the determination of the type of a call expression

noncopyable

Linear algebra – uBLAS

Boost includes the uBLASlinear algebra library, with BLAS support for vectors and matrices. uBlas supports a wide range of linear algebra operations, and has bindings to some widely used numerics libraries, such as ATLAS, BLAS and LAPACK.

Ceemple Boost included in Ceemple, a rapid JIT based C++ technical computing environment.

Cross-Platform development

The section is to introduce programmer to programming with the aim of portability across several OSs environments. In today’s world it does not seem appropriate to constrain applications to a single operating system or computer platform, and there is an increasing need to address programming in a cross platform manner.

One can get more information about the API and support from Microsoft itself, using the MSDN Library ( http://msdn.microsoft.com/ ) essentially a resource for developers using Microsoft tools, products, and technologies. It contains a bounty of technical programming information, including sample code, documentation, technical articles, and reference guides. You can also check out Wikibooks Windows Programming book for some more detailed information that goes beyond the scope of this book.

The Windows API has always exposed a large part of the underlying structure of the various Windows systems for which it has been built to the programmer. This has had the advantage of giving Windows programmers a great deal of flexibility and power over their applications. However, it also has given Windows applications a great deal of responsibility in handling various low-level, sometimes tedious, operations that are associated with a Graphical user interface.

Charles Petzold, writer of various well read Windows API books, has said: "The original hello-world program in the Windows 1.0 SDK was a bit of a scandal. HELLO.C was about 150 lines long, and the HELLO.RC resource script had another 20 or so more lines. (...) Veteran C programmers often curled up in horror or laughter when encountering the Windows hello-world program.". A hello world program is a often used programming example, usually designed to show the easiest possible application on a system that can actually do something (i.e. print a line that says "Hello World").

Over the years, various changes and additions were made to the Windows Operating System, and the Windows API changed and grew to reflect this. The windows API for Windows 1.0 supported less than 450 function calls, where in modern versions of the Windows API there are thousands. In general, the interface has remained fairly consistent however, and a old Windows 1.0 application will still look familiar to a programmer who is used to the modern Windows API.

A large emphasis has been put by Microsoft on maintaining software backwards compatibility. To achieve this, Microsoft sometimes went as far as supporting software that was using the API in a undocumented or even (programmatically) illegal way. Raymond Chen, a Microsoft developer who works on the Windows API, has said that he "could probably write for months solely about bad things apps do and what we had to do to get them to work again (often in spite of themselves). Which is why I get particularly furious when people accuse Microsoft of maliciously breaking applications during OS upgrades. If any application failed to run on Windows 95, I took it as a personal failure."

Variables and Win32

Win32 uses an extended set of data types, using C's typedef mechanism These include:

Of course standard data types are also available when programming with Win32 API.

Windows Libraries (DLLs)

In Windows, library code exists in a number of forms, and can be accessed in various ways.

Normally, the only thing that is needed is to include in the appropriate header file on the source code the information to the compiler, and linking to the .lib file will occur during the linking phase.

This .lib file either contains code which is to be statically linked into compiled object code or contains code to allow access to a dynamically link to a binary library(.DLL) on the system.

It is also possible to generate a binary library .DLL within C++ by including appropriate information such as an import/export table when compiling and linking.

DLLs stand for Dynamic Link Libraries, the basic file of functions that are used in some programs. Many newer C++ IDEs such as Dev-CPP support such libraries.

Common libraries on Windows include those provided by the Platform Software Development Kit, Microsoft Foundation Class and a C++ interface to .Net Framework assemblies.

Although not strictly use as library code, the Platform SDK and other libraries provide a set of standardized interfaces to objects accessible via the Component Object Model implemented as part of Windows.

API conventions and Win32 API Functions (by focus)

Time

Time measurement has to come from the OS in relation to the hardware it is run, unfortunately most computers don't have a standard high-accuracy, high-precision time clock that is also quick to access.

GetTickCount has a precision (dependent on your timer tick rate) of one millisecond, its accuracy typically within a 10-55ms expected error, the best thing is that it increments at a constant rate. (WaitForSingleObject uses the same timer).

GetSystemTimeAsFileTime has a precision of 100-nanoseconds, its accuracy is similar to GetTickCount.

QueryPerformanceCounter can be slower to obtain but has higher accuracy, uses the HAL (with some help from ACPI) a problem with it is that it can travel back in time on over-clocked PCs due to garbage on the LSBs, note that the functions fail unless the supplied LARGE_INTEGER is DWORD aligned.

File System

MakeSureDirectoryPathExists (via Image Help Library - IMAGHLP.DLL, #pragma comment( lib, "imagehlp.lib" ), #include <imagehlp.h> ) creates directories, only useful to create/force the existence of a given dir tree or multiple directories, or if the linking is already present, note that it is single threaded.

To do:
Add Basics in building "windows", Window eventhandling, Resources

Network

Network applications are often built in C++ on windows utilizing the WinSock API functions.

Resources

Resources files are perhaps one of the most useful elements included on the WIN32 API, they are how we program menu's, add icons, backgrounds, music and many more aesthetically pleasing elements to our programs. Sadly the use in compilation use of resource files is today to those using the MS Visual Studio IDE (resource editor, resource structure understanding).

Note:
This is similar to the approach of the QT library in dealing with GUI elements, but in this case the OS programing API has direct support for the handling of the resource elements.

The resources are defined in a .rc file (resource c) and are included at the linking phase of compile. Resource files work hand in hand with a header file (usually called resource.h) which carries the definitions of each ID.

Win32 API Wrappers

Since the Win32 API is C based and also a moving target and since some alterations are done in each OS version some wrappers were created, in this section you will find some of the approaches available to facilitate the use of the API in a C++ setup and provide abstraction from the low level stuff with higher level implementations of common needed features, dealing with the GUI, complex controls even communications and database access.

Microsoft Foundation Classes (MFC);

a C++ library for developing Windows applications and UI components. Created by Microsoft for the C++ Windows programmer as an abstraction layer for the Win32 API, the use of the new STL enabled capabilities is scarce on the MFC. It's also compatible with Windows CE (the pocket PC version of the OS). MFC was designed to use the Document-View pattern a variant of the Model View Controller (MVC) pattern.
More info about MFC can be obtained on the Windows Programming Wikibook.

Windows Template Library (WTL);

a C++ library for developing Windows applications and UI components. It extends ATL (Active Template Library) and provides a set of classes for controls, dialogs, frame windows, GDI objects, and more. This library is not supported by Microsoft Services (but is used internally at MS and available for download at MSDN).

a Delphi/C++ library for developing Windows applications, UI components and different kinds of service applications. Created by Borland as an abstraction layer for the Win32 API, but also implementing many non-visual, and non windows-specific objects, like AnsiString class for example.

Note:
There are more generic wrapper that do not focus exclusively on the Windows API, like the Qt (framework) or WxWidgets these are covered on the Generic Wrappers Section of the book.

Generic wrappers

Generic GUI/API wrappers are programming libraries that provide a uniform platform neutral interface (API) to the operating system regardless of underlying platform. Such libraries greatly simplify development of cross-platform software.

Using a wrapper as a portability layer will offer applications some or all following benefits:

Independence from the hardware.

Independence from the Operating System.

Independence from changes made to specific releases.

Independence from API styles and error codes.

Cross-platform programming is more than only GUI programming. Cross-platform programming deals with the minimum requirements for the sections of code that aren't specified by the C++ Standard Language, so as programs can be compiled and run across different hardware platforms.

Here is some cross-platform GUI toolkit:

Gtkmm - an interface for the C GUI library GTK+. It is not cross-platform by design, but rather mutli-platform i.e. can be used on many platform.

Qt (http://qt-project.org) - a cross-platform (Qt is the basis for the Linux KDE desktop environment and supports the X Window System (Unix/X11), Apple Mac OS X, Microsoft Windows NT/9x/2000/XP/Vista/7 and the Symbian OS), it is an object-oriented application development framework, widely used for the development of GUI programs (in which case it is known as a widget toolkit), and for developing non-GUI programs such as console tools and servers. Used in numerous commercial applications such as Google Earth, Skype for Linux and Adobe Photoshop Elements. Released under the LGPL or a commercial license.

WxWidgets (http://www.wxwindows.org/) - a widget toolkit for creating graphical user interfaces (GUIs) for cross-platform applications on Win32, Mac OS X, GTK+, X11, Motif, WinCE, and more using one codebase. It can be used from languages such as C++, Python, Perl, and C#/.NET. Unlike other cross-platform toolkits, wxWidgets applications look and feel native. This is because wxWidgets uses the platform's own native controls rather than emulating them. It's also extensive, free, open-source, and mature. wxWidgets is more than a GUI development toolkit it provides classes for files and streams, application settings, multiple threads, interprocess communication, database access and more.

Multi-tasking

Multi-tasking is a process by which multiple tasks (also known as processes), share common processing resources such as a CPU.

A computer with a single CPU, will only run one process at a time. By running it means that in a specific point in time, the CPU is actively executing instructions for that process. With a single CPU, systems using scheduling can achieve multi-tasking, by which the time of the processor is time-shared by several processes, permitting each to advance their computations, seemingly in parallel. A process runs for some time and another waiting gets a turn.

The act of reassigning a CPU from one task to another one is called a context switch. When context switches occur frequently enough, the illusion of parallelism is achieved.

Note:
Context switching has a cost; when deciding to use multi-tasks, a programmer must be aware of trade-offs in performance.

Even on computers with more than one CPU, multiprocessor machines, multi-tasking allows many more tasks to be run than there are CPUs.

Operating systems may adopt one of many different scheduling strategies, which generally fall into the following categories:

In multiprogramming systems, the running task keeps running until it performs an operation that requires waiting for an external event (e.g. reading from a tape) or until the computer's scheduler forcibly swaps the running task out of the CPU. Multiprogramming systems are designed to maximize CPU usage.

In time-sharing systems, the running task is required to relinquish the CPU, either voluntarily or by an external event such as a hardware interrupt. Time sharing systems are designed to allow several programs to execute apparently simultaneously. The term time-sharing used to define this behavior is no longer in use, having been replaced by the term multi-tasking.

In real-time systems, some waiting tasks are guaranteed to be given the CPU when an external event occurs. Real time systems are designed to control mechanical devices such as industrial robots, which require timely processing.

Multi-tasking has already been successfully integrated into current Operating Systems. Most computers in use today supports running several processes at a time. This is required for systems using symmetric multiprocessor (SMP) in distributed computing and multi-core or chip multiprocessors (CMPs) computing, where processors have gone from dual-core to quad-core and core number will continue to increase. Each technology has its specific limitations and applicability, but all these technologies share the common objective of performing concurrent processing.

Note:
Due to the general adoption of the new paradigm it becomes extremely important to prepare your code for it (plan for scalability), understand guarantees regarding parallelization, and select external libraries that provide the required support.

Processes

Processes are independent execution units that contain their own state information, use their own address spaces, and only interact with each other via inter-process communication(IPC) mechanisms . A process can be said to at least contain one thread of execution (not to be confused to a complete thread construct). Processes are managed by the hosting OS in a process data structure. The maximum number of processes that can run concurrently, depend on the OS and on the available resources of that system.

Child Process

A child process (also spawn process), is a process that was created by another process (the parent process), inheriting most of the parent attributes, such as opened files. Each process may create many child processes but will have at most one parent process; if a process does not have a parent this usually indicates that it was created directly by the kernel.

In UNIX, a child process is in fact created (using fork) as a copy of the parent. The child process can then overlay itself with a different program (using exec) as required. The very first process, called init, is started by the kernel at booting time and never terminates; other parentless processes may be launched to carry out various daemon tasks in userspace. Another way for a process to end up without a parent is, if its parent dies leaving an orphan process; but in this case it will shortly be adopted by init.

Inter-Process Communication (IPC)

IPC is generally managed by the operating system.

Shared Memory

Most of more recent OSs provide some sort of memory protection. In a Unix system, each process is given its own virtual address space, and the system, in turn, guarantees that no process can access the memory area of another. If an error occurs on a process, only that process memory's contents can be corrupted.

With shared memory, the need of enabling random-access to shared data between different processes is addressed. But declaring a given section of memory as simultaneously accessible by several processes raises the need for control and synchronization, since several processes might try to alter this memory area at the same time.

Multi-Threading

Until recently, the C++ standard did not include any specification or built-in support for multi-threading. Therefore, Threading had to be implemented using special threading libraries, which are often platform dependent, as an extension to the C++ standard.

Note:
The new C++0x standard supports multi-threading, reducing the need to know multiple APIs and increasing the portability of code.

Some popular C++ threads libraries include:(This list is not intended to be complete.)

Boost - This package includes several libraries, one of which is threads (concurrent programming). the boost threads library is not very full featured, but is complete, portable, robust and in the flavor of the C++ standard. Uses the boost license that is similar to the BSD license.

Intel® Threading Building Blocks(TBB) offers a rich approach to expressing parallelism in a C++ program. The library helps you take advantage of multi-core processor performance without having to be a threading expert. Threading Building Blocks is not just a threads-replacement library. It represents a higher-level, task-based parallelism that abstracts platform details and threading mechanism for performance and scalability and performance. It is an open source project under the GNU General Public License version two (GPLv2) with the runtime exception.

Intel® Cilk™ Plus(Intel® Cilk™ Plus) adds simple language extensions to the C and C++languages to express task and data parallelism. These language extensions are powerful, yet easy to apply and use in a wide range of applications.

Adaptive Communication Environment (often referred to as ACE) - Another toolkit which includes a portable threads abstraction along with many many other facilities, all rolled into one library. Open source released under a nonstandard but nonrestrictive license.

Zthreads - A portable thread abstraction library. This library is feature rich, deals only with concurrency and is open source licensed under the MIT license.

Of course, you can access the full POSIX and the C language threads interface from C++ and on Windows the API. So why bother with a library on top of that?

The reason is that things like locks are resources that are allocated, and C++ provides abstractions to make managing these things easier. For instance, boost::scoped_lock<> uses object construction/destruction to insure that a mutex is unlocked when leaving the lexical scope of the object. Classes like this can be very helpful in preventing deadlock, race conditions, and other problems unique to threaded programs. Also, these libraries enable you to write cross-platform multi-threading code, while using platform-specific function cannot.

In any case when using threading methodology, dictates that you must identify hotspots, the segments of code that take the most execution time. To determine the best chance at achieving the maximum performance possible, the task can be approached from bottom-up and top-down to determine those code segments that can run in parallel.

In the bottom-up approach, one focus solely on the hotspots in the code. This requires a deep analysis of the call stack of the application to determine the sections of code that can be run in parallel and reduce hotspots. In hotspot sections that employ concurrency, it is still required to move that concurrency at a point higher up in the call stack as to increase the granularity of each thread execution.

Using the top-down approach, the focus is on all the parts of the application, in determining what computations can be coded to run in parallel, at a higher level of abstraction. Reducing the level of abstraction until the overall performance gains are sufficient to reach the necessary goals, the benefit being speed of implementation and code re-usability. This is also the best method for archiving a optimal level of granularity for all computations.

Threads vs. Processes

Both threads and processes are methods of parallelizing an application, its implementation may differ from one operating system to another. A process has always one thread of execution, also known as the primary thread. In general, a thread is contained inside a process (in the address space of the process) and different threads of the same process share some resources while different processes do not.

Atomicity

Atomicity refers to atomic operations that are indivisible and/or uninterruptible. Even on a single core, you cannot assume that an operation will be atomic. In that regard only when using assembler can one guarantee the atomicity of an operation. Therefore, the C++ standard provides some guarantees as do operating systems and external libraries.

An atomic operation can also be seen as any given set of operations that can be combined so that they appear to the rest of the system to be a single operation with only two possible outcomes: success or failure. This all depends on the level of abstraction and underlying guarantees.

All modern processors provide basic atomic primitives which are then used to build more complex atomic objects. In addition to atomic read and write operations, most platforms provide an atomic read-and-update operation like test-and-set or compare-and-swap, or a pair of operations like load-link/store-conditional that only have an effect if they occur atomically (that is, with no intervening, conflicting update). These can be used to implement locks, a vital mechanism for multi-threaded programming, allowing invariants and atomicity to be enforced across groups of operations.

Many processors, especially 32-bit ones with 64-bitfloating point support, provide some read and write operations that are not atomic: one thread reading a 64-bit register while another thread is writing to it may see a combination of both "before" and "after" values, a combination that may never actually have been written to the register. Further, only single operations are guaranteed to be atomic; threads arbitrarily performing groups of reads and writes will also observe a mixture of "before" and "after" values. Clearly, invariants cannot be relied on when such effects are possible.

If not dealing with known guaranteed atomic operations, one should rely on the synchronization primitives at the level of abstraction that one is coding to.

Example - One process

For example, imagine a single process is running on a computer incrementing a value in a given memory location. To increment the value in that memory location:

but before it can write the new value back to the memory location it is suspended, and the second process is allowed to run:

the second process reads the value in memory location, the same value that the first process read;

the second process adds one to the value;

the second process writes the new value into the memory location.

The second process is suspended and the first process allowed to run again:

the first process writes a now-wrong value into the memory location, unaware that the other process has already updated the value in the memory location.

This is a trivial example. In a real system, the operations can be more complex and the errors introduced extremely subtle. For example, reading a 64-bit value from memory may actually be implemented as two sequential reads of two 32-bit memory locations. If a process has only read the first 32-bits, and before it reads the second 32-bits the value in memory gets changed, it will have neither the original value nor the new value but a mixed-up garbage value.

Furthermore, the specific order in which the processes run can change the results, making such an error difficult to detect and debug.

OS and portability

Considerations are not only necessary with regard to the underling hardware but also in dealing with the different OS APIs. When porting code across different OSs one should consider what guarantees are provided. Similar considerations are necessary when dealing with external libraries.

Note:
For instance on the Macintosh, the set file position call is atomic, whereas on Windows, it's a pair of calls.

Race condition

A race condition (data race, or simply race), occurs when data is accessed concurrently from multiple execution paths. It happens for instance when multiple threads have shared access to the same resource such as a file or a block of memory, and at least one of the accesses is a write. This can lead to interference with one another.

Threaded programming is built around predicates and shared data. It is necessary to identify all possible execution paths and identify truly independent computations. To avoid problems it is best to implement concurrency at the highest level possible.

Most race conditions occur due to an erroneous assumption about the order in which threads will run. When dealing with shared variables, never assume that a threaded write operation will precede a threaded read operation. If you need guarantees you should see if synchronization primitives are available, and if not, you should implement your own.

Locking

Locking temporarily prevents un-shareable resources from being used simultaneously. Locking can be achieved by using a synchronization object.

One of the biggest problems with threading is that locking requires analysis and understanding of the data and code relationships. This complicates software development--especially when targeting multiple operating systems. This makes multi-threaded programming more like art than science.

The number of locks (depending on the synchronization object) may be limited by the OS. A lock can be set to protect more than one resource, if always accessed in the same critical region.

Critical section

A critical section is a region defined as critical to the parallelization of code execution. The term is used to define code sections that need to be executed in isolation with respect to other code in the program.

This is a common fundamental concept. These sections of code need to be protected by a synchronization technique as they can create race conditions.

Deadlock

A deadlock is said to happen whenever there is a lock operation that results in a never-ending waiting cycle among concurrent threads.

Synchronization

Except when used to guarantee the correct execution of a parallel computation, synchronization is an overhead. Attempt to keep it to a minimum by taking advantage of the thread's local storage or by using exclusive memory locations.

Computation granularity

Computation granularity is loosely defined as the amount of computation performed before any synchronization is needed. The longer the time between synchronizations, the less granularity the computation will have. When dealing with the requirements for parallelism, it will mean being easier to scale to an increased number of threads and having lower overhead costs. A high level of granularity can mean that any benefit from using threads will be lost due to the requirements of synchronization and general thread overhead.

Mutex

Mutex is an abbreviation for mutual exclusion. It relies on a synchronization facility supplied by the operating system (not the CPU). Since this system objects can only be owned by a single thread at any given time, the mutex object facilitates protection against data races and allows for thread-safe synchronization of data between threads. By calling one of the lock functions, the thread obtains ownership of a mutex object, it then relinquishes ownership by calling the corresponding unlock function. Mutexes can be either recursive or non-recursive, and may grant simultaneous ownership to one or many threads.

Semaphore

A semaphore is a yielding synchronization object that can be used to synchronize several threads. This is the most commonly used method for synchronization

Spinlock

Spinlocks are busy-wait synchronization objects, used as a substitute for Mutexes. They are an implementation of inter-thread locking using machine dependent assembly instructions (such as test-and-set) where a thread simply waits (spins) in a loop that repeatedly checks if the lock becomes available (busy wait). This is why spinlocks perform better if locked for a short period of time. They are never used on single-CPU machines.

Threads

Threads are by definition a coding construct and part of a program that enable it to fork (or split) itself into two or more simultaneously (or pseudo-simultaneously) running tasks. Threads use pre-emptive multi-tasking.

The thread is the basic unit (the smallest piece of code) to which the operating system can allocate a distinct processor time (schedule) for execution. This means that, threads in reality, don't run concurrently but in sequence on any single core system. Threads often depend on the OS thread scheduler to preempt a busy thread and resume another thread.

The thread today is not only a key concurrency model supported by most if not all modern computers, programming languages, and operating systems but is itself at the core of hardware evolution, such as symmetric multi-processors, understanding threads is now a necessity to all programmers.

The order of execution of the threads is controlled by the process scheduler of the OS, it is non-deterministic. The only control available to the programmer is in attributing a priority to the thread but never assume a particular order of execution.

To do:
Thread quantum

User Interface Thread

This type of distinction is reserved to indicate that the particular thread implements a message map to respond to events and messages generated by user inputs as he interacts with the application. This is especially common when working with the Windows platform (Win32 API) because of the way it implements message pumps.

Worker Thread

This distinction serves to specify threads that do not directly depend or are part of the graphical user interface of the application, and run concurrently with the main execution thread.

Thread local storage (TLS)

The residence of thread local variables, a thread dedicated section of the global memory. Each thread (or fiber) will receive its own stack space, residing in a different memory location. This will consist of both reserved and initially committed memory. That is freed when the thread exits but will not be freed if the thread is terminated by other means.

Since all threads in a process share the same address space, it makes data in a static or global variable to be normally located at the same memory location, when referred to by threads from the same process. It is important for software to take in consideration hardware cache coherence. For instance in multiprocessor environments, each processor has a local cache. If threads on different processors modify variables residing on the same cache line, this will invalidate that cache line, forcing a cache update, hurting performance. This is referred to as false sharing.

This type of storage is indicated for variables that store temporary or even partial results, since condensing the needed synchronization of the partial results in as fewer and infrequent instances possible will contribute to the reduction of synchronization overhead.

Thread Synchronization

The synchronization can be defined in several steps the first is the process lock, where a process is made to halt execution due to find a protected resource locked, there is a cost for locking especially if the lock lasts for too long.

Obviously there is a performance hit if any synchronization mechanism is heavily used. Because they are an expensive operation, in certain cases, increasing the use of TLSs instead of relying only on shared data structures will reduce the need for synchronization.

A simple thread pool. The task queue has many waiting tasks (blue circles). When a thread opens up in the queue (green box with dotted circle) a task comes off the queue and the open thread executes it (red circles in green boxes). The completed task then "leaves" the thread pool and joins the completed tasks list (yellow circles)..

On Microsoft Windows, fibers are created using the ConvertThreadToFiber and CreateFiber calls; a fiber that is currently suspended may be resumed in any thread. Fiber-local storage, analogous to thread-local storage, may be used to create unique copies of variables.

Symbian OS uses a similar concept to fibers in its Active Scheduler. An Active object (Symbian OS) contains one fiber to be executed by the Active Scheduler when one of several outstanding asynchronous calls complete. Several Active objects can be waiting to execute (based on priority) and each one must restrict is own execution time.

Exploiting parallelism

Most of the parallel architecture research was done in the 1960s and 1970s, providing solutions for problems that only today are reaching general awareness. As the need of concurrent programming increases, mostly due to today's hardware evolution, we as programmers are pressed to implement programming models that ease the complicated process of dealing with the old thread model in a way it preserves development time by abstracting the problem.

To do:
To extend

OpenMP

Chart of OpenMP constructs.

To do:
To extend

Software Internationalization

Internationalization and localization refer to how computer software is adapted for other locations, nations or cultures. This means specifically those that are non-native to the programmer(s) or the primary user group

In specific, internationalization deals with the process of designing a software application in a way that it can be configured or adapted to work with various languages and regions without heavy changes to the code base. On the other hand localization deals with the process of enabling the configuration or auto adaptation of the software to a specific region, timezone or language by adding locale-specific components and text translation.

Text encoding

Text, in particular the characters are used to generate readable text consists on the use of a character encoding scheme that pairs a sequence of characters from a given character set (sometimes referred to as code page) with something else, such as a sequence of natural numbers, octets or electrical pulses, in order to facilitate the use of its digital representation.

A easy to understand example would be Morse code, which encodes letters of the Latin alphabet as series of long and short depressions of a telegraph key; this is similar to how ASCII, encodes letters, numerals, and other symbols, as integers.

Text and data

Probably the most important use for a byte is holding a character code. Characters typed at the keyboard, displayed on the screen, and printed on the printer all have numeric values. To allow it to communicate with the rest of the world, the IBM PC uses a variant of the ASCII character set. There are 128 defined codes in the ASCII character set. IBM uses the remaining 128 possible values for extended character codes including European characters, graphic symbols, Greek letters, and math symbols.

In earlier days of computing, the introduction of coded character sets such as ASCII (1963) and EBCDIC (1964) began the process of standardization. The limitations of such sets soon became apparent, and a number of ad-hoc methods developed to extend them. The need to support multiple writing systems (Languages), including the CJK family of East Asian scripts, required support for a far larger number of characters and demanded a systematic approach to character encoding rather than the previous ad hoc approaches.

What's this about UNICODE?

Unicode is an industry standard whose goal is to provide the means by which text of all forms and languages can be encoded for use by computers. Unicode 6.1 was released in January 2012 and is the current version. It currently comprises over 109,000 characters from 93 scripts. Since Unicode is just a standard that assigns numbers to characters, there also needs to be methods for encoding these numbers as bytes. The three most common character encodings are UTF-8, UTF-16, and UTF-32, of which UTF-8 is by far the most frequently used.

In the Unicode standard, planes are groups of numerical values (code points) that point to specific characters. Unicode code points are logically divided into 17 planes, each with 65,536 (= 216) code points. Planes are identified by the numbers 0 to 16decimal, which corresponds with the possible values 00-10hexadecimal of the first two positions in six position format (hhhhhh). As of version 6.1, six of these planes have assigned code points (characters), and are named.

Currently, about ten percent of the potential space is used. Furthermore, ranges of characters have been tentatively mapped out for every current and ancient writing system (script) the Unicode consortium has been able to identify. While Unicode may eventually need to use another of the spare 11 planes for ideographic characters, other planes remain. Even if previously unknown scripts with tens of thousands of characters are discovered, the limit of 1,114,112 code points is unlikely to be reached in the near future. The Unicode consortium has stated that limit will never be changed.

The odd-looking limit (it is not a power of 2), is not due to UTF-8, which was designed with a limit of 231 code points (32768 planes), and can encode 221 code points (32 planes) even if limited to 4 bytes but is due to the design of UTF-16. In UTF-16 a "surrogate pair" of two 16-bit words is used to encode 220 code points 1 to 16, in addition to the use of single words to encode plane 0.

UTF-8

UTF-8 is a variable-length encoding of Unicode, using from 1 to 4 bytes for each character. It was designed for compatibility with ASCII, and as such, single-byte values represent the same character in UTF-8 as they do in ASCII. Because a UTF-8 stream doesn't contain '\0's, you may use it directly in your existing C++ code without any porting (except when counting the 'actual' number of character in it).

UTF-16

UTF-16 is also variable-length, but works in 16 bit units instead of 8, so each character is represented by either 2 or 4 bytes. This means that it is not compatible with ASCII.

UTF-32

Unlike the previous two encodings, UTF-32 is not variable-length: every character is represented by exactly 32-bits. This makes encoding and decoding easier, because the 4-byte value maps directly to the Unicode code space. The disadvantage is in space efficiency, as each character takes 4 bytes, no matter what it is.

Optimizations

Optimization can be regarded as a directed effort to increase the performance of something, an important concept in engineering, in particular, the case of Software engineering that we are covering. We will deal with specific computational tasks and best practices to reduce resources utilizations, not only of system resources but also of programmers and users, all based in optimal solutions evolved from the empirical validating of hypothesis and logical steps.

All optimization steps taken should have as a goal the reduction of requirements and the promotion of the program objectives. Any claims can only be substantiated by profiling the given problem and the applied solution. Without profiling any optimization is moot.

Optimization is often a topic of discussion among programmers and not all conclusions may be consensual, since they are very closely related to the goals, the programmer experience, and dependent of specific setups. The level of optimization will mostly depend directly from actions and decisions the programmer makes. Those can be simple things, from basic coding practices to the selection of the tools one choses to use to create the program. Even selecting the right compiler will have an impact. A good optimizing compiler permits the programmer to define his aspirations for the optimized outcome; how good the compiler is at optimizing depends on the level of satisfaction the programmer gets from the resulting compilation.

Code

One of the safest ways of optimization is to reduce complexity, ease organization and structure and at the same time evading code bloat. This requires the capacity to plan without losing track of future needs, in fact it is a compromise the programmer makes between a multitude of factors.

Code optimization techniques, fall into the categories of:

High Level Optimization

Algorithmic Optimization (Mathematical Analysis)

Simplification

Low Level Optimization

Loop Unrolling

Strength Reduction

Duff's Device

Clean Loops

KISS

The "keep it simple, stupid" (KISS) principle, calls for giving simplicity a high priority in development. It is very similar to a maxim from Albert Einstein's that states, "everything should be made as simple as possible, but no simpler.", the difficulty for many adopters have is to determine what level of simplicity should be maintained. In any case, analysis of basic and simpler system is always easier, removing complexity will also open the door for code reutilization and a more generic approach to tasks and problems.

"Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live."

—Martin Golding

Code cleanup

Most of the benefits of a code cleanup should be evident to the experienced programmer, they become a second nature due to the adoption of good programming style guidelines. But as in any human activity, errors will occur and exceptions made, so, in this section we will try to remember the small changes that can have an impact on the optimization of your code.

the use of virtual member functions

Remember the cost on performance of virtual members functions (covered when introducing the virtual keyword). At the time optimization becomes an issue most project design change regarding optimization will not be possible, but artifacts may remain to be cleaned up. Guaranteeing that no superfluous use of virtual (like in the leaf nodes of your class/structure inheritance trees), will permit other optimizations to occur (i.e.: compiler optimized inline).

The right data in the right container

One of the top bottleneck on today's systems is dealing with memory caches, be it CPU cache or the physical memory resources, even if paging problems are becoming increasingly rare. Since the data (and the load level) a program will handle is highly predictable at the design level, the better optimizations still fall to the programmer.

One should store the appropriate data structure in the appropriate container, prefer storing pointers to objects rather than the objects themselves, use "smart" pointers (see the Boost library) and don't attempt to store auto_ptr<> in STL containers, it is not allowed by the Standard, but some implementations are known to incorrectly allow it.

Avoid removing and inserting elements in the middle of a container, doing it at the end of the container has less overhead. Use STL containers when the number of objects is unknown; use static array or buffer when it is known. This requires the understanding of not only each container, but its O(x) guarantees.

Take as an example the STL containers on the issue of using (myContainer.empty()) versus (myContainer.size() == 0), it is important to understand that depending on the container type or its implementation, the size member function might have to count the number of objects before comparing it to zero. This is very common with list type containers.

While the STL attempts to provides optimal solutions to general cases, if performance does not match your requirements think about writing your own optimal solution for your case, maybe a custom container (probably based on vector) that does not call individual object destructors and uses custom allocators that avoid the delete time overhead.

Using pre-allocation of memory can provide some speed gains and be as simple remember to use the STL vector<T>::reserve() if permitted. Optimize the use system's memory and the target hardware. In today's systems, with virtual memory, threads and multiple-cores (each with its own cache) where I/O operations on the main memory and the amount of time spent moving it around, can slow things down. This can become a performance bottleneck. Instead opt for array-based data structures (cache-coherent data structures), like the STL vector, because data is stored contiguously in memory, over pointer-linked data structures as in linked lists. This will avoid "death by swapping", as the program needs to access highly fragmented data, and will even help the memory pre-fetch that most modern processors do today.

Consider security costs

Security always has a cost, even in programming. For any algorithm, adding checks, will result in increase the number of steps it takes to finish. As languages get more complex and abstract, understanding all the finer details (and remember them) increases the time needed to obtain the required experience. Sadly most of the steps taken by some of the implementors of the C++ language lack visibility to the programmer and since they are outside of the standard language, aren't often learned. Remember to familiarized yourself with any extensions or particularities of the C++ implementation you are using.

As a language that puts the power of decision into the programmer's hands, C++ provides several instances where the a similar result can be archived by similar but distinct means. Understanding the sometimes subtle differences is important. For instance, when deciding the needed requirements in accessing members of a std::vector, you can chose [], at() and the an iterator, all have similar results but with distinct performance costs and security considerations.

Code reutilization

Optimization is also reflected on the effectiveness of a code. If you can use an already existing code base/framework that a considerable number of programmers have access to, you can expect it to be less buggy and optimized to solve your particular need.

Some of these code repositories are available to programmers as libraries. Be careful to consider dependencies and check how implementation is done: if used without considerations this can also lead to code bloat and increased memory footprint, as well as decrease the portability of the code. We will take a close look at them in the Libraries Section of the book.

To increase code reutilization you will probably fragment the code in smaller sections, files or code, remember to equate that more files and overall complexity also increases compile time.

Function and algorithmic optimizations

When creating a function or a algorithm to address a specific problem sometimes we are dealing with mathematical structures that are specifically indicated to be optimized by established methods of mathematical minimization, this falls into the specific field of Engineering analysis for optimization.

As seen before when examining the inline keyword, it allows the definition of an inline type of function, that works similarly to loop unwinding for increasing code performance. A non-inline function requires a call instruction, several instructions to create a stack frame, and then several more instructions to destroy the stack frame and return from the function. By copying the body of the function instead of making a call, the size of the machine code increases, but the execution time decreases.

In addition to using the inline keyword to declare an inline function, optimizing compilers may decide to make other functions inline as well (see Compiler optimizations section).

ASM

If portability is not a problem and you are proficient with assembler you can use it to optimize computational bottlenecks, even looking at the output of a disassembler will often help looking for ways to improve it. Using ASM in your code brings to the table some other problems (maintainability for instance) so use it at a last resort in you optimization process, and if you use it be sure to document what you have done well.

Note:
If using the gcc compiler, the -S option will output the compilation generated assembly.

Reduction of compile time

Some projects may take a long time to compile. To reduce the time it takes to finish compiling the first step is to check if you have the any Hardware deficiencies. You may be low in resources like memory or just have a slow CPU, even having your HD with a high level of fragmentation can increase compile time.

On the other side, problems may not be due to hardware limitations but in the tools you use, check if you are using the right tools for the job at hand, see if you have the latest version, or if do, if that is what is causing trouble, some incompatibilities may result from updates. In compilers new is always better, but you should check first what has changed and if it serves your purposes.

Experience tells that most likely if you are suffering from slow compile times, the program you are trying to compile was probably poorly designed, check the structure of object dependencies, the includes and take some the time to structure your own code to minimize re-compilation after changes if the compile time justifies it.

Use pre-compiled headers and external header guards this will reduce the work done by the compiler.

Compiler optimizations

Compiler optimization is the process of tuning, mostly automatically, the output of a compiler in an attempt to improve the operations the programmer has requested, so to minimize or maximize some attribute of an compiled program while ensuring the result is identical. By rilling in the compiler optimization programmers can write more intuitive code, and still have them execute in a reasonably fast way, for instance skipping the use of pre-increment/decrement operators.

Generally speaking, optimizations are not, and can not be, defined on the C++ standard. The standard sets rules and best practices that dictate a normalization of inputs and outputs. The C++ standard itself permits some latitude on how compilers perform their task since some sections are marked as implementation dependent but generally a base line is established, even so some vendors/implementors do creep in some singular characteristic apparently for security and optimization benefits.

One notion that is good to keep in mind is that there is not a perfect C++ compiler, but most recent compilers will do several simple optimizations by default, that attempt to abstract and take advantage of existing deeper hardware optimizations or specific characteristics of the target platform, most of these optimizations are almost always welcomed but it is up to the programmer still to have and idea of what is going on and if indeed they are beneficial. As a result it is highly recommended to examine your compiler documentation on how it operates and what optimizations are under the programmer's control, just because a compiler can make some optimization in theory does not mean that it will or even that it will result in an optimization.

The most common compiler optimizations options available to the programmer fall into three categories:

Speed; improving the runtime performance of the generated object code. This is the most common optimization

Space; reducing the size of the generated object code

Safety; reducing the possibility of data structures becoming corrupted (for example, ensuring that an illegal array element is not written to)

Unfortunately, many "speed" optimizations make the code larger, and many "space" optimizations make the code slower -- this is known as the space-time tradeoff.

Making use of extended instructions sets

GPU

To do:
Add missing information

Run time

As we have seen before runtime is the duration of a program execution, from beginning to termination. This is were all resources needed to run the compiled code are allocated and hopefully released, this is the final objective of any program to be executed, as such it should be the target for ultimate optimizations.

Memory footprint

In the past computer memory has been expensive and technologically limited in size, and scarce resource for programmers. Large amounts of ingenuity was spent in implement complex programs and process huge amounts of data using as little as possible of this resource. Today, modern systems contain enough memory for most usages but capacity demands and expectations have increased as well; as such, techniques to minimize memory usage may still be essential and in fact operational performance has gained a new momentum with the increasing importance of mobile computing.

Measuring the memory usage of a program is difficult and time consuming, and the more complex the program is the harder it becomes to get good metrics. One other side of the problem is that there are no standard benchmarks (not all memory use is equal) or practices to deal with the problem beyond the most basic and generic considerations.

Note:
Take in consideration that performing memory tests in a debug compile will not, in most circumstances, produce any valid insight on memory use, at best it can provide you an indication of the expected ceiling for memory use in the tested functions.

Remember to use swap() on std::vector (or deque).

When attempting to reduce reduce (or zero) the size of a vector or deque using the swap(), on a standard container of that type, will guarantee that the memory is released and no overhead buffer for growth is used. It will also avoid the fallacy of using erase() or reserve() that will not reduce the memory footprint.

Lazy initialization

It is always needed to maintain the balance between the performance of the system and the resource consumption. Lazy instantiation is one memory conservation mechanism, by which the object initialization is deferred until it is required.

Instantiation of class Car by default instantiates the class Wheel. The purpose of the whole class is to just print the name of the car. Since the instance wheel doesn't serve any purpose, initializing it is a complete resource waste.

It is better to defer the instantiation of the un-required class until it is needed. Modify the above class Car as follows:

Now the Wheel will be instantiated only when the member function getCarSpeed() is called.

Parallelization

As seen when examining threads, they can be a "simple" form of taking advantage of hardware resources and optimize the speed performance of a program. When dealing with thread you should remember that it has a cost in complexity, memory and if done wrong when synchronization is required it can even reduce the speed performance, if the design permits it is best to allow threads to run as unencumbered as possible.

I/O reads and writes

Profiling

Profiling is a form of dynamic program analysis (as opposed to static code analysis), consists in the study of program's behavior using information gathered as the program executes. its purpose is usually to determine which sections of a program to optimize. Mostly by determining which parts of a program are taking most of the execution time, causing bottleneck on accessing resources or the level of access to those resources.

Global clock execution time should be the bottom line when comparing applications performance. Select your algorithms by examining the asymptotic order of executions, as in a parallel setup they will continue to give the best performance. In the case you find an hotspot that can not be parallelized, even after examining higher levels on the call stack, then you should attempt to find a slower but parallelizable algorithm.

To do:
small examples

branch-prediction profiler

call-graph generating cache profiler

line-by-line profiling

heap profiler

Profiler

Free Profiling tools

Valgrind ( http://valgrind.org/ ) an instrumentation framework for building dynamic analysis tools. Includes a cache and branch-prediction profiler, a call-graph generating cache profiler, and a heap profiler. It runs on the following platforms: X86/Linux, AMD64/Linux, PPC32/Linux, PPC64/Linux, and X86/Darwin (Mac OS X). Open Source under the GNU General Public License, version 2.

GNU gprof ( http://www.gnu.org/software/binutils/ ) a profiler tool. The program was first introduced on the SIGPLAN Symposium on Compiler Construction in 1982, and is now part of the binutils that are available in mostly all flavors of UNIX. It is capable of monitoring time spent in functions (or even source code lines) and calls to/from them. Open Source under the GNU General Public License.

Further reading

Modeling Tools

Long gone are the days when you had to do all software designing planning with pencil and paper, it's known that bad design can impact the quality and maintainability of products, affecting time to market and long term profitability of a project.

The solution seems to be CASE and modeling tools which improve the design quality and help to implement design patterns with ease that in turn help to improve design quality, auto documentation and the shortening the development life cycles.

UML (Unified Modeling Language)

Since the late 80s and early 90s, the software engineering industry as a whole was in need of standardization, with the emergence and proliferation of many new competing software design methodologies, concepts, notations, terminologies, processes, and cultures associated with them, the need for unification was self evident by the sheer number of parallel developments. A need for a common ground on the representation of software design was badly needed and to archive it a standardization of geometrical figures, colors, and descriptions.

The UML (Unified Modeling Language) was specifically created to serve this purpose and integrates the concepts of Booch (Grady Booch is one of the original developers of UML and is recognized for his innovative work on software architecture, modeling, and software engineering processes), OMT, OOSE, Class-Relation and OOramand by fusing them into a single, common and widely usable modeling language tried to be the unifying force, introducing a standard notation that was designed to transcend programming languages, operating systems, application domains and the needed underlying semantics with which programmers could describe and communicate. With its adoption in November 1997 by the OMG (Object Management Group) and its support it has become an industry standard. Since then OMG has called for information on object-oriented methodologies, that might create a rigorous software modeling language. Many industry leaders had responded in earnest to help create the standard, the last version of UML (v2.0) was released in 2004.

UML is still widely used by the software industry and engineering community. In later days a new awareness has emerged (commonly called UML fever) that UML per se has limitations and is not a good tool for all jobs. Careful study on how and why it is used is needed to make it useful.