C++ Lies, Damned Lies and Statistics

This is the last entry in a sequence of blog posts that grew out of a conversation with two colleagues regarding the most efficient way to implement a piece of code against some made-up requirements.

Let me start by saying that the first choice for most developers would be a vector. Once such an implementation is tested, if the performance is not there, one should look for alternate ways to optimize and/or redesign the software.
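The exact requirements behind these benchmarks are not spelled out in this post, so the following is only a sketch of what the vector pass might look like. It assumes (hypothetically) that each pass inserts N integers in sorted order and then removes them one at a time from the front, timing the whole pass with `std::chrono`:

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical vector benchmark pass: insert N integers in sorted order,
// then remove them from the front. Both operations shift elements, so
// each one is O(n) and the whole pass is O(n^2).
double runVectorPass(int n) {
    auto start = std::chrono::steady_clock::now();
    std::vector<int> v;
    for (int i = 0; i < n; ++i) {
        // Find the sorted position and insert there.
        auto pos = std::lower_bound(v.begin(), v.end(), i);
        v.insert(pos, i);
    }
    while (!v.empty())
        v.erase(v.begin());  // erase from the front shifts the rest
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

A small driver could then print the `<<< vector done N: ... elapsedTime: ...` line shown below, e.g. with `std::printf`.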

Following are the results of five runs of the vector implementation:

<<< vector done N: 1000 elapsedTime: 1.215

<<< vector done N: 1000 elapsedTime: 1.207

<<< vector done N: 1000 elapsedTime: 1.225 (remove me)

<<< vector done N: 1000 elapsedTime: 1.199

<<< vector done N: 1000 elapsedTime: 1.199 (remove me)

For these and all subsequent runs, the function that displays the objects has been commented out. Display, and in general all forms of I/O, tends to run very slowly compared to the processor. We do not want additional I/O time to interfere with or mask the performance of the actual program.

I have run the program five times. Five runs is a reasonable minimum when gathering performance data on any program. After the five runs are performed, the slowest and fastest times are discarded, and the remaining times are totaled and averaged.
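The averaging step described above (a trimmed mean) can be expressed as a small helper. This is not the code used for the numbers in this post, just an illustration of the procedure:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Drop the fastest and slowest timings, then average the rest.
double trimmedAverage(std::vector<double> times) {
    assert(times.size() >= 3);  // need at least one value left after trimming
    std::sort(times.begin(), times.end());
    double sum = std::accumulate(times.begin() + 1, times.end() - 1, 0.0);
    return sum / static_cast<double>(times.size() - 2);
}
```

For the five vector runs above, `trimmedAverage({1.215, 1.207, 1.225, 1.199, 1.199})` yields about 1.207 seconds.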

The number of elements inserted and then deleted was set to 1,000. The vector program was able to insert and then remove 1,000 integers from a vector in about 1.2 seconds.

The following results are from running 1,000 elements through the list program:

<<< list done N: 1000 elapsedTime: 1.322

<<< list done N: 1000 elapsedTime: 1.321

<<< list done N: 1000 elapsedTime: 1.306 (remove me)

<<< list done N: 1000 elapsedTime: 1.369 (remove me)

<<< list done N: 1000 elapsedTime: 1.359

As we did with the vectors, the list program ran five times with 1,000 entries. We discard the fastest and slowest passes before averaging the remaining times. The resulting average is about 1.3 seconds, just a tenth of a second slower than the implementation with vectors. If the number of entries is around 1,000, either implementation may be used. If dealing with hundreds of integers, perhaps the vector implementation with some additional fine tuning (if possible) may be the answer.
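Under the same assumed workload as the vector sketch above, a `std::list` counterpart might look like the following. The trade-off is different: finding the sorted position requires a linear scan, but the insertion and front removal themselves are O(1):

```cpp
#include <chrono>
#include <list>

// Hypothetical list benchmark pass: same assumed workload as the vector
// sketch, i.e. sorted insertion of N integers followed by removal from
// the front.
double runListPass(int n) {
    auto start = std::chrono::steady_clock::now();
    std::list<int> lst;
    for (int i = 0; i < n; ++i) {
        auto pos = lst.begin();
        while (pos != lst.end() && *pos < i)  // linear scan to the spot
            ++pos;
        lst.insert(pos, i);                   // O(1) once the spot is found
    }
    while (!lst.empty())
        lst.pop_front();                      // O(1) removal from the front
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

The per-node allocations and pointer chasing of `std::list` tend to hurt cache locality, which is consistent with the list coming in slightly slower than the vector here.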

Last but not least, here are the five runs of the program using arrays:

<<< array done N: 10000 elapsedTime: 0.139

<<< array done N: 10000 elapsedTime: 0.153

<<< array done N: 10000 elapsedTime: 0.135 (remove me)

<<< array done N: 10000 elapsedTime: 0.15

<<< array done N: 10000 elapsedTime: 0.154 (remove me)

Two things to note in this pass. The first is that the average time is about 0.147 seconds, roughly ten times faster than the vector or list implementations. In addition, the number of elements used went up from 1,000 to 10,000. Taken together, the array implementation appears to be about two orders of magnitude (100 times) faster than the others.
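The array version of the program is not shown in this post, so the following is only a guess at its shape. One way such a speedup could arise is if the raw array version appends at the end and removes from the end, making every operation O(1) instead of shifting elements:

```cpp
#include <chrono>

// Hypothetical fixed-size array pass: append N integers at the end, then
// remove them by shrinking the logical size. Every operation is O(1).
double runArrayPass(int n) {
    auto start = std::chrono::steady_clock::now();
    int* a = new int[n];
    int count = 0;                 // logical size of the array
    for (int i = 0; i < n; ++i)
        a[count++] = i;            // append at the end
    while (count > 0)
        --count;                   // "remove" by shrinking the logical size
    delete[] a;
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Note that `std::vector` with `push_back`/`pop_back` would have the same asymptotic behavior, so the gap here likely reflects a difference in workload as much as a difference in container.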

As I mentioned in a previous blog entry, a few months ago I implemented my own version of ArrayList for a real-life networking problem, and it is about as fast as the array implementation in this set of blog entries.

In conclusion, as a software developer, one should always explore several possibilities and test them before moving on to the next task.

If you have comments or questions on this or any other software development topic, please do not hesitate to contact me. If I do not know the answer off the top of my head, I will research it and come back with a solid approach.