Combinations in C++, Part 2

WEBINAR:On-Demand

Introduction

There are four new algorithms present in this article. The first one is about speed optimization by using integers as array indexes (I'll explain that later). The second one is about concurrent programming. (Now, with multi-core CPUs becoming mainstream in the near future, you can cut the time to find a lot of combinations by half and more.) The third and fourth ones deal with finding combinations with repetitions.

In my articles, I place a strong emphasis on explaining the techniques as clearly as possible. It is of utmost importance for you, the reader, to understand the explanations because once you have understand them, you need not know how I do it; you can implement it yourself, even in your favourite computer language. You can implement it yourself, iterative way? recursive way?, your way? Heck! your implementation could even be faster than mine. Who knows?

Source Code Version

For the source code changes, all the classes and functions fall under the stdcomb namespace and they are given the name "Combination Library" and a version (1.5.0), so that the users know which is the latest version because there are a few copies of source code floating around the Internet. Always get the latest version; it contains new features and/or bug fixes. You can always be sure that CodeGuru has the most up-to-date version.

Another Way to Find the Combinations

A combination is the way to pick a different unique smaller sequence from a bigger sequence, without regard to the ordering (positions) of the elements (in the smaller sequence). This article teaches you how to find combinations. For every technique presented in this article, I will explain them as well as I can before going on to teach you how to use the source code.

The notations used in this article:

n : n is the larger sequence from which r sequence is picked

r : r is the smaller sequence picked from n sequence.

c : c is the formula for the total number of possible combinations of r picked from n distinct objects : n! / (r! (n-r)! )

The ! postfix means factorial.

The Technique for Finding Combinations Without Repetitions

I added this section because I realised the explanation in the original article is not easy for the readers to visualise the underlying pattern. Let me explain the new approach now: the shifting technique.

To find combinations, you will use a technique that involves shifting the last combination element from the left to the right.

Find combinations of 1 from a sequence of 5

First, find combinations of 1 from a sequence of 5 numbers{0,1,2,3,4}. Please note that the red box is the combination element.

0

1

2

3

4

0

1

2

3

4

0

1

2

3

4

0

1

2

3

4

0

1

2

3

4

A total of five combinations are generated:
0
1
2
3
4

Find combinations of 2 from a sequence of 5

The first example is pretty easy, isn't it? Now, find combinations of 2 from a sequence of 5 numbers{0,1,2,3,4}. Please note the red boxes are the combination elements.

0

1

2

3

4

0

1

2

3

4

0

1

2

3

4

0

1

2

3

4

Oops, you can't shift the last element any more. Now, shift the first element and bring back the last element to its right side. This is shown below.

0

1

2

3

4

For the next two combinations, you continue to shift the last element.

0

1

2

3

4

0

1

2

3

4

Again, you cannot shift any more and shift the first element and bring the last element to its side.

0

1

2

3

4

Shift the last element as usual

0

1

2

3

4

Shift the first element and bring the last to its side.

0

1

2

3

4

Oops, you can shift neither the first nor last element any more. This is the end of all the combinations generated.

Steps to Find Combinations

Of course, I can go on to demonstrate finding combinations of 4 elements out of 5 and so on. But I don't. I trust that you can extrapolate the technique for finding different combinations from different sequences.

Let me define the steps for the shifting pattern you used to find all combinations.

Initially, All the element(s) must be on the leftmost of the array.

Shift the last red box until it cannot shift anymore, shift the next rightmost red box (if there is one), and bring back the last element to the rightside of it. This is the new combination.

Continue to shift the last element to the right.

It comes to a point where the last twp elements cannot shift any longer. Then, shift the 3rd one (from the right) and bring the last two to its side consecutively.

The previous point rephrased for all abitrary cases: ('i' here stands for number) It comes to a point where the last i elements cannot shift any longer; then, shift the i+1 element (from right) and bring the last i elements to its side consecutively.

Continue until all the elements cannot shift any longer/ All the combinations have been generated.

Optimised Version: Index Combination

Is there any way you can optimise the technique used in next_combination() function? The answer is yes but you have to give up the genericness of the next_combination().

The next_combination() compares the element that it is going to shift with other elements in the n sequence to know its current position in the n sequence. See if you can get rid of this finding operation.

Take a look at this combination (Take 2 out of 5):

01234
x x

If the two elements in the r sequence are integers that stores its index position in the n sequence, the finding is not needed. And with this method, the combination returned is a combination of the indexes in the n sequence is no longer a combination of objects. The timing of finding combinations from a sequence of any custom objects is the same, O(1), because an integer index is used.

The == operator is not needed to be defined for this method to work; next_combination needs it to be defined unless you used the prediate version. Other requirements remain the same. Anyway, finding combinations of objects is actually the same as finding combinations of integers. And, integer variables should be faster than class objects because of the object overhead. So, you should use this!

This technique is implemented in the CIdxComb class. Here is an example on how to use the CIdxComb class.

Benchmark Between next_combination() and CIdxComb

Below is a benchmark result of next_combination() and CIdxComb for finding all the combinations of r sequence of 6 from n sequence of 45, 10 times, using QueryPerformanceCounter(), on a Intel P4 2.66Ghz CPU and 1GB of DDR2 RAM PC.

There is a very strange occurrance that CIdxComb runs as slow or slower than next_combination in some benchmarks in debug build. I don't know why, but I think this difference is due to how iterators (which next_combination() is used) and array subscript index (which CIdxComb is used) are handled in std::vector. The above result is from a release build without any optimization.

Combinations in C++, Part 2

WEBINAR:On-Demand

A Helper Algorithm for Concurrency

Consider a case in which you need to generate lots of combinations, say 100 billion. You have a dual-core CPU PC; naturally, you would want to maximise the dual core by having some concurrent processing (mult-threading). You can do that if you can tell the first thread to calculate combinations starting from the first combination and the second thread to start from the 50 billionth combination number. Each thread will find 50 billion combinations.

The problem is you do not know what combination is at the 50 billionth position! Say you do not have a dual-core CPU PC or SMP PC; you have four computers, so how does these work out for you? You could tell the first computer to calculate from the first 25 billion, the second computer the second 25th billion, and so on. The missing link here is a way to calculate a combination based on a position in the total number of combinations so that CIdxComb or next_combination() can continue to find combinations starting from that one.

Fortunately, your hero here, who is me, has figured out a way to do this. The easiest way to explain this method is to have you walk though a simple example with me.

The Technique of Finding a Combination, Given Its Index

Now, find out what the 92nd combination of 3 out of a sequence of 10. Take note in zero-based counting, it would be number 91. In total, there are 120 combinations.

Below are the positions of the combinations that are placed side by side. I am going to show you how I derive the positions later, so right now you can take it that they are correct. The first element is represented by the red box, the second the green box, and the third by the blue box.

Find the position of the red box first.

0th

0

1

2

3

4

5

6

7

8

9

36th

0

1

2

3

4

5

6

7

8

9

64th

0

1

2

3

4

5

6

7

8

9

85th

0

1

2

3

4

5

6

7

8

9

100th

0

1

2

3

4

5

6

7

8

9

110th

0

1

2

3

4

5

6

7

8

9

116th

0

1

2

3

4

5

6

7

8

9

119th

0

1

2

3

4

5

6

7

8

9

Because the one you are going to find is the 91st and falls between 85th and 100th, so the first element of the 91st combination is 3.

To find the 2nd element (green box), you must subtract 85 from 91. The result is the 6th combination.

0th

4

5

6

7

8

9

5th

4

5

6

7

8

9

9th

4

5

6

7

8

9

12th

4

5

6

7

8

9

14th

4

5

6

7

8

9

Because the 6th falls between the 5th and 9th, the second element is 5.

Before you find the 3rd element, you must subtract 5 from 6. The result is the first combination.

0th

6

7

8

9

1st

6

7

8

9

2nd

6

7

8

9

3rd

6

7

8

9

From the first combination above, you can see that the 3rd element is 7.

Now that all the 3 elements have been found, you have the 91st combination shown below.

91th

0

1

2

3

4

5

6

7

8

9

Get back to problem of knowing the position, given a combination whose elements are side by side.

0th

0

1

2

3

4

5

6

7

8

9

36th

0

1

2

3

4

5

6

7

8

9

How do I know the second combination shown above is number 36th? Take a look below:

0th

0

1

2

3

4

5

6

7

8

9

0

1

2

3

4

5

6

7

8

9

0

1

2

3

4

5

6

7

8

9

36th

0

1

2

3

4

5

6

7

8

9

The combination is 36th because the total number of combinations of 2 out of 9 is 36. Because this is zero-based, the next combination is the 36th. Let me give you another example!

36th

0

1

2

3

4

5

6

7

8

9

0

1

2

3

4

5

6

7

8

9

0

1

2

3

4

5

6

7

8

9

64th

0

1

2

3

4

5

6

7

8

9

The combination is 64th because the total number of combinations of 2 out of 8 is 28. Add 28 to 36 and you will get 64.

Combinadic

Combinadic is the method I have just described. You may want to read more about it in the Combinadic wiki because you will find that my explanation and method are different from what is written there. There is a reason for this: I rediscovered this method myself and while writing this section, I found out about Combinadic. Rest assured that my method and Combinadic are actually the same method; just the way to get to the answer is different. A bit of trivia here: I was quite upset that I am not the first one who discovered this method, so I stopped writing this article for a year. Well, my reason for inventing this algorithm is to find a method to split the work of finding combinations into different workloads and speed up the work on many processors. As for which one I discovered first, Combinadic or Factoradic, it is Combinadic. Later, I found out the algorithm can be modified to find permutations, given its position. I also found out that I am also not the first one who discovered Factoradic.

You may have noticed the CFindCombByIdx operates on numbers only and the contents of first combination vector always starts from zero and is in running numbers. Unlike next_combination() that operates on objects by comparing, CFindCombByIdx operates on zero-based running numbers. What you will get from its results are indexes. For example, if you get a result of {0,2,1} out of {0,1,2,3,4}, it means the 1st element is index 0 of array (of objects) you originally want to find combination, the 2nd element refers to index 2 of the array and 3rd element is index 1 of the array. Finding combinations of numbers is much easier and faster than finding objects. CFindCombByIdx uses a big integer class because this algorithm uses factorials to find the permutation and factorials are usually very large numbers. Because it is a templated class and you know the factorials used are not going to exceeded 64 bit, you can use the __int64 type instead. The is the output of the above example code.

Below is the timing results of next_combination, IndexCombination and CFindCombByIdx finding combinations of 6 out of 20 elements, which is 38760 combinations in total.

time taken for next_combination: 5.04 milliseconds
time taken for IndexCombination: 0.88 milliseconds
time taken for CFindCombByIdx: 5886.43 milliseconds

A Benchmark Program for Your PC

I have included a demo program called TimeCombinations for you to benchmark your PC. It is nice to know how much time has been reduced if you have a dual-core or quad-core PC. The program does not store the combination anywhere; it discards it after every computation. If you have a uniprocessor PC, you have little use for this benchmark program. Below are the screenshots of the program.

Combinations in C++, Part 2

WEBINAR:On-Demand

Finding Combinations from a Set with Repeated Elements

Finding combinations from a set with repeated elements is almost the same as finding combinations from a set with no repeated elements: The shifting technique is used and the set needs to be sorted first before applying this technique. It is just that there are some special situations you need to take care of.

Let me use an example of finding combinations of 3 out of 8 {0,1,2,3,3,3,4,5}. The number 3 is repeated three times.

With this combination now, the green box is in the 4th column, which is 3.

0

1

2

3

3

3

4

5

If you get this next combination, you need to shift the green box to the last 3, which is the 6th column, before returning this combination result so that the next combination would not be repeated and correct.

0

1

2

3

3

3

4

5

Another situation to take care of is as below.

0

1

2

3

3

3

4

5

If you use the combination shifting technique, the next combination would be as below.

0

1

2

3

3

3

4

5

You need to shift the green box and the blue box to the 5th and 6th column, respectively, so that the next combination would be correct.

0

1

2

3

3

3

4

5

There is all you need to know and take care of when finding combinations from a set with repeated elements.

Sample Code for Finding Combinations from a Set with Repeated Elements

Below is the sample. The name of the class used to find the combinations is called "CCombFromRepSet". Because the code is straightforward and commented, I shall not explain the working of the code.

Finding Repeated Combinations from a Set with No Repeated Elements

You have come to the last algorithm in the article: finding repeated combinations from a set with no repeated elements! Let me re-iterate again: Combination is the way of picking a different unique smaller sequence from a bigger sequence, without regard to the ordering (positions) of the elements (in the smaller sequence), meaning {0,0,1}, {0,1,0} and {1,0,0} are actually the same combination because they all contain one 1 and two zeros.

The formula for calculating the total number of repeated combinations is (n + r - 1)!/(r!.(n-1)!)

Let me explain, by finding combinations of 3 out 5 {0,1,2,3,4}

0,0,0 -> the 1st combination
0,0,1
0,0,2
0,0,3
0,0,4
0,1,1 -> this combination is 0,1,1, not 0,1,0 because we already
have 0,0,1.
0,1,2
0,1,3
0,1,4
0,2,2 -> this combination is 0,2,2, not 0,2,0 because we already
have 0,0,2.
0,2,3
.
.
0,4,4
1,1,1 -> this combination is 1,1,1, not 1,0,0 because we already
have 0,0,1.
1,1,2
1,1,3
1,1,4
1,2,2 -> this combination is 1,2,2, not 1,2,0 because we already
have 0,1,2.
.
.
4,4,4 -> Last combination

As a rule of thumb, when you cross over to the next level in the leftmost column (as you may have noticed from the above explanation), the numbers in rightmost columns always follow the crossover column. For example, the next combination of 0,4,4 is 1,1,1, the first column crossover from 0 to 1, the second and third columns follow the first column. Let me give you another example: The next combination of 0,0,4 is 0,1,1, the second column crossover from 0 to 1, and the third column follows the second column. Remember, when crossing over, the minimum number in the rightmost columns after the crossover column follows the crossover column.

Example Code for Finding Combinations from a Set with Repeated Elements

Below is the sample. The name of the function used to find the combinations is called "CombWithRep". Because the code is straightforward and commented, I shall not explain the working of the code.

Conclusion

You have come to the end of the article. You have covered an optimized algorithm to find non-repeated combinations with integers, an algorithm to find non-repeated combinations, given their index so as to split the work into different work tasks for different processors, an optimized algorithm to finding combinations from a set with repeated elements and lastly, finding repeated combinations. I hope you have found my article useful and easy to understand. I love to hear your feedbacks, good or bad, so that I can better improve my article! Thanks!

Advertiser Disclosure:
Some of the products that appear on this site are from companies from which QuinStreet receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. QuinStreet does not include all companies or all types of products available in the marketplace.

Thanks for your registration, follow us on our social networks to keep up-to-date