problems, solutions, quirky behaviors

Main menu

Daily Archives: June 30, 2014

The Algorithm (the problem)

Yesterday I’ve encountered the following problem – from a set of positive integer numbers, determine all positive integers that can be constructed from any subset of that set.

For example set {1,3,5} can produce the following numbers {1,3,5,4,6,8,9}. In general case it’s a NP complete problem solvable by determining all combinations (this is what I implemented). However for anything larger than ~25 numbers in the set it’s strating to get too long to find the answer. Moreover, a ridiculous set like this {1,1,1,1,1,1,1,1,1,1,1,1…} that can be enumerated by fifth-grader becomes imposible to process.

So of course I went on looking for a smarter solution and quickly enough found this: http://en.wikipedia.org/wiki/Subset_sum_problem

Admittedly, yesterday was a long day, but I was sitting in front of the screen staring at this and couldn’t for the life of me understand how and why it works. So – let me put it out here is somewhat more elaborate but kind of step-by-step way (Once rested I understood it today and even managed to improve it a bit).

This isn’t exactly the problem as stated (find if subset sums to X or find all things subsets sum up to), but the algorithm used is the same.

How it works (on example)

Wikipedia states it’s pseudo-polynomial, the reason for it is simply that it doesn’t grow polynomialy with the number of items in the set, but with the range of numbers. So your problem with set like {1,2,3,4} will not take the same amount of time as set {1,2,3,99999999} (we’ll see why)

So – the general idea is to go through all numbers from 1 to sum({set}) (so for set {1,2,3,4} we’ll go through all numbers 1..(1+2+3+4 = 10)) and try to determine if this particular number can be constructed using first 1…X elements of the set. While trying to determine this we’ll construct an array storing the values for each number.

The array to store the results will be of type boolean and sizes [SUM_OF_NUMBERS_IN_SET+1][COUNT_OF_NUMBERS_IN_SET+1]

let’s work on an example – a trivial array of {1,3} the sum of elements is 4, the number of elements is 2.

We’ll produce an empty boolean array that will look like this

X

0

1

2

0

true

true

true

1

false

false

false

2

false

false

false

3

false

false

false

4

false

false

false

this is just initially set up table, the true,true in the first row indicates, that with no elements, or with only first item or with all items from the set, we can create a subset that sums up to zero (empty set) .

Now the algorithm kicks in – for each number (1…4) we’ll determine if with first 1…2 elements of the set, we can create this number. So – how can we determine that – there are just two simple cases – either we can create this number from a smaller subset (without including current number) or we can create this number from currently analysed number and we know that the remainder we can create from previously seen numbers. The boolean array will help us keep track of that. For each element, we’ll check if either an element to the left is true (first case) or an element higher up (by the amount of currently analysed element) and adjacent to left is true (second case).

In pseudocode that will be: table[i][j] = table[i][j-1] | table[i-set[j-1]][j-1] where set[j-1] is the value of element from the set (indexed from zero, hence j-1)

So – how will our example unfold:

row 1:

X

0

1

2

0

true

true

true

1

false

true

true

2

false

false

false

3

false

false

false

4

false

false

false

[1][1] we set to true because table[1-1][0] (that is table[0][0]) is true (second case)
[1][2] we set to true because table[1][1] is true (first case)

for next iteration, on row indexed by 2, we’ll not set anything to true:

[2][1] we’ll not set to true because neither [2][0] nor [1][0] is true
[2][2] we’ll not set to true because neither [2][1] nor [-1][0] is true

for the third row:

[3][1] we’ll not set to true because neither [3][0] nor [2][0] is true
[3][2] we’ll set to true because [0][1] is true

the last one is interesting :

[4][1] we’ll not set to true, because neither [4][0] nor [3][0] (this is the equivalent of saying, with just the first element from the set, we can’t construct 4)
[4][2] we’ll set to true because [1][1] is set to true (this is the equivalent of saying, with the first two elements, we can construct 4 – take 3, and somehow constructed 1 from first element from the set)

So the final table will look like this:

X

0

1

2

0

true

true

true

1

false

true

true

2

false

false

false

3

false

true

true

4

false

false

true

The answer to question ‘can we construct X’ from the set will be in table[X][set_size -1] (last element of the row)

Code

The actual code looks like this:

All subset sums

Java

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

intsum=0;

for(inti:numbers){

sum+=i;

}

boolean[][]table=newboolean[sum+1][numbers.length+1];

Arrays.fill(table[0],true);

for(inti=1;i<=sum;i++){

for(intj=1;j<=numbers.length;j++){

table[i][j]=table[i][j-1];

if(numbers[j-1]<=i){

table[i][j]|=table[i-numbers[j-1]][j-1];

}

}

}

And no we can print what values are possible to create from such set:

Java

1

2

3

for(intw=1;w<=sum;w++){

System.out.println(table[w][numbers.length]);

}

Improvement

The improvement isn’t that big (not big-O kind of thing), but in practical applications it matters. The main observation is that we only set the value to true at certain index for a number and then we keep it true. So we only need to track at which index we found we can make this number (if). The improved code (memory footprint is way smaller) can go like this:

Improved version of subset sum

Java

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

intsum=0;

for(inti:numbers){

sum+=i;

}

int[]starts=newint[sum+1];

Arrays.fill(starts,numbers.length+1);

starts[0]=0;

for(inti=1;i<=sum+1;i++){

for(intj=1;j<=numbers.length;j++){

intk=i-numbers[j-1];

if(k>=0&&starts[k]<j){

starts[i]=j;

break;

}

}

}

That’s it! Hope you’ll find this instructive and won’t waste too much time trying to understand how exactly are those indices set and why on a sunday evening.