8/27/2010

08-27-10 - Cumulative Probability Trees

I've been thinking about Cumulative Probability Trees today; if you prefer, it's partial-sum trees. Basically you have a count for
each index C[i] , and you want to be able to quickly get the sum of all counts between 0 and i = S[i]. You want to be able to
query and update both in <= logN time. So for example just using S[i] as an array makes query be O(1), but then update is O(N), no good.

The standard solution is Fenwick Trees .
They are compact (take no more room than the C[i] table itself). They are O(log(N)) fast. In code :

F[i] contains a partial sum of some binary range
the size of the binary range is equal to the bottom bit on of i
if i&1 - it contains C[i]
if i&3 == 2 - contains Sum of 2 ending with i (eg. C[i]+C[i-1] )
if i&7 == 4 - contains Sum of 4 ending with i
if i&F == 8 - contains Sum of 8 ending with i

(follow the link above to see pictures). To get C[i] from F[i] you basically have to get the partial sums S[i] and S[i-1] and subtract them (you
can terminate early when you can tell that the rest of their sum is a shared walk).
The logN walk to get S from F is very clever :

sum = 0;
while (i > 0)
{
sum += F[i];
i = i & (i-1);
}

The i & (i-1) step turns off the bottom bit of i, which is the magic of the Fenwick Tree structure being
the same as the structure of binary integers. (apparently this is the same as i -= i & -i , though I haven't
worked out how to see that clearly).

If you put F[0] = 0 (F starts indexing at 1), then you can do this branchless if you want :

sum = 0;
UNROLL8( sum += F[i]; i = i & (i-1); );

(for an 8-bit symbol, eg 256 elements in tree).

You can't beat this. The only sucky thing is that just querying a single probability is also O(logN). There are some cases where you
want to query probability more often than you do anything else.

One solution to that is to just store the C[i] array as well. That doesn't hurt your update time much, and give you O(1) query for
count, but it also doubles your memory use (2*N ints needed instead of N).

One option is to keep C[i], and throw away the bottom level of the Fenwick tree (the odd indexes that just store C[i]). Now your memory
use is (3/2)*N ; it's just as fast but a little ugly.

But I was thinking what if we start over. We have the C[i], what if we just build a tree on it?

The most obvious thing is to build a binary partial sum tree. At level 0 you have the C[i], at level 1 you have the sum of pairs,
at level 2 you have the sum of quartets, etc :

showing the index that has the sum of that slot :
0:01234567
1:00112233
2:00001111
3:00000000

So update is very simple :

Tree[0][i] ++;
Tree[1][i>>1] ++;
Tree[2][i>>2] ++;
...

But querying a cumprob is a bit messy. You can't just go up the tree and add, because you may already be counted in a parent. So you
have to do something like :

So what if we instead design our tree to be good for query. So we by construction say that our query for cumprob will be this :

sum = Tree[0][i];
sum += Tree[1][i>>1];
sum += Tree[2][i>>2];
...

That is, at each level of the tree, the index (shifted down) contains the amount that should be added on to get the partial sum that
preceeds you. That is, if i is >= 64 , then Tree[6][1] will contain the sum from [0..63] and we will add that on.

In particular, at level L, if (i>>L)is odd , it should contain the sum of the previous 2^L items. So how do we do the update for this?

this is exactly complementary to the query in the last type of tree; we've basically swapped our update and query.

Now if you draw what the sums look like for this tree you get :

These are the table indexes :

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

0

1

2

3

4

5

6

7

0

1

2

3

0

1

0

These are the table contents :

C0

C0-C1

C2

C2-C3

C4

C4-C5

C6

C6-C7

C8

C8-C9

C10

C10-C11

C12

C12-C13

C14

C14-C15

0

C0-C1

0

C4-C5

0

C8-C9

0

C12-C13

0

C0-C3

0

C8-C11

0

C0-C7

0

it should be obvious that you can take a vertical sum anywhere and that will give you the partial sum to the beginning.
eg. if I take a vertical sum in the 13th slot I get (C12-C13)+(0)+(C8-C11)+(C0-C7) = (C0-C13) , which is what I want.

Now what if we slide everything left so we don't have all those zeros in the front, and we'll go ahead and
stick the total sum in the top :

C0

C0-C1

C2

C2-C3

C4

C4-C5

C6

C6-C7

C8

C8-C9

C10

C10-C11

C12

C12-C13

C14

C14-C15

C0-C1

0

C4-C5

0

C8-C9

0

C12-C13

0

C0-C3

0

C8-C11

0

C0-C7

0

C0-C15

Now, for each range, we stuff the value into the top slot that it covers, starting with the largest range.
So (C0-C7) goes in slot 7 , (C0-C3) goes in slot 3, etc :

C0

-v-

C2

-v-

C4

-v-

C6

-v-

C8

-v-

C10

-v-

C12

-v-

C14

-v-

C0-C1

-v-

C4-C5

-v-

C8-C9

-v-

C12-C13

-v-

C0-C3

-v-

C8-C11

-v-

C0-C7

-v-

C0-C15

(the -v- is supposed to be a down-arrow meaning you get those cell contents from the next row down). Well, this is exactly a Fenwick Tree.

I'm not sure what the conclusion is yet, but I thought that was like "duh, cool" when I saw it.