Finding all Nonintersecting Hands, Continued

Tintri suffered a server room power outage Monday evening so instead of useful work I prototyped the idea laid out in the previous post!

Let's call this structure a "Discard Tree":

* Each bucket at a particular level of the tree is labeled (indexed) with a fixed-size set of cards.
* Every hand within that bucket contains all of the cards in the label.
* To find the nonintersecting set of hands for a target hand X, we need search only those buckets whose labels are nonintersecting with X.
* The list of labels is chosen to be minimal, provided that every hand belongs to at least one bucket.
* Each bucket may contain a list of hands to be searched, or a subtree.
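To make the bucket rules concrete, here is a minimal single-level sketch in Python. It is not the prototype: the helper names, the toy sizes (12 cards, 5-card hands), and the way labels are generated (every K-card subset inside one of a few fixed groups of cards) are all assumptions chosen for illustration. Hands and labels are stored as integer bitmasks so that an intersection test is a single `&`.

```python
from itertools import combinations
import random

def to_mask(cards):
    """Pack an iterable of card numbers into an integer bitmask."""
    m = 0
    for c in cards:
        m |= 1 << c
    return m

def make_labels(groups, k):
    """Every k-card subset drawn from within a single group (assumed scheme)."""
    return [to_mask(c) for g in groups for c in combinations(g, k)]

def build_index(hands, labels):
    """Place each hand in one bucket whose label it fully contains."""
    buckets = {label: [] for label in labels}
    for h in hands:
        # Linear scan for the first matching label; fine for a sketch.
        label = next(l for l in labels if l & h == l)
        buckets[label].append(h)
    return buckets

def nonintersecting(buckets, x):
    """Search only buckets whose label shares no card with x."""
    found = []
    for label, hands in buckets.items():
        if label & x == 0:
            found.extend(h for h in hands if h & x == 0)
    return found

# Toy sizes (assumed): 12 cards, 5-card hands, two groups of 6 cards.
# Any 5-card hand has at least 3 cards in one group, so K = 3 labels
# are enough for every hand to land in some bucket.
random.seed(1)
groups = [range(0, 6), range(6, 12)]
labels = make_labels(groups, 3)
hands = [to_mask(random.sample(range(12), 5)) for _ in range(500)]
buckets = build_index(hands, labels)
target = to_mask([0, 1, 2, 3, 4])
result = nonintersecting(buckets, target)
```

A hand that is disjoint from the target contains its own label, so that label is disjoint from the target too and the bucket is always visited; skipped buckets can only hold hands that intersect the target.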

The good news is that this structure appears to achieve near-optimal reduction in search space for the size of the index. Suppose we had a huge table which, for each K-card combination, lists exactly the hands which do not intersect with those particular K cards. The fraction of the sample set that would need to be examined would be (52-K)C17 / 52C17. The Discard Tree achieves this ratio at a much lower cost.
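The ratio above is easy to tabulate. The snippet below is just a quick check of the formula, with the 52-card deck and 17-card hands taken from the expression itself; the function name is made up for the example.

```python
from math import comb

DECK, HAND = 52, 17  # sizes taken from the formula in the text

def search_fraction(k):
    """Fraction of hands that avoid k particular cards: (52-k)C17 / 52C17."""
    return comb(DECK - k, HAND) / comb(DECK, HAND)

for k in range(1, 10):
    print(f"K={k}: {search_fraction(k):.4f}")
```

For K=8 this comes out to roughly 3.1% and for K=9 roughly 1.9%, which lines up with the percentages quoted for the 10^7-hand and 10^8-hand samples further down.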

(Is this really the "best" possible? Well, if we could efficiently calculate the intersection of two different K-card buckets we might be able to play inclusion/exclusion tricks. But, any such operation that has a per-hand cost loses to just searching through one of the buckets, so I feel fairly confident that, for a given "index size", this is the best possible reduction in search space.)

Here are some measurements from the prototype, showing the average fraction of buckets visited and thus, assuming hands are distributed equally among buckets, the percentage of the search space to examine. I used both "flat" (single-level) tables and tree lookups.

The good news here is that we can pick whichever structure of indexes is most efficient--- any tree that is "7 cards deep" is just as good at reducing the search space, whether it is a single 7-card table or a three-level lookup.

This is useful, because once we get to 7 or 8 cards, the number of buckets is prohibitively large. The minimal label sets start becoming a significant fraction of the combinatorial space.

What happens is that at K=6 we can still divide the cards into 3 groups, while with K=7 we have to drop down to two groups. Even at K=6 it is more efficient, in terms of number of buckets, to build the index as 3,3 rather than as a flat table. This is not true for K<=5.
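One way to see where the group counts come from is a pigeonhole argument: if the 52 cards are split into g groups, a 17-card hand must have at least ceil(17/g) cards in some group, so K-card labels drawn from within a single group cover every hand as long as ceil(17/g) >= K. The sketch below assumes that is how the labels are chosen (it reproduces the 3-group and 2-group figures, but it is a guess at the construction and not necessarily the minimal label set), and the function names are invented for the example.

```python
from math import comb

DECK, HAND = 52, 17

def max_groups(k):
    """Largest g such that any HAND-card hand has >= k cards in some group."""
    if k > HAND:
        raise ValueError("label cannot be larger than a hand")
    if k < 2:
        return DECK  # single-card labels: every card is its own group
    # ceil(HAND / g) >= k  <=>  g * (k - 1) <= HAND - 1
    return min(DECK, (HAND - 1) // (k - 1))

def flat_label_count(k):
    """Number of k-card labels in a flat table built from max_groups(k) groups."""
    g = max_groups(k)
    base, extra = divmod(DECK, g)
    sizes = [base + 1] * extra + [base] * (g - extra)  # groups as even as possible
    return sum(comb(s, k) for s in sizes)

for k in range(2, 10):
    print(f"K={k}: {max_groups(k)} groups, {flat_label_count(k):,} labels")
```

Under this assumption the flat table has 43,316 labels at K=6 but 1,315,600 at K=7, which is the kind of jump that makes the larger flat tables impractical.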

The other reason that being able to break the index up is useful is to reduce the number of index comparisons (which, after all, cost just as much as hand comparisons). This lets us calculate the optimal index size for any given number of sample hands, so as to minimize the total number of intersection operations. (I assume that number of buckets is not a limiting factor.)

"Fixed cost" == number of index comparisons needed. "Breakeven point" == point where reduction in total space equals fixed cost of index comparisons. But what we really care about is which option has the lowest total cost at each number of hands.

Thus, for a 10^7-hand sample, we perform hand-by-hand intersections on just 3.1% of the sample space (plus an additional 1.5% overhead for label comparisons), using 236,808 leaf buckets. The number of nonintersecting hands found should be about 23,000, and we perform 460,000 intersections to find them, a 20:1 ratio.

For a 10^8-hand sample: 1.9% of the sample space, and 0.8% overhead. Thus we find 230,000 nonintersecting hands using 2,700,000 intersections, a 12:1 ratio.
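The arithmetic behind those two ratios, as a quick check: total intersections are the searched fraction plus the label overhead, applied to the sample size. The percentages and hand counts are the ones quoted above; the function name is just for this example.

```python
def total_intersections(n_hands, search_pct, overhead_pct):
    """Hand comparisons plus label comparisons, both given as % of the sample."""
    return round(n_hands * (search_pct + overhead_pct) / 100)

# (sample size, % of hands searched, % label overhead, hands expected to be found)
cases = [(10**7, 3.1, 1.5, 23_000), (10**8, 1.9, 0.8, 230_000)]
for n, search, overhead, found in cases:
    total = total_intersections(n, search, overhead)
    print(f"N={n:,}: {total:,} intersections, {total / found:.1f} per hand found")
```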