Suppose I am a high-volume broker aiming to make some money on a state lottery. In this lottery, six balls are drawn from a population of (let's say) 50, without replacement. A ticket is a choice of a size-6 subset of {1, 2, ..., 50}.

The prize structure of this lottery is such that the jackpot alone doesn't impart much value to the ticket. But it turns out that lesser prizes are sufficiently large relative to their probability that the ticket has a positive expected value, which is why I'm buying a lot of tickets in the first place. For instance, I can expect to get a pretty substantial return from tickets which match 4 of the 6 numbers drawn by the lottery.

There is a substantial literature related to the "Turán problem," which asks: what is the minimum number of tickets I need to purchase in order to guarantee that one of my tickets matches 4 of the 6 numbers in the lottery?

My question is somewhat different. Let's say I have enough capital to buy a fixed number N of tickets, large in absolute terms but small relative to 50 choose 6. Then my expected gain is fixed. But of course as a wise investor I may want to minimize the variance of my winnings.

Thus my question.

If the random variable X is the number of (4 out of 6) wins among my N tickets, how small can I make Var(X) by judicious choice of ticket purchases?

(Of course, the same question applies for (k out of 6) where k=2,3,5.)

By the way, in case the setup seems unrealistic, let me add that the reason I'm asking this is that the situation described here actually happened, and I'm trying to reverse-engineer what the broker's risk-minimization strategy must have been, and assess whether it was worth it.

I am not a professional, so please consult one to fix any mistakes below. A BIBD consists of b subsets (blocks) of a set of size v, usually all of the same size k, with a parameter lambda such that every pair of elements of the v-set is covered by exactly lambda blocks. To minimize variance, you want a design (such designs exist in the literature) with "pair" replaced by "quadruple" that comes as close as possible to having a constant lambda. Partial designs exist, and the La Jolla Repository might help. Gerhard "Ask Me Not About Combinatorics" Paseman, 2012.10.26
– Gerhard Paseman, Oct 26 '12 at 17:39

Every lottery situation I have seen with tickets of positive expected value has come from the jackpot. Are you sure the smaller prizes were large enough? Also, how large is the number of tickets you buy compared with the total number of combinations?
– Douglas Zare, Oct 26 '12 at 19:25

I'm sure. The folks in question bought about 200,000 tickets per drawing and ended up making on the order of $3.5m over the life of the enterprise, never winning a jackpot.
– JSE, Oct 26 '12 at 21:02

@Douglas Zare: IMO this is an unfair summary of the text. For some (in the given context) reasonable interpretations of "disadvantaged by the high volume groups" it seems true.
– quid, Oct 27 '12 at 14:14

1 Answer

This is not a real "answer" but an observation. Each 6-tuple contains ${6 \choose 4}$ 4-tuples, so it stands to reason that once $N$ is some smallish multiple $m$ of ${50\choose 4}/{6\choose 4}$ then Bob's your uncle, and you can come reasonably close to equidistribution. This is to be contrasted with (say) the binomial expectation. The number of fours (or more) you expect is ${6\choose 4}m$, and the reasonable size $N=138180$ gives $m=9$ and 135 fours. The binomial distribution gives a variance of slightly under $135$. I expect that one can essentially ensure about 135 fours, plus or minus a small amount, via some covering selection.
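The arithmetic above can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the binomial model is assumed to mean $N$ independent trials, each with success probability $135/N$.

```python
from math import comb

v, k, t = 50, 6, 4      # pool size, ticket size, match size (from the question)
m = 9                   # target number of tickets covering each 4-tuple

fours_per_ticket = comb(k, t)                      # 15
total_fours = comb(v, t)                           # 230300
n_tickets = m * total_fours // fours_per_ticket    # 138180

# A draw contains 15 fours and each four is covered m times on average,
# so the expected number of matched fours is 15 * m.
expected_fours = n_tickets * fours_per_ticket * comb(k, t) / total_fours

# Binomial model (assumption): n_tickets trials with success probability p.
p = expected_fours / n_tickets
binomial_variance = n_tickets * p * (1 - p)

print(n_tickets, expected_fours, round(binomial_variance, 2))
```

Note that "fours" are counted with multiplicity here, so a ticket matching 5 numbers contributes 5 fours and a jackpot ticket contributes 15.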

ADDITION: In truth, the binomial model does not model the random choices well, as they have considerably higher variance, 180-185 compared to 134.8. I do not understand the theoretical concepts in toto, but one aspect is that the coverage of fours from the random sixes is already askew.
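One way to probe that gap, offered as a sketch rather than a full explanation: assume the $N$ tickets are chosen independently and uniformly at random, and count fours with multiplicity, so a ticket matching $j$ numbers contributes ${j\choose 4}$ fours. Every ticket set has the same mean over draws, so by the law of total variance the average (over ticket sets) of the variance over draws is $N$ times the variance of a single ticket's contribution. The names below are illustrative only.

```python
from math import comb

V, K, T, N = 50, 6, 4, 138180
draws = comb(V, K)

def p_match(j):
    """Hypergeometric probability that one ticket matches exactly j drawn numbers."""
    return comb(K, j) * comb(V - K, K - j) / draws

# Y = matched fours on a single ticket: a j-match contains comb(j, 4) fours.
mean_y = sum(comb(j, T) * p_match(j) for j in range(T, K + 1))
second_moment_y = sum(comb(j, T) ** 2 * p_match(j) for j in range(T, K + 1))

# Assumption: tickets are independent and uniform, so variances add.
variance_random = N * (second_moment_y - mean_y ** 2)

print(N * mean_y, round(variance_random, 1))
```

Under these assumptions the result lands in the 180-185 range quoted above, which suggests that much of the excess over the binomial figure comes from fives and sixes being counted as several fours each.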

UPDATE: OK, here's the skinny on covering. I did a rather simple process. Do the following 138180 times. Pick a 4-tuple that so far has not appeared 9 times. Append to it the 2 numbers for which the resulting six minimizes the sum of the current counts of its 15 sub-fours. Accumulate the counts of 4-tuples from this six.

Then apply a bit of post-processing if you want (throw out populous sixes). This gives a set of 138180 6-tuples in which every 4-tuple appears between 7 and 11 times (the average is 135/15, or 9; with random choices of sixes the four-counts will range from 0 to 20 or more). Then simulate the ${50\choose 6}$ lotteries. These give an expectation of 135 fours; the minimum was 120 and the maximum was 149. The variance was a mere 5.8, versus 135 (binomial) or 180 (random). The binomial distro gives less than 120 an 8.9% chance (and more than 149 a 10.7% chance). As added above, the actual random distro is even worse than binomial.
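The procedure described above can be sketched on a toy instance. The sizes below (12 numbers, 99 tickets, each four covered 3 times on average) are assumptions chosen so that every possible draw can be enumerated quickly; they are not the 50-number run reported here. The rule for choosing the starting 4-tuple (take the least-covered one) is also an assumption, since the description only requires one that has not yet reached the target count, and the post-processing step is omitted.

```python
import random
from itertools import combinations
from math import comb
from statistics import mean, pvariance

V, K, T, M = 12, 6, 4, 3              # toy sizes (assumption); the text uses 50, 6, 4, 9
N = M * comb(V, T) // comb(K, T)      # 99 tickets

def greedy_tickets():
    counts = {four: 0 for four in combinations(range(V), T)}
    tickets = []
    for _ in range(N):
        # Assumption: start from the least-covered four (always below M here).
        seed = min(counts, key=counts.get)
        rest = [x for x in range(V) if x not in seed]

        def load(pair):
            six = tuple(sorted(seed + pair))
            return sum(counts[four] for four in combinations(six, T))

        # Append the 2 numbers that minimise the summed counts of the 15 sub-fours.
        best = min(combinations(rest, K - T), key=load)
        six = tuple(sorted(seed + best))
        for four in combinations(six, T):
            counts[four] += 1
        tickets.append(six)
    return tickets

def fours_per_draw(tickets):
    """Matched fours for every possible draw (a 5-match counts 5, a 6-match 15)."""
    sets = [frozenset(t) for t in tickets]
    return [sum(comb(len(s & draw), T) for s in sets)
            for draw in map(frozenset, combinations(range(V), K))]

rng = random.Random(0)
random_tickets = [tuple(sorted(rng.sample(range(V), K))) for _ in range(N)]

x_greedy = fours_per_draw(greedy_tickets())
x_random = fours_per_draw(random_tickets)
print("greedy:", mean(x_greedy), pvariance(x_greedy))
print("random:", mean(x_random), pvariance(x_random))
```

Both ticket sets have the same mean number of fours per draw, since that depends only on the number of tickets; the comparison of interest is between the two printed variances.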

I think this shows that with a (small) bit of work, some quite good variance reduction is possible. You can try to further trim the ends if desired. In the actual example, their edge was about 20-25% when free bets were included (later comments suggest 15-20% over the history). The accounting on page 7 says "12.8%" for just the cash component, but I get 425840/400000 = 1.0646, or only about 6.5%. This analysis also lacks the jackpot, which I guess is equally likely to help/hurt among the big players (it is slightly chancy that only 1 of the 45 jackpots was hit, given there are 2-4 groups each buying up to 1/30 of the pool every time).

I think you mean standard deviation around sqrt(135)? But the point stands -- even picking things randomly gives you a level of risk which, compared with most investments with mildly positive expected value, is pretty good. I still wonder about how well one can do with a covering, though!
– JSE, Oct 29 '12 at 8:19

This is awesome! But I too am confused about why the random model is not giving you something that looks like binomial.
– JSE, Oct 30 '12 at 6:06

Part of it may be the 5s and 6s effect. In that random sample, the max is 222, and in it there are 9 fives and 1 six, giving an "extra" four count of 36+14 above independence. The import of duplicates in the 138180 is similar, as a twofer guarantees an extra 28 non-independent for some lottery occurrence. Triplicates are likely too (80% chance).
– Junkie, Oct 30 '12 at 8:55