My apologies if this question is more appropriate for mathisfun.com, but I can only get so far reading about combinatorics and set theory before the interlocking logic becomes totally blurred. If this is a totally fundamental concept, feel free just to name it so I can read and understand the math myself.

So the goal is to minimize repetition of questions on a quiz to avoid (or really to slow down) the creation of a master key. This is for a client and I've explained that to make this truly realistic the number of questions in the master pool would need to be huge, but I want to show them the math behind their idea.

So they suggested having a 20-question pool, with a given quiz being a 5-member subset. I figured out that the total number of unique quizzes, $\binom{20}{5}$, would be $\frac{20!}{5!(20-5)!}$, or 15504 unique quizzes. But I know that most of those quizzes will be nearly identical and that it won't take as long for cheaters to see all 20 questions to make the key. To prove this to myself (without knowing the math), I simplified the total combinations to $\binom{4}{3}$, like so:

{a, b, c, d} = { {a,b,c}; {a,b,d}; {b,c,d}; {a,c,d} }

And I see that it only takes seeing any 2 quizzes to see all 4 members of the master set. So, knowing that the number of combinations (binomial coefficient!) is not equivalent to the number of unique appearances of the master-set members, I'd like to know the actual math involved to show the client that while they have a ton of quizzes, it only takes $x$ to know all members.
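The $\binom{4}{3}$ observation above can be checked mechanically. The following is only a minimal sketch; the names `quizzes`, `all_quizzes`, and `pairs_cover` are illustrative and not part of the original question.

```python
from itertools import combinations

def quizzes(pool, size):
    """Return every distinct quiz (subset of `size` questions) from the pool."""
    return [frozenset(c) for c in combinations(pool, size)]

pool = "abcd"
all_quizzes = quizzes(pool, 3)

# Check that every pair of distinct quizzes together shows the whole pool.
pairs_cover = all(
    q1 | q2 == frozenset(pool)
    for q1, q2 in combinations(all_quizzes, 2)
)

print(len(all_quizzes), pairs_cover)
```

Each 3-question quiz omits exactly one of the four questions, and two distinct quizzes omit different ones, which is why any pair is enough in this tiny case.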

Thanks as always.

Addendum

A bit more research has introduced me to the NP-complete problem known as Exact Cover, which would be (if I'm reading it right) a precise set of subsets which have a union equal to the original master-set. I just want to clarify that this constraint of perfect overlap is not necessary for my question, only the minimum number of subsets that would result in a union that has all master-set members, regardless of repetition, in order to demonstrate how many subsets are needed to know the original set (with the assumption that the seeker of the master-set knows the total membership count). I tweaked my micro-experiment from $\binom{4}{3}$ to $\binom{4}{2}$ resulting in 6 combinations and the ability to derive the master-set no longer being possible with a specific number of arbitrary subsets. Instead I get:

{a, b, c, d} = { {ab} ; {ac} ; {ad} ; {bc} ; {bd} ; {cd} }

which could derive the master set using the first three groups (those containing $a$), or the exact cover $\{\{a,b\}; \{c,d\}\}$. This has me thinking that the minimum number of subsets needed to derive the original set is equal to the number of subsets in which any given member occurs (so in this case 3 $a$s), but this doesn't match up with $\binom{4}{3}$, where it can be found with 2 subsets. The next obvious solution (to me) is that the minimum number needed to derive the master-set (blindly) is half of the total number of subsets, but I would really want a link to a proof or a plain-English demonstration of how a pool of 20 questions would require 7752 subsets to know with certainty that all 20 members have appeared at least once.
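For a case as small as $\binom{4}{2}$, the "how many arbitrary subsets guarantee the whole master-set" question can be brute-forced. This is a rough sketch, not a general method (it enumerates every family of subsets, so it only works for tiny pools); `always_covers` and `results` are names made up for the illustration.

```python
from itertools import combinations

pool = frozenset("abcd")
subsets = [frozenset(c) for c in combinations(sorted(pool), 2)]

def always_covers(m):
    """True if EVERY choice of m distinct subsets has union equal to the pool."""
    return all(
        frozenset().union(*family) == pool
        for family in combinations(subsets, m)
    )

# For each family size m, record whether coverage is guaranteed.
results = {m: always_covers(m) for m in range(1, len(subsets) + 1)}
print(results)
```

On this small case the check indicates that three arbitrary distinct subsets are not always enough (for example, ab, ac, bc never show $d$), while any four are, so the "half of the total" guess does not appear to hold even here.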

Again, thanks.

Question as Probability:

I have a bag of Scrabble tiles and I know the following:

The bag contains 20 tiles,

Each tile is unique (no two tiles have the same character),

The tiles come from a much larger (and otherwise irrelevant) expansion set including numbers and non-Roman alphabet characters, thus removing any advantage of knowing that this set of 20 comes from a larger-but-limited set (in other words, the characters are only informative to each other, and I may get all Klingon or a mix of Chinese and Tamil). I should not assume anything about the set other than what is in the bag.

I am allowed to perform the following steps in the order given as many times as I want:

Pull out 5 tiles,

Write down the characters drawn,

Return the tiles to the bag.

Lather, Rinse, Repeat.

Also: I have magical fingers that prevent me from drawing the same set of 5 twice, thus reducing the number of draws from infinity to 15504 possible draws.

My objective is to have all 20 characters written down eventually and then stop drawing characters.

I know that the total number of unique combinations I could draw is $\binom{20}{5}$ which is 15504. I also know that the minimum draws required is equal to $\lceil{20}/{5}\rceil$, which would be very lucky. What I am interested in is the maximum number of draws required to reveal all 20 characters.

I do not think that this is the right question for the actual real-life problem: Avoidance of repetition for a particular person will actually help people to create a complete question list quickly.
– Phira, Jan 5 '14 at 23:07

2 Answers
With 20 questions in total and 5 per quiz, and the sole goal of repeating as late as possible (as I understand your question), you will start repeating at the fifth quiz. If you number the questions arbitrarily, you have $1$ to $5$ in quiz $1$ and $16$ to $20$ in quiz $4$ (if your sole goal is to postpone repetition as long as possible). By the same goal, quiz $5$ will repeat $1$ to $5$, and so on. This is probably not what you'd implement, since after a while one could predict exactly which questions will appear on an upcoming quiz; but, as I understand your question, it is what you'd do. It is not really a binomial coefficient question (I did not understand your separation of the pool leading to $\binom{4}{3}$). To meaningfully use something else, you need to impose further conditions.

I see that you're saying (I think) -- that the subject could derive the master-set with minimum 4 subsets if they draw the 4 non-intersecting subsets, which in my problem would be incredibly fortunate. That would be the true minimum needed, just as having all 15504 would be the true maximum, as having all subsets would eliminate any doubt. What I'm hoping exists is a formula to determine the minimum number of subsets the culprit needs to blindly obtain to ensure all master-set members will be present, knowing only the pool size and sample size.
– Anthony, Feb 3 '13 at 6:56

I see. So you draw 5 questions at random each time? Then you can only calculate probabilities (assuming, say, 5 per draw and equal likelihood of drawing any question in each draw), and you could never guarantee that all have been drawn (it just becomes increasingly unlikely that any are missing). Is that the setup?
– gnometorule, Feb 3 '13 at 12:47

...but I also increasingly get the feeling you are asking an interesting question I simply don't get. :)
– gnometorule, Feb 3 '13 at 12:55

I think you are looking at something not a million miles removed from "the coupon collector's problem" and you will find much literature if you search for that keyphrase.
– Gerry Myerson, Feb 7 '13 at 11:09
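To make the coupon-collector view concrete, here is a hedged sketch of the probability that all questions have appeared after a given number of quizzes. It assumes each quiz is drawn independently and uniformly from all $\binom{n}{k}$ subsets, with repeated quizzes allowed (so it ignores the "magical fingers" rule in the question), and it uses inclusion-exclusion over the set of questions that are never shown. The function name `prob_all_seen` is invented for this example.

```python
from fractions import Fraction
from math import comb

def prob_all_seen(n, k, draws):
    """Probability that `draws` independent uniform k-subsets of an n-pool
    together contain every one of the n questions (repeats allowed)."""
    total = comb(n, k)
    p = Fraction(0)
    for j in range(n + 1):
        # Chance that one quiz avoids j chosen questions: C(n-j, k) / C(n, k).
        avoid = Fraction(comb(n - j, k), total)
        p += (-1) ** j * comb(n, j) * avoid ** draws
    return float(p)

# Probability of having seen all 20 questions after various numbers of quizzes.
for draws in (4, 10, 20, 40):
    print(draws, prob_all_seen(20, 5, draws))
```

Under these assumptions the probability approaches 1 as the number of quizzes grows but never reaches it, which matches the remark that random drawing can never guarantee full coverage.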

You seem to be asking about the maximal number of distinct combinations of $5$ elements chosen among $20$ such that the union of all those combinations does not contain all $20$ elements. (Then, on selecting one more distinct $5$-combination, one is sure to cover all $20$ elements.)

It seems clear the best strategy to avoid covering all $20$ elements is to (secretly) choose one element among the $20$ that you won't ever select, until forced by the requirement of never reproducing a previous selection. This leaves you $19$ elements, of which you can present all $\binom{19}{5}=11628$ combinations in a random order. After that, your $11629$th combination is forced to use the final element that you so preciously wanted to keep secret.
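The counting in this argument can be reproduced with a short sketch. The helper `draws_to_guarantee` is a hypothetical name for the $\binom{n-1}{k}+1$ count described above, and the loop simply plays out the "hide one question" strategy for the 20-question pool.

```python
from itertools import combinations
from math import comb

def draws_to_guarantee(n, k):
    """Distinct k-subsets of an n-pool needed before coverage is forced:
    all C(n-1, k) subsets avoiding one hidden element, plus one more."""
    return comb(n - 1, k) + 1

# Play the adversarial strategy: never show question 19 (an arbitrary choice).
hidden = 19
visible = [q for q in range(20) if q != hidden]

seen = set()
count = 0
for quiz in combinations(visible, 5):
    seen.update(quiz)
    count += 1

print(count, len(seen), draws_to_guarantee(20, 5))
```

The same helper gives 2 for the $\binom{4}{3}$ example and 4 for the $\binom{4}{2}$ example from the question.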