March 2009

—————————

March's puzzle was a wee exercise in what my schoolmasters called "perms and coms" — that
is, permutations and
combinations. A key piece of arithmetic here is the binomial
coefficient or
"choice number": the number of ways of picking k objects from n, when you don't care about
the order and don't allow
replacement. The number of ways of picking two objects from four, for example (call the four A, B, C, and D), is given
by 4×3 divided by
2×1, which is of course 6. (Here they are: AB, AC, AD, BC, BD, CD.) It always goes like that. The number
of ways of picking three
objects from 20 would be 20×19×18 divided by 3×2×1, which is 1,140. Yes, the division always
comes out to a whole
number — math is neat like that!

The number of ways of picking k objects from n on these conditions is usually written
nCk. So
20C3 = 1,140. If you have
Microsoft Excel, you can do combin(20,3) to get the same result.

OK, here is the March problem. An emailer asked:

There is a set of 8 items and I am making 10 random selections with replacement. I have not been able
to determine a formula to
assess the probability of selecting a given number of items (say 5) without selecting the rest.

As so often happens, I outsmarted myself here, sending my correspondent a glib answer which was completely
wrong. (Sorry, guy.) I won't
feel too bad about this, since the handful of readers who emailed in with same-day solutions all got it wrong,
too, and there were two
credentialed mathematicians among their number! Tim Chow finally produced the right formula. Let's take a look.

—————————

Solution

Let's name the eight objects: A, B, C, D, E, F, G, and H. I'm going to call this my "alphabet."
Now I define a
"word" to be a string of letters made up by randomly selecting ten of these objects, allowing replacement
after each selection. So an
example of a word would be CGAAFDACDD. That particular word "hit" only five of the letters in my
8-letter alphabet: A. C, D,
F, and G.

How many possible 10-letter words can I make from this 8-letter alphabet? Of that total, how many will, like
the word I just showed,
"hit" precisely five letters?

The first question is easily answered. Since each of the ten positions in a pick can be filled in eight ways,
the total number of
possiblities is 8×8×8×8×8×8×8×8×8×8, known to us mathematical
sophisticates as 810.
This works out to 1,073,741,824.

To answer the second question we have to delve into the mathematics of partitions. Any 10-letter word
that uses only five letters
will divide the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} into exactly five disjoint subsets. For the word CGAAFDACDD that I
quoted above, for example,
the five subsets are {1, 8}, {2}, {3, 4, 7}, {5}, and {6, 9, 10}, corresponding to, respectively, the Cs, the G, the
As, the F, and the Ds.

Now, the number of ways of dividing {1, 2, 3, …, n} into k disjoint subsets is
given by the
Stirling number of the second kind
S(n, k). For our particular problem we want S(10,5), which is 42,525.

Since I can choose the 5 letters from my alphabet of 8 in 8C5 ways, I need to
multiply the possibilities by
that, which is 56. Also, I can assign the 5 letters to partitions in 5! (that is, factorial 5) ways: associating the
first partition with any of C,
G, A, F, or D, then the second with one of the four remaining, and so on.

Bottom line: The number of 10-letter words that use only 5 of the letters in my 8-letter alphabet is

8C5 × 5! × S(10,5)

Where S(n, k) is the Stirling number of the second kind. This is
56 × 120 × 42525,
which is 285,768,000

You can make this work for any number. If you want the probability that any random 10-letter word from
your 8-letter alphabet will
use only five letters, you need to divide by 810, the total number of 10-letter words in this alphabet. The
answer's about
0.26614219.

Just to cross-check, I did a simulation, running a full 1,073,741,824 picks with a random number generator to
see what I got. Here's what
I got.

Objects hit

Picks

Proportionof all picks

1

7

0.00000000652

2

27,150

0.0000253

3

3,133,270

0.00292

4

57,321,154

0.0534

5

285,855,613

0.266

6

459,831,099

0.428

7

237,313,892

0.221

8

30,259,639

0.0281

If you have Microsoft Excel and want to see the thing done without using Stirling numbers (though they are
present implicitly
via the recursion algorithm), Joe Shipman contributed this neat spreadsheet. If you click on
a cell you'll see the
corresponding recursion formula.