It’s Sunday morning and I get some time to myself, so I’m listening to some blues and catching up on my newsfeeds when I come across this interesting article on calculating what size group of people would be necessary to have a 50/50 chance of two of them sharing the same birthday.

The primary focus of the article is this equation for the probability of uniqueness: the chance that a sample of size r, drawn from N possible values, contains no repeats.

$$p = \frac {N!} {N^r (N-r)!}$$

In the article, the author is concerned about overflow given the size of the factorials involved, and since scipy doesn’t have a log factorial he implemented his solution with log gamma (see the above article for his code). So I say to myself, “Wonder if I can do this with just straight Python?”

A quick google on log factorial found this approximation of log factorial by Srinivasa Ramanujan on math.stackexchange. If you have not heard of Ramanujan before, stop and google him immediately. Wow!
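For the curious, here is a minimal sketch of Ramanujan's approximation, $\log n! \approx n \log n - n + \frac{1}{6}\log\big(n(1 + 4n(1 + 2n))\big) + \frac{1}{2}\log \pi$, in plain Python. The function name `ramanujan_log_factorial` is my own, and I compare against `math.lgamma` from the standard library only as a sanity check.

```python
import math


def ramanujan_log_factorial(n):
    """Approximate log(n!) with Ramanujan's formula (hypothetical helper name)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0.0  # log(0!) = log(1) = 0; the formula itself needs n > 0
    return (
        n * math.log(n)
        - n
        + math.log(n * (1 + 4 * n * (1 + 2 * n))) / 6
        + math.log(math.pi) / 2
    )


# Compare with the standard library's log gamma: log(n!) == lgamma(n + 1)
for n in (10, 100, 365):
    print(n, ramanujan_log_factorial(n), math.lgamma(n + 1))
```

The two columns should agree to many decimal places, and the agreement gets better as n grows.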

Ok, so no scipy needed to follow along with this article, plus I get to use some very cool math. I like Sunday morning fun time. However, I then start thinking, “overflow”, hmmmm. What is the upper bound of math.factorial anyway? Since it’s the birthday problem, let’s see what happens with 365.
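A minimal sketch of that kind of experiment, assuming nothing beyond the standard library. Rather than printing the whole number, this version just counts its digits; the variable names are mine.

```python
import math

# Python integers are arbitrary precision, so this is computed exactly.
big = math.factorial(365)

# Count the decimal digits instead of flooding the terminal with them.
digits = len(str(big))
print(digits)
```

Swap in larger arguments to see how far it will go before you run out of patience.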

Well, it took a bit but it ran, with over 40 screens of numbers. Python is still going strong, so what is the upper bound of math.factorial? A search brought me to http://bugs.python.org/issue8692, and specifically to this message, which means that the size of the result cannot exceed sys.maxsize - 1 digits, which on a 64-bit platform is on the order of 2**63 digits. Thanks to some dedicated individuals who seemed to be having as much fun as I was, math.factorial is up to the task.

The take-away from this article is not the cool math, or the approximations of log n! — it is…

Don’t underestimate the power of Python.

Try straight Python before you move on to something more complex. The approximations of $\log n!$ were unnecessary. All that was needed was just Python and we get the following implementation of the Probability of Uniqueness:
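As a sketch of that idea, here is one way the formula $p = \frac{N!}{N^r (N-r)!}$ could be written with nothing but `math.factorial`. The function name `prob_unique` and the handling of the edge cases are my own assumptions.

```python
import math


def prob_unique(N, r):
    """Probability that r samples drawn from N equally likely values are all distinct."""
    if N <= 0 or r < 0:
        raise ValueError("N must be positive and r non-negative")
    if r > N:
        return 0.0  # pigeonhole: more samples than values forces a repeat
    # N! / (N - r)! is an exact integer, so do that part in integer arithmetic.
    numerator = math.factorial(N) // math.factorial(N - r)
    # Only the final step becomes a float; Python divides big ints correctly.
    return numerator / N**r


# The classic birthday problem: 365 days, groups of 22 and 23 people.
for r in (22, 23):
    print(r, prob_unique(365, r))
```

With 23 people the probability of all birthdays being unique comes out a bit under one half, which is the familiar 50/50 threshold.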