Understanding the Birthday Paradox

23 people. In a room of just 23 people there’s a 50-50 chance of two people having the same birthday. In a room of 75 there’s a 99.9% chance of two people matching.

Put down the calculator and pitchfork, I don’t speak heresy. The birthday paradox is strange, counter-intuitive, and completely true. It’s only a “paradox” because our brains can’t handle the compounding power of exponents. We expect probabilities to be linear and only consider the scenarios we’re involved in (both faulty assumptions, by the way).

Problem 1: Exponents aren’t intuitive

Here’s an example: What’s the chance of getting 10 heads in a row when flipping coins? The untrained brain might think like this:

“Well, getting one head is a 50% chance. Getting two heads is twice as hard, so a 25% chance. Getting ten heads is probably 10 times harder… so about 50%/10 or a 5% chance.”

And there we sit, smug as a bug on a rug. No dice bub.

After pounding your head with statistics, you know not to divide, but use exponents. The chance of 10 heads is not .5/10 but .5^10, or about .001.
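If you want to check the coin arithmetic yourself, here is a minimal Python sketch. The variable names are just for illustration, and it assumes a fair coin with independent flips.

```python
# Chance of 10 heads in a row with a fair coin (independent flips assumed).
p_heads = 0.5
flips = 10

linear_guess = p_heads / flips   # the tempting "divide by 10" intuition
actual = p_heads ** flips        # independent events multiply, so use an exponent

print(f"linear guess: {linear_guess:.4f}")
print(f"actual:       {actual:.6f}")
print(f"the guess is off by a factor of about {linear_guess / actual:.0f}")
```

The linear guess is not just a little off; it overshoots by a factor of roughly fifty.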

But even after training, we get caught again. At 5% interest we’ll double our money in 14 years, rather than the “expected” 20. Did you naturally infer the Rule of 72 when learning about interest rates? Probably not. Understanding compound exponential growth with our linear brains is hard.
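The interest example can be checked the same way. This is a rough sketch that assumes interest compounds once a year; the function names are my own.

```python
import math

def doubling_time(rate):
    """Years for money to double at an annual compound rate (0.05 means 5%)."""
    return math.log(2) / math.log(1 + rate)

def rule_of_72(rate):
    """The Rule of 72 shortcut: divide 72 by the rate in percent."""
    return 72 / (rate * 100)

def linear_guess(rate):
    """The 'expected' answer if growth were linear: 100% / rate."""
    return 1 / rate

rate = 0.05
print(f"compound doubling time: {doubling_time(rate):.1f} years")
print(f"Rule of 72 estimate:    {rule_of_72(rate):.1f} years")
print(f"linear guess:           {linear_guess(rate):.1f} years")
```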

Problem 2: Humans are a tad bit selfish

Take a look at the news. Notice how much of the negative news is the result of acting without considering others. I’m an optimist and do have hope for mankind, but that’s a separate discussion :).

In a room of 23, do you think of the 22 comparisons where your birthday is being compared against someone else’s? Probably.

Do you think of the 231 comparisons where someone who is not you is being checked against someone else who is not you? Do you realize there are so many? Probably not.

The fact that we neglect the 10 times as many comparisons that don’t include us helps us see why the “paradox” can happen.
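Counting the comparisons takes only a few lines. Here is a small Python sketch using the standard library's math.comb (variable names are illustrative):

```python
from math import comb

people = 23

pairs_with_you = people - 1              # you compared against everyone else
pairs_total = comb(people, 2)            # every possible pair in the room
pairs_without_you = comb(people - 1, 2)  # pairs among the other 22 people

print(pairs_with_you, pairs_without_you, pairs_total)
print(f"ratio: {pairs_without_you / pairs_with_you:.1f}x")
```

The comparisons that leave you out outnumber the ones you are part of by about ten to one.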

Ok, fine, humans are awful: Show me the math!

The question: What are the chances that two people share a birthday in a group of 23?

Sure, we could list the pairs and count all the ways they could match. But that’s hard: there could be 1, 2, 3 or even 23 matches!

It’s like asking “What’s the chance of getting one or more heads in 23 coin flips?” There are so many possibilities: heads on the first throw, or the 3rd, or the last, or the 1st and 3rd, the 2nd and 21st, and so on.

How do we solve the coin problem? Flip it around (Get it? Get it?). Rather than counting every way to get heads, find the chance of getting all tails, our “problem scenario”.

If there’s a 1% chance of getting all tails (more like .5^23 but work with me here), there’s a 99% chance of having at least one head. I don’t know if it’s 1 head, or 2, or 15 or 23: we got heads, and that’s what matters. If we subtract the chance of a problem scenario from 1 we are left with the probability of a good scenario.
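To see that the flip-it-around trick gives the same answer as counting everything, here is a small Python sketch. It brute-forces a short run of flips (listing all outcomes for 23 flips would mean over 8 million sequences) and compares the count with the shortcut. The function names are made up for this example.

```python
from itertools import product

def at_least_one_head_by_counting(flips):
    """Enumerate every heads/tails sequence and count those containing a head."""
    outcomes = product("HT", repeat=flips)
    hits = sum(1 for outcome in outcomes if "H" in outcome)
    return hits / 2 ** flips

def at_least_one_head_by_complement(flips):
    """Shortcut: 1 minus the chance of the single all-tails sequence."""
    return 1 - 0.5 ** flips

print(at_least_one_head_by_counting(10))
print(at_least_one_head_by_complement(10))
print(at_least_one_head_by_complement(23))
```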

The same principle applies for birthdays. Instead of finding all the ways we match, find the chance that everyone is different, the “problem scenario”. We then take the opposite probability and get the chance of a match. It may be 1 match, or 2, or 20, but somebody matched, which is what we need to find.

Explanation: Counting Pairs

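With 23 people there are C(23,2) = (23 * 22)/2 = 253 pairs to compare. The chance that any one pair has different birthdays is 1 – 1/365 = 364/365, or about .997260.

Makes sense, right? When comparing one person's birthday to another, in 364 out of 365 scenarios they won't match. Fine.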

But making 253 comparisons and having them all be different is like getting heads 253 times in a row -- you had to dodge "tails" each time (let’s assume birthdays are independent). We use exponents to find the probability:

p(all 253 pairs different) = (364/365)^253 = .4995

Our chance of getting a single miss is pretty high (99.7260%), but when you take that chance hundreds of times, the odds of keeping up that streak drop. Fast.

The chance we find a match is: 1 – 49.95% = 50.05%, or just over half! If you want to find the probability of a match for any number of people n the formula is:

p(n) = 1 – (364/365)^C(n,2) = 1 – (364/365)^(n(n-1)/2)
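The pair-counting estimate is easy to play with in code. A minimal Python sketch follows. The function name is my own, and it keeps the simplifying assumption that every pair is an independent comparison.

```python
from math import comb

def p_match_pairs(n, days=365):
    """Pair-counting estimate of a shared birthday among n people.

    Treats each of the C(n, 2) pairs as an independent comparison,
    which is an approximation rather than the exact probability.
    """
    return 1 - ((days - 1) / days) ** comb(n, 2)

for n in (10, 22, 23, 50, 75):
    print(f"{n:3d} people: {p_match_pairs(n):.4f}")
```

With this estimate, 23 is the first group size that tips past 50%.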

Interactive Example

I didn’t believe we needed only 23 people. The math works out, but is it real?

You bet. Try the example below: Pick a number of items (365), a number of people (23) and run a few trials. You’ll see the theoretical match and your actual match as you run your trials. Go ahead, click the button (or see the full page).

As you run more and more trials (keep clicking!) the actual probability should approach the theoretical one.
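An experiment of this kind can also be sketched in a few lines of Python. This is not the code behind the interactive page, just a stand-in. It assumes every birthday is equally likely, and the function names and the seed parameter are my own.

```python
import random

def has_match(people, items, rng):
    """One trial: each person picks a random item; report whether any two agree."""
    picks = [rng.randrange(items) for _ in range(people)]
    return len(set(picks)) < people

def simulate(people=23, items=365, trials=10_000, seed=None):
    """Fraction of trials in which at least two people picked the same item."""
    rng = random.Random(seed)
    hits = sum(has_match(people, items, rng) for _ in range(trials))
    return hits / trials

print(f"observed match rate: {simulate():.3f}")
```

Raise trials and the observed rate should settle near one half for 23 people and 365 items.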

Examples and Takeaways

Here are a few lessons from the birthday paradox:

sqrt(n) is roughly the number you need to have a 50% chance of a match with n items. sqrt(365) is about 20. This comes into play in cryptography for the birthday attack.

Even though there are 2^128 (1e38) GUIDs, we only have 2^64 (1e19) to use up before a 50% chance of collision. And 50% is really, really high.

You only need 13 people picking letters of the alphabet to have 95% chance of a match. Try it above (people = 13, items = 26).

Exponential growth rapidly decreases the chance of picking unique items (aka it increases the chances of a match). Remember: exponents are non-intuitive and humans are selfish!
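The numbers in these takeaways can be reproduced with a short Python sketch. It treats GUIDs as plain 128-bit random numbers and uses the pair-counting estimate for the alphabet example; both are simplifying assumptions, and the function names are illustrative.

```python
import math
from math import comb

def rough_halfway_point(items):
    """Rule of thumb: about sqrt(items) picks for a roughly 50% chance of a match."""
    return math.sqrt(items)

def p_match_pairs(people, items):
    """Pair-counting estimate (treats every pair as an independent comparison)."""
    return 1 - ((items - 1) / items) ** comb(people, 2)

birthdays = rough_halfway_point(365)   # a bit over 19
guids = math.isqrt(2 ** 128)           # integer square root, exactly 2**64
letters = p_match_pairs(13, 26)        # 13 people picking from 26 letters

print(f"birthdays: about {birthdays:.1f} people")
print(f"GUIDs:     about {guids:.1e} values")
print(f"letters:   {letters:.1%} chance of a match")
```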

After thinking about it a lot, the birthday paradox finally clicks with me. But I still check out the interactive example just to make sure.

Appendix A: Repeated Multiplication Explanation (Geeky Math Alert!)

Remember how we assumed birthdays are independent? Well, they aren’t.

If Person 1 and Person 3 match, and Person 3 and 5 match, we know that 1 and 5 match also. The outcome of 1 and 5 depends on their results with 3, which means the results aren’t an independent 1/365 chance (in our case, it’s a 100% chance of a match).

When counting pairs we did math as if birthdays were like independent coin flips, and multiplied probabilities. This assumption isn’t strictly true but it’s “good enough” for a small number of people (23) compared to the sample size (365). It’s unlikely to have multiple people match and screw up the independence, so it’s a good approximation.

It’s unlikely, but it can happen. Let’s figure out the real chances of each person picking a different number:

The first person has a 100% chance of a unique number (of course)

The second has a (1 – 1/365) chance (all but 1 number from the 365)

The third has a (1 – 2/365) chance (all but 2 numbers)

The 23rd has a (1 – 22/365) (all but 22 numbers)

The multiplication looks pretty ugly:

p(different) = 1 * (1 – 1/365) * (1 – 2/365) * ... * (1 – 22/365)
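Ugly for a human, but painless for a computer. Here is a minimal Python sketch of the repeated multiplication, assuming equally likely birthdays (the function name is my own):

```python
def p_all_different(people, items=365):
    """Exact chance that everyone picks a different item.

    Person k (counting from 0) must avoid the k items already taken.
    """
    p = 1.0
    for k in range(people):
        p *= (items - k) / items
    return p

p_diff = p_all_different(23)
print(f"p(different) = {p_diff:.4f}")
print(f"p(match)     = {1 - p_diff:.4f}")
print(f"366 people:    {p_all_different(366)}")
```

Unlike the pair-counting estimate, this version drops to exactly zero once there are more people than days.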

But there’s a shortcut we can take. When x is close to 0, a coarse first-order Taylor approximation for e^x is:

e^x ≈ 1 + x

so

1 – 1/365 ≈ e^(-1/365)

Using our handy shortcut we can rewrite the big equation to:

p(different) ≈ 1 * e^(-1/365) * e^(-2/365) * ... * e^(-22/365) = e^(-(1 + 2 + ... + 22)/365)

But we remember that adding the numbers 1 to n = n(n + 1)/2. Don’t confuse this with n(n-1)/2, which is C(n,2) or the number of pairs of n items. They look almost the same!

Adding 1 to 22 is (22 * 23)/2 so we get:

p(different) ≈ e^(-((22 * 23)/2)/365) = e^(-253/365) = .499998

Phew. This approximation is very close to the 49.95% we got from counting pairs.

Good enough for government work, as they say. If you simplify the formula a bit and swap in n for 23 you get:

p(different) ≈ e^(-n^2/(2 * 365))

and

p(match) = 1 – p(different) ≈ 1 – e^(-n^2/(2 * 365))
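To get a feel for how much each shortcut costs, here is a Python sketch comparing the exact product with the two exponential forms. The function names are my own, and T is the number of possible birthdays.

```python
import math

def p_different_exact(n, T=365):
    """Exact repeated multiplication: (1 - 0/T) * (1 - 1/T) * ... * (1 - (n-1)/T)."""
    p = 1.0
    for k in range(n):
        p *= 1 - k / T
    return p

def p_different_exp_sum(n, T=365):
    """Exponential shortcut with the sum 1 + 2 + ... + (n-1) = n(n-1)/2."""
    return math.exp(-(n * (n - 1) / 2) / T)

def p_different_exp_simple(n, T=365):
    """Further simplified form, replacing n(n-1) with n^2."""
    return math.exp(-n * n / (2 * T))

n = 23
print(f"exact product:   {p_different_exact(n):.6f}")
print(f"e^(-n(n-1)/2T):  {p_different_exp_sum(n):.6f}")
print(f"e^(-n^2/2T):     {p_different_exp_simple(n):.6f}")
```

For 23 people all three land within a percentage point or so of each other, which is why the simple form is handy for quick estimates.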

Appendix B: The General Birthday Formula

Let’s generalize the formula to picking n people from T total items (instead of 365):

p(different) ≈ e^(-n^2/(2T))

If we choose a probability (like 50% chance of a match) and solve for n:

e^(-n^2/(2T)) = .5
-n^2/(2T) = ln(.5)
n = sqrt(-2 ln(.5)) * sqrt(T) ≈ 1.177 * sqrt(T)

Voila! If you take sqrt(T) items (17% more if you want to be picky) then you have about a 50-50 chance of getting a match. If you plug in other numbers you can solve for other probabilities:

n ≈ sqrt(-2 ln(1 – m)) * sqrt(T)

Remember that m is the desired chance of a match (it’s easy to get confused, I did it myself). If you want a 90% chance of matching birthdays, plug m=90% and T=365 into the equation and see that you need 41 people.
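The general formula is a one-liner in code. Here is a Python sketch with m as the desired chance of a match and T as the number of items. The function name is made up, and the result inherits the e^(-n^2/(2T)) approximation, so treat it as an estimate.

```python
import math

def people_needed(m, T=365):
    """Approximate group size for a chance m of at least one match among T items.

    Solves 1 - e^(-n^2 / (2T)) = m for n; m must be between 0 and 1 (exclusive of 1).
    """
    return math.sqrt(-2 * math.log(1 - m)) * math.sqrt(T)

for m in (0.5, 0.9, 0.99):
    print(f"{m:.0%} chance of a match: about {people_needed(m):.1f} people")

# The "17% more than sqrt(T)" factor for a 50-50 chance:
print(f"factor at 50%: {math.sqrt(-2 * math.log(0.5)):.3f}")
```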

217 Comments on "Understanding the Birthday Paradox"

Herman Hiddema

The math here is actually wrong. The chances of individual pairs are not independent. Your math would work if you take each pair and have them name a random number between 1 and 365.

With this math, taking a group of 365 people still results in a non-zero chance that they all have different birthdays.

10 years 3 months ago

Aleksandar Prokopec

+1 exactly what I thought when I was reading this. And another +1 for the pigeonhole counterexample ;)

1 month 1 day ago

Aleksandar Prokopec

Thanks for the info, you’re right. I did some more digging (good paper here) and birthdays aren’t mutually independent.

If Person 1 = Person 3, and Person 3 = Person 5, there isn’t an independent event that Person 1 = Person 5. The probability of 1 matching 5 has already been determined by the other statements.

From what I was able to gather, this is only a problem if there are existing overlapping pairs. For a small n relative to the number of outcomes (365), it’s unlikely to have multiple matches that affect the probability, so assuming independence may be ok for computing approximations.

[…] Regarding your birthday, whether you are savvy with Hamming’s error correcting code or not, listen to Kalid Azad when he presents Understanding the Birthday Paradox posted at BetterExplained in which he explains the Birthday Paradox from statistics. […]

The last formula is incorrect, it should be:
n ~ sqt(-2 ln(1-p)) sqt(T)
^^^
or else you are finding the probability to miss.

Thanks for the tip! I fixed up the article to use p(different) and p(match), which is much more clear.

10 years 3 months ago

Pseudonym

The “take-away lesson” about GUIDs is wrong. GUIDs are (theoretically) guaranteed to be globally unique, because they include such things as the MAC address of your network card (something which is globally unique until some cheap NIC manufacturer starts recycling them) and the current time.

The catch is that because of the time factor, the current GUID algorithm won’t last forever. We will run out in a couple of centuries.

Hi, that’s a good point about MAC addresses. However, if you consider GUIDs as just a giant random number (for the purposes of the exercise), you are looking for how many “items” out of a pool of 2^128 you can distribute before having a 50% chance of collision.

For the birthday paradox, it’s about 23 items (of a pool of 365) before a 50% chance of collision. For GUIDs, it will be roughly 2^64 items before a 50% chance of collision.

Hi Allan, I’m not too familiar with the rules of Pick3, but I’ll take a shot.

The birthday paradox helps find the chance that any two random numbers will “collide” in a set.

In Pick3, you don’t really care if two guesses collide… you want the guess to collide with the winning number. In this case, two losing tickets that both guessed 123 (when the real answer was 999) isn’t helpful.

I am doin a science fair experiment on this i need help–and i need to know if the math is over my head??!!

@nt: Thanks for the tip, I updated the article to make that more clear.

hello kalid,
i read a few of your articles and think they are freaking awesome.

Hi Zhao, thanks for the comment! I’ll try to keep cranking out the posts :).

[…] After reading another math explanation on why that’s true, I know that I understand it now. Sure, I might not be able to repeat (or fully understand) the math equations which generate the percentage, but I can identify the bottom line of understanding — when written in POE (plain ol’ English): […]

9 years 8 months ago

demi

Heyy ;; i have no clue how to do this!

9 years 8 months ago

abc

I think that the math behind this birthday paradox is wrong..
The chance of two people having same birthdays is 1/365 = 0.0027397

therefore p(n)= 0.0027397 ^C(n,2)
if we take an example of 23 people
we get p(23)= 0.0027397 ^ 253 ~=0
so how is it possible??

Hi, you’re correct 1/365 is the chance of 2 people having the same birthday. However, (1/365)^253 would be the chance of 253 people having the *same* birthday! (Which, as you see, is pretty close to zero).

For this problem, it’s important not to mix up 1/365 (the chance of 1 collision) and 364/365 (the chance of no collision). We first find the chance that somehow, everyone manages to be different:

p(23 people have different birthdays) = (364/365)^253

If there is a 40% chance that everyone is different, there is 1-40% = 60% chance that there was an overlap somewhere. Hope this helps. (Technically, we are assuming independent events but that subtlety is not important for the main point).

9 years 7 months ago

abc

hi,
(364/365)^253 means that 253 people have different birthdays

when you check this for 366 people , there is a >=100% chance for the birthday paradox.
but when you use this formula we get the answer as 1 – 2.6 * 10^-80 which is less than 1

why is it so??

AND I have never seen two people having the same birthday in my group which has a greater strength than 23. This cannot be a coincidence!!!

I still doubt that there is a 50% chance of people having the same birthday

Hi, when you make the probability like (364/365)^253, you are assuming independent events. What this means is that each comparison is “fresh”, with no memory of the past. It would be like having 2 people pick the same number out of 365, and choosing a different number each time.

This approximation makes the math easier, and is ok for small values. If you want the actual %, take a look at Appendix A.

Yep, the paradox seems strange, doesn’t it? Take a look at this page and run some experiments on your own to see: