Most people have some familiarity with the following situation: you’re in the plane, already crammed into your seat and you need to go to the washroom. But, at least in the planes I’ve been, the washroom seems to be occupied all the time. So you go, stay in line, get in, and when you get out the line is still there – other people waiting to take your place.

What is the actual probability to have the washroom occupied during your flight? This should be relatively easy to answer. Suppose we have N people in a plane with Nw washrooms. We want to know P(all washrooms are occupied) and we will denote it with P:

We can safely multiply in the previous relation since the probability that a washroom is occupied is independent of other washrooms being occupied or not.

To calculate the probability that a washroom is occupied we need to know in average how many times a man uses the washroom and for how long.

For the second parameter I couldn’t find any … informative links so let’s just call this tw and give it a reasonable value of 3 minutes.

The probability that a man will be in the washroom is:

Of course you can try your own calculations using your preferred values for Np and tw. Also I had friends arguing that my whole argument is not valid since they know people who stay more or less than my assumed values. Of course I’m aiming here for an “average” probability, and if you want to know exactly the value for your particular flight/home/work place, you can experimentally find the above probability per person and plug it in the equations below.

If we have N people in the same place and one washroom, the probability that the washroom is occupied will be:

since P(nobody is in the washroom) equals P(person 1 is not in the washroom) and P(person 2 is not in the washroom) and … and P(person N is not in the washroom) which are independent events and thus can be multiplied together.

Going back to equation (1) we can write now:

And to finish with a mental image here is a plot of the P(all washrooms are occupied) for different values of N and Nw. Considering that a reasonable plane capacity is around or above N = 200 you can see for yourself that there’s a good reason the washroom is almost always occupied.

There can be no denying that chance is a part of justice. After all that is why all statues that represent it are blind. And (un)fortunately it will probably stay that way as long as humans are part of it. Also jury trial is probably as old as the concept of justice itself. What it means, in the current times, is that a group of people is chosen to decide questions of fact in a trial. What I’m interested in this post is to see how the bias of the general populace can influence jury decisions.

In the modern justice systems there is also this concept called presumption of innocence. It is a very nice concept in theory but as usual in practice sometimes it’s not easy to follow it. Sometimes people just assume that if you’re on trial you must be guilty. Sometimes the trial involves powerful emotions an it’s easier to follow them then try to reason logically. As I said the jury is made of people, and people have their biases.

Let’s suppose the following scenario: there are N people in a jury, and they have to decide if the defendant is guilty or not. But the trial was not extremely conclusive (which is normal, because if the verdict would be obvious this post would have no purpose). So either the lawyers were really good (or really bad) or the evidence was not strong enough for any side, but the jury members have to basically give a guilty/not guilty verdict by chance. Of course each member of the jury will use its own rationalization to justify the answer that (s)he gave but I’m aiming for a probability problem here, not a sociology one.

In most cases a jury needs unanimity to give a verdict. So let’s add some more constraints to our problem. Let’s suppose that if the majority of the jury will vote one way (guilty or not guilty) they will convince the rest to vote the same way. Also if the guilty/not guilty votes are equal the jury will vote again (randomly) until we have majority in which case we apply the first rule.

So we can have multiple results of the jury vote depending on the number of members in the jury. If N is even there are 3 possible resulting scenarios:

the defendant goes to jail if more than members of the jury vote that he’s guilty

the defendant goes free if less than members of the jury vote that he’s guilty

the jury will vote again if the votes are split equally. This scenario is here just for completeness since the jury will have to vote again

If N is odd we have only the first two. From now on I will consider N = 12 for the purpose of getting some actual calculations done.

All jury members are NOT biased

In this case each jury member will give the guilty/not guilty verdict with probability of 0.5. So the defendant has equal chances to end up in jail or free, which makes sense since the trial was inconclusive and there is no way to take a clear decision.

All jury members are identically biased

In this case each jury member will give the guilty/not guilty verdict with probability different from 0.5. An easy model for this is to imagine we have a big bowl with black and white balls, and the probability to extract a black ball (guilty) is equal to the bias of the jury. We then proceed and extract N times (with replacement) from that bowl.

The probabilities corresponding to the 3 possible scenarios described earlier are:

Only ONE jury member is biased

In this case each jury member will give the guilty/not guilty verdict with probability of 0.5 except one of them which is biased. An easy model for this is to imagine we have two big bowls with black and white balls, one with an equal number of black and white balls from which we extract (with replacement) N-1 times and one from which we extract once where the probability to extract a black ball (guilty) is equal to the bias of the single jury member.

We can organize the possible outcomes in the following table (just to make it easy to see what terms we need to add):

1-p(bias)

p(bias)

C(11, 0)0.511

free

free

C(11, 1)0.511

free

free

C(11, 2)0.511

free

free

C(11, 3)0.511

free

free

C(11, 4)0.511

free

free

C(11, 5)0.511

free

re-vote

C(11, 6)0.511

re-vote

guilty

C(11, 7)0.511

guilty

guilty

C(11, 8)0.511

guilty

guilty

C(11, 9)0.511

guilty

guilty

C(11, 10)0.511

guilty

guilty

C(11, 11)0.511

guilty

guilty

The terms in the first column are the probabilities for possible outcomes of the (N-1) judge members who have no bias. The first row shows how the biased single member can vote. To get the probabilities of each possible scenario just add all the products as described in the table.

A couple of days ago, one of my friends asked me to calculate some probability related to Texas hold’em. The probability he wanted to calculate was quite simple, but exactly this is the problem. The easier it looks to calculate, the more opportunities for mistake. This reverse can be also true, if it’s hard to calculate you probably won’t even try.

The problem is the following: given that the 2 cards you have in your hand are of the same suit, what is the probability that you will get 4 cards of the same suit after the flop.

Since people learn from their mistakes, but smart people learn from other people’s mistakes, lets see how to calculate this probability. The way it works is that you know 2 cards so you have 50 left. Now, it can be argued that the other players have each 2 cards too, thus it will be less than 50 cards in the game. But from your point of view it really doesn’t matter if the other players have cards or how many players are there. Imagine the following scenarios:

You have a pack of 52 cards. You give yourself 2 cards, then you draw some cards and just leave them on the table, then you draw 3 more and put them face up (the flop in Texas hold’em). Here it is obvious that you have 50 cards and you just draw 3 of them randomly

You have a pack of 52 cards. You give 2 cards to you and a number of other players and then draw 3 more and place them face up. There is no difference in the way the cards are drawn compared to the previous scenario.

Of course if you would knew the cards of the other players you could calculate more accurately the probability that we’re looking for. But based only on the knowledge of your 2 cards you can use the first scenario regardless of the number of players at the table. It’s the same as if you have a dice. If you just know it’s just a dice you can say the probability to get the number 1 is 1/6. But if you would know in detail the mass distribution of the material of the dice, the initial position of the dice, and the initial force and moment you could give a more accurate probability to get number 1. But this is already getting into the philosophy of probability so lets leave this for another post.

So we have 50 cards and we draw 3 of them. This means C(50, 3) is the total number of outcomes. But we already have 2 card of a particular suit and we want 2 more of the same suit. Since we are left with 11 cards of our suit of interest in the rest of the cards, and we want any 2 of them that means C(11, 2) favorable outcomes. And for the third card in the flop we can have anything else, so that means 50 – 2 = 48 outcomes. This gives us:

(Un)fortunately we had the right result, and it wasn’t what we got. The mistake was that for the third card we multiplied by the number of cards remaining in the deck. However there are cards of the same suit after we draw the 4 we were interested in. So what we calculated was the probability to get 4 or 5 cards of the same suit after the flop. The requirement was to calculate the probability to get only 4 cards of the same suit.

So the correct answer if that for the third card we can have any card of another suit. That means 39 outcomes giving the right result of:

Life is usually complicated. But sometimes you can simplify some specific problems to yes/no or good/bad answers. If you have a sequence of independent repetitions of an event, with two possible outcomes, and the probabilities of those outcomes does not vary between repetitions, we call that a Bernoulli trial.

If all the above conditions are fulfilled, we can denote p as the probability of success and q = (1 – p) the probability of failure for one such event. For a sequence of n events we have the probability to be successful in k of them:

While this may seem a bit boring, we can get use the above relation for various probability estimates, as long as we apply it correctly, and the events are Bernoulli trials.

Example 1

Let’s say you have to pass a test. There are 20 multiple choice questions, each one with 4 choices and one correct answer. What is the probability to pass if you do not study? That would be the probability to answer correctly to 12 questions from 20, with p being 0.25 (1 out of 4 possible choices):

A probability of 0.075% is really low, so you will probably fail without study. However this particular test also has other rules like: 3 questions are mandatory to answer correctly, or you fail regardless of the other answers. So let’s suppose you study somewhat so you’ll correctly answer those 3 mandatory questions. You probability to pass becomes:

which is 120 times larger that if you don’t study at all, but still low.

In the end the best idea to pass any test is to study for it, but in case you don’t feel like it at least you can estimate how much you should study based on how lucky you feel.

Example 2

Let’s suppose we’re playing a game where you can win 1 dollar with probability p = 0.5 and lose 1 dollar with q = 1 – p. The expected value of the game is:

so in the long run we would expect to neither gain, nor loose from repeatedly playing this game.

However let’s calculate the probability to win at least 4 dollars after 20 games. This is the probability to win 11 games (this means 11 vs. 9 so we gain 2 dollars) or the probability to win 12 games (this means 12 vs. 8 so 4 dollars). We can add the probabilities because the 2 events are mutually exclusive. According to what we discussed this means:

This looks like a good enough probability to win. For a run of 80 games we have:

So although the probability decreases with the number of runs it is still looks significant.

So how does this work? On one side we have the expected value of the game 0, so you expect to win 0 dollars on the long run but if we actually calculate the probability to win a couple of dollars in a series of repeated games we get big numbers. The answer is simple actually, there is a probability to lose at least 4 dollars and in this case it is equal with the probability to win, so you will probably get nothing in the end of a long series of games.

Always think of all the probabilities or you will miss important ones.

Conditional probability is the probability of an event A, given that some other event B occurred. There is no implicit causal or temporal relationship between the two events. In most of the practical examples, there is such a relationship, but this is not necessary for the mathematical relations described here.

Examplesource: A car is produced in 2 models: Model 1 and Model 2. Each model is produced in 2 designs: hatchback and wagon. Off all the cars produced, 30% are Model 1, and 20% are wagons. But 50% of Model 1 are wagons.

I know Venn diagrams are usually made with circles, but I drew it with rectangles since it’s easier to represent the areas exactly. This diagram illustrates the basic rules of conditional probability:

and

Independent and mutually exclusive events

The event A is said to be independent of B iff:

in other words, if the fact that B occurs (or not) has no effect of the probability of A. Now, are two mutually exclusive events (events that have no basic outcomes in common) also independent? Two mutually exclusive events are actually dependent of each other, since the occurrence of one of them means the second event cannot happen.

If events A and B are independent it follows that events A and B’ (the complement of B); A’ (the complement of A) and B; and A’ and B’ are independent also. A Venn diagram of 2 mutually exclusive events A and B has no intersection between them, while the diagram of 2 independent events A and B will have the following relation . This means – according to equation (1) – that the ratio of and is equal with the ratio of and and equal with the proportion of A in the whole sample space.

Bayes’ theorem

If we equal in equations (1) and (2) above, we get Bayes’ theorem:

Confusion of the inverse

Confusion of the inverse is the assumption that P(A|B) is the same as P(B|A). The classical example for this is the assumption that if you’re tested positive for a particular disease that means you have that disease. Let’s try to put some numbers on this example. Usually we know the following:

P(ill) – the probability that a specific population has the disease. This is usually determined through statistics by recording the number of known cases of the disease. You can probably find this number from your favorite statistics agency.

P(positive|not ill) – the false positive rate for the test (type I error). This is the probability that the test will be positive for a person that does not have the disease. This value is usually denoted by . Conversely P(negative|not ill) =

P(negative | ill) – the false negative rate for the test (type II error). This is the probability that the test will be negative for a person that really has the disease. This value is usually denoted by . Conversely P(positive|ill) =

Let’s apply Bayes’ theorem now:

The people for which the test is positive can be split in ill or not ill. Since (being ill and testing positive) is mutually exclusive with (not being ill and testing positive) we can write:

and then by using equation (2) we get:

which is obviously different from P(positive|ill) = .

How different? Well for any reasonable good test we should expect and to be really small. So below you can see 2 graphs (3D because we have 2 independent variables and ) for 2 different values of P(ill).

While P(positive|ill) depends only on the parameters of the test, P(ill|positive) depends on the parameters of the test and also on P(ill). The difference between the two is larger when P(ill) is smaller, meaning if the disease is rare, it is less probable that one test will tell you that you are sick (unless it has amazingly low false positive and false negative rates).

Conditional probabilities are easy to guesstimate and difficult to calculate. And remember that usually in this case guesstimate is more of a wild guess based on your emotions at that particular moment then an estimate. When dealing with conditional probabilities it is better to think, calculate them, calculate them again and then look for an independent confirmation just to be sure that you’ve done it right.

I got this problem from a friend, and although the answer can be found with a web search I will reproduce it here for future reference.

What are the probabilities of the different poker hands? First let me be specific in that I’m talking about the probability of getting one of the poker hands at the first draw from a full deck of cards. So we have 52 cards (no wild cards) and we draw 5, or we a number of players and we give 5 cards to each.

Both of the above situations are equivalent. There is no difference in the way you calculate the probability if you draw the first 5 cards or you draw the 2nd, 6th, 10th, 14th and 18th card from the deck.

Royal Flush

Royal flush is ace, king, queen, jack, ten in the same suit. The probability is:

There is only 1 possible combination for each suit that can give you a royal flush, multiplied with the number of suits.

Straight Flush

Straight flush is 5 cards in rank sequence and of the same suit. The probability is (including royal flush):

There are 10 ways of getting 5 cards from 14 in rank sequence (because we can have ace-high or an ace-low straight flush) for each suit, multiplied with the number of suits.

Four of a kind

Four of a kind means 4 cards of one rank, and an unmatched card of another rank. The probability is:

There are 13 possible ways to get 4 cards of the same rank (that’s the number of ranks available) and there are (52 – 4) possible ways to get the remaining 5th card.

Full house

Full house means 3 matching cards of one rank, and 2 matching cards of another rank. The probability is:

There are possible ways to get 2 suits from the 13 available. This is because in this case it matters if for example we have 3 cards of clubs and 2 cards of spades or 3 cards of spades and 2 cards of clubs. There are and possible ways of getting 3 and respectively 2 cards from the 4 cards of the same rank.

Flush

Flush means 5 cards of the same suit, not in rank sequence, excluding the straight flush and royal flush. The probability is:

There are possible ways to get 5 cards from each suit and we have to subtract the 40 straight flushes (see above) from that number.

Straight

Straight means 5 cards in rank sequence but in more than one suit. The probability is:

There are 10 ways of getting 5 cards in rank sequence and combinations available for those 5 cards. As before we have to subtract the 40 straight flushes from that number.

Three of a kind

Three of a kind means 3 cards of the same rank, plus 2 unmatched cards. The probability is:

OK, so there are possible ways of selecting 3 cards from 4 of the same rank, and we have to multiply this with the number of ranks. For the remaining 2 cards there are possible ways of selecting the rank so it would not match the rank of the first 3 cards, and combinations available for those 2 cards.

Two pair

Two pair means two cards of the same rank, plus two cards of another rank (that match each other but not the first pair), plus one unmatched card. The probability is:

We have possible ways of selecting 2 different ranks from the 13 available, multiplied twice with which is the number of possible ways of selecting 3 cards from 4 of the same rank. For the remaining card we have only 11 suits available and 4 possible cards per suit.

One pair

One pair means two cards of the same rank, plus three other unmatched cards. The probability is:

We have possible ways of selecting 2 cards from 4 of the same rank multiplied with the number of ranks. For the remaining 3 cards we have possible ways of selecting a different rank and combinations available for those 3 cards.

For a final overview, the below pie chart shows all the probabilities discussed previously (all of them are in the chart, just that some of them are really small).

Let’s suppose you play with an opponent the following game: if you win you get 1 dollar, if you lose you lose 1 dollar. The probability to win the game is p (and conversely the probability to lose the game is q = 1 – p), you start with i dollars and your opponent with N – i dollars (N is the total sum in the game). What is the probability to win the whole sum of money, before you go broke? Or in scientific terms we’re looking for the probability to win everything P(N|i), that is the probability to win N dollars conditioned by starting with i dollars. After a not so complicated proof the result is:

This problem is known as gambler’s ruin and was first proposed by Christiaan Huygens. Let’s plot the above result and see if we can draw some conclusions from this problem.

First, in the case of a perfectly fair game (p = q = 0.5, meaning no player has an advantage) the probability to win the total amount is dependent only on how much money you start with. In other words the one with the most money, has a bigger probability to win. Or if you prefer the corollary, if you want all the money in a game try to start the game with most of them in your pocket.

If the game is not perfectly fair it gets even worse. If N = 50 dollars and you start with 20 dollars, the probability to win everything drops to below 0.1 if the probability of winning one game is just slightly against you (0.48). The probability to win everything decreases slowly after that (if the probability to win a game goes below 0.48) but that just shows that a small difference is all it takes, you don’t need to cheat to badly to get an effect.

So the lesson is simple. If you want to maximize the probability to win all the money in a series of games, try to start with (much) more money than your opponent(s) and try to have a bigger chance of winning the game than your opponent(s). This is the basic principle why professional gambling establishments cannot lose since they will make sure that both of the above conditions are met for them.

Annex

To find P(N|i) let’s draw a Markov chain. This is basically a graphical representation of the gaming process as a series of states (characterized by the amount of money you have won i). The future state is determined only by the present state and the probabilities p and q. The states 0 and N are special end states, meaning if you reach them the game has ended. Also in the 0 and N states we have the boundary conditions P(N|0) = 0 (you can’t win the game if you have 0 dollars) and P(N|N) = 1 (you have won the game if you have N dollars).

For any state i except 0 and N we have probability p to go to state i + 1 (green transition) and probability q to go to state i – 1 (red transition). So we can write the following:

We know that P(N|0) = 0 and by writing equation (1) for each transition:

Adding all the above:

Writing the other boundary condition P(N|N) = 1 for equation (2) we get:

Replacing (2) into (3) we get the relation we were looking for:

Of course this result is not valid for ρ = 1, but this case is easily demonstrated by replacing ρ into equation (2), where we get: