So I recently came across the Monty Hall probability problem. I'm not sure how familiar people are with it, but I guessed the answer and chose wrong (predictably). I've been out of mathematics for a while and now I'm trying to wrap my head around probability because it's two days until holiday break and work is slow as ****. So first, the problem, and a poll for the answer (no cheating, guess first then look it up)!

Quote:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?"

Is it to your advantage to switch your choice?

No, it's advantageous to keep your original answer: 3 (10.0%)

Yes, it's advantageous to change your original answer: 12 (40.0%)

Changing the answer does not change your odds of winning the car: 13 (43.3%)

Goat is delicious and I already own a car, so it's to my advantage to win the goat. Thus, I choose (answer below): 2 (6.7%)

Total: 30

Now once you've answered, look up the answer online (and go make me a meal if you chose the goat).

My question is, if we mix it up... Imagine, for example, that there are not three doors but 300 doors. There’s still just one good prize, with the rest being goats (the bad prize).

So you pick a door, say door #274. There's a 1/300 chance you're right. This needs to be emphasized: you're almost certainly wrong. Then the game show host opens 298 of the remaining doors: 1, 2, 3, and so on. He skips door #59 and your door, #274. Every open door shows a goat. Should you switch?

The answer is the same as the first part - but I'd like, if possible, for someone to explain the logic behind it to me. I did some rough pen and paper sketches for it, but I suck at these. What is the probability if you switch, and what is the probability if you stay? I think it's 1/300 if you stay, and 299/300 if you switch... is that right?

You should always change your answer. The game show is set up to make it appear as though changing or staying have the same odds, and since most people have a bias towards sticking with their first answer (they'd rather be wrong on their first pick, than to move from a right answer to a wrong one), most people will make the wrong choice.

The key to understanding this is that the game host picks which of the other two doors to open. It is not random and he will never open the one with the prize. If he randomly opened one of the other two doors, with a chance that he'd reveal the car (which presumably would mean you'd lose automatically), but instead revealed a goat *then* it would be a 50/50 chance between the two doors remaining. But since he opens the door without the car if one of the other two has the car behind it, it changes the probability.
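The difference between a host who knows where the car is and one who opens a door at random can be checked by exact enumeration. This is only a sketch under assumptions: three doors, a uniformly random car position and first pick, and rounds where a random host exposes the car are thrown out rather than counted as losses. The function name `switch_win_probability` and the `host_knows` flag are made up for illustration.

```python
from fractions import Fraction
from itertools import product


def switch_win_probability(host_knows: bool) -> Fraction:
    """Chance that switching wins, given that the opened door showed a goat."""
    doors = range(3)
    goat_shown = Fraction(0)   # total weight of rounds where a goat was revealed
    switch_wins = Fraction(0)  # weight of those rounds where switching wins
    for car, pick in product(doors, repeat=2):
        # Doors the host may open: never the player's pick,
        # and never the car if he knows where it is.
        candidates = [d for d in doors if d != pick]
        if host_knows:
            candidates = [d for d in candidates if d != car]
        for opened in candidates:
            weight = Fraction(1, 9 * len(candidates))
            if opened == car:
                continue  # random host exposed the car; round is discarded
            goat_shown += weight
            remaining = next(d for d in doors if d not in (pick, opened))
            if remaining == car:
                switch_wins += weight
    return switch_wins / goat_shown


print("host knows:     ", switch_win_probability(True))
print("host is random: ", switch_win_probability(False))
```

With a knowing host every round survives and switching wins whenever the first pick was wrong. With a random host, the rounds that survive are split evenly between "first pick was right" and "first pick was wrong".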

When you make your selection, you have a 1 in 3 chance that the door you pick has the car. Which means that the other 2 doors collectively have a 2 in 3 chance of containing the car. Showing you the goat behind one of those two doors does not change the odds that one of them contains the car. It only guarantees that if there is a car behind one of those two, it must be behind the door he didn't reveal. Which means that the odds of the car being behind the door you picked remains at 1 in 3, while the odds of the car being behind the door you didn't pick and the host didn't reveal is 2 in 3.

You will win twice as often if you change your selection.

Oh. To answer your hidden question: Yes. That is correct. Because by revealing all the doors you didn't select but one, that one door takes on the combined probability of all the doors you didn't pick. Another way to look at it is that if any of those doors held the prize, it will be behind the one door he didn't show you. Thus, the odds of the prize being behind your door are always 1/n (where n is the total number of doors), but the odds of it being behind the one unselected and unrevealed door will be (n-1)/n.
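The 1/n versus (n-1)/n split can be confirmed by counting every combination of car position and first pick. This is a rough sketch, assuming the host always opens every unpicked goat door except one; `n_door_odds` is a hypothetical helper name.

```python
from fractions import Fraction


def n_door_odds(n: int) -> tuple[Fraction, Fraction]:
    """Return (P(win by staying), P(win by switching)) for n doors."""
    stay_wins = 0
    switch_wins = 0
    for car in range(n):
        for pick in range(n):
            if pick == car:
                stay_wins += 1    # first pick was right, so staying wins
            else:
                switch_wins += 1  # host had to leave the car door closed
    total = n * n
    return Fraction(stay_wins, total), Fraction(switch_wins, total)


for n in (3, 300):
    stay, switch = n_door_odds(n)
    print(f"{n} doors: stay {stay}, switch {switch}")
```

For the 300-door version this gives 1/300 for staying and 299/300 for switching, matching the guess in the original question.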

You'll usually pick the wrong door the first time, so most of the time (2/3) the car will be behind one of the other doors. Since whichever one it isn't behind is eliminated, the single door left at the end is most often the winning one.

Yeah, I don't get this.

If there are 3 doors and 1 car, the probability of me picking the car up front is 1/3. If it is known that the host will always open whichever door remains unpicked and contains a goat, then it is also known that when prompted to change, there will exist 1 door with a goat, and 1 door with a car. Where does the assumption that the one I originally picked probably does not contain the car come from? Or in other words... two scenarios:

Scenario 1:

1. Show a person 3 doors, 1 of which has a car behind it.
2. Prompt the person to pick one. Then remove whichever remaining door does NOT have the car.
3. Prompt the person to change their pick between the remaining two, if desired.

Scenario 2:

1. Show a person 3 doors, 1 of which has a car behind it.
2. Remove one of the doors which does NOT contain the car.
3. Prompt the person to pick one of the remaining doors.

In either case, the chance of selecting the door with the car comes down to the final choice between the two remaining doors, whether that choice is the initial selection or the second chance, and the original pick seems to have no bearing on the result.

So in the end, you know that one of the two remaining doors has a car. Whether you change your pick or not, the probability that the door you pick has a car is 1/2.

Your initial pick has a different probability, because you'll usually pick the wrong door. Therefore most of the time, the car will be behind one of the other doors. Think of the two doors left as a single unit. They have a 66% probability of having the car.

If you have a choice of two doors, one of which has a 66% chance of having the car, and one has a 33% chance of having the car, which one will you pick?

Edit: I just remember this post because it was a big argument, with a select few failing to see the advantage of changing your pick. Basically it boils down to the probability that your first choice was wrong. By changing, you are betting that your first choice was wrong, which is the statistical advantage.

All right, see? Now THAT was the kind of answer I was looking for. I understood mathematically why it worked, but that wording just made it click! Rate-up for you, Xsarus!

Another way of looking at it is that you're really being given a choice between picking one door, which must contain a car for you to win, or two doors, either one of which can contain a car for you to win. The car is twice as likely to be behind one of the two doors you didn't pick as behind the one door you picked.

It's a bit of a brain teaser because at first glance it appears like you have two doors and one of them has a car, so it should be 50/50. It's not though. Mythbusters also did a bit on this, and showed the logic and the math behind it, and then just in case people weren't convinced, they rigged a game show and ran a big series of rounds, with one of them always staying and one of them always changing. They also tested the psychological aspect of the game, which is that people's tendencies are to not switch, and sure enough something like 90% of their test group would not switch their answer when given the chance.

As you can see, in only 3 of those 9 situations was your original door choice correct. So you'd have 1/3 odds of winning if you stayed. In 6 of those 9 situations, you'd win the car if you change doors, for 2/3 odds of winning.
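One way to list nine equally likely situations (car behind door 1, 2 or 3, crossed with a first pick of door 1, 2 or 3) is sketched below. It assumes that when the host has two goat doors to choose from he opens the lower-numbered one; that choice only affects which door is shown, not who wins. `nine_cases` is a name invented for this example.

```python
def nine_cases():
    """Enumerate every (car, first pick) pair and the result of each strategy."""
    doors = (1, 2, 3)
    rows = []
    for car in doors:
        for pick in doors:
            # Host opens a goat door that is not the player's pick
            # (lowest-numbered one when he has a choice: an assumption).
            opened = min(d for d in doors if d not in (car, pick))
            switched = next(d for d in doors if d not in (pick, opened))
            rows.append((car, pick, opened, pick == car, switched == car))
    return rows


rows = nine_cases()
for car, pick, opened, stay_win, switch_win in rows:
    print(
        f"car={car} pick={pick} host opens={opened} "
        f"stay={'win' if stay_win else 'lose'} "
        f"switch={'win' if switch_win else 'lose'}"
    )
print(sum(r[3] for r in rows), "of 9 won by staying;",
      sum(r[4] for r in rows), "of 9 won by switching")
```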

The intuitive way to think of it is that Monty's door reveals are not random. He has knowledge of where the car is, and so by leaving a particular door closed, he is in fact giving you extra information about that door. Well, sometimes he is:

A) If you picked the car door correctly at first (1/3 of the time), Monty is just revealing any random doors. So he's giving you NO information in that case.

B) If you picked an empty door at first (2/3 of the time), Monty can't just open random doors. He is narrowing all the remaining choices down to the exact door HE KNOWS has a car behind it. So 2/3 of the time, Monty's reveal is giving you the information of which door has the car.

This concept becomes more clear when you envision 300 doors. The fact that Monty opened all those 298 other doors and left just that one effing specific door gives you a pretty good hint that it's the winning door. That "hint" would only be a red herring in the 1/300 chance that you picked correctly at first.

Wait, if the host is going to open an "incorrect" door each time, isn't there really just one choice between 2 doors? The first one being completely meaningless?

Not exactly... The host doesn't just choose an incorrect door and leave you with two choices... he eliminates all the possibilities except the one you chose and one that you did not choose. Now the choice becomes not "Which door is right?" but instead "How likely is it that I was wrong in my first selection?" If you had a 33% chance you were right the first time, you had a 66% chance you were wrong. The only way to lose is if you chose the correct door the first time. But there was only a 33% chance that you selected correctly the first time.

Edited, Dec 20th 2012 8:46pm by TirithRR

Yeah, I get that. I just don't see why the probability doesn't reset the second time when you're asked to choose between 2 doors. I mean if you "forgot" which door you chose the first time, the odds of choosing the right answer would be 50/50.

Kinda get it with lotto tickets...

You want the numbers you chose or the numbers I chose? One of them is the winning ticket.

Iterations: 1,000,000
% of time door change resulted in a win: 33%
% of time first choice resulted in a win: 17%
% of time the player didnt wint at all: 50%

Those are strange numbers though. The code is doing exactly what it's supposed to, but it's not really doing something terribly useful. Since it's randomly determining whether you switch or stay, the output is misleading. A better way would have been to run X iterations in which the player always switches and X iterations in which the player always stays.

If you always stay, you will win 33% of the time. If you always switch, you will win 66% of the time. Obviously, if you stay half the time and switch half the time, you'll win about half the time. Just want to make it clear that this doesn't mean that your odds are even between the two choices. It's just that the data output is presented strangely. At no point are your "odds of winning" a round of this game 50%.

Quote:

I'm still not sure I understand the reason behind it (none of your descriptions make sense to me), but the numbers don't lie.

Strange. I'd have thought one of the several explanations would have clicked the lightbulb on by now. It's a bit tricky to see it, but once you do, it's really quite obvious and you'll wonder why you ever thought differently. It's not even complex math. It's just so hard to untrain our brains to think "two doors, one car, even chance".

Quote:

Yeah, I get that. I just don't see why the probability doesn't reset the second time when you're asked to choose between 2 doors. I mean if you "forgot" which door you chose the first time, the odds of choosing the right answer would be 50/50.

Correct. Because if you "forgot" which door you picked, it's the same as if the host eliminated one of the doors before you picked. In which case you have a 50/50 chance. But you don't forget. You know that you picked one out of three doors randomly, and that you had a 1 in 3 chance that the door you picked had a car behind it. Since the host will always eliminate one of the remaining 2 doors, and that door will always be one that does not have a car behind it, the remaining door has a 2/3 chance of having a car behind it, while the one you picked only has a 1/3 chance.

This is the case because the host's choice is not random. If there's a car behind one of the two other doors, he will not eliminate that one. So if you pick door number 1 and the car is behind door number 2, he'll eliminate door 3. If it's behind door 3, he'll eliminate door 2. In effect, this means that if either door 2 or door 3 has the car, it will always be the door not eliminated, and thus the door you are given the opportunity to pick in the second part will always contain the car if either one of those two doors did.

Imagine if instead of revealing a door without a car behind it, the host instead just said: "You can keep your one door, or you can choose to open both of these doors and if either one has a car, you win". You'd take that offer every single time, right? Well, that's exactly what he's doing when he eliminates one of the other two doors and asks if you want to switch for the third. It's the same thing. If either one of those two doors has the car behind it, you win by switching.

Quote:

Yeah, I get that. I just don't see why the probability doesn't reset the second time when you're asked to choose between 2 doors. I mean if you "forgot" which door you chose the first time, the odds of choosing the right answer would be 50/50.

Yes, if you "forgot" what happened before your final choice, it would be 50/50. A coin has no "memory", and so each flip is an independent event. But Monty's actions are not independent events. His door revealing is dependent on what door you chose initially. And so it gives you information about the situation. It'd be unwise to "forget" that information.
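A small worked check of the "forgetting" case, as a sketch rather than anything from the thread: forgetting your first pick is treated here as flipping a fair coin between the two closed doors, which is the same as switching with probability 1/2. The names `P_STAY`, `P_SWITCH` and `win_probability` are illustrative.

```python
from fractions import Fraction

P_STAY = Fraction(1, 3)    # first pick was right
P_SWITCH = Fraction(2, 3)  # first pick was wrong, host left the car door closed


def win_probability(p_switching: Fraction) -> Fraction:
    """Overall chance of winning if you switch with probability p_switching."""
    return p_switching * P_SWITCH + (1 - p_switching) * P_STAY


print("always stay:  ", win_probability(Fraction(0)))
print("coin flip:    ", win_probability(Fraction(1, 2)))
print("always switch:", win_probability(Fraction(1)))
```

The coin flip lands exactly halfway between the two real strategies, which is why throwing away the information feels like a 50/50 game.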

Yeah, thinking about it I bet my youngest would have picked the right door before me. We do it all the time to help them learn.

Which one is the red crayon? *points to a pile of like 50 crayons*

No sweetie that's yellow.

Here... *grabs the red crayon along with the yellow crayon and holds them up*

Which one is the red crayon?

Good job, that's right!

Wonder why it seems so counter-intuitive in the 'door' situation? General distrust of someone like a game-show host who may be "trying to trick you?" You know, replace the host with someone trustworthy opening the door, and see if the answer changes.

Edit: (also I realize my example isn't exactly the same thing, but it helped with the understanding )

It's only counter-intuitive in some sense because you can't see it. If, for example, you were a robot that either always swapped or always held, and the doors were already open, this is what it would look like (1 = original, held pick; 2 = swapped pick; X = unavailable pick). No amount of shuffling will change these results.

Quote:

Iterations: 1,000,000
% of time door change resulted in a win: 33%
% of time first choice resulted in a win: 17%
% of time the player didnt wint at all: 50%

Those are strange numbers though. The code is doing exactly what it's supposed to, but it's not really doing something terribly useful. Since it's randomly determining whether you switch or stay, the output is misleading. A better way would have been to run X iterations in which the player always switches and X iterations in which the player always stays.

If you always stay, you will win 33% of the time. If you always switch, you will win 66% of the time. Obviously, if you stay half the time and switch half the time, you'll win about half the time. Just want to make it clear that this doesn't mean that your odds are even between the two choices. It's just that the data output is presented strangely. At no point are your "odds of winning" a round of this game 50%.

Leave it to you to misinterpret completely valid numbers.

If you pick the first door and never change your mind, your odds of winning are 17%. If you pick the first door and change your mind, your odds of winning double (~33%).

In either single case, your odds of not winning at all are at least 66% (83% if you pick door 1 and never change). The winning scenarios are mutually exclusive, since one cannot win by sticking with their original pick AND changing their pick. The 50% non-winning attempts is an artifact of the test, and does not imply that your actual odds of winning are 50%. The point of the test is to illustrate that your chances of winning do in fact double by changing your pick when given the opportunity.

I'm not misinterpreting the numbers. And they are "valid" in that they generate exactly what the script intends. My point is that what the script is doing isn't really what most people want to know when running it. See. I read the script and know exactly what it's doing. It runs a million iterations. Each time through it randomly assigns the prize to the door, randomly determines your starting pick, excludes the first non prize containing door not picked, and then randomly picks between the remaining two doors.

It's giving you the breakdown of what happens if you change your pick half the time and don't change your pick the other half. Which is *not* what you just wrote:

Quote:

If you pick the first door and never change your mind, your odds of winning are 17%. If you pick the first door and change your mind, your odds of winning double (~33%).

See how you are misinterpreting the data? The script does not tell you your odds of winning based on a given selection. It tells you how often out of a set in which you randomly change or don't change your mind various outcomes will occur. That is not remotely the same.

If you change your mind, you will win 66% of the time. If you don't change your mind, you will win 33% of the time. Those are your "odds of winning".

The fact that you misunderstood the results is because of precisely that poor output I talked about. I saw it, you didn't.

Quote:

In either single case, your odds of not winning at all are at least 66% (83% if you pick door 1 and never change).

This is completely wrong. You've failed to understand what the data means. Seriously. Go back. Look at what the script is doing. And then think about it for about an hour or so. It might just come to you what you're doing wrong here.

Quote:

The winning scenarios are mutually exclusive, since one cannot win by sticking with their original pick AND changing their pick. The 50% non-winning attempts is an artifact of the test, and does not imply that your actual odds of winning are 50%. The point of the test is to illustrate that your chances of winning do in fact double by changing your pick when given the opportunity.

This part you got right, which is why I'm scratching my head that you don't understand what's wrong with the first part. Since the test data basically includes half cases where you change your pick and half where you don't, the actual odds on each pick are twice as high as those presented in the output (which is what caught my eye as being odd about it). You're correct that the point of this script is to show that your odds double if you switch versus if you stay, but the way the output is formatted makes it seem like the base odds aren't nearly as good as they actually are.

A better way to have tested this (and how I would have written the script) would be to run a large sample of tests in which the initial pick and placement of the prize are random each round and the script always changes your pick, and a second set where you always keep your pick. Output the win percentage for each of those two and you'll get 66% and 33% respectively. The third line of output is completely meaningless and should not even be there. What it tells us isn't relevant outside of the methodology of the test itself.
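A minimal sketch of that always-switch versus always-stay test, written in Python. It is not the script discussed in the thread; the function names, the round count and the fixed seed are all assumptions made for illustration.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a goat door that is not the player's pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch: bool, rounds: int = 100_000, seed: int = 0) -> float:
    """Fraction of rounds won when the player always follows one strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(rounds)) / rounds


print(f"always switch: {win_rate(True):.3f}")
print(f"always stay:   {win_rate(False):.3f}")
```

Each strategy gets its own full set of rounds, so each printed number is a win rate for that strategy alone and there is no third "didn't win at all" figure to misread.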

Your overall odds of winning if you do this many times and change half the time and don't change half the time are 50%. Your overall odds of losing therefore are also 50%. But this is meaningless because the question isn't "what are my odds if I do this a million times and randomly decide to change or not change my initial pick?". The question is "what are my odds in this round, right now, if I change my pick versus if I don't?".

Just in case anyone's still confused about the problem with that script (and why the output is misleading), I'll try to explain a bit more clearly:

The output that reads "% of time door change resulted in a win: " is telling you how many times, out of all the iterations, this outcome occurred. It's simply counting up the total number of times out of X iterations that a door change resulted in a win. Why this is misleading is that since the odds of a door change are 50%, half of the iterations don't involve a door change (door change is random, right?). Thus, in half of the iterations there is a zero chance of a door change resulting in a win. Out of the half in which the door is changed, 66% resulted in a win. But the script does not calculate "winning odds" by dividing the number of wins when changing the door by the number of times the door was changed. It simply reports the percentage of time it happens out of the entire set. So since half the set does not involve a door change, the number is half as large as it would be if we were calculating the "odds of winning if you change your door".

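Imagine we run just 100 iterations, the breakdown might look like this:

1. Changed door and won: 33
2. Changed door and lost: 17
3. Kept first choice and won: 17
4. Kept first choice and lost: 33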

All the script is doing is giving you the results of lines 1 and 3, and then adding up lines 2 and 4 to show how many losses there were. But this is really meaningless output. What we want to know is the odds of each choice winning. To do that, we should divide line 1 by the sum of lines 1 and 2, then divide line 3 by the sum of lines 3 and 4, and output those results. This gives us how many times a given choice won or lost out of the number of times that choice was made.

When we do that we get 33/50=.66, and 17/50=.34. Which is the correct "odds of winning" for each choice.
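The gap between "share of all rounds" and "share of the rounds where that choice was made" can be reproduced with a short simulation. This is a sketch only, assuming the player switches on a fair coin flip each round; it is not the script from the thread, and `tally` and its parameters are made-up names.

```python
import random
from collections import Counter


def tally(rounds: int = 100_000, seed: int = 0) -> Counter:
    """Count (switched, won) outcomes when the player switches at random."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(rounds):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        switched = rng.random() < 0.5
        # Once the host removes a goat, switching wins exactly when
        # the first pick was wrong; staying wins when it was right.
        won = (pick != car) if switched else (pick == car)
        counts[(switched, won)] += 1
    return counts


counts = tally()
total = sum(counts.values())
for switched in (True, False):
    label = "switch" if switched else "stay"
    wins = counts[(switched, True)]
    tries = wins + counts[(switched, False)]
    print(f"{label}: {wins / total:.0%} of all rounds, "
          f"{wins / tries:.0%} of {label} rounds")
```

The first percentage on each line divides by every round played, while the second divides only by the rounds in which that choice was actually made, which is the figure a player deciding whether to switch cares about.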

It's a classic case of someone writing a script without really thinking through how he intended the output to be used. It's technically correct because he doesn't say that the output represents the odds of winning. But many people will make the mistake of thinking that this is what they're getting. You have to do additional math to get those odds (and have data that isn't presented in the output, like the number of times that one choice was made versus another). What's really interesting in this case is that he actually had to significantly increase the complexity of the script in order to make the output less useful and more likely to be misinterpreted.

It's a bit of a pet peeve of mine, because I see this sort of mistake in programming all the time. Many coders are very good at the technical part of their craft, but fail a bit at the human factors part. It's great that he came up with a fairly clever way of manipulating the numbers assigned in his array so as to minimize loops, variables and assignments (a nice clean "single pass" script). But he should have spent more time thinking "what are people going to use the output for?". That would have radically changed how he approached the whole thing.