This course introduces you to sampling and exploring data, as well as basic probability theory and Bayes' rule. You will examine various types of sampling methods, and discuss how such methods can impact the scope of inference. A variety of exploratory data analysis techniques will be covered, including numeric summary statistics and basic data visualization. You will be guided through installing and using R and RStudio (free statistical software), and will use this software for lab exercises and a final project. The concepts and techniques in this course will serve as building blocks for the inference and modeling courses in the Specialization.

HD

The tutor makes it really simple. The given examples really helped to understand the concepts and apply it to a wide range of problems. Thank you for this. Wish I could complete the assignments too.

SS

Jul 27, 2017

Great course! Explained the concepts so clear and crisp and the exercises with R are great. The project reinforces all the concepts. All in all, a great course for beginners in statistics and R.

From the lesson

Introduction to Probability

Welcome to Week 3 of Introduction to Probability and Data! Last week we explored numerical and categorical data. This week we will discuss probability, conditional probability, and Bayes’ theorem, and provide a light introduction to Bayesian inference. Thank you for your enthusiasm and participation, and have a great week! I’m looking forward to working with you on the rest of this course.

Instructor:

Mine Çetinkaya-Rundel

Associate Professor of the Practice

Transcript

The World Values Survey is an ongoing worldwide survey that polls the world population about perceptions of life, work, family, politics, etc. The most recent phase of the survey, which polled 77,882 people from 57 countries, estimates that 36.2% of the world's population agree with the statement "men should have more right to a job than women." The survey also estimates that 13.8% of people have a university degree or higher and that 3.6% of people fit both criteria.

Let's start by listing what we know. 36.2% of the world's population agree with the statement that men should have more right to a job than women, so the probability of agree is 0.362. 13.8% of people have a university degree or higher, so the probability of a university degree is 0.138. It's actually university degree or higher, but we'll just use a shorthand notation here. And 3.6% of people fit both criteria, so the probability of agree and university degree is 0.036.

The first question we'll tackle is: are agreeing with the statement that men should have more right to a job than women and having a university degree or higher disjoint events? Let's bring back the list of givens we listed earlier. Since the probability of agreeing with this statement and having a university degree or higher is not 0, the events are not disjoint.

Next we're asked to draw a Venn diagram summarizing the variables and their associated probabilities. We have two events that we determined to be non-disjoint, so we start by drawing two overlapping circles, one for agree and one for university degree. Then we mark the joint probability in the middle, the 3.6% of people who fit both criteria. We know that the total percentage of those who agree is 36.2%, and this includes those who also have a university degree or higher. So to find those who agree but don't have a university degree, we subtract the two probabilities and find that 32.6% of people agree with the statement but don't have a university degree.
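
The disjointness check and the first Venn diagram region described so far can be sketched in a few lines of Python. This is only an illustrative sketch: the course itself uses R, and the names `p_agree`, `p_degree`, `p_both`, and `are_disjoint` are chosen here for illustration and do not come from the lecture.

```python
# Givens stated in the transcript.
p_agree = 0.362    # P(agree)
p_degree = 0.138   # P(university degree or higher)
p_both = 0.036     # P(agree and university degree)


def are_disjoint(p_joint: float) -> bool:
    """Two events are disjoint only if their joint probability is 0."""
    return p_joint == 0


# Venn diagram region: agree, but no university degree.
p_agree_only = round(p_agree - p_both, 3)

print(are_disjoint(p_both))  # False
print(p_agree_only)          # 0.326
```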
Similarly, 13.8% of people have a university degree or higher, and taking out those who also agree with the statement leaves us with 10.2% of people who disagree with the statement but have a university degree or higher.

Next we want to find the probability that a randomly drawn person has a university degree or higher or agrees with the statement about men having more right to a job than women. Let's also put back on the screen what we know so far. We're looking for the probability of agree or university degree, and that should remind us of the general addition rule: probability of A or B is equal to probability of A plus probability of B minus probability of A and B. Or, in context, probability of agree plus probability of university degree minus probability of agree and university degree. From here onwards, we can just plug in the probabilities that we already know. That's 0.362 for P of agree, plus 0.138 for P of university degree, minus 0.036 for the intersection, resulting in 0.464. So there's about a 46% chance that a randomly drawn person has a university degree or higher, or agrees with the statement about men having more right to a job than women.

An alternative way of getting at the same answer would be using the Venn diagram. The desired probability is basically represented by the area covered by the two circles. So we could simply add all of the shown probabilities, where we have already adjusted for the double counting due to the joint probability in the intersection of the two circles, and arrive at the same answer.

What percent of the world population do not have a university degree and disagree with the statement about men having more right to a job than women? We could simply phrase this as the probability of neither agreeing nor having a university degree, which is basically going to be the complement of the probability of agree or having a university degree that we found earlier. We had found that that probability was 46.4%, so the complement is 53.6%.
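
The addition rule, the Venn diagram shortcut, and the complement step above can be checked numerically. The following is a minimal Python sketch, not the course's own code (the labs use R); the helper name `p_union` and the variable names are assumptions made for this example.

```python
# Givens stated in the transcript.
p_agree, p_degree, p_both = 0.362, 0.138, 0.036


def p_union(p_a: float, p_b: float, p_a_and_b: float) -> float:
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


# The three non-overlapping regions inside the two circles.
p_agree_only = p_agree - p_both
p_degree_only = p_degree - p_both

# Route 1: general addition rule.
p_either = p_union(p_agree, p_degree, p_both)
# Route 2: add up the Venn diagram regions.
p_either_venn = p_agree_only + p_both + p_degree_only
# Complement: neither agrees nor has a university degree.
p_neither = 1 - p_either

print(round(p_degree_only, 3))   # 0.102
print(round(p_either, 3))        # 0.464
print(round(p_either_venn, 3))   # 0.464
print(round(p_neither, 3))       # 0.536
```

Both routes give the same union because the overlap is counted exactly once in each.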
On the Venn diagram, this is basically the area in the sample space outside of agree and university degree.

Next we evaluate independence. Does it appear that the event that someone agrees with the statement is independent of the event that they have a university degree or higher? Remember the product rule, which says that if A and B are independent, probability of A and B is equal to probability of A times probability of B. We can easily check if this is the case by setting up an equation where we check if probability of agree and university degree equals probability of agree times probability of university degree. We have all three of these as givens from our introduction earlier, so all we need to do is plug them in. That is, is 0.036 equal to 0.362 times 0.138? The right-hand side of the equation is approximately 0.05, which is not equal to 0.036. Therefore we decide that the two events do not appear to be independent.

Lastly, let's take a look at this. What is the probability that at least 1 in 5 randomly selected people agrees with the statement about men having more right to a job than women? Remember, probability of agree is 0.362, and this is really the only relevant information for this question. If we select 5 people randomly, our sample space for the number of people who might agree with this statement ranges from 0 to 5. It is possible that none of them agree, just one agrees, two agree, etc., all the way to all five agree. We're interested in instances where at least one person agrees with this statement. So we can divide up the sample space into two complementary events: none, indicated by the 0, and at least 1, which covers all possible outcomes from 1 through 5. To find the probability of at least 1 out of 5 people agreeing with the statement, we simply subtract the probability of its complement, none agree, from 1. So that is 1 minus the probability of all of them disagreeing. Let's take a moment to think about this.
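
Returning briefly to the independence check described earlier in this passage, here is a minimal Python sketch of the product-rule comparison. The function name `appear_independent` and the tolerance of 0.001 are assumptions made for this example; the lecture simply compares the two numbers by eye, and with estimates from a sample a formal test would be needed to judge how close is close enough.

```python
import math

# Givens stated in the transcript.
p_agree, p_degree, p_both = 0.362, 0.138, 0.036


def appear_independent(p_a: float, p_b: float, p_a_and_b: float,
                       tol: float = 1e-3) -> bool:
    """Product rule check: is P(A and B) close to P(A) * P(B)?

    The tolerance is an illustrative assumption, not a value from the lecture.
    """
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)


product = p_agree * p_degree

print(round(product, 6))                              # 0.049956
print(appear_independent(p_agree, p_degree, p_both))  # False
```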
If none of them agree, that basically means each one of them disagrees. So first we need to figure out the probability that any given person disagrees with the statement. That is also going to be a complement: it's the complement of the probability of agreeing, which we know to be 0.362. So the probability that any given person disagrees with the statement is 0.638. We're going to use this, and we need five such people to make up our desired outcome of five people all disagreeing with the statement. Plugging that back into the formula, we have 1 minus 0.638 to the 5th power. We can multiply these probabilities because we know that whether one person in our group disagrees with the statement is independent of whether another does, since we're randomly sampling them. So the result comes out to be 0.894. So there is roughly an 89.4% chance that at least 1 person out of 5 randomly selected people agrees with the statement about men having more right to a job than women.

In this example we brought together many of the concepts that we've learned recently. We touched on sample spaces, and we talked about disjoint, complementary, and independent events. We also used the addition rule for unions of events, as well as the multiplication rule for joint probabilities of independent events, both to calculate further probabilities and to check independence.
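
The "at least one" calculation above can be sketched in Python as follows. This is an illustrative sketch rather than course material (the labs use R), and the function name `p_at_least_one` is an assumption made for this example. It relies on the same independence assumption the lecture states for randomly sampled people.

```python
def p_at_least_one(p_event: float, n: int) -> float:
    """P(at least one of n independent draws shows the event).

    Uses the complement: 1 - P(none) = 1 - (1 - p_event) ** n.
    """
    p_not = 1 - p_event          # probability a single person disagrees
    return 1 - p_not ** n        # subtract P(all n disagree) from 1


p_agree = 0.362  # given in the transcript

print(round(1 - p_agree, 3))                   # 0.638
print(round(p_at_least_one(p_agree, 5), 3))    # 0.894
```

Changing `n` shows how quickly the probability of at least one agreement grows as more people are sampled.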