Another Introduction to Inference

This course covers commonly used statistical inference methods for numerical and categorical data. You will learn how to set up and perform hypothesis tests, interpret p-values, and report the results of your analysis in a way that is interpretable for clients or the public. Using numerous data examples, you will learn to report estimates of quantities in a way that expresses the uncertainty of the quantity of interest. You will be guided through installing and using R and RStudio (free statistical software), and will use this software for lab exercises and a final project. The course introduces practical tools for performing data analysis and explores the fundamental concepts necessary to interpret and report results for both categorical and numerical data.

Reviews

DS

This course is an excellent overview of inferential statistical tests (hypothesis tests) and confidence intervals. The organization and material are quite good, with exercises and applications using R.

RR

Jun 15, 2017

Awesome. I loved the way this course is done. I know which test statistic to use for what type of data and under which conditions. I am preparing a cheat sheet that will be shared with all later on.

From the lesson

Inference and Significance

Welcome to Week Two! This week we will discuss formal hypothesis testing and relate testing procedures back to estimation via confidence intervals. These topics will be introduced in the context of working with a population mean; however, we will also give you a brief peek at what's to come in the next two weeks by discussing how the methods we're learning can be extended to other estimators. We will also discuss crucial considerations like decision errors and statistical vs. practical significance. The labs for this week will illustrate the concepts of sampling distributions and confidence levels.

Taught by

Mine Çetinkaya-Rundel

Transcript

Let's take a brief pause in this video. What we want to do is give another introduction to inference before we move on to the nitty-gritty of hypothesis testing methods that rely on the central limit theorem we learned about recently. We're going to review simulation-based inference, the activity we did at the end of unit one, as well as review what we've learned so far about the hypothesis testing framework before we move on to learn more about it.

Remember the contingency table we looked at as part of the simulation activity at the end of unit one. It came from an experiment done on bank managers, all of whom were male. They were each presented with the same employee file, or resume: all of the qualifications were identical, except that half of the files were labeled male and half were labeled female. Through our exploratory data analysis we saw that 21 out of 24 males were promoted, which is 88% of the males, whereas only 14 out of 24 females, or 58%, were promoted. We could clearly see a difference between the percentages of males and females promoted. However, instead of jumping the gun, we said that there are two competing claims that could explain what's going on here. The first is that there is nothing going on: promotion and gender are independent, there is no gender discrimination, and the observed difference in proportions is simply due to chance. This is what we call our null hypothesis, and we're going to use this terminology more and more in the class, so make sure that you're comfortable with it. The other possibility is that there is indeed something going on: promotion and gender are dependent, there is gender discrimination, and the observed difference in proportions (that is, the difference in promotion rates between males and females) is not due to chance.
That is what we call our alternative hypothesis. So how did we finally make the decision? Remember that we did a simulation-based inference in which we simulated the experiment assuming the null hypothesis was true; in other words, assuming independence and leaving everything up to chance. The difference in promotion rates in the observed data was about 30%: the 88% for the males minus the 58% for the females. At each simulation, we calculated the difference between the simulated proportions of males and females promoted, and then we looked to see whether 30% appears to be a usual outcome when we leave things up to chance. Each dot in the dot plot represents one simulated difference between the proportions of males and females promoted. A difference of 30% does not seem like a usual outcome. In fact, it's quite unlikely in these simulations to obtain a result like the actual data, or something even more extreme. Here, "more extreme" means male promotions exceeding female promotions by 30% or more. Therefore, we decided to reject the null hypothesis in favor of the alternative.

So, to remind ourselves of the framework: we start with a null hypothesis, which we usually call H naught (written H0), that represents the status quo. We also have an alternative hypothesis, HA, that represents our research question; in other words, what we're testing for. We conduct a hypothesis test under the assumption that the null hypothesis is true, either via simulation, which is what we did at the end of unit one, or using theoretical methods that rely on the central limit theorem, which is what we're going to do in this unit. If the test results suggest that the data do not provide convincing evidence for the alternative hypothesis, we stick with the null hypothesis. If they do, then we reject the null hypothesis in favor of the alternative.
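The simulation recapped in this video can be sketched in a few lines of code. The course itself uses R; what follows is only a minimal Python sketch under stated assumptions: the table's totals are held fixed at the observed values (24 male files, 24 female files, 35 promotions in all), the test is one-sided (male minus female at least as large as observed), and 10,000 repetitions with a fixed seed is an arbitrary choice. The function name `simulate_differences` is hypothetical, not part of the course materials.

```python
import random


def simulate_differences(n_male=24, n_female=24, n_promoted=35, reps=10_000, seed=1):
    """Simulate male-minus-female promotion-rate differences under the null
    hypothesis that promotion is independent of the gender on the file.

    Hypothetical helper; assumes the table margins stay fixed at the observed totals.
    """
    rng = random.Random(seed)
    # 1 = promoted, 0 = not promoted; 35 promoted and 13 not promoted overall
    outcomes = [1] * n_promoted + [0] * (n_male + n_female - n_promoted)
    diffs = []
    for _ in range(reps):
        rng.shuffle(outcomes)  # reassign outcomes to files purely by chance
        male = outcomes[:n_male]
        female = outcomes[n_male:]
        diffs.append(sum(male) / n_male - sum(female) / n_female)
    return diffs


# Observed difference: 21/24 - 14/24 = 7/24, about 0.29 (the "30%" in the lecture)
observed = 21 / 24 - 14 / 24
diffs = simulate_differences()

# One-sided p-value: share of simulated differences at least as large as observed
p_value = sum(d >= observed for d in diffs) / len(diffs)

print(f"observed difference: {observed:.3f}")
print(f"simulated p-value:   {p_value:.4f}")
```

Because the 35 promotions are only reshuffled among the 48 files, every simulated table keeps the observed totals, which is one way of modeling "leaving everything up to chance." With these settings the estimated p-value should come out at roughly 0.02 to 0.03, a small value consistent with the decision to reject the null hypothesis.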