Inferential statistics are concerned with making inferences from relations found in a sample to relations in the population. Inferential statistics help us decide, for example, whether the differences between groups that we see in our data are strong enough to support our hypothesis that group differences exist in general, in the entire population.
We will start by considering the basic principles of significance testing: the sampling distribution of a test statistic, the p-value, the significance level, power, and type I and type II errors. Then we will consider a large number of statistical tests and techniques that help us make inferences for different types of data and different types of research designs. For each individual statistical test we will consider how it works, for what data and design it is appropriate, and how results should be interpreted. You will also learn how to perform these tests using freely available software.
For those who are already familiar with statistical testing: We will look at z-tests for 1 and 2 proportions, McNemar's test for dependent proportions, t-tests for 1 mean (paired differences) and 2 means, the Chi-square test for independence, Fisher’s exact test, simple regression (linear and exponential) and multiple regression (linear and logistic), one way and factorial analysis of variance, and non-parametric tests (Wilcoxon, Kruskal-Wallis, sign test, signed-rank test, runs test).


From the lesson

Simple regression

In this module we’ll see how to describe the association between two quantitative variables using simple (linear) regression analysis. Regression analysis allows us to model the relation between two quantitative variables and, based on our sample, decide whether a 'real' relation exists in the population. Regression analysis is more useful than just calculating a correlation coefficient: it allows us to assess how well our regression line fits the data, helps us to identify outliers, and lets us predict scores on the dependent variable for new cases.

Taught By

Annemarie Zand Scholten

Assistant Professor

Emiel van Loon

Assistant Professor

Transcript

In this video you will learn how to assess whether the predictor in a regression model provides a good description of the response variable. You'll learn to assess the predictive power of a regression model by using the proportion of explained variation, referred to as r squared. Consider the example where we predicted the popularity of cat videos, represented by the number of video views, using the cat's age as a predictor. We hypothesized that videos of younger cats will be more popular. Suppose we collected some data and calculated the intercept and regression coefficient. The obvious question to ask is: how well does our regression model describe the observations? A very useful coefficient is r squared, which is exactly what it looks like: the square of the correlation coefficient, which varies between zero and one. Let's see how you should interpret r squared. Regression tells us whether variation in the predictor goes together, or co-varies, with variation in the response variable. For example, when lower cat age goes together with higher video popularity. If cat age covaries perfectly with video popularity, then all the variation, each change in video popularity, is perfectly predicted or explained by a corresponding change in cat age. r squared tells us how closely our sample approximates this ideal situation. It tells us, out of all the variation in the response variable, popularity, what proportion is explained by the predictor, cat age. Mathematically, all the variation in the response variable is expressed as the total sum of squares. You get the total sum of squares by taking the differences between each of the observed popularity scores, y sub i, and the mean, y bar, squaring each difference and adding them up. Don't forget to square, otherwise the negative and positive differences will cancel each other out. Notice that this measure of variation is almost the same as the variance; we just don't divide by n - 1.
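To make the arithmetic concrete, here is a minimal Python sketch of the total sum of squares. The data and variable names are invented for illustration; they do not come from the course:

```python
# Total sum of squares: squared deviations of each observation from the mean.
# The view counts below are made-up example data.
views = [120, 95, 60, 40, 30]          # y_i: video view counts
mean_views = sum(views) / len(views)   # y-bar, the sample mean

# Square each deviation so positive and negative differences don't cancel out.
total_ss = sum((y - mean_views) ** 2 for y in views)

# Dividing by n - 1 instead of just summing would give the sample variance.
sample_variance = total_ss / (len(views) - 1)
```

This is exactly the "variance without dividing by n - 1" the transcript describes.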
So what part of the total variation is explained by our predictor? Well, we already know which part it doesn't explain. That's the error in our model, called the residuals, the variation that we failed to capture. Remember, the residual sum of squares is calculated by adding the squared differences between the observations, y sub i, and the predictions, y hat sub i. If we take the total sum of squares and subtract the residual sum of squares, we get the regression sum of squares: the variation in video popularity that is accurately captured by our model. If you divide the regression sum of squares by the total sum of squares, you get r squared, the proportion of variation in the response variable explained by the predictor. We can visualize this as the part of the variation around the mean explained by the regression line. In simple linear regression, you can find this proportion by manually calculating the total and residual sums of squares, or you can simply square the correlation. Both methods give the same result. What happens if our model predicts the observations perfectly? Well, then the residuals are zero; there is no error. In that case r squared equals the total sum of squares divided by the total sum of squares. In other words, r squared = 1. What if the predictor is unrelated to the response variable, and provides no help at all in predicting it? The scatter plot will look something like this, with random data points. What's the best prediction you can make in this situation? Well, if the predictor is useless, the response variable provides the only helpful information. The best guess is the mean popularity score of all videos in our sample. This produces a horizontal line with an intercept equal to the mean of the response variable. As a consequence, the residuals, the differences between each prediction and observation, are the differences between each observation and the mean. The residual sum of squares is the same as the total sum of squares.
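As a rough illustration that both routes agree, the following Python sketch fits a least-squares line to some invented data and computes r squared twice: once from the sum-of-squares decomposition and once by squaring the correlation coefficient. All names and numbers are made up for this example:

```python
# r squared two ways. Ages and view counts are invented example data.
ages = [1, 2, 3, 4, 5]            # x_i: cat age in years
views = [110, 95, 70, 50, 45]     # y_i: video views

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(views) / n

# Least-squares slope b and intercept a for the line y-hat = a + b * x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, views))
sxx = sum((x - mean_x) ** 2 for x in ages)
b = sxy / sxx
a = mean_y - b * mean_x

predictions = [a + b * x for x in ages]           # y-hat for each observation

# Route 1: sum-of-squares decomposition.
total_ss = sum((y - mean_y) ** 2 for y in views)
residual_ss = sum((y - yhat) ** 2 for y, yhat in zip(views, predictions))
regression_ss = total_ss - residual_ss
r_squared_ss = regression_ss / total_ss

# Route 2: square the Pearson correlation coefficient.
r = sxy / (sxx * total_ss) ** 0.5
r_squared_corr = r ** 2
```

For a least-squares line the two values are identical, which is why squaring the correlation is a valid shortcut in simple linear regression.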
Subtracting them will result in zero, so r squared will be zero. In this worst-case scenario, our model captures none of the variation in the response variable. In our example, the value of r squared is 0.49, which is a pretty high value for these types of variables. In the behavioral and social sciences, relationships between variables are often complicated and influenced by many other factors. This is why, with real data, we're generally already very happy with r squared values of 0.25. But you should remember that the value of r squared really depends on the type of variables you're investigating. In some cognitive, medical, or biological research fields, you might see much higher values.
