Understanding statistics is essential for understanding research in the social and behavioral sciences. In this course you will learn the basics of statistics: not just how to calculate them, but also how to evaluate them. This course will also prepare you for the next course in the specialization, Inferential Statistics.
In the first part of the course we will discuss methods of descriptive statistics. You will learn what cases and variables are and how you can compute measures of central tendency (the mean, median, and mode) and dispersion (the standard deviation and variance). Next, we discuss how to assess relationships between variables and introduce the concepts of correlation and regression.
The second part of the course is concerned with the basics of probability: calculating probabilities, probability distributions and sampling distributions. You need to know about these things in order to understand how inferential statistics work.
The third part of the course consists of an introduction to methods of inferential statistics - methods that help us decide whether the patterns we see in our data are strong enough to draw conclusions about the underlying population we are interested in. We will discuss confidence intervals and significance tests.
You will not only learn about all these statistical concepts, you will also be trained to calculate and generate these statistics yourself using freely available statistical software.

DA

One of the best statistics courses for beginners. The concepts are well explained, the learning path is well researched, and above all the R labs were ideal for beginners.

EM

Jan 09, 2016


Only the first week of this course, but I can already tell that it's going to be incredibly useful to me. I've learned a lot and especially love the introduction to R through DataCamp!

From the lesson

Probability Distributions

Probability distributions form the core of many statistical calculations. They are used as mathematical models to represent some random phenomenon and subsequently answer statistical questions about that phenomenon. This module starts by explaining the basic properties of a probability distribution, highlighting how it quantifies a random variable and how it differs between discrete and continuous random variables. Subsequently the cumulative probability distribution is introduced and its properties and usage are explained as well. The next lecture shows how a random variable with its associated probability distribution can be characterized by statistics such as a mean and variance, just like observational data, and explains the effects on these statistics of changing a random variable by multiplication or addition. The lecture thereafter introduces the normal distribution, starting with its functional form and some general properties, followed by the basic use of the normal distribution to calculate probabilities. In a final lecture the binomial distribution, an important probability distribution for discrete data, is introduced and explained. By the end of this module you will have covered quite some ground and will have a solid basis for answering the most frequently encountered statistical questions. Importantly, the fundamental knowledge about probability distributions presented here also provides a solid basis for learning about inferential statistics in the next modules.
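As a small preview of the kinds of calculations this module builds toward, the standard-normal and binomial probabilities mentioned above can be evaluated with a few lines of Python's standard library. This is only an illustrative sketch (the course itself uses R for its labs); the function names here are our own, not from the course:

```python
from math import comb, erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal random variable, via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable: n trials, success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print(round(normal_cdf(0.0), 4))        # 0.5: half the probability mass lies below the mean
print(round(normal_cdf(1.96), 4))       # ~0.975: the familiar 95% two-sided cutoff
print(round(binom_pmf(2, 10, 0.5), 4))  # probability of exactly 2 heads in 10 fair coin flips
```

Both functions use only `math.erf` and `math.comb`, so no external statistics package is needed.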

Taught By

Matthijs Rooduijn

Dr.

Emiel van Loon

Assistant Professor

Transcript

When you know the probability distribution of a random variable, you can start to make calculations for that variable. One of the first things you'd like to know is a set of summary statistics that capture the essence of the distribution well, similar to what you do with observational data. In this video, I'll explain how the mean of a probability distribution is calculated, and also show what happens with the mean when you adjust a random variable or combine different random variables.

The mean of a random variable, denoted by the symbol mu, gives the expected average outcome of many observations. It is therefore also called the expected value of that random variable, written with the symbol E. The mean of a discrete random variable is the probability-weighted average of all possible values that the random variable can take. So it is the sum of each possible value times its probability. For a continuous random variable, essentially the same applies. To account for continuity, the summation sign is replaced by an integral, and the probability is no longer defined for each discrete value xi but as a function of x.

An example. Suppose you travel on a daily basis and cross three traffic lights. Waiting at a traffic light adds an extra two minutes to your total travel time. You have kept a record of how often you had to wait for none up to all three traffic lights. This is the probability table. The mean waiting time you can expect for any trip is calculated as follows, giving a waiting time of 2 minutes and 15 seconds. Interestingly, this specific value of 2 minutes and 15 seconds will never occur: you would wait either 0, 2, 4, or 6 minutes.

Now let's look at some properties of the mean of a random variable. First we consider what happens if a random variable X is adjusted by adding a value a and multiplying by a value b. Then the mean is affected as follows. Let's return to our example.
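The weighted-average calculation from the traffic-light example can be sketched in a few lines of Python. The probability table itself is not reproduced in the transcript, so the probabilities below are hypothetical, chosen only so that the mean comes out to the 2 minutes and 15 seconds quoted in the video:

```python
# Waiting times (minutes) for stopping at 0..3 traffic lights, 2 minutes per light.
times = [0, 2, 4, 6]
# Hypothetical probabilities (the video's table is not shown); they sum to 1
# and are chosen so the mean matches the 2 min 15 s from the example.
probs = [0.25, 0.45, 0.225, 0.075]

# mu = sum over all possible values x of x * P(X = x)
mu = sum(x * p for x, p in zip(times, probs))
print(round(mu, 2))  # 2.25 minutes, i.e. 2 minutes and 15 seconds
```

Note that, exactly as the video points out, the mean 2.25 is not itself a possible outcome; only 0, 2, 4, or 6 minutes can actually occur.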
As it turns out, you found a shorter route, saving one minute on your trip. But at the same time, traffic got busier and your waiting times have increased to two and a half minutes per traffic light, an increase of 25%. The time you save with the shortcut corresponds to the value of a in the equation, while the factor 1.25 increase corresponds to b. The new probability distribution for each outcome is given by the following table. The new mean waiting time turns out to be 2 minutes and 45 seconds. And by accounting for the time gained with the shortcut, you would expect an average net delay of 1 minute and 45 seconds in the new situation. These calculations are equivalent to applying the equation for changing the mean with the proper values of a and b.

Let's now consider what happens if two random variables are added or subtracted. It turns out that the mean of random variables that are added or subtracted is simply the sum or difference of their individual means. And it doesn't even matter whether the variables are independent. For example, suppose you would like to calculate the mean waiting time for a week. Then you could simply add up the mean waiting times for the individual days that you travel.

Let me summarize what I hope you took away from this video. The mean, or expected value, of a discrete random variable is the sum over all the values that the variable may take, times their probabilities. If a random variable is changed through multiplication by or addition of a constant, the mean changes accordingly. And the mean of several random variables added together is the sum of their means, even if the variables are not statistically independent.
