Degrees of Freedom Tutorial

A lot of researchers seem to be struggling with their understanding of the statistical concept of degrees of freedom. Most do not really care about why degrees of freedom are important to statistical tests, but just want to know how to calculate and report them. This page will help. For those interested in learning more about degrees of freedom, take a look at the following resources:

I couldn’t find any resource on the web that explains calculating degrees of freedom in a simple and clear manner and believe this page will fill that void. It reflects my current understanding of degrees of freedom, based on what I read in textbooks and scattered sources on the web. Feel free to add or comment.

Conceptual Understanding

Let’s start with a simple explanation of degrees of freedom. I will describe how to calculate degrees of freedom in an F-test (ANOVA) without much statistical terminology. When reporting an ANOVA, between the brackets you write down degrees of freedom 1 (df1) and degrees of freedom 2 (df2), like this: “F(df1, df2) = …”. Df1 and df2 refer to different things, but both can be understood in the same way.

Imagine a set of three numbers, pick any number you want. For instance, it could be the set [1, 6, 5]. Calculating the mean for those numbers is easy: (1 + 6 + 5) / 3 = 4.

Now, imagine a set of three numbers, whose mean is 3. There are lots of sets of three numbers with a mean of 3, but for any set the bottom line is this: you can freely pick the first two numbers, any number at all, but the third (last) number is out of your hands as soon as you picked the first two. Say our first two numbers are the same as in the previous set, 1 and 6, giving us a set of two freely picked numbers, and one number that we still need to choose, x: [1, 6, x]. For this set to have a mean of 3, we don’t have anything to choose about x. X has to be 2, because (1 + 6 + 2) / 3 is the only way to get to 3. So, the first two values were free for you to choose, the last value is set accordingly to get to a given mean. This set is said to have two degrees of freedom, corresponding with the number of values that you were free to choose (that is, that were allowed to vary freely).

This generalizes to a set of any given length. If I ask you to generate a set of 4, 10, or 1,000 numbers that average to 3, you can freely choose all numbers but the last one. In those sets the degrees of freedom are 3, 9, and 999, respectively. The general rule then for any set is that if n equals the number of values in the set, the degrees of freedom equals n – 1.
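The reasoning above can be checked with a few lines of Python. This is only a minimal sketch; the function name last_value_for_mean is made up for this illustration.

```python
def last_value_for_mean(free_values, target_mean):
    """Return the one value that must be added so the full set has target_mean."""
    n = len(free_values) + 1  # size of the completed set
    # The full set must sum to target_mean * n, so the last value is whatever is left.
    return target_mean * n - sum(free_values)


free_values = [1, 6]  # the two freely chosen numbers from the example
x = last_value_for_mean(free_values, 3)
print(x)                 # 2
print(len(free_values))  # 2 values were free to vary: n - 1 degrees of freedom
```

Whatever numbers you put in free_values, the function returns exactly one possible last value, which is why only n – 1 values count as free.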

This is the basic method to calculate degrees of freedom, just n – 1. It is as simple as that. The thing that makes it seem more difficult is the fact that in an ANOVA, you don’t have just one set of numbers, but there is a system (design) to the numbers. In the simplest form you test the mean of one set of numbers against the mean of another set of numbers (one-way ANOVA). In more complicated one-way designs, you test the means of three groups against each other. In a 2 x 2 design things seem even more complicated, especially if there’s a within-subjects variable involved (Note: all examples on this page are between-subjects, but the reasoning mostly generalizes to within-subjects designs). However, things are not as complicated as you might think. It’s all pretty much the same reasoning: how many values are free to vary to get to a given number?

Df1

Df1 is all about means and not about single observations. The value depends on the exact design of your test. Basically, the value represents the number of cell means that are free to vary to get to a given grand mean. The grand mean is just the mean across all groups and conditions of your entire sample. The cell means are nothing more than the means per group and condition. We’ll call the number of cells (or cell means) k.

Let’s start off with a one-way ANOVA. We have two groups that we want to compare, so we have two cells. If we know the mean of one of the cells and the grand mean, the other cell must have a specific value such that (cell mean 1 + cell mean 2) / 2 = grand mean (this example assumes equal cell sample sizes, but unequal cell sample sizes would not change the number of degrees of freedom). Conclusion: for a two-group design, df1 = 1.

Sticking to the one-way ANOVA, let’s move on to three groups. We now have three cells, so we have three means and a grand mean. Again, how many means are free to vary to get to the given grand mean? That’s right, 2. So df1 = 2. See the pattern? For one-way ANOVAs, df1 = k – 1.

Moving on to an ANOVA with four groups. We know the answer if this is a one-way ANOVA (that is, a 4 x 1 design): df1 = k – 1 = 4 – 1 = 3. However, what if this is a two-way ANOVA (a 2 x 2 design)? We still have four means, so to get to a given grand mean, we can have three freely varying cell means, right? Although this is true, we have more to deal with than just the grand mean, namely the marginal means. The marginal means are the combined cell means of one variable, given a specific level of the other variable. Let’s say our 2 x 2 ANOVA follows a 2 (gender: male vs. female) x 2 (eye color: blue vs. brown) design. In that case, the grand mean is the average of all observations in all 4 cells. The marginal means are the average across both eye colors for male participants, the average across both eye colors for female participants, the average across both genders for blue-eyed participants, and the average across both genders for brown-eyed participants. The following table shows the same thing:

|        | Brown eyes | Blue eyes |  |
|--------|------------|-----------|--|
| Male   | CELL MEAN: Brown eyed males | CELL MEAN: Blue eyed males | MARGINAL MEAN of brown eyed males and blue eyed males |
| Female | CELL MEAN: Brown eyed females | CELL MEAN: Blue eyed females | MARGINAL MEAN of brown eyed females and blue eyed females |
|        | MARGINAL MEAN of brown eyed males and brown eyed females | MARGINAL MEAN of blue eyed males and blue eyed females | GRAND MEAN |

The reason that we are now dealing with marginal means is that we are interested in interactions. In a 4 x 1 one-way ANOVA, no interactions can be calculated. In our 2 x 2 two-way ANOVA, we can. For instance, we might be interested in whether females perform better than males depending on their eye color. Now, because we are interested in cell mean differences in a specific way (i.e., we are not just interested in whether one cell mean deviates from the grand mean, but we are also interested in more complex patterns), we need to pay attention to the marginal means. As a consequence, we now have less freedom to vary our cell means, because we need to account for the marginal means (if you want to know how this all works, you should read up on how the sums of squares are partitioned in 2 x 2 ANOVAs). It is also important to realize that if all marginal means are fixed, the grand mean is fixed too. In other words, we do not have to worry about the grand mean anymore for calculating our df1 in a two-way ANOVA, because we are already worrying about the marginal means. As a consequence, our df1 does not lose a separate degree of freedom for the grand mean; it only loses degrees of freedom to get to the specific marginal means.

Now, how many cell means are free to vary before we need to fill in the other cell means to get to the four marginal means in the 2 x 2 design? Let’s start with freely picking the cell mean for brown eyed males. We know the marginal mean for brown eyed males and blue eyed males together (it is given, all marginal means are), so I guess we can’t choose the blue eyed males cell mean freely. There goes one degree of freedom. We also know the marginal mean for brown eyed males and brown eyed females together. That means we can’t choose the brown eyed female cell mean freely either. And as we know the other two marginal means, we have no choice in what we put in the blue eyed females cell mean to get to the correct marginal means. So, we chose one cell mean, and the other three cell means had to be filled in as a consequence to get to the correct marginal means. You know what that means don’t you? We only have one degree of freedom in df1 for a 2 x 2 design. That’s different from the three degrees of freedom in a 4 x 1 design. The same number of groups and they might even contain the same observations, but we get a different number of degrees of freedom. So now you see that using the degrees of freedom, you can infer a lot about the design of the test.
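The fill-in exercise above can be sketched in Python. The numbers are made up for illustration, the function name fill_2x2 is hypothetical, and the sketch assumes equal cell sizes so that each marginal mean is simply the average of its two cell means.

```python
def fill_2x2(brown_males, male_marginal, female_marginal,
             brown_marginal, blue_marginal):
    """Given one freely chosen cell mean and the four marginal means,
    return all four cell means (equal cell sizes assumed)."""
    # Each marginal mean is the average of two cells, so the partner cell is forced.
    blue_males = 2 * male_marginal - brown_males
    brown_females = 2 * brown_marginal - brown_males
    blue_females = 2 * female_marginal - brown_females
    # The blue-eyed marginal mean must agree with what we just filled in.
    assert abs(blue_females - (2 * blue_marginal - blue_males)) < 1e-9
    return {
        "brown_males": brown_males,
        "blue_males": blue_males,
        "brown_females": brown_females,
        "blue_females": blue_females,
    }


# Made-up marginal means: males 5, females 9, brown eyes 6, blue eyes 8.
cells = fill_2x2(brown_males=4, male_marginal=5, female_marginal=9,
                 brown_marginal=6, blue_marginal=8)
print(cells)
```

Only brown_males is an input that you get to choose; the other three cell means follow from it, which matches the single degree of freedom found above.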

You could do the same mental exercise for a 2 x 3 design, but it is tedious for me to write up, so I am going to give you the general rule. Every variable in your design has a certain number of levels. Variable 1 in the 2 x 3 design has 2 levels, variable 2 has 3 levels. You get df1 when you multiply the levels of all variables with each other, but with each variable, subtract one level. So in the 2 x 3 design, df1 would be (2 – 1) x (3 – 1) = 2 degrees of freedom. Back to the 2 x 2 design, df1 would be (2 – 1) x (2 – 1) = 1 degree of freedom. Now let’s see what happens with a 2 x 2 x 2 design: (2 – 1) x (2 – 1) x (2 – 1) = still 1 degree of freedom. A 3 x 3 x 4 design (I hope you’ll never have to analyze that one): (3 – 1) x (3 – 1) x (4 – 1) = 2 x 2 x 3 = 12 degrees of freedom. Note that this product is the df1 of the interaction between all variables in the design. The main effect of a single variable has a df1 equal to its own number of levels minus 1, just like in a one-way ANOVA.
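The multiplication rule is easy to put into a small helper. This is a minimal sketch; the function name df1_product_rule is made up for this page.

```python
from math import prod


def df1_product_rule(levels):
    """Multiply (levels - 1) across all variables in the design."""
    return prod(k - 1 for k in levels)


for design in [(2, 3), (2, 2), (2, 2, 2), (3, 3, 4), (4,)]:
    label = " x ".join(str(k) for k in design)
    print(label, "->", df1_product_rule(design))
```

A design with a single variable, such as (4,), reduces to k – 1, so the one-way case is covered by the same rule.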

By now, you should be able to calculate df1 in F(df1, df2) with ease. By the way, most statistical programs give you this value for free. However, now you’ll be able to judge, to some extent, whether researchers have performed the right analyses in their papers based on the df1 value they report. Also, df1 is calculated the same way in a within-subjects design. Just treat the within-subjects variable as any other variable. Let’s move on to df2.

Df2

Whereas df1 was all about how the cell means relate to the grand mean or marginal means, df2 is about how the single observations in the cells relate to the cell means. Basically, df2 is the total number of observations in all cells (n) minus the degrees of freedom lost because the cell means are set (that is, minus the number of cell means or groups/conditions: k). Df2 = n – k, that’s all folks! Say we have 150 participants across four conditions. That means we will have df2 = 150 – 4 = 146, regardless of whether the design is 2 x 2, or 4 x 1.
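In code, the between-subjects df2 is just a subtraction. The sketch below uses a hypothetical helper df2_between, and the split of the 150 participants over the four cells is an assumption made only to have concrete numbers.

```python
def df2_between(cell_sizes):
    """df2 for a between-subjects ANOVA: observations minus number of cells."""
    n = sum(cell_sizes)  # total number of observations
    k = len(cell_sizes)  # number of cells (groups/conditions)
    return n - k


# Assumed cell sizes that add up to 150 participants in four conditions.
cell_sizes = [38, 37, 38, 37]
print(df2_between(cell_sizes))  # 146
```

The result does not depend on how the four cells are arranged, so a 2 x 2 and a 4 x 1 design with these cells share the same df2.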

Most statistical packages give you df2 too. In SPSS, it’s called df error, in other packages it might be called df residuals.

For the case of within-subjects designs, things can become a bit more complicated. The following paragraphs are work in progress. The calculation of df2 for a repeated measures ANOVA with one within-subjects factor is as follows: df2 = df_total – df_subjects – df_factor, where df_total = number of observations (across all levels of the within-subjects factor, n) – 1, df_subjects = number of participants (N) – 1, and df_factor = number of levels (k) – 1. Basically, the take home message for repeated measures ANOVA is that you lose additional degrees of freedom for the subjects (if you’re interested: this is because the sum of squares representing individual subjects’ average deviation from the grand mean is partitioned separately, whereas in between-subjects designs, that’s not the case. To get to a specific subjects sum of squares, N – 1 subject means are free to vary, hence you lose N – 1 additional degrees of freedom).
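The repeated measures formula can be written out step by step. This is a sketch under the assumption that every participant is measured once at every level of the factor (so n = N x k); the function name and the example of 10 participants and 3 levels are made up.

```python
def df2_repeated_measures(n_participants, n_levels):
    """df2 for a repeated measures ANOVA with one within-subjects factor."""
    n_obs = n_participants * n_levels  # assumes complete data: n = N * k
    df_total = n_obs - 1
    df_subjects = n_participants - 1
    df_factor = n_levels - 1
    return df_total - df_subjects - df_factor


N, k = 10, 3
print(df2_repeated_measures(N, k))  # 18
print((N - 1) * (k - 1))            # 18, the same value
```

Under that assumption the subtraction always works out to (N – 1) x (k – 1), which is a handy shortcut for checking reported values.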

Conclusion

You should be able to calculate df1 and df2 with ease now (or identify them from the output of your statistical package like SPSS). Keep in mind that the degrees of freedom you specify are those of the design of the effect that you are describing. There is no such thing as one set of degrees of freedom that is appropriate for every effect of your design (although, in some cases, they might seem to have the same value for every effect).

Moreover, although we have been discussing means in this tutorial, for a complete understanding, you should learn about sums of squares, how those translate into variance, and how test statistics, such as the F-ratio, work. This will make clear to you how degrees of freedom are used in statistical analysis. The short functional description is that, primarily, degrees of freedom affect which critical values are chosen for test statistics of interest, given a specific alpha level (remember those look-up tables in your early statistics classes?).

Why do we use n – 1 degrees of freedom? Computing the mean uses all observations, divided by n. The mean is then used in computing the sum of squares (the mean needs to be known, otherwise you can’t compute the sum of squares). That fixes one number (the mean), and therefore you lose one degree of freedom in computing the sum of squares. If the mean is known, n – 1 observations are free to vary. The last one no longer gets to be freely picked to get to a given mean.

Great article! Easy to understand! I am currently examining a PhD student’s thesis and found the student does not have a good understanding of different ANOVA analysis, as the dfs he reported do not match his ANOVA design. I will definitely recommend this article to him.

Only concern I have is –
In the case of within subjects-designs (the paragraph just above the CONCLUSION), regarding “df_subjects = number of participants (N) – 1”, I think the N should be better defined as “number of groups of subjects” or “number of levels of subjects”, as number of participants can be confused with the total number of subjects.

Thanks a lot for the explanation. I learnt a lot at university and in Six Sigma training, but I was always wondering what degrees of freedom are. I read this article a year ago, and now that I have read it again I still find it very useful and interesting.

“Why do I care?” So I understand what the concept is. What does it mean to me when I am considering my ANOVA? For instance, I understand WHY I care about significance or the values I receive from a t-test or various coefficients.

I don’t understand WHY I care about DF. It’s a value and it is meaningless to me.

You care because the p-value you compute depends on the degrees of freedom. The F-value is the ratio of between-group and within-group variance (or more precisely, mean squares: the sums of squares divided by their degrees of freedom). The higher the ratio, the more between-group variance relative to within-group variance and so the less likely that you’ll observe your data (or more extreme data) given that the null hypothesis is true. With more degrees of freedom (more participants, more observations), this becomes even less likely, and that’s why the same F value will produce smaller p-values with higher degrees of freedom.

Try and look up the p-value of an F-value without knowing the degrees of freedom; it’s impossible.

Really good explanation in terms of calculation (the best I have seen).

I still struggle with the purpose. Using the ANOVA example, it’s very clear and easy to report differences between groups when we look at the p-value (and go on to do post hoc tests, etc.). However, the reports that I have been exposed to do not interpret the F value nor the df… for instance, testing eye colour (IV, 3 levels) against IQ (DV) …

if we compare results of 2 completely separate studies with different numbers of people.

assuming both studies showed differences that were significant … what would the difference be between these F value and df results … F(2,197) = 5.55 and F(2,997) = 25.5 ?

The F values are important to determine significance jointly with the degrees of freedom. From the degrees of freedom in the F-tests you mention I can see that the second result is based on more observations. Given that larger sets of observations are better at estimating the true population parameters, I would update my belief about the effect you are testing more for the F(2,997) test than for F(2,197). Moreover, the same F value will yield lower p-values given larger df2. In other words: when you test small effects, you need more observations to have the power to detect the effects. However, this quickly moves the discussion from degrees of freedom to power and sample size.
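To put numbers on this, here is a minimal sketch that computes the p-values for the two tests in the question. It relies on the fact that for df1 = 2 the upper tail of the F distribution has a simple closed form, p = (1 + 2F / df2) ^ (–df2 / 2), so no statistics package is needed. The function name is made up, and the shortcut does not apply to other values of df1.

```python
def p_value_f_df1_2(f_value, df2):
    """Upper-tail p-value of an F statistic, valid only when df1 == 2."""
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


# The two results from the question.
print(p_value_f_df1_2(5.55, 197))
print(p_value_f_df1_2(25.5, 997))

# The same F value with a larger df2 gives a smaller p-value.
print(p_value_f_df1_2(5.55, 997) < p_value_f_df1_2(5.55, 197))  # True
```

For any other df1 you would use the F distribution function of your statistical package instead of this shortcut.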

Thanks Ron. I am a master’s degree student in Cameroon (Africa), still struggling to write my thesis. There are many things that confuse me when dealing with statistics, and DF is one of them. Let me say I have understood 50%. Now, my problem is: HOW is it helpful? HOW can I explain DF to an ordinary man? HOW DOES IT EXPLAIN A SITUATION?

It is helpful if you want to compute a statistic. If an ordinary man is interested in it he/she should get into statistics which are more complex than you can explain in a single sentence.

The easiest explanation would be that in order to compute whether a difference between two conditions of at least the sampled size would occur if in reality there is no difference, you need to estimate the within-condition and between-condition variance. With those two ingredients you can compute a t, F, p-value. However, we know that the estimation of both variances is systematically biased, unless you account for the degrees of freedom. If you want to know why, you would need to read old statistics texts, but I am fine with assuming that this is true as I don’t want to be a statistician. There’s no shorter way to say it.

No, you can choose whatever numbers as the first two, the third is the only one that then will have to be set according to a specific number to make the mean have a certain value. The only constraints you have for choosing the values are introduced if you want the values to be realistic. In that case they would be dependent on the measurement level and value range of your measurement. However, that is irrelevant for the degrees of freedom.

Thanks so much for your great tutorial! always assumed I am useless with numbers and avoided them, but picked up a study which requires me to do stats and this has been no end of help! will be bookmarking this site!!

I have a question. There are 7 roads and 4 paints, each road is marked with each of the 4 paints, and a total of 7 * 4 = 28 brightness values are recorded. I am doing a two-way ANOVA on this data. df1 is (7 – 1) + (4 – 1) = 9, right? And df2 = 28 – (7 + 4) = 17, right? The total mean, the means across each of the 4 groups, and the means across each of the 7 groups are known. When I run the analysis in SAS I get df1 = 9 but df2 = 18 (not 17). I am confused. Please help. BTW, nice article.
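The arithmetic in this question can be checked with a short sketch. It assumes a two-way ANOVA with one observation per road and paint combination and no interaction term; the variable names are made up. Under that assumption the error degrees of freedom are what is left of the total (n – 1) after the road and paint effects are taken out.

```python
n_roads, n_paints = 7, 4
n_obs = n_roads * n_paints        # 28 brightness values, one per combination

df_total = n_obs - 1              # 27
df_roads = n_roads - 1            # 6
df_paints = n_paints - 1          # 3
df_model = df_roads + df_paints   # 9
df_error = df_total - df_model    # 18

print(df_model, df_error)  # 9 18
```

The difference with 28 – (7 + 4) = 17 is that the 7 road means and the 4 paint means share the same grand mean, so together they only fix 6 + 3 + 1 = 10 values, not 11.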

Hi Prof Ron,
Thanks for simplifying the understanding of df. What is the minimum total df needed to have confidence in a P or F test? That is, if someone shows a skeleton ANOVA that shows a total df (n-1) = <20

This depends on the theoretical effect size tested with the ANOVA. The number of observations (or dfs) needed to detect an effect can be computed with power analysis. I would not trust between subjects ANOVAs that show an effect with small sample sizes (like N = 20) if the theoretical effect is supposed to be small (even though the observed effect has to be large in order to be significant with such a small sample size). Hope this helps.

Fractional degrees of freedom are usually a result of a correction, for instance when correcting for non-sphericity using Greenhouse-Geisser, or when using Welch’s t-test instead of Student’s t-test. The interpretation of the degrees of freedom is not straightforward then.

A z-test is based on the z distribution, which in contrast to a t-distribution or F-distribution takes no degrees of freedom as parameters. The shape of the distribution is always the same: it’s a standard normal distribution. The shape of the t and F distributions changes as the parameters (the degrees of freedom) change. You can also see this in the critical value for significance: |z| > 1.96 is always significant (two-sided, with alpha = 0.05), but whether a t-value is significant depends on the degrees of freedom (the higher the degrees of freedom, the lower t may be to be called significant). It’s just the way the distributions are defined (and turn out to be useful).