The coefficient of determination is an important quantity obtained from regression analysis. In this lesson, we will show how this quantity is derived from linear regression analysis, and subsequently demonstrate how to compute it in an example.

Pizza!

Do you have a favorite pizza place? Let's just suppose you want to find out how additional pizza toppings affect the total cost of a pizza across all the different pizza places in your city. To do this you pick up the phone and start calling all the different pizza places, writing down the total cost of the pizza with one, two, three, etc., toppings on it at each place.

Once you are done, you will need to fit your data with an equation and, just as importantly, find out if your mathematical model for the data is a good fit.

Coefficient of Determination Derived

In this lesson, we will talk about a statistical construct that is used to estimate the predictive power of your model. The coefficient of determination, denoted as big R² or little r², is a quantity that indicates how well a statistical model fits a data set. In mathematical terms, it specifies what proportion of the variation in the dependent variable y is explained by variation in the independent variable x.

You may be wondering what r is, since we only defined r². You can think of the correlation coefficient, denoted as big R or little r, as a measure of the strength of the linear relationship between x and y. As the focus of this lesson is the coefficient of determination, just remember that r stands for the correlation coefficient, simple as that.

Okay, let's do a simple derivation of the coefficient of determination. In the image, you see we start with a plot containing a set of points, x and y, in which we assume there is a linear relationship between the x and y variables. Note that this linearity assumption is made to simplify the derivation and that a similar process can be used for non-linear models.

Shown is a plot with three sample points. We now try to find the regression line, which is a line of best fit for the data points. The line in green shows one attempted line of best fit.

We can describe this line with the equation y = mx + b, which is the standard equation for a line. To calculate the sum of the squared errors between each data point and our line of best fit, we perform the following computation:
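
Written out in standard notation, this sum can be sketched as follows. The labels (x_i, y_i) for the n data points are an assumption introduced here for illustration; m and b are the slope and intercept of the line from the text.

$$
\mathrm{SSE}_{\text{reg line}} = \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2
$$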

In this equation, the term SSE_{reg line} stands for the sum of squared errors from the regression line.

Our next step is to find out how the y value of each data point differs from the mean y value of all the data points. In particular we need to compute the sum of the squares of these differences to the right of the equals sign, as shown below.
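
In the same notation, this second sum can be sketched as shown here. The symbol ȳ for the mean of the y values and the index i over the n data points are assumptions introduced for illustration.

$$
\mathrm{SSE}_{\text{mean } y \text{ line}} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2,
\qquad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
$$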

The term SSE_{mean y line} stands for the sum of squared errors from the mean y value.

We now have everything we need to compute the coefficient of determination, as you can see below.
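
The standard definition, assumed here to be the expression the lesson refers to, combines the two sums of squared errors as a ratio:

$$
r^2 = 1 - \frac{\mathrm{SSE}_{\text{reg line}}}{\mathrm{SSE}_{\text{mean } y \text{ line}}}
$$

Under this definition, a line that passes through every data point has zero error from the regression line, so r² = 1. A least-squares line that does no better than the horizontal line at the mean y value gives r² = 0.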

Coefficient of Determination Computed

Let's do an example together to solidify everything I just covered, as it's probably a bit confusing. Suppose we are given the data set you see in this table.

x      y
70     3
82     10
88     12
93     16
105    21
115    45

How do we calculate the coefficient of determination in this case?

We can start by calculating the correlation coefficient using the following formula:
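
One standard computational form of the correlation coefficient is shown here. It is assumed to be the intended formula because it uses exactly the sums of x, x², y, y², and xy tabulated in this example, with n as the sample size.

$$
r = \frac{n \sum xy - \sum x \sum y}
         {\sqrt{\left( n \sum x^2 - \left( \sum x \right)^2 \right)
                \left( n \sum y^2 - \left( \sum y \right)^2 \right)}}
$$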

Here is a data table with the calculated values, where n is the sample size of 6.

n = 6

        x      x²       y      y²      xy
        70     4900     3      9       210
        82     6724     10     100     820
        88     7744     12     144     1056
        93     8649     16     256     1488
        105    11025    21     441     2205
        115    13225    45     2025    5175
sums:   553    52267    107    2975    10954

Plugging these values into the equation for little r that I just gave you, we get r = 0.92782. To compute the coefficient of determination, all we need to do is square r. Doing so, we arrive at r² = 0.8609. You can now see a visual representation of all of this.
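
If you would like to check the arithmetic, here is a minimal Python sketch of the computation. It assumes the standard sum-based formula for the correlation coefficient and, as a cross-check, the sum-of-squared-errors definition with a least-squares line y = mx + b. The variable names are illustrative and not part of the lesson.

```python
import math

# Data set from the lesson's table.
xs = [70, 82, 88, 93, 105, 115]
ys = [3, 10, 12, 16, 21, 45]
n = len(xs)

# Column sums, matching the "sums" row of the table.
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Correlation coefficient r from the sums (assumed standard formula).
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
r_squared = r ** 2

# Cross-check: fit the least-squares line y = m*x + b, then compare the
# squared errors from that line with the squared errors from the mean y.
m = numerator / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n
mean_y = sum_y / n
sse_reg = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
sse_mean = sum((y - mean_y) ** 2 for y in ys)
r_squared_from_sse = 1 - sse_reg / sse_mean

print(f"r            = {r:.5f}")
print(f"r^2          = {r_squared:.4f}")
print(f"r^2 from SSE = {r_squared_from_sse:.4f}")
```

Both routes should agree with the values quoted in the lesson, r = 0.92782 and r² = 0.8609, which is a useful sanity check that squaring r and comparing the two sums of squared errors describe the same quantity for a straight-line fit.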

Now try rewinding back to the data set and solving for r and r2 by yourself, just for fun and practice.

Lesson Summary

Since we did cover quite a bit, I think it's time we recap everything, no? In this lesson, we have learned about the coefficient of determination in the context of linear regression analysis. This quantity, designated as big R² or little r², indicates how well a statistical model fits a data set.

In addition, recall that the correlation coefficient, denoted as R or r, is a measure of the statistical relationship between x and y. To derive the coefficient of determination, we started with a simple data set and drew a line of best fit, then looked at the errors between the regression line and each data point, as well as the differences between the y coordinate of each point and the mean y value. From these two sums of squared errors, we obtained an expression for the coefficient of determination. Finally, we worked through an example, computing the coefficient of determination by first calculating the correlation coefficient and then squaring it.
