ANOVA

About this lesson

The Analysis of Variance (ANOVA) test is commonly used in Lean Six Sigma projects. It compares multiple data sets to determine whether there is a statistically significant difference between them. The analysis can be done easily in both Excel and Minitab. This lesson addresses the basics of ANOVA.

Quick reference

ANOVA

ANOVA is a hypothesis test for comparing the means across multiple samples to determine if they are statistically equivalent.

When to use

The ANOVA tool is widely used in Lean Six Sigma. It is the tool that is used in Gage R&R studies and with Design of Experiments. However, with respect to hypothesis testing, ANOVA is used to test for the equivalence of means across multiple samples when either the X or Y is discrete and the other is continuous.

Instructions

ANOVA stands for ANalysis Of VAriance. It tests the means of multiple samples to determine their equivalence. Unfortunately, when the P Value is low and the Null hypothesis is rejected, the ANOVA does not specifically identify which sample was different. A further study of the data, or in the case of Minitab, the Boxplots, is needed to determine which sample is different.
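
As a rough illustration of what the test computes, the following sketch works out the one-way ANOVA F statistic by hand. It is written in Python, which is an assumption since the lesson itself uses only Excel and Minitab. The function name one_way_anova_f and the sample values are made up for illustration.

```python
from statistics import mean

def one_way_anova_f(samples: list[list[float]]) -> tuple[float, int, int]:
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    all_values = [x for s in samples for x in s]
    grand_mean = mean(all_values)
    # Between-sample sum of squares: how far each sample mean sits from the grand mean
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-sample sum of squares: scatter of each value around its own sample mean
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: three samples of three readings each
f_stat, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(f_stat, df_b, df_w)
```

A large F means the sample means are spread out more than the scatter inside each sample would explain. Converting F into a P Value requires the F distribution with these two degrees of freedom, which is the step Excel and Minitab perform for you.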

The ANOVA function performs the same analysis as a Two-sample T Test. When there are only two samples, either hypothesis test can be used. However, when there are more than two samples, the ANOVA should be used. Multiple T Tests could be performed with every combination of samples, but each of those would be susceptible to a Type I Error. When doing multiple tests, the errors begin to compound.
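
The compounding can be made concrete with a short calculation. The sketch below is in Python (an assumption; the lesson itself uses only Excel and Minitab) and treats the pairwise tests as independent, which is a simplification. The function name familywise_error is illustrative.

```python
from math import comb

def familywise_error(num_samples: int, alpha: float = 0.05) -> tuple[int, float]:
    """Return (number of pairwise tests, chance of at least one Type I error)."""
    # Every pair of samples needs its own T Test
    num_tests = comb(num_samples, 2)
    # Assuming independent tests, P(no false positive anywhere) = (1 - alpha) ** num_tests
    return num_tests, 1 - (1 - alpha) ** num_tests

tests, error = familywise_error(4)
print(tests, round(error, 3))  # 6 pairwise tests, about a 26.5% chance of a false positive
```

With four samples and the usual 0.05 risk per test, six pairwise tests are needed, and the chance of at least one false positive rises from 5% to roughly 26.5%.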

Excel and Minitab can both calculate an ANOVA for one or two Y variables. Minitab can also calculate an ANOVA with more than two variables.

Excel – single Y variable

Data Analysis > ANOVA Single Factor

Enter the data range. The data must be in adjacent columns, and each column is one sample set of data.

Excel – two Y variables

Data Analysis > ANOVA Two Factor without Replication

Enter the data range. The data must be in adjacent columns, and each column is one sample set of data.
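
To show what a two-factor analysis without replication works out, here is a minimal sketch in Python (an assumption; it is not part of the Excel tool). It treats the rows as one factor and the columns as the other, with exactly one reading per cell. The function name two_factor_f and the table values are hypothetical.

```python
from statistics import mean

def two_factor_f(table: list[list[float]]) -> tuple[float, float]:
    """Return (F for rows, F for columns) for a two-factor ANOVA without replication."""
    n_rows, n_cols = len(table), len(table[0])
    grand = mean([x for row in table for x in row])
    row_means = [mean(row) for row in table]
    col_means = [mean(col) for col in zip(*table)]
    # Variation explained by each factor
    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    # Whatever the two factors do not explain is treated as error
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    df_rows, df_cols = n_rows - 1, n_cols - 1
    ms_error = ss_error / (df_rows * df_cols)
    return (ss_rows / df_rows) / ms_error, (ss_cols / df_cols) / ms_error

# Hypothetical 3 x 2 table: three levels of the row factor, two of the column factor
f_rows, f_cols = two_factor_f([[1.0, 2.0], [3.0, 5.0], [4.0, 9.0]])
print(f_rows, f_cols)
```

Each F value is then converted to its own P Value, which is why the analysis reports one result for the rows and one for the columns.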

Minitab – single Y variable

Stat > ANOVA > One Way

Select the format of your data and then the data columns

With the Options button you can change the relationship and the assumption of equal variances (based upon the results of Bartlett's test).

With the Graphs button you can select the graph of your choice to visualize the comparison of the mean values.

Minitab – multiple Y variables

Stat > ANOVA > General Linear Model > Fit General Linear Model

Select your Y Response variables

Select your X Factor variables

With the Model button, interaction between factors can be added as another variable.

Hints & tips

If your analysis indicates you should reject the Null hypothesis, rerun the analysis after dropping the data column that is the farthest from the other mean values.
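
One way to pick the column to drop is sketched below in Python (an assumption; the lesson does this by eye from the graphs). It reads "farthest from the other mean values" as the sample whose mean is farthest from the average of the remaining sample means, which is only one possible interpretation. The function name farthest_sample and the data are hypothetical.

```python
from statistics import mean

def farthest_sample(samples: dict[str, list[float]]) -> str:
    """Return the name of the sample whose mean is farthest from the other sample means."""
    means = {name: mean(values) for name, values in samples.items()}

    def gap(name: str) -> float:
        # Distance between this sample's mean and the average of all the other means
        others = [m for other, m in means.items() if other != name]
        return abs(means[name] - mean(others))

    return max(means, key=gap)

# Hypothetical data columns
data = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.0], "C": [5.0, 6.0, 7.0]}
dropped = farthest_sample(data)
remaining = {name: values for name, values in data.items() if name != dropped}
print(dropped, list(remaining))
```

The remaining columns can then be run through the ANOVA again to see whether the Null hypothesis is still rejected.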

ANOVA is rather forgiving on the Normality assumption.

00:05Hello, I'm Ray Sheen.

00:06We're now going to look at the ANOVA test.

00:08This test is strongly linked to Lean Six Sigma because it is used in so

00:13many of the advanced Lean Six Sigma techniques.

00:16Again, we start with the hypothesis testing decision tree.

00:20We have normal data, discrete and continuous X and Y, and

00:24multiple variables, and that gets us to the ANOVA.

00:28So let's look at the one way ANOVA.

00:30ANOVA stands for Analysis of Variance.

00:33We compare the means from each of the samples in the analysis to determine if

00:37one of the means is statistically different from the others.

00:41In that regard, it is similar to the 2-Sample T-Test, but

00:44instead of just two samples, it can have many samples.

00:48It is often used to identify significant subsets of the data in large populations.

00:53That is the feature that is used in both the Gage R&R analysis, and

00:57the design of experiments analysis.

00:59It can sort through the many different data points based upon the categories

01:03being used

01:03and determine which, if any, provide a result that is statistically different

01:07from the others.

01:09You may be wondering why we would use ANOVA when we could do the same thing with

01:13T-Tests.

01:15Well, the problem is that you can't get the same fidelity in the answer

01:18by using multiple T-Tests.

01:20Keep in mind that with our normal Lean Six Sigma confidence level of 0.95,

01:25there is still a 5% chance of a Type I error, or false positive.

01:29Now if we have four samples, we would need to conduct six tests.

01:33So the probability of a Type I error is compounded.

01:37That means that our chance of making a Type I error is now 26.5%.

01:42And keep in mind that the Type II error probability is often even higher than

01:46the Type I error.

01:47So with multiple tests, the chance of a Type II error is often higher still.

01:51The more T-Tests we run, the more likely we are to make a wrong decision,

01:55either Type I or Type II.

01:58However, with ANOVA it is only one test that needs to be run, and

02:02that will check all those relationships between the sample means.

02:06In addition to that,

02:07ANOVA is somewhat forgiving on the assumption of normal data.

02:11It will tolerate a low level of non-normal behavior and

02:14still provide excellent results.

02:16So let's look at how we do this test.

02:19In Excel, select the Data Analysis menu from the Data ribbon, then

02:23select ANOVA Single Factor.

02:25Indicate whether the data is grouped by rows or columns and provide the range for

02:30your data table.

02:31The rows or columns must be next to each other and

02:34there cannot be any blank rows or columns in the data.

02:37Excel will calculate a P-value.

02:39In this example, the P-value of 0.022 is less than our 0.05,

02:44so we reject the null hypothesis.

02:47The mean values are statistically different.

02:49Minitab will go one step further in testing using ANOVA.

02:53Start in a similar manner, go to the Stat pull down menu and select ANOVA,

02:58then select One Way.

02:59Select the format of your data columns like you did on other tests and

03:03then select the data column for analysis.

03:06Then go to the Options panel to select equal variances if you have that

03:10condition.

03:11Also, you can go to the Graphs panel to select the type of graphs you want.

03:16I recommend the interval plot under data plots and the three in one residual plot.

03:20Minitab will provide both plots and

03:23a summary of the analysis in the session window.

03:26As you can see,

03:27one of the items it provides is a P-value which is still 0.022.

03:32Let's take a minute to look at the graphs that we get from Minitab.

03:36The Minitab and Excel P-value will tell us if there's a statistically

03:40significant difference in the means.

03:42But they don't tell you which sample was the problem.

03:45That is where the graphs add value.

03:47A quick glance at the graphs will normally reveal the difference.

03:51As I mentioned on the last slide, select the Graphs button,

03:54then select both the data graph and residual graphs.

03:58Personally, I like the interval plot, but if a different view works better for you,

04:02then use it.

04:03You can select individual residual plots or do the three in one or

04:07four in one option, depending on the type of test.

04:10So let's look at the interval plot, we have four different categories of data and

04:14the mean was definitely changing between the different categories.

04:18Time 1 and time 4 don't even overlap at the edges of their confidence interval.

04:23So they are clearly not from the same population.

04:26We'll probably need to look at other contextual information to understand

04:30possible reasons for

04:31these differences, that could include looking at the residual plot.

04:35The normality line does not look normal and

04:38we would expect that since there are different populations in the data.

04:41This is a typical reason for non-normality and

04:45we see that the histogram is heavily skewed.

04:47Again, an indication that the data is not normal.

04:50It's interesting to see that the versus fits plot shows that the range of

04:55residuals was similar, except for that one outlier in the sample on the far right.

05:00That point is up at the upper right hand corner of the plot.

05:03But since the residuals versus fit is similar in other respects,

05:07there is probably not a time-based pattern at work.

05:10But rather, some other attribute that is not captured in the data that is affecting

05:15each of these samples.

05:16So far, we've been trying to evaluate one factor with multiple samples.

05:21However, we can also have multiple factors when working with ANOVA.

05:25Both Excel and Minitab can run an ANOVA with two response factors and

05:29in fact Minitab can work with even more than two.

05:32In Excel, select the Data Analysis menu on the data ribbon and

05:36then select ANOVA Two Factor without Replication.

05:40We'll talk about replication when we do the class on the design of experiments.

05:45Now enter the data range in the same way you would with a one-way ANOVA.

05:48The Excel results will provide a P-value for both column analysis and row analysis.