Each person was given a sample of the four waters at the beginning of the test, and told which one was which, so they knew how each water tasted. At any time during the test, they were allowed to go back to the samples and re-taste them.

After tasting each sample, they were given 12 unmarked cups of water and asked to identify each one by its taste and smell. Each of the four water brands was provided three times in the study (12 cups total, see image below).

[Image: the 12 unmarked cups used in the test, numbered 1–12 and arranged in two rows.]

The correct answer for each cup, along with the answer given by each of the three testers, is displayed below in Table 1.

Table 1. Correct and Chosen Answers for Water Test

| Cup # | Actual | Tester #1 | Tester #2 | Tester #3 | % Correct |
|-------|--------|-----------|-----------|-----------|-----------|
| 1 | Generic | Generic | Tap | Fiji | 33% |
| 2 | Tap | Zephyrhills | Generic | Tap | 33% |
| 3 | Fiji | Fiji | Fiji | Generic | 67% |
| 4 | Zephyrhills | Fiji | Fiji | Generic | 0% |
| 5 | Fiji | Tap | Tap | Zephyrhills | 0% |
| 6 | Tap | Zephyrhills | Zephyrhills | Tap | 33% |
| 7 | Generic | Fiji | Fiji | Zephyrhills | 0% |
| 8 | Zephyrhills | Tap | Generic | Fiji | 0% |
| 9 | Tap | Tap | Tap | Zephyrhills | 67% |
| 10 | Generic | Generic | Generic | Generic | 100% |
| 11 | Fiji | Generic | Zephyrhills | Zephyrhills | 0% |
| 12 | Zephyrhills | Fiji | Fiji | Zephyrhills | 33% |
| Overall | | 33% (4) | 25% (3) | 33% (4) | 8% (1) |

Having each brand show up more than once allows us to test how repeatable each tester is. In other words, if a tester correctly chooses the Fiji water the first time but chooses it incorrectly the other two times, the first selection was likely a lucky guess rather than strong evidence that the tester could differentiate between the waters.
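These counts can be checked directly from the data. A minimal Python sketch (the tuples below are transcribed from Table 1) tallies each tester's correct picks and the cups where all three testers matched the actual brand:

```python
# (actual, tester1, tester2, tester3) for cups 1-12, transcribed from Table 1
results = [
    ("Generic", "Generic", "Tap", "Fiji"),
    ("Tap", "Zephyrhills", "Generic", "Tap"),
    ("Fiji", "Fiji", "Fiji", "Generic"),
    ("Zephyrhills", "Fiji", "Fiji", "Generic"),
    ("Fiji", "Tap", "Tap", "Zephyrhills"),
    ("Tap", "Zephyrhills", "Zephyrhills", "Tap"),
    ("Generic", "Fiji", "Fiji", "Zephyrhills"),
    ("Zephyrhills", "Tap", "Generic", "Fiji"),
    ("Tap", "Tap", "Tap", "Zephyrhills"),
    ("Generic", "Generic", "Generic", "Generic"),
    ("Fiji", "Generic", "Zephyrhills", "Zephyrhills"),
    ("Zephyrhills", "Fiji", "Fiji", "Zephyrhills"),
]

# Tally each tester's correct picks against the actual brand
for t in (1, 2, 3):
    correct = sum(1 for row in results if row[t] == row[0])
    print(f"Tester #{t}: {correct}/12 correct")

# Cups where all three testers picked the actual brand
all_three = sum(1 for row in results if row[1:] == (row[0],) * 3)
print(f"All three correct: {all_three}/12")
```

Only one cup out of twelve (cup 10, the generic brand) was identified correctly by all three testers.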

In order to apply statistical analysis to this experiment, we used Minitab’s Attribute Agreement Analysis test. For those of you not familiar with this technique, it is a method for determining how consistently different appraisers classify items into categories, and how well those classifications agree with a known standard (the correct answer).

Here is the Minitab analysis of the results, summarized to highlight the key points:

[Minitab output: Attribute Agreement Analysis, appraiser-vs-standard agreement and kappa statistics. * NOTE * Single trial within each appraiser. No percentage of assessment agreement within appraiser is plotted.]

To summarize the analysis above, the numbers in bold are the kappa values. A kappa value greater than 0.7 is generally considered acceptable, meaning that the testers can adequately distinguish that brand from the rest. As you can see, no brand has a kappa value greater than 0.7, so with an overall kappa value of 0.067, we conclude that the testers were not able to tell the brands of water apart. In fact, since some of the values were close to zero, the testers did no better than random chance; they could have guessed without tasting the water at all. The brands highlighted in red actually had kappa values below zero, meaning the testers did worse than random chance on those brands; they would have done better by simply guessing.

Bottom line: stop buying bottled water. Reuse your water bottles by filling them with filtered tap water (though reusing disposable bottles is not recommended for long-term use). Not only will this help your own pocketbook, but you’ll help the environment by preventing the creation of new bottles and reducing the transportation costs associated with getting the bottles to your local store.
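Minitab’s multi-category kappa calculation has more moving parts, but the core idea can be illustrated with a simple appraiser-vs-standard Cohen’s kappa on the Table 1 data. This is a minimal sketch for intuition, not Minitab’s implementation:

```python
from collections import Counter

# Transcribed from Table 1: the actual brand and each tester's picks, cups 1-12
G, T, F, Z = "Generic", "Tap", "Fiji", "Zephyrhills"
standard = [G, T, F, Z, F, T, G, Z, T, G, F, Z]
testers = {
    1: [G, Z, F, F, T, Z, F, T, T, G, G, F],
    2: [T, G, F, F, T, Z, F, G, T, G, Z, F],
    3: [F, T, G, G, Z, T, Z, F, Z, G, Z, Z],
}

def cohens_kappa(truth, picks):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(truth)
    p_o = sum(s == r for s, r in zip(truth, picks)) / n
    f_truth, f_picks = Counter(truth), Counter(picks)
    # Chance agreement: for each brand, P(standard is it) * P(rater picks it)
    p_e = sum(f_truth[b] / n * f_picks[b] / n for b in f_truth)
    return (p_o - p_e) / (1 - p_e)

for t, picks in testers.items():
    print(f"Tester #{t}: kappa = {cohens_kappa(standard, picks):.3f}")
```

Because the standard is balanced (each brand appears 3 times out of 12), chance agreement is 25%, so a tester who gets 4 of 12 cups right earns a kappa of only about 0.11, far below the 0.7 threshold.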

Conclusion: So how is this study applicable to your company? Most processes collect some kind of data, and typically codes are assigned to designate the type of transaction, the type of defect, or some other attribute. Without validating people’s ability to classify items into the right buckets, there is a possibility that the codes are being used incorrectly, leaving people misinformed about what is really going on in the process.

Let’s say you are collecting data on reasons for late payments from your customers. You generate a report that shows the Top 5 reasons for late payments.

| Reason | Percentage |
|--------|------------|
| Missing Paperwork | 33% |
| Problem with Service Provided | 25% |
| No Reason Provided by Customer | 18% |
| Wrong Information on Invoice | 13% |
| Wrong Amount on Invoice | 5% |

Naturally, you would start working on the “Missing Paperwork” category, but that assumes you have a good measurement system that is correctly sorting these late payments into the right defect code. The only way to know is by performing an Attribute Agreement Analysis. If it does not pass (poor kappa values), then you must conclude that the defect codes are not accurate, and they must be further clarified in order to get a “true” picture of which issue to focus on.
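As a quick illustration of such a check, the sketch below computes between-appraiser Cohen’s kappa for two clerks coding the same late payments. The clerks and their codings are entirely made up for this example; a real study would use Minitab’s Attribute Agreement Analysis on your own data:

```python
from collections import Counter

# Hypothetical data: two clerks independently code the same 8 late payments
clerk_a = ["Missing Paperwork", "Wrong Information on Invoice", "Missing Paperwork",
           "Problem with Service Provided", "Missing Paperwork",
           "No Reason Provided by Customer", "Wrong Information on Invoice",
           "Missing Paperwork"]
clerk_b = ["Missing Paperwork", "Missing Paperwork", "Missing Paperwork",
           "Problem with Service Provided", "Wrong Information on Invoice",
           "No Reason Provided by Customer", "Wrong Information on Invoice",
           "Missing Paperwork"]

n = len(clerk_a)
p_o = sum(a == b for a, b in zip(clerk_a, clerk_b)) / n   # observed agreement
f_a, f_b = Counter(clerk_a), Counter(clerk_b)
p_e = sum(f_a[c] / n * f_b[c] / n for c in f_a)           # chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(f"Observed agreement: {p_o:.0%}, kappa = {kappa:.2f}")
```

Here the clerks agree 75% of the time, but after correcting for chance the kappa is only about 0.62, short of the 0.7 benchmark, which would trigger a clarification of the coding criteria.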

Let’s assume that your coding criteria are clarified for your people, and the data is cleaned up using these criteria. Now let’s look at the Top 5 issues again:

| Reason | Percentage |
|--------|------------|
| Wrong Information on Invoice | 42% |
| Missing Paperwork | 23% |
| Problem with Service Provided | 15% |
| No Reason Provided by Customer | 12% |
| Wrong Amount on Invoice | 5% |

As you can see, the order of reasons changed after the criteria were improved, so you can now go out and investigate why there is “Wrong Information on Invoice” instead of chasing the previous, misleading top problem of “Missing Paperwork.”

Attribute Agreement Analysis allows you to have confidence that your attribute (coding, pass/fail) data is accurate, so that you can make good decisions and prioritize your efforts in the right direction.
