Interested in using digital technologies for teaching and learning - you've come to the right place!

Sunday, 20 September 2015

Understanding the Effects of Coding on Mean and Standard Deviation

Calculating summary stats from coded data has the potential to be one of the driest topics in statistics - we're talking Weetabix without milk. I referred to my textbook for some inspiration:

"Coding is useful as it simplifies the arithmetic of calculating the mean"

Well that's sold it! Please, we live in the 21st century, we have calculators.

The main purpose of teaching coding data in S1 should be to develop a deeper understanding of measures of central tendency and spread.

Are the mean and standard deviation of a data set sensitive to the units of the data?

If I have two sets of data that have been recorded using different datums, how can I expect this to affect the summary stats?

These are the sort of questions that I want my students to be able to answer - they're not really about churning through calculations; they're about understanding. In addition, I want this lesson to lay the groundwork for seeing how and why a data set can be transformed and modelled using a standard normal distribution, which is an important later lesson.

I made this sketch to demonstrate what is happening when a data set is transformed/coded.

The $x_i$ data points can be moved around. Just to recap the concepts of mean and standard deviation, try moving a point or two and asking questions such as: How will increasing the value of $x_4$ affect the mean? What about the standard deviation?
You can then start to ask what will happen if the parameters $a$ or $b$ are changed so that the data set is coded, and then use the sketch to test any predictions by checking the $y_i$, $\bar{y}$ and $\sigma_y$ boxes. Although I haven't included it on the sketch, don't forget to bring in the variance: it is the square of the standard deviation, so whatever factor scales the standard deviation is squared for the variance.
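If you want students to check their predictions numerically as well as visually, something like the short Python sketch below could work. It is only an illustration: I'm assuming the coding has the linear form $y_i = a x_i + b$ (your textbook may write the coding differently, for example as $(x - b)/a$), and the data values, the values of `a` and `b`, and the helper name `code_data` are all made up for the example.

```python
from statistics import fmean, pstdev, pvariance


def code_data(xs, a, b):
    """Apply the ASSUMED linear coding y = a*x + b to every data point."""
    return [a * x + b for x in xs]


# Made-up data and parameters, purely for illustration.
xs = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
a, b = 3.0, 1.0

ys = code_data(xs, a, b)

# The mean is both scaled and shifted.
print("mean of y:", fmean(ys), " a*mean(x) + b:", a * fmean(xs) + b)

# The standard deviation is scaled by |a| and ignores the shift b.
print("sd of y:  ", pstdev(ys), " |a|*sd(x):    ", abs(a) * pstdev(xs))

# The variance is scaled by a squared.
print("var of y: ", pvariance(ys), " a^2*var(x):   ", a**2 * pvariance(xs))
```

Each printed pair should agree (up to rounding). I've used the population versions `pstdev` and `pvariance` here; the same scaling holds if you swap in the sample versions.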

Some nice additional questions to ask are: Will scaling the data always affect the mean? Can you give me an example of a set of data points where changing the parameter $a$ will not affect the mean? Does the data set have to be symmetrical for this to be the case?
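For anyone who wants an example to hand for these questions, here is one possible sketch in Python. Again I'm assuming the coding is $y_i = a x_i + b$, so that $\bar{y} = a\bar{x} + b$; under that assumption, changing $a$ leaves the mean alone exactly when $\bar{x} = 0$. The data values and the helper name `coded_mean` are invented for the illustration.

```python
from statistics import fmean


def coded_mean(xs, a, b):
    """Mean of the coded data under the ASSUMED coding y = a*x + b."""
    return fmean([a * x + b for x in xs])


# Made-up data: not symmetrical, but the mean is 0.
zero_mean_xs = [-3.0, 1.0, 2.0]
# Made-up data with a non-zero mean, for contrast.
other_xs = [1.0, 2.0, 6.0]

b = 4.0
for a in (0.5, 2.0, -7.0):
    print("a =", a,
          "| zero-mean data:", coded_mean(zero_mean_xs, a, b),
          "| other data:", coded_mean(other_xs, a, b))
```

With the zero-mean data the coded mean stays at `b` whatever the value of `a`, even though the points are not symmetrical about zero; with the other data set it moves every time `a` changes.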
I'm not saying this sketch is going to set your coding lesson on fire, but some students find coding quite confusing and I have found that this visual certainly helps. Give it a go!