Simulation Based Inference (SBI) With Extra-Occupational Students – Towards Data Science in Teaching

This post is based on joint work with Oliver Gansser, Matthias Gehrke, Bianca Krol, and Norman Markgraf.

The FOM is a private university of applied sciences in Germany for people who study while working. We offer several bachelor's and master's programs, mainly economics-related, in 29 study centers across Germany. The size of the courses with statistical content varies from 15 to 150 students, or even more.

We used a relaunch of our BA degree in Summer 2016 to rethink and rebuild our curriculum in the different introductory statistics courses.

Inspired by Diez, Barr, & Çetinkaya-Rundel (2014), Ismay & Kim (2018) and Pruim, Horton, & Kaplan (2015), we introduced simulation-based inference with the help of the R package mosaic. As our program is designed for extra-occupational students, we stress the application of statistical methods in the students' professional lives. We chose the “mosaic approach” by Pruim, Horton and Kaplan because it offers a simple and coherent approach to data analysis that is close to literate programming, in the spirit of Donald Knuth:

Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

By the way: our introductory example is a triangle test, performed in a pub. This setting seems to be quite engaging … Our didactic approach is built on interaction and activation; for example, almost every fourth slide is a quiz or an exercise.
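To illustrate how a triangle test lends itself to simulation-based inference, here is a minimal sketch. It is written in Python for illustration only and is not the mosaic code used in our courses. In a triangle test a taster receives three samples, two of them identical, and has to pick the odd one out, so pure guessing succeeds with probability 1/3. The number of trials and the observed number of correct answers below are made-up values, and the function names are hypothetical.

```python
import random


def simulate_null_counts(n_trials, n_sims, p_guess=1 / 3, seed=42):
    """Simulate the number of correct answers when every taster merely guesses."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_guess for _ in range(n_trials))
        for _ in range(n_sims)
    ]


def simulated_p_value(observed_correct, null_counts):
    """Share of simulated experiments with at least as many correct answers."""
    as_extreme = sum(count >= observed_correct for count in null_counts)
    return as_extreme / len(null_counts)


# Hypothetical data: 30 triangle tests, 16 of them answered correctly.
N_TRIALS = 30
OBSERVED_CORRECT = 16

null_counts = simulate_null_counts(N_TRIALS, n_sims=10_000)
p_value = simulated_p_value(OBSERVED_CORRECT, null_counts)
print(f"Simulated one-sided p-value: {p_value:.4f}")
```

The simulated counts play the role of the null distribution: students can plot them, mark the observed value, and read off the p-value as a proportion, without any formula for the binomial distribution.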

As already pointed out, we run different courses at both Bachelor and Master levels in numerous study centers. Therefore we applied a modular lecture slide concept (for the different curricula in the different majors) via so-called child chunks in RMarkdown, jointly developed in a GitHub repository. In our pretest we, and our students, were quite excited to find that the concepts of statistics became more accessible and could be applied to real-life problems.

But how can we convince our heterogeneous group of colleagues of SBI (and R)? In February 2018 we organized a central workshop, which was also streamed online, where we introduced the concepts and teaching materials. There we presented all the arguments, such as those of Hesterberg (2014) or Chance, Wong & Tintle (2016), and the results of our own pre-tests. It was nice to see that the concept was supported both by those who focus on statistical theory and by those who focus on (computational) application, but I think the final icebreaker was the song “The Bootstrap Begins” by Giles Hooker, available on CAUSEweb.org. With all these available resources we were starting to fly from a well-feathered nest. A big thank-you to all of you! One remaining problem for our students is that, to the best of my knowledge, there is so far no German textbook on SBI.

Of course there is still much room for improvement and evaluation, and we are still learning about our students' misconceptions (for example, the belief that bootstrapping is used to obtain a normal distribution), but we think we are doing much better than the old formulary- and pocket-calculator-centered statistics lectures.
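The bootstrap misconception mentioned above can be made concrete with a small sketch. Again this is Python for illustration only, with made-up data, and not our course material. Bootstrapping approximates the sampling distribution of a statistic by resampling the observed data with replacement, and nothing forces the result to be normal. For a sample of odd size, every bootstrap median is one of the original observations, so the bootstrap distribution of the median is a discrete distribution on at most as many values as there are observations. The sample, the seed and the function name are assumptions.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical right-skewed sample of odd size (25 exponential draws).
sample = [rng.expovariate(1.0) for _ in range(25)]


def bootstrap_medians(data, n_resamples, rng):
    """Medians of resamples drawn with replacement, each as large as the data."""
    n = len(data)
    return [
        statistics.median(rng.choices(data, k=n))
        for _ in range(n_resamples)
    ]


boot = bootstrap_medians(sample, n_resamples=5_000, rng=rng)

# Approximate 95% percentile interval taken from the sorted bootstrap medians.
sorted_boot = sorted(boot)
lower = sorted_boot[int(0.025 * len(sorted_boot))]
upper = sorted_boot[int(0.975 * len(sorted_boot)) - 1]

print(f"Sample median: {statistics.median(sample):.3f}")
print(f"Approximate 95% percentile interval: [{lower:.3f}, {upper:.3f}]")
print(f"Distinct bootstrap medians: {len(set(boot))} (sample size {len(sample)})")
```

Comparing the number of distinct bootstrap medians with the sample size shows students that the bootstrap describes the variability of an estimate; it does not turn the data into a normal distribution.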

With the (very) basic statistical computing and the formula interface of the R package mosaic, we are paving the way to topics and concepts more closely related to data science, like data wrangling or algorithmic modeling (Breiman, 2001). In times of Big Data and Artificial Intelligence we feel obliged to teach the very basic ideas of these rather new strains of thought, while not forgetting (or maybe even focusing more strongly on) the epistemological background and logical foundations of inference and probability.

We gratefully acknowledge that our work was supported by an internal teaching innovation grant by our institution.