Bayesian statistics made simple

Audience level:

Intermediate

Category:

Science

Description

An introduction to Bayesian statistics using Python. Bayesian statistics are usually presented mathematically, but many of the ideas are easier to understand computationally. People who know Python can get started quickly and use Bayesian analysis to solve real problems. This tutorial is based on material and case studies from Think Bayes (O’Reilly Media).

Abstract

Bayesian statistical methods are becoming more common and more important, but there are not many resources to help beginners get started. People who know Python can use their programming skills to get a head start.

I will present simple programs that demonstrate the concepts of Bayesian statistics and apply them to a range of example problems. Participants will work hands-on with the code and practice on exercises.
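As an illustration of the computational approach, a Bayesian update over a discrete set of hypotheses takes only a few lines of plain Python. This is a minimal sketch, not the framework from Think Bayes: the function name `update` and the two-bowl numbers are hypothetical, chosen for illustration. Each prior probability is multiplied by the likelihood of the data under that hypothesis, and the products are normalized so they sum to 1, which is Bayes's theorem applied to a finite set of hypotheses.

```python
def update(prior, likelihood):
    """Return the normalized posterior for a dict of hypotheses.

    prior: maps each hypothesis to its prior probability
    likelihood: maps each hypothesis to P(data | hypothesis)
    """
    # Multiply prior by likelihood for each hypothesis (unnormalized posterior)
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    # The total is P(data), the normalizing constant in Bayes's theorem
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}


# Hypothetical example: Bowl 1 holds 30 vanilla and 10 chocolate cookies,
# Bowl 2 holds 20 of each. We pick a bowl at random and draw a vanilla cookie.
prior = {"Bowl 1": 0.5, "Bowl 2": 0.5}
likelihood = {"Bowl 1": 30 / 40, "Bowl 2": 20 / 40}
posterior = update(prior, likelihood)
print(posterior)  # → {'Bowl 1': 0.6, 'Bowl 2': 0.4}
```

The same `update` works for any number of hypotheses, which is what makes the computational view scale from toy problems to real ones.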

Attendees should have at least basic Python and basic statistics. If you learned about Bayes’s theorem and probability distributions at some point, that’s enough, even if you don’t remember them!

Attendees should bring a laptop with Python and matplotlib. You can work in any environment; you just need to be able to download a Python program and run it. I will provide code to help attendees get set up ahead of time.

Statistical inference with computational methods

Audience level:

Intermediate

Category:

Science

Description

Statistical inference is a fundamental tool in science and engineering, but it is often poorly understood. This tutorial uses computational methods, including Monte Carlo simulation and resampling, to explore estimation, hypothesis testing and statistical modeling. Attendees will develop an understanding of statistical concepts and learn to use real data to answer relevant questions.
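As an example of estimation by resampling, a percentile bootstrap builds a confidence interval by repeatedly resampling the data with replacement and recomputing the statistic. This is a minimal sketch: the function `bootstrap_ci` and its parameters are hypothetical, and the data are synthetic values generated with NumPy, not real measurements.

```python
import numpy as np


def bootstrap_ci(data, stat=np.mean, iters=10_000, level=0.90, seed=None):
    """Percentile-bootstrap confidence interval for `stat`.

    `stat` must accept an `axis` argument, as np.mean and np.median do.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Each row is one resample, drawn with replacement, the same size as the data
    samples = rng.choice(data, size=(iters, len(data)), replace=True)
    # Compute the statistic for every resample at once
    stats = stat(samples, axis=1)
    # Cut off (1 - level) / 2 of the distribution in each tail
    tail = (1 - level) / 2 * 100
    low, high = np.percentile(stats, [tail, 100 - tail])
    return low, high


rng = np.random.default_rng(17)
sample = rng.normal(loc=170, scale=10, size=50)  # synthetic stand-in data
low, high = bootstrap_ci(sample, seed=1)
print(f"sample mean {sample.mean():.1f}, 90% CI ({low:.1f}, {high:.1f})")
```

The percentile method is the simplest bootstrap interval; it needs no formula for the sampling distribution, which is the point of the computational approach.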

Abstract

Do you know the difference between standard deviation and standard error? Do you know which statistical test to use on any occasion? Do you really know what a p-value is? How about a confidence interval?

Most students don’t really understand these concepts, even after taking several statistics classes. The problem is that these classes focus on mathematical methods that bury the concepts under a mountain of details.
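The first of those questions can be answered with a short simulation instead of a formula. A minimal sketch, assuming a normal population with a known standard deviation (the values of `sigma`, `n`, and `trials` are arbitrary): the standard deviation describes the spread of individual values within one sample, while the standard error describes how much the sample mean varies from one sample to the next, which for the mean is approximately sigma/sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(42)
sigma, n, trials = 10.0, 25, 10_000

# Draw `trials` independent samples of size n from a normal population
samples = rng.normal(loc=0.0, scale=sigma, size=(trials, n))

# Standard deviation: spread of the individual values in a single sample
sd_one_sample = samples[0].std(ddof=1)

# Standard error: spread of the sample mean across many repeated samples
means = samples.mean(axis=1)
se_simulated = means.std(ddof=1)
se_formula = sigma / np.sqrt(n)

print(f"SD of one sample:      {sd_one_sample:.2f}")
print(f"SE from simulation:    {se_simulated:.2f}")
print(f"SE from sigma/sqrt(n): {se_formula:.2f}")
```

The simulated standard error should land close to the formula value of 2.0, while the standard deviation of a single sample stays near 10 no matter how many samples are drawn.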

This tutorial uses Python to implement simple statistical experiments that develop deep understanding. Attendees will learn about resampling and related tools that use random simulation to perform statistical inference, including estimation and hypothesis testing. We will use pandas, which provides structures for data analysis, along with NumPy and SciPy.
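For hypothesis testing, a permutation test is one resampling tool of this kind: if the null hypothesis says two groups come from the same distribution, the group labels are arbitrary, so shuffling them shows how often chance alone produces a difference as large as the observed one. This is a minimal sketch using only NumPy; the function name `permutation_test` is hypothetical and the example data are synthetic.

```python
import numpy as np


def permutation_test(group1, group2, iters=10_000, seed=None):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = np.random.default_rng(seed)
    group1 = np.asarray(group1, dtype=float)
    group2 = np.asarray(group2, dtype=float)
    observed = abs(group1.mean() - group2.mean())

    # Pool the data; under the null hypothesis every split is equally likely
    pooled = np.concatenate([group1, group2])
    n = len(group1)
    count = 0
    for _ in range(iters):
        rng.shuffle(pooled)  # in place; `pooled` is a fresh copy
        diff = abs(pooled[:n].mean() - pooled[n:].mean())
        if diff >= observed:
            count += 1
    # Fraction of shuffles at least as extreme as the observed difference
    return count / iters


rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=30)  # synthetic group A
b = rng.normal(2.0, 1.0, size=30)  # synthetic group B, shifted by 2
p = permutation_test(a, b, seed=1)
print(f"estimated p-value: {p:.4f}")
```

An estimate of 0 does not mean the p-value is zero, only that it is smaller than about 1/iters.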

Suppose I capture and tag 10 rock hyraxes. Some time later, I capture another 10 hyraxes and find that two of them are already tagged. How many hyraxes are there in this environment?

This is an example of a mark and recapture experiment, which you can read about on Wikipedia. The Wikipedia page also includes the photo of a tagged hyrax shown above.
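For comparison, the classical (non-Bayesian) point estimate for this design is the Lincoln-Petersen estimator, N ≈ Kn/k, where K animals are tagged on the first visit, n are captured on the second, and k of those are already tagged. Chapman's bias-corrected variant is (K+1)(n+1)/(k+1) - 1. A minimal sketch with hypothetical function names:

```python
def lincoln_petersen(marked, caught, recaptured):
    """Classical estimate of population size: marked * caught / recaptured."""
    return marked * caught / recaptured


def chapman(marked, caught, recaptured):
    """Chapman's bias-corrected estimate; defined even when recaptured == 0."""
    return (marked + 1) * (caught + 1) / (recaptured + 1) - 1


# 10 tagged, 10 caught later, 2 of them already tagged
print(lincoln_petersen(10, 10, 2))   # → 50.0
print(round(chapman(10, 10, 2), 1))  # → 39.3
```

Both give a single number with no measure of uncertainty, which is one motivation for a Bayesian treatment.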

As always with problems like this, we have to make some modeling assumptions.

1) For simplicity, you can assume that the environment is reasonably isolated, so the number of hyraxes does not change between observations.

2) You can also assume that each hyrax is equally likely to be captured during each phase of the experiment, regardless of whether it has been tagged. In reality, it is possible that tagged animals would avoid traps in the future, or that the same behavior that got them caught the first time makes them more likely to be caught again. But let's start simple.
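Under these two assumptions, the number of tagged animals in the second capture follows a hypergeometric distribution, so one way to set up the problem is a grid of candidate population sizes. This is a minimal sketch written independently of the Think Bayes framework; the uniform prior and the upper bound of 1000 are arbitrary assumptions for illustration. The population must be at least 18 (10 tagged plus the 8 untagged animals seen in the second capture).

```python
from math import comb

K, n, k = 10, 10, 2  # tagged first time, caught second time, tagged among those


def likelihood(N):
    """Hypergeometric probability of k tagged among n caught, given N animals."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Assumed uniform prior over N = 18..1000 (upper bound chosen arbitrarily);
# with a uniform prior the posterior is just the normalized likelihood
hypos = range(K + n - k, 1001)
unnorm = {N: likelihood(N) for N in hypos}
total = sum(unnorm.values())
posterior = {N: p / total for N, p in unnorm.items()}

map_estimate = max(posterior, key=posterior.get)
posterior_mean = sum(N * p for N, p in posterior.items())
print(f"MAP: {map_estimate}, posterior mean: {posterior_mean:.1f}")
```

The likelihood peaks where N is about Kn/k, and 49 and 50 have exactly equal likelihood. For large N it falls off only about as fast as 1/N^2, so the posterior mean grows as the assumed upper bound grows; the choice of prior matters here.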

My solution to this problem uses the computational framework from my book, Think Bayes. The framework is described in this notebook. If you have read Think Bayes or attended one of my workshops, you might want to attempt this problem before you look at my solution.

If you solve this problem analytically, or use MCMC, and you want to share your solution, please let me know and I will post it here.