If you know how to program, you have the skills to turn data into knowledge using the tools of probability and statistics. This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python.

You'll work with a case study throughout the book to help you learn the entire data analysis process—from collecting data and generating statistics to identifying patterns and testing hypotheses. Along the way, you'll become familiar with distributions, the rules of probability, visualization, and many other tools and concepts.

Develop your understanding of probability and statistics by writing and testing code

Run experiments to test statistical behavior, such as generating samples from several distributions

Use simulations to understand concepts that are hard to grasp mathematically

Learn topics not usually covered in an introductory course, such as Bayesian estimation

Import data from almost any source using Python, rather than being limited to data that has been cleaned and formatted for statistics tools

Use statistical inference to answer questions about real-world data

Chapter 1 Statistical Thinking for Programmers

Do First Babies Arrive Late?

A Statistical Approach

The National Survey of Family Growth

Tables and Records

Significance

Glossary

Chapter 2 Descriptive Statistics

Means and Averages

Variance

Distributions

Representing Histograms

Plotting Histograms

Representing PMFs

Plotting PMFs

Outliers

Other Visualizations

Relative Risk

Conditional Probability

Reporting Results

Glossary

Chapter 3 Cumulative Distribution Functions

The Class Size Paradox

The Limits of PMFs

Percentiles

Cumulative Distribution Functions

Representing CDFs

Back to the Survey Data

Conditional Distributions

Random Numbers

Summary Statistics Revisited

Glossary

Chapter 4 Continuous Distributions

The Exponential Distribution

The Pareto Distribution

The Normal Distribution

Normal Probability Plot

The Lognormal Distribution

Why Model?

Generating Random Numbers

Glossary

Chapter 5 Probability

Rules of Probability

Monty Hall

Poincaré

Another Rule of Probability

Binomial Distribution

Streaks and Hot Spots

Bayes’s Theorem

Glossary

Chapter 6 Operations on Distributions

Skewness

Random Variables

PDFs

Convolution

Why Normal?

Central Limit Theorem

The Distribution Framework

Glossary

Chapter 7 Hypothesis Testing

Testing a Difference in Means

Choosing a Threshold

Defining the Effect

Interpreting the Result

Cross-Validation

Reporting Bayesian Probabilities

Chi-Square Test

Efficient Resampling

Power

Glossary

Chapter 8 Estimation

The Estimation Game

Guess the Variance

Understanding Errors

Exponential Distributions

Confidence Intervals

Bayesian Estimation

Implementing Bayesian Estimation

Censored Data

The Locomotive Problem

Glossary

Chapter 9 Correlation

Standard Scores

Covariance

Correlation

Making Scatterplots in Pyplot

Spearman’s Rank Correlation

Least Squares Fit

Goodness of Fit

Correlation and Causation

Glossary

Colophon

Allen B. Downey

Allen Downey is an Associate Professor of Computer Science at the Olin College of Engineering. He has taught computer science at Wellesley College, Colby College and U.C. Berkeley. He has a Ph.D. in Computer Science from U.C. Berkeley and Master’s and Bachelor’s degrees from MIT.

Excellent book for those who want to learn statistics. Even if you are not yet a great programmer, you will find the content accessible and will be able to master it through the examples and exercises.

I bought this book hoping for a readable introduction to statistics. It's written very much for a classroom setting, however. Its explanations of concepts are quite poor, and there is little practical grounding. I don't feel it has taught me concepts I can go on to use in other areas.

Statistics gets little respect in operations research, in part because it is taught as a bunch of formulas and computer procedures. The problem with teaching it that way is that the formulas don't mean anything: the student may know her way around the menus, but that does not mean she knows when to use which method. And everything is learned in isolation, often without practice in getting her hands dirty. Think Stats gives students the chance to get their hands dirty.

Because it uses a programming language (Python), it covers data analysis from beginning to end: viewing data, calculating descriptive statistics, identifying outliers, and describing data with distributions (and explaining what the distributions really mean!). In working through this small book, the goal is understanding and using statistics, not just learning statistics. I have a number of college undergraduate students working on projects, and I have started giving them this book when they first start with me, both for the programming in Python and to learn statistics and data analysis so they can be useful.

I received a free electronic copy of Think Stats from the O'Reilly Blogger review program.