Simulating conditional probability - looping versus vectorization

Conditional probability can be tricky if you're new to the topic, or if the question at hand is particularly complex. One particularly useful trick is to simply simulate the problem in a general-purpose programming language.

Let's look at a very simple example. Say we roll a fair six-sided die, then flip a fair coin a number of times equal to the number rolled. What is the probability that we flip exactly 2 heads?

A die is modeled well with a uniform random variable $N \sim \text{unif}(1,6)$. A series of coin flips is modeled with a binomial random variable $X \sim \text{binom}(n, 0.5)$ where $n$ is the number of coin flips, and $0.5$ is the probability of a fair coin landing on heads.
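For this particular problem we can also work out the exact answer with the law of total probability, conditioning on the roll $N$ (the $n = 1$ term vanishes since $\binom{1}{2} = 0$):

$$
P(X = 2) = \sum_{n=1}^{6} P(X = 2 \mid N = n)\,P(N = n) = \frac{1}{6} \sum_{n=2}^{6} \binom{n}{2} \left(\tfrac{1}{2}\right)^{n} = \frac{1.546875}{6} \approx 0.2578
$$

This gives us a target value to check any simulation against.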

If we didn't know how to approach this problem analytically, it's relatively straightforward to do so in code. A simulation is usually just a loop where we perform the same action over and over. When the outcome varies from one run to another, averaging over many iterations brings our estimate close to the true probability.

Let's try this. We'll create a loop, and in each iteration we'll generate a random integer between 1 and 6 to simulate the die roll, then use that count to sample from a binomial distribution representing our coin flips.
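A minimal sketch of what that loop might look like, using numpy's random generator (the variable names here are my own; scale `iterations` up to 10 million to match the timing discussion below):

```python
import numpy as np

rng = np.random.default_rng(1)

iterations = 100_000  # fewer than 10 million here, so it runs quickly
hits = 0
for _ in range(iterations):
    n = rng.integers(1, 7)        # die roll: uniform integer on 1..6
    heads = rng.binomial(n, 0.5)  # flip n fair coins, count heads
    if heads == 2:
        hits += 1

estimate = hits / iterations
print(estimate)
```

Each iteration makes two separate calls into the random generator, and all the looping happens in the Python interpreter, which is where the slowness comes from.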

I ran this with 10 million iterations, which is far more than needed to approximate this value. The point is not just to approximate the answer, but to show that this is a relatively slow approach: it took 38 seconds total. We can do a lot better by using vectorized operations (in Python we do this with numpy, but other languages like Julia, MATLAB, and R support these natively).

Let's look at the same 10 million iterations, but this time, instead of a loop, we build an array of 10 million random samples from a uniform distribution and pass it directly into the binomial function, which operates on the whole array at once.
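A sketch of the vectorized version: numpy's `Generator.binomial` accepts an array of trial counts, so the per-element die rolls feed straight into the coin flips with no explicit Python loop.

```python
import numpy as np

rng = np.random.default_rng(1)

iterations = 10_000_000
rolls = rng.integers(1, 7, size=iterations)  # all die rolls generated up front
heads = rng.binomial(rolls, 0.5)             # n varies element-wise across the array
estimate = np.mean(heads == 2)
print(estimate)
```

On a typical machine this runs in well under a second, since all the looping happens inside numpy's compiled code rather than the Python interpreter.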

This is not the most complex example, but it illustrates the difference between vectorized and looped operations. When you're working with a library like numpy or tensorflow, or in any language that treats arrays as first-class citizens, your goal should always be to compute with vectorized operations. Your free time will thank you for it, and as a bonus your code will usually be more concise and readable. This tends to be one of the first things I teach to math and statistics friends trying their hand at programming; I feel it's always a good thing to explain to anyone new to numerical programming.