Notes on math, coding, and other stuff

Tag: probability

Last month, when I traveled to Hong Kong, I stayed at a guesthouse in a place called the Chungking Mansions. Located in Tsim Sha Tsui, it’s one of the most crowded, sketchiest, and cheapest places to stay in Hong Kong.

Chungking Mansions in Tsim Sha Tsui

Of the 17 floors, the first few are teeming with Indian and African restaurants and various questionable businesses. The rest of the floors are guesthouses and private residences. One thing that’s unusual about the building is the structure of its elevators.

The building is partitioned into five disjoint blocks, and each block has two elevators. One of the elevators only goes to the odd numbered floors, and the other elevator only goes to the even numbered floors. Neither elevator goes to the second floor because there are stairs.

Elevator Schematic of Chungking Mansions

I lived on the 14th floor, and man, those elevators were slow! Because of the crazy population density of the building, the elevator would stop on several floors on the way up and down. Even more, people often carried furniture on the elevators, which took a long time to load and unload.

To pass the time, I timed exactly how long it took between arriving at the elevator on the ground floor, waiting for the elevator to come, riding the elevator up, and getting off at the 14th floor. After several trials, the average time came out to be about 4 minutes. Clearly, 4 minutes is too long, especially when waiting in 35 degrees weather without air condition, so I started to look for optimizations.

The bulk of the time is spent waiting for the elevator to come. The best case is when the elevator is on your floor and you get in, then the waiting time is zero. The worst case is when the elevator has just left and you have to wait a full cycle before you can get in. After you get in, it takes a fairly constant amount of time to reach your floor. Therefore, your travel time is determined by your luck with the elevator cycle. Assuming that the elevator takes 4 minutes to make a complete cycle (and you live on the top floor), the best case total elevator time is 2 minutes, the worst case is 6 minutes, and the average case is 4 minutes.

It occurred to me that just because I lived on the 14th floor, I don’t necessarily have to take the even numbered elevator! Instead, if the odd numbered elevator arrives first, it’s actually faster to take the elevator to the 13th floor and climb the stairs to the 14th floor. Compared to the time to wait for the elevator, the time to climb one floor is negligible. I started doing this trick and timed how long it took. Empirically, this optimization seemed to speed my time by about 1 minute on average.

Being a mathematician at heart, I was unsatisfied with empirical results. Theoretically, exactly how big is this improvement?

Let us model the two elevators as random variables and , both independently drawn from the uniform distribution . The random variables represent model the waiting time, with 0 being the best case and 1 being the worst case.

With the naive strategy of taking the even numbered elevator, our waiting time is with expected value . Using the improved strategy, our waiting time is . What is the expected value of this random variable?

For two elevators, the solution is straightforward: consider every possible value of and and find the average of . In other words, the expected value of is

Geometrically, this is equivalent to calculating the volume of the square pyramid with vertices at (0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), and (1, 1, 1). Recall from geometry that the volume of a square pyramid with known base and height is .

Therefore, the expected value of is , which is a 33% improvement over the naive strategy with expected value .

Forget about elevators for now; let’s generalize!

We know that the expected value of two uniform random variables is , but what if we have n random variables? What is the expected value of the minimum of all of them?

I coded a quick simulation and it seemed that the expected value of the minimum of n random variables is , but I couldn’t find a simple proof of this. Searching online, I found proofs here and here. The proof isn’t too hard, so I’ll summarize it here.

Lemma: Let be the c.d.f for , where each is i.i.d with uniform distribution . Then the formula for is

Proof:

Now to prove the main claim:

Claim: The expected value of is

Proof:

Let be the p.d.f of , so . From this, the expected value is

This concludes the proof. I skipped a bunch of steps in the evaluation of the integral because Wolfram Alpha did it for me.

For some people, this sort of travel frustration would lead to complaining and an angry Yelp review, but for me, it led me down this mathematical rabbit hole. Life is interesting, isn’t it?

I’m not sure if the locals employ this trick or not: it was pretty obvious to me, but on the other hand I didn’t witness anybody else doing it during my stay. Anyhow, useful trick to know if you’re staying in the Chungking Mansions!

It is often said that all pop music “sound the same”. If this is the case, then a computer program should be able to compose pop music, right?

This is the problem we studied for the AI final project (CS486) — kind of. We restrict to the simpler problem of harmonizing an existing melody: in other words, given a fixed melody, find a sequence of chord progressions that match it.

We formulated the task as a probabilistic optimization problem, and devised an algorithm to solve it using Hidden Markov Models (HMMs). Then, we used the Music21 library to generate MIDI files from the output so that we can listen to it.

In the experiment, we chose the melody from If Only by JJ Lin. Here’s a demo:

Music Theory Model

In our model, pop music consists of two parts: a melody line consisting of a sequence of individual notes, and a harmony line of a sequence of chords. Generally, the melody line is the notes sung by the lead vocalist, and the harmony line is played by accompanying instruments (piano, guitar, digital synthesizers). Conventionally, the chord progression is written as a sequence of chord names above the melody; the exact notes in a chord progression is up to the performer’s discretion.

Above: Example of two-part song with melody and chord progression

It is hard to quantify exactly what make a chord progression “good”, since music is inherently subjective and depends on an individual’s musical tastes. However, we capture the notion of “pleasant” using music theory, by assigning a penalty to musical dissonance between a note in the melody and a note in the chord.

According to music theory, the minor second, major second, and tritone (augmented fourth) intervals are dissonant (including the minor and major seventh inversions). Therefore, our program tries to avoid forming these intervals by assigning a penalty.

Above: List of dissonant intervals. All other intervals are consonant.

We assume that all notes in the melody lie on either a major or minor scale in some fixed key. The set of permissible chords for a scale are the major and minor chords where all of its constituent notes are in the scale. For example, in the scale of G major, the permissible chords are {G, Am, Bm, C, D, Em}.

This is a vastly simplified model that does not capture more nuanced chords that appear in pop music. However, even the four-chord progression G – D – Em – C is sufficient for many top-40 pop songs.

First attempt: Harmonize each bar independently

The first thing we tried was to looking at each bar by itself, and searching through all of the permissible chords to find the chord that harmonizes best with the bar.

In other words, we define the penalty of the chord for a melody bar is the number of times a melody note forms a dissonant interval with a chord note, weighted by the duration of the note. Then, we assign to each bar the chord with the lowest penalty.

This assigns a reasonable-sounding chord for each bar on its own, but doesn’t account for the transition probabilities between chords. We end up with the A minor chord repeated 3 times consecutively, which we’d like to avoid.

How to account for chord transitions, while still harmonizing with the melody in each bar? This is where we bring in Hidden Markov Models.

Second attempt: HMMs and the Viterbi Algorithm

Hidden Markov Models are a type of probabilistic graphical model that assumes that there is an unobservable hidden state that influences the output variable. In each timestep, the hidden state transitions to a different state, with probabilities given by a transition matrix.

A standard problem with HMMs is: given a sequence of observations, find the most likely sequence of hidden states that produced it. In our problem, the hidden states are the unknown sequence of chords, and the observed sequence is the bars of our melody. Now, solving the most likely sequence problem yields the chord progression that best harmonizes with the melody.

To encode the rule that we don’t like having the same chord consecutively for multiple bars, we assign the transitions so that the probability of transition from any chord to itself is low. The transition probability to every other chord is equal.

Above: Transition probabilities for chords (only 3 shown here, but this generalizes easily). A chord is equally likely to transition to any other chord, but unlikely to transition to itself.

The only thing left to do is solve the most likely sequence problem. We used a modified version of the Viterbi algorithm to do this — it solves the problem quickly using dynamic programming.

Running the Viterbi algorithm on our test melody:

Above: Using HMMs to eliminate repeated chords

And there we go — no more repeated chords! It doesn’t produce the same chord sequence as my manual harmonization, but sounds quite natural (and doesn’t violate any musical constraints).

Results and Future Directions

Using HMMs, we were able to harmonize a short melody segment. By looking at the entire melody as a whole, the HMM algorithm is able to produce a better chord progression than by optimizing every bar locally. Our implementation uses a greatly simplified musical model: it assumes the melody is in G major, and only considers 6 basic major and minor chords, but it successfully demonstrates a proof of concept.

When implementing this project, we thought this was new research (existing literature on harmonization focused mostly on classical music, not pop music). Alas, after some more literature review, it turns out that a Microsoft research team developed the same idea of using HMMs for melody harmonization, and published a series of papers in 2008. Oh well. Not quite publishable research but still a cool undergraduate class project.

In art class a few months ago, I made a dice out of clay. This dice isn’t quite your typical dice. Its structure is tampered with ever so slightly, so that just of the material is removed.

Let us model this crude, imperfect dice as a hypothetical, perfect dice. Suppose this dice is represented as a cube, composed of 64 smaller ‘cubelets’ (imagine the cubelets as 1x1x1 and the dice as 4x4x4). Furthermore, we imagine four ‘layers’ of cubelets, such that each layer is a 4×4 square.

In this dice, we remove one of the 16 cubelets from the second layer, like this:

(keep in mind that this is still a solid cube)

Assuming that the cube is made out of uniform density material (and that the missing cubelet has no mass), what is the probability of landing on each of the six sides when the cube is thrown?

A mathematical interpretation

I will solve a problem which may or may not be equivalent to throwing a dice. Instead of throwing the dice, we place the dice on a table with a random orientation, and let go. We then observe which side the dice lands on.

This appears to be a legit and identical interpretation, but there is a flaw. With an uneven dice, it is not guaranteed that each orientation would appear with uniform likelihood. Nevertheless, for our purposes we ignore this and we assume that this is an identical problem.

Which side will it land on? The center of gravity of the dice must be directly on one of the six sides; this side is the side that the dice will fall onto.

Consider a sphere of some radius, centered at the cube’s center of gravity and entirely contained in the cube:
Lines are drawn from the center of gravity to the eight vertices of the cube, essentially dividing the sphere into six pyramid-like parts.

The chance of landing on a side of the cube is proportional to the side’s surface area when ‘projected’ onto the surface of the sphere. This is because the side can be divided into infinitely many small ‘pyramids’, each one of which corresponds to the dice landing on that side when left to drop at that specific angle.

Such a portion of the surface area is called a solid angle, which is measured in steradians (sr). Whereas the entire circle is denoted as radians, the entire sphere is denoted by sr. Likewise, one hemisphere would be sr, and so on.

There is an efficient formula for calculating the solid angle of a tetrahedral projection onto a sphere, given by Oosterom and Strackee. If , , are vectors of three vertices of a tetrahedron (in relationship to the fourth), then the subtended solid angle is given by:

Here , , are magnitudes of the vectors, the dot represents a dot product, and is the determinant of the scalar product of the vectors.

The magnitude of the 3-D euclidean distance:

The dot product of two vectors is:

Finally,

Calculating the probability

Let us position the cube on a 3D coordinate, such that the corners are the origin and . The empty cube is the cube with center .

To find the center of gravity, we just take the averages of the centroids of all the smaller cubes (the missing cube contributing 0 to the sum). This turns out to be with or about 1.9921.

We have ; now we let . To facilitate vector manipulations, we will now transpose the cube such that the center of gravity is at :
Because of symmetry, it is obvious that three of the six sides should have one equal probability, and the other three sides to have a different, but again equal, probability. Thus it is not strictly necessary to go through the calculations for all six sides; we only need to do them for two of the sides.

A face forms a solid angle for a square pyramid, not a tetrahedron (for which we have a formula for). To solve this, we note that a square pyramid can be divided into two tetrahedrons, and the solid angle for it is simply the sum of solid angles for the two tetrahedrons.

Now by doing these calculations, we get a probability of 16.740% of landing on each of the three heavier sides, and 16.594% on each of the three lighter sides (sides closer to the missing cubelet).

By comparison, the probability of landing on any side on a fair dice is 16.667%. As only of the dice’s mass is removed, the resulting difference is slightly less than 1%.

I’ve began studying some probability and statistics for fun, and I’m using a book called Fifty Challenging Problems in Probability by Mosteller:

Printing math textbooks is a great way to use up the school’s printing credits at the end of the year.

The first problem is about socks in a drawer. At first glance it looks really, deceptively simple. But when I start doing the problem I find that it’s a bit trickier than I expected.

Solutions to the problems are provided at the back of the book. For this problem, I find that the book’s solution is really ugly and unintuitive. Thus I’m going to give an alternate solution here.

Problem (my version)

A drawer contains red and black socks. I randomly take a sock out of the drawer. I then randomly take another sock out of the drawer.

The probability that both are red is exactly . Determine how many socks of each color there are in the drawer.

Solution

Let r be the number of red socks in the drawer, and b be the number of black socks. Then the total number of socks in the drawer is r+b.

The probability that a sock you randomly take is red would be

Now the probability that the next sock you take would also be red is

This is because there is one less red sock in the drawer, and at the same time there is one less sock (of any color) in the drawer.

The probability that both socks will be red is the two probabilities multiplied:

Now we’re solving the equation to find possible values of r and b. There are two variables, and one equation. However, both variables have to be positive integers. This type of equation is called a diophantine equation.

We begin by simplifying the equation into a quadratic.

Now using the quadratic formula to solve for r:

It’s not necessary to consider the negative root of the quadratic. Simplifying the quadratic further, we get

Here we can make a substitution. Let . Then,

In this equation r will be an integer as long as b is an integer and k is an odd integer.

We are now finding integer solutions to the equation

..where k is also odd.

Simplifying, we get:

At this point we can drop the condition that k is odd, because if k is even then left hand side is even, but the right hand side is 1. As a result, k cannot be even if it’s a solution to this equation.

We’ve basically converted this problem into a form of Pell’s equation, which generally has this form:

..where D is a constant and x and y are unknown integers. Our instance of Pell’s equation is the case where D=8.

Fortunately for us, the Pell’s equation is well studied. We can use existing results in number theory to help us with this problem.

By trial and error, it can be found that the smallest nontrivial solution, or the fundamental solution to the equation is . Once the fundamental solution to a Pell type equation has been found, all other solution pairs can be generated. And there are infinitely many such pairs.

Generating solution pairs can be done with this recurrence relation:

It’s simple to write a computer program to generate these pairs. For instance, the first ten pairs of are:

Problem 285 of Project Euler was released yesterday; it requires some geometry and probability sense.

The problem goes like this:

Albert chooses two real numbers a and b randomly between 0 and 1, and a positive integer k.

He plays a game with the following formula:

The result k’ is rounded to the nearest integer. If k’ rounded is equal to k, then Albert gets k points. Otherwise, he gets nothing.

Find the expected value of the total score if Albert plays this game with .

For example (from the problem statement), let k=6, a=0.2, and b=0.85.

Here the expression gives the number 6.484. When rounded to the nearest integer, it becomes 6, which is equal to k. Therefore, Albert gets 6 points.

Yup – another expected value problem. But the concept of this problem is simple:

For any value of k, there are certain values of a and b where Albert will get points. Otherwise, he does not get points. Here the gray blob is the area where he does get points, and the white area is where he does not.

The red square is the sample space of the random variables a and b, since and .

The probability of Albert getting the points is the fraction of the square that’s gray.

The area of the red square is always 1. The area of the gray area varies with k. Thus the idea is to calculate the area of the gray area for each value of k from 1 to .

Now we can get to the details of the problem.

Notice that if , then (the difference between less-equal and less is irrelevant in this problem):

In a similar way, the equation of the larger circle is .

These are equations of circles. A circle has the equation where is the center and r is the radius.

Thus both the smaller and the larger circles have a center at . The radii of the circles are different: the radius of the smaller circle is and the radius of the larger circle .

Plotting the inequalities:

We can see now that this is a calculus problem! Indeed, the shaded area is the area of the larger circle inside the box – area of the smaller circle inside the box.

Initially I tried integrating the circles directly, but I ran into problems while integrating and I was unable to correctly integrate the equation. Instead, I integrated an equivalent problem:

It’s obvious that the areas do not change with this transformation.

So now for the smaller circle the equation is . Solving for b, we get:

.. And this equation for the larger circle:

The variable of integration here is a, and k is only a constant. Therefore we can replace the radius with r and get this equation to integrate:

With the help of an integration table, this can be integrated fairly easily.

There’s one additional problem – computing the integrating ‘limits’ of the two circles. Let’s call them and for small and large respectively.

Both of them can be computed using the pythagorean theorem.

The formula for (it should be obvious by now what should be):

Now we can get this really big formula for computing the area for any integer k:

Also, fortunately for me, it is impossible for the upper bound to cross the circle at the top or right, I mean, we never have something like this –

A while ago I wrote this blog post on this problem. The previous code was able to produce the answer in about 20 minutes, using several gigabytes of RAM.

You may want to read the previous blog post first.

There is a simpler and more efficient solution (used by inamori, zwuupeape, and many others).

The new approach

We do not need to know the exact value of the solution, only an approximate. This approximation has to be accurate only to six digits.

Any any step, there is a certain probability that the ant will reach the final state in that step.

Let’s call this function : the probability that the ant will reach the final state on step s:

At very small values of s, the probability is zero because it takes a certain number of steps to get to the final position. 44 is the smallest number of steps to get from the initial to the final position, and the probability increases from there.

But it only increases until a certain point. After that point, the probability decreases. After all, if we run this for thousands of steps, how likely is the ant to still not have reached the final position?

However, the graph continues forever to the right. Even after thousands of steps, there is still a chance, however small, that the final position hasn’t been reached.

Let’s define as the weighted average of the probabilities of each value of from 0 to s:

The expected value and the answer for this problem would simply be the weighted average for the entire graph.

Since s must be an integer greater than zero, we don’t have to do any integration with curves. A bit misleading because I’ve drawn and as curves, when they are not.

But because cannot be expressed algebraically, we cannot evaluate it as it approaches infinity. Instead, we pick a point that gives us enough precision:

Either that, or we keep calculating the weighted average while increasing s until we have enough precision.

Calculating the probabilities

In order to calculate a weighted average , we need to be able to calculate ; while is the probability function for the final state, we’ll need to know the probabilities for the in-between states as well.

The probabilities of the states on step s depend most upon the probabilities of the previous step, s-1.

So if on step s-1 there’s a 0.6 probability that we’re on state X, then on step s we should have a 0.3 probability of being on either Y or Z as there’s an equal chance of moving to either state.

And if multiple states can go to our state, we can get the probability by adding up the probabilities:

Coding a solution

At first we start on the initial state, with a probability of 1 (certain chance) of being on it.

On each step we generate a new set of probabilities using rules I described above, and so on.

A lot of code is stolen from my previous work on this problem. The State class is essentially unchanged in this version.