I am preparing for interviews and I was told to try and answer as many problems in the Mark Joshi book as possible.

Question:

Suppose an asset takes values from a discrete set $v_j$ and the probability of $v_j$ is $p_j$.
Write an algorithm that produces the random variable for this asset from a uniformly distributed random variable.

I am not sure if I understand the question posed here. Any suggestions or clarification is greatly appreciated.


For this question you are given some function random() yielding a uniform random number, and what we want is a function next() which yields realizations of a random variable $X$ with values $v_j$ such that $P(X=v_j)=p_j$.

From standard textbooks we know the following transformation: if $u_i$ are uniform random numbers and $g$ is the inverse of a cumulative distribution function $F$, then $g(u_i)$ are realizations of a random variable with c.d.f. $F$.
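As an illustration of this transformation for a continuous distribution, here is a minimal Python sketch for the exponential distribution, where the inverse is available in closed form: $F(x) = 1 - e^{-\lambda x}$ gives $g(u) = -\ln(1-u)/\lambda$. The function name `sample_exponential` and the use of Python's `random.Random` as the uniform source are assumptions made for the example.

```python
import math
import random


def sample_exponential(rate: float, rng: random.Random) -> float:
    """Inverse transform sampling for the exponential distribution.

    F(x) = 1 - exp(-rate * x), so g(u) = F^{-1}(u) = -ln(1 - u) / rate.
    """
    u = rng.random()                  # uniform on [0, 1)
    return -math.log(1.0 - u) / rate  # 1 - u lies in (0, 1], so the log is finite


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [sample_exponential(2.0, rng) for _ in range(100_000)]
    print(sum(xs) / len(xs))  # should be close to 1 / rate = 0.5
```

Using `1 - u` rather than `u` inside the logarithm is only there because `random()` can return exactly 0 but never 1; both are uniform on the unit interval.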

In our case the best way is to store the values in a vector $v[]$, so we only have to handle the random index $Y$ with $P(Y=j)=p_j$. The c.d.f. is given by
$$
F[k] = P(Y\le k) = \sum_{j \le k} p_j
$$
and can be stored in another array. Now the inverse is easy to implement: $g(u)$ returns the smallest index $k$ with $u < F[k]$, and next() returns $v[g(u)]$.
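A minimal Python sketch of this lookup, assuming the probabilities are given as a list and using Python's `random.Random` as random(). The class name `DiscreteSampler` is hypothetical; the c.d.f. array $F$ is built once, and $g$ is a binary search over it (via the standard `bisect` module), so each draw costs $O(\log n)$.

```python
import bisect
import itertools
import random
from typing import Generic, List, Optional, Sequence, TypeVar

T = TypeVar("T")


class DiscreteSampler(Generic[T]):
    """Draw v[j] with probability p[j] by inverting the discrete c.d.f."""

    def __init__(self, values: Sequence[T], probs: Sequence[float],
                 rng: Optional[random.Random] = None):
        if len(values) != len(probs) or not values:
            raise ValueError("need equally many values and probabilities")
        if any(p < 0 for p in probs):
            raise ValueError("probabilities must be non-negative")
        self.values: List[T] = list(values)
        # F[k] = p[0] + ... + p[k], computed once and stored
        self.cdf: List[float] = list(itertools.accumulate(probs))
        if self.cdf[-1] <= 0:
            raise ValueError("probabilities must not all be zero")
        self.rng = rng or random.Random()

    def next(self) -> T:
        # Uniform on [0, F[n-1]); F[n-1] is 1 up to rounding error.
        u = self.rng.random() * self.cdf[-1]
        # g(u) = smallest k with u < F[k], found by binary search.
        k = bisect.bisect_right(self.cdf, u)
        return self.values[min(k, len(self.values) - 1)]  # guard against rounding
```

For example, `DiscreteSampler(["a", "b", "c"], [0.2, 0.5, 0.3]).next()` returns `"b"` about half the time. Values with $p_j = 0$ are never returned, because their entry in $F$ equals the previous one.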

As I understand it, I first create a vector of uniform random numbers of size $n-1$, then I create a c.d.f. function $F$ (I used the normal CDF). Then I create a function $g$ that computes the inverse of $F$?
– Wolfy, Jul 8 '17 at 17:03

You can create a vector or call the function repeatedly as implemented above. The best way is to implement $g$ (the inverse of $F$) directly. For the normal distribution, code for the inverse c.d.f. exists (e.g. Numerical Recipes), and for the exponential it's easy to write down analytically.
– JaFa, Jul 10 '17 at 7:08

Write a program that returns "Yes" with probability $p$ and "No" otherwise. You are given a function runif(), which returns a uniformly distributed random number between 0 and 1.

Solution:

`r = runif(); if r < p then return "Yes" else return "No"`

From this well-known example you have to generalize to more than two possible outcomes. Imagine that you have to simulate a lottery spinning wheel with unequal sectors of width $p_1,p_2,\cdots$, with $r$ representing the distance (fraction of a circle) by which the wheel turned.

Solution:

After you draw a uniform random number `r = runif()`, you check its value. If $r<p_1$ you return $v_1$, else if $r<p_1+p_2$ you return $v_2$, else if $r<p_1+p_2+p_3$ you return $v_3$, and so on. The general case, for arbitrary $n$, is just a simple loop testing whether $r$ is in the right range and returning the corresponding value for that sector. (Do you think you can write the code?)
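One way the loop could look in Python, as a sketch that assumes `random.Random.random()` plays the role of runif(); the function name `sample_wheel` is hypothetical. It keeps a running total $p_1 + \cdots + p_j$ as the right edge of the current sector and stops at the first sector containing $r$.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_wheel(values: Sequence[T], probs: Sequence[float],
                 rng: Optional[random.Random] = None) -> T:
    """Spin the wheel once: return values[j] with probability probs[j].

    Linear scan over the sectors, so each draw costs O(n) comparisons.
    """
    rng = rng or random.Random()
    r = rng.random()  # r = runif(), uniform on [0, 1)
    upper = 0.0  # right edge of the current sector: p_1 + ... + p_j
    for value, p in zip(values, probs):
        upper += p
        if r < upper:
            return value
    # Reached only if rounding leaves sum(probs) slightly below r:
    # fall back to the last value that has positive probability.
    for value, p in zip(reversed(values), reversed(probs)):
        if p > 0:
            return value
    raise ValueError("all probabilities are zero")
```

With two values and `probs = [p, 1 - p]` this reduces to the "Yes"/"No" example.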

The answers above, which suggest simulating uniform $U\sim U(0,1)$ and returning $J$ such that $\sum_{i=1}^{J-1} p_i \le U < \sum_{i=1}^{J} p_i$, will be fine for the interview. However, since this isn't just a job interview Q&A site, I'd like to remark that for large sample spaces, or if you draw a lot of samples from even a modestly sized sample space, this is extremely inefficient: a naive search costs up to $n$ comparisons per draw. I'd be aghast to see this being used in production code in the 21st century for these cases; Walker alias sampling, for example, is a much better alternative.
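A sketch of Walker's alias method in Python, using Vose's $O(n)$ table construction; the class and method names are hypothetical. After setup, each draw costs $O(1)$: one uniform is split into a column index (integer part) and a coin flip (fractional part), and the coin decides between the column's own index and its alias.

```python
import random
from typing import List, Optional, Sequence


class AliasSampler:
    """Walker's alias method with Vose's construction.

    Setup is O(n); next_index() is O(1) and returns j with probability p_j.
    """

    def __init__(self, probs: Sequence[float], rng: Optional[random.Random] = None):
        n = len(probs)
        total = float(sum(probs))
        if n == 0 or total <= 0:
            raise ValueError("need at least one positive probability")
        self.n = n
        self.prob: List[float] = [0.0] * n
        self.alias: List[int] = [0] * n
        self.rng = rng or random.Random()

        # Scale so that the average column height is exactly 1.
        scaled = [p * n / total for p in probs]
        small = [i for i, q in enumerate(scaled) if q < 1.0]
        large = [i for i, q in enumerate(scaled) if q >= 1.0]

        while small and large:
            s = small.pop()
            l = large.pop()
            self.prob[s] = scaled[s]      # keep s with this probability...
            self.alias[s] = l             # ...otherwise return l
            scaled[l] -= 1.0 - scaled[s]  # l fills the rest of column s
            if scaled[l] < 1.0:
                small.append(l)
            else:
                large.append(l)

        # Any leftovers have height 1 up to rounding error.
        for i in large + small:
            self.prob[i] = 1.0

    def next_index(self) -> int:
        u = self.rng.random() * self.n  # one uniform on [0, n)
        i = min(int(u), self.n - 1)     # column index, uniform on {0, ..., n-1}
        frac = u - i                    # fractional part, uniform on [0, 1)
        return i if frac < self.prob[i] else self.alias[i]
```

The asset value is then `v[sampler.next_index()]`. The trade-off is an $O(n)$ preprocessing step, which pays off when many samples are drawn from the same distribution.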

If you're given the pdf (as in the question), then why not use the inverse transform method?
– will, Jul 7 '17 at 12:20

@will, as I mentioned, it is inefficient in general. If the $p_j$ have a special form that allows you to invert the CDF then, sure, awesome: do that. Or, as your answer alludes to, you could approximately invert the CDF by e.g. fitting splines, as long as you're happy with approximate simulation (often we are, but that's not the question here). All I'm saying is that repeatedly adding up a bunch of numbers isn't that smart.
– P.Windridge, Jul 7 '17 at 15:42

I would disagree that it is inefficient. Perhaps less accurate, but efficiency is not even remotely a concern; that is all inside your spline code, which you can make extremely efficient.
– will, Jul 7 '17 at 23:17

I think we have different definitions of efficient and different use cases in mind (your idea of efficient sounds like "it's fast on all the examples I tried", which is fine, of course).
– P.Windridge, Jul 8 '17 at 8:47

The question is asking you to produce an algorithm to generate random numbers with a given probability distribution, having been provided with uniformly distributed random numbers.

Essentially, you need to provide some function $f(\phi(\ldots)): \tilde{X}\sim U(0,1) \to \tilde{Y} \sim \phi(\ldots)$, that is, a function of a pdf that converts uniformly distributed random numbers to random numbers distributed according to the pdf provided.

Wikipedia describes several ways to do this; for this specific question, the one you want is inverse transform sampling. You just need to integrate the pdf to get a cdf. You can then build an interpolating spline for it on some grid, and swap x and y to invert it.
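A minimal NumPy sketch of this recipe, with two stated substitutions: the cdf is obtained by trapezoidal integration on a fixed grid, and piecewise-linear interpolation (`numpy.interp`) stands in for the spline. It assumes the density is negligible outside the chosen interval `[lo, hi]`; the function name `make_inverse_cdf_sampler` is hypothetical.

```python
import numpy as np


def make_inverse_cdf_sampler(pdf, lo: float, hi: float, n_grid: int = 10_001):
    """Return a function mapping uniforms u in [0, 1) to draws from `pdf`.

    Assumes the density is negligible outside [lo, hi]. The cdf is built by
    trapezoidal integration on a grid and inverted by swapping the axes of a
    piecewise-linear interpolant.
    """
    x = np.linspace(lo, hi, n_grid)
    f = pdf(x)
    # Cumulative trapezoidal integral: F(x_0) = 0, then running sum of trapezoids.
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))))
    cdf /= cdf[-1]  # normalise so that F(hi) = 1
    # np.interp needs increasing x-coordinates: drop grid points where F is flat.
    keep = np.concatenate(([True], np.diff(cdf) > 0))
    cdf, x = cdf[keep], x[keep]

    def inverse(u):
        # "Swap x and y": interpolate x as a function of F(x).
        return np.interp(u, cdf, x)

    return inverse


if __name__ == "__main__":
    def std_normal_pdf(t):
        return np.exp(-0.5 * t * t) / np.sqrt(2.0 * np.pi)

    g = make_inverse_cdf_sampler(std_normal_pdf, -8.0, 8.0)
    samples = g(np.random.default_rng(0).random(100_000))
    print(samples.mean(), samples.std())  # should be close to 0 and 1
```

The result is an approximate sampler: its accuracy depends on the grid spacing and on truncating the support to `[lo, hi]`.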