Random walks are a gold mine for a wide variety of stochastic theory and practice. They are easy to explain, easy to code, and easy to misunderstand. In this section, we start out with the simplest imaginable random walk and then show how things can go wrong.

Let's examine the problem of the one-dimensional random walk. Consider a particle at the origin ($x=0$), with $p$ probability of moving to the right and $q=1-p$ probability of moving to the left. When the particle reaches $x=1$, the experiment terminates. On average, how many steps are required for this to terminate if $p=1/2$? This seems like a reasonable question, doesn't it?

The first thing we have to do is establish the conditions under which termination occurs. Thus, we need the probability that the particle ultimately reaches $x=1$. We'll call this probability $P$. Conditioning on the first step gives

$$ P = p + q P^2 $$

where $p$ is the probability it moves to $x=1$ from $x=0$ on the first try, and $q$ is the probability it does not, but then ultimately makes it back to $x=0$ (with probability $P$) and then again makes it from there to $x=1$, again with probability $P$ (thus, $P^2$).

There are two solutions for $P$: $P=1$ and $P=p/(1-p)$. The crossover point is $p=1/2$. This is drawn below.
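If you'd rather not factor the quadratic by hand, sympy (which we'll lean on again later in this post) can confirm the two roots; a quick sketch:

```python
import sympy

p, P = sympy.symbols('p P')
# termination probability satisfies P = p + (1 - p)*P**2
sols = sympy.solve(P - p - (1 - p)*P**2, P)
# the two roots are 1 and p/(1 - p)
print(sols)
```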

A straightforward question to ask is: what is the average number of steps it takes to ultimately terminate this random walk? The quick analysis above says that the particle will ultimately terminate, but, on average, how many steps are required for this to happen?

Unfortunately, the analysis above is not very helpful here because the statement is about the probability of ultimate termination, not the probability of termination given a particular point somewhere on the left of the origin.

Fortunately, we have all the Python-based tools to experimentally get at this. The following generator describes the $p=1/2$ random walker.
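A minimal sketch of such a generator (an illustrative reconstruction, not necessarily the original cell): the particle steps left or right with equal probability and yields its position until it hits $x=1$.

```python
import random

def walk():
    'unbounded p=1/2 random walker; yields positions until x=1 is reached'
    x = 0
    while x != 1:
        x += random.choice([-1, 1])
        yield x
```

Collecting `list(walk())` gives one complete realization, although for $p=1/2$ an individual walk can take an extremely long time to terminate, which is precisely the phenomenon explored below.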

Now, here's where things get interesting! Fifty samples of the random walk are really not that many, so we'd hope to get a better average by generating a lot more. If you try to generate, say, 1000 realizations, you'd be in for a long wait! This is because some of the random walks are really, really long! Furthermore, this is a persistent phenomenon; it's not just a bad draw from the random deck. Even if there is only one really long walk, it seriously distorts the average. It's tempting to conclude that this is just some outlier and get on with it, but resisting that temptation will lead us to a very powerful theorem in stochastic processes.

To get a better picture of what's going on here, let's re-define our random walker function so we can limit how far it can go and thereby how long we'd have to wait.

In [7]:

def walk(limit=50):
    'limited version of random walker'
    x = 0
    while x != 1 and abs(x) < limit:
        x += random.choice([-1, 1])
        yield x

Because we really just want to count the walks, we can save memory by not collecting the individual steps and just reporting the length of the walk.

In [8]:

def nwalk(limit=500):
    'limited version of random walker. Only returns length of path, not path itself'
    n = x = 0
    while x != 1 and n < limit:
        x += random.choice([-1, 1])
        n += 1
    return n  # return length of walk

In [9]:

len_walk = [nwalk() for i in range(500)]  # generate 500 limited random walks

The usual practice would be to draw a histogram (i.e. an approximation of the probability density) here, but sometimes the automatic binning makes things hard to see. Instead, a cumulative distribution plot is helpful here. The excellent pandas module has some very useful data structures for this kind of work.
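A sketch of that approach (assuming pandas is installed; the `nwalk` helper is the limited walker defined above, repeated here so the block runs on its own):

```python
import random
import pandas as pd

def nwalk(limit=500):
    'length of one limited p=1/2 random walk'
    n = x = 0
    while x != 1 and n < limit:
        x += random.choice([-1, 1])
        n += 1
    return n

random.seed(0)
lengths = pd.Series([nwalk() for i in range(500)])
# empirical cumulative distribution: fraction of walks taking <= n steps
cdf = lengths.value_counts().sort_index().cumsum() / float(len(lengths))
print(cdf.head())
```

Plotting `cdf` (e.g. with `cdf.plot()`) produces the cumulative distribution discussed next; roughly half the probability mass already sits at a path length of one.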

What's interesting about the above plot is how steep the slope is. For a path length of one (terminating at the first step), we already have 50% of the probability accounted for. By a path length of 100, we have about 90% of the probability. The problem is that squeezing out the remaining probability mass means computing random walks longer than 500 (our arbitrary limit). We can repeat all the above steps with limits far above 500, but this observation still holds. Thus, the problem with averaging is that gathering more probability further out competes with the lengths of those further paths. For the average to converge, the longer paths must asymptotically become improbable faster than their lengths grow!

Let's examine the standard deviation of our averages.

In [11]:

def estimate_std(limit=10, ncount=50):
    'quick estimate of the standard deviation of the averages'
    ws = array([[nwalk(limit) for i in range(ncount)] for k in range(ncount)])
    return (limit, ws.mean(), ws.mean(1).std())

for limit in [10, 20, 50, 100, 200, 300, 500, 1000]:
    print 'limit=%d,\t average = %3.2f,\t std=%3.2f' % estimate_std(limit)

So far, we have been looking at this using samples and have started suspecting that there is something going on with the convergence of the average. However, this does not uncover what's going on under the hood with the convergence of the average. Yes, we've tinkered with varying the sample size, but it could still be the case that there is some much larger sample size out there that would cure all the problems we have so far experienced.

The next code-block assembles some drawing utilities on top of the excellent networkx package so we can analyze this problem using graph combinatoric algorithms.

This next code-block constructs the lattice graph that we'll explain below.

In [13]:

def construct_graph(level=5):
    g = Graph()
    g.level = level
    g.add_nodes_from([(i, dict(pos=i)) for i in diagwalker(g.level)])
    for x, y in g.nodes():
        if y != 1 and x < g.level:
            g.add_edge((x, y), (x + 1, y - 1))
            g.add_edge((x, y), (x + 1, y + 1))
    g.node[(0, 0)]['val'] = 1L  # long int
    for j in diagwalker(g.level):
        if j == (0, 0):
            continue
        x, y = j
        g.node[j]['val'] = sum(g.node[k]['val'] for k in g.predecessors(j))
    return g

g = construct_graph()
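The `construct_graph` function relies on a `diagwalker` helper that was defined earlier in the original notebook. A plausible reconstruction (an assumption on my part, inferred from the edge rules above): it yields the lattice nodes `(step, position)` that a not-yet-terminated walk can occupy, where odd steps can reach $x=1$ and even steps top out at $x=0$.

```python
def diagwalker(level):
    'yield lattice nodes (step, position) reachable before absorption at x=1'
    yield (0, 0)
    for k in range(1, level + 1):
        top = 1 if k % 2 else 0   # odd steps can reach x=1; even steps top out at x=0
        for j in range(top, -k - 1, -2):
            yield (k, j)
```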

The figure below shows a directed graph with the individual pathways the particle can take from $(0,0)$ to a particular end. The notation $(k,j)$ means that at iteration $k$, the particle is at $x=j$ on the line. For example, all paths start at $(0,0)$ and there is only one path from $(0,0)$ to $(1,1)$. Likewise, the next terminus is at $(3,1)$, which corresponds to reaching $x=1$ in three steps; there is only one path on the digraph that leads there. However, there are many pathways that are three steps long but do not reach the terminus. For example, $(3,-1)$ collects the paths that reach $x=-1$ in three steps.

Let's use some of the powerful algorithms in networkx to pursue these ideas.

The lattice diagram above shows the potential paths to termination for the random walk. For example, all the paths that lead to the node labeled $(5,1)$ are those paths that take exactly five steps to terminate. Note that this is a directed graph, so it can only be traversed in the direction of the arrows (arrowheads denoted by thick ends as shown). Fortunately, networkx has powerful tools for computing these paths, as shown in the following.

In [15]:

l5 = list(nx.all_simple_paths(g, (0, 0), (5, 1)))
for i in l5:
    print i

fig, ax = subplots()
fig.set_size_inches((6, 6))
g.draw(ax, with_labels=0)
g.subgraph(l5[0]).draw(ax=ax, with_labels=1, node_color='b', node_size=700)
g.subgraph(l5[1]).draw(ax=ax, with_labels=1, node_color='b', node_size=800)
ax.set_title('Pathways that terminate in five steps', fontsize=16)

The figure shows the two pathways that terminate in five steps. The other fact we need is how many pathways do not terminate in five steps; then we'll be on our way to computing a probability. Fortunately, we already have this built into our initial graph construction.

This is a conditional probability because the only way to terminate in five steps is to not have terminated at steps one or three, which is emphasized by the hat on top of those digits in the above equation. To compute the unconditional probability, we need to account for the probability of not having terminated at those earlier steps.
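The unconditional probabilities can also be sketched with a small dynamic program that mirrors the lattice: track the number of surviving paths at each position and, whenever a path first hits $x=1$ at step $n$, credit it with probability $\mathrm{count}/2^n$. This is a stand-alone sketch of the same computation the graph performs (`termination_probs` is a hypothetical helper name, not from the original notebook):

```python
from fractions import Fraction

def termination_probs(nmax):
    'probability of first reaching x=1 at exactly step n, for p = q = 1/2'
    counts = {0: 1}   # number of surviving (non-terminated) paths at each position
    probs = {}
    for n in range(1, nmax + 1):
        new = {}
        for x, c in counts.items():
            for step in (-1, 1):
                y = x + step
                if y == 1:   # first visit to x=1: these paths terminate here
                    probs[n] = probs.get(n, 0) + Fraction(c, 2**n)
                else:
                    new[y] = new.get(y, 0) + c
        counts = new
    return probs

print(termination_probs(5))
```

This reproduces $\phi_1 = 1/2$, $\phi_3 = 1/8$, and $\phi_5 = 1/16$, the last matching the two five-step pathways found on the graph above ($2/2^5 = 1/16$).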

The following plot shows the probability of termination for each number of steps.

In [19]:

fig, ax = subplots()
ax.axis(ymax=1)
ax.plot(p.keys(), p.values(), '-o')
ax.set_xlabel('exact no. of steps to terminate', fontsize=18)
ax.set_ylabel('probability', fontsize=18)
ax.axis(ymax=.6)

Out[19]:

(0.0, 16.0, 0.0, 0.6)

Thus, we have a computational method for computing the probability of terminating in a given number of steps. What we need now is a separate analytical result that confirms our work so far. This is where the book by Feller comes in (Feller, William. An Introduction to Probability Theory and Its Applications: Volume One. John Wiley & Sons, 1950.).

In random walk terminology, the probability that the first visit to $x=1$ takes place at the $n$th step is denoted $\phi_n$. Furthermore, the generating function is defined as:

$$ \Phi(s) = \sum_{n=0}^\infty \phi_n s^n$$

We will need some additional random variables. $N$ is the first time that the particle reaches $x=1$; this is a random variable with probability $\phi_n$. Likewise, $N_1$ is the number of trials required to move the particle from $x=-1$ back to $x=0$, and $N_2$ is the number of trials required to move the particle from $x=0$ to $x=1$. Now, here comes the key step: $N_1$ and $N_2$ are independent and have the same probability distribution as $N$. With these definitions, we have

$$ N = 1 + N_1 + N_2$$

In other words, you need $N_1$ trials to make it from $x=-1$ back to the origin, and then $N_2$ trials to make it from there to $x=1$. The leading $1$ accounts for the first step to the left from the origin. Because the first step splits the probability between moving directly to $x=1$ and moving left, and the generating function of a sum of independent random variables is the product of their individual generating functions, we have

$$ \Phi(s) = p s + q s \Phi(s)^2 $$

Solving this quadratic in $\Phi(s)$ gives two candidate solutions,

$$ \Phi_{1,2}(s) = \frac{1 \mp \sqrt{1-4 p q s^2}}{2 q s} $$

How do we pick between them? The limit of $\Phi_2(s)$ (the plus-sign branch) is unbounded as $s \rightarrow 0$, which no generating function can be. Thus, $\Phi(s) = \Phi_1(s)$ and, in particular, $\Phi(s=1) = \Phi_1(s=1)$.

One of the properties of the $\Phi$ function is that $\Phi(s=1)$ should sum to one. In this case, we have

$$ \Phi(s=1) = \Phi_1(s=1)= \frac{1-|p-q|}{2 q}$$

Thus, if $p < q$, we have

$$ \sum \phi_n = p/q$$

which doesn't equal one. The problem here is that the definition of the random variable $N$ only considered the conclusion that the particle would ultimately reach $x=1$. The other possibility, which accounts for the missing probability here, is that the particle never terminates. Similarly, when $ p \ge q $, we have

$$ \sum \phi_n = 1$$

This should be no surprise because we saw this exact same result when we first started investigating this! Namely, $p \ge q \implies p \ge 1/2$, which is what we concluded from our first plot. This is the situation where the particle always terminates.
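We can check both of these sums numerically with the same kind of dynamic program as before, now weighting right-steps by $p$ and left-steps by $q$ (a sketch; `phi_sum` is a hypothetical helper name):

```python
def phi_sum(p, nmax=2000):
    'numerically accumulate the sum of phi_n for right-step probability p'
    q = 1.0 - p
    counts = {0: 1.0}   # probability mass of surviving paths by position
    total = 0.0
    for n in range(nmax):
        new = {}
        for x, m in counts.items():
            if x + 1 == 1:           # stepping right into x=1 terminates
                total += m * p
            else:
                new[x + 1] = new.get(x + 1, 0.0) + m * p
            new[x - 1] = new.get(x - 1, 0.0) + m * q
        counts = new
    return total

print(phi_sum(1 / 3.0))  # p < q: sum approaches p/q = 0.5
print(phi_sum(0.6))      # p > q: sum approaches 1
```

For subcritical and supercritical walks the surviving mass decays geometrically, so a couple of thousand steps is plenty for convergence here.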

The first derivative of $\Phi(s)$ evaluated at $s=1$ is the mean. Thus,

$$ \mathbb{E}(N) = \Phi'(s) \bigg|_{s=1} = \frac{1}{p-q} $$

when $p > q$. When $p = q = 1/2$, this derivative diverges, which is exactly why the sample averages we computed earlier failed to converge.

For $p=q=1/2$, the generating function reduces to $\Phi(s) = (1-\sqrt{1-s^2})/s$, and expanding the square root in a Taylor series gives the individual terms

$$ \phi_{2k-1} = (-1)^{k-1} \binom{1/2}{k} $$

Now, we can try this formula out against our graph construction and see if it matches. The downside is that scipy.misc.comb cannot handle fractional terms, so we need to use sympy instead, as shown below.

In [20]:

import sympy

# analytical result from Feller; Rational(1, 2) keeps the binomial symbolic
pfeller = dict([(2*k - 1, sympy.binomial(sympy.Rational(1, 2), k)*(-1)**(k - 1))
                for k in range(1, 10)])
for k, v in p.iteritems():
    assert v == pfeller[k]  # matches every item!

Now that we know exactly why the mean does not converge for $p=1/2$, let's check the calculation for when we know it does converge, say, for $p=2/3$. In this case the average number of steps to terminate is

$$ \mathbb{E}(N)\bigg|_{p=2/3} = 3 $$
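As a cross-check, sympy can differentiate the bounded branch $\Phi_1(s) = (1-\sqrt{1-4pqs^2})/(2qs)$ of the generating function directly (a sketch):

```python
import sympy

s = sympy.symbols('s', positive=True)
p = sympy.Rational(2, 3)
q = 1 - p
# bounded branch of the generating function for the first-passage time
Phi = (1 - sympy.sqrt(1 - 4*p*q*s**2)) / (2*q*s)
EN = sympy.simplify(sympy.diff(Phi, s).subs(s, 1))
print(EN)  # mean number of steps for p = 2/3
```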

The code below redefines our earlier code for this case.

In [21]:

def nwalk(limit=500):
    'p=2/3 version of random walker. Only returns length of path'
    n = x = 0
    while x != 1 and n < limit:
        x += random.choice([-1, 1, 1])  # twice as many 1's as before
        n += 1
    return n  # return length of walk
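Running this redefined walker over many realizations confirms the analytical mean; a self-contained sketch (repeating the definition so the block runs on its own, and seeding for reproducibility):

```python
import random

def nwalk(limit=500):
    'p=2/3 version of random walker; returns length of walk only'
    n = x = 0
    while x != 1 and n < limit:
        x += random.choice([-1, 1, 1])  # twice as many rightward steps
        n += 1
    return n

random.seed(0)
mean = sum(nwalk() for i in range(10000)) / 10000.0
print(mean)  # close to the analytical value of 3
```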

In this long post, we thoroughly investigated the random walk and the lack of convergence in the average we noted when equiprobable steps are used. We then pursued this using both computational graph methods as well as analytical results. It's important to reflect on what would have happened if we had not noticed the strange convergence of the equiprobable case. Most likely, we would have just ignored it as some kind of sampling problem that is cured asymptotically. However, that did not happen here, and this kind of thing is easy to miss in real problems that have not been so heavily studied as the random walk. Thus, the moral of the story is that it pays to have a wide variety of analytical and computational tools (e.g. from the scientific Python stack) available, and to use both of them in tandem to chase down strange results, however mildly unexpected. Furthermore, concepts that sit at the core of more elaborate methods should be understood, or at least characterized as carefully as possible, because once these bleed into complicated meta-models, it may be impossible to track the resulting errors down to the source.