A very simple stochastic model of diachronic change

Consider an entity (for example, a language) which may or may not have a particular property (for example, obligatory coding of grammatical number). For convenience and interpretation-neutrality, we shall say that the entity is positive if it has this property and negative if it does not. Consider the entity as it changes over the course of a number of events (for example, transmissions of the language from one generation to another) in which the entity’s state (whether it is positive or negative) may or may not change. For every nonnegative integer $n$, let $X_n$ represent the entity’s state after exactly $n$ events have occurred, with negativity being represented by 0 and positivity being represented by 1. The initial state $X_0$ is a constant parameter of the model, but the states at other times are random variables whose “success” probabilities (i.e. the values of 1 under their probability mass functions) are determined by $X_0$ and the other parameters of the model.

The other parameters of the model, besides $X_0$, are denoted by $p$ and $q$. These represent the probabilities that an event will change the state from negative to positive or from positive to negative, respectively. They are assumed to be constant across events; this assumption can be thought of as an interpretation of the uniformitarian principle familiar from historical linguistics and other fields. I shall call a change of state from negative to positive a gain and a change of state from positive to negative a loss, so that $p$ can be thought of as the gain rate per event and $q$ can be thought of as the loss rate per event.

Note that the gain resp. loss probability is $p$ resp. $q$ only if the state is negative resp. positive as the event begins. If the state is already positive resp. negative as the event begins then it is impossible for a further gain resp. loss to occur and therefore the gain resp. loss probability is 0 (but the loss resp. gain probability is $q$ resp. $p$). Thus the random variables $X_1$, $X_2$, $X_3$, … are not necessarily independent of one another.
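To make the transition rule concrete, here is a minimal Python sketch of a single run of the process. The function names `step` and `simulate` are illustrative assumptions, not taken from the code linked later in this post; `p`, `q` and `x0` stand for the gain probability, the loss probability and the initial state.

```python
import random

def step(state, p, q, rng):
    """Apply one event: a negative state may gain, a positive state may lose."""
    if state == 0:
        return 1 if rng.random() < p else 0
    return 0 if rng.random() < q else 1

def simulate(x0, p, q, n_events, rng=None):
    """Return the list of states after 0, 1, ..., n_events events for one run."""
    rng = rng or random.Random()
    states = [x0]
    for _ in range(n_events):
        # the next state depends on the current one, hence the non-independence
        states.append(step(states[-1], p, q, rng))
    return states
```

Only one of the two probabilities is consulted at each event, depending on the current state, which is exactly why successive states are not independent.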

I am aware that there’s a name for a sequence of random variables that are not necessarily independent of one another, namely “stochastic process”. However, that is about the extent of what I know about stochastic processes. I think the thing I’m talking about in this post is a very simple example of a stochastic process; an appropriate name for it would be the gain-loss process. If you know something about stochastic processes it might seem very trivial, but it was an interesting problem for me to try to figure out without any prior knowledge of stochastic processes.

1.2. The solution

Suppose $n$ is a nonnegative integer and consider the state $X_{n+1}$ after exactly $n + 1$ events have occurred. If the entity is negative as the $(n+1)$th event begins, the probability of gain during the $(n+1)$th event is $p$. If the entity is positive as the $(n+1)$th event begins, the probability of loss during the $(n+1)$th event is $q$. Now, as the $(n+1)$th event begins, exactly $n$ events have already occurred. Therefore the probability that the entity is negative as the $(n+1)$th event begins is $P(X_n = 0)$ and the probability that the entity is positive as the $(n+1)$th event begins is $P(X_n = 1)$. It follows by the law of total probability that

$$\begin{aligned} P(X_{n+1} = 1) &= p\,P(X_n = 0) + (1 - q)\,P(X_n = 1) \\ &= p\,(1 - P(X_n = 1)) + (1 - q)\,P(X_n = 1) \\ &= p + (1 - p - q)\,P(X_n = 1). \end{aligned}$$
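The step above can be iterated numerically. The following is a small sketch (the function name `success_probabilities` is an assumption made for illustration): the probability of being positive after the next event is the probability of a gain from a negative state plus the probability of no loss from a positive state.

```python
def success_probabilities(x0, p, q, n_events):
    """Return the probabilities of being positive after 0, 1, ..., n_events events."""
    probs = [float(x0)]  # the initial state is known, so this is 0.0 or 1.0
    for _ in range(n_events):
        prev = probs[-1]
        # law of total probability: gain from negative, or no loss from positive
        probs.append(p * (1 - prev) + (1 - q) * prev)
    return probs
```

For example, `success_probabilities(0, 0.5, 0.25, 2)` walks through two events starting from a negative state.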

This recurrence relation can be solved using the highly sophisticated method of “use it to find general equations for the first few terms in the sequence, extrapolate the pattern, and confirm that the extrapolation is valid using a proof by induction”. I’ll spare you the laborious first phase, and just show you the second and third. The solution is

$$P(X_n = 1) = \begin{cases} p \sum_{i=0}^{n-1} (1 - p - q)^i & \text{if } X_0 = 0, \\ 1 - q \sum_{i=0}^{n-1} (1 - p - q)^i & \text{if } X_0 = 1. \end{cases}$$

Just so you can check that this is correct, the proofs by induction for the separate cases are given below.

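Case 1 ($X_0 = 0$). Base case. The expression

$$p \sum_{i=0}^{n-1} (1 - p - q)^i$$

evaluates to 0 if $n = 0$, because the sum is empty.

Successor case. For every nonnegative integer $n$ such that

$$P(X_n = 1) = p \sum_{i=0}^{n-1} (1 - p - q)^i$$

we have

$$\begin{aligned} P(X_{n+1} = 1) &= p + (1 - p - q)\,P(X_n = 1) \\ &= p + (1 - p - q)\,p \sum_{i=0}^{n-1} (1 - p - q)^i \\ &= p \left(1 + \sum_{i=1}^{n} (1 - p - q)^i\right) \\ &= p \sum_{i=0}^{n} (1 - p - q)^i. \end{aligned}$$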

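Case 2 ($X_0 = 1$). Base case. The expression

$$1 - q \sum_{i=0}^{n-1} (1 - p - q)^i$$

evaluates to 1 if $n = 0$, because the sum is empty.

Successor case. For every nonnegative integer $n$ such that

$$P(X_n = 1) = 1 - q \sum_{i=0}^{n-1} (1 - p - q)^i$$

we have

$$\begin{aligned} P(X_{n+1} = 1) &= p + (1 - p - q)\,P(X_n = 1) \\ &= p + (1 - p - q) \left(1 - q \sum_{i=0}^{n-1} (1 - p - q)^i\right) \\ &= 1 - q - q \sum_{i=1}^{n} (1 - p - q)^i \\ &= 1 - q \sum_{i=0}^{n} (1 - p - q)^i. \end{aligned}$$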
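For readers who prefer a numerical spot check alongside the induction, here is a short sketch that compares the summation form of the solution with direct iteration of the recurrence. The function names and the parameter values used in the check are assumptions chosen for illustration.

```python
def closed_form_sum(x0, p, q, n):
    """Probability of being positive after n events, via the explicit sum."""
    s = sum((1 - p - q) ** i for i in range(n))  # empty sum (0) when n == 0
    return p * s if x0 == 0 else 1 - q * s

def by_recurrence(x0, p, q, n):
    """Same probability, obtained by applying the recurrence n times."""
    prob = float(x0)
    for _ in range(n):
        prob = p * (1 - prob) + (1 - q) * prob
    return prob

# Largest disagreement over both initial states and the first 20 event counts
max_gap = max(
    abs(closed_form_sum(x0, 0.3, 0.1, n) - by_recurrence(x0, 0.3, 0.1, n))
    for x0 in (0, 1)
    for n in range(20)
)
```

A gap at the level of floating-point rounding is what the proofs predict; it is a sanity check, not a substitute for them.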

I don’t know if there is any way to make sense of why exactly these equations are the way they are; if you have any ideas, I’d be interested to hear your comments. There is a nice way I can see of understanding the difference between the two cases. Consider an additional gain-loss process $Y$ which changes in tandem with the gain-loss process $X$ that we’ve been considering up till just now, so that its state is always the opposite of that of $X$. Then the gain rate of $Y$ is $q$ (because if $Y$ gains, $X$ loses) and the loss rate of $Y$ is $p$ (because if $Y$ loses, $X$ gains). And for every nonnegative integer $n$, if we let $Y_n$ denote the state of $Y$ after exactly $n$ events have occurred, then

$$P(Y_n = 1) = 1 - P(X_n = 1)$$

because $Y_n = 1$ if and only if $X_n = 0$. Of course, we can also rearrange this equation as $P(X_n = 1) = 1 - P(Y_n = 1)$.

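Now, if $X_0 = 1$, so that $Y$ starts out negative, we can use the equation for Case 1 above, but with the appropriate variable names for $Y$ substituted in, to see that

$$P(Y_n = 1) = q \sum_{i=0}^{n-1} (1 - q - p)^i$$

and it then follows that

$$P(X_n = 1) = 1 - q \sum_{i=0}^{n-1} (1 - p - q)^i.$$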

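Anyway, you may have noticed that the sum

$$\sum_{i=0}^{n-1} (1 - p - q)^i$$

which appears in both of the equations for $P(X_n = 1)$ is a geometric progression whose common ratio is $1 - p - q$. If $1 - p - q = 1$, then $p + q = 0$ and therefore $p = q = 0$ (because $p$ and $q$ are probabilities, and therefore non-negative). The probability $P(X_n = 1)$ is then simply constant at 0 if $X_0 = 0$ (because gain is impossible) and constant at 1 if $X_0 = 1$ (because loss is impossible). Outside of this very trivial case, we have $1 - p - q \ne 1$, and therefore the geometric progression may be written as a fraction as per the well-known formula:

$$\sum_{i=0}^{n-1} (1 - p - q)^i = \frac{1 - (1 - p - q)^n}{1 - (1 - p - q)} = \frac{1 - (1 - p - q)^n}{p + q}.$$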

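It follows that

$$P(X_n = 1) = \begin{cases} \dfrac{p - p\,(1 - p - q)^n}{p + q} & \text{if } X_0 = 0, \\[2ex] \dfrac{p + q\,(1 - p - q)^n}{p + q} & \text{if } X_0 = 1. \end{cases}$$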
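As a sketch, the closed form translates into a few lines of Python. The function name `prob_positive` is an assumption; the trivial case in which neither gain nor loss is possible is handled separately to avoid dividing by zero.

```python
def prob_positive(x0, p, q, n):
    """Closed-form probability that the state is positive after n events."""
    if p + q == 0:
        return float(x0)  # neither gain nor loss can occur, so the state never changes
    r = (1 - p - q) ** n  # the only part of the formula that depends on n
    if x0 == 0:
        return (p - p * r) / (p + q)
    return (p + q * r) / (p + q)
```

Unlike the summation form, this costs a single exponentiation regardless of how many events have occurred.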

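From these equations it is easy to see the limiting behaviour of the gain-loss process as the number of events approaches $\infty$. If $1 - p - q = -1$, then $p + q = 2$ and therefore $p = q = 1$ (because $p$ and $q$ are probabilities, and therefore not greater than 1). The equations in this case reduce to

$$P(X_n = 1) = \begin{cases} \dfrac{1 - (-1)^n}{2} & \text{if } X_0 = 0, \\[2ex] \dfrac{1 + (-1)^n}{2} & \text{if } X_0 = 1, \end{cases}$$

which show that the state simply alternates deterministically back and forth between positive and negative (because $\frac{1 - (-1)^n}{2}$ is 0 if $n$ is even and 1 if $n$ is odd and $\frac{1 + (-1)^n}{2}$ is 1 if $n$ is even and 0 if $n$ is odd).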

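Otherwise, we have $|1 - p - q| < 1$ and therefore

$$\lim_{n \to \infty} (1 - p - q)^n = 0.$$

Now the equations for $X_0 = 0$ and $X_0 = 1$ above are the same apart from the term in the numerator which contains $(1 - p - q)^n$ as a factor, as well as another factor which is independent of $n$. Therefore, regardless of the value of $X_0$,

$$\lim_{n \to \infty} P(X_n = 1) = \frac{p}{p + q}.$$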

This is a nice result: if $n$ is sufficiently large, the dependence of $X_n$ on $X_0$, $X_1$, … and $X_{n-1}$ is negligible and its success probability is negligibly different from $\frac{p}{p + q}$. That it is this exact quantity sort of makes sense: it’s the ratio of the gain rate to the theoretical rate of change of state in either direction that we would get if both a gain and loss could occur in a single event.
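Another way to see where this quantity comes from, offered as a sketch rather than as part of the derivation above: it is the one value that the recurrence $P(X_{n+1} = 1) = p + (1 - p - q)\,P(X_n = 1)$ leaves unchanged. Writing $\pi$ (a symbol introduced here for convenience) for such a value and assuming $p + q > 0$:

```latex
\begin{align*}
\pi &= p + (1 - p - q)\,\pi \\
(p + q)\,\pi &= p \\
\pi &= \frac{p}{p + q}
\end{align*}
```

Subtracting $\pi$ from both sides of the recurrence gives $P(X_{n+1} = 1) - \pi = (1 - p - q)\,(P(X_n = 1) - \pi)$, so the distance from $\pi$ is multiplied by $1 - p - q$ at every event. That is one way of reading the factor $(1 - p - q)^n$ in the solution.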

In case you like graphs, here’s a graph of the process with fixed parameter values and 500 events. The x-axis is the number of events that have occurred and the y-axis is the observed frequency, divided by 1000, of the state being positive after this number of events has occurred (for the blue line) or the probability of the state being positive according to the equations described in this post (for the green line). If you want to, you can view the Python code that I used to generate this graph (which is actually capable of simulating multiple-trait interactions, although I haven’t tried solving it in that case) on GitHub.
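The comparison behind such a graph can be sketched in a few lines. This is not the code linked above, only a minimal stand-in: the function names, the seed and the parameter values `p=0.05`, `q=0.02` are hypothetical, and the figure of 1000 runs is an assumption read off from the “divided by 1000” in the description.

```python
import random

def observed_frequency(x0, p, q, n_events, n_runs, seed=0):
    """Fraction of simulated runs that are positive after each number of events."""
    rng = random.Random(seed)
    counts = [0] * (n_events + 1)
    for _ in range(n_runs):
        state = x0
        counts[0] += state
        for n in range(1, n_events + 1):
            if state == 0:
                state = 1 if rng.random() < p else 0  # possible gain
            else:
                state = 0 if rng.random() < q else 1  # possible loss
            counts[n] += state
    return [c / n_runs for c in counts]

def predicted_probability(x0, p, q, n):
    """Closed-form probability of being positive after n events, assuming p + q > 0."""
    r = (1 - p - q) ** n
    return (p - p * r) / (p + q) if x0 == 0 else (p + q * r) / (p + q)

# Hypothetical parameter values, not the ones behind the graph
freqs = observed_frequency(x0=0, p=0.05, q=0.02, n_events=500, n_runs=1000)
print(freqs[500], predicted_probability(0, 0.05, 0.02, 500))
```

Plotting `freqs` against `predicted_probability` for each event count should give two curves of the kind described, with the simulated one fluctuating around the predicted one.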

2. The continuous process

2.1. The problem

Let us now consider the same process, but continuous rather than discrete. That is, rather than the gains and losses occurring over the course of a discrete sequence of events, we now have a continuous interval in time, during which at any point losses and gains might occur instantaneously. The state of the process at time $t$ shall be denoted $X(t)$. Although multiple gains and losses may occur during an arbitrary subinterval, we may assume for the purpose of approximation that during sufficiently short subintervals only one gain or loss, or none, may occur, and the probabilities of gain and loss are directly proportional to the length of the subinterval. Let $\lambda$ be the constant of proportionality for gain and let $\mu$ be the constant of proportionality for loss. These are the continuous model’s analogues of the $p$ and $q$ parameters in the discrete model. Note that they may be greater than 1, unlike $p$ and $q$.

2.2. The solution

Suppose $t$ is a non-negative real number and $n$ is a positive integer. Let $\delta = t/n$. The interval in time from time 0 to time $t$ can be divided up into $n$ subintervals of length $\delta$. If $\delta$ is small enough, so that the approximating assumptions described in the previous section can be made, then the subintervals can be regarded as discrete events, during each of which gain occurs with probability $\lambda\delta$ if the state at the start point of the subinterval is negative and loss occurs with probability $\mu\delta$ if the state at the start point of the subinterval is positive. For every integer $k$ between 0 and $n$ inclusive, let $Z_k$ denote the state of this discrete approximation of the process at time $k\delta$. Then for every integer $k$ between 0 and $n$ (inclusive) we have

$$P(Z_k = 1) = \begin{cases} \dfrac{\lambda\delta - \lambda\delta\,(1 - \lambda\delta - \mu\delta)^k}{\lambda\delta + \mu\delta} & \text{if } X(0) = 0, \\[2ex] \dfrac{\lambda\delta + \mu\delta\,(1 - \lambda\delta - \mu\delta)^k}{\lambda\delta + \mu\delta} & \text{if } X(0) = 1, \end{cases}$$

provided $\lambda$ and $\mu$ are not both equal to 0 (in which case, just as in the discrete case, the state remains constant at whatever the initial state was).

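Many of the factors in this equation can be cancelled out, giving us

$$P(Z_k = 1) = \begin{cases} \dfrac{\lambda - \lambda\,(1 - (\lambda + \mu)\delta)^k}{\lambda + \mu} & \text{if } X(0) = 0, \\[2ex] \dfrac{\lambda + \mu\,(1 - (\lambda + \mu)\delta)^k}{\lambda + \mu} & \text{if } X(0) = 1. \end{cases}$$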

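Now consider the case $k = n$ in the limit as $n$ approaches $\infty$. Note that $\delta$ approaches 0 at the same time, because $\delta = t/n$, and therefore the limit of $(1 - (\lambda + \mu)\delta)^n$ is not simply 0 as in the discrete case. If we rewrite the expression as

$$\left(1 - \frac{(\lambda + \mu)\,t}{n}\right)^n$$

and make the substitution $m = \frac{n}{(\lambda + \mu)\,t}$, giving us

$$\left(\left(1 - \frac{1}{m}\right)^m\right)^{(\lambda + \mu)\,t}$$

then we see that the limit is in fact $e^{-(\lambda + \mu)t}$, an exponential function of $t$. It follows that

$$P(X(t) = 1) = \lim_{n \to \infty} P(Z_n = 1) = \begin{cases} \dfrac{\lambda - \lambda\,e^{-(\lambda + \mu)t}}{\lambda + \mu} & \text{if } X(0) = 0, \\[2ex] \dfrac{\lambda + \mu\,e^{-(\lambda + \mu)t}}{\lambda + \mu} & \text{if } X(0) = 1. \end{cases}$$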

This is a pretty interesting result. I initially thought that the continuous process would just have the solution $P(X(t) = 1) = \frac{\lambda}{\lambda + \mu}$, completely independent of $X(0)$ and $t$, based on the idea that it could be viewed as a discrete process with an infinitely large number of events within every interval of time, so that it would constantly behave like the discrete process does in the limit as the number of events approaches infinity. In fact it turns out that it still behaves like the discrete process, with the effect of the initial state never quite disappearing, although it does of course disappear in the limit as $t$ approaches $\infty$, because $e^{-(\lambda + \mu)t}$ approaches 0:

$$\lim_{t \to \infty} P(X(t) = 1) = \frac{\lambda}{\lambda + \mu}.$$
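The limiting argument can also be checked numerically. The sketch below compares the exponential formula with the discrete model applied to a very fine subdivision of the interval. The names `gain_rate` and `loss_rate` stand for the two constants of proportionality, and the function names and test values are assumptions made for illustration.

```python
import math

def continuous_prob(x0, gain_rate, loss_rate, t):
    """Exponential-formula probability of being positive at time t.

    Assumes gain_rate + loss_rate > 0.
    """
    total = gain_rate + loss_rate
    decay = math.exp(-total * t)
    if x0 == 0:
        return (gain_rate - gain_rate * decay) / total
    return (gain_rate + loss_rate * decay) / total

def discretized_prob(x0, gain_rate, loss_rate, t, n):
    """Same probability from the discrete model with n events of length t / n.

    Assumes t > 0 and gain_rate + loss_rate > 0.
    """
    delta = t / n
    p, q = gain_rate * delta, loss_rate * delta  # per-subinterval probabilities
    r = (1 - p - q) ** n
    return (p - p * r) / (p + q) if x0 == 0 else (p + q * r) / (p + q)

# Gap between the two at t = 0.7 with a million subintervals, for both initial states
gaps = [
    abs(continuous_prob(x0, 2.0, 3.0, 0.7) - discretized_prob(x0, 2.0, 3.0, 0.7, 1_000_000))
    for x0 in (0, 1)
]
```

Increasing `n` should shrink the gaps further, which is the numerical counterpart of taking the limit in the derivation.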

4 responses to “A very simple stochastic model of diachronic change”

Nice post! Did you come up with it as a result of contemplating some question in historical linguistics?

I have one suggestion for an edit, in this paragraph:

Note that the gain/loss probability is {p}/{q} only if the state is negative/positive as the event begins. If the state is already positive/negative as the event begins then it is impossible for a further gain/loss to occur and therefore the gain/loss probability is 0 (but the loss/gain probability is {q}/{p}). Thus the random variables {X_1}, {X_2}, {X_3}, … are not necessarily independent of one another.

The “p/q”s threw me for a loop the first time I was reading it, because they look like fractions. In situations like this, I usually use “respectively” as in “Note that the probability of gain (resp. loss) is p (resp. q) only if the state is negative (resp. positive) as the event begins.” This is a technique which I’ve found to be very useful in mathematical writing. I’m not sure that the “resp.”s wouldn’t become too stilted as the paragraph continues, though.

It was historical linguistics that led me to start thinking about this model, because I wanted to see if my intuitions about how the frequencies of particular change processes influence the frequencies of the traits those processes affect could be illustrated mathematically. For example, it’s intuitively clear that if a trait is a lot easier to gain than to lose then it should, given enough time, become common in the population, but the model tells us something about exactly how common it should end up being on average, and what the variance will be. Of course I’m not sure how accurate the model will be as applied to languages, since there are many complicating factors it ignores (e.g. interactions between traits, interactions between languages, etc.).

I don’t like the “resp.” convention for a silly reason: it requires one of the things to be put in parentheses, but this is unfair to the thing which has to be reduced to a parenthetical 🙂 Then again, I guess you could write “p resp. q” without the parentheses and see “resp.” as a coordinating conjunction. I’ll edit it that way.