(a) Two monkeys are typing capital letters (A-Z) randomly. The first stops typing when the word COCONUT appears as seven successive letters. The second stops typing when TUNOCOC appears; TUNOCOC is simply COCONUT spelt backwards. Which monkey is expected to type more?

(b) A monkey is typing capital letters (A-Z) randomly. Two squirrels observe the sequence of letters thus generated. The first squirrel wins if COCONUT appears (as seven successive letters) before TUNOCOC. The second squirrel wins if TUNOCOC appears before COCONUT. Which squirrel is more likely to win?

@EricDuminil I think they'd be equivalent if (b) had two monkeys typing and each squirrel was watching a different monkey. It's a different scenario when both are watching the same monkey.
– jafe, Nov 23 '18 at 11:43

Please clarify: in (a), do the monkeys generate separate input strings or do they co-operate on one?
– Weckar E., Nov 23 '18 at 17:22

Clarification? Are the two monkeys typing on two computers, each creating their own string of letters? Or are they typing in parallel on the same keyboard, creating a unified line of letters?
– Falco, Nov 26 '18 at 13:37

15 Answers
(a) I claim that the expected typing lengths are the same for both monkeys. I guess something in my argument will be incorrect, as jafe's answer has 9 approvals, but finding that incorrectness would be helpful to me.

I prove that the probability that a monkey ends his typing after exactly $n$ letters is the same for both monkeys, which then implies that the expected typing lengths are the same.

In order for the typing to end after letter number $n$, the last seven letters must be exactly the monkey's preferred word "COCONUT" / "TUNOCOC" and that word cannot ever have appeared before in the text. That word appearing before can be of two kinds:

1. It could share some letters with the last 7, or

2. be completely contained in the first $n-7$ letters.

It's easy to see that for the first monkey, case 1 cannot happen: If the word "COCONUT" shared some of its letters with the last 7 letters, then the last letter "T" would have to appear a second time in "COCONUT" (besides last place), which it doesn't.

The argument is slightly more complicated for the second monkey, as the last letter ("C") of "TUNOCOC" does appear a second time in it. But "TUNOC" is not the ending part of "TUNOCOC", so case 1 cannot happen for the second monkey either.

Let's recap: In order for each monkey to end typing after exactly $n$ letters, the text it produced must fulfill two criteria:

A) The last seven letters must be exactly the monkey's preferred word "COCONUT"/"TUNOCOC", and

B) the first $n-7$ letters must not contain that word.

This is an equivalence: Any sequence of $n$ letters that fulfills A and B (for any of the two monkeys) is a sequence that has that monkey stop after exactly $n$ letters, and vice versa.

Since each letter typed is independent of the others, the events that A or B apply to a sequence of $n$ letters (for a given monkey) are also independent (A concerns only the last seven letters, B only the first $n-7$), so the respective probabilities can be multiplied. If we use indices $1$ and $2$ for the first and second monkey, we obviously get

$$ p_1(A)=p_2(A)=\left(\frac1{26}\right)^7$$

Calculating $p_{1/2}(B)$ directly is much harder, but they are the same: Both consider sequences of $n-7$ independent, uniformly distributed letters, and if and only if such a sequence does (does not) contain the word "COCONUT", then the reversed sequence (which is again a sequence of $n-7$ independent, uniformly distributed letters) does (does not) contain the word "TUNOCOC".

So we get

$$p_1(B) = p_2(B)$$

and this proves my initial claim: The probability that the typing ends after exactly $n$ letters is $p_1(A)p_1(B)$ for the first monkey and $p_2(A)p_2(B)$ for the second, and they are the same.
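The claim can also be checked by exact counting. The following is a minimal Python sketch, not part of the original argument: the function name `first_hit_counts` and its helper `step` are made up for illustration. It tracks how many letters of the word are currently matched and counts, for each $n$, the strings of length $n$ whose first occurrence of the word ends exactly at letter $n$.

```python
from string import ascii_uppercase


def first_hit_counts(word, max_len, alphabet=ascii_uppercase):
    """counts[n] = number of length-n strings whose FIRST occurrence
    of `word` ends exactly at letter n (illustrative helper)."""
    m = len(word)

    def step(k, letter):
        # Longest suffix of word[:k] + letter that is also a prefix of word.
        s = word[:k] + letter
        for j in range(min(len(s), m), 0, -1):
            if s.endswith(word[:j]):
                return j
        return 0

    # ways[k]: strings typed so far that never contained `word`
    # and currently end with exactly k matched letters.
    ways = [0] * m
    ways[0] = 1
    counts = [0] * (max_len + 1)
    for n in range(1, max_len + 1):
        new_ways = [0] * m
        for k, w in enumerate(ways):
            if w == 0:
                continue
            for letter in alphabet:
                j = step(k, letter)
                if j == m:
                    counts[n] += w      # word completed for the first time
                else:
                    new_ways[j] += w
        ways = new_ways
    return counts


coconut = first_hit_counts("COCONUT", 30)
tunococ = first_hit_counts("TUNOCOC", 30)
print(coconut == tunococ)
```

Dividing `counts[n]` by $26^n$ gives the probability of stopping after exactly $n$ letters, so equal counts mean equal stopping-time distributions up to the length checked.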

+++++++

I cannot say why jafe's argument is incorrect, it is very compelling, and as I said initially, it may very well be that it is correct and something I said is incorrect. I just don't see what that may be, in either case.

ADDITION after more thinking and reading the comments:

I think the flaw in jafe's argument is in double counting after "COCOCO". If the next letter is "N", this is counted as 'good' both for the ending "COCO" with continuation "NUT" and for the initial "COCO" with continuation "CONUT". However, both can only lead to the same end result "COCOCONUT", so the additional "CONUT" option is not really additional.

The caveat is that you show that the probability for each monkey to end after $n$ letters is the same assuming that they both type at least $n$ letters. But monkey A has a higher probability of having finished beforehand.
– Evargalo, Nov 23 '18 at 12:30

How can that be? If the first monkey has a higher probability of finishing beforehand than monkey 2, it has to have a higher probability of having finished after exactly $k$ letters for some $k < n$, and then my argument applies again. In addition, I do not assume that they have typed at least $n$ letters; I have given exact conditions for that to be true and shown that they have equal probability.
– Ingix, Nov 23 '18 at 12:42

Again, if you think that the first monkey has a higher probability of finishing beforehand, which do you think is the first letter index at which the first monkey has a higher probability of having finished?
– Ingix, Nov 23 '18 at 12:50

I am now convinced that this is the correct answer. Very confusing problem, but the flaw is in the other reasoning (see my comment under Viktor Mellgren's answer).
– Evargalo, Nov 23 '18 at 13:49

I wrote a Python script with a smaller alphabet ('ABCNOTU'). Both monkeys seem to need around 800000 letters before writing the desired word. I repeated the experiment 300 times and couldn't find any bias toward monkey 1 or 2.
– Eric Duminil, Nov 23 '18 at 14:11

Regardless of the previous letters, the TUNOCOC monkey has 1/26 chance of getting the next letter right. Clearly the COCONUT monkey has at least this, but turns out it does even better: Once it has typed COCO, it has two ways to get the correct result (NUT or CONUT).

Sounds very compelling, but I still think it is wrong; see my solution below. I would be very interested in you (and everybody else, of course) checking my approach.
– Ingix, Nov 23 '18 at 12:12

I reduced the problem to the monkeys typing FFA or AFF, which I think is equivalent. Then I made a computer simulation where I had them input random characters and select a winner. The simulation does NOT support your answer. They both won the same number of times.
– Pieter B, Nov 23 '18 at 13:00

PieterB & Evargalo, what is the character set of the typewriter in the FFA example? I think it needs a third 'wrong' character.
– elias, Nov 23 '18 at 13:52

I listed out all combinations of C,O,U,N,T for up to 11 letters. Indeed both COCONUT and TUNOCOC show up the same number of times for each letter count. D'oh!
– jafe, Nov 23 '18 at 14:12

Monkey problem

To settle which monkey is faster on average, I'll use Markov chains and Mathematica. Define state $i$, for $i = 0..6$, to mean that monkey 1 has currently written $i$ correct consecutive characters of COCONUT, but has never written all of COCONUT. Define state 7 as monkey 1 having written COCONUT at least once. Since state 7 cannot be escaped, we say that it is absorbing (the other states are transient). The transition matrix for monkey 1 is:

Because the monkeys start from the situation when nothing has been written yet, we see that the expected time for them to write their word is the same $C = 8031810176$

This result can also be obtained directly in Mathematica:

d1 = FirstPassageTimeDistribution[p1, 8];
Mean[d1]

In fact, we can compute the characteristic function for the absorption-time-distribution:

CharacteristicFunction[d1, x]

which evaluates to

e^(7ix) / (C - C e^(ix) + e^(7ix))

The distributions for both monkeys have this same characteristic function. Wikipedia claims that cumulative distribution functions and characteristic functions are in one-to-one correspondence. This implies that the monkeys' finishing-time-distributions are the same.
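For readers without Mathematica, the expected time can be cross-checked in Python. This is only a sketch under my own naming, not the computation used above: `expected_letters` and `step` are hypothetical helpers, a state is the number of currently matched letters, and the linear system $E_k = 1 + \frac{1}{26}\sum_{\text{letters}} E_{\text{next}}$ is solved exactly with `fractions.Fraction`.

```python
from fractions import Fraction
from string import ascii_uppercase


def expected_letters(word, alphabet=ascii_uppercase):
    """Expected number of random letters until `word` first appears."""
    m, q = len(word), len(alphabet)

    def step(k, letter):
        # Longest suffix of word[:k] + letter that is also a prefix of word.
        s = word[:k] + letter
        for j in range(min(len(s), m), 0, -1):
            if s.endswith(word[:j]):
                return j
        return 0

    # One equation per transient state k = 0..m-1:
    #   E_k - (1/q) * sum over letters of E_next = 1,  with E_m = 0.
    rows = []
    for k in range(m):
        row = [Fraction(0)] * (m + 1)   # last entry is the right-hand side
        row[k] += 1
        for letter in alphabet:
            j = step(k, letter)
            if j < m:
                row[j] -= Fraction(1, q)
        row[m] = Fraction(1)
        rows.append(row)

    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [x / pv for x in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return rows[0][m]                   # start state: nothing matched yet


print(expected_letters("COCONUT"), expected_letters("TUNOCOC"))
```

Both calls return $26^7 = 8031810176$, in agreement with the value of $C$ above.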

Squirrel problem

The transition matrix for a monkey aiming for both words COCONUT and TUNOCOC is:

Here the states have been numbered in order - C CO COC COCO COCON COCONU T TU TUN TUNO TUNOC TUNOCO COCONUT TUNOCOC. Note that the absorbing states come last. The same reference as above shows that we can compute the probability $B3_{ij}$ of ending from a transient state $i$ to an absorption state $j + 13$ as follows:

Since the monkey starts from nothing, the probabilities of writing COCONUT first and TUNOCOC first are 308915099 and 308915775, respectively, divided by 617830874. These correspond to 0.4999994529 and 0.5000005471, computed with Mathematica to 10 decimal places. Therefore the monkey is more likely to write TUNOCOC before COCONUT than the other way round.
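The same absorption probabilities can be reproduced without Mathematica. The sketch below is an illustration rather than the original computation: `win_probability` and `step` are hypothetical names, the states are the proper prefixes of the two words, and it assumes that neither word contains the other. It solves $P_s = \frac{1}{26}\sum_{\text{letters}} P_{\text{next}}$ exactly, with $P = 1$ at the target word and $P = 0$ at the rival word.

```python
from fractions import Fraction
from string import ascii_uppercase


def win_probability(target, rival, alphabet=ascii_uppercase):
    """Probability that `target` appears before `rival` in random typing.
    Assumes neither word is a substring of the other."""
    words = (target, rival)
    q = len(alphabet)
    # Transient states: all proper prefixes of either word ("" included).
    states = sorted({w[:k] for w in words for k in range(len(w))})
    index = {s: i for i, s in enumerate(states)}
    n = len(states)

    def step(state, letter):
        # Longest suffix of state + letter that is a prefix of some word.
        s = state + letter
        for start in range(len(s) + 1):
            if any(w.startswith(s[start:]) for w in words):
                return s[start:]

    rows = []
    for s in states:
        row = [Fraction(0)] * (n + 1)   # last entry is the right-hand side
        row[index[s]] += 1
        for letter in alphabet:
            t = step(s, letter)
            if t == target:
                row[n] += Fraction(1, q)            # absorbed with a win
            elif t != rival:
                row[index[t]] -= Fraction(1, q)     # still undecided
        rows.append(row)

    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [x / pv for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return rows[index[""]][n]


print(win_probability("COCONUT", "TUNOCOC"))
print(win_probability("TUNOCOC", "COCONUT"))
```

The two printed fractions are 308915099/617830874 and 308915775/617830874, matching the figures above.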

Thanks for doing the work of establishing the Markov matrices and doing the calculations! It should be noted that the discussed effects about the repeated sub-words "CO" and "OC" can be seen: the expected value vectors start and end the same, but have different values for COC/TUN and COCO/TUNO. The differences just vanish again in the end. The reason for this is IMO that COCONUT cannot overlap with itself in a text (like e.g. NUTNU could). That means that for all such words of the same length (say "TOBACCO"), the expected time should be exactly the same.
– Ingix, Nov 24 '18 at 19:18

Thanks. This is a great answer for learning some cool tricks in Mathematica.
– Erel Segal-Halevi, Nov 26 '18 at 8:26

This is a great answer! In my answer I calculated the probabilities for the two squirrels in the second problem too, but my result is slightly more in favour of the second squirrel (rounds to around 50.037 %). I'd expect the methods used to be equivalent, so maybe there's some typo or silly error in one of the two answers? I'll check when I have more time.
– abl, Nov 27 '18 at 9:30

Take a simpler example where, instead of typing "COCONUT" or "TUNOCOC", the monkeys are trying to type "AB" or "CA", and the keyboard only has "A", "B", and "C" keys. Consider what happens after the first three keystrokes, for which there are 27 possibilities.

In this case:

In 5 out of the 27 possibilities, "AB" appears and "CA" does not. Similarly, in 5 of the possibilities, "CA" appears and "AB" does not. But there's also one possibility, "CAB", where both appear, and squirrel 2 wins. In this instance, the "A" is useful for both, but it was useful for the second squirrel earlier. The "setup" for the first squirrel to match will often result in the second squirrel already having matched.
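These counts are small enough to verify by brute force. A minimal Python sketch (the function name `tally` is made up for illustration):

```python
from itertools import product


def tally(first="AB", second="CA", alphabet="ABC", length=3):
    """Count strings containing only `first`, only `second`, or both."""
    only_first = only_second = both = 0
    for letters in product(alphabet, repeat=length):
        text = "".join(letters)
        has_first, has_second = first in text, second in text
        if has_first and has_second:
            both += 1
        elif has_first:
            only_first += 1
        elif has_second:
            only_second += 1
    return only_first, only_second, both


print(tally())  # (only "AB", only "CA", both)
```

For the default arguments this gives 5, 5 and 1; the single string with both words is "CAB", in which "CA" is complete one letter before "AB".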

Now consider:

A similar thing is true of "COCONUT" versus "TUNOCOC". The first squirrel wants "COC" as a setup, but if "COC" does appear, it's often at the tail end (pun intended) of the second squirrel's victory.

It sounds plausible. But it would also mean that if there's any bias, it will probably be too small to be detected by random simulations, right?
– Eric Duminil, Nov 23 '18 at 18:45

@Duminil correct. Even simulating a single play-through would be fairly expensive.
– Sneftel, Nov 23 '18 at 20:11

Can't we check the CAB thesis very cheaply and extrapolate from there?
– Falco, Nov 26 '18 at 13:46

@Falco Sure, but there's no need for that. The situation is simple enough to solve analytically based on the initial conditions and the transition matrix. In that situation, the second squirrel wins precisely 60% of the time. Of course, I don't offer a formal proof that it is valid to extrapolate from AB to COCONUT.
– Sneftel, Nov 26 '18 at 14:51

I ran a simulation with "C,O,N,U,T,X" as possible letters (since the number of dummy letters will only make the bias smaller). And I got a fairly consistent bias of over 50% wins for the second squirrel.
– Falco, Nov 26 '18 at 14:51

@Ingix provides a good proof, one certainly worthy of the tick, but one that could perhaps use some reinforcement. The answer is that the number of letters $N$ has the same expected value for both monkeys. A few points that might help @jafe realise the flaw in his proof.

1) The order in which independent events occur has no directional bias. For example, drawing a King of Hearts followed by an Ace of Hearts from a 52-card deck is no more probable than an Ace followed by a King.

2) Whilst the COCOCONUT argument might seem to hold water, it's first important to realise that the probability of that 9-letter sequence is the same as the probability of DOCOCONUT, or for that matter TUTUNOCOC. Having the initial CO does not add anything to the probability, as there's still a 1/26 chance of getting the next letter correct.

3) The apparent gain of the fallback that COCOCONUT gives you is offset by the number of extra letters that you have to type. If you get the sequence COCOCONUX, you have wasted 9 letters, whereas TUNOCOX only wastes 7. (And COCOCONUX is far more improbable than TUNOCOX.)

TLDR: the order of independent variables has no favoured direction.

(P.S. This is my first stack answer, so feel free to give any constructive feedback)

I'm still confused about this, but your 3) point seems very good to me.
– George Menoutis, Nov 23 '18 at 14:26

Welcome to Puzzling! Yup, your first two points are exactly right. As for the wastage of letters, I don't think it's really a good argument. I'm not sure it works that way. IMO the reason why COCOCONUT is a false benefit is that, when considering the probability of the events happening, you are assuming the letters before the last 7 are essentially random. This means each letter has 26 possibilities. As such, the combination of CO- as a prefix to COCONUT is already considered within the probability count. To factor it in again is double counting.
– Arch2K, Nov 23 '18 at 14:52

Thanks for your feedback @Ong Yu Hann. The difference between the 3rd point and the others is that the first 2 are easily provable mathematical facts. The 3rd is simply to help people logically understand why COCOCOCO doesn't provide a probabilistic benefit.
– WittierDinosaur, Nov 23 '18 at 14:58

While it's true that COCONUT and TUNOCOC are equivalent, you are making a far more sweeping (and false) statement when you imply that all sequences of the same length are equivalent. For a simple counterexample that can be checked by hand (using Markov chains, or even just by trying it): if you flip coins over and over, on average it will take you 6 coinflips until you see HH, but only 4 coinflips until you see HT.
– Misha Lavrov, Nov 27 '18 at 1:14

There is a clever trick to finding the kind of expected times that appear in part (a). It turns out that both for COCONUT and TUNOCOC, the expected time is just $26^7$ (so the monkeys have equal expected times to win), but for this to be convincing, I will demonstrate the method on a case where the expected value is unexpected: the word ABRACADABRA.

Suppose that a monkey is typing randomly, and a family of beavers decides to bet on the outcome. The rules for betting are as follows:

Right before a new letter is typed, a new beaver shows up and is given $1$ dollar. The beaver bets that dollar on that letter being A, at $26:1$ odds.

If the beaver wins (and now has $26$ dollars), the beaver bets all of them on the next letter being B.

If the beaver wins (and now has $26^2$ dollars), the beaver bets all of them on the next letter being R.

In general, whenever the beaver wins the first $k$ bets (and has $26^k$ dollars), the beaver bets all of them on the next, $(k+1)$-th letter still matching the $(k+1)$-th letter of ABRACADABRA.

Any time a beaver loses a bet, that beaver has no more money and goes home. (But at any time, multiple beavers may be betting, because a new beaver shows up right before each letter.)

Each bet is fair, so after $n$ letters are typed, the expected amount of money the beavers have altogether is $n$: the $n$ dollars that the beavers get when they show up.

When ABRACADABRA first appears, we can count the total amount of money in the system:

One lucky beaver has $26^{11}$ dollars from betting on ABRACADABRA all the way through.

One beaver started on the ABRA at the end of ABRACADABRA, and has $26^4$ dollars.

One beaver just started betting before the last A, and has $26$ dollars.

All other beavers lost their bets and have no money.

So we are guaranteed to have $26^{11} + 26^4 + 26$ dollars in the system. But the expected amount of money in the system is equal to the number of letters typed. So in expectation, it takes $26^{11} + 26^4 + 26$ letters to type ABRACADABRA.

(Formally, we are invoking Wald's equation to see that the rule "the expected money in the system is equal to the expected number of letters typed" still holds when the number of letters typed depends on the letters that have been typed - when we stop once ABRACADABRA is reached)

Now, going back to COCONUT and TUNOCOC: neither word ends with an initial segment of the word. (The CONUT in COCONUT doesn't change anything.) If the beavers were betting on COCONUT, then at the end when COCONUT is typed, one beaver would have $26^7$ dollars and no other beavers would have anything. The same is true for TUNOCOC. So the expected time is just $26^7$ in both cases.

(A weird consequence is that a word like AAAAA has a longer expected waiting time than a word like ABCDE. To explain this intuitively, consider that if ABCDE hasn't been typed yet, the last five letters can be any of AABCD, BABCD, CABCD, ..., ZABCD, giving us 26 ways to win on the next letter. But if AAAAA hasn't been typed yet, the last five letters can only be any of BAAAA, CAAAA, DAAAA, ..., ZAAAA, giving us only 25 ways to win.)
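The beaver bookkeeping boils down to a one-line rule: add $26^k$ for every $k$ such that the last $k$ letters of the word equal its first $k$ letters. A small Python sketch of that rule (the name `expected_wait` is my own, and the alphabet size is a parameter so that coin-flip examples work too):

```python
def expected_wait(word, alphabet_size=26):
    """Expected number of random letters until `word` first appears:
    the sum of alphabet_size**k over every k for which the length-k
    prefix of the word equals its length-k suffix."""
    m = len(word)
    return sum(alphabet_size ** k
               for k in range(1, m + 1)
               if word[:k] == word[m - k:])


print(expected_wait("ABRACADABRA") == 26 ** 11 + 26 ** 4 + 26)
print(expected_wait("COCONUT") == expected_wait("TUNOCOC") == 26 ** 7)
print(expected_wait("AAAAA") > expected_wait("ABCDE"))
```

All three lines print `True`. With `alphabet_size=2` the same function gives 6 for HH and 4 for HT.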

This sounds exciting. Could you add a note on how Wald's equation is applied here? Referring to Wikipedia's page, what is the sequence (X_n), what is N, and how to see that X_N are i.i.d., and in particular why N is independent of the sequence (X_n)?
– kaba, Nov 26 '18 at 23:03

We are using the general version, so we don't need either of the requirements you mention. Here, $N$ is the number of steps taken and $X_n$ is the change in how much money there is in the system at step $n$. Condition 2 in Wikipedia's article essentially requires that $N$ be a stopping time: $N \ge n$ is independent of $X_n, X_{n+1}, X_{n+2}, \dots$ (so it does not require telling the future).
– Misha Lavrov, Nov 26 '18 at 23:13

Thanks, I somehow missed the general version. I replicated that the mean time for ABRACADABRA is indeed $26^{11} + 26^4 + 26$ using Markov chains as in my answer. This answer is probably the best one for problem (a) thus far.
– kaba, Nov 27 '18 at 1:04

In this answer I'll show that in problem (b), the second squirrel has better chances.

First (for introduction purposes only) we'll look at a simplified version of the problem that is simple enough to be run as a simulation.

Then we'll see how we can exactly calculate the probability that the second squirrel will win, in this simplified example. The method that we'll use does not depend on any simplification, and can be applied to the original problem as well.

We'll see that the result that we obtain with this method matches with the results of the simulation.

Having then some extra confidence in the method (not that it's really needed, since it's fairly clear), we'll apply it to the original problem and obtain a result.

Simplified problem:

A monkey is typing capital letters from an alphabet that contains the letters A, B and X. Two squirrels observe the sequence of letters thus generated. The first squirrel wins if AAB appears (as three successive letters) before BAA. The second squirrel wins if BAA appears before AAB. Which squirrel is more likely to win?

I've chosen this problem because it has a similarity with the original, which I believe is crucial: "AAB" can overlap with "BAA" only in the form "AABAA", while the other way around it can be either "BAAAB" or "BAAB". Just like "COCONUT" followed by "TUNOCOC" can be overlapped only in the form "COCONUTUNOCOC", while, the other way around, they can be overlapped like "TUNOCOCOCONUT" or like "TUNOCOCONUT".

Note however that we'll be solving the actual problem in the end. The only reason I'm including a simpler version is to explain the method on a simpler case, and because for the simpler problem we can run a simulation to verify our predictions.

Explanation

We must realize that at any point in the game, any of the two strings can be partly under construction, or not. More importantly, the only thing that affects the probabilities at any point in the game is how much of each string is already written. If, for example, the two last letters of the string so far are "XA", that means that the string AAB is partly written up to "A", and the string BAA is not partly written at all. This and only this is what determines which of the two squirrels is better off at this point, regardless of what has been written so far before "XA".

Therefore the state of the game at any given point is defined exclusively by the longest suffix of the sequence written so far that is also a prefix of at least one of the strings that the squirrels are betting on.

Thus this game has exactly seven different possible states, which are:

$\epsilon$ (empty string, meaning no matching suffix)

$\verb|A|$

$\verb|AA|$

$\verb|AAB|$

$\verb|B|$

$\verb|BA|$

$\verb|BAA|$

On each of these different states, the squirrels may have different chances of winning. Let's denote by $P_\epsilon$ the chances that the second squirrel has when the state is $\epsilon$, by $P_{A}$ the chances that he has when the state is $\verb|A|$, etc. What we're interested in is $P_\epsilon$, because that's the initial state of the game.

We can easily formulate each of these probabilities in terms of the others. From state $\epsilon$ there's a 1/3 chance of writing an "A", thus moving to state $\verb|A|$. There's also a 1/3 chance of transitioning to $\verb|B|$, and finally a 1/3 chance of staying at $\epsilon$ (if an "X" is written). That means that the chances for the second squirrel in state $\epsilon$ are:

$$P_\epsilon = \frac13 P_A + \frac13 P_B + \frac13 P_\epsilon$$

We can similarly express the chances for all the other states. Note that $P_{AAB} = 0$, because the second squirrel loses immediately on that state, and $P_{BAA} = 1$, because it wins immediately. The full list is:

This is simply a system of linear equations, and it's simple enough to be solved by hand. The solution is $P_\epsilon = \frac{8}{13}$ or approximately 61.54%. These are the chances for the second squirrel when the game starts. The chances for the first squirrel are $\frac{5}{13}$, about 38.46%.
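For reference, here is the system written out from the transitions, followed by one way to solve it by substitution. This is a reconstruction from the state definitions above (an "X" always sends the game back to $\epsilon$), not the original listing:

$$\begin{aligned}
P_\epsilon &= \tfrac13 P_A + \tfrac13 P_B + \tfrac13 P_\epsilon \\
P_A &= \tfrac13 P_{AA} + \tfrac13 P_B + \tfrac13 P_\epsilon \\
P_{AA} &= \tfrac13 P_{AA} + \tfrac13 P_{AAB} + \tfrac13 P_\epsilon \\
P_B &= \tfrac13 P_{BA} + \tfrac13 P_B + \tfrac13 P_\epsilon \\
P_{BA} &= \tfrac13 P_{BAA} + \tfrac13 P_B + \tfrac13 P_\epsilon \\
P_{AAB} &= 0, \qquad P_{BAA} = 1
\end{aligned}$$

The third equation gives $P_{AA} = \tfrac12 P_\epsilon$, so the second becomes $P_A = \tfrac12 P_\epsilon + \tfrac13 P_B$. Substituting into the first ($2P_\epsilon = P_A + P_B$) gives $P_B = \tfrac98 P_\epsilon$, and the fourth ($2P_B = P_{BA} + P_\epsilon$) then gives $P_{BA} = \tfrac54 P_\epsilon$. Finally the fifth equation reads

$$\tfrac54 P_\epsilon = \tfrac13 + \tfrac38 P_\epsilon + \tfrac13 P_\epsilon \quad\Longrightarrow\quad \tfrac{13}{24} P_\epsilon = \tfrac13 \quad\Longrightarrow\quad P_\epsilon = \tfrac{8}{13}.$$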

Running the simplified problem

Now, because this game has only three letters, the average game will be pretty short, so much in fact that we can play it enough times to have some statistics. The following program (in Java) simulates 10 million games and prints the percentage of wins for each squirrel. I've run it several times and the results are always in the range of 38.4 - 38.5% of wins for the first squirrel, and 61.5 - 61.6% of wins for the second squirrel, which matches very well with our previous calculations.
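The Java listing is not reproduced in this copy. As a stand-in, here is a minimal Python sketch of the same kind of experiment. It is not the original program: the function names `play` and `simulate`, the seed, and the default number of games are my own choices.

```python
import random


def play(rng, first="AAB", second="BAA", alphabet="ABX"):
    """Type random letters until one word appears.
    Returns 1 if `first` shows up first, 2 if `second` does."""
    size = max(len(first), len(second))
    window = ""
    while True:
        # Keep only as many trailing letters as the longest word needs.
        window = (window + rng.choice(alphabet))[-size:]
        if window.endswith(first):
            return 1
        if window.endswith(second):
            return 2


def simulate(games=100_000, seed=0):
    """Fraction of games won by squirrel 1 and by squirrel 2."""
    rng = random.Random(seed)
    wins = {1: 0, 2: 0}
    for _ in range(games):
        wins[play(rng)] += 1
    return wins[1] / games, wins[2] / games


share_1, share_2 = simulate()
print(f"squirrel 1: {share_1:.3%}, squirrel 2: {share_2:.3%}")
```

With enough games the two shares should settle near $5/13$ and $8/13$; raising `games` towards the 10 million used above narrows the spread at the cost of run time.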

The coefficients you use to relate different $P_{?}$ correspond to the transition matrix $P3$ in my answer. I think you have a typo for $P_{COCO}$ where $P_C$ should be replaced with $P_{COC}$.
– kaba, Nov 27 '18 at 13:04

@kaba you're right, thanks for spotting that! It's not really a typo but rather a mistake that carries on to the result too. That probably explains the difference between my result and yours. I will fix it later.
– abl, Nov 27 '18 at 13:38

Our solutions are equivalent, and therefore should be equal. I like how leaving it in the linear equation form, as you have, is clearer than solving explicitly using matrix inverse.
– kaba, Nov 27 '18 at 13:42

I have tried to create a Markov chain modelling problem a). I haven't even tried to solve it, but the model (i.e. the states and the transitions) should be correct (with one caveat which I'll explain at the end).

First, a simplification: I have reduced the problem to getting either the string A-A-B (as in CO-CO-NUT) or B-A-A (as in TUN-OC-OC), using an alphabet that only includes 3 letters: A, B, and C.

Here's what I've got. This is for AAB:

As you can see, I have separated the state "Only one A" from the state "AA". It is noteworthy that there is no way to go from "AA" to "A".

And this is for BAA:

I'm not sure what this leads to. I was hoping there would be some visible hint about the solution (like: both chains are very similar, with just one difference that tilts the balance in favour of one of them), but this doesn't seem to be the case.

To be honest, I'm not even sure my simplification from COCONUT to AAB is correct, because in my case A and B are equally likely (both of them are just a letter which is typed with probability P = 1/3), whereas CO and NUT are not (CO is only 2 letters long, NUT is 3). This means that solving my chains could, or could not, help solving the original problem.

We start from the simple fact that the probability that a window of 7 letters reads COCONUT, or TUNOCOC, or anything else, is equal. (It's 26^(-7), as @Ingix said.) Crucially, it's completely independent of any letters that appear before or after the window, simply because these letters are not in the window...

Now, let's start with (b). We look at letters 1-7. If they say COCONUT, the first squirrel wins. If they say TUNOCOC (with the same probability!), the second one wins. If they say neither, we move on to letters 2-8 and do the same. At this point, letter 1 is completely irrelevant. You see how we move on and on through the letters, until there's a winner, and the chances of the two squirrels winning never differ from each other.

Further clarification:

Of course, if letters 2-7 happened to be COCONU, the first squirrel is in a very good position. Similarly (and in the same probability), if they were TUNOCO, the second squirrel is in a good position. But this is symmetrical, so no one has an overall advantage.

Of course, if letters 1-7 were TUNOCOC, the chance to get COCONUT at letters 5-11 increases, but this doesn't matter since, in this instance, the game has already finished...

In other words, if we were to ask "Which of the two words are we more likely to see at letters 5-11?" (see @abl's comment below), the answer would be TUNOCOC. But the difference is only due to a situation in which the game has already finished. And this is just not the question at hand.

Part (a) is very similar. We look at letters 1-7 of both monkeys. If the first monkey typed COCONUT, it wins. If the second one typed TUNOCOC, it wins. If neither won, we move on to look at letters 2-8 of both, and so on.

Except that in (b) the probabilities are not independent across windows. The probability of letters 5-11 being COCONUT, conditional on the fact that you reached this window, is lower than the equivalent for TUNOCOC, simply because you have to account for the case "TUNOCOCONUT" (in which case you wouldn't reach window 5-11). Maybe you can believe the discussion now :)
– abl, Nov 25 '18 at 0:11

@abl Sorry, but this is just not true... Are you saying that, typing randomly, one is more likely to type COCONUT than TUNOCOC? These are just two 7-letter sequences, and they're just as likely. Arguments like "consider the case X" are not really valid, because there are so many other cases to consider too. The trick is to be able to consider all cases together with a general argument.
– Angkor, Nov 25 '18 at 13:37

@Angkor Maybe I didn't explain myself well. Consider all possible sequences that start with "****TUNOCOC...". It's obvious that there are just as many of those as sequences of the form "****COCONUT...", i.e. both situations are equally probable, right? Note that all sequences of the form "****TUNOCOC..." are a win for squirrel 2. And all sequences of the form "****COCONUT..." are a win for squirrel 1, except for those that start with "TUNOCOCONUT..." which are a win for squirrel 2. There's an asymmetry here, so at the very least we can say that those are not "just two 7-letter sequences".
– abl, Nov 25 '18 at 16:33

@Angkor Point taken, I apologize if I was out of tone. Imagine that the alphabet has only two letters, A and B. The squirrels bet on AAB and BAA respectively. The first three letters are typed. Each squirrel has a 1/8 chance of winning at the third letter. If neither of them wins, then the sequence is one of AAA, ABA, ABB, BAB, BBA, BBB, with equal probability. At the fourth letter, the possibilities (with equal probability) are AAAA, AAAB, ABAA, ABAB, ABBA, ABBB, BABA, BABB, BBAA, BBAB, BBBA, BBBB. Squirrel 1 wins with AAAB. Squirrel 2 wins with ABAA or BBAA, twice the chance.
– abl, Nov 25 '18 at 20:43

@Angkor, also, this simplified case is simple enough to be simulated, which I did (see my answer), and the simulation over several million games shows the second squirrel has better chances.
– abl, Nov 25 '18 at 20:49

Let's consider the string typed by the monkey after a long (infinite) time and make some simplifications that don't affect the outcome to understand the problem better.

Replace any letter not in COCONUT with X
Replace all groups of COC with A
Replace all remaining C's with X
Replace all remaining O's with B, N's with C, U's with D and T's with E

The problem now is:

in the new string will we more likely first find ABCDE or EDCBA?

And the solution:

If A, B, C, D and E had the same chances of occurring, the answer would be "both are equally likely", but C, D and E have normal chances of occurring, A has a very low chance of occurring, B has a slightly lower than normal chance.
Considering a simplified case - 2 letters - P and Q with P occurring more often than Q, PQ has more chances of appearing first than QP.
This means EDCBA is more likely to appear first than ABCDE.

Looking at it from a different angle:

In the unlikely event that A appears in the string, it is equally likely that it is followed by a B as it is that it is preceded by one.
In the even more unlikely event that an AB or BA appears in the string, it is equally likely that it is followed by a C as it is that it is preceded by one.
...
Which makes it equally likely that ABCDE and EDCBA appear in the string, so when an A appears in either of these sequences, it is equally likely that it starts the ABCDE sequence as it is that it ends the EDCBA sequence.

The fact that two monkeys are typing is irrelevant to the study of the sequence. They're both typing on the same computer (see title), so there is only one sequence of letters. There is no bias, both monkeys have the exact same chance of having their word appear first. Winning, in the clear text, means "having my word come out first".

We state obviously that, if less than $7$ letters are typed, both monkeys have equally $0$ chance of winning.

Let's do some recurrence reasoning, trying to prove that if both monkeys have the same chance of winning after $n$ letters are typed, they have the same chance of winning after $n+1$ letters are typed.

Let $n$ be greater than or equal to $6$. Given there is no winner after the $n$th letter is typed, and both monkeys have had the same chance of winning up until now, (which is trivially established for $n=6$) there are three cases:

And the monkeys have the same chance of winning upon the $n+1$th letter typed. This completes the proof by recurrence.

So, reading the question correctly, the one expected to type more will be the one that stops last, assuming they type at the same frequency. There are two cases. Either the first word out is COCONUT, in which case, once monkey 1 stops typing, monkey 2 has T as a prefix for his word to come out. Or the first word out is TUNOCOC, in which case, once monkey 2 stops typing, monkey 1 has COC as a prefix for his word to come out. Therefore, monkey 2 is expected to have to type more than monkey 1 in this rare case, and then same as monkey 1 once he spoils his opportunity to finish the word.
So, yes, monkey 2 is expected to type slightly more, due to these opportunities being unequal.

(b)

If "before" means "before in time", we are in the situation described in the clear text. Both squirrels have the same odds, if they stop watching after the first word appears. If they wait for both words to appear, well, this does not change anything, the word appearing first is the word appearing first.

Whereas

If "before" means "before in the lexical order", squirrel 1 wins, and the fact that a monkey is typing on a computer is irrelevant. This shows that squirrel 1 has better chances of winning anyway.