The Fibonacci word is the limit of the sequence of words starting with "0" and satisfying rules $0 \to 01, 1 \to 0$. It's equivalent to have initial conditions $S_0 = 0, S_1 = 01$ and then recursion $S_n= S_{n-1}S_{n-2}$.

I want to know what words cannot appear as subwords in the limit $S_\infty$. At first I thought 000 and 11 were the only two that could not appear. Then I noticed 010101. Is there any characterization of which words can or cannot appear as subwords of the Fibonacci word?

Loosely related: this word appears as the cut sequence of the line of slope $\phi = (1 + \sqrt{5})/2$ through the origin.

6 Answers

The Fibonacci word is one of the Sturmian words, so its complexity is $n+1$; that is, the number of different subwords of length $n$ is $n+1$. So most words are not subwords of the Fibonacci word. There are, as far as I remember, 12 different but equivalent definitions of Sturmian words. Some of them give restrictions on possible subwords (see Algebraic combinatorics on words by Lothaire, and an article by Berstel there).

A proof that the Fibonacci word's subword complexity is $p_f(n)=n+1$ can be found in Section 10.5 of Allouche and Shallit's Automatic Sequences (2003, Cambridge University Press).
– Joel Reyes Noche, Apr 4 '11 at 6:24
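The complexity claim is easy to check empirically for small $n$. The sketch below is only an illustration, not a proof: it builds a long finite prefix of $S_\infty$ with the recursion $S_n = S_{n-1}S_{n-2}$ and counts distinct factors of each length. The function names and the prefix length of 10,000 are my own choices, and the count is only reliable while $n$ is small compared with the prefix.

```python
def fibonacci_prefix(min_len: int) -> str:
    """Return some S_n with len(S_n) >= min_len; every S_n is a prefix of S_infinity."""
    prev, cur = "0", "01"  # S_0, S_1
    while len(cur) < min_len:
        prev, cur = cur, cur + prev  # S_{n+1} = S_n S_{n-1}
    return cur


def factors(word: str, n: int) -> set:
    """All distinct length-n factors (contiguous subwords) of a finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


prefix = fibonacci_prefix(10_000)
for n in range(1, 11):
    # Each line should show n followed by n + 1.
    print(n, len(factors(prefix, n)))
```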

The easiest way (linear-time, computationally speaking) to determine whether a finite word $w$ is a factor (a subword) of the Fibonacci word $S_\infty$ is the following:

1. Remove a trailing 0 from $w$, if present (just one); if $w$ begins with 1, add a leading 0.

2. The word thus obtained should parse (necessarily uniquely) as a concatenation of the blocks 0 and 01; if it does not, then $w$ is not a factor of $S_\infty$ and you are done. If $x_1x_2\cdots x_k$ is such a parsing, let $y_i=0$ for all $i$ such that $x_i=01$, and $y_i=1$ otherwise (that is, if $x_i=0$).

3. Apply the same algorithm to the new word $w'=y_1\cdots y_k$.

The original $w$ is a factor of $S_\infty$ if and only if you eventually reach the word 0 or 1 by recursively applying the above procedure.

Correctness can be easily proved, as the Fibonacci word is the limit of the substitution $0\to 01$, $1\to 0$ (folklore, see e.g. Lothaire's Algebraic combinatorics on words).
For instance, $w=1010010010100$ is a factor since the sequence of words generated by the above algorithm is:
$$w,\: 00101001,\: 10010,\: 010,\: 0\;.$$
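For concreteness, here is a minimal Python sketch of the procedure above. The function names are mine, and I read "eventually reach the word 0 or 1" as "stop as soon as the current word has length at most one"; that stopping rule is an assumption on my part.

```python
def desubstitute(w: str):
    """One round of the procedure; returns the new word, or None if parsing fails."""
    if w.endswith("0"):
        w = w[:-1]              # remove one trailing 0
    if w.startswith("1"):
        w = "0" + w             # add a leading 0
    letters = []
    i = 0
    while i < len(w):
        if w[i] != "0":
            return None         # a 1 not preceded by 0: no parsing into 0 and 01
        if i + 1 < len(w) and w[i + 1] == "1":
            letters.append("0")  # block 01 becomes the letter 0
            i += 2
        else:
            letters.append("1")  # block 0 becomes the letter 1
            i += 1
    return "".join(letters)


def reduction_chain(w: str):
    """List of words visited by the procedure, or None if w is rejected."""
    chain = [w]
    while len(chain[-1]) > 1:   # assumed stopping rule, see above
        shorter = desubstitute(chain[-1])
        if shorter is None:
            return None
        chain.append(shorter)
    return chain


def is_fibonacci_factor(w: str) -> bool:
    return reduction_chain(w) is not None


print(reduction_chain("1010010010100"))
print(is_fibonacci_factor("010101"))
```

Each round strictly shortens a word of length at least two, so the loop always terminates.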

If you need a more dynamical point of view, Sturmian shifts (such as Fibonacci) are neither of finite type nor sofic. However, it is not hard to get the list of minimal forbidden factors of the Fibonacci word, as follows.
Let $S_i'$ be the $i$-th palindromic prefix of $S_\infty$, which you can obtain by removing the last two characters in $S_i$. Then a finite word $w$ is a factor of the Fibonacci word if and only if it does not contain any of the following as factors, for all $k\geq 1$: $$1S_{2k-1}'1,\quad 0S_{2k}'0\;.$$
In other terms, the sequence of minimal forbidden factors is 11, 000, 10101, 00100100, 1010010100101, …
See for instance Mignosi et al., Words and forbidden factors
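The list of minimal forbidden factors can be generated mechanically from the formula $1S_{2k-1}'1,\ 0S_{2k}'0$. The following is a small sketch under that reading; the function name and the `count` parameter are mine.

```python
def minimal_forbidden_factors(count: int) -> list:
    """First `count` words of the form 1 S'_i 1 (i odd) or 0 S'_i 0 (i even), i >= 1."""
    words = []
    prev, cur = "0", "01"        # S_0, S_1
    for i in range(1, count + 1):
        core = cur[:-2]          # S'_i: S_i without its last two letters
        letter = "1" if i % 2 == 1 else "0"
        words.append(letter + core + letter)
        prev, cur = cur, cur + prev  # S_{i+1} = S_i S_{i-1}
    return words


print(minimal_forbidden_factors(5))
```

For `count = 5` this reproduces the five words listed above; their lengths are the Fibonacci numbers 2, 3, 5, 8, 13.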

I don't know if there is a simple characterization, but it seems there is a simple algorithm. See Bartosz Walczak, A simple representation of subwords of the Fibonacci word, available at http://tcs.uj.edu.pl/~walczak/fibonacci.pdf

The Fibonacci sequence is a quasiperiodic sequence. Quasiperiodic sequences (and more generally quasiperiodic lattices) can be generated, for example, by the "strip projection method". Roughly speaking: take the lattice $\mathbb{Z}^2$ and translate the unit square $[0,1]^2$ along a line through the origin with irrational slope, thus obtaining a strip. Consider all edges of $\mathbb{Z}^2$ that lie inside the strip. Then the sequence of vertical and horizontal edges is a quasiperiodic sequence. See e.g. http://arxiv.org/pdf/cond-mat/9903010v1.pdf, Fig. 4.2, for the case of the Fibonacci sequence (slope $0.618\ldots$, the reciprocal of the golden ratio).

Thus, to test whether a word is a subword of the Fibonacci word, just "lift" it to a path in the edge graph of $\mathbb{Z}^2$ and test whether it is contained in a strip of golden-ratio slope and corresponding width (e.g. by projecting orthogonally).

You can even extend this method to get the (asymptotic) frequency of the subword in the Fibonacci word, which is proportional to the difference between the length of the projection of the strip (i.e. the length of the projection of the unit square, by definition) and the length of the orthogonal projection of the subword. The easiest cases are of course the two building blocks of the sequence, which lift to a vertical and a horizontal edge of the lattice, respectively. Thus the ratio of their occurrence frequencies is again the golden ratio. But you can immediately calculate the frequency of any subword in the same way.
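Here is a rough numerical sketch of the projection test and the frequency computation, under my own normalization. Assumptions (mine, not taken from the linked paper): 0 is lifted to a horizontal step and 1 to a vertical step, the offset of the $i$-th vertex from the line is measured as $b_i - i\alpha$ where $b_i$ is the number of 1s among the first $i$ letters and $\alpha = (3-\sqrt{5})/2$ is the frequency of 1, and in these units the strip has width 1. Floating point is used, so this is only trustworthy for words of moderate length.

```python
import math

ALPHA = (3 - math.sqrt(5)) / 2  # assumed normalization: frequency of the letter 1


def projection_spread(w: str) -> float:
    """Width of the projection of the lifted path, in units where the strip has width 1."""
    offsets = [0.0]
    ones = 0
    for i, letter in enumerate(w, start=1):
        ones += letter == "1"
        offsets.append(ones - i * ALPHA)  # scaled offset of the i-th vertex from the line
    return max(offsets) - min(offsets)


def is_factor_by_strip(w: str) -> bool:
    """The lifted path fits in the strip iff its projection is narrower than the strip."""
    return projection_spread(w) < 1.0


def frequency(w: str) -> float:
    """Asymptotic frequency of w: strip width minus the width of the projected path."""
    return max(0.0, 1.0 - projection_spread(w))


print(is_factor_by_strip("0100101"), is_factor_by_strip("10101"))
print(frequency("0") / frequency("1"))  # close to the golden ratio
```

With this normalization the frequencies of all words of a given length sum to 1, which is a convenient sanity check.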

There are already some excellent answers to this question, but as you may now appreciate the Fibonacci word is highly structured and has a diversity of interesting properties which restrict its class of subwords. I noticed that none of the answers so far addresses directly the fact that the Fibonacci word -- like all Sturmian words -- is balanced. To say that a word is balanced means that for any pair of subwords of equal length, the number of zeros in each subword must either be equal to one another, or differ by exactly one. The simplest unbalanced word is 0011, because the number of zeros in the subwords 00 and 11 are respectively two and zero. Since there are only $O(n^3)$ balanced words of length $n$ this property quite severely restricts the number of allowed subwords.

The examples of forbidden words which you list -- 000, 11 and 010101 -- are themselves balanced words, but the balance rule nonetheless prevents them from occurring in the Fibonacci word. To see this we can argue as follows. Since 101 occurs in the Fibonacci word it cannot contain 000, because the number of zeros in these two subwords would differ by two. Similarly, because it contains 00 it cannot contain 11, and because it contains 00100 it cannot contain 10101, and hence cannot contain 010101 either.
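If you want to experiment with the balance property, a brute-force check for a finite word is short. The function name is mine, and the implementation is deliberately naive (cubic time), meant only for small examples.

```python
def is_balanced(w: str) -> bool:
    """True if any two equal-length factors of w differ by at most one in their number of zeros."""
    for n in range(1, len(w) + 1):
        zero_counts = {w[i:i + n].count("0") for i in range(len(w) - n + 1)}
        if max(zero_counts) - min(zero_counts) > 1:
            return False
    return True


# The forbidden words are balanced on their own ...
print([is_balanced(w) for w in ("000", "11", "010101")])
# ... but no balanced word can contain both 101 and 000, or both 00 and 11.
print(is_balanced("101000"), is_balanced("0011"))
```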

If an infinite word is balanced then the ratio of zeros to ones in its subwords of length $n$ converges rapidly and uniformly to a constant ratio in the limit as $n \to \infty$. In the case of the Fibonacci word this limit is the golden ratio. From a dynamical perspective this convergence is due to the unique ergodicity of the associated symbolic dynamical system, but it also has strong connections to continued fractions: you will notice that the continued fraction approximations to the golden ratio arise as ratios of zeroes to ones in certain subwords of the Fibonacci word. The excellent books by Lothaire (Algebraic combinatorics on words, mentioned above) and by Fogg (Substitutions in dynamics, arithmetics and combinatorics) are good references for the Fibonacci word and for Sturmian and balanced words more generally.
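As a small illustration of the connection with continued fractions, one can count letters in the finite words $S_n$ (with $S_0 = 0$, $S_1 = 01$, $S_n = S_{n-1}S_{n-2}$, as in the question). The sketch below tracks only the counts, not the words themselves; the function name is mine.

```python
def letter_counts(n: int):
    """Return (zeros, ones) in S_n, using S_n = S_{n-1} S_{n-2}."""
    prev, cur = (1, 0), (1, 1)   # counts for S_0 = 0 and S_1 = 01
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, (cur[0] + prev[0], cur[1] + prev[1])
    return cur


for n in range(1, 11):
    zeros, ones = letter_counts(n)
    # zeros/ones runs through ratios of consecutive Fibonacci numbers
    print(n, zeros, ones, zeros / ones)
```

The ratios 1/1, 2/1, 3/2, 5/3, ... are exactly the continued fraction convergents of the golden ratio.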

There is a simple algorithm to determine the set of words that do not appear as factors (subwords) in the Fibonacci word $f$ (which comes essentially from the characterization Ale De Luca pointed out in his answer).

Take the Fibonacci sequence $(F_n)_{n\geq 0}=1,1,2,3,5,8,\ldots$. For every $n$ larger than $1$ take the word of length $F_n-2$ appearing at the beginning of the Fibonacci word $f$ (i.e., the prefix of $f$ of length $F_n-2$). This gives you the sequence of words $\lambda,0,010,010010,\ldots$, where $\lambda$ is the empty word. Now, for each of these words $v$, if $v$ is followed in $f$ by the symbol $1$ then take $0v0$; otherwise, if $v$ is followed by the symbol $0$, then take $1v1$. This gives you the set of words of minimal length that do not appear as factors in $f$, that is, the minimal forbidden factors of $f$: $11,000,10101,00100100,\ldots$

The words that do not appear as subwords in the Fibonacci word $f$ are precisely the words over $\{0,1\}$ that contain a minimal forbidden factor.
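Put together, this gives a direct membership test. The sketch below follows the recipe above: it takes prefixes of length $F_n - 2$, looks at the next letter, and then checks whether the given word contains any of the resulting forbidden words. Function names are mine, and no attempt is made at efficiency.

```python
def fibonacci_prefix(min_len: int) -> str:
    """A prefix of the Fibonacci word f of length at least min_len."""
    prev, cur = "0", "01"
    while len(cur) < min_len:
        prev, cur = cur, cur + prev
    return cur


def minimal_forbidden_up_to(max_len: int) -> list:
    """Minimal forbidden factors of f of length at most max_len."""
    f = fibonacci_prefix(max_len + 1)
    result = []
    a, b = 1, 2                  # consecutive Fibonacci numbers; b plays the role of F_n, n >= 2
    while b <= max_len:          # the forbidden word built from F_n has length F_n
        v = f[:b - 2]            # prefix of f of length F_n - 2
        following = f[b - 2]     # the letter that follows v in f
        letter = "0" if following == "1" else "1"
        result.append(letter + v + letter)
        a, b = b, a + b
    return result


def is_factor(w: str) -> bool:
    """w occurs in f iff it contains no minimal forbidden factor."""
    return not any(m in w for m in minimal_forbidden_up_to(len(w)))


print(minimal_forbidden_up_to(8))
print(is_factor("1010010010100"), is_factor("010101"))
```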