Let $w=a_1a_2a_3...$ be an infinite word over a finite alphabet and $\epsilon>0$. Do there exist integers $n,k$ such that $\frac{d(a_1a_2...a_n,a_{k+1}a_{k+2}...a_{k+n})}{n}<\epsilon$ ?
(Here $d(u,v)$ denotes the Hamming distance.)

Offhand, the following should be a counterexample for small enough $\varepsilon$: consider the sequence 1 "a", 2 random symbols "b" or "c" (independent, with probability $1/2$ each), 4 "a", 8 random "b,c", 16 "a", 32 random "b,c", etc. It looks like with positive probability it works for all $n\ge n_0$ with some $n_0$, because $k>16n$ or so are definitely useless. Now replace the first $n_0$ symbols by some unique ones and your counterexample is ready.
– fedja, May 14 at 2:27

If we ignore small $n$, the same counterexample should work. If we understand the question literally, then $a_1$ cannot repeat, so we don't have much choice.
– fedja, May 14 at 10:45

1 Answer

The infinite word will be $xU_1Q_2U_3Q_4U_5Q_6\dots$ where $U_m$ is the finite word consisting of $m$ symbols $u$ and $Q_m$ is the random word consisting of $m$ symbols each of which is $b$ or $c$ with probability $1/2$ with the convention that the choices of symbols at different positions are independent. So you get something like
$$
xubcuuubcbbuuuuucbbccbuuuuuuucccbcbbb\dots
$$
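To experiment with this construction, here is a minimal Python sketch. The names `build_word` and `hamming_ratio` and the `seed` parameter are illustrative assumptions, not part of the argument; the code builds a finite prefix $xU_1Q_2\dots$ and evaluates the ratio $d/n$ from the question for a given pair $n,k$.

```python
import random


def build_word(num_blocks, seed=0):
    """Return the finite prefix x U_1 Q_2 U_3 Q_4 ... with num_blocks blocks after x.

    Odd-numbered blocks consist of m copies of 'u'; even-numbered blocks
    consist of m independent fair choices between 'b' and 'c'.
    """
    rng = random.Random(seed)  # seeded only to make experiments reproducible
    parts = ["x"]
    for m in range(1, num_blocks + 1):
        if m % 2 == 1:
            parts.append("u" * m)
        else:
            parts.append("".join(rng.choice("bc") for _ in range(m)))
    return "".join(parts)


def hamming_ratio(word, n, k):
    """Return d(a_1..a_n, a_{k+1}..a_{k+n}) / n for the given finite prefix."""
    if n < 1 or k < 1 or k + n > len(word):
        raise ValueError("need n >= 1, k >= 1 and k + n <= len(word)")
    mismatches = sum(1 for i in range(n) if word[i] != word[k + i])
    return mismatches / n


if __name__ == "__main__":
    w = build_word(40, seed=0)
    print(w[:40])
    print(hamming_ratio(w, 50, 100))
```

A finite prefix can only illustrate the claim for the shifts it covers; it is not a substitute for the proof that follows.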

It is easy to check (see the discussion here) that as $n\to\infty$, the string $a_1a_2\dots a_n$ contains one symbol $x$ ($a_1=x$), $\frac n2+O(\sqrt n)$ symbols $u$ and $\frac n2+O(\sqrt n)$ symbols each of which is $b$ or $c$.
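The count of $u$'s does not depend on the random choices, so it can be checked directly. The sketch below uses a hypothetical helper `count_u_in_prefix` (introduced here for illustration only) and compares the count with $\frac n2$; the tolerance $\sqrt n+1$ is just one convenient explicit form of the $O(\sqrt n)$ term.

```python
import math


def count_u_in_prefix(n):
    """Number of symbols 'u' among a_1..a_n in the word x U_1 Q_2 U_3 Q_4 ..."""
    remaining = n - 1  # skip the leading x
    count, m = 0, 1
    while remaining > 0:
        take = min(m, remaining)  # portion of block m that lies inside the prefix
        if m % 2 == 1:            # odd-numbered blocks are the U blocks
            count += take
        remaining -= take
        m += 1
    return count


if __name__ == "__main__":
    worst = max(abs(count_u_in_prefix(n) - n / 2) / math.sqrt(n)
                for n in range(1, 5001))
    print("largest deviation from n/2 in units of sqrt(n):", worst)
```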

Now suppose that $n$ is large enough and $k>n^2$. Then the $u$'s in the word $a_{k+1}a_{k+2}\dots a_{k+n}$ form a single block and the non-$u$'s form another block. One of these blocks has length $\ell \ge n/2$. However, the corresponding block in $a_1a_2\dots a_n$ is occupied by $\frac \ell 2+O(\sqrt n)$ symbols $u$ and $\frac \ell 2+O(\sqrt n)$ symbols that are not $u$, so the Hamming distance in question is at least
$\frac \ell 2+O(\sqrt n)\ge \frac n4+O(\sqrt n)\ge \frac n5$ if $n\ge n_0$.
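As a sanity check of the case $k>n^2$, the following sketch samples some shifts just beyond $n^2$ and reports the smallest Hamming distance found. The helper names, the value $n=400$, the sampling step and the seed are all arbitrary choices made for illustration; the sample covers only a few blocks of the word, so it illustrates the bound rather than proving it.

```python
import random


def build_word(num_blocks, seed=0):
    """Finite prefix x U_1 Q_2 U_3 ... (odd blocks are u's, even blocks random b/c)."""
    rng = random.Random(seed)
    parts = ["x"]
    for m in range(1, num_blocks + 1):
        if m % 2 == 1:
            parts.append("u" * m)
        else:
            parts.append("".join(rng.choice("bc") for _ in range(m)))
    return "".join(parts)


def hamming(word, n, k):
    """Hamming distance between a_1..a_n and a_{k+1}..a_{k+n}."""
    return sum(1 for i in range(n) if word[i] != word[k + i])


def min_far_distance(word, n, step=37, span=3000):
    """Smallest distance over the sampled shifts k = n^2+1, n^2+1+step, ... below n^2+1+span."""
    start = n * n + 1
    return min(hamming(word, n, k) for k in range(start, start + span, step))


if __name__ == "__main__":
    n = 400
    word = build_word(600, seed=1)  # length 180301, enough for k up to n^2 + 3000
    print("min distance:", min_far_distance(word, n), "threshold n/5:", n / 5)
```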

Thus we need to look only at $k\le n^2$ for large $n$. We have $\frac n2+O(\sqrt n)$ random symbols in $a_1a_2\dots a_n$ and, for fixed $k\ge 1$, the probability that each of them is matched in $a_{k+1}a_{k+2}\dots a_{k+n}$ is $0$ or $1/2$, the corresponding events being independent. Thus, the chance that we have at least $\frac n3$ matches instead of the expected $\le\frac n4$ is at most $Ce^{-cn}$ by the Bernstein (a.k.a. Chernoff, Hoeffding, etc.) bound. Since the series $\sum_n Cn^2e^{-cn}$ converges, its tail $\sum_{n\ge n_0} Cn^2e^{-cn}$ is small for large $n_0$, and we conclude that with probability close to $1$, the Hamming distance in question is at least $\frac n6+O(\sqrt n)>\frac n7$ for all $n\ge n_0$, $k\le n^2$.
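For concreteness, one admissible choice of constants can be sketched via Hoeffding's inequality, assuming the independence stated above. The symbols $N$ and $S_k$ are introduced here only for illustration: $N=\frac n2+O(\sqrt n)$ is the number of random symbols in $a_1a_2\dots a_n$ and $S_k$ is the number of them that are matched at shift $k$. Then $\mathbb E S_k\le \frac N2$ and
$$
P\left(S_k\ge \tfrac n3\right)\le P\left(S_k-\mathbb E S_k\ge \tfrac n3-\tfrac N2\right)\le \exp\left(-\frac{2\left(\frac n3-\frac N2\right)^2}{N}\right)=\exp\left(-\tfrac n{36}(1+o(1))\right),
$$
because $\frac n3-\frac N2=\frac n{12}+O(\sqrt n)$. Taking the union over $k\le n^2$ and $n\ge n_0$ gives a failure probability of at most
$$
\sum_{n\ge n_0} n^2\exp\left(-\tfrac n{36}(1+o(1))\right)\to 0\quad\text{as } n_0\to\infty .
$$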

Finally, due to the uniqueness of $x$ in the word, the Hamming distance is always at least $1$, so the ratio in question is never less than $\min(\frac 17,\frac 1{n_0})$.

I hope it is clearer now but feel free to ask questions if something is still confusing.

By the way, the word "conjecture" means "a statement supported by extensive circumstantial evidence and several rigorous partial results", not "something that just came to my head" or "something I want to be true", so, since you put it in the title, I wonder what positive results you can prove here.

Thank you. What happens if the alphabet consists of $2$ symbols? The conjecture comes from a simple problem: let $w=a_1a_2\dots$ be an infinite binary word and $N>0$. Then there exist $n,k$ such that $d(a_1\dots a_n,a_{k+1}\dots a_{k+n})<\frac{n}{2}-N$ or $d(a_1\dots a_n,a_{k+1}\dots a_{k+n})>\frac{n}{2}+N$. I wonder whether $d(a_1\dots a_n,a_{k+1}\dots a_{k+n})$ can always be made even smaller (asking for it to be larger is hopeless, as $w=000\dots$ shows).
– Phan Quốc Vượng, May 16 at 4:47

@PhanQuốcVượng If you care only about sufficiently large $n$, you can emulate the 3-letter alphabet by putting $u=00001111, b=00110011, c=01010101$, say (the key feature is that if $8$ does not divide $k$, you have at least 1 discrepancy in every octuplet, and if it does, then you can just think of $u,b,c$ as single symbols).
– fedja, May 16 at 6:41