I have a feeling that the following question might have been studied: Suppose I have a finite alphabet $A,$ with $|A| = n,$ and a string $S$ of length $N.$ A string can be said to contain a $k$-th power, if
$$S = S_1 S_2^k S_3,$$ where exponentiation means that the nonempty string $S_2$ is repeated $k$ times, and $S_1$ and $S_3$ might be empty. Let $P(n, N)$ be the largest $k$ such that every string of length $N$ over an alphabet of size $n$ contains a $k$-th power.
Now, the question(s):

What is the (asymptotic) behavior of $P(n, N)?$

Given a string $S$ as above, how hard is it to compute whether $S$ contains a $k$-th power?

1. Wouldn't $P(n,N)=1$ for all $n>1$, if you say every string? 2. You can do it with a linear scan, so $O(N)$.
– Fei Gao, Mar 13 '14 at 1:50

@FeiGao 1. No. Suppose $n=2,$ for simplicity, with $A=\{a, b\}.$ Then if the string contains no square, then it contains no $aa$ or $bb,$ so must have the form (wlog) $S=ababab\dots,$ which means that it contains a very high power of $ab.$
– Igor Rivin, Mar 13 '14 at 1:58

@FeiGao Linear Scan? How do you detect powers, and where do you store them?
– Igor Rivin, Mar 13 '14 at 1:59

If I understand your definition of $P(n,N)$ correctly, then since there exist infinite cubefree words over a binary alphabet (see for example the Thue-Morse sequence), one should have $P(2,N) = 2$ when $N\geq 4$. And since the Thue-Morse sequence can be viewed as a word over any $n$-ary alphabet that uses only 2 symbols, shouldn't this mean $P(n,N)\leq 2$ for all $n\geq 2$ and all $N$? Or did I misread what you are asking?
– ARupinski, Mar 13 '14 at 2:03

Interesting! I wonder why this is a useful thing to do in the DNA setting...
– Igor Rivin, Mar 13 '14 at 13:57

The most common terms for squares or powers in DNA seem to be tandem repeats, microsatellites, or minisatellites, according to the length of the period, and they appear to be useful for many things, like determining parentage or diagnosing Huntington's disease...
– Jan Kyncl, Mar 14 '14 at 0:02

Theorem. For a random binary word of length $n$, the expected number of $h$th powers is
$$
\sim \frac{n}{2^{h-1}-1}.
$$
Proof. A basic event about occurrences of powers of a word in a binary word is
$$
S_{i,j,h} = \{w\in \{0,1\}^n \mid \text{$w$ has an $h$th power of length $h\cdot i$ ending in position $j$}\}
$$
$$
= \{w \mid w = av^hb, |av^h|=j, |v|=i\}
$$
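We let $p$ denote the probability of 1 as opposed to 0; namely $p=1/2$. For notational convenience we assume $w$ has odd length $n = 2k-1$; this does not affect the asymptotics.
Then $\mathbb P(S_{i,j,h}) = p^{(h-1)i}$, since the first $i$ letters of the power are free and the remaining $(h-1)i$ letters are determined by them, and the ranges for the variables are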
$$
h\cdot i\le j\le 2k-1 = n
$$
Let $W_{n}$ be a uniformly distributed random word of length $n$, and let $s^{(c)}(w)$ be the number of $c$th powers in the word $w$.
So
$$
\mathbb E \sum_{j=hi}^{2k-1} \mathbf{1}_{S_{i,j,h}} = (2k-hi)p^{(h-1)i}
$$
is the expected number of $h$th powers with period $i$ (that is, of total length $hi$), for $hi\le 2k-1$.
Then
$$
\mathbb E s^{(h)}(W_{2k-1}) = \sum_{i=1}^{\lfloor(2k-1)/h\rfloor}(2k-hi)p^{(h-1)i}
$$
is the expected number of $h$th powers in a word of length $n=2k-1$.
Let us divide by $n$ and take $n\rightarrow\infty$; since $(2k-hi)/n\to 1$ for each fixed $i$ and the terms are dominated by a geometric series, we get
$$ \sum_{i=1}^\infty p^{(h-1)i} = \frac{p^{h-1}}{1-p^{h-1}} = \frac1{2^{h-1}-1}
$$

Corollary. The expected number of squares, cubes, and 4th powers is $\sim n$, $n/3$, $n/7$, respectively.
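
As a sanity check on the finite-length formula from the proof, here is a minimal Python sketch that compares $\sum_i (n+1-hi)\,2^{-(h-1)i}$ with an exhaustive count over all binary words of a small length. The function names (`count_powers`, `expected_exact`, `expected_brute`) are made up for this illustration, and occurrences are counted as pairs $(i, j)$ exactly as in the events $S_{i,j,h}$.

```python
from fractions import Fraction
from itertools import product


def count_powers(w, h):
    """Count pairs (i, j) such that w has an h-th power of period i ending at position j."""
    n = len(w)
    total = 0
    for i in range(1, n // h + 1):
        for j in range(h * i, n + 1):
            start = j - h * i
            # w[start:j] is an h-th power of a block of length i
            # exactly when it has period i.
            if all(w[t] == w[t + i] for t in range(start, j - i)):
                total += 1
    return total


def expected_exact(n, h):
    """Finite-n formula from the proof (with 2k = n + 1 and p = 1/2)."""
    return sum(
        Fraction(n + 1 - h * i, 2 ** ((h - 1) * i))
        for i in range(1, n // h + 1)
    )


def expected_brute(n, h):
    """Average of count_powers over all 2^n binary words of length n."""
    words = product((0, 1), repeat=n)
    return Fraction(sum(count_powers(w, h) for w in words), 2 ** n)


if __name__ == "__main__":
    for h in (2, 3, 4):
        print(h, expected_exact(9, h), expected_brute(9, h))
        # Ratio to n for a longer word, to compare with 1/(2^(h-1) - 1).
        print(h, float(expected_exact(2001, h) / 2001), 1 / (2 ** (h - 1) - 1))
```

Exact rational arithmetic is used so that the two quantities can be compared for equality rather than up to rounding.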

The total number of nontrivial powers (squares and higher) overall will be something like
$$
n\cdot \sum_{h=2}^\infty \frac1{2^{h-1}-1}
$$
which is a Lambert series with value $\approx 1.606695n$.
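
The numerical value can be reproduced with a short partial sum. This is only a sketch: the helper name `lambert_partial_sum` is hypothetical, and the index $m = h-1$ is used so the sum runs over $1/(2^m-1)$ for $m \ge 1$. This sum is known as the Erdős–Borwein constant.

```python
def lambert_partial_sum(terms=60):
    """Partial sum of sum_{m>=1} 1/(2^m - 1), where m = h - 1."""
    return sum(1.0 / (2 ** m - 1) for m in range(1, terms + 1))


if __name__ == "__main__":
    # The tail after 60 terms is far below double precision.
    print(lambert_partial_sum())
```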

The comments have covered most of the behaviour of $P(n,N)$ for nontrivial values of $n$: only for $n=1$ does $P$ take values exceeding 2, and $n$ must be at most 2 for $P$ to exceed 1. A sorted suffix tree might help in improving the time bound suggested for searching for $k$-th powers, but the problem is clearly solvable in low-degree polynomial time.
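
To make the polynomial-time claim concrete, here is a rough Python sketch of a quadratic-time check. It is not the suffix-tree approach, just a simple scan over periods. For each candidate period it tracks runs of positions $t$ with $s_t = s_{t+\text{period}}$; a run of length $r$ yields a factor of length $r + \text{period}$ with that period. The names `max_power`, `contains_power` and `P` are chosen for this illustration, and `P` is a brute force over all $n^N$ words, so it is only usable for tiny parameters.

```python
from itertools import product


def max_power(s):
    """Largest k such that s contains a k-th power of a nonempty block (0 if s is empty)."""
    n = len(s)
    best = 1 if n > 0 else 0
    for period in range(1, n // 2 + 1):
        run = 0  # consecutive positions t with s[t] == s[t + period]
        for t in range(n - period):
            if s[t] == s[t + period]:
                run += 1
                # A run of length r gives a factor of length r + period
                # with this period, hence a floor((r + period) / period)-th power.
                best = max(best, (run + period) // period)
            else:
                run = 0
    return best


def contains_power(s, k):
    """True if s contains a k-th power; O(len(s)^2) time, O(1) extra space."""
    return max_power(s) >= k


def P(n, N):
    """Brute-force P(n, N): minimum of max_power over all n^N words of length N."""
    return min(max_power(w) for w in product(range(n), repeat=N))


if __name__ == "__main__":
    # Prefix of the Thue-Morse sequence: contains squares but no cubes.
    thue_morse = [bin(i).count("1") % 2 for i in range(64)]
    print(max_power(thue_morse))
    print([P(2, N) for N in range(1, 11)])
    print([P(3, N) for N in range(1, 9)])
```

For small parameters this agrees with the comments: the binary values settle at 2 from $N = 4$ onward, and the ternary values stay at 1.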

Regarding random strings, it is easiest to analyze powers of single letters, since words of length $L$ can be considered as letters in an alphabet of size $n^L$. The proportion of words of length $N$ whose first $k$ letters form a $k$-th power of a single letter is seen to be $n^{1-k}$. Although the events of having such a $k$-th power of length $k$ starting at the $c$-th character of a string of length $N$ are not independent for different $c$, the actual probability of having one somewhere in a string of length $N$ greater than $k$ is close to $(N+1-k)n^{1-k}$ for small values of $N$, say $N$ at most $n^{k/2}$.
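
For very small parameters the estimate can be compared with an exact count. The following Python sketch enumerates all words and checks for a run of $k$ equal letters. The function names are invented for this illustration, and the first-order estimate is simply the sum of the $N+1-k$ individual probabilities, so it can only overestimate the exact value.

```python
from fractions import Fraction
from itertools import product


def has_letter_run(w, k):
    """True if w contains k consecutive equal letters (a k-th power of a single letter)."""
    run = 0
    prev = None
    for c in w:
        run = run + 1 if c == prev else 1
        prev = c
        if run >= k:
            return True
    return False


def exact_probability(n, N, k):
    """Exact proportion of words of length N over n letters with a run of k equal letters."""
    hits = sum(has_letter_run(w, k) for w in product(range(n), repeat=N))
    return Fraction(hits, n ** N)


def first_order_estimate(n, N, k):
    """The estimate (N + 1 - k) * n^(1 - k)."""
    return Fraction(N + 1 - k, n ** (k - 1))


if __name__ == "__main__":
    for N in range(4, 10):
        exact = exact_probability(3, N, 4)
        estimate = first_order_estimate(3, N, 4)
        print(N, float(exact), float(estimate))
```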