Also, for large $k$ it looks like $\lim_n f(n,k) \approx \frac{2k-2}{2k-1}$.
–
Steve Huntsman, Jan 9 '10 at 23:01

Timothy, your question looks very close to the asymptotics I want as well (see mathoverflow.net/questions/25661, although I give a link rather than the full derivation). Could your/Tony's problem be related to the Erdős one?!
–
Wadim Zudilin, Jun 28 '10 at 10:47

Steve Huntsman's .721 is correct as far as it goes, but you get only one more decimal place before the oscillation about $1 / \log 4$ sets in. The guess $$(2k-2)/(2k-1) = 1 - 1/2k - 1/4k^2 - 1/8k^3 - \cdots$$ is correct only through the $k^{-1}$ term; the correct expansion (any partial sum of which is elementary) is $$1 - 1/2k - 1/12k^2 - 1/24k^3 - 19/720k^4 - \cdots.$$ See my answer below for more.
–
Noam D. Elkies, Nov 27 '11 at 2:19

More precisely, there is an asymptotic expansion
$$
a_n \sim a_\infty + \epsilon^{\phantom.}_0(\log_q n)
+ n^{-1} \epsilon^{\phantom.}_1(\log_q n)
+ n^{-2} \epsilon^{\phantom.}_2(\log_q n)
+ n^{-3} \epsilon^{\phantom.}_3(\log_q n)
+ \cdots,
$$
where each $\epsilon^{\phantom.}_j$ is a smooth function of period $1$
whose average over ${\bf R} / {\bf Z}$ vanishes, and
— while the series need not converge —
truncating it before the term $n^{-j} \epsilon^{\phantom.}_j(\log_q n)$
yields an approximation good to within $O(n^{-j})$. The first few
$\epsilon^{\phantom.}_j$ still have exponentially small amplitudes,
but larger than that of $\epsilon^{\phantom.}_0$ by a factor
$\sim C_j k^{2j}$ for some $C_j > 0$; for instance, the amplitude of
$\epsilon^{\phantom.}_1$ exceeds that of $\epsilon^{\phantom.}_0$
by about $2(\pi / \log q)^2 \sim 2 \pi^2 k^2$. So $a_n$ must be
computed up to a somewhat large multiple of $k^2$ before it becomes
experimentally plausible that the residual oscillation $a_n - a_\infty$
won't tend to zero in the limit as $n \rightarrow \infty$.

Here's a plot that shows $a_n$ for $k=2$ (so also $q=2$) and
$2^6 \leq n \leq 2^{13}$, and compares with the periodic approximation
$a_\infty + \epsilon^{\phantom.}_0(\log_q n)$ and the refined approximation
$a_\infty + \sum_{j=0}^2 n^{-j} \epsilon^{\phantom.}_j(\log_q n)$.
(See http://math.harvard.edu/~elkies/mo11255+.pdf for
the original PDF plot, which can be "zoomed in" to view details.) The
horizontal coordinate is $\log_2 n$; the vertical coordinate is centered at
$a_\infty = 1/(2 \log 2)$, showing also the lines $a_\infty \pm 2|a_1|$;
black cross-hairs, eventually merging visually into a continuous curve,
show the numerical values of $a_n$; and the red and green contours
show the smooth approximations.

To obtain this asymptotic expansion, we start by generalizing
R. Barton's formula from $k=2$ to arbitrary $k>1$:
$$
a_n = \frac1k \sum_{r=0}^\infty \phantom. n q^{-r} (1-q^{-r})^{n-1}.
$$
[The proof is the same, but note the exponent $n$ has been corrected
to $n-1$ since we want $n-1$ players eliminated at the $r$-th step,
not all $n$; this does not affect the limiting behavior
$a_\infty+\epsilon^{\phantom.}_0(\log_q n)$, but is needed to get
$\epsilon^{\phantom.}_m$ right for $m \geq 1$.] We would like to approximate
the sum by an integral, which can be evaluated by the change of variable
$q^{-r} = z$:
$$
\frac1k \int_{r=0}^\infty \phantom. n q^{-r} (1-q^{-r})^{n-1} \, dr
= \frac1{k \log q} \int_0^1 \phantom. n (1-z)^{n-1} dz
= \left[-a_\infty(1-z)^n\right]_{z=0}^1 = a_\infty.
$$
But it takes some effort to get at the error in this approximation.
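As a sanity check (a numerical sketch of mine, not part of the original answer), the generalized sum can be evaluated directly and compared with $a_\infty = 1/(k \log q)$. I assume $q = k/(k-1)$, which is consistent with $q = 2$ at $k = 2$:

```python
import math

def a_n(n, k):
    # Generalized Barton sum: a_n = (1/k) * sum_{r>=0} n q^{-r} (1 - q^{-r})^{n-1},
    # assuming q = k/(k-1) (so q = 2 when k = 2).
    q = k / (k - 1)
    total, r = 0.0, 0
    while n * q ** (-r) > 1e-18:  # remaining terms are negligible
        total += n * q ** (-r) * (1 - q ** (-r)) ** (n - 1)
        r += 1
    return total / k

def a_inf(k):
    # Limiting mean value a_inf = 1/(k log q).
    return 1 / (k * math.log(k / (k - 1)))

print(a_n(2, 2))               # 2/3
print(a_n(1000, 2), a_inf(2))  # both ~0.72135
print(a_n(1000, 4), a_inf(4))  # both ~0.86901
```

For $k = 4$ this gives $a_\infty \approx 0.86901$, consistent with the truncated series $1 - 1/2k - 1/12k^2 - 1/24k^3 - 19/720k^4 \approx 0.86904$ quoted in the comments above.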

We start with the main term, for $j=0$,
which is the only one that does not decay with $n$. Define
$$
\varphi_0(x) := q^x \exp(-q^x),
$$
which as Reid observed decays rapidly both as $x \rightarrow \infty$
and as $x \rightarrow -\infty$. Our zeroth-order approximation to $a_n$ is
$$
\frac1k \sum_{r=0}^\infty \phantom. \varphi_0(\log_q(n)-r),
$$
which as $n \rightarrow \infty$ rapidly nears
$$
\frac1k \sum_{r=-\infty}^\infty \varphi_0(\log_q(n)-r).
$$
For $k=q=2$, this is equivalent to Reid's formula for $R(n)$,
even though he wrote it in terms of the fractional part of $\log_2(n)$,
because the sum is clearly invariant under translation of $\log_q(n)$
by integers.
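To see this numerically, here is a small sketch of mine (again assuming $q = k/(k-1)$): the doubly infinite sum depends only on the fractional part of $\log_q n$, so it is unchanged under $n \mapsto qn$, and it already agrees with the exact sum to within $O(1/n)$.

```python
import math

def a_exact(n, k):
    # a_n = (1/k) * sum_{r>=0} n q^{-r} (1 - q^{-r})^{n-1}, with q = k/(k-1)
    q = k / (k - 1)
    s, r = 0.0, 0
    while n * q ** (-r) > 1e-18:
        s += n * q ** (-r) * (1 - q ** (-r)) ** (n - 1)
        r += 1
    return s / k

def approx0(n, k):
    # (1/k) * sum_{r in Z} phi_0(log_q(n) - r), with phi_0(x) = q^x exp(-q^x),
    # truncated where phi_0 is negligible at both ends
    q = k / (k - 1)
    t = math.log(n) / math.log(q)
    s = 0.0
    for r in range(-60, int(t) + 200):
        x = t - r
        s += q ** x * math.exp(-(q ** x))
    return s / k

print(approx0(300, 2) - approx0(600, 2))  # ~0: period 1 in log_2(n)
print(a_exact(300, 2) - approx0(300, 2))  # small, O(1/n)
```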

The functions $\epsilon^{\phantom.}_j$ appearing in the further terms
$n^{-j} \epsilon^{\phantom.}_j(\log_q(n))$
of the asymptotic expansion of $a_n$ are defined similarly by
$$
\epsilon^{\phantom.}_j(t) =
\frac1k \sum_{r\in\bf Z} \phantom. \varphi_j(t+r),
$$
where
$$
\varphi_j(x) = P_j(q^x) \varphi_0(x) = P_j(q^x) q^x \exp(-q^x)
$$
and $P_j$ is the coefficient of $n^{-j}$ in our power series
$$
(1-q^{-r})^{n-1} = \exp(-nq^{-r}) \phantom.
\sum_{j=0}^\infty \frac{P_j(nq^{-r})}{n^j}.
$$
Thus
$
P_0(u)=1, \phantom+
P_1(u) = -(u^2-2u)/2, \phantom+
P_2(u) = (3u^4-20u^3+24u^2)/24
$, etc.
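As a quick check of these polynomials (my addition, not in the original), multiply the expansion by $e^u$: with $u = nq^{-r}$ held fixed we should have $e^u (1-u/n)^{n-1} = 1 + P_1(u)/n + P_2(u)/n^2 + O(n^{-3})$.

```python
import math

def P1(u):
    return -(u**2 - 2*u) / 2                 # P_1(u) = -(u^2 - 2u)/2

def P2(u):
    return (3*u**4 - 20*u**3 + 24*u**2) / 24  # P_2(u) = (3u^4 - 20u^3 + 24u^2)/24

u, n = 1.3, 10**4
lhs = math.exp(u) * (1 - u/n) ** (n - 1)
rhs = 1 + P1(u)/n + P2(u)/n**2
print(lhs - rhs)             # residual is O(n^{-3})
print(lhs - (1 + P1(u)/n))   # dropping P_2 leaves a visibly larger O(n^{-2}) error
```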
Again we apply Poisson to expand $\epsilon^{\phantom.}_j(\log_q(n))$
in a Fourier series:
$$
\epsilon^{\phantom.}_j(t)
= \frac1k \sum_{m \in \bf Z} \hat\varphi_j(-m) e^{2\pi i m t},
$$
and evaluate the Fourier transform $\hat\varphi_j$ by integrating
with respect to $q^t$. This yields a linear combination of Gamma
integrals evaluated at $1 + (2\pi i y / \log q) + j'$ for
integers $j' \in [j,2j]$, giving $\hat\varphi_j$ as a
degree-$2j$ polynomial multiple of $\hat\varphi_0$. The first case is
$$
\begin{eqnarray*}
\hat\varphi_1(y)
&=& \frac1{\log q} \left[
\Gamma\Bigl(2 + \frac{2 \pi i y} {\log q}\Bigr)
- \frac12 \Gamma\Bigl(3 + \frac{2 \pi i y} {\log q}\Bigr)
\right]
\\
&=& \frac1{\log q} \frac{\pi y}{\log q} \left(\frac{2 \pi y}{\log q} - i\right)
\phantom.
\Gamma\Bigl(1 + \frac{2 \pi i y} {\log q}\Bigr)
\\
&=& \frac{\pi y}{\log q} \left(\frac{2 \pi y}{\log q} - i\right)
\phantom. \hat\varphi_0(y).
\end{eqnarray*}
$$
Note that $\hat\varphi_1(0) = 0$, so the constant coefficient of the
Fourier series for $\epsilon^{\phantom.}_1(t)$ vanishes;
this is indeed true of $\epsilon^{\phantom.}_j(t)$ for each $j>0$,
because $\hat\varphi_j(0) = \int_{-\infty}^\infty \varphi_j(x) \phantom. dx$
is the $n^{-j}$ coefficient of a power series in $n^{-1}$ that we've
already identified with the constant $a_\infty$. Hence (as can also
be seen in the plot above) none of the decaying corrections
$n^{-j} \epsilon^{\phantom.}_j(\log_q n)$ biases the average of $a_n$
away from $a_\infty$, even when $n$ is small enough that
those corrections are a substantial fraction of the
residual oscillation $\epsilon_0(\log_q n)$. This leaves
$\hat\varphi_j(\mp1) e^{\pm 2 \pi i t} / k$ as the leading terms
in the expansion of each $\epsilon^{\phantom.}_j(t)$, so we see as promised
that $\epsilon^{\phantom.}_j$ is still exponentially small but with
an extra factor whose leading term is a multiple of $(2\pi / \log q)^{2j}$.
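These transform identities are easy to corroborate numerically; the following is my own check (not from the answer), using the convention $\hat\varphi(y) = \int_{-\infty}^{\infty} \varphi(x)\, e^{2\pi i y x}\, dx$ and plain trapezoidal quadrature, which is essentially exact here because the integrands are smooth and decay at both ends. It also confirms, for $q=2$, that the amplitude ratio $|\hat\varphi_1(1)/\hat\varphi_0(1)|$ is close to the leading term $2(\pi/\log q)^2$.

```python
import cmath, math

q = 2.0
L = math.log(q)

def phi0(x):
    return q**x * math.exp(-(q**x))

def phi1(x):
    u = q**x
    return (u - u*u/2) * phi0(x)  # P_1(q^x) * phi_0(x)

def hat(phi, y, lo=-60.0, hi=12.0, steps=72000):
    # trapezoidal approximation to int phi(x) e^{2 pi i y x} dx
    h = (hi - lo) / steps
    s = 0.5 * (phi(lo) * cmath.exp(2j * math.pi * y * lo)
               + phi(hi) * cmath.exp(2j * math.pi * y * hi))
    for j in range(1, steps):
        x = lo + j * h
        s += phi(x) * cmath.exp(2j * math.pi * y * x)
    return s * h

y = 1.0
f0 = hat(phi0, y)
f1 = hat(phi1, y)
pred = (math.pi * y / L) * (2 * math.pi * y / L - 1j) * f0

print(abs(f1 - pred))                       # ~0: the phi_1-hat identity holds
print(abs(f1 / f0), 2 * (math.pi / L)**2)   # ratio ~41.3 vs leading term ~41.1
print(abs(f0))                              # ~7e-6: the exponentially small amplitude
```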

I suspect the limit doesn't exist, believe it or not. It seems like it should! But plotting $f(n,2)$ for $n$ from, say, 30 to 2000, it looks like we have

$$ f(n) = C + \omega(n) + o(1) $$

where $C \approx 0.721347$ and $\omega(n)$ is a certain function which is "log-periodic" in the sense that $\omega(n) = \omega(2n)$. $\omega(n)$ never gets bigger than about $8 \times 10^{-6}$, so you have to look pretty closely to see it.

This sort of very small log-periodic fluctuation appears in certain questions from the mathematics of analysis of algorithms. For example, the mean length of the longest run in a random binary word of length $n$ is $\log_2 n + C + P(\log_2 n) + O(n^{-1/2} \log^2 n)$, where $P$ is a continuous periodic function of period 1; see Flajolet and Sedgewick, Analytic Combinatorics, Proposition V.1. I've also seen similar results for problems involving binary trees.

Something similar seems to be true for $f(n,3)$, except that the log-periodic fluctuations appear to be on the order of $10^{-9}$ and the fluctuations are "faster": if we let $\omega$ denote the log-periodic function as above, then we have something like $\omega(n) = \omega(1.5n)$. (I write 1.5, not $3/2$, because I am not at all confident that the relevant constant is $3/2$; I'm not doing the computations with enough precision to be sure.)

The limiting oscillation is given by the sum $$\frac12 \sum_{k=-\infty}^{\infty} 2^{k+s}\, e^{-2^{k+s}},$$ where $s$ is the fractional part of $\log_2 n$. (Note the terms of the sum decay rapidly both as $k \to \infty$ and $k \to -\infty$.)

OK, here is the argument.

For $n \ge 2$, $f(n, 2)$ is equal to the average value of $f(|S|, 2)$ over all subsets $S$ of $\{1, \dots, n\}$ (including $S = \{1, \dots, n\}$ itself). So we may imagine a process where we start with $\{1, \dots, n\}$ and at each step we pick a uniformly random subset of our current subset. $f(n, 2)$ is the probability that at the last time before we hit the empty set, our set contained just one element.
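This averaging relation yields a direct recursion: solving for the $S = \{1,\dots,n\}$ term gives $f(n)\,(1 - 2^{-n}) = 2^{-n} \sum_{j<n} \binom{n}{j} f(j)$ with $f(0)=0$, $f(1)=1$. Here is a small sketch of mine implementing this, which reproduces Michael Lugo's constant $C \approx 0.721347$:

```python
import math

def f_values(N):
    # f(n) = 2^{-n} sum_{j=0}^{n} C(n,j) f(j) for n >= 2; solving for f(n):
    # f(n) (1 - 2^{-n}) = 2^{-n} sum_{j<n} C(n,j) f(j), with f(0)=0, f(1)=1.
    f = [0.0, 1.0]
    for n in range(2, N + 1):
        w = 2.0 ** (-n)
        s = sum(math.comb(n, j) * w * f[j] for j in range(1, n))
        f.append(s / (1 - w))
    return f

f = f_values(400)
print(f[2])    # 2/3
print(f[400])  # ~0.72135
```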

Consider what happens to each element of $\{1, \dots, n\}$ in this process. At each stage, if it is still in our subset, we keep it with probability $1/2$ and throw it out with probability $1/2$. So the probability that it remains after $r$ steps is $2^{-r}$. Thus we have $n$ independent variables $X_1, \dots, X_n$, each with this geometric distribution, and what we seek is the probability that among them there is a unique maximum.
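This reformulation is easy to sanity-check by simulation (my sketch, not part of the original argument): draw $n$ independent variables with $\Pr[X \ge r] = 2^{-r}$ and count how often the maximum is unique.

```python
import random

def unique_max_prob(n, trials, seed=0):
    # Estimate P(unique maximum) among n iid variables with P(X >= r) = 2^{-r}.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        best, best_count = -1, 0
        for _ in range(n):
            x = 0
            while rng.random() < 0.5:  # survive each round with probability 1/2
                x += 1
            if x > best:
                best, best_count = x, 1
            elif x == best:
                best_count += 1
        hits += best_count == 1
    return hits / trials

est = unique_max_prob(50, 50_000)
print(est)  # ~0.721, i.e. about 1/(2 log 2), up to sampling error ~0.002
```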

We may express this probability as a sum over the value $r$ of this maximum. The probability of this occurring for any given $r$ is $$n \cdot 2^{-(r+1)} \left(1 - 2^{-r}\right)^n.$$

From here the derivation is not entirely rigorous, but writing $u = r - \log_2 n$, as $n$ increases this value is roughly

$$\frac 12 2^{-u} e^{-2^{-u}}.$$

Now $u$ ranges over values of the form $r - \log_2 n$ for $r$ a nonnegative integer, so $-u$ ranges over values of the form $k + s$, with $s$ the fractional part of $\log_2 n$ and $-\infty < k \le \lfloor \log_2 n \rfloor$ an integer. As $n$ goes to $\infty$, we obtain the entire sum shown at the top.

But the exponent of $1-2^{-r} = 1 - 2^{-u}/n$ should be $n-1$, not $n$, right? I checked this for small $n$; it doesn't change the asymptotic oscillation but is needed to account for the discrepancy between the limiting oscillation and the numerical values for finite $n$.
–
Noam D. Elkies, Nov 13 '11 at 20:28

This is not rigorous, but I think it justifies Michael Lugo's observations that $f$ diverges and that the behavior is asymptotically log-periodic.

To make the recursion more intuitive, note that $f(n,k)$ is a weighted average of $f(1,k), \dots, f(n-1,k)$. The weight is roughly Gaussian, concentrated near the mode of the binomial expansion of $(a+b)^n$ where $a = 1/k$ and $b = 1 - 1/k$, which occurs at about the $a^{[n/k]} b^{n-[n/k]}$ term; so the terms near $f(n - [n/k], k)$ are heavily weighted.

The weight drops off rapidly away from the mode $m = n - [n/k]$, like a normal distribution with standard deviation about $\sqrt{\frac1k\bigl(1-\frac1k\bigr)}\,\sqrt{n}$.
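Numerically (my sketch), the number of children kept at one step is $\mathrm{Binomial}(n, 1-1/k)$, so the weights indeed have mean $n - n/k$ and standard deviation $\sqrt{\frac1k(1-\frac1k)}\sqrt{n}$:

```python
import math

n, k = 1000, 4
p = 1 - 1/k  # a given child survives a round iff the teacher picks another corner

pmf = [math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]
mean = sum(j * w for j, w in enumerate(pmf))
var = sum((j - mean)**2 * w for j, w in enumerate(pmf))

print(mean, n - n // k)                                  # both ~750
print(math.sqrt(var), math.sqrt((1/k) * (1 - 1/k) * n))  # both ~13.69
```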

Intuitively, the behavior should be similar to that of a continuous function satisfying a smoothed version of this recursion.

Fix $t$ large. For $x$ even larger, $f(x)$ can then be expressed as a weighted average (a convolution) of the values of $f$ on an interval inside $[t, tk/(k-1)]$. If you multiply $x$ by $k/(k-1)$, then you convolve with a slightly wider weight function, but it is wider by an exponentially decreasing amount. Thus the weights never spread out to cover all of $[t, tk/(k-1)]$, and $f$ does not have to lose its dependence on the remainder of $\log x$ modulo $\log\frac{k}{k-1}$.

That just says that some initial conditions would produce oscillations. It doesn't prove that the actual initial conditions do.

For $t$ small, the interval $[m \pm c\sqrt{x}]$ may wrap around $[t, tk/(k-1)]$, which explains why $f$ almost becomes constant.

Did this come from a backgammon problem with $k=6$, where you are rolling $n$ dice and then rerolling when you don't get a 6? If it's the Tony Lezard from backgammon, please say hi to him for me.
–
Douglas Zare, Jan 10 '10 at 1:10

Douglas, hi, hope you're well. Not a backgammon problem this time but one arising out of a playground game my son was playing recently: $n$ children each select at random one of the $k$ corners of the playground and stand in it. A teacher randomly selects one of the corners, and all the children in that corner are eliminated from the game. The game continues in this manner until all children are eliminated or there is a unique winner remaining. $f(n,k)$ is the probability that the game will have a winner. In the actual example $k$ was 4 and $n$ was about 30, but I became interested in the asymptotic behaviour as $n$ grows. I'm surprised that the consensus seems to be that $f$ diverges. Plotting $f(n,k)$ out to about $n=1000$ shows the log-periodic nature of $f$, but the amplitude diminishes and it looks like it's settling down towards a limit. Hardly rigorous of course!