5. Capture-recapture methods are sometimes used to estimate population sizes. A standard image is that a pond contains $N$ fish for some fixed but unknown $N$, and that $G$ of the $N$ fish have been captured, tagged, and returned alive to the pond. You can assume that $G/N$ isn’t close to 0.

In the recapture phase, assume that a simple random sample of $n$ fish is drawn from the $N$ fish in the pond (you might have to use some imagination to believe this assumption). We can observe $X$, the random number of tagged fish in the sample.

The goal is to use the observation to estimate $N$.

(a) For large $n$, the sample proportion $X/n$ is likely to be close to a constant. Identify the constant and hence construct an estimate of $N$ based on $X$.
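As a concrete sanity check on (a): under simple random sampling, $X/n$ should be close to $G/N$, which suggests estimating $N$ by $Gn/X$. A minimal simulation sketch, where the values of $N$, $G$, and $n$ are illustrative assumptions, not part of the exercise:

```python
import random

# Simulation sketch of part (a); N, G, n below are illustrative assumptions.
random.seed(0)
N, G, n = 1000, 100, 200          # true size, tagged fish, recapture sample

pond = [1] * G + [0] * (N - G)    # 1 = tagged, 0 = untagged
sample = random.sample(pond, n)   # simple random sample, no replacement
X = sum(sample)                   # tagged fish in the sample

# X/n should be close to G/N, which suggests estimating N by G*n/X.
N_hat = G * n / X
```

With these values $X$ has mean $nG/N = 20$, so the estimate typically lands in the neighborhood of the true $N = 1000$.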

Later in this exercise you will see how your estimate is related to the MLE of $N$.

(c) Find the likelihood ratio $R(N) = \frac{\text{lik}(N)}{\text{lik}(N-1)}$ for $N > n$. Simplify the answer as much as you can.

(d) Find the maximum likelihood estimate of $N$ by comparing the likelihood ratio with 1. How does the MLE compare with your estimate in (a)?
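A brute-force companion to (d), with assumed observed values $G$, $n$, $x$: maximize the hypergeometric likelihood directly and compare with the integer part of the estimate from (a).

```python
from math import comb, floor

# Brute-force check of (d) for assumed observed values G, n, x.
G, n, x = 100, 200, 23            # illustrative, not from the exercise

def lik(N):
    # hypergeometric likelihood; math.comb returns 0 when n - x > N - G
    return comb(G, x) * comb(N - G, n - x) / comb(N, n)

N_mle = max(range(max(G, n) + 1, 3000), key=lik)
# N_mle agrees with floor(G * n / x), the integer part of the estimate in (a)
```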

6.
Show that if $r > 1$ and $s > 1$ then the mode of the beta $(r, s)$ distribution is $(r-1)/(r+s-2)$. Remember to ignore multiplicative constants and take the log before maximizing.
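A quick numerical sanity check of the mode formula, maximizing the log density on a fine grid; the parameters $r = 5$, $s = 3$ are an arbitrary choice:

```python
import math

# Grid check: the beta(r, s) log density (r-1)log(p) + (s-1)log(1-p),
# with multiplicative constants dropped, peaks at (r-1)/(r+s-2).
r, s = 5, 3                       # illustrative parameters with r, s > 1

grid = [k / 100000 for k in range(1, 100000)]
mode = max(grid, key=lambda p: (r - 1) * math.log(p) + (s - 1) * math.log(1 - p))
# mode should be within grid resolution of (r - 1)/(r + s - 2) = 2/3
```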

7.
Suppose that $X$ has the beta $(r, s)$ distribution, and that given $X=p$, the conditional distribution of $H$ is binomial $(10, p)$. Find

(a) the conditional distribution of $X$ given $H = 7$

(b) $E(X \mid H = 7)$

(c) the MAP estimate of $X$ given $H = 7$

(d) $P(H = 7)$

(e) $E(H)$
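The standard beta-binomial identities behind this exercise can be spot-checked numerically; deriving them is the point of the problem, so the sketch below is only a consistency check, with illustrative $r$ and $s$:

```python
from math import comb, gamma

# Spot-check of the beta-binomial structure for illustrative r, s.
r, s = 2.0, 3.0

def beta_fn(a, b):                          # the beta function B(a, b)
    return gamma(a) * gamma(b) / gamma(a + b)

# (a) posterior given H = 7 is beta(r + 7, s + 3)
post_mean = (r + 7) / (r + s + 10)          # (b) E(X | H = 7)
post_map = (r + 6) / (r + s + 8)            # (c) mode of the posterior

# (d) beta-binomial probability P(H = 7)
p7 = comb(10, 7) * beta_fn(r + 7, s + 3) / beta_fn(r, s)

# sanity check: the probabilities P(H = h) sum to 1
total = sum(comb(10, h) * beta_fn(r + h, s + 10 - h) / beta_fn(r, s)
            for h in range(11))

EH = 10 * r / (r + s)                       # (e) E(H) = 10 E(X)
```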

8.
The chance of heads of a random coin is picked according to the beta $(r, s)$ distribution. The coin is tossed repeatedly.

(a) What is the chance that the first three tosses are heads and the next three tosses are tails?

(b) Given that the first three tosses are heads and the next three tosses are tails, what is the chance that the seventh toss is a head?

(c) Given that three out of the first six tosses are heads, what is the chance that the seventh toss is a head? Compare with the answer to (b).
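A numerical companion: with an arbitrary beta $(r, s)$ prior, the probability in (a) is a ratio of beta functions, and (b) and (c) turn out to agree because the binomial coefficient cancels when conditioning. A hedged sketch:

```python
from math import gamma

# Illustrative check that the answers to (b) and (c) coincide.
r, s = 2.0, 5.0                              # arbitrary prior parameters

def beta_fn(a, b):                           # the beta function B(a, b)
    return gamma(a) * gamma(b) / gamma(a + b)

# (a) P(HHHTTT) = integral of p^3 (1 - p)^3 over the beta(r, s) prior
p_hhhttt = beta_fn(r + 3, s + 3) / beta_fn(r, s)

# (b) given HHHTTT the posterior is beta(r + 3, s + 3), with mean
p_head_b = (r + 3) / (r + s + 6)

# (c) given "3 heads in 6 tosses" the binomial coefficient cancels,
# leaving the same posterior; its mean is B(r+4, s+3)/B(r+3, s+3)
p_head_c = beta_fn(r + 4, s + 3) / beta_fn(r + 3, s + 3)
```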

9.
Person A creates a coin by picking its chance of heads uniformly on $(0, 1)$. In three tosses of that coin, Person A gets two heads.

Independently of Person A, Person B creates a coin by picking its chance of heads uniformly on $(0, 1)$. In three tosses of that coin, Person B gets one head.

(a) Given the data, what is the distribution of the chance of heads of Person A’s coin?

(b) Given the data, what is the distribution of the chance of heads of Person B’s coin?

(c) Given the data, what is the probability that Person A’s coin has a higher chance of heads than Person B’s coin?
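With a uniform prior, the posteriors here are beta $(3, 2)$ for Person A and beta $(2, 3)$ for Person B, so part (c) reduces to $\int_0^1 f_A(x) F_B(x)\,dx$. A midpoint-rule sketch, with the posterior density and cdf written out by hand:

```python
# Part (c) as a numerical integral: A's posterior is beta(3, 2),
# B's is beta(2, 3); integrate f_A(x) * F_B(x) over (0, 1).
def f_A(x):
    return 12 * x**2 * (1 - x)               # beta(3, 2) density

def F_B(x):
    return 6 * x**2 - 8 * x**3 + 3 * x**4    # beta(2, 3) cdf

m = 100000
prob = sum(f_A((k + 0.5) / m) * F_B((k + 0.5) / m) for k in range(m)) / m
# by symmetry the answer should exceed 1/2
```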

10: Markov and Chebyshev Bounds on the Poisson-Binomial Upper Tail.
For $j \ge 1$ let $I_j$ be independent indicators such that $P(I_j = 1) = p_j$. Let $X = I_1 + I_2 + \ldots + I_n$. Then $X$ is the number of successes in $n$ independent trials that are not necessarily identically distributed.

We say that $X$ has the Poisson-binomial distribution with parameters $p_1, p_2, \ldots, p_n$. The binomial is the special case when all the $p_j$’s are equal.

You saw in lab that the number of occupied tables in a Chinese Restaurant process has a Poisson-binomial distribution. These distributions arise in statistical learning theory, the theory of randomized algorithms, and other areas.

Let $E(X) = \mu$. For $c > 0$, you are going to find an upper bound on $P(X \ge (1+c)\mu)$. That’s the chance that $X$ exceeds its mean by some percent.

In the special case of the binomial, $\mu = np$ and so $P(X \ge (1+c)\mu)$ can be rewritten as $P(\frac{X}{n} - p \ge cp)$. That’s the chance that the sample proportion exceeds $p$ by some percent.

(d) If all the $p_j$’s are equal to $p$, what is the value of the bound in (c)?
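Parts (a)–(c) are not reproduced here, but as a reference point, Markov's inequality by itself gives $P(X \ge (1+c)\mu) \le \frac{\mu}{(1+c)\mu} = \frac{1}{1+c}$. A sketch comparing that with the exact tail in the binomial special case; all values are assumed for illustration:

```python
from math import comb

# Exact binomial tail versus the crude Markov bound 1/(1+c).
n, p, c = 100, 0.3, 0.5          # illustrative values
mu = n * p

tail = sum(comb(n, k) * p**k * (1 - p)**(n - k)
           for k in range(n + 1) if k >= (1 + c) * mu)
markov = 1 / (1 + c)
# the exact tail is far below the Markov bound here
```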

11: Chernoff Bound on Poisson-Binomial Upper Tail.
This exercise continues the previous one and uses the same notation.

(a) Show that the mgf of $I_j$ is given by $M_{I_j}(t) = 1 + p_j(e^t - 1)$ for all $t$.

(b) Use (a) to derive an expression for $M_X(t)$, the mgf of $X$ evaluated at $t$.

(c) A useful exponential bound is that $e^x \ge 1 + x$ for all $x$. You don’t have to show it but please look at the graphs. Use the fact to show that $M_X(t) \le \exp\big(\mu(e^t - 1)\big)$ for all $t$. Notice that the right hand side is the mgf of a Poisson random variable that has the same mean as $X$.
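A quick numerical check of (b) and (c) for arbitrary $p_j$'s and a handful of $t$ values: the product form of the mgf never exceeds $\exp(\mu(e^t - 1))$.

```python
from math import exp

# M_X(t) = prod_j (1 + p_j (e^t - 1)); by 1 + x <= e^x each factor is
# at most exp(p_j (e^t - 1)), so the product is at most exp(mu (e^t - 1)).
p = [0.1, 0.3, 0.5, 0.2, 0.4]                # illustrative p_j's
mu = sum(p)

def mgf_X(t):
    out = 1.0
    for pj in p:
        out *= 1 + pj * (exp(t) - 1)
    return out

for t in [-1.0, -0.1, 0.0, 0.5, 1.0, 2.0]:
    assert mgf_X(t) <= exp(mu * (exp(t) - 1)) + 1e-12
```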

(d) Use Chernoff’s method and the bound in (c) to show that

$$P(X \ge (1+c)\mu) \;\le\; \left(\frac{e^c}{(1+c)^{1+c}}\right)^{\mu} \qquad \text{for all } c > 0.$$

Remember that $\mu = np$ when all the $p_j$’s are equal. If $g(c) = \exp(c)/(1+c)^{1+c}$ is small then the bound above will decrease exponentially as $n$ gets large. That is the focus of the next exercise.
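In the notation of the remark above, the Chernoff bound takes the form $g(c)^\mu$ with $g(c) = \exp(c)/(1+c)^{1+c}$. A sketch checking it against the exact Poisson-binomial tail, computed by convolution; the $p_j$'s are illustrative:

```python
from math import exp

# Exact Poisson-binomial pmf by dynamic-programming convolution,
# compared with the Chernoff bound g(c)^mu, g(c) = e^c / (1+c)^(1+c).
p = [0.3, 0.6, 0.2, 0.5, 0.4, 0.1, 0.7, 0.25]
mu = sum(p)

pmf = [1.0]
for pj in p:                      # fold in one indicator at a time
    pmf = [(1 - pj) * a + pj * b
           for a, b in zip(pmf + [0.0], [0.0] + pmf)]

for c in [0.1, 0.5, 1.0, 2.0]:
    tail = sum(q for k, q in enumerate(pmf) if k >= (1 + c) * mu)
    bound = (exp(c) / (1 + c) ** (1 + c)) ** mu
    assert tail <= bound + 1e-12
```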

12: Simplified Chernoff Bounds on Poisson-Binomial Upper Tail. This exercise continues the previous one and uses the same notation.

The bound in the previous exercise is a bit complicated. Often, simpler versions are used because they are good enough even though they are weaker.

(a) It is not hard to show that $\log(1+c) \ge \frac{2c}{2+c}$ for $c > 0$. You don’t have to show it but please look at the graphs.
Use the fact to show that $c - (1+c)\log(1+c) \le -\frac{c^2}{2+c}$ for $c > 0$.
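Granting the logarithm inequality, the requested bound follows from one line of algebra; a sketch:

$$c - (1+c)\log(1+c) \;\le\; c - (1+c)\cdot\frac{2c}{2+c} \;=\; \frac{c(2+c) - 2c(1+c)}{2+c} \;=\; -\frac{c^2}{2+c}.$$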

(b) Show that if $X$ has a Poisson-binomial distribution with mean $\mu$ then

$$P(X \ge (1+c)\mu) \;\le\; \exp\left(-\frac{c^2}{2+c}\,\mu\right) \qquad \text{for all } c > 0.$$

(c) A simpler but weaker version of the bound in (b) is also often used. Show that

$$P(X \ge (1+c)\mu) \;\le\; \exp\left(-\frac{c^2}{3}\,\mu\right) \qquad \text{for } 0 < c \le 1.$$

(a) Find the least squares predictor of $S$ based on $X_1$, and find the mean squared error (MSE) of the predictor.

(b) Find the least squares predictor of $X_1$ based on $S$, and find the MSE of the predictor. Is the predictor a linear function of $S$? If so, it must also be the best among all linear predictors based on $S$, which is commonly known as the regression predictor.

15.
A $p$-coin is tossed repeatedly. Let $W_{H}$ be the number of tosses till the first head appears, and $W_{HH}$ the number of tosses till two consecutive heads appear.

(a) Describe a random variable $X$ that depends only on the tosses after $W_H$ and satisfies $W_{HH} = W_H + X$.
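A simulation sketch of the two waiting times for an assumed $p$; the simulated averages can be compared with $E(W_H) = 1/p$ and the standard result $E(W_{HH}) = (1+p)/p^2$:

```python
import random

# Simulate W_H (first head) and W_HH (first two consecutive heads)
# for an assumed p; illustration only.
random.seed(1)
p, trials = 0.5, 100000

w_h_total = w_hh_total = 0
for _ in range(trials):
    tosses, run, w_h = 0, 0, None
    while run < 2:
        tosses += 1
        if random.random() < p:       # head
            run += 1
            if w_h is None:
                w_h = tosses
        else:                         # tail: the run of heads restarts
            run = 0
    w_h_total += w_h
    w_hh_total += tosses

# averages approximate E(W_H) = 1/p and E(W_HH) = (1 + p)/p^2
```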

(d) Let $\mathbf{c}$ be an $m \times 1$ vector of real numbers and let $W = \mathbf{c}^T\mathbf{V}$ for $\mathbf{V}$ defined in Part (c). In terms of $\mathbf{c}$, $\boldsymbol{\mu}_\mathbf{V}$ and $\boldsymbol{\Sigma}_\mathbf{V}$, find $E(W)$ and $Var(W)$.

20.
Let $X$ and $Y$ be standard bivariate normal with correlation $\rho$. Find $E(\max(X, Y))$. The easiest way is to use the fact that for any two numbers $a$ and $b$, $\max(a, b) = (a + b + \vert a - b \vert)/2$. Check the fact first, and then use it.
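Taking expectations in the hint gives $E(\max(X, Y)) = E\vert X - Y \vert / 2$, since $E(X) = E(Y) = 0$, and $X - Y$ is normal $(0, 2 - 2\rho)$. A Monte Carlo sketch against the resulting closed form; the value of $\rho$ is arbitrary:

```python
import random
from math import sqrt, pi

# Monte Carlo check of E(max(X, Y)) for standard bivariate normal (X, Y).
random.seed(2)
rho, trials = 0.6, 200000

total = 0.0
for _ in range(trials):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x, y = z1, rho * z1 + sqrt(1 - rho**2) * z2   # correlation rho
    total += max(x, y)

estimate = total / trials
# compare with E|X - Y|/2 = sqrt((1 - rho)/pi)
```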

(b) Find the least squares predictor of $X$ based on $S$ and provide its mean squared error.

(c) Find the least squares linear predictor of $X$ based on $S$ and provide its mean squared error.

22.
Let $\mathbf{X}$ be a $p \times 1$ random vector and suppose we are trying to predict a random variable $Y$ by a linear function of $\mathbf{X}$. A section of the textbook identifies the least squares linear predictor by restricting the search to linear functions $h(\mathbf{X})$ for which $E(h(\mathbf{X})) = \mu_Y$. Show that this is a legitimate move.

Specifically, let $\hat{Y}_1 = \mathbf{c}^T \mathbf{X} + d$ be a linear predictor such that $E(\hat{Y}_1) \ne \mu_Y$. Find a non-zero constant $k$ such that $\hat{Y}_2 = \hat{Y}_1 + k$ satisfies $E(\hat{Y}_2) = \mu_Y$. Then show that $MSE(\hat{Y}_1) \ge MSE(\hat{Y}_2)$. This will show that the least squares linear predictor has to have the same mean as $Y$.
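The inequality rests on the identity $MSE(\hat{Y}_1) = MSE(\hat{Y}_2) + k^2$ once $k = \mu_Y - E(\hat{Y}_1)$. A toy numerical check; the data-generating model is an arbitrary assumption:

```python
import random

# Shifting a biased linear predictor by k = mu_Y - E(Y_hat_1) removes
# exactly k^2 from the (empirical) mean squared error.
random.seed(3)
n = 100000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [2 * x + 1 + random.gauss(0, 0.5) for x in xs]   # toy model

pred1 = [2 * x for x in xs]                  # deliberately biased: omits +1
k = sum(ys) / n - sum(pred1) / n             # empirical mu_Y - E(pred1)
pred2 = [q + k for q in pred1]

mse1 = sum((y - q) ** 2 for y, q in zip(ys, pred1)) / n
mse2 = sum((y - q) ** 2 for y, q in zip(ys, pred2)) / n
# mse1 - mse2 equals k**2 up to rounding
```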

Let $f_{S_n}$ be the density of $S_n$. The formula for $f_{S_n}$ is piecewise polynomial on the support $(0, n)$. In this problem we will focus on the density on the interval $(0, 1)$ and discover a nice consequence.

(a) For $0 < x < 1$, find $f_{S_2}(x)$.

(b) Use Part (a) and the convolution formula to find $f_{S_3}(x)$ for $0 < x < 1$.

(c) Guess a formula for $f_{S_n}(x)$ for $0 < x < 1$ and prove it by induction.
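Assuming $S_n$ is the sum of $n$ i.i.d. uniform $(0, 1)$ variables (an assumption consistent with the support $(0, n)$), any guessed density on $(0, 1)$ can be checked against simulation through its integral $P(S_n \le x)$:

```python
import random
from math import factorial

# Monte Carlo check on (0, 1): if the guess is f_{S_n}(x) = x^(n-1)/(n-1)!
# there, then P(S_n <= x) = x^n / n! for 0 <= x <= 1. S_n is assumed to be
# a sum of n i.i.d. uniform (0, 1) variables.
random.seed(4)
n, x, trials = 3, 0.9, 200000

hits = sum(sum(random.random() for _ in range(n)) <= x
           for _ in range(trials))
estimate = hits / trials
# compare with x**n / factorial(n)
```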

(b) Now assume in addition that $X_1, X_2, \ldots, X_n$ are i.i.d. normal $(\mu, \sigma^2)$. What is the joint distribution of $\bar{X}, D_1, D_2, \ldots, D_{n-1}$? Explain why $D_n$ isn’t on the list.

(c) True or false (justify your answer): The sample mean and sample variance of an i.i.d. normal sample are independent of each other.

25: Normal Sample Mean and Sample Variance, Part 2

(a) Let $R$ have the chi-squared distribution with $n$ degrees of freedom. What is the mgf of $R$?

(b) For $R$ as in Part (a), suppose $R = V + W$ where $V$ and $W$ are independent and $V$ has the chi-squared distribution with $m < n$ degrees of freedom. Can you identify the distribution of $W$? Justify your answer.
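For (a), the chi-squared $(n)$ mgf $(1 - 2t)^{-n/2}$, valid for $t < 1/2$, can be verified by integrating $e^{tx}$ against the density; dividing mgfs is then the natural route to (b). A midpoint-rule sketch with illustrative $n$ and $t$:

```python
from math import exp, gamma

# Numerically integrate e^(tx) against the chi-squared(n) density and
# compare with (1 - 2t)^(-n/2); n and t are illustrative, with t < 1/2.
n, t = 4, 0.2

const = 2 ** (n / 2) * gamma(n / 2)
def density(x):
    return x ** (n / 2 - 1) * exp(-x / 2) / const

step, upper = 0.001, 200.0
mgf = sum(exp(t * (k + 0.5) * step) * density((k + 0.5) * step)
          for k in range(int(upper / step))) * step
# compare with (1 - 2 * t) ** (-n / 2)
```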