A: This is a good example to demonstrate the multinomial distribution and its application, and also to introduce the concept of "conjugate priors". Let's deal with them one at a time and then revisit the problem.

The Conjugate Prior:
This concept is part of Bayesian analysis. Let's assume you have a prior which belongs to a "family" of functions, say \( y = f(\theta,x) \). You get some additional information and update your estimate, and the posterior turns out to belong to the same family of functions. So the prior and posterior differ only in the parameters that go into the function and not in its form and structure. A prior with this property is called a conjugate prior for the likelihood in question. A standard example is the Gaussian distribution: a Gaussian prior on the mean of a Gaussian likelihood with known variance yields a Gaussian posterior.

The Multinomial Distribution:
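This is an extension of the binomial distribution. Assume you have a bag with black, blue & green balls. The probability of drawing each one of them is \(\{p_{black},p_{blue},p_{green}\}\) and you pull 10 balls, replacing each ball after it is drawn so that the probabilities stay fixed. What is the probability of seeing 2 black balls, 6 blue balls and 2 green balls? This is given by the following formula (which can be derived)
\[ P(2\ \text{black}, 6\ \text{blue}, 2\ \text{green}) = \frac{10!}{2!\,6!\,2!}\; p_{black}^{2}\, p_{blue}^{6}\, p_{green}^{2} \]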
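As a quick numerical check of the multinomial probability, here is a minimal Python sketch. The function name `multinomial_pmf` and the probabilities 0.2, 0.5 and 0.3 are made up for illustration; the post does not give values for \(p_{black}\), \(p_{blue}\) and \(p_{green}\).

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of seeing exactly `counts` in sum(counts) independent draws."""
    # Multinomial coefficient n! / (n_1! * n_2! * ... * n_k!)
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    # Multiply by p_1^n_1 * p_2^n_2 * ... * p_k^n_k
    return coefficient * prod(p ** c for p, c in zip(probs, counts))


# Hypothetical probabilities for black, blue and green (assumed, not from the post).
print(multinomial_pmf([2, 6, 2], [0.2, 0.5, 0.3]))
```

With only two categories the same function reduces to the binomial probability, which is the sense in which the multinomial extends the binomial.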

The Dirichlet Distribution:
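The probability density of this distribution has the following form and requires a set of parameters as input. These parameters are characterized by a vector \( \{\alpha_1,\alpha_2,\alpha_3,\ldots\}\) of positive numbers, which we shall represent as \(\boldsymbol\alpha\). So, if we have a set of probabilities \(\{p_{black},p_{blue},p_{green}\}\) that sum to 1, the Dirichlet density is
\[ f(p_{black},p_{blue},p_{green};\boldsymbol\alpha) = \frac{\Gamma(\alpha_1+\alpha_2+\alpha_3)}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)\,\Gamma(\alpha_3)}\; p_{black}^{\alpha_1-1}\, p_{blue}^{\alpha_2-1}\, p_{green}^{\alpha_3-1} \]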
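To get a feel for the Dirichlet density, here is a small Python sketch that evaluates it directly from the Gamma-function form. The name `dirichlet_pdf` and the test points are illustrative choices, not something taken from the post.

```python
from math import gamma, prod


def dirichlet_pdf(probs, alpha):
    """Dirichlet density at `probs` (assumed to sum to 1) with parameters `alpha`."""
    # Normalising constant: Gamma(sum of alphas) / product of Gamma(alpha_i)
    normaliser = gamma(sum(alpha)) / prod(gamma(a) for a in alpha)
    # Kernel: product of p_i^(alpha_i - 1)
    return normaliser * prod(p ** (a - 1) for p, a in zip(probs, alpha))


# With alpha = (1, 1, 1) every exponent is zero, so the density is flat.
print(dirichlet_pdf([0.2, 0.3, 0.5], [1, 1, 1]))
print(dirichlet_pdf([0.6, 0.3, 0.1], [1, 1, 1]))
```

Both calls give the same value because \(\boldsymbol\alpha = \{1,1,1\}\) treats every combination of probabilities as equally plausible. This direct form is only meant for small parameters; for large values of \(\boldsymbol\alpha\) a log-space version built on `math.lgamma` would be the safer choice.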

And now for the kicker... the Dirichlet distribution is the conjugate prior of the multinomial distribution. This can be proved algebraically, and Wikipedia describes the same result. Notice that the form and structure of the two equations are one and the same: each is a product of the category probabilities raised to powers. The parameters \(\boldsymbol \alpha\) can therefore be seen as "prior" counts of the categories we have seen. This is critical to understand. What we are concluding is not that we have figured out a good way to find priors. Instead, if we knew some counts from earlier on, the way to include them in the model is simply to treat them as prior counts!
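The algebra behind the conjugacy claim is short. As a sketch, write \(n_i\) for the observed count in category \(i\) and \(N = \sum_i n_i\) for the total (these symbols are introduced here for illustration). Multiplying the multinomial likelihood by the Dirichlet prior and dropping constants that do not depend on the probabilities gives

\[ P(\mathbf{p} \mid \mathbf{n}) \;\propto\; \prod_i p_i^{n_i} \times \prod_i p_i^{\alpha_i - 1} \;=\; \prod_i p_i^{(n_i + \alpha_i) - 1} \]

which is again a Dirichlet density, now with parameters \(\alpha_i + n_i\). Its mean gives the estimate

\[ E[p_i \mid \mathbf{n}] = \frac{n_i + \alpha_i}{N + \sum_j \alpha_j} \]

so each \(\alpha_i\) enters in exactly the same way as an observed count.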

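Now, coming back to the problem. We observe 5 Lions, 2 Tigers and 1 Bear. Note that this is our first observation and we have no prior counts whatsoever. So how do we cast this into a Bayesian framework? We exploit one little piece of information that could easily be overlooked. We implicitly know that there are Lions, Tigers & Bears in the forest, so at least one of each must have been observed by someone at some time. Can we leverage this information? Absolutely! We will simply set our \(\boldsymbol \alpha\) parameter to \(\{1,1,1\}\). A frequentist approach would estimate the distribution of lions, tigers and bears as
\[ \left\{\frac{5}{8}, \frac{2}{8}, \frac{1}{8}\right\} = \{0.625,\ 0.25,\ 0.125\} \]
The Bayesian approach adds the prior counts to the observed counts, which gives
\[ \left\{\frac{5+1}{8+3}, \frac{2+1}{8+3}, \frac{1+1}{8+3}\right\} = \left\{\frac{6}{11}, \frac{3}{11}, \frac{2}{11}\right\} \approx \{0.545,\ 0.273,\ 0.182\} \]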

Notice what happened? Adding the prior counts makes our estimates a bit smoother: if we were to do this experiment several times, the \(\boldsymbol \alpha\) would damp the swings in the estimates. How about our choice of values for \(\boldsymbol \alpha\)? We chose 1s here based on the wording of the problem, but if we want more dampening we could just as well choose bigger values. This is related to the amount of confidence we have in the priors. If we chose 100s instead of 1s, it would take a lot of observations to move the estimates.
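The two estimates, and the effect of a heavier prior, can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and exact fractions are used so the results can be compared by eye.

```python
from fractions import Fraction


def frequentist_estimate(counts):
    """Observed share of each category."""
    total = sum(counts)
    return [Fraction(c, total) for c in counts]


def bayes_estimate(counts, alpha):
    """Posterior mean under a Dirichlet prior: prior counts are added to the data."""
    total = sum(counts) + sum(alpha)
    return [Fraction(c + a, total) for c, a in zip(counts, alpha)]


observed = [5, 2, 1]  # lions, tigers, bears

print(frequentist_estimate(observed))
print(bayes_estimate(observed, [1, 1, 1]))
# A much stronger prior: the 8 observations barely move the estimate away from 1/3.
print([float(p) for p in bayes_estimate(observed, [100, 100, 100])])
```

With 1s the estimates come out as 6/11, 3/11 and 2/11. With 100s all three stay within about a percentage point of one third, which is the "lots of observations needed" effect described above.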

Finally, let us try out a simulation. The R code below attempts to simulate this very situation. The code does the following:

Pick the true number of Lions, Tigers and Bears. This number will change with every iteration.

Pick a random subset of Lions, Tigers and Bears as the observed. These numbers will always be less than the true numbers of Lions, Tigers and Bears.

Compute the distribution based on frequentist and Bayesian approaches described above keeping \(\boldsymbol \alpha = \{1,1,1\}\).

Compute an error statistic.

I've chosen the RMSE (root mean squared error), but there are better ways to compare distributions.
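The steps above can be sketched as follows. This is an illustration in Python, not the author's R code, and several details are assumptions made here: the true counts are drawn uniformly between 10 and 100, exactly 8 animals are observed per iteration, and the function names, defaults and seed are invented for the example.

```python
import random
from math import sqrt


def rmse(estimate, truth):
    """Root mean squared error between two probability vectors."""
    return sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))


def simulate(iterations=2000, n_observed=8, alpha=(1, 1, 1), seed=0):
    """Return the average RMSE of the frequentist and Bayesian estimates."""
    rng = random.Random(seed)
    freq_total = bayes_total = 0.0
    for _ in range(iterations):
        # Step 1: true numbers of lions, tigers and bears (assumed range 10..100).
        true_counts = [rng.randint(10, 100) for _ in range(3)]
        total = sum(true_counts)
        truth = [c / total for c in true_counts]
        # Step 2: observe a random subset of the animals, without replacement.
        population = [k for k, c in enumerate(true_counts) for _ in range(c)]
        sample = rng.sample(population, n_observed)
        observed = [sample.count(k) for k in range(3)]
        # Step 3: frequentist estimate and Bayesian estimate with prior counts alpha.
        freq = [x / n_observed for x in observed]
        bayes = [(x + a) / (n_observed + sum(alpha)) for x, a in zip(observed, alpha)]
        # Step 4: accumulate the error of each estimate against the truth.
        freq_total += rmse(freq, truth)
        bayes_total += rmse(bayes, truth)
    return freq_total / iterations, bayes_total / iterations


freq_error, bayes_error = simulate()
print("frequentist RMSE:", freq_error)
print("Bayes RMSE:      ", bayes_error)
```

Under these assumptions the true shares are rarely extreme, so pulling the estimates toward equal shares tends to reduce the error. With very lopsided true counts the comparison could come out differently.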

The RMSE is simple and quite well known, and you can look it up for more information. The lower it is, the better the estimate. The output of the R code is a graph that charts how close we got to the true distribution as we increased the number of iterations in the simulation. Initially the lines may be a little close to each other, but for a large number of iterations you can clearly see the Bayes estimate winning hands down. This is a powerful method with wide-ranging applications, and it provides a good degree of robustness.

Discovering Statistics Using R
This is a good book if you are new to statistics & probability while simultaneously getting started with a programming language. The book supports R and is written in a casual humorous way making it an easy read. Great for beginners. Some of the data on the companion website could be missing.

Linear Algebra (Dover Books on Mathematics)
An excellent book to own if you are looking to get into, or want to understand linear algebra. Please keep in mind that you need to have some basic mathematical background before you can use this book.

Linear Algebra Done Right (Undergraduate Texts in Mathematics)
A great book that exposes the method of proof as it is used in Linear Algebra. This book is not for the beginner, though. You do need at least some prior knowledge of the basics. It would be a good add-on to an existing course you are doing in Linear Algebra.

If you are looking to learn time series analysis, the following are some of the best books in time series analysis.

Introductory Time Series with R (Use R!)
This is a good book to get one started on time series. A nice aspect of this book is that it has examples in R, and some of the data is part of standard R packages, which makes it good introductory material for learning the R language too. That said, this is not exactly a graduate-level book, and some of the data links in the book may not be valid.

Econometrics
A great book if you are in an economics stream or want to get into it. The nice thing about the book is that it tries to bring out a oneness in all the methods used. Econ majors need to be up to speed on the grounding mathematics for time series analysis to use this book. Outside of those prerequisites, this is one of the best books on econometrics and time series analysis.