Just as we model binary data with the beta Bernoulli distribution, we
can model categorical data with the Dirichlet discrete distribution.

The beta Bernoulli distribution allows us to learn the underlying
probability, \(\theta\), of a binary random variable, \(x\):

\[P(x=1) =\theta\]

\[P(x=0) = 1-\theta\]
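
As a minimal sketch of what "learning \(\theta\)" can look like, the snippet below updates a beta prior with binary observations and reports the posterior mean. The prior parameters `alpha` and `beta`, the function names, and the sample data are illustrative assumptions, not part of the original text.

```python
def beta_bernoulli_update(data, alpha=1.0, beta=1.0):
    """Return Beta posterior parameters after observing binary data.

    alpha and beta are assumed prior pseudo-counts for x=1 and x=0
    (1.0, 1.0 corresponds to a uniform prior on theta).
    """
    ones = sum(data)            # number of observations with x = 1
    zeros = len(data) - ones    # number of observations with x = 0
    return alpha + ones, beta + zeros


def beta_mean(alpha, beta):
    """Posterior mean estimate of theta = P(x = 1)."""
    return alpha / (alpha + beta)


# Hypothetical binary observations: six ones, two zeros
observations = [1, 0, 1, 1, 0, 1, 1, 1]
a_post, b_post = beta_bernoulli_update(observations)
theta_hat = beta_mean(a_post, b_post)
```

With the uniform prior assumed here, the estimate is pulled slightly toward 0.5 relative to the raw frequency of ones, and the pull shrinks as more data arrive.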

The Dirichlet discrete distribution extends the beta Bernoulli
distribution to the case in which \(x\) can assume more than two
states:

\[\forall i \in \{0, 1, \ldots, n\} \hspace{2mm} P(x = i) = \theta_i\]

\[\sum_{i=0}^n \theta_i = 1\]

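Again, the Dirichlet discrete distribution takes advantage of the fact
that the Dirichlet distribution and the discrete distribution are
conjugate: a Dirichlet prior on \(\theta\) combined with discrete
observations yields a Dirichlet posterior. Note that the discrete
distribution is sometimes called the categorical distribution or the
multinomial distribution (strictly, it is a multinomial with a single
trial).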