This post describes two approaches for deriving the Expected Lower Bound (ELBO) used in variational inference. Let us begin with a little bit of motivation.
Consider a probabilistic model where we are interested in maximizing the marginal likelihood $p(X)$ for which direct optimization is difficult, but optimizing complete-data likelihood $p(X, Z)$ is significantly easier.
In a bayesian setting, we condition on the data $X$ and compute the posterior distribution $p(Z | X)$ over the latent variables given our observed data.