
Jensen’s Inequality states that, given a strictly convex function $g$ and a random variable $X$,

$$E[g(X)] \geq g(E[X]).$$

Here we consider a strictly convex function applied to a uniform random variable and visualize the inequality. Consider a parabolic function with an offset, which is strictly convex, as shown in the above diagram (cover figure), where f(x) is the pdf of the uniform random variable.

The same inequality underlies the EM Algorithm. Since the logarithm is a strictly concave function, Jensen's Inequality holds in the opposite direction: the expectation over the distribution of the latent variables of the complete-data log-likelihood is guaranteed to lie below the log-likelihood. Maximizing this lower bound at each iteration therefore never decreases the log-likelihood, and repeating the step over many iterations leads to convergence. The EM derivation also uses the fact that if g is strictly convex, then E[g(X)] = g(E[X]) holds if and only if X = E[X] with probability 1, which implies X is a constant. At refactored.ai, we constantly work on such problems that help illustrate concepts in Machine Learning through visual mathematical examples.
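As a sketch of that Jensen step (using standard EM notation for observed data $x$, latent variables $z$, parameters $\theta$, and an arbitrary distribution $q$ over $z$, none of which appear in the figure above), the concavity of the logarithm gives

$$\log p(x \mid \theta) \;=\; \log \sum_{z} q(z)\,\frac{p(x, z \mid \theta)}{q(z)} \;\geq\; \sum_{z} q(z)\,\log \frac{p(x, z \mid \theta)}{q(z)},$$

and the right-hand side is the lower bound that EM maximizes at every iteration.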

Let us look at a uniform random variable $X \sim U(a, b)$ and apply a convex function to it. The expected value of the function can be derived with elementary calculus, resulting in the equations sketched below.
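As a sketch of that derivation, assuming the parabolic function with an offset has the form $g(x) = x^2 + c$ (the exact function used in the original figure is an assumption here), the uniform pdf $f(x) = \frac{1}{b-a}$ on $[a, b]$ gives

$$E[X] = \int_a^b \frac{x}{b-a}\,dx = \frac{a+b}{2}, \qquad g(E[X]) = \left(\frac{a+b}{2}\right)^2 + c,$$

$$E[g(X)] = \int_a^b \frac{x^2 + c}{b-a}\,dx = \frac{a^2 + ab + b^2}{3} + c,$$

so that $E[g(X)] - g(E[X]) = \frac{(b-a)^2}{12} \geq 0$, in agreement with Jensen's Inequality.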

Uniform Random Variable

Let us create a set of uniform random variables from a list of $(a, b)$ pairs and plot them:
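Since the original code is not shown here, the following is a minimal sketch using NumPy and Matplotlib. It assumes the same illustrative $g(x) = x^2 + c$ as above; the particular $(a, b)$ pairs and the offset $c$ are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Parabolic function with an offset (strictly convex); c is an illustrative choice.
c = 2.0
g = lambda x: x**2 + c

# Illustrative (a, b) pairs for the uniform distributions.
ab_pairs = [(0, 1), (1, 3), (2, 5), (4, 6)]

rng = np.random.default_rng(0)
fig, ax = plt.subplots()

for a, b in ab_pairs:
    samples = rng.uniform(a, b, size=100_000)

    g_of_mean = g(samples.mean())    # g(E[X])
    mean_of_g = g(samples).mean()    # E[g(X)]

    # Green dot: g(E[X]); red dot: E[g(X)]. Both plotted at x = E[X],
    # so the gap between them appears along the y-axis.
    ax.scatter(samples.mean(), g_of_mean, color="green", zorder=3)
    ax.scatter(samples.mean(), mean_of_g, color="red", zorder=3)

    # Draw the parabola over [a, b] for context.
    xs = np.linspace(a, b, 200)
    ax.plot(xs, g(xs), color="gray", alpha=0.5)

ax.set_xlabel("x")
ax.set_ylabel("g(x)")
ax.set_title("E[g(X)] (red) vs g(E[X]) (green) for U(a, b)")
plt.show()
```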

You can see from the above that, for all the distributions, $E[g(X)] \geq g(E[X])$. The green dots show the value of the function at the expected value of X, $g(E[X])$, and the red dots show the corresponding expected values of the function, $E[g(X)]$, with the gap between them visible along the y-axis.