Since $\mathrm{LB}(q,\theta) \leq \mathrm{L}(\theta)$ (for all choices of $q(z)$), maximizing the lower-bound $\mathrm{LB}$ will also maximize the log-likelihood wrt $\theta$. The reason to maximize $\mathrm{LB}$ rather than log-likelihood $\mathrm{L}$ directly is that $\arg\max \mathrm{LB}$ often leads to easier expressions. E.g., see this illustrative figure (Bishop p.453):

It also follows from (the last line of) the multi-line derivation above that maximizing $\mathrm{LB}$ w.r.t. $\theta$ amounts to maximization of the expected complete-data log-likelihood (where the complete data set is defined as $\{(x_i,z_i)\}_{i=1}^N$). Hence, the EM algorithm comprises iterations of
$$
\boxed{\textbf{EM}:\, \theta^{(m+1)} := \underbrace{\arg\max_\theta}_{\text{M-step}} \underbrace{\sum_z \overbrace{p(z|x,\theta^{(m)})}^{=q^{(m+1)}(z)} \log p(x,z|\theta)}_{\text{E-step}} }
$$

The negative lower-bound $\mathrm{F}(q,\theta) \triangleq -\mathrm{LB}(q,\theta)$ also appears in various scientific disciplines. In statistical physics and variational calculus, $F$ is known as the free energy functional. Hence, the EM algorithm is a special case of free energy minimization.

The Free-Energy Principle (FEP) is an influential neuroscientific theory that claims that information processing in the brain is also an example of free-energy minimization, see Friston, 2009.

According to FEP, the brain contains a generative model $p(x,z,a,\theta)$ for its environment. Here, $x$ are the sensory signals (observations); $z$ corresponds to (hidden) environmental causes of the sensorium; $a$ represents our actions (control signals for movement) and $\theta$ describes the fixed 'physical rules' of the world.

Free-energy can be interpreted as (a generalized notion of sum of) prediction errors. Free-energy minimization aims to minimize prediction errors through perception, actions, and learning. (The next picture is from a 2012 tutorial presentation by Karl Friston)

$\Rightarrow$ The brain is "nothing but" an approximate Bayesian agent that tries to build a model for its world. Actions (behavior) are selected to fulfill prior expectations about the world.

In our BIASlab research team in the Signal Processing Systems group (FLUX floor 7), we work on developing (approximately Bayesian) artificial intelligent agents that learn purposeful behavior through interactions with their environments, inspired by the free-energy principle. Applications include

robotics

self-learning of new signal processing algorithms, e.g., for hearing aids

self-learning how to drive

Let me know (email [email protected]) if you're interested in developing machine learning applications that are directly inspired by how the brain computes. We often have open traineeships or MSc thesis projects available.