A probabilistic inference task gets its computational load mainly through the need for marginalization (i.e., computing integrals). E.g., for a generative model $p(x_1,x_2,x_3,x_4,x_5)$, the inference task $p(x_2|x_3)$ is given by

Factorizations provide opportunities to cut on the amount of needed computations when doing inference. In what follows, we will use FFGs to process these opportunities in an automatic way (i.e., by message passing).

Assume we wish to compute the marginal
$$
\bar{f}(x_3) = \sum_{x_1,x_2,x_4,x_5,x_6,x_7}f(x_1,x_2,\ldots,x_7)
$$
where $f$ is factorized as given by the following FFG

Due to the factorization, we can decompose this sum by the distributive law as
$$\begin{align*}
\bar{f}(x_3) = & \underbrace{ \left( \sum_{x_1,x_2} f_a(x_1)\,f_b(x_2)\,f_c(x_1,x_2,x_3)\right) }_{\overrightarrow{\mu}_{X_3}(x_3)} \\
& \underbrace{ \cdot\left( \sum_{x_4,x_5} f_d(x_4)\,f_e(x_3,x_4,x_5) \cdot \underbrace{ \left( \sum_{x_6,x_7} f_f(x_5,x_6,x_7)\,f_g(x_7)\right) }_{\overleftarrow{\mu}_{X_5}(x_5)} \right) }_{\overleftarrow{\mu}_{X_3}(x_3)}
\end{align*}$$
which is computationally (much) lighter than executing the full sum $\sum_{x_1,\ldots,x_7}f(x_1,x_2,\ldots,x_7)$

Messages may flow in both directions on any edge (here on $X_3$). We often draw directed edges in a FFG in order to distinguish forward messages $\overrightarrow{\mu}_\bullet(\cdot)$ (in the same direction as the arrow of the edge) from backward messages $\overleftarrow{\mu}_\bullet(\cdot)$ (in opposite direction). With directed edges, the FFG looks as follows:

Crucially, drawing arrows on edges is only meant as a notational convenience. Technically, an FFG is an undirected graph.

Note that $\overleftarrow{\mu}_{X_5}(x_5)$ is obtained by multiplying all enclosed factors ($f_f$, $f_g$) by the green dashed box, followed by marginalization over all enclosed variables ($x_6$, $x_7$).

This is the Closing the Box-rule, which is a general recipe for marginalization of hidden variables and leads to a new factor with outgoing (sum-product) message
$$ \mu_{\text{SP}} = \sum_{ \stackrel{ \textrm{enclosed} }{ \textrm{variables} } } \;\prod_{\stackrel{ \textrm{enclosed} }{ \textrm{factors} }}
$$

(Sum-Product update rule). This recursive pattern for computing messages applies generally and is called the Sum-Product update rule, which is really just a special case of the closing-the-box rule: For any node, the outgoing message is obtained by taking the product of all incoming messages and the node function, followed by summing out (marginalization) all incoming variables. What is left (the outgoing message) is a function of the outgoing variable only:

(Sum-Product Algorithm). If folows that the marginal $\bar{f}(x_3) = \sum_{x_1,x_2,x_4,x_5,x_6,x_7}f(x_1,x_2,\ldots,x_7)$ can be efficiently computed through sum-product messages. Executing inference through SP message passing is called the Sum-Product Algorithm.

(Exercise) Verfiy for yourself that all maginals in a cycle-free graph (a tree) can be computed exactly by starting with messages at the terminals and working towards the root of the tree.

This model expresses the assumption that $Y_1$ and $Y_2$ are independent measurements of $X$.

Assume that we are interested in the posterior for $X$ after observing $Y_1= \hat y_1$ and $Y_2= \hat y_2$. The posterior for $X$ can be inferred by applying the sum-product algorithm to the following graph:

(Note that) we usually draw terminal nodes for observed variables in the graph by smaller solid-black squares. This is just to help the visualization of the graph, since the computational rules are no different than for other nodes.

Let us now consider the case of Gaussian messages $\overrightarrow{\mu}_{X}(x) = \mathcal{N}(\overrightarrow{m}_X,\overrightarrow{V}_X)$, $\overrightarrow{\mu}_{Y}(y) = \mathcal{N}(\overrightarrow{m}_Y,\overrightarrow{V}_Y)$ and $\overrightarrow{\mu}_{Z}(z) = \mathcal{N}(\overrightarrow{m}_Z,\overrightarrow{V}_Z)$. Let´s also define the precision matrices $\overrightarrow{W}_X \triangleq \overrightarrow{V}_X^{-1}$ and similarly for $Y$ and $Z$. Then applying the SP update rule leads to multiplication of two Gaussian distributions, resulting in

Exercise: For the same summation node, work out the SP update rule for the *backward* message $\overleftarrow{\mu}_{X}(x)$ as a function of $\overrightarrow{\mu}_{Y}(y)$ and $\overleftarrow{\mu}_{Z}(z)$? And further refine the answer for Gaussian messages.

(Ex.1) Reflect on the fact that we now have methods for both marginalization and processing observations in FFGs. In principle, we are sufficiently equipped to do inference in probabilistic models through message passing. Draw the graph for $$p(x_1,x_2,x_3)=f_a(x_1)\cdot f_b(x_1,x_2)\cdot f_c(x_2,x_3)$$ and show which boxes need to be closed for computing $p(x_1|x_2)$.

The update rules for elementary and important node types can be put in a Table (see Tables 1 through 6 in Loeliger, 2007).

If the update rules for all node types in a graph have been tabulated, then inference by message passing comes down to a set of table-lookup operations. This also works for large graphs (where 'manual' inference becomes intractable).

If the graph contains no cycles, the Sum-Product Algorithm computes exact marginals for all hidden variables.

If the graph contains cycles, we have in principle an infinite tree without terminals. In this case, the SP Algorithm is not guaranteed to find exact marginals. In practice, if we apply the SP algorithm for just a few iterations we often find satisfying approximate marginals.