Month: May 2011

Here is an example of usage of the graphical models toolbox I’ve just released. I’ll use the terminology of “perturbation” to refer to computing loss gradients from the difference of two problems as in this paper, and “truncated fitting” to refer to backpropagating derivatives through the TRW inference process for a fixed number of iterations, as in this paper.

This is basically the simplest possible vision problem. We will train a conditional random field (CRF) to take some noisy image as input:

and produce marginals that well predict a binary image as output:

The noisy image is produced by setting where is random on and is a noise level as described in this paper. For now, we set which is a pretty challenging amount of noise, as you see above.

The main file, which does the learning described here can be found in the toolbox in examples/train_binary_denoisers.m.

First off, we will train a model using perturbation, with the univariate likelihood loss function, based on TRW inference, with a convergence threshold of 1e-5. We do this by typing:

The final model trained has an error rate of 0.096. We can visualize the marginals produced by making an image where each pixel has an intensity proportional to the predicted probability that that pixel takes label “1”.

On the other hand, we might train a model using truncated fitting, with the univariate likelihood, and 5 iterations of TRW.

>> train_binary_denoisers('trunc_ul_trw_5')

Sparing you the details of the optimization, this yields a total error rate of .0984 and the marginals:

Thus, restricting to only 5 iterations pays only a small accuracy penalty compared to a right convergence threshold.

Or, we could fit using the surrogate conditional likelihood. (Here called E.M., though we don’t happen to have any hidden variables.)

>> train_binary_denoisers('em_trw_1e-5')

This yields an error rate of .1016, and the marginals:

There are many permutations of loss functions, inference algorithms, etc. (Some of which have not yet been published.) Rather than exhaust all the possibilities, I’ll just list a bunch of examples:

I’m releasing code for a “toolbox” of code for learning and inference with graphical models. It is focused on parameter learning using marginalization in the high-treewidth setting. Though the code is, in principle, domain independent, I’ve developed it with vision problems in mind. This means that the code is A) efficient (all the inference algorithms are implemented in C++) and B) can handle arbitrary graph structures.

There are, at present, a bunch of limitations:

All the inference algorithms are for marginal inference. No MAP inference, at all.

The code handles pairwise graphs only

All variables must have the same number of possible values.

For tree-reweighted belief propagation, a single edge appearance probability must be used for all edges

For vision, these are usually no big deal. (Except if you are a MAP inference person. But that is not advisable.) In other domains, though, these might be showstoppers.

The code can be used in a bunch of different ways, depending on if you are looking for a specific tool to use, or a large framework.

Use the [Differentiation] methods (back-TRW or implicit differentiation) to calculate parameter gradients by providing your own loss functions. Do everything else on your own.

Use the [Loss] methods (em, implicit_loss) to calculate parameter gradients by providing a true vector x and a loss name (univariate likelihood, clique likelihood, etc.) Unlike the above usages, these methods explicitly consider the conditional learning setting where one has an input and an output.

Use the [CRF] methods to calculate calculate almost everything (deal with parameter ties for a specific type of model, etc.) These methods consider specific classes of CRFs and given and input, output, loss function, inference method, etc. give the parameter gradient. Employing this gradient in a learning framework is quite straightforward.