READING

Mairal, Bach and Ponce provide a thorough discussion of sparse coding for computer vision. Besides discussing data pre-processing and data analysis methods such as PCA and ICA, they also provide detailed descriptions of sparse coding and dictionary learning algorithms together with applications such as denoising, inpainting, super-resolution and recognition.

Personally, I was most interested in data pre-processing as well as gradient-descent-based sparse coding and dictionary learning approaches.

Centering: When centering is performed on overlapping patches, every pixel occurs in multiple patches. Therefore, the corresponding values are averaged. According to Mairal, Bach and Ponce, this has a similar effect to high-pass filtering the image.
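A minimal sketch of this patch extraction, centering, and per-pixel averaging (the function names are mine, not from the book):

```python
import numpy as np

def extract_patches(image, size):
    """Extract all overlapping size x size patches, one flattened patch per row."""
    H, W = image.shape
    return np.array([image[i:i + size, j:j + size].ravel()
                     for i in range(H - size + 1)
                     for j in range(W - size + 1)])

def center_patches(patches):
    """Subtract each patch's mean (its DC component)."""
    return patches - patches.mean(axis=1, keepdims=True)

def reconstruct(patches, image_shape, size):
    """Average overlapping patches back into an image.

    Every pixel receives a contribution from each patch containing it;
    dividing by the per-pixel count averages these values.
    """
    H, W = image_shape
    image = np.zeros((H, W))
    counts = np.zeros((H, W))
    idx = 0
    for i in range(H - size + 1):
        for j in range(W - size + 1):
            image[i:i + size, j:j + size] += patches[idx].reshape(size, size)
            counts[i:i + size, j:j + size] += 1
            idx += 1
    return image / counts
```

Reconstructing uncentered patches this way recovers the original image exactly; with centered patches, the averaged reconstruction retains mostly the high-frequency content, matching the high-pass interpretation above.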

Normalization: Normalizing each patch to unit $l_2$ norm may give poor results when applied to image patches. Especially after centering, some patches may contain little information, resulting in a low $l_2$ norm. Therefore, Mairal, Bach and Ponce use

$x_i := \frac{x_i}{\max(\eta, \|x_i\|_2)}$

with $\eta$ chosen as $0.2$ times the mean $l_2$ norm across all patches. On overlapping patches, the same approach as above is employed.
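This normalization can be sketched as follows (a minimal illustration; the function name and the default factor of $0.2$ follow the text above):

```python
import numpy as np

def normalize_patches(patches, factor=0.2):
    """Divide each patch x_i by max(eta, ||x_i||_2), where eta is
    `factor` times the mean l2 norm across all patches. Patches with
    little information are damped instead of being inflated to unit norm."""
    norms = np.linalg.norm(patches, axis=1)
    eta = factor * norms.mean()
    return patches / np.maximum(eta, norms)[:, np.newaxis]
```

Patches with norm above $\eta$ end up with unit norm; nearly empty patches keep a norm below one instead of having their noise amplified.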

Whitening: Given the covariance of the image patches, i.e.

$\Sigma = \frac{1}{N}\sum_{i = 1}^N x_i x_i^T$,

whitening intends to transform the image patches such that their covariance matrix is close to the identity matrix. As the covariance matrix is positive semi-definite, its eigenvalues are non-negative, and $S = \text{diag}(s_1,\ldots, s_n)$ in the eigenvalue decomposition

$\Sigma = U S^2 U^T$

holds the singular values (note that $U$ is orthogonal). Whitening is then performed as

$x_i := U S^{-1} U^T x_i$,

which maps the covariance of the patches to the identity matrix.
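A sketch of this whitening step, assuming the patches are already centered; the small `eps` guarding near-zero eigenvalues is my addition, not from the text:

```python
import numpy as np

def whiten(X, eps=1e-8):
    """Whiten patches so their covariance is (close to) the identity.

    X holds one patch per row. Sigma = U diag(s2) U^T is the eigenvalue
    decomposition of the covariance; each patch is mapped by U S^{-1} U^T.
    """
    Sigma = X.T @ X / X.shape[0]        # covariance of (centered) patches
    s2, U = np.linalg.eigh(Sigma)       # eigenvalues s2 >= 0, orthogonal U
    S_inv = 1.0 / np.sqrt(s2 + eps)     # S^{-1}, regularized against s2 ~ 0
    return X @ (U * S_inv) @ U.T        # applies U S^{-1} U^T to each row
```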

For applying the discussed pre-processing techniques to color patches, Mairal, Bach and Ponce make the following suggestions:

The chosen color space inherently influences pre-processing; one motivation for considering color spaces such as Lab, XYZ and YCrCb is the fact that the Euclidean distance is not a good estimate of perceived color differences when applied in RGB.

Different channels of color images should be centered separately.

Sparse coding. Mairal, Bach and Ponce discuss several sparse coding approaches with respect to both the $l_0$ norm and the $l_1$ norm, i.e. considering the problems

$\min_{\alpha} \|x - D\alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \leq k\qquad(1)$

and

$\min_{\alpha} \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda \|\alpha\|_1.\qquad(2)$

I was most interested in gradient-descent approaches to both problems (however, they also discuss approaches such as matching pursuit, orthogonal matching pursuit etc.). For Equation (1), this results in hard thresholding as illustrated in Algorithm 1; for Equation (2), soft thresholding is illustrated in Algorithm 2.
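The two thresholding operators, and the resulting iterative soft-thresholding scheme for Equation (2), can be sketched as follows (a simplified ISTA-style illustration under my own naming, not the book's exact algorithms):

```python
import numpy as np

def soft_threshold(a, t):
    """Shrink each entry toward zero by t (proximal operator of t * ||.||_1)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def hard_threshold(a, t):
    """Zero out entries with magnitude below t (used for the l0 problem)."""
    return np.where(np.abs(a) >= t, a, 0.0)

def ista(D, x, lam, n_iter=200):
    """Gradient descent on 0.5 * ||x - D a||^2, applying soft thresholding
    after each gradient step, for the l1-penalized problem (2)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1/L with L the Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - step * D.T @ (D @ a - x), step * lam)
    return a
```

Replacing `soft_threshold` by `hard_threshold` in the loop gives the corresponding iterative hard-thresholding scheme for the $l_0$ problem.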

Alternatively, the $l_1$ norm can be replaced by the $l_0$ norm. The resulting dictionary learning algorithm is shown in Algorithm 3, where the last step in each iteration describes a projection of $D$ onto the set of matrices whose columns have $l_2$ norm at most one. This prevents the dictionary columns from growing out of bounds.