READING

Mahendran and Vedaldi propose a technique for visualizing the higher-level features captured by deep representations. Essentially, the idea is to compute a reconstruction, regularized by an adequate image prior, whose representation most closely matches a given one. The approach is applied to AlexNet [] as well as to convolutional neural networks mimicking DSIFT [][] and HoG [].
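The reconstruction is obtained by solving a regularized optimization problem; a sketch in the paper's notation (symbols are defined below) is

$$x^\ast = \operatorname*{arg\,min}_{x} \|\Phi(x) - \Phi_0\|_2^2 + \lambda \mathcal{R}(x),$$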

where $\Phi(x)$ refers to the representation obtained on image $x$, and $\Phi(x_0) = \Phi_0$ is the representation to be visualized. The loss is the Euclidean distance; as regularization, Mahendran and Vedaldi use a combination of the $\alpha$-norm and a total variation term.
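As a sketch in the paper's notation, treating $x$ as a single-channel image with pixels $x_{i,j}$, the two regularizers can be written as

$$\mathcal{R}_\alpha(x) = \|x\|_\alpha^\alpha = \sum_{i,j} |x_{i,j}|^\alpha, \qquad \mathcal{R}_{V^\beta}(x) = \sum_{i,j} \left( (x_{i+1,j} - x_{i,j})^2 + (x_{i,j+1} - x_{i,j})^2 \right)^{\beta/2}.$$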

Regarding the balance of these three terms, some caveats need to be considered. First, the Euclidean distance is normalized by $\|\Phi_0\|_2^2$. Furthermore, $\Phi(x)$ is replaced by a scaled version $\Phi(\sigma x)$ to account for the first convolutional layers not being entirely insensitive to the scale of the input; $\sigma$ is set to the average Euclidean norm of the training images. The final objective takes the form
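of a sketch in the paper's notation, with $\mathcal{R}_\alpha$ denoting the $\alpha$-norm term and $\mathcal{R}_{V^\beta}$ a total variation term:

$$x^\ast = \operatorname*{arg\,min}_{x} \frac{\|\Phi(\sigma x) - \Phi_0\|_2^2}{\|\Phi_0\|_2^2} + \lambda_\alpha \mathcal{R}_\alpha(x) + \lambda_{V^\beta} \mathcal{R}_{V^\beta}(x),$$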

with appropriate weighting parameters as detailed in the paper; the objective is minimized using gradient descent with momentum.
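A minimal sketch of this minimization follows, using a fixed random linear map as a stand-in for $\Phi$ and omitting the total variation term for brevity; all weights and step sizes here are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the representation Phi: a fixed random linear map.
# (The paper inverts CNN layers; a linear map keeps the sketch self-contained.)
A = rng.normal(size=(32, 64))
x_target = rng.normal(size=64)   # "image" whose representation is inverted
phi0 = A @ x_target              # target representation Phi_0
phi0_norm = np.sum(phi0 ** 2)    # normalization ||Phi_0||_2^2

alpha = 6.0                      # alpha-norm exponent (illustrative)
lam_alpha = 1e-4                 # regularization weight (illustrative)

def objective(x):
    """Normalized Euclidean loss plus alpha-norm term (TV term omitted)."""
    loss = np.sum((A @ x - phi0) ** 2) / phi0_norm
    return loss + lam_alpha * np.sum(np.abs(x) ** alpha)

def gradient(x):
    grad_loss = 2.0 * A.T @ (A @ x - phi0) / phi0_norm
    grad_reg = lam_alpha * alpha * np.abs(x) ** (alpha - 1.0) * np.sign(x)
    return grad_loss + grad_reg

# Gradient descent with momentum.
x = 0.01 * rng.normal(size=64)   # random initialization
velocity = np.zeros_like(x)
lr, momentum = 0.05, 0.9
initial_obj = objective(x)
for _ in range(2000):
    velocity = momentum * velocity - lr * gradient(x)
    x = x + velocity
final_obj = objective(x)
```

Since the linear map is underdetermined (more pixels than representation dimensions), the regularizer determines which of the many consistent reconstructions is recovered, mirroring the role of the image prior in the paper.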

For their experiments, they consider AlexNet [] as well as convolutional neural networks mimicking DSIFT [17,20] and HoG [4]. In particular, they detail how DSIFT and HoG can be expressed as convolutional neural networks by converting the individual operations to commonly used layers; details can be found in the paper.

Qualitative results in Figure 1 show the reconstructions of an input image from representations obtained at the different layers of AlexNet. Note that the weighting parameters are chosen separately for each layer.