READING

Elsayed et al. use universal adversarial examples to reprogram neural networks to perform different tasks. In particular, on ImageNet, an adversarial example

$\delta = \tanh(W \cdot M)$

is computed where $M$ is a mask image (see Figure 1; in the paper, the mask essentially embeds a smaller image into an ImageNet-sized image) and $W$ is the adversarial perturbation itself (note that the notation was changed slightly for simplification). The hyperbolic tangent constrains the adversarial example, also called an adversarial program, to the valid range of $[-1,1]$ as used in most ImageNet networks. The adversarial program is computed by minimizing
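As a minimal sketch of this construction (all sizes and the random values are illustrative assumptions, not taken from the paper), the program $\delta = \tanh(W \cdot M)$ can be combined with a small image embedded into an ImageNet-sized frame like so:

```python
import numpy as np

# Hypothetical sizes: embed a 28x28 (MNIST-like) image into a 224x224 input.
big, small = 224, 28

# Mask M is 1 everywhere except the central region that will hold the image.
M = np.ones((big, big, 3))
lo = (big - small) // 2
M[lo:lo + small, lo:lo + small, :] = 0

# W is the adversarial program to be learned; random here for illustration.
W = np.random.randn(big, big, 3)

# delta = tanh(W * M): constrained to (-1, 1), exactly zero where the image sits.
delta = np.tanh(W * M)

# Embed the small image (assumed already scaled to [-1, 1]) into the frame.
x_small = np.random.uniform(-1, 1, (small, small, 3))
x_tilde = np.zeros((big, big, 3))
x_tilde[lo:lo + small, lo:lo + small, :] = x_small

# Network input: embedded image plus adversarial program, still in [-1, 1].
x_adv = x_tilde + delta
```

Because the mask zeroes out $W$ over the embedded image, the program only occupies the border region, while the tanh keeps every pixel in the valid input range.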

$\min_W \left(-\log P(h(\tilde{y}) | \tilde{x}) + \lambda \|W\|_2^2\right)$.

Here, $h$ is a function mapping the labels of the target task (e.g., MNIST classification) to the $1000$ labels of the ImageNet classification task (e.g., using the first ten labels, or assigning multiple ImageNet labels to one MNIST label). Essentially, this means minimizing the cross-entropy loss of the new task (with the new labels) to solve for the adversarial program. Examples of adversarial programs for different tasks and architectures are shown in Figure 1.
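A rough sketch of the objective, assuming the simplest label mapping $h$ (MNIST digit $d$ mapped to the $d$-th ImageNet label) and treating the network's logits as given; the function names and the regularization weight are assumptions for illustration:

```python
import numpy as np

def h(mnist_label):
    # Hypothetical hard-coded mapping: digit d -> d-th of the 1000 ImageNet labels.
    return mnist_label

def program_loss(logits, mnist_label, W, lam=1e-4):
    # Cross-entropy of the ImageNet softmax at the mapped label,
    # plus the L2 penalty on the adversarial program W.
    p = np.exp(logits - logits.max())  # numerically stable softmax
    p /= p.sum()
    return -np.log(p[h(mnist_label)]) + lam * np.sum(W ** 2)

# Stand-ins for the (fixed) network's logits and the program weights.
logits = np.random.randn(1000)
W = np.random.randn(224, 224, 3)
val = program_loss(logits, 3, W)
```

In the paper this loss is minimized over $W$ only; the network's weights stay frozen, which is what makes it "reprogramming" rather than fine-tuning.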

Interestingly, these adversarial programs are able to achieve quite high accuracy on tasks such as MNIST and CIFAR10. Additionally, the authors found that this only works for trained networks, not for networks with random weights (although this might also have other causes).