Deconvolution and Checkerboard Artifacts

When we look very closely at images generated by neural networks, we often see a strange checkerboard pattern of artifacts.

It’s more obvious in some cases than others, but a large fraction of recent models exhibit this behavior.

(For an excellent discussion of deconvolution, see [5, 6].) Unfortunately, deconvolution can easily have “uneven overlap,” contributing more to some output pixels than to others.

In particular, deconvolution has uneven overlap when the kernel size (the output window size) is not divisible by the stride (the spacing between points on the top).

For example, in one dimension, a stride-2, size-3 deconvolution has some outputs with twice the number of inputs as others. In two dimensions, the uneven overlaps on the two axes multiply together, creating the characteristic checkerboard pattern.
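The uneven-overlap arithmetic above can be checked directly. The following is a minimal sketch (not code from the article): it counts, for a 1-D deconvolution, how many kernel taps land on each output position. When the kernel size is not divisible by the stride, the counts alternate.

```python
# Sketch: count how many input contributions each output position of a 1-D
# deconvolution (transposed convolution) receives. Uneven counts are the
# "uneven overlap" that produces checkerboard artifacts.

def overlap_counts(n_inputs, kernel_size, stride):
    """Number of input contributions at each output position."""
    out_len = (n_inputs - 1) * stride + kernel_size
    counts = [0] * out_len
    for i in range(n_inputs):          # each input paints a kernel-sized window
        for k in range(kernel_size):
            counts[i * stride + k] += 1
    return counts

# stride 2, size 3: kernel size not divisible by stride -> uneven overlap
print(overlap_counts(5, kernel_size=3, stride=2))
# -> [1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1]  (alternating 1s and 2s)

# stride 2, size 4: divisible -> even overlap away from the borders
print(overlap_counts(5, kernel_size=4, stride=2))
# -> [1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1]
```

The alternating 1s and 2s in the first case are exactly the one-dimensional checkerboard; in two dimensions the two axes' patterns multiply.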

Since neural networks typically have a bias term (a learned value added to the output), it’s easy for a model to output the average color.

(Take a model that uses stride-2, size-4 deconvolutions as an example.) There are probably a lot of factors at play here.

At best, deconvolution is fragile because it so easily represents artifact-creating functions, even when the size is carefully chosen.

For example, you might resize the image (using nearest-neighbor interpolation or bilinear interpolation) and then do a convolutional layer.

This seems like a natural approach, and roughly similar methods have worked well in image super-resolution.

Where deconvolution has a unique entry for each output window, resize-convolution implicitly ties weights in a way that discourages high-frequency artifacts.
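As a concrete illustration, here is a hypothetical 1-D resize-convolution in NumPy (a sketch, not the article's implementation): nearest-neighbor upsampling followed by an ordinary convolution, so every output pixel is built from the same tied kernel.

```python
import numpy as np

def resize_conv_1d(x, kernel, factor=2):
    """Nearest-neighbor upsample by `factor`, then an ordinary convolution."""
    up = np.repeat(np.asarray(x, dtype=float), factor)  # nearest-neighbor resize
    pad = len(kernel) // 2
    up = np.pad(up, pad, mode="edge")      # same-length output for odd kernels
    return np.convolve(up, kernel, mode="valid")

# A simple averaging kernel: output interpolates smoothly between inputs,
# with no alternating high-frequency pattern.
out = resize_conv_1d([1.0, 2.0, 3.0], np.ones(3) / 3)
print(out)
```

Because the resize step gives every output position the same number of contributions before the convolution is applied, the uneven-overlap problem never arises by construction.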

It might also point at trickier issues with naively using bilinear interpolation, which resists high-frequency image features too strongly.

Simply switching out the standard deconvolutional layers for nearest-neighbor resize followed by convolution causes artifacts of different frequencies to disappear.

Even with randomly initialized weights, before any training, we can already see the artifacts. This suggests that the artifacts are due to this method of generating images, rather than to adversarial training.

(It also suggests that we might be able to learn a lot about good generator design without the slow feedback cycle of training models.) Another reason to believe these artifacts aren’t GAN specific is that we see them in other kinds of models, and have found that they also go away when we switch to resize-convolution upsampling.

We’ve found such deconvolution-based models to be vulnerable to checkerboard artifacts (especially when the cost function doesn’t explicitly resist them).

(We’ve chosen to present this technique separately because we felt it merited more detailed discussion, and because it cut across multiple papers.) Whenever we compute the gradients of a convolutional layer, we do a deconvolution (a transposed convolution) on the backward pass, which can introduce checkerboard patterns into the gradient.
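This duality can be made concrete with a small sketch (an illustration, not the article's code). With every kernel weight set to 1, the gradient of the sum of a strided convolution's outputs with respect to input position i is simply the number of output windows that touch i, and for a stride that doesn't divide the kernel size those counts alternate:

```python
# Sketch: gradient of sum(outputs) of a 1-D "valid" strided convolution
# w.r.t. each input position, assuming all kernel weights are 1. The
# backward pass of a convolution is a transposed convolution, so the
# gradient can itself be checkerboarded.

def input_grad_counts(n_inputs, kernel_size, stride):
    grad = [0] * n_inputs
    j = 0
    while j * stride + kernel_size <= n_inputs:  # each valid output window
        for k in range(kernel_size):
            grad[j * stride + k] += 1            # kernel weight assumed 1
        j += 1
    return grad

print(input_grad_counts(11, kernel_size=3, stride=2))
# -> [1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1]: a checkerboard in the gradient itself
```

Some input pixels receive twice the gradient of their neighbors, which means they influence the layer above disproportionately.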

(Max pooling was previously linked to high-frequency artifacts in [12].) More recent work in feature visualization has encountered these gradient artifacts as well.

If gradient artifacts can affect an image being optimized based on a neural network’s gradients, as in feature visualization, we should expect they can matter elsewhere too.

It seems possible that having some pixels affect the network output much more than others may exaggerate adversarial counter-examples.

The standard approach of producing images with deconvolution — despite its successes! — has some conceptually simple issues that lead to artifacts in produced images.

It suggests that there is low-hanging fruit to be found in carefully thinking through neural network architectures, even ones where we seem to have clean working solutions.

In the meantime, we’ve provided an easy-to-use solution that improves the quality of many approaches to generating images with neural networks.

We look forward to seeing what people do with it, and whether it helps in domains like audio, where high frequency artifacts would be particularly problematic.
