Why U-Net?

Mar 5, 2018

[Update 22.03.2018: fixed the link so that it points to the correct YouTube video.]

U-Net was proposed in 2015 for medical image segmentation.
You can find the original paper, along with a short video introduction, on the project homepage.
Its structure is relatively simple and shallow, so it seems well suited to a learning project.

U-Net advantages

Let’s have a look at its main characteristics!

Suited to segmentation: it computes a pixel-wise output (minus the border pixels lost to the unpadded convolutions).
Since we want to tackle segmentation tasks here, it should work without modifications.
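To get a feel for how large these validity margins are, here is a minimal sketch that tracks the spatial size through a U-shaped network. The defaults (4 levels, two unpadded 3x3 convolutions per level, 2x2 pooling and 2x upsampling) are my assumptions, taken from the configuration described in the original paper; the function name is purely illustrative.

```python
def unet_output_size(input_size, depth=4, convs_per_level=2, kernel=3):
    """Spatial size of the output of a U-Net built from unpadded convolutions.

    Assumed layout: `depth` encoder levels, one bottom level and `depth`
    decoder levels, each applying `convs_per_level` unpadded convolutions.
    """
    margin = convs_per_level * (kernel - 1)  # pixels lost at each level
    size = input_size

    # Encoding path: convolutions, then 2x2 max pooling.
    for _ in range(depth):
        size -= margin
        if size <= 0 or size % 2:
            raise ValueError("size before pooling must be positive and even")
        size //= 2

    # Bottom of the U: convolutions only.
    size -= margin

    # Decoding path: 2x upsampling, then convolutions.
    for _ in range(depth):
        size = size * 2 - margin

    if size <= 0:
        raise ValueError("input is too small for this depth")
    return size


print(unet_output_size(572))  # 388
```

With these assumptions, a 572-pixel-wide input tile yields a 388-pixel-wide segmentation map, so roughly 92 pixels are lost on each side.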

It has a simple structure.
It’s a repetition of basic building blocks: convolutions, ReLU and max pooling for the downsampling/encoding path; upsampling, convolutions and ReLU for the upsampling/decoding path.
Hence, it should not be too complicated to implement.
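To illustrate how few ingredients are involved, here is a toy single-channel, single-level sketch in NumPy. It is not the original architecture: the random kernels, the nearest-neighbour upsampling and the merge by addition are simplifying assumptions of mine (the original network uses learned filters and concatenates feature maps), and all function names are hypothetical.

```python
import numpy as np


def conv3x3_valid(x, kernel):
    """Unpadded 3x3 convolution (cross-correlation) of a 2-D array."""
    h, w = x.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * x[i:i + h - 2, j:j + w - 2]
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool2(x):
    """2x2 max pooling; height and width must be even."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def upsample2(x):
    """Nearest-neighbour 2x upsampling (assumption, see above)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def center_crop(x, shape):
    """Crop the centre of x so that it matches the given shape."""
    dh = (x.shape[0] - shape[0]) // 2
    dw = (x.shape[1] - shape[1]) // 2
    return x[dh:dh + shape[0], dw:dw + shape[1]]


def tiny_unet(x, rng):
    """One level down, one level up, with random (untrained) kernels."""
    skip = relu(conv3x3_valid(x, rng.standard_normal((3, 3))))  # encoding
    bottom = relu(conv3x3_valid(max_pool2(skip), rng.standard_normal((3, 3))))
    up = upsample2(bottom)                                       # decoding
    merged = up + center_crop(skip, up.shape)                    # horizontal path
    return conv3x3_valid(merged, rng.standard_normal((3, 3)))


out = tiny_unet(np.ones((20, 20)), np.random.default_rng(0))
print(out.shape)  # (12, 12)
```

A real implementation would stack several such levels, use many channels per layer and learn the kernels, but the building blocks stay the same.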

It exhibits good performance.
It won various benchmarks when introduced, and it still achieves decent rankings in [Kaggle](https://www.kaggle.com) segmentation challenges (here and here for example).

It works with a small amount of training data.
The original authors achieved these good results with only 30 training images, relying on heavy data augmentation to make the most of them.

Filter-bank-like structure

Most importantly, when analyzing the structure of the U-Net, I was struck by its similarity to well-known signal processing tools: scale-spaces and multi-resolution analysis.
While it’s still hard to explain why the network works so well, it’s tempting to read its structure as follows:

the downsampling path creates something similar to a scale-space and gradually loses some locality in exchange for higher-level features or a broader horizon;

the upsampling path propagates these high-level, coarsely localized features into each original pixel;

the horizontal path (from one downward level to the corresponding upward level) re-injects the details lost in the downsampling (max pool) step.
This is strangely similar to what wavelet-based algorithms do.
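To make the analogy concrete, here is a small NumPy sketch of a Laplacian-pyramid-style decomposition, which is a signal processing construction and not the U-Net itself. I use 2x2 average pooling instead of max pooling and nearest-neighbour upsampling, both assumptions chosen to keep the arithmetic linear; the function names are hypothetical. The coarse image plays the role of the bottom of the U, and the stored details play the role of the horizontal paths.

```python
import numpy as np


def downsample(x):
    """2x2 average pooling: a crude low-pass filter followed by decimation."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def analyze(image, levels):
    """Downward path: keep, at each level, the details that pooling discards."""
    details = []
    coarse = image
    for _ in range(levels):
        smaller = downsample(coarse)
        details.append(coarse - upsample(smaller))  # what the coarse level misses
        coarse = smaller
    return coarse, details


def synthesize(coarse, details):
    """Upward path: upsample and re-inject the stored details."""
    for detail in reversed(details):
        coarse = upsample(coarse) + detail
    return coarse


rng = np.random.default_rng(0)
image = rng.random((16, 16))
coarse, details = analyze(image, levels=3)

print(coarse.shape)                                     # (2, 2)
print(np.allclose(synthesize(coarse, details), image))  # True

# Without the re-injected details, only a blocky approximation comes back.
no_details = [np.zeros_like(d) for d in details]
print(np.allclose(synthesize(coarse, no_details), image))  # False
```

In the U-Net the filters are learned and the merge is followed by further convolutions, so the network is free to do something quite different from exact reconstruction; the sketch only shows why the horizontal paths are needed to recover fine localization.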

Enough for this post!
As a bonus, here is a presentation by Stéphane Mallat in which he tries to bridge the gap between Convolutional Neural Networks and filter banks.