Here, we train and test on the same speaker, this time with 8x upsampling.

It is linked to the row over proposed changes at Scottish Ballet. (Speaker p225, Utterance 366)


High Resolution


Low Resolution


Cubic Baseline


Super Resolution

The model sometimes hallucinates sounds, making interesting mistakes.

In short, the national team without Frank is like football without feet.


High Resolution


Low Resolution


Super Resolution

We also ran our model on a dataset of piano sonatas. Here is an example (4x upsampling).

Piano Example


High Resolution


Low Resolution


Super Resolution

We have more samples on GitHub.

Method

Our model consists of a series of downsampling blocks, followed by upsampling blocks.

Each block performs a convolution, applies dropout, and then a non-linearity. The two types of blocks are connected by stacking (skip) connections; this allows us to reuse low-resolution features during upsampling.

Upscaling is done using dimension (subpixel) shuffling. We also start with an initial cubic upsampling layer and connect it to the output with an additive residual connection.
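To make the subpixel-shuffling step concrete, here is a minimal numpy sketch of a 1D dimension shuffle. It is illustrative only (not the authors' code): it rearranges a tensor of shape (batch, channels*r, length) into (batch, channels, length*r), trading channel depth for temporal resolution, which is how an upsampling block can raise the sampling rate without transposed convolutions.

```python
import numpy as np

def subpixel_shuffle_1d(x, r):
    """Rearrange (batch, channels*r, length) -> (batch, channels, length*r).

    Illustrative sketch of 1D subpixel (dimension) shuffling: groups of r
    channels are interleaved along the time axis.
    """
    b, cr, length = x.shape
    assert cr % r == 0, "channel count must be divisible by r"
    c = cr // r
    x = x.reshape(b, c, r, length)   # split channels into (c, r)
    x = x.transpose(0, 1, 3, 2)      # (b, c, length, r)
    return x.reshape(b, c, length * r)

x = np.arange(8, dtype=float).reshape(1, 2, 4)  # 1 batch, 2 channels, length 4
y = subpixel_shuffle_1d(x, r=2)
# y has shape (1, 1, 8): samples from the two input channels are interleaved
```

A convolution preceding this shuffle produces the extra channels, so together they implement a learned upsampling step.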

It follows from basic signal processing theory that our method effectively predicts the high frequencies of a signal from the low frequencies.
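This claim is easy to verify numerically. The sketch below (an illustrative check, not taken from the paper) builds a signal with a 50 Hz and a 400 Hz component, downsamples it 4x so the 400 Hz component falls above the new Nyquist rate, and shows that a cubic baseline cannot restore the lost band; only a predictive model can.

```python
import numpy as np
from scipy.interpolate import interp1d

fs, r = 2000, 4
t = np.arange(0, 1, 1 / fs)
hi = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 400 * t)

# Low-resolution version: keep every r-th sample. 400 Hz lies above the
# new Nyquist rate (250 Hz), so that component cannot survive.
lo_t, lo = t[::r], hi[::r]

# Cubic baseline: interpolate back up to the original sampling rate.
cubic = interp1d(lo_t, lo, kind="cubic", fill_value="extrapolate")(t)

def band_energy(x, f_lo, f_hi):
    """Spectral energy of x between f_lo and f_hi (in Hz)."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return spec[(freqs >= f_lo) & (freqs <= f_hi)].sum()

# The 350-450 Hz band is strong in the original but essentially absent
# from the cubic reconstruction; it must be predicted, not interpolated.
ratio = band_energy(hi, 350, 450) / band_energy(cubic, 350, 450)
```

The same reasoning holds for speech: consonants and fricatives live largely in the missing high band, which is why cubic upsampling sounds muffled.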

Spectrograms showing (from left to right) a high-resolution signal, its low-resolution version, a reconstruction using cubic interpolation, and the output of our model.

Remarks

Machine learning algorithms are only as good as their training data. If you want to apply our method to your personal recordings, you will most likely need to collect additional labeled examples.

Interestingly, super-resolution works better on aliased input (no low-pass filter). This is not reflected well in objective benchmarks, but is noticeable when listening to the samples. For applications like compression (where you control the low-res signal), this may be important.

More generally, the model is highly sensitive to how the low-resolution samples are generated. Even using a different low-pass filter (e.g., Butterworth vs. Chebyshev) at test time reduces performance.