Frame-Recurrent Video Super-Resolution

Abstract: Recent advances in video super-resolution have shown that convolutional neural networks combined with motion compensation are able to merge information from multiple low-resolution (LR) frames to generate high-quality images. Current state-of-the-art methods process a batch of LR frames to generate a single high-resolution (HR) frame and run this scheme in a sliding window fashion over the entire video, effectively treating the problem as a large number of separate multi-frame super-resolution tasks. This approach has two main weaknesses: 1) Each input frame is processed and warped multiple times, increasing the computational cost, and 2) each output frame is estimated independently conditioned on the input frames, limiting the system's ability to produce temporally consistent results. In this work, we propose an end-to-end trainable frame-recurrent video super-resolution framework that uses the previously inferred HR estimate to super-resolve the subsequent frame. This naturally encourages temporally consistent results and reduces the computational cost by warping only one image in each step. Furthermore, due to its recurrent nature, the proposed method has the ability to assimilate a large number of previous frames without increased computational demands. Extensive evaluations and comparisons with previous methods validate the strengths of our approach and demonstrate that the proposed framework is able to significantly outperform the current state of the art.

​Contributions:

We propose a recurrent framework that uses the HR estimate of the previous frame for generating the subsequent frame, leading to an efficient model that produces temporally consistent results.

Unlike existing approaches, the proposed framework can propagate information over a large temporal range without increasing computations.

Our system is end-to-end trainable and does not require any pre-training stages.

We perform an extensive set of experiments to analyze the proposed framework and relevant baselines under various different settings.

We show that the proposed framework significantly outperforms the current state of the art in video super-resolution both qualitatively and quantitatively.​