Tuesday February 07, 2017

Google has published a paper titled "Pixel Recursive Super Resolution" that demonstrates how it is now possible to turn a tiny, pixelated mess into a more detailed, usable image. Wow, now you can watch your favorite make-believe investigator hitting an "enhance" button to get the face of a suspect without laughing. But in light of all of these fancy advancements in neural network technology, how is it that we are still stuck with bicubic resizing for enlarging photos? I feel like Adobe would have introduced something superior by now; I mean, they did manage to create pure magic like content-aware fill. There is waifu2x, I suppose...

...it's impossible to create more detail than there is in the source image, so how does Google Brain do it? With a clever combination of two neural networks. The first part, the conditioning network, tries to map the 8×8 source image against other high resolution images. It downsizes other high-res images to 8×8 and tries to make a match. The second part, the prior network, uses an implementation of PixelCNN to try and add realistic high-resolution details to the 8×8 source image. Basically, the prior network ingests a large number of high-res real images (of celebrities and bedrooms in this case). Then, when the source image is upscaled, it tries to add new pixels that match what it "knows" about that class of image. For example, if there's a brown pixel towards the top of the image, the prior network might identify that as an eyebrow: so, when the image is scaled up, it might fill in the gaps with an eyebrow-shaped collection of brown pixels.
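To make the "downsize and match" idea a bit more concrete, here is a toy sketch in plain Python. It is not Google's conditioning network, which is a trained neural network; it only illustrates the step as the quote describes it: shrink each high-res image to 8×8 and see which one lands closest to the source. The function names (`downscale`, `mse`, `best_match`), the grayscale list-of-rows image format, and the random test gallery are all assumptions made up for this example.

```python
import random


def downscale(image, factor):
    """Average-pool a square grayscale image (list of rows) by `factor`."""
    size = len(image) // factor
    out = []
    for r in range(size):
        row = []
        for c in range(size):
            # Collect the factor x factor block that maps to output pixel (r, c).
            block = [
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def mse(a, b):
    """Mean squared error between two equally sized images."""
    n = len(a) * len(a[0])
    total = sum(
        (pa - pb) ** 2
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )
    return total / n


def best_match(source_lowres, highres_images, factor):
    """Index of the high-res image whose downscaled copy is closest to the source."""
    errors = [mse(source_lowres, downscale(img, factor)) for img in highres_images]
    return min(range(len(errors)), key=errors.__getitem__)


# Hypothetical data: four random 32x32 "high-res" images.
rng = random.Random(0)
gallery = [[[rng.random() for _ in range(32)] for _ in range(32)] for _ in range(4)]

# Build the 8x8 source from gallery image 2, so image 2 should win the match.
source = downscale(gallery[2], 4)
match_index = best_match(source, gallery, 4)
```

A real system would not look up a single nearest image like this; the sketch only shows why comparing at 8×8 makes a match possible at all, since that is the only resolution at which the source and the high-res images can be compared pixel for pixel.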