Google AI adds detail to low-resolution images

It seems intelligent enhancement of image detail is currently high on the agenda at Google. The company recently brought its RAISR smart image upsampling to Android devices. Now, the Google Brain team has developed a system that uses neural networks to enhance detail in low-resolution images.

The system uses a two-step approach: a conditioning network first maps 8×8 source images against similar, higher-resolution images to create an approximation of what the enhanced image might look like. In a second step, the prior network adds realistic detail to the final output image, having learned what each pixel in a low-resolution image typically corresponds to in higher-resolution files.
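To make the two-step idea concrete, here is a minimal sketch in PyTorch. This is not Google's actual code (the paper's prior network is a PixelCNN, and the layer sizes here are purely illustrative assumptions); it only shows the shape of the pipeline: a conditioning network producing a coarse 32×32 approximation from an 8×8 input, and a prior network adding residual detail on top.

```python
# Minimal sketch of the two-network pipeline; layer sizes are assumptions.
import torch
import torch.nn as nn

class ConditioningNetwork(nn.Module):
    """Maps an 8x8 RGB input to a coarse 32x32 approximation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.ReLU(),
            nn.Upsample(scale_factor=4, mode='nearest'),  # 8x8 -> 32x32
            nn.Conv2d(32, 3, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

class PriorNetwork(nn.Module):
    """Adds plausible high-frequency detail to the coarse prediction."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )
    def forward(self, coarse):
        return coarse + self.net(coarse)  # residual detail

lowres = torch.rand(1, 3, 8, 8)        # a 64-pixel source image
coarse = ConditioningNetwork()(lowres)  # step 1: global approximation
output = PriorNetwork()(coarse)         # step 2: synthesized detail
print(output.shape)                     # torch.Size([1, 3, 32, 32])
```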

As you can see, the system already works pretty well. In the series of samples above, the images on the left show the 64-pixel (8×8) source images, the ones in the middle show the output images that the Google Brain algorithm produced from them, and the ones on the right show higher-resolution versions of the low-res sources for comparison. While the results are not perfect yet, they are certainly close enough to provide value in a variety of scenarios. Eventually we might even be able to extract high-resolution images from low-quality security-cam footage a la CSI.

Comments

I don't think people get the potential here. If it can upsample SO WELL from such a LOW resolution file, imagine what it would do to already high-resolution files...

It's not for generating a NEW FACE, rather generating the finer detail of the eyelashes on that face, the skin, the hair texture. Not to mention landscape detail like enhancing foliage and trees and sky gradients, creating images that look much higher resolution (say, for printing) than the original ones.

I propose an experiment. Start a new test, and train it on nothing but images of known terrorists. Then feed it an 8*8 image of Donald Trump. Would a judge accept the resulting "approximation" as sufficient evidence to arrest him?

It gets it wrong 100% of the time. In each case the image produced is clearly NOT the same person as in the ground truth image. So what's the point? There is no point. You might as well just pick random high-quality images of people and use those instead, since they aren't the right person either, but at least they will be detailed images.

Point is, even sketch artists get it wrong, but well enough to jog witnesses' memories. Not all images are 8x8; larger images benefit from this with better accuracy. Also, a company in the US has developed face recognition software that works with body-worn cameras and can identify 250 million US citizens (so far). A search takes 9 hours, which will be cut to shorter times in the future. We are pretty close to mapping and recognizing everyone with AI technology in North America... India is mapping citizens with retina (eye) recognition software.

The problem is that people fail to see the possibilities that something like this can bring. Sometimes new tech might not be useful right off the bat, but it is still a VERY remarkable feat to turn those 8x8 images into those in the middle column. Think about the effort that went into making the AI process 64 pixels and turn them into 1024, and what could be achieved if you fed the same AI, for instance, a 100MP Phase One RAW file?
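For what it's worth, the pixel arithmetic in this comment checks out; a quick sanity check in Python (the 100MP input is the commenter's hypothetical, not something Google has demonstrated):

```python
src = 8 * 8        # 64 observed source pixels
out = 32 * 32      # 1024 output pixels
print(out // src)  # 16: the output has 16x as many pixels as the input
```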

I don't think you understand how this works. It's basically taking the small image and matching it against Google's database of images. So any "improvement" depends on a similar "original" already existing. It would be no use whatsoever in trying to increase the resolution of an already high-res image.

Given there are 7 billion people on the planet, it's already down to 1 in 7 accuracy at best.

And that's assuming the source is a simple face portrait. It could be a full body shot, in an infinite number of poses. Or a cat. Or a bowl of fruit, or a fish bowl, with one fish, or two, or seven, or a mountain, or a field, or a flower, or a mountain surrounded by fields of flowers, or a picture of Mars, or the Milky Way, or or or...

The kind of accuracy that gets plumbers shot on subways by anti-terrorist squads...

I'll believe this when Google provides a web page where I can upload an arbitrary 8*8 pixel photo reduction, and get something back that vaguely resembles my original photo. Until then, it's unproven snake oil.

You guys are all confused about what's happening here. I can tell from your questions.

You can't get the year issued, because this is not super-resolution; it is actually synthesizing an image. It's kind of like what your brain does when you see a dog really far away: you can't see the individual fur strands, but you can imagine what they look like.

This system was given many, many priors, and it figured out what 'faces' or 'rooms' look like. It then synthesized images of other, unseen rooms or faces, guessing what they would look like at higher resolution. The images it is creating do not exist. If you gave it a bunch of prior images of coins, it could probably draw you a new image of a coin, but AFAIK this system was not trained on coins, so it doesn't know what they look like at 32x32. It's not that it needs your original coin picture; it just needs to know what coins look like in general.
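A toy illustration of that 'priors' idea (pure assumption: random numpy arrays stand in for a real training set, and a per-pixel Gaussian stands in for the paper's far more sophisticated PixelCNN prior): the model learns statistics from many examples, then draws new images that exist nowhere in the training data.

```python
# Toy "synthesis from priors": learn per-pixel statistics, sample new images.
import numpy as np

rng = np.random.default_rng(0)
train = rng.random((1000, 32, 32))   # stand-in for e.g. 1000 face crops

mu = train.mean(axis=0)              # learned per-pixel mean (the "prior")
sigma = train.std(axis=0)            # learned per-pixel variation

sample = rng.normal(mu, sigma)       # a brand-new image drawn from the prior
# The sample matches the statistics of the training set but is not in it:
assert not any(np.array_equal(sample, t) for t in train)
```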

Dear s1oth1ovechunk, you really do underestimate my knowledge in this area, as someone who has been working in AI for about 35 years. And yes, I fully understand the sensation of seeing hairs on a dog that is 100 yards away, because you already know what the dog looks like. The question isn't whether you can "convincingly" fill in missing detail (obviously you can, even with a pencil on a printout), but whether the "original" can somehow be recreated. And it is HERE that Google is using its vast library of indexed images.

When I "pixelate" something in an image I am uploading, I generally convince myself I can still recognise the faces I've blocked, or read the car registration numbers. Because I know what they are. But no-one else can recognise or read them - unless they've previously seen similar images. Just like your dog analogy.

Thinking about information, if the software:
* knows it is one of your friends,
* has access to a lot of already-uploaded pictures of your friends,
* gets your camera's location when you took the picture,
* gets the location of your friends' phones at the very same time,

then I can believe it is able to work out from those sources who is there, and to estimate the likely position of the face from the blobs to produce a plausible picture.

Well, the technology is certainly impressive, but these results are fairly useless. Basically the system thinks "well, this looks like a face" and more or less randomly generates a face in that spot.

I suppose it's as good as it can get, since you can't generate details out of nothing, certainly not an entire face from an 8x8 px blotch. But looking at the difference between the last two columns, it's obvious it would not be helpful to, say, recognize people in a photo or add more detail to low-res photos, unless we're fine with the faces changing completely.

I suppose this could still have some uses, such as restoration of old photos where it's more about the overall mood than the exact details. But then, photos like that tend to be unique enough that the AI may not recognize what it's looking at.

You can't compare this with CSI. To get a license plate from video, you can combine several pictures of the same thing; the shapes of license plate characters are known, and the number of possible combinations is huge but limited. The details in the fiction are usually not very realistic, but not completely impossible. Here they start from 64 dots, with no other information, and guess 15 times as many pixels.
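The multi-frame point is easy to demonstrate. Below is a minimal numpy sketch (synthetic arrays, not real footage) of why combining several pictures of the same plate recovers real detail - the noise averages out - whereas a single 8x8 frame leaves nothing to average:

```python
# Multi-frame recovery: averaging aligned noisy frames cancels the noise.
import numpy as np

rng = np.random.default_rng(1)
plate = rng.random((16, 48))                       # the true (unknown) signal
frames = [plate + rng.normal(0, 0.5, plate.shape)  # 30 aligned noisy frames
          for _ in range(30)]

recovered = np.mean(frames, axis=0)                # noise averages out
print(np.abs(recovered - plate).mean())            # far below the 0.5 noise level
```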

At first I thought "fake", but if you look at the images on the left with your eyes half-closed, you can see that these 8x8 pixel images show more than just random pixels. So it might be true, perhaps with restrictions like "this is guaranteed to be a human face", etc.

OK, if I get it right, they feed the machine the 8x8 thing in the left column and get the 32x32 in the middle column out, while a higher-resolution version of the same picture is in the right column... How can they tell from the 8x8 blob they start from whether it is male or female? How can they even guess it is a face? Is there anyone at Google who believes this is more than just fake promotional sci-fi BS?

They didn't insert another face. They synthesized it. If you look at the paper, there is a 'nearest neighbor' column, which is what you are basically suggesting, and it is often nothing like the ground truth.

Not true, s1oth1ovechunk - didn't you read the text? The first step involves "a conditioning network first attempting to map 8×8 source images against similar images with a higher resolution". In other words, it plays the image against the entire library of web images that Google has, to see which reduces to the same pattern. So if I sent an 8*8 pixel reduction of an otherwise never-published photo, it would not even begin to know where to start.

Read it again. Look at some of the examples where the synthesized image has nonsense in it. Look at the nearest neighbor. These things are not consistent with your understanding. Also, your understanding would describe a pretty lame system.

Training neural networks is about generalizing to unseen data. This is not a search system. The faces you see being created do not exist anywhere else.
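For contrast, here is roughly what the 'search system' some commenters describe would look like - a nearest-neighbor lookup over a stored library (toy random data, not the paper's dataset). Retrieval can only ever return an image that already exists, which is what the paper's nearest-neighbor baseline column shows, and what the trained network does not do:

```python
# Nearest-neighbor "enhancement": retrieval, not synthesis.
import numpy as np

rng = np.random.default_rng(2)
library_hi = rng.random((500, 32, 32))     # stored hi-res images
library_lo = library_hi[:, ::4, ::4]       # their 8x8 reductions

query_lo = rng.random((8, 8))              # an unseen 8x8 input
dists = ((library_lo - query_lo) ** 2).sum(axis=(1, 2))
best = int(np.argmin(dists))

result = library_hi[best]  # always someone else's pre-existing image
```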

Perhaps I should mention I have been working in AI since the 1980s (Imperial College, London). Of course I was over-simplifying in my previous comment, because I didn't want to "lose" the general audience. And yes, we do a lot of work with neural nets in Prolog - my personal discipline being logic programming, and my company being named after it ... (WIN-PROLOG, LPA, ...).

We understand full well that the neural net is generalising the database and subsequently synthesising faces that match the 8x8 key.

The point is that it produces a DIFFERENT FACE to the original (as it must, given the limited input data). So it's just plain misleading, if not outright dangerous, to say this technology will lead to "extracting high-resolution images from low-quality security-cam footage a la CSI".

Funny that they "recovered" 32x32 images and stopped there. In fact, it is exactly the step from 32x32 to a much higher resolution that can be done more realistically with machine learning.

I would like to see the "similar images with a higher resolution" used to create the extrapolated image. How similar must the reference image be to produce usable results? And most interestingly, from a creative point of view, what happens when you purposely feed it 'false' data?

hehe... you don't know how neural networks work. It is basically guesswork to produce a best result (maximizing some score on some imposed criteria) according to a learning sample set (the larger the better).

Unless it's some brilliant new stuff, it will most likely give you some sort of face from completely random pixels (especially if you tell it there is a face there). Also, if you train it on characters, for example, instead of faces, it will probably say that image is a reversed 5 or something :). It's not recognizing what is in the picture; it just produces a best result according to one or more learning sets and the assumption that the image is of the same type.
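That description of training - maximizing a score against a learning sample set - can be seen in a bare-bones PyTorch loop (toy random data, illustrative sizes): the network is only ever pushed to reduce a loss over its samples, with no notion of what is 'really' in the picture.

```python
# Training as score maximization (here: loss minimization) over a sample set.
import torch
import torch.nn as nn

net = nn.Linear(64, 1024)                # 8x8 in, 32x32 out, flattened
opt = torch.optim.SGD(net.parameters(), lr=0.1)
lo = torch.rand(100, 64)                 # toy learning sample set
hi = torch.rand(100, 1024)               # toy "ground truth" targets

for _ in range(100):
    opt.zero_grad()
    loss = ((net(lo) - hi) ** 2).mean()  # the imposed criterion
    loss.backward()
    opt.step()                           # fit the samples, nothing more
```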

So we finally know what those billions of photos people uploaded to Picasa and Google Photos (and other cloud-based image storage solutions going through various servers) will be used for - pure guesswork based on random pixels to generate some 'enhanced' image. Hopefully none of it can be used for legal purposes, as we are all, collectively, guilty if they start 'enhancing' security video grabs for the police to catch the crims... Second to this, we will finally know the identity of the Minecraft character... well done Google.

In their PDF, they provide more sample results and explain a bit how the method works: https://arxiv.org/pdf/1702.00783.pdf Also, in the examples (figure 4 - some bedroom images) you can see results where the "guessed" image is quite similar overall but completely different in its details.
