Wednesday, September 12, 2018

Cameras have come a long way since the days of photographers hiding under a cloak amidst a startling puff of smoke. At any camera store, you can find video cameras that can record underwater, devices that take photographs with breathtaking clarity at incredible distances—or even ones that do both. However, regardless of how advanced they get, every digital camera currently in existence is constrained by one thing: the need for a focusing lens.

Dr. Rajesh Menon tinkers with a prototype "lensless" camera, which uses light scattered by an ordinary pane of glass. Image Credit: University of Utah.

You may remember from high school science class that a focusing lens is typically a clear piece of plastic or glass that guides light rays passing through it toward a focal point. Generally speaking, in digital cameras this focal point will be the image sensor, the device that registers and records the light hitting it. When the lens isn’t focused correctly for an object at a given distance, the image comes out blurry—but when there isn’t a lens at all, the result is an unintelligible mess.

This schematic shows how a simple lens can refract light rays, focusing them to a single point. Image Credit: Panther, via Wikimedia Commons.
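For the curious, the focusing behavior in the schematic is captured by the thin-lens equation—a standard optics formula, not something specific to this research—which relates a lens's focal length to the distances of the object and its focused image:

```latex
\frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i}
```

For example, a lens with a focal length of 100 mm viewing an object 300 mm away focuses its image 150 mm behind the lens; that is where the sensor must sit, which is why cameras physically shift the lens to keep subjects at different distances sharp.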

Although they perform an essential function, lenses also add bulk and cost to cameras. It’s not easy to grind down glass to the perfect shape, and that precision can add a lot to the price tag. And no matter how small your phone’s circuitry gets, the bulk of a lens will always be there. But there just doesn’t seem to be a way around it if you want your photographs to represent objects that you can recognize.

But, in the age of digital cameras and advanced computing, does the image that a camera takes in really need to be intelligible to humans? What if a machine could make sense of the garbled mess that is the result of a completely unfocused camera, and then “translate” it into an image that people could actually understand? That’s exactly what Dr. Rajesh Menon of the University of Utah wondered.

“All the lens does is rearrange this information [the light coming from the object of interest] for a human being to perceive the object or scene,” he says. “Our question was, what if this rearrangement doesn’t happen? Could we still make sense of the object or scene?”

To satisfy his curiosity, Menon set up a miniature glass window—uncurved, so as to let light through without distorting it—in his lab, which he surrounded with reflective tape. On one edge of the window he attached a simple off-the-shelf image sensor, and in front of the window he placed a display that showed various simple images like a stick figure, a square, and the University of Utah “U”.

This set of diagrams illustrates Menon's experimental setup. In (a) you can see the general concept of the window/image sensor, while (b) shows how the light rays travel through the window; most pass directly through, but a few are scattered into the image sensor by the rough edge of the window and the reflective tape. Images (c) and (d) are photographs of the actual equipment. Image Credit: R. Menon, via Optics Express.

As light from the display (representing an object being photographed) passed through the window, a very small fraction of it—about 1%—was scattered by the glass and redirected towards the image sensor by the reflective tape. It’s important to note that, although the light’s path was modified by the presence of this tape, it differs from the behavior of a true lens in that the light rays don't converge toward a single point. Next, Menon developed an algorithm using deep learning to “unscramble” the blurry images, reconstructing the original object.
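To give a feel for why this "unscrambling" is possible at all, here is a toy sketch. It assumes the window's scattering acts as a fixed linear transform mixing every scene pixel into every sensor pixel, and it inverts that transform with least squares—a deliberately simplified stand-in for the trained deep-learning model Menon actually used. The matrix, image sizes, and noiseless setup are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scene: a 16x16 "image" with a bright square, standing in for the display.
scene = np.zeros((16, 16))
scene[5:11, 5:11] = 1.0
x = scene.ravel()

# Assume the window's scattering is a fixed random linear transform: each
# sensor pixel records a weighted mix of all scene pixels. In practice this
# transfer function would be measured or learned, not known in advance.
A = rng.normal(size=(400, x.size))

# What the edge-mounted sensor sees: an unintelligible blur of mixed light.
y = A @ x

# "Unscramble" by inverting the transform with least squares. Menon's team
# used a trained deep network instead; this is the simplest linear analogue.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
recovered = x_hat.reshape(16, 16)

print(np.allclose(recovered, scene, atol=1e-6))  # prints True
```

The key point the sketch illustrates: as long as the scattering mixes the light in a consistent, characterizable way, no information is destroyed—it is merely rearranged, and a computer can rearrange it back.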

And, to a large extent, it worked! Menon was able to take recognizable photographs using this “lens-less” camera, a first in the history of optics (unless you count the "pinhole camera" effect). Granted, the photos aren’t the sort of high resolution we’ve come to expect from cutting-edge camera technology, but they’re certainly usable.

The applications for this technology are far-reaching: autonomous cars could have windows that double as sensors, future construction projects could incorporate “security glass” that monitors the surrounding area, and augmented reality glasses could be drastically reduced in bulk. These cameras could be quite cheap as well—Menon says the biggest cost is the image sensor itself, and image sensors are already inexpensive. The technology is agnostic to the type of image sensor used, so companies (and consumers) could conceivably shop around for the lowest prices. While he acknowledges that the software package required to decode the image could add to the cost, Menon is optimistic: “I’d say the cost to the consumer will be much less than what cameras (in your phone, for example) cost today.”

Some of Menon's experimental results. On the left, you can see the pattern produced by the LED array or LCD, and in the center the unmodified image recorded by the image sensor. The rightmost column shows the "unscrambled" photographs after passing through the algorithm. Image Credit: R. Menon, via Optics Express.

Even so, there are still some kinks to work out. To begin with, Menon’s research was conducted with a bright, high-contrast object, and it’s unclear how the technology will fare under less ideal conditions—outdoors at dusk, for example. Menon says, “My intuition is that with appropriate sensors and more sophisticated algorithms, it should work fine under normal daylight or room lighting.” He does point out that for low-light conditions, a flash or infrared light could help. Nevertheless, he considers the issue of lighting to be one of the biggest limitations in his technology.

The other big question mark is the camera’s range. Menon found that for the optimal photo, the object should be about 150mm—that’s about 6 inches—from the window. When we consider that many applications, like security cameras, require a much greater range, this is a fairly serious limitation. However, by adding sensors or adjusting their positions, this optimal distance can be lengthened or shortened.

While the technology is far from perfect, Menon sees this project as an exciting implementation of what he calls “non-anthropomorphic cameras”. He explains, “Cameras have been designed for over 100 years based on human perception. It is arguably true that more images and videos are seen by machines today rather than by humans. This will inevitably be true in the future.”

So, what if we start designing cameras for machines rather than humans, and only have them translate when we need them to? Menon concludes, “Our paper is one small step in that direction.”