Embedded vision: FPGAs’ next technology opportunity

Editor’s Note: I am delighted to have the opportunity to present the following piece from the first-quarter 2012 edition of Xcell Journal, with the kind permission of Xilinx Inc.

This article addresses embedded vision, an emerging technology that is moving into multiple application areas, including military and aerospace.

A jointly developed reference design validates the potential of Xilinx’s Zynq device in a burgeoning application category.

-----------------------

What up-and-coming innovation can help you design a system that alerts users to a child struggling in a swimming pool, or to an intruder attempting to enter a residence or business? It’s the same technology that can warn drivers of impending hazards on the roadway, and even prevent them from executing lane-change, acceleration and other maneuvers that would be hazardous to themselves and others. It can equip a military drone or other robot with electronic “eyes” that enable limited-to-full autonomous operation. It can assist a human physician in diagnosing a patient’s illness. It can uniquely identify a face, subsequently initiating a variety of actions (automatically logging into a user account, for example, or pulling up relevant news and other information), interpreting gestures and even discerning a person’s emotional state. And in conjunction with GPS, compass, accelerometer, gyroscope and other features, it can deliver a data-augmented presentation of a scene.

The technology common to all of these application examples is embedded vision, which is poised to enable the next generation of electronic-system success stories. Embedded vision got its start in traditional computer vision applications such as assembly line inspection, optical character recognition, robotics, surveillance and military systems. In recent years, however, the decreasing costs and increasing capabilities of key technology building blocks have broadened and accelerated vision’s penetration into key high-volume markets.

Driven by expanding and evolving application demands, for example, image sensors are making notable improvements in key attributes such as resolution, low-light performance, frame rate, size, power consumption and cost. Similarly, embedded vision applications require processors that combine high performance, low prices, low power consumption and flexible programmability, attributes that are increasingly becoming a reality across numerous product implementations. Latest-generation optics systems, lighting modules, volatile and nonvolatile memories, and I/O standards are seeing similar gains. And algorithms are up to the challenge, leveraging these hardware improvements to deliver more robust and reliable analysis results.

Embedded vision refers to machines that understand their environment through visual means. By “embedded,” we’re referring to any image-sensor-inclusive system that isn’t a general-purpose computer. Embedded might mean, for example, a cellular phone or tablet computer, a surveillance system, an earth-bound or flight-capable robot, a vehicle containing a 360° suite of cameras or a medical diagnostic device. Or it could be a wired or wireless user interface peripheral; Microsoft’s Kinect for the Xbox 360 game console, perhaps the best-known example of this latter category, sold 8 million units in its first two months on the market.

The topic of multi-camera analytics perhaps deserves some additional discussion. It should be possible to combine multiple views from multiple cameras to more precisely determine acceleration, relative position and object characteristics (a 'person' on a sign vs. a real 3D person). Are there any features that support these requirements?
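
One concrete way multiple views help: with a calibrated, rectified stereo pair, depth follows directly from disparity via Z = f * B / d. The following minimal C sketch illustrates that relation; the function name and numbers are illustrative only, not drawn from any product discussed here.

    /* Illustrative sketch: depth from a rectified stereo pair.
     * Z = f * B / d, with focal length f in pixels, baseline B in
     * meters, and disparity d in pixels. Not tied to any specific
     * camera or device mentioned in the article. */
    #include <stdio.h>

    static double stereo_depth_m(double focal_px, double baseline_m,
                                 double disparity_px)
    {
        if (disparity_px <= 0.0)
            return -1.0; /* no valid match: treat range as unknown */
        return focal_px * baseline_m / disparity_px;
    }

    int main(void)
    {
        /* Example: f = 800 px, B = 0.12 m, d = 16 px gives Z = 6 m. */
        printf("depth = %.2f m\n", stereo_depth_m(800.0, 0.12, 16.0));
        return 0;
    }

This also addresses the sign-versus-person question in principle: a flat printed 'person' yields nearly uniform disparity across the figure, while a real person shows depth variation from limb to limb.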

There surely are a lot of applications where embedded vision can really shine. I come from academia, where I work mainly on vision for mobile robotics and surveillance applications. In the field of robotics, the dominant trend is to pack the machine with a PC and let it handle all the algorithmic heavy lifting. There are, however, emerging applications where using a PC as we know it is a dealbreaker; think UAVs. As for surveillance, at present the dominant paradigm is centralized processing, using a server or even a server cluster. The image data from the cameras has to be transferred for processing, putting large pressure on the communication infrastructure. Sometimes the constraints presented by the communication infrastructure are a brick wall, and a complete system redesign is necessary to get over it (or around it). A natural solution to this problem is in-place processing.
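
To put rough numbers on that pressure (assuming uncompressed 8-bit YUV 4:2:2 video, an illustrative case): one VGA camera at 30 frames/s generates 640 x 480 x 2 bytes x 30 = about 18 MB/s, or roughly 147 Mbit/s, so ten such cameras already exceed a gigabit link before any protocol overhead. That is exactly the wall that pushes designs toward compression or in-place analysis.
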
Programmable logic really shines when it comes to processing local image information, e.g. using the sliding-window approach. Our stream processors for image filtering and feature detection and matching can crunch hundreds of VGA frames per second. Combine them with a nice, low-power embedded processor and you get a system for (almost) every job. And with Zynq, you get it all in one package. The only problem is that development is significantly more complicated than is the case with pure software designs.
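
To make the sliding-window point concrete, here is a minimal C model of the structure such a stream processor typically uses: two line buffers feed a 3x3 window so each pixel is read exactly once, in raster order, just as it arrives from a camera. This is an illustrative software sketch under those assumptions, not the author's actual design; on an FPGA the same dataflow would be expressed in HDL or HLS, with the line buffers in block RAM.

    #include <stdint.h>

    #define WIDTH  640
    #define HEIGHT 480

    /* Streaming 3x3 box filter over one 8-bit VGA frame.
     * lineA/lineB hold the two previous rows (the role block RAM
     * plays on an FPGA); win[][] is the 3x3 shift-register window.
     * Border pixels of the output are left unwritten. */
    void box3x3_stream(const uint8_t *in, uint8_t *out)
    {
        static uint8_t lineA[WIDTH], lineB[WIDTH];
        uint8_t win[3][3] = {{0}};

        for (int y = 0; y < HEIGHT; y++) {
            for (int x = 0; x < WIDTH; x++) {
                uint8_t px = in[y * WIDTH + x];

                /* Shift the window left and load the new column:
                 * row 0 = two rows back, row 1 = previous row,
                 * row 2 = current row. */
                for (int r = 0; r < 3; r++) {
                    win[r][0] = win[r][1];
                    win[r][1] = win[r][2];
                }
                win[0][2] = lineA[x];
                win[1][2] = lineB[x];
                win[2][2] = px;

                /* Advance the line buffers for the next row. */
                lineA[x] = lineB[x];
                lineB[x] = px;

                /* Emit once a full neighborhood is available; the
                 * window is centered on pixel (y - 1, x - 1). */
                if (y >= 2 && x >= 2) {
                    int sum = 0;
                    for (int r = 0; r < 3; r++)
                        for (int c = 0; c < 3; c++)
                            sum += win[r][c];
                    out[(y - 1) * WIDTH + (x - 1)] = (uint8_t)(sum / 9);
                }
            }
        }
    }

Every arithmetic step here maps to fixed hardware that fires once per pixel clock, which is why such pipelines can sustain frame rates that a general-purpose CPU struggles to match at comparable power.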