FPGA-based video surveillance comes of age

Video surveillance market and trends

Escalating security concerns have forced governments and institutions to invest heavily in surveillance and security equipment. Further, technological innovations in imaging and video processing have revolutionized the video surveillance industry, which now extends beyond security into banking, transportation, education, retail, healthcare, gaming, and other areas. According to ABI Research, overall video surveillance market revenue will increase from $16 billion in 2008 to $29 billion in 2015, a compound annual growth rate (CAGR) of 9%.

Video surveillance has moved from analog standard-definition cameras and VCRs, to megapixel and high-definition cameras and DVRs, to networked IP cameras streaming video over Ethernet, and on to cloud-based network video recorders (NVRs). Instead of live viewing, continuous recording and visual searches through recorded material, intelligent cameras and recorders now enable event-based recording, alarm triggering and automated searches through recorded material.

These advances in security and surveillance are tightly coupled with advances in image sensor technologies and in the image signal processing capabilities of semiconductors. The relevant areas include camera architectures, sensor interfacing/sensor bridges, image signal processing, High Dynamic Range (HDR) processing and video analytics.

Camera architectures

A typical camera consists of an image sensor, an image signal processor and some type of output interface. In addition to these functional blocks, the camera could also include video compression or video analytics functions. Figure 1 shows an example of a megapixel network camera implemented on a LatticeECP3 FPGA device.

Figure 1. Camera architecture implemented on an FPGA

Sensor interfacing / sensor bridges

An Image Signal Processor (ISP) typically interfaces with one or more image sensors. The most common image sensors are CMOS sensors. Lower resolution and lower frame-rate sensors can interface to an ISP via a parallel interface. Higher resolution and higher frame-rate image sensors require high-speed serial interfaces such as MIPI, Aptina HiSPi or sub-LVDS. Low-cost ISPs typically support only a parallel interface, so a sensor bridge function is needed to convert the MIPI, HiSPi or sub-LVDS signals to a parallel format. Low-cost FPGAs can perform this conversion. In Figure 2, a Lattice MachXO2 FPGA is used to convert HiSPi sensor signals coming from two Aptina image sensors (MT9M034) into a parallel format, which can then be processed by a TI DSP device (DM8127).
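The core job of a sensor bridge, turning serial lane data into parallel pixel words, can be illustrated in software. The sketch below is deliberately generic: it does not model MIPI, HiSPi or sub-LVDS framing or sync codes, and the MSB-first bit order, the round-robin distribution of pixels across lanes and the function name `deserialize_lanes` are assumptions made for illustration only.

```python
def deserialize_lanes(lanes, bits_per_pixel=12):
    """Turn per-lane serial bit lists into one parallel stream of pixel words.

    Assumes MSB-first bit order and that consecutive pixels are distributed
    round-robin across lanes (lane 0 gets pixel 0, lane 1 gets pixel 1, ...).
    """
    words_per_lane = [
        [
            int("".join(str(bit) for bit in bits[i:i + bits_per_pixel]), 2)
            for i in range(0, len(bits) - bits_per_pixel + 1, bits_per_pixel)
        ]
        for bits in lanes
    ]
    # Interleave the lanes: take one word from each lane in turn.
    return [word for group in zip(*words_per_lane) for word in group]


# Two hypothetical lanes carrying 4-bit pixels to keep the example short.
lane0 = [0, 0, 0, 1, 0, 0, 1, 1]
lane1 = [0, 0, 1, 0, 0, 1, 0, 0]
parallel_words = deserialize_lanes([lane0, lane1], bits_per_pixel=4)
print(parallel_words)
```

In an FPGA the same function would be built from dedicated I/O deserializers and a small amount of logic rather than from list operations; the sketch only shows the data reordering involved.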

Figure 2. Sensor interface bridge application using an FPGA

Image Signal Processing (ISP)

An Image Signal Processor (ISP) typically consists of multiple video processing algorithms, clustered in the form of a pipeline performing image enhancement and conversion functions on an incoming video stream. The imaging pipeline may be implemented as software running on a computer, on an FPGA or a digital signal processor (DSP), or as an application-specific standard product (ASSP).

A typical image processing pipeline is shown in Figure 3. The raw sensor data is received from the CMOS image sensor. After linearization, defective sensor pixels are corrected based on the values of their neighboring pixels. The image sensor typically delivers a single intensity value per pixel, captured through a Bayer color filter array; full red, green and blue values for every pixel are then reconstructed by interpolation (called de-bayering). A Color Correction Matrix (CCM) is used to eliminate crosstalk between the red, green and blue pixels.
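Two of these stages are simple enough to sketch in a few lines. The example below is a minimal illustration, not the pipeline from Figure 3: the median-of-neighbors rule, the function names, the sample tile and the matrix coefficients are all hypothetical. It replaces a defective pixel with the median of its nearest same-color neighbors (which sit two rows or columns away in a Bayer mosaic) and applies a 3x3 CCM to one RGB pixel.

```python
from statistics import median


def correct_defective_pixel(raw, row, col):
    """Return the median of the same-color Bayer neighbors of raw[row][col].

    In a Bayer mosaic the nearest pixels behind the same color filter are
    two rows or two columns away, so those are the neighbors sampled here.
    """
    height, width = len(raw), len(raw[0])
    candidates = ((row - 2, col), (row + 2, col), (row, col - 2), (row, col + 2))
    neighbors = [raw[r][c] for r, c in candidates if 0 <= r < height and 0 <= c < width]
    return median(neighbors)


def apply_ccm(rgb, ccm):
    """Multiply one (R, G, B) pixel by a 3x3 matrix and clamp to 8 bits."""
    return tuple(
        max(0, min(255, round(sum(k * v for k, v in zip(matrix_row, rgb)))))
        for matrix_row in ccm
    )


# Hypothetical 5x5 raw tile with a stuck-high pixel in the center.
raw_tile = [
    [10, 50, 12, 52, 11],
    [48, 90, 47, 91, 49],
    [12, 51, 255, 50, 10],
    [47, 92, 49, 90, 48],
    [11, 52, 14, 51, 12],
]
raw_tile[2][2] = correct_defective_pixel(raw_tile, 2, 2)

# Hypothetical CCM; each row sums to 1.0 so neutral grays stay neutral.
CCM = [
    [1.50, -0.30, -0.20],
    [-0.25, 1.40, -0.15],
    [-0.10, -0.40, 1.50],
]
print(raw_tile[2][2], apply_ccm((100, 100, 100), CCM), apply_ccm((200, 80, 60), CCM))
```

Rows that sum to 1.0 are a common design choice for a CCM because they leave gray pixels unchanged while the negative off-diagonal terms subtract the crosstalk from the other two channels.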

The Auto Exposure (AE) block automatically adjusts the exposure time to compensate for changing lighting conditions. The HDR algorithms improve the contrast in the bright and dark regions of an image (HDR will be explained in more detail in the next section). Automatic White Balance (AWB) makes adjustments to all colors based on a white reference point in the scene. A noise reducer removes noise, often visible as random spots of obviously wrong color in an otherwise smoothly colored area. Noise increases with temperature and exposure times. Gamma correction redistributes native camera tonal levels into ones which are more perceptually uniform and visible to the human eye, thereby making the most efficient use of a given bit depth.
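White balance and gamma correction both reduce to simple per-pixel arithmetic, which is one reason they map well to hardware. The sketch below assumes a known white reference patch, gains normalized to the green channel, and a gamma of 2.2; these choices and the function names are illustrative assumptions, not details taken from the pipeline described here.

```python
def white_balance_gains(white_ref):
    """Per-channel gains that make the reference patch neutral (green = 1.0)."""
    r, g, b = white_ref
    return (g / r, 1.0, g / b)


def apply_gains(pixel, gains, top=255):
    """Scale one (R, G, B) pixel by the gains and clamp to the output range."""
    return tuple(min(top, round(value * gain)) for value, gain in zip(pixel, gains))


def build_gamma_lut(gamma=2.2, bits=8):
    """Precompute out = in ** (1 / gamma) for every possible code value."""
    top = (1 << bits) - 1
    return [round(top * (code / top) ** (1.0 / gamma)) for code in range(top + 1)]


# Hypothetical reference patch that should be white but has a color cast.
gains = white_balance_gains((200, 220, 180))
balanced = apply_gains((200, 220, 180), gains)

gamma_lut = build_gamma_lut()
encoded = tuple(gamma_lut[v] for v in balanced)
print(balanced, encoded)
```

A lookup table is used for gamma because the curve depends only on the input code value, so it can be computed once and stored in a small memory instead of being evaluated per pixel.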

The overlay function allows text and graphics to be overlaid on top of the video for displaying menu or camera settings. Additional functional modules can be added based on specific requirements. These can range from compression algorithms, such as H.264 or JPEG, to intelligent video analytics, such as motion detection, object detection or face recognition.

HDR processing

The quality of an image depends on various factors, including lighting. Based on lighting conditions, some areas of a scene could appear too dark and some too bright. Details in these areas are often not visible. The missing details could be very important in applications where critical decisions need to be made.

Figure 4. HDR processing example

Dynamic range (also called the maximum contrast of an image) describes the ratio of the brightest point of an image to the darkest point of the same image. The human eye is capable of seeing the details in both of these areas, but most image sensors have a limited dynamic range and are not able to capture the details in both areas of the same image. In order to make the details in the high contrast areas visible, multiple images of the same scene with different exposure times need to be captured. A short exposure time is needed to capture very bright areas and a long exposure time is needed to capture the dark areas.

Images with differing exposure times can be combined into one image, increasing the dynamic range of the image. The capture of a scene with different exposure times and the combining of these images into one image is called High Dynamic Range (HDR) processing, also known as Wide Dynamic Range (WDR) processing. For HDR processing, global and local tone mapping algorithms are used to create a single image with an extended dynamic range, bringing out the details in the high contrast areas of the same image. Dynamic range is often expressed in dB. A good HDR image has a dynamic range of 90 dB or greater. For example, the Aptina 720p HDR sensor (MT9M034) has a dynamic range of 120 dB.
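To make these numbers concrete, the sketch below converts a contrast ratio to dB using the 20·log10 convention commonly applied to image sensors, under which 120 dB corresponds to a ratio of 1,000,000:1. It also shows one simple way to merge a short and a long exposure and compress the result with a global logarithmic tone curve. The merge rule, the 12-bit saturation level, the 16:1 exposure ratio and the function names are assumptions for illustration; production HDR algorithms, including local tone mapping, are considerably more elaborate.

```python
import math


def dynamic_range_db(brightest, darkest):
    """Dynamic range in dB, using the 20*log10 convention."""
    return 20.0 * math.log10(brightest / darkest)


def merge_exposures(short_px, long_px, exposure_ratio, saturation=4095):
    """Merge one pixel from a short and a long exposure into a linear HDR value.

    The long exposure is used where it is not clipped (cleaner dark areas);
    otherwise the short exposure is scaled up by the exposure ratio.
    """
    if long_px < saturation:
        return float(long_px)
    return float(short_px) * exposure_ratio


def tone_map(hdr_value, hdr_max, out_max=255):
    """Global logarithmic tone mapping of a linear HDR value to out_max levels."""
    return round(out_max * math.log1p(hdr_value) / math.log1p(hdr_max))


EXPOSURE_RATIO = 16               # hypothetical long/short exposure ratio
HDR_MAX = 4095 * EXPOSURE_RATIO   # largest value the merge can produce

dark_pixel = merge_exposures(short_px=10, long_px=160, exposure_ratio=EXPOSURE_RATIO)
bright_pixel = merge_exposures(short_px=300, long_px=4095, exposure_ratio=EXPOSURE_RATIO)
print(dynamic_range_db(1_000_000, 1), tone_map(dark_pixel, HDR_MAX), tone_map(bright_pixel, HDR_MAX))
```

The logarithmic curve is only one possible global operator; it is used here because it allocates more output levels to dark tones, which is the effect the text describes.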

HDR processing algorithms are computationally intensive, especially when megapixel sensors at higher frame rates are used. They require a tremendous amount of processing power, which is often not achievable with an off-the-shelf Image Signal Processor (ISP). These complex and resource-intensive algorithms are best served by an FPGA implementation. The same FPGA device can be used to implement additional image processing algorithms.

Figure 5. FPGA-based sensor interface bridging and HDR processing

Figure 5 shows an HDR implementation with the Panasonic MN34041 image sensor. Two 1080p frames with different exposure times are captured at 60 frames per second (fps). The two captured frames are synchronized by passing one of the frames through a line buffer. After synchronization, the two frames are combined and processed into one 1080p frame at 30 frames per second. In the above example, the sensor interfacing, line buffer controller, frame combiner and tone mapping algorithms are implemented on a single Lattice MachXO2 FPGA. The processed frame is then passed on to an image signal processor for further processing.
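The synchronization step can be pictured as a small FIFO. The sketch below assumes, purely for illustration, that long- and short-exposure lines arrive interleaved with the short-exposure lines lagging behind; the stream format, the tags and the function name `pair_exposures` are hypothetical and are not taken from the MN34041 interface. It also works out the pixel rates implied by the frame rates quoted above.

```python
from collections import deque


def pair_exposures(line_stream):
    """Align long- and short-exposure lines that arrive offset in time.

    line_stream yields (exposure, line_number, pixels) tuples. Long-exposure
    lines wait in a FIFO (the line buffer) until the short-exposure line with
    the same line number arrives; the pair is then emitted together.
    """
    line_buffer = deque()
    for exposure, line_number, pixels in line_stream:
        if exposure == "long":
            line_buffer.append((line_number, pixels))
            continue
        buffered_number, long_pixels = line_buffer.popleft()
        if buffered_number != line_number:
            raise ValueError("exposure streams are out of order")
        yield line_number, long_pixels, pixels


# Hypothetical stream in which the short exposure lags by two lines.
stream = [
    ("long", 0, "L0"), ("long", 1, "L1"),
    ("short", 0, "S0"), ("long", 2, "L2"),
    ("short", 1, "S1"), ("short", 2, "S2"),
]
paired_lines = list(pair_exposures(stream))

# Pixel rates: two exposures share the 60 fps input, one frame leaves at 30 fps.
PIXELS_PER_FRAME = 1920 * 1080
input_rate = PIXELS_PER_FRAME * 60   # pixels per second into the combiner
output_rate = PIXELS_PER_FRAME * 30  # pixels per second out of the combiner
print(paired_lines, input_rate, output_rate)
```

Buffering lines rather than whole frames keeps the memory requirement small, which matters when the design has to fit in a low-cost FPGA.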

Video analytics

Video Analytics (VA) is the capability to automatically analyze video to detect and determine events. Video analytics is used in a wide range of applications, including security and surveillance, automotive, traffic control, and retail. Motion detection is one of the most popular video analytics algorithms, where motion is detected relative to a fixed background scene. A complete suite of intelligent video analytics has been developed and ported by Intellivision Inc. to the LatticeECP3-based HDR-60 Camera Development Kit, enabling the design of smart security cameras. The suite includes Intelligent Video Motion Detection, Intelligent Intrusion Detection and Camera Tamper Detection. Intelligent Video Motion Detection can differentiate between an animal and a person, sending alerts only when necessary. Intelligent Intrusion Detection sends alerts when someone opens a door or enters a restricted space. With the Camera Tamper Detection function, one can be alerted and act accordingly if a camera has been moved, blocked or knocked out of focus.
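Motion detection against a fixed background can be sketched as simple frame differencing. The example below is a minimal illustration of that general idea and is not the Intellivision algorithm; the thresholds, the grayscale list-of-lists frame format and the function name `detect_motion` are assumptions.

```python
def detect_motion(background, frame, pixel_threshold=25, min_changed=3):
    """Return the bounding box (top, left, bottom, right) of changed pixels.

    A pixel counts as changed when it differs from the background by more
    than pixel_threshold. Fewer than min_changed changed pixels are treated
    as noise and the function returns None.
    """
    changed = [
        (r, c)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if abs(value - background[r][c]) > pixel_threshold
    ]
    if len(changed) < min_changed:
        return None
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return (min(rows), min(cols), max(rows), max(cols))


# Hypothetical 4x6 grayscale background and a frame with a bright 2x2 object.
background = [[100] * 6 for _ in range(4)]
frame = [row[:] for row in background]
for r in (1, 2):
    for c in (2, 3):
        frame[r][c] = 200

noisy_frame = [[value + 5 for value in row] for row in background]
print(detect_motion(background, frame), detect_motion(background, noisy_frame))
```

Distinguishing a person from an animal, as the suite described above does, requires classification on top of this kind of change detection and is outside the scope of the sketch.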

Figure 6. Video analytics demo screenshot

Figure 6 shows the Ethernet-based graphical control interface for the video analytics demo based on the HDR-60 camera development kit. The yellow line represents the counter demo, and the user can position it anywhere on the screen horizontally, vertically or diagonally. Each time an object crosses the line from one side to the other, it is counted. This feature could be used to count the number of people entering or exiting a location. The red box represents a restricted area, and its size and location can be programmed to suit a particular application. Once an object enters the scene, it is detected. When it enters the restricted area represented by the red box, an alert is raised. The automatic detection and alerting can replace costly live monitoring of camera images. A third feature, which is not displayed in Figure 6, is camera tamper detection. When the camera is covered or moved, an alert is sent as well.
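The geometry behind the line counter and the restricted area is straightforward once an object has been reduced to a tracked position per frame. The sketch below is an illustration under stated assumptions, not the demo's implementation: it treats the yellow line as infinitely long, represents the red box as (left, top, right, bottom), and uses hypothetical function names and coordinates.

```python
def side_of_line(point, line_start, line_end):
    """Sign of the 2-D cross product: positive on one side, negative on the other."""
    (px, py), (ax, ay), (bx, by) = point, line_start, line_end
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def count_crossings(track, line_start, line_end):
    """Count how often successive track positions land on opposite sides of the line."""
    crossings = 0
    previous = None
    for point in track:
        side = side_of_line(point, line_start, line_end)
        if side == 0:
            continue  # exactly on the line: wait for the next position
        if previous is not None and (side > 0) != (previous > 0):
            crossings += 1
        previous = side
    return crossings


def in_restricted_area(point, box):
    """True when the point lies inside the axis-aligned box (left, top, right, bottom)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


# Hypothetical diagonal counting line and a track that crosses it once.
LINE = ((0, 0), (10, 10))
track = [(2, 8), (4, 6), (6, 4), (8, 2)]
RED_BOX = (4, 4, 8, 8)
alerts = [point for point in track if in_restricted_area(point, RED_BOX)]
print(count_crossings(track, *LINE), alerts)
```

Because the test uses only the sign of a cross product, the line can be horizontal, vertical or diagonal without any special cases, which matches the free positioning described above.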

Summary: FPGAs will drive new, complex video applications

The thirst for higher resolution and higher frame-rate image sensors continues to grow. The combination of these image sensors with advanced image signal processing algorithms such as HDR and intelligent video analytics is opening up new, as yet unimagined application areas. Existing off-the-shelf signal processors can no longer keep up with the performance requirements of new complex applications.

Programmable FPGA devices are the perfect choice for interfacing with multiple high-resolution image sensors simultaneously, stitching them together in panoramic or tiled formats and running HDR and intelligent video analytics algorithms on the same chip.
