Computer Vision System Toolbox

Modeling a Video Processing System for an FPGA Target

This example shows how to use the Computer Vision System Toolbox™ with HDL Coder™ and DSP System Toolbox™ in a design workflow that generates hardware description language (HDL) code for a video processing application targeting an FPGA. The example reviews how to design a system that can operate on hardware. You can target any FPGA that can fit the design.

Video processing is computationally intensive. One solution to overcome high computational demands of video processing is to use dedicated hardware, such as a field programmable gate array (FPGA). Other benefits of FPGAs include low cost and reduced power consumption.

To learn how to design video processing hardware, you will analyze a simple system that sharpens input video. Assume that the hardware uses a common interlaced analog source, such as a video camera connected through a composite video interface (yellow RCA plug). Therefore, besides a sharpening filter, you also need a deinterlacer to handle the interlaced video signal.

Step 1: Model Behavior of a Video Sharpening System

Before you consider a design that can be transferred onto an FPGA, you should first define the system using blocks that process full video frames. This will help you to easily verify your video processing design ideas. Later, it will serve as a reference for verifying the implementation of the algorithm targeted to an FPGA.

The following figure shows the Interlaced Video Source subsystem that modifies a full frame video to produce an interlaced video. Each frame is split into top and bottom interlaced fields.

Each video field is half the height of the original video frames, and the transmission rate doubles between the input and output of the interlacer. The interlacer simply turns one frame into two frames and afterward removes odd lines from the first frame to produce the top field and even lines from the second frame to produce the bottom field.
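The field split described above can be sketched in a few lines of MATLAB. This is only an illustration, not the contents of the Interlaced Video Source subsystem: the 6-by-4 test frame is made up, and the sketch assumes that the "odd" and "even" lines in the text are counted from 0, so that the top field keeps MATLAB rows 1, 3, 5.

```matlab
% Hypothetical 6-by-4 frame in which every pixel holds its own row number.
frame = (1:6).' * ones(1, 4);

% Assumption: video lines are counted from 0, so removing the odd lines
% keeps MATLAB rows 1, 3, 5 (top field) and removing the even lines
% keeps MATLAB rows 2, 4, 6 (bottom field).
topField    = frame(1:2:end, :);
bottomField = frame(2:2:end, :);

% Each field is half the frame height, and two fields are produced per
% input frame, which is why the output rate doubles.
disp(size(topField));
disp(size(bottomField));
```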

Before applying the sharpening filter, the system deinterlaces the video using a simple line doubler, also known as a bob deinterlacer, followed by an alternate frame drop to reduce the bob artifacts. Although the bob deinterlacer is not ideal, it is used here because it is simple, efficient, and easy to implement on an FPGA. It requires only two video lines of memory, which avoids the need to access external memory.
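As a rough frame-based illustration of line doubling followed by an alternate frame drop, consider the sketch below. The field contents are invented, and keeping the frames built from the top fields (rather than the bottom fields) is an assumption; the text does not say which frames the model drops.

```matlab
% Hypothetical fields: two top/bottom pairs, 3 lines by 4 pixels each.
topField    = [1 1 1 1; 3 3 3 3; 5 5 5 5];
bottomField = [2 2 2 2; 4 4 4 4; 6 6 6 6];
fields = {topField, bottomField, topField + 10, bottomField + 10};

% Line doubler (bob): repeat every field line once to restore full height.
doubleLines = @(f) f(ceil((1:2 * size(f, 1)) / 2), :);
frames = cellfun(doubleLines, fields, 'UniformOutput', false);

% Alternate frame drop: keep every other frame (assumed: the ones built
% from the top fields), which halves the rate back to the frame rate.
kept = frames(1:2:end);

disp(kept{1});
```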

The Computer Vision System Toolbox uses a frame-based model where blocks assume the availability of a full frame of video at any given time step. However, when designing for FPGAs, this assumption is not appropriate. Memory constraints, performance, and latency considerations often require an approach that uses as little memory as possible. Additionally, data presented to an FPGA by other devices is processed one pixel at a time. Thus, algorithms designed for FPGAs cannot assume that a video frame buffer is available. To design such algorithms in the Simulink® environment, you need to convert the previous frame processing model to one that processes individual pixels at a much higher sample rate.

The viponfpga.slx model below shows how to generate a pixel stream and how to process it. It also compares the results of processing full video frames with the results of processing a pixel stream. This verifies the streaming algorithm results.

The differently colored lines indicate the change in the video rate on the streaming branch of the model. The model also shows the rate transition value of 874 inside the Rate transition factor display block. This rate transition arises because the pixel stream is sent out in the same amount of time as the full video frames, and therefore it must be transmitted at a much higher rate. In this case, the stream is processed at a rate 874 times the full frame rate.

Step 2.1: Generating Pixel Stream

The data is sent out in row scanning order, starting with the upper left pixel. Furthermore, it is augmented with non-video data to simulate the effect of the horizontal and vertical blanking periods found in real hardware video systems, as shown in the diagram below.

Additionally, the Frame to Stream subsystem generates synchronization signals so the blocks that follow it can ascertain the location of each pixel within the 2-D matrix of the full video frame. The conversion from video frame to pixel stream is accomplished using the Image Pad, MATLAB® Function and Unbuffer blocks as shown below.
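The pad-then-unbuffer idea can be sketched in plain MATLAB as follows. This is not the code inside the Frame to Stream subsystem; the 2-by-3 frame, the blanking sizes, and the choice to place the blanking samples after the active pixels (to the right of and below the image) are assumptions made for illustration.

```matlab
% Hypothetical 2-by-3 active video area.
frame  = [1 2 3; 4 5 6];
hBlank = 2;     % assumed horizontal blanking, in pixels per line
vBlank = 1;     % assumed vertical blanking, in lines per frame

% Pad the frame with non-video data (zeros here) to model blanking.
[rows, cols] = size(frame);
padded = zeros(rows + vBlank, cols + hBlank);
padded(1:rows, 1:cols) = frame;

% Serialize in row scanning order, starting at the upper left pixel.
% MATLAB stores matrices column by column, hence the transpose.
stream = reshape(padded.', 1, []);

% One frame period now carries numel(stream) samples, so the stream has
% to run that many times faster than the frame rate.
samplesPerFrame = numel(stream);
disp(stream);
disp(samplesPerFrame);
```

For these sizes the stream carries (2 + 1) × (3 + 2) = 15 samples per frame, which is the kind of rate increase that the Rate transition factor display reports for the model.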

Hint: To improve the performance of the MATLAB® Function block, you can disable its debugging option.

Step 2.2: Synchronization Signals

Working with a pixel stream instead of full video frames has a significant impact on the interfaces of the blocks. In addition to video data, each block accepts and maintains a set of synchronization signals. Blocks use those signals to determine the location of a particular pixel within the 2-D space of the video frame. Each streaming block receives and processes five boolean synchronization signals to indicate:

1. Active Pixel
2. Line Start
3. Line End
4. Frame Start
5. Frame End

For convenience, these signals are packaged into a single boolean vector. The figure below shows an example timing diagram describing transmission of a pixel stream containing a 2-by-3 active video area with horizontal blanking of 2 pixels, and vertical blanking of one video line.

In this example, there are no gaps in the Active Video (Active Pixel) synchronization signal. However, it is possible for gaps to be introduced. Therefore, the Line Start and Line End signals cannot simply be derived from the Active Video signal.
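To make the five signals concrete, the sketch below builds them for the 2-by-3 active area with 2 pixels of horizontal blanking and one line of vertical blanking described above. The placement of the blanking samples after the active pixels and the row order of the packed vector are assumptions, not details taken from the model.

```matlab
rows = 2; cols = 3;          % active video area from the timing example
hBlank = 2; vBlank = 1;      % blanking sizes from the timing example

% Build each flag as a 2-D map first, then serialize it row by row.
activePixel = false(rows + vBlank, cols + hBlank);
activePixel(1:rows, 1:cols) = true;

lineStart  = false(size(activePixel)); lineStart(1:rows, 1)   = true;
lineEnd    = false(size(activePixel)); lineEnd(1:rows, cols)  = true;
frameStart = false(size(activePixel)); frameStart(1, 1)       = true;
frameEnd   = false(size(activePixel)); frameEnd(rows, cols)   = true;

toStream = @(m) reshape(m.', 1, []);

% Assumed packing order: one row per signal, one column per time step.
sync = [toStream(activePixel); toStream(lineStart); toStream(lineEnd); ...
        toStream(frameStart);  toStream(frameEnd)];

disp(double(sync));
```

Because Line Start and Line End are carried explicitly, a downstream block does not have to infer line boundaries from edges of the active flag, which would fail if the active flag contained gaps within a line.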

The above synchronization signals provide an abstraction for the signals found in practical video hardware. Practical video systems may use:

The above synchronization standards can be translated to the synchronization signals used in this example. Utilizing the Active Video, Line Start, Line End, Frame Start and Frame End signals permits us to stay independent of a particular video synchronization standard while enabling us to build an interface compatible with the standards used in real hardware.

Step 2.3: Impact of Pixel Streaming on System Design

The impact of pixel streaming and the use of synchronization signals can be seen immediately. Each algorithmic block in the model must use and maintain the synchronization signals. In other words, once the synchronization signals are introduced, you need to maintain them throughout the model. This can be observed by looking under the mask of the Streaming Image Sharpening subsystem shown below.

The syncOut signals for both blocks are internally manipulated to match the data stream latencies introduced by the blocks. For example, examine the Streaming 2-D FIR Filter subsystem below.

Typically, when using a general purpose processor, you have access to the entire video buffer. This is clearly not the case for the Streaming 2-D FIR Filter subsystem, since it was designed to process a pixel stream. In this example, the subsystem uses a 3-by-3 kernel of filter coefficients. In order for the filter to produce the first valid output pixel, it needs to store enough input pixels to form a neighborhood required for the application of the filter kernel. In the case of a 3-by-3 kernel, it needs at least two lines of video plus 3 more pixels to create a minimum neighborhood. That is accomplished inside the Line Memory and Make Neighborhood subsystems. Additionally, these subsystems must adjust the syncOut signals to reflect the delay introduced by storing the initial neighborhood. They also handle various padding modes. It turns out that the memory controller is the most complicated part of the design, while the portion that performs the arithmetic is rather simple.
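The role of the line memory can be illustrated with a behavioral MATLAB sketch that consumes one pixel per loop iteration and keeps only two line buffers and a 3-by-3 window. It is not the implementation of the Line Memory or Make Neighborhood subsystems: the kernel coefficients and the test image are hypothetical, blanking and padding modes are ignored, and only the interior ("valid") outputs are produced.

```matlab
img    = magic(6);                        % hypothetical 6-by-6 test frame
kernel = [0 -1 0; -1 5 -1; 0 -1 0];       % assumed sharpening kernel
[rows, cols] = size(img);

lineBuf1 = zeros(1, cols);                % previous video line
lineBuf2 = zeros(1, cols);                % line before the previous one
window   = zeros(3, 3);                   % current 3-by-3 neighborhood
out      = zeros(rows, cols);
sampleCount      = 0;
firstValidSample = 0;

for r = 1:rows
    for c = 1:cols
        pix = img(r, c);                  % one pixel arrives per step
        sampleCount = sampleCount + 1;

        % Shift the window left and append the newest column, which is
        % read from the two line buffers plus the incoming pixel.
        window = [window(:, 2:3), [lineBuf2(c); lineBuf1(c); pix]];
        lineBuf2(c) = lineBuf1(c);
        lineBuf1(c) = pix;

        % The window is complete once two lines plus three pixels have
        % arrived; its center is one line and one pixel behind the input.
        if r >= 3 && c >= 3
            out(r - 1, c - 1) = sum(sum(window .* kernel));
            if firstValidSample == 0
                firstValidSample = sampleCount;
            end
        end
    end
end

% Compare against the frame-based result on the interior pixels.
reference = filter2(kernel, img, 'valid');
disp(isequal(out(2:end-1, 2:end-1), reference));
disp(firstValidSample);
```

In this sketch the first output appears after 2*cols + 3 input samples, which matches the "two lines of video plus 3 more pixels" stated above, and the output lags the input by one line and one pixel. That lag is the delay by which the synchronization signals would have to be shifted.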

Step 2.4: Verifying the Pixel Stream Processing Design

While building the streaming portion of the design you used the Verification and Display subsystem, shown below, to continuously verify results against the original full frame design.

To facilitate the verification of the processed stream against the original full frame implementation, the Verification and Display subsystem synchronizes the full frame output with the processed pixel stream whose latency was affected by the Frame to Stream, Stream to Frame and the streaming algorithmic blocks.
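The comparison can be pictured with the following sketch, which delays a pixel stream and its active flag by the same number of samples, rebuilds the frame from the delayed stream, and checks it against the frame-based reference. The latency value, frame contents, and blanking layout are hypothetical; the actual subsystem works on Simulink signals rather than on whole vectors.

```matlab
ref = [1 2 3; 4 5 6];            % hypothetical frame-based reference output
[rows, cols] = size(ref);
hBlank = 2; vBlank = 1;          % assumed blanking sizes
latency = 4;                     % assumed processing latency in samples

% Serialize the frame and its active flag (blanking placed after the
% active pixels, which is an assumption).
padded = zeros(rows + vBlank, cols + hBlank);
padded(1:rows, 1:cols) = ref;
activeMap = false(size(padded));
activeMap(1:rows, 1:cols) = true;
stream = reshape(padded.', 1, []);
active = reshape(activeMap.', 1, []);

% Model the streaming branch: data and sync are delayed together.
streamOut = [zeros(1, latency), stream];
activeOut = [false(1, latency), active];

% Stream to frame: keep only the active samples and restore the 2-D shape.
rebuilt = reshape(streamOut(activeOut), cols, rows).';

% A zero error means the streaming result matches the reference.
err = max(abs(rebuilt(:) - ref(:)));
disp(err);
```

If the active flag were not delayed together with the data, the rebuilt frame would pick up blanking samples, which is why each streaming block has to keep its synchronization outputs aligned with its data latency.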

Step 2.5: Preparing for HDL Code Generation

Both the Streaming Deinterlacer and Streaming 2-D FIR Filter subsystems use blocks supported by HDL Coder. To find out which blocks can be used to develop streaming video systems on an FPGA, type hdllib at the MATLAB® command prompt to open the hdlsupported library. This example makes heavy use of the Simple Dual Port RAM block, basic Simulink blocks such as Product, and the MATLAB® Function block.

During development of the streaming algorithm, a reduced 30-by-40 video frame size was employed. This produces a 15-by-40 field of interlaced video. Processing individual pixels incurs a penalty of multiple function calls for each block invoked. The reduced frame size decreases processing time, allowing faster design iterations. The final HDL code generation can be done with full frame sizes, satisfying hardware requirements.

Finally, prior to performing HDL code generation, the model was updated to use only fixed-point types suitable for FPGA implementation. Additionally, the Device vendor and Device type settings within the Hardware Implementation portion of the model configuration were set to 'ASIC/FPGA'. This choice affects the behavior of basic arithmetic used in the model by making it better suited for an FPGA target. For additional settings to consider, see the hdlsetup function documentation.
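A minimal sketch of how these settings might be applied and inspected from the command line is shown below. It assumes that the viponfpga model is on the MATLAB path and that HDL Coder is installed. The use of the ProdHWDeviceType parameter to read back the device setting is an assumption about how the dialog fields map to model parameters, so check it against your release.

```matlab
% Assumption: the example model is available on the MATLAB path.
load_system('viponfpga');

% Apply the configuration defaults that HDL Coder recommends.
hdlsetup('viponfpga');

% Inspect the hardware implementation device setting (assumed parameter
% name for the Device vendor / Device type fields).
deviceType = get_param('viponfpga', 'ProdHWDeviceType');
disp(deviceType);
```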

Step 3: Generate HDL Code and Verify its Behavior Using ModelSim®

Generate the HDL code only for the Streaming Image Sharpening subsystem. The Frame to Stream and Stream to Frame blocks are not intended to generate HDL code. For hardware deployment, these blocks should be replaced by interface blocks that translate the synchronization signals used by the target board to the signals used in the design. The interface blocks can be created by using HDL Coder or by hand-coding them in HDL. To invoke HDL code generation, execute:

makehdl('viponfpga/Streaming Video Sharpening')

at the MATLAB command prompt.

At this point, you can verify that the generated HDL code produces the same results as a simulation of your streaming video processing design. You can accomplish this by using the makehdltb command, which automatically produces a model intended for co-simulation with the ModelSim HDL simulator from Mentor Graphics®. Use the following command to produce the co-simulation model shown below.
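The command itself does not appear in this copy of the example. One plausible form is sketched below. It assumes the same subsystem path as the makehdl call above and uses the HDL Coder GenerateCosimModel property to request a ModelSim co-simulation model. Treat the option choice as an assumption rather than a quotation of the original command.

```matlab
% Assumptions: makehdl has already been run for this subsystem in the
% current session, and HDL Coder, HDL Verifier, and ModelSim are installed.
makehdltb('viponfpga/Streaming Video Sharpening', ...
          'GenerateCosimModel', 'ModelSim');
```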

After you click the Start Simulator button in the model above, wait for ModelSim® to finish loading. Then, run the simulation to see that the dut ref and cosim signals produce a zero err output signal, as shown below.

After verifying the design, you can turn your attention to deploying the generated code on a hardware board. To integrate the design onto a board, the video frame size must be increased to match the specifications of the board and camera. Appropriate hardware interface blocks must be developed and added to the system. Finally, the algorithm and interface components need to be integrated with the rest of the hardware system. The final step, which is beyond the scope of this example, is to place the design onto a hardware board. The streaming sharpening filter was successfully deployed on two boards: Altera® Stratix II EP2S60 DSP Development board with a TVP5146 video input daughter card and Xilinx® Spartan®-3A DSP DaVinci™ Development Kit by AVNET.