Inside DSP on Digital Video: Processors for video

To create a successful digital video product, you need to choose the right
processor. Sounds simple—but of course, it isn’t. A big part of
the challenge is that there are so many types of processors from which to choose:
general-purpose CPUs, FPGAs, DSPs, configurable processors, and fixed-function
chips, among others.

A further complication is that digital video is a fast-moving field, with standards
that are shifting and evolving. As a result, a processor’s ability to
adapt to changes tends to be more important in digital video than in many other
applications—but such flexibility usually comes at the cost of reduced
efficiency.

Choosing a processor inevitably requires some compromises, but it’s crucial
to know how to pick one that won’t compromise the success of your product.

One doesn’t fit all
Digital video technology is used in products ranging from cellular phones to
personal video recorders (PVRs). While many video products share some common
functionality—for example, most use video compression algorithms to compress
and/or decompress video—they also have significant differences. Portable
products place a high priority on energy efficiency; line-powered products typically
don’t. Products designed for the living room usually have much higher
video resolution than hand-held products.
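
To get a feel for how far apart these products can be, the short sketch below compares raw pixel rates for two formats. The frame sizes (QCIF at 176x144 and standard definition at 720x480) are common ones, but pairing them with 15 and 30 frames per second is an assumption made here for illustration, not a figure from any particular product.

```python
# Illustrative comparison of raw pixel rates for two video formats.
# The frame rates below are assumptions chosen for the example.
FORMATS = {
    "QCIF handheld": (176, 144, 15),    # width, height, frames per second
    "SD living room": (720, 480, 30),
}


def pixel_rate(width, height, fps):
    """Return the number of pixels that must be processed each second."""
    return width * height * fps


rates = {name: pixel_rate(*fmt) for name, fmt in FORMATS.items()}
ratio = rates["SD living room"] / rates["QCIF handheld"]

for name, rate in rates.items():
    print(f"{name}: {rate:,} pixels/s")
print(f"ratio: {ratio:.1f}x")
```

Under these assumptions the living-room format moves roughly 27 times as many pixels per second as the hand-held one, which is one reason a processor sized for one product is rarely a good fit for the other.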

In short, one processor won’t fit all. Even one type of processor won’t
fit all. The key to success in processor selection lies in knowing what’s
available, and understanding the strengths and weaknesses of each processor
type.

A few of your favorite things
Because there are so many processor options, it isn’t practical to look
at all—or even a significant subset—of them in detail. Instead,
a hierarchical approach can make the process manageable: Use your most important
criteria to weed out unsuitable candidates early on.

Criteria commonly used for making the first cut include:

Speed. Digital video tasks, like many other types
of signal processing tasks, place heavy computational loads on processors. Carefully
analyze whether a processor has sufficient speed for the target application,
preferably using video-oriented benchmarks such as the BDTI Video Benchmarks™.

Price. Chip price is important, but cost per channel
or overall system cost may be more important.

Energy efficiency. In most cases, it’s more
meaningful to evaluate energy efficiency (the energy consumed to complete a given
task) than power consumption alone, since energy use governs battery life.

Flexibility. Some classes of processors are more flexible
than others and can accommodate late changes in product features or allow field
upgrades, such as adding support for a new compression algorithm. In general,
however, the more flexible the processor, the less efficient it is in cost and
energy use.

Quality of development tools. Whether the processor
has tools that are designed to support development of signal processing applications
(or better yet, video applications) can have a significant effect on development
time, and hence on time to market.

Compatibility with earlier processors. This is typically
important if you expect to reuse software from an earlier product.

Vendor roadmap. Does the vendor’s product roadmap
line up well with your plans for follow-on products? Will the processor continue
to be supported—or upgraded—over the life of your product?

Availability as chip or licensable core. Some processors
are sold as packaged off-the-shelf chips, and some as licensable intellectual
property—often called licensable cores—for use in building custom
chips. Most of the processor categories discussed here include both packaged
chips and licensable cores.

As we will show, each class of processor makes different trade-offs in these
areas.
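
The first cut described above can be sketched as a simple filter. Everything in this example is hypothetical: the candidate names, the numbers, the thresholds, and the helper names (`Candidate`, `cost_per_channel`, `energy_per_frame_j`, `first_cut`) are invented for illustration. It also assumes that energy per frame can be approximated as active power divided by total frame throughput, which ignores idle power and memory traffic.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    benchmark_fps: float    # frames/s per channel on a video benchmark (made up)
    chip_price: float       # dollars per chip (made up)
    channels: int           # video channels one chip can handle
    active_power_w: float   # watts while processing (made up)
    programmable: bool


def cost_per_channel(c: Candidate) -> float:
    """Chip price spread across the channels a single chip handles."""
    return c.chip_price / c.channels


def energy_per_frame_j(c: Candidate) -> float:
    """Approximate joules per frame: watts / (frames per second, all channels)."""
    return c.active_power_w / (c.benchmark_fps * c.channels)


def first_cut(candidates, min_fps, max_cost_per_channel,
              max_energy_per_frame_j, need_programmable):
    """Keep only the candidates that pass every first-cut criterion."""
    survivors = []
    for c in candidates:
        if c.benchmark_fps < min_fps:
            continue
        if cost_per_channel(c) > max_cost_per_channel:
            continue
        if energy_per_frame_j(c) > max_energy_per_frame_j:
            continue
        if need_programmable and not c.programmable:
            continue
        survivors.append(c)
    return survivors


# Hypothetical candidates; none of these describe a real chip.
CANDIDATES = [
    Candidate("chip A", 60, 8.0, 1, 0.2, False),
    Candidate("chip B", 45, 30.0, 2, 1.5, True),
    Candidate("chip C", 35, 25.0, 1, 1.0, True),
    Candidate("chip D", 12, 15.0, 1, 0.6, True),
]

survivors = first_cut(CANDIDATES, min_fps=30, max_cost_per_channel=20.0,
                      max_energy_per_frame_j=0.05, need_programmable=True)
print([c.name for c in survivors])
```

With these made-up numbers only chip B survives: chip A is fast and cheap but not programmable, chip C costs too much per channel, and chip D is too slow. Note that chip B has the highest chip price of the four, yet it passes on cost because it handles two channels. Softer criteria such as tool quality, compatibility, and vendor roadmap are better weighed by hand on the short list that remains.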

The lineup
In this article, we focus on six classes of processors commonly used for digital
video: fixed-function engines, application-specific standard products (ASSPs),
media processors, DSPs, embedded RISC processors, and FPGAs. We cover them in
order from the most specialized to the most flexible. We discuss the strengths
and weaknesses of each category, and examine a specific product within that
category.

Fixed-function engines use hardwired processing structures
for maximum efficiency; they do not use an instruction stream and are not programmable.
Hardwired logic sacrifices flexibility in exchange for exceptional processing
speed, energy efficiency, and—often—cost effectiveness.

Using fixed-function engines can simplify system design and testing. Because
fixed-function engines are not programmable, product developers don’t
have to learn new programming tools or deal with integrating multiple software
modules. And they don’t have to figure out whether multiple tasks executing
on the processor may interact in undesirable ways, interfering with the real-time
behavior of the system.

Fixed-function engines are typically provided as licensable intellectual property
(IP) for integration into custom chips. In this form, a fixed-function engine
is best suited to high-volume applications such as cellular phones. Fixed-function
engines are sometimes provided as chips. A fixed-function video chip—such
as an MPEG-2 decoder chip—can be a cost-effective way to add functionality
to an existing product, particularly when the product has a host processor that
can handle the required control and user interface functions.

Figure 1 shows Hantro’s 5150 MPEG-4 video decoder, an example of a fixed-function
video engine sold in IP form. As illustrated in the figure, the engine is intended
to be used as a coprocessor, attached to a general-purpose processor that handles
some of the less demanding sub-tasks required for MPEG-4 decoding.

The key drawback of fixed-function hardware is its lack of flexibility. Since
it is not programmable, product developers cannot easily modify fixed-function
hardware to support new standards or different features. This is a critical
concern because many video applications are still relatively immature, with
rapidly changing standards and features.

Fixed-function engines are often used as components of application-specific
standard products, which we discuss next.

Application-specific standard products (ASSPs) are highly
integrated application-specific chips. In contrast to application-specific integrated
circuits (ASICs), which are designed by a system house for use in its own products,
ASSPs are designed by chip companies and offered as off-the-shelf chips to multiple
system developers. Since developing a complex chip is expensive and time-consuming,
ASSPs are typically available only for well-established applications where high
volumes already exist, or are anticipated.

The Zoran Vaddis 5R, shown in Figure 2, is a highly specialized chip targeting
audio and video processing in a DVD recorder. The key algorithms required are
well defined: most notably, MPEG-2 video compression and decompression.

Though the Vaddis 5R includes two RISC processors, it uses fixed-function hardware
accelerators for the most compute-intensive tasks, like MPEG-2 video decoding
and color space conversion. For that reason, the Vaddis 5R (and other ASSPs
like it) shares the strengths and weaknesses associated with fixed-function
engines: good performance and energy efficiency, but limited flexibility.

This limited flexibility means system designers have limited opportunity to
differentiate a product from other products that use the same ASSP. It also
means that system designers are highly dependent on the chip vendor’s
roadmap, since a new chip will be required to support significantly different
functionality—as might be needed in a follow-on product.

ASSPs that rely primarily on programmable processors for computationally intensive
tasks sacrifice energy and cost efficiency to gain flexibility. ASSPs of this
type are generally bundled with key software components such as video decoders
and device drivers, freeing the system developer from much of the low-level
software development work. Nevertheless, software development and integration
can require significant effort in comparison to that required when using ASSPs
based on fixed-function hardware.

Media processors lie between ASSPs and digital signal processors
(DSPs) on the specialization-vs.-flexibility continuum. Media processors are
optimized for tasks associated with audio and video processing, rather than
for a broad range of signal processing tasks, as DSPs are. Media processors
are typically heterogeneous multiprocessors, incorporating a main processing
engine similar to a DSP, plus two or three specialized coprocessors, and audio- and video-specific peripherals.

Figure 3 illustrates an example media processor, the Philips PNX1500. Typical
of media processors, the PNX1500 is based on a powerful, highly parallel processor
core that is efficient at video processing tasks. Also typical of media processors,
the PNX1500 includes a few fixed-function hardware accelerators and specialized
peripheral devices. The main processor core, which is programmable by the system
designer, handles complex video tasks like compression.

Like the Zoran Vaddis 5R, the PNX1500 is well suited to MPEG-2 decoding. But
unlike the Zoran ASSP, the PNX1500 is flexible enough to be used with other
video compression standards such as H.264. This flexibility comes at a price,
of course: a software-based video decoder is generally less energy- and cost-efficient
than fixed-function hardware.

The heterogeneous multiprocessor nature of media processors makes software
development more difficult than on other programmable processors. For example,
to implement a given video task, it is typically necessary to program two or
more processing elements and coordinate their interactions. To help address
this disadvantage, media processor vendors often provide optimized software
component libraries.

Media processor vendors typically stress the use of C or C++ for software development
and don’t recommend—or support—assembly language. The focus
on high-level language software development is intended to insulate the programmer
from many of the complexities of the processor architecture. The downside is
that the programmer must rely on the compiler to generate efficient code—and
this isn’t always realistic. Developers may need to invest considerable
effort hand-tuning their high-level language code for best performance.

Digital signal processors, or DSPs, are designed for a range
of signal processing applications. DSPs typically employ less video-oriented
specialization and parallelism than media processors. To compensate for their
lower parallelism, DSPs typically must operate at higher instruction rates than
media processors for a given application. Higher instruction rates can complicate
system design and increase energy consumption. On the other hand, DSPs require
lower clock speeds than embedded RISC processors (discussed below) to handle
video tasks. Key advantages for DSPs are their flexibility and strong application
development tools. Figure 4 shows an example video-oriented DSP, the Texas Instruments
TMS320DM642.
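
The relationship between parallelism and clock rate can be illustrated with a back-of-the-envelope calculation. The workload figure and the sustained operations-per-cycle values below are assumptions invented for this sketch; they are not measurements of any processor named in this article, and real sustained throughput depends heavily on the algorithm, the memory system, and code quality.

```python
# Hypothetical workload: operations per second a video task requires.
WORKLOAD_OPS_PER_SECOND = 2e9

# Assumed sustained operations per clock cycle for each processor class.
# These values are illustrative only.
SUSTAINED_OPS_PER_CYCLE = {
    "media processor": 8,
    "DSP": 4,
    "embedded RISC": 1,
}


def required_clock_mhz(ops_per_second, ops_per_cycle):
    """Clock rate (in MHz) needed to sustain the workload."""
    return ops_per_second / ops_per_cycle / 1e6


clocks = {
    name: required_clock_mhz(WORKLOAD_OPS_PER_SECOND, opc)
    for name, opc in SUSTAINED_OPS_PER_CYCLE.items()
}

for name, mhz in clocks.items():
    print(f"{name}: {mhz:.0f} MHz")
```

Under these assumptions the same task needs 250 MHz on the media processor, 500 MHz on the DSP, and 2,000 MHz on the embedded RISC processor. The absolute numbers are invented, but the ordering matches the trade-off described above: the less work a processor does per cycle, the faster it must be clocked.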

Historically, DSPs have been poor compiler targets, and DSP compilers have
been inefficient. But recent years have brought a trend toward developing more
compiler-friendly DSPs. Also, some DSP vendors and independent tool providers
have invested heavily in developing compilers. As a result, DSP compiler quality
has risen dramatically. Yet obtaining maximum performance often requires hand-optimized
assembly code. The good news is that DSP vendors often provide good assembly
language programming tools. But the architectures themselves are sometimes complex,
making assembly programming challenging. Because video applications are an important
target of DSPs, DSP development tools often have features that aid developers
of video applications. For example, data visualization capabilities can be valuable
when debugging video processing software.

An important difference between typical DSPs and typical embedded RISC processors
is support for operating systems. DSPs typically support a small number of real-time
OSes, but do not support “full-featured” OSes like Windows CE. Consequently,
many system designs use a DSP to handle video processing and an embedded RISC
processor to run the OS and handle other non-video tasks. Recently, however,
some DSP vendors have made sophisticated operating systems such as Linux available
on their processors.

Historically, DSP vendors have not put a priority on maintaining compatibility
from generation to generation. This makes it harder to re-use application software
when moving from one processor generation to the next. This is changing, however,
with several new DSPs offering some level of compatibility with their predecessors.
For example, the TMS320C64x is binary compatible with its predecessor, the TMS320C62x.

Embedded RISC processors are popular for a wide range of embedded
applications. Historically, they have been general-purpose machines with few
or no application-specific features. RISC processors are often found in the
host processor role in video products, typically alongside a specialized video
processor.

One example is Intel’s PXA27x. It is based on Intel’s XScale core, which implements
the popular ARMv5TE instruction set. The PXA27x adds DSP enhancements to the
ARM instruction set via its Wireless MMX extensions. Its maximum clock speed
of 624 MHz is relatively high for an embedded RISC processor. In combination
with its DSP enhancements, this clock speed makes the PXA27x a capable performer
in a number of video processing tasks.

Although often less efficient at video tasks than other types of processors,
embedded RISC processors enjoy a number of advantages in the realm of application
software development. For example, embedded RISC processors are often backed
by a sophisticated software development infrastructure and legions of programmers.
And embedded RISC processors are generally easier to program than the other
processor classes discussed here. On the downside, the tools and software development
infrastructure for embedded RISC processors typically offer less support for
video processing software development than do the tools and infrastructure provided
for many other types of processors discussed here.

Roadmaps for embedded RISC architectures are generally clearer than the roadmaps
of the other processor classes discussed here, simplifying planning for system
developers who are contemplating multiple generations of products. And backwards
compatibility is almost always maintained. Another advantage of many RISC CPU
architectures is multivendor support—that is, multiple vendors offer chips
based on the same core architecture. Unfortunately, advanced features, like
the Wireless MMX extensions, are often limited to one vendor.

Field-programmable gate arrays (FPGAs) might not be the first
thing that comes to mind when thinking about a video processor, but their flexibility
and high parallelism (and thus, potentially, high speed) can be a great match
for tough video processing applications.

FPGAs like Altera’s Stratix-II can be configured to match the requirements
of an application and can provide massive computational power and memory bandwidth.
Stratix-II is a high-end FPGA family that includes specialized fixed-function
blocks, such as multipliers, PLLs, and memory blocks—all of which can
boost its performance in video processing algorithms. Figure 6 shows an MPEG-2
decoder implementation on the Stratix-II EP2S15.

FPGAs are the most flexible processor type, and FPGA-based designs can be readily
upgraded to implement new features or adapt to emerging standards. Unfortunately,
this flexibility comes at the price of reduced energy efficiency and cost efficiency.
For example, FPGAs are typically far less energy efficient than ASICs or ASSPs,
and FPGAs can cost hundreds or even thousands of dollars apiece. FPGA vendors,
however, have recently introduced more cost-effective devices, making them attractive
for a broader range of applications.

Another downside to FPGAs is that the application development effort is much
higher than that associated with programmable processor software development,
and fewer engineers are skilled in FPGA design than in software development.

While an FPGA can be a good match for video algorithms, a programmable processor
is usually still needed to run things like an OS. For this reason, FPGAs are
typically used in conjunction with one or more programmable processors. However,
with the advent of “soft” processor cores designed for implementation
within an FPGA, like Altera’s Nios II and Xilinx’s MicroBlaze (both
32-bit RISC processor cores), instruction set processors can now be incorporated
within an FPGA.

Alternatives
In addition to the six categories of processors discussed above, there are
at least four other processor types that may be suitable for some digital video
applications. These include the following:

Embedded PC CPUs are general-purpose processors, and
thus have few (if any) features that are specifically designed for video processing.
Vendors often recycle older, PC-oriented architectures and add more on-chip
integration to create variants specifically designed for embedded applications.
These embedded PC CPUs are generally unsuitable for heavy-duty video processing,
and so they are frequently coupled with a specialized “video” processor
that handles the core video processing tasks.

Configurable processors are licensable processor cores
that can be customized by the core licensee for use in custom chips. The customization
process takes place before the chip is fabricated; once fabricated, the processor
hardware is fixed.

Reconfigurable processors are similar to configurable
processors, except that they can be reconfigured for different tasks after the
chip is fabricated, thus allowing different configurations to be selected at
run-time.

Application-specific instruction processors (ASIPs)
are processors that are custom designed for the application at hand. ASIPs are
not sold as packaged processors or as licensable processor cores; instead, vendors
offer tools that enable chip designers to create their own ASIPs.

Because digital video is such a hot market, expect to see even more processor
types introduced in the coming years. These will probably combine elements of
the types of processors we’ve discussed here, and their tradeoffs will
reflect those of the constituent approaches.

Hedging bets
Clearly, no single processor or processor type is best for all digital video
applications. Classes of processors that offer some flexibility are becoming
popular, but fixed-function hardware has its place, too. In part, it’s
a question of how much you want to hedge your bets—and you must consider
all the solutions.

Resources
Inside DSP readers may receive a 20% rebate of their fee on any seminar
purchased by March 30, 2005. Just enter code DSP0503 under “Promotional
Codes” on the purchase form. See www.BDTI.com/video.html
for details.