
Abstract:

Methods and systems for processing video data are provided herein.
Aspects of the method may comprise receiving on a chip, a plurality of
video frames and storing a portion of the received video frames in a
memory on the chip. A first portion of the stored video frames may be
decoded on the chip and a second portion of the stored video frames may
be encoded on the chip during the decoding. A third portion of the stored
video frames may be converted from an input video format to a YUV video
format. A reference frame corresponding to the converted third portion
may be identified from the stored video frames. After conversion, the
converted third portion of the received video frames may be utilized as
the reference frame for estimating/encoding, or compensating/decoding
temporal motion of the subsequently received video frames.

Claims:

1. A method for processing video data, the method comprising: receiving
on a chip, a plurality of video frames; storing at least a portion of
said received plurality of video frames in a memory on said chip;
decoding on said chip, a first portion of said stored at least a portion
of said received plurality of video frames; and encoding on said chip a
second portion of said stored at least a portion of said received
plurality of video frames during said decoding.

[0006] The above stated patent applications are hereby incorporated herein
by reference in their entirety.

BACKGROUND OF THE INVENTION

[0007] Video compression and decompression techniques, as well as
different display standards, are utilized by conventional video
processing systems, such as portable video communication devices, during
recording, transmission, storage, and playback of video information. For
example, quarter common intermediate format (QCIF) may be utilized for
playback and recording of video information, such as videoconferencing,
utilizing portable video communication devices, for example, portable
video telephone devices. The QCIF format is an option provided by the
ITU-T's H.261 standard for videoconferencing codecs. It produces a color
image of 144 non-interlaced luminance lines, each containing 176 pixels.
The frame rate for videoconferencing may be up to 15 frames per second
(fps). QCIF provides approximately one quarter the resolution of the
common intermediate format (CIF), which has a resolution of 288 luminance
(Y) lines, each containing 352 pixels.
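The quarter-resolution relationship stated above can be checked with a quick illustrative calculation (not part of the application); the dimensions are the QCIF and CIF luminance sizes given in the text:

```python
# Luminance dimensions from the text: QCIF is 176x144, CIF is 352x288.
QCIF = (176, 144)  # pixels per line, luminance lines
CIF = (352, 288)

def luma_samples(size):
    """Total luminance samples per frame for a given (width, height)."""
    w, h = size
    return w * h

# QCIF carries exactly one quarter of the CIF luminance samples.
print(luma_samples(QCIF) / luma_samples(CIF))  # 0.25
```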

[0008] Conventional video processing systems for portable video
communication devices, such as video processing systems implementing the
QCIF format, utilize video encoding and decoding techniques to compress
video information during transmission, or for storage, and to decompress
elementary video data prior to communicating the video data to a display.
The video compression and decompression (CODEC) techniques, such as
variable length coding (VLC), discrete cosine transformation (DCT),
quantization, and/or motion estimation, in conventional video processing
systems for portable video communication devices utilize a significant
part of the computing and memory resources of a general purpose central
processing unit (CPU) of a microprocessor, or other embedded processor,
for computation-intensive tasks and data transfers during encoding and/or
decoding of video data. The general purpose CPU, however, handles other
real-time processing tasks, such as communication with other modules
within a video processing network during a video teleconference utilizing
the portable video communication devices, for example. The increased
amount of computation-intensive video processing tasks and data transfer
tasks executed by the CPU and/or other processor, in a conventional QCIF
video processing system results in a significant decrease in the video
quality that the CPU or processor may provide within the video processing
network.

[0009] Further limitations and disadvantages of conventional and
traditional approaches will become apparent to one of skill in the art,
through comparison of such systems with some aspects of the present
invention as set forth in the remainder of the present application with
reference to the drawings.

BRIEF SUMMARY OF THE INVENTION

[0010] A system and/or method for processing video data, substantially as
shown in and/or described in connection with at least one of the figures,
as set forth more completely in the claims.

[0011] Various advantages, aspects and novel features of the present
invention, as well as details of an illustrated embodiment thereof, will
be more fully understood from the following description and drawings.

BRIEF DESCRIPTION OF SEVERAL VIEWS OF THE DRAWINGS

[0012] FIG. 1A is a block diagram of an exemplary video encoding system
that may be utilized in connection with an aspect of the invention.

[0013] FIG. 1B is a block diagram of an exemplary video decoding system
that may be utilized in connection with an aspect of the invention.

[0014] FIG. 2 is a block diagram of the exemplary microprocessor
architecture for video compression and decompression utilizing on-chip
accelerators, in accordance with an embodiment of the invention.

[0015] FIG. 3A illustrates architecture for an exemplary on-chip memory
module (OCM) that may be utilized in connection with the microprocessor
of FIG. 2, for example, in accordance with an embodiment of the
invention.

[0016] FIG. 3B illustrates an exemplary rotating buffer scheme within an
on-chip memory (OCM) module that may be utilized in connection with the
microprocessor of FIG. 2, for example, in accordance with an embodiment
of the invention.

[0017] FIG. 3C illustrates an exemplary timing diagram for encoding and
decoding within the microprocessor of FIG. 2, for example, in accordance
with an embodiment of the invention.

[0018] FIG. 3D illustrates an exemplary timing diagram for QCIF dual video
display in connection with the microprocessor of FIG. 2, for example, in
accordance with an embodiment of the invention.

[0019] FIG. 4 is an exemplary timing diagram illustrating macroblock
processing during video encoding via the microprocessor of FIG. 2, for
example, in accordance with an embodiment of the invention.

[0020] FIG. 5 is an exemplary timing diagram illustrating video decoding
via the microprocessor of FIG. 2, for example, in accordance with an
embodiment of the invention.

[0021] FIG. 6 is a flow diagram of an exemplary method for compression of
video information, in accordance with an embodiment of the invention.

[0022] FIG. 7 is a flow diagram of an exemplary method for decompression
of video information, in accordance with an embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

[0023] Certain aspects of the invention may be found in a method and
system for on-chip processing of video data. In one aspect of the
invention, computation-intensive video processing and data transfer tasks
for encoding/decoding video information in a portable video communication
device, such as a QCIF-enabled portable video communication device, may
be significantly improved by utilizing one or more hardware accelerators
within the microprocessor of the portable device. The hardware
accelerators may be adapted to offload most of the computation-intensive
encoding and/or decoding tasks from the CPU, which frees up the processor
to handle other tasks. This results in increased CPU processing speeds
and/or data transfer speeds within the QCIF video processing network.

[0024] In addition, the hardware accelerators may utilize one or more
local memory modules for storing intermediate processing results during
encoding and/or decoding, thus minimizing the burden on the system bus
within the microprocessor and any on-chip memory, such as a level one
tightly coupled memory (TCM) and/or level two on-chip memory (OCM) within
the microprocessor. The OCM, for example, may be utilized to store
YUV-formatted macroblock information prior to encoding and/or
RGB-formatted macroblock information after decoding and prior to
displaying the decoded video information. The OCM may be utilized to
store a plurality of reference frames that may be used for encoding
and/or decoding, as well as computational results and/or video data prior
to encoding or after decoding and prior to output for displaying.

[0025] FIG. 1A is a block diagram of an exemplary video encoding system
that may be utilized in connection with an aspect of the invention.
Referring to FIG. 1A, the video encoding system 100 may comprise a
pre-processor 102, a motion separation module 104, a discrete cosine
transformer and quantizer (DCTQ) module 106, a variable length code (VLC)
encoder 108, a packer 110, a frame buffer 112, a motion estimator 114, a
motion compensator 116, and an inverse quantizer and inverse discrete
cosine transformer (IQIDCT) module 118.

[0026] The pre-processor 102 comprises suitable circuitry, logic, and/or
code and may be adapted to acquire video data from the camera 130 and
convert the camera video data to a YUV format. The motion estimator 114
comprises suitable circuitry, logic, and/or code and may be adapted to
acquire one or more reference macroblocks and a current macroblock and
determine a most optimal reference macroblock from the acquired reference
macroblocks for use during motion separation and/or motion compensation,
for example. The motion separation module 104 comprises suitable
circuitry, logic, and/or code and may be adapted to acquire a current
macroblock and its motion reference and determine one or more estimation
errors based on the difference between the acquired current macroblock
and its motion reference.
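The reference-selection step performed by the motion estimator 114 can be sketched as a block search minimizing the sum of absolute differences (SAD), a common cost metric; the function names, the 4x4 block size (16x16 in practice), and the search range are illustrative assumptions, not the patent's implementation:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def best_reference(current, reference_frame, x, y, search=2, mb=4):
    """Return the (dx, dy) motion vector minimizing SAD within +/-search
    pixels of (x, y), plus its cost. `mb` is the macroblock size (kept
    small here to make the example short)."""
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates that fall outside the reference frame.
            if (rx < 0 or ry < 0 or ry + mb > len(reference_frame)
                    or rx + mb > len(reference_frame[0])):
                continue
            candidate = [row[rx:rx + mb] for row in reference_frame[ry:ry + mb]]
            cost = sad(current, candidate)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost
```

For example, if the current block is the reference shifted right by one pixel, the search returns the motion vector (1, 0) with zero cost.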

[0027] The DCTQ module 106 and the IQIDCT module 118 comprise suitable
circuitry, logic, and/or code and may be adapted to transform the
estimation errors to frequency coefficients and the frequency
coefficients back to estimation errors. For example, the DCTQ module 106
may be adapted to acquire one or more estimation errors and apply a
discrete cosine transform and subsequently quantize the acquired
estimation errors to obtain frequency coefficients. Similarly, the IQIDCT
module 118 may be adapted to acquire one or more frequency coefficients
and apply an inverse quantization to the acquired frequency coefficients
and subsequently an inverse discrete cosine transform to obtain
estimation errors.
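The quantization / inverse-quantization pair described above can be illustrated in a few lines (the DCT itself is omitted for brevity): quantization divides each frequency coefficient by a step size and rounds, inverse quantization multiplies back, so the roundtrip is deliberately lossy. The step size Q is an illustrative assumption:

```python
Q = 8  # quantizer step size (chosen for the example)

def quantize(coeffs, q=Q):
    """Map frequency coefficients to quantized levels."""
    return [round(c / q) for c in coeffs]

def dequantize(levels, q=Q):
    """Map quantized levels back to approximate coefficients."""
    return [lev * q for lev in levels]

coeffs = [100, -37, 5, 3, -2, 0, 0, 1]
levels = quantize(coeffs)
print(levels)              # [12, -5, 1, 0, 0, 0, 0, 0]
print(dequantize(levels))  # [96, -40, 8, 0, 0, 0, 0, 0] -- lossy roundtrip
```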

[0028] The motion compensator 116 comprises suitable circuitry, logic,
and/or code and may be adapted to acquire a motion reference and an
estimation error and reconstruct a current macroblock based on the
acquired motion reference and estimation error. The VLC encoder 108 and
the packer 110 comprise suitable circuitry, logic, and/or code and may be
adapted to generate an encoded elementary video stream based on motion
estimation information and/or quantized frequency coefficients. For
example, motion estimation from one or more reference macroblocks may be
encoded together with corresponding frequency coefficients to generate
the encoded elementary bitstream.

[0029] In operation, the pre-processor 102 may acquire video data from the
camera 130, such as QCIF video data, and may convert the video data to
YUV-formatted video data suitable for encoding. A current macroblock 120
may then be communicated to both the motion separation module 104 and the
motion estimator 114. The motion estimator 114 may acquire one or more
reference macroblocks 122 from the frame buffer 112 and may determine a
motion reference 126 corresponding to the current macroblock 120. The
motion reference 126 may then be communicated to both the motion
separation module 104 and the motion compensator 116.

[0030] The motion separation module 104, having acquired the current
macroblock 120 and its motion reference 126, may generate an estimation
error based on a difference between the motion reference 126 and the
current macroblock 120. The generated estimation error may be
communicated to the DCTQ module 106 where the estimation error may be
transformed into one or more frequency coefficients by applying a
discrete cosine transformation and a quantization process. The generated
frequency coefficients may be communicated to the VLC encoder 108 and the
packer 110 for encoding into the bitstream 132. The bitstream 132 may
also comprise one or more motion reference pointers corresponding to the
quantized frequency coefficients of the current macroblock.

[0031] The frequency coefficients generated by the DCTQ module 106 may be
communicated to the inverse discrete cosine transformer and inverse
quantizer module 118. The IQIDCT module 118 may transform the frequency
coefficients back to one or more estimation errors 128. The estimation
errors 128, together with its motion reference 126, may be utilized by
the motion compensator 116 to generate a reconstructed current macroblock
124. The reconstructed macroblock 124 may be stored in the frame buffer
112 and may be utilized as a reference for macroblocks in a subsequent
frame generated by the pre-processor 102.
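The per-macroblock data flow of FIG. 1A (motion separation, DCTQ, IQIDCT, motion compensation, frame-buffer update) can be sketched as follows. The transform is reduced to quantization alone so that only the structure of the loop is shown, and all names are illustrative assumptions:

```python
Q = 4  # illustrative quantizer step

def encode_macroblock(current, reference, frame_buffer):
    """One pass through the FIG. 1A loop for a (flattened) macroblock."""
    # Motion separation: estimation error = current - motion reference.
    error = [c - r for c, r in zip(current, reference)]
    # DCTQ stage (transform omitted; quantization shown).
    levels = [round(e / Q) for e in error]
    # IQIDCT stage: inverse-quantize back to estimation errors.
    errors_back = [lev * Q for lev in levels]
    # Motion compensation: reconstruct the macroblock from reference + error.
    reconstructed = [r + e for r, e in zip(reference, errors_back)]
    frame_buffer.append(reconstructed)  # reference for a subsequent frame
    return levels, reconstructed
```

Note that the reconstruction uses the dequantized error, not the original one, so the encoder's frame buffer matches what a decoder would reconstruct.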

[0032] FIG. 1B is a block diagram of an exemplary video decoding system
that may be utilized in connection with an aspect of the invention.
Referring to FIG. 1B, the video decoding system 150 may comprise an
unpacker 152, a VLC decoder 154, a reference generating module 164, a
frame buffer 160, an IQIDCT module 156, a motion compensator 158, and a
post-processor 162.

[0033] The unpacker 152 and VLC decoder 154 comprise suitable circuitry,
logic, and/or code and may be adapted to decode an elementary video
bitstream and generate one or more quantized frequency coefficients
and/or corresponding motion reference pointers. The IQIDCT module 156
comprises suitable circuitry, logic, and/or code and may be adapted to
transform one or more quantized frequency coefficients to one or more
estimation errors. The motion compensator 158 comprises suitable
circuitry, logic, and/or code and may be adapted to acquire a motion
reference and an estimation error and reconstruct a current macroblock
based on the acquired motion reference and estimation error.

[0034] In operation, the unpacker 152 and VLC decoder 154 may decode a
QCIF elementary video bitstream 174 and generate one or more quantized
frequency coefficients and/or corresponding motion reference pointers.
The generated quantized frequency coefficients may then be communicated
to the reference generating module 164 and the IQIDCT module 156. The
reference generating module 164 may acquire one or more reference
macroblocks 166 from the frame buffer 160 and may generate a motion
reference 172 corresponding to the quantized frequency coefficients. The
motion reference 172 may be communicated to the motion compensator 158
for macroblock reconstruction.

[0035] The IQIDCT module 156 may transform the quantized frequency
coefficients to one or more estimation errors 178. The estimation errors
178 may be communicated to the motion compensator 158. The motion
compensator 158 may then reconstruct a current macroblock 168 utilizing
the estimation errors 178 and its motion reference 172. The reconstructed
current macroblock 168 may be stored in the frame buffer 160 for
subsequent post-processing. For example, a reconstructed macroblock 170
may be communicated from the frame buffer 160 to the post-processor 162.
The post-processor 162 may convert the YUV-formatted data 170 in frame
buffer 160 to an RGB format and communicate the RGB-formatted video data
to the display 176 for video displaying in a QCIF video format.
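The YUV-to-RGB conversion performed by the post-processor 162 can be sketched per sample. The patent does not specify the conversion matrix; the coefficients below are one common BT.601-style choice, used here purely for illustration:

```python
def yuv_to_rgb(y, u, v):
    """Convert one 8-bit YUV sample to RGB (BT.601-style coefficients)."""
    def clamp(x):
        return max(0, min(255, int(round(x))))
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp(r), clamp(g), clamp(b)

# Chroma at the neutral value 128 leaves a grey sample unchanged.
print(yuv_to_rgb(128, 128, 128))  # (128, 128, 128)
```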

[0036] Referring to FIGS. 1A and 1B, in one aspect of the invention, one
or more on-chip accelerators may be utilized to offload
computation-intensive tasks from the CPU during encoding and/or decoding
of video data. For example, one accelerator may be utilized to handle
motion related computations, such as motion estimation, motion
separation, and/or motion compensation. A second accelerator may be
utilized to handle computation-intensive processing associated with
discrete cosine transformation, quantization, inverse discrete cosine
transformation, and inverse quantization. Another on-chip accelerator may
be utilized to handle pre-processing of data, such as RGB-to-YUV format
conversion, and post-processing of video data, such as YUV-to-RGB format
conversion. Furthermore, one or more on-chip memory (OCM) modules may be
utilized to improve data processing speed of the CPU and the
microprocessor during video data encoding and/or decoding. For example,
an OCM module may be utilized during processing of QCIF-formatted video data and may
buffer one or more video frames that may be utilized during encoding
and/or decoding. In addition, the OCM module may also comprise buffers
for intermediate computational results during encoding and/or decoding
such as discrete cosine transformation (DCT) coefficients and/or
estimation error information.

[0037] FIG. 2 is a block diagram of the exemplary microprocessor
architecture for video compression and decompression utilizing on-chip
accelerators, in accordance with an embodiment of the invention.
Referring to FIG. 2, the exemplary microprocessor architecture 200 may
comprise a central processing unit (CPU) 202, a variable length code
coprocessor (VLCOP) 206, a video pre-processing and post-processing (VPP)
accelerator 208, a transformation and quantization (TQ) accelerator 210,
a motion engine (ME) accelerator 212, an on-chip memory (OCM) 214, an
external memory interface (EMI) 216, a display interface (DSPI) 218, and
a camera interface (CAMI) 242. The EMI 216, the DSPI 218, and the CAMI
220 may be utilized within the microprocessor architecture 200 to access
the external memory 238, the display 240, and the camera 242,
respectively.

[0038] The CPU 202 may comprise an instruction port 226, a data port 228,
a peripheral device port 222, a coprocessor port 224, tightly coupled
memory (TCM) 204, and a direct memory access (DMA) module 230. The
instruction port 226 and the data port 228 may be utilized by the CPU 202
to acquire the program and to communicate data via bus connections to the
system bus 244 during encoding and/or decoding of video information.

[0039] The TCM 204 may be utilized within the microprocessor architecture
200 for storage and access to large amounts of data without compromising
operating efficiency of the CPU 202. The DMA module 230 may be utilized
in connection with the TCM 204 to ensure quick access and data transfer
of information from the TCM 204 during operating cycles when the CPU 202
is not accessing the TCM 204. In an exemplary aspect of the invention,
the TCM 204 may comprise a level one (L1) memory for the CPU 202.

[0040] The CPU 202 may utilize the coprocessor port 224 to communicate
with the VLCOP 206. The VLCOP 206 may be adapted to assist the CPU 202 by
offloading certain variable length code (VLC) encoding and/or decoding
tasks. For example, the VLCOP 206 may be adapted to utilize suitable
coding techniques, such as code table look-up and/or packing/unpacking of
an elementary bitstream, to coordinate encoding/decoding tasks with the
CPU 202 on a cycle-by-cycle basis.
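The code-table look-up mentioned above can be sketched as a prefix-code decode: bits are accumulated until the running prefix matches a table entry. The table below is a made-up prefix code for illustration, not an actual H.261 or MPEG VLC table:

```python
# Illustrative prefix-code table: codeword string -> decoded symbol.
VLC_TABLE = {"1": 0, "01": 1, "001": 2, "0001": 3}

def vlc_decode(bits):
    """Decode a bitstring into symbols via code-table look-up."""
    symbols, prefix = [], ""
    for b in bits:
        prefix += b
        if prefix in VLC_TABLE:
            symbols.append(VLC_TABLE[prefix])
            prefix = ""
    if prefix:
        raise ValueError("truncated codeword at end of bitstream")
    return symbols

print(vlc_decode("1010011"))  # [0, 1, 2, 0]
```

Because every codeword is a unique prefix, the decoder never needs to back up, which is what makes table look-up practical in a coprocessor like the VLCOP 206.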

[0041] The OCM 214 may be utilized within the microprocessor architecture
200 during pre-processing and post-processing of video data during
compression and/or decompression. For example, the OCM 214 may be adapted
to store camera data communicated from the camera 242 via the CAMI 220
prior to conversion to YUV-formatted video data. The OCM 214 may also be
adapted to store YUV-formatted data prior to conversion to RGB-formatted
video data and subsequent communication of such data to the video display
240 via the DSPI 218 for displaying in a QCIF format, for example.

[0042] In an exemplary aspect of the invention, the OCM 214 may comprise
one or more frame buffers that may be adapted to store one or more
reference frames utilized during encoding and/or decoding. For example,
the OCM 214 may comprise three buffers adapted to store luminance (Y) and
chrominance (UV) information for three frames. The three buffers may be
adapted to be used within the exemplary microprocessor architecture 200
in a rotating fashion. In addition, the OCM 214 may comprise buffers
adapted to store computational results and/or video data prior to
encoding or after decoding and prior to output for displaying, such as
DCT coefficients and/or estimation error information. The OCM 214 may be
accessed by the CPU 202, the VPP accelerator 208, the TQ accelerator 210,
the ME accelerator 212, the EMI 216, the DSPI 218, and the CAMI 220 via
the system bus 244. In an exemplary aspect of the invention, the OCM 214
may be utilized as a level two (L2) memory for the CPU 202.

[0043] The CPU 202 may utilize the peripheral device port 222 to
communicate with the on-chip accelerators VPP 208, TQ 210, and ME 212 via
a bus connection. The VPP accelerator 208 may comprise suitable circuitry
and/or logic and may be adapted to provide video data pre-processing and
post-processing during encoding and/or decoding of video data within the
microprocessor architecture 200 so that encoded and decoded video data is
in a YUV format. For example, the camera 242 may capture video in a
line-by-line sequence and in a format specific to the camera 242. The
captured data may then be pre-processed by the VPP accelerator 208 to a
YUV format suitable for encoding. In addition, the VPP accelerator 208
may be adapted to convert decoded YUV-formatted video data to
RGB-formatted video data prior to communicating the data to a video
display 240 in a line-by-line sequence, for example. Post-processed video
data from the VPP accelerator 208 may be stored in a local line buffer,
for example, of the VPP accelerator 208. Post-processed video data in a
VPP local line buffer may be in a QCIF format and may be communicated to,
or fetched by, the DSPI 218 and subsequently to the display 240 for
displaying. In a different aspect of the invention, the CPU 202 may
perform post-processing of video data and post-processed data may be
stored in the TCM 204 for subsequent communication to the DSPI 218 via
the bus 244.

[0044] The TQ accelerator 210 may comprise suitable circuitry and/or logic
and may be adapted to perform discrete cosine transformation and
quantization related processing of video data, including inverse discrete
cosine transformation and inverse quantization. The TQ accelerator 210
may also utilize shared memory 232 together with the ME accelerator 212.
The ME accelerator 212 may comprise suitable circuitry and/or logic and
may be adapted to perform motion estimation, motion separation, and/or
motion compensation during encoding and/or decoding of video data within
the microprocessor architecture 200. In one aspect of the invention, the
ME accelerator 212 may utilize on-chip reference memory 234, on-chip
current memory 236, and/or the OCM 214 to store reference macroblock data
and current macroblock data, respectively, utilized by the ME accelerator
212 during motion estimation, motion separation, and/or motion
compensation. By utilizing the VLCOP 206, the VPP accelerator 208, the TQ
accelerator 210, the ME accelerator 212, as well as the shared memory
232, the reference memory 234, the current memory 236, and the OCM 214
during encoding and/or decoding of video data, the CPU 202 may be
alleviated from computation-intensive tasks during encoding and/or
decoding of video data.

[0045] FIG. 3A illustrates architecture for an exemplary on-chip memory
module (OCM) that may be utilized in connection with the microprocessor
of FIG. 2, for example, in accordance with an embodiment of the
invention. Referring to FIG. 3A, the on-chip memory (OCM) architecture
300 may comprise a plurality of buffers, such as camera buffers 302,
reference buffers 304, delta buffer 306, and a DCT buffer 308. The OCM
architecture 300 may be adapted to store macroblock and/or computational
data during encoding and/or decoding of video data. The OCM architecture
300 may be utilized, for example, within the OCM 214 of FIG. 2.

[0046] Referring to FIGS. 2 and 3A, at least two camera buffers 302 may be
adapted to store pre-processed camera data, which may be YUV-formatted
for encoding. Each buffer may be adapted to hold one row of macroblocks.
One of the two buffers may be utilized by the VPP accelerator 208 of FIG.
2 to write YUV-formatted video data after conversion by the VPP
accelerator 208 of the captured data from CAMI 220 during pre-processing
within the microprocessor architecture 200. The second buffer may be
utilized by the ME 212 to read YUV-formatted data for motion estimation
and separation, while the previous buffer is being filled by the VPP
accelerator 208.
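The two-buffer arrangement above is a classic ping-pong scheme: the VPP fills one row-of-macroblocks buffer while the ME reads the previously filled one, and the roles swap after each row. A minimal sketch, with illustrative names:

```python
class PingPong:
    """Two camera buffers: one being written by the VPP, one read by the ME."""

    def __init__(self):
        self.buffers = [None, None]
        self.write_idx = 0

    def vpp_write(self, row):
        """VPP stores a converted row of macroblocks in the write buffer."""
        self.buffers[self.write_idx] = row

    def swap(self):
        """Once a row is filled, the two buffers exchange roles."""
        self.write_idx ^= 1

    def me_read(self):
        """ME reads the buffer the VPP filled previously."""
        return self.buffers[self.write_idx ^ 1]
```

Usage: after `vpp_write("row0")` and `swap()`, the ME reads "row0" while the VPP is free to start filling the other buffer with the next row.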

[0047] The reference buffers 304 may comprise a plurality of frame buffers
that may be adapted to store a plurality of reference frames, for
example, reference frames Fa, Fb and Fc, which may be utilized during
encoding and/or decoding of video data. The delta buffer 306 may be
adapted to store a delta, or a difference, between a macroblock and its
motion reference. The delta buffer 306 may also store estimation error
information based on a determined delta. The DCT buffer 308 may be
adapted to store DCT coefficients of a macroblock. In an exemplary aspect
of the invention, the delta and/or DCT coefficients may be
double-buffered to enable a module writing to the OCM 300 and a module
reading from the OCM 300 to operate in parallel.

[0048] In an exemplary embodiment of the invention, for a QCIF size video,
the buffers within OCM 300 may be arranged utilizing memory space
allocations as illustrated in the following table.

In this arrangement, the OCM architecture 300 may be implemented
utilizing 128K bytes OCM, for example. Other memory arrangements may be
utilized in accordance with other aspects of the invention.

[0049] FIG. 3B illustrates an exemplary rotating buffer scheme 320 within
an on-chip memory (OCM) module that may be utilized in connection with
the microprocessor of FIG. 2, for example, in accordance with an
embodiment of the invention. Referring to FIG. 3B, the exemplary rotating
buffer scheme 320 may be implemented utilizing reference frame buffers
within an OCM, such as reference buffers 304 of OCM 300 in FIG. 3A. In an
exemplary aspect of the invention, the rotating buffer scheme 320 may
utilize three buffers, FRAMEa, FRAMEb, and FRAMEc, in a rotating fashion
for simultaneous encoding and decoding of video data within a
QCIF-enabled portable video communication device, for example.

[0050] During an exemplary video processing cycle, at the start of
processing of frame 1, the decoding (DEC) reference may be stored in
buffer FRAMEa and the encoding (ENC) reference may be stored in buffer
FRAMEc. After encoding frame 1, the encoding result for the current
frame, ENC current, may be stored in buffer FRAMEb. After decoding frame
1, the decoding result for the current frame, DEC current, may be stored
in buffer FRAMEc.

[0051] At the start of processing of frame 2, DEC current in FRAMEc may be
utilized as DEC reference. Similarly, ENC current in FRAMEb may be
utilized as ENC reference. After encoding frame 2, the encoding result
for the current frame, ENC current, may be stored in buffer FRAMEa. After
decoding frame 2, the decoding result for the current frame, DEC current,
may be stored in buffer FRAMEb.

[0052] At the start of processing of frame 3, DEC current in FRAMEb may be
utilized as DEC reference. Similarly, ENC current in FRAMEa may be
utilized as ENC reference. After encoding frame 3, the encoding result
for the current frame, ENC current, may be stored in buffer FRAMEc. After
decoding frame 3, the decoding result for the current frame, DEC current,
may be stored in buffer FRAMEa.
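The rotation in paragraphs [0050] through [0052] follows a fixed pattern: each frame period, the ENC result lands in the free buffer, the DEC result overwrites the buffer that held the ENC reference, and both results become the next frame's references. A sketch that reproduces the buffer assignments given in the text:

```python
def rotate(enc_ref, dec_ref, free):
    """One frame period of the three-buffer scheme. Returns the next
    frame's (enc_ref, dec_ref, free): the free buffer (holding the new
    ENC current) becomes the ENC reference, the old ENC reference buffer
    (holding the new DEC current) becomes the DEC reference, and the old
    DEC reference buffer is freed."""
    return free, enc_ref, dec_ref

# Frame 1 starts with ENC ref in FRAMEc and DEC ref in FRAMEa.
state = ("FRAMEc", "FRAMEa", "FRAMEb")
for frame in (1, 2, 3):
    print(frame, state)
    state = rotate(*state)
```

After three frames the scheme returns to its starting assignment, matching the FRAMEa/FRAMEb/FRAMEc sequence described above.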

[0053] FIG. 3C illustrates an exemplary timing diagram 340 for encoding
and decoding within the microprocessor of FIG. 2, for example, in
accordance with an embodiment of the invention. Referring to FIG. 3C, the
timing diagram 340 may illustrate exemplary video data capturing and
simultaneous encoding and/or decoding of the captured video data. In an
exemplary aspect of the invention, a portable video communication device
may be adapted to process QCIF-formatted video data, where encoding may
be performed simultaneously with the capturing of the video data and
decoding may be performed after the encoding of the video data. A
synchronization rate that is twice the video frame rate may be utilized
during processing within the portable video communication device.

[0054] Referring to FIGS. 3C and 3B, at time t1, the capturing of the
ith frame from camera may be started. After the first row of macroblocks is captured,
the encoding may be simultaneously started utilizing ENC reference stored
in buffer FRAMEc. After encoding, the result ENC current may be stored in
FRAMEb and may be utilized during encoding of the subsequent (i+1)th
frame captured from camera. At time t2, the encoding of the ith
frame captured from camera is completed and the decoding of the ith frame
from remote is started. During decoding of the ith frame from remote, a
DEC reference in FRAMEa may be utilized and the result may be stored as
DEC current in FRAMEc.

[0055] At time t3, the capturing of the (i+1)th frame from camera may
be started. After the first row of macroblocks is captured, the encoding may be
simultaneously started utilizing ENC reference stored in FRAMEb. During
encoding of the (i+1)th frame, one or more macroblocks from the ith frame
in FRAMEb may be utilized for motion estimation. After encoding, the
result ENC current may be stored in FRAMEa and may be utilized during
encoding of the subsequent (i+2)th frame from camera. At time t4,
the encoding of the (i+1)th frame from camera is completed and the
decoding of the (i+1)th frame from remote is started. During decoding of the
(i+1)th frame from remote, a DEC reference in FRAMEc may be utilized and
the result may be stored as DEC current in FRAMEb.

[0056] At time t5, the capturing of the (i+2)th frame from camera may
be started. After the first row of macroblocks is captured, the encoding may
be simultaneously started utilizing an ENC reference stored in FRAMEa.
During encoding of the (i+2)th frame, one or more macroblocks from the
(i+1)th frame in FRAMEa may be utilized for motion estimation. After
encoding, the result ENC current may be stored in FRAMEc and may be
utilized during encoding of the subsequent (i+3)th frame from camera. At
time t6, the encoding of the (i+2)th frame from camera is completed
and the decoding of the (i+2)th frame from remote is started. During
decoding of the (i+2)th frame from remote, a DEC reference in FRAMEb may
be utilized and the result may be stored as DEC current in FRAMEa.

[0057] FIG. 3D illustrates an exemplary timing diagram 360 for QCIF dual
video display in connection with the microprocessor of FIG. 2, for
example, in accordance with an embodiment of the invention. Referring to
FIGS. 2, 3C and 3D, an exemplary portable video communication device may
utilize the microprocessor architecture 200 and simultaneously capture,
encode and decode one or more video frames from QCIF-formatted video
data. In one aspect of the invention, during a two-way video
communication, such as a video telephone connection utilizing
QCIF-compatible portable video devices, each portable video processing
device may be adapted to display QCIF-formatted data of a sender and a
recipient party. For example, a first portable video processing device
may capture video frames, as illustrated in FIG. 3C, and may
simultaneously display in a QCIF format the user of the first device,
which may be represented as "display self" in FIG. 3D, as well as display
in a QCIF format the user of the second portable video communication
device, which may be represented as "display remote" in FIG. 3D.

[0058] In another aspect of the invention, "display self" QCIF-formatted
video data may be acquired from an ENC current buffer. Similarly,
"display remote" QCIF-formatted video data may be acquired from a DEC
current buffer. In this manner, "display self' QCIF-formatted data may be
acquired from buffer FRAMEc for time period t1 through t2, from
FRAMEb for time period t2 through t4, and from FRAMES for time
period t4 through t6, for example. Similarly, "display remote"
QCIF-formatted data may be acquired from buffer FRAMES for time period
t1 through t3, from FRAMEc for time period t3 through
t5, and from FRAMEb for time period t5 through t7 (not
illustrated), for example.

[0059] FIG. 4 is an exemplary timing diagram illustrating macroblock
processing during video encoding via the microprocessor of FIG. 2, for
example, in accordance with an embodiment of the invention. Referring to
FIGS. 2, 3A, and 4, for example, the QCIF camera data may be communicated
from the camera 242 to the VPP accelerator 208 via the CAMI 220 and the
system bus 244. Captured video data may be stored within a line buffer,
for example, within the CAMI 220. The VPP accelerator 208 may then
convert the captured camera data to a YUV-format and store the result in
buffer 302 within the OCM 214 in a line-by-line sequence. After one row
of macroblocks has been filled with YUV-formatted data, the CPU 202, ME
212, and TQ 210 may start encoding the macroblocks in the filled row
and the VPP 208 may continue storing YUV-formatted data of the next row
of macroblocks in the other buffer.
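The alternating row buffers described above amount to a ping-pong (double-buffering) scheme: while the encoders drain one row buffer, the VPP fills the other. The sketch below illustrates this flow in software; the function names and row representation are hypothetical, and the two stages that run concurrently in hardware are shown sequentially here.

```python
# Illustrative ping-pong row buffering: while one macroblock row is being
# encoded, the next row is converted and stored in the other buffer.
# All names here are hypothetical stand-ins, not the accelerator API.

def convert_row_to_yuv(row):
    """Stand-in for the VPP color conversion of one macroblock row."""
    return [("YUV", px) for px in row]

def encode_row(yuv_row):
    """Stand-in for CPU/ME/TQ encoding of one macroblock row."""
    return len(yuv_row)  # e.g. number of samples consumed

def process_frame(rows):
    buffers = [None, None]          # two row buffers, used alternately
    encoded = []
    # Fill buffer 0 with the first converted row before encoding starts.
    buffers[0] = convert_row_to_yuv(rows[0])
    for i in range(len(rows)):
        cur, nxt = i % 2, (i + 1) % 2
        # In hardware these two steps overlap; here they run in sequence.
        if i + 1 < len(rows):
            buffers[nxt] = convert_row_to_yuv(rows[i + 1])  # VPP fills next buffer
        encoded.append(encode_row(buffers[cur]))            # encoders drain current one
    return encoded
```

The key property is that `encode_row` never reads the buffer `convert_row_to_yuv` is currently writing, which is what allows conversion and encoding to proceed in parallel.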

[0060] In an exemplary aspect of the invention, for each macroblock, the
CPU 202 may first set up the microprocessor architecture 200 for encoding
a current macroblock. The ME accelerator 212 may then acquire
YUV-formatted data for the current macroblock from buffer 302 within the OCM
214 and may store the current macroblock data in a local memory inside
the ME accelerator 212. The ME accelerator 212 may then acquire from buffer 304 luminance (Y)
data in the motion search area of the current macroblock. During motion
estimation, the ME accelerator 212 may compare the current macroblock with
possible motion reference candidates in the search area from buffer 304,
and the CPU 202 may be utilized to select the final motion reference for
the current macroblock.
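The search described above can be sketched as a full search over candidate positions in the reference area. The cost metric is not specified in the text; a sum of absolute differences (SAD) is assumed here as a common choice, and the block size and search radius are illustrative.

```python
# Hedged sketch of full-search motion estimation with a SAD cost, as one
# plausible reading of the ME accelerator's search. The real metric,
# search pattern, and block size are not specified by the text.

def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def motion_search(cur_mb, ref_frame, mb_y, mb_x, size=2, radius=1):
    """Return the (dy, dx) of the best-matching candidate within +/-radius."""
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = mb_y + dy, mb_x + dx
            if y < 0 or x < 0 or y + size > len(ref_frame) \
               or x + size > len(ref_frame[0]):
                continue                      # candidate falls outside the frame
            cand = [r[x:x + size] for r in ref_frame[y:y + size]]
            cost = sad(cur_mb, cand)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv
```

In the architecture above, the exhaustive comparison would run on the ME accelerator, with the CPU making the final reference selection.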

[0061] After the motion reference for the current macroblock has been
selected, the ME accelerator 212 may generate one or more estimation
errors during motion separation based on a delta, or a difference,
between the current macroblock and the selected motion reference. The
generated delta and/or estimation error information may be stored in
buffer 306 in OCM 214 for subsequent processing by the TQ accelerator
210. The TQ accelerator 210 may discrete cosine transform and quantize
the estimation errors to obtain quantized frequency coefficients. The
quantized frequency coefficients may then be communicated to buffer 308
in OCM 214 for storage and subsequent encoding in a VLC bitstream, for
example. The quantized frequency coefficients may then be inverse
quantized and inverse discrete cosine transformed by the TQ accelerator
210 to generate estimation errors. The generated estimation errors may be
stored back in buffer 306 in the OCM 214 for subsequent utilization by
the ME accelerator 212 during motion compensation.
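The forward and inverse transform/quantization round trip above is what keeps the encoder's reconstructed reference identical to the decoder's. A minimal sketch follows, using a 1-D orthonormal DCT and a uniform scalar quantizer as stand-ins for the accelerator's 2-D block transform; all names and the quantizer design are illustrative assumptions.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence (stand-in for the 2-D MB DCT)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct_1d(c):
    """Inverse of dct_1d (DCT-III with the matching normalization)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) *
                 math.cos(math.pi * (i + 0.5) * k / n) for k in range(1, n))
        out.append(s)
    return out

def quantize(coeffs, q):
    """Uniform scalar quantizer (illustrative; real codecs use per-band steps)."""
    return [round(c / q) for c in coeffs]

def dequantize(levels, q):
    return [lv * q for lv in levels]
```

The encoder feeds `quantize(dct_1d(errors), q)` to the VLC stage, but reconstructs its reference from `idct_1d(dequantize(...))`, i.e. from the same lossy data the decoder will see.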

[0062] The ME accelerator 212 may then reconstruct the current macroblock
based on the reference macroblock information stored in the reference
buffers 304 and the generated delta and/or estimation error information
stored in buffer 306 in the OCM 214. After the current macroblock is
reconstructed by the ME accelerator 212, the reconstructed macroblock may
be stored in a buffer 304 in the OCM 214 to be utilized as a reference
macroblock during a subsequent operation cycle.
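The reconstruction step above is, in essence, an add of the estimation errors back onto the motion reference. A minimal sketch, assuming 8-bit samples clipped to [0, 255] (a standard convention the patent text does not itself state):

```python
def reconstruct_mb(reference_mb, errors):
    """Add estimation errors to the motion reference and clip to 8 bits.

    Clipping to [0, 255] is an assumption (typical for 8-bit video
    samples); the text does not specify the sample depth.
    """
    return [[max(0, min(255, r + e)) for r, e in zip(ref_row, err_row)]
            for ref_row, err_row in zip(reference_mb, errors)]
```

The clipped result is what would be written back to buffer 304 to serve as the reference for the next cycle.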

[0063] After quantized frequency coefficients information is stored in
buffer 308 in OCM 214, the CPU 202 may utilize DMA 230 to move the
frequency coefficients into TCM 204 and encode the quantized frequency
coefficients into a VLC bitstream, for example. In an exemplary aspect of
the invention, the CPU 202 and the accelerators VPP 208, TQ 210, and ME
212 may be utilized to process QCIF-formatted video data in a parallel
and/or pipeline fashion to achieve faster and more efficient encoding of
video data.
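The text names a "VLC bitstream" without specifying the code. As one common example of a variable-length code (used, e.g., for many H.264 syntax elements), an unsigned Exp-Golomb coder can be sketched as follows; this is purely illustrative of the VLC idea, not the code the patent describes.

```python
def exp_golomb(level):
    """Unsigned Exp-Golomb codeword for a non-negative integer.

    Illustrative only: shorter codewords go to smaller (more frequent)
    values, which is the defining property of a variable-length code.
    """
    v = level + 1
    bits = bin(v)[2:]                      # binary without the '0b' prefix
    return "0" * (len(bits) - 1) + bits    # leading zeros signal the length

def encode_levels(levels):
    """Concatenate codewords into a bit string for the output stream."""
    return "".join(exp_golomb(lv) for lv in levels)
```

A run of small quantized coefficients thus packs into very few bits, which is why the quantization stage precedes the VLC stage.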

[0064] FIG. 5 is an exemplary timing diagram illustrating video decoding
via the microprocessor of FIG. 2, for example, in accordance with an
embodiment of the invention. Referring to FIGS. 2, 3A, and 5, for each
macroblock MB0, the CPU 202 may decode the frequency coefficients of a
current encoded macroblock MB0 from a current VLC encoded frame within an
elementary video bitstream received from remote. For example, the current
VLC encoded frame may be stored within a buffer in the TCM 204. The CPU 202
may then decode the current macroblock MB0 and generate one or more
quantized frequency coefficients. The generated quantized frequency
coefficients may be stored in buffer 308 in the OCM 214 and may be
transferred by the DMA 230 for subsequent communication to the TQ accelerator 210.

[0065] The TQ accelerator 210 may acquire the quantized frequency
coefficients from buffer 308 in the OCM 214 and may inverse quantize and
inverse discrete cosine transform the quantized frequency coefficients to
generate one or more estimation errors and/or delta information. The
generated delta and/or estimation error information may be stored in
buffer 306 in the OCM 214. While the TQ accelerator 210 generates the
estimation error, the ME accelerator 212 may acquire the motion reference
of MB0 from the reference buffer 304 in OCM 214.

[0066] The ME accelerator 212 may then reconstruct the current macroblock
MB0 utilizing the acquired motion reference from buffer 304 and the
generated estimation errors stored in buffer 306 in the OCM 214. The
reconstructed macroblock MB0 may be stored in buffer 304 to be utilized
as a reference macroblock during the decoding of a subsequent frame. In
an exemplary aspect of the invention, some of the tasks performed by the
CPU 202 and the accelerators VPP 208, TQ 210, and ME 212 may be performed
simultaneously and/or in a pipeline fashion to achieve faster and more
efficient decoding of video data. For example, the CPU 202 may start
decoding the VLC bitstream of a subsequent macroblock MB1 after the TQ
has processed MB0 frequency coefficients in the buffer 308.

[0067] After one row of macroblocks has been decoded, the VPP accelerator
208 may obtain the decoded macroblocks from buffer 304 in the OCM 214 and
may convert the YUV-formatted macroblocks to an RGB format in a
line-by-line sequence for subsequent displaying by a QCIF-compatible
portable video communication device, for example. The RGB-formatted lines
may be communicated to the DSPI 218 and the DSPI 218 may then communicate
the acquired RGB-formatted lines to the video display 240 for displaying.
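The line-by-line YUV-to-RGB conversion can be sketched as below. The conversion matrix is not given in the text; full-range BT.601 constants are assumed here as a common choice.

```python
def yuv_to_rgb_line(line):
    """Convert one line of (Y, U, V) samples to (R, G, B) tuples.

    Uses full-range BT.601 constants as an assumption; the text does not
    specify which YUV/RGB conversion matrix the VPP applies.
    """
    def clip(x):
        return max(0, min(255, int(round(x))))

    out = []
    for y, u, v in line:
        r = y + 1.402 * (v - 128)
        g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
        b = y + 1.772 * (u - 128)
        out.append((clip(r), clip(g), clip(b)))
    return out
```

Each converted line would then be handed to the display interface for output, matching the line-by-line sequencing described above.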

[0068] FIG. 6 is a flow diagram of an exemplary method 600 for compression
of video information, in accordance with an embodiment of the invention.
Referring to FIG. 6, at 601, video capture may be received within a
microprocessor from a source, such as a camera feed, in a line-by-line
sequence. The microprocessor may be utilized within a portable video
communication device, such as a QCIF-enabled device. At 603, the captured
video frames may be converted to a YUV format by one or more hardware
accelerators within the microprocessor and may be subsequently stored in
an on-chip memory (OCM) in a format suitable for compression. At 605, for
each MB in a current frame, a current macroblock and its motion search
area in the previous frame may be acquired from the OCM. At 607, the
motion reference of the current macroblock may be determined from its
motion search area stored in the OCM. At 609, a delta, or a difference,
may be determined between the current MB and its motion reference. The
delta may then be stored in a buffer in the OCM.

[0069] At 613, the delta may be discrete cosine transformed and quantized
to generate quantized frequency coefficients. At 615, the generated
quantized frequency coefficients may be inverse quantized and inverse
discrete cosine transformed to generate estimation errors. At 617, the
current macroblock may be reconstructed by one or more of the hardware
accelerators based on the generated estimation errors and the current
macroblock motion reference. At 619, the reconstructed macroblock may be
stored in a reference buffer in the OCM and may be utilized as a
reference macroblock during encoding of the subsequent frame. At 621, the
quantized frequency coefficients of the current macroblock may be VLC
encoded and packed into a bitstream.
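The per-macroblock steps 605 through 621 above can be summarized in a compact sketch. Scalar quantization stands in for the DCT/quantize stage, and 1-D lists stand in for macroblocks; all names are illustrative, and in the architecture described the stages run on separate accelerators rather than inline.

```python
# Compact, hypothetical sketch of steps 609-621 for one macroblock.
# A scalar quantizer stands in for the DCT + quantization stage.

def encode_macroblock(cur_mb, ref_mb, q=4):
    # 609: delta between the current MB and its motion reference
    delta = [c - r for c, r in zip(cur_mb, ref_mb)]
    # 613: transform + quantize (scalar quantizer as a stand-in)
    levels = [round(d / q) for d in delta]
    # 615: inverse quantize to regenerate the estimation errors
    errors = [lv * q for lv in levels]
    # 617/619: reconstruct the MB to serve as the next frame's reference
    recon = [r + e for r, e in zip(ref_mb, errors)]
    # 621: the quantized levels are what would be VLC encoded and packed
    return levels, recon
```

Note that `recon` is built from the quantized data rather than from `cur_mb`, so the encoder's stored reference stays bit-identical to what the decoder will reconstruct.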

[0070] FIG. 7 is a flow diagram of an exemplary method for decompression
of video information, in accordance with an embodiment of the invention.
Referring to FIG. 7, at 701, a VLC encoded video bitstream may be received
from communication packets. At 702, a VLC encoded video stream may be
decoded to generate the quantized frequency coefficients of the MB and
its motion reference. The generated quantized frequency coefficients may
be stored in a buffer in an on-chip memory (OCM) shared by on-chip
hardware accelerators. At 703, the stored quantized frequency
coefficients may be inverse quantized and inverse discrete cosine
transformed to obtain estimation errors. At 705, the motion reference of
the current macroblock may be acquired from a reference buffer in the
OCM, for example. At 707, a decoded macroblock may be reconstructed
utilizing the estimation errors of the MB and its motion reference. At
709, the decoded macroblock may be stored in the reference buffer in the
OCM so that the decoded macroblock may be utilized as a reference
macroblock during decoding of the subsequent frame. At 711, the decoded
YUV-formatted frame may be converted line by line to an RGB format
suitable for display. The RGB-formatted data may then be stored in a
display line buffer. A display line buffer may be provided as part of a
QCIF display interface in a portable video communication device. At 713,
the RGB-formatted line may be communicated from the display line buffer
to a video display for displaying. The RGB-formatted line may then be
displayed by the video display.

[0071] Accordingly, aspects of the invention may be realized in hardware,
software, firmware or a combination thereof. The invention may be
realized in a centralized fashion in at least one computer system, or in
a distributed fashion where different elements are spread across several
interconnected computer systems. Any kind of computer system or other
apparatus adapted for carrying out the methods described herein is
suited. A typical combination of hardware, software and firmware may be a
general-purpose computer system with a computer program that, when being
loaded and executed, controls the computer system such that it carries
out the methods described herein.

[0072] One embodiment of the present invention may be implemented as a
board level product, as a single chip, application specific integrated
circuit (ASIC), or with varying levels integrated on a single chip with
other portions of the system as separate components. The degree of
integration of the system will primarily be determined by speed and cost
considerations. Because of the sophisticated nature of modern processors,
it is possible to utilize a commercially available processor, which may
be implemented external to an ASIC implementation of the present system.
Alternatively, if the processor is available as an ASIC core or logic
block, then the commercially available processor may be implemented as
part of an ASIC device with various functions implemented as firmware.

[0073] The invention may also be embedded in a computer program product,
which comprises all the features enabling the implementation of the
methods described herein, and which when loaded in a computer system is
able to carry out these methods. Computer program in the present context
may mean, for example, any expression, in any language, code or notation,
of a set of instructions intended to cause a system having an information
processing capability to perform a particular function either directly or
after either or both of the following: a) conversion to another language,
code or notation; b) reproduction in a different material form. However,
other meanings of computer program within the understanding of those
skilled in the art are also contemplated by the present invention.

[0074] While the invention has been described with reference to certain
embodiments, it will be understood by those skilled in the art that
various changes may be made and equivalents may be substituted without
departing from the scope of the present invention. In addition, many
modifications may be made to adapt a particular situation or material to
the teachings of the present invention without departing from its scope.
Therefore, it is intended that the present invention not be limited to
the particular embodiments disclosed, but that the present invention will
include all embodiments falling within the scope of the appended claims.