
Abstract:

The rate controller in a digital video encoding system is responsible for
allocating a bit budget for video frames to be encoded. The rate
controller considers many different factors when determining the frame
bit budget. One of the factors considered is the complexity of the frames
being compressed. Occasionally there will be a very complex frame that is
not representative of the overall video frame sequence. Such a rare
complex frame may cause a disproportionate effect on the bit budget
allocation. The system of the present invention limits the amount that a
very complex frame can change the bit budget allocation. The rate
controller of the present invention also includes a relaxation factor.
The relaxation factor allows a user to determine if the rate controller
should strictly allocate its bit budget or relax its standards such that
the rate controller may not be so conservative when allocating bits to
frames.

Claims:

1-14. (canceled)

15. A method of tracking digital video information complexity, the method
comprising: determining a complexity measure for a current digital video
picture, the complexity measure for the picture accounting for a
plurality of macroblocks in the picture; combining the complexity measure
for the current digital video picture with a running average complexity
measure for a series of digital video pictures in a manner that prevents
the current digital video picture from significantly changing the running
average complexity measure for the series of digital video pictures; and
at a rate controller, encoding the digital video information utilizing
the running average complexity measure.

16. The method of claim 15, wherein the running average complexity is not
allowed to change by more than a predetermined percentage.

17. The method of claim 15, wherein the running average complexity is
processed by a non-linear smoothing filter.

18. The method of claim 15 further comprising determining a value for the
current digital video picture that represents a deviation between the
current digital video picture and an average digital video picture in
terms of bits needed to encode the current digital video picture for a
particular desired visual quality, wherein encoding the digital video
information comprises using the determined value for the current digital
video picture.

19. A non-transitory computer-readable medium storing a computer program
which when executed by a processor tracks digital video information
complexity, the computer program comprising sets of instructions for:
determining a complexity measure for a current digital video picture, the
complexity measure for the picture accounting for a plurality of
macroblocks in the picture; combining the complexity measure for the
current digital video picture with a running average complexity measure for
a series of digital video pictures in a manner that prevents the current
digital video picture from significantly changing the running average
complexity measure for the series of digital video pictures; and encoding
the digital video information utilizing the running average complexity
measure.

20. The non-transitory computer-readable medium of claim 19, wherein the
running average complexity is not allowed to change by more than a
predetermined percentage.

21. (canceled)

22. The non-transitory computer-readable medium of claim 19, wherein the
computer program further comprises a set of instructions for determining
a value for the current digital video picture that represents a deviation
between the current digital video picture and an average digital video
picture in terms of bits needed to encode the current digital video
picture for a particular desired visual quality, wherein the set of
instructions for encoding the digital video information comprises a set
of instructions for using the determined value for the current digital
video picture.

23. The non-transitory computer-readable medium of claim 22, wherein the
value for the current digital video picture is determined based on the
complexity measure for the current digital video picture and the combined
running average complexity value.

24. The non-transitory computer-readable medium of claim 19, wherein the
set of instructions for combining comprises a set of instructions for
applying a first weighting factor to the complexity measure for the
current digital video picture and a second weighting factor to the
running average complexity measure for the series of digital video
pictures.

25. The method of claim 18, wherein the value for the current digital
video picture is determined based on the complexity measure for the
current digital video picture and the combined running average complexity
value.

26. The method of claim 15, wherein the complexity measure for the
current digital video picture is a mean of sum of absolute difference
values for the plurality of macroblocks of the current digital video
picture.

27. The method of claim 15, wherein combining comprises applying a first
weighting factor to the complexity measure for the current digital video
picture and a second weighting factor to the running average complexity
measure for the series of digital video pictures.

28. The method of claim 27, wherein the first weighting factor is larger than
the second weighting factor.

29. A method of encoding a current video frame in a sequence of video
frames, the method comprising: determining a running complexity measure
value for the sequence of video frames; determining a complexity measure
for the current video frame that quantifies an amount of complexity of
the current video frame; computing a difference between the complexity
measure for the current video frame and the running complexity measure; when
the difference is larger than a predetermined limit, setting the
difference to the predetermined limit; updating the running complexity
measure value based on the difference; and at a rate controller, encoding
the current video frame using the updated running complexity measure
value.

30. The method of claim 29, wherein the predetermined limit is a
percentage value.

31. The method of claim 29, wherein encoding the current video frame
comprises calculating a number of bits to encode the current video frame
based on the updated running complexity measure value.

32. The method of claim 29, wherein updating the running complexity
measure value comprises applying a first weighting factor to the running
complexity measure value and a second weighting factor to the difference
between the complexity measure for the current video frame and the running
complexity measure.

33. The method of claim 29, wherein the complexity measure for the
current video frame is a mean of sum of absolute difference values for a
plurality of macroblocks of the current video frame.

34. The method of claim 29, wherein determining the running complexity
measure value for the sequence of video frames comprises: determining an
intra-frame running complexity measure value for a set of intra-frames in
the sequence of video frames; and determining an inter-frame running
complexity measure value for a set of inter-frames in the sequence of
video frames.

35. The method of claim 34, wherein updating the running complexity
measure value comprises: updating the intra-frame running complexity
measure value when the current video frame is an intra-frame; and
updating the inter-frame running complexity measure value when the
current video frame is an inter-frame.

36. A non-transitory computer readable medium storing a computer program
which when executed by a processor encodes a current video frame in a
sequence of video frames, the computer program comprising sets of
instructions for: determining a running complexity measure value for the
sequence of video frames; determining a complexity measure for the
current video frame that quantifies an amount of complexity of the
current video frame; computing a difference between the complexity
measure for the current video frame and the running complexity measure; when
the difference is larger than a predetermined limit, setting the
difference to the predetermined limit; updating the running complexity
measure value based on the difference; and encoding the current video
frame using the updated running complexity measure value.

37. The non-transitory computer readable medium of claim 36, wherein the
set of instructions for encoding the current video frame comprises a set
of instructions for calculating a number of bits to encode the current
video frame based on the updated running complexity measure value.

38. The non-transitory computer readable medium of claim 36, wherein the
set of instructions for updating the running complexity measure value
comprises a set of instructions for applying a first weighting factor to
the running complexity measure value and a second weighting factor to the
difference between the complexity measure for the current video frame and the
running complexity measure.

39. The non-transitory computer readable medium of claim 36, wherein the
set of instructions for determining the running complexity measure value
for the sequence of video frames comprises sets of instructions for:
determining an intra-frame running complexity measure value for a set of
intra-frames in the sequence of video frames; and determining an
inter-frame running complexity measure value for a set of inter-frames in
the sequence of video frames.

40. The non-transitory computer readable medium of claim 39, wherein the
set of instructions for updating the running complexity measure value
comprises sets of instructions for: updating the intra-frame running
complexity measure value when the current video frame is an intra-frame;
and updating the inter-frame running complexity measure value when the
current video frame is an inter-frame.

Description:

RELATED APPLICATIONS

[0001] The present patent application claims the benefit of the previous
U.S. Provisional patent application entitled "Method of Implementing
Improved Rate Control For A Multimedia Compression And Encoding System"
filed on Dec. 17, 2002 having Ser. No. 60/434,137.

FIELD OF THE INVENTION

[0002] The present invention relates to the field of multi-media
compression systems. In particular the present invention discloses
methods and systems for implementing a rate controller that efficiently
allocates a bit budget for items to be compressed.

[0004] Video media have been slower to move to digital storage and
transmission formats than audio. This has been largely due to the massive
amounts of digital information required to accurately represent video in
digital form. The massive amounts of information require very
high-capacity digital storage systems and high-bandwidth transmission
systems.

[0005] However, video is now rapidly moving to digital storage and
transmission formats. The DVD (Digital Versatile Disc), a digital video
system, has been one of the fastest selling consumer electronic products
in years. DVDs have been rapidly supplanting Video-Cassette Recorders
(VCRs) as the pre-recorded video playback system of choice due to their high
video quality, very high audio quality, convenience, and extra features.
The antiquated analog NTSC (National Television System Committee)
video transmission system is now being replaced with the digital ATSC
(Advanced Television Systems Committee) video transmission system.

[0006] Computer systems have been using various different digital video
formats for a number of years. Among the best digital video compression
and encoding systems used by computer systems have been the digital video
systems backed by the Moving Picture Experts Group, known as MPEG. The
three best-known and most widely used digital video formats from MPEG are
known simply as MPEG-1, MPEG-2, and MPEG-4. (The MPEG-2 digital video
compression and encoding system is used by DVDs.)

[0007] The MPEG-2 and MPEG-4 standards compress a series of video frames and
encode the compressed frames into a digital stream. Video frames may be
compressed as Intra-frames or Inter-frames. An Intra-frame independently
defines a complete video frame. An Inter-frame defines a video frame with
reference to other video frames, previous or subsequent to the current
frame.

[0008] When compressing video frames, an MPEG-2 or MPEG-4 encoder usually
implements a `rate controller` that is used to allocate a `bit budget`
for each video frame that will be compressed. The bit budget specifies
the number of bits that have been allocated to encode the video frame. By
efficiently allocating a bit budget to each video frame, the rate
controller attempts to generate the highest quality compressed video stream
without overflowing buffers (sending more information than can be stored)
or underflowing buffers (not sending frames fast enough such that the
decoder runs out of frames to display). Thus, to best compress and encode
a digital video stream, a digital video encoder needs a good rate
controller. The present invention introduces new methods and systems for
implementing a rate controller for a digital video encoder.

SUMMARY OF THE INVENTION

[0009] A rate controller for allocating a bit budget for video frames to
be encoded is disclosed. The rate controller of the present invention
considers many different factors when determining the frame bit budget.
One of the factors considered is the complexity of the frames being
compressed. Occasionally there will be a very complex frame that is not
representative of the overall video frame sequence. Such a rare complex
frame may cause a disproportionate effect on the bit budget allocation.
The system of the present invention limits the amount that a very complex
frame can change the bit budget allocation.

[0010] The rate controller of the present invention also includes a
relaxation factor. The relaxation factor allows a user to determine if
the rate controller should strictly allocate its bit budget or relax its
standards such that the rate controller may not be so conservative when
allocating bits to frames.

[0011] Other objects, features, and advantages of the present invention will
be apparent from the accompanying drawings and from the following detailed
description.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] The objects, features, and advantages of the present invention will
be apparent to one skilled in the art, in view of the following detailed
description in which:

[0026] A method and system for performing rate control in a multi-media
compression and encoding system is disclosed. In the following
description, for purposes of explanation, specific nomenclature is set
forth to provide a thorough understanding of the present invention.
However, it will be apparent to one skilled in the art that these
specific details are not required in order to practice the present
invention. For example, the present invention has been described with
reference to the MPEG-4 multimedia compression and encoding system.
However, the same techniques can easily be applied to other types of
compression and encoding systems.

Multimedia Compression and Encoding Overview

[0027] FIG. 1 illustrates a high level block diagram of a typical digital
video encoder 100 as is well known in the art. The digital video encoder
100 receives an incoming video stream 105 at the left of the block diagram.
Each video frame is processed by a Discrete Cosine Transformation (DCT)
unit 110. The frame may be processed independently (an intra-frame) or
with reference to information from other frames received from the motion
compensation unit (an inter-frame). A Quantizer (Q) unit 120 then
quantizes the information from the Discrete Cosine Transformation unit
110. The quantized frame is then encoded with an entropy encoder (H) unit
180 to produce an encoded bitstream.

[0028] Since an inter-frame encoded video frame is defined with reference
to other nearby video frames, the digital video encoder 100 needs to
create a copy of how each video frame will appear within a digital video
decoder such that inter-frames may be encoded. Thus, the lower portion of
the digital video encoder 100 is actually a digital video decoder.
Specifically, an inverse quantizer (Q⁻¹) 130 reverses the quantization
of the video frame information and an inverse Discrete Cosine Transformation
(DCT⁻¹) unit 140 reverses the Discrete Cosine Transformation of the
video frame information. After the inverse DCT has reconstructed the frame
data from the DCT coefficients, the motion compensation unit uses this
information, along with the motion vectors, to reconstruct the video frame.

[0029] The reconstructed video frame may then be used as a reference frame
for the motion estimation of other video frames. Specifically, the
decoded video frame may be used to encode inter-frames that are defined
relative to information in that decoded video frame. The motion
compensation (MC) unit 150 and a motion estimation (ME) unit 160 are used
to determine motion vectors and generate differential values used to
encode inter-frames.

[0030] A rate controller 190 receives information from many different
components in a digital video encoder 100 and uses that information to
allocate a bit budget for each video frame. The bit budget should be
assigned in a manner that will generate the highest quality digital video
bit stream that complies with a specified set of restrictions.

[0031] The rate controller 190 must attempt to generate the highest
quality compressed video stream without overflowing buffers (exceeding
the amount of available memory by sending more video information than can
be stored by a receiver) or underflowing buffers (not sending video
frames fast enough such that a decoder runs out of frames to display).
Details on buffer overflow and buffer underflow will be presented later
in this document.

Models Used For Rate Controller Creation

[0032] Various different models can be used to illustrate the problems to
be handled by an MPEG-4 video rate controller. A transmission model may be
used to model the timing of video frame transmissions and buffer
occupancy in a receiver. Rate distortion models are used to select a
quantizer value in the Quantizer (Q) unit 120. Different rate distortion
models are used for inter-frame quantizer selection and intra-frame
quantizer selection.

[0033] The transmission model simulates data transmission across a
communication channel (such as a computer network or a video signal
transmission path) and buffer occupancy in the digital video decoder of a
digital video player. Typically, in a computer network embodiment, the
compressed video data is transmitted from a server through a network with a
constant bandwidth to a client system. On the client side, a digital
video player has a memory buffer to cache incoming digital video
information received across the network. The digital video player in a
client system can be required to cache a certain amount of digital video
information before the digital video player begins to play the video
stream.

[0034] When digital video is streamed from a server system across a
network to a digital video player in a client system, the digital video
player will not be able to start playing the video until at least the
information defining the first video frame arrives. However, the digital
video player should not immediately begin playing the video stream after
receiving only the first video frame. For example, what if the second
video frame takes longer to arrive than the intended display
duration of the first video frame? In such a situation, the memory buffer
of the digital video player lacks the information needed to display the
next video frame. This condition is referred to as `buffer underflow` in
the digital video player. To prevent this situation, there should be a
minimum `buffer occupancy` requirement for the digital video player. The
minimum buffer occupancy requirement for the digital video player will
allow the digital video player to accommodate the fluctuation in video
frame sizes and network bandwidth limits.

[0035] On the other hand, a server system may send an overly large video
frame that exceeds the physically limited amount of memory buffer space
available to the digital video player. Or the server system may send a
number of video frames faster than the video frames can be decoded and
displayed. In these cases where the amount of transmitted digital video
information exceeds the digital video player's maximum buffer size, a
`buffer overflow` condition occurs. When a buffer overflow occurs, the
digital video player may discard the digital video frame that exceeded
the memory buffer limitations. For handheld devices with limited amounts
of memory, the memory buffer restriction is more critical than in a
desktop computer with a hard drive available as secondary memory.

[0036] To conceptually illustrate when such buffer underflow and buffer
overflow conditions may occur, a video frame transmission model has been
created. The transmission model conceptually illustrates the transmission
and playing of a sequence of video frames with reference to the available
network bandwidth and digital video player's memory buffer resources.

A Temporal Video Frame Model

[0037] Each digital video frame transmitted across a communication medium
has two temporal properties: frame display duration (the amount of time
that the video frame should be displayed on the digital video player's
display screen) and video frame transmission duration (the amount of time
that is required to transmit the digital video frame across the
communication medium). These two temporal properties are very important
to the operation of the rate controller that must allocate frame bit
budgets in a manner that obtains high quality video yet avoids the
problems of buffer underflow and buffer overflow.

[0038] FIG. 2a illustrates a conceptual temporal model for a video frame
that shows the video frame display duration and the video frame
transmission duration properties. The video frame display duration, the
time to display this particular frame on the digital video player, is
represented as a line along the horizontal axis. The longer that the video
frame must be displayed, the longer the line along the horizontal axis.
The video frame transmission duration, the time it takes to transmit the
compressed digital video frame information (for example, from server to
player), is represented as a line along the vertical axis. The video frame
transmission duration is derived from two values: the
size of the digital video frame (in bits) and the bandwidth (in bits per
second) of the communication channel. Since the size of a digital video
frame in bits is generated by the rate controller and the bandwidth of
the communication channel is known, the transmission time of a frame can
be determined from the relation:

    transmission duration (seconds) = frame size (bits) / channel bandwidth (bits per second)

[0039] As illustrated in FIG. 2a, the relation of these two temporal
properties (video frame display duration and video frame transmission
duration) of a video frame can be illustrated as a right-angled triangle
with the video frame display duration along the horizontal axis and the
video frame transmission duration along the vertical axis. If a video
frame has a video frame display duration that equals the video frame
transmission duration, the triangle will be an isosceles right triangle
with two forty-five degree angles as illustrated in FIG. 2a.

[0040] If a video frame has a video frame transmission duration that
is longer than the video frame display duration, then the video frame
triangle will have an angle greater than forty-five degrees in the lower
left corner as illustrated in FIG. 2b. An intra-frame, a video frame that
completely defines a video frame appearance independently without
reference to other video frames, typically has a video frame
representation as illustrated in FIG. 2b wherein the video frame
transmission time exceeds the video frame display time.

[0041] If a video frame has a video frame transmission duration that is
shorter than the video frame display duration then the video frame
right-triangle will have an angle less than forty-five degrees in the
lower left corner as illustrated in FIG. 2c. An efficiently compressed
inter-frame, a video frame that is defined with reference to information
from other nearby video frames, typically has a temporal video frame
representation as illustrated in FIG. 2c wherein the video frame display
time exceeds the video frame transmission time.
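The transmission-time relation and the three triangle cases of FIGS. 2a to 2c can be illustrated with a short Python sketch. The function names, the labels "balanced", "steep" and "shallow", and the example figures (a 1,000,000 bit/s channel and a 30 frame-per-second display rate) are hypothetical and chosen only for illustration; the classification simply restates the cases described above.

```python
import math


def transmission_duration(frame_bits: int, bandwidth_bps: float) -> float:
    """Seconds needed to send frame_bits over a constant-bandwidth channel."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return frame_bits / bandwidth_bps


def lower_left_angle_deg(display_s: float, transmission_s: float) -> float:
    """Angle in the lower left corner of the frame's right-angled triangle.

    The horizontal leg is the display duration and the vertical leg is the
    transmission duration, so the angle is atan(vertical / horizontal).
    """
    return math.degrees(math.atan2(transmission_s, display_s))


def classify_frame(display_s: float, transmission_s: float, tol: float = 1e-12) -> str:
    """Map a frame onto the three cases of FIGS. 2a, 2b and 2c."""
    if abs(transmission_s - display_s) <= tol:
        return "balanced"   # 45 degrees, FIG. 2a
    if transmission_s > display_s:
        return "steep"      # > 45 degrees, typical of intra-frames, FIG. 2b
    return "shallow"        # < 45 degrees, typical of inter-frames, FIG. 2c


if __name__ == "__main__":
    display = 1 / 30                                     # hypothetical 30 fps
    intra_tx = transmission_duration(100_000, 1_000_000)  # 0.1 s
    inter_tx = transmission_duration(20_000, 1_000_000)   # 0.02 s
    print(classify_frame(display, intra_tx))             # steep
    print(classify_frame(display, inter_tx))             # shallow
```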

The Video Frame Sequence Transmission Model

[0042] A sequence of transmitted digital video frames can be represented
by piling up a series of right-angled video frame triangles as
illustrated in FIGS. 2a to 2c. Specifically, FIG. 3 illustrates a
conceptual video frame transmission model created from a sequence of
right-angled triangular video frame models.

[0043] By connecting the hypotenuses of these right-angled triangular
video frame models, a snaking video frame sequence transmission path is
created as illustrated in FIG. 3. The horizontal axis represents the
display times of the video frames and the vertical axis represents the
transmission time of the video frames.

[0044] The actual snaking video frame sequence transmission path is
overlaid on top of a target transmission path. The target transmission
path represents a transmission path wherein a high quality video
bitstream is achieved by transmitting a series of video frames with a sum
of transmission times equal to the sum of the display times of the video
frames. The target transmission path is not actually an ideal
transmission path since the compression system will compress some frames
better than others: video frames that are easily compressed
should be allocated fewer bits (and thus have a shorter transmission
time), and frames that do not compress easily should be allocated more
bits (and thus have a longer transmission time). However, an ideal path
should closely follow the target path or else buffer overflow or buffer
underflow problems will occur.
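The snaking path of FIG. 3 can be sketched by accumulating display and transmission durations into path nodes and measuring each node's vertical distance from the 45-degree target path, where cumulative transmission time equals cumulative display time. This is a minimal illustration; the function names and the example durations are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Node = Tuple[float, float]  # (cumulative display time, cumulative transmission time)


def transmission_path(display_s: Sequence[float], transmit_s: Sequence[float]) -> List[Node]:
    """Nodes of the snaking path; the first node is the origin."""
    if len(display_s) != len(transmit_s):
        raise ValueError("need one display and one transmission duration per frame")
    xs = [0.0, *accumulate(display_s)]
    ys = [0.0, *accumulate(transmit_s)]
    return list(zip(xs, ys))


def deviation_from_target(path: Sequence[Node]) -> List[float]:
    """Vertical distance of each node from the target path y = x.

    Positive values mean transmission is running behind the display clock;
    negative values mean it is running ahead.
    """
    return [y - x for x, y in path]


if __name__ == "__main__":
    # Hypothetical sequence: one large intra-frame followed by three small
    # inter-frames, each displayed for 40 ms.
    display = [0.04, 0.04, 0.04, 0.04]
    transmit = [0.10, 0.02, 0.02, 0.02]
    path = transmission_path(display, transmit)
    for (x, y), dev in zip(path, deviation_from_target(path)):
        print(f"node=({x:.2f}, {y:.2f}) deviation={dev:+.2f}")
```

In this example the intra-frame pushes the path well above the target, and the cheaper inter-frames that follow bring it back, which is the kind of excursion about the target path described above.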

[0045] The digital video player's buffer size limitations and minimum
buffer occupancy requirement can also be represented as proportional time
quantified values. Thus, the digital video player's buffer size
limitation and minimum player buffer occupancy requirement can be
illustrated on the temporal video frame transmission model of FIG. 3.

Memory Buffer Underflow

[0046] The digital video player's minimum buffer occupancy can be
interpreted as the digital video player's waiting time along the
horizontal axis before the first frame is played in order to prevent
buffer underflow. If the player does not wait a needed minimum amount of
time along the horizontal axis then the digital video player may
quickly display all the available video frames and then be forced to wait
for the transmission of the next video frame in the video frame sequence.

[0047] A buffer underflow can also occur if the digital video server
transmits too many video frames that are very large in size (and thus
have long transmission times) but have short display durations. The
underflow occurs because the short display duration of a few large video
frames causes the digital video player to quickly display and remove the
received video frames from the buffer until the digital video player
exhausts all the available video frames before receiving subsequent video
frames.

[0048] To prevent this situation, a forty-five degree `buffer bottom` line
320 places an upper bound on the allowed transmission path and thus
limits the video frame transmission time (and thus video frame bit size)
of a subsequent video frame to be transmitted. By limiting the
transmission path to fall below the buffer bottom line 320, the player
will not become starved for new video frames to display. A buffer bottom
alarm line 335 may be used to inform the server that the receiver may be
nearing a memory buffer underflow condition.

Memory Buffer Overflow

[0049] The player's memory buffer size limitation can be interpreted as
the time to fill up the digital video player's memory buffer (along the
vertical axis) if no video frame information is taken out of the memory
buffer. If video frames are not displayed and subsequently removed from
the memory buffer at a fast enough rate then the limited memory buffer
will overflow with video frame information. Thus, if too many video
frames with duration times longer than their transmission times are sent
in quick succession, the digital video player may overflow its memory
buffers.

[0050] To prevent buffer overflows, a `buffer top` line 350 may be used to
limit the rate at which the encoder will create short transmission time
frames that have long display times. By limiting the transmission path to
remain above the buffer top line 350, the digital video player will not
overflow its memory buffers with video frames to display. A buffer top
alarm line 325 may be used to inform the server that the receiver may be
nearing a memory buffer overflow condition.
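Under a simplified reading of this model, both limits can be drawn as 45-degree lines parallel to the target path: the buffer bottom line sits one startup delay (the player's initial waiting time) above the target path, and the buffer top line sits one buffer's worth of transmission time below the buffer bottom line. The following Python sketch checks a path node against those lines. The parameter names, the assumption that the buffer size is expressed as transmission time (buffer bits divided by channel bandwidth), and the single alarm margin shared by both alarm lines are hypothetical.

```python
from typing import Tuple


def buffer_status(node: Tuple[float, float], startup_delay_s: float,
                  buffer_s: float, alarm_margin_s: float = 0.0) -> str:
    """Classify a transmission-path node against simplified buffer limit lines.

    node is (cumulative display time, cumulative transmission time) in seconds.
    startup_delay_s is the player's initial waiting time (minimum occupancy).
    buffer_s is the player's buffer size expressed as transmission time.
    """
    if buffer_s <= 0 or startup_delay_s < 0 or alarm_margin_s < 0:
        raise ValueError("invalid buffer parameters")
    x, y = node
    bottom = x + startup_delay_s  # buffer empty: path must stay on or below this line
    top = bottom - buffer_s       # buffer full: path must stay on or above this line
    if y > bottom:
        return "underflow"
    if y < top:
        return "overflow"
    if y > bottom - alarm_margin_s:
        return "near underflow"   # between the bottom alarm line and the bottom line
    if y < top + alarm_margin_s:
        return "near overflow"    # between the top alarm line and the top line
    return "ok"


if __name__ == "__main__":
    # Hypothetical player: 0.5 s startup delay, buffer holding 2 s of transmission.
    for node in [(1.0, 1.0), (1.0, 1.45), (1.0, 1.6), (3.0, 1.55), (3.0, 1.0)]:
        print(node, buffer_status(node, 0.5, 2.0, alarm_margin_s=0.1))
```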

Temporal Model Coordinate System Origin

[0051] Starting from the first video frame, the origin of the coordinate
system coincides with the current buffer position. The horizontal
axis represents the playing time and the vertical axis represents the
transmission time of each video frame sent. In one embodiment, the system
will update the origin of the coordinate system to a new position on the
transmission model after the encoder creates each new video frame, as
illustrated in FIG. 4. The origin always slides to the right to the end
of the previous frame's play duration and is aligned vertically on the
target transmission path. Since the duration of the next frame to be
encoded is known to the video encoder, and the vertical axis always
passes through the position of the new frame, the updated coordinate
system can be determined.

[0052] FIG. 4 illustrates a series of video frame coordinate systems F0,
F1, F2, F3 and F4 as updated coordinate systems as time progresses. For
each new frame, the goal is to find a vertical position (transmission
duration or frame size) of the new video frame so that the position of
the next node fulfills the buffer restrictions. Specifically, the next
node must fall between the buffer top 450 and the buffer bottom 420
lines. As illustrated by coordinate system F4, the encoder knows the
display duration of the next frame (the horizontal aspect of the next
frame's triangle) but it must determine the transmission time or frame
size (that is represented by the vertical aspect of the next frame's
triangle).
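The per-frame search described above can be sketched as computing the range of frame sizes whose next node stays between the two limit lines. This sketch reuses a simplified geometry (buffer bottom line one startup delay above the target path, buffer top line one buffer's worth of transmission time below that) and works in absolute coordinates rather than the sliding origin of FIG. 4; sliding the origin is a translation of coordinates, so the same constraints apply. All names and parameters are hypothetical, and choosing an actual bit budget inside the range would still be left to the rate controller's other factors, such as frame complexity.

```python
import math
from typing import Tuple


def next_frame_bit_range(cum_display_s: float, cum_transmit_s: float,
                         next_display_s: float, startup_delay_s: float,
                         buffer_s: float, bandwidth_bps: float) -> Tuple[int, int]:
    """Return (min_bits, max_bits) keeping the next node between the limit lines."""
    x_next = cum_display_s + next_display_s   # horizontal position is known in advance
    max_y = x_next + startup_delay_s          # buffer bottom (underflow) line
    min_y = max_y - buffer_s                  # buffer top (overflow) line
    max_t = max_y - cum_transmit_s            # longest allowed transmission time
    min_t = max(0.0, min_y - cum_transmit_s)  # shortest allowed, never negative
    if max_t < min_t:
        raise ValueError("no frame size keeps the next node within the buffer limits")
    return math.ceil(min_t * bandwidth_bps), math.floor(max_t * bandwidth_bps)


if __name__ == "__main__":
    # Hypothetical state: 1 s displayed, 1.25 s transmitted, next frame shown for 0.5 s,
    # 0.5 s startup delay, 2 s buffer, 1000 bit/s channel.
    print(next_frame_bit_range(1.0, 1.25, 0.5, 0.5, 2.0, 1000))
```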

Rate Controller Improvements

[0053] As previously set forth, a real transmission path will almost
always have a certain amount of deviation about the target transmission
path. Normally, the compressed video frame sizes vary within a certain
range. For example, FIG. 5a illustrates a bar chart of a series of
encoded video frames whose sizes (in number of bytes) are represented by
bar height, along with the average frame size. Note that
intra-frames generally use a significantly larger number of bytes than
inter-frames since intra-frames must be completely self-defined whereas
inter-frames are able to reference information in other nearby video
frames.

[0054] The temporal transmission model set forth in the previous section
provides a valuable tool that may be used to predict the memory buffer
condition in a digital video player that would receive and decode the
digital video stream. Thus, the rate controller in a digital video
encoder may use the temporal transmission model to prevent any memory
buffer overflows or memory buffer underflows from occurring.
Specifically, the rate controller should allocate target bit budgets to
each video frame in a manner that achieves maximum video quality while
still satisfying the memory buffer restrictions that prevent memory
buffer overflow or memory buffer underflow.

[0055] A rate controller using the temporal transmission model and other
teachings of the present invention can be implemented in computer
instructions on any suitable computer system. The computer instructions
may be stored on a computer-readable medium and distributed. The
computer instructions may also be transmitted across a communication
channel to a receiving system. For example, a computer program implementing
the teachings of the present invention may be transmitted from a server
computer across a computer network to a client computer system and then
executed on that client computer system.

Frame Complexity

[0056] The content of different video sequences varies significantly.
Furthermore, even the different video frames within the same video
sequence can vary quite significantly. For example, scene changes and
fast cuts will significantly change the characteristics of a video
stream. Thus, each individual inter-frame or intra-frame within the same
video sequence may need a different number of bits in order to achieve
approximately the same level of visual quality.

[0057] The complexity of a video frame can be measured by the mean average
difference (MAD) for the video frame. The mean average difference (MAD)
is the mean of the Sum of Absolute Differences (SAD) values for the
macroblocks in the video frame. To prevent unusual video frames from
causing quick, large changes, an average MAD value may be calculated
across the history of a number of frames. In one embodiment, the average
MAD (avgMAD) can be calculated as a weighted average of the MAD of the
current frame (curMAD) and the historical average MAD (avgMAD) as
follows:

avgMAD=0.8*curMAD+0.2*avgMAD;
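A minimal sketch of the complexity measure and its running average follows. It computes a SAD for each macroblock against its prediction, takes the frame MAD as the mean of those SAD values, and then blends the result into the history. The 0.8 weight on the current frame is an assumption based on the 80% weight mentioned in paragraph [0060]; all function names are hypothetical.

```python
def macroblock_sad(block, prediction):
    """Sum of Absolute Differences between a macroblock and its prediction.

    Both arguments are flat sequences of pixel values of equal length.
    """
    if len(block) != len(prediction):
        raise ValueError("block and prediction sizes differ")
    return sum(abs(a - b) for a, b in zip(block, prediction))


def frame_mad(macroblock_sads):
    """Frame complexity: the mean of the macroblock SAD values."""
    if not macroblock_sads:
        raise ValueError("frame has no macroblocks")
    return sum(macroblock_sads) / len(macroblock_sads)


def update_avg_mad(avg_mad, cur_mad, cur_weight=0.8):
    """Weighted running average of MAD (cur_weight is an assumed value)."""
    return cur_weight * cur_mad + (1.0 - cur_weight) * avg_mad


sads = [macroblock_sad([10, 12, 14], [11, 12, 10]),  # 1 + 0 + 4 = 5
        macroblock_sad([20, 20, 20], [17, 20, 22])]  # 3 + 0 + 2 = 5
print(frame_mad(sads))  # 5.0
# A single complex frame (MAD 1000) pulls a history of 100 up to about 820.
print(update_avg_mad(100.0, 1000.0))
```

The last line shows why an uncapped 80% weight lets one unusual frame dominate the history.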

[0058] In one embodiment, the system maintains two different running
historical MAD averages: one for intra-frames and one for non-intra-frames.
The two averages are kept separately because comparing the MAD values of
intra-frames with the MAD values of non-intra-frames is not very useful.
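Keeping the two histories apart can be sketched as a small tracker keyed by frame type. The class name, the choice to seed each history with the first frame of its type, and the 0.8 weight are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ComplexityTracker:
    """Hypothetical tracker with separate MAD histories for intra and
    non-intra frames, so the two kinds of frames never mix."""
    cur_weight: float = 0.8  # assumed weight on the newest frame
    averages: dict = field(default_factory=dict)  # key: is_intra (bool)

    def update(self, cur_mad: float, is_intra: bool) -> float:
        prev = self.averages.get(is_intra)
        if prev is None:
            # The first frame of each type seeds its own history.
            new_avg = cur_mad
        else:
            new_avg = self.cur_weight * cur_mad + (1.0 - self.cur_weight) * prev
        self.averages[is_intra] = new_avg
        return new_avg


tracker = ComplexityTracker()
tracker.update(900.0, is_intra=True)   # intra history starts at 900.0
tracker.update(60.0, is_intra=False)   # non-intra history starts at 60.0
tracker.update(40.0, is_intra=False)   # only the non-intra history moves
print(tracker.averages[True])          # 900.0
```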

[0059] Then, using the average MAD, a target bit hint (targetBitsHint)
value may be calculated. The target bit hint (targetBitsHint) represents
how much deviation there is between the current video frame and the
average video frame in terms of bits needed to encode the current video
frame for a desired visual quality. The target bit hint (targetBitsHint)
may be calculated as follows:

targetBitsHint=(curMAD-avgMAD)/avgMAD;
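The hint is the relative deviation of the current frame's MAD from the running average, so a frame that is 50% more complex than average yields 0.5 and a frame that is half as complex yields -0.5. A minimal sketch, with a hypothetical function name and a guard against a non-positive average that the formula itself does not address:

```python
def target_bits_hint(cur_mad, avg_mad):
    """targetBitsHint = (curMAD - avgMAD) / avgMAD."""
    if avg_mad <= 0:
        raise ValueError("avg_mad must be positive")
    return (cur_mad - avg_mad) / avg_mad


print(target_bits_hint(150.0, 100.0))  # 0.5
print(target_bits_hint(50.0, 100.0))   # -0.5
```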

[0060] However, a single very complex video frame can affect the average
so much that the average is no longer representative. For example, FIG. 5b
illustrates the calculated MAD for a series of video frames and the
calculated average MAD. As illustrated in FIG. 5b, a single very complex
video frame can move the average MAD by a large amount (due to the 80%
weight) such that the average MAD is then not very representative of the
overall MAD value for the video frame sequence. To prevent such a situation,
a cap may be placed on how much the average MAD can be affected by any
single video frame.

[0061] In one embodiment, a non-linear smoothing filter is applied when
tracking local averages of video frame complexity and video frame size.
The non-linear smoothing filter limits the extent to which new data can
contribute to the local average (e.g., by a cap, a scaling factor, or
both). The following program listing describes one possible
implementation of a non-linear smoothing filter that may be used:
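The sketch below is one hypothetical filter of this kind and should not be read as the listing referred to above. Each new sample is first clipped to a band around the current average (the cap), and a clipped sample is then blended in with a reduced weight (the scaling factor). The band ratio and both weights are assumed values.

```python
def nonlinear_smooth(avg, sample, weight=0.8, cap_ratio=2.0,
                     outlier_weight=0.2):
    """Hypothetical non-linear smoothing filter for a positive running average.

    Samples outside [avg / cap_ratio, avg * cap_ratio] are clipped to that
    band (cap) and then blended with outlier_weight instead of weight
    (scaling factor), limiting how far one unusual frame can move the average.
    """
    if avg <= 0:
        raise ValueError("avg must be positive")
    low, high = avg / cap_ratio, avg * cap_ratio
    clipped = min(max(sample, low), high)
    w = weight if clipped == sample else outlier_weight
    return w * clipped + (1.0 - w) * avg


print(nonlinear_smooth(100.0, 110.0))   # ordinary frame: about 108
print(nonlinear_smooth(100.0, 1000.0))  # outlier: about 120 instead of 820
```

Because ordinary samples still use the full weight, the filter tracks gradual changes in complexity while damping isolated spikes.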

[0062] In another embodiment, the average MAD is not allowed to change by
more than a pre-defined fixed percentage amount. For example, in one
embodiment, the historical average MAD may not be allowed to change by
more than twenty percent (20%). However, other pre-defined percentage
values may be used. Similarly, other methods of capping the amount of
change to the average MAD from a single complex video frame may also be
used.
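The fixed-percentage variant can be sketched as a clamp applied after the ordinary weighted update. In this sketch the 0.8 weight is assumed, the 20% limit is the example value from the text, and the function name is hypothetical.

```python
def capped_avg_update(avg_mad, cur_mad, cur_weight=0.8, max_change=0.20):
    """Weighted MAD update whose result may move at most max_change
    (as a fraction) away from the previous average."""
    proposed = cur_weight * cur_mad + (1.0 - cur_weight) * avg_mad
    low = avg_mad * (1.0 - max_change)
    high = avg_mad * (1.0 + max_change)
    return min(max(proposed, low), high)


print(capped_avg_update(100.0, 1000.0))  # capped at about 120 (+20%)
print(capped_avg_update(100.0, 105.0))   # within the band: about 104
```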

Current Buffer Limitations

[0063] As set forth with reference to FIGS. 2 and 3, the encoder must
carefully allocate bit budgets to each video frame in a manner that
avoids memory buffer problems in the digital video player system. This is
a `hard` limit in that the rate controller should always keep the frame
sequence within the buffer top 450 and the buffer bottom 420 lines of
FIG. 4 to prevent memory buffer overflow or memory buffer underflow in
the digital video player, respectively. When the memory buffer in a
digital video player begins to approach an overflow or underflow
condition, the rate controller should adjust the video frame bit
budgets to compensate.

[0064] In one embodiment, a simple `buffer anxiety` level may be
calculated. The buffer anxiety value may be defined as the percentage of
the memory buffer space used. The buffer anxiety value thus quantifies
whether there is a danger of a memory buffer underflow or buffer
overflow. The buffer anxiety is zero when the memory buffer level is
right on the target transmission path, and it approaches the
"high-anxiety" value of one ("1") as the buffer level approaches the
buffer bottom 420 or the buffer top 450. Referring to FIG. 6, if the
transmission path is above the target transmission path 610, then the
buffer anxiety is calculated relative to the chance of hitting the
buffer bottom 620 and thus causing a buffer underflow. On the other
hand, if the transmission path is below the target transmission path
610, then the buffer anxiety is calculated relative to the chance of
hitting the buffer top 660 and thus causing a buffer overflow.

[0065] FIG. 6 graphically illustrates how the buffer anxiety for underflow
may be calculated in one embodiment. Referring to FIG. 6, a buffer size
amount (Buff_size) 680 is defined as the distance between the target path
610 and the buffer bottom 620. Similarly, a buffer used amount
(Buff_used) 670 is defined as the distance between the target path 610 and
the current buffer condition. In this manner, the buffer anxiety value
for underflow purposes can be calculated as

Buffer_anxiety=Buff_used/Buff_size

If the amount of the memory buffer that has been used is small, then
the buffer anxiety value is close to zero. However, if nearly all the
video frame information in the memory buffer has been used to display
frames, the buffer anxiety value will be close to one (`1`), indicating
a high-anxiety condition. A similar calculation can be performed to
calculate the buffer anxiety for overflow purposes. Specifically, the
memory buffer space used amount (Buff_used) 675 is divided by the
memory buffer available amount (Buff_size) 685.
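The ratio is computed the same way in both directions; only the distances passed in differ (Buff_used 670 and Buff_size 680 for underflow, Buff_used 675 and Buff_size 685 for overflow). A minimal sketch with a hypothetical function name; clamping the result to the range zero to one is an added assumption, so that a buffer level already past a limit still reads as maximum anxiety.

```python
def buffer_anxiety(buff_used, buff_size):
    """Buffer_anxiety = Buff_used / Buff_size, clamped to [0, 1] (assumed).

    buff_used: distance from the target path to the current buffer condition.
    buff_size: distance from the target path to the relevant buffer limit.
    """
    if buff_size <= 0:
        raise ValueError("buff_size must be positive")
    return min(max(buff_used / buff_size, 0.0), 1.0)


print(buffer_anxiety(0.0, 400.0))    # on the target path: 0.0
print(buffer_anxiety(300.0, 400.0))  # three quarters of the way: 0.75
```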

[0066] The buffer anxiety value can be used to scale down the amount of
bits allocated to the next video frame. For example, a `scale` amount can
be determined and that scale amount is multiplied by the proposed bit
budget. If the buffer anxiety is zero, then no scaling is needed (scale=1).
If the buffer anxiety value is very high (close to one) then the amount
of bits allocated to the next video frame should be scaled down
significantly (using scale amount close to zero).

[0067] FIG. 7a illustrates a scaling curve that may be used to determine a
scale amount. The input buffer anxiety is on the x-axis (horizontal axis)
and the corresponding output scale factor is illustrated on the y-axis
(vertical axis). Thus, as illustrated in FIG. 7a, if there is no buffer
anxiety (buffer anxiety ≈ 0), then there is no scaling (scale=1), but
if the buffer anxiety is very high (buffer anxiety ≈ 1), then the
scale factor reduces the bit budget (scale factor close to zero).
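The exact shape of the FIG. 7a curve is not given in the text. The sketch below assumes a hypothetical curve, scale = 1 - anxiety**2, chosen only because it has the properties the text describes (a scale of one at zero anxiety, falling to zero as anxiety approaches one), and then multiplies the proposed bit budget by the result. The function names and the exponent are assumptions.

```python
def scale_from_anxiety(anxiety, exponent=2.0):
    """Hypothetical stand-in for the FIG. 7a curve: 1 at anxiety 0, 0 at 1."""
    a = min(max(anxiety, 0.0), 1.0)
    return 1.0 - a ** exponent


def scaled_bit_budget(proposed_bits, anxiety):
    """Multiply the proposed frame bit budget by the scale amount."""
    return proposed_bits * scale_from_anxiety(anxiety)


print(scaled_bit_budget(100_000, 0.0))  # 100000.0, no scaling
print(scaled_bit_budget(100_000, 0.5))  # 75000.0
print(scaled_bit_budget(100_000, 1.0))  # 0.0
```

A larger exponent would leave budgets nearly untouched until the anxiety is high; any monotone curve with the same endpoints fits the description equally well.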

[0068] Such a scaling system will ensure that memory buffer limits in the
digital video player are not violated. However, such a scaling system may
be too aggressive such that the quality of the output video stream is
unnecessarily limited to strictly prevent memory buffer underflow or
memory buffer overflow. If, however, an encoder is confident that there
will be no memory buffer underflow or memory buffer overflow problems,
then the encoder may wish to relax this strict scaling system. To allow
for such a relaxation, the present invention introduces a `relaxation`
control, R, that may be used to relax the strict scaling factor.

[0069] In one embodiment, the relaxation control R is set in a range from
zero ("0") to one ("1"). The relaxation control is set to zero if no
relaxation is allowed such that the scaling system strictly controls the
bit budget to prevent any possible memory buffer underflow or memory
buffer overflow from occurring. At the opposite end of the spectrum, the
relaxation control may be set to one to prevent any scaling from being
performed. (Setting relaxation to one is probably not advisable since a
memory buffer underflow or a memory buffer overflow may then occur.)

[0070] To implement such a relaxation control system, the following
equation is used to process the scaling factor, scale.

Scale=Relaxation+Scale-(Relaxation*Scale)
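The relaxation equation can be rewritten as Scale = 1 - (1 - Relaxation)(1 - Scale), which makes its behavior easy to see: it leaves the scale unchanged when Relaxation is zero, forces it to one when Relaxation is one, and otherwise moves it part of the way toward one. A minimal sketch, with a hypothetical function name and an added range check on the relaxation control:

```python
def relax_scale(scale, relaxation):
    """Scale = Relaxation + Scale - (Relaxation * Scale)."""
    if not 0.0 <= relaxation <= 1.0:
        raise ValueError("relaxation must be between 0 and 1")
    return relaxation + scale - relaxation * scale


for r in (0.0, 0.5, 1.0):
    # Strictest scale from the scaling curve (0.0) under each relaxation.
    print(r, relax_scale(0.0, r))  # prints 0.0 0.0, then 0.5 0.5, then 1.0 1.0
```

With a relaxation of 0.5, even the strictest scale of zero is lifted to 0.5, so bit budgets can be cut by at most half.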

[0071] FIG. 7b graphically illustrates how the scaling curve appears when
the relaxation control is set to zero ("0"). As seen in FIG. 7b, the
scaling curve is unchanged from the original scaling curve of FIG. 7a.
Thus, with no relaxation (relaxation control=0), scaling is performed
normally.

[0072] FIG. 7c illustrates how the scaling curve appears when the
relaxation control is set to one-half ("0.5"). As seen in FIG. 7c, the
scaling curve now does not scale down the bit budget as aggressively as
the original scaling curve since it has been partially `relaxed.`
However, the system will still scale down bit budgets as the buffer
anxiety increases; with a relaxation of 0.5, the scale factor never
falls below 0.5.

[0073] Finally, FIG. 7d illustrates how the scaling curve appears when the
relaxation control is set to one ("1"). As seen in FIG. 7d, the scaling
curve is simply a flat line at one ("1") such that no scaling is
performed at all. The scaling has thus been fully `relaxed.`

[0074] The foregoing has described a system for performing rate control in
a multi-media compression and encoding system. It is contemplated that
changes and modifications may be made by one of ordinary skill in the
art, to the materials and arrangements of elements of the present
invention without departing from the scope of the invention.