
Abstract:

A video encoding method and apparatus to select one combination, for each
block of an input video signal, from a plurality of combinations. Each
combination includes a predictive parameter and at least one reference
picture number determined in advance for the reference picture. A
prediction picture signal is generated in accordance with the reference
picture number and predictive parameter of the selected combination. A
predictive error signal is generated representing an error between the
input video signal and the prediction picture signal. The predictive
error signal, information of the motion vector, and index information
indicating the selected combination are encoded.

Claims:

1. A video decoding method for decoding encoded data obtained by
subjecting a video image having luminance and two color differences to
prediction encoding to the video image, comprising: receiving an input of
encoded data obtained by encoding, for one or more to-be-decoded blocks,
a plurality of combinations comprising (1) a weighting factor for each
luminance and for each of two color differences, (2) an offset for each
luminance and for each of two color differences and (3) a flag indicating
presence or non-presence of the weighting factor and the offset
concerning luminance, and encoding, for a to-be-decoded block, (A) a
quantized orthogonal transform coefficient of a prediction error signal
concerning luminance and two color differences, (B) information of a
motion vector and (C) an index indicating (a) one combination of the
plurality of combinations and (b) a reference image; determining whether a
unit-of-decoding of the to-be-decoded block is a frame or a
field; deriving, for each luminance and for each of two color differences,
the weighting factors and the offsets from the plurality of combinations
and the index; generating, for each luminance and for each of two color
differences, a prediction image by multiplying the reference image by the
weighting factors and adding the offsets, based on the motion
vector; generating a prediction error signal by subjecting the quantized
orthogonal transform coefficient to inverse quantization and inverse
orthogonal transform; and generating a decoded image by calculating a sum
of the prediction error signal and the prediction image; wherein when the
unit-of-decoding is the frame, respective possible values of the index
indicate different combinations of the combinations, respectively, and
when the unit-of-decoding is the field, two possible values corresponding
to the different reference images indicate the same combination.

2. A video decoding apparatus for decoding encoded data obtained by
subjecting a video image having luminance and two color differences to
prediction encoding to the video image, comprising: a receiver configured
to receive an input of encoded data obtained by encoding, for one or more
to-be-decoded blocks, a plurality of combinations comprising (1) a
weighting factor for each luminance and for each of two color
differences, (2) an offset for each luminance and for each of two color
differences and (3) a flag indicating presence or non-presence of the
weighting factor and the offset concerning luminance, and encoding, for a
to-be-decoded block, (A) a quantized orthogonal transform coefficient of
a prediction error signal concerning luminance and two color differences,
(B) information of a motion vector and (C) an index indicating (a) one
combination of the plurality of combinations and (b) a reference image; a
determining module configured to determine whether a unit-of-decoding of
the to-be-decoded block is a frame or a field; a deriving module
configured to derive, for each luminance and for each of two color
differences, the weighting factors and the offsets from the plurality of
combinations and the index; a first generator configured to generate, for
each of luminance and for each of two color differences, a prediction
image by multiplying the reference image by the weighting factors and
adding the offsets, based on the motion vector; a second generator
configured to generate a prediction error signal by subjecting the
quantized orthogonal transform coefficient to inverse quantization and
inverse orthogonal transform; and a third generator configured to generate
a decoded image by calculating a sum of the prediction error signal and
the prediction image; wherein when the unit-of-decoding is the frame,
respective possible values of the index indicate different combinations
of the combinations, respectively, and when the unit-of-decoding is the
field, two possible values corresponding to the different reference
images indicate the same combination.

Description:

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001]This is a divisional application of and claims the benefit of
priority under 35 U.S.C. §120 from U.S. application Ser. No.
12/791,018, filed Jun. 1, 2010, the entirety of which is herein
incorporated by reference. U.S. application Ser. No. 12/791,018 is a
divisional application of U.S. application Ser. No. 12/694,320, filed on
Jan. 27, 2010, which is a divisional application of U.S. application Ser.
No. 12/635,738, filed on Dec. 11, 2009, which is a divisional application
of U.S. application Ser. No. 12/577,437, filed on Oct. 12, 2009, which is
a divisional application of U.S. application Ser. No. 12/323,930, filed
on Nov. 26, 2008, which is a divisional application of U.S. application
Ser. No. 11/687,923, filed Mar. 19, 2007, which is a divisional
application of U.S. application Ser. No. 10/754,535, filed on Jan. 12,
2004, which is a continuation application of International Application
No. PCT/JP03/04992, filed Apr. 18, 2003, which was not published under
PCT Article 21(2) in English.

[0002]This application is based upon and claims the benefit of priority
from the prior Japanese Patent Applications No. 2002-116718, filed Apr.
18, 2002; and No. 2002-340042, filed Nov. 22, 2002, the entire contents
of both of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0003]1. Field of the Invention

[0004]The present invention relates to a video encoding/decoding method
and apparatus which encode/decode a fade video and dissolving video, in
particular, at high efficiency.

[0005]2. Description of the Related Art

[0006]Motion compensation predictive inter-frame encoding is used as one
of the encoding modes in video encoding standard schemes such as ITU-T H.261,
H.263, ISO/IEC MPEG-2, or MPEG-4. As a predictive model in motion
compensation predictive inter-frame encoding, a model that exhibits the
highest predictive efficiency when no change in brightness occurs in the
time direction is used. In the case of a fade video which changes in the
brightness of pictures, there is no method known up to now which makes a
proper prediction against a change in the brightness of pictures when,
for example, a normal picture fades in from a black picture. In order to
maintain picture quality in a fade video as well, therefore, a large
number of bits are required.

[0007]In order to solve this problem, for example, in Japanese Patent No.
3166716, "Fade Countermeasure Video Encoder and Encoding Method", a fade
video part is detected to change the allocation of the number of bits.
More specifically, in the case of a fadeout video, a large number of bits
are allocated to the start part of fadeout that changes in luminance. In
general, the last part of fadeout becomes a monochrome picture, and hence
can be easily encoded. For this reason, the number of bits allocated to
this part is reduced. This makes it possible to improve the overall
picture quality without excessively increasing the total number of bits.

[0008]In Japanese Patent No. 2938412, "Video Luminance Change Compensation
Method, Video Encoding Apparatus, Video Decoding Apparatus, Recording
Medium on Which Video Encoding or Decoding Program Is Recorded, and
Recording Medium on Which Encoded Data of Video Is Recorded", there is
proposed an encoding scheme of properly coping with a fade video by
compensating for a reference picture in accordance with two parameters,
i.e., a luminance change amount and contrast change amount.

[0009]In Thomas Wiegand and Bernd Girod, "Multi-frame motion-compensated
prediction for video transmission", Kluwer Academic Publishers 2001, an
encoding scheme based on a plurality of frame buffers is proposed. In
this scheme, an attempt has been made to improve the predictive
efficiency by selectively generating a prediction picture from a
plurality of reference frames held in the frame buffers.

[0010]According to the conventional techniques, in order to encode a fade
video or dissolving video while maintaining high picture quality, a large
number of bits are required. Therefore, an improvement in encoding
efficiency cannot be expected.

BRIEF SUMMARY OF THE INVENTION

[0011]It is an object of the present invention to provide a video
encoding/decoding method and apparatus which can encode a video which
changes in luminance over time, e.g., a fade video or dissolving video,
in particular, at high efficiency.

[0012]According to a first aspect of the present invention, there is
provided a video encoding method of subjecting an input video signal to
motion compensation predictive encoding by using a reference picture
signal representing at least one reference picture and a motion vector
between the input video signal and the reference picture signal,
comprising: selecting one combination, for each block of the input video
signal, from a plurality of combinations each including a predictive
parameter and at least one reference picture number determined in advance
for the reference picture; generating a prediction picture signal in
accordance with the reference picture number and predictive parameter of
the selected combination; generating a predictive error signal
representing an error between the input video signal and the prediction
picture signal; and encoding the predictive error signal, information of
the motion vector, and index information indicating the selected
combination.

[0013]According to a second aspect of the present invention, there is
provided a video decoding method comprising: decoding encoded data
including a predictive error signal representing an error in a prediction
picture signal with respect to a video signal, motion vector information,
and index information indicating a combination of at least one reference
picture number and a predictive parameter; generating a prediction
picture signal in accordance with the reference picture number and
predictive parameter of the combination indicated by the decoded index
information; and generating a reproduction video signal by using the
predictive error signal and the prediction picture signal.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING

[0014]FIG. 1 is a block diagram showing the arrangement of a video
encoding apparatus according to the first embodiment of the present
invention;

[0015]FIG. 2 is a block diagram showing the detailed arrangement of a
frame memory/prediction picture generator in FIG. 1;

[0016]FIG. 3 is a view showing an example of a table of combinations of
reference frame numbers and predictive parameters, which is used in the
first embodiment;

[0017]FIG. 4 is a flow chart showing an example of a sequence for
selecting a predictive scheme (a combination of a reference frame number
and a predictive parameter) for each macroblock and determining an
encoding mode in the first embodiment;

[0018]FIG. 5 is a block diagram showing the arrangement of a video
decoding apparatus according to the first embodiment;

[0019]FIG. 6 is a block diagram showing the detailed arrangement of the
frame memory/prediction picture generator in FIG. 5;

[0020]FIG. 7 is a view showing an example of a table of combinations of
predictive parameters in a case wherein the number of reference frames is
one and a reference frame number is sent as mode information according to
the second embodiment of the present invention;

[0021]FIG. 8 is a view showing an example of a table of combinations of
predictive parameters in a case wherein the number of reference frames is
two and a reference frame number is sent as mode information according to
the second embodiment;

[0022]FIG. 9 is a view showing an example of a table of combinations of
reference picture numbers and predictive parameters in a case wherein the
number of reference frames is one according to the third embodiment of the
present invention;

[0023]FIG. 10 is a view showing an example of a table for only luminance
signals according to the third embodiment;

[0024]FIG. 11 is a view showing an example of a syntax for each block when
index information is to be encoded;

[0025]FIG. 12 is a view showing a specific example of an encoded bit
stream when a prediction picture is to be generated by using one
reference picture;

[0026]FIG. 13 is a view showing a specific example of an encoded bit
stream when a prediction picture is to be generated by using two
reference pictures;

[0027]FIG. 14 is a view showing an example of a table of reference frame
numbers, reference field numbers, and predictive parameters when
information to be encoded is a top field according to the fourth
embodiment of the present invention; and

[0028]FIG. 15 is a view showing an example of a table of reference frame
numbers, reference field numbers, and predictive parameters when
information to be encoded is a bottom field according to the fourth
embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0029]The embodiments of the present invention will be described below
with reference to the several views of the accompanying drawing.

First Embodiment

(About Encoding Side)

[0030]FIG. 1 shows the arrangement of a video encoding apparatus according
to the first embodiment of the present invention. A video signal 100 is
input to the video encoding apparatus, for example, on a frame basis. The
video signal 100 is input to a subtracter 101. The subtracter 101
calculates the difference between the video signal 100 and a prediction
picture signal 212 to generate a predictive error signal. A mode
selection switch 102 selects either the predictive error signal or the
video signal 100. An orthogonal transformer 103 subjects the selected
signal to an orthogonal transformation, e.g., a discrete cosine transform
(DCT). The orthogonal transformer 103 generates orthogonal transformation
coefficient information, e.g., DCT coefficient information. The
orthogonal transformation coefficient information is quantized by a
quantizer 104 and branched into two paths. On one path, the quantization
orthogonal transformation coefficient information 210 is guided to a
variable-length encoder 111.

[0031]The other quantization orthogonal transformation coefficient
information 210 branched into the two paths is sequentially subjected to
processing reverse to that in the quantizer 104 and orthogonal
transformer 103 by a dequantizer or inverse quantizer 105 and inverse
orthogonal transformer 106 to be reconstructed into a predictive error
signal. Thereafter, an adder 107 adds the reconstructed predictive error
signal to the prediction picture signal 212 input through a switch 109 to
generate a local decoded video signal 211. The local decoded video signal
211 is input to a frame memory/prediction picture generator 108.

[0032]The frame memory/prediction picture generator 108 selects one of a
plurality of combinations of prepared reference frame numbers and
predictive parameters. The linear sum of the video signal (local decoded
video signal 211) of the reference frame indicated by the reference frame
number of the selected combination is calculated in accordance with the
predictive parameter of the selected combination, and the resultant
signal is added to an offset based on the predictive parameter. With this
operation, in this case, a reference picture signal is generated on a
frame basis. Subsequently, the frame memory/prediction picture generator
108 motion-compensates for the reference picture signal by using a motion
vector to generate the prediction picture signal 212.

[0033]In this process the frame memory/prediction picture generator 108
generates motion vector information 214 and index information 215
indicating a selected combination of a reference frame number and a
predictive parameter, and sends information necessary for selection of an
encoding mode to a mode selector 110. The motion vector information 214
and index information 215 are input to a variable-length encoder 111. The
frame memory/prediction picture generator 108 will be described in detail
later.

[0034]The mode selector 110 selects an encoding mode on a macroblock basis
on the basis of predictive information P from the frame memory/prediction
picture generator 108, i.e., selects either the intraframe encoding mode
or the motion compensated predictive interframe encoding mode, and
outputs switch control signals M and S.

[0035]In the intraframe encoding mode, the switches 102 and 112 are
switched to the A side by the switch control signals M and S, and the
input video signal 100 is input to the orthogonal transformer 103. In the
interframe encoding mode, the switches 102 and 112 are switched to the B
side by the switch control signals M and S. As a consequence, the
predictive error signal from the subtracter 101 is input to the
orthogonal transformer 103, and the prediction picture signal 212 from
the frame memory/prediction picture generator 108 is input to the adder
107. Mode information 213 is output from the mode selector 110 and input
to the variable-length encoder 111.

[0036]The variable-length encoder 111 subjects the quantization orthogonal
transformation coefficient information 210, mode information 213, motion
vector information 214, and index information 215 to variable-length
encoding. The variable-length codes generated by this operation are
multiplexed by a multiplexer 114. The resultant data is then smoothed by
an output buffer 115. Encoded data 116 output from the output buffer 115
is sent out to a transmission system or storage system (not shown).

[0037]An encoding controller 113 controls an encoding unit 112. More
specifically, the encoding controller 113 monitors the buffer amount of
the output buffer 115, and controls encoding parameters such as the
quantization step size of the quantizer 104 to make the buffer amount
constant.

(About Frame Memory/Prediction Picture Generator 108)

[0038]FIG. 2 shows the detailed arrangement of the frame memory/prediction
picture generator 108 in FIG. 1. Referring to FIG. 2, the local decoded
video signal 211 input from the adder 107 in FIG. 1 is stored in a frame
memory set 202 under the control of a memory controller 201. The frame
memory set 202 has a plurality of (N) frame memories FM1 to FMN for
temporarily holding the local decoded video signal 211 as a reference
frame.

[0039]A plurality of combinations of reference frame numbers and
predictive parameters are prepared in advance as a table in a predictive
parameter controller 203. The predictive parameter controller 203 selects, on
the basis of the video signal 100, a combination of the reference frame
number of a reference frame and a predictive parameter that is used to
generate the prediction picture signal 212, and outputs the index
information 215 indicating the selected combination.

[0040]A multi-frame motion evaluator 204 generates a reference picture
signal in accordance with the combination of the reference frame number
and the index information selected by the predictive parameter controller
203. The multi-frame motion evaluator 204 evaluates the motion amount and
predictive error from this reference picture signal and input video
signal 100, and outputs the motion vector information 214 that minimizes
the predictive error. A multi-frame motion compensator 205 carries out
motion-compensation for each block using a reference picture signal
selected by the multi-frame motion evaluator 204 in accordance with the
motion vector to generate the prediction picture signal 212.

[0041]The memory controller 201 sets a reference frame number to a local
decoded video signal for each frame, and stores each frame in one of the
frame memories FM1 to FMN of the frame memory set 202. For example, the
respective frames are sequentially numbered from the frame nearest to the
input picture. The same reference frame number may be set for different
frames. In this case, for example, different predictive parameters are
used. A frame near to the input picture is selected from the frame
memories FM1 to FMN and sent to the predictive parameter controller 203.
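
The numbering described above can be pictured with a short Python sketch. It assumes the simplest policy the paragraph mentions, namely numbering frames sequentially from the one nearest to the input picture. The class name `FrameMemorySet` and its methods are hypothetical and are not taken from the specification.

```python
from collections import deque


class FrameMemorySet:
    """Hypothetical model of frame memories FM1..FMN (names are illustrative)."""

    def __init__(self, capacity):
        # deque(maxlen=N) discards the oldest frame once N frames are held.
        self._frames = deque(maxlen=capacity)

    def store(self, frame):
        # The newest local decoded frame goes to the front.
        self._frames.appendleft(frame)

    def reference(self, number):
        # Reference frame number 1 is the frame nearest to the input picture.
        return self._frames[number - 1]


memory = FrameMemorySet(capacity=3)
for decoded in ["frame_a", "frame_b", "frame_c", "frame_d"]:
    memory.store(decoded)

nearest = memory.reference(1)   # most recently stored frame
oldest = memory.reference(3)    # oldest frame still held
```

With a capacity of three, storing a fourth frame pushes the first one out, so reference frame number 3 then refers to the second frame stored.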

[0042]FIG. 3 shows an example of the table of combinations of reference
frame numbers and predictive parameters, which is prepared in the
predictive parameter controller 203. "Index" corresponds to prediction
pictures that can be selected for each block. In this case, there are
eight types of prediction pictures. A reference frame number n is the
number of a local decoded video used as a reference frame, and in this
case, indicates the number of a local decoded video corresponding to n
past frames.

[0043]When the prediction picture signal 212 is generated by using the
picture signals of a plurality of reference frames stored in the frame
memory set 202, a plurality of reference frame numbers are designated,
and (the number of reference frames +1) coefficients are designated as
predictive parameters for each of a luminance signal (Y) and color
difference signals (Cb and Cr). In this case, as indicated by equations
(1) to (3), where n is the number of reference frames, n+1 predictive
parameters Di (i=1, . . . , n+1) are prepared for the luminance signal Y;
n+1 predictive parameters Ei (i=1, . . . , n+1), for the color difference
signal Cb; and n+1 predictive parameters Fi (i=1, . . . , n+1), for the
color difference signal Cr:
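
The equations themselves do not appear in this text. A reconstruction consistent with the surrounding description (n weighting factors, with the (n+1)-th parameter acting as the offset) might read as follows. The symbols Y_t, Cb_t, and Cr_t for the generated reference picture signal, and Y_{t-i}, Cb_{t-i}, and Cr_{t-i} for the local decoded signal of the i-th reference frame, are assumed notation:

```latex
\begin{align}
Y_t  &= \sum_{i=1}^{n} D_i \, Y_{t-i}  + D_{n+1} \tag{1} \\
Cb_t &= \sum_{i=1}^{n} E_i \, Cb_{t-i} + E_{n+1} \tag{2} \\
Cr_t &= \sum_{i=1}^{n} F_i \, Cr_{t-i} + F_{n+1} \tag{3}
\end{align}
```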

[0044]This operation will be described in more detail with reference to
FIG. 3. Referring to FIG. 3, the last numeral of each predictive
parameter represents an offset, and the first numeral of each predictive
parameter represents a weighting factor (predictive coefficient). For
index 0, the number of reference frames is given by n=2, the reference
frame number is 1, and predictive parameters are 1 and 0 for each of the
luminance signal Y and color difference signals Cr and Cb. That the
predictive parameters are 1 and 0, as in this case, indicates that a local
decoded video signal corresponding to the reference frame number "1" is
multiplied by 1 and added to offset 0. In other words, the local decoded
video signal corresponding to the reference frame number 1 becomes a
reference picture signal without any change.

[0045]For index 1, two reference frames as local decoded video signals
corresponding to the reference frame numbers 1 and 2 are used. In
accordance with predictive parameters 2, -1, and 0 for the luminance
signal Y, the local decoded video signal corresponding to the reference
frame number 1 is doubled, and the local decoded video signal
corresponding to the reference frame number 2 is subtracted from the
resultant signal. Offset 0 is then added to the resultant signal. That
is, extrapolation prediction is performed from the local decoded video
signals of two frames to generate a reference picture signal. For the
color difference signals Cr and Cb, since predictive parameters are 1, 0,
and 0, the local decoded video signal corresponding to the reference
frame number 1 is used as a reference picture signal without any change.
This predictive scheme corresponding to index 1 is especially effective
for a dissolving video.

[0046]For index 2, in accordance with predictive parameters 5/4 and 16,
the local decoded video signal corresponding to the reference frame
number 1 is multiplied by 5/4 and added with offset 16. For the color
difference signals Cr and Cb, since the predictive parameter is 1, the
color difference signals Cr and Cb become reference picture signals
without any change. This predictive scheme is especially effective for a
fade-in video from a black frame.
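
The three table entries discussed above can be worked through in a short Python sketch. Only indices 0 to 2 are filled in, using the values quoted in the text; the remaining entries of FIG. 3 are not reproduced. The names `TABLE` and `reference_sample`, the per-sample formulation, and the sample values are illustrative assumptions, and no rounding or clipping is applied because the text does not specify any.

```python
from fractions import Fraction

# index -> (reference frame numbers, luminance parameters, color difference parameters).
# In each parameter tuple the last entry is the offset and the preceding
# entries are the weighting factors, one per reference frame.
TABLE = {
    0: ((1,), (1, 0), (1, 0)),
    1: ((1, 2), (2, -1, 0), (1, 0, 0)),
    # Offset 0 for the color differences is implied by "without any change".
    2: ((1,), (Fraction(5, 4), 16), (1, 0)),
}


def reference_sample(index, decoded, luminance=True):
    """Weighted sum of co-located samples plus offset for one table entry.

    `decoded` maps a reference frame number to the sample value of the
    local decoded video signal at one position.
    """
    frame_numbers, luma_params, chroma_params = TABLE[index]
    params = luma_params if luminance else chroma_params
    *weights, offset = params
    total = sum(w * decoded[n] for w, n in zip(weights, frame_numbers))
    return total + offset


# Luminance samples at one position in reference frames 1 and 2 (made-up values).
luma = {1: 100, 2: 80}

unchanged = reference_sample(0, luma)      # 1*100 + 0
extrapolated = reference_sample(1, luma)   # 2*100 - 1*80 + 0
faded = reference_sample(2, luma)          # 5/4*100 + 16
```

`Fraction` is used for the 5/4 weighting factor only to keep the arithmetic exact in the example; a real codec would more likely use fixed-point weights.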

[0047]In this manner, reference picture signals can be selected on the
basis of a plurality of predictive schemes with different combinations of
the numbers of reference frames to be used and predictive parameters.
This makes it possible for this embodiment to properly cope with a fade
video and dissolving video that have suffered deterioration in picture
quality due to the absence of a proper predictive scheme.

[0048]An example of a specific sequence for selecting a predictive scheme
(a combination of a reference frame number and a predictive parameter)
for each macroblock and determining an encoding mode in this embodiment
will be described next with reference to FIG. 4.

[0049]First of all, a maximum assumable value is set to variable min_D
(step S101). LOOP1 (step S102) indicates a repetition for the selection
of a predictive scheme in interframe encoding, and variable i represents
the value of "index" in FIG. 3. In this case, in order to obtain an
optimal motion vector for each predictive scheme, an evaluation value D
of each index (each combination of a reference frame number and a
predictive parameter) is calculated from the number of bits associated
with motion vector information 214 (the number of bits of a
variable-length code output from the variable-length encoder 111 in
correspondence with the motion vector information 214) and a predictive
error absolute value sum, and a motion vector that minimizes the
evaluation value D is selected (step S103). The evaluation value D is
compared with min_D (step S104). If the evaluation value D is smaller
than min_D, the evaluation value D is set to min_D, and index i is
assigned to min_i (step S105).

[0050]An evaluation value D for intraframe encoding is then calculated
(step S106). The evaluation value D is compared with min_D (step S107).
If this comparison indicates that min_D is smaller than the evaluation
value D, mode MODE is determined as interframe encoding, and min_i is
assigned to index information INDEX (step S108). If the evaluation value
D is smaller, mode MODE is determined as intraframe encoding (step S109).
In this case, the evaluation value D is set as the estimated value of the
number of bits with the same quantization step size.
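
The sequence of FIG. 4 as described above can be sketched in Python. This is a minimal illustration under stated assumptions: the evaluation value is taken to be D = sad + lam * mv_bits, which is one plausible way to combine the two quantities since the text does not give the formula; a tie between the two modes is resolved as intraframe encoding, which the text leaves open; and the function name `select_mode`, the parameter `lam`, and all cost figures are hypothetical.

```python
import math


def select_mode(candidates, intra_cost, lam=1.0):
    """Sketch of the FIG. 4 sequence.

    `candidates` maps each index to a list of (motion_vector, mv_bits, sad)
    tuples. The cost D = sad + lam * mv_bits is an assumed way of combining
    the two quantities; the text does not give the formula.
    """
    min_d = math.inf                      # step S101
    min_i = None
    best_mv = None
    for i, trials in candidates.items():  # LOOP1, step S102
        # Step S103: motion vector minimizing D for this index.
        mv, bits, sad = min(trials, key=lambda t: t[2] + lam * t[1])
        d = sad + lam * bits
        if d < min_d:                     # step S104
            min_d, min_i, best_mv = d, i, mv   # step S105
    # Steps S106 to S109: compare with the intraframe evaluation value.
    if min_d < intra_cost:
        return {"mode": "inter", "index": min_i, "motion_vector": best_mv}
    return {"mode": "intra", "index": None, "motion_vector": None}


# Made-up costs for two indices and two motion vector trials each.
candidates = {
    0: [((0, 0), 2, 900), ((1, 0), 4, 850)],
    2: [((0, 0), 2, 300), ((1, 0), 4, 310)],
}
fade_block = select_mode(candidates, intra_cost=1200)
flat_block = select_mode(candidates, intra_cost=100)
```

In the first call the interframe cost for index 2 is lowest, so that index and its motion vector are selected; in the second call the intraframe cost is lower than every interframe candidate, so intraframe encoding is chosen.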

(About Decoding Side)

[0051]A video decoding apparatus corresponding to the video encoding
apparatus shown in FIG. 1 will be described next. FIG. 5 shows the
arrangement of the video decoding apparatus according to this embodiment.
Encoded data 300 sent out from the video encoding apparatus shown in FIG.
1 and sent through a transmission system or storage system is temporarily
stored in an input buffer 301 and demultiplexed by a demultiplexer 302
for each frame on the basis of a syntax. The resultant data is input to a
variable-length decoder 303. The variable-length decoder 303 decodes the
variable-length code of each syntax of the encoded data 300 to reproduce
a quantization orthogonal transformation coefficient, mode information
413, motion vector information 414, and index information 415.

[0052]Of the reproduced information, the quantization orthogonal
transformation coefficient is dequantized by a dequantizer 304 and
inversely orthogonal-transformed by an inverse orthogonal transformer
305. If the mode information 413 indicates the intraframe encoding mode,
a reproduction video signal is output from the inverse orthogonal
transformer 305. This signal is then output as a reproduction video
signal 310 through an adder 306.

[0053]If the mode information 413 indicates the interframe encoding mode,
a predictive error signal is output from the inverse orthogonal
transformer 305, and a mode selection switch 309 is turned on. The
prediction picture signal 412 output from a frame memory/prediction
picture generator 308 is added to the predictive error signal by the
adder 306. As a consequence, the reproduction video signal 310 is output.
The reproduction video signal 310 is stored as a reference picture signal
in the frame memory/prediction picture generator 308.

[0055]Like the frame memory/prediction picture generator 108 on the
encoding side in FIG. 1, the frame memory/prediction picture generator
308 includes a plurality of prepared combinations of reference frame
numbers and predictive parameters as a table, and selects one combination
indicated by the index information 415 from the table. The linear sum of
the video signal (reproduction video signal 310) of the reference frame
indicated by the reference frame number of the selected combination is
calculated in accordance with the predictive parameter of the selected
combination, and an offset based on the predictive parameter is added to
the resultant signal. With this operation, a reference picture signal is
generated. Subsequently, the generated reference picture signal is
motion-compensated for by using the motion vector indicated by the motion
vector information 414, thereby generating a prediction picture signal
412.
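
The decoder-side steps above, followed by the addition of the predictive error signal, can be sketched in Python as follows. The sketch fetches the displaced block first and applies the weighting afterwards, rather than weighting the whole frame first as the text describes; for integer motion vectors shared by all reference frames the two orders yield the same samples. Integer-only motion vectors, the absence of boundary handling, the function names, and the sample data are all assumptions made for illustration.

```python
def motion_compensate(frame, top, left, size, mv):
    """Fetch a size x size block displaced by an integer motion vector (dy, dx)."""
    dy, dx = mv
    return [row[left + dx:left + dx + size]
            for row in frame[top + dy:top + dy + size]]


def decode_block(references, entry, mv, residual, top, left):
    """Weighted sum of reference blocks plus offset, then add the residual.

    `entry` is (reference frame numbers, weighting factors, offset), standing
    in for the table row selected by the index information.
    """
    frame_numbers, weights, offset = entry
    size = len(residual)
    blocks = [motion_compensate(references[n], top, left, size, mv)
              for n in frame_numbers]
    decoded = []
    for y in range(size):
        row = []
        for x in range(size):
            prediction = sum(w * b[y][x] for w, b in zip(weights, blocks)) + offset
            row.append(prediction + residual[y][x])
        decoded.append(row)
    return decoded


# One 4x4 reference frame with made-up sample values.
references = {1: [[10, 11, 12, 13],
                  [20, 21, 22, 23],
                  [30, 31, 32, 33],
                  [40, 41, 42, 43]]}
entry = ((1,), (2,), 5)            # one reference frame, weight 2, offset 5
residual = [[1, -1], [0, 2]]       # decoded predictive error signal
decoded = decode_block(references, entry, mv=(0, 1), residual=residual,
                       top=1, left=1)
```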

(About Frame Memory/Prediction Picture Generator 308)

[0056]FIG. 6 shows the detailed arrangement of the frame memory/prediction
picture generator 308 in FIG. 5. Referring to FIG. 6, the reproduction
video signal 310 output from the adder 306 in FIG. 5 is stored in the
frame memory set 402 under the control of a memory controller 401. The
frame memory set 402 has a plurality of (N) frame memories FM1 to FMN for
temporarily holding the reproduction video signal 310 as a reference
frame.

[0057]A predictive parameter controller 403 has in advance combinations of
reference frame numbers and predictive parameters as a table like the one
shown in FIG. 3. The predictive parameter controller 403 selects a
combination of the reference frame number of a reference frame and a
predictive parameter, which are used to generate the prediction picture
signal 412, on the basis of the index information 415 from the
variable-length decoder 303 in FIG. 5. A plurality of multi-frame motion
compensators 404 generate a reference picture signal in accordance with a
combination of a reference frame number and index information, which is
selected by the predictive parameter controller 403, and perform
motion-compensation for each block using this reference picture signal in
accordance with the motion vector indicated by the motion vector
information 414 from the variable-length decoder 303 in FIG. 5, thereby
generating the prediction picture signal 412.

Second Embodiment

[0058]The second embodiment of the present invention will be described
next with reference to FIGS. 7 and 8. Since the overall arrangements of a
video encoding apparatus and video decoding apparatus in this embodiment
are almost the same as those in the first embodiment, only the
differences from the first embodiment will be described.

[0059]This embodiment describes an example of how predictive parameters
are expressed under a scheme capable of designating a plurality of
reference frame numbers in accordance with mode information on a
macroblock basis. A reference frame number is
discriminated by the mode information for each macroblock. This
embodiment therefore uses a table of predictive parameters as shown in
FIGS. 7 and 8 instead of using a table of combinations of reference frame
numbers and predictive parameters as in the first embodiment. That is,
index information does not indicate a reference frame number, and only a
combination of predictive parameters is designated.

[0060]The table in FIG. 7 shows an example of a combination of predictive
parameters when the number of reference frames is one. As predictive
parameters, (the number of reference frames +1) parameters, i.e., two
parameters (one weighting factor and one offset), are designated for each
of a luminance signal (Y) and color difference signals (Cb and Cr).

[0061]The table in FIG. 8 shows an example of a combination of predictive
parameters when the number of reference frames is two. In this case, as
predictive parameters, (the number of reference frames +1) parameters,
i.e., three parameters (two weighting factors and one offset), are
designated for each of a luminance signal (Y) and color difference
signals (Cb and Cr). This table is prepared for the encoding side and
decoding side each as in the first embodiment.

Third Embodiment

[0062]The third embodiment of the present invention will be described with
reference to FIGS. 9 and 10. Since the overall arrangements of a video
encoding apparatus and video decoding apparatus in this embodiment are
almost the same as those in the first embodiment, only the differences
from the first and second embodiments will be described below.

[0063]In the first and second embodiments, a video is managed on a frame
basis. In this embodiment, however, a video is managed on a picture
basis. If both a progressive signal and an interlaced signal exist as
input picture signals, pictures are not necessarily encoded on a frame
basis. In consideration of this, a picture is assumed to be (a) a picture of one
frame of a progressive signal, (b) a picture of one frame generated by
merging two fields of an interlaced signal, or (c) a picture of one field
of an interlaced signal.

[0064]If a picture to be encoded is a picture with a frame structure like
(a) or (b), a reference picture used in motion compensation prediction is
also managed as a frame regardless of whether the encoded picture, which
is the reference picture, has a frame structure or field structure. A
reference picture number is assigned to this picture. Likewise, if a
picture to be encoded is a picture with a field structure like (c), a
reference picture used in motion compensation prediction is also managed
as a field regardless of whether the encoded picture, which is the
reference picture, has a frame structure or field structure. A reference
picture number is assigned to this picture.

[0065]Equations (4), (5), and (6) are examples of predictive equations for
reference picture numbers and predictive parameters, which are prepared
in the predictive parameter controller 203. These examples are predictive
equations for generating a prediction picture signal by motion
compensation prediction using one reference picture signal.

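$$Y = \mathrm{clip}\bigl((D_1(i) \times R_Y(i) + 2^{L_Y-1}) \gg L_Y + D_2(i)\bigr) \qquad (4)$$

$$Cb = \mathrm{clip}\bigl((E_1(i) \times (R_{Cb}(i) - 128) + 2^{L_C-1}) \gg L_C + E_2(i) + 128\bigr) \qquad (5)$$

$$Cr = \mathrm{clip}\bigl((F_1(i) \times (R_{Cr}(i) - 128) + 2^{L_C-1}) \gg L_C + F_2(i) + 128\bigr) \qquad (6)$$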

where Y is a prediction picture signal of a luminance signal, Cb and Cr
are prediction picture signals of two color difference signals,
RY(i), RCb(i), and RCr(i) are the pixel values of the
luminance signal and two color difference signals of a reference picture
signal with index i, D1(i) and D2(i) are the predictive
coefficient and offset of the luminance signal with index i, E1(i)
and E2(i) are the predictive coefficient and offset of the color
difference signal Cb with index i, and F1(i) and F2(i) are the
predictive coefficient and offset of the color difference signal Cr with
index i. Index i takes a value from 0 to (the maximum number of
reference pictures -1), and is encoded for each block to be encoded (e.g.,
for each macroblock). The resultant data is then transmitted to the video
decoding apparatus.

[0066]The predictive parameters D1(i), D2(i), E1(i),
E2(i), F1(i), and F2(i) are either values
determined in advance between the video encoding apparatus and the video
decoding apparatus, or values determined in a unit of encoding such as a
frame, field, or slice and encoded together with the encoded data to be
transmitted from the video encoding apparatus to the video decoding
apparatus. In either case, these parameters are shared by the two
apparatuses.

[0067]The equations (4), (5), and (6) are predictive equations wherein
powers of 2, i.e., 2, 4, 8, 16, . . . are selected as the denominators of
predictive coefficients by which reference picture signals are
multiplied. The predictive equations can eliminate the necessity of
division and be calculated by arithmetic shifts. This makes it possible
to avoid a large increase in calculation cost due to division.
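The equivalence between the shift form and a rounded division by a power of 2 can be checked with a minimal sketch. The function names and the sample values are hypothetical and chosen only for illustration; the shift amount is assumed to be at least 1.

```python
def scale_by_shift(pixel, coeff, shift):
    """Multiply pixel by coeff / 2**shift using only a multiply, an add, and a shift."""
    # Adding 2**(shift - 1) before shifting rounds to nearest instead of truncating.
    return (coeff * pixel + (1 << (shift - 1))) >> shift


def scale_by_division(pixel, coeff, shift):
    """Same quantity written with an explicit floor division, for comparison only."""
    return (coeff * pixel + (1 << (shift - 1))) // (1 << shift)


# Hypothetical example: predictive coefficient 3 with shift 2, i.e. a weight of 3/4.
print(scale_by_shift(101, 3, 2), scale_by_division(101, 3, 2))
```

In Python the right shift of a negative integer rounds toward negative infinity, which matches the arithmetic shift assumed by the predictive equations.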

[0068]In equations (4), (5), and (6), ">>" of a>>b represents
an operator for arithmetically shifting an integer a to the right by b
bits. The function "clip" represents a clipping function for setting the
value in "( )" to 0 when it is smaller than 0, and setting the value to
255 when it is larger than 255.

[0069]In this case, LY is the shift amount of a
luminance signal, and LC is the shift amount of a color difference
signal. As these shift amounts LY and LC, values determined in
advance between the video encoding apparatus and the video decoding
apparatus are used. The video encoding apparatus encodes the shift
amounts LY and LC, together with a table and encoded data, in a
predetermined unit of encoding, e.g., a frame, field, or slice, and
transmits the resultant data to the video decoding apparatus. This allows
the two apparatuses to share the shift amounts LY and LC.
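Equations (4), (5), and (6) can be sketched as follows for a single pixel. This is a minimal illustration, not the apparatus itself: the function names and sample values are hypothetical, 8-bit samples are assumed, and the offset is assumed to be added after the right shift (one reading of the operator grouping in the equations).

```python
def clip(value):
    """Clamp a value to the range 0 to 255, as described for the function "clip"."""
    return max(0, min(255, value))


def predict_luma(ref_y, d1, d2, shift_y):
    """Equation (4): weighted luminance prediction from one reference pixel."""
    # Assumption: the offset d2 is added after the rounded right shift.
    return clip(((d1 * ref_y + (1 << (shift_y - 1))) >> shift_y) + d2)


def predict_chroma(ref_c, coeff, offset, shift_c):
    """Equations (5) and (6): the coefficient acts on the signal centred on 128."""
    scaled = (coeff * (ref_c - 128) + (1 << (shift_c - 1))) >> shift_c
    return clip(scaled + offset + 128)


# Hypothetical parameters: D1 = 3, D2 = 5, LY = 2 for luminance.
print(predict_luma(100, 3, 5, 2))
# Hypothetical parameters: E1 = 4, E2 = -3, LC = 2 for a color difference signal.
print(predict_chroma(100, 4, -3, 2))
```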

[0070]In this embodiment, tables of combinations of reference picture
numbers and predictive parameters like those shown in FIGS. 9 and 10 are
prepared in the predictive parameter controller 203 in FIG. 2. Referring
to FIGS. 9 and 10, index i corresponds to prediction pictures that can be
selected for each block. In this case, four types of prediction pictures
are present in correspondence with 0 to 3 of index i. "Reference picture
number" is, in other words, the number of a local decoded video signal
used as a reference picture.

[0071]"Flag" is a flag indicating whether or not a predictive equation
using a predictive parameter is applied to a reference picture number
indicated by index i. If Flag is "0", motion compensation prediction is
performed by using the local decoded video signal corresponding to the
reference picture number indicated by index i without using any
predictive parameter. If Flag is "1", a prediction picture is generated
according to equations (4), (5), and (6) by using a local decoded video
and predictive parameter corresponding to the reference picture number
indicated by index i, thus performing motion compensation prediction.
The Flag information is likewise either a value determined in advance
between the video encoding apparatus and the video decoding apparatus, or
is encoded in the video encoding apparatus, together with a table and
encoded data, in a predetermined unit of encoding, e.g., a frame, field,
or slice. In the latter case, the resultant data is transmitted to the
video decoding apparatus. This allows the two apparatuses to share the
information of Flag.

[0072]In these cases, a prediction picture is generated by using a
predictive parameter when index i=0 with respect to a reference picture
number 105, and motion compensation prediction is performed without using
any predictive parameter when i=1. As described above, a plurality of
predictive schemes may exist for the same reference picture number.
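The role of Flag can be sketched with a small lookup table in which two indices share reference picture number 105. The table layout, the class and function names, the parameter values, and the shift amount are hypothetical; only the luminance path is shown, and the offset is assumed to be added after the shift.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableEntry:
    ref_picture_number: int
    flag: int    # 1: apply the predictive equation, 0: plain motion compensation
    d1: int = 0  # luminance predictive coefficient (unused when flag is 0)
    d2: int = 0  # luminance offset (unused when flag is 0)


# Hypothetical table: index 0 and index 1 both refer to reference picture 105.
TABLE = {
    0: TableEntry(ref_picture_number=105, flag=1, d1=3, d2=5),
    1: TableEntry(ref_picture_number=105, flag=0),
}
SHIFT_Y = 2  # hypothetical luminance shift amount LY


def predict_luma_for_index(index, ref_y):
    """Return the luminance prediction for one motion-compensated reference pixel."""
    entry = TABLE[index]
    if entry.flag == 0:
        # No predictive parameter: the reference pixel is used as is.
        return ref_y
    value = ((entry.d1 * ref_y + (1 << (SHIFT_Y - 1))) >> SHIFT_Y) + entry.d2
    return max(0, min(255, value))


print(predict_luma_for_index(0, 100), predict_luma_for_index(1, 100))
```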

[0073]The table shown in FIG. 9 has predictive parameters D1(i),
D2(i), E1(i), E2(i), F1(i), and F2(i) assigned
to a luminance signal and two color difference signals in correspondence
with equations (4), (5), and (6). FIG. 10 shows an example of a table in
which predictive parameters are assigned to only luminance signals. In
general, the number of bits of a color difference signal is not very
large compared with the number of bits of a luminance signal. For this
reason, in order to reduce the amount of calculation required to generate
a prediction picture and the number of bits transmitted in a table, a
table is prepared, in which predictive parameters for color difference
signals are omitted as shown in FIG. 10 and predictive parameters are
assigned to only luminance signals. In this case, only equation (4) is
used as a predictive equation.

[0074]Equations (7) to (12) are predictive equations in a case wherein a
plurality of (two in this case) reference pictures are used.

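$$P_Y(i) = (D_1(i) \times R_Y(i) + 2^{L_Y-1}) \gg L_Y + D_2(i) \qquad (7)$$

$$P_{Cb}(i) = (E_1(i) \times (R_{Cb}(i) - 128) + 2^{L_C-1}) \gg L_C + E_2(i) + 128 \qquad (8)$$

$$P_{Cr}(i) = (F_1(i) \times (R_{Cr}(i) - 128) + 2^{L_C-1}) \gg L_C + F_2(i) + 128 \qquad (9)$$

$$Y = \mathrm{clip}\bigl((P_Y(i) + P_Y(j) + 1) \gg 1\bigr) \qquad (10)$$

$$Cb = \mathrm{clip}\bigl((P_{Cb}(i) + P_{Cb}(j) + 1) \gg 1\bigr) \qquad (11)$$

$$Cr = \mathrm{clip}\bigl((P_{Cr}(i) + P_{Cr}(j) + 1) \gg 1\bigr) \qquad (12)$$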
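The two-reference case of equations (7) to (12) can be sketched per pixel as below. The function names and sample values are hypothetical, 8-bit samples are assumed, and the offset is again assumed to be added after the right shift. Note that clipping is applied only once, after the two per-reference terms have been averaged.

```python
def clip(value):
    """Clamp a value to the range 0 to 255."""
    return max(0, min(255, value))


def weighted_luma(ref_y, d1, d2, shift_y):
    """Equation (7): per-reference luminance term, not yet clipped."""
    return ((d1 * ref_y + (1 << (shift_y - 1))) >> shift_y) + d2


def weighted_chroma(ref_c, coeff, offset, shift_c):
    """Equations (8) and (9): per-reference color difference term, not yet clipped."""
    return ((coeff * (ref_c - 128) + (1 << (shift_c - 1))) >> shift_c) + offset + 128


def average(term_i, term_j):
    """Equations (10) to (12): rounded mean of the two terms, then clipping."""
    return clip((term_i + term_j + 1) >> 1)


# Hypothetical parameters for index i (D1 = 3, D2 = 5) and index j (D1 = 4, D2 = 0), LY = 2.
p_i = weighted_luma(100, 3, 5, 2)
p_j = weighted_luma(120, 4, 0, 2)
print(average(p_i, p_j))
```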

[0075]The pieces of information of the predictive parameters D1(i),
D2(i), E1(i), E2(i), F1(i), F2(i), LY, and
LC and Flag are values determined in advance between the video
encoding apparatus and the video decoding apparatus or encoded, together
with encoded data, in a unit of encoding such as a frame, field, or
slice, and are transmitted from the video encoding apparatus to the video
decoding apparatus. This allows the two apparatuses to share these pieces
of information.

[0076]If a picture to be decoded is a picture having a frame structure, a
reference picture used for motion compensation prediction is also managed
as a frame regardless of whether a decoded picture as a reference picture
has a frame structure or field structure. A reference picture number is
assigned to this picture. Likewise, if a picture to be decoded is a
picture having a field structure, a reference picture used for motion
compensation prediction is also managed as a field regardless of whether
a decoded picture as a reference picture has a frame structure or field
structure. A reference picture number is assigned to this picture.

(About Syntax of Index Information)

[0077]FIG. 11 shows an example of a syntax in a case wherein index
information is encoded in each block. First of all, mode information MODE
is present for each block. It is determined in accordance with the mode
information MODE whether or not index information IDi indicating the
value of index i and index information IDj indicating the value of index
j are encoded. Encoded information of motion vector information MVi for
the motion compensation prediction of index i and motion vector
information MVj for the motion compensation prediction of index j is
added as motion vector information for each block after encoded index
information.

(About Data Structure of Encoded Bit Stream)

[0078]FIG. 12 shows a specific example of an encoded bit stream for each
block when a prediction picture is generated by using one reference
picture. The index information IDi is set after mode information MODE,
and the motion vector information MVi is set thereafter. The motion
vector information MVi is generally two-dimensional vector information.
Depending on a motion compensation method in a block which is indicated
by mode information, a plurality of two-dimensional vectors may further
be sent.

[0079]FIG. 13 shows a specific example of an encoded bit stream for each
block when a prediction picture is generated by using two reference
pictures. Index information IDi and index information IDj are set after
mode information MODE, and motion vector information MVi and motion
vector information MVj are set thereafter. The motion vector information
MVi and motion vector information MVj are generally two-dimensional vector
information. Depending on a motion compensation method in a block
indicated by mode information, a plurality of two-dimensional vectors may
be further sent.
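The ordering of the per-block elements in FIGS. 12 and 13 can be sketched as a list built in transmission order. This is only an illustration: the function name and the mode labels are hypothetical, the presence of a second index stands in for the decision made from MODE, each motion vector is shown as a single two-dimensional vector, and no entropy coding is modelled.

```python
def block_syntax_elements(mode, id_i, mv_i, id_j=None, mv_j=None):
    """Return the per-block syntax elements in the order they are placed in the stream."""
    elements = [("MODE", mode), ("IDi", id_i)]
    if id_j is not None:
        # Two reference pictures: both indices precede the motion vectors.
        elements.append(("IDj", id_j))
    elements.append(("MVi", mv_i))
    if id_j is not None:
        elements.append(("MVj", mv_j))
    return elements


# One reference picture (layout of FIG. 12), hypothetical values.
print(block_syntax_elements("single", 0, (1, -2)))
# Two reference pictures (layout of FIG. 13), hypothetical values.
print(block_syntax_elements("double", 0, (1, -2), id_j=2, mv_j=(0, 3)))
```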

[0080]Note that the above structures of a syntax and bit stream can be
equally applied to all the embodiments.

Fourth Embodiment

[0081]The fourth embodiment of the present invention will be described
next with reference to FIGS. 14 and 15. Since the overall arrangements of
a video encoding apparatus and video decoding apparatus in this
embodiment are almost the same as those in the first embodiment, only
differences from the first, second, and third embodiments will be
described. In the third embodiment, encoding on a frame basis and
encoding on a field basis are switched for each picture. In the fourth
embodiment, encoding on a frame basis and encoding on a field basis are
switched for each macroblock.

[0082]When encoding on a frame basis and encoding on a field basis are
switched for each macroblock, the same reference picture number indicates
different pictures, even within the same picture, depending on whether a
macroblock is encoded on the frame basis or on the field basis. For this
reason, with the tables shown in FIGS. 9 and 10 used in the third
embodiment, a proper prediction picture signal may not be generated.

[0083]In order to solve this problem, in this embodiment, tables of
combinations of reference picture numbers and predictive parameters like
those shown in FIGS. 14 and 15 are prepared in a predictive parameter
controller 203 in FIG. 2. Assume that when a macroblock is to be encoded
on the field basis, the same predictive parameter as that corresponding
to a reference picture number (reference frame index number) used when
the macroblock is encoded on the frame basis is used.

[0084]FIG. 14 shows a table used when the macroblock is encoded on a field
basis and a picture to be encoded is a top field. The upper and lower
rows of each field index column correspond to the top field and bottom
field, respectively. As shown in FIG. 14, frame index j and field index k
are related such that k=2j for the top field and k=2j+1 for the bottom
field. Reference frame number m and reference field number n are related
such that n=2m for the top field and n=2m+1 for the bottom field.

[0085]FIG. 15 shows a table used when the macroblock is encoded on a field
basis, and a picture to be encoded is a bottom field. As in the table
shown in FIG. 14, the upper and lower rows of each field index column
correspond to a top field and the bottom field, respectively. In the
table in FIG. 15, frame index j and field index k are related such that
k=2j+1 for the top field and k=2j for the bottom field. This makes it
possible to assign a small value as field index k to an in-phase bottom
field. The relationship between reference frame number m and reference
field number n is the same as that in the table in FIG. 14.
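The index relationships described for FIGS. 14 and 15 can be sketched as two small mapping functions. The function and parameter names are hypothetical; the sketch only encodes the stated relations, in which the reference field with the same parity as the field being encoded receives the smaller field index.

```python
def field_index(frame_index, ref_is_top, current_is_top):
    """Map frame index j to field index k for a field-coded macroblock."""
    # Top field being encoded (FIG. 14): top -> 2j, bottom -> 2j + 1.
    # Bottom field being encoded (FIG. 15): top -> 2j + 1, bottom -> 2j.
    same_parity = ref_is_top == current_is_top
    return 2 * frame_index + (0 if same_parity else 1)


def reference_field_number(ref_frame_number, ref_is_top):
    """Map reference frame number m to reference field number n (same in both tables)."""
    return 2 * ref_frame_number + (0 if ref_is_top else 1)


# Frame index j = 3 while a bottom field is being encoded.
print(field_index(3, ref_is_top=True, current_is_top=False))
print(field_index(3, ref_is_top=False, current_is_top=False))
```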

[0086]When the macroblock is to be encoded on a field basis, a frame index
and field index are encoded as index information by using the tables
shown in FIGS. 14 and 15. When the macroblock is to be encoded on a frame
basis, only the frame index common to the tables in FIGS. 14 and 15 is
encoded as index information.

[0087]In this embodiment, predictive parameters are assigned to a frame
and field by using one table. However, a table for frames and a table for
fields may be separately prepared for one picture or slice.

[0088]Each embodiment described above has exemplified the video
encoding/decoding scheme using orthogonal transformation on a block
basis. Even if, however, another transformation technique such as wavelet
transformation is used, the technique of the present invention which has
been described in the above embodiments can be used.

[0089]Video encoding and decoding processing according to the present
invention may be implemented as hardware (apparatus) or software using a
computer. Some processing may be implemented by hardware, and the other
processing may be performed by software. According to the present
invention, there can be provided a program for causing a computer to
execute the above video encoding or video decoding or a storage medium
storing the program.