G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques

G10L19/16—Vocoder architecture

G10L19/18—Vocoders using multiple modes

Abstract

A maximum of Nmax coding bits is defined for a set of parameters that can be calculated from a signal frame. The parameters of a first subset are calculated and coded on N0 bits, where N0 < Nmax. An allocation of Nmax - N0 coding bits to the parameters of a second subset is determined, and the coding bits allocated to the parameters of the second subset are ranked. The allocation and/or the ranking order of the coding bits are determined as a function of the coded parameters of the first subset. For a total of N bits available for coding the set of parameters (N0 < N ≤ Nmax), the parameters of the second subset that are allocated the N - N0 coding bits ranked first in said order are selected. These selected parameters are calculated and coded to produce the N - N0 bits. The N0 coding bits of the first subset and the N - N0 coding bits of the selected parameters of the second subset are finally inserted into the output sequence of the coder.

Description


VARIABLE-RATE AUDIO ENCODING AND DECODING METHOD

The present invention relates to devices for coding and decoding audio signals, intended in particular for applications that transmit or store digitized and compressed audio signals (speech and/or sounds).

More particularly, the invention relates to audio coding systems capable of providing various bit rates, also called multirate coding systems. Such systems differ from fixed-rate coders in their ability to change the coding bit rate, possibly during processing, which makes them particularly suitable for transmission over heterogeneous access networks: IP networks mixing fixed and mobile access, broadband (ADSL) and low-rate (PSTN modems, GPRS) access, or networks involving terminals of varying capabilities (mobiles, PCs, ...).

There are essentially two categories of multirate coders: "switchable" multirate coders and "hierarchical" coders.

"Switchable" multirate coders rely on a coding architecture belonging to one technological family (time-domain or frequency-domain coding, for example CELP, sinusoidal or transform coding), in which an indication of the bit rate is supplied simultaneously to the coder and to the decoder. The coder uses this information to select the parts of the algorithm and the tables relevant to the selected rate. The decoder operates symmetrically. Many switchable multirate coding structures have been proposed for audio coding. This is the case for the mobile coders standardized by the 3GPP organization ("3rd Generation Partnership Project"): NB-AMR ("Narrow Band Adaptive Multi-Rate", Technical Specification 3GPP TS 26.090, version 5.0.0, June 2002) in the telephone band, and WB-AMR ("Wide Band Adaptive Multi-Rate", Technical Specification 3GPP TS 26.190, version 5.1.0, December 2001) in wideband. These coders operate over fairly wide ranges of bit rates (4.75 to 12.2 kbit/s for NB-AMR, and 6.60 to 23.85 kbit/s for WB-AMR), with a fairly fine granularity (8 rates for NB-AMR and 9 for WB-AMR). However, the price to pay for this flexibility is considerable structural complexity: to accommodate all these rates, the coders must support many different options, various quantization tables, and so on. The performance curve rises gradually with the bit rate, but the progression is not linear and some rates are inherently better optimized than others.

In so-called "hierarchical" coding systems, also called "scalable", the binary data produced by the coding operation are divided into successive layers. A base layer, also called the "core", is formed of the binary elements strictly necessary for decoding the bitstream, and determines a minimum decoding quality.

The following layers make it possible to improve progressively the quality of the decoded signal, each new layer bringing new information which, when exploited by the decoder, yields an output signal of increasing quality.

One of the features of hierarchical coding is the possibility of intervening at any level of the transmission or storage chain to remove part of the bitstream without having to provide any specific indication to the coder or to the decoder. The decoder uses the binary information that it receives and produces a signal of corresponding quality.

The field of hierarchical coding structures has also given rise to numerous studies. Some hierarchical coding structures operate from a single type of coder, designed to deliver hierarchically ordered coded information. When the additional layers improve the quality of the output signal without changing its bandwidth, one speaks rather of "embedded coders" (see for example R. D. Iacovo et al., "Embedded CELP Coding for Variable Bit-Rate Between 6.4 and 9.6 kbit/s", Proc. ICASSP, 1991, pp. 681-686). This type of coder does not, however, allow large gaps between the lowest and the highest available bit rate.

The hierarchy is often used to increase the bandwidth of the signal progressively: the core provides a baseband signal, for example in the telephone band (300-3400 Hz), and the following layers code additional frequency bands (for example, wideband up to 7 kHz, HiFi band up to 20 kHz, or intermediate bands). Subband coders and coders using a time-frequency transform, such as those described in "Subband / Transform coding using filter banks design based on time domain aliasing cancellation" by J. P. Princen et al. (Proc. IEEE ICASSP-87, pp. 2161-2164) and "High Quality Audio Transform Coding at 64 kbit/s" by Y. Mahieux et al. (IEEE Trans. Commun., Vol. 42, No. 11, November 1994, pp. 3010-3019), are particularly suitable for such operations.

It is also common to use a different coding technique for the core and for the module or modules coding the additional layers; one then speaks of different coding stages, each stage consisting of a sub-coder. The sub-coder of a given stage may either code the parts of the signal not coded by the preceding stages, or code the coding residue of the previous stage, the residue being obtained by subtracting the decoded signal from the original signal.

The advantage of such structures is that they can go down to relatively low bit rates with sufficient quality, while producing good quality at high bit rates. Indeed, the techniques used for low bit rates are generally not effective at high bit rates, and vice versa.

Such structures, which make it possible to use two different technologies (for example CELP and time-frequency transform), are particularly effective for covering wide ranges of bit rates.

However, the hierarchical coding structures proposed in the prior art precisely define the bit rate allocated to each of the intermediate layers. Each layer corresponds to the coding of certain parameters, and the granularity of the hierarchical bitstream depends on the bit rate allocated to these parameters (typically, a layer may contain on the order of a few tens of bits per frame, a signal frame consisting of a number of signal samples over a given duration; in the example described below, a frame consists of 960 samples corresponding to 60 ms of signal).

In addition, when the bandwidth of the decoded signals can vary according to the number of bit layers received, changing the line rate can produce artifacts that are annoying to the listener.

The present invention is intended to provide a multirate coding solution that overcomes the disadvantages mentioned in the case of the use of existing hierarchical and switchable codings.

The invention thus provides a method of coding a digital audio signal frame into a binary output sequence, wherein a maximum number Nmax of coding bits is defined for a set of parameters calculable from the signal frame, this set being composed of a first and a second subset. The proposed method comprises the following steps:

- the parameters of the first subset are calculated, and these parameters are coded on a number N0 of coding bits such that N0 < Nmax;

- an allocation of Nmax - N0 coding bits to the parameters of the second subset is determined; and

- the Nmax - N0 coding bits allocated to the parameters of the second subset are ranked in a determined order.

The allocation and/or the ranking order of the Nmax - N0 coding bits are determined as a function of the coded parameters of the first subset. The coding method further comprises the following steps, in response to an indication of a number N of bits of the binary output sequence available for coding said set of parameters, with N0 < N ≤ Nmax:

- the parameters of the second subset that are allocated the N - N0 coding bits ranked first in said order are selected;

- the selected parameters of the second subset are calculated, and these parameters are coded to produce the N - N0 coding bits ranked first; and

- the N0 coding bits of the first subset and the N - N0 coding bits of the selected parameters of the second subset are inserted into the output sequence.
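The steps above can be illustrated with a minimal Python sketch. It is not the coder of the invention: the function name `encode_frame`, the representation of bits as strings of '0' and '1', and the callback `encode_param` are assumptions made for the illustration. The allocation and the ranking order are passed in as inputs, whereas in the method they are derived from the coded parameters of the first subset.

```python
from typing import Callable, Sequence


def encode_frame(first_subset_bits: str,
                 allocation: Sequence[int],
                 order: Sequence[int],
                 encode_param: Callable[[int, int], str],
                 n_available: int) -> str:
    """Build an output sequence of exactly n_available bits.

    first_subset_bits: the N0 coding bits of the first subset.
    allocation[p]: bits allocated to parameter p of the second subset
                   (the allocations sum to Nmax - N0).
    order: parameter indices, ranked first to last.
    encode_param(p, nbits): hypothetical quantizer returning nbits bits.
    """
    n0 = len(first_subset_bits)
    n_max = n0 + sum(allocation)
    if not n0 < n_available <= n_max:
        raise ValueError("N must satisfy N0 < N <= Nmax")

    remaining = n_available - n0
    chunks = []
    for p in order:
        if remaining <= 0:
            break  # parameters ranked later are not calculated at all
        bits = encode_param(p, allocation[p])
        chunks.append(bits[:remaining])  # the last parameter may be cut short
        remaining -= len(chunks[-1])
    return first_subset_bits + "".join(chunks)


def toy_quantizer(p: int, nbits: int) -> str:
    """Stand-in quantizer: writes the parameter index on nbits bits."""
    return format(p, "b").zfill(nbits)[-nbits:]


full = encode_frame("11", [3, 2, 4], [2, 0, 1], toy_quantizer, 11)
short = encode_frame("11", [3, 2, 4], [2, 0, 1], toy_quantizer, 8)
```

In this toy run the 8-bit sequence is a prefix of the 11-bit one, which is the property that lets the bit rate be varied frame by frame without changing the allocation.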

The method according to the invention makes it possible to define a multirate coding that operates at least over a range corresponding, for each frame, to a number of bits from N0 to Nmax.

It can thus be considered that the notion of preset bit rates, which is tied to existing switchable and hierarchical codings, is replaced by a notion of "cursor", making it possible to vary the bit rate freely between a minimum value (which may optionally correspond to a number of bits N less than N0) and a maximum value (corresponding to Nmax). These extremes may be far apart. The method offers good coding efficiency regardless of the selected bit rate.

Advantageously, the number N of bits of the binary output sequence is strictly less than Nmax. The coder is then notable in that the bit allocation it uses does not refer to the actual output rate of the coder, but to another number Nmax agreed with the decoder.

It is nevertheless possible to set N = Nmax, for example as a function of the instantaneous bit rate available on a transmission channel. The output sequence of such a switchable multirate coder can be processed by a decoder that does not receive the entire sequence, provided that it is able to recover the structure of the coding bits of the second subset through knowledge of Nmax.

Another case where N = Nmax may apply is the storage of audio data at the maximum coding rate. When N' bits of this stored content are read at a lower rate, the decoder can recover the structure of the coding bits of the second subset as long as N' ≥ N0.

The ranking order of the coding bits allocated to the parameters of the second subset may be a pre-established order.

In a preferred embodiment, the ranking order of the coding bits allocated to the parameters of the second subset is variable. It may in particular be an order of decreasing importance determined as a function of at least the coded parameters of the first subset. Thus a decoder that receives a binary sequence of N' bits for the frame, with N0 ≤ N' ≤ N ≤ Nmax, can deduce this order from the N0 bits received for the coding of the first subset.

The allocation of the Nmax - N0 coding bits to the parameters of the second subset may be fixed (in which case the ranking of these bits depends at least on the coded parameters of the first subset).

In a preferred embodiment, the allocation of Nmax - N0 coding bits of the parameters of the second subset is a function of the coded parameters of the first subset.

Advantageously, the ranking order of the coding bits allocated to the parameters of the second subset is determined with the aid of at least one psychoacoustic criterion as a function of the coded parameters of the first subset.

The parameters of the second subset can relate to spectral bands of the signal. In this case, the method advantageously comprises a step of estimating a spectral envelope of the signal coded from the coded parameters of the first subset and a step of computing a frequency masking curve by applying an auditory perception model to the estimated spectral envelope, and the psychoacoustic criterion makes reference to the level of the estimated spectral envelope with respect to the masking curve in each spectral band.

In one embodiment, the coding bits are ordered in the output sequence in such a way that the N0 coding bits of the first subset precede the N - N0 coding bits of the selected parameters of the second subset, and that the respective coding bits of the selected parameters of the second subset appear in the order determined for said coding bits. If the binary sequence is truncated, this allows the most important part to be received.

The number N may vary from one frame to another, including for example according to the available capacity of the transmission resource.

Multirate audio coding according to the present invention may be used in a very flexible switchable or hierarchical mode, since any number of bits to be transmitted, freely chosen between N0 and Nmax, can be selected at any time, that is to say frame by frame.

The coding of the first subset of parameters can be at a variable rate, so that the number N0 varies from one frame to another. This allows the distribution of bits to be better adjusted to the frames to be coded.

In one embodiment, the first subset comprises parameters calculated by a core coder. Advantageously, the core coder has an operating frequency band narrower than the bandwidth of the signal to be coded, and the first subset further comprises energy levels of the audio signal associated with frequency bands higher than the operating band of the core coder. This type of structure is that of a two-level hierarchical coder, which delivers, for example via the core coder, a coded signal of a quality considered sufficient and which, depending on the available bit rate, supplements the coding performed by the core coder with additional information produced by the coding method according to the invention.

Preferably, the coding bits of the first subset are then ordered in the output sequence so that the coding bits of the parameters calculated by the core coder are immediately followed by the coding bits of the energy levels associated with the higher frequency bands. This ensures the same bandwidth for successively coded frames as soon as the decoder receives enough bits to have the core coder information and the coded energy levels associated with the higher frequency bands.

In one embodiment, a difference signal between the signal to be coded and a synthesis signal derived from the coded parameters produced by the core coder is estimated, and the first subset further comprises energy levels of the difference signal associated with frequency bands included in the operating band of the core coder.

A second aspect of the invention relates to a method of decoding a binary input sequence in order to synthesize a digital audio signal, corresponding to the decoding of a frame coded according to the coding method of the invention. According to this method, a maximum number Nmax of coding bits is defined for a set of parameters describing a signal frame, this set being composed of a first and a second subset. The input sequence comprises, for a signal frame, a number N' of coding bits for said set of parameters, with N' ≤ Nmax. The decoding method according to the invention comprises the following steps:

- extracting, from said N' bits of the input sequence, a number N0 of coding bits of the parameters of the first subset if N0 < N';

- recovering the parameters of the first subset on the basis of said N0 extracted coding bits;

- determining an allocation of Nmax - N0 coding bits to the parameters of the second subset; and

- ranking the Nmax - N0 coding bits allocated to the parameters of the second subset in a determined order.

The allocation and/or the ranking order of the Nmax - N0 coding bits are determined as a function of the recovered parameters of the first subset. The decoding method further comprises the following steps:

- selecting the parameters of the second subset that are allocated the N' - N0 coding bits ranked first in said order;

- extracting, from said N' bits of the input sequence, N' - N0 coding bits of the selected parameters of the second subset;

- recovering the selected parameters of the second subset on the basis of said N' - N0 extracted coding bits; and

- synthesizing the signal frame using the recovered parameters of the first and second subsets.
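The decoding steps can be sketched in Python as follows, under the same illustrative assumptions as a toy encoder would use: bits are strings of '0' and '1', and the allocation and ranking order (which the decoder would recompute from the recovered first subset) are passed in directly. The function name `parse_frame` is hypothetical, and a partially received parameter is simply treated as missing here.

```python
from typing import Dict, List, Sequence, Tuple


def parse_frame(received: str,
                n0: int,
                allocation: Sequence[int],
                order: Sequence[int]) -> Tuple[str, Dict[int, str], List[int]]:
    """Split the N' received bits into first-subset bits and per-parameter bits.

    Returns (first_subset_bits, bits_of_received_parameters, missing_parameters).
    """
    if len(received) < n0:
        raise ValueError("fewer than N0 bits received: first subset incomplete")

    first_subset_bits = received[:n0]
    position = n0
    truncated = False
    received_params: Dict[int, str] = {}
    missing: List[int] = []

    for p in order:  # same ranking order as at the encoder
        end = position + allocation[p]
        if not truncated and end <= len(received):
            received_params[p] = received[position:end]
            position = end
        else:
            truncated = True  # everything ranked after the cut is missing
            missing.append(p)
    return first_subset_bits, received_params, missing


first, got, lost = parse_frame("11001000", 2, [3, 2, 4], [2, 0, 1])
```

With 8 bits received out of a possible 11, only the parameter ranked first is complete; the two others are reported as missing and would have to be regenerated.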

This decoding method is advantageously combined with procedures for regenerating the parameters that are missing because of the truncation of the sequence of Nmax bits produced, virtually or not, by the coder.

A third aspect of the invention relates to an audio encoder, comprising digital signal processing means arranged to implement an encoding method according to the invention.

Another aspect of the invention relates to an audio decoder, comprising digital signal processing means arranged to implement a decoding method according to the invention.

Other features and advantages of the present invention will appear from the following description of nonlimiting exemplary embodiments, with reference to the accompanying drawings, in which:

- Figure 1 is a block diagram of an exemplary audio coder according to the invention;

- Figure 2 shows a binary output sequence of N bits in one embodiment of the invention; and

- Figure 3 is a block diagram of an audio decoder according to the invention.

The coder shown in Figure 1 has a hierarchical structure with two coding stages. A first coding stage 1 consists, for example, of a telephone-band (300-3400 Hz) core coder of the CELP type. In this example, this coder is a G.723.1 coder standardized by the ITU-T ("International Telecommunication Union") operating in fixed mode at 6.4 kbit/s. It calculates the G.723.1 parameters in accordance with the standard and quantizes them by means of 192 coding bits P1 per 30 ms frame.

The second coding stage 2, which increases the bandwidth to wideband (50-7000 Hz), operates on the coding residue E of the first stage, supplied by a subtractor 3 in the diagram of Figure 1. A signal synchronization module 4 delays the audio signal frame S by the time taken by the processing of the core coder 1. Its output is sent to the subtractor 3, which subtracts from it the synthetic signal S', equal to the output of the core decoder operating on the quantized parameters represented by the output bits P1 of the core coder. As usual, the coder 1 incorporates a local decoder providing S'.

The audio signal S to be coded has, for example, a bandwidth of 7 kHz and is sampled at 16 kHz. A frame consists, for example, of 960 samples, that is 60 ms of signal or two elementary frames of the G.723.1 core coder. Since the latter operates on signals sampled at 8 kHz, the signal S is downsampled by a factor of 2 at the input of the core coder 1. Similarly, the synthetic signal S' is upsampled to 16 kHz at the output of the core coder 1.

The second stage 2 comprises a time-frequency transformation module 5, for example of the MDCT ("Modified Discrete Cosine Transform") type, to which the residue E obtained by the subtractor 3 is sent. In practice, the operation of the modules 3 and 5 shown in Figure 1 can be carried out by performing the following operations for each 20 ms sub-frame:

- MDCT transformation of the input signal S delayed by the module 4, which provides 320 MDCT coefficients. Since the spectrum is limited to 7225 Hz, only the first 289 MDCT coefficients are different from 0;

- MDCT transformation of the synthetic signal S'. As this is the spectrum of a telephone-band signal, only the first 139 MDCT coefficients are different from 0 (up to 3450 Hz); and

- calculation of the difference spectrum between the above spectra.
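The frame and sub-frame figures quoted above are mutually consistent, as the following short Python check suggests. It assumes that the 320 MDCT coefficients of a sub-frame are spread evenly between 0 Hz and half the sampling frequency, which the text does not state explicitly; the variable names are chosen for the illustration only.

```python
SAMPLE_RATE_HZ = 16_000      # sampling frequency of the signal S
FRAME_SAMPLES = 960          # samples per frame
SUBFRAME_MS = 20             # duration of one MDCT sub-frame
CORE_RATE_BPS = 6_400        # G.723.1 rate used in the example
CORE_FRAME_MS = 30           # G.723.1 frame duration

frame_ms = 1000 * FRAME_SAMPLES // SAMPLE_RATE_HZ            # frame duration
subframes_per_frame = frame_ms // SUBFRAME_MS                # MDCT sub-frames
coeffs_per_subframe = SAMPLE_RATE_HZ * SUBFRAME_MS // 1000   # MDCT coefficients
core_frames_per_frame = frame_ms // CORE_FRAME_MS            # G.723.1 frames
core_bits_per_frame = CORE_RATE_BPS * CORE_FRAME_MS // 1000  # bits P1

# Assumed uniform spacing of the coefficients over 0 .. Nyquist.
bin_width_hz = (SAMPLE_RATE_HZ / 2) / coeffs_per_subframe
upper_edge_hz = 289 * bin_width_hz   # upper edge of the 289 non-zero bins

print(frame_ms, subframes_per_frame, coeffs_per_subframe,
      core_frames_per_frame, core_bits_per_frame, bin_width_hz, upper_edge_hz)
```

Under that assumption each coefficient covers 25 Hz, so 289 coefficients reach the 7225 Hz limit mentioned for the input signal.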

The resulting spectrum is divided into several bands of different widths by a module 6. For example, the bandwidth of the G.723.1 coder may be divided into 21 bands, while the higher frequencies are divided into 11 additional bands. In these 11 additional bands, the residue E is identical to the input signal S.

A module 7 performs the coding of the spectral envelope of the residue E. It begins by calculating the energy of the MDCT coefficients of each band of the difference spectrum. These energies are hereinafter called "scale factors". The 32 scale factors form the spectral envelope of the difference signal. The module 7 then quantizes them in two parts. The first part corresponds to the telephone band (first 21 bands, 0-3450 Hz), the second to the high bands (last 11 bands, 3450-7225 Hz). In each part, the first scale factor is quantized in absolute terms and the following ones differentially, using conventional variable-rate Huffman coding. The 32 scale factors are quantized on a variable number N2(i) of bits P2 for each sub-frame of rank i (i = 1, 2, 3).
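As a rough illustration of the envelope coding performed by the module 7, the Python sketch below computes per-band energies and then represents them as one absolute index followed by differences. The 3 dB quantization step, the energy floor and all function names are assumptions made for the example, and the Huffman coding of the resulting indices is left out.

```python
import math
from typing import List, Sequence


def band_energies(coeffs: Sequence[float], band_edges: Sequence[int]) -> List[float]:
    """Energy of the MDCT coefficients in each band [edge[k], edge[k+1])."""
    return [sum(c * c for c in coeffs[lo:hi])
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]


def differential_indices(energies: Sequence[float],
                         step_db: float = 3.0,
                         floor: float = 1e-12) -> List[int]:
    """First level coded in absolute terms, the following ones as differences.

    step_db and floor are illustrative values, not taken from the text.
    """
    levels = [round(10.0 * math.log10(max(e, floor)) / step_db) for e in energies]
    return [levels[0]] + [b - a for a, b in zip(levels, levels[1:])]


def reconstruct_levels(indices: Sequence[int]) -> List[int]:
    """Inverse of differential_indices: cumulative sum of the indices."""
    levels: List[int] = []
    for k, d in enumerate(indices):
        levels.append(d if k == 0 else levels[-1] + d)
    return levels


energies = band_energies([1.0, 1.0, 2.0, 2.0, 0.0, 10.0], [0, 2, 4, 6])
indices = differential_indices(energies)
levels = reconstruct_levels(indices)
```

Differences between neighbouring bands tend to be small, which is what makes a variable-rate entropy code such as Huffman coding attractive for these indices.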

A module 8 normalizes the MDCT coefficients divided into bands by the module 6, by dividing them by the quantized scale factors FQ respectively determined for these bands. The spectra thus normalized are supplied to the quantization module 9, which uses a vector quantization scheme of a known type. The quantization bits produced by the module 9 are denoted P3 in Figure 1.

An output multiplexer 10 collects the bits P1, P2 and P3 from the modules 1, 7 and 9 to form the binary output sequence Φ of the coder.

According to the invention, the total number N of bits of the output sequence representing a current frame is not necessarily equal to Nmax. It may be smaller. However, the allocation of the quantization bits to the bands is performed on the basis of the number Nmax.

In the diagram of Figure 1, this allocation is performed for each sub-frame by the module 12 from the number Nmax - N0, the scale factors quantized FQ and a spectral masking curve calculated by a module 11.

The module 11 operates as follows. It first determines an approximate value of the original spectral envelope of the signal S from the envelope of the difference signal, as quantized by the module 7, and from the envelope that it determines with the same resolution for the synthetic signal S' produced by the core coder. These last two envelopes can also be determined by a decoder that has only the parameters of the aforementioned first subset. Thus, the estimated spectral envelope of the signal S will also be available at the decoder. The module 11 then calculates a spectral masking curve by applying, in a manner known per se, a band-by-band auditory perception model to the estimated original spectral envelope. This curve provides a masking level for each band.

The module 12 performs a dynamic allocation of the Nmax - N0 remaining bits of the sequence Φ among the 3 × 32 bands of the three MDCT transforms of the difference signal. In the implementation of the invention described here, according to a psychoacoustic criterion of perceptual importance referring to the level of the estimated spectral envelope relative to the masking curve in each band, each band is allocated a bit rate proportional to this level. Other ranking criteria could be used.
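A proportional allocation of this kind might be sketched in Python as follows. The text only states that the rate of each band is proportional to the level of the envelope relative to the masking curve; expressing that level in dB, clamping negative values to zero, falling back to a uniform split, and distributing the rounding leftovers by largest remainder are all assumptions of this sketch, as is the name `allocate_bits`.

```python
from typing import List, Sequence


def allocate_bits(level_db: Sequence[float], total_bits: int) -> List[int]:
    """Share total_bits among bands in proportion to their level above masking.

    level_db[b]: estimated envelope level relative to the masking curve, in dB.
    The returned allocations are integers that sum exactly to total_bits.
    """
    weights = [max(x, 0.0) for x in level_db]   # masked bands get no weight
    weight_sum = sum(weights)
    if weight_sum == 0.0:                       # degenerate case: uniform split
        weights = [1.0] * len(level_db)
        weight_sum = float(len(level_db))

    exact = [total_bits * w / weight_sum for w in weights]
    alloc = [int(x) for x in exact]             # round down first
    leftover = total_bits - sum(alloc)

    # Hand the remaining bits to the bands with the largest fractional parts.
    by_remainder = sorted(range(len(exact)),
                          key=lambda i: exact[i] - alloc[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


even_split = allocate_bits([6.0, 3.0, -2.0, 1.0], 20)
with_rounding = allocate_bits([6.0, 3.0, -2.0, 1.0], 7)
```

Whatever rounding rule is chosen, the coder and the decoder must apply exactly the same one, since the decoder recomputes the allocation on its own from the first subset.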

Following this bit allocation, the module 9 knows how many bits are to be used for the quantization of each band in each sub-frame.

However, if N < Nmax, these allocated bits will not necessarily all be used. The bits representing the bands are ordered by a module 13 according to a criterion of perceptual importance. The module 13 ranks the 3 × 32 bands in order of decreasing importance, which may be the decreasing order of the signal-to-mask ratios (ratio between the estimated spectral envelope and the masking curve in each band). This order is used for the construction of the binary sequence Φ in accordance with the invention.
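The ranking performed by the module 13 can be pictured with a few lines of Python. The sketch assumes that the envelope and the masking curve are given in dB, so that the signal-to-mask ratio becomes a difference; the function name `rank_bands` and the tie-breaking rule (lower band index first) are choices made for the illustration.

```python
from typing import List, Sequence


def rank_bands(envelope_db: Sequence[float], masking_db: Sequence[float]) -> List[int]:
    """Band indices sorted by decreasing signal-to-mask ratio.

    Python's sort is stable, so bands with equal ratios keep their
    original (increasing index) order.
    """
    smr_db = [e - m for e, m in zip(envelope_db, masking_db)]
    return sorted(range(len(smr_db)), key=lambda b: smr_db[b], reverse=True)


order = rank_bands([40.0, 55.0, 30.0, 50.0], [35.0, 40.0, 32.0, 45.0])
```

A deterministic tie-break matters in practice: the decoder has to reproduce the same order from the same quantized envelope, without any side information.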

Depending on the number N of bits desired in the sequence Φ for coding the current frame, the bands to be quantized by the module 9 are determined by selecting the bands ranked first by the module 13 and by retaining for each selected band the number of bits determined by the module 12.

The MDCT coefficients of each selected band are then quantized by the module 9, for example using a vector quantizer, in accordance with the number of bits allocated, so as to produce a total number of bits equal to N - N0.

The output of the multiplexer 10 is the binary sequence Φ consisting of the first N bits of the following ordered sequence, shown in Figure 2 (case where N = Nmax):

a/ first, the bitstreams corresponding to the two G.723.1 frames (384 bits);

b/ then the quantization bits of the scale factors for the three sub-frames (i = 1, 2, 3), from the 22nd spectral band (first band beyond the telephone band) to the 32nd band (variable-rate Huffman coding);

c/ then the quantization bits of the scale factors for the three sub-frames, from the 1st spectral band to the 21st band (variable-rate Huffman coding);

d/ finally, the vector quantization indices of the 96 bands in order of perceptual importance, from the most important band to the least important band, in the order determined by the module 13.

Placing the G.723.1 parameters and the scale factors of the high bands first (a and b) makes it possible to keep the same bandwidth for the signal that the decoder can reconstruct, regardless of the actual bit rate, beyond a minimum value corresponding to the reception of groups a and b. This minimum value, sufficient for the Huffman coding of the 3 × 11 = 33 scale factors of the high bands in addition to the G.723.1 coding, is for example 8 kbit/s.
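The bit budget implied by this example can be checked with a short Python calculation. Only the 60 ms frame, the 384 bits of the two G.723.1 frames, the 33 high-band scale factors and the 8 kbit/s figure come from the text; the helper name `bits_per_frame` and the derived per-factor average are illustrative.

```python
FRAME_MS = 60                 # frame duration in the example
CORE_BITS = 2 * 192           # part a: two G.723.1 frames
HIGH_BAND_FACTORS = 3 * 11    # part b: 11 high bands in each of 3 sub-frames


def bits_per_frame(rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Number of bits available per frame at a given bit rate."""
    return rate_bps * frame_ms // 1000


budget_at_8k = bits_per_frame(8_000)           # bits per frame at 8 kbit/s
left_for_part_b = budget_at_8k - CORE_BITS     # bits left for part b
avg_bits_per_factor = left_for_part_b / HIGH_BAND_FACTORS

print(budget_at_8k, left_for_part_b, round(avg_bits_per_factor, 2))
```

At 8 kbit/s a frame carries 480 bits, leaving 96 bits after the core, that is just under 3 bits on average for each of the 33 Huffman-coded high-band scale factors.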

The above coding method enables the frame to be decoded if the decoder receives N' bits with N0 ≤ N' ≤ N. This number N' will generally vary from one frame to another.

A decoder according to the invention corresponding to this example is illustrated in Figure 3. A demultiplexer 20 separates the received binary sequence Φ' in order to extract the coding bits P1 and P2. The 384 bits P1 are supplied to the G.723.1-type core decoder 21 so that it synthesizes two frames of the base signal S' in the telephone band. The bits P2 are decoded according to the Huffman algorithm by a module 22, which thus recovers the quantized scale factors FQ for each of the three sub-frames.

A module 23 for calculating the masking curve, identical to the module 11 of the coder of Figure 1, receives the base signal S' and the quantized scale factors FQ and produces the spectral masking levels for each of the 96 bands. From these masking levels, from the quantized scale factors FQ and from knowledge of the number Nmax (as well as of the number N0, which is deduced from the Huffman decoding of the bits P2 by the module 22), a module 24 determines a bit allocation in the same way as the module 12 of Figure 1. In addition, a module 25 orders the bands according to the same ranking criterion as the module 13 described with reference to Figure 1.

According to the information provided by the modules 24 and 25, the module 26 extracts the bits P3 from the input sequence Φ' and synthesizes the normalized MDCT coefficients for the bands represented in the sequence Φ'. Where applicable (N' < Nmax), the normalized MDCT coefficients relating to the missing bands may also be synthesized by interpolation or extrapolation, as described below (module 27). These missing bands may have been eliminated by the coder because of a truncation N < Nmax, or they may have been eliminated during transmission (N' < N).

The normalized MDCT coefficients, synthesized by the module 26 and/or the module 27, are multiplied by their respective quantized scale factors (multiplier 28) before being presented to the module 29, which performs the frequency-time transformation that is the inverse of the MDCT performed by the module 5 of the coder. The resulting temporal correction signal is added to the synthetic signal S' delivered by the core decoder 21 (adder 30) to produce the output audio signal Ŝ of the decoder.

It should be noted that the decoder can synthesize a signal Ŝ even in cases where it does not receive the first N0 bits of the sequence. It is sufficient for it to receive the 2 × N1 bits corresponding to part a of the above enumeration, decoding then taking place in a "degraded" mode.

This degraded mode is the only one that does not use the MDCT synthesis to obtain the decoded signal. To ensure seamless switching between this mode and the other modes, the decoder performs three MDCT analyses followed by three MDCT syntheses, which keeps the memories of the MDCT up to date. The output signal is of telephone-band quality. If even the first 2 × N1 bits are not received, the decoder considers the corresponding frame as erased and can use a known erased-frame concealment algorithm.

If the decoder receives the 2 × N1 bits corresponding to part a plus bits of part b (high bands of the three spectral envelopes), it can begin to synthesize a wideband signal. It may in particular proceed as follows.

1/ The module 22 recovers the parts of the three spectral envelopes that were received.

2/ The bands not received have their scale factor temporarily set to zero.

3/ The lower parts of the spectral envelopes are calculated from the MDCT analyses performed on the signal obtained after the G.723.1 decoding, and the module 23 calculates the three masking curves on the envelopes thus obtained.

4/ The spectral envelope is corrected in order to regularize it, avoiding the holes due to the bands not received: the zero values in the upper part of the spectral envelopes FQ are replaced, for example, by one hundredth of the value of the masking curve calculated previously, so that they remain inaudible. The full spectrum of the low bands and the spectral envelope of the high bands are known at this stage.

5/ The module 27 then generates the high spectrum. The fine structure of these bands is generated by reflection of the fine structure of its known neighborhood, before weighting by the scale factors (multiplier 28). In the case where none of the bits P3 is received, the "known neighborhood" corresponds to the spectrum of the signal S' produced by the G.723.1 core decoder. Its "reflection" may consist in copying the values of the normalized MDCT spectrum, possibly with an attenuation of its variations proportional to the distance from the "known neighborhood".

6/ After the inverse MDCT transformation (29) and the addition (30) of the resulting correction signal to the output signal of the core decoder, the synthesized wideband signal is obtained.
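Steps 4 and 5 above can be sketched in Python under simplifying assumptions: all bands are taken to have the same number of coefficients, the "known neighborhood" is reduced to the single nearest known band in the same sub-frame, and the attenuation is modeled as a gain that decreases linearly with the distance in bands. The function names and the `damping` parameter are hypothetical; only the one-hundredth factor comes from the text.

```python
from typing import Dict, List, Sequence


def fill_missing_scale_factors(scale_factors: Sequence[float],
                               masking: Sequence[float],
                               fraction: float = 0.01) -> List[float]:
    """Step 4: replace zeroed scale factors by a fraction of the masking level."""
    return [fraction * m if sf == 0 else sf
            for sf, m in zip(scale_factors, masking)]


def regenerate_band(known_bands: Dict[int, List[float]],
                    missing_index: int,
                    damping: float = 0.1) -> List[float]:
    """Step 5: copy the normalized coefficients of the nearest known band.

    The copy is attenuated in proportion to the distance (in bands) from
    the known neighborhood; damping is an illustrative parameter.
    """
    nearest = min(known_bands, key=lambda b: (abs(b - missing_index), b))
    distance = abs(nearest - missing_index)
    gain = max(0.0, 1.0 - damping * distance)
    return [gain * c for c in known_bands[nearest]]


envelope = fill_missing_scale_factors([4.0, 0.0, 2.0], [10.0, 50.0, 30.0])
band_3 = regenerate_band({2: [1.0, -0.5], 5: [0.2, 0.4]}, 3, damping=0.25)
```

The regenerated coefficients would then be weighted by the corrected scale factors, so the filled-in bands stay below the masking curve.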

In the case where the decoder also receives at least part of the low spectral envelope of the difference signal (part c), it may or may not take this information into account to refine the spectral envelope in step 3.

If the decoder receives enough bits P3 to decode at least the MDCT coefficients of the most important band, ranked first in the last part of the sequence, then the module 26 recovers some of the normalized MDCT coefficients according to the allocation and the ordering indicated by the modules 24 and 25. These MDCT coefficients therefore do not need to be interpolated as in step 5 above. For the other bands, the process of steps 1 to 6 is applied by the module 27 in the same manner as above, the knowledge of the MDCT coefficients received for certain bands allowing a more reliable interpolation in step 5.

The bands not received may vary from one MDCT sub-frame to the next. The "known neighborhood" of a missing band may correspond to the same band in another sub-frame where it is not missing, and/or to one or more bands that are nearest in the frequency domain within the same sub-frame. It is also possible to regenerate a missing MDCT spectrum in a band for a sub-frame by a weighted sum of contributions evaluated from several bands/sub-frames of the "known neighborhood".

Insofar as the actual rate of N' bits per frame places the last bit of a given frame at an arbitrary position, the last coded parameter transmitted may, depending on the case, be transmitted completely or only partially. Two cases can then arise:

- either the coding structure adopted makes it possible to exploit the partial information received (case of scalar quantizers, or of vector quantization with partitioned dictionaries);

- or it does not, and the incompletely received parameter is treated like the other parameters not received. Note that, in the latter case, since the order of the bits varies with each frame, the number of bits thus lost is variable, and the selection of N' bits will produce, on average over all the decoded frames, a better quality than would be obtained with a smaller number of bits.

Claims

1. A method of coding a digital audio signal frame (S) into a binary output sequence (Φ), wherein a maximum number Nmax of coding bits is defined for a set of parameters calculable from the signal frame, consisting of a first and a second subset, the method comprising the steps of:

- calculating the parameters of the first subset, and coding these parameters on a number N0 of coding bits such that N0 < Nmax;

- determining an allocation of Nmax - N0 coding bits for the parameters of the second subset; and

- classifying the Nmax - N0 coding bits allocated to the parameters of the second subset in a determined order, wherein the allocation and/or the order of ranking of the Nmax - N0 coding bits is determined as a function of the coded parameters of the first subset, the method further comprising the following steps in response to an indication of a number N of bits of the binary output sequence available for coding said set of parameters, with N0 < N ≤ Nmax:

- selecting parameters of the second subset that are allocated the N - N0 coding bits ranked first in said order;

- calculating the selected parameters of the second subset, and coding these parameters to produce said N - N0 coding bits ranked first; and

- inserting into the output sequence the N0 coding bits of the first subset as well as the N - N0 coding bits of the selected parameters of the second subset.

2. The method of claim 1, wherein the order of ranking of the coding bits allocated to the parameters of the second subset is variable from one frame to another.

3. The method of claim 1 or 2, wherein N < Nmax.

4. A method according to any preceding claim, wherein the order of ranking of the coding bits allocated to the parameters of the second subset is an order of decreasing importance determined as a function of at least the coded parameters of the first subset.

5. The method of claim 4, wherein the order of ranking of the coding bits allocated to the parameters of the second subset is determined with the aid of at least one psychoacoustic criterion as a function of the coded parameters of the first subset.

6. The method of claim 5, wherein the parameters of the second subset pertain to spectral bands of the signal, wherein a spectral envelope of the coded signal is estimated from the coded parameters of the first subset, wherein a frequency masking curve is calculated by applying an auditory perception model to the estimated spectral envelope, and wherein the psychoacoustic criterion makes reference to the level of the estimated spectral envelope with respect to the masking curve in each spectral band.

7. A method according to any one of claims 4 to 6, in which Nmax = N.

8. A method according to any preceding claim, wherein the coding bits are ordered in the output sequence in such a way that the N0 coding bits of the first subset precede the N - N0 coding bits of the selected parameters of the second subset, and that the respective coding bits of the selected parameters of the second subset appear therein in the order determined for said coding bits.

9. A method according to any preceding claim, wherein the number N varies from one frame to another.

10. A method according to any preceding claim, wherein the coding of the parameters of the first subset is at a variable bit rate, thereby varying the number N0 from one frame to another.

11. A method according to any preceding claim, wherein the first subset comprises parameters calculated by a core coder (1).

12. The method of claim 11, wherein the core coder (1) has an operating frequency band narrower than the bandwidth of the signal to be coded, and wherein the first subset furthermore comprises energy levels of the audio signal associated with frequency bands higher than the operating band of the core coder.

13. A method according to each of claims 8 and 12, wherein the coding bits of the first subset are ordered in the output sequence so that the coding bits of the parameters calculated by the core coder are immediately followed by the coding bits of the energy levels associated with the higher frequency bands.

14. A method according to any one of claims 11 to 13, wherein a difference signal is estimated between the signal to be coded and a synthesis signal derived from the coded parameters produced by the core coder, and wherein the first subset further comprises energy levels of the difference signal associated with frequency bands included in the operating band of the core coder.

15. A method according to claim 8 and any one of claims 12 to 14, wherein the coding bits of the first subset are ordered in the output sequence so that the coding bits of the parameters calculated by the core coder (1) are followed by the coding bits of the energy levels associated with the frequency bands.

16. A method of decoding a binary input sequence (Φ') to synthesize a digital audio signal (S), wherein a maximum number Nmax of coding bits is defined for a set of parameters describing a signal frame, consisting of a first and a second subset, the input sequence comprising, for a signal frame, a number N' of coding bits of said set of parameters, with N' ≤ Nmax, the method comprising the steps of:

- extracting, from said N' bits of the input sequence, a number N0 of coding bits of the parameters of the first subset, if N0 < N';

- recovering the parameters of the first subset on the basis of said N0 coding bits extracted;

- determining an allocation of Nmax - N0 coding bits for the parameters of the second subset; and

- classifying the Nmax - N0 coding bits allocated to the parameters of the second subset in a determined order, wherein the allocation and/or the order of ranking of the Nmax - N0 coding bits is determined as a function of the recovered parameters of the first subset, the method further comprising the steps of:

- selecting parameters of the second subset that are allocated the N' - N0 coding bits ranked first in said order;

- extracting, from said N' bits of the input sequence, N' - N0 coding bits of the selected parameters of the second subset;

- recovering the selected parameters of the second subset on the basis of said N' - N0 coding bits extracted; and

- synthesizing the signal frame by using the recovered parameters of the first and second subsets.

17. The method of claim 16, wherein the order of ranking of the coding bits allocated to the parameters of the second subset is variable from one frame to another.

18. The method of claim 16 or 17, wherein N' < Nmax.

19. A method according to any one of claims 16 to 18, wherein the order of ranking of the coding bits allocated to the parameters of the second subset is an order of decreasing importance determined as a function of at least the recovered parameters of the first subset.

20. The method of claim 19, wherein the order of ranking of the coding bits allocated to the parameters of the second subset is determined with the aid of at least one psychoacoustic criterion as a function of the recovered parameters of the first subset.

21. The method of claim 20, wherein the parameters of the second subset pertain to spectral bands of the signal, wherein a spectral envelope of the signal is estimated using the recovered parameters of the first subset, wherein a frequency masking curve is calculated by applying an auditory perception model to the estimated spectral envelope, and wherein the psychoacoustic criterion makes reference to the level of the estimated spectral envelope with respect to the masking curve in each spectral band.

22. The method of any of claims 16 to 21, wherein the N0 coding bits of the parameters of the first subset are extracted from the N' bits received at positions of the sequence which precede the positions from which the N' - N0 coding bits of the selected parameters of the second subset are extracted.

23. A method according to any one of claims 16 to 22, wherein, to synthesize the signal frame, unselected parameters of the second subset are estimated by interpolation from at least the selected parameters recovered on the basis of said N' - N0 coding bits extracted.

24. A method according to any one of claims 16 to 23, wherein the first subset comprises input parameters of a core decoder (21).

25. The method of claim 24, wherein the core decoder (21) has an operating frequency band narrower than the bandwidth of the signal to be synthesized, and wherein the first subset furthermore comprises energy levels of the audio signal associated with frequency bands higher than the operating band of the core decoder.

26. A method according to each of claims 22 and 25, wherein the coding bits of the first subset are ordered in the input sequence in such a way that the coding bits of the input parameters of the core decoder (21) are immediately followed by the coding bits of the energy levels associated with the higher frequency bands.

27. The method of claim 26, comprising the following steps if the N' bits of the input sequence (Φ') are limited to the coding bits of the input parameters of the core decoder (21) and at least a portion of the coding bits of the energy levels associated with the higher frequency bands:

- extracting from the input sequence the coding bits of the input parameters of the core decoder and said portion of the coding bits of the energy levels;

- synthesizing a base signal (S') in the core decoder and recovering energy levels associated with the higher frequency bands on the basis of the extracted coding bits;

- calculating a spectrum of the base signal;

- assigning an energy level to each higher band whose energy level is not coded in the input sequence;

- synthesizing spectral components for each higher frequency band from the corresponding energy level and from the spectrum of the base signal in at least one band of said spectrum;

- applying a transformation to the time domain to the synthesized spectral components to obtain a correction signal for the base signal; and

- summing the base signal and the correction signal to synthesize the signal frame.

28. The method of claim 27, wherein the energy level assigned to a higher band whose energy level is not coded in the input sequence is a fraction of a perceptual masking level calculated from the spectrum of the base signal and from the energy levels recovered on the basis of the extracted coding bits.

29. A method according to any one of claims 24 to 28, wherein a base signal (S') is synthesized in the core decoder, and wherein the first subset furthermore comprises energy levels of a difference signal between the signal to be synthesized and the base signal, associated with frequency bands included in the operating band of the core coder.

30. A method according to any one of claims 25, 26 and 29, wherein, for N0 < N' < Nmax, unselected parameters of the second subset relating to spectral components in frequency bands are estimated using a calculated spectrum of the base signal and/or selected parameters recovered on the basis of said N' - N0 coding bits extracted.

31. The method of claim 30, wherein the unselected parameters of the second subset in a frequency band are estimated with the aid of a spectral neighborhood of said band, determined on the basis of the N' coding bits of the input sequence.

32. The method of claim 22 and any one of claims 25 to 31, wherein the coding bits of the input parameters of the core decoder (21) are extracted from the N' bits received at positions of the sequence preceding the positions from which the coding bits of the energy levels associated with the frequency bands are extracted.

33. A method according to any one of claims 16 to 32, wherein the number N' varies from one frame to another.

34. A method according to any one of claims 16 to 33, wherein the number N0 varies from one frame to another.

35. An audio encoder, comprising digital signal processing means arranged to implement an encoding method according to any one of claims 1 to 15.

36. An audio decoder, comprising digital signal processing means arranged to implement a decoding method according to any one of claims 16 to 34.

PCT/FR2003/003870 | 2003-01-08 | 2003-12-22 | Method for encoding and decoding audio at a variable rate | WO2004070706A1 (en)