Abstract:

The techniques described are utilized for detection of noise and
noise-like segments in audio coding. The techniques can include
performing a prediction gain calculation, an energy compaction
calculation, and a mean and variation energy calculation. Signal adaptive
noise decisions can be made both in time and frequency dimensions. The
techniques can be embodied as part of an AAC (advanced audio coding)
encoder to detect noise and noise-like spectral bands. This detected
information is transmitted in a bitstream using a signaling method
defined for a perceptual noise substitution (PNS) encoding tool of the
AAC encoder.

Claims:

1. A method, comprising: calculating mean and variance energies for each
frequency band of a signal; defining boundaries for a ratio of the mean
and variance energies in each frequency band of the signal;
and determining if each frequency band of the signal is noise or
noise-like using the defined boundaries and a stage of two or more
decisions.

2. The method of claim 1, further comprising predicting gain for spectral
samples corresponding to each frequency band of a signal.

3. The method of claim 2, wherein predicting gain for spectral samples
corresponding to each frequency band of a signal comprises applying
linear predictive coding principles and accumulating resulting gain.

4. The method of claim 1, further comprising transmitting energy levels
for each frequency band.

5. The method of claim 4, wherein the energy levels are transmitted using
a signal defined for a perceptual noise substitution encoding tool of an
encoder.

6. The method of claim 1, further comprising providing signal-adaptive
noise decisions in time and frequency dimensions.

7. The method of claim 1, further comprising determining if each frequency
band is tonal or tonal-like using the defined boundaries.

8. An apparatus, comprising: an encoder configured to determine noise or
noise-like characteristics in frequency bands of signals using boundaries
defined from a ratio of mean and variance energies in each frequency band
and a stage of two or more decisions.

9. The apparatus of claim 8 further comprising a processor configured to
receive signals.

10. The apparatus of claim 8, wherein the encoder is an advanced audio
coding (AAC) encoder.

11. The apparatus of claim 8, wherein the defined boundaries may change
over time.

12. The apparatus of claim 8, wherein the encoder determines if each
frequency band is tonal or tonal-like using the defined boundaries.

13. An apparatus, comprising: an encoder configured to determine noise or
noise-like characteristics in frequency bands of communication signals
using boundaries defined from a ratio of mean and variance energies in
each frequency band and a stage of two or more decisions.

14. The apparatus of claim 13 further comprising a memory configured to
contain programmed instructions and communication signals.

15. The apparatus of claim 13 wherein the encoder is further configured to
predict gain for spectral segments for each frequency band.

16. The apparatus of claim 13 wherein the encoder is further configured to
predict gain using linear predictive coding.

17. The apparatus of claim 13 further comprising an interface configured
to transmit energy levels for each frequency band.

18. A computer program product embodied on a computer readable medium that
estimates and detects noise and noise-like spectral signal segments, the
computer program product comprising: code for calculating mean and
variance energies for each frequency band of a signal; code for defining
boundaries for a ratio of the mean and variance energies in each
frequency band of the signal; and code for determining if each frequency
band of the signal is noise or noise-like using the defined boundaries
and a stage of two or more decisions.

19. The computer program product of claim 18, further comprising code for
determining if each frequency band is tonal or tonal-like using the
defined boundaries.

20. The computer program product of claim 18, further comprising code for
predicting gain for spectral samples corresponding to each frequency band
of a signal.

21. The computer program product of claim 18, further comprising code for
transmitting energy levels for each frequency band.

22. The computer program product of claim 21, wherein the energy levels
are transmitted using a signal defined for a perceptual noise
substitution encoding tool of an encoder.

Description:

RELATED APPLICATION

[0001]This application is a continuation of U.S. patent Ser. No.
10/924,006 filed Aug. 23, 2004, which is hereby incorporated by reference
in its entirety.

[0003]This section is intended to provide a background or context to the
invention that is recited in the claims. The description herein may
include concepts that could be pursued, but are not necessarily ones that
have been previously conceived or pursued. Therefore, unless otherwise
indicated herein, what is described in this section is not prior art to
the claims in this application and is not admitted to be prior art by
inclusion in this section.

[0004]Generally, in an audio encoding system, an incoming time domain
audio signal is compressed such that the bitrate needed to represent the
signal is significantly reduced. Ideally, the bitrate of the encoded
signal fits to the constraints of the transmission channel or minimizes
the size of the encoded file. Techniques for fitting bitrate to channel
constraints are used in real-time communication and streaming services.
Techniques for minimizing file size are used when storing audio content
locally or via downloading at high audio quality.

[0005]Audio encoders aim to minimize perceptual distortion at a given
bitrate while minimizing the encoded file size. Nevertheless, the lower
the bitrate, the more challenging it is for the encoder to achieve these
goals. In both cases, advanced encoding models and techniques are applied
to maximize the end user experience. Typically, it is the encoding
performance with the worst-case signals (signals that are difficult to
encode) that ultimately defines the overall performance of any encoding
system. Another important factor in defining overall performance of an
encoding system is the encoding speed and the resources needed for a
given bitrate or audio quality level that can be achieved. For commercial
use and especially for mobile use, encoding speed and memory requirements
play a significant role.

[0006]In an attempt to achieve even lower bitrates without reducing the
perceptual distortion, new audio coding methods are being explored. Some
conventional audio coding methods involve efficient coding of noise and
noise-like signal segments. In such techniques, perceptual audio encoders
encode the input signal in frequency domain, as human auditory properties
can be best described in frequency domain. Spectral samples are typically
quantized on a frequency band basis. The quantizer shapes the
quantization noise by either increasing or decreasing the corresponding
quantizer step size until the noise is just below the auditory masking
threshold. On one hand, the introduced perceptual distortion is inaudible
to the human ear but, on the other hand, this limits the lowest possible
bitrate. It is well known that coding of high frequencies uses a
significant number of bits, but from a perceptual point of view, it is the
low frequencies that are more important.

[0007]Where a certain frequency band contains only white noise, the
spectral samples within the band are still coded (with high bitrate) even
though from an auditory point of view an exact representation of the
spectral samples is not needed. It would be much more efficient to code
the frequency band with a coding scheme optimized for noise or noise-like
signal segments leaving more bits to the other frequency bands or,
alternatively, lowering the lowest possible bitrate boundary.

[0008]One example of an audio coding system is the advanced audio coding
(AAC) system. The AAC is a lossy data compression scheme intended for
audio streams. AAC was designed to replace MP3 and is an extension of the
MPEG-2 international standard, ISO/IEC 13818-3. It was further improved
in MPEG-4, MPEG-4 Version 2 and MPEG-4 Version 3, ISO/IEC 14496-3.

[0009]AAC includes signaling methods for compact representation of noise
and noise-like signal segments. However, AAC does not have a way to
detect such signal segments. It is up to the implementer of the AAC
encoder to decide how noise or noise-like signal segments should be
detected or whether to detect such segments at all. Uncontrolled and
false noise detection can actually result in severe quality degradation
instead of quality improvement.

[0010]Attempts have been made to estimate and detect noise for perceptual
audio coders, such as AAC coders. For example, a method using a predictor
in the frequency domain on a frequency band basis is presented in:
"Estimation of perceptual entropy using noise masking criteria,"
Johnston, J. D.; Acoustics, Speech, and Signal Processing, 1988.
ICASSP-88., 1988 International Conference on, 11-14 Apr. 1988; Pages:
2524-2527 vol. 5. Johnston describes calculating a tonality measure from
the power spectrum, which is then used as a threshold to differentiate
noise-like and tone-like signal segments. A method that uses a predictor in
the time domain and noise detection in the frequency domain is described in
"Improving audio codecs by noise substitution," Schulz, Donald; Journal of
the Audio Engineering Society, Vol. 44, No. 7/8, July/August 1996;
Pages: 593-598. In this method, a predicted version of the input signal
is first determined and noise detection is then made in frequency domain
by comparing the original and predicted signals on a frequency band
basis.

[0011]There is a need for noise detection techniques to be applied in
various types of audio coding schemes. Further, there is a need for
efficient estimation methods for detecting noise and noise-like signal
segments. Even further, there is a need to reduce the bitrate of AAC
encoded streams, which reduces the demand for bandwidth.

SUMMARY OF THE INVENTION

[0012]Briefly, the present invention relates to techniques for detection
of noise and noise-like segments in audio coding. While AAC coding is
used as an example, the present invention is applicable in other types of
coding, which utilize specific coding methods for noise and noise-like
segments or need a reliable method to detect these segments for one reason
or another.

[0013]One exemplary embodiment relates to a method of estimating and
detecting noise and noise-like spectral signal segments. The method
includes performing a prediction gain calculation, an energy compaction
calculation, and a mean and variation energy calculation. Signal adaptive
noise decisions are made both in time and frequency dimensions. The
method can be embodied as part of an AAC encoder to detect noise and
noise-like spectral bands. This detected information is transmitted in a
bitstream using a signaling method defined for a perceptual noise
substitution (PNS) encoding tool of the AAC encoder.

[0014]Another exemplary embodiment relates to a system for estimating and
detecting noise and noise-like spectral signal segments. The system
includes an electronic device having a processor and an encoder that
determines noise or noise-like characteristics in frequency bands of the
received communication signals using defined boundaries for a ratio of
mean and variance energies in each frequency band. The system may also
include a communication interface, which sends and receives communication
signals.

[0015]Another exemplary embodiment relates to a device configured for
estimating and detecting noise and noise-like spectral signal segments.
The device includes a memory configured to contain programmed
instructions and communication signals and an encoder that determines
noise or noise-like characteristics in frequency bands of the
communication signals using defined boundaries for a ratio of mean and
variance energies in each frequency band. The device may also be
configured for communication in a network.

[0016]Another exemplary embodiment relates to a computer program product
that estimates and detects noise and noise-like spectral signal segments.
The computer program product includes computer code to calculate mean and
variance energies for each frequency band of a signal, computer code to
define boundaries for a ratio of the mean and variance energies in each
frequency band of the signal, and computer code to determine if each
frequency band of the signal is noise or noise-like using the defined
boundaries.

BRIEF DESCRIPTION OF DRAWINGS

[0017]FIG. 1 is a flow diagram depicting operations performed in the
estimation and detection of noise and noise-like spectral signal segments
in audio coding in accordance with an exemplary embodiment.

[0018]FIG. 2 is a diagram depicting an exemplary communication system
including the techniques discussed with reference to FIG. 1.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

[0019]FIG. 1 illustrates a flow diagram 10 depicting operations performed
in the estimation and detection of noise and noise-like spectral signal
segments in audio coding. Additional, fewer, or different operations may
be performed depending on the embodiment. In an operation 12, a gain
prediction for the spectral samples corresponding to each frequency band
is calculated. In this calculation, the variable x represents a frequency
domain signal of length N: x = F(x_t), where x_t is the time domain
input signal and F( ) denotes the time-to-frequency transformation. The
variable sfbOffset of length M represents the boundaries of the frequency
bands, which also follow the boundaries of the critical bands of the human
auditory system.

[0020]A gain prediction is calculated for each frequency band. In an
exemplary embodiment, the prediction gain is determined by applying
linear predictive coding (LPC) principles to spectral samples within each
frequency band and accumulating the resulting gain across the frequency
bands to obtain an average prediction gain aGain for the current frame
as:

< ##EQU00001##

where fGain_i is the prediction gain of the i-th frequency band
and gThr is the global threshold for the prediction gain. This threshold
prevents the average prediction gain from being too high in case some of
the spectral bands have significant prediction gain. In an example
implementation, the value of gThr is set to 1.45.

[0021]The prediction gain for the i-th frequency band can be obtained
by solving the normal equations:

≦≦ ##EQU00002##

where P defines the order of the filter coefficients a_k and R is the
autocorrelation sequence of the spectral samples calculated by:

##EQU00003##

where sfbLen=sfbOffset(i+1)-sfbOffset(i) is the length of the i-th
frequency band.
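
The autocorrelation expression itself is not reproduced in this text. As a minimal sketch, assuming the textbook unnormalised autocorrelation taken over the samples of a single band, the per-band sequence could be computed as follows. The function name, argument names, and the two-band layout are illustrative only and are not taken from the source; the actual equation may use a different normalisation.

```python
def band_autocorrelation(x, sfb_offset, i, max_lag):
    """Return R_i(k) for k = 0..max_lag over band i of the spectrum x.

    Assumption: R_i(k) = sum over n of x[n] * x[n + k], with both samples
    inside the band. The source equation may normalise differently.
    """
    start, end = sfb_offset[i], sfb_offset[i + 1]
    band = x[start:end]
    sfb_len = end - start  # sfbLen = sfbOffset(i+1) - sfbOffset(i)
    return [
        sum(band[n] * band[n + k] for n in range(sfb_len - k))
        for k in range(max_lag + 1)
    ]


# Hypothetical 8-sample spectrum split into two bands of 4 samples each.
spectrum = [1.0, 2.0, 3.0, 4.0, 0.5, -0.5, 0.25, -0.25]
offsets = [0, 4, 8]
r_band0 = band_autocorrelation(spectrum, offsets, 0, 2)
r_band1 = band_autocorrelation(spectrum, offsets, 1, 1)
```

Lags that reach or exceed the band length contribute an empty sum and therefore evaluate to zero.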

[0022]The predictor order P can be determined based on the length of the
frequency band:

P=min(10,sfbLen/4) (5)

One solution of the normal equations is given by the Levinson-Durbin
recursion. The following operations can be performed for m=1, . . . , P,
where a_k^(m) denotes the k-th coefficient of an m-th
order predictor:

≦≦ ##EQU00004##

where E_0^i = R_i(0).

[0023]The prediction gain can be obtained by:

##EQU00005##
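
The recursion and the gain expression are not reproduced in this text, so the following is only a sketch of the textbook Levinson-Durbin recursion applied to an autocorrelation sequence. It assumes that the predictor order of equation (5) uses integer (floor) division, and that the prediction gain is the ratio of the zero-lag autocorrelation to the final residual energy. Both are assumptions, and all function names are illustrative.

```python
def predictor_order(sfb_len):
    """P = min(10, sfbLen/4), assuming floor division."""
    return min(10, sfb_len // 4)


def levinson_durbin(r, order):
    """Solve the normal equations for autocorrelation r[0..order].

    Returns (coefficients a_1..a_order, residual energy E_order).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # E_0 = R(0)
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # nothing left to predict
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err      # reflection coefficient of stage m
        prev = a[:]
        a[m] = k_m
        for k in range(1, m):
            a[k] = prev[k] - k_m * prev[m - k]
        err *= 1.0 - k_m * k_m
    return a[1:], err


def prediction_gain(r, order):
    """Assumed definition: gain = R(0) / E_order."""
    if r[0] <= 0.0:
        return 1.0  # silent band: treated as having no gain (assumption)
    _, err = levinson_durbin(r, order)
    return r[0] / err if err > 0.0 else float("inf")


coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], 2)
gain = prediction_gain([1.0, 0.5, 0.25], 2)
```

In this example the autocorrelation decays geometrically, so a first-order predictor already captures all of the predictable structure and the second coefficient is zero.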

Next, mean and variance energies can be calculated for each frequency band
by:

##EQU00006##

The mean and variance energies are used to define the boundaries for the
ratio of the mean and variance energy and how much that ratio is allowed
to vary in each frequency band. This range can be used to differentiate
whether the frequency band is noise-like or tonal-like. The allowed range
can be obtained by:

≧ > ##EQU00007##

where vThr defines the threshold for the mean energy range calculation. In
an example implementation, this value is set to 3.3, but other
values may also be applied.
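
The expressions for the mean and variance energies are not reproduced in this text. One plausible reading, assumed here purely for illustration, treats the squared spectral samples of a band as per-sample energies and takes their mean and population variance. The function name is hypothetical, and the boundary calculation that uses vThr is deliberately left out because its form cannot be recovered from the text.

```python
def mean_and_variance_energy(band):
    """Mean and population variance of the per-sample energies x[n]**2.

    Assumption: this is one possible definition, not the source's equation.
    """
    if not band:
        raise ValueError("band must contain at least one spectral sample")
    energies = [s * s for s in band]
    n = len(energies)
    mean_e = sum(energies) / n
    var_e = sum((e - mean_e) ** 2 for e in energies) / n
    return mean_e, var_e


flat = mean_and_variance_energy([1.0, -1.0, 1.0, -1.0])
peaky = mean_and_variance_energy([0.0, 2.0])
```

Under this reading, a band whose samples all have the same magnitude has zero variance energy, while a band dominated by a single large sample has a large one.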

[0024]A stage of decisions can be made for each frequency band to see
whether the band is noise/noise-like or tonal/tonal-like as follows:

< ##EQU00008##

where pGain_i is the adjusted prediction gain of the previous frame for
the i-th frequency band and w_i^(1) is the frequency band
dependent weighting factor, which is updated according to:

w_i^(1) = sqrt(w_{i-1}^(1)) (11)

where w_{-1}^(1) = 0.7 in an example implementation. Also,

< ##EQU00009##

where eComp_i defines the energy compression ratio of the i-th
frequency band, w_i^(2) is the frequency band dependent weighting
factor, and cThr is the global threshold value for the energy compression
ratio. In the current implementation the value of cThr is set to
10^(-0.1). The energy compression ratio can be calculated according to:

π ≦≦ ##EQU00010##

The frequency dependent weighting factor w_i^(2) can be updated according to:

w_i^(2) = sqrt(w_{i-1}^(2)) (14)

where w_{-1}^(2) = 0.7 in an example implementation. The noise decision
stage is:

> < ##EQU00011##

If the i-th frequency band was assigned to be noise or noise-like,
i.e., isNoise_i^(3) = 1, then what is transmitted to the receiver is
the energy level of the band. The same signaling method used in an AAC
codec can be used here. The prediction gain related to the time dimension
of each frequency band is finally updated as:

##EQU00012##
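
Equations (11) and (14) apply the same square-root update to a weighting factor seeded with 0.7. A minimal sketch of that update follows, assuming the bands are indexed from 0 so that the seed plays the role of index -1. The function name and the band count are illustrative.

```python
import math


def band_weights(num_bands, seed=0.7):
    """Weights w_0..w_{num_bands-1} with w_i = sqrt(w_{i-1}) and w_{-1} = seed."""
    weights = []
    w = seed
    for _ in range(num_bands):
        w = math.sqrt(w)
        weights.append(w)
    return weights


weights = band_weights(4)
```

Each repeated square root moves the weight closer to 1, so for a seed of 0.7 the weights increase monotonically with the band index and stay below 1.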

Equation (13) may be realized with fast algorithms that use a transform
length of 2^n. In case the length of the frequency band does not fit
into these conditions, that is, the length is smaller than the length of
the transform, zero padding can be used. Also, it is known that the human
auditory system is more sensitive at low frequencies than at high
frequencies. Therefore, for optimal performance, it is advantageous to
limit the lowest possible noise frequency band to some threshold
frequency, such as 5 kHz, although other values are also applicable.
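
The zero padding mentioned above can be sketched as follows: the band is extended with zeros up to the next power-of-two length before a fast transform is applied. The helper names are illustrative, and the transform itself is omitted because equation (13) is not reproduced in this text.

```python
def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n (at least 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def zero_pad_band(band):
    """Return a copy of band extended with zeros to a power-of-two length."""
    target = next_power_of_two(len(band))
    return list(band) + [0.0] * (target - len(band))


padded = zero_pad_band([1.0, 2.0, 3.0])
```

A band whose length is already a power of two is returned unchanged in length.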

[0025]In an implementation using an AAC encoder, the following parameters
can be used. The time-to-frequency transformation F( ) is a 128- or
1024-point MDCT. The sfbOffset tables depend on the sampling rate and are
listed in the AAC specifications; for example, at 44 kHz the tables
for the 128- and 1024-point MDCTs are as follows:

[0034]It is also possible to define the start of noise detection band to
be below 5 kHz. In this case it is advantageous to make the noise
detection calculations separately; one set of calculations for the
frequency bands below 5 kHz and the other set of calculations for
frequency bands above 5 kHz. The thresholds related to the prediction
gain and mean energy threshold calculations can also be adjusted to better
cope with the sensitivity of the human auditory system at low frequencies;
values of 1.15 and 4.0, respectively, provide the best performance for the
frequencies below 5 kHz.
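
As a hedged illustration of running the calculations with separate settings below and above 5 kHz, the sketch below picks a threshold pair per band from the band's start frequency. It assumes that the two adjusted values correspond to gThr and vThr, that coefficient k of an N-coefficient spectrum lies near k * fs / (2 * N), and that a band is classified by its lower edge. The function names and the offset values are hypothetical.

```python
def band_start_hz(sfb_offset, i, sample_rate, num_coeffs):
    """Approximate start frequency of band i (assumes a spectrum spanning 0..fs/2)."""
    return sfb_offset[i] * sample_rate / (2.0 * num_coeffs)


def thresholds_for_band(start_hz, split_hz=5000.0):
    """Return the (gThr, vThr) pair quoted in the text for each frequency region."""
    if start_hz < split_hz:
        return 1.15, 4.0   # values quoted for bands below 5 kHz
    return 1.45, 3.3       # example values quoted for the default case


# Hypothetical band edges for a 1024-coefficient spectrum at 44.1 kHz.
offsets = [0, 128, 256, 512]
low = thresholds_for_band(band_start_hz(offsets, 1, 44100, 1024))
high = thresholds_for_band(band_start_hz(offsets, 2, 44100, 1024))
```

Here the band starting at coefficient 128 begins near 2.76 kHz and receives the low-frequency pair, while the band starting at coefficient 256 begins near 5.51 kHz and receives the default pair.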

[0035]The techniques described require no buffering of previous frame
samples, which is one of the main drawbacks of prior solutions. Buffering
typically extends to at least 2-3 past frames and, with larger frame sizes,
this requires a lot of static RAM storage during encoding. The noise
estimation is done using signal adaptive threshold values, and no hard
threshold levels are used, unlike in typical prediction based
noise estimation solutions. Furthermore, the complexity of the method
plays no significant role in the whole encoder implementation, as only a few
calculations are done for each frame and additional calculations are done
only for those frequency bands which have a high probability of being noise
or noise-like. For example, the noise or noise-like frequency bands may
account for less than half, or for more, of the total number of frequency
bands present.

[0036]Simulations using the described techniques have shown that reliable
noise detection can be achieved without introducing any perceptual
distortions to the coded signals. The bitrate limit for the lowest
possible bitrate depends on the signal content but, with typical signals,
bitrate reduction between 5-15% can be expected when compared to an
encoding where noise detection and substitution is not applied.

[0037]FIG. 2 illustrates a system 50 including the noise detection feature
described herein. The exemplary embodiments described herein can be
applied to any system capable of coding signals. An exemplary system 50
includes a terminal equipment (TE) device 52, an access point (AP) 54, a
server 56, and a network 58. The TE device 52 can include memory (MEM), a
central processing unit (CPU), a user interface (UI), and an input-output
interface (I/O). The memory can include non-volatile memory for storing
applications that control the CPU and random access memory for data
processing. The I/O interface may include a network interface card of a
wireless local area network, such as one of the cards based on the IEEE
802.11 standards.

[0038]The TE device 52 may be connected to the network 58 (e.g., a local
area network (LAN), the Internet, a phone network) via the access point
54 and further to the server 56. The TE device 52 may also communicate
directly with the server 56, for instance using a cable, infrared, or a
data transmission at radio frequencies. The server 56 may provide various
processing functions for the TE device 52.

[0039]The TE device 52 can be any electronic device, for example a
personal digital assistant (PDA) device, remote controller or a
combination of an earpiece and a microphone. The TE device 52 can be a
supplementary device used by a computer or a mobile station, in which
case the data transmission to the server 56 can be arranged via a
computer or a mobile station. The TE device 52 can be a personal computer
(PC) or other computing device in which, for example, music is encoded
and sent over an air channel to a mobile device or over the Internet to
another PC. In an exemplary embodiment, the TE device 52 is a mobile
station communicating with a public land mobile network, to which also
the server 56 is functionally connected. The TE device 52 connected to
the network 58 includes mobile station functionality for communicating
with the network 58 wirelessly. The network 58 can be any known wireless
or wired network, for instance a network supporting the GSM service, a
network supporting the GPRS (General Packet Radio Service), or a third
generation mobile network, such as the UMTS (Universal Mobile
Telecommunications System) network according to the 3GPP (3rd
Generation Partnership Project) standard. The functionality of the server
56 can also be implemented in the mobile network. The TE device 52 can be
a mobile phone used for speaking only, or it can also contain PDA
(Personal Digital Assistant) functionality.

[0040]While several embodiments of the invention have been described, it
is to be understood that modifications and changes will occur to those
skilled in the art to which the invention pertains. The invention is not
limited to a particular embodiment, but extends to various modifications,
combinations, and permutations that nevertheless fall within the scope
and spirit of the appended claims.