
Abstract:

A signal classifying method and apparatus are disclosed. The signal
classifying method includes: obtaining a spectrum fluctuation parameter
of a current signal frame determined as a foreground frame, and buffering
the spectrum fluctuation parameter; obtaining a spectrum fluctuation
variance of the current signal frame according to spectrum fluctuation
parameters of all buffered signal frames, and buffering the spectrum
fluctuation variance; and calculating a ratio of signal frames whose
spectrum fluctuation variance is above or equal to a first threshold to
all the buffered signal frames, and determining the current signal frame
as a speech frame if the ratio is above or equal to a second threshold or
determining the current signal frame as a music frame if the ratio is
below the second threshold. In the embodiments of the present invention,
the spectrum fluctuation variance of the signal is used as a parameter
for classifying the signals, and a local statistical method is applied to
decide the type of the signal. Therefore, the signals are classified with
few parameters, simple logical relations and low complexity.

Claims:

1. A signal classifying method, comprising: obtaining a spectrum
fluctuation parameter of a current signal frame; buffering the spectrum
fluctuation parameter of the current signal frame in a first buffer array
if the current signal frame is a foreground frame; if the current signal
frame falls within a first number of initial signal frames, setting a
spectrum fluctuation variance of the current signal frame to a specific
value and buffering the spectrum fluctuation variance of the current
signal frame in a second buffer array; otherwise, obtaining the spectrum
fluctuation variance of the current signal frame according to spectrum
fluctuation parameters of all signal frames buffered in the first buffer
array and buffering the spectrum fluctuation variance of the current
signal frame in the second buffer array; and calculating a ratio of
signal frames whose spectrum fluctuation variance is above or equal to a
first threshold to all signal frames buffered in the second buffer array,
and determining the current signal frame as a speech frame if the ratio
is above or equal to a second threshold or determining the current signal
frame as a music frame if the ratio is below the second threshold.

2. The signal classifying method according to claim 1, wherein the first
threshold is a first adaptive threshold, and the first adaptive threshold
is obtained according to a Modified Segmental Signal Noise Ratio (MSSNR)
or a Signal-to-Noise Ratio (SNR).

3. The signal classifying method according to claim 2, wherein obtaining
the first adaptive threshold according to the MSSNR comprises: updating a
maximal value of the MSSNR according to the current signal frame;
determining a threshold of the MSSNR according to the updated maximal
value of the MSSNR; obtaining the number of frames whose MSSNR is above
the MSSNR threshold and the number of frames whose MSSNR is below or equal to
the MSSNR threshold among a certain number of frames inclusive of the
current signal frame; calculating a difference measure between the number
of frames whose MSSNR is above the MSSNR threshold and the number of
frames whose MSSNR is below or equal to the MSSNR threshold; and
obtaining the first adaptive threshold according to the difference
measure.

4. The signal classifying method according to claim 2, wherein obtaining
the first adaptive threshold according to the SNR comprises: updating a
maximal value of the SNR according to the current signal frame;
determining a threshold of the SNR according to the updated maximal value
of the SNR; obtaining the number of frames whose SNR is above the SNR
threshold and the number of frames whose SNR is below or equal to the SNR
threshold among a certain number of frames inclusive of the current
signal frame; calculating a difference measure between the number of
frames whose SNR is above the SNR threshold and the number of frames
whose SNR is below or equal to the SNR threshold; and obtaining the first
adaptive threshold according to the difference measure.

5. The signal classifying method according to claim 1, wherein the method
further comprises using other parameters in addition to the spectrum
fluctuation variance as a basis for assisting in classifying the signals,
which comprises: making an auxiliary decision according to a first
peakiness measure and/or a second peakiness measure.

6. The signal classifying method according to claim 1, wherein after
obtaining a decision result which indicates that the current signal frame
is a speech frame or a music frame, the method further comprises:
applying a hangover of a frame to the decision result to obtain a final
decision result.

7. The signal classifying method according to claim 1, wherein the method
of determining the current signal frame as a foreground frame comprises:
using the MSSNR or the SNR as a basis of the decision; and determining
the current signal frame as a foreground frame if the MSSNR is above or
equal to a third threshold or the SNR is above or equal to a fourth
threshold.

8. The signal classifying method according to claim 1, wherein before
obtaining the ratio of signal frames whose spectrum fluctuation variance
is above or equal to the first threshold to all the signal frames
buffered in the second buffer array, the method further comprises:
performing windowed smoothing for several initial spectrum fluctuation
variance values buffered in the second buffer array.

9. A signal classifying apparatus, comprising: a first obtaining module,
configured to obtain a spectrum fluctuation parameter of a current signal
frame; a foreground frame determining module, configured to determine the
current signal frame as a foreground frame and buffer the spectrum
fluctuation parameter of the current signal frame determined as the
foreground frame into a first buffering module; the first buffering
module, configured to buffer the spectrum fluctuation parameter of the
current signal frame determined by the foreground frame determining
module; a setting module, configured to set a spectrum fluctuation
variance of the current signal frame to a specific value and buffer the
spectrum fluctuation variance in a second buffering module if the current
signal frame falls within a first number of initial signal frames; a
second obtaining module, configured to obtain the spectrum fluctuation
variance of the current signal frame according to spectrum fluctuation
parameters of all signal frames buffered in the first buffering module
and buffer the spectrum fluctuation variance of the current signal frame
in the second buffering module if the current signal frame falls outside
the first number of initial signal frames; the second buffering module,
configured to buffer the spectrum fluctuation variance of the current
signal frame set by the setting module or obtained by the second
obtaining module; and a first deciding module, configured to: calculate a
ratio of signal frames whose spectrum fluctuation variance is above or
equal to a first threshold to all signal frames buffered in the second
buffering module, and determine the current signal frame as a speech
frame if the ratio is above or equal to a second threshold or determine
the current signal frame as a music frame if the ratio is below the
second threshold.

10. The signal classifying apparatus according to claim 9, wherein the
first deciding module comprises: a first threshold determining unit,
configured to determine the first threshold; a ratio obtaining unit,
configured to obtain the ratio of the signal frames whose spectrum
fluctuation variance is above or equal to the first threshold determined
by the first threshold determining unit to all the signal frames buffered
in the second buffering module; a second threshold determining unit,
configured to determine the second threshold; and a judging unit,
configured to: compare the ratio obtained by the ratio obtaining unit
with the second threshold determined by the second threshold determining
unit; and determine the current signal frame as a speech frame if the
ratio is above or equal to the second threshold, or determine the current
signal frame as a music frame if the ratio is below the second threshold.

11. The signal classifying apparatus according to claim 9, further
comprising: a second deciding module, configured to assist the first
deciding module in classifying the signals according to other parameters.

12. The signal classifying apparatus according to claim 9, further
comprising: a decision correcting module, configured to obtain a final
decision result by applying a hangover of a frame to the decision result
obtained by the first deciding module or obtained by both the first
deciding module and the second deciding module, wherein the decision
result indicates whether the current signal frame is a speech frame or a
music frame.

13. The signal classifying apparatus according to claim 9, further
comprising: a windowing module, configured to: perform windowed smoothing
for several initial spectrum fluctuation variance values buffered in the
second buffering module before the first deciding module calculates the
ratio of the signal frames whose spectrum fluctuation variance is above
or equal to the first threshold to all the signal frames buffered in the
second buffering module.

Description:

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of U.S. patent application Ser.
No. 12/979,994, filed on Dec. 28, 2010, which is a continuation of
International Patent Application No. PCT/CN2010/076499, filed on Aug. 31,
2010, which claims priority to Chinese Patent Application No.
200910110798.4, filed on Oct. 15, 2009, all of which are hereby
incorporated by reference in their entireties.

FIELD OF THE INVENTION

[0002] The present invention relates to communication technologies, and in
particular, to a signal classifying method and apparatus.

BACKGROUND OF THE INVENTION

[0003] Speech coding technologies can compress speech signals to save
transmission bandwidth and increase the capacity of a communication
system. With the popularity of the Internet and the expansion of the
communication field, speech coding technologies have become a focus of
standardization in China and around the world. Speech coders are
evolving toward multi-rate and wideband operation, and their input
signals are increasingly diverse, including music and other signals.
Users demand ever higher conversation quality, especially for music
signals. Applying different coding rates, and even different core coding
algorithms, to different input signals, so as to ensure the coding
quality of each signal type while saving as much bandwidth as possible,
has become a major trend in speech coder design. Therefore, accurately
identifying the type of input signals has become an active research
topic in the communication industry.

[0004] A decision tree is a method widely used for classifying signals. A
long-term decision tree and a short-term decision tree are used together
to decide the type of signals. First, a First-In First-Out (FIFO) memory
of a specific time length is set for buffering short-term signal
characteristic variables. The long-term signal characteristics are
calculated from the short-term signal characteristic variables buffered
over the preceding period of that time length, including the current
frame, and speech signals and music signals are classified according to
the calculated long-term signal characteristics. During the initial
period after the signal begins, namely, before the FIFO memory is full,
a decision is made according to the short-term signal characteristics.
The short-term decision and the long-term decision use the decision
trees shown in FIG. 1 and FIG. 2, respectively.

[0005] In the process of developing the present invention, the inventor
found that the decision-tree-based signal classifying method is complex,
involving a large number of parameters to calculate and many logical
branches.

SUMMARY OF THE INVENTION

[0006] The embodiments of the present invention provide a signal
classifying method and apparatus so that signals are classified with few
parameters, simple logical relations and low complexity.

[0007] A signal classifying method provided in an embodiment of the
present invention includes: obtaining a spectrum fluctuation parameter of
a current signal frame; buffering the spectrum fluctuation parameter of
the current signal frame in a first buffer array if the current signal
frame is a foreground frame; if the current signal frame falls within a
first number of initial signal frames, setting a spectrum fluctuation
variance of the current signal frame to a specific value and buffering
the spectrum fluctuation variance of the current signal frame in a second
buffer array; otherwise, obtaining the spectrum fluctuation variance of
the current signal frame according to spectrum fluctuation parameters of
all signal frames buffered in the first buffer array and buffering the
spectrum fluctuation variance of the current signal frame in the second
buffer array; and calculating a ratio of signal frames whose spectrum
fluctuation variance is above or equal to a first threshold to all signal
frames buffered in the second buffer array, and determining the current
signal frame as a speech frame if the ratio is above or equal to a second
threshold or determining the current signal frame as a music frame if the
ratio is below the second threshold.

[0008] Another signal classifying method provided in an embodiment of the
present invention includes: obtaining a spectrum fluctuation parameter of
a current signal frame determined as a foreground frame, and buffering
the spectrum fluctuation parameter; obtaining a spectrum fluctuation
variance of the current signal frame according to spectrum fluctuation
parameters of all buffered signal frames, and buffering the spectrum
fluctuation variance; and calculating a ratio of signal frames whose
spectrum fluctuation variance is above or equal to a first threshold to
all the buffered signal frames, and determining the current signal frame
as a speech frame if the ratio is above or equal to a second threshold or
determining the current signal frame as a music frame if the ratio is
below the second threshold.

[0009] A signal classifying apparatus provided in an embodiment of the
present invention includes: a first obtaining module, configured to
obtain a spectrum fluctuation parameter of a current signal frame; a
foreground frame determining module, configured to determine the current
signal frame as a foreground frame and buffer the spectrum fluctuation
parameter of the current signal frame determined as the foreground frame
into a first buffering module; the first buffering module, configured to
buffer the spectrum fluctuation parameter of the current signal frame
determined by the foreground frame determining module; a setting module,
configured to set a spectrum fluctuation variance of the current signal
frame to a specific value and buffer the spectrum fluctuation variance in
a second buffering module if the current signal frame falls within a
first number of initial signal frames; a second obtaining module,
configured to obtain the spectrum fluctuation variance of the current
signal frame according to spectrum fluctuation parameters of all signal
frames buffered in the first buffering module and buffer the spectrum
fluctuation variance of the current signal frame in the second buffering
module if the current signal frame falls outside the first number of
initial signal frames; the second buffering module, configured to buffer
the spectrum fluctuation variance of the current signal frame set by the
setting module or obtained by the second obtaining module; and a first
deciding module, configured to: calculate a ratio of signal frames whose
spectrum fluctuation variance is above or equal to a first threshold to
all signal frames buffered in the second buffering module, and determine
the current signal frame as a speech frame if the ratio is above or equal
to a second threshold or determine the current signal frame as a music
frame if the ratio is below the second threshold.

[0010] Another signal classifying apparatus provided in an embodiment of
the present invention includes: a third obtaining module, configured to
obtain a spectrum fluctuation parameter of a current signal frame
determined as a foreground frame, and buffer the spectrum fluctuation
parameter; a fourth obtaining module, configured to obtain a spectrum
fluctuation variance of the current signal frame according to the
spectrum fluctuation parameters of all signal frames buffered in the
third obtaining module, and buffer the spectrum fluctuation variance; and
a third deciding module, configured to: calculate a ratio of signal
frames whose spectrum fluctuation variance is above or equal to a first
threshold to all signal frames buffered in the fourth obtaining module,
and determine the current signal frame as a speech frame if the ratio is
above or equal to a second threshold or determine the current signal
frame as a music frame if the ratio is below the second threshold.

[0011] In the technical solution of the present invention, the spectrum
fluctuation parameter of the current signal frame is obtained; if the
current signal frame is a foreground frame, the spectrum fluctuation
parameter of the current signal frame is buffered in the first buffer
array; if the current signal frame falls within a first number of initial
signal frames, the spectrum fluctuation variance of the current signal
frame is set to a specific value, and is buffered in the second buffer
array; if the current signal frame falls outside the first number of
initial signal frames, the spectrum fluctuation variance of the current
signal frame is obtained according to the spectrum fluctuation parameters
of all buffered signal frames, and is buffered in the second buffer
array. The signal spectrum fluctuation variance serves as a parameter for
classifying signals, and the local statistical method is applied to
decide the signal type. Therefore, the signals are classified with few
parameters, simple logical relations and low complexity.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] To describe the technical solution of the present invention more
clearly, the following outlines the accompanying drawings involved in the
embodiments of the present invention. Evidently, the accompanying
drawings outlined below are not exhaustive, and persons of ordinary skill
in the art can derive other drawings from such accompanying drawings
without any creative effort.

[0013] FIG. 1 shows how to classify signals through a short-term decision
tree in the prior art;

[0014] FIG. 2 shows how to classify signals through a long-term decision
tree in the prior art;

[0015] FIG. 3 is a flowchart of a signal classifying method according to
an embodiment of the present invention;

[0016] FIG. 4 is a flowchart of a signal classifying method according to
another embodiment of the present invention;

[0017] FIG. 5 is a flowchart of a signal classifying method according to
another embodiment of the present invention;

[0018] FIG. 6 is a flowchart of obtaining a first adaptive threshold
according to an MSSNR in an embodiment of the present invention;

[0019] FIG. 7 is a flowchart of obtaining a first adaptive threshold
according to an SNR in an embodiment of the present invention;

[0020] FIG. 8 shows a structure of a signal classifying apparatus
according to an embodiment of the present invention;

[0021] FIG. 9 shows a structure of a signal classifying apparatus
according to another embodiment of the present invention; and

[0022] FIG. 10 shows a structure of a signal classifying apparatus
according to another embodiment of the present invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0023] The following detailed description is given with reference to the
accompanying drawings to provide a thorough understanding of the present
invention. Evidently, the drawings and the detailed description are
merely representative of particular embodiments of the present invention,
and the embodiments are illustrative in nature and not exhaustive. All
other embodiments, which can be derived by those skilled in the art from
the embodiments given herein without any creative effort, shall fall
within the scope of the present invention.

[0024] FIG. 3 is a flowchart of a signal classifying method in an
embodiment of the present invention. As shown in FIG. 3, the method
includes the following steps:

[0025] S101. Obtain a spectrum fluctuation parameter of a current signal
frame.

[0026] In this embodiment, an input signal is framed to generate a certain
number of signal frames. If the type of a signal frame currently being
processed needs to be identified, this signal frame is called a current
signal frame. Framing is a universal concept in digital signal
processing and refers to dividing a long segment of a signal into several
short segments.

[0027] The current signal frame undergoes a time-frequency transform to
form a signal spectrum, and the spectrum fluctuation parameter (flux) of
the current signal frame is calculated from the spectrum of the current
signal frame and the spectra of several previous signal frames.
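The exact flux formula is not given in this paragraph, so the following is only a minimal sketch under an assumed definition: the flux of the current frame is taken as the mean absolute deviation of its magnitude spectrum from the average magnitude spectrum of the previous frames. The function name `spectral_flux` and this definition are hypothetical; inputs are magnitude spectra |Sp(i)| that have already been computed.

```python
from typing import Sequence


def spectral_flux(current: Sequence[float],
                  previous: Sequence[Sequence[float]]) -> float:
    """Hypothetical flux measure (assumed definition, not taken from the text):
    mean over bins of |current[i] - average of previous[k][i]|."""
    if not previous:
        # No history yet: report zero fluctuation (assumption).
        return 0.0
    n_bins = len(current)
    if any(len(p) != n_bins for p in previous):
        raise ValueError("all spectra must have the same number of bins")
    total = 0.0
    for i in range(n_bins):
        # Average magnitude of bin i over the previous frames.
        avg_prev = sum(p[i] for p in previous) / len(previous)
        total += abs(current[i] - avg_prev)
    return total / n_bins
```

For example, a current spectrum [2.0, 2.0] compared against previous spectra [0.0, 0.0] and [2.0, 2.0] gives a flux of 1.0, while a spectrum identical to its history gives 0.0.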

[0028] S102. Buffer the spectrum fluctuation parameter of the current
signal frame in a first buffer array if the current signal frame is a
foreground frame.

[0029] In this embodiment, signal frames are of two types: foreground
frames and background frames. A foreground frame generally refers to a
signal frame with high energy in the communication process, for example,
a signal frame of a conversation between two or more parties or a signal
frame of music played during communication, such as a ringback tone. A
background frame generally refers to the noise background of the
conversation or music in the communication process. The signal
classifying in this embodiment refers to identifying the type of the
signal in the foreground frame. Before the signal classifying, it is
necessary to determine whether the current signal frame is a foreground
frame.

[0030] If the current signal frame is a foreground frame, the spectrum
fluctuation parameter (flux) of the current signal frame needs to be
buffered. In this embodiment, a spectrum fluctuation parameter buffer
array (flux_buf) may be set, and this array is referred to as a first
buffer array below. The flux_buf array is updated when the signal frame
is a foreground frame, and the first buffer array can buffer a first
number of signal frames.
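The gated update of flux_buf can be sketched as a fixed-capacity FIFO that is written only for foreground frames. The foreground test below follows the rule stated in claim 7 (MSSNR at or above a third threshold, or SNR at or above a fourth threshold); the class and function names, and treating the threshold values as caller-supplied inputs, are illustrative assumptions rather than the document's implementation.

```python
from collections import deque
from typing import Optional


class FluxBuffer:
    """Sketch of flux_buf: holds the flux of the most recent foreground
    frames only. The capacity (the 'first number') is supplied by the caller."""

    def __init__(self, capacity: int) -> None:
        self.buf: deque = deque(maxlen=capacity)

    def update(self, flux: float, is_foreground: bool) -> None:
        # Background frames leave the buffer untouched.
        if is_foreground:
            # Once the deque is full, appending drops the oldest entry (FIFO).
            self.buf.append(flux)

    def is_full(self) -> bool:
        return len(self.buf) == self.buf.maxlen


def is_foreground(mssnr: Optional[float], snr: Optional[float],
                  thr_mssnr: float, thr_snr: float) -> bool:
    """Foreground decision in the spirit of claim 7; the threshold values
    are hypothetical inputs, and None means the measure is unavailable."""
    return ((mssnr is not None and mssnr >= thr_mssnr) or
            (snr is not None and snr >= thr_snr))
```

With a capacity of 3, feeding the flux values 1.0 (foreground), 2.0 (background), 3.0, 4.0 and 5.0 (all foreground) leaves [3.0, 4.0, 5.0] in the buffer: the background value is never stored, and 1.0 is evicted when 5.0 arrives.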

[0031] In this embodiment, the step of obtaining the spectrum fluctuation
parameter of the current signal frame and the step of determining the
current signal frame as a foreground frame are not order-sensitive. Any
variations of the embodiments of the present invention without departing
from the essence of the present invention shall fall within the scope of
the present invention.

[0032] S103. If the current signal frame falls within a first number of
initial signal frames, set a spectrum fluctuation variance of the current
signal frame to a specific value and buffer the spectrum fluctuation
variance of the current signal frame in a second buffer array; otherwise,
obtain the spectrum fluctuation variance of the current signal frame
according to spectrum fluctuation parameters of all buffered signal
frames and buffer the spectrum fluctuation variance of the current signal
frame in the second buffer array.

[0033] In this embodiment, a spectrum fluctuation variance var_flux_n
may be obtained according to whether the first buffer array is full,
where var_flux_n is the spectrum fluctuation variance of frame n.

[0034] Supposing that the first number is m1, if the current signal
frame is one of frames 1 to m1, the spectrum fluctuation variance of the
current signal frame is set to a specific value; if the current signal
frame is frame m1+1 or a later frame, the spectrum fluctuation variance
of the current signal frame can be obtained according to the flux values
of the m1 buffered signal frames.
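The warm-up rule above can be sketched as follows. The text does not say which variance estimator is used, so the population variance (`statistics.pvariance`) is an assumption here, as is falling back to the preset value when the buffer is still empty; `init_value` stands for the unspecified "specific value".

```python
from statistics import pvariance
from typing import Sequence


def flux_variance(frame_index: int, flux_buf: Sequence[float],
                  m1: int, init_value: float) -> float:
    """var_flux_n for frame n (1-based index).

    Frames 1..m1 receive the preset value init_value; from frame m1+1 on,
    the variance of the buffered flux values is used. Population variance
    and the empty-buffer fallback are assumptions, not from the source."""
    if frame_index <= m1 or not flux_buf:
        return init_value
    return pvariance(flux_buf)
```

For instance, with m1 = 5 and a preset value of 10.0, frame 3 gets 10.0, whereas frame 6 with buffered flux values [1.0, 2.0, 3.0] gets their population variance, 2/3.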

[0035] After the spectrum fluctuation variance of the current signal frame
is obtained, the spectrum fluctuation variance needs to be buffered. In
this embodiment, a spectrum fluctuation variance buffer array
(var_flux_buf) may be set, and this array is referred to as a second
buffer array below. The var_flux_buf array is updated when the signal frame is
a foreground frame.

[0036] S104. Calculate a ratio of signal frames whose spectrum fluctuation
variance is above or equal to a first threshold to all signal frames
buffered in the second buffer array, and determine the current signal
frame as a speech frame if the ratio is above or equal to a second
threshold or determine the current signal frame as a music frame if the
ratio is below the second threshold.

[0037] In this embodiment, var_flux may be used as a parameter for
deciding whether the signal is speech or music. After the current signal
frame is determined as a foreground frame, a judgment may be made on the
basis of a ratio of the signal frames, whose var_flux is above or equal
to a threshold, to the signal frames buffered in the var_flux_buf array
(including the current signal frame), so as to determine whether the
current signal frame is a speech frame or a music frame, namely, a local
statistical method is applied. This threshold is referred to as a first
threshold below.

[0038] If the ratio of the signal frames whose var_flux is above or equal
to the first threshold to all signal frames buffered in the second buffer
array (including the current signal frame) is above or equal to a second
threshold, the current signal frame is a speech frame; if the ratio is
below the second threshold, the current signal frame is a music frame.
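The local statistical decision reduces to counting and comparing. This minimal sketch takes the contents of var_flux_buf (current frame included) together with the two thresholds as plain inputs; the threshold values themselves are not given in this passage, so any numbers used with it are placeholders.

```python
from typing import Sequence


def classify_frame(var_flux_buf: Sequence[float],
                   thr1: float, thr2: float) -> str:
    """Return 'speech' if the share of buffered frames whose var_flux is
    >= thr1 is itself >= thr2, otherwise 'music'."""
    if not var_flux_buf:
        raise ValueError("var_flux_buf must contain at least the current frame")
    high = sum(1 for v in var_flux_buf if v >= thr1)
    ratio = high / len(var_flux_buf)
    return "speech" if ratio >= thr2 else "music"
```

With thr1 = 3 and thr2 = 0.5, a buffer [5, 5, 1, 1] gives a ratio of 0.5 and is classified as speech, while [5, 1, 1, 1] gives 0.25 and is classified as music. Because the decision uses only a count over a sliding buffer, a single outlying frame shifts the ratio by just 1/len(var_flux_buf).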

[0039] In this embodiment, the spectrum fluctuation parameter of the
current signal frame is obtained; if the current signal frame is a
foreground frame, the spectrum fluctuation parameter of the current
signal frame is buffered in the first buffer array; if the current signal
frame falls within a first number of initial signal frames, the spectrum
fluctuation variance of the current signal frame is set to a specific
value, and is buffered in the second buffer array; if the current signal
frame falls outside the first number of initial signal frames, the
spectrum fluctuation variance of the current signal frame is obtained
according to the spectrum fluctuation parameters of all buffered signal
frames, and is buffered in the second buffer array. The signal spectrum
fluctuation variance serves as a parameter for classifying signals, and
the local statistical method is applied to decide the signal type.
Therefore, the signals are classified with few parameters, simple logical
relations and low complexity.

[0040] FIG. 4 is a flowchart of a signal classifying method in another
embodiment of the present invention. As shown in FIG. 4, the method
includes the following steps:

[0041] S201. Obtain a spectrum fluctuation parameter of a current signal
frame determined as a foreground frame, and buffer the spectrum
fluctuation parameter.

[0042] In this embodiment, an input signal is framed to generate a certain
number of signal frames. If the type of a signal frame currently being
processed needs to be identified, this signal frame is called a current
signal frame. Framing is a universal concept in digital signal
processing and refers to dividing a long segment of a signal into several
short segments.

[0043] Signal frames are of two types: foreground frames and background
frames. A foreground frame generally refers to a signal frame with high
energy in the communication process, for example, a signal frame of a
conversation between two or more parties or a signal frame of music
played during communication, such as a ringback tone. A background frame
generally refers to the noise background of the conversation or music in
the communication process.

[0044] The signal classifying in this embodiment refers to identifying the
type of the signal in the foreground frame. Before the signal
classifying, it is necessary to determine whether the current signal
frame is a foreground frame. Meanwhile, it is necessary to obtain the
spectrum fluctuation parameter of the current signal frame determined as
a foreground frame. The two operations above are not order-sensitive. Any
variations of the embodiments of the present invention without departing
from the essence of the present invention shall fall within the scope of
the present invention.

[0045] The method for obtaining the spectrum fluctuation parameter of the
current signal frame may be: performing time-frequency transform for the
current signal frame to form a signal spectrum, and calculating the
spectrum fluctuation parameter (flux) of the current signal frame
according to the spectrum of the current signal frame and several
previous signal frames.

[0046] After the spectrum fluctuation parameter of the current signal
frame determined as a foreground frame is obtained, the spectrum
fluctuation parameter needs to be buffered. In this embodiment, a
spectrum fluctuation parameter buffer array (flux_buf) may be set. The
flux_buf array is updated when the signal frame is a foreground frame.

[0047] S202. Obtain a spectrum fluctuation variance of the current signal
frame according to spectrum fluctuation parameters of all buffered signal
frames, and buffer the spectrum fluctuation variance.

[0048] In this embodiment, the spectrum fluctuation variance of the
current signal frame can be obtained according to the spectrum
fluctuation parameters of all buffered signal frames, regardless of
whether the flux_buf array is full.
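In contrast to the warm-up rule of the previous embodiment, here the variance is computed from whatever is currently buffered. A minimal sketch, assuming it is called only for foreground frames (as stated for flux_buf above) and assuming the population variance as the estimator; the function name is hypothetical.

```python
from collections import deque
from statistics import pvariance


def update_and_variance(flux_buf: deque, flux: float) -> float:
    """Append the flux of a foreground frame to flux_buf (a deque created with
    a maxlen) and return the variance over all currently buffered values.

    No warm-up value is needed: a single buffered value has variance 0.0.
    Population variance is an assumption, not taken from the source."""
    flux_buf.append(flux)
    return pvariance(flux_buf)
```

Starting from an empty deque with maxlen 4, the values 1.0, 3.0, 5.0, 7.0, 9.0 produce variances 0.0, 1.0, 8/3, 5.0, 5.0; by the last call the oldest value has been evicted and the buffer holds [3.0, 5.0, 7.0, 9.0].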

[0049] After the spectrum fluctuation variance of the current signal frame
is obtained, the spectrum fluctuation variance needs to be buffered. In
this embodiment, a spectrum fluctuation variance buffer array
(var_flux_buf) may be set. The var_flux_buf array is updated when the
signal frame is a foreground frame.

[0050] S203. Calculate a ratio of the signal frames whose spectrum
fluctuation variance is above or equal to a first threshold to all the
buffered signal frames, and determine the current signal frame as a
speech frame if the ratio is above or equal to a second threshold or
determine the current signal frame as a music frame if the ratio is below
the second threshold.

[0051] In this embodiment, var_flux may be used as a parameter for
deciding whether the signal is speech or music. After the current signal
frame is determined as a foreground frame, a judgment may be made on the
basis of a ratio of the signal frames whose var_flux is above or equal to
a threshold to the signal frames buffered in the var_flux_buf array
(including the current signal frame), so as to determine whether the
current signal frame is a speech frame or a music frame, namely, a local
statistical method is applied. This threshold is referred to as a first
threshold below.

[0052] If the ratio of the signal frames whose var_flux is above or equal
to the first threshold to all buffered signal frames (including the
current signal frame) is above or equal to a second threshold, the
current signal frame is a speech frame; if the ratio is below the second
threshold, the current signal frame is a music frame.

[0053] In the technical solution provided in this embodiment, the spectrum
fluctuation parameter of the current signal frame determined as a
foreground frame is obtained and buffered; the spectrum fluctuation
variance is obtained according to the spectrum fluctuation parameters of
all buffered signal frames and is buffered; the ratio of the signal
frames whose spectrum fluctuation variance is above or equal to the first
threshold to all buffered signal frames is calculated; if the ratio is
above or equal to the second threshold, the current signal frame is a
speech frame; if the ratio is below the second threshold, the current
signal frame is a music frame. The signal spectrum fluctuation variance
serves as a parameter for classifying signals, and the local statistical
method is applied to decide the signal type. Therefore, the signals are
classified with few parameters, simple logical relations and low
complexity.

[0054] FIG. 5 is a flowchart of a signal classifying method in another
embodiment of the present invention. As shown in FIG. 5, the method
includes the following steps:

[0055] S301. Obtain a spectrum fluctuation parameter of a current signal
frame.

[0056] In this embodiment, an input signal is framed to generate a number
of signal frames. The signal frame whose type currently needs to be
identified is called the current signal frame. Framing is a common
concept in digital signal processing and refers to dividing a long signal
segment into several short segments. Framing can be performed in multiple
ways, and the frame length may vary, for example, from 5 ms to 50 ms. In
some implementations, the frame length is 10 ms.

[0057] At a given sampling rate, each signal frame undergoes a
time-frequency transform to form a signal spectrum, namely, N1
time-frequency transform coefficients Sp_n(i), where Sp_n(i)
represents the ith time-frequency transform coefficient of frame n.
The sampling rate and the time-frequency transform method may vary. In
some implementations, the sampling rate is 8000 Hz, and the
time-frequency transform is a 128-point Fast Fourier Transform (FFT).
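As an illustration of the framing and transform step, the following Python sketch assumes non-overlapping 10 ms frames at 8000 Hz (80 samples), zero-pads each frame to 128 points, and keeps the 64 magnitudes below the Nyquist frequency as the N1 coefficients Sp_n(i). Overlap, windowing, and the use of magnitudes rather than complex coefficients are not specified in the text and are assumptions here; the small radix-2 FFT only keeps the example dependency-free.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, example rate from the text
FRAME_MS = 10       # example frame length from the text
FFT_SIZE = 128      # 128-point FFT, as in the text

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def frame_signal(samples, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Split a signal into non-overlapping frames, dropping a trailing partial frame."""
    frame_len = sample_rate * frame_ms // 1000  # 80 samples at 8 kHz / 10 ms
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def frame_spectrum(frame, fft_size=FFT_SIZE):
    """Zero-pad a frame to fft_size and return |Sp_n(i)| for i < fft_size / 2."""
    padded = list(frame) + [0.0] * (fft_size - len(frame))
    return [abs(c) for c in fft(padded)[: fft_size // 2]]
```

For a 1000 Hz tone, the bin spacing is 8000 / 128 = 62.5 Hz, so the spectral peak of each frame falls in bin 16.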

[0058] The current signal frame undergoes a time-frequency transform to
form a signal spectrum, and the spectrum fluctuation parameter (flux) of
the current signal frame is calculated from the spectra of the current
signal frame and several previous signal frames. The flux can be
calculated in various ways, for example, by analyzing the
characteristics of the spectrum within a frequency range. The number of
previous frames may be chosen freely. For example, if three previous
frames are selected, the calculation method is:

[0059] In the formula above, flux_n represents the spectrum
fluctuation parameter of frame n; k1 and k2 delimit a frequency range
in the signal spectrum, where 1 ≤ k1 < k2 ≤ N1, for example, k1 = 2 and
k2 = 48; m represents the number of selected frames before the current
signal frame. In the foregoing formula, m is equal to 3.
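The flux formula referred to above did not survive extraction. The sketch below therefore uses a hypothetical form that matches only the verbal description: flux_n is taken as the sum, over bins k1 to k2 (1-based, inclusive), of the absolute difference between Sp_n(i) and the mean of the m previous spectra. It should not be read as the exact formula of this embodiment.

```python
def spectral_flux(current, previous, k1=2, k2=48):
    """Hypothetical flux_n: sum over bins k1..k2 (1-based, inclusive) of
    |Sp_n(i) - mean of Sp_{n-j}(i) over the m previous frames|.

    current:  magnitudes of frame n, list index 0 holds bin 1
    previous: list of the m previous frames' magnitude lists
    """
    if not previous:
        raise ValueError("need at least one previous spectrum")
    m = len(previous)
    flux = 0.0
    for i in range(k1 - 1, k2):  # 1-based bins k1..k2 -> 0-based indices
        past_mean = sum(spec[i] for spec in previous) / m
        flux += abs(current[i] - past_mean)
    return flux
```

With m = 3, `previous` would hold the spectra of frames n-1, n-2, and n-3.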

[0060] S302. Buffer the spectrum fluctuation parameter of the current
signal frame in a first buffer array if the current signal frame is a
foreground frame.

[0061] In this embodiment, signal frames are of two types: foreground
frames and background frames. A foreground frame generally refers to a
signal frame with high energy in the communication process, for example,
a signal frame of a conversation between two or more parties or a signal
frame of music played during communication, such as a ringback tone. A
background frame generally refers to the noise background of the
conversation or music in the communication process. Signal classifying
in this embodiment refers to identifying the type of the signal in
foreground frames. Therefore, before signal classifying, it is necessary
to determine whether the current signal frame is a foreground frame.

[0062] If the current signal frame is a foreground frame, the spectrum
fluctuation parameter (flux) of the current signal frame needs to be
buffered. In this embodiment, a spectrum fluctuation parameter buffer
array (flux_buf) may be set, and this array is referred to as the first
buffer array below. Various buffer structures may be used, for example, a
FIFO array. The flux_buf array is updated only when the signal frame is a
foreground frame. This array can buffer the flux of m1 signal frames,
where m1 is an integer greater than 0, for example, m1 = 20. For clearer
description, m1 is called the first number. That is, the first buffer
array can buffer the flux of the first number of signal frames.
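A minimal sketch of the first buffer array, assuming a FIFO implemented with Python's `collections.deque`: with `maxlen` set to m1, appending to a full deque discards the oldest flux value, and flux values of background frames are simply not buffered.

```python
from collections import deque

M1 = 20  # first number m1, example value from the text

def buffer_flux(buf, flux_n, is_foreground):
    """Append flux_n to the FIFO only for foreground frames; return True if buffered."""
    if is_foreground:
        buf.append(flux_n)  # a full deque with maxlen drops its oldest entry
        return True
    return False

flux_buf = deque(maxlen=M1)  # first buffer array
```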

[0063] The foreground frame may be determined in many ways, for example,
through a Modified Segmental Signal Noise Ratio (MSSNR) or a Signal to
Noise Ratio (SNR), as described below:

[0064] Method 1: Determining the Foreground Frame Through an MSSNR:

[0065] The MSSNR_n of the current signal frame is obtained. If
MSSNR_n ≥ alpha1, the current signal frame is a foreground frame;
otherwise, the current signal frame is a background frame. MSSNR_n
represents the modified sub-band SNR of frame n; alpha1 is a preset
threshold. For clearer description, alpha1 is called the third
threshold. alpha1 may be set as needed, for example, alpha1 = 50.

[0066] In this embodiment, MSSNRn may be obtained in many ways, as
exemplified below:

[0067] 1. Calculate the spectrum sub-band energy (Ei) of the current
signal frame.

[0068] The spectrum is divided into w sub-bands (0 ≤ w ≤ N1), and the
energy of each sub-band is E_i, where i = 0, 1, 2, ..., w-1:

$$E_i = \frac{1}{M_i}\sum_{k=0}^{M_i-1} e_{I+k}$$

[0069] In the formula above, M_i represents the number of frequency
points in sub-band i; I represents the index of the initial frequency
point of sub-band i; e_{I+k} represents the energy of frequency point
I+k.

[0070] 2. Update the long-term moving average Ē_i of E_i over
background frames.

[0071] Once the current signal frame is determined as a background frame,
Ē_i is updated through:

$$\bar{E}_i = \beta\,\bar{E}_i + (1-\beta)\,E_i, \quad i = 0, 1, 2, \ldots, w-1$$

[0072] In the formula above, β is a decimal between 0 and 1 that
controls the update speed.
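The two steps above can be sketched as follows. The band layout, given here as hypothetical (I, M_i) pairs, and the default β are illustrative assumptions, and the update is meant to be called only for background frames. The step that turns these energies into MSSNR_n is not reproduced here.

```python
def subband_energies(bin_energy, bands):
    """E_i = (1 / M_i) * sum_{k=0}^{M_i - 1} e_{I+k} for each band given as (I, M_i)."""
    return [sum(bin_energy[start:start + width]) / width for start, width in bands]

def update_background(avg, energies, beta=0.9):
    """Background-frame update: avg_i = beta * avg_i + (1 - beta) * E_i.

    beta = 0.9 is a hypothetical value; the text only says 0 < beta < 1.
    """
    return [beta * a + (1.0 - beta) * e for a, e in zip(avg, energies)]
```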

Method 2: Determining the Foreground Frame Through an SNR:

[0075] The snr_n of the current signal frame is obtained. If
snr_n ≥ alpha2, the current signal frame is a foreground frame;
otherwise, the current signal frame is a background frame. snr_n
represents the SNR of frame n; alpha2 is a preset threshold. For clearer
description, alpha2 is called the fourth threshold. alpha2 may be set as
needed, for example, alpha2 = 15.

[0076] In this embodiment, snrn may be obtained in many ways, as
exemplified below:

[0077] 1. Calculate the spectrum energy E_f of the current signal frame.

$$E_f = \frac{1}{M_f}\sum_{k=0}^{M_f-1} e_k$$

[0078] In the formula above, M_f represents the number of frequency
points in the current signal frame, and e_k represents the energy of
frequency point k.

[0079] 2. Update the long-term moving average Ē_f of E_f over background
frames.

[0080] Once the current signal frame is determined as a background frame,
Ē_f is updated through:

$$\bar{E}_f = \mu\,\bar{E}_{fp} + (1-\mu)\,E_f$$

[0081] In the formula above, Ē_fp is the long-term average before the
update, and μ is a decimal between 0 and 1 that controls the update
speed.

[0082] 3. Calculate snr_n.

$$snr_n = 10\log_{10}\left(\frac{E_f}{\bar{E}_f}\right)$$
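A sketch of Method 2, assuming per-bin energies e_k are already available. The text does not state whether snr_n is computed before or after the background average is updated; the sketch compares against the average from previous frames and then updates it only for background frames. μ = 0.95 is a hypothetical value.

```python
import math

ALPHA2 = 15.0  # fourth threshold in dB, example value from the text

def frame_energy(bin_energy):
    """E_f = (1 / M_f) * sum of the per-bin energies e_k."""
    return sum(bin_energy) / len(bin_energy)

def snr_db(ef, ef_bg):
    """snr_n = 10 * log10(E_f / long-term background average); ef_bg must be > 0."""
    return 10.0 * math.log10(ef / ef_bg)

def classify_frame(bin_energy, ef_bg, mu=0.95, alpha2=ALPHA2):
    """Return (is_foreground, updated long-term background average)."""
    ef = frame_energy(bin_energy)
    if snr_db(ef, ef_bg) >= alpha2:
        return True, ef_bg                     # foreground: average unchanged
    return False, mu * ef_bg + (1.0 - mu) * ef  # background: update average
```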

[0083] In this embodiment, the step of obtaining the spectrum fluctuation
parameter of the current signal frame and the step of determining whether
the current signal frame is a foreground frame may be performed in either
order. Any variations of the embodiments of the present invention without
departing from the essence of the present invention shall fall within the
scope of the present invention. In some implementations, the current
signal frame is determined as a foreground frame first, and then the
spectrum fluctuation parameter of the current signal frame is obtained
and buffered. In this case, the foregoing process is expressed as
follows:

[0084] S301'. Determine the current signal frame as a foreground frame.

[0085] S302'. Obtain and buffer the spectrum fluctuation parameter of the
current signal frame.

[0086] In this case, unlike S301, which obtains the spectrum fluctuation
parameter of every current signal frame, S302' obtains the spectrum
fluctuation parameter only of a current signal frame determined as a
foreground frame, so the spectrum fluctuation parameter of background
frames need not be obtained. This reduces computation and complexity.

[0087] Alternatively, the current signal frame is determined as a
foreground frame first, and then the spectrum fluctuation parameter of
every current signal frame is obtained, but only the spectrum fluctuation
parameter of the current signal frame determined as a foreground frame is
buffered.

[0088] S303. Obtain the spectrum fluctuation variance of the current
signal frame, and buffer it into the second buffer array.

[0089] In this embodiment, the spectrum fluctuation variance var_flux_n,
that is, the spectrum fluctuation variance of frame n, may be obtained
according to whether the first buffer array is full. If the current
signal frame falls within a first number of initial signal frames, the
spectrum fluctuation variance of the current signal frame is set to a
specific value and buffered in the second buffer array; otherwise, the
spectrum fluctuation variance of the current signal frame is obtained
according to spectrum fluctuation parameters of all buffered signal
frames and buffered in the second buffer array.

[0090] While the flux_buf array is buffering the first m1 flux values,
var_flux_n may be set to a specific value; namely, if the current signal
frame falls within the first number of initial signal frames, the
spectrum fluctuation variance of the current signal frame is set to a
specific value such as 0. That is, the spectrum fluctuation variance of
frame 1 to frame m1 determined as foreground frames is 0.

[0091] If the current signal frame does not fall within the first number
of initial signal frames, that is, starting from frame m1+1, the spectrum
fluctuation variance var_flux_n of each signal frame determined as a
foreground frame can be calculated from the flux of the m1 buffered
signal frames. In this case, the spectrum fluctuation variance of the
current signal frame may be calculated in many ways, as exemplified
below:

[0092] Once the m1 flux values have been buffered, the average value
mov_flux_n of the flux is initialized from the m1 buffered flux values:

$$\mathit{mov\_flux}_n = \frac{1}{m_1}\sum_{i=1}^{m_1}\mathit{flux}_i$$

[0093] After the initialization, starting from signal frame m1+1
determined as a foreground frame, mov_flux is updated once for each
foreground frame according to:

$$\mathit{mov\_flux}_n = \sigma\cdot\mathit{mov\_flux}_{n-1} + (1-\sigma)\,\mathit{flux}_n$$

[0094] where σ is a decimal between 0 and 1 that controls the update
speed.

[0095] Therefore, starting from signal frame m1+1 determined as a
foreground frame, var_flux_n can be determined from the flux of the m1
buffered signal frames, inclusive of the current signal frame, namely,

$$\mathit{var\_flux}_n = \sum_{k=1}^{m_1}\left(\mathit{flux}_{n-k} - \mathit{mov\_flux}_n\right)^2,$$

where n is greater than m1.
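The S303 procedure of [0090] to [0095] can be sketched as a small stateful class, called once per foreground frame. σ = 0.9 is a hypothetical default, and the variance is summed over the m1 flux values held in the buffer, which include the current frame as stated in [0095].

```python
from collections import deque

class FluxVariance:
    """Sketch of S303: frames 1..m1 get var_flux = 0; mov_flux is initialised
    from the full buffer at frame m1, then updated as an exponential moving
    average, and var_flux_n is the sum of squared deviations over the buffer."""

    def __init__(self, m1=20, sigma=0.9):
        self.m1 = m1
        self.sigma = sigma                # hypothetical update-speed value
        self.flux_buf = deque(maxlen=m1)  # first buffer array (FIFO)
        self.count = 0                    # foreground frames seen so far
        self.mov_flux = None

    def update(self, flux_n):
        """Buffer flux_n for one foreground frame and return var_flux_n."""
        self.flux_buf.append(flux_n)
        self.count += 1
        if self.count <= self.m1:
            if self.count == self.m1:     # buffer just filled: initialise the mean
                self.mov_flux = sum(self.flux_buf) / self.m1
            return 0.0                    # the specific value for initial frames
        self.mov_flux = self.sigma * self.mov_flux + (1.0 - self.sigma) * flux_n
        return sum((f - self.mov_flux) ** 2 for f in self.flux_buf)
```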

[0096] In some implementations, the spectrum fluctuation variance of
frame 1 to frame m1 determined as foreground frames may be determined in
other ways. For example, the spectrum fluctuation variance of the current
signal frame is obtained from the spectrum fluctuation parameters of all
buffered signal frames, as detailed below:

[0097] If the flux_buf array has buffered the first s flux values
(1 ≤ s ≤ m1), the mean mov_flux_n and the variance var_flux_n of the flux
values are calculated according to:

[0098] In this embodiment, the spectrum fluctuation variance of the
current signal frame is obtained according to spectrum fluctuation
parameters of all buffered signal frames no matter whether the first
buffer array is full.

[0099] After the spectrum fluctuation variance of the current signal frame
is obtained, it needs to be buffered. In this embodiment, a spectrum
fluctuation variance buffer array (var_flux_buf) may be set, and this
array is referred to as the second buffer array below. Various buffer
structures may be used, for example, a FIFO array. The var_flux_buf array
is updated when the signal frame is a foreground frame. This array can
buffer the var_flux of m3 signal frames, where m3 is an integer greater
than 0, for example, m3 = 120.

[0101] In some implementations, windowed smoothing may be applied to
several initial var_flux values buffered in the var_flux_buf array; for
example, a ramping window is applied to the var_flux of the signal frames
from frame m1+1 to frame m1+m2 to prevent the instability of a few
initial values from affecting the decision between speech frames and
music frames. m2 is an integer greater than 0, for example, m2 = 20. The
windowing is expressed as:

[0102] In some implementations, other types of windows, such as a Hamming
window, are applied.

[0103] S305. Calculate a ratio of signal frames whose spectrum fluctuation
variance is above or equal to a first threshold to all signal frames
buffered in the second buffer array, and determine the current signal
frame as a speech frame if the ratio is above or equal to a second
threshold or determine the current signal frame as a music frame if the
ratio is below the second threshold.

[0104] In this embodiment, var_flux may be used as a parameter for
deciding whether the signal is speech or music. After the current signal
frame is determined as a foreground frame, a judgment may be made on the
basis of a ratio of the signal frames whose var_flux is above or equal to
a threshold to all signal frames buffered in the var_flux_buf array
(including the current signal frame), so as to determine whether the
current signal frame is a speech frame or a music frame, namely, a local
statistical method is applied. This threshold is referred to as a first
threshold below.

[0105] If the ratio of the signal frames whose var_flux is above or equal
to the first threshold to all buffered signal frames (including the
current signal frame) is above or equal to a second threshold, the
current signal frame is a speech frame; if the ratio is below the second
threshold, the current signal frame is a music frame. The second
threshold may be a decimal between 0 and 1, for example, 0.5.

[0106] In this embodiment, the local statistical method is applied in the
following scenarios:

[0107] Before the var_flux_buf array is full, for example, when only the
var_flux_n values of m4 frames are buffered (m4 < m3) and the type of
signal frame m4, the current signal frame, needs to be determined, it is
only necessary to calculate the ratio R of the frames whose var_flux is
above or equal to the first threshold to all the m4 frames. If R is above
or equal to the second threshold, the current signal frame is a speech
frame; otherwise, it is a music frame.

[0108] If the var_flux_buf array is full, the ratio R of signal frames
whose var_flux_n is above or equal to the first threshold to all the
buffered m3 frames (including the current signal frame) is calculated.
If the ratio is above or equal to the second threshold, the current
signal frame is a speech frame; otherwise, the current signal frame is a
music frame.

[0109] In some implementations, for the initial m5 buffered signal
frames, R is set to a value above or equal to the second threshold so
that the initial m5 signal frames are decided as speech frames. m5 may be
any non-negative integer, for example, m5 = 75. That is, the ratio R of
the signal frames whose spectrum fluctuation variance is above or equal
to the first threshold to the buffered initial m5 signal frames
(including the current signal frame) is a preset value; starting from
signal frame m5+1 determined as a foreground frame, the ratio R of the
signal frames whose spectrum fluctuation variance is above or equal to
the first threshold to the buffered signal frames (including the current
signal frame) is calculated as described above. In this way, initial
speech signals are prevented from being mistakenly decided as music
signals.
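A sketch of the local statistical decision of [0107] to [0109], assuming a fixed first threshold (the default 1.0 is hypothetical) and counting frames whose var_flux is above or equal to it. Dividing by `len(self.var_flux_buf)` covers both the case where the buffer is not yet full (m4 frames) and the case where it holds m3 frames.

```python
from collections import deque

class LocalDecision:
    """Speech/music decision from the ratio R over the second buffer array."""

    def __init__(self, m3=120, m5=75, first_threshold=1.0, second_threshold=0.5):
        self.var_flux_buf = deque(maxlen=m3)  # second buffer array (FIFO)
        self.m5 = m5                          # initial frames forced to speech
        self.t1 = first_threshold             # hypothetical fixed first threshold
        self.t2 = second_threshold            # example value 0.5 from the text
        self.count = 0

    def decide(self, var_flux_n):
        """Buffer var_flux_n for one foreground frame; return 'speech' or 'music'."""
        self.var_flux_buf.append(var_flux_n)
        self.count += 1
        if self.count <= self.m5:
            ratio = self.t2                   # preset R >= second threshold
        else:
            high = sum(1 for v in self.var_flux_buf if v >= self.t1)
            ratio = high / len(self.var_flux_buf)
        return "speech" if ratio >= self.t2 else "music"
```

In the usage below the buffer holds four values; on the fifth frame the oldest value (2.0) drops out, so R becomes 1/4.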

[0110] In this embodiment, the first threshold may be a preset fixed
value or a first adaptive threshold T_var_flux_n. The fixed first
threshold is any value between the maximal value and the minimal value of
var_flux. T_var_flux_n may be adjusted adaptively according to the
background environment, for example, according to changes in the SNR of
the signal. In this way, noisy signals can be identified well.
T_var_flux_n may be obtained in many ways, for example, calculated from
MSSNR_n or snr_n, as exemplified below:

[0111] Method 1: Determining T_var_flux_n According to MSSNR_n, as shown
in FIG. 6:

[0112] S401. Update the maximal value of the MSSNR according to the
current signal frame.

[0113] The maximal value of MSSNR_n, expressed as max_MSSNR, is updated
for each frame. If the MSSNR_n of the current signal frame is above
max_MSSNR, max_MSSNR is updated to the MSSNR_n value of the current
signal frame; otherwise, max_MSSNR is multiplied by a coefficient such as
0.9999 to generate the updated max_MSSNR. That is, max_MSSNR is updated
according to the MSSNR_n of each frame.

[0114] S402. Determine the MSSNR threshold according to the updated
maximal value of the MSSNR, namely, calculate the adaptive threshold
T_MSSNR of MSSNR_n according to the updated max_MSSNR:

$$T_{\mathrm{MSSNR}} = C_{op}\cdot\mathit{max}_{\mathrm{MSSNR}}$$

[0115] C_op is a decimal between 0 and 1 and is adjusted according to
the working point, for example, C_op = 0.5. The working point is an
external input that controls the tendency to decide whether the signal
is speech or music.

[0116] S403. Among a certain number of frames including the current signal
frame, obtain the number of frames whose MSSNR is above the MSSNR
threshold and the number of frames whose MSSNR is below or equal to the
MSSNR threshold; calculate a difference measure between the two numbers,
and obtain the first adaptive threshold according to the difference
measure.

[0117] In this embodiment, T_var_flux_n is calculated according to the
MSSNR_n values of l signal frames, which include the current signal frame
and the l-1 frames before it, where l is an integer greater than 0, for
example, l = 512. The detailed method is as follows:

[0118] (1) Among the l frames, the number of frames with
MSSNR_n > T_MSSNR is expressed as high_bin, and the number of frames with
MSSNR_n ≤ T_MSSNR is expressed as low_bin; namely,
high_bin + low_bin = l.

[0119] (2) The difference measure between high_bin and low_bin is
expressed as diff_hist:

$$\mathit{diff\_hist} = \frac{\mathit{high\_bin} - \mathit{low\_bin}}{l} = \frac{2\cdot\mathit{high\_bin}}{l} - 1$$

[0120] Depending on the working point, a corresponding offset factor V_op
needs to be added to diff_hist to generate the difference measure after
offset, namely,

$$\mathit{diff\_hist\_bias} = \mathit{diff\_hist} + V_{op}$$

[0121] (3) The moving average diff_hist_avg of diff_hist_bias, used to
calculate T_var_flux_n, is:

$$\mathit{diff\_hist\_avg} = \rho\cdot\mathit{diff\_hist\_avg} + (1-\rho)\cdot\mathit{diff\_hist\_bias}$$

[0122] In the formula above, ρ is a decimal between 0 and 1 that controls
the update speed of diff_hist_avg, for example, ρ = 0.9.

[0123] (4) diff_hist_avg is restricted to the value range from -X_T to
X_T, where X_T is the upper limit and -X_T is the lower limit. X_T may be
a decimal between 0 and 1, for example, X_T = 0.6. The restricted
diff_hist_avg is expressed as the final difference measure
diff_hist_final.

[0124] (5) The first adaptive threshold of var_flux_n is expressed as
T_var_flux_n, which is calculated through:

T_op_up and T_op_down are the maximal value and minimal value of
T_var_flux_n respectively, and are set according to the working point.

[0125] Therefore, the first adaptive threshold of the spectrum fluctuation
variance is calculated from the difference measure, the externally input
working point, and the preset maximal and minimal values of the adaptive
threshold of the spectrum fluctuation variance.
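Steps (1) to (4) are the same for Method 1 and Method 2, so a single sketch can accept either MSSNR_n or snr_n as its level measure. The defaults follow the examples in the text where given (l = 512, C_op = 0.5, ρ = 0.9, X_T = 0.6, decay 0.9999); V_op = 0 and the initial diff_hist_avg = 0 are assumptions, and before l frames have been seen the counts use only the frames available. Because the formula for T_var_flux_n in step (5) is missing, the sketch stops at diff_hist_final.

```python
from collections import deque

class DiffHistTracker:
    """Tracks diff_hist_final for one level measure (MSSNR_n or snr_n)."""

    def __init__(self, history_len=512, c_op=0.5, v_op=0.0, rho=0.9,
                 x_t=0.6, decay=0.9999):
        self.history = deque(maxlen=history_len)  # levels of the last l frames
        self.c_op = c_op          # working-point factor for the level threshold
        self.v_op = v_op          # working-point offset (hypothetical default)
        self.rho = rho            # update speed of diff_hist_avg
        self.x_t = x_t            # clamp limit X_T
        self.decay = decay        # decay of the running maximum
        self.max_level = None
        self.diff_hist_avg = 0.0  # hypothetical initial value

    def update(self, level):
        """Process one frame's MSSNR_n or snr_n and return diff_hist_final."""
        # S401/S501: running maximum that slowly decays when not exceeded
        if self.max_level is None or level > self.max_level:
            self.max_level = level
        else:
            self.max_level *= self.decay
        # S402/S502: adaptive level threshold
        threshold = self.c_op * self.max_level
        # S403/S503: counts over the current frame and up to l-1 previous frames
        self.history.append(level)
        n_frames = len(self.history)
        high_bin = sum(1 for x in self.history if x > threshold)
        diff_hist = 2.0 * high_bin / n_frames - 1.0  # (high_bin - low_bin) / l
        diff_hist_bias = diff_hist + self.v_op
        self.diff_hist_avg = (self.rho * self.diff_hist_avg
                              + (1.0 - self.rho) * diff_hist_bias)
        # clamp to [-X_T, X_T]; only the returned value is clamped
        return max(-self.x_t, min(self.x_t, self.diff_hist_avg))
```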

[0126] Method 2: Determining T_var_flux_n According to snr_n, as shown in
FIG. 7:

[0127] S501. Update the maximal value of the SNR according to the current
signal frame.

[0128] The maximal value of snr_n, expressed as max_snr, is updated for
each frame. If the snr_n of the current signal frame is above max_snr,
max_snr is updated to the snr_n value of the current signal frame;
otherwise, max_snr is multiplied by a coefficient such as 0.9999 to
generate the updated max_snr. That is, max_snr is updated according to
the snr_n of each frame.

[0129] S502. Determine the SNR threshold according to the updated maximal
value of the SNR, namely, calculate the adaptive threshold T_snr of
snr_n:

$$T_{\mathrm{snr}} = C_{op}\cdot\mathit{max}_{\mathrm{snr}}$$

[0130] C_op is a decimal between 0 and 1 and is adjusted according to
the working point, for example, C_op = 0.5. The working point is an
external input that controls the tendency to decide whether the signal
is speech or music.

[0131] S503. Among a certain number of frames including the current signal
frame, obtain the number of frames whose snr is above the snr threshold
and the number of frames whose snr is below or equal to the snr
threshold; calculate a difference measure between the two numbers, and
obtain the first adaptive threshold according to the difference measure.

[0132] In this embodiment, T_var_flux_n is calculated according to the
snr_n values of l signal frames, which include the current signal frame
and the l-1 frames before it, where l is an integer greater than 0, for
example, l = 512. The detailed method is as follows:

[0133] (1) Among the l frames, the number of frames with snr_n > T_snr is
expressed as high_bin, and the number of frames with snr_n ≤ T_snr is
expressed as low_bin; namely, high_bin + low_bin = l.

[0134] (2) The difference measure between high_bin and low_bin is
expressed as diff_hist:

$$\mathit{diff\_hist} = \frac{\mathit{high\_bin} - \mathit{low\_bin}}{l} = \frac{2\cdot\mathit{high\_bin}}{l} - 1$$

[0135] Depending on the working point, a corresponding offset factor V_op
needs to be added to diff_hist to generate the difference measure after
offset, namely,

$$\mathit{diff\_hist\_bias} = \mathit{diff\_hist} + V_{op}$$

[0136] (3) The moving average diff_hist_avg of diff_hist_bias, used to
calculate T_var_flux_n, is:

$$\mathit{diff\_hist\_avg} = \rho\cdot\mathit{diff\_hist\_avg} + (1-\rho)\cdot\mathit{diff\_hist\_bias}$$

[0137] In the formula above, ρ is a decimal between 0 and 1 that controls
the update speed of diff_hist_avg, for example, ρ = 0.9.

[0138] (4) diff_hist_avg is restricted to the value range from -X_T to
X_T, where X_T is the upper limit and -X_T is the lower limit. X_T may be
a decimal between 0 and 1, for example, X_T = 0.6. The restricted
diff_hist_avg is expressed as the final difference measure
diff_hist_final.

[0139] (5) The first adaptive threshold of var_flux_n is expressed as
T_var_flux_n, which is calculated through:

T_op_up and T_op_down are the maximal value and minimal value of
T_var_flux_n respectively, which are set according to the working point.

[0140] Therefore, the first adaptive threshold of the spectrum fluctuation
variance is calculated from the difference measure, the externally input
working point, and the preset maximal and minimal values of the adaptive
threshold of the spectrum fluctuation variance.

[0141] S306. Classify signals according to other parameters in addition to
the spectrum fluctuation variance.

[0142] In some implementations, when var_flux is used as the main
parameter for classifying signals, the signal type may also be decided
according to additional parameters to further improve classification
performance. Such parameters include the zero-crossing rate, the
peakiness measure, and so on. In some implementations, the peakiness
measure hp1 or hp2 may be used to decide the type of the signal. For
clearer description, hp1 is called the first peakiness measure, and hp2
is called the second peakiness measure. If hp1 ≥ T1 and/or hp2 ≥ T2, the
current signal frame is a music frame. Alternatively, the current signal
frame is determined as a music frame if the avg_P1 obtained from hp1 is
above or equal to T1 or the avg_P2 obtained from hp2 is above or equal to
T2, or if the avg_P1 obtained from hp1 is above or equal to T1 and the
avg_P2 obtained from hp2 is above or equal to T2, as detailed below:

[0147] 4. Select the first N peak(i) values, which are relatively large;
for example, select the first 5 peak(i) values, and calculate hp1 and hp2
according to the following formulas. If fewer than 5 peak values are
found, set N to the number of peak values actually found, and use those N
peak values to calculate:

[0148] In the formulas above, N is the number of peak values actually used
for calculating hp1 and hp2.

[0149] In some implementations, the N peak(i) values may be selected from
the x spectrum peak values found in ways other than the foregoing
arrangement; or several values other than the first, larger values may be
selected from the arranged peak values. Any variations made without
departing from the essence of the present invention shall fall within the
scope of the present invention.

[0150] 5. If hp1 ≥ T1 and/or hp2 ≥ T2, the current signal frame is a
music frame, where T1 and T2 are empirical values.

[0151] That is, in this embodiment, after var_flux_n is used as the main
parameter for deciding the type of the current signal frame, hp1 and/or
hp2 may be used to make an auxiliary decision, which improves the rate at
which music frames are identified successfully and corrects the decision
result obtained through the local statistical method.

[0152] In some implementations, the moving average of hp1 (namely,
avg_P1) and the moving average of hp2 (namely, avg_P2) are calculated
first. If avg_P1 ≥ T1 and/or avg_P2 ≥ T2, the current signal frame is a
music frame, where T1 and T2 are empirical values. In this way, extremely
large or small values are prevented from affecting the decision result.

[0153] avg_P1 and avg_P2 may be obtained through:

$$\mathit{avg\_P1} = \gamma\cdot\mathit{avg\_P1} + (1-\gamma)\cdot hp1$$

$$\mathit{avg\_P2} = \gamma\cdot\mathit{avg\_P2} + (1-\gamma)\cdot hp2$$

[0154] In the formulas above, γ is a decimal between 0 and 1, for
example, γ = 0.995.
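The smoothing of [0153] and the threshold test of [0152] can be sketched as below. hp1 and hp2 are taken as inputs because their formulas are not reproduced here; the values of T1 and T2 and the zero initial averages are hypothetical, and `require_both` selects between the "and" and "or" variants described in [0142].

```python
class PeakinessSmoother:
    """Smooths hp1/hp2 into avg_P1/avg_P2 and flags music frames."""

    def __init__(self, t1, t2, gamma=0.995, require_both=False):
        self.t1 = t1                      # empirical threshold T1 (value not given)
        self.t2 = t2                      # empirical threshold T2 (value not given)
        self.gamma = gamma                # example value 0.995 from the text
        self.require_both = require_both  # True: "and" variant; False: "or"
        self.avg_p1 = 0.0                 # hypothetical initial values
        self.avg_p2 = 0.0

    def is_music(self, hp1, hp2):
        """Update avg_P1/avg_P2 with this frame's measures and test the thresholds."""
        self.avg_p1 = self.gamma * self.avg_p1 + (1.0 - self.gamma) * hp1
        self.avg_p2 = self.gamma * self.avg_p2 + (1.0 - self.gamma) * hp2
        hit1 = self.avg_p1 >= self.t1
        hit2 = self.avg_p2 >= self.t2
        return (hit1 and hit2) if self.require_both else (hit1 or hit2)


def refine_decision(raw_decision, peakiness_says_music):
    """Auxiliary decision: override the local-statistics result with 'music'."""
    return "music" if peakiness_says_music else raw_decision
```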

[0155] The operation of obtaining other parameters and the auxiliary
decision based on other parameters may also be performed before S305;
these operations may be performed in either order. Any variations made
without departing from the essence of the present invention shall fall
within the scope of the present invention.

[0156] S307. Apply a one-frame hangover to the raw decision result to
obtain the final decision result.

[0157] In some implementations, the decision result obtained in step S305
or S306 is called the raw decision result of the current signal frame and
is expressed as SMd_raw. A one-frame hangover is applied to obtain the
final decision result of the current signal frame, SMd_out, which avoids
frequent switching between signal types.

[0158] Here, last_SMd_raw represents the raw decision result of the
previous frame, and last_SMd_out represents the final decision result of
the previous frame. If last_SMd_raw=SMd_raw, SMd_out=SMd_raw; otherwise,
SMd_out=last_SMd_out. After the final decision is made for every frame,
last_SMd_raw and last_SMd_out are updated to the decision result of the
current signal frame respectively.

[0159] For example, it is assumed that the raw decision result of the
previous frame (last_SMd_raw) indicates the previous signal frame is
speech, and that the final decision result (last_SMd_out) of the previous
frame also indicates the previous signal frame is speech. If the raw
decision result of the current signal frame (SMd_raw) indicates that the
current signal frame is music, because last_SMd_raw is different from
SMd_raw, the final decision result (SMd_out) of the current signal frame
indicates speech, namely, is the same as last_SMd_out. The last_SMd_raw
is updated to music, and the last_SMd_out is updated to speech.
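The hangover rule of [0158] amounts to a few lines. The initial state of last_SMd_raw and last_SMd_out is not specified in the text, so starting from "speech" is an assumption; the first frame of the usage below reproduces the example of [0159].

```python
def apply_hangover(raw_decisions, initial="speech"):
    """SMd_out = SMd_raw if SMd_raw equals last_SMd_raw, else last_SMd_out."""
    last_raw = last_out = initial  # hypothetical initial state
    final_decisions = []
    for raw in raw_decisions:
        final = raw if raw == last_raw else last_out
        final_decisions.append(final)
        last_raw, last_out = raw, final  # update after each frame's decision
    return final_decisions
```

A single raw "music" after "speech" is held back for one frame; the output switches only when two consecutive raw decisions agree.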

[0160] FIG. 8 shows a structure of a signal classifying apparatus in an
embodiment of the present invention. As shown in FIG. 8, the apparatus
includes: a first obtaining module 601, configured to obtain a spectrum
fluctuation parameter of a current signal frame; a foreground frame
determining module 602, configured to determine the current signal frame
as a foreground frame and buffer the spectrum fluctuation parameter of
the current signal frame determined as the foreground frame into a first
buffering module 603; the first buffering module 603, configured to
buffer the spectrum fluctuation parameter of the current signal frame
determined by the foreground frame determining module 602; a setting
module 604, configured to set a spectrum fluctuation variance of the
current signal frame to a specific value and buffer the spectrum
fluctuation variance in a second buffering module 606 if the current
signal frame falls within a first number of initial signal frames; a
second obtaining module 605, configured to obtain the spectrum
fluctuation variance of the current signal frame according to spectrum
fluctuation parameters of all signal frames buffered in the first
buffering module 603 and buffer the spectrum fluctuation variance of the
current signal frame in the second buffering module 606 if the current
signal frame falls outside the first number of initial signal frames; the
second buffering module 606, configured to buffer the spectrum
fluctuation variance of the current signal frame set by the setting
module 604 or obtained by the second obtaining module 605; and a first
deciding module 607, configured to: calculate a ratio of signal frames
whose spectrum fluctuation variance is above or equal to a first
threshold to all signal frames buffered in the second buffering module
606, and determine the current signal frame as a speech frame if the
ratio is above or equal to a second threshold or determine the current
signal frame as a music frame if the ratio is below the second threshold.

[0161] Through the apparatus provided in this embodiment, the spectrum
fluctuation parameter of the current signal frame is obtained; if the
current signal frame is a foreground frame, the spectrum fluctuation
parameter of the current signal frame is buffered in the first buffering
module 603; if the current signal frame falls within a first number of
initial signal frames, the spectrum fluctuation variance of the current
signal frame is set to a specific value, and is buffered in the second
buffering module 606; if the current signal frame falls outside the first
number of initial signal frames, the spectrum fluctuation variance of the
current signal frame is obtained according to the spectrum fluctuation
parameters of all buffered signal frames, and is buffered in the second
buffering module 606. The signal spectrum fluctuation variance serves as
a parameter for classifying signals, and the local statistical method is
applied to decide the signal type. Therefore, the signals are classified
with few parameters, simple logical relations and low complexity.

[0162] FIG. 9 shows a structure of a signal classifying apparatus in
another embodiment of the present invention. As shown in FIG. 9, the
apparatus in this embodiment may include the following modules in
addition to the modules shown in FIG. 8: a second deciding module 608,
configured to assist the first deciding module 607 in classifying the
signals according to other parameters; a decision correcting module 609,
configured to obtain a final decision result by applying a hangover of a
frame to the decision result obtained by the first deciding module 607 or
obtained by both the first deciding module 607 and the second deciding
module 608, where the decision result indicates whether the current
signal frame is a speech frame or a music frame; and a windowing module
610, configured to: perform windowed smoothing for several initial
spectrum fluctuation variance values buffered in the second buffering
module 606 before the first deciding module 607 calculates the ratio of
the signal frames whose spectrum fluctuation variance is above or equal
to the first threshold to all signal frames buffered in the second
buffering module 606.

[0163] The first deciding module 607 may include: a first threshold
determining unit 6071, configured to determine the first threshold; a
ratio obtaining unit 6072, configured to obtain the ratio of the signal
frames whose spectrum fluctuation variance is above or equal to the first
threshold determined by the first threshold determining unit 6071 to all
signal frames buffered in the second buffering module 606; a second
threshold determining unit 6073, configured to determine the second
threshold; and a judging unit 6074, configured to: compare the ratio
obtained by the ratio obtaining unit 6072 with the second threshold
determined by the second threshold determining unit 6073; and determine
the current signal frame as a speech frame if the ratio is above or equal
to the second threshold, or determine the current signal frame as a music
frame if the ratio is below the second threshold.
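The two-threshold decision of units 6071 to 6074 can be sketched as follows. The threshold values are left as parameters because the text does not give them. The function name and the string labels are hypothetical.

```python
from typing import Sequence


def classify_frame(variances: Sequence[float],
                   first_threshold: float,
                   second_threshold: float) -> str:
    """Sketch of the judging logic: the ratio of buffered frames whose spectrum
    fluctuation variance is >= first_threshold decides speech versus music."""
    if not variances:
        raise ValueError("no spectrum fluctuation variances are buffered")
    high = sum(1 for v in variances if v >= first_threshold)
    ratio = high / len(variances)
    return "speech" if ratio >= second_threshold else "music"
```

For example, with variances `[0.1, 0.5, 0.9, 1.2]`, a first threshold of 0.8 and a second threshold of 0.5, two of the four frames reach the first threshold. The resulting ratio of 0.5 is not below the second threshold, so the frame is labeled speech.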

[0164] The following describes the signal classifying apparatus with
reference to the foregoing method embodiments:

[0165] The first obtaining module 601 obtains the spectrum fluctuation
parameter of the current signal frame. The foreground frame determining
module 602 buffers the spectrum fluctuation parameter of the current
signal frame into the first buffering module 603 if it determines that the
current signal frame is a foreground frame. The setting module 604 sets
the spectrum fluctuation variance of the current signal frame to a
specific value and buffers the spectrum fluctuation variance in the
second buffering module 606 if the current signal frame falls within a
first number of initial signal frames. The second obtaining module 605
obtains the spectrum fluctuation variance of the current signal frame
according to spectrum fluctuation parameters of all signal frames
buffered in the first buffering module 603 and buffers the spectrum
fluctuation variance of the current signal frame in the second buffering
module 606 if the current signal frame falls outside the first number of
initial signal frames. In some implementation, a windowing module 610 may
perform windowed smoothing for several initial spectrum fluctuation
variance values buffered in the second buffering module 606. The first
deciding module 607 calculates a ratio of signal frames whose spectrum
fluctuation variance is above or equal to a first threshold to all signal
frames buffered in the second buffering module 606, and determines the
current signal frame as a speech frame if the ratio is above or equal to
a second threshold or determines the current signal frame as a music
frame if the ratio is below the second threshold. In some implementations,
the second deciding module 608 may use parameters other than the spectrum
fluctuation variance to assist in classifying the signals; and the
decision correcting module 609 may apply the hangover of a frame to the
raw decision result to obtain the final decision result.
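The text does not define the hangover rule applied by the decision correcting module 609. The Python sketch below assumes one common form of hangover: the final decision switches only after the raw decision has disagreed with it for a fixed number of consecutive frames. The class name, the hangover length and the initial label are hypothetical.

```python
class HangoverCorrector:
    """Hypothetical decision correcting step: the final decision changes only
    after the raw decision has differed from it for `hangover` consecutive frames."""

    def __init__(self, hangover: int = 3, initial: str = "speech") -> None:
        self.hangover = hangover
        self.final = initial
        self.count = 0  # consecutive frames whose raw decision differs from final

    def correct(self, raw: str) -> str:
        if raw == self.final:
            self.count = 0  # agreement resets the pending switch
        else:
            self.count += 1
            if self.count >= self.hangover:
                self.final = raw  # the new type has persisted long enough
                self.count = 0
        return self.final
```

With a hangover of 3, the raw sequence music, music, speech, music, music, music yields speech for the first five frames. The final decision switches to music only on the sixth frame, suppressing isolated flips.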

[0166] FIG. 10 shows a structure of a signal classifying apparatus in
another embodiment of the present invention. As shown in FIG. 10, the
apparatus includes: a third obtaining module 701, configured to obtain a
spectrum fluctuation parameter of a current signal frame determined as a
foreground frame, and buffer the spectrum fluctuation parameter; a fourth
obtaining module 702, configured to obtain a spectrum fluctuation
variance of the current signal frame according to the spectrum
fluctuation parameters of all signal frames buffered in the third
obtaining module 701, and buffer the spectrum fluctuation variance; and a
third deciding module 703, configured to: calculate a ratio of signal
frames whose spectrum fluctuation variance is above or equal to a first
threshold to all signal frames buffered in the fourth obtaining module
702, and determine the current signal frame as a speech frame if the
ratio is above or equal to a second threshold or determine the current
signal frame as a music frame if the ratio is below the second threshold.
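The three-module apparatus of FIG. 10 can be sketched end to end as a small Python class. This is an illustrative sketch, not the patented implementation. The buffer sizes and thresholds are placeholders, and the variance is assumed to be the population variance of the buffered parameters. Because this embodiment only processes frames already determined as foreground, non-foreground frames return `None` here. All names are hypothetical.

```python
from collections import deque
from statistics import pvariance
from typing import Optional


class SpectrumFluctuationClassifier:
    """Hypothetical sketch of modules 701 to 703 under the assumptions stated above."""

    def __init__(self, flux_len: int = 10, var_len: int = 50,
                 first_threshold: float = 1.0, second_threshold: float = 0.5) -> None:
        self.flux_buf = deque(maxlen=flux_len)  # buffer of the third obtaining module 701
        self.var_buf = deque(maxlen=var_len)    # buffer of the fourth obtaining module 702
        self.t1 = first_threshold
        self.t2 = second_threshold

    def process(self, flux: float, is_foreground: bool) -> Optional[str]:
        if not is_foreground:
            return None  # assumed: only foreground frames are classified
        self.flux_buf.append(flux)
        # Variance over all buffered spectrum fluctuation parameters.
        self.var_buf.append(pvariance(self.flux_buf))
        # Third deciding module 703: ratio of high-variance frames.
        high = sum(1 for v in self.var_buf if v >= self.t1)
        ratio = high / len(self.var_buf)
        return "speech" if ratio >= self.t2 else "music"
```

A steady flux keeps the variance near zero and yields music, while a strongly fluctuating flux quickly pushes the ratio above the second threshold and yields speech.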

[0167] Through the apparatus provided in this embodiment, the spectrum
fluctuation parameter of the current signal frame determined as a
foreground frame is obtained and buffered; the spectrum fluctuation
variance is obtained according to the spectrum fluctuation parameters of
all buffered signal frames and is buffered; the ratio of the signal
frames whose spectrum fluctuation variance is above or equal to the first
threshold to all buffered signal frames is calculated; if the ratio is
above or equal to the second threshold, the current signal frame is a
speech frame; if the ratio is below the second threshold, the current
signal frame is a music frame. The signal spectrum fluctuation variance
serves as a parameter for classifying signals, and the local statistical
method is applied to decide the signal type. Therefore, the signals are
classified with few parameters, simple logical relations and low
complexity.

[0168] The signal classifying method has been detailed in the foregoing method
embodiments, and the signal classifying apparatus is designed to
implement the signal classifying method above. For more details about the
classifying method performed by the signal classifying apparatus, see the
method embodiments above.

[0169] In the embodiments of the present invention, speech signals and
music signals are taken as an example. Based on the methods in the
embodiments of the present invention, other input signals such as speech
and noise can be classified as well. For the signal classifying based on
the local statistical method in the present invention, the spectrum
fluctuation parameter and the spectrum fluctuation variance of the
current signal frame are used as a basis for deciding the signal type. In
some implementations, other parameters of the current signal frame may be
used as a basis for deciding the signal type.

[0170] Persons of ordinary skill in the art should understand that all or
part of the steps of the method according to the embodiments of the
present invention may be implemented by a program instructing relevant
hardware. The program may be stored in a computer readable storage
medium. When the program runs, the steps of the method according to the
embodiments of the present invention are performed. The storage medium
may be any medium that is capable of storing program code, such as a
Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or
a Compact Disk-Read Only Memory (CD-ROM).

[0171] Finally, it should be noted that the above embodiments are merely
provided for describing the technical solution of the present invention,
but are not intended to limit the present invention. It is apparent that
persons skilled in the art can make various modifications and variations
to the invention without departing from the spirit and scope of the
invention. The present invention is intended to cover the modifications
and variations provided that they fall within the scope of protection
defined by the following claims or their equivalents.