Abstract:

A novel beamforming post-processor technique with enhanced noise
suppression capability. The present beamforming post-processor technique
is a non-linear post-processing technique for sensor arrays (e.g.,
microphone arrays) which improves the directivity and signal separation
capabilities. The technique works in so-called instantaneous direction of
arrival space, estimates the probability for sound coming from a given
incident angle or look-up direction and applies a time-varying, gain
based, spatio-temporal filter for suppressing sounds coming from
directions other than the sound source direction, resulting in minimal
artifacts and musical noise.

Claims:

1-31. (canceled)

32. A computer-implemented process for improving the directivity and
signal to noise ratio of the output of a beamformer employed with a
sensor array in an environment, comprising: capturing stationary tones
dispersed at locations in the environment with sensors of a sensor array;
inputting signals of the stationary tones and a desired signal captured
by the sensors of a sensor array in the frequency domain defined by
frequency bins and frames in time; computing a beamformer output as
function of the input signals divided into frequency bins and frames in
time; dividing a spatial region corresponding to a working space of the
sensor array into a plurality of incident angle regions, and for each
frequency bin and incident angle region, computing the probability that
the desired signal occurs at a given incident angle region using an
instantaneous direction of arrival computation; and spatially filtering
the beamformer output by multiplying the probability that the desired
signal occurs at a given incident angle region by the beamformer output
while attenuating signals from the locations of the stationary tones.

33. The computer-implemented process of claim 32 wherein the stationary
tones are captured as part of a calibration procedure.

34. The computer-implemented process of claim 32 wherein the stationary
tones are played by speakers at known locations.

35. The computer-implemented process of claim 32 wherein the input
signals in the frequency domain are converted from the time domain into
the frequency domain prior to inputting them using a Modulated Complex
Lapped Transform (MCLT).

36. The computer-implemented process of claim 32 wherein the sensors are
microphones and wherein the sensor array is a microphone array.

37. The computer-implemented process of claim 32 wherein the
instantaneous direction of arrival computation for each frequency bin is
based on the phase differences of the input signals from a pair of
sensors.

38. The computer-implemented process of claim 32 wherein spatially
filtering the beamformer output attenuates signals originating from
directions other than the direction of the desired signal.

39. A system for improving the signal to noise ratio of a desired signal
received from a microphone array in an environment, comprising: a general
purpose computing device; a computer program comprising program modules
executable by the general purpose computing device, wherein the computing
device is directed by the program modules of the computer program to,
capture audio signals of dispersed stationary sound sources and a desired
signal in an environment in the time domain with a microphone array;
convert the time-domain signals to frequency-domain and frequency bins
using a converter; input the signals in the frequency domain into a
beamformer and compute a beamformer output wherein the beamformer output
represents the optimal solution for capturing an audio signal at a target
point using the total microphone array input; estimate the probability
that the desired signal comes from a given incident angle using an
instantaneous direction of arrival computation; and output an enhanced
signal for the desired signal with a greater signal to noise ratio by
taking the product of the beamformer output and the probability
estimation that the desired signal comes from a given incident angle
while attenuating audio signals that come from directions of the
stationary sound sources.

40. The system of claim 39 wherein the instantaneous direction of arrival
computation for each frequency bin is based on the phase differences of
the input signals from a pair of microphones.

41. The system of claim 39 wherein the beamformer is a time-invariant
beamformer.

42. The system of claim 39 wherein the enhanced signal with a greater
signal to noise ratio is computed and output in real time.

43. The system of claim 39 wherein the modules to estimate the
probability that a desired signal comes from a given incident angle using
an instantaneous direction of arrival computation and the module to
output an enhanced signal with a greater signal to noise ratio by taking
the product of the beamformer output and the probability estimation that
the desired signal comes from a given incident angle form a
post-processor that attenuates signals originating from directions other
than the direction of the desired signal to output a signal with an
enhanced signal to noise ratio.

44. A computer-implemented process for improving the signal to noise
ratio of a desired signal received from a microphone array in an
environment, comprising: capturing audio signals of dispersed stationary
sound sources and a desired signal in the environment in the time domain
with a microphone array; converting the time-domain signals to
frequency-domain and frequency bins using a converter; inputting the
signals in the frequency domain into a beamformer and computing a
beamformer output wherein the beamformer output represents the optimal
solution for capturing an audio signal at a target point using the total
microphone array input; estimating the probability that the desired
signal comes from a given incident angle using an instantaneous direction
of arrival computation; and outputting an enhanced signal of the desired
signal with a greater signal to noise ratio by taking the product of the
beamformer output and the probability estimation that the desired signal
comes from a given incident angle.

45. The computer-implemented process of claim 44 further comprising
attenuating audio signals that come from directions of the stationary
sound sources.

46. The computer-implemented process of claim 44 wherein the
instantaneous direction of arrival computation for each frequency bin is
based on the phase differences of the input signals from a pair of
microphones.

47. The computer-implemented process of claim 44 wherein the beamformer
is a time-invariant beamformer.

48. The computer-implemented process of claim 44 wherein the enhanced
signal with a greater signal to noise ratio is computed and output in
real time.

49. The computer-implemented process of claim 44 wherein the
instantaneous direction of arrival computation is based on the phase
differences of the input signals from a pair of sensors.

50. The computer-implemented process of claim 44 wherein the dispersed
stationary sound sources are at known locations.

51. The computer-implemented process of claim 44 wherein the dispersed
stationary sound sources are activated as part of a calibration
procedure.

Description:

BACKGROUND

[0001] Using multiple sensors arranged in an array, for example
microphones arranged in a microphone array, to improve the quality of a
captured signal, such as an audio signal, is a common practice. Various
processing is typically performed to improve the signal captured by the
array. For example, beamforming is one way that the captured signal can
be improved.

[0002] Beamforming operations are applicable to processing the signals of
a number of arrays, including microphone arrays, sonar arrays,
directional radio antenna arrays, radar arrays, and so forth. In general,
a beamformer is basically a spatial filter that operates on the output of
an array of sensors, such as microphones, in order to enhance the
amplitude of a coherent wave front relative to background noise and
directional interference. In the case of a microphone array, beamforming
involves processing output audio signals of the microphones of the array
in such a way as to make the microphone array act as a highly directional
microphone. In other words, beamforming provides a "listening beam" which
points to, and receives, a particular sound source while attenuating
other sounds and noise, including, for example, reflections,
reverberations, interference, and sounds or noise coming from other
directions or points outside the primary beam. Beamforming operations
make the microphone array listen to a given look-up direction, or angular
space range. Pointing of such beams to various directions is typically
referred to as beamsteering. A typical beamformer employs a set of beams
that cover a desired angular space range in order to better capture the
target or desired signal. There are, however, limitations to the
improvement possible in processing a signal by employing beamforming.

[0003] Under real life conditions high reverberation leads to spatial
spreading of the sound, even of point sources. For example, in many cases
point noise sources are not stationary and have the dynamics of the
source speech signal or are speech signals themselves, i.e. interference
sources. Conventional time invariant beamformers are usually optimized
under the assumption of isotropic ambient noise. Adaptive beamformers, on
the other hand, work best under low reverberation conditions and a point
noise source. In both cases, however, the improvements possible in noise
suppression and signal selection capabilities of these algorithms are
nearly exhausted with already existing algorithms.

[0004] Therefore, the SNR of the output signal generated by conventional
beamformer systems is often further enhanced using post-processing or
post-filtering techniques. In general, such techniques operate by
applying additional post-filtering algorithms for sensor array outputs to
enhance beamformer output signals. For example, microphone array
processing algorithms generally use a beamformer to jointly process the
signals from all microphones to create a single-channel output signal
with increased directivity and thus higher SNR compared to a single
microphone. This output signal is then often further enhanced by the use
of a single channel post-filter for processing the beamformer output in
such a way that the SNR of the output signal is significantly improved
relative to the SNR produced by use of the beamformer alone.

[0005] Unfortunately, one problem with conventional beamformer
post-filtering techniques is that they generally operate on the
assumption that any noise present in the signal is either incoherent or
diffuse. As such, these conventional post-filtering techniques generally
fail to make allowances for point noise sources which may be strongly
correlated across the sensor array. Consequently, the SNR of the output
signal is not generally improved relative to highly correlated point
noise sources.

SUMMARY

[0006] This Summary is provided to introduce a selection of concepts in a
simplified form that are further described below in the Detailed
Description. This Summary is not intended to identify key features or
essential features of the claimed subject matter, nor is it intended to
be used to limit the scope of the claimed subject matter.

[0007] In general, the present beamforming post-processor technique is a
novel technique for post-processing a sensor array's (e.g., a microphone
array's) beamformer output to achieve better spatial filtering under
conditions of noise and reverberation. For each frame (e.g., audio frame)
and frequency bin the technique estimates the spatial probability for
sound source presence (the probability that the desired sound source is
in a particular look-up direction or angular space). It uses the spatial
probability for the sound source presence and multiplies it by the
beamformer output for each frequency bin to select the desired signal and
to suppress undesired signals (i.e. not coming from the likely sound
source direction or sector).

[0008] The technique uses the so-called instantaneous direction of arrival
(IDOA) space to estimate the probability of the desired or target signal
arriving from a given location. In general, for a microphone array, the
phase differences at a particular frequency bin between the signals
received at a pair of microphones give an indication of the instantaneous
direction of arrival (IDOA) of a given sound source. IDOA vectors provide
an indication of the direction from which a signal and/or point noise
source originates. Non-correlated noise will be evenly spread in this
space, while the signal and ambient noise (correlated components) will
lie inside a hyper-volume that represents all potential positions of a
sound source within the signal field.

[0009] In one embodiment the present beamforming post-processor technique
is implemented as a real-time post-processor after a time-invariant
beamformer. The present technique substantially improves the directivity
of the microphone array. It is CPU efficient and adapts quickly when the
listening direction changes, even in the presence of ambient and point
noise sources. One exemplary embodiment of the present technique improves
the performance of a traditional time-invariant beamformer by 3-9 dB.

[0010] It is noted that while the foregoing limitations in existing sensor
array beamforming and noise suppression schemes described in the
Background section can be resolved by a particular implementation of the
present beamforming post-processor technique, the technique is in no way limited
to implementations that just solve any or all of the noted disadvantages.
Rather, the present technique has a much wider application as will become
evident from the descriptions to follow.

[0011] In the following description of embodiments of the present
disclosure reference is made to the accompanying drawings which form a
part hereof, and in which are shown, by way of illustration, specific
embodiments in which the technique may be practiced. It is understood
that other embodiments may be utilized and structural changes may be made
without departing from the scope of the present disclosure.

DESCRIPTION OF THE DRAWINGS

[0012] The specific features, aspects, and advantages of the disclosure
will become better understood with regard to the following description,
appended claims, and accompanying drawings where:

[0013] FIG. 1 is a diagram depicting a general purpose computing device
constituting an exemplary system for implementing a component of the
present beamforming post-processor technique.

[0014] FIG. 2 is a diagram depicting one exemplary architecture of the
present beamforming post-processor technique.

[0015] FIG. 3 is a flow diagram depicting one generalized exemplary
embodiment of a process employing the present beamforming post-processor
technique.

[0016] FIG. 4 is a flow diagram depicting one more detailed exemplary
embodiment of a process employing the present beamforming post-processor
technique.

DETAILED DESCRIPTION

1.0 The Computing Environment

[0017] Before providing a description of embodiments of the present
beamforming post-processor technique, a brief, general description of a
suitable computing environment in which portions thereof may be
implemented will be described. The present technique is operational with
numerous general purpose or special purpose computing system environments
or configurations. Examples of well known computing systems,
environments, and/or configurations that may be suitable include, but are
not limited to, personal computers, server computers, hand-held or laptop
devices (for example, media players, notebook computers, cellular phones,
personal data assistants, voice recorders), multiprocessor systems,
microprocessor-based systems, set top boxes, programmable consumer
electronics, network PCs, minicomputers, mainframe computers, distributed
computing environments that include any of the above systems or devices,
and the like.

[0018] FIG. 1 illustrates an example of a suitable computing system
environment. The computing system environment is only one example of a
suitable computing environment and is not intended to suggest any
limitation as to the scope of use or functionality of the present
beamforming post-processor technique. Neither should the computing
environment be interpreted as having any dependency or requirement
relating to any one or combination of components illustrated in the
exemplary operating environment. With reference to FIG. 1, an exemplary
system for implementing the present beamforming post-processor technique
includes a computing device, such as computing device 100. In its most
basic configuration, computing device 100 typically includes at least one
processing unit 102 and memory 104. Depending on the exact configuration
and type of computing device, memory 104 may be volatile (such as RAM),
non-volatile (such as ROM, flash memory, etc.) or some combination of the
two. This most basic configuration is illustrated in FIG. 1 by dashed
line 106. Additionally, device 100 may also have additional
features/functionality. For example, device 100 may also include
additional storage (removable and/or non-removable) including, but not
limited to, magnetic or optical disks or tape. Such additional storage is
illustrated in FIG. 1 by removable storage 108 and non-removable storage
110. Computer storage media includes volatile and nonvolatile, removable
and non-removable media implemented in any method or technology for
storage of information such as computer readable instructions, data
structures, program modules or other data. Memory 104, removable storage
108 and non-removable storage 110 are all examples of computer storage
media. Computer storage media includes, but is not limited to, RAM, ROM,
EEPROM, flash memory or other memory technology, CD-ROM, digital
versatile disks (DVD) or other optical storage, magnetic cassettes,
magnetic tape, magnetic disk storage or other magnetic storage devices,
or any other medium which can be used to store the desired information
and which can be accessed by device 100. Any such computer storage media may
be part of device 100.

[0019] Device 100 has a sensor array 118, such as, for example, a
microphone array, and may also contain communications connection(s) 112
that allow the device to communicate with other devices. Communications
connection(s) 112 is an example of communication media. Communication
media typically embodies computer readable instructions, data structures,
program modules or other data in a modulated data signal such as a
carrier wave or other transport mechanism and includes any information
delivery media. The term "modulated data signal" means a signal that has
one or more of its characteristics set or changed in such a manner as to
encode information in the signal. By way of example, and not limitation,
communication media includes wired media such as a wired network or
direct-wired connection, and wireless media such as acoustic, RF,
infrared and other wireless media. The term computer readable media as
used herein includes both storage media and communication media.

[0020] Device 100 may have various input device(s) 114 such as a keyboard,
mouse, pen, camera, touch input device, and so on. Output device(s) 116
such as a display, speakers, a printer, and so on may also be included.
All of these devices are well known in the art and need not be discussed
at length here.

[0021] The present beamforming post-processor technique may be described
in the general context of computer-executable instructions, such as
program modules, being executed by a computing device. Generally, program
modules include routines, programs, objects, components, data structures,
and so on, that perform particular tasks or implement particular abstract
data types. The present beamforming post-processor technique may also be
practiced in distributed computing environments where tasks are performed
by remote processing devices that are linked through a communications
network. In a distributed computing environment, program modules may be
located in both local and remote computer storage media including memory
storage devices.

[0022] The exemplary operating environment having now been discussed, the
remaining parts of this description section will be devoted to a
description of the program modules embodying the present beamforming
post-processor technique.

2.0 Beamforming Post-Processor Technique

[0023] In one embodiment, the present beamforming post-processor technique
is a non-linear post-processing technique for sensor arrays, which
improves the directivity of the beamformer and separates the desired
signal from noise. The technique works in so-called instantaneous
direction of arrival space to estimate the probability of the signal
coming from a given location (e.g., look-up direction in angular space)
and uses this probability to apply a time-varying, gain-based,
spatio-temporal filter for suppressing sounds coming from
directions other than the estimated sound source direction,
resulting in minimal artifacts and musical noise.

[0024] One exemplary architecture of the present beamforming
post-processor technique 200 is shown in FIG. 2. This architecture 200
consists of a conventional beamformer 202 which receives inputs from an
array of sensors, such as, for example, an array of microphones 204. The
output of the beamformer 202 is input into a post-processor 206, which
consists of a spatial filtering module 210 and a spatial probability
estimation module 208 which employs an instantaneous direction of arrival
computation. The spatial probability estimation module 208 estimates the
probability that the desired signal originates from a given direction,
θ_S, using the inputs from the array of sensors. This
probability is then multiplied by the beamformer output in the spatial
filtering module 210, to provide the desired sound source signal with an
improved signal to noise ratio 212.

[0025] One very general exemplary process employing the present
beamforming post-processor technique is shown in FIG. 3. As shown in FIG.
3, box 302, signals of a sensor array in the frequency domain are input
into a standard beamformer. A beamformer output is computed as a function
of the input signals divided into frequency bins and an index of time
frames (box 304). The probability that the desired signal originates from a
given direction θ_S is computed using an instantaneous
direction of arrival computation (box 306). This probability is
multiplied by the beamformer output (box 308) to produce the desired
signal with an enhanced signal to noise ratio (box 310).
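
The final multiplication step (box 308) can be illustrated with a minimal Python sketch. It assumes one complex beamformer output value and one probability value per frequency bin of a single frame; the function name spatial_filter and the sample values are hypothetical and do not come from the disclosure.

```python
def spatial_filter(beamformer_output, probability):
    """Scale each frequency bin of one frame by its spatial probability."""
    if len(beamformer_output) != len(probability):
        raise ValueError("one probability per frequency bin is required")
    return [p * y for y, p in zip(beamformer_output, probability)]


# Hypothetical frame with three frequency bins.
frame = [1 + 1j, 2 - 1j, 0.5j]
# Hypothetical probabilities that each bin comes from the desired direction.
prob = [1.0, 0.5, 0.0]
enhanced = spatial_filter(frame, prob)
```

Bins judged likely to come from the desired direction pass almost unchanged, while bins judged unlikely are attenuated toward zero.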

[0026] A more detailed exemplary process employing the
present beamforming post-processor technique for a microphone array is shown in
FIG. 4. The audio signals captured by the microphone array, x_i(l), i=1
. . . M, where M is the number of microphones, are digitized using
conventional analog to digital (A/D) conversion techniques, breaking the
audio signals into frames (boxes 402, 404). The present beamforming
post-processor technique then converts the time-domain signal x_i(n)
to the frequency domain (box 406). In one embodiment a modulated complex
lapped transform (MCLT) is used for this purpose, although other
conventional transforms could equally well be used. One can denote the
frequency domain transform as x_i^(n)(k), where k is the
frequency bin, n is the index of the time frame, and i is
the microphone index (where i runs from 1 to M).

[0027] The signals in the frequency domain, x_i^(n)(k), are then
input into a beamformer, whose output represents the optimal solution for
capturing an audio signal at a target point using the total microphone
array input (box 408). Additionally, the signals in the frequency domain
are used to compute the instantaneous direction of arrival of the desired
signal for each angular space, defined by incident angle or look-up angle
(box 410). This information is used to compute the spatial variation of
the sound source position in the presence of noise,
N(0, λ_IDOA(k)), for each frequency bin. The IDOA information
and the spatial variation of the sound source in the presence of noise are
then used to compute the probability density that the desired sound
source signal comes from a given direction, θ, for each frequency
bin (box 412). This probability density is used to compute the likelihood
that, for a frequency bin k of a given frame, the desired signal originates
from a given direction θ_S (box 414). If desired, this likelihood can
also be temporally smoothed (box 416). The likelihood,
smoothed or not, is then used to find the estimated probability that the
desired signal originates from direction θ_S. Spatial filtering
is then performed by multiplying the estimated probability that the desired
signal comes from a given direction by the beamformer output (box 418),
outputting a signal with an enhanced signal to noise ratio (box 420). The
final output in the time domain can be obtained by taking the
inverse-MCLT (IMCLT) or corresponding inverse transformation of the
transformation used to convert to frequency domain (inverse Fourier
transformation, for example), of the enhanced signal in the frequency
domain (box 422). Other processing such as encoding and transmitting the
enhanced signal can also be performed (box 424).
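
The optional temporal smoothing (box 416) and the conversion from likelihood to estimated probability are not given as formulas in this description. The following Python sketch is therefore only one plausible reading, under two assumptions that are not taken from the disclosure: smoothing is a first-order recursive average with a hypothetical constant alpha, and a non-negative likelihood ratio L is mapped to a probability as L / (1 + L).

```python
def smooth_likelihood(previous, current, alpha=0.3):
    """Assumed first-order recursive smoothing, applied per frequency bin."""
    return [(1.0 - alpha) * p + alpha * c for p, c in zip(previous, current)]


def likelihood_to_probability(likelihood):
    """Assumed mapping of a non-negative likelihood ratio L to L / (1 + L)."""
    return [value / (1.0 + value) for value in likelihood]


# Hypothetical likelihoods for two frequency bins in consecutive frames.
smoothed = smooth_likelihood([0.0, 4.0], [1.0, 4.0], alpha=0.5)
probability = likelihood_to_probability([0.0, 1.0, 3.0])
```

A smaller alpha weights the previous frames more heavily, trading responsiveness to a changing listening direction for a steadier gain.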

2.4 Exemplary Computations

[0028] The following paragraphs provide exemplary models and exemplary
computations that can be employed with the present beamforming
post-processor technique.

[0029] 2.4.1 Modeling

[0030] A typical beamformer is capable of providing optimized beam design
for sensor arrays of any known geometry and operational characteristics.
In particular, consider an array of M microphones with a known position
vector p. The microphones in the array sample the signal field in the
workspace around the array at locations
p_m = (x_m, y_m, z_m), m = 0, 1, . . . , M-1. This sampling
yields a set of signals that are denoted by the signal vector x(t, p).

[0031] Further, each microphone m has a known directivity pattern,
U_m(f, c), where f is the frequency and c = {φ, θ, ρ}
represents the coordinates of a sound source in a radial coordinate
system. A similar notation will be used to represent those same
coordinates in a rectangular coordinate system, in this case, c={x,y,z}.
As is known to those skilled in the art, the directivity pattern of a
microphone is a complex function which provides the sensitivity and the
phase shift introduced by the microphone for sounds coming from certain
locations or directions. For an ideal omni-directional microphone,
U_m(f, c) = constant. However, the microphone array can use microphones
of different types and directivity patterns without loss of generality of
the typical beamformer.

[0032] 2.4.1.1 Sound Capture Model

[0033] Let vector p = {p_m, m = 0, 1, . . . , M-1} denote the positions of
the M microphones in the array, where p_m = (x_m, y_m, z_m).
This yields a set of signals that one can denote by vector x(t, p). Each
sensor m has a known directivity pattern U_m(f, c), where
c = {φ, θ, ρ} represents the coordinates of the sound source in
a radial coordinate system and f denotes the signal frequency. It is
often preferable to perform signal processing algorithms in the frequency
domain because efficient implementations can be employed.

[0034] As is known to those skilled in the art, a sound signal originating
at a particular location, c, relative to a microphone array is affected
by a number of factors. For example, given a sound signal, S(f),
originating at point c, the signal actually captured by each microphone
can be defined by Equation (1), as illustrated below:

X_m(f, p_m) = D_m(f, c) A_m(f) U_m(f, c) S(f) + N_m(f)   (1)

where the first term,

D_m(f, c) = e^(-j2πf ∥c - p_m∥ / ν) / ∥c - p_m∥   (2)

represents the delay and decay due to the distance from the sound source
to the microphone, ∥c - p_m∥, and ν is the speed of
sound. The term A_m(f) is the frequency response of the system
preamplifier/ADC circuitry for each microphone m, S(f) is the source
signal, and N_m(f) is the captured noise. The variable U_m(f, c)
accounts for microphone directivity relative to point c.
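
The capture model can be sketched numerically in Python. The sketch assumes the model has the form X_m = D_m A_m U_m S + N_m, with the delay and decay term taken as D_m(f, c) = e^(-j2πf d/ν) / d for d = ∥c - p_m∥, which matches the terms described above but is an assumed form. The speed of sound value, the microphone positions and the source position are likewise hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def delay_decay(freq_hz, source, mic, speed=SPEED_OF_SOUND):
    """Assumed D_m(f, c): phase delay and 1/distance decay over ||c - p_m||."""
    distance = math.dist(source, mic)
    return cmath.exp(-2j * math.pi * freq_hz * distance / speed) / distance


def captured_signal(freq_hz, s, source, mic, a=1.0, u=1.0, noise=0.0):
    """X_m = D_m * A_m * U_m * S + N_m for one microphone and one frequency."""
    return delay_decay(freq_hz, source, mic) * a * u * s + noise


# Hypothetical setup: two ideal omnidirectional microphones (U_m = 1),
# flat preamplifier response (A_m = 1), noise-free capture at 1 kHz.
mics = [(-0.05, 0.0, 0.0), (0.05, 0.0, 0.0)]
source = (1.0, 1.0, 0.0)
spectra = [captured_signal(1000.0, 1.0 + 0j, source, p) for p in mics]
```

The microphone closer to the source receives the larger magnitude, and the phase difference between the two captured values carries the direction information exploited later in IDOA space.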

[0035] 2.4.1.2 Ambient Noise Model

[0036] Given the captured signal, X_m(f, p_m), the first task is to
compute noise models for modeling various types of noise within the local
environment of the microphone array. The noise models described herein
distinguish two types of noise: isotropic ambient noise and instrumental
noise. Both time and frequency-domain modeling of these noise sources are
well known to those skilled in the art. Consequently, the types of noise
models considered will only be generally described below.

[0037] The captured noise N_m(f, p_m) is considered to contain two
noise components: acoustic noise and instrumental noise. The acoustic
noise, with spectrum denoted by N_A(f), is correlated across all
microphone signals. The instrumental noise, having a spectrum denoted by
the term N_I(f), represents electrical circuit noise from the
microphone, preamplifier, and ADC (analog/digital conversion) circuitry.
The instrumental noise in each channel is incoherent across the channels,
and usually has a nearly white noise spectrum N_I(f). Assuming
isotropic ambient noise, one can represent the signal captured by any of
the microphones as a sum of an infinite number of uncorrelated noise
sources randomly spread in space:

N_m = Σ_{l=1}^{∞} D_m(c_l) N(0, λ_N(c_l)) + N(0, λ_NC)   (3)

Indices for frame and frequency are omitted for simplicity. Estimation of
all of these noise sources is impossible because one has a finite number
of microphones. Therefore, the isotropic ambient noise is modeled as one
noise source in different positions in the work volume for each frame,
plus a residual incoherent random component, which incorporates the
instrumental noise. The noise capture equation changes to:

N_m^(n) = D_m(c_n) N(0, λ_N(c_n)) + N(0, λ_NC)   (4)

where c_n is the random position of the noise source for the nth audio
frame, λ_N(c_n) is the spatially dependent correlated noise
variation (λ_N(c_n) = const ∀ c_n for isotropic
noise) and λ_NC is the variation of the incoherent component.
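
As an illustration of Equation (4), the Python sketch below draws one noise frame for every microphone: a single correlated noise sample from a random position, propagated to each microphone, plus an independent incoherent sample per channel. Several choices are assumptions made only for this example: N(0, λ) is drawn as a circular complex Gaussian, the random source position lies on a circle of hypothetical radius around the array, and the propagation term is taken as e^(-j2πf d/ν) / d.

```python
import cmath
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def complex_gaussian(variance, rng):
    """Zero-mean circular complex Gaussian sample with the given total variance."""
    sigma = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def noise_frame(freq_hz, mics, lam_n, lam_nc, rng, radius=2.0):
    """One frame of N_m^(n) for every microphone m, following Equation (4)."""
    # Random noise-source position c_n for this frame (assumed: on a circle).
    angle = rng.uniform(0.0, 2.0 * math.pi)
    c_n = (radius * math.cos(angle), radius * math.sin(angle), 0.0)
    correlated = complex_gaussian(lam_n, rng)  # shared by all microphones
    frame = []
    for p in mics:
        distance = math.dist(c_n, p)
        d_m = cmath.exp(-2j * math.pi * freq_hz * distance / SPEED_OF_SOUND) / distance
        # Correlated part seen through D_m, plus an independent incoherent part.
        frame.append(d_m * correlated + complex_gaussian(lam_nc, rng))
    return frame


mics = [(-0.05, 0.0, 0.0), (0.05, 0.0, 0.0)]
frame = noise_frame(1000.0, mics, lam_n=1.0, lam_nc=0.01, rng=random.Random(0))
```

Because the correlated sample is shared across channels, it produces consistent inter-microphone phase differences, whereas the incoherent term does not.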

[0038] 2.4.2 Spatio-Temporal Filter

[0039] The sound capture model and noise models having been described, the
following paragraphs describe the computations performed in one
embodiment of the present beamforming post-processor technique to obtain
a spatial and temporal post-processor that improves the quality of the
beamformer output of the desired signal. The following paragraphs are
also referenced with respect to the flow diagram shown in FIG. 4.

[0040] 2.4.2.1 Instantaneous Direction of Arrival Space

[0041] In general, for a microphone array, the phase differences at a
particular frequency bin between the signals received at a pair of
microphones give an indication of the instantaneous direction of arrival
(IDOA) of a given sound source. IDOA vectors provide an indication of the
direction from which a signal and/or point noise source originates.
Non-correlated noise will be evenly spread in this space, while the
signal and ambient noise (correlated components) will lie inside a
hyper-volume that represents all potential positions of a sound source
within the signal field.

[0042] To provide an indication of the direction a signal or noise source
originates from (as indicated in FIG. 4, box 410), one can find the
Instantaneous Direction of Arrival (IDOA) for each frequency bin based on
the phase differences of non-repetitive pairs of input signals. For M
microphones these phase differences form a M-1 dimensional space,
spanning all potential IDOA. If one defines an IDOA vector in this space
as

Δ(f)[δ1(f),δ2(f), . . . ,
δM-1(f)] (5)

where δi(f) is the phase difference between channels 1 and
i+1:

δl(f)=arg(X1(f))-arg(Xl+l(f))l={1, . . . , M-1}
(6)

then the non-correlated noise will be evenly spread in this space, while
the signal and ambient noise (correlated components) will lie inside a
hypervolume that represents all potential positions
$c=\{\varphi,\theta,\rho\}$ of a sound source in real three-dimensional
space. For far-field sound capture, this is an M-1 dimensional
hypersurface, as the distance is presumed to approach infinity. Linear
microphone arrays can distinguish only one dimension--the incident angle,
and the real space is represented by an M-1 dimensional hyperline. For
each frequency, a theoretical line that represents the positions of sound
sources in the angular range of -90 degrees to +90 degrees can be
computed using Equation (5). The actual distribution of the sound sources
is a cloud around the theoretical line due to the presence of an additive
non-correlated component. For each point in the real space there is a
corresponding point in the IDOA space (which may not be unique). The
opposite is not true: there are points in the IDOA space without a
corresponding point in the real space.
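As an illustration of Equations (5) and (6), the following minimal Python sketch computes the IDOA vector for one frequency bin from the complex spectrum values of M channels. The function names `wrap_phase` and `idoa_vector` are hypothetical, and wrapping the phase differences to [-pi, pi] is an assumption of this sketch; Equation (6) itself is written as a plain difference.

```python
import cmath
import math

def wrap_phase(x):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(x), math.cos(x))

def idoa_vector(bin_values):
    """Phase differences between channel 1 and channels 2..M (Equations (5)-(6)).

    bin_values: M complex spectrum values X_1(f)..X_M(f) for one frequency bin.
    Returns a list of M-1 phase differences in radians.
    """
    ref = cmath.phase(bin_values[0])
    # Wrapping is an assumption of this sketch, not part of Equation (6).
    return [wrap_phase(ref - cmath.phase(x)) for x in bin_values[1:]]

# Three channels; channels 2 and 3 lag channel 1 by 0.3 and 0.6 rad.
X = [cmath.rect(1.0, 0.5), cmath.rect(0.8, 0.2), cmath.rect(1.2, -0.1)]
delta = idoa_vector(X)
```

Note that only the phases enter the result; the channel magnitudes (1.0, 0.8 and 1.2 above) do not affect the IDOA vector.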

[0043] 2.4.2.2 Presence of a Sound Source.

[0044] For simplicity and without any loss of generality, a linear
microphone array is considered, sensitive only to the incident angle
$\theta$, the direction of arrival in one dimension. The incident angle is
defined by a discretization of space. For example, in one embodiment a
set of angles is defined that is used to compute various
parameters--probability, likelihood, etc. Such a set can, for example,
span -90 to +90 degrees in steps of 5 degrees. Let $\Psi_k(\theta)$
denote the function that generates the vector $\Delta$ for a given incident
angle $\theta$ and frequency bin $k$ according to equations (1), (5) and
(6). In each frame, the kth bin is represented by one point
$\Delta_k$ in the IDOA space. Consider a sound source at $\theta_S$
with its correspondence in IDOA space at
$\Delta_S(k)=\Psi_k(\theta_S)$. With additive noise, the
resultant point in IDOA space will be spread around $\Delta_S(k)$:

$$\Delta_{S+N}(k) = \Delta_S(k) + N(0, \lambda_{IDOA}(k)) \qquad (7)$$

where $N(0,\lambda_{IDOA}(k))$ is the spatial movement of $\Delta_k$
in the IDOA space, caused by the correlated and non-correlated noises.
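The mapping from incident angle to IDOA space and the noisy spread of Equation (7) can be sketched as follows. This is only an illustration under stated assumptions: a far-field plane wave, identical omnidirectional microphones on a line, a speed of sound of 343 m/s, and one particular sign convention for the phase differences. The names `psi` and `noisy_idoa` and the example geometry are hypothetical; the capture model of Equation (1) may differ.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def psi(theta, freq_hz, mic_x):
    """Theoretical IDOA point for a far-field source at incident angle theta.

    theta: incident angle in radians (0 = broadside).
    mic_x: microphone positions along the array axis, in metres.
    Assumes plane-wave propagation; the sign convention is an assumption.
    """
    w = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return [w * (x - mic_x[0]) * math.sin(theta) for x in mic_x[1:]]

def noisy_idoa(theta, freq_hz, mic_x, var_idoa, rng):
    """Equation (7): theoretical point plus zero-mean Gaussian spread."""
    sigma = math.sqrt(var_idoa)
    return [d + rng.gauss(0.0, sigma) for d in psi(theta, freq_hz, mic_x)]

mics = [0.0, 0.05, 0.10, 0.15]                         # hypothetical 4-element array
grid = [math.radians(a) for a in range(-90, 91, 5)]    # -90..+90 in 5-degree steps
line = [psi(t, 1000.0, mics) for t in grid]            # theoretical line at 1 kHz
sample = noisy_idoa(math.radians(20), 1000.0, mics, 0.01, random.Random(0))
```

Here `line` holds one (M-1)-dimensional point per grid angle, and `sample` is one observation scattered around the point for a source at 20 degrees.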

[0045] 2.4.2.3 Space Conversion

[0046] The distance from each IDOA point to the theoretical hyperline in
IDOA space is computed as a function of the incident angle, as shown in
FIG. 4, box 412. The conversion of the distance from an IDOA point to the
theoretical hyperline in IDOA space into the incident angle space (real
world, one dimensional in this case) is given by:

$$\gamma_k(\theta) = \frac{\left\| \Delta_k - \Psi_k(\theta) \right\|}{\left\| \dfrac{\partial \Psi_k(\theta)}{\partial \theta} \right\|} \qquad (8)$$

where $\|\Delta_k-\Psi_k(\theta)\|$ is the
Euclidean distance between $\Delta_k$ and $\Psi_k(\theta)$ in IDOA
space, $\partial \Psi_k(\theta) / \partial \theta$
are the partial derivatives, and $\gamma_k(\theta)$ is the distance
of the observed IDOA point to the points in the real world. Note that the
dimensions in IDOA space are measured in radians as phase difference,
while $\gamma_k(\theta)$ is measured in radians as units of incident
angle. This computation provides the distance between each IDOA point and
the theoretical line as a function of the incident angle for each
frequency bin and each frame.
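A minimal sketch of the distance conversion of Equation (8) follows. It assumes the same hypothetical far-field linear-array model for the theoretical IDOA point (plane wave, 343 m/s, names `psi`, `dpsi_dtheta` and `gamma` invented for illustration), and it floors the denominator with a small `eps` because the derivative vanishes near ±90 degrees; that guard is an addition of this sketch.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def psi(theta, freq_hz, mic_x):
    """Theoretical IDOA point for a far-field source (assumed plane-wave model)."""
    w = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return [w * (x - mic_x[0]) * math.sin(theta) for x in mic_x[1:]]

def dpsi_dtheta(theta, freq_hz, mic_x):
    """Derivative of psi with respect to the incident angle."""
    w = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return [w * (x - mic_x[0]) * math.cos(theta) for x in mic_x[1:]]

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))

def gamma(delta_k, theta, freq_hz, mic_x, eps=1e-9):
    """Equation (8): IDOA-space distance converted to incident-angle units."""
    diff = [d - p for d, p in zip(delta_k, psi(theta, freq_hz, mic_x))]
    # eps guards against division by zero near +/-90 degrees (sketch assumption).
    return norm(diff) / max(norm(dpsi_dtheta(theta, freq_hz, mic_x)), eps)

mics = [0.0, 0.05, 0.10, 0.15]
observed = psi(math.radians(25), 1000.0, mics)      # noise-free point at 25 degrees
g_on_line = gamma(observed, math.radians(25), 1000.0, mics)
g_offset = gamma(observed, math.radians(20), 1000.0, mics)
```

For the noise-free point, `g_on_line` is zero, and `g_offset` is close to the 5-degree angular offset expressed in radians, which is the first-order behaviour the conversion is meant to provide.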

[0047] 2.4.2.4 Estimation of the Variance in Real Space

[0048] As shown in FIG. 4, box 414, in order to compute the probability
that the sound source originates from a given incident angle, one must
have the conversion from distance to the theoretical hyperline in IDOA
space to distance in the incident angle space, given by Equation (8), and
the noise properties.

[0049] Analytic estimation in real time of the probability density
function for a sound source in every frequency bin is computationally
expensive. Therefore the beamforming post-processor technique indirectly
estimates the variation $\lambda_k(\theta)$ of the sound source
position in the presence of noise $N(0,\lambda_{IDOA}(k))$ from Equation (7).
Let $\lambda_k(\theta)$ and $\gamma_k(\theta)$ be K×N
matrices, where K is the number of frequency bins and N is the number of
discrete values of the incident or direction angle of the microphone.
Variation estimation goes through two stages. During the first stage a
rough variation estimation matrix $\tilde{\lambda}(\theta,k)$ is built. If
$\theta_{min}$ is the angle that minimizes $\gamma_k(\theta)$, only
the minimum values in the rough model are updated:

$$\tilde{\lambda}_k^{(n)}(\theta_{min}) = (1-\alpha)\,\tilde{\lambda}_k^{(n-1)}(\theta_{min}) + \alpha\,\gamma_k(\theta_{min})^2 \qquad (9)$$

where $\gamma$ is estimated according to Eq. (8) and

$$\alpha = \frac{T}{\tau_A}$$

($\tau_A$ is the adaptation time constant, T is the frame duration).
During the second stage a direction-frequency smoothing filter
$H(\theta,k)$ is applied after each update to estimate the spatial
variation matrix $\lambda(\theta,k)=H(\theta,k)*\tilde{\lambda}(\theta,k)$. Here
a Gaussian distribution of the non-correlated component is assumed,
which allows one to assume the same deviation in the real space towards
the incident angle, $\theta$.
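The two-stage variation estimate can be sketched as below. The first function applies the recursive update of Equation (9) only at the angle that minimizes the distance in each frequency bin. The smoothing filter H is not specified in the text, so the separable 3-tap kernel with clamped edges used here is purely an assumption, as are the function names and the example values.

```python
def update_rough(rough, gamma, alpha):
    """Stage 1 (Equation (9)): update only the minimum-distance angle per bin.

    rough: K x N list of lists with the rough variation estimate (modified in place).
    gamma: K x N list of lists with distances from Equation (8) for this frame.
    alpha: T / tau_A, frame duration over adaptation time constant.
    """
    for k, row in enumerate(gamma):
        i_min = min(range(len(row)), key=row.__getitem__)
        rough[k][i_min] = (1.0 - alpha) * rough[k][i_min] + alpha * row[i_min] ** 2
    return rough

def smooth(rough, taps=(0.25, 0.5, 0.25)):
    """Stage 2: direction-frequency smoothing (kernel is an assumed placeholder)."""
    K, N = len(rough), len(rough[0])
    offsets = (-1, 0, 1)

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # Smooth along the angle axis, then along the frequency axis.
    tmp = [[sum(t * row[clamp(i + o, N)] for o, t in zip(offsets, taps))
            for i in range(N)] for row in rough]
    return [[sum(t * tmp[clamp(k + o, K)][i] for o, t in zip(offsets, taps))
             for i in range(N)] for k in range(K)]

alpha = 0.02 / 1.0                        # T = 20 ms, tau_A = 1 s (example values)
rough = [[1.0] * 3 for _ in range(2)]     # K = 2 bins, N = 3 angles
gamma = [[0.5, 0.1, 0.9], [0.3, 0.4, 0.2]]
update_rough(rough, gamma, alpha)
variation = smooth(rough)
```

Because the taps sum to one, the smoothing leaves a constant matrix unchanged and only spreads the per-bin updates to neighbouring angles and frequencies.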

[0050] 2.4.2.5 Likelihood Estimation

[0051] As shown in FIG. 4, box 416, a likelihood estimation that the
desired signal comes from a given incident angle is computed using the
IDOA information and the variation due to noise. With known spatial
variation $\lambda_k(\theta)$ and the distance of the observed IDOA
points to the points in the real world, $\gamma_k(\theta)$, the
probability density for frequency bin $k$ to originate from direction
$\theta$ is given by:

and for a given direction, $\theta_S$, the likelihood that the sound
source originates from this direction for a given frequency bin is:

$$\Lambda_k(\theta_S) = \frac{p_k(\theta_S)}{p_k(\theta_{min})} \qquad (11)$$

where $\theta_{min}$ is the value which minimizes $p_k(\theta)$.
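The likelihood ratio of Equation (11) can be illustrated with the short sketch below. The density expression is not reproduced in this text, so the zero-mean Gaussian in the distance $\gamma_k(\theta)$ with variance $\lambda_k(\theta)$ used here is an assumption, chosen to match the Gaussian assumption stated in Section 2.4.2.4. The function names and the small floor on the denominator are likewise hypothetical.

```python
import math

def density(gamma, variation):
    """Assumed Gaussian density of the distance gamma with variance `variation`."""
    return math.exp(-gamma * gamma / (2.0 * variation)) / math.sqrt(2.0 * math.pi * variation)

def likelihood(gammas, variations, i_src, floor=1e-300):
    """Equation (11): density at the source angle over the minimum density.

    gammas, variations: per-angle values for one frequency bin.
    i_src: index of the look-up direction theta_S in the angle grid.
    floor: guard against a zero denominator after underflow (sketch assumption).
    """
    p = [density(g, v) for g, v in zip(gammas, variations)]
    return p[i_src] / max(min(p), floor)

gammas = [0.0, 0.5, 1.0]          # distances for three candidate angles
variations = [0.25, 0.25, 0.25]   # equal variation at every angle
lik_best = likelihood(gammas, variations, 0)
lik_worst = likelihood(gammas, variations, 2)
```

With equal variations the ratio reduces to exp(γ_max² / (2λ)) for the closest angle, and it equals 1 for the angle with the lowest density, so the likelihood is never below 1 under this definition.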

[0052] 2.4.2.6 Spatio-Temporal Filtering

[0053] Besides spatial position, the desired (e.g., speech) signal has
temporal characteristics, and consecutive frames are highly correlated
because this signal changes slowly relative to the frame
duration. Rapid change of the estimated spatial filter can cause musical
noise and distortions in the same way as in gain-based noise suppressors.
As shown in FIG. 4, box 418, to reflect the temporal characteristics of
the speech signal, temporal smoothing can optionally be applied. For a
given direction, the absence/presence of speech can be modeled with two
states: $S_0$ and $S_1$. The sequence of frequency bin states is
modeled as a first-order Markov process. Then the pseudo-stationarity
property of the desired (e.g., speech) signal can be represented by
$P(q_n=S_1|q_{n-1}=S_1)$ with the following constraint:
$P(q_n=S_1|q_{n-1}=S_1)>P(q_n=S_1)$, where $q_n$
denotes the state of the nth frame as either $S_0$ or $S_1$. By assuming
that the Markov process is time invariant, one can use the notation
$a_{ij} \triangleq P(q_n=S_j|q_{n-1}=S_i)$. Based on the
formulations above, a recursive formula for the signal presence likelihood
for a given look-up direction in the nth frame, $\Lambda_k^{(n)}$, is
obtained as:

where $a_{ij}$ are the transition probabilities,
$\Lambda_k(\theta_S)$ is estimated by Equation (11), and
$\Lambda_k^{(n)}(\theta_S)$ is the likelihood of having a
signal at direction $\theta_S$ for the nth frame. As shown in FIG. 4,
box 420, this likelihood can be converted to a probability and spatial
filtering can be performed by multiplying the probability that the
desired signal comes from a given direction by the beamformer output.
More specifically, conversion to probability gives the estimated
probability for the speech signal to originate from this direction:

The spatio-temporal filter to compute the post-processor output
$Z_k^{(n)}$ (for all frequency bins in the current frame) from the
beamformer output $Y_k^{(n)}$ is:

$$Z_k^{(n)} = P_k^{(n)}(\theta_S)\,Y_k^{(n)} \qquad (14)$$

i.e., the signal presence probability is used as a suppression gain.
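The per-frame filtering of Equation (14) can be sketched as follows. The recursive likelihood formula and the likelihood-to-probability conversion are not reproduced in this text, so both forms below are assumptions: the recursion is the form commonly used for two-state Markov smoothing of a likelihood ratio, and the conversion is P = Λ / (1 + Λ). The function names, the transition probabilities and the cap `max_lik` are hypothetical.

```python
def smooth_likelihood(prev, inst, a00, a01, a10, a11, max_lik=1e6):
    """Assumed recursion: previous likelihood weighted by transition probabilities.

    prev: smoothed likelihood from the previous frame.
    inst: instantaneous likelihood from Equation (11) for the current frame.
    max_lik: cap that keeps the recursion bounded (sketch assumption).
    """
    cur = (a01 + a11 * prev) / (a00 + a10 * prev) * inst
    return min(cur, max_lik)

def presence_probability(lik):
    """Assumed conversion from likelihood to probability."""
    return lik / (1.0 + lik)

def postprocess_frame(Y, inst_lik, prev_lik, trans):
    """Equation (14): scale each beamformer bin by its presence probability.

    Y: complex beamformer output per frequency bin.
    inst_lik, prev_lik: per-bin instantaneous and previous smoothed likelihoods.
    trans: tuple (a00, a01, a10, a11) of transition probabilities.
    Returns (Z, new_lik): filtered bins and the likelihoods to carry forward.
    """
    a00, a01, a10, a11 = trans
    Z, new_lik = [], []
    for y, lam, prev in zip(Y, inst_lik, prev_lik):
        cur = smooth_likelihood(prev, lam, a00, a01, a10, a11)
        new_lik.append(cur)
        Z.append(presence_probability(cur) * y)
    return Z, new_lik

trans = (0.8, 0.2, 0.1, 0.9)            # hypothetical a00, a01, a10, a11
Y = [2.0 + 0.0j, 0.5 - 0.5j]            # beamformer output for two bins
Z, state = postprocess_frame(Y, [4.0, 1.0], [1.0, 1.0], trans)
```

The returned `state` is fed back as `prev_lik` for the next frame. Since the gain lies between 0 and 1, the filter can only attenuate the beamformer output, never amplify it.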

[0054] It should also be noted that any or all of the aforementioned
alternate embodiments may be used in any combination desired to form
additional hybrid embodiments. For example, even though this disclosure
describes the present beamforming post-processor technique with respect
to a microphone array, the present technique is equally applicable to
sonar arrays, directional radio antenna arrays, radar arrays, and the
like. Although the subject matter has been described in language specific
to structural features and/or methodological acts, it is to be understood
that the subject matter defined in the appended claims is not necessarily
limited to the specific features or acts described above. The specific
features and acts described above are disclosed as example forms of
implementing the claims.

Patent applications by Alejandro Acero, Bellevue, WA US

Patent applications by Ivan Tashev, Kirkland, WA US

Patent applications by Microsoft Corporation

Patent applications in class DIRECTIVE CIRCUITS FOR MICROPHONES
