Double-Talk Detection (DTD)

Acoustic echo cancellers (AECs) are used in full-duplex hands-free communication systems, in which the far-end (loudspeaker) signal is coupled into the near-end (microphone) signal. An AEC removes the unwanted echo from the microphone signal. Cancelling this echo is a system identification problem in which the microphone signal is the desired signal and the loudspeaker signal is the excitation signal. An adaptive algorithm modifies the filter's coefficients so that, when the filter is convolved with the excitation signal, its output matches the microphone signal. The difference between the near-end signal and the filter output is used as the criterion for updating the filter coefficients.

In a full-duplex conversation, the presence of signals other than the far-end signal convolved with the echo path inhibits the ability of the adaptive algorithm to model the system. Ambient noise places a theoretical bound on the achievable cancellation. A near-end talker who is active during far-end speech disrupts the adaptation of the filter. Adaptation must therefore be halted during this double-talk (DT) scenario, which is the job of a double-talk detector.
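The adaptation loop described above, together with the freeze during double-talk, can be sketched as follows. The text does not name a specific adaptive algorithm, so this sketch assumes normalized LMS (NLMS) as one common choice. The function name `nlms_step`, the step size `mu`, and the regularizer `eps` are illustrative assumptions, not part of the original description.

```python
def nlms_step(w, x_buf, y_n, adapt=True, mu=0.5, eps=1e-8):
    """One sample of echo cancellation with an optional NLMS update.

    w     : current filter coefficients (list of floats)
    x_buf : most recent far-end samples, newest first, same length as w
    y_n   : current microphone (near-end) sample
    adapt : set to False while double-talk is declared to freeze the filter
    mu    : NLMS step size (assumed value)
    eps   : small constant that avoids division by zero (assumed value)
    """
    # Estimated echo: filter coefficients applied to the far-end history.
    d_hat = sum(wi * xi for wi, xi in zip(w, x_buf))
    # Error signal: microphone sample minus estimated echo.
    e = y_n - d_hat
    if not adapt:
        # Double-talk declared: keep the coefficients unchanged.
        return list(w), d_hat, e
    # Normalize the update by the far-end energy in the buffer.
    norm = sum(xi * xi for xi in x_buf) + eps
    w_new = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
    return w_new, d_hat, e
```

Passing `adapt=False` whenever the detector declares double-talk leaves the coefficients untouched while still producing the echo estimate and the error output.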

There are many approaches to determining when double-talk occurs, but all follow the same general procedure: a detection statistic, η, is formed from the excitation, desired, and/or error signals, and this statistic is then compared to a threshold to decide whether double-talk is declared. The notation in this paper is as follows: let x(n), y(n), and d̂(n) represent the far-end, near-end, and estimated echo signals, respectively.

One practical approach to double-talk detection (DTD) is the Geigel detector. The detection statistic is the ratio of the far-end to near-end signal levels.

$$\eta = \frac{\max\{|x(n)|, \ldots, |x(n-N)|\}}{|y(n)|} \tag{1}$$

If η falls below a threshold, that is, if the maximum far-end magnitude over an interval of length N (typically the length of the echo path) is not sufficiently larger than the near-end magnitude, then double-talk is declared. The threshold for this detector is usually set to a value close to the echo return loss (ERL) of the echo path. If the near-end talker is active, the near-end signal level increases enough to lower η below the threshold. This detector works well in line echo cancellation because the echo return loss of line echoes, which arise from impedance mismatches, remains consistently 6 dB or more. In AEC the ERL is much more variable, so it has to be estimated during the communication. The uncertainty in this estimate often leads to an increase in missed detections and/or false alarms.
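The Geigel statistic in (1) and its threshold test can be sketched as below. The function names are hypothetical, and the default threshold of 2.0 is an assumption: it is the amplitude ratio corresponding to roughly 6 dB of ERL, and in an acoustic setting it would have to be replaced by an estimate.

```python
def geigel_statistic(x, y, n, N):
    """Return eta = max(|x(n)|, ..., |x(n-N)|) / |y(n)| at sample index n."""
    start = max(0, n - N)  # clamp the window at the start of the signal
    far_peak = max(abs(v) for v in x[start:n + 1])
    near_level = abs(y[n])
    if near_level == 0.0:
        # Silent microphone sample: no evidence of near-end speech.
        return float("inf")
    return far_peak / near_level


def geigel_double_talk(x, y, n, N, threshold=2.0):
    """Declare double-talk when eta falls below the threshold."""
    return geigel_statistic(x, y, n, N) < threshold
```

For example, with a far-end peak of 0.8 in the window and a near-end sample of 0.9, η is about 0.89, which is below 2.0, so double-talk would be declared.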

Another approach to DTD uses cross-correlation. Closed-loop and open-loop analysis are the two main correlation-based methods. In closed-loop analysis, the cross-correlation is taken between the microphone signal and the estimated echo signal.

$$\eta = \frac{\left|\sum_{k} \hat{d}(n-k)\,y(n-k)\right|}{\sum_{k} \left|\hat{d}(n-k)\,y(n-k)\right|} \tag{2}$$
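A minimal sketch of the closed-loop statistic in (2) follows. The text does not state the limits of the sum, so the `window` parameter (the number of past samples summed) is an assumption, as is the choice to return 1.0 when the window contains no energy.

```python
def closed_loop_statistic(d_hat, y, n, window):
    """Return |sum d_hat(n-k) y(n-k)| / sum |d_hat(n-k) y(n-k)| over the window."""
    num = 0.0
    den = 0.0
    for k in range(window):
        if n - k < 0:
            break  # ran off the start of the signal
        p = d_hat[n - k] * y[n - k]
        num += p
        den += abs(p)
    if den == 0.0:
        # No energy in the window: treated here as no evidence of double-talk.
        return 1.0
    return abs(num) / den
```

The result lies between 0 and 1. It equals 1 when every product in the window has the same sign, as happens when the microphone signal is a scaled copy of the estimated echo, and it drops as uncorrelated near-end speech is mixed in.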

In open-loop analysis, the cross-correlation is taken between the microphone signal and the excitation signal at the delay N that maximizes the correlation.

$$\eta = \max_{N} \frac{\left|\sum_{k} x(n-k-N)\,y(n-k)\right|}{\sum_{k} \left|x(n-k-N)\,y(n-k)\right|} \tag{3}$$
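The open-loop statistic in (3) can be sketched in the same way, with an extra search over the delay. The names `window` and `max_lag` are assumptions, since the text gives neither the limits of the sum nor the range of delays searched. The code variable `lag` plays the role of N in (3).

```python
def open_loop_statistic(x, y, n, window, max_lag):
    """Return the largest normalized cross-correlation over lags 0..max_lag."""
    best = 0.0
    for lag in range(max_lag + 1):
        num = 0.0
        den = 0.0
        for k in range(window):
            i = n - k - lag
            if i < 0:
                break  # far-end index ran off the start of the signal
            p = x[i] * y[n - k]
            num += p
            den += abs(p)
        if den > 0.0:
            best = max(best, abs(num) / den)
    return best
```

The nested loops make the extra cost visible: each evaluation performs roughly `window` multiplications for every candidate lag, compared with a single pass for the closed-loop form.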

As one can observe, in a well-trained system the correlation is high when the near-end signal contains only echo. When the near-end talker is active, the detection statistic is lowered and double-talk can be declared. From (2), it is apparent that the accuracy of the closed-loop method relies on a good estimate of the echo signal, so the system must have reached a suitable level of convergence before it can be used. The open-loop method does not have this restriction, but (3) makes clear that its computational burden is higher. Some solutions use a hybrid approach: open-loop analysis is used until sufficient convergence has been obtained, after which closed-loop analysis takes over.

The double-talk detectors discussed so far operate in the time domain. They can be extended to subband and transform-domain implementations of echo cancellation. There are two methodologies for these systems. The first allows independent control within each frequency band, meaning that adaptation in each band is restricted by the double-talk detector used within that band. The alternative is to make a collective decision from the double-talk decisions made within each band: if the number of bands with a positive double-talk decision is greater than a predetermined threshold, a fullband double-talk decision is made and adaptation is stopped in all bands.
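The two subband control strategies can be contrasted with a short sketch. The function names and the `min_bands` parameter (the predetermined band-count threshold) are hypothetical. Each function takes one per-band double-talk flag and returns one per-band "adaptation allowed" flag.

```python
def adaptation_mask_independent(band_flags):
    """Each band adapts unless its own detector has declared double-talk."""
    return [not flag for flag in band_flags]


def adaptation_mask_collective(band_flags, min_bands):
    """Freeze all bands when more than min_bands bands declare double-talk."""
    positives = sum(1 for flag in band_flags if flag)
    fullband_double_talk = positives > min_bands
    return [not fullband_double_talk] * len(band_flags)
```

With flags `[True, False, True, False]`, the independent strategy freezes only the first and third bands, while the collective strategy with `min_bands=1` freezes all four.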

An alternative to double-talk detectors is the two-path method. In a two-path echo canceller, two sets of filters are used. The foreground filter produces the output of the AEC system, while the background filter continuously adapts its coefficients. When the background filter is judged to have achieved better cancellation than the foreground filter, its coefficients are copied to the foreground filter. This method is useful for handling the DT scenario: when DT occurs, the cancellation of the background filter is worse than that of the foreground filter, so the foreground filter is not updated and remains essentially unaffected by double-talk.
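The coefficient transfer step of the two-path method can be sketched as follows. The text does not say how "better cancellation" is judged, so this sketch assumes a comparison of error energies over a block of samples. The `margin` factor is likewise an illustrative assumption that requires the background filter to be clearly better before a copy is made.

```python
def two_path_transfer(fg_coeffs, bg_coeffs, fg_err, bg_err, margin=0.875):
    """Copy background coefficients to the foreground if they cancel better.

    fg_err, bg_err : error samples of each filter over the same block
    margin         : background energy must be below margin * foreground energy
    Returns (new foreground coefficients, whether a copy took place).
    """
    fg_energy = sum(e * e for e in fg_err)
    bg_energy = sum(e * e for e in bg_err)
    if bg_energy < margin * fg_energy:
        # Background filter achieved clearly lower residual echo: adopt it.
        return list(bg_coeffs), True
    # Otherwise keep the foreground filter, for example during double-talk.
    return list(fg_coeffs), False
```

During double-talk the background filter's error energy is inflated by near-end speech, so the comparison fails and the foreground coefficients are left alone.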