
SOURCE-SPECIFIC LEARNING AND BINAURAL CUES SELECTION
TECHNIQUES FOR AUDIO SOURCE SEPARATION
by
Namgook Cho
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
December 2009
Copyright 2009 Namgook Cho

Several audio source separation techniques, which aim to recover the original sources from their acoustic mixtures, are proposed in this research. Two mixing settings are considered, distinguished by the number of microphones: single-channel and multichannel. Since no spatial cue is available in a single-channel observation, we exploit other characteristics of audio sounds. In the case of multichannel mixtures, the spatial information of the sound field enables us to estimate the mixing system by locating the sound sources.

In Chapter 3, we propose a source-specific learning approach to efficient music representation, and use it to separate music signals that co-exist with background noise such as speech or environmental sounds. The basic idea is to determine a set of learned elementary functions, called atoms, that efficiently capture music signal characteristics. There are three steps in the construction of a learned dictionary. First, we decompose basic components of musical signals (e.g., musical notes) into a set of source-independent atoms (i.e., Gabor atoms). Second, we prioritize these Gabor atoms according to how well they approximate the music signals of interest. Third, we use the prioritized Gabor atoms to synthesize new atoms to build a compact learned dictionary. The number of atoms needed to represent music signals using the learned dictionary is much smaller than with the Gabor dictionary, resulting in a sparse music representation. Experimental results are given to demonstrate its efficiency and its application to separating music signals from a mixture of multiple sounds.

In Chapter 4, we investigate the effects of noise on multichannel audio source separation with an instantaneous mixing system, where sounds emanating from different sources arrive at the same time without any delay between them. Under noisy conditions, the source sparsity assumption, which is critical in Sparse Component Analysis (SCA), is easily violated, i.e., several sources may exist at a time-frequency point. This violation of the assumption yields errors in the estimation of the mixing parameters and renders the use of l1-norm minimization improper. We propose an enhanced technique to address the problem by employing weighted soft-assignment clustering and generalized lp-norm minimization with regularization. The technique yields more robust and sparser solutions in a noisy environment than SCA-based methods. In addition, we extend the single-channel audio source separation technique based on source-specific dictionaries to the multichannel case to extract music signals from stereo-channel mixtures.

In Chapter 5, we propose a robust technique to separate audio sources received by a microphone array in a room acoustic environment with an underdetermined mixing process (i.e., the number of sources is larger than the number of mixtures). Our scheme consists of two stages: 1) estimation of the mixing parameters and 2) recovery of the source signals. For the first stage, in contrast to traditional DUET (Degenerate Unmixing Estimation Technique)-type methods that exploit all binaural cues, we estimate the mixing parameters by selecting a reliable subset of binaural cues based on the phase determinacy condition and source sparsity. As a result, we can determine the mixing parameters successfully even in a reverberant environment with longer time delays. For the second stage, proper mathematical tools are applied to the underdetermined linear system to recover the original audio sources. Experimental results on simulated data in a room acoustic environment are given to show a significant gain over the DUET-type method in audio source separation.
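
As a rough illustration of the underdetermined instantaneous mixing model and the baseline l1-norm minimization step that the abstract says SCA relies on, the following minimal Python sketch recovers a sparse source vector at a single point. It is not the dissertation's method (which uses weighted soft-assignment clustering and regularized lp-norm minimization). The mixing matrix, the source values, and the function name `min_l1_solution` are hypothetical, and real-valued data is assumed for simplicity, whereas time-frequency coefficients are generally complex.

```python
import itertools
import numpy as np

def min_l1_solution(A, x):
    """Minimum-l1-norm s with A @ s == x, found by enumerating column subsets.

    For an m-by-n system with m < n, a minimum-l1 solution is attained at a
    basic solution with at most m nonzero entries, so it is enough to try
    every set of m columns. (Hypothetical helper, illustration only.)
    """
    m, n = A.shape
    best, best_norm = None, np.inf
    for cols in itertools.combinations(range(n), m):
        sub = A[:, cols]
        if abs(np.linalg.det(sub)) < 1e-12:
            continue  # skip (near-)singular column subsets
        coeffs = np.linalg.solve(sub, x)
        norm = np.abs(coeffs).sum()
        if norm < best_norm:
            best = np.zeros(n)
            best[list(cols)] = coeffs
            best_norm = norm
    return best

# Assumed example: 2 mixtures, 3 sources, unit-norm mixing directions.
angles = np.deg2rad([20.0, 60.0, 110.0])
A = np.vstack([np.cos(angles), np.sin(angles)])

# Sparsity assumption: only the second source is active at this point.
s_true = np.array([0.0, 1.5, 0.0])
x = A @ s_true                      # noiseless instantaneous mixture
s_hat = min_l1_solution(A, x)
```

When the sparsity assumption holds and there is no noise, `s_hat` matches `s_true` up to rounding. Adding noise to `x`, or activating several sources at once, makes the l1 solution spread energy across the wrong columns, which is the failure mode the Chapter 4 summary describes.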
