Sound Recognition in Mixtures

Introduction

We present a method for recognizing sound sources in a mixture. Using source separation ideas
based on probabilistic latent component analysis (PLCA), we learn a dictionary for each source and
estimate the relative proportions of the sources in a mixture by decomposing the mixture with these
dictionaries and summing the activations that belong to each source. In addition to this basic model,
we introduce a new method for learning temporal dependencies among dictionary elements using a
transition matrix. We show that this temporally constrained model outperforms the basic model.
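The basic model can be illustrated with a minimal sketch. It assumes the asymmetric PLCA form V(f,t) ≈ Σ_z P(f|z) P_t(z) on a nonnegative magnitude spectrogram, fitted with EM. The function names `plca` and `source_levels`, the energy weighting used for clip-level proportions, and the toy two-band data are illustrative assumptions, not the authors' implementation, and the temporal transition-matrix extension is not covered here.

```python
import numpy as np


def plca(V, n_components=None, W=None, n_iter=100, eps=1e-12, seed=0):
    """Asymmetric PLCA fitted with EM: V[f, t] ~ sum_z P(f|z) P_t(z).

    Columns of W hold dictionary elements P(f|z); columns of H hold the
    per-frame activations P_t(z). If W is given it stays fixed, so only the
    activations are estimated (the mixture decomposition step).
    """
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], n_components))
    W = W / (W.sum(axis=0, keepdims=True) + eps)
    H = rng.random((W.shape[1], V.shape[1]))
    H = H / H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # E-step folded into the ratio V / model; both M-steps use the old W, H
        Q = V / (W @ H + eps)
        H_new = H * (W.T @ Q)
        if learn_W:
            W = W * (Q @ H.T)
            W = W / (W.sum(axis=0, keepdims=True) + eps)
        H = H_new / (H_new.sum(axis=0, keepdims=True) + eps)
    return W, H


def source_levels(V_mix, dictionaries, n_iter=100):
    """Per-frame and overall proportions of each source in a mixture (hypothetical helper)."""
    W_all = np.concatenate(dictionaries, axis=1)  # fixed, pre-learned elements
    _, H = plca(V_mix, W=W_all, n_iter=n_iter)
    bounds = np.cumsum([0] + [D.shape[1] for D in dictionaries])
    # Sum the activations belonging to each source's dictionary
    per_frame = np.stack([H[b:e].sum(axis=0) for b, e in zip(bounds[:-1], bounds[1:])])
    # Assumption: weight frames by their energy to get clip-level proportions
    overall = per_frame @ V_mix.sum(axis=0)
    return per_frame, overall / overall.sum()


# Toy data (hypothetical): two "sources" occupying disjoint frequency bands
rng = np.random.default_rng(1)
F, K, T = 64, 4, 300


def toy_source(band):
    W_true = np.zeros((F, K))
    W_true[band] = rng.random((band.stop - band.start, K))
    return W_true


W_a, W_b = toy_source(slice(0, 32)), toy_source(slice(32, 64))
D_a, _ = plca(W_a @ rng.random((K, T)), n_components=K)  # learn dictionary A
D_b, _ = plca(W_b @ rng.random((K, T)), n_components=K)  # learn dictionary B

mix_a = 3.0 * (W_a @ rng.random((K, T)))  # source A is louder in the mixture
mix_b = W_b @ rng.random((K, T))
per_frame, overall = source_levels(mix_a + mix_b, [D_a, D_b])
expected = mix_a.sum() / (mix_a.sum() + mix_b.sum())
print(overall, expected)
```

Because the toy sources do not overlap in frequency, the summed activations for source A should closely match its true energy share; on real audio, where source spectra overlap, the estimate is only approximate, which is the kind of error that produces false alarms.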

Demo Example

This video demo shows the levels of three different sound sources (speech, gun and airplane) in the
audio track of a movie. The bars on the left are based on the basic model, and those on the right are
based on the improved model (temporally constrained with the transition matrix).
The video shows that the improved model produces relatively fewer false alarms, which are marked with red circles.