Automatic sound source identification is a fundamental task in machine listening, with a wide range of applications in environmental sound analysis, including the monitoring of urban noise and bird migrations. In this talk I will discuss our efforts at addressing this problem, including data collection, annotation, and the systematic exploration of a variety of methods for robust classification. I will discuss how simple feature learning approaches such as spherical k-means significantly outperform off-the-shelf methods based on MFCCs, given large codebooks trained on every possible shift of the input representation. I will show how the size of the codebooks, and the need for shifting the data, can be reduced by using convolutional filters, first by means of the deep scattering spectrum, and then as part of deep convolutional neural networks. As model complexity increases, however, performance is impeded by the scarcity of labeled data, a limitation that we partially overcome with a new framework for audio data augmentation. While promising, these solutions only address simplified versions of the real-world problems we wish to tackle. At the end of the talk, I’ll discuss various steps we’re currently taking to close that gap.
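To make the spherical k-means feature learning mentioned above concrete, here is a minimal NumPy sketch (the function names and implementation details are my own illustration, not the speaker's code): centroids are kept on the unit sphere, frames are assigned to the centroid with the highest cosine similarity, and each frame is then encoded by its similarities to the learned codebook.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=20, seed=0):
    """Spherical k-means: unit-norm centroids, cosine-similarity assignment.
    X is an (n_samples, n_features) matrix of, e.g., spectrogram frames."""
    rng = np.random.default_rng(seed)
    # L2-normalize samples so a dot product equals cosine similarity
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    # Initialize centroids from randomly chosen samples
    C = Xn[rng.choice(len(Xn), size=k, replace=False)].copy()
    labels = np.zeros(len(Xn), dtype=int)
    for _ in range(n_iter):
        # Assign each sample to its most similar centroid
        labels = (Xn @ C.T).argmax(axis=1)
        for j in range(k):
            members = Xn[labels == j]
            if len(members):
                # Update: mean of assigned samples, re-projected to the sphere
                c = members.sum(axis=0)
                C[j] = c / (np.linalg.norm(c) + 1e-12)
    return C, labels

def encode(X, C):
    """Codebook features: cosine similarity of each sample to each centroid."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    return Xn @ C.T
```

In a classification pipeline, `encode` would replace raw MFCC frames with codebook similarities, which are then pooled over time and fed to a standard classifier.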

This presentation will discuss recent research exploring different approaches to sound synthesis using analysis/re-synthesis methods for singing voices and sound textures. We will describe Ircam's singing synthesis system ISiS, which integrates two synthesis approaches: a classical phase-vocoder-based approach and a more innovative deterministic-plus-stochastic decomposition (PaN) based on a pulse and noise model. We will notably discuss the underlying analysis of the glottal pulse parameters, as well as some recent approaches to establishing high-level control of singing voice quality (intensity changes, mouth opening, vocal roughness). Concerning sound texture synthesis, we will describe a recent signal representation using perceptually motivated parameters, namely envelope statistics in the perceptual bands (McDermott, 2009, 2011, 2013); discuss synthesis methods that produce sound signals from these statistical descriptors; and demonstrate synthesis results, not only for the analysis/synthesis of textures but also for their use as an effect that transforms arbitrary sounds by manipulating these descriptors.
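The texture representation above can be sketched in a few lines of NumPy. This is a deliberate simplification of the McDermott-style statistics: rectangular FFT-domain bandpass filters stand in for a cochlear filterbank, and only the first three envelope moments per band are computed (the full model also uses cross-band correlations and modulation statistics). All function names and band edges here are my own illustrative choices.

```python
import numpy as np

def _analytic(x):
    """Analytic signal via the FFT (same idea as scipy.signal.hilbert)."""
    n = len(x)
    Xf = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(Xf * h)

def band_envelope_stats(x, sr, bands=((100, 400), (400, 1600), (1600, 6400))):
    """Per-band envelope statistics: bandpass the signal (crude rectangular
    filters in the frequency domain), take the amplitude envelope as the
    magnitude of the analytic signal, and summarize it with moments."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    X = np.fft.rfft(x)
    stats = []
    for lo, hi in bands:
        band = np.fft.irfft(X * ((freqs >= lo) & (freqs < hi)), n=n)
        env = np.abs(_analytic(band))
        m, s = env.mean(), env.std()
        skew = ((env - m) ** 3).mean() / (s ** 3 + 1e-12)
        stats.append((m, s, skew))
    return np.array(stats)  # shape: (n_bands, 3)
```

Synthesis then runs in the opposite direction: starting from noise, the signal is iteratively adjusted until its per-band envelope statistics match a target vector, which is also what enables the "effect" use case of imposing one sound's statistics on another.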

16:30 - Mathias Mauch: "Evolving Music in the Lab and in the Wild"

Let's revisit music culture through the eyes of an evolutionary biologist. Can we evolve music in the lab, like bacteria in a Petri dish? Can we observe how music changes in the wild? I'll be reporting on two data-driven studies I did in collaboration with actual biologists to answer just these questions. Along the way I'll introduce my own background in music informatics and the tools we needed to analyse the audio.