AES San Francisco 2012
Poster Session P16

Sunday, October 28, 2:00 pm — 3:30 pm (Foyer)

Poster: P16 - Analysis and Synthesis of Sound

P16-1 Envelope-Based Spatial Parameter Estimation in Directional Audio Coding—Michael Kratschmer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Oliver Thiergart, International Audio Laboratories Erlangen - Erlangen, Germany; Ville Pulkki, Aalto University - Espoo, Finland
Directional Audio Coding provides an efficient description of spatial sound in terms of a few audio downmix signals and parametric side information, namely the direction-of-arrival (DOA) and diffuseness of the sound. This representation allows an accurate reproduction of the recorded spatial sound with almost arbitrary loudspeaker setups. The DOA information can be efficiently estimated with linear microphone arrays by considering the phase information between the sensors. Due to the microphone spacing, the DOA estimates are corrupted by spatial aliasing at higher frequencies, which degrades the sound reproduction quality. In this paper we propose to consider the signal envelope for estimating the DOA at higher frequencies to avoid the spatial aliasing problem. Experimental results show that the presented approach has great potential for improving the estimation accuracy and rendering quality.
Convention Paper 8791
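
As a rough illustration of why phase-based DOA estimates break down at high frequencies: for two sensors spaced d apart, the inter-sensor phase difference stays unambiguous for every arrival direction only below roughly c/(2d). The spacing below is an assumed example value, not one taken from the paper.

```latex
f_{\text{alias}} = \frac{c}{2d}, \qquad
c = 343\ \text{m/s},\ d = 0.04\ \text{m}
\;\Rightarrow\;
f_{\text{alias}} = \frac{343}{2 \times 0.04} \approx 4.3\ \text{kHz}
```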

P16-2 Approximation of Dynamic Convolution Exploiting Principal Component Analysis: Objective and Subjective Quality Evaluation—Andrea Primavera, Università Politecnica delle Marche - Ancona, Italy; Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Laura Romoli, Università Politecnica delle Marche - Ancona, Italy; Michele Gasparini, Università Politecnica delle Marche - Ancona, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
In recent years, several techniques have been proposed in the literature to emulate nonlinear electro-acoustic devices, such as compressors, distortions, and preamplifiers. Among them, the dynamic convolution technique is one of the most common approaches. This paper presents an exhaustive objective and subjective analysis of a dynamic convolution operation based on principal component analysis. Using real nonlinear systems, such as a bass preamplifier, a distortion, and a compressor, the proposed approach is compared with existing state-of-the-art techniques to demonstrate its effectiveness.
Convention Paper 8792

P16-4 Automatic Mode Estimation of Persian Musical Signals—Peyman Heydarian, London Metropolitan University - London, UK; Lewis Jones, London Metropolitan University - London, UK; Allan Seago, London Metropolitan University - London, UK
Musical mode is central to maqamic musical traditions that span from Western China to Southern Europe. A mode usually represents the scale and is to some extent an indication of the emotional content of a piece. Knowledge of the mode is useful in searching multicultural archives of maqamic musical signals, so the modal information is worth including in a file's metadata. An automatic mode classification algorithm has potential applications in music recommendation and playlist generation, where the pieces can be ordered based on a perceptually accepted criterion such as the mode. It could also be used as a framework for music composition and synthesis. This paper presents an algorithm for classification of Persian audio musical signals, based on a generative approach, i.e., Gaussian Mixture Models (GMM), where chroma is used as the feature. The results are compared with a chroma-based method using a Manhattan distance measure that was previously developed by the authors.
Convention Paper 8794
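
The Manhattan-distance baseline mentioned in the abstract can be pictured as nearest-template matching on chroma vectors. The following is only a minimal sketch: the 12-bin resolution, the two templates, the names `mode_a` and `mode_b`, and the observed vector are all hypothetical and are not taken from the paper, which may use a different chroma resolution and real Persian mode templates.

```python
def manhattan(a, b):
    """L1 (Manhattan) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def normalize(v):
    """Scale a chroma vector so its bins sum to 1 (unchanged if all zero)."""
    total = sum(v)
    return [x / total for x in v] if total else list(v)


def classify(chroma, templates):
    """Return the name of the template closest to `chroma` in L1 distance."""
    c = normalize(chroma)
    return min(templates, key=lambda name: manhattan(c, normalize(templates[name])))


# Hypothetical 12-bin templates, for illustration only.
TEMPLATES = {
    "mode_a": [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1],
    "mode_b": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
}

# Hypothetical averaged chroma of a piece (made-up numbers).
observed = [0.9, 0.1, 0.8, 0.0, 0.7, 0.9, 0.1, 1.0, 0.0, 0.6, 0.1, 0.5]
label = classify(observed, TEMPLATES)
```

A GMM-based classifier, as proposed in the paper, would replace the distance to a fixed template with the likelihood of the chroma frames under a model trained per mode.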

P16-5 Generating Matrix Coefficients for Feedback Delay Networks Using Genetic Algorithm—Michael Chemistruck, University of Miami - Coral Gables, FL, USA; Kyle Marcolini, University of Miami - Coral Gables, FL, USA; Will Pirkle, University of Miami - Coral Gables, FL, USA
This paper analyzes the use of a genetic algorithm (GA) in conjunction with a length-4 feedback delay network for audio reverberation applications. While it is possible to assign coefficient values to the feedback network manually, our goal was to automate the generation of these coefficients so that the resulting reverb is as similar as possible to the reverberation of a real room. To do this we designed a GA for use with a delay-based reverb, which is better suited to real-time applications than the more computationally expensive convolution reverb.
Convention Paper 8795
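
To make the role of the matrix coefficients concrete, here is a minimal sketch of a length-4 feedback delay network computing its own impulse response. The delay lengths, the gain, and the scaled Hadamard feedback matrix are assumed values chosen for illustration; they are not the coefficients produced by the authors' GA, which would instead search over the entries of `MATRIX`.

```python
def fdn_impulse_response(matrix, delays, n_samples):
    """Impulse response of a small feedback delay network (FDN)."""
    n_lines = len(delays)
    buffers = [[0.0] * d for d in delays]  # one circular buffer per delay line
    pos = [0] * n_lines                    # read/write position in each buffer
    out = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0         # unit impulse at the input
        # Read the sample written delays[i] steps ago on each line.
        taps = [buffers[i][pos[i]] for i in range(n_lines)]
        out.append(sum(taps))
        for i in range(n_lines):
            # Mix all delayed outputs back into line i through the matrix.
            feedback = sum(matrix[i][j] * taps[j] for j in range(n_lines))
            buffers[i][pos[i]] = x + feedback
            pos[i] = (pos[i] + 1) % delays[i]
    return out


# Assumed example values: an orthogonal Hadamard matrix scaled by a gain < 1
# keeps the loop stable; the delay lengths are arbitrary small primes.
GAIN = 0.7
HADAMARD = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
MATRIX = [[GAIN * 0.5 * h for h in row] for row in HADAMARD]
DELAYS = [7, 11, 13, 17]

ir = fdn_impulse_response(MATRIX, DELAYS, 1000)
early = sum(v * v for v in ir[:500])   # energy in the first half
late = sum(v * v for v in ir[500:])    # energy in the decaying tail
```

A GA fitness function could, for example, compare properties of `ir` with those of a measured room response; that comparison is not shown here.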

P16-6 Low Complexity Transient Detection in Audio Coding Using an Image Edge Detection Approach—Julien Capobianco, France Telecom Orange Labs/TECH/OPERA - Lannion Cedex, France; Université Pierre et Marie Curie - Paris, France; Grégory Pallone, France Telecom Orange Labs/TECH/OPERA - Lannion Cedex, France; Laurent Daudet, University Paris Diderot - Paris, France
In this paper we propose a new low-complexity method of transient detection using an image edge detection approach. In this method, the time-frequency spectrum of an audio signal is treated as an image. With an appropriate mapping function for converting energy bins into pixels, audio transients correspond to rectilinear edges in the image, so the transient detection problem becomes an edge detection problem. Inspired by standard image edge detection methods, we derive a detection function specific to rectilinear edges that can be implemented with very low complexity. Our method is evaluated in two practical audio coding applications: as a replacement for the SBR transient detector in HE-AAC v2 and in the stereo parametric tool of MPEG USAC.
Convention Paper 8796
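
The idea of treating a transient as a straight edge in the time-frequency image can be sketched as follows. This is not the detection function derived in the paper: the log mapping to "pixels", the averaged positive time-gradient, the 6 dB threshold, and the toy spectrogram are all assumptions made for illustration.

```python
import math


def transient_frames(spectrogram, threshold_db=6.0, floor=1e-12):
    """Return indices of frames where energy rises sharply across bands.

    `spectrogram` is a list of frames, each a list of band energies.
    """
    # Map energies to log-scale "pixels" (an assumed mapping function).
    pixels = [[10.0 * math.log10(max(e, floor)) for e in frame]
              for frame in spectrogram]
    detected = []
    for t in range(1, len(pixels)):
        # Gradient along time, keeping only increases (onsets), per band.
        rises = [max(0.0, cur - prev)
                 for cur, prev in zip(pixels[t], pixels[t - 1])]
        # Averaging over all bands responds to edges that span the whole
        # frequency axis, i.e. straight vertical lines in the image.
        score = sum(rises) / len(rises)
        if score > threshold_db:
            detected.append(t)
    return detected


# Toy spectrogram: 5 quiet frames, then a broadband jump of 20 dB.
frames = [[1.0] * 4 for _ in range(5)] + [[100.0] * 4 for _ in range(3)]
onsets = transient_frames(frames)
```

Only additions, comparisons, and one logarithm per bin are needed, which is in the spirit of the low-complexity goal stated in the abstract.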

P16-7 Temporal Coherence-Based Howling Detection for Speech Applications—Chengshi Zheng, Chinese Academy of Sciences - Beijing, China; Hao Liu, Chinese Academy of Sciences - Beijing, China; Renhua Peng, Chinese Academy of Sciences - Beijing, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China
This paper proposes a novel howling detection criterion for speech applications based on temporal coherence (referred to as TCHD). The proposed TCHD criterion relies on the fact that speech has only a relatively short coherence time, while the coherence times of true howling components are nearly infinite, since the howling components are perfectly correlated with themselves at large delays. The proposed TCHD criterion is computationally efficient for two reasons. First, the fast Fourier transform (FFT) can be applied directly to compute the temporal coherence. Second, the criterion does not need to identify spectral peaks from the raw periodogram of the microphone signal. Simulation and experimental results show the validity of the proposed TCHD criterion.
Convention Paper 8797
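
The contrast between short and near-infinite coherence times can be illustrated with a normalized autocorrelation evaluated at large lags. This is a simplified time-domain sketch, not the TCHD criterion itself, which according to the abstract is computed with the FFT. The signal length, the lag range, the pure tone standing in for howling, and the white noise standing in for a short-coherence signal are all assumptions.

```python
import math
import random


def normalized_autocorr(x, lag):
    """Correlation coefficient between x and a copy of x delayed by `lag`."""
    a, b = x[:-lag], x[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den else 0.0


def long_lag_coherence(x, lags):
    """Largest absolute normalized autocorrelation over a range of lags."""
    return max(abs(normalized_autocorr(x, lag)) for lag in lags)


N = 2000
LAGS = range(400, 440)  # large lags, spanning two periods of the tone below

# Howling stand-in: a steady sinusoid with a 20-sample period.
tone = [math.sin(2 * math.pi * n / 20) for n in range(N)]

# Crude stand-in for a signal with short coherence time: white noise.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]

tone_score = long_lag_coherence(tone, LAGS)    # stays close to 1
noise_score = long_lag_coherence(noise, LAGS)  # stays close to 0
```

Taking the maximum over a span of lags avoids the zero crossings of a real sinusoid's autocorrelation; a detector would then compare the score against a threshold.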

P16-8 A Mixing Matrix Estimation Method for Blind Source Separation of Underdetermined Audio Mixture—Mingu Lee, Samsung Electronics Co. - Suwon-si, Gyeonggi-do, Korea; Keong-Mo Sung, Seoul National University - Seoul, Korea
A new mixing matrix estimation method for underdetermined blind source separation of audio signals is proposed. By statistically modeling the local features in a time-frequency region, i.e., the magnitude ratio and phase difference of the mixtures, each region provides information about the mixing angle of a source, with a reliability given by its likelihood. Regional data are then clustered with statistical tests based on their likelihood to produce estimates of the mixing angles of the sources as well as the number of sources. Experimental results show that the proposed mixing matrix estimation algorithm outperforms the existing methods.
Convention Paper 8798

P16-9 Speech Separation with Microphone Arrays Using the Mean Shift Algorithm—David Ayllón, University of Alcala - Alcalá de Henares, Spain; Roberto Gil-Pita, University of Alcala - Alcalá de Henares, Spain; Manuel Rosa-Zurera, University of Alcala - Alcalá de Henares, Spain
Microphone arrays provide spatial resolution that is useful for speech source separation because sources located in different positions cause different time and level differences in the elements of the array. This feature can be combined with time-frequency masking in order to separate speech mixtures by means of clustering techniques, such as the so-called DUET algorithm, which uses only two microphones. However, there are applications where larger arrays are available, and the separation can be performed using all of these microphones. A speech separation algorithm based on the mean shift clustering technique has recently been proposed using only two microphones. In this work the aforementioned algorithm is generalized to arrays with any number of microphones, and its performance is tested with echoic speech mixtures. The results show that the generalized mean shift algorithm notably outperforms the original DUET algorithm.
Convention Paper 8799
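
Mean shift clustering, on which the described algorithm builds, moves each feature point uphill on a kernel density estimate until it settles on a mode, so the number of clusters does not have to be fixed in advance. The sketch below is a generic one-dimensional version with a Gaussian kernel. The feature values, the bandwidth, and the merging rule are assumptions for illustration, and the paper's features (time and level differences across several microphones) are multi-dimensional.

```python
import math


def mean_shift_modes(points, bandwidth, tol=1e-6, max_iter=200):
    """Return the sorted density modes found by 1-D Gaussian mean shift."""
    merge_dist = bandwidth / 2  # modes closer than this count as one cluster
    modes = []
    for start in points:
        x = start
        for _ in range(max_iter):
            # Gaussian weights of all points relative to the current estimate.
            weights = [math.exp(-((p - x) ** 2) / (2 * bandwidth ** 2))
                       for p in points]
            new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
            if abs(new_x - x) < tol:
                break
            x = new_x
        if not any(abs(x - m) < merge_dist for m in modes):
            modes.append(x)
    return sorted(modes)


# Hypothetical per-bin delay features from two sources (made-up values).
delays = [-1.1, -1.0, -0.9, 0.9, 1.0, 1.1]
modes = mean_shift_modes(delays, bandwidth=0.3)
```

Each time-frequency bin would then be assigned to its nearest mode to build one separation mask per source.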

P16-10 A Study on Correlation Between Tempo and Mood of Music—Magdalena Plewa, Gdansk University of Technology - Gdansk, Poland; Bozena Kostek, Gdansk University of Technology - Gdansk, Poland
In this paper a study is carried out to identify a relationship between mood description and combinations of various tempos and rhythms. First, a short review of music recommendation systems along with music mood recognition studies is presented. In addition, some details on tempo and rhythm perception and detection are included. Then, the experiment layout is explained in which a song is first recorded and then its rhythm and tempo are changed. This constitutes the basis for a mood tagging test. Six labels are chosen for mood description. The results show a significant dependence between the tempo and mood of the music.
Convention Paper 8800