Music is a highly multimodal concept: various types of heterogeneous information are associated with a music piece (audio, the musician's gestures and facial expressions, lyrics, etc.). This has recently led researchers to approach music through its various facets, giving rise to multimodal methods for content-based semantic description of music material.

In this project, we investigate the complementarity of audio and image/video description algorithms for the automatic description and indexing of user-generated music performance videos. We address relevant music information research tasks, in particular musical instrument recognition, audio/video stream synchronization, similarity, quality assessment, structural analysis and segmentation, and automatic video mashup generation (an illustrative synchronization baseline is sketched below). To do so, we develop strategies for building multimedia repositories and gathering human annotations.
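As an illustration of one of these tasks, the sketch below shows a common audio-based baseline for synchronizing two recordings of the same performance: cross-correlating their onset-strength envelopes and taking the peak as the time offset. This is only a minimal illustrative baseline, not the method developed in the project; the file names and parameters are placeholders.

```python
import numpy as np
import librosa
from scipy import signal


def estimate_offset(reference_path, query_path, sr=22050, hop_length=512):
    """Estimate the time offset (in seconds) of the query recording
    relative to the reference by cross-correlating onset-strength envelopes."""
    ref, _ = librosa.load(reference_path, sr=sr, mono=True)
    qry, _ = librosa.load(query_path, sr=sr, mono=True)

    # Onset-strength envelopes are more robust than raw waveforms to
    # differences in recording quality between user-generated videos.
    ref_env = librosa.onset.onset_strength(y=ref, sr=sr, hop_length=hop_length)
    qry_env = librosa.onset.onset_strength(y=qry, sr=sr, hop_length=hop_length)

    # Full cross-correlation; the peak gives the lag (in frames) that best
    # aligns the query envelope with the reference envelope.
    corr = signal.correlate(ref_env, qry_env, mode="full")
    lag_frames = np.argmax(corr) - (len(qry_env) - 1)

    return lag_frames * hop_length / sr


# Hypothetical usage: align an audience clip to a reference recording.
# offset = estimate_offset("reference.wav", "audience_clip.wav")
# print(f"Audience clip starts {offset:.2f} s into the reference recording")
```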

This research is related to the Maria de Maeztu Strategic Program on Data-Driven Knowledge Extraction (https://portal.upf.edu/web/mdm-dtic/home). Our project deals with large-scale multimedia data. You can watch a video presenting our project here. This line of research involves a collaboration between faculty members from two groups: the Music Information Research Lab at the Music Technology Group (Emilia Gómez) and the Image Processing Group (Gloria Haro).