ISSN 1082-9873

ISMIR 2004

ISMIR 2004's gala reception featured a traditional Catalonian castellers troupe, practitioners of a community sport that involves constructing human towers. I couldn't help seeing their sport as a metaphor for the goals of ISMIR itself: to pile a number of disparate disciplines on top of one another to build something new. (Or maybe it was just all that cava going to my head.)

In any case, the fifth year of the conference is probably an appropriate time to reflect on how the field of music information retrieval (MIR) has changed. A good point of reference is a survey of ISMIR research by Futrelle and Downie, presented at ISMIR 2002 [1], which covers the first two years of the conference. At that time, the authors identified several notable gaps in MIR research. There was very little interdisciplinary collaboration and a relative under-representation of the social sciences and humanities. There were few user studies, and therefore much of the research being done was disconnected from user needs. In addition, there was perhaps an undue emphasis on Western musical genres. Finally, the lack of standardized test data and methodologies made it difficult to compare methods within the same problem domain. At ISMIR 2004, it appears that many of these problems have been addressed. ISMIR has expanded, with more participants and papers this year than ever before, as well as the introduction of the ISMIR Audio Description Contest and the ISMIR Graduate School.

Music information retrieval has relevancy to Computer Science, Audio Engineering, Musicology, Library and Information Sciences, Psychology, Sociology and Law, to name only a few. How well were these different disciplines represented at ISMIR 2004? It is of course impossible to determine the authors' academic backgrounds from the papers alone, but if you divide the authors by departmental affiliation, it breaks down roughly as shown in Table 1 and Figure 1 [2].

Table 1.

Computer Science, Electrical Engineering, etc.        49%
Interdisciplinary Music/Media Technology              23%
Library/Information Sciences, Informatics, etc.       20%
Music                                                  7%
Other (includes Mathematics, Statistics and Physics)   1%

Figure 1. Division of papers by departmental affiliation

Clearly, the majority of papers were written by authors affiliated with the "hard" sciences.

Another interesting classification of ISMIR papers is by the kind of data being used for research. Musical data tends to fall in three loose categories:

Symbolic - the "notes" or other structural features of a score or performance, often, though not always, obtained from a score;

Audio - the acoustic signal of a musical performance;

Metadata - additional data about the music not contained in the music itself [3].

A sizable group of papers, such as those on transcription or score following, clearly straddles the divide between symbolic and audio data and makes up a fourth category. There is also a fifth category that does not deal with musical data at all, but instead analyzes the use of music (user studies). The division of papers along these lines is shown in Table 2 and Figure 2.

Table 2.

                   2002    2004
Audio              31%     32%
Symbolic           24%     26%
Symbolic & Audio   20%     23%
Metadata           14%     13%
User studies       10%      6%

Figure 2. Division of papers by type of data in Years 2002 and 2004

Figure 2 shows the fairly even split between papers dealing with audio and symbolic data in 2004. Interestingly, these proportions changed very little compared to those of ISMIR 2002 in Paris.

The overwhelming majority of papers presented at ISMIR 2004 applied primarily to Western art and popular music genres, whether explicitly or implicitly. There were, however, some notable exceptions, including research involving acousmatic music [4], microtonal pitches [5], West African rhythms [6], Carnatic (South Indian classical) music [7], Australian Aboriginal music (specifically the didjeridu) [8], Brazilian popular music [9], and two papers on vocal percussion ("beat boxing") [10, 11]. It is exciting to see this broadening of applications. Papers from the ISMIR conference can be downloaded in PDF form from the Program page at the ISMIR 2004 web site [12].

This year, the first ISMIR Audio Description Contest [13] addressed the scarcity of standardized data sets and testing methodologies in MIR. Tests were run in the domains of genre classification, artist identification, melody extraction, tempo induction and rhythm classification. A panel session was held on the last day of the conference to announce the winners and discuss the contest. Creating and administering the contest was very labor-intensive, and the entire panel expressed gratitude for the hard work done by the organizers at Universitat Pompeu Fabra. It was agreed that the contest was very useful, though there may have been some barriers to participation that organizers should try to eliminate in future contests. Some of those barriers were technical: e.g., the difficulty of integrating algorithms written in different programming environments. Others were more social: e.g., the "fear of failure" in a public forum. In addition, many research areas were identified that the contest did not cover, most notably research on symbolic data. Perhaps more fine-grained evaluation will reveal the differing strengths of the various algorithms. It was also suggested that, in the future, a paper for the proceedings should accompany every contest entry, so that contest results can be analyzed in context.

ISMIR 2004 also initiated the ISMIR Graduate School, a six-day workshop with the goal of introducing graduate-level students to the fundamentals of MIR and providing opportunities for the students to interact with others in their field. Faculty for the ISMIR Graduate School this year included Anssi Klapuri (audio signal processing), J. Stephen Downie (digital music libraries), and Frans Wiering (musicology), and the chairs were Xavier Serra and Marc Leman. Approximately 30 students participated.

2004 was an exciting year for ISMIR. The music information retrieval field is maturing, with more research results available for comparison, increased development and usage of common terminology and methodology, and more referencing of prior art. The bar has been set high for next year's conference hosts in London [14], but I'm sure they are up to the challenge!

[3] The MPEG-7 definition of metadata includes attributes that can be extracted from the music itself, but in this report I take the more traditional view, combining such things within the "symbolic" category.