Description

In brief: algorithms that take a single piece of music as input, and output a list of patterns repeated within that piece. Also known as intra-opus discovery (Conklin & Anagnostopoulou, 2001).

We would be happy to receive ideas for improving aspects of this task. Researchers with wiki accounts are able to post comments below or to edit the relevant sections, and researchers without wiki accounts are welcome to email me directly: tomthecollins(a)gmail.com

In more detail: for understanding and interpreting a musical work, the discovery of repeated patterns within that piece is a crucial step (Cook, 1987). Meredith, Lemström, and Wiggins (2002) cite Schenker (1954) as claiming repetition to be 'the basis of music as an art' (p. 5), and also Lerdahl and Jackendoff (1983), who observe that 'the importance of parallelism [i.e., repetition] in musical structure cannot be overestimated. The more parallelism one can detect, the more internally coherent an analysis becomes, and the less independent information must be processed and retained in hearing or remembering a piece' (p. 52).

On the very next page Lerdahl and Jackendoff (1983) acknowledge their 'failure to flesh out the notion of parallelism,' which is symptomatic of a more general failure in music psychology and music computing to address the discovery of repetition. Algorithms that take pieces of music as input, and output a list, visualisation, or summary of repeated patterns do exist (Chiu, Shan, Huang, & Li, 2009; Collins, Thurlow, Laney, Willis, & Garthwaite, 2010; Conklin & Anagnostopoulou, 2001; Forth & Wiggins, 2009; Hsu, Liu, & Chen, 2001; Knopke & Jürgensen, 2009; Lartillot, 2005; Meek & Birmingham, 2003; Meredith et al., 2002; Müller & Jiang, 2012; Nieto, Humphrey, & Bello, 2012; Peeters, 2007), but until the last two years the pattern discovery task had received less attention than many other tasks in MIR.

What is a Pattern?

For the purposes of this task, a pattern is defined as a set of ontime-pitch pairs that occurs at least twice (i.e., is repeated at least once) in a piece of music. The second, third, etc. occurrences of the pattern will likely be shifted in time and perhaps also transposed, relative to the first occurrence. Ideally an algorithm would be able to discover all exact and inexact occurrences of a pattern within a piece, so in evaluating this task we are interested in both (1) whether an algorithm can discover one occurrence, up to time shift and transposition, and (2) to what extent it can find all occurrences. It has been pointed out by Lartillot and Toiviainen (2007) among others that as well as ontime-pitch patterns, there are various types of repeating pattern (e.g., ontimes alone, duration, contour, harmony, etc.). For the sake of simplicity, the current task is restricted to ontime-pitch pairs.
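The notion of an exact occurrence "up to time shift and transposition" can be made concrete with a small sketch. It assumes patterns are given as lists of (ontime, MIDI note number) tuples; the function name `translation_vector` and the example values are illustrative only and are not part of the task's evaluation code.

```python
def translation_vector(p, q):
    """Return (dt, dp) if point set q equals point set p shifted by dt in
    ontime and dp in MIDI pitch; return None otherwise."""
    p_sorted, q_sorted = sorted(set(p)), sorted(set(q))
    if not p_sorted or len(p_sorted) != len(q_sorted):
        return None
    # Translation preserves lexicographic order, so the first points must match.
    dt = q_sorted[0][0] - p_sorted[0][0]
    dp = q_sorted[0][1] - p_sorted[0][1]
    shifted = [(t + dt, m + dp) for t, m in p_sorted]
    return (dt, dp) if shifted == q_sorted else None


# Hypothetical four-note figure and a later occurrence a fifth higher.
first = [(0, 60), (1, 62), (2, 64), (3, 65)]
second = [(8, 67), (9, 69), (10, 71), (11, 72)]
inexact = [(8, 67), (9, 69), (10, 71), (11, 73)]

exact_shift = translation_vector(first, second)
inexact_shift = translation_vector(first, inexact)
```

Because the comparison is exact, ontimes stored as floating-point numbers may need rounding first. Inexact occurrences such as `inexact` are not detected by this check and need a similarity measure instead.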

Some of the most recognisable riffs and motifs in music consist of as few as four ontime-pitch pairs (for example, the opening riff from 'Purple Haze' by Hendrix, or the opening of the first movement of Symphony no.5 in C minor by Beethoven). If, however, an algorithm returned all patterns consisting of four or more notes in a given piece, a lot of these patterns would not be perceptually salient or analytically interesting. Happily, solutions have been proposed for trying to determine which are the most noticeable and/or important patterns, which are of middling importance, and which have occurred by chance (Cambouropoulos, 2006; Conklin, 2010a, 2010b). Collins, Laney, Willis, & Garthwaite (2011) conducted a meta-analysis and experimental validation of many proposed solutions. More information about the differences between motif, theme, and repeated section can be found in answer to Question 7.6.

2017: New evaluation of algorithms on compression criteria

In line with participant feedback from previous years, in 2017 a new subtask will be run. This will use discovered patterns to compress a piece of music, with the notion that the discovery of meaningful patterns results in a shorter file length (cf., e.g., Louboutin & Meredith, 2016). For comparison with evaluation criteria from previous editions (2013-2016) of the task, discovered patterns will also be compared against annotated patterns in the JKUPTD (see below). Note that the compression measures are not meant to replace the previous evaluation measures, but are provided as an alternative way of interpreting pattern discovery results.

Data for the Task

Discovery of repeated themes and sections will be evaluated against the JKUPTD database of classical music annotated with repeated themes and sections (mainly from KernScores; see also Flossmann, Goebl, Grachten, Niedermayer, & Widmer, 2010). To encourage participation in the pattern discovery task, we are offering a representative sample called the JKU Patterns Development Database (~340 MB, August 2013 version). (If you prefer, here is a smaller version with no audio, ~40 MB.) Symbolic and audio versions are crossed with monophonic and polyphonic versions, giving up to four versions of the task in total. Researchers are welcome to submit to more than one version of the task.

As a ground truth, we are basing motifs and themes on Barlow and Morgenstern's (1953) Dictionary of Musical Themes, Schoenberg's (1967) Fundamentals of Musical Composition, and Bruhn's (1993) J. S. Bach’s Well-Tempered Clavier: In-depth Analysis and Interpretation. Repeated sections are based on those marked by the composer. For one of the pieces we created our own annotation. A paper that describes our construction of the Development Database and use of the sources is currently under preparation. No ground truth is perfect: we have chosen the sources as being relatively uncontroversial and transparent, but we would welcome ideas and suggestions from other researchers. As a quick example, Figure 1 is an excerpt from Beethoven's op.2 no.1 mvt.3, with a ground-truth pattern marked as P1 (first occurrence) and P2 (second occurrence).

Submission Format

Symbolic Version

Participants are able to choose from a number of symbolic representations (MIDI, kern, csv with columns for ontime, MIDI note number, staff height, duration, and staff number), as there may be differing opinions about which aspects of a representation are most useful for discovering repeated patterns. This choice also reflects the importance of designing pattern discovery code that functions irrespective of the exact input format (Wiggins, 2007). For the purposes of standardised evaluation, participants will need to convert each occurrence of a discovered pattern to a point set consisting of event ontimes and MIDI note numbers. For instance, the point-set representation for P1 in Figure 1 is

Sectional repetitions are expanded in all pieces, i.e. as the piece would be heard in a performance. In the monophonic version, pieces consisting of voiced polyphony (e.g., a fugue or choral work) are unfolded, meaning each voice is extracted and re-encoded monophonically, one after the other in the order highest staff to lowest. For example, a fugue with upper, middle, and lower voices would be re-encoded with the upper voice heard first in isolation, followed by the middle voice, and then lower voice. In the monophonic version, pieces consisting of unvoiced polyphony are converted to monophony using the clipped skyline approach.

Audio Version

For the audio version of the task, participating algorithms will have to read audio in wav format, sample rate 44.1 kHz, 16 bit, mono. These wav files are rendered (synthesised) in a metronomically exact fashion from the corresponding symbolic data. Beats per minute (BPM) are different for different pieces, but this information is located in the corresponding kern file (e.g., in a kern file '*MM192' means 192 BPM).

As with the symbolic version of the task, for the purposes of standardised evaluation, participants will need to convert each occurrence of a discovered pattern to a point set consisting of event ontimes and MIDI note numbers. Even if your algorithm only returns a time interval in seconds for an occurrence of a pattern, this conversion will be easy enough to do: convert to an ontime interval using the BPM provided, and then use the csv file for the piece to determine which ontime-MIDI pairs are sounding in that ontime interval. (A downside to this approach is that the evaluation metrics will be slightly punitive if not all ontime-pitch pairs sounding in the interval are part of the ground truth pattern.)
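A minimal sketch of this conversion follows, under several assumptions that should be checked against the data: one tempo beat equals one crotchet, audio time zero corresponds to the ontime `ontime_at_zero` (a hypothetical parameter), the csv has no header row and has ontime and MIDI note number in its first two columns, and "sounding in" is read as "having an ontime inside the interval". The function names are illustrative.

```python
import csv
import io


def seconds_to_ontime(seconds, bpm, ontime_at_zero=0.0):
    """Convert a time in seconds to an ontime in crotchet beats."""
    return ontime_at_zero + seconds * bpm / 60.0


def points_in_interval(csv_text, start_sec, end_sec, bpm, ontime_at_zero=0.0):
    """Return the (ontime, MIDI note number) pairs whose ontime lies in the
    ontime interval corresponding to [start_sec, end_sec]."""
    a = seconds_to_ontime(start_sec, bpm, ontime_at_zero)
    b = seconds_to_ontime(end_sec, bpm, ontime_at_zero)
    points = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        ontime, midi = float(row[0]), float(row[1])
        if a <= ontime <= b:
            points.append((ontime, midi))
    return points


# Hypothetical csv content: ontime, MIDI note, morphetic pitch, duration, staff.
csv_text = "0, 60, 60, 1, 0\n1, 62, 61, 1, 0\n2, 64, 62, 1, 0\n4, 65, 63, 1, 0\n"
# At 120 BPM, 0.5 s to 1.0 s corresponds to ontimes 1.0 to 2.0.
selected = points_in_interval(csv_text, 0.5, 1.0, bpm=120)
```

Notes that begin before the interval and are still sustained inside it are ignored by this reading; a stricter notion of "sounding" would also consult the duration column.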

Example Algorithm Output for a Ground-Truth Piece

Regardless of symbolic/audio and polyphonic/monophonic task version, the output of your pattern discovery algorithm for a given piece should adhere to the following text file format:

That is, ontimes are in the left-hand column and MIDI note numbers are in the right. Each occurrence of a discovered pattern is given before moving on to the next pattern. Occurrences do not have to be of the same length, nor do they have to be constrained to exact or transposed repetition (e.g., variations are permitted). Neither the patterns nor the occurrences of patterns need to be in temporal order: the evaluation metrics are robust to different orders.

Order does matter, however, in the following two respects: if possible (1) place the patterns in decreasing order of predicted perceptual salience/musical importance; (2) define occurrence1 to be the prototypical occurrence of each pattern. Fulfilling point (1) is not essential (could defer to future work), but it concerns an application of discovery algorithms wherein a user browses the output patterns. It would be convenient for the user to be shown the most important patterns first, and one metric below (called first five target proportion) evaluates this aspect of algorithm performance. Fulfilling point (2) is important if your discovery method is capable of retrieving inexact occurrences. Some metrics below are designed for assessing the capability for retrieving inexact occurrences, but others are simply concerned with whether or not the prototypical occurrence is discovered. The evaluation code will consider occurrence1 to be the prototype.
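The ordering conventions above can be illustrated with a short writer. The format listing itself is not reproduced on this page, so the layout used here is an assumption: a `patternN` header line per pattern, an `occurrenceM` header line per occurrence (the text above does refer to occurrence1), then one `ontime, MIDI note number` row per point with five decimal places. Please check the layout against the example output bundled with the Development Database before relying on it.

```python
def format_patterns(patterns):
    """Serialise discovered patterns to text.

    `patterns` is a list of patterns, most salient first; each pattern is a
    list of occurrences, prototypical occurrence first; each occurrence is a
    list of (ontime, MIDI note number) pairs.
    """
    lines = []
    for i, occurrences in enumerate(patterns, start=1):
        lines.append(f"pattern{i}")          # assumed header label
        for j, occurrence in enumerate(occurrences, start=1):
            lines.append(f"occurrence{j}")   # occurrence1 is the prototype
            for ontime, midi in occurrence:
                lines.append(f"{ontime:.5f}, {midi:.5f}")
    return "\n".join(lines) + "\n"


# Hypothetical output: one pattern with two occurrences of unequal length.
text = format_patterns([[[(7, 45), (8, 48)], [(15, 45)]]])
```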

Evaluation Procedure

In brief: An implementation of the evaluation metrics and example code are bundled with the Development Database, to save participants having to implement the evaluation metrics themselves. Participating algorithms will be evaluated against the following metrics:

runtime, fifth return time, first five target proportion (Tom Collins) and first five precision (David Meredith);

standard precision, recall, and F1 score;

Standard Precision, Recall, and F1 Score

In more detail: Denote the patterns in a ground truth by $\Pi = \{P_1, P_2, \ldots, P_{n_P}\}$, and the patterns in an algorithm’s output by $\Xi = \{Q_1, Q_2, \ldots, Q_{n_Q}\}$. If the algorithm discovers k of the ground truth patterns, up to translation, then the standard precision of the algorithm is defined as $P = k/n_Q$, the standard recall of the algorithm is defined as $R = k/n_P$, and the standard F1 score as

$$F_1 = \frac{2PR}{P + R}.$$

The above metrics, which were used by Collins et al. (2010) in one of the first evaluations of a pattern discovery task, are very strict: an output pattern Q may have only one point different from a large ground truth pattern P, but this will not count as a successful discovery. Therefore, we propose the following new metrics, which are robust to slight differences between output and ground truth patterns.
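The strictness of the standard metrics is easy to see in a sketch. The code below assumes each pattern is represented by a single non-empty point set (for instance its prototypical occurrence) and counts a ground truth pattern as discovered only if some output pattern is an exact translation of it. The function names and example data are illustrative and are not taken from the bundled evaluation code.

```python
def normalise(points):
    """Translate a non-empty point set so its first point (in lexicographic
    order) sits at (0, 0); translated copies then compare equal."""
    pts = sorted(set(points))
    t0, p0 = pts[0]
    return frozenset((t - t0, p - p0) for t, p in pts)


def standard_prf(ground_truth, output):
    """Standard precision k/n_Q, recall k/n_P, and F1 = 2PR/(P+R)."""
    out_shapes = {normalise(q) for q in output}
    k = sum(1 for p in ground_truth if normalise(p) in out_shapes)
    precision = k / len(output) if output else 0.0
    recall = k / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if k else 0.0
    return precision, recall, f1


ground_truth = [
    [(0, 60), (1, 62), (2, 64)],
    [(0, 67), (0.5, 65), (1, 64), (2, 62)],
]
output = [
    [(8, 65), (9, 67), (10, 69)],  # first ground truth pattern, translated
    [(0, 60), (1, 62), (2, 65)],   # one pitch differs, so it does not count
    [(4, 60), (5, 60)],
    [(6, 72), (7, 71)],
]
precision, recall, f1 = standard_prf(ground_truth, output)
```

Here k = 1, so precision is 1/4 and recall is 1/2; the second output pattern earns no credit despite differing from a ground truth pattern by a single point.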

Robust Versions of Precision, Recall, and F1 score

Symbolic Musical Similarity and the Score Matrix

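Suppose that in the ground truth there is a pattern P with occurrences $P_1, P_2, \ldots, P_{m_P}$, and in an algorithm's output there is a pattern Q with occurrences $Q_1, Q_2, \ldots, Q_{m_Q}$. Central to evaluating an algorithm is measuring the extent to which Q constitutes the discovery of P. In order to measure this, we need to be able to compute the symbolic musical similarity of $P_i$ and $Q_j$. We can use the simple cardinality score for symbolic musical similarity,

$$s_c(P_i, Q_j) = \frac{|P_i \cap Q_j|}{\max\{|P_i|, |Q_j|\}},$$

or the slightly more involved normalised matching score $s_m(P_i, Q_j)$, after Arzt, Böck, and Widmer (2012). Some examples of cardinality and matching scores between original and mutant versions of the theme from Beethoven's op.2 no.2 mvt.3 are given in Figure 2.

Either of these similarity measures, denoted $s(P_i, Q_j)$, can be recorded in a so-called score matrix,

$$s(P, Q) = \begin{pmatrix} s(P_1, Q_1) & s(P_1, Q_2) & \cdots & s(P_1, Q_{m_Q}) \\ s(P_2, Q_1) & s(P_2, Q_2) & \cdots & s(P_2, Q_{m_Q}) \\ \vdots & \vdots & \ddots & \vdots \\ s(P_{m_P}, Q_1) & s(P_{m_P}, Q_2) & \cdots & s(P_{m_P}, Q_{m_Q}) \end{pmatrix}.$$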

The score matrix shows how all occurrences of a pattern in an algorithm's output compare to all occurrences of a ground truth pattern.
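As a rough illustration, a score matrix can be built as a nested list with one row per ground truth occurrence and one column per output occurrence. The sketch assumes the cardinality score is the size of the intersection of the two point sets divided by the size of the larger set; the normalised matching score is not implemented here, and the function names and example data are hypothetical.

```python
def cardinality_score(p, q):
    """Shared points divided by the size of the larger point set
    (an assumed form of the cardinality score)."""
    p, q = set(p), set(q)
    return len(p & q) / max(len(p), len(q))


def score_matrix(p_occurrences, q_occurrences):
    """Rows index ground truth occurrences, columns index output occurrences."""
    return [[cardinality_score(p, q) for q in q_occurrences]
            for p in p_occurrences]


# Two ground truth occurrences of a four-note pattern.
p_occurrences = [
    [(0, 60), (1, 62), (2, 64), (3, 65)],
    [(8, 60), (9, 62), (10, 64), (11, 65)],
]
# Output occurrences: the first has one wrong pitch, the second is exact.
q_occurrences = [
    [(0, 60), (1, 62), (2, 64), (3, 67)],
    [(8, 60), (9, 62), (10, 64), (11, 65)],
]
matrix = score_matrix(p_occurrences, q_occurrences)
```

The near miss in the first output occurrence scores 0.75 rather than 0, which is the robustness that the standard metrics lack.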

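Summaries of the score matrix will be necessary for evaluating all of an algorithm's output against the whole ground truth for a piece. For instance, we may be interested in whether an algorithm is capable of establishing that a pattern P is repeated at least once during a piece, and less interested in whether the algorithm can retrieve all occurrences of P (exact and inexact). In this case, the maximum entry in the score matrix, denoted $S(P, Q)$, is the appropriate summary. For a piece's ground truth $\Pi = \{P_1, P_2, \ldots, P_{n_P}\}$, and an algorithm's entire output for that piece $\Xi = \{Q_1, Q_2, \ldots, Q_{n_Q}\}$, it is now possible to record the algorithm's capability for establishing that patterns in Π are repeated at least once during the piece, using the so-called establishment matrix,

$$S(\Pi, \Xi) = \begin{pmatrix} S(P_1, Q_1) & S(P_1, Q_2) & \cdots & S(P_1, Q_{n_Q}) \\ S(P_2, Q_1) & S(P_2, Q_2) & \cdots & S(P_2, Q_{n_Q}) \\ \vdots & \vdots & \ddots & \vdots \\ S(P_{n_P}, Q_1) & S(P_{n_P}, Q_2) & \cdots & S(P_{n_P}, Q_{n_Q}) \end{pmatrix}.$$

The establishment precision can then be calculated according to

$$P_{\mathrm{est}} = \frac{1}{n_Q} \sum_{j=1}^{n_Q} \max\{S(P_i, Q_j) \mid i = 1, \ldots, n_P\}.$$

If an algorithm discovers k of the ground-truth patterns exactly, and misses the remaining patterns completely, then the establishment precision is equal to standard precision ($P_{\mathrm{est}} = k/n_Q$). The establishment recall is defined as

$$R_{\mathrm{est}} = \frac{1}{n_P} \sum_{i=1}^{n_P} \max\{S(P_i, Q_j) \mid j = 1, \ldots, n_Q\}.$$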

The establishment F1 score is defined as above, but replacing precision with establishment precision, and recall with establishment recall.
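The establishment metrics can be sketched as follows, reading establishment precision as the mean of the column maxima of the establishment matrix and establishment recall as the mean of its row maxima. The sketch assumes a cardinality-style similarity (shared points over the size of the larger set); the function names and data are illustrative and not the bundled evaluation code.

```python
def cardinality_score(p, q):
    """Shared points divided by the size of the larger point set (assumed)."""
    p, q = set(p), set(q)
    return len(p & q) / max(len(p), len(q))


def establishment_matrix(ground_truth, output):
    """Entry (i, j) is the best score between any occurrence of ground truth
    pattern i and any occurrence of output pattern j."""
    return [[max(cardinality_score(p, q) for p in P for q in Q)
             for Q in output] for P in ground_truth]


def establishment_prf(S):
    """Mean of column maxima (precision), mean of row maxima (recall), F1."""
    n_p, n_q = len(S), len(S[0])
    precision = sum(max(S[i][j] for i in range(n_p)) for j in range(n_q)) / n_q
    recall = sum(max(row) for row in S) / n_p
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Each pattern is a list of occurrences; each occurrence is a point set.
ground_truth = [
    [[(0, 60), (1, 62), (2, 64), (3, 65)], [(8, 60), (9, 62), (10, 64), (11, 65)]],
    [[(4, 72), (5, 71)], [(12, 72), (13, 71)]],
]
output = [
    [[(0, 60), (1, 62), (2, 64), (3, 65)]],          # exact hit on pattern 1
    [[(20, 50), (21, 51)], [(22, 50), (23, 51)]],    # matches nothing
]
S = establishment_matrix(ground_truth, output)
p_est, r_est, f1_est = establishment_prf(S)
```

With one of two ground truth patterns found exactly and one of two output patterns being spurious, both values come out at 0.5, which agrees with the standard precision k/n_Q for k = 1.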

Occurrence Precision, Occurrence Recall, and Occurrence F1 Score

As mentioned above, there is a difference between a pattern discovery algorithm (or listener) being able to establish the existence of a repeated pattern, and being able to retrieve all occurrences. We showed how to measure the extent to which an algorithm is capable of establishing that a pattern P is repeated at least once during a piece. Now we focus on an algorithm's ability to retrieve all occurrences of P (exact and inexact). These metrics will favour an algorithm that is strong at retrieving all occurrences of the patterns it discovers, even if the algorithm fails completely to discover many of the salient patterns in a piece.

The indices I of the establishment matrix with values greater than or equal to some threshold c (default value c = .75) indicate which ground truth patterns an algorithm is considered to have discovered. We will focus on these indices to define a so-called occurrence matrix. Denoted O(Π,Ξ), the occurrence matrix begins as an $n_P \times n_Q$ zero matrix (one row per ground truth pattern, one column per output pattern). Then for each index pair $(i, j) \in I$, we calculate the precision of the score matrix $s(P_i, Q_j)$, and record this scalar as element (i,j) of O(Π,Ξ). The precision of $s(P_i, Q_j)$ indicates the precision with which the algorithm output $Q_j$ retrieved the ground truth item $P_i$. The occurrence precision, denoted Pocc, is then defined as the precision of the occurrence matrix O(Π,Ξ), with the sum taken over nonzero columns. The occurrence recall, denoted Rocc, is defined analogously, but replacing mentions of 'precision' and 'columns' above with 'recall' and 'rows.' The occurrence F1 score can be defined also.
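One possible reading of this procedure is sketched below. It assumes that the "precision" of a matrix is the mean of its column maxima and the "recall" is the mean of its row maxima, that separate occurrence matrices are kept for precision and recall, and that similarity is a cardinality-style score. All names and data are illustrative; the bundled evaluation code is the authority on details.

```python
def card(p, q):
    """Shared points divided by the size of the larger point set (assumed)."""
    p, q = set(p), set(q)
    return len(p & q) / max(len(p), len(q))


def score_matrix(P, Q):
    return [[card(p, q) for q in Q] for p in P]


def col_max_mean(m, nonzero_only=False):
    vals = [max(col) for col in zip(*m)]
    if nonzero_only:
        vals = [v for v in vals if v > 0]
    return sum(vals) / len(vals) if vals else 0.0


def row_max_mean(m, nonzero_only=False):
    vals = [max(row) for row in m]
    if nonzero_only:
        vals = [v for v in vals if v > 0]
    return sum(vals) / len(vals) if vals else 0.0


def occurrence_pr(ground_truth, output, c=0.75):
    """Occurrence precision and recall over pattern pairs whose best
    occurrence-level score reaches the threshold c."""
    o_prec = [[0.0] * len(output) for _ in ground_truth]
    o_rec = [[0.0] * len(output) for _ in ground_truth]
    for i, P in enumerate(ground_truth):
        for j, Q in enumerate(output):
            s = score_matrix(P, Q)
            if max(max(row) for row in s) >= c:  # pair (i, j) is in I
                o_prec[i][j] = col_max_mean(s)
                o_rec[i][j] = row_max_mean(s)
    return col_max_mean(o_prec, True), row_max_mean(o_rec, True)


# One ground truth pattern with three occurrences; the output finds two.
gt = [[[(0, 60), (1, 62)], [(4, 60), (5, 62)], [(8, 60), (9, 62)]]]
out = [[[(0, 60), (1, 62)], [(4, 60), (5, 62)]]]
p_occ, r_occ = occurrence_pr(gt, out)
```

Every returned occurrence is correct, so occurrence precision is 1, while one of three ground truth occurrences is missed, so occurrence recall is 2/3.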

Three-Layer Precision, Three-Layer Recall, and Three-Layer F1 Score

Coverage and compression ratio

This year, alongside the comparison of discovered patterns to annotations, we also evaluate the extent to which pattern discovery contributes to compression of melodies. To this end, we define the measures coverage and compression ratio. Both measures concern the relationship between the number of notes in the pattern discovery output and the set of notes in a musical work W. The notes in a musical work are referred to as x, and are represented by pitch / onset pairs.

To calculate coverage, we are interested in the extent to which a melody is covered by discovered patterns Ξ, and their occurrences (cf. Meredith, Lemström & Wiggins, 2003; Boot, Volk & de Haas, 2016). To this end, we construct a concatenated musical piece C, which is the union of all occurrences of all discovered patterns in that piece. Writing $Q_{j,k}$ for the k-th of the $m_j$ occurrences of the j-th of the $n_Q$ discovered patterns,

$$C(\Xi) = \bigcup_{j=1}^{n_Q} \bigcup_{k=1}^{m_j} Q_{j,k}.$$

Conversely, all notes which are not part of the pattern discovery output are calculated as the set of uncovered notes D:

$$D(W, \Xi) = W \setminus C(\Xi).$$

To determine coverage, we check the percentage of notes in the musical piece that is covered by discovered patterns:

$$\mathrm{coverage}(W, \Xi) = \frac{|C(\Xi)|}{|W|}.$$

Coverage in itself is not very meaningful: i.e., a pattern discovery algorithm which discovers all notes of a piece as part of patterns is not necessarily better than an algorithm which finds only a few patterns, and has therefore lower coverage. Coverage does give an insight into how dense the pattern discovery output is, and may therefore be used in addition to other measures to interpret pattern discovery results.

To determine compression ratio, we check the extent to which a musical work can be expressed more efficiently in terms of discovered patterns and their occurrences. We base this measure on the definition by Meredith, Lemström and Wiggins (2003), but to evaluate lossless compression, we also take the uncovered notes D into account.

For every discovered pattern in the output Ξ, the cardinality of the set of notes describing the first occurrence of that pattern, |Qj,1|, and the number of occurrences of that pattern are summed. After adding the number of uncovered notes |D(W,Ξ)|, this gives us the number of descriptions necessary to express the whole melody, given the pattern discovery output. Weighting this against the length of the full melody gives us the compression ratio of a given pattern discovery output, CR(Ξ).

Note that while coverage is not sensitive to overlapping patterns, compression ratio is: a pattern discovery output with many overlapping patterns will lead to lower compression ratios than pattern discovery output which does not have overlapping patterns.
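A literal reading of the two measures can be sketched as below. Two points are assumptions rather than statements of the official definition: covered notes are restricted to notes that actually belong to the work, and the compression ratio is taken as the number of notes in the work divided by the description length, a direction inferred from the remark that overlapping patterns lower the ratio. The function name and data are illustrative.

```python
def coverage_and_compression(work, output):
    """Return (coverage, compression ratio) for a work and a pattern output.

    `work` is an iterable of (ontime, pitch) pairs; `output` is a list of
    patterns, each a list of occurrences, each a list of (ontime, pitch) pairs.
    """
    work = set(work)
    covered = set()
    for pattern in output:
        for occurrence in pattern:
            covered |= set(occurrence)
    covered &= work                      # assumed: only notes of the work count
    uncovered = work - covered
    coverage = len(covered) / len(work)
    # First-occurrence size plus number of occurrences, per pattern,
    # plus every note not covered by any pattern.
    description_length = sum(len(set(p[0])) + len(p) for p in output)
    description_length += len(uncovered)
    return coverage, len(work) / description_length


# Hypothetical 12-note work; one 4-note pattern occurring twice.
work = [(t, 60) for t in range(12)]
output = [[[(0, 60), (1, 60), (2, 60), (3, 60)],
           [(4, 60), (5, 60), (6, 60), (7, 60)]]]
coverage, compression_ratio = coverage_and_compression(work, output)
```

Eight of twelve notes are covered, and the description needs 4 + 2 + 4 = 10 items, giving a ratio of 1.2. Adding a second pattern that overlaps the first would leave coverage unchanged but lengthen the description.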

Runtime, Fifth Return Time, and First Five Target Proportion

Overall runtime is an important metric. Those wishing to develop pattern discovery algorithms for on-the-fly browsing, however, may find it more relevant to know the time taken to return a smaller number of patterns. (E.g., while the user browses, the algorithm can continue to discover extra patterns.) Fifth return time (FRT) is the time taken for the first five patterns to be output by an algorithm. As these patterns are of little use if none of them are ground truth, we counterbalance FRT with another metric called first five target proportion (FFTP), which is the establishment recall calculation applied to the first five columns only of the establishment matrix S. First five precision (FFP) is the three-layer precision calculation applied to the first five output patterns only.
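FFTP can be sketched directly from an establishment matrix, assuming rows index ground truth patterns, columns index output patterns in the order they were returned, and establishment recall is the mean of the row maxima. The function name and matrix values are illustrative.

```python
def first_five_target_proportion(S):
    """Mean of the row maxima of S restricted to its first five columns."""
    truncated = [row[:5] for row in S]
    return sum(max(row, default=0.0) for row in truncated) / len(truncated)


# Hypothetical establishment matrix: 2 ground truth patterns, 6 output patterns.
# The exact match for the first ground truth pattern arrives sixth, so it is
# ignored by FFTP.
S = [
    [0.2, 0.1, 0.0, 0.0, 0.0, 1.0],
    [0.9, 0.0, 0.0, 0.0, 0.0, 0.0],
]
fftp = first_five_target_proportion(S)
```

This is why the order of output patterns matters for this metric: the same set of patterns returned in a different order can give a different FFTP.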

Friedman Tests for the Pattern Discovery Task

The Friedman test will be used to investigate whether any algorithms rank consistently higher or lower than the others, with regard to metrics for individual pieces.
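For orientation, the Friedman statistic can be computed from a table of per-piece metric values, with one row per piece and one column per algorithm. The sketch below ranks algorithms within each piece (averaging tied ranks) and applies the basic chi-square form of the statistic without the tie correction. It does not compute a p-value, and it is not claimed to match the procedure used for the official results. The function names and scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1        # average of 1-based positions pos..end
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def friedman_statistic(scores):
    """Friedman chi-square statistic; scores[piece][algorithm]."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)


# Hypothetical F1 scores for 3 pieces (rows) and 3 algorithms (columns).
scores = [
    [0.1, 0.5, 0.9],
    [0.2, 0.4, 0.8],
    [0.3, 0.6, 0.7],
]
statistic = friedman_statistic(scores)
```

With perfectly consistent rankings across the three pieces the statistic reaches its maximum for this table size; under the null hypothesis it is usually compared with a chi-square distribution on k - 1 degrees of freedom.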

Available Code

Entering an existing MIREX task, where results have been improving for up to 10 years, can be a daunting prospect. The pattern discovery task, on the other hand, is quite new, and so is a great opportunity for Master's and PhD students to make their mark in MIR. To this end, it should be noted that the following code is freely available, and that students/researchers are very welcome to define pattern discovery algorithms by altering/extending this code, or to use it as a point of comparison with their own algorithms. Please feel free to ask questions, either via this wiki, or by email to authors of the relevant papers.

If you would like to participate in the audio version but are missing an F0 estimator, then you could use the MELODIA plug-in as described by Salamon and Gómez (2012).

Please add links to more implementations here.

...

Questions and Comments

Please Can You Give an Overview of the Development Database's Folder Structure?

Users are encouraged to run their algorithms on either the text file representations of pieces (contained in 'lisp' folders) and their constituent patterns, or the csv file representations (beware rounding errors). The columns represent ontime (measured from zero in crotchet beats), MIDI note number, morphetic pitch number, duration (measured in crotchet beats), and staff number (integers from zero for the top staff). Users are discouraged from running their algorithms on the MIDI file representations. The MIDI files were created and included in the distribution for the purposes of mistake checking, but do not necessarily begin in the correct bar position and contain an extra quiet note at the end to avoid clipping.

If you are writing your own code for iterating over the ground truth patterns, the annotation folders to include for the polyphonic version of the task are 'bruhn', 'barlowAndMorgensternRevised', 'sectionalRepetitions', 'schoenberg', and 'tomCollins'; for the monophonic task they are 'bruhn', 'barlowAndMorgenstern', 'barlowAndMorgensternRevised', 'sectionalRepetitions', 'schoenberg', and 'tomCollins'. Please note, a faithful barlowAndMorgenstern folder is included in the polyphonic ground truth for the sake of comparison with the revised folder, but it should/will not be iterated over for the evaluation. This is because the barlowAndMorgenstern originals contain some monophonic patterns that ought to be polyphonic (e.g., because a figure in one voice never occurs independently of a simultaneous figure in another voice) and some patterns have erroneous lengths (e.g., a theme is curtailed at five bars because it fits neatly on the page, but in reality the repetition extends for one or two more bars).

Occurrences of patterns consist of (ontime, MIDI note number) pairs. For example, see bachBWV889Fg -> polyphonic -> repeatedPatterns -> bruhn -> A -> occurrences. Inexact occurrences of a pattern are handled as follows: the prototypical version of a pattern is defined at the top level, e.g., bachBWV889Fg -> polyphonic -> repeatedPatterns -> bruhn -> A -> lisp. This definition may be shifted in time towards the beginning of the piece, but is in the correct bar position. The prototypical version of a pattern is always defined as 'occ1' in the occurrences folder. All of the definitions in the occurrences folder correspond exactly to (ontime, MIDI note number) pairs from the piece (i.e., none of these are shifted in time).

We expect structural segmentation algorithms to be adaptable to pattern discovery, so would really welcome segmentation researchers to submit to the pattern discovery task as well. The two tasks are different as follows: structural segmentation results in a list of labelled time intervals that cover an entire piece of music, such as

The output of a pattern discovery algorithm will not necessarily cover an entire piece. A four-bar theme beginning in bar 1 might be the only output of a pattern discovery algorithm, even if the piece is much longer and contains other material.

Whereas the output of a structural segmentation algorithm is non-overlapping, the output of a pattern discovery algorithm might be overlapping or even nested (hierarchical). For instance, the four-bar theme mentioned above might be output, as well as a sectional repetition that lasts from bars 1-8.

In a typical pattern matching task, more or less exact instances of a given query are retrieved from some larger dataset, and ranked by an appropriate measure of relevance to the original query (e.g., Barton, Cambouropoulos, Iliopoulos, & Lipták, 2012). The setup of pattern discovery is fundamentally different: there are no queries given to begin with, just single pieces of music and the requirement to discover repeating patterns within each piece.

The melodic similarity task fits the pattern matching paradigm, and so is also different to pattern discovery. In the melodic similarity task, algorithms are given a melodic query, and retrieve a supposedly relevant melody from the database. The similarity of the query and the algorithm's match is assessed by human listeners.

Why Not Just Use Optical Music Recognition to Detect Sectional Repetitions?

One could use optical music recognition instead, although what we are trying to understand and model is a listener's awareness of thematic material and sectional repetitions, which often exists without access to staff notation. It would also be interesting to apply pattern discovery to music for which there is no staff notation.

This Is Intra-Opus Discovery, But What About Inter-Opus Discovery?

Inter-opus discovery, the discovery of patterns that recur across multiple pieces of music (Conklin & Anagnostopoulou, 2001), is an interesting problem, and one that we would be interested to see cast as a MIREX task in future. Currently, lack of an appropriate ground truth is an issue here.

There Are Some Issues With the MIDI Files, Please Can You Clarify?

The MIDI files were created and are provided for the purposes of sonifying and checking the symbolic data, and are not intended to be used themselves for input to the pattern discovery algorithms (please see the folders called 'csv' and/or 'lisp' instead). They are not ideal for input for the following reasons: (1) correct pitch spelling is lost, whereas this is maintained by presenting MIDI note number and morphetic pitch number side by side in the 'csv' and 'lisp' folders; (2) each MIDI file is zeroed in the sense that it begins more or less immediately, even if the pattern occurrence it represents occurs halfway through a piece; (3) each MIDI file also contains one extra, very quiet, low note to avoid clipping in the sound file.

What is the Difference Between a Motif, a Theme, and a Repeated Section?

Dictionary definitions of motif, theme, and repeated section are given below. To make the definitions more concrete, I refer to the top system of Figure 2. In terms of ontime-pitch pairs, the motif here consists of {(2, C#5), (2.25, A4), (2.5, E5), (2.75, C#5), (3, A5)}, beginning on beat 3 of bar 2 and ending on beat 1 of bar 3. This is repeated an octave lower one bar later, and occurs with a slightly different intervallic configuration at the very beginning. The theme, according to Barlow and Morgenstern (1948), lasts from the upbeat of bar 1, to beat 2 of bar 4. Bars 5-8 are not shown in Figure 2, but there is a repeated section consisting of bars 1-8. So one might infer from this example that typically a motif lasts less than one bar, a theme 4-8 bars, and a repeated section 8+ bars.

According to Drabkin (2001a), a "motif may be of any size, and is most commonly regarded as the shortest subdivision of a theme or phrase that still maintains its identity as an idea." A theme is the "musical material on which part or all of a work is based, usually having a recognizable melody and sometimes perceivable as a complete musical expression in itself" (Drabkin, 2001b). A repeated section is the "restatement of a portion of a musical composition of any length from a single bar to a whole section, or occasionally the whole piece. Since the Classical period, repeated passages have not usually been written out; instead they are enclosed within the signs ||: and :||" (Tilmouth, 2001).

Time and Hardware Limits

Try to make sure that your algorithm's runtime for the entire Development Database is 24 hours or less on a standard desktop computer. If it is, there should be no need to place further limits on analysis times for the Test Database.

Geraint A. Wiggins. Computer-representation of music in the research environment. In T. Crawford and L. Gibson (Eds), Modern methods for musicology: prospects, proposals and realities, pp. 7-22. Ashgate, Oxford, UK, 2007.