Reading Group 1 – Introduction to music and predictive processing

Thanks to both Nikki and Lauren for the warm welcome and for helping set up these sessions!

The first EMPRes session of 2017 provided an introduction to predictive processing in music perception and cognition. We began with Rohrmeier & Koelsch’s (2012) detailed review of existing work on predictive information processing in music cognition, including converging theoretical, behavioral, computational, and brain-imaging approaches. Then we looked at a commentary by Michael and Wolf (2014) on the impact on music research of a specific framework of predictive processing, namely Hierarchical Predictive Processing (HPP) as put forward by Schaefer. These papers were a bit denser than the ‘introduction’ meeting was intended to be, so I’ll lay out a summary of them here, attempting to explain some of the computational bits as well. Please feel free to comment if you have any questions, or especially if you have any answers or better explanations! A review of our discussion points can be found in this subsequent blog post.

Rohrmeier & Koelsch laid out the predictable qualities of music and how our brains may exploit those qualities (e.g. through perceptual Gestalts and structural knowledge), then surveyed behavioral evidence of prediction, followed by various computational models and neural evidence for predictive processes.

Predictable information within the music

Predictability and combinatoriality require a discrete, finite set of elements

Prediction in music occurs on both lower-level processes (predicting the next note) and higher-level processes (predicting a development section in a sonata)

Four sources of prediction were identified, which may work together or be in ‘mutual conflict’.

Prediction in music is messy: constant parallel predictions are made not only about single melodic lines, but also about complex harmonies, overall key structure, polyphonic and polyrhythmic sound streams, and structure at the phrase, movement, or whole-composition level. It becomes even messier when adding in texture/timbre changes, or when considering the more polyrhythmic, polymetric, or densely polyphonic character of many non-Western musics.

Behavioral Findings

Prediction effects are found in behavioral responses, such as the identification of unexpected musical events: unexpected tones, intervals, and chords

Computational Models

Why use computational models? “Predictive computational models provide a link between theoretical, behavioural, and neural accounts of music prediction”

Hand-crafted models, such as Piston’s table of usual root progressions, show the general harmonic (root-progression) expectancies based on tendencies in Western music. Your theory courses teach you to recognize these tendencies explicitly; persons enculturated in Western music learn and recognize them implicitly.

Probabilistic models

N-gram models chop long sequences up into shorter bits, and use the statistics of those bits to predict the likelihood of the next unit.

Example of a 3-gram model of a sequence of pitches {A C E G C E A C E G}: the sequence is chopped into overlapping bits of three pitches, and the number of times each bit occurs is counted [ACE: 2 (occurs two times); CEG: 2; EGC: 1; GCE: 1; CEA: 1; EAC: 1]

We can use this model to predict the next note from the previous two: after hearing ‘CE’, the next note will be G 2/3 of the time and A 1/3 of the time (for this example, it’s easy to just count every trigram beginning [CE_], 3 instances, and see that 2 of those are CE+G and 1 is CE+A)
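The counting above can be sketched in a few lines of Python. This is a hypothetical minimal illustration of the n-gram idea, not any of the actual models from the paper; the function names are my own.

```python
from collections import Counter

def ngram_counts(sequence, n):
    """Count every contiguous window of n elements in the sequence."""
    return Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))

pitches = ["A", "C", "E", "G", "C", "E", "A", "C", "E", "G"]
trigrams = ngram_counts(pitches, 3)
# counts: ACE: 2, CEG: 2, EGC: 1, GCE: 1, CEA: 1, EAC: 1

def predict_next(sequence, context, n=3):
    """Estimate P(next note | context) from the n-gram counts."""
    counts = ngram_counts(sequence, n)
    context = tuple(context)
    matches = {ng[-1]: c for ng, c in counts.items() if ng[:-1] == context}
    total = sum(matches.values())
    return {note: c / total for note, c in matches.items()}

print(predict_next(pitches, ["C", "E"]))  # P(G|CE) = 2/3, P(A|CE) = 1/3
```

The same two functions give a 2-gram (Markov) model by passing `n=2`.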

Multiple Viewpoint Idea

Using information from multiple different features (viewpoints) to aid the prediction of a target feature

In music, this means using “duration or metrical position to improve the prediction of pitch class”. For example, an eighth-note anacrusis at the start of a piece will likely be scale degree 5 leading to 1 (Sol to Do). This particular prediction, though, necessitates previous exposure to a larger corpus of music, since it would be improbable to infer statistical regularities of the current piece from only two notes.
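One simple way to picture combining viewpoints is to merge the predictive distributions each viewpoint produces. This is a hypothetical sketch using a plain average; real systems such as IDyOM use more sophisticated (e.g. entropy-weighted) blending, and the example distributions here are invented.

```python
def combine_viewpoints(distributions):
    """Average the predictive distributions from several viewpoints.
    (A plain mean keeps the sketch simple; IDyOM weights more cleverly.)"""
    notes = set().union(*distributions)  # every note any viewpoint predicts
    return {n: sum(d.get(n, 0.0) for d in distributions) / len(distributions)
            for n in notes}

from_pitch = {"C": 0.5, "G": 0.5}    # made-up prediction from pitch context
from_meter = {"C": 0.75, "E": 0.25}  # made-up prediction from metrical position
combined = combine_viewpoints([from_pitch, from_meter])
# combined == {"C": 0.625, "G": 0.25, "E": 0.125}
```

Notes favored by both viewpoints (here C) end up with the highest combined probability.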

Short-term models and Long-term models

Short-term: knowledge from the current listening; “specific repetitive and frequent patterns that are particular to the current piece and picked up during implicit online-learning”

Long-term: knowledge from an entire corpus; “long-term acquired (melodic) patterns”

IDyOM: Information Dynamics of Music

A combination of all of the above, with “optimized ways of smoothing of viewpoint combinations and blending of short and long term predictions” through active, online learning

A Markov transition matrix is equivalent to a 2-gram model. So our earlier set of pitches {A C E G C E A C E G} would be split into pairs [AC: 2; CE: 3; EG: 2; GC: 1; EA: 1], and the probabilities are modeled between single events.
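Normalizing each row of those pair counts gives the transition matrix itself. A minimal sketch, using the same made-up pitch sequence as before:

```python
from collections import Counter, defaultdict

pitches = ["A", "C", "E", "G", "C", "E", "A", "C", "E", "G"]

# Count each ordered pair of adjacent pitches (the 2-gram counts).
bigrams = Counter(zip(pitches, pitches[1:]))
# AC: 2, CE: 3, EG: 2, GC: 1, EA: 1

# Arrange counts into rows and normalize to get P(next | current).
transitions = defaultdict(dict)
for (current, nxt), count in bigrams.items():
    transitions[current][nxt] = count

for row in transitions.values():
    total = sum(row.values())
    for nxt in row:
        row[nxt] /= total
# e.g. after E: G with probability 2/3, A with probability 1/3
```

Each row of `transitions` is a probability distribution over the next pitch given the current one.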

A Hidden Markov Model (HMM) generates probabilities not directly from the observed events; instead, hidden underlying states generate probability distributions over the observations. The probability of each subsequent state depends only on the previous state (not on future states), reflecting the temporal nature of musical processing
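To make the hidden-state idea concrete, here is one step of the standard forward algorithm for a toy two-state HMM. The states ("tonic"/"dominant") and all probabilities are invented for illustration; this is not a model from the paper.

```python
states = ["tonic", "dominant"]

# Made-up transition probabilities P(next state | current state)...
transition = {"tonic":    {"tonic": 0.7, "dominant": 0.3},
              "dominant": {"tonic": 0.6, "dominant": 0.4}}
# ...and made-up emission probabilities P(observed note | hidden state).
emission = {"tonic":    {"C": 0.8, "G": 0.2},
            "dominant": {"C": 0.3, "G": 0.7}}

def forward_step(prior, observation):
    """One forward-algorithm step: update beliefs about the hidden state
    from the previous beliefs and the newly observed note."""
    unnorm = {s: emission[s][observation] *
                 sum(prior[p] * transition[p][s] for p in states)
              for s in states}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

belief = {"tonic": 0.5, "dominant": 0.5}  # start maximally uncertain
belief = forward_step(belief, "G")        # hearing G favors "dominant"
```

Note that the update uses only the previous belief and the current observation, which is exactly the Markov property described above.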

A Dynamic Bayesian Network (DBN) is an extension of an HMM in the same way that the multiple-viewpoint model was an extension of n-gram models. DBNs analyze “dependencies and redundancies between different musical structures, such as chord, duration, or mode” to generate predictions

Connectionist Networks: neural networks inspired by how biological neurons work, combining probabilistic models like those listed above with simplified models of neural connections, firing, and growth dynamics

MUSACT: an early musical neural net pre-programmed with Western features (12 chromatic pitches, 24 diatonic major and minor chords, 24 keys); it does very well at predicting features of tone perception and expectation

Self-Organizing Map: unsupervised learning of the features of tonal music (unlike MUSACT, its tonal structure was not pre-programmed but learned from exposure); it matched some experimental data on chord relations, key relations, and tone relations

Neural Evidence

Increased brain responses occur to incongruent (unexpected) stimuli within a sequence; in music this may be hearing a normal chord progression followed by an unusually placed chord

ERAN: early right anterior negativity

MMN: mismatch negativity

It’s not clear from the neuroscientific evidence whether these responses are the result of local or hierarchical violations

Brain areas involved

Ventral premotor cortex and BA44 (Brodmann’s area 44; its right-hemisphere portion is the analogue of Broca’s area, the left-hemisphere language region — perhaps both perform hierarchical processing)

Michael & Wolf laid out a perhaps more accessible overview of areas where a particular predictive processing framework, hierarchical predictive processing (HPP), might make a novel contribution to the study of music cognition, and of human social cognition more generally.

HPP is a predictive framework that describes the brain as a combination of lower-level and higher-level models arranged, of course, in a hierarchy. Each higher-level model generates and sends predictions downstream to the model immediately below it, while each lower-level model sends sensory input upstream to the model immediately above it. The goal is to minimize prediction error between the higher-level predictions and the lower-level sensory representations. Every time a higher-level prediction meets a lower-level sensory input that *does not match* the prediction, a prediction error is sent upward. The higher-level model then takes that prediction error and updates its prediction, repeating until the incoming signal and the downward prediction are sufficiently matched. Higher-level models are thought to represent changes occurring over longer time scales, such as more abstract, structural, schematic, or style-specific aspects of music. Lower-level models represent change in sensory input over shorter time scales, as in the immediate local events of the next note or rhythm.
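The update-until-matched loop can be caricatured in a few lines. This is an entirely hypothetical toy: a single scalar “level” rather than a full hierarchy, with the learning rate and tolerance chosen arbitrarily.

```python
def minimize_prediction_error(prediction, sensory_input,
                              learning_rate=0.5, tolerance=1e-3):
    """Nudge a scalar prediction toward a scalar sensory input until the
    prediction error falls below tolerance; return the result and step count."""
    steps = 0
    error = sensory_input - prediction
    while abs(error) > tolerance:
        prediction += learning_rate * error  # update the model from the error signal
        error = sensory_input - prediction   # recompute the mismatch
        steps += 1
    return prediction, steps

# A badly wrong prediction (0.0) converges toward the input (1.0),
# with the error halving on every pass.
pred, steps = minimize_prediction_error(prediction=0.0, sensory_input=1.0)
```

In the full HPP picture this loop runs at every level simultaneously, with each level’s settled state serving as input to the level above.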

Musical Preferences

HPP seems of little use in the understanding of musical preferences. It can’t be assumed that the preferred balance between ‘optimally predictable’ and ‘a bit of uncertainty’ is the same across individuals. The authors dub this search for the ‘sweet spot of predictability’ the “Goldilocks fallacy”, since even the right amount of predictability in a novice trumpet player’s crude sounds is likely still unpleasant.

Embodying Music

HPP might help in furthering our understanding of embodied music cognition by providing a clear link between perception and action, where perception simply is reflected by “a graded hierarchy of models functioning according to the same basic principles” separated only by time scales, and action is “in a sense equivalent to updating of higher-level cognitive models… through active inference”

Joint Action

In joint music making, agents engage in recursive higher-order modeling: “agents are not only modeling the music, but they are modeling the other agent’s actions and intentions, as well as the other agent’s model of her actions and intentions”. If joint music making is construed by the brain as a coordination problem, then HPP may be the perfect model to step in and try to minimize prediction (coordination) error in these complex, recursive social interactions