A new member will join the band over the next couple months, which poses a problem: I’ve run out of limbs. Controlling three machines with two hand-held devices is hard enough, and trying to extend this framework of direct control would require either rapidly alternating control of four machines between two controllers or building foot controllers. Foot controllers are out of the question because I think they’re fairly ridiculous. For an organist or drummer, they’re fine, but for a quasi one-man band, simultaneous foot and hand control looks a little clownish.

The other approach, alternating control of the four machines with my existing controllers, works well enough, but it leaves me stuck in loop-based music: play something on one instrument, loop it, switch to another and play something, loop that, now switch back, etc. I’m eager to overcome this approach. Live interaction and simultaneous constrained improvisation among all participants, whether human or computer controlled, is what interests me. (Why it interests me more than loop-based music is a topic for another post.) Musical human-machine interaction has been a research topic for at least two decades now but has seen mixed results. Research in the field often produces interesting demonstrations but little in the way of music you might actually enjoy. Among the more notable efforts are Robert Rowe’s Cypher system, which uses a rules-based approach, and François Pachet’s Continuator project, which favors a data-driven, machine-learning strategy. I favor the latter approach, augmented by the ability to build hooks into the generative system so that I can steer its output.

Machine learning researchers often describe their models as “black boxes” because their inner operations are opaque to observation. You train your algorithm to predict the next note in a sequence or to distinguish dogs from cats in images, and then the model runs, but it’s hard to observe or influence the model once it has been trained. I want to build hooks or windows (or telephone lines or something; pick your metaphor) into my black boxes so that I can nudge the model in one direction or another. In the case of a generative algorithm for the trap set, I might want to influence the overall loudness, the average number of onsets per bar, how syncopated the rhythm is, and so on. These qualities of a rhythm are the features I alluded to in the title of the post, and their concatenation constitutes a feature vector. (Using “vector” in this way annoys physicists and other math-types because it doesn’t connote a direction, but that’s how the term is used in the machine learning literature.) Incorporating variable features into the generative model could be done in a variety of ways, but the question that preoccupies me now is which features to include. For the trap set model, these are some features that I think are worth considering:

Tempo
Number of onsets per unit of time
Degree of syncopation
Distribution of onsets across different drums and cymbals, e.g., percentage of onsets for the snare, for the ride cymbal, etc.
Inter-onset-interval histogram (see page 3 in this paper for an explanation)
Beat-induction histogram
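To make a few of these candidates concrete, here is a minimal sketch of my own (not production code) that computes onset density, the distribution of onsets across instruments, and an inter-onset-interval histogram from a one-bar pattern on a 16th-note grid. The grid representation and feature names are illustrative assumptions.

```python
# Toy feature extraction from a one-bar drum pattern on a 16th-note grid.
# Each instrument is a list of 16 booleans (True = onset on that step).

def extract_features(pattern):
    """pattern: dict mapping instrument name -> list of 16 booleans."""
    total_onsets = sum(sum(hits) for hits in pattern.values())

    # Onsets per bar: a proxy for onsets per unit of time at a fixed tempo.
    density = total_onsets

    # Distribution of onsets across drums and cymbals.
    distribution = {
        name: sum(hits) / total_onsets for name, hits in pattern.items()
    }

    # Inter-onset-interval histogram, pooled over all instruments:
    # count the gaps (in 16th notes) between successive onsets.
    onset_steps = sorted(
        step for hits in pattern.values()
        for step, hit in enumerate(hits) if hit
    )
    ioi_histogram = {}
    for a, b in zip(onset_steps, onset_steps[1:]):
        ioi = b - a
        ioi_histogram[ioi] = ioi_histogram.get(ioi, 0) + 1

    return {"density": density,
            "distribution": distribution,
            "ioi_histogram": ioi_histogram}

# A basic rock beat: kick on beats 1 and 3, snare on 2 and 4,
# eighth notes on the hi-hat.
rock = {
    "kick":  [i in (0, 8) for i in range(16)],
    "snare": [i in (4, 12) for i in range(16)],
    "hihat": [i % 2 == 0 for i in range(16)],
}
features = extract_features(rock)
```

Concatenating the numbers this returns would give one (crude) feature vector for the beat; tempo and syncopation would need their own measurements.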

That’s what I’ve got so far. The list is strongly influenced by the categories of music cognition and machine learning, which I think has created some blind spots. To be more precise, the vantage point of this feature vector is so distant that it can’t easily observe the subtle differences that distinguish one rhythm from another to human listeners. For practical purposes, we can evaluate the success of a feature vector by measuring how well it reflects human intuition about the similarity or difference of rhythms and beats. If the selection of features is good, two beats that sound very different to humans will have very different values in their feature vectors, and beats that sound similar will have similar values.
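That evaluation criterion can be sketched as a distance between feature vectors: similar beats should land close together, dissimilar beats far apart. The three features below (onset density, snare share, hi-hat share) are illustrative stand-ins of my own, not a claim about which features are right.

```python
import math

def feature_vector(pattern):
    """pattern: dict of instrument -> list of 16 booleans; returns a tuple."""
    total = sum(sum(hits) for hits in pattern.values())
    return (
        total / 16.0,                   # onset density on the 16th-note grid
        sum(pattern["snare"]) / total,  # snare's share of all onsets
        sum(pattern["hihat"]) / total,  # hi-hat's share of all onsets
    )

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

rock = {
    "kick":  [i in (0, 8) for i in range(16)],
    "snare": [i in (4, 12) for i in range(16)],
    "hihat": [i % 2 == 0 for i in range(16)],
}
# The same beat with a busier hi-hat: 16th notes instead of eighths.
busy = {
    "kick":  rock["kick"],
    "snare": rock["snare"],
    "hihat": [True] * 16,
}
d = distance(feature_vector(rock), feature_vector(busy))
```

A good feature set would put this pair closer together than, say, a rock beat and a samba pattern; a bad one might not.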

This is the challenge of selecting features: The rhythmic patterns that are used in real music constitute a small subset of all possible rhythms (even assuming constraints such as a 16th note metrical grid); just think of how many different kinds of popular music have snare hits or snare-like hits on beats two and four, a kick drum hit on beat one, and predominantly use eighth notes in the hi-hat. Beats in everything from rock to R&B to hip hop to a lot of country satisfy that criterion, even though these beats occupy a very narrow slice of the space of all possible beats. What distinguishes beats in these genres is more subtle, and I’m not sure that the features I listed above would pick up on those subtleties.

To return to the optical metaphor, I want to find features that zoom in on the space occupied by existing music, even at the expense of poorly observing beats that are theoretically possible but unlikely to be used in practice. Such features would look more like this:

Does a prominent snare hit syncopate at the 16th note level?
Does the hi-hat play mostly eighth notes?
Do kick onsets fall on the beat?
Are kick onsets mostly separated by an inter-onset-interval of three sixteenth notes?
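These narrower features reduce to boolean tests on the same grid representation. Here is a hypothetical sketch of three of them; the 16-step grid (beats on steps 0, 4, 8, 12) and the majority thresholds are my own assumptions, not settled choices.

```python
# "Zoomed-in" boolean features on a 16-step grid (quarter-note beats
# fall on steps 0, 4, 8, and 12).

def hihat_mostly_eighths(pattern):
    """True if most hi-hat onsets fall on eighth-note positions (even steps)."""
    hits = [i for i, h in enumerate(pattern["hihat"]) if h]
    if not hits:
        return False
    on_eighths = sum(1 for i in hits if i % 2 == 0)
    return on_eighths / len(hits) > 0.5

def kick_on_the_beat(pattern):
    """True if every kick onset lands on a quarter-note beat."""
    hits = [i for i, h in enumerate(pattern["kick"]) if h]
    return bool(hits) and all(i % 4 == 0 for i in hits)

def kick_ioi_three_sixteenths(pattern):
    """True if most gaps between successive kick onsets are three 16ths."""
    hits = [i for i, h in enumerate(pattern["kick"]) if h]
    iois = [b - a for a, b in zip(hits, hits[1:])]
    return bool(iois) and sum(1 for x in iois if x == 3) / len(iois) > 0.5

# The basic rock beat again: kick on 1 and 3, snare on 2 and 4,
# eighth-note hi-hat.
rock = {
    "kick":  [i in (0, 8) for i in range(16)],
    "snare": [i in (4, 12) for i in range(16)],
    "hihat": [i % 2 == 0 for i in range(16)],
}
```

For the rock beat, the first two tests come back true and the third false, which is exactly the kind of fine-grained discrimination the distant features above would miss.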

I’d like to hear what the domain experts (that’s you, drummers!) have to say about feature vectors for drum beats. So, drummers and percussionists, what are the qualities that distinguish one rhythm or one drum beat from another?

Comments (1)

I heard your interview on NPR today; I have heard of your work before then, but had never had time to sample it – what I heard on NPR was pretty amazing. I was kinda bummed that you didn’t mention the Arduino, but I guess the piece was about you and the music, not so much about the technology. Anyhow, regarding machine learning, you might want to check out David L. Heiserman’s writings on it – I did an email interview with him a while back, and I list his books in the article here: