In June I travelled to NIME2018 at Virginia Tech to present some of the work from the RITMO centre and EPEC project at the University of Oslo. This year, our NIME presentations were focussed on “standstill performance”, where participants have to stand as still as possible to create sound. In previous years, our group had created standstill performances using motion capture in the lab, but our new work was on ways to do this at live events, and even in installations, using the Bela single-board computer.

In an installation context, we worked on a hanging guitar system that was installed at Ultima 2017 in Oslo. The idea here was that each guitar was an independent “standstill” instrument with a Bela, acoustic actuator, and infrared distance sensor. Visitors could trigger a breath-inspired sonic performance by entering the beam of the distance sensor and not moving. In our installation, we used six guitars. Each guitar had quite a simple sonic interaction, but together the result could be complex and rewarding in the exhibition space. This work was presented in a paper that included the system design as well as some discussion of how the audience interacted with our artwork.
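To give a feel for the "enter the beam and don't move" interaction, here is a minimal Python sketch of a stillness detector. It is not the code that ran on the guitars (those used Pure Data on the Bela); the class name, the thresholds, and the window length are all assumptions chosen for illustration.

```python
from collections import deque


class StillnessDetector:
    """Hypothetical helper: reports when a visitor is in the beam and not moving."""

    def __init__(self, presence_cm=150.0, tolerance_cm=3.0, window=20):
        # All three values are illustrative assumptions, not measured settings.
        self.presence_cm = presence_cm    # readings below this count as "someone is there"
        self.tolerance_cm = tolerance_cm  # allowed wobble while still counting as "still"
        self.readings = deque(maxlen=window)

    def update(self, distance_cm):
        """Add one distance reading; return True if the visitor is present and still."""
        self.readings.append(distance_cm)
        if len(self.readings) < self.readings.maxlen:
            return False  # not enough history yet
        present = max(self.readings) < self.presence_cm
        still = max(self.readings) - min(self.readings) <= self.tolerance_cm
        return present and still
```

In a sketch like this, the sound would start only once `update` has returned `True`, and stop again when the visitor moves or steps away.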

For mobile performances, we also used multiple Bela boards, but rather than IR sensors, we used Myo muscle sensor armbands to detect stillness and to give performers a way to control sounds without moving, by subtly tensing the muscles in their arms. Again, this was designed for an ensemble context: multiple performers standing still together to create new kinds of performances. Creating this work involved some technical challenges. Using Myo sensors with Linux on the Bela hardware was solved with an existing Myo C++ driver, and connecting them to Pd on a Mac or Linux laptop was solved with a new Python application that I put together. Our Myo standstill instrument used the 8 muscle sensors to control 8 sine tone oscillators. The angle and orientation of the performer’s arm were used to control the frequencies of these tones, and a distortion effect. Again, this was a simple instrument with complexity arising from its use in an ensemble context.
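As a rough illustration of the "8 sensors, 8 oscillators" idea, here is a small Python sketch. The real instrument ran in Pd on the Bela, so none of this is the actual patch. The function names, the assumption that raw EMG readings lie between -128 and 127, and the way `base_freq` and `spread` stand in for arm orientation are all hypothetical.

```python
import math

NUM_SENSORS = 8      # one oscillator per muscle sensor
SAMPLE_RATE = 44100  # assumed audio sample rate


def emg_to_amplitudes(emg, max_value=128.0):
    """Rectify raw EMG readings and scale them to amplitudes in [0, 1]."""
    return [min(abs(v) / max_value, 1.0) for v in emg]


def oscillator_frequencies(base_freq, spread):
    """Hypothetical mapping: two orientation-derived values set the 8 frequencies."""
    return [base_freq * (1.0 + spread * i) for i in range(NUM_SENSORS)]


def render(emg, base_freq, spread, num_samples):
    """Mix 8 sine tones, each one scaled by the tension on its own sensor."""
    amps = emg_to_amplitudes(emg)
    freqs = oscillator_frequencies(base_freq, spread)
    out = []
    for n in range(num_samples):
        t = n / SAMPLE_RATE
        sample = sum(a * math.sin(2 * math.pi * f * t) for a, f in zip(amps, freqs))
        out.append(sample / NUM_SENSORS)  # keep the mix within [-1, 1]
    return out
```

With fully relaxed muscles (all readings at zero) this produces silence, which is the point of a standstill instrument: sound only appears with muscle tension.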

Our Myo and Bela work was presented in a poster, and also in a performance of our standstill work “Stillness Under Tension”. In this piece, a standstill ensemble assumes three poses over 9 minutes, with micro-movements and muscle tension allowing the performers to explore small sonic worlds through the Myo sensors. Alexander Refsum Jensenius and I had performed the piece previously in Oslo, and we were joined at NIME by a wonderful impromptu ensemble of colleagues from around the world: Anna Xambó, Federico Visi, and Fabio Morreale. With a standstill quintet, we were really able to spread out on the huge stage of the Fife Theatre at Virginia Tech’s Moss Arts Centre.

So overall we had a very successful NIME presenting our lack-of-motion performance work with the Bela platform! As usual, there were many amazing and inspiring projects on display and it was great to catch up with many colleagues and friends from all over the world. It was particularly good to see the conference seriously engage with gender imbalance through a packed workshop on Women in NIME, and several papers and discussions. This can seem like a difficult problem to address, but it evidently is not going to solve itself. Hopefully by introducing some positive actions to invite and keep women contributing to NIME we can make a difference in the gender balance in coming years, and also find pathways towards improving diversity more broadly.

Here are the citations for the two contributions we had in the proceedings:

@inproceedings{nime18-Gonzalez,
author = {Gonzalez Sanchez, Victor Evaristo and Martin, Charles Patrick and Agata Zelechowska and Bjerkestrand, Kari Anne Vadstensvik and Victoria Johnson and Jensenius, Alexander Refsum },
title = {Bela-Based Augmented Acoustic Guitars for Sonic Microinteraction},
pages = {324--327},
booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
editor = {Luke Dahl and Douglas Bowman and Thomas Martin},
year = {2018},
month = {June},
publisher = {Virginia Tech},
address = {Blacksburg, Virginia, USA },
URL = {http://www.nime.org/proceedings/2018/nime2018_paper0068.pdf},
abstract = {This article describes the design and construction of a collection of digitally-controlled augmented acoustic guitars, and the use of these guitars in the installation \textit{Sverm-Resonans}. The installation was built around the idea of exploring `inverse' sonic microinteraction, that is, controlling sounds by the micromotion observed when attempting to stand still. It consisted of six acoustic guitars, each equipped with a Bela embedded computer for sound processing (in Pure Data), an infrared distance sensor to detect the presence of users, and an actuator attached to the guitar body to produce sound. With an attached battery pack, the result was a set of completely autonomous instruments that were easy to hang in a gallery space. The installation encouraged explorations on the boundary between the tactile and the kinesthetic, the body and the mind, and between motion and sound. The use of guitars, albeit with an untraditional `performance' technique, made the experience both familiar and unfamiliar at the same time. Many users reported heightened sensations of stillness, sound, and vibration, and that the `inverse' control of the instrument was both challenging and pleasant.}
}
@inproceedings{nime18-Martin,
author = {Martin, Charles Patrick and Jensenius, Alexander Refsum and Jim Torresen},
title = {Composing an Ensemble Standstill Work for Myo and Bela},
pages = {196--197},
booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
editor = {Luke Dahl and Douglas Bowman and Thomas Martin},
year = {2018},
month = {June},
publisher = {Virginia Tech},
address = {Blacksburg, Virginia, USA },
URL = {http://www.nime.org/proceedings/2018/nime2018_paper0041.pdf},
abstract = {This paper describes the process of developing a standstill performance work using the Myo gesture control armband and the Bela embedded computing platform. The combination of Myo and Bela allows a portable and extensible version of the standstill performance concept while introducing muscle tension as an additional control parameter. We describe the technical details of our setup and introduce Myo-to-Bela and Myo-to-OSC software bridges that assist with prototyping compositions using the Myo controller.}
}

I recently had the chance to present a paper about my "Neural iPad Ensemble" at the Audio Mostly conference in London. The paper discusses how machine learning can help to model and create free-improvised music on new interfaces, where the rules of music theory may not fit. I described the Recurrent Neural Network (RNN) design that I used to produce an AI iPad ensemble that responds to a "lead" human performer. In the demonstration session, I set up the iPads and RNN and had lots of fun jamming with the conference attendees.

Many of those at the conference were very curious about making music with AI systems and the practical implications of using deep learning in concert. Some had appealing, enigmatic, and sometimes confused, assumptions about musical AI. For example, I was often asked whether I could recognise personalities of the performers in the output. This isn't possible in my RNN, because, like in all machine learning systems, the output reflects only the training data, and not the context that we humans see around it.

A limitation of my system is that it learns musical interactions as sequences of high-level gestures. These measurements are made once every second on the raw touchscreen data and describe it as, for example, "fast taps" or "small swirls". In the performance system, a synthesiser replays chunks of performances that correspond to the desired gestures. This simplification makes it easier to design the neural network, but means that the RNN isn't trained on the low-level data that we might consider to contain the nuanced "personality" of a performance.

Another easily confused point of my system is that the human performer isn't really the "leader". In the interest of making the most of a limited data set, every performer in every example is permuted as the "leader" and each of the three "ensemble" performers. In fact, few (if any) performances in the corpus had a leader at all. In my system, the human performer is really just one of four equal performers, so it's unreasonable to expect that you could "control" the RNN performers by performing in a certain way, or doing something unexpected.
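The "everyone gets a turn as leader" trick can be sketched in a few lines of Python. This is only an illustration of the idea and not the actual Gesture-RNN preprocessing; the function name `leader_permutations` and the list-of-sequences data layout are assumptions.

```python
from itertools import permutations


def leader_permutations(quartet):
    """Return (lead, ensemble) pairs for every ordering of the performers.

    quartet is a list of gesture sequences, one per performer. The first
    performer in each ordering is treated as the "leader" and the rest as
    the "ensemble".
    """
    examples = []
    for order in permutations(range(len(quartet))):
        lead = quartet[order[0]]
        ensemble = tuple(quartet[i] for i in order[1:])
        examples.append((lead, ensemble))
    return examples
```

For a quartet this turns one recording into 24 training examples, with each performer appearing as the "leader" six times, which is why the network has no reason to treat the lead part as special.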

A practical issue when performing with the neural iPad ensemble is starting and stopping the music! In a human ensemble, the performers use cues like counting in, an audible breath, or a look, to bring in the ensemble and to signal the end of an improvisation. With my RNN ensemble, cues have no effect; sometimes the group starts playing, sometimes it doesn't. Just when you think the performance might be over, the group starts up again without you! Thinking about the training data, this behaviour makes sense. Each training example is 120 seconds long -- too short to capture the long-term curve of a performance, including starting and ending. The examples do contain plenty of instances where one performer plays alone, or three play while one lays out, so there's precedent in the data for completely ignoring the "leader" starting and stopping!

As with many AI systems, these limitations reveal that humans are so good at combining different data sources and contexts that we forget that we're doing it. A truly "natural" iPad ensemble might have to be trained on much more than just high-level gestures in order to keep up with "obvious" musical cues and reproduce musical personalities.

While this system has limitations, it is still fun to play with and useful as a reflection on "creative" AI. Of course, there are many ways to improve it. One of the most important would be to train the RNN on the lowest level data available; in this case, the raw touch event data from the iPad screens. A promising way to approach this is with Mixture Density RNNs (more on that later). I'm looking forward to more chances to perform with and talk about Musical AI soon!

We presented MicroJam this week at the Boost Technology and Equality in Music Conference at Sentralen, Oslo. The conference arranged a Tech Showcase session in Hvelvet, Sentralen's old bank vault, with developers of music apps, synthesisers, robots and education software. I was joined by two master's students from the UiO Department of Informatics, Benedikte and Henrik, who helped to demonstrate MicroJam to the many participants - thanks guys!

It was wonderful to meet so many teachers, musicians, and developers from around the region and the world. I'm looking forward to getting the app out to them as soon as we can!

We recently hosted a music technology event at the Department of Informatics to gather together researchers and students from the University of Oslo to see performances and demonstrations of current research.

Luckily, Maria Finkelmeier happened to be swinging through the area from Boston, so we were able to present some new percussion and touch-screen works together, and hear Maria's new live version of #improvAday. Christina Hopgood and Maria joined me for three iPad ensemble pieces, including a new experiment performing live ensemble music with MicroJam --- exactly the opposite of how that app was designed to be used! It was also great to have Kristian Nymoen demonstrate with the Xsens full-body motion tracking system and have other demos from the Departments of Musicology and Informatics.

The event featured the following projects:

Ensemble Metatone: new music for touch-screen and percussion

#improvADay with Maria Finkelmeier (USA)

MuMyo: muscle sensing music from IMV

PhaseRings for iPad Ensemble and Ensemble Director Agent

Xsens Motion Music: making music with full-body motion tracking

Prototype music interfaces from the DESIGN group.

Great to have so many engaged researchers visit IFI and to perform in Escape, the student pub in IFI's basement! Thanks to the cybernetics student society for their help and hope to perform down in Escape again soon!

Since about 2011, I've been performing music with various kinds of touch-screen devices in percussion ensembles, new music groups, improvisation workshops, installations, as well as my dedicated iPad group, Ensemble Metatone. Most of these events were recorded; detailed touch and gestural information was collected including classifications of each ensemble member's gesture every second during each performance. Since moving to Oslo, however, I don't have an iPad band! This leads to the question: Given all this performance data, can I make an artificial touch-screen ensemble using deep neural networks?

I've collected a lot of data from four years of touch-screen ensemble concerts (left). Now, I've used it to train an artificial neural network (right) to interact in a similar way!

As it turns out, the answer is yes! To make this neural touch-screen ensemble, I've used a large collection of collaborative touch-screen performance data to model ensemble interactions and to synthesise ensemble performances. These performances were free-improvisations, gestural explorations between tightly interacting performers of synthesised sounds, samples, and field recordings. In this context, the music theory of melody and harmony doesn't help much to understand what is going on. A data-driven strategy for musical representation is required. Machine learning (ML) is an ideal approach, as ML algorithms can learn from example, rather than from theory.

In this article, I'll explain the parts of this system but first, here's a demonstration of what it looks like:

Live Interaction with a Neural Network

The rough idea of the neural touch-screen ensemble is this: one human improvises music on a touch-screen app and an ensemble of computer-generated 'musicians' reacts with their own improvisation. This system works as follows:

Next, a recurrent neural network, Gesture-RNN, takes this lead gesture and predicts how an ensemble might respond in terms of their own gestures (this is described in more detail below).

The touch-synthesiser then searches the corpus of performance data for sequences of touches that match these gestures and sends them to the other iPads which also run PhaseRings.

Finally, the ensemble iPads 'perform' the sound (and visuals) from these sequences of touches, as if a human were tapping on their screens.
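The corpus lookup step can be sketched roughly as follows. This is a simplified illustration and not the real touch-synthesiser: the function names, the `(time, x, y)` touch format, and the choice to pick a matching chunk at random are all assumptions.

```python
import random
from collections import defaultdict


def build_index(corpus):
    """Group recorded chunks of touches by their gesture label.

    corpus is an iterable of (gesture_label, touches) pairs, where touches
    is a list of (time, x, y) tuples (an assumed format).
    """
    index = defaultdict(list)
    for gesture, touches in corpus:
        index[gesture].append(touches)
    return index


def choose_chunk(index, gesture, rng=random):
    """Pick one recorded chunk that matches the predicted gesture."""
    candidates = index.get(gesture)
    if not candidates:
        return []  # nothing recorded for this gesture: send no touches
    return rng.choice(candidates)
```

Each chunk returned by `choose_chunk` would then be sent to one of the ensemble iPads for playback.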

One cool thing about this system is that the 'fake' ensemble members sound quite authentic, as their touches are taken directly from human-recorded touch data. The totality of these components is a system for co-creative interaction between neural network and human performer. The neural net responds to the human gestures, and in turn, the live performer responds to the sound of the generated ensemble iPads. This system is currently used for in-lab demonstrations and we're hoping to show it off at a few events soon!

Learning Gestural Interaction

The most complex part of this system is the Gesture-RNN at the centre. This artificial neural network is trained on hundreds of thousands of excerpts from ensemble performances to predict appropriate gestural responses for the ensemble.

In improvising touch-screen ensembles, the musicians often work as gestural explorers. Patterns of interaction with the instruments and between the musicians are the most important aspect of the performances. Touch-screen improvisations have been previously categorised in terms of nine simple touch-gestures, and a large corpus of collaborative touch-screen performances is freely available. Classified performances consist of sequences of gesture labels (numbers between 0 and 8) for each player in the group - similar to the sequences of characters that are often used as training data in text-generating neural nets.

Like other creative neural nets, such as folkRNN and charRNN, Gesture-RNN is a recurrent neural network (RNN) with long short-term memory (LSTM) cells. These LSTM cells preserve information inside the network, acting as a kind of memory, and help the network to predict structure in sequences of multiple time-steps. The difference between character-level RNNs and this system is that Gesture-RNN is trained to predict how an ensemble would react to a single performer, not what that lead performer might do next.

Training data for Gesture-RNN consists of time series of gestural classification for each member of the group at one second intervals. The network is designed to predict the ensemble response to a single 'lead' sequence of gestures. So in the case of a quartet, one player is taken to be the leader, and the network is trained to predict the reaction of the other three players.

In this case, the input for the network is the lead player's current gesture, and also the previous gestures of the other ensemble members. The output of the network is the ensemble members' predicted reaction. This output is then fed back into the network at the next time-step.
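The input/output arrangement and the feedback loop can be sketched in plain Python. This is not the Gesture-RNN code itself: `predict` is a hypothetical stand-in for the trained network, and `copy_lead_with_lag` is a toy rule included only so the example runs.

```python
def training_pairs(lead, ensemble):
    """Build (input, target) pairs from one performance.

    lead is a list of gesture labels; ensemble is a list of 3-tuples of
    labels aligned with lead. The input at time t is the lead's current
    gesture plus the ensemble's gestures at t - 1; the target is the
    ensemble's gestures at t.
    """
    pairs = []
    for t in range(1, len(lead)):
        pairs.append(((lead[t], ensemble[t - 1]), ensemble[t]))
    return pairs


def generate_ensemble(lead_gestures, predict, initial=(0, 0, 0)):
    """Run the feedback loop: each prediction becomes part of the next input."""
    previous = tuple(initial)
    output = []
    for lead in lead_gestures:
        current = tuple(predict(lead, previous))
        output.append(current)
        previous = current  # feed the output back in at the next time-step
    return output


def copy_lead_with_lag(lead, previous):
    """Toy stand-in for the network: each player echoes the one before them."""
    return (lead, previous[0], previous[1])


# The lead's gestures ripple through the three generated players.
print(generate_ensemble([1, 2, 3], copy_lead_with_lag))
# prints [(1, 0, 0), (2, 1, 0), (3, 2, 1)]
```

Swapping `copy_lead_with_lag` for a function that samples from a trained model would give the same loop structure with learned responses.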

Here's an example output from Gesture-RNN. In these plots, a real lead performance (in red) was used as the input and the ensemble performers (other colours) were generated by the neural net. Each level on the y-axis in these plots represents a different musical gesture performed on the touch-screens.

Recurrent Neural Networks and Creativity

Gesture-RNN uses a similar neural network architecture to other creative machine learning systems, such as folkRNN, Magenta's musical RNNs, and charRNN. It has recently become apparent that recurrent neural networks, which can be equipped with "memory" cells to learn long sequences of temporally-related information, can be unreasonably effective. Creative neural network systems are beginning to be a bit of a party trick, like the amusingly scary NN-generated Christmas carol. In the case of high-level ensemble interactions, we don't have tools (like music theory) to help us understand and compose them, so a data-driven approach using RNNs could be much more useful!

The neural touch-screen ensemble is a unique way for a human performer to interact with a creative neural network. We're using this system in the EPEC (Engineering Predictability with Embodied Cognition) project at the University of Oslo to evaluate how a predictive RNN can be engaged in co-creation with a human performer. In our current application, the synthesised touch-performances are played back through separate iPads which embody the "fake" ensemble members. In future, this system could also be integrated within a single touch-screen app, and it might allow individual users to experience a kind of collaborative music-making. It might also be possible to condition Gesture-RNN to produce certain styles of responses that model particular users or performance situations.

The code for this system is available online: Gesture-RNN, Metatone Classifier, PhaseRings. While there are lots of creative applications of recurrent neural networks out there, there aren't too many examples of interactive and collaborative RNN systems. It would be great to see more creative and interactive systems using these and other neural net designs!