Composing Music with LSTM Recurrent Networks - Blues Improvisation

Note: This page was created by
Schmidhuber's
former postdoc
Doug Eck
(now assistant professor
at Univ. Montreal), on the LSTM long time lag project.

Here are some multimedia files related to the LSTM music composition
project. The files are in MP3 (high-resolution 128kbps and low-resolution 32kbps)
and MIDI.
A helpful reference document for understanding
the compositions is IDSIA Technical Report
IDSIA-07-02, A First Look at Music Composition using LSTM
Recurrent Neural Networks
[postscript or
pdf].

These compositions were made by an LSTM recurrent neural network. After
learning by example from the training set (next-step prediction),
the network generated these musical examples. When the network
composed, it worked with no guidance whatsoever. That is, the network
had to reproduce both the chords and the melodies. For the simple
training set used here, the chord structure was fixed. This
meant that the network did not need to generalize chords. Even in
this simple case, however, a feed-forward network cannot learn these
chords, and a traditional recurrent neural network (RNN) very likely
cannot either.
See the technical report for more.
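The setup described above can be sketched as a closed loop: during training the network sees the notes sounding at each time step and predicts the notes at the next step; at composition time its own thresholded output is fed back as the next input, so it must carry both chords and melody itself. A minimal sketch, with a stub standing in for the trained LSTM and an assumed note-vector size (the actual representation and network are in the technical report):

```python
import numpy as np

# Hypothetical vector size: chord notes plus melody notes in one step.
N_NOTES = 25

def encode_step(active_notes, n_notes=N_NOTES):
    """One time step as a binary vector: 1 where a note is sounding."""
    v = np.zeros(n_notes)
    v[list(active_notes)] = 1.0
    return v

def compose(model, seed, n_steps):
    """Closed-loop generation with no guidance whatsoever: the model's
    thresholded output at step t becomes its input at step t + 1."""
    step = seed
    out = [step]
    for _ in range(n_steps):
        probs = model(step)                  # per-note probabilities
        step = (probs > 0.5).astype(float)   # threshold to on/off notes
        out.append(step)
    return np.stack(out)
```

Here `model` would be the trained LSTM's forward pass; any function mapping a note vector to per-note probabilities fits the interface.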

This work marks, we believe, a first step towards a neural network
music composer that can learn and use global musical structure.
Previous attempts at this task showed that RNNs can capture local
structure in music but fail to capture the long-term structure that
defines a musical form. These compositions show that LSTM can capture
and reproduce long-term musical structure and can generate new
(sometimes somewhat pleasing) examples of a form. In short, though the
chord structure was fixed and the training set somewhat dull,
these results are promising enough to warrant further research.

A note on terminology: We are using the terms composition and
improvisation loosely. It is probably
more accurate to describe the behavior of LSTM as improvisation,
because it is inventing new melodies on top of a set form;
however, the end goal is the creation of new melodies and new forms,
hence the somewhat optimistic use of the term composition.

Background Information

Chords: LSTM was trained using a form of blues common in jazz bebop
improvisation. The form is 12 bars long and follows a fixed chord sequence.
You can listen to these bebop jazz blues chords
[MP3 hi-res (493Kb),
MP3 lo-res (123Kb),
MIDI (2Kb)].
Note that in these examples the letter name of the chord (the tonic)
is used to form a melody line.

Notes: The possible chord notes were limited to the octave below middle C.
The possible melody notes were limited to the octave above (and including)
middle C.
The improvisations were based around the pentatonic scale.
You can listen to the pentatonic scale
[MP3 hi-res (94Kb),
MP3 lo-res (24Kb),
MIDI (1Kb)].
You can also listen to random quarter notes chosen from
the pentatonic scale
[MP3 hi-res (5.5Mb),
MP3 lo-res (1.4Mb),
MIDI (24Kb)]. In this example the random notes
are played along with the chords described above.
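The random-note baseline just described is easy to reproduce. A minimal sketch, assuming the C major pentatonic (C D E G A) in the melody octave starting at middle C (MIDI note 60) — the page does not spell out which pentatonic pitch set was used, so treat the scale here as an assumption:

```python
import random

# Assumed pitch set: C major pentatonic in the octave at and above
# middle C (MIDI 60); the page does not state the exact scale.
PENTATONIC = [60, 62, 64, 67, 69]  # C D E G A

def random_pentatonic_bar(rng, beats=4):
    """One 4/4 bar of random quarter notes from the pentatonic scale."""
    return [rng.choice(PENTATONIC) for _ in range(beats)]

rng = random.Random(0)
twelve_bars = [random_pentatonic_bar(rng) for _ in range(12)]
```

Played against the fixed 12-bar chords, this yields something like the "random quarter notes" example linked above.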

Training Set
[MP3 hi-res (5.5Mb),
MP3 lo-res (1.4Mb),
MIDI (21Kb)]:
The training sets
were composed by choosing
randomly from melodic segments that fit the blues form.
Those segments were worked out by me on the piano. Be warned:
this is a boring training set! The goal of these experiments
was to see whether LSTM could learn a fixed chord structure while
in parallel learning elements of a varying melody structure.
It was easier to stick with a basic melody. Note that every
12-bar segment is unique; however, because only one or two bars
are changed at a time, you may have to listen for a while to hear
differences. We are currently working on a much more interesting
set of training melodies and chords.
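The construction described above — a basic melody with only one or two bars varied per 12-bar chorus — can be sketched as follows. The segment names are placeholders, since the actual melodic segments were worked out at the piano:

```python
import random

# Placeholder melodic segments: a base melody for each of the 12 bars,
# plus hand-written alternatives (names are purely illustrative).
BASE = [f"bar{i}_base" for i in range(12)]
VARIANTS = {i: [f"bar{i}_alt1", f"bar{i}_alt2"] for i in range(12)}

def make_chorus(rng):
    """One training chorus: the base melody with 1-2 bars swapped out."""
    chorus = list(BASE)
    for i in rng.sample(range(12), rng.choice([1, 2])):
        chorus[i] = rng.choice(VARIANTS[i])
    return chorus

training_set = [make_chorus(random.Random(seed)) for seed in range(16)]
```

Because at most two bars differ from the base melody, each chorus is distinct yet the set as a whole sounds repetitive — hence the warning above that this is a boring training set.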

Some Compositions

These examples are quite long. Rather than listening to
them in their entirety, it's probably better to use your audio player to skip to
the playing times mentioned in the descriptions.

Composition 1
[MP3 hi-res (5.6Mb),
MP3 lo-res (1.4Mb),
MIDI (22Kb)]:
This first example shows initial failure followed by
stabilization. At first, the network does
not correctly reproduce the chords, resulting in a bad
melody line as well. However, at around 25 seconds
the chords fall into place, and the melody follows.
This isn't a particularly good example of composition, but it is
an effective example of how the chord structure constrains the
melody line: once the chords are reproduced correctly by
the network, the melody follows. Compare for example the first
25 seconds to the 12 bars starting at around 2:04 and around 4:54.

Composition 2
[MP3 hi-res (5.6Mb),
MP3 lo-res (1.4Mb),
MIDI (22Kb)]:
This second example shows how the network can drift from fairly close reproduction
of training set melodies to freer improvisation. Listen
from the beginning and notice how at around 0:14 the
network begins reproducing the melody with some
variation. At around 0:50 the network drifts somewhat
from the melody and recovers at around 1:00. Then at 1:13 it
begins to alternate between constrained and freer improvisation.
Notice also the passages starting at 3:50.

Composition 3
[MP3 hi-res (5.6Mb),
MP3 lo-res (1.4Mb),
MIDI (22Kb)]:
This third example is similar to Composition 2. It contains some
nice sections (presuming, of course, you think any of this is nice).
Starting at 0:28 and continuing through 1:12 is freer improvisation. At 1:12
is an example of the network repeating a motif not found in the training set.

Composition 4
[MP3 hi-res (5.6Mb),
MP3 lo-res (1.4Mb),
MIDI (22Kb)]:
This fourth example comes from a network trained longer than the networks
used for examples 1 through 3. Here, thanks to the longer training time, the network does
a better job of reproducing the training set. It still improvises freely but
with less departure from the target melodies.