A Kinect-Based Instrument: Polyphonic Theremin, No April Fool’s Joke?

It’s hard to assemble an April Fool’s joke involving technology these days, because actual inventions keep proving stranger than fiction. When Google created a prank involving gestures for controlling email, it was only a matter of time before someone whipped up a prototype that actually did the job.

The Moog Music company, therefore, may be asking for trouble. Their highly entertaining polyphonic Theremin is spot-on parody, down to the “Stairway to Heaven” solo. And part of the geekier joke for Theremin players is the knowledge that the technology behind this instrument makes what they’re describing safely impossible.

But what’s impossible with conventional Theremin technology could be very possible with computer vision – even the goofy gestures in Moog’s faked video. Artist, inventor, and musician Tim Thompson has been at the bleeding edge of new music instruments for some time. It wouldn’t be an overstatement to say Tim was using multi-touch before multi-touch was cool. When I shared a booth with him at Maker Faire a few years ago, he had with him FingerWorks hardware, a now-discontinued tactile, multi-touch pad, and was using it to play visuals live. In a pattern too often repeated in technology, the independent niche tool was snapped up by a larger player. In this case, that larger player was Apple – and, apparently backed at least in part by FingerWorks’ know-how and patents, Apple made history.

In a new project filmed by the superb Modulate This!, Tim works instead with touch-less control, using the Kinect to track multiple areas of expression. (Tim is using the free environment Cinder, which joins tools like Processing and openFrameworks as well-liked options for Kinect hackers. In this case, the Kinect support itself comes from libfreenect, the open-source drivers for Mac, Windows, and Linux.)

What he’s built, in other words, is a true polyphonic Theremin – able to play more than one line and employ more than a monophonic gesture, all without touch. The joke may be on Moog.

Part of the value of trying extreme ideas is to demonstrate not only advantages, but disadvantages. And I still find some reason to express healthy skepticism. The similarity to the Theremin isn’t accidental in the Kinect experiments. These projects also inherit the Theremin’s weaknesses. A lack of tactile feedback means it’s difficult to orient pitch or achieve precise control without the resistance a physical object provides. Reliance on gestural control also opens the door to accidental input and calibration challenges. (The Kinect fares better than the Theremin, but it’s not immune to similar problems, if for different reasons.) Taking a page from the Theremin, Tim’s physical frame makes a big difference – while it doesn’t provide tactile resistance, it at least creates a point of reference in physical space.

The Kinect also adds a new problem the Theremin didn’t face: latency. All of this means if you still like knobs, keys, strings, or even physical multi-touch (which can in certain variations provide excellent tactile feedback via deformable meshes), you needn’t worry. Your revolution may not be Kinect-ified.

But if there were one perfect design for musical instruments, we’d all play just one instrument. Instead, the history of instrument design across the world is an evolutionary explosion of different tradeoffs, different playing styles, and resulting different musical idioms. Any joke can become an instrument, just as any instrument – to someone – can seem like a joke. And that means if you’re looking for something new, you might just celebrate every day as if it’s April Fool’s Day. No kidding.

Updated: Tim offers some comments. He says what other musicians experimenting with Kinect have told me – that while it has certain restrictions as a solo instrumental controller, there’s tremendous potential for multi-user scenarios like installations. And that is itself significant (back to the question of choosing tradeoffs in order to accomplish goals). Tim writes:

Folks whose goal is to replace conventional instruments are sure to be disappointed, as you describe. You could add more detail on other goals:

Goal: using it for art installations at events like Burning Man, creating new and “casual” instruments which are unusual yet inviting and easy to play. Matt Bell ran an experiment related to that goal: http://www.youtube.com/watch?v=mQiyKFDvzkU

Goal: creating controllers which have a much larger visual appeal to an audience, who deserve performers more interesting to look at than someone hunched over buttons and sliders. That’s the reason why musicians like Mark Mosher are interested, in the same way he’s interested in the Percussa Audiocubes, for their visual appeal in performances.

Goal: provide an instrument that dancers can use in performances. I’ll be exploring this in the fall, with a choreographer friend.

People are finding very interesting ways to use the Kinect. Lack of tactile feedback is indeed a deterrent for me, but if the Kinect were used in combination with a physical instrument... I think that's the ticket. But, in what way, I can't say.

Very funny Moog video. I lol'd when she began playing Stairway to Heaven. Have you heard about the Korg Monotribe? Not much to tell, really, as next to nothing is known for certain. But I don't believe it's a joke.

Peter Kirn

Right, actually a good point – and a bit like the approach Sony took, which was to use a physical control *and* vision sensing. I expect that could be the next wave of hacks, and once you have that many dimensions, it's really up to the ingenuity of the artist to determine how it works.

I would like to hear what people think of the airpiano in comparison with the concepts presented in this post and other non tactile instruments. Please visit http://www.airpiano.de to learn more about the airpiano.

I've been using gestural interfaces for some time. I initially developed the Airstick based on IR sensing, but have tried a lot of other DIY methods, like tracking LED-lit fingers with Wiimotes and many other computer-vision techniques.

More recently I've been experimenting with the Kinect camera, because it represents a big step for computer vision. CV algorithms on top of traditional cameras are just too noisy and difficult to calibrate in a live set. The Kinect opens up a whole new world for what you can do with gesture-based interfaces.

Artistically it really fulfills me to be on stage and have an expressive controller that I feel translates the notion of a real instrument, in a sense that knobs, faders or mouse clicks just can't (virtual or real).

I would argue that the limitations on expressiveness with a free gestural controller depend largely on the type of articulation and how it translates into sound. Rich musical experiences (for performer and audience) are possible with the right ingredients. Over the years I've developed many different approaches to the Airstick, relying mainly on Max/MSP programming, but the real journey was to reach a point where less is more, and that's when you reach the necessary sensibility.

> once you have that many dimensions,
> it’s really up to the ingenuity of the artist

What might not be obvious (and wasn't mentioned in the interview or description) is that there are actually 4 dimensions easily available with the Kinect – x, y, depth, and size (of your hand, which can be expanded even further by using your forearm, elbow, etc.). Rotational angle could be considered another dimension as well. Even before you get into movement-gestures (which use time as a dimension), there are more than enough dimensions to have all sorts of fun with.
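To make those four dimensions concrete, here is a minimal Python sketch that pulls x, y, depth, and size out of one rectangular area of a depth frame. It is not Tim's code (the post doesn't show his Cinder implementation). It assumes the frame arrives as a 2D grid of depth values with 0 meaning "no reading", and that anything inside a near/far depth band counts as the hand; `BlobFeatures`, `blob_features`, and the band parameters are hypothetical names. Rotation and movement over time are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BlobFeatures:
    x: float      # centroid column, normalized 0..1 across the area
    y: float      # centroid row, normalized 0..1 across the area
    depth: float  # mean depth of the blob's pixels (same units as the frame)
    size: float   # fraction of the area's pixels covered by the blob


def blob_features(frame: List[List[int]],
                  region: Tuple[int, int, int, int],
                  near: int, far: int) -> Optional[BlobFeatures]:
    """Treat every pixel in region (x0, y0, x1, y1, exclusive ends) whose
    depth lies in [near, far) as part of the hand; None if nothing is there."""
    x0, y0, x1, y1 = region
    sum_x = sum_y = total_depth = count = 0
    for row in range(y0, y1):
        for col in range(x0, x1):
            d = frame[row][col]
            # 0 ("no reading") is excluded as long as near > 0
            if near <= d < far:
                sum_x += col
                sum_y += row
                total_depth += d
                count += 1
    if count == 0:
        return None
    width, height = x1 - x0, y1 - y0
    return BlobFeatures(
        x=(sum_x / count - x0) / max(width - 1, 1),
        y=(sum_y / count - y0) / max(height - 1, 1),
        depth=total_depth / count,
        size=count / (width * height),
    )
```

Each of the four values can then be mapped to a different musical parameter, which is where the "many dimensions from one hand" point comes from.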

Peter Kirn

Right, and actually, that's more important than polyphony. If you think about it, the Theremin is fundamentally a coupling of two one-dimensional controls. (I say two one-dimensional rather than two-dimensional, as they're on axes that don't interact.) With Kinect, you are effectively using four axes in one spatial environment. (Get that about right, Tim?)

I would say even more: the OpenNI framework (http://www.openni.org/), which is the base for some of the Kinect stuff we've been seeing, allows skeleton tracking through NITE. This means you can know exactly where your hands (and other body parts) are in 3D space and thus have all kinds of different interaction areas mapped around you, which you could play with different body nodes. You could even bang your head and measure tilt! Yeah! 🙂

@peter – Yes, the theremin's two one-dimensional inputs are quite different from a two-dimensional input. Having multiple (2, 3, 4) dimensions accessible from a single hand is much easier to understand and manipulate. The comparison of a Theremin to these new Kinect things is more useful as a marketing technique than as something that tells you what it's like to play.

@ivan – In my own experimentation and deployments, I'm actually purposely avoiding skeletal tracking, whether through NITE or through the Microsoft SDK, whenever it becomes available. My reasons include: 1) it introduces extra registration steps and/or issues, 2) my frame (and whatever other new physical objects are introduced) will likely get in the way of skeletal recognition/tracking, 3) I'm finding more than enough dimensions of fun using only the depth data, 4) there's lots of room for fun with non-body things – like pendulums and other moving objects, 5) most body-activated instruments act pretty much the same no matter which body part touches the instrument, so mapping areas (rather than body parts) to sounds or actions seems more natural, and 6) I like to avoid doing what everyone else is doing. 🙂 There's no doubt that people will do incredible things with skeletal tracking; I can't wait to see them.
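Tim's point 5, mapping areas rather than body parts, can be sketched roughly as follows. This is a hypothetical illustration, not his implementation: it assumes a depth frame given as a 2D grid of values (0 = no reading) and treats an area as "touched" when enough of its pixels fall inside a depth band, so a hand, an elbow, or a swinging pendulum all trigger it the same way. `Zone`, `active_zones`, `note_events`, and the `min_coverage` threshold are illustrative names, and the events are plain tuples rather than real MIDI output.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Zone:
    name: str
    region: Tuple[int, int, int, int]  # x0, y0, x1, y1 (exclusive) in depth-image pixels
    note: int                          # MIDI note number this area plays


def active_zones(frame: List[List[int]], zones: List[Zone],
                 near: int, far: int, min_coverage: float = 0.05) -> Set[str]:
    """Names of zones where at least min_coverage of the pixels lie in [near, far)."""
    active: Set[str] = set()
    for zone in zones:
        x0, y0, x1, y1 = zone.region
        hits = sum(1 for row in range(y0, y1) for col in range(x0, x1)
                   if near <= frame[row][col] < far)
        if hits / ((x1 - x0) * (y1 - y0)) >= min_coverage:
            active.add(zone.name)
    return active


def note_events(previous: Set[str], current: Set[str],
                zones: List[Zone]) -> List[Tuple[str, int]]:
    """Emit ("note_on", note) / ("note_off", note) only when a zone changes state."""
    notes = {zone.name: zone.note for zone in zones}
    events = [("note_on", notes[name]) for name in sorted(current - previous)]
    events += [("note_off", notes[name]) for name in sorted(previous - current)]
    return events
```

Calling `note_events` once per depth frame, passing the previous frame's set, means a hand held inside an area produces a single note-on instead of one per frame. Nothing in the code cares what kind of object occupies the area, which is exactly the property Tim describes.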

This one does a much better job of showing what it's like to play it, including the use of depth as aftertouch.
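Depth as aftertouch can be as simple as a linear map from distance to a pressure value from 0 to 127. The sketch below assumes depth in millimetres and a hypothetical near/far band for the playing area. It emits a MIDI channel-pressure message (status byte 0xD0 plus channel, then one data byte); that message type is a guess, since the video doesn't say whether channel or polyphonic aftertouch is used. `depth_to_aftertouch` and `channel_pressure_message` are illustrative names.

```python
def depth_to_aftertouch(depth_mm: float, near_mm: float, far_mm: float) -> int:
    """Linear map: far plane -> 0, near plane -> 127, clamped outside the band."""
    if far_mm <= near_mm:
        raise ValueError("far_mm must be greater than near_mm")
    t = (far_mm - depth_mm) / (far_mm - near_mm)  # 0.0 at far plane, 1.0 at near plane
    t = min(max(t, 0.0), 1.0)
    return round(t * 127)


def channel_pressure_message(value: int, channel: int = 0) -> bytes:
    """Raw MIDI channel pressure: status 0xD0 | channel, then a 7-bit pressure value."""
    return bytes([0xD0 | (channel & 0x0F), value & 0x7F])
```

Pushing a hand further into the area then behaves like pressing harder on a key. Polyphonic aftertouch (status 0xA0) would also need a note number, which fits naturally if each tracked area already owns a note.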

Tony

Dear Tim,

Were you planning to release some sort of tutorial for absolute beginners with Cinder and OpenCV, or even source code showing how to map the Kinect data to OSC via Cinder and OpenCV? OSC-to-MIDI mapping via KeyKit should not be that difficult to figure out, I hope. Apart from the multi-touch part, I assume this would also work the same way on Windows XP? So for the four areas in your frame you would track just a single point, with its x, y, depth, and size, as you wrote above.

Anyway, very cool demos, I hope more will follow.
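For the Kinect-to-OSC step asked about above, the wire format itself is small enough to sketch by hand. This is a minimal Python illustration, not Tim's Cinder/OpenCV code: it encodes an OSC 1.0 message whose arguments are all float32 values and sends it over UDP. The address pattern `/kinect/area/<n>`, the argument order (x, y, depth, size), and the helper names are hypothetical; only the encoding rules (NUL-terminated strings padded to 4 bytes, a `,`-prefixed type-tag string, big-endian floats) come from the OSC 1.0 format.

```python
import socket
import struct


def osc_string(text: str) -> bytes:
    """OSC string: ASCII bytes plus at least one NUL, padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address: str, *values: float) -> bytes:
    """OSC 1.0 message whose arguments are all big-endian float32 ('f' type tags)."""
    type_tags = "," + "f" * len(values)
    args = b"".join(struct.pack(">f", v) for v in values)
    return osc_string(address) + osc_string(type_tags) + args


def send_area(sock: socket.socket, host: str, port: int, area: int,
              x: float, y: float, depth: float, size: float) -> None:
    """Send one tracked point for one area as four floats (hypothetical address scheme)."""
    packet = osc_message(f"/kinect/area/{area}", x, y, depth, size)
    sock.sendto(packet, (host, port))
```

Usage would look like `sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` followed by `send_area(sock, "127.0.0.1", 9000, 1, 0.25, 0.5, 0.8, 0.1)` once per frame per area, where port 9000 is just a placeholder for whatever the receiving program (KeyKit or otherwise) listens on. The OSC-to-MIDI side is not shown.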

CDM is an online magazine for creative technology, from music and DJing to motion and more.