Kinect Hacking and Art Round Table: Why it Matters, What You Need to Know

When Microsoft gobbled up vision technology and announced they were channeling their own research into a product for their game console, artists, researchers, and hackers lamented. It seemed the tech might be destined only for a handful of mainstream game titles. Hours after the product launch, however, and one open source bounty later, it was clear the opposite was happening: Kinect was opening to new possibilities. Some of the world’s leading visual experimenters, many of them regulars in this site’s stories, were quickly pulling in data and reimagining what the device could do. And that’s just in the first days: given its sophistication, the real potential lies ahead.

I pulled together a number of the artist-hackers to get their thoughts:

Phil Torrone talks to us from Adafruit Industries, who put up the bounty for the project and contributed to the EFF to protect the rights of hackers

Theo Watson, OpenFrameworks co-originator, is one of the original hackers and has built Mac support

Memo Akten, OpenFrameworks contributor, is building expressive and artistic applications of the tech

Kyle McDonald, artist and visual researcher, is working with massive clouds of point data, building on his previous work in 3D scanning

Dan Shiffman, Processing guru and NYU faculty, is working on tools to make this more accessible to Processing and Java coders.

Adafruit on the Competition to Hack Kinect

Phil Torrone of Adafruit explains what went on behind the scenes as Adafruit Industries offered a bounty to hack Kinect.

Why hack the Kinect in the first place?

McDonald: It’s essential that we develop drivers and libraries for Kinect, because we have to decide what new technology means to us.

Kinect has taken a technology out of academic labs and defense agencies, and put it in our living room. Now we need to decide where we want to point the camera.

Shiffman: A cheap (relatively speaking) “3D” camera is killer technology for the interaction design / computational art community. This kind of tech has been around, but it’s either been too hard to find or prohibitively expensive. I think that you will see a ton of creative uses (in digital art, exhibition design, assistive tech, etc.) that you wouldn’t find if it was only used for console gaming.

Watson: It’s a really amazing piece of hardware for a really affordable price. To put it in perspective, I currently have a commercial depth camera on loan which produces a similar quality depth image and it retails for $7000! That is really way out of reach for most people who might be hobbyists, artists or researchers, but $150 is incredibly cheap for what the technology allows you to do.

For me, it’s very simple. I like to make things that know what you are doing, or understand what you are wanting to do, and act accordingly. There are many ways of implementing these ideas. You can strap accelerometers to your arms and wave them around, and have the accelerometer values drive sound or visuals. You can place various sensors in the environment, you can use camera(s) to track movement, etc. Ultimately, you create an environment that ‘knows’ what is happening inside it, and responds as you designed and developed it to. What excites me is not the technology, but how you interpret that environment data, and make decisions as a result of it. How intuitive is the interface? You can randomly wire the environmental parameters (e.g. orientation of arm) to random parameters (e.g. in audio and/or visuals), and it will be fun for a while, but I don’t think it will have longevity; it won’t be an *instrument* that you can ultimately learn to play and naturally express yourself with. In order to create an instrument, you first need to establish a language of interaction – which is the fun side of interaction design, but you always have the technical challenge of making sure you can create a system which can understand that language. It’s too common to design an interaction, but not have the technical capabilities to detect or implement it – then you have a system which reports incorrectly, and makes inaccurate assumptions resulting in confusing, non-intuitive interaction. So you need a smarter system, and the more data you have about the environment, the better you can understand it, and the smarter, more informed decisions you can make. You don’t *need* to use all the data all the time, but it is there if you need it.

Kinect is ultimately a depth-sensing camera. To put it simply, it returns a normal RGB image just like a webcam, but for every pixel in the image, it also returns a ‘distance to camera’. This kind of tech has been around for a while, but it has been very expensive (minimum thousands of dollars), and definitely not a consumer device, more for labs, robotics, military etc. That depth information is a ton of extra data. With that extra data, we are a lot more knowledgeable about what is happening in our environment, we can understand it more accurately, and thus we can create smarter systems that respond more intuitively.

One point which is often overlooked – which is a very important point – is not only ‘what can you do with the Kinect that you couldn’t before’, but ‘how much simpler is it technically to do something with the Kinect, as opposed to using other consumer devices’. This really is a very important point. A simple example is the recent rough demo I posted of drawing in 3D with your hands.

That is completely possible to do pre-Kinect. You would need two webcams, and you would need to set up your lighting quite specifically. You would want control over your background and overall lighting of the space. And then you would need a lot of hairy maths and code. With the Kinect, you just plug it in, make sure there isn’t any bright sunlight around, and with a few lines of code you have the information you need. So now that interaction is available for developers / artists of *all* levels, not just hardcore math geeks – and that is very important. Once you have loads of people playing with these kinds of interactions (who pre-Kinect would not have been able to) then we are bound to see loads of really innovative, fresh applications for it. Sure we’ll get a ton of “pinch to zoom and rotate the photo” demos which will get sickening after a few thousand, but people will be developing ideas that you or I would never have thought of, but instantly love – which in turn will spark new ideas in us to go off and play with – which in turn will feed others.

It’s still really early days yet. So far it’s just been a case of getting the data off the Kinect into the computer, and then seeing what that data actually is, how reliable it is, how its performance is, and what we can do with it. Once this gets out to the masses, that’s when the fun will start pouring in 🙂
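To make the ‘distance to camera for every pixel’ idea above a bit more concrete, here is a minimal Python sketch that turns a depth image into 3D points using a simple pinhole camera model. It assumes the depth values are already in meters. The focal length and image center are hypothetical placeholders rather than calibrated Kinect values, and the function names are made up for illustration.

```python
# Hypothetical intrinsics for a 640x480 depth image (NOT calibrated values).
FOCAL_PX = 580.0   # assumed focal length, in pixels
CENTER_X = 320.0   # assumed optical center, x
CENTER_Y = 240.0   # assumed optical center, y


def depth_pixel_to_point(u, v, depth_m):
    """Back-project pixel (u, v) with a depth in meters to an (x, y, z) point."""
    x = (u - CENTER_X) * depth_m / FOCAL_PX
    y = (v - CENTER_Y) * depth_m / FOCAL_PX
    return (x, y, depth_m)


def depth_image_to_points(depth_rows):
    """Turn a 2D list of depths (meters, None = no reading) into 3D points."""
    points = []
    for v, row in enumerate(depth_rows):
        for u, depth in enumerate(row):
            if depth is not None:
                points.append(depth_pixel_to_point(u, v, depth))
    return points
```

A pixel at the assumed image center maps straight onto the camera axis, and pixels without a reading are simply skipped. A real setup would need calibrated intrinsics for the specific device.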

What might people do with these tools as artists?

Watson: There is quite a lot that it can be used for. For interactive installations, we are often dealing with trying to track people in a space. Typically this requires careful lighting and IR cameras and it can be quite a tricky issue, but with the Kinect the depth image allows us not only to track people but understand where they are in relation to each other in z-space. This is just one application, however; another really nice feature is that it has pixel-matched color and depth cameras, and this could allow for a ‘greenscreen-less’ live greenscreening. And then of course there is its use as a 3D scanner, for building depth maps, understanding the space around us, etc., and more possibilities than I probably realise.

Shiffman: All sorts of things I can’t possibly imagine! (Just the fact that having depth makes background removal so easy is killer for my students.)
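As a rough illustration of why depth makes background removal easy, here is a minimal Python sketch that keeps only the color pixels whose depth falls inside a chosen range. It assumes the color and depth images are already aligned pixel for pixel and that depth is given in centimeters. The function name and parameters are hypothetical and not part of any Kinect library.

```python
def remove_background(color_rows, depth_rows, near_cm, far_cm, fill=(0, 0, 0)):
    """Keep color pixels whose depth lies in [near_cm, far_cm]; fill the rest.

    color_rows: 2D list of (r, g, b) tuples.
    depth_rows: 2D list of depths in centimeters, same shape, None = no reading.
    Assumes the two images are already aligned pixel for pixel.
    """
    result = []
    for color_row, depth_row in zip(color_rows, depth_rows):
        out_row = []
        for rgb, depth in zip(color_row, depth_row):
            in_range = depth is not None and near_cm <= depth <= far_cm
            out_row.append(rgb if in_range else fill)
        result.append(out_row)
    return result
```

Because the test looks only at distance, it works the same whether the background matches the subject in color or is moving, as long as the subject stays inside the chosen depth range.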

McDonald: I’ve noticed tendencies to work at very different levels of abstraction.

Some people are most interested in the raw data, the inherent glitches, the aesthetic of 3d scanning.

Others are interested in slightly generalized data, maybe the idea of ‘scenes’ that are being captured and analyzed, reconstructed.

Some people are interested in specific applications — object recognition, pose estimation, gestures. These are the most abstracted.

I expect work to come from all different levels, in every different medium.

Sound artists and musicians will use the device to control standard audio parameters, or use the values as input parameters to complex synthesis environments and for controlling spatialized sound with large speaker arrays.

Photographers will work with long exposures in combination with 3d-reactive projection to augment layers of the space over time.

Interaction designers will invent new gestures and modes of interaction specifically targeted at the strengths of the sensor.

Interactive art will experience a minor renaissance as a variety of tasks that were previously very difficult become very simple (e.g., tracking someone against a background that is the same color, or even tracking someone against a moving background).

What’s technically possible with the libraries now; what’s coming?

Watson: At the moment, we can get back the depth image and color image from the two cameras, and access the motor, LED and accelerometer of the device. Some developers are now working on accessing the four microphones, which allow for location of sounds in 3D space. Also, a big part of the Kinect as it relates to the Xbox is the full body skeletal tracking, which from a researcher’s or artist’s perspective is a very valuable feature. This is implemented in software on the Xbox and is the result of many years’ work by some of the top people in the field. A big part of the future research will be at the software level, developing tools that build off of and extend the functionality of the hardware, like open source implementations of the realtime skeletonization code.

McDonald: The general rundown is that Linux is fastest, OS X is 5-10 fps behind, and Windows is just starting to work.

ofxKinect was originally developed by Dan Wilcox and Theo Watson, with some minor contributions from me, and is now also being developed by Arturo Castro. It runs well on OS X and Arturo is still adding Linux support.

Right now it’s only possible to get the RGB and depth images, and to get the depth image in centimeters (which is not what the sensor returns by default). Next will be alignment of the RGB and depth images, and of course making it cross platform. Other suggestions are on the OF forum.
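For readers curious what converting the sensor's raw values to centimeters can look like, here is a hedged Python sketch. It assumes 11-bit raw depth values and uses a first-order fit that has circulated in the OpenKinect community. The coefficients are approximate, vary from device to device, and are not necessarily what ofxKinect itself uses. The function name and the no-reading constant are illustrative.

```python
RAW_NO_READING = 2047  # 11-bit value commonly treated as "no depth" (assumption)


def raw_depth_to_cm(raw):
    """Approximate distance in centimeters for an 11-bit raw depth value.

    Uses a community-derived first-order fit; coefficients are approximate.
    Returns None when there is no usable reading.
    """
    if raw < 0 or raw >= RAW_NO_READING:
        return None
    denominator = raw * -0.0030711016 + 3.3309495161
    if denominator <= 0:
        return None  # outside the range where the fit is meaningful
    return 100.0 / denominator
```

The key point is that the raw values are not linear in distance, so a simple scale factor is not enough. Anything needing real accuracy would want a per-device calibration instead of these borrowed constants.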

Shiffman: Right now the library just returns two pixel arrays (640×480 RGB image and 640×480 image with depth mapped to grayscale). My to-do list is (a) make all the raw data available, (b) optimize for speed, and (c) add any little analysis tricks / features that might be particularly useful. Basically, anything people do with the openkinect project and OF, I’ll try to add as a feature for Java / Processing.
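To show what ‘depth mapped to grayscale’ can mean in practice, here is a small Python sketch that squeezes raw depth values into the 0 to 255 range of a grayscale image. It assumes 11-bit raw values and chooses to draw nearer surfaces brighter. Both are assumptions made for illustration, not a description of how the Processing library actually does its mapping.

```python
MAX_RAW = 2047  # largest 11-bit value (assumption about the raw depth range)


def depth_to_gray(raw):
    """Map a raw depth value in 0..2047 to 0..255, nearer = brighter."""
    clamped = max(0, min(MAX_RAW, raw))
    return 255 - (clamped * 255) // MAX_RAW


def depth_image_to_gray(depth_rows):
    """Apply depth_to_gray to every value in a 2D list of raw depths."""
    return [[depth_to_gray(raw) for raw in row] for row in depth_rows]
```

Note that this throws away precision, since 2048 possible raw values are folded into 256 gray levels. That is one reason making the raw data available matters for anything beyond display.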

Stay tuned to CDMotion for more… and let us know if you have specific comments or questions, or have seen work that inspires you. Ed.

Excellent roundup, thanks Peter. I just bought a Kinect, and am watching developments closely. To me the most impressive advance in the technology is how well it works with variable lighting / backdrops. Waiting for robust Windows driver support, though I may try to jump in the Linux side if I get impatient.

dennis

I have a few questions about the kinect, do you need to buy the xbox 360 along with it? how does it connect to your computer? via USB? and then after I download the drivers, exactly what type of information (besides raw video image) does kinect send to it?

@dennis you don't need to own an xbox. the USB driver for the kinect is an open source driver for Linux, OS X, and WinDoze. That means, once you install the driver, the cam works just like a regular webcam. The fun is that the plugins and software being used alongside PROCESSING and openframeworks create interesting visuals.

CDM is an online magazine for creative technology, from music and DJing to motion and more.