November 4, 2010 AT 3:40 pm

The Open Kinect project – THE OK PRIZE – get $3,000 bounty for Kinect for Xbox 360 open source drivers

Hi from team Adafruit, we’re going to do our first ever “X prize” type project. Hack the Kinect for Xbox 360 and claim the $2,000 bounty! NOW $3,000

What is Kinect?

Kinect for Xbox 360, or simply Kinect (originally known by the code name Project Natal (pronounced /nəˈtɒl/ nə-tahl)), is a “controller-free gaming and entertainment experience” by Microsoft for the Xbox 360 video game platform, and may later be supported by PCs via Windows 8. Based around a webcam-style add-on peripheral for the Xbox 360 console, it enables users to control and interact with the Xbox 360 without the need to touch a game controller through a natural user interface using gestures, spoken commands, or presented objects and images. The project is aimed at broadening the Xbox 360’s audience beyond its typical gamer base. It will compete with the Wii Remote with Wii MotionPlus and PlayStation Move motion control systems for the Wii and PlayStation 3 home consoles, respectively. Kinect is scheduled to launch worldwide starting with North America in November.

What is the hardware?

The Kinect sensor is a horizontal bar connected to a small base with a motorized pivot, and is designed to be positioned lengthwise below the video display. The device features an “RGB camera, depth sensor and multi-array microphone running proprietary software”, which provides full-body 3D motion capture, facial recognition, and voice recognition capabilities.

According to information supplied to retailers, the Kinect sensor outputs video at a frame rate of 30 Hz, with the RGB video stream at 32-bit color VGA resolution (640×480 pixels), and the monochrome video stream used for depth sensing at 16-bit QVGA resolution (320×240 pixels with 65,536 levels of sensitivity). The Kinect sensor has a practical ranging limit of 1.2–3.5 metres (3.9–11 ft) distance. The sensor has an angular field of view of 57° horizontally and 43° vertically, while the motorized pivot is capable of tilting the sensor as much as 27° either up or down. The microphone array features four microphone capsules, and operates with each channel processing 16-bit audio at a sampling rate of 16 kHz.
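As a rough sanity check on those specs, the stated field of view and depth-image resolution imply how much of the scene a single depth pixel covers at a given range. A back-of-the-envelope sketch (ideal pinhole optics assumed, so treat the numbers as approximations):

```python
import math

def pixel_footprint_mm(distance_m, fov_deg, pixels):
    """Approximate width of scene covered by one pixel at a given distance."""
    scene_width_m = 2 * distance_m * math.tan(math.radians(fov_deg / 2))
    return scene_width_m / pixels * 1000  # millimetres per pixel

# One column of the 320-pixel-wide depth image, 57 deg horizontal FOV, at 2 m:
print(round(pixel_footprint_mm(2.0, 57.0, 320), 1))  # -> 6.8
```

So at 2 metres each depth pixel spans roughly 7 mm of the scene, which is plenty for body tracking but explains why fine finger work is hard at range.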

Sound cool? Imagine being able to use this off the shelf camera for Xbox for Mac, Linux, Win, embedded systems, robotics, etc. We know Microsoft isn’t developing this device for FIRST Robotics, but we could! Let’s reverse engineer this together, get the RGB and distance out of it and make cool stuff! So……

What do we (all) want?
Open source drivers for this cool USB device, the drivers and/or application can run on any operating system – but completely documented and under an open source license. To demonstrate the driver you must also write an application with one “window” showing video (640 x 480) and one window showing depth. Upload all of this to GitHub.
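Until a driver exists, the demo app can only be sketched against placeholder frame sources. `get_rgb_frame` and `get_depth_frame` below are hypothetical names standing in for whatever the winning driver exposes; the zero-filled arrays stand in for real sensor data:

```python
import numpy as np

# Hypothetical frame sources standing in for the (yet-to-be-written) driver.
def get_rgb_frame():
    return np.zeros((480, 640, 3), dtype=np.uint8)   # 640x480 RGB video

def get_depth_frame():
    return np.zeros((240, 320), dtype=np.uint16)     # 320x240, 16-bit depth

rgb, depth = get_rgb_frame(), get_depth_frame()
print(rgb.shape, depth.shape)
```

Feeding real driver output through these two functions and showing each array in its own window (for instance with OpenCV's `cv2.imshow`) would satisfy the two-window requirement.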

How to get the bounty ($3,000 USD)
Anyone around the world can work on this, including Microsoft. Upload your code, examples and documentation to GitHub. The first person / group to get RGB out with distance values being used wins – you’re smart, you know what would be useful for the community out there. All the code needs to be open source and/or public domain. Email us a link to the repository; we and some “other” Kinect for Xbox 360 hackers will check it out – if it’s good to go, you’ll get the $3,000 bounty!

But Microsoft isn’t taking kindly to the bounty offer. Bounty offered for open-source Kinect driver – “Microsoft does not condone the modification of its products,” a company spokesperson told CNET. “With Kinect, Microsoft built in numerous hardware and software safeguards designed to reduce the chances of product tampering. Microsoft will continue to make advances in these types of safeguards and work closely with law enforcement and product safety groups to keep Kinect tamper-resistant.”

81 Comments

I love you guys for doing this! Depth cameras, sometimes called RGB-D cameras, are extremely useful for robotics. For example, work by Dieter Fox (et al.) has used these sensors to create a system that is basically a Google Street View indoors.

These sensors normally cost many thousands of dollars, so the Kinect will be a _big_ deal for roboticists — especially with an open API. Please ping me when you have a winner so that I can spread the word to the professional / academic robotics world.

I wish you guys would make this more like an X-prize and get donations from a number of organizations instead of just doing it solo at adafruit. As a foundation you could solicit prize donations from every DIY electronics or robotics firm out there… I bet willow garage would put up some cash for example.

Why a prize? You know there are those of us out there working on this stuff for free, we have for years, and you were going to get a driver out of this no matter what. Instead of just pointing them to us and creating a community, now we’re going to have to worry about people not distributing whatever USB traces they have (since this is probably gonna be easier with a USB analyzer) so they can net the cash quicker than everyone else, which means this is not going to be a community effort.

This isn’t a case of everyone working together but still having a bounty like the DS Wifi hack, because it hasn’t been long enough to even gain that sort of attention.

Anyways, I’m going to put my lack of money where my mouth is. All information I obtain, when I obtain it, will be at:

Very nice. thanks guys, this will be a game changer (or at least change it from being a game). What would we do with an NUI that can also calculate depth? I look forward to the driver and an API so we can all find out.

Please. Take the money, buy a USB analyzer, analyze along with recording input via another camera, and hand us some dumps to work with. This is NOT GOING TO HAPPEN without hardware analyzers, they are expensive (I’m probably going to have to rent one if someone else doesn’t post the needed data), and if you don’t help provide the information, why would anyone share with that much cash on the line? Stop throwing money at the situation and start helping the engineering. I /know/ you guys have access to this stuff.

it confuses the heck out of me why folks would try to muddy up a piece of hardware just to make it unhackable. antifeatures at work, geez. all it will mean is more folks buying this gizmo, which should only benefit the manufacturer – don’t they like money? It’s not like you’d return a hacked gizmo if it got broken. good luck to all who try to get this thing accessible for floss folks!

I kinda agree with Kyle here, if you are interested in an Open Source driver for it, you can consider sponsoring well known developers donating hardware (usb analyzers or the kinect toy itself) and setting up an information sharing playground, a wiki on openkinect.org would be OK if there is not one already.

I helped write the Linux driver for the PlayStation 3 Eye, and I did that only because I happened to have the camera already and people had published their usb traces somewhere.

i’m digging my gear out now. 2k is a bit low if you look at the amount of time that needs to be invested. i wonder what m$ implemented to keep it locked down. for whatever it’s worth, i’ll submit a few dumps – since i’m a college student i don’t think i can place 2 grand in front of school work, but i’ll submit whatever i get to help the cause.

Donations should be taken to add to the prize money. I don’t know how much publicity this kind of thing will get but if 2000 people each donate one measly dollar then the prize is doubled! I’d gladly throw some money in the pot if I could. I want to see this completed but I’m no programmer.

I don’t think you guys understand. The $2000 isn’t a salary, it’s a prize. Be the first to do it and you’ll get $2K. It’s meant to be an incentive for those who were thinking about doing this, not a wage to get people to quit their day job.

If you considered hacking Kinect, here’s an incentive to get on it. May the best person/team win.

You are all going to be very disappointed. Everything clever the ‘kinect’ does, it does with proprietary software on the xbox360. There is no depth camera, just an ordinary monochrome webcam reading a fixed infra-red pattern projected on the scene. The main purpose of this camera is to rapidly identify a silhouette of a human. Movement is largely identified by using statistical motion vector software (a technique used for years by previous console webcams). The monochrome camera allows the software in the xbox to ‘know’ which small part of the colour camera image to process when searching for motion vectors.

The projected IR pattern allows crude identification of z-depth motion, but the resolution of the mono-camera should be a clue as to how limited this data is.

The skeletal tracking is largely an AI system on the xbox360, and is a statistical assumptive algorithm, rather than any absolute measurement. Research into calculating limb position from simple body outline images pre-dates the xbox console by a very long time.

Interestingly, several years before the kinect, MS gave massive promotion to the Codemasters game ‘You’re in the movies’. This game, using only the standard webcam, identified the silhouettes of the players in front of the camera with a high degree of accuracy, allowing ‘green-screening’ techniques to do background substitution without the need for a green screen. This shows that much of what the kinect does in hardware was already redundant using modern image processing algorithms.

However, the kinect obviously makes such methods far more robust, at the cost of mechanical complexity (and a big spend by the consumer).

Had MS been serious about general reading of z-depth, it would have deployed a 3 camera system, with the left and rightmost cameras being discrete and individually positionable, like speakers. These 2 would provide a ‘stereo pair’ of images that would allow depth to be identified at each pixel position by using perspective variation algorithms that could be simply accelerated in hardware. However, setting up 3 discrete boxes would have been a pain for most consumers under 10, or over 20, and lost MS most of its intended audience.

As an aside, many of us know that the company MS allied with to do the ‘depth’ system was supposedly just about to launch a cheap z-depth camera for the PC at the time. However, I’m sure that camera was intended for objects way closer than 7 feet away, where the sharpness of the projected IR grid pattern would have returned much better information. Can the kinect work with objects placed much closer? Probably not if one wishes to use the colour camera as well. Then there is the focus issue of the optics, and the ‘sharpness’ of the IR pattern.

The bottom line is that the kinect is not like the wii-remotes and sony-moves, which give us access to remarkably cheap and robust combinations of gyroscopes, accelerometers, and cameras. For visual processing, kinect is 90%+ a software system on the xbox360 side. Kinect is a bunch of simple hardware choices designed to assist the software. And what does that software drive? Largely a load of silly, imprecise casual games for people that can’t even bring themselves to take gaming seriously. That should inform you about the likely engineering choices, and their usefulness in other areas.

Believe me, motion tracking studios won’t be replacing their multi camera setups, and body-suits, with anything like the tech in the kinect.

PS like everyone here, I hope the kinect is turned into another ‘open’ USB device for everyone to exploit, as soon as possible. It is just that for visual image processing, it is already cheaper to buy multiple high quality USB cameras, and infra-red LEDs, and experiment with readily available open-source software. This just was not true of the tech in the wii-remote.

At least the calculation of the depth map will happen on the device itself. I think the PrimeSense chip (see the iFixit teardown (1)) calculates the depth map – what else would it be good for?! This calculation can be cumbersome and take quite some processing time. Furthermore, the CPU load on the xbox is low during kinect usage, which also indicates that some more computer vision is implemented in the hardware part (2). Additionally, depth calculation using projected IR light is more robust than just using a pair of stereo cameras. A longer baseline (distance between the cameras of a stereo pair), as suggested by mark, would increase the accuracy of such a system but at the same time create a lot of mismatches in the calculation of the depth map, which would reduce its quality. Commercially available stereo cameras (e.g. (3)) usually use a much shorter baseline and cost quite some dollars/euros (much more than 150€).

In the end I want to say: it is definitely worth having a look at the kinect hardware. I hope someone really hacks that thing. Of course I would spend some bucks on that.
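The baseline trade-off mentioned above follows from the stereo relation d = f·B/z (disparity in pixels for a point at depth z). A short sketch, with an illustrative focal length rather than any real camera's:

```python
def disparity_px(z_m, baseline_m, focal_px):
    """Stereo disparity in pixels: d = f * B / z."""
    return focal_px * baseline_m / z_m

def depth_error_m(z_m, baseline_m, focal_px, disp_err_px=0.5):
    """First-order depth error for a given disparity error: dz ~ z^2/(f*B) * dd."""
    return z_m ** 2 / (focal_px * baseline_m) * disp_err_px

f_px = 580.0  # illustrative focal length for a VGA-class sensor, in pixels
for baseline in (0.075, 0.30):  # short vs long baseline, in metres
    print(baseline,
          round(disparity_px(2.0, baseline, f_px), 2),   # disparity at 2 m
          round(depth_error_m(2.0, baseline, f_px), 3))  # depth error at 2 m
```

The longer baseline quadruples the disparity (and so the accuracy) at 2 m, but larger disparities are exactly what makes stereo matching produce more mismatches – the trade-off described above.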

I created USB drivers and have the Xbox NUI Motor, Xbox NUI Camera, and Xbox NUI Audio devices connected to my PC. I can communicate with them about standard USB protocol stuff (get descriptors, enumerate interfaces, etc.) and can see the pipes that I need to get data on. The Kinect itself just shows a flashing green light, which means it is waiting for some commands to start going.

I need USB protocol dumps in order to make further progress, but don’t have access to a USB protocol analyzer. If anyone has one, please share your dumps. I’ve tried brute-forcing a few likely commands, but most of the incorrect commands cause device resets. Contact me on Twitter if you have info: @joshblake.
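For anyone reproducing the enumeration step, here is a hedged sketch. The vendor ID 0x045e is Microsoft's; the three product IDs are the ones reported for the NUI devices and should be treated as assumptions until checked against your own `lsusb` output. The lookup is wrapped in a function so the logic runs even without pyusb or hardware attached:

```python
MS_VENDOR = 0x045E  # Microsoft's USB vendor ID
# Product IDs as reported for the three NUI devices -- verify with lsusb.
NUI_PRODUCTS = {0x02B0: "Xbox NUI Motor",
                0x02AD: "Xbox NUI Audio",
                0x02AE: "Xbox NUI Camera"}

def find_nui_devices(backend_find):
    """backend_find(vendor, product) -> device handle or None.

    With pyusb installed, pass e.g.
    lambda v, p: usb.core.find(idVendor=v, idProduct=p).
    """
    found = {}
    for pid, name in NUI_PRODUCTS.items():
        dev = backend_find(MS_VENDOR, pid)
        if dev is not None:
            found[name] = dev
    return found

# With no hardware attached, nothing is found:
print(find_nui_devices(lambda vendor, product: None))  # -> {}
```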

How about spending this money (you’ll need way more) to create your own 3D sensor? It should be easier than reverse engineering the most complicated vision tech out there… I don’t think you can get far just by cracking the camera. Ask MS for Windows drivers.

The PrimeSense IC connects to the host via built in USB. Its functions include adding a depth overlay to a colour image using a second IR camera and IR projector and presumably correcting that overlay for parallax error caused by using separate cameras. The promotional material on their website suggests it also does skeletal mapping using “highly parallel computational logic”, allowing the host system to do gesture recognition.

The PrimeSense reference design is USB powered so must fit within the USB power budget. Natal/Kinect has a USB hub and a cooling fan so there’s obviously a lot more going on in there.

The PrimeSense chip has only two microphone inputs but Kinect has four microphones. Perhaps they are used to steer the camera towards the player using DSP which is happening on the host system.

From what I’ve researched, it works in a really simple way. Have a look here: http://www.youtube.com/watch?v=nvvQJxgykcU
The projector projects tiny dots, and the sensor measures the dots’ size. That explains the low 320×240 resolution; the sensor’s native resolution is probably a lot higher.
Anyway, that doesn’t really help with anything as long as the camera itself delivers everything already well formed. It’s probably just a question of sniffing the USB port.

“You are all going to be very disappointed. Everything clever the ‘kinect’ does, it does with proprietary software on the xbox360. There is no depth camera, just an ordinary monocrome webcam reading a projected (fixed) infra-red pattern projected on the scene.”
—
Well, a lot of people in the know beg to differ. Current ToF cameras used for robotics cost > $5000 and don’t even have half the resolution of the Kinect’s sensor. Even if the sensor data is noisy, this device will be extremely competitive with sensors that normally cost 40x as much.
You’ll probably want to take a look at this: http://www.hizook.com/blog/2010/03/28/low-cost-depth-cameras-aka-ranging-cameras-or-rgb-d-cameras-emerge-2010

I think the biggest win for the bounty isn’t the value to the person who creates the Kinect drivers, but the PR that it is getting which makes this a very visible challenge. Just Google “$1000 Kinect bounty” to see how many high-profile publications have written about this.

This isn’t about adafruit, it’s about the open hardware movement that is gaining momentum. It’s a wakeup call to big companies to get on board and embrace it since there’s no putting this genie back in the bottle.

I think Mark is right. It is unlikely that the Kinect itself is doing any of the 3d processing. A quick Google search shows several mentions of how the x-box is using 10 to 15 % of its processing capacity to do the depth map work.

What would be interesting to know is what the projected image is and does it change over time. It may vary the structured light dot array so that farther objects have a higher dot density.

I think the dot system looks very like Livescribe, the digital pen that records what you write and uses special paper with a unique dot array.

Why are people complaining about expensive logic analyzers? There are many cheap ones out there as well – like USBee, DigiView, GoLogic, etc. — most for less than $500, and some for less than even $200. It’s not tough to hook one of these up to start monitoring the USB traffic. Although I wouldn’t be surprised if there was some encryption handshaking involved in the initial device enumeration…

If Microsoft tries to take legal action against you, you should protect yourself invoking the DMCA exemption clauses:
Chap 12, Section 1201 (f):
“Reverse Engineering. — (1) Notwithstanding the provisions of subsection (a)(1)(A), a person who has lawfully obtained the right to use a copy of a computer program may circumvent a technological measure that effectively controls access to a particular portion of that program for the sole purpose of identifying and analyzing those elements of the program that are necessary to achieve interoperability of an independently created computer program with other programs, and that have not previously been readily available to the person engaging in the circumvention, to the extent any such acts of identification and analysis do not constitute infringement under this title.”

I think Mark is right. It is unlikely that the Kinect itself is doing any of the 3d processing. A quick Google search shows several mentions of how the x-box is using 10 to 15 % of its processing capacity to do the depth map work.
—-
It is likely that the Kinect does preprocessing of image data from both the color and IR cam and outputs RGB-D data, that is point clouds with color information (or image data with depth, whatever you want to call it).
10-15% can easily account for processing of this point cloud data that comes from the Kinect. We’re talking 320×240×30 points here, which is quite a lot. Interpreting and processing point cloud data is a hot topic in robotics research at the moment and is known to consume quite a lot of CPU (or GPU) resources.
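The arithmetic behind “quite a lot” is easy to check (two bytes per depth value is an assumption, and this is the depth channel alone, before any colour data):

```python
width, height, fps = 320, 240, 30
points_per_sec = width * height * fps
print(points_per_sec)                                # -> 2304000 points per second
print(points_per_sec * 2 / 1e6, "MB/s depth alone")  # -> 4.608 MB/s depth alone
```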

If anyone is willing to send me some PCB photos of Kinect innards and can give me remote access to a Linux machine with a plugged-in Kinect, then I’ll start development (I could buy myself a Kinect, but that would take an additional month).

I’ll give out any tips needed to set up the Kinect box so it will be unable to do any malicious stuff from the machine, and will be easy to reboot/reset. This isn’t about equipment cost – if you have a spare Pentium 4 that can work 2-3 hours a day, it’s enough.

I currently have some experience with the Linux USB stack (ALSA USB, actually) and custom hardware, and am also currently working on in-kernel network stuff.

@Antony Merquis:
We have similar ideas. You (and anyone interested in making this a community effort) should join my group: http://groups.google.com/group/openkinect
We already have several people sharing there.

User AlexP from the NUI Group forums (former EyeToy hacker) has posted a quick video depicting the Kinect connected to a PC running Windows 7 and delivering more or less the same level of functionality as when connected to an Xbox 360.

Microsoft has issued the following statement through a spokesperson: "Kinect for Xbox 360 has not been hacked–in any way–as the software and hardware that are part of Kinect for Xbox 360 have not been modified. What has happened is someone has created drivers that allow other devices to interface with the Kinect for Xbox 360. The creation of these drivers, and the use of Kinect for Xbox 360 with other devices, is unsupported. We strongly encourage customers to use Kinect for Xbox 360 with their Xbox 360 to get the best experience possible."

I can confirm the PrimeSense IS doing the depth mapping and handing it off at high frame rates, despite the opinion of #37 mark. We’d done research on other z options including stereo cams, time of flight sensors etc and the work that the PrimeSense silicon is doing offloaded a TON of crunching on the software side.

As to the inverse kinematics and skeleton stuff, and additional bells and whistles, that is clearly coming from the MS side, but I wonder how far the PrimeSense in the toy is from the reference design that goes along with PS’s sdk?

I did some work to reverse engineer the depth video frame format. Currently, I have the kinect connected through a USB analyzer and to the xbox. I then source the data from the analyzer in real-time and display it in a little Qt app. You may need to install the Qt libraries for your OS.

I wasn’t able to get github to accept my ssh key, so I posted the code on google code.

This shows how to parse the video format. If you want to run the code as is, you will need the Beagle USB 480 analyzer:
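The frame format itself isn't documented in the comment, so nothing here should be read as the actual Kinect layout. One scheme common for sensors with 11-bit ADCs is densely packed 11-bit values; a generic unpacker for that scheme (the big-endian bit order is an assumption) looks like:

```python
def unpack_11bit(data, n_values):
    """Unpack densely packed, big-endian 11-bit values from a byte stream."""
    out, buf, bits = [], 0, 0
    for byte in data:
        buf = (buf << 8) | byte  # shift in the next 8 bits
        bits += 8
        while bits >= 11 and len(out) < n_values:
            bits -= 11
            out.append((buf >> bits) & 0x7FF)  # peel off the top 11 bits
    return out

# 0x123 and 0x456 packed into 22 bits (plus 2 padding bits) -> 3 bytes
packed = bytes([0x24, 0x71, 0x58])
print(unpack_11bit(packed, 2))  # -> [291, 1110]  i.e. [0x123, 0x456]
```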

I wanted to invite you to join the OpenKinect mailing list I started: http://groups.google.com/group/openkinect
We have over 100 members already who are interested in making and supporting a true open source community around Kinect. We’re not after the bounty, but if we do happen to get it we plan to use it for community/charity rather than personal gain. We have USB experts and driver experts, as well as people interested in everything from using Kinect for robotics to interaction design.

I hope you’ll consider joining the group, sharing what you know, and taking advantage of the experts in the group. A real community effort will produce much better results than scattered individuals.

Now that people are seeing the raw output of the Kinect, perhaps they can stop referring to the non-existent depth camera. The Kinect uses the crudest of light scattering techniques to do what is essentially outline extraction, combined with some extremely inaccurate (but sometimes useful) z-depth motion vectors. The tech from primesense looks like a gigantic con compared to their claims.

Remember, for the year or so that Kinect was being designed, Primesense allowed publicists to make the laughable assertion that they were measuring the speed of their projected light, in order to create a z-depth map.

Given that the Kinect monochrome camera has a far lower resolution than the projected IR dot grid, it looks as if their so-called depth map is no more than the brightness and contrast adjusted image output from that camera. The mono camera can probably output images at 4 times the speed of the colour one, so maybe some kind of averaging of 2 or 4 consecutive frames is being done.

Here is a lesson, guys: if it sounds too good to be true, it is. The same applied to Sony’s Move if you research the motion tracking method they thought they could use versus the Wii-identical system they ended up releasing when the fancy tech they had paid a fortune for turned out to be useless junk (or Sony’s hilarious attempt to use the ‘Cell’ CPU as their graphics chip, before they went crawling on their knees to Nvidia).

The faux depth camera on the Kinect reminds me of a ‘voice recognition’ peripheral I bought for my Sinclair Spectrum so many years ago. That device used zero crossing to measure a sequence of average frequencies, and hoped that the words targeted might have different enough ‘signatures’ for a statistical pattern match. Did it work? If you understand anything about engineering, you will already know the long answer to that question.

In the end, Microsoft wanted a human outline, and some kind of indication of gross movement toward or away from the screen. Add a massive dose of assumption and imagination in the minds of the ‘player’, and there is your so-called next-generation motion control system.

In reality, Move, and Wii with the gyroscope addon, are infinitely better for accuracy and user selected input. Microsoft’s innovation is the body skeleton derived from outline data at a lower processing cost and a somewhat greater accuracy than say what Sony could do with the Eyetoy camera alone. However, I would expect improvements in visual processing methods to allow future use of the single colour webcam to replicate most of the tricks of Kinect.

The real advantage of the two camera system comes when the player wears clothes very similar in colour response to the background, but how likely is that problem? Everything else is just a matter of how cleverly you can image process, and how quickly. When the xbox + Kinect does the clever stuff (full skeleton), it is extremely laggy, so even Microsoft’s expensive hardware assist doesn’t do well on the quickly side of the equation.

I will say again that despite the disappointing reality of Kinect, open source docs and drivers are most welcome because every peripheral has some uses, even that sad little ‘voice recognition’ Spectrum device. Just ensure your expectations are based on a real understanding of how Kinect actually works, not the outrageous hyperbole of Microsoft and Primesense. Likewise, know that Kinect is really an advanced software package on the console, and without that software, your Kinect is pretty much the world’s most expensive console webcam.

Sounds great. I’m quite impressed that it was cracked this fast. Does this mean I can finally throw away my mouse? How about a really good, gesture-based interface for Linux? Preferably one that can distinguish the cat from me doing a “delete all” gesture…

I’d like to see a two-KINECT system. The first KINECT would do the typical full body scanning in order to accurately locate the position of your hands. The position would be supplied to the second KINECT, which would be “zoomed in” on your hands to perform very fine position tracking. This would allow for “fine grain” hand commands.

A possible application is a virtual keyboard. The second KINECT would use simple geometry to position a larger-than-normal virtual keyboard as if it were floating in air in front of you. As your hand enters the plane of the virtual keyboard, a letter would be “selected” and highlighted on-screen; re-positioning the hand while still within the plane would adjust the selection (similar to what smart phone keyboards do). Removing your hand from the plane of the keyboard would confirm the letter. This would be more pecking than typing, but should still be effective.

A more sophisticated implementation would determine the position of your fingers as your hand enters the plane in order to “double check” the intended key. For example, if your left hand enters the plane with the left pinky “extended”, an “A” would be “verified”. If your hand enters the plane slightly elevated from the initial or “home” position, a “Q” would be verified, etc. So come on, let’s see it, so we can find out if there’s any real-world benefit!
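The plane-crossing selection described above fits in a few lines of code. Everything here is an illustrative assumption – the coordinate frame, the plane depth, and the key layout – not anything the Kinect itself provides:

```python
ROWS = ["QWERTYUIOP", "ASDFGHJKL", "ZXCVBNM"]
PLANE_Z = 1.5               # virtual keyboard plane, metres from the sensor
KEY_W, KEY_H = 0.08, 0.10   # key width/height in metres

def key_at(hand_x, hand_y, hand_z):
    """Return the highlighted key once the hand crosses the keyboard plane.

    hand_x/hand_y are metres from the keyboard's top-left corner; hand_z is
    distance from the sensor. Returns None while the hand is short of the plane.
    """
    if hand_z > PLANE_Z:    # hand has not reached the virtual plane yet
        return None
    row = min(max(int(hand_y // KEY_H), 0), len(ROWS) - 1)
    col = int(hand_x // KEY_W)
    return ROWS[row][col] if 0 <= col < len(ROWS[row]) else None

print(key_at(0.0, 0.0, 1.6))  # -> None (hand still in front of the plane)
print(key_at(0.0, 0.0, 1.4))  # -> Q (top-left key)
```

A confirmation step (the “removing your hand” part) would just latch the last non-None key when hand_z rises back above PLANE_Z.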