Science and technology, served light and fluffy.

For the last several months, Doug, a colleague of mine, and I have been working on another Kinect project, a follow-up to the awesomeness that was the target-tracking system for the missile launcher. When I have a lull in billable work at Biggs, I pull out the Kinect and plug away at it.

My desk has a curve to it that effectively gives it a 90-degree angle at one end. My computer is at one end, and I normally set the Kinect up behind me. When I’m ready to try something, I’ll turn around, slide my chair out of the way, and start waving at the Kinect (which in and of itself looks like I’m either trying to conduct an orchestra without a baton, or convince the Kinect to do what I want using Jedi mind tricks). At any rate, the key here is that when I’m working on the computer, the Kinect is basically pointed at my back.

Yesterday, another colleague of mine, Matt, came up and said, "You know, you gotta be careful otherwise that thing is going to suck you into the computer, Tron-style." I burst out laughing.

I doubt a $140 piece of hardware is going to be capable of digitizing me. However, I do love working with the Kinect, so in a way, it’s already sucked me in.

My latest side project involving the Kinect started to get a bit hairy. The logic for what we were trying to do was at least an order of magnitude greater than the Target Tracking system my colleagues and I built last year. It functioned, but it was getting exponentially more difficult to add features to it, let alone debug it.

So, suffering from a lull in my regular project work over the holiday break, I decided to start building some unit tests for it. If nothing else, having a solid test suite would allow me to regression-test the application whenever I monkeyed with the code, and THAT would enable some good-sized refactorings that were long overdue. My first task, then, was to figure out how to mock out the data coming off the Kinect. That task quickly hit a wall.

The application uses the SkeletonData object available in the SkeletonFrameReady event. My original event handler looked something like this:
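A handler along these lines is what I mean (this is a sketch, not the original listing; the SDK types come from the beta Microsoft.Research.Kinect.Nui namespace, and the UpdatePositions signature is an assumption):

```csharp
// Sketch of the kind of handler described above (Beta Kinect SDK).
// UpdatePositions is the application's own method; its exact
// signature here is an assumption.
private void nui_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    foreach (SkeletonData skeleton in e.SkeletonFrame.Skeletons)
    {
        // Only fully tracked skeletons carry usable joint positions
        if (skeleton.TrackingState == SkeletonTrackingState.Tracked)
        {
            UpdatePositions(skeleton);
        }
    }
}
```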

The UpdatePositions() method would handle moving the objects around based on the new positions of the skeletons/joints, and that was the primary method I wanted to test. I figured if I could create my own SkeletonData object, and pass that into UpdatePositions, I could test any scenario I wanted. Unfortunately, the SkeletonData class is sealed, and there aren’t any public constructors on it. So, I went the route of writing my own version of SkeletonData – one that I could create objects from, and would effectively function the same as SkeletonData:
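A minimal sketch of that abstraction, consistent with the description that follows (the property setters on the SDK's Joint struct and the collection type are assumptions):

```csharp
// Sketch of a stand-in for the sealed SkeletonData class.
// Names beyond SkeletonDataAbstraction and UpdateJoint are assumptions.
using System;
using Microsoft.Research.Kinect.Nui;

public class SkeletonDataAbstraction : ISkeletonData
{
    public JointsCollectionAbstraction Joints { get; private set; }

    public SkeletonDataAbstraction()
    {
        Joints = new JointsCollectionAbstraction();

        // Seed the collection with a "blank" joint for every JointID
        foreach (JointID id in Enum.GetValues(typeof(JointID)))
        {
            Joints.Add(new Joint { ID = id });
        }
    }

    // Overwrite a blank joint with real data (from the sensor, or a test)
    public void UpdateJoint(Joint joint)
    {
        Joints[joint.ID] = joint;
    }
}
```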

When the class is instantiated, the Joints collection is also instantiated with a "blank" Joint object for every joint defined by the Kinect (the complete list is defined by the Microsoft.Research.Kinect.Nui.JointID enumeration). Then, the UpdateJoint method is called to overwrite those blank joints with the real values. I also used this method in the unit tests to precisely place the joints I was interested in just before running a given test.

I thought I would end up needing to mock out portions of the class, so I created an interface for it as well:

using Microsoft.Research.Kinect.Nui;

public interface ISkeletonData
{
    // Intentionally empty for now -- a marker interface
    // to make mocking possible later.
}

As it turns out, I didn’t need to mock anything out – I can just create SkeletonDataAbstraction classes, and pass them directly into UpdatePositions. I decided to keep the interface around, just in case I later found something that required a mock.

I also needed to be able to construct a JointsCollection object (what the SkeletonData.Joints property is defined as), but that was also marked sealed with no public constructors. So, I created a JointsCollectionAbstraction object for it:
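A sketch of what that stand-in could look like, assuming a dictionary keyed by JointID with an indexer mirroring the sealed JointsCollection's lookup (the member names here are assumptions):

```csharp
// Sketch of a stand-in for the sealed JointsCollection.
using System.Collections;
using System.Collections.Generic;
using Microsoft.Research.Kinect.Nui;

public class JointsCollectionAbstraction : IEnumerable<Joint>
{
    private readonly Dictionary<JointID, Joint> _joints =
        new Dictionary<JointID, Joint>();

    // Indexer mirroring JointsCollection's lookup-by-JointID
    public Joint this[JointID id]
    {
        get { return _joints[id]; }
        set { _joints[id] = value; }
    }

    public void Add(Joint joint)
    {
        _joints[joint.ID] = joint;
    }

    public IEnumerator<Joint> GetEnumerator()
    {
        return _joints.Values.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```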

That worked like a charm. With each SkeletonFrameReady event-raise, I copy the key pieces of information from the Kinect over to my own structures, and use those from that point on. Now the task of writing tests around this could begin in earnest. I wrote a "CreateSkeleton" method for my unit tests that would encapsulate setting one of these up:
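A sketch of such a test helper, assuming the joint of interest is ShoulderCenter (the signature and positions are illustrative, not the original code):

```csharp
// Sketch of a test helper that builds a skeleton with one joint
// placed at a known position; everything else stays "blank".
private SkeletonDataAbstraction CreateSkeleton(float x, float y, float z)
{
    var skeleton = new SkeletonDataAbstraction();

    var joint = new Joint
    {
        ID = JointID.ShoulderCenter,
        Position = new Vector { X = x, Y = y, Z = z, W = 1 }
    };
    skeleton.UpdateJoint(joint);

    return skeleton;
}
```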

In Part 1 of this series, I went through the prerequisites for getting the Kinect/Foam-Missile Launcher mashup running. In Part 2, I walked through the core logic for turning the Kinect into a target-tracking system, but I ended it talking about some major performance issues. In particular, commands to the launcher would block updates to the UI, which meant the video and depth feeds were very jerky.

In this third and final part of the series, I’ll show you the multi-threading scheme that solved this problem. I’ll also show you the speech recognition components that allowed the target to say the word "Fire" to actually get a missile to launch.

What did you say?

We had tried to implement the speech recognition feature by following the "Audio Fundamentals" tutorial. That code looked like it SHOULD work, but there were a couple of differences between the tutorial app and ours: the tutorial’s was a C# console application, while ours was a VB WPF application. As it turns out, those two differences made ALL the difference.

For the demo, Dan (the host) mentions the need for the MTAThread() attribute on the Main() routine in his console app. Since our solution up to this point was VB, it looked like we would need this. I tried adding that to every place that didn’t generate a compile error, but nothing worked – the application kept throwing this exception when it fired up:

Unable to cast COM object of type ‘System.__ComObject’ to interface type ‘Microsoft.Research.Kinect.Audio.IMediaObject’. This operation failed because the QueryInterface call on the COM component for the interface with IID ‘{D8AD0F58-5494-4102-97C5-EC798E59BCF4}’ failed due to the following error: No such interface supported (Exception from HRESULT: 0x80004002 (E_NOINTERFACE)).

I decided to try a different tack. I wrote a C# console app and copied all of Dan’s code into it (removing the Using statements and initializing the variables manually to avoid scoping issues). That worked right out of the gate. Since we were very short on time (this was two days from the demo at this point), I decided to port our application to C#, then incorporate the speech recognition pieces.

First, the "setup" logic was wrapped into a method called "ConfigureAudioRecognition" (I pretty much copied this right from the tutorial). That method was invoked in the Main window’s Loaded event, on its own thread. In addition to initializing the objects and defining the one-word grammar ("Fire"), this adds an event handler for the recognizer engine’s SpeechRecognized event:
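The handler itself is simple; a sketch of it, using System.Speech's SpeechRecognizedEventArgs (the field names _launcher and _autoTrack are assumptions):

```csharp
// Sketch of the SpeechRecognized handler described below: fire only
// when the launcher exists, we're in auto-track mode, and the
// recognizer is highly confident. Field names are assumptions.
private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (_launcher != null &&
        _autoTrack &&
        e.Result.Confidence > 0.95f)
    {
        FireCannon();  // queue the fire command off the UI thread
    }
}
```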

The command to launch a missile is only given if the Launcher object is defined, the app is in "auto-track" mode, and the confidence level of the recognition engine is greater than 95%. This last check is an amusing one. Before I included this check, I would read a sentence that happened to contain some word with the letter "f", like "if", and the missile would launch. Inspecting the Confidence property, I found that this only had a value in the 20-30% range. When I said "Fire", this value was 96-98%. The confidence check helps tremendously, but it’s still not perfect. Words like "fine" can fool it. It’s much better than having it fire with every "f", though.

Take a number

Doug, Joshua, and I discussed some solutions to the UI updates earlier in the week, and the most promising one looked like using BackgroundWorker (BW) to send a command to the launcher asynchronously. That was relatively easy to drop into the solution, but I almost immediately hit another problem. The launcher was getting commands sent to it much more frequently than my single BW could handle, and I started getting runtime exceptions to the effect of "process is busy, go away". I found an IsBusy property on the BackgroundWorker that I could check to see if it had returned yet, but that meant I would have to wait for it to come back before I could send it another command – basically the original blocking issue, just one step removed.

I briefly toyed with the idea of spawning a new thread with every command, but because they were all asynchronous there was no way to guarantee that they would be completed in the order I generated them in. Left-left-fire-right looks a lot different than fire-right-left-left. What I really needed was a way to stack up the requests, and force them to be executed synchronously. What I found was an unbelievably perfect solution from Matt Valerio with his post titled "A Queued BackgroundWorker Using Generic Delegates". As the title suggests, he wrote a class called “QueuedBackgroundWorker” that adds work items to a queue, then pops them off and processes them in order. This was EXACTLY what I needed. This was also the most mind-blowing use of lambda expressions I’ve ever seen: you pass entire functions as the elements of the queue, and each one is executed when it’s popped off.

I added a small class called "CannonVector" that would roll up a direction (up, down, left, or right) and a number of steps. Then, I created two methods – FireCannon() and MoveCannon() that would now wrap my calls to the launcher methods that Matt Ellis wrote (see Part 2 of this series):
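A sketch of those pieces, assuming a QueuedBackgroundWorker field with a method that queues a delegate (the exact API in Matt Valerio's class may differ; the names here are approximations):

```csharp
// Sketch of CannonVector plus the two wrapper methods described
// above. Each command is queued as a lambda and executed in order,
// off the UI thread. _queuedWorker's API is an approximation.
public enum CannonDirection { Up, Down, Left, Right }

public class CannonVector
{
    public CannonDirection Direction { get; set; }
    public int Steps { get; set; }
}

private void MoveCannon(CannonVector vector)
{
    _queuedWorker.RunAsync(() =>
    {
        switch (vector.Direction)
        {
            case CannonDirection.Left:  _launcher.MoveLeft(vector.Steps);  break;
            case CannonDirection.Right: _launcher.MoveRight(vector.Steps); break;
            case CannonDirection.Up:    _launcher.MoveUp(vector.Steps);    break;
            case CannonDirection.Down:  _launcher.MoveDown(vector.Steps);  break;
        }
    });
}

private void FireCannon()
{
    _queuedWorker.RunAsync(() => _launcher.Fire());
}
```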

In Part 1 of this series I laid out the prerequisites. Now we’ll get into how to turn the Kinect into a tracking system for the cannon.

Manual Targeting

As I mentioned in Part 1, one of the pieces to this puzzle was already written for us – a .NET layer around the launcher. This layer was provided by Chris Smith in his Being an Evil Genius with F# and .NET post. He links to this source code at the very end of the post, and included several projects. We ended up using the RocketLib\RocketLauncher_v0.5.csproj project.

So, now we had a class that we could give commands to the launcher such as

Me._Launcher.MoveLeft(5)
Me._Launcher.MoveDown(10)
Me._Launcher.Fire()

Here, “Me._Launcher” was an object of type RocketLib.RocketLauncher. The numbers being passed to the “Move” commands are the number of times to move the launcher turret. The unit of “time” or “step” (as we came to refer to it) seemed to translate into a little less than half a degree of rotation (either left/right or up/down).

Armed with this knowledge (see what I did there?), we were able to whip together a little WPF interface that had five buttons on it – Up, Down, Left, Right, and Fire – that controlled the launcher manually. That became the “Manual” mode. The “Auto-track” mode, where the Kinect would control the launcher, would come next.

Auto-Targeting

Now we started going through the Kinect SDK Quickstart video tutorials, produced by Microsoft and hosted by Dan Fernandez. To begin, we wanted to get to the raw position data (X, Y, and Z) from the camera. We ended up compressing the first four tutorials (“Installing and Using the Kinect Sensor”, “Setting up the Development Environment”, “Skeletal Tracking”, and “Camera Fundamentals”) into a Friday to get ramped up as quickly as possible.

In “Skeletal Tracking Fundamentals”, Dan explains that the Kinect tracks skeletons, not entire bodies. Each skeleton has 20 different joints, such as the palms, elbows, head, and shoulders. We decided to select the “ShoulderCenter” joint as our target.

Next, we added labels for the X, Y, and Z positions of the ShoulderCenter joint to the app, and then started moving around the room in front of the Kinect, seeing how the values changed. The values are given in meters, with X and Y being 0 when you’re directly in front of the depth camera. These values are updated in the SkeletonFrameReady event.

Now, the fun could really begin. We decided to focus on left/right movement of our target, so the Y value is not used in the app at all.

We also decided that since the launcher had a real physical limitation as to how fast it could move, we couldn’t give it too many commands at a time. The Kinect sends data 30 times a second, so we decided to sample the data twice a second (every 15 frames).
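The sampling itself can be as simple as a frame counter in the skeleton handler (a sketch; the method and field names are assumptions):

```csharp
// Sketch of the frame-sampling idea: SkeletonFrameReady fires
// ~30 times per second, so act on every 15th frame only.
private int _frameCount = 0;

private void OnSkeletonFrame(SkeletonData skeleton)
{
    _frameCount++;
    if (_frameCount % 15 != 0)
    {
        return;  // skip -- only issue launcher commands twice a second
    }

    // ...compute and queue the launcher move here...
}
```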

Our first attempt at this was very complicated and clunky, and didn’t work well unless you were at a magical distance from the Kinect (basically we threw enough magic numbers into the equation until it worked for that one distance). We really ran into problems when we tried to extend that to work for any depth.

It was Doug that hit upon the idea of calculating the angle to turn the launcher as the arc tangent of X/Z as opposed to what we had been doing (the number of steps). That did two things for us – first, the angle approach was correctly taking the depth information (Z measurement) into account, and second, it meant we only had to store the last known position of the launcher (measured as a number of steps, either positive or negative, with 0 being straight ahead). If we knew the last position, and we knew where we had to move to, we could swivel the launcher accordingly.
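The math can be sketched like this, using the earlier "a little less than half a degree per step" estimate as the conversion factor (the constant and the left/right sign convention are assumptions):

```csharp
// Sketch of the arctangent targeting described above: convert the
// target's (X, Z) position into an absolute angle, turn that into a
// step count, and move by the difference from the last known position.
private const double DegreesPerStep = 0.5;   // rough estimate
private int _currentSteps = 0;               // 0 == pointing straight ahead

private void AimAt(float x, float z)
{
    // Angle from straight ahead to the target, in degrees
    double angle = Math.Atan2(x, z) * (180.0 / Math.PI);

    // Where the launcher *should* be, in steps, vs. where it is
    int targetSteps = (int)Math.Round(angle / DegreesPerStep);
    int delta = targetSteps - _currentSteps;

    if (delta > 0)
        MoveCannon(new CannonVector { Direction = CannonDirection.Right, Steps = delta });
    else if (delta < 0)
        MoveCannon(new CannonVector { Direction = CannonDirection.Left, Steps = -delta });

    _currentSteps = targetSteps;
}
```

The key point is that dividing X by Z before taking the arc tangent is what makes the aim depth-independent: a target one meter to the side reads as a much smaller angle at five meters out than at two.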

With this logic in place, the tracking became fairly good, regardless of the distance between the target and the Kinect.

Assumptions Uncovered

Since there really wasn’t any feedback that the launcher could give us about its current position, this logic makes a couple of major assumptions about the world. First, the Kinect and the launcher have to be pointed straight ahead to begin with, and second, the Kinect needs to remain pointing ahead.

We uncovered the first assumption when the launcher stopped responding to commands to move right. We could move it to the left, but not to the right. We fired up the application that comes with it, and discovered a “Reset” button that caused the launcher to swivel all the way to one side, then to a “center” point. This center point was actually denoted by a raised arrow on the launcher’s base – something I had not seen up to this point. After we reset it, it would move left and right just fine. As it turns out, the launcher can’t move 360 degrees indefinitely – it has definite bounds. The reset function moved it back to center to maximize the left/right motion.

After we discovered that, I would jump out to that app to reset the launcher, and then I had to shut it down again before I could use ours (two apps couldn’t send commands to the launcher – in fact we got runtime errors if we tried to run both apps at the same time). After a while that got old, so we included a reset of our own. Since we knew the launcher’s current position, we’d just move in the opposite direction that amount. We added a Reset button to our own app, and also called the same method when the app was put back to Manual tracking and when it was shut down.
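Since the app already tracks the launcher's position in steps, the reset is just a move of the same size in the opposite direction (a sketch; the field and method names are assumptions):

```csharp
// Sketch of the home-grown reset described above: undo the current
// offset by moving the same number of steps the other way.
private void ResetLauncher()
{
    if (_currentSteps > 0)
    {
        MoveCannon(new CannonVector { Direction = CannonDirection.Left, Steps = _currentSteps });
    }
    else if (_currentSteps < 0)
    {
        MoveCannon(new CannonVector { Direction = CannonDirection.Right, Steps = -_currentSteps });
    }
    _currentSteps = 0;
}
```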

We uncovered the second assumption in a rather amusing way. During one of our tests we noticed the cannon was constantly aiming off to Doug’s (our target at the time) right. He could move left or right, but the launcher was always off. He happened to look up and noticed that the Kinect had been bumped, so it wasn’t pointing directly ahead any more. As a result, the camera was looking off to one side and all of its commands were off. After that, we were much more careful about checking the Kinect’s alignment, and not bumping it.

Some fun to be had

Early on we had thought up a “fun” piece of icing on this electronic cake. What if we took the video image from the camera, and superimposed crosshairs on it? We could literally float an image with a transparent background over the image control on the form. If we could get the scaling right, it could track on top of the user’s ShoulderCenter joint.

And we did. This is turned on using the “Just for Mike” button at the bottom of the app. During the agency meeting demo, I had walked through the basic tracking, using Mike (our President) as the target, and explained about the video and depth images. Then – very dramatically – I “noticed” the screen and turned to Doug (who was running the computer) – “uh, Doug? I think we’re missing something.” At which point he hit the button to add the cross hairs to the video image. “There we go! That’s better.” Mike got a good laugh out of it, as did most of the rest of the audience. Fun? Check!

Beyond the fun, though, I thought it was cool that we could merge the video and depth information to such great effect. Between having the launcher track you, and seeing the cross hairs on your chest – it’s downright eerie.

Performance Issues

So, by this point, we had launcher tracking, both video and depth images refreshing 30 times a second, and crosshairs.

And everything was running on the same thread.

Yeah. We now had some performance issues to solve.

When the launcher moved at all, and especially when it fired (which took 2-3 seconds to power up and release), the images would completely freeze, waiting for the launcher to complete. The easy solution? Duh! Just put the launcher and the image updates on their own threads. Um, yeah. That turned out to be easier said than done. We’ll cover the multi-threading solution, as well as the speech recognition features in Part 3. Those two topics turn out to be intertwined.

One Friday back in late June, he mentioned that Microsoft had released a Beta SDK for the Kinect just the week before. He asserted “We need to do something with it. I have a Kinect I can bring in.” By “do something with it” he meant in the Friday lunch sessions we’d been holding for a year called “Sandbox”. Sandbox was where a small group of us got together to work with something we didn’t normally get to use during our day jobs. We tried to keep it light and fluffy, and the Kinect fit both to a T.

We decided that the Kinect would feed commands to the launcher telling it where to aim, and we’d use the speech recognition abilities of the Kinect to let the person in the cross-hairs say the word “fire” to send off a missile. And so began a series of Fridays where we hacked together an app that turned the Kinect into a target-tracking system for the launcher. We thought we had the hard stuff already done for us – we simply needed to write something that would connect A to B. But, as any good project should be, we found the easy parts weren’t so easy, and were pushed and prodded into learning something new. This is the first of a three-part series describing our solution.

Before we dive into code, I want to call out the software, frameworks, and SDKs that were ultimately needed for this project. Some of these were called out by the quickstart tutorials, and the rest were discovered along the way. These were installed in this order:

First, our laptop started with Visual Studio 2010 Professional, but in the quickstart tutorials, Dan (Fernandez) mentions that he’s working with the Express version.

The sample application that comes with the launcher. This includes the drivers for the launcher itself, USBHID.dll. The .NET wrapper provided by Matt Ellis will poke into the OpenHID method to send commands to the launcher.

The DirectX End-User Runtime. This is required by the DirectX SDK, and is available from here. This installer will need to be run as an Administrator (on Windows 7, anyway), and I had to do a manual restart of the machine after it finished. The installer will not prompt you to do this, but the DirectX SDK (the next step) wouldn’t install correctly until I did.

The Coding4Fun Kinect Library available from http://c4fkinect.codeplex.com. This is not strictly required, but contains a couple of extension methods that simplify translating the Kinect camera data into images.