Avateering with Kinect V2 – Joint Orientations

For my own learning I wanted to understand the process of using the Kinect V2 to drive the real-time movement of a character made in 3D modelling software. This post is the first part of that learning: taking the joint orientation data provided by the Kinect SDK and using it to position and rotate ‘bones’, which I will represent by rendering cubes, since this is a very simple way to visualise the data. (I won’t cover smoothing the data or modelling/rigging in this post.) The result should be something similar to the Kinect Evolution Block Man demo, which can be found via the Kinect SDK browser.

To follow along you will need a working Kinect V2 sensor with USB adapter, a fairly high-specced machine running Windows 8.0/8.1 with USB 3.0 and a DirectX 11-compatible GPU, and the Kinect V2 SDK installed. Here are some instructions for setting up your environment.

To back up a little, there are two main ways to represent body data from the Kinect: the first is to use the absolute positions provided by the SDK, which are values in 3D camera space measured in metres; the other is to use the joint orientation data to rotate a hierarchy of bones. The latter is the one we will look at here. There is an advantage in using joint orientations: as long as your model has the same overall skeleton structure as the Kinect data, it doesn’t matter so much what the relative sizes of the bones are, which frees up the modelling constraints. The SDK has done the job of calculating the rotations from the absolute joint positions for us, so let’s explore how we can apply those orientations in code.

Code

I am going to program this by starting with the DirectX and XAML C++ template in Visual Studio which provides a basic DirectX 11 environment, with XAML integration, basic shaders and a cube model described in code.

Body Data

Let’s start by getting the body data into our program from the sensor. As always we start by getting a KinectSensor object, which I will initialise in the Sample3DSceneRenderer class constructor; then we open a BodyFrameReader on the BodyFrameSource, for which there is a handy property on the KinectSensor object. We hold the sensor object and the reader object as class variables, as we don’t want them to fall out of scope. Additionally, we need to create a vector of type Body to store the data supplied by the sensor. Once we have the opened reader object we can use it to pull the latest frame of body data from within our render loop. I’m not modifying the structure of the project template, so I am using the Sample3DSceneRenderer class and inserting my code into the Render function. So, to initialise:

```cpp
_sensor = KinectSensor::GetDefault();
_reader = _sensor->BodyFrameSource->OpenReader();
_bodies = ref new Vector<Body^>(_sensor->BodyFrameSource->BodyCount);
_sensor->Open();
```

and from within the Render function:

```cpp
{
    auto bodyFrame = _reader->AcquireLatestFrame();
    if (bodyFrame != nullptr)
    {
        bodyFrame->GetAndRefreshBodyData(_bodies);
        updated = true;
    }
}
```

Note the use of scoping to ensure that the body frame gets closed as soon as possible. Then we can write a loop like this to process the body data:

```cpp
for (auto body : _bodies)
{
    if ((Body^)body == nullptr || !body->IsTracked)
        continue;

    // do stuff here…
}
```

I read through quite a few discussions on the Kinect SDK forums here, but I didn’t find anything that I felt provided a clear description of how to use the joint orientations. The best source was the code for the Block Man demo, since it worked well, but key concepts and assumptions can often get hidden inside working code, so I set about clearing that up in my mind. I felt that I needed a few additional things to help me explore the scenario: an orbit camera to allow orbiting and zooming in a scene, a floor-plane grid and some positional markers. I find it really helpful to be able to explore a 3D scene from different angles and also to be able to draw markers at key locations.

Kinect Joint Hierarchy

The first subject to consider is how the Kinect joint hierarchy is constructed, as it is not made explicit in the SDK. Each joint is identified by one of the JointType enum values (SpineBase, SpineMid, Neck, Head and so on — 25 in total):

Starting with the SpineBase, which can be considered the root of the hierarchy, we end up with something like this:

Which corresponds to the following skeleton:

I added some utility code to the project to represent this hierarchy — two functions, CreateBoneHierarchy and TraverseBoneHierarchy. The first creates an in-memory representation of the parent–child relationships between the joints, and the second does a depth-first traversal of the hierarchy, allowing a function/lambda to be called as each node is traversed. I will use the traversal function to draw and transform each bone in the skeleton.
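To make the shape of those two utilities concrete, here is a stripped-down sketch in plain C++. The Bone type and the fragment of the skeleton wired up here are illustrative, not the types from my project, which covers all 25 joints:

```cpp
#include <functional>
#include <string>
#include <vector>

// A minimal stand-in for the bone hierarchy described above.
struct Bone
{
    std::string joint;
    std::vector<Bone*> children;
};

// CreateBoneHierarchy: build the parent->child links, rooted at SpineBase.
// Only the spine-to-head chain of the full Kinect skeleton is wired up here.
Bone* CreateBoneHierarchy(std::vector<Bone>& storage)
{
    storage = {
        { "SpineBase", {} }, { "SpineMid", {} },
        { "SpineShoulder", {} }, { "Neck", {} }, { "Head", {} }
    };
    storage[0].children.push_back(&storage[1]); // SpineBase -> SpineMid
    storage[1].children.push_back(&storage[2]); // SpineMid -> SpineShoulder
    storage[2].children.push_back(&storage[3]); // SpineShoulder -> Neck
    storage[3].children.push_back(&storage[4]); // Neck -> Head
    return &storage[0];
}

// TraverseBoneHierarchy: depth-first walk, calling the supplied lambda at
// each node. Parents are always visited before their children, which is
// what lets each bone pick up its parent's accumulated transform.
void TraverseBoneHierarchy(Bone* node, const std::function<void(Bone*)>& fn)
{
    if (node == nullptr) return;
    fn(node);
    for (auto* child : node->children)
        TraverseBoneHierarchy(child, fn);
}
```

The parent-before-child visiting order is the important property: it guarantees that by the time a bone is drawn, its parent's transformed end position is already available.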

Bones

To draw each separate bone I modified the original cube model that was supplied with the default project template. I changed the coordinates of the original cube so that one end was at the origin and the other was 4 units away in the y-direction; when rendered without an additional transform it looks like the orange cube below, and when rotated 90 degrees it looks like the dark blue cube. The point is that the model is not centred on the origin, so it won’t rotate about its centre but about its end.
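To see why the pivot sits at the end rather than the centre, consider what a rotation does to the bone’s two extremes. A toy calculation in plain C++ (standing in for the DirectXMath transforms used in the project) shows the origin end staying pinned while the far end swings:

```cpp
#include <cmath>

struct P { float x, y, z; };

// Rotate a point about the z-axis; enough to illustrate the pivot.
P RotateZ(P p, float radians)
{
    float c = std::cos(radians), s = std::sin(radians);
    return { c * p.x - s * p.y, s * p.x + c * p.y, p.z };
}
```

Rotating the bone’s far end (0, 4, 0) by 90 degrees swings it to roughly (-4, 0, 0), while the end at the origin does not move at all — exactly the joint-like behaviour we want.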

Quaternions

I’m not going to delve into quaternions here; suffice it to say that they are a way to describe an orientation in 3D space and are used to avoid the gimbal-lock problems which arise from using Euler angles for rotation. They provide a great way to store and animate rotations but ultimately are converted back to matrix form, and your graphics programming environment most likely provides functions to do this. See this for further information.

Transforming and Rendering

As we traverse the joint hierarchy we need to position our local origin at the end of our parent bone before we apply our local model matrix which will in turn apply the joint orientation rotations and also scale the cube according to which bone it represents. To illustrate this let’s look at the first three bones drawn – note that I also draw a marker at each local origin.

Here are the main steps in the render function in pseudocode:

```
// Look up joint orientation data
orientation = body->JointOrientations->Lookup(jointType)

// Create rotation matrix
rotationMatrix = matrixFromQuaternion(orientation)

// Get our local origin
origin = parent->transformed()

// Draw marker transformed to local origin
DrawMarker(origin)

// Create model matrix
model = scale * translate * rotate

// Transform position of child local origin and store it
transformed = model * endOfBone

// Draw bone
DrawBone(model)
```

In the actual code this is complicated a little by the following:

– The leaf joint orientations are set to zero, so the leaf bones just take their parent orientations – this is the same in the Block Man implementation.

– If we are at the root we need to transform to the absolute position of the joint (this positions the whole body in camera space).

Here is the code I used for drawing each bone:

```cpp
// Look up the joint orientation for this joint
t->_orientation = body->JointOrientations->Lookup(t->JointType());

// If the orientation is zero, use the parent orientation. (Some of the leaf
// joint orientations are zero.)
JointOrientation orientation = t->_orientation;
auto v4 = XMFLOAT4(t->_orientation.Orientation.X,
                   t->_orientation.Orientation.Y,
                   t->_orientation.Orientation.Z,
                   t->_orientation.Orientation.W);
auto parent = t->Parent();
if (XMVector4Equal(XMLoadFloat4(&v4), XMVectorZero()) && parent != nullptr)
{
    orientation = parent->_orientation;
}

// Create a rotation matrix from the orientation quaternion. If we are at the
// root, start with a transform to take us to the absolute position of the
// whole body. If we are not at the root, start with the parent's transform.
auto f4 = XMFLOAT4(orientation.Orientation.X, orientation.Orientation.Y,
                   orientation.Orientation.Z, orientation.Orientation.W);
auto rotMatrix = XMMatrixRotationQuaternion(XMLoadFloat4(&f4));

if (parent != nullptr)
{
    transformed = parent->_transformed;
}
else
{
    // We are at the root, so transform to the absolute position (this
    // transform will affect all bones in the hierarchy)
    auto pos = body->Joints->Lookup(t->JointType()).Position;
    auto v3 = XMFLOAT3(FACTOR * pos.X, FACTOR * pos.Y, FACTOR * pos.Z);
    transformed = XMLoadFloat3(&v3);
}

// Convert the vector into a transform matrix and store into the model matrix
```

Comments

I haven’t been able to capture ‘yes’ and ‘no’ head nods with Kinect V2; it seems the sensor just doesn’t detect rotation of the head joint around the Y axis, nor tilting forward of the neck bone unless it’s very exaggerated. Can you suggest a way that these conversational gestures could be avateered?

I use Kinect v1 for avateering, but I find the bone orientations calculated by the SDK only guarantee the y direction of each bone (because the y direction is the bone’s direction), while the x and z directions are not accurate (they will have some kind of undesired rotation). This causes serious artifacts in the skinning results. Does Kinect v2 have the same problem (for example, when you just raise your arm with no rotation, the animated avatar raises its arm but also rotates it)?