Question

I have a semester project with Kinect. I have to improve an existing app and add new functionality to it. The problem is that the app uses an outdated Kinect SDK. Some of the extra functionality I want to add (personally) needs the new Kinect SDK. Is there a quick guide for migrating from the Kinect SDK Beta to the newest SDK? What has changed besides the assembly references?

Solution

I found the following information in this post:

All the credit for the information from here on goes to the original poster of that article; I am simply sharing his knowledge.

If you had been working with the beta 2 of the Kinect SDK prior to February 1st, you may have felt dismay at the number of API changes that were introduced in v1.

To get the right and left hand joints, this is the code you used to write:

Joint jointRight = sd.Joints[JointID.HandRight];
Joint jointLeft = sd.Joints[JointID.HandLeft];

Now you first need an array of skeletons:

Skeleton[] skeletons = new Skeleton[0];

and then you iterate over the skeletons (the array is filled from the SkeletonFrame inside the frame-ready callback, as the full example further down shows):

foreach (Skeleton skeleton in skeletons)

and inside the loop you get the joints using:

Joint rightHand = skeleton.Joints[JointType.HandRight];
Joint leftHand = skeleton.Joints[JointType.HandLeft];

For the camera elevation, you used to write:

_nui.NuiCamera.ElevationAngle = 17;

Now you simply use the sensor object you created (how it replaces the Runtime class is explained below) and write:

sensor.ElevationAngle = 17;
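
A small hedged addition of my own: KinectSensor also exposes MinElevationAngle and MaxElevationAngle, so you can clamp the requested angle to the range the sensor actually supports before assigning it:

// clamp the requested angle (my addition, not part of the original post)
sensor.ElevationAngle = Math.Max(sensor.MinElevationAngle,
                                 Math.Min(sensor.MaxElevationAngle, 17));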

Manipulating the color image frame: this is what you had to write before:

    rawImage.Source = e.ColorImageFrame.ToBitmapSource();

Now you have to open the color image frame and check that something was actually returned before doing the above. Converting to a bitmap source has also changed. The transformation looks like this:

using (var videoFrame = e.OpenColorImageFrame())
{
    if (videoFrame != null)
    {
        var bits = new byte[videoFrame.PixelDataLength];
        videoFrame.CopyPixelDataTo(bits);
    }
}
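
The snippet above only copies the raw pixels; turning them back into an image is left to you. Here is a minimal sketch of my own, assuming the default Bgr32 color format and a WPF Image control named rawImage (it needs System.Windows.Media and System.Windows.Media.Imaging):

using (var videoFrame = e.OpenColorImageFrame())
{
    if (videoFrame != null)
    {
        var bits = new byte[videoFrame.PixelDataLength];
        videoFrame.CopyPixelDataTo(bits);

        // rebuild a WPF BitmapSource from the raw Bgr32 pixel data
        rawImage.Source = BitmapSource.Create(
            videoFrame.Width, videoFrame.Height,
            96, 96,                                        // DPI
            PixelFormats.Bgr32, null,
            bits,
            videoFrame.Width * videoFrame.BytesPerPixel);  // stride in bytes
    }
}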

After porting several Kinect applications from the beta 2 to v1, however, I finally started to see a pattern to the changes. For the most part, it is simply a matter of replacing one set of boilerplate code for another set of boilerplate code. Any unique portions of the code can for the most part be left alone.

In this post, I want to demonstrate five simple code transformations that will ease your way from the beta 2 to the Kinect SDK v1. I’ll do it boilerplate fragment by boilerplate fragment.

Namespaces have been shifted around. Microsoft.Research.Kinect.Nui is now just Microsoft.Kinect. Fortunately Visual Studio makes resolving namespaces relatively easy, so we can just move on.
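
In practice this just means swapping the using directive:

// beta 2
// using Microsoft.Research.Kinect.Nui;

// v1
using Microsoft.Kinect;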

The Runtime type, the controller object for working with data streams from the Kinect, is now called a KinectSensor type. Grabbing an instance of it has also changed. You used to just new up an instance like this:

Runtime nui = new Runtime();

Now you instead grab an instance of the KinectSensor from a static array containing all the KinectSensors attached to your PC.

KinectSensor sensor = KinectSensor.KinectSensors[0];
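
A slightly more defensive variant (my addition, not from the original post) is to pick the first sensor that is actually connected instead of blindly taking index 0; this needs using System.Linq:

KinectSensor sensor = KinectSensor.KinectSensors
    .FirstOrDefault(s => s.Status == KinectStatus.Connected);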

Initializing a KinectSensor object to start reading the color stream, depth stream or skeleton stream has also changed. In the beta 2, the initialization procedure just didn’t look very .NET-y. In v1, this has been cleaned up dramatically. The beta 2 code for initializing a depth and skeleton stream looked like this:

_nui.SkeletonFrameReady += new EventHandler<SkeletonFrameReadyEventArgs>(_nui_SkeletonFrameReady);
_nui.DepthFrameReady += new EventHandler<ImageFrameReadyEventArgs>(_nui_DepthFrameReady);
_nui.Initialize(RuntimeOptions.UseDepthAndPlayerIndex | RuntimeOptions.UseSkeletalTracking);
_nui.DepthStream.Open(ImageStreamType.Depth, 2, ImageResolution.Resolution320x240, ImageType.DepthAndPlayerIndex);

In v1, this boilerplate code has been altered so the Initialize method goes away, roughly replaced by a Start method. The Open methods on the streams, in turn, have been replaced by Enable. The DepthAndPlayerIndex data is made available simply by having the skeleton stream enabled. Also note that the event argument types for the depth and color streams are now different. Here is the same code in v1:

sensor.SkeletonFrameReady += 
    new EventHandler<SkeletonFrameReadyEventArgs>(
        sensor_SkeletonFrameReady
        );
sensor.DepthFrameReady += 
    new EventHandler<DepthImageFrameReadyEventArgs>(
        sensor_DepthFrameReady
        );
sensor.SkeletonStream.Enable();
sensor.DepthStream.Enable(
    DepthImageFormat.Resolution320x240Fps30
    );
sensor.Start();
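
If you also want the color stream used in the color frame example above, it is enabled the same way. The lines below are my addition for completeness (sensor_ColorFrameReady is a placeholder handler name), not one of the original five transformations:

sensor.ColorFrameReady +=
    new EventHandler<ColorImageFrameReadyEventArgs>(
        sensor_ColorFrameReady
        );
sensor.ColorStream.Enable(
    ColorImageFormat.RgbResolution640x480Fps30
    );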

Transform Smoothing: it used to be really easy to smooth out the skeleton stream in beta 2. You simply turned it on.

nui.SkeletonStream.TransformSmooth = true;

In v1, you have to create a new TransformSmoothParameters object and pass it to the skeleton stream's Enable method. Unlike the beta 2, you also have to initialize the values yourself, since they all default to zero.

sensor.SkeletonStream.Enable(new TransformSmoothParameters()
{
    Correction = 0.5f,
    JitterRadius = 0.05f,
    MaxDeviationRadius = 0.04f,
    Smoothing = 0.5f
});

Stream event handling: handling the ready events from the depth stream, the video stream and the skeleton stream also used to be much easier. Here’s how you handled the DepthFrameReady event in beta 2 (skeleton and video followed the same pattern):

void _nui_DepthFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    var frame = e.ImageFrame;
    var planarImage = frame.Image;
    var bits = planarImage.Bits;
    // your code goes here
}

For performance reasons, the newer v1 code looks very different and the underlying C++ API leaks through a bit. In v1, we are required to open the image frame and check to make sure something was returned. Additionally, we create our own array of bytes (for the depth stream this has become an array of shorts) and populate it from the frame object. The PlanarImage type which you may have gotten cozy with in beta 2 has disappeared altogether. Also note the using keyword to dispose of the ImageFrame object. The transliteration of the code above now looks like this:

void sensor_DepthFrameReady(object sender
    , DepthImageFrameReadyEventArgs e)
{
    using (var depthFrame = e.OpenDepthImageFrame())
    {
        if (depthFrame != null)
        {
            var bits =
                new short[depthFrame.PixelDataLength];
            depthFrame.CopyPixelDataTo(bits);
            // your code goes here
        }
    }
}
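
Because the skeleton stream is enabled, each short in the depth array packs the player index into its lowest bits. As a small sketch of my own, the depth in millimeters and the player index can be unpacked with the bitmask constants the SDK provides:

// inside the "your code goes here" section above
for (int i = 0; i < bits.Length; i++)
{
    int depthInMillimeters = bits[i] >> DepthImageFrame.PlayerIndexBitmaskWidth;
    int playerIndex = bits[i] & DepthImageFrame.PlayerIndexBitmask;
    // playerIndex 0 means no player at this pixel; 1-6 identify tracked players
}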

I have noticed that many sites and libraries that were using the Kinect SDK beta 2 still have not been ported to Kinect SDK v1. I certainly understand the hesitation given how much the API seems to have changed.

If you follow these five simple translation rules, however, you’ll be able to convert approximately 80% of your code very quickly.

OTHER TIPS

With the latest SDK, your SkeletonFrameReady callback should look something like this:

private Skeleton[] _skeletons = new Skeleton[0];

private void OnSkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame())
    {
        if (skeletonFrame == null || skeletonFrame.SkeletonArrayLength == 0)
            return;

        // resize the skeletons array if needed
        if (_skeletons.Length != skeletonFrame.SkeletonArrayLength)
            _skeletons = new Skeleton[skeletonFrame.SkeletonArrayLength];

        // get the skeleton data
        skeletonFrame.CopySkeletonDataTo(_skeletons);

        foreach (var skeleton in _skeletons)
        {
            // skip the skeleton if it is not being tracked
            if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
                continue;

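            // leftElbow and rightHand are assumed to be fields declared elsewhere in the class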
            leftElbow = skeleton.Joints[JointType.ElbowLeft];
            rightHand = skeleton.Joints[JointType.HandRight];
        }
    }
}

Notice that SkeletonData and JointID no longer exist. You get a collection of Skeleton objects, each with a Joints collection. You can pull individual joints out using the JointType enum.

JointCollections are returned for each Skeleton and can be accessed by calling Skeleton.Joints. You can reference the array for an individual joint, or save the JointCollection off for some other processing.
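
One more hedged note of my own: individual joints also carry their own tracking state, so it is worth checking it before trusting the position:

Joint rightHand = skeleton.Joints[JointType.HandRight];
if (rightHand.TrackingState == JointTrackingState.Tracked)
{
    // Position is a SkeletonPoint with X, Y, Z in meters
    float handX = rightHand.Position.X;
    float handY = rightHand.Position.Y;
}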

Scaling is not specific to the SDK. When scaling, you are taking a real-world coordinate from the Kinect and mapping it onto the screen. How you get those real-world coordinates might be slightly different (i.e., how you access the skeletons), but the scaling itself is no different. The SDK has no built-in function to scale an individual joint, such as myJoint.ScaleTo().

The Coding4Fun library has a scaling function that will allow you to scale joint positions to screen pixels. Alternatively, you can write your own to match a specific need, such as:

private static double ScaleY(Joint joint)
{
    double y = ((SystemParameters.PrimaryScreenHeight / 0.4) * -joint.Position.Y) + (SystemParameters.PrimaryScreenHeight / 2);
    return y;
}

private static void ScaleXY(Joint shoulderCenter, bool rightHand, Joint joint, out int scaledX, out int scaledY)
{
    double screenWidth = SystemParameters.PrimaryScreenWidth;

    double x = 0;
    double y = ScaleY(joint);

    // if rightHand then place shoulderCenter on the left of the screen,
    // else place shoulderCenter on the right of the screen
    if (rightHand)
    {
        x = (joint.Position.X - shoulderCenter.Position.X) * screenWidth * 2;
    }
    else
    {
        x = screenWidth - ((shoulderCenter.Position.X - joint.Position.X) * (screenWidth * 2));
    }


    if (x < 0)
    {
        x = 0;
    }
    else if (x > screenWidth - 5)
    {
        x = screenWidth - 5;
    }

    if (y < 0)
    {
        y = 0;
    }

    scaledX = (int)x;
    scaledY = (int)y;
}
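
A hypothetical usage of the helper above (the names are placeholders), inside the skeleton loop from the callback earlier:

Joint shoulderCenter = skeleton.Joints[JointType.ShoulderCenter];
Joint hand = skeleton.Joints[JointType.HandRight];

int cursorX, cursorY;
ScaleXY(shoulderCenter, true, hand, out cursorX, out cursorY);

// cursorX / cursorY are now screen pixels, e.g. for positioning a cursor element:
// Canvas.SetLeft(cursorElement, cursorX);
// Canvas.SetTop(cursorElement, cursorY);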

Or something like this:

double xScaled = (rightHand.Position.X - leftShoulder.Position.X) / ((rightShoulder.Position.X - leftShoulder.Position.X) * 2) * SystemParameters.PrimaryScreenWidth;
double yScaled = (rightHand.Position.Y - head.Position.Y) / (rightHip.Position.Y - head.Position.Y) * SystemParameters.PrimaryScreenHeight;

For scaling, all you are doing is defining where in the real world (i.e., the Kinect coordinates) the left, right, top and bottom of your screen are. You are just telling your application that "this Kinect coordinate is equal to this screen pixel".

IS SCALING NEEDED?

Some sort of scaling is required in order to interact with objects on the screen. The Kinect returns values in meters, relative to its field of view. It would not be a usable system without scaling.

Remember that scaling is nothing unique to the Kinect or to the old vs. new SDK. You have one coordinate system that you are working in and another coordinate system you need to translate to. This happens in lots of different situations. What you are doing is saying that "this" position in one coordinate system is equal to "that" position in the other coordinate system.

There are two basic ways to decide which position in the real world equals which pixel.

One is to take the Kinect's coordinate system and map it directly to the screen. This means that 0,0 for the Kinect is equal to 0,0 on the screen. You then take the outer bounds of the Kinect's system and map them to the screen's resolution.

I do not recommend this. It creates a very large space to work in and will frustrate users.

Another way is to create a "hit box". Have a look at the two-line translation above. It creates a hit box around the body to work in. Using the right hand, the left side of the screen is equal to the x-coordinate of your left shoulder; the right side of the screen is a short distance to the right of your right shoulder (your right shoulder's x-coordinate plus the distance between your two shoulders). The vertical range of the screen is mapped between your head and hips.

This method allows the user to stand anywhere in the Kinect's field of view and manipulate objects in the same way. The hit box it creates is also very comfortable to work in for your average user.

Licensed under: CC-BY-SA with attribution