Sunday, March 13, 2011

Kinect Color - Depth Camera Calibration

Kinect has two cameras, one for capturing a color image and the other for capturing an IR image. Although the IR camera provides real-time depth information, the depth map only tells us how far each of the IR camera's pixels is; we do not directly know the depth of the color image's pixels, because the two cameras have different viewpoints and intrinsics. As we can see in the image below, the pixels do not match between the two images: the locations of the hand and arm are completely different.

If we use the Kinect device for HCI, this mismatch does not matter much, because depth information alone is enough in most cases. However, if we want to use it for 3D scene capture, or to relate the RGB and depth images, we need to match the color image's pixels to the depth image's. Thus, we need to perform calibration.

Kinect camera calibration is no different from general camera calibration. We just need to capture several images of a chessboard pattern with both the IR and RGB cameras. When capturing images from the IR camera, we need to block the IR emitter with something so that corners can be detected reliably in the chessboard images. If we do not, the captured images will look like the one below and corner detection will fail.

If the lighting in your environment does not contain enough IR, you need a light source that emits IR rays (a halogen lamp, perhaps). It is also good to capture the same scenes with the two cameras. The images below were captured from the IR and RGB cameras, respectively.

Once the images are taken, we can calibrate each camera using the OpenCV API, the MATLAB calibration toolbox, or the GML calibration toolbox. After calibration, we obtain the intrinsic camera matrices, K_ir and K_rgb, and the distortion parameters of the two cameras.
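For example, with the OpenCV API, the intrinsic calibration of one camera can be sketched as below. This is only a minimal sketch; the function name is mine, and boardSize (inner corners) and squareSize are placeholders you must set for your own pattern.

#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: intrinsic calibration of one camera from chessboard images.
// boardSize and squareSize are assumptions; set them to match your pattern.
cv::Mat calibrateFromImages(const std::vector<cv::Mat>& images,
                            cv::Size boardSize, float squareSize,
                            cv::Mat& distCoeffs)
{
    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;

    // Chessboard corner positions in the pattern's own frame (Z = 0 plane).
    std::vector<cv::Point3f> corners3d;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            corners3d.push_back(cv::Point3f(x * squareSize, y * squareSize, 0.f));

    for (const cv::Mat& img : images)
    {
        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(img, boardSize, corners))
            continue;                              // skip views where detection fails
        imagePoints.push_back(corners);
        objectPoints.push_back(corners3d);
    }

    cv::Mat K;
    std::vector<cv::Mat> rvecs, tvecs;             // per-view extrinsics
    cv::calibrateCamera(objectPoints, imagePoints, images[0].size(),
                        K, distCoeffs, rvecs, tvecs);
    return K;                                      // 3x3 intrinsic matrix
}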

To achieve our goal, we need one more piece of information: the geometric relationship between the two cameras, expressed as a rotation matrix R and a translation vector t. To compute them, capture the same scene containing the chessboard pattern with both cameras and compute the extrinsic parameters of each. From the two sets of extrinsic parameters, the relative transformation can be computed easily.
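As a small sketch, assuming each camera's pose with respect to the pattern was obtained with cv::solvePnP and converted from a rotation vector to a matrix with cv::Rodrigues:

#include <opencv2/opencv.hpp>

// Sketch: relative transformation between the two cameras from one shared view.
// R_ir, t_ir and R_rgb, t_rgb are the extrinsics of the SAME chessboard view.
// A pattern point P maps as P_cam = R_cam * P + t_cam in each camera;
// eliminating P gives P_rgb = R * P_ir + t with:
void relativeTransform(const cv::Mat& R_ir,  const cv::Mat& t_ir,
                       const cv::Mat& R_rgb, const cv::Mat& t_rgb,
                       cv::Mat& R, cv::Mat& t)
{
    R = R_rgb * R_ir.t();          // inv(R_ir) = R_ir^T for a rotation matrix
    t = t_rgb - R * t_ir;
}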

Now we can compute the depth of the color image from the depth map provided by the IR camera. Consider a pixel p_ir in the IR image with measured depth Z. The 3D point P_ir corresponding to p_ir is computed by back-projecting p_ir into the IR camera's coordinate system:

P_ir = Z * inv(K_ir) * p_ir

P_ir can then be transformed into the RGB camera's coordinate system through the relative transformation R and t.

P_rgb = R * P_ir + t

Then, we project P_rgb onto the RGB image plane and obtain a 2D point p_rgb.

p_rgb = K_rgb * P_rgb

p_ir : A pixel in the IR image (in homogeneous coordinates)
P_ir : 3D point in the IR camera's coordinate system
R, t : Relative transformation between two cameras
P_rgb : 3D point in the RGB camera's coordinate system
p_rgb : The projection of P_rgb onto the RGB image

In the above, the conversion to and from homogeneous coordinates is omitted. When two or more 3D points project to the same 2D location in the RGB image, the closest one is chosen. We can also compute the color values of the depth map's pixels in the same way: p_ir's color is the color of p_rgb.
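Putting the three steps together, a minimal sketch of the registration loop could look like the following. It assumes metric depth values (see the comments below on converting raw intensities), images that are already undistorted, and double-precision K_ir, K_rgb, R, t; the function name and signature are mine.

#include <opencv2/opencv.hpp>

// Sketch: compute a depth map in the RGB camera's image from the IR depth map.
// depth : CV_32F, metric depth (meters), same size as the IR image.
cv::Mat registerDepthToRgb(const cv::Mat& depth,
                           const cv::Mat& K_ir, const cv::Mat& K_rgb,
                           const cv::Mat& R, const cv::Mat& t,
                           cv::Size rgbSize)
{
    cv::Mat out(rgbSize, CV_32F, cv::Scalar(0));
    cv::Mat Kir_inv = K_ir.inv();

    for (int v = 0; v < depth.rows; ++v)
        for (int u = 0; u < depth.cols; ++u)
        {
            float z = depth.at<float>(v, u);
            if (z <= 0.f) continue;                        // no measurement

            // Step 1: back-project p_ir into the IR camera's coordinate system.
            cv::Mat p = (cv::Mat_<double>(3, 1) << u, v, 1.0);
            cv::Mat P_ir = Kir_inv * p * z;                // P_ir = Z * inv(K_ir) * p_ir

            // Step 2: transform into the RGB camera's coordinate system.
            cv::Mat P_rgb = R * P_ir + t;

            // Step 3: project onto the RGB image plane and dehomogenize.
            cv::Mat q = K_rgb * P_rgb;
            int x = cvRound(q.at<double>(0) / q.at<double>(2));
            int y = cvRound(q.at<double>(1) / q.at<double>(2));
            if (x < 0 || x >= rgbSize.width || y < 0 || y >= rgbSize.height)
                continue;                                  // falls outside the RGB image

            // Z-buffer: keep the closest point when several map to one pixel.
            float zNew = (float)P_rgb.at<double>(2);
            float& zOld = out.at<float>(y, x);
            if (zOld == 0.f || zNew < zOld) zOld = zNew;
        }
    return out;
}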

Here is the resulting depth image of the RGB camera. Since the RGB camera sees a wider region than the IR camera, depth information is not available for all pixels.

If we overlay the RGB image and the computed depth image, we can see that the two match well, whereas they did not before calibration, as shown at the beginning of this post.

Here is a demo video showing the depth map of the RGB image and the color map of the depth image.

57 comments:

Can you tell me if the resolution of the RGB image and the depth map have to be the same for the registration process? As I read in other sources, the RGB cam has a 640x480 px resolution and the IR cam's output is downsampled!

Well, usually distortion parameters are used to remove lens distortions from captured images.

The relative transformation can be computed from the chessboard pattern.

Assume you capture two images of the same scene (one each from the RGB and IR cameras), and their extrinsic parameters are M_rgb and M_ir. Then the relative transformation that maps a 3D point from the IR camera's coordinate frame to the RGB camera's coordinate frame will be M = M_rgb * inv(M_ir); that is, R = R_rgb * R_ir^T and t = t_rgb - R * t_ir.

Hi, thanks for your work! Can you explain this point of the post in more detail: "To compute them, capture the same scene containing the chessboard pattern with the two cameras and compute extrinsic parameters. From two extrinsic parameters, the relative transformation can be computed easily." Many thanks.

Thanks for the howto. Your video shows perfect coverage... I want to have that too :) Could you provide a code snippet where you compute the matrices and the corresponding points? (Also, what are your extrinsic and intrinsic matrices?) I think I have done everything as described in your article, and I get values, but unfortunately they don't make any sense.

Hello Daniel, I am also working on this part and it is the final step in my project. Have you solved your problem? I also did everything but am getting a bad result now. Would you please give me some suggestions on my code below?

After this, I tried to show the final depth image, but the result is not good. The depth image I get does not match the original one well. Would you please take a few seconds to see whether my code is good or not? Thank you.

And for a better result, I used the ideal data for K_rgb, K_ir, R and t. I think the loop in my code correctly completes the 3 steps in your blog, but I still get a bad depth image. Is there any point that I handle in a wrong way? I'm looking forward to your reply. Thank you.

@ Jintao Lu / Hello. Judging from your code, it seems that you are loading depth values from the depth image. However, the depth values in the image (intensities) are not real depth values (in meters or centimeters). Thus, using the intensities as depth for back-projecting pixels to 3D points may result in a wrong 3D point cloud. You need to compute the real depth values from the intensities.
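For reference, a commonly circulated approximation for this conversion is Nicolas Burrus's empirical fit from the raw 11-bit Kinect disparity value to meters. Treat the constants as an assumption: they vary per device, so calibrate your own unit for accurate results.

// Approximate raw-disparity-to-meters conversion for the Kinect.
// Constants are Nicolas Burrus's widely circulated empirical fit;
// they are device-dependent and only a starting point.
float rawDepthToMeters(int raw)           // raw: 11-bit value, 0..2047
{
    if (raw >= 2047) return 0.f;          // 2047 means "no measurement"
    return 1.0f / (raw * -0.0030711016f + 3.3309495161f);
}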

Wow! Thank you for your detailed explanation a year after the publishing of this blog! I have checked everything with my supervisor and finally we are focusing on p_ir. I know something is wrong with this 3×1 matrix but I have no idea what. But now, thank you again! I will try now!

@ Jintao Lu / You may get some idea about depth-intensity calibration from the paper 'Depth-assisted 3D object detection for augmented reality', which can be found here : http://sites.google.com/site/wleeprofile/research/detection_rgbd .

Hello, thank you for your explanation; I have successfully converted intensities to actual distances. Now I have a question about the Z-axis value of P_rgb: is it also an actual distance? This value is similar to the actual distance in the depth image. If so, I wonder whether I need to convert again, from actual distance back to intensity, to get the final depth image of the RGB camera.

@ Jintao Lu / Hello. It seems that you are missing something. If you get 3D point data from an IR camera image, what you need to do in the next step is to map the 3D points from the coordinate system of the IR camera to that of the RGB camera. Finally, by projecting the transformed points onto the RGB image, you will be able to compute the depth of the pixels of the RGB image.

Hi Wonwoo... I tried to use your formulae, but have not been able to successfully map the depth and RGB images. It would be really helpful if you could provide some source code that can be used to map between the depth and RGB images of the Kinect.

OK, and how did you do that? (Could you give me a little step-by-step, please?)

I haven't found an option to say "this is camera 1 and this is camera 2". Have you loaded the pictures from camera 1 and calibrated, then from camera 2 and calibrated again? Or all together in one calibration? (I tried... I get no results; only in the combined case is there a strange extrinsic matrix for every image, I guess, but the rotation and translation matrices are empty.)

The problem is the extrinsic parameters... I don't know how to get the translation and rotation matrices via GML. I have pictures from camera 1 and camera 2, but how can I tell GML which is which, etc.? It always looks like GML is made for a single camera only.

Hi, I'm trying to test the calibration of my Kinect with the GML tool. What I do is repeat the calibration of the RGB cam with different images, taken from different positions. I'm obtaining different results for the camera matrix and distortion... is that correct? Shouldn't they be equal? I don't understand if I'm making some mistake... Thanks.

I'm trying to calibrate both the Kinect depth camera and the colour camera. The problem is I'm not sure about the image acquisition for the depth camera. Should I block the IR emitter and still run the following code? capture.retrieve( depthMap, CAP_OPENNI_DEPTH_MAP ); With the emitter blocked, the depth map is totally black. I think I misunderstand your idea. Could you help me to make it clear?

Thanks for your great work. Recently, I have been working on calibrating the Kinect color-depth cameras. I followed the steps you wrote, but unfortunately I got a weird result: the resulting depth map in the RGB camera coordinate system is divided into two identical halves, left and right. I have checked my code again and again, but I can't find what is wrong. Have you ever met such a problem? I look forward to your reply, thanks again.

There is one question: how do you make sure that the computed pixel coordinates p_rgb(x, y) do not go beyond the image range [480, 640]? I find that some x-coordinates p_rgb_x are greater than 480, so the operation result(p_rgb_x, p_rgb_y) = Z axis of P_rgb is wrong. The error message is: Microsoft C++ exception: cv::Exception at memory location 0x0089f860. Can you tell me why?