We present a new method to reconstruct a temporally coherent 3D animation from single- or multi-view RGB-D video data using unbiased feature point sampling. Given RGB-D video data, in the form of a 3D point cloud sequence, our method first extracts feature points using both color and depth information. In subsequent steps, these feature points are used to match two 3D point clouds in consecutive frames, independently of their resolution. Our new motion-vector-based dynamic alignment method then reconstructs a fully spatio-temporally coherent 3D animation. We perform extensive quantitative validation using novel error functions to analyze the results. We show that, despite the limiting factors of the temporal and spatial noise associated with RGB-D data, it is possible to faithfully reconstruct a temporally coherent 3D animation from RGB-D video data.

Digitization of dynamic real-world scenes is an important area in the fields of computer graphics and computer vision. These digital representations are used in product prototyping, game design, virtual reality, and many other related areas. Earlier approaches used multi-view camera systems or laser scanners to capture the motion, shape, and appearance of dynamic scenes. In our work, we show that it is possible to use very low-cost RGB-D cameras to achieve comparable results in the digitization of real-world scenes. We use Microsoft Kinect cameras, each of which combines a depth sensor and an RGB sensor: the depth sensor provides the scene's depth information, while the RGB sensor provides its color information. We show that, using the data from these two sensors, we can efficiently reconstruct a 3D representation of dynamic real-world scenes.

First, we acquire the scene with our own acquisition system, which yields depth and RGB data for each recorded frame. Our software-only multi-view acquisition is low cost and does not require any special hardware for multi-camera synchronization. The acquired dynamic depth and color data are then re-sampled into a dynamic point cloud representation of the real-world scene. Finally, we present a passive method that uses features from the depth and color images to reconstruct a temporally coherent representation of the dynamic point cloud. Our work demonstrates that, despite the limitations imposed by the Microsoft Kinect in terms of image quality and synchronization, it is possible to reconstruct a time-coherent 3D animation of a real-world object with a low-cost acquisition system.
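To make the re-sampling step concrete, the following is a minimal sketch of back-projecting a single Kinect depth frame into an organized 3D point cloud under a standard pinhole model. The intrinsic values shown are placeholders for illustration only; in practice they come from calibrating the camera.

```python
import numpy as np

# Placeholder pinhole intrinsics for a 640x480 Kinect depth image;
# real values must come from camera calibration (assumption).
FX, FY = 585.0, 585.0   # focal lengths in pixels
CX, CY = 320.0, 240.0   # principal point

def depth_to_cloud(depth_mm):
    """Back-project a depth image (in millimetres) into an organized
    HxWx3 point cloud; invalid (zero-depth) pixels map to z = 0."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32) / 1000.0  # convert to metres
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.stack((x, y, z), axis=-1)       # shape (h, w, 3)
```

Keeping the cloud organized (one 3D point per depth pixel) preserves the pixel-to-point correspondence that the feature-matching step below relies on.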
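Similarly, the sketch below illustrates one way to realize the matching step: 2D features are detected in consecutive color frames, matched, and lifted to 3D motion vectors via the organized point clouds from the previous sketch. OpenCV's ORB detector is used here purely as a stand-in; it is not our combined color-and-depth feature extraction.

```python
import cv2
import numpy as np

def motion_vectors(gray0, gray1, cloud0, cloud1):
    """Match 2D features between consecutive grayscale frames and lift
    them to 3D motion vectors via organized point clouds (HxWx3)."""
    orb = cv2.ORB_create(nfeatures=1000)      # stand-in feature detector
    k0, d0 = orb.detectAndCompute(gray0, None)
    k1, d1 = orb.detectAndCompute(gray1, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    vectors = []
    for m in matcher.match(d0, d1):
        u0, v0 = (int(c) for c in k0[m.queryIdx].pt)
        u1, v1 = (int(c) for c in k1[m.trainIdx].pt)
        p0, p1 = cloud0[v0, u0], cloud1[v1, u1]
        if p0[2] > 0 and p1[2] > 0:           # require valid depth at both ends
            vectors.append((p0, p1 - p0))     # anchor point and displacement
    return vectors
```

Each returned pair (anchor point, displacement) is a 3D motion vector of the kind the dynamic alignment step operates on; cross-checked brute-force matching is used here because it rejects asymmetric matches cheaply.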