ENGLISH ABSTRACT: The fairly recent introduction of low-cost depth sensors such as Microsoft’s Xbox Kinect
has encouraged a large amount of research on the use of depth sensors for many
common Computer Vision problems. Depth images are advantageous over normal colour images because objects in a scene can easily be segmented from them in real time. Microsoft uses the depth images from the Kinect to successfully separate multiple users and to track the larger body joints, but the system has difficulty tracking smaller joints such as those of the fingers. This is a result of the low resolution and noisy nature of the depth images produced by the Kinect.
The objective of this project is to use the depth images produced by the Kinect to remotely track the user's hands and to recognise static hand poses in real time. Such a system would make it possible to control an electronic device from a distance without the use of a remote control. It could be used to control computer systems during computer-aided presentations, to translate sign language, and to provide more hygienic control devices in clean rooms such as operating theatres and electronics laboratories.
The proposed system uses the open-source OpenNI framework to retrieve the depth images from the Kinect and to track the user's hands.
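A minimal sketch of this retrieval step is given below. It uses the community primesense Python bindings for OpenNI 2 rather than the OpenNI C++ API the project itself targets, so the module and call names here are illustrative assumptions rather than the project's actual code; the hand tracking itself is provided by the OpenNI middleware and is omitted.

```python
# Hypothetical sketch: reading one Kinect depth frame through OpenNI 2 via
# the 'primesense' Python bindings (pip install primesense). Names here are
# assumptions; the thesis works against the OpenNI framework directly.
import numpy as np
from primesense import openni2

openni2.initialize()                    # load the OpenNI 2 runtime
device = openni2.Device.open_any()      # first depth sensor found
depth_stream = device.create_depth_stream()
depth_stream.start()

frame = depth_stream.read_frame()       # one raw depth image
depth = np.frombuffer(frame.get_buffer_as_uint16(), dtype=np.uint16)
depth = depth.reshape((480, 640))       # default Kinect depth resolution
print(depth.min(), depth.max())         # raw depth values in millimetres

depth_stream.stop()
openni2.unload()
```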
Random Decision Forests are trained on computer-generated depth images of various hand poses and used to label the hand regions in a depth image.
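The per-pixel classification idea can be sketched as follows, assuming depth-difference features in the style of Shotton et al.'s body-part work and scikit-learn's RandomForestClassifier as a stand-in for the thesis' own forests; the toy depth image, labels and offsets below are fabricated purely for illustration.

```python
# Sketch: per-pixel hand-region classification with a random forest over
# depth-difference features. All data here is synthetic stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

BG = 10_000.0  # depth substituted for probes that fall outside the image

def pixel_features(depth, pixels, offsets):
    # Feature f = d(x + u/d(x)) - d(x + v/d(x)); dividing the offsets by
    # the depth at x makes the response roughly depth-invariant.
    h, w = depth.shape
    feats = np.empty((len(pixels), len(offsets)), dtype=np.float32)
    for i, (y, x) in enumerate(pixels):
        d = depth[y, x]
        for j, (u, v) in enumerate(offsets):
            vals = []
            for oy, ox in (u, v):
                py, px = int(y + oy / d), int(x + ox / d)
                inside = 0 <= py < h and 0 <= px < w
                vals.append(depth[py, px] if inside else BG)
            feats[i, j] = vals[0] - vals[1]
    return feats

rng = np.random.default_rng(0)
offsets = rng.uniform(-60_000, 60_000, size=(32, 2, 2))  # pixel*mm offsets

# Toy stand-ins for a synthetic render and its per-pixel part labels.
depth = rng.uniform(500, 4_000, size=(120, 160)).astype(np.float32)
labels = (depth < 2_000).astype(int)   # pretend: 1 = hand part, 0 = rest

ys, xs = np.mgrid[0:120:4, 0:160:4]
pixels = np.stack([ys.ravel(), xs.ravel()], axis=1)
X = pixel_features(depth, pixels, offsets)
y = labels[pixels[:, 0], pixels[:, 1]]

forest = RandomForestClassifier(n_estimators=3, max_depth=20).fit(X, y)
print("training accuracy:", forest.score(X, y))
```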
The labelled region images are then processed using a Mean-Shift based joint estimator to find the 3D joint coordinates.
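A bare-bones version of such a mode seeker is sketched below: starting from the weighted centroid of one region's 3D points, it hill-climbs to a density mode and reports that as the joint position. The bandwidth, the toy point cloud and the uniform weights are placeholders; the thesis' estimator may weight points differently.

```python
# Minimal mean-shift over the 3D points of one classified hand region.
import numpy as np

def mean_shift_mode(points, weights, bandwidth=25.0, iters=30, tol=1e-3):
    # Start at the weighted centroid and iterate toward a density mode.
    mode = np.average(points, axis=0, weights=weights)
    for _ in range(iters):
        d2 = np.sum((points - mode) ** 2, axis=1)
        k = weights * np.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel
        new_mode = (k[:, None] * points).sum(axis=0) / k.sum()
        if np.linalg.norm(new_mode - mode) < tol:
            return new_mode
        mode = new_mode
    return mode

# Toy cloud: 3D points (in mm) of pixels labelled as one hand part.
rng = np.random.default_rng(1)
cloud = rng.normal(loc=[10.0, -40.0, 800.0], scale=8.0, size=(200, 3))
joint = mean_shift_mode(cloud, weights=np.ones(len(cloud)))
print("estimated joint position (mm):", joint.round(1))
```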
These coordinates are finally used to classify the static hand pose with a Support Vector Machine trained using the libSVM library.
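This final stage might look like the sketch below, which uses libSVM's official Python bindings (pip install libsvm-official) with fabricated joint vectors in place of the real pose data; the joint count, kernel and parameters are assumptions, not the thesis' settings.

```python
# Sketch: classifying a static pose from flattened 3D joint coordinates
# with an SVM trained through libSVM. All pose data here is fabricated.
import numpy as np
from libsvm.svmutil import svm_train, svm_predict

rng = np.random.default_rng(2)
n_joints = 16                              # hypothetical joint count

def fake_pose(n=40):
    # Each pose is a cluster of flattened (x, y, z) joint vectors.
    centre = rng.normal(scale=50.0, size=3 * n_joints)
    return centre + rng.normal(scale=5.0, size=(n, 3 * n_joints))

X = np.vstack([fake_pose() for _ in range(4)])   # 4 static poses
y = np.repeat(np.arange(4), 40).tolist()

model = svm_train(y, X.tolist(), '-t 2 -c 4 -q')  # RBF kernel, quiet
pred, acc, _ = svm_predict(y, X.tolist(), model)
print('training accuracy: %.2f%%' % acc[0])
```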
The system achieves a final accuracy of 95.61% when tested against synthetic data and 81.35% when tested against real-world data.