Abstract

We propose a novel method for model-based 3D tracking of hand articulations that remains effective for fast-moving hand postures in depth images. Many augmented reality (AR) and virtual reality (VR) studies use model-based approaches to estimate hand posture and track hand movement. However, these approaches are limited when the hand moves rapidly or enters the camera's field of view. To overcome these problems, researchers have tried hybrid strategies that use multiple initializations for 3D tracking of articulations, but these strategies have limitations of their own. For example, in genetic optimization, hypotheses generated from the previous solution may explore the wrong search space when the hand moves fast. The same problem occurs, even when the hand moves slowly, if the search space selected from a trained model's output does not cover the true solution. The proposed method estimates hand pose by model-based tracking guided by classification and search-space adaptation. The classification output of a convolutional neural network (CNN) contributes a data-driven prior to the objective function and generates additional hypotheses for particle swarm optimization (PSO). In addition, the search spaces of the two sets of hypotheses, one generated from the data-driven prior and one from the previous solution, are adaptively updated using the distribution of each set. We demonstrate the effectiveness of the proposed method on an American Sign Language (ASL) dataset consisting of fast-moving hand postures. The experimental results show that the proposed algorithm tracks more accurately than other state-of-the-art tracking algorithms.
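
The interplay of the two hypothesis sets can be illustrated with a minimal sketch. It is not the authors' implementation: the quadratic prior penalty, the mean plus or minus k standard deviations rule for the search bounds, the PSO coefficients, and all names (`make_objective`, `adapt_bounds`, `track_frame`, `depth_error`) are assumptions made for illustration only. A pose is treated as a plain list of floats, and `depth_error` stands in for the model-to-depth-image discrepancy.

```python
import random
import statistics


def make_objective(depth_error, prior_pose, weight=0.1):
    """Model-fit error plus an assumed quadratic penalty toward the CNN-based prior."""
    def objective(pose):
        penalty = sum((p - q) ** 2 for p, q in zip(pose, prior_pose))
        return depth_error(pose) + weight * penalty
    return objective


def adapt_bounds(hypotheses, k=2.0, min_half_width=0.05):
    """Per-dimension search bounds from the spread of one hypothesis set (assumed rule)."""
    bounds = []
    for d in range(len(hypotheses[0])):
        values = [h[d] for h in hypotheses]
        center = statistics.fmean(values)
        half = max(k * statistics.pstdev(values), min_half_width)
        bounds.append((center - half, center + half))
    return bounds


def track_frame(objective, prev_solution, prior_solution,
                n_per_set=16, iters=30, init_spread=0.2, seed=0):
    """One frame of PSO with two hypothesis sets that share a global best."""
    rng = random.Random(seed)
    dims = len(prev_solution)
    sets = []
    # One set is seeded around the previous solution, the other around the prior.
    for center in (prev_solution, prior_solution):
        pos = [[c + rng.uniform(-init_spread, init_spread) for c in center]
               for _ in range(n_per_set)]
        sets.append({"pos": pos,
                     "vel": [[0.0] * dims for _ in range(n_per_set)],
                     "pbest": [p[:] for p in pos]})
    gbest = min((p for s in sets for p in s["pos"]), key=objective)[:]
    for _ in range(iters):
        for s in sets:
            # Each set gets its own bounds, recomputed from its current distribution.
            bounds = adapt_bounds(s["pos"])
            for i in range(n_per_set):
                for d in range(dims):
                    r1, r2 = rng.random(), rng.random()
                    s["vel"][i][d] = (0.7 * s["vel"][i][d]
                                      + 1.5 * r1 * (s["pbest"][i][d] - s["pos"][i][d])
                                      + 1.5 * r2 * (gbest[d] - s["pos"][i][d]))
                    lo, hi = bounds[d]
                    s["pos"][i][d] = max(lo, min(hi, s["pos"][i][d] + s["vel"][i][d]))
                if objective(s["pos"][i]) < objective(s["pbest"][i]):
                    s["pbest"][i] = s["pos"][i][:]
                    if objective(s["pbest"][i]) < objective(gbest):
                        gbest = s["pbest"][i][:]
    return gbest
```

In a toy fast-motion case, where the true pose lies far from the previous solution but close to the prior, the set seeded from the prior supplies the initial global best, so the returned pose cannot score worse than the best of the initial hypotheses.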