K-Pose Results on KTH Football II and HumanEva Datasets

Sara Ershadi-Nasab, Afshin Bozorgpour, Shohreh Kasaei, Esmaeil Sanaei

Abstract

Human body pose estimation is a challenging task because of the many degrees of freedom of the human body. In this paper, a novel and efficient method for 2D and 3D human body pose estimation is proposed. The method consists of two stages. In the first stage, the appearance of the human body is learned with cascaded random forests. Because the appearance of the human body varies widely, the training data is first partitioned into K clusters with the K-Means clustering algorithm. A global random forest is then trained on the whole training dataset, and a local one is trained for each cluster. Deep features extracted by a convolutional neural network (CNN), integral channel features (ChnFtr), histograms of oriented gradients (HOG), and the scale-invariant feature transform (SIFT) are used as feature descriptors. The performance of the model trained on each feature space is evaluated using the probability of correct parts (PCP) criterion. In the second stage, the spatial dependencies between body joints are modeled with a conditional random field defined on the tree model of the human body. Noise in the random forest output and confusion between the left and right sides of the body are corrected by a novel method based on 3D K-Means clustering. By using evidence across multiple viewpoints, a robust 3D pose estimation method is introduced. Inference is then performed with the max-product algorithm. Results on the KTH Football I and II datasets indicate that the proposed method substantially improves the probability of the correct pose compared with existing state-of-the-art methods. It is also faster and requires less memory.
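To make the PCP criterion mentioned in the abstract concrete, the following is a minimal sketch of one common variant, in which a body part counts as correct when both of its predicted endpoints lie within half the ground-truth part length of their true positions. The threshold of 0.5, the function name `pcp`, and the joint names are assumptions chosen for illustration; the abstract does not state which PCP variant the authors use.

```python
import math

def pcp(pred, gt, limbs, alpha=0.5):
    """Fraction of limbs whose two predicted endpoints both lie within
    alpha * (ground-truth limb length) of their true positions.
    Works for 2D or 3D joint coordinates given as tuples."""
    correct = 0
    for a, b in limbs:
        limb_len = math.dist(gt[a], gt[b])   # ground-truth limb length
        thresh = alpha * limb_len
        if (math.dist(pred[a], gt[a]) <= thresh
                and math.dist(pred[b], gt[b]) <= thresh):
            correct += 1
    return correct / len(limbs)

# Toy 2D example with hypothetical joint names and coordinates.
gt = {"shoulder": (0.0, 0.0), "elbow": (0.0, 10.0), "wrist": (0.0, 20.0)}
pred = {"shoulder": (1.0, 0.0), "elbow": (0.0, 12.0), "wrist": (8.0, 20.0)}
limbs = [("shoulder", "elbow"), ("elbow", "wrist")]

score = pcp(pred, gt, limbs)
print(score)
```

In this toy example the upper arm is counted as correct (endpoint errors of 1 and 2 against a threshold of 5), while the forearm is not, because the wrist error of 8 exceeds the threshold.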
K-Pose results on the KTH dataset