Multiple human tracking (MHT) is a fundamental task in many computer vision applications. Appearance-based approaches, primarily formulated on RGB data, are constrained by problems arising from occlusions and illumination variations. In recent years, the arrival of cheap RGB-depth devices has led to many new approaches to MHT, many of which integrate colour and depth cues to improve every stage of the process. In this survey, the authors present the common processing pipeline of these methods and review them based on (a) how they implement this pipeline and (b) what role depth plays within each stage of it. They identify and introduce existing, publicly available benchmark datasets and software resources that fuse colour and depth data for MHT. Finally, they present a brief comparative evaluation of the performance of those works that have applied their methods to these datasets.

Interest in biometric identification systems has led to many studies oriented towards the face recognition task. These studies often address the detection of face images taken from a camera and the recognition of faces via meaningful extracted features. To meet the requirement of describing data with fewer features, principal component analysis (PCA)-based techniques are widely used due to their efficiency and simplicity, and there is considerable interest in extending this traditional technique in various ways to improve its efficiency. From this viewpoint, this study focuses specifically on PCA-based face recognition techniques. Building on the methods in the reviewed studies, a novel class-wise two-dimensional PCA-based face recognition algorithm is presented. Unlike the traditional method, this method generates more than one subspace by considering within-class scattering. A system based on the presented approach can successfully detect and recognise faces not only in images but also in video files. In addition, analyses were conducted to evaluate the efficiency of the proposed algorithm and its extension in comparison with the other addressed PCA-based methods. On the basis of the experimental results, the presented approach and its extension are superior to the compared PCA-based algorithms.
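The class-wise idea described above can be illustrated with a minimal sketch: instead of one global subspace, fit a separate two-dimensional PCA (2DPCA) subspace per class and classify a query image by minimum reconstruction error. This is only an assumed reading of the abstract, not the authors' exact algorithm; all function names and the choice of reconstruction-error classification are hypothetical.

```python
import numpy as np

def fit_classwise_2dpca(images_by_class, n_components=5):
    """Fit one 2DPCA subspace per class (hypothetical sketch).

    images_by_class: dict mapping class label -> array of shape (n, h, w).
    Returns dict: label -> (mean image, projection matrix of shape (w, k)).
    """
    models = {}
    for label, imgs in images_by_class.items():
        imgs = np.asarray(imgs, dtype=float)
        mean = imgs.mean(axis=0)
        centred = imgs - mean
        # 2DPCA image covariance: average of A^T A over centred images
        G = sum(a.T @ a for a in centred) / len(centred)
        eigvals, eigvecs = np.linalg.eigh(G)
        proj = eigvecs[:, ::-1][:, :n_components]  # top-k eigenvectors
        models[label] = (mean, proj)
    return models

def classify(image, models):
    """Assign the class whose subspace reconstructs the image best."""
    best_label, best_err = None, np.inf
    for label, (mean, proj) in models.items():
        centred = image - mean
        recon = centred @ proj @ proj.T  # project, then back-project
        err = np.linalg.norm(centred - recon)
        if err < best_err:
            best_label, best_err = label, err
    return best_label
```

Because each class keeps its own subspace, within-class scatter shapes each projection matrix individually, which is the property the abstract contrasts against traditional single-subspace PCA.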

Action recognition is one of the hottest research topics in computer vision. Recent methods represent actions based on global or local video features. These approaches, however, lack semantic structure and may not provide deep insight into the essence of an action. In this work, the authors argue that semantic clues, such as joint positions and part-level motion clustering, help discriminate between actions. To this end, a meta-action descriptor for action recognition in RGBD video is proposed. Specifically, two discrimination-based strategies – dynamic and discriminative part clustering – are introduced to improve accuracy. Experiments conducted on the MSR Action 3D dataset show that the proposed method significantly outperforms methods that do not use joint-position semantics.
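The abstract does not detail the part-clustering step, but one plausible reading is that skeleton joints are grouped into body parts by the similarity of their motion. The sketch below is a hypothetical illustration of that idea only (simple k-means over per-joint displacement features with deterministic farthest-point initialisation), not the authors' dynamic or discriminative clustering strategies.

```python
import numpy as np

def cluster_joints_by_motion(joint_positions, n_parts=2, n_iter=20):
    """Group skeleton joints into parts by motion similarity (sketch).

    joint_positions: array of shape (n_frames, n_joints, 3) of 3D tracks.
    Returns an (n_joints,) array of part labels.
    """
    motion = np.diff(joint_positions, axis=0)            # frame-to-frame displacement
    feats = motion.transpose(1, 0, 2).reshape(motion.shape[1], -1)
    # Farthest-point initialisation keeps the sketch deterministic
    centres = [feats[0]]
    for _ in range(n_parts - 1):
        d = np.min([np.linalg.norm(feats - c, axis=1) for c in centres], axis=0)
        centres.append(feats[d.argmax()])
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(feats[:, None] - centres[None], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_parts):
            if np.any(labels == k):
                centres[k] = feats[labels == k].mean(axis=0)
    return labels
```

On a skeleton where, say, both arm joints swing together while the legs move differently, such clustering would recover an arm part and a leg part, giving the part-level structure the descriptor builds on.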

Since image background is normally composed of homogeneous regions, it can be represented by a feature dictionary via sparse representation. Based on this theory, the authors propose a novel bottom-up saliency detection method that combines the complementary merits of sparse representation and multi-hierarchical layers. In contrast to most pre-existing sparse-based approaches that only highlight the boundaries of a target, the proposed method highlights the entire object even if it is large. Given a source image, a multi-scale background dictionary is structured with features from different layers. Each region of the image is then reconstructed by the dictionary to compute its reconstruction error as a saliency score. Although a reconstruction map can be generated from the saliency scores, it is not good enough to be the final result because of its low resolution and high false-detection rate. Therefore, as a middle-level cue, they propose a multi-scale contour-zooming approach to address false detections across the hierarchical layers. To improve the resolution of the final detection, a pixel-level rectification based on the Bayesian observation likelihood is calculated as a bottom-level cue. By combining sparse representation and multi-scale correction, the precision of the final saliency map is significantly improved.
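The core scoring step, reconstructing each region from a background dictionary and using the residual as saliency, can be sketched as follows. This is a simplified assumption-laden illustration: boundary regions are assumed to form the background dictionary, and plain least squares stands in for the sparse coding the paper uses, since no particular l1 solver is specified in the abstract.

```python
import numpy as np

def reconstruction_saliency(features, background_idx):
    """Saliency as reconstruction error against a background dictionary.

    features: array of shape (n_regions, d) of region descriptors.
    background_idx: indices of regions assumed to be background (e.g.
    image-boundary regions). Each region is reconstructed over the
    dictionary by least squares (a stand-in for sparse coding), and the
    residual norm is its saliency score, normalised to [0, 1].
    """
    D = features[background_idx].T                     # d x m dictionary
    codes, *_ = np.linalg.lstsq(D, features.T, rcond=None)
    residual = features.T - D @ codes                  # unexplained part
    scores = np.linalg.norm(residual, axis=0)
    span = scores.max() - scores.min()
    return (scores - scores.min()) / (span + 1e-12)
```

Regions whose features lie in the span of the background dictionary reconstruct almost perfectly and score near zero, while foreground regions leave a large residual, which is why this cue highlights whole objects rather than only their boundaries once combined with the multi-scale layers.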

Among all image representation and classification methods, sparse representation has proven to be an extremely powerful tool. However, a limited number of training samples is an unavoidable problem for sparse representation methods, and many efforts have been devoted to improving their performance. In this study, the authors proposed a novel framework to improve the classification accuracy of sparse representation methods. They first introduced the concept of approximations of all training samples (i.e., virtual training samples), whose advantage is that noise in the original training samples can be partially reduced. They then proposed an efficient and competent objective function to disclose more discriminant information between different classes, which is very significant for obtaining a better classification result. The devised sparse representation method employs both the original and virtual training samples to improve classification accuracy, since the two kinds of training samples allow sample information to be fully exploited and satisfactory robustness to be obtained. The experimental results on the JAFFE, ORL, Columbia Object Image Library (COIL-100), AR and CMU PIE databases show that the proposed method outperforms state-of-the-art image classification methods.
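The two ingredients above, virtual training samples and representation-based classification over an augmented training set, can be sketched as follows. The virtual-sample construction (averaging each sample with its class mean) is one assumed way to realise "approximations of training samples" that partially suppresses noise, and a ridge-regularised code stands in for the paper's sparse coding; neither detail is specified by the abstract.

```python
import numpy as np

def make_virtual_samples(X, y):
    """Hypothetical virtual samples: average each sample with its class
    mean, which partially reduces per-sample noise."""
    Xv = X.astype(float).copy()
    for c in np.unique(y):
        Xv[y == c] = (X[y == c] + X[y == c].mean(axis=0)) / 2.0
    return Xv

def classify_src(x, X, y, lam=0.01):
    """Representation-based classification (ridge-regularised stand-in
    for sparse coding): code x over all training samples, then assign
    the class whose samples yield the smallest residual."""
    D = X.T                                   # d x n dictionary of samples
    code = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ x)
    best, best_err = None, np.inf
    for c in np.unique(y):
        mask = (y == c)
        err = np.linalg.norm(x - D[:, mask] @ code[mask])
        if err < best_err:
            best, best_err = c, err
    return best
```

Stacking the original and virtual samples, e.g. `X_aug = np.vstack([X, make_virtual_samples(X, y)])` with duplicated labels, gives the classifier both the raw and the noise-reduced views of each class, which is the mechanism the abstract credits for the improved accuracy and robustness.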