Combining Detectors for Human Hand Detection

Antonio Hernández, Petia Radeva and Sergio Escalera
Computer Vision Center, Universitat Autònoma de Barcelona, Cerdanyola, Spain
Dept. Matemàtica Aplicada i Anàlisi, UB, Gran Via 585, 08007, Barcelona, Spain

Abstract

We present a hand detection system used for the Person Layout competition of the PASCAL VOC Challenge 2010, in conjunction with an external head detector module. HOG features are extracted from the training set after patch normalization, and clustering is performed to categorize the different poses that hands can take. A cascade of classifiers is trained for each of the discovered hand sub-classes, and a sliding-window approach is used for detection, followed by a grouping and filtering step. Results are shown on the corresponding Person Layout dataset from PASCAL VOC 2010.

1. Training

Images of the dataset are sampled to extract positive and negative patches. Each extracted patch is normalized by rotating it according to the orientation of maximum gradient magnitude. HOG features are used as the image descriptor, and K-means clustering is performed over the positive samples. This gives a first categorization of the hands and aims to reduce their intra-class variability. K independent cascades of linear SVMs are trained, one for each cluster, with their corresponding positive samples. The set of negative samples is the same for all cascades.

2. Detection

At detection time, the input image of a person is scanned with a sliding window at different scales. Each window is tested with the K cascades of classifiers, and a detection is reported if any of the cascades returns a positive answer. Once the sliding window has scanned the whole image, a filtering step is performed: the detections are grouped with agglomerative clustering and then ranked.

3. Results

Data: To train our system we used the training set of the Person Layout competition in the PASCAL VOC Challenge 2010, and the Human limb dataset.
For validation, the corresponding validation set from the PASCAL dataset was used.

Validation measurement: The mean Average Precision (AP) is computed, counting a detection as a match if its overlap with the ground truth is greater than a fixed threshold.

Conclusions

We proposed a hand detection system based on HOG features. At training time, the positive hand samples are clustered, and a separate cascade of linear SVM classifiers is trained with the samples belonging to each cluster. At detection time, a sliding-window approach is used, and the output of the classifiers is filtered with agglomerative clustering and ranked using the SVM mean margin and color information. This work is a baseline on which we can keep improving for this challenging problem. Future work includes the analysis of more complex classifiers, structure analysis of positive detections, and contextual constraints.

Agglomerative clustering

Agglomerative clustering reduces the number of True Positives (TPs) for one …

Detections ranking

A score function is defined that combines the SVM mean margin with skin color information. An HSV histogram is computed over the head region of the person, as returned by an external head detector module.
Finally, we assume that both hands are present in the image of the person, and we return the two bounding boxes with the highest scores.

Figure: Influence of incorporating color information in the score function (right) versus using only the SVM mean margin (left). Blue: the head detection returned by the external head detector module. Green: the four best hand candidates. Red: pixels with high skin likelihood.
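
The ranking step described above can be illustrated with a minimal sketch. The exact score function is not given in the text, so the weighted sum below, the weight `alpha`, the hue-saturation binning, and all function names are assumptions made only for illustration. The sketch builds a skin-color model from the head region, converts it into a per-pixel skin likelihood, and combines the mean likelihood inside each candidate box with that candidate's SVM mean margin.

```python
import numpy as np

def head_histogram(hsv, head_box, bins=16):
    """Hue-saturation histogram of the head region, scaled so its peak is 1.

    hsv is an (H, W, 3) float array with channels in [0, 1];
    boxes are (x0, y0, x1, y1) in pixels. Both conventions are assumptions.
    """
    x0, y0, x1, y1 = head_box
    region = hsv[y0:y1, x0:x1].reshape(-1, 3)
    hist, _, _ = np.histogram2d(region[:, 0], region[:, 1],
                                bins=bins, range=[[0, 1], [0, 1]])
    return hist / max(hist.max(), 1e-12)

def skin_likelihood(hsv, hist):
    """Look up every pixel in the head histogram (histogram back-projection)."""
    bins = hist.shape[0]
    h = np.minimum((hsv[..., 0] * bins).astype(int), bins - 1)
    s = np.minimum((hsv[..., 1] * bins).astype(int), bins - 1)
    return hist[h, s]

def score(box, mean_margin, likelihood, alpha=0.5):
    """Hypothetical score: weighted sum of SVM mean margin and mean skin likelihood."""
    x0, y0, x1, y1 = box
    skin = likelihood[y0:y1, x0:x1].mean()
    return alpha * mean_margin + (1.0 - alpha) * skin

def best_two(boxes, margins, likelihood, alpha=0.5):
    """Return the two candidate boxes with the highest score."""
    scores = [score(b, m, likelihood, alpha) for b, m in zip(boxes, margins)]
    order = np.argsort(scores)[::-1][:2]
    return [boxes[i] for i in order]
```

Setting `alpha=1.0` in this sketch ranks by SVM mean margin alone, which corresponds to the left-hand case in the figure; lower values give more influence to the skin color term.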