Welcome to the VIVA hand detection benchmark! The dataset consists of 2D bounding boxes around driver and passenger hands from 54 videos collected in naturalistic driving settings, featuring illumination variation, large hand movements, and frequent occlusion. There are 7 possible viewpoints, including a first-person view. Some of the data was captured in our testbeds, while the rest was collected from YouTube.

Please use the following kit to evaluate your method or to plot curves for the methods in the table below: Download here!

For evaluation, we compute the area under the precision-recall curve (AP) and the average recall (AR) rate. AR is calculated over 9 evenly sampled points in log space between 10^-2 and 10^0 false positives per image. Detection is evaluated using the PASCAL overlap criterion of 50% intersection-over-union. As a minimum requirement for submission, you must provide predicted hand bounding boxes for each image. Classifying left/right and driver/passenger hands and identifying the number of hands on the wheel are optional, but encouraged, challenges.
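To make the evaluation protocol concrete, here is a minimal sketch of the two ingredients described above: the PASCAL 50% overlap match test and AR averaged over 9 log-spaced false-positives-per-image (FPPI) points. Function names and the interpolation choice (take the best recall achieved at or below each reference FPPI point) are illustrative assumptions; the official evaluation kit linked above is authoritative.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union for [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def is_match(pred, gt, thresh=0.5):
    """PASCAL criterion: a detection counts if IoU >= 50%."""
    return iou(pred, gt) >= thresh

# 9 evenly spaced reference points in log space, 10^-2 to 10^0 FPPI.
FPPI_POINTS = np.logspace(-2, 0, 9)

def average_recall(fppi, recall, refs=FPPI_POINTS):
    """Average the recall attained at each reference FPPI point.

    fppi/recall are parallel arrays tracing the detector's operating
    curve; at each reference point we take the best recall reached
    without exceeding that FPPI (0 if the curve never gets that low).
    This interpolation choice is an assumption of this sketch.
    """
    fppi = np.asarray(fppi)
    recall = np.asarray(recall)
    vals = []
    for f in refs:
        mask = fppi <= f
        vals.append(recall[mask].max() if mask.any() else 0.0)
    return float(np.mean(vals))
```

For example, a curve sampled at FPPI 0.005, 0.05, and 0.5 with recalls 0.2, 0.5, and 0.8 yields an AR of about 0.47 under this scheme.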

Below, we display the current AP/AR results for the L1 and L2 evaluation settings. For both metrics, higher is better. The results are sorted by AR on L2. Left-right (L-R) hand classification, driver-passenger (D-P) hand classification, and number of hands on the wheel (#HANDS), evaluated using mAP/mAR, are shown in the second table. You are encouraged to submit labels for both hand detection and hand type classification.

SCUT HCII Lab: We use Faster R-CNN (FRCNN) as the basic framework to achieve relatively good performance. By visualizing features, we found that some unwanted features are learned due to the imbalanced data distribution (e.g., a strong linear correlation between left and right hands). We apply data augmentation targeting these problematic features.