Abstract

Our goal is to automatically recognize hand grasps and to discover the visual structures (relationships) between hand grasps using wearable cameras. Wearable cameras provide a first-person perspective which enables continuous visual hand grasp analysis of everyday activities. In contrast to previous work focused on manual analysis of first-person videos of hand grasps, we propose a fully automatic vision-based approach for grasp analysis. A set of grasp classifiers are trained for discriminating between different grasp types based on large margin visual predictors. Building on the output of these grasp classifiers, visual structures among hand grasps are learned based on an iterative discriminative clustering procedure. We first evaluated our classifiers on a controlled indoor grasp dataset and then validated the analytic power of our approach on real-world data taken from a machinist. The average F1 score of our grasp classifiers achieves over 0.80 for the indoor grasp dataset. Analysis of real-world video shows that it is possible to automatically learn intuitive visual grasp structures that are consistent with expert-designed grasp taxonomies. [Paper][Code][Dataset]