Abstract: We present an attention-based model that reasons about human body shape and motion dynamics to identify individuals in the absence of RGB information. Our approach leverages unique 4D spatio-temporal signatures to address person identification across days. Formulated as a reinforcement learning task, our model combines convolutional and recurrent neural networks with the goal of identifying small, discriminative regions indicative of human identity. We demonstrate that our model produces state-of-the-art results on several published datasets given only depth images. We further study the robustness of our model to viewpoint, appearance, and volumetric changes. Finally, we share insights gleaned from interpretable 2D, 3D, and 4D visualizations of our model's spatio-temporal attention.

Bio: Albert is an MS student in the Computer Science Department at Stanford University, advised by Professor Fei-Fei Li. He received a BS and a BBA from the University of Texas at Austin. His research focuses on computer vision and deep learning applied to 3D scenes and human activity understanding.