Combining Pose-Invariant Kinematic Features and Object Context Features for RGB-D Action Recognition

Manoj Ramanathan, Jaroslaw Kochanowicz, and Nadia Magnenat Thalmann

Abstract—Action recognition using RGB-D cameras is a popular
research topic. Recognising actions in a pose-invariant
manner is very challenging due to view changes, posture
changes and large intra-class variations. This study proposes
a novel pose-invariant action recognition framework
based on kinematic features and object context features. Using
RGB, depth and skeletal joints, the proposed framework
extracts a novel set of pose-invariant motion kinematic features
based on 3D scene flow and captures the motion of body parts
with respect to the body. The obtained features are converted to
a human-body-centric space that allows partially view-invariant
recognition of actions. The proposed pose-invariant kinematic
features are extracted for both foreground (RGB and depth)
and skeleton joints, and separate classifiers are trained. Borda-count-based
classifier decision fusion is employed to obtain an
action recognition result. To capture object context features,
a convolutional neural network (CNN) classifier is proposed to
identify the involved objects. The proposed context features also
include temporal information on object interaction and help in
obtaining the final action recognition result. The proposed framework
works even with non-upright human postures and allows
simultaneous action recognition for multiple people, topics that
remain comparatively under-researched. The
performance and robustness of the proposed pose-invariant
action recognition framework are tested on several benchmark
datasets. We also show that the proposed method works in
real-time.
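The Borda-count decision fusion mentioned in the abstract can be sketched as follows. This is a minimal illustration under the assumption that each classifier outputs a ranking of action labels from most to least likely; it is not the authors' implementation, and the label names are hypothetical.

```python
def borda_count_fusion(rankings):
    """Fuse per-classifier class rankings with a Borda count.

    rankings: list of lists, each an ordering of class labels
    from most to least likely, one list per classifier.
    Returns the class label with the highest total Borda score.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            # Top-ranked class gets n-1 points, the next n-2, ..., last gets 0.
            scores[label] = scores.get(label, 0) + (n - 1 - position)
    return max(scores, key=scores.get)

# Example: three classifiers (e.g. RGB, depth, skeleton) each rank four actions.
r1 = ["wave", "drink", "sit", "stand"]
r2 = ["drink", "wave", "stand", "sit"]
r3 = ["wave", "sit", "drink", "stand"]
print(borda_count_fusion([r1, r2, r3]))  # -> wave
```

The fused decision goes to the label with the largest summed rank score; ties would need an explicit tie-breaking rule in practice.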