This paper addresses the recovery of the 3D structure and motion of human faces from a sequence of 2D images. Building on a probabilistic model, we study the rotation constraints of the problem in detail. Rather than enforcing these constraints numerically during optimization, we exploit the inherent geometric properties of rotation matrices. We generalize the conventional Newton's method to the rotation manifold, which converts the constrained problem into an unconstrained optimization on the manifold. We further extend the algorithm to model within-individual and between-individual shape variances separately. Evaluation results show improvement over state-of-the-art algorithms on the Mocap-Face dataset with additive noise, as well as on the Binghamton University 3D Facial Expression (BU-3DFE) dataset. The robustness of the method to noisy data and its ability to model multiple subjects indicate that it can handle real-world image tracks.
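To illustrate what optimizing directly on the rotation manifold can look like, the following minimal sketch fits a rotation to point correspondences by updating R through the exponential map, so that R stays a valid rotation at every iterate and no orthogonality constraint has to be enforced separately. It is not the algorithm of this paper: it uses a Gauss-Newton step (a common simplification of Newton's method) on a toy least-squares alignment cost, and the function names `hat`, `exp_so3`, and `fit_rotation` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix, so hat(w) @ v == cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: exponential map from a tangent vector to a rotation matrix."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-12:
        return np.eye(3) + K  # first-order approximation for a near-zero step
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))

def fit_rotation(A, B, iters=20, tol=1e-12):
    """Find R minimizing sum_i ||R a_i - b_i||^2 for rows a_i of A and b_i of B.

    Each iteration solves a Gauss-Newton system in the 3D tangent space at the
    current R and applies the update R <- R exp(hat(w)).
    """
    R = np.eye(3)
    for _ in range(iters):
        H = np.zeros((3, 3))  # Gauss-Newton approximation of the Hessian
        g = np.zeros(3)       # gradient with respect to the tangent vector w
        for a, b in zip(A, B):
            r = R @ a - b      # residual for this correspondence
            J = -R @ hat(a)    # derivative of R exp(hat(w)) a with respect to w at w = 0
            H += J.T @ J
            g += J.T @ r
        w = np.linalg.solve(H, -g)
        R = R @ exp_so3(w)     # retraction keeps R on the rotation manifold
        if np.linalg.norm(w) < tol:
            break
    return R

# Toy usage: recover a known rotation from noise-free synthetic points.
rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))
R_true = exp_so3(np.array([0.3, -0.5, 0.4]))
B = A @ R_true.T               # b_i = R_true a_i
R_est = fit_rotation(A, B)
```

Because the update is parameterized by a free 3-vector in the tangent space, each step is an ordinary unconstrained linear solve, which is the sense in which the constrained problem becomes unconstrained on the manifold.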