This paper introduces a coarse-to-fine system for detecting and tracking facial features against complex backgrounds. The system takes its video input from stereo cameras. Stereo vision is first used to segment the face roughly but quickly from the complex background. A multiple-template matching method is then applied to this rough segmentation to locate the face region accurately. Facial organ candidates are extracted from the detected face region with a Sobel filter at a specific scale in scale space, called the organ scale. Finally, the eyes, the nose, and the mouth corners are detected. To make detection and tracking more robust across a video sequence, techniques based on multiple cues are developed for checking and correcting facial feature detection errors. Experiments on 189 video sequences demonstrate the effectiveness of the system.
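
As a rough illustration of the organ-candidate extraction step, the following minimal sketch reduces a face region to a coarser scale and then applies a Sobel filter to it. It is not the authors' implementation: the abstract does not define the organ scale, so block averaging by an integer `factor` is assumed here as a stand-in, and the function names, the `threshold` value, and the list-of-lists image format are all hypothetical.

```python
# Sketch only: block averaging stands in for the (unspecified) organ scale.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def downsample(img, factor):
    """Block-average a 2D grayscale image (list of rows) by an integer factor."""
    h, w = len(img) // factor, len(img[0]) // factor
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            block = [img[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def sobel_magnitude(img):
    """Return the Sobel gradient magnitude; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def organ_candidates(face, factor=2, threshold=100.0):
    """Return (x, y) positions, in coarse-scale coordinates, of strong edges."""
    coarse = downsample(face, factor)
    mag = sobel_magnitude(coarse)
    return [(x, y)
            for y, row in enumerate(mag)
            for x, m in enumerate(row)
            if m >= threshold]


# Usage: an 8x8 region with a dark left half and a bright right half.
face = [[0] * 4 + [200] * 4 for _ in range(8)]
candidates = organ_candidates(face)
```

In this sketch the coarse scale suppresses fine texture before edge detection, so only larger structures produce strong responses. A real system would still need to group the responding pixels into candidate regions and map them back to full-resolution coordinates.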