Projects

Emotion Recognition takes an image containing faces as input and returns, for each face, the confidence across a set of emotions as well as a bounding box for the face (using the MS Face API). The algorithm infers emotions from appearance using a custom deep convolutional network. To improve labeling, we had each image annotated by more than 10 crowd-sourced taggers, which allowed us to learn a probability distribution over emotions for each image. We are also sharing…
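The crowd-sourced labeling step above can be sketched as follows. The emotion set, vote counts, and function names are illustrative assumptions, not the project's actual code; a network trained this way would minimize cross-entropy against the learned distribution rather than a single hard label.

```python
import numpy as np

# An assumed emotion set for illustration; the project's actual label
# set is not specified here.
EMOTIONS = ["neutral", "happiness", "surprise", "sadness",
            "anger", "disgust", "fear", "contempt"]

def label_distribution(tagger_votes):
    """Turn raw crowd-source votes (one emotion per tagger) into a
    probability distribution over the emotion set."""
    counts = np.array([tagger_votes.count(e) for e in EMOTIONS], dtype=float)
    return counts / counts.sum()

def soft_cross_entropy(pred_probs, target_dist, eps=1e-12):
    """Cross-entropy between the network's softmax output and the
    crowd-sourced distribution (reduces to the usual loss for one-hot
    targets)."""
    return -np.sum(target_dist * np.log(pred_probs + eps))

votes = ["happiness"] * 7 + ["surprise"] * 2 + ["neutral"]  # 10 taggers
target = label_distribution(votes)  # e.g. 0.7 mass on "happiness"
```

A perfect prediction (matching the crowd distribution) scores strictly better under this loss than, say, a uniform guess, which is what lets the network learn graded rather than hard labels.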

We develop novel eye-gaze tracking technologies to make eye-gaze tracking ubiquitously available for improved natural user interaction (NUI). In particular, we investigate two approaches. Active IR lighting: we explore the use of many (more than 4) IR lights distributed around the border of a display or monitor. RGB + Depth: we leverage both the RGB camera and the depth sensor already available in Kinect…
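For IR-lit gaze trackers, a common approach (not necessarily the one used in this project) is to regress the on-screen gaze point from the pupil-minus-glint vector using a low-order polynomial fitted during a short calibration. A minimal numpy sketch, with all names and the feature basis hypothetical:

```python
import numpy as np

def poly_features(v):
    """Second-order polynomial features of a pupil-glint vector (x, y)."""
    x, y = v
    return np.array([1.0, x, y, x * y, x * x, y * y])

def fit_gaze_mapping(pg_vectors, screen_points):
    """Least-squares calibration: the user fixates known screen targets
    while we record the corresponding pupil-minus-glint vectors."""
    A = np.array([poly_features(v) for v in pg_vectors])
    coeffs, *_ = np.linalg.lstsq(A, np.array(screen_points, dtype=float),
                                 rcond=None)
    return coeffs  # shape (6, 2): one column per screen coordinate

def predict_gaze(coeffs, v):
    """Map a new pupil-glint vector to a screen position."""
    return poly_features(v) @ coeffs
```

With at least six calibration targets the least-squares fit is determined; more targets average out measurement noise.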

ViiBoard uses vision techniques to significantly enhance the user experience on large touch displays (e.g., Microsoft Perceptive Pixel) in two areas: human-computer interaction and immersive remote collaboration. Simple setup: ViiBoard uses only an RGBD camera (Microsoft Kinect), mounted on the side of a large touch display, to enhance user interaction and enable 3D immersive collaboration in a desirable form factor that is practical for home or office use. Part I: Vision-enhanced interaction: ViiBoard augments the touch…

Mobile Surface is a novel interaction system for mobile computing. Our goal is to bring the Microsoft Surface experience to mobile scenarios and, more importantly, to enable 3D interaction with mobile devices. We research how to transform any surface (e.g., a coffee table or a piece of paper) into a Mobile Surface with a mobile device and a camera-projector system. Our work also covers how to acquire 3D object models in real time, augmented reality…

About the System: In multiparty conferencing, one hears the voices of multiple remote participants. Current commercial systems mix them into a single mono audio stream, so all remote voices sound as if they come from the same location when using loudspeakers, or from inside the listener's head when using headphones. This is in sharp contrast to real life, where each voice has its own distinct location. We have built and…
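A rough way to restore distinct locations is to spatialize each remote voice before mixing. The sketch below uses simple interaural time and level differences as a crude stand-in for full HRTF rendering; the constants and function names are assumptions, not the system's actual implementation:

```python
import numpy as np

FS = 16000           # sample rate in Hz (assumed)
HEAD_DELAY_MS = 0.6  # rough maximum interaural time difference

def spatialize(mono, azimuth_deg):
    """Place a mono voice at an azimuth (-90 = far left, +90 = far right)
    using interaural time and level differences."""
    pan = np.sin(np.radians(azimuth_deg))            # -1 (left) .. +1 (right)
    itd = int(abs(pan) * HEAD_DELAY_MS * 1e-3 * FS)  # far-ear delay, samples
    g_left = np.sqrt(0.5 * (1 - pan))                # equal-power gains
    g_right = np.sqrt(0.5 * (1 + pan))
    delayed = np.concatenate([np.zeros(itd), mono])  # signal at the far ear
    direct = np.concatenate([mono, np.zeros(itd)])   # signal at the near ear
    if pan >= 0:  # source to the right: left ear is farther, hears it later
        left, right = g_left * delayed, g_right * direct
    else:         # source to the left: right ear is the far, delayed ear
        left, right = g_left * direct, g_right * delayed
    return np.stack([left, right], axis=1)

def mix(streams):
    """Sum the spatialized talkers into one stereo stream."""
    n = max(s.shape[0] for s in streams)
    out = np.zeros((n, 2))
    for s in streams:
        out[:s.shape[0]] += s
    return out
```

Assigning each remote participant a fixed, distinct azimuth before mixing is what lets listeners segregate simultaneous talkers, in contrast to the mono mix described above.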

With globalization and workforce mobility, there is a strong need for research and development of advanced infrastructures and tools that bring an immersive experience to teleconferencing, so that people across geographically distributed sites can interact collaboratively. The Personal Telepresence Station project aims to bring the telepresence experience to offices, replicating cues people enjoy in face-to-face meetings, such as gaze awareness and spatial audio.

Speaker verification is the process of verifying the claimed identity of a speaker based on the speech signal from the speaker (voiceprint). There are two types of speaker verification systems: text-independent speaker verification (TI-SV) and text-dependent speaker verification (TD-SV). TD-SV requires the speaker to say exactly the enrolled or given password, while TI-SV verifies identity without constraints on the speech content. Compared to TD-SV, it is more convenient because…
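In embedding-based systems, verification often reduces to comparing a test utterance's embedding against an enrolled voiceprint. A minimal sketch; the embedding extractor, dimensionality, and threshold below are assumptions for illustration, not taken from this project:

```python
import numpy as np

def enroll(embeddings):
    """Average several enrollment utterance embeddings (e.g., from an
    i-vector or d-vector extractor) into one unit-norm voiceprint."""
    model = np.mean(embeddings, axis=0)
    return model / np.linalg.norm(model)

def verify(voiceprint, test_embedding, threshold=0.7):
    """Accept the claimed identity if the cosine similarity between the
    voiceprint and the test embedding clears a (tuned) threshold."""
    e = test_embedding / np.linalg.norm(test_embedding)
    score = float(voiceprint @ e)
    return score, score >= threshold
```

In practice the threshold is tuned on held-out trials to balance false accepts against false rejects.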

Visual Echo Cancellation for Seamless Integration of Remote Sites. In a typical remote collaboration setup, two or more projector-camera pairs are "cross-wired" to form a full-duplex system for two-way communication. A whiteboard can serve as the projector screen; in that case, the whiteboard acts as both an output device and an input device. Users can write on the whiteboard to comment on what is projected or to add new thoughts in…
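The "visual echo" here is the camera re-capturing what its own projector displays. Assuming the projected frame has already been geometrically registered to the camera view and that the projector-to-camera response is a single gain (both deliberate simplifications of the real problem), cancellation can be sketched as:

```python
import numpy as np

def cancel_visual_echo(camera_frame, projected_frame, gain=0.8):
    """Remove the projected content (the "visual echo") from the camera's
    view of the whiteboard, leaving only the physical ink.
    Assumes the frames are already geometrically aligned and that the
    projector-to-camera photometric response is a single scalar gain."""
    residual = camera_frame.astype(float) - gain * projected_frame.astype(float)
    return np.clip(residual, 0, 255).astype(np.uint8)
```

A real system must also estimate the geometric registration (e.g., a homography) and a per-channel, possibly spatially varying, photometric response rather than one scalar gain.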

Digital Technology for Effective Whiteboard Use. Introduction: Whiteboards are ubiquitous and will exist for the foreseeable future, but their content is hard to archive and share. While digital cameras can capture whiteboard content, the image is usually taken from an angle, contains irrelevant information, and has shadows. We have developed an intelligent, automatic technique to reproduce the whiteboard content as a crisp and faithful image that can be archived or shared with…
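One standard ingredient of such a technique (not necessarily the exact method used here) is to estimate the whiteboard background locally, exploiting the fact that the board is brighter than the ink, and then normalize the image by that estimate. A minimal grayscale sketch, assuming perspective rectification has already been done:

```python
import numpy as np

def whiten_whiteboard(gray, block=16):
    """Estimate the whiteboard background as the brightest value in each
    block (the board is brighter than the ink), then normalize each pixel
    by its local background, which removes shadows and uneven lighting."""
    h, w = gray.shape
    g = gray.astype(float)
    out = np.empty((h, w), dtype=float)
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = g[y:y + block, x:x + block]
            bg = max(patch.max(), 1.0)  # local background estimate
            out[y:y + block, x:x + block] = patch / bg
    return np.clip(out * 255, 0, 255).astype(np.uint8)
```

After normalization the board maps to pure white while pen strokes keep their contrast, which is what produces the "crisp and faithful" archival image.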

The lack of eye contact in desktop video teleconferencing substantially reduces the effectiveness of video content. While expensive and bulky hardware is available on the market to correct eye gaze, researchers have been trying to provide a practical software-based solution to bring video teleconferencing one step closer to the mass market. This paper presents a novel approach based on stereo analysis combined with rich domain knowledge (a personalized face model). This marriage is mutually…

Expandable Data-Driven Graphical Modeling of Human Actions Based on Salient Postures. This paper presents a graphical model for learning and recognizing human actions. Specifically, we propose to encode actions in a weighted directed graph, referred to as an action graph, whose nodes represent salient postures that characterize the actions and are shared by all actions. The weight between two nodes measures the transition probability between the two postures represented by…
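Recognition with such a graph can be sketched as scoring a decoded posture sequence under each action's transition probabilities and picking the best-scoring action. The toy data, priors, and function names below are illustrative only, not the paper's actual formulation:

```python
import numpy as np

def sequence_log_likelihood(postures, prior, trans, eps=1e-12):
    """Log-likelihood of a decoded posture sequence under one action's
    transition matrix (graph nodes = shared salient postures)."""
    ll = np.log(prior[postures[0]] + eps)
    for a, b in zip(postures, postures[1:]):
        ll += np.log(trans[a, b] + eps)
    return ll

def recognize(postures, actions):
    """Pick the action whose graph best explains the posture sequence.
    `actions` maps an action name to its (prior, transition matrix) pair."""
    scores = {name: sequence_log_likelihood(postures, prior, trans)
              for name, (prior, trans) in actions.items()}
    return max(scores, key=scores.get)
```

Because the posture nodes are shared across actions, adding a new action only requires learning its prior and transition weights, which is what makes the model expandable.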

We present a framework for rendering a real object from arbitrary viewpoints and relighting it under novel illumination conditions, given a sparse set of images and a pre-acquired geometric model of the object. Using the 3D model and the small set of images, we recover all the photometric information necessary for subsequent rendering. We recover the illumination distribution, represented as a hemisphere covering the object, as well as the parameters of…
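With the illumination recovered as directional lights on a hemisphere, relighting a surface point reduces to summing clamped-cosine contributions. A minimal Lambertian-only sketch (the paper's actual reflectance model may be richer than this):

```python
import numpy as np

def relight(normals, albedo, light_dirs, light_intensities):
    """Render Lambertian surface points under a discretized hemisphere of
    directional lights:  I = albedo * sum_k L_k * max(0, n . w_k).
    `normals` is (N, 3) with unit rows; `light_dirs` are unit vectors."""
    shading = np.zeros(normals.shape[0])
    for w, L in zip(light_dirs, light_intensities):
        # Clamp the cosine term so lights below the local horizon
        # contribute nothing.
        shading += L * np.clip(normals @ w, 0, None)
    return albedo * shading
```

Swapping in a novel set of light directions and intensities, or a novel viewpoint for a view-dependent reflectance term, is what enables relighting from the same recovered data.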

3D Modeling: Generating realistic 3D human face models and facial animations has been a persistent challenge in computer vision and graphics. We have developed a system that constructs textured 3D face models from videos with minimal user interaction. Our system takes a video sequence of a face captured with an ordinary video camera. After five manual clicks on two images to tell the system where the eye corners, nose tip and mouth corners are, the…

Transforming an ordinary piece of paper into a wireless mobile input device: a virtual mouse, keyboard and 3D controller. In many intelligent environments, instead of using conventional mice, keyboards and joysticks, people are looking for intuitive, immersive and cost-efficient interaction devices. We are developing a vision-based gesture interface prototype system, VisualPanel, which employs an arbitrary quadrangle-shaped panel (e.g., an ordinary piece of paper) and a tip pointer (e.g., a fingertip) as an intuitive,…

Transforming an ordinary screen into a touch screen with a camera. Touch screens are very convenient because one can point directly at what is of interest. This paper presents an inexpensive technique to transform an ordinary screen into a touch screen using an ordinary camera. The setup is easy: position the camera so it can see the whole screen. System calibration involves detecting the screen region in the image, which determines…
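Calibration of this kind typically yields a homography between the camera's view of the screen and screen coordinates, after which any fingertip detected in the image can be mapped to a screen location. A minimal direct-linear-transform sketch, with illustrative corner values:

```python
import numpy as np

def homography_from_corners(src, dst):
    """Direct linear transform: estimate the 3x3 homography mapping four
    image-space corners of the detected screen region to screen
    coordinates (solved as the SVD null vector of the constraint matrix)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    return Vt[-1].reshape(3, 3)

def image_to_screen(H, point):
    """Map a fingertip position in the camera image to screen coordinates
    (homogeneous transform followed by perspective division)."""
    p = H @ np.array([point[0], point[1], 1.0])
    return p[:2] / p[2]
```

Four non-collinear point correspondences give eight equations, exactly determining the homography up to scale.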

We propose a flexible new technique to easily calibrate a camera. It is well suited for use without specialized knowledge of 3D geometry or computer vision. The technique only requires the camera to observe a planar pattern shown at a few (at least two) different orientations. Either the camera or the planar pattern can be freely moved. The motion need not be known. Radial lens distortion is modeled. The proposed procedure consists of a closed-form…
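The closed-form step rests on the fact that each observed homography of the planar pattern yields two constraints on the intrinsic parameters. Writing the homography as $H = [h_1\; h_2\; h_3] = \lambda A [r_1\; r_2\; t]$ and using the orthonormality of the rotation columns $r_1$ and $r_2$:

```latex
h_1^\top B\, h_2 = 0, \qquad
h_1^\top B\, h_1 = h_2^\top B\, h_2, \qquad
\text{where } B = A^{-\top} A^{-1}.
```

Each view thus contributes two linear equations in the six entries of the symmetric matrix $B$, so $n \ge 3$ views (or two views with zero skew assumed) determine $B$, and hence the intrinsic matrix $A$, in closed form before the final maximum-likelihood refinement.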

Before joining Microsoft, Zhengyou worked for 11 years at INRIA (the French National Institute for Research in Computer Science and Control), where he was a member of the Computer Vision and Robotics group and had been a Senior Research Scientist since 1991. In 1996-1997, he spent a one-year sabbatical as an Invited Researcher at the Advanced Telecommunications Research Institute International (ATR) in Kyoto, Japan.

He holds more than 130 US patents and has about 20 patents pending. He also holds a few Japanese patents for his inventions during his sabbatical at ATR.

He has published over 200 papers in refereed international journals and conferences, and is the author of the following books