A webcam is enough to produce a real-time 3-D model of a moving hand

Franziska Müller, Max Planck Institute for Informatics, has developed a software system that requires only the built-in camera of a laptop to produce a real-time 3-D model of a moving hand. Credit: Oliver Dietze

Capturing hand and finger movements within milliseconds is becoming increasingly important for many applications, from virtual reality to human-machine interaction and Industry 4.0. So far, the enormous technical demands of such tracking have limited its use. Computer scientists at the Max Planck Institute for Informatics have now developed a software system, built on several interacting neural networks, that requires only the built-in camera of a laptop.

The researchers will present the program for the first time at stand G75 in hall 27 of the Cebit computer fair, which opens in Hannover on June 11.

When computer scientist Franziska Müller holds her hand in front of the laptop camera, the hand’s virtual counterpart appears on the screen, overlaid by a colorful virtual hand skeleton. No matter what movements Müller’s hand makes in front of the webcam, the colored bones of the model do the same. Müller is demonstrating the software she developed together with Professor Christian Theobalt and other researchers from the Max Planck Institute for Informatics in Saarbrücken, Stanford University and King Juan Carlos University in Spain. So far, no other software can do this with such a low-cost camera.

Because the system works in almost any kind of filmed scene, it can be used nearly anywhere, an advantage over previous approaches that require a depth camera or multiple cameras. The algorithm transforms the two-dimensional information of the video image, in real time, into a three-dimensional movement model of the hand’s bones. It is based on a so-called “convolutional neural network,” or CNN for short, which the researchers have trained to detect the bones of the hand. They generated the necessary training data with another neural network. The result: the software calculates the exact 3-D poses of the hand’s bones in milliseconds. Even if some of the bones are occluded, for example by an apple held in the user’s hand, the software compensates. However, the system still has trouble processing several hands working together, and solving this is the researchers’ next goal.
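
Why is turning a flat webcam image into a 3-D hand pose difficult without a depth camera? A single image loses distance information: a small hand close to the lens and a larger one farther away can land on the same pixels. The short Python sketch below illustrates this with an idealized pinhole camera. It is not taken from the researchers' software; the `project` function, the focal length, the image center and the fingertip coordinates are all made-up values chosen for illustration.

```python
import math


def project(point_3d, focal_length=500.0, center=(320.0, 240.0)):
    """Project a 3-D point (x, y, z), given in camera coordinates,
    onto the image plane of an idealized pinhole camera.

    focal_length (in pixels) and center are illustrative assumptions.
    """
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = focal_length * x / z + center[0]
    v = focal_length * y / z + center[1]
    return (u, v)


# Hypothetical fingertip position in metres, 0.4 m from the camera.
near_tip = (0.05, -0.02, 0.4)
# The same point scaled by 2: twice as far away, twice as far off-axis.
far_tip = tuple(2 * c for c in near_tip)

near_pixel = project(near_tip)
far_pixel = project(far_tip)

# Both 3-D points fall on the same pixel, so the image alone
# cannot tell them apart.
same_pixel = all(math.isclose(a, b) for a, b in zip(near_pixel, far_pixel))
print(near_pixel, far_pixel, same_pixel)
```

Because many 3-D positions explain the same 2-D image, a system working from a single camera needs additional knowledge about how hands are built and how they move. In the approach described above, that knowledge comes from the trained neural network.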