Thumbs up for gesture-based computing

FASHION crime it may be, but a multicoloured dayglo glove could bring Minority Report-style computing to your home PC.

Interest in so-called gesture-based computing has been stoked by the forthcoming launch of gaming systems from Microsoft and Sony that will track the movements of players' bodies and replicate them on screen. But an off-the-shelf system that can follow delicate hand movements in three dimensions to manipulate virtual objects remains tantalisingly beyond reach.

The problem with systems such as Microsoft's Project Natal for the Xbox is that they do not focus on the detailed movement of hands, limiting the degree to which players can manipulate virtual objects, says Javier Romero, a computer-vision researcher at the Royal Institute of Technology in Stockholm, Sweden. Arm movements can be captured but more subtle pinches or twists of the wrists may be missed.

Until now, capturing detail required expensive motion-capture systems like those used for Hollywood's special-effects fests. These utilise markers placed around the body, or sensor-studded data gloves in which flexible sensors detect joint movements. "Really accurate gloves cost up to $20,000 and are a little unwieldy to wear," says Robert Wang, a computer scientist at the Massachusetts Institute of Technology's Artificial Intelligence Lab.

Wang has developed a system that could bring gesture-based computing to the masses, and it requires nothing more than a pair of multicoloured latex gloves, a webcam and a laptop.

The key to the system is the gloves, each of which is made up of 20 patches in 10 different colours - the maximum number a typical webcam can reliably tell apart. The patches are arranged to maintain the best possible separation of colours. For example, the fingertips and the palm, which would frequently collide in natural hand gestures, are coloured differently.

The upshot is that when a webcam is used to track a glove-clad hand, the system can identify each finger's location and distinguish between the front and the back of the hand. "It makes the computer's life easier," says Wang.
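To see why a small set of well-separated colours makes the computer's life easier, consider a minimal sketch that labels a webcam pixel with whichever reference colour it sits closest to. The palette values and the function name `nearest_colour` are assumptions made for illustration; the article does not give the glove's actual colours or say how Wang's software classifies them.

```python
# Hypothetical palette: the real glove uses 10 colours, which the article does not list.
PALETTE = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}

def nearest_colour(pixel):
    """Return the name of the palette colour closest to an (R, G, B) pixel."""
    def squared_distance(name):
        return sum((p - c) ** 2 for p, c in zip(pixel, PALETTE[name]))
    return min(PALETTE, key=squared_distance)

# A slightly washed-out red pixel, as a webcam might record it.
label = nearest_colour((240, 30, 20))
print(label)
```

The further apart the reference colours are, the more lighting variation a pixel can absorb before it is mislabelled, which may be why the number of usable colours is limited.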

Once the system has calculated the position of the hand, it searches a database containing 100,000 images of gloved hands in a variety of positions. "If you have more images than that it slows the computer down, and if you have fewer then you don't provide an adequate representation of all the positions the hand can be in," Wang explains.

Once it finds a match it displays it on screen. The process is repeated several times per second, enabling the system to recreate gestures in real time.
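The lookup described above is, in spirit, a nearest-neighbour search. The sketch below assumes a toy database of three tiny images and a simple count of mismatched pixels as the distance measure; the images, the pose labels and the names `distance` and `nearest_pose` are hypothetical and are not taken from Wang's system.

```python
def distance(a, b):
    """Count the pixels whose colour labels differ between two equal-sized images."""
    return sum(x != y for x, y in zip(a, b))

def nearest_pose(query, database):
    """Return the pose label of the database image closest to the query image."""
    best_image, best_label = min(database, key=lambda entry: distance(query, entry[0]))
    return best_label

# Each image is a flat list of colour labels (0 = background, 1 and up = glove colours).
# Three entries stand in for the 100,000 mentioned in the article.
DATABASE = [
    ([0, 1, 1, 0, 2, 2, 0, 3, 3], "open hand"),
    ([0, 1, 0, 0, 2, 0, 0, 3, 0], "fist"),
    ([4, 4, 0, 5, 5, 0, 0, 0, 0], "pinch"),
]

query = [0, 1, 1, 0, 2, 0, 0, 3, 3]  # differs from "open hand" by one pixel
pose = nearest_pose(query, DATABASE)
print(pose)
```

A linear scan like this gets slower as the database grows, which fits Wang's remark that more images slow the computer down while fewer leave hand positions unrepresented.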

Wang presented some early-stage research at last year's SIGGRAPH meeting in New Orleans, Louisiana. "Back then it only worked in windowless rooms and took half an hour to calibrate," says Wang. Now it can be calibrated in 3 seconds, he says.

Wang has already shown that the system can correctly replicate most of the letters of the American Sign Language alphabet, although those that require rapid motion (J and Z) or involve the thumb (E, M, N, S and T) have yet to be perfected.

The gloves are so cheap to make - costing about a dollar - that they could bring gesture-based computing to a wider audience, says Douglas Lanman, an expert in human-computer interaction at Brown University, Providence, Rhode Island. But if it's going to have truly widespread appeal, it will need to lose the gloves. "Wearing a glove is an inconvenience," he says. "Markerless motion-capture is where I think the field is moving, and where the larger commercial market will be."

Last month, at the IEEE International Conference on Robotics and Automation in Anchorage, Alaska, Romero and his colleague Danica Kragic demonstrated how markerless motion-capture may be possible. Their system also uses a webcam and a database of hand positions to recreate an on-screen version, but it attempts to pick out a bare hand in the video stream by detecting flesh colours. If you reach down and pick up a ball, say, the program will aim to find a matching image in its database of the positions the hand adopts as it reaches down and picks up a spherical object.

Identifying a hand using skin colour is far more difficult than picking out a multicoloured glove. Even once a hand is detected, it is a massive challenge to accurately identify its position - especially if it is holding something, says Kragic. "The object blocks out parts of the hand, preventing the computer from knowing what the hidden bit is doing."
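A rough sense of what detecting flesh colours involves can be had from a fixed-threshold rule on red, green and blue values. The sketch below uses a widely quoted rule of thumb for skin under even daylight; it is an assumption chosen for illustration, not the method Romero and Kragic use, and the names `is_skin` and `skin_mask` are hypothetical.

```python
def is_skin(r, g, b):
    """Rule-of-thumb test for skin-like RGB values (0-255) under even daylight."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )

def skin_mask(pixels):
    """Turn a 2D grid of (R, G, B) pixels into a grid of True/False skin flags."""
    return [[is_skin(*pixel) for pixel in row] for row in pixels]

# A 2 x 2 test image: one skin-like pixel, then blue, white and dark grey.
image = [
    [(220, 170, 140), (30, 90, 200)],
    [(255, 255, 255), (40, 40, 40)],
]
mask = skin_mask(image)
print(mask)
```

A rule this simple will also flag wooden furniture or a beige wall and will miss skin under coloured lighting, which hints at why picking out a bare hand is so much harder than spotting a brightly coloured glove.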

To tackle the problem, Romero and Kragic created a reference database containing images of hands picking up 33 different objects, such as a ball or a cylinder. They then set up a webcam, which captured 10 frames per second, and tested their system's capabilities by filming people grasping a cup, a ball or a pair of pliers. The database had images of a hand picking up a ball, but nothing for a cup or pliers. The system successfully created a virtual representation of a hand grabbing a ball, and came as close as it could to the cup by displaying a hand grasping a cylinder. It came up empty with the pliers.
