ChaLearn takes gesture recognition to the crowd with Microsoft Kinect(TM)

A competition to help improve the accuracy of gesture recognition using Microsoft Kinect(TM) motion sensor technology promises to take man-machine interfaces to a whole new level. From controlling the lights or thermostat in your home to changing channels on the TV, all it will take is a simple wave of the hand. The same technology may even make it possible to automatically detect more complex human behaviors, allowing surveillance systems to sound an alarm when someone is acting suspiciously, for example, or to send help whenever a bedridden patient shows signs of distress.

Through its low-cost 3D depth-sensing cameras, Microsoft Kinect(TM) has already kick-started this revolution by bringing gesture recognition into the home. Humans can recognize new gestures after seeing just one example (one-shot learning). For computers, though, recognizing even well-defined gestures, such as sign language, is much more challenging and has traditionally required thousands of training examples to teach the software.
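The one-shot setting can be illustrated with a minimal sketch: store the single training example of each gesture class as a template (a sequence of per-frame feature vectors), and label a new recording with the class of the nearest template under dynamic time warping. The feature representation and all names here are illustrative assumptions, not the method of any contestant.

```python
# One-shot gesture classification sketch: one stored template per class,
# nearest-template matching under dynamic time warping (DTW).
# Features are simplified to 1-D values per frame for illustration.

def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors,
    using Euclidean frame-to-frame cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def classify(sequence, templates):
    """Return the label of the nearest one-shot template."""
    return min(templates, key=lambda label: dtw_distance(sequence, templates[label]))

# A single training example per gesture class (one-shot learning):
templates = {
    "wave": [(0.0,), (1.0,), (0.0,), (-1.0,), (0.0,)],
    "push": [(0.0,), (0.5,), (1.0,), (1.5,), (2.0,)],
}
print(classify([(0.1,), (0.9,), (0.1,), (-0.8,)], templates))  # prints "wave"
```

DTW lets the classifier tolerate gestures performed faster or slower than the stored template, which is one reason template matching remains a workable baseline even with a single example per class.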

To see what machines are capable of, ChaLearn launched a competition hosted by Kaggle, with prizes donated by Microsoft, in the hope of giving the state of the art a rapid boost. The ChaLearn team has been organizing competitions since 2003, featuring hard problems such as discovering cause-effect relationships in data. It selected the young and dynamic startup Kaggle to host the gesture challenge because Kaggle has very rapidly established a track record for using crowdsourcing to find solutions that outperform state-of-the-art algorithms and predictive models in a wide variety of domains (from helping NASA build algorithms to map dark matter to helping insurance companies improve claims prediction). The first round of the gesture challenge has already helped narrow the gap between machine and human performance. Over a period of four months starting in December 2011, 153 contestants, making 573 entries, built software systems capable of learning from a single training example of a hand gesture (so-called one-shot learning). They lowered the error rate from more than 50% for a baseline method to less than 10%.

The winner of the challenge, Alfonso Nieto Castanon, used a method of his own invention, inspired by the human visual system. He and the second- and third-place winners will be awarded $5000, $3000 and $2000 respectively, and will get an opportunity to present their results to an audience of experts at the CVPR 2012 conference in Rhode Island, USA, in June. A demonstration competition of gesture recognition systems using Kinect(TM) will also be held in conjunction with this event, with similar prizes donated by Microsoft.

Now, from May 7 until September 10, new competitors can enter round 2 of the challenge for a chance to close the gap with human performance, which is under 2% error! Entrants are given a set of examples on which to develop and test their algorithms. Compared to round 1, they will benefit from a wealth of resources, including the fact sheets and published papers of the round 1 participants, data annotations, and data transformations that proved successful in round 1. During a four-month period they will be able to compare their systems with those of other contestants by using them to predict gestures from a feedback sample. Throughout the competition these evaluations are posted on a live leaderboard, so participants can monitor their performance in real time. The contestants will then have the opportunity to put their best algorithms to the final test in an evaluation phase: they will be given a few days to train their systems on an entirely new set of gestures, after which the one with the best recognition score will be rewarded with $5000, while those coming second and third will receive $3000 and $2000 respectively.

As in round 1, the results will be discussed at a scientific conference (ICPR 2012, Tsukuba, Japan, November 2012), where a demonstration competition will also be held, with prizes of the same amounts. Microsoft will be evaluating successful participants in all challenge rounds for two potential IP agreements of $100,000 each. See the official challenge rules for more details at http://gesture.chalearn.org.
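The error rates quoted throughout can be understood as a sequence-matching score: each test video contains a sequence of gestures, and a submission is penalized for every gesture it misses, invents, or mislabels. A common way to score this is the Levenshtein (edit) distance between the predicted and true label sequences, normalized by the total number of true gestures; the exact metric used by the challenge is defined in the official rules, so the sketch below is an illustrative assumption.

```python
# Illustrative recognition-error score: total edit distance between predicted
# and true gesture-label sequences, normalized by the number of true gestures.

def levenshtein(pred, truth):
    """Minimum number of insertions, deletions, and substitutions needed
    to turn the predicted label sequence into the true one."""
    n, m = len(pred), len(truth)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if pred[i - 1] == truth[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
    return d[n][m]

def error_rate(predictions, truths):
    """Sum of edit distances over all videos, divided by gesture count."""
    total = sum(levenshtein(p, t) for p, t in zip(predictions, truths))
    return total / sum(len(t) for t in truths)

# Two test videos: the first predicted perfectly, the second missing one
# gesture (label 4) out of six total -> error 1/6.
print(error_rate([[3, 7, 2], [5, 1]], [[3, 7, 2], [5, 4, 1]]))
```

Under a metric like this, a baseline at over 50% error is missing or mislabeling more than half the gestures, while the round 1 winners' sub-10% scores correspond to fewer than one mistake per ten gestures.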

The winner of the first round believes that it is possible to reach and even beat human performance. Others will also join in the race.

According to Kaggle, that is the power of the crowd: bringing together expert talent, sometimes from previously untapped quarters. And with Microsoft interested in buying the intellectual property, the hope is that the new algorithms that emerge from the contest will not only boost accuracy but also open the door to a whole new range of applications: communicating with Kinect(TM) through sign language, or even by speaking, with algorithms reading your lips to interpret what you say; smart homes; or using gestures to control surgical robots.

The challenge was initiated by the US Defense Advanced Research Projects Agency (DARPA) Deep Learning Program and is supported by the US National Science Foundation, the European Pascal2 network of excellence, Microsoft and Texas Instruments. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors and funding agencies.