YodaQA Grand Challenge!

Recently, the YodaQA team has been collaborating with Falk Pollok from RWTH Aachen, who is interested in using Question Answering in education to help people better digest what they have learned and to generally assist with studying. To this end, he has created PalmQA – a QA application that multiplexes between many question answering backends, ensembling them into a more accurate system.
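PalmQA's actual combination logic isn't described in this post, but a minimal sketch of multiplexing several QA backends and picking the majority-vote answer might look like this (all names and the voting scheme here are illustrative assumptions, not PalmQA's real design):

```python
from collections import Counter

def ensemble_answer(question, backends):
    """Query each backend and return the most common answer.

    `backends` is a list of callables mapping a question string to an
    answer string, or None when that backend has no answer.
    """
    answers = [backend(question) for backend in backends]
    answers = [a for a in answers if a is not None]
    if not answers:
        return None
    # Counter.most_common(1) yields [(answer, count)]; take the answer.
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in backends; real ones would call YodaQA, Wolfram Alpha, etc.
backends = [
    lambda q: "Paris",
    lambda q: "Paris",
    lambda q: None,      # a backend that found no answer
    lambda q: "Lyon",
]
print(ensemble_answer("What is the capital of France?", backends))
```

A real ensemble would likely weight backends by past accuracy or answer confidence rather than counting votes equally, but majority voting is the simplest way to see why combining systems can beat any single one.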

Falk has built backends for IBM Watson’s DeepQA and others (Google, Evi, Kngine and MIT’s Start), but in the end, the combination of YodaQA and Wolfram Alpha is “a match made in heaven,” as Falk put it in a recent email.

As a finishing touch to his work (being submitted as a diploma thesis), Falk organized a Grand Challenge – letting an independent third party compile a list of 30 factoid questions of varying difficulty, and pitting PalmQA against a wide variety of humans. Perhaps not quite as dramatic or grandiose as IBM Watson’s Jeopardy participation, but still a nice showcase of where we are now.

Well, PalmQA did great! 26 people competed, typically getting about 15 of the 30 questions right. The best human answered 24 questions correctly. But no matter – PalmQA managed to answer 25 of the 30 questions correctly!

So, in this challenge, Falk’s ensemble-enhanced YodaQA beats the best human!

As mentioned above, PalmQA offers integration of YodaQA with Wolfram Alpha, Google QA, MIT’s Start, Amazon’s Evi and Kngine. We hope to merge this ensembling system into the YodaQA project in the future!

I also entered plain YodaQA into the Grand Challenge, in the configuration that’s running at live.ailao.eu right now. It got 18 questions right, still better than the average human! If we also discounted the purely “computational” questions (algebra, unit conversions) that YodaQA just isn’t designed to answer (it’s still essentially a search engine), that’d amount to 24 questions out of 30. Pretty good!

See the Grand Challenge GitHub issue for more info. We should get the complete details of the challenge, comparisons to other public QA engines (like Google), etc. in Falk’s upcoming thesis.