Digital intelligence blog

Boehringer data competition produces academic-standard models in just three months

Kaggle partnership used gamification approach to encourage work on predictive models

Boehringer Ingelheim has successfully applied a 'gamification' approach to clinical research, using an online competition to produce predictive data models.

The company harnessed non-pharma data expertise from an online research community to build models that relate public domain molecular information to an actual biological response.

“This is a data set that the academic community have been bouncing around for years,” David Thompson, a social media strategist at Boehringer, told PMLiVE.

“And it looks as if in three months people with no formal training in chemistry have developed models that are as good, if not better, than the models that the academic community are putting together.”

This is one of the first times that gamification, the use of game design techniques to solve problems and engage audiences, has made such a practical contribution to pharmaceutical drug development.

The three-month competition, which has just finished, attracted more than 700 teams, which between them submitted nearly 9,000 entries.

Players were given 1,776 different variables, each representing a molecular descriptor pertaining to a characteristic of the molecule, such as its size, shape or chemical composition. They were also given experimental data relating to an actual biological response.

Combined, these were used as training data by the contestants to develop and test their models. While they did so, a real-time leaderboard encouraged teams to refine their solutions and leapfrog their rivals.
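As a rough illustration of that setup, here is a minimal sketch in Python. It is not the competition data or any team's method. The descriptor values are synthetic, there are 10 descriptors instead of 1,776, the biological response is assumed to be a yes/no outcome, and log loss is an assumed stand-in for the competition's objective measure of performance. The simple logistic regression and the function names (`make_synthetic_data`, `train_logistic`, `log_loss`) are hypothetical choices made for illustration only.

```python
import math
import random


def make_synthetic_data(n_rows, n_descriptors, seed=0):
    """Build made-up descriptor rows and 0/1 responses (NOT the competition data)."""
    rng = random.Random(seed)
    true_weights = [rng.uniform(-1.0, 1.0) for _ in range(n_descriptors)]
    rows, labels = [], []
    for _ in range(n_rows):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_descriptors)]
        score = sum(w * v for w, v in zip(true_weights, x))
        p = 1.0 / (1.0 + math.exp(-score))
        rows.append(x)
        labels.append(1 if rng.random() < p else 0)
    return rows, labels


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, x):
    """Predicted probability of a positive response for one molecule."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def train_logistic(rows, labels, epochs=200, lr=0.5):
    """Fit logistic regression by plain batch gradient descent."""
    n = len(rows[0])
    m = len(rows)
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        bias -= lr * grad_b / m
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
    return weights, bias


def log_loss(labels, probs, eps=1e-15):
    """Mean log loss; lower is better (assumed scoring rule for this sketch)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Split the synthetic molecules into training rows and held-out rows.
rows, labels = make_synthetic_data(400, 10, seed=42)
train_rows, train_labels = rows[:300], labels[:300]
test_rows, test_labels = rows[300:], labels[300:]

# "Entry" 1: the fitted model. "Entry" 2: always predict the training base rate.
weights, bias = train_logistic(train_rows, train_labels)
model_probs = [predict(weights, bias, x) for x in test_rows]
base_rate = sum(train_labels) / len(train_labels)
baseline_probs = [base_rate] * len(test_labels)

model_score = log_loss(test_labels, model_probs)
baseline_score = log_loss(test_labels, baseline_probs)
print(f"model log loss:    {model_score:.4f}")
print(f"baseline log loss: {baseline_score:.4f}")
```

Scoring every entry on held-out rows with a single number is what makes a leaderboard possible: any two submissions can be ranked without arguing over which is better.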

The winning team was made up of Jeremy Achin, Tom DeGodoy and Sergey Yurgenson, two research directors at an insurance firm and a neurobiologist from Harvard University.

By creating the best algorithm in Boehringer's competition, as assessed by an objective measure of performance, they will share a $10,000 cash prize. They are also in the process of becoming the first Kaggle team to start up their own company.

There were also second and third prizes of $6,000 and $4,000, respectively. Thompson said that, in return for its investment, Boehringer saw “multi-faceted benefits”.

There was positive PR from the interest and excitement about the competition, and connections were made between the community, Boehringer and Kaggle “which hopefully will lead to more competitions”.

Boehringer also now has the source code for the three winning models and will use it as part of its drug development efforts.

“We've taken a data set that's been in the public domain. We've had the Kaggle community squeeze every possible utility out of it in the hope that we're developing extraordinarily predictive models. Then we're going to expose that, in a really lightweight way, to our medicinal chemists to help them drive decisions,” Thompson explained.

The data used in the competition had its chemical background stripped out, but the actual data set will be made public at the American Chemical Society's meeting later this year.

And Thompson stressed that the company wants to put information about the competition on as scientific a footing as possible.

“Not with a view to poking fun at the academic community, but to say: There's this whole other community of people, all of whom are doing the same thing you're doing, but they're coming at it from wildly different perspectives. Look what they did with your data – how extraordinary is that! Why don't we use them in the future?”

Thompson, who has been making internal company presentations about the findings, said Boehringer expects to use Kaggle again.

“Anything that you can pose as a data question, with an objective measure on the backend – you have to be able to say 'is this better or not', this kind of a community would be a place to play,” he said.

Unsurprisingly, Kaggle suggests its approach could have substantial benefits for the industry.

“Expertise in a field only gets you so far; sometimes the answers are buried in the data,” said Jeremy Howard, Kaggle's chief scientist and president. “Given the success of Boehringer Ingelheim's experiment, we anticipate this approach to become the norm for the pharmaceutical industry.”