Iowa State is first U.S. student team to win international data mining competition

Posted Jul 10, 2014 11:46 am

AMES, Iowa -- A team of Iowa State University graduate students topped 98 other universities from 28 countries to capture first place in the 15th annual Data Mining Cup. The winner was announced July 2 in Berlin. It is the first time a team from the United States has won the competition.

Prudsys AG, a leading European data mining company, sponsors the intelligent-data analysis competition for universities. According to Prudsys, the competition is meant to be a "bridge between university and industry to identify the best up-and-coming data miners."

Teams had six weeks to develop a solution for a data mining problem about optimal return prognosis. This year, teams had to use an unidentified online store's historical purchase data to create a model for new orders that predicts the probability of a purchase being returned.

"The motivation for this contest data is that some online retailers offering free return shipping have almost half of their orders returned," said Iowa State's team leader and statistics Ph.D. candidate Cory Lanker.

"We could advance our ideas to create an application that helps online retailers reduce returned shipments and increase profit margins," he said.

Between April 2 and May 14, teams worked at their respective universities to develop their probability predictions.

"Teams submitted return probabilities for approximately 50,000 purchases made in one month using data from approximately 481,000 orders from the previous 12 months," Lanker said.

"They used 12 variables that characterize the customer information — such as age, location and purchase history — and information about ordered items — such as size, color, price, etc."

Lanker said that the basis of Iowa State's technical solution was "to fully characterize customer behavior, which we did using advanced statistical learning concepts on the provided history of purchases. Once we successfully characterized customer behavior, we could then best predict whether a new purchase would be returned."

"This was specifically a student contest," said Steve Vardeman, University Professor of statistics and industrial engineering. "The team had no direct faculty input on the problem. They organized and executed their solution entirely on their own."

A jury scored all 57 submitted solutions and invited the top 10 teams to Berlin to present their solution methods at the Prudsys User Days conference. Each team gave a 10-minute presentation.

The first-place Iowa State team received 2,000 euros in prize money (about $2,700) and a plaque. No other American university placed in the top 20. The next-highest U.S. finishers were Northwestern University (24th place) and the University of Southern California (36th place).

Team members Guillermo Basulto-Elias, Xin Yin and Lanker went to Berlin for the presentation and announcement. Final team rankings were announced beginning with 10th place.

"Before long, fifth place was announced and it wasn't us, so I knew we did better this year," Lanker said. "When it was down to two teams, (Prudsys organizer) Jens Scholz said, 'The United States lost in the World Cup last night,' and I thought, 'Well this is us, we finished second,' but then he added, 'But a United States team has won the 2014 Data Mining Cup!'"

Lanker said the shock has not worn off yet. He attributed the team's success to multiple weekly team meetings that were well attended at the end of the semester, demonstrating the "dedication we all had to our team's success."

"As a leader, I stressed sticking to a schedule so we didn't run out of time, and involving everyone in discussions about making the many important statistical decisions," Lanker said. "The level of teamwork was extraordinary ... with many large contributions from all members."

Quick look

A team of ISU graduate students topped 98 other universities from 28 countries to capture first place in the 15th annual Data Mining Cup. It is the first time a U.S. team has won. Prudsys AG, a leading European data mining company, sponsors the intelligent-data analysis competition for universities "to identify the best up-and-coming data miners." Teams had six weeks to develop a solution for a data mining problem about optimal return prognosis.

The team

Guillermo Basulto-Elias

Fan Cao

Xiaoyue Cheng

Marius Dragomiroiu

Jessica Hicks

Cory Lanker

Ian Mouzon

Lanfeng Pan

Xin Yin
