The Lessons of Moneyball for Big Data Analysis

Long before "Big Data" was cool, Paul DePodesta brought it to the big leagues. And today, his story will be told on the big screen in "Moneyball." In a presentation at the Strata Summit in New York, DePodesta reflected on the role of performance analysis in baseball, and lessons that can be applied to data-driven organizations.

Brad Pitt and Jonah Hill in the film "Moneyball," which hits theaters today. Hill's character is based on Mets executive Paul DePodesta, who spoke at the Strata Summit this week.

A phase of DePodesta's career is depicted in the movie "Moneyball," which premieres today on more than 3,800 screens around the country. The film is based on the best-selling 2003 book in which Michael Lewis chronicled the data-driven resurgence of the Oakland A's engineered by A's general manager Billy Beane and DePodesta, who used computer analysis to identify undervalued players. The character based on DePodesta has been renamed Peter Brand and is played by Jonah Hill.

In a presentation Tuesday at the Strata Summit in New York, DePodesta, who is now Vice President for Player Development for the New York Mets, reflected on the role of performance analysis in baseball and lessons that can be applied to data-driven organizations. When he arrived in Oakland, DePodesta recalled, small-market teams like the A's with limited budgets found themselves outgunned in bidding wars with wealthier teams in markets like New York and Boston.

"We had to come up with a different way," said DePodesta. "It was like preparing a gourmet meal, but having to shop at 7-11."

Data vs. Scouting Subjectivity
The solution embraced by Beane and DePodesta was influenced by a school of baseball statistical analysis known as sabermetrics (a reference to the Society for American Baseball Research), which was often at odds with traditional methods of scouting players.

"Subjectivity ruled the day in evaluating players," he said. "We had a completely new set of metrics that bore no resemblance to anything you’d seen. We didn’t solve baseball. But we reduced the inefficiency of our decision making."

Speaking to a crowd of executives and data scientists, DePodesta discussed the process of making those data-driven decisions and how to avoid analytical errors that could lead to bad conclusions. In many instances, the challenge is taking a clear-eyed view of the data, which often involves filtering out emotional responses to data and player performance.

"We constantly seek causal relationships, and we can be tricked by them," said DePodesta. "Often times we get tied to things, and don’t necessarily know why."

Common Biases in Data Analysis
It's easy to develop "affirmation bias," DePodesta said. "Once we’ve made up our minds, we resist information that doesn’t agree with our conclusion."

A particular problem in baseball is "appearance bias," the notion that some athletes look more like great baseball players than others. It's also an issue in business, DePodesta said, citing a data point from Malcolm Gladwell on height and business success. Gladwell found that although just 3.9 percent of American males are 6-foot-2 or taller, about 30 percent of Fortune 500 CEOs are 6-foot-2 or taller.

Making good decisions meant stripping away those biases.

"We turn to data as our flashlight in the cave – our guiding light," DePodesta said. "We said 'unless we can prove it, we’re not going to believe it.' We had to be absolutely relentless in asking the naïve question. The only thing we were wed to was the idea of being open-minded."