"Moneyball" 2.0 and the NBA

USC Viterbi scientists to crunch the largest amount of basketball data ever gathered

Last night, in Game 1 of the NBA Finals, the stage was set for the first chapter in the LeBron James-Kevin Durant wars. But the hidden narrative, the one that may forever change the National Basketball Association, is the rise of computer science.

For the past three seasons, the Oklahoma City Thunder have sought a competitive advantage in SportVU, a new optical tracking technology created for the Israeli military. But instead of tracking incoming missiles, the video cameras mounted high in the rafters of Chesapeake Energy Arena are tracking players like Durant and Russell Westbrook.

Boris Diaw of the San Antonio Spurs shoots as Oklahoma City Thunder's Kendrick Perkins and Serge Ibaka defend the basket during Game 5 in the NBA Western Conference finals. (Photo / Eric Gay, Associated Press)

It’s the latest mutation of “Moneyball”: an alchemy of video and computer algorithms that may yield the largest amount of statistical basketball data ever captured. USC Viterbi School of Engineering computer scientists Rajiv Maheswaran and Yu-Han Chang are the first university research team in the country to be tasked with analyzing the revolutionary new SportVU optical tracking data.

Only a third of all NBA teams currently use SportVU, but that number is growing. In fact, if we include the Boston Celtics and the San Antonio Spurs, three of the four 2012 conference finals teams are now clients of the technology. Factor in the Dallas Mavericks — who were analyzing motion sensor data in their 2010-11 championship season — and the days of coaches and scouts just “trusting their gut” may be a thing of the past.

The video footage captured at the stadiums is fed to Chicago-based STATS, owner of the SportVU technology, whose image processing algorithms recognize, by each player’s face, where individuals are on the court, how high the ball is bouncing, etc. Essentially, all that video is reduced to a massive data file with lots and lots of numbers.

For the average NBA coach or scout, the raw data is meaningless, a madman’s code, but to Maheswaran and Chang, it’s a gold mine of information: spatial dynamics, basketball trajectories, player velocities, movement tracks, etc. They’re like digital archaeologists, sifting through data, cleaning data, making pretty pictures of the data. All of this in the hopes of finding insights that will transform a 120-year-old game that Dr. James Naismith, a P.E. teacher, made up with recycled peach baskets and a soccer ball.

“What (Rajiv and Yu-Han) can do with the data,” said Brian Kopp, STATS’ vice president of strategy and development, “is far beyond what we or any NBA team can do with the data.”

Kopp first encountered the USC researchers at the 2011 MIT Sloan Sports Analytics Conference. He was looking for researchers and computer scientists to play with this new optical tracking data, but many were daunted by the prospect of dealing with “a million data records per game.” Maheswaran and Chang, however, research scientists with USC Viterbi’s Information Sciences Institute (ISI), have attacked big data problems ranging from “The World of Starcraft” to modeling cancer.

Said Maheswaran: “Whether it’s energy or health or social media, we can basically track almost every aspect of you all the time. And so it’s opening up all sorts of new problems when you come to, how do you deal with all this data?”

“‘Moneyball’ took data everybody had and just looked at it in a different way,” said Kopp. “That’s something we’re certainly trying to do, but we’re also looking at data you’ve never had before.”

One example is defense, which Maheswaran calls the “holy grail of basketball” — at least for analytics. He and Chang are the first research group in the country to be tasked with analyzing all the STATS optical tracking data from the 2011-12 season.

Said Maheswaran: “Box scores have a lot of offensive statistics, but very little about defense. Basically, with this data, we might be able to solve the holy grail, which is: what is good defense? What does it mean to play good defense? Because, if somebody runs around with the ball for 24 seconds and misses a shot, you can actually see what the players were doing to prevent a good shot from being taken. We can also really do very specific player profiling, if we do it right.”

The current work is just the tip of the iceberg. The insights to be found are only limited by the creativity of the questions. What are the differences when Dwyane Wade drives left versus right? What is the impact of Russell Westbrook's driving ability on the Thunder's offensive efficiency? How much of the court can LeBron James defend effectively? Why does Serge Ibaka get so many blocks?

Said Chang, “When the data gets to us, it’s just this big log file of lots and lots of numbers. And so it’s a challenge to transform that into meaningful data that we can actually apply algorithms to, something we can actually discern patterns from.”
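To make that transformation step concrete, here is a minimal Python sketch of one such conversion: turning raw position samples into player speeds. The record layout (time in seconds, a player identifier, and x/y court coordinates in feet) and the function name `player_speeds` are assumptions made for illustration; the article does not describe the actual SportVU file format or the researchers' code.

```python
import math
from collections import defaultdict


def player_speeds(records):
    """Estimate per-player speeds from raw tracking samples.

    `records` is an iterable of (time_s, player_id, x_ft, y_ft) tuples.
    This layout is hypothetical; the real log format is not described
    in the article.

    Returns {player_id: [speed in feet per second between consecutive samples]}.
    """
    # Group the samples by player so each track can be processed on its own.
    by_player = defaultdict(list)
    for time_s, player_id, x_ft, y_ft in records:
        by_player[player_id].append((time_s, x_ft, y_ft))

    speeds = {}
    for player_id, samples in by_player.items():
        samples.sort()  # order each track by timestamp
        track = []
        for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
            dt = t1 - t0
            if dt > 0:  # skip duplicate timestamps
                track.append(math.hypot(x1 - x0, y1 - y0) / dt)
        speeds[player_id] = track
    return speeds
```

For example, a player who moves from (0, 0) to (3, 4) in one second and then stands still for the next second would get the speeds `[5.0, 0.0]`.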

And patterns are already starting to appear. The USC duo won best paper at last March’s MIT Sloan Sports Analytics Conference for their work on "Deconstructing the Rebound with Optical Tracking Data." After looking at over 11,000 shots from the past season, they found a very striking relationship between where players shoot the ball and the odds they’ll grab the rebound.

“The interesting thing about rebounding,” said Maheswaran, “is that, when you look at the box score, they just tell you who got the rebound. They don’t tell you any context of the rebound. Like, was that the only guy standing there for a mile? Were there 10 thousand people trying to get the rebound? Where was everybody else standing?”

Their research showed that the mid-range jump shot, from 10 to 22 feet, may be the worst shot in all of basketball. Not only is it worth less than a traditional 3-pointer and less likely to go in than a short-range shot, but it is also the hardest shot to grab as an offensive rebound. So it is unsurprising that a team like the Celtics, who take a lot of long two-pointers, is also one of the worst offensive rebounding teams in the league, though strategy also plays an important role. By studying the trajectories of the ball, they discovered that to get offensive rebounds, players need to move far closer to the basket: 90 percent of all missed shots are reboundable within 11 feet of the basket.
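A figure like "90 percent within 11 feet" is a coverage statistic over rebound locations. The Python sketch below shows one way such a number could be computed, assuming each missed shot has already been reduced to a single distance (in feet) from the basket at which the ball became reboundable. The function names and the sample distances are hypothetical illustrations, not the researchers' method or data.

```python
import math


def share_within(distances_ft, radius_ft):
    """Fraction of rebounds whose distance from the basket is at most radius_ft."""
    if not distances_ft:
        raise ValueError("need at least one rebound distance")
    return sum(d <= radius_ft for d in distances_ft) / len(distances_ft)


def radius_covering(distances_ft, fraction):
    """Smallest observed distance that covers at least `fraction` of rebounds."""
    if not distances_ft:
        raise ValueError("need at least one rebound distance")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(distances_ft)
    # Number of rebounds that must fall inside the radius.
    needed = max(1, math.ceil(fraction * len(ordered)))
    return ordered[needed - 1]


# Made-up sample distances, for illustration only (not from the study).
sample_ft = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(share_within(sample_ft, 11))
print(radius_covering(sample_ft, 0.5))
```

Run against real tracking data, `share_within(distances, 11)` would be the quantity to compare with the 90 percent figure reported above.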

Both Maheswaran and Chang are quick to point out that, their analytics aside, “the most important thing to being successful is having very good players.” But that said, in a playoff environment between two evenly matched teams, every strategic insight on match-ups and player tendencies could be the difference between hoisting the Larry O’Brien Championship Trophy and four months of “what if?”

“For example,” said Maheswaran, “in the Celtics-Heat Game (2), it was 99-99. It went to overtime. One slight strategic shift could have made the difference between a win and a loss for either team. So that’s really where the value for a lot of this stuff is. When good teams play good teams in close games, you want every little advantage that you can get.”