Six Degrees of Nolan Ryan: Network Science Ranks Baseball Greats

Arguing over who’s the better player is as much a pastime as baseball itself.

Pedro Martinez or Sandy Koufax? Barry Bonds or Mickey Mantle? Of course it’s impossible to say. You can’t compare players from different eras. Heck, it’s hard enough to compare them between teams, in the same season.

But that doesn’t stop stat junkies from trying. They use equations only slightly less complex than credit derivatives formulas, and no more comprehensible to outsiders than the nose-tapping, ear-tugging, cap-pulling signals of a third base coach.

The latest entry in this field of Monte Carlo simulations and regression analyses and optimization algorithms was posted last Thursday on arXiv, an informal online repository of papers devoted to high-energy physics and self-organizing systems and other such knuckle-balling disciplines.

The study’s authors used network science to crunch the results of every single at-bat between 1954 and 2008 — and thanks to a baseball version of “Six Degrees of Kevin Bacon,” it’s possible to compare players who never faced each other.

“The time frame of baseball history we studied is connected, and those connections in principle could be leveraged to compare players across eras,” said Peter Mucha, a University of North Carolina mathematician. “There is at least one path between every pair of players in the network.”
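For readers who want to see what "at least one path" means in practice, here is a minimal sketch in Python. It treats batters and pitchers as nodes, links any two who faced each other, and uses a breadth-first search to find a chain between two players. The function name `find_path` and the toy `MATCHUPS` list are assumptions made for illustration; they are not the researchers' code or real matchup records.

```python
from collections import deque

# Toy matchup list (illustrative only, not real data): each pair means
# "this batter faced this pitcher at least once".
MATCHUPS = [
    ("Aaron", "Ryan"),
    ("Bonds", "Ryan"),
    ("Bonds", "Martinez"),
    ("Pujols", "Martinez"),
]

def find_path(matchups, start, goal):
    """Return a shortest chain of players linking start to goal, or None."""
    # Build an undirected graph: batters and pitchers are both nodes.
    graph = {}
    for batter, pitcher in matchups:
        graph.setdefault(batter, set()).add(pitcher)
        graph.setdefault(pitcher, set()).add(batter)
    if start not in graph or goal not in graph:
        return None

    previous = {start: None}  # remembers how each player was reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through `previous` to rebuild the chain.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # the two players are not connected

print(find_path(MATCHUPS, "Aaron", "Pujols"))
```

In this toy network the two hitters never share a pitcher, yet a chain still runs between them through intermediate players, which is the property Mucha describes.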

The model begins with a statistic called Runs Until End (RUE), which assigns the outcome of every at-bat a score derived from the expected number of runs a team would score before the inning’s end. Those scores are then totaled for every pitcher-batter pair over the course of a season.
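The bookkeeping behind a statistic like this can be sketched roughly as follows. The article does not give the exact formula, so this sketch assumes a common run-expectancy approach: an at-bat is scored as the change in expected runs for the rest of the inning, plus any runs that scored on the play. The `EXPECTED_RUNS` values are made up for illustration, and the names `at_bat_score` and `season_totals` are hypothetical.

```python
from collections import defaultdict

# Hypothetical expected runs until the end of the inning, keyed by
# (outs, runners on base). These values are invented for illustration.
EXPECTED_RUNS = {
    (0, "empty"): 0.5,
    (0, "first"): 1.0,
    (1, "empty"): 0.25,
}

def at_bat_score(before, after, runs_scored):
    """Assumed scoring rule: change in expected runs plus runs on the play."""
    return EXPECTED_RUNS[after] - EXPECTED_RUNS[before] + runs_scored

def season_totals(at_bats):
    """Sum at-bat scores for every (batter, pitcher) pair in a season."""
    totals = defaultdict(float)
    for batter, pitcher, before, after, runs_scored in at_bats:
        totals[(batter, pitcher)] += at_bat_score(before, after, runs_scored)
    return dict(totals)

# Toy season: a single, then an out, for one batter-pitcher pair.
AT_BATS = [
    ("Batter A", "Pitcher X", (0, "empty"), (0, "first"), 0),
    ("Batter A", "Pitcher X", (0, "empty"), (1, "empty"), 0),
]
print(season_totals(AT_BATS))
```

A positive total favors the batter and a negative one favors the pitcher, under the assumptions above.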

Then the model gets interesting. The final value for how Albert Pujols fared against Tim Lincecum is affected by how Lincecum matched up with Hanley Ramirez, which in turn is affected by how Ramirez did against Jamie Moyer, and so on down the line for every last at-bat in a season. Once those numbers are calculated, any two players can be compared.
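One way to picture how each matchup can ripple through the others is a PageRank-style ranking, sketched below. This is a stand-in technique chosen for illustration, not necessarily the method in the paper. It assumes each pitcher-batter total is a single number (positive favors the batter), lets the loser of each matchup "vote" for the winner, and repeats the calculation until credit has flowed along the chains. The function name `rank_players`, the damping value and the toy results are all assumptions.

```python
def rank_players(results, damping=0.85, iterations=100):
    """PageRank-style ranking: the loser of each matchup votes for the winner.

    results: list of (batter, pitcher, score); score > 0 favors the batter.
    """
    players = sorted({name for b, p, _ in results for name in (b, p)})
    votes = {name: {} for name in players}  # votes[loser][winner] = weight
    for batter, pitcher, score in results:
        if score == 0:
            continue  # an even matchup casts no vote
        loser, winner = (pitcher, batter) if score > 0 else (batter, pitcher)
        votes[loser][winner] = votes[loser].get(winner, 0.0) + abs(score)

    n = len(players)
    rank = {name: 1.0 / n for name in players}
    for _ in range(iterations):
        new_rank = {name: (1.0 - damping) / n for name in players}
        for name in players:
            total = sum(votes[name].values())
            if total == 0:
                # A player who never lost spreads credit evenly to everyone.
                for other in players:
                    new_rank[other] += damping * rank[name] / n
            else:
                for winner, weight in votes[name].items():
                    new_rank[winner] += damping * rank[name] * weight / total
        rank = new_rank
    return rank

# Toy results: B1 beat P1, P1 beat B2, and B2 beat P2.
RESULTS = [("B1", "P1", 1.0), ("B2", "P1", -1.0), ("B2", "P2", 1.0)]
ranking = rank_players(RESULTS)
print(sorted(ranking, key=ranking.get, reverse=True))
```

In the toy data, B1 never faced P2, yet B1 lands above P2 because credit passes through the players in between. That is the sense in which one matchup's value depends on all the others.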

And then the same process can be applied between seasons. Hank Aaron can be set beside Barry Bonds — not just according to how they each did against Nolan Ryan, though that would be part of the score, but according to how each did against every pitcher they ever faced, and how each of those pitchers did against every hitter, so long as some series of links connected the two sluggers.

So what are the results? The researchers have only released a few, preferring to wait until their model’s ready for the show. It’s still rough around the edges, having not yet learned to handle stolen bases, injuries and differences between ballparks. (Todd Helton, his numbers accumulated in the thin air and vast gaps of Denver’s Coors Field, currently surpasses Mickey Mantle.) Many contemporary stars benefit because their numbers do not yet reflect the performance decline that comes with age, and the defensive side of the game is ignored.

Another factor that’s not accounted for is the number-skewing effects of performance-enhancing drug use. “I don’t think we can say anything at all about PEDs, other than where the impact is that might be quantified. If I have a match up between Pedro Martinez and Rafael Palmeiro, maybe that value would have been different,” said paper co-author Mason Porter, an Oxford University mathematician.

But for all those caveats, the proof-of-principle numbers are fun. Barry Bonds is indeed the best hitter. His godfather, Willie Mays, is sixth-best, and beats out Alex Rodriguez. Frank Viola won the Cy Young award in 1988 despite being the season’s 24th-ranked pitcher. And Pedro Martinez is the best pitcher of the modern era.

The model also ranks Bert Blyleven, considered the greatest pitcher not elected to the Hall of Fame, ahead of Hall members Steve Carlton, Phil Niekro and Don Sutton.