Category Archives: Sports

Well they’ve been running around on the flat expanses of the early Holocene lake bed with impressively large machines, whacking down and gathering the soybeans and corn. This puts dirt clods on the roads that cause one on a road bike at dusk to weave and swear, but I digress. The Farmer’s Almanac indicates that it must therefore be about World Series time, which in turn is just about guaranteed to initiate various comments regarding the role of luck, good or bad, in deciding important baseball game outcomes.

There are several important things to be blurted out on this important topic, and with the Series at its climax and the leaves a-fallin’, now’s the time, the time is now.

It was Bill James, the baseball “sabermetric” grandpa and chief guru, who came up with the basic idea some time ago, though not, I think, with the questionable terminology applied to it, which I believe came later from certain disciples who knelt at his feet.

The basic idea starts off well enough but from there goes into a kind of low-key downhill slide, not unlike the truck that you didn’t bother setting the park brake for because you thought the street grade was flat but found out otherwise a few feet down the sidewalk. At which point you also discover that the bumper height of said truck does not necessarily match that of a Mercedes.

The concept applies not just to baseball but to anything involving integer scores. The basic idea is as follows. Your team plays 162 baseball games, 25 soccer matches or whatever, and of course you keep score of each. You then compute the fraction S^x/(S^x + A^x), where, using the baseball case, S = runs scored, A = runs allowed and x = an exponent that varies depending on the data used (i.e. the teams and years used). You do this for each team in the league and also compute each team’s winning percentage (WP = W/G, where W = number of wins and G = games played in the season(s)). A nonlinear regression/optimization returns the optimal value of x, given the data. The resulting fraction is known as the “pythagorean expectation” of winning percentage, claiming to inform us of how many games a given team “should” have won and lost over that time, given their total runs scored and allowed.
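The fit itself is easy to sketch. Below is a hedged Python illustration: the run totals are invented, the “observed” winning percentages are generated from a true exponent of 2, and the grid search is just one simple way to do the nonlinear fit. Nothing here is anyone’s official sabermetric code.

```python
import numpy as np

def pythag(S, A, x):
    """Pythagorean expected winning percentage: S^x / (S^x + A^x)."""
    return S**x / (S**x + A**x)

def fit_exponent(S, A, WP, grid=np.arange(1.5, 2.5, 0.001)):
    """Grid-search the exponent x minimizing squared error against
    the observed winning percentages WP."""
    sse = np.array([np.sum((pythag(S, A, x) - WP) ** 2) for x in grid])
    return float(grid[np.argmin(sse)])

# Toy league: 30 hypothetical season run totals, with 'observed'
# winning percentages generated from a true exponent of 2, so the
# fit should recover x close to 2.
rng = np.random.default_rng(1)
S = rng.uniform(550, 900, size=30)
A = rng.uniform(550, 900, size=30)
WP = pythag(S, A, 2.0)
print(fit_exponent(S, A, WP))
```

With real team data, S, A and WP would come from actual season totals, and the fitted exponent typically lands somewhere in that 1.8 to 2.0 neighborhood.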

Note first that the value of x depends on the data used: the relationship is entirely empirically derived, and exponents ranging from (at least) 1.8 to 2.0 have resulted. There is no statistical theory here whatsoever, and in no description of “the pythag” have I ever seen any mention of such. This is a shame because (1) there can and should be, and (2) it seems likely that most “sabermetricians” don’t have any idea as to how or why. Maybe not all, but I haven’t seen any discuss the matter. Specifically, this is a classic case for application of Poisson-derived expectations.

However, the lack of theory is one point here, but not really the main one. More at issue are the highly questionable interpretations of the causes of observed deviations from pythag expectations, where the rolling truck smashes out the grill and lights of the Mercedes.

You should base an analysis like this on the Poisson distribution for at least two very strong reasons. First, interpretations of the pythag always involve random chance. That is, the underlying view is that departures of a given team’s won-loss record from pythag expectation are always attributed to the action of randomness–random chance. Great, if you want to go down that road, that’s exactly what the Poisson distribution is designed to address. Second, it will give you additional information regarding the role of chance that you cannot get from “the pythag”.

Indeed, the Poisson gives the expected distribution of integer-valued data around a known mean, under the assumption that random deviations from that mean are solely the result of sampling error, which in turn results from the Complete Spatial Randomness (CSR) of the objects, relative to the mean value and the size of the sampling frame. In our context, the sampling frame is a single game and the objects of analysis are the runs scored, and allowed, in each game. The point is that the Poisson is inherently designed to test just exactly what the SABR-toothers are wanting to test. But they don’t use it–they instead opt for the fully ad hoc pythag estimator (or slight variations thereof). Always.

So, you’ve got a team’s total runs scored and allowed over its season. You divide each by the number of games played to give you the mean of each. That’s all you need–the Poisson is a single-parameter distribution, the variance being a function of the mean. Now you use that computer in front of you for what it’s really ideal at–doing a whole bunch of calculations really fast–to simply draw from the runs scored, and runs allowed, distributions, randomly, say 100,000 times or whatever, to estimate your team’s real expected won-loss record under a fully random score distribution process. But you can also do more–you can test whether either the runs scored or allowed distribution fits the Poisson very well, using a chi-square goodness-of-fit test. And that’s important because it tells you, basically, whether or not they are homogeneous random processes–processes in which the data-generating process is unchanging through the season. In sports terms: it tells you the degree to which the team’s performance over the year, offensive and defensive, came from the same basic conditions (i.e. unchanging team performance quality/ability).
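A minimal Python sketch of that simulation follows. The assumptions are mine: runs scored and allowed are treated as independent Poisson draws around the season means, and tied draws are simply redrawn, a crude stand-in for extra innings.

```python
import numpy as np

def poisson_win_pct(mean_scored, mean_allowed, n_sims=100_000, seed=0):
    """Expected winning percentage if per-game runs scored and allowed
    are independent Poisson draws around the season means.  Ties are
    redrawn, a crude stand-in for extra innings."""
    rng = np.random.default_rng(seed)
    s = rng.poisson(mean_scored, n_sims)
    a = rng.poisson(mean_allowed, n_sims)
    ties = s == a
    while ties.any():                       # redraw tied games
        s[ties] = rng.poisson(mean_scored, ties.sum())
        a[ties] = rng.poisson(mean_allowed, ties.sum())
        ties = s == a
    return (s > a).mean()

# A hypothetical team scoring 4.8 and allowing 4.2 runs per game:
wp = poisson_win_pct(4.8, 4.2)
pythag_est = 4.8**2 / (4.8**2 + 4.2**2)  # the ad hoc estimator, x = 2
print(round(wp, 3), round(pythag_est, 3))
```

A chi-square goodness-of-fit test of the observed per-game run distributions against these Poisson expectations (e.g. via `scipy.stats.chisquare`) then addresses the homogeneity question.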

The biggest issue remains however–interpretation. I don’t know how it all got started, but somewhere, somebody decided that a positive departure from “the pythag” (more wins than expected) equated to “good luck” and negative departures to “bad luck”. Luck being the operative word here. Actually I do know the origin–it’s a straightforward conclusion from attributing all deviations from expectation to “chance”. The problem is that many of these deviations are not in fact due to chance, and if you analyze the data using the Poisson as described above, you will have evidence of when that is, and is not, the case.

For example, a team that wins more close games than it “should”, games won by say just one or two runs, while getting badly smoked in a small subset of other games, will appear to benefit from “good luck”, according to the pythag approach. But using the Poisson approach, you can identify whether or not a team’s basic quality likely changed at various times during the season. Furthermore, you can also examine whether the joint distribution of events (runs scored, runs allowed) follows random expectation, given their individual distributions. If it does not, then you know that some non-random process is going on. For example, that team that wins (or loses) more than its expected share of close games most likely has some ability to win (or lose) close games–something about the way the team plays explains it, not random chance. There are many particular explanations, in terms of team skill and strategy, that can explain such results, and more specific data on a team’s players’ performance can lend evidence to the various possibilities.

So, the whole “luck” explanation that certain elements of the sabermetric crowd are quite fond of and have accepted as the Gospel of James, may be quite suspect at best, or outright wrong. I should add however that if the Indians win the series, it’s skill all the way while if the Cubs win it’ll most likely be due to luck.

Sports are interesting, and one of the interesting aspects about them, among many, is that the very unlikely can sometimes happen.

The Louisville Cardinals baseball team went 50-12 this year through the regular season and first round (“regional”) of the NCAA baseball playoff. Moreover, they were an astounding 36-1 at home, the only loss coming by three runs at the hands of last year’s national champion, Virginia. Over the last several years they have been one of the best teams in the country, making it to the College World Series twice, though not yet winning it. They were considered by the tournament selection committee to be the #2 team in the country, behind Florida, but many of the better computer polls had Louisville as #1.

The college baseball playoff is one of the most interesting tournaments out there, from a structural perspective. Because it’s baseball, it’s not a one-loss tournament, at any of the four levels thereof, at least since 2003. Those four levels are: (1) the sixteen regionals of four teams each, (2) the eight “super regionals” played between the regional champs, and (3 and 4) the two rounds at the College World Series in Omaha, comprising the eight super regional champs. A team can in fact lose as many as four games total over the course of the playoff, and yet still win the national championship. It’s not easy to do though, because a loss in the first game, at either the regional level, or in round one of the CWS, requires a team to win four games to advance, instead of three. In the 13 years of this format, only Fresno State has pulled that feat off, in 2008.

In winning their regional and being one of the top eight seeds, Louisville hosted the winner of the Nashville regional, which was won in an upset over favorite Vanderbilt, by UC Santa Barbara of the Big West Conference. That conference is not as good top to bottom as is the Atlantic Coast Conference (ACC) that Louisville plays in, but neither is it any slouch, containing perennial power CSU Fullerton, and also Long Beach State, who gave third-ranked Miami fits in its regional. More generally, the caliber of the baseball played on the west coast, including the PAC-12 and the Big West, is very high, though often slighted by writers and pollsters in favor of teams from the southeast (the ACC and the Southeastern Conference (SEC) in particular). Based on the results of the regional and super regional playoff rounds, the slighting this year was serious: only two of the eight teams in the CWS are from the ACC/SEC, even though teams from the two conferences had home field advantage in fully 83 percent (20/24) of all the first and second round series. Five schools west of the Mississippi River are in, including the top three from the Big 12 conference.

In the super regional, the first team to win twice goes on to the CWS in Omaha. To make a long and interesting story short, UCSB won the first game 4-2 and thus needed just one more win to knock out Louisville and advance to the CWS for the first time in their history. Down 3-0, in the bottom of the ninth inning, they were facing one of the best closers in all of college baseball, just taken as the 27th overall pick in the MLB amateur draft by the Chicago White Sox. Coming in with 100+ mph fastballs, he got the first batter out without problem. However, the second batter singled, and then he began to lose his control and he did exactly what you shouldn’t do: walked the next two batters to load the bases. The UCSB coach decided to go to his bench to bring in a left-handed hitting pinch-hitter, a freshman with only 26 at-bats on the season, albeit with one home run among his nine hits on the year.

Well we’re long overdue for another installment of “Ask the Self-Appointed Experts”, or at least for the question part. In today’s edition a follower from Two Forks, Montana wrestles with the following conundrum, inviting others to help, or at least reassure him that he is not confused alone. He writes:

I know this issue is all over AM talk radio but the inmates pretty clearly run the asylum there and I’m more confused on the following issue than ever.

It is well known that, given a known rate process, the gamma distribution defines the expected values from random starting points to the nth “nearest” object, and so inversely, we can estimate unknown rates by such values. For example, in a forest of randomly distributed trees, the circular area, a, defined by the distance to the nth closest tree, will estimate tree density. But as Skellam (1952), Moore (1954) and Pollard (1971) showed analytically, these estimates are biased, in inverse magnitude to the value of n, specifically, as n/(n-1) for n > 1. Thus, the distance to, say, the 2nd closest tree will correspond to the area represented by one tree, not two. All very well and good.

Now, the mean of the integration of the gamma distribution from 0 to 1, for a known rate, should return the mean area a, but when I closely approximate the integral (in R, which can’t integrate symbolically), I seem to get bias-corrected values reflecting the rates, rather than the biased values reflecting the areas (a) to the nth object. I’m flummoxed and not a little aggravated. Do they know what they’re doing there at R headquarters, or is it me that’s got all turned about the wrong way? If I can’t even trust the values from the statistical distributions in R, then just what can I trust there? I tried taking my mind off the matter by following the Winnipeg Jets (WJPCA, 2015), but man, one can take only so much of that and I sure as hell ain’t going to follow Edmonton. The ice fishing seems to help, at least until the alcohol wears off, but really there should be an 800 number I think. If you can put me onto some clues I would be most grateful.
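As a partial answer to our correspondent: the Skellam/Moore/Pollard bias itself is easy to check by simulation. Here is a Python sketch (the letter concerns R, but the distributions are identical; the rate and sample sizes below are arbitrary choices for illustration):

```python
import numpy as np

# Under complete spatial randomness with intensity lam, the area a
# enclosed by the distance to the nth nearest object is distributed
# Gamma(shape=n, rate=lam).  The naive density estimate n/a is biased
# upward by the factor n/(n-1), per Skellam (1952), Moore (1954) and
# Pollard (1971); (n-1)/a is the corrected estimator.
rng = np.random.default_rng(42)
lam, n, draws = 1.0, 3, 500_000

a = rng.gamma(shape=n, scale=1.0 / lam, size=draws)
naive = np.mean(n / a)            # ~ lam * n/(n-1), i.e. biased high
corrected = np.mean((n - 1) / a)  # ~ lam, i.e. unbiased
print(naive, corrected)
```

The naive estimate n/a comes out high by n/(n-1), while (n-1)/a recovers the true rate, just as the letter says.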

Black cherry trees in ripe fruit and goldenrod in full bloom and that can only mean one thing, and no I’m not talking about football season.

They opened the Summerfield Dam gates at 7AM this morning; for godsake get down, or up, or over there in the next six weeks and try to kill yourself with all the others if you can. There will be a party, a rather large and extended one, and it’s anybody’s guess as to whether river flow will exceed that of beer. Now, when on the river, try to remember, a priori if possible, that plastic (or rubber) side down is optimal, that rocks are typically fairly hard and to take a big gulp of air before you go under. Remembering these a posteriori is fairly automatic. Everything else is open to personal interpretation.

Best to put in downstream from the nozzle a bit, although I’m sure it’s been tried:

Ezekiel Elliott, Ohio State, breaks through the line in Ohio State’s NCAA football championship game victory over Oregon Monday night, capping an improbable run to the title in the first year of the college football playoff. Photo by Kirby Lee, USA TODAY sports

Third baseman Pablo Sandoval hits the ground after catching a foul pop fly for the last out of the 2014 World Series, as the Giants erupt from their dugout.

World Series champs for the third time in the last five years (every other year), those “scratch ’em ’till they bleed to death” San Francisco Giants have done it again. Not quite a dynasty yet, but you have to go back fifteen years or so to find a team better at consistently winning games when they really count, over several years, than this group of characters. When all was said and done, it came down to having the best World Series pitcher in a long, long time on your side.

For the record, I picked the Giants in six. Matt also actually picked the Giants but then went with his “logical opposites theory” to go with the Royals. Harold and Clem, well they were just patently off the deep end 🙂

Predictions for 2015 and 2016 are now open. For 2015 I’m picking anyone except the Giants, and for 2016, I’m going Giants 🙂

So suppose you have your basic Major League Baseball (MLB) structure, consisting of two leagues having three divisions of five teams each, each of which plays a 162 game, strongly unbalanced*, schedule. There are, of course, inherent quality differences in those teams; some are better than others, when assessed over some very large number of games, i.e. “asymptotically” **. The question thus arises in your mind as you ponder why the batter feels the need to step out of the batter’s box after each pitch ***: “how often will the truly best team(s) win their league championships and thus play each other in the World Series?”. The current playoff structure involves having the two wild card teams play each other in a one game elimination, which gives four remaining playoff teams in each league. Two pairings are made and whoever wins three games advances to the league championship series, which in turn requires winning four games.

I simulated 1000 seasons of 162 games with leagues having this structure. Inherent team quality was set by a normal distribution with a mean of 81 wins and a standard deviation of ~7, such that the very best teams would occasionally win about 2/3 (108) of their games, and the worst would lose about that same fraction. Win percentages like those are pretty realistic, and the best record in each league frequently falls between 95 and 100 wins.

Results:
1) The truly best team in each league makes the playoffs about 80 percent of the time under the current system, less when only four teams make it.
2) That team wins its league championship roughly 20 to 30 percent of the time, getting knocked out in the playoffs over half the time. It wins the whole shebang about 10 to 15 percent of the time.
3) Whenever MLB expands to 32 teams, in which the playoff structure will very likely consist of the four division winners in each league and no wild card teams, the truly best (and second and third best) teams in each league will both make the playoffs, and advance to the World Series, less frequently than they do now.
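For the curious, the gist of such a simulation can be sketched in a few lines of Python. This is a simplified stand-in for the full thing, with assumptions of my own: a balanced schedule, win totals drawn as independent binomials (ignoring head-to-head structure), and team qualities redrawn each season.

```python
import numpy as np

def simulate_seasons(n_seasons=1000, teams=15, games=162,
                     mean_w=81, sd_w=7, seed=0):
    """Fraction of seasons in which the truly best team in a 15-team
    league makes the 5-team playoff field (3 division winners plus
    the 2 best remaining records).  Simplifications: balanced
    schedule, independent binomial win totals."""
    rng = np.random.default_rng(seed)
    made_it = 0
    for _ in range(n_seasons):
        quality = rng.normal(mean_w, sd_w, teams)   # true expected wins
        p = np.clip(quality / games, 0.05, 0.95)
        wins = rng.binomial(games, p)               # simulated records
        best = int(np.argmax(quality))              # truly best team
        field = set()
        for d in range(3):                          # division winners
            div = np.arange(d * 5, d * 5 + 5)
            field.add(int(div[np.argmax(wins[div])]))
        rest = [t for t in np.argsort(wins)[::-1] if t not in field]
        field.update(int(t) for t in rest[:2])      # two wild cards
        made_it += best in field
    return made_it / n_seasons

print(simulate_seasons())
```

The fraction printed is the rate at which the truly best team makes the five-team playoff field; simulating the playoff series from there gives the league championship and World Series numbers.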

This type of analysis is generalizable to other types of competitions under structured systems, at least for those in which the losers of individual contests live to fight another day, or if they don’t, are replaced by others of the same basic quality. The inherent spread in team quality makes a very big difference in the results obtained however. It’ll apply very well to baseball and hockey, but not so well to the NBA, for example.

So the next time an MLB team wins its league, or the World Series, and you’re tempted to think this means they must be the best team in the league (or MLB overall), think about that again. Same for the NHL.

* Currently, each team plays around 3 times as many games against each intra-division opponent as inter-division opponents, not even including the 20 inter-league games (which I’ve ignored in these analyses, assuming all games are within-league).
** These records are conceived of as being amassed against some hypothetical, perfectly average team. This team is from Lake Wobegon Minnesota.
*** It is perfectly OK to think other things of course, and we need not worry about the particulars of the language embodied therein.

I’ve discussed no baseball here yet, which is kind of surprising, given that I’ve been a big fan all my life. I played a lot growing up, through high school and even a little in college and afterwards. If I had the time, I would likely start a blog just devoted strictly to baseball (and not just analysis either), because I have a lot to say on a lot of topics. But alas…

To me, the real interest in any sport comes from actually playing the game, not watching it, and I watch very little baseball (now) because the games are just too time consuming (though I still have a hard time refraining in October). When I do watch, I’m not obsessively analytical–that takes the fun out of it for me. It’s an athletic contest, not a statistics class; I want to see the center fielder go full speed and lay out for a catch, or a base thief challenge the pitcher or whatever, not sit there with numbers in my head. Analysis is for later, and I do like it, so I wade in at times, thereby joining the SABR-metric (or “sabermetric”) revolution of the last 3-4 decades (the Society for American Baseball Research (SABR) initiated much of this). And baseball offers endless analytical opportunities, for (at least) two reasons.

Yesterday I had an interesting experience which I’m not sure how to fully interpret.

I got hit and knocked down by a large SUV while on my bike ride. I’ve ridden unknown thousands of miles in my life and this is the first time I’ve ever been hit. It happened in an unusual way; most riders get hit from behind by a vehicle moving near the speed limit. I was lucky–even though I got broadsided from the left, the vehicle was only going maybe 5-7 mph (but accelerating), and I was just starting from a cold stop, barely moving. But I was also in the act of clipping into the pedals, and thus not freely mobile. I was however able, given that I was looking straight at the oncoming vehicle, to turn slightly to the right and get my left hand off the handlebars just enough to prevent a more serious collision. The impact spun me around about 270 degrees and I landed on my left side. What happened next was the interesting part though.

I wasn’t hurt but was stunned and laid on the ground for a few seconds trying to comprehend what had happened. Cars were lined up at a red light and one of the drivers yelled out and asked if I was OK. I said yeah I thought so, although I wasn’t 100% sure. I saw the SUV pull over–no chance for a hit and run incident at a red light with clear witnesses. Then I see someone with some type of badge on their shirt, though not in a police uniform, walk up to me and say “What do you need”? Paramedic, already? I’m still trying to unclip my right foot from the pedal so I can get up off the roadway, which I finally do.

As I get up I notice a gun on his hip and then realize this is the person who hit me. FBI agent, unmarked car [correction: it was a Homeland Security agent]. I sort of spontaneously say something like “What the hell are you doing you idiot, didn’t you see me?“, among other things. His first response is “You’re supposed to cross the street at the crosswalk up there”. Obvious nonsensical bullshit; we were both emerging from parking lots, on opposite sides of the road, and trying to initiate left hand turns onto the road. We were both in the roadway, and he just simply wasn’t watching, presumably looking over his shoulder to see if there was any traffic coming. I’m just lucky the light 30 m away was red and therefore he didn’t accelerate even more.

The several witnesses to the incident were now departing and I realized immediately that this guy was going to try to deny any responsibility. What I said next is more or less unprintable, FBI agent and gun or no. He said some other nonsense, mainly that he was in fact watching where he was going, the logical conclusion from that being that he must then have hit me on purpose, which we can be pretty sure an FBI agent would not do. I was busy inspecting my bike, which, since it took the brunt of the collision, I was sure must be damaged. It’s a LeMond, a make that went out of business several years ago due to Trek/Armstrong’s reaction to LeMond’s doping allegations against Armstrong. So getting a replacement frame is limited to what you can find on E-bay and similar sites, and also expensive. Amazingly, and much to my great relief, the bike did not appear to suffer any obvious structural damage. The front wheel wasn’t even out of true. Apparently the impact point had been the left ram-horn of the handlebars, and it just flipped me around. Hairline micro-fractures in the frame are still a possibility though; these will only become apparent once they propagate and grow under riding stresses.

The Sheriff showed up about 15 minutes later and filled out a report. He seemed like a good guy, and sympathetic to my version of events, but nevertheless he refused to assign fault to the driver, saying something to the effect that the party further out into the roadway–which was the driver–has the right of way. I don’t think this is correct for a couple of reasons, but there was nothing I could do, given that any witnesses were gone. I was just so glad that neither my bike nor I were damaged that I just didn’t want to press it. Plus there was only about an hour of daylight left and I just wanted to get back on and ride, which is what I did. I even shook the agent’s hand before leaving, which kind of surprised me actually.

But it’s incidents like this, among many others, that make me increasingly suspicious of the trustworthiness of human beings generally. On the other hand, it makes me think of my friend Alan Reinbolt, who, only a couple of years after I did mine, was hit and killed by a large truck on his cross-the-country bike ride, and of the two bikers who’ve already been killed in the county by drivers this year. In those contexts, I’ve been very fortunate indeed.

Recently, studies aimed at better quantifying and communicating the “consensus” on climate change have become more popular. To take advantage of the increasing monetary flow in this direction, and to advance the science even further, our institute, meaning me, has been designing a new research protocol. In the spirit of the “open science” movement, we/I thought it would be good to get some public feedback on potential flaws and possible improvements to this protocol. There are several advantages of this “crowd sourcing” approach to blog science, not the least of which is avoidance of placing a rather tacky “Tip Jar” icon on one’s home page.

What we want to know is what sort of message will really stick with people, make them think. New research has shown that communication methods are very important in this regard. For example, van der Linden et al. (2014) showed that “simple text” and pie charts are slightly superior to mixed metaphors involving doctors’ opinions regarding bridge failures. This is an important advancement that we want to build upon, leveraging the moment to effectively effect an optimal messaging paradigm that cannot be falsified.

One improvement that can be made involves how the experimental units are chosen. van der Linden et al. (2014) queried about 1000 volunteers, but these were chosen from among a “nationwide panel of people who are willing to participate in online surveys”. Those people want to be asked random questions by unknown people having unknown motives, which, being abnormal, is not representative of the entire population. A better approach is to just target everybody, and a good way to do that is to confront them on the street while they are minding their own business, before they have any idea what you’re up to really.

Another issue is the treatments themselves; van der Linden et al. used a set of treatments involving pie charts, metaphors and numbers. This is nice but c’mon we put a man on the moon; we believe we can achieve more here. Our design applies less pedestrian treatments to these pedestrian experimental units, each chosen after careful thought. Our procedure is similar however. That is, we first ask what each unit believes that scientists believe about the climate, record the response, then apply the randomly chosen treatment, repeat the original question, and record the second response. Pretty simple really; the whole thing hinges on the treatments, which are:

Treatment 1:
The unit is shown a pie chart with the AAAS logo below it, indicating that 97 percent of scientists believe that climate change is real.

Treatment 2:
The unit is shown a pie chart with images of kittens and Jesus below, with statement as above.

Treatment 3:
The unit is shown a rerun of an old Sesame Street episode featuring the numbers 9 and 7, in sequence, over a backdrop picture of a hurricane.

Treatment 4:
The unit is informed that only Australian Aborigines and Death Row inmates are unaware that 97 percent of scientists believe that climate change is real.

Treatment 5:
Free dinner and beer at a nice local pub is promised to the unit for all answers over 95 regarding what percentage of scientists believe in climate change.

Treatment 6:
“97% consensus” and “Mother” are tattooed prominently on the unit’s right inner forearm.

Treatment 7:
The unit’s face is situated ~ 0.3 meters proximate to the front end of an OSHA-certified megaphone and unit is informed three times, at the “riot control” setting, that 97 percent of scientists believe in climate change.

Treatment 8:
Justin Verlander is placed approximately 60.5 feet from unit, facing, and delivers a ~97 mph fastball to unit’s upper left rib cage quadrant, while yelling “Get a real-time feeling for what 97 is all about, partner”.

Many more treatments than this are possible of course. For example we can certainly improve upon the “Indirectness Factor” (IF) one or more steps by asking people what they think other people think scientists believe about the climate, what they think they would think if exposed to a particular treatment, and so forth. There is a rich garden for potential studies following this path.

Thank you in advance for any contributions to the science that you may have, the world will be a better place for it. If you would like to donate $1000 or more that would be fine as well.

Tonight, the Stanley Cup Finals of the National Hockey League begin, between the NY Rangers and LA Kings. The first three rounds have been highly entertaining, especially in the Western Conference, where the Kings have pulled off some truly amazing feats in running through a gauntlet of three of the league’s top teams, San Jose, Anaheim and Chicago. The Kings are fairly heavy favorites to win their second Cup in three years.

The LA Kings pulled another rabbit out of their helmets against the Chicago Blackhawks, to advance to the Stanley Cup Finals against the New York Rangers.

One topic I’m always interested in is how sports leagues and their playoffs are structured, one that I don’t think gets nearly enough attention compared to other issues, like personnel, trades, salaries, team strategy, etc. The way that teams are grouped into divisions/conferences can have a very definite and strong effect on who does and does not make the playoff round, although the NHL is better than say, baseball in that respect, and definitely far better in terms of the structure of the playoff rounds themselves.

Suppose you’re sitting there watching the Stanley Cup playoffs and you realize all at once, “Hey, I bet I can estimate an important missing parameter in old tree data sets using probability theory, and then use it to evaluate, and make, tree density estimates“. Of course that issue is plastered all over the internet–everybody’s sick of it frankly–and there are only about 2.36 million NHL playoff games, so you might not act on your idea at all. But you also might jump up and rush to the computer, startling the dog in the process, open up R and start whacking away at the keyboard, and be swearing furiously and grabbing your head in no time at all. There’s big, big money to be made on this after all.

Now the question soon arises, “Just how the hell am I supposed to go about this anyway?”. Well the answer to that is “Carefully, with patience, gobs of it”. Because you’ll learn a lot about some important science concepts if you do. And of course, there’s the money, gobs of it, that should motivate a fair bit.

Some background here. One can estimate the density (or “intensity” in statistical parlance) of objects if you measure the distance from random points to a sample of those objects. The accuracy and precision of the estimate will depend on the sample size, the objects’ spatial pattern, and the rank order of the distances of the objects you measure to. All very exciting stuff, but old hat.
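A toy version of that random-points procedure can be run in Python. Everything below, the simulated forest, the torus wraparound used to dodge edge effects, and the sample sizes, is an assumption for illustration, not anyone's published method:

```python
import numpy as np

rng = np.random.default_rng(7)
lam = 400.0                     # true tree density (trees per unit area)

# Simulate a CSR 'forest' on a unit torus (wraparound avoids edge effects).
n_trees = rng.poisson(lam)
trees = rng.uniform(0, 1, size=(n_trees, 2))

# Distances from m random sample points to the single nearest tree.
m = 200
pts = rng.uniform(0, 1, size=(m, 2))
d = np.abs(pts[:, None, :] - trees[None, :, :])
d = np.minimum(d, 1 - d)                       # torus wraparound
r = np.sqrt((d**2).sum(axis=2)).min(axis=1)    # nearest-tree distances

# The sum of the search areas is approximately Gamma(m, lam), so the
# pooled estimator (m-1)/sum(a) corrects the small-sample bias, in the
# same spirit as the n/(n-1) correction for nth-nearest distances.
a = np.pi * r**2
print((m - 1) / a.sum())        # should land close to lam
```

One caveat worth flagging: the sample disks all share a single simulated forest, so the m draws aren’t fully independent, but the corrected pooled estimate still lands close to the true density.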

Now, suppose some abject clowns went out and purposely didn’t record those rank distances on like a whole bunch of trees over a biggish area, like, say, two-thirds of the United States, simply because they were distracted by avoiding arrows from the Native Americans, or felt the malaria and blood loss from the mosquitoes over the 50 miles of swamp crossed, or some other lame excuse like that. Well, now if that isn’t frustrating for keyboard stats jockeys! The complete lack of concern for our needs 150 years later trying to get publications! Wankers!

Should time allow, we shall investigate this question in coma-inducing detail in future posts. We shall however take periodic breaks to watch the Red Wings dispose of the Beantown “Bruins” in short order, and of course for beer runs. Might want to replace the batteries in the remote control if you get a chance.