Tag: statistics

Every fall, the 120 teams in the NCAA Football Bowl Subdivision (FBS) play 12 or so weeks of college football. At the end of this regular season, the Bowl Championship Series (BCS) releases its final rankings; the teams ranked 1 and 2 are awarded the privilege of competing for the BCS National Championship.

And that’s it.¹

The other bowl games select their participants in rather arbitrary fashion, whether by historical conference affiliations (most famously the venerable Rose Bowl Game, which historically pits a team from the West/PCC/AAWU/Pac-8/10/12 against one from the East/Big Nine/Ten), by selecting the best teams available (the bowls have an arcane but ostensibly logical selection hierarchy), or simply by ignoring all traditional rankings and picking the most financially lucrative matchup for the bowl game itself.

The nature of the championship (a single game between teams ranked 1 and 2 by the BCS) is rather frustrating because almost every other form of competition determines its champion by an elimination tournament. The college football model seems not only arbitrary, but unjustifiably so; often more than two teams (maybe many more) can make a reasonable case for being in the championship game. Consequently, the BCS receives considerable and (in my opinion) completely deserved criticism.

What baffles me the most, however, is the disdain for the use of computer models by the BCS. If anything, they are (or ought to be)² the best part of the entire college football circus.

It is extremely difficult for humans to make dispassionate analyses. We struggle to identify the sources of our own biases, we subconsciously process information selectively, and we make mistakes. Computers do none of these things. They perform no more or less than the tasks with which they are entrusted, barring technical errors (which are exceedingly uncommon). Moreover, the decisive element of the “computer rankings” of the BCS is not the computers themselves (modern computers being more or less fungible), it is the mathematical formulae by which the rankings are computed. The entire endeavor can only be criticized on the basis of the soundness of said formulae.

And therein lies my primary objection to the way the BCS implements computer rankings, an objection that can hardly be expressed more eloquently or scathingly than Bill James already did in an article in 2009. What the BCS has right now is not a good representation of what mathematical and statistical modeling has to offer for college football, so to criticize it on the basis of its performance is akin to criticizing automobile safety on the basis of a 2007 Brilliance BS6 crash test. The computer models are hampered neither by any flaw inherent to the concept of computer rankings, nor by a lack of football knowledge on the part of their creators. Their shortcomings are symptomatic of an institutional sluggishness on the part of college football, wherein age-old truisms supersede contradictory evidence.

That most of the six computer models employed by the BCS are run by individuals who like the current system is not insignificant. Some of the justifications for the considerable role of human polls in the BCS ranking are downright silly. This gem appeared in a Daily Fix (a Wall Street Journal sports blog) post about the BCS computer models:

[Jeff Anderson, co-creator of the Anderson & Hester computer ranking] argues that human voters are better equipped to judge scores, and distinguish between a 24-14 game where the losing team scores two touchdowns in garbage time and a 24-14 team where the losing team trailed by three late but threw an interception returned for a touchdown while attempting to mount a game-winning drive. “If margin of victory is going to be included in any part of the rankings, it should be included only in the subjective part,” Anderson says. Others point out that in many other sports, playoff seedings are determined solely by won-loss record, and the computer rankings account for the unique nature of college football by accounting for strength of schedule.

“It’s a matter of sportsmanship,” [Bill Hancock, executive director of the BCS] says. “You don’t want a team to run up the score on their opponents, merely so they can move up in the computer rankings.” [1]

So instead of giving the computer models the freedom to employ the soundest methods, the BCS bars them from considering the margin of victory, ostensibly to encourage sportsmanship. Yet it gives two thirds of the vote to humans, who will vote not only on the basis of margin of victory, but really on the basis of whatever the hell they feel like. How is that any more fair? And Jeff Anderson, are you sure computers can’t tell the difference between garbage time and a late win?

I would argue that most people vastly overestimate the value of human polls and desperately underestimate the extent of human biases, particularly their own. If you perceive a computational model to be biased, I can assure you it is not (unless it’s Richard Billingsley’s, but that’s for another time). You are biased.

From 2001 to 2004, the BCS gradually eliminated the use of margin of victory in its computer models. It also doubled the weight of human polls (from 1/3 to 2/3) in 2004, largely in response to the controversy of a split championship between the BCS and the AP poll. The message sent by the BCS (and much of the media, and pretty much everyone else who supported the change) was that the computer models exist only to corroborate and legitimize the human polls. When the computer models diverge meaningfully from human polls or the hopelessly vague and utterly uninformative “eyeball test,” they are made the scapegoat and forced to fall in line.

“Throughout this process, we’ve met the most resistance from the computer people,” [Grant Teaff, executive director of the American Football Coaches Association] said. “But that’s their deal. They talk about numbers and figures, and we talk about our responsibility to the game and responsibility to coaches and players emotionally. And besides, the polls that are done by the coaches and the writers will probably still make margin of victory a factor still anyhow.” [2]

Responsibility to the game and coaches and players emotionally? What does that even mean? This quotation says everything you need to know about the BCS. Yes, the polls will indeed probably still make margin of victory, and the relative strength of the conferences in 1997, and in which time zone the games were played, and how the outcome will impact the coach’s own national championship game, and whether the team’s conference is spelled SEC, and on which team a writer’s son is a third-string kicker, a factor. And they will do it arbitrarily, without telling you. And if the computers don’t match the completely transparent and fair gold standard set by the polls, it’s because they were programmed by some scrawny, glasses-wearing, pocket-protecting brainiac at MIT who doesn’t know anything about what it’s like to coach or play football. Right?

¹ Okay, well, other polls (notably the Associated Press, a fascinating tale in its own right) rank teams outside of the BCS, and it is possible for the final AP champion to differ from the BCS champion, but the latter arguably carries more weight de facto.

² If all of the computer models employed were methodologically sound, I would not qualify this statement; sadly this is not currently the case, for all the reasons outlined above.

So before you start building your vault, a few points to keep in mind:

1. First of all, calm down.

2. There is still no compelling reason to believe that this strain, influenza A(H1N1)¹, is significantly more virulent than a typical seasonal influenza.

Your run-of-the-mill flu season has a case-fatality ratio of very roughly 0.1%, or 32% of hospitalizations [1]. Let’s narrow that to the 19-to-64 demographic, which could be most susceptible to this current outbreak (an unusual pattern seen in pandemic flus and likely caused by an overly robust immune response in healthy adults [2]), and is least susceptible to the seasonal flu. Within that population, CFR is about 0.03%, or 7% of hospitalizations [1]. Past influenza pandemics have had CFRs of anywhere from 0.1% in the 1957 and 1968 outbreaks to 2.5%² in the 1918 “Spanish flu” [3].

In contrast, the CFR in the case of influenza A(H1N1) could be anywhere from 3.1% (an upper bound, based on a maximum of 8 laboratory-confirmed influenza A(H1N1) deaths out of a minimum of 257 laboratory-confirmed influenza A(H1N1) cases worldwide, from WHO figures available at time of writing) to 0.0016% (a very conservative lower bound, based on an approximate hospitalization rate of 0.4% of all cases in the 19-64 demographic in a typical flu season [1], with which an attack rate was extrapolated from 2000 estimated hospitalizations in Mexico).

Using figures that are quite popular in the press gives a CFR of about 7.5% in Mexico (some 150 deaths in 2000 hospitalizations, the latter very dubiously assumed to be equal to the number of cases). Because of the unreliability of the “suspected” case count in Mexico, I am not convinced that this particular CFR estimate is useful at all, even as an upper bound. It’s far more likely that the actual CFR falls somewhere between 0.0016% and 3.1%.
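The arithmetic behind these bounds is easy to check. The sketch below uses only the figures quoted above; the variable names are mine, and the lower bound assumes that the 8 confirmed deaths are divided by a case count extrapolated from the 2000 hospitalizations at a 0.4% hospitalization rate. That is my reading of the calculation rather than a documented method, though it does reproduce the 0.0016% figure.

```python
# Figures quoted in the text above; the variable names are illustrative.
confirmed_deaths = 8             # laboratory-confirmed deaths worldwide (maximum)
confirmed_cases = 257            # laboratory-confirmed cases worldwide (minimum)
mexico_hospitalizations = 2000   # estimated hospitalizations in Mexico
mexico_deaths = 150              # deaths as reported in the press
hospitalization_rate = 0.004     # 0.4% of cases, ages 19-64, typical flu season


def cfr_percent(deaths, cases):
    """Case-fatality ratio expressed as a percentage."""
    return 100.0 * deaths / cases


# Upper bound: confirmed deaths over confirmed cases.
upper_bound = cfr_percent(confirmed_deaths, confirmed_cases)

# Lower bound (assumed reading): extrapolate total cases from hospitalizations,
# then divide the confirmed deaths by that much larger denominator.
extrapolated_cases = mexico_hospitalizations / hospitalization_rate
lower_bound = cfr_percent(confirmed_deaths, extrapolated_cases)

# Press figure: treats the hospitalization count as the total case count.
press_estimate = cfr_percent(mexico_deaths, mexico_hospitalizations)

print(f"upper bound:    {upper_bound:.1f}%")     # 3.1%
print(f"lower bound:    {lower_bound:.4f}%")     # 0.0016%
print(f"press estimate: {press_estimate:.1f}%")  # 7.5%
```

The three results differ almost entirely because of the denominator, which is exactly why the case count matters so much more than the death count at this stage.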

All of these numbers don’t tell us very much (except that it is highly unlikely that this is some epic killer virus), but that’s exactly the point. Just because (thanks in large part to the surveillance infrastructure put into place in the wake of the “avian flu” panic) this (potential) pandemic has been spotted, there is no reason to assume that we have any solid evidence suggesting that the virulence of this pathogen is particularly high. However, this may very well change as time goes on and as the situation becomes clearer, and it certainly does not mean that the virus is not dangerous.

3. Virulence is not the same as pathogenicity. Perhaps more precisely, the concepts are not the same, though the terms may often become scrambled in the fray. The salient point is that while influenza A(H1N1) has proven highly pathogenic (i.e. it is highly infectious and spreads rapidly), there is not much evidence to suggest that it is especially virulent (i.e. it has not been associated with unusually high mortality or morbidity). So while governments everywhere are preparing for the possibility of a pandemic, the severity of the disease (to wit, the “causing serious illness” criterion from the linked WHO document) is far from clear at this point. And hopefully I was able to convince you in Point 2 that there is as yet no reason to suspect any greater virulence from this strain than a typical seasonal flu strain.

4. Influenza A(H1N1) has a few key differences to Severe Acute Respiratory Syndrome (SARS) and influenza A(H5N1) or “avian flu”. For one, both SARS and avian flu were much deadlier; the SARS outbreak in Hong Kong had a CFR of about 14-17% [4], while the avian flu has a CFR of something like 14-33% [3]. However, avian flu never demonstrated efficient human-to-human transmission, which made it a very deadly disease that was unlikely to spread quickly. Likewise, SARS has never been observed to be contagious before the onset of symptoms, which significantly increases the likelihood that a person at risk of transmitting SARS can be identified by basic surveillance. Influenza A(H1N1), while appearing (for now) to be far less virulent than either of these two recent serious respiratory disease outbreaks, is also considerably more likely to spread rapidly and become pandemic.

A confirmed case of S-OIV infection is defined as a person with an acute febrile respiratory illness with laboratory confirmed S-OIV infection at CDC by one or more of the following tests:

real-time RT-PCR

viral culture

A probable case of S-OIV infection is defined as a person with an acute febrile respiratory illness who is positive for influenza A, but negative for H1 and H3 by influenza RT-PCR

A suspected case of S-OIV infection is defined as a person with acute febrile respiratory illness with onset

within 7 days of close contact with a person who is a confirmed case of S-OIV infection, or

within 7 days of travel to community either within the United States or internationally where there are one or more confirmed cases of S-OIV infection, or

resides in a community where there are one or more confirmed cases of S-OIV infection.

You can make of that what you will. It seems to me that there is probably no logistical barrier preventing health care entities other than the CDC from confirming the influenza A(H1N1) subtype, except that, for one reason or another, it doesn’t count as “confirmed” unless the CDC does it.

6. When I first began considering and looking into the actual severity of the whole “swine flu” panic, I thought exactly the same thing that Obama said earlier this week: this flu outbreak (and likely pandemic) is, based on the information we currently have, a cause for concern but not alarm.

If there is one good thing that has come out of what is arguably a gross overreaction by the American media, it is a heightened awareness of the importance of public health and good hygiene. So remember kids, listen to the President and wash your hands.

¹ I have used the nomenclature preferred by the World Health Organization as of 30 April 2009.

² The 2.5% CFR figure for the 1918 pandemic, though almost canonical, seems highly questionable given the estimates of 20-100 million deaths at a time when the world had a population under 2 billion. In any case, data from that pandemic are likely iffy at best.
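To see why the 1918 figure looks questionable, the footnote’s own numbers can be plugged in. This is a rough sketch: it takes the 2.5% CFR and the 20-100 million death estimates at face value and treats 2 billion as a ceiling on the world population at the time. The variable names are illustrative.

```python
cfr = 0.025                            # the near-canonical 2.5% figure
deaths_low, deaths_high = 20e6, 100e6  # range of death estimates
population_ceiling = 2e9               # "under 2 billion"

# If CFR = deaths / cases, then cases = deaths / CFR.
implied_cases_low = deaths_low / cfr
implied_cases_high = deaths_high / cfr

# Fraction of a 2-billion world that would have had to be infected.
share_low = implied_cases_low / population_ceiling
share_high = implied_cases_high / population_ceiling

print(f"implied cases: {implied_cases_low / 1e6:.0f} million "
      f"to {implied_cases_high / 1e9:.0f} billion")
print(f"share of world infected: {share_low:.0%} to {share_high:.0%}")
```

Even the lowest death estimate would require at least 40% of the world to have been infected, and the highest would require more cases than there were people.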

Work finally begins on the Tbr2 project, focusing for now on the SVZ. Since this is really no longer my project it’s become considerably less exciting, but the pursuit of knowledge and other vague concepts remain intact.

A better understanding and awareness of history (and particularly the ability to think critically about such things) on the part of Americans would all but eliminate the possibility of a tragedy like the Bush Administration ever recurring. But that is nothing but a distant fantasy that will likely never be realized in this world.

Yet another area in which the United States of America leads the world. Americans sure do like being #1. I would write a paragraph or two on how terrible the system of “justice” is in the United States but I feel that this subject would more appropriately fall under Nate’s “jurisdiction,” if you will.

In closing, I would appreciate any insight as to why the water in Drumheller Fountain was brown today.

The staggering amounts of money thrown around by corporations are simply beyond comprehension. Can you even conceive of $1 billion? Estimating the world’s population to be very roughly 6.6 billion (source: U.S. Census Bureau estimate), and the number of billionaires in the world to be 946 (source: Forbes), it is safe to say that 99.999986% of the world’s living persons have no sense of what it means to have $1 billion USD. Alternatively, we can say that 0.0000143% of the world’s population has at least $1 billion in assets.
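For anyone who wants to check those percentages, here is a quick sketch using the two figures quoted above (the variable names are mine):

```python
world_population = 6.6e9   # U.S. Census Bureau estimate quoted above
billionaires = 946         # Forbes count quoted above

# Share of the world's population with at least $1 billion, as a percentage.
billionaire_share_pct = 100 * billionaires / world_population
everyone_else_pct = 100 - billionaire_share_pct

print(f"billionaires:  {billionaire_share_pct:.7f}%")  # 0.0000143%
print(f"everyone else: {everyone_else_pct:.6f}%")      # 99.999986%
```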

Exxon Mobil broke its own record by posting a $40.6 billion net income in the last fiscal year. This is the largest profit ever posted by any company. Ever. The last sentence of the second paragraph of the New York Times article is truly extraordinary (not in a good way).

The company’s sales, more than $404 billion, exceeded the gross domestic product of 120 countries.

This is utterly ridiculous. An alternative comparison: Microsoft today made some pretty big news by offering to acquire Yahoo! for $44.6 billion. The company that runs the second most popular search engine on the internet (after the juggernaut that is Google) is valued at only 10% more than the annual profit of an oil company.
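The “only 10% more” figure comes straight from the two numbers above. A minimal check, with illustrative variable names:

```python
exxon_net_income = 40.6   # billions of USD, last fiscal year
yahoo_offer = 44.6        # billions of USD, Microsoft's offer

# How much larger the Yahoo! offer is than Exxon Mobil's annual profit.
premium_pct = 100 * (yahoo_offer / exxon_net_income - 1)

print(f"{premium_pct:.1f}%")  # 9.9%, i.e. roughly 10%
```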

I am not a fan of the basic concept behind World of Warcraft – make tons of money on an MMO that is super easy to play and really little more than a glorified RPG that happens to have some online functionality. But you can’t argue that Blizzard didn’t succeed in the “make tons of money” department.

Here are some interesting statistics, formatted into attractive Excel charts for ease of digestion. I chose the 15-24 age group because it encompasses typical high school and college students – those youths who are arguably in the prime of their lives. Incidentally, it is also the age group into which I fall. The chart titles are self-explanatory. I only included males of three races; this was mostly because these groups exhibit some pretty interesting trends. Perhaps more significantly, it is rather late and I am pressed for time.