Friday, December 21, 2012

It was the first week of the 2007 National Football League (NFL) season. After waiting all summer for the NFL season to begin, the fans were rabid with anticipation. The airwaves were filled with sportscasters debating the prospects of teams from both conferences and how they would perform.

Of particular interest were the New England Patriots. They had two starters out with injuries, and their star receiver, Randy Moss, was questionable for the game. New England was playing the NY Jets, and their simmering rivalry added fuel to the fire. Many of the sportscasters were lining up with the Jets, and Vegas favored the Jets by 6 points at home.

When betting opened for the game, the action on the Patriots was heavy. The sheer volume of bets placed on New England to win forced the sportsbooks to move the spread in an attempt to equalize betting on both sides. Eventually the line moved all the way to New England being a seven-point favorite by game time. New England went on to win the game 38 to 14, easily covering the spread. This is one example where the collective intelligence of NFL fans was confident that New England would win even when the "experts" thought otherwise.

We recently released a paper examining this phenomenon, entitled "The Performance of Betting Lines for Predicting the Outcome of NFL Games". A copy can be found on arXiv. In this paper we investigated how well the collective intelligence of NFL fans, as expressed through the Vegas betting lines, predicts the outcome of NFL games.

We found that although home teams only beat the spread 47% of the time, a strategy of betting on home underdogs (2002 - 2011) would have produced a cumulative winning percentage of 53.5%, above the threshold of 52.38% needed to break even with a 10% vigorish.
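The 52.38% break-even threshold follows directly from the standard -110 line: you risk $110 to win $100, so your expected profit is zero when your win rate equals 110/210. A quick sanity check:

```python
# Break-even win rate under a 10% vigorish: a standard -110 line means
# risking $110 to win $100, so at break-even p * 100 == (1 - p) * 110.
risk, win = 110, 100
breakeven = risk / (risk + win)
print(f"{breakeven:.2%}")  # 52.38%
```

Any strategy that wins more often than this, like the 53.5% home-underdog strategy above, is profitable after the vig.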

The NFL Playoffs are only a few weeks away. With the end of the regular season in sight, there are a few trends that subtly change the game. One is the weather: Tennessee is playing at Green Bay and snow is in the forecast. Another end-of-season trend is displayed by teams that have clinched playoff positions: they rest their starting lineups and play backup players. That is more of a week 17 phenomenon, but with Atlanta and Houston both at 12-2 for the season, they may play some non-starters during the game. This ranking system is based on team performance and does not take trends like the weather into account.

Our ranking system is based on Google's PageRank algorithm. It is explained in some detail in past posts. A directed graph is created to represent the current year's season. Each team is represented by a node in the graph. For every game played, a directed edge is created from the loser pointing to the winner, weighted by the margin of victory.

In the PageRank model, each link from webpage i to webpage j causes webpage i to give some of its own PageRank to webpage j. This is often characterized as webpage i voting for webpage j. In our system, the losing team essentially votes for the winning team with a number of votes equal to the margin of victory. Last week the Redskins beat the Browns 38 to 21, so a directed edge from the Browns to the Redskins with a weight of 17 was created in the graph.
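The graph construction described above can be sketched in a few lines of Python (the dict-of-dicts representation is one possible choice, not necessarily the one used in the actual system):

```python
# Sketch of the season-graph construction: each game adds a directed edge
# from the loser to the winner, weighted by the margin of victory.
def add_game(graph, winner, loser, winner_pts, loser_pts):
    margin = winner_pts - loser_pts
    out = graph.setdefault(loser, {})          # edges leaving the loser
    out[winner] = out.get(winner, 0) + margin  # accumulate rematch margins
    graph.setdefault(winner, {})               # ensure the winner node exists

graph = {}
add_game(graph, "Redskins", "Browns", 38, 21)  # example from the post
print(graph["Browns"]["Redskins"])  # 17
```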

The season so far can be visualized in the following graph.

The PageRank algorithm is run and all of the votes from losing teams are calculated. The nodes in the graph are given a final ranking, and that ranking is represented by the size of the node in the graph. This algorithm does a much better job of taking strength of schedule into account than many other ranking systems, which are essentially based on win-loss records. Atlanta has had a good season so far, but their strength of schedule was the easiest in the entire league. (Strength of schedule was based on performance from the 2011 season.) Up until last week, Atlanta was not considered a strong contender for the Super Bowl, but after shutting out the Giants, their performance shows the team is capable of a dominant effort against a quality opponent. Although with the Giants, you have to wonder which team will show up each week.

Monday, December 17, 2012

I decided to attend at the last minute, and Kristine and Lori graciously let me have 5 minutes to talk about our project and upcoming NEH proposal. We're looking for humanities-types and Archive-It partners to work with in evaluating our visualizations. After my presentation, I was able to make contacts with several potential partners.

Related to what we're working on, Alex Thurman from Columbia University Libraries talked about their local portal to their Human Rights collection (collection at Archive-It). They offer a rotating list of screenshots for featured sites and have tabs to show the collection pages by title, URL, subject, place, and language. One nice feature they've added is the ability to collect and group different URLs that point to the same site (i.e., handling URL changes over time). They're currently in the user testing phase.

Students from Virginia Tech's Crisis, Tragedy, and Recovery Network (list of collections at Archive-It) presented their work on archiving web pages related to disasters and visualizing tweets related to disasters. Their recent Hurricane Sandy collection contains 8 million tweets. For their Archive-It collection, they extracted seed URIs from tweets, Google news, popular news/weather portals, and direct user input. A big problem was the addition of spam links into the archive. The tweet visualization project looks at classifying tweets into four different phases of a disaster (response, recovery, mitigation, and preparedness). From this, they produced several views, including a ThemeRiver/Streamgraph-type view, a social network view, a map, and a table of tweets.

Friday, December 14, 2012

I was selected to give a 5-minute faculty lightning talk at the Grace Hopper Celebration of Women in Computing in October in Baltimore. Short talks are among the most difficult to prepare, especially short talks for a general audience. I decided to increase my level of difficulty for the talk by combining two topics in my 5-minute talk, information visualization (infovis) and web archiving.

The faculty lightning talks session was new at Grace Hopper, but went very well. We had a 45-minute session and got to hear about 8 totally different research projects. Info and slides from all of the presentations are available on the GHC wiki. Especially for work-in-progress, this format was a great way for the speakers to really focus in on the important aspects of their work and for the audience to hear snippets about different research projects without any presentation being long enough to be boring.

The GHC wiki has a ton of information about the conference, including notes and slides for many of the talks.

I hadn't been to GHC in about 5 years and was amazed to see how much it had grown. There were over 3600 attendees (1500 students) from 42 countries. Happily, even with that many people, I was able to meet up with all of my old friends.

The highlight of the conference for me was Nora Denzel's keynote on Thursday morning. It's recommended viewing for all, but especially for female students in CS or Engineering. The video is embedded below, but if you'd rather read about it, here are some blog posts it generated: Aakriti's blog, Valerie's blog, and Kathleen's blog.