One of the few aspects of my job I enjoy more than designing a new market is actually building it. Turning some wild concept that sprung from the minds of a bunch of scientists into a working artifact is a huge rush, and I can only smile as people from around the world commence tinkering with the thing, often in ways I never expected. The “build it” phase of a research project, besides being a ton of fun, inevitably sheds important light back on the original design in a virtuous cycle.

PredictWiseQ is our greedy attempt to scarf up as much information as is humanly possible and use it, wisely, to forecast nearly every possible detail about the upcoming US presidential election. For example, we can project how likely it is that Romney will win Colorado but lose the election (6.2%), or that the same party will win both Ohio and Pennsylvania (77.6%), or that Obama will paint a path of blue from Canada to Mexico (99.5%). But don’t just window shop, go ahead and customize and buy a prediction or ten for yourself. Your actions help inform the odds of your own predictions and, crucially, thousands of other related predictions at the same time.

For example, a bet on Obama to win both Ohio and Florida can automatically raise his odds of winning Ohio alone. That’s because our market maker knows and enforces the fact that Obama winning OH and FL can never be more likely than him winning OH. After every trade, we find and fix thousands of these logical inconsistencies. In other words, our market maker identifies and cleans up arbitrage wherever it finds it. But there’s a limit to how fastidious our market maker can be. It’s effectivelyimpossible to rid the system of all arbitrage: doing so is NP-hard, or computationally intractable. So we clean up a good bit of arbitrage, but there should be plenty left.

So here’s a reader’s challenge: try to identify arbitrage on PredictWiseQ that we did not. Go ahead and profit from it and, when you’re ready, please let me and others know about it in the comments. I’ll award kudos to the reader who finds the simplest arbitrage.

Why not leave all of the arbitrage for our traders to profit from themselves? That’s what nearly every other market does, from Ireland-based Intrade, to Las Vegas bookmakers, to the Chicago Board Options Exchange. The reason is, we’re operating a prediction market. Our goal is to elicit information. Even a completely uninformed trader can profit from arbitrage via a mechanical plug-and-chug process. We should reserve the spoils for people who provide good information, not those armed (solely) with fast or clever algorithms. Moreover, we want every little crumb of information that we get, in whatever form we get it, to immediately impact as many of the thousands or millions of predictions that it relates to as possible. We don’t want to wait around for traders to perform this propagation on their own and, besides, it’s a waste of their brain cells: it’s a job much better suited for a computer anyway.

Intrade offers an impressive array of predictions about the election, including who will win in all fifty states. In a sense, PredictWiseQ is Intrade to the 57th power. In a combinatorial market, a prediction can be any (Boolean) function of the state outcomes, an ungodly degree of flexibility. Let’s do some counting. In the election, there are actually 57 “states”: 48 winner-takes-all states, Washington DC, and two proportional states — Nebraska and Maine — that can split their electoral votes in 5 and 3 unique ways, respectively. Ignoring independent candidates, all 57 base “states” can end up colored Democratic blue or Republican Red. So that’s 2 to the power 57, or 144 quadrillion possible maps that newscasters might show us after the votes are tallied on November 6th. A prediction, like “Romney wins Ohio”, is the set of all outcomes where the prediction is true, in this case all 72 quadrillion maps where Ohio is red. The number of possible predictions is the number of sets of outcomes, or 2 to the power 144 quadrillion. That’s more than a googol, though less than a googolplex (maybe next year). To get a sense of how big that is, if today’s fastest supercomputer starting counting at the instant of the big bang, it still wouldn’t be anywhere close reaching a googol yet.

Create your own league to compare your political WiseQ among friends. If you tell us how much each player is in for, we’ll tell you how to divvy things up at the end. Or join the “Friends Of Dave” (FOD) league. If you finish ahead of me in my league, I’ll buy you a beer (or beverage of your choice) the next time I see you, or I’ll paypal you $5 if we don’t cross paths.

PredictWiseQ is part of PredictWise, a fascinating startup of its own. Founded by my colleague David Rothschild, PredictWise is the place to go for thousands of accurate, real-time predictions on politics, sports, finance, and entertainment, aggregated and curated from around the web. The PredictWiseQ Game is a joint effort among David, Miro, Sebastien, Clinton, and myself.

The academic paper that PredictWiseQ is based on is one of my favorites — owed in large part to my coauthors Miro and Sebastien, two incredible sciengineers. As is often the case, the theory looks bulletproof on paper. But I’ve learned the hard way many times that you don’t really know if a design is good until you try it. Or more accurately, until you build it and let a crowd of other people try it.

So, dear crowd, please try it! Bang on it. Break it. (Though please tell me how you did, so we might fix it.) Tell me what you like and what is horribly wrong. Mostly, have fun playing a market that I believe represents the future of markets in the post-CDA era, a.k.a the digital age.

The paper describes how web search engines may be used to monitor and predict flu outbreaks. We studied four years of data from Yahoo! Search together with data on flu outbreaks and flu-related deaths in the United States. All three measures rise and fall as flu season progresses and dissipates, as you might expect. The surprising and promising finding is that web searches rise first, one to three weeks before confirmed flu cases, and five weeks before flu-related deaths. Thus web searches may serve as a valuable advance indicator for health officials to spot the onset of diseases like the flu, complementary to other indicators and forecasts.

I haven’t read the paper, but the article hints at nearly identical results:

Google … dug into its database, extracted five years of data on those queries and mapped it onto the C.D.C.’s reports of influenzalike illness. Google found a strong correlation between its data and the reports from the agency…

Tests of the new Web tool … suggest that it may be able to detect regional outbreaks of the flu a week to 10 days before they are reported by the Centers for Disease Control and Prevention.

To the reporter’s credit, he interviewed Phillip and the article does mention our work in passing, though I can’t say I’m thrilled with the way it was framed:

The premise behind Google Flu Trends … has been validated by an unrelated study indicating that the data collected by Yahoo … can also help with early detection of the flu.

I’m thrilled to see the attention given to the topic, and the Google team deserves a huge amount of credit, especially for launching a live web site as a companion to their publication, a fantastic service of great social value. That’s an idea we had but did not pursue.

In the business world, being first often means little. However in the world of science, being first means a great deal and can be the determining factor in whether a study gets published. The truth is, although the efforts were independent, ours was published first — and Clinical Infectious Diseases scooped Nature — a decent consolation prize amid the go-google din.

Update 2008/11/24: We spoke with the Google authors and the Nature editors and our paper is cited in the Google paper, which is now published, and given fair treatment in the associated Nature News item. One nice aspect of the Google study is that they identified relevant search terms automatically by regressing all of the 50 million most frequent search queries against the CDC flu data. Congratulations and many thanks to the Google/CDC authors and the Nature editors, and thanks everyone for your comments and encouragement.

In the course of writing the chapter, we compiled an extended annotated bibliography that ended up being too long to publish in its entirety in the book. So we trimmed the bibliographic notes in the book to cover only the most directly relevant citations. You can download the full extended bibliography here.

Here is the abstract of our chapter:

Prediction markets (also known as information markets) are markets established to aggregate knowledge and opinions about the likelihood of future events. This chapter is intended to give an overview of the current research on computational aspects of these markets. We begin with a brief survey of prediction market research, and then give a more detailed description of models and results in three areas: the computational complexity of operating markets for combinatorial events; the design of automated market makers; and the analysis of the computational power and speed of a market as an aggregation tool. We conclude with a discussion of open problems and directions for future research.

If you’re interested in this topic, you might also take a look at our recent paper on Betting on permutations, published after the book chapter was completed.