We create a solid linear learning algorithm that works well on real-world terafeature datasets. The result is a breakthrough in scalability: we can learn faster than data can be streamed through any single machine's network interface.
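To make the scalability point concrete, here is a minimal, hypothetical sketch of the kind of trick such systems rely on: running online gradient descent independently on data shards and averaging the weights in an allreduce-style step, simulated serially with numpy. All names and details here are illustrative assumptions, not the paper's actual system.

```python
import numpy as np

def sgd_pass(w, X, y, lr=0.1):
    """One online (SGD) pass over a local data shard with squared loss."""
    for xi, yi in zip(X, y):
        grad = (xi @ w - yi) * xi      # gradient of 0.5*(w.x - y)^2
        w = w - lr * grad
    return w

def parallel_average_train(shards, dim, passes=5):
    """Each 'node' runs SGD on its own shard; weights are averaged after
    every pass -- a serial stand-in for an allreduce averaging step."""
    w = np.zeros(dim)
    for _ in range(passes):
        local = [sgd_pass(w.copy(), X, y) for X, y in shards]  # parallel in a real system
        w = np.mean(local, axis=0)                             # allreduce == average here
    return w

# toy usage: two shards of a noisy linear problem
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
def make_shard(n):
    X = rng.normal(size=(n, 3))
    return X, X @ true_w + 0.01 * rng.normal(size=n)
print(parallel_average_train([make_shard(500), make_shard(500)], dim=3))
```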

We propose a new contextual bandit algorithm under the assumption of realizability: that there exists a perfect predictor of expected rewards. In the worst case this assumption buys nothing, but in special cases it is extremely powerful.
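As a concrete illustration of why realizability helps (this is a hedged sketch, not the proposed algorithm): an epsilon-greedy contextual bandit that acts greedily on a per-action linear reward regressor. If some predictor in the class matches expected rewards exactly, the regression estimates become trustworthy and greedy action choice is sound; if not, the approach can fail badly. Class and parameter names are illustrative.

```python
import numpy as np

class RegressionBandit:
    """Illustrative epsilon-greedy contextual bandit built on a reward regressor.
    Under realizability, acting greedily on a well-fit regressor is near-optimal."""
    def __init__(self, dim, n_actions, epsilon=0.05, lr=0.05):
        self.w = np.zeros((n_actions, dim))   # one linear reward model per action
        self.epsilon, self.lr = epsilon, lr

    def act(self, x):
        if np.random.rand() < self.epsilon:              # explore occasionally
            return np.random.randint(len(self.w))
        return int(np.argmax(self.w @ x))                 # exploit predicted reward

    def update(self, x, action, reward):
        pred = self.w[action] @ x
        self.w[action] += self.lr * (reward - pred) * x   # squared-loss SGD step
```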

Experiments on, and analysis of, all the obvious feature-partition based online learning algorithms. Results varied from superior to catastrophic in surprising ways, with no single algorithm a clear winner.
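For concreteness, here is a minimal sketch of one obvious feature-partition scheme, purely illustrative and not any particular algorithm from the paper: features are split into blocks, each "worker" owns the weights for its block, and the blocks share only the per-example residual.

```python
import numpy as np

def feature_partition_sgd(X, y, n_parts=4, lr=0.1):
    """One obvious feature-partitioned online learner (illustrative sketch):
    each block holds a slice of the weights; a prediction is the sum of the
    blocks' partial dot products; every block then updates with the shared
    squared-loss residual."""
    d = X.shape[1]
    blocks = np.array_split(np.arange(d), n_parts)      # feature partition
    w = [np.zeros(len(b)) for b in blocks]
    for xi, yi in zip(X, y):
        pred = sum(w[k] @ xi[b] for k, b in enumerate(blocks))  # combine partials
        resid = pred - yi                                        # shared residual
        for k, b in enumerate(blocks):
            w[k] -= lr * resid * xi[b]                           # local update
    return np.concatenate(w)

# toy usage
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 8)); w_true = rng.normal(size=8)
print(feature_partition_sgd(X, X @ w_true))
```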

We call this the "monster paper". It provides the second known class of contextual bandit algorithms, and reduces the computational complexity of contextual bandit learning from linear to polylogarithmic in the number of policies competed with, given an optimization oracle.
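To show where an optimization oracle plugs in, here is a deliberately simplified, hypothetical sketch: epsilon-greedy exploration with inverse-propensity reward estimates, where the "oracle" is a per-action least-squares fit over a linear policy class. This is far cruder than the monster paper's algorithm and guarantees; it only illustrates the reduction from bandit feedback to an optimization-oracle call.

```python
import numpy as np

class OracleEpsilonGreedy:
    """Illustrative oracle-based contextual bandit (not the paper's algorithm):
    exploration data becomes unbiased inverse-propensity reward estimates, and
    an optimization oracle is asked for the best policy on that data."""
    def __init__(self, dim, n_actions, epsilon=0.1):
        self.K, self.eps = n_actions, epsilon
        self.W = np.zeros((n_actions, dim))    # current greedy (oracle) policy
        self.data = []                         # (context, IPS reward vector)

    def act(self, x):
        if np.random.rand() < self.eps:
            return np.random.randint(self.K)   # uniform exploration
        return int(np.argmax(self.W @ x))      # follow the oracle policy

    def record(self, x, a, r):
        greedy = int(np.argmax(self.W @ x))
        p = self.eps / self.K + (1 - self.eps) * (a == greedy)  # action propensity
        ips = np.zeros(self.K)
        ips[a] = r / p                          # unbiased reward estimate
        self.data.append((x, ips))

    def call_oracle(self):
        """Oracle call: argmax over linear greedy policies, implemented here
        as a per-action least-squares fit to the IPS reward estimates."""
        X = np.array([x for x, _ in self.data])
        R = np.array([r for _, r in self.data])
        self.W = np.linalg.lstsq(X, R, rcond=None)[0].T
```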