Tuesday, August 27, 2013

If you're a fan of primal linear methods, then you've probably spent a lot of time thinking about feature engineering. If you've used kernel learning, then you've probably spent a lot of time thinking about kernels that are appropriate to your problem, which is another way of thinking about feature engineering. It turns out there is a way to leverage the work of the kernel community while solving a primal convex optimization problem: random feature maps. This idea has been around for a while: the paper by Rahimi and Recht that really kicked things off is from 2007, Alex has blogged about it and refined the technique, and Veeramachaneni has a nice blog post which graphically explores the technique. One general strategy is to find an integral representation of a kernel, and then approximate the integral representation via Monte Carlo. At the end of the day, this looks like a randomized feature map $\phi$ for which dot products $\phi(x)^\top \phi(y)$ in the primal converge to the value of the kernel function $k(x, y)$ as the number of random features grows. When using the Fourier basis for the integral representation of the kernel, the random feature map consists of cosines, so we call it ``cosplay'' over here in CISL.
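To make the Monte Carlo idea concrete, here is a minimal sketch in plain Python (not the Matlab script mentioned in this post). It assumes the Gaussian kernel $k(x, y) = \exp(-\gamma \|x - y\|^2)$, whose Fourier representation leads to frequencies drawn from a normal distribution with variance $2\gamma$ per coordinate and phases drawn uniformly from $[0, 2\pi]$. The names `make_rff_map`, `gaussian_kernel`, and `dot` are made up for illustration.

```python
import math
import random


def gaussian_kernel(x, y, gamma):
    """Exact Gaussian kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def make_rff_map(dim, num_features, gamma, seed=0):
    """Build a random cosine feature map phi for the Gaussian kernel.

    Assumption: frequencies w ~ N(0, 2 * gamma * I) and phases
    b ~ Uniform[0, 2 * pi], so that E[phi(x) . phi(y)] = k(x, y).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma)
    freqs = [[rng.gauss(0.0, sigma) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        # One cosine per random (frequency, phase) pair.
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return phi


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


if __name__ == "__main__":
    x = [0.5, -0.2, 0.1]
    y = [0.1, 0.3, -0.4]
    gamma = 0.5
    phi = make_rff_map(dim=3, num_features=5000, gamma=gamma)
    print("exact kernel: ", gaussian_kernel(x, y, gamma))
    print("random approx:", dot(phi(x), phi(y)))
```

Once the features are computed, any primal linear learner can be trained on $\phi(x)$ in place of $x$. The approximation error shrinks roughly like one over the square root of the number of features, so the feature count trades accuracy against compute.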

The technique deserves to be better known than it is, because it gives good learning performance (when the associated kernel is a good choice for the problem), it is straightforward to implement, and it is very fast. I'm hoping that I can increase awareness by providing a simple implementation on a well-known dataset, and in that spirit here is a Matlab script which applies the technique to MNIST. Before running it you need to download MNIST in Matlab format, and download maxent and lbfgs for Matlab.