Tuesday, July 2, 2013

ICML 2013 Reading List

ICML has now been over for two weeks, but I still wanted to write about my reading list, as there were some quite interesting papers (the proceedings are here). Also, I haven't blogged in ages, for which I really have no excuse ;)

There are three topics that I am particularly interested in which got a lot of attention at this year's ICML: neural networks, feature expansion and kernel approximation, and structured prediction.

Hyperparameter Optimization

This is the newest in a series of papers by James Bergstra on hyperparameter optimization. I quite enjoy his work, and his hyperopt software is in active use in my lab. In computer vision applications in particular, there is so much engineering involved that it is very hard to separate research contributions from engineering contributions. This paper shows 1) how important engineering is and 2) how far automation of the engineering part can really go.
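To give a flavor of what automated hyperparameter search looks like, here is a minimal random-search sketch in plain Python. This is not Bergstra's TPE algorithm, and the objective and search space are made up for illustration:

```python
import random

def objective(params):
    # Made-up surrogate for a validation error; in practice this would
    # train a model with the given hyperparameters and return its loss.
    return (params["lr"] - 0.1) ** 2 + (params["n_hidden"] - 64) ** 2 / 1e4

def random_search(n_trials, seed=0):
    # Draw hyperparameters at random and keep the best configuration.
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {"lr": rng.uniform(1e-4, 1.0),
                  "n_hidden": rng.randint(16, 256)}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

best, loss = random_search(200)
```

Tools like hyperopt replace the random draws with a model of which regions of the search space look promising, but the outer loop is the same.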

Neural Networks

Now, let's come to the perhaps most unlikely candidate, neural networks. They gained a lot of attention in the more machine-learny circles in the last couple of years. Still, I was a bit surprised how many papers - in particular very empirical ones - made it to ICML.

One of a zoo of follow-ups on the drop-out work by Hinton, this paper suggests setting weights to zero instead of hidden unit activations. It achieves better accuracy and is more efficient than drop-out.
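As a toy sketch of the difference between the two (shapes and names are my own, and this is only a forward pass, no training):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))   # batch of inputs
W = rng.normal(size=(5, 3))   # weight matrix
p = 0.5                       # keep probability

# Drop-out: zero out hidden unit activations.
H = X @ W
activation_mask = rng.random(H.shape) < p
H_dropout = H * activation_mask / p

# This paper's idea: zero out individual weights instead.
weight_mask = rng.random(W.shape) < p
H_dropconnect = X @ (W * weight_mask) / p
```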

One of the most impressive follow-ups on the drop-out work, this paper demonstrates how to combine drop-out with a max nonlinearity. That's right: the only nonlinearity is the maximum over a group of hidden units. I feel this is pretty innovative, and the results speak for themselves. The authors argue that the max nonlinearity allows the network to learn a piecewise linear approximation of any convex activation function. Unfortunately, it is not really clear from the paper how much of the performance can be attributed to the max nonlinearity, as there are no results without max-out.
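A minimal NumPy sketch of a maxout forward pass, assuming each output unit takes the max over a group of k linear units (shapes and names are illustrative):

```python
import numpy as np

def maxout(X, W, b, k):
    # W: (d_in, d_out * k), b: (d_out * k,)
    # The only nonlinearity is a max over groups of k linear units.
    Z = X @ W + b                      # (n, d_out * k)
    Z = Z.reshape(X.shape[0], -1, k)   # (n, d_out, k)
    return Z.max(axis=2)               # (n, d_out)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
W = rng.normal(size=(6, 3 * 2))  # 3 output units, groups of k=2
b = np.zeros(3 * 2)
out = maxout(X, W, b, k=2)       # shape (4, 3)
```

With k = 2 each unit is the max of two hyperplanes; larger groups give finer piecewise linear approximations of convex activations.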

Kernel Approximation and Feature Extraction

Alex Gittens, Michael Mahoney

This work compares sample-based and projection-based methods for low-rank approximations. I haven't looked into the details yet, but I'm a big fan of the Nystroem method for kernel approximations, so I will definitely see what's in there.
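For reference, a small NumPy sketch of the basic Nystroem idea: approximate the full kernel matrix from a random subset of landmark columns. The RBF kernel and the sizes here are just for illustration:

```python
import numpy as np

def nystroem(kernel_fn, X, m, seed=0):
    # Sample m landmark points and approximate K ~ C @ pinv(W) @ C.T,
    # where C = K(X, landmarks) and W = K(landmarks, landmarks).
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = kernel_fn(X, X[idx])        # (n, m)
    W = kernel_fn(X[idx], X[idx])   # (m, m)
    return C @ np.linalg.pinv(W) @ C.T

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
K = rbf(X, X)                       # exact kernel, O(n^2) entries
K_approx = nystroem(rbf, X, m=25)   # low-rank approximation
err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
```

The payoff is that one only ever needs the n-by-m matrix C, so the approximation scales to n where the full kernel matrix would not fit in memory.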

Structured Prediction

This is quite exciting work by the folks from MSRC whom I met during my internship. They propose to use a QP relaxation for learning structured prediction. Basically, they parametrize the problem in such a way that inference via the QP relaxation is always convex, and they learn within this restricted family. I have only skimmed it so far ;)

This is a continuation of the authors' work on dense random fields for semantic image segmentation. It is another example of "learning for inference". Their previous work showed that mean-field inference can be implemented efficiently via convolutions in certain cases. Here, the authors show how to directly minimize the loss of the prediction produced by mean-field inference.
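As a very rough 1-D toy sketch of the mean-field-by-convolution idea (Potts-style compatibility and a made-up smoothing kernel; the actual formulation in the paper is considerably more involved):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def meanfield_1d(unary, w=1.0, n_iters=10):
    # unary: (n_pixels, n_labels) negative log-probabilities.
    # The pairwise message is a per-label convolution, which is what
    # makes mean-field inference in dense models efficient.
    kernel = np.array([0.25, 0.5, 0.0, 0.5, 0.25])  # zero center: no self-message
    Q = softmax(-unary, axis=1)
    for _ in range(n_iters):
        msg = np.stack([np.convolve(Q[:, l], kernel, mode="same")
                        for l in range(Q.shape[1])], axis=1)
        # Potts-style compatibility: neighbors voting for label l
        # lower its energy, which smooths the labeling.
        Q = softmax(-unary + w * msg, axis=1)
    return Q

# Toy problem: two labels, noisy unaries around a single boundary.
rng = np.random.default_rng(0)
n = 30
true = np.zeros(n, dtype=int)
true[15:] = 1
unary = -2.0 * np.eye(2)[true] + rng.normal(scale=1.0, size=(n, 2))
Q = meanfield_1d(unary)  # marginals, one distribution per pixel
```

The point of the paper, as I read it, is that each update above is differentiable, so one can backpropagate a loss on Q through the fixed number of mean-field iterations.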

There are several more papers on optimization for inference and/or learning, but I can't possibly list them all. There are also some interesting theory papers, for example on random forests.