Interesting Papers

The Marr Prize winning papers above are very nice, but here I also want to
highlight three other papers I found interesting today.

Fast R-CNN

By Ross Girshick.

Since 2014, the standard object detection pipeline for natural images has been
the R-CNN system, which first extracts a set of object proposals and then
scores them using a convolutional neural network.
The approach has two key weaknesses: first, the separation between proposal
generation and scoring, which prevents joint training of the model parameters;
and second, the separate scoring of each hypothesis, which leads to significant
runtime overhead.
This work and the follow-up work ("Faster R-CNN", at NIPS this year) address
both issues by proposing a joint model that is trained end-to-end, including
proposal generation, leading to a new state of the art in object detection.
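
The key runtime saving in Fast R-CNN is that the convolutional features are
computed once per image, and each proposal then only pools a small fixed-size
feature from the shared feature map (RoI pooling). Below is a minimal NumPy
sketch of that pooling idea; the shapes, box format, and grid splitting are my
own illustrative choices, not the paper's exact implementation.

    import numpy as np

    def roi_max_pool(feature_map, roi, out_size=(7, 7)):
        # feature_map: (C, H, W) conv features computed once per image.
        # roi: (y0, x0, y1, x1) proposal box in feature-map coordinates,
        #      assumed at least out_size in each spatial dimension.
        # out_size: fixed spatial size expected by the later dense layers.
        y0, x0, y1, x1 = roi
        region = feature_map[:, y0:y1, x0:x1]
        c, h, w = region.shape
        oh, ow = out_size
        pooled = np.zeros((c, oh, ow), dtype=feature_map.dtype)
        # Split the region into an oh-by-ow grid and max-pool each cell.
        ys = np.linspace(0, h, oh + 1).astype(int)
        xs = np.linspace(0, w, ow + 1).astype(int)
        for i in range(oh):
            for j in range(ow):
                cell = region[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                pooled[:, i, j] = cell.max(axis=(1, 2))
        return pooled

    # The expensive convolutional features are shared by all proposals.
    features = np.random.rand(256, 32, 32)
    proposals = [(4, 4, 20, 24), (0, 8, 16, 30)]
    pooled = [roi_max_pool(features, roi) for roi in proposals]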

Unsupervised Visual Representation Learning by Context Prediction

By Carl Doersch, Abhinav Gupta, and Alexei A. Efros.

Supervised deep learning needs lots of labeled training data to achieve good performance.
This paper investigates whether we can instead train deep neural networks on
artificial tasks for which large amounts of training data come for free. In
particular, the paper proposes to predict where one image patch appears
relative to a second patch sampled from the same image. For this task, an
almost infinite amount of training data is easily created from unlabeled
images.
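
As a concrete sketch of how such data can be generated, the snippet below
samples an anchor patch together with one of its eight neighboring patches and
returns the neighbor's position as the label; the patch size, gap, and
sampling details are illustrative choices of mine, not the paper's exact
recipe.

    import numpy as np

    def sample_context_pair(image, rng, patch=32, gap=8):
        # Sample an (anchor, neighbor, label) triple from one image.
        # The neighbor is one of the 8 patches around the anchor; the
        # label (0-7) encodes which position it came from.
        h, w = image.shape[:2]
        step = patch + gap
        # Keep the anchor far enough from the border for all 8 neighbors.
        y = int(rng.integers(step, h - step - patch))
        x = int(rng.integers(step, w - step - patch))
        offsets = [(-step, -step), (-step, 0), (-step, step),
                   (0, -step),                 (0, step),
                   (step, -step),  (step, 0),  (step, step)]
        label = int(rng.integers(8))
        dy, dx = offsets[label]
        anchor = image[y:y + patch, x:x + patch]
        neighbor = image[y + dy:y + dy + patch, x + dx:x + dx + patch]
        return anchor, neighbor, label

    rng = np.random.default_rng(0)
    image = np.random.rand(256, 256, 3)   # any unlabeled image will do
    anchor, neighbor, label = sample_context_pair(image, rng)
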
Perhaps surprisingly, the resulting network, despite being trained only on
this artificial task, learns representations that are useful for real vision
tasks such as image classification.

Deep Fried Convnets

In deep convolutional networks the last few densely connected layers hold most
of the parameters and therefore account for most of the required memory, both
during training and at test time.
This work proposes to leverage the Fastfood kernel approximation to replace the
densely connected layers with structured operations that are efficient and use
few parameters.
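
For intuition, here is a minimal NumPy sketch of the Fastfood structure: a
dense matrix multiply is replaced by diagonal scalings, a permutation, and two
fast Walsh-Hadamard transforms, giving O(d log d) time and O(d) parameters
instead of O(d^2). The normalization and random draws below are illustrative
rather than the paper's exact formulation; in the adaptive ("deep fried")
variant the diagonal matrices are learned by backpropagation.

    import numpy as np

    def hadamard_transform(x):
        # Fast Walsh-Hadamard transform of a length-2^k vector.
        y = x.copy()
        d = y.shape[0]
        h = 1
        while h < d:
            y = y.reshape(-1, 2, h)
            a = y[:, 0, :].copy()
            b = y[:, 1, :].copy()
            y[:, 0, :] = a + b
            y[:, 1, :] = a - b
            y = y.reshape(d)
            h *= 2
        return y

    def fastfood_multiply(x, S, G, B, perm):
        # Apply V x with V = S H G Pi H B: three diagonal scalings
        # (B, G, S), two Hadamard transforms (H), one permutation (Pi).
        y = hadamard_transform(B * x)
        y = hadamard_transform(G * y[perm])
        return S * y / x.shape[0]   # illustrative normalization

    d = 16                          # must be a power of two
    rng = np.random.default_rng(0)
    B = rng.choice([-1.0, 1.0], size=d)
    G, S = rng.standard_normal(d), rng.standard_normal(d)
    perm = rng.permutation(d)
    out = fastfood_multiply(rng.standard_normal(d), S, G, B, perm)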

The empirical results are impressive and the Fastfood justification is
plausible, but I wonder whether this work may even hint at a more general
approach to constructing efficient neural network architectures from arbitrary
dense but efficient matrix operations (FFT, DCT, Walsh-Hadamard, et cetera).