Over the last decade, neural networks have revolutionized machine learning. Uber uses neural networks for a diverse set of applications, from minimizing wait times by modelling spatiotemporal rider demand patterns across cities to enabling faster customer response.

In many cases, the most successful neural networks employ large numbers of parameters to achieve optimal performance. The good news is that these networks often work very well, achieving impressive performance at whatever their task might be. But such models are fundamentally complex systems that defy easy understanding, because their many parameters are learned without human intervention. Nonetheless, efforts to understand network behaviour continue, both because understanding how networks operate becomes more important as they increasingly impact society, and because a better understanding of network mechanisms and properties will hasten the construction of the next generation of models.

The paper, to be presented at ICLR this year, contributes to this ongoing effort by developing a simple way of measuring a fundamental network property known as intrinsic dimension.

The paper develops intrinsic dimension as a quantification of the complexity of a model that is decoupled from its raw parameter count, and provides a simple way of measuring this dimension using random projections. It finds that many problems have a smaller intrinsic dimension than one might suspect.

Basic Approach:

When a typical neural network is trained, its parameters are randomly instantiated and then optimized to minimize some loss. This strategy can be thought of as choosing an initial point in parameter space and then slowly following gradients from that point to one with a lower loss. The paper modifies this typical training procedure slightly: the initial point is chosen in the same way, but instead of optimizing in the full parameter space, a random subspace around the initial point is generated and the optimizer is allowed to move only in that subspace.

The random subspace is constructed around the initial point by sampling a set of random directions; these random directions are then frozen for the duration of training. Optimization proceeds directly in the coordinate system of the subspace. Computationally, this requires projecting from the subspace to the native parameter space, and this projection can be implemented in several ways, some simple and others more complex but more computationally efficient.
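The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy loss (1000 parameters in 10 blocks, with block i required to sum to i + 1), the names `A`, `b`, `P`, `z`, and `train_in_random_subspace`, the dense normalized Gaussian projection, and the plain gradient-descent settings are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical toy problem (an assumption for illustration):
# 1000 parameters in 10 blocks of 100; block i should sum to i + 1.
D, n_blocks = 1000, 10
A = np.kron(np.eye(n_blocks), np.ones((1, D // n_blocks)))  # (10, 1000) block-sum matrix
b = np.arange(1, n_blocks + 1, dtype=float)

def loss(theta):
    """Sum of squared violations of the block-sum constraints."""
    return float(np.sum((A @ theta - b) ** 2))

def grad(theta):
    """Gradient of the loss with respect to the full parameter vector."""
    return 2.0 * A.T @ (A @ theta - b)

def train_in_random_subspace(theta0, d, steps=1000, lr=0.02, seed=0):
    """Gradient descent restricted to a random d-dimensional subspace around theta0."""
    rng = np.random.default_rng(seed)
    # Sample d random directions once; they stay frozen for all of training.
    P = rng.standard_normal((theta0.size, d))
    P /= np.linalg.norm(P, axis=0, keepdims=True)  # unit-length columns
    z = np.zeros(d)  # subspace coordinates; z = 0 is the initial point
    for _ in range(steps):
        theta = theta0 + P @ z           # project from the subspace to the native space
        z -= lr * (P.T @ grad(theta))    # chain rule: dL/dz = P^T dL/dtheta
    return theta0 + P @ z

theta0 = np.zeros(D)
wide = train_in_random_subspace(theta0, d=50)
narrow = train_in_random_subspace(theta0, d=2)
print(loss(theta0), loss(wide), loss(narrow))
```

In this sketch only the `d` entries of `z` are ever updated, while the full parameter vector is always recomputed from the fixed starting point and the frozen directions. With a generous subspace the toy loss can be driven close to zero, whereas a very small subspace leaves most of the constraints unsatisfied.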

How well this random projection training works depends entirely on the size of the random subspace.
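One plausible way to turn this dependence into a measurement is to sweep the subspace size and record the smallest value at which a solution appears. The sketch below does this for a hypothetical toy problem (10 block-sum constraints on 1000 parameters). The problem, the tolerance `tol`, the function names, and the use of closed-form least squares as a stand-in for an iterative optimizer are all assumptions made for illustration, not details taken from the text above.

```python
import numpy as np

# Hypothetical toy problem (an assumption for illustration):
# 1000 parameters in 10 blocks of 100; block i should sum to i + 1.
D, n_blocks = 1000, 10
A = np.kron(np.eye(n_blocks), np.ones((1, D // n_blocks)))  # (10, 1000) block-sum matrix
b = np.arange(1, n_blocks + 1, dtype=float)

def best_loss_in_subspace(d, rng):
    """Lowest loss reachable inside one random d-dimensional subspace around theta0."""
    theta0 = np.zeros(D)
    P = rng.standard_normal((D, d))  # frozen random directions
    # Closed-form least squares stands in for the optimizer (assumption, for brevity).
    z, *_ = np.linalg.lstsq(A @ P, b - A @ theta0, rcond=None)
    theta = theta0 + P @ z
    return float(np.sum((A @ theta - b) ** 2))

def measure_dimension(max_d=20, tol=1e-8, seed=0):
    """Smallest subspace size whose best loss falls below tol, or None if none does."""
    rng = np.random.default_rng(seed)
    for d in range(1, max_d + 1):
        if best_loss_in_subspace(d, rng) < tol:
            return d
    return None

print(measure_dimension())
```

Because the toy problem imposes 10 independent constraints, subspaces with fewer than 10 dimensions almost never contain an exact solution, while subspaces with 10 or more almost always do. For real networks the success criterion would more likely be a performance threshold relative to a full-space baseline than an exact-zero loss.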

In the sections that follow, this approach is used to measure the intrinsic dimension of networks used to solve MNIST, random MNIST variants, CIFAR-10, and three reinforcement learning (RL) tasks, drawing some interesting conclusions along the way.

Conclusions And Future Directions:

The paper defines the intrinsic dimension of objective landscapes and presents a simple method, random subspace training, for approximating it for neural network modelling problems. This approach is used to compare problem difficulty within and across domains.

In some cases the intrinsic dimension is found to be much lower than the direct parameter dimension, which enables network compression; in other cases the intrinsic dimension is similar to that of the best-tuned models, suggesting those models are better suited to the problem. Further work could also identify better ways of creating subspaces for reparameterization: here random linear subspaces are chosen, but one might carefully construct other linear or non-linear subspaces that are even more likely to contain solutions.

Finally, as the field departs from single stack-of-layers image classification models toward larger and more heterogeneous networks, often composed of many modules and trained with many losses, methods like measuring intrinsic dimension that allow some automatic assessment of model components might provide a much-needed greater understanding of individual black-box module properties.

Their approach and a few interesting findings are summarized in the video given below: