Tuesday, November 14, 2017

Biologically Inspired Random Projections

At LightOn, stakeholders often ask us about random projections. One way we explain them is through their universal compression properties, which derive from the Johnson-Lindenstrauss lemma. Everyone knows the benefits of the mp3, jpeg and mpeg file formats, which makes compression a simple concept to explain.
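To make the compression argument concrete, here is a minimal sketch of a Gaussian random projection in plain Python. The function names, dimensions (1000 down to 200) and seeds are illustrative assumptions, not anything LightOn ships. The point is only that pairwise distances survive the projection approximately, which is what the Johnson-Lindenstrauss lemma promises.

```python
import math
import random


def random_projection(points, k, seed=0):
    """Project d-dimensional points down to k dimensions with a Gaussian matrix."""
    rng = random.Random(seed)
    d = len(points[0])
    # Entries are N(0, 1/k), so squared norms are preserved in expectation.
    scale = 1.0 / math.sqrt(k)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in matrix] for p in points]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical data: 5 random points in 1000 dimensions, compressed to 200.
data_rng = random.Random(1)
d, k, n = 1000, 200, 5
points = [[data_rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
projected = random_projection(points, k)

# Ratio of projected distance to original distance for every pair of points.
ratios = [
    distance(projected[i], projected[j]) / distance(points[i], points[j])
    for i in range(n)
    for j in range(i + 1, n)
]
print(min(ratios), max(ratios))
```

The ratios should cluster around 1, and the spread shrinks as k grows. Note that the projection matrix never looks at the data, which is why the compression is "universal".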

Sometimes, we also have to explain why it is a good thing to "blow up" the initial data into a higher-dimensional space so that it becomes easily separable. While this is a perfectly good explanation, this "blowing up" of data still takes some getting used to.

As it turns out, these sorts of random projections appear to have a biological analog in the brains of fruit flies and rats, as explained in this recent paper published in Science by Sanjoy Dasgupta, Charles F. Stevens and Saket Navlakha (featured earlier on bioRxiv and below). These techniques can be used in deep neural network approaches as well (see second paper below).

Similarity search, such as identifying similar images in a database or similar documents on the Web, is a fundamental computing problem faced by many large-scale information retrieval systems. We discovered that the fly's olfactory circuit solves this problem using a novel variant of a traditional computer science algorithm (called locality-sensitive hashing). The fly's circuit assigns similar neural activity patterns to similar input stimuli (odors), so that behaviors learned from one odor can be applied when a similar odor is experienced. The fly's algorithm, however, uses three new computational ingredients that depart from traditional approaches. We show that these ingredients can be translated to improve the performance of similarity search compared to traditional algorithms when evaluated on several benchmark datasets. Overall, this perspective helps illuminate the logic supporting an important sensory function (olfaction), and it provides a conceptually new algorithm for solving a fundamental computational problem.
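The abstract does not spell out the three ingredients. As we read the paper, they are sparse binary random projections, an expansion into a much higher dimension, and a winner-take-all step that keeps only the most active units. The sketch below illustrates that reading in plain Python. It is not the authors' code, and every name and parameter value (50 inputs, 1000 units, 5 inputs sampled per unit, 50 winners) is an assumption chosen for illustration.

```python
import random


def make_fly_projection(d, m, samples_per_unit, seed=0):
    """Sparse binary projection: each of m units sums a random subset of the d inputs."""
    rng = random.Random(seed)
    return [rng.sample(range(d), samples_per_unit) for _ in range(m)]


def fly_hash(x, projection, top_k):
    """Return the indices of the top_k most active units (winner-take-all)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]  # mean removal before projecting
    activations = [sum(centered[j] for j in idx) for idx in projection]
    order = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    return frozenset(order[:top_k])


# Hypothetical "odors": x, a slightly perturbed copy of x, and an unrelated y.
data_rng = random.Random(2)
d, m, top_k = 50, 1000, 50  # 20x expansion, 5% of units kept
x = [data_rng.expovariate(1.0) for _ in range(d)]
x_near = [v + data_rng.gauss(0.0, 0.01) for v in x]
y = [data_rng.expovariate(1.0) for _ in range(d)]

projection = make_fly_projection(d, m, samples_per_unit=5)
hash_x = fly_hash(x, projection, top_k)
near_overlap = len(hash_x & fly_hash(x_near, projection, top_k))
far_overlap = len(hash_x & fly_hash(y, projection, top_k))
print(near_overlap, far_overlap)
```

Similar inputs should share most of their winning units, while unrelated inputs share only a few by chance. That overlap is what makes the hash locality-sensitive.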

Current deep learning architectures are growing larger in order to learn from complex datasets. These architectures require giant matrix multiplication operations to train millions of parameters. Conversely, there is another growing trend to bring deep learning to low-power, embedded devices. The matrix operations, associated with both training and testing of deep networks, are very expensive from a computational and energy standpoint. We present a novel hashing based technique to drastically reduce the amount of computation needed to train and test deep networks. Our approach combines recent ideas from adaptive dropouts and randomized hashing for maximum inner product search to select the nodes with the highest activation efficiently. Our new algorithm for deep learning reduces the overall computational cost of forward and back-propagation by operating on significantly fewer (sparse) nodes. As a consequence, our algorithm uses only 5% of the total multiplications, while keeping on average within 1% of the accuracy of the original model. A unique property of the proposed hashing based back-propagation is that the updates are always sparse. Due to the sparse gradient updates, our algorithm is ideally suited for asynchronous and parallel training leading to near linear speedup with increasing number of cores. We demonstrate the scalability and sustainability (energy efficiency) of our proposed algorithm via rigorous experimental evaluations on several real datasets.
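The idea of selecting only the nodes likely to have high activation can be sketched with a hash table over the weight vectors. The version below is a deliberate simplification. It uses plain signed random projections (SimHash), which target cosine similarity, rather than the maximum inner product search hashing described in the abstract, and it uses a single table. All names and sizes are illustrative assumptions.

```python
import random


def make_hyperplanes(d, bits, seed=0):
    """Random Gaussian hyperplanes used to compute a bits-long sign signature."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(bits)]


def simhash(v, hyperplanes):
    """Signature of v: which side of each hyperplane it falls on."""
    return tuple(1 if sum(h * x for h, x in zip(hp, v)) >= 0 else 0 for hp in hyperplanes)


def build_table(weights, hyperplanes):
    """Bucket every node by the signature of its weight vector."""
    table = {}
    for node, w in enumerate(weights):
        table.setdefault(simhash(w, hyperplanes), []).append(node)
    return table


def sparse_forward(x, weights, table, hyperplanes):
    """Compute ReLU activations only for nodes that share a bucket with the input."""
    active = table.get(simhash(x, hyperplanes), [])
    return {node: max(0.0, sum(w * xi for w, xi in zip(weights[node], x))) for node in active}


# Hypothetical layer: 256 nodes with 16 inputs each, hashed with 4 bits.
layer_rng = random.Random(3)
weights = [[layer_rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(256)]
hyperplanes = make_hyperplanes(16, bits=4)
table = build_table(weights, hyperplanes)

x = list(weights[7])  # an input perfectly aligned with node 7
outputs = sparse_forward(x, weights, table, hyperplanes)
print(len(outputs), "of", len(weights), "nodes evaluated")
```

Because only the nodes in one bucket are evaluated, both the forward pass and the corresponding gradient update touch a small subset of the layer, which is the source of the sparsity the abstract refers to. In practice the table would need to be refreshed as the weights change during training.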