This work explores hypernetworks: an approach in which a small network, also
known as a hypernetwork, is used to generate the weights for a larger network.
Hypernetworks provide an abstraction that is similar to what is found in
nature: the relationship between a genotype (the hypernetwork) and a
phenotype (the main network). Though they are also reminiscent of HyperNEAT in
evolutionary computation, our hypernetworks are trained end-to-end with
backpropagation and are thus usually faster. The focus of this work is to make
hypernetworks useful for deep convolutional networks and long recurrent
networks, where hypernetworks can be viewed as a relaxed form of weight-sharing
across layers.
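To make this concrete, the following is a minimal PyTorch sketch of the static-hypernetwork idea for convolutional layers: each layer of the main network keeps only a small learned embedding, and a single shared hypernetwork maps that embedding to the layer's full kernel. The class and variable names (StaticHyperNetwork, layer_embeddings) and the two-layer projection are illustrative assumptions, not the paper's exact architecture, and the toy sizes are for demonstration only; in practice the parameter savings come from sharing one hypernetwork across many layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StaticHyperNetwork(nn.Module):
    """Maps a small per-layer embedding z to a full conv kernel.

    Illustrative sketch: a two-layer projection standing in for the
    paper's static hypernetwork for CNNs.
    """
    def __init__(self, z_dim, out_ch, in_ch, ksize):
        super().__init__()
        self.shape = (out_ch, in_ch, ksize, ksize)
        self.proj = nn.Sequential(
            nn.Linear(z_dim, z_dim),
            nn.ReLU(),
            nn.Linear(z_dim, out_ch * in_ch * ksize * ksize),
        )

    def forward(self, z):
        return self.proj(z).view(self.shape)

# Each main-network layer stores only a tiny embedding vector; the
# shared hypernetwork generates that layer's weights on the fly, a
# relaxed (soft) form of weight-sharing across layers.
z_dim = 64
hyper = StaticHyperNetwork(z_dim, out_ch=16, in_ch=16, ksize=3)
layer_embeddings = nn.ParameterList(
    [nn.Parameter(torch.randn(z_dim)) for _ in range(4)]
)

x = torch.randn(1, 16, 32, 32)
for z in layer_embeddings:
    kernel = hyper(z)  # weights generated from the layer embedding
    x = F.relu(F.conv2d(x, kernel, padding=1))
```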
Our main result is that hypernetworks can generate non-shared weights for LSTMs
and achieve state-of-the-art results on a variety of language modeling tasks
with the Character-Level Penn Treebank and Hutter Prize Wikipedia datasets,
challenging the weight-sharing paradigm for recurrent networks.
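The mechanism behind this result can be sketched in a few lines: a small auxiliary LSTM runs alongside the main one and, at every timestep, emits vectors that rescale the rows of the main LSTM's weight matrices, so the effective recurrent weights are no longer tied across time. The sketch below is a simplification of the paper's HyperLSTM (which also modulates the biases and uses separate embedding projections per gate); the class and variable names are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn as nn

class HyperLSTMSketch(nn.Module):
    """A small 'hyper' LSTM emits scaling vectors that modulate the
    main LSTM's weight matrices at each timestep (simplified)."""
    def __init__(self, in_dim, hid_dim, hyper_dim):
        super().__init__()
        self.hid_dim = hid_dim
        # Main LSTM parameters for all four gates, stacked.
        self.W_x = nn.Parameter(torch.randn(4 * hid_dim, in_dim) * 0.05)
        self.W_h = nn.Parameter(torch.randn(4 * hid_dim, hid_dim) * 0.05)
        self.b = nn.Parameter(torch.zeros(4 * hid_dim))
        # Auxiliary LSTM plus projections to per-row scaling vectors.
        self.hyper = nn.LSTMCell(in_dim + hid_dim, hyper_dim)
        self.scale_x = nn.Linear(hyper_dim, 4 * hid_dim)
        self.scale_h = nn.Linear(hyper_dim, 4 * hid_dim)

    def forward(self, xs):  # xs: (time, batch, in_dim)
        B = xs.shape[1]
        h = xs.new_zeros(B, self.hid_dim)
        c = xs.new_zeros(B, self.hid_dim)
        hh = xs.new_zeros(B, self.hyper.hidden_size)
        hc = xs.new_zeros(B, self.hyper.hidden_size)
        outs = []
        for x in xs:
            # Hyper LSTM sees the same input plus the main hidden state.
            hh, hc = self.hyper(torch.cat([x, h], dim=-1), (hh, hc))
            dx = self.scale_x(hh)  # timestep-specific row scalings
            dh = self.scale_h(hh)
            # Effective weights change every step: non-shared over time.
            pre = dx * (x @ self.W_x.t()) + dh * (h @ self.W_h.t()) + self.b
            i, f, g, o = pre.chunk(4, dim=-1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            outs.append(h)
        return torch.stack(outs)

xs = torch.randn(10, 2, 8)           # (time, batch, features)
ys = HyperLSTMSketch(8, 32, 16)(xs)  # -> (10, 2, 32)
```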
Our results also show that hypernetworks applied to convolutional networks
still achieve respectable results on image recognition tasks compared to
state-of-the-art baseline models, while requiring fewer learnable parameters.