
Keras ResNet: Building, Training & Scaling Residual Nets on Keras

ResNet took the deep learning world by storm in 2015, as the first neural network that could train hundreds or thousands of layers without succumbing to the “vanishing gradient” problem. Keras makes it easy to build ResNet models: you can run built-in ResNet variants pre-trained on ImageNet with just one line of code, or build your own custom ResNet implementation. You can speed up the process with MissingLink’s deep learning platform, which automates training, distributing, and monitoring ResNet projects in Keras.

What is a ResNet Neural Network?

Residual Network (ResNet) is a Convolutional Neural Network (CNN) architecture which was designed to enable hundreds or thousands of convolutional layers. While previous CNN architectures had a drop off in the effectiveness of additional layers, ResNet can add a large number of layers with strong performance.

ResNet was an innovative solution to the “vanishing gradient” problem. Neural networks train via backpropagation (see our guide on backpropagation), which relies on gradient descent: moving down the gradient of the loss function to find the weights that minimize it. If there are too many layers, repeated multiplication makes the gradient smaller and smaller as it propagates backward, until it effectively “disappears”, causing performance to saturate or even degrade with each additional layer.

The ResNet solution is the “identity shortcut connection”. A shortcut connection skips over one or more stacked layers and adds the block’s input directly to its output, so the stacked layers only need to learn the “residual” function F(x), and the block outputs F(x) + x. If the optimal mapping for a block is close to the identity, its layers can simply drive F(x) toward zero, so extra layers do no harm; the shortcuts also give gradients a direct path backward through the network, mitigating the vanishing gradient problem.
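The shortcut idea can be sketched in a few lines with the Keras functional API (a minimal sketch, assuming TensorFlow’s bundled Keras; the input and layer sizes are illustrative):

```python
# Minimal residual block sketch: two conv layers learn the residual
# F(x), and the identity shortcut adds the block's input back in.
from tensorflow.keras import layers, Input, Model

inputs = Input(shape=(32, 32, 64))  # illustrative feature-map shape

# Stacked layers learn the residual function F(x)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = layers.Conv2D(64, 3, padding="same")(x)

# Identity shortcut: output is F(x) + x
x = layers.Add()([x, inputs])
outputs = layers.Activation("relu")(x)

model = Model(inputs, outputs)
```

Because the shortcut is a plain addition, the input and output of the block must have the same shape; blocks that change the spatial size or channel count use a projection (e.g. a 1×1 convolution) on the shortcut instead.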

The creators of ResNet demonstrated they can train a ResNet with hundreds or thousands of layers that outperforms shallower networks, and ResNet has become one of the most popular architectures for computer vision tasks.

ResNet Variations You Can Use on Keras

ResNet has inspired several similar architectures, two of which come built into Keras:

ResNetV2

The primary difference between ResNetV2 and the original (V1) is that V2 applies batch normalization and ReLU activation before each weight layer (“pre-activation”), rather than after it.
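The pre-activation ordering can be sketched as follows (assuming tensorflow.keras; layer sizes are illustrative):

```python
# V2-style pre-activation ordering: BatchNorm and ReLU come
# before the convolution, rather than after it as in V1.
from tensorflow.keras import layers, Input, Model

inputs = Input(shape=(32, 32, 64))
x = layers.BatchNormalization()(inputs)
x = layers.Activation("relu")(x)
x = layers.Conv2D(64, 3, padding="same")(x)

model = Model(inputs, x)
```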

ResNeXt

ResNeXt uses a different building block, which has several parallel paths of stacked convolutional layers, with their outputs merged via addition. ResNeXt introduces a new hyperparameter called “cardinality”, which defines how many paths exist in each block.
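A ResNeXt-style block with a cardinality of 4 can be sketched like this (assuming tensorflow.keras; the cardinality and filter sizes are illustrative):

```python
# ResNeXt-style block sketch: 'cardinality' parallel paths with the
# same topology, aggregated by addition, plus the identity shortcut.
from tensorflow.keras import layers, Input, Model

cardinality = 4  # illustrative value
inputs = Input(shape=(32, 32, 64))

paths = []
for _ in range(cardinality):
    p = layers.Conv2D(8, 1, activation="relu")(inputs)          # reduce
    p = layers.Conv2D(8, 3, padding="same", activation="relu")(p)
    p = layers.Conv2D(64, 1)(p)                                 # restore
    paths.append(p)

merged = layers.Add()(paths)              # aggregate the parallel paths
outputs = layers.Add()([merged, inputs])  # identity shortcut

model = Model(inputs, outputs)
```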

Why it’s Difficult to Run ResNet Yourself and How MissingLink Can Help

ResNet can have from dozens to thousands of convolutional layers and can take a long time to train and execute – from hours to several weeks in extreme cases. You will need to distribute a ResNet model across multiple GPUs, and if performance is insufficient, scale out to multiple machines.

However, you’ll find that running a deep learning model on multiple machines is difficult:

On-premises, you need to set up multiple machines for deep learning, manually run experiments, and carefully watch resource utilization.

In the cloud, you can spin up machines quickly, but need to build and test machine images, and manually run experiments on each machine. You’ll need to “babysit” your machines to ensure an experiment is always running, and avoid wasting money with expensive GPU machines.

MissingLink solves all that. It’s a deep learning platform that lets you scale out ResNet and other computer vision models automatically across numerous machines.

Just set up jobs in the MissingLink dashboard, define your cluster of on-premise or cloud machines, and the jobs will automatically run on your cluster of machines. You can train a ResNet model in minutes – not hours or days.

To avoid idle time, MissingLink immediately runs another experiment when the previous one ends, and cleanly shuts down cloud machines when all jobs complete.

Options for Running ResNet on Keras

Built-In Keras ResNet Implementation: Applications Packages

Keras provides the Applications modules, which include multiple deep learning models, pre-trained on the industry standard ImageNet dataset and ready to use.

ImageNet training is extremely valuable because training ResNet on the huge ImageNet dataset is a formidable task, which Keras has done for you and packaged into its application modules. You can thus leverage transfer learning to apply this trained model to your own problems.

Keras Applications includes the following ResNet implementations: ResNet V1 and ResNet V2, each with 50, 101, or 152 layers, and ResNeXt with 50 or 101 layers.

Each of these is a function that takes the following arguments, allowing you to configure your ResNet model:

include_top (Boolean) – Whether to include a fully-connected classifier layer at the top of the network. If True, the input shape must be (224, 224, 3).

weights (String: None or ‘imagenet’) – None initializes the model with random weights; ‘imagenet’ loads weights pre-trained on the ImageNet dataset.

input_tensor (Tensor) – Optional – a tensor to use as image input for the model.

input_shape (Tuple) – Optional – a shape tuple, which you need to specify if include_top is False. Must have exactly 3 input channels, and width and height no smaller than 32.

pooling (String: None, ‘avg’, or ‘max’) – Optional – specifies pooling mode for feature extraction if include_top is False. None means the network outputs the 4D tensor of the last convolutional layer. ‘avg’ applies global average pooling to the last layer, producing a 2D tensor. ‘max’ applies global max pooling.

classes (Integer) – Optional – number of classes to classify images into, only to be specified if include_top is True and no weights argument is specified.
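Putting these arguments together, loading a built-in ResNet for transfer learning might look like this (a sketch, assuming tensorflow.keras; weights=None is used here to avoid downloading the pre-trained weights, while weights='imagenet' would load them):

```python
# Load ResNet50 as a feature extractor: no classifier top,
# global average pooling collapses features to a 2D tensor.
from tensorflow.keras.applications import ResNet50

# Pass weights="imagenet" instead of None to load pre-trained weights.
base = ResNet50(weights=None, include_top=False,
                input_shape=(160, 160, 3), pooling="avg")
```

With pooling="avg" and no top, the model outputs one 2048-dimensional feature vector per image, ready to feed into your own classifier head.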

Coding a ResNet Architecture Yourself in Keras

What if you want to create a ResNet architecture different from the ones built into Keras? For example, you might want to use more layers or a different variant of ResNet. Priya Dwivedi created an extensive tutorial that shows, step by step, how to implement all the building blocks of ResNet in Keras, so you can build your own architectures from scratch.

For example, Dwivedi’s tutorial includes Keras code that builds the identity block, the block whose shortcut carries the input through unchanged.
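A standard Keras identity block follows this pattern (a sketch of the common three-layer “bottleneck” form, assuming tensorflow.keras; the tutorial’s exact code may differ, and the filter sizes in the usage example are illustrative):

```python
# Identity block sketch: the shortcut carries x through unchanged,
# so the block's input and output shapes must match.
from tensorflow.keras import layers

def identity_block(x, filters, kernel_size=3):
    """Bottleneck identity block: 1x1 reduce, 3x3, 1x1 restore."""
    f1, f2, f3 = filters
    shortcut = x

    x = layers.Conv2D(f1, 1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.Conv2D(f2, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.Conv2D(f3, 1)(x)
    x = layers.BatchNormalization()(x)

    # Identity shortcut: add the input back before the final activation
    x = layers.Add()([x, shortcut])
    return layers.Activation("relu")(x)
```

For example, calling identity_block(x, (64, 64, 256)) on a 256-channel feature map returns a tensor of the same shape, so blocks can be stacked freely.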

See the full tutorial to see how to create all ResNet components yourself in Keras.

Scaling ResNet on Keras

In this article, we learned the basics of ResNet and saw two ways to run ResNet on Keras: using a pre-trained model from the Keras Applications modules, or building ResNet components yourself by directly creating their layers in Keras.

As we mentioned above, training ResNet, especially with larger numbers of layers, is extremely computationally intensive. Don’t wait hours or days for ResNet to train! Use the MissingLink deep learning platform to:

Scale out ResNet automatically across numerous machines, either on-premise or in the cloud

Define a cluster of machines and automatically run deep learning jobs, with optimal resource utilization

Avoid idle time by immediately running experiments one after the other, and shutting down cloud machines cleanly when jobs complete.

MissingLink can also help you manage large numbers of experiments, track and share results, and manage large datasets and sync them easily to training machines.