Keras Conv2D: Working with CNN 2D Convolutions in Keras

2D convolutional layers take a three-dimensional input, typically an image with three color channels. They pass a filter, also called a convolution kernel, over the image, inspecting a small window of pixels at a time, for example 3×3 or 5×5 pixels in size, and moving the window until they have scanned the entire image. The convolution operation calculates the dot product of the pixel values in the current filter window with the weights defined in the filter.

In Keras, you create 2D convolutional layers using the keras.layers.Conv2D class. Unlike in the lower-level TensorFlow Conv2D process, you don’t have to define weight variables or separately construct the activations; Keras handles this for you, and pooling is available as its own ready-made layer.

What is a 2D Convolution Layer, the Convolution Kernel and its Role in CNN Image Classification

Briefly, some background. A convolution layer “scans” a source image with a filter of, for example, 5×5 pixels, to extract features which may be important for classification. This filter is also called the convolution kernel. The kernel contains weights, which are tuned during training so that the model makes the most accurate predictions.

In a 5×5 kernel, for each 5×5 pixel region, the model computes the dot products between the image pixel values and the weights defined in the filter.
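To make the dot product concrete, here is a minimal pure-Python sketch of a single-channel convolution with no padding and a step of one pixel. The 4×4 image, the 3×3 kernel and the function name are made up for illustration; Keras performs the same arithmetic far more efficiently and learns the kernel weights during training.

```python
def conv2d_single_channel(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the dot products."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for row in range(out_h):
        output_row = []
        for col in range(out_w):
            # Dot product of the current window with the kernel weights.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[row + i][col + j] * kernel[i][j]
            output_row.append(total)
        output.append(output_row)
    return output


# Hypothetical 4x4 grayscale image and 3x3 kernel, chosen only for illustration.
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]
feature_map = conv2d_single_channel(image, kernel)
print(feature_map)  # [[-2, -2], [2, -2]]
```

A 3×3 kernel fits into a 4×4 image at only four positions, which is why the resulting feature map is 2×2.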

A 2D convolution layer means that the input of the convolution operation is three-dimensional, for example, a color image which has a value for each pixel across three layers: red, green and blue. However, it is called a “2D convolution” because the movement of the filter across the image happens in two dimensions. The filter has a separate set of weights for each of the three layers; at every position, the dot products for the three layers are summed into a single output value.
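As a rough sketch of what happens at one filter position on a color image, the snippet below computes a single output value from a three-channel window. It assumes the usual convention that the per-channel products are added together into one number; the 2×2 window, the weights and the function name are hypothetical.

```python
def window_response(window, kernel):
    """Return one output value: per-channel dot products summed over all channels."""
    total = 0.0
    for channel_pixels, channel_weights in zip(window, kernel):
        for pixel_row, weight_row in zip(channel_pixels, channel_weights):
            for pixel, weight in zip(pixel_row, weight_row):
                total += pixel * weight
    return total


# Hypothetical 2x2 window, listed one channel at a time for readability.
window = [
    [[1, 2], [3, 4]],  # red
    [[0, 1], [1, 0]],  # green
    [[2, 2], [2, 2]],  # blue
]
# One filter holds a separate 2x2 set of weights for each channel.
kernel = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
    [[0.5, 0.5], [0.5, 0.5]],
]
value = window_response(window, kernel)
print(value)  # 11.0 (5 from red + 2 from green + 4 from blue)
```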

After the convolution ends, the features are downsampled, and then the same convolutional structure repeats again. At first, the convolution identifies features in the original image (for example in a cat, the body, legs, tail, head), then it identifies sub-features within smaller parts of the image (for example, within the head, the ears, whiskers, eyes). Eventually, this process is meant to identify the essential features that can help classify the image. Learn more in our guide to Convolutional Neural Networks (CNN).
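Downsampling is commonly done with max pooling. The following minimal sketch assumes non-overlapping 2×2 blocks and a made-up 4×4 feature map; it keeps only the strongest response in each block, halving the width and height.

```python
def max_pool_2x2(feature_map):
    """Downsample by taking the maximum of each non-overlapping 2x2 block."""
    pooled = []
    for row in range(0, len(feature_map) - 1, 2):
        pooled_row = []
        for col in range(0, len(feature_map[0]) - 1, 2):
            block = (
                feature_map[row][col], feature_map[row][col + 1],
                feature_map[row + 1][col], feature_map[row + 1][col + 1],
            )
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map produced by an earlier convolution.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```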

Building a Convolutional Neural Network in Keras: A Brief Primer

To help you understand the Conv2D operation, here is a quick primer on how to build Convolutional Neural Networks in Keras.

A CNN architecture has three main parts:

A convolutional layer that extracts features from a source image.

A pooling layer that downsamples each feature to reduce its dimensionality and focus on the most important elements.

A fully connected layer that flattens the features identified in the previous layers into a vector, and predicts probabilities that the image belongs to each one of several possible labels.
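The last of these three parts can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the weights are invented rather than learned, and softmax is assumed as the function that turns the layer's scores into probabilities.

```python
import math


def flatten(feature_maps):
    """Concatenate 2D feature maps into a single flat vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum of the whole vector per output unit."""
    return [
        sum(v * w for v, w in zip(vector, unit_weights)) + bias
        for unit_weights, bias in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [e / total for e in exps]


feature_maps = [[[4, 2], [2, 8]]]   # one pooled 2x2 feature map (hypothetical)
vector = flatten(feature_maps)      # [4, 2, 2, 8]
weights = [
    [0.1, 0.0, 0.0, 0.1],           # invented weights for label 0
    [0.0, 0.1, 0.1, 0.0],           # invented weights for label 1
]
biases = [0.0, 0.0]
probabilities = softmax(dense(vector, weights, biases))
print(probabilities)  # two probabilities, the first one larger
```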

In Keras, you build a CNN architecture using the following process:

1. Reshape the input data into a format suitable for the convolutional layers, using X_train.reshape() and X_test.reshape()

2. For class-based classification, one-hot encode the categories using the to_categorical() function.

3. Build the model using the Sequential.add() function. For a 2D convolutional layer, use Sequential.add(Conv2D()) (not showing all parameters).

4. Add a pooling layer, for example using the Sequential.add(MaxPooling2D()) function – not showing all parameters.

5. Add a “flatten” layer which prepares a vector for the fully connected layers, using Sequential.add(Flatten()).

6. Add one or more fully connected layers using Sequential.add(Dense()). Typically you will follow each fully connected layer with a dropout layer (learn more about dropout in our guide to neural network hyperparameters), using Sequential.add(Dropout()).

7. Compile the model using model.compile()

8. Train the model using model.fit(), supplying X_train and X_test, which are the source images, and y_train and y_test, which are the known classification results.

9. Use model.predict() to generate a prediction.
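The nine steps above can be sketched end to end as follows. This is a minimal illustration, not a tuned model: it assumes TensorFlow's bundled Keras, and the random 8×8 "images", the three classes and all layer sizes are placeholders chosen so the script runs in a few seconds.

```python
import numpy as np
from tensorflow import keras

num_classes = 3
rng = np.random.default_rng(0)

# Synthetic stand-in data: 40 training and 10 test grayscale images of 8x8 pixels.
X_train = rng.random((40, 8, 8)).astype("float32")
X_test = rng.random((10, 8, 8)).astype("float32")
y_train = rng.integers(0, num_classes, size=40)
y_test = rng.integers(0, num_classes, size=10)

# 1. Reshape to (samples, height, width, channels), i.e. channels_last ordering.
X_train = X_train.reshape(X_train.shape[0], 8, 8, 1)
X_test = X_test.reshape(X_test.shape[0], 8, 8, 1)

# 2. One-hot encode the integer class labels.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# 3-6. Build the model: convolution, pooling, flatten, dense with dropout.
model = keras.Sequential()
model.add(keras.Input(shape=(8, 8, 1)))
model.add(keras.layers.Conv2D(8, kernel_size=(3, 3), activation="relu"))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(16, activation="relu"))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(num_classes, activation="softmax"))

# 7. Compile the model.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# 8. Train on the training set; the test set is used here as validation data.
model.fit(X_train, y_train, batch_size=8, epochs=1,
          validation_data=(X_test, y_test), verbose=0)

# 9. Generate predictions: one row of class probabilities per test image.
predictions = model.predict(X_test, verbose=0)
print(predictions.shape)  # (10, 3)
```

With random inputs the predicted probabilities are meaningless; the point is only the order of the calls and the shapes flowing through them.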

Keras CNN example and Keras Conv2D

Here is a simple code example to show you the context of Conv2D in a complete Keras model. The example was created by Andy Thomas. This model has two 2D convolutional layers, highlighted in the code.
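The sketch below is not the original listing. It is a minimal model of the same general shape, with two Conv2D layers, assuming TensorFlow's bundled Keras. The input shape, the number of classes and the layer sizes are assumptions chosen for illustration. The first Conv2D call spells out the main arguments; the second relies on the defaults.

```python
from tensorflow import keras

num_classes = 10            # assumed number of labels
input_shape = (28, 28, 1)   # assumed 28x28 grayscale images, channels_last

model = keras.Sequential()
model.add(keras.Input(shape=input_shape))

# First 2D convolutional layer, with the main Conv2D arguments written out.
model.add(keras.layers.Conv2D(
    filters=32,
    kernel_size=(5, 5),
    strides=(1, 1),
    padding="valid",
    dilation_rate=(1, 1),
    activation="relu",
))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))

# Second 2D convolutional layer, leaving the other arguments at their defaults.
model.add(keras.layers.Conv2D(64, (5, 5), activation="relu"))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))

# Classifier head: flatten the feature maps, then two fully connected layers.
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(128, activation="relu"))
model.add(keras.layers.Dense(num_classes, activation="softmax"))

model.summary()
```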

Below we explain each of these parameters, what it does, and some best practices for setting and tuning it. To get more background about tuning neural networks, see our guide on neural network hyperparameters.

Keras Conv2D Parameter

What it Does

Best Practices and Tuning

filters

Sets the number of filters used in the convolution operation.

Earlier 2D convolutional layers, closer to the input, learn fewer filters, while later convolutional layers, closer to the output, learn more filters. The number of filters you select should depend on the complexity of your dataset and the depth of your neural network. A common setting to start with is [32, 64, 128] for three layers, and if there are more layers, increasing to [256, 512, 1024], etc.

kernel_size

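Specifies the size of the convolutional filter in pixels, as a single integer or a 2-tuple of integers. Odd sizes such as 3 or 5 are the convention, although Keras does not require them.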

Filter size may be determined by the CNN architecture you are using – for example VGGNet exclusively uses (3, 3) filters. If not, use a 5×5 or 7×7 filter to learn larger features and then quickly reduce to 3×3. If your images are smaller than 128×128, consider working with smaller filters of 1×1 and 3×3.

strides=(1, 1)

The strides parameter is a 2-tuple of integers, specifying how far the convolutional filter should “step” along the x- and y-axes of the source image.

In most cases, it’s okay to leave the strides parameter with the default (1, 1). However, you may increase it to (2, 2) to reduce the size of the output volume.

padding='valid'

The padding parameter has two values: valid or same. Valid means the input is not zero-padded, so the output of the convolution will be smaller than the dimensions of the original image. Same means the input will be zero-padded, so the convolution output can be the same size as the input.

The default Keras value is valid, but it is often effective to set it to same for most of the layers, then reduce spatial dimensions using max pooling or strided convolutions.

data_format=None

Specifies the order of data in the input received from the backend deep learning framework: channels_last or channels_first.

The TensorFlow backend to Keras uses channels last ordering. Do not change this parameter unless you are using Theano as your backend.

dilation_rate=(1, 1)

A 2-tuple of integers, controlling the dilation rate for dilated convolution. Dilated convolution is a convolution applied to the input volume with defined gaps (the filter does not scan the entire image, skipping certain segments).

Dilated convolutions are useful when you work with higher-resolution images but still want to capture fine-grained details, or when you want to construct a network with fewer parameters.

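activation=None

The activation parameter specifies the name of the activation function you want to apply after performing the convolution.

With the default, None, no activation is applied and the layer’s output is linear; 'relu' is a common choice for convolutional layers. The related bias_initializer parameter should typically be left as the default, zeros, meaning the bias will be initially filled with zeros.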

kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None

These parameters control the type and amount of regularization. Regularization is a method which helps avoid overfitting and improve the ability of your model to generalize from training examples to a real population.

For large datasets and deep networks, kernel regularization is a must. You can use either L1 or L2 regularization. If you detect signs of overfitting, consider using L2 regularization. Tune the amount of regularization, starting with values of 0.0001-0.001. For bias and activity, we recommend leaving at the default values for most scenarios.

kernel_constraint=None, bias_constraint=None

Impose constraints on the Conv2D layer, such as unit normalization, non-negativity or min-max normalization.

These are advanced settings which should be left at defaults unless you have a special reason to use them in your model.
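To see how kernel_size, strides, padding, dilation_rate and filters interact, the following pure-Python sketch computes the output size along one axis and the number of trainable weights. The helper names are hypothetical and are not part of Keras. The formulas follow the TensorFlow convention for valid and same padding and assume square kernels.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="valid", dilation=1):
    """Output length along one spatial axis of a convolution."""
    # A dilated kernel covers a wider span of the input than its nominal size.
    effective_kernel = dilation * (kernel_size - 1) + 1
    if padding == "same":
        return math.ceil(input_size / stride)
    if padding == "valid":
        return (input_size - effective_kernel) // stride + 1
    raise ValueError("padding must be 'valid' or 'same'")


def conv_param_count(filters, kernel_size, in_channels, use_bias=True):
    """Number of trainable weights in a 2D convolution with a square kernel."""
    weights = kernel_size * kernel_size * in_channels * filters
    return weights + (filters if use_bias else 0)


print(conv_output_size(28, 3))                   # 26: valid padding shrinks the output
print(conv_output_size(28, 3, padding="same"))   # 28: same padding keeps the size
print(conv_output_size(28, 3, stride=2))         # 13: a stride of 2 roughly halves it
print(conv_output_size(28, 3, dilation=2))       # 24: the kernel spans 5 pixels
print(conv_param_count(32, 3, in_channels=3))    # 896: 3*3*3*32 weights + 32 biases
```

Note that the dilated 3×3 kernel spans as many pixels as a 5×5 kernel while keeping the weight count of a 3×3 kernel, which is the sense in which dilation yields fewer parameters.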

Running CNN at Scale on Keras with MissingLink

In this article, we explained how to create 2D convolutional layers in Keras. When you start working on Convolutional Neural Networks and running large numbers of experiments, you’ll run into some practical challenges:

Tracking Experiments

Tracking experiment progress and hyperparameters can be challenging when you run a large number of experiments. You will have to scale up your experiments to tune your CNN and try all relevant variations of network architecture and hyperparameters.

Running Experiments on Multiple Machines

CNNs can take a long time to run, especially with large datasets. You will want to run your CNNs on more machines and GPUs, either on-premise or in the cloud. It can be very time consuming to provision these machines, distribute experiments between them and monitor progress.

Managing Training Data

In computer vision projects with images, video or other rich media, training sets can be very large. Copying the data to each training machine, replacing it for each new experiment and managing changes to datasets can be difficult. To scale up you must do this in an automated way.

MissingLink is a deep learning platform that does all of this for you, and lets you concentrate on building the most accurate model. Learn more to see how easy it is.