Convolutional Neural Networks

Fully connected layers are an essential component of Convolutional Neural Networks (CNNs), which have proven very successful at recognizing and classifying images in computer vision. A CNN begins with convolution and pooling, which break the image down into features and analyze them independently. The result of this process feeds into a fully connected neural network structure that drives the final classification decision.

What Is a Convolutional Neural Network?

A Convolutional Neural Network (CNN) is a type of neural network that specializes in image recognition and computer vision tasks.

CNNs have two main parts:

A convolution/pooling mechanism that breaks up the image into features and analyzes them

A fully connected layer that takes the output of convolution/pooling and predicts the best label to describe the image

Convolutional Neural Networks vs Fully Connected Neural Networks

Fully connected layers in a CNN are not to be confused with fully connected neural networks – the classic neural network architecture, in which every neuron in one layer connects to every neuron in the next layer. Convolutional neural networks are what make deep learning practical for computer vision.

The classic neural network architecture was found to be inefficient for computer vision tasks. Images represent a large input for a neural network: even a small image has thousands of pixels, a typical photo has millions, and each pixel can have up to 3 color channels. In a classic fully connected network, every one of these input values connects to every neuron in the first layer, which requires a huge number of connections and network parameters.
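To make the scale concrete, here is a rough back-of-the-envelope sketch in Python. The image size (224x224 with 3 channels), the 1,000 hidden neurons, and the 64 filters of size 3x3 are illustrative assumptions, not figures from this article.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases for one fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus biases for one convolutional layer.

    The same small filter is reused at every image position,
    so the count does not depend on the image size.
    """
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


# Assumed example sizes (hypothetical, chosen only for illustration).
height, width, channels = 224, 224, 3
n_inputs = height * width * channels        # one input value per pixel per channel

fc = dense_params(n_inputs, 1000)           # every input connects to every neuron
conv = conv_params(3, 3, channels, 64)      # 64 filters of size 3x3

print(f"fully connected: {fc:,} parameters")
print(f"convolutional:   {conv:,} parameters")
```

Under these assumptions, the single fully connected layer needs roughly 150 million parameters, while the convolutional layer needs fewer than two thousand.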

A convolutional neural network leverages the fact that an image is composed of smaller details, or features, and creates a mechanism for analyzing each feature in isolation, which informs a decision about the image as a whole.

As part of the convolutional network, there is also a fully connected layer that takes the end result of the convolution/pooling process and reaches a classification decision.

CNN Architecture: Types of Layers

Convolutional Neural Networks have several types of layers:

Convolutional layer: a “filter” passes over the image, scanning a few pixels at a time and creating a feature map that records where, and how strongly, the pattern the filter looks for appears in the image.

Pooling layer (downsampling): reduces the amount of information in each feature map obtained in the convolutional layer while keeping the most important information (there are usually several rounds of convolution and pooling).

Fully connected input layer (flatten): takes the output of the previous layers, “flattens” it and turns it into a single vector that can be an input for the next stage.

The first fully connected layer: takes the inputs from the feature analysis and applies weights to predict the correct label.

Fully connected output layer: gives the final probabilities for each label.
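The first three layer types can be sketched in a few lines of plain Python. This is a minimal illustration, not how a real framework implements them: the 5x5 toy image and the hand-picked edge filter are assumptions made for the example, and in a real CNN the filter values are learned during training.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding).

    Each output value is the sum of element-wise products between the
    kernel and the image patch under it.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def flatten(feature_map):
    """Turn a 2D feature map into a single vector, row by row."""
    return [value for row in feature_map for value in row]


# Toy image (assumed): dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical hand-picked filter that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)   # 4x4 map, high where the edge is
pooled = max_pool(feature_map)        # 2x2 map after downsampling
vector = flatten(pooled)              # input for the fully connected layers

print(feature_map)
print(pooled)
print(vector)
```

The feature map is only non-zero in the column where the edge sits, and pooling keeps that signal while shrinking the map from 16 values to 4.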

Below is an example showing the layers needed to process an image of a written digit, with the number of pixels processed in every stage. This is a very simple image; larger and more complex images would require more convolutional/pooling layers.
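The size at each stage follows from simple arithmetic. As a rough sketch, assume a 28x28 grayscale digit, two rounds of 5x5 convolution (stride 1, no padding) and 2x2 pooling, and 16 feature maps at the end. These sizes are illustrative assumptions, not values from the example in this article.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output width (or height) of a convolution or pooling step."""
    return (n + 2 * padding - kernel) // stride + 1


# Assumed pipeline: (stage name, kernel size, stride).
pipeline = [
    ("conv 5x5", 5, 1),
    ("pool 2x2", 2, 2),
    ("conv 5x5", 5, 1),
    ("pool 2x2", 2, 2),
]

size = 28                       # assumed input: 28x28 pixels
stages = [("input", size)]
for name, kernel, stride in pipeline:
    size = out_size(size, kernel, stride)
    stages.append((name, size))

n_maps = 16                     # assumed number of feature maps after the last round
flat_length = size * size * n_maps

for name, s in stages:
    print(f"{name:9s} -> {s}x{s}")
print(f"flatten   -> vector of {flat_length} values")
```

With these assumptions, the image shrinks from 28x28 to 4x4 per feature map before it is flattened and handed to the fully connected layers.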

The role of a fully connected layer in a CNN architecture

The objective of a fully connected layer is to take the results of the convolution/pooling process and use them to classify the image into a label (in a simple classification example).

The output of convolution/pooling is flattened into a single vector of values, each indicating how strongly a certain feature was detected in the image. For example, if the image is of a cat, features representing things like whiskers or fur should produce high values, which the fully connected layers learn to associate with the label “cat”.

The image below illustrates how the input values flow into the first layer of neurons. They are multiplied by weights, summed, and passed through an activation function (typically ReLU), just like in a classic artificial neural network. They then pass forward to the output layer, in which every neuron represents a classification label.
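The flow described above can be sketched as follows. The feature vector, weights, and biases are made-up numbers chosen for readability, and `dense` is a hypothetical helper name; in practice the weights are learned and a library would do this with matrix operations.

```python
def relu(x):
    """ReLU activation: negative values become zero."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer.

    weights[k] holds the incoming weights of neuron k, so every neuron
    sees every input value.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


# Assumed flattened output of convolution/pooling.
features = [2.0, 0.0, 2.0, 0.0]

# Hypothetical weights: 2 hidden neurons, then 3 output neurons (one per label).
hidden_w = [[0.5, -1.0, 0.5, 0.0],
            [-1.0, 0.5, -1.0, 0.5]]
hidden_b = [0.0, 1.0]
output_w = [[1.0, -1.0],
            [-1.0, 1.0],
            [0.5, 0.5]]
output_b = [0.0, 0.0, 0.0]

hidden = dense(features, hidden_w, hidden_b, relu)   # first fully connected layer
scores = dense(hidden, output_w, output_b)           # one raw score per label

print(hidden)
print(scores)
```

The second hidden neuron computes a negative sum, so ReLU switches it off, and only the first neuron contributes to the output scores.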

The fully connected part of the CNN is trained with backpropagation, together with the rest of the network, to determine the most accurate weights. Each neuron receives weights that prioritize the most appropriate label. Finally, the neurons “vote” on each of the labels, and the winner of that vote is the classification decision.
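One common way to implement the “vote” is a softmax over the output scores followed by picking the highest probability. The sketch below assumes three labels and made-up scores; it also shows the error signal that backpropagation would start from when the loss is cross-entropy, which is a widely used choice and an assumption here, not something this article specifies.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["cat", "dog", "bird"]               # assumed label set
scores = [2.0, -2.0, 1.0]                     # assumed output-layer scores

probs = softmax(scores)
best = max(range(len(labels)), key=lambda k: probs[k])
print("prediction:", labels[best])

# Training signal for softmax + cross-entropy: predicted minus true probability.
target = labels.index("cat")
grad = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
print("gradient w.r.t. scores:", [round(g, 3) for g in grad])
```

The gradient is negative for the correct label and positive for the others, so a training step pushes the correct score up and the wrong ones down.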

Convolutional Neural Networks in the Real World

In this article, we explained the basics of Convolutional Neural Networks and the role of fully connected layers within a CNN. When you start working on CNN projects, processing and generating predictions for real images, you’ll run into some practical challenges:

Tracking experiments

You need to track experiment progress, hyperparameters and source code across CNN experiments. Convolutional networks have numerous hyperparameters and require constant tweaking. Testing each of these requires running an experiment and tracking its results, and it’s easy to lose track of thousands of experiments across multiple teams.

Running experiments across multiple machines and GPUs

CNNs are computationally intensive, and running multiple experiments on different data sets can take hours or days for each iteration. You’ll need to run experiments on multiple machines or GPUs, and you’ll find it is difficult to provision these machines, configure

Managing training datasets

Convolutional networks typically use media-rich datasets like images and video, which can take up gigabytes or more. In each experiment, or each time you tweak the dataset (changing image size, rotating images, etc.), you’ll need to re-copy the full dataset to the training machines. This is very time-consuming and error-prone.