Playing with convolutions in TensorFlow

From a short introduction of convolutions to a complete model.

In this post we will try to develop a practical intuition about convolutions and visualize the different steps used in convolutional neural network architectures. The code used for this tutorial can be found here.

This tutorial does not cover backpropagation, sparse connectivity, shared weights, or other theoretical aspects that are already covered in other courses and tutorials. Instead, it focuses on giving a practical intuition for how to use TensorFlow to build a convolutional model.

So what are convolutions?

Convolution is a mathematical operation between two functions that produces a third function, a modified version of the first. In the case of image processing, it is the process of computing, for each pixel, a weighted sum of that pixel and its local neighbors, with the weights given by the kernel (or filter). For example, given a matrix and kernel as follows:

and

The discrete convolution operation is defined as

$$(I * K)[i, j] = \sum_{m} \sum_{n} I[i + m,\, j + n]\, K[m, n]$$

(this is the cross-correlation form, which slides the kernel over the image without flipping it; it is what TensorFlow's `conv2d` actually computes).

To see how the kernel slides over the input to calculate the output matrix, it helps to look at a visualization:
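To make the sliding concrete, here is a small NumPy sketch (not part of the original post) that computes a "VALID" convolution of a 4×4 matrix with a 3×3 kernel by hand, one window at a time:

```python
import numpy as np

image = np.array([
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
], dtype=float)

kernel = np.array([
    [2, 0, 1],
    [0, 1, 2],
    [1, 0, 2],
], dtype=float)

def conv2d_valid(img, k):
    """Slide the kernel over the image ('VALID' padding, stride 1).

    Like tf.nn.conv2d, this computes cross-correlation: the kernel is
    not flipped before the element-wise multiply-and-sum.
    """
    kh, kw = k.shape
    oh = img.shape[0] - kh + 1
    ow = img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # weighted sum of the current window
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

print(conv2d_valid(image, kernel))
# [[15. 16.]
#  [ 6. 15.]]
```

A 4×4 input and a 3×3 kernel leave room for only 2×2 positions, which is why "VALID" padding shrinks the output.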

In convolutional architectures it is also common to add a pooling layer after each convolution. These pooling layers simplify the information from the preceding convolution layer, either by keeping the most prominent value in each window (max pooling) or by averaging the values (average pooling).
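The two pooling variants can be sketched in a few lines of NumPy (an illustrative example, not from the original post), downsampling a 4×4 feature map with non-overlapping 2×2 windows:

```python
import numpy as np

feature_map = np.array([
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [4, 2, 3, 1],
    [8, 6, 7, 5],
], dtype=float)

def pool2d(x, size=2, mode='max'):
    """Pool non-overlapping size x size windows (stride == window size)."""
    h, w = x.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h, size):
        for j in range(0, w, size):
            window = x[i:i + size, j:j + size]
            # max pooling keeps the most prominent value, avg pooling averages
            out[i // size, j // size] = window.max() if mode == 'max' else window.mean()
    return out

print(pool2d(feature_map, mode='max'))  # [[7. 8.]
                                        #  [8. 7.]]
print(pool2d(feature_map, mode='avg'))  # [[4. 5.]
                                        #  [5. 4.]]
```

Either way, a 2×2 pooling window halves each spatial dimension, which is the "simplification" described above.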

Now that we have a way to visualize every step, let's create the TensorFlow operations.

```python
import numpy
import tensorflow as tf

def convolve(img, kernel, strides=[1, 3, 3, 1], pooling=[1, 3, 3, 1], padding='SAME', rgb=True):
    with tf.Graph().as_default():
        num_maps = 3
        if not rgb:
            num_maps = 1  # set number of maps to 1
            img = img.convert('L', (0.2989, 0.5870, 0.1140, 0))  # convert to gray scale

        # reshape image to have a leading 1 dimension
        img = numpy.asarray(img, dtype='float32') / 256.
        img_shape = img.shape
        img_reshaped = img.reshape(1, img_shape[0], img_shape[1], num_maps)

        x = tf.placeholder('float32', [1, None, None, num_maps])
        w = tf.get_variable('w', initializer=tf.to_float(kernel))

        # operations
        conv = tf.nn.conv2d(x, w, strides=strides, padding=padding)
        sig = tf.sigmoid(conv)
        max_pool = tf.nn.max_pool(sig, ksize=[1, 3, 3, 1], strides=[1, 3, 3, 1], padding=padding)
        avg_pool = tf.nn.avg_pool(sig, ksize=[1, 3, 3, 1], strides=[1, 3, 3, 1], padding=padding)

        init = tf.initialize_all_variables()
        with tf.Session() as session:
            session.run(init)
            conv_op, sigmoid_op, avg_pool_op, max_pool_op = session.run(
                [conv, sig, avg_pool, max_pool], feed_dict={x: img_reshaped})

        show_shapes(img, conv_op, sigmoid_op, avg_pool_op, max_pool_op)
        if rgb:
            show_image_ops_rgb(img, conv_op, sigmoid_op, avg_pool_op, max_pool_op)
        else:
            show_image_ops_gray(img, conv_op, sigmoid_op, avg_pool_op, max_pool_op)
```

The `convolve` function we just built lets us try any filter from the list of filters in GIMP's documentation. We will try some of them here, but you can modify the kernels, or try other kernels from the list, to see how the image changes.
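One detail worth noting before trying a filter: `tf.nn.conv2d` expects its kernel shaped as `[height, width, in_channels, out_channels]`, so a flat 3×3 matrix from the GIMP docs has to be expanded before it can be passed to `convolve`. A sketch of that expansion for an RGB image, using a standard edge-detection (Laplacian) kernel as the example:

```python
import numpy as np

# A standard 3x3 edge-detection (Laplacian) kernel
edge = np.array([
    [0,  1, 0],
    [1, -4, 1],
    [0,  1, 0],
], dtype='float32')

# tf.nn.conv2d wants [height, width, in_channels, out_channels].
# To apply the same 2D kernel to each RGB channel independently,
# place it on the "diagonal" of the channel dimensions and leave
# the cross-channel weights at zero.
kernel = np.zeros((3, 3, 3, 3), dtype='float32')
for c in range(3):
    kernel[:, :, c, c] = edge

# convolve(img, kernel)  # hypothetical call into the function above
```

With this layout, channel `c` of the output depends only on channel `c` of the input, which matches how GIMP applies a convolution matrix to a color image.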

After trying the filters in the example and seeing how they change the original image, we have not only started to develop an intuition for how these operations work, but also prepared the practical tools to build a convolutional neural network.

CNNs are a family of neural network architectures built essentially on multiple layers of convolutions with nonlinear activation functions (e.g. sigmoid, ReLU, or tanh) applied to the results, followed by further convolution or pooling layers, and finally fully connected layers. In this section we will be using the high-level machine learning APIs tf.contrib.learn and tf.contrib.layers to create, train, and configure our models.

LeNet model

The LeNet model contains the essence of CNNs as they are still used in larger and newer models. LeNet consists of 2 convolutional layers followed by a dense layer.

A convolution layer in the LeNet model consists of a convolution operation followed by a max pooling operation: