Homework 10: Extra Credit - Part 1 [45 points]

Instructions

This first part of the Extra Credit assignment involves implementing functions commonly used in neural networks from scratch, without the use of external libraries or packages other than NumPy.

A skeleton file homework10_part1.py containing empty definitions for each question has been provided. Since portions of this assignment will be graded automatically, none of the names or function signatures in this file should be modified. However, you are free to introduce additional variables or functions if needed.

You will find that in addition to a problem specification, most programming questions also include a pair of examples from the Python interpreter. These are meant to illustrate typical use cases, and should not be taken as comprehensive test suites.

You are strongly encouraged to follow the Python style guidelines set forth in PEP 8, which was written in part by the creator of Python. However, your code will not be graded for style.

Once you have completed the assignment, you should submit your file on Gradescope.

You may submit as many times as you would like before the deadline, but only the last submission will be saved.

1. Individual Functions [45 points]

The goal of this part of the assignment is to build intuition for the underlying implementation used in Convolutional Neural Networks, specifically performing convolution and pooling, and applying an activation function.

As mentioned in the instructions, you are restricted from using any external packages other than NumPy. NumPy has a Quickstart tutorial, which we recommend reading if you are unfamiliar with the library or would like to refresh your memory.

[15 points] Write a function convolve_greyscale(image, kernel) that accepts a numpy array image of shape (image_height, image_width) (greyscale image) of integers and a numpy array kernel of shape (kernel_height, kernel_width) of floats. The function performs a convolution, which consists of adding each element of the image to its local neighbors, weighted by the kernel (flipped both vertically and horizontally).

The result of this function is a new numpy array of floats that has the same shape as the input image. Apply zero-padding to the input image so that output values at the image edges can be computed. Note that neither the image nor the kernel is necessarily square: the height and width of each may differ. You can assume kernel_width and kernel_height are odd numbers.
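To make the padding and flipping steps concrete, here is a minimal sketch on a toy input. The arrays and variable names below are hypothetical and chosen only for illustration; the sketch computes a single output value and is not a reference implementation of convolve_greyscale.

```python
import numpy as np

# Hypothetical toy inputs, chosen only for illustration.
image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
kernel = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 2.0]])

kernel_height, kernel_width = kernel.shape

# With odd kernel dimensions, padding by half the kernel size (rounded down)
# on each side gives every image pixel a full neighbourhood.
pad_h, pad_w = kernel_height // 2, kernel_width // 2
padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)),
                mode="constant", constant_values=0)

# Flip the kernel both vertically and horizontally.
flipped = kernel[::-1, ::-1]

# The output value at (row, col) is the sum of the element-wise product of
# the flipped kernel and the padded window whose centre is that pixel.
row, col = 0, 0
window = padded[row:row + kernel_height, col:col + kernel_width]
value = float(np.sum(window * flipped))

print(padded.shape)  # (5, 5)
print(value)         # 4.0
```

The top-left output is 4.0 because, after flipping, the only non-zero kernel weight that lands on a real (non-padding) pixel is the weight 1.0, which lines up with the pixel directly below the top-left corner.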

A few interactive visualisations of convolution are available online, for instance a post by Victor Powell. To experiment further, you can also use real images as input. We recommend selecting a few images of type gray from the Miscellaneous Volume of the USC-SIPI Image Database. (Image in the third example below is taken from this dataset labelled under 5.1.09.)

[5 points] Write a function convolve_rgb(image, kernel) that accepts a numpy array image of shape (image_height, image_width, image_depth) of integers and a numpy array kernel of shape (kernel_height, kernel_width) of floats. The function performs a convolution on each depth of an image, which consists of adding each element of the image to its local neighbors, weighted by the kernel (flipped both vertically and horizontally).

The result of this function is a new numpy array of floats that has the same shape as the input image. You can use convolve_greyscale(image, kernel) implemented in the previous part to process each depth of an image in turn. As before, apply zero-padding to the input image so that output values at the image edges can be computed. Note that neither the image nor the kernel is necessarily square: the height and width of each may differ. You can assume kernel_width and kernel_height are odd numbers.
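One possible way to organise the per-depth processing is sketched below. The helper name apply_per_channel and its channel_fn parameter are hypothetical and not part of the skeleton file; any function that maps a 2-D array to a 2-D array of the same shape, such as your convolve_greyscale with a fixed kernel, could be passed in.

```python
import numpy as np


def apply_per_channel(image, channel_fn):
    """Apply a 2-D function to each depth slice of a 3-D image.

    Hypothetical helper for illustration only.
    """
    depth = image.shape[2]
    # Process each (height, width) slice independently.
    channels = [channel_fn(image[:, :, d]) for d in range(depth)]
    # Re-assemble the slices along the last axis to restore the input layout.
    return np.stack(channels, axis=2)


# Toy usage with a stand-in function that simply halves every value.
toy_image = np.arange(24).reshape(2, 3, 4)
result = apply_per_channel(toy_image, lambda channel: channel * 0.5)
print(result.shape)  # (2, 3, 4)
```

Stacking along axis 2 keeps the (image_height, image_width, image_depth) layout of the input, so the output shape matches the input shape as required.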

We recommend selecting a few images of type color from the Miscellaneous Volume of the USC-SIPI Image Database. (Images in the examples below are taken from this dataset labelled under 4.1.07.)

[15 points] Write a function max_pooling(image, kernel_size, stride) that accepts a numpy array image of integers of shape (image_height, image_width) (greyscale image), a tuple kernel_size corresponding to (kernel_height, kernel_width), and a tuple stride of (stride_height, stride_width) corresponding to the stride of the pooling window.

The goal of this function is to reduce the spatial size of the representation, in this case reducing the dimensionality of an image by max down-sampling. Zero-padding the input is not common for pooling layers in Convolutional Neural Networks, so we do not ask you to pad. Note that the function must also work when stride is not equal to kernel_size, which includes overlapping pooling windows.
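The relationship between kernel_size, stride, and the output size can be sketched as follows. This assumes the usual no-padding convention in which only windows that fit entirely inside the image are used; the assignment text does not spell out this convention, so treat it as an assumption. The helper name pooled_shape is hypothetical.

```python
import numpy as np


def pooled_shape(image_shape, kernel_size, stride):
    """Output shape when only fully contained windows are pooled.

    Hypothetical helper; assumes the image is at least as large as the kernel.
    """
    image_height, image_width = image_shape
    kernel_height, kernel_width = kernel_size
    stride_height, stride_width = stride
    out_height = (image_height - kernel_height) // stride_height + 1
    out_width = (image_width - kernel_width) // stride_width + 1
    return out_height, out_width


toy_image = np.arange(16).reshape(4, 4)

# Stride equal to the kernel size: windows do not overlap.
print(pooled_shape(toy_image.shape, (2, 2), (2, 2)))  # (2, 2)

# Stride smaller than the kernel size: windows overlap.
print(pooled_shape(toy_image.shape, (2, 2), (1, 1)))  # (3, 3)

# The window for output position (1, 2) with stride (1, 1) starts at
# row 1 * stride_height and column 2 * stride_width.
top, left = 1 * 1, 2 * 1
window = toy_image[top:top + 2, left:left + 2]
window_max = int(window.max())
print(window_max)  # 11
```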

As before, we recommend selecting a few images of type gray from the Miscellaneous Volume of the USC-SIPI Image Database. (The image in the three examples below is taken from this dataset labelled under 5.1.09.)

[5 points] Similarly to the previous part, write a function average_pooling(image, kernel_size, stride) that accepts a numpy array image of integers of shape (image_height, image_width) (greyscale image), a tuple kernel_size corresponding to (kernel_height, kernel_width), and a tuple stride of (stride_height, stride_width) corresponding to the stride of the pooling window.

The goal of this function is to reduce the spatial size of the representation, in this case reducing the dimensionality of an image by average down-sampling.
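For a single pooling window, the two kinds of down-sampling differ only in the reduction applied. The toy window below is hypothetical; note that the average of integer pixels is generally not an integer.

```python
import numpy as np

# Hypothetical 2x2 pooling window of integer pixel values.
window = np.array([[1, 2],
                   [3, 5]])

window_max = int(window.max())      # max down-sampling keeps the largest value
window_mean = float(window.mean())  # average down-sampling keeps the mean

print(window_max)   # 5
print(window_mean)  # 2.75
```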

As before, we recommend selecting a few images of type gray from the Miscellaneous Volume of the USC-SIPI Image Database. (Image in the third example is taken from this dataset labelled under 5.1.09.)