Gaussian prototypical networks in meta-learning [Tutorial]

A Gaussian prototypical network is a variant of the prototypical network. A prototypical network learns embeddings of the data points, builds a prototype for each class by taking the mean of that class’s embeddings, and uses the class prototypes to perform classification.

This article is an excerpt from a book written by Sudharsan Ravichandiran titled Hands-On Meta-Learning with Python. In this book, you will learn about the prototypical network along with its variants.

In a Gaussian prototypical network, along with generating embeddings for the data points, we add a confidence region around them, characterized by a Gaussian covariance matrix. Having a confidence region helps in characterizing the quality of individual data points and would be useful in the case of noisy and less homogeneous data.

So, in Gaussian prototypical networks, the encoder outputs the embeddings as well as a covariance matrix. Instead of using the full covariance matrix, we include either the radius or the diagonal component of the covariance matrix along with the embeddings:

Radius component: If we use the radius component of the covariance matrix, then the dimension of our covariance estimate is 1, as the radius is just a single number.

Diagonal component: If we use the diagonal component of the covariance matrix, then the dimension of our covariance estimate is the same as the embedding dimension.

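Also, instead of using the covariance matrix directly, we use the inverse of the covariance matrix. We can convert the raw covariance matrix into the inverse covariance matrix using any of the following methods. Let $S_{raw}$ be the raw covariance matrix produced by the encoder and $S$ be the inverse covariance matrix:

$S = 1 + \mathrm{softplus}(S_{raw})$

$S = 1 + \mathrm{sigmoid}(S_{raw})$

$S = 1 + 4 \cdot \mathrm{sigmoid}(S_{raw})$

$S = \mathrm{offset} + \mathrm{scale} \cdot \mathrm{softplus}(S_{raw} / \mathrm{div})$, where offset, scale, and div are trainable parameters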
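To make the conversion concrete, here is a minimal NumPy sketch. It assumes the softplus- and sigmoid-based mappings described in the Gaussian prototypical network paper; the function name `inverse_covariance` and the `method` strings are illustrative choices, not code from the book.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inverse_covariance(s_raw, method="softplus", offset=1.0, scale=1.0, div=1.0):
    """Map raw encoder outputs to inverse-covariance values (hypothetical helper).

    With the default parameters, every mapping returns values of at least 1.
    """
    s_raw = np.asarray(s_raw, dtype=float)
    if method == "softplus":
        return 1.0 + softplus(s_raw)            # unbounded above
    if method == "sigmoid":
        return 1.0 + sigmoid(s_raw)             # between 1 and 2
    if method == "sigmoid4":
        return 1.0 + 4.0 * sigmoid(s_raw)       # between 1 and 5
    if method == "scaled_softplus":
        # offset, scale and div would be trainable in a real model;
        # here they are plain floats.
        return offset + scale * softplus(s_raw / div)
    raise ValueError(f"unknown method: {method!r}")

raw = np.array([-2.0, 0.0, 2.0])
for method in ("softplus", "sigmoid", "sigmoid4", "scaled_softplus"):
    print(method, inverse_covariance(raw, method))
```

The same function works for both components: pass a single value per data point for the radius case, or one value per embedding dimension for the diagonal case.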

So, the encoder, along with generating embeddings for the input, also returns the covariance matrix. We use either the diagonal or the radius component of the covariance matrix. Also, instead of using the covariance matrix directly, we use the inverse covariance matrix.

But what is the use of having the covariance matrix along with the embeddings? As said earlier, it adds a confidence region around the data points and is very useful in the case of noisy data. Look at the following diagram. Let’s say we have two classes, A and B. The dark dots represent the embeddings of the data points, and the circles around the dark dots indicate their covariance matrices. A big dotted circle represents the overall covariance matrix for a class. A star in the middle indicates the class prototype. As you can see, having this covariance matrix around the embeddings gives us a confidence region around the data points and around the class prototypes:

Let’s better understand this by looking at the code. Let’s say we have an image, X, and we want to generate embeddings for the image. Let’s represent the covariance matrix by sigma. First, we select which component of the covariance matrix we want to use, that is, whether we want to use the diagonal or the radius component. If we use the radius component, then our covariance matrix dimension is just one. If we opt for the diagonal component, then the size of the covariance matrix is the same as the embedding dimension:
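A minimal sketch of this step, assuming a stand-in linear encoder rather than the convolutional encoder a real model would use. The names `covariance_dim` and `encode`, the random weights, and the 28 x 28 input size are hypothetical. The point is only that the encoder emits `embedding_dim + cov_dim` values per image, which are then split into the embedding and the raw sigma.

```python
import numpy as np

def covariance_dim(component, embedding_dim):
    # Radius: one number per data point. Diagonal: one number per embedding dimension.
    if component == "radius":
        return 1
    if component == "diagonal":
        return embedding_dim
    raise ValueError(f"unknown component: {component!r}")

def encode(images, weights, embedding_dim, component="diagonal"):
    """Hypothetical linear encoder returning (embeddings, raw_sigma)."""
    cov_dim = covariance_dim(component, embedding_dim)
    if weights.shape[1] != embedding_dim + cov_dim:
        raise ValueError("weights must have embedding_dim + cov_dim output columns")
    # Flatten each image and project it to embedding_dim + cov_dim outputs.
    flat = images.reshape(images.shape[0], -1) @ weights
    # The first embedding_dim columns are the embedding, the rest is raw sigma.
    embeddings, raw_sigma = np.split(flat, [embedding_dim], axis=1)
    return embeddings, raw_sigma

rng = np.random.default_rng(0)
embedding_dim = 64
X = rng.normal(size=(5, 28, 28))  # a batch of 5 toy "images"
for component in ("radius", "diagonal"):
    cov_dim = covariance_dim(component, embedding_dim)
    weights = 0.01 * rng.normal(size=(28 * 28, embedding_dim + cov_dim))
    embeddings, raw_sigma = encode(X, weights, embedding_dim, component)
    print(component, embeddings.shape, raw_sigma.shape)
```

The raw sigma returned here still has to be converted into the inverse covariance before it is used.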

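So far, we have seen that we calculate the covariance matrix along with the embeddings of an input. What’s next? How can we compute the class prototype? The class prototype, $p^c$, can be computed as follows:

$$p^c = \frac{\sum_i s_i^c \circ x_i^c}{\sum_i s_i^c}$$

In this equation, $s_i^c$ is the diagonal of the inverse covariance matrix, $x_i^c$ denotes the embeddings, the superscript $c$ denotes the class, and $\circ$ denotes element-wise multiplication (the division is element-wise as well).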
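The prototype is therefore a confidence-weighted mean of the class embeddings. A small NumPy sketch, using a hypothetical helper `class_prototype` and made-up numbers, shows how a low-confidence point contributes less:

```python
import numpy as np

def class_prototype(embeddings, inv_cov):
    """Confidence-weighted class prototype.

    embeddings: (n, d) embeddings of one class.
    inv_cov:    (n, d) for the diagonal component, or (n, 1) for the radius.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    inv_cov = np.asarray(inv_cov, dtype=float)
    weighted_sum = (inv_cov * embeddings).sum(axis=0)  # sum_i s_i * x_i
    return weighted_sum / inv_cov.sum(axis=0)          # element-wise division

# Two support points of the same class. The second one is far away and
# has a smaller inverse covariance, i.e. the encoder is less sure about it.
x = np.array([[1.0, 1.0], [5.0, 5.0]])
s = np.array([[4.0, 4.0], [1.0, 1.0]])

print(class_prototype(x, s))                # weighted prototype
print(class_prototype(x, np.ones((2, 2))))  # equal weights: the plain mean
```

With these numbers the weighted prototype lands at 1.8 in each dimension, compared with 3.0 for the plain mean used by an ordinary prototypical network.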

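After computing the prototype for each of the classes, we learn the embedding of the query point. Let $x'$ be the embedding of a query point. Then, we compute the distance between the query point embedding and the class prototype as follows:

$$d_c^2(x') = (x' - p^c)^T S^c (x' - p^c)$$

Here, $S^c = \sum_i S_i^c$ is the inverse covariance matrix of class $c$, obtained by summing the inverse covariance matrices of the support points of that class.

Finally, we predict the class of the query point ($\hat{y}$) as the class whose prototype has the minimum distance to the query embedding:

$$\hat{y} = \arg\min_c d_c^2(x')$$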
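The following sketch illustrates the distance and prediction step. It assumes the covariance-weighted squared distance $(x' - p^c)^T S^c (x' - p^c)$ with a diagonal class inverse covariance. The helper names and the numbers are hypothetical.

```python
import numpy as np

def squared_distance(query, prototype, class_inv_cov):
    # (x' - p)^T diag(s) (x' - p) for a diagonal (or radius) inverse covariance.
    diff = np.asarray(query, dtype=float) - np.asarray(prototype, dtype=float)
    return float(np.sum(np.asarray(class_inv_cov, dtype=float) * diff * diff))

def predict(query, prototypes, class_inv_covs):
    # Return the label with the smallest distance, plus all distances.
    distances = {
        label: squared_distance(query, prototypes[label], class_inv_covs[label])
        for label in prototypes
    }
    return min(distances, key=distances.get), distances

prototypes = {"A": np.array([0.0, 0.0]), "B": np.array([4.0, 0.0])}
# Class A is tight (large inverse covariance); class B is spread out.
class_inv_covs = {"A": np.array([4.0, 4.0]), "B": np.array([1.0, 1.0])}

query = np.array([1.5, 0.0])
label, distances = predict(query, prototypes, class_inv_covs)
print(label, distances)
```

In plain Euclidean terms the query is closer to the prototype of A, but A’s narrow confidence region makes that offset costly, so the query is assigned to B. This is the effect the confidence regions are meant to capture.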

The algorithm for Gaussian prototypical networks

Now, we will better understand the Gaussian prototypical network by going through it step by step:

Let’s say we have a dataset, D = {(x1, y1), (x2, y2), …, (xi, yi)}, where x is the feature and y is the label. Let’s say we have a binary label, which means we have only two classes, 0 and 1. We sample data points at random without replacement from each of the classes in our dataset, D, and create our support set, S.

Similarly, we sample data points at random per class and create the query set, Q.
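The two sampling steps above can be sketched as follows. The function `sample_episode` is a hypothetical helper, and keeping the support and query sets disjoint is an assumption made here, not something the text specifies.

```python
import numpy as np

def sample_episode(features, labels, n_support, n_query, rng):
    """Sample a support set and a query set with a fixed number of points per class."""
    support_idx, query_idx = [], []
    for c in np.unique(labels):
        # Shuffle the indices of class c; slicing then samples without replacement.
        idx = rng.permutation(np.flatnonzero(labels == c))
        if len(idx) < n_support + n_query:
            raise ValueError(f"class {c} has too few examples")
        support_idx.extend(idx[:n_support])
        # Assumption: query points are drawn from the remaining examples.
        query_idx.extend(idx[n_support:n_support + n_query])
    support_idx = np.array(support_idx)
    query_idx = np.array(query_idx)
    support = (features[support_idx], labels[support_idx])
    query = (features[query_idx], labels[query_idx])
    return support, query

rng = np.random.default_rng(0)
features = rng.normal(size=(20, 4))       # toy dataset D
labels = np.array([0] * 10 + [1] * 10)    # binary labels
(support_x, support_y), (query_x, query_y) = sample_episode(
    features, labels, n_support=3, n_query=2, rng=rng
)
print(support_x.shape, query_x.shape)
```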

We will pass the support set to our embedding function, f(). The embedding function will generate the embeddings for our support set, along with the covariance matrix.

We calculate the inverse of the covariance matrix.

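We compute the prototype of each class in the support set as follows:

$$p^c = \frac{\sum_i s_i^c \circ x_i^c}{\sum_i s_i^c}$$

In this equation, $s_i^c$ is the diagonal of the inverse covariance matrix, $x_i^c$ denotes the embeddings of the support set, the superscript $c$ denotes the class, and $\circ$ denotes element-wise multiplication (the division is element-wise as well).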

After computing the prototype of each class in the support set, we learn the embeddings for the query set, Q. Let’s say x’ is the embedding of the query point.

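We calculate the distance of the query point embeddings to the class prototypes as follows:

$$d_c^2(x') = (x' - p^c)^T S^c (x' - p^c)$$

Here, $S^c = \sum_i S_i^c$ is the inverse covariance matrix of class $c$, obtained by summing the inverse covariance matrices of the support points of that class.

After calculating the distance between the class prototypes and the query set embeddings, we predict the class of each query point, $\hat{y}$, as the class that has the minimum distance, as follows:

$$\hat{y} = \arg\min_c d_c^2(x')$$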
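Putting the last steps together, here is a compact sketch that goes from encoder outputs to predictions for a whole query set. It assumes the embeddings and inverse covariances have already been produced by the encoder. The function name and the toy values are hypothetical, and training (the loss and gradient updates) is not shown.

```python
import numpy as np

def gaussian_prototypical_predict(support_emb, support_inv_cov, support_labels, query_emb):
    """Classify query embeddings against confidence-weighted class prototypes."""
    classes = np.unique(support_labels)
    prototypes, class_inv_cov = [], []
    for c in classes:
        mask = support_labels == c
        s, x = support_inv_cov[mask], support_emb[mask]
        s_sum = s.sum(axis=0)                           # class inverse covariance
        prototypes.append((s * x).sum(axis=0) / s_sum)  # weighted prototype
        class_inv_cov.append(s_sum)
    prototypes = np.stack(prototypes)        # (C, d)
    class_inv_cov = np.stack(class_inv_cov)  # (C, d) or (C, 1)
    diff = query_emb[:, None, :] - prototypes[None, :, :]          # (Q, C, d)
    sq_dist = (class_inv_cov[None, :, :] * diff ** 2).sum(axis=2)  # (Q, C)
    return classes[sq_dist.argmin(axis=1)], sq_dist

# Toy stand-ins for encoder outputs: two classes, two support points each.
support_emb = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 0.0], [4.2, 0.0]])
support_inv_cov = np.full((4, 2), 2.0)
support_labels = np.array([0, 0, 1, 1])
query_emb = np.array([[0.5, 0.0], [3.5, 0.0]])

predictions, sq_dist = gaussian_prototypical_predict(
    support_emb, support_inv_cov, support_labels, query_emb
)
print(predictions)
```

Because the inverse covariances are broadcast against the embeddings, the same function handles the radius component if `support_inv_cov` has shape `(n, 1)`.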

In this tutorial, we learned about the Gaussian prototypical network, which uses the embeddings and the covariance matrix to compute the class prototype. To learn more about meta-learning in Python, check out the book Hands-On Meta-Learning with Python.