This intermediate-level course introduces the mathematical foundations needed to derive Principal Component Analysis (PCA), a fundamental dimensionality reduction technique. We'll cover basic statistics of data sets, such as mean values and variances; we'll compute distances and angles between vectors using inner products; and we'll derive orthogonal projections of data onto lower-dimensional subspaces. Using all these tools, we'll then derive PCA as a method that minimizes the average squared reconstruction error between data points and their reconstructions.
At the end of this course, you'll be familiar with important mathematical concepts and you'll be able to implement PCA all by yourself. If you're struggling, you'll find a set of Jupyter notebooks that will allow you to explore properties of the techniques and walk you through what you need to do to get back on track. If you are already an expert, this course may refresh some of your knowledge.
The lectures, examples and exercises require:
1. Some capacity for abstract thinking
2. Good background in linear algebra (e.g., matrix and vector algebra, linear independence, basis)
3. Basic background in multivariate calculus (e.g., partial derivatives, basic optimization)
4. Basic knowledge of Python programming and NumPy
Disclaimer: This course is substantially more abstract and requires more programming than the other two courses of the specialization. However, this type of abstract thinking, algebraic manipulation and programming is necessary if you want to understand and develop machine learning algorithms.

CD

I found this course really excellent. Very clear explanations with very helpful illustrations. I was looking for a course on PCA; thank you for this one.

From the lesson

Orthogonal Projections

In this module, we will look at orthogonal projections of vectors, which live in a high-dimensional vector space, onto lower-dimensional subspaces. This will play an important role in the next module when we derive PCA. We will start off with a geometric motivation of what an orthogonal projection is and work our way through the corresponding derivation. We will end up with a single equation that allows us to project any vector onto a lower-dimensional subspace. However, we will also understand how this equation came about. As in the other modules, we will have both pen-and-paper practice and a small programming example with a Jupyter notebook.

Taught By

Marc P. Deisenroth

Lecturer in Statistical Machine Learning

Transcript

In the last video, we learned about orthogonal projections onto one-dimensional subspaces. In this video, we look at the general case of orthogonal projections onto M-dimensional subspaces. For this, we exploit the same concepts that worked in the one-dimensional case. Let's start with an illustration. We're going to look at a case where we have a vector x living in a three-dimensional space, and we define a two-dimensional subspace u with basis vectors b1 and b2, which are, for example, this vector and this vector. So, we write u is spanned by b1 and b2; u in this case would be the plane down here. Now we're looking at the orthogonal projection of x onto u, which we denote by pi u of x. That projection is going to look something like this; that's the projection point, the orthogonal projection of x onto the subspace u. We can already make two observations. The first is that because pi u of x is an element of u, it can be represented as a linear combination of the basis vectors of u. This means we can write pi u of x equals lambda one times b1 plus lambda two times b2, for appropriate values of lambda one and lambda two. The second property is that the difference vector x minus pi u of x, this vector over here, is orthogonal to u, which means it is orthogonal to all basis vectors of u. Using the inner product, we can write that the inner product of x minus pi u of x with b1 must be zero, and the same is true for b2. Now, let's formulate our intuition for the general case, where x is a D-dimensional vector and we project onto an M-dimensional subspace u. Okay. Let's derive this result. I copied our two insights up here, and I have defined two quantities: a lambda vector, which consists of all these lambda i, and a B-matrix, where we concatenate all basis vectors of our subspace u.
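The two observations above can be checked numerically. Below is a minimal NumPy sketch; the vectors b1, b2, and x are arbitrary choices for illustration, not from the lecture. The two orthogonality conditions give a small linear system in lambda, which we solve directly without using the closed-form projection matrix derived later.

```python
import numpy as np

# Arbitrary example data (not from the lecture): a 2D subspace of R^3.
b1 = np.array([1.0, 0.0, 0.0])
b2 = np.array([0.0, 1.0, 1.0])
x = np.array([2.0, 3.0, 5.0])

B = np.column_stack([b1, b2])  # 3x2 matrix of basis vectors

# The conditions <x - B @ lam, b_i> = 0 for i = 1, 2 form the
# 2x2 linear system (B^T B) lam = B^T x; solve it for lambda.
lam = np.linalg.solve(B.T @ B, B.T @ x)
proj = B @ lam  # pi_u(x) = lambda_1 * b1 + lambda_2 * b2

# Check: the difference vector is orthogonal to both basis vectors.
assert abs((x - proj) @ b1) < 1e-12
assert abs((x - proj) @ b2) < 1e-12
```

With these numbers, lambda comes out as (2, 4) and the projection point is (2, 4, 4); the residual (0, -1, 1) is indeed orthogonal to the plane.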
Now, with this definition, we can also write pi u of x equals B times lambda. Let's assume we use the dot product as our inner product. If we use our second property, the inner product of pi u of x minus x with b_i is equivalently written as the inner product of B lambda minus x with b_i, and this needs to be zero, where I just used the definition of pi u of x. We can simplify this by exploiting the linearity of the inner product, and we get that the inner product of B lambda with b_i minus the inner product of x with b_i needs to be zero, and this holds for i equals one to M. With the dot product, we can write this as lambda transpose times B transpose times b_i minus x transpose times b_i equals zero, for i equals one to M. If we summarize this set of conditions, we get lambda transpose times B transpose times B minus x transpose times B must be zero, where the zero here is an M-dimensional zero row vector. What we would like to do now is identify lambda. For this, we right-multiply the entire equation by the inverse of B transpose times B, and we get lambda transpose equals x transpose times B times the inverse of B transpose B. Taking the transpose of this entire expression, and using the fact that B transpose B is symmetric, so that its inverse is symmetric as well, we get lambda equals the inverse of B transpose B, times B transpose, times x. So, now we have identified lambda, but we also know that our projection point can be written as B times lambda. This means we get pi u of x equals B times the inverse of B transpose B times B transpose times x. We can identify B times the inverse of B transpose B times B transpose as the projection matrix, similar to the one-dimensional case. And in the special case of an orthonormal basis, B transpose times B is the identity matrix.
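The closed-form result can be sketched in NumPy as follows. The basis B and vector x here are random placeholders for illustration. The sketch also checks two properties implied by the derivation: projecting twice changes nothing, and an orthonormal basis for the same subspace (obtained here via QR factorization, an assumption not mentioned in the lecture) yields the same projection with B transpose B equal to the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 3))  # random basis of a 3D subspace of R^5
x = rng.standard_normal(5)

# Projection matrix: B (B^T B)^{-1} B^T
P = B @ np.linalg.inv(B.T @ B) @ B.T
proj = P @ x  # pi_u(x)

# A projection applied twice changes nothing: P @ P == P.
assert np.allclose(P @ P, P)

# With an orthonormal basis Q for the same subspace, Q^T Q = I,
# so the projection matrix simplifies to Q Q^T.
Q, _ = np.linalg.qr(B)
assert np.allclose(Q.T @ Q, np.eye(3))
assert np.allclose(Q @ Q.T @ x, proj)  # same subspace, same projection
```

Note that in practice one would solve the linear system (B^T B) lambda = B^T x rather than form the inverse explicitly; the inverse appears here only to mirror the formula from the derivation.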
So, we would get pi u of x equals B times B transpose times x. The projected vector pi u of x is still a vector in R^D, but we only require M coordinates, the lambda vector, to represent it as a linear combination of the basis vectors of the subspace u. We effectively got the same result as in the one-dimensional case. Remember, in 1D we got lambda equals b transpose x divided by b transpose b, and a projection point pi u of x equal to b transpose x divided by b transpose b, times b. The scalar b transpose b is now expressed as the matrix B transpose times the matrix B, and instead of dividing by a scalar we multiply by the inverse matrix; that's the only difference between the two results. In this video, we looked at orthogonal projections of a vector onto a subspace of dimension M. We arrived at the solution by exploiting two properties: we must be able to represent the projection as a linear combination of the basis of the subspace, and the difference vector between the original vector and its projection is orthogonal to the subspace. In the next video, we're going to look at a concrete example.
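The agreement between the 1D formula and the general formula with M equal to one can be verified directly. This is a small illustrative sketch; the vectors b and x are arbitrary choices, and the 1x1 matrix inverse plays the role of the scalar division.

```python
import numpy as np

# Arbitrary 1D example (not from the lecture).
b = np.array([1.0, 2.0, 2.0])
x = np.array([1.0, 1.0, 1.0])

# 1D formula: lambda = b^T x / b^T b, projection = lambda * b.
lam_1d = (b @ x) / (b @ b)
proj_1d = lam_1d * b

# General formula with B a D x 1 matrix: dividing by the scalar
# b^T b becomes inverting the 1x1 matrix B^T B.
B = b.reshape(-1, 1)
proj_general = B @ np.linalg.inv(B.T @ B) @ B.T @ x

assert np.allclose(proj_1d, proj_general)
```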
