Linear Algebra for Deep Learning

Linear algebra is a field of applied mathematics that is a prerequisite to reading and understanding the formal description of deep learning methods, such as in papers and textbooks.

Generally, an understanding of linear algebra (or parts thereof) is presented as a prerequisite for machine learning. Although important, this area of mathematics is seldom covered by computer science or software engineering degree programs.

In their seminal textbook on deep learning, Ian Goodfellow and others present chapters covering the prerequisite mathematical concepts for deep learning, including a chapter on linear algebra.

In this post, you will discover the crash course in linear algebra for deep learning presented in the de facto textbook on deep learning.

After reading this post, you will know:

The topics suggested as prerequisites for deep learning by experts in the field.

The progression through these topics and their culmination.

Suggestions for how to get the most out of the chapter as a crash course in linear algebra.

Deep Learning Prerequisites

The book “Deep Learning” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville is the de facto textbook for deep learning.

In the book, the authors provide a part titled “Applied Math and Machine Learning Basics” intended to provide the background in applied mathematics and machine learning required to understand the deep learning material presented in the rest of the book.

This part of the book includes four chapters; they are:

Linear Algebra

Probability and Information Theory

Numerical Computation

Machine Learning Basics

Given the expertise of the authors of the book, it is fair to say that the chapter on linear algebra provides a well reasoned set of prerequisites for deep learning, and perhaps more generally much of machine learning.

This part of the book introduces the basic mathematical concepts needed to understand deep learning.

Therefore, we can use the topics covered in the chapter on linear algebra as a guide to the topics you may be expected to be familiar with as a deep learning and machine learning practitioner.

Linear algebra is less likely to be covered in computer science courses than other types of math, such as discrete mathematics. This is specifically called out by the authors.

Linear algebra is a branch of mathematics that is widely used throughout science and engineering. However, because linear algebra is a form of continuous rather than discrete mathematics, many computer scientists have little experience with it.

Linear Algebra Topics

As a first step, it is useful to use this as a high-level road map. The complete list of sections from the chapter are listed below.

Scalars, Vectors, Matrices and Tensors

Multiplying Matrices and Vectors

Identity and Inverse Matrices

Linear Dependence and Span

Norms

Special Kinds of Matrices and Vectors

Eigendecomposition

Singular Value Decomposition

The Moore-Penrose Pseudoinverse

The Trace Operator

The Determinant

Example: Principal Components Analysis

There’s not much value in enumerating the specifics covered in each section as the topics are mostly self explanatory, if familiar.

Progress Through Concepts

A reading of the chapter shows a progression in concepts and methods from the most primitive (vectors and matrices) to the derivation of the principal components analysis (known as PCA), a method used in machine learning.

It is a clean progression and well designed. Topics are presented with textual descriptions and consistent notation, allowing the reader to see exactly how elements come together through matrix factorization, the pseudoinverse, and ultimately PCA.

The focus is on the application of the linear algebra operations rather than theory. Although, no worked examples are given of any of the operations.

Finally, the derivation of PCA is perhaps a bit much. A beginner may want to skip this full derivation, or perhaps reduce it to the application of some of the elements learned throughout the chapter (e.g. eigendecomposition).

One area I would like to have seen covered is linear least squares and the use of various matrix algebra methods used to solve it, such as directly, LU, QR decomposition, and SVD. This might be more of a general machine learning perspective and less a deep learning perspective, and I can see why it was excluded.

Linear Algebra References

The authors also suggest two other texts to consult if further depth in linear algebra is required.

The Matrix Cookbook is a free PDF filled with the notations and equations of practically any matrix operation you can conceive.

These pages are a collection of facts (identities, approximations, inequalities, relations, …) about matrices and matters relating to them. It is collected in this form for the convenience of anyone who wants a quick desktop reference.