The derivative matrix

The definition of differentiability in multivariable calculus is a bit technical.
There are subtleties to watch out for: one has to remember that the existence of the
derivative is a more stringent condition than the existence of partial derivatives.
But, in the end, if our function is nice enough to be differentiable,
then the derivative itself isn't too complicated.
It's a fairly straightforward generalization of the single variable derivative.

In single variable calculus, you learned that the derivative of a function $f: \R \to \R$ at a point
is just a real number: the rate of increase of the function
(i.e., the slope of its graph) at that point.
We could think of that number as a $1 \times 1$ matrix, so if we like,
we could denote the derivative of $f(x)$ at $x=a$ as
\begin{align*}
Df(a) = \left[\diff{f}{x}(a)\right].
\end{align*}
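For example, if $f(x) = x^2$, then $\diff{f}{x}(a) = 2a$, so the derivative at $a=3$ is the $1 \times 1$ matrix
\begin{align*}
Df(3) = \left[6\right].
\end{align*}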

For a scalar-valued function of multiple variables,
such as $f(x,y)$ or $f(x,y,z)$, we can think of the
partial derivatives
as the rates of increase of the function in the coordinate directions.
If the function is
differentiable,
then the derivative is simply a row matrix containing all of these partial
derivatives, which we call the matrix of partial derivatives
(also called the Jacobian matrix).
For $f: \R^n \to \R$, written as $f(\vc{x})$ with $\vc{x} = (x_1,x_2,\ldots,x_n)$,
the $1 \times n$ matrix of partial derivatives at $\vc{x}=\vc{a}$ is
\begin{align*}
Df(\vc{a}) = \left[\pdiff{f}{x_1}(\vc{a}) \ \pdiff{f}{x_2}(\vc{a}) \ \ldots \
\pdiff{f}{x_n}(\vc{a})\right].
\end{align*}
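As a concrete example, take $f(x,y,z) = x^2 y + \sin z$. Its partial derivatives are $\pdiff{f}{x} = 2xy$, $\pdiff{f}{y} = x^2$, and $\pdiff{f}{z} = \cos z$, so at the point $\vc{a} = (1,2,0)$,
\begin{align*}
Df(\vc{a}) = \left[2xy \ \ x^2 \ \ \cos z\right]\Big|_{(1,2,0)}
= \left[4 \ \ 1 \ \ 1\right].
\end{align*}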

The last generalization is to vector-valued functions,
$\vc{f}: \R^n \to \R^m$.
Here, $\vc{f}(\vc{x})$ is a function of the vector $\vc{x} = (x_1,x_2,\ldots,x_n)$
whose output is a vector of $m$ components. We could write $\vc{f}$
in terms of its components as
\begin{gather*}
\vc{f}(\vc{x}) = (f_1(\vc{x}),f_2(\vc{x}), \ldots, f_m(\vc{x}))
=
\left[\begin{array}{c}
f_1(\vc{x})\\f_2(\vc{x})\\ \vdots\\ f_m(\vc{x})
\end{array}\right].
\end{gather*}
(Recall that when we view vectors as matrices, we view them as column matrices,
so the components are stacked up on top of each other.)

To form the matrix of partial derivatives, we think of $\vc{f}(\vc{x})$
as a column matrix whose components are scalar-valued functions.
The matrix of partial derivatives of each component $f_i(\vc{x})$ is a $1 \times n$ row matrix, as above.
We simply stack these row matrices on top of each other to form a larger matrix.
We get that the full $m \times n$ matrix of partial derivatives at $\vc{x}=\vc{a}$ is
\begin{gather*}
D\vc{f}(\vc{a})=
\left[
\begin{array}{cccc}
\displaystyle\pdiff{f_1}{x_1}(\vc{a})&
\displaystyle\pdiff{f_1}{x_2}(\vc{a})&
\ldots &
\displaystyle\pdiff{f_1}{x_n}(\vc{a})\\
\displaystyle\pdiff{f_2}{x_1}(\vc{a})&
\displaystyle\pdiff{f_2}{x_2}(\vc{a})&
\ldots &
\displaystyle\pdiff{f_2}{x_n}(\vc{a})\\
\vdots & \vdots & \ddots & \vdots\\
\displaystyle\pdiff{f_m}{x_1}(\vc{a})&
\displaystyle\pdiff{f_m}{x_2}(\vc{a})&
\ldots &
\displaystyle\pdiff{f_m}{x_n}(\vc{a})
\end{array}
\right].
\end{gather*}
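For instance, for $\vc{f}: \R^2 \to \R^3$ defined by $\vc{f}(x,y) = (x^2 y,\ x+y,\ x \sin y)$, stacking the three row matrices of partial derivatives of the components gives the $3 \times 2$ matrix
\begin{gather*}
D\vc{f}(x,y) =
\left[
\begin{array}{cc}
2xy & x^2\\
1 & 1\\
\sin y & x \cos y
\end{array}
\right].
\end{gather*}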

Though, strictly speaking, we should refer to the derivative of $\vc{f}$
as the linear transformation
that is associated with the matrix
$D\vc{f}(\vc{a})$, it's fine at this level to refer to the matrix of partial
derivatives $D\vc{f}(\vc{a})$ as “the derivative” of $\vc{f}$ at
the point $\vc{a}$ (assuming, of course, that $\vc{f}$ is differentiable at $\vc{a}$).
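One way to double-check a hand-computed matrix of partial derivatives is numerically: approximate each partial derivative by a forward difference quotient and compare with the analytic matrix. The sketch below does this in Python with NumPy for the illustrative map $\vc{f}(x,y) = (x^2 y,\ x+y,\ x \sin y)$; the helper `jacobian_fd` is our own construction, not a standard library routine.

```python
import numpy as np

def f(x):
    # Illustrative map f: R^2 -> R^3, f(x, y) = (x^2 y, x + y, x sin y)
    x1, x2 = x
    return np.array([x1**2 * x2, x1 + x2, x1 * np.sin(x2)])

def jacobian_fd(f, a, h=1e-6):
    """Approximate the m x n matrix of partial derivatives at a
    by forward differences: column j is (f(a + h e_j) - f(a)) / h."""
    a = np.asarray(a, dtype=float)
    fa = f(a)
    J = np.zeros((fa.size, a.size))
    for j in range(a.size):
        step = np.zeros_like(a)
        step[j] = h
        J[:, j] = (f(a + step) - fa) / h
    return J

a = np.array([1.0, 2.0])

# Analytic Df(a): row i is the row matrix of partials of component f_i
Df = np.array([[2*a[0]*a[1], a[0]**2],
               [1.0,          1.0],
               [np.sin(a[1]), a[0]*np.cos(a[1])]])

print(np.allclose(jacobian_fd(f, a), Df, atol=1e-4))  # prints True
```

The agreement (to within the $O(h)$ error of forward differences) confirms that stacking the gradients of the components row by row reproduces the full matrix of partial derivatives.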