What is the importance and intuition behind the contraction operator on tensors (or the trace of a matrix, for that matter)?

In addition, I see that one of the requirements for a covariant derivative (in the context of connections) is to commute with contraction. Why is that a natural requirement (like it would be for, say, the product rule)?

6 Answers

I like the question. Below is a somewhat sketchy version of how I see this.

I think the importance of tensors and contraction of tensors originates from trying to do basic differential geometry or vector calculus from a co-ordinate-free point of view. The most basic objects are curves and velocity vectors of curves. The key observation is that the set of all possible velocity vectors at a single point in space is naturally an abstract vector space. Given any set of co-ordinates, the $n$ velocity vectors you get by holding all but one co-ordinate fixed form a basis of this vector space. If you change co-ordinates, then the chain rule leads to the appropriate change-of-basis formula for velocity vectors. This space of all velocity vectors at a point in space is also known as the tangent space at that point. The construction described above defines the tangent space at each point of a smooth manifold.

But once you have a naturally defined abstract vector space, all of linear and multilinear algebra becomes available for use. For example, the dual space to the space of velocity vectors is the cotangent space, and at each point there is a natural contraction between a tangent vector and a cotangent vector. It is also natural to consider tensor products of copies of the tangent and cotangent spaces, leading to higher order tensors. Whether this is useful or not is not immediately obvious and becomes clear only after you go on further and discover how to construct tensors that have important geometric, physical, or topological meaning. But the upshot is that I like to think of a smooth manifold as being, among other things, a parameterized family of abstract vector spaces. Of course, that is exactly what a vector bundle is.
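To make the natural contraction between a tangent vector and a cotangent vector concrete, here is a small NumPy sketch (the components and the change-of-basis matrix are made up for illustration, and the random matrix is assumed invertible): vector components transform by $P^{-1}$ and covector components by $P^{T}$, so their pairing is independent of the basis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
v = rng.standard_normal(n)        # components of a tangent vector
w = rng.standard_normal(n)        # components of a cotangent vector
P = rng.standard_normal((n, n))   # change-of-basis matrix (invertible a.s.)

# Under a change of basis, vector components transform contravariantly
# and covector components covariantly:
v_new = np.linalg.inv(P) @ v
w_new = P.T @ w

# The contraction <w, v> = sum_i w_i v^i is the same in both bases.
print(np.isclose(w @ v, w_new @ v_new))  # True
```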

As for the covariant derivative, this can be introduced on its own. It arises because, if you try to find a natural co-ordinate-free way to differentiate vector fields, the best you can do is the Lie derivative or Lie bracket of two vector fields, and a closer study shows that it is limited in its properties and usefulness. In particular, you want, say, to be able to define the directional derivative of one vector field $V$ in the direction of another vector field $W$. Moreover, just as the directional derivative at a point of a function in the direction $W$ depends only on the value of $W$ at that point (and not on a neighborhood), you want the same property for the directional derivative of $V$. Moreover, the Leibniz rule for a scalar multiple of a vector field is a natural property to want. But you find that these properties don't determine a unique connection, so a connection is an additional geometric assumption. This is somehow unsatisfying.

The covariant derivative becomes more compelling when it is introduced in the context of a Riemannian metric, which, given the discussion about tangent vectors above, is a natural extension of the concepts of Euclidean space and the inner product. Note that the Riemannian metric, as defined, is naturally a tensor. Next, there is something that, as far as I know, is simply a miracle, namely that there is a unique naturally compatible torsion-free connection, known as the Levi-Civita connection. After this, the value of tensors and the Levi-Civita connection is justified both because they correspond to naturally defined and interesting geometric concepts for submanifolds of Euclidean space and because they lead to so much insight and so many deep results in geometry, analysis, and topology.

I'm not sure what you're looking for, but maybe this will help. I am completely avoiding any mention of coordinates or bases.

A connection on a vector bundle $V$ determines and is determined by the corresponding covariant derivative on sections of $V$.

Connections on vector bundles $V$ and $W$ determine a connection on $V\otimes W$. In terms of covariant derivatives the rule is $\nabla(v\otimes w)=\nabla v\otimes w+v\otimes \nabla w$, just like the usual product rule.

A connection on $V$ determines a connection on the dual $V^\ast$. If we write $\langle\omega,v\rangle$ for the result of evaluating $\omega\in V^\ast$ on $v\in V$, then the rule is
$$ d\langle\omega,v\rangle=\langle\nabla\omega,v\rangle+\langle\omega, \nabla v\rangle.$$ Here $d$ denotes the usual derivative of functions; if we like we can call it the covariant derivative for the trivial connection on the trivial bundle $1$ with fiber $\mathbb R$. This equation may then be read as saying that the canonical "contraction" map $V\otimes V^\ast\to 1$ is compatible with connections: the result is the same whether you first contract $\omega\otimes v$ and then differentiate or first differentiate and then contract. (Of course, the equation may also be used as a definition of the connection on $V^\ast$ determined by a connection on $V$, by rewriting: $\langle\nabla\omega,v\rangle=d\langle\omega,v\rangle-\langle\omega, \nabla v\rangle$.)
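This compatibility can be checked symbolically in components. Below is a SymPy sketch with arbitrary made-up Christoffel symbols (the component functions `G...`, `v...`, `w...` are hypothetical placeholders): the $\Gamma$ terms in the covariant derivatives of $\omega$ and $v$ cancel, so contracting then differentiating agrees with differentiating then contracting.

```python
import sympy as sp

x, y = sp.symbols('x y')
coords = [x, y]
n = 2

# Arbitrary (hypothetical) Christoffel symbols, vector field v, and 1-form
# omega, all with made-up smooth component functions of (x, y).
Gamma = [[[sp.Function(f'G{k}{i}{j}')(x, y) for j in range(n)]
          for i in range(n)] for k in range(n)]
v = [sp.Function(f'v{j}')(x, y) for j in range(n)]
om = [sp.Function(f'w{j}')(x, y) for j in range(n)]

# (nabla_i v)^j = d_i v^j + Gamma^j_{ik} v^k
def nabla_v(i, j):
    return sp.diff(v[j], coords[i]) + sum(Gamma[j][i][k]*v[k] for k in range(n))

# (nabla_i omega)_j = d_i omega_j - Gamma^k_{ij} omega_k  (the rewritten rule)
def nabla_om(i, j):
    return sp.diff(om[j], coords[i]) - sum(Gamma[k][i][j]*om[k] for k in range(n))

# Check d<omega, v> = <nabla omega, v> + <omega, nabla v> in each direction i.
for i in range(n):
    lhs = sp.diff(sum(om[j]*v[j] for j in range(n)), coords[i])
    rhs = (sum(nabla_om(i, j)*v[j] for j in range(n))
           + sum(om[j]*nabla_v(i, j) for j in range(n)))
    assert sp.expand(lhs - rhs) == 0
print("contraction commutes with the covariant derivative")
```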

Connections on $V$ and $W$ also determine a connection on $Hom(V,W)$. The rule can be given as $\nabla(Tv)=(\nabla T)(v)+T(\nabla v)$. This generalizes the rule for $V^\ast=Hom(V,1)$. It also agrees with the rule for tensor products, if we identify $Hom(V,W)$ with $V^\ast\otimes W$. Note that a section $T$ of $Hom(V,W)$ satisfies $\nabla T=0$ if and only if this map $T:V\to W$ satisfies $\nabla (Tv)=T(\nabla v)$.

The contraction operator $\omega\otimes v\mapsto \langle\omega,v\rangle$ is a section of $Hom(V\otimes V^\ast,1)$. The displayed equation says that the covariant derivative of this section is zero.

In general a connection on $V$ determines connections on all the bundles you can make from $V$ by dualizing and tensoring (and symmetrizing). It works out that every contraction operator, for example the various maps $V\otimes V\otimes V\otimes V^\ast \otimes V^\ast\to V\otimes V\otimes V^\ast$, will have zero derivative.

The usual connection on a Riemannian manifold is chosen to be compatible with the inner product in the sense that $d(v\cdot w)=\nabla v\cdot w+v\cdot\nabla w$. In other words, it is chosen so that the inner product, as a section of $T^\ast\otimes T^\ast=(T\otimes T)^\ast$, has zero derivative.

Yes, up to a locally constant factor. Equivalently the identity $V\to V$ is the only map, up to such a factor, that commutes with parallel transport for every connection on $V$.
– Tom Goodwillie, Jan 26 '13 at 17:05

Ah, put that way it is pretty obvious! This gives some sort of explanation for the importance of contraction. (Another, connection-free way to put this «explanation» is that the identity map is obviously an important gadget, and contraction is just the identity up to transposition; if one likes one's tensors to be elements of $V^{\otimes r}\otimes V^{*\otimes s}$, as is classically done, then it is the closest to the identity that one gets)
– Mariano Suárez-Alvarez♦, Jan 26 '13 at 17:13

So what is the intuition or motivation for defining $(\nabla T)(v) = \nabla(Tv) - T(\nabla v)$ for $T\in \mathrm{Hom}(V,W)$? :) It is reminiscent of the definition of action of a group on the same space and thus it seems that the explanation boils down to naturality in the sense of Peter Michor's answer.
– Vít Tuček, Dec 16 '13 at 13:08

Consider the category of $n$-dimensional smooth manifolds and local diffeomorphisms between them. A natural vector bundle is a functor on this category which associates a vector bundle over $M$ to each manifold $M$ and a vector bundle homomorphism over $f$ to each local diffeomorphism. Examples are the tangent bundle, the cotangent bundle, and all tensor bundles.
Indeed, each natural bundle is associated to a (higher order) frame bundle with respect to a representation of the (higher order) jet group.

Now for first order natural bundles (i.e., tensor bundles) contractions and permutations of the tensor orders (order of the indices) are the only natural transformations.
See this book (pdf) for all this.
Covariant derivatives should be equivariant with respect to all natural transformations
(that is the meaning of covariant).

As for the trace on matrices: $\operatorname{Trace}\colon \mathfrak g\mathfrak l(n)\to \mathbb R$ is the simplest (and, up to a scalar factor, the only nontrivial) Lie algebra homomorphism to $\mathbb R$; it is the infinitesimal version of the determinant. This is of course the source of the naturality described above.

This is mainly a rehash of previous answers. In the realm of Riemannian
geometry, we have the Riemannian covariant derivative $\nabla$ and, besides
the contraction of covariant with contravariant tensors, the metric
contraction of tensors of the same type, i.e., using the isomorphism
$g:TM\rightarrow T^{\ast}M$. Let $X,Y,Z$ denote vector fields and $f$ be a
function. The basic contraction is: If $\alpha$ is a $1$-form, then
$\alpha(Y)$ is a function.

We have $\nabla_{X}f=X(f)$, i.e., $\nabla f=df$. Thinking of a contraction as
a product, commuting with contraction (CWC) is like the product rule:
$X(\alpha(Y))=(\nabla_{X}\alpha)(Y)+\alpha(\nabla_{X}Y)$. This defines the
covariant derivative of a $1$-form. Similarly, for a $2$-tensor $\beta$ we
have: $X(\beta(Y,Z))=(\nabla_{X}\beta)(Y,Z)+\beta(\nabla_{X}Y,Z)+\beta
(Y,\nabla_{X}Z)$ by CWC; think of $\beta(Y,Z)$ as $\beta\cdot Y\cdot Z$ and
apply $\nabla_{X}$ to it with the product rule in effect. The compatibility of
the $\nabla$ with $g$ is usually written as: $X(g(Y,Z))=g(\nabla
_{X}Y,Z)+g(Y,\nabla_{X}Z)$, which is equivalent to $\nabla_{X}g=0$.
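As a concrete instance of the compatibility $\nabla_{X}g=0$, one can verify it symbolically for the Levi-Civita connection of the flat metric written in polar coordinates; a SymPy sketch:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
coords = [r, th]
n = 2

# Polar-coordinate metric on the plane: g = dr^2 + r^2 dtheta^2
g = sp.Matrix([[1, 0], [0, r**2]])
ginv = g.inv()

# Levi-Civita Christoffel symbols:
# Gamma^l_{ij} = (1/2) g^{lm} (d_i g_{mj} + d_j g_{mi} - d_m g_{ij})
def Gamma(l, i, j):
    return sp.Rational(1, 2) * sum(
        ginv[l, m] * (sp.diff(g[m, j], coords[i])
                      + sp.diff(g[m, i], coords[j])
                      - sp.diff(g[i, j], coords[m]))
        for m in range(n))

# nabla g = 0: (nabla_k g)_{ij} = d_k g_{ij} - Gamma^l_{ki} g_{lj} - Gamma^l_{kj} g_{il}
for k in range(n):
    for i in range(n):
        for j in range(n):
            cov = (sp.diff(g[i, j], coords[k])
                   - sum(Gamma(l, k, i)*g[l, j] for l in range(n))
                   - sum(Gamma(l, k, j)*g[i, l] for l in range(n)))
            assert sp.simplify(cov) == 0
print("nabla g = 0 for the Levi-Civita connection in polar coordinates")
```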

At a point $x$, the trace, or metric contraction, of a $2$-tensor $\beta$ is
given by the following formula:
$$
\operatorname{Trace}{}_{g}(\beta)=\frac{1}{\omega_{n}}\int_{S^{n-1}}
\beta(V,V)d\sigma(V),
$$
where $S^{n-1}\subset T_{x}M$ is the unit $\left( n-1\right) $-sphere,
$n\omega_{n}$ is its volume, and $d\sigma$ is its volume form. Without loss of
generality, we may assume that we are in $\mathbb{R}^{n}$, in which case
$\beta(V,V)=\sum_{i,j=1}^{n}\beta_{ij}V_{i}V_{j}$. The formula follows from
$\int_{S^{n-1}}V_{i}V_{j}d\sigma(V)=0$ for $i\neq j$ and $n\int_{S^{n-1}}
V_{i}^{2}d\sigma(V)=\int_{S^{n-1}}|V|^{2}d\sigma(V)=\omega_{n}$ for each $i$.
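This sphere-average formula is easy to check numerically by Monte Carlo sampling; a NumPy sketch (the tensor $\beta$ is a random symmetric matrix chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# A symmetric 2-tensor at a point, represented as a symmetric matrix.
A = rng.standard_normal((n, n))
beta = (A + A.T) / 2

# Sample directions V uniformly on the unit sphere S^{n-1}.
V = rng.standard_normal((200_000, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)

# The mean of beta(V, V) over the sphere is trace(beta)/n,
# so n times the mean recovers the trace.
estimate = n * np.mean(np.einsum('ki,ij,kj->k', V, beta, V))
print(estimate, np.trace(beta))  # the two agree to Monte Carlo accuracy
```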

In this way we see that the Ricci $2$-tensor $\operatorname{Ric}$ is an
average of the Riemann curvature $4$-tensor $\operatorname{Rm}$, since
$\operatorname{Ric}=\operatorname{tr}_{1,4}\operatorname{Rm}$. More
geometrically, the Ricci curvature of a line $L$ in $T_{x}M$ is the average of
all sectional curvatures of $2$-planes in $T_{x}M$ containing $L$. Similarly,
the scalar curvature function $R$ is the average of all Ricci curvatures of
lines in $T_{x}M$.

Another way the trace enters is: For a family of
invertible square matrices $A(t)$ we have Jacobi's formula: $\frac{d}{dt}\det
A=\det A\operatorname{tr}(A^{-1}\frac{dA}{dt})$ (using Cramer's rule). Since
the Riemannian measure is $d\mu_{g}=\sqrt{\det g_{ij}}dx^{1}\cdots dx^{n}$, if
we vary a metric by $\frac{\partial}{\partial s}g=v$, then its measure varies
by $\frac{\partial}{\partial s}d\mu_{g}=\frac{\operatorname{tr}_{g}v}{2}
d\mu_{g}$.
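Jacobi's formula itself is easy to verify numerically for a made-up family $A(t)$; a NumPy sketch comparing a finite-difference derivative of $\det A$ with $\det A\operatorname{tr}(A^{-1}\frac{dA}{dt})$:

```python
import numpy as np

# A simple smooth family of invertible matrices A(t), made up for illustration.
def A(t):
    return np.array([[np.exp(t), t],
                     [np.sin(t), 2.0 + t**2]])

def dA(t):  # derivative dA/dt, computed by hand
    return np.array([[np.exp(t), 1.0],
                     [np.cos(t), 2.0*t]])

t, h = 0.7, 1e-6
# Left side: d/dt det A(t), by a centered finite difference.
lhs = (np.linalg.det(A(t + h)) - np.linalg.det(A(t - h))) / (2*h)
# Right side: det(A) * tr(A^{-1} dA/dt)  (Jacobi's formula).
rhs = np.linalg.det(A(t)) * np.trace(np.linalg.inv(A(t)) @ dA(t))
print(np.isclose(lhs, rhs, rtol=1e-5))  # True
```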

The covariant derivative takes a degree $r$ tensor $T$ to the degree $r+1$
tensor $\nabla T$. By tracing we have a differential operator that decreases
the degree by $1$: The divergence is $\operatorname{div}T=\operatorname{tr}
_{1,2}\nabla T$. Tracing also allows us to average the Hessian: The (rough)
Laplacian of $T$ is $\Delta T=\operatorname{tr}_{1,2}\nabla^{2}T$. Another
example is: If $v=\mathcal{L}_{X}g$, then $\frac{\operatorname{tr}_{g}v}
{2}=\operatorname{div}X$.

The trace also arises when considering the irreducible decomposition of a
tensor. For example, given a symmetric $2$-tensor $\beta$, we may write
$\beta=(\beta-\frac{1}{n}(\operatorname{tr}_{g}\beta)g)+\frac{1}
{n}(\operatorname{tr}_{g}\beta)g$. Here, the norm of the trace-free part
$|\beta-\frac{1}{n}(\operatorname{tr}_{g}\beta)g|$ is a measure of how far
$\beta$ is from a multiple of $g$. If $\beta=\operatorname{Ric}$ and $n\geq3$,
then its trace-free part is zero iff $g$ is Einstein.
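A NumPy sketch of this decomposition (with $g$ the Euclidean metric, so $\operatorname{tr}_g$ is the ordinary trace, and a random symmetric $\beta$ chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
beta = (A + A.T) / 2            # a symmetric 2-tensor
g = np.eye(n)                   # Euclidean metric, so tr_g is the ordinary trace

# beta = (trace-free part) + (pure-trace part)
trace_part = (np.trace(beta) / n) * g
trace_free = beta - trace_part

print(np.isclose(np.trace(trace_free), 0.0))       # True: trace-free part
print(np.allclose(trace_free + trace_part, beta))  # True: they sum to beta
# The norm |trace_free| measures how far beta is from a multiple of g.
print(np.linalg.norm(trace_free))
```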

Re the most basic question, on the motivation for the trace of a matrix: the fundamental reason we care is that it's a characteristic of the matrix that's independent of the choice of basis.
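This basis-independence takes one line to check numerically; a NumPy sketch (the matrices are random, with the change-of-basis matrix assumed invertible):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))   # a linear map expressed in some basis
P = rng.standard_normal((n, n))   # an (almost surely invertible) change of basis

# The same map expressed in the new basis:
B = np.linalg.inv(P) @ A @ P

# The trace is unchanged by the change of basis.
print(np.isclose(np.trace(A), np.trace(B)))  # True
```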

As a physical example, when Schwarzschild found the metric for the spacetime surrounding a spherical body, he found that if the body was compact enough, the metric's components would blow up at some radius $r>0$, and so would some components of the Riemann curvature tensor. Did this mean that the fabric of spacetime was torn apart at this $r$? No, because the singularity depended on his choice of coordinates; by switching to different coordinates, the singularity could be eliminated. A typical technique in relativity for proving that a singularity is physical (not just a coordinate singularity) is to show that a full contraction such as the Kretschmann scalar $R_{ijkl}R^{ijkl}$ blows up. Scalars like this are coordinate-independent, so if they blow up, it's not just because of a bad choice of coordinates. This is how we know, for instance, that the singularity at $r=0$ for a black hole is real and not an artifact of the choice of coordinates.

The most common application of contracting an index (as opposed to contracting two indices of the same tensor) is simply to find the norm of a vector, $x^ix_i$. Again, the reason we care is because the norm is independent of the choice of basis.

A very common physical application of contractions involving derivatives is continuity equations, e.g., $\nabla_m J^m=0$, which expresses charge conservation in terms of the four-dimensional charge-current vector.

It is a natural requirement, and it is more or less equivalent to the natural analog of the Leibniz rule. Let us consider the following example:
$$
d\,g(u,v) = (\nabla g)(u, v)+ g(\nabla u, v)+ g(u, \nabla v) \hspace{1cm} (\ast)$$

$g$ is a $(0,2)$-tensor and $u,v$ are two vector fields. In index notation this formula reads
$$\partial_k\bigl(g_{ij}u^i v^j\bigr) = (\nabla_k g_{ij})\,u^i v^j + g_{ij}(\nabla_k u^i)\,v^j + g_{ij}\,u^i(\nabla_k v^j).$$

The left hand side of the formula is "you first contract and then differentiate", the right hand side of the formula is "you first differentiate using the Leibniz rule for the product and then contract".

I do not think you need an explanation why the Leibniz rule is a natural condition.