Tensors and General Covariance
Matthew Louis / September 2024 (4210 Words, 24 Minutes)
In physics, people like to throw around the term “tensor” all the time, and for a while I thought: “sounds like cool math, but that’s only really necessary for complicated stuff like general relativity”. Although basic tensor calculus was covered in one or two of my classes, I never took the time to actually understand it, and that was okay. I didn’t fail those classes or any of my other physics classes later on, so I must’ve been right to a certain extent. Well, while it might not be strictly necessary for doing physics, learning tensor calculus makes everything “make a lot more sense”. It also motivates some of the strange definitions of differential operators in different coordinate systems (here). Unfortunately, I’m only learning this now because I wanted to understand certain techniques of model discovery for physical systems, which require a real understanding of tensor calculus. If I had taken the time to learn it earlier, it probably would’ve made a lot of my physics classes make more sense.
So What Is a Tensor?
A quick disclaimer: if you want to actually learn tensor calculus, then please take a look at the awesome textbook by Neuenschwander (Neuenschwander, 2014); this is gonna be a whirlwind tour by comparison. The simplest example that motivates the general definition is a vector. These are objects that have a defined magnitude and point in a specific direction in space. For vectors to be useful objects, they’re defined to point in the same direction regardless of the coordinate system we use to describe them. That said, to describe the same vector in different coordinate systems, we use the following transformation rule
\[\begin{equation} \label{eq:vector-rule} A^{\prime \mu} = \frac{\partial x^{\prime\mu}}{\partial x^\nu}A^\nu \end{equation}\]The \(A^\mu\) denote the components of the vector \(\boldsymbol{A}\) in some “original” coordinate system with coordinates \(x^i\) (say Cartesian coordinates \(x^1 = x, x^2 = y, x^3=z\)), and \(A^{\prime \mu}\) denotes the components of the vector described in the “new” coordinate system with coordinates \(x^{\prime i}\) (say spherical coordinates, \(x^{\prime 1}=r, x^{\prime 2}=\theta, x^{\prime 3}=\varphi\)). Note that in the above equation, we’ve used the Einstein summation convention, where repeated indices (in this case \(\nu\)) are summed over (i.e. there’s an implicit \(\sum_{\nu=1}^3 \cdots\) preceding the expression on the right-hand side), and we’ve written the transformation component-wise, which makes it look more mysterious than necessary. In matrix form, this transformation rule just says: \begin{equation} \boldsymbol{A}^\prime = [\boldsymbol{J}\boldsymbol{f}]\boldsymbol{A}\nonumber \end{equation} where \(\boldsymbol{f}:\hspace{1mm}\mathbb{R}^n\to \mathbb{R}^n\) is the transformation that expresses the change of coordinates, i.e. \(f^\mu(x) = x^{\prime \mu}\), and \([\boldsymbol{J}\boldsymbol{f}]\) denotes its (\(n\times n\)) Jacobian matrix. For the case of the Cartesian to spherical coordinate transformation, we have
\[\begin{equation} \boldsymbol{f}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \sqrt{x^2 + y^2 + z^2} \\ \cos^{-1}\left(\frac{z}{\sqrt{x^2 + y^2 + z^2}}\right) \\ \tan^{-1}\left(\frac{y}{x}\right) \end{pmatrix}\equiv \begin{pmatrix} r \\ \theta \\ \varphi \end{pmatrix}\nonumber \end{equation}\]and its Jacobian (written partially in terms of spherical coordinates, and partially in terms of Cartesian coordinates, for convenience) is given by
\[\begin{equation} [\boldsymbol{J}\boldsymbol{f}]=\begin{pmatrix} \frac{x}{r}& \frac{y}{r}& \frac{z}{r}\\ \frac{zx}{r^2\sqrt{x^2+y^2}} & \frac{zy}{r^2\sqrt{x^2+y^2}} & \frac{-(x^2+y^2)}{r^2\sqrt{x^2+y^2}}\\ -\frac{y}{x^2+y^2}& \frac{x}{x^2+y^2}& 0\\ \end{pmatrix}\nonumber \end{equation}\]then, we can represent an arbitrary vector \(\boldsymbol{A}\) with components \(x,y,z\) in spherical coordinates via
\[\nonumber \begin{equation} \begin{split} \boldsymbol{A}^\prime&=\begin{pmatrix} \frac{x}{r}& \frac{y}{r}& \frac{z}{r}\\ \frac{zx}{r^2\sqrt{x^2+y^2}} & \frac{zy}{r^2\sqrt{x^2+y^2}} & \frac{-(x^2+y^2)}{r^2\sqrt{x^2+y^2}}\\ -\frac{y}{x^2+y^2}& \frac{x}{x^2+y^2}& 0\\ \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}\\ &=\begin{pmatrix} \frac{x^2 + y^2 + z^2}{r} \\ \left(\frac{zx}{r^2\sqrt{x^2+y^2}}\right)x+\left(\frac{zy}{r^2\sqrt{x^2+y^2}}\right)y+\left(\frac{-(x^2+y^2)}{r^2\sqrt{x^2+y^2}}\right)z \\ \left(-\frac{y}{x^2+y^2}\right)x+\left(\frac{x}{x^2+y^2}\right)y \end{pmatrix}\\ &= \begin{pmatrix} r\\ \frac{z(x^2 + y^2) - z(x^2 + y^2)}{r^2\sqrt{x^2+y^2}}\\ \frac{-yx + xy}{x^2+y^2} \end{pmatrix}\\ &= \begin{pmatrix} r\\ 0\\ 0 \end{pmatrix} \end{split} \end{equation}\]A well known result. The position vector can be represented with just one component in spherical coordinates, because the basis vector \(\hat{\textbf{r}}\) changes with spatial location, and is defined conveniently as \begin{equation} \hat{\textbf{r}} = \frac{1}{r}\left(x\hat{\textbf{x}} + y\hat{\textbf{y}} + z\hat{\textbf{z}}\right)\nonumber \end{equation} so it’s not surprising that the position vector is expressible as \(r\hat{\textbf{r}}\).
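If you’d rather not grind through that algebra by hand, it’s easy to check with a computer algebra system. Here’s a minimal sympy sketch (the variable names are my own): it builds the transformation \(\boldsymbol{f}\), computes its Jacobian, and applies it to the position vector.

```python
import sympy as sp

x, y, z = sp.symbols('x y z', positive=True)
r = sp.sqrt(x**2 + y**2 + z**2)

# The Cartesian -> spherical map f = (r, theta, phi)
f = sp.Matrix([r, sp.acos(z / r), sp.atan(y / x)])

J = f.jacobian([x, y, z])   # the Jacobian matrix [Jf]
A = sp.Matrix([x, y, z])    # components of the position vector

print(sp.simplify(J * A))   # -> Matrix([[sqrt(x**2 + y**2 + z**2)], [0], [0]]), i.e. (r, 0, 0)
```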
The important takeaway is that there are objects that represent real physical things irrespective of the coordinate system, and for them to do this, their components must transform in a specific way. This transformation rule is so important that it actually defines what a tensor is. Any object whose components obey the transformation rule in Eq. \ref{eq:vector-rule} is called a vector, or a rank 1 contravariant tensor (so named because its components transform contrary to the basis vectors, which, as we’ll see, pick up the inverse Jacobian - kind of unfortunate that it’s not defined the other way around), and its components are denoted with “upstairs indices”.
Covectors
There are also vector-like objects that transform in the inverse way
\[\begin{equation} \label{eq:covector-rule} A^{\prime}_{\mu} = \frac{\partial x^{\nu}}{\partial x^{\prime \mu}}A_\nu \end{equation}\]or in matrix form
\[\nonumber \begin{equation} \begin{split} \boldsymbol{A}^\prime &= \left([\boldsymbol{J}\boldsymbol{f}]^{-1}\right)^\top\boldsymbol{A}\\ &= [\boldsymbol{J}\boldsymbol{f}^{-1}]^\top\boldsymbol{A} \end{split} \end{equation}\]One example of such an object is a partial derivative (as strange as that may sound). The chain rule says
\[\nonumber \begin{equation} \begin{split} \frac{\partial}{\partial x^{\prime \mu}}&= \frac{\partial x^{\nu}}{\partial x^{\prime \mu}}\frac{\partial}{\partial x^{ \nu}} \end{split} \end{equation}\]which is evidently the transformation rule in Eq. \ref{eq:covector-rule}. Interestingly, all basis vectors also obey this transformation rule. The reason why is a bit involved, but basically, basis vectors are partial derivatives. As a nice example, consider the position vector, defined as \begin{equation} \textbf{r}=r\cos\varphi \sin \theta \hat{\textbf{x}} + r\sin \varphi \sin \theta \hat{\textbf{y}} + r\cos\theta \hat{\textbf{z}}\nonumber \end{equation} from this position vector, the (not necessarily normalized) basis vectors \(\textbf{e}_r\), \(\textbf{e}_\theta\), \(\textbf{e}_\varphi\) can be defined as
\[\nonumber \begin{equation} \begin{split} \textbf{e}_r&=\frac{\partial \textbf{r}}{\partial r}\\ &=\frac{\partial }{\partial r}\left(r\cos\varphi \sin \theta \hat{\textbf{x}} + r\sin \varphi \sin \theta \hat{\textbf{y}} + r\cos\theta \hat{\textbf{z}}\right)\\ &=\cos\varphi \sin \theta \hat{\textbf{x}} + \sin \varphi \sin \theta \hat{\textbf{y}} +\cos\theta \hat{\textbf{z}}\\ \textbf{e}_\theta&=\frac{\partial \textbf{r}}{\partial \theta}\\ &= \frac{\partial }{\partial \theta}\left(r\cos\varphi \sin \theta \hat{\textbf{x}} + r\sin \varphi \sin \theta \hat{\textbf{y}} + r\cos\theta \hat{\textbf{z}}\right)\\ &=r\cos\varphi \cos \theta \hat{\textbf{x}} + r\sin \varphi \cos \theta \hat{\textbf{y}} - r\sin\theta \hat{\textbf{z}}\\ \textbf{e}_\varphi&=\frac{\partial \textbf{r}}{\partial \varphi}\\ &= \frac{\partial }{\partial \varphi}\left(r\cos\varphi \sin \theta \hat{\textbf{x}} + r\sin \varphi \sin \theta \hat{\textbf{y}} + r\cos\theta \hat{\textbf{z}}\right)\\ &=-r\sin\varphi \sin \theta \hat{\textbf{x}} + r\cos \varphi \sin \theta \hat{\textbf{y}} \end{split} \end{equation}\]which (when properly normalized) agree with the textbook definitions. In this way the unit vectors are partial derivatives of the position vector with respect to the coordinate directions, which explains why they transform using the covariant transformation rule (Eq. \ref{eq:covector-rule}).
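This is another calculation that’s pleasant to hand off to sympy. A quick sketch (again, the names are mine): differentiate the position vector with respect to each spherical coordinate, then normalize.

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta varphi', positive=True)

# Position vector in the Cartesian basis, parameterized by spherical coordinates
pos = sp.Matrix([
    r * sp.cos(phi) * sp.sin(theta),  # x
    r * sp.sin(phi) * sp.sin(theta),  # y
    r * sp.cos(theta),                # z
])

for q in (r, theta, phi):
    e = pos.diff(q)              # basis vector e_q = d(pos)/dq
    mag = sp.sqrt(e.dot(e))      # its magnitude
    print(sp.simplify(mag), sp.simplify(e / mag).T)
# magnitudes: 1, r, r*sin(theta); the normalized vectors are the textbook unit vectors
```

The magnitudes \(1\), \(r\), \(r\sin\theta\) will show up again shortly as the Lamé coefficients.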
Covariance of Vectors
Now we have everything we need to prove that vectors are real objects that transcend their description in a given coordinate system. We may express an arbitrary vector \(\boldsymbol{v}\) as \begin{equation} \boldsymbol{v} = v^\mu \hat{\textbf{e}}_{\mu}\nonumber \end{equation} which implies that, under an arbitrary smooth change of coordinates, the resulting vector \(\boldsymbol{v}'\) is
\[\nonumber \begin{equation} \begin{split} \boldsymbol{v}' &= v^{\prime \mu} \hat{\textbf{e}}^\prime_{\mu}\\ &=\left(\frac{\partial x^{\prime\mu}}{\partial x^{\nu}}v^{ \nu}\right)\left( \frac{\partial x^{\sigma}}{\partial x^{\prime \mu}}\hat{\textbf{e}}_{\sigma}\right)\\ &=\delta_\nu^\sigma v^{ \nu}\hat{\textbf{e}}_{\sigma}\hspace{5mm}\text{By the chain rule}\\ &=v^{\sigma}\hat{\textbf{e}}_{\sigma}\\ &=\boldsymbol{v} \end{split} \end{equation}\]which makes sense, since contravariant vectors transform with the Jacobian matrix, and covariant vectors (or covectors) with its inverse, which cancel out. This is what we mean when we say that vectors are coordinate independent. It turns out that this concept of coordinate independent objects generalizes to objects that can be written as matrices as well (and even some that can’t).
Higher Rank Tensors
The tensors we discussed in the previous section were rank 1, and there are also rank 0 tensors, which correspond to scalars and are trivially coordinate independent (they’re just numbers after all). Beyond that, there are rank 2 tensors, which, for the longest time, I just thought were ordinary matrices. That’s not the case. While rank 2 tensors can be expressed as matrices (i.e. they have a square array of components), matrices don’t have any special transformation rules under coordinate transformations, and so it doesn’t make sense to call just any matrix a tensor. Note, although matrices change form when expressed in a different basis via a similarity transformation, they don’t care about whether the coordinates are Cartesian, curvilinear, etc. For a given “matrix” to transform in a specific way under coordinate transformations, it must be related to the coordinates.
The Inertia Tensor
The simplest example of a rank 2 tensor is the inertia tensor, whose components are given by
\[\nonumber \begin{equation} \begin{split} I^{ij}&=\int [\delta^{ij}(\boldsymbol{r}\cdot \boldsymbol{r}) - x^i x^j]dm \end{split} \end{equation}\]now how do the components of this object transform under coordinate transformations?
\[\nonumber \begin{equation} \begin{split} I^{\prime ij}&= \int [\delta^{\prime ij}(\boldsymbol{r}\cdot \boldsymbol{r}) - x^{\prime i} x^{\prime j}]dm\\ &= \int \left[\left(\frac{\partial x^{\prime i}}{\partial x^k}\frac{\partial x^{\prime j}}{\partial x^\ell}\delta^{k \ell}\right)(\boldsymbol{r}\cdot \boldsymbol{r}) - \left(\frac{\partial x^{\prime i}}{\partial x^k}\frac{\partial x^{\prime j}}{\partial x^\ell} x^{ k} x^{ \ell}\right)\right]dm\\ &= \left(\frac{\partial x^{\prime i}}{\partial x^k}\frac{\partial x^{\prime j}}{\partial x^\ell}\right)\int [\delta^{ k\ell}(\boldsymbol{r}\cdot \boldsymbol{r}) - x^{ k} x^{\ell}]dm\\ &= \frac{\partial x^{\prime i}}{\partial x^k}\frac{\partial x^{\prime j}}{\partial x^\ell} I^{k\ell} \end{split} \end{equation}\]where we have (somewhat circularly) used the fact that \begin{equation} \delta^{\prime ij} = \frac{\partial x^{\prime i}}{\partial x^k}\frac{\partial x^{\prime j}}{\partial x^\ell}\delta^{k \ell}\nonumber \end{equation} and the fact that, since \(\boldsymbol{r}\cdot \boldsymbol{r}\) is a scalar, it doesn’t change under coordinate transformations (despite being a function of the vector components, which do change under coordinate transformations). An intuitive way to think about this is that, if vectors represent “real” objects in space, then their norm (squared) shouldn’t depend on which coordinate system we choose to express them in. Interestingly, this new transformation rule is just the product of two of the contravariant transformation rules from Eq. \ref{eq:vector-rule}, so we call the tensor defined by it a (2,0) tensor, since it transforms like a product of 2 contravariant vectors and 0 covariant vectors. Generalizing this, an \((i,j)\) tensor of rank \(i+j\) (whatever that means) must transform like a product of \(i\) contravariant vectors and \(j\) covariant vectors.
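As a quick numerical sanity check (my own toy setup, not from any particular text): for a linear change of coordinates \(x^\prime = Rx\), the Jacobian is just \(R\), so the rule above reduces to \(\boldsymbol{I}^\prime = R\boldsymbol{I}R^\top\). We can verify this with a cloud of point masses.

```python
import numpy as np

rng = np.random.default_rng(0)
masses = rng.random(5)              # five point masses
pts = rng.standard_normal((5, 3))   # their positions

def inertia(ms, rs):
    """I^{ij} = sum_a m_a [delta^{ij} (r_a . r_a) - x_a^i x_a^j]."""
    I = np.zeros((3, 3))
    for m, rv in zip(ms, rs):
        I += m * (np.eye(3) * (rv @ rv) - np.outer(rv, rv))
    return I

# A rotation about z by 30 degrees (an orthogonal change of coordinates)
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

I_recomputed  = inertia(masses, pts @ R.T)       # inertia tensor recomputed in new coordinates
I_transformed = R @ inertia(masses, pts) @ R.T   # tensor transformation rule

assert np.allclose(I_recomputed, I_transformed)
```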
A good example of a tensor of mixed character is the good old Kronecker delta \(\delta^i_{j}\), a (1,1) rank 2 tensor. The transformation rule seems to imply that
\[\nonumber \begin{equation} \delta^{\prime i}_{j}= \frac{\partial x^{\prime i}}{\partial x^k}\frac{\partial x^\ell}{\partial x^{\prime j}}\delta^k_\ell \end{equation}\]Now, we can actually use \(\delta^k_\ell\) to cancel out the \(\ell\) index, which gives us
\[\nonumber \begin{equation} \delta^{\prime i}_{j}= \frac{\partial x^{\prime i}}{\partial x^k}\frac{\partial x^k}{\partial x^{\prime j}} \end{equation}\]If we write this out for the cartesian to spherical conversion, we get, for, say \(i=1, j=2\)
\[\nonumber \begin{equation} \begin{split} \delta^{\prime 1}_{2}&= \frac{\partial r}{\partial x}\frac{\partial x}{\partial \theta} + \frac{\partial r}{\partial y}\frac{\partial y}{\partial \theta} + \frac{\partial r}{\partial z}\frac{\partial z}{\partial \theta}\\ &=\frac{\partial r}{\partial \theta}\hspace{5mm}\text{By the chain rule}\\ &= 0 \end{split} \end{equation}\]we see that for \(i=1, j=1\), we have
\[\nonumber \begin{equation} \delta^{\prime 1}_{1}= \frac{\partial r}{\partial r}=1 \end{equation}\]and in general, \(\delta^{\prime i}_{j} = 0\) if \(i\neq j\), and \(\delta^{\prime i}_{j}=1\) if \(i=j\). So the Kronecker delta is defined the same way in all coordinate systems (the object that picks out indices with \(i=j\)) and transforms as a (1,1) rank 2 tensor.
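In hindsight this is almost a tautology: the two chain-rule factors are just the Jacobian and its matrix inverse. A two-line sympy check, if you’re skeptical (same toy setup as before):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', positive=True)
f = sp.Matrix([sp.sqrt(x**2 + y**2 + z**2),
               sp.acos(z / sp.sqrt(x**2 + y**2 + z**2)),
               sp.atan(y / x)])

J = f.jacobian([x, y, z])        # entries: dx'^i / dx^k
# dx^k / dx'^j is the inverse Jacobian, so the transformed delta is J J^{-1}
print(sp.simplify(J * J.inv()))  # -> eye(3), in any coordinate system
```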
The Metric Tensor
But wait a second: we wrote \(\delta^{\prime i}_{j}\) with one upstairs and one downstairs index (corresponding to the contra- and covariantness of the index), but in the inertia tensor, we wrote \(\delta^{\prime ij}\), what’s the deal with that? Is there some way to relate \(\delta^{\prime i}_{j}\) to \(\delta^{\prime ij}\)? Literally, we’re just raising one of the indices, but what does this correspond to mathematically? While we’re at it, if there is such a relation, then what about a relation between \(x^\mu\) and \(x_\mu\) (i.e. vectors and covectors)? The answer to these questions is the metric tensor. This tensor (\(g_{\mu\nu}\)) gets its name from the fact that it is used to define distance
\[\nonumber \begin{equation} \begin{split} ds^2&= g_{\mu\nu}dx^\mu dx^\nu\\ &=dx_\nu dx^\nu\\ \implies dx_\nu&= g_{\mu\nu}dx^\mu \end{split} \end{equation}\]the motivation for this definition is out of scope, but just know that it’s used for lowering indices. How do we know the coefficients of this metric tensor? In Cartesian coordinates, \(g_{\mu\nu}\) is just the identity matrix, and we can find its components in any other coordinate system by using the fact that it’s a (0,2) tensor. That’s good for lowering indices, but what about raising them? For that, we define the inverse of the metric tensor \(g^{\mu\nu}\) so that
\[\nonumber \begin{equation} \begin{split} g^{\mu\lambda}g_{\lambda\nu}&= \delta^\mu_{\hspace{1mm}\nu}\nonumber \end{split} \end{equation}\]Now we multiply both sides by \(x^\nu\)
\[\nonumber \begin{equation} \begin{split} g^{\mu\lambda}g_{\lambda\nu}x^\nu&= \delta^\mu_{\hspace{1mm}\nu}x^\nu\nonumber\\ g^{\mu\lambda}x_\lambda&= x^\mu \end{split} \end{equation}\]in other words, \(g^{\mu\nu}\) (the inverse of the metric) is used for raising indices. In Euclidean space, for orthogonal coordinate systems (e.g. Cartesian, spherical, cylindrical), the metric tensor is diagonal, \(g_{\mu\nu} = (h_\mu)^2 \delta_{\mu\nu}\) (no sum), where the \(h_\mu\) are known as the Lamé coefficients.
Relation Between Contravariant, Covariant, and “Ordinary” Vectors
Note: in this section, we suppress the summation convention. After all of this, it’s necessary to clarify how the contravariant and covariant vector components are related to the “ordinary” vectors that we’re used to. Ordinary vectors are written as (where the subscripts on the components don’t mean that they’re covariant components; it’s just a plain old subscript)
\[\nonumber \begin{equation} \begin{split} \boldsymbol{A}&= \sum_i \tilde{A}_i \hat{\textbf{e}}_{x^i} \end{split} \end{equation}\]Now, if we want the familiar dot product definition with ordinary vectors \(\boldsymbol{A}\), \(\boldsymbol{B}\) to give the same result as the scalar product in generalized coordinates, we need
\[\nonumber \begin{equation} \begin{split} \sum_{\mu \nu}g_{\mu\nu}A^\mu B^\nu &= (h_1)^2A^1B^1 + (h_2)^2A^2B^2 + (h_3)^2 A^3B^3\\ &= \tilde{A}_1\tilde{B}_1 + \tilde{A}_2\tilde{B}_2 + \tilde{A}_3\tilde{B}_3 \end{split} \end{equation}\]which requires that an ordinary vector component is related to the contravariant component via
\[\nonumber \begin{equation} \begin{split} A^\mu&= \frac{\tilde{A}_\mu}{h_\mu} \end{split} \end{equation}\]and the covariant components follow from an application of the metric tensor
\[\nonumber \begin{equation} \begin{split} A_\mu &= \sum_\nu g_{\mu\nu}A^\nu\\ &= \sum_\nu (h_\mu)^2 \delta^\nu_{\mu}\frac{\tilde{A}_\nu}{h_\nu}\\ &= (h_\mu)^2 \frac{\tilde{A}_\mu}{h_\mu}\\ &= h_\mu\tilde{A}_\mu \end{split} \end{equation}\]
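A tiny numerical sketch of these relations in spherical coordinates (the point and components are my own made-up values): the contravariant and covariant components differ from the physical ones by factors of the Lamé coefficients, but the scalar product comes out the same either way.

```python
import numpy as np

r, theta = 2.0, np.pi / 4                  # some point (phi doesn't enter the h's)
h = np.array([1.0, r, r * np.sin(theta)])  # Lamé coefficients (h_r, h_theta, h_phi)

A_phys = np.array([3.0, -1.0, 2.0])  # ordinary ("physical") components A-tilde
A_up = A_phys / h                    # contravariant components A^mu
A_down = h * A_phys                  # covariant components A_mu

# g_{mu nu} = h_mu^2 delta_{mu nu}, so A_mu A^mu should equal the ordinary dot product
assert np.isclose(A_down @ A_up, A_phys @ A_phys)
```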
Derivatives of Tensors
Now, we can also think of taking derivatives of tensors, and how to do that in a coordinate transcending way so that the derivative of a vector (say) with respect to a scalar (like time) points in an objective direction in space that doesn’t depend on the choice of coordinates. Another motivating example might be trying to define the gradient of a scalar field \(\phi\) in a covariant manner. In multivariable calculus, the definition is always given in Cartesian coordinates
\[\nonumber \begin{equation} \begin{split} \nabla \phi &= \frac{\partial \phi}{\partial x}\hat{\textbf{x}} + \frac{\partial \phi}{\partial y}\hat{\textbf{y}} + \frac{\partial \phi}{\partial z}\hat{\textbf{z}} \end{split} \end{equation}\]which might lead the unsuspecting student to think we define the gradient in spherical coordinates as
\[\nonumber \begin{equation} \begin{split} \nabla \phi &= \frac{\partial \phi}{\partial r}\hat{\textbf{r}} + \frac{\partial \phi}{\partial \theta}\hat{\boldsymbol{\theta}} + \frac{\partial \phi}{\partial \varphi}\hat{\boldsymbol{\varphi}} \end{split} \end{equation}\]which is not correct. For some reason, we instead write
\[\nonumber \begin{equation} \begin{split} \nabla \phi &= \frac{\partial \phi}{\partial r}\hat{\textbf{r}} + \frac{1}{r}\frac{\partial \phi}{\partial \theta}\hat{\boldsymbol{\theta}} + \frac{1}{r\sin\theta}\frac{\partial \phi}{\partial \varphi}\hat{\boldsymbol{\varphi}} \end{split} \end{equation}\]which ensures that the gradient of a scalar field \(\phi\) at a given point \(P\) points in the same direction in spherical and Cartesian coordinates. Now that we’re thinking of it, to ensure that the derivative of a given tensor (or scalar field, say) is coordinate independent, we just need to make sure that the result is a tensor! Ensuring that derivatives of all tensors are tensors is where the covariant derivative (so called because it’s a definition of the derivative that’s coordinate independent) comes along. For the gradient, the covariant definition is simple:
\[\nonumber \begin{equation} \begin{split} (\nabla \phi)_\mu &= \frac{1}{h_\mu}\partial_\mu \phi \hspace{5mm}\text{(no sum)} \end{split} \end{equation}\]because we’re talking about the derivative of a scalar field. We can also construct derivatives of vector fields, for instance the time derivative of the velocity field \(\boldsymbol{u}\).
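Here’s a concrete check of the gradient formula before moving on (my own toy example): take \(\phi = z = r\cos\theta\). The Cartesian gradient is just \(\hat{\textbf{z}}\), and \(\hat{\textbf{z}}\) decomposed into the spherical unit vectors is \(\cos\theta\,\hat{\textbf{r}} - \sin\theta\,\hat{\boldsymbol{\theta}}\) - exactly what the \(1/h_\mu\) factors produce.

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta varphi', positive=True)
field = r * sp.cos(theta)                     # the scalar field phi = z

h = {r: 1, theta: r, phi: r * sp.sin(theta)}  # Lamé coefficients

grad = [sp.simplify(sp.diff(field, q) / h[q]) for q in (r, theta, phi)]
print(grad)   # -> [cos(theta), -sin(theta), 0], i.e. z-hat in the spherical basis
```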
The Covariant Derivative
In general, we define such a scalar derivative of a vector component \(A^\lambda\) via
\[\nonumber \begin{equation} \begin{split} \frac{D A^\lambda}{D\tau}=\frac{dA^\lambda}{d\tau}+\Gamma^\lambda\hspace{1mm}_{\mu\nu}\frac{dx^\mu}{d\tau}A^\nu \end{split} \end{equation}\]where, instead of just the derivative term that we’d expect (the first term on the right-hand side), we have an additional term that involves the derivatives of the coordinates with respect to the scalar, and some \(\Gamma^\lambda\hspace{1mm}_{\mu\nu}\), called the “affine connection” or the Christoffel symbols. In Cartesian coordinates, the basis vectors don’t change from point to point, so the Christoffel symbols all vanish, the second term drops out, and we’re left with the usual
\[\nonumber \begin{equation} \begin{split} \frac{D \boldsymbol{u}}{D\tau}&=\frac{d\boldsymbol{u}}{d\tau}\\ &=\frac{\partial\boldsymbol{u}}{\partial \tau} \end{split} \end{equation}\]This simple reduction does not happen for more involved theories like General Relativity.
Covariant Derivatives with Respect to Coordinates
We may also think of taking derivatives of (contravariant) vector fields with respect to coordinates (maybe for defining the divergence, etc.), which is defined via
\[\nonumber \begin{equation} \begin{split} \frac{D A^\lambda}{Dx^\mu}=\partial_\mu A^\lambda+\Gamma^\lambda\hspace{1mm}_{\mu\nu}A^\nu \end{split} \end{equation}\]likewise, the derivative of a covariant vector component with respect to a coordinate is
\[\nonumber \begin{equation} \begin{split} \frac{D A_\lambda}{Dx^\mu}=\partial_\mu A_\lambda - \Gamma^\alpha\hspace{1mm}_{\lambda\mu}A_\alpha \end{split} \end{equation}\]Again, it’s not the naïve definition with only the partial derivatives, but also an extra term that ensures the result transforms as a tensor (in this case, a rank 2 tensor).
The Affine Connection
We’ve written down this “affine connection” term enough times, we’d better define it. The affine connection essentially bridges two coordinate systems under a coordinate transformation, and is defined via
\[\nonumber \begin{equation} \begin{split} \Gamma^\lambda\hspace{1mm}_{\mu\nu}=\frac{\partial x^\lambda}{\partial x^{\prime \rho}}\frac{\partial^2 x^{\prime \rho}}{\partial x^\mu \partial x^\nu} \end{split} \end{equation}\]which contains a factor of second derivatives of the new coordinates with respect to the old ones, and somehow quantifies curvature. If the second derivatives of the change of coordinates are zero, then the Christoffel symbols vanish, and the covariant derivative is just the partial derivative (neat), but even in Euclidean spaces, they can be nonzero, for example when transforming from Cartesian to spherical coordinates. Most often, we want to compute a derivative while working entirely in a single coordinate system, and the definition above inherently involves a transformation of coordinates. Thankfully, there’s a way to define the affine connection in terms of the metric tensor
\[\nonumber \begin{equation} \begin{split} \Gamma^\lambda\hspace{1mm}_{\mu\nu}=\frac{1}{2}g^{\lambda \rho}[\partial_\mu g_{\nu\rho} + \partial_\nu g_{\mu\rho} - \partial_\rho g_{\mu\nu}] \end{split} \end{equation}\]Note the only index that’s summed over is \(\rho\).
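As a sketch of how this formula gets used in practice (the index conventions and variable names are mine), here are the Christoffel symbols of spherical coordinates, computed from the metric alone:

```python
import sympy as sp

r, th, ph = sp.symbols('r theta varphi', positive=True)
q = [r, th, ph]
g = sp.diag(1, r**2, r**2 * sp.sin(th)**2)  # spherical metric (Lamé coefficients squared)
g_inv = g.inv()

def Gamma(lam, mu, nu):
    """Gamma^lam_{mu nu} = (1/2) g^{lam rho} [d_mu g_{nu rho} + d_nu g_{mu rho} - d_rho g_{mu nu}]."""
    return sp.simplify(sum(
        sp.Rational(1, 2) * g_inv[lam, rho] *
        (sp.diff(g[nu, rho], q[mu]) + sp.diff(g[mu, rho], q[nu]) - sp.diff(g[mu, nu], q[rho]))
        for rho in range(3)))

print(Gamma(0, 1, 1))  # Gamma^r_{theta theta} -> -r
print(Gamma(1, 0, 1))  # Gamma^theta_{r theta} -> 1/r
print(Gamma(0, 2, 2))  # Gamma^r_{phi phi}     -> -r*sin(theta)**2
```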
Covariant Derivatives of Higher-Order Tensors
We can likewise define covariant derivatives of higher order tensors of arbitrary character. For example, these demonstrate the general rule
\[\nonumber \begin{equation} \begin{split} \frac{D T^{\mu\nu}}{Dx^\beta}&= \partial_\beta T^{\mu\nu} + \Gamma^\mu\hspace{1mm}_{\alpha\beta}T^{\alpha\nu} + \Gamma^\nu\hspace{1mm}_{\alpha\beta}T^{\mu\alpha}\\ \frac{D T^{\mu\nu}\hspace{1mm}_\sigma}{Dx^\beta}&= \partial_\beta T^{\mu\nu}\hspace{1mm}_\sigma + \Gamma^\mu\hspace{1mm}_{\alpha\beta}T^{\alpha\nu}\hspace{1mm}_\sigma + \Gamma^\nu\hspace{1mm}_{\alpha\beta}T^{\mu\alpha}\hspace{1mm}_\sigma - \Gamma^\alpha\hspace{1mm}_{\sigma\beta}T^{\mu\nu}\hspace{1mm}_\alpha \end{split} \end{equation}\]in general, you get one “\(+\Gamma\)” term for each contravariant index, and a “\(-\Gamma\)” for each covariant index.
Covariant Definitions of Grad, Div, Curl, and all that
Now that we have these covariant derivatives in hand, we can write down the covariant definitions of our beloved differential operators from vector calculus.
The Covariant Divergence
The covariant definition of divergence reads
\[\nonumber \begin{equation} \begin{split} (\nabla\cdot \boldsymbol{A})&= D_\mu A^\mu\\ &= \partial_\mu A^\mu + \Gamma^\mu\hspace{1mm}_{\nu\mu}A^\nu \end{split} \end{equation}\]that’s cool, but involves computing the Christoffel symbols, which can be a hassle. Thanks to the patience of our differential geometry predecessors, we have the following simplified formula written in terms of the metric
\[\nonumber \begin{equation} \begin{split} D_\mu A^\mu&= \frac{1}{\sqrt{\left|g\right|}}\partial_\mu(\sqrt{\left|g\right|}A^\mu) \end{split} \end{equation}\]where \(\sqrt{\lvert g\rvert}\) denotes the square root of the absolute value of the determinant of the metric, which is conveniently related to the determinant of the Jacobian matrix (under a coordinate transformation) via
\[\nonumber \begin{equation} \begin{split} \det{[\boldsymbol{J}\boldsymbol{f}]}&= \sqrt{\frac{\left|g\right|}{\left|g^\prime\right|}} \end{split} \end{equation}\]and is also used to transform the differentials in integrals under a change of coordinates via
\[\nonumber \begin{equation} \begin{split} \sqrt{\left|g'\right|}dx^\prime dy^\prime = \sqrt{\left|g\right|}dxdy \end{split} \end{equation}\]in case you ever run across this strange notation again.
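To see the \(\sqrt{\lvert g\rvert}\) formula earn its keep, here’s a sympy check (my own setup) that it reproduces the textbook divergence in spherical coordinates, once the contravariant components are written in terms of the physical ones:

```python
import sympy as sp

r, th, ph = sp.symbols('r theta varphi', positive=True)
Ar = sp.Function('A_r')(r, th, ph)      # physical (ordinary) components
At = sp.Function('A_theta')(r, th, ph)
Ap = sp.Function('A_phi')(r, th, ph)

sqrt_g = r**2 * sp.sin(th)                  # sqrt(|g|) in spherical coordinates
A_up = [Ar, At / r, Ap / (r * sp.sin(th))]  # contravariant components A^mu = A-tilde_mu / h_mu

div = sum(sp.diff(sqrt_g * a, q) for a, q in zip(A_up, (r, th, ph))) / sqrt_g

textbook = (sp.diff(r**2 * Ar, r) / r**2
            + sp.diff(sp.sin(th) * At, th) / (r * sp.sin(th))
            + sp.diff(Ap, ph) / (r * sp.sin(th)))

assert sp.simplify(div - textbook) == 0
```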
The Covariant Curl
The curl of a contravariant vector field is defined in a coordinate transcending way via
\[\nonumber \begin{equation} \begin{split} (\nabla \times \boldsymbol{A})_\mu = \varepsilon_{\mu\nu\lambda}D^\nu A^\lambda \end{split} \end{equation}\]where \(\varepsilon_{\mu\nu\lambda}=\sqrt{\lvert g\rvert}\,\epsilon_{\mu\nu\lambda}\) is the Levi-Civita tensor (the ordinary permutation symbol \(\epsilon_{\mu\nu\lambda}\) weighted by \(\sqrt{\lvert g\rvert}\), which is what makes it a genuine tensor). The result notably transforms as a covariant vector. And for a covariant vector field
\[\nonumber \begin{equation} \begin{split} (\nabla \times \boldsymbol{A})^\mu = \varepsilon^{\mu\nu\lambda}D_\nu A_\lambda \end{split} \end{equation}\]the result of which transforms as a contravariant vector.
The Covariant Laplacian
Probably the most widely used operator in physics: the covariant definition of the Laplacian of a scalar field is
\[\nonumber \begin{equation} \begin{split} \nabla^2 \phi &= D_\mu D^\mu \phi\\ &= \frac{1}{\sqrt{\lvert g \rvert}}\partial_\mu (\sqrt{\lvert g \rvert} \partial^\mu \phi) \end{split} \end{equation}\]As an example, we can use this covariant definition to write down the Laplacian (of a scalar field) in spherical coordinates in just a few lines (rather than the few pages it took when I did this for the first time in a homework problem, before learning about tensor calculus). First off, we need to compute the metric tensor in spherical coordinates. Recall the transformation rule
\[\nonumber \begin{equation} \begin{split} g_{\mu\nu}' &= \frac{\partial x^\rho}{\partial x^{\prime \mu}}\frac{\partial x^\sigma}{\partial x^{\prime \nu}}g_{\rho\sigma}\\ &= [\boldsymbol{J}\boldsymbol{f}]^{-1}_{\rho\mu}g_{\rho\sigma}[\boldsymbol{J}\boldsymbol{f}]^{-1}_{\sigma\nu}\\ &=\left([\boldsymbol{J}\boldsymbol{f}]^{-1}\right)^\top_{\mu\rho}g_{\rho\sigma}[\boldsymbol{J}\boldsymbol{f}]^{-1}_{\sigma\nu}\\ \implies \boldsymbol{G}^\prime &= \left([\boldsymbol{J}\boldsymbol{f}]^{-1}\right)^\top\boldsymbol{G}[\boldsymbol{J}\boldsymbol{f}]^{-1} \end{split} \end{equation}\]Now, since we know that \(\boldsymbol{G} = \boldsymbol{I}\) in Cartesian coordinates, we only need to compute the inverse Jacobian from Cartesian to spherical coordinates, this time purely in terms of spherical coordinates. After computing a bunch of partial derivatives, we get
\[\nonumber \begin{equation} \begin{split} [\boldsymbol{J}\boldsymbol{f}]^{-1}&= \begin{pmatrix} \cos\varphi\sin\theta & r\cos\varphi\cos\theta & -r\sin\varphi\sin\theta\\ \sin\varphi\sin\theta & r\sin\varphi\cos\theta & r\cos\varphi\sin\theta\\ \cos\theta & -r\sin\theta & 0 \end{pmatrix} \end{split} \end{equation}\]from which we can compute the metric
\[\nonumber \begin{equation} \begin{split} \boldsymbol{G}^\prime&= \begin{pmatrix} \cos\varphi\sin\theta & r\cos\varphi\cos\theta & -r\sin\varphi\sin\theta\\ \sin\varphi\sin\theta & r\sin\varphi\cos\theta & r\cos\varphi\sin\theta\\ \cos\theta & -r\sin\theta & 0 \end{pmatrix}^\top \begin{pmatrix} \cos\varphi\sin\theta & r\cos\varphi\cos\theta & -r\sin\varphi\sin\theta\\ \sin\varphi\sin\theta & r\sin\varphi\cos\theta & r\cos\varphi\sin\theta\\ \cos\theta & -r\sin\theta & 0 \end{pmatrix}\\ &=\begin{pmatrix} 1 & 0 & 0\\ 0 & r^2 & 0\\ 0 & 0 & r^2\sin^2\theta \end{pmatrix}\\ \implies \left(\boldsymbol{G}^\prime\right)^{-1}&=\begin{pmatrix} 1 & 0 & 0\\ 0 & \frac{1}{r^2} & 0\\ 0 & 0 & \frac{1}{r^2\sin^2\theta} \end{pmatrix} \end{split} \end{equation}\]Now, we can write down the Laplacian in spherical coordinates
\[\nonumber \begin{equation} \begin{split} \nabla^2\phi &= \frac{1}{r^2\sin\theta}\left[\partial_r(r^2\sin\theta \partial^r\phi) + \partial_\theta(r^2\sin\theta \partial^\theta \phi) + \partial_\varphi(r^2\sin\theta \partial^\varphi\phi)\right]\\ &= \frac{1}{r^2}\partial_r(r^2\partial^r \phi) + \frac{1}{\sin\theta}\partial_\theta(\sin\theta\partial^\theta \phi) + \partial_\varphi\partial^\varphi \phi \end{split} \end{equation}\]Hold on a second… this doesn’t look like the good old Laplacian in spherical coordinates. Guess this was all for nothing. Not quite. We still have to relate the \(\partial^\mu\) terms to their duals: \(\partial_\mu\) via the metric
\[\nonumber \begin{equation} \begin{split} \partial^\mu &= g^{\mu\nu}\partial_\nu\\ \implies \partial^r &= g^{r\nu}\partial_\nu\\ &= \frac{1}{(h_r)^2}\partial_r\\ &= \partial_r\\ \implies \partial^\mu &= \frac{1}{(h_\mu)^2}\partial_\mu\hspace{5mm}\text{(no sum)}\\ \partial^\theta &= \frac{1}{r^2}\partial_\theta\\ \partial^\varphi &= \frac{1}{r^2\sin^2\theta}\partial_\varphi \end{split} \end{equation}\]substituting these into the expression for the Laplacian, we get
\[\nonumber \begin{equation} \begin{split} \nabla^2\phi &= \frac{1}{r^2}\partial_r(r^2\partial_r \phi) + \frac{1}{r^2\sin\theta}\partial_\theta(\sin\theta\partial_\theta \phi) + \frac{1}{r^2\sin^2\theta}\partial^2_\varphi\phi \end{split} \end{equation}\]in accordance with the standard definition here.
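The whole pipeline - inverse Jacobian to metric to Laplacian - fits in a short sympy script, in case you’d like to see it done without any hand algebra (a sketch; the setup is mine):

```python
import sympy as sp

r, th, ph = sp.symbols('r theta varphi', positive=True)
q = [r, th, ph]
T = sp.Function('phi')(r, th, ph)  # a generic scalar field

# Inverse Jacobian from the section above; G' = (Jinv)^T I (Jinv)
Jinv = sp.Matrix([
    [sp.cos(ph) * sp.sin(th), r * sp.cos(ph) * sp.cos(th), -r * sp.sin(ph) * sp.sin(th)],
    [sp.sin(ph) * sp.sin(th), r * sp.sin(ph) * sp.cos(th),  r * sp.cos(ph) * sp.sin(th)],
    [sp.cos(th),             -r * sp.sin(th),               0],
])
G = sp.simplify(Jinv.T * Jinv)   # -> diag(1, r**2, r**2*sin(theta)**2)
G_inv = G.inv()
sqrt_g = r**2 * sp.sin(th)       # sqrt(det G), taking 0 < theta < pi

# Covariant Laplacian: (1/sqrt|g|) d_mu (sqrt|g| g^{mu nu} d_nu phi)
lap = sum(sp.diff(sqrt_g * G_inv[m, m] * sp.diff(T, q[m]), q[m]) for m in range(3)) / sqrt_g

standard = (sp.diff(r**2 * sp.diff(T, r), r) / r**2
            + sp.diff(sp.sin(th) * sp.diff(T, th), th) / (r**2 * sp.sin(th))
            + sp.diff(T, ph, 2) / (r**2 * sp.sin(th)**2))

assert sp.simplify(lap - standard) == 0
```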
The Covariant Vector Laplacian
We can also write the Laplacian of a contravariant vector field (for example, the momentum diffusion term in the Navier-Stokes equation: \(\nu \nabla^2 \boldsymbol{u}\))
\[\nonumber \begin{equation} \begin{split} \nabla^2A^\mu &= D_\nu D^\nu A^\mu - R^\mu\hspace{1mm}_\lambda A^\lambda \end{split} \end{equation}\]where \(R^\mu\hspace{1mm}_\lambda\) is the Ricci curvature tensor, defined in terms of the Riemann tensor
\[\nonumber \begin{equation} \begin{split} R_{\mu\lambda}&= R^\nu\hspace{1mm}_{\mu\nu\lambda} \end{split} \end{equation}\](with the upstairs index restored by the inverse metric, \(R^\mu\hspace{1mm}_\lambda = g^{\mu\nu}R_{\nu\lambda}\)) where the Riemann tensor is given by
\[\nonumber \begin{equation} \begin{split} R^\rho\hspace{1mm}_{\sigma\mu\nu}&= \partial_\mu \Gamma^\rho\hspace{1mm}_{\nu\sigma} - \partial_\nu\Gamma^\rho\hspace{1mm}_{\mu\sigma} + \Gamma^\rho\hspace{1mm}_{\mu\lambda}\Gamma^\lambda\hspace{1mm}_{\nu\sigma} - \Gamma^\rho\hspace{1mm}_{\nu\lambda}\Gamma^\lambda\hspace{1mm}_{\mu\sigma} \end{split} \end{equation}\]Thankfully, even in curvilinear coordinates (and in fact in all Euclidean geometries) the Ricci tensor is zero, since flat space has no curvature no matter which coordinates you use.
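That claim is also checkable by brute force: flat space has a vanishing Riemann tensor in any coordinates. A sympy sketch for spherical coordinates (index conventions as in the text; it reuses the Christoffel helper from before):

```python
import sympy as sp

r, th, ph = sp.symbols('r theta varphi', positive=True)
q = [r, th, ph]
g = sp.diag(1, r**2, r**2 * sp.sin(th)**2)
g_inv = g.inv()

def Gam(l, m, n):
    """Gamma^l_{mn} from the metric."""
    return sum(sp.Rational(1, 2) * g_inv[l, p] *
               (sp.diff(g[n, p], q[m]) + sp.diff(g[m, p], q[n]) - sp.diff(g[m, n], q[p]))
               for p in range(3))

def Riem(rho, sig, mu, nu):
    """R^rho_{sigma mu nu} as defined above."""
    return sp.simplify(
        sp.diff(Gam(rho, nu, sig), q[mu]) - sp.diff(Gam(rho, mu, sig), q[nu])
        + sum(Gam(rho, mu, lam) * Gam(lam, nu, sig)
              - Gam(rho, nu, lam) * Gam(lam, mu, sig) for lam in range(3)))

# Every component vanishes: Euclidean space is flat, whatever the coordinates
assert all(Riem(a, b, c, d) == 0 for a in range(3) for b in range(3)
           for c in range(3) for d in range(3))
```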
The Covariant Directional Derivative
The covariant definition of directional derivative of a contravariant vector field along a contravariant direction vector is
\[\nonumber \begin{equation} \begin{split} (\boldsymbol{A}\cdot \nabla)B^\mu &= A^\nu D_\nu B^\mu\\ &= A^\nu(\partial_\nu B^\mu + \Gamma^\mu\hspace{1mm}_{\lambda\nu}B^\lambda) \end{split} \end{equation}\]where we see again that the first term (just the components of \(\boldsymbol{A}\) multiplying the respective partials of \(B^\mu\)) is the naïve definition, and is supplemented by a term proportional to the Christoffel symbols.
The Covariant Tensor Divergence
Finally, we can define the divergence of a (2,0) Tensor (whose result is a contravariant rank 1 tensor) via
\[\nonumber \begin{equation} \begin{split} (\nabla \cdot T)^\nu &= D_\mu T^{\mu\nu}\\ &=\partial_\mu T^{\mu\nu} + \Gamma^\mu\hspace{1mm}_{\mu\lambda}T^{\lambda\nu}+ \Gamma^\nu\hspace{1mm}_{\mu\alpha}T^{\mu\alpha} \end{split} \end{equation}\]which is useful for quantities like the stress tensor \(\boldsymbol{\sigma}\) in Continuum Mechanics.
The Principle of General Covariance
The principle of general covariance (in physics) says that the form of the physical laws should be invariant under arbitrary differentiable coordinate transformations. Coordinates don’t exist in the real world; they are merely constructs that we impose on space so that we can express our equations, and so the statement of any physical law should not depend on them. It sounds simple enough, but has wide-reaching consequences. While this principle doesn’t strictly require that all physical equations be formulated in terms of tensors, they’re a natural choice because they’re inherently objects that are invariant under coordinate transformations (though their components can and do change). Further, if you’re formulating equations in terms of tensors, they must all be of the same character, so that both sides of the equation transform the same way under coordinate transformations, and all of the transformation coefficients cancel out - leaving the same governing equation. This is why we went through so much trouble to give covariant definitions of differential operators - i.e. to ensure that they transform as tensors, so that we can formulate differential equations in a covariant manner. In other words, so we can write physical laws. Just to demonstrate the importance of this, consider the heat equation
\[\nonumber \begin{equation} \begin{split} \frac{\partial T}{\partial t}&= \alpha \nabla^2T \end{split} \end{equation}\]Now, it’s understood that \(\nabla^2T=D_\mu D^\mu T\), but if we, say forgot about this definition, and used the naïve one \(\nabla^2T= \partial^2_\mu T\), we would have
\[\nonumber \begin{equation} \begin{split} \frac{\partial T}{\partial t}&= \alpha \left(\frac{\partial^2T}{\partial x^2} + \frac{\partial^2T}{\partial y^2} + \frac{\partial^2T}{\partial z^2}\right) \end{split} \end{equation}\]in Cartesian coordinates, and
\[\nonumber \begin{equation} \begin{split} \frac{\partial T}{\partial t}&= \alpha \left(\frac{\partial^2T}{\partial r^2} + \frac{\partial^2T}{\partial \theta^2} + \frac{\partial^2T}{\partial \varphi^2}\right) \end{split} \end{equation}\]in spherical coordinates. Seems fine, right? Well… say you were to solve the equation in Cartesian coordinates and get \(T_c\), and solve it again in spherical coordinates, obtaining \(T_s\). If the physical law is any good, \(T_s\) and \(T_c\) better give the same spatial temperature distribution, just described in different coordinate systems. So, if you write \(T_c\) in spherical coordinates, you’d better have \(T_c=T_s\) (and vice versa), but in this case you don’t. Somehow the laws of physics depend on the coordinate system you’re working in - which is a purely artificial concept. That’s no law of physics. That’s why it’s so important to formulate our equations in a covariant way, and to have covariant definitions for differential operators. Learning this now, I find it a little bit frustrating that I was never given the covariant definition of things like the Laplacian in multivariable calculus (though I realize it’d be infeasible), and was just told “here’s how you define it in Cartesian coordinates… it’ll be different in other coordinate systems, but just trust us and look it up in this table”.
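If you’d like a concrete instance of the mismatch (my own minimal example): at steady state, \(T = 1/r\) solves the heat equation in the covariant formulation (it’s harmonic away from the origin), but the naïve spherical “Laplacian” says otherwise.

```python
import sympy as sp

r, th, ph = sp.symbols('r theta varphi', positive=True)
T = 1 / r   # harmonic away from the origin: the Cartesian Laplacian of 1/|r| is 0

naive = sp.diff(T, r, 2) + sp.diff(T, th, 2) + sp.diff(T, ph, 2)
covariant = sp.diff(r**2 * sp.diff(T, r), r) / r**2   # angular terms vanish for T = T(r)

print(sp.simplify(naive))       # -> 2/r**3 (not zero!)
print(sp.simplify(covariant))   # -> 0, agreeing with the Cartesian calculation
```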
Relation to Model Discovery
All this to say that, in model discovery: when you’re trying to construct terms to form a partial differential equation that describes your system, you’d better make sure that your equation is covariant, and all terms are tensors of the same character, otherwise your physical model is worthless.
Conclusion
I commend anyone who’s read this far. That was a (brief?) tour of tensor calculus and how it relates to the principle of general covariance. I honestly really enjoyed finally learning tensor calculus properly, and hope I’ll have the chance to really stress test my knowledge with Non-Euclidean spaces and maybe try my hand at some General Relativity.
References
Neuenschwander, D. E. (2014). Tensor calculus for physics (Vol. 422). Johns Hopkins University Press.