Tensor Calculus and the Second Order Chain Rule

Introduction

In the three years since taking multivariable calculus, I hadn’t really used the second order chain rule (for partial derivatives), and until taking my first graduate partial differential equations course, I hadn’t realized that I had been doing it wrong.

The Second Order Chain Rule

The first order chain rule is easy enough to write out (and it’s the basis for the covariant transformation rule I talked about in my last blog post)

\[\nonumber \begin{equation} \begin{split} \frac{\partial}{\partial x^{\prime \mu}}&= \frac{\partial x^\nu}{\partial x^{\prime \mu}}\frac{\partial}{\partial x^\nu}\nonumber\\ \partial_{x^{\prime \mu}}&= \frac{\partial x^\nu}{\partial x^{\prime \mu}}\partial_{x^\nu} \end{split} \end{equation}\]

Okay, that’s great. This just says that the partial derivative operators in the “new” coordinates are related to those in the “old” coordinates by the inverse of the Jacobian of the transformation \(\boldsymbol{f}:\hspace{1mm}\mathbb{R}^n\mapsto\mathbb{R}^n\), \(f^i(\boldsymbol{x})=x^{\prime i}\). To derive an analogous rule for the second order derivatives, we just apply another derivative operator to the transformation rule:

\[\nonumber \begin{equation} \begin{split} \partial_{x^{\prime \mu}}\partial_{x^{\prime \nu}}&=\partial_{x^{\prime \mu}}\left(\frac{\partial x^\sigma}{\partial x^{\prime \nu}} \partial_{x^\sigma}\right)\nonumber\\ &=\partial_{x^{\prime \mu}}\left(\frac{\partial x^\sigma}{\partial x^{\prime \nu}}\right)\partial_{x^\sigma} + \frac{\partial x^\sigma}{\partial x^{\prime \nu}}\partial_{x^{\prime \mu}}\partial_{x^{\sigma}}\\ &= \frac{\partial^2x^\sigma}{\partial x^{\prime \mu}\partial x^{\prime \nu}}\partial_{x^\sigma} + \frac{\partial x^\rho}{\partial x^{\prime\mu }}\frac{\partial x^\sigma}{\partial x^{\prime \nu}}\partial_{x^\rho}\partial_{x^\sigma} \end{split} \end{equation}\]

This time, for nonlinear coordinate transformations (those with nonvanishing second derivatives), the second order partials in the new coordinates depend on both the second order partials and the first order partials in the old coordinates (so this is no rank 2 tensor, as discussed in my last blog post). This is kind of a mess to calculate by hand, and for only two variables, the right hand side has 6 terms! In partial differential equations, it’s important to be able to easily predict how the coefficients of the various partial derivatives change under coordinate transformations - in fact, this is the key to classifying PDEs by their standard form - but from the transformation rule above, it’s unclear how the coefficients of the partial derivatives actually transform. Note that the above rule only shows how the partial derivative operators themselves transform, not their coefficients. As a simple example, for first order partials, we have

\[\begin{equation} \label{eq:example-transformation-rule} \begin{split} \begin{pmatrix} \frac{\partial }{\partial x}\\ \frac{\partial}{\partial y} \end{pmatrix}&=[\boldsymbol{J}\boldsymbol{f}]^\top\begin{pmatrix} \frac{\partial}{\partial x^\prime}\\ \frac{\partial}{\partial y^\prime} \end{pmatrix}\\ \begin{pmatrix} \frac{\partial }{\partial x}\\ \frac{\partial}{\partial y} \end{pmatrix}&=\begin{pmatrix} \frac{\partial x^\prime}{\partial x} & \frac{\partial y^\prime}{\partial x}\\ \frac{\partial x^\prime}{\partial y} & \frac{\partial y^\prime}{\partial y} \end{pmatrix}\begin{pmatrix} \frac{\partial}{\partial x^\prime}\\ \frac{\partial}{\partial y^\prime} \end{pmatrix} \end{split} \end{equation}\]
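(If you’d rather let a computer keep track of the bookkeeping, here’s a minimal sympy sketch of this operator rule, using a made-up invertible map \(x^\prime = x+y,\ y^\prime = x-y\) chosen purely for illustration.)

```python
import sympy as sp

x, y = sp.symbols('x y')
xp, yp = sp.symbols('xp yp')          # stand-ins for x', y'

# A made-up invertible map, purely for illustration: x' = x + y, y' = x - y,
# so x = (x' + y')/2 and y = (x' - y')/2.
F = sp.sin(x) * y**2                  # test function in the old coordinates
F_new = F.subs([(x, (xp + yp)/2), (y, (xp - yp)/2)])

# d/dx = (dx'/dx) d/dx' + (dy'/dx) d/dy', with dx'/dx = dy'/dx = 1 here
lhs = sp.diff(F, x)
rhs = (sp.diff(F_new, xp) + sp.diff(F_new, yp)).subs([(xp, x + y), (yp, x - y)])

assert sp.simplify(lhs - rhs) == 0    # the operator rule checks out
```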

Now, usually we have a term like

\[\nonumber \begin{equation} \begin{split} a\frac{\partial }{\partial x} + b\frac{\partial}{\partial y} \end{split} \end{equation}\]

in a given partial differential equation, and we’d like to see how the coefficients, \(\boldsymbol{A}\) (\(A_1 = a, A_2=b\)) transform under an arbitrary coordinate transformation. That is, we’d like to find some \(\boldsymbol{A}^\prime\) (\(A^\prime_1 = a^\prime, A^\prime_2=b^\prime\)) such that

\[\nonumber \begin{equation} \begin{split} a\frac{\partial }{\partial x} + b\frac{\partial}{\partial y} = a^\prime \frac{\partial }{\partial x^\prime} + b^\prime \frac{\partial}{\partial y^\prime} \end{split} \end{equation}\]

The previous transformation rule might seem to suggest that

\[\nonumber \begin{equation} \begin{split} \boldsymbol{A}&= [\boldsymbol{J}\boldsymbol{f}]^\top\boldsymbol{A}^\prime \end{split} \end{equation}\]

but that’s not quite right. Although the partial derivative operators themselves act like components of a rank 1 tensor, the expression:

\[\nonumber \begin{equation} \begin{split} a\frac{\partial }{\partial x} + b\frac{\partial}{\partial y} \end{split} \end{equation}\]

is a scalar, which suggests that we could find the proper transformation rule by taking contractions of the tensor in the old and new coordinates. To see why, note that

\[\nonumber \begin{equation} \begin{split} \begin{pmatrix} a & b \end{pmatrix}\begin{pmatrix} \frac{\partial }{\partial x}\\ \frac{\partial}{\partial y} \end{pmatrix}&=a\frac{\partial }{\partial x} + b\frac{\partial}{\partial y} \end{split} \end{equation}\]

So, the coefficients of the partial derivatives are just the row vector that left multiplies the vector of partial derivatives. Specifically, if we just take the transformation rule of Eq. \ref{eq:example-transformation-rule}, and left multiply it by a row vector, we will get our desired transformation rule:

\[\nonumber \begin{equation} \begin{split} \begin{pmatrix} a & b \end{pmatrix}\begin{pmatrix} \frac{\partial }{\partial x}\\ \frac{\partial}{\partial y} \end{pmatrix}&=\begin{pmatrix} a & b \end{pmatrix}\begin{pmatrix} \frac{\partial x^\prime}{\partial x} & \frac{\partial y^\prime}{\partial x}\\ \frac{\partial x^\prime}{\partial y} & \frac{\partial y^\prime}{\partial y} \end{pmatrix}\begin{pmatrix} \frac{\partial}{\partial x^\prime}\\ \frac{\partial}{\partial y^\prime} \end{pmatrix} \end{split} \end{equation}\]

which seems to suggest that

\[\nonumber \begin{equation} \begin{split} \begin{pmatrix} a^\prime & b^\prime \end{pmatrix}&=\begin{pmatrix} a & b \end{pmatrix}\begin{pmatrix} \frac{\partial x^\prime}{\partial x} & \frac{\partial y^\prime}{\partial x}\\ \frac{\partial x^\prime}{\partial y} & \frac{\partial y^\prime}{\partial y} \end{pmatrix}\\ \implies (\boldsymbol{A}^\prime)^\top &= \boldsymbol{A}^\top [\boldsymbol{J}\boldsymbol{f}]^\top\\ \implies \boldsymbol{A}^\prime &= [\boldsymbol{J}\boldsymbol{f}]\boldsymbol{A}\hspace{5mm}\text{By taking the transpose of both sides} \end{split} \end{equation}\]

So, interestingly, we’ve found that, while the partial derivative operators transform with the transpose of the Jacobian, the vector of their coefficients transforms with the Jacobian itself. This provides a convenient way to evaluate the coefficients of partial derivatives under an arbitrary change of coordinates (“convenient” in the sense that you can make a symbolic math tool like Mathematica do all of the work for you). That’s cool, but what’s the analog for the second derivative? The formula above for the first derivative is neat, but doesn’t save us much time, and doesn’t really give us much theoretical insight. However, such a transformation rule for the coefficients of the second order partials could tell us exactly what transformation \(\boldsymbol{f}\) would eliminate certain second order partial derivatives, thus transforming a given partial differential equation into its standard form.
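For instance, here’s a minimal sympy sketch that checks the rule \(\boldsymbol{A}^\prime = [\boldsymbol{J}\boldsymbol{f}]\boldsymbol{A}\), with a hypothetical nonlinear map \((x^\prime, y^\prime) = (x^2+y,\ y)\) standing in for \(\boldsymbol{f}\):

```python
import sympy as sp

x, y = sp.symbols('x y')
a, b = sp.symbols('a b')

F = sp.Function('F')(x, y)
f = sp.Matrix([x**2 + y, y])          # a hypothetical coordinate change (x', y')
J = f.jacobian([x, y])                # [Jf]

A = sp.Matrix([a, b])                 # coefficients of (d/dx, d/dy)
A_prime = J * A                       # the claimed rule: A' = [Jf] A

# grad_old = J^T grad_new (Eq. example-transformation-rule), so
# grad_new = J^{-T} grad_old; check a dF/dx + b dF/dy == a' dF/dx' + b' dF/dy'
grad_old = sp.Matrix([sp.diff(F, x), sp.diff(F, y)])
grad_new = J.T.inv() * grad_old

lhs = (A.T * grad_old)[0]
rhs = (A_prime.T * grad_new)[0]
assert sp.simplify(lhs - rhs) == 0
```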

Transformation Rule for the Coefficients of Second Order Partials

We use the same idea as before to find a transformation rule for the coefficients of second order partial derivatives: taking contractions of the transformation rule for the partial derivative operators. The analogous rule for the second order partial derivatives (this time with the old, unprimed operators expressed in terms of the new, primed ones) is given by

\[\nonumber \begin{equation} \begin{split} \partial_{x^{ \mu}}\partial_{x^{ \nu}}&=\frac{\partial^2x^{\prime \sigma}}{\partial x^{ \mu}\partial x^{ \nu}}\partial_{x^{\prime \sigma}} + \frac{\partial x^{\prime \rho}}{\partial x^{\mu }}\frac{\partial x^{\prime \sigma}}{\partial x^{ \nu}}\partial_{x^{\prime \rho}}\partial_{x^{\prime \sigma}}\nonumber \end{split} \end{equation}\]

The next challenge is writing this transformation rule in matrix form. If we let these second derivatives act on an arbitrary scalar function \(F:\hspace{1mm}\mathbb{R}^n\mapsto\mathbb{R}\), we have

\[\nonumber \begin{equation} \begin{split} \partial_{x^{ \mu}}\partial_{x^{ \nu}}{F}&=\frac{\partial^2x^{\prime \sigma}}{\partial x^{ \mu}\partial x^{ \nu}}\partial_{x^{\prime \sigma}}{F} + \frac{\partial x^{\prime \rho}}{\partial x^{\mu }}\frac{\partial x^{\prime \sigma}}{\partial x^{ \nu}}\partial_{x^{\prime \rho}}\partial_{x^{\prime \sigma}}{F}\nonumber \end{split} \end{equation}\]

which may be rewritten in terms of the Hessian of \(F\)

\[\nonumber \begin{equation} \begin{split} [\boldsymbol{H}({F})]_{\mu\nu}&= [\boldsymbol{H}(f^\sigma)]_{\mu \nu}\partial_{x^{\prime \sigma}}{F} + [\boldsymbol{J}\boldsymbol{f}]^\rho\hspace{0.1mm}_\mu[\boldsymbol{J}\boldsymbol{f}]^\sigma\hspace{0.1mm}_\nu\partial_{x^{\prime \rho}}\partial_{x^{\prime \sigma}}{F}\nonumber\\ &=[\boldsymbol{H}(f^\sigma)]_{\mu \nu}[\boldsymbol{J}^{\prime}F]_\sigma + [\boldsymbol{J}\boldsymbol{f}]^\rho\hspace{0.1mm}_\mu[\boldsymbol{H}^{\prime}(F)]_{\rho \sigma}[\boldsymbol{J}\boldsymbol{f}]^\sigma\hspace{0.1mm}_\nu\nonumber\\ &=[\boldsymbol{H}(\boldsymbol{f})]_{\mu \nu}\hspace{0.1mm}^\sigma[\boldsymbol{J}^{\prime}F]_\sigma + \left([\boldsymbol{J}\boldsymbol{f}]^\top\right)_\mu\hspace{0.1mm}^\rho [\boldsymbol{H}^{\prime}(F)]_{\rho \sigma}[\boldsymbol{J}\boldsymbol{f}]^\sigma\hspace{0.1mm}_\nu \end{split} \end{equation}\]

or, in matrix form

\[\begin{equation} \label{eq:matrix-transformation-rule} \begin{split} [\boldsymbol{H}({F})]&=[\boldsymbol{H}(\boldsymbol{f})][\boldsymbol{J}^{\prime}F] +[\boldsymbol{J}\boldsymbol{f}]^\top [\boldsymbol{H}^{\prime}(F)][\boldsymbol{J}\boldsymbol{f}] \end{split} \end{equation}\]

where the ``\(\prime\)’’ denotes derivatives with respect to the primed coordinates, and we note that the object \(\boldsymbol{H}(\boldsymbol{f})\) is a rank 3 tensor. If that makes you uncomfortable, feel free to think of it as a matrix whose entries are vectors (the second partials of the components of \(\boldsymbol{f}\)). Whenever we compute its product with the Jacobian of \(F\), a vector, we compute the dot product of the Jacobian with each of the individual vectors in the matrix (and that’s essentially what the transformation rule says, in this case). This makes clear the fact that, if the Hessian of the coordinate transformation is zero, then the second order partials just transform with the Jacobian, and they don’t involve any first order derivatives.
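Since this rule is easy to get wrong, here’s a sympy sketch verifying Eq. \ref{eq:matrix-transformation-rule} for a hypothetical map with an easy inverse (both the map and the test function are my own illustrative choices):

```python
import sympy as sp

x, y, xi, eta = sp.symbols('x y xi eta')

# A hypothetical map with an easy inverse, for illustration only:
# (xi, eta) = f(x, y) = (x + y**2, y)  =>  x = xi - eta**2, y = eta
f = sp.Matrix([x + y**2, y])
Jf = f.jacobian([x, y])
Hf = [sp.hessian(comp, (x, y)) for comp in f]       # the rank 3 object H(f)

F_old = sp.exp(x) * y                               # arbitrary test function
F_new = F_old.subs([(x, xi - eta**2), (y, eta)])    # F in the primed coords

back = [(xi, x + y**2), (eta, y)]                   # primed -> unprimed
Jp_F = sp.Matrix([sp.diff(F_new, v) for v in (xi, eta)]).subs(back)
Hp_F = sp.hessian(F_new, (xi, eta)).subs(back)

# H(F) = H(f) . J'F + Jf^T H'(F) Jf  (the matrix transformation rule)
rhs = sum((Hf[s] * Jp_F[s] for s in range(2)), sp.zeros(2, 2)) + Jf.T * Hp_F * Jf
lhs = sp.hessian(F_old, (x, y))
assert sp.simplify(lhs - rhs) == sp.zeros(2, 2)
```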

This is great! Now all we need to do is form the correct contraction of \(\boldsymbol{H}(F)\) to produce the scalar (for example, in the \(n=2\) case)

\[\nonumber \begin{equation} \begin{split} a \partial_{xx}F + b\partial_{xy}F + c\partial_{yx}F + d\partial_{yy}F \end{split} \end{equation}\]

After some experimentation, I found that

\[\nonumber \begin{equation} \begin{split} \text{Tr}[\boldsymbol{A}^\top [\boldsymbol{H}(F)]]&=\text{Tr}\left[\begin{pmatrix} a & c\\ b & d \end{pmatrix}\begin{pmatrix} \partial_{xx} & \partial_{xy}\\ \partial_{yx} & \partial_{yy} \end{pmatrix}\right]\\ &=\text{Tr}\left[\begin{pmatrix} a\partial_{xx} + c\partial_{yx} & a\partial_{xy} + c\partial_{yy}\\ b\partial_{xx} + d\partial_{yx} & b\partial_{xy} + d\partial_{yy} \end{pmatrix}\right]\\ &=a\partial_{xx}F + c\partial_{yx}F + b\partial_{xy}F + d\partial_{yy}F \end{split} \end{equation}\]

and this generalizes for arbitrary \(n\) (not just \(n=2\))

\[\nonumber \begin{equation} \begin{split} \text{Tr}[\boldsymbol{A}^\top [\boldsymbol{H}(F)]]&=(\boldsymbol{A}^\top [\boldsymbol{H}(F)])_{\mu\mu}\\ &=A^\top_\mu\hspace{0.1mm}^\rho[\boldsymbol{H}(F)]_{\rho\mu}\\ &=A^\top_\mu\hspace{0.1mm}^\rho\partial_{x^\rho x^\mu}F\\ &=A_\rho\hspace{0.1mm}^\mu\partial_{x^\rho x^\mu}F \end{split} \end{equation}\]

which is the desired relation (just a sum of arbitrary coefficients multiplied by all possible second derivatives), so long as \(A_\rho\hspace{0.1mm}^\mu\) is the desired coefficient of the derivative \(\partial_{x^\rho x^\mu}F\). So, if we run across an expression of the general form

\[\nonumber \begin{equation} \begin{split} \text{Tr}[\boldsymbol{A}^\top [\boldsymbol{H}(F)]] \end{split} \end{equation}\]

we know that the matrix \(\boldsymbol{A}\) is the matrix of coefficients of the second order partial derivatives.
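Here’s a quick sympy check of this contraction bookkeeping (the symbols are the generic coefficients from above; note that sympy’s Hessian is automatically symmetric, so \(b\) and \(c\) play interchangeable roles):

```python
import sympy as sp

x, y = sp.symbols('x y')
a, b, c, d = sp.symbols('a b c d')

F = sp.Function('F')(x, y)
H = sp.hessian(F, (x, y))                 # sympy's Hessian (symmetric)
A = sp.Matrix([[a, b], [c, d]])

expr = (A.T * H).trace()                  # Tr[A^T H(F)]
target = (a*sp.diff(F, x, x) + b*sp.diff(F, x, y)
          + c*sp.diff(F, y, x) + d*sp.diff(F, y, y))
assert sp.simplify(expr - target) == 0
```

Now, we can apply this to the second order transformation rule in Eq. \ref{eq:matrix-transformation-rule} to extract the corresponding transformation rule for the coefficients \(\boldsymbol{A}\)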

\[\nonumber \begin{equation} \begin{split} \text{Tr}[\boldsymbol{A}^\top[\boldsymbol{H}({F})]]&=\text{Tr}\left[\boldsymbol{A}^\top[\boldsymbol{H}(\boldsymbol{f})][\boldsymbol{J}^{\prime}F] +\boldsymbol{A}^\top[\boldsymbol{J}\boldsymbol{f}]^\top [\boldsymbol{H}^{\prime}(F)][\boldsymbol{J}\boldsymbol{f}]\right] \end{split} \end{equation}\]

Now, we note the following two useful identities for the trace

\[\begin{equation} \label{eq:sum-rule} \begin{split} \text{Tr}[\boldsymbol{A} + \boldsymbol{B}]&= \text{Tr}[\boldsymbol{A}] + \text{Tr}[\boldsymbol{B}] \end{split} \end{equation}\] \[\begin{equation} \label{eq:product-rule} \begin{split} \text{Tr}[\boldsymbol{A}\boldsymbol{B}]&= \text{Tr}[\boldsymbol{B}\boldsymbol{A}] \end{split} \end{equation}\]

so

\[\nonumber \begin{equation} \begin{split} \text{Tr}[\boldsymbol{A}^\top[\boldsymbol{H}({F})]]&=\text{Tr}\left[\boldsymbol{A}^\top[\boldsymbol{H}(\boldsymbol{f})][\boldsymbol{J}^{\prime}F]\right] +\text{Tr}\left[\left(\boldsymbol{A}^\top[\boldsymbol{J}\boldsymbol{f}]^\top\right) [\boldsymbol{H}^{\prime}(F)][\boldsymbol{J}\boldsymbol{f}]\right]\hspace{5mm}\text{By Eq. \ref{eq:sum-rule}}\\ &=\text{Tr}\left[\boldsymbol{A}^\top[\boldsymbol{H}(\boldsymbol{f})][\boldsymbol{J}^{\prime}F]\right] +\text{Tr}\left[[\boldsymbol{H}^{\prime}(F)]\left([\boldsymbol{J}\boldsymbol{f}]\boldsymbol{A}^\top[\boldsymbol{J}\boldsymbol{f}]^\top\right)\right]\hspace{5mm}\text{By Eq. \ref{eq:product-rule}}\\ &=\text{Tr}\left[\boldsymbol{A}^\top[\boldsymbol{H}(\boldsymbol{f})][\boldsymbol{J}^{\prime}F]\right] +\text{Tr}\left[\left([\boldsymbol{J}\boldsymbol{f}]\boldsymbol{A}^\top[\boldsymbol{J}\boldsymbol{f}]^\top\right)[\boldsymbol{H}^{\prime}(F)]\right]\hspace{5mm}\text{By Eq. \ref{eq:product-rule}}\\ &=\text{Tr}\left[\boldsymbol{A}^\top[\boldsymbol{H}(\boldsymbol{f})][\boldsymbol{J}^{\prime}F]\right] +\text{Tr}\left[\boldsymbol{A}^{\prime \top}[\boldsymbol{H}^{\prime}(F)]\right] \end{split} \end{equation}\]

which implies that the transformation rule for the coefficients of the second order partials is

\[\nonumber \begin{equation} \begin{split} \boldsymbol{A}^{\prime\top} &= [\boldsymbol{J}\boldsymbol{f}]\boldsymbol{A}^\top[\boldsymbol{J}\boldsymbol{f}]^\top\\ &=[\boldsymbol{J}\boldsymbol{f}]\left([\boldsymbol{J}\boldsymbol{f}]\boldsymbol{A}\right)^\top\\ \implies \boldsymbol{A}^\prime &= [\boldsymbol{J}\boldsymbol{f}]\boldsymbol{A}[\boldsymbol{J}\boldsymbol{f}]^\top \end{split} \end{equation}\]

which is to be contrasted with the transformation rule for the second order partial derivative operators (assuming \(\boldsymbol{H}(\boldsymbol{f})=\boldsymbol{0}\))

\[\nonumber \begin{equation} \begin{split} [\boldsymbol{H}(F)]&= [\boldsymbol{J}\boldsymbol{f}]^\top[\boldsymbol{H}^\prime(F)][\boldsymbol{J}\boldsymbol{f}] \end{split} \end{equation}\]

which suggests that the rule, in general, might be to take the transformation rule for the derivative operators, swap primed and unprimed operators, and take the transpose of each of the transformation matrices on the right-hand side. Now, we can also write an expression for the vector \(\boldsymbol{B}^\prime\) of coefficients of the first order derivatives “spawned” from the second order partials by a general change of coordinates. So, we write out

\[\nonumber \begin{equation} \begin{split} \text{Tr}\left[\boldsymbol{A}^\top[\boldsymbol{H}(\boldsymbol{f})][\boldsymbol{J}^{\prime}F]\right] &= \left( \boldsymbol{A}^\top[\boldsymbol{H}(\boldsymbol{f})][\boldsymbol{J}^{\prime}F]\right)_{\mu\mu}\\ &=A^\top_{\mu}\hspace{0.1mm}^\rho\frac{\partial^2x^{\prime \sigma}}{\partial x^\rho \partial x^\mu} \partial_{x^{\prime \sigma}}\\ &=B^{\prime\sigma}\partial_{x^{\prime \sigma}} \end{split} \end{equation}\]

which seems to suggest that the coefficients of the first order derivatives “spawned” by the transformation are

\[\nonumber \begin{equation} \begin{split} B^{\prime}&=\text{Tr}\left[\boldsymbol{A}^\top[\boldsymbol{H}(\boldsymbol{f})]\right]\\ B^{\prime\mu}&=\text{Tr}\left[\boldsymbol{A}^\top[\boldsymbol{H}(f^\mu)]\right] \end{split} \end{equation}\]

so the \(\mu\)th component of \(\boldsymbol{B}^\prime\) (corresponding to the coefficient of the derivative with respect to the \(\mu\)th primed variable) is given by multiplying \(\boldsymbol{A}^\top\) by the Hessian matrix of the \(\mu\)th primed variable (i.e. the \(\mu\)th component of the transformation \(\boldsymbol{f}\)).
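In code, this is one trace per component of the map; a small sympy sketch (the quadratic map here is hypothetical, chosen just to exercise the formula):

```python
import sympy as sp

x, y = sp.symbols('x y')
a, b, c, d = sp.symbols('a b c d')

def spawned_first_order_coeffs(f, A, coords):
    # B'^mu = Tr[A^T H(f^mu)], one trace per component of the map f
    return sp.Matrix([(A.T * sp.hessian(comp, coords)).trace() for comp in f])

# e.g. with a generic quadratic map (hypothetical, for illustration)
f = sp.Matrix([x**2 + y**2, x*y])
A = sp.Matrix([[a, b], [c, d]])
print(spawned_first_order_coeffs(f, A, (x, y)))  # -> Matrix([[2*a + 2*d], [b + c]])
```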

An Example

Consider the transformation \(\boldsymbol{f}\) defined by

\[\nonumber \begin{equation} \begin{split} \boldsymbol{f}(x,y) &= \begin{pmatrix} x^2 + y^2\\ x^2 - y^2 \end{pmatrix}\equiv \begin{pmatrix} \xi\\ \eta \end{pmatrix} \end{split} \end{equation}\]

Using the handy transformation rules, all we have to compute is the Jacobian of \(\boldsymbol{f}\) to find the coefficients of the second order derivatives \(\boldsymbol{A}^\prime\), and the Hessian of \(\boldsymbol{f}\) to find the coefficients of the first order derivatives \(\boldsymbol{B}^\prime\). We have

\[\nonumber \begin{equation} \begin{split} [\boldsymbol{J}\boldsymbol{f}]&= \begin{pmatrix} 2x & 2y\\ 2x & -2y \end{pmatrix}\\ \implies [\boldsymbol{H}(f^\xi)]&= \begin{pmatrix} 2 & 0\\ 0 & 2 \end{pmatrix}\\ [\boldsymbol{H}(f^\eta)]&= \begin{pmatrix} 2 & 0\\ 0 & -2 \end{pmatrix} \end{split} \end{equation}\]

Now, performing the matrix multiplication, we get

\[\begin{equation} \label{eq:A-prime} \begin{split} \boldsymbol{A}^\prime &= [\boldsymbol{J}\boldsymbol{f}]\boldsymbol{A}[\boldsymbol{J}\boldsymbol{f}]^\top\\ &= \begin{pmatrix} 2x & 2y\\ 2x & -2y \end{pmatrix}\begin{pmatrix} a & b\\ c & d \end{pmatrix}\begin{pmatrix} 2x & 2x\\ 2y & -2y \end{pmatrix}\\ &=\begin{pmatrix} 2x & 2y\\ 2x & -2y \end{pmatrix}\begin{pmatrix} 2(ax+by) & 2(ax-by)\\ 2(cx+dy) & 2(cx-dy) \end{pmatrix}\\ &=\begin{pmatrix} 4(ax^2 + (b+c)xy + dy^2) & 4(ax^2 + (c-b)xy -dy^2)\\ 4(ax^2 + (b-c)xy -dy^2) & 4(ax^2 - (b+c)xy + dy^2) \end{pmatrix} \end{split} \end{equation}\]

and

\[\begin{equation} \label{eq:B-prime} \begin{split} \boldsymbol{B}^\prime &= \text{Tr}\left[\boldsymbol{A}^\top [\boldsymbol{H}(\boldsymbol{f})]\right]\\ \implies B^{\prime \xi} &= \text{Tr}\left[\begin{pmatrix} a & c\\ b & d \end{pmatrix}\begin{pmatrix} 2 & 0\\ 0 & 2 \end{pmatrix}\right]\\ &= \text{Tr}\left[\begin{pmatrix} 2a & 2c\\ 2b & 2d \end{pmatrix}\right]\\ &= 2(a+d)\\ B^{\prime \eta} &= \text{Tr}\left[\begin{pmatrix} a & c\\ b & d \end{pmatrix}\begin{pmatrix} 2 & 0\\ 0 & -2 \end{pmatrix}\right]\\ &= \text{Tr}\left[\begin{pmatrix} 2a & -2c\\ 2b & -2d \end{pmatrix}\right]\\ &= 2(a-d)\\ \boldsymbol{B}^\prime &= \begin{pmatrix} 2(a+d)\\ 2(a-d) \end{pmatrix} \end{split} \end{equation}\]
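This is exactly the kind of computation worth delegating to a symbolic tool; a short sympy sketch reproduces Eqs. \ref{eq:A-prime} and \ref{eq:B-prime}:

```python
import sympy as sp

x, y = sp.symbols('x y')
a, b, c, d = sp.symbols('a b c d')

f = sp.Matrix([x**2 + y**2, x**2 - y**2])     # (xi, eta) = f(x, y)
Jf = f.jacobian([x, y])
A = sp.Matrix([[a, b], [c, d]])

A_prime = sp.expand(Jf * A * Jf.T)            # Eq. (A-prime)
B_prime = sp.Matrix([(A.T * sp.hessian(comp, (x, y))).trace() for comp in f])
print(A_prime, B_prime)                       # matches the results above
```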

Now, for a sanity check, and to demonstrate the convenience of the matrix transformation rule, we do the same calculation term by term using the second order chain rule

\[\nonumber \begin{equation} \begin{split} a \partial_{xx} &= a\left[\xi_{xx}\partial_\xi + \eta_{xx}\partial_\eta + (\xi_x)^2\partial_{\xi\xi} + (\eta_x)^2\partial_{\eta\eta} + \xi_x\eta_x \partial_{\xi\eta} + \eta_x\xi_x\partial_{\eta\xi}\right]\\ &=\boxed{a\left[2 \partial_\xi + 2\partial_\eta + 4x^2\partial_{\xi\xi} + 4x^2\partial_{\eta\eta} + (4x^2)\partial_{\xi\eta} + (4x^2)\partial_{\eta\xi}\right]}\\ d\partial_{yy}&=d\left[\xi_{yy}\partial_\xi + \eta_{yy}\partial_\eta + (\xi_y)^2\partial_{\xi\xi} + (\eta_y)^2\partial_{\eta\eta} + \xi_y\eta_y \partial_{\xi\eta} + \eta_y\xi_y\partial_{\eta\xi}\right]\\ &=\boxed{d\left[2\partial_\xi -2 \partial_\eta + 4y^2 \partial_{\xi\xi} + 4y^2\partial_{\eta\eta} - 4y^2\partial_{\xi\eta} - 4y^2 \partial_{\eta\xi} \right]}\\ b\partial_{xy} &= b\left[ \xi_{xy}\partial_\xi + \eta_{xy}\partial_\eta + \xi_x\xi_y \partial_{\xi\xi} + \eta_x\eta_y \partial_{\eta\eta} + \xi_x\eta_y\partial_{\xi\eta} + \eta_x\xi_y\partial_{\eta\xi} \right]\\ &= \boxed{b\left[ 4xy \partial_{\xi\xi} - 4xy\partial_{\eta\eta} - 4xy\partial_{\xi\eta} + 4xy\partial_{\eta\xi} \right]}\\ c\partial_{yx} &= c\left[ \xi_{yx}\partial_\xi + \eta_{yx}\partial_\eta + \xi_y\xi_x \partial_{\xi\xi} + \eta_y\eta_x \partial_{\eta\eta} + \xi_y\eta_x\partial_{\xi\eta} + \eta_y\xi_x\partial_{\eta\xi} \right]\\ &=\boxed{c\left[ 4xy\partial_{\xi\xi} -4xy\partial_{\eta\eta} + 4xy\partial_{\xi\eta} - 4xy\partial_{\eta\xi} \right]} \end{split} \end{equation}\]

Now, if we add up \(a\partial_{xx} + b\partial_{xy} + c\partial_{yx} + d\partial_{yy}\) and meticulously combine terms with the same partials with respect to \(\xi,\eta\), we get

\[\nonumber \begin{equation} \begin{split} a\partial_{xx} + b\partial_{xy} + c\partial_{yx} + d\partial_{yy} &= 4\left(ax^2 + (b+c)xy + dy^2\right)\partial_{\xi\xi}\\ &+ 4\left(ax^2 - (b+c)xy + dy^2\right)\partial_{\eta\eta}\\ &+4\left(ax^2 + (c-b)xy - dy^2\right)\partial_{\xi\eta}\\ &+4\left(ax^2 + (b-c)xy - dy^2\right)\partial_{\eta\xi}\\ &+ 2(a+d)\partial_\xi + 2(a-d)\partial_\eta \end{split} \end{equation}\]

Comparing with Eqs. \ref{eq:A-prime} and \ref{eq:B-prime}, we see that the transformed coefficients are exactly the same, as desired.
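The same comparison can also be automated end to end: a sympy sketch that applies the original operator to a concrete test function (my choice \(F(\xi,\eta)=\sin(\xi)\,\eta\) is arbitrary) and checks it against the transformed coefficients:

```python
import sympy as sp

x, y, xi, eta = sp.symbols('x y xi eta')
a, b, c, d = sp.symbols('a b c d')

f = sp.Matrix([x**2 + y**2, x**2 - y**2])
Jf = f.jacobian([x, y])
A = sp.Matrix([[a, b], [c, d]])
A_prime = Jf * A * Jf.T
B_prime = sp.Matrix([(A.T * sp.hessian(comp, (x, y))).trace() for comp in f])

# Concrete test function F(xi, eta), then G(x, y) = F(f(x, y))
F_new = sp.sin(xi) * eta
G = F_new.subs([(xi, f[0]), (eta, f[1])])

lhs = (a*sp.diff(G, x, x) + b*sp.diff(G, x, y)
       + c*sp.diff(G, y, x) + d*sp.diff(G, y, y))

sub = [(xi, f[0]), (eta, f[1])]              # express primed derivatives in (x, y)
Hp = sp.hessian(F_new, (xi, eta)).subs(sub)
Jp = sp.Matrix([sp.diff(F_new, v) for v in (xi, eta)]).subs(sub)
rhs = (A_prime.T * Hp).trace() + (B_prime.T * Jp)[0]

assert sp.simplify(lhs - rhs) == 0           # both sides agree identically
```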

Transforming a Second Order Equation into Standard Form

Consider the following second order partial differential equation

\[\nonumber \begin{equation} \begin{split} 3u_{xx} - 2u_{xy} + 2u_{yy} - 2u_{yz} + 3u_{zz} + 5u_y - u_x + 10u=0 \end{split} \end{equation}\]

For this PDE, we have

\[\nonumber \begin{equation} \begin{split} \boldsymbol{A} &= \begin{pmatrix} 3 & -1 & 0\\ -1 & 2 & -1\\ 0 & -1 & 3 \end{pmatrix} \end{split} \end{equation}\]

Note that we have split \(-2u_{xy} = -u_{xy} - u_{yx}\) and \(-2u_{yz} = - u_{yz} - u_{zy}\) to ensure that \(\boldsymbol{A}\) is symmetric. Now, to transform the equation into standard form, we need to find a transformation of the variables such that

\[\nonumber \begin{equation} \begin{split} \boldsymbol{A}^\prime = [\boldsymbol{J}\boldsymbol{f}]\boldsymbol{A}[\boldsymbol{J}\boldsymbol{f}]^\top \end{split} \end{equation}\]

is diagonal, with each entry \(\chi_i\in\{\pm 1, 0\}\) given by the sign of the corresponding eigenvalue of \(\boldsymbol{A}\). To make \(\boldsymbol{A}'\) diagonal, we must choose \([\boldsymbol{J}\boldsymbol{f}]^\top =\boldsymbol{X}\), where \(\boldsymbol{X}\) is the matrix whose columns are the (normalized) eigenvectors of \(\boldsymbol{A}\). Further, if we multiply \(\boldsymbol{X}\) by the following diagonal matrix

\[\nonumber \begin{equation} \begin{split} D_{ii} &= \begin{cases} \frac{1}{\sqrt{\lvert\lambda_i\rvert}} & \text{if $\lambda_i\neq 0$}\\ 0 & \text{if $\lambda_i =0$} \end{cases} \end{split} \end{equation}\]

we will have the desired form for \(\boldsymbol{A}'\). Now, we choose a linear transformation of the coordinates (as it is the only transformation that keeps the coefficients of the derivatives constant)

\[\nonumber \begin{equation} \begin{split} \boldsymbol{f}(\boldsymbol{x}) &= \boldsymbol{U}\boldsymbol{x} \end{split} \end{equation}\]

so we want to choose \(\boldsymbol{U}\) such that

\[\nonumber \begin{equation} \begin{split} \boldsymbol{A}'&= \boldsymbol{U} \boldsymbol{A}\boldsymbol{U}^\top\nonumber\\ &=\boldsymbol{\chi} \end{split} \end{equation}\]

which requires that

\[\nonumber \begin{equation} \begin{split} \boldsymbol{U}^\top &= \boldsymbol{X}\boldsymbol{D}\nonumber\\ \boldsymbol{U} &= \boldsymbol{D}\boldsymbol{X}^\top\\ \implies (\boldsymbol{X}\boldsymbol{D})^\top \boldsymbol{A}\left(\boldsymbol{X}\boldsymbol{D}\right)&= \boldsymbol{D}\boldsymbol{X}^\top\boldsymbol{A}\boldsymbol{X}\boldsymbol{D}\\ &=\boldsymbol{D}\boldsymbol{\Lambda}\boldsymbol{D}\\ &=\boldsymbol{\chi} \end{split} \end{equation}\]

Mathematica gives the three eigenvectors

\[\nonumber \begin{equation} \begin{split} \boldsymbol{v}_1 &= \frac{1}{\sqrt{3}}\begin{pmatrix} 1\\ -1\\ 1 \end{pmatrix}\nonumber\\ \boldsymbol{v}_2&=\frac{1}{\sqrt{2}}\begin{pmatrix} -1\\ 0\\ 1 \end{pmatrix}\\ \boldsymbol{v}_3 &= \frac{1}{\sqrt{6}}\begin{pmatrix} 1\\ 2\\ 1 \end{pmatrix} \end{split} \end{equation}\]

with eigenvalues \(\lambda_1=4,\lambda_2=3,\lambda_3=1\), which immediately implies that this PDE is elliptic. Now, we have the following equation for \(\boldsymbol{U}\)

\[\nonumber \begin{equation} \begin{split} \boldsymbol{U}&= \begin{pmatrix} \frac{1}{2}& 0 & 0\\ 0 & \frac{1}{\sqrt{3}} & 0\\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}}\\ -\frac{1}{\sqrt{3}} & 0 & \frac{2}{\sqrt{6}}\\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \end{pmatrix}^\top\nonumber\\ &=\begin{pmatrix} \frac{1}{2 \sqrt{3}} & -\frac{1}{2 \sqrt{3}} & \frac{1}{2 \sqrt{3}} \\ -\frac{1}{\sqrt{6}} & 0 & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} & \sqrt{\frac{2}{3}} & \frac{1}{\sqrt{6}} \\ \end{pmatrix} \end{split} \end{equation}\]

so, under this change of coordinates, the second order term becomes \(u_{\xi\xi} + u_{\eta\eta} + u_{\zeta\zeta}\), and all that remains is to compute the transformation of the lower order terms. Note that since the Hessian of this transformation, \(\boldsymbol{H}(\boldsymbol{f})\), is zero, the second order derivatives don’t spawn any new first order derivatives, and all that’s left is to determine how the original coefficients of the first order terms transform under \(\boldsymbol{f}\). Recall that they transform with the Jacobian matrix, and so

\[\nonumber \begin{equation} \begin{split} \boldsymbol{B}'&=[\boldsymbol{J}\boldsymbol{f}]\boldsymbol{B}\nonumber\\ &=\begin{pmatrix} \frac{1}{2 \sqrt{3}} & -\frac{1}{2 \sqrt{3}} & \frac{1}{2 \sqrt{3}} \\ -\frac{1}{\sqrt{6}} & 0 & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} & \sqrt{\frac{2}{3}} & \frac{1}{\sqrt{6}} \\ \end{pmatrix}\begin{pmatrix} -1\\ 5\\ 0 \end{pmatrix}\\ &=\begin{pmatrix} -\sqrt{3} \\ \frac{1}{\sqrt{6}} \\ 3 \sqrt{\frac{3}{2}} \\ \end{pmatrix} \end{split} \end{equation}\]

so, the standard form of the PDE is

\[\nonumber \begin{equation} \begin{split} \boxed{u_{\xi\xi} + u_{\eta\eta} + u_{\zeta\zeta}=\sqrt{3}u_{\xi} -\frac{1}{\sqrt{6}}u_{\eta} - 3\sqrt{\frac{3}{2}}u_\zeta - 10u} \end{split} \end{equation}\]
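As a final numeric check, here’s a sympy sketch of the whole reduction (eigenvector signs and ordering are sympy’s, so \(\boldsymbol{B}^\prime\) may come out permuted or with flipped signs relative to the above):

```python
import sympy as sp

A = sp.Matrix([[3, -1, 0], [-1, 2, -1], [0, -1, 3]])
B = sp.Matrix([-1, 5, 0])                  # coefficients of (u_x, u_y, u_z)

lams, vecs = [], []
for val, mult, basis in A.eigenvects():    # symmetric A: orthogonal eigenvectors
    for v in basis:
        lams.append(val)
        vecs.append(v / v.norm())

X = sp.Matrix.hstack(*vecs)                # columns = normalized eigenvectors
D = sp.diag(*[1/sp.sqrt(abs(l)) if l != 0 else 0 for l in lams])
U = D * X.T                                # Jacobian of the linear map f(x) = U x

assert sp.simplify(U * A * U.T) == sp.eye(3)   # elliptic: A' is the identity
print(U * B)   # B'; entries may differ from the text by eigenvector sign/order
```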

Acknowledgements

I couldn’t find the transformation rules for the coefficients of the second order partials anywhere, so the work above is truly my own, but the idea of using tensor calculus and the corresponding transformation rule for the derivative operators came from Skorski’s nice paper on the arXiv (Skorski, 2019).

References

  1. Skorski, M. (2019). Chain rules for Hessian and higher derivatives made easy by tensor calculus. arXiv preprint arXiv:1911.13292.
