Gravity is geometry. To fully understand this statement, we will need more sophisticated tools and language to describe curved space and, ultimately, curved spacetime. This is the mathematical subject of differential geometry and will be introduced in this section and the next. Armed with these new tools, we will then return to the subject of gravity in Section 4.
Our discussion of differential geometry is not particularly rigorous. We will not prove many big theorems. Furthermore, a number of the statements that we make can be checked straightforwardly but we will often omit this. We will, however, be careful about building up the mathematical structure of curved spaces in the right logical order. As we proceed, we will come across a number of mathematical objects that can live on curved spaces. Many of these are familiar – like vectors, or differential operators – but we’ll see them appear in somewhat unfamiliar guises. The main purpose of this section is to understand what kind of objects can live on curved spaces, and the relationships between them. This will prove useful for both general relativity and other areas of physics.
Moreover, there is a wonderful rigidity to the language of differential geometry. It sometimes feels that any equation that you’re allowed to write down within this rigid structure is more likely than not to be true! This rigidity is going to be of enormous help when we return to discuss theories of gravity in Section 4.
The stage on which our story will play out is a mathematical object called a manifold. We will give a precise definition below, but for now you should think of a manifold as a curved, $n$-dimensional space. If you zoom in to any patch, the manifold looks like ${\mathbf{R}}^{n}$. But, viewed more globally, the manifold may have interesting curvature or topology.
To begin with, our manifold will have very little structure. For example, initially there will be no way to measure distances between points. But as we proceed, we will describe the various kinds of mathematical objects that can be associated to a manifold, and each one will allow us to do more and more things. It will be a surprisingly long time before we can measure distances between points! (Not until Section 3.)
You have met many manifolds in your education to date, even if you didn’t call them by name. Some simple examples in mathematics include Euclidean space ${\mathbf{R}}^{n}$, the sphere ${\mathbf{S}}^{n}$, and the torus ${\mathbf{T}}^{n}={\mathbf{S}}^{1}\times \mathrm{\dots}\times {\mathbf{S}}^{1}$. Some simple examples in physics include the configuration space and phase space that we use in classical mechanics and the state space of thermodynamics. As we progress, we will see how familiar ideas in these subjects can be expressed in a more formal language. Ultimately our goal is to explain how spacetime is a manifold and to understand the structures that live on it.
Even before we get to a manifold, there is some work to do in order to define the underlying object. What follows is the mathematical equivalent of reading a biography about an interesting person and having to spend the first 20 pages wading through a description of what their grandparents did for a living. This backstory will not be particularly useful for our needs and we include it here only for completeness. We’ll keep it down to one page.
Our backstory is called a topological space. Roughly speaking, this is a space in which each point can be viewed as living in a neighbourhood of other points, in a manner that allows us to define concepts such as continuity and convergence.
Definition: A topological space $M$ is a set of points, endowed with a topology $\mathcal{T}$. This is a collection of open subsets $\{{\mathcal{O}}_{\alpha}\subset M\}$ which obey:
Both the set $M$ and the empty set $\mathrm{\varnothing}$ are open subsets: $M\in \mathcal{T}$ and $\mathrm{\varnothing}\in \mathcal{T}$.
The intersection of a finite number of open sets is also an open set. So if ${\mathcal{O}}_{1}\in \mathcal{T}$ and ${\mathcal{O}}_{2}\in \mathcal{T}$ then ${\mathcal{O}}_{1}\cap {\mathcal{O}}_{2}\in \mathcal{T}$.
The union of any number (possibly infinite) of open sets is also an open set. So if ${\mathcal{O}}_{\gamma}\in \mathcal{T}$ then ${\cup}_{\gamma}{\mathcal{O}}_{\gamma}\in \mathcal{T}$.
Given a point $p\in M$, we say that $\mathcal{O}\in \mathcal{T}$ is a neighbourhood of $p$ if $p\in \mathcal{O}$. This concept leads us to our final requirement: we require that any two distinct points have disjoint neighbourhoods. In other words, for any $p,q\in M$ with $p\ne q$, there exist ${\mathcal{O}}_{1},{\mathcal{O}}_{2}\in \mathcal{T}$ such that $p\in {\mathcal{O}}_{1}$ and $q\in {\mathcal{O}}_{2}$ and ${\mathcal{O}}_{1}\cap {\mathcal{O}}_{2}=\mathrm{\varnothing}$. Topological spaces which obey this criterion are called Hausdorff. It is like a magic ward to protect us against bad things happening.
An example of a good Hausdorff space is the real line, $M=\mathbf{R}$, with $\mathcal{T}$ consisting of all open intervals $(a,b)$, with $a<b$, and their unions. An example of a non-Hausdorff space is any $M$ with $\mathcal{T}=\{M,\mathrm{\varnothing}\}$.
Definition: One further definition (it won’t be our last). A homeomorphism between topological spaces $(M,\mathcal{T})$ and $(\stackrel{~}{M},\stackrel{~}{\mathcal{T}})$ is a map $f:M\to \stackrel{~}{M}$ which is
Injective (or one-to-one): for $p\ne q$, $f(p)\ne f(q)$.
Surjective (or onto): $f(M)=\stackrel{~}{M}$, which means that for each $\stackrel{~}{p}\in \stackrel{~}{M}$ there exists a $p\in M$ such that $f(p)=\stackrel{~}{p}$.
Functions which are both injective and surjective are said to be bijective. This ensures that they have an inverse.
Bicontinuous. This means that both the function and its inverse are continuous. To define a notion of continuity, we need to use the topology. We say that $f$ is continuous if, for all $\stackrel{~}{\mathcal{O}}\in \stackrel{~}{\mathcal{T}}$, ${f}^{-1}(\stackrel{~}{\mathcal{O}})\in \mathcal{T}$.
There’s an animation of a donut morphing into a coffee mug and back that is often used to illustrate the idea of topology. If you want to be fancy, you can say that a donut is homeomorphic to a coffee mug.
We now come to our main character: an $n$-dimensional manifold is a space which, locally, looks like ${\mathbf{R}}^{n}$. Globally, the manifold may be more interesting than ${\mathbf{R}}^{n}$, but the idea is that we can patch together these local descriptions to get an understanding for the entire space.
Definition: An $n$-dimensional differentiable manifold is a Hausdorff topological space $M$ such that
$M$ is locally homeomorphic to ${\mathbf{R}}^{n}$. This means that for each $p\in M$, there is an open set $\mathcal{O}$ such that $p\in \mathcal{O}$ and a homeomorphism $\varphi :\mathcal{O}\to U$ with $U$ an open subset of ${\mathbf{R}}^{n}$.
Take two open subsets ${\mathcal{O}}_{\alpha}$ and ${\mathcal{O}}_{\beta}$ that overlap, so that ${\mathcal{O}}_{\alpha}\cap {\mathcal{O}}_{\beta}\ne \mathrm{\varnothing}$. We require that the corresponding maps ${\varphi}_{\alpha}:{\mathcal{O}}_{\alpha}\to {U}_{\alpha}$ and ${\varphi}_{\beta}:{\mathcal{O}}_{\beta}\to {U}_{\beta}$ are compatible, meaning that the map ${\varphi}_{\beta}\circ {\varphi}_{\alpha}^{-1}:{\varphi}_{\alpha}({\mathcal{O}}_{\alpha}\cap {\mathcal{O}}_{\beta})\to {\varphi}_{\beta}({\mathcal{O}}_{\alpha}\cap {\mathcal{O}}_{\beta})$ is smooth (also known as infinitely differentiable or ${C}^{\mathrm{\infty}}$), as is its inverse. This is depicted in Figure 14.
The maps ${\varphi}_{\alpha}$ are called charts and the collection of charts is called an atlas. You should think of each chart as providing a coordinate system to label the region ${\mathcal{O}}_{\alpha}$ of $M$. The coordinate associated to $p\in {\mathcal{O}}_{\alpha}$ is
$\varphi_\alpha(p) = (x^1(p),\dots,x^n(p))$
We write the coordinate in shorthand as simply ${x}^{\mu}(p)$, with $\mu =1,\mathrm{\dots},n$. Note that we use a superscript $\mu $ rather than a subscript: this simple choice of notation will prove useful as we go along.
If a point $p$ is a member of more than one subset $\mathcal{O}$ then it may have a number of different coordinates associated to it. There’s nothing to be nervous about here: it’s entirely analogous to labelling a point using either Euclidean coordinates or polar coordinates.
The maps ${\varphi}_{\beta}\circ {\varphi}_{\alpha}^{-1}$ take us between different coordinate systems and are called transition functions. The compatibility condition is there to ensure that there is no inconsistency between these different coordinate systems.
Any manifold $M$ admits many different atlases. In particular, nothing stops us from adding another chart to the atlas, provided that it is compatible with all the others. Two atlases are said to be compatible if every chart in one is compatible with every chart in the other. In this case, we say that the two atlases define the same differentiable structure on the manifold.
Here are a few simple examples of differentiable manifolds:
${\mathbf{R}}^{n}$: this looks locally like ${\mathbf{R}}^{n}$ because it is ${\mathbf{R}}^{n}$. You only need a single chart with the usual Euclidean coordinates. Similarly, any open subset of ${\mathbf{R}}^{n}$ is a manifold.
${\mathbf{S}}^{1}$: The circle can be defined as a curve in ${\mathbf{R}}^{2}$ with coordinates $(\mathrm{cos}\theta ,\mathrm{sin}\theta )$. Until now in our physics careers, we’ve been perfectly happy taking $\theta \in [0,2\pi )$ as the coordinate on ${\mathbf{S}}^{1}$. But this coordinate does not meet our requirements to be a chart because $[0,2\pi)$ is not an open set. This causes problems if we want to differentiate functions at $\theta =0$; to do so we need to take limits from both sides but there is no coordinate with $\theta $ a little less than zero.
To circumvent this, we need to use at least two charts to cover ${\mathbf{S}}^{1}$. For example, we could pick out two antipodal points, say $q=(1,0)$ and ${q}^{\prime}=(-1,0)$. We take the first chart to cover ${\mathcal{O}}_{1}={\mathbf{S}}^{1}-\{q\}$ with the map ${\varphi}_{1}:{\mathcal{O}}_{1}\to (0,2\pi )$ defined by ${\varphi}_{1}(p)=\theta $ as shown on the left-hand side of Figure 15. We take the second chart to cover ${\mathcal{O}}_{2}={\mathbf{S}}^{1}-\{{q}^{\prime}\}$ with the map ${\varphi}_{2}:{\mathcal{O}}_{2}\to (-\pi ,\pi )$ defined by ${\varphi}_{2}(p)={\theta}^{\prime}$ as shown on the right-hand side.
The two charts overlap on the upper and lower semicircles. The transition function is given by
$\theta' = \varphi_2(\varphi_1^{-1}(\theta)) = \begin{cases} \theta & \text{if } \theta\in(0,\pi) \\ \theta - 2\pi & \text{if } \theta\in(\pi,2\pi) \end{cases}$
The transition function isn’t defined at $\theta =0$, corresponding to the point $q$, nor at $\theta =\pi $, corresponding to the point ${q}^{\prime}$. Nonetheless, it is smooth on each of the two open intervals as required.
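This transition function can be checked numerically. Below is a minimal sketch using only the standard library; the helper names `phi1`, `phi2` and `transition` are our own labels for the chart maps described above.

```python
import math

def phi1(p):
    """Chart 1 on S^1 minus q = (1, 0): returns theta in (0, 2*pi)."""
    theta = math.atan2(p[1], p[0])            # lies in (-pi, pi]
    return theta if theta > 0 else theta + 2 * math.pi

def phi2(p):
    """Chart 2 on S^1 minus q' = (-1, 0): returns theta' in (-pi, pi)."""
    return math.atan2(p[1], p[0])

def transition(theta):
    """theta' = phi2(phi1^{-1}(theta)), defined for theta in (0,pi) u (pi,2*pi)."""
    return phi2((math.cos(theta), math.sin(theta)))

# Upper semicircle: the transition is the identity...
assert math.isclose(transition(1.0), 1.0)
# ...and on the lower semicircle it is theta - 2*pi.
assert math.isclose(transition(4.0), 4.0 - 2 * math.pi)
```

The point of the sketch is only that the two coordinate labels of the same point differ by the piecewise shift above; the charts themselves are smooth bijections on their domains.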
${\mathbf{S}}^{2}$: It will be useful to think of the sphere as the surface ${x}^{2}+{y}^{2}+{z}^{2}=1$ embedded in Euclidean ${\mathbf{R}}^{3}$. The familiar coordinates on the sphere ${\mathbf{S}}^{2}$ are those inherited from spherical polar coordinates of ${\mathbf{R}}^{3}$, namely
$x = \sin\theta\cos\varphi\ ,\quad y = \sin\theta\sin\varphi\ ,\quad z = \cos\theta$  (2.56)
with $\theta \in [0,\pi ]$ and $\varphi \in [0,2\pi )$. But as with the circle ${\mathbf{S}}^{1}$ described above, these are not open sets so will not do for our purpose. In fact, there are two distinct issues. If we focus on the equator at $\theta =\pi /2$, then the coordinate $\varphi \in [0,2\pi )$ parameterises a circle and suffers the same problem that we saw above. On top of this, at the north pole $\theta =0$ and south pole $\theta =\pi $, the coordinate $\varphi $ is not well defined, since the value of $\theta $ has already specified the point uniquely. This manifests itself on Earth by the fact that all time zones coincide at the North pole. It’s one of the reasons people don’t have business meetings there.
Once again, we can resolve these issues by introducing two charts covering different patches on ${\mathbf{S}}^{2}$. The first chart applies to the sphere ${\mathbf{S}}^{2}$ with a line of longitude removed, defined by $y=0$ and $x\ge 0$, as shown in Figure 16. (Think of this as the dateline.) This means that neither the north nor south pole are included in the open set ${\mathcal{O}}_{1}$. On this open set, we define a map ${\varphi}_{1}:{\mathcal{O}}_{1}\to {\mathbf{R}}^{2}$ using the coordinates (2.56), now with $\theta \in (0,\pi )$ and $\varphi \in (0,2\pi )$, so that we have a map to an open subset of ${\mathbf{R}}^{2}$.
We then define a second chart on a different open set ${\mathcal{O}}_{2}$, defined by ${\mathbf{S}}^{2}$ with the line $z=0$ and $x\le 0$ removed. Here we define the map ${\varphi}_{2}:{\mathcal{O}}_{2}\to {\mathbf{R}}^{2}$ using the coordinates
$x = -\sin\theta'\cos\varphi'\ ,\quad y = \cos\theta'\ ,\quad z = \sin\theta'\sin\varphi'$
with ${\theta}^{\prime}\in (0,\pi )$ and ${\varphi}^{\prime}\in (0,2\pi )$. Again this is a map to an open subset of ${\mathbf{R}}^{2}$. We have ${\mathcal{O}}_{1}\cup {\mathcal{O}}_{2}={\mathbf{S}}^{2}$ while, on the overlap ${\mathcal{O}}_{1}\cap {\mathcal{O}}_{2}$, the transition functions ${\varphi}_{1}\circ {\varphi}_{2}^{-1}$ and ${\varphi}_{2}\circ {\varphi}_{1}^{-1}$ are smooth. (We haven’t written these functions down explicitly, but it’s clear that they are built from $\mathrm{cos}$ and $\mathrm{sin}$ functions acting on domains where their inverses exist.)
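Although we haven't written the transition functions explicitly, they are easy to evaluate pointwise. The sketch below, using only the standard library, inverts the second chart numerically; the helpers `chart1_inv` and `chart2` are hypothetical names for $\varphi_1^{-1}$ and $\varphi_2$, with the inversion formulas derived from the stated coordinates.

```python
import math

def chart1_inv(theta, phi):
    """phi_1^{-1}: the coordinates (2.56) -> a point (x, y, z) on S^2."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def chart2(p):
    """phi_2: invert x = -sin t' cos p', y = cos t', z = sin t' sin p'."""
    x, y, z = p
    tp = math.acos(y)                          # theta' in (0, pi)
    pp = math.atan2(z, -x)                     # phi' in (-pi, pi]
    return tp, pp if pp > 0 else pp + 2 * math.pi

# Apply the transition function phi_2 o phi_1^{-1} at a point on the overlap:
tp, pp = chart2(chart1_inv(1.2, 2.5))
# Both charts label the same point of the sphere:
x2 = (-math.sin(tp) * math.cos(pp), math.cos(tp), math.sin(tp) * math.sin(pp))
assert all(math.isclose(a, b) for a, b in zip(chart1_inv(1.2, 2.5), x2))
```

This is only a numerical consistency check that the two coordinate systems describe the same point; smoothness of the transition follows from the fact that `acos` and `atan2` are smooth away from the excluded arcs.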
Note that for both ${\mathbf{S}}^{1}$ and ${\mathbf{S}}^{2}$ examples above, we made use of the fact that they can be viewed as embedded in a higher dimensional ${\mathbf{R}}^{n+1}$ to construct the charts. However, this isn’t necessary. The definition of a manifold makes no mention of a higher dimensional embedding and these manifolds should be viewed as having an existence independent of any embedding.
As you can see, there is a level of pedantry involved in describing these charts. (Mathematicians prefer the word “rigour”.) The need to deal with multiple charts arises only when we have manifolds of non-trivial topology; the manifolds ${\mathbf{S}}^{1}$ and ${\mathbf{S}}^{2}$ that we met above are particularly simple examples. When we come to discuss general relativity, we will care a lot about changing coordinates, and the limitations of certain coordinate systems, but our manifolds will turn out to be simple enough that, for all practical purposes, we can always find a single set of coordinates that tells us what we need to know. However, as we progress in physics, and topology becomes more important, so too does the idea of different charts. Perhaps the first place in physics where overlapping charts become an integral part of the discussion is the construction of a magnetic monopole. (See the lectures on Gauge Theory.)
The advantage of locally mapping a manifold to ${\mathbf{R}}^{n}$ is that we can now import our knowledge of how to do maths on ${\mathbf{R}}^{n}$. For example, we know how to differentiate functions on ${\mathbf{R}}^{n}$, and what it means for functions to be smooth. This now translates directly into properties of functions defined over the manifold.
We say that a function $f:M\to \mathbf{R}$ is smooth if the map $f\circ {\varphi}^{-1}:U\to \mathbf{R}$ is smooth for all charts $\varphi $.
Similarly, we say that a map $f:M\to N$ between two manifolds $M$ and $N$ (which may have different dimensions) is smooth if the map $\psi \circ f\circ {\varphi}^{-1}:U\to V$ is smooth for all charts $\varphi :M\to U\subset {\mathbf{R}}^{\mathrm{dim}(M)}$ and $\psi :N\to V\subset {\mathbf{R}}^{\mathrm{dim}(N)}$.
A diffeomorphism is defined to be a smooth homeomorphism $f:M\to N$. In other words it is an invertible, smooth map between manifolds $M$ and $N$ that has a smooth inverse. If such a diffeomorphism exists then the manifolds $M$ and $N$ are said to be diffeomorphic. The existence of an inverse means $M$ and $N$ necessarily have the same dimension.
Manifolds which are homeomorphic can be continuously deformed into each other. But diffeomorphism is stronger: it requires that the map and its inverse are smooth. This gives rise to some curiosities. For example, it turns out that the sphere ${\mathbf{S}}^{7}$ can be covered by a number of different, incompatible atlases. The resulting manifolds are homeomorphic but not diffeomorphic. These are referred to as exotic spheres. Similarly, Euclidean space ${\mathbf{R}}^{n}$ has a unique differentiable structure, except for ${\mathbf{R}}^{4}$ where there are an infinite number of inequivalent structures. I know of only one application of exotic spheres to physics (a subtle global gravitational anomaly in superstring theory) and I know of no applications of the exotic differential structure on ${\mathbf{R}}^{4}$. Certainly these will not play any role in these lectures.
Our next task is to understand how to do calculus on manifolds. We start here with differentiation; it will take us a while longer to get to integration, which we will finally meet in Section 2.4.4.
Consider a function $f:M\to \mathbf{R}$. To differentiate the function at some point $p$, we introduce a chart $\varphi =({x}^{1},\mathrm{\dots},{x}^{n})$ in a neighbourhood of $p$. We can then construct the map $f\circ {\varphi}^{-1}:U\to \mathbf{R}$ with $U\subset {\mathbf{R}}^{n}$. But we know how to differentiate functions on ${\mathbf{R}}^{n}$ and this gives us a way to differentiate functions on $M$, namely
$\left.\frac{\partial f}{\partial x^\mu}\right|_p := \left.\frac{\partial (f\circ\varphi^{-1})}{\partial x^\mu}\right|_{\varphi(p)}$  (2.57)
Clearly this depends on the choice of chart $\varphi $ and coordinates ${x}^{\mu}$. We would like to give a coordinate independent definition of differentiation, and then understand what happens when we choose to describe this object using different coordinates.
We will consider smooth functions over a manifold $M$. We denote the set of all smooth functions as ${C}^{\mathrm{\infty}}(M)$.
Definition: A tangent vector ${X}_{p}$ is an object that differentiates functions at a point $p\in M$. Specifically, it is a map ${X}_{p}:{C}^{\mathrm{\infty}}(M)\to \mathbf{R}$ satisfying
Linearity: ${X}_{p}(f+g)={X}_{p}(f)+{X}_{p}(g)$ for all $f,g\in {C}^{\mathrm{\infty}}(M)$.
${X}_{p}(f)=0$ when $f$ is a constant function.
Leibnizarity: ${X}_{p}(fg)=f(p){X}_{p}(g)+{X}_{p}(f)g(p)$ for all $f,g\in {C}^{\mathrm{\infty}}(M)$. This, of course, is the product rule.
Note that ii) and iii) combine to tell us that ${X}_{p}(af)=a{X}_{p}(f)$ for $a\in \mathbf{R}$.
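These axioms can be verified directly for the simplest candidate tangent vector, a directional derivative evaluated at a point. The following sympy sketch works on $\mathbf{R}^2$ with coordinates $(x,y)$; the helper `tangent_vector` is our own construction, not a standard API.

```python
import sympy as sp

x, y = sp.symbols('x y')

def tangent_vector(components, point):
    """Build X_p = X^mu d/dx^mu |_p acting on functions of (x, y)."""
    def X_p(f):
        return sum(c * sp.diff(f, v) for c, v in zip(components, (x, y))).subs(point)
    return X_p

p = {x: 1, y: 2}
Xp = tangent_vector((3, -1), p)       # X_p = 3 d/dx - d/dy at p

f, g = x**2 * y, sp.sin(x) + y
# i) Linearity:
assert sp.simplify(Xp(f + g) - (Xp(f) + Xp(g))) == 0
# ii) Constants are annihilated:
assert Xp(sp.Integer(7)) == 0
# iii) Leibnizarity: X_p(fg) = f(p) X_p(g) + X_p(f) g(p)
assert sp.simplify(Xp(f * g) - (f.subs(p) * Xp(g) + Xp(f) * g.subs(p))) == 0
```

The combined property ${X}_{p}(af)=a{X}_{p}(f)$ then follows automatically, as noted above.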
This definition is one of the early surprises in differential geometry. The surprise is really in the name “tangent vector”. We know what vectors are from undergraduate physics, and we know what differential operators are. But we’re not used to equating the two. Before we move on, it might be useful to think about how this definition fits with other notions of vectors that we’ve met before.
The first time we meet a vector in physics is usually in the context of Newtonian mechanics, where we describe the position of a particle as a vector $\mathbf{x}$ in ${\mathbf{R}}^{3}$. This concept of a vector is special to flat space and does not generalise to other manifolds. For example, a line connecting two points on a sphere is not a vector and, in general, there is no way to think of a point $p\in M$ as a vector. So we should simply forget that points in ${\mathbf{R}}^{3}$ can be thought of as vectors.
The next type of vector is the velocity of a particle, $\mathbf{v}=\dot{\mathbf{x}}$. This is more pertinent. It clearly involves differentiation and, moreover, is tangent to the curve traced out by the particle. As we will see below, velocities of particles are indeed examples of tangent vectors in differential geometry. More generally, tangent vectors tell us how things change in a given direction. They do this by differentiating.
It is simple to check that the object
$\left.\partial_\mu\right|_p := \left.\frac{\partial}{\partial x^\mu}\right|_p$
which acts on functions as shown in (2.57) obeys all the requirements of a tangent vector.
Note that the index $\mu $ is now a subscript, rather than superscript that we used for the coordinates ${x}^{\mu}$. (On the right-hand-side, the superscript in $\partial /\partial {x}^{\mu}$ is in the denominator and counts as a subscript.) We will adopt the summation convention, where repeated indices are summed. But, as we will see, the placement of indices up or down will tell us something and all sums will necessarily have one index up and one index down. This is a convention that we met already in Special Relativity where the up/downness of the index changes minus signs. Here it has a more important role that we will see as we go on: the placement of the index tells us what kind of mathematical space the object lives in. For now, you should be aware that any equation with two repeated indices that are both up or both down is necessarily wrong, just as any equation with three or more repeated indices is wrong.
Theorem: The set of all tangent vectors at point $p$ forms an $n$-dimensional vector space. We call this the tangent space ${T}_{p}(M)$. The tangent vectors ${{\partial}_{\mu}|}_{p}$ provide a basis for ${T}_{p}(M)$. This means that we can write any tangent vector as
$X_p = X^\mu \left.\partial_\mu\right|_p$
with ${X}^{\mu}={X}_{p}({x}^{\mu})$ the components of the tangent vector in this basis.
Proof: Much of the proof is just getting straight what objects live in what spaces. Indeed, getting this straight is a large part of the subject of differential geometry. To start, we need a small lemma. We define the function $F=f\circ {\varphi}^{-1}:U\to \mathbf{R}$, with $\varphi =({x}^{1},\mathrm{\dots},{x}^{n})$ a chart on a neighbourhood of $p$. Then, in some (perhaps smaller) neighbourhood of $p$ we can always write the function $F$ as
$F(x) = F(x^\mu(p)) + (x^\mu - x^\mu(p))F_\mu(x)$  (2.58)
where we have introduced $n$ new functions ${F}_{\mu}(x)$ and used the summation convention in the final term. If the function $F$ has a Taylor expansion then we can trivially write it in the form (2.58) by repackaging all the terms that are quadratic and higher into the ${F}_{\mu}(x)$ functions, keeping a linear term out front. But in fact there’s no need to assume the existence of a Taylor expansion. One way to see this is to note that for any function $G(t)$ we trivially have $G(1)=G(0)+{\int}_{0}^{1}\mathit{d}t{G}^{\prime}(t)$. But now apply this formula to the function $G(t)=F(tx)$ for some fixed $x$. This gives $F(x)=F(0)+x{\int}_{0}^{1}\mathit{d}t{F}^{\prime}(xt)$ which is precisely (2.58) for a function of a single variable expanded about the origin. The same method holds more generally.
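The single-variable version of this argument can be checked symbolically. The sketch below takes $F = e^x$ purely as a sample smooth function and verifies both the decomposition and, anticipating (2.59), that $F_1(0) = F'(0)$.

```python
import sympy as sp

# positive=True keeps sympy from returning a piecewise answer at x = 0
x, t = sp.symbols('x t', positive=True)
F = sp.exp(x)                      # a sample smooth function

# F_1(x) = integral_0^1 F'(t x) dt, as in the text
F1 = sp.integrate(sp.diff(F, x).subs(x, t * x), (t, 0, 1))

# Check F(x) = F(0) + x * F_1(x), the one-variable form of (2.58)
assert sp.simplify(F - (F.subs(x, 0) + x * F1)) == 0

# And F_1 -> F'(0) as x -> 0, matching (2.59)
assert sp.limit(F1, x, 0) == sp.diff(F, x).subs(x, 0)
```

Note that no Taylor expansion was used anywhere: the integral formula works for any smooth $F$.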
Given (2.58), we act with ${\partial}_{\mu}$ on both sides, and then evaluate at ${x}^{\mu}={x}^{\mu}(p)$. This tells us that the functions ${F}_{\mu}$ must satisfy
$\left.\frac{\partial F}{\partial x^\mu}\right|_{x(p)} = F_\mu(x(p))$  (2.59)
We can translate this into a similar expression for $f$ itself. We define $n$ functions on $M$ by ${f}_{\mu}={F}_{\mu}\circ \varphi $. Then, for any $q\in M$ in the appropriate neighbourhood of $p$, (2.58) becomes
$f\circ\varphi^{-1}(x^\mu(q)) = f\circ\varphi^{-1}(x^\mu(p)) + (x^\mu(q) - x^\mu(p))\left[f_\mu\circ\varphi^{-1}(x^\mu(q))\right]$
But ${\varphi}^{-1}({x}^{\mu}(q))=q$. So we find that, in the neighbourhood of $p$, it is always possible to write a function $f$ as
$f(q) = f(p) + (x^\mu(q) - x^\mu(p))f_\mu(q)$
for some ${f}_{\mu}(q)$. Note that, evaluated at $q=p$, we have
$f_\mu(p) = F_\mu\circ\varphi(p) = F_\mu(x(p)) = \left.\frac{\partial F}{\partial x^\mu}\right|_{x(p)} = \left.\frac{\partial f}{\partial x^\mu}\right|_p$
where in the last equality we used (2.57) and in the penultimate equality we used (2.59).
Now we can turn to the tangent vector ${X}_{p}$. This acts on the function $f$ to give
$X_p(f) = X_p\left(f(p) + (x^\mu - x^\mu(p))f_\mu\right)$
where we’ve dropped the arbitrary argument $q$ in $f(q)$, ${x}^{\mu}(q)$ and ${f}_{\mu}(q)$; these are the functions on which the tangent vector is acting. Using linearity and Leibnizarity, we have
$X_p(f) = X_p\left(f(p)\right) + X_p\left(x^\mu - x^\mu(p)\right)f_\mu(p) + (x^\mu(p) - x^\mu(p))\,X_p\left(f_\mu\right)$
The first term vanishes because $f(p)$ is just a constant, and every tangent vector vanishes when acting on a constant function. The final term vanishes as well because the Leibniz rule tells us to evaluate the function $({x}^{\mu}-{x}^{\mu}(p))$ at $p$, where it is zero. Finally, by linearity, the middle term includes an ${X}_{p}({x}^{\mu}(p))$ term which vanishes because ${x}^{\mu}(p)$ is just a constant. We’re left with
$X_p(f) = X_p(x^\mu)\left.\frac{\partial f}{\partial x^\mu}\right|_p$
This means that the tangent vector ${X}_{p}$ can be written as
$X_p = X^\mu\left.\frac{\partial}{\partial x^\mu}\right|_p$
with ${X}^{\mu}={X}_{p}({x}^{\mu})$ as promised. To finish, we just need to show that the ${{\partial}_{\mu}|}_{p}$ provide a basis for ${T}_{p}(M)$. From the above, they span the space. To check linear independence, suppose that we have a vector $\alpha ={\alpha}^{\mu}{{\partial}_{\mu}|}_{p}=0$. Then, acting on $f={x}^{\nu}$, this gives $\alpha ({x}^{\nu})={\alpha}^{\mu}{({\partial}_{\mu}{x}^{\nu})|}_{p}={\alpha}^{\nu}=0$. This concludes our proof. $\mathrm{\square}$
We have an ambivalent relationship with coordinates. We can’t calculate anything without them, but we don’t want to rely on them. The compromise we will come to is to consistently check that nothing physical depends on our choice of coordinates.
The key idea is that a given tangent vector ${X}_{p}$ exists independent of the choice of coordinate. However, the chosen basis $\{{{\partial}_{\mu}|}_{p}\}$ clearly depends on our choice of coordinates: to define it we had to first introduce a chart $\varphi $ and coordinates ${x}^{\mu}$. A basis defined in this way is called, quite reasonably, a coordinate basis. At times we will work with other bases, $\{{e}_{\mu}\}$ which are not defined in this way. Unsurprisingly, these are referred to as non-coordinate bases. A particularly useful example of a non-coordinate basis, known as vielbeins, will be introduced in Section 3.4.2.
Suppose that we picked a different chart $\stackrel{~}{\varphi}$, with coordinates ${\stackrel{~}{x}}^{\mu}$ in the neighbourhood of $p$. We then have two different bases, and can express the tangent vector ${X}_{p}$ in terms of either,
$X_p = X^\mu\left.\frac{\partial}{\partial x^\mu}\right|_p = \tilde{X}^\mu\left.\frac{\partial}{\partial \tilde{x}^\mu}\right|_p$
The vector is the same, but the components of the vector change: they are ${X}^{\mu}$ in the first set of coordinates, and ${\stackrel{~}{X}}^{\mu}$ in the second. It is straightforward to determine the relationship between ${X}^{\mu}$ and ${\stackrel{~}{X}}^{\mu}$. To see this, we look at how the tangent vector ${X}_{p}$ acts on a function $f$,
$X_p(f) = X^\mu\left.\frac{\partial f}{\partial x^\mu}\right|_p = X^\mu\left.\frac{\partial \tilde{x}^\nu}{\partial x^\mu}\right|_{\varphi(p)}\left.\frac{\partial f}{\partial \tilde{x}^\nu}\right|_p$
where we’ve used the chain rule. (Actually, we’ve been a little quick here. You can be more careful by introducing the functions $F=f\circ {\varphi}^{-1}$ and $\stackrel{~}{F}=f\circ {\stackrel{~}{\varphi}}^{-1}$ and using (2.57) to write $\partial f/\partial {x}^{\mu}=\partial F(\stackrel{~}{x}(x))/\partial {x}^{\mu}$. The end result is the same. We will be similarly sloppy as we proceed, often conflating $f$ and $F$.) You can read this equation in one of two different ways. First, we can view this as a change in the basis vectors: they are related as
$\left.\frac{\partial}{\partial x^\mu}\right|_p = \left.\frac{\partial \tilde{x}^\nu}{\partial x^\mu}\right|_{\varphi(p)}\left.\frac{\partial}{\partial \tilde{x}^\nu}\right|_p$  (2.60)
Alternatively, we can view this as a change in the components of the vector, which transform as
$\tilde{X}^\nu = X^\mu\left.\frac{\partial \tilde{x}^\nu}{\partial x^\mu}\right|_{\varphi(p)}$  (2.61)
Components of vectors that transform this way are sometimes said to be contravariant. I’ve always found this to be annoying terminology, in large part because I can never remember it. A more important point is that the form of (2.61) is essentially fixed once you remember that the index on ${X}^{\mu}$ sits up rather than down.
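The transformation law (2.61) can be checked in the familiar example of Cartesian and polar coordinates on $\mathbf{R}^2$. The sympy sketch below takes the radial vector field $X = x\,\partial_x + y\,\partial_y$ and confirms that its polar components are $(r, 0)$, i.e. $X = r\,\partial_r$; the variable names are ours.

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)     # restrict to a quadrant for simplicity
xt = (sp.sqrt(x**2 + y**2), sp.atan2(y, x))  # tilde-coordinates (r, phi)

X = (x, y)                                   # components of X = x d/dx + y d/dy

# (2.61): tilde X^nu = X^mu * d(tilde x^nu)/d(x^mu)
Xt = [sp.simplify(sum(Xm * sp.diff(xn, xm) for Xm, xm in zip(X, (x, y))))
      for xn in xt]

# The r-component is r, and the phi-component vanishes:
assert sp.simplify(Xt[0] - sp.sqrt(x**2 + y**2)) == 0
assert Xt[1] == 0
```

The index placement does all the bookkeeping: the components carry an upper index and transform with the Jacobian $\partial \tilde{x}^\nu/\partial x^\mu$, exactly as (2.61) dictates.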
So far, we haven’t really explained where the name “tangent vector” comes from. Consider a smooth curve in $M$ that passes through the point $p$. This is a map $\sigma :I\to M$, with $I$ an open interval $I\subset \mathbf{R}$. We will parameterise the curve as $\sigma (t)$ such that $\sigma (0)=p\in M$.
With a given chart, this curve becomes $\varphi \circ \sigma :I\to {\mathbf{R}}^{n}$, parameterised by ${x}^{\mu}(t)$. Before we learned any differential geometry, we would say that the tangent vector to the curve at $t=0$ is
$X^\mu = \left.\frac{dx^\mu(t)}{dt}\right|_{t=0}$
But we can take these to be the components of the tangent vector ${X}_{p}$, which we define as
$X_p = \left.\frac{dx^\mu(t)}{dt}\right|_{t=0}\left.\frac{\partial}{\partial x^\mu}\right|_p$
Our tangent vector now acts on functions $f\in {C}^{\mathrm{\infty}}(M)$. It is telling us how fast any function $f$ changes as we move along the curve.
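We can confirm that this tangent vector reproduces the rate of change of a function along the curve. The sympy sketch below uses the unit circle through $p=(1,0)$ as a sample curve; all names are illustrative.

```python
import sympy as sp

t, x, y = sp.symbols('t x y')

# A curve sigma(t) through p = (1, 0), written in coordinates
curve = {x: sp.cos(t), y: sp.sin(t)}

f = x**2 + 3 * y                       # any smooth function

# Rate of change of f along the curve at t = 0...
along = sp.diff(f.subs(curve), t).subs(t, 0)

# ...agrees with X_p(f) = (dx^mu/dt)|_0 * (df/dx^mu)|_p
comps = [sp.diff(curve[v], t).subs(t, 0) for v in (x, y)]
p = {x: 1, y: 0}
Xp_f = sum(c * sp.diff(f, v).subs(p) for c, v in zip(comps, (x, y)))
assert sp.simplify(along - Xp_f) == 0
```

This is just the chain rule, but it makes the slogan precise: the tangent vector of a curve differentiates functions along the curve.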
Any tangent vector ${X}_{p}$ can be written in this form. This gives meaning to the term “tangent space” for ${T}_{p}(M)$. It is, literally, the space of all possible tangents to curves passing through the point $p$. For example, a two-dimensional manifold embedded in ${\mathbf{R}}^{3}$ is shown in Figure 17. At each point $p$, we can identify a vector space which is the tangent plane: this is ${T}_{p}(M)$.
As an aside, note that the mathematical definition of a tangent space makes no reference to embedding the manifold in some higher dimensional space. The tangent space is an object intrinsic to the manifold itself. (This is in contrast to the picture where it was unfortunately necessary to think about the manifold as embedded in ${\mathbf{R}}^{3}$.)
The tangent spaces ${T}_{p}(M)$ and ${T}_{q}(M)$ at different points $p\ne q$ are different. There’s no sense in which we can add vectors from one to vectors from the other. In fact, at this stage there is no way even to compare vectors in ${T}_{p}(M)$ to vectors in ${T}_{q}(M)$. They are simply different spaces. As we proceed, we will make some effort to find ways to get around this.
So far we have only defined tangent vectors at a point $p$. It is useful to consider an object in which there is a choice of tangent vector for every point $p\in M$. In physics, we call objects that vary over space fields.
A vector field $X$ is defined to be a smooth assignment of a tangent vector ${X}_{p}$ to each point $p\in M$. This means that if you feed a function to a vector field, it spits back another function, which is the derivative of the first. In symbols, a vector field is therefore a map $X:{C}^{\mathrm{\infty}}(M)\to {C}^{\mathrm{\infty}}(M)$. The function $X(f)$ is defined by
$\left(X(f)\right)(p)={X}_{p}(f)$ |
The space of all vector fields on $M$ is denoted $\U0001d51b(M)$.
Given a coordinate basis, we can expand any vector field as
$X={X}^{\mu}{\displaystyle \frac{\partial}{\partial {x}^{\mu}}}$ | (2.62) |
where the ${X}^{\mu}$ are now smooth functions on $M$.
Strictly speaking, the expression (2.62) only defines a vector field on the open set $\mathcal{O}\subset M$ covered by the chart, rather than the whole manifold. We may have to patch this together with other charts to cover all of $M$.
Given two vector fields $X,Y\in \U0001d51b(M)$, we can’t multiply them together to get a new vector field. Roughly speaking, this is because the product $XY$ is a second order differential operator rather than a first order operator. This reveals itself in a failure of Leibnizarity for the object $XY$,
$XY(fg)=X(fY(g)+Y(f)g)=X(f)Y(g)+fXY(g)+gXY(f)+X(g)Y(f)$ |
This is not the same as $fXY(g)+gXY(f)$ that Leibniz requires.
However, we can build a new vector field by taking the commutator $[X,Y]$, which acts on functions $f$ as
$[X,Y](f)=X(Y(f))-Y(X(f))$ |
This is also known as the Lie bracket. Evaluated in a coordinate basis, the commutator is given by
$[X,Y](f)={X}^{\mu}{\displaystyle \frac{\partial}{\partial {x}^{\mu}}}\left({Y}^{\nu}{\displaystyle \frac{\partial f}{\partial {x}^{\nu}}}\right)-{Y}^{\mu}{\displaystyle \frac{\partial}{\partial {x}^{\mu}}}\left({X}^{\nu}{\displaystyle \frac{\partial f}{\partial {x}^{\nu}}}\right)=\left({X}^{\mu}{\displaystyle \frac{\partial {Y}^{\nu}}{\partial {x}^{\mu}}}-{Y}^{\mu}{\displaystyle \frac{\partial {X}^{\nu}}{\partial {x}^{\mu}}}\right){\displaystyle \frac{\partial f}{\partial {x}^{\nu}}}$ |
This holds for all $f\in {C}^{\mathrm{\infty}}(M)$, so we’re at liberty to write
$[X,Y]=\left({X}^{\mu}{\displaystyle \frac{\partial {Y}^{\nu}}{\partial {x}^{\mu}}}-{Y}^{\mu}{\displaystyle \frac{\partial {X}^{\nu}}{\partial {x}^{\mu}}}\right){\displaystyle \frac{\partial}{\partial {x}^{\nu}}}$ | (2.63) |
It is not difficult to check that the commutator obeys the Jacobi identity
$[X,[Y,Z]]+[Y,[Z,X]]+[Z,[X,Y]]=0$ |
This ensures that the set of all vector fields on a manifold $M$ has the mathematical structure of a Lie algebra.
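These coordinate manipulations are easy to mechanise. Here is a minimal sketch in Python, using sympy to check the component formula (2.63) against the definition of the commutator; the particular fields $X$ and $Y$ below are illustrative choices, not taken from the text.

```python
# Check (2.63) on R^2: the second derivatives of f cancel in X(Y(f)) - Y(X(f)),
# leaving the first-order operator with components X^mu d_mu Y^nu - Y^mu d_mu X^nu.
import sympy as sp

x, y = sp.symbols('x y')
coords = (x, y)

# Components of two (illustrative) vector fields X = X^mu d_mu, Y = Y^mu d_mu
X = (y, x**2)          # X = y d_x + x^2 d_y
Y = (sp.sin(x), 1)     # Y = sin(x) d_x + d_y

def act(V, f):
    """Apply the vector field with components V to the function f."""
    return sum(V[m] * sp.diff(f, coords[m]) for m in range(2))

# Components of [X, Y] from the coordinate formula (2.63)
bracket = tuple(act(X, Y[n]) - act(Y, X[n]) for n in range(2))

# Compare with the definition [X,Y](f) = X(Y(f)) - Y(X(f)) on a generic function
f = sp.Function('f')(x, y)
lhs = act(X, act(Y, f)) - act(Y, act(X, f))
rhs = sum(bracket[n] * sp.diff(f, coords[n]) for n in range(2))
assert sp.expand(lhs - rhs) == 0
```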
There is a slightly different way of thinking about vector fields on a manifold. A flow on $M$ is a one-parameter family of diffeomorphisms ${\sigma}_{t}:M\to M$ labelled by $t\in \mathbf{R}$. These maps have the properties that ${\sigma}_{t=0}$ is the identity map, and ${\sigma}_{s}\circ {\sigma}_{t}={\sigma}_{s+t}$. These two requirements ensure that ${\sigma}_{-t}={\sigma}_{t}^{-1}$. Such a flow gives rise to streamlines on the manifold. We will further require that these streamlines are smooth.
We can then define a vector field by taking the tangent to the streamlines at each point. In a given coordinate system, the components of the vector field are
${X}^{\mu}({x}^{\mu}(t))={\displaystyle \frac{d{x}^{\mu}(t)}{dt}}$ | (2.64) |
where I’ve abused notation a little and written ${x}^{\mu}(t)$ rather than the more accurate but cumbersome ${x}^{\mu}({\sigma}_{t})$. This will become a habit, with the coordinates ${x}^{\mu}$ often used to refer to the point $p\in M$.
A flow gives rise to a vector field. Alternatively, given a vector field ${X}^{\mu}(x)$, we can integrate the differential equation (2.64), subject to an initial condition ${x}^{\mu}(0)={x}_{\mathrm{initial}}^{\mu}$ to generate streamlines which start at ${x}_{\mathrm{initial}}^{\mu}$. These streamlines are called integral curves, generated by $X$.
In what follows, we will only need the infinitesimal flow generated by $X$. This is simply
${x}^{\mu}(t)={x}^{\mu}(0)+t{X}^{\mu}(x)+\mathcal{O}({t}^{2})$ | (2.65) |
Indeed, differentiating shows that this obeys (2.64) to leading order in $t$.
(An aside: Given a vector field $X$, it may not be possible to integrate (2.64) to generate a flow defined for all $t\in \mathbf{R}$. For example, consider $M=\mathbf{R}$ with the vector field $X={x}^{2}$. The equation $dx/dt={x}^{2}$, subject to the initial condition $x(0)=a$, has the unique solution $x(t)=a/(1-at)$ which diverges at $t=1/a$. Vector fields which generate a flow for all $t\in \mathbf{R}$ are called complete. It turns out that all vector fields on a manifold $M$ are complete if $M$ is compact. Roughly speaking, “compact” means that $M$ doesn’t “stretch to infinity”. More precisely, a topological space $M$ is compact if, for any family of open sets covering $M$ there always exists a finite sub-family which also cover $M$. So $\mathbf{R}$ is not compact because the family of sets $\{(-n,n),n\in {\mathbf{Z}}^{+}\}$ covers $\mathbf{R}$ but has no finite sub-family. Similarly, ${\mathbf{R}}^{n}$ is non-compact. However, ${\mathbf{S}}^{n}$ and ${\mathbf{T}}^{n}$ are compact manifolds.)
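The incompleteness example in the aside can be checked directly. A short sketch, assuming sympy’s `dsolve` for this separable equation:

```python
# Solve dx/dt = x^2 with x(0) = a and confirm the solution x(t) = a/(1 - a t),
# which blows up at t = 1/a, so the flow is not defined for all t.
import sympy as sp

t, a = sp.symbols('t a')
x = sp.Function('x')

sol = sp.dsolve(sp.Eq(x(t).diff(t), x(t)**2), x(t), ics={x(0): a})
assert sp.simplify(sol.rhs - a/(1 - a*t)) == 0
```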
We can look at some examples.
Consider the sphere ${\mathbf{S}}^{2}$ in polar coordinates with the vector field $X={\partial}_{\varphi}$. The integral curves solve the equation (2.64), which are
${\displaystyle \frac{d\varphi}{dt}}=1\quad \mathrm{and}\quad {\displaystyle \frac{d\theta}{dt}}=0$ |
This has the solution $\theta ={\theta}_{0}$ and $\varphi ={\varphi}_{0}+t$. The associated one-parameter diffeomorphism is ${\sigma}_{t}:(\theta ,\varphi )\to (\theta ,\varphi +t)$, and the flow lines are simply lines of constant latitude on the sphere, as shown in the left-hand figure.
Alternatively, consider the vector field on ${\mathbf{R}}^{2}$ with Cartesian components ${X}^{\mu}=(1,{x}^{2})$. The equation for the integral curves is now
${\displaystyle \frac{dx}{dt}}=1\quad \mathrm{and}\quad {\displaystyle \frac{dy}{dt}}={x}^{2}$ |
which has the solution $x(t)={x}_{0}+t$ and $y(t)={y}_{0}+\frac{1}{3}{({x}_{0}+t)}^{3}$. The associated flow lines are shown in the right-hand figure.
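A quick symbolic check of these integral curves (note that ${y}_{0}$ enters here as a constant of integration rather than as the value $y(0)$):

```python
# Verify that x(t) = x0 + t, y(t) = y0 + (x0 + t)^3/3 solves the
# integral-curve equations dx/dt = 1, dy/dt = x^2 for X^mu = (1, x^2).
import sympy as sp

t, x0, y0 = sp.symbols('t x0 y0')
xt = x0 + t
yt = y0 + sp.Rational(1, 3)*(x0 + t)**3

assert sp.diff(xt, t) == 1
assert sp.simplify(sp.diff(yt, t) - xt**2) == 0
```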
So far we have learned how to differentiate a function. This requires us to introduce a vector field $X$, and the new function $X(f)$ can be viewed as the derivative of $f$ in the direction of $X$.
Next we ask: is it possible to differentiate a vector field? Specifically, suppose that we have a second vector field $Y$. How can we differentiate this in the direction of $X$ to get a new vector field? As we’ve seen, we can’t just write down $XY$ because this doesn’t define a new vector field.
To proceed, we should think more carefully about what differentiation means. For a function $f(x)$ on $\mathbf{R}$, we compare the values of the function at nearby points, and see what happens as those points approach each other
${\displaystyle \frac{df}{dx}}=\underset{t\to 0}{\lim}{\displaystyle \frac{f(x+t)-f(x)}{t}}$ |
Similarly, to differentiate a vector field, we need to subtract the tangent vector ${Y}_{p}\in {T}_{p}(M)$ from the tangent vector at some nearby point ${Y}_{q}\in {T}_{q}(M)$, and then see what happens in the limit $q\to p$. But that’s problematic because, as we stressed above, the vector spaces ${T}_{p}(M)$ and ${T}_{q}(M)$ are different, and it makes no sense to subtract vectors in one from vectors in the other. To make progress, we’re going to have to find a way to do this. Fortunately, there is a way.
Suppose that we have a map $\phi :M\to N$ between two manifolds $M$ and $N$ which we will take to be a diffeomorphism. This allows us to import various structures on one manifold to the other.
For example, if we have a function $f:N\to \mathbf{R}$, then we can construct a new function that we denote $({\phi}^{\ast}f):M\to \mathbf{R}$,
$({\phi}^{\ast}f)(p)=f(\phi (p))$ |
Using the map in this way, to drag objects originally defined on $N$ onto $M$, is called the pull-back. If we introduce coordinates ${x}^{\mu}$ on $M$ and ${y}^{\alpha}$ on $N$, then the map $\phi$ is described by the functions ${y}^{\alpha}(x)$, and we can write
$({\phi}^{\ast}f)(x)=f(y(x))$ |
Some objects more naturally go the other way. For example, given a vector field $Y$ on $M$, we can define a new vector field $({\phi}_{\ast}Y)$ on $N$. If we are given a function $f:N\to \mathbf{R}$, then the vector field $({\phi}_{\ast}Y)$ on $N$ acts as
$({\phi}_{\ast}Y)(f)=Y({\phi}^{\ast}f)$ |
where I’ve been a little sloppy in the notation here since the left-hand side is a function on $N$ and the right-hand side a function on $M$. The equality above holds when evaluated at the appropriate points: $[({\phi}_{\ast}Y)(f)](\phi (p))=[Y({\phi}^{\ast}f)](p)$. Using the map to push objects on $M$ onto $N$ is called the push-forward.
If $Y={Y}^{\mu}\partial /\partial {x}^{\mu}$ is the vector field on $M$, we can write the induced vector field on $N$ as
$({\phi}_{\ast}Y)(f)={Y}^{\mu}{\displaystyle \frac{\partial f(y(x))}{\partial {x}^{\mu}}}={Y}^{\mu}{\displaystyle \frac{\partial {y}^{\alpha}}{\partial {x}^{\mu}}}{\displaystyle \frac{\partial f(y)}{\partial {y}^{\alpha}}}$ |
Written in components, $({\phi}_{\ast}Y)={({\phi}_{\ast}Y)}^{\alpha}\partial /\partial {y}^{\alpha}$, we then have
${({\phi}_{\ast}Y)}^{\alpha}={Y}^{\mu}{\displaystyle \frac{\partial {y}^{\alpha}}{\partial {x}^{\mu}}}$ | (2.66) |
Given the way that the indices are contracted, this is more or less the only thing we could write down.
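As an illustration of (2.66), here is a sketch in Python: we push forward the field $\partial /\partial \theta $ under the map from polar to Cartesian coordinates, ${y}^{\alpha}=(r\mathrm{cos}\,\theta ,r\mathrm{sin}\,\theta )$. This particular map is an illustrative choice, not one used in the text.

```python
# Push-forward via the Jacobian, (phi_* Y)^alpha = Y^mu dy^alpha/dx^mu:
# the field d/dtheta becomes the rotation field (-y2, y1) in Cartesian form.
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
xcoords = (r, th)
ymap = (r*sp.cos(th), r*sp.sin(th))   # y^alpha as functions of x^mu

Y = (0, 1)  # components of Y = d/dtheta in the (r, theta) basis

pushed = tuple(
    sum(Y[m]*sp.diff(ymap[a], xcoords[m]) for m in range(2))
    for a in range(2)
)
assert pushed == (-r*sp.sin(th), r*sp.cos(th))
```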
We’ll see other examples of these induced maps later in the lectures. The push-forward is always denoted as ${\phi}_{\ast}$ and goes in the same way as the original map. The pull-back is always denoted as ${\phi}^{\ast}$ and goes in the opposite direction to the original map. Importantly, if our map $\phi :M\to N$ is a diffeomorphism, then we also have ${\phi}^{-1}:N\to M$, so we can transport any object from $M$ to $N$ and back again with impunity.
Now we can use these ideas to help build a derivative. Suppose that we are given a vector field $X$ on $M$. This generates a flow ${\sigma}_{t}:M\to M$, which is a map between manifolds, now with $N=M$. This means that we can use (2.66) to generate a push-forward map from ${T}_{p}(M)$ to ${T}_{{\sigma}_{t}(p)}(M)$. But this is exactly what we need if we want to compare tangent vectors at neighbouring points. The resulting differential operator is called the Lie derivative and is denoted ${\mathcal{L}}_{X}$.
It will turn out that we can use these ideas to differentiate many different kinds of objects. As a warm-up, let’s first see how an analogous construction allows us to differentiate functions. The Lie derivative of a function $f$ is
${\mathcal{L}}_{X}f=\underset{t\to 0}{\lim}{\displaystyle \frac{f({\sigma}_{t}(x))-f(x)}{t}}={\displaystyle \frac{df({\sigma}_{t}(x))}{dt}}\Big|_{t=0}={\displaystyle \frac{\partial f}{\partial {x}^{\mu}}}{\displaystyle \frac{d{x}^{\mu}}{dt}}\Big|_{t=0}$ |
But, using (2.64), we know that $d{x}^{\mu}/dt={X}^{\mu}$. We then have
${\mathcal{L}}_{X}f={X}^{\mu}(x){\displaystyle \frac{\partial f}{\partial {x}^{\mu}}}=X(f)$ | (2.67) |
In other words, acting on functions with the Lie derivative ${\mathcal{L}}_{X}$ coincides with action of the vector field $X$.
Now let’s look at the action of ${\mathcal{L}}_{X}$ on a vector field $Y$. This is defined by
${\mathcal{L}}_{X}Y=\underset{t\to 0}{lim}{\displaystyle \frac{{({({\sigma}_{-t})}_{\ast}Y)}_{p}-{Y}_{p}}{t}}$ |
Note the minus sign in ${\sigma}_{-t}$. This reflects the fact that vector fields are pushed, rather than pulled. The map ${\sigma}_{t}$ takes us from the point $p$ to the point ${\sigma}_{t}(p)$. But to push a tangent vector ${Y}_{{\sigma}_{t}(p)}\in {T}_{{\sigma}_{t}(p)}(M)$ to a tangent vector in ${T}_{p}(M)$, where it can be compared to ${Y}_{p}$, we need to push with the inverse map ${({\sigma}_{-t})}_{\ast}$. This is shown in Figure 20.
Let’s first calculate the action of ${\mathcal{L}}_{X}$ on a coordinate basis ${\partial}_{\mu}=\partial /\partial {x}^{\mu}$. We have
${\mathcal{L}}_{X}{\partial}_{\mu}=\underset{t\to 0}{lim}{\displaystyle \frac{{({\sigma}_{-t})}_{\ast}{\partial}_{\mu}-{\partial}_{\mu}}{t}}$ | (2.68) |
We have an expression for the push-forward of a tangent vector in (2.66), where the coordinates ${y}^{\alpha}$ on $N$ should now be replaced by the infinitesimal change of coordinates induced by the flow ${\sigma}_{-t}$ which, from (2.65) is ${x}^{\mu}(t)={x}^{\mu}(0)-t{X}^{\mu}+\mathrm{\dots}$. Note the minus sign, which comes from the fact that we have to map back to where we came from as shown in Figure 20. We have, for small $t$,
${({\sigma}_{-t})}_{\ast}{\partial}_{\mu}=\left({\delta}_{\mu}^{\nu}-t{\displaystyle \frac{\partial {X}^{\nu}}{\partial {x}^{\mu}}}+\mathrm{\dots}\right){\partial}_{\nu}$ |
Acting on a coordinate basis, we then have
${\mathcal{L}}_{X}{\partial}_{\mu}=-{\displaystyle \frac{\partial {X}^{\nu}}{\partial {x}^{\mu}}}{\partial}_{\nu}$ | (2.69) |
To determine the action of ${\mathcal{L}}_{X}$ on a general vector field $Y$, we use the fact that the Lie derivative obeys the usual properties that we expect of a derivative, including linearity, ${\mathcal{L}}_{X}({Y}_{1}+{Y}_{2})={\mathcal{L}}_{X}{Y}_{1}+{\mathcal{L}}_{X}{Y}_{2}$ and Leibnizarity ${\mathcal{L}}_{X}(fY)=f{\mathcal{L}}_{X}Y+({\mathcal{L}}_{X}f)Y$ for any function $f$, both of which follow from the definition. The action on a general vector field $Y={Y}^{\mu}(x)\partial /\partial {x}^{\mu}$ can then be written as
${\mathcal{L}}_{X}({Y}^{\mu}{\partial}_{\mu})=({\mathcal{L}}_{X}{Y}^{\mu}){\partial}_{\mu}+{Y}^{\mu}({\mathcal{L}}_{X}{\partial}_{\mu})$ |
where we’ve simply viewed the components ${Y}^{\mu}(x)$ as $n$ functions. We can use (2.67) to determine ${\mathcal{L}}_{X}{Y}^{\mu}$ and we’ve computed ${\mathcal{L}}_{X}{\partial}_{\mu}$ in (2.69). We then have
${\mathcal{L}}_{X}({Y}^{\mu}{\partial}_{\mu})={X}^{\nu}{\displaystyle \frac{\partial {Y}^{\mu}}{\partial {x}^{\nu}}}{\partial}_{\mu}-{Y}^{\mu}{\displaystyle \frac{\partial {X}^{\nu}}{\partial {x}^{\mu}}}{\partial}_{\nu}$ |
But this is precisely the structure of the commutator. We learn that the Lie derivative acting on vector fields is given by
${\mathcal{L}}_{X}Y=[X,Y]$ |
A corollary of this is
${\mathcal{L}}_{X}{\mathcal{L}}_{Y}Z-{\mathcal{L}}_{Y}{\mathcal{L}}_{X}Z={\mathcal{L}}_{[X,Y]}Z$ | (2.70) |
which follows from the Jacobi identity for commutators.
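The definition of the Lie derivative as a limit can also be tested numerically. The sketch below approximates ${\mathcal{L}}_{X}Y$ at a point by flowing with the first-order flow (2.65), pushing back with the corresponding Jacobian, and comparing with the bracket (2.63); the fields are illustrative choices, not from the text.

```python
# Finite-difference check that L_X Y = [X, Y] at a point of R^2.
import numpy as np

def Xf(p):  # X = (-y, x): a rotation field (illustrative choice)
    return np.array([-p[1], p[0]])

def Yf(p):  # Y = (x^2, 0) (illustrative choice)
    return np.array([p[0]**2, 0.0])

def bracket(p, h=1e-6):
    """[X,Y]^nu = X^mu d_mu Y^nu - Y^mu d_mu X^nu, by central differences."""
    out = np.zeros(2)
    for mu in range(2):
        e = np.zeros(2); e[mu] = h
        dY = (Yf(p+e) - Yf(p-e)) / (2*h)
        dX = (Xf(p+e) - Xf(p-e)) / (2*h)
        out += Xf(p)[mu]*dY - Yf(p)[mu]*dX
    return out

def lie_derivative(p, t=1e-4, h=1e-6):
    """Approximate L_X Y at p via the flow: push Y at sigma_t(p) back to p."""
    q = p + t*Xf(p)                  # sigma_t(p), to first order in t
    J = np.zeros((2, 2))             # Jacobian of (sigma_{-t})_*: delta - t dX
    for mu in range(2):
        e = np.zeros(2); e[mu] = h
        J[:, mu] = -t*(Xf(p+e) - Xf(p-e))/(2*h)
    J += np.eye(2)
    pushed_back = J @ Yf(q)          # ((sigma_{-t})_* Y)_p, to first order
    return (pushed_back - Yf(p)) / t

p = np.array([1.3, 0.7])
assert np.allclose(lie_derivative(p), bracket(p), atol=1e-3)
```

For these fields the exact bracket is $[X,Y]=(-2xy)\,{\partial}_{x}-{x}^{2}\,{\partial}_{y}$, which the finite-difference flow reproduces to order $t$.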
The Lie derivative is just one of several derivatives that we will meet in this course. As we introduce new objects, we will learn how to act with ${\mathcal{L}}_{X}$ on them. But we will also see that we can endow different meanings to the idea of differentiation. In fact, the Lie derivative will take something of a back seat until Section 4.3 when we will see that it is what we need to understand symmetries.
For any vector space $V$, the dual vector space ${V}^{\ast}$ is the space of all linear maps from $V$ to $\mathbf{R}$.
This is a standard mathematical construction, but even if you haven’t seen it before it should resonate with something you know from quantum mechanics. There we have states in a Hilbert space with kets $|\psi \u27e9\in \mathscr{H}$ and a dual Hilbert space with bras $\u27e8\varphi |\in {\mathscr{H}}^{\ast}$. Any bra can be viewed as a map $\u27e8\varphi |:\mathscr{H}\to \mathbf{R}$ defined by $\u27e8\varphi |(|\psi \u27e9)=\u27e8\varphi |\psi \u27e9$.
In general, suppose that we are given a basis $\{{e}_{\mu},\mu =1,\mathrm{\dots},n\}$ of $V$. Then we can introduce a dual basis $\{{f}^{\mu},\mu =1,\mathrm{\dots},n\}$ for ${V}^{\ast}$ defined by
${f}^{\nu}({e}_{\mu})={\delta}_{\mu}^{\nu}$ |
A general vector in $V$ can be written as $X={X}^{\mu}{e}_{\mu}$ and ${f}^{\nu}(X)={X}^{\mu}{f}^{\nu}({e}_{\mu})={X}^{\nu}$. Given a basis, this construction provides an isomorphism between $V$ and ${V}^{\ast}$ given by ${e}_{\mu}\to {f}^{\mu}$. But the isomorphism is basis dependent. Pick a different basis, and you’ll get a different map.
We can repeat the construction and consider ${({V}^{\ast})}^{\ast}$, which is the space of all linear maps from ${V}^{\ast}$ to $\mathbf{R}$. But this space is naturally isomorphic to $V$, meaning that the isomorphism is independent of the choice of basis. To see this, suppose that $X\in V$ and $\omega \in {V}^{\ast}$. This means that $\omega (X)\in \mathbf{R}$. But we can equally well view $X\in {({V}^{\ast})}^{\ast}$ and define $X(\omega )=\omega (X)\in \mathbf{R}$. In this sense, ${({V}^{\ast})}^{\ast}=V$.
At each point $p\in M$, we have a vector space ${T}_{p}(M)$. The dual of this space, ${T}_{p}^{\ast}(M)$ is called the cotangent space at $p$, and an element of this space is called a cotangent vector, sometimes shortened to covector. Given a basis $\{{e}_{\mu}\}$ of ${T}_{p}(M)$, we can introduce the dual basis $\{{f}^{\mu}\}$ for ${T}_{p}^{\ast}(M)$ and expand any co-vector as $\omega ={\omega}_{\mu}{f}^{\mu}$.
We can also construct fields of cotangent vectors, by picking a member of ${T}_{p}^{\ast}(M)$ for each point $p$ in a smooth manner. Such a cotangent field is better known as a one-form; they map vector fields to real numbers. The set of all one-forms on $M$ is denoted ${\mathrm{\Lambda}}^{1}(M)$.
There is a particularly simple way to construct a one-form. Take a function $f\in {C}^{\mathrm{\infty}}(M)$ and define $df\in {\mathrm{\Lambda}}^{1}(M)$ by
$df(X)=X(f)$ | (2.71) |
We can use this method to build a basis for ${\mathrm{\Lambda}}^{1}(M)$. Introduce coordinates ${x}^{\mu}$ on $M$, with the corresponding coordinate basis ${e}_{\mu}=\partial /\partial {x}^{\mu}$ of vector fields, which we often write in shorthand as $\partial /\partial {x}^{\mu}\equiv {\partial}_{\mu}$. We then simply take the functions $f={x}^{\mu}$ which, from (2.71), gives
$d{x}^{\mu}({\partial}_{\nu})={\partial}_{\nu}({x}^{\mu})={\delta}_{\nu}^{\mu}$ |
This means that ${f}^{\mu}=d{x}^{\mu}$ provides a basis for ${\mathrm{\Lambda}}^{1}(M)$, dual to the coordinate basis ${\partial}_{\mu}$. In general, an arbitrary one-form $\omega \in {\mathrm{\Lambda}}^{1}(M)$ can then be expanded as
$\omega ={\omega}_{\mu}d{x}^{\mu}$ |
In such a basis the one-form $df$ takes the form
$df={\displaystyle \frac{\partial f}{\partial {x}^{\mu}}}d{x}^{\mu}$ | (2.72) |
To see this, we simply need to evaluate
$df(X)={\displaystyle \frac{\partial f}{\partial {x}^{\mu}}}d{x}^{\mu}({X}^{\nu}{\partial}_{\nu})={X}^{\mu}{\displaystyle \frac{\partial f}{\partial {x}^{\mu}}}=X(f)$ |
which agrees with the expected answer (2.71).
As with vector fields, we can look at what happens if we change coordinates. Given two different charts, $\varphi =({x}^{1},\mathrm{\dots},{x}^{n})$ and $\stackrel{~}{\varphi}=({\stackrel{~}{x}}^{1},\mathrm{\dots},{\stackrel{~}{x}}^{n})$, we know that the basis for vector fields changes as (2.60),
${\displaystyle \frac{\partial}{\partial {\stackrel{~}{x}}^{\mu}}}={\displaystyle \frac{\partial {x}^{\nu}}{\partial {\stackrel{~}{x}}^{\mu}}}{\displaystyle \frac{\partial}{\partial {x}^{\nu}}}$ |
We should take the basis of one-forms to transform in the inverse manner,
$d{\stackrel{~}{x}}^{\mu}={\displaystyle \frac{\partial {\stackrel{~}{x}}^{\mu}}{\partial {x}^{\nu}}}d{x}^{\nu}$ | (2.73) |
This then ensures that
$d{\stackrel{~}{x}}^{\mu}\left({\displaystyle \frac{\partial}{\partial {\stackrel{~}{x}}^{\nu}}}\right)={\displaystyle \frac{\partial {\stackrel{~}{x}}^{\mu}}{\partial {x}^{\rho}}}d{x}^{\rho}\left({\displaystyle \frac{\partial {x}^{\sigma}}{\partial {\stackrel{~}{x}}^{\nu}}}{\displaystyle \frac{\partial}{\partial {x}^{\sigma}}}\right)={\displaystyle \frac{\partial {\stackrel{~}{x}}^{\mu}}{\partial {x}^{\rho}}}{\displaystyle \frac{\partial {x}^{\sigma}}{\partial {\stackrel{~}{x}}^{\nu}}}d{x}^{\rho}\left({\displaystyle \frac{\partial}{\partial {x}^{\sigma}}}\right)={\displaystyle \frac{\partial {\stackrel{~}{x}}^{\mu}}{\partial {x}^{\rho}}}{\displaystyle \frac{\partial {x}^{\sigma}}{\partial {\stackrel{~}{x}}^{\nu}}}{\delta}_{\sigma}^{\rho}$ |
But this is just the multiplication of a matrix and its inverse,
${\displaystyle \frac{\partial {\stackrel{~}{x}}^{\mu}}{\partial {x}^{\rho}}}{\displaystyle \frac{\partial {x}^{\rho}}{\partial {\stackrel{~}{x}}^{\nu}}}={\delta}_{\nu}^{\mu}$ |
So we find that
$d{\stackrel{~}{x}}^{\mu}\left({\displaystyle \frac{\partial}{\partial {\stackrel{~}{x}}^{\nu}}}\right)={\delta}_{\nu}^{\mu}$ |
as it should. We can then expand a one-form $\omega $ in either of these two bases,
$\omega ={\omega}_{\mu}d{x}^{\mu}={\stackrel{~}{\omega}}_{\mu}d{\stackrel{~}{x}}^{\mu}\mathit{\hspace{1em}\hspace{1em}\u2006}\mathrm{with}\mathit{\hspace{1em}\hspace{1em}\u2006}{\stackrel{~}{\omega}}_{\mu}={\displaystyle \frac{\partial {x}^{\nu}}{\partial {\stackrel{~}{x}}^{\mu}}}{\omega}_{\nu}$ | (2.74) |
In the annoying terminology that I can never remember, components of one-forms that transform this way are said to be covariant. Note that, as with vector fields, the placement of the indices means that (2.73) and (2.74) are pretty much the only things that you can write down that make sense.
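To see the transformation rule (2.74) in action, here is a sketch in Python checking it on a concrete one-form: $\omega =df$ with $f={x}^{2}+{y}^{2}$, transformed from Cartesian to polar coordinates (an illustrative choice of function and coordinates).

```python
# In Cartesian coordinates df has components (2x, 2y); transforming with
# (2.74) should give the polar components of df = d(r^2) = 2r dr.
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x, y = r*sp.cos(th), r*sp.sin(th)     # x^nu as functions of tilde-x^mu
polar = (r, th)

omega = (2*x, 2*y)                    # Cartesian components of df

# tilde-omega_mu = (partial x^nu / partial tilde-x^mu) omega_nu
tilde = tuple(
    sp.simplify(sum(sp.diff(c, polar[m])*omega[n]
                    for n, c in enumerate((x, y))))
    for m in range(2)
)
assert tilde == (2*r, 0)              # i.e. df = 2r dr in polar coordinates
```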
In Section 2.2.4, we explained how to construct the Lie derivative, which differentiates a vector field in the direction of a second vector field $X$. This same idea can be adapted to one-forms.
Under a map $\phi :M\to N$, we saw that a vector field $X$ on $M$ can be pushed forwards to a vector field ${\phi}_{\ast}X$ on $N$. In contrast, one-forms go the other way: given a one-form $\omega $ on $N$, we can pull this back to a one-form $({\phi}^{\ast}\omega )$ on $M$, defined by
$({\phi}^{\ast}\omega )(X)=\omega ({\phi}_{\ast}X)$ |
If we introduce coordinates ${x}^{\mu}$ on $M$ and ${y}^{\alpha}$ on $N$ then the components of the pull-back are given by
${({\phi}^{\ast}\omega )}_{\mu}={\omega}_{\alpha}{\displaystyle \frac{\partial {y}^{\alpha}}{\partial {x}^{\mu}}}$ | (2.75) |
We now define the Lie derivative ${\mathcal{L}}_{X}$ acting on one-forms. Again, we use $X$ to generate a flow ${\sigma}_{t}:M\to M$ which, using the pull-back, allows us to compare one-forms at different points. We will denote the cotangent vector $\omega (p)$ as ${\omega}_{p}$. The Lie derivative of a one-form $\omega $ is then defined as
${\mathcal{L}}_{X}\omega =\underset{t\to 0}{lim}{\displaystyle \frac{{({\sigma}_{t}^{\ast}\omega )}_{p}-{\omega}_{p}}{t}}$ | (2.76) |
Note that we pull-back with the map ${\sigma}_{t}$. This is to be contrasted with (2.68) where we pushed forward the tangent vector with the map ${\sigma}_{-t}$ and, as we now show, this difference in minus sign manifests itself in the expression for the Lie derivative. The infinitesimal map ${\sigma}_{t}$ acts on coordinates as ${x}^{\mu}(t)={x}^{\mu}(0)+t{X}^{\mu}+\mathrm{\dots}$ so, from (2.75), the pull-back of a basis vector $d{x}^{\mu}$ is
${\sigma}_{t}^{\ast}d{x}^{\mu}=\left({\delta}_{\nu}^{\mu}+t{\displaystyle \frac{\partial {X}^{\mu}}{\partial {x}^{\nu}}}+\mathrm{\dots}\right)d{x}^{\nu}$ |
Acting on the coordinate basis, we then have
${\mathcal{L}}_{X}(d{x}^{\mu})={\displaystyle \frac{\partial {X}^{\mu}}{\partial {x}^{\nu}}}d{x}^{\nu}$ |
which indeed differs by a minus sign from the corresponding result (2.69) for tangent vectors. Acting on a general one-form $\omega ={\omega}_{\mu}d{x}^{\mu}$, the Lie derivative is
${\mathcal{L}}_{X}\omega =({\mathcal{L}}_{X}{\omega}_{\mu})d{x}^{\mu}+{\omega}_{\nu}{\mathcal{L}}_{X}(d{x}^{\nu})=({X}^{\nu}{\partial}_{\nu}{\omega}_{\mu}+{\omega}_{\nu}{\partial}_{\mu}{X}^{\nu})d{x}^{\mu}$ | (2.77) |
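As a consistency check of the components in (2.77), note that the Lie derivative obeys the Leibniz rule with respect to the pairing of a one-form with a vector field: $X(\omega (Y))=({\mathcal{L}}_{X}\omega )(Y)+\omega ([X,Y])$. The sketch below verifies this in Python; the fields are arbitrary illustrative choices.

```python
# Leibniz check: X(omega(Y)) = (L_X omega)(Y) + omega([X, Y]),
# using (2.77) for L_X omega and (2.63) for the bracket.
import sympy as sp

x, y = sp.symbols('x y')
c = (x, y)

X = (y**2, x)         # X^mu (illustrative)
Y = (x*y, 1)          # Y^mu (illustrative)
w = (sp.exp(x), x*y)  # omega_mu (illustrative)

D = lambda V, f: sum(V[m]*sp.diff(f, c[m]) for m in range(2))

LXw = tuple(D(X, w[m]) + sum(w[n]*sp.diff(X[n], c[m]) for n in range(2))
            for m in range(2))                          # (2.77)
LXY = tuple(D(X, Y[n]) - D(Y, X[n]) for n in range(2))  # [X, Y], from (2.63)

pair = lambda om, V: sum(om[m]*V[m] for m in range(2))
lhs = D(X, pair(w, Y))
rhs = pair(LXw, Y) + pair(w, LXY)
assert sp.simplify(lhs - rhs) == 0
```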
We’ll return to discuss one-forms (and other forms) more in Section 2.4.
A tensor of rank $\mathrm{(}r\mathrm{,}s\mathrm{)}$ at a point $p\in M$ is defined to be a multi-linear map
$T:\stackrel{r}{\stackrel{\u23de}{{T}_{p}^{\ast}(M)\times \mathrm{\dots}\times {T}_{p}^{\ast}(M)}}\times \stackrel{s}{\stackrel{\u23de}{{T}_{p}(M)\times \mathrm{\dots}\times {T}_{p}(M)}}\to \mathbf{R}$ |
Such a tensor is said to have total rank $r+s$.
We’ve seen some examples already. A cotangent vector in ${T}_{p}^{\ast}(M)$ is a tensor of type $(0,1)$, while a tangent vector in ${T}_{p}(M)$ is a tensor of type $(1,0)$ (using the fact that ${T}_{p}(M)={T}_{p}^{\ast \ast}(M)$).
As before, we define a tensor field to be a smooth assignment of an $(r,s)$ tensor to every point $p\in M$.
Given a basis $\{{e}_{\mu}\}$ for vector fields and a dual basis $\{{f}^{\mu}\}$ for one-forms, the components of the tensor are defined to be
$T^{{\mu}_{1}\mathrm{\dots}{\mu}_{r}}{}_{{\nu}_{1}\mathrm{\dots}{\nu}_{s}}=T({f}^{{\mu}_{1}},\mathrm{\dots},{f}^{{\mu}_{r}},{e}_{{\nu}_{1}},\mathrm{\dots},{e}_{{\nu}_{s}})$ |
Note that we deliberately write the string of lower indices after the upper indices. In some sense this is unnecessary, and we don’t lose any information by writing ${T}_{{\nu}_{1}\mathrm{\dots}{\nu}_{s}}^{{\mu}_{1}\mathrm{\dots}{\mu}_{r}}$. Nonetheless, we’ll see later that it’s a useful habit to get into.
On a manifold of dimension $n$, there are ${n}^{r+s}$ such components. For a tensor field, each of these is a function over $M$.
As an example, consider a rank $(2,1)$ tensor. This takes two one-forms, say $\omega $ and $\eta $, together with a vector field $X$, and spits out a real number. In a given basis, this number is
$T(\omega ,\eta ,X)=T({\omega}_{\mu}{f}^{\mu},{\eta}_{\nu}{f}^{\nu},{X}^{\rho}{e}_{\rho})={\omega}_{\mu}{\eta}_{\nu}{X}^{\rho}T({f}^{\mu},{f}^{\nu},{e}_{\rho})=T^{\mu \nu}{}_{\rho}{\omega}_{\mu}{\eta}_{\nu}{X}^{\rho}$ |
Every manifold comes equipped with a natural $(1,1)$ tensor called $\delta $. This takes a one-form $\omega $ and a vector field $X$ and spits out the real number
$\delta (\omega ,X)=\omega (X)\mathit{\hspace{1em}\hspace{1em}\u2006}\Rightarrow \mathit{\hspace{1em}\hspace{1em}\u2006}\delta ({f}^{\mu},{e}_{\nu})={f}^{\mu}({e}_{\nu})={\delta}_{\nu}^{\mu}$ |
which is simply the Kronecker delta.
As with vector fields and one-forms, we can ask how the components of a tensor transform. We will work more generally than before. Consider two bases for the vector fields, $\{{e}_{\mu}\}$ and $\{{\stackrel{~}{e}}_{\mu}\}$, not necessarily coordinate bases, related by
${\stackrel{~}{e}}_{\nu}={A}_{\nu}^{\mu}{e}_{\mu}$ |
for some invertible matrix $A$. The respective dual bases are $\{{f}^{\mu}\}$ and $\{{\stackrel{~}{f}}^{\mu}\}$ are then related by
${\stackrel{~}{f}}^{\rho}={B}_{\sigma}^{\rho}{f}^{\sigma}$ |
such that
${\stackrel{~}{f}}^{\rho}({\stackrel{~}{e}}_{\nu})={A}_{\nu}^{\mu}{B}_{\sigma}^{\rho}{f}^{\sigma}({e}_{\mu})={A}_{\nu}^{\mu}{B}_{\mu}^{\rho}={\delta}_{\nu}^{\rho}\mathit{\hspace{1em}\hspace{1em}\u2006}\Rightarrow \mathit{\hspace{1em}\hspace{1em}\u2006}{B}_{\mu}^{\rho}={({A}^{-1})}_{\mu}^{\rho}$ |
The lower components of a tensor then transform by multiplying by $A$, and the upper components by multiplying by $B={A}^{-1}$. So, for example, a rank $(1,2)$ tensor transforms as
${\stackrel{~}{T}}^{\mu}{}_{\rho \nu}={B}_{\sigma}^{\mu}{A}_{\rho}^{\tau}{A}_{\nu}^{\lambda}T^{\sigma}{}_{\tau \lambda}$ | (2.78) |
When we change between coordinate bases, we have
${A}_{\nu}^{\mu}={\displaystyle \frac{\partial {x}^{\mu}}{\partial {\stackrel{~}{x}}^{\nu}}}\mathit{\hspace{1em}\hspace{1em}\u2006}\mathrm{and}\mathit{\hspace{1em}\hspace{1em}\u2006}{B}_{\nu}^{\mu}={({A}^{-1})}_{\nu}^{\mu}={\displaystyle \frac{\partial {\stackrel{~}{x}}^{\mu}}{\partial {x}^{\nu}}}$ |
You can check that this coincides with our previous results (2.61) and (2.74).
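A numerical sanity check of (2.78) is straightforward: transforming the components with $A$ and $B={A}^{-1}$, and then transforming back with the roles of $A$ and $B$ exchanged, must recover the original tensor. A sketch with an arbitrary invertible matrix:

```python
# Transform a random rank (1,2) tensor with (2.78) and undo the change of basis.
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.normal(size=(n, n)) + n*np.eye(n)   # safely invertible
B = np.linalg.inv(A)
T = rng.normal(size=(n, n, n))              # T^mu_{rho nu}

# tilde-T^mu_{rho nu} = B^mu_sigma A^tau_rho A^lambda_nu T^sigma_{tau lambda}
Tt = np.einsum('ms,tr,ln,stl->mrn', B, A, A, T)

# The inverse change of basis (swap A and B) recovers T
back = np.einsum('ms,tr,ln,stl->mrn', A, B, B, Tt)
assert np.allclose(back, T)
```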
There are a number of operations that we can do on tensor fields to generate further tensors.
First, we can add and subtract tensor fields, or multiply them by functions. This is the statement that the set of tensors at a point $p\in M$ forms a vector space.
Next, there is a way to multiply tensors together to give a tensor of a different type. Given a tensor $S$ of rank $(p,q)$ and a tensor $T$ of rank $(r,s)$, we can form the tensor product, $S\otimes T$, which is a new tensor of rank $(p+r,q+s)$, defined by
$S\otimes T({\omega}_{1},\mathrm{\dots},{\omega}_{p},{\eta}_{1},\mathrm{\dots},{\eta}_{r},{X}_{1},\mathrm{\dots},{X}_{q},{Y}_{1},\mathrm{\dots},{Y}_{s})=S({\omega}_{1},\mathrm{\dots},{\omega}_{p},{X}_{1},\mathrm{\dots},{X}_{q})\,T({\eta}_{1},\mathrm{\dots},{\eta}_{r},{Y}_{1},\mathrm{\dots},{Y}_{s})$ |
In terms of components, this reads
$(S\otimes T)^{{\mu}_{1}\mathrm{\dots}{\mu}_{p}{\nu}_{1}\mathrm{\dots}{\nu}_{r}}{}_{{\rho}_{1}\mathrm{\dots}{\rho}_{q}{\sigma}_{1}\mathrm{\dots}{\sigma}_{s}}=S^{{\mu}_{1}\mathrm{\dots}{\mu}_{p}}{}_{{\rho}_{1}\mathrm{\dots}{\rho}_{q}}T^{{\nu}_{1}\mathrm{\dots}{\nu}_{r}}{}_{{\sigma}_{1}\mathrm{\dots}{\sigma}_{s}}$ | (2.79) |
Given an $(r,s)$ tensor $T$, we can also construct a tensor of lower rank $(r-1,s-1)$ by contraction. To do this, simply replace one of the ${T}_{p}^{\ast}(M)$ entries with a basis element ${f}^{\mu}$, and the corresponding ${T}_{p}(M)$ entry with the dual basis element ${e}_{\mu}$, and then sum over $\mu =1,\mathrm{\dots},n$. So, for example, given a rank $(2,1)$ tensor $T$ we can construct a rank $(1,0)$ tensor $S$ by
$S(\omega )=T(\omega ,{f}^{\mu},{e}_{\mu})$ |
Alternatively, we could construct a (typically) different $(1,0)$ tensor by contracting the other argument, ${S}^{\prime}(\omega )=T({f}^{\mu},\omega ,{e}_{\mu})$. Written in terms of components, contraction simply means that we put an upper index equal to a lower index and sum over them,
${S}^{\mu}=T^{\mu \nu}{}_{\nu}\mathit{\hspace{1em}\hspace{1em}\u2006}\mathrm{and}\mathit{\hspace{1em}\hspace{1em}\u2006}{S}^{\prime \mu}=T^{\nu \mu}{}_{\nu}$ |
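In component form, contraction is just a partial trace, which is easy to demonstrate; the sketch below uses a random tensor to show that the two contractions of a rank $(2,1)$ tensor generally differ.

```python
# Contractions of T^{mu nu}_rho: S^mu = T^{mu nu}_nu and S'^mu = T^{nu mu}_nu.
import numpy as np

rng = np.random.default_rng(2)
T = rng.normal(size=(3, 3, 3))          # T[mu, nu, rho]

S  = np.einsum('mnn->m', T)             # contract second upper index with lower
Sp = np.einsum('nmn->m', T)             # contract first upper index with lower

assert S.shape == Sp.shape == (3,)
assert not np.allclose(S, Sp)           # typically different (1,0) tensors
```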
Our next operation is symmetrisation and anti-symmetrisation. For example, given a $(0,2)$ tensor $T$ we decompose it into two $(0,2)$ tensors, in which the arguments are either symmetrised or anti-symmetrised,
$S(X,Y)={\displaystyle \frac{1}{2}}\left(T(X,Y)+T(Y,X)\right)\quad \mathrm{and}\quad A(X,Y)={\displaystyle \frac{1}{2}}\left(T(X,Y)-T(Y,X)\right)$ |
In index notation, this becomes
${S}_{\mu \nu}={\displaystyle \frac{1}{2}}({T}_{\mu \nu}+{T}_{\nu \mu})\mathit{\hspace{1em}\hspace{1em}\u2006}\mathrm{and}\mathit{\hspace{1em}\hspace{1em}\u2006}{A}_{\mu \nu}={\displaystyle \frac{1}{2}}({T}_{\mu \nu}-{T}_{\nu \mu})$ |
which is just like taking the symmetric and anti-symmetric part of a matrix. We will work with these operations frequently enough to justify introducing some new notation. We define
${T}_{(\mu \nu )}={\displaystyle \frac{1}{2}}({T}_{\mu \nu}+{T}_{\nu \mu})\mathit{\hspace{1em}\hspace{1em}\u2006}\mathrm{and}\mathit{\hspace{1em}\hspace{1em}\u2006}{T}_{[\mu \nu ]}={\displaystyle \frac{1}{2}}({T}_{\mu \nu}-{T}_{\nu \mu})$ |
These operations generalise to other tensors. For example,
$T^{(\mu \nu )\rho}{}_{\sigma}={\displaystyle \frac{1}{2}}\left(T^{\mu \nu \rho}{}_{\sigma}+T^{\nu \mu \rho}{}_{\sigma}\right)$ |
We can also symmetrise or anti-symmetrise over multiple indices, provided that these indices are either all up or all down. If we (anti)-symmetrise over $p$ objects, then we divide by $p!$, which is the number of possible permutations. This normalisation ensures that if we start with a tensor which is already, say, symmetric then further symmetrising doesn’t affect it. In the case of anti-symmetry, we weight each term with the sign of the permutation. So, for example,
$T^{\mu}{}_{(\nu \rho \sigma )}={\displaystyle \frac{1}{3!}}\left(T^{\mu}{}_{\nu \rho \sigma}+T^{\mu}{}_{\rho \nu \sigma}+T^{\mu}{}_{\rho \sigma \nu}+T^{\mu}{}_{\sigma \rho \nu}+T^{\mu}{}_{\sigma \nu \rho}+T^{\mu}{}_{\nu \sigma \rho}\right)$ |
and
$T^{\mu}{}_{[\nu \rho \sigma ]}={\displaystyle \frac{1}{3!}}\left(T^{\mu}{}_{\nu \rho \sigma}-T^{\mu}{}_{\rho \nu \sigma}+T^{\mu}{}_{\rho \sigma \nu}-T^{\mu}{}_{\sigma \rho \nu}+T^{\mu}{}_{\sigma \nu \rho}-T^{\mu}{}_{\nu \sigma \rho}\right)$ |
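The recipe above, summing over all $p!$ permutations with the appropriate signs and dividing by $p!$, can be sketched in code; total anti-symmetrisation of a $(0,3)$ tensor, say, looks like this.

```python
# Anti-symmetrise all indices of a tensor: sum over permutations of the axes,
# weighted by the sign of each permutation, divided by p!.
from itertools import permutations
from math import factorial
import numpy as np

def perm_sign(p):
    """Sign of a permutation, by counting inversions."""
    inv = sum(p[i] > p[j] for i in range(len(p)) for j in range(i+1, len(p)))
    return -1 if inv % 2 else 1

def antisymmetrise(T):
    p = T.ndim
    out = np.zeros_like(T)
    for perm in permutations(range(p)):
        out = out + perm_sign(perm) * np.transpose(T, perm)
    return out / factorial(p)

rng = np.random.default_rng(1)
T = rng.normal(size=(3, 3, 3))
A = antisymmetrise(T)

# Swapping any pair of indices flips the sign ...
assert np.allclose(A, -np.transpose(A, (1, 0, 2)))
# ... and the normalisation makes the operation idempotent: anti-symmetrising
# an already anti-symmetric tensor leaves it unchanged.
assert np.allclose(antisymmetrise(A), A)
```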
There will be times when, annoyingly, we will wish to symmetrise (or anti-symmetrise) over indices which are not adjacent. We introduce vertical bars to exclude certain indices from the symmetry procedure. So, for example,
${T}^{\mu}{}_{[\nu |\rho |\sigma ]}={\displaystyle \frac{1}{2}}({T}^{\mu}{}_{\nu \rho \sigma}-{T}^{\mu}{}_{\sigma \rho \nu})$ |
Finally, given a smooth tensor field $T$ of any rank, we can always take the Lie derivative with respect to a vector field $X$. As we’ve seen previously, under a map $\phi :M\to N$, vector fields are pushed forwards and one-forms are pulled-back. In general, this leaves a tensor of mixed type unsure where to go. However, if $\phi $ is a diffeomorphism then we also have ${\phi}^{-1}:N\to M$ and this allows us to define the push-forward of a tensor $T$ from $M$ to $N$. This acts on one-forms $\omega \in {\mathrm{\Lambda}}^{1}(N)$ and vector fields $X\in \U0001d51b(N)$ and is given by
$({\phi}_{\ast}T)({\omega}_{1},\mathrm{\dots},{\omega}_{r},{X}_{1},\mathrm{\dots},{X}_{s})=T({\phi}^{\ast}{\omega}_{1},\mathrm{\dots},{\phi}^{\ast}{\omega}_{r},({\phi}_{\ast}^{-1}{X}_{1}),\mathrm{\dots},({\phi}_{\ast}^{-1}{X}_{s}))$ |
Here ${\phi}^{\ast}\omega $ are the pull-backs of $\omega $ from $N$ to $M$, while ${\phi}_{\ast}^{-1}X$ are the push-forwards of $X$ from $N$ to $M$.
The Lie derivative of a tensor $T$ along $X$ is then defined as
${\mathcal{L}}_{X}T=\underset{t\to 0}{lim}{\displaystyle \frac{{({({\sigma}_{-t})}_{\ast}T)}_{p}-{T}_{p}}{t}}$ |
where ${\sigma}_{t}$ is the flow generated by $X$. This coincides with our earlier definitions for vector fields in (2.68) and for one-forms in (2.76). (The difference in the ${\sigma}_{-t}$ vs ${\sigma}_{t}$ minus sign in (2.68) and (2.76) is now hiding in the inverse push-forward ${\phi}_{\ast}^{-1}$ that appears in the definition ${\phi}_{\ast}T$.)
Some tensors are more interesting than others. A particularly interesting class are totally anti-symmetric $(0,p)$ tensors fields. These are called $p$-forms. The set of all $p$-forms over a manifold $M$ is denoted ${\mathrm{\Lambda}}^{p}(M)$.
We’ve met some forms before. A 0-form is simply a function. Meanwhile, as we saw previously, a 1-form is another name for a covector. The anti-symmetry means that we can’t have any form of degree $p>n=\mathrm{dim}(M)$. A $p$-form has $\left(\genfrac{}{}{0pt}{}{n}{p}\right)$ different components. Forms in ${\mathrm{\Lambda}}^{n}(M)$ are called top forms.
Given a $p$-form $\omega $ and a $q$-form $\eta $, we can take the tensor product (2.79) to construct a $(p+q)$-tensor. If we anti-symmetrise this, we then get a $(p+q)$-form. This construction is called the wedge product, and is defined by
${(\omega \wedge \eta )}_{{\mu}_{1}\mathrm{\dots}{\mu}_{p}{\nu}_{1}\mathrm{\dots}{\nu}_{q}}={\displaystyle \frac{(p+q)!}{p!q!}}{\omega}_{[{\mu}_{1}\mathrm{\dots}{\mu}_{p}}{\eta}_{{\nu}_{1}\mathrm{\dots}{\nu}_{q}]}$ |
where the $[\mathrm{\dots}]$ in the subscript tells us to anti-symmetrise over all indices. For example, given $\omega ,\eta \in {\mathrm{\Lambda}}^{1}(M)$, we can construct a 2-form
${(\omega \wedge \eta )}_{\mu \nu}={\omega}_{\mu}{\eta}_{\nu}-{\omega}_{\nu}{\eta}_{\mu}$ |
For one-forms, the anti-symmetry ensures that $\omega \wedge \omega =0$. In general, if $\omega \in {\mathrm{\Lambda}}^{p}(M)$ and $\eta \in {\mathrm{\Lambda}}^{q}(M)$, then one can show that
$\omega \wedge \eta ={(-1)}^{pq}\eta \wedge \omega $ |
This means that $\omega \wedge \omega =0$ for any form of odd degree. We can, however, wedge even degree forms with themselves. (Which you know already for 0-forms where the wedge product is just multiplication of functions.)
As a more specific example, consider $M={\mathbf{R}}^{3}$ and $\omega ={\omega}_{\mu}d{x}^{\mu}$ and $\eta ={\eta}_{\mu}d{x}^{\mu}$. We then have
$\begin{aligned}\omega \wedge \eta &=({\omega}_{1}d{x}^{1}+{\omega}_{2}d{x}^{2}+{\omega}_{3}d{x}^{3})\wedge ({\eta}_{1}d{x}^{1}+{\eta}_{2}d{x}^{2}+{\eta}_{3}d{x}^{3})\\ &=({\omega}_{1}{\eta}_{2}-{\omega}_{2}{\eta}_{1})d{x}^{1}\wedge d{x}^{2}+({\omega}_{2}{\eta}_{3}-{\omega}_{3}{\eta}_{2})d{x}^{2}\wedge d{x}^{3}+({\omega}_{3}{\eta}_{1}-{\omega}_{1}{\eta}_{3})d{x}^{3}\wedge d{x}^{1}\end{aligned}$
Notice that the components that arise are precisely those of the cross-product acting on vectors in ${\mathbf{R}}^{3}$. This is no coincidence: what we usually think of as the cross-product between vectors is really a wedge product between forms. We’ll have to wait to Section 3 to understand how to map from one to the other.
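To make the correspondence explicit, here is a small numerical check (the component values are arbitrary) that the three independent components of $\omega \wedge \eta $ in ${\mathbf{R}}^{3}$ are exactly those of the cross product of the component vectors.

```python
# Sketch: the 2-form (w ^ e)_{mu nu} = w_mu e_nu - w_nu e_mu on R^3 has three
# independent components, and they match the cross product of (w1,w2,w3) and
# (e1,e2,e3). The numerical values are arbitrary.
w = [1.0, 2.0, 3.0]   # components w_mu of a one-form
e = [4.0, 5.0, 6.0]   # components e_mu of a second one-form

wedge = [[w[m] * e[nu] - w[nu] * e[m] for nu in range(3)] for m in range(3)]

cross = [w[1] * e[2] - w[2] * e[1],   # (w x e)_1
         w[2] * e[0] - w[0] * e[2],   # (w x e)_2
         w[0] * e[1] - w[1] * e[0]]   # (w x e)_3

# (w ^ e)_{23} = (w x e)_1,  (w ^ e)_{31} = (w x e)_2,  (w ^ e)_{12} = (w x e)_3
assert wedge[1][2] == cross[0]
assert wedge[2][0] == cross[1]
assert wedge[0][1] == cross[2]
```

The remaining components carry no new information: the wedge product is anti-symmetric, so the diagonal vanishes and the lower triangle is minus the upper.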
It can also be shown that the wedge product is associative, meaning
$\omega \wedge (\eta \wedge \lambda )=(\omega \wedge \eta )\wedge \lambda $ |
We can then drop the brackets in any such product.
Given a basis $\{{f}^{\mu}\}$ of ${\mathrm{\Lambda}}^{1}(M)$, a basis of ${\mathrm{\Lambda}}^{p}(M)$ can be constructed by wedge products $\{{f}^{{\mu}_{1}}\wedge \mathrm{\dots}\wedge {f}^{{\mu}_{p}}\}$. We will usually work with the coordinate basis $\{d{x}^{\mu}\}$. This means that any $p$-form $\omega $ can be written locally as
$\omega ={\displaystyle \frac{1}{p!}}{\omega}_{{\mu}_{1}\mathrm{\dots}{\mu}_{p}}d{x}^{{\mu}_{1}}\wedge \mathrm{\dots}\wedge d{x}^{{\mu}_{p}}$ | (2.80) |
Although locally any $p$-form can be written as (2.80), this may not be true globally. This, and related issues, will become of some interest in Section 2.4.3.
We learned in Section 2.3.1 how to construct a one-form $df$ from a function $f$. In a coordinate basis, this one-form has components (2.72),
$df={\displaystyle \frac{\partial f}{\partial {x}^{\mu}}}d{x}^{\mu}$ |
We can extend this definition to higher forms. The exterior derivative is a map
$d:{\mathrm{\Lambda}}^{p}(M)\to {\mathrm{\Lambda}}^{p+1}(M)$ |
In local coordinates (2.80), the exterior derivative acts as
$d\omega ={\displaystyle \frac{1}{p!}}{\displaystyle \frac{\partial {\omega}_{{\mu}_{1}\mathrm{\dots}{\mu}_{p}}}{\partial {x}^{\nu}}}d{x}^{\nu}\wedge d{x}^{{\mu}_{1}}\wedge \mathrm{\dots}\wedge d{x}^{{\mu}_{p}}$ | (2.81)
Equivalently we have
${(d\omega )}_{{\mu}_{1}\mathrm{\dots}{\mu}_{p+1}}=(p+1){\partial}_{[{\mu}_{1}}{\omega}_{{\mu}_{2}\mathrm{\dots}{\mu}_{p+1}]}$ | (2.82) |
Importantly, if we subsequently act with the exterior derivative again, we get
$d(d\omega )=0$ |
because the derivatives are anti-symmetrised and hence vanish. This holds true for any $p$-form, a fact which is sometimes expressed as
${d}^{2}=0$ |
It can be shown that the exterior derivative satisfies a number of further properties,
$d(\omega \wedge \eta )=d\omega \wedge \eta +{(-1)}^{p}\omega \wedge d\eta $, where $\omega \in {\mathrm{\Lambda}}^{p}(M)$.
$d({\phi}^{\ast}\omega )={\phi}^{\ast}(d\omega )$ where ${\phi}^{\ast}$ is the pull-back associated to the map between manifolds, $\phi :M\to N$
Because the exterior derivative commutes with the pull-back, it also commutes with the Lie derivative. This ensures that we have $d({\mathcal{L}}_{X}\omega )={\mathcal{L}}_{X}(d\omega )$.
A $p$-form $\omega $ is said to be closed if $d\omega =0$ everywhere. It is exact if $\omega =d\eta $ everywhere for some $\eta $. Because ${d}^{2}=0$, an exact form is necessarily closed. The question of when the converse is true is interesting: we’ll discuss this more in Section 2.4.3.
Suppose that we have a one-form $\omega ={\omega}_{\mu}d{x}^{\mu}$. The exterior derivative then gives a 2-form
${(d\omega )}_{\mu \nu}={\partial}_{\mu}{\omega}_{\nu}-{\partial}_{\nu}{\omega}_{\mu}\mathit{\hspace{1em}\hspace{1em}\u2006}\Rightarrow \mathit{\hspace{1em}\hspace{1em}\u2006}d\omega ={\displaystyle \frac{1}{2}}({\partial}_{\mu}{\omega}_{\nu}-{\partial}_{\nu}{\omega}_{\mu})d{x}^{\mu}\wedge d{x}^{\nu}$ |
As a specific instance of this, suppose that we take the one-form to live on ${\mathbf{R}}^{3}$, with
$\omega ={\omega}_{1}d{x}^{1}+{\omega}_{2}d{x}^{2}+{\omega}_{3}d{x}^{3}$ |
Since this is a field, each of the components ${\omega}_{\mu}$ is a function of ${x}^{1}$, ${x}^{2}$ and ${x}^{3}$. The exterior derivative is given by
$\begin{aligned}d\omega &={\partial}_{2}{\omega}_{1}\,d{x}^{2}\wedge d{x}^{1}+{\partial}_{3}{\omega}_{1}\,d{x}^{3}\wedge d{x}^{1}+{\partial}_{1}{\omega}_{2}\,d{x}^{1}\wedge d{x}^{2}+{\partial}_{3}{\omega}_{2}\,d{x}^{3}\wedge d{x}^{2}+{\partial}_{1}{\omega}_{3}\,d{x}^{1}\wedge d{x}^{3}+{\partial}_{2}{\omega}_{3}\,d{x}^{2}\wedge d{x}^{3}\\ &=({\partial}_{1}{\omega}_{2}-{\partial}_{2}{\omega}_{1})d{x}^{1}\wedge d{x}^{2}+({\partial}_{2}{\omega}_{3}-{\partial}_{3}{\omega}_{2})d{x}^{2}\wedge d{x}^{3}+({\partial}_{3}{\omega}_{1}-{\partial}_{1}{\omega}_{3})d{x}^{3}\wedge d{x}^{1}\end{aligned}$
Notice that there’s no term like ${\partial}_{1}{\omega}_{1}$ because this would come with a $d{x}^{1}\wedge d{x}^{1}=0$.
In the olden days (before this course), we used to write vector fields in ${\mathbf{R}}^{3}$ as $\bm{\omega}=({\omega}^{1},{\omega}^{2},{\omega}^{3})$ and compute the curl $\nabla \times \bm{\omega}$. But the components of the curl are precisely the components that appear in $d\omega $. In fact, our “vector” $\bm{\omega}$ was really a one-form and the curl turned it into a two-form. It’s a happy fact that in ${\mathbf{R}}^{3}$, vectors, one-forms and two-forms all have three components, which allowed us to conflate them in our earlier courses. (In fact, there is a natural map between them that we will meet in Section 3.)
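Both statements above can be verified symbolically. The sketch below (assuming sympy is available; the one-form and function are arbitrary choices for illustration) computes the components ${(d\omega )}_{\mu \nu}={\partial}_{\mu}{\omega}_{\nu}-{\partial}_{\nu}{\omega}_{\mu}$ on ${\mathbf{R}}^{3}$, confirms that they assemble into the curl, and checks that $d(df)=0$ follows from the symmetry of mixed partial derivatives.

```python
# Sketch (sympy assumed available): components of d(omega) on R^3 vs the curl,
# and a check that d(df) = 0. The fields w_mu and f are arbitrary.
import sympy as sp

x = sp.symbols('x1 x2 x3')
w = (x[1] * x[2], sp.sin(x[0]) * x[2], x[0]**2 + x[1])  # one-form components

# (dw)_{mu nu} = d_mu w_nu - d_nu w_mu
dw = [[sp.diff(w[nu], x[mu]) - sp.diff(w[mu], x[nu]) for nu in range(3)]
      for mu in range(3)]

# old-school curl of the "vector" (w1, w2, w3)
curl = (sp.diff(w[2], x[1]) - sp.diff(w[1], x[2]),
        sp.diff(w[0], x[2]) - sp.diff(w[2], x[0]),
        sp.diff(w[1], x[0]) - sp.diff(w[0], x[1]))

# (dw)_{23} = (curl w)_1, and cyclically
assert sp.simplify(dw[1][2] - curl[0]) == 0

# d^2 = 0: the components of d(df) vanish by symmetry of mixed partials
f = sp.exp(x[0]) * x[1] * sp.cos(x[2])
df = [sp.diff(f, x[mu]) for mu in range(3)]
ddf = [[sp.diff(df[nu], x[mu]) - sp.diff(df[mu], x[nu]) for nu in range(3)]
       for mu in range(3)]
assert all(sp.simplify(c) == 0 for row in ddf for c in row)
```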
Suppose instead that we start with a 2-form $B$ in ${\mathbf{R}}^{3}$, which we write as
$B={B}_{1}d{x}^{2}\wedge d{x}^{3}+{B}_{2}d{x}^{3}\wedge d{x}^{1}+{B}_{3}d{x}^{1}\wedge d{x}^{2}$ |
Taking the exterior derivative now gives
$\begin{aligned}dB&={\partial}_{1}{B}_{1}\,d{x}^{1}\wedge d{x}^{2}\wedge d{x}^{3}+{\partial}_{2}{B}_{2}\,d{x}^{2}\wedge d{x}^{3}\wedge d{x}^{1}+{\partial}_{3}{B}_{3}\,d{x}^{3}\wedge d{x}^{1}\wedge d{x}^{2}\\ &=\left({\partial}_{1}{B}_{1}+{\partial}_{2}{B}_{2}+{\partial}_{3}{B}_{3}\right)d{x}^{1}\wedge d{x}^{2}\wedge d{x}^{3}\end{aligned}$ | (2.84)
This time there is just a single component, but again it’s something familiar. Had we written the original three components of the two-form in old school vector notation $\mathbf{B}=({B}_{1},{B}_{2},{B}_{3})$, then the single component of $dB$ is what we previously called $\nabla \cdot \mathbf{B}$.
There is yet another operation that we can construct on $p$-forms. Given a vector field $X\in \U0001d51b(M)$, we can construct the interior product, a map ${\iota}_{X}:{\mathrm{\Lambda}}^{p}(M)\to {\mathrm{\Lambda}}^{p-1}(M)$. If $\omega \in {\mathrm{\Lambda}}^{p}(M)$, we define ${\iota}_{X}\omega \in {\mathrm{\Lambda}}^{p-1}(M)$ by
${\iota}_{X}\omega ({Y}_{1},\mathrm{\dots},{Y}_{p-1})=\omega (X,{Y}_{1},\mathrm{\dots},{Y}_{p-1})$ | (2.85) |
In other words, we just put $X$ in the first argument of $\omega $. Acting on functions $f$, we simply define ${\iota}_{X}f=0$.
The anti-symmetry of forms means that ${\iota}_{X}{\iota}_{Y}=-{\iota}_{Y}{\iota}_{X}$. Moreover, you can check that
${\iota}_{X}(\omega \wedge \eta )={\iota}_{X}\omega \wedge \eta +{(-1)}^{p}\omega \wedge {\iota}_{X}\eta $ |
where $\omega \in {\mathrm{\Lambda}}^{p}(M)$.
Consider a 1-form $\omega $. There are two different ways to act with ${\iota}_{X}$ and $d$ to give us back a one-form. These are
${\iota}_{X}d\omega ={\iota}_{X}{\displaystyle \frac{1}{2}}({\partial}_{\mu}{\omega}_{\nu}-{\partial}_{\nu}{\omega}_{\mu})d{x}^{\mu}\wedge d{x}^{\nu}={X}^{\mu}{\partial}_{\mu}{\omega}_{\nu}d{x}^{\nu}-{X}^{\nu}{\partial}_{\mu}{\omega}_{\nu}d{x}^{\mu}$ |
and
$d{\iota}_{X}\omega =d({\omega}_{\mu}{X}^{\mu})={X}^{\mu}{\partial}_{\nu}{\omega}_{\mu}d{x}^{\nu}+{\omega}_{\mu}{\partial}_{\nu}{X}^{\mu}d{x}^{\nu}$ |
Adding the two together gives
$(d{\iota}_{X}+{\iota}_{X}d)\omega =({X}^{\mu}{\partial}_{\mu}{\omega}_{\nu}+{\omega}_{\mu}{\partial}_{\nu}{X}^{\mu})d{x}^{\nu}$ |
But this is exactly the same expression we saw previously when computing the Lie derivative (2.77) of a one-form. We learn that
${\mathcal{L}}_{X}\omega =(d{\iota}_{X}+{\iota}_{X}d)\omega $ | (2.86) |
This expression is sometimes referred to as Cartan’s magic formula. A similar calculation shows that (2.86) holds for any $p$-form $\omega $.
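Cartan’s magic formula is straightforward to verify in components. The sketch below (assuming sympy; the vector field and one-form on ${\mathbf{R}}^{2}$ are arbitrary) computes ${(d{\iota}_{X}\omega )}_{\nu}={\partial}_{\nu}({X}^{\mu}{\omega}_{\mu})$ and ${({\iota}_{X}d\omega )}_{\nu}={X}^{\mu}({\partial}_{\mu}{\omega}_{\nu}-{\partial}_{\nu}{\omega}_{\mu})$, and compares their sum with ${({\mathcal{L}}_{X}\omega )}_{\nu}={X}^{\mu}{\partial}_{\mu}{\omega}_{\nu}+{\omega}_{\mu}{\partial}_{\nu}{X}^{\mu}$ from (2.77).

```python
# Sketch (sympy assumed): Cartan's magic formula for a one-form on R^2,
# checked component by component. X and omega are arbitrary smooth fields.
import sympy as sp

x = sp.symbols('x1 x2')
X = (x[0]**2 * x[1], sp.sin(x[0]) + x[1])   # vector field components X^mu
w = (sp.exp(x[1]), x[0] * x[1]**3)          # one-form components w_mu

def cartan(nu):
    # (d iota_X w)_nu + (iota_X d w)_nu
    d_iota = sp.diff(sum(X[m] * w[m] for m in range(2)), x[nu])
    iota_d = sum(X[m] * (sp.diff(w[nu], x[m]) - sp.diff(w[m], x[nu]))
                 for m in range(2))
    return d_iota + iota_d

def lie(nu):
    # (L_X w)_nu = X^mu d_mu w_nu + w_mu d_nu X^mu
    return sum(X[m] * sp.diff(w[nu], x[m]) + w[m] * sp.diff(X[m], x[nu])
               for m in range(2))

assert all(sp.simplify(cartan(nu) - lie(nu)) == 0 for nu in range(2))
```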
There are a number of examples of differential forms that you’ve met already, but likely never called them by name.
The electromagnetic gauge field ${A}_{\mu}=(\varphi ,\mathbf{A})$ should really be thought of as the components of a one-form on spacetime ${\mathbf{R}}^{4}$. (Here I’ve set $c=1$.) We write
$A={A}_{\mu}(x)d{x}^{\mu}$ |
Taking the exterior derivative yields a 2-form $F=dA$, given by
$F={\displaystyle \frac{1}{2}}{F}_{\mu \nu}d{x}^{\mu}\wedge d{x}^{\nu}={\displaystyle \frac{1}{2}}({\partial}_{\mu}{A}_{\nu}-{\partial}_{\nu}{A}_{\mu})d{x}^{\mu}\wedge d{x}^{\nu}$ |
But this is precisely the field strength ${F}_{\mu \nu}={\partial}_{\mu}{A}_{\nu}-{\partial}_{\nu}{A}_{\mu}$ that we met in our lectures on Electromagnetism. The components are the electric and magnetic fields, arranged as
${F}_{\mu \nu}=\left(\begin{array}{cccc}\hfill 0\hfill & \hfill -{E}_{1}\hfill & \hfill -{E}_{2}\hfill & \hfill -{E}_{3}\hfill \\ \hfill {E}_{1}\hfill & \hfill 0\hfill & \hfill {B}_{3}\hfill & \hfill -{B}_{2}\hfill \\ \hfill {E}_{2}\hfill & \hfill -{B}_{3}\hfill & \hfill 0\hfill & \hfill {B}_{1}\hfill \\ \hfill {E}_{3}\hfill & \hfill {B}_{2}\hfill & \hfill -{B}_{1}\hfill & \hfill 0\hfill \end{array}\right)$ | (2.87) |
By construction, we also have $dF={d}^{2}A=0$. In this context, this is sometimes called the Bianchi identity; it yields two of the four Maxwell equations. In old school vector calculus notation, these are $\nabla \cdot \mathbf{B}=0$ and $\nabla \times \mathbf{E}+\partial \mathbf{B}/\partial t=0$. We need a little more structure to get the other two as we will see later in this chapter.
The gauge field $A$ is not unique. Given any function $\alpha $, we can always shift it by a gauge transformation
$A\to A+d\alpha \mathit{\hspace{1em}\hspace{1em}\u2006}\Rightarrow \mathit{\hspace{1em}\hspace{1em}\u2006}{A}_{\mu}\to {A}_{\mu}+{\partial}_{\mu}\alpha $ |
This leaves the field strength invariant because $F\to F+d(d\alpha )=F$.
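This invariance can be checked symbolically. The sketch below (assuming sympy; the gauge field $A$ and gauge parameter $\alpha $ are arbitrary functions on ${\mathbf{R}}^{4}$) shifts ${A}_{\mu}\to {A}_{\mu}+{\partial}_{\mu}\alpha $ and confirms that every component of ${F}_{\mu \nu}={\partial}_{\mu}{A}_{\nu}-{\partial}_{\nu}{A}_{\mu}$ is unchanged.

```python
# Sketch (sympy assumed): gauge invariance of the field strength on R^4.
# A_mu and alpha are arbitrary smooth functions, chosen for illustration.
import sympy as sp

x = sp.symbols('x0 x1 x2 x3')
A = (x[1] * x[2], sp.sin(x[0]), x[3]**2, sp.exp(x[1]) * x[0])
alpha = x[0] * x[1] * sp.cos(x[2] + x[3])

def F(A):
    # F_{mu nu} = d_mu A_nu - d_nu A_mu
    return [[sp.diff(A[n], x[m]) - sp.diff(A[m], x[n]) for n in range(4)]
            for m in range(4)]

A_shift = tuple(A[m] + sp.diff(alpha, x[m]) for m in range(4))

# F is invariant because d(d alpha) = 0
assert all(sp.simplify(F(A_shift)[m][n] - F(A)[m][n]) == 0
           for m in range(4) for n in range(4))
```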
In classical mechanics, the phase space is a manifold $M$ parameterised by coordinates $({q}^{i},{p}_{j})$ where ${q}^{i}$ are the positions of particles and ${p}_{j}$ the momenta. Recall from our lectures on Classical Dynamics that the Hamiltonian $H(q,p)$ is a function on $M$, and Hamilton’s equations are
${\dot{q}}^{i}={\displaystyle \frac{\partial H}{\partial {p}_{i}}}\mathit{\hspace{1em}\hspace{1em}\u2006}\mathrm{and}\mathit{\hspace{1em}\hspace{1em}\u2006}{\dot{p}}_{i}=-{\displaystyle \frac{\partial H}{\partial {q}^{i}}}$ | (2.88) |
Phase space also comes with a structure called a Poisson bracket, defined on a pair of functions $f$ and $g$ as
$\mathrm{\{}f,g\}={\displaystyle \frac{\partial f}{\partial {q}^{j}}}{\displaystyle \frac{\partial g}{\partial {p}_{j}}}-{\displaystyle \frac{\partial f}{\partial {p}_{j}}}{\displaystyle \frac{\partial g}{\partial {q}^{j}}}$ |
Then the time evolution of any function $f$ can be written as
$\dot{f}=\{f,H\}$ |
which reproduces Hamilton’s equations if we take $f={q}^{i}$ or $f={p}_{i}$.
Underlying this story is the mathematical structure of forms. The key idea is that we have a manifold $M$ and a function $H$ on $M$. We want a machinery that turns the function $H$ into a vector field ${X}_{H}$. Particles then follow trajectories in phase space that are integral curves generated by ${X}_{H}$.
To achieve this, we introduce a symplectic two-form $\omega $ on an even-dimensional manifold $M$. This two form must be closed, $d\omega =0$, and non-degenerate, which means that the top form $\omega \wedge \mathrm{\dots}\wedge \omega \ne 0$. We’ll see why we need these requirements as we go along. A manifold $M$ equipped with a symplectic two-form is called a symplectic manifold.
Any 2-form provides a map $\omega :{T}_{p}(M)\to {T}_{p}^{\ast}(M)$. Given a vector field $X\in \U0001d51b(M)$, we can simply take the interior product with $\omega $ to get a one-form ${\iota}_{X}\omega $. However, we want to go the other way: given a function $H$, we can always construct a one-form $dH$, and we’d like to exchange this for a vector field. We can do this if the map $\omega :{T}_{p}(M)\to {T}_{p}^{\ast}(M)$ is actually an isomorphism, so the inverse exists. This turns out to be true provided that $\omega $ is non-degenerate. In this case, we can define the vector field ${X}_{H}$ by solving the equation
${\iota}_{{X}_{H}}\omega =-dH$ | (2.89) |
If we introduce coordinates ${x}^{\mu}$ on the manifold, then the component form of this equation is
${X}_{H}^{\mu}{\omega}_{\mu \nu}=-{\partial}_{\nu}H$ |
We denote the inverse as ${\omega}^{\mu \nu}=-{\omega}^{\nu \mu}$ such that ${\omega}^{\mu \nu}{\omega}_{\nu \rho}={\delta}_{\rho}^{\mu}$. The components of the vector field are then
${X}_{H}^{\mu}=-{\omega}^{\nu \mu}{\partial}_{\nu}H={\omega}^{\mu \nu}{\partial}_{\nu}H$ |
The integral curves generated by ${X}_{H}$ obey the differential equation (2.64)
${\displaystyle \frac{d{x}^{\mu}}{dt}}={X}_{H}^{\mu}={\omega}^{\mu \nu}{\partial}_{\nu}H$ |
These are the general form of Hamilton’s equations. They reduce to our earlier form (2.88) if we write ${x}^{\mu}=({q}^{i},{p}_{j})$ and choose the symplectic form to have block diagonal form ${\omega}^{\mu \nu}=\left(\begin{array}{cc}\hfill 0\hfill & \hfill 1\hfill \\ \hfill -1\hfill & \hfill 0\hfill \end{array}\right)$.
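As a minimal sketch of this reduction (pure Python; the harmonic oscillator Hamiltonian $H=({p}^{2}+{q}^{2})/2$ is an arbitrary example, not taken from the notes), we can check that ${X}_{H}^{\mu}={\omega}^{\mu \nu}{\partial}_{\nu}H$ with the block-diagonal ${\omega}^{\mu \nu}$ reproduces Hamilton’s equations (2.88).

```python
# Sketch: X_H^mu = omega^{mu nu} d_nu H on phase space x = (q, p), with the
# block form omega^{mu nu} = ((0, 1), (-1, 0)). H = (p^2 + q^2)/2 is an
# arbitrary example Hamiltonian.
def dH(x):
    q, p = x
    return (q, p)          # (dH/dq, dH/dp) for H = (p^2 + q^2)/2

omega_inv = ((0.0, 1.0), (-1.0, 0.0))   # the components omega^{mu nu}

def X_H(x):
    grad = dH(x)
    return tuple(sum(omega_inv[mu][nu] * grad[nu] for nu in range(2))
                 for mu in range(2))

# At (q, p) = (1, 2): qdot = dH/dp = p = 2 and pdot = -dH/dq = -q = -1.
assert X_H((1.0, 2.0)) == (2.0, -1.0)
```

Note also that the flow conserves the Hamiltonian: the anti-symmetry of ${\omega}^{\mu \nu}$ gives ${\partial}_{\mu}H\,{X}_{H}^{\mu}=0$ at every point.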
To define the Poisson structure, we first note that we can repeat the map (2.89) to turn any function $f$ into a vector field ${X}_{f}$ obeying ${\iota}_{{X}_{f}}\omega =-df$. But we can then feed these vector fields back into the original 2-form $\omega $. This gives us a Poisson bracket,
$\mathrm{\{}f,g\}=\omega ({X}_{g},{X}_{f})=-\omega ({X}_{f},{X}_{g})$ |
Or, in components,
$\mathrm{\{}f,g\}=\omega {}^{\mu \nu}\partial {}_{\mu}f\partial {}_{\nu}g$ |
There are many other ways to write this Poisson bracket structure in invariant form. For example, backtracking through various definitions we find
$\mathrm{\{}f,g\}=-\iota {}_{{X}_{f}}\omega ({X}_{g})=df({X}_{g})=X{}_{g}(f)$ |
The equation of motion in Poisson bracket structure is then
$\dot{f}=\{f,H\}={X}_{H}(f)={\mathcal{L}}_{{X}_{H}}f$ |
which tells us that the Lie derivative along ${X}_{H}$ generates time evolution.
We haven’t yet explained why the symplectic two-form must be closed, $d\omega =0$. You can check that this is needed so that the Poisson bracket obeys the Jacobi identity. Alternatively, it ensures that the symplectic form itself is invariant under Hamiltonian flow, in the sense that ${\mathcal{L}}_{{X}_{H}}\omega =0$. To see this, we use (2.86)
${\mathcal{L}}_{{X}_{H}}\omega =(d{\iota}_{{X}_{H}}+{\iota}_{{X}_{H}}d)\omega ={\iota}_{{X}_{H}}d\omega $
The second equality follows from the fact that $d{\iota}_{{X}_{H}}\omega =-d(dH)=0$. If we insist that $d\omega =0$ then we find ${\mathcal{L}}_{{X}_{H}}\omega =0$ as promised.
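For the canonical bracket, which comes from a constant (and hence closed) symplectic form, the Jacobi identity can be checked directly. The sketch below (assuming sympy; the three functions are arbitrary test functions) verifies $\{f,\{g,h\}\}+\{g,\{h,f\}\}+\{h,\{f,g\}\}=0$.

```python
# Sketch (sympy assumed): Jacobi identity for the canonical Poisson bracket
# {f, g} = df/dq dg/dp - df/dp dg/dq. f, g, h are arbitrary test functions.
import sympy as sp

q, p = sp.symbols('q p')

def pb(f, g):
    return sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q)

f = q**2 * p
g = sp.sin(q) + p**3
h = q * p + sp.exp(p)

jacobi = pb(f, pb(g, h)) + pb(g, pb(h, f)) + pb(h, pb(f, g))
assert sp.simplify(jacobi) == 0
```

Repeating the computation with a position-dependent, non-closed "bracket" would spoil the identity, which is one way to see why $d\omega =0$ is required.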
The state space of a thermodynamic system is a manifold $M$. For the ideal gas, this is a two-dimensional manifold with coordinates provided by, for example, the pressure $p$ and volume $V$. More complicated systems can have a higher dimensional state space, but the dimension is always even since, as in classical mechanics, thermodynamic variables come in conjugate pairs.
When we first learn the laws of thermodynamics, we have to make up some strange new notation, đ, which then never rears its head again. For example, the first law of thermodynamics is written as
$dE=đQ+đW$ |
Here $dE$ is the infinitesimal change of energy in the system. The first law of thermodynamics, as written above, states that this decomposes into the heat flowing into the system, $đQ$, and the work done on the system, $đW$.
Why the stupid notation? Well, the energy $E(p,V)$ is a function over the state space $M$ and this means that we can write the change of the energy as $dE=\frac{\partial E}{\partial p}dp+\frac{\partial E}{\partial V}dV$. But there is no such function $Q(p,V)$ or $W(p,V)$ and, correspondingly, $đQ$ and $đW$ are not exact differentials. Indeed, we have $đW=-pdV$ and later, after we introduce the second law, we learn that $đQ=TdS$, with $T$ the temperature and $S$ the entropy, both of which are functions over $M$.
This is much more natural in the language of forms. All of the terms in the first law are one-forms. But the transfer of heat $đQ$ and the work $đW$ are not exact one-forms and so can’t be written as $d(\mathrm{something})$. In contrast, $dE$ is an exact one-form. That’s what the đ notation is really telling us: it’s a way of denoting non-exact one-forms from before we had the language of differential geometry.
The real purpose of the first law of thermodynamics is to define the energy functional $E$. The ${19}^{\mathrm{th}}$ century version of the statement is something like: “the amount of work required to change an isolated system is independent of how the work is performed”. A more modern rendering would be: “the sum of the work and heat is an exact one-form”.
The exterior derivative is a map which squares to zero, ${d}^{2}=0$. It turns out that one can have a lot of fun with such maps. We will now explore a little bit of this fun.
First a repeat of definitions we met already: a $p$-form is closed if $d\omega =0$ everywhere. A $p$-form is exact if $\omega =d\eta $ everywhere for some $\eta $. Because ${d}^{2}=0$, exact implies closed. However, the converse is not necessarily true. It turns out that the way in which closed forms fail to be exact captures interesting facts about the topology of the underlying manifold.
We’ve met this kind of question before. In electromagnetism, we have a magnetic field $\mathbf{B}$ which obeys $\nabla \cdot \mathbf{B}=0$. We then argue that this means we can write the magnetic field as $\mathbf{B}=\nabla \times \mathbf{A}$. This is more properly expressed in the language of forms. As we saw in the previous section, the magnetic field is really a 2-form
$B={B}_{1}d{x}^{2}\wedge d{x}^{3}+{B}_{2}d{x}^{3}\wedge d{x}^{1}+{B}_{3}d{x}^{1}\wedge d{x}^{2}$ |
We computed the exterior derivative in (2.84); it is
$dB=\left({\partial}_{1}{B}_{1}+{\partial}_{2}{B}_{2}+{\partial}_{3}{B}_{3}\right)d{x}^{1}\wedge d{x}^{2}\wedge d{x}^{3}$ |
We see that the Maxwell equation $\nabla \cdot \mathbf{B}=0$ is really the statement that $B$ is a closed two-form, obeying $dB=0$. We also saw in (2.4) that if we write $B=dA$ for some one-form $A$, then the components are given by $\mathbf{B}=\nabla \times \mathbf{A}$. Clearly writing $B=dA$ ensures that $dB=0$. But when is the converse true? We have the following statement (which we leave unproven):
Theorem (The Poincaré Lemma): On $M={\mathbf{R}}^{n}$, closed implies exact.
Since we’ve spent a lot of time mapping manifolds to ${\mathbf{R}}^{n}$, this also has consequences for a general manifold $M$. It means that if $\omega $ is a closed $p$-form, then in any neighbourhood $\mathcal{O}\subset M$ it is always possible to find an $\eta \in {\mathrm{\Lambda}}^{p-1}(M)$ such that $\omega =d\eta $ on $\mathcal{O}$. The catch is that it may not be possible to find such an $\eta $ everywhere on the manifold.
Consider the one-dimensional manifold $M=\mathbf{R}$. We can take a one-form $\omega =f(x)dx$. This is always closed because it is a top form. It is also exact. We introduce the function
$g(x)={\displaystyle {\int}_{0}^{x}}\mathit{d}{x}^{\prime}f({x}^{\prime})$ |
Then $\omega =dg$.
Now consider the topologically more interesting one-dimensional manifold ${\mathbf{S}}^{1}$, which we can view as the phase ${e}^{i\theta}\in \mathbf{C}$. We can introduce the form $\omega =d\theta $ on ${\mathbf{S}}^{1}$. The way it’s written makes it look like it’s an exact form, but this is an illusion because, as we stressed in Section 2.1, $\theta $ is not a good coordinate everywhere on ${\mathbf{S}}^{1}$ because it’s not single-valued. Indeed, it’s simple to see that there is no single-valued function $g(\theta )$ on ${\mathbf{S}}^{1}$ such that $\omega =dg$. So on ${\mathbf{S}}^{1}$, we can construct a form which, locally, can be written as $d\theta $ but globally cannot be written as $d(\mathrm{something})$. So we have a form that is closed but not exact.
On $M={\mathbf{R}}^{2}$, the Poincaré lemma ensures that all closed forms are exact. However, things change if we remove a single point and consider ${\mathbf{R}}^{2}-\{0,0\}$. Consider the one-form defined by
$\omega =-{\displaystyle \frac{y}{{x}^{2}+{y}^{2}}}dx+{\displaystyle \frac{x}{{x}^{2}+{y}^{2}}}dy$ |
This is not a smooth one-form on ${\mathbf{R}}^{2}$ because of the divergence at the origin. But removing that point means that $\omega $ becomes acceptable. We can check that $\omega $ is closed,
$d\omega =-{\displaystyle \frac{\partial}{\partial y}}\left({\displaystyle \frac{y}{{x}^{2}+{y}^{2}}}\right)dy\wedge dx+{\displaystyle \frac{\partial}{\partial x}}\left({\displaystyle \frac{x}{{x}^{2}+{y}^{2}}}\right)dx\wedge dy=0$ |
where the $=0$ follows from a little bit of algebra. $\omega $ is exact if we can find a function $f$, defined everywhere on ${\mathbf{R}}^{2}-\{0,0\}$ such that $\omega =df$, which means
$\omega ={\displaystyle \frac{\partial f}{\partial x}}dx+{\displaystyle \frac{\partial f}{\partial y}}dy\mathit{\hspace{1em}\hspace{1em}\u2006}\Rightarrow \mathit{\hspace{1em}\hspace{1em}\u2006}{\displaystyle \frac{\partial f}{\partial x}}=-{\displaystyle \frac{y}{{x}^{2}+{y}^{2}}}\mathit{\hspace{1em}}\mathrm{and}\mathit{\hspace{1em}}{\displaystyle \frac{\partial f}{\partial y}}={\displaystyle \frac{x}{{x}^{2}+{y}^{2}}}$ |
We can certainly integrate these equations; the result is
$f(x,y)={\mathrm{tan}}^{-1}\left({\displaystyle \frac{y}{x}}\right)+\mathrm{constant}$ |
But this is not a smooth function everywhere on ${\mathbf{R}}^{2}-\{0,0\}$. This means that we can’t, in fact, write $\omega =df$ for a well defined function on ${\mathbf{R}}^{2}-\{0,0\}$. We learn that removing a point makes a big difference: now closed no longer implies exact.
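A direct way to see the failure of exactness is to integrate $\omega $ around the unit circle: if $\omega =df$ for a single-valued $f$, the integral would vanish, but instead it gives $2\pi $. The sketch below (pure Python; the midpoint rule and the number of steps are arbitrary choices) performs this integral numerically.

```python
# Sketch: integrating omega = (-y dx + x dy)/(x^2 + y^2) around the unit
# circle. The pull-back along x = cos t, y = sin t is exactly dt, so the
# loop integral is 2*pi rather than zero.
import math

def loop_integral(N=10000):
    total = 0.0
    dt = 2 * math.pi / N
    for k in range(N):
        t = (k + 0.5) * dt                     # midpoint rule
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t) * dt, math.cos(t) * dt
        total += (-y * dx + x * dy) / (x * x + y * y)
    return total

assert abs(loop_integral() - 2 * math.pi) < 1e-9
```

The non-zero answer is the winding number of the loop around the removed point, up to a factor of $2\pi $; any loop that doesn’t enclose the origin would integrate to zero.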
There is a similar story for ${\mathbf{R}}^{3}$. Indeed, this is how magnetic monopoles sneak back into physics, despite being forbidden by the Maxwell equation $\nabla \cdot \mathbf{B}=0$. You can learn more about this in the lectures on Gauge Theory.
We denote the set of all closed $p$-forms on a manifold $M$ as ${Z}^{p}(M)$. Equivalently, ${Z}^{p}(M)$ is the kernel of the map $d:{\mathrm{\Lambda}}^{p}(M)\to {\mathrm{\Lambda}}^{p+1}(M)$.
We denote the set of all exact $p$-forms on a manifold $M$ as ${B}^{p}(M)$. Equivalently, ${B}^{p}(M)$ is the range of $d:{\mathrm{\Lambda}}^{p-1}(M)\to {\mathrm{\Lambda}}^{p}(M)$.
The ${p}^{\mathrm{th}}$ de Rham cohomology group is defined to be
${H}^{p}(M)={Z}^{p}(M)/{B}^{p}(M)$ |
The quotient here is an equivalence class. Two closed forms $\omega ,{\omega}^{\prime}\in {Z}^{p}(M)$ are said to be equivalent if $\omega ={\omega}^{\prime}+\eta $ for some $\eta \in {B}^{p}(M)$. We say that $\omega $ and ${\omega}^{\prime}$ sit in the same equivalence class $[\omega ]$. The cohomology group ${H}^{p}(M)$ is the set of equivalence classes; in other words, it consists of closed forms mod exact forms.
The Betti numbers ${B}_{p}$ of a manifold $M$ are defined as
${B}_{p}=\mathrm{dim}{H}^{p}(M)$ |
It turns out that these are always finite. The Betti number ${B}_{0}=1$ for any connected manifold. This can be traced to the existence of constant functions which are clearly closed but, because there are no $p=-1$ forms, are not exact. The higher Betti numbers are non-zero only if the manifold has some interesting topology. Finally, the Euler character is defined as the alternating sum of Betti numbers,
$\chi (M)={\displaystyle \sum _{p}}{(-1)}^{p}{B}_{p}$ | (2.90) |
Here are some simple examples. We’ve already seen that the circle ${\mathbf{S}}^{1}$ has a closed, non-exact one-form. This means that ${B}_{1}=1$ and $\chi =0$. The sphere ${\mathbf{S}}^{n}$ has ${B}_{0}={B}_{n}=1$, with all other Betti numbers vanishing, and $\chi =1+{(-1)}^{n}$. The torus ${\mathbf{T}}^{n}$ has ${B}_{p}=\left(\genfrac{}{}{0pt}{}{n}{p}\right)$ and $\chi =0$.
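These Betti numbers can be fed into (2.90) in a few lines (pure Python; the Betti numbers themselves are taken as input, as quoted above, rather than computed from any cohomology).

```python
# Sketch: Euler characters from the Betti numbers quoted in the text, via
# chi(M) = sum_p (-1)^p B_p. The Betti numbers are inputs, not computed here.
from math import comb

def euler(betti):
    return sum((-1)**p * b for p, b in enumerate(betti))

def betti_sphere(n):
    # S^n: B_0 = B_n = 1, all others zero
    return [1 if p in (0, n) else 0 for p in range(n + 1)]

def betti_torus(n):
    # T^n: B_p = n choose p
    return [comb(n, p) for p in range(n + 1)]

assert euler([1, 1]) == 0                    # the circle S^1
assert euler(betti_sphere(2)) == 2           # chi(S^n) = 1 + (-1)^n
assert euler(betti_sphere(3)) == 0
assert all(euler(betti_torus(n)) == 0 for n in range(1, 6))
```

The torus result is just the binomial theorem: ${\sum}_{p}{(-1)}^{p}\binom{n}{p}={(1-1)}^{n}=0$ for $n\ge 1$.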
We have learned how to differentiate on manifolds by using a vector field $X$. Now it is time to learn how to integrate. It turns out that the things that we integrate on manifolds are forms.
To start, we need to orient ourselves. A volume form, or orientation, on a manifold of dimension $\mathrm{dim}(M)=n$ is a nowhere-vanishing top form $v$. Any top form has just a single component and can be locally written as
$v=v(x)d{x}^{1}\wedge \mathrm{\dots}\wedge d{x}^{n}$ |
where we require $v(x)\ne 0$. If such a top form exists everywhere on the manifold, then $M$ is said to be orientable.
The orientation is called right-handed if $v(x)>0$ everywhere, and left-handed if $v(x)<0$ everywhere. Given one volume form $v$, we can always construct another by multiplying by a function, giving $\stackrel{~}{v}=fv$ where $f(x)>0$ everywhere or $f(x)<0$ everywhere.
It’s not enough to just write down a volume form with $v(x)\ne 0$ locally. We must also ensure that we can patch these volume forms together over the manifold, without the handedness changing. Suppose that we have two sets of coordinates, ${x}^{\mu}$ and ${\stackrel{~}{x}}^{\mu}$ that overlap on some region. In the new coordinates, the volume form is given by
$v=v(x){\displaystyle \frac{\partial {x}^{1}}{\partial {\stackrel{~}{x}}^{{\mu}_{1}}}}d{\stackrel{~}{x}}^{{\mu}_{1}}\wedge \mathrm{\dots}\wedge {\displaystyle \frac{\partial {x}^{n}}{\partial {\stackrel{~}{x}}^{{\mu}_{n}}}}d{\stackrel{~}{x}}^{{\mu}_{n}}=v(x)det\left({\displaystyle \frac{\partial {x}^{\mu}}{\partial {\stackrel{~}{x}}^{\nu}}}\right)d{\stackrel{~}{x}}^{1}\wedge \mathrm{\dots}\wedge d{\stackrel{~}{x}}^{n}$ |
which has the same orientation provided
$det\left({\displaystyle \frac{\partial {x}^{\mu}}{\partial {\stackrel{~}{x}}^{\nu}}}\right)>0$ | (2.91) |
Non-orientable manifolds cannot be covered by overlapping charts such that (2.91) holds. Examples include the Möbius strip and real projective space ${\mathrm{\mathbf{R}\mathbf{P}}}^{n}$ for $n$ even. (In contrast ${\mathrm{\mathbf{R}\mathbf{P}}}^{n}$ is orientable for $n$ odd, and ${\mathrm{\mathbf{C}\mathbf{P}}}^{n}$ is orientable for all $n$.) In these lectures, we deal only with orientable manifolds.
Given a volume form $v$ on $M$, we can integrate any function $f:M\to \mathbf{R}$ over the manifold. In a chart $\varphi :\mathcal{O}\to U$, with coordinates ${x}^{\mu}$, we have
${\displaystyle {\int}_{\mathcal{O}}}fv={\displaystyle {\int}_{U}}\mathit{d}{x}^{1}\mathrm{\dots}\mathit{d}{x}^{n}\,f(x)v(x)$ |
On the right-hand-side, we’re just doing normal integration over some part of ${\mathbf{R}}^{n}$. The volume form is playing the role of a measure, telling us how much to weight various parts of the integral. To integrate over the entire manifold, we divide the manifold up into different regions, each covered by a single chart. We then perform the integral over each region and sum the results.
We don’t have to integrate over the full manifold $M$. We can integrate over some lower dimensional submanifold.
A manifold $\mathrm{\Sigma}$ of dimension $k<n$ is a submanifold of $M$ if we can find a map $\varphi :\mathrm{\Sigma}\to M$ which is one-to-one (which ensures that $\mathrm{\Sigma}$ doesn’t intersect itself in $M$) and whose push-forward ${\varphi}_{\ast}:{T}_{p}(\mathrm{\Sigma})\to {T}_{\varphi (p)}(M)$ is one-to-one.
We can then integrate a $k$-form $\omega $ on $M$ over a $k$-dimensional submanifold $\mathrm{\Sigma}$. We do this by pulling back the $k$-form to $\mathrm{\Sigma}$ and writing
${\displaystyle {\int}_{\varphi (\mathrm{\Sigma})}}\omega ={\displaystyle {\int}_{\mathrm{\Sigma}}}{\varphi}^{\ast}\omega $ |
For example, suppose that we have a one-form $A$ living over $M$. If $C$ is a one-dimensional manifold, then we can introduce a map $\sigma :C\to M$ which defines a non-intersecting, one-dimensional curve $\sigma (C)$ which is a submanifold of $M$. We can then pull back $A$ onto this curve and integrate to get
${\displaystyle {\int}_{\sigma (C)}}A={\displaystyle {\int}_{C}}{\sigma}^{\ast}A$ |
This probably looks more familiar in coordinates. If the curve traces out a path ${x}^{\mu}(\tau )$ in $M$, we have
${\displaystyle {\int}_{C}}{\sigma}^{\ast}A={\displaystyle \int}\mathit{d}\tau \,{A}_{\mu}(x){\displaystyle \frac{d{x}^{\mu}}{d\tau}}$ |
But this is precisely the way the worldline of a particle couples to the electromagnetic field, as we previously saw in (1.21).
Until now, we have considered only smooth manifolds. There is a slight generalisation that will be useful. We define a manifold with boundary in the same way as a manifold, except the charts map $\varphi :\mathcal{O}\to U$ where $U$ is an open subset of ${\mathbf{R}}^{n+}=\{({x}^{1},\mathrm{\dots},{x}^{n})\text{ such that }{x}^{n}\ge 0\}$. The boundary has co-dimension 1 and is denoted $\partial M$: it is the submanifold with ${x}^{n}=0$.
Consider a manifold $M$ with boundary $\partial M$. If the dimension of the manifold is $\mathrm{dim}(M)=n$ then for any $(n-1)$-form $\omega $, we have the following simple result
$\int_{M} d\omega = \int_{\partial M} \omega \qquad (2.92)$
This is Stokes’ theorem.
We do not prove Stokes’ theorem here. The proof is fairly tedious, and does not differ greatly from the proofs of other results that you’ve called Stokes’ theorem (or Gauss’ divergence theorem) in the past. (See, for example, the lectures on Vector Calculus.) However, the wonderful thing about (2.92) is the way in which it unifies many results in a single elegant formula. To see this, we simply need to look at a few examples.
First, consider $n=1$ with $M$ the interval $I$. We introduce coordinates $x\in [a,b]$ on the interval. The 0-form $\omega =\omega (x)$ is simply a function and $d\omega =(d\omega /dx)dx$. In this case, the two sides of Stokes’ theorem can be evaluated to give
$\int_{M} d\omega = \int_{a}^{b} \frac{d\omega}{dx}\, dx \qquad \text{and} \qquad \int_{\partial M} \omega = \omega(b) - \omega(a)$
Equating the two, we see that Stokes’ theorem is simply a restatement of the fundamental theorem of calculus.
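This $n=1$ instance is easy to verify symbolically. A minimal sketch in Python’s sympy, with an arbitrary 0-form $\omega(x) = x^3 - 2x$ and interval $[1,3]$ chosen purely for illustration:

```python
import sympy as sp

# Check the n = 1 case of Stokes' theorem: the integral of d(omega)
# over [a, b] equals omega evaluated on the boundary, omega(b) - omega(a).
x = sp.symbols('x')
a, b = 1, 3
omega = x**3 - 2*x   # any smooth 0-form will do; chosen for illustration

lhs = sp.integrate(sp.diff(omega, x), (x, a, b))   # int_M d(omega)
rhs = omega.subs(x, b) - omega.subs(x, a)          # int_{dM} omega
print(lhs, rhs)
```

Both sides evaluate to the same number, as the fundamental theorem of calculus guarantees.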
Next, we take $M\subset {\mathbf{R}}^{2}$ to be a manifold with boundary. We introduce a one-form with components
$\omega = \omega_{1}\, dx^{1} + \omega_{2}\, dx^{2} \qquad \Rightarrow \qquad d\omega = \left( \frac{\partial \omega_{2}}{\partial x^{1}} - \frac{\partial \omega_{1}}{\partial x^{2}} \right) dx^{1} \wedge dx^{2}$
In this case, the ingredients in Stokes’ theorem are
$\int_{M} d\omega = \int_{M} \left( \frac{\partial \omega_{2}}{\partial x^{1}} - \frac{\partial \omega_{1}}{\partial x^{2}} \right) dx^{1}\, dx^{2} \qquad \text{and} \qquad \int_{\partial M} \omega = \int_{\partial M} \omega_{1}\, dx^{1} + \omega_{2}\, dx^{2}$
Equating the two gives the result usually referred to as Green’s theorem in the plane.
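Green’s theorem is also straightforward to check symbolically on a concrete region. The sketch below (Python’s sympy; the one-form $\omega = -x_2^2\, dx^1 + x_1 x_2\, dx^2$ and the unit square are invented for illustration) compares the area integral of $d\omega$ with the line integral of $\omega$ around the boundary:

```python
import sympy as sp

# Check Green's theorem on the unit square [0,1] x [0,1], with a
# one-form chosen purely for illustration.
x1, x2, t = sp.symbols('x1 x2 t')
w1, w2 = -x2**2, x1*x2

# Left-hand side: integral of d(omega) over the square
lhs = sp.integrate(sp.diff(w2, x1) - sp.diff(w1, x2),
                   (x1, 0, 1), (x2, 0, 1))

# Right-hand side: line integral of omega around the boundary,
# traversed counterclockwise; each edge is parametrised by t in [0, 1]
edges = [(t, sp.Integer(0)), (sp.Integer(1), t),
         (1 - t, sp.Integer(1)), (sp.Integer(0), 1 - t)]
rhs = sum(sp.integrate(w1.subs({x1: p1, x2: p2}) * sp.diff(p1, t)
                       + w2.subs({x1: p1, x2: p2}) * sp.diff(p2, t),
                       (t, 0, 1))
          for p1, p2 in edges)
print(lhs, rhs)
```

The two sides agree, illustrating (2.92) in the $n=2$ case; note that the counterclockwise traversal is what picks out the orientation of $\partial M$.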
Finally, consider $M\subset {\mathbf{R}}^{3}$ to be a manifold with boundary, with a 2-form
$\omega = \omega_{1}\, dx^{2} \wedge dx^{3} + \omega_{2}\, dx^{3} \wedge dx^{1} + \omega_{3}\, dx^{1} \wedge dx^{2}$
The right-hand side of Stokes’ theorem is
$\int_{\partial M} \omega_{1}\, dx^{2}\, dx^{3} + \omega_{2}\, dx^{3}\, dx^{1} + \omega_{3}\, dx^{1}\, dx^{2}$
Meanwhile, we computed the exterior derivative of a 2-form in (2.84). The left-hand side of Stokes’ theorem then gives
$\int_{M} d\omega = \int_{M} \left( \partial_{1}\omega_{1} + \partial_{2}\omega_{2} + \partial_{3}\omega_{3} \right) dx^{1}\, dx^{2}\, dx^{3}$
This time, equating the two gives us Gauss’ divergence theorem.
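The divergence theorem admits the same kind of symbolic check. A sketch in Python’s sympy, with components $(\omega_1, \omega_2, \omega_3) = (x_1^2,\, x_2 x_3,\, x_3)$ and the unit cube chosen purely for illustration:

```python
import sympy as sp

# Check the divergence theorem on the unit cube [0,1]^3, with 2-form
# components (w1, w2, w3) chosen purely for illustration.
x1, x2, x3 = sp.symbols('x1 x2 x3')
X = [x1, x2, x3]
w = [x1**2, x2*x3, x3]

# Left-hand side: volume integral of the divergence d(omega)
lhs = sp.integrate(sum(sp.diff(wi, xi) for wi, xi in zip(w, X)),
                   (x1, 0, 1), (x2, 0, 1), (x3, 0, 1))

# Right-hand side: net flux through the six faces, with outward
# orientation (the face at x^i = 0 contributes with a minus sign)
rhs = sp.Integer(0)
for i in range(3):
    others = [X[j] for j in range(3) if j != i]
    rhs += sp.integrate(w[i].subs(X[i], 1) - w[i].subs(X[i], 0),
                        (others[0], 0, 1), (others[1], 0, 1))
print(lhs, rhs)
```

Both sides agree, and the relative sign between opposite faces is precisely the outward orientation of $\partial M$ inherited from $M$.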