2 Introducing Differential Geometry

Gravity is geometry. To fully understand this statement, we will need more sophisticated tools and language to describe curved space and, ultimately, curved spacetime. This is the mathematical subject of differential geometry and will be introduced in this section and the next. Armed with these new tools, we will then return to the subject of gravity in Section 4.

Our discussion of differential geometry is not particularly rigorous. We will not prove many big theorems. Furthermore, a number of the statements that we make can be checked straightforwardly but we will often omit this. We will, however, be careful about building up the mathematical structure of curved spaces in the right logical order. As we proceed, we will come across a number of mathematical objects that can live on curved spaces. Many of these are familiar – like vectors, or differential operators – but we’ll see them appear in somewhat unfamiliar guises. The main purpose of this section is to understand what kind of objects can live on curved spaces, and the relationships between them. This will prove useful for both general relativity and other areas of physics.

Moreover, there is a wonderful rigidity to the language of differential geometry. It sometimes feels that any equation that you’re allowed to write down within this rigid structure is more likely than not to be true! This rigidity is going to be of enormous help when we return to discuss theories of gravity in Section 4.

2.1 Manifolds

The stage on which our story will play out is a mathematical object called a manifold. We will give a precise definition below, but for now you should think of a manifold as a curved, n-dimensional space. If you zoom in to any patch, the manifold looks like 𝐑n. But, viewed more globally, the manifold may have interesting curvature or topology.

To begin with, our manifold will have very little structure. For example, initially there will be no way to measure distances between points. But as we proceed, we will describe the various kinds of mathematical objects that can be associated to a manifold, and each one will allow us to do more and more things. It will be a surprisingly long time before we can measure distances between points! (Not until Section 3.)

You have met many manifolds in your education to date, even if you didn’t call them by name. Some simple examples in mathematics include Euclidean space 𝐑n, the sphere 𝐒n, and the torus 𝐓n = 𝐒1 × ⋯ × 𝐒1. Some simple examples in physics include the configuration space and phase space that we use in classical mechanics and the state space of thermodynamics. As we progress, we will see how familiar ideas in these subjects can be expressed in a more formal language. Ultimately our goal is to explain how spacetime is a manifold and to understand the structures that live on it.

2.1.1 Topological Spaces

Even before we get to a manifold, there is some work to do in order to define the underlying object. What follows is the mathematical equivalent of reading a biography about an interesting person and having to spend the first 20 pages wading through a description of what their grandparents did for a living. This backstory will not be particularly useful for our needs and we include it here only for completeness. We’ll keep it down to one page.

Our backstory is called a topological space. Roughly speaking, this is a space in which each point can be viewed as living in a neighbourhood of other points, in a manner that allows us to define concepts such as continuity and convergence.


Definition: A topological space M is a set of points, endowed with a topology 𝒯. This is a collection of open subsets {𝒪α ⊆ M} which obey:

  i) Both the set M and the empty set are open subsets: M ∈ 𝒯 and ∅ ∈ 𝒯.

  ii) The intersection of a finite number of open sets is also an open set: if 𝒪1 ∈ 𝒯 and 𝒪2 ∈ 𝒯 then 𝒪1 ∩ 𝒪2 ∈ 𝒯.

  iii) The union of any number (possibly infinite) of open sets is also an open set: if 𝒪γ ∈ 𝒯 then ⋃γ 𝒪γ ∈ 𝒯.

Given a point p ∈ M, we say that 𝒪 ∈ 𝒯 is a neighbourhood of p if p ∈ 𝒪. This concept leads us to our final requirement: we require that any two distinct points have neighbourhoods which do not intersect. In other words, for any p, q ∈ M with p ≠ q, there exist 𝒪1, 𝒪2 ∈ 𝒯 such that p ∈ 𝒪1, q ∈ 𝒪2 and 𝒪1 ∩ 𝒪2 = ∅. Topological spaces which obey this criterion are called Hausdorff. It is like a magic ward to protect us against bad things happening.

An example of a good Hausdorff space is the real line, M = 𝐑, with 𝒯 consisting of all open intervals (a, b), with a < b ∈ 𝐑, and their unions. An example of a non-Hausdorff space is any M with more than one point and 𝒯 = {M, ∅}.
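The axioms are easy to experiment with on a finite set. Below is a minimal sketch of my own (Python; not part of the notes) that checks the three conditions for a candidate topology on a four-point set.

```python
from itertools import chain, combinations

# a candidate topology on the four-point set M = {1, 2, 3, 4}
M = frozenset({1, 2, 3, 4})
T = {frozenset(), M, frozenset({1}), frozenset({1, 2})}

def is_topology(M, T):
    # i) both M and the empty set must be open
    if M not in T or frozenset() not in T:
        return False
    # ii) pairwise (hence finite) intersections of open sets are open
    for A in T:
        for B in T:
            if A & B not in T:
                return False
    # iii) unions of open sets are open; for a finite collection it
    # suffices to check the union of every subfamily
    sets = list(T)
    for r in range(len(sets) + 1):
        for family in combinations(sets, r):
            if frozenset(chain.from_iterable(family)) not in T:
                return False
    return True

print(is_topology(M, T))                    # True
print(is_topology(M, {frozenset(), M}))     # True: the trivial topology,
                                            # which is not Hausdorff for |M| > 1
```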


Definition: One further definition (it won’t be our last). A homeomorphism between topological spaces (M, 𝒯) and (M̃, 𝒯̃) is a map f: M → M̃ which is

  i) Injective (or one-to-one): for p ≠ q, f(p) ≠ f(q).

  ii) Surjective (or onto): f(M) = M̃, which means that for each p̃ ∈ M̃ there exists a p ∈ M such that f(p) = p̃.

    Functions which are both injective and surjective are said to be bijective. This ensures that they have an inverse.

  iii) Bicontinuous. This means that both the function and its inverse are continuous. To define a notion of continuity, we need to use the topology. We say that f is continuous if, for all 𝒪̃ ∈ 𝒯̃, f⁻¹(𝒪̃) ∈ 𝒯.

There’s an animation of a donut morphing into a coffee mug and back that is often used to illustrate the idea of topology. If you want to be fancy, you can say that a donut is homeomorphic to a coffee mug.

2.1.2 Differentiable Manifolds

We now come to our main character: an n-dimensional manifold is a space which, locally, looks like 𝐑n. Globally, the manifold may be more interesting than 𝐑n, but the idea is that we can patch together these local descriptions to get an understanding of the entire space.


Definition: An n-dimensional differentiable manifold is a Hausdorff topological space M such that

  i) M is locally homeomorphic to 𝐑n. This means that for each p ∈ M, there is an open set 𝒪 such that p ∈ 𝒪, and a homeomorphism ϕ: 𝒪 → U with U an open subset of 𝐑n.

  ii) Take two open subsets 𝒪α and 𝒪β that overlap, so that 𝒪α ∩ 𝒪β ≠ ∅. We require that the corresponding maps ϕα: 𝒪α → Uα and ϕβ: 𝒪β → Uβ are compatible, meaning that the map ϕβ ∘ ϕα⁻¹: ϕα(𝒪α ∩ 𝒪β) → ϕβ(𝒪α ∩ 𝒪β) is smooth (also known as infinitely differentiable or C∞), as is its inverse. This is depicted in Figure 14.

The maps ϕα are called charts and the collection of charts is called an atlas. You should think of each chart as providing a coordinate system to label the region 𝒪α of M. The coordinate associated to p ∈ 𝒪α is

ϕα(p) = (x1(p), …, xn(p))

We write the coordinate in shorthand as simply xμ(p), with μ = 1, …, n. Note that we use a superscript μ rather than a subscript: this simple choice of notation will prove useful as we go along.

Figure 14: Charts on a manifold.

If a point p is a member of more than one subset 𝒪 then it may have a number of different coordinates associated to it. There’s nothing to be nervous about here: it’s entirely analogous to labelling a point using either Euclidean coordinates or polar coordinates.

The maps ϕβ ∘ ϕα⁻¹ take us between different coordinate systems and are called transition functions. The compatibility condition is there to ensure that there is no inconsistency between these different coordinate systems.

Any manifold M admits many different atlases. In particular, nothing stops us from adding another chart to the atlas, provided that it is compatible with all the others. Two atlases are said to be compatible if every chart in one is compatible with every chart in the other. In this case, we say that the two atlases define the same differentiable structure on the manifold.

Examples

Here are a few simple examples of differentiable manifolds:

  • 𝐑n: this looks locally like 𝐑n because it is 𝐑n. You only need a single chart with the usual Euclidean coordinates. Similarly, any open subset of 𝐑n is a manifold.

    Figure 15: Two charts on a circle. The figures are subtly different! On the left, the point q1 is removed and θ1 ∈ (0, 2π). On the right, the point q2 is removed and θ2 ∈ (−π, π).
  • 𝐒1: The circle can be defined as a curve in 𝐑2 with coordinates (cos θ, sin θ). Until now in our physics careers, we’ve been perfectly happy taking θ ∈ [0, 2π) as the coordinate on 𝐒1. But this coordinate does not meet our requirements to be a chart because [0, 2π) is not an open set. This causes problems if we want to differentiate functions at θ = 0; to do so we need to take limits from both sides, but there is no coordinate with θ a little less than zero.

    To circumvent this, we need to use at least two charts to cover 𝐒1. For example, we could pick out two antipodal points, say q1 = (1, 0) and q2 = (−1, 0). We take the first chart to cover 𝒪1 = 𝐒1 − {q1}, with the map ϕ1: 𝒪1 → (0, 2π) defined by ϕ1(p) = θ1 as shown in the left-hand side of Figure 15. We take the second chart to cover 𝒪2 = 𝐒1 − {q2}, with the map ϕ2: 𝒪2 → (−π, π) defined by ϕ2(p) = θ2 as shown in the right-hand figure.

    The two charts overlap on the upper and lower semicircles. The transition function is given by

    θ2 = ϕ2(ϕ1⁻¹(θ1)) = θ1 if θ1 ∈ (0, π),   and   θ2 = θ1 − 2π if θ1 ∈ (π, 2π)

    The transition function isn’t defined at θ1 = 0, corresponding to the point q1, nor at θ1 = π, corresponding to the point q2. Nonetheless, it is smooth on each of the two open intervals, as required.

    Figure 16: Two charts on a sphere. In the left-hand figure, we have removed the half-equator defined by y = 0 with x > 0, shown in red. In the right-hand figure, we have removed the half-equator z = 0 with x < 0, again shown in red.
  • 𝐒2: It will be useful to think of the sphere as the surface x² + y² + z² = 1 embedded in Euclidean 𝐑3. The familiar coordinates on the sphere 𝐒2 are those inherited from spherical polar coordinates of 𝐑3, namely

    x = sin θ cos ϕ,   y = sin θ sin ϕ,   z = cos θ   (2.56)

    with θ ∈ [0, π] and ϕ ∈ [0, 2π). But as with the circle 𝐒1 described above, these are not open sets so will not do for our purpose. In fact, there are two distinct issues. If we focus on the equator at θ = π/2, then the coordinate ϕ ∈ [0, 2π) parameterises a circle and suffers the same problem that we saw above. On top of this, at the north pole θ = 0 and the south pole θ = π, the coordinate ϕ is not well defined, since the value of θ alone already specifies the point uniquely. This manifests itself on Earth by the fact that all time zones coincide at the North Pole. It’s one of the reasons people don’t have business meetings there.

    Once again, we can resolve these issues by introducing two charts covering different patches of 𝐒2. The first chart applies to the sphere 𝐒2 with a line of longitude removed, defined by y = 0 and x > 0, as shown in Figure 16. (Think of this as the dateline.) This means that neither the north nor the south pole is included in the open set 𝒪1. On this open set, we define a map ϕ1: 𝒪1 → 𝐑2 using the coordinates (2.56), now with θ ∈ (0, π) and ϕ ∈ (0, 2π), so that we have a map to an open subset of 𝐑2.

    We then define a second chart on a different open set 𝒪2, defined by 𝐒2 with the line z = 0, x < 0 removed. Here we define the map ϕ2: 𝒪2 → 𝐑2 using the coordinates

    x = −sin θ cos ϕ,   y = cos θ,   z = sin θ sin ϕ

    with θ ∈ (0, π) and ϕ ∈ (0, 2π). Again this is a map to an open subset of 𝐑2. We have 𝒪1 ∪ 𝒪2 = 𝐒2 while, on the overlap 𝒪1 ∩ 𝒪2, the transition functions ϕ1 ∘ ϕ2⁻¹ and ϕ2 ∘ ϕ1⁻¹ are smooth. (We haven’t written these functions down explicitly, but it’s clear that they are built from cos and sin functions acting on domains where their inverses exist; a numerical check is sketched below.)
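Here is the numerical check promised above. The chart conventions are those of the text; the code itself is a sketch of my own, inverting the second chart on sample points and confirming that the two coordinate systems describe the same points of 𝐒2.

```python
import numpy as np

def chart1_inv(theta, phi):
    # phi_1^{-1}: standard spherical polars, theta in (0, pi), phi in (0, 2 pi)
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def chart2(p):
    # phi_2: invert x = -sin t cos f, y = cos t, z = sin t sin f
    x, y, z = p
    t = np.arccos(y)                      # t in (0, pi)
    f = np.arctan2(z, -x) % (2 * np.pi)   # f in (0, 2 pi)
    return t, f

rng = np.random.default_rng(0)
for _ in range(5):
    theta = rng.uniform(0.1, np.pi - 0.1)
    phi = rng.uniform(0.1, 2 * np.pi - 0.1)
    p = chart1_inv(theta, phi)
    t, f = chart2(p)
    # rebuild the point from the second chart's coordinates
    q = np.array([-np.sin(t) * np.cos(f), np.cos(t), np.sin(t) * np.sin(f)])
    assert np.allclose(p, q)
print("the two charts agree on sampled points of the overlap")
```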

Note that for both 𝐒1 and 𝐒2 examples above, we made use of the fact that they can be viewed as embedded in a higher dimensional 𝐑n+1 to construct the charts. However, this isn’t necessary. The definition of a manifold makes no mention of a higher dimensional embedding and these manifolds should be viewed as having an existence independent of any embedding.

As you can see, there is a level of pedantry involved in describing these charts. (Mathematicians prefer the word “rigour”.) The need to deal with multiple charts arises only when we have manifolds of non-trivial topology; the manifolds 𝐒1 and 𝐒2 that we met above are particularly simple examples. When we come to discuss general relativity, we will care a lot about changing coordinates, and the limitations of certain coordinate systems, but our manifolds will turn out to be simple enough that, for all practical purposes, we can always find a single set of coordinates that tells us what we need to know. However, as we progress in physics, and topology becomes more important, so too does the idea of different charts. Perhaps the first place in physics where overlapping charts become an integral part of the discussion is the construction of a magnetic monopole. (See the lectures on Gauge Theory.)

2.1.3 Maps Between Manifolds

The advantage of locally mapping a manifold to 𝐑n is that we can now import our knowledge of how to do maths on 𝐑n. For example, we know how to differentiate functions on 𝐑n, and what it means for functions to be smooth. This now translates directly into properties of functions defined over the manifold.

We say that a function f: M → 𝐑 is smooth if the map f ∘ ϕ⁻¹: U → 𝐑 is smooth for all charts ϕ.

Similarly, we say that a map f: M → N between two manifolds M and N (which may have different dimensions) is smooth if the map ψ ∘ f ∘ ϕ⁻¹: U → V is smooth for all charts ϕ: M → U ⊂ 𝐑dim(M) and ψ: N → V ⊂ 𝐑dim(N).


A diffeomorphism is defined to be a smooth homeomorphism f: M → N. In other words it is an invertible, smooth map between manifolds M and N that has a smooth inverse. If such a diffeomorphism exists then the manifolds M and N are said to be diffeomorphic. The existence of an inverse means M and N necessarily have the same dimension.

Manifolds which are homeomorphic can be continuously deformed into each other. But diffeomorphism is stronger: it requires that the map and its inverse are smooth. This gives rise to some curiosities. For example, it turns out that the sphere 𝐒7 can be covered by a number of different, incompatible atlases. The resulting manifolds are homeomorphic but not diffeomorphic. These are referred to as exotic spheres. Similarly, Euclidean space 𝐑n has a unique differentiable structure, except for 𝐑4 where there are an infinite number of inequivalent structures. I know of only one application of exotic spheres to physics (a subtle global gravitational anomaly in superstring theory) and I know of no applications of the exotic differential structure on 𝐑4. Certainly these will not play any role in these lectures.

2.2 Tangent Spaces

Our next task is to understand how to do calculus on manifolds. We start here with differentiation; it will take us a while longer to get to integration, which we will finally meet in Section 2.4.4.

Consider a function f: M → 𝐑. To differentiate the function at some point p, we introduce a chart ϕ = (x1, …, xn) in a neighbourhood of p. We can then construct the map f ∘ ϕ⁻¹: U → 𝐑 with U ⊂ 𝐑n. But we know how to differentiate functions on 𝐑n and this gives us a way to differentiate functions on M, namely

∂f/∂xμ |p := ∂(f ∘ ϕ⁻¹)/∂xμ |ϕ(p)   (2.57)

Clearly this depends on the choice of chart ϕ and coordinates xμ. We would like to give a coordinate independent definition of differentiation, and then understand what happens when we choose to describe this object using different coordinates.

2.2.1 Tangent Vectors

We will consider smooth functions over a manifold M. We denote the set of all smooth functions as C∞(M).


Definition: A tangent vector Xp is an object that differentiates functions at a point p ∈ M. Specifically, Xp: C∞(M) → 𝐑 satisfying

  i) Linearity: Xp(f + g) = Xp(f) + Xp(g) for all f, g ∈ C∞(M).

  ii) Xp(f) = 0 when f is the constant function.

  iii) Leibnizarity: Xp(fg) = f(p) Xp(g) + Xp(f) g(p) for all f, g ∈ C∞(M). This, of course, is the product rule.

Note that ii) and iii) combine to tell us that Xp(af) = a Xp(f) for a ∈ 𝐑.

This definition is one of the early surprises in differential geometry. The surprise is really in the name “tangent vector”. We know what vectors are from undergraduate physics, and we know what differential operators are. But we’re not used to equating the two. Before we move on, it might be useful to think about how this definition fits with other notions of vectors that we’ve met before.

The first time we meet a vector in physics is usually in the context of Newtonian mechanics, where we describe the position of a particle as a vector 𝐱 in 𝐑3. This concept of a vector is special to flat space and does not generalise to other manifolds. For example, a line connecting two points on a sphere is not a vector and, in general, there is no way to think of a point pM as a vector. So we should simply forget that points in 𝐑3 can be thought of as vectors.

The next type of vector is the velocity of a particle, 𝐯=𝐱˙. This is more pertinent. It clearly involves differentiation and, moreover, is tangent to the curve traced out by the particle. As we will see below, velocities of particles are indeed examples of tangent vectors in differential geometry. More generally, tangent vectors tell us how things change in a given direction. They do this by differentiating.

It is simple to check that the object

∂μ|p := ∂/∂xμ |p

which acts on functions as shown in (2.57) obeys all the requirements of a tangent vector.

Note that the index μ is now a subscript, rather than the superscript that we used for the coordinates xμ. (On the right-hand side, the superscript in ∂/∂xμ is in the denominator and counts as a subscript.) We will adopt the summation convention, where repeated indices are summed. But, as we will see, the placement of indices up or down will tell us something and all sums will necessarily have one index up and one index down. This is a convention that we met already in Special Relativity where the up/downness of the index changes minus signs. Here it has a more important role that we will see as we go on: the placement of the index tells us what kind of mathematical space the object lives in. For now, you should be aware that any equation with two repeated indices that are both up or both down is necessarily wrong, just as any equation with three or more repeated indices is wrong.


Theorem: The set of all tangent vectors at point p forms an n-dimensional vector space. We call this the tangent space Tp(M). The tangent vectors ∂μ|p provide a basis for Tp(M). This means that we can write any tangent vector as

Xp = Xμ ∂μ|p

with Xμ=Xp(xμ) the components of the tangent vector in this basis.


Proof: Much of the proof is just getting straight what objects live in what spaces. Indeed, getting this straight is a large part of the subject of differential geometry. To start, we need a small lemma. We define the function F = f ∘ ϕ⁻¹: U → 𝐑, with ϕ = (x1, …, xn) a chart on a neighbourhood of p. Then, in some (perhaps smaller) neighbourhood of p we can always write the function F as

F(x) = F(x(p)) + (xμ − xμ(p)) Fμ(x)   (2.58)

where we have introduced n new functions Fμ(x) and used the summation convention in the final term. If the function F has a Taylor expansion then we can trivially write it in the form (2.58) by repackaging all the terms that are quadratic and higher into the Fμ(x) functions, keeping a linear term out front. But in fact there’s no need to assume the existence of a Taylor expansion. One way to see this is to note that for any function G(t) we trivially have G(1) = G(0) + ∫₀¹ dt G′(t). But now apply this formula to the function G(t) = F(tx) for some fixed x. This gives F(x) = F(0) + x ∫₀¹ dt F′(tx), which is precisely (2.58) for a function of a single variable expanded about the origin. The same method holds more generally.

Given (2.58), we act with ∂μ on both sides, and then evaluate at xμ = xμ(p). This tells us that the functions Fμ must satisfy

∂F/∂xμ |x(p) = Fμ(x(p))   (2.59)

We can translate this into a similar expression for f itself. We define n functions on M by fμ = Fμ ∘ ϕ. Then, for any q ∈ M in the appropriate neighbourhood of p, (2.58) becomes

f ∘ ϕ⁻¹(x(q)) = f ∘ ϕ⁻¹(x(p)) + (xμ(q) − xμ(p)) [fμ ∘ ϕ⁻¹(x(q))]

But ϕ⁻¹(x(q)) = q. So we find that, in the neighbourhood of p, it is always possible to write a function f as

f(q) = f(p) + (xμ(q) − xμ(p)) fμ(q)

for some fμ(q). Note that, evaluated at q=p, we have

fμ(p) = Fμ ∘ ϕ(p) = Fμ(x(p)) = ∂F/∂xμ |x(p) = ∂f/∂xμ |p

where in the last equality we used (2.57) and in the penultimate equality we used (2.59).

Now we can turn to the tangent vector Xp. This acts on the function f to give

Xp(f) = Xp( f(p) + (xμ − xμ(p)) fμ )

where we’ve dropped the arbitrary argument q in f(q), xμ(q) and fμ(q); these are the functions on which the tangent vector is acting. Using linearity and Leibnizarity, we have

Xp(f) = Xp(f(p)) + Xp(xμ − xμ(p)) fμ(p) + (xμ(p) − xμ(p)) Xp(fμ)

The first term vanishes because f(p) is just a constant and all tangent vectors vanish when acting on a constant. The final term vanishes as well because the Leibniz rule tells us to evaluate the function (xμ − xμ(p)) at p, where it is zero. Finally, by linearity, the middle term includes an Xp(xμ(p)) term which vanishes because xμ(p) is just a constant. We’re left with

Xp(f) = Xp(xμ) ∂f/∂xμ |p

This means that the tangent vector Xp can be written as

Xp = Xμ ∂/∂xμ |p

with Xμ = Xp(xμ) as promised. To finish, we just need to show that the ∂μ|p provide a basis for Tp(M). From above, they span the space. To check linear independence, suppose that we have a vector α = αμ ∂μ|p = 0. Then, acting on f = xν, this gives α(xν) = αμ (∂μ xν)|p = αν = 0. This concludes our proof.

Changing Coordinates

We have an ambivalent relationship with coordinates. We can’t calculate anything without them, but we don’t want to rely on them. The compromise we will come to is to consistently check that nothing physical depends on our choice of coordinates.

The key idea is that a given tangent vector Xp exists independent of the choice of coordinate. However, the chosen basis {∂μ|p} clearly depends on our choice of coordinates: to define it we had to first introduce a chart ϕ and coordinates xμ. A basis defined in this way is called, quite reasonably, a coordinate basis. At times we will work with other bases, {eμ}, which are not defined in this way. Unsurprisingly, these are referred to as non-coordinate bases. A particularly useful example of a non-coordinate basis, known as vielbeins, will be introduced in Section 3.4.2.

Suppose that we picked a different chart ϕ̃, with coordinates x̃μ in the neighbourhood of p. We then have two different bases, and can express the tangent vector Xp in terms of either,

Xp = Xμ ∂/∂xμ |p = X̃μ ∂/∂x̃μ |p

The vector is the same, but the components of the vector change: they are Xμ in the first set of coordinates, and X̃μ in the second. It is straightforward to determine the relationship between Xμ and X̃μ. To see this, we look at how the tangent vector Xp acts on a function f,

Xp(f) = Xμ ∂f/∂xμ |p = Xμ (∂x̃ν/∂xμ)|ϕ(p) ∂f/∂x̃ν |p

where we’ve used the chain rule. (Actually, we’ve been a little quick here. You can be more careful by introducing the functions F = f ∘ ϕ⁻¹ and F̃ = f ∘ ϕ̃⁻¹ and using (2.57) to write ∂f/∂xμ = ∂F̃(x̃(x))/∂xμ. The end result is the same. We will be similarly sloppy as we proceed, often conflating f and F.) You can read this equation in one of two different ways. First, we can view this as a change in the basis vectors: they are related as

∂/∂xμ |p = (∂x̃ν/∂xμ)|ϕ(p) ∂/∂x̃ν |p   (2.60)

Alternatively, we can view this as a change in the components of the vector, which transform as

X̃ν = Xμ (∂x̃ν/∂xμ)|ϕ(p)   (2.61)

Components of vectors that transform this way are sometimes said to be contravariant. I’ve always found this to be annoying terminology, in large part because I can never remember it. A more important point is that the form of (2.61) is essentially fixed once you remember that the index on Xμ sits up rather than down.
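As an illustration of (2.61), the following sketch of my own (sympy; the example is not from the notes) computes the components of the tangent vector ∂/∂x in polar coordinates on 𝐑2. The Jacobian ∂x̃ν/∂xμ does all the work.

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
cart = [x, y]

# tilde coordinates: polar (r, theta) as functions of Cartesian (x, y)
tilde = [sp.sqrt(x**2 + y**2), sp.atan2(y, x)]

# Jacobian J[nu][mu] = d(tilde x^nu) / d(x^mu)
J = [[sp.diff(tilde[n], cart[m]) for m in range(2)] for n in range(2)]

# components of X = d/dx, i.e. X^mu = (1, 0), in polar coordinates via (2.61)
X = [1, 0]
X_tilde = [sp.simplify(sum(X[m] * J[n][m] for m in range(2))) for n in range(2)]
print(X_tilde)   # [x/sqrt(x**2 + y**2), -y/(x**2 + y**2)]
# i.e. d/dx = cos(theta) d/dr - (sin(theta)/r) d/dtheta
```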

What Is It Tangent To?

So far, we haven’t really explained where the name “tangent vector” comes from. Consider a smooth curve in M that passes through the point p. This is a map σ: I → M, with I ⊂ 𝐑 an open interval. We will parameterise the curve as σ(t) such that σ(0) = p.

Figure 17: The tangent space at a point p.

With a given chart, this curve becomes ϕ ∘ σ: I → 𝐑n, parameterised by xμ(t). Before we learned any differential geometry, we would say that the tangent vector to the curve at t = 0 is

Xμ = dxμ(t)/dt |t=0

But we can take these to be the components of the tangent vector Xp, which we define as

Xp = (dxμ(t)/dt)|t=0 ∂/∂xμ |p

Our tangent vector now acts on functions f ∈ C∞(M). It is telling us how fast any function f changes as we move along the curve.

Any tangent vector Xp can be written in this form. This gives meaning to the term “tangent space” for Tp(M). It is, literally, the space of all possible tangents to curves passing through the point p. For example, a two-dimensional manifold, embedded in 𝐑3, is shown in Figure 17. At each point p, we can identify a vector space which is the tangent plane: this is Tp(M).

As an aside, note that the mathematical definition of a tangent space makes no reference to embedding the manifold in some higher dimensional space. The tangent space is an object intrinsic to the manifold itself. (This is in contrast to the picture where it was unfortunately necessary to think about the manifold as embedded in 𝐑3.)

The tangent spaces Tp(M) and Tq(M) at different points p ≠ q are different. There’s no sense in which we can add vectors from one to vectors from the other. In fact, at this stage there is no way to even compare vectors in Tp(M) to vectors in Tq(M). They are simply different spaces. As we proceed, we will make some effort to find ways to get around this.

2.2.2 Vector Fields

So far we have only defined tangent vectors at a point p. It is useful to consider an object in which there is a choice of tangent vector for every point p ∈ M. In physics, we call objects that vary over space fields.

A vector field X is defined to be a smooth assignment of a tangent vector Xp to each point p ∈ M. This means that if you feed a function to a vector field, then it spits back another function, which is the derivative of the first. In symbols, a vector field is therefore a map X: C∞(M) → C∞(M). The function X(f) is defined by

(X(f))(p)=Xp(f)

The space of all vector fields on M is denoted 𝔛(M).

Given a coordinate basis, we can expand any vector field as

X = Xμ ∂/∂xμ   (2.62)

where the Xμ are now smooth functions on M.

Strictly speaking, the expression (2.62) only defines a vector field on the open set 𝒪 ⊆ M covered by the chart, rather than the whole manifold. We may have to patch this together with other charts to cover all of M.

The Commutator

Given two vector fields X, Y ∈ 𝔛(M), we can’t multiply them together to get a new vector field. Roughly speaking, this is because the product XY is a second order differential operator rather than a first order operator. This reveals itself in a failure of Leibnizarity for the object XY,

XY(fg) = X(f Y(g) + Y(f) g) = X(f) Y(g) + f XY(g) + g XY(f) + X(g) Y(f)

This is not the same as fXY(g)+gXY(f) that Leibniz requires.

However, we can build a new vector field by taking the commutator [X,Y], which acts on functions f as

[X,Y](f)=X(Y(f))-Y(X(f))

This is also known as the Lie bracket. Evaluated in a coordinate basis, the commutator is given by

[X, Y](f) = Xμ ∂/∂xμ (Yν ∂f/∂xν) − Yμ ∂/∂xμ (Xν ∂f/∂xν)
          = (Xμ ∂Yν/∂xμ − Yμ ∂Xν/∂xμ) ∂f/∂xν

This holds for all f ∈ C∞(M), so we’re at liberty to write

[X, Y] = (Xμ ∂Yν/∂xμ − Yμ ∂Xν/∂xμ) ∂/∂xν   (2.63)

It is not difficult to check that the commutator obeys the Jacobi identity

[X,[Y,Z]]+[Y,[Z,X]]+[Z,[X,Y]]=0

This ensures that the set of all vector fields on a manifold M has the mathematical structure of a Lie algebra.
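Both the component formula (2.63) and the Jacobi identity are straightforward to verify symbolically. A minimal sketch of my own, for three vector fields on 𝐑2 (the particular fields are illustrative choices):

```python
import sympy as sp

x, y = sp.symbols('x y')
coords = [x, y]

def bracket(X, Y):
    # [X, Y]^nu = X^mu d_mu Y^nu - Y^mu d_mu X^nu, as in (2.63)
    return [sp.simplify(sum(X[m] * sp.diff(Y[n], coords[m])
                            - Y[m] * sp.diff(X[n], coords[m])
                            for m in range(2)))
            for n in range(2)]

X = [y, -x]    # rotation generator
Y = [x, y]     # dilatation generator
Z = [1, 0]     # translation in x

print(bracket(X, Y))   # [0, 0]: rotations commute with dilatations

# Jacobi identity: [X,[Y,Z]] + [Y,[Z,X]] + [Z,[X,Y]] = 0
jacobi = [sp.simplify(a + b + c) for a, b, c in zip(
    bracket(X, bracket(Y, Z)),
    bracket(Y, bracket(Z, X)),
    bracket(Z, bracket(X, Y)))]
print(jacobi)          # [0, 0]
```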

2.2.3 Integral Curves

There is a slightly different way of thinking about vector fields on a manifold. A flow on M is a one-parameter family of diffeomorphisms σt: M → M labelled by t ∈ 𝐑. These maps have the properties that σt=0 is the identity map, and σs ∘ σt = σs+t. These two requirements ensure that σ−t = σt⁻¹. Such a flow gives rise to streamlines on the manifold. We will further require that these streamlines are smooth.

We can then define a vector field by taking the tangent to the streamlines at each point. In a given coordinate system, the components of the vector field are

Xμ(x(t)) = dxμ(t)/dt   (2.64)

where I’ve abused notation a little and written xμ(t) rather than the more accurate but cumbersome xμ(σt). This will become a habit, with the coordinates xμ often used to refer to the point pM.

A flow gives rise to a vector field. Alternatively, given a vector field Xμ(x), we can integrate the differential equation (2.64), subject to an initial condition xμ(0) = xμ_initial, to generate streamlines which start at xμ_initial. These streamlines are called integral curves, generated by X.

Figure 18: Flows on a sphere.

Figure 19: Flows in the plane.

In what follows, we will only need the infinitesimal flow generated by X. This is simply

xμ(t) = xμ(0) + t Xμ(x) + 𝒪(t²)   (2.65)

Indeed, differentiating this shows that it obeys (2.64) to leading order in t.

(An aside: Given a vector field X, it may not be possible to integrate (2.64) to generate a flow defined for all t ∈ 𝐑. For example, consider M = 𝐑 with the vector field X = x² ∂/∂x. The equation dx/dt = x², subject to the initial condition x(0) = a, has the unique solution x(t) = a/(1 − at), which diverges at t = 1/a. Vector fields which generate a flow for all t ∈ 𝐑 are called complete. It turns out that all vector fields on a manifold M are complete if M is compact. Roughly speaking, “compact” means that M doesn’t “stretch to infinity”. More precisely, a topological space M is compact if, for any family of open sets covering M, there always exists a finite sub-family which also covers M. So 𝐑 is not compact because the family of sets {(−n, n), n ∈ 𝐙⁺} covers 𝐑 but has no finite sub-family which does so. Similarly, 𝐑n is non-compact. However, 𝐒n and 𝐓n are compact manifolds.)

We can look at some examples.

  • Consider the sphere 𝐒2 in polar coordinates with the vector field X = ∂/∂ϕ. The integral curves solve the equation (2.64), which reads

    dϕ/dt = 1   and   dθ/dt = 0

    This has the solution θ = θ0 and ϕ = ϕ0 + t. The associated one-parameter diffeomorphism is σt: (θ, ϕ) ↦ (θ, ϕ + t), and the flow lines are simply lines of constant latitude on the sphere, as shown in Figure 18.

  • Alternatively, consider the vector field on 𝐑2 with Cartesian components Xμ = (1, x²). The equation for the integral curves is now

    dx/dt = 1   and   dy/dt = x²

    which has the solution x(t) = x0 + t and y(t) = y0 + ⅓((x0 + t)³ − x0³). The associated flow lines are shown in Figure 19; a numerical check appears in the sketch below.
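The second example is simple enough to check numerically. The sketch below is my own (it uses scipy's ODE integrator, which is not part of the notes): it integrates (2.64) and compares with the closed-form integral curve.

```python
import numpy as np
from scipy.integrate import solve_ivp

# vector field on R^2 with components X^mu = (1, x^2)
def X(t, u):
    x, y = u
    return [1.0, x**2]

x0, y0, T = 0.5, -1.0, 2.0
sol = solve_ivp(X, (0.0, T), [x0, y0], dense_output=True, rtol=1e-10)

t = np.linspace(0.0, T, 5)
x_exact = x0 + t
y_exact = y0 + ((x0 + t)**3 - x0**3) / 3.0
assert np.allclose(sol.sol(t)[0], x_exact, atol=1e-6)
assert np.allclose(sol.sol(t)[1], y_exact, atol=1e-6)
print("integral curve matches the closed-form solution")
```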

2.2.4 The Lie Derivative

So far we have learned how to differentiate a function. This requires us to introduce a vector field X, and the new function X(f) can be viewed as the derivative of f in the direction of X.

Next we ask: is it possible to differentiate a vector field? Specifically, suppose that we have a second vector field Y. How can we differentiate this in the direction of X to get a new vector field? As we’ve seen, we can’t just write down XY because this doesn’t define a new vector field.

To proceed, we should think more carefully about what differentiation means. For a function f(x) on 𝐑, we compare the values of the function at nearby points, and see what happens as those points approach each other

df/dx = lim(t→0) [f(x + t) − f(x)] / t

Similarly, to differentiate a vector field, we need to subtract the tangent vector Yp ∈ Tp(M) from the tangent vector Yq ∈ Tq(M) at some nearby point, and then see what happens in the limit q → p. But that’s problematic because, as we stressed above, the vector spaces Tp(M) and Tq(M) are different, and it makes no sense to subtract vectors in one from vectors in the other. To make progress, we’re going to have to find a way to do this. Fortunately, there is a way.

Push-Forward and Pull-Back

Suppose that we have a map φ: M → N between two manifolds M and N which we will take to be a diffeomorphism. This allows us to import various structures on one manifold to the other.

For example, if we have a function f: N → 𝐑, then we can construct a new function that we denote (φ^*f): M → 𝐑,

(φ^*f)(p) = f(φ(p))

Using the map in this way, to drag objects originally defined on N onto M, is called the pull-back. If we introduce coordinates xμ on M and yα on N, then the map φ is described by yα(x), and we can write

(φ^*f)(x) = f(y(x))

Some objects more naturally go the other way. For example, given a vector field Y on M, we can define a new vector field (φ_*Y) on N. If we are given a function f: N → 𝐑, then the vector field (φ_*Y) on N acts as

(φ_*Y)(f) = Y(φ^*f)

where I’ve been a little sloppy in the notation here since the left-hand side is a function on N and the right-hand side a function on M. The equality above holds when evaluated at the appropriate points: [(φ_*Y)(f)](φ(p)) = [Y(φ^*f)](p). Using the map to push objects on M onto N is called the push-forward.

If Y = Yμ ∂/∂xμ is the vector field on M, we can write the induced vector field on N as

(φ_*Y)(f) = Yμ ∂f(y(x))/∂xμ = Yμ (∂yα/∂xμ) ∂f(y)/∂yα

Written in components, (φ_*Y) = (φ_*Y)α ∂/∂yα, we then have

(φ_*Y)α = Yμ ∂yα/∂xμ   (2.66)

Given the way that the indices are contracted, this is more or less the only thing we could write down.

We’ll see other examples of these induced maps later in the lectures. The push-forward is always denoted as φ_* and goes in the same way as the original map. The pull-back is always denoted as φ^* and goes in the opposite direction to the original map. Importantly, if our map φ: M → N is a diffeomorphism, then we also have φ⁻¹: N → M, so we can transport any object from M to N and back again with impunity.

Figure 20: To construct the Lie derivative, we use the push-forward (σ−t)_* to map the vector Y at σt(p) back to p. The resulting vector, shown in red, is ((σ−t)_*Y)p.

Constructing the Lie Derivative

Now we can use these ideas to help build a derivative. Suppose that we are given a vector field X on M. This generates a flow σt: M → M, which is a map between manifolds, now with N = M. This means that we can use (2.66) to generate a push-forward map from Tp(M) to Tσt(p)(M). But this is exactly what we need if we want to compare tangent vectors at neighbouring points. The resulting differential operator is called the Lie derivative and is denoted ℒX.

It will turn out that we can use these ideas to differentiate many different kinds of objects. As a warm-up, let’s first see how an analogous construction allows us to differentiate functions. For a function f, we have

ℒX f = lim(t→0) [f(σt(x)) − f(x)] / t = d f(σt(x))/dt |t=0 = (∂f/∂xμ) (dxμ/dt) |t=0

But, using (2.64), we know that dxμ/dt=Xμ. We then have

ℒX f = Xμ(x) ∂f/∂xμ = X(f)   (2.67)

In other words, acting on functions, the Lie derivative ℒX coincides with the action of the vector field X.

Now let’s look at the action of ℒX on a vector field Y. This is defined by

ℒX Y = lim(t→0) [((σ−t)_* Y)p − Yp] / t

Note the minus sign in σ−t. This reflects the fact that vector fields are pushed, rather than pulled. The map σt takes us from the point p to the point σt(p). But to push a tangent vector Yσt(p) ∈ Tσt(p)(M) to a tangent vector in Tp(M), where it can be compared to Yp, we need to push with the inverse map (σ−t)_*. This is shown in Figure 20.

Let’s first calculate the action of ℒX on a coordinate basis ∂μ = ∂/∂xμ. We have

ℒX ∂μ = lim(t→0) [(σ−t)_* ∂μ − ∂μ] / t   (2.68)

We have an expression for the push-forward of a tangent vector in (2.66), where the coordinates yα on N should now be replaced by the infinitesimal change of coordinates induced by the flow σ−t which, from (2.65), is xμ(t) = xμ(0) − t Xμ + ⋯. Note the minus sign, which comes from the fact that we have to map back to where we came from, as shown in Figure 20. We have, for small t,

(σ−t)_* ∂μ = (δμν − t ∂Xν/∂xμ + ⋯) ∂ν

Acting on a coordinate basis, we then have

ℒX ∂μ = −(∂Xν/∂xμ) ∂ν   (2.69)

To determine the action of ℒX on a general vector field Y, we use the fact that the Lie derivative obeys the usual properties that we expect of a derivative, including linearity, ℒX(Y1 + Y2) = ℒX Y1 + ℒX Y2, and Leibnizarity, ℒX(fY) = f ℒX Y + (ℒX f) Y for any function f, both of which follow from the definition. The action on a general vector field Y = Yμ(x) ∂/∂xμ can then be written as

ℒX(Yμ ∂μ) = (ℒX Yμ) ∂μ + Yμ (ℒX ∂μ)

where we’ve simply viewed the components Yμ(x) as n functions. We can use (2.67) to determine ℒX Yμ and we’ve computed ℒX ∂μ in (2.69). We then have

ℒX(Yμ ∂μ) = Xν (∂Yμ/∂xν) ∂μ − Yμ (∂Xν/∂xμ) ∂ν

But this is precisely the structure of the commutator. We learn that the Lie derivative acting on vector fields is given by

ℒX Y = [X, Y]

A corollary of this is

ℒX ℒY Z − ℒY ℒX Z = ℒ[X,Y] Z   (2.70)

which follows from the Jacobi identity for commutators.

The Lie derivative is just one of several derivatives that we will meet in this course. As we introduce new objects, we will learn how to act with ℒX on them. But we will also see that we can give different meanings to the idea of differentiation. In fact, the Lie derivative will take something of a back seat until Section 4.3, when we will see that it is what we need to understand symmetries.

2.3 Tensors

For any vector space V, the dual vector space V* is the space of all linear maps from V to 𝐑.

This is a standard mathematical construction, but even if you haven’t seen it before it should resonate with something you know from quantum mechanics. There we have states in a Hilbert space ℋ with kets |ψ⟩ and a dual Hilbert space with bras ⟨ϕ|. Any bra can be viewed as a map ⟨ϕ|: ℋ → 𝐑 defined by ⟨ϕ|(|ψ⟩) = ⟨ϕ|ψ⟩.

In general, suppose that we are given a basis {eμ, μ = 1, …, n} of V. Then we can introduce a dual basis {fμ, μ = 1, …, n} for V* defined by

fν(eμ) = δνμ

A general vector in V can be written as X = Xμ eμ, and then fν(X) = Xμ fν(eμ) = Xν. Given a basis, this construction provides an isomorphism between V and V*, given by eμ ↦ fμ. But the isomorphism is basis dependent. Pick a different basis, and you’ll get a different map.

We can repeat the construction and consider (V*)*, which is the space of all linear maps from V* to 𝐑. But this space is naturally isomorphic to V, meaning that the isomorphism is independent of the choice of basis. To see this, suppose that X ∈ V and ω ∈ V*. This means that ω(X) ∈ 𝐑. But we can equally well view X ∈ (V*)* and define X(ω) = ω(X) ∈ 𝐑. In this sense, (V*)* = V.

2.3.1 Covectors and One-Forms

At each point p ∈ M, we have a vector space Tp(M). The dual of this space, Tp*(M), is called the cotangent space at p, and an element of this space is called a cotangent vector, sometimes shortened to covector. Given a basis {eμ} of Tp(M), we can introduce the dual basis {fμ} for Tp*(M) and expand any covector as ω = ωμ fμ.

We can also construct fields of cotangent vectors, by picking a member of Tp*(M) for each point p in a smooth manner. Such a cotangent field is better known as a one-form; one-forms map vector fields to real numbers. The set of all one-forms on M is denoted Λ1(M).

There is a particularly simple way to construct a one-form. Take a function f ∈ C∞(M) and define df ∈ Λ1(M) by

df(X) = X(f)   (2.71)

We can use this method to build a basis for Λ1(M). We introduce coordinates xμ on M, with the corresponding coordinate basis eμ = ∂/∂xμ of vector fields, which we often write in shorthand as ∂/∂xμ ≡ ∂μ. We then simply take the functions f = xμ which, from (2.71), gives

dxμ(∂ν) = ∂ν(xμ) = δνμ

This means that fμ = dxμ provides a basis for Λ1(M), dual to the coordinate basis ∂μ. In general, an arbitrary one-form ω ∈ Λ1(M) can then be expanded as

ω=ωμdxμ

In such a basis the one-form df takes the form

df = (∂f/∂xμ) dxμ   (2.72)

To see this, we simply need to evaluate

df(X) = (∂f/∂xμ) dxμ(Xν ∂ν) = Xμ ∂f/∂xμ = X(f)

which agrees with the expected answer (2.71).

As with vector fields, we can look at what happens if we change coordinates. Given two different charts, ϕ = (x1, …, xn) and ϕ̃ = (x̃1, …, x̃n), we know that the basis for vector fields changes as (2.60),

∂/∂x̃μ = (∂xν/∂x̃μ) ∂/∂xν

We should take the basis of one-forms to transform in the inverse manner,

dx̃μ = (∂x̃μ/∂xν) dxν   (2.73)

This then ensures that

dx̃μ(∂/∂x̃ν) = (∂x̃μ/∂xρ) dxρ((∂xσ/∂x̃ν) ∂/∂xσ) = (∂x̃μ/∂xρ)(∂xσ/∂x̃ν) dxρ(∂σ) = (∂x̃μ/∂xρ)(∂xσ/∂x̃ν) δρσ

But this is just the multiplication of a matrix and its inverse,

(∂x̃μ/∂xρ)(∂xρ/∂x̃ν) = δμν

So we find that

dx̃μ(∂/∂x̃ν) = δμν

as it should. We can then expand a one-form ω in either of these two bases,

ω = ωμ dxμ = ω̃μ dx̃μ   with   ω̃μ = (∂xν/∂x̃μ) ων   (2.74)

In the annoying terminology that I can never remember, components that transform this way are said to be covariant. Note that, as with vector fields, the placement of the indices means that (2.73) and (2.74) are pretty much the only things that you can write down that make sense.

2.3.2 The Lie Derivative Revisited

In Section 2.2.4, we explained how to construct the Lie derivative, which differentiates a vector field in the direction of a second vector field X. This same idea can be adapted to one-forms.

Under a map φ: M → N, we saw that a vector field X on M can be pushed forward to a vector field φ_*X on N. In contrast, one-forms go the other way: given a one-form ω on N, we can pull this back to a one-form (φ^*ω) on M, defined by

(φ^*ω)(X) = ω(φ_*X)

If we introduce coordinates xμ on M and yα on N then the components of the pull-back are given by

(φ^*ω)μ = ωα ∂yα/∂xμ   (2.75)

We now define the Lie derivative ℒX acting on one-forms. Again, we use X to generate a flow σt: M → M which, using the pull-back, allows us to compare one-forms at different points. We will denote the cotangent vector ω(p) as ωp. The Lie derivative of a one-form ω is then defined as

ℒX ω = lim(t→0) [(σt^* ω)p − ωp] / t   (2.76)

Note that we pull back with the map σt^*. This is to be contrasted with (2.68), where we pushed forward the tangent vector with the map (σ−t)_* and, as we now show, this difference in minus sign manifests itself in the expression for the Lie derivative. The infinitesimal map σt acts on coordinates as xμ(t) = xμ(0) + t Xμ + ⋯ so, from (2.75), the pull-back of a basis one-form dxμ is

σt^* dxμ = (δμν + t ∂Xμ/∂xν + ⋯) dxν

Acting on the coordinate basis, we then have

ℒX(dxμ) = (∂Xμ/∂xν) dxν

which indeed differs by a minus sign from the corresponding result (2.69) for tangent vectors. Acting on a general one-form ω=ωμdxμ, the Lie derivative is

ℒX ω = (ℒX ωμ) dxμ + ων ℒX(dxν)
     = (Xν ∂ν ωμ + ων ∂μ Xν) dxμ   (2.77)

We’ll return to discuss one-forms (and other forms) more in Section 2.4.

2.3.3 Tensors and Tensor Fields

A tensor of rank (r, s) at a point p ∈ M is defined to be a multi-linear map

T: Tp*(M) × ⋯ × Tp*(M) × Tp(M) × ⋯ × Tp(M) → 𝐑

with r factors of Tp*(M) and s factors of Tp(M).

Such a tensor is said to have total rank r+s.

We’ve seen some examples already. A cotangent vector in Tp*(M) is a tensor of type (0, 1), while a tangent vector in Tp(M) is a tensor of type (1, 0) (using the fact that (Tp*(M))* = Tp(M)).

As before, we define a tensor field to be a smooth assignment of an (r, s) tensor to every point p ∈ M.

Given a basis {eμ} for vector fields and a dual basis {fμ} for one-forms, the components of the tensor are defined to be

Tμ1…μrν1…νs = T(fμ1, …, fμr, eν1, …, eνs)

Note that we deliberately write the string of lower indices after the string of upper indices. In some sense this is unnecessary, and we don’t lose any information by stacking the lower indices directly beneath the upper ones. Nonetheless, we’ll see later that it’s a useful habit to get into.

On a manifold of dimension n, there are n^(r+s) such components. For a tensor field, each of these is a function over M.

As an example, consider a rank (2,1) tensor. This takes two one-forms, say ω and η, together with a vector field X, and spits out a real number. In a given basis, this number is

T(ω, η, X) = T(ωμ fμ, ην fν, Xρ eρ) = ωμ ην Xρ T(fμ, fν, eρ) = Tμνρ ωμ ην Xρ

Every manifold comes equipped with a natural (1,1) tensor called δ. This takes a one-form ω and a vector field X and spits out the real number

δ(ω, X) = ω(X)   ⇒   δ(fμ, eν) = fμ(eν) = δνμ

which is simply the Kronecker delta.

As with vector fields and one-forms, we can ask how the components of a tensor transform. We will work more generally than before. Consider two bases for the vector fields, {eμ} and {ẽμ}, not necessarily coordinate bases, related by

ẽν = Aνμ eμ

for some invertible matrix A. The respective dual bases, {fμ} and {f̃μ}, are then related by

f̃ρ = Bσρ fσ

such that

f̃ρ(ẽν) = Aνμ Bσρ fσ(eμ) = Aνμ Bμρ = δνρ   ⇒   Bμρ = (A⁻¹)μρ

The lower components of a tensor then transform by multiplying by A, and the upper components by multiplying by B = A⁻¹. So, for example, a rank (1, 2) tensor transforms as

T̃ρνμ = Bσμ Aρτ Aνλ Tστλ   (2.78)

When we change between coordinate bases, we have

Aνμ = ∂xμ/∂x̃ν   and   Bνμ = (A⁻¹)νμ = ∂x̃μ/∂xν

You can check that this coincides with our previous results (2.61) and (2.74).
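In components, (2.78) is just a chain of matrix contractions, which makes it easy to sanity-check numerically. A sketch of my own using numpy.einsum; the matrix A and the tensor T are random illustrative data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n))      # invertible change of basis (generically)
B = np.linalg.inv(A)             # B = A^{-1}
T = rng.normal(size=(n, n, n))   # a rank (1,2) tensor: one upper, two lower indices

# (2.78): the upper index transforms with B, the lower indices with A
T_tilde = np.einsum('ms,tn,lr,stl->mnr', B, A, A, T)

# transforming back with the inverse matrices must undo the change
T_back = np.einsum('ms,tn,lr,stl->mnr', A, B, B, T_tilde)
assert np.allclose(T, T_back)
print("components transform consistently under a change of basis")
```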

Operations on Tensor Fields

There are a number of operations that we can do on tensor fields to generate further tensors.

First, we can add and subtract tensor fields, or multiply them by functions. This is the statement that the set of tensors at a point p ∈ M forms a vector space.

Next, there is a way to multiply tensors together to give a tensor of a different type. Given a tensor S of rank (p, q) and a tensor T of rank (r, s), we can form the tensor product S ⊗ T, which is a new tensor of rank (p + r, q + s), defined by

S ⊗ T(ω1, …, ωp, η1, …, ηr, X1, …, Xq, Y1, …, Ys)
        = S(ω1, …, ωp, X1, …, Xq) T(η1, …, ηr, Y1, …, Ys)

In terms of components, this reads

(S ⊗ T)μ1…μpν1…νrρ1…ρqσ1…σs = Sμ1…μpρ1…ρq Tν1…νrσ1…σs   (2.79)

Given an (r, s) tensor T, we can also construct a tensor of lower rank (r − 1, s − 1) by contraction. To do this, simply replace one of the Tp*(M) arguments with a basis one-form fμ, and the corresponding Tp(M) argument with the dual basis vector eμ, and then sum over μ = 1, …, n. So, for example, given a rank (2, 1) tensor T we can construct a rank (1, 0) tensor S by

S(ω)=T(ω,fμ,eμ)

Alternatively, we could construct a (typically) different (1, 0) tensor by contracting the other argument, S′(ω) = T(fμ, ω, eμ). Written in terms of components, contraction simply means that we set an upper index equal to a lower index and sum over them,

Sμ = Tμνν   and   S′μ = Tνμν

Our next operation is symmetrisation and anti-symmetrisation. For example, given a (0,2) tensor T we decompose it into two (0,2) tensors, in which the arguments are either symmetrised or anti-symmetrised,

S(X, Y) = ½ (T(X, Y) + T(Y, X))
A(X, Y) = ½ (T(X, Y) − T(Y, X))

In index notation, this becomes

Sμν = ½ (Tμν + Tνμ)   and   Aμν = ½ (Tμν − Tνμ)

which is just like taking the symmetric and anti-symmetric part of a matrix. We will work with these operations frequently enough to justify introducing some new notation. We define

T(μν) = ½ (Tμν + Tνμ)   and   T[μν] = ½ (Tμν − Tνμ)

These operations generalise to other tensors. For example,

T(μν)ρσ = ½ (Tμνρσ + Tνμρσ)

We can also symmetrise or anti-symmetrise over multiple indices, provided that these indices are either all up or all down. If we (anti)-symmetrise over p objects, then we divide by p!, which is the number of possible permutations. This normalisation ensures that if we start with a tensor which is already, say, symmetric then further symmetrising doesn’t affect it. In the case of anti-symmetry, we weight each term with the sign of the permutation. So, for example,

Tμ(νρσ) = (1/3!) (Tμνρσ + Tμρνσ + Tμρσν + Tμσρν + Tμσνρ + Tμνσρ)

and

Tμ[νρσ] = (1/3!) (Tμνρσ − Tμρνσ + Tμρσν − Tμσρν + Tμσνρ − Tμνσρ)

There will be times when, annoyingly, we will wish to symmetrise (or anti-symmetrise) over indices which are not adjacent. We introduce vertical bars to exclude certain indices from the symmetrisation procedure. So, for example,

Tμ[ν|ρ|σ] = ½ (Tμνρσ − Tμσρν)
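The (anti)-symmetrisation over a block of indices, with its 1/p! normalisation, can be coded once and for all with itertools.permutations. A sketch of my own, acting on a numpy array over all of its indices (taken to be all down, say); the normalisation is checked by confirming that (anti)-symmetrising twice changes nothing.

```python
import numpy as np
from math import factorial
from itertools import permutations

def perm_sign(perm):
    # sign of a permutation via counting inversions
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def symmetrise(T, antisym=False):
    # sum over all permutations of the indices, weighted by the sign of
    # the permutation if anti-symmetrising, and divide by p!
    p = T.ndim
    out = np.zeros_like(T, dtype=float)
    for perm in permutations(range(p)):
        sign = perm_sign(perm) if antisym else 1
        out += sign * np.transpose(T, perm)
    return out / factorial(p)

T = np.random.default_rng(2).normal(size=(3, 3, 3))
S = symmetrise(T)
A = symmetrise(T, antisym=True)
assert np.allclose(S, symmetrise(S))                # already symmetric: unchanged
assert np.allclose(A, symmetrise(A, antisym=True))  # already anti-symmetric: unchanged
```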

Finally, given a smooth tensor field T of any rank, we can always take the Lie derivative with respect to a vector field X. As we’ve seen previously, under a map φ: M → N, vector fields are pushed forward and one-forms are pulled back. In general, this leaves a tensor of mixed type unsure where to go. However, if φ is a diffeomorphism then we also have φ⁻¹: N → M, and this allows us to define the push-forward of a tensor T from M to N. This acts on one-forms ω ∈ Λ1(N) and vector fields X ∈ 𝔛(N) and is given by

(φ_*T)(ω1, …, ωr, X1, …, Xs) = T(φ^*ω1, …, φ^*ωr, (φ⁻¹)_*X1, …, (φ⁻¹)_*Xs)

Here the φ^*ω are the pull-backs of the ω from N to M, while the (φ⁻¹)_*X are the push-forwards of the X from N to M.

The Lie derivative of a tensor T along X is then defined as

ℒX T = lim(t→0) [((σ−t)_* T)p − Tp] / t

where σt is the flow generated by X. This coincides with our earlier definitions for vector fields in (2.68) and for one-forms in (2.76). (The difference in sign between the σ−t of (2.68) and the σt of (2.76) is now hiding in the inverse push-forward (φ⁻¹)_* that appears in the definition of φ_*T.)

2.4 Differential Forms

Some tensors are more interesting than others. A particularly interesting class consists of the totally anti-symmetric (0, p) tensor fields. These are called p-forms. The set of all p-forms over a manifold M is denoted Λp(M).

We’ve met some forms before. A 0-form is simply a function. Meanwhile, as we saw previously, a 1-form is another name for a covector. The anti-symmetry means that we can’t have any form of degree p > n = dim(M). A p-form has n!/(p!(n − p)!) different components. Forms in Λn(M) are called top forms.

Given a p-form ω and a q-form η, we can take the tensor product (2.79) to construct a (p+q)-tensor. If we anti-symmetrise this, we then get a (p+q)-form. This construction is called the wedge product, and is defined by

(ω ∧ η)μ1…μpν1…νq = ((p + q)!/(p! q!)) ω[μ1…μp ην1…νq]

where the [⋯] in the subscript tells us to anti-symmetrise over all indices. For example, given ω, η ∈ Λ1(M), we can construct a 2-form

(ω ∧ η)μν = ωμ ην − ων ημ

For one-forms, the anti-symmetry ensures that ω ∧ ω = 0. In general, if ω ∈ Λp(M) and η ∈ Λq(M), then one can show that

ω ∧ η = (−1)^{pq} η ∧ ω

This means that ω ∧ ω = 0 for any form of odd degree. We can, however, wedge even degree forms with themselves. (Which you know already for 0-forms, where the wedge product is just multiplication of functions.)

As a more specific example, consider M = 𝐑3 with ω = ωμ dxμ and η = ημ dxμ. We then have

ω ∧ η = (ω1 dx1 + ω2 dx2 + ω3 dx3) ∧ (η1 dx1 + η2 dx2 + η3 dx3)
      = (ω1η2 − ω2η1) dx1 ∧ dx2 + (ω2η3 − ω3η2) dx2 ∧ dx3 + (ω3η1 − ω1η3) dx3 ∧ dx1

Notice that the components that arise are precisely those of the cross-product acting on vectors in 𝐑3. This is no coincidence: what we usually think of as the cross-product between vectors is really a wedge product between forms. We’ll have to wait until Section 3 to understand how to map from one to the other.
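A quick numerical confirmation of this correspondence (my own sketch, not from the notes): the (12), (23) and (31) components of ω ∧ η reproduce numpy's cross product.

```python
import numpy as np

def wedge(w, e):
    # components (w ^ e)_{mu nu} = w_mu e_nu - w_nu e_mu on R^3
    return np.outer(w, e) - np.outer(e, w)

rng = np.random.default_rng(3)
w, e = rng.normal(size=3), rng.normal(size=3)
W = wedge(w, e)

# (dx1^dx2, dx2^dx3, dx3^dx1) components vs the cross product
pair = np.array([W[0, 1], W[1, 2], W[2, 0]])
cross = np.cross(w, e)
assert np.allclose(pair, np.array([cross[2], cross[0], cross[1]]))
print("wedge product components match the cross product")
```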

It can also be shown that the wedge product is associative, meaning

ω ∧ (η ∧ λ) = (ω ∧ η) ∧ λ

We can then drop the brackets in any such product.

Given a basis {fμ} of Λ1(M), a basis of Λp(M) can be constructed from wedge products {fμ1 ∧ ⋯ ∧ fμp}. We will usually work with the coordinate basis {dxμ}. This means that any p-form ω can be written locally as

ω = (1/p!) ωμ1…μp dxμ1 ∧ ⋯ ∧ dxμp   (2.80)

Although locally any p-form can be written as (2.80), this may not be true globally. This, and related issues, will become of some interest in Section 2.4.3.

2.4.1 The Exterior Derivative

We learned in Section 2.3.1 how to construct a one-form df from a function f. In a coordinate basis, this one-form has components (2.72),

df = (∂f/∂xμ) dxμ

We can extend this definition to higher forms. The exterior derivative is a map

d: Λp(M) → Λp+1(M)

In local coordinates (2.80), the exterior derivative acts as

dω = (1/p!) (∂ωμ1…μp/∂xν) dxν ∧ dxμ1 ∧ ⋯ ∧ dxμp   (2.81)

Equivalently we have

(dω)μ1…μp+1 = (p + 1) ∂[μ1 ωμ2…μp+1]   (2.82)

Importantly, if we subsequently act with the exterior derivative again, we get

d(dω)=0

because the derivatives are anti-symmetrised and hence vanish. This holds true for any p-form, a fact which is sometimes expressed as

d² = 0

It can be shown that the exterior derivative satisfies a number of further properties,

  • d(ω ∧ η) = dω ∧ η + (−1)^p ω ∧ dη, where ω ∈ Λp(M).

  • d(φ^*ω) = φ^*(dω), where φ^* is the pull-back associated to a map φ: M → N between manifolds.

  • Because the exterior derivative commutes with the pull-back, it also commutes with the Lie derivative. This ensures that we have d(ℒX ω) = ℒX(dω).

A p-form ω is said to be closed if dω = 0 everywhere. It is exact if ω = dη everywhere for some η. Because d² = 0, an exact form is necessarily closed. The question of when the converse is true is interesting: we’ll discuss this more in Section 2.4.3.

Examples

Given a one-form ω = ωμ dxμ, the exterior derivative gives a 2-form

(dω)μν = ∂μ ων − ∂ν ωμ   ⇔   dω = ½ (∂μ ων − ∂ν ωμ) dxμ ∧ dxν

As a specific instance of this example, suppose that we take the one-form to live on 𝐑3, with

ω=ω1dx1+ω2dx2+ω3dx3

Since this is a field, each of the components ωμ is a function of x1, x2 and x3. The exterior derivative is given by

dω = ∂2ω1 dx2 ∧ dx1 + ∂3ω1 dx3 ∧ dx1 + ∂1ω2 dx1 ∧ dx2
        + ∂3ω2 dx3 ∧ dx2 + ∂1ω3 dx1 ∧ dx3 + ∂2ω3 dx2 ∧ dx3
   = (∂1ω2 − ∂2ω1) dx1 ∧ dx2 + (∂2ω3 − ∂3ω2) dx2 ∧ dx3 + (∂3ω1 − ∂1ω3) dx3 ∧ dx1

Notice that there’s no term like ∂1ω1, because this would come with dx1 ∧ dx1 = 0.

In the olden days (before this course), we used to write vector fields in 𝐑3 as 𝝎 = (ω1, ω2, ω3) and compute the curl ∇ × 𝝎. But the components of the curl are precisely the components that appear in dω. In fact, our “vector” 𝝎 was really a one-form and the curl turned it into a two-form. It’s a happy fact that in 𝐑3, vectors, one-forms and two-forms all have three components, which allowed us to conflate them in our earlier courses. (In fact, there is a natural map between them that we will meet in Section 3.)

Suppose instead that we start with a 2-form B in 𝐑3, which we write as

B = B1 dx2 ∧ dx3 + B2 dx3 ∧ dx1 + B3 dx1 ∧ dx2

Taking the exterior derivative now gives

dB = ∂1B1 dx1 ∧ dx2 ∧ dx3 + ∂2B2 dx2 ∧ dx3 ∧ dx1 + ∂3B3 dx3 ∧ dx1 ∧ dx2   (2.84)
   = (∂1B1 + ∂2B2 + ∂3B3) dx1 ∧ dx2 ∧ dx3

This time there is just a single component, but again it’s something familiar. Had we written the original three components of the two-form in old-school vector notation 𝐁 = (B1, B2, B3), then the single component of dB is what we previously called ∇ · 𝐁.

The Lie Derivative Yet Again

There is yet another operation that we can construct on p-forms. Given a vector field X ∈ 𝔛(M), we can construct the interior product, a map ιX: Λp(M) → Λp−1(M). If ω ∈ Λp(M), we define ιXω ∈ Λp−1(M) by

ιXω(Y1, …, Yp−1) = ω(X, Y1, …, Yp−1)   (2.85)

In other words, we just put X in the first argument of ω. Acting on functions f, we simply define ιXf=0.

The anti-symmetry of forms means that ιX ιY = −ιY ιX. Moreover, you can check that

ιX(ω ∧ η) = ιXω ∧ η + (−1)^p ω ∧ ιXη

where ω ∈ Λp(M).

Consider a 1-form ω. There are two different ways to act with ιX and d to give us back a one-form. These are

ιX dω = ιX [½ (∂μ ων − ∂ν ωμ) dxμ ∧ dxν] = Xμ ∂μ ων dxν − Xν ∂μ ων dxμ

and

dιXω = d(ωμ Xμ) = Xμ ∂ν ωμ dxν + ωμ ∂ν Xμ dxν

Adding the two together gives

(dιX + ιX d) ω = (Xμ ∂μ ων + ωμ ∂ν Xμ) dxν

But this is exactly the same expression we saw previously when computing the Lie derivative (2.77) of a one-form. We learn that

ℒX ω = (dιX + ιX d) ω   (2.86)

This expression is sometimes referred to as Cartan’s magic formula. A similar calculation shows that (2.86) holds for any p-form ω.
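Cartan's magic formula is easy to verify symbolically for one-forms. In the sketch below (my own, on 𝐑2, with arbitrary illustrative choices of X and ω), the left-hand side is computed from the component formula (2.77), while the right-hand side is assembled independently from d and ιX.

```python
import sympy as sp

x, y = sp.symbols('x y')
coords = [x, y]

X = [y**2, sp.sin(x)]          # a sample vector field
w = [x * y, sp.exp(x) + y]     # a sample one-form

# left-hand side: Lie derivative components from (2.77)
lhs = [sum(X[n] * sp.diff(w[m], coords[n]) + w[n] * sp.diff(X[n], coords[m])
           for n in range(2)) for m in range(2)]

# right-hand side: (d i_X + i_X d) omega, built from scratch
iXw = sum(w[n] * X[n] for n in range(2))              # the function i_X omega
d_iXw = [sp.diff(iXw, c) for c in coords]             # its exterior derivative
dw = [[sp.diff(w[n], coords[m]) - sp.diff(w[m], coords[n])
       for n in range(2)] for m in range(2)]          # (d omega)_{mu nu}
iX_dw = [sum(X[n] * dw[n][m] for n in range(2)) for m in range(2)]
rhs = [d_iXw[m] + iX_dw[m] for m in range(2)]

assert all(sp.simplify(a - b) == 0 for a, b in zip(lhs, rhs))
print("Cartan's magic formula verified componentwise")
```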

2.4.2 Forms You Know and Love

There are a number of examples of differential forms that you’ve met already, but likely never called them by name.

The Electromagnetic Field

The electromagnetic gauge field Aμ=(ϕ,𝐀) should really be thought of as the components of a one-form on spacetime 𝐑4. (Here I’ve set c=1.) We write

A=Aμ(x)dxμ

Taking the exterior derivative yields a 2-form F=dA, given by

F = ½ Fμν dxμ ∧ dxν = ½ (∂μ Aν − ∂ν Aμ) dxμ ∧ dxν

But this is precisely the field strength Fμν = ∂μ Aν − ∂ν Aμ that we met in our lectures on Electromagnetism. The components are the electric and magnetic fields, arranged as

Fμν = ⎛  0   −E1  −E2  −E3 ⎞
      ⎜  E1   0    B3  −B2 ⎟
      ⎜  E2  −B3   0    B1 ⎟
      ⎝  E3   B2  −B1   0  ⎠      (2.87)

By construction, we also have dF = d²A = 0. In this context, this is sometimes called the Bianchi identity; it yields two of the four Maxwell equations. In old-school vector calculus notation, these are ∇ · 𝐁 = 0 and ∇ × 𝐄 + ∂𝐁/∂t = 0. We need a little more structure to get the other two, as we will see later in this chapter.

The gauge field A is not unique. Given any function α, we can always shift it by a gauge transformation

A → A + dα   ⇔   Aμ → Aμ + ∂μ α

This leaves the field strength invariant because F → F + d(dα) = F.
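Gauge invariance boils down to the symmetry of mixed partial derivatives, which sympy will happily confirm. A sketch of my own, with the gauge field components and the gauge parameter left as arbitrary symbolic functions:

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
coords = [t, x, y, z]

A = [sp.Function(f'A{m}')(*coords) for m in range(4)]   # arbitrary gauge field
alpha = sp.Function('alpha')(*coords)                   # arbitrary gauge parameter

def F(gauge_field):
    # field strength F_{mn} = d_m A_n - d_n A_m
    return [[sp.diff(gauge_field[n], coords[m]) - sp.diff(gauge_field[m], coords[n])
             for n in range(4)] for m in range(4)]

A_shifted = [A[m] + sp.diff(alpha, coords[m]) for m in range(4)]

# the difference F(A + d alpha) - F(A) is d_m d_n alpha - d_n d_m alpha = 0
assert all(sp.simplify(r1 - r2) == 0
           for row1, row2 in zip(F(A_shifted), F(A))
           for r1, r2 in zip(row1, row2))
print("F is gauge invariant")
```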

Phase Space and Hamilton’s Equations

In classical mechanics, the phase space is a manifold M parameterised by coordinates (qi,pj) where qi are the positions of particles and pj the momenta. Recall from our lectures on Classical Dynamics that the Hamiltonian H(q,p) is a function on M, and Hamilton’s equations are

q̇i = ∂H/∂pi   and   ṗi = −∂H/∂qi   (2.88)

Phase space also comes with a structure called a Poisson bracket, defined on a pair of functions f and g as

{f, g} = (∂f/∂qj)(∂g/∂pj) − (∂f/∂pj)(∂g/∂qj)

Then the time evolution of any function f can be written as

ḟ = {f, H}

which reproduces Hamilton’s equations if we take f=qi or f=pi.

Underlying this story is the mathematical structure of forms. The key idea is that we have a manifold M and a function H on M. We want a machinery that turns the function H into a vector field XH. Particles then follow trajectories in phase space that are integral curves generated by XH.

To achieve this, we introduce a symplectic two-form ω on an even-dimensional manifold M. This two-form must be closed, dω = 0, and non-degenerate, which means that the top form ω ∧ ⋯ ∧ ω ≠ 0. We’ll see why we need these requirements as we go along. A manifold M equipped with a symplectic two-form is called a symplectic manifold.

Any 2-form provides a map ω: Tp(M) → Tp*(M). Given a vector field X ∈ 𝔛(M), we can simply take the interior product with ω to get a one-form ιXω. However, we want to go the other way: given a function H, we can always construct a one-form dH, and we’d like to exchange this for a vector field. We can do this if the map ω: Tp(M) → Tp*(M) is actually an isomorphism, so that the inverse exists. This turns out to be true provided that ω is non-degenerate. In this case, we can define the vector field XH by solving the equation

ιXH ω = −dH   (2.89)

If we introduce coordinates xμ on the manifold, then the component form of this equation is

XHμ ωμν = −∂ν H

We denote the inverse as ωμν=-ωνμ such that ωμνωνρ=δρμ. The components of the vector field are then

\[
X_H^\mu = -\omega^{\nu\mu}\partial_\nu H = \omega^{\mu\nu}\partial_\nu H
\]

The integral curves generated by XH obey the differential equation (2.64)

\[
\frac{dx^\mu}{dt} = X_H^\mu = \omega^{\mu\nu}\partial_\nu H
\]

These are the general form of Hamilton’s equations. They reduce to our earlier form (2.88) if we write $x^\mu = (q^i, p_j)$ and choose the symplectic form $\omega = dp_i\wedge dq^i$, for which the inverse components take the block form
\[
\omega^{\mu\nu} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
\]
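To see this machinery in action, here is a short sympy sketch for a single degree of freedom. The Hamiltonian is that of the harmonic oscillator (an illustrative choice); inverting the symplectic form and applying $X_H^\mu = \omega^{\mu\nu}\partial_\nu H$ recovers Hamilton’s equations:

import sympy as sp

q, p = sp.symbols('q p')
H = (p**2 + q**2) / 2                   # harmonic oscillator (illustrative choice)

omega = sp.Matrix([[0, -1], [1, 0]])    # omega_{mu nu} for omega = dp ^ dq, with x = (q, p)
omega_inv = omega.inv()                 # omega^{mu nu} = ((0, 1), (-1, 0))

dH = sp.Matrix([sp.diff(H, q), sp.diff(H, p)])
X_H = omega_inv * dH                    # X_H^mu = omega^{mu nu} d_nu H

print(X_H.T)                            # Matrix([[p, -q]]): qdot = p and pdot = -q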

To define the Poisson structure, we first note that we can repeat the map (2.89) to turn any function f into a vector field $X_f$ obeying $\iota_{X_f}\omega = -df$. But we can then feed these vector fields back into the original 2-form ω. This gives us a Poisson bracket,

\[
\{f,g\} = \omega(X_g, X_f) = -\omega(X_f, X_g)
\]

Or, in components,

\[
\{f,g\} = \omega^{\mu\nu}\,\partial_\mu f\,\partial_\nu g
\]

There are many other ways to write this Poisson bracket structure in invariant form. For example, backtracking through various definitions we find

\[
\{f,g\} = -\iota_{X_f}\omega(X_g) = df(X_g) = X_g(f)
\]

The equation of motion in Poisson bracket form is then

\[
\dot{f} = \{f,H\} = X_H(f) = \mathcal{L}_{X_H} f
\]

which tells us that the Lie derivative along XH generates time evolution.

We haven’t yet explained why the symplectic two-form must be closed, dω = 0. You can check that this is needed so that the Poisson bracket obeys the Jacobi identity. Alternatively, it ensures that the symplectic form itself is invariant under Hamiltonian flow, in the sense that $\mathcal{L}_{X_H}\omega = 0$. To see this, we use (2.86):

\[
\mathcal{L}_{X_H}\omega = (d\iota_{X_H} + \iota_{X_H} d)\,\omega = \iota_{X_H}\,d\omega
\]

The second equality follows from the fact that $d\iota_{X_H}\omega = -d(dH) = 0$. If we insist that dω = 0, then we find $\mathcal{L}_{X_H}\omega = 0$ as promised.

Thermodynamics

The state space of a thermodynamic system is a manifold M. For the ideal gas, this is a two-dimensional manifold with coordinates provided by, for example, the pressure p and volume V. More complicated systems can have a higher dimensional state space, but the dimension is always even since, like in classical mechanics, thermodynamical variables come in conjugate pairs.

When we first learn the laws of thermodynamics, we have to make up some strange new notation, đ, which then never rears its head again. For example, the first law of thermodynamics is written as

\[
dE = đQ + đW
\]

Here dE is the infinitesimal change of energy in the system. The first law of thermodynamics, as written above, states that this decomposes into the heat đQ flowing into the system and the work đW done on the system.

Why the stupid notation? Well, the energy E(p,V) is a function over the state space M, and this means that we can write the change of the energy as $dE = \frac{\partial E}{\partial p}dp + \frac{\partial E}{\partial V}dV$. But there is no such function Q(p,V) or W(p,V) and, correspondingly, đQ and đW are not exact differentials. Indeed, we have đW = −p dV and later, after we introduce the second law, we learn that đQ = T dS, with T the temperature and S the entropy, both of which are functions over M.

This is much more natural in the language of forms. All of the terms in the first law are one-forms. But the transfer of heat đQ and the work đW are not exact one-forms, and so can’t be written as d(something). In contrast, dE is an exact one-form. That’s what the đ notation is really telling us: it’s the way of denoting non-exact one-forms before we had a notion of differential geometry.

The real purpose of the first law of thermodynamics is to define the energy function E. The nineteenth-century version of the statement is something like: “the amount of work required to change an isolated system is independent of how the work is performed”. A more modern rendering would be: “the sum of the work and heat is an exact one-form”.
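The inexactness of đQ is easy to exhibit in an example. For the monatomic ideal gas (an illustrative choice), E = (3/2)NkT = (3/2)pV, and the first law gives đQ = dE + p dV. The following sympy sketch shows that this one-form is not even closed, so it certainly cannot be written as dQ for any function Q(p,V):

import sympy as sp

p, V = sp.symbols('p V', positive=True)
E = sp.Rational(3, 2) * p * V     # monatomic ideal gas: E = (3/2) N k T = (3/2) pV

# first law: dQ = dE + p dV = a dp + b dV
a = sp.diff(E, p)                 # coefficient of dp
b = sp.diff(E, V) + p             # coefficient of dV

# a one-form a dp + b dV is closed if and only if d_V a = d_p b
print(sp.diff(a, V), sp.diff(b, p))    # 3/2 versus 5/2: dQ is not closed, hence not exact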

2.4.3 A Sniff of de Rham Cohomology

The exterior derivative is a map which squares to zero, $d^2 = 0$. It turns out that one can have a lot of fun with such maps. We will now explore a little bit of this fun.

First, a repeat of definitions we have met already: a p-form ω is closed if dω = 0 everywhere; it is exact if ω = dη everywhere for some (p−1)-form η. Because $d^2 = 0$, exact implies closed. However, the converse is not necessarily true. It turns out that the way in which closed forms fail to be exact captures interesting facts about the topology of the underlying manifold.

We’ve met this kind of question before. In electromagnetism, we have a magnetic field $\mathbf{B}$ which obeys $\nabla\cdot\mathbf{B} = 0$. We then argue that this means we can write the magnetic field as $\mathbf{B} = \nabla\times\mathbf{A}$. This is more properly expressed in the language of forms. As we saw in the previous section, the magnetic field is really a 2-form

\[
B = B_1\,dx^2\wedge dx^3 + B_2\,dx^3\wedge dx^1 + B_3\,dx^1\wedge dx^2
\]

We computed the exterior derivative in (2.84); it is

\[
dB = (\partial_1 B_1 + \partial_2 B_2 + \partial_3 B_3)\,dx^1\wedge dx^2\wedge dx^3
\]

We see that the Maxwell equation $\nabla\cdot\mathbf{B} = 0$ is really the statement that B is a closed two-form, obeying dB = 0. We also saw in (2.4) that if we write B = dA for some one-form A, then the components are given by $\mathbf{B} = \nabla\times\mathbf{A}$. Clearly writing B = dA ensures that dB = 0. But when is the converse true? We have the following statement (which we leave unproven):


Theorem (The Poincaré Lemma): On M=𝐑n, closed implies exact.

Since we’ve spent a lot of time mapping manifolds to $\mathbf{R}^n$, this also has consequences for a general manifold M. It means that if ω is a closed p-form, then in any neighbourhood $\mathcal{O} \subset M$ it is always possible to find an $\eta \in \Lambda^{p-1}(M)$ such that ω = dη on $\mathcal{O}$. The catch is that it may not be possible to find such an η everywhere on the manifold.

An Example

Consider the one-dimensional manifold M=𝐑. We can take a one-form ω=f(x)dx. This is always closed because it is a top form. It is also exact. We introduce the function

\[
g(x) = \int_0^x dx'\,f(x')
\]

Then ω=dg.

Now consider the topologically more interesting one-dimensional manifold $\mathbf{S}^1$, which we can view as the phase $e^{i\theta} \in \mathbf{C}$. We can introduce the form ω = dθ on $\mathbf{S}^1$. The way it’s written makes it look like an exact form, but this is an illusion: as we stressed in Section 2.1, θ is not a good coordinate everywhere on $\mathbf{S}^1$ because it is not single-valued. Indeed, it’s simple to see that there is no single-valued function g(θ) on $\mathbf{S}^1$ such that ω = dg. So on $\mathbf{S}^1$ we can construct a form which, locally, can be written as dθ but which, globally, cannot be written as d(something). We have found a form that is closed but not exact.

Another Example

On $M = \mathbf{R}^2$, the Poincaré lemma ensures that all closed forms are exact. However, things change if we remove a single point and consider $\mathbf{R}^2 - \{0\}$. Consider the one-form defined by

\[
\omega = \frac{-y}{x^2+y^2}\,dx + \frac{x}{x^2+y^2}\,dy
\]

This is not a smooth one-form on 𝐑2 because of the divergence at the origin. But removing that point means that ω becomes acceptable. We can check that ω is closed,

\[
d\omega = \partial_y\left(\frac{-y}{x^2+y^2}\right)dy\wedge dx + \partial_x\left(\frac{x}{x^2+y^2}\right)dx\wedge dy = 0
\]

where the final equality follows from a little bit of algebra. The form ω is exact if we can find a function f, defined everywhere on $\mathbf{R}^2 - \{0\}$, such that ω = df, which means

\[
\omega = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy \quad\Rightarrow\quad \frac{\partial f}{\partial x} = \frac{-y}{x^2+y^2} \quad\text{and}\quad \frac{\partial f}{\partial y} = \frac{x}{x^2+y^2}
\]

We can certainly integrate these equations; the result is

\[
f(x,y) = \tan^{-1}\left(\frac{y}{x}\right) + \text{constant}
\]

But this is not a smooth function everywhere on $\mathbf{R}^2 - \{0\}$. This means that we can’t, in fact, write ω = df for a well-defined function on $\mathbf{R}^2 - \{0\}$. We learn that removing a point makes a big difference: now closed no longer implies exact.
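There is a quicker way to see that ω cannot be exact: for any single-valued function f, the integral of df around a closed curve vanishes. The sympy sketch below confirms both that ω is closed and that its integral around the unit circle is 2π rather than zero:

import sympy as sp

x, y, t = sp.symbols('x y t')
wx = -y / (x**2 + y**2)      # dx-component of omega
wy = x / (x**2 + y**2)       # dy-component of omega

# closed: d(omega) = (d_x wy - d_y wx) dx ^ dy = 0 away from the origin
assert sp.simplify(sp.diff(wy, x) - sp.diff(wx, y)) == 0

# pull back omega to the unit circle (x, y) = (cos t, sin t) and integrate
cx, cy = sp.cos(t), sp.sin(t)
pullback = (wx.subs({x: cx, y: cy}) * sp.diff(cx, t)
            + wy.subs({x: cx, y: cy}) * sp.diff(cy, t))
print(sp.integrate(sp.simplify(pullback), (t, 0, 2*sp.pi)))    # 2*pi, not 0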

There is a similar story for $\mathbf{R}^3$. Indeed, this is how magnetic monopoles sneak back into physics, despite being forbidden by the Maxwell equation $\nabla\cdot\mathbf{B} = 0$. You can learn more about this in the lectures on Gauge Theory.

Betti Numbers

We denote the set of all closed p-forms on a manifold M as $Z^p(M)$. Equivalently, $Z^p(M)$ is the kernel of the map $d: \Lambda^p(M) \to \Lambda^{p+1}(M)$.

We denote the set of all exact p-forms on a manifold M as $B^p(M)$. Equivalently, $B^p(M)$ is the range of $d: \Lambda^{p-1}(M) \to \Lambda^p(M)$.

The pth de Rham cohomology group is defined to be

\[
H^p(M) = Z^p(M)/B^p(M)
\]

The quotient here is an equivalence class. Two closed forms $\omega, \omega' \in Z^p(M)$ are said to be equivalent if $\omega' = \omega + \eta$ for some $\eta \in B^p(M)$. We say that ω and ω′ sit in the same equivalence class [ω]. The cohomology group $H^p(M)$ is the set of equivalence classes; in other words, it consists of closed forms mod exact forms.

The Betti numbers Bp of a manifold M are defined as

\[
B_p = \dim H^p(M)
\]

It turns out that these are always finite. The Betti number B0=1 for any connected manifold. This can be traced to the existence of constant functions which are clearly closed but, because there are no p=-1 forms, are not exact. The higher Betti numbers are non-zero only if the manifold has some interesting topology. Finally, the Euler character is defined as the alternating sum of Betti numbers,

\[
\chi(M) = \sum_p (-1)^p B_p \tag{2.90}
\]

Here are some simple examples. We’ve already seen that the circle $\mathbf{S}^1$ has a closed, non-exact one-form; this means that $B_1 = 1$ and so χ = 0. The sphere $\mathbf{S}^n$ has only $B_0 = B_n = 1$, giving $\chi = 1 + (-1)^n$. The torus $\mathbf{T}^n$ has $B_p = \binom{n}{p}$ and χ = 0.
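For instance, for the two-torus $\mathbf{T}^2 = \mathbf{S}^1\times\mathbf{S}^1$ the formula $B_p = \binom{2}{p}$ gives

\[
B_0 = 1\ ,\quad B_1 = 2\ ,\quad B_2 = 1 \quad\Rightarrow\quad \chi(\mathbf{T}^2) = 1 - 2 + 1 = 0
\]

with the two classes in $H^1(\mathbf{T}^2)$ represented by the closed, non-exact one-forms $d\theta_1$ and $d\theta_2$ associated to the two circles.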

2.4.4 Integration

We have learned how to differentiate on manifolds by using a vector field X. Now it is time to learn how to integrate. It turns out that the things that we integrate on manifolds are forms.

Integrating over Manifolds

To start, we need to orient ourselves. A volume form, or orientation, on a manifold of dimension dim(M) = n is a nowhere-vanishing top form v. Any top form has just a single component and can be locally written as

\[
v = v(x)\,dx^1\wedge\dots\wedge dx^n
\]

where we require $v(x) \neq 0$. If such a top form exists everywhere on the manifold, then M is said to be orientable.

The orientation is called right-handed if v(x) > 0 everywhere, and left-handed if v(x) < 0 everywhere. Given one volume form v, we can always construct another by multiplying by a function, giving $\tilde{v} = fv$ where f(x) > 0 everywhere or f(x) < 0 everywhere.

It’s not enough to just write down a volume form with $v(x) \neq 0$ locally. We must also ensure that we can patch these volume forms together over the manifold, without the handedness changing. Suppose that we have two sets of coordinates, $x^\mu$ and $\tilde{x}^\mu$, that overlap on some region. In the new coordinates, the volume form is given by

\[
v = v(x)\,\frac{\partial x^1}{\partial\tilde{x}^{\mu_1}}\,d\tilde{x}^{\mu_1}\wedge\dots\wedge\frac{\partial x^n}{\partial\tilde{x}^{\mu_n}}\,d\tilde{x}^{\mu_n} = v(x)\,\det\left(\frac{\partial x^\mu}{\partial\tilde{x}^\nu}\right)d\tilde{x}^1\wedge\dots\wedge d\tilde{x}^n
\]

which has the same orientation provided

\[
\det\left(\frac{\partial x^\mu}{\partial\tilde{x}^\nu}\right) > 0 \tag{2.91}
\]

Non-orientable manifolds cannot be covered by overlapping charts such that (2.91) holds. Examples include the Möbius strip and real projective space 𝐑𝐏n for n even. (In contrast 𝐑𝐏n is orientable for n odd, and 𝐂𝐏n is orientable for all n.) In these lectures, we deal only with orientable manifolds.
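A familiar instance of the condition (2.91) is the change from Cartesian to polar coordinates on the plane, where $\det(\partial x^\mu/\partial\tilde{x}^\nu) = r > 0$ away from the origin, so that $dx\wedge dy = r\,dr\wedge d\theta$ with both sides defining the same orientation. A quick sympy check:

import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x, y = r * sp.cos(th), r * sp.sin(th)

# Jacobian matrix of the coordinate change (r, theta) -> (x, y)
J = sp.Matrix([[sp.diff(x, r), sp.diff(x, th)],
               [sp.diff(y, r), sp.diff(y, th)]])
print(sp.simplify(J.det()))    # r: positive away from the origin, so (2.91) holds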

Given a volume form v on M, we can integrate any function $f: M \to \mathbf{R}$ over the manifold. In a chart $\phi: \mathcal{O} \to U$, with coordinates $x^\mu$, we have

\[
\int_{\mathcal{O}} f\,v = \int_U dx^1\cdots dx^n\,f(x)\,v(x)
\]

On the right-hand-side, we’re just doing normal integration over some part of 𝐑n. The volume form is playing the role of a measure, telling us how much to weight various parts of the integral. To integrate over the entire manifold, we divide the manifold up into different regions, each covered by a single chart. We then perform the integral over each region and sum the results.

Integrating over Submanifolds

We don’t have to integrate over the full manifold M. We can integrate over some lower dimensional submanifold.

A manifold Σ of dimension k < n is a submanifold of M if we can find a map $\phi: \Sigma \to M$ which is one-to-one (which ensures that Σ doesn’t intersect itself in M) and whose push-forward $\phi_*: T_p(\Sigma) \to T_{\phi(p)}(M)$ is also one-to-one.

We can then integrate a k-form ω on M over a k-dimensional submanifold Σ. We do this by pulling back the k-form to Σ and writing

\[
\int_{\phi(\Sigma)}\omega = \int_\Sigma \phi^*\omega
\]

For example, suppose that we have a one-form A living on M. If C is a one-dimensional manifold, then we can introduce a map $\sigma: C \to M$ which defines a non-intersecting, one-dimensional curve σ(C), a submanifold of M. We can then pull back A onto this curve and integrate to get

\[
\int_{\sigma(C)} A = \int_C \sigma^* A
\]

This probably looks more familiar in coordinates. If the curve traces out a path xμ(τ) in M, we have

\[
\int_C \sigma^* A = \int d\tau\,A_\mu(x)\,\frac{dx^\mu}{d\tau}
\]

But this is precisely the way the worldline of a particle couples to the electromagnetic field, as we previously saw in (1.21).
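In practice, the pullback integral is just an ordinary one-dimensional integral, and it is simple to evaluate on a computer. In the sympy sketch below, both the potential and the worldline are arbitrary illustrative choices, not anything physical:

import sympy as sp

tau, t, x, y, z = sp.symbols('tau t x y z')
path = [tau, sp.cos(tau), sp.sin(tau), tau**2]    # worldline x^mu(tau) (illustrative choice)
A = [z, 0, 0, t]                                  # gauge potential A_mu (illustrative choice)

subs = dict(zip([t, x, y, z], path))
integrand = sum(sp.sympify(A[m]).subs(subs) * sp.diff(path[m], tau) for m in range(4))
print(sp.integrate(integrand, (tau, 0, 1)))       # the coupling evaluated along this segment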

2.4.5 Stokes’ Theorem

Until now, we have considered only smooth manifolds. There is a slight generalisation that will be useful. We define a manifold with boundary in the same way as a manifold, except the charts map $\phi: \mathcal{O} \to U$ where U is an open subset of $\mathbf{R}^n_+ = \{(x^1,\dots,x^n)\ \text{such that}\ x^n \geq 0\}$. The boundary has co-dimension 1 and is denoted ∂M: it is the submanifold with coordinate $x^n = 0$.

Consider a manifold M with boundary ∂M. If the dimension of the manifold is dim(M) = n, then for any (n−1)-form ω we have the following simple result

\[
\int_M d\omega = \int_{\partial M}\omega \tag{2.92}
\]

This is Stokes’ theorem.

We do not prove Stokes’ theorem here. The proof is fairly tedious, and does not differ greatly from the proofs of other things that you’ve called Stokes’ theorem (or Gauss’ divergence theorem) in the past. (See, for example, the lectures on Vector Calculus.) However, the wonderful thing about (2.92) is the way in which it unifies many results in a single elegant formula. To see this, we simply need to look at a few examples.

The Mother of all Integral Theorems

First, consider n = 1 with M the interval I. We introduce coordinates $x \in [a,b]$ on the interval. The 0-form ω = ω(x) is simply a function, and dω = (dω/dx) dx. In this case, the two sides of Stokes’ theorem can be evaluated to give

\[
\int_M d\omega = \int_a^b \frac{d\omega}{dx}\,dx \quad\text{and}\quad \int_{\partial M}\omega = \omega(b) - \omega(a)
\]

Equating the two, we see that Stokes’ theorem is simply a restatement of the fundamental theorem of calculus.

Next, we take $M \subset \mathbf{R}^2$ to be a manifold with boundary. We introduce a one-form which, in coordinates, reads

\[
\omega = \omega_1\,dx^1 + \omega_2\,dx^2 \quad\Rightarrow\quad d\omega = \left(\frac{\partial\omega_2}{\partial x^1} - \frac{\partial\omega_1}{\partial x^2}\right)dx^1\wedge dx^2
\]

In this case, the ingredients in Stokes’ theorem are

\[
\int_M d\omega = \int_M \left(\frac{\partial\omega_2}{\partial x^1} - \frac{\partial\omega_1}{\partial x^2}\right)dx^1\,dx^2 \quad\text{and}\quad \int_{\partial M}\omega = \int_{\partial M}\omega_1\,dx^1 + \omega_2\,dx^2
\]

Equating the two gives the result usually referred to as Green’s theorem in the plane.
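Again, this can be checked explicitly. Here is a sympy sketch verifying (2.92) on the unit square (ignoring the corners, which do not affect the integrals), with the components of the one-form chosen arbitrarily for illustration:

import sympy as sp

x, y = sp.symbols('x y')
w1, w2 = -x**2 * y, x * y**2      # components of a one-form (illustrative choice)

# left-hand side: integral of d(omega) over the unit square
lhs = sp.integrate(sp.diff(w2, x) - sp.diff(w1, y), (x, 0, 1), (y, 0, 1))

# right-hand side: integral of omega around the boundary, traversed anticlockwise
rhs = (sp.integrate(w1.subs(y, 0), (x, 0, 1))      # bottom: y = 0
       + sp.integrate(w2.subs(x, 1), (y, 0, 1))    # right:  x = 1
       + sp.integrate(w1.subs(y, 1), (x, 1, 0))    # top:    y = 1, traversed backwards
       + sp.integrate(w2.subs(x, 0), (y, 1, 0)))   # left:   x = 0, traversed backwards
print(lhs, rhs)                                    # both equal 2/3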

Finally, consider $M \subset \mathbf{R}^3$ to be a manifold with boundary, with a 2-form

\[
\omega = \omega_1\,dx^2\wedge dx^3 + \omega_2\,dx^3\wedge dx^1 + \omega_3\,dx^1\wedge dx^2
\]

The right-hand side of Stokes’ theorem is

\[
\int_{\partial M}\omega_1\,dx^2\,dx^3 + \omega_2\,dx^3\,dx^1 + \omega_3\,dx^1\,dx^2
\]

Meanwhile, we computed the exterior derivative of a 2-form in (2.84). The left-hand-side of Stokes’ theorem then gives

\[
\int_M d\omega = \int_M (\partial_1\omega_1 + \partial_2\omega_2 + \partial_3\omega_3)\,dx^1\,dx^2\,dx^3
\]

This time, equating the two gives us Gauss’ divergence theorem.

We see that Stokes’ theorem, as written in (2.92), is the mother of all integral theorems, packaging many famous results in a single formula. We’ll revisit this in Section 3.2.4 where we relate Stokes’ theorem to a more explicit form of the divergence theorem.