As a first post, I thought I’d start off with something rather general, which might be broadly useful to other students in analysis:1 the details of the construction of the uniform spherical measure.
A PDF of my full write-up is here; below, I’ll explain some of the motivation for creating this (a more abbreviated discussion along these same lines can be found in the Introduction).
The spherical measure \(\sigma\) on \(S^{d - 1} = \{x \in \mathbb{R}^d : |x|_d = 1\}\) can be constructed in several ways; with a multiplicity of approaches, it is not always clear why they produce the same measure.
Some of these equivalences are shown in arguments scattered across MathOverflow or MathStackExchange, and others follow from general results found in various references, but it is useful to gather the tools in one place and prove the equivalences together, simply and succinctly.2
First, one may consider the question from the perspective of integration theory: ideally, a surface measure on the sphere would ensure that the polar integration formula $$\int_{\mathbb{R}^d} f(x) \, dx = \int_{(0, \infty)} \int_{S^{d - 1}} f(r \omega) \, d\sigma(\omega) \, r^{d - 1} \, dr$$ is satisfied for every \(f \in L^1(\mathbb{R}^d)\). Some references construct and define \(\sigma\) on the basis of this identity; there turns out to be a unique Borel measure \(\sigma\) for which the identity holds. This is the approach of Section 2.7 in Folland’s text, for instance (the text from which I first learned measure theory and modern real analysis).
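As a quick worked example of how much the polar formula already pins down, take the Gaussian \(f(x) = e^{-\pi |x|^2}\), whose integral over \(\mathbb{R}^d\) is \(1\). Since \(f(r\omega) = e^{-\pi r^2}\) does not depend on \(\omega\), the substitution \(s = \pi r^2\) in the radial integral gives the total mass of \(\sigma\): $$1 = \int_{\mathbb{R}^d} e^{-\pi |x|^2} \, dx = \sigma(S^{d - 1}) \int_0^\infty e^{-\pi r^2} r^{d - 1} \, dr = \sigma(S^{d - 1}) \cdot \frac{\Gamma(d/2)}{2 \pi^{d/2}},$$ so that \(\sigma(S^{d - 1}) = 2 \pi^{d/2} / \Gamma(d/2)\), which equals \(2\pi\) when \(d = 2\) and \(4\pi\) when \(d = 3\).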
This approach is nice in some respects, as the polar coordinates formula is quite useful, and the form of \(\sigma\) that one develops this way shows that it is a rotationally invariant, finite, nontrivial measure on the sphere.
But the form of \(\sigma\), as so defined, is given for Borel sets \(E \subseteq S^{d - 1}\) as $$\sigma(E) = d \cdot \lambda^d(\{x \in \mathbb{R}^d \setminus \{0\} : |x|_d \leq 1, \ x / |x|_d \in E\}),$$ a definition which is utterly intractable to compute in all but the simplest cases.
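Still, the definition can at least be estimated numerically. The sketch below (Python, standard library only; the function name `cone_cap_measure`, the sampling scheme, and the sample sizes are hypothetical choices made for this illustration) estimates \(\sigma(E)\) for a polar cap \(E = \{\omega : \omega_d \geq \cos\theta\}\) by Monte Carlo sampling of the cube \([-1, 1]^d\), and for \(d = 3\) compares the result with the classical cap area \(2\pi(1 - \cos\theta)\).

```python
import math
import random

def cone_cap_measure(d, theta, n_samples=200_000, seed=0):
    """Monte Carlo estimate of sigma(E) = d * lambda^d(cone over E) for the
    polar cap E = {w in S^{d-1} : w_d >= cos(theta)}."""
    rng = random.Random(seed)
    cos_t = math.cos(theta)
    hits = 0
    for _ in range(n_samples):
        # Uniform point in the cube [-1, 1]^d, which contains the closed unit ball.
        x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        r = math.sqrt(sum(t * t for t in x))
        # Keep x if 0 < |x| <= 1 and x / |x| lies in the cap.
        if 0.0 < r <= 1.0 and x[-1] / r >= cos_t:
            hits += 1
    # lambda^d(cone) is approximately (volume of cube) * (fraction of hits).
    return d * (2.0 ** d) * hits / n_samples

theta = math.pi / 3
estimate = cone_cap_measure(3, theta)
exact = 2 * math.pi * (1 - math.cos(theta))  # classical cap area on S^2
print(f"estimate {estimate:.4f}, exact {exact:.4f}")
```

Convergence is only at the Monte Carlo rate \(O(n^{-1/2})\), so this is a sanity check rather than a practical way to compute \(\sigma\).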
(We do have the useful growth estimate \(\sigma(E) \lesssim_d \text{diam}(E)^{d - 1}\). Since \(\sigma(E) \leq \sigma(S^{d - 1}) < \infty\), we may assume \(\text{diam}(E)\) is small; by monotonicity we may then assume \(E = B(x_0, r) \cap S^{d - 1}\) for a small ball centered at some \(x_0 \in S^{d - 1}\), with \(0 < r \ll 1\) and \(r \leq 2 \, \text{diam}(E)\), and rotate to ensure \(x_0 = e_d\). The resulting cone-set associated with \(E\), appearing on the right in the definition above, can then be embedded into the thin cylinder \(\{(x', x_d) \in \mathbb{R}^d : |x'|_{d - 1} < r, \ 0 \leq x_d \leq 1\}\), and the estimate follows from a base-height bound.)
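To spell out the base-height bound (writing \(\alpha_{d - 1}\) for the volume of the unit ball in \(\mathbb{R}^{d - 1}\), notation introduced only for this computation): if \(x\) lies in the cone-set and \(\omega = x / |x|_d\), then \(|x'|_{d - 1} \leq |\omega'|_{d - 1} \leq |\omega - e_d|_d < r\), while \(\omega_d > 1 - r > 0\) gives \(0 \leq x_d \leq 1\). Hence $$\sigma(E) \leq d \cdot \lambda^d(\{(x', x_d) : |x'|_{d - 1} < r, \ 0 \leq x_d \leq 1\}) = d \, \alpha_{d - 1} \, r^{d - 1} \leq 2^{d - 1} d \, \alpha_{d - 1} \, \text{diam}(E)^{d - 1}.$$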
There is, of course, a glaring problem: when the spherical measure arises in practice, this is not how it is defined or used. Instead, for many analysis applications, \(\sigma\) is the measure on the surface \(S^{d - 1}\) for which the divergence theorem holds when integrating over the unit ball.
When we invoke the spherical measure in our PDE I class, for instance, the property we use to derive the mean-value representation for harmonic functions is precisely that the divergence theorem holds between balls and spheres (and it cannot be overstated how foundational this result is for establishing many early results on harmonic functions).
Given the process by which \(\sigma\) was defined, which was measure-theoretic in nature, it is unclear why anything like the divergence theorem should hold for it. From the definition alone, proving that every \(f\) which is \(C^1\) on a neighborhood of the closed unit ball obeys $$\int_{|x| < 1} (\partial_i f)(x) \, dx = \int_{S^{d - 1}} f \nu_i \, d\sigma = \int_{S^{d - 1}} f(\omega) \omega_i \, d\sigma(\omega)$$ using nothing but \((r, \omega)\) coordinates is rather difficult; the claim seems especially bound up with Cartesian coordinates.3
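A numerical spot check of this identity (of course, not a proof) is easy in \(d = 2\), where \(\sigma\) is arc length on the unit circle. The sketch below assumes the hypothetical test function \(f(x, y) = x y^2\) and \(i = 1\), for which both sides equal \(\pi/4\); it deliberately computes the left-hand side on a Cartesian grid rather than in polar coordinates.

```python
import math

def f(x, y):
    # Hypothetical test function f(x, y) = x * y^2.
    return x * y * y

def d1f(x, y):
    # Its partial derivative in the first variable: y^2.
    return y * y

def disk_integral(g, n=800):
    """Midpoint rule for the integral of g over the open unit disk, on an
    n x n Cartesian grid covering [-1, 1]^2 (no polar coordinates)."""
    h = 2.0 / n
    total = 0.0
    for a in range(n):
        x = -1.0 + (a + 0.5) * h
        for b in range(n):
            y = -1.0 + (b + 0.5) * h
            if x * x + y * y < 1.0:
                total += g(x, y)
    return total * h * h

def circle_integral(g, m=4096):
    """Integral of g over the unit circle with respect to arc length
    (sigma when d = 2), by the equally spaced rule in the angle."""
    dtheta = 2.0 * math.pi / m
    return sum(g(math.cos(k * dtheta), math.sin(k * dtheta)) for k in range(m)) * dtheta

lhs = disk_integral(d1f)
# On the circle the outward normal is nu = omega, so nu_1 = x.
rhs = circle_integral(lambda x, y: f(x, y) * x)
print(lhs, rhs, math.pi / 4)
```

The disk integral converges slowly because grid cells straddling the circle are counted all-or-nothing; the circle integral is essentially exact, since the integrand is a trigonometric polynomial.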
We first encounter the divergence theorem in our multivariable calculus classes, where the classical formulation is given only in rather special cases. The setup (at least in the standard course) goes like this: for a parameterized surface in \(3\)-space, given as the graph of a \(C^1\) function \(\varphi : U \to \mathbb{R}\), \(U \subseteq \mathbb{R}^2\), one considers the “surface area element” $$d S = \sqrt{1 + (\partial_u \varphi)^2 + (\partial_v \varphi)^2} \, du \, dv$$ on the region $$\text{Gr}(\varphi, U) = \{(w, \varphi(w)) : w \in U\} \subseteq U \times \mathbb{R},$$ which is often explained with much hand-waving and talk of the areas of parallelograms. But, at least at the level of the bare definitions, after properly setting up the unit normal, one finds that the divergence theorem holds for regions which, along each axial direction, lie between two such graphs.
This, then, also furnishes a definition of “area” on curved surfaces, and we may ask whether it can be used to define a measure on the sphere. In the case of \(S^{d - 1}\), the parameterizations send each \(x' \in \mathbb{R}^{d - 1}\) with \(|x'|_{d - 1} < 1\) to $$\Phi_{\pm}(x') = (x', \pm \sqrt{1 - |x'|^2})$$ (the upper and lower hemispheres), from which we get a measure $$\sigma(A) = \int_{\Phi_{+}^{- 1}(A)} \sqrt{1 + |\nabla_{x'}(\sqrt{1 - |x'|^2})|^2} \, dx' + \int_{\Phi_{-}^{- 1}(A)} \sqrt{1 + |\nabla_{x'}(- \sqrt{1 - |x'|^2})|^2} \, dx',$$ which is somewhat messy and not very enlightening, but does, in principle, allow for calculation.
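The messiness is partly superficial: since \(\nabla_{x'}(\pm \sqrt{1 - |x'|^2}) = \mp x' / \sqrt{1 - |x'|^2}\), $$1 + |\nabla_{x'}(\pm \sqrt{1 - |x'|^2})|^2 = 1 + \frac{|x'|^2}{1 - |x'|^2} = \frac{1}{1 - |x'|^2},$$ so both integrands reduce to \((1 - |x'|^2)^{-1/2}\). For instance, when \(d = 3\) and \(A = \{\omega \in S^2 : \omega_3 \geq \cos\theta\}\) with \(0 < \theta < \pi/2\), we have \(\Phi_{-}^{-1}(A) = \emptyset\) and \(\Phi_{+}^{-1}(A) = \{|x'| \leq \sin\theta\}\), and polar coordinates in the plane give $$\sigma(A) = 2\pi \int_0^{\sin\theta} \frac{\rho \, d\rho}{\sqrt{1 - \rho^2}} = 2\pi (1 - \cos\theta),$$ Archimedes’ classical value for the area of a spherical cap.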
Moreover, this definition allows the divergence theorem to be applied when integrating over spheres; as we discussed, this is always a welcome result to have.
Unfortunately, from this parameterized “hemispheric” definition, it is truly unclear why (or even whether) this \(\sigma\) should be rotationally invariant, and this is one of the key (arguably, defining) properties that \(\sigma\) ought to have. For instance, what if \(A\) is split between the upper and lower hemispheres? One may attempt to reparameterize along a different coordinate axis, but there is always the case when \(A\) is large and does not fit inside a single hemisphere, and it is unclear whether switching the axis \(i\) along which we parameterize, to “change hemispheres,” is even consistent: a priori, different choices could assign different measures to the same set. This coordinate-based definition is not well suited to certain global calculations.
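As one concrete check, the formula does return the expected value for a cap that straddles both hemispheres. Take \(d = 3\) and \(A_1 = \{\omega \in S^2 : \omega_1 \geq \cos\theta\}\), \(0 < \theta < \pi/2\). Since the first coordinate of \(\Phi_{\pm}(x')\) is \(x_1\), both preimages equal \(\{x' = (x_1, x_2) : |x'| < 1, \ x_1 \geq \cos\theta\}\), and a one-line gradient computation shows both integrands simplify to \((1 - |x'|^2)^{-1/2}\). Integrating first in \(x_2\), $$\sigma(A_1) = 2 \int_{\cos\theta}^{1} \int_{-\sqrt{1 - x_1^2}}^{\sqrt{1 - x_1^2}} \frac{dx_2}{\sqrt{1 - x_1^2 - x_2^2}} \, dx_1 = 2 \int_{\cos\theta}^{1} \pi \, dx_1 = 2\pi (1 - \cos\theta),$$ the same value as for the polar cap \(\{\omega_3 \geq \cos\theta\}\) (the equator \(\{\omega_3 = 0\}\) lies in neither image, so it contributes nothing). A single rotated cap is evidence, not a proof of invariance.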
Finally, from our more advanced courses, we can recall the Hausdorff measures \(\mathcal{H}^s\), \(s \geq 0\), on metric spaces. Seeing as the unit sphere \(S^{d - 1}\) is a smooth \((d - 1)\)-dimensional submanifold, which can be parameterized (locally, at least) by Lipschitz maps, we might consider the measure $$\sigma = \mathcal{H}^{d - 1} \mid_{S^{d - 1}},$$ and ask whether this is suitable.
On the one hand, the excellent invariance properties of Hausdorff measure imply that this \(\sigma\) is rotationally invariant, and rotational invariance (speaking loosely) seems like it should characterize the spherical measure. But the divergence theorem and the polar coordinates formula, the advantages of the first two definitions, are much less clear from this vantage point.4
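For reference, one common normalization of Hausdorff measure (conventions differ between texts, so this choice is an assumption here) is $$\mathcal{H}^s(E) = \lim_{\delta \to 0^+} \inf\Big\{ \sum_{j} \alpha_s \Big(\frac{\text{diam}(E_j)}{2}\Big)^s : E \subseteq \bigcup_{j} E_j, \ \text{diam}(E_j) < \delta \Big\}, \qquad \alpha_s = \frac{\pi^{s/2}}{\Gamma(s/2 + 1)},$$ with the infimum taken over countable covers. For integer \(k\), \(\alpha_k\) is the volume of the unit ball in \(\mathbb{R}^k\), and this is the constant that makes \(\mathcal{H}^k\) agree with Lebesgue measure on \(\mathbb{R}^k\). Since an orthogonal map \(R\) preserves diameters, it carries covers to covers with the same sums, so \(\mathcal{H}^s(R E) = \mathcal{H}^s(E)\); as \(R\) also maps \(S^{d - 1}\) onto itself, the restriction \(\mathcal{H}^{d - 1} \mid_{S^{d - 1}}\) is rotationally invariant.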
With all these definitions, it is unclear whether they even produce equivalent measures; ideally, we’d like them all to equal each other (perhaps allowing equality up to multiplicative constants), but they come from radically different contexts.
To do this, I adapted and reworked some arguments from introductory geometric measure theory, to show these equivalences with tools that were as elementary as I could manage.
One equivalence, that the “parameterized” definition and the Hausdorff measure definition are precisely the same, follows from two main ideas:
- Knowing the \(k\)-Hausdorff measure of a \(k\)-dimensional Euclidean ball, and how it changes when we apply a linear operator \(T\) (which is essentially linear algebra, and a nice application of the SVD); and
- Understanding how a surface parameterization can be taken as, essentially, a spatially-varying collection of affine maps (which is essentially just an application of the inverse function theorem, displaying how \(C^1\) functions can be nicely approximated locally).
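The linear-algebra input in the first point can be made concrete: for an injective linear map \(T : \mathbb{R}^k \to \mathbb{R}^n\), the factor by which \(T\) scales \(k\)-dimensional measure is \(J_T = \sqrt{\det(T^{\mathsf{T}} T)}\), the product of the singular values of \(T\). The sketch below (Python, standard library only; `gram_jacobian` and `graph_columns` are hypothetical helper names) computes this Gram determinant and checks that, for the graph map \(x' \mapsto (x', \varphi(x'))\), whose derivative has columns \(e_j + \partial_j \varphi \, e_d\), it reproduces the integrand \(\sqrt{1 + |\nabla \varphi|^2}\) of the parameterized definition (and, for \(k = 2\), \(n = 3\), the parallelogram area \(|u \times v|\)).

```python
import math

def gram_jacobian(columns):
    """sqrt(det(T^T T)) for the linear map T whose k columns (vectors in R^n,
    k <= n) are given. For injective T this equals the product of the
    singular values of T, the factor by which T scales k-dimensional measure."""
    k = len(columns)
    # Gram matrix G[i][j] = <c_i, c_j>, a k x k symmetric matrix.
    G = [[float(sum(a * b for a, b in zip(ci, cj))) for cj in columns] for ci in columns]
    det = 1.0
    # Determinant by Gaussian elimination with partial pivoting.
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(G[r][p]))
        if abs(G[pivot][p]) < 1e-15:
            return 0.0  # numerically singular: T is not injective
        if pivot != p:
            G[p], G[pivot] = G[pivot], G[p]
            det = -det
        det *= G[p][p]
        for r in range(p + 1, k):
            factor = G[r][p] / G[p][p]
            for c in range(p, k):
                G[r][c] -= factor * G[p][c]
    # G is positive semidefinite, so det >= 0 up to rounding.
    return math.sqrt(max(det, 0.0))

def graph_columns(grad):
    """Columns of the derivative of the graph map x' -> (x', phi(x')) at a
    point where grad phi = grad: column j is e_j with grad[j] appended."""
    m = len(grad)
    return [[1.0 if i == j else 0.0 for i in range(m)] + [grad[j]] for j in range(m)]

grad = [0.3, -1.2]
print(gram_jacobian(graph_columns(grad)), math.sqrt(1 + 0.3**2 + 1.2**2))
# Parallelogram spanned by u, v in R^3: compare with |u x v| = sqrt(46).
print(gram_jacobian([[1, 2, 0], [0, 1, 3]]), math.sqrt(46))
```

Working with the Gram matrix avoids computing an SVD explicitly, at the cost of squaring the condition number; for the small, well-conditioned maps here that is harmless.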
Moreover, most of the aspects of the theory of Hausdorff measure we use are quite straightforward: beyond the linear algebra, we essentially just need the inequality \(\mathcal{H}^s(f(E)) \leq \text{Lip}(f)^s \mathcal{H}^s(E)\) and the fact that \(k\)-Hausdorff measure on \(\mathbb{R}^k\) recovers \(k\)-dimensional Lebesgue measure.
(This sort of thing is done in much greater generality in Mattila’s book, for instance, but powerful results are unnecessary for now; I found that the specific case of a hypersurface parameterization map \(x' \mapsto (x', \varphi(x'))\) admits a linearized local approximation which is simple and easy to work with, and closeness in \(C^1\) follows quickly from elementary arguments. This simplifies my argument considerably, as does the fact that we’re only working with continuously differentiable functions; the passage to the general Lipschitz case is annoyingly technical.)
This gives a representation formula for the \(\mathcal{H}^{d - 1}\) measure of a smooth parameterized hypersurface, as an integral involving the parameterization maps. Specializing to the case of the sphere, we obtain one of the equivalences we sought. On the one hand, this shows that the Hausdorff measure restricted to the sphere is finite; on the other, it yields the rotational invariance of the parameterized definition, which is tricky to see directly.
To convert between Folland’s Lebesgue-based, sector-based definition and the Hausdorff measure, the key idea is to analyze the symmetries of the Radon-Nikodym derivative of the former with respect to the latter: we first show that it exists, and then, using rotational invariance, that it is a constant function. (Along the way, we also derive a Lebesgue differentiation theorem on the sphere for Hausdorff measure, which is rather neat.)
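Once the derivative is known to be a constant \(c\), evaluating on the whole sphere pins \(c\) down. Writing \(\alpha_d = \lambda^d(B(0, 1))\), Folland’s definition gives \(\sigma(S^{d - 1}) = d \, \alpha_d\), and with the normalization in which \(\mathcal{H}^k\) agrees with Lebesgue measure on \(\mathbb{R}^k\), the standard value \(\mathcal{H}^{d - 1}(S^{d - 1}) = d \, \alpha_d\) (computable, e.g., from the parameterized representation) gives $$\sigma(E) = c \, \mathcal{H}^{d - 1}(E) \ \text{ for Borel } E \subseteq S^{d - 1}, \qquad c = \frac{\sigma(S^{d - 1})}{\mathcal{H}^{d - 1}(S^{d - 1})} = \frac{d \, \alpha_d}{d \, \alpha_d} = 1.$$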
As a consequence, switching between these various equivalent definitions, we’re able to easily show many of the classical analysis results for \(\sigma\); the divergence theorem holds, for instance, and we derive the standard stationary-phase estimates for its Fourier transform, in detail.
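In \(d = 3\) the decay can be seen explicitly, assuming the convention \(\hat\sigma(\xi) = \int_{S^{d - 1}} e^{-2\pi i \, \omega \cdot \xi} \, d\sigma(\omega)\) (other normalizations rescale \(\xi\)). By rotational invariance \(\hat\sigma(\xi)\) depends only on \(\rho = |\xi|\), and by Archimedes’ hat-box theorem the pushforward of \(\sigma\) under \(\omega \mapsto \omega \cdot \xi / |\xi|\) is \(2\pi \, dt\) on \([-1, 1]\), so \(\hat\sigma(\xi) = 2\pi \int_{-1}^{1} e^{-2\pi i \rho t} \, dt = 2 \sin(2\pi\rho)/\rho\), consistent with the general rate \(|\hat\sigma(\xi)| \lesssim_d |\xi|^{-(d - 1)/2}\). The sketch below (Python, standard library only; function names are hypothetical) checks the one-dimensional reduction numerically against the closed form and prints \(\rho \, |\hat\sigma|\), which stays bounded by \(2\).

```python
import math

def sigma_hat_3d(rho, n=2000):
    """Fourier transform of surface measure on S^2 at |xi| = rho, computed from
    the hat-box reduction 2*pi * int_{-1}^{1} exp(-2*pi*i*rho*t) dt.
    The sine part is odd in t and integrates to zero, so only the cosine
    part is integrated, by the composite Simpson rule (n must be even)."""
    h = 2.0 / n
    total = 0.0
    for k in range(n + 1):
        t = -1.0 + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 == 1 else 2)
        total += weight * math.cos(2 * math.pi * rho * t)
    return 2 * math.pi * total * h / 3

def sigma_hat_3d_closed(rho):
    """Closed form 2 sin(2 pi rho) / rho, valid for rho > 0."""
    return 2 * math.sin(2 * math.pi * rho) / rho

for rho in (0.7, 3.3, 25.1):
    approx = sigma_hat_3d(rho)
    # rho * |sigma_hat| stays bounded (by 2): decay like |xi|^{-1}.
    print(rho, approx, sigma_hat_3d_closed(rho), rho * abs(approx))
```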
We also have a general uniqueness result, using ideas similar to those in the Radon-Nikodym argument for the final equivalence, which fully classifies the rotationally invariant finite measures on the sphere (intuitively, they’re just the scalar multiples of the uniform spherical measure, and this is what we can now show).
Hopefully these notes can help those who may feel wary about the many different definitions, as I once did.
Work begun: August 25, 2025. Completed: August 27, 2025.
- Analysis means that I’ll have to pass on the Haar construction, which involves taking a Haar measure over the compact groups of orthogonal (or special orthogonal) matrices, and using that to generate a measure on the sphere, over which these matrix groups act transitively. To me, this is interesting theoretically, but any insights this gives into the analytical properties of the measure are far from obvious. My apologies to the algebraists. ↩︎
- I guess I should also take this moment to apologize for the arrogance implicit in the post’s title. In my defense, I would hope that it’s not too much of an exaggeration. ↩︎
- One can try using the polar coordinates formula on \(\partial_i f\), since that is where \(\sigma\) naturally shows up, but curiously, this does not seem to be a viable approach: the inner integral \(\int_{0}^{1} r^{d - 1} (\partial_i f)(r \omega) \, dr\) does not simplify much at all, and integration by parts only makes it worse. It is quite difficult to derive a divergence-form structure, or to isolate any single direction \(i \in \{1, \dots, d\}\), using only radial derivatives. Likewise, manipulating the expected right-hand side \(r^{d - 1} f(r \omega) \omega_i\) by differentiating in \(r\) gives an expression with far too many terms, most of which have no apparent reason to vanish when integrated over the sphere. ↩︎
- People in geometric measure theory will be familiar with the co-area formula, which computes an integral by decomposing it as an iterated integral involving the lower-dimensional level sets of a Lipschitz map. After establishing this nontrivial result with a decent amount of work, the polar coordinates formula emerges as a (distant) consequence. Thus the full machinery of Hausdorff measure does eventually give us what we want, but the path to getting there is winding and circuitous. We would prefer a more direct approach. ↩︎