The Pólya-Szegő Principle

The write-up is here; motivation is below.

I’d like to thank Daniel Rui for discussing with me his frustration (as it turns out, mutual) regarding the state of exposition of this important inequality in our textbooks; a quick and clean proof is not easy to extract (or even to find coverage of) from many, many print sources.


In extremal problems, the presence of symmetry is always an interesting feature. For instance, to establish the existence of extremizers to the \(W^{1, p}\) Sobolev inequality (which, based on the structure of the terms, one might expect to be radial), one can argue in the vein of Talenti and apply the spherical decreasing rearrangement.

(Of course, there’s plenty of work that’s necessary beyond that; demonstrating a minimizer in the one-dimensional variational problem also requires significant toil, and Talenti’s paper is very difficult even after the reduction to the radial case. But symmetrization certainly gives a starting point.)

It is known (as a result of the work of Pólya and Szegő for \(p = 2\); I am unsure as to who first proved it in the general case) that spherical decreasing rearrangement induces a contraction on the \(p\)-energy of Sobolev functions:

Theorem: Let \(1 \leq p < \infty\), and suppose \(u \in W^{1, p}(\mathbb{R}^d; \mathbb{R})\). Then \(u^* \in L^p(\mathbb{R}^d)\) actually belongs to \(W^{1, p}(\mathbb{R}^d; \mathbb{R})\) also, and $$\|\nabla (u^*)\|_{L^p(\mathbb{R}^d)} \leq \|\nabla u\|_{L^p(\mathbb{R}^d)},$$ where the vector \(L^p\) norms are specifically evaluated with the Euclidean norm.


This is a remarkable theorem. It confirms (and extends) our intuition, in a very clean and elegant way. It is also quite useful for applications; one need only page through the chapters of Lieb & Loss’s Analysis text to witness this fact.
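
As a quick sanity check (not a proof, and not part of the write-up), the inequality can be watched in action on a one-dimensional grid. The sketch below is a discrete stand-in under stated assumptions: the helper names `rearrange` and `grad_norm`, the grid, and the two-bump test function are all hypothetical choices, and the rearrangement simply sorts the grid values and lays them out from the centre outwards.

```python
import numpy as np

def rearrange(u):
    """Discrete symmetric decreasing rearrangement of a nonnegative grid function."""
    n = len(u)
    # Grid positions ordered by distance from the centre index (ties: left first).
    order = np.argsort(np.abs(np.arange(n) - n // 2), kind="stable")
    out = np.empty(n, dtype=float)
    out[order] = np.sort(u)[::-1]   # largest value at the centre, then outwards
    return out

def grad_norm(v, h, p):
    """Discrete L^p norm of the difference quotient, with v extended by zero."""
    dv = np.diff(np.pad(v, 1)) / h
    return (np.sum(np.abs(dv) ** p) * h) ** (1.0 / p)

# Hypothetical test function: two bumps, neither centred at the origin.
x = np.linspace(-4.0, 4.0, 401)
h = x[1] - x[0]
u = np.exp(-4 * (x - 1.5) ** 2) + 0.6 * np.exp(-4 * (x + 1.0) ** 2)
u_star = rearrange(u)

for p in (1, 2):
    # Second printed number should not exceed the first.
    print(p, grad_norm(u, h, p), grad_norm(u_star, h, p))
```

The rearranged function has the same values as the original (so all \(L^p\) norms agree), but merging the two bumps into a single centred one removes the valley between them, and that is where the gradient norm is saved.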

However, this simple-to-state fact requires a nontrivial amount of work to prove.

One way to show this is via the co-area formula; for a Lipschitz map from a higher-dimensional space to a lower-dimensional one, it states that one can evaluate the Lebesgue integral of the modulus of its gradient, over \(\mathbb{R}^d\), by stratifying space into the level sets of the map, integrating over each with the appropriate-dimension Hausdorff measure, and then putting the pieces back together with an appropriate Jacobian-type factor. In his paper, Talenti uses this approach to show Pólya-Szegő.
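
For orientation, here is the scalar case of the formula in symbols; this is a sketch of the standard statement, with \(f\), \(g\), and \(t\) being labels chosen here rather than Talenti's notation. For Lipschitz \(f : \mathbb{R}^d \to \mathbb{R}\) and Borel \(g \geq 0\), $$\int_{\mathbb{R}^d} g \, |\nabla f| \, d\lambda^d = \int_{- \infty}^{\infty} \left( \int_{\{f = t\}} g \, d\mathcal{H}^{d - 1} \right) dt,$$ so that taking \(g \equiv 1\) expresses the total variation of \(f\) as the integral of the perimeters \(\mathcal{H}^{d - 1}(\{f = t\})\) of its level sets.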

While I’m perfectly willing to accept the use of the co-area formula if truly necessary, in general, it’s quite annoying to use such heavy weaponry in the course of an argument. This is a rather-massive piece of machinery to cart in, and it requires a significant amount of geometric measure theory just to show, even in its simplest forms; it also has a number of technical hypotheses that must be verified. These considerations might not be dispositive, but I would still prefer to avoid the use of co-area whenever possible; it seems like a massive item to take on faith for a student reader.

I should also mention the proof of Lieb & Loss, which only works for \(p = 2\); it uses the Riesz rearrangement inequality and the special structure of the energy functional on \(W^{1, 2}\) (using the Fourier transform, Plancherel, to write it as a limit of a certain interaction functional involving the heat kernel).

While this is an interesting idea, the approach also does not generalize (as far as I know) to the \(p \neq 2\) case, short of some discovery of an exact formula for \(\|\nabla u\|_{L^p(\mathbb{R}^d)}^p\) with convolution-type terms (and one can imagine just by scaling that for general, noninteger \(p\), the existence of such a device is very much fantastic).

Thus, if we seek an accessible, full proof, a new way forward is needed. For this, the lecture notes of Almut Burchard were incredibly helpful and clear.

As it turns out, a simpler process, known as two-point rearrangement, can be used to approximate the spherical decreasing rearrangement, and it is easily verified to satisfy an identity of Pólya-Szegő type, relating the \(L^p\) norm of the gradient of a so-rearranged function to the original \(p\)-energy.

The two-point rearrangement considers hyperplanes away from the origin, and acts by reflection; the higher values on the half-space not containing the origin are reflected across, to the other half-space, while the lower values on the half-space which does contain the origin are reflected away. In this sense, one is essentially redistributing the mass of the function to become successively more-concentrated near the origin.
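
The reflection rule just described can be sketched on a one-dimensional grid, where the "hyperplane" is a single pivot point. Everything named here (`polarize`, `energy`, the pivot index `k`, the test function) is a hypothetical illustration, assuming \(u \geq 0\) is extended by zero outside the grid and that the pivot lies to the right of the origin. On a grid one should only expect the energy not to increase; the exact identity belongs to the continuum statement.

```python
import numpy as np

def polarize(u, k):
    """Two-point rearrangement of a grid function about the pivot index k.

    Assumes u >= 0, extended by zero off the grid, with the origin at index
    len(u) // 2 lying strictly to the left of the pivot.
    """
    u = np.asarray(u, dtype=float)
    n = len(u)
    assert n // 2 < k < n, "pivot must lie strictly to the right of the origin"
    out = u.copy()
    i = np.arange(2 * k - (n - 1), k)   # origin-side points whose mirror is on the grid
    j = 2 * k - i                       # their reflections across the pivot
    out[i] = np.maximum(u[i], u[j])     # the larger value goes to the origin side
    out[j] = np.minimum(u[i], u[j])     # the smaller value goes to the far side
    return out

def energy(v, h, p):
    """Discrete p-energy of v, extended by zero."""
    dv = np.diff(np.pad(v, 1)) / h
    return np.sum(np.abs(dv) ** p) * h

x = np.linspace(-4.0, 4.0, 401)
h = x[1] - x[0]
u = np.exp(-4 * (x - 1.5) ** 2) + 0.6 * np.exp(-4 * (x + 1.0) ** 2)
k = 225                                  # pivot at x[225], approximately 0.5
v = polarize(u, k)

print(energy(u, h, 2), energy(v, h, 2))                       # energy does not increase
print(np.sum(u * np.abs(x)) * h, np.sum(v * np.abs(x)) * h)   # mass moves inwards
```

The values of `v` are a permutation of those of `u`, and the first moment \(\int |x| \, u\) can only go down, which is one way to make "more concentrated near the origin" quantitative.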

I focused on the sections of the notes giving the bare minimum of results needed to show the Pólya-Szegő principle, including working some of the exercises, and combined them in a single document. I’ve tried to create a self-contained roadmap from start to finish; the progression of lemmas should lead naturally to the result.

I’ve also striven to make the structure as tight and spare as possible, essentially demanding the lightest load possible, since this is meant for the traveller aiming to arrive in a hurry.

(I also wish to emphasize, strongly, that nothing here is original in the slightest on my part; all I have done is add some justifications, mostly for why the two-point rearrangement preserves Sobolev functions, and ordered some things.)


Here, I’ll describe some arguments that draw on material from the penultimate post (on the Sobolev chain rule), and thus were not included in the original write-up (to ensure it remains self-contained). However, these provide a nontrivial extension of the result, and might be interesting in their own right, so I’ve written them up in full detail below.

Now, the statement of the result is given solely for functions \(u \in W^{1, p}(\mathbb{R}^d)\). However, we can relax this restriction slightly, as follows: suppose \(u : \mathbb{R}^d \to \mathbb{R}\) vanishes measure-theoretically at infinity, and that it induces a distribution with \(\nabla u \in L^p(\mathbb{R}^d)\). Then \(u^* \in L^1_{\text{loc}}(\mathbb{R}^d)\), and \(\nabla u^* \in L^p(\mathbb{R}^d)\); moreover, the Pólya-Szegő principle holds for \(u\).

The vanishing condition ensures that \(u^* : \mathbb{R}^d \to [0, \infty]\) is defined and not identically \(\infty\). Moreover, to verify the Pólya-Szegő principle, it suffices to show it for \(|u|\) here, since given our set-up (which certainly ensures \(u \in W^{1, 1}_{\text{loc}}(\mathbb{R}^d)\)), we have \(|\nabla u| = |\nabla |u||\) pointwise a.e. by our results from the previous blog post.

So we may assume that \(u \geq 0\), and proceed to define for \(0 < \epsilon < 1\), \(L > 1\) $$u_{\epsilon, L} = \min(L, (u - \epsilon)_{+}),$$ which converges to \(u\) as we simultaneously send \(\epsilon \downarrow 0\) and \(L \uparrow \infty\), and in fact increases whenever we increase \(L\) or reduce \(\epsilon\).

By the results from last post, we see that $$\nabla u_{\epsilon, L} = \chi_{\{\epsilon < u < L + \epsilon\}} \nabla u,$$ and also, \(u_{\epsilon, L}\) is only nonzero on \(\{u > \epsilon\}\), meaning it is supported on a finite-measure set; and additionally always remains bounded, by the cap at \(L\).
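
The two-parameter truncation is easy to experiment with numerically. The sketch below reads the cap at \(L\) as a minimum (the form consistent with the boundedness just noted); the helper name `truncate` and the test function are hypothetical.

```python
import numpy as np

def truncate(u, eps, L):
    """Lower u by eps, keep the positive part, then cap the result at height L."""
    return np.minimum(L, np.maximum(u - eps, 0.0))

x = np.linspace(-10.0, 10.0, 2001)
u = 1.0 / (0.05 + x ** 2)             # hypothetical: tall near 0, slowly decaying

a = truncate(u, eps=0.10, L=5.0)
b = truncate(u, eps=0.05, L=10.0)     # smaller eps, larger L

print(a.max(), b.max())               # bounded by the caps
print(np.count_nonzero(a), np.count_nonzero(b))   # supports sit inside {u > eps}
```

Lowering \(\epsilon\) or raising \(L\) only ever increases the truncation, which is the monotonicity used when passing to the limit.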

Then we have the inequality $$\|\nabla (u_{\epsilon, L}^*)\|_{L^p(\mathbb{R}^d)} \leq \|\nabla u_{\epsilon, L}\|_{L^p(\mathbb{R}^d)} \leq \|\nabla u\|_{L^p(\mathbb{R}^d)},$$ uniformly in \(\epsilon\) and \(L\).

We would like to do the obvious thing and send \(\epsilon \downarrow 0\) and \(L \uparrow \infty\). The issue, of course, is identifying the limit object: given our hypotheses, how can we be sure \(u^*\) is even locally-integrable?

Measure-theoretic “vanishing at infinity” is a very weak condition (for instance, \(x \mapsto |x|^{- d}\) satisfies it), and it’s conceivable that the mass of \(u\) is distributed in such a way that when it all gets piled up at the origin, it concentrates (one imagines) into a massive spike. (Of course, radial decrease ensures \(u^* \in L^{\infty}_{\text{loc}}(\mathbb{R}^d \setminus \{0\})\), but we are concerned with a potential singularity at \(x = 0\) itself.) Why this is prevented, solely by the addition of gradient control, is not obvious.
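
To make the worry concrete, one can work out the example \(x \mapsto |x|^{- d}\) by hand (a routine computation, included only for illustration). For every \(t > 0\), $$\lambda^d(\{x : |x|^{- d} > t\}) = \lambda^d(B(0, t^{- 1 / d})) = \lambda^d(B(0, 1)) \, t^{- 1} < \infty,$$ so this function does vanish measure-theoretically at infinity. It is already radially decreasing, so it coincides with its own rearrangement, and yet in polar coordinates $$\int_{B(0, 1)} |x|^{- d} \, d\lambda^d(x) = \mathcal{H}^{d - 1}(\mathbb{S}^{d - 1}) \int_0^1 r^{- d} \, r^{d - 1} \, dr = \infty.$$ So the vanishing condition alone cannot give local integrability of \(u^*\); the gradient hypothesis has to do real work.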

To exclude this possibility, we will have to be careful.

We see that $$u_{\epsilon, L} \uparrow (u \ – \, \epsilon)_{+} = u_{\epsilon}$$ pointwise everywhere; it follows from the monotonicity property of the symmetrization operation \(^*\), noted at the end of the proof of Lemma IV in the paper, that \(u_{\epsilon, L}^* \uparrow u_{\epsilon}^*\) too.

We note that each $$\|\nabla u_{\epsilon, L}\|_{L^1(\mathbb{R}^d)} \leq \|\chi_{\{u > \epsilon\}} \nabla u\|_{L^1(\mathbb{R}^d)} \leq \lambda^d(\{u > \epsilon\})^{1 / p'} \|\nabla u\|_{L^p(\mathbb{R}^d)},$$ since \(\{u > \epsilon\}\) always has finite measure by hypothesis. It follows from this \(L^1\) bound, plus Pólya-Szegő for \(p = 1\), $$\|\nabla (u_{\epsilon, L}^*)\|_{L^1(\mathbb{R}^d)} \leq \|\nabla u_{\epsilon, L}\|_{L^1(\mathbb{R}^d)},$$ that we actually have $$\|u_{\epsilon, L}^*\|_{L^{1^*}(\mathbb{R}^d)} \lesssim_{d, \epsilon, u} \|\nabla u\|_{L^p(\mathbb{R}^d)},$$ uniformly in \(L\), from Sobolev embedding. (Here, we know that \(u_{\epsilon, L} \in L^1(\mathbb{R}^d) \cap L^{\infty}(\mathbb{R}^d)\) by its boundedness and its restriction to living on a set of finite measure, so that \(u_{\epsilon, L} \in W^{1, 1}(\mathbb{R}^d)\), and thus Pólya-Szegő validly applies.) Taking \(L \uparrow \infty\), this is enough to conclude that \(u_{\epsilon}^* \in L^{1^*}(\mathbb{R}^d)\) for every \(\epsilon > 0\) (albeit in an \(\epsilon\)-dependent manner).
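
Written out as a single chain, the estimate reads as follows; this is only a restatement of the steps above, assuming \(d \geq 2\) so that \(1^* = d / (d - 1)\) is finite, and with \(C_d\) standing for the constant in the \(p = 1\) Sobolev inequality: $$\|u_{\epsilon, L}^*\|_{L^{1^*}(\mathbb{R}^d)} \leq C_d \|\nabla (u_{\epsilon, L}^*)\|_{L^1(\mathbb{R}^d)} \leq C_d \|\nabla u_{\epsilon, L}\|_{L^1(\mathbb{R}^d)} \leq C_d \, \lambda^d(\{u > \epsilon\})^{1 / p'} \|\nabla u\|_{L^p(\mathbb{R}^d)}.$$ The first step is Sobolev embedding, the second is Pólya-Szegő for \(p = 1\), and the third is Hölder on the finite-measure set \(\{u > \epsilon\}\); the right-hand side does not involve \(L\).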

We used the \(L\)-independent bounds to extract a component in \(L^{1^*}\) for the high components of \(u^*\). For the low components, we will do something different, as sending \(\epsilon \downarrow 0\) would introduce an uncontrollable amount of mass that could potentially coalesce into a singularity at the origin. Instead, we argue as follows:

Define a convex function \(\Phi : \mathbb{R} \to [0, \infty)\) by $$\Phi(x) = \max(- x, 0, x - \epsilon) = \begin{cases} x - \epsilon & x > \epsilon \\ 0 & 0 \leq x \leq \epsilon \\ - x & x < 0. \end{cases}$$ Applying Lemma IV, we have $$\int_{\mathbb{R}^d} \Phi(u^* - u_{\epsilon}^*) \, d\lambda^d \leq \int_{\mathbb{R}^d} \Phi(u - u_{\epsilon}) \, d\lambda^d;$$ recall that Lemma IV was shown for all nonnegative functions \(f\), \(g\) vanishing at infinity, and did not require any integrability hypotheses on the functions whatsoever.

In particular, we can use this to control the left-hand difference, just by knowing that the right-hand difference is finite.

Since $$u_{\epsilon} \leq u \leq u_{\epsilon} + \epsilon,$$ the integrand \(\Phi(u - u_{\epsilon})\) vanishes identically, so the right-hand integral is zero. By positivity, the integral on the left must then vanish as well, and by the structure of \(\Phi\) this means that $$u_{\epsilon}^* \leq u^* \leq u_{\epsilon}^* + \epsilon$$ \(\lambda^d\)-a.e.; if either inequality were violated on a set of positive measure, we would get a positive contribution.

So $$u^* = u_{\epsilon}^* + \epsilon g_{\epsilon}, \qquad u_{\epsilon}^* \in L^{1^*}(\mathbb{R}^d), \qquad 0 \leq g_{\epsilon} \leq 1.$$ From this, we get that \(u^*\) is indeed in \(L^1_{\text{loc}}(\mathbb{R}^d)\).
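
The sandwich \(u_{\epsilon}^* \leq u^* \leq u_{\epsilon}^* + \epsilon\) can be checked on a grid, where it holds for a simpler reason (sorting commutes with the nondecreasing map \(t \mapsto (t - \epsilon)_{+}\)). This is a sketch with hypothetical helper names `rearrange` and `Phi`, using a sort-based discrete rearrangement as a stand-in for \(^*\).

```python
import numpy as np

def rearrange(u):
    """Discrete symmetric decreasing rearrangement (sort, then fill from the centre)."""
    n = len(u)
    order = np.argsort(np.abs(np.arange(n) - n // 2), kind="stable")
    out = np.empty(n, dtype=float)
    out[order] = np.sort(u)[::-1]
    return out

def Phi(t, eps):
    """The convex function max(-t, 0, t - eps) from the text."""
    return np.maximum(np.maximum(-t, 0.0), t - eps)

eps = 0.2
x = np.linspace(-4.0, 4.0, 401)
u = np.exp(-4 * (x - 1.5) ** 2) + 0.6 * np.exp(-4 * (x + 1.0) ** 2)
u_eps = np.maximum(u - eps, 0.0)            # the lowered function (u - eps)_+

u_star, u_eps_star = rearrange(u), rearrange(u_eps)

print(np.max(Phi(u - u_eps, eps)))          # zero up to rounding
print(np.max(Phi(u_star - u_eps_star, eps)))  # zero up to rounding
```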

With this, we can go forward: we have $$- \int_{\mathbb{R}^d} (\nabla (u_{\epsilon, L}^*)) \varphi \, d\lambda^d = \int_{\mathbb{R}^d} u_{\epsilon, L}^* \nabla \varphi \, d\lambda^d,$$ for every \(\varphi \in C^{\infty}_c(\mathbb{R}^d)\) and all \(L > 1\); moreover, we saw from the polarization proof of Pólya-Szegő that the modulus of equi-integrability of the weak gradient is preserved under finitely-many polarizations.

So in the \(p = 1\) case, we have for every \(\eta > 0\) that $$\int_E |\nabla (u_{\epsilon, L}^{\sigma})| \, d\lambda^d \leq \sup_{\lambda^d(F) \leq \eta} \int_F |\nabla (u_{\epsilon, L})| \, d\lambda^d \leq \sup_{\lambda^d(F) \leq \eta} \int_F |\nabla u| \, d\lambda^d = \omega(\eta),$$ for all Borel sets \(E\) with \(\lambda^d(E) \leq \eta\). Of course, the modulus \(\omega : [0, \infty) \to [0, \infty)\) is nondecreasing; it is also continuous, essentially by the absolute continuity of the integral (plus the non-atomicity of the Lebesgue measure).
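
For a function sampled on a grid, the modulus is simple to compute: the supremum over sets of measure at most \(\eta\) is attained by collecting the cells where the integrand is largest. The sketch below assumes the competing sets are restricted to unions of grid cells, and the name `modulus` and the sample gradient `g` are hypothetical.

```python
import numpy as np

def modulus(g, h, eta):
    """Sup of the integral of |g| over unions of grid cells of total measure <= eta."""
    k = min(int(np.floor(eta / h + 1e-9)), len(g))   # number of cells we may use
    top = np.sort(np.abs(g))[::-1][:k]               # the k largest values of |g|
    return float(np.sum(top) * h)

x = np.linspace(-4.0, 4.0, 401)
h = x[1] - x[0]
u = np.exp(-4 * (x - 1.5) ** 2) + 0.6 * np.exp(-4 * (x + 1.0) ** 2)
g = np.gradient(u, h)                                # sample gradient on the grid

etas = np.linspace(0.0, 8.0, 41)
values = [modulus(g, h, eta) for eta in etas]
print(values[0], values[10], values[-1])             # grows with eta, starting at 0
```

Since only the sorted values of \(|g|\) enter, the modulus depends on \(|g|\) only through its distribution; rearranging the values of `g` in any order leaves it unchanged.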

In the limit of polarizations, we obtain an element \(\vec{\mu_{\epsilon}} = \nabla (u_{\epsilon}^*) \in \mathcal{M}(\mathbb{R}^d)\) by weak\(^*\) compactness, so that for all compact sets \(K\) and all open sets \(U \supseteq K\), \(U \Subset \mathbb{R}^d\), one has $$\left| \int_{\mathbb{R}^d} \vec{f} \cdot \, d\vec{\mu_{\epsilon}} \right| \leq \omega(\lambda^d(U)),$$ first for all continuous \(\vec{f} : \mathbb{R}^d \to \mathbb{R}^d\) with \(|\vec{f}| \leq 1\) everywhere and with \(\text{supp}(\vec{f}) \subseteq U\), and then for all Borel \(\vec{f} : \mathbb{R}^d \to \mathbb{R}^d\) with \(|\vec{f}| \leq 1\) everywhere and vanishing outside a compact subset of \(U\).

It follows from manipulating Radon-Nikodym derivatives (or the Hahn decomposition) in the typical way that each component of \(\vec{\mu_{\epsilon}}\) is \(\lambda^d\)-absolutely continuous, and thus represented by a component of some vector \(\vec{g_{\epsilon}}\) of \(L^1(\mathbb{R}^d)\)-functions. This shows that the weak gradient is given by an \(L^1(\mathbb{R}^d; \mathbb{R}^d)\) function.

And so, we see that $$\int_K |\vec{g_{\epsilon}}| \, d\lambda^d \leq \omega(\eta)$$ whenever \(K \subseteq U\) is a compact set inside an open set of \(\lambda^d\)-measure \(\leq \eta\), and by regularity, we can extend this to open sets \(U\); taking a supremum over open sets, $$\sup_{\lambda^d(U) \leq \eta} \int_U |\nabla (u_{\epsilon}^*)| \, d\lambda^d \leq \omega(\eta).$$ By approximation, given any \(\eta > 0\), the supremum over open sets \(\lambda^d(U) \leq \eta\) gives the same value as the supremum over all sets \(\lambda^d(E) \leq \eta\), by absolute continuity and outer regularity. So this supremum really recovers the modulus of \(|\nabla (u_{\epsilon}^*)|\), evaluated at \(\eta\).

So we have a diminished modulus \(\omega_{\epsilon} \leq \omega\) for every \(\epsilon > 0\), and we repeat this entire argument once more with the \(\nabla (u_{\epsilon}^*) \in L^1\) to get a final weak gradient, for the pointwise (and monotone) limit of the \(u_{\epsilon}^*\) in \(L^1_{\text{loc}}(\mathbb{R}^d)\): \(u^*\).

This same reasoning ensures that the distributional derivative \(\nabla(u^*) \in \mathcal{M}(\mathbb{R}^d; \mathbb{R}^d)\) actually belongs to \(L^1\); as a consequence, we conclude that $$\int_{\mathbb{R}^d} |\nabla (u^*)| \, d\lambda^d \leq \int_{\mathbb{R}^d} |\nabla u| \, d\lambda^d,$$ and even more specifically, that \(\omega_* \leq \omega\).

After first taking the limit \(L \to \infty\), identifying that weak\(^*\) limit measure with a Lebesgue density, then taking the limit \(\epsilon \downarrow 0\), identifying that with a density too, we get $$- \int_{\mathbb{R}^d} \nabla (u^*) \varphi \, d\lambda^d = \int_{\mathbb{R}^d} u^* \nabla \varphi \, d\lambda^d$$ for every \(\varphi \in C^{\infty}_c(\mathbb{R}^d)\).

For \(1 < p < \infty\), we can, of course, use weak/\(^*\) convergence to conclude that \(\nabla (u^*)\) exists in \(L^p(\mathbb{R}^d; \mathbb{R}^d)\), and moreover, by the same manipulations as above, one can additionally show that the modulus of integrability of \(|\nabla (u^*)|^p\) is no worse than that of our original \(|\nabla u|^p\).

Finally, when \(p = \infty\), one need not worry about the local integrability of \(u^*\), since this will naturally be bounded; the spherical decreasing rearrangement will preserve this bound, and we also need not take the upper-\(L\) truncations, as \(u\) is already bounded.

We see that if \(u \in W^{1, \infty}(\mathbb{R}^d)\) and \(u\) vanishes measure-theoretically at infinity, then it must actually vanish at infinity, in the sense that $$\lim_{|x| \to \infty} u(x) = 0.$$ This follows from the uniform continuity that results from belonging to the Lipschitz class. From this, we note the truncations \(u_{\epsilon}\) will actually be in \(W^{1, \infty}(\mathbb{R}^d) \cap C_c(\mathbb{R}^d)\), and so each \(u_{\epsilon}\) also belongs to \(W^{1, p}(\mathbb{R}^d)\) for all \(1 \leq p < \infty\); hence we can apply Lemma X to the \(u_{\epsilon}\) and then take the limit \(p \to \infty\).

This furnishes a uniform a priori bound on the distributional gradients \(\nabla(u_{\epsilon}^*)\), and so by sending \(\epsilon \downarrow 0\), and using weak\(^*\) convergence this time, we get that $$\|\nabla (u^*)\|_{L^{\infty}(\mathbb{R}^d)} \leq \|\nabla u\|_{L^{\infty}(\mathbb{R}^d)}.$$
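
In one dimension, the \(p = \infty\) statement says that rearranging cannot increase the Lipschitz constant, and this too can be watched on a grid. The following is a sketch only: `rearrange` and `lipschitz` are hypothetical helpers, the rearrangement is the sort-based discrete stand-in, and the test function (two tents with slopes 1 and 2) is an arbitrary choice.

```python
import numpy as np

def rearrange(u):
    """Discrete symmetric decreasing rearrangement (sort, then fill from the centre)."""
    n = len(u)
    order = np.argsort(np.abs(np.arange(n) - n // 2), kind="stable")
    out = np.empty(n, dtype=float)
    out[order] = np.sort(u)[::-1]
    return out

def lipschitz(v, h):
    """Largest difference quotient of v, extended by zero off the grid."""
    return np.max(np.abs(np.diff(np.pad(v, 1)))) / h

x = np.linspace(-4.0, 4.0, 401)
h = x[1] - x[0]
# Two compactly supported tents: slope 1 around x = 1.5, slope 2 around x = -1.
u = np.maximum(0.0, 1.0 - np.abs(x - 1.5)) + np.maximum(0.0, 0.6 - 2.0 * np.abs(x + 1.0))
u_star = rearrange(u)

print(lipschitz(u, h), lipschitz(u_star, h))   # second value is no larger
```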

From all of this, we can finally state the result we have obtained now:

Theorem: Let \(1 \leq p \leq \infty\), and suppose \(u : \mathbb{R}^d \to \mathbb{R}\) is a locally-integrable function which vanishes measure-theoretically at infinity. Further suppose the distributional derivative \(\nabla u\) of \(u\) belongs to \(L^p(\mathbb{R}^d; \mathbb{R}^d)\).

Then the spherical decreasing rearrangement \(u^* : \mathbb{R}^d \to [0, \infty]\) is an \(L^1_{\text{loc}}(\mathbb{R}^d)\) function, and its distributional derivative \(\nabla (u^*)\) is in \(L^p(\mathbb{R}^d; \mathbb{R}^d)\); moreover, $$\|\nabla (u^*)\|_{L^p(\mathbb{R}^d)} \leq \|\nabla u\|_{L^p(\mathbb{R}^d)},$$ and when \(1 \leq p < \infty\), we have the stronger result that the \(\lambda^d\)-modulus of integrability of \(|\nabla(u^*)|^p \in L^1(\mathbb{R}^d)\) is always dominated by that of the original \(|\nabla u|^p\).


This is the most general form of the classical Pólya-Szegő principle of which I know (not counting the more-modern variants with generalized Orlicz norms and energy functionals); it requires no global integrability properties on the function \(u\) itself, only on the gradient.

Because of this, performing the steps of the truncation argument requires some care; it seems like many authors (including the argument in Burchard’s notes, as well as several other sources) claim that this simply follows from performing a truncation controlled by a single parameter, \(\epsilon > 0\), lowering \(u\) by \(\epsilon\), considering anything positive which remains, and cutting that further at height \(\epsilon^{- 1}\). The claim seems to be that after this truncation step, everything is legitimate and we need only worry about the limiting behavior of the gradient (the original function is forgotten).

(Unfortunately, even Lieb & Loss, in their attempted proof of the vanishing-at-infinity extension (Lemma 7.17, Part 1, Reduction To \(L^2\)), neglect this step, which is quite frustrating; the argument above for local integrability isn’t exactly obvious, and it would be nice if this hadn’t been skated over.)

As we saw, one must be careful with showing that \(u^*\) remains locally integrable when taking limits; the entire issue is that while the weak gradients converge to something, that does not imply that the original functions are converging to anything at all, even distributionally (or, more precisely, this could be deduced from Sobolev embedding if \(1 \leq p < d\), but this global-mass control fails when \(p \geq d\) and no substitute seems available here).

So, the key was to control the high-part of \(u\) in a suitably low-indexed Sobolev space; for this, we had to restrict the weak gradient \(\nabla u\) to a finite-measure set, which meant keeping \(\{u > \epsilon\}\) as a fixed base, without changing the value of \(\epsilon\). But keeping \(\epsilon > 0\) fixed would be incompatible with lifting the upper truncation of \(\epsilon^{- 1}\) to infinity; so we require two distinct parameters to control the process. This is one of those extension steps that’s trickier than it may appear.
