Interpreting the Christ-Weinstein Fractional Chain Rule

This post is really a kind of note-to-self (though it should also make sense to anyone else who has taken UCLA’s MATH 247B course with Professor Killip). I’m hoping to resolve some technicalities in the statement of a result from that winter, without adding a whole new section to my lecture notes and potentially ruining the layout. (Small changes upstream in a large PDF can have unpredictable consequences, as experience has shown.) I was also motivated to write out the details clearly after seeing an old MathOverflow question slightly related to this.

Consequently, this post is rather specialized, and, in contrast to my usual style, many statements will not have citations; when this happens, I am referring to other parts of my own notes. This is also a very incomplete and rough sketch; most of what follows consists of trivial observations and arguments, pushed through to their natural conclusions. In particular, some of the parameter ranges we obtain are likely very far from sharp, and others we ignore altogether.

The Christ-Weinstein fractional chain rule can be formally stated as follows. Let \(F : \mathbb{C} \to \mathbb{C}\) satisfy, for some function \(G \geq 0\) on \(\mathbb{C}\), the bound $$|F(u) - F(v)| \leq |u - v| [G(u) + G(v)]$$ for all \(u, v \in \mathbb{C}\). Then for any \(u \in \mathcal{S}(\mathbb{R}^d)\) and \(0 < s < 1\), we have $$\||\nabla|^s (F \circ u)\|_{L^p(\mathbb{R}^d)} \lesssim \||\nabla|^s u\|_{L^{p_1}(\mathbb{R}^d)} \|G(u)\|_{L^{p_2}(\mathbb{R}^d)},$$ where \(1 < p, p_1 < \infty\) satisfy \(\frac{1}{p} = \frac{1}{p_1} + \frac{1}{p_2}\). (For brevity, we will not notate any dependence on \(p\), \(p_i\), \(s\) in the occurrences of \(\lesssim\) in this post.)

This is, as we must stress, a formal statement. The precise sense in which these objects exist was not discussed in class.

To start, \(|\nabla|^s\) is defined initially as a Fourier multiplier operator on Schwartz functions, and as an operator on \(L^p(\mathbb{R}^d)\) it is unbounded for every \(s \neq 0\). It is not at all clear why \(F \circ u\), which need not be Schwartz, should admit a fractional derivative under this definition.
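As a toy illustration of what “defined as a Fourier multiplier” means, here is a minimal Python sketch. It assumes a periodic discretization (a uniform grid on \([0, L)\) and the discrete Fourier transform) in place of \(\mathbb{R}^d\), and uses the same \(e^{2 \pi i \xi \cdot x}\) convention as the computations below, so that \(\xi\) is measured in cycles per unit length. The function name `frac_derivative_periodic` is hypothetical. It only shows the symbol \(|\xi|^s\) acting mode by mode; it says nothing about the domain issues on \(L^p(\mathbb{R}^d)\) discussed here.

```python
import numpy as np


def frac_derivative_periodic(f_vals, s, length=1.0):
    """Apply the Fourier multiplier |xi|^s to samples of a periodic function.

    Toy sketch: f_vals holds samples on the uniform grid k * length / n,
    k = 0, ..., n - 1. Frequencies follow the e^{2 pi i xi x} convention,
    so xi is in cycles per unit length.
    """
    n = len(f_vals)
    xi = np.fft.fftfreq(n, d=length / n)  # frequencies of the DFT modes
    symbol = np.abs(xi) ** s              # |xi|^s (equals 0 at xi = 0 when s > 0)
    return np.fft.ifft(symbol * np.fft.fft(f_vals))


# Usage: the single mode cos(2 pi * 3 x) on [0, 1) is multiplied by 3**s.
x = np.arange(64) / 64
f = np.cos(2 * np.pi * 3 * x)
out = frac_derivative_periodic(f, 0.5)
print(np.allclose(out.real, np.sqrt(3) * f))  # prints True
```

Constants are annihilated for \(s > 0\), reflecting that the symbol vanishes at the frequency origin.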

Indeed, in the proof we never compute the fractional derivative and then take an \(L^p\) norm; instead, we use the equivalence $$\||\nabla|^s f\|_{L^p(\mathbb{R}^d)} \simeq \left\| \bigg( \sum_N N^{2 s} |P_N f|^2 \bigg)^{1 / 2} \right\|_{L^p(\mathbb{R}^d)}$$ for \(f \in \mathcal{S}(\mathbb{R}^d)\), \(1 < p < \infty\). This two-sided estimate shows that the dyadic expression is comparable to the norm on the proper subclass \(\mathcal{S}(\mathbb{R}^d)\), but it is not clear that finiteness of the dyadic expression characterizes membership in the (still unspecified) domain of the fractional derivative.

We will require some reasonable conditions on \(F\) to start. First, we assume \(F(0) = 0\), so that $$|F(u)| \leq |u| [G(u) + G(0)] = |u| G(u) + G(0) |u|.$$ The second term has superpolynomial decay, while the first is the product of the modulus of a Schwartz function and a function in \(L^{p_2}(\mathbb{R}^d)\) (by hypothesis); hence \(F \circ u \in L^1(\mathbb{R}^d) \cap L^{p_2}(\mathbb{R}^d)\). (For the general case, we could simply read the statement as being about \(F(u) - F(0)\) instead of \(F(u)\).)
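To spell out the Hölder step: since \(u\) is Schwartz, \(|u| \in L^b(\mathbb{R}^d)\) for every \(1 \leq b \leq \infty\), so $$\||u| G(u)\|_{L^a(\mathbb{R}^d)} \leq \|u\|_{L^b(\mathbb{R}^d)} \|G(u)\|_{L^{p_2}(\mathbb{R}^d)}, \qquad \frac{1}{a} = \frac{1}{b} + \frac{1}{p_2}, \quad p_2' \leq b \leq \infty,$$ and as \(b\) ranges over \([p_2', \infty]\), the exponent \(a\) ranges over \([1, p_2]\). The term \(G(0) |u|\) lies in every \(L^a\), \(1 \leq a \leq \infty\).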


We now pause to discuss the situation from an abstract point of view. Again, as emphasized above, this is surely not close to anything sharp.

There are two “obvious” possibilities for defining the domain of the fractional derivative \(|\nabla|^s\), \(0 < s < 1\), as an unbounded operator from \(L^q\) to \(L^p\).

Our hypotheses are as follows: we assume indices \(1 < p, q < \infty\) satisfying the HLS-type condition \(\frac{1}{q} \geq \frac{1}{p} - \frac{s}{d}\). These requirements are imposed for technical reasons, which will become apparent in the argument. We consider:

  • First, by duality: the subspace of all \(f \in L^q(\mathbb{R}^d)\) for which there exists \(g \in L^p(\mathbb{R}^d)\) with $$\langle g, h \rangle = \langle f, |\nabla|^s h \rangle$$ for all \(h \in \mathcal{S}(\mathbb{R}^d)\). (Recall that \(|\nabla|^s h \in L^r(\mathbb{R}^d)\) for all \(1 < r < \infty\) whenever \(h\) is Schwartz, so the pairing on the right is well-defined, as \(f \in L^q(\mathbb{R}^d)\) and \(|\nabla|^s h \in L^{q'}(\mathbb{R}^d)\).¹ This is the “minimal” definition, in the sense that it imposes the fewest requirements: it takes the basic property we expect \(|\nabla|^s f\) to satisfy and admits every function that can reproduce it.)
  • Second, by completion: we equip \(\mathcal{S}(\mathbb{R}^d)\) with the norm $$\|f\|_{L^q(\mathbb{R}^d)} + \||\nabla|^s f\|_{L^p(\mathbb{R}^d)}$$ and take the completion of the Schwartz space with respect to this norm. Any element of the completion, which can naturally be identified with a subspace of \(L^q(\mathbb{R}^d)\), clearly satisfies the “minimal” definition above. Here, however, we begin with a very small class of functions and take its completion; this is the “maximal” definition, in the sense that it is the most restrictive: it requires not only that the duality relation hold, but also that the element and its fractional derivative be approximable simultaneously.

To see that the two definitions agree, we need only show that elements of the first space can be arbitrarily well approximated by Schwartz functions in the graph norm given above. We will first do this under the assumption that our inequality, \(\frac{1}{q} \geq \frac{1}{p} - \frac{s}{d}\), is strict.

Take any qualifying \(f \in L^q(\mathbb{R}^d)\), with corresponding \(g \in L^p(\mathbb{R}^d)\), and let \(\varphi_{\epsilon}(x) = \epsilon^{-d} \varphi(x / \epsilon)\) be a standard mollifier, with \(\varphi\) real-valued and even. By the duality relation, $$\langle f * \varphi_{\epsilon}, |\nabla|^s h \rangle = \langle g, h * \varphi_{\epsilon} \rangle = \langle g * \varphi_{\epsilon}, h \rangle,$$ where we use the fact that \((|\nabla|^s h) * \varphi_{\epsilon} = |\nabla|^s (h * \varphi_{\epsilon})\), since both convolution and the fractional derivative act as Fourier multipliers on Schwartz functions.
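For the record, here are the intermediate steps, assuming (as is standard for a mollifier) that \(\varphi\) is real-valued and even, so that convolution with \(\varphi_{\epsilon}\) is self-adjoint with respect to the pairing: $$\langle f * \varphi_{\epsilon}, |\nabla|^s h \rangle = \langle f, \varphi_{\epsilon} * |\nabla|^s h \rangle = \langle f, |\nabla|^s (h * \varphi_{\epsilon}) \rangle = \langle g, h * \varphi_{\epsilon} \rangle = \langle g * \varphi_{\epsilon}, h \rangle.$$ The third equality is the defining property of \(g\), applied to the Schwartz test function \(h * \varphi_{\epsilon}\).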

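Next, take a dilated cutoff \(\eta_N(x) = \eta(x / N)\), where \(\eta \in C^{\infty}_c(\mathbb{R}^d)\) is real-valued and equal to \(1\) near the origin. Applying the previous identity with the Schwartz function \(h \eta_N\) in place of \(h\), we have $$\langle (f * \varphi_{\epsilon}) \eta_N, |\nabla|^s h \rangle = \langle (|\nabla|^s h) \eta_N - |\nabla|^s(h \eta_N), f * \varphi_{\epsilon} \rangle + \langle g * \varphi_{\epsilon}, h \eta_N \rangle.$$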

Since \((f * \varphi_{\epsilon}) \eta_N\) is now a Schwartz function, we can validly move the fractional derivative across the pairing and consider the expression $$\langle |\nabla|^s ((f * \varphi_{\epsilon}) \eta_N) - g, h \rangle,$$ which is equal to the sum of the two terms $$\langle (|\nabla|^s h) \eta_N - |\nabla|^s(h \eta_N), f * \varphi_{\epsilon} \rangle$$ and $$\langle g * \varphi_{\epsilon}, h \eta_N \rangle - \langle g, h \rangle.$$

We claim the first term can be made arbitrarily small for large \(N\). To see this, we use Hölder to bound it by $$\|(|\nabla|^s h) \eta_N - |\nabla|^s(h \eta_N)\|_{L^{r'}(\mathbb{R}^d)} \|f * \varphi_{\epsilon}\|_{L^r(\mathbb{R}^d)},$$ where \(q \leq r < \infty\) is to be specified later. We then apply the general form of Young's inequality, with \(1 + \frac{1}{r} = \frac{1}{t} + \frac{1}{q}\) (such a \(1 \leq t \leq \infty\) exists by our assumption that \(q \leq r\)), to control the second factor by $$\|f * \varphi_{\epsilon}\|_{L^r(\mathbb{R}^d)} \leq \|\varphi_{\epsilon}\|_{L^t(\mathbb{R}^d)} \|f\|_{L^q(\mathbb{R}^d)} = c_{d, t} \epsilon^{- d / t'} \|f\|_{L^q(\mathbb{R}^d)}.$$
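The \(\epsilon\)-dependence here is a change of variables. Assuming the normalization \(\varphi_{\epsilon}(x) = \epsilon^{-d} \varphi(x / \epsilon)\), $$\|\varphi_{\epsilon}\|_{L^t(\mathbb{R}^d)} = \bigg( \int_{\mathbb{R}^d} \epsilon^{-dt} |\varphi(x / \epsilon)|^t \, dx \bigg)^{1 / t} = \epsilon^{-d + d / t} \|\varphi\|_{L^t(\mathbb{R}^d)} = \epsilon^{-d / t'} \|\varphi\|_{L^t(\mathbb{R}^d)},$$ so one may take \(c_{d, t} = \|\varphi\|_{L^t(\mathbb{R}^d)}\). The Young relation also gives \(\frac{1}{t'} = 1 - \frac{1}{t} = \frac{1}{q} - \frac{1}{r}\), so the power of \(\epsilon\) is \(-d \big( \frac{1}{q} - \frac{1}{r} \big)\).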

Then we pull out a very powerful tool from later in our notes, the commutator estimate for fractional derivatives, to control the first factor in the above product by $$\lesssim \||\nabla|^s \eta_N\|_{L^{r_1}(\mathbb{R}^d)} \|h\|_{L^{r_2}(\mathbb{R}^d)},$$ where \(r_1\), \(r_2\) satisfy \(\frac{1}{r'} = \frac{1}{r_1} + \frac{1}{r_2}\) with \(1 < r_1 < \infty\). In particular, if we wish to take \(r_2 = p'\), we need \(\frac{1}{r'} - \frac{1}{p'} > 0\), or equivalently \(\frac{1}{r_1} = \frac{1}{p} - \frac{1}{r} > 0\); this means we must also enforce $$p < r < \infty.$$

We now compute $$(|\nabla|^s \eta_N)(x) = \int_{\mathbb{R}^d} |\xi|^s \widehat{\eta_N}(\xi) e^{2 \pi i \xi \cdot x} \, d\xi = \int_{\mathbb{R}^d} |\xi|^s N^d \widehat{\eta}(N \xi) e^{2 \pi i \xi \cdot x} \, d\xi = N^{- s} (|\nabla|^s \eta)(x / N),$$ so that its \(L^{r_1}\) norm is $$\||\nabla|^s \eta\|_{L^{r_1}(\mathbb{R}^d)} N^{- s} N^{d / r_1}.$$
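Spelling out the two scaling steps: the last equality in the first display is the substitution \(\zeta = N \xi\), $$\int_{\mathbb{R}^d} |\xi|^s N^d \widehat{\eta}(N \xi) e^{2 \pi i \xi \cdot x} \, d\xi = N^{-s} \int_{\mathbb{R}^d} |\zeta|^s \widehat{\eta}(\zeta) e^{2 \pi i \zeta \cdot (x / N)} \, d\zeta,$$ and the norm computation is the substitution \(x \mapsto N x\), combined with \(\frac{1}{r_1} = \frac{1}{p} - \frac{1}{r}\): $$\||\nabla|^s \eta_N\|_{L^{r_1}(\mathbb{R}^d)} = N^{-s} \|(|\nabla|^s \eta)(\cdot / N)\|_{L^{r_1}(\mathbb{R}^d)} = N^{-s + d / r_1} \||\nabla|^s \eta\|_{L^{r_1}(\mathbb{R}^d)} = N^{d / p - d / r - s} \||\nabla|^s \eta\|_{L^{r_1}(\mathbb{R}^d)}.$$ The last norm is finite because \(\eta\) is Schwartz.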

Meanwhile, the second term in the expression is bounded by $$\|(g * \varphi_{\epsilon}) \eta_N - g\|_{L^p(\mathbb{R}^d)} \|h\|_{L^{p'}(\mathbb{R}^d)},$$ so that overall we have $$|\langle |\nabla|^s ((f * \varphi_{\epsilon}) \eta_N) - g, h \rangle| \leq (C N^{d / p - d / r - s} \epsilon^{- d / t'} + \|(g * \varphi_{\epsilon}) \eta_N - g\|_{L^p(\mathbb{R}^d)}) \|h\|_{L^{p'}(\mathbb{R}^d)},$$ where \(C\) depends on \(\eta\), \(\varphi\), \(t\), and \(\|f\|_{L^q(\mathbb{R}^d)}\), but not on \(N\) or \(\epsilon\).

We can now take the supremum over \(h \in \mathcal{S}(\mathbb{R}^d)\) with \(\|h\|_{L^{p'}(\mathbb{R}^d)} = 1\), knowing that \(|\nabla|^s ((f * \varphi_{\epsilon}) \eta_N) - g\) is a genuine \(L^p(\mathbb{R}^d)\) function, to bound its \(L^p\) norm. It only remains to choose our parameters carefully (recall that \(r\), \(\epsilon\), and \(N\) are still unspecified) and show that the right-hand side can be made arbitrarily small.

Fix \(\delta > 0\). Because \(g \in L^p(\mathbb{R}^d)\), there exists \(0 < \epsilon_0 \ll 1\) such that \(g * \varphi_{\epsilon_0}\) is \(\delta / 4\)-close to \(g\) in \(L^p\) norm. Then, by dominated convergence, there exists \(N_0 \gg 1\) such that for all \(N > N_0\), the truncation \((g * \varphi_{\epsilon_0}) \eta_N\) lies within \(\delta / 4\) of the untruncated convolution \(g * \varphi_{\epsilon_0}\) in \(L^p\).

It follows that the second term in the sum above is well controlled: after fixing \(\epsilon = \epsilon_0\), we are free to choose any \(N > N_0\), and its contribution is always \(< \delta / 2\).

We would like to take \(r > p\) close enough to \(p\) that \(\frac{s}{d} > \frac{1}{p} - \frac{1}{r}\). Then the power of \(N\) is negative, while the coefficient \(\epsilon^{- d / t'}\) is a large (indeed, titanic) but finite and fixed value, by our decision to take \(\epsilon = \epsilon_0\). We may then take \(N\) as large as we wish, so the first term also eventually falls below \(\delta / 2\).

We can take \(r > p\) very close to \(p\), but possibly at the cost of violating the condition \(r \geq q\). Since \(\frac{1}{r} \leq \frac{1}{q}\), the desired inequality necessarily implies \(\frac{s}{d} > \frac{1}{p} - \frac{1}{q}\), which is exactly our strict hypothesis. Conversely, suppose \(p \leq q\) and \(\frac{s}{d} > \frac{1}{p} - \frac{1}{q}\). Then taking \(r > q\) sufficiently close to \(q\) gives \(r > q \geq p\), while \(\frac{s}{d} > \frac{1}{p} - \frac{1}{r}\) still holds by continuity, as desired.

Otherwise, \(p > q\). In this case we may simply take \(r > p\) very close to \(p\), so that \(\frac{1}{p} - \frac{1}{r}\) is arbitrarily small, and in particular less than \(\frac{s}{d} > 0\). Such a choice of \(r\) also guarantees \(r > p > q\), so the conditions are again satisfied.
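The exponent bookkeeping in the last few paragraphs can be checked mechanically. The following Python sketch (the helper name `choose_exponents` is hypothetical) picks \(\frac{1}{r}\) at the midpoint of the admissible interval \(\big( \max(\frac{1}{p} - \frac{s}{d}, 0), \min(\frac{1}{p}, \frac{1}{q}) \big)\), which handles both cases above at once rather than following them literally, and then reports the Young exponent \(\frac{1}{t}\), the power of \(\epsilon\), and the power of \(N\). It assumes the strict inequality and uses exact rational arithmetic via Python's `fractions.Fraction`.

```python
from fractions import Fraction


def choose_exponents(p, q, s, d):
    """Return (r, 1/t, eps_power, n_power) for the approximation argument.

    Hypothetical helper: assumes 1 < p, q < oo, 0 < s < 1, and the strict
    condition 1/q > 1/p - s/d. Inputs should be ints or Fractions.
    """
    p, q, s, d = (Fraction(x) for x in (p, q, s, d))
    if not 1 / q > 1 / p - s / d:
        raise ValueError("need the strict inequality 1/q > 1/p - s/d")
    # 1/r must lie strictly above max(1/p - s/d, 0) (negative N-power, r < oo),
    # strictly below 1/p (so r > p), and at most 1/q (so r >= q).
    lower = max(1 / p - s / d, Fraction(0))
    upper = min(1 / p, 1 / q)
    inv_r = (lower + upper) / 2
    r = 1 / inv_r
    inv_t = 1 + inv_r - 1 / q      # Young: 1 + 1/r = 1/t + 1/q
    eps_power = -d * (1 - inv_t)   # ||phi_eps||_{L^t} ~ eps^(-d/t')
    n_power = d / p - d / r - s    # power of N from the commutator term
    return r, inv_t, eps_power, n_power


print(choose_exponents(2, 2, Fraction(1, 2), 3))
# prints (Fraction(12, 5), Fraction(11, 12), Fraction(-1, 4), Fraction(-1, 4))
```

In this diagonal example (\(p = q = 2\), \(s = \frac{1}{2}\), \(d = 3\)) only the sign of the \(N\)-power matters, since \(\epsilon\) is fixed before \(N\) is sent to infinity. In the equality case \(\frac{1}{q} = \frac{1}{p} - \frac{s}{d}\) the helper refuses, matching the fact that a separate argument is needed there.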

By this process, we get a function \((f * \varphi_{\epsilon}) \eta_N \in C^{\infty}_c(\mathbb{R}^d)\) which approximates \(f\) in \(L^q\) (provided \(\epsilon_0\) is also chosen small enough that \(f * \varphi_{\epsilon_0}\) is close to \(f\) in \(L^q\), and \(N_0\) large enough for the corresponding truncation), and whose fractional derivative, as we have shown, approximates \(g\) in \(L^p\). If we like, we can modify the approximation near the frequency origin to ensure that its image under \(|\nabla|^s\) is Schwartz too, so that \(g\) is approximated by a Schwartz function.

When the inequality is not strict, this is even easier: the relation \(\frac{1}{q} = \frac{1}{p} - \frac{s}{d}\) (now an equality) means that the Riesz potential \(I_s\) of order \(s\) is a bounded map \(L^p \to L^q\), by Hardy-Littlewood-Sobolev. Moreover, from the distributional relation, one can verify that \(I_s(g) = f\). (Indeed, we can test against \(h = I_s(\psi)\) for Schwartz \(\psi\) vanishing in frequency space in a neighborhood of the origin, and use that \(I_s\) undoes \(|\nabla|^s\) and vice versa.)

So one can approximate \(g\) in \(L^p\) by a Schwartz function with compact frequency support in \(\mathbb{R}^d \setminus \{0\}\), and \(I_s\) maps it to a Schwartz function approximating \(f\) in \(L^q\). That will be our approximant for \(f\); since \(|\nabla|^s\) and \(I_s\) are mutually inverse on such functions, the fractional derivative maps it back to precisely the approximant of \(g\) we started with.

We have now shown that any function satisfying the first, duality-based condition is in fact strongly approximable in the sense of the second. We can state this most simply in the diagonal case:

Proposition: Any function \(f \in L^p(\mathbb{R}^d)\) satisfying the first, “minimal” condition above (n.b.: \(q = p\)) must be an element of the space \(H^{s, p}(\mathbb{R}^d)\), the standard fractional Sobolev space defined by the Fourier multiplier/Bessel potential approach.


Returning to the situation at hand: recall that our hypothesis \(G(u) \in L^{p_2}(\mathbb{R}^d)\) gives \(F \circ u \in L^1(\mathbb{R}^d) \cap L^{p_2}(\mathbb{R}^d)\). But we also have \(\frac{1}{p} = \frac{1}{p_1} + \frac{1}{p_2}\), so that \(\frac{1}{p} \geq \frac{1}{p_2}\), i.e. \(p \leq p_2\). Hence, by interpolation, \(F \circ u \in L^p(\mathbb{R}^d)\).
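The passage from \(L^1 \cap L^{p_2}\) to \(L^p\) is the log-convexity of Lebesgue norms (a consequence of Hölder): $$\|F \circ u\|_{L^p(\mathbb{R}^d)} \leq \|F \circ u\|_{L^1(\mathbb{R}^d)}^{1 - \theta} \|F \circ u\|_{L^{p_2}(\mathbb{R}^d)}^{\theta}, \qquad \frac{1}{p} = \frac{1 - \theta}{1} + \frac{\theta}{p_2}, \quad \theta = \frac{1 - 1/p}{1 - 1/p_2},$$ and \(\theta \in [0, 1]\) since \(1 < p \leq p_2\).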

The proof of the chain rule also shows that the dyadic square function \(\big( \sum_N N^{2 s} |P_N (F \circ u)|^2 \big)^{1 / 2}\) lies in \(L^p\); by taking frequency localizations and using Littlewood-Paley theory, this means that \(\sum_N N^s P_N (F \circ u)\) exists and converges unconditionally in \(L^p(\mathbb{R}^d)\).

Write \(\tilde{g}\) for this series, and let \(T\) be a Mikhlin multiplier operator, to be specified, with real-valued symbol (so that \(T^* = T\)). Then $$\langle T \tilde{g}, h \rangle = \sum_N \langle T(N^s P_N(F \circ u)), h \rangle = \sum_N \langle F \circ u, N^s P_N (T^* h) \rangle = \sum_N \langle F \circ u, T(N^s P_N h) \rangle,$$ using that \(P_N\) is self-adjoint and commutes with \(T\).

It quickly becomes apparent that, for the unbounded multiplier \(\mu(\xi) = \sum_N N^s \varphi_N(\xi)\), if \(h \in \mathcal{S}(\mathbb{R}^d)\) is Schwartz, then \(\sum_N N^s P_N h\) converges to \(\mathcal{F}^{- 1}(\mu \widehat{h})\) in every \(L^p\) space, \(1 < p < \infty\).

So we may take \(T\) to be the bounded operator with multiplier $$m(\xi) = \frac{|\xi|^s}{\sum_N N^s \varphi_N(\xi)},$$ which is well-defined for \(\xi \neq 0\), and it is straightforward to verify that \(m\) is a Mikhlin multiplier. Hence \(F \circ u\) satisfies the first definition: the function \(g = T \tilde{g} \in L^p(\mathbb{R}^d)\) obeys $$\langle g, h \rangle = \langle F \circ u, |\nabla|^s h \rangle$$ for all Schwartz functions \(h\).
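As a sanity check on the boundedness half of the Mikhlin claim (not the derivative bounds), here is a small Python sketch. It assumes one concrete, hypothetical choice of Littlewood-Paley pieces: \(\varphi_N(\xi) = \psi(|\xi| / N) - \psi(2 |\xi| / N)\) for dyadic \(N\), with \(\psi\) a smooth nonincreasing radial cutoff equal to \(1\) on \([0, 1]\) and \(0\) on \([2, \infty)\). Since each \(\varphi_N \geq 0\) is supported where \(N / 2 \leq |\xi| \leq 2N\) and \(\sum_N \varphi_N = 1\) away from the origin, the denominator is a convex combination of values \(N^s\) within a factor \(2^s\) of \(|\xi|^s\), so one expects \(2^{-s} \leq m(\xi) \leq 2^s\); the code checks this numerically over a range of radii.

```python
import math


def _bump(t):
    # exp(-1/t) for t > 0 and 0 otherwise: smooth, vanishing to all orders at 0
    return math.exp(-1.0 / t) if t > 0 else 0.0


def psi(r):
    # Smooth nonincreasing cutoff: psi(r) = 1 for r <= 1, psi(r) = 0 for r >= 2.
    a, b = _bump(2.0 - r), _bump(r - 1.0)
    return a / (a + b)


def phi(N, r):
    # Littlewood-Paley piece at dyadic scale N, as a function of r = |xi|;
    # supported where N/2 <= r <= 2N.
    return psi(r / N) - psi(2.0 * r / N)


def dyadic_weighted_sum(r, s, kmin=-60, kmax=60):
    # sum over N = 2^k of N^s phi_N(r); at most two terms are nonzero for each r
    return sum(2.0 ** (k * s) * phi(2.0 ** k, r) for k in range(kmin, kmax + 1))


def m(r, s):
    # m(xi) = |xi|^s / sum_N N^s phi_N(xi), evaluated at r = |xi| > 0
    return r ** s / dyadic_weighted_sum(r, s)


# Check 2^{-s} <= m <= 2^s on radii from 1e-6 to 1e6.
for s in (0.1, 0.5, 0.9):
    values = [m(10.0 ** (j / 10), s) for j in range(-60, 61)]
    print(s, 2.0 ** -s <= min(values) and max(values) <= 2.0 ** s)  # prints True for each s
```

The cutoff built from \(e^{-1/t}\) is just a convenient smooth choice; any other admissible \(\psi\) gives the same two-sided bound by the convexity argument above.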

By the equivalence above, since \(F \circ u\) belongs to the first class (with \(q = p\)), it is also an element of the second. This allows us to clarify the meaning of the fractional chain rule:

Theorem: Let \(s\), \(p\), \(p_i\), and \((F, G)\) be related by the hypotheses of the fractional chain rule, above, and take any \(u \in H^{s, p_1}(\mathbb{R}^d) \cap L^p(\mathbb{R}^d)\). Further suppose that \(F(0) = 0\). Then \(F \circ u : \mathbb{R}^d \to \mathbb{C}\) is an element of the Bessel potential space \(H^{s, p}(\mathbb{R}^d)\), and thus \(|\nabla|^s (F \circ u)\) is well-defined in the familiar sense and obeys the claimed \(L^p\) bounds.

With this, we can finally make sense of the fractional derivative in question: the natural hypotheses of the fractional chain rule do allow us to fully specify the meaning of the expressions in the estimate.

Remark 1: We also note the following small improvement. Because \(|F \circ u|\) is controlled by the sum of \(|u| G(u)\) and \(c |u|\), for more general \(u \in L^q\) we need only have \(q \geq p_2'\) to get \(F \circ u \in L^r + L^q\), where \(1 \leq r \leq p_2\) is determined by the Hölder relation \(\frac{1}{r} = \frac{1}{q} + \frac{1}{p_2}\). If, in addition, \(q \leq p\), then since \(p \leq p_1\) we get \(F \circ u \in L^r + L^q\) with \(r \leq p\), and of course \(q \leq p\) as well.
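Both exponent claims in this remark follow from one-line computations with \(\frac{1}{r} = \frac{1}{q} + \frac{1}{p_2}\) and \(\frac{1}{p} = \frac{1}{p_1} + \frac{1}{p_2}\): $$r \geq 1 \iff \frac{1}{q} + \frac{1}{p_2} \leq 1 \iff q \geq p_2', \qquad r \leq p \iff \frac{1}{q} + \frac{1}{p_2} \geq \frac{1}{p_1} + \frac{1}{p_2} \iff q \leq p_1.$$ So the assumption \(q \leq p \leq p_1\) is what guarantees \(r \leq p\).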

Since the original proof bounds the dyadic square function unconditionally, we have partial control of the standard Littlewood-Paley square function of \(F \circ u\): specifically, for the high-frequency components, since \(N^{2 s} \geq 1\) when \(N \geq 1\), $$\left\| \bigg( \sum_{N \geq 1} |P_N (F \circ u)|^2 \bigg)^{1 / 2} \right\|_{L^p(\mathbb{R}^d)} < \infty.$$ By the remarks above, \(F \circ u \in L^r + L^q\) with \(r, q \leq p\), so Bernstein's inequality gives \(P_{\leq 1}(F \circ u) \in L^p(\mathbb{R}^d)\) as well.

Putting these two components together, it follows that \(F \circ u \in L^p\); this provides an alternate set of hypotheses for the statement above. (That is, \(u \in L^p\) can be relaxed to \(u \in L^q\) for any \(p_2' \leq q \leq p\), or a suitable sum space.)

Remark 2: Recall that in many circumstances where the fractional chain rule is used, it is applied to power-type nonlinearities in PDE (e.g., \(|u|^{p - 1} u\) or things vaguely resembling it). For these functions, the derivative \(F'\) often obeys a quasiconvexity property, \(|F'((1 - \theta) u + \theta v)| \lesssim |F'(u)| + |F'(v)|\) for \(0 \leq \theta \leq 1\), so that we can take \(G \simeq |F'|\). In this case, we get the fractional chain rule in precisely the form \(\||\nabla|^s (F \circ u)\|_{L^p(\mathbb{R}^d)} \lesssim \||\nabla|^s u\|_{L^{p_1}(\mathbb{R}^d)} \|F'(u)\|_{L^{p_2}(\mathbb{R}^d)}\).
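For the model nonlinearity, writing the power as \(\alpha > 0\) (to avoid a clash with the Lebesgue exponent \(p\)), so that \(F(u) = |u|^{\alpha} u\), one admissible choice is \(G(u) = (\alpha + 1) |u|^{\alpha}\): the real derivative of \(F\) at \(w\) has operator norm \((\alpha + 1) |w|^{\alpha}\), and integrating along the segment from \(v\) to \(u\) gives \(|F(u) - F(v)| \leq (\alpha + 1) |u - v| \max(|u|, |v|)^{\alpha}\). The following Python sketch is only a randomized sanity check of the resulting chain-rule hypothesis, not a proof; the helper names are hypothetical.

```python
import random


def F(u, alpha):
    # model power-type nonlinearity F(u) = |u|^alpha * u on complex numbers
    return abs(u) ** alpha * u


def G(u, alpha):
    # candidate majorant G(u) = (alpha + 1) |u|^alpha, comparable to |F'|
    return (alpha + 1) * abs(u) ** alpha


def lipschitz_bound_holds(alpha, trials=10_000, seed=0):
    """Randomly test |F(u) - F(v)| <= |u - v| (G(u) + G(v)) on complex pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = complex(rng.uniform(-5, 5), rng.uniform(-5, 5))
        v = complex(rng.uniform(-5, 5), rng.uniform(-5, 5))
        lhs = abs(F(u, alpha) - F(v, alpha))
        rhs = abs(u - v) * (G(u, alpha) + G(v, alpha))
        # small relative tolerance for floating-point rounding
        if lhs > rhs * (1 + 1e-12):
            return False
    return True


for alpha in (0.5, 4 / 3, 2.0):
    print(alpha, lipschitz_bound_holds(alpha))  # prints True for each alpha
```

The factor \(\alpha + 1\) is not optimized; any constant multiple of \(|F'|\) that dominates the derivative along segments would serve the same purpose.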

Remark 3: The above computations were somewhat long because we required norm convergence of the approximating functions, which is a relatively demanding requirement. It was this, combined with the parameter ranges of the inequalities we used, that produced the “\(q \leq p_s^*\)”-type requirement \(\frac{1}{q} \geq \frac{1}{p} - \frac{s}{d}\) at the beginning. The above proof shows that in general, for \(f \in L^q\) with \(g \in L^p\) and no further restrictions on \(1 < p, q < \infty\), the mollified, smoothly truncated approximations of \(f\) split into two components upon application of the fractional derivative operator: one converging in \(L^p\) norm, after selecting the right subsequences of \(\epsilon\) and \(N\); and a Leibniz-type error term converging weakly in the sense of tempered distributions, which can be upgraded to \(L^p\) convergence under our stated conditions on the indices.


  1. We note here that this is implicitly how we choose to resolve one of the issues with the pure fractional derivative operator \(|\nabla|^s\): for \(0 < s < 1\), it maps \(\mathcal{S}(\mathbb{R}^d)\) into \(L^1(\mathbb{R}^d) \cap L^{\infty}(\mathbb{R}^d) \cap C^{\infty}(\mathbb{R}^d)\), but not into \(\mathcal{S}(\mathbb{R}^d)\) itself, so it does not admit a nice distributional transpose. Consequently, we adopt a slightly ad hoc approach: treating \(|\nabla|^s\) as acting between \(L^p\) spaces of varying index, with a different domain for each.

(In)Complete Thoughts

Mathematical musings & miscellany
