PROBABILITY METRICS AND UNIQUENESS OF THE
SOLUTION TO THE BOLTZMANN EQUATION FOR A
MAXWELL GAS
G. TOSCANI AND C. VILLANI
Abstract. We consider a metric for probability densities with
finite variance on $\mathbb{R}^d$, and compare it with other metrics. We use
it for several applications, including a uniqueness result for the
solution of the spatially homogeneous Boltzmann equation for a
gas of true Maxwell molecules.
Key-words : spatially homogeneous Boltzmann equation, probability
metrics, Maxwellian molecules.
1. Introduction
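Denote by $P_s(\mathbb{R}^d)$, $s > 0$, the class of all probability distributions $F$
on $\mathbb{R}^d$, $d \ge 1$, such that
$$\int_{\mathbb{R}^d} |v|^s \, dF(v) < \infty.$$
We introduce a metric on $P_s(\mathbb{R}^d)$ by
$$d_s(F, G) = \sup_{\xi \in \mathbb{R}^d} \frac{|\hat f(\xi) - \hat g(\xi)|}{|\xi|^s}, \tag{1}$$
where $\hat f$ is the Fourier transform of $F$,
$$\hat f(\xi) = \int_{\mathbb{R}^d} e^{-i \xi \cdot v} \, dF(v).$$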
Let us write s = m + α, where m is an integer and 0 ≤ α < 1. In
order that ds (F, G) be finite, it suffices that F and G have the same
moments up to order m.
The norm (1) has been introduced in [6] to investigate the trend to
equilibrium of the solutions to the Boltzmann equation for Maxwell
molecules. There, the case s = 2 + α, α > 0, was considered. Further
applications of ds , with s = 4, were studied in [3], while the cases s = 2
and s = 2 + α, α > 0, have been considered in [4] in connection with
the so-called Mc Kean graphs [8].
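To make definition (1) concrete, here is a minimal numerical sketch (not part of the original paper) that approximates $d_2$ on a finite grid in dimension $d = 1$. The helper name `d_s`, the grid parameters and the test laws (a Rademacher variable taking the values ±1, the standard Gaussian, and a Gaussian with shifted mean) are illustrative assumptions. A grid search only bounds the supremum from below.

```python
import cmath
import math

def d_s(phi_f, phi_g, s=2.0, xi_max=50.0, step=1e-3):
    """Grid lower bound for sup_{xi != 0} |phi_f(xi) - phi_g(xi)| / |xi|^s (d = 1).

    Only xi > 0 is scanned: for laws on the real line the ratio is even in xi.
    """
    best = 0.0
    for k in range(1, int(round(xi_max / step)) + 1):
        xi = k * step
        best = max(best, abs(phi_f(xi) - phi_g(xi)) / xi ** s)
    return best

def gauss(xi):
    """Fourier transform of the standard Gaussian N(0, 1)."""
    return math.exp(-0.5 * xi * xi)

def rademacher(xi):
    """Fourier transform of the law of X = +1 or -1 with probability 1/2 each."""
    return math.cos(xi)

def shifted_gauss(xi, mean=0.1):
    """Fourier transform of N(mean, 1), following the convention exp(-i xi v)."""
    return cmath.exp(-1j * mean * xi) * gauss(xi)

# Same mean and variance: the ratio stays bounded near xi = 0.
d2_same_moments = d_s(rademacher, gauss)
# Different means: the ratio behaves like |mean| / |xi| near xi = 0.
d2_mean_mismatch = d_s(shifted_gauss, gauss)
```

With matching mean and variance the grid value stays moderate, whereas with a mean mismatch it is governed by the smallest grid point and grows without bound as the grid is refined. This is why the moment conditions above are needed for $d_s$ to be finite.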
In this paper, we shall be interested in the case $s = 2$. To understand why this case separates in a natural way from the other ones, let
us briefly introduce and discuss other well-known metrics on $P_s(\mathbb{R}^d)$.
Let $F$, $G$ in $P_s(\mathbb{R}^d)$, and let $\Pi(F, G)$ be the set of all probability distributions $L$ in $P_s(\mathbb{R}^d \times \mathbb{R}^d)$ having $F$ and $G$ as marginal distributions.
Let
$$T_s(F, G) = \inf_{L \in \Pi(F, G)} \int |v - w|^s \, dL(v, w). \tag{2}$$
Then $\tau_s = T_s^{1/s}$ metrizes the weak-* topology $TW^*$ on $P_s(\mathbb{R}^d)$. We note
that T1 is the Kantorovich-Vasershtein distance of F and G [7, 18]. For
a detailed discussion, and application of these distances to statistics and
information theory, see Vajda [17]. See also [10] for a recent application
to kinetic theory.
The case s = 2 was introduced and studied independently by Tanaka [15] who, in the one-dimensional case d = 1, applied T2 to the study
of Kac’s equation. Subsequently, the properties of T2 were studied in
the multidimensional case by Murata and Tanaka [9]. Applications
to the kinetic theory of rarefied gases were finally studied by Tanaka
in [16]; several of these applications were given a simplified proof in [12].
The importance of Tanaka’s distance τ2 mainly relies on its convexity
and superadditivity with respect to rescaled convolutions. We recall
this property, that is at the basis of most of the applications of T2 .
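Let $\{X_0, Y_0\}$, $\{X_1, Y_1\}$ be two independent pairs of random variables,
and let $F_i$ (resp. $G_i$) be the probability distribution of $X_i$ (resp. $Y_i$), $i = 0, 1$.
For $0 < \lambda < 1$, let $F_\lambda$ (resp. $G_\lambda$) be the probability distribution
of $\sqrt{\lambda}\, X_0 + \sqrt{1 - \lambda}\, X_1$ (resp. $\sqrt{\lambda}\, Y_0 + \sqrt{1 - \lambda}\, Y_1$), i.e.
$$F_\lambda = \frac{1}{\lambda^{d/2}} F_0\!\left(\frac{\cdot}{\sqrt{\lambda}}\right) * \frac{1}{(1 - \lambda)^{d/2}} F_1\!\left(\frac{\cdot}{\sqrt{1 - \lambda}}\right). \tag{3}$$
Then,
$$T_2(F_\lambda, G_\lambda) \le \lambda T_2(F_0, G_0) + (1 - \lambda) T_2(F_1, G_1). \tag{4}$$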
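Inequality (4) can be checked by hand in a simple special case. The sketch below is an illustration, not taken from the paper: it uses the standard closed form $T_2 = (a - b)^2$ for two centered Gaussians on the real line with standard deviations $a$ and $b$, together with the fact that $\sqrt{\lambda} X_0 + \sqrt{1 - \lambda} X_1$ is again a centered Gaussian when $X_0$ and $X_1$ are independent centered Gaussians. All function names are hypothetical.

```python
import math

def t2_gauss(a, b):
    """T_2 between the centered Gaussians N(0, a^2) and N(0, b^2) in d = 1."""
    return (a - b) ** 2

def rescaled_std(lam, s0, s1):
    """Standard deviation of sqrt(lam) X0 + sqrt(1 - lam) X1, X0 and X1 independent."""
    return math.sqrt(lam * s0 ** 2 + (1.0 - lam) * s1 ** 2)

def convexity_gap(lam, a0, b0, a1, b1):
    """Right-hand side of (4) minus its left-hand side, for Gaussian laws."""
    lhs = t2_gauss(rescaled_std(lam, a0, a1), rescaled_std(lam, b0, b1))
    rhs = lam * t2_gauss(a0, b0) + (1.0 - lam) * t2_gauss(a1, b1)
    return rhs - lhs

# Scan a few parameter combinations; every gap should be nonnegative.
stds = [0.5, 1.0, 2.0, 3.5]
gaps = [
    convexity_gap(lam, a0, b0, a1, b1)
    for lam in (0.1, 0.5, 0.9)
    for a0 in stds for b0 in stds for a1 in stds for b1 in stds
]
```

In this Gaussian setting the inequality reduces to the reverse triangle inequality in the plane, applied to the vectors $(\sqrt{\lambda}\, a_0, \sqrt{1-\lambda}\, a_1)$ and $(\sqrt{\lambda}\, b_0, \sqrt{1-\lambda}\, b_1)$.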
Superadditivity is also known for convex functionals (relative entropies), like Boltzmann's relative entropy
$$H(f \mid M^f) = \int_{\mathbb{R}^d} f(v) \log \frac{f(v)}{M^f(v)} \, dv, \tag{5}$$
where $f$ is a probability density and $M^f$ is the Gaussian density with
the same mean vector and variance as those of $f$. This means that
the property (4) holds with $T_2$ replaced by $H$ and $\{G_0, G_1\}$ replaced
by $\{M^{f_0}, M^{f_1}\}$. This is a consequence of Shannon's entropy power
inequality (cf. [13, 1]). The same property holds for the relative Fisher
information,
$$I(f \mid M^f) = \int_{\mathbb{R}^d} \left| \nabla \log f(v) - \nabla \log M^f(v) \right|^2 f(v) \, dv \tag{6}$$
(see again [13, 1]). As discussed by Csiszár [5], by means of the relative entropy $H$, one can define the so-called $H$-neighbourhoods. Even
if those do not define a topological space in the usual sense, their topological structure is finer than the metric topology defined by the total
variation distance,
$$\nu(f, g) = \int |f(v) - g(v)| \, dv,$$
in the sense that
$$\nu(f, g) \le \sqrt{2 H(f \mid g)}, \tag{7}$$
which is the so-called Csiszár-Kullback inequality.
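As a sanity check of (7), consider the unit-variance Gaussians $f = N(m, 1)$ and $g = N(0, 1)$ on the real line. This example is an illustration chosen here, not one from the paper. Both sides are then explicit: $H(f \mid g) = m^2/2$ and $\nu(f, g) = 2\,\mathrm{erf}\big(|m| / (2\sqrt{2})\big)$, since the two densities cross at $v = m/2$. The function names below are hypothetical.

```python
import math

def l1_distance(m):
    """nu(f, g) = integral of |f - g| for f = N(m, 1), g = N(0, 1)."""
    return 2.0 * math.erf(abs(m) / (2.0 * math.sqrt(2.0)))

def relative_entropy(m):
    """H(f | g) = integral of f log(f / g) for f = N(m, 1), g = N(0, 1)."""
    return 0.5 * m * m

def csiszar_kullback_gap(m):
    """sqrt(2 H) - nu, which (7) asserts to be nonnegative."""
    return math.sqrt(2.0 * relative_entropy(m)) - l1_distance(m)

shifts = [0.0, 0.01, 0.1, 0.5, 1.0, 2.0, 5.0]
ck_gaps = [csiszar_kullback_gap(m) for m in shifts]
```

For small $m$ both sides are of order $|m|$, so on this family the inequality is sharp up to a constant factor; for large $m$ the left-hand side saturates at 2 while the right-hand side keeps growing.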
It turns out that these properties of superadditivity and convexity
also hold for $d_2$, the proofs being in fact much simpler. As an
illustration of the interest of these properties, we shall give a version
of the central limit theorem and a very simple proof of Kac's theorem.
We shall also apply $d_2$ to the study of the Boltzmann equation with
Maxwellian molecules,
$$\frac{\partial f}{\partial t}(t, v) = Q(f, f)(t, v) = \int_{\mathbb{R}^3 \times S^2} \sigma\!\left(\frac{u \cdot n}{|u|}\right) \left[ f(v') f(w') - f(v) f(w) \right] dw \, dn, \tag{8}$$
where $u = v - w$ is the relative velocity of colliding particles with
velocities $v$ and $w$, and
$$v' = \frac{v + w}{2} + \frac{|u|}{2}\, n, \qquad w' = \frac{v + w}{2} - \frac{|u|}{2}\, n$$
are the postcollisional velocities. $\sigma(\nu)$ is a nonnegative function which
for true Maxwell molecules has a nonintegrable singularity of the form
$(1 - \nu)^{-5/4}$ as $\nu \to 1$. Usually, one truncates $\sigma$ in some way, so that it
becomes integrable (cut-off assumption).
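The collision rule above conserves momentum and kinetic energy for every unit vector $n$. The short sketch below, an illustration with hypothetical helper names, evaluates $v'$ and $w'$ for random inputs in $\mathbb{R}^3$ and records the conservation defects.

```python
import math
import random

def collide(v, w, n):
    """Postcollisional velocities v', w' for precollisional v, w and unit vector n."""
    u_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))  # |u| = |v - w|
    center = [(a + b) / 2.0 for a, b in zip(v, w)]               # (v + w) / 2
    v_post = [c + 0.5 * u_norm * ni for c, ni in zip(center, n)]
    w_post = [c - 0.5 * u_norm * ni for c, ni in zip(center, n)]
    return v_post, w_post

def random_unit_vector(rng):
    """Uniform direction on the sphere S^2, from a normalized Gaussian sample."""
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]

def sq_norm(x):
    return sum(c * c for c in x)

rng = random.Random(0)
momentum_defect = 0.0
energy_defect = 0.0
for _ in range(1000):
    v = [rng.uniform(-2.0, 2.0) for _ in range(3)]
    w = [rng.uniform(-2.0, 2.0) for _ in range(3)]
    vp, wp = collide(v, w, random_unit_vector(rng))
    momentum_defect = max(
        momentum_defect,
        max(abs(a + b - c - d) for a, b, c, d in zip(vp, wp, v, w)),
    )
    energy_defect = max(
        energy_defect, abs(sq_norm(vp) + sq_norm(wp) - sq_norm(v) - sq_norm(w))
    )
```

Both defects stay at the level of floating-point rounding, in line with $v' + w' = v + w$ and $|v'|^2 + |w'|^2 = |v|^2 + |w|^2$.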
We shall prove that d2 shares a remarkable property with Tanaka’s
distance : it is nonexpanding with time along trajectories of the Boltzmann equation; that is, if f and g are two such solutions,
(9)
d2 (f (t), g(t)) ≤ d2 (f (0), g(0)).
This holds even if σ is singular. As an immediate corollary, we obtain
that the solution to the Cauchy problem for the Boltzmann equation is
unique. To our knowledge, this is so far the only uniqueness result
available for long-range interactions.
The organization of the paper is as follows. First, in section 2, we
investigate the connections between several distances on P2 (Rd ), including d2 and τ2 . In section 3, we establish the basic properties of
superadditivity for d2 and give applications. Then, in section 4, we
apply this metric to the study of the Boltzmann equation.
2. Metrics on P2 (Rd )
In order that d2 be well-defined, we need to restrict it to some space
of probability densities with the same mean vector. For simplicity, we
shall restrict to probability measures with zero mean vector, and we
shall work on
$$D_\sigma = \left\{ F \in P_2(\mathbb{R}^d);\ \int v_i \, dF(v) = 0,\ \int |v|^2 \, dF(v) = d \cdot \sigma \right\} \tag{10}$$
where σ is some positive real number. We begin with two elementary
lemmas.
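Lemma 1. Let $(F_n)$ in $D_\sigma$, $F_n \rightharpoonup F$ in $P_0(\mathbb{R}^d)$. Then
$$F \in D_\sigma \iff \lim_{K \to \infty} \sup_{n \to \infty} \int |v|^2 \, 1_{|v| \ge K} \, dF_n(v) = 0. \tag{11}$$
Proof. It is clear that if $\lim_{K \to \infty} \sup_{n \to \infty} \int |v|^2 \, 1_{|v| \ge K} \, dF_n(v) = 0$, then
$F \in D_\sigma$. On the other hand, if this condition is not satisfied, then
there exist $\varepsilon > 0$ and $(K_n) \to \infty$ with $\int |v|^2 \, 1_{|v| \le K_n} \, dF_n(v) \le d \cdot \sigma - \varepsilon$.
Since $|v|^2 \, 1_{|v| \le K_n} \rightharpoonup |v|^2$, this implies that $\int |v|^2 \, dF(v) \le d\sigma - \varepsilon$, hence
$F \notin D_\sigma$. □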
Lemma 2. Let $(F_n)$ and $F$ in $D_\sigma$, such that $F_n \rightharpoonup F$. Then, there
exist a nonnegative function $\phi$ such that $\phi(r)/r \to \infty$ as $r \to \infty$,
which can be chosen smooth and convex, and a constant $M > 0$, such
that
$$\sup_n \int \phi(|v|^2) \, dF_n(v) \le M. \tag{12}$$
Proof. Using Lemma 1, copy the construction that was done in [6] for
one single function: there exists $k_1$ such that $\sup_n \int_{|v| \ge k_1} |v|^2 \, dF_n(v) \le 1/2$;
then there exists $k_2 \ge k_1$ such that $\sup_n \int_{|v| \ge k_2} |v|^2 \, dF_n(v) \le 1/4$;
and so on: for all $p \ge 2$, there exists $k_p \ge k_{p-1}$ such that
$$\sup_n \int_{|v| \ge k_p} |v|^2 \, dF_n(v) \le \frac{1}{2^p}.$$
Then choose $\phi(|v|^2) = p |v|^2$ if $|v| \in [k_p, k_{p+1})$; smooth this function and
slow its growth if necessary, as in [6]. □
In the sequel of the paper, for $M > 0$, we shall denote by
$$P^M_{2+\alpha}(\mathbb{R}^d) = \left\{ F \in D_\sigma;\ \int |v|^{2+\alpha} \, dF(v) \le M \right\}, \tag{13}$$
$$P^M_\phi(\mathbb{R}^d) = \left\{ F \in D_\sigma;\ \int \phi(|v|^2) \, dF(v) \le M \right\} \tag{14}$$
for $\phi$ nonnegative, $\phi(r)/r \to \infty$ as $r \to \infty$. Lemma 2 enables us to
restrict to spaces $P^M_\phi$ in all the cases when one is interested in weak
convergence in $D_\sigma$.
In addition to the metrics d2 and τ = τ2 which were introduced in
the last section, we consider
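• Prokhorov's distance $\rho(F, G)$: for $\delta \ge 0$ and $U \subset \mathbb{R}^d$, we define
$$U^{\delta} = \{ v \in \mathbb{R}^d;\ d(v, U) < \delta \}, \qquad U^{\delta]} = \{ v \in \mathbb{R}^d;\ d(v, U) \le \delta \},$$
where $d(v, U) = \inf\{ \|v - w\|,\ w \in U \}$. Let
$$\sigma(F, G) = \inf\{ \varepsilon > 0 \,/\, F(A) \le G(A^{\varepsilon}) + \varepsilon \text{ for all closed } A \subset \mathbb{R}^d \};$$
we set
$$\rho(F, G) = \max\{ \sigma(F, G), \sigma(G, F) \}. \tag{15}$$
• the $(C^m)^*$ distance $\|F - G\|^*_m$: for $m \ge 1$, let $C^m(\mathbb{R}^d)$ be the set
of $m$-times continuously differentiable functions, endowed with
its natural norm $\|\cdot\|_m$. Then, let
$$\|F - G\|^*_m = \sup\left\{ \left| \int \varphi \, d(F - G)(v) \right|,\ \varphi \in C^m,\ \|\varphi\|_m \le 1 \right\}. \tag{16}$$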
Theorem 1. Let $(F_n)$ in $D_\sigma$ and $F$ in $P_2(\mathbb{R}^d)$. Then, the following
statements are equivalent.
(i) $F_n \rightharpoonup F$ and $F \in D_\sigma$;
(ii) $\tau(F_n, F) \to 0$;
(iii) $\rho(F_n, F) \to 0$ and $F \in D_\sigma$;
(iv) For any $m \ge 1$, $\|F_n - F\|^*_m \to 0$, and $F \in D_\sigma$;
(v) $d_2(F_n, F) \to 0$.
Proof. Here we shall only show (i) $\iff$ (v). First, if $d_2(F_n, F) \to 0$,
then obviously, for all $\xi \in \mathbb{R}^d \setminus \{0\}$, $\hat f_n(\xi) \to \hat f(\xi)$; on the other hand,
$\hat f_n(0) = \hat f(0) = 1$. This entails that $F_n \rightharpoonup F$; it remains to prove that
$F \in D_\sigma$. But for fixed $\xi$, $|\xi| = 1$, $t \ge 0$, since the first derivatives of
$\hat f_n$ and $\hat f$ coincide at the origin, and since these Fourier transforms are
twice continuously differentiable,
$$\left| D^2(\hat f_n - \hat f)(0) \cdot (\xi, \xi) \right| = 2 \lim_{t \to 0} \left| \frac{\hat f_n(t\xi) - \hat f(t\xi)}{t^2} \right| \le 2\, d_2(F_n, F) \to 0.$$
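Conversely, suppose that $F_n \rightharpoonup F \in D_\sigma$. By Lemma 2, there exist
$\phi$ and $M$ such that $F_n \in P^M_\phi$. By Lemma 3.1 of [6], all $D^2 \hat f_n$ have a
uniform modulus of continuity. In particular, for $\varepsilon > 0$, there exists
$\delta > 0$ such that
$$\forall n, \quad |\xi| \le \delta \implies |D^2 \hat f_n(\xi) - D^2 \hat f_n(0)| \le \varepsilon.$$
Since
$$\begin{aligned}
\frac{\hat f_n(\xi) - \hat f(\xi)}{|\xi|^2} ={}& \int_0^1 \left[ D^2 \hat f_n(t\xi) - D^2 \hat f_n(0) \right] \cdot \left( \frac{\xi}{|\xi|}, \frac{\xi}{|\xi|} \right) (1 - t) \, dt \\
&+ \frac{1}{2} \left[ D^2 \hat f_n(0) - D^2 \hat f(0) \right] \left( \frac{\xi}{|\xi|}, \frac{\xi}{|\xi|} \right) \\
&- \int_0^1 \left[ D^2 \hat f(t\xi) - D^2 \hat f(0) \right] \cdot \left( \frac{\xi}{|\xi|}, \frac{\xi}{|\xi|} \right) (1 - t) \, dt,
\end{aligned}$$
we obtain
$$\sup_{|\xi| \le \delta} \frac{|\hat f_n(\xi) - \hat f(\xi)|}{|\xi|^2} \le \varepsilon.$$
On the other hand, clearly, there exists $D > 0$ such that
$$|\xi| \ge D \implies \frac{|\hat f_n(\xi) - \hat f(\xi)|}{|\xi|^2} \le \varepsilon.$$
Thus
$$d_2(F_n, F) \le \max\left\{ \varepsilon,\ \sup_{\delta \le |\xi| \le D} \frac{|\hat f_n(\xi) - \hat f(\xi)|}{|\xi|^2} \right\} \le \max\left\{ \varepsilon,\ \frac{1}{\delta^2} \sup_{\delta \le |\xi| \le D} |\hat f_n(\xi) - \hat f(\xi)| \right\}.$$
Now, since $F_n \rightharpoonup F$ we have
$$\forall \xi, \quad \hat f_n(\xi) \to \hat f(\xi);$$
and since $|D^2 \hat f_n(\xi)| \le d\sigma$ and $D \hat f_n(0) = 0$, $(\hat f_n)$ is uniformly equicontinuous on the compact set $\{\delta \le |\xi| \le D\}$. By Ascoli's theorem, this
entails that $\sup_{\delta \le |\xi| \le D} |\hat f_n(\xi) - \hat f(\xi)|$ goes to 0, and thus there exists
$n_0 \ge 0$ such that for $n \ge n_0$, $d_2(F_n, F) \le \varepsilon$. □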
Next, we would like to compare more precisely these different metrics. We shall say that two metrics m1 and m2 define the same weak-*
uniformity on a set S ⊂ P2 (Rd ) if for all ε > 0 there exists η > 0 such
that for all F, G ∈ S,
m1 (F, G) ≤ η =⇒ m2 (F, G) ≤ ε,
m2 (F, G) ≤ η =⇒ m1 (F, G) ≤ ε.
Theorem 2. Let $M > 0$ and $\phi$ be fixed. Then, for any $m \ge 1$, $\tau$, $\rho$,
$\|\cdot\|^*_m$ and $d_2$ define the same weak-* uniformity in $P^M_\phi$.
In order to simplify the proof, we shall prove this theorem only on
$P^M_{2+\alpha}$, where the bounds are explicit. In all the sequel $\alpha > 0$, $M > 0$
and $m \ge 1$ will be fixed. The main part of the proof has already been
performed in [6], therefore we shall only "fill the gaps". We split the
proof in three steps.
First step: $\tau$ and $\rho$ define the same weak-* uniformity on $P^M_{2+\alpha}$.
More precisely,
$$\text{(a)}\ \ \tau(F, G)^2 \le (2M + 8)\, \rho(F, G)^{\frac{\alpha}{\alpha+2}} + 4 \rho(F, G)^2; \qquad \text{(b)}\ \ \rho(F, G) \le \tau(F, G)^{2/3}. \tag{17}$$
Proof. Part (a) is Theorem 5.2 of [6]. As for part (b), let $\beta > 0$; then
there exists $L^* \in \Pi(F, G)$ such that
$$T_2(F, G) = \int |v - w|^2 \, dL^*(v, w) \ge \beta^2 L^*(|v - w| \ge \beta).$$
Choosing $\beta = T_2(F, G)^{1/3}$, we see that
$$L^*(|v - w| \ge \beta) \le \beta.$$
By Theorem 5.1 of [6], this implies that $F(A) \le G(A^{\beta]}) + \beta$ for all
closed $A \subset \mathbb{R}^d$. Thus, $\rho(F, G) \le \beta = T_2(F, G)^{1/3} = \tau(F, G)^{2/3}$. □
Second step: For all $m \ge 1$, $\rho$ and $\|\cdot\|^*_m$ define the same weak-*
uniformity on $P_0(\mathbb{R}^d)$.
This is done by putting together Lemma 5.3 and Corollary 5.5 in [6].
Third step: For all $m \ge 1$, $\|\cdot\|^*_m$ and $d_2$ define the same weak-*
uniformity on $P^M_{2+\alpha}$. More precisely,
$$\text{(a)}\ \ \|F - G\|^*_{d+3} \le C_d \, \sigma^{\frac{d+1}{d+3}} \, d_2(F, G)^{\frac{2}{d+3}}; \qquad \text{(b)}\ \ d_2(F, G) \le C_d \, M^{\frac{2}{2+\alpha}} \left( \|F - G\|^*_1 \right)^{\frac{\alpha}{2+\alpha}}. \tag{18}$$
Proof. (a) is an easy adaptation of Lemma 5.7 in [6]. Let $\varphi \in C^{d+3}(\mathbb{R}^d)$,
$\|\varphi\|_{d+3} \le 1$; let $R \ge 1$. Let $\chi_R$ be a smooth function such that $0 \le \chi_R \le 1$,
$\chi_R = 1$ for $|v| \le R$, $\chi_R(v) = 0$ for $|v| \ge R + 1$. We first estimate
the tails of the distributions:
$$\left| \int (1 - \chi_R) \varphi \, d(F - G) \right| \le \int_{|v| \ge R} dF + \int_{|v| \ge R} dG \le \frac{2\sigma}{R^2}.$$
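Then, by Parseval,
$$\begin{aligned}
\left| \int \chi_R \varphi \, d(F - G) \right| &= \left| \int \widehat{\chi_R \varphi}(\xi) \, [\hat f(\xi) - \hat g(\xi)] \, d\xi \right| \\
&\le \sup_{\mathbb{R}^d} \frac{|\hat f(\xi) - \hat g(\xi)|}{|\xi|^2} \ \sup_{\mathbb{R}^d} \left[ |\widehat{\chi_R \varphi}(\xi)| \, (1 + |\xi|^{d+3}) \right] \int \frac{|\xi|^2}{1 + |\xi|^{d+3}} \, d\xi.
\end{aligned}$$
Using the classical inequality
$$\sup_\xi \left\{ (1 + |\xi|^m) \, |\widehat{\chi_R \varphi}(\xi)| \right\} \le C_d \sup_v \left\{ (1 + |v|)^{d+1} \, |D^m(\chi_R \varphi)(v)| \right\} \le C_d (1 + R)^{d+1},$$
we conclude that
$$\left| \int \varphi \, d(F - G) \right| \le \frac{2\sigma}{R^2} + C_d R^{d+1} d_2(F, G).$$
Optimizing over $R$, we get the desired result.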
As for (b): clearly,
$$\frac{|\hat f(\xi) - \hat g(\xi)|}{|\xi|^2} \le \left| \int \frac{\cos(\xi \cdot v)}{|\xi|^2} \, d(F - G) \right| + \left| \int \frac{\sin(\xi \cdot v)}{|\xi|^2} \, d(F - G) \right|.$$
Let us estimate for instance the first term in the right-hand side. First
we note that
$$\int \frac{\cos(\xi \cdot v)}{|\xi|^2} \, d(F - G) = \int \frac{\cos(\xi \cdot v) - 1}{|\xi|^2} \, d(F - G).$$
For fixed $\xi$,
$$\Phi_\xi(v) = \frac{\cos(\xi \cdot v) - 1}{|\xi|^2}$$
is a $C^2$ function, vanishing at the origin, as well as its first-order derivatives, and elementary estimates show that $|\Phi'_\xi(v)| \le |v|$, $|\Phi_\xi(v)| \le |v|^2/2$,
hence $\|\Phi_\xi \chi_R\|_1 \le C R^2$, whence by definition
$$\frac{1}{C R^2} \left| \int \Phi_\xi \chi_R \, d(F - G) \right| \le \|F - G\|^*_1.$$
On the other hand,
$$\left| \int (1 - \chi_R) \Phi_\xi \, d(F - G) \right| \le \frac{2M}{R^\alpha}.$$
Optimizing over $R$, we get the result. □
3. Superadditivity
Let $\{X_0, Y_0\}$ and $\{X_1, Y_1\}$ be two independent pairs of random variables with zero mean vector. Let $F_i$ (resp. $G_i$) denote the law of $X_i$
(resp. $Y_i$). For $0 < \lambda < 1$, the law of
$$X_\lambda = \sqrt{\lambda}\, X_0 + \sqrt{1 - \lambda}\, X_1$$
is
$$F_\lambda = \frac{1}{\lambda^{d/2}} F_0\!\left(\frac{\cdot}{\sqrt{\lambda}}\right) * \frac{1}{(1 - \lambda)^{d/2}} F_1\!\left(\frac{\cdot}{\sqrt{1 - \lambda}}\right). \tag{19}$$
Theorem 3. $d_2$ is superadditive with respect to the rescaled convolution, i.e.
$$d_2(F_\lambda, G_\lambda) \le \lambda \, d_2(F_0, G_0) + (1 - \lambda) \, d_2(F_1, G_1). \tag{20}$$
Corollary. The rescaled convolution is continuous with respect to $d_2$:
with obvious notations, if $d_2(F_0^n, G_0) \to 0$ and $d_2(F_1^n, G_1) \to 0$, then
$d_2(F_\lambda^n, G_\lambda) \to 0$.
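Proof. We denote by $\hat f_i$ and $\hat g_i$ the Fourier transforms of $F_i$ and $G_i$.
Since
$$\hat f_\lambda(\xi) = \hat f_0(\sqrt{\lambda}\, \xi) \, \hat f_1(\sqrt{1 - \lambda}\, \xi),$$
and an analogous formula holds for $\hat g_\lambda$, we have
$$\begin{aligned}
d_2(F_\lambda, G_\lambda) &= \sup_\xi \frac{|\hat f_0(\sqrt{\lambda}\, \xi) \hat f_1(\sqrt{1 - \lambda}\, \xi) - \hat g_0(\sqrt{\lambda}\, \xi) \hat g_1(\sqrt{1 - \lambda}\, \xi)|}{|\xi|^2} \\
&\le \sup_\xi \left\{ (1 - \lambda) |\hat f_0(\sqrt{\lambda}\, \xi)| \frac{|\hat f_1(\sqrt{1 - \lambda}\, \xi) - \hat g_1(\sqrt{1 - \lambda}\, \xi)|}{(1 - \lambda) |\xi|^2} + \lambda |\hat g_1(\sqrt{1 - \lambda}\, \xi)| \frac{|\hat f_0(\sqrt{\lambda}\, \xi) - \hat g_0(\sqrt{\lambda}\, \xi)|}{\lambda |\xi|^2} \right\}.
\end{aligned}$$
Since $\|\hat f_0\|_\infty \le 1$ and $\|\hat g_1\|_\infty \le 1$, this expression can be bounded by
$$(1 - \lambda) \, d_2(F_1, G_1) + \lambda \, d_2(F_0, G_0).$$
□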
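The inequality of Theorem 3 can be observed numerically in dimension $d = 1$. The following sketch is an illustration under assumptions chosen here, not in the paper: $F_0$ is the Rademacher law, $F_1$ the uniform law on $[-\sqrt{3}, \sqrt{3}]$, and $G_0 = G_1$ the standard Gaussian, so that all laws are centered with unit variance. The suprema are approximated from below on a finite grid, and the helper names are hypothetical.

```python
import math

def sup_ratio(diff, xi_max=40.0, step=2e-3):
    """Grid lower bound for sup_{xi > 0} |diff(xi)| / xi^2."""
    best = 0.0
    for k in range(1, int(round(xi_max / step)) + 1):
        xi = k * step
        best = max(best, abs(diff(xi)) / (xi * xi))
    return best

def gauss(xi):
    return math.exp(-0.5 * xi * xi)          # N(0, 1)

def rademacher(xi):
    return math.cos(xi)                      # values +1, -1 with probability 1/2

def uniform(xi):
    x = math.sqrt(3.0) * xi                  # uniform law on [-sqrt(3), sqrt(3)]
    return math.sin(x) / x

def rescaled(phi0, phi1, lam):
    """Fourier transform of the law of sqrt(lam) X0 + sqrt(1 - lam) X1."""
    a, b = math.sqrt(lam), math.sqrt(1.0 - lam)
    return lambda xi: phi0(a * xi) * phi1(b * xi)

lam = 0.3
f_lam = rescaled(rademacher, uniform, lam)
g_lam = rescaled(gauss, gauss, lam)          # again the standard Gaussian

lhs = sup_ratio(lambda xi: f_lam(xi) - g_lam(xi))
rhs = (lam * sup_ratio(lambda xi: rademacher(xi) - gauss(xi))
       + (1.0 - lam) * sup_ratio(lambda xi: uniform(xi) - gauss(xi)))
```

In this example the left-hand side comes out well below the right-hand side, which reflects the strict contraction produced by the factors $|\hat f_0| < 1$ and $|\hat g_1| < 1$ away from the origin.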
Remark. This property is sufficient to imply that $d_2$ is nonexpansive
along solutions of the Boltzmann equation if $d = 2$ (cf. [2]). But we
shall show in the next section that this restriction can be dispensed
with.
As an application, for $X$ a random variable with law $F$, let us consider the functional
$$J(X) = J(F) = d_2(F, M^F) = \sup_\xi \frac{|\hat f - \widehat{M^F}|}{|\xi|^2}, \tag{21}$$
where this time $M^F$ is the Gaussian distribution with the same mean
vector and covariance matrix as $F$. The same proof as before shows
the following.
Theorem 4. For any two independent random variables $X_0$ and $X_1$,
and $0 < \lambda < 1$,
$$J\left( \sqrt{\lambda}\, X_0 + \sqrt{1 - \lambda}\, X_1 \right) \le \lambda J(X_0) + (1 - \lambda) J(X_1) \tag{22}$$
with equality if and only if $X_0$ and $X_1$ are gaussian variables with the
same mean vector and covariance matrix.
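Proof. Suppose that there is equality in the proof of Theorem 3, with
$G_0$ and $G_1$ replaced by a centered gaussian probability law (one can
always reduce to this case). Suppose that $F_0 \ne M^{F_0}$. Let us denote by
$g_0$ and $g_1$ the densities of $M^{F_0}$ and $M^{F_1}$. Let $(\xi_n)$ be such that
$$\frac{|\hat f_0(\sqrt{\lambda}\, \xi_n) \hat f_1(\sqrt{1 - \lambda}\, \xi_n) - \hat g_0(\sqrt{\lambda}\, \xi_n) \hat g_1(\sqrt{1 - \lambda}\, \xi_n)|}{|\xi_n|^2} \xrightarrow[n \to \infty]{} J\left( \sqrt{\lambda}\, X_0 + \sqrt{1 - \lambda}\, X_1 \right);$$
the left-hand side is bounded by
$$(1 - \lambda) |\hat f_0(\sqrt{\lambda}\, \xi_n)| \frac{|\hat f_1(\sqrt{1 - \lambda}\, \xi_n) - \hat g_1(\sqrt{1 - \lambda}\, \xi_n)|}{(1 - \lambda) |\xi_n|^2} + \lambda |\hat g_1(\sqrt{1 - \lambda}\, \xi_n)| \frac{|\hat f_0(\sqrt{\lambda}\, \xi_n) - \hat g_0(\sqrt{\lambda}\, \xi_n)|}{\lambda |\xi_n|^2}.$$
If $|\xi_n| \to \infty$, then $|\hat g_1(\sqrt{1 - \lambda}\, \xi_n)| \le 1/2$ for large $n$, and $J(\sqrt{\lambda}\, X_0 + \sqrt{1 - \lambda}\, X_1) \le (1 - \lambda) J(X_1) + \frac{\lambda}{2} J(X_0)$, which is impossible since $J(X_0) \ne 0$. Therefore, extracting a subsequence if necessary, we may assume that $\xi_n \to \xi \in \mathbb{R}^d$. The same argument shows then that $\xi = 0$. But by definition,
$$\frac{|\hat f_0(\sqrt{\lambda}\, \xi_n) - \hat g_0(\sqrt{\lambda}\, \xi_n)|}{\lambda |\xi_n|^2} \to 0$$
as $n \to \infty$. Hence, $J(X_\lambda) = 0$, and $J(X_0) = J(X_1) = 0$. □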
As remarked by Murata and Tanaka [9] and others, a functional
having these properties can be used for several applications, such as the following.
First application: the central limit theorem. Let $(X_n)$ be a sequence of independent identically distributed variables with finite variance, and let
$$\xi_n = \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n). \tag{23}$$
Then, if $F$ denotes the common probability law of each $X_n$, $G_n$ the
probability law of $\xi_n$, and $G$ the gaussian probability law with the same
mean vector and covariance matrix as those of $F$,
$$d_2(G_n, G) \to 0 \quad \text{as } n \to \infty. \tag{24}$$
Moreover, given $\varepsilon > 0$, knowing $d_2(F, G)$ and a modulus of continuity of
$D^2 \hat f$ at the origin, one can compute explicitly $n_0$ such that for $n \ge n_0$,
$d_2(G_n, G) \le \varepsilon$.
Proof. For $P$ and $Q$ two probability distributions, let us denote by $P \circ Q$
the rescaled convolution (3) with $\lambda = 1/2$, that is, $2^{d} P(\sqrt{2}\, \cdot) * Q(\sqrt{2}\, \cdot)$. Let $\eta_k = \xi_{2^k}$, and $P_k$
its probability law; then, thanks to Theorem 4,
$$J(\eta_{k+1}) = J(P_k \circ P_k) \le J(P_k) = J(\eta_k):$$
$(J(\eta_k))$ is decreasing, hence converging to some limit $l$. Admit for a
while that there exists $Q$ so that for some subsequence $k_p$, $d_2(P_{k_p}, Q) \to 0$,
so that $l = J(Q)$. Then, since the rescaled convolution is continuous with respect to $d_2$, $J(P_{k_p} \circ P_{k_p}) \to J(Q \circ Q)$; but it is also
$J(P_{k_p + 1}) \to l = J(Q)$, so that
$$J(Q \circ Q) = J(Q).$$
This implies that $Q$ is the gaussian distribution $G$, whence $J(\eta_k) \to 0$.
Now, any integer $n \ge 1$ can be written $n = \sum_{k=0}^{n} \alpha_k 2^k$ with $\alpha_k \in \{0, 1\}$, and
$$J(\xi_n) \le \frac{1}{n} \sum_k \alpha_k 2^k J(\eta_k) \to 0.$$
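Finally, to prove that $(P_k)$ has a weak cluster point with respect
to the topology of $d_2$, it suffices to note that the Fourier transform of
$F_n$ is $\hat f_n(\xi) = \left( \hat f(\xi/\sqrt{n}) \right)^n$, whose second derivatives can be readily
computed,
$$D^2_{ij} \left( \hat f\!\left( \frac{\xi}{\sqrt{n}} \right) \right)^{n} = (n - 1) \, (D_i \hat f)\!\left( \frac{\xi}{\sqrt{n}} \right) (D_j \hat f)\!\left( \frac{\xi}{\sqrt{n}} \right) \hat f\!\left( \frac{\xi}{\sqrt{n}} \right)^{n-2} + \hat f\!\left( \frac{\xi}{\sqrt{n}} \right)^{n-1} (D^2_{ij} \hat f)\!\left( \frac{\xi}{\sqrt{n}} \right).$$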
If one denotes by $\psi$ a modulus of continuity of $D^2 \hat f$ near 0, i.e.
$$\left| D^2 \hat f(\xi) - D^2 \hat f(0) \right| \le \psi(|\xi|),$$
where $\psi$ is chosen to be increasing from 0, we obtain at once that for
$D^2 \hat f_n$ one can take as modulus of continuity
$$\psi_n(t) = \sigma^2 t^2 + \psi\!\left( \frac{t}{\sqrt{n}} \right).$$
$\psi_n$ is bounded, uniformly in $n$, by
$$\psi_1(t) = \sigma^2 t^2 + \psi(t).$$
This is enough to conclude.
Stated in this form, this would seem to be only an overcomplicated
way of proving the central limit theorem; but the interest of this method
is that it can immediately lead to explicit computations. Indeed, let
$\varepsilon > 0$, and let us look for $n_0$ such that for $n \ge n_0$, $d_2(G_n, G) \le \varepsilon$. First,
since $J(P_k)$ is decreasing, we look for $k_0$ such that $J(P_{k_0}) \le \varepsilon$. The
proof of Theorem 4 clearly shows that
$$J(P_{k+1}) = J(P_k \circ P_k) \le \sup_\xi \, \inf\left( \psi(|\xi|),\ \left( \frac{1 + e^{-|\xi|^2/2}}{2} \right) J(P_k) \right).$$
Let $\eta$ be such that $\psi(\eta) \le \varepsilon$. As long as $J(P_k \circ P_k) \ge \varepsilon$, the supremum
can only be obtained for $|\xi| \ge \eta$, hence $J(P_{k+1}) \le \mu J(P_k)$ with
$$\mu = \frac{1 + e^{-\eta^2/2}}{2} < 1.$$
Therefore, one can take
$$k_0 = -\frac{\log J(X_1)}{\log \mu}.$$
Now, for $n \ge 2^{k_0}$, one writes
$$J(\xi_n) \le \sum_{k < k_0} \frac{\alpha_k 2^k}{n} J(X_1) + \sum_{k \ge k_0} \frac{\alpha_k 2^k}{n} \, \varepsilon \le \frac{2^{k_0}}{n} J(X_1) + \varepsilon,$$
and it suffices to choose $n \ge 2^{k_0} J(X_1)/\varepsilon$ for this expression to be less
than $2\varepsilon$. □
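The monotone decay of $J(\eta_k)$ used in this proof can be observed in a simple case. The sketch below is an illustration only: it assumes that the $X_i$ are Rademacher variables (values ±1 with probability 1/2), for which the Fourier transform of the law of $\xi_n$ is $\cos(\xi/\sqrt{n})^n$, and it approximates the supremum in (21) on a finite grid in dimension $d = 1$. The function name and grid parameters are hypothetical.

```python
import math

def J_rademacher_sum(n, xi_max=60.0, step=5e-3):
    """Grid lower bound for J(xi_n), xi_n = (X_1 + ... + X_n) / sqrt(n), X_i = +-1."""
    root = math.sqrt(n)
    best = 0.0
    for k in range(1, int(round(xi_max / step)) + 1):
        xi = k * step
        # Fourier transform of the law of xi_n minus that of the standard Gaussian.
        diff = math.cos(xi / root) ** n - math.exp(-0.5 * xi * xi)
        best = max(best, abs(diff) / (xi * xi))
    return best

# J(eta_k) with eta_k = xi_{2^k}, for k = 0, ..., 6.
J_values = [J_rademacher_sum(2 ** k) for k in range(7)]
```

The computed sequence decreases with $k$, in line with $J(\eta_{k+1}) \le J(\eta_k)$. For this lattice law the decay is only of order $1/n$, because $|\cos(\xi/\sqrt{n})|^n$ returns to 1 at $\xi = \pi\sqrt{n}$.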
Second application: Kac's theorem. Let $X_1$ and $X_2$ be two independent random variables with finite variance, such that
$$\begin{cases} \tilde X_1 = X_1 \cos\theta + X_2 \sin\theta \\ \tilde X_2 = -X_1 \sin\theta + X_2 \cos\theta \end{cases} \tag{25}$$
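are independent for some $\theta \in \mathbb{R} \setminus \frac{\pi}{2}\mathbb{Z}$. Then $X_1$ and $X_2$ are gaussian.
Proof.
$$J(\tilde X_1) \le J(X_1) \cos^2\theta + J(X_2) \sin^2\theta; \qquad J(\tilde X_2) \le J(X_1) \sin^2\theta + J(X_2) \cos^2\theta; \tag{26}$$
hence $J(\tilde X_1) + J(\tilde X_2) \le J(X_1) + J(X_2)$. But
$$\begin{cases} X_1 = \tilde X_1 \cos\theta - \tilde X_2 \sin\theta \\ X_2 = \tilde X_1 \sin\theta + \tilde X_2 \cos\theta; \end{cases} \tag{27}$$
by the same inequality, $J(X_1) + J(X_2) \le J(\tilde X_1) + J(\tilde X_2)$, so that there
is equality in (26), which implies that $X_1$ and $X_2$ are gaussian. □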
4. Application to the Boltzmann equation
In this section, we shall assume $d = 3$ for simplicity, but all the
results can be generalized readily to any dimension $d \ge 2$, or to the one-dimensional Kac model. The Boltzmann equation (8) can be studied
in weak form for a probability measure as well as for a distribution
function,
$$\frac{d}{dt} \int \varphi(v) \, dF(v) = \int_{\mathbb{R}^3 \times \mathbb{R}^3 \times S^2} \sigma\!\left( \frac{u \cdot n}{|u|} \right) \{ \varphi(v') - \varphi(v) \} \, dF(v) \, dF(w) \, dn;$$
we shall study this with the conditions
$$\int dF_0(v) = 1, \qquad \int v \, dF_0(v) = 0, \qquad \int |v|^2 \, dF_0(v) = 3;$$
it is classical that these are preserved under the time-evolution of the
Boltzmann equation. Moreover, it is equivalent to use the Fourier
transform of the equation [12]:
$$\partial_t \hat f(t, \xi) = \int_{S^2} \sigma\!\left( \frac{\xi \cdot n}{|\xi|} \right) \left[ \hat f(\xi^+) \hat f(\xi^-) - \hat f(\xi) \hat f(0) \right] dn, \tag{28}$$
where
$$\xi^+ = \frac{\xi + |\xi| n}{2}, \qquad \xi^- = \frac{\xi - |\xi| n}{2}, \tag{29}$$
and the initial conditions are such that
$$\hat f(0) = 1, \qquad \nabla \hat f(0) = 0, \qquad \nabla^2 \hat f(0) = -3,$$
$\hat f \in C^2(\mathbb{R}^d)$. Note that $\xi^+ + \xi^- = \xi$, and $|\xi^+|^2 + |\xi^-|^2 = |\xi|^2$.
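The two identities just noted follow from (29) by expanding the squares. The sketch below, with hypothetical helper names, checks them numerically for random $\xi \in \mathbb{R}^3$ and unit vectors $n$, together with the companion identity $|\xi^-|^2 = |\xi|^2 (1 - \nu)/2$, where $\nu = \xi \cdot n / |\xi|$. The last identity is derived here from (29) rather than quoted from the paper; it quantifies how $\xi^-$ vanishes for grazing collisions ($\nu \to 1$).

```python
import math
import random

def xi_plus_minus(xi, n):
    """Return (xi^+, xi^-) as defined in (29) for a vector xi and a unit vector n."""
    norm = math.sqrt(sum(c * c for c in xi))
    plus = [(c + norm * ni) / 2.0 for c, ni in zip(xi, n)]
    minus = [(c - norm * ni) / 2.0 for c, ni in zip(xi, n)]
    return plus, minus

def sq_norm(x):
    return sum(c * c for c in x)

rng = random.Random(1)
worst_sum = worst_energy = worst_grazing = 0.0
for _ in range(1000):
    xi = [rng.uniform(-3.0, 3.0) for _ in range(3)]
    g = [rng.gauss(0.0, 1.0) for _ in range(3)]
    g_norm = math.sqrt(sq_norm(g))
    n = [c / g_norm for c in g]                       # random unit vector
    plus, minus = xi_plus_minus(xi, n)
    nu = sum(a * b for a, b in zip(xi, n)) / math.sqrt(sq_norm(xi))
    worst_sum = max(worst_sum,
                    max(abs(p + m - c) for p, m, c in zip(plus, minus, xi)))
    worst_energy = max(worst_energy,
                       abs(sq_norm(plus) + sq_norm(minus) - sq_norm(xi)))
    worst_grazing = max(worst_grazing,
                        abs(sq_norm(minus) - sq_norm(xi) * (1.0 - nu) / 2.0))
```

All three defects remain at rounding level. The third identity gives $|\xi^-| = |\xi| \sqrt{(1 - \nu)/2}$, which is the source of the factor $(1 - \nu)^{1/2}$ in the grazing-collision estimate.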
Theorem 5. Let F and G be two solutions of the Boltzmann equation (8). Then, for all time t ≥ 0,
d2 (F (t), G(t)) ≤ d2 (F (0), G(0)).
Before proving Theorem 5, we mention three useful corollaries.
Corollary 5.1. Let F0 be a nonnegative measure with finite variance.
Then, there exists a unique weak solution F (t) of the Boltzmann equation, such that F (0) = F0 .
Corollary 5.2. If Fε (t) is a sequence of approximate solutions of the
Boltzmann equation, obtained by a standard cut-off procedure for instance, then Fε converges weakly to F . This entails in particular that
such results as the decrease of the Fisher information, or the decrease of
Tanaka’s functional, which are known to hold for the cut-off equation,
also hold for the non cut-off equation.
Corollary 5.3. Let F0 be a nonnegative measure with finite variance,
and $F(t)$ the associated solution of the Boltzmann equation. Let $M$ be
the Maxwellian distribution with the same mean vector and variance
as $F_0$. Then $d_2(F(t), M)$ is decreasing towards 0.
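Proof of Theorem 5. Let $F$ and $G$ be two solutions of the Boltzmann
equation, and $\hat f$, $\hat g$ their Fourier transforms. Then,
$$\partial_t \left( \frac{\hat f - \hat g}{|\xi|^2} \right) = \int \sigma\!\left( \frac{\xi \cdot n}{|\xi|} \right) \left[ \frac{\hat f(\xi^+) \hat f(\xi^-) - \hat g(\xi^+) \hat g(\xi^-)}{|\xi|^2} - \frac{\hat f(\xi) - \hat g(\xi)}{|\xi|^2} \right] dn. \tag{30}$$
Now, we do the usual splitting
$$\begin{aligned}
\left| \frac{\hat f(\xi^+) \hat f(\xi^-) - \hat g(\xi^+) \hat g(\xi^-)}{|\xi|^2} \right| &\le |\hat f(\xi^+)| \left| \frac{\hat f(\xi^-) - \hat g(\xi^-)}{|\xi^-|^2} \right| \frac{|\xi^-|^2}{|\xi|^2} + |\hat g(\xi^-)| \left| \frac{\hat f(\xi^+) - \hat g(\xi^+)}{|\xi^+|^2} \right| \frac{|\xi^+|^2}{|\xi|^2} \\
&\le \sup \left| \frac{\hat f - \hat g}{|\xi|^2} \right| \left( \frac{|\xi^-|^2 + |\xi^+|^2}{|\xi|^2} \right) = \sup \left| \frac{\hat f - \hat g}{|\xi|^2} \right|.
\end{aligned}$$
We set
$$h(t, \xi) = \frac{\hat f(\xi) - \hat g(\xi)}{|\xi|^2}.$$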
For cut-off molecules, let $e$ be any fixed unit vector and let us denote
by
$$S = \int_{S^2} \sigma(n \cdot e) \, dn$$
the total cross-section. By rotational invariance, for all $\xi \ne 0$,
$$\int \sigma\!\left( \frac{\xi \cdot n}{|\xi|} \right) dn = S,$$
and the preceding computation shows that
$$\left| \partial_t h + S h \right| \le S \|h\|_\infty. \tag{31}$$
Gronwall's lemma proves at once that for cut-off molecules, $\|h(t)\|_\infty$ is
nonincreasing.
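Now, let us consider the case of true Maxwell molecules, where $\sigma(\nu)$
is singular like $(1 - \nu)^{-5/4}$. This singularity corresponds to grazing
collisions, i.e. $\xi^+ \sim \xi$, $\xi^- \sim 0$. Since it is nonintegrable, $S = \infty$.
Then we split the right-hand side of (28) according to $|1 - \nu| \ge \varepsilon$ or
$|1 - \nu| < \varepsilon$. For the first term, we use the preceding estimate, while
for the other, we use the fact that the singularity is cancelled by the
vanishing of $\hat f(\xi^+) \hat f(\xi^-) - \hat f(\xi) \hat f(0)$ for grazing collisions. Indeed, as
in [12], let us write
$$\begin{aligned}
\left| \hat f(\xi^+) \hat f(\xi^-) - \hat f(\xi) \hat f(0) \right| &\le |\hat f(\xi^-)| \, |\hat f(\xi^+) - \hat f(\xi)| + |\hat f(\xi)| \, |\hat f(\xi^-) - \hat f(0)| \\
&\le \sup_{|\eta| \le \sup(|\xi|, |\xi^+|)} |\nabla \hat f(\eta)| \, |\xi^+ - \xi| + \sup_{|\eta| \le |\xi^-|} |\nabla \hat f(\eta)| \, |\xi^-|.
\end{aligned}$$
Since $|\xi^+|, |\xi^-| \le |\xi|$, $|D^2 \hat f(\xi)| \le d$, and $\nabla \hat f(0) = 0$, we conclude that
$$|\hat f(\xi^+) \hat f(\xi^-) - \hat f(\xi) \hat f(0)| \le C |\xi| |\xi^-| \le C |\xi|^2 (1 - \nu)^{1/2},$$
where $C$ depends only on the dimension. This implies that the integrand in (28) is bounded by $C (1 - \nu)^{-3/4}$, and thus the integral is
convergent, uniformly in $\xi$ and in $t$. As a conclusion, setting
$$S_\varepsilon = \int_{S^2} 1_{|1 - n \cdot e| \ge \varepsilon} \, \sigma(n \cdot e) \, dn,$$
$$r_\varepsilon = \sup_{\xi, t} \left| \int_{S^2} 1_{\left| 1 - \frac{\xi \cdot n}{|\xi|} \right| < \varepsilon} \, \sigma\!\left( \frac{\xi \cdot n}{|\xi|} \right) \left[ \hat f(\xi^+) \hat f(\xi^-) - \hat f(\xi) \hat f(0) \right] dn \right|,$$
we obtain that $r_\varepsilon \to 0$ as $\varepsilon \to 0$, and
$$\left| \partial_t h(\xi, t) + S_\varepsilon h(\xi, t) \right| \le S_\varepsilon \|h\|_\infty(t) + r_\varepsilon. \tag{32}$$
This is equivalent to
$$\left| \partial_t \left( h(\xi, t) e^{S_\varepsilon t} \right) \right| \le S_\varepsilon \| h(\cdot, t) e^{S_\varepsilon t} \|_\infty + r_\varepsilon e^{S_\varepsilon t}.$$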
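Integrating from 0 to $t$, we get
$$|h(\xi, t)| e^{S_\varepsilon t} \le |h(\xi, 0)| + \int_0^t d\tau \left( S_\varepsilon \| h(\cdot, \tau) e^{S_\varepsilon \tau} \|_\infty + r_\varepsilon e^{S_\varepsilon \tau} \right).$$
Hence, if $H_\varepsilon(t) = \| h(\cdot, t) e^{S_\varepsilon t} \|_\infty$,
$$H_\varepsilon(t) \le H_\varepsilon(0) + \int_0^t r_\varepsilon e^{S_\varepsilon \tau} \, d\tau + \int_0^t S_\varepsilon H_\varepsilon(\tau) \, d\tau.$$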
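Now, by the generalized Gronwall inequality,
$$u(t) \le \varphi(t) + \int_0^t \lambda(\tau) u(\tau) \, d\tau$$
implies
$$u(t) \le \varphi(0) \exp\left\{ \int_0^t \lambda(\tau) \, d\tau \right\} + \int_0^t \exp\left\{ \int_\tau^t \lambda(s) \, ds \right\} \frac{d\varphi}{d\tau} \, d\tau.$$
Applying this inequality with $\lambda(\tau) = S_\varepsilon$ and $\varphi(t) = H_\varepsilon(0) + \int_0^t r_\varepsilon e^{S_\varepsilon \tau} \, d\tau$,
we obtain
$$H_\varepsilon(t) \le H_\varepsilon(0) e^{S_\varepsilon t} + t \, r_\varepsilon e^{S_\varepsilon t},$$
namely
$$\| h(\cdot, t) \|_\infty \le \| h(\cdot, 0) \|_\infty + r_\varepsilon t.$$
Letting $\varepsilon$ go to 0, we obtain $\| h(\cdot, t) \|_\infty \le \| h(\cdot, 0) \|_\infty$, i.e.
$$d_2(F(t), G(t)) \le d_2(F(0), G(0)).$$
□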
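The splitting at $|1 - \nu| = \varepsilon$ used in this proof trades a diverging cut-off mass $S_\varepsilon$ against a vanishing grazing remainder. The sketch below illustrates the two scalings under simplifying assumptions made here: the cross-section is taken to be exactly $\sigma(\nu) = (1 - \nu)^{-5/4}$ (the paper only prescribes this form near $\nu = 1$), the azimuthal factor $2\pi$ and the constant $C$ are dropped, and $u = 1 - \nu$ is used as integration variable. Function names are hypothetical.

```python
def cutoff_mass_exact(eps):
    """Integral of u^(-5/4) over [eps, 2]: the model cut-off mass S_eps."""
    return 4.0 * (eps ** -0.25 - 2.0 ** -0.25)

def cutoff_mass_midpoint(eps, steps=200_000):
    """Midpoint-rule approximation of the same integral, as a cross-check."""
    h = (2.0 - eps) / steps
    return h * sum((eps + (k + 0.5) * h) ** -1.25 for k in range(steps))

def grazing_bound(eps):
    """Integral of u^(-5/4) * u^(1/2) = u^(-3/4) over [0, eps]: model bound for r_eps."""
    return 4.0 * eps ** 0.25

# S_eps diverges like eps^(-1/4) while the grazing bound vanishes like eps^(1/4).
table = [(eps, cutoff_mass_exact(eps), grazing_bound(eps))
         for eps in (1e-2, 1e-4, 1e-8)]
```

Only the product $r_\varepsilon t$ survives in the final estimate, because the factor $e^{S_\varepsilon t}$ cancels when passing from $H_\varepsilon$ back to $\|h\|_\infty$. This is why the divergence of $S_\varepsilon$ is harmless.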
References
[1] N.M. Blachman. The convolution inequality for entropy powers, IEEE Trans.
Inform. Theory, 2:267–271, 1965.
[2] A.V. Bobylev, G. Toscani On the generalization of the Boltzmann H-theorem
for a spatially homogeneous Maxwell gas J. Math. Phys, 33:2578–2586, 1992.
[3] E.A. Carlen, E. Gabetta and G. Toscani. Propagation of smoothness and the
rate of exponential convergence to equilibrium for a spatially homogeneous
Maxwellian gas, Commun. Math. Phys. (to appear), 1997.
[4] E.A. Carlen, M.C. Carvalho and E. Gabetta. Central limit theorem for
Maxwellian molecules and truncation of the Wild expansion, Preprint, 1997.
[5] I. Csiszar. Information-type measures of difference of probability distributions
and indirect observations, Stud. Sci. Math. Hung., 2:299–318, 1967.
[6] E. Gabetta, G. Toscani and B. Wennberg. Metrics for probability distributions
and the trend to equilibrium for solutions of the Boltzmann equation, J. Stat.
Phys., 81 :901–934, 1995.
[7] L. Kantorovich. On translation of mass(in Russian), Dokl. AN SSSR, 37 :227–
229, 1942.
[8] H.P. McKean, Jr. Speed of approach to equilibrium for Kac’s caricature of a
Maxwellian gas, Arch. Rat. Mech. Anal., 21:343–367, 1966.
[9] H. Murata, H. Tanaka. An inequality for certain functional of multidimensional
probability distributions, Hiroshima Math. J., 4:75–81, 1974.
[10] R. Jordan, D. Kinderlehrer, F. Otto. The variational formulation of the Fokker-Planck equation, To appear in SIAM J. Appl. Math. Anal.
[11] Yu.V. Prokhorov. Convergence of random processes and limit theorems in
probability theory, Theory Prob. Appl., 1 :157–214, 1956.
[12] A. Pulvirenti, G. Toscani. The theory of the nonlinear Boltzmann equation for
Maxwell molecules in Fourier representation, Ann. Mat. pura ed appl., 4, Vol.
171:181–204, 1996.
[13] A. Stam. Some inequalities satisfied by the quantities of information of Fisher
and Shannon, Inform. Control, 2:101–112, (1959)
[14] V. Strassen. The existence of probability measures with given marginals, Ann.
Math. Statist., 36 :423–439, 1965.
[15] H. Tanaka. An inequality for a functional of probability distributions and its
application to Kac’s one-dimensional model of a Maxwellian gas, Wahrsch.
Verw. Geb., 27 :47–52, 1973.
[16] H. Tanaka. Probabilistic treatment of the Boltzmann equation of Maxwellian
molecules, Wahrsch. Verw. Geb., 46 :67–105, 1978.
[17] I. Vajda. Theory of statistical Inference and Information, Kluwer Academic
Publishers, Dordrecht 1989
[18] L.N. Vasershtein. Markov processes on countable product space describing
large systems of automata (in Russian), Probl. Pered. Inform., 5:64–73, 1969.
[19] V.M. Zolotarev. Probability metrics, Theory Prob. Appl., 28 :278–302, 1983.
Department of Mathematics, University of Pavia, via Abbiategrasso 209,
27100 Pavia, ITALY
École Normale Supérieure, DMI, 45 rue d'Ulm, 75230 Paris Cedex 05