Log-concave distributions: definitions, properties, and consequences
Jon A. Wellner
University of Washington, Seattle; visiting Heidelberg
Seminaire Point de vue, Université Paris-Diderot Paris 7
23 January 2012
Seminaire Point de vue, Paris
Part 1
Based on joint work with:
• Fadoua Balabdaoui
• Kaspar Rufibach
• Arseni Seregin
Outline, Part 1
• 1: Log-concave densities / distributions: definitions
• 2: Properties of the class
• 3: Some consequences (statistics and probability)
• 4: Strong log-concavity: definitions
• 5: Examples & counterexamples
• 6: Some consequences, strong log-concavity
• 7: Questions & problems
Seminaire Point de vue, Paris; 23 January 2012; part 1
1. Log-concave densities / distributions: definitions
Suppose that a density $f$ can be written as
\[ f(x) \equiv f_\varphi(x) = \exp(\varphi(x)) = \exp(-(-\varphi(x))) \]
where $\varphi$ is concave (and $-\varphi$ is convex). The class of all densities $f$ on $\mathbb{R}$, or on $\mathbb{R}^d$, of this form is called the class of log-concave densities, $\mathcal{P}_{\text{log-concave}} \equiv \mathcal{P}_0$.
Note that $f$ is log-concave if and only if:
• $\log f(\lambda x + (1-\lambda) y) \ge \lambda \log f(x) + (1-\lambda) \log f(y)$ for all $0 \le \lambda \le 1$ and for all $x, y$;
• iff $f(\lambda x + (1-\lambda) y) \ge f(x)^{\lambda} \cdot f(y)^{1-\lambda}$;
• iff $f((x+y)/2) \ge \sqrt{f(x) f(y)}$ (assuming $f$ is measurable);
• iff $f((x+y)/2)^2 \ge f(x) f(y)$.
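The midpoint characterization above suggests a simple numerical sanity check. The following is a minimal sketch, not part of the talk: the function name `is_log_concave_on_grid`, the grid, and the tolerance are all assumptions chosen for illustration. It tests the midpoint inequality on an equally spaced grid, so it can only refute log-concavity, never prove it. The Cauchy density is used as a standard non-example.

```python
import math

def is_log_concave_on_grid(log_f, lo, hi, n=2001, tol=1e-9):
    """Check 2*log f(x_i) >= log f(x_{i-1}) + log f(x_{i+1}) on an equally spaced grid.

    This is the midpoint inequality restricted to neighbouring grid points,
    so a True result is only a necessary condition for log-concavity.
    """
    h = (hi - lo) / (n - 1)
    vals = [log_f(lo + i * h) for i in range(n)]
    return all(2.0 * vals[i] >= vals[i - 1] + vals[i + 1] - tol
               for i in range(1, n - 1))

def log_normal(x):
    # log of the standard normal density
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def log_laplace(x):
    # log of the Laplace density exp(-|x|) / 2
    return -abs(x) - math.log(2.0)

def log_cauchy(x):
    # log of the standard Cauchy density; convex for |x| > 1, so not log-concave
    return -math.log(math.pi) - math.log(1.0 + x * x)

if __name__ == "__main__":
    for name, g in [("normal", log_normal), ("Laplace", log_laplace), ("Cauchy", log_cauchy)]:
        print(name, is_log_concave_on_grid(g, -10.0, 10.0))
```

Working with $\log f$ rather than $f$ avoids underflow in the tails, which is why the helpers return log-densities.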
Examples, $\mathbb{R}$
• Example 1: standard normal
\[ f(x) = (2\pi)^{-1/2} \exp(-x^2/2), \]
\[ -\log f(x) = \tfrac{1}{2} x^2 + \log\sqrt{2\pi}, \]
\[ (-\log f)''(x) = 1. \]
• Example 2: Laplace
\[ f(x) = 2^{-1} \exp(-|x|), \]
\[ -\log f(x) = |x| + \log 2, \]
\[ (-\log f)''(x) = 0 \quad \text{for all } x \ne 0. \]
• Example 3: Logistic
\[ f(x) = \frac{e^x}{(1+e^x)^2}, \]
\[ -\log f(x) = -x + 2\log(1+e^x), \]
\[ (-\log f)''(x) = \frac{2e^x}{(1+e^x)^2} = 2 f(x). \]
• Example 4: Subbotin
\[ f(x) = C_r^{-1} \exp(-|x|^r / r), \qquad C_r = 2\Gamma(1/r)\, r^{1/r - 1}, \]
\[ -\log f(x) = r^{-1} |x|^r + \log C_r, \]
\[ (-\log f)''(x) = (r-1)|x|^{r-2}, \qquad r \ge 1,\ x \ne 0. \]
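The second derivatives of $-\log f$ in Examples 1 to 4 can be cross-checked with central finite differences. This is a rough sketch under stated assumptions: the helper names, the step size `h`, and the evaluation points are illustrative choices, not taken from the talk. For the logistic density the finite difference agrees with $2e^x/(1+e^x)^2$.

```python
import math

def second_derivative(g, x, h=1e-3):
    """Central finite-difference approximation of g''(x)."""
    return (g(x + h) - 2.0 * g(x) + g(x - h)) / (h * h)

def neg_log_normal(x):
    return 0.5 * x * x + 0.5 * math.log(2.0 * math.pi)

def neg_log_laplace(x):
    return abs(x) + math.log(2.0)

def logistic_density(x):
    return math.exp(x) / (1.0 + math.exp(x)) ** 2

def neg_log_logistic(x):
    return -x + 2.0 * math.log(1.0 + math.exp(x))

def neg_log_subbotin(x, r):
    # C_r = 2 * Gamma(1/r) * r**(1/r - 1) is the normalizing constant
    c_r = 2.0 * math.gamma(1.0 / r) * r ** (1.0 / r - 1.0)
    return abs(x) ** r / r + math.log(c_r)

if __name__ == "__main__":
    x = 0.7
    print("normal:  ", second_derivative(neg_log_normal, x))    # close to 1
    print("Laplace: ", second_derivative(neg_log_laplace, x))   # close to 0 away from x = 0
    print("logistic:", second_derivative(neg_log_logistic, x), 2.0 * logistic_density(x))
    r, x4 = 3.0, 1.5
    print("Subbotin:", second_derivative(lambda u: neg_log_subbotin(u, r), x4),
          (r - 1.0) * abs(x4) ** (r - 2.0))
```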
• Many univariate parametric families on $\mathbb{R}$ are log-concave, for example:
  ▷ Normal$(\mu, \sigma^2)$
  ▷ Uniform$(a, b)$
  ▷ Gamma$(r, \lambda)$ for $r \ge 1$
  ▷ Beta$(a, b)$ for $a, b \ge 1$
  ▷ Subbotin$(r)$ with $r \ge 1$.
• $t_r$ densities with $r > 0$ are not log-concave.
• Tails of log-concave densities are necessarily sub-exponential: i.e. if $X \sim f \in PF_2$, then $E\exp(c|X|) < \infty$ for some $c > 0$.
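Log-concave densities on $\mathbb{R}^d$:
• A density $f$ on $\mathbb{R}^d$ is log-concave if $f(x) = \exp(\varphi(x))$ with $\varphi$ concave.
• Examples
  ▷ The density $f$ of $X \sim N_d(\mu, \Sigma)$ with $\Sigma$ positive definite:
  \[ f(x) = f(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\Big( -\tfrac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \Big), \]
  \[ -\log f(x) = \tfrac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) + \tfrac{1}{2} \log\big( (2\pi)^d |\Sigma| \big), \]
  \[ D^2(-\log f)(x) \equiv \Big( \frac{\partial^2}{\partial x_i \partial x_j} (-\log f)(x),\ i, j = 1, \ldots, d \Big) = \Sigma^{-1}. \]
  ▷ If $K \subset \mathbb{R}^d$ is compact and convex, then $f(x) = 1_K(x)/\lambda(K)$ is a log-concave density.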
Log-concave measures:
Suppose that $P$ is a probability measure on $(\mathbb{R}^d, \mathcal{B}_d)$. $P$ is a log-concave measure if for all nonempty $A, B \in \mathcal{B}_d$ and $\lambda \in (0, 1)$ we have
\[ P(\lambda A + (1-\lambda) B) \ge \{P(A)\}^{\lambda} \{P(B)\}^{1-\lambda}. \]
• A set $A \subset \mathbb{R}^d$ is affine if $tx + (1-t)y \in A$ for all $x, y \in A$, $t \in \mathbb{R}$.
• The affine hull of a set $A \subset \mathbb{R}^d$ is the smallest affine set containing $A$.
Theorem. (Prékopa (1971, 1973), Rinott (1976)). Suppose $P$ is a probability measure on $\mathcal{B}_d$ such that the affine hull of $\mathrm{supp}(P)$ has dimension $d$. Then $P$ is log-concave if and only if there is a log-concave (density) function $f$ on $\mathbb{R}^d$ such that
\[ P(B) = \int_B f(x)\,dx \quad \text{for all } B \in \mathcal{B}_d. \]
2. Properties of log-concave densities
Properties: log-concave densities on R:
• A density f on R is log-concave if and only if its convolution
with any unimodal density is again unimodal (Ibragimov,
1956).
• Every log-concave density f is unimodal (but need not be
symmetric).
• $\mathcal{P}_0$ is closed under convolution.
• $\mathcal{P}_0$ is closed under weak limits.
Properties: log-concave densities on $\mathbb{R}^d$:
• Any log-concave $f$ is unimodal.
• The level sets of $f$ are closed convex sets.
• Log-concave densities correspond to log-concave measures. Prékopa, Rinott.
• Marginals of log-concave distributions are log-concave: if $f(x, y)$ is a log-concave density on $\mathbb{R}^{m+n}$, then
\[ g(x) = \int_{\mathbb{R}^n} f(x, y)\,dy \]
is a log-concave density on $\mathbb{R}^m$. Prékopa, Brascamp-Lieb.
• Products of log-concave densities are log-concave.
• $\mathcal{P}_0$ is closed under convolution.
• $\mathcal{P}_0$ is closed under weak limits.
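Closure under convolution can be illustrated numerically in one dimension. The sketch below is only an illustration under assumptions of my own choosing (grid spacing `H`, truncation of the Laplace density to $[-8, 8]$, a Riemann-sum convolution): it convolves a Laplace density with a Uniform$(-1,1)$ density on a grid and checks that the second differences of the log of the result are non-positive. A bimodal normal mixture serves as a negative control.

```python
import math

H = 0.05  # grid spacing (an arbitrary choice for this illustration)

def convolve(f_vals, g_vals, h):
    """Riemann-sum approximation of the convolution of two gridded densities."""
    out = [0.0] * (len(f_vals) + len(g_vals) - 1)
    for i, fi in enumerate(f_vals):
        for j, gj in enumerate(g_vals):
            out[i + j] += fi * gj * h
    return out

def max_second_difference_of_log(vals):
    """Largest log v[i-1] - 2 log v[i] + log v[i+1]; a value <= 0 means log-concave on the grid."""
    logs = [math.log(v) for v in vals]
    return max(logs[i - 1] - 2.0 * logs[i] + logs[i + 1]
               for i in range(1, len(logs) - 1))

# Laplace density on [-8, 8] and Uniform(-1, 1) density on [-1, 1], both log-concave.
laplace = [0.5 * math.exp(-abs(i * H)) for i in range(-160, 161)]
uniform = [0.5 for _ in range(-20, 21)]
conv = convolve(laplace, uniform, H)

# Negative control: an equal mixture of N(-3, 1) and N(3, 1) is bimodal, hence not log-concave.
mixture = [0.5 * (math.exp(-0.5 * (i * H - 3.0) ** 2) + math.exp(-0.5 * (i * H + 3.0) ** 2))
           / math.sqrt(2.0 * math.pi) for i in range(-160, 161)]

if __name__ == "__main__":
    print("convolution:", max_second_difference_of_log(conv))     # non-positive up to rounding
    print("mixture:    ", max_second_difference_of_log(mixture))  # strictly positive
```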
3. Some consequences and connections (statistics and probability)
• (a) $f$ is log-concave if and only if $\det\big( (f(x_i - y_j))_{i,j \in \{1,2\}} \big) \ge 0$ for all $x_1 \le x_2$, $y_1 \le y_2$; i.e. $f$ is a Pólya frequency density of order 2; thus
log-concave = $PF_2$ = strongly unimodal.
• (b) The densities $p_\theta(x) \equiv f(x - \theta)$ for $\theta \in \mathbb{R}$ have monotone likelihood ratio (in $x$) if and only if $f$ is log-concave.
Proof of (b): $p_\theta(x) = f(x - \theta)$ has MLR iff
\[ \frac{f(x - \theta')}{f(x - \theta)} \le \frac{f(x' - \theta')}{f(x' - \theta)} \quad \text{for all } x < x',\ \theta < \theta'. \]
This holds if and only if
\[ \log f(x - \theta') + \log f(x' - \theta) \le \log f(x' - \theta') + \log f(x - \theta). \tag{1} \]
Let $t = (x' - x)/(x' - x + \theta' - \theta)$ and note that
\[ x - \theta = t(x - \theta') + (1-t)(x' - \theta), \]
\[ x' - \theta' = (1-t)(x - \theta') + t(x' - \theta). \]
Hence log-concavity of $f$ implies that
\[ \log f(x - \theta) \ge t \log f(x - \theta') + (1-t) \log f(x' - \theta), \]
\[ \log f(x' - \theta') \ge (1-t) \log f(x - \theta') + t \log f(x' - \theta). \]
Adding these yields (1); i.e. $f$ log-concave implies $p_\theta(x)$ has MLR in $x$.
Now suppose that $p_\theta(x)$ has MLR so that (1) holds. In particular that holds if $x, x', \theta, \theta'$ satisfy $x - \theta' = a < b = x' - \theta$ and $t = (x' - x)/(x' - x + \theta' - \theta) = 1/2$, so that $x - \theta = (a+b)/2 = x' - \theta'$. Then (1) becomes
\[ \log f(a) + \log f(b) \le 2 \log f((a+b)/2). \]
This together with measurability of $f$ implies that $f$ is log-concave.
Proof of (a): Suppose $f$ is $PF_2$. Then for $x < x'$, $y < y'$,
\[ \det \begin{pmatrix} f(x - y) & f(x - y') \\ f(x' - y) & f(x' - y') \end{pmatrix} = f(x - y) f(x' - y') - f(x - y') f(x' - y) \ge 0 \]
if and only if
\[ f(x - y') f(x' - y) \le f(x - y) f(x' - y'), \]
or, if and only if
\[ \frac{f(x - y')}{f(x - y)} \le \frac{f(x' - y')}{f(x' - y)}. \]
That is, $p_y(x)$ has MLR in $x$. By (b) this is equivalent to $f$ log-concave.
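The $PF_2$ determinant condition in (a) is easy to probe on a small grid. The following sketch is illustrative only: the function names, the integer grid, and the choice of test densities (logistic as a log-concave example, Cauchy as a non-example) are assumptions made here. A negative minimum refutes $PF_2$; a non-negative minimum on a finite grid is merely consistent with it.

```python
import math

def pf2_determinant(f, x1, x2, y1, y2):
    """Determinant of the 2x2 matrix (f(x_i - y_j)) for x1 <= x2, y1 <= y2."""
    return f(x1 - y1) * f(x2 - y2) - f(x1 - y2) * f(x2 - y1)

def min_pf2_determinant(f, grid):
    """Minimum of the PF_2 determinant over all ordered pairs taken from the grid."""
    pairs = [(a, b) for a in grid for b in grid if a <= b]
    return min(pf2_determinant(f, x1, x2, y1, y2)
               for (x1, x2) in pairs for (y1, y2) in pairs)

def logistic_density(x):
    return math.exp(x) / (1.0 + math.exp(x)) ** 2

def cauchy_density(x):
    return 1.0 / (math.pi * (1.0 + x * x))

GRID = [float(k) for k in range(-4, 5)]

if __name__ == "__main__":
    print("logistic:", min_pf2_determinant(logistic_density, GRID))  # non-negative up to rounding
    print("Cauchy:  ", min_pf2_determinant(cauchy_density, GRID))    # negative
```

For the Cauchy density one violating configuration is $x_1 = 2$, $x_2 = 4$, $y_1 = -4$, $y_2 = -2$, where the determinant reduces to $f(6)^2 - f(4) f(8) < 0$.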
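Theorem. (Brascamp-Lieb, 1976). Suppose $X \sim f = e^{-\varphi}$ with $\varphi$ convex and $D^2\varphi > 0$, and let $g \in C^1(\mathbb{R}^d)$. Then
\[ \mathrm{Var}_f(g(X)) \le E\big\langle (D^2\varphi)^{-1} \nabla g(X), \nabla g(X) \big\rangle. \]
(Poincaré-type inequality for log-concave densities)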
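As a quick sanity check of the Brascamp-Lieb inequality, consider the one-dimensional Gaussian case. This worked example is an illustration added here, assuming $X \sim N(0, \sigma^2)$ so that $\varphi(x) = x^2/(2\sigma^2) + \text{const}$ and $\varphi''(x) = 1/\sigma^2$:

```latex
% g(x) = x: both sides coincide, so the bound is attained.
\mathrm{Var}(X) = \sigma^2
\quad\text{and}\quad
E\big[(\varphi''(X))^{-1} g'(X)^2\big] = E[\sigma^2 \cdot 1] = \sigma^2 .

% g(x) = x^2: the bound holds with room to spare.
\mathrm{Var}(X^2) = E X^4 - (E X^2)^2 = 3\sigma^4 - \sigma^4 = 2\sigma^4
\;\le\;
E\big[\sigma^2 (2X)^2\big] = 4\sigma^4 .
```

Linear functions of a Gaussian variable therefore give equality, while the quadratic case shows the inequality can be strict.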
Further consequences: Peakedness and majorization
Theorem 1. (Proschan, 1965) Suppose that $f$ on $\mathbb{R}$ is log-concave and symmetric about 0. Let $X_1, \ldots, X_n$ be i.i.d. with density $f$, and suppose that $p, p' \in \mathbb{R}^n_+$ satisfy
• $p_1 \ge p_2 \ge \cdots \ge p_n$, $p'_1 \ge p'_2 \ge \cdots \ge p'_n$,
• $\sum_{j=1}^k p'_j \le \sum_{j=1}^k p_j$, $k \in \{1, \ldots, n\}$,
• $\sum_{j=1}^n p_j = \sum_{j=1}^n p'_j = 1$.
(That is, $p' \prec p$.) Then $\sum_{j=1}^n p'_j X_j$ is strictly more peaked than $\sum_{j=1}^n p_j X_j$:
\[ P\Big( \Big| \sum_{j=1}^n p'_j X_j \Big| \ge t \Big) < P\Big( \Big| \sum_{j=1}^n p_j X_j \Big| \ge t \Big) \quad \text{for all } t \ge 0. \]
Example: $p_1 = \cdots = p_{n-1} = 1/(n-1)$, $p_n = 0$, while $p'_1 = \cdots = p'_n = 1/n$. Then $p \succ p'$ (since $\sum_{j=1}^n p_j = \sum_{j=1}^n p'_j = 1$ and $\sum_{j=1}^k p_j = k/(n-1) \ge k/n = \sum_{j=1}^k p'_j$ for $k \le n-1$), and hence if $X_1, \ldots, X_n$ are i.i.d. $f$ symmetric and log-concave,
\[ P(|\overline{X}_n| \ge t) < P(|\overline{X}_{n-1}| \ge t) < \cdots < P(|X_1| \ge t) \quad \text{for all } t \ge 0. \]
Definition: A $d$-dimensional random variable $X$ is said to be more peaked than a random variable $Y$ if both $X$ and $Y$ have densities and
\[ P(X \in A) \ge P(Y \in A) \quad \text{for all } A \in \mathcal{A}_d, \]
the class of subsets of $\mathbb{R}^d$ which are compact, convex, and symmetric about the origin.
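The majorization conditions and the sample-mean example can be checked concretely. The sketch below is an illustration only: the helper names are assumptions, and the tail probabilities use the standard normal density (symmetric and log-concave), for which the sample mean of $n$ observations is exactly $N(0, 1/n)$, so no simulation is needed.

```python
import math

def is_majorized_by(p_prime, p, tol=1e-12):
    """Return True if p_prime is majorized by p (p_prime ≺ p).

    Both vectors are sorted in decreasing order; the partial sums of p_prime
    must not exceed those of p, and the totals must agree.
    """
    a = sorted(p_prime, reverse=True)
    b = sorted(p, reverse=True)
    if len(a) != len(b) or abs(sum(a) - sum(b)) > tol:
        return False
    sa = sb = 0.0
    for x, y in zip(a, b):
        sa += x
        sb += y
        if sa > sb + tol:
            return False
    return True

def normal_mean_tail(n, t):
    """P(|mean of n i.i.d. N(0,1)| >= t), which equals erfc(t * sqrt(n / 2))."""
    return math.erfc(t * math.sqrt(n / 2.0))

if __name__ == "__main__":
    n = 5
    p = [1.0 / (n - 1)] * (n - 1) + [0.0]   # weights of the mean of the first n-1 observations
    p_prime = [1.0 / n] * n                 # weights of the mean of all n observations
    print(is_majorized_by(p_prime, p))      # True
    print([round(normal_mean_tail(k, 0.5), 4) for k in range(1, 7)])  # decreasing in k
```

The tail probabilities at a fixed $t > 0$ decrease strictly as $n$ grows, matching the chain of inequalities in the example.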
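Theorem 2. (Olkin and Tong, 1988) Suppose that $f$ on $\mathbb{R}^d$ is log-concave and symmetric about 0. Let $X_1, \ldots, X_n$ be i.i.d. with density $f$, and suppose that $a, b \in \mathbb{R}^n$ satisfy
• $a_1 \ge a_2 \ge \cdots \ge a_n$, $b_1 \ge b_2 \ge \cdots \ge b_n$,
• $\sum_{j=1}^k a_j \le \sum_{j=1}^k b_j$, $k \in \{1, \ldots, n\}$,
• $\sum_{j=1}^n a_j = \sum_{j=1}^n b_j$.
(That is, $a \prec b$.)
Then $\sum_{j=1}^n a_j X_j$ is more peaked than $\sum_{j=1}^n b_j X_j$:
\[ P\Big( \sum_{j=1}^n a_j X_j \in A \Big) \ge P\Big( \sum_{j=1}^n b_j X_j \in A \Big) \quad \text{for all } A \in \mathcal{A}_d. \]
In particular,
\[ P\Big( \Big\| \sum_{j=1}^n a_j X_j \Big\| \ge t \Big) \le P\Big( \Big\| \sum_{j=1}^n b_j X_j \Big\| \ge t \Big) \quad \text{for all } t \ge 0. \]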
Corollary: If $g$ is non-decreasing on $\mathbb{R}^+$ with $g(0) = 0$, then
\[ E g\Big( \Big\| \sum_{j=1}^n a_j X_j \Big\| \Big) \le E g\Big( \Big\| \sum_{j=1}^n b_j X_j \Big\| \Big). \]
Another peakedness result:
Suppose that $Y = (Y_1, \ldots, Y_n)$ where $Y_j \sim N(\mu_j, \sigma^2)$ are independent and $\mu_1 \le \ldots \le \mu_n$; i.e. $\mu \in K_n$ where $K_n \equiv \{x \in \mathbb{R}^n : x_1 \le \cdots \le x_n\}$. Let
\[ \hat{\mu}_n = \Pi(Y \mid K_n), \]
the least squares projection of $Y$ onto $K_n$. It is well-known that
\[ \hat{\mu}_n = \Big( \min_{s \ge i} \max_{r \le i} \frac{\sum_{j=r}^s Y_j}{s - r + 1},\ i = 1, \ldots, n \Big). \]
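The min-max formula for the projection onto $K_n$ can be evaluated directly and compared with the pool-adjacent-violators algorithm (PAVA), a standard way to compute the same least squares projection. Both functions below are a minimal sketch with names chosen here for illustration; the brute-force version takes $O(n^3)$ time and is meant only to mirror the formula.

```python
def isotonic_minmax(y):
    """mu_hat_i = min over s >= i of max over r <= i of mean(y[r..s]) (0-based, inclusive)."""
    n = len(y)
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)

    def avg(r, s):
        return (prefix[s + 1] - prefix[r]) / (s - r + 1)

    return [min(max(avg(r, s) for r in range(i + 1)) for s in range(i, n))
            for i in range(n)]

def isotonic_pava(y):
    """Pool-adjacent-violators algorithm for the nondecreasing least squares fit."""
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # Merge the last two blocks while their means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

if __name__ == "__main__":
    y = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]
    print(isotonic_minmax(y))  # [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
    print(isotonic_pava(y))    # [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```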
Theorem 3. (Kelly) If $Y \sim N_n(\mu, \sigma^2 I)$ and $\mu \in K_n$, then $\hat{\mu}_k - \mu_k$ is more peaked than $Y_k - \mu_k$ for each $k \in \{1, \ldots, n\}$; that is
\[ P(|\hat{\mu}_k - \mu_k| \le t) \ge P(|Y_k - \mu_k| \le t) \quad \text{for all } t > 0,\ k \in \{1, \ldots, n\}. \]
Question: Does Kelly’s theorem continue to hold if the normal distribution is replaced by an arbitrary log-concave joint density symmetric about $\mu$?
4. Strong log-concavity: definitions
Definition 1. A density $f$ on $\mathbb{R}$ is strongly log-concave if
\[ f(x) = h(x)\, c\, \phi(cx) \quad \text{for some } c > 0 \]
where $h$ is log-concave and $\phi(x) = (2\pi)^{-1/2} \exp(-x^2/2)$.
Sufficient condition: $\log f \in C^2(\mathbb{R})$ with $(-\log f)''(x) \ge c^2 > 0$ for all $x$.
Definition 2. A density $f$ on $\mathbb{R}^d$ is strongly log-concave if
\[ f(x) = h(x)\, c\, \gamma(cx) \quad \text{for some } c > 0 \]
where $h$ is log-concave and $\gamma$ is the $N_d(0, cI_d)$ density.
Sufficient condition: $\log f \in C^2(\mathbb{R}^d)$ with $D^2(-\log f)(x) \ge c^2 I_d$ for some $c > 0$ for all $x \in \mathbb{R}^d$.
These agree with strong convexity as defined by Rockafellar & Wets (1998), p. 565.
5. Examples & counterexamples
Examples
Example 1. $f(x) = h(x)\phi(x) / \int h\phi\,dx$ where $h$ is the logistic density, $h(x) = e^x/(1+e^x)^2$.
Example 2. $f(x) = h(x)\phi(x) / \int h\phi\,dx$ where $h$ is the Gumbel density, $h(x) = \exp(x - e^x)$.
Example 3. $f(x) = h(x)h(-x) / \int h(y)h(-y)\,dy$ where $h$ is the Gumbel density.
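For each of the three examples, the sufficient condition $(-\log f)'' \ge c^2 > 0$ can be verified by direct differentiation. The following worked derivation is added here as an illustration; normalizing constants are absorbed into "const" since they do not affect second derivatives:

```latex
% Example 1: logistic h times the standard normal density \phi
-\log f(x) = \tfrac{1}{2}x^2 - x + 2\log(1+e^x) + \text{const},
\qquad
(-\log f)''(x) = 1 + \frac{2e^x}{(1+e^x)^2} \ge 1 .

% Example 2: Gumbel h times the standard normal density \phi
-\log f(x) = \tfrac{1}{2}x^2 - x + e^x + \text{const},
\qquad
(-\log f)''(x) = 1 + e^x \ge 1 .

% Example 3: h(x)h(-x) = \exp(x - e^x)\exp(-x - e^{-x}) = \exp(-2\cosh x)
-\log f(x) = 2\cosh x + \text{const},
\qquad
(-\log f)''(x) = 2\cosh x \ge 2 .
```

So the sufficient condition holds with $c^2 = 1$ in Examples 1 and 2 and with $c^2 = 2$ in Example 3.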
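Counterexamples
Counterexample 1. $f$ logistic: $f(x) = e^x/(1+e^x)^2$; $(-\log f)''(x) = 2f(x)$, which tends to 0 as $|x| \to \infty$.
Counterexample 2. $f$ Subbotin, $r \in [1, 2) \cup (2, \infty)$; $f(x) = C_r^{-1} \exp(-|x|^r/r)$; $(-\log f)''(x) = (r-1)|x|^{r-2}$, which is not bounded below by a positive constant when $r \ne 2$.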
[Figure] Ex. 1: Logistic (red) perturbation of $N(0, 1)$ (green): $f$ (blue)
[Figure] Ex. 1: $(-\log f)''$, Logistic perturbation of $N(0, 1)$
[Figure] Ex. 2: Gumbel (red) perturbation of $N(0, 1)$ (green): $f$ (blue)
[Figure] Ex. 2: $(-\log f)''$, Gumbel perturbation of $N(0, 1)$
[Figure] Ex. 3: Gumbel$(\cdot)$ × Gumbel$(-\cdot)$ (purple); $N(0, V_f)$ (blue)
[Figure] Ex. 3: $-\log$ Gumbel$(\cdot)$ × Gumbel$(-\cdot)$ (purple); $-\log N(0, V_f)$ (blue)
[Figure] Ex. 3: $D^2(-\log$ Gumbel$(\cdot)$ × Gumbel$(-\cdot))$ (purple); $D^2(-\log N(0, V_f))$ (blue)
[Figure] Subbotin $f_r$: $r = 1$ (blue), $r = 1.5$ (red), $r = 2$ (green), $r = 3$ (purple)
[Figure] $-\log f_r$: $r = 1$ (blue), $r = 1.5$ (red), $r = 2$ (green), $r = 3$ (purple)
[Figure] $(-\log f_r)''$: $r = 1$ (blue), $r = 1.5$ (red), $r = 2$ (green), $r = 3$ (purple)
6. Some consequences, strong log-concavity
First consequence
Theorem. (Hargé, 2004). Suppose $X \sim N_n(\mu, \Sigma)$ with density $\gamma$ and $Y$ has density $h \cdot \gamma$ with $h$ log-concave, and let $g : \mathbb{R}^n \to \mathbb{R}$ be convex. Then
\[ E g(Y - E(Y)) \le E g(X - EX). \]
Equivalently, with $\mu = EX$, $\nu = EY = E(X h(X))/E h(X)$, and $\tilde{g} \equiv g(\cdot - \mu)$,
\[ E\{ \tilde{g}(X - \nu + \mu) h(X) \} \le E \tilde{g}(X) \cdot E h(X). \]
More consequences
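Corollary. (Brascamp-Lieb, 1976). Suppose $X \sim f = \exp(-\varphi)$ with $D^2\varphi \ge \lambda I_d$, $\lambda > 0$, and let $g \in C^1(\mathbb{R}^d)$. Then
\[ \mathrm{Var}_f(g(X)) \le E\big\langle (D^2\varphi)^{-1} \nabla g(X), \nabla g(X) \big\rangle \le \frac{1}{\lambda} E|\nabla g(X)|^2. \]
(Poincaré inequality for strongly log-concave densities; improvements by Hargé (2008))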
Theorem. (Caffarelli, 2002). Suppose $X \sim N_d(0, I)$ with density $\gamma_d$ and $Y$ has density $e^{-v} \cdot \gamma_d$ with $v$ convex. Let $T = \nabla\varphi$ be the unique gradient of a convex map $\varphi$ such that $\nabla\varphi(X) \stackrel{d}{=} Y$. Then
\[ 0 \le D^2\varphi \le I_d. \]
(cf. Villani (2003), pages 290-291)
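A one-dimensional Gaussian case makes the conclusion of Caffarelli's theorem concrete. This worked example is an illustration added here, assuming $d = 1$ and $Y \sim N(0, \sigma^2)$ with $0 < \sigma \le 1$:

```latex
% The density of Y relative to \gamma_1 is e^{-v} with
v(x) = \tfrac{1}{2}\Big(\frac{1}{\sigma^2} - 1\Big) x^2 + \log\sigma ,
\qquad
v''(x) = \frac{1}{\sigma^2} - 1 \ge 0 \iff \sigma \le 1 .

% The monotone map pushing N(0,1) forward to N(0,\sigma^2) is linear:
T(x) = \sigma x = \varphi'(x), \qquad \varphi(x) = \tfrac{1}{2}\sigma x^2 ,
\qquad
0 \le \varphi''(x) = \sigma \le 1 .
```

The map $T$ is a contraction exactly when $\sigma \le 1$, which is also exactly when $v$ is convex.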
7. Questions & problems
• Does strong log-concavity occur naturally? Are there natural
examples?
• Are there large classes of strongly log-concave densities in
connection with other known classes such as P F∞ (Pólya
frequency functions of order infinity) or L. Bondesson’s class
HM∞ of completely hyperbolically monotone densities?
• Does Kelly’s peakedness result for projection onto the
ordered cone Kn continue to hold with Gaussian replaced
by log-concave (or symmetric log concave)?
Selected references:
• Villani, C. (2003). Topics in Optimal Transportation. Amer. Math. Soc., Providence.
• Dharmadhikari, S. and Joag-dev, K. (1988). Unimodality, Convexity, and Applications. Academic Press.
• Marshall, A. W. and Olkin, I. (1979). Inequalities: Theory of Majorization and Its Applications. Academic Press.
• Marshall, A. W., Olkin, I., and Arnold, B. C. (2011). Inequalities: Theory of Majorization and Its Applications. Second edition, Springer.
• Rockafellar, R. T. and Wets, R. (1998). Variational Analysis. Springer.
• Hargé, G. (2004). A convex/log-concave correlation inequality for Gaussian measure and an application to abstract Wiener spaces. Probab. Theory Related Fields 130, 415-440.
• Brascamp, H. J. and Lieb, E. H. (1976). On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. J. Functional Analysis 22, 366-389.
• Hargé, G. (2008). Reinforcement of an inequality due to Brascamp and Lieb. J. Funct. Anal. 254, 267-300.