Log-concave distributions: definitions, properties, and consequences

Jon A. Wellner
University of Washington, Seattle; visiting Heidelberg
Seminaire Point de vue, Université Paris-Diderot Paris 7
23 January 2012
Part 1

Based on joint work with:
• Fadoua Balabdaoui
• Kaspar Rufibach
• Arseni Seregin

Outline, Part 1
• 1: Log-concave densities / distributions: definitions
• 2: Properties of the class
• 3: Some consequences (statistics and probability)
• 4: Strong log-concavity: definitions
• 5: Examples & counterexamples
• 6: Some consequences, strong log-concavity
• 7: Questions & problems

1. Log-concave densities / distributions: definitions

Suppose that a density $f$ can be written as
$$f(x) \equiv f_\varphi(x) = \exp(\varphi(x)) = \exp(-(-\varphi(x)))$$
where $\varphi$ is concave (and $-\varphi$ is convex). The class of all densities $f$ on $\mathbb{R}$, or on $\mathbb{R}^d$, of this form is called the class of log-concave densities, $\mathcal{P}_{\text{log-concave}} \equiv \mathcal{P}_0$.

Note that $f$ is log-concave if and only if:
• $\log f(\lambda x + (1-\lambda) y) \ge \lambda \log f(x) + (1-\lambda) \log f(y)$ for all $0 \le \lambda \le 1$ and for all $x, y$;
• iff $f(\lambda x + (1-\lambda) y) \ge f(x)^{\lambda} \cdot f(y)^{1-\lambda}$;
• iff $f((x+y)/2) \ge \sqrt{f(x) f(y)}$ (assuming $f$ is measurable);
• iff $f((x+y)/2)^2 \ge f(x) f(y)$.

Examples, $\mathbb{R}$
• Example 1: standard normal
$$f(x) = (2\pi)^{-1/2} \exp(-x^2/2), \qquad -\log f(x) = \tfrac{1}{2} x^2 + \log \sqrt{2\pi}, \qquad (-\log f)''(x) = 1.$$
• Example 2: Laplace
$$f(x) = 2^{-1} \exp(-|x|), \qquad -\log f(x) = |x| + \log 2, \qquad (-\log f)''(x) = 0 \ \text{for all } x \ne 0.$$
• Example 3: Logistic
$$f(x) = \frac{e^x}{(1+e^x)^2}, \qquad -\log f(x) = -x + 2\log(1+e^x), \qquad (-\log f)''(x) = \frac{2e^x}{(1+e^x)^2} = 2 f(x).$$
• Example 4: Subbotin
$$f(x) = C_r^{-1} \exp(-|x|^r / r), \qquad C_r = 2\Gamma(1/r)\, r^{1/r - 1},$$
$$-\log f(x) = r^{-1} |x|^r + \log C_r, \qquad (-\log f)''(x) = (r-1)|x|^{r-2}, \quad r \ge 1,\ x \ne 0.$$

• Many univariate parametric families on $\mathbb{R}$ are log-concave, for example:
  ▷ Normal$(\mu, \sigma^2)$
  ▷ Uniform$(a, b)$
  ▷ Gamma$(r, \lambda)$ for $r \ge 1$
  ▷ Beta$(a, b)$ for $a, b \ge 1$
  ▷ Subbotin$(r)$ with $r \ge 1$.
• $t_r$ densities with $r > 0$ are not log-concave.
• Tails of log-concave densities are necessarily sub-exponential: i.e. if $X \sim f \in PF_2$, then $E \exp(c|X|) < \infty$ for some $c > 0$.

Log-concave densities on $\mathbb{R}^d$:
• A density $f$ on $\mathbb{R}^d$ is log-concave if $f(x) = \exp(\varphi(x))$ with $\varphi$ concave.
• Examples
  ▷ The density $f$ of $X \sim N_d(\mu, \Sigma)$ with $\Sigma$ positive definite:
$$f(x) = f(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left( -\tfrac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right),$$
$$-\log f(x) = \tfrac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) + \tfrac{1}{2} \log\left( (2\pi)^d |\Sigma| \right),$$
$$D^2(-\log f)(x) \equiv \left( \frac{\partial^2}{\partial x_i \partial x_j} (-\log f)(x),\ i, j = 1, \ldots, d \right) = \Sigma^{-1}.$$
  ▷ If $K \subset \mathbb{R}^d$ is compact and convex, then $f(x) = 1_K(x)/\lambda(K)$ is a log-concave density.

Log-concave measures:
Suppose that $P$ is a probability measure on $(\mathbb{R}^d, \mathcal{B}^d)$. $P$ is a log-concave measure if for all nonempty $A, B \in \mathcal{B}^d$ and $\lambda \in (0,1)$ we have
$$P(\lambda A + (1-\lambda) B) \ge \{P(A)\}^{\lambda} \{P(B)\}^{1-\lambda}.$$
• A set $A \subset \mathbb{R}^d$ is affine if $tx + (1-t)y \in A$ for all $x, y \in A$, $t \in \mathbb{R}$.
• The affine hull of a set $A \subset \mathbb{R}^d$ is the smallest affine set containing $A$.

Theorem. (Prékopa (1971, 1973), Rinott (1976)). Suppose $P$ is a probability measure on $\mathcal{B}^d$ such that the affine hull of $\mathrm{supp}(P)$ has dimension $d$. Then $P$ is log-concave if and only if there is a log-concave (density) function $f$ on $\mathbb{R}^d$ such that
$$P(B) = \int_B f(x)\,dx \quad \text{for all } B \in \mathcal{B}^d.$$

2. Properties of log-concave densities

Properties: log-concave densities on $\mathbb{R}$:
• A density $f$ on $\mathbb{R}$ is log-concave if and only if its convolution with any unimodal density is again unimodal (Ibragimov, 1956).
• Every log-concave density $f$ is unimodal (but need not be symmetric).
• $\mathcal{P}_0$ is closed under convolution.
• $\mathcal{P}_0$ is closed under weak limits.

Properties: log-concave densities on $\mathbb{R}^d$:
• Any log-concave $f$ is unimodal.
• The level sets of $f$ are closed convex sets.
• Log-concave densities correspond to log-concave measures. Prékopa, Rinott.
• Marginals of log-concave distributions are log-concave: if $f(x, y)$ is a log-concave density on $\mathbb{R}^{m+n}$, then
$$g(x) = \int_{\mathbb{R}^n} f(x, y)\,dy$$
is a log-concave density on $\mathbb{R}^m$. Prékopa, Brascamp-Lieb.
• Products of log-concave densities are log-concave.
• $\mathcal{P}_0$ is closed under convolution.
• $\mathcal{P}_0$ is closed under weak limits.

3. Some consequences and connections (statistics and probability)

• (a) $f$ is log-concave if and only if $\det\left( (f(x_i - y_j))_{i,j \in \{1,2\}} \right) \ge 0$ for all $x_1 \le x_2$, $y_1 \le y_2$; i.e. $f$ is a Pólya frequency density of order 2; thus log-concave = $PF_2$ = strongly unimodal.
• (b) The densities $p_\theta(x) \equiv f(x - \theta)$ for $\theta \in \mathbb{R}$ have monotone likelihood ratio (MLR) in $x$ if and only if $f$ is log-concave.

Proof of (b): $p_\theta(x) = f(x - \theta)$ has MLR iff
$$\frac{f(x - \theta')}{f(x - \theta)} \le \frac{f(x' - \theta')}{f(x' - \theta)} \quad \text{for all } x < x',\ \theta < \theta'.$$
This holds if and only if
$$\log f(x - \theta') + \log f(x' - \theta) \le \log f(x' - \theta') + \log f(x - \theta). \qquad (1)$$
Let $t = (x' - x)/(x' - x + \theta' - \theta)$ and note that
$$x - \theta = t(x - \theta') + (1-t)(x' - \theta), \qquad x' - \theta' = (1-t)(x - \theta') + t(x' - \theta).$$
Hence log-concavity of $f$ implies that
$$\log f(x - \theta) \ge t \log f(x - \theta') + (1-t) \log f(x' - \theta),$$
$$\log f(x' - \theta') \ge (1-t) \log f(x - \theta') + t \log f(x' - \theta).$$
Adding these yields (1); i.e. $f$ log-concave implies that $p_\theta(x)$ has MLR in $x$.

Now suppose that $p_\theta(x)$ has MLR so that (1) holds. In particular, (1) holds if $x, x', \theta, \theta'$ satisfy $x - \theta' = a < b = x' - \theta$ and $t = (x' - x)/(x' - x + \theta' - \theta) = 1/2$, so that $x - \theta = (a+b)/2 = x' - \theta'$. Then (1) becomes
$$\log f(a) + \log f(b) \le 2 \log f((a+b)/2).$$
This together with measurability of $f$ implies that $f$ is log-concave.

Proof of (a): Suppose $f$ is $PF_2$. Then for $x < x'$, $y < y'$,
$$\det \begin{pmatrix} f(x - y) & f(x - y') \\ f(x' - y) & f(x' - y') \end{pmatrix} = f(x - y) f(x' - y') - f(x - y') f(x' - y) \ge 0$$
if and only if
$$f(x - y') f(x' - y) \le f(x - y) f(x' - y'),$$
or, if and only if
$$\frac{f(x - y')}{f(x - y)} \le \frac{f(x' - y')}{f(x' - y)}.$$
That is, $p_y(x)$ has MLR in $x$. By (b) this is equivalent to $f$ log-concave.

Theorem. (Brascamp-Lieb, 1976). Suppose $X \sim f = e^{-\varphi}$ with $\varphi$ convex and $D^2\varphi > 0$, and let $g \in C^1(\mathbb{R}^d)$. Then
$$\mathrm{Var}_f(g(X)) \le E\langle (D^2\varphi)^{-1} \nabla g(X), \nabla g(X) \rangle.$$
(Poincaré-type inequality for log-concave densities)

Further consequences: Peakedness and majorization

Theorem 1. (Proschan, 1965) Suppose that $f$ on $\mathbb{R}$ is log-concave and symmetric about 0. Let $X_1, \ldots, X_n$ be i.i.d. with density $f$, and suppose that $p, p' \in \mathbb{R}^n_+$ satisfy
• $p_1 \ge p_2 \ge \cdots \ge p_n$, $p'_1 \ge p'_2 \ge \cdots \ge p'_n$,
• $\sum_1^k p'_j \le \sum_1^k p_j$, $k \in \{1, \ldots, n\}$,
• $\sum_1^n p_j = \sum_1^n p'_j = 1$.
(That is, $p' \prec p$.) Then $\sum_1^n p'_j X_j$ is strictly more peaked than $\sum_1^n p_j X_j$:
$$P\left( \Big| \sum_1^n p'_j X_j \Big| \ge t \right) < P\left( \Big| \sum_1^n p_j X_j \Big| \ge t \right) \quad \text{for all } t \ge 0.$$

Example: $p_1 = \cdots = p_{n-1} = 1/(n-1)$, $p_n = 0$, while $p'_1 = \cdots = p'_n = 1/n$. Then $p \succ p'$ (since $\sum_1^n p_j = \sum_1^n p'_j = 1$ and $\sum_1^k p_j = k/(n-1) \ge k/n = \sum_1^k p'_j$ for $k \le n-1$), and hence if $X_1, \ldots, X_n$ are i.i.d. $f$ symmetric and log-concave,
$$P(|\overline{X}_n| \ge t) < P(|\overline{X}_{n-1}| \ge t) < \cdots < P(|X_1| \ge t) \quad \text{for all } t \ge 0.$$

Definition: A $d$-dimensional random variable $X$ is said to be more peaked than a random variable $Y$ if both $X$ and $Y$ have densities and
$$P(X \in A) \ge P(Y \in A) \quad \text{for all } A \in \mathcal{A}_d,$$
the class of subsets of $\mathbb{R}^d$ which are compact, convex, and symmetric about the origin.

Theorem 2. (Olkin and Tong, 1988) Suppose that $f$ on $\mathbb{R}^d$ is log-concave and symmetric about 0. Let $X_1, \ldots, X_n$ be i.i.d. with density $f$, and suppose that $a, b \in \mathbb{R}^n$ satisfy
• $a_1 \ge a_2 \ge \cdots \ge a_n$, $b_1 \ge b_2 \ge \cdots \ge b_n$,
• $\sum_1^k a_j \le \sum_1^k b_j$, $k \in \{1, \ldots, n\}$,
• $\sum_1^n a_j = \sum_1^n b_j$.
(That is, $a \prec b$.) Then $\sum_1^n a_j X_j$ is more peaked than $\sum_1^n b_j X_j$:
$$P\left( \sum_1^n a_j X_j \in A \right) \ge P\left( \sum_1^n b_j X_j \in A \right) \quad \text{for all } A \in \mathcal{A}_d.$$
In particular,
$$P\left( \Big\| \sum_1^n a_j X_j \Big\| \ge t \right) \le P\left( \Big\| \sum_1^n b_j X_j \Big\| \ge t \right) \quad \text{for all } t \ge 0.$$

Corollary: If $g$ is non-decreasing on $\mathbb{R}_+$ with $g(0) = 0$, then
$$E g\left( \Big\| \sum_1^n a_j X_j \Big\| \right) \le E g\left( \Big\| \sum_1^n b_j X_j \Big\| \right).$$

Another peakedness result: Suppose that $Y = (Y_1, \ldots, Y_n)$ where $Y_j \sim N(\mu_j, \sigma^2)$ are independent and $\mu_1 \le \ldots \le \mu_n$; i.e. $\mu \in K_n$ where $K_n \equiv \{x \in \mathbb{R}^n : x_1 \le \cdots \le x_n\}$. Let $\hat{\mu}_n = \Pi(Y | K_n)$, the least squares projection of $Y$ onto $K_n$. It is well-known that
$$\hat{\mu}_n = \left( \min_{s \ge i} \max_{r \le i} \frac{\sum_{j=r}^{s} Y_j}{s - r + 1},\ i = 1, \ldots, n \right).$$

Theorem 3. (Kelly) If $Y \sim N_n(\mu, \sigma^2 I)$ and $\mu \in K_n$, then $\hat{\mu}_k - \mu_k$ is more peaked than $Y_k - \mu_k$ for each $k \in \{1, \ldots, n\}$; that is,
$$P(|\hat{\mu}_k - \mu_k| \le t) \ge P(|Y_k - \mu_k| \le t) \quad \text{for all } t > 0,\ k \in \{1, \ldots, n\}.$$

Question: Does Kelly's theorem continue to hold if the normal distribution is replaced by an arbitrary log-concave joint density symmetric about $\mu$?

4. Strong log-concavity: definitions

Definition 1. A density $f$ on $\mathbb{R}$ is strongly log-concave if
$$f(x) = h(x)\, c\, \phi(cx) \quad \text{for some } c > 0$$
where $h$ is log-concave and $\phi(x) = (2\pi)^{-1/2} \exp(-x^2/2)$.
Sufficient condition: $\log f \in C^2(\mathbb{R})$ with $(-\log f)''(x) \ge c^2 > 0$ for all $x$.

Definition 2. A density $f$ on $\mathbb{R}^d$ is strongly log-concave if
$$f(x) = h(x)\, c\, \gamma(cx) \quad \text{for some } c > 0$$
where $h$ is log-concave and $\gamma$ is the $N_d(0, cI_d)$ density.
Sufficient condition: $\log f \in C^2(\mathbb{R}^d)$ with $D^2(-\log f)(x) \ge c^2 I_d$ for some $c > 0$ for all $x \in \mathbb{R}^d$.

These agree with strong convexity as defined by Rockafellar & Wets (1998), p. 565.

5. Examples & counterexamples

Examples
Example 1. $f(x) = h(x)\phi(x) / \int h\phi\,dx$ where $h$ is the logistic density, $h(x) = e^x/(1+e^x)^2$.
Example 2. $f(x) = h(x)\phi(x) / \int h\phi\,dx$ where $h$ is the Gumbel density, $h(x) = \exp(x - e^x)$.
Example 3. $f(x) = h(x)h(-x) / \int h(y)h(-y)\,dy$ where $h$ is the Gumbel density.

Counterexamples
Counterexample 1. $f$ logistic: $f(x) = e^x/(1+e^x)^2$; $(-\log f)''(x) = 2f(x)$.
Counterexample 2. $f$ Subbotin, $r \in [1,2) \cup (2,\infty)$; $f(x) = C_r^{-1} \exp(-|x|^r/r)$; $(-\log f)''(x) = (r-1)|x|^{r-2}$.

[Figure] Ex. 1: Logistic (red) perturbation of $N(0,1)$ (green): $f$ (blue)
[Figure] Ex. 1: $(-\log f)''$, logistic perturbation of $N(0,1)$
[Figure] Ex. 2: Gumbel (red) perturbation of $N(0,1)$ (green): $f$ (blue)
[Figure] Ex. 2: $(-\log f)''$, Gumbel perturbation of $N(0,1)$
[Figure] Ex. 3: Gumbel$(\cdot)$ × Gumbel$(-\cdot)$ (purple); $N(0, V_f)$ (blue)
[Figure] Ex. 3: $-\log$ Gumbel$(\cdot)$ × Gumbel$(-\cdot)$ (purple); $-\log N(0, V_f)$ (blue)
[Figure] Ex. 3: $D^2(-\log$ Gumbel$(\cdot)$ × Gumbel$(-\cdot))$ (purple); $D^2(-\log N(0, V_f))$ (blue)
[Figure] Subbotin $f_r$: $r = 1$ (blue), $r = 1.5$ (red), $r = 2$ (green), $r = 3$ (purple)
[Figure] $-\log f_r$: $r = 1$ (blue), $r = 1.5$ (red), $r = 2$ (green), $r = 3$ (purple)
[Figure] $(-\log f_r)''$: $r = 1$ (blue), $r = 1.5$ (red), $r = 2$ (green), $r = 3$ (purple)

6. Some consequences, strong log-concavity

First consequence

Theorem. (Hargé, 2004). Suppose $X \sim N_n(\mu, \Sigma)$ with density $\gamma$ and $Y$ has density $h \cdot \gamma$ with $h$ log-concave, and let $g : \mathbb{R}^n \to \mathbb{R}$ be convex. Then
$$E g(Y - E(Y)) \le E g(X - EX).$$
Equivalently, with $\mu = EX$, $\nu = EY = E(X h(X))/E h(X)$, and $\tilde{g} \equiv g(\cdot - \mu)$,
$$E\{\tilde{g}(X - \nu + \mu) h(X)\} \le E\tilde{g}(X) \cdot E h(X).$$

More consequences

Corollary. (Brascamp-Lieb, 1976). Suppose $X \sim f = \exp(-\varphi)$ with $D^2\varphi \ge \lambda I_d$, $\lambda > 0$, and let $g \in C^1(\mathbb{R}^d)$. Then
$$\mathrm{Var}_f(g(X)) \le E\langle (D^2\varphi)^{-1} \nabla g(X), \nabla g(X) \rangle \le \frac{1}{\lambda} E|\nabla g(X)|^2.$$
(Poincaré inequality for strongly log-concave densities; improvements by Hargé (2008))

Theorem. (Caffarelli, 2002). Suppose $X \sim N_d(0, I)$ with density $\gamma_d$ and $Y$ has density $e^{-v} \cdot \gamma_d$ with $v$ convex.
Let $T = \nabla\varphi$ be the unique gradient of a convex map $\varphi$ such that $\nabla\varphi(X) \stackrel{d}{=} Y$. Then
$$0 \le D^2\varphi \le I_d.$$
(cf. Villani (2003), pages 290-291)

7. Questions & problems

• Does strong log-concavity occur naturally? Are there natural examples?
• Are there large classes of strongly log-concave densities in connection with other known classes such as $PF_\infty$ (Pólya frequency functions of order infinity) or L. Bondesson's class $HM_\infty$ of completely hyperbolically monotone densities?
• Does Kelly's peakedness result for projection onto the ordered cone $K_n$ continue to hold with Gaussian replaced by log-concave (or symmetric log-concave)?

Selected references:
• Villani, C. (2003). Topics in Optimal Transportation. Amer. Math. Soc., Providence.
• Dharmadhikari, S. and Joag-dev, K. (1988). Unimodality, Convexity, and Applications. Academic Press.
• Marshall, A. W. and Olkin, I. (1979). Inequalities: Theory of Majorization and Its Applications. Academic Press.
• Marshall, A. W., Olkin, I., and Arnold, B. C. (2011). Inequalities: Theory of Majorization and Its Applications. Second edition, Springer.
• Rockafellar, R. T. and Wets, R. (1998). Variational Analysis. Springer.
• Hargé, G. (2004). A convex/log-concave correlation inequality for Gaussian measure and an application to abstract Wiener spaces. Probab. Theory Related Fields 130, 415-440.
• Brascamp, H. J. and Lieb, E. H. (1976). On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. J. Functional Analysis 22, 366-389.
• Hargé, G. (2008). Reinforcement of an inequality due to Brascamp and Lieb. J. Funct. Anal. 254, 267-300.
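
Numerical illustration (not part of the original slides). The midpoint characterization of Section 1 and the curvature condition of Section 4 can be spot-checked on a grid. The sketch below is only a sanity check under stated assumptions, not a proof: the function names, the grid on [-8, 8], the tolerance, and the finite-difference step are all hypothetical choices made for this illustration. It works on the scale of $-\log f$, where log-concavity of $f$ is convexity of $-\log f$.

```python
import math


def neg_log_normal(x):
    """-log of the standard normal density (Section 1, Example 1)."""
    return 0.5 * x * x + 0.5 * math.log(2.0 * math.pi)


def neg_log_laplace(x):
    """-log of the Laplace density (Section 1, Example 2)."""
    return abs(x) + math.log(2.0)


def neg_log_logistic(x):
    """-log of the logistic density (Section 1, Example 3)."""
    return -x + 2.0 * math.log1p(math.exp(x))


def neg_log_cauchy(x):
    """-log of the t_1 (Cauchy) density, a t_r density with r = 1."""
    return math.log(math.pi * (1.0 + x * x))


def neg_log_perturbed_normal(x):
    """-log of h(x) * phi(x) with h logistic (Section 5, Example 1).

    The normalizing constant is omitted (assumption of this sketch):
    it shifts -log f by a constant and does not change its curvature.
    """
    return neg_log_logistic(x) + neg_log_normal(x)


def is_midpoint_log_concave(neg_log_f, grid, tol=1e-12):
    """Check f((x+y)/2)^2 >= f(x) f(y) for all grid pairs, on the log scale."""
    return all(
        neg_log_f(0.5 * (x + y)) <= 0.5 * (neg_log_f(x) + neg_log_f(y)) + tol
        for x in grid
        for y in grid
    )


def curvature(neg_log_f, x, h=1e-3):
    """Central finite-difference estimate of (-log f)''(x)."""
    return (neg_log_f(x + h) - 2.0 * neg_log_f(x) + neg_log_f(x - h)) / (h * h)


# Hypothetical evaluation grid: 65 equally spaced points on [-8, 8].
GRID = [-8.0 + 0.25 * k for k in range(65)]

if __name__ == "__main__":
    cases = {
        "normal": neg_log_normal,
        "Laplace": neg_log_laplace,
        "logistic": neg_log_logistic,
        "Cauchy (t_1)": neg_log_cauchy,
        "logistic x normal": neg_log_perturbed_normal,
    }
    for name, g in cases.items():
        ok = is_midpoint_log_concave(g, GRID)
        c_min = min(curvature(g, x) for x in GRID)
        print(f"{name:18s} midpoint test: {ok!s:5s}  min curvature: {c_min:.4f}")
```

On this grid the midpoint test should pass for the normal, Laplace, and logistic densities and fail for the Cauchy density. The smallest estimated curvature of the logistic density shrinks toward 0 as the grid is widened, whereas the logistic perturbation of $N(0,1)$ keeps curvature at least 1, which is the sufficient condition of Definition 1 with $c = 1$.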