The Laplace Transform with Applications in Machine Learning
Zhang Zhihua
July 17, 2013

Outline
- The Laplace Transform
- Completely Monotone and Bernstein Functions
- Positive and Negative Definite Kernels
- Scale Mixture Distributions

The Laplace Transform

Background

The Laplace transform is a classical mathematical tool that has found wide application in science and engineering. It is named after the mathematician and astronomer Pierre-Simon Laplace, who used a similar transform (now called the z-transform) in his work on probability theory. Its current widespread use came about soon after World War II, although it had already been used in the 19th century by Abel, Lerch, Heaviside, and Bromwich.

The Laplace transform is essentially an integral transform. In probability theory and statistics, it is defined as an expectation of a random variable. This lecture gives a detailed illustration of the application of the Laplace transform in machine learning. We will see that the Laplace transform helps us build a bridge connecting machine learning methods developed from different approaches.

Definition

Suppose that $f$ is a real- or complex-valued function of the variable $t \ge 0$. The Laplace transform of $f(t)$ is defined by
\[
\phi(s) = \int_0^\infty \exp(-ts) f(t)\, dt \;\triangleq\; \lim_{z \to \infty} \int_0^z \exp(-ts) f(t)\, dt,
\]
whenever the limit exists. If the limit does not exist, the integral is said to diverge and no Laplace transform is defined for $f$. Here $s$ is a real or complex parameter; throughout, however, we assume that $f$ is a real function and $s$ is a real nonnegative number. The equation above is also called the Laplace integral.

Suppose that $\{a_n\}$ is a sequence of real numbers. The Laplace transform of the sequence is a Dirichlet series of the form
\[
\phi(s) = \sum_{n=1}^{\infty} a_n \exp(-t_n s) = \lim_{N \to \infty} \sum_{n=1}^{N} a_n \exp(-t_n s),
\]
where the discrete set $\{t_n\}$ of exponents corresponds to the continuous variable $t$ in the integral. Both the Laplace integral and the Dirichlet series are included in the Laplace-Stieltjes integral
\[
\phi(s) = \int_0^\infty \exp(-st)\, dF(t).
\]
If $F(t)$ is absolutely continuous, this becomes the Laplace integral; if $F(t)$ is a step function, it becomes the Dirichlet series.

We are especially interested in the case where $F$ is a probability measure concentrated on $[0, \infty)$. In this case, if $X$ is a random variable with probability distribution $F$, then the Laplace transform is given by the expectation
\[
\phi(s) = E[\exp(-sX)].
\]
By abuse of language, we also speak of "the Laplace transform of the random variable $X$." Note that replacing $s$ by $-s$ gives the moment generating function of $X$.
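To make the probabilistic definition concrete, here is a minimal numerical sketch (assuming NumPy and SciPy are available; the exponential distribution and its closed-form transform $\lambda/(\lambda+s)$ are chosen purely for illustration) that approximates $\phi(s) = E[\exp(-sX)]$ by quadrature.

```python
# Numerical check of the probabilistic Laplace transform phi(s) = E[exp(-s X)].
# The exponential distribution with rate lam is used only as an illustration;
# its transform lam / (lam + s) is a standard closed form.
import numpy as np
from scipy.integrate import quad

def laplace_transform(pdf, s):
    """Compute phi(s) = int_0^inf exp(-s t) pdf(t) dt by quadrature."""
    val, _ = quad(lambda t: np.exp(-s * t) * pdf(t), 0.0, np.inf)
    return val

lam = 2.0
exp_pdf = lambda t: lam * np.exp(-lam * t)

for s in [0.0, 0.5, 1.0, 5.0]:
    numeric = laplace_transform(exp_pdf, s)
    closed_form = lam / (lam + s)
    print(f"s={s:4.1f}  numeric={numeric:.6f}  closed form={closed_form:.6f}")
```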
Examples

Example 1

We consider the Laplace transform of a generalized inverse Gaussian (GIG) distribution. The density of the GIG distribution is
\[
p(X = x) = \frac{(\alpha/\beta)^{\gamma/2}}{2 K_\gamma(\sqrt{\alpha\beta})}\, x^{\gamma-1} \exp\!\big(-(\alpha x + \beta x^{-1})/2\big), \quad x > 0,
\]
where $K_\gamma(\cdot)$ is the modified Bessel function of the second kind with index $\gamma$. We denote this distribution by $\mathrm{GIG}(\gamma, \beta, \alpha)$. Its special cases include the gamma distribution $\mathrm{Ga}(\gamma, \alpha/2)$ when $\beta = 0$ and $\gamma > 0$, the inverse gamma distribution $\mathrm{IG}(-\gamma, \beta/2)$ when $\alpha = 0$ and $\gamma < 0$, the inverse Gaussian distribution when $\gamma = -1/2$, and the hyperbolic distribution when $\gamma = 0$.

It is direct to obtain the Laplace transform of $\mathrm{GIG}(\gamma, \beta, \alpha)$ as
\[
\phi(s) = \Big(\frac{\alpha}{\alpha + 2s}\Big)^{\gamma/2} \frac{K_\gamma\big(\sqrt{(\alpha+2s)\beta}\big)}{K_\gamma(\sqrt{\alpha\beta})}, \quad s \ge 0. \tag{1}
\]
In particular, the Laplace transform of the inverse Gaussian distribution (i.e., $\gamma = -1/2$) is
\[
\phi(s) = \exp\!\big(-\sqrt{(\alpha+2s)\beta} + \sqrt{\alpha\beta}\big),
\]
owing to $K_{1/2}(z) = K_{-1/2}(z) = \sqrt{\pi/(2z)}\, e^{-z}$.

The Laplace transform of the gamma distribution $\mathrm{Ga}(\gamma, \alpha/2)$ with shape $\gamma$ and rate $\alpha/2$ (equivalently, scale $2/\alpha$; the case $\beta = 0$) is
\[
\phi(s) = \Big[\frac{\alpha}{\alpha+2s}\Big]^{\gamma},
\]
which also follows from (1) by using the fact that $\lim_{z \to 0} \frac{2 K_\gamma(z)}{\Gamma(\gamma)} \big(\frac{z}{2}\big)^{\gamma} = 1$ for $\gamma > 0$.

Example 2

Let $X$ be an $\alpha$-stable random variable with density
\[
p(X = x) = \frac{1}{\pi x} \sum_{k=1}^{\infty} \frac{\Gamma(1 + k\alpha)}{k!} (-1)^{k+1} x^{-k\alpha} \sin(k\alpha\pi), \quad x > 0,
\]
for $0 < \alpha < 1$. The $\alpha$-stable distribution has Laplace transform $\phi(s) = \exp(-s^\alpha)$, $s \ge 0$.

Note that the $\tfrac{1}{2}$-stable distribution is the Lévy distribution $\mathrm{Lv}(0, 1/2)$, where $\mathrm{Lv}(u, \sigma^2)$ denotes the Lévy distribution with density
\[
p(x) = \frac{\sigma}{\sqrt{2\pi}} (x - u)^{-3/2} \exp\!\Big(-\frac{\sigma^2}{2(x-u)}\Big), \quad x \ge u.
\]
The Laplace transform of $\mathrm{Lv}(0, \sigma^2)$ is $\exp(-\sigma (2s)^{1/2})$. This shows that $\exp(-s^{1/2})$ is the Laplace transform of the Lévy distribution $\mathrm{Lv}(0, 1/2)$.

Example 3

We now consider several discrete random variables. First, let $X$ follow a Poisson distribution with intensity $\lambda$; that is,
\[
\Pr(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \ldots
\]
The Laplace transform of the Poisson distribution $\mathrm{Po}(\lambda)$ is $\exp[-\lambda(1 - \exp(-s))]$.

Example 4

Second, let the random variable $X$ follow a negative binomial distribution $\mathrm{NB}(\xi, q)$ with $\xi > 0$ and $0 < q < 1$; that is, the probability mass function of $X$ is
\[
\Pr(X = k) = \frac{\Gamma(k + \xi)}{k!\, \Gamma(\xi)}\, q^{\xi} (1 - q)^k.
\]
The Laplace transform is
\[
\phi(s) = q^{\xi} \big[1 - (1-q)\exp(-s)\big]^{-\xi}, \quad s \ge 0.
\]
It is worth noting that for any fixed nonzero constant $\alpha$, $\alpha X$ also follows the negative binomial distribution $\mathrm{NB}(\xi, q)$, but takes values on $\{k\alpha : k = 0, 1, \ldots\}$. In this case, the Laplace transform of $\alpha X$ is
\[
\phi(s) = q^{\xi} \big[1 - (1-q)\exp(-s\alpha)\big]^{-\xi}, \quad s \ge 0.
\]
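As a quick sanity check on Example 1, the following sketch (assuming SciPy; the parameter values are arbitrary illustrations) compares the closed-form transform (1) with direct numerical integration of $\exp(-sx)$ against the GIG density.

```python
# Compare the GIG Laplace transform in (1) with direct numerical integration.
import numpy as np
from scipy.integrate import quad
from scipy.special import kv  # modified Bessel function of the second kind

gamma_, beta, alpha = 1.5, 2.0, 3.0   # arbitrary illustrative parameters

def gig_pdf(x):
    c = (alpha / beta) ** (gamma_ / 2) / (2.0 * kv(gamma_, np.sqrt(alpha * beta)))
    return c * x ** (gamma_ - 1) * np.exp(-(alpha * x + beta / x) / 2.0)

def phi_numeric(s):
    val, _ = quad(lambda x: np.exp(-s * x) * gig_pdf(x), 0.0, np.inf)
    return val

def phi_closed(s):
    return ((alpha / (alpha + 2 * s)) ** (gamma_ / 2)
            * kv(gamma_, np.sqrt((alpha + 2 * s) * beta))
            / kv(gamma_, np.sqrt(alpha * beta)))

for s in [0.0, 0.5, 2.0]:
    print(f"s={s}:  numeric={phi_numeric(s):.6f}  closed form={phi_closed(s):.6f}")
```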
Properties

Theorem (Uniqueness). Distinct probability distributions have distinct Laplace transforms.

The uniqueness holds with respect to probability measures; for arbitrary functions the statement need no longer hold.

Theorem (Continuity). Let $F_n$ be a probability distribution with Laplace transform $\phi_n$, for $n = 1, 2, \ldots$ If $F_n \to F$, where $F$ is a possibly defective distribution with Laplace transform $\phi$, then $\phi_n(s) \to \phi(s)$ for $s > 0$. Conversely, if the sequence $\{\phi_n(s)\}$ converges for each $s > 0$ to a limit $\phi(s)$, then $\phi$ is the Laplace transform of a possibly defective distribution $F$ such that $F_n \to F$. Furthermore, the limit $F$ is not defective iff $\phi(s) \to 1$ as $s \to 0$.

Example 1′

Let us return to Example 1. As we saw earlier, $\mathrm{GIG}(\gamma, \beta, \alpha)$ converges to $\mathrm{Ga}(\gamma, \alpha/2)$ as $\beta \to 0$. The continuity theorem then implies that
\[
\lim_{\beta \to 0} \Big(\frac{\alpha}{\alpha+2s}\Big)^{\gamma/2} \frac{K_\gamma\big(\sqrt{(\alpha+2s)\beta}\big)}{K_\gamma(\sqrt{\alpha\beta})} = \Big[\frac{\alpha}{\alpha+2s}\Big]^{\gamma}.
\]
In fact, the limit can also be obtained from $K_\gamma(z) \sim \frac{\Gamma(\gamma)}{2}\big(\frac{2}{z}\big)^{\gamma}$ for $\gamma > 0$ as $z \to 0$. Additionally, recall that
\[
\lim_{\alpha \to 0} \exp\!\big(-\sqrt{(\alpha+2s)\beta} + \sqrt{\alpha\beta}\big) = \exp\!\big(-\sqrt{2s\beta}\big),
\]
which implies that the inverse Gaussian distribution $\mathrm{GIG}(-1/2, \beta, \alpha)$ converges to the Lévy distribution $\mathrm{Lv}(0, \beta)$ as $\alpha \to 0$.

Example 4′

We now return to the random variable $\alpha X$ in Example 4. Let $q = \frac{\xi}{\xi+\lambda}$ and $\alpha = \frac{\xi}{\xi+1}$. Taking the limit $\xi \to \infty$, we have
\[
\lim_{\xi \to \infty} q^{\xi} \big[1 - (1-q)\exp(-\alpha s)\big]^{-\xi} = \exp[-\lambda(1 - \exp(-s))].
\]
The continuity theorem then shows that $\lim_{\xi \to \infty} \mathrm{NB}(\xi, \xi/(\xi+\lambda)) = \mathrm{Po}(\lambda)$. On the other hand, let $\alpha = q$. For fixed $\xi$, we have
\[
\lim_{\alpha \to 0} q^{\xi} \big[1 - (1-q)\exp(-\alpha s)\big]^{-\xi} = \Big[\frac{1}{1+s}\Big]^{\xi}.
\]
We then know that $\alpha X$ converges in distribution to a gamma distribution with shape $\xi$ and scale 1; that is, for any $u > 0$,
\[
\lim_{\alpha \to 0} \Pr(\alpha X \le u) = \int_0^u \frac{1}{\Gamma(\xi)} t^{\xi-1} e^{-t}\, dt.
\]

Remarks

There are other transforms related to the Laplace transform. For example, the Stieltjes transform is
\[
g(s) = \int_0^\infty \frac{1}{s + x}\, dF(x), \quad s > 0.
\]
Note that $\frac{1}{s+x} = \int_0^\infty e^{-su} e^{-xu}\, du$. Using Fubini's theorem, we have
\[
g(s) = \int_0^\infty e^{-su} \int_0^\infty e^{-xu}\, dF(x)\, du,
\]
which is a double Laplace transform.

Completely Monotone and Bernstein Functions

Definition

An important application of the Laplace transform lies in the characterization of completely monotone functions. Let $f \in C^\infty(0, \infty)$ with $f \ge 0$. We say $f$ is completely monotone if $(-1)^n f^{(n)} \ge 0$ for all $n \in \mathbb{N}$, and a Bernstein function if $(-1)^n f^{(n)} \le 0$ for all $n \in \mathbb{N}$. Thus, a nonnegative function $f : (0, \infty) \to \mathbb{R}$ is a Bernstein function iff $f'$ is completely monotone.

Properties

For any two completely monotone (or Bernstein) functions $f_1(x)$ and $f_2(x)$, the combination $a_1 f_1(x) + a_2 f_2(x)$, where $a_1$ and $a_2$ are nonnegative constants, is clearly also completely monotone (or Bernstein). Moreover, the product $f_1(x) f_2(x)$ is completely monotone when $f_1(x)$ and $f_2(x)$ are completely monotone, while the composition $f_1(f_2(x))$ is Bernstein when $f_1(x)$ and $f_2(x)$ are Bernstein. Furthermore, $f_1(f_2(x))$ is completely monotone if $f_1$ is completely monotone and $f_2$ is nonnegative with a completely monotone derivative.
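The sign conditions above are easy to probe numerically. The following spot check (assuming SymPy; it tests a few sample points and small orders only, so it is an illustration rather than a proof) evaluates $(-1)^n f^{(n)}(s)$ for some elementary functions; $\sqrt{s}$ is included as a Bernstein function that is not completely monotone.

```python
# Spot check of complete monotonicity: verify the sign of (-1)^n f^(n)(s)
# at a few points s > 0 and small n.  Not a proof, only an illustration.
import sympy as sp

s = sp.symbols('s', positive=True)
candidates = {
    '1/(1+s)':   1 / (1 + s),             # completely monotone
    'exp(-2s)':  sp.exp(-2 * s),          # completely monotone
    'log(1+1/s)': sp.log(1 + 1 / s),      # completely monotone
    'sqrt(s)':   sp.sqrt(s),              # Bernstein, not completely monotone
}
for name, f in candidates.items():
    ok = True
    for n in range(5):
        dn = sp.diff(f, s, n)
        vals = [float(((-1) ** n) * dn.subs(s, x)) for x in (0.5, 1.0, 3.0)]
        ok = ok and all(v >= -1e-12 for v in vals)
    print(f"{name:12s} (-1)^n f^(n) >= 0 for n = 0..4 at sample points: {ok}")
```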
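Returning briefly to Example 4′, the continuity theorem can also be watched at work numerically. The following small sketch (only NumPy assumed; the values of $\lambda$ and $s$ are arbitrary) tracks the Laplace transform of the scaled negative binomial as $\xi$ grows and compares it with the Poisson transform.

```python
# The Laplace transform of NB(xi, xi/(xi+lam)), scaled by alpha = xi/(xi+1),
# converges pointwise to the Poisson transform exp(-lam*(1 - exp(-s))).
import numpy as np

lam, s = 3.0, 0.8
poisson_lt = np.exp(-lam * (1.0 - np.exp(-s)))
for xi in [1, 10, 100, 1000, 10000]:
    q = xi / (xi + lam)
    a = xi / (xi + 1.0)                   # alpha -> 1 as xi -> infinity
    nb_lt = q ** xi * (1.0 - (1.0 - q) * np.exp(-s * a)) ** (-xi)
    print(f"xi={xi:6d}  NB transform={nb_lt:.6f}  Poisson transform={poisson_lt:.6f}")
```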
Theorem (Representation). Let $\phi$ and $\psi$ be functions from $(0, \infty)$ to $\mathbb{R}$, and let $x \wedge y$ denote $\min(x, y)$.

(1) A necessary and sufficient condition that $\phi$ is completely monotone is that
\[
\phi(s) = \int_0^\infty e^{-su}\, dF(u),
\]
where $F(u)$ is non-decreasing and the integral converges for $s \in (0, \infty)$. Furthermore, $F(u)$ is a probability distribution iff $\phi(0) = 1$.

(2) $\psi$ is a Bernstein function iff the mapping $s \mapsto \exp(-t\psi(s))$ is completely monotone for all $t \ge 0$.

(3) $\psi$ is a Bernstein function iff it admits the representation
\[
\psi(s) = a + bs + \int_{(0,\infty)} \big(1 - \exp(-su)\big)\, \nu(du), \tag{5}
\]
where $a, b \ge 0$ and $\nu$ is a Lévy measure on $(0, \infty)$ such that $\int_{(0,\infty)} (1 \wedge u)\, \nu(du) < \infty$. Furthermore, the triplet $(a, b, \nu)$ uniquely determines $\psi$.

Remarks

The representation in (5) is well known as the Lévy-Khintchine formula. It implies that
\[
\lim_{s \to 0^+} \psi(s) = a \quad \text{and} \quad \lim_{s \to \infty} \frac{\psi(s)}{s} = \lim_{s \to \infty} \psi'(s) = b.
\]
Let $\nu$ be a Borel measure defined on $\mathbb{R}^d \setminus \{0\} = \{x \in \mathbb{R}^d : x \ne 0\}$. We call $\nu$ a Lévy measure if
\[
\int_{\mathbb{R}^d \setminus \{0\}} \big(\|z\|_2^2 \wedge 1\big)\, \nu(dz) < \infty.
\]

Example 5

The following elementary functions are completely monotone:
\[
\frac{1}{(a+s)^{\nu}}, \qquad e^{-bs}, \qquad \ln\!\Big(1 + \frac{c}{s}\Big),
\]
where $a \ge 0$, $b \ge 0$, $\nu \ge 0$ and $c > 0$. The following functions are Bernstein:
\[
s^{\alpha}, \qquad 1 - \exp(-\beta s), \qquad \log(1 + \gamma s),
\]
where $\alpha \in (0, 1]$, $\beta > 0$, $\gamma > 0$. Consider also
\[
\frac{s}{1+s} = 1 - \exp\!\big(-\log(1+s)\big),
\]
which is Bernstein by the composition property.

Example 6

It now follows from the representation theorem that $e^{-bs}$ can be represented as the Laplace transform of some probability distribution, but $\ln(1 + \frac{c}{s})$ cannot. Moreover, the functions $\phi$ given in Examples 1, 2 and 3 are completely monotone.

Example 7

It is immediate that $\psi(s) = s^{\alpha}$ for $\alpha \in (0, 1)$ is a Bernstein function of $s$ on $(0, \infty)$. Moreover, it is directly verified that
\[
s^{\alpha} = \frac{\alpha}{\Gamma(1-\alpha)} \int_0^\infty \big(1 - e^{-su}\big) \frac{du}{u^{1+\alpha}},
\]
which implies that $\nu(du) = \frac{\alpha}{\Gamma(1-\alpha)} \frac{du}{u^{1+\alpha}}$. Obviously, $\psi(s) = s$ is also Bernstein. It is an extreme case, for which $a = 0$, $b = 1$ and $\nu(du) = \delta_0(u)\, du$ (the Dirac delta measure).

Example 8

Consider the function
\[
\psi_\alpha(s) =
\begin{cases}
\log(1 + \gamma s), & \alpha = 0, \\[4pt]
\dfrac{1}{\alpha}\Big[1 - \big(1 + (1-\alpha)\gamma s\big)^{-\frac{\alpha}{1-\alpha}}\Big], & \alpha \in (0, 1), \\[4pt]
1 - \exp(-\gamma s), & \alpha = 1,
\end{cases}
\]
for $\gamma > 0$. This function is Bernstein for any $\alpha \in [0, 1]$. In this case $a = 0$ and $b = 0$, and the corresponding Lévy measure is
\[
\nu(du) =
\begin{cases}
\dfrac{1}{u} \exp\!\Big(-\dfrac{u}{\gamma}\Big)\, du, & \alpha = 0, \\[6pt]
\dfrac{\gamma\,[(1-\alpha)\gamma]^{-\frac{1}{1-\alpha}}}{\Gamma\!\big(\frac{1}{1-\alpha}\big)}\, u^{\frac{\alpha}{1-\alpha}-1} \exp\!\Big(-\dfrac{u}{(1-\alpha)\gamma}\Big)\, du, & \alpha \in (0, 1), \\[6pt]
\delta_\gamma(u)\, du, & \alpha = 1.
\end{cases}
\]
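The Lévy-Khintchine representation of $s^\alpha$ in Example 7 is easy to check numerically. The following sketch (assuming SciPy; $\alpha$ and the sample points are arbitrary) compares the integral form with $s^\alpha$ directly.

```python
# Numerical spot check of the representation in Example 7:
#   s**alpha = alpha/Gamma(1-alpha) * int_0^inf (1 - exp(-s*u)) u**(-1-alpha) du
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

alpha = 0.3

def rhs(s):
    integrand = lambda u: (1.0 - np.exp(-s * u)) * u ** (-1.0 - alpha)
    val, _ = quad(integrand, 0.0, np.inf, limit=200)
    return alpha / gamma(1.0 - alpha) * val

for s in [0.5, 1.0, 4.0]:
    print(f"s={s}:  s**alpha={s**alpha:.6f}   integral form={rhs(s):.6f}")
```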
Positive and Negative Definite Kernels

Introduction

Machine learning methods based on reproducing kernel theory have been applied with great success and have become a classical tool in machine learning and data mining. There is an equivalence between a reproducing kernel Hilbert space and a positive definite kernel. Here we exploit completely monotone functions in the study of positive and negative definite kernels.

Definition

Let $\mathcal{X} \subset \mathbb{R}^d$ be a nonempty set. A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is said to be symmetric if $k(x, y) = k(y, x)$ for all $x, y \in \mathcal{X}$. A symmetric function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called a positive definite kernel iff
\[
\sum_{i,j=1}^{m} a_i a_j k(x_i, x_j) \ge 0
\]
for all $m \in \mathbb{N}$, $\{x_1, \ldots, x_m\} \subset \mathcal{X}$ and $\{a_1, \ldots, a_m\} \subset \mathbb{R}$.

A symmetric function $k$ is called a conditionally positive definite kernel iff
\[
\sum_{i,j=1}^{m} a_i a_j k(x_i, x_j) \ge 0
\]
for all $m \ge 2$, $\{x_1, \ldots, x_m\} \subset \mathcal{X}$ and $\{a_1, \ldots, a_m\} \subset \mathbb{R}$ with $\sum_{j=1}^{m} a_j = 0$.

Remarks

It is worth pointing out that if $\mathcal{X} = \{x_1, \ldots, x_m\}$ is a finite set, then $k$ is positive definite iff the $m \times m$ matrix $K = [k(x_i, x_j)]$ is positive semidefinite. We call a kernel $k$ negative definite if $-k$ is conditionally positive definite.

Properties

Theorem. Let $\mu$ be a probability measure on the half-line $\mathbb{R}_+$ such that $0 < \int_0^\infty u\, d\mu(u) < \infty$. Then $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$ is negative definite iff the Laplace transform of $tk$, i.e., $\int_0^\infty \exp(-utk)\, d\mu(u)$, is positive definite for all $t > 0$.

Taking $\mu(du) = \delta_1(u)\, du$, we obtain the following theorem, which illustrates a connection between positive and negative definite kernels.

Theorem (Schoenberg correspondence). A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is negative definite iff $\exp(-tk)$ is positive definite for all $t > 0$.

Definition

We are especially interested in kernels of the form $k(x_i, x_j) = \psi(x_i - x_j)$. In this case, we say that $\psi : \mathcal{X} \to \mathbb{R}$ is positive definite if the kernel $k(x_i, x_j) \triangleq \psi(x_i - x_j)$ is positive definite, and that $\psi$ is conditionally positive definite if $k(x_i, x_j) \triangleq \psi(x_i - x_j)$ is conditionally positive definite. In this regard, $\psi : \mathcal{X} \to \mathbb{R}$ is said to be symmetric if $\psi(-x) = \psi(x)$ for all $x \in \mathcal{X}$. Moreover, we always have $\psi(0) = \psi(x - x) \ge 0$ if $\psi$ is positive definite.

Example 9

Consider the function $\psi(x) = \|x\|_2^2$. Let $x_1, \ldots, x_m \in \mathbb{R}^d$, and let $A = [a_{ij}]$ be the $m \times m$ matrix with $a_{ij} = \|x_i - x_j\|_2^2$. Given any real numbers $\beta_1, \ldots, \beta_m$ such that $\sum_{j=1}^{m} \beta_j = 0$, we have
\[
\sum_{i,j=1}^{m} \beta_i \beta_j \|x_i - x_j\|_2^2
= \sum_{i,j=1}^{m} \beta_i \beta_j \big(\|x_i\|_2^2 + \|x_j\|_2^2 - 2x_i^{\mathsf T}x_j\big)
= -2 \sum_{i,j=1}^{m} (\beta_i x_i)^{\mathsf T} (\beta_j x_j)
= -2 \Big\|\sum_{i=1}^{m} \beta_i x_i\Big\|_2^2 \le 0,
\]
which implies that $\|x - y\|_2^2$ is negative definite. It then follows from the Schoenberg correspondence that $\exp(-t\|x - y\|_2^2)$ is positive definite for all $t > 0$; that is, the Gaussian RBF kernel is positive definite.

Example 10

Consider the function $\psi(x) = \|x\|_1 = \sum_{j=1}^{d} |x_j|$. Note that
\[
\exp(-t|x|) = \int_0^\infty \exp(-t^2 u |x|^2)\, \mathrm{Lv}(u \,|\, 0, 1/2)\, du
\]
for all $t > 0$. It then follows from the negative definiteness of $|x - y|^2$ and the theorem above that $\exp(-t|x - y|)$ is positive definite for all $t > 0$. Again using the Schoenberg correspondence, we obtain that $|x - y|$ is negative definite. Accordingly, $\|x - y\|_1 \,(= \sum_{j=1}^{d} |x_j - y_j|)$ is negative definite, and $\exp(-t\|x - y\|_1)$ is positive definite for all $t > 0$.
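Examples 9 and 10 can be illustrated empirically (this is of course no substitute for the proofs above). The sketch below, assuming only NumPy and using randomly drawn points and an arbitrary $t$, checks that the Gaussian RBF and the $\exp(-t\|x-y\|_1)$ kernel matrices are positive semidefinite, and that the squared-distance matrix is negative definite in the conditional sense after centering.

```python
# Empirical illustration of Examples 9 and 10 on random points.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))                       # 30 points in R^4
diff = X[:, None, :] - X[None, :, :]
sq_dist = (diff ** 2).sum(-1)                      # ||x_i - x_j||_2^2
l1_dist = np.abs(diff).sum(-1)                     # ||x_i - x_j||_1

t = 0.7
rbf = np.exp(-t * sq_dist)
lap = np.exp(-t * l1_dist)
print("min eigenvalue, RBF kernel     :", np.linalg.eigvalsh(rbf).min())
print("min eigenvalue, L1 kernel      :", np.linalg.eigvalsh(lap).min())

# Conditional negative definiteness of ||x_i - x_j||_2^2: restrict to the
# hyperplane sum(a_i) = 0 via the centering matrix H and check that
# H * sq_dist * H is negative semidefinite (max eigenvalue ~ 0).
m = X.shape[0]
H = np.eye(m) - np.ones((m, m)) / m
print("max eigenvalue of H D H        :", np.linalg.eigvalsh(H @ sq_dist @ H).max())
```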
Distance Functions

It is well known that a positive definite kernel induces a kernel matrix, which is used in nonlinear machine learning methods such as kernel principal component analysis, generalized Fisher discriminant analysis, and spectral clustering. We now show that a negative definite kernel can in turn define a Euclidean distance matrix, which has been used in multidimensional scaling (MDS).

Definition. An $m \times m$ matrix $D = [d_{rs}]$ is called a distance matrix if $d_{rs} = d_{sr}$, $d_{rr} = 0$ and $d_{rs} \ge 0$. An $m \times m$ distance matrix $D = [d_{rs}]$ is said to be Euclidean if there exists a configuration of points in some Euclidean space whose interpoint distances are given by $D$; that is, if for some $p \in \mathbb{N}$ there exist points $z_1, \ldots, z_m \in \mathbb{R}^p$ such that
\[
d_{rs}^2 = \|z_r - z_s\|_2^2 = (z_r - z_s)^{\mathsf T}(z_r - z_s).
\]

Main Result

Theorem. Let $D = [d_{rs}]$ be an $m \times m$ distance matrix and define $B = H_m A H_m$, where $H_m = I_m - \frac{1}{m}\mathbf{1}_m\mathbf{1}_m^{\mathsf T}$ (the centering matrix) and $A = [a_{rs}]$ with $a_{rs} = -\frac{1}{2} d_{rs}^2$. Then $D$ is Euclidean iff $B$ is positive semidefinite.

Clearly, if $A$ is conditionally positive semidefinite, then $H_m A H_m$ is positive semidefinite. Thus we directly have the following corollary.

Corollary. Let $g(s)$ be Bernstein on $(0, \infty)$ with $g(0^+) = 0$. Then, for any vectors $x_1, \ldots, x_m \in \mathbb{R}^d$ and any $\alpha \in \{1, 2\}$, the matrix $\big[g(\|x_i - x_j\|_\alpha^\alpha)\big]$ is a Euclidean distance matrix.

Example 11

The functions $\sqrt{1+s} - 1$, $\log(1+s)$ and $1 - \exp(-s)$ are Bernstein on $(0, \infty)$ and equal to zero at the origin. Thus, for any distinct vectors $x_1, \ldots, x_m \in \mathbb{R}^d$, the matrices
\[
\big[\sqrt{1 + \|x_i - x_j\|_1} - 1\big] \ \text{and} \ \big[\sqrt{1 + \|x_i - x_j\|_2^2} - 1\big], \qquad
\big[\log(1 + \|x_i - x_j\|_1)\big] \ \text{and} \ \big[\log(1 + \|x_i - x_j\|_2^2)\big],
\]
\[
\big[1 - \exp(-\|x_i - x_j\|_1)\big] \ \text{and} \ \big[1 - \exp(-\|x_i - x_j\|_2^2)\big]
\]
are negative semidefinite (in the conditional sense above) and nonsingular. Denote any one of these matrices by $A$. Then $-H_m A H_m$ is positive semidefinite and defines a Euclidean configuration. Furthermore, the rank of $-H_m A H_m$ is $m - 1$, which implies that $p = m - 1$ (the dimension of the Euclidean space).
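The following sketch (NumPy only; the choice $g(s) = 1 - e^{-s}$, the random points, and the classical-MDS-style embedding step are illustrative assumptions, with $g$ applied to squared Euclidean distances and the result treated as the squared interpoint distances in the construction of the theorem above) checks that the doubly centered matrix is positive semidefinite and recovers a configuration of points reproducing the transformed distances.

```python
# Empirical sketch of the corollary / Example 11 in classical MDS style.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # ||x_i - x_j||_2^2
G = 1.0 - np.exp(-D2)                                  # Bernstein g applied elementwise

m = G.shape[0]
H = np.eye(m) - np.ones((m, m)) / m
B = -0.5 * H @ G @ H
w, V = np.linalg.eigh(B)
print("smallest eigenvalue of B:", w.min())            # ~ >= 0

# Embed: rows of Z are points whose squared distances reproduce G.
Z = V * np.sqrt(np.clip(w, 0.0, None))
D2_recovered = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
print("max reconstruction error:", np.abs(D2_recovered - G).max())
```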
Scale Mixture Distributions

Introduction

We consider the application of completely monotone functions to scale mixture distributions, which in turn are widely used in applied and computational statistics. We particularly study scale mixtures of exponential power (EP) distributions.

Exponential Power (EP) Distributions

A univariate random variable $X \in \mathbb{R}$ is said to follow an EP distribution if its density is
\[
p(X = x) = \frac{q\,(2\eta)^{-1/q}}{2\,\Gamma(1/q)} \exp\!\Big(-\frac{1}{2\eta}|x - u|^q\Big)
= \frac{\eta^{-1/q}}{2^{\frac{q+1}{q}}\,\Gamma\!\big(\frac{q+1}{q}\big)} \exp\!\Big(-\frac{1}{2\eta}|x - u|^q\Big),
\]
with $\eta > 0$. It is typically assumed that $0 < q \le 2$. We denote this distribution by $\mathrm{EP}(x \,|\, u, \eta, q)$. There are two classical special cases: the Gaussian distribution arises when $q = 2$ (denoted $N(x \,|\, u, \eta)$), and the Laplace distribution arises when $q = 1$ (denoted $L(x \,|\, u, \eta)$). For $0 < q < 2$, the corresponding density is called a bridge distribution. We will only consider the setting $u = 0$.

Scale Mixtures of EP Distributions

Let $\tau = 1/(2\eta)$ be a random variable with probability density $p(\tau)$. The marginal density of $x$ is
\[
f_X(x) = \int_0^\infty \mathrm{EP}\big(x \,|\, 0, 1/(2\tau), q\big)\, p(\tau)\, d\tau.
\]
We call the resulting distribution a scale mixture of exponential power distributions. It is symmetric about 0 and can be written as $\phi(|x|^q)$. In particular, when $q = 2$ it is a scale mixture of normal distributions, and when $q = 1$ it is a scale mixture of Laplace distributions.

Let $s = |x|^q$. Then we can write $f_X$ as
\[
\phi(s) = \int_0^\infty \frac{\tau^{1/q}}{2\,\Gamma\!\big(\frac{q+1}{q}\big)} \exp(-\tau s)\, p(\tau)\, d\tau.
\]
Clearly, $\phi(s)$ is the Laplace transform of $\frac{\tau^{1/q}}{2\Gamma(\frac{q+1}{q})}\, p(\tau)$. This implies that $\phi(s)$ is completely monotone on $(0, \infty)$.

Main Result

Theorem. Let $X$ have a density function $f_X(x)$ that is a function of $|x|^q$ with $0 < q \le 2$ (denoted $\phi(|x|^q)$). Then $\phi(|x|^q)$ can be represented as a scale mixture of exponential power distributions iff $\phi(s)$ is completely monotone on $(0, \infty)$.

The Laplace Distribution

The Laplace density arises as a scale mixture of normals:
\[
\frac{\sqrt{\alpha}}{2} \exp\!\big(-\sqrt{\alpha}\,|b|\big) = \int_0^{\infty} N(b \,|\, 0, \eta)\, \mathrm{Ga}(\eta \,|\, 1, \alpha/2)\, d\eta.
\]

The Bridge Distribution

The bridge density with $q = 1/2$ arises as a scale mixture of Laplace distributions:
\[
\frac{\alpha}{4} \exp\!\big(-\sqrt{\alpha |b|}\big) = \int_0^{\infty} L(b \,|\, 0, \eta)\, \mathrm{Ga}(\eta \,|\, 3/2, \alpha/2)\, d\eta.
\]

Generalized t Distributions

We now consider scale mixtures of the exponential power $\mathrm{EP}(b \,|\, u, \eta, q)$ with the inverse gamma $\mathrm{IG}(\eta \,|\, \tau/2, \tau/(2\lambda))$. They are called generalized t distributions. The density is
\[
\frac{q\,\Gamma\!\big(\frac{\tau}{2} + \frac{1}{q}\big)}{2\,\Gamma\!\big(\frac{\tau}{2}\big)\Gamma\!\big(\frac{1}{q}\big)} \Big(\frac{\lambda}{\tau}\Big)^{1/q} \Big(1 + \frac{\lambda}{\tau} |b|^q\Big)^{-\left(\frac{\tau}{2} + \frac{1}{q}\right)}
= \int \mathrm{EP}(b \,|\, 0, \eta, q)\, \mathrm{IG}\big(\eta \,|\, \tau/2, \tau/(2\lambda)\big)\, d\eta,
\]
where $\tau > 0$, $\lambda > 0$ and $q > 0$. Clearly, when $q = 2$ the generalized t distribution becomes a t-distribution; moreover, when $q = 2$ and $\tau = 1$, it is the Cauchy distribution.

Generalized Double Pareto Distributions

When $q = 1$, the resulting distributions are also called generalized double Pareto distributions. The densities are given by
\[
\frac{\lambda}{4} \Big(1 + \frac{\lambda |b|}{\tau}\Big)^{-(\tau/2 + 1)} = \int_0^\infty L(b \,|\, 0, \eta)\, \mathrm{IG}\big(\eta \,|\, \tau/2, \tau/(2\lambda)\big)\, d\eta, \quad \lambda > 0,\ \tau > 0.
\]
Furthermore, taking $\tau = 1$, so that $\eta \sim \mathrm{IG}(\eta \,|\, 1/2, 1/(2\lambda))$, we obtain
\[
p(b) = \frac{\lambda}{4} (1 + \lambda |b|)^{-3/2}.
\]

EP-GIG

We now develop a family of distributions by mixing the exponential power $\mathrm{EP}(b \,|\, 0, \eta, q)$ with the generalized inverse Gaussian $\mathrm{GIG}(\eta \,|\, \gamma, \beta, \alpha)$. The marginal density of $b$ is defined by
\[
p(b) = \int_0^{\infty} \mathrm{EP}(b \,|\, 0, \eta, q)\, \mathrm{GIG}(\eta \,|\, \gamma, \beta, \alpha)\, d\eta.
\]
We refer to this distribution as EP-GIG. The density is
\[
p(b) = \frac{\alpha^{1/(2q)}}{2^{\frac{q+1}{q}}\,\Gamma\!\big(\frac{q+1}{q}\big)\,\beta^{\gamma/2}\, K_\gamma(\sqrt{\alpha\beta})}\, K_{\frac{q\gamma - 1}{q}}\!\big(\sqrt{\alpha(\beta + |b|^q)}\big)\, \big[\beta + |b|^q\big]^{\frac{q\gamma - 1}{2q}}.
\]
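The normal scale-mixture representation of the Laplace density stated above is easy to verify numerically. The following sketch (assuming SciPy; the value of $\alpha$ and the evaluation points are arbitrary) integrates the mixture and compares it with the Laplace density.

```python
# Check that the Laplace density is a scale mixture of normals with an
# exponential (Ga(1, rate alpha/2)) mixing distribution.
import numpy as np
from scipy.integrate import quad

alpha = 4.0

def normal_pdf(b, eta):
    return np.exp(-b * b / (2 * eta)) / np.sqrt(2 * np.pi * eta)

def exp_mixing_pdf(eta):
    return (alpha / 2) * np.exp(-alpha * eta / 2)   # Ga(1, rate alpha/2)

def mixture_density(b):
    val, _ = quad(lambda eta: normal_pdf(b, eta) * exp_mixing_pdf(eta), 0.0, np.inf)
    return val

for b in [0.1, 0.5, 1.5]:
    laplace = np.sqrt(alpha) / 2 * np.exp(-np.sqrt(alpha) * abs(b))
    print(f"b={b}:  mixture={mixture_density(b):.6f}  Laplace={laplace:.6f}")
```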
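Similarly, the closed-form EP-GIG density can be spot-checked against direct numerical integration of the EP kernel times the GIG mixing density (again assuming SciPy; the parameter values $q = 1$, $\gamma = 1$, $\beta = 1$, $\alpha = 2$ are arbitrary illustrations).

```python
# Spot check of the EP-GIG marginal density against numerical integration.
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as G, kv

q, gam, beta, alpha = 1.0, 1.0, 1.0, 2.0

def ep_pdf(b, eta):
    return q * (2 * eta) ** (-1 / q) / (2 * G(1 / q)) * np.exp(-abs(b) ** q / (2 * eta))

def gig_pdf(eta):
    c = (alpha / beta) ** (gam / 2) / (2 * kv(gam, np.sqrt(alpha * beta)))
    return c * eta ** (gam - 1) * np.exp(-(alpha * eta + beta / eta) / 2)

def epgig_closed(b):
    nu = (q * gam - 1) / q
    return (alpha ** (1 / (2 * q))
            / (2 ** ((q + 1) / q) * G((q + 1) / q) * beta ** (gam / 2)
               * kv(gam, np.sqrt(alpha * beta)))
            * kv(nu, np.sqrt(alpha * (beta + abs(b) ** q)))
            * (beta + abs(b) ** q) ** (nu / 2))

for b in [0.2, 1.0, 3.0]:
    num, _ = quad(lambda eta: ep_pdf(b, eta) * gig_pdf(eta), 0.0, np.inf)
    print(f"b={b}:  numerical={num:.6f}  closed form={epgig_closed(b):.6f}")
```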