The Laplace Transform with Applications in Machine Learning
Zhang Zhihua
July 17, 2013
Outline
- The Laplace Transform
- Completely Monotone and Bernstein Functions
- Positive and Negative Definite Kernels
- Scale Mixture Distributions
The Laplace Transform
Introduction
Backgrounds
- The Laplace transform is a classical mathematical theory and has also received wide applications in science and engineering. The Laplace transform is named after mathematician and astronomer Pierre-Simon Laplace, who used a similar transform (now called the z transform) in his work on probability theory. The current widespread use of the transform came about soon after World War II although it had been used in the 19th century by Abel, Lerch, Heaviside, and Bromwich.
The Laplace Transform
Introduction
Backgrounds
- The Laplace transform is essentially an integral transform. In probability theory and statistics, the Laplace transform is defined as an expectation of a random variable.
- This lecture will provide a detailed illustration of the application of the Laplace transform in machine learning. We will see that the Laplace transform helps us establish a bridge connecting machine learning methods developed from different approaches.
The Laplace Transform
Definition
The Laplace Transform
- Suppose that f is a real- or complex-valued function of the variable t ≥ 0. The Laplace transform of the function f(t) is defined by

  φ(s) = ∫_0^∞ exp(−ts) f(t) dt ≜ lim_{z→∞} ∫_0^z exp(−ts) f(t) dt

  whenever the limit exists. If the limit does not exist, the integral is said to diverge and there is no Laplace transform defined for f. Here s is a real or complex parameter. However, we always assume that f is a real function and s is a real nonnegative number. Usually, the above equation is also called the Laplace integral.
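As a quick illustration (not part of the original slides), the Laplace integral can be approximated numerically. A minimal sketch, assuming NumPy and SciPy are available; the helper name laplace_transform is ours:

```python
import numpy as np
from scipy.integrate import quad

def laplace_transform(f, s):
    """Approximate phi(s) = int_0^inf exp(-t*s) f(t) dt by quadrature."""
    value, _ = quad(lambda t: np.exp(-t * s) * f(t), 0.0, np.inf)
    return value

# f(t) = exp(-t) has the closed-form transform phi(s) = 1/(s+1).
for s in [0.5, 1.0, 2.0]:
    print(s, laplace_transform(lambda t: np.exp(-t), s), 1.0 / (s + 1.0))
```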
The Laplace Transform
Definition
The Laplace Transform
- Suppose that {a_n} is a sequence of real numbers. The Laplace transform is a Dirichlet series of the form

  φ(s) = Σ_{n=1}^∞ a_n exp(−t_n s) = lim_{N→∞} Σ_{n=1}^N a_n exp(−t_n s),

  where the discrete set {t_n} of exponents corresponds to the continuous variable t in the integral.
- Both the Laplace integral and the Dirichlet series can be included in the Laplace-Stieltjes integral:

  φ(s) = ∫_0^∞ exp(−st) dF(t).

  If F(t) is absolutely continuous, it becomes the Laplace integral; if F(t) is a step-function, it is the Dirichlet series.
The Laplace Transform
Definition
The Laplace Transform
- We are especially interested in the case where F is a probability measure concentrated on [0, ∞). In this case, if X is a random variable with probability distribution F, then the Laplace transform is given by the expectation; namely,

  φ(s) = E(exp(−sX)).

  By abuse of language, we also speak of "the Laplace transform of the random variable X." Note that replacing s by −s gives the moment generating function of X.
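Since φ(s) = E(exp(−sX)), the transform of a random variable can also be estimated by Monte Carlo. A minimal sketch (ours, not from the slides), using an Exp(1) variable whose transform 1/(1+s) is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=200_000)   # X ~ Exp(1)

s = 2.0
mc = np.exp(-s * X).mean()      # Monte Carlo estimate of E[exp(-s X)]
exact = 1.0 / (1.0 + s)         # Laplace transform of Exp(1)
print(mc, exact)
```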
The Laplace Transform
Examples
Example 1
- We consider the Laplace transform of a generalized inverse Gaussian (GIG) distribution. The density of the GIG distribution is defined as

  p(X = x) = (α/β)^(γ/2) / (2 K_γ(√(αβ))) · x^(γ−1) exp(−(αx + βx^(−1))/2), x > 0,

  where K_γ(·) represents the modified Bessel function of the second kind with the index γ. We denote this distribution by GIG(γ, β, α). It is well known that its special cases include the gamma distribution Ga(γ, α/2) when β = 0 and γ > 0, the inverse gamma distribution IG(−γ, β/2) when α = 0 and γ < 0, the inverse Gaussian distribution when γ = −1/2, and the hyperbolic distribution when γ = 0.
The Laplace Transform
Examples
Example 1
- It is direct to obtain the Laplace transform of GIG(γ, β, α) as

  φ(s) = K_γ(√((α+2s)β)) / K_γ(√(αβ)) · (α/(α+2s))^(γ/2), s ≥ 0.   (1)

  Especially, the Laplace transform of the inverse Gaussian distribution (i.e., γ = −1/2) is

  φ(s) = exp(−√((α+2s)β) + √(αβ))

  due to K_{1/2}(z) = K_{−1/2}(z) = √(π/(2z)) exp(−z).
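Formula (1) can be checked against direct numerical integration of the GIG density. A sketch assuming SciPy, where scipy.special.kv plays the role of the modified Bessel function K_γ:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv   # modified Bessel function of the second kind

def gig_pdf(x, g, beta, alpha):
    # GIG(g, beta, alpha) density as defined on the slide (g is the index gamma).
    c = (alpha / beta) ** (g / 2) / (2 * kv(g, np.sqrt(alpha * beta)))
    return c * x ** (g - 1) * np.exp(-(alpha * x + beta / x) / 2)

def gig_laplace(s, g, beta, alpha):
    # Formula (1).
    return (kv(g, np.sqrt((alpha + 2 * s) * beta)) / kv(g, np.sqrt(alpha * beta))
            * (alpha / (alpha + 2 * s)) ** (g / 2))

g, beta, alpha, s = -0.5, 1.5, 2.0, 1.0
numeric, _ = quad(lambda x: np.exp(-s * x) * gig_pdf(x, g, beta, alpha), 0.0, np.inf)
print(numeric, gig_laplace(s, g, beta, alpha))
```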
The Laplace Transform
Examples
Example 1
- The Laplace transform of the gamma distribution Ga(γ, 2/α) with shape γ and scale 2/α (i.e., β = 0) is

  φ(s) = [α/(α+2s)]^γ,

  which can also be obtained from (1) by using the fact that lim_{z→0} (2 K_γ(z)/Γ(γ)) (z/2)^γ = 1 for γ > 0.
The Laplace Transform
Examples
Example 2
- Let X follow an α-stable distribution with density

  p(X = x) = (1/(πx)) Σ_{k=1}^∞ (Γ(1+kα)/k!) (−1)^(k+1) x^(−kα) sin(kαπ), x > 0,

  for 0 < α < 1. The α-stable distribution has Laplace transform

  φ(s) = exp(−s^α), s ≥ 0.
The Laplace Transform
Examples
Example 2
- Note that the 1/2-stable distribution becomes the Lévy distribution Lv(0, 1/2), where Lv(u, σ²) means that x follows a Lévy distribution of the form

  p(x) = (σ/√(2π)) (x−u)^(−3/2) exp(−σ²/(2(x−u))), for x ≥ u.

  We can obtain that the Laplace transform of Lv(0, σ²) is exp(−σ(2s)^(1/2)). This shows that exp(−s^(1/2)) is the Laplace transform of the Lévy distribution Lv(0, 1/2).
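A rough numerical check of this claim: assuming the standard representation of a Lv(0, 1/2) variate as 0.5/Z² with Z standard normal (an assumption stated here, not taken from the slides), the empirical Laplace transform should approach exp(−√s):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal(500_000)
X = 0.5 / Z**2            # assumed representation of X ~ Lv(0, 1/2)

for s in [0.5, 1.0, 4.0]:
    print(s, np.exp(-s * X).mean(), np.exp(-np.sqrt(s)))
```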
The Laplace Transform
Examples
Example 3
We now consider several discrete random variables.
- First, let X follow a Poisson distribution of intensity λ. That is,

  Pr(X = k) = (λ^k / k!) e^(−λ), for k = 0, 1, 2, . . .

  The Laplace transform of the Poisson distribution Po(λ) is

  exp[−λ(1 − exp(−s))].
The Laplace Transform
Examples
Example 4
- Second, let the random variable X follow the negative binomial distribution NB(ξ, q) with ξ > 0 and 0 < q < 1. That is, the probability mass function of X is

  Pr(X = k) = (Γ(k+ξ) / (k! Γ(ξ))) q^ξ (1 − q)^k.   (2)

  The Laplace transform is

  φ(s) = q^ξ [1 − (1−q) exp(−s)]^(−ξ), s ≥ 0.

  It is worth noting that for any fixed nonzero constant α, αX also follows the negative binomial distribution NB(ξ, q), but it takes values on {kα : k = 0, 1, . . .}. In this case, the Laplace transform of αX is

  φ(s) = q^ξ [1 − (1−q) exp(−sα)]^(−ξ), s ≥ 0.   (3)
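The transforms (2) and (3) can be verified by truncating the series Σ_k exp(−sαk) Pr(X = k). A small sketch assuming SciPy; the truncation point kmax is an arbitrary choice:

```python
import numpy as np
from scipy.special import gammaln

def nb_pmf(k, xi, q):
    # Pr(X = k) from (2), computed on the log scale for stability.
    logp = gammaln(k + xi) - gammaln(k + 1) - gammaln(xi) + xi * np.log(q) + k * np.log(1 - q)
    return np.exp(logp)

def nb_laplace_series(s, xi, q, alpha=1.0, kmax=2000):
    k = np.arange(kmax)
    return np.sum(np.exp(-s * alpha * k) * nb_pmf(k, xi, q))

xi, q, s, alpha = 2.5, 0.4, 0.7, 1.0
closed_form = q**xi * (1 - (1 - q) * np.exp(-s * alpha)) ** (-xi)   # formula (3)
print(nb_laplace_series(s, xi, q, alpha), closed_form)
```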
The Laplace Transform
Properties
Uniqueness
Theorem (Uniqueness)
Distinct probability distributions have distinct Laplace transforms.
The uniqueness holds only with respect to probability measures. For general functions, the statement need no longer hold.
The Laplace Transform
Properties
Continuity
Theorem
Let F_n be a probability distribution with Laplace transform φ_n, for n = 1, 2, . . .
If F_n → F, where F is a possibly defective distribution with Laplace transform φ, then φ_n(s) → φ(s) for s > 0.
Conversely, if the sequence {φ_n(s)} converges for each s > 0 to a limit φ(s), then φ is the Laplace transform of some possibly defective distribution F such that F_n → F. Furthermore, the limit F is not defective iff φ(s) → 1 as s → 0.
The Laplace Transform
Properties
Example 1’
Let us return to Example 1. As we saw earlier, GIG(γ, β, α) converges to the gamma distribution with shape γ and scale 2/α as β → 0. Theorem 2 implies that

  lim_{β→0} K_γ(√((α+2s)β)) / K_γ(√(αβ)) · (α/(α+2s))^(γ/2) = [α/(α+2s)]^γ.

In fact, the limit can also be obtained from the fact that K_γ(z) ∼ (Γ(γ)/2) (2/z)^γ for γ > 0 as z → 0. Additionally, recall that

  lim_{α→0} exp(−√((α+2s)β) + √(αβ)) = exp(−√(2sβ)),

which implies that the inverse Gaussian distribution GIG(−1/2, β, α) converges to the Lévy distribution Lv(0, β) as α → 0.
The Laplace Transform
Properties
Example 3’
We now consider the random variable αX in Example 3. Let q = ξ/(ξ+λ) and α = ξ/(ξ+1). Taking the limit ξ → ∞, we have

  lim_{ξ→∞} q^ξ [1 − (1−q) exp(−αs)]^(−ξ) = exp[−λ(1 − exp(−s))].

Theorem 2 then shows that lim_{ξ→∞} NB(ξ, ξ/(ξ+λ)) = Po(λ). On the other hand, let α = q. For fixed ξ, we have

  lim_{α→0} q^ξ [1 − (1−q) exp(−αs)]^(−ξ) = [1/(1+s)]^ξ.

We then know that αX converges in distribution to a gamma distribution with shape ξ and scale 1. That is, for any u > 0,

  lim_{α→0} Pr(αX ≤ u) = ∫_0^u (1/Γ(ξ)) t^(ξ−1) e^(−t) dt.
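Both limits can be observed directly from the Laplace transform (3). A small numerical sketch (ours), evaluating the transform of αX for growing ξ and for shrinking α = q:

```python
import numpy as np

def nb_laplace(s, xi, q, alpha):
    # Laplace transform (3) of alpha*X with X ~ NB(xi, q).
    return q**xi * (1 - (1 - q) * np.exp(-s * alpha)) ** (-xi)

s, lam = 1.0, 2.0
# NB(xi, xi/(xi+lam)) -> Po(lam): the transform tends to exp(-lam(1 - exp(-s))).
for xi in [10.0, 100.0, 1000.0]:
    print(xi, nb_laplace(s, xi, xi / (xi + lam), xi / (xi + 1)), np.exp(-lam * (1 - np.exp(-s))))

# With alpha = q -> 0 and fixed xi, the transform tends to (1+s)^(-xi), a Gamma(xi, 1) limit.
xi = 3.0
for q in [0.1, 0.01, 0.001]:
    print(q, nb_laplace(s, xi, q, q), (1 + s) ** (-xi))
```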
The Laplace Transform
Remarks
There are other types of transforms related to the Laplace transform. For example, the Stieltjes transform is

  g(s) = ∫_0^∞ (1/(s + x)) dF(x), s > 0.   (4)

Note that 1/(s+x) = ∫_0^∞ e^(−su) e^(−xu) du. Using the Fubini theorem, we have

  g(s) = ∫_0^∞ e^(−su) ∫_0^∞ e^(−xu) dF(x) du,

which is a double Laplace transform.
Completely Monotone and Bernstein Functions
Properties
Definition
An important application of the Laplace transform lies in the definition of a completely monotone function. Let f ∈ C^∞(0, ∞) with f ≥ 0. We say f is completely monotone if (−1)^n f^(n) ≥ 0 for all n ∈ N, and a Bernstein function if (−1)^n f^(n) ≤ 0 for all n ∈ N. Thus, a nonnegative function f : (0, ∞) → R is a Bernstein function iff f′ is a completely monotone function.
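Complete monotonicity of a concrete smooth function can be probed by checking the sign of (−1)^n f^(n) at a few orders and points. A rough sketch assuming SymPy; this only gives numerical evidence on a finite grid, not a proof:

```python
import sympy as sp

s = sp.symbols('s', positive=True)

def looks_completely_monotone(f, max_order=6, points=(0.1, 0.5, 1.0, 5.0)):
    """Check (-1)^n f^(n)(s) >= 0 for n = 0..max_order at a few sample points."""
    for n in range(max_order + 1):
        deriv = sp.diff(f, s, n)
        for p in points:
            if sp.N(((-1) ** n) * deriv.subs(s, p)) < 0:
                return False
    return True

print(looks_completely_monotone(1 / (1 + s) ** 2))   # True: completely monotone
print(looks_completely_monotone(sp.exp(-s)))         # True: completely monotone
print(looks_completely_monotone(sp.sqrt(s)))         # False: Bernstein, not completely monotone
```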
Completely Monotone and Bernstein Functions
Properties
- For any two completely monotone (or Bernstein) functions f_1(x) and f_2(x), it is clearly seen that a_1 f_1(x) + a_2 f_2(x), where a_1 and a_2 are nonnegative constants, is also completely monotone (or Bernstein).
- Moreover, f_1(x) f_2(x) is also completely monotone when f_1(x) and f_2(x) are completely monotone, while f_1(f_2(x)) is Bernstein when f_1(x) and f_2(x) are Bernstein.
- However, f_1(f_2(x)) is completely monotone if f_1 is completely monotone and f_2 is nonnegative with a completely monotone derivative.
Completely Monotone and Bernstein Functions
Properties
Theorem
Let φ and ψ be functions from (0, ∞) to R, and (x ∧ y ) denote
min(x, y ).
(1) A necessary and sufficient condition that φ is completely monotone is that

  φ(s) = ∫_0^∞ e^(−su) dF(u),

where F(u) is non-decreasing and the integral converges for s ∈ (0, ∞). Furthermore, F(u) is a probability distribution iff φ(0) = 1.
(2) ψ is a Bernstein function iff the mapping s → exp(−tψ(s)) is
completely monotone for all t ≥ 0.
Completely Monotone and Bernstein Functions
Properties
Theorem
Let φ and ψ be functions from (0, ∞) to R, and (x ∧ y ) denote
min(x, y ).
(3) ψ is a Bernstein function iff it admits the representation

  ψ(s) = a + bs + ∫_(0,∞) (1 − exp(−su)) ν(du),   (5)

where a, b ≥ 0 and ν is a Lévy measure on (0, ∞) such that ∫_(0,∞) (1 ∧ u) ν(du) < ∞. Furthermore, the triplet (a, b, ν) uniquely determines ψ.
Completely Monotone and Bernstein Functions
Properties
Remarks
- The representation in (5) is also well known as the Lévy-Khintchine formula. This formula implies that lim_{s→0+} ψ(s) = a and lim_{s→∞} ψ(s)/s = lim_{s→∞} ψ′(s) = b. Let ν be a Borel measure defined on R^d − {0} = {x ∈ R^d : x ≠ 0}. We call ν a Lévy measure if

  ∫_{R^d−{0}} (‖z‖₂² ∧ 1) ν(dz) < ∞.   (6)
Completely Monotone and Bernstein Functions
Examples
Example 5
- The following elementary functions are completely monotone, namely,

  1/(a + s)^ν, e^(−bs), and ln(1 + c/s),

  where a ≥ 0, b ≥ 0, ν ≥ 0 and c > 0.
- The following functions are then Bernstein:

  s^α, 1 − exp(−βs), and log(1 + γs),

  where α ∈ (0, 1], β > 0, γ > 0. Consider that

  s/(1+s) = 1 − exp(−log(1 + s)),

  which is also Bernstein based on the composition property.
Completely Monotone and Bernstein Functions
Examples
Example 6
- We now have from Theorem 4 that e^(−bs) can be represented as the Laplace transform of some probability distribution, but ln(1 + c/s) cannot, since it does not tend to 1 as s → 0. Moreover, the functions φ given in Examples 1, 2 and 3 are completely monotone.
Completely Monotone and Bernstein Functions
Examples
Example 7
- It is immediate that ψ(s) = s^α for α ∈ (0, 1) is a Bernstein function of s in (0, ∞). Moreover, it is also directly verified that

  s^α = (α/Γ(1−α)) ∫_0^∞ (1 − e^(−su)) du/u^(1+α),

  which implies that

  ν(du) = (α/Γ(1−α)) du/u^(1+α).

  Obviously, ψ(s) = s is also Bernstein. It is an extreme case, because we have a = 0, b = 1 and ν(du) = δ_0(u) du (the Dirac delta measure).
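The integral representation of s^α can be confirmed numerically. A sketch assuming SciPy; the splitting of the integral at u = 1 is just a convenience for the quadrature routine:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def frac_power_via_levy(s, alpha):
    # (alpha / Gamma(1-alpha)) * int_0^inf (1 - exp(-s*u)) u^(-(1+alpha)) du
    integrand = lambda u: (1.0 - np.exp(-s * u)) * u ** (-(1.0 + alpha))
    value = quad(integrand, 0.0, 1.0)[0] + quad(integrand, 1.0, np.inf)[0]
    return alpha / gamma(1.0 - alpha) * value

for s, alpha in [(2.0, 0.5), (3.0, 0.3)]:
    print(frac_power_via_levy(s, alpha), s ** alpha)
```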
Completely Monotone and Bernstein Functions
Examples
Example 8
We consider the following function for γ > 0:

  ψ_α(s) = log(1 + γs)                                  if α = 0,
  ψ_α(s) = (1/α) [1 − (1 + (1−α)γs)^(−α/(1−α))]         if α ∈ (0, 1),
  ψ_α(s) = 1 − exp(−γs)                                 if α = 1.

This function is Bernstein for any α ∈ [0, 1]. In this case, we have a = 0 and b = 0. The corresponding Lévy measure is

  ν(du) = (1/u) exp(−u/γ) du                                                              if α = 0,
  ν(du) = (γ [(1−α)γ]^(−1/(1−α)) / Γ(1/(1−α))) u^(α/(1−α) − 1) exp(−u/((1−α)γ)) du        if α ∈ (0, 1),
  ν(du) = δ_γ(u) du                                                                       if α = 1.
Positive and Negative Definite Kernels
Introduction
- Machine learning methods based on reproducing kernel theory have been applied with great success and have become a classical tool in machine learning and data mining. There is an equivalence between a reproducing kernel Hilbert space and a positive definite kernel. Here we exploit completely monotone functions in the study of positive and negative definite kernels.
Positive and Negative Definite Kernels
Definition
Let X ⊂ R^d be a nonempty set. A function k : X×X → R is said to be symmetric if k(x, y) = k(y, x) for all x and y ∈ X. A symmetric function k : X×X → R is called a positive definite kernel iff

  Σ_{i,j=1}^m a_i a_j k(x_i, x_j) ≥ 0

for all m ∈ N, {x_1, . . . , x_m} ⊂ X and {a_1, . . . , a_m} ⊂ R.
Positive and Negative Definite Kernels
Definition
Let X ⊂ R^d be a nonempty set. A function k : X×X → R is said to be symmetric if k(x, y) = k(y, x) for all x and y ∈ X. A symmetric function k is called a conditionally positive definite kernel iff

  Σ_{i,j=1}^m a_i a_j k(x_i, x_j) ≥ 0

for all m ≥ 2, {x_1, . . . , x_m} ⊂ X and {a_1, . . . , a_m} ⊂ R with Σ_{j=1}^m a_j = 0.
Positive and Negative Definite Kernels
Definition
Remarks
- It is worth pointing out that if X = {x_1, . . . , x_m} is a finite set, then k is positive definite iff the m×m matrix K = [k(x_i, x_j)] is positive semidefinite.
- We call a kernel k negative definite if −k is conditionally positive definite.
Positive and Negative Definite Kernels
Properties
Theorem
Let µ be a probability measure on the half-line R+ such that 0 < ∫_0^∞ u dµ(u) < ∞. Then k : X×X → R+ is negative definite iff the Laplace transform of tk (i.e., ∫_0^∞ exp(−utk) dµ(u)) is positive definite for all t > 0.
Taking µ(du) = δ_1(u) du, we obtain the following theorem, which illustrates a connection between positive and negative definite kernels.
Theorem (Schoenberg correspondence)
For a function k : R^d → R, k is negative definite iff exp(−tk) is positive definite for all t > 0.
Positive and Negative Definite Kernels
Definition
In particular, we are interested in the case where k(x_i, x_j) = ψ(x_i − x_j). In this case, we say that ψ : X → R is positive definite if the kernel k(x_i, x_j) ≜ ψ(x_i − x_j) is positive definite, and ψ is conditionally positive definite if k(x_i, x_j) ≜ ψ(x_i − x_j) is conditionally positive definite. In this regard, ψ : X → R is said to be symmetric if ψ(−x) = ψ(x) for all x ∈ X. Moreover, we always have ψ(0) = ψ(x − x) ≥ 0 if ψ is positive definite.
Positive and Negative Definite Kernels
Examples
Example 9
Consider the function ψ(x) = ‖x‖₂². Let x_1, . . . , x_m ∈ R^d, and let A = [a_ij] be the m×m matrix with a_ij = ‖x_i−x_j‖₂². Given any real numbers β_1, . . . , β_m such that Σ_{j=1}^m β_j = 0, we have

  Σ_{i,j=1}^m β_i β_j ‖x_i−x_j‖₂² = Σ_{i,j=1}^m β_i β_j (‖x_i‖₂² + ‖x_j‖₂² − 2 x_iᵀ x_j)
                                  = −2 Σ_{i,j=1}^m (β_i x_i)ᵀ(β_j x_j) = −2 ‖Σ_{i=1}^m β_i x_i‖₂² ≤ 0,

which implies that ‖x − y‖₂² is negative definite. It then follows from Theorem 8 that exp(−t‖x − y‖₂²) for all t > 0 is positive definite; that is, the Gaussian RBF kernel is positive definite.
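The conclusion can be illustrated empirically: the eigenvalues of a Gaussian RBF kernel matrix built on arbitrary points should be nonnegative. A minimal sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))     # 50 arbitrary points in R^3
t = 0.7

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # ||x_i - x_j||_2^2
K = np.exp(-t * sq_dists)                                   # Gaussian RBF kernel matrix

print(np.linalg.eigvalsh(K).min())   # nonnegative up to rounding error
```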
Positive and Negative Definite Kernels
Examples
Example 10
Consider the function ψ(x) = ‖x‖₁ = Σ_{j=1}^p |x_j|. Note that

  exp(−t|x|) = ∫_0^∞ exp(−t²u|x|²) Lv(u|0, 1/2) du   (7)

for all t > 0. It then follows from the negative definiteness of |x − y|² and Theorem 7 that exp(−t|x − y|) is positive definite for all t > 0. Again using Theorem 8, we obtain that |x − y| is negative definite. Accordingly, we have that ‖x − y‖₁ (= Σ_{j=1}^p |x_j − y_j|) is negative definite and that exp(−t‖x − y‖₁) for all t > 0 is positive definite.
Positive and Negative Definite Kernels
Distance Functions
It is well known that a positive definite kernel induces a kernel matrix, which is used in nonlinear machine learning methods such as kernel principal component analysis, generalized Fisher discriminant analysis, and spectral clustering. We now show that a negative definite kernel can in turn define a Euclidean distance matrix, which has been used in multidimensional scaling (MDS).
Positive and Negative Definite Kernels
Distance Functions
Distance Matrices
Definition
An m×m matrix D = [d_rs] is called a distance matrix if d_rs = d_sr, d_rr = 0 and d_rs ≥ 0. An m×m distance matrix D = [d_rs] is said to be Euclidean if there exists a configuration of points in some Euclidean space whose interpoint distances are given by D; that is, if for some p ∈ N, there exist points z_1, . . . , z_m ∈ R^p such that

  d_rs² = ‖z_r−z_s‖₂² = (z_r−z_s)ᵀ(z_r−z_s).
Positive and Negative Definite Kernels
Distance Functions
Main Result
Theorem
Let D = [d_rs] be an m×m distance matrix and define B = H_m A H_m, where H_m = I_m − (1/m) 1_m 1_mᵀ (the centering matrix) and A = [a_rs] with a_rs = −(1/2) d_rs². Then D is Euclidean iff B is positive semidefinite.
Clearly, if A is conditionally positive semidefinite, then H_m A H_m is positive semidefinite. Thus, we directly have the following corollary.
Corollary
Let g(s) be Bernstein on (0, ∞) with g(0+) = 0. Then for any vectors x_1, . . . , x_m ∈ R^d, the matrix [√(g(‖x_i−x_j‖_α^α))] for any α ∈ {1, 2} is a Euclidean distance matrix.
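The double centering in the theorem above is exactly the first step of classical MDS. A minimal sketch (ours, assuming NumPy) that builds B = H_m A H_m from squared Euclidean distances, confirms it is positive semidefinite, and recovers a configuration reproducing the distances:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 2))                       # true configuration in R^2
D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # squared Euclidean distances

m = D2.shape[0]
H = np.eye(m) - np.ones((m, m)) / m                   # centering matrix H_m
B = H @ (-0.5 * D2) @ H                               # B = H_m A H_m with a_rs = -d_rs^2 / 2

w, V = np.linalg.eigh(B)
print(w.min())                                        # >= 0 up to rounding: D is Euclidean

# Classical MDS: embed using the positive eigenvalues.
idx = w > 1e-10
Y = V[:, idx] * np.sqrt(w[idx])
recovered = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
print(np.allclose(recovered, D2))                     # interpoint distances are reproduced
```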
Positive and Negative Definite Kernels
Distance Functions
Example 11
The functions √(1+s) − 1, log(1+s) and 1 − exp(−s) are Bernstein on (0, ∞) and equal to zero at the origin. Thus, for any distinct vectors x_1, . . . , x_m ∈ R^d, the matrices [√(1+‖x_i−x_j‖₁) − 1] and [√(1+‖x_i−x_j‖₂²) − 1], [log(1+‖x_i−x_j‖₁)] and [log(1+‖x_i−x_j‖₂²)], and [1 − exp(−‖x_i−x_j‖₁)] and [1 − exp(−‖x_i−x_j‖₂²)] are negative semidefinite and nonsingular. Denote these matrices by A. Then −H_m A H_m is a Euclidean distance matrix. Furthermore, the rank of −H_m A H_m is m−1. This implies that p = m − 1 (the dimension of the Euclidean space).
Scale Mixture Distributions
Definition
Introduction
We consider the application of completely monotone functions in
scale mixture distributions, which in turn have wide use in applied
and computational statistics. We particularly study scale mixtures
of exponential power (EP) distributions.
Scale Mixture Distributions
Definition
Exponential Power (EP) Distributions
A univariate random variable X ∈ R is said to follow an EP
distribution if the density is specified by
  p(X = x) = (η^(−1/q) / (2^((q+1)/q) Γ((q+1)/q))) exp(−|x−u|^q/(2η)) = (q (2η)^(−1/q) / (2 Γ(1/q))) exp(−|x−u|^q/(2η)),
with η > 0. It is typically assumed that 0 < q ≤ 2. The
distribution is denoted by EP(x|u, η, q). There are two classical
special cases: the Gaussian distribution arises when q = 2 (denoted
N(x|u, η)) and the Laplace distribution arises when q = 1 (denoted
L(x|u, η)). As for the case that 0 < q < 2, the corresponding
density is called a bridge distribution. Moreover, we will only
consider the setting that u = 0.
Scale Mixture Distributions
Definition
Scale Mixture of EP Distributions
Let τ = 1/(2η) be a random variable with probability density p(τ ).
The marginal distribution of x is
  f_X(x) = ∫_0^∞ EP(x|0, 1/(2τ), q) p(τ) dτ.
We call the resulting distribution a scale mixture of exponential
power distributions. This distribution is symmetric about 0 and
can be denoted as φ(|x|q ). Especially, when q = 2, it is a scale
mixture of normal distributions. When q = 1, it is called a scale
mixture of Laplace distributions.
Scale Mixture Distributions
Definition
Scale Mixture of EP Distributions
Let s = |x|^q. Then we can write f_X(x) as

  ∫_0^∞ (τ^(1/q) / (2 Γ((q+1)/q))) exp(−τs) p(τ) dτ,   (8)

which we now denote by φ(s). Clearly, φ(s) is the Laplace transform of (τ^(1/q) / (2 Γ((q+1)/q))) p(τ). This implies that φ(s) is completely monotone on (0, ∞).
Scale Mixture Distributions
Main Result
Theorem
Let X have a density function f_X(x), which is a function of |x|^q with 0 < q ≤ 2 (denoted φ(|x|^q)). Then φ(|x|^q) can be represented as a scale mixture of exponential power distributions iff φ(s) is completely monotone on (0, ∞).
Scale Mixture Distributions
Examples
The Laplace Distribution
The marginal density of b is thus defined by

  (√α/2) exp(−√α |b|) = ∫_0^∞ N(b|0, η) Ga(η|1, α/2) dη.
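This identity can be verified by numerical integration over η. A sketch assuming SciPy and reading Ga(η|1, α/2) in the shape-rate convention used for the GIG special cases above (our reading, stated as an assumption):

```python
import numpy as np
from scipy.integrate import quad

def gaussian_scale_mixture(b, alpha):
    # int_0^inf N(b|0, eta) * Ga(eta|1, alpha/2) d_eta, with Ga in shape-rate form
    integrand = lambda eta: (np.exp(-b**2 / (2 * eta)) / np.sqrt(2 * np.pi * eta)
                             * (alpha / 2) * np.exp(-alpha * eta / 2))
    return quad(integrand, 0.0, np.inf)[0]

alpha, b = 2.0, 1.3
laplace_density = np.sqrt(alpha) / 2 * np.exp(-np.sqrt(alpha) * abs(b))
print(gaussian_scale_mixture(b, alpha), laplace_density)
```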
Scale Mixture Distributions
Examples
The Bridge Distribution
The marginal density of b is thus defined by

  (α/4) exp(−√(α|b|)) = ∫_0^∞ L(b|0, η) Ga(η|3/2, α/2) dη.
Scale Mixture Distributions
Examples
Generalized t distributions
We now consider scale mixtures of the exponential power EP(b|u, η, q) with the inverse gamma IG(η|τ/2, τ/(2λ)) as the mixing distribution. They are called generalized t distributions. The density is

  (q Γ(τ/2 + 1/q) / (2 Γ(τ/2) Γ(1/q))) (λ/τ)^(1/q) [1 + (λ/τ)|b|^q]^(−(τ/2 + 1/q)) = ∫ EP(b|0, η, q) IG(η|τ/2, τ/(2λ)) dη,

where τ > 0, λ > 0 and q > 0. Clearly, when q = 2 the generalized t distribution becomes a t-distribution. Moreover, when τ = 1 (and q = 2), it is the Cauchy distribution.
Scale Mixture Distributions
Examples
Generalized Double Pareto Distributions
When q = 1, the resulting distributions are also called generalized double Pareto distributions. The densities are given as follows:

  (λ/4) (1 + λ|b|/τ)^(−(τ/2+1)) = ∫_0^∞ L(b|0, η) IG(η|τ/2, τ/(2λ)) dη, λ > 0, τ > 0.

Furthermore, consider τ = 1, so that η ∼ IG(η|1/2, 1/(2λ)). We obtain

  p(b) = (λ/4) (1 + λ|b|)^(−3/2).
Scale Mixture Distributions
Examples
EP-GIG
We now develop a family of distributions by mixing the exponential
power EP(b|0, η, q) with the generalized inverse Gaussian
GIG(η|γ, β, α). The marginal density of b is thus defined by
  p(b) = ∫_0^∞ EP(b|0, η, q) GIG(η|γ, β, α) dη.
We refer to this distribution as the EP-GIG. The density is
  p(b) = (α^(1/(2q)) K_{γ−1/q}(√(α(β+|b|^q))) / (2^((q+1)/q) Γ((q+1)/q) β^(γ/2) K_γ(√(αβ)))) [β+|b|^q]^((γq−1)/(2q)).
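The EP-GIG density can be cross-checked by mixing the EP and GIG densities numerically. A sketch assuming SciPy, with the densities written as on the earlier slides; the closed form below is our transcription of the expression above:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv, gamma

def ep_pdf(x, eta, q):
    # EP(x|0, eta, q) density from the earlier slide.
    return (eta ** (-1 / q) / (2 ** ((q + 1) / q) * gamma((q + 1) / q))
            * np.exp(-abs(x) ** q / (2 * eta)))

def gig_pdf(eta, g, beta, alpha):
    # GIG(g, beta, alpha) density (g is the index gamma).
    c = (alpha / beta) ** (g / 2) / (2 * kv(g, np.sqrt(alpha * beta)))
    return c * eta ** (g - 1) * np.exp(-(alpha * eta + beta / eta) / 2)

def ep_gig_closed_form(b, q, g, beta, alpha):
    t = beta + abs(b) ** q
    return (alpha ** (1 / (2 * q)) * kv(g - 1 / q, np.sqrt(alpha * t))
            / (2 ** ((q + 1) / q) * gamma((q + 1) / q) * beta ** (g / 2) * kv(g, np.sqrt(alpha * beta)))
            * t ** ((g * q - 1) / (2 * q)))

b, q, g, beta, alpha = 0.8, 1.0, 1.5, 1.0, 2.0
numeric, _ = quad(lambda eta: ep_pdf(b, eta, q) * gig_pdf(eta, g, beta, alpha), 0.0, np.inf)
print(numeric, ep_gig_closed_form(b, q, g, beta, alpha))
```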
Thanks
Questions & Comments!