Lecture 8

1 Poisson Limit Theorem

Lindeberg's Theorem explains why the Gaussian distribution is ubiquitous: whenever we take the sum of a large number of independent random variables $S_n := X_{n,1} + X_{n,2} + \cdots + X_{n,n}$, centered and normalized so that $S_n$ has mean zero and variance 1, the distribution of $S_n$ will be close to the standard normal, provided that $X_{n,1}, \dots, X_{n,n}$ are uniformly small in the sense that they satisfy Lindeberg's condition:
$$\forall\, \epsilon > 0, \qquad \lim_{n\to\infty} \sum_{i=1}^{n} E\big[X_{n,i}^2\, 1_{\{|X_{n,i}| > \epsilon\}}\big] = 0.$$
The typical example is $X_{n,i} = X_i/\sqrt{n}$ for a sequence of i.i.d. random variables $(X_i)_{i\in\mathbb{N}}$.

Exercise 1.1 Show that the following Lyapunov condition implies Lindeberg's condition:
$$\lim_{n\to\infty} \sum_{i=1}^{n} E\big[|X_{n,i}|^p\big] = 0 \quad \text{for some } p > 2.$$

Another distribution, which is as ubiquitous as the Gaussian distribution, is the Poisson distribution. Recall that the Poisson distribution with parameter $\lambda > 0$ is the probability measure $\mu$ on the non-negative integers with $\mu(n) = e^{-\lambda}\frac{\lambda^n}{n!}$ for $n \in \{0\} \cup \mathbb{N}$. Its mean is $\lambda$, its variance is also $\lambda$, and its characteristic function is $\phi(t) = e^{-\lambda(1-e^{it})}$. As the next theorem will show, the Poisson distribution typically arises as the limit of the sum of indicators of independent events, each occurring with a small probability.

Theorem 1.2 [Poisson Limit Theorem] Let $(X_{n,i})_{1\le i\le n}$ be a triangular array of independent $\{0,1\}$-valued random variables, with $p_{n,i} := P(X_{n,i}=1) = 1 - P(X_{n,i}=0)$. Suppose that for some $\lambda > 0$, $\sum_{i=1}^{n} p_{n,i} \to \lambda$ and $\max_{1\le i\le n} p_{n,i} \to 0$ as $n\to\infty$. Then $S_n := \sum_{i=1}^{n} X_{n,i}$ converges in distribution to a Poisson random variable $Z$ with parameter $\lambda$.

Proof. It suffices to show that the characteristic function converges, i.e., $E[e^{itS_n}] \to e^{-\lambda(1-e^{it})}$ as $n\to\infty$. Note that by Taylor expansion,
$$\log E[e^{itS_n}] = \log \prod_{j=1}^{n} E[e^{itX_{n,j}}] = \sum_{j=1}^{n} \log\big(1 - p_{n,j} + p_{n,j}e^{it}\big) = \sum_{j=1}^{n} \Big(-p_{n,j}(1-e^{it}) + O(p_{n,j}^2)\Big),$$
which converges to $-\lambda(1-e^{it})$ as $n\to\infty$ by our assumptions on $p_{n,j}$.

Theorem 1.2 can be readily extended to

Corollary 1.3 Let $(X_{n,i})_{1\le i\le n}$ be a triangular array of independent $\mathbb{N}_0 := \{0\}\cup\mathbb{N}$-valued random variables, with $p_{n,i} := P(X_{n,i}=1)$ and $\epsilon_{n,i} := P(X_{n,i}\ge 2)$. Suppose that as $n\to\infty$,

(i) $\sum_{i=1}^{n} p_{n,i} \to \lambda$ for some $\lambda > 0$,

(ii) $\max_{1\le i\le n} p_{n,i} \to 0$,

(iii) $\sum_{i=1}^{n} \epsilon_{n,i} \to 0$.

Then $S_n := \sum_{i=1}^{n} X_{n,i}$ converges in distribution to a Poisson random variable $Z$ with mean $\lambda$.

Proof. Let $\tilde{X}_{n,i} := X_{n,i}\, 1_{\{X_{n,i}\le 1\}}$. Then $(\tilde{X}_{n,i})_{1\le i\le n}$ satisfies the conditions in Theorem 1.2, and hence $\tilde{S}_n := \sum_{i=1}^{n} \tilde{X}_{n,i}$ converges in distribution to a Poisson random variable $Z$ with parameter $\lambda$. On the other hand,
$$P(S_n \ne \tilde{S}_n) = P\Big(\sum_{i=1}^{n} 1_{\{X_{n,i}\ge 2\}} \ne 0\Big) \le \sum_{i=1}^{n} P(X_{n,i}\ge 2) = \sum_{i=1}^{n} \epsilon_{n,i} \xrightarrow[n\to\infty]{} 0.$$
Therefore $S_n - \tilde{S}_n$ converges in probability to 0, and hence $S_n$ also converges in distribution to $Z$.

Let us consider a few examples which explain why the Poisson distribution is universal.

Example 1.4 If there are 400 students in a class, then it is not unreasonable to model their birthdays by i.i.d. random variables, each chosen uniformly among the 365 days of the year. The expected number of students whose birthday falls on the day of the final exam is $\lambda := 400/365 \approx 1.096$, and the distribution of the number of such students is approximately Poisson with mean 1.096. In particular, the probability of having no student with a birthday on the day of the final exam is approximately $e^{-\lambda} = e^{-1.096} \approx 0.334$.
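The numbers in Example 1.4 are easy to check numerically. Below is a minimal sketch in Python (assuming NumPy is available; the trial count of 100,000 is an arbitrary choice) comparing the empirical probability of having no student born on the exam day with the Poisson approximation $e^{-\lambda}$ and the exact binomial value.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_days, n_trials = 400, 365, 100_000

# Each trial draws 400 i.i.d. uniform birthdays and counts how many fall
# on the exam day; the count is Binomial(400, 1/365), which Theorem 1.2
# says is approximately Poisson(400/365).
counts = rng.binomial(n_students, 1 / n_days, size=n_trials)

lam = n_students / n_days
print("empirical P(no match) :", np.mean(counts == 0))
print("Poisson approximation :", np.exp(-lam))                    # ~0.334
print("exact binomial value  :", (1 - 1 / n_days) ** n_students)
```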
Example 1.5 [Counting Rare Events] Let $(X_i)_{i\in\mathbb{N}}$ be i.i.d. real-valued random variables with distribution $\mu(dx) = f(x)\,dx$ for some density $f$. Let $a_n$ be an increasing sequence chosen such that $P(X_1 > a_n) = \lambda/n$. Then by Theorem 1.2, the number of $X_i$, $1\le i\le n$, which exceed $a_n$ converges in distribution to a Poisson random variable with parameter $\lambda$.

The general principle is that, among a large collection of independent events, if a notion of rare event is defined in such a way that the expected number of rare events is of order 1, then the number of rare events will follow approximately a Poisson distribution.
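To illustrate Example 1.5 concretely, take the $X_i$ to be standard exponentials, so that $P(X_1 > a_n) = e^{-a_n}$, and the choice $a_n = \log(n/\lambda)$ gives $P(X_1 > a_n) = \lambda/n$. The sketch below (the values $n = 10{,}000$, $\lambda = 2$, and the trial count are arbitrary choices) compares the empirical law of the number of exceedances with the Poisson($\lambda$) probabilities.

```python
import numpy as np
from math import exp, factorial, log

rng = np.random.default_rng(1)
n, lam, n_trials = 10_000, 2.0, 5_000
a_n = log(n / lam)  # chosen so that P(X_1 > a_n) = e^{-a_n} = lam/n

# For each trial, count how many of n i.i.d. Exp(1) samples exceed a_n.
exceedances = np.array(
    [(rng.exponential(size=n) > a_n).sum() for _ in range(n_trials)]
)

# Compare with the Poisson(lam) probability mass function.
for k in range(6):
    empirical = np.mean(exceedances == k)
    poisson = exp(-lam) * lam**k / factorial(k)
    print(f"k={k}: empirical {empirical:.4f}  vs  Poisson {poisson:.4f}")
```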
Example 1.6 [Customers Arriving in a Queue] Let $N(s,t)$ be the number of customers arriving in a queue (at a bank or store) during the time interval $(s,t]$. Let us make the following assumptions:

(i) The numbers of arrivals in disjoint time intervals are independent,

(ii) the law of $N(s,t)$ depends only on $t-s$,

(iii) $P(N(0,h)=1) = \lambda h + o(h)$, i.e., $\lim_{h\downarrow 0} \frac{P(N(0,h)=1)}{h} = \lambda$,

(iv) $P(N(0,h)\ge 2) = o(h)$, i.e., $\lim_{h\downarrow 0} \frac{P(N(0,h)\ge 2)}{h} = 0$.

Then for each $t>0$, $N(0,t) = \sum_{j=1}^{n} X_{n,j}$ with $X_{n,j} := N\big(\frac{(j-1)t}{n}, \frac{jt}{n}\big) \stackrel{\mathrm{dist}}{=} N(0, t/n)$, which satisfies the conditions in Corollary 1.3 with $\sum_{j=1}^{n} P(N(0,t/n)=1) \to \lambda t$, and hence $N(0,t)$ is distributed as a Poisson random variable with mean $\lambda t$. In fact the family of random variables $(N(0,t))_{t\ge 0}$ defines a so-called Poisson process.

2 Poisson Process

Definition 2.1 [Poisson Process] A family of random variables $(N_t)_{t\ge 0}$ is called a Poisson process with rate $\lambda$ if

(i) $N_0 = 0$,

(ii) for $t_0 = 0 < t_1 < \cdots < t_m$, $(N_{t_k} - N_{t_{k-1}})_{1\le k\le m}$ are independent random variables,

(iii) for any $0\le s<t$, $N_t - N_s$ is a Poisson random variable with mean $\lambda(t-s)$,

(iv) almost surely, $N_t$ is right continuous in $t\ge 0$.

Conditions (i)–(iii) determine the finite-dimensional distributions of $(N_t)_{t\ge 0}$, and hence the distribution of the joint realizations of $N_t$ for all rational $t$; condition (iv) then uniquely determines $N_t$ for irrational $t$.

An immediate consequence of the definition of the Poisson process is that it is Markov, namely that for any $t_0 \ge 0$, $(N_{t_0+t} - N_{t_0})_{t\ge 0}$ is again a Poisson process, which is in particular independent of $(N_t)_{0\le t\le t_0}$.

We can interpret $N_t$ as the number of customers that have arrived in a queue by time $t$. We show next that almost surely, no two customers arrive at the same time. More precisely, let $N_{t+} := \lim_{h\downarrow 0} N_{t+h}$ and $N_{t-} := \lim_{h\downarrow 0} N_{t-h}$. Then we claim that almost surely, $N_{t+} - N_{t-} \in \{0,1\}$ for all $t\ge 0$. Indeed, for any deterministic $s\ge 0$,
$$P(N_{s+} - N_{s-} \ge 1) \le \lim_{h\downarrow 0} P(N_{s+h} - N_{s-h} \ge 1) = 0,$$
and hence a.s., $N_{(qt)+} - N_{(qt)-} = 0$ for all rational $q$. Therefore modulo a set of probability 0,
$$\{N_{s+} - N_{s-} \ge 2 \text{ for some } 0\le s\le t\} \subset \{N_{jt/n} - N_{(j-1)t/n} \ge 2 \text{ for some } 1\le j\le n\} \quad \forall\, n\in\mathbb{N},$$
and hence
$$P(N_{s+} - N_{s-} \ge 2 \text{ for some } 0\le s\le t) \le P(N_{jt/n} - N_{(j-1)t/n} \ge 2 \text{ for some } 1\le j\le n) \le n\, P(N_{t/n} - N_0 \ge 2) = n e^{-\lambda t/n} \sum_{k=2}^{\infty} \frac{\lambda^k t^k}{k!\, n^k} \xrightarrow[n\to\infty]{} 0.$$
This shows that almost surely no two customers arrive at the same time. Therefore an alternative way of characterizing the process $(N_t)_{t\ge 0}$ is to identify the distribution of the set of times $s$ with $N_{s+} - N_{s-} = 1$, i.e., the times at which a customer arrives. If $\xi_1$ denotes the arrival time of the first customer, then we note that $P(\xi_1 > t) = P(N_t = 0) = e^{-\lambda t}$, which is an exponential distribution with mean $1/\lambda$. It is then natural to conjecture that the differences between the consecutive arrival times of customers are i.i.d. exponentially distributed with mean $1/\lambda$. The next result shows this to be indeed the case.

Theorem 2.2 [Construction of a Poisson Process] Let $(\xi_i)_{i\in\mathbb{N}}$ be i.i.d. exponential random variables with mean $1/\lambda$, i.e., $P(\xi_1 > t) = e^{-\lambda t}$ for $t\ge 0$. Let $T_0 := 0$ and $T_n := \sum_{i=1}^{n} \xi_i$ for $n\in\mathbb{N}$, and let $N_t := \max\{n\ge 0 : T_n \le t\}$. Then $(N_t)_{t\ge 0}$ is a rate $\lambda$ Poisson process as defined in Definition 2.1.

Proof. From the definition of $N_t$, it is clear that $N_0 = 0$ and $N_t$ is a.s. right continuous. Instead of verifying Definition 2.1 (ii) and (iii) through a direct calculation, we will approximate $N_t$ by a discrete time counting process, for which the equivalence of the two ways of characterizing the process is self-evident: either via the finite-dimensional distributions as in Definition 2.1, or via the inter-arrival times of customers as in Theorem 2.2.

Given $n\in\mathbb{N}$, let $(X_i^{(n)})_{i\in\mathbb{N}}$ be i.i.d. Bernoulli random variables with $P(X_1^{(n)}=1) = 1 - P(X_1^{(n)}=0) = \lambda/n$, where $X_i^{(n)}$ serves to approximate $N_{i/n} - N_{(i-1)/n}$. Let $S_j^{(n)} = \sum_{i=1}^{j} X_i^{(n)}$, and for $t\ge 0$, define $S_{nt}^{(n)} := S_{\lfloor nt\rfloor}^{(n)}$, which we will show to converge to $N_t$ as $n\to\infty$.

Let $\tau_0^{(n)} := 0$, and for $k\in\mathbb{N}$, let $\tau_k^{(n)} := \min\{i > \tau_{k-1}^{(n)} : X_i^{(n)} = 1\}$. It is clear that $(\tau_i^{(n)} - \tau_{i-1}^{(n)})_{i\in\mathbb{N}}$ are i.i.d. geometrically distributed random variables with mean $n/\lambda$, and
$$P(\tau_1^{(n)} = j) = \Big(1 - \frac{\lambda}{n}\Big)^{j-1} \frac{\lambda}{n}, \qquad j\in\mathbb{N}.$$
In particular, for any $t>0$,
$$P(\tau_1^{(n)} \ge nt) = \sum_{j=\lceil nt\rceil}^{\infty} \Big(1-\frac{\lambda}{n}\Big)^{j-1} \frac{\lambda}{n} = \Big(1-\frac{\lambda}{n}\Big)^{\lceil nt\rceil - 1} \xrightarrow[n\to\infty]{} e^{-\lambda t} = P(\xi_1 \ge t).$$
In other words, $\big(\frac{\tau_i^{(n)} - \tau_{i-1}^{(n)}}{n}\big)_{i\in\mathbb{N}}$ converge jointly in distribution to $(\xi_i)_{i\in\mathbb{N}}$. Using Skorohod's representation theorem for weak convergence, we can couple $\big(\frac{\tau_i^{(n)} - \tau_{i-1}^{(n)}}{n}\big)_{i\in\mathbb{N}}$ and $(\xi_i)_{i\in\mathbb{N}}$ on the same probability space such that a.s. $\frac{\tau_i^{(n)} - \tau_{i-1}^{(n)}}{n} \to \xi_i$ for each $i\in\mathbb{N}$. For any fixed $0 = t_0 < t_1 < \cdots < t_m$, by our construction of $N_t$ and $S_{nt}^{(n)}$, it then follows that $S_{nt_k}^{(n)} \to N_{t_k}$ a.s. for each $1\le k\le m$, and hence $(S_{nt_k}^{(n)} - S_{nt_{k-1}}^{(n)})_{1\le k\le m}$ converges in joint distribution to $(N_{t_k} - N_{t_{k-1}})_{1\le k\le m}$.

On the other hand, given $0 = t_0 < t_1 < \cdots < t_m$, when $n$ is sufficiently large, we have $\lfloor nt_{k-1}\rfloor < \lfloor nt_k\rfloor$ for all $1\le k\le m$, which makes $(S_{nt_k}^{(n)} - S_{nt_{k-1}}^{(n)})_{1\le k\le m}$ independent random variables, and hence the limits $(N_{t_k} - N_{t_{k-1}})_{1\le k\le m}$ are also independent random variables. Furthermore, for each $s<t$, $S_{nt}^{(n)} - S_{ns}^{(n)} = \sum_{i=\lfloor ns\rfloor + 1}^{\lfloor nt\rfloor} X_i^{(n)}$ converges in distribution to a Poisson random variable with mean $\lambda(t-s)$ by Theorem 1.2, which must be the distribution of $N_t - N_s$. Therefore $(N_t)_{t\ge 0}$ satisfies the conditions in Definition 2.1 and is a Poisson process.

Exercise 2.3 The fact that for any $t_0 > 0$, $(N_{t_0+t} - N_{t_0})_{t\ge 0}$ is a Poisson process independent of $(N_t)_{0\le t\le t_0}$, together with the construction of the Poisson process $(N_t)_{t\ge 0}$ from i.i.d. exponential random variables in Theorem 2.2, implicitly shows that the exponential distribution has a memoryless property: namely, if $\xi$ is an exponential random variable with mean $1/\lambda$, then $P(\xi > t+s \mid \xi > t) = P(\xi > s)$ for all $s,t > 0$. Prove this fact directly.

Exercise 2.4 If $\xi_1$ and $\xi_2$ are two independent exponential random variables with means $1/\lambda_1$ and $1/\lambda_2$ respectively, then prove that $\xi := \min\{\xi_1, \xi_2\}$ is also an exponential random variable, with mean $1/(\lambda_1+\lambda_2)$.

The memoryless property of the exponential distribution makes it an essential tool in the construction of continuous time Markov processes with a discrete state space. The Markov property states that given the present state of the process, the laws of the future and the past are independent. It is then necessary that the time the process has to wait before jumping away from its present location is exponentially distributed.
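The construction in Theorem 2.2 translates directly into a sampler: arrival times are cumulative sums of i.i.d. exponentials, and $N_t$ counts the arrivals up to time $t$. Below is a minimal sketch (the parameters $\lambda = 3$, $t = 2$ and the trial count are arbitrary choices); by Definition 2.1 (iii), the empirical mean and variance of $N_t$ should both be close to $\lambda t$.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, t, n_trials = 3.0, 2.0, 50_000

def poisson_process_count(lam, t):
    """Return N_t built as in Theorem 2.2: sum i.i.d. Exp(lam)
    inter-arrival times xi_i until T_n = xi_1 + ... + xi_n exceeds t."""
    total, count = 0.0, 0
    while True:
        total += rng.exponential(1 / lam)  # NumPy parametrizes Exp by the mean 1/lam
        if total > t:
            return count
        count += 1

samples = np.array([poisson_process_count(lam, t) for _ in range(n_trials)])
print("empirical mean     :", samples.mean())         # ~ lam * t = 6
print("empirical variance :", samples.var())          # ~ lam * t = 6
print("empirical P(N_t=0) :", np.mean(samples == 0))  # ~ exp(-6) ~ 0.0025
```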
3 Poisson Point Process

An alternative way to think about a Poisson process $(N_t)_{t\ge 0}$ is to identify it with a locally finite measure $\Xi$ on $[0,\infty)$, with
$$\Xi(dx) := \sum_{i\in\mathbb{N}} \delta_{\tau_i}(dx),$$
where $\tau_1 < \tau_2 < \cdots$ are the times at which $N_t$ makes a jump, and $\delta_z(dx)$ denotes a delta measure at position $z$. Such an interpretation allows a natural extension of a Poisson process, which can be regarded as a random measure on $[0,\infty)$, to random measures on more general spaces (in particular, Polish spaces), called Poisson point processes.

We state below a theorem on the characterization of a general Poisson point process on a locally compact Polish space, where local compactness refers to the property that any $x\in S$ is contained in some open set $U$ whose closure is compact. The notions we will need are $\mathcal{M}(S)$, the space of Radon measures on $(S,\mathcal{S})$, i.e., measures $\mu$ with $\mu(K) < \infty$ for all compact sets $K$, and the so-called vague topology on $\mathcal{M}(S)$: $\mu_n \stackrel{v}{\Rightarrow} \mu$ in $\mathcal{M}(S)$ if and only if $\int f\,d\mu_n \to \int f\,d\mu$ for all continuous $f : S \to \mathbb{R}$ with compact support. We remark that the difference between weak convergence and vague convergence is that $\mathcal{M}(S)$ may admit infinite measures, and even if $\mu_n, \mu \in \mathcal{M}(S)$ are finite measures and $\mu_n \stackrel{v}{\Rightarrow} \mu$, mass of $\mu_n$ may escape to infinity so that $\mu(S) < \liminf_{n\to\infty} \mu_n(S)$, which is not possible under weak convergence.

Theorem 3.1 [Poisson Point Processes (PPP)] Let $S$ be a locally compact Polish space with Borel $\sigma$-algebra $\mathcal{S}$. For any $\mu \in \mathcal{M}(S)$ (called the mean measure or intensity measure), there is a unique (in law) $\mathcal{M}(S)$-valued random variable $\Xi$, called a Poisson point process with mean measure $\mu$, which satisfies the following properties:

(i) For any disjoint relatively compact sets $A_1, \dots, A_n \in \mathcal{S}$, $(\Xi(A_i))_{1\le i\le n}$ are independent.

(ii) For each relatively compact $A \in \mathcal{S}$, $\Xi(A)$ is a Poisson random variable with mean $\mu(A)$.

Note that when $S = \mathbb{R}^d$, all bounded sets are relatively compact, and hence it suffices to consider bounded Borel measurable sets in Theorem 3.1 (i) and (ii). We call $\Xi$ a point process because almost surely, $\Xi$ is a collection of delta measures with positive integer mass, i.e., for any $A\in\mathcal{S}$, $\Xi(A) \in \mathbb{N}\cup\{0,\infty\}$. For a proof of Theorem 3.1, see e.g. [3, Chapter 24].

Remark 3.2 [Non-locally Compact Polish Spaces] If $S$ is only assumed to be a Polish space, not necessarily locally compact, then we need to change $\mathcal{M}(S)$ from the space of Radon measures to the space $\mathcal{M}^\#(S)$ of locally finite measures, i.e., measures $\mu$ with $\mu(B) < \infty$ for any open ball $B$ of finite radius. The vague topology on $\mathcal{M}(S)$ should be replaced by the so-called $w^\#$ (weak-hash) topology on $\mathcal{M}^\#(S)$, where $\mu_n \stackrel{w^\#}{\Rightarrow} \mu$ if $\int f\,d\mu_n \to \int f\,d\mu$ for all bounded continuous $f : S\to\mathbb{R}$ with bounded support (see [1, Chapter A2.6]). Standard references on random measures consider random Radon measures on a locally compact Polish space. For a general reference on $\mathcal{M}^\#(S)$-valued random variables, see [1, 2].
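When the mean measure is finite, a standard two-step recipe (not proved in these notes, but consistent with Theorem 3.1 and with Example 3.3 below) samples a PPP on a set $A$: draw the total number of points as a Poisson random variable with mean $\mu(A)$, then place that many i.i.d. points distributed according to the normalized measure $\mu/\mu(A)$. Here is a minimal sketch for mean measure $\lambda$ times Lebesgue measure on the unit square, with the arbitrary choice $\lambda = 5$.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_ppp_unit_square(lam):
    """Sample a PPP on [0,1]^2 with mean measure lam * Lebesgue:
    the total number of points is Poisson(lam), and given that number
    the points are i.i.d. uniform on the square."""
    n_points = rng.poisson(lam)
    return rng.uniform(size=(n_points, 2))

pts = sample_ppp_unit_square(lam=5.0)

# By Theorem 3.1, counts in disjoint sets are independent Poisson random
# variables; e.g. the left and right halves each receive Poisson(2.5) points.
left = int(np.sum(pts[:, 0] < 0.5))
print(f"{len(pts)} points total: {left} in the left half, {len(pts) - left} in the right half")
```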
Poisson point processes provide the fundamental building block in the construction of many stochastic objects, including infinitely divisible distributions, Lévy processes, excursions of a Markov process from a point, extreme order statistics, etc. We consider here the simplest example.

Example 3.3 [Compound Poisson Distribution] Let $\Xi$ be a PPP on $[0,\infty)\times\mathbb{R}$ with mean measure $\lambda\,dt \times \mu(dx)$ for a probability measure $\mu$ on $\mathbb{R}$. Note that $N_t := \Xi([0,t]\times\mathbb{R})$ is in fact a Poisson process with rate $\lambda$. Let $\tau_1 < \tau_2 < \cdots$ denote the successive times at which $N_t$ makes a jump. Then $\Xi = \{(\tau_i, \xi_i) : i\in\mathbb{N}\}$ for some $\xi_i \in \mathbb{R}$. Conditioned on the realization of $\tau_1 < \tau_2 < \cdots$, it can be seen that $(\xi_i)_{i\in\mathbb{N}}$ are i.i.d. random variables with distribution $\mu$. Thus we obtain a representation for the compound Poisson random variable
$$X := \sum_{i : \tau_i \le 1} \xi_i$$
in terms of the underlying PPP $\Xi$. When $\mu$ is the delta measure at 1, $X$ is just a Poisson random variable with mean $\lambda$. Analogous to the definition of a Poisson process, if we define
$$X_t := \sum_{i : \tau_i \le t} \xi_i, \qquad (3.1)$$
then $(X_t)_{t\ge 0}$ is a compound Poisson process.

When $\mu$ is a finite measure with total mass $A > 0$, we can rewrite the mean measure $\lambda\,dt \times \mu(dx)$ as $A\lambda\,dt \times \tilde\mu(dx)$, with $\tilde\mu(dx) = A^{-1}\mu(dx)$ being the normalized probability measure. We are thus still in the compound Poisson setting. Interestingly, for a large family of infinite measures $\mu$, it is still possible to define $X_t$ in (3.1), even though the sum is over infinitely many terms for each $t > 0$. Together with Brownian motion, such Poisson point process constructions are the building blocks in the construction of Lévy processes, which are defined to be $\mathbb{R}^d$-valued stochastic processes $(X_t)_{t\ge 0}$ with independent and stationary increments, i.e., for $0 = t_0 < t_1 < \cdots < t_n$, $(X_{t_i} - X_{t_{i-1}})_{1\le i\le n}$ are independent, and the distribution of $X_{t_i} - X_{t_{i-1}}$ depends only on $t_i - t_{i-1}$.

References

[1] D.J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes, Volume I: Elementary Theory and Methods, 2nd edition, Springer, 2003.

[2] D.J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes, Volume II: General Theory and Structure, 2nd edition, Springer, 2008.

[3] A. Klenke. Probability Theory: A Comprehensive Course, Springer-Verlag.