Theoretical Statistics. Lecture 1. Peter Bartlett

1. Organizational issues.
2. Overview.
3. Stochastic convergence.

Organizational Issues

• Lectures: Tue/Thu 11am–12:30pm, 332 Evans.
• Instructor: Peter Bartlett. bartlett@stat. Office hours: Tue 1–2pm, Wed 1:30–2:30pm (Evans 399).
• GSI: Siqi Wu. siqi@stat. Office hours: Mon 3:30–4:30pm, Tue 3:30–4:30pm (Evans 307).
• Course website: http://www.stat.berkeley.edu/∼bartlett/courses/210b-spring2013/ Check it for announcements, homework assignments, ...
• Texts: Asymptotic Statistics, Aad van der Vaart. Cambridge, 1998. Convergence of Stochastic Processes, David Pollard. Springer, 1984. Available online at http://www.stat.yale.edu/∼pollard/1984book/
• Assessment: Homework assignments (60%), posted on the website. Final exam (40%), scheduled for Thursday, 5/16/13, 8–11am.
• Required background: Stat 210A, and either Stat 205A or Stat 204.

Asymptotics: Why?

Example: We have a sample of size n from a density $p_\theta$. Some estimator gives $\hat\theta_n$.
• Consistent? That is, does $\hat\theta_n \to \theta$? Stochastic convergence.
• Rate? Is it optimal? There are often no finite-sample optimality results; is it asymptotically optimal?
• Variance of the estimate? Optimal? Asymptotically?
• Distribution of the estimate? Confidence region? Asymptotically?

Asymptotics: Approximate Confidence Regions

Example: We have a sample of size n from a density $p_\theta$, $\theta \in \mathbb{R}^k$. The maximum likelihood estimator gives $\hat\theta_n$. Under mild conditions, $\sqrt{n}(\hat\theta_n - \theta)$ is asymptotically $N(0, I_\theta^{-1})$, where $I_\theta$ is the Fisher information. Thus
$$\sqrt{n}\, I_\theta^{1/2}(\hat\theta_n - \theta) \rightsquigarrow N(0, I), \qquad n(\hat\theta_n - \theta)^T I_\theta (\hat\theta_n - \theta) \rightsquigarrow \chi^2_k,$$
so we have an approximate $1-\alpha$ confidence region for $\theta$:
$$\left\{ \theta : (\theta - \hat\theta_n)^T I_{\hat\theta_n} (\theta - \hat\theta_n) \le \frac{\chi^2_{k,\alpha}}{n} \right\}.$$
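The following simulation sketch (not part of the slides) checks the coverage of the approximate confidence region above. The model, parameter values, and use of NumPy/SciPy are illustrative choices: $X_1,\ldots,X_n$ i.i.d. exponential with rate $\theta$, so the MLE is $\hat\theta_n = 1/\bar X_n$ and the Fisher information is $I_\theta = 1/\theta^2$.

```python
# Minimal sketch: empirical coverage of the approximate chi-squared confidence region.
# Assumed (illustrative) model: X_i ~ Exponential(rate=theta), MLE = 1/Xbar, I_theta = 1/theta^2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta, n, alpha, reps = 2.0, 200, 0.05, 5000
chi2_crit = stats.chi2.ppf(1 - alpha, df=1)        # chi^2_{k,alpha} with k = 1

covered = 0
for _ in range(reps):
    x = rng.exponential(scale=1 / theta, size=n)
    theta_hat = 1 / x.mean()                        # MLE
    info_hat = 1 / theta_hat**2                     # plug-in Fisher information I_{theta_hat}
    # theta is in the region iff (theta - theta_hat)^2 * I_{theta_hat} <= chi2_crit / n
    covered += (theta - theta_hat) ** 2 * info_hat <= chi2_crit / n

print(f"empirical coverage ≈ {covered / reps:.3f}  (nominal {1 - alpha})")
```

For moderately large n, the empirical coverage should be close to the nominal level $1-\alpha$, reflecting the asymptotic chi-squared approximation.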
Overview of the Course

1. Tools for consistency, rates, asymptotic distributions:
   • Stochastic convergence.
   • Concentration inequalities.
   • Projections.
   • U-statistics.
   • Delta method.
2. Tools for richer settings (e.g., function spaces versus $\mathbb{R}^k$):
   • Uniform laws of large numbers.
   • Empirical process theory.
   • Metric entropy.
   • Functional delta method.
3. Tools for asymptotics of likelihood ratios:
   • Contiguity.
   • Local asymptotic normality.
4. Asymptotic optimality:
   • Efficiency of estimators.
   • Efficiency of tests.
5. Applications:
   • Nonparametric regression.
   • Nonparametric density estimation.
   • M-estimators.
   • Bootstrap estimators.

Convergence in Distribution

$X_1, X_2, \ldots$ and $X$ are random vectors.

Definition: $X_n$ converges in distribution (or weakly converges) to $X$ (written $X_n \rightsquigarrow X$) means that their distribution functions satisfy $F_n(x) \to F(x)$ at all continuity points $x$ of $F$.

Review: Other Types of Convergence

$d$ is a distance on $\mathbb{R}^k$ (for which the Borel $\sigma$-algebra is the usual one).

Definition: $X_n$ converges almost surely to $X$ (written $X_n \xrightarrow{as} X$) means that $d(X_n, X) \to 0$ a.s.

Definition: $X_n$ converges in probability to $X$ (written $X_n \xrightarrow{P} X$) means that, for all $\epsilon > 0$, $P(d(X_n, X) > \epsilon) \to 0$.

Theorem:
$$X_n \xrightarrow{as} X \implies X_n \xrightarrow{P} X \implies X_n \rightsquigarrow X, \qquad X_n \xrightarrow{P} c \iff X_n \rightsquigarrow c.$$

NB: For $X_n \xrightarrow{as} X$ and $X_n \xrightarrow{P} X$, the $X_n$ and $X$ must be functions on the sample space of the same probability space; this is not required for convergence in distribution.

Convergence in Distribution: Equivalent Definitions

Theorem [Portmanteau]: The following are equivalent:
1. $P(X_n \le x) \to P(X \le x)$ for all continuity points $x$ of $P(X \le \cdot)$.
2. $Ef(X_n) \to Ef(X)$ for all bounded, continuous $f$.
3. $Ef(X_n) \to Ef(X)$ for all bounded, Lipschitz $f$.
4. $E e^{it^T X_n} \to E e^{it^T X}$ for all $t \in \mathbb{R}^k$. (Lévy's continuity theorem)
5. $t^T X_n \rightsquigarrow t^T X$ for all $t \in \mathbb{R}^k$. (Cramér–Wold device)
6. $\liminf Ef(X_n) \ge Ef(X)$ for all nonnegative, continuous $f$.
7. $\liminf P(X_n \in U) \ge P(X \in U)$ for all open $U$.
8. $\limsup P(X_n \in F) \le P(X \in F)$ for all closed $F$.
9. $P(X_n \in B) \to P(X \in B)$ for all continuity sets $B$ (i.e., sets with $P(X \in \partial B) = 0$).

Example [Why do we need continuity?]: Consider $f(x) = 1[x > 0]$ and $X_n = 1/n$. Then $X_n \to 0$ and $f(X_n) \to 1$, but $f(0) = 0$.

Example [Why do we need boundedness?]: Consider $f(x) = x$ and
$$X_n = \begin{cases} n & \text{w.p. } 1/n, \\ 0 & \text{w.p. } 1 - 1/n. \end{cases}$$
Then $X_n \rightsquigarrow 0$ and $Ef(X_n) \to 1$, but $f(0) = 0$.

Relating Convergence Properties

Theorem:
• $X_n \rightsquigarrow X$ and $d(X_n, Y_n) \xrightarrow{P} 0$ imply $Y_n \rightsquigarrow X$.
• $X_n \rightsquigarrow X$ and $Y_n \xrightarrow{P} c$ imply $(X_n, Y_n) \rightsquigarrow (X, c)$.
• $X_n \xrightarrow{P} X$ and $Y_n \xrightarrow{P} Y$ imply $(X_n, Y_n) \xrightarrow{P} (X, Y)$.

Example: NB: it is NOT true that $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow Y$ imply $(X_n, Y_n) \rightsquigarrow (X, Y)$ (joint convergence versus marginal convergence in distribution). Consider $X, Y$ independent $N(0,1)$, $X_n \sim N(0,1)$, and $Y_n = -X_n$. Then $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow Y$, but $(X_n, Y_n) \rightsquigarrow (X, -X)$, which has a very different distribution from that of $(X, Y)$.

Relating Convergence Properties: Continuous Mapping

Suppose $f : \mathbb{R}^k \to \mathbb{R}^m$ is "almost surely continuous" (i.e., for some $S$ with $P(X \in S) = 1$, $f$ is continuous on $S$).

Theorem [Continuous mapping]:
• $X_n \rightsquigarrow X \implies f(X_n) \rightsquigarrow f(X)$.
• $X_n \xrightarrow{P} X \implies f(X_n) \xrightarrow{P} f(X)$.
• $X_n \xrightarrow{as} X \implies f(X_n) \xrightarrow{as} f(X)$.

Example: For $X_1, \ldots, X_n$ i.i.d. with mean $\mu$ and variance $\sigma^2$, we have
$$\frac{\sqrt{n}(\bar X_n - \mu)}{\sigma} \rightsquigarrow N(0, 1), \qquad \text{so} \qquad \frac{n(\bar X_n - \mu)^2}{\sigma^2} \rightsquigarrow (N(0,1))^2 = \chi^2_1.$$

Example: We also have $\bar X_n - \mu \rightsquigarrow 0$, hence $(\bar X_n - \mu)^2 \rightsquigarrow 0$. Consider $f(x) = 1[x > 0]$. Then $f((\bar X_n - \mu)^2) \rightsquigarrow 1 \ne f(0)$. (The problem is that $f$ is not continuous at 0, and $P_X(\{0\}) > 0$ for the $X$ satisfying $(\bar X_n - \mu)^2 \rightsquigarrow X$.)

Relating Convergence Properties: Slutsky's Lemma

Theorem: $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow c$ imply
$$X_n + Y_n \rightsquigarrow X + c, \qquad Y_n X_n \rightsquigarrow cX, \qquad Y_n^{-1} X_n \rightsquigarrow c^{-1} X.$$
(Why do $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow Y$ not imply $X_n + Y_n \rightsquigarrow X + Y$?)

Relating Convergence Properties: Examples

Theorem: For i.i.d. $Y_i$ with $EY_1 = \mu$ and $\mathrm{Var}(Y_1) = \sigma^2 < \infty$,
$$\sqrt{n}\,\frac{\bar Y_n - \mu}{S_n} \rightsquigarrow N(0, 1), \qquad \text{where } \bar Y_n = n^{-1}\sum_{i=1}^n Y_i, \quad S_n^2 = (n-1)^{-1}\sum_{i=1}^n (Y_i - \bar Y_n)^2.$$

Proof: Write
$$S_n^2 = \frac{n}{n-1}\left(\frac{1}{n}\sum_{i=1}^n Y_i^2 - \bar Y_n^2\right).$$
Here $n/(n-1) \to 1$, while $\frac{1}{n}\sum_i Y_i^2 \xrightarrow{P} EY_1^2$ and $\bar Y_n \xrightarrow{P} EY_1$ (weak law of large numbers), so
$$S_n^2 \xrightarrow{P} EY_1^2 - (EY_1)^2 = \sigma^2$$
(continuous mapping theorem, Slutsky's lemma). Also
$$\sqrt{n}(\bar Y_n - \mu) \rightsquigarrow N(0, \sigma^2) \quad \text{(central limit theorem)}, \qquad \frac{1}{S_n} \xrightarrow{P} \frac{1}{\sigma},$$
so
$$\sqrt{n}\,\frac{\bar Y_n - \mu}{S_n} \rightsquigarrow N(0, 1)$$
(continuous mapping theorem, Slutsky's lemma).

Showing Convergence in Distribution

Recall that the characteristic function characterizes weak convergence:
$$X_n \rightsquigarrow X \iff E e^{it^T X_n} \to E e^{it^T X} \text{ for all } t \in \mathbb{R}^k.$$

Theorem [Lévy's continuity theorem]: If $E e^{it^T X_n} \to \phi(t)$ for all $t \in \mathbb{R}^k$, and $\phi : \mathbb{R}^k \to \mathbb{C}$ is continuous at 0, then $X_n \rightsquigarrow X$, where $E e^{it^T X} = \phi(t)$.

Special case: $X_n = Y$ for every $n$. So $X$ and $Y$ have the same distribution iff $\phi_X = \phi_Y$.

Theorem [Weak law of large numbers]: Suppose $X_1, \ldots, X_n$ are i.i.d. Then $\bar X_n \xrightarrow{P} \mu$ iff $\phi'_{X_1}(0) = i\mu$.

Proof: We'll show that $\phi'_{X_1}(0) = i\mu$ implies $\bar X_n \xrightarrow{P} \mu$. Indeed,
$$E e^{it\bar X_n} = \phi_{X_1}(t/n)^n = \left(1 + \frac{it\mu}{n} + o(1/n)\right)^n \to e^{it\mu} = \phi_\mu(t).$$
Lévy's theorem implies $\bar X_n \rightsquigarrow \mu$, hence $\bar X_n \xrightarrow{P} \mu$.

e.g., $X \sim N(\mu, \Sigma)$ has characteristic function
$$\phi_X(t) = E e^{it^T X} = e^{it^T \mu - t^T \Sigma t / 2}.$$

Theorem [Central limit theorem]: Suppose $X_1, \ldots, X_n$ are i.i.d. with $EX_1 = 0$ and $EX_1^2 = 1$. Then $\sqrt{n}\,\bar X_n \rightsquigarrow N(0, 1)$.

Proof: $\phi_{X_1}(0) = 1$, $\phi'_{X_1}(0) = iEX_1 = 0$, $\phi''_{X_1}(0) = i^2 EX_1^2 = -1$. Hence
$$E e^{it\sqrt{n}\bar X_n} = \phi_{X_1}(t/\sqrt{n})^n = \left(1 + 0 - \frac{t^2}{2n} + o(1/n)\right)^n \to e^{-t^2/2} = \phi_{N(0,1)}(t).$$
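The following sketch (not in the slides) numerically illustrates the studentized-mean result above, which combines the weak law of large numbers, the central limit theorem, the continuous mapping theorem, and Slutsky's lemma. The choice of an exponential population and the values of n and the replication count are illustrative.

```python
# Minimal sketch: sqrt(n)*(Ybar_n - mu)/S_n is approximately N(0,1) even for a skewed
# population.  Assumed (illustrative) model: Y_i ~ Exponential(1), so mu = 1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, mu = 500, 10000, 1.0

y = rng.exponential(scale=1.0, size=(reps, n))
ybar = y.mean(axis=1)
s = y.std(axis=1, ddof=1)                 # S_n, the (n-1)-normalized sample std dev
t = np.sqrt(n) * (ybar - mu) / s          # the studentized mean

# Compare empirical tail probabilities of t with standard normal tail probabilities.
for z in (1.0, 1.645, 1.96):
    print(f"P(T > {z:5.3f}): empirical {np.mean(t > z):.3f}, "
          f"N(0,1) {1 - stats.norm.cdf(z):.3f}")
```

The empirical tail probabilities should be close to the standard normal ones for moderately large n, despite the skewness of the exponential population.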
Uniformly Tight

Definition: $X$ is tight means that for all $\epsilon > 0$ there is an $M$ for which $P(\|X\| > M) < \epsilon$.

$\{X_n\}$ is uniformly tight (or bounded in probability) means that for all $\epsilon > 0$ there is an $M$ for which
$$\sup_n P(\|X_n\| > M) < \epsilon$$
(so there is a compact set that contains each $X_n$ with high probability).

Theorem [Prohorov's theorem]:
1. $X_n \rightsquigarrow X$ implies $\{X_n\}$ is uniformly tight.
2. $\{X_n\}$ uniformly tight implies that for some $X$ and some subsequence, $X_{n_j} \rightsquigarrow X$.

Notation for rates: $o_P$, $O_P$

Definition:
$$X_n = o_P(1) \iff X_n \xrightarrow{P} 0, \qquad\quad X_n = o_P(R_n) \iff X_n = Y_n R_n \text{ and } Y_n = o_P(1),$$
$$X_n = O_P(1) \iff \{X_n\} \text{ uniformly tight}, \qquad X_n = O_P(R_n) \iff X_n = Y_n R_n \text{ and } Y_n = O_P(1).$$
(That is, $o_P$ and $O_P$ specify rates of growth of a sequence: $o_P$ means strictly slower (the sequence $Y_n$ converges in probability to zero); $O_P$ means within some constant (the sequence $Y_n$ lies in a ball with high probability).)

Relations between rates:
$$o_P(1) + o_P(1) = o_P(1), \qquad o_P(1) + O_P(1) = O_P(1), \qquad o_P(1)\, O_P(1) = o_P(1),$$
$$(1 + o_P(1))^{-1} = O_P(1), \qquad o_P(O_P(1)) = o_P(1),$$
$$X_n \xrightarrow{P} 0,\ R(h) = o(\|h\|^p) \implies R(X_n) = o_P(\|X_n\|^p),$$
$$X_n \xrightarrow{P} 0,\ R(h) = O(\|h\|^p) \implies R(X_n) = O_P(\|X_n\|^p).$$
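As a final sketch (not in the slides), the $O_P$ notation above can be checked numerically: for i.i.d. $X_i$ with finite variance, $\bar X_n - \mu = O_P(n^{-1/2})$, i.e., $Y_n = \sqrt{n}(\bar X_n - \mu)$ is uniformly tight. The uniform population, the tightness radius M, and the grid of n values are illustrative choices.

```python
# Minimal sketch: Y_n = sqrt(n)*(Xbar_n - mu) is uniformly tight (bounded in probability),
# so Xbar_n - mu = O_P(n^{-1/2}).  Assumed (illustrative) model: X_i ~ Uniform(0,1), mu = 0.5.
import numpy as np

rng = np.random.default_rng(2)
mu, reps, M = 0.5, 5000, 0.6              # M is an illustrative tightness radius

for n in (10, 100, 1000):
    x = rng.uniform(size=(reps, n))
    y = np.sqrt(n) * (x.mean(axis=1) - mu)     # the rescaled sequence Y_n
    # P(|Y_n| > M) stays uniformly small in n, consistent with sup_n P(|Y_n| > M) < eps.
    print(f"n={n:5d}   P(|Y_n| > {M}) ≈ {np.mean(np.abs(y) > M):.4f}")
```

The printed probabilities should remain small and roughly constant across n; by contrast, the unrescaled quantity $n(\bar X_n - \mu)$ would not be uniformly tight.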