Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
"Classical" Inference Two simple inference scenarios Question 1: Are we in world A or world B? Possible worlds: World A World B X number added X number added [-.5, .5] 38 38 [4, 6] 38 38 [-1, 1] 68 30 [3, 7] 68 30 [-1.5, 1.5] 87 19 [2, 8] 87 19 [-2, 2] 95 8 [1, 9] 95 8 [-2.5, 2.5] 99 4 [0, 10] 99 4 (- ∞, ∞) 1 (- ∞, ∞) 100 1 100 Jerzy Neyman and Egon Pearson D: Decision in favor of: T: The Truth of the matter: H0: Null Hypothesis H1: Alternative Hypothesis Correct acceptance Type I Error of H0 pr(D=H0| T=H0) H0: Null Hypothesis = (1 – ) Type II Error H1: Alternative Hypothesis pr(D=H0| T=H1) = pr(D=H1| T=H0) = [aka size] Correct acceptance of H1 pr(D=H1| T=H1) = (1 – ) [aka power] Definition. A subset C of the sample space is a best critical region of size α for testing the hypothesis H0 against the hypothesis H1 if (i) Pr[ X 1 ,..., X n C | H 0 ] and for every subset A of the sample space, whenever: (ii) Pr[ X 1 ,..., X n A | H 0 ] we also have: (iii) Pr[ X 1 ,..., X n C | H1 ] Pr[ X 1 ,..., X n A | H1 ] Neyman-Pearson Theorem: Suppose that for for some k > 0: 1. Pr[C | H 0 ] 2. Pr[C | H 0 ] k Pr[C | H1 ] Pr[C | H 0 ] k 3. Pr[C | H1 ] Then C is a best critical region of size α for the test of H0 vs. H1. • When the null and alternative hypotheses are both Normal, the relation between the power of a statistical test (1 – ) and is given by the formula | H H | H 0 (1 ) 1 | q | 0 H1 1 H n 1 is the cdf of N(0,1), and q is the quantile determined by . • fixes the type I error probability, but increasing n reduces the type II error 9 probability Question 2: Does the evidence suggest our world is not like World A? World A X number added [-.5, .5] 38 38 [-1, 1] 68 30 [-1.5, 1.5] 87 19 [-2, 2] 95 8 [-2.5, 2.5] 99 4 (- ∞, ∞) 1 100 Sir Ronald Aymler Fisher Fisherian theory Significance tests: their disjunctive logic, and p-values as evidence: ``[This very low p-value] is amply low enough to exclude at a high level of significance any theory involving a random distribution….. The force with which such a conclusion is supported is logically that of the simple disjunction: Either an exceptionally rare chance has occurred, or the theory of random distribution is not true.'' (Fisher 1959, 39) Fisherian theory ``The meaning of `H' is rejected at level α' is `Either an event of probability α has occurred, or H is false', and our disposition to disbelieve H arises from our disposition to disbelieve in events of small probability.'' (Barnard 1967, 32) • • Fisherian theory: Distinctive features Notice that the actual data x is used to define the event whose significance is evaluated. • Also based on H0 and H1 Can only reject H0, evidence cannot allow one to accept H0. • Many other theories besides H0 could also explain the data. • Common philosophical simplification: • Hypothesis space given qualitatively; • H0 vs. –H0, • Murderer was Professor Plum, Colonel Mustard, Miss Scarlett, or Mrs. Peacock • More typical situation: • Very strong structural assumptions • Hypothesis space given by unknown numeric `parameters' • Test uses: • a transformation of the raw data, • a probability distribution for this transformation (≠ the original distribution of interest) Three Commonly Used Facts • Assume {X1,..., X n } is a collection of independent and identically distributed (i.i.d.) random variables. • Assume also that the X s share a mean of i μ and a standard deviation of σ. Three Commonly Used Facts n 1 For the mean estimator X X i : n i1 1. E[X ] E[X] 1 2. Var[X ] Var[X] n n 2 Three Commonly Used Facts The Central Limit Theorem. If {X1,…, Xn} are i.i.d. random variables from a distribution with mean and variance 2, then: 3. 1 lim n n Xi ~ N(0,1) i 1 n Equivalently: lim n X X ~ N(0,1) Examples • Data: January 2012 CPS • Sample: PhD’s, working full time, age 2834 • H0: mean income is 75k 21996.00 89999.52 119999.9 40999.92 67600.00 68640.00 96999.76 77296.96 65000.00 71999.72 100100.0 45999.72 149999.7 19968.00 10140.00 37999.52 74999.60 69992.00 31740.80 65000.00 57512.00 87984.00 35999.60 38939.68 99999.64 74999.60 149999.7 47996.00 62920.00 62920.00 54999.88 104000.0 8 Series: WAGES Sample 1 13566 IF PRTAGE > 27 AND PRTAGE < 35 AND PEEDUCA = 46 AND PEHRACTT > 39 Observations 32 7 6 5 4 3 2 1 Mean Median Maximum Minimum Std. Dev. Skewness Kurtosis 68898.16 66300.00 149999.7 10140.00 33707.49 0.624378 3.253256 Jarque-Bera Probability 2.164708 0.338797 0 40000 Hyp. H0 80000 Value -1.024022 120000 Probability 0.3138 Comments • The background conditions (e.g., the i.i.d. condition behind the sample) are a clear example of `Quine-Duhem’ conditions. • When background conditions are met, ``large samples’’ don’t make inferences ``more certain’’ • Multiple tests • Monitoring or ``peeking'‘ at data, etc. Point estimates and Confidence Intervals • Many desiderata of an estimator: • • • • • • • Consistent Maximum Likelihood Unbiased Sufficient Minimum variance Minimum MSE (mean squared error) (most) efficient • By CLT: approximately: Pr[2 • Thus: • By algebra: • So: X X X X ~ N(0,1) 2] .95 Pr[X 2 X X 2 X ] .95 1 Pr[ X 2] .95 n Interpreting confidence intervals 1 Pr[ X 2] .95 n • The only probabilistic component that determines what occurs is X . • Everything else are constants. • Simulations, examples • Question: Why ``center’’ the interval? Confidence Intervals • $68,898.16 ± $12,152.85 • ``C.I. = mean ± m.o.e’’ • = ($56,745.32 , $81,051.01) Using similar logic, but different computing formulae, one can extend these methods to address further questions e.g., for standard deviations, equality of means across groups, etc. Equality of Means: BAs Sex 1 2 All Count 223 209 432 Value 4.424943 Mean 63619.54 51395.43 57705.56 Std. Dev. 31370.01 25530.66 29306.13 Probability 0.0000 Equality of Means: PhDs Sex Count Mean 1 21 2 11 All 32 Value -0.560745 Std. Dev. 66452.71 73566.76 36139.78 29555.10 68898.16 33707.49 Probability 0.5791