PAS08 – Statistical hypothesis testing
Jan Březina, Technical University of Liberec
27 November 2014

Motivation - problem with interval estimate for relative frequency

Consider data $\{X_i\}$ from $Alt(\pi)$ with sample relative frequency $p = \bar{X}$. According to the Moivre-Laplace theorem:

$U = \frac{p - \pi}{\sqrt{\pi(1 - \pi)}} \sqrt{n} \approx N(0, 1)$

Thus

$P\big(u(\alpha/2) \le U \le u(1 - \alpha/2)\big) = 1 - \alpha$

and after straightforward calculations:

$P(p - \Delta < \pi < p + \Delta) = 1 - \alpha$

but we have to use an approximation, because $\Delta$ depends on the unknown $\pi$:

$\Delta = \sqrt{\frac{\pi(1 - \pi)}{n}}\, u(1 - \alpha/2) \approx \sqrt{\frac{p(1 - p)}{n}}\, u(1 - \alpha/2)$

Can we avoid this somehow?

Hypothesis testing - example

Example: A political party won 30% in the last election. A survey half a year later, with 59 respondents, gives it 25%. Can we be 80% sure that its preferences have dropped since the election?

null hypothesis (no drop): $H_0: \pi = \pi_0 = 0.3$
alternative hypothesis (drop): $H_A: \pi < \pi_0$

Using a one-sided interval estimate, with sample relative frequency $p = 0.25$ and $n = 59$:

$\Delta = \sqrt{\frac{p(1 - p)}{n}}\, u(0.8) = 0.0474$

$\pi < 0.25 + \Delta = 0.297 < 0.3$ with probability 80%, so we reject $H_0$ in favour of $H_A$.

Direct testing

The interval estimate uses an approximation and is therefore INEXACT. In the direct test we work under the null hypothesis $H_0$, that is, we assume $\pi = \pi_0$:

$U = \frac{p - \pi_0}{\sqrt{\pi_0(1 - \pi_0)}} \sqrt{n} = \frac{0.25 - 0.3}{\sqrt{0.3(1 - 0.3)}} \sqrt{59} = -0.8381$

With $\alpha = 0.2$ we get $u(\alpha) = -u(0.8) \approx -0.8416 < U$, so we cannot reject $H_0$ ... but it is on the edge.

General scheme of hypothesis testing (classical test)

1. State the null hypothesis (equilibrium) and the alternative hypothesis (two-sided or one-sided) about the parameter.
2. Select an appropriate test: the statistic, a function of the data. Consider the test assumptions: independence, normality, small variance, ...
3. Determine the distribution function of the statistic.
4. Select the significance level $\alpha$, which gives the probability of a Type I error, or equivalently the confidence level $1 - \alpha$.
5. Construct the critical region $C(\alpha)$ (for rejection), using the quantile function (inverse of the distribution function).
6. Compute the value $t_{obs}$ of the test statistic from the observed data.
7. Reject $H_0$ in favour of $H_A$ if $t_{obs} \in C(\alpha)$; otherwise do not reject $H_0$.

We construct the statistic (a function of the sample vector) so that its value grows with the parameter in the hypothesis. Then the inequality in the hypothesis matches the inequality for the critical region.

Errors of the I. and II. kind

error of the first kind - with probability $\alpha$, WRONG rejection of $H_0$
error of the second kind - with probability $\beta$, we do not reject an $H_0$ that does not hold; reduce it by using a larger sample or a better test (check the ASSUMPTIONS!)

Power of the test: $1 - \beta$

Test about mean value (normal, known σ)

Sample $\{X_i\}$ of size $n$ from the normal distribution $N(\mu, \sigma^2)$, $\sigma$ known.
Statistic (Z-test):

$Z = \frac{\bar{X} - \mu_0}{\sigma} \sqrt{n}$ with distribution $N(0, 1)$ under $H_0$

Example: Reading test in CR: mean 124 points, standard deviation 12 points. In one school, a sample of 55 students has a sample mean of 120 points. Is the difference significant at the 95% confidence level?

$H_0: \mu = 124$, $H_A: \mu < 124$
Critical region: $Z < Z_{crit} = u(0.05) = -1.64$

$Z = \frac{120 - 124}{12} \sqrt{55} = -2.47$ ... reject the hypothesis $H_0$

p-value test (pure significance test)

p-value: the smallest level $\alpha$ at which we can reject $H_0$
Our example: $F_{N(0,1)}(-2.47) = 0.0068$

Comparison of classical and p-value Z-tests

Here $F = F_{N(0,1)}$ and $u(\alpha) = F^{-1}(\alpha)$.

$H_A: \mu < \mu_0$: reject $H_0$ for $Z_{obs} < u(\alpha)$, resp. $p = F(Z_{obs}) < \alpha$
$H_A: \mu > \mu_0$: reject $H_0$ for $Z_{obs} > u(1 - \alpha)$, resp. $p = 1 - F(Z_{obs}) < \alpha$
$H_A: \mu \ne \mu_0$: reject $H_0$ for $Z_{obs} < u(\alpha/2) \lor u(1 - \alpha/2) < Z_{obs}$, resp. $p = 2 \min\{1 - F(Z_{obs}), F(Z_{obs})\} < \alpha$

Error of II. kind, reading test example

Probability $\beta$ of not rejecting $H_0$ for various values of $\mu$. For the one-sided test with $H_A: \mu < \mu_0$:

$1 - \beta = P\left( \frac{\bar{X} - \mu_0}{\sigma} \sqrt{n} < u(\alpha) \,\middle|\, EX = \mu \right)$
$= P\left( \frac{\bar{X} - \mu}{\sigma} \sqrt{n} < \frac{\mu_0 - \mu}{\sigma} \sqrt{n} + u(\alpha) \,\middle|\, EX = \mu \right)$
$= F_{N(0,1)}\left( \frac{\mu_0 - \mu}{\sigma} \sqrt{n} + u(\alpha) \right)$

Example (reading test): $\mu_0 = 124$, $\sigma = 12$, $n = 55$, $\alpha = 0.05$, $u(\alpha) = -1.64$

$\beta(\mu) = 1 - F\left( \frac{124 - \mu}{12} \sqrt{55} - 1.64 \right)$

Error of II. kind, continued

[Figure: plot for $\mu$ from 116 to 124 on the horizontal axis, values from 0.0 to 1.0 on the vertical axis.]

Test about mean value (normal, unknown σ)

t-test, statistic:

$T = \frac{\bar{X} - \mu_0}{S_n} \sqrt{n}$ has Student's distribution $t_{n-1}$

Example: Measurement of the heat conductivity coefficient:
0.62, 0.64, 0.57, 0.61, 0.59, 0.57, 0.62, 0.59
The nominal coefficient should be 0.60; decide whether there is a significant deviation (assuming normal data).

$H_0: \mu = 0.60$, $H_A: \mu \ne 0.60$, two-sided test
$\bar{X} = 0.60125$, $S_n = 0.02531939$, $T = 0.1396$, $p = 0.89$

R> 1-pt(0.1396, 7)   # 0.446454 (one tail; the two-sided p-value is twice this)
R> qt(0.975, 7)      # 2.364624 (critical value for alpha = 0.05)

... fail to reject $H_0$.

t-test in R

> hc=c(0.62, 0.64, 0.57, 0.61, 0.59, 0.57, 0.62, 0.59)
> t.test(hc, mu=0.6, conf.level=0.99)

        One Sample t-test

data:  hc
t = 0.1396, df = 7, p-value = 0.8929
alternative hypothesis: true mean is not equal to 0.6
99 percent confidence interval:
 0.5699235 0.6325765
sample estimates:
mean of x
  0.60125

Test about variance (or deviation)

Consider a normally distributed sample $X_1, \dots, X_n$.

$X = \frac{S_n^2 (n - 1)}{\sigma^2}$ has the $\chi^2_{n-1}$ distribution

Example: The standard deviation in filling beer bottles should not be greater than 5 ml. Measurements of bottles in liters:
0.4981, 0.5016, 0.5004, 0.4978, 0.4996, 0.5002, 0.4874, 0.4890, 0.4772, 0.5013, 0.4961.
Is the filling device precise enough?

$H_0: \sigma = 0.005$, $H_A: \sigma > 0.005$, one-sided test
$F^{-1}_{\chi^2_{10}}(1 - 0.05) = 18.3$
$S_n = 7.67$ ml, $\sigma_0 = 5$ ml, $X = 23.55$ ... rejecting $H_0$, $p = 1 - F(X) = 0.009$

Note: The test is very sensitive to the normality assumption, which makes it problematic in practice. There is no dedicated function for it in R.

Test about relative frequency (proportion test)

Sample $X_1, \dots, X_n$ from $Alt(\pi)$, sample relative frequency $p$.
For $np > 5$ and $n(1 - p) > 5$:

$T = \frac{p - \pi}{\sqrt{\pi(1 - \pi)}} \sqrt{n} \approx N(0, 1)$

or using the F-distribution:

$F_{Bi(n,\pi)}(s) = F_{df_1, df_2}\left( \frac{df_2 (1 - \pi)}{df_1\, \pi} \right)$, $df_1 = 2(n - s)$, $df_2 = 2(s + 1)$

Example: $\pi_0 = 0.3$, $p = 0.25$, $n = 50$, $H_A: \pi < \pi_0$
$s = 0.25 \cdot 50 = 12.5$, so $df_1 = 75$, $df_2 = 27$, and $F_{75,27}(0.84) = 0.27$ ... cannot reject for $\alpha < 27\%$.

R> prop.test(0.25*50, n=50, p=0.3, alternative="less", conf.level=0.7)
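
The direct test for a relative frequency can also be written out by hand in R. The following is a minimal sketch, not part of the original slides: the helper name prop_z_test and its arguments are hypothetical, and it assumes the normal approximation of the statistic and the one-sided alternative HA: π < π0. It is applied to the survey example (π0 = 0.3, p = 0.25, n = 59, α = 0.2).

```r
# Hypothetical helper (not from the slides): one-sided direct test of
# H0: pi = pi0 against HA: pi < pi0, based on the normal approximation.
prop_z_test <- function(p, n, pi0, alpha) {
  # Test statistic U = (p - pi0) / sqrt(pi0 * (1 - pi0)) * sqrt(n)
  u_obs <- (p - pi0) / sqrt(pi0 * (1 - pi0)) * sqrt(n)
  # Lower-tail critical value u(alpha) and p-value F(U) under N(0, 1)
  u_crit <- qnorm(alpha)
  p_value <- pnorm(u_obs)
  list(statistic = u_obs, critical = u_crit, p.value = p_value,
       reject = u_obs < u_crit)
}

# Survey example: 30% in the last election, 25% among 59 respondents
res <- prop_z_test(p = 0.25, n = 59, pi0 = 0.3, alpha = 0.2)
print(res)
```

The statistic lies just above the critical value, so H0 is not rejected at α = 0.2, in line with the direct test. Note that prop.test applies a continuity correction by default, so its output need not match this sketch exactly.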