
Statistical Genomics
Lecture 4: Statistical inference
Zhiwu Zhang
Washington State University

Administration
- Homework 1, due Feb 1, Wednesday, 3:10 PM

Outline
- χ² test on contingency table
- Empirical null distribution
- χ² test on variance
- t test
- Hypothesis test
- Two types of error
- Power

Observed and expected frequency

Observed:
                Transgenic   Non-transgenic   SUM
Herbicide           35              5          40
No herbicide        35             25          60
SUM                 70             30         100

Expected (row total × column total / grand total):
                Transgenic   Non-transgenic   SUM
Herbicide           28             12          40
No herbicide        42             18          60
SUM                 70             30         100

Approximate distributions
- Poisson distribution: Mean = Var = Expected
- (Observed - Expected)/sqrt(Expected) ~ N(0,1), approximately
- SUM (Observed - Expected)²/Expected ~ χ²(df)
- df = number of independent cells (1 for this 2 × 2 table, because the margins are fixed)

Every cell has (Observed - Expected)² = 7² = 49, so
χ² = 49/28 + 49/12 + 49/42 + 49/18 = 9.72

Distribution of χ²(1)

    k=10000                              # number of draws (N = 10000 in the figure)
    x=rchisq(k,1)                        # sample from chi-square with 1 df
    par(mfrow=c(2,2),mar = c(3,4,1,1))   # 2 x 2 panel layout
    d=density(x)
    plot(x)
    plot(d)
    hist(x)
    plot(ecdf(x))
    quantile(x,.99)

[Figure: four panels for x (index plot, density, histogram, ecdf); N = 10000. 99% percentile 6.97; observed 9.72, P < 1%.]

Tests on samples
- A sample has mean of 103.6 and variance of 27.82
- The sample has 10 observations
- Q1: What is the probability that the sample came from a normal distribution with variance of 25?
- Q2: What is the probability that the sample came from a normal distribution with mean of 100?

Q1: distribution with variance of 25
Empirical solution:
- Sample ten observations from a normal distribution with variance of 25.
- Calculate the observed variance.
- Repeat the sampling to get the null distribution of the sample variances.
- Find the percentile of the observed variance on the null distribution.

    x=replicate(10000, {
      s=rnorm(10,0,5)                    # sd of 5, i.e. variance of 25
      var(s)
    })
    par(mfrow=c(2,2),mar = c(3,4,1,1))
    d=density(x)
    plot(x)
    plot(d)
    hist(x)
    plot(ecdf(x))
    quantile(x,.75)

    > length(x[x>27.82])/10000
    [1] 0.3516

[Figure: four panels for x (index plot, density, histogram, ecdf); N = 10000. 75% percentile 31.6; observed 27.82, P > 25%.]

Q1: distribution with variance of 25
Theoretical solution: under H0, (n-1)S²/σ² follows χ² with n-1 = 9 degrees of freedom.
v = (10-1)*27.82/25 ≈ 10.02 (the slide evaluates the test at 10.026)

    > 1-pchisq(10.026,9)
    [1] 0.3483845

vs. 0.3516 from the empirical solution

Q2: distribution with mean of 100
Empirical solution:
- Sample ten observations from N(100, 25)
- Calculate the mean
- Repeat the process 10,000 times
- Null distribution of the 10,000 means
- Determine the percentile of the tested mean (103.6) on the null distribution

    x=replicate(10000, {
      s=rnorm(10,100,5)                  # mean 100, sd 5
      mean(s)
    })
    par(mfrow=c(2,2),mar = c(3,4,1,1))
    d=density(x)
    plot(x)
    plot(d)
    hist(x)
    plot(ecdf(x))
    quantile(x,.95)
    quantile(x,.99)

    > length(x[x>103.6])/10000
    [1] 0.0132

[Figure: four panels for x (index plot, density, histogram, ecdf); N = 10000. 95% percentile 102.6; observed 103.6, 1% < P < 5%.]

t test
Let $Z \sim N(0,1)$ and $V \sim \chi^2_k$, with $Z$ and $V$ independent. Define

$$T = \frac{Z}{\sqrt{V/k}} \sim t_k$$

Application: $X_1, \ldots, X_n \sim \text{iid } N(\mu, \sigma^2)$

$$Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} = \sqrt{n}\left(\frac{\bar{X}-\mu}{\sigma}\right) \sim N(0,1)$$

$$V = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

$Z$ and $V$ are independent, so

$$T = \frac{\sqrt{n}\left(\frac{\bar{X}-\mu}{\sigma}\right)}{\sqrt{\frac{(n-1)S^2}{\sigma^2 (n-1)}}} = \sqrt{n}\left(\frac{\bar{X}-\mu}{S}\right) = \frac{\bar{X}-\mu}{S/\sqrt{n}}$$

t test

$$T = \frac{\bar{X}-\mu}{S/\sqrt{n}}$$

    T=(103.6-100)/(5/sqrt(10))
    P=1-pt(T,9)                          # one-sided P value, 9 df
    c(T,P)
    [1] 2.27683992 0.02440704

Note that this code plugs in 5 for S. With the sample value S = sqrt(27.82) ≈ 5.27, T ≈ 2.16, which still exceeds the 5% one-sided critical value of 1.83 for 9 degrees of freedom.

P is below the 5% threshold, so reject the hypothesis that the sample was from a distribution with mean of 100.

Hypothesis test
- Null hypothesis (H0): Initial assumption
- Alternative hypothesis (Ha): Opposite to the assumption
- Find the probability of the observed result (or a more extreme one) under H0
- If the probability is too low (e.g. below 5%), reject H0 and accept Ha
- Otherwise, accept H0 (that is, fail to reject it)

Two types of errors and power
- Type I error: Reject a true H0, false positive; its probability is the threshold used, e.g. α = 5%
- Type II error: Accept a false H0, false negative; its probability is β
- Power: Probability to reject a false H0, (1-β)

Summary
Test                    H0 is true                  H0 is false
Positive (reject H0)    False positive, Type I: α   Power = 1-β
Negative (accept H0)    Specificity = 1-α           False negative, Type II: β
Sum                     100%                        100%

Highlight
- χ² test on contingency table
- Empirical null distribution
- χ² test on variance
- t test
- Hypothesis test
- Two types of error
- Power
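
The definitions of type I error and power in the summary can be checked with the same empirical approach used for Q1 and Q2. The following is a minimal sketch, not part of the lecture: it assumes a one-sided t test of H0: mean = 100 with n = 10 and sd = 5, and it assumes a true mean of 103.6 for the "H0 is false" case. The helper name `p_value`, the seed, and the number of replicates are hypothetical choices made for illustration.

```r
# Sketch: estimate type I error and power of a one-sided t test by simulation.
# Assumptions (not from the lecture): alternative mean 103.6, seed 1, 10,000 replicates.
set.seed(1)
n <- 10
sigma <- 5
alpha <- 0.05
reps <- 10000

# Hypothetical helper: draw one sample with the given true mean and return
# the one-sided P value for H0: mean = 100 against Ha: mean > 100.
p_value <- function(mu_true) {
  s <- rnorm(n, mu_true, sigma)
  t_stat <- (mean(s) - 100) / (sd(s) / sqrt(n))  # uses the sample sd S
  1 - pt(t_stat, n - 1)
}

p_null <- replicate(reps, p_value(100))    # H0 is true
p_alt  <- replicate(reps, p_value(103.6))  # H0 is false

type1 <- mean(p_null < alpha)  # fraction of false positives, should be near alpha
power <- mean(p_alt < alpha)   # fraction of true positives, an estimate of 1 - beta
c(type1 = type1, power = power)
```

Under these assumptions the estimated type I error should land close to the 5% threshold, while the power depends on how far the assumed true mean is from 100; changing 103.6 or n shows how power grows with effect size and sample size.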
