Statistical Genomics
Lecture 4: Statistical inference
Zhiwu Zhang
Washington State University
Administration
Homework 1, due Wednesday, Feb 1, 3:10 PM
Outline
χ² test on contingency table
Empirical null distribution
χ² test on variance
t test
Hypothesis test
Two types of error
Power
Observed and expected frequency
Observed:
                Transgenic   Non-transgenic   SUM
Herbicide               35                5    40
No herbicide            35               25    60
SUM                     70               30   100
Expected (row total x column total / grand total):
                Transgenic   Non-transgenic   SUM
Herbicide               28               12    40
No herbicide            42               18    60
SUM                     70               30   100
Approximate Distributions
Poisson distribution: Mean = Var = Expected
(Observed - Expected)/sqrt(Expected) ~ N(0,1), approximately
SUM[(Observed - Expected)^2 / Expected] ~ χ²(df)
df = number of independent cells (1 for a 2 x 2 table with fixed row and column sums)
Observed and expected frequency
χ² statistic from the observed and expected tables:
(35-28)^2/28 + (5-12)^2/12 + (35-42)^2/42 + (25-18)^2/18
= 49/28 + 49/12 + 49/42 + 49/18 = 9.72
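The arithmetic above can be reproduced in R. This is a minimal sketch, not the lecture's own code: the object names (`obs`, `expected`, `x2`, `df`, `p`) are chosen here for illustration, and the expected counts are computed as row total times column total over the grand total.

```r
# Observed counts from the herbicide-by-transgenic table
obs <- matrix(c(35,  5,
                35, 25),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Herbicide", "No herbicide"),
                              c("Transgenic", "Non-transgenic")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Chi-square statistic: sum over cells of (O - E)^2 / E
x2 <- sum((obs - expected)^2 / expected)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# Upper-tail probability from the theoretical chi-square distribution
p <- 1 - pchisq(x2, df)

print(expected)
print(c(x2 = x2, df = df, p = p))
```

The built-in `chisq.test(obs, correct = FALSE)` should report the same statistic; by default it applies Yates' continuity correction to 2 x 2 tables, which gives a smaller value.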
Distribution of χ²(1)
k=10000                            # number of random draws (the plots report N = 10000)
x=rchisq(k,1)                      # empirical null distribution with 1 df
d=density(x)
par(mfrow=c(2,2),mar = c(3,4,1,1)) # 2 x 2 panel layout
plot(x)
plot(d)
hist(x)
plot(ecdf(x))
quantile(x,.99)
[Figure: four panels for the simulated values. Index plot of x; kernel density (N = 10000, bandwidth = 0.1299); histogram (frequency); empirical CDF Fn(x).]
99% percentile: 6.97
Observed: 9.72, P<1%
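As a rough check on the simulated null distribution, the empirical tail probability of the observed statistic can be set beside the exact χ²(1) values. This is a sketch only: the seed and the variable names are assumptions made here for reproducibility, and the empirical percentile will differ slightly from run to run.

```r
set.seed(1)                        # assumed seed, only for reproducibility
k <- 10000                         # number of simulated values
x <- rchisq(k, 1)                  # empirical null distribution, chi-square with 1 df

observed <- 9.72                   # statistic from the contingency table

# Share of null draws above the observed value (empirical P value)
p_empirical <- mean(x > observed)

# Same tail area from the exact distribution
p_theoretical <- 1 - pchisq(observed, 1)

# 99% percentile: simulated versus exact
q99_empirical <- unname(quantile(x, 0.99))
q99_theoretical <- qchisq(0.99, 1)

print(c(p_empirical = p_empirical, p_theoretical = p_theoretical))
print(c(q99_empirical = q99_empirical, q99_theoretical = q99_theoretical))
```

With only about 100 draws beyond the 99% point, the simulated percentile is noisy, which is one reason a simulated value can sit a few tenths away from the exact one.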
Tests on samples
A sample has a mean of 103.6 and a variance of 27.82
The sample has 10 observations
Q1: What is the probability that the sample was from a normal distribution with variance of 25?
Q2: What is the probability that the sample was from a normal distribution with mean of 100?
Q1: distribution with variance of 25
Empirical solution:
  Sample ten observations from a normal distribution with variance of 25.
  Calculate the observed variance.
  Repeat the sampling to get the null distribution of the sample variances.
  Find the percentile of the observed variance (27.82) on the null distribution.
Q1: distribution with variance of 25
x=replicate(10000,
{s=rnorm(10,0,5)                   # ten draws with sd 5 (variance 25)
var(s)                             # sample variance of this draw
})
par(mfrow=c(2,2),mar = c(3,4,1,1))
d=density(x)
plot(x)
plot(d)
hist(x)
plot(ecdf(x))
quantile(x,.75)
> length(x[x>27.82])/10000
[1] 0.3516
[Figure: four panels for the simulated variances. Index plot of x; kernel density (N = 10000, bandwidth = 1.642); histogram (frequency); empirical CDF Fn(x).]
75% percentile: 31.6
Observed: 27.82, P>25%
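Q1: distribution with variance of 25
Theoretical solution: if the sample is from a normal distribution with variance 25, then (n-1)*S^2/25 follows χ²(n-1), with n=10.
v=(10-1)*27.82/25=10.0152 (the lecture uses 10.026, presumably from the unrounded sample variance)
> 1-pchisq(10.026,9)
[1] 0.3483845
Theoretical 0.348 vs. 0.3516 from the empirical null distribution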
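The empirical and theoretical answers to Q1 can be computed side by side. A minimal sketch: `var_test_p` is a hypothetical helper written for this illustration (not a base R function), and the seed is an assumption.

```r
# Hypothetical helper: upper-tail P value for a sample variance s2 from
# n observations, under H0 that the data are normal with variance sigma2
var_test_p <- function(s2, n, sigma2) {
  v <- (n - 1) * s2 / sigma2       # follows chi-square with n - 1 df under H0
  1 - pchisq(v, n - 1)
}

set.seed(1)                        # assumed seed, only for reproducibility

# Null distribution of the sample variance; rnorm takes the sd, 5 = sqrt(25)
null_vars <- replicate(10000, var(rnorm(10, 0, 5)))

p_theoretical <- var_test_p(27.82, 10, 25)
p_empirical <- mean(null_vars > 27.82)

print(c(theoretical = p_theoretical, empirical = p_empirical))
```

The two values should agree to about two decimal places; the remaining gap is simulation noise, which shrinks as the number of replicates grows.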
Q2: distribution with mean of 100
Empirical solution:
  Sample ten observations from N(100, 25)
  Calculate the mean
  Repeat the process 10,000 times
  Null distribution of the 10,000 means
  Determine the percentile of the tested mean (103.6) on the null distribution
Q2: distribution with mean of 100
x=replicate(10000,
{s=rnorm(10,100,5)                 # ten draws from N(100, 25)
mean(s)                            # sample mean of this draw
})
par(mfrow=c(2,2),mar = c(3,4,1,1))
d=density(x)
plot(x)
plot(d)
hist(x)
plot(ecdf(x))
quantile(x,.95)
quantile(x,.99)
> length(x[x>103.6])/10000
[1] 0.0132
[Figure: four panels for the simulated means. Index plot of x; kernel density (N = 10000, bandwidth = 0.2281); histogram (frequency); empirical CDF Fn(x).]
95% percentile: 102.6
99% percentile: shown as 102.6 on the slide, repeating the 95% value; since P = 0.0132 for 103.6, it must lie above 103.6
Observed: 103.6, 1%<P<5%
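Because the null distribution here has a known variance, the simulation has an exact counterpart: the mean of ten observations from N(100, 25) is itself normal with mean 100 and standard deviation 5/sqrt(10). The sketch below uses that fact; the variable names are chosen for illustration.

```r
n <- 10
mu0 <- 100                         # mean under the null hypothesis
sigma <- 5                         # known standard deviation, sqrt(25)
se <- sigma / sqrt(n)              # standard deviation of the sample mean

# Upper-tail probability of a sample mean at least as large as 103.6
p_theoretical <- 1 - pnorm(103.6, mean = mu0, sd = se)

# Theoretical 95% and 99% percentiles of the null distribution of the mean
q <- qnorm(c(0.95, 0.99), mean = mu0, sd = se)

print(p_theoretical)
print(round(q, 2))
```

The tail probability comes out near 0.011, close to the 0.0132 obtained by simulation, and the percentiles are roughly 102.6 and 103.7.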
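t test
Let $Z \sim N(0,1)$ and $V \sim \chi^2_k$, with $Z$ and $V$ independent.
Define: $T = \dfrac{Z}{\sqrt{V/k}}$
Application: $X_1, \dots, X_n \sim \text{iid } N(\mu, \sigma^2)$
$$Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} = \sqrt{n}\left(\frac{\bar{X}-\mu}{\sigma}\right) \sim N(0,1)$$
$$V = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
$Z$ and $V$ are independent, so
$$T = \frac{\sqrt{n}\left(\dfrac{\bar{X}-\mu}{\sigma}\right)}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2}\Big/(n-1)}} = \sqrt{n}\left(\frac{\bar{X}-\mu}{S}\right) = \frac{\bar{X}-\mu}{S/\sqrt{n}}$$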
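The derivation can be checked by simulation: statistics of the form (mean - μ)/(S/sqrt(n)) computed from normal samples should follow a t distribution with n-1 degrees of freedom, with heavier tails than N(0,1). A sketch under assumed settings (n = 10, μ = 100, σ = 5, and an arbitrary seed):

```r
set.seed(1)                        # assumed seed, only for reproducibility
n <- 10
mu <- 100
sigma <- 5

# T = (sample mean - mu) / (S / sqrt(n)) for each simulated sample
t_sim <- replicate(10000, {
  s <- rnorm(n, mu, sigma)
  (mean(s) - mu) / (sd(s) / sqrt(n))
})

# Compare simulated percentiles with Student's t (n - 1 df) and with N(0,1)
probs <- c(0.95, 0.975, 0.99)
comparison <- rbind(simulated = quantile(t_sim, probs),
                    t_dist    = qt(probs, n - 1),
                    normal    = qnorm(probs))
print(round(comparison, 3))
```

The simulated row should track the `t_dist` row and sit clearly above the `normal` row, which is the practical consequence of replacing σ with S.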
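t test
$$T = \frac{\bar{X}-\mu}{S/\sqrt{n}}$$
T=(103.6-100)/(5/sqrt(10))         # the slide plugs in 5 for the standard deviation
P=1-pt(T,9)                        # upper-tail probability, 9 = n-1 degrees of freedom
c(T,P)
2.27683992 0.02440704
P is under the 5% threshold, so reject the hypothesis that the sample was from a distribution with mean of 100.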
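The code above divides by 5/sqrt(10), whereas the formula for T uses the sample standard deviation S. The sketch below computes both versions so they can be compared; the `t_sample` variant, which takes S = sqrt(27.82) from the sample variance given earlier, is an addition made here for illustration and is not part of the lecture.

```r
xbar <- 103.6                      # sample mean
s2 <- 27.82                        # sample variance
n <- 10
mu0 <- 100                         # mean under the null hypothesis

# As in the lecture code: 5 used for the standard deviation
t_slide <- (xbar - mu0) / (5 / sqrt(n))
p_slide <- 1 - pt(t_slide, n - 1)

# Variant: sample standard deviation S = sqrt(27.82), as in the formula for T
t_sample <- (xbar - mu0) / (sqrt(s2) / sqrt(n))
p_sample <- 1 - pt(t_sample, n - 1)

print(c(t_slide = t_slide, p_slide = p_slide))
print(c(t_sample = t_sample, p_sample = p_sample))
```

Both P values fall below 5%, so the conclusion at that threshold does not depend on which standard deviation is used.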
Hypothesis test
Null hypothesis (H0): Initial assumption
Alternative hypothesis (Ha): Opposite to the assumption
Find the probability of the observed result, or a more extreme one, under H0
If the probability is too low (e.g. below 5%), reject H0 and accept Ha
Otherwise, accept H0 (that is, fail to reject it)
Two types of errors and power
Type I error: Reject true H0, false positive; the probability is the threshold used, e.g. α=5%
Type II error: Accept false H0, false negative, β
Power: Probability to reject false H0, (1-β)
Summary
Test                    H0 is True                   H0 is False
Positive (reject H0)    False positive, Type I: α    Power = 1-β
Negative (accept H0)    Specificity = 1-α            False negative, Type II: β
Sum                     100%                         100%
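The two columns of the table can be estimated by simulation. This is a sketch under assumptions chosen here, not taken from the lecture: samples of size 10 with standard deviation 5, a one-sided t test of H0: mean = 100 at α = 5%, a true mean of 103.6 for the "H0 is false" case, and an arbitrary seed. `reject_rate` is a hypothetical helper name.

```r
set.seed(1)                        # assumed seed, only for reproducibility
n <- 10
alpha <- 0.05                      # threshold for rejecting H0
n_rep <- 5000                      # simulated samples per scenario

# Share of simulated samples in which the one-sided t test rejects H0: mean = 100
reject_rate <- function(true_mean) {
  mean(replicate(n_rep, {
    s <- rnorm(n, true_mean, 5)
    t.test(s, mu = 100, alternative = "greater")$p.value < alpha
  }))
}

type1 <- reject_rate(100)          # H0 true: rejections are false positives
power <- reject_rate(103.6)        # H0 false: rejections are correct
beta <- 1 - power                  # Type II error rate

print(c(type1 = type1, power = power, beta = beta))
```

The estimated Type I rate should land near the chosen α, while power depends on how far the true mean is from 100, on the sample size, and on the variance.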
Highlight
χ² test on contingency table
Empirical null distribution
χ² test on variance
t test
Hypothesis test
Two types of error
Power