More Statistical Hypothesis Testing
Paul Cohen ISTA 370
January, 2012
Examples of Hypothesis Testing
The Sampling Distribution
The sampling distribution is the probability distribution of a sample statistic over all possible experiment results under the null hypothesis, H0.
There are three main ways to get it:
Traditional: Derive it mathematically. Examples: t distribution, χ² distribution, F distribution, Gaussian distribution.
Traditional: Use exact forms for some distributions, such as the binomial.
Computer-intensive: Use the computer to simulate the process that produced the experimental result under H0. Examples: Monte Carlo, Bootstrap, Randomization.
We’ll start with Monte Carlo so you’ll understand the logic of
hypothesis testing better before switching to traditional methods.
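To make the contrast between the exact and computer-intensive routes concrete, here is a small R sketch (our own illustration, not from the slides). It assumes the sex-ratio setting used below: N = 166 births, each a boy with probability 0.51 under H0. It computes the exact binomial sampling distribution with dbinom and compares it with a Monte Carlo approximation built from rbinom. The seed and variable names are arbitrary.

```r
# Exact vs. Monte Carlo sampling distribution of the number of boys
# among N = 166 births, assuming each birth is a boy with probability 0.51 (H0).
N <- 166
p0 <- 0.51

# Traditional, exact form: binomial probabilities P(X = x) for x = 0..N
exactProbs <- dbinom(0:N, size = N, prob = p0)

# Computer-intensive: simulate 10000 experiments under H0
set.seed(1)
simBoys <- rbinom(10000, size = N, prob = p0)
# relative frequency of each count 0..N (tabulate needs positive integers, hence +1)
simProbs <- tabulate(simBoys + 1, nbins = N + 1) / length(simBoys)

# Largest disagreement between the two distributions, and one tail probability
maxDiff <- max(abs(exactProbs - simProbs))
exactP <- pbinom(80, size = N, prob = p0)   # exact P(X <= 80)
simP <- mean(simBoys <= 80)                 # Monte Carlo estimate of the same
c(maxDiff = maxDiff, exact = exactP, monteCarlo = simP)
```

With 10000 replicates the two distributions should agree to within sampling error; more replicates shrink the gap further.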
Does Father’s Profession Affect Offspring’s Gender?
The CIA Factbook lists the sex ratio in the US as 105 boys to 100 girls at birth (a sex ratio of 105/205 = .51; throughout, “sex ratio” means the proportion of births that are boys).
62 pilots and astronauts subjected to high G forces had 166 children with a sex ratio of .39*, i.e., 65 boys.
Are these pilots and astronauts significantly different from the
population?
What’s the null hypothesis?
How can you get a sampling distribution by Monte Carlo
simulation?
*http://www.hfea.gov.uk/docs/Appendix_C_-_Scientific_and_Technical_Literature_Review.pdf
Does Father’s Profession Affect Offspring’s Gender?
H0: Sex ratio is σ = .51
H1: σ ≠ .51
Greek symbols for population parameters, roman for sample statistics.
Two-tailed test. For one-tailed, H1: σ < .51
Sample result: of N = 166 children, 65 were boys, so s = .39.
How can you get a sampling distribution by Monte Carlo
simulation? What is the experiment that you will repeat
“infinitely” to get the sampling distribution under H0 ?
Getting the Sampling Distribution
> probEvents<-function(eventIndex,events,probs,N){
  # returns the proportion of events of the type events[eventIndex]
  # in a random sample of N events drawn with probabilities probs
  s<-sample(events,N,replace=TRUE,prob=probs)
  sum(s==events[eventIndex])/N }
> probEvents(1,c("m","f"),c(.51,.49),166)
[1] 0.4939759
> samplingDist<-replicate(10000,
  probEvents(1,c("m","f"),c(.51,.49),166))
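probEvents draws individual "m"/"f" labels and then counts them. Because each child is an independent draw with the same probability of being a boy, the number of boys among 166 children is binomial, so the same sampling distribution can be sketched with one vectorized call. This is an equivalent alternative we are suggesting, not the slides' code; the name samplingDistFast and the seed are arbitrary.

```r
# Vectorized alternative to replicate(10000, probEvents(...)):
# draw 10000 binomial counts of boys (166 births, P(boy) = 0.51 under H0)
# and convert each count to a proportion, the sample statistic s.
set.seed(42)
samplingDistFast <- rbinom(10000, size = 166, prob = 0.51) / 166
summary(samplingDistFast)
```

The mean of samplingDistFast should sit near 0.51 and its standard deviation near sqrt(0.51 × 0.49 / 166), about 0.039.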
Getting the Sampling Distribution
> hist(samplingDist,breaks=50,xlim=c(0.3,.7),main=NULL)
[Figure: histogram of samplingDist; x axis from 0.3 to 0.7, y axis Frequency]
Recall, the pilots and astronauts had 65 boys for a sex ratio of s = .39. Do you think H0 stands?
Hypothesis Testing Step by Step
1. State a null hypothesis, H0: Sex ratio is σ = 0.51
2. Perform an experiment: Pilots and astronauts have s = .39 (65/166 boys).
3. Find the sampling distribution, the probability distribution of the sample statistic, s, if H0 were true.
4a. Decide on α, a maximum acceptable probability of incorrectly rejecting H0.
4b. Use the sampling distribution under H0 to find critical values c+ and c− such that P(s ≥ c+) + P(s ≤ c−) ≤ α.
5. If s ≥ c+ or s ≤ c−, reject H0.
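Steps 2 to 5 can be sketched in R for the sex-ratio example. This is our own illustration: it builds the sampling distribution by simulation with rbinom (equivalent to the probEvents approach), uses a two-tailed test with α split equally between the tails, and takes the empirical α/2 and 1 − α/2 quantiles as c− and c+. Because s takes discrete values, these empirical quantiles only approximately satisfy P(s ≥ c+) + P(s ≤ c−) ≤ α. The seed and variable names are arbitrary.

```r
set.seed(2012)
alpha <- 0.05                     # step 4a: maximum P(rejecting a true H0)
s <- 65 / 166                     # step 2: observed sex ratio, about 0.39

# Step 3: sampling distribution of s under H0 (sigma = 0.51), by simulation
simS <- rbinom(10000, size = 166, prob = 0.51) / 166

# Step 4b: two-tailed critical values with alpha/2 in each tail
cMinus <- quantile(simS, alpha / 2, names = FALSE)
cPlus  <- quantile(simS, 1 - alpha / 2, names = FALSE)

# Step 5: reject H0 if s lies at or beyond either critical value
rejectH0 <- s >= cPlus || s <= cMinus
print(c(cMinus = cMinus, cPlus = cPlus, s = s))
print(rejectH0)
```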
One- and Two-tailed Tests
We’ll reject H0: σ = k if the sample statistic s is very far into one of the tails of the sampling distribution of s under H0.
We will limit the probability of rejecting H0 when it is true to α.
Sometimes we’ll reject H0 if the sample result is very large or very small; sometimes we have prior reason to believe that only large (or only small) sample results speak against H0.
The first case is called a two-tailed test, and the critical values bound regions of the sampling distribution with area α/2 each.
The second case is called a one-tailed test, and the critical value bounds an upper (or lower) region of the sampling distribution with area α.
One- and Two-tailed Tests
> alpha<-.05
> OneTailedCritLower<-quantile(samplingDist,alpha)
> TwoTailedCritLower<-quantile(samplingDist,alpha/2)
> TwoTailedCritUpper<-quantile(samplingDist,1-alpha/2)
> hist(samplingDist,breaks=50,xlim=c(0.3,.7),main=NULL)
> abline(v=OneTailedCritLower,col="red")
> abline(v=TwoTailedCritLower,col="blue")
> abline(v=TwoTailedCritUpper,col="blue")
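[Figure: histogram of samplingDist (x from 0.3 to 0.7, y = Frequency) with the one-tailed lower critical value in red and the two-tailed critical values in blue]
Two-tailed lower critical value is 0.434
Two-tailed upper critical value is 0.584
One-tailed lower critical value is 0.446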
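Because the number of boys under H0 is binomial, the simulated critical values above (0.434, 0.584, 0.446) can be cross-checked against exact binomial quantiles, the "exact form" route. A sketch using base R's qbinom; since counts are whole numbers, these exact quantiles only approximately split α between the tails, and the simulated values can differ slightly because of sampling error and the interpolation done by quantile().

```r
N <- 166
p0 <- 0.51
alpha <- 0.05
# qbinom(q, size, prob) returns the smallest count x with P(X <= x) >= q;
# dividing by N turns counts of boys into sex ratios.
twoTailedLower <- qbinom(alpha / 2, N, p0) / N
twoTailedUpper <- qbinom(1 - alpha / 2, N, p0) / N
oneTailedLower <- qbinom(alpha, N, p0) / N
round(c(twoTailedLower = twoTailedLower,
        twoTailedUpper = twoTailedUpper,
        oneTailedLower = oneTailedLower), 3)
```

Note that the one-tailed lower critical value is less extreme than the two-tailed one, because all of α sits in a single tail.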
So...Do astronauts have more daughters?
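[Figure: histogram of samplingDist; x axis from 0.3 to 0.7, y axis Frequency]
The sex ratio of offspring of astronauts and fighter pilots is 0.39. This is below both the two-tailed lower critical value (0.434) and the one-tailed lower critical value (0.446), so H0 is rejected at α = .05.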
Hypothesis Testing The “Proper” Way
1. State a null hypothesis, H0: φ = k. If H1: φ ≠ k, use a two-tailed test; if H1: φ > k or H1: φ < k, use a one-tailed test.
2. Perform an experiment and get a sample result f = x.
3. Find the sampling distribution of f under H0.
4. Decide on α, a maximum acceptable probability of incorrectly rejecting H0.
5. Use the sampling distribution under H0 to find critical values c+ and c− such that P(f ≥ c+) + P(f ≤ c−) ≤ α.
6. If x ≥ c+ or x ≤ c−, reject H0 with p ≤ α.
7. If α is, say, .05, write it as “Rejected H0 with p ≤ .05”.
Hypothesis Testing The Common Way
1. State a null hypothesis, H0: φ = k. Generally, it’ll be a one-tailed test, so H1: φ > k or H1: φ < k.
2. Perform an experiment and get a sample result f = x.
3. Find the sampling distribution of f under H0.
4. Calculate p, the probability under the sampling distribution of a result at least as extreme as x (for H1: φ > k, the proportion of the distribution at or above x).
5. If p is small, reject H0.
6. If p is, say, .0215, write it as “Rejected H0 with p = .0215”.
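Step 4 of the common way can be packaged as a small helper. This is a sketch under our own conventions: mcPValue is a hypothetical name, simDist is any vector of statistics simulated under H0, and the two-tailed option doubles the smaller tail, which is one common convention among several.

```r
# Empirical p-value from a simulated sampling distribution.
# simDist: values of the statistic f simulated under H0
# x: observed value of f
# tail: "upper" for H1: phi > k, "lower" for H1: phi < k,
#       "two" doubles the smaller tail (capped at 1)
mcPValue <- function(simDist, x, tail = c("upper", "lower", "two")) {
  tail <- match.arg(tail)
  upper <- mean(simDist >= x)   # proportion at or above x
  lower <- mean(simDist <= x)   # proportion at or below x
  switch(tail,
         upper = upper,
         lower = lower,
         two = min(1, 2 * min(upper, lower)))
}

# Usage: the sex-ratio example, with H1: sigma < .51
set.seed(7)
simS <- rbinom(10000, size = 166, prob = 0.51) / 166
mcPValue(simS, 65 / 166, "lower")
```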
Beer Drinking and ISTA 370 Students
The “Proper” Way
Hypothesis: ISTA 370 students drink significantly more beer than
UA undergraduates in general.
Suppose that UA undergrads in general drink 0, 1, 2, 3 beers per
day with probabilities 0.1, 0.3, 0.4, 0.2, respectively.
Mean beer consumption for 25 students in ISTA 370 is 2 bottles
per day compared with 1.7 bottles per day in the general UA
undergraduate population.
In-class Quiz: What is H0? Is this a one-tailed or two-tailed test?
How can we get the sampling distribution under H0 by simulation?
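The 1.7 figure for the general population is the expected value of the stated distribution. A short sketch (variable names are ours) computing the population mean, variance, and standard deviation; the standard deviation of 0.9 is what determines how spread out the sampling distribution of a 25-student mean will be.

```r
UABeers <- c(0, 1, 2, 3)
UAprobs <- c(0.1, 0.3, 0.4, 0.2)

popMean <- sum(UABeers * UAprobs)                 # 0(.1) + 1(.3) + 2(.4) + 3(.2) = 1.7
popVar  <- sum((UABeers - popMean)^2 * UAprobs)   # E[(X - mu)^2] = 0.81
popSD   <- sqrt(popVar)                           # 0.9
c(mean = popMean, var = popVar, sd = popSD)
```

For N = 25 this implies a standard error of 0.9/√25 = 0.18 for the sample mean.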
Hypothesis Testing The “Proper” Way
1. State a null hypothesis, H0: β = 1.7. H1: β > 1.7.
2. Perform an experiment, get a sample result: b = 2.0, N = 25.
3. Find the sampling distribution of b under H0...
> UABeers<-c(0,1,2,3)
> UAprobs<-c(.1,.3,.4,.2)
> sample(UABeers,25,replace=TRUE,prob=UAprobs)
[1] 2 2 3 3 2 2 0 2 2 2 3 1 1 2 2 1 3 3 1 2 1 3 3 2 2
> oneUABeerSample<-function(N){
  mean(sample(UABeers,N,replace=TRUE,prob=UAprobs))}
> oneUABeerSample(25)
[1] 1.8
> BeerSamplingDist<-replicate(10000,oneUABeerSample(25))
Hypothesis Testing The “Proper” Way
> library(ggplot2)
> p05<-quantile(BeerSamplingDist, probs=c(.95))
> print(qplot(BeerSamplingDist) +
  geom_bar(aes(fill = BeerSamplingDist > p05)))
[Figure: bar chart of BeerSamplingDist (x from about 1.2 to 2.4, y = count), with bars colored by whether BeerSamplingDist > p05 (FALSE/TRUE)]
Hypothesis Testing The “Proper” Way
4. Use the sampling distribution under H0 to find a critical value c+ (it’s a one-tailed test) such that P(f ≥ c+) ≤ α.
5. If x ≥ c+, reject H0 with p ≤ α.
> p05<-quantile(BeerSamplingDist, probs=c(.95))
> print(qplot(BeerSamplingDist,xlim=c(1.9,2.5)) +
geom_bar(aes(fill = BeerSamplingDist > p05)))
[Figure: bar chart of BeerSamplingDist zoomed to x from 1.9 to 2.5 (y = count), with bars colored by whether BeerSamplingDist > p05 (FALSE/TRUE)]
Reject H0? Do ISTA 370 students drink more than the average UA student?
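One subtlety: the mean of 25 whole-number counts can only take values in steps of 0.04, so the 95th percentile returned by quantile() need not satisfy P(f ≥ c+) ≤ α; the point mass sitting exactly at that value can push the upper-tail probability above α. A sketch of one way around this, using our own hypothetical helper smallestCritUpper, which searches the simulated values for the smallest c+ whose simulated upper-tail proportion is at most α. The seed is arbitrary.

```r
set.seed(370)
UABeers <- c(0, 1, 2, 3)
UAprobs <- c(0.1, 0.3, 0.4, 0.2)
oneUABeerSample <- function(N) mean(sample(UABeers, N, replace = TRUE, prob = UAprobs))
BeerSamplingDist <- replicate(10000, oneUABeerSample(25))

# Smallest simulated value v such that the simulated P(f >= v) <= alpha
smallestCritUpper <- function(simDist, alpha) {
  candidates <- sort(unique(simDist))
  tailProb <- sapply(candidates, function(v) mean(simDist >= v))
  candidates[which(tailProb <= alpha)[1]]
}

alpha <- 0.05
cPlus <- smallestCritUpper(BeerSamplingDist, alpha)
p95 <- quantile(BeerSamplingDist, 0.95, names = FALSE)
c(quantile95 = p95, cPlus = cPlus,
  tailAtQuantile = mean(BeerSamplingDist >= p95),
  tailAtCPlus = mean(BeerSamplingDist >= cPlus))
```

Since the upper-tail proportion can only shrink as v grows, "x ≥ cPlus" holds exactly when the simulated p-value mean(BeerSamplingDist >= x) is at most α, so this rule agrees with the p-value approach.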
Hypothesis Testing The Common Way
2. Perform an experiment and get a sample result f = x.
3. Find the sampling distribution of f under H0.
4. Calculate p, the proportion of the sampling distribution at least as extreme as x.
5. If p is small, reject H0.
> x<-2 # sample result: mean ISTA 370 beers per day = 2
> count<-0 # count number of replicates with mean beers >= 2
> k<-10000 # number of replicates in sampling distribution
> for (i in 1:k) if (oneUABeerSample(25) >= x) count<-count+1
> p<-count/k
p = 0.061 is the estimated probability that a random sample of 25 students from the general UA population will have a mean consumption of at least 2 beers per day. Is this significant, i.e., small enough to say ISTA 370 students drink more beer?
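Because the population distribution is fully specified, the sampling distribution of the total (and hence the mean) of 25 draws can also be computed exactly, by convolving the one-student distribution with itself 25 times. A sketch; the helper name sumDistribution is our own, and it assumes the possible values are the consecutive integers 0, 1, 2, ..., as they are here. The exact answer should be close to the simulated p = 0.061.

```r
# Exact distribution of the sum of N i.i.d. draws taking values 0, 1, ..., m-1
# with probabilities probs. Returns d where d[j + 1] = P(sum = j).
sumDistribution <- function(probs, N) {
  d <- 1  # the sum of zero draws is 0 with probability 1
  for (i in seq_len(N)) {
    newD <- numeric(length(d) + length(probs) - 1)
    for (j in seq_along(probs)) {
      idx <- seq_along(d) + (j - 1)        # drawing value j-1 shifts the sum by j-1
      newD[idx] <- newD[idx] + d * probs[j]
    }
    d <- newD
  }
  d
}

UAprobs <- c(0.1, 0.3, 0.4, 0.2)   # P(0), P(1), P(2), P(3) beers per day
N <- 25
d <- sumDistribution(UAprobs, N)
sums <- seq_along(d) - 1            # possible totals 0..75
# "mean >= 2" is the same event as "total >= 50"
pExact <- sum(d[sums >= 2 * N])
pExact
```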
Hypothesis Testing The Common Way
In 30 tosses a coin landed heads 23 times. Is the coin fair?
> rbinom(10,1,.5) # R function to create a binomial sample
[1] 1 1 0 0 1 0 1 0 1 1
> b<-.5 # bias of the coin under H0
> Nh<-23 # sample result: number of heads = 23
> Nt<-30 # number of tosses = 30
> count<-0 # count number of replicates with Nh >= 23
> k<-10000 # number of replicates in sampling distribution
> for (i in 1:k) if (sum(rbinom(Nt,1,b))>=Nh) count<-count+1
> p<-count/k
p = 0.0029 is the estimated probability that 30 tosses of a fair coin will contain at least 23 heads. Is this significant, i.e., small enough to say the coin is not fair?
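The number of heads is binomial, so the simulated p can be checked exactly with base R's pbinom and binom.test (both in the stats package). The simulation above is one-tailed (at least 23 heads); strictly, "Is the coin fair?" is a two-tailed question, which binom.test answers by default. A sketch:

```r
Nh <- 23; Nt <- 30; b <- 0.5
# One-tailed exact p-value: P(X >= 23) = P(X > 22)
pOne <- pbinom(Nh - 1, size = Nt, prob = b, lower.tail = FALSE)
# The same quantity via binom.test
pOneTest <- binom.test(Nh, Nt, p = b, alternative = "greater")$p.value
# Two-sided test, binom.test's default alternative
pTwo <- binom.test(Nh, Nt, p = b)$p.value
c(oneTailed = pOne, oneTailedTest = pOneTest, twoTailed = pTwo)
```

The one-tailed exact value is about 0.0026, in line with the simulated 0.0029; because a fair coin's distribution is symmetric, the two-sided value is twice that, about 0.0052.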
What Affects The Sampling Distribution?
Every statistical test is influenced by three factors:
The difference between the sample result, f, and the expected result under H0, φ;
The variance, s², of the population under H0;
The sample size, N.
The quantile of f in the sampling distribution is governed roughly by the ratio
$$\frac{|\phi - f|}{\sqrt{s^2/N}}$$
The term $\sqrt{s^2/N}$ is the standard deviation of the sampling distribution, also called the standard error. It is not the standard deviation of the population, s, but is 1/√N times as large.
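The ratio can be evaluated for the two running examples. This is our own sketch: for the beer data the population variance 0.81 follows from the stated beer probabilities, and for the sex ratio we assume the variance of a single birth under H0 is the Bernoulli variance σ(1 − σ) = 0.51 × 0.49. The helper name standardizedDistance is hypothetical.

```r
# |phi - f| / sqrt(s^2 / N): distance from the H0 value in standard errors
standardizedDistance <- function(phi, f, variance, N) abs(phi - f) / sqrt(variance / N)

# Beer: phi = 1.7, f = 2.0, population variance 0.81 (sd 0.9), N = 25
zBeer <- standardizedDistance(1.7, 2.0, 0.81, 25)               # 0.3 / 0.18

# Sex ratio: phi = 0.51, f = 65/166, Bernoulli variance 0.51 * 0.49, N = 166
zSex <- standardizedDistance(0.51, 65 / 166, 0.51 * 0.49, 166)

c(beer = zBeer, sexRatio = zSex)
```

The sex-ratio result lies about three standard errors from φ while the beer result lies about 1.7, which matches the slides' finding that the first is clearly significant and the second is borderline.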
The Role of Sample Size – Critical Values
> MakeBeerSamplingDist<-function(k,N){
replicate(k,oneUABeerSample(N))}
> k<-10000
> SD15<-MakeBeerSamplingDist(k,15)
> SD30<-MakeBeerSamplingDist(k,30)
> SD100<-MakeBeerSamplingDist(k,100)
> c(quantile(SD15,.95),quantile(SD30,.95),quantile(SD100,.95))
     95%      95%      95%
2.066667 1.966667 1.850000
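As N grows, the one-tailed .05 critical value moves toward the H0 mean of 1.7. Under a normal approximation (our assumption; the simulated distributions are discrete, so the match is only approximate), the critical value is about 1.7 + z₀.₉₅ × 0.9/√N, where 0.9 is the population standard deviation implied by the beer probabilities. A sketch:

```r
popMean <- 1.7
popSD <- 0.9          # sqrt(0.81), from the UA beer probabilities
N <- c(15, 30, 100)
# normal-approximation one-tailed critical value: mean + z_0.95 * standard error
critApprox <- popMean + qnorm(0.95) * popSD / sqrt(N)
round(setNames(critApprox, paste0("N=", N)), 3)
```

These land within a few hundredths of the simulated 2.067, 1.967, and 1.850.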
The Role of Sample Size – Sampling Dist. Variance
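> library(ggplot2)
> n<-factor(c(rep(15,k),rep(30,k),rep(100,k))) # sample sizes of SD15, SD30, SD100
> sds<-c(SD15,SD30,SD100)
> df<-data.frame(sds,n)
> print(qplot(sds,data=df,geom="density",
  size=I(1.5),color=n)+
  geom_vline(aes(xintercept=1.7)))
[Figure: density curves of sds (x from about 1.0 to 2.5) for each sample size n, with a vertical line at the H0 mean of 1.7]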
The Role of Sample Size – Standard Errors
> UAPopulation<-sample(UABeers,20000,replace=TRUE,prob=UAprobs)
> UAPopSD<-sd(UAPopulation)
> # Predicted Standard Errors:
> sampleSizes<-c(15,30,100)
> plot(c(UAPopSD/sqrt(15), # s.e. for sampling dist w/N=15
  UAPopSD/sqrt(30),        # s.e. for N=30
  UAPopSD/sqrt(100)),      # s.e. for N=100
  type="l",x=sampleSizes,
  ylab="Standard Errors")
> # observed standard errors; the +.001 offset presumably keeps the
> # red line from being drawn exactly on top of the black one
> lines(c(sd(SD15),
  sd(SD30),
  sd(SD100))+.001,
  x=sampleSizes,col="red")
[Figure: predicted (black) and simulated (red) standard errors plotted against sampleSizes (20 to 100); y axis “Standard Errors”, ticks from 0.10 to 0.22]
The standard errors for sampling distributions with N = 15, 30, and 100 are very close to the values predicted by s/√N.
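A variant of this comparison that uses the exact population standard deviation (0.9, computed from the beer probabilities) instead of estimating it from a sample of 20000, and prints the numbers rather than plotting them. A sketch with our own variable names and an arbitrary seed:

```r
set.seed(22)
UABeers <- c(0, 1, 2, 3)
UAprobs <- c(0.1, 0.3, 0.4, 0.2)
popSD <- sqrt(sum((UABeers - 1.7)^2 * UAprobs))   # exactly 0.9

sampleSizes <- c(15, 30, 100)
# simulated s.e.: sd of 10000 sample means at each sample size
simulatedSE <- sapply(sampleSizes, function(N)
  sd(replicate(10000, mean(sample(UABeers, N, replace = TRUE, prob = UAprobs)))))
# predicted s.e.: population sd / sqrt(N)
predictedSE <- popSD / sqrt(sampleSizes)

data.frame(N = sampleSizes,
           predicted = round(predictedSE, 4),
           simulated = round(simulatedSE, 4))
```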
The Role of Sample Size in Sex Ratio Example
Our example looked at N = 166 children born to astronauts and fighter pilots and found a sex ratio of s = .39. The α = .05 one-tailed, lower critical value is:
> quantile(samplingDist,.05)
5%
0.4457831
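Suppose our example had only N = 88 children (but still s ≈ .39, i.e., 34 boys).
> SD88<-replicate(10000,probEvents(1,c("m","f"),c(.51,.49),88))
> quantile(SD88,.05)
5%
0.4204545
With N = 88, s = .39 is still below the critical value, so H0 would still be rejected at α = .05.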
The Role of Sample Size in Sex Ratio Example
Let’s look at the α = .05 one-tailed, lower critical value at different
values of N , or sample size:
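sampleSizes<-c(10,20,30,40,50,60,70,80,90,100)
findCrit<-function(i){
  # .05 quantile of a simulated sampling distribution with sample size i
  SD<-replicate(10000,probEvents(1,c("m","f"),c(.51,.49),i))
  quantile(SD,c(.05))[[1]]}
plot(sapply(sampleSizes,findCrit),
  x=sampleSizes,
  type="l",ylab="Critical Value",xlab="N")
abline(h=.39,col="red") # observed sex ratio
abline(v=46,col="red")  # sample size where the curve crosses .39
[Figure: one-tailed α = .05 lower critical value (y, about 0.30 to 0.42) plotted against N (10 to 100), with red lines at .39 and N = 46]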
The Role of Sample Size in Sex Ratio Example
[Figure: one-tailed α = .05 lower critical value (y, about 0.30 to 0.42) plotted against N]
The α = .05 one-tailed, lower critical value becomes more
extreme with lower sample sizes.
If N < 46, or thereabouts, then s = .39 wouldn’t be significant
at the α = .05 level.
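The crossover can also be located without simulation, using exact binomial quantiles for every N from 10 to 100. A sketch under our own assumptions: it mirrors the slide's rule (s = .39 is significant only if it lies at or below the critical value, so it is not significant where the critical value falls below .39), and qbinom gives the smallest count with lower-tail probability at least .05. Because counts are whole numbers, the exact curve is jagged rather than smooth, so the crossover is only approximately at N = 46.

```r
alpha <- 0.05
N <- 10:100
# exact one-tailed lower critical proportion under H0 (sigma = 0.51):
# smallest count x with P(X <= x) >= alpha, divided by N
critExact <- qbinom(alpha, size = N, prob = 0.51) / N
# sample sizes where s = .39 would NOT reach the critical value
notSignificant <- N[critExact < 0.39]
max(notSignificant)
round(setNames(critExact[N %in% c(20, 40, 46, 60, 80, 100)],
               c(20, 40, 46, 60, 80, 100)), 3)
```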
The Role of Population Variance
> N<-50
> k<-10000
> runif(5,4,6) # 5 uniform variates between 4 and 6
[1] 4.284958 5.325778 4.994246 4.541612 5.632105
> runif(5,1,9) # Larger variance...
[1] 3.139289 2.424885 2.321525 6.489595 1.380612
> SD1<-replicate(k,mean(runif(N,4,6)))
> # Sampling distribution of mean of uniform samples
> SD2<-replicate(k,mean(runif(N,3,7)))
> SD3<-replicate(k,mean(runif(N,2,8)))
> SD4<-replicate(k,mean(runif(N,1,9)))
> # Each SD is for a population of larger variance
> sds=c(SD1,SD2,SD3,SD4)
> v<-factor(c(rep(1,k),rep(2,k),rep(3,k),rep(4,k)))
> df<-data.frame(sds,v)
The Role of Population Variance
> print(qplot(sds,data=df,geom="density",size=I(1.5),color=v))
[Figure: density curves of sds (x from about 4.0 to 6.0) for populations v = 1 to 4; larger v means larger population variance]
The standard deviation of the sampling distribution of the mean (called the standard error, or s.e.) depends on the variance (var) of the population from which the sample was drawn and on N, the sample size. In general, $\text{s.e.} = \sqrt{\text{var}/N}$.
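The formula can be checked against the uniform populations above, since a Uniform(a, b) population has variance (b − a)²/12. A sketch that prints predicted and simulated standard errors side by side; it reuses N = 50 and k = 10000 from the slide, while the seed and variable names are our own.

```r
set.seed(26)
N <- 50
k <- 10000
lower <- c(4, 3, 2, 1)
upper <- c(6, 7, 8, 9)
# Uniform(a, b) has variance (b - a)^2 / 12, so s.e. of the mean = sqrt(var / N)
popVar <- (upper - lower)^2 / 12
predictedSE <- sqrt(popVar / N)
# simulated s.e.: sd of k sample means from each population
simulatedSE <- mapply(function(a, b) sd(replicate(k, mean(runif(N, a, b)))),
                      lower, upper)
data.frame(population = paste0("U(", lower, ",", upper, ")"),
           var = round(popVar, 3),
           predicted = round(predictedSE, 4),
           simulated = round(simulatedSE, 4))
```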