More Statistical Hypothesis Testing
Paul Cohen
ISTA 370
January, 2012

Examples of Hypothesis Testing

The Sampling Distribution

The sampling distribution is the probability distribution over all possible experiment results under the null hypothesis, H0. There are three main ways to get it:
Traditional: Derive it mathematically. Examples: the t, χ², F, and Gaussian distributions.
Traditional: Use exact forms for some distributions, such as the binomial.
Computer-intensive: Use the computer to simulate the process that produced the experimental result under H0. Examples: Monte Carlo, bootstrap, randomization.
We'll start with Monte Carlo so you'll understand the logic of hypothesis testing better before switching to traditional methods.

Does Father's Profession Affect Offspring's Gender?

The CIA Factbook lists the sex ratio in the US as 105 boys to 100 girls at birth (a sex ratio of 105/205 = .51).
62 pilots and astronauts subjected to high G forces had 166 children with a sex ratio of .39*, i.e., 65 boys.
Are these pilots and astronauts significantly different from the population?
What's the null hypothesis?
How can you get a sampling distribution by Monte Carlo simulation?
*http://www.hfea.gov.uk/docs/Appendix_C_-_Scientific_and_Technical_Literature_Review.pdf

H0: Sex ratio is σ = .51
H1: σ ≠ .51
(Greek symbols for population parameters, roman for sample statistics.)
This is a two-tailed test. For a one-tailed test, H1: σ < .51.
Sample result: of N = 166 children, 65 were boys, so s = .39.
How can you get a sampling distribution by Monte Carlo simulation?
What is the experiment that you will repeat “infinitely” to get the sampling distribution under H0?

Getting the Sampling Distribution

> probEvents<-function(eventIndex,events,probs,N){
    # returns the proportion of events of the type given by eventIndex
    # in a random sample of N events drawn with probabilities probs
    s<-sample(events,N,replace=TRUE,prob=probs)
    sum(s==events[eventIndex])/N
  }
> probEvents(1,c("m","f"),c(.51,.49),166)
[1] 0.4939759
> samplingDist<-replicate(10000, probEvents(1,c("m","f"),c(.51,.49),166))
> hist(samplingDist,breaks=50,xlim=c(0.3,.7),main=NULL)
[Figure: histogram of samplingDist; x axis 0.3 to 0.7, y axis Frequency]

Recall, the pilots and astronauts had 65 boys for a sex ratio of s = .39. Do you think H0 stands?

Hypothesis Testing Step by Step
1. State a null hypothesis, H0: Sex ratio is σ = 0.51.
2. Perform an experiment: pilots and astronauts have s = .39 (65/166 boys).
3. Find the sampling distribution, the probability distribution of the sample statistic, s, if H0 were true.
4a. Decide on α, a maximum acceptable probability of incorrectly rejecting H0.
4b. Use the sampling distribution under H0 to find critical values $c^+$ and $c^-$ such that $P(s \ge c^+) + P(s \le c^-) \le \alpha$.
5. If $s \ge c^+$ or $s \le c^-$, reject H0.

One- and Two-tailed Tests

We'll reject H0: σ = k if the sample statistic s is very far into one of the tails of the sampling distribution of s under H0. We will limit the probability of rejecting H0 when it is true to α.
Sometimes we'll reject H0 if the sample result is very large or very small; sometimes we have prior reason to believe that only large (or only small) sample results speak against H0.
The first case is called a two-tailed test, and the critical values bound two regions of the sampling distribution, each with area α/2.
The second case is called a one-tailed test, and the critical value bounds an upper (or lower) region of the sampling distribution with area α.

> alpha<-.05
> OneTailedCritLower<-quantile(samplingDist,alpha)
> TwoTailedCritLower<-quantile(samplingDist,alpha/2)
> TwoTailedCritUpper<-quantile(samplingDist,1-alpha/2)
> hist(samplingDist,breaks=50,xlim=c(0.3,.7),main=NULL)
> abline(v=OneTailedCritLower,col="red")
> abline(v=TwoTailedCritLower,col="blue")
> abline(v=TwoTailedCritUpper,col="blue")
[Figure: histogram of samplingDist with the critical values marked]
One-tailed lower critical value is 0.446 (red line).
Two-tailed lower critical value is 0.434; two-tailed upper critical value is 0.584 (blue lines).

So... Do astronauts have more daughters?

The sex ratio of offspring of astronauts and fighter pilots is 0.39.
[Figure: histogram of samplingDist; x axis 0.3 to 0.7, y axis Frequency]

Hypothesis Testing the “Proper” Way
1. State a null hypothesis, H0: φ = k.
2. If H1: φ ≠ k, use a two-tailed test; if H1: φ > k or H1: φ < k, use a one-tailed test.
3. Perform an experiment and get a sample result f = x.
4. Find the sampling distribution of f under H0.
5. Decide on α, a maximum acceptable probability of incorrectly rejecting H0.
6. Use the sampling distribution under H0 to find critical values $c^+$ and $c^-$ such that $P(f \ge c^+) + P(f \le c^-) \le \alpha$.
7. If $x \ge c^+$ or $x \le c^-$, reject H0 with p ≤ α.
If α is, say, .05, write it as “Rejected H0 with p ≤ .05”.

Hypothesis Testing the Common Way
1. State a null hypothesis, H0: φ = k.
2. Generally, it'll be a one-tailed test, so H1: φ > k or H1: φ < k.
3. Perform an experiment and get a sample result f = x.
4. Find the sampling distribution of f under H0.
5. Calculate p, the probability under the sampling distribution of a result at least as extreme as x (the tail area beyond x).
6. If p is small, reject H0.
If p is, say, .0215, write it as “Rejected H0 with p = .0215”.

Beer Drinking and ISTA 370 Students: the “Proper” Way

Hypothesis: ISTA 370 students drink significantly more beer than UA undergraduates in general.
Suppose that UA undergrads in general drink 0, 1, 2, or 3 beers per day with probabilities 0.1, 0.3, 0.4, and 0.2, respectively.
Mean beer consumption for 25 students in ISTA 370 is 2 bottles per day, compared with 1.7 bottles per day in the general UA undergraduate population (0 × 0.1 + 1 × 0.3 + 2 × 0.4 + 3 × 0.2 = 1.7).
In-class Quiz: What is H0? Is this a one-tailed or two-tailed test? How can we get the sampling distribution under H0 by simulation?

1. State a null hypothesis, H0: β = 1.7. H1: β > 1.7.
2. Perform an experiment, get a sample result: b = 2.0, N = 25.
3. Find the sampling distribution of b under H0...
> UABeers<-c(0,1,2,3)
> UAprobs<-c(.1,.3,.4,.2)
> sample(UABeers,25,replace=TRUE,prob=UAprobs)
 [1] 2 2 3 3 2 2 0 2 2 2 3 1 1 2 2 1 3 3 1 2 1 3 3 2 2
> oneUABeerSample<-function(N){
    # mean beer consumption in one random sample of N UA undergraduates under H0
    mean(sample(UABeers,N,replace=TRUE,prob=UAprobs))}
> oneUABeerSample(25)
[1] 1.8
> BeerSamplingDist<-replicate(10000,oneUABeerSample(25))

> library(ggplot2)   # provides qplot and geom_bar
> p05<-quantile(BeerSamplingDist, probs=c(.95))
> print(qplot(BeerSamplingDist) + geom_bar(aes(fill = BeerSamplingDist > p05)))
[Figure: counts of BeerSamplingDist (x axis 1.2 to 2.4), bars filled by whether BeerSamplingDist > p05 (FALSE/TRUE)]

4. Use the sampling distribution under H0 to find a critical value $c^+$ (it's a one-tailed test) such that $P(f \ge c^+) \le \alpha$.
5. If $x \ge c^+$, reject H0 with p ≤ α.
> p05<-quantile(BeerSamplingDist, probs=c(.95))
> print(qplot(BeerSamplingDist,xlim=c(1.9,2.5)) + geom_bar(aes(fill = BeerSamplingDist > p05)))
[Figure: upper tail of BeerSamplingDist (x axis 1.9 to 2.5), bars filled by whether BeerSamplingDist > p05 (FALSE/TRUE)]
Reject H0? Do ISTA 370 students drink more than the average UA student?

Hypothesis Testing the Common Way
2. Perform an experiment and get a sample result f = x.
3. Find the sampling distribution of f under H0.
4. Calculate p, the probability under the sampling distribution of a result at least as extreme as x.
5. If p is small, reject H0.
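The UA population in this example is a small discrete distribution, so the tail probability p can also be computed exactly instead of by simulation. This is the "exact forms" route listed at the start. The sketch below is in Python rather than the R used in the slides, and the function name `sum_distribution` is hypothetical. It convolves the per-student distribution 25 times to get the exact distribution of the total. A sample mean of at least 2 corresponds to a total of at least 50.

```python
from collections import defaultdict


def sum_distribution(values, probs, n):
    """Exact distribution of the sum of n independent draws from a discrete distribution.

    Returns a dict mapping each possible total to its probability.
    """
    dist = {0: 1.0}  # before any draws, the total is 0 with probability 1
    for _ in range(n):
        nxt = defaultdict(float)
        for total, p_total in dist.items():
            for v, p_v in zip(values, probs):
                nxt[total + v] += p_total * p_v  # add one more student's beers
        dist = dict(nxt)
    return dist


# UA population under H0, as stated in the example
ua_beers = [0, 1, 2, 3]
ua_probs = [0.1, 0.3, 0.4, 0.2]
N = 25
x = 2.0  # observed mean for the ISTA 370 students

totals = sum_distribution(ua_beers, ua_probs, N)
# P(sample mean >= x) is P(total >= x * N); here x * N = 50
p_exact = sum(p for total, p in totals.items() if total >= x * N)
print(f"exact one-tailed p = {p_exact:.4f}")
```

The exact value should sit close to a Monte Carlo estimate of the same tail area. Any gap reflects simulation noise from using a finite number of replicates.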
> x<-2      # sample result: ISTA 370 beers = 2
> count<-0  # count number of replicates with mean beers >= 2
> k<-10000  # number of replicates in sampling distribution
> for (i in 1:k) if (oneUABeerSample(25) >= x) count<-count+1
> p<-count/k
p = 0.061 is the probability that a random sample of 25 from the general population will have a mean consumption of at least 2 beers per day. Is this significant, i.e., small enough to say ISTA 370 students drink more beer?

In 30 tosses a coin landed heads 23 times. Is the coin fair?
> rbinom(10,1,.5)  # R function to create a binomial sample
 [1] 1 1 0 0 1 0 1 0 1 1
> b<-.5     # bias of the coin under H0
> Nh<-23    # sample result: number of heads = 23
> Nt<-30    # number of tosses = 30
> count<-0  # count number of replicates with Nh >= 23
> k<-10000  # number of replicates in sampling distribution
> for (i in 1:k) if (sum(rbinom(Nt,1,b))>=Nh) count<-count+1
> p<-count/k
p = 0.0029 is the probability that a random sample of 30 tosses of a fair coin will contain at least 23 heads. Is this significant, i.e., small enough to say the coin is not fair?

What Affects the Sampling Distribution?

Every statistical test is influenced by three factors:
The difference between the sample result, f, and the expected result under H0, φ;
The variance, s², of the population under H0;
The sample size, N.
The quantile of f in the sampling distribution is, roughly, a function of the ratio $\frac{|\phi - f|}{\sqrt{s^2/N}}$: the larger the ratio, the further into a tail of the sampling distribution f lies.
The term $\sqrt{s^2/N}$ is the standard deviation of the sampling distribution, also called the standard error. It is not the standard deviation of the population, s, but is $1/\sqrt{N}$ times as large.
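The Python sketch below makes the ratio concrete. It is not part of the original R session, and it evaluates $|\phi - f| / \sqrt{s^2/N}$ for the three examples so far. It rests on two assumptions. For a proportion (sex ratio, coin), the population variance under H0 is taken to be φ(1 − φ), the Bernoulli variance. For the beer example, it is the variance of the stated 0/1/2/3 distribution. The helper name `tail_ratio` is hypothetical.

```python
from math import sqrt


def tail_ratio(phi, f, var, n):
    """|phi - f| divided by the standard error sqrt(var / n)."""
    return abs(phi - f) / sqrt(var / n)


# Beer example: mean and variance of the population under H0
beers = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]
mean = sum(b * p for b, p in zip(beers, probs))                    # about 1.7
var_beer = sum((b - mean) ** 2 * p for b, p in zip(beers, probs))  # about 0.81

ratios = {
    # Bernoulli variance phi * (1 - phi) for the two proportion examples
    "sex ratio (65/166 boys)": tail_ratio(0.51, 65 / 166, 0.51 * 0.49, 166),
    "coin (23/30 heads)": tail_ratio(0.5, 23 / 30, 0.5 * 0.5, 30),
    "beer (mean 2.0, N = 25)": tail_ratio(mean, 2.0, var_beer, 25),
}
for name, r in ratios.items():
    print(f"{name}: {r:.2f}")
```

The ratios fall in the order sex ratio, then coin, then beer. This is consistent with the earlier results: the sex ratio lies beyond every critical value, the coin gave p = .0029, and the beer gave p = .061.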
The Role of Sample Size – Critical Values

> MakeBeerSamplingDist<-function(k,N){
    replicate(k,oneUABeerSample(N))}
> k<-10000
> SD15<-MakeBeerSamplingDist(k,15)
> SD30<-MakeBeerSamplingDist(k,30)
> SD100<-MakeBeerSamplingDist(k,100)
> c(quantile(SD15,.95),quantile(SD30,.95),quantile(SD100,.95))
     95%      95%      95% 
2.066667 1.966667 1.850000 

The Role of Sample Size – Sampling Dist. Variance

> n<-factor(c(rep(15,k),rep(30,k),rep(100,k)))  # labels match the sample sizes of SD15, SD30, SD100
> sds<-c(SD15,SD30,SD100)
> df<-data.frame(sds,n)
> print(qplot(sds,data=df,geom="density",size=I(1.5),color=n) +
    geom_vline(aes(xintercept=1.7)))
[Figure: density of sds for each sample size n, with a vertical line at the H0 mean 1.7]

The Role of Sample Size – Standard Errors

> UAPopulation<-sample(UABeers,20000,replace=TRUE,prob=UAprobs)
> UAPopSD<-sd(UAPopulation)
> # Predicted standard errors:
> sampleSizes<-c(15,30,100)
> plot(c(UAPopSD/sqrt(15),   # s.e. for sampling dist w/ N=15
         UAPopSD/sqrt(30),   # s.e. for N=30
         UAPopSD/sqrt(100)), # s.e. for N=100
       type="l",x=sampleSizes,ylab="Standard Errors")
> lines(c(sd(SD15),sd(SD30),sd(SD100))+.001,  # observed s.e.; +.001 shifts the red curve slightly
        x=sampleSizes,col="red")
[Figure: predicted (black) and observed (red) standard errors against sampleSizes]
The standard errors for sampling distributions with N = 15, 30, 100 are nearly equal to the values predicted by $s/\sqrt{N}$.

The Role of Sample Size in Sex Ratio Example

Our example looked at N = 166 children born to astronauts and fighter pilots and found a sex ratio of s = .39. The α = .05 one-tailed, lower critical value is:
> quantile(samplingDist,.05)
       5% 
0.4457831 
Suppose our example had only N = 88 (but s = 0.39, 33 boys).
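The simulated critical values can be cross-checked with the normal approximation to the sampling distribution of a proportion. This is a swapped-in traditional method, not the lecture's Monte Carlo approach. It gives the one-tailed lower critical value $c^- \approx 0.51 + z_{0.05}\sqrt{0.51 \cdot 0.49 / N}$, where $z_{0.05} \approx -1.645$. The Python sketch below evaluates this formula. It also solves for the N at which $c^-$ equals .39. The function names `normal_crit` and `breakeven_n` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def normal_crit(n, p0=0.51, alpha=0.05):
    """Approximate alpha-level lower critical value of a sample proportion under H0: p = p0."""
    z = NormalDist().inv_cdf(alpha)  # about -1.645 for alpha = .05
    return p0 + z * sqrt(p0 * (1 - p0) / n)


def breakeven_n(s=0.39, p0=0.51, alpha=0.05):
    """Sample size at which the approximate lower critical value equals s."""
    z = NormalDist().inv_cdf(alpha)
    return (z * sqrt(p0 * (1 - p0)) / (s - p0)) ** 2


for n in (88, 166):
    print(f"N = {n}: approximate critical value {normal_crit(n):.4f}")
print(f"s = .39 stops being significant below N of about {breakeven_n():.1f}")
```

The approximations land close to the simulated values of 0.4458 (N = 166) and 0.4205 (N = 88). They also agree with the "N < 46, or thereabouts" reading of the critical-value plot. Small differences are expected from Monte Carlo noise and from the discreteness of the counts.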
> SD88<-replicate(10000,probEvents(1,c("m","f"),c(.51,.49),88))
> quantile(SD88,.05)
       5% 
0.4204545 

Let's look at the α = .05 one-tailed, lower critical value at different values of N, the sample size:
> sampleSizes<-c(10,20,30,40,50,60,70,80,90,100)
> findCrit<-function(i){
    # 5% quantile of the simulated sampling distribution for sample size i
    SD<-replicate(10000,probEvents(1,c("m","f"),c(.51,.49),i))
    quantile(SD,c(.05))[[1]]}
> plot(sapply(sampleSizes,findCrit), x=sampleSizes, type="l",ylab="Critical Value",xlab="N")
> abline(h=.39,col="red")
> abline(v=46,col="red")
[Figure: α = .05 lower critical value against N, with red lines at s = .39 and N = 46]

The α = .05 one-tailed, lower critical value becomes more extreme with lower sample sizes. If N < 46, or thereabouts, then s = .39 wouldn't be significant at the α = .05 level.

The Role of Population Variance

> N<-50
> k<-10000
> runif(5,4,6)  # 5 uniform variates between 4 and 6
[1] 4.284958 5.325778 4.994246 4.541612 5.632105
> runif(5,1,9)  # Larger variance...
[1] 3.139289 2.424885 2.321525 6.489595 1.380612
> SD1<-replicate(k,mean(runif(N,4,6)))  # Sampling distribution of mean of uniform samples
> SD2<-replicate(k,mean(runif(N,3,7)))
> SD3<-replicate(k,mean(runif(N,2,8)))
> SD4<-replicate(k,mean(runif(N,1,9)))
> # Each SD is for a population of larger variance
> sds=c(SD1,SD2,SD3,SD4)
> v<-factor(c(rep(1,k),rep(2,k),rep(3,k),rep(4,k)))
> df<-data.frame(sds,v)
> print(qplot(sds,data=df,geom="density",size=I(1.5),color=v))
[Figure: density of sds for populations v = 1 to 4; x axis 4.0 to 6.0]
The standard deviation of the sampling distribution of the mean (called the standard error, or s.e.) depends on the variance (var) of the population from which the sample was drawn, and on N, the sample size. In general, $\text{s.e.} = \sqrt{\text{var}/N}$.
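The s.e. formula can be checked against the four uniform populations above. For a uniform distribution on [a, b] the variance is (b − a)²/12, so the predicted s.e. for N = 50 is $\sqrt{(b-a)^2/(12 \cdot 50)}$. The Python sketch below is a re-creation of the R experiment. It uses 2,000 replicates instead of 10,000 to keep it quick and a fixed seed chosen arbitrarily. It compares the predicted s.e. with the standard deviation of the simulated sample means. The helper names `simulated_se` and `predicted_se` are hypothetical.

```python
import random
from math import sqrt
from statistics import stdev


def simulated_se(a, b, n, k, rng):
    """Standard deviation of k simulated means of n uniform(a, b) draws."""
    means = [sum(rng.uniform(a, b) for _ in range(n)) / n for _ in range(k)]
    return stdev(means)


def predicted_se(a, b, n):
    """sqrt(var / n), with var = (b - a)^2 / 12 for a uniform(a, b) population."""
    return sqrt((b - a) ** 2 / 12 / n)


rng = random.Random(370)  # fixed seed so the run is reproducible
N, K = 50, 2000
results = []
for a, b in [(4, 6), (3, 7), (2, 8), (1, 9)]:
    sim = simulated_se(a, b, N, K, rng)
    pred = predicted_se(a, b, N)
    results.append((a, b, sim, pred))
    print(f"uniform({a},{b}): simulated s.e. {sim:.4f}, predicted {pred:.4f}")
```

Widening the interval raises the population variance. That raises the s.e. at a fixed N, which is why the density curves for v = 1 to 4 get progressively flatter.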