Chapter 8: Introduction to Hypothesis
Testing
Hypothesis Testing
• A hypothesis test is a statistical method that
uses sample data to evaluate a hypothesis
about a population.
• The general goal of a hypothesis test is to rule
out chance (sampling error) as a plausible
explanation for the results from a research study.
• If M is some distance away from your expected μ,
you need some tools to tell you whether your
“guess” is true (H0) or false (H1).
Hypothesis Test - Steps
1. State hypothesis (H0, H1) about the population.
2. Use hypothesis to predict the characteristics
the sample should have. (formalize the decision
process: choose α)
3. Obtain a sample from the population. (calculate
M, s, and z)
4. Compare data with the hypothesis prediction.
(make a decision: reject or fail to reject H0)
Hypothesis Testing (cont'd.)
• If the individuals in the sample are noticeably
different from the individuals in the original
population, we have evidence that the treatment
has an effect.
• However, it is also possible that the difference
between the sample and the population is simply
sampling error.
Example 8.1 (p. 235)
• neuropsychological tests: blueberry (high in
antioxidants) vs. aging (↓ cognitive function)
• age 65 and up: daily dose of a blueberry
supplement for 6 months (n=25, μ=80, σ=20)
• after 6 months, give another test → M,
z = (M-μ)/σM
• noticeably different → effective
• if not → not effective
Hypothesis Testing (cont'd.)
• The purpose of the hypothesis test is to decide
between two explanations:
1. The difference between the sample and the
population can be explained by sampling error
(there does not appear to be a treatment effect)
2. The difference between the sample and the
population is too large to be explained by
sampling error (there does appear to be a
treatment effect).
The Hypothesis Test: Step 1
• State the hypothesis about the unknown
population.
– The null hypothesis, H0, states that there is no
change in the general population before and after
an intervention. In the context of an experiment,
H0 predicts that the independent variable had no
effect on the dependent variable.
– The alternative hypothesis, H1, states that there
is a change in the general population following an
intervention. In the context of an experiment,
H1 predicts that the independent variable did have
an effect on the dependent variable.
Mutually exclusive & collectively exhaustive
Step 1: State the Null and the Alternate
Hypothesis
NULL HYPOTHESIS A statement about the value of a population
parameter developed for the purpose of testing numerical evidence. It is
represented by H0.
ALTERNATE HYPOTHESIS A statement that is accepted if the sample
data provide sufficient evidence that the null hypothesis is false. It is
represented by H1.
Important Things to Remember about H0 and H1
• H0 is the null hypothesis; H1 is the alternate hypothesis.
• H0 and H1 are mutually exclusive and collectively
exhaustive.
• H0 is always presumed to be true.
• H1 has the burden of proof.
• A random sample (n) is used to “reject H0.”
• If we conclude “do not reject H0,” this does not
necessarily mean that the null hypothesis is true; it only
suggests that there is not sufficient evidence to reject H0.
Rejecting the null hypothesis suggests that the
alternative hypothesis may be true, subject to the probability
of a Type I error.
• Equality is always part of H0 (e.g. “=”, “≥”, “≤”).
• Inequality is always part of H1 (e.g. “≠”, “<”, “>”).
p.236(example 8.1)
• H0: μ = 80
• H1: μ ≠ 80 (note: mutually exclusive and
collectively exhaustive)
• cannot both be true & one of them must be true
• a two-tail test
The Hypothesis Test: Step 2
• The α level establishes a criterion, or "cut-off",
for making a decision about the null hypothesis.
The alpha level also determines the risk of a
Type I error.
α = .01, α = .05 (most used), α = .001
• The critical region consists of outcomes that
are very unlikely to occur if the null hypothesis is
true. That is, the critical region is defined by
sample means that are almost impossible to
obtain if the treatment has no effect.
• Once α is chosen, the critical region for the
hypothesis test is fixed.
p. 239
1. class size ↑ → negative effect or not?
H0: ?
2. α↑ → boundaries↑ (true/false?)
3. α = 0.02 → z* = ? (two-tail test)
α = 1% → z* = 2.575
α = 2% → z* = 2.33
α = 5% → z* = 1.96
α = 10% → z* = 1.645
α = 0.1% → z* = 3.3
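The critical values listed above can be checked numerically. The following is a minimal sketch, assuming a standard normal test statistic and a two-tailed test; the function name `two_tailed_critical_z` is made up for this illustration and is not from the textbook.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha: float) -> float:
    """Return z* such that P(|Z| > z*) = alpha for a standard normal Z."""
    # Each tail holds alpha / 2, so z* is the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.02, 0.01, 0.001):
    print(f"alpha = {alpha:<5}  z* = {two_tailed_critical_z(alpha):.3f}")
```

The printed values should agree with the slide's table up to rounding.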
The Hypothesis Test: Step 3
• Compare the sample means (data) with the null
hypothesis.
• Compute the test statistic. The test statistic (z-score)
forms a ratio comparing the obtained
difference between the sample mean and the
hypothesized population mean versus the
amount of difference we would expect without
any treatment effect (the standard error), i.e.
z = (M-μ)/σM.
Step 4: Formulate a Decision Rule:
One-Tail vs. Two-Tail Tests
CRITICAL VALUE Based on the selected level of significance, the
critical value is the dividing point between the region where the null
hypothesis is rejected and the region where it is not rejected.
If the test statistic falls beyond the critical value (in the region
of rejection), then reject the null hypothesis.
One-Tailed Test versus Two-Tailed
Test
The Hypothesis Test: Step 4
• If the test statistic falls in the critical
region, we conclude that the difference is
significant or that the treatment has a
significant effect. In this case we reject the null
hypothesis. → reject H0
• If the test statistic is not in the critical region,
we conclude that the evidence from the sample
is not sufficient, and the decision is to fail to reject
the null hypothesis. → cannot reject H0
p.241 (example 8.1)
• n=25, μ=80, σ=20, M=84 → σM = 20/5 = 4
• H0: μ = 80
• H1: μ ≠ 80
• α = .05
• z = (84-80)/4 = 1 → not in the critical region →
fail to reject H0
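The four steps applied to Example 8.1 can be sketched in a few lines of code. This is only an illustration under the slide's assumptions (known σ, normal sampling distribution, two-tailed test); the helper `z_test_two_tailed` is a hypothetical name, not part of any library.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, critical z, reject?) for a two-tailed one-sample z test."""
    standard_error = sigma / sqrt(n)             # sigma_M
    z = (sample_mean - mu0) / standard_error     # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2) # boundary of the critical region
    return z, z_crit, abs(z) > z_crit

# Example 8.1: n = 25, mu = 80, sigma = 20, M = 84, alpha = .05
z, z_crit, reject = z_test_two_tailed(84, 80, 20, 25)
print(f"z = {z:.2f}, critical z = ±{z_crit:.2f}, reject H0: {reject}")
```

With these numbers z = 1, which lies inside ±1.96, so the decision is to fail to reject H0.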
Analogy for Hypothesis Testing
1. begin with a null hypothesis
• H0: no treatment effect
• H0: innocent
• H0: original μ (before treatment)
2. gather evidence, data, ...
3. choose acceptable “error” (type I)
4. decision:
• enough evidence → reject H0
• not enough evidence → fail to reject H0
z score as.....
• a recipe
1. H0: guess what’s in the recipe
2. cook and taste it
3. taste good: H0: maybe true
taste bad: H0: maybe false
• a ratio
z = sample error / standard error
= actual difference / standard difference
Errors in Hypothesis Tests
• Just because the sample mean (following
treatment) is different from the original
population mean does not necessarily indicate
that the treatment has caused a change.
• You should recall that there usually is some
discrepancy between a sample mean and the
population mean simply as a result of sampling
error.
Errors in Hypothesis Tests (cont'd.)
• Because the hypothesis test relies on sample
data, and because sample data are not
completely reliable, there is always the risk that
misleading data will cause the hypothesis test to
reach a wrong conclusion.
• Two types of errors are possible.
Errors in Hypothesis Testing
Type I Errors
• A Type I error occurs when the sample data
appear to show a treatment effect when, in fact,
there is none.
– In this case the researcher will reject the null
hypothesis and falsely conclude that the
treatment has an effect.
• Type I errors are caused by unusual,
unrepresentative samples, falling in the critical
region even though the treatment has no effect.
• The hypothesis test is structured so that Type I
errors are very unlikely; specifically, the probability
of a Type I error is equal to the alpha level.
Type II Errors
• A Type II error occurs when the sample does
not appear to have been affected by the
treatment when, in fact, the treatment does have
an effect.
– In this case, the researcher will fail to reject the
null hypothesis and falsely conclude that the
treatment does not have an effect.
– Type II errors are commonly the result of a very
small treatment effect. Although the treatment
does have an effect, it is not large enough to
show up in the research study.
Type I and Type II Errors Illustrated
zC = (XC − μ)/σM = ±1.96 → XC = μ ± 1.96 σM
n = 100, σ = 400, α = 0.05 → σM = 400/10 = 40. H0: μ = 10,000, middle 95%: zC = (-1.96, +1.96)
XC = 10,000 ± (1.96)σM → XC = (9921.6, 10078.4) ≈ (9922, 10078)
If the true population mean is 9,900: z = (9922 − 9900)/40 = 0.55
It is possible that a sample would have a sample mean greater than 9,922
(see Region B), so we could commit a Type II error: fail to reject a false null
hypothesis. The probability of a Type II error is P(z > 0.55) = 0.2912 when the population mean is 9,900.
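The Type II error probability in this illustration can be recomputed as a rough check. The sketch below assumes a two-tailed z test with known σ; `type_ii_error` is a hypothetical helper written for this note. It uses the unrounded boundary 9921.6, so it should give a value close to 0.29 rather than exactly the slide's 0.2912, which comes from rounding the boundary to 9,922.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """P(fail to reject H0) for a two-tailed z test when the true mean is mu_true."""
    standard_error = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower = mu0 - z_crit * standard_error   # lower boundary of the non-rejection region
    upper = mu0 + z_crit * standard_error   # upper boundary of the non-rejection region
    # Sampling distribution of M under the TRUE mean, not under H0.
    true_sampling_dist = NormalDist(mu_true, standard_error)
    return true_sampling_dist.cdf(upper) - true_sampling_dist.cdf(lower)

beta = type_ii_error(mu0=10_000, mu_true=9_900, sigma=400, n=100)
print(f"beta = {beta:.4f}")
```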
Type I and Type II Errors Illustrated
p < α → significant
• zM = (M-μ)/σM → the observed z, used as the cut-off for p
• p = Prob(|z| > |zM|) for 2-tailed test
• p = Prob(z > zM) for 1-tailed test (right-hand tail)
• p = Prob(z < zM) for 1-tailed test (left-hand tail)
• if p < α → should reject H0 → statistically
significant
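The three p-value definitions can be written as one small function. This is a sketch assuming a standard normal test statistic; the name `p_value` and its `tail` argument are choices made for this example only.

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """p-value of an observed z for a two-tailed, right-tailed, or left-tailed test."""
    std_normal = NormalDist()
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))  # both tails beyond |z|
    if tail == "right":
        return 1 - std_normal.cdf(z)             # area above z
    if tail == "left":
        return std_normal.cdf(z)                 # area below z
    raise ValueError("tail must be 'two', 'right', or 'left'")

alpha = 0.05
for z, tail in ((1.0, "two"), (2.5, "right")):
    p = p_value(z, tail)
    print(f"z = {z}, {tail}-tailed: p = {p:.4f}, significant: {p < alpha}")
```

For z = 1 (Example 8.1, two-tailed) the p-value is well above .05, while a right-tailed z = 2.5 gives p ≈ 0.0062.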
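p < α → significant
[Figure: sampling distribution f(x̄) with a right-tail rejection region for α = 0.05. Axis values: x̄ = μ, 603.28, 605; z = 0, 1.64, 2.5. The observed mean gives p = 0.0062 < α.]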
The z test is influenced by
test statistic: z = (M − μ)/σM, where σM = σ/√n
1. σ↑ → σM↑ → z↓ → less likely to reject H0
2. n↑ → σM↓ → z↑ → more likely to reject H0
Basic assumption for Hypothesis Testing
• Random sampling
• Independent observations
Box 8.1
• σ unchanged by the treatment
• Normal distribution
p. 255
1. μ = 10.5, σ = 4.8, n = 16, M = 15.9, normal
a. α = 0.01, significant or not?
z = (15.9-10.5)/(4.8/4) = 4.5 > 2.575 → significant
b. write a report. Texting had a significant effect on
driving, z = 4.5, p < 0.01.
5. σ = 2 vs. σ = 10: which is more likely to reject H0?
σ↑ → z↓ → more difficult to reject H0
σ↓ → z↑ → more likely to reject H0
Directional Tests (one-tailed test)
• When a research study predicts a specific
direction for the treatment effect (increase or
decrease), it is possible to incorporate the
directional prediction into the hypothesis test.
• The result is called a directional test or a one-tailed
test. A directional test includes the
directional prediction in the statement of the
hypotheses and in the location of the critical
region.
Directional Tests (cont'd.)
• For example, if the original population has a
mean of μ = 80 and the treatment is predicted to
increase the scores, then the null hypothesis
would state that after treatment:
H0: μ ≤ 80 (there is no increase)
• In this case, the entire critical region would be
located in the right-hand tail of the distribution
because large values for M would demonstrate
that there is an increase and would tend to reject
the null hypothesis. H1: μ > 80
example 8.4 & p. 257-258
• μ = 80, σ = 20, n = 25 → σM = 20/5 = 4
• if α = 0.01 → critical value: z* = 2.33
• if α = 0.025 → critical value: z* = 1.96
• if α = 0.05 → critical value: z* = 1.645
• Now α=0.05, M=87, H1: μ > 80, z* = 1.645
→ test statistic: z=(87-80)/4=1.75 → reject H0
• if H1: μ ≠ 80, α=0.05, M=87, z* = 1.96
→ test statistic: z=1.75 → fail to reject H0
One-Tailed Test versus Two-Tailed
Test for p. 258
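The contrast between the one-tailed and two-tailed decisions in Example 8.4 can be reproduced directly. The sketch below assumes the slide's values (μ = 80, σ = 20, n = 25, M = 87, α = .05); the variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, sample_mean, alpha = 80, 20, 25, 87, 0.05

z = (sample_mean - mu0) / (sigma / sqrt(n))  # (87 - 80) / 4

std_normal = NormalDist()
z_crit_one = std_normal.inv_cdf(1 - alpha)      # all of alpha in the right tail
z_crit_two = std_normal.inv_cdf(1 - alpha / 2)  # alpha split across both tails

reject_one_tailed = z > z_crit_one
reject_two_tailed = abs(z) > z_crit_two

print(f"z = {z:.2f}")
print(f"one-tailed: z* = {z_crit_one:.3f}, reject H0: {reject_one_tailed}")
print(f"two-tailed: z* = {z_crit_two:.3f}, reject H0: {reject_two_tailed}")
```

The same z = 1.75 exceeds the one-tailed boundary but not the two-tailed one, which is why the two tests reach different decisions.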
two-tailed vs. one-tailed
• 2-tailed test:
- more rigorous, more convincing when H0 is rejected
- need more evidence (i.e. ∆=(M-μ) ) to reject H0,
∆=(M-μ) : treatment effect
• 1-tailed test:
- more sensitive (small ∆ can be significant)
- more precise (test a specific directional effect)
Box 8.2 (p. 260)
• The Type I error rate (α) applies only if H0 is true.
• If H0 is false, then α tells you nothing about the
population distribution and your hypothesis.
• Suppose H0 is true in 80% of tests and false in 20%.
→ for 125 tests, H0 is true in 100 and false in 25
• If α = 0.05 → H0 is wrongly rejected in 5 of the 100 tests where it is true
• Suppose that when H0 is false, it is correctly rejected 60% of the time →
H0 is correctly rejected in 15 of the 25 tests
→ H0 is rejected in 20 of the 125 tests (20 significant results)
• Proportion of significant results that are Type I errors (H0 true but rejected) =
5/20 = 0.25
• So, ¼ of significant research results are Type I errors!
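The arithmetic in Box 8.2 can be expressed as a short calculation. This is a sketch under the box's stated suppositions (80% of null hypotheses true, α = .05, 60% of false nulls rejected); the function name `false_positive_share` is invented for this example.

```python
def false_positive_share(n_tests, share_h0_true, alpha, power):
    """Share of significant results that are Type I errors."""
    n_true = n_tests * share_h0_true      # tests where H0 is actually true
    n_false = n_tests - n_true            # tests where H0 is actually false
    false_rejections = alpha * n_true     # Type I errors
    true_rejections = power * n_false     # correct rejections
    return false_rejections / (false_rejections + true_rejections)

share = false_positive_share(n_tests=125, share_h0_true=0.8, alpha=0.05, power=0.6)
print(f"share of significant results that are Type I errors: {share:.2f}")
```

Changing `share_h0_true` or `power` shows how strongly this share depends on quantities that α alone does not control.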
Limitations of Hypothesis Testing
1. the test depends on the data rather than on the hypothesis
reject H0 ≈ if H0 were true, M would be very unlikely to be so far away from μ
≈ H0 is very likely to be false
≠ H0 is truly false
2. significant ≠ big effect (the treatment effect may be small)
(M-μ)↑ → z↑ → more likely to be significant
n↑ → σM↓ → z↑ → more likely to be significant
σ↓ → z↑ → more likely to be significant
example 8.5 (p.261)
• μ = 50, σ = 10, M = 51, n = 25
• treatment effect = 51-50 = 1 (quite small)
• 2-tailed test:
n = 25, z = (51-50)/(10/5) = 0.5 < 1.96
→ fail to reject H0
but
if n = 400 → z = (51-50)/(10/20) = 2 > 1.96 → reject H0
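The effect of sample size on z in Example 8.5 can be seen by holding the 1-point treatment effect fixed and varying n. This is a minimal sketch; `z_statistic` is a hypothetical helper, and n = 100 is an extra value added here for illustration only.

```python
from math import sqrt

def z_statistic(sample_mean, mu0, sigma, n):
    """One-sample z statistic with known population standard deviation."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

# Same 1-point treatment effect (M = 51, mu = 50, sigma = 10), different n.
for n in (25, 100, 400):
    z = z_statistic(51, 50, 10, n)
    print(f"n = {n:>3}  z = {z:.2f}  reject H0 (two-tailed, alpha = .05): {abs(z) > 1.96}")
```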
Measuring Effect Size
• A hypothesis test evaluates the statistical
significance of the results from a research study.
• That is, the test determines whether or not it is
likely that the obtained sample mean occurred
without any contribution from a treatment effect.
• The hypothesis test is influenced not only by the
size of the treatment effect (M-μ) but also by the
size of the sample (σM ).
• Thus, even a very small effect can be significant
if it is observed in a very large sample.
• n ↑ → σM↓ → z ↑ → more likely to reject H0
Measuring Effect Size
• Because a significant effect does not necessarily
mean a large effect, it is recommended that the
hypothesis test be accompanied by a measure
of the effect size.
• We use Cohen’s d as a standardized measure of
effect size.
• Much like a z-score, Cohen’s d measures the
size of the mean difference in terms of the
standard deviation.
Measuring Effect Size
• Effect size = absolute size of treatment effect
• Effect size should be independent of n
• Simplest, most direct effect size measure = d
• Cohen’s d:
d = (μAfter − μBefore)/σ, estimated by d = (MAfter − μBefore)/σ
example 8.5 (p.261)
• μ = 50, σ = 10, M = 51, n = 25
• treatment effect = 51-50 = 1
• 2-tailed test:
z = (51-50)/(10/5) = 0.5 < 1.96 → fail to reject H0
if n = 400 → z = (51-50)/(10/20) = 2 → reject H0
• effect size: Cohen’s d = (M - μ) / σ
M: estimated population mean with/after treatment
μ: population mean without/before treatment
Cohen’s d = (51-50)/10 = 0.1 (for both n)
p.262-263
Case 1 (Fig. 8.11 (a))
• no treatment: μ =500, σ = 100,
• after treatment: μ =515, σ = 100,
d = 15/100 = 0.15 (the size of treatment effect is 0.15
standard deviation)
Case 2 (Fig. 8.11 (b))
• no treatment: μ =100, σ = 15,
• after treatment: μ =115, σ = 15,
d = 15/15 = 1 (the size of treatment effect is 1 standard
deviation)
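Both cases reduce to the same one-line calculation. The sketch below assumes the definition of Cohen's d used in these slides (mean difference divided by the population standard deviation); `cohens_d` is a name chosen for this example.

```python
def cohens_d(mean_after, mean_before, sigma):
    """Cohen's d: mean difference expressed in standard deviation units."""
    return (mean_after - mean_before) / sigma

# Case 1 and Case 2: the same 15-point difference, different sigma.
print(cohens_d(515, 500, 100))  # difference is small relative to sigma = 100
print(cohens_d(115, 100, 15))   # difference equals one full sigma = 15
```

Note that n does not appear anywhere in the function, which is what makes d independent of sample size.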
effect size: Cohen’s d
• mean difference ↑ → Cohen’s d ↑
• σ ↓ → Cohen’s d ↑
d = (μafter − μbefore)/σ
d = (Mafter − μbefore)/σ
effect size: Cohen’s d
• d = 0.2 small effect
• d = 0.5 medium effect
• d = 0.8 large effect
p. 265
1. n↑ → σM↓ → z↑ → more likely to reject H0
• n↑ → Cohen’s d ?
2. μ = 45, σ = 8, M = 47
• d = (47-45)/8 = 0.25
d = (Mafter − μbefore)/σ
d = (μafter − μbefore)/σ
Power of a Hypothesis Test
• The power of a hypothesis test is defined as the
probability that the test will reject the null
hypothesis when the treatment does have an
effect: P(reject H0 | H0 is false) = 1-β
• The power of a test depends on a variety of
factors, including the size of the treatment effect
and the size of the sample.
• β = P(failed to reject H0 | H0 is false)
Example 8.6 (p. 266-267)
• normal: μ = 80, σ = 10, H1: μ ≠ 80 (2-tailed test)
Case 1: n = 25, α = 0.05 → zc = ±1.96
zc = (Xc − μ)/(σ/√n) = ±1.96 → Xc = μ ± 1.96 σ/√n
→ Xc = 80 ± 1.96*(10/5)
→ Xc = (76.08, 83.92)
if true μ = 88 → recalculate zc
upper zc = (83.92 - 88)/2 = -2.04
lower zc = (76.08 - 88)/2 = -5.96
1-β = P(z < -5.96)+P(z > -2.04) ≈ P(z > -2.04)
= 0.4793+0.5 = 0.9793
Example 8.6 (p. 268-269)
Case 2: n = 4, α = 0.05, μ = 80, σ = 10
zc = (Xc − μ)/(σ/√n) = ±1.96 → Xc = μ ± 1.96 σ/√n
→ Xc = 80 ± 1.96*(10/2) → Xc = (70.2, 89.8)
if true μ = 88:
→ upper zc = (89.8 - 88)/5 = 0.36,
lower zc = (70.2 - 88)/5 = -3.56
1-β = P(z<-3.56)+P(z>0.36) ≈ P(z > 0.36)
= 0.5 – 0.1406 = 0.3594
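The power calculations for both sample sizes in Example 8.6 follow the same recipe, which can be sketched as below. The code assumes a two-tailed z test with known σ and a true mean of 88; `power_two_tailed` is a hypothetical helper. It keeps the small lower-tail term that the slides drop, so its results may differ from the slide values in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed(mu0, mu_true, sigma, n, alpha=0.05):
    """P(reject H0) for a two-tailed z test when the true mean is mu_true."""
    standard_error = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower = mu0 - z_crit * standard_error  # reject if M falls below this
    upper = mu0 + z_crit * standard_error  # reject if M falls above this
    # Sampling distribution of M under the true (treated) population mean.
    true_sampling_dist = NormalDist(mu_true, standard_error)
    return true_sampling_dist.cdf(lower) + (1 - true_sampling_dist.cdf(upper))

power_n25 = power_two_tailed(mu0=80, mu_true=88, sigma=10, n=25)
power_n4 = power_two_tailed(mu0=80, mu_true=88, sigma=10, n=4)
print(f"n = 25: power = {power_n25:.4f}")
print(f"n =  4: power = {power_n4:.4f}")
```

Comparing the two results shows the drop in power when n falls from 25 to 4 with everything else held fixed.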
p. 270
1. power of test = 1-β = 0.5, M-μ = 5
for M-μ = 10, 1-β ↑↓? (see Fig 8.13 and Fig 8.12)
2. 1-β ↑ → type II error ↑↓?
3. n ↑ → 1-β ↑↓?
Other things being equal, the greater the sample
size, the greater the power of the test.
4. Fig 8.13, find 1-β = ?