Hypothesis testing
In research we want answers to the questions we pose
(hypotheses).
• Are all coffee flavors equally popular?
• Is the use of bike helmets effective in protecting people in
bicycle accidents from head injuries?
• Is there a connection between gender and alcohol
consumption among the students at Umeå university?
HYPOTHETICO-DEDUCTIVE METHOD
Hypothesis → (1) Deduction: logically valid argument (predictive inference) → Statement → (2) → Observation → (3) Induction (inductive inference) → Hypothesis
1 Tries to predict what will happen if the hypothesis holds.
2 "Dialogue with reality"
Logically valid argument (example)
Valid
Hypothesis: It is raining.
Statement: If it is raining, the ground will be wet.
Observation: The ground is not wet.
Conclusion: It is not raining.
Invalid
Hypothesis: It is raining.
Statement: If it is raining, the ground will be wet.
Observation: The ground is wet.
Conclusion: It is raining.
Invalid conclusion. The ground can be wet for several reasons.
Contradiction proofs
Within statistical hypothesis testing (inference theory) we are
not looking for "impossible" events in order to reject posed
hypotheses.
(E.g. it is impossible that the ground is dry if it rains. If the
ground is dry, the hypothesis "it rains" is rejected.)
Instead we are looking for contradictions in terms of
"improbable events".
Improbable event
Assume that we suspect that using a bicycle helmet is an effective way
to protect people in bicycle accidents from head injuries.
Null hypothesis: The percentage of persons with head injuries after a bicycle
accident is the same whether or not they use bicycle helmets.
Statement: If the percentage of persons with head injuries after a bicycle
accident is the same whether or not they use bicycle helmets, then in a sample
survey there should be only a small difference in the percentage of people
with head injuries between the two groups.
If the hypothesis holds, observing a large percentage difference between
these groups in a sample survey is an improbable event.
Improbable event
Assume that we suspect that there is a difference between male and female
students at Umeå university in their opinion about EMU.
Null hypothesis: The percentage of students who are against EMU is the same
for male and female students.
Statement: If the percentage of students who are against EMU is the same
for male and female students, then in a sample survey there should
be only a small difference in the percentage of students against EMU between
the two groups.
If the hypothesis holds, observing a large percentage difference between
these groups in a sample survey is an improbable event.
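How improbable a large difference is under the null hypothesis can be illustrated by simulation. The sketch below is only an illustration: the group size (200), the common percentage (30%) and the number of simulated surveys are hypothetical values, not taken from any of the surveys mentioned here. It draws both groups from the same population many times and counts how often a small or a large percentage difference turns up.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical numbers, chosen only for illustration (not from a real survey)
GROUP_SIZE = 200     # people sampled in each group
COMMON_RATE = 0.30   # same true percentage in both groups (null hypothesis holds)
N_SURVEYS = 2000     # number of simulated sample surveys


def sample_count(n, rate):
    """Number of n simulated people who have the property of interest."""
    return sum(random.random() < rate for _ in range(n))


# Absolute difference in percentage between the two groups, one per survey
differences = [
    abs(sample_count(GROUP_SIZE, COMMON_RATE)
        - sample_count(GROUP_SIZE, COMMON_RATE)) / GROUP_SIZE
    for _ in range(N_SURVEYS)
]

share_small = sum(d < 0.05 for d in differences) / N_SURVEYS
share_large = sum(d >= 0.15 for d in differences) / N_SURVEYS

print(f"Surveys with a difference below 5 percentage points: {share_small:.3f}")
print(f"Surveys with a difference of 15 points or more:      {share_large:.3f}")
```

With these assumed numbers most simulated surveys show a small difference, and a difference of 15 percentage points or more is rare. Observing such a difference in a real survey would therefore speak against the null hypothesis.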
Test statistic
Within statistical inference theory the statements are
summarized in a test statistic.
The value of the test statistic is calculated from a sample.
The value of the test statistic varies between different
samples.
From our hypothesis and from probability theory we
can predict which values of the test statistic are probable
if the null hypothesis is true.
Next, we draw a sample and calculate the value of the test
statistic.
If we get an improbable value, the null hypothesis is
rejected.
Note: Different types of test statistics are used for different
types of tests. (The computer program SPSS keeps track of
that?)
P-VALUE
• Assuming that the null hypothesis is true, the p-value is the
probability of obtaining a sample result that is at least as
extreme as what is observed.
• If the p-value is small, either we have observed something
improbable or the null hypothesis does not hold.
• If the p-value is small (< 0.05 or < 0.01) the null hypothesis
should be rejected.
Mosquito cream example:
We have tested anti-mosquito creams on 10 persons.
Each person got cream A on a randomly chosen
arm and cream B on the other arm. The persons were
then forced to walk in the Amazon jungle. The number of
mosquito bites was counted on each arm.
Suppose 7 out of the 10 persons had fewer mosquito
bites on the arm with cream A. Is this enough evidence
to say that there is a difference in effectiveness
between the creams?
Help me with the null hypothesis.
Example:
• Null hypothesis: The anti-mosquito creams A and B are
equally effective.
• Alternative hypothesis: The anti-mosquito creams are
not equally effective.
• Statement: If the null hypothesis holds, then we expect
that about half of the people in our sample get more
mosquito bites with cream A.
• Mathematical calculations show that if the null hypothesis is
true, then the number of people in our sample who get
more mosquito bites on the arm with cream A is binomially
distributed.
If the null hypothesis is true,
is 7 out of 10 an improbable event?
The probability of getting 7 or more,
or 3 or fewer, is about 34%.
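The 34% figure can be checked with an exact binomial calculation. The sketch below assumes that no person has the same number of bites on both arms (no ties), so that under the null hypothesis each person is equally likely to have fewer bites on either arm. It sums the probabilities of all outcomes at least as far from 5 as the observed 7, that is 7 or more, or 3 or fewer.

```python
from math import comb

n = 10         # persons in the test
observed = 7   # persons with fewer bites on the arm with cream A


def binomial_probability(k, n, p=0.5):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


# Two-sided p-value: outcomes at least as far from n/2 as the observed count
p_value = sum(
    binomial_probability(k, n)
    for k in range(n + 1)
    if abs(2 * k - n) >= abs(2 * observed - n)
)

print(f"p-value = {p_value:.4f}")  # 352/1024, about 0.34
```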
Conclusion
• The P-value is 34%. This means that it is not
uncommon to get the data we got in our
sample or anything more extreme if the null
hypothesis is true.
• We cannot reject the null hypothesis.
Reasons for non-significant results
• There is no difference
• There is a difference, but we have too few
observations to detect it
• Important. The fact that we can’t reject the
null hypothesis does not mean that the null
hypothesis is true.
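The second reason, too few observations, can be made concrete with a small calculation. The sketch below is an illustration under a hypothetical assumption: that in truth 70% of people get fewer bites with cream A. It computes the probability that a two-sided binomial test at the 5% level rejects the null hypothesis, for 10 persons and for 100 persons.

```python
from math import comb


def binomial_probability(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def two_sided_p_value(k, n):
    """P-value under the null hypothesis p = 0.5 for an observed count k."""
    return sum(
        binomial_probability(j, n, 0.5)
        for j in range(n + 1)
        if abs(2 * j - n) >= abs(2 * k - n)
    )


def rejection_probability(n, true_share, alpha=0.05):
    """Probability that the test rejects when the true share is true_share."""
    return sum(
        binomial_probability(k, n, true_share)
        for k in range(n + 1)
        if two_sided_p_value(k, n) < alpha
    )


# Hypothetical assumption: in truth 70% get fewer bites with cream A
TRUE_SHARE = 0.7
chance_10 = rejection_probability(10, TRUE_SHARE)
chance_100 = rejection_probability(100, TRUE_SHARE)

print(f"Chance of rejecting with  10 persons: {chance_10:.2f}")
print(f"Chance of rejecting with 100 persons: {chance_100:.2f}")
```

Under this assumption a real difference exists, yet a test on 10 persons would usually fail to detect it, while a test on 100 persons would usually detect it.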
Steps of hypothesis testing
• Develop a null hypothesis
• Develop an alternative hypothesis (what we want
to know)
• Specify the level of significance α: 0.05, 0.01,
0.001 (How certain do we want to be?)
• Select the test statistic that will be used to test the
hypothesis.
• Perform the test and calculate the p-value.
• Draw a conclusion by comparing the level of
significance (α) and the p-value.
- Reject the null hypothesis (p-value < α )
- Do not reject the null hypothesis (p-value ≥ α )
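The last step is a simple comparison, sketched here in Python. The function name `decide` is made up for this illustration. The value 0.34 is the p-value from the mosquito cream example, and 0.03 is a hypothetical p-value used to show that the conclusion depends on the chosen α.

```python
def decide(p_value, alpha=0.05):
    """Return the conclusion of a test at significance level alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


# Mosquito cream example: p-value about 0.34, alpha = 0.05
print(decide(0.34))

# Hypothetical p-value of 0.03: the conclusion depends on alpha
print(decide(0.03, alpha=0.05))
print(decide(0.03, alpha=0.01))
```

The level of significance should be chosen before the test is performed, not after the p-value has been seen.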
Choose the right test
• Hand out the summary picture.
• Different tests use different assumptions.
• Generally, the fewer assumptions a test uses, the
less power the test has. The power is the
ability to reject a false null hypothesis.
• Many tests require that the sample or some
transformation of the sample is normally
distributed.
Parametric/non-parametric test
• Parametric tests:
– if data are normally distributed
– describe your data with mean and SD
• Non-parametric tests:
– primarily if data are not normally distributed
– can also be used if data are normally distributed, but less
powerful
– less sensitive to outliers
– describe your data with median and percentiles
Normal probability distribution
• How do I know if my variable is normally
distributed?
– continuous variable, no cut-off point
– draw histogram, normal probability plot
– symmetric, bell-shaped, mean=median
– Unsure? Use non-parametric tests if available
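Two of the checks above, the histogram and the comparison of mean and median, can be sketched in Python. The data are hypothetical and generated only for illustration, and the helper names are made up for this sketch. A mean close to the median is a sign of symmetry only, so it is a quick screen and not a proof of normality.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical data, generated only for illustration
bell_shaped = [random.gauss(50, 10) for _ in range(1000)]
skewed = [random.expovariate(1 / 10) for _ in range(1000)]


def mean_median_gap(values):
    """Mean minus median, expressed in standard deviations."""
    gap = statistics.mean(values) - statistics.median(values)
    return gap / statistics.stdev(values)


def text_histogram(values, bins=10, width=40):
    """Print a rough histogram with one row of '#' characters per bin."""
    low, high = min(values), max(values)
    step = (high - low) / bins
    counts = [0] * bins
    for v in values:
        index = min(int((v - low) / step), bins - 1)  # top value goes in last bin
        counts[index] += 1
    for i, count in enumerate(counts):
        bar = "#" * round(width * count / max(counts))
        print(f"{low + i * step:8.1f} | {bar}")


for name, data in [("bell-shaped", bell_shaped), ("skewed", skewed)]:
    print(f"{name}: mean-median gap = {mean_median_gap(data):.2f} SD")
    text_histogram(data)
```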
Comparing means example
A. Comparing means from 2 samples
(using T-test)
B. Comparing means from several samples
(using ANOVA).
C. Comparing means from several samples
(using Blocked ANOVA)
A: Does gender affect the mean score on
a statistics exam?
A: SPSS gives (T-test)
What does the SPSS output imply?
B: Do students with different grades
spend different amounts of time on
their studies?
B: SPSS gives (One-way ANOVA)
• What is the simple idea behind the analysis?
• What does the SPSS output imply?
• Where is the difference?
Tukey intervals
(Where do the means differ?)
C: Does math background, gender, or
both influence the time put into the
course?
C: SPSS gives (Two-way ANOVA)
If time: Why the tests work 1
• “The law of large numbers (LLN). Given a
sample of independent and identically
distributed random variables with a finite
population mean, the average of these
observations will eventually approach and stay
close to the population mean.”
• This result tells us that the larger the sample,
the better precision of the estimates.
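The law of large numbers can be watched at work in a small simulation. The sketch below uses a hypothetical example, fair coin flips coded as 0 and 1, so the population mean is 0.5. It prints the sample mean after more and more flips.

```python
import random

random.seed(3)  # fixed seed so the run is repeatable

# Hypothetical example: fair coin flips coded as 0/1, population mean 0.5
POPULATION_MEAN = 0.5
checkpoints = [10, 100, 1_000, 10_000, 100_000]

total = 0
flips = 0
sample_means = {}
for n in checkpoints:
    # Keep flipping until n flips have been made in total
    while flips < n:
        total += random.randint(0, 1)
        flips += 1
    sample_means[n] = total / n
    error = abs(sample_means[n] - POPULATION_MEAN)
    print(f"n = {n:>7}: sample mean = {sample_means[n]:.4f}, error = {error:.4f}")
```

The error does not shrink at every single step, but for large samples it stays small, which is what the theorem describes.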
If time: Why the tests work 2
• “The central limit theorem (CLT) states that if
the sum of independent identically distributed
random variables has a finite variance, then it
will be approximately normally distributed
(i.e., following a Gaussian distribution, or bell-shaped curve).”
• This result (and similar ones) is important because
it lets us approximate the distribution of test
statistics, which is necessary to test hypotheses.
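A simulation gives a feeling for the central limit theorem. The sketch below makes hypothetical choices: each sum adds 30 random numbers that are uniform on the interval from 0 to 1, a distribution that is flat and not bell-shaped. For a normal distribution about 68% of the values lie within one standard deviation of the mean and about 95% within two, so the simulated sums are compared with those shares.

```python
import random
import statistics

random.seed(4)  # fixed seed so the run is repeatable

# Hypothetical choices for the illustration
N_TERMS = 30      # random variables added together in each sum
N_SUMS = 10_000   # number of simulated sums

# Each term is uniform on [0, 1]: flat, not bell-shaped
sums = [sum(random.random() for _ in range(N_TERMS)) for _ in range(N_SUMS)]

mean = statistics.mean(sums)
sd = statistics.stdev(sums)

within_1_sd = sum(abs(s - mean) <= sd for s in sums) / N_SUMS
within_2_sd = sum(abs(s - mean) <= 2 * sd for s in sums) / N_SUMS

print(f"Share within 1 SD of the mean: {within_1_sd:.3f} (normal: about 0.68)")
print(f"Share within 2 SD of the mean: {within_2_sd:.3f} (normal: about 0.95)")
```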
If even more time: Example
100 people took part in a survey about different
brands of coffee. Each person tasted four
different brands (in a blind test) and noted
which one they preferred. The result of the
test was as follows:
Brand:             Ellips   Gexus   Luber   Eco
Number of people:  26       28      16      30
Does the result of the survey show that any of
the brands are more popular than the others,
or are they all equal?
In statistical terms we can formulate the
problem as:
Null hypothesis: All the coffee brands are
equally popular.
Alternative hypothesis: Not all the coffee brands
are equally popular.
If the null hypothesis is true, we could expect
the following result of the survey:
Brand:             Ellips   Gexus   Luber   Eco
Number of people:  25       25      25      25
Can we, at a significance level of 5%, say
anything about whether or not the null
hypothesis is true?
One way of measuring how much the observed
table differs from the expected table is to look
at the differences:
$$(26-25)^2, \quad (28-25)^2, \quad (16-25)^2, \quad (30-25)^2$$
However, there is a problem with the fact that the
difference between 10 and 20 is relatively larger than
the difference between 10000 and 10010. How can
we take that into account?
Divide by the expected value and formulate a test
statistic:
$$\chi^2_{obs} = \frac{(26-25)^2}{25} + \frac{(28-25)^2}{25} + \frac{(16-25)^2}{25} + \frac{(30-25)^2}{25} = 4.64$$
If the null hypothesis is true, $\chi^2_{obs}$ ought to be
close to zero. Is 4.64 so far away from zero
that we can reject the null hypothesis?
We compare the obtained p-value with our
chosen level of significance.
Observed p-value: 0.20
Conclusion?
Distribution under the null hypothesis.
(Getting 4.64 or more is not unusual.
We cannot reject the null hypothesis.)
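The numbers in the coffee example can be reproduced with a short calculation. The sketch below assumes the usual chi-square goodness-of-fit test with 3 degrees of freedom (four brands minus one). The helper `chi2_upper_tail_3df` is written for this illustration and uses a closed-form expression that is valid only for 3 degrees of freedom. A statistics package would normally be used instead.

```python
from math import erfc, exp, pi, sqrt

# Observed preferences from the survey
observed = {"Ellips": 26, "Gexus": 28, "Luber": 16, "Eco": 30}

# Under the null hypothesis every brand is expected to get an equal share
n_people = sum(observed.values())
expected = n_people / len(observed)  # 25 per brand

# Test statistic: squared differences divided by the expected value
chi2_obs = sum((count - expected) ** 2 / expected for count in observed.values())


def chi2_upper_tail_3df(x):
    """P(X >= x) for a chi-square variable with exactly 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p_value = chi2_upper_tail_3df(chi2_obs)

print(f"Test statistic: {chi2_obs:.2f}")
print(f"p-value:        {p_value:.2f}")
print("Reject at the 5% level" if p_value < 0.05 else "Do not reject at the 5% level")
```

The p-value of about 0.20 is larger than 0.05, so the conclusion is the same as above: the null hypothesis cannot be rejected.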