INFOWO Statistics lecture S3:
Hypothesis testing
Peter de Waal
Department of Information and Computing Sciences
Faculty of Science, Universiteit Utrecht

Overall overview
1. Descriptive statistics
2. Scores and probability distributions
3. Hypothesis testing and one-sample t-test
4. More on t-tests
5. Homogeneity and reliability
6. Correlation and prediction
7. Analysis of variance
8. χ²-test
9. Q&A lecture
Today
Does the Facebook diet really work?
Or is the Breezer diet better?

Today's overview
- Recap
- Hypothesis testing (Chapter 8)
  - Test procedure
  - Hypotheses H0 and H1
- t-distribution (Chapter 9)
- One-sample t-test
Recap
Recall from Lecture M1:
Normal distribution:
- Shape
- Parameters µ and σ
- Calculations
Sample and sampling distribution:
- Sample distribution = (theoretical) distribution of data when one person/item is measured
- Sampling distribution = distribution of data when a sample of n items is measured and the average is taken
Central Limit Theorem:
- The sampling distribution is approximately normal
Confidence intervals

Hypothesis testing
Hypothesis: an empirical formulation of a proposition, stated as a relationship between variables.
Examples:
- There is a relation between a student's IQ score and his grade point average.
- Students who live at home with their parents spend more time on Facebook.
- There is a positive correlation between Facebook use and narcissism.
Hypothesis testing
THE BRAND NEW FACEBOOK DIET (AS SEEN ON THE WEB)!
Spend time on Facebook and lose weight without excessive exercise!

Basic experimental situation
- Randomly select a test group
- Put the test group on a Facebook regime (8 hours per day) for 4 weeks
- Weigh the test persons
- Compare with the known mean from the reference population
Results?
- People may gain weight due to inactivity
- People may lose weight due to lack of time to eat
- Weight may not change at all...

Some data
- Mean weight of the reference population (before the diet) is µ = 80, σ = 20
- After the test trial: mean weight in the test group is $\bar{X} = 76$
- Does this mean the diet works? Or is it random fluctuation?
- What if $\bar{X} = 86$?
- What if $\bar{X} = 70$?
Hypothesis testing
A statistical method that uses sample data to evaluate a hypothesis about a population.
Basic steps of hypothesis testing
1. Formulate hypotheses
2. Set criteria for decision
3. Collect data and compute sample statistic
4. Make decision

1. Formulation of the hypotheses
Pose two possible, mutually exclusive hypotheses about the world or about a population:
Null hypothesis (H0):
States that, in the general population, there is no change, no difference, or no relationship.
Alternative hypothesis (H1):
States that there is a change, a difference, or a relationship in the general population.
Diet example:
- $H_0: \mu_{\text{after}} = 80$ (Facebook diet has no effect)
- $H_1: \mu_{\text{after}} \neq 80$ (Facebook diet has an effect)
2. Set the criteria for the decision
If H0 is true, which values for sample means are likely?
Significance level or alpha level
- Defines the boundary between likely and unlikely
- Denoted by the symbol α
- Value is determined beforehand (i.e., before you take a sample!)
- Typical values are α = 0.05 or α = 0.01
Critical region
- The extreme sample values that are very unlikely
- Boundaries of the critical region are determined by α

[Figure: critical region of $Z = \frac{\bar{X} - 80}{\sigma_{\bar{X}}}$ for α = 0.05]
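The boundaries of the two-sided critical region can also be computed instead of read from Table B.1. A minimal sketch, assuming a two-sided Z-test and using Python's standard-library `statistics.NormalDist`; the helper name `two_sided_z_critical` is our own. It also shows that lowering α moves the boundaries outward, so the critical region becomes smaller.

```python
from statistics import NormalDist

def two_sided_z_critical(alpha: float) -> float:
    """Return z_crit such that P(|Z| >= z_crit) = alpha for a standard normal Z."""
    # Each tail of the critical region receives alpha / 2 of the probability.
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01):
    z_crit = two_sided_z_critical(alpha)
    print(f"alpha = {alpha}: reject H0 if Z <= {-z_crit:.3f} or Z >= {z_crit:.3f}")
```

For α = 0.05 this gives the familiar boundaries ±1.96; for α = 0.01 they move out to about ±2.576.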
Check!
True or false?
- The critical region defines unlikely values if the null hypothesis is true. (True)
- If the alpha level is decreased, the critical region becomes smaller. (True)
Critical region boundaries (figure)

3. Collect data and compute sample statistic
- Data are collected after the hypotheses are formulated.
- Data are collected after the criteria for the decision are set.
- This sequence ensures objectivity.
- Compute a sample statistic (in this case the Z-score) to show the exact position of the sample.

4. Make decision
- If the sample data are in the critical region: the null hypothesis is rejected.
- If the sample data are not in the critical region: the researcher fails to reject the null hypothesis.

Examples
Example outcome experiment A:
- Sample size n = 16
- Observed sample mean is $\bar{X} = 75$
- After calculation of the sample statistic, the Z-score for the observed sample mean is $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{75 - 80}{20/4} = -1$
- Decision: not in the critical region, so retain H0
Example outcome experiment B:
- Sample size n = 25
- Observed sample mean is $\bar{X} = 88$
- Z-score for the observed sample mean is $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{88 - 80}{20/5} = +2$
- Decision: in the critical region, so reject H0
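Steps 3 and 4 for the two experiments can be sketched in a few lines, assuming the known σ = 20, the reference mean µ = 80 and the two-sided boundary ±1.96 for α = 0.05. The function names are illustrative, not part of any library.

```python
import math

MU_0 = 80.0    # reference population mean (before the diet)
SIGMA = 20.0   # known population standard deviation
Z_CRIT = 1.96  # two-sided boundary for alpha = 0.05

def z_statistic(sample_mean: float, n: int, mu: float = MU_0, sigma: float = SIGMA) -> float:
    """Step 3: Z-score of a sample mean, (Xbar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

def decide(z: float, z_crit: float = Z_CRIT) -> str:
    """Step 4: reject H0 only if Z falls in the two-sided critical region."""
    return "reject H0" if abs(z) >= z_crit else "retain H0"

for label, n, mean in [("A", 16, 75.0), ("B", 25, 88.0)]:
    z = z_statistic(mean, n)
    print(f"Experiment {label}: Z = {z:+.2f} -> {decide(z)}")
```

This prints Z = -1.00 (retain H0) for experiment A and Z = +2.00 (reject H0) for experiment B, matching the hand calculations.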
Check!
True or false?
- When the Z-score is quite large, it shows the null hypothesis is true. (False)
- A decision to retain the null hypothesis means you showed that the treatment has no effect. (False)

Why the null hypothesis?
Question: Isn't it odd to focus on the null hypothesis, which we do not believe to be true?
Answer: In logic, it is easier to demonstrate that a universal hypothesis is false than to prove that it is true.
(Recall Popper's falsification criterion from Lecture M1!)
So: usually the alternative hypothesis corresponds to your experimental hypothesis.

What could possibly go wrong?
Possible test outcomes
Type I error
- H0 is true, but by chance the outcome is such that H0 is rejected.
  (The test has indicated a non-existent treatment effect.)
- The probability that a Type I error occurs is equal to the significance level α.
Type II error
- H0 is not true, but the outcome is such that H0 is not rejected.
  (The test has failed to detect a real treatment effect.)
- The probability of a Type II error is sometimes denoted with the symbol β.

Summary
Decision     H0 true (no effect)    H1 true (effect exists)
retain H0    OK                     Type II error
reject H0    Type I error           OK
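The statement that the probability of a Type I error equals α can be checked by simulation. The sketch below is our own illustration, assuming the Facebook diet setting with H0 true by construction (µ = 80, σ = 20, n = 16) and a two-sided Z-test at α = 0.05; the fraction of simulated experiments that (wrongly) reject H0 should come out close to 0.05.

```python
import math
import random
from statistics import NormalDist

def simulate_type_i_error_rate(n: int = 16, mu: float = 80.0, sigma: float = 20.0,
                               alpha: float = 0.05, trials: int = 20_000, seed: int = 1) -> float:
    """Fraction of simulated experiments in which a true H0 (mean = mu) is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Sample from the reference population itself, so H0 is true.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        z = (sample_mean - mu) / standard_error
        if abs(z) >= z_crit:
            rejections += 1  # every rejection here is a Type I error
    return rejections / trials

rate = simulate_type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f} (alpha = 0.05)")
```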
Some remarks
Terminology in literature:
- A result is called significant or statistically significant if it makes us reject the null hypothesis.
Factors influencing a hypothesis test:
- Size of the difference between the sample mean and the original population mean: appears in the numerator of the Z-score
- Variability of the scores: influences the size of the standard error (more variability, larger standard error)
- Number of scores in the sample: influences the size of the standard error (more scores, smaller standard error)

Directional (one-tailed) test
So far:
- Two-sided (two-tailed) hypothesis: does not indicate a direction for the possible effect or relation
What if you expect an effect in a certain direction?
- One-sided (one-tailed) hypothesis: indicates a possible direction for the assumed effect or relation
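The factors that influence a hypothesis test can be made concrete with a few hypothetical numbers (not from the lecture). Keeping the mean difference fixed at 85 − 80 = 5, doubling σ halves Z, while quadrupling n halves the standard error and doubles Z. A minimal Python sketch:

```python
import math

def z_score(sample_mean: float, mu: float, sigma: float, n: int) -> float:
    """Z = (Xbar - mu) / (sigma / sqrt(n)): mean difference over the standard error."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

# Hypothetical values: the mean difference stays 85 - 80 = 5 throughout.
for sigma in (20.0, 40.0):      # variability of the scores
    for n in (16, 64, 256):     # number of scores in the sample
        se = sigma / math.sqrt(n)
        z = z_score(85.0, 80.0, sigma, n)
        print(f"sigma = {sigma:4.0f}, n = {n:3d}: standard error = {se:5.2f}, Z = {z:.2f}")
```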
Example directional hypothesis
THE BREEZER DIET (AS SEEN ON MTV)!
How does drinking 4 Breezers a day affect your weight?
We expect that Breezers make you gain weight.
1. Formulate the hypotheses:
- $H_0: \mu_{\text{after}} \le 80$ (null hypothesis)
- $H_1: \mu_{\text{after}} > 80$ (alternative hypothesis)
2. Set criteria for decision:
- Significance level α = 0.05
- Critical region: Z ≥ 1.65 (from Column C in Table B.1)
We take a sample of n = 25 test persons.

Critical region (figure)
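The boundary Z ≥ 1.65 in step 2 is the rounded 95th percentile of the standard normal distribution. A short sketch comparing it with the two-sided boundary for the same α, again assuming Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed test (H1: mu > 80): all of alpha sits in the upper tail.
z_crit_one_tailed = standard_normal.inv_cdf(1 - alpha)
# Two-tailed test (H1: mu != 80): alpha is split over both tails.
z_crit_two_tailed = standard_normal.inv_cdf(1 - alpha / 2)

print(f"one-tailed: reject H0 if Z >= {z_crit_one_tailed:.3f}")
print(f"two-tailed: reject H0 if |Z| >= {z_crit_two_tailed:.3f}")
```

Because the one-tailed boundary (about 1.645) lies closer to zero than the two-tailed one (about 1.960), a directional test needs a less extreme Z-score to reject H0, but only in the predicted direction. For the Breezer result Z = 1.75, this difference decides the outcome.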
Example directional hypothesis
3. Collect data and compute sample statistic:
- Sample size n = 25
- Population σ = 20
- Sample mean $\bar{X} = 87$
- Standard error of the mean is $\sigma_M = \frac{20}{\sqrt{25}} = \frac{20}{5} = 4$
- $Z = \frac{\bar{X} - \mu}{\sigma_M} = \frac{87 - 80}{4} = 1.75$
4. Make decision:
- The Z-score is in the critical region (1.75 ≥ 1.65), so we reject H0
So: We reject H0 and conclude that Breezers make you gain weight!

Unknown variance
More often than not the population variance σ² is unknown.
So the standard error of the mean $\sigma_M$ is not known either.
What to do?
- Use the sample variance $s^2 = \frac{SS}{n-1}$ as an estimate for σ².
- Replace $\sigma_M = \sqrt{\frac{\sigma^2}{n}}$ with the estimated standard error $s_M = \sqrt{\frac{s^2}{n}}$.
If the variance σ² is known, use: $\sigma_M = \frac{\sigma}{\sqrt{n}}$ and $Z = \frac{\bar{X} - \mu}{\sigma_M}$; Z has a standard normal distribution under H0.
If the variance σ² is unknown, use: $s_M = \frac{s}{\sqrt{n}}$ and $t = \frac{\bar{X} - \mu}{s_M}$; t has a t-distribution with df = n − 1 under H0.
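The two ways of obtaining the standard error can be sketched side by side. The Breezer numbers come from the example; the list `weights` is a hypothetical sample invented only to show the estimate $s^2 = SS/(n-1)$ when σ is unknown, and both helper names are our own.

```python
import math

def standard_error_known_sigma(sigma: float, n: int) -> float:
    """sigma_M = sigma / sqrt(n), used when the population sigma is known."""
    return sigma / math.sqrt(n)

def standard_error_estimated(scores: list[float]) -> float:
    """s_M = s / sqrt(n) with s^2 = SS / (n - 1), used when sigma is unknown."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations (SS)
    s = math.sqrt(ss / (n - 1))                # sample standard deviation
    return s / math.sqrt(n)

# Breezer example: sigma is known, so Z = (Xbar - mu) / sigma_M.
sigma_m = standard_error_known_sigma(20.0, 25)
z = (87.0 - 80.0) / sigma_m
print(f"sigma_M = {sigma_m}, Z = {z}, reject H0: {z >= 1.645}")

# Hypothetical weights (not lecture data): sigma unknown, so estimate s_M.
weights = [82.0, 90.0, 78.0, 95.0, 85.0]
print(f"estimated s_M = {standard_error_estimated(weights):.3f}")
```

With the estimated $s_M$ in the denominator, the resulting statistic is compared against the t-distribution with df = n − 1 instead of the standard normal.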
t-distribution
- Is a family of distributions
- Resembles the standard normal distribution in shape and spread
- Has a bit "more mass" in the tails (flatter)
- Has one parameter: degrees of freedom (df)
- As df → ∞, the t-distribution approaches the standard normal distribution
William Sealey Gosset (1876–1937)
Sometimes also called Student's t-distribution (Gosset published it under the pseudonym "Student").

t-distribution: plots (figure)
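Python's standard library has no t-distribution, so the sketch below integrates the t density numerically (Simpson's rule) and inverts the result by bisection; in practice one would use a statistics package such as SciPy's `scipy.stats.t`. All function names are our own. It reproduces the t.975 column of Table B.2 for a few df values, shows the critical values shrinking toward the normal value 1.960 as df grows, and illustrates the extra tail mass.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of the density around 0."""
    a = abs(x)
    h = a / steps                  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3           # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(p: float, df: int) -> float:
    """Quantile t_p with P(T <= t_p) = p, by bisection on [0, 50] (enough for these df)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (9, 48, 1000):
    print(f"df = {df:4d}: t.975 = {t_critical(0.975, df):.3f}")
print(f"standard normal: z.975 = {NormalDist().inv_cdf(0.975):.3f}")

# "More mass in the tails": upper-tail probability beyond 1.96.
print(f"P(T > 1.96), df = 9: {1 - t_cdf(1.96, 9):.3f}; normal: {1 - NormalDist().cdf(1.96):.3f}")
```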
Example
We would like to test the following hypothesis:
Information Science students on average spend 20 hours per week on INFOWO.
- $H_0: \mu = 20$ (null hypothesis)
- $H_1: \mu \neq 20$ (alternative hypothesis)
We set criteria for decision:
- Significance level α = 0.05

Example (continued)
Assume we have a sample of n = 10 Information Science students.
Observed data:
- Observed sample mean $\bar{X} = 21.2$
- Observed standard deviation s = 3.4
Recall calculation of t:
$t = \frac{\bar{X} - \mu}{s_M} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
$t = \frac{21.2 - 20.0}{3.4/\sqrt{10}} = \frac{1.2}{1.08} = 1.11$
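The arithmetic can be checked with a few lines of Python (a sketch; `one_sample_t` is a hypothetical helper, not a library function). Without first rounding $s_M$ to 1.08, t comes out at about 1.116, which agrees with 1.11 up to rounding.

```python
import math

def one_sample_t(sample_mean: float, mu_0: float, s: float, n: int) -> float:
    """t = (Xbar - mu_0) / s_M with estimated standard error s_M = s / sqrt(n)."""
    s_m = s / math.sqrt(n)
    return (sample_mean - mu_0) / s_m

t = one_sample_t(sample_mean=21.2, mu_0=20.0, s=3.4, n=10)
print(f"t = {t:.3f}")
```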
Example (continued)
Two-sided test, significance level α = 0.05
Decision rule:
- Reject H0 if $t < -t_{crit}$ or if $t > t_{crit}$
- Do not reject H0 if $-t_{crit} \le t \le t_{crit}$
How do we determine $t_{crit}$?
- Look up the value in Table B.2

t-test: critical value
t Table
cum. prob   t.50   t.75   t.80   t.85   t.90   t.95   t.975  t.99   t.995  t.999  t.9995
one-tail    0.50   0.25   0.20   0.15   0.10   0.05   0.025  0.01   0.005  0.001  0.0005
two-tails   1.00   0.50   0.40   0.30   0.20   0.10   0.05   0.02   0.01   0.002  0.001
df
1           0.000  1.000  1.376  1.963  3.078  6.314  12.71  31.82  63.66  318.31 636.62
2           0.000  0.816  1.061  1.386  1.886  2.920  4.303  6.965  9.925  22.327 31.599
3           0.000  0.765  0.978  1.250  1.638  2.353  3.182  4.541  5.841  10.215 12.924
4           0.000  0.741  0.941  1.190  1.533  2.132  2.776  3.747  4.604  7.173  8.610
5           0.000  0.727  0.920  1.156  1.476  2.015  2.571  3.365  4.032  5.893  6.869
6           0.000  0.718  0.906  1.134  1.440  1.943  2.447  3.143  3.707  5.208  5.959
7           0.000  0.711  0.896  1.119  1.415  1.895  2.365  2.998  3.499  4.785  5.408
8           0.000  0.706  0.889  1.108  1.397  1.860  2.306  2.896  3.355  4.501  5.041
9           0.000  0.703  0.883  1.100  1.383  1.833  2.262  2.821  3.250  4.297  4.781
10          0.000  0.700  0.879  1.093  1.372  1.812  2.228  2.764  3.169  4.144  4.587
11          0.000  0.697  0.876  1.088  1.363  1.796  2.201  2.718  3.106  4.025  4.437
12          0.000  0.695  0.873  1.083  1.356  1.782  2.179  2.681  3.055  3.930  4.318
13          0.000  0.694  0.870  1.079  1.350  1.771  2.160  2.650  3.012  3.852  4.221
14          0.000  0.692  0.868  1.076  1.345  1.761  2.145  2.624  2.977  3.787  4.140
15          0.000  0.691  0.866  1.074  1.341  1.753  2.131  2.602  2.947  3.733  4.073
16          0.000  0.690  0.865  1.071  1.337  1.746  2.120  2.583  2.921  3.686  4.015
17          0.000  0.689  0.863  1.069  1.333  1.740  2.110  2.567  2.898  3.646  3.965
18          0.000  0.688  0.862  1.067  1.330  1.734  2.101  2.552  2.878  3.610  3.922
19          0.000  0.688  0.861  1.066  1.328  1.729  2.093  2.539  2.861  3.579  3.883
20          0.000  0.687  0.860  1.064  1.325  1.725  2.086  2.528  2.845  3.552  3.850
21          0.000  0.686  0.859  1.063  1.323  1.721  2.080  2.518  2.831  3.527  3.819
22          0.000  0.686  0.858  1.061  1.321  1.717  2.074  2.508  2.819  3.505  3.792
23          0.000  0.685  0.858  1.060  1.319  1.714  2.069  2.500  2.807  3.485  3.768
24          0.000  0.685  0.857  1.059  1.318  1.711  2.064  2.492  2.797  3.467  3.745
25          0.000  0.684  0.856  1.058  1.316  1.708  2.060  2.485  2.787  3.450  3.725
26          0.000  0.684  0.856  1.058  1.315  1.706  2.056  2.479  2.779  3.435  3.707
27          0.000  0.684  0.855  1.057  1.314  1.703  2.052  2.473  2.771  3.421  3.690
28          0.000  0.683  0.855  1.056  1.313  1.701  2.048  2.467  2.763  3.408  3.674
29          0.000  0.683  0.854  1.055  1.311  1.699  2.045  2.462  2.756  3.396  3.659
30          0.000  0.683  0.854  1.055  1.310  1.697  2.042  2.457  2.750  3.385  3.646
40          0.000  0.681  0.851  1.050  1.303  1.684  2.021  2.423  2.704  3.307  3.551
60          0.000  0.679  0.848  1.045  1.296  1.671  2.000  2.390  2.660  3.232  3.460
80          0.000  0.678  0.846  1.043  1.292  1.664  1.990  2.374  2.639  3.195  3.416
100         0.000  0.677  0.845  1.042  1.290  1.660  1.984  2.364  2.626  3.174  3.390
1000        0.000  0.675  0.842  1.037  1.282  1.646  1.962  2.330  2.581  3.098  3.300
z           0.000  0.674  0.842  1.036  1.282  1.645  1.960  2.326  2.576  3.090  3.291
conf. level 0%     50%    60%    70%    80%    90%    95%    98%    99%    99.8%  99.9%
Example (continued)
Two-sided test, significance level α = 0.05.
$t_{crit} = 2.262$ (df = n − 1 = 9)
Decision rule:
- Reject H0 if t < −2.262 or t > 2.262
- Do not reject H0 if −2.262 ≤ t ≤ 2.262
So? t = 1.11, so do not reject H0.

One-sample t-test
Properties of the one-sample t-test:
- Compare one sample mean with a reference value (a mean value that was determined earlier or beforehand)
- Population standard deviation σ unknown
- Sample size n < 120
Use the t-distribution:
$t = \frac{\bar{X} - \mu}{s_M} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
- Determine the correct value for the degrees of freedom (df = n − 1)
- Use Table B.2 to determine the critical value
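The complete procedure can be sketched as one function that works on raw scores, assuming a two-sided test and a critical value looked up beforehand in Table B.2. The helper `one_sample_t_test` and the list `hours` are hypothetical: the scores were chosen to have mean 21.2 like the example, but their standard deviation (about 2.66) differs from the example's s = 3.4, so t differs too.

```python
import math
from statistics import mean, stdev

def one_sample_t_test(scores: list[float], mu_0: float, t_crit: float) -> tuple[float, int, bool]:
    """Two-sided one-sample t-test; returns (t, df, reject_H0).

    t_crit must be looked up for df = n - 1 and the chosen alpha (e.g. in Table B.2).
    """
    n = len(scores)
    s_m = stdev(scores) / math.sqrt(n)  # statistics.stdev divides by n - 1
    t = (mean(scores) - mu_0) / s_m
    reject = t < -t_crit or t > t_crit
    return t, n - 1, reject

# Hypothetical weekly INFOWO hours for n = 10 students (illustrative, not lecture data).
hours = [18.0, 22.0, 25.0, 19.0, 21.0, 24.0, 17.0, 23.0, 20.0, 23.0]
t, df, reject = one_sample_t_test(hours, mu_0=20.0, t_crit=2.262)  # df = 9, alpha = 0.05
print(f"t = {t:.3f}, df = {df}, reject H0: {reject}")
```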
One-sample t-test: income example
Student income
Question: Is your income different from what the average Dutch student (living away from home) needs?
According to the National Institute for Family Finance Information (NIBUD), the average Dutch student needs €962 per month.

One-sample t-test: income example
Hypothesis: Your income is different from what the average Dutch student, living away from home, needs.
Formulation of hypotheses:
- $H_0: \mu = 962$ (null hypothesis)
- $H_1: \mu \neq 962$ (alternative hypothesis)
Significance level: α = 0.05
One-sample t-test: income example
Your income (net income per month):
N valid          49
N missing        3
Mean             833.02
Std. Deviation   462.90
So:
- Your average income per month is €833.
- This is €129 less than the NIBUD amount (€962) that the average Dutch student needs.
- Std. error of the mean = Std. Deviation/√49 ≈ 66.13
Question: Is this difference due to randomness, or is it significant?
One-sample t-test: Formulas and output
$t = \frac{\bar{X} - \mu}{s_M}$, with $s_M = \frac{s}{\sqrt{n}}$ and df = n − 1
SPSS: Menu Analyze > Compare Means > One-Sample T Test...
SPSS output (figure)
So, s = 462.91 and $s_M = \frac{462.91}{\sqrt{49}} = 66.13$
$t = \frac{833.02 - 962.00}{66.13} = -1.95$
Since |t| = 1.95 < $t_{crit} \approx 2.01$ (df = 48), H0 is not rejected.
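The same numbers can be recomputed from the SPSS summary statistics in a short sketch. The critical value 2.01 is the approximate table value for df = 48 used above, not something the code derives.

```python
import math

n, sample_mean, s = 49, 833.02, 462.91  # summary statistics from the SPSS output
mu_0 = 962.00                           # NIBUD reference value (euro per month)

s_m = s / math.sqrt(n)                  # estimated standard error of the mean
t = (sample_mean - mu_0) / s_m
df = n - 1
t_crit = 2.01                           # approximate two-sided critical value, alpha = 0.05, df = 48

print(f"s_M = {s_m:.2f}, t = {t:.2f}, df = {df}")
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```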
One-sample t-test in SPSS: More output
"Sig." column: p-value or significance value of the test result
- Probability, under the null hypothesis, of obtaining a t-value at least as extreme as the measured one: $P(|t| \ge 1.950)$ under H0
- Indication of how extreme the measured data are
Rule
- If p-value ≥ α, do not reject H0.
- If p-value < α, reject H0.
So? p = .057 ≥ .05, so do not reject H0.

One-sample t-test: income example
Conclusion (also: the proper way of reporting the result):
- Your average income (according to the questionnaire) is €833 per month.
- This is €129 less than the NIBUD amount for Dutch students living away from home.
- This difference is not significant: t = −1.95 (df = 48), p = .057 (two-sided).
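The two-sided p-value reported by SPSS can be approximated from t and df alone. A minimal sketch, assuming numerical integration of the t density with Simpson's rule (a statistics package such as SciPy would normally be used instead); the function names are our own. For t = −1.950 and df = 48 it gives p ≈ .057, so with α = 0.05 the rule above says: do not reject H0.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|) under H0, i.e. 1 - 2 * P(0 <= T <= |t|), via Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

p = two_sided_p_value(-1.950, 48)
alpha = 0.05
print(f"p = {p:.3f}")
print("reject H0" if p < alpha else "do not reject H0")
```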
Lessons learnt
- What is inferential statistics?
- The proper procedure for hypothesis testing
- One-sample t-test:
  - What it is
  - And how to use it
- The role of the hypothesis in research