Goodness of Fit Tests: Unknown Parameters
Mathematics 47: Lecture 33
Dan Sloughter
Furman University
May 8, 2006
Fitting a family of distributions
• Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a distribution $F$.
• Let $\mathcal{F}$ be a family of distributions depending on $r$ parameters $\theta_1, \theta_2, \ldots, \theta_r$.
• Suppose we wish to test
  \[ H_0 : F \in \mathcal{F} \]
  \[ H_A : F \notin \mathcal{F}. \]
• We divide the range of the random variables into $k$ disjoint cells and, for $i = 1, 2, \ldots, k$, let
  $p_i$ = probability that an observation lies in the $i$th cell
  and
  $Y_i$ = number of observations in the $i$th cell.
• Under the null hypothesis, the probabilities $p_1, p_2, \ldots, p_k$ are functions of the parameters $\theta_1, \theta_2, \ldots, \theta_r$.
Fitting a family of distributions (cont’d)
• That is, for $i = 1, 2, \ldots, k$, we may write $p_i = p_i(\theta_1, \theta_2, \ldots, \theta_r)$.
• For $j = 1, 2, \ldots, r$, let $\hat{\theta}_j$ be the maximum likelihood estimator for $\theta_j$ and, for $i = 1, 2, \ldots, k$, let
  \[ \hat{p}_i = p_i(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_r). \]
• It may be shown that, when $H_0$ is true, both
  \[ -2 \log(\Lambda) = 2 \sum_{i=1}^{k} Y_i \log\left(\frac{Y_i}{n\hat{p}_i}\right) \]
  and
  \[ Q = \sum_{i=1}^{k} \frac{(Y_i - n\hat{p}_i)^2}{n\hat{p}_i} \]
  are asymptotically $\chi^2(k - r - 1)$.
• Note: we lose one degree of freedom for every parameter that we estimate from the data.
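The two statistics can be computed directly from the cell counts and the fitted cell probabilities. The following is a minimal Python sketch, not part of the lecture: the function names `pearson_q`, `lr_statistic`, and `degrees_of_freedom` and the toy counts are assumptions made for illustration. Empty cells are taken to contribute nothing to $-2\log(\Lambda)$, using the usual convention $0 \log 0 = 0$.

```python
import math

def pearson_q(observed, fitted_probs):
    """Pearson statistic Q = sum over cells of (Y_i - n p_i)^2 / (n p_i)."""
    n = sum(observed)
    return sum((y - n * p) ** 2 / (n * p)
               for y, p in zip(observed, fitted_probs))

def lr_statistic(observed, fitted_probs):
    """-2 log(Lambda) = 2 * sum over cells of Y_i log(Y_i / (n p_i)).

    Cells with Y_i = 0 are skipped (convention 0 * log 0 = 0).
    """
    n = sum(observed)
    return 2 * sum(y * math.log(y / (n * p))
                   for y, p in zip(observed, fitted_probs) if y > 0)

def degrees_of_freedom(k, r):
    """Degrees of freedom with k cells and r estimated parameters."""
    return k - r - 1

# Toy illustration with hypothetical counts (not data from the lecture).
toy_observed = [10, 20]
toy_probs = [0.5, 0.5]
toy_q = pearson_q(toy_observed, toy_probs)
toy_g = lr_statistic(toy_observed, toy_probs)
print(toy_q, toy_g)
```

In practice `fitted_probs` would hold the values $\hat{p}_i$ obtained by plugging the maximum likelihood estimates into $p_i(\theta_1, \ldots, \theta_r)$.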
Example
• During World War II it was asked whether German bomb strikes on South London were random or not.
• To test this hypothesis, the city was divided into 576 regions, each 1/4 of a square kilometer in area.
• The data were as follows:

  Number of Hits    Frequency    Expected Frequency
  0                 229          227.53
  1                 211          211.34
  2                  93           98.15
  3                  35           30.39
  4                   7            7.06
  5+                  1            1.54
  Total             576          576.01
Example (cont’d)
• We want to test the hypotheses
  $H_0$: Data are Poisson
  $H_A$: Data are not Poisson.
• If $\lambda$ is the mean of the hypothesized Poisson distribution, then the maximum likelihood estimator of $\lambda$ is
  \[ \hat{\lambda} = \frac{0 \cdot 229 + 1 \cdot 211 + 2 \cdot 93 + 3 \cdot 35 + 4 \cdot 7 + 5 \cdot 1}{576} = \frac{535}{576} = 0.9288. \]
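The estimate is simply the sample mean of the hit counts, which a few lines of Python can confirm. This is an illustrative sketch; the variable names are assumptions, and the "5+" cell is treated as exactly 5 hits, as in the calculation above.

```python
# Observed frequency table from the example: hits -> number of regions.
# The "5+" cell is counted as exactly 5 hits, matching the slide.
frequency = {0: 229, 1: 211, 2: 93, 3: 35, 4: 7, 5: 1}

n_regions = sum(frequency.values())
total_hits = sum(hits * count for hits, count in frequency.items())

# For a Poisson sample the maximum likelihood estimate of the mean
# is the sample mean.
lambda_hat = total_hits / n_regions

print(n_regions, total_hits, round(lambda_hat, 4))
```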
Example (cont’d)
• If $p_i(\lambda)$, $i = 0, 1, 2, 3, 4, 5$, is the probability of an outcome in the $i$th cell, then
  \[
  \begin{aligned}
  p_0(\hat{\lambda}) &= e^{-0.9288} = 0.3950, \\
  p_1(\hat{\lambda}) &= 0.9288\, e^{-0.9288} = 0.3669, \\
  p_2(\hat{\lambda}) &= \frac{(0.9288)^2 e^{-0.9288}}{2} = 0.1704, \\
  p_3(\hat{\lambda}) &= \frac{(0.9288)^3 e^{-0.9288}}{6} = 0.0528, \\
  p_4(\hat{\lambda}) &= \frac{(0.9288)^4 e^{-0.9288}}{24} = 0.0122, \\
  p_5(\hat{\lambda}) &= 1 - p_0(\hat{\lambda}) - p_1(\hat{\lambda}) - p_2(\hat{\lambda}) - p_3(\hat{\lambda}) - p_4(\hat{\lambda}) = 0.0027.
  \end{aligned}
  \]
• Note: Since the range of a Poisson random variable is the set of nonnegative integers, the last cell is really the set $\{5, 6, 7, \ldots\}$.
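The cell probabilities and the expected frequencies in the table can be reproduced as follows. This is a sketch under the assumption that the unrounded estimate 535/576 is used for $\hat{\lambda}$; the helper name `poisson_pmf` is introduced here for illustration only.

```python
import math

n_regions = 576
lambda_hat = 535 / 576  # unrounded maximum likelihood estimate

def poisson_pmf(i, lam):
    """P(X = i) for a Poisson random variable with mean lam."""
    return lam ** i * math.exp(-lam) / math.factorial(i)

# Cells 0, 1, 2, 3, 4 use the Poisson probabilities directly;
# the last cell {5, 6, 7, ...} gets whatever probability remains.
probs = [poisson_pmf(i, lambda_hat) for i in range(5)]
probs.append(1 - sum(probs))

expected = [n_regions * p for p in probs]

for label, p, e in zip(["0", "1", "2", "3", "4", "5+"], probs, expected):
    print(f"{label:>2}  p = {p:.4f}  expected = {e:.2f}")
```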
Example (cont’d)
• Multiplying each of these probabilities by 576 yields the expected frequencies shown in the table.
• Note: The last cell (5 or more hits) has an expected frequency of only 1.54; since this is less than 5, we combine it with the cell for 4 hits into a single cell with observed frequency 8 and expected frequency 8.60.
• Plugging into the respective formulas, we now find the observed values $q = 1.021441$ and $-2 \log(\lambda) = 0.9743838$ (here $\lambda$ denotes the observed value of $\Lambda$, not the Poisson mean).
• With five cells and one estimated parameter there are $5 - 1 - 1 = 3$ degrees of freedom. If $U$ is $\chi^2(3)$, the corresponding p-values are $P(U \geq 1.021441) = 0.796064$ and $P(U \geq 0.9743838) = 0.80745$.
• Hence we have no evidence to reject $H_0$, and it appears that the Poisson model provides a good description of the observed data.
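These numbers can be checked with the Python standard library alone. The sketch below is not from the lecture: it uses the rounded expected frequencies from the table, and the helper `chi2_sf` is a hypothetical name for a chi-square tail probability built from the recurrence $Q(a+1, z) = Q(a, z) + z^a e^{-z} / \Gamma(a+1)$ for the regularized upper incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """P(U >= x) for U ~ chi-square(df), with df a positive integer.

    With z = x / 2 the tail is Q(df / 2, z). Start from
    Q(1/2, z) = erfc(sqrt(z)) or Q(1, z) = exp(-z) and step up using
    Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    """
    z = x / 2
    if df % 2 == 1:
        a, tail = 0.5, math.erfc(math.sqrt(z))
    else:
        a, tail = 1.0, math.exp(-z)
    while a < df / 2:
        tail += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return tail

# Cells 0, 1, 2, 3 and the pooled cell "4 or more hits".
observed = [229, 211, 93, 35, 7 + 1]
expected = [227.53, 211.34, 98.15, 30.39, 7.06 + 1.54]

q = sum((y - e) ** 2 / e for y, e in zip(observed, expected))
g = 2 * sum(y * math.log(y / e) for y, e in zip(observed, expected))

df = len(observed) - 1 - 1  # k cells, r = 1 estimated parameter
p_q = chi2_sf(q, df)
p_g = chi2_sf(g, df)

print(f"q = {q:.6f}, p-value = {p_q:.6f}")
print(f"-2 log(lambda) = {g:.6f}, p-value = {p_g:.6f}")
```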
Example
• A test of the lifetimes of 100 lightbulbs yielded the following data:

   366.34   398.96   155.72  1210.49   593.00  2899.19   238.31    96.10  3435.11  1517.13
    62.46   161.34  1218.80   509.95   228.57   150.63  2810.98   143.71  1173.72  3316.39
  1651.60  1421.59   911.44   355.56   855.55   483.64    41.06   602.75    87.77  2375.15
   216.19   993.60   817.35   206.91    49.69   276.94    62.52   790.79  3838.57   185.74
   328.92   590.45   425.45   512.72  1870.05  2226.60  1150.37   233.15   283.51   365.56
   837.63  1283.83   215.17  1238.47   438.24    35.34   688.08   229.54   204.58   531.07
   488.04  1535.54  2723.97   106.38  1995.29  1730.51  1656.36  1499.39   747.85    33.52
   117.92   784.20  2234.09  1140.28   698.01  4157.71    95.42  1763.62   965.04   480.00
   289.00  2155.75   429.91   179.37   372.43   288.06   115.51  1701.23  1276.04   216.48
  1036.12  2452.65   422.81   822.78   804.78   493.45   582.57   283.00   582.45  1370.94
Example (cont’d)
• We wish to test the hypothesis that these data are from an exponential distribution:
  $H_0$: Data are exponential
  $H_A$: Data are not exponential.
• We group the data into cells of length 500, count the observed frequencies of each cell, and compute the expected frequencies for each cell.
• Since $\bar{x} = 914.29$ is the maximum likelihood estimate of the mean, we will compute expected frequencies using an exponential distribution with mean 914.29.
• That is, for example, the expected frequency for the first cell is
  \[ 100 \int_0^{500} \frac{1}{914.29}\, e^{-x/914.29}\, dx = 100 \left(1 - e^{-500/914.29}\right) = 42.12. \]
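The same calculation extends to every cell by differencing the exponential distribution function at the cell endpoints. The following Python sketch is illustrative only: the cell edges follow the intervals of length 500 described above with an open final cell starting at 4000, and the names `edges` and `exp_cdf` are assumptions.

```python
import math

n = 100
mean_lifetime = 914.29

# Cell boundaries: [0, 500], (500, 1000], ..., (3500, 4000], (4000, infinity).
edges = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, math.inf]

def exp_cdf(x, mean):
    """Distribution function of an exponential with the given mean."""
    return 1.0 - math.exp(-x / mean)

# Expected frequency of a cell (a, b] is n * (F(b) - F(a)).
expected = [n * (exp_cdf(b, mean_lifetime) - exp_cdf(a, mean_lifetime))
            for a, b in zip(edges, edges[1:])]

for a, b, e in zip(edges, edges[1:], expected):
    print(f"({a}, {b}]  expected = {e:.2f}")
```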
Example (cont’d)
• We then have the following table:

  Interval        Observed frequency    Expected frequency
  [0, 500]         46                    42.12
  (500, 1000]      21                    24.38
  (1000, 1500]     12                    14.11
  (1500, 2000]      9                     8.17
  (2000, 2500]      5                     4.73
  (2500, 3000]      3                     2.74
  (3000, 3500]      2                     1.58
  (3500, 4000]      1                     0.92
  (4000, ∞)         1                     1.26
  Total           100                   100.01
Example (cont’d)
• We will group the final four cells together because of their low expected frequencies:

  Interval        Observed frequency    Expected frequency
  [0, 500]         46                    42.12
  (500, 1000]      21                    24.38
  (1000, 1500]     12                    14.11
  (1500, 2000]      9                     8.17
  (2000, 2500]      5                     4.73
  (2500, ∞)         7                     6.50
  Total           100                   100.01
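Pooling the tail cells is just a matter of summing the trailing entries of each column. A small sketch, with `pool_tail` as a hypothetical helper name and the values taken from the table before grouping:

```python
def pool_tail(values, start):
    """Merge values[start:] into a single final cell."""
    return list(values[:start]) + [sum(values[start:])]

observed = [46, 21, 12, 9, 5, 3, 2, 1, 1]
expected = [42.12, 24.38, 14.11, 8.17, 4.73, 2.74, 1.58, 0.92, 1.26]

# Group the final four cells, i.e. everything from (2500, 3000] onward.
pooled_observed = pool_tail(observed, 5)
pooled_expected = pool_tail(expected, 5)

print(pooled_observed)
print([round(e, 2) for e in pooled_expected])
```

Note that the (2000, 2500] cell, with expected frequency 4.73, is left on its own here, following the table.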
Example (cont’d)
• Our test statistics are $q = 1.279737$ and $-2 \log(\lambda) = 1.285602$.
• With six cells and one estimated parameter there are $6 - 1 - 1 = 4$ degrees of freedom. If $U$ is $\chi^2(4)$, then our p-values are $P(U \geq 1.279737) = 0.864804$ and $P(U \geq 1.285602) = 0.8638135$.
• Hence we have no evidence to reject $H_0$, and it appears that the exponential distribution provides a good description of the observed data.
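For four degrees of freedom the chi-square tail probability has the closed form $P(U \geq x) = e^{-x/2}(1 + x/2)$, so these values are easy to check. The sketch below is an illustration using the rounded expected frequencies from the grouped table; the function name `chi2_sf_4` is an assumption.

```python
import math

# Grouped cells from the final table.
observed = [46, 21, 12, 9, 5, 7]
expected = [42.12, 24.38, 14.11, 8.17, 4.73, 6.50]

q = sum((y - e) ** 2 / e for y, e in zip(observed, expected))
g = 2 * sum(y * math.log(y / e) for y, e in zip(observed, expected))

df = len(observed) - 1 - 1  # six cells, one estimated parameter

def chi2_sf_4(x):
    """P(U >= x) for U ~ chi-square(4), using exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)

p_q = chi2_sf_4(q)
p_g = chi2_sf_4(g)

print(f"q = {q:.6f}, p-value = {p_q:.6f}")
print(f"-2 log(lambda) = {g:.6f}, p-value = {p_g:.6f}")
```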