Goodness of Fit Tests: Unknown Parameters
Mathematics 47: Lecture 33
Dan Sloughter
Furman University
May 8, 2006

Fitting a family of distributions

• Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a distribution $F$.
• Let $\mathcal{F}$ be a family of distributions depending on $r$ parameters $\theta_1, \theta_2, \ldots, \theta_r$.
• Suppose we wish to test
  $$H_0: F \in \mathcal{F} \qquad H_A: F \notin \mathcal{F}.$$
• We divide the range of the random variables into $k$ disjoint cells and, for $i = 1, 2, \ldots, k$, let
  $p_i$ = probability that an observation lies in the $i$th cell, and
  $Y_i$ = number of observations in the $i$th cell.
• Under the null hypothesis, the probabilities $p_1, p_2, \ldots, p_k$ are functions of the parameters $\theta_1, \theta_2, \ldots, \theta_r$.

Fitting a family of distributions (cont’d)

• That is, for $i = 1, 2, \ldots, k$, we may write $p_i = p_i(\theta_1, \theta_2, \ldots, \theta_r)$.
• For $j = 1, 2, \ldots, r$, let $\hat\theta_j$ be the maximum likelihood estimator of $\theta_j$, and for $i = 1, 2, \ldots, k$ let $\hat p_i = p_i(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_r)$.
• It may be shown that, when $H_0$ is true, both
  $$-2\log(\Lambda) = 2\sum_{i=1}^{k} Y_i \log\left(\frac{Y_i}{n\hat p_i}\right)$$
  and
  $$Q = \sum_{i=1}^{k} \frac{(Y_i - n\hat p_i)^2}{n\hat p_i}$$
  are asymptotically $\chi^2(k - r - 1)$. (Strictly, this limit requires the $\hat\theta_j$ to be computed from the cell counts $Y_1, \ldots, Y_k$; when they are computed from the ungrouped observations, $\chi^2(k - r - 1)$ is only an approximation.)
• Note: we lose one degree of freedom for every parameter that we estimate from the data.
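The two statistics and the degrees of freedom translate directly into code. Below is a minimal Python sketch using only the standard library; `gof_statistics` and `chi2_sf` are hypothetical helper names, the fitted cell probabilities $\hat p_i$ are assumed to be supplied by the caller, and the chi-square upper tail uses the closed forms that exist for integer degrees of freedom (no SciPy assumed).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(U >= x) for U ~ chi-square(df), df a positive integer.

    Closed forms for integer df:
      even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
      odd df:  erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        term = total = 1.0                      # j = 0 term
        for j in range(1, df // 2):
            term *= h / j                       # builds (x/2)^j / j!
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))             # chi-square(1) tail
    term = math.sqrt(h) / math.gamma(1.5)       # j = 1 term, without the exp(-h) factor
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * term
        term *= h / (j + 0.5)                   # Gamma(j + 3/2) = (j + 1/2) * Gamma(j + 1/2)
    return total


def gof_statistics(observed, p_hat, r):
    """Return (Q, -2 log Lambda, k - r - 1) for cell counts Y_i and fitted p_hat_i."""
    if len(observed) != len(p_hat):
        raise ValueError("observed and p_hat must have the same length")
    n = sum(observed)
    k = len(observed)
    q = g = 0.0
    for y, p in zip(observed, p_hat):
        e = n * p                               # expected count n * p_hat_i
        q += (y - e) ** 2 / e
        if y > 0:                               # Y_i log(Y_i / e) tends to 0 as Y_i -> 0
            g += 2.0 * y * math.log(y / e)
    return q, g, k - r - 1


# Usage: pooled cells of the bomb-strike example, p_hat_i = expected_i / 576.
q, g, df = gof_statistics(
    [229, 211, 93, 35, 8],
    [e / 576 for e in (227.53, 211.34, 98.15, 30.39, 8.60)],
    r=1,
)
print(f"Q = {q:.6f}, -2 log(Lambda) = {g:.6f}, df = {df}")
print(f"p-values: {chi2_sf(q, df):.6f}, {chi2_sf(g, df):.6f}")
```

With those inputs the sketch gives approximately $Q = 1.021441$, $-2\log(\Lambda) = 0.974384$ and 3 degrees of freedom, matching the values reported for the bomb-strike example.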
Example

• During World War II it was asked whether German bomb strikes on South London were falling at random.
• To test this hypothesis, the area was divided into 576 regions, each 1/4 square kilometer in area.
• The data were as follows:

    Number of hits   Observed frequency   Expected frequency
    0                229                  227.53
    1                211                  211.34
    2                 93                   98.15
    3                 35                   30.39
    4                  7                    7.06
    5+                 1                    1.54
    Total            576                  576.01

Example (cont’d)

• We want to test the hypotheses
  $$H_0: \text{the data are Poisson} \qquad H_A: \text{the data are not Poisson}.$$
• If $\lambda$ is the mean of the hypothesized Poisson distribution, then the maximum likelihood estimator of $\lambda$ is the sample mean; counting the single region in the "5+" cell as having 5 hits,
  $$\hat\lambda = \frac{0 \cdot 229 + 1 \cdot 211 + 2 \cdot 93 + 3 \cdot 35 + 4 \cdot 7 + 5 \cdot 1}{576} = \frac{535}{576} = 0.9288.$$

Example (cont’d)

• If $p_i(\lambda)$, $i = 0, 1, 2, 3, 4, 5$, is the probability of an outcome in the $i$th cell, then
  $$\begin{aligned}
  p_0(\hat\lambda) &= e^{-0.9288} = 0.3950,\\
  p_1(\hat\lambda) &= 0.9288\,e^{-0.9288} = 0.3669,\\
  p_2(\hat\lambda) &= \frac{(0.9288)^2 e^{-0.9288}}{2} = 0.1704,\\
  p_3(\hat\lambda) &= \frac{(0.9288)^3 e^{-0.9288}}{6} = 0.0528,\\
  p_4(\hat\lambda) &= \frac{(0.9288)^4 e^{-0.9288}}{24} = 0.0122,\\
  p_5(\hat\lambda) &= 1 - p_0(\hat\lambda) - p_1(\hat\lambda) - p_2(\hat\lambda) - p_3(\hat\lambda) - p_4(\hat\lambda) = 0.0027.
  \end{aligned}$$
• Note: since the range of a Poisson random variable is the set of nonnegative integers, the last cell is really the set $\{5, 6, 7, \ldots\}$.
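The estimate and the cell probabilities above can be reproduced with a few lines of Python. This is a minimal sketch that, like the computation of $\hat\lambda$ above, counts the "5+" region as exactly 5 hits; the variable names are illustrative. Multiplying by $n = 576$ also gives the expected frequencies listed in the table.

```python
import math

# Observed frequencies; index i = number of hits, and the last cell is "5 or more".
freq = [229, 211, 93, 35, 7, 1]
n = sum(freq)                                   # 576 regions

# MLE of lambda (the sample mean), counting the "5+" region as exactly 5 hits.
lam_hat = sum(i * f for i, f in enumerate(freq)) / n

# Poisson probabilities p_0, ..., p_4 evaluated at lam_hat ...
probs = [math.exp(-lam_hat) * lam_hat ** i / math.factorial(i) for i in range(5)]
# ... and the tail cell {5, 6, 7, ...} receives the remaining probability.
probs.append(1.0 - sum(probs))

# Expected frequencies n * p_i(lam_hat).
expected = [n * p for p in probs]

print(f"lambda_hat = {lam_hat:.4f}")
for label, p, e in zip(["0", "1", "2", "3", "4", "5+"], probs, expected):
    print(f"{label:>2}: p = {p:.4f}, expected = {e:.2f}")
```

Assigning the tail cell the leftover probability, rather than truncating the Poisson sum at 5, keeps the cell probabilities summing to 1, which is what the definition of $p_5(\hat\lambda)$ above does.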
Example (cont’d)

• Multiplying each of these probabilities by 576 yields the expected frequencies shown in the table.
• Note: the last cell (5 or more hits) has an expected frequency of only 1.54; since this is less than 5, we combine it with the cell for 4 hits into a single cell (4 or more hits) with observed frequency $7 + 1 = 8$ and expected frequency $7.06 + 1.54 = 8.60$.
• With these $k = 5$ cells, plugging into the respective formulas, we find $q = 1.021441$ and $-2\log(\Lambda) = 0.9743838$.
• Since one parameter ($\lambda$) was estimated, the degrees of freedom are $k - r - 1 = 5 - 1 - 1 = 3$. If $U$ is $\chi^2(3)$, the corresponding p-values are $P(U \ge 1.021441) = 0.796064$ and $P(U \ge 0.9743838) = 0.80745$.
• Hence we have no evidence to reject $H_0$, and the Poisson model appears to provide a good description of the observed data.
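The pooling step, the two statistics and the p-values can be checked with a short Python sketch. It starts from the rounded expected frequencies in the table, so it should reproduce the values above up to rounding, and it uses the closed form of the $\chi^2(3)$ upper tail, $P(U \ge x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$; `chi2_sf_df3` is a hypothetical helper name.

```python
import math

# Observed and expected frequencies from the table (cells 0, 1, 2, 3, 4, 5+).
observed = [229, 211, 93, 35, 7, 1]
expected = [227.53, 211.34, 98.15, 30.39, 7.06, 1.54]

# Pool the "4" and "5+" cells, since the expected frequency 1.54 is below 5.
obs = observed[:4] + [observed[4] + observed[5]]       # last cell: 8
exp_ = expected[:4] + [expected[4] + expected[5]]      # last cell: 8.60

q = sum((o - e) ** 2 / e for o, e in zip(obs, exp_))
g = 2.0 * sum(o * math.log(o / e) for o, e in zip(obs, exp_))  # -2 log(Lambda); every o > 0
df = len(obs) - 1 - 1                                   # k - r - 1 with r = 1 (lambda)


def chi2_sf_df3(x):
    """P(U >= x) for U ~ chi-square(3)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


p_q = chi2_sf_df3(q)
p_g = chi2_sf_df3(g)
print(f"q = {q:.6f}, -2 log(Lambda) = {g:.6f}, df = {df}")
print(f"p-values: {p_q:.6f} (Q), {p_g:.6f} (-2 log Lambda)")
```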
Example

• A test of the lifetimes of 100 lightbulbs yielded the following data:

     366.34   398.96   155.72  1210.49   593.00  2899.19   238.31    96.10  3435.11  1517.13
      62.46   161.34  1218.80   509.95   228.57   150.63  2810.98   143.71  1173.72  3316.39
    1651.60  1421.59   911.44   355.56   855.55   483.64    41.06   602.75    87.77  2375.15
     216.19   993.60   817.35   206.91    49.69   276.94    62.52   790.79  3838.57   185.74
     328.92   590.45   425.45   512.72  1870.05  2226.60  1150.37   233.15   283.51   365.56
     837.63  1283.83   215.17  1238.47   438.24    35.34   688.08   229.54   204.58   531.07
     488.04  1535.54  2723.97   106.38  1995.29  1730.51  1656.36  1499.39   747.85    33.52
     117.92   784.20  2234.09  1140.28   698.01  4157.71    95.42  1763.62   965.04   480.00
     289.00  2155.75   429.91   179.37   372.43   288.06   115.51  1701.23  1276.04   216.48
    1036.12  2452.65   422.81   822.78   804.78   493.45   582.57   283.00   582.45  1370.94
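The sample mean and the cell counts used in the rest of this example can be recomputed from the listing. A minimal Python sketch, assuming the values were transcribed in the order shown (the order does not affect the results); cells are right-closed intervals of width 500, with everything above 4000 in one open-ended cell.

```python
import math

# The 100 lightbulb lifetimes listed above.
lifetimes = [
    366.34, 398.96, 155.72, 1210.49, 593.00, 2899.19, 238.31, 96.10, 3435.11, 1517.13,
    62.46, 161.34, 1218.80, 509.95, 228.57, 150.63, 2810.98, 143.71, 1173.72, 3316.39,
    1651.60, 1421.59, 911.44, 355.56, 855.55, 483.64, 41.06, 602.75, 87.77, 2375.15,
    216.19, 993.60, 817.35, 206.91, 49.69, 276.94, 62.52, 790.79, 3838.57, 185.74,
    328.92, 590.45, 425.45, 512.72, 1870.05, 2226.60, 1150.37, 233.15, 283.51, 365.56,
    837.63, 1283.83, 215.17, 1238.47, 438.24, 35.34, 688.08, 229.54, 204.58, 531.07,
    488.04, 1535.54, 2723.97, 106.38, 1995.29, 1730.51, 1656.36, 1499.39, 747.85, 33.52,
    117.92, 784.20, 2234.09, 1140.28, 698.01, 4157.71, 95.42, 1763.62, 965.04, 480.00,
    289.00, 2155.75, 429.91, 179.37, 372.43, 288.06, 115.51, 1701.23, 1276.04, 216.48,
    1036.12, 2452.65, 422.81, 822.78, 804.78, 493.45, 582.57, 283.00, 582.45, 1370.94,
]

n = len(lifetimes)
xbar = sum(lifetimes) / n                    # MLE of the exponential mean

# Cells [0, 500], (500, 1000], ..., (3500, 4000], (4000, inf).
width = 500.0
n_cells = 9
counts = [0] * n_cells
for x in lifetimes:
    j = max(math.ceil(x / width) - 1, 0)     # x in (j*500, (j+1)*500] -> index j
    counts[min(j, n_cells - 1)] += 1         # values above 4000 go into the last cell

print(f"n = {n}, xbar = {xbar:.2f}")
print("observed counts:", counts)
```

Running it should print a mean of about 914.29 and the counts 46, 21, 12, 9, 5, 3, 2, 1, 1.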
Example (cont’d)

• We wish to test the hypothesis that these data come from an exponential distribution:
  $$H_0: \text{the data are exponential} \qquad H_A: \text{the data are not exponential}.$$
• We group the data into cells of length 500, count the observed frequency of each cell, and compute the expected frequency of each cell.
• Since $\bar x = 914.29$, and the maximum likelihood estimator of the mean of an exponential distribution is the sample mean, we compute expected frequencies using an exponential distribution with mean 914.29.
• For example, the expected frequency for the first cell is
  $$100\int_0^{500} \frac{1}{914.29}\, e^{-x/914.29}\,dx = 100\left(1 - e^{-500/914.29}\right) = 42.12.$$

Example (cont’d)

• We then have the following table:

    Interval        Observed frequency   Expected frequency
    [0, 500]         46                   42.12
    (500, 1000]      21                   24.38
    (1000, 1500]     12                   14.11
    (1500, 2000]      9                    8.17
    (2000, 2500]      5                    4.73
    (2500, 3000]      3                    2.74
    (3000, 3500]      2                    1.58
    (3500, 4000]      1                    0.92
    (4000, ∞)         1                    1.26
    Total           100                  100.01

Example (cont’d)

• We group the final four cells together because of their low expected frequencies:

    Interval        Observed frequency   Expected frequency
    [0, 500]         46                   42.12
    (500, 1000]      21                   24.38
    (1000, 1500]     12                   14.11
    (1500, 2000]      9                    8.17
    (2000, 2500]      5                    4.73
    (2500, ∞)         7                    6.50
    Total           100                  100.01

Example (cont’d)

• Our test statistics are $q = 1.279737$ and $-2\log(\Lambda) = 1.285602$.
• After pooling there are $k = 6$ cells, and one parameter (the mean) was estimated, so the degrees of freedom are $k - r - 1 = 6 - 1 - 1 = 4$. If $U$ is $\chi^2(4)$, our p-values are $P(U \ge 1.279737) = 0.864804$ and $P(U \ge 1.285602) = 0.8638135$.
• Hence we have no evidence to reject $H_0$, and the exponential distribution appears to provide a good description of the observed data.
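The expected frequencies, the pooling of the last four cells, and the test itself can be sketched in Python as follows. This is a minimal sketch, assuming the observed counts from the first table and $\bar x = 914.29$; it uses the fact that the $\chi^2(4)$ upper tail has the closed form $P(U \ge x) = e^{-x/2}(1 + x/2)$, and `surv` and `chi2_sf_df4` are hypothetical helper names.

```python
import math

n = 100
xbar = 914.29                                  # sample mean, the MLE of the exponential mean
observed = [46, 21, 12, 9, 5, 3, 2, 1, 1]      # cells [0,500], ..., (3500,4000], (4000,inf)
edges = [500.0 * j for j in range(9)] + [math.inf]


def surv(t):
    """P(X > t) for an exponential distribution with mean xbar; exp(-inf) = 0.0."""
    return math.exp(-t / xbar)


# Expected frequency of the cell (a, b] is n * (P(X > a) - P(X > b)).
expected = [n * (surv(a) - surv(b)) for a, b in zip(edges[:-1], edges[1:])]

# Pool the last four cells into (2500, inf), as in the second table;
# the (2000, 2500] cell is kept separate, as it is there.
obs = observed[:5] + [sum(observed[5:])]
exp_ = expected[:5] + [sum(expected[5:])]

q = sum((o - e) ** 2 / e for o, e in zip(obs, exp_))
g = 2.0 * sum(o * math.log(o / e) for o, e in zip(obs, exp_))  # -2 log(Lambda)
df = len(obs) - 1 - 1                          # k - r - 1 = 6 - 1 - 1


def chi2_sf_df4(x):
    """P(U >= x) for U ~ chi-square(4)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


print("expected:", [round(e, 2) for e in expected])
print(f"q = {q:.4f}, -2 log(Lambda) = {g:.4f}, df = {df}")
print(f"p-values: {chi2_sf_df4(q):.4f}, {chi2_sf_df4(g):.4f}")
```

Because the sketch keeps the expected frequencies unrounded, it gives approximately $q \approx 1.281$ and $-2\log(\Lambda) \approx 1.307$ rather than 1.279737 and 1.285602. The likelihood-ratio statistic is the more sensitive of the two, since the rounded expected frequencies sum to 100.01 rather than 100. The p-values stay near 0.86 either way, so the conclusion is unchanged.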