Retrospective Theses and Dissertations, 1985

Goodness-of-fit statistics for location-scale distributions
Fah Fatt Gan
Iowa State University

Part of the Statistics and Probability Commons

Recommended Citation
Gan, Fah Fatt, "Goodness-of-fit statistics for location-scale distributions" (1985). Retrospective Theses and Dissertations. Paper 12065.

This Dissertation is brought to you for free and open access by Digital Repository @ Iowa State University. For more information, please contact digirep@iastate.edu.
Goodness-of-fit statistics for location-scale distributions

by

Fah Fatt Gan

A Dissertation Submitted to the Graduate Faculty in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY

Major: Statistics

Iowa State University
Ames, Iowa
1985

Copyright (C) Fah Fatt Gan, 1985. All rights reserved.

TABLE OF CONTENTS

I. INTRODUCTION
II. GOODNESS-OF-FIT STATISTICS
    A. Correlation Statistics
    B. Chi-square and Likelihood Ratio Statistics
    C. Statistics Based on the Empirical Distribution Function
    D. Statistics Based on Moments
III. PROBABILITY PLOTS AND DISTRIBUTION CURVES
IV. EMPIRICAL POWER COMPARISON
    A. Methods of Computation
    B. Results of the Power Comparison
V. PERCENTILES OF THE $r^2$ AND $k^2$ STATISTICS
VI. SUMMARY AND RECOMMENDATIONS
VII. REFERENCES
VIII. ACKNOWLEDGEMENTS
IX. APPENDIX A. PARAMETRIC FAMILIES OF DISTRIBUTIONS
X. APPENDIX B. RANDOM VARIATES GENERATORS
XI. APPENDIX C. COMPUTER PROGRAMS

I. INTRODUCTION

The function of statistics is to extract and explicate the informational content of a set of data. Fitting a probability model to a data set is often a useful step in this endeavor. For example, in survival time analysis, a probability model enables one to make statements about the probabilities that individuals will survive specified time intervals. Many statistical procedures are based on certain probability model assumptions. Assessing the fit of a proposed model to a data set is a necessary preliminary measure, leading perhaps to a transformation of the data or to an alternative statistical procedure.
The importance of assessing the fit of a probability model was highlighted by the inclusion of Karl Pearson's development of the chi-square test in the list of the twenty most significant discoveries of the current century presented by SCIENCE 84 (1984). SCIENCE 84 described the chi-square test as "a tiny event by itself but it was the signal for a sweeping transformation in the ways we interpret our numerical world" and added that "Karl Pearson's chi-square test measured the fit between theory and reality, ushering in a new sort of decision making".

The traditional limiting chi-square theory for the distribution of the Pearson chi-square statistic keeps the number of cells fixed as the sample size is increased. A non-traditional large sample theory for the chi-square statistic is examined in this dissertation, where the number of cells is allowed to increase at a certain rate as the sample size increases. In the latter case, the asymptotic distribution of the chi-square statistic may not be a chi-square distribution. In fact, for the case of testing simple null hypotheses, Holst (1972) and Morris (1975) showed that under certain regularity conditions the goodness-of-fit statistic has a large sample normal distribution. The accuracy of the large sample normal and chi-square approximations for the chi-square and likelihood ratio statistics was investigated by Koehler and Larntz (1980).

In this dissertation, attention is focused on tests of composite null hypotheses. For example, one might be interested in determining if the observed data were sampled from a normal distribution. An assessment could be made by partitioning the real line into a certain number of cells and comparing observed counts in the cells to estimates of expected counts from the hypothesized probability model. Attention is restricted to the case where the cells are of equal probability, since Holst (1972) showed that this partitioning has a certain optimum power property.
Also, Gumbel (1943) pointed out that different conclusions can be reached by using cells with unequal probabilities. In testing composite null hypotheses, the unknown parameters must be estimated and then used to approximate a partitioning with equal probabilities. The asymptotic theory of the chi-square or the likelihood ratio statistic under this non-traditional setup for testing composite null hypotheses has not been established in the literature, but some results are given in the next chapter. The question of how finely the interval should be partitioned for various sample sizes will also be investigated.

There are many other methods available for assessing the goodness of fit of probability models to a data set. One interesting test statistic is the Pearson correlation coefficient of points on a normal Q-Q (quantile versus quantile) probability plot. This statistic provides a measure of the linearity of a normal Q-Q probability plot. If the normal probability model provides a good fit to the data set, an approximately straight line is obtained, the correlation coefficient will be close to one, and the normal probability model will not be rejected. On the other hand, one will reject the null hypothesis of normality for a small value of the correlation coefficient because this indicates non-linearity of the probability plot.

The Q-Q probability plot is very popular among statisticians and engineers. It is one of the important tools in statistical quality control. The popularity of the Q-Q probability plot can be largely attributed to the linear invariance property it possesses. In general, the P-P (percent versus percent) probability plot is not linearly invariant. However, if the observations are standardized, the P-P probability plot can be shown to possess the linear invariance property.
A new statistic based on the Pearson correlation coefficient of points on a P-P probability plot is proposed for assessing the goodness of fit of probability models to a data set. The logic behind this statistic is similar to that of the Shapiro-Wilk statistic. It is a measure of linearity of the probability plot which provides a measure of the goodness of fit of the normal probability model to the data set. Since the Q-Q probability plot places more emphasis on the tails of the distribution than the P-P probability plot, one would expect that a correlation coefficient based on a Q-Q probability plot would be more likely to detect long or heavy tailed departures from the hypothesized distribution. The Pearson correlation coefficient based on a P-P probability plot may be more sensitive to discrepancies near the center of the hypothesized distribution.

A new qualitative method based on distribution curves on a P-P probability plot is developed for assessing the alternatives to the hypothesized probability model for a data set. One advantage of this technique is that it is not limited to location-scale distributions.

The relative power of these goodness-of-fit statistics to detect various alternative distributions is of major interest. An extensive Monte Carlo power comparison was performed to assess the power of the chi-square and likelihood ratio statistics, correlation coefficient statistics, statistics based on the empirical distribution function, and statistics based on sample moments. The power comparison also provides some information about how finely to partition the support of the hypothesized distribution for the chi-square and likelihood ratio statistics so as to achieve nearly optimum power. The extensive power comparison is performed for the normal, Gumbel and exponential distributions. The exponential and Weibull distributions are used frequently in modeling the survival time or reliability of certain individuals or components.
An interesting relationship between the Gumbel and the Weibull distributions is that a logarithmic transformation of the Weibull random variable produces a Gumbel random variable.

The distributions of the statistics based on the Pearson correlation coefficients are mathematically intractable for finite samples. Consequently, the empirical percentiles of these statistics are simulated and smoothed. Curves are fitted through smoothed Monte Carlo percentiles to obtain formulas approximating the true percentiles as functions of the sample size.

Based on the results of these extensive power comparisons, some recommendations will be made concerning the use of these statistics for assessing the fit of probability models to a data set.

II. GOODNESS-OF-FIT STATISTICS

A. Correlation Statistics

Tests of fit based on correlation coefficients are reviewed in this section. Particular attention has been given to the empirical methods of generating the percentiles of the statistics. This provides information for deciding how to generate the empirical percentiles of the statistics based on the Pearson correlation coefficient for points on a P-P or a Q-Q probability plot.

Shapiro and Wilk (1965) devised a statistic to test the linearity of the probability plot of the ordered observations against the expected values of the order statistics from a standardized version of the hypothesized distribution. This statistic compares two estimates of the variance of a normal distribution. One is the square of the generalized least-squares estimate of the slope and the other is based on the second sample moment about the sample mean. Let $X = (X_1, \ldots, X_n)'$ be an ordered vector of n observations. Let $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n)'$ and $\Omega$ be the mean vector and the covariance matrix, respectively, of the order statistics of a random sample of n observations from a standard normal distribution.
The Shapiro-Wilk statistic can be written as
\[ W = \frac{(\boldsymbol{\alpha}'\Omega^{-1}X)^2}{(\boldsymbol{\alpha}'\Omega^{-1}\Omega^{-1}\boldsymbol{\alpha}) \sum (X_i - \bar{X})^2} , \tag{2.1} \]
where $\bar{X} = (\sum X_i)/n$, and $\boldsymbol{\alpha}$ and $\Omega$ are the mean vector and the covariance matrix of the standard normal order statistics. Note that the numerator, $(\boldsymbol{\alpha}'\Omega^{-1}X)^2/(\boldsymbol{\alpha}'\Omega^{-1}\Omega^{-1}\boldsymbol{\alpha})$, is proportional to the square of the best linear unbiased estimator of the scale parameter $\beta$ of the normal distribution with density function
\[ f(x) = \frac{1}{\sqrt{2\pi\beta^2}} \exp\left\{ -\frac{(x-\alpha)^2}{2\beta^2} \right\} , \quad -\infty < \alpha < \infty , \ \beta > 0 , \ -\infty < x < \infty . \tag{2.2} \]
The Shapiro-Wilk statistic can also be written as
\[ W = \frac{(\sum c_i X_i)^2}{\sum c_i^2 \cdot \sum (X_i - \bar{X})^2} , \tag{2.3} \]
where $(c_1, c_2, \ldots, c_n) = \boldsymbol{\alpha}'\Omega^{-1}$. Shapiro and Wilk (1965) also presented this statistic in the form
\[ W = \frac{\left[ \sum_{i=1}^{h} c_{n-i+1} (X_{n-i+1} - X_i) \right]^2}{\sum_{i=1}^{n} c_i^2 \cdot \sum_{i=1}^{n} (X_i - \bar{X})^2} , \tag{2.4} \]
noting that $c_i = -c_{n-i+1}$, and h denotes n/2 or (n-1)/2 according to whether n is even or odd.

The Shapiro-Wilk statistic is location-scale invariant and statistically independent of $\bar{X}$ and S, the maximum likelihood estimates of the mean and the standard deviation of a normal distribution, respectively. The distribution of the Shapiro-Wilk statistic depends on the sample size and the hypothesized distribution only. The exact distribution of the Shapiro-Wilk statistic has not been derived for finite sample sizes except for sample sizes of 3 or 4, for which explicit results have been derived by Shapiro and Wilk (1965) and Shapiro (1954).

The percentiles of the Shapiro-Wilk statistic for sample sizes n=3(1)50 [that is, 3 to 50 with increment 1] were obtained by Shapiro and Wilk (1965) using a Monte Carlo method. Five thousand statistics were computed for n=3(1)20 and 100000/n statistics were computed for n=21(1)50. The justification of the choice of the number of statistics in the Monte Carlo study was provided by comparing the theoretical one-half moment ($E\sqrt{W}$) and the first moment of the sample with the corresponding empirical moments of W for n=3(1)20. The Johnson bounded system of curves (Johnson, 1949) was used for smoothing the empirical distribution of W. Normal random variates were obtained from the Rand Tables (Rand Corporation, 1955).
Shapiro and Wilk (1965) provided the necessary constants and a table of lower and upper 0.01, 0.02, 0.05 and 0.10 percentiles and the median of W for n=3(1)50 under the null hypothesis of normality. These constants and the table of lower percentiles can also be found in Shapiro and Brain (1982). Small values of W indicate significant departure from normality.

Extensive Monte Carlo experiments performed by Shapiro, Wilk and Chen (1968) and by Pearson, D'Agostino and Bowman (1977) showed that the Shapiro-Wilk test of normality has good power against a wide range of alternative distributions. Since the development of the Shapiro-Wilk statistic, almost every new goodness-of-fit statistic proposed in the literature has been compared to the Shapiro-Wilk statistic in empirical power studies. The Shapiro-Wilk statistic has become a standard statistic for testing normality. Any statistic that is almost as powerful as the Shapiro-Wilk statistic for a wide range of alternative distributions is considered to be an excellent statistic.

In an attempt to summarize the appearance of nonlinearity in normal probability plots, LaBrecque (1977) developed three modifications of the Shapiro-Wilk statistic which assess the amount of certain types of curvature in the normal Q-Q probability plot.

The Shapiro-Wilk statistic for the exponential distribution was presented by Shapiro and Wilk (1972) as
\[ W = \frac{\left[ \mathbf{1}'\Omega^{-1} (\mathbf{1}\boldsymbol{\alpha}' - \boldsymbol{\alpha}\mathbf{1}') \Omega^{-1} X \right]^2}{\left[ \mathbf{1}'\Omega^{-1}\mathbf{1} \, \boldsymbol{\alpha}'\Omega^{-1}\boldsymbol{\alpha} - (\mathbf{1}'\Omega^{-1}\boldsymbol{\alpha})^2 \right] \sum (X_i - \bar{X})^2} , \tag{2.5} \]
where $\mathbf{1}$ denotes a vector of ones. The above formula can be written neatly as
\[ W = \frac{n (\bar{X} - X_1)^2}{(n-1) \sum (X_i - \bar{X})^2} , \tag{2.6} \]
where $X_1$ is the smallest observation. Note that the numerator, $n(\bar{X} - X_1)^2/(n-1)$, is proportional to the square of the best linear unbiased estimator of the scale parameter $\beta$ of the exponential distribution with density function
\[ f(x) = \frac{1}{\beta} \exp\left\{ -\frac{x - \alpha}{\beta} \right\} , \quad -\infty < \alpha < \infty , \ \beta > 0 , \ x > \alpha . \tag{2.7} \]
The empirical null distribution of the Shapiro-Wilk statistic for the exponential distribution was obtained by Monte Carlo sampling.
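As an illustration of equation (2.6) and of the Monte Carlo approach just described, the following is a minimal Python sketch, not the author's program. The function names `shapiro_wilk_exponential` and `null_percentiles`, the use of Python's `random` module as the variate generator, and the crude order-statistic percentile rule (with no smoothing) are all assumptions made for this example.

```python
import random


def shapiro_wilk_exponential(sample):
    """W of equation (2.6): n*(mean - min)^2 / ((n - 1) * sum((x - mean)^2))."""
    n = len(sample)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    if ss == 0.0:
        raise ValueError("all observations are equal")
    return n * (mean - min(sample)) ** 2 / ((n - 1) * ss)


def null_percentiles(n, levels, replications=5000, seed=0):
    """Crude, unsmoothed empirical percentiles of W under exponentiality.

    Hypothetical helper: W is location-scale invariant, so standard
    exponential variates suffice for simulating the null distribution.
    """
    rng = random.Random(seed)
    stats = sorted(
        shapiro_wilk_exponential([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    )
    # Take the order statistic at position floor(level * replications).
    return {p: stats[min(replications - 1, int(p * replications))] for p in levels}
```

For example, `shapiro_wilk_exponential([1.0, 2.0, 3.0])` evaluates to 0.75. Because the test is two-tailed, both a lower and an upper percentile from `null_percentiles` would be needed to form a rejection region.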
Five thousand samples of W were generated for sample sizes n = 3(1)50 and [250000/n] samples were used for n = 51(1)100. The empirical percentiles were plotted against the sample size n and smoothed by hand to obtain a table of approximate percentiles for the Shapiro-Wilk statistic. A table of smoothed percentiles for significance levels 0.005, 0.01, 0.025, 0.10, 0.5, 0.9, 0.95, 0.975, 0.99 and 0.995 and sample sizes n = 3(1)100 can be found in Shapiro and Wilk (1972) or Shapiro and Brain (1982). Since the Shapiro-Wilk statistic for the exponential distribution responds to nonexponentiality by shifting either to smaller or to larger values, Shapiro and Wilk suggested that the test be two-tailed.

Shapiro and Francia (1972) modified W so that it can be used for large samples where the covariance matrix $\Omega$ is unknown. They argue that for large samples, the ordered observations may be treated as if they are independent and hence the variance-covariance matrix can be replaced by the identity matrix I. The Shapiro-Francia statistic is given by
\[ W' = \frac{(\boldsymbol{\alpha}'X)^2 / (\boldsymbol{\alpha}'\boldsymbol{\alpha})}{\sum (X_i - \bar{X})^2} , \tag{2.8} \]
or
\[ W' = \frac{\left[ \sum (\alpha_i - \bar{\alpha})(X_i - \bar{X}) \right]^2}{\sum (\alpha_i - \bar{\alpha})^2 \cdot \sum (X_i - \bar{X})^2} , \tag{2.9} \]
since $\bar{\alpha} = 0$ for the normal distribution. Noting that $\alpha_i = -\alpha_{n-i+1}$, the Shapiro-Francia statistic can also be written as
\[ W' = \frac{\left[ \sum_{i=1}^{h} \alpha_{n-i+1} (X_{n-i+1} - X_i) \right]^2}{\sum_{i=1}^{n} \alpha_i^2 \cdot \sum_{i=1}^{n} (X_i - \bar{X})^2} , \tag{2.10} \]
where h denotes n/2 or (n-1)/2 according to whether n is even or odd.

Percentiles of the Shapiro-Francia statistic for sample sizes n = 50(1)99 can be found in Shapiro (1980) or Shapiro and Brain (1982). The percentiles of the distributions of W and W' were found to be very similar by Weisberg (1974). Weisberg pointed out that using the tabulated percentiles of one statistic for the distribution of the other will result in only a small loss of accuracy, often giving a slightly conservative test. The Shapiro-Wilk and the Shapiro-Francia statistics were shown by Sarkadi (1975) to have the same asymptotic distribution under the null hypothesis of normality.
Sarkadi also showed that both the Shapiro-Wilk and Shapiro-Francia statistics provide consistent tests of fit. In fact, the consistency of the test statistics holds if any distribution with a finite variance replaces the normal distribution as the null hypothesis.

Weisberg and Bingham (1975) modified the Shapiro-Francia statistic slightly by replacing the expected values of the order statistics by a simple approximation $m = (m_1, \ldots, m_n)'$. This modified statistic is given by
\[ W^* = \frac{(m'X)^2 / (m'm)}{\sum (X_i - \bar{X})^2} , \tag{2.11} \]
or
\[ W^* = \frac{\left[ \sum (m_i - \bar{m})(X_i - \bar{X}) \right]^2}{\sum (m_i - \bar{m})^2 \cdot \sum (X_i - \bar{X})^2} , \tag{2.12} \]
where $m_i = \Phi^{-1}[(i - 0.375)/(n + 0.25)]$ and $\Phi^{-1}(\cdot)$ is the inverse standard normal distribution function. Note that (i-0.375)/(n+0.25) is the plotting position suggested by Blom (1958). They showed empirically that the distribution functions of $W^*$ and W' are essentially identical. The advantage of the statistic $W^*$ over W and W' is that no storage of constants is required if a routine for the inverse of the standard normal distribution function is available, as is common on most computer systems. An algorithm for computing the inverse standard normal distribution function $\Phi^{-1}(\cdot)$ is Algorithm AS111 developed by Beasley and Springer (1977).

Royston (1982a) obtained an approximate normalizing transformation for the Shapiro-Wilk statistic using extensive Monte Carlo simulations for sample sizes n = 7 to 2000. The exact covariance matrix was not used in computing the Shapiro-Wilk statistic. Instead, an approximation due to Shapiro and Wilk (1965) was used. Six thousand Shapiro-Wilk statistics for each sample size n = 7(1)30(5)100, 125, 150, 150, 200(100)600, 750, 1000, 1250 and 2000 were simulated. Royston (1982b) wrote two FORTRAN programs that compute the expected values of the normal order statistics in exact or approximate form. The approximate expected values of the normal order statistics are based on a formula suggested by Blom (1958, pp. 69-71).
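The point that the Weisberg-Bingham statistic needs no stored constants can be seen in a short Python sketch of equation (2.12). This is an illustrative sketch only: the function names are hypothetical, and the standard library routine `statistics.NormalDist().inv_cdf` is used in place of Algorithm AS111 as the inverse normal distribution function.

```python
from statistics import NormalDist


def blom_scores(n):
    """Approximate normal scores m_i = Phi^{-1}[(i - 0.375)/(n + 0.25)]."""
    inv = NormalDist().inv_cdf
    return [inv((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def weisberg_bingham(sample):
    """W* of equation (2.12): squared correlation between the ordered
    observations and the Blom scores."""
    x = sorted(sample)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    m = blom_scores(n)
    xbar = sum(x) / n
    mbar = sum(m) / n  # essentially zero, by symmetry of the scores
    sxm = sum((xi - xbar) * (mi - mbar) for xi, mi in zip(x, m))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    smm = sum((mi - mbar) ** 2 for mi in m)
    if sxx == 0.0:
        raise ValueError("all observations are equal")
    return sxm * sxm / (sxx * smm)
```

Small values of the returned statistic point away from normality. Percentiles for an actual test would still have to come from published tables or from simulation; none are built into this sketch.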
An approximation for the coefficients used in the Shapiro-Wilk statistic which does not require $\Omega$ to be known can thus be computed. The practical significance of these two papers is that the W test of normality can now be programmed on a computer for sample sizes up to 2000 without storing tables of percentiles and coefficients.

The probability plot correlation coefficient test r was presented by Filliben (1975). It is essentially the Pearson correlation coefficient between the ordered observations and the medians of the corresponding order statistics. The Filliben statistic measures the linearity of a normal Q-Q probability plot. It rejects the null hypothesis of normality for small values of r since small values indicate non-linearity of the normal Q-Q probability plot. The formula for r is
\[ r = \frac{\sum (X_i - \bar{X})(M_i - \bar{M})}{\sqrt{\sum (X_i - \bar{X})^2 \cdot \sum (M_i - \bar{M})^2}} , \tag{2.13} \]
where $M_i$ is the median of the $i$th order statistic from the standard normal distribution. The median of the $i$th order statistic from a standard normal distribution is exactly related to the median $p_i$ of the $i$th order statistic from a uniform distribution on [0,1] by $M_i = \Phi^{-1}(p_i)$, where $\Phi^{-1}(\cdot)$ is the inverse normal distribution function. The approximate median of the $i$th order statistic from the uniform [0,1] distribution is given by
\[ p_i = \begin{cases} 1 - p_n , & i = 1 \\ (i - 0.3175)/(n + 0.365) , & i = 2, 3, \ldots, n-1 \\ 0.5^{1/n} , & i = n . \end{cases} \tag{2.14} \]
The table of smoothed Monte Carlo percentiles for the 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.975, 0.99 and 0.995 levels can be found in Filliben (1975). This table was based on extensive Monte Carlo simulations for sample sizes n=3(1)50(5)100. For $n \le 10$, 100000 samples were generated, and 100000/n samples were generated for n > 10. Normal random variates from the Rand Tables (Rand
Corporation, 1955) were used. Checks on the accuracy of the empirical percentiles were provided by comparing the empirical mean and standard deviation with the theoretical mean and standard deviation of r for each sample size, and by comparing empirical and theoretical percentiles of r for n = 3. The Monte Carlo power comparison presented by Filliben (1975) indicated that the Filliben statistic and the Shapiro-Francia statistic have similar power properties.

Looney and Gulledge (1985) investigated the power of various versions of the Shapiro-Francia statistic corresponding to different approximations of the mean vector $\boldsymbol{\alpha}$ of the order statistics from the normal distribution. They concluded that the Shapiro-Francia statistic using Blom's formula, as suggested by Weisberg and Bingham (1975), has slightly better power than the other versions of the Shapiro-Francia statistic. Filliben called r the probability plot correlation coefficient, but the plotting positions suggested by Filliben are seldom used by practitioners. Motivated by the better power of the Weisberg-Bingham statistic (the Shapiro-Francia statistic using Blom's formula) and the increasing acceptance of Blom's plotting positions, Looney and Gulledge (1985) generated a table of percentiles for n=3(1)50(5)100 for this statistic at the 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.975, 0.99 and 0.995 levels. Twelve sets of 10000 samples were generated for each of the sample sizes n=3(1)50(5)100. The percentiles were smoothed by taking the average over all 12 sets for a particular sample size. The uniform random numbers were generated by the algorithm developed by Wichmann and Hill
Looney and Gulledge duplicated some of the work by Weisberg and Bingham (1975) since Weisberg and Bingham showed * the distributions of W and the Shapiro-Francia statistics are nearly the same and the table of percentiles for the Shapiro-Francia statistic has already been tabulated. In much the same spirit as the Shapiro-Wilk and the Shapiro-Francia statistics, which attempt to measure the linearity of a normal Q-Q probability plot, a statistic based on the Pearson correlation coefficient of points on a P-P probability plot is proposed. This statistic is given as k^ = where [I(z. - z)(p - p)]= —: :— I(z. - z)='I(p. - p)= , (2.15) = F[(x^^j - a)/B ], the p^'s are plotting positions, and a and g are the location and scale parameters of the distribution function F('). The estimators a and g are taken to be the maximum likelihood estimators unless otherwise specified. The maximum likelihood estimators for the normal distribution are given by n ot = ilx. )/n , i=r (2.16) n B = [%(x i=1 (2.17) and - a):]/n . 16 The unbiased estimator for g, ng/(n-1) is used instead of g, and shall be denoted by g from here on. The maximum likelihood estimators for the exponential distribution are given by a = min x^, (2,18) and n g = (lx.)/n - min x. i=l . (2.19) The distribution function of the Gumbel distribution is given as F(x) = exp[-exp[(x-a)/g] , (2.20) and the maximum likelihood estimators a and g can be obtained by solving the maximum likelihood equations: n I exp[-(x. - a)/g] = n , i=1 (2.21) n I (x. - a){1 - exp[-(x i=1 (2.22) - a)/g]} = ng . Combining the two likelihood equations, an equation involving only g can be written as g = (Ix^)/n - [)]x^exp(-x^/g)]Clexp(-x^/g)] ^ . (2.23) The maximum likelihood estimator g can thus be computed easily using the bisection method or Newton's method. 17 A question then exists as to which plotting position to use. The plotting position p^ = i/(n+1) is chosen because of the theoretical property: E{F[(X^i) - ct)/6]} = i/(n+1) . 
Obviously, one can perform a Monte Carlo power study to investigate whether there is any difference between various plotting positions. The results of Looney and Gulledge (1985) indicated that differences are small and appear only for small sample sizes. Barnett (1975) showed that the choice of plotting positions can make a difference when the object is precise estimation of $\alpha$ and $\beta$. For most practical purposes, it does not matter which plotting position is used.

It is of interest to see how this new statistic performs relative to its counterpart, $r^2$, based on the Q-Q probability plot. Thus, the $r^2$ statistic using the plotting position $p_i = i/(n+1)$ is also studied. The formula for the $r^2$ statistic is
\[ r^2 = \frac{\left[ \sum (X_i - \bar{X})(M_i - \bar{M}) \right]^2}{\sum (X_i - \bar{X})^2 \cdot \sum (M_i - \bar{M})^2} , \tag{2.25} \]
where $M_i = F^{-1}[i/(n+1)]$ and F is the distribution function of the standard normal, Gumbel or exponential distribution in this study. A similar test for normality presented in Johnson and Wichern (1982, pp. 155-156) is a Shapiro-Francia test based on the plotting position $p_i = (i - 0.5)/n$.

B. Chi-square and Likelihood Ratio Statistics

The chi-square statistic was first introduced by Karl Pearson (1900). The simplicity of the chi-square statistic and the intuitively sound logic behind it have made the chi-square statistic one of the most widely used tools in statistics. Since its introduction in 1900, the chi-square statistic has generated a tremendous amount of interest in the problem of assessing the fit of probability models to sets of observed data. The fascinating idea of measuring the "goodness of fit" of a distribution to a data set using the squares of the differences between observed and expected counts was a great catalyst to the development of many statistical concepts including tests of hypotheses.
The Pearson chi-square statistic can be written as
\[ X^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i , \tag{2.26} \]
and the likelihood ratio statistic is given by
\[ G^2 = 2 \sum_{i=1}^{k} O_i \log(O_i / E_i) , \tag{2.27} \]
where $O_i$ and $E_i$ are the observed and expected cell counts, respectively, and k is the number of cells.

A great deal of research has been done on the chi-square statistic. Extensive lists of references can be found in Cochran (1952), Lancaster (1969) and Hutchinson (1979). This can be attributed to the flexibility in the use of the chi-square statistic. There are many issues involved in the use of the Pearson chi-square statistic to test the goodness of fit of a distribution to a data set. Some of these issues are listed below.
[1] How many cells or intervals should be formed?
[2] How should the cells be formed?
[3] Should the cells be random or predetermined?
[4] What are the consequences of using different methods to estimate the expected cell frequencies?
[5] What are the consequences of using different methods of estimation for unknown parameters?
[6] Is the chi-square test unbiased?
[7] How is power affected by small cell frequencies?
[8] How does the discreteness of the chi-square statistic affect the chi-square approximation?

The Pearson chi-square statistic was first proposed for testing the goodness of fit of a known distribution with a set of predetermined or fixed cells. This was later extended to the more practical case of a composite null hypothesis, under which the data were sampled from some member of a parametric family $F(x;\theta)$ of distributions. The fundamental theorem of the Pearson chi-square testing procedure, that the Pearson chi-square statistic is asymptotically distributed as a chi-square random variable with degrees of freedom equal to the number of cells less one less the number of parameters estimated, was established by Fisher (1924). A more rigorous proof with a set of regularity conditions was given by Cramér (1946, pp. 477-479).
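Equations (2.26) and (2.27) translate directly into code. The Python sketch below is illustrative only; the function names are hypothetical, and the convention that a cell with zero observed count contributes nothing to $G^2$ (the limit of $O \log(O/E)$ as $O \to 0$) is an assumption stated here rather than taken from the text.

```python
import math


def _check(observed, expected):
    """Basic validation shared by both statistics."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")


def pearson_chi_square(observed, expected):
    """X^2 of equation (2.26)."""
    _check(observed, expected)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio(observed, expected):
    """G^2 of equation (2.27); empty cells are taken to contribute zero."""
    _check(observed, expected)
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )
```

With observed counts (10, 20, 30) and expected counts (20, 20, 20), `pearson_chi_square` gives 10.0. The two statistics are close when every observed count is near its expectation and can differ noticeably when cells are sparse, which is one reason the number of cells matters.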
It was observed by Fisher (1924) that if the estimators for the unknown parameters did not have the same efficiencies as the maximum likelihood estimators based on the observed cell counts, then the chi-square statistic would not have a limiting chi-square distribution. Chernoff and Lehmann (1954) gave a precise solution to this problem. They considered the case where the cells were predetermined and maximum likelihood estimators based on the original (or ungrouped) data were used, and they showed that the Pearson chi-square statistic is asymptotically distributed as a linear combination of chi-square random variables:
\[ \chi^2_{k-q-1} + \lambda_1 Z_1^2 + \lambda_2 Z_2^2 + \cdots + \lambda_q Z_q^2 , \tag{2.28} \]
where the $Z_i$'s are independent normal random variables, the coefficients $\lambda_i$ are constrained by $0 \le \lambda_i \le 1$ and may depend on the q unknown parameters, and k is the number of cells. This shows explicitly that the chi-square statistic is stochastically larger than a chi-square random variable with k-q-1 degrees of freedom. The practical significance of Chernoff and Lehmann's result is limited since the asymptotic distribution of the Pearson chi-square statistic depends on the unknown parameters.

In an attempt to more closely model the procedure followed by researchers, Roy (1956) and Watson (1957, 1958 and 1959) investigated the case where the cell boundaries are determined from the maximum likelihood estimators of the unknown parameters, based on the original data. The number of classes and the desired cell probabilities are predetermined. The cell boundaries vary with the composition of the
They also showed that if the cells are chosen in a proper manner and if $F(x;\theta)$ is a location-scale parametric family of distributions, then the asymptotic distribution of the chi-square statistic does not depend on the unknown parameters.

Some notation will be introduced to facilitate discussion. This notation will be applied throughout the dissertation unless otherwise noted. Let $X$ be a continuous univariate random variable with a distribution function $F(x;\theta)$, where $\theta$ is a column vector with $q$ components, that is

$$\theta' = (\theta_1, \theta_2, \ldots, \theta_q). \tag{2.29}$$

Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from this distribution. The Pearson chi-square statistic can be written as

$$X^2 = \sum_{i=1}^{k} \frac{(n_i - n\hat{p}_i)^2}{n\hat{p}_i}, \tag{2.30}$$

where there are $k$ cells, $p_i$ is the true probability that the random variable $X$ will fall in the $i$th cell, $n_i$ is the observed frequency or count in the $i$th cell and $\hat{p}_i$ denotes an estimator of the true probability $p_i$.

The following theorem due to Roy (1956) will be stated without proof. This theorem is of considerable practical importance for applying the chi-square test of goodness of fit when the cell boundaries are constructed using the maximum likelihood estimator of the true parameters based on the original or ungrouped data.

Theorem 2.1 (Roy, 1956)

(i) Let $f(x;\theta)$ be the density function of $X$, where the parameter $\theta$ is a column vector with $q$ components, that is

$$\theta' = (\theta_1, \theta_2, \ldots, \theta_q), \tag{2.31}$$

and assume that $f(x;\theta)$ is continuous in $x$ and differentiable in $\theta$.

(ii) Let the $q \times 1$ vector $\hat{\theta}$ be an estimator of $\theta$ based on the original data with the property that there exist functions $g_i(x)$ $(i = 1, 2, \ldots, q)$, which may depend on $\theta$, such that

$$\hat{\theta} - \theta = n^{-1} \sum_{i=1}^{n} g(X_i) + \epsilon, \tag{2.32}$$

where $g(\cdot)$ and $\epsilon$ are $q \times 1$ vectors, $E\{g(X_i)\} = 0$, $\mathrm{var}\{g(X_i)\}$ is finite and $\sqrt{n}\,\epsilon \to 0$ in probability.
(iii) Let the range of $X$, namely $(-\infty, \infty)$, be partitioned into $k$, $k > q$, mutually exclusive cells $C_i(\theta)$ $(i = 1, 2, \ldots, k)$ depending on $\theta$ such that $C_i(\theta)$ is the half-open interval

$$w_{i-1}(\theta) < X \le w_i(\theta), \tag{2.33}$$

where $w_i$ is a function of $\theta$ with continuous partial derivatives.

(iv) Let $n_i$ = number of $X_j$'s falling in $C_i(\hat{\theta})$, and

$$p_i(\theta) = F(w_i(\theta);\theta) - F(w_{i-1}(\theta);\theta), \tag{2.34}$$

where $F(\cdot;\theta)$ is the cumulative distribution function of the random variable $X$.

Under the above regularity conditions, the asymptotic distribution of the Pearson chi-square statistic is that of

$$\lambda_1 Z_1^2 + \lambda_2 Z_2^2 + \cdots + \lambda_k Z_k^2, \tag{2.35}$$

where $Z_1, Z_2, \ldots, Z_k$ are mutually independent standard normal random variates. The coefficients $\lambda_i$ are the characteristic roots associated with the matrix $J$ described as follows:

$$J = D_p - pp' - UW' - WU' + UGU', \tag{2.36}$$

where

$$D_p = \mathrm{diag}(p_1, p_2, \ldots, p_k), \tag{2.37}$$

$$p' = (p_1, p_2, \ldots, p_k), \quad U' = (U_1, U_2, \ldots, U_k), \quad W' = (W_1, W_2, \ldots, W_k),$$

$$U_i = \int_{w_{i-1}(\theta)}^{w_i(\theta)} \frac{\partial f(x;\theta)}{\partial \theta}\, dx, \qquad W_i = \int_{w_{i-1}(\theta)}^{w_i(\theta)} g(x) f(x;\theta)\, dx, \tag{2.38}$$

and $G = E\{g(X)g(X)'\}$ is the covariance matrix of $g(X)$. Here $U_i$ and $W_i$ are $q \times 1$ vectors.

Watson (1957) derived similar results and showed that if $\hat{\theta}$ is the maximum likelihood estimator of $\theta$, then the asymptotic distribution of the Pearson chi-square statistic is that of

$$\chi^2_{k-q-1} + \lambda_1 Z_1^2 + \lambda_2 Z_2^2 + \cdots + \lambda_q Z_q^2, \tag{2.39}$$

where $\lambda_1, \lambda_2, \ldots, \lambda_q$ are in the interval $(0,1)$ and may depend on the true parameter $\theta$. For the case where the distribution function is a member of a parametric family of location-scale distributions, Roy (1956) showed that the asymptotic distribution of the Pearson chi-square statistic does not depend on the location parameter $\alpha$ and the scale parameter $\beta$, provided that the estimators $\hat{\alpha}$ and $\hat{\beta}$ used in the chi-square statistic are the maximum likelihood estimators and the cell boundaries are of the form $\hat{\alpha} + c_i\hat{\beta}$, where the $c_i$'s are some specified constants.
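The random-cell construction described above, with cell boundaries of the form location estimate plus a constant times the scale estimate, can be sketched for a normal null distribution as follows. This is a minimal illustration and not the dissertation's own procedure: the function name, the choice of equiprobable cells (constants taken as standard normal quantiles) and the simulated sample are all assumptions made for the example.

```python
import random
from statistics import NormalDist, fmean, pstdev


def random_cell_chi_square(sample, k):
    """Pearson chi-square with k equiprobable random cells for a normal null.

    Boundaries are mu_hat + c_i * sigma_hat, where c_i is the standard normal
    quantile of order i/k and (mu_hat, sigma_hat) are the maximum likelihood
    estimates from the ungrouped data (pstdev divides by n).
    """
    n = len(sample)
    mu_hat = fmean(sample)
    sigma_hat = pstdev(sample)
    constants = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    bounds = [mu_hat + c * sigma_hat for c in constants]

    counts = [0] * k
    for x in sample:
        # Cell index = number of boundaries strictly below x, so each cell
        # is the half-open interval (w_{i-1}, w_i].
        counts[sum(1 for b in bounds if x > b)] += 1

    expected = n / k  # every cell has estimated probability 1/k
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    return statistic, counts


random.seed(1)
sample = [random.gauss(10.0, 2.0) for _ in range(200)]  # hypothetical data
x2, counts = random_cell_chi_square(sample, k=10)
print(counts, round(x2, 3))
```

Because the boundaries move with the estimates, shifting or rescaling the data leaves the cell counts essentially unchanged, which is the mechanism behind the parameter-free limiting distribution for location-scale families.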
Dahiya and Gurland (1972) proved a similar theorem where the chi-square statistic follows the asymptotic distribution of (2.39) when the location and scale parameters are estimated by the sample mean and standard deviation, respectively. Of course, these two results coincide for the normal distribution. Following the test procedure suggested by Watson and Roy, the asymptotic distribution of the chi-square statistic is of the form

$$\chi^2_{k-3} + \lambda_1 Z_1^2 + \lambda_2 Z_2^2 \tag{2.40}$$

when the normal distribution is tested. Explicit expressions for the $\lambda$'s were derived by Watson (1957, 1958) for various distributions. Specifically, for the normal distribution,

$$\lambda_1 = 1 - k \sum_{i=1}^{k} [\phi(w_i) - \phi(w_{i-1})]^2, \qquad \lambda_2 = 1 - \frac{k}{2} \sum_{i=1}^{k} [w_i\phi(w_i) - w_{i-1}\phi(w_{i-1})]^2, \tag{2.41}$$

where $\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$, the $w_i$'s are the cell boundaries for the standard normal distribution with $w_0 = -\infty$ and $w_k = \infty$, and the terms at the infinite boundaries are taken to be zero. Watson (1957) computed $\lambda_1$ and $\lambda_2$ for the number of cells $k$ from 2 to 10 and noted that each $\lambda$ value decreases to 0 as $k$ increases. He suggested using at least 10 cells so that the contribution from the terms $\lambda_1 Z_1^2$ and $\lambda_2 Z_2^2$ is negligible. Watson also required that none of the cells have small expected cell frequencies in order to avoid deficiencies in the asymptotic theory due to small sample effects.

Dahiya and Gurland (1972) provided a straightforward solution to the problem of the contribution due to the $\lambda$'s. Instead of approximating the distribution of (2.40) by the $\chi^2$ distribution with $k-3$ degrees of freedom, Dahiya and Gurland computed a table of percentiles for the distribution of (2.40) using Laguerrian expansions (Gurland, 1955 and 1956, and Kotz, Johnson and Boyd, 1967) for a weighted sum of independent $\chi^2$ random variables. This table of percentiles is presented as Table 2.1. In this table, $d^*_{k,\alpha}$ and $d_{k-3,\alpha}$ are defined by

$$P\{\chi^2_{k-3} + \lambda_1 Z_1^2 + \lambda_2 Z_2^2 \ge d^*_{k,\alpha}\} = \alpha, \tag{2.42}$$

and

$$P\{\chi^2_{k-3} \ge d_{k-3,\alpha}\} = \alpha. \tag{2.43}$$

Table 2.1. Critical points $d^*_{k,\alpha}$, $d_{k-3,\alpha}$ and the corresponding values of $\lambda_1$ and $\lambda_2$ for normal null distributions

          alpha = 0.10         alpha = 0.05         alpha = 0.01
  k     d*(k)    d(k-3)      d*(k)    d(k-3)      d*(k)    d(k-3)     lambda_1  lambda_2
  3     2.371      -         3.248      -         5.418      -         0.207     0.779
  4     3.928    2.706       5.107    3.841       7.917    6.635       0.139     0.633
  5     5.442    4.605       6.844    5.991      10.075    9.210       0.103     0.532
  6     6.905    6.251       8.479    7.815      12.021   11.345       0.081     0.459
  7     8.322    7.779      10.038    9.488      13.837   13.277       0.066     0.404
  8     9.703    9.236      11.543   11.070      15.567   15.086       0.055     0.361
  9    11.055   10.645      13.007   12.592      17.234   16.812       0.047     0.326
 10    12.384   12.017      14.438   14.067      18.852   18.475       0.041     0.298
 11    13.694   13.362      15.843   15.507      20.431   20.090       0.036     0.274
 12    14.988   14.684      17.226   16.919      21.977   21.666       0.032     0.254
 13    16.267   15.987      18.589   18.307      23.495   23.209       0.029     0.236
 14    17.535   17.275      19.937   19.675      24.990   24.725       0.026     0.221
 15    18.792   18.549      21.270   21.026      26.464   26.217       0.024     0.208

The case in which the cell boundaries are sample quantiles was studied by Witting (1959) and was investigated further by Bofinger (1973). A basic technique in deriving the asymptotic distribution of the chi-square statistic when the cells are random is to show that the difference between the fixed cell and the random cell chi-square statistics converges to zero in distribution. This technique was first employed by Roy (1956) and later used by Moore (1971) and Chibisov (1971). The asymptotic distribution of the random cell version of the Pearson chi-square statistic was obtained by Chibisov under the null hypothesis and also under sequences of Pitman alternatives. The multivariate version of the random cell chi-square statistic was studied by Moore (1970, 1971). Moore and Spruill (1975) presented a unified large-sample theory of general chi-square tests of fit under composite hypotheses and Pitman alternatives.
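The two coefficients tabulated in Table 2.1 can be checked numerically. The sketch below assumes equiprobable cells and uses the standard difference form of Watson's coefficients for the normal case (one minus k times the sum of squared cell increments of the normal density, and one minus k/2 times the sum of squared cell increments of x times the density); the function name is an illustrative choice and the form of the expressions is an assumption rather than a transcription of the dissertation's own formula.

```python
from statistics import NormalDist


def watson_lambdas(k):
    """Coefficients (lambda_1, lambda_2) for k equiprobable cells, normal null.

    The infinite outer boundaries contribute zero to both sums, so the
    boundary lists are padded with 0.0 at each end.
    """
    std = NormalDist()
    w = [std.inv_cdf(i / k) for i in range(1, k)]       # interior boundaries
    phi = [0.0] + [std.pdf(x) for x in w] + [0.0]       # phi(w_i)
    wphi = [0.0] + [x * std.pdf(x) for x in w] + [0.0]  # w_i * phi(w_i)

    lam1 = 1.0 - k * sum((phi[i] - phi[i - 1]) ** 2 for i in range(1, k + 1))
    lam2 = 1.0 - (k / 2.0) * sum(
        (wphi[i] - wphi[i - 1]) ** 2 for i in range(1, k + 1)
    )
    return lam1, lam2


for k in (3, 4, 10, 15):
    lam1, lam2 = watson_lambdas(k)
    print(k, round(lam1, 3), round(lam2, 3))
```

Under these assumptions the values for k = 3 agree with the first row of the table, and both coefficients shrink as k grows, which is the behaviour Watson noted when recommending at least 10 cells.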
Wald's method (1943) of constructing test statistics having chi-square limiting distributions from estimators having nonsingular multivariate normal limiting distributions was generalized by Moore (1977) to the case where the estimators have singular multivariate normal limiting distributions. This generalized Wald's method was then used to construct chi-square type statistics having a chi-square limiting null distribution for the case when unknown parameters have to be estimated. The methods of proof discussed by the above authors fail if the number of cells increases with the number of observations at a rate faster than $O(\sqrt{n})$. Thus, the case where the number of cells and the number of observations increase at the same rate is beyond the framework of their proofs.

Several authors have proposed modified or nonstandard chi-square statistics. Kambhampati (1971) proposed a quadratic form of the observed minus the expected cell frequencies. The asymptotic distribution of his statistic is chi-square when the maximum likelihood estimator based on the ungrouped data is used. Rao and Robson (1974) constructed a modified chi-square statistic based on the quadratic form of the asymptotic multinormal conditional distribution of the cell frequencies given the parameter estimates. It is simply the chi-square statistic with an extra term added to it. They showed by simulation that the distribution of their statistic agrees with the chi-square distribution with degrees of freedom one less than the number of cells after grouping, regardless of the number of parameters estimated. They also provided results from a small Monte Carlo power comparison of their statistic with both the fixed cell and random cell chi-square statistics. The Rao-Robson statistic showed a slight improvement over the two other chi-square statistics.
Cressie and Read (1984) investigated the family $\{I^{\lambda} : \lambda \in R\}$ of "power divergence" statistics

$$2nI^{\lambda} = \frac{2}{\lambda(\lambda + 1)} \sum_{i=1}^{k} O_i\{(O_i/E_i)^{\lambda} - 1\}, \quad \lambda \in R. \tag{2.44}$$

Pearson's chi-square ($\lambda = 1$) and likelihood ratio ($\lambda = 0$, taken as a limit) statistics are special cases of the power divergence statistics. Based on power considerations, they recommended $2nI^{\lambda}$ with $\lambda \in [0, 3/2]$ for use when nothing is known about the alternative distributions. Note that the Pearson chi-square and likelihood ratio statistics are among the set of statistics recommended. The small-sample properties of these statistics were investigated by Read (1984). He also gave similar recommendations about the choice of $\lambda$ for several classes of alternatives.

Traditional discussions of the limiting distribution of the Pearson chi-square and the likelihood ratio statistics for the goodness of fit problem are based on the assumption that all the expected cell frequencies become large as the sample size is increased. Slakter (1966), Roscoe and Byars (1971), and Tate and Hyer (1973) studied the inaccuracy of the Pearson goodness of fit test when expected cell frequencies are small. In order not to violate the assumptions of the traditional chi-square test of goodness of fit, cells are often collapsed to avoid small expected cell frequencies. Otherwise, one might feel rather uncomfortable with the large sample chi-square approximation for the null distribution of the test statistic. However, information may be lost when cells are collapsed, and the choice of cells to be collapsed introduces a certain degree of subjectivity into the test. Holst (1972) expressed the view that, for the classical goodness of fit problem, it is rather unnatural to keep the number of cells fixed when the sample size increases.
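The way the power divergence family contains the Pearson and likelihood ratio statistics can be seen numerically. The sketch below implements the family for an index other than -1, treating index 0 as the limiting likelihood ratio form; the function name and the example counts are illustrative assumptions.

```python
import math


def power_divergence(observed, expected, lam):
    """Power divergence statistic 2nI^lambda for cell counts.

    lam = 1 reproduces Pearson's chi-square (when the observed and expected
    totals agree) and lam = 0 is taken as the limit, the likelihood ratio
    statistic. The case lam = -1 is not handled in this sketch.
    """
    if lam == -1:
        raise ValueError("lam = -1 requires a separate limiting form")
    if lam == 0:
        return 2.0 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        )
    return (2.0 / (lam * (lam + 1.0))) * sum(
        o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected)
    )


# Hypothetical example: 100 observations in 4 equiprobable cells.
observed = [18, 22, 30, 30]
expected = [25.0, 25.0, 25.0, 25.0]

pearson = power_divergence(observed, expected, 1.0)
likelihood = power_divergence(observed, expected, 0.0)
two_thirds = power_divergence(observed, expected, 2.0 / 3.0)
print(round(pearson, 4), round(likelihood, 4), round(two_thirds, 4))
```

Evaluating the general formula at an index very close to zero gives a value very close to the likelihood ratio statistic, which is a convenient check that the limiting case has been coded consistently.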
The asymptotic distribution of the Pearson chi-square and likelihood ratio statistics for the goodness of fit problem when the number of cells increases as the sample size increases was studied by Steck (1957), Holst (1972, 1976), Morris (1975) and Medvedev (1977a and 1977b). They considered the case of testing a simple hypothesis and each gave similar, but not identical, sets of conditions for the asymptotic normality of the goodness of fit statistics. Holst and Medvedev used complex analysis to derive the asymptotic normal theory of the goodness of fit statistics based on the convergence of a sequence of characteristic functions. Morris extended a conditioning argument of Steck (1957) to obtain a central limit theorem for sums of functions of multinomial counts, and used the result to obtain the limiting distribution of the Pearson and likelihood ratio test statistics for sparse data sets. Certain results from Holst (1972) and Morris (1975) will be given here without proof. Further notation will be introduced as needed.

Theorem 2.2 (Holst, 1972)

Let $N_k = (N_{1k}, N_{2k}, \ldots, N_{kk})' \sim \mathrm{Multinomial}(p_k, k, n_k)$, i.e.,

$$P(N_{1k} = x_1, \ldots, N_{kk} = x_k) = \frac{n_k!}{x_1! \cdots x_k!} \prod_{i=1}^{k} p_{ik}^{x_i},$$

where $k$ = number of cells, $n_k$ = sample size or the number of observations, and $p_k = (p_{1k}, p_{2k}, \ldots, p_{kk})'$. Sometimes the subscript for $n_k$ is suppressed to facilitate the presentation of formulas. Let the real measurable function $f_k(v,x)$ be defined for $v = 0, 1, 2, \ldots$ and $0 \le x \le 1$, and consider the sum

$$\sum_{i=1}^{k} f_k(N_{ik}, i/k).$$

Let $X_{ik} \sim \mathrm{Poisson}(np_{ik})$ for $i = 1, 2, \ldots, k$, i.e.,

$$P(X_{ik} = x) = \frac{(np_{ik})^{x} e^{-np_{ik}}}{x!}, \quad x = 0, 1, 2, \ldots$$

Set

$$\mu_n = \sum_{i=1}^{k} E\{f_k(X_i, i/k)\}, \qquad \sigma_n^2 = \sum_{i=1}^{k} \mathrm{var}\{f_k(X_i, i/k)\} - \frac{1}{n}\Big[\sum_{i=1}^{k} \mathrm{cov}\{X_i, f_k(X_i, i/k)\}\Big]^2. \tag{2.45}$$

If $n \to \infty$ and $k \to \infty$ so that $n/k \to a$ $(0 < a < \infty)$; $kp_{ik} \le C < \infty$ for some real number $C$ and all $k$ and $i$; $|f_k(v,x)| \le a \exp(bv)$ for some real numbers $a$ and $b$; and

$$0 < \liminf_{n \to \infty} \sigma_n^2/n \le \limsup_{n \to \infty} \sigma_n^2/n < \infty, \tag{2.46}$$

then the sum above is asymptotically $N(\mu_n, \sigma_n^2)$ when $n \to \infty$.
Theorem 2.3 (Asymptotic normality of Pearson's chi-square statistic, Morris, 1975) Let Nj^ = _ Multinomial(p^,k,n^), where k = number of cells, nj^ = sample size or the number of observations, ^ k " ^^Ik' ^ 2 k ' ' P k k ^ ' 32 k > 0 , 1 p?j^ Let {p°j^ : l^iSk} be given with max p.. = 1<i<k 0 (1) as k > and that there exists e > 0 such that n^p^^ i e for all i, k. Denote ? "ik , u,, = L ZÔ— + n. i=lPik |! <Pik - Pik'' 2 — '^i=1 Pik Ik = 2 ''ik Pik Pik "k^njp? k^ik + 2 Pik Pik - Tv)=P ik and = I Oiki=1 Suppose the condition max 2 iSi^k^ik = o( 1 ) as k — > holds. Then, 1 k (Nik - Vik)' 5 y.} 1=1 "kPik — { I s L > N(0,1). as k > 33 Define and »lk • ""p?/"' - °K- (2-53) Then a . ^ is asymptotically of the exact order of k + ko: + "k I G^kPik' (2-54) 1=1 and condition (2.50) is equivalent to the condition that max ne:.Pi ^ k 33 )CO. (2.55: " * ""k" * "k.^lk^iK 1=1 When the "null hypothesis p^^^ = p?^" is true for every i, condition (2.55) is trivially met and so (2.51) holds provided only that (2.47) and (2.49) are valid. The Morris conditions bound all expected cell frequencies away from zero and do not allow any cell probability to remain bounded away from zero as the sample size and the number of cells increase. In contrast, Hoist's conditions do not require all expected cell counts to be bounded away from zero but requires the cell probabilities to be less than c/k for all cells and some c. The conditions of these theorems dictate certain ways of refining the partitions as the sample size increases, to ensure convergence in distribution to a normal distribution. The accuracy of these normal approximations for the null distribution of the Pearson chi-square and likelihood ratio statistics was investigated by Koehler and Larntz (1980). One controversial issue concerning the use of the chi-square statistic is the choice of the cell probabilities. In regard to this issue, Mann and Wald (1942) showed that the equiprobable chi-square test is locally unbiased. 
The equiprobable chi-square test was later shown to be strictly unbiased by Cohen and Sackrowitz (1975) and Sinha (1976). However, Rayner and Best (1982) demonstrated the existence of unbiased chi-square tests with unequal cell probabilities. The other rationale behind using the equiprobable chi-square statistic is the fact that strikingly different outcomes could be reached by using different configurations of intervals with unequal probabilities, as pointed out by Gumbel (1943). A further attractive feature of the equiprobable chi-square test is that Roscoe and Byars (1971), Smith et al. (1979) and others have shown that the chi-square approximation to the null distribution of is more accurate than for cases with unequal cell probabilities. The special case of Morris's theorem with equal cell probabilities will be stated: Theorem 2.4 (Asymptotic normality of Pearson's chi-square statistic for the null hypothesis of equal probabilities) Let ^k = Ngk»..., N^^) _ Multinomial(p^,k,n^), 35 where k = number of cells, = sample size or the number of observations, Pk " (Pik' ^2k'"" Pkk^' Let (p?^ : lâi^k} be given with max p., =0(1) as k = 1/k. Suppose (2.56) > ® ISlSk and that there exists e > 0 such that n^p^^ k t for all i, k. (2.57) Denote = k + n^k % (p.^ - 1/k): ^ (2.58) i=l °?k - "îk'^Plk' 1= 1 and ^k' * I ''ikSuppose the condition. max 2 iSiSk^ik = 0(1 ) as k > " (2.59) holds. Then, j_i ; 1-1 " V"'. I VK " M(o,,) , as k > (2.60) Suppose the null hypothesis "Pj^|^ = P?|^ = 1 /k" for every i, then = k, = 2 and = 2k. The condition (2.59) is thus satisfied. Morris's theorem of the asymptotic normality of the chi-square statistic was proved for the case of a simple null hypothesis and also certain classes of alternatives satisfying the conditions stated in the theorem. 
This theorem will be extended to the case of a composite null hypothesis for which the hypothesized distribution is a member of a parametric family of distributions $F(\cdot;\alpha)$, where $\alpha$ denotes the location parameter. A conditional approach developed by Fligner and Hettmansperger (1979) will be used. This method of proof is based on some theorems on the convergence of a sequence of joint distributions due to Sethuraman (1961). A special case of some very general theorems contained in Sethuraman can be found in Fligner and Hettmansperger (1979) and will be stated here. Some definitions and theorems concerning strong and weak convergence of probability measures will be introduced first.

Definition 2.1 (Strong convergence of probability measures)
Let $p_1, p_2, \ldots$ be a sequence of probability measures defined on a measurable space $(\Omega, F)$. $p_n$ converges strongly to $p$ if $p_n(A)$ converges to $p(A)$ for each $A \in F$.

Theorem 2.5 (Strong convergence of probability measures (Halmos, 1950))
Let $p_1, p_2, \ldots$ be a sequence of probability measures defined on a measurable space $(\Omega, F)$. $p_n$ converges strongly to $p$ if and only if $\int g\,dp_n$ converges to $\int g\,dp$ for all bounded measurable functions $g$ on $\Omega$.

Theorem 2.6 (Strong convergence of probability measures (Scheffé, 1947))
Let $p_1, p_2, \ldots$ be a sequence of probability measures defined on a measurable space $(\Omega, F)$. If the density $f_n(\cdot)$ of $p_n$ with respect to some finite measure $\nu$ converges in measure $[\nu]$ to a density $f(\cdot)$, then there is a measure $p$ such that $p_n$ converges strongly to $p$.

Definition 2.2 (Weak convergence of probability measures)
Let $p_1, p_2, \ldots$ be a sequence of probability measures defined on a measurable space $(\Omega, F)$. $p_n$ converges weakly to $p$ if and only if $\int g\,dp_n$ converges to $\int g\,dp$ for all bounded continuous functions $g$ on $\Omega$.
Theorem 2.7 (Sethuraman, 1951) Suppose (X|^, Y|^) is a sequence of random vectors such that the conditional distribution of given = c converges weakly to a normal distribution for which the limiting conditional mean is a linear function of c and the limiting conditional variance does not depend on c. If the marginal distribution of converges strongly to a normal distribution, then the joint distribution of (X^, Yj^) converges weakly to a bivariate normal distribution. A theorem concerning the asymptotic normal theory of the Pearson chi-square statistic, where the number of cells is allowed to increase as the sample size increases and an unknown location parameter has to be estimated, will now be proved. 38 Theorem 2.8 (Asymptotic distribution of the Pearson chi-square statistic when the location parameter is estimated from the data via the sample median) Let k = number of cells, = sample size or the number of observations, Let X ,X„,...,X be a random sample from a continuous distribution "k with distribution function F(x:a) and density function f(x;a), where a is the location parameter. Let the sample median 0 be the estimate of the population median 6. Note that 0 is based on the ungrouped data. Let the location parameter a be estimated via the sample median by solving the equation: F(0;a) = 1/2. Let the cells be constructed as follows F(Wik;a) - F(w\_^ = 1/k for i=1,2,...,k, (2.61) where w. , , and w. , are the cell boundaries of the i^^ cell. 1-1,k i,k Let be the resulting multinomial: "^k " ^^1k' Ngk'"'"' ^kk^ ~ Multinomial(p^,k,n^) Pk = (p^^, Pg^,'"', is the vector of true random cell probabilities. Assume n^/k converges to a constant X > 0. Denote k = k + n^k I (p.j^ - 1/k): i=1 , (2.62) 39 °ik " ZkZpïk + -.1 PÏki'Pik , 1=1 and =5 - j / . ; Then, k (N. - n./k)= L -V 1=1 -> «"•" • k as k > " Also, 1 k (N - U, Sk 1=1 - n /k): TTk L ::k) —> ' k as k > =» . Proof Without loss of generality, assume n^^ is an even integer, k is an even integer, and let m = n^/2. 
Let 0 be the sample median and 0 be the population median. Let U^,U2,...,U^ be those observations less than 0 and be those observations greater than 0. Given 0=0+ c//n^, and noting that the sample median is asymptotically distributed as a normal random variable with mean equal to the population median and variance equal to 1/{4n[f(0)]^}, the following facts concerning the conditional distributions of the U's and V's follow and are stated without proof: 40 (1) .,U^ are independently and identically distributed random variables with distribution function: f F(t;a) , t < F(e;a) (2.63) Fy(t) , t à e (2) independently and identically distributed random variables with distribution function: ^ F(t;ot) - F (0 ;a) , t > 1 - F(e ;a ) (2.64) , t < (3) U's and V's are mutually stochastically independent. Given 6=9+ c//n^, where c is some constant, let the estimator a of a be obtained by solving the equation: (2.65) F(6,a) = 1/2. Let the cell boundaries F(Wik,a) = i/k i=0,1,...,k be constructed as follows; , i=0,1,...,k. (2.66) Note that 9 = w. k/2,k* Consider any cell ("i-i,such that w.^^ g q and let pj^^ be the true probability of U's falling into the cell ("i-ipk'^i.k^' 41 (2.67) F(e ;a) Similarly, consider any cell (w._, . ,w. ) such that 9 Û w x^ljK IjK lljK let be the true probability of the V's falling into the cell i V ^ F(w.^^;a) - F(w._^^^;a) (2.68) 1 - F(9;a) Condition on 0 = 6 + c//n^, and let be the true conditional cell probability associated with the cell (w\_^ k'^i k^' 0-5 Pik ' "ik 3 (2.69) Pik " 0-5 Pik ' ® ° "i-l,k Let k ^k = /2k "ikCp.,) - I i=1 (2.70) Hk/k where N., , > denotes the cell count of i^^ cell and p., is the ik(p.j^) Ik associated true cell probability, is attached to so that the notation will be more precise when the conditional version of given later. Condition on 0 = 6 + c//n^, where c is some constant, then is 42 n^/k]: k I • i=1 1 9 +c//n^ = /2k n^/k (2.71) where = k + "kk.l^fplk - ' 1=1 and N. , + . 
denotes the cell count of the cell and p'î is the ikip.k; . + associated cell probability, given that 8 = 8 + c//n^. Let denotes the conditional distribution of given 8=0 +c//n^, for some constant c. Note that (N. , + .,..,N . + .) is not a multinomial random vector ^Pik ^ik since the sum of the probabilities of the first half or the second half of the cells is equal to one half. However, using (1), (2) and (3), y|^ can be written as the sum of two independent random variables. ,v+ /2F /2k Y. = s" . * (2.72) + k(Pik) where ,u+ + G//n^, 1 k/2 '"T'T'' I (2.73) 1=1 2 2 k k(Pik) 43 n k/Z (2.74) k/2 1 2P,. ik + 2 ^«P.j = Î ' ik i=l rij^ 2 (2.75) ' 2Plk ' 2 2 k k 2p '^ -F - ^k)'- ^Pik ' ik(Pik) ( k (2.76) and k/2 o" 2 _ y u k(Pik) "i:l 2 (2.77) ik' Similarly, define k •«"ik' i=k/2+1 kL n 2 y ^^Pik^ i=k/2+1 / , 2 k (2.78) k(Pik) ( 2 p , k - 2 / k ) : , k (2.79) i=k/2+1 2Pik — + " —T-)"Pik ' "k 2 2 I 2 1 ^ \ k — yV 2 (2.80) 44 2P,. + 2 ik (Pik) 2 2 n, 2 2 Ik ( k k (2.81) and .V 2 = y k(Pik) i=k/2+1 V 2 (2.82) Note that (2.83) and Let n" and k greater than e k be vectors of cell counts for observations less and + respectively. Also, note that p j^), given 0 = 6 + c//n^. probability of the cell O'SPÏk O'SPlk is the true cell '"ikS (2.84) s w. i-1,k Thus, nJ^ ~ MultinomiaKpj^, k/2, n ^ / 2 ) , and ~ Multinomial(p^, k/2, n ^/2). where ' ,u k " ^^^1k'-'"^Pk/2,k^ ' 45 K- "k.ktZp; • Pr " (^^k/2+1,k'*••'^^k.k^ • Let 6 e (0,1/2), ^6k,k ^ O'SPgk.k 1/k (2.85) 1/k [F("6k,k:*) - F("6k-l,k:=) 2F(0;a) F("6k,k:*) - F("5k-i,k:*) \ f("6k,k:G)["5k.k - "gk-l.k^ 2F(e,a) f(Wgk,k:G)["ak,k ~ "6k-1,k^ -> 1 as k > m. Similarly, let 6 e (1/2,1), then (2.86) 1/k 1/k [f("6k,k:*) - F("6k-1,k:*) 2(1 - F(6;a)) f("6k,k:*) - f("6k-i,k:*) 1 "6k-1 ,k^ 2F(e,a) -> 1 as k «"sk.k'-'tV.k - "Sk-I.k^ > 46 Using Theorem 2.3, yJJ"" (2.87) > N(0,1) , and > N(0,1) . 
Note that k(Pik) 2ir > 1 //2 , (2.88) -> 1 //2 "TIF Consequently, /2k ,v+ 'k(Plk) -> N(0,1 ) as k Let > (2.89) = /n^(8 - 0) and note that %k = %k I Xk = ° (2.90) Applying the central limit theorem to the estimator of the median (Mood, Graybill and Boas, 1974), 47 strongly -> N(0, -) (2.91) . itf(0;a)' Hence, with Theorem 2.7, X, \ weakly (2.92) -> N Since, (2.93) Var(Y) = E(Var(Y|X)) + Var(E(Y|X)) , and al = ^ , then weakly -> N(0,1) Note that and may be replaced with any asymptotically * equivalent formulas, say * * and Sj^ such that s^/s^ converges to one and * - p^)/s^ converges to zero as k tends to infinity. This is not * important for the asymptotic result, but the choice of s^ and * may greatly influence the accuracy of the limiting normal distribution for small samples. its C. Statistics Based on the Empirical Distribution Function This section reviews various statistics based on the empirical distribution function. Let be an ordered random sample from a distribution with distribution function F^(x:0) where 8 is a vector of unknown parameters. The empirical distribution function at x, F^(x), is defined as the proportion of the x^ values less than or equal to X. More explicitly, F^^x) is defined as F^(x) = 0 , X < X. i/n , x^ 3 X < 1 , X a x^ . (2.94) The statistics based on the empirical distribution function can be roughly divided into two broad classes of statistics typified by the well-known Kolmogorov-Smirnov statistic, D = sup |F (x) - F (x;0)|, (2.95) -= 3 X 3 " and the Cramer-von Mises statistic. = n / [F^(x) - F^(x;e)]^ dF^(x;e) . (2.96) The Kolmogorov statistic was developed by Kolmogorov (1933). Two one-sided statistics very similar to the Kolmogorov statistic which were proposed by Smirnov (1939, 1941) are = sup -co < X â [F^^x) - F^(x;e)] , (2.97) 49 and D sup n (2.98) [Fg(x;8) - F^(x)] . is commonly known as the Kolmogorov-Smirnov statistic. 
The Kolmogorov-Smirnov statistic measures the maximum discrepancy between the empirical and the hypothesized cumulative distribution functions. In an attempt to make full use of the discrepancy between the empirical distribution function $F_n(x)$ and the hypothesized cumulative distribution function $F_0(x;\theta)$, Cramér (1928, p. 145) developed the statistic

$$\int_{-\infty}^{\infty} [F_n(x) - F_0(x;\theta)]^2 \, dF_0(x;\theta), \tag{2.99}$$

which averages the square of the difference between the empirical and the hypothesized distribution functions across all values of $x$. The spirit behind the Cramér statistic is similar to that of the chi-square statistic, which measures the square of the differences between the expected and observed cell counts. The Cramér statistic was later generalized by von Mises (1931) by introducing a weight function $g(x)$ to obtain

$$\int_{-\infty}^{\infty} g(x)\,[F_n(x) - F_0(x;\theta)]^2 \, dF_0(x;\theta). \tag{2.100}$$

When the weight function is identically one, this reduces to the original Cramér statistic. This statistic was further modified by Smirnov (1936, 1937), who obtained

$$n \int_{-\infty}^{\infty} \psi(F_0(x;\theta))\,[F_n(x) - F_0(x;\theta)]^2 \, dF_0(x;\theta), \tag{2.101}$$

where $\psi(\cdot)$ is some function. Anderson and Darling (1952) studied a special case of the Cramér-von Mises-Smirnov statistic where the weight function is

$$\psi(F_0(x;\theta)) = [F_0(x;\theta)(1 - F_0(x;\theta))]^{-1}, \tag{2.102}$$

and the resulting statistic,

$$A_n^2 = n \int_{-\infty}^{\infty} \frac{(F_n(x) - F_0(x;\theta))^2}{F_0(x;\theta)(1 - F_0(x;\theta))} \, dF_0(x;\theta), \tag{2.103}$$

is commonly called the Anderson-Darling statistic. Note that the denominator approaches 0 when $x$ approaches the extreme ends of the distribution and it achieves the maximum value of 0.25 when $x$ is the median of the distribution. The Anderson-Darling statistic gives greater weight to the tails of the distribution and can be expected to be more powerful in detecting distributions with heavier or longer tails than those of the null distribution.

Sometimes the data are in the form of directions and one wishes to test the hypothesis that the orientation of the directions is random.
Data of this type can also be represented as a set of points on the circumference of a circle. Testing the hypothesis that the $n$ points are distributed at random on the circumference of the circle is exactly the same as testing the hypothesis of randomness of directions. Statistics developed for this kind of situation are commonly known as tests on the circle. One essential property of statistics of this kind is invariance to the choice of reference point on the circumference of the circle. To be more precise, let $R$ be any arbitrary point on the circumference of the unit circle. Let $d_1, d_2, \ldots, d_n$ be the distances from the reference point $R$ to the $n$ sample points in a particular direction. The distances $d_1, d_2, \ldots, d_n$ completely determine the sample for a fixed $R$. Thus, it is important that the statistics developed for this kind of situation remain unchanged with any other choice of reference point on the circle. A Kolmogorov-Smirnov test on the circle of the form

$$V_n = D_n^+ + D_n^- \tag{2.104}$$

was proposed by Kuiper (1959). A more powerful statistic for testing points on the circle,

$$U_n^2 = n \int \Big[F_n(x) - F_0(x;\theta) - \int \{F_n(t) - F_0(t;\theta)\}\, dF_0(t;\theta)\Big]^2 dF_0(x;\theta), \tag{2.105}$$

was developed by Watson (1961). The Watson statistic attempts to measure the variance of the differences between the empirical and hypothesized distribution functions. These statistics for tests of points on a circle can also be used for tests of points on a line. The Kuiper statistic is more powerful in detecting a change in scale than a change in location, when compared to the Kolmogorov-Smirnov statistic. Similarly, the Watson statistic can be expected to be more powerful in detecting shifts in the variance of a symmetric distribution.

The EDF statistics are summarized as follows:

Kolmogorov-Smirnov statistics:
$$D_n = \sup_{-\infty < x < \infty} |F_n(x) - F_0(x;\theta)|, \tag{2.106}$$
$$D_n^+ = \sup_{-\infty < x < \infty} [F_n(x) - F_0(x;\theta)], \tag{2.107}$$
$$D_n^- = \sup_{-\infty < x < \infty} [F_0(x;\theta) - F_n(x)]. \tag{2.108}$$

Kuiper statistic:
$$V_n = D_n^+ + D_n^-. \tag{2.109}$$

Cramér-von Mises statistic:
$$W_n^2 = n \int_{-\infty}^{\infty} [F_n(x) - F_0(x;\theta)]^2 \, dF_0(x;\theta). \tag{2.110}$$

Anderson-Darling statistic:
$$A_n^2 = n \int_{-\infty}^{\infty} \frac{(F_n(x) - F_0(x;\theta))^2}{F_0(x;\theta)(1 - F_0(x;\theta))} \, dF_0(x;\theta). \tag{2.111}$$

Watson statistic:
$$U_n^2 = n \int_{-\infty}^{\infty} \Big[F_n(x) - F_0(x;\theta) - \int_{-\infty}^{\infty} \{F_n(t) - F_0(t;\theta)\}\, dF_0(t;\theta)\Big]^2 dF_0(x;\theta). \tag{2.112}$$

These formulas are not necessarily the most convenient for practical computations. The following computational formulas are useful. Note that $z_i = F_0(x_{(i)};\theta)$ or $F_0(x_{(i)};\hat{\theta})$, where $x_{(i)}$ is the $i$th ordered observation, depending on whether $\theta$ is known or not under the null hypothesis, respectively. Unless otherwise indicated, $\hat{\theta}$ is assumed to be the maximum likelihood estimator of $\theta$. Only those statistics used in the subsequent power comparison are listed below.

Kolmogorov-Smirnov statistics:
$$D_n^+ = \max_{1 \le i \le n} [\,i/n - z_i\,], \tag{2.113}$$
$$D_n^- = \max_{1 \le i \le n} [\,z_i - (i-1)/n\,], \tag{2.114}$$
$$D_n = \max(D_n^+, D_n^-). \tag{2.115}$$

Kuiper statistic:
$$V_n = D_n^+ + D_n^-. \tag{2.116}$$

Cramér-von Mises statistic:
$$W_n^2 = \sum_{i=1}^{n} [\,z_i - (2i-1)/(2n)\,]^2 + 1/(12n). \tag{2.117}$$

Anderson-Darling statistic:
$$A_n^2 = -\Big\{\sum_{i=1}^{n} (2i-1)[\ln z_i + \ln(1 - z_{n+1-i})]\Big\}\Big/n - n. \tag{2.118}$$

Watson statistic:
$$U_n^2 = W_n^2 - n(\bar{z} - 1/2)^2, \quad \text{where } \bar{z} = \Big(\sum_{i=1}^{n} z_i\Big)\Big/n. \tag{2.119}$$

Extensive references for tests based on the empirical distribution function can be found in Darling (1957), Barton and Mallows (1965), Sahler (1968) and Durbin (1973b). Darling (1955) considered testing a composite null hypothesis where one parameter has to be estimated, and this was extended to the multiparameter case by Sukhatme (1972). Durbin (1973b) presented a comprehensive treatment of the theory for the derivation of the sampling distribution of a wide range of statistics based on the empirical distribution function. The statistics considered by Durbin include the Kolmogorov-Smirnov, Kuiper, Cramér-von Mises, Anderson-Darling and Watson statistics.
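The computational formulas for the EDF statistics can be collected in one short routine. The sketch below takes the probability integral transforms of the observations and returns all seven statistics; the function name and the example values are illustrative assumptions. The one-sided statistic for the hypothesized distribution exceeding the empirical one is computed in its standard form, the maximum of the transformed value minus (i - 1)/n.

```python
import math


def edf_statistics(z):
    """EDF statistics from z_i = F_0(x_(i)); every z must lie in (0, 1)."""
    z = sorted(z)
    n = len(z)
    # enumerate() is zero-based, so index i here corresponds to i + 1 in the
    # usual one-based formulas.
    d_plus = max((i + 1) / n - zi for i, zi in enumerate(z))
    d_minus = max(zi - i / n for i, zi in enumerate(z))
    d = max(d_plus, d_minus)
    v = d_plus + d_minus
    w2 = sum((zi - (2 * i + 1) / (2 * n)) ** 2 for i, zi in enumerate(z))
    w2 += 1.0 / (12 * n)
    a2 = -sum(
        (2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i]))
        for i in range(n)
    ) / n - n
    z_bar = sum(z) / n
    u2 = w2 - n * (z_bar - 0.5) ** 2
    return {"D+": d_plus, "D-": d_minus, "D": d, "V": v,
            "W2": w2, "A2": a2, "U2": u2}


# Hypothetical transformed sample of size 3.
stats = edf_statistics([0.4, 0.1, 0.7])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

In practice the transformed values would come from evaluating the hypothesized distribution function, with estimated parameters where necessary, at each observation.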
Treatments of the asymptotic theory of statistics based on the empirical distribution function can be found in Anderson and Darling (1952, 1954), Darling (1955, 1957), Durbin (1973a, 1973b, 1975), Kac et al. (1955), Stephens (1976, 1977) and Watson (1961, 1962). Asymptotic percentiles of the Anderson-Darling statistic were tabulated by Anderson and Darling (1954) for testing a simple null hypothesis. Stephens (1974, 1976) obtained the asymptotic percentiles for the Cramér-von Mises, Watson and Anderson-Darling statistics when the distribution tested is normal with mean or variance or both unknown. These percentiles were obtained by Stephens by fitting Pearson curves to the distributions using the first four cumulants. The asymptotic percentiles for $D$ and $V$ were obtained by Stephens (1974) using extrapolation of Monte Carlo percentiles of finite samples. Stephens (1974, 1976) also provided Monte Carlo percentiles for the statistics $A^2$, $W^2$, $U^2$, $V$ and $D$ corresponding to finite samples for the normal case, where parameters have to be estimated.

The Weibull probability model is used widely in modelling reliability or lifetime data because of its wide range of density curves. The distribution function of a Weibull random variable $X$ is

$$F(x) = 1 - \exp[-(x/\theta)^{\gamma}]. \tag{2.120}$$

The Weibull distribution can be transformed into a Gumbel distribution using the simple transformation $Y = -\log X$. The distribution function of the Gumbel random variable is given by

$$F(y) = \exp[-\exp\{-(y - \alpha)/\beta\}], \tag{2.121}$$

where

$$\alpha = -\log\theta \quad \text{and} \quad \beta = 1/\gamma. \tag{2.122}$$

Thus, if one is interested in fitting a Weibull probability model to a set of data, one can first transform the data using the negative logarithmic transformation and then fit a Gumbel probability model to the transformed data. A good fit of the Gumbel probability model to the transformed data set would imply a good fit of the Weibull probability model to the original data set.
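The correspondence between the Weibull and Gumbel models under the negative logarithmic transformation can be verified directly from the two distribution functions. In the sketch below, the function names and the parameter values are illustrative assumptions; the check is that the probability of the transformed variable falling at or below -log x equals the probability of the original variable being at least x.

```python
import math


def weibull_cdf(x, theta, gamma):
    """Weibull distribution function 1 - exp[-(x/theta)^gamma], x > 0."""
    return 1.0 - math.exp(-((x / theta) ** gamma))


def gumbel_cdf(y, alpha, beta):
    """Gumbel distribution function exp[-exp{-(y - alpha)/beta}]."""
    return math.exp(-math.exp(-(y - alpha) / beta))


theta, gamma = 2.0, 1.5                      # hypothetical Weibull parameters
alpha, beta = -math.log(theta), 1.0 / gamma  # implied Gumbel parameters

for x in (0.5, 1.0, 2.5):
    y = -math.log(x)
    # Y <= y is the same event as X >= x, so the Gumbel distribution function
    # at y must equal the Weibull survival function at x.
    survival = 1.0 - weibull_cdf(x, theta, gamma)
    print(x, round(survival, 6), round(gumbel_cdf(y, alpha, beta), 6))
```

Because the transformation is decreasing, the upper tail of the original data maps to the lower tail of the transformed data, which is worth keeping in mind when interpreting one-sided EDF statistics after the transformation.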
The Monte Carlo percentiles for the finite sample $D^+$, $D^-$ and $V$ statistics for testing the goodness of fit of the Gumbel probability model with unknown parameters were provided by Chandra et al. (1981). Stephens (1977) provided the Monte Carlo percentiles for the $W^2$, $U^2$ and $A^2$ statistics for the Gumbel case.

The exponential probability model has also been widely used as a model in lifetime study. It has a constant hazard function and this can be useful in certain situations. One interesting example is the "lifetime" of glass bottles. Unlike many other things, a glass bottle will not deteriorate. The simplicity of the exponential density function often leads to many elegant derivations of properties. Monte Carlo percentiles for $D$ were provided by Lilliefors (1967, 1969). Stephens (1974, 1975) provided Monte Carlo percentiles of $A^2$, $W^2$, $U^2$, $V$ and $D$ for the exponential case with unknown scale parameter. An elegant method for the exact distributions of $D^+$, $D^-$ and $D$ for finite sample sizes was developed by Durbin (1975). Tables of percentiles for $D^+$, $D^-$ and $D$ for a wide range of sample sizes were also provided by Durbin.

D. Statistics Based on Moments

The mean and the variance of a distribution are among the most basic statistical concepts. They measure the location and the spread of a distribution, respectively. Two less well-known measures are the skewness and kurtosis statistics. The skewness and kurtosis are two measures of the shape of a distribution. The skewness is a measure of asymmetry and the kurtosis is a measure of the heaviness of the tails of a distribution. The skewness $\sqrt{\beta_1}$ and the kurtosis $\beta_2$ of a distribution are defined as

$$\sqrt{\beta_1} = \mu_3 / \mu_2^{3/2}, \tag{2.123}$$

and

$$\beta_2 = \mu_4 / \mu_2^{2}, \tag{2.124}$$

where $\mu_2$, $\mu_3$ and $\mu_4$ are the central moments defined as

$$\mu_2 = E(X - \mu)^2, \tag{2.125}$$

$$\mu_3 = E(X - \mu)^3, \tag{2.126}$$

and

$$\mu_4 = E(X - \mu)^4. \tag{2.127}$$

An asymptotically unbiased estimate of $\sqrt{\beta_1}$ is given by

$$\sqrt{b_1} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} (X_i - \bar{X})^3 / s^3, \tag{2.128}$$

where

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2, \tag{2.129}$$

and

$$\bar{X} = \Big( \sum_{i=1}^{n} X_i \Big) / n. \tag{2.130}$$

The bias of this sample estimate is of the order of $1/n$. An asymptotically unbiased estimate of $\beta_2$ is given by

$$b_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} (X_i - \bar{X})^4 / s^4 - \frac{3(n-1)(n-1)}{(n-2)(n-3)} + 3. \tag{2.131}$$

The bias of this estimate is also known to be of the order of $1/n$. Table 2.2 contains the skewness and kurtosis of several different distributions.

Table 2.2. Skewness and kurtosis of certain distributions

Distribution    Skewness    Kurtosis
uniform         0           1.8
normal          0           3
Gumbel          1.14        5.4
exponential     2           9

The uniform and normal distributions are symmetrical distributions whereas the Gumbel and exponential distributions are skewed distributions. Flat distributions with short tails like the uniform distribution have small kurtosis. The exponential distribution has a long tail and a large kurtosis value.

Other sample estimates of the skewness and kurtosis are possible. Common ones are

$$\sqrt{b_1^*} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^3 / s^3, \tag{2.132}$$

and

$$b_2^* = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^4 / s^4. \tag{2.133}$$

The reasons for using $\sqrt{b_1}$ and $b_2$ instead of $\sqrt{b_1^*}$ and $b_2^*$ are the asymptotic unbiased property of $\sqrt{b_1}$ and $b_2$ and the fact that these two sample estimates can be computed easily using the procedures PROC MEANS or PROC UNIVARIATE of SAS (SAS Inc., 1982, pp. 497, 498). Note that the kurtosis computed in SAS differs from $b_2$ by 3. In other words, add 3 to the kurtosis computed by SAS to get $b_2$.

The skewness and kurtosis statistics can be used as goodness of fit statistics. For testing normality, the normal probability model will be rejected for large absolute values of skewness since this is an indication of asymmetry. A kurtosis value too far from 3 indicates a distribution with tails either shorter or longer than those of the normal distribution, and hence the null hypothesis of a normal probability model will be rejected.
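The sample estimates (2.128) and (2.131) can be computed without SAS. The following is a small sketch that follows those two formulas term by term; the function name `sample_skew_kurt` is illustrative and not from the thesis.

```python
import math


def sample_skew_kurt(x):
    """Return (sqrt_b1, b2) following (2.128) and (2.131). Requires n >= 4."""
    n = len(x)
    xbar = sum(x) / n                                  # (2.130)
    s2 = sum((v - xbar) ** 2 for v in x) / (n - 1)     # (2.129)
    s = math.sqrt(s2)
    sum3 = sum((v - xbar) ** 3 for v in x)
    sum4 = sum((v - xbar) ** 4 for v in x)
    sqrt_b1 = n / ((n - 1) * (n - 2)) * sum3 / s ** 3
    b2 = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum4 / s ** 4
        - 3 * (n - 1) * (n - 1) / ((n - 2) * (n - 3))
        + 3
    )
    return sqrt_b1, b2
```

For the symmetric sample 1, 2, 3, 4, 5 this gives a skewness of 0 and b2 = 1.8; subtracting 3 from b2 gives the value on the scale used by SAS, as described in the text.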
The skewness and kurtosis tests can also be performed for other null probability models, bearing in mind that the critical values for the rejection of the null hypothesis will differ among distributions.

D'Agostino and Pearson (1973) provided charts of curves through smoothed Monte Carlo percentiles of $b_2$ from a normal distribution. An approximation to the distribution of $b_2$ was obtained by Anscombe and Glynn (1983). Work on approximating the distribution of $\sqrt{b_1}$ can be found in Bowman and Shenton (1973) and D'Agostino and Tietjen (1973).

The skewness and kurtosis statistics are tailored for different classes of alternative distributions. The kurtosis test is generally more powerful than the skewness test for detecting symmetrical distributions with longer or heavier tails than the normal probability model. On the other hand, the skewness test will perform better for skewed distributions with kurtosis near that of the null probability model.

Pearson et al. (1977) introduced a test based on the joint use of the skewness and kurtosis statistics. This test specifies a frame for the data to fit in. An extreme deviation of the skewness or kurtosis values from those for the null probability model will lead to the rejection of the null hypothesis. This test was referred to as the rectangle test by Pearson et al. (1977). Consider the case of testing a null hypothesis $F_0$. Let $\sqrt{b_1}(L)$ and $\sqrt{b_1}(U)$ be the lower and upper $100\beta\%$ points of $\sqrt{b_1}$ and let $b_2(L)$ and $b_2(U)$ be the lower and upper $100\beta\%$ points of $b_2$. The four points $(\sqrt{b_1}(L), b_2(L))$, $(\sqrt{b_1}(L), b_2(U))$, $(\sqrt{b_1}(U), b_2(L))$ and $(\sqrt{b_1}(U), b_2(U))$ define a rectangle as shown in Figure 2.1.

Figure 2.1. Rectangle defined by critical values of a rectangle test (horizontal axis: $\sqrt{b_1}$, with $\sqrt{b_1}(L)$ and $\sqrt{b_1}(U)$ marked; vertical axis: $b_2$, with $b_2(L)$ and $b_2(U)$ marked)

If $\sqrt{b_1}$ and $b_2$ are independent, then the probability of a point falling outside this rectangle will be $\alpha$. The $\alpha$ and $\beta$ values are related by

$$\alpha = 4(\beta - \beta^2), \tag{2.134}$$

or

$$\beta = [\, 1 - \sqrt{1 - \alpha} \,] / 2.$$
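The relation between the two levels follows from independence: a point falls inside the rectangle with probability (1 - 2 beta)^2, so alpha = 1 - (1 - 2 beta)^2 = 4(beta - beta^2). A minimal sketch of the conversion in both directions is given below; the function names are illustrative.

```python
import math


def beta_for_alpha(alpha):
    """Per-tail level beta that gives overall level alpha under independence."""
    return (1.0 - math.sqrt(1.0 - alpha)) / 2.0


def alpha_for_beta(beta):
    """Overall level of the rectangle test, equation (2.134)."""
    return 4.0 * (beta - beta ** 2)


# Per-tail levels for a few overall levels
levels = {alpha: beta_for_alpha(alpha) for alpha in (0.100, 0.050, 0.010)}
```

For example, `beta_for_alpha(0.100)` rounds to 0.025658 and `beta_for_alpha(0.050)` rounds to 0.012660.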
Table 2.3 shows the different $\beta$ values needed to achieve various $\alpha$ levels for the rectangle test.

Table 2.3. Relationship between $\alpha$ and $\beta$

$\alpha$     $\beta$
0.100    0.025658
0.075    0.019115
0.050    0.012660
0.025    0.006290
0.010    0.002506
0.005    0.001256
0.001    0.000250

Note that $\sqrt{b_1}$ and $b_2$ are usually not independent, and a percentage of points smaller than $\alpha$ will fall outside the rectangle, yielding a conservative test. To obtain the Monte Carlo percentiles corresponding to the specified $\alpha$ level, the following algorithm implementing a bisection method was designed.

1. Algorithm to obtain Monte Carlo percentiles of the rectangle test corresponding to the specified $\alpha$ level

[1] Generate $N$ sets of random samples from the null distribution and compute the skewness and kurtosis values, i.e., $(\sqrt{b_1}_1, b_{2,1}), (\sqrt{b_1}_2, b_{2,2}), \ldots, (\sqrt{b_1}_N, b_{2,N})$.

[2] Create a sorted array of skewness values $\sqrt{b_1}_{(1)}, \sqrt{b_1}_{(2)}, \ldots, \sqrt{b_1}_{(N)}$ and a sorted array of kurtosis values $b_{2(1)}, b_{2(2)}, \ldots, b_{2(N)}$.

[3] Construct the first rectangle test:
(a) Obtain the lower and upper percentiles $\sqrt{b_1}_{[N\beta]}$ and $\sqrt{b_1}_{[N - N\beta]}$ for the $\sqrt{b_1}$ component, where $\beta$ corresponds to the specified $\alpha$ level as shown in Table 2.3.
(b) Obtain the lower and upper percentiles $b_{2[N\beta]}$ and $b_{2[N - N\beta]}$ for the $b_2$ component.
(c) The first rectangle is defined by $\sqrt{b_1}_{[OUTlower]}$, $\sqrt{b_1}_{[OUTupper]}$, $b_{2[OUTlower]}$ and $b_{2[OUTupper]}$, where OUTlower = $[N\beta]$ and OUTupper = $[N - N\beta]$.
Note: The fraction of $(\sqrt{b_1}_i, b_{2,i})$ points falling outside this rectangle will be less than $\alpha$.

[4] Compute $\alpha^*_{out}$ for the rectangle defined by $\sqrt{b_1}_{[OUTlower]}$, $\sqrt{b_1}_{[OUTupper]}$, $b_{2[OUTlower]}$ and $b_{2[OUTupper]}$, that is, the fraction of $(\sqrt{b_1}_i, b_{2,i})$ points falling outside the rectangle.

[5] Select a suitable positive integer $j$ (such that the $\alpha^*$ achieved by the new rectangle will be greater than $\alpha$) to construct a new rectangle defined by $\sqrt{b_1}_{[INlower]}$, $\sqrt{b_1}_{[INupper]}$, $b_{2[INlower]}$ and $b_{2[INupper]}$, where INlower = $([N\beta] + j)$ and INupper = $([N - N\beta] - j)$. This rectangle is smaller than the previous one.

[6] Compute $\alpha^*_{in}$ for the rectangle defined by $\sqrt{b_1}_{[INlower]}$, $\sqrt{b_1}_{[INupper]}$, $b_{2[INlower]}$ and $b_{2[INupper]}$ by computing the fraction of $(\sqrt{b_1}_i, b_{2,i})$ points falling outside the rectangle.

[7] Compute $[j/2]$. If $[j/2] = 0$, stop. If $[j/2] > 0$, go to [8].

[8] Let MIDlower = (OUTlower + $[j/2]$) and MIDupper = (OUTupper - $[j/2]$).

[9] Compute $\alpha^*_{mid}$, the Type I error achieved by the rectangle defined by $\sqrt{b_1}_{[MIDlower]}$, $\sqrt{b_1}_{[MIDupper]}$, $b_{2[MIDlower]}$ and $b_{2[MIDupper]}$.

[10] If $(\alpha - \alpha^*_{in}) \times (\alpha - \alpha^*_{mid}) < 0$, then OUTlower = MIDlower, OUTupper = MIDupper, $\alpha^*_{out} = \alpha^*_{mid}$ and $j$ = INlower - MIDlower. Go to [7]. Else, INlower = MIDlower, INupper = MIDupper, $\alpha^*_{in} = \alpha^*_{mid}$ and $j$ = MIDlower - OUTlower. Go to [7].

III. PROBABILITY PLOTS AND DISTRIBUTION CURVES

The probability plot is a common qualitative tool used widely by statisticians and engineers. The scatter plot, which includes the probability plot, is considered to be one of the "magnificent seven" in statistical quality control. "Probably the single most powerful tool with which the results of an experiment can be studied is a collection of plots of raw and transformed data" (Gerson, 1975). Probability plots provide a qualitative estimate of the goodness of fit of a probability model to a data set. One important application is assessing the goodness of fit of a normal probability model to the residuals from a fitted model of some experimental data.

There are two main types of probability plots, namely the P-P (percent versus percent) and the Q-Q (quantile versus quantile) probability plots. Wilk and Gnanadesikan (1968) and Gerson (1975) have comprehensive reviews of P-P and Q-Q probability plots and some variants. The Q-Q probability plot seems to enjoy a greater popularity than the P-P probability plot. This can be largely attributed to the linear invariance property possessed by the Q-Q probability plot. The linear invariance property guarantees that, if a linear transformation is performed on the observations, the resulting Q-Q probability plot would still be linear but with a change in slope and intercept.
This chapter reviews the construction of P-P and Q-Q probability plots, the choice of plotting positions and a comparison between the Q-Q and P-P probability plots. A new technique based on P-P probability plots for assessing the goodness of fit of nonhypothesized probability models to a data set is developed. This technique is not limited to location-scale distributions. Finally, a computer implementation of this technique is proposed.

Let $x_1 \le x_2 \le \cdots \le x_n$ be an ordered random sample of size $n$ from a location-scale distribution with distribution function $F[(x - \alpha)/\beta]$, where $\alpha$ and $\beta$ are the location and scale parameters, respectively.

1. Construction of an "F" Q-Q probability plot

Plot $x_i$ against $a_i$, where $a_i = F^{-1}(p_i)$ and $p_i$ is the plotting position.

2. Construction of an "F" P-P probability plot

Plot $F[(x_i - \alpha)/\beta]$ against $p_i$. If $\alpha$ and $\beta$ are unknown, they are replaced by the corresponding maximum likelihood estimators $\hat\alpha$ and $\hat\beta$.

If $F$ is the normal distribution function, then the resulting probability plot is known as a normal probability plot. Similarly, if $F$ is the exponential distribution function, the resulting probability plot is called an exponential probability plot.

Different choices of plotting positions are available. Table 3.1 contains a list of different plotting positions for the P-P probability plots. Plotting positions for the Q-Q probability plots are obtained by evaluating the inverse distribution function at these plotting positions, that is, $F^{-1}(p_i)$. The plotting positions are similar for large sample sizes, but there are differences among these sets of plotting positions, especially at the extremes when the sample size is small. The plotting position $i/n$ is known to hydrologists as the California Method (California Department, 1923). This has generally been discarded because it is not possible to plot the largest or the smallest observation on the Q-Q probability plot, but this problem does not occur for the P-P probability plots.
The plotting positions $(i-0.5)/n$ and $i/(n+1)$ are the most often cited in the journals, with the former being more popular. However, there has been an increasing acceptance of Blom's plotting position $(i-0.375)/(n+0.25)$ in recent years. Blom (1958) proposed the formula $\Phi^{-1}[(i-c)/(n-2c+1)]$ as an approximation of the expectation of the normal order statistics and recommended the compromise value $c = 0.375$. Harter (1961) provided a formula for $c$ as a function of $i$ and $n$, improving the overall accuracy of the approximation to about 0.002 for $n \le 400$. The crude normal Q-Q probability plot produced by the SAS UNIVARIATE procedure is based on this plotting position. The plotting position $i/(n+1)$ has the feature that $E\{F[(X_i - \alpha)/\beta]\} = i/(n+1)$, since $F[(X_i - \alpha)/\beta]$ has a beta$(i, n-i+1)$ distribution. Kimball (1960) has a detailed discussion on the choice of plotting positions. Looney and Gulledge (1985) investigated empirically the power of the Shapiro-Francia statistic using different plotting positions. The power of the Shapiro-Francia statistic remains approximately the same for different plotting positions.

Table 3.1. Plotting positions for the P-P probability plots

Plotting position, $p_i$     References
(i-0.5)/n                Hazen (1914)
i/n                      California State Department (1923)
i/(n+1)                  Weibull (1939)
(i-0.3)/(n+0.4)          Benard and Bos-Lavenbach (1953)
(i-0.375)/(n+0.25)       Blom (1958)
(3i-1)/(3n+1)            Tukey (1962)
(i-0.44)/(n+0.12)        Gringorten (1963)
(i-0.3175)/(n+0.365)     Filliben (1975)
(i-0.33)/(n+0.33)        Biomedical (1979)
(i-0.4)/(n+0.2)          Larsen, Curran and Hunt (1980)
(i-0.567)/(n-0.134)      Larsen, Curran and Hunt (1980)

The construction of normal P-P and Q-Q probability plots is illustrated with a data set taken from Snedecor and Cochran (1980, p. 94). The data set consists of gains in weight of female rats under a high protein diet. The location and scale parameters are estimated by the sample mean and standard deviation, $\hat\alpha = 120$ and $\hat\beta = 21.39$.
The ordered observations are listed in Table 3.2, along with the plotting positions $\{i/(n+1)\}$. In order to construct a normal Q-Q probability plot, the inverse normal distribution function, $\Phi^{-1}(\cdot)$, must be evaluated at the plotting positions. To facilitate the construction of probability plots, special probability graph papers are available. These graph papers have a scale based on the values of $F^{-1}(i/(n+1))$ but labelled with an $i/(n+1)$ scale, so the point $(x_i, F^{-1}[i/(n+1)])$ can be plotted by knowing the value of the point $(x_i, i/(n+1))$. A plot of the ordered observations against the inverse normal distribution function of the plotting positions would yield a normal Q-Q probability plot, as shown in Figure 3.2. To construct a normal P-P probability plot, the observations are standardized using $\hat\alpha$ and $\hat\beta$, and $\Phi(\cdot)$ is evaluated for the standardized observations. These values are listed in the last two columns of Table 3.2. A plot of $\Phi[(x_i - \hat\alpha)/\hat\beta]$ against $i/(n+1)$ yields the normal P-P probability plot as shown in Figure 3.1.

Table 3.2. Gains in weight of female rats under a high protein diet and plotting positions for the normal P-P and Q-Q plots

 i    x_i    i/(n+1)    Phi^{-1}[i/(n+1)]    (x_i - a)/b    Phi[(x_i - a)/b]
 1     83    0.077      -1.426               -5.993         0.0000
 2     97    0.154      -1.020               -3.725         0.0001
 3    104    0.231      -0.736               -2.591         0.0048
 4    107    0.308      -0.502               -2.106         0.0176
 5    113    0.385      -0.293               -1.134         0.1285
 6    119    0.462      -0.097               -0.162         0.4357
 7    123    0.538       0.097                0.486         0.5869
 8    124    0.615       0.293                0.648         0.7415
 9    129    0.692       0.502                1.458         0.9275
10    134    0.769       0.736                2.268         0.9883
11    146    0.846       1.020                4.211         0.9999
12    161    0.923       1.426                6.641         1.0000

(In the column headings, a and b denote the estimates $\hat\alpha$ and $\hat\beta$, and Phi denotes $\Phi$.)

$$F[(X_i - \alpha)/\beta] \approx i/(n+1), \tag{3.1}$$

and

$$X_i \approx \beta F^{-1}[i/(n+1)] + \alpha. \tag{3.2}$$

These approximations provide a heuristic explanation for the linearity of the normal P-P and Q-Q probability plots, respectively. The slope and the intercept on the vertical axis of a Q-Q probability plot provide graphical estimates of the scale and location parameters, as is obvious from (3.2). Chernoff and Lieberman (1954) and Barnett (1975, 1976) discussed the problem of obtaining efficient and unbiased estimators of the location and scale parameters using probability plotting methods.

Figure 3.1. Normal P-P probability plot (horizontal axis: uniform probabilities)

Figure 3.2. Normal Q-Q probability plot (axis: observed percentiles, 83.0 to 161.0)

Figures 3.1 and 3.2 show that the P-P and Q-Q probability plots are similar. A common feature of the normal Q-Q probability plot is that points near the middle of the plot usually have the smallest variance. The opposite is true for P-P probability plots when $F_0 = F$, regardless of the form of $F$. Michael (1983) considered the use of certain transformations involving the arcsine function on the plotting positions and $\{F[(X_i - \alpha)/\beta]\}$ to achieve uniform variance of the points on a probability plot from one end to the other.

An appealing property of the Q-Q probability plot is that if $y_i$ is a linear transformation of $x_i$, then the resulting Q-Q probability plot will remain linear but with possibly changed slope and intercept. This linear invariance property has made Q-Q probability plots valuable and very popular. One geometric configuration that humans can perceive most easily is linearity. The general P-P probability plots discussed in Wilk and Gnanadesikan (1968) do not necessarily possess the linear invariance property. However, as long as the observations are properly standardized, the P-P probability plot can be shown to be linear invariant. In fact, a P-P probability plot of the original observations and a P-P probability plot of a linear transformation of the original observations are identical. This can be proved using the linear invariance property of maximum likelihood estimation. A theorem stating the property of the maximum likelihood estimators is presented without proof.
The proof of this theorem can be found in Mood, Graybill and Boes (1974).

Theorem 3.1 (Mood, Graybill and Boes, 1974, Ch. VII, p. 285)

Let $\hat\Theta = (\hat\Theta_1, \ldots, \hat\Theta_k)$, where $\hat\Theta_j = \hat\theta_j(X_1, \ldots, X_n)$, be a maximum likelihood estimator of $\theta = (\theta_1, \ldots, \theta_k)$ in the density $f(\cdot\,; \theta_1, \ldots, \theta_k)$. If $\tau(\theta) = (\tau_1(\theta), \ldots, \tau_r(\theta))$ for $1 \le r \le k$ is a transformation of the parameter space $\Theta$, then a maximum likelihood estimator of $\tau(\theta) = (\tau_1(\theta), \ldots, \tau_r(\theta))$ is $\tau(\hat\Theta) = (\tau_1(\hat\Theta), \ldots, \tau_r(\hat\Theta))$.

Theorem 3.2 (Invariance property of the P-P plot or the $k^2$ statistic)

The P-P plot or the $k^2$ statistic is linear invariant if the location and scale parameters are estimated using maximum likelihood estimators.

Proof

Let $F[(x - \alpha)/\beta]$ be the distribution function of a location-scale distribution with location parameter $\alpha$ and scale parameter $\beta$. Let $X = (X_1, X_2, \ldots, X_n)$ be an ordered random sample from the standard distribution with location parameter 0 and scale parameter 1. Let $Y_i = bX_i + a$ be any linear transformation of the $X_i$. The distribution function for the transformed random variable is $F[(y - a)/b]$. It is sufficient to show that

$$\frac{X_i - \hat\alpha_x}{\hat\beta_x} = \frac{Y_i - \hat\alpha_y}{\hat\beta_y}, \tag{3.3}$$

since the $k^2$ statistic or the points on a P-P probability plot depend on $X$ through the transformed observations only. Note that $\hat\alpha_x$ and $\hat\beta_x$ are the maximum likelihood estimators of the location parameter 0 and scale parameter 1, and $\hat\alpha_y$ and $\hat\beta_y$ are the maximum likelihood estimators of the location parameter $a$ and the scale parameter $b$, respectively. By Theorem 3.1, the maximum likelihood estimators of $a$ and $b$ are

$$\hat\alpha_y = b\,\hat\alpha_x + a, \tag{3.4}$$

and

$$\hat\beta_y = b\,\hat\beta_x.$$

Hence,

$$\frac{Y_i - \hat\alpha_y}{\hat\beta_y} = \frac{bX_i + a - (b\,\hat\alpha_x + a)}{b\,\hat\beta_x} = \frac{X_i - \hat\alpha_x}{\hat\beta_x}.$$

3. Problems associated with Q-Q probability plots

Mage (1982), in his paper entitled "An Objective Graphical Method for Testing Normal Distributional Assumptions Using Probability Plots", provided a good review of the problem of drawing "the" best straight line on a Q-Q probability plot.
If one resorts to the use of a machine, then a straight line can be drawn on the graph objectively using the methods of least-squares, weighted least-squares, moments or maximum likelihood. If one draws the straight line on the Q-Q probability plot by hand, the line is drawn subjectively. Some of the methods suggested in the literature for drawing a straight line on a Q-Q probability plot are as follows:

[1] Gumbel (1964): "After the observations have been plotted, the straight line may be drawn by a ruler, provided that the scatter of the observations is sufficiently small. The question of acceptance or rejection of the probability function may be settled by mere inspection."

[2] Hahn and Shapiro (1967): "If a straight line appears to fit the data, draw such a line on the graph 'by eye'."

[3] Ferrell (1958), described by King (1971): "First, make a good 'eye-ball fit', using the straightedge. Then place a pencil point near the smallest plotted point. Pivot the straightedge around the pencil point until the points in the upper half (P>0.5) of the plot are divided into two equal parts. (Equal numbers of points above and below the upper half of the line.) This is readily done by counting. Next, shift the pencil point up near the largest plotted point on the new trial line and divide the points in the lower half (P<0.5) of the plot into two equal parts. Two or three such [...] points into an upper half and a lower half with respect to the straightedge."

Motivated by the need for a method of drawing an objective straight line on a Q-Q probability plot, Mage (1982) suggested a set of 10 rules for drawing such a line. The idea behind the 10 rules is to draw a straight line that minimizes the Kolmogorov-Smirnov statistic.

The uncertainty and subjectivity of the drawing of a straight line is one of the drawbacks of Q-Q probability plots. For the P-P probability plot, there is no confusion at all.
The unique best-fit straight line is the diagonal line joining the points (0,0) and (1,1). Another advantageous feature of the P-P probability plot is that the x-coordinate values depend only on the sample size and not upon the hypothesized distribution. In addition, the points always fall within the unit square and are not bunched as closely together in various regions as with Q-Q probability plots for certain distributions. An example is the exponential Q-Q probability plot, for which the points are usually bunched together at the left end of the plot. Furthermore, the variation of the points about the line is relatively small at the left end of the plot, since the variance of the $i$th ordered exponential random variable is given by

$$\mathrm{Var}(X_i) = \sum_{k=1}^{i} [\, 1/(n-k+1)^2 \,], \tag{3.5}$$

and the variance of the Y-coordinate values on an exponential Q-Q probability plot increases steadily from the lower end to the upper end. On the contrary, the variance of the $i$th uniform ordered random variable is

$$\mathrm{Var}(X_i) = \frac{i(n-i+1)}{(n+1)^2 (n+2)}, \tag{3.6}$$

and the quantiles are evenly spaced. So, the points will spread out within an oval-shaped band enclosing the diagonal.

4. Distribution curves on a P-P probability plot

The random variable $F[(X_i - \alpha)/\beta]$ is the $i$th ordered uniform random variable and hence $E\{F[(X_i - \alpha)/\beta]\} = i/(n+1)$. Thus, $F[(x_i - \alpha)/\beta]$ should be close to $i/(n+1)$. If $X$ is a random sample from a distribution with distribution function $F(\cdot)$, then the points $\{(F[(X_i - \alpha)/\beta],\ i/(n+1))\}$ of the resulting "F" P-P probability plot will fall roughly along the diagonal joining the points (0,0) and (1,1). However, if the sample is from a non-"F" distribution, then the points of the "F" P-P probability plot may fall along some curve on the "F" P-P probability plot. Figure 3.3 shows a normal P-P probability plot of a random sample of size twenty from the uniform (0,1) distribution generated using the generator RANUNI(9882017) (SAS Inc., 1982, p. 195).
Note that the points fall along a curve in Figure 3.3. There are specific curves corresponding to various non-"F" distributions for a particular "F" P-P probability plot. These curves will be called "distribution curves". These curves can be used just like the diagonal line as a measure of fit of a probability model to a set of data that is plotted on a P-P probability plot.

Figure 3.3. Normal P-P probability plot (horizontal axis: uniform probabilities)

The obvious problem is to produce these distribution curves for a particular "F" P-P probability plot. The following method is presented for obtaining the distribution curve for a random variable with distribution function $F_1 \ne F$.

5. Constructing distribution curves for "F" P-P probability plot

[1] For each $p_i = i/(n+1)$ compute $F_1^{-1}(p_i)$, for $i = 1, 2, \ldots, n$.

[2] Compute the maximum likelihood estimates of $\alpha$ and $\beta$ based on the $F_1^{-1}(p_i)$ values.

[3] Standardize the observations by computing $[F_1^{-1}(p_i) - \hat\alpha]/\hat\beta$.

[4] Compute the "F" probability $y_i = F\{[F_1^{-1}(p_i) - \hat\alpha]/\hat\beta\}$.

[5] Plot $y_i$ against $p_i$.

[6] Join the points to get a smooth curve (using a good graphics package).

For the uniform P-P probability plot, $\hat\alpha = \min x_i$ and $\hat\beta = \max x_i - \min x_i$. The smooth curve obtained is called the "$F_1$" distribution curve on an "F" P-P probability plot. Note that steps 2 through 5 are exactly what is required for constructing an "F" P-P plot. Distribution curves for the normal, exponential, Gumbel and uniform P-P plots are displayed in Figures 3.4 - 3.27. Figures 3.28, 3.29 and 3.30 are normal P-P probability plots of random samples of size 100 generated using RANCAU(9874127), RANEXP(2572191) and RANNOR(7250493) (SAS Inc., 1982, p. 195), respectively. The same random sample from the normal distribution is displayed on an exponential P-P probability plot in Figure 3.31. Program codes for constructing a normal P-P probability plot in the SAS and DISSPLA languages can be found in Appendix C.
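Steps [1] to [5] above can be sketched for the normal P-P plot as follows. This is an illustration only, not the SAS or DISSPLA code of Appendix C: the function names, the choice n = 99 and the exponential alternative are assumptions made for the example, and the maximum likelihood estimates for the normal case are taken to be the mean and the standard deviation with divisor n.

```python
import math
from statistics import NormalDist, fmean, pstdev


def normal_pp_curve(inv_cdf_alt, n=99):
    """Points (p_i, y_i) of the distribution curve of an alternative F1
    on a normal P-P plot. inv_cdf_alt is the inverse of F1."""
    # [1] plotting positions and the alternative's quantiles at them
    p = [i / (n + 1) for i in range(1, n + 1)]
    q = [inv_cdf_alt(pi) for pi in p]
    # [2] normal maximum likelihood estimates from the quantiles
    a_hat = fmean(q)
    b_hat = pstdev(q)
    # [3] and [4] standardize, then evaluate the normal distribution function
    phi = NormalDist().cdf
    y = [phi((qi - a_hat) / b_hat) for qi in q]
    # [5] the pairs (p_i, y_i) are the points to plot and join
    return p, y


def exponential_inv_cdf(p):
    """Inverse distribution function of the standard exponential."""
    return -math.log(1.0 - p)


p, y = normal_pp_curve(exponential_inv_cdf)
max_departure = max(abs(yi - pi) for pi, yi in zip(p, y))
```

Passing the inverse normal distribution function itself (`NormalDist().inv_cdf`) gives a curve that stays close to the diagonal, whereas the exponential alternative departs from it visibly, which is the behaviour the distribution curves are meant to display.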
A computer implementation of the technique of using distribution curves is now presented. The main idea is to find the best match between the plotted points and an "F" distribution curve. The "F" P-P plot is then constructed for additional support (using the diagonal line) of the chosen probability model. The matching procedure can be automated by matching curves and plotted points using certain criteria like least squares.

6. Computer implementation

[1] Input the observations.

[2] Select a plotting position; otherwise use the default.

[3] Select the type of P-P probability plot wanted: normal, exponential, uniform etc.

[4] Plot the points on the screen.

[5] Good match with the diagonal? Yes, stop.

[6] Select an alternative distribution. Any alternative distribution left? No, stop.

[7] Good match? Yes, go to [3] or stop.

The graphical and the quantitative methods ought to complement each other. A probability plot often imparts a greater impression of the nature of the data than a number. Shapiro and Wilk (1965) stated that "The formal use of the (one-dimensional) test statistic as a methodological tool in evaluating the normality of a sample is visualized by the authors as a supplement to the normal probability plot and not as a substitute for it." One solution is to incorporate a test statistic into a P-P or Q-Q probability plot using a simultaneous confidence band. Quesenberry and Hales (1980) suggested using the fact that $F[(X_i - \alpha)/\beta]$ is a beta$(i, n-i+1)$ random variable to construct $(1-\gamma)$ confidence intervals $(L_{i,\gamma}, U_{i,\gamma})$ for the Y-coordinate $F[(X_i - \alpha)/\beta]$. The end points of these confidence intervals are joined together to form a "concentration band". The main disadvantage of a concentration band is that the $(1-\gamma)$ concentration band does not give a $(1-\gamma)$ simultaneous confidence set for the entire probability plot. The probability that all the points of a sample will fall inside the concentration band will be less than $(1-\gamma)$.
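The pointwise intervals of a concentration band can be sketched with the standard library alone, using the identity that the distribution function of the i-th uniform order statistic, a beta(i, n-i+1) variable, equals a binomial tail probability. The function names and the bisection tolerance below are assumptions made for the illustration; this is not the procedure as published by Quesenberry and Hales, only one way to obtain equal-tailed beta quantiles.

```python
import math


def order_stat_cdf(p, i, n):
    """P(U_(i) <= p) for the i-th of n uniform order statistics, which has a
    beta(i, n-i+1) distribution: at least i of the n uniforms fall below p."""
    return sum(
        math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(i, n + 1)
    )


def order_stat_quantile(q, i, n, tol=1e-10):
    """Invert order_stat_cdf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if order_stat_cdf(mid, i, n) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def concentration_band(n, gamma=0.05):
    """Equal-tailed (1 - gamma) intervals (L_i, U_i), one per plotted point."""
    return [
        (
            order_stat_quantile(gamma / 2.0, i, n),
            order_stat_quantile(1.0 - gamma / 2.0, i, n),
        )
        for i in range(1, n + 1)
    ]


band = concentration_band(12)
```

Each interval is pointwise, so, as the text notes, the chance that all n points fall inside the band at once is smaller than 1 - gamma.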
Stirling (1982) showed how to construct a simultaneous confidence band on the P-P or Q-Q probability plots, corresponding to the Kolmogorov-Smirnov test statistic, but this type of band appears to be rather conservative.

Probability plots provide a qualitative method of assessing the goodness of fit of a probability model to a data set, and insights into any apparent lack of fit of a proposed probability model. The use of probability plots, accompanied by some quantitative tests like the $r^2$ and $k^2$ statistics, often provides an excellent tool for assessing the fit of a probability model.

Figure 3.4. Normal P-P probability plot (distribution curves: exponential, Laplace, Cauchy, Gumbel)
Figure 3.5. Normal P-P probability plot (distribution curves: gamma(0,1,2), logistic, gamma(0,1,5), gamma(0,1,10))
Figures 3.6 - 3.9. Normal P-P probability plots (curve legends illegible in the scan)
Figure 3.10. Gumbel P-P probability plot (distribution curves: exponential, normal, Laplace, logistic)
Figures 3.11 - 3.12. Gumbel P-P probability plots (curve legends illegible in the scan)
Figure 3.13. Gumbel P-P probability plot (distribution curves: beta(0.5,0.5), beta(1,1) or uniform, beta(0.5,1), beta(1,0.5))
Figure 3.14. Gumbel P-P probability plot (distribution curves: beta(2,2), beta(3,3), beta(2,3), beta(3,2))
Figure 3.15. Exponential P-P probability plot (distribution curves: Laplace, logistic, normal, Gumbel)
Figure 3.16. Exponential P-P probability plot (distribution curves: gamma(0,1,10), gamma(0,1,4), gamma(0,1,3), gamma(0,1,2))
Figures 3.17 - 3.20. Exponential P-P probability plots (curve legends illegible in the scan)
Figure 3.21. Uniform P-P probability plot (distribution curves: Cauchy, Laplace, logistic, normal)
Figure 3.22. Uniform P-P probability plot (distribution curves: symmetric triangle, Gumbel, exponential, uniform)
Figure 3.23. Uniform P-P probability plot (distribution curves: gamma(0,1,10), gamma(0,1,4), gamma(0,1,3), gamma(0,1,2))
Figures 3.24 - 3.26. Uniform P-P probability plots (curve legends illegible in the scan)
Figure 3.27. Uniform P-P probability plot (distribution curves: beta(2,2), beta(3,3), beta(2,3), beta(3,2))
Figure 3.28.
Normal P-P probability plot. Horizontal axis: uniform probabilities.
[Figure 3.29. Normal P-P probability plot. Alternatives shown: exponential, Laplace, Cauchy, Gumbel.]
[Figure 3.30. Normal P-P probability plot. Alternatives shown: exponential, Laplace, Cauchy, Gumbel.]
[Figure 3.31. Exponential P-P probability plot. Alternatives shown: normal, Laplace, Cauchy, Gumbel.]

IV. EMPIRICAL POWER COMPARISON

A. Methods of Computation

This section describes concisely the computing methods behind the Monte Carlo power comparison study. The power study consists of three separate power studies: normal, Gumbel and exponential power comparisons. The composite null hypothesis is

$F(x) = F_0[(x - \alpha)/\beta]$, (4.1)

where $F_0$ is either the normal, the Gumbel or the exponential cumulative distribution function. The unknown parameters α and β are the location and scale parameters, respectively.

1. Statistics used in the power comparison study

[1] Correlation type statistics: k², r² and W.
[2] Pearson chi-square and likelihood ratio statistics: X1/2, X1, X3, X5, X7, X10, X13, X17, G1/2, G1, G3, G5, G7, G10, G13 and G17.
[3] Statistics based on the empirical distribution function: A², W², U², V and D.
[4] Statistics based on moments: √b₁, b₂ and R.

Note that Xm or Gm refers to a chi-square or likelihood ratio statistic with expected cell count equal to m. X7 and G7 for sample size 20, X10 and G10 for sample size 50 and X17 and G17 for sample size 100 are based on a recommendation by Mann and Wald (1942). The Mann and Wald formulation was based on the equiprobability case for the simple null hypothesis. The W test was used for the normal power study only. The necessary coefficients for the W test were tabulated for sample sizes up to 50 only, so it is not used in the normal power study for sample size 100.

2.
Percentiles used in the power comparison study

Except for the Anderson-Darling statistic, the percentiles for the statistics based on the empirical distribution function for the normal power study were obtained from Stephens (1974, 1976). Percentiles for all the other statistics were generated using 15000 Monte Carlo samples. As a check of the accuracy of the percentiles used in the power study, two sets of 5000 random samples were generated for each statistic and the empirical Type I error levels were computed. Table 4.1 contains the empirical Type I error levels of the statistics for the test of normality. The empirical Type I error levels were reasonably close to the specified levels for all the statistics, but those of the chi-square and likelihood ratio statistics showed slightly more fluctuation. This is due to the discreteness of the chi-square and likelihood ratio statistics.

Since the percentiles for the Anderson-Darling statistic provided by Stephens (1975) for the test of normality consistently showed inflated Type I error levels, new Monte Carlo percentiles for the Anderson-Darling statistic were generated using 15000 samples. The Anderson-Darling statistic using these percentiles is denoted by A² and the one using the percentiles presented by Stephens (1974, 1976) is denoted by B².

Table 4.1. Empirical Type I error levels of the statistics based on two sets of 5000 random samples for the testing of departure from the normal distribution (sample size = 20)

                    Set 1                    Set 2
             Level of significance    Level of significance
Statistics    0.1    0.05   0.01       0.1    0.05   0.01
X1/2         .077   .039   .009       .070   .032   .008
X1           .094   .037   .009       .090   .031   .005
X3           .093   .047   .007       .093   .052   .008
X5           .100   .032   .007       .100   .030   .006
X7           .078   .031   .005       .086   .036   .007
G1/2         .107   .066   .008       .111   .061   .006
G1           .104   .050   .010       .094   .047   .008
G3           .098   .059   .009       .104   .058   .009
G5           .108   .044   .010       .113   .043   .008
G7           .078   .031   .008       .086   .036   .008
k², r², W *  .096   .048   .009       .103   .052   .012
             .092   .046   .012       .092   .044   .008
             .101   .054   .011       .096   .050   .011
√b₁          .100   .054   .009       .103   .051   .009
b₂           .105   .047   .010       .105   .046   .008
R            .102   .051   .010       .101   .048   .010
EDF *        .097   .048   .009       .099   .050   .010
             .098   .049   .008       .095   .048   .008
             .095   .048   .010       .097   .050   .009
             .098   .050   .010       .101   .050   .010
             .130   .070   .017       .131   .069   .017

* Row labels in these two blocks are only partly legible in the source (W in the first block; D, V, U² and B² in the second). The rows are given in source order.

3. Sample sizes and significance levels used in the power comparison study

Three significance levels (0.1, 0.05 and 0.01) and three sample sizes (20, 50 and 100) were considered in the power study. For each alternative distribution, 1000, 500 and 250 statistics were generated for sample sizes 20, 50 and 100, respectively. The estimated power levels therefore have larger variances for the larger sample sizes, where only 500 or 250 statistics were generated. However, with such a wide range of alternative distributions, one can obtain good estimates of the powers of these statistics by examining average results for various subsets of the alternatives.

4. Alternative distributions used in the power comparison study

The alternative distributions used in the power comparison study cover a wide range of shapes. They include symmetrical distributions like the Laplace and logistic distributions, skewed distributions like the chi-square and beta distributions, short tailed distributions like the uniform distribution, and heavy tailed distributions like the Cauchy distribution. Bimodal distributions, location or scale contaminated normal distributions and location or scale contaminated exponential distributions were also included.
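The remark in Section 3 about the larger variance of power estimates based on 500 or 250 simulated statistics can be made concrete with the usual binomial standard error. The following is a minimal Python sketch, not part of the original FORTRAN programs; it assumes each simulated sample is an independent accept/reject trial, and the names `power_standard_error`, `replications` and `worst_case` are illustrative.

```python
import math


def power_standard_error(power, replications):
    """Binomial standard error of an estimated rejection proportion."""
    return math.sqrt(power * (1.0 - power) / replications)


# Numbers of simulated statistics reported in the text, keyed by sample size.
replications = {20: 1000, 50: 500, 100: 250}

# The binomial variance is largest at power = 0.5, so this is the worst case.
worst_case = {n: power_standard_error(0.5, reps) for n, reps in replications.items()}

for n, se in worst_case.items():
    print(f"n = {n:3d}: {replications[n]:4d} replications, largest SE = {se:.4f}")
```

Under this assumption the largest standard error of a single estimated power is roughly 0.016, 0.022 and 0.032 for sample sizes 20, 50 and 100, which is why averaging over subsets of alternatives is helpful.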
The formulas for the alternative distributions used in the study are given in Appendix A. The skewness and kurtosis values for the sets of alternative distributions used in the normal, Gumbel and exponential power comparison study are given in Tables 4.2 - 4.4.

Table 4.2. Alternative distributions used in the normal power comparison study

No.  Distribution      √β₁    β₂   No.  Distribution      √β₁    β₂
  1  N(0,1)+N(10,1)      0  1.15    36  t(4)                0     -
  2  Beta(0.5,0.5)       0  1.50    37  t(2)                0     -
  3  N(0,1)+N(5,1)       0  1.51    38  t(1)                0     -
  4  SB(0,0.5)           0  1.63    39  Cauchy(0,1)         0     -
  5  N(0,1)+N(4,1)       0  1.72    40  SB(0.5333,0.5)   0.65  2.13
  6  Tukey(1.5)          0  1.75    41  TruncN(-2,1)     -.32  2.27
  7  Uniform(0,1)        0  1.80    42  Beta(3,2)        0.29  2.36
  8  SB(0,0.707)         0  1.87    43  Beta(2,1)        -.57  2.40
  9  Tukey(0.7)          0  1.92    44  TruncN(-3,2)     -.18  2.65
 10  TruncN(-1,1)        0  1.94    45  Weibull(3.6)     0.00  2.72
 11  N(0,1)+N(3,1)       0  2.04    46  Weibull(4)       -.09  2.75
 12  Tukey(3)            0  2.06    47  SB(1,2)          0.28  2.77
 13  Beta(2,2)           0  2.14    48  TruncN(-3,1)     -.55  2.78
 14  TruncN(-2,2)        0  2.36    49  SB(1,1)          0.73  2.91
 15  Triangle I(1)       0  2.40    50  Weibull(2.2)     0.51  3.04
 16  N(0,1)+N(2,1)       0  2.50    51  LoConN(0.2,3)    0.68  3.09
 17  TruncN(-3,3)        0  2.84    52  LoConN(0.2,5)    1.07  3.16
 18  N(0,1)+N(1,1)       0  2.92    53  LoConN(0.2,7)    1.25  3.20
 19  SU(0,3)             0  3.53    54  Weibull(2)       0.63  3.25
 20  t(10)               0  4.00    55  Half N(0,1)      0.97  3.78
 21  Logistic(0,1)       0  4.20    56  LoConN(0.1,3)    0.80  4.02
 22  SU(0,2)             0  4.51    57  LoConN(0.05,3)   0.68  4.35
 23  Tukey(10)           0  5.38    58  Gumbel(0,1)      1.14  5.40
 24  Laplace(0,1)        0  6.00    59  LoConN(0.1,5)    1.54  5.45
 25  ScConN(0.2,3)       0  7.54    60  SU(-1,2)         0.87  5.59
 26  ScConN(0.05,3)      0  7.65    61  Chi-square(4)    1.41  6.00
 27  ScConN(0.1,3)       0  8.33    62  LoConN(0.1,7)    1.96  6.60
 28  ScConN(0.2,5)       0  11.2    63  LoConN(0.05,5)   1.65  7.44
 29  ScConN(0.2,7)       0  12.8    64  Exponential(1)   2.00  9.00
 30  ScConN(0.1,5)       0  16.5    65  LoConN(0.05,7)   2.42  10.4
 31  ScConN(0.05,5)      0  20.0    66  Chi-square(1)    2.83  15.0
 32  ScConN(0.1,7)       0  21.5    67  Triangle II(1)   0.57  16.4
 33  ScConN(0.05,7)      0  31.4    68  Weibull(0.5)     6.62  87.7
 34  SU(0,1)             0  36.2    69  SU(1,1)          -5.3  93.4
 35  SU(0,0.9)           0  82.1    70  LogN(0,1,0)      6.18   114

Table 4.3. Alternative distributions used in the Gumbel power comparison study

No.  Distribution      √β₁    β₂   No.  Distribution      √β₁    β₂
  1  N(0,1)+N(10,1)      0  1.15    36  SU(0,0.9)           0  82.1
  2  Beta(0.5,0.5)       0  1.50    37  t(4)                0     -
  3  N(0,1)+N(5,1)       0  1.51    38  t(2)                0     -
  4  SB(0,0.5)           0  1.63    39  t(1)                0     -
  5  N(0,1)+N(4,1)       0  1.72    40  Cauchy(0,1)         0     -
  6  Tukey(1.5)          0  1.75    41  SB(0.5333,0.5)   0.65  2.13
  7  Uniform(0,1)        0  1.80    42  TruncN(-2,1)     -.32  2.27
  8  SB(0,0.707)         0  1.87    43  Beta(3,2)        0.29  2.36
  9  Tukey(0.7)          0  1.92    44  Beta(2,1)        -.57  2.40
 10  TruncN(-1,1)        0  1.94    45  TruncN(-3,2)     -.18  2.65
 11  N(0,1)+N(3,1)       0  2.04    46  Weibull(3.6)     0.00  2.72
 12  Tukey(3)            0  2.06    47  Weibull(4)       -.09  2.75
 13  Beta(2,2)           0  2.14    48  SB(1,2)          0.28  2.77
 14  TruncN(-2,2)        0  2.36    49  TruncN(-3,1)     -.55  2.78
 15  Triangle I(1)       0  2.40    50  SB(1,1)          0.73  2.91
 16  N(0,1)+N(2,1)       0  2.50    51  Weibull(2.2)     0.51  3.04
 17  TruncN(-3,3)        0  2.84    52  LoConN(0.2,3)    0.68  3.09
 18  N(0,1)+N(1,1)       0  2.92    53  LoConN(0.2,5)    1.07  3.16
 19  N(0,1)              0  3.00    54  LoConN(0.2,7)    1.25  3.20
 20  SU(0,3)             0  3.53    55  Weibull(2)       0.63  3.25
 21  t(10)               0  4.00    56  Half N(0,1)      0.97  3.78
 22  Logistic(0,1)       0  4.20    57  LoConN(0.1,3)    0.80  4.02
 23  SU(0,2)             0  4.51    58  LoConN(0.05,3)   0.68  4.35
 24  Tukey(10)           0  5.38    59  LoConN(0.1,5)    1.54  5.45
 25  Laplace(0,1)        0  6.00    60  SU(-1,2)         0.87  5.59
 26  ScConN(0.2,3)       0  7.54    61  Chi-square(4)    1.41  6.00
 27  ScConN(0.05,3)      0  7.65    62  LoConN(0.1,7)    1.96  6.60
 28  ScConN(0.1,3)       0  8.33    63  LoConN(0.05,5)   1.65  7.44
 29  ScConN(0.2,5)       0  11.2    64  Exponential(1)   2.00  9.00
 30  ScConN(0.2,7)       0  12.8    65  LoConN(0.05,7)   2.42  10.4
 31  ScConN(0.1,5)       0  16.5    66  Chi-square(1)    2.83  15.0
 32  ScConN(0.05,5)      0  20.0    67  Triangle II(1)   0.57  16.4
 33  ScConN(0.1,7)       0  21.5    68  Weibull(0.5)     6.62  87.7
 34  ScConN(0.05,7)      0  31.4    69  SU(1,1)          -5.3  93.4
 35  SU(0,1)             0  36.2    70  LogN(0,1,0)      6.18   114

Table 4.4. Alternative distributions used in the exponential power comparison study
No.  Distribution      √β₁    β₂   No.  Distribution      √β₁    β₂
  1  N(0,1)+N(10,1)      0  1.15    31  ScConN(0.05,5)      0  20.0
  2  Beta(0.5,0.5)       0  1.50    32  ScConN(0.1,7)       0  21.5
  3  N(0,1)+N(5,1)       0  1.51    33  ScConN(0.05,7)      0  31.4
  4  SB(0,0.5)           0  1.63    34  SU(0,1)             0  36.2
  5  N(0,1)+N(4,1)       0  1.72    35  SU(0,0.9)           0  82.1
  6  Tukey(1.5)          0  1.75    36  t(4)                0     -
  7  Uniform(0,1)        0  1.80    37  t(2)                0     -
  8  SB(0,0.707)         0  1.87    38  t(1)                0     -
  9  Tukey(0.7)          0  1.92    39  Cauchy(0,1)         0     -
 10  TruncN(-1,1)        0  1.94    40  SB(0.5333,0.5)   0.65  2.13
 11  N(0,1)+N(3,1)       0  2.04    41  TruncN(-2,1)     -.32  2.27
 12  Tukey(3)            0  2.06    42  Beta(3,2)        0.29  2.36
 13  Beta(2,2)           0  2.14    43  Beta(2,1)        -.57  2.40
 14  TruncN(-2,2)        0  2.36    44  TruncN(-3,2)     -.18  2.65
 15  Triangle I(1)       0  2.40    45  Weibull(3.6)     0.00  2.72
 16  N(0,1)+N(2,1)       0  2.50    46  Weibull(4)       -.09  2.75
 17  TruncN(-3,3)        0  2.84    47  SB(1,2)          0.28  2.77
 18  N(0,1)              0  3.00    48  TruncN(-3,1)     -.55  2.78
 19  SU(0,3)             0  3.53    49  SB(1,1)          0.73  2.91
 20  t(10)               0  4.00    50  Weibull(2.2)     0.51  3.04
 21  Logistic(0,1)       0  4.20    51  LoConN(0.2,3)    0.68  3.09
 22  SU(0,2)             0  4.51    52  LoConN(0.2,5)    1.07  3.16
 23  Tukey(10)           0  5.38    53  LoConN(0.2,7)    1.25  3.20
 24  Laplace(0,1)        0  6.00    54  TruncE(0,3)      0.99  3.22
 25  ScConN(0.2,3)       0  7.54    55  Weibull(2)       0.63  3.25
 26  ScConN(0.05,3)      0  7.65    56  LoConE(0.2,7)    1.33  3.27
 27  ScConN(0.1,3)       0  8.33    57  LoConE(0.2,5)    1.25  3.40
 28  ScConN(0.2,5)       0  11.2    58  Half N(0,1)      0.97  3.78
 29  ScConN(0.2,7)       0  12.8    59  LoConN(0.1,3)    0.80  4.02
 30  ScConN(0.1,5)       0  16.5    60  LoConE(0.2,3)    1.20  4.09

Table 4.4 (continued)
No.  Distribution      √β₁    β₂   No.  Distribution      √β₁    β₂
 61  TruncE(0,4)      1.27  4.20    77  ScConE(0.1,2)    2.61  15.3
 62  LoConN(0.05,3)   0.68  4.35    78  ScConE(0.2,2)    2.71  15.6
 63  TruncE(0,5)      1.50  5.26    79  LoConE(0.01,7)   2.94  15.9
 64  Gumbel(0,1)      1.14  5.40    80  Triangle II(1)   0.57  16.4
 65  LoConN(0.1,5)    1.54  5.45    81  ScConE(0.01,3)   2.59  18.0
 66  SU(-1,2)         0.87  5.59    82  ScConE(0.2,3)    3.57  23.8
 67  LoConE(0.1,3)    1.62  5.86    83  ScConE(0.1,3)    3.81  29.4
 68  Chi-square(4)    1.41  6.00    84  ScConE(0.05,3)   3.60  29.8
 69  LoConE(0.1,5)    1.88  6.02    85  ScConE(0.2,7)    4.50  31.5
 70  TruncE(0,6)      1.68  6.29    86  ScConE(0.1,5)    5.38  48.7
 71  LoConN(0.1,7)    1.96  6.60    87  ScConE(0.1,7)    6.02  56.2
 72  LoConE(0.05,3)   1.85  7.29    88  ScConE(0.01,5)   4.81  66.7
 73  LoConN(0.05,5)   1.65  7.44    89  ScConE(0.05,5)   6.05  68.2
 74  LoConE(0.05,7)   2.75  10.9    90  Weibull(0.5)     6.62  87.7
 75  ScConE(0.05,2)   2.42  13.6    91  SU(1,1)          -5.3  93.4
 76  Chi-square(1)    2.83  15.0    92  LogN(0,1,0)      6.18   114

5. Random variates generators

Hoaglin and Andrews (1975) emphasized the importance of comprehensive and concise reporting of computing methods. The two uniform pseudo-random number generators used in the power comparison study, RANDOM (Wichmann and Hill, 1982a) and DSMCG, are described here in detail. To obtain a uniform (0,1) random number using RANDOM, three integers IX, IY and IZ are generated using three different multiplicative congruential generators:

IX = MOD (171*IX, 30269),
IY = MOD (172*IY, 30307) and (4.2)
IZ = MOD (170*IZ, 30323).

The uniform (0,1) random number U is then given by the fractional part of

U = IX/30269 + IY/30307 + IZ/30323. (4.3)

The results of tests of uniformity and randomness of the generator RANDOM can be found in Wichmann and Hill (1982b). DSMCG (double shuffled multiplicative congruential generator) employs six multiplicative congruential generators: PICKG, GENT1, GENT2, GENT3, GENT4 and PICKR.
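The recurrences (4.2) and the combination step (4.3) for RANDOM can be sketched in a few lines. This is an illustrative Python transcription, not the FORTRAN 77 routine used in the study; the class name `WichmannHill` and the seed values are assumptions chosen for the example.

```python
class WichmannHill:
    """Sketch of the RANDOM generator defined by equations (4.2) and (4.3)."""

    def __init__(self, ix, iy, iz):
        # Three integer seeds, one per multiplicative congruential generator.
        self.ix, self.iy, self.iz = ix, iy, iz

    def random(self):
        # Equation (4.2): advance the three generators.
        self.ix = (171 * self.ix) % 30269
        self.iy = (172 * self.iy) % 30307
        self.iz = (170 * self.iz) % 30323
        # Equation (4.3): fractional part of the sum of the three ratios.
        return (self.ix / 30269 + self.iy / 30307 + self.iz / 30323) % 1.0


# Arbitrary illustrative seeds (not taken from the study).
generator = WichmannHill(ix=12345, iy=23456, iz=7890)
sample = [generator.random() for _ in range(5)]
```

Each call advances all three integer states, so the same three seeds always reproduce the same sequence.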
Figure 4.1 is a flowchart of the generator DSMCG.

[Figure 4.1. Flowchart of DSMCG generator. PICKG produces IG, which selects the generator that produces IX: GENT1 if IG ≤ 7576, GENT2 if 7576 < IG ≤ 15134, GENT3 if 15134 < IG ≤ 22701 and GENT4 if IG > 22701. PICKR produces IR, which selects the store that delivers IX: STORE1 if IR ≤ 7576, STORE2 if 7576 < IR ≤ 15134, STORE3 if 15134 < IR ≤ 22701 and STORE4 if IR > 22701.]

The multiplicative congruential generators used by DSMCG are

IG = MOD (171*IG, 30269), (4.4)
IR = MOD (172*IR, 30307) and (4.5)
IX = MOD (170*IX, 30323). (4.6)

PICKG uses (4.4), PICKR uses (4.5) and GENT1, GENT2, GENT3 and GENT4 use (4.6) to generate a uniform integer. The steps in generating a uniform (0,1) random number using DSMCG are as follows:

6. Algorithm for DSMCG

[1] Supply 5 seeds in the range [10000,30000].
[2] An initialization routine is run so that GENTi (i=1,2,3,4) will each generate a random number and store it in STOREi.
[3] Generate IG using PICKG.
[4] Select one of the GENTi based on IG (see Figure 4.1) and generate IX.
[5] Generate IR using PICKR.
[6] Select one of the STOREi based on IR (see Figure 4.1) and deliver IX from STOREi as IX/30323.
[7] Put IX generated in [4] in STOREi.
[8] Go to [3] for the next IX.

Some tests of randomness on DSMCG indicated that DSMCG has very good randomness properties. It also has moderately good uniform properties. The results of these tests can be found in Gan (1985). The power comparison results obtained from these two generators are very similar. Appendix B contains a complete description of the generation of random numbers from the various distributions used in the power comparison study.

7. Machines used in the power comparison study

The IBM personal computer (IBM PC) with an 8087 coprocessor was used for the entire power comparison study. All the computer programs needed were developed from scratch since reliable subroutines like those provided by IMSL were not available on the IBM PC. All programs were written in FORTRAN 77; a description of FORTRAN 77 can be found in the Microsoft FORTRAN Reference Manual (1983).
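The DSMCG steps listed in Section 6 can be sketched as follows. This is a hedged Python illustration rather than the original FORTRAN 77 code. It assumes that each of the six generators carries its own seed, that GENT1 to GENT4 all use the IX recurrence with modulus 30323 (matching the divisor in step [6]), and that the thresholds are those shown in Figure 4.1. The class name `DSMCG`, its argument names and the seed values are hypothetical.

```python
class DSMCG:
    """Sketch of the double shuffled multiplicative congruential generator."""

    THRESHOLDS = (7576, 15134, 22701)  # cut points read from Figure 4.1

    def __init__(self, pickg_seed, pickr_seed, gent_seeds):
        self.ig = pickg_seed
        self.ir = pickr_seed
        self.gent = list(gent_seeds)  # states of GENT1..GENT4
        # Step [2]: each GENTi generates one number and stores it in STOREi.
        self.store = []
        for i in range(4):
            self.gent[i] = (170 * self.gent[i]) % 30323
            self.store.append(self.gent[i])

    @classmethod
    def _select(cls, value):
        # Map IG or IR to an index 0..3 using the cut points.
        for index, bound in enumerate(cls.THRESHOLDS):
            if value <= bound:
                return index
        return 3

    def random(self):
        self.ig = (171 * self.ig) % 30269              # step [3]: PICKG
        g = self._select(self.ig)                      # step [4]: choose GENTi
        self.gent[g] = (170 * self.gent[g]) % 30323    # ... and generate IX
        new_ix = self.gent[g]
        self.ir = (172 * self.ir) % 30307              # step [5]: PICKR
        s = self._select(self.ir)                      # step [6]: choose STOREi
        delivered = self.store[s]
        self.store[s] = new_ix                         # step [7]: refill STOREi
        return delivered / 30323                       # step [6]: deliver IX/30323


# Hypothetical seeds in the range [10000, 30000].
dsmcg = DSMCG(12345, 23456, (10001, 10002, 10003, 10004))
draws = [dsmcg.random() for _ in range(1000)]
```

The number delivered on a given call was generated on an earlier call and held in one of the four stores, so the output order is shuffled twice: once by the choice of generator and once by the choice of store.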
Each subroutine developed was thoroughly tested, using at least two different methods.

B. Results of the Power Comparison

This section summarizes the results from the Monte Carlo power comparison. Some general and specific observations concerning the performance of these statistics will be made. Each class of statistics will be studied separately and then an overall comparison will be made.

1. Comparisons among statistics of the correlation coefficient type

The numbers in Tables 4.6 - 4.12 indicate the proportions of simulated samples for which the null distribution was rejected. A number in the column for W is printed in bold if it is greater than or equal to the corresponding number in the r² column. The k² and the r² statistics were contrasted by printing in bold the larger of the two numbers for each alternative distribution. In the event of a tie, both numbers were printed in bold.

The Shapiro-Wilk statistic is the most powerful in detecting alternative distributions with kurtosis less than 3. However, the r² statistic is more powerful than the Shapiro-Wilk statistic in detecting symmetrical alternative distributions with kurtosis greater than 3. In order to understand the difference between the Shapiro-Wilk and the r² statistics, the coefficients used in computing them ought to be examined. The Shapiro-Wilk and the r² statistics can be expressed as

$$W = \frac{\left(\sum_{i=1}^{n} w_i X_{(i)}\right)^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$ (4.7)

and

$$r^2 = \frac{\left(\sum_{i=1}^{n} r_i X_{(i)}\right)^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$ (4.8)

where $X_{(1)} \le \dots \le X_{(n)}$ denote the ordered observations. The coefficients $w_i$ and $r_i$ for certain selected sample sizes are listed in Table 4.5. The r² statistic puts more weight in the tails than at the center of the null distribution in the sense that $\sum r_i EX_{(i)} > \sum w_i EX_{(i)}$, although $w_1 \le r_1$ and $w_n \ge r_n$.

Table 4.5. Comparison between the coefficients used in computing the Shapiro-Wilk and r² statistics

Coefficients (i = [n/2]+1, ..., n)

  n     r_i     w_i    EX_(i)
  5   .2876   .2413    .4950
      .6460   .6646   1.1630
 10   .0458   .0399    .1227
      .1399   .1224    .3758
      .2425   .2141    .6561
      .3644   .3291   1.0014
      .5355   .5739   1.5388
 20   .0154   .0140    .0620
      .0463   .0422    .1870
      .0780   .0711    .3149
      .1108   .1013    .4483
      .1457   .1334    .5903
      .1834   .1686    .7454
      .2255   .2085    .9210
      .2748   .2565        -
      .3370   .3211        -
      .4295   .4734    1.867

(- : entry not legible in the source.)

The Shapiro-Wilk statistic performed better than the r² statistic in detecting skewed distributions, except the location contaminated normal distributions. The Shapiro-Wilk statistic generally performed better than the k² statistic. The r² statistic generally performed better than the k² statistic in detecting alternative distributions with tails heavier or longer than that of the null distribution. As the kurtosis of the null distribution increases from 3 to 9, the relative performance of the k² statistic improved. The r² statistic is more powerful in detecting alternative distributions like the location or scale contaminated normal distributions for the normal power comparison, and location or scale contaminated exponential distributions for the exponential power comparison.

[Figure 4.2. Sketch of the cumulative distribution function of the standard normal random variable]

Figure 4.2 is a sketch of the cumulative distribution function of the standard normal random variable. A small change in x near the location of the normal distribution causes a larger change in the cdf F(x) than an equivalent change in x further out in the tail. This is clear from the diagram since the slope of F(·) is largest at the location. The k² statistic, which is based on the distribution function, is thus sensitive to deviations occurring near the location of the null distribution. To understand the r² statistic, a sketch of the density function of the standard normal random variable is helpful. Figure 4.3 is a sketch of the density function of the standard normal random variable.
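The sensitivity argument based on the slope of the distribution function can be checked numerically for the standard normal case. The following is a small Python sketch using the standard library's `statistics.NormalDist`; the step sizes and evaluation points are arbitrary illustrative choices, and the helper names are not from the source.

```python
from statistics import NormalDist

std_normal = NormalDist(0.0, 1.0)


def cdf_change(x, dx=0.1):
    """Change in F(x) produced by moving x to x + dx."""
    return std_normal.cdf(x + dx) - std_normal.cdf(x)


def percentile_change(p, dp=0.01):
    """Change in the percentile F^{-1}(p) produced by moving p to p + dp."""
    return std_normal.inv_cdf(p + dp) - std_normal.inv_cdf(p)


centre_cdf = cdf_change(0.0)          # same dx at the location ...
tail_cdf = cdf_change(2.0)            # ... and in the upper tail
centre_pct = percentile_change(0.50)  # same dp at the location ...
tail_pct = percentile_change(0.98)    # ... and in the upper tail

print(f"change in F:      centre {centre_cdf:.4f}, tail {tail_cdf:.4f}")
print(f"change in F^-1:   centre {centre_pct:.4f}, tail {tail_pct:.4f}")
```

The same step in x moves F(x) most near the location, while the same step in p moves the percentile most in the tail, which is the contrast drawn between statistics built on probabilities and statistics built on percentiles.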
The same change in the probability p causes a greater change of the percentile $F^{-1}(p)$ at the tails than at the location. Consequently, the r² statistic, which is based on percentiles, is more sensitive to deviations in the tails of the hypothesized distribution.

[Figure 4.3. Sketch of the density function of the standard normal random variable]

For the normal case, the k² statistic is slightly more powerful than the r² statistic in detecting very close alternative distributions like the Weibull(4) distribution. Unlike the normal case, where the kurtosis provided a sharp division for the performance of the k² and r² statistics, the division is less obvious for the Gumbel case. For the Gumbel case, the k² statistic is usually more powerful than the r² statistic except for some alternative distributions with large kurtosis values. The k² statistic is much more powerful than the r² statistic for most alternative distributions in the exponential case. The r² statistic performed better than the k² statistic for some alternative distributions with very small or large kurtosis values. For certain alternative distributions, the k² statistic is more powerful than the r² statistic when the sample size is small but the trend reverses when the sample size is large.

Table 4.6. Empirical 5% level power (in % × 10) for tests of departure from the normal distribution
n = 20 Sample sizes Statistics W No.
Distribution 1 2 3 U 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 N(0,1)+N(10,1) Beta(0.5,0.5) N(0,1)+N(5,1) SB(0,0.5) N(0,1)+N(4,1) Tukeyd .5) Uniforra(0,1 ) SB(0,0.707) Tukey(0.7) TruncN(-l,1) N(0,1)+N(3,1) Tukey(3) Beta(2,2) TruncN(-2,2) Triangle 1(1) N(0,1)+N(2,1) TruncN(-3,3) N(0,1)+N(1,1) SU((0,3) t(10) Logistic(0,1) SU(0,2) Tukey(IO) Laplace(0,1) ScConN(0.2,3) ScConN(0.05,3) ScConN(0.1,3) ScConN(0.2,5) ScConN(0.2,7) ScConN(O.T,5) ScConN(0.05,5) ScConN(0.1,7) ScConN(0.05,7) SU(0,1) SU(0,0.9) r: n = 100 n = 50 k: W r: k: r: k: Bz 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1 .50 1 .51 1.63 1 .72 1.75 1 .80 1 .87 1 .92 1 .94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 2.92 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 1 2.8 16.5 20.0 21.5 31 .4 36.2 82.1 1000 729 759 443 403 267 189 136 135 in 135 77 44 32 38 43 43 50 66 96 108 128 816 244 381 196 318 704 845 525 340 689 457 429 508 1000 1000 284 438 356 787 109 247 117 438 46 141 33 118 81 21 29 93 14 79 31 156 20 50 11 39 10 33 41 11 47 23 50 36 45 51 86 61 119 81 126 84 143 97 863 938 316 226 452 255 211 120 346 192 766 627 874 803 562 406 347 234 704 587 475 382 504 380 575 475 1000 1000 1000 992 942 920 876 746 6l 6 624 402 390 224 84 82 68 34 44 64 118 118 158 996 422 604 316 446 948 988 822 608 930 770 690 818 1000 1000 954 908 974 1000 656 662 638 930 320 388 284 372 142 266 104 206 78 152 112 364 70 30 12 68 4 46 14 48 12 50 30 54 64 54 138 66 228 102 250 82 290 11 4 1000 1000 664 488 784 458 468 162 610 250 976 890 996 982 908 654 694 402 952 850 808 596 832 666 922 814 1000 1000 1000 1000 1000 1000 1000 1000 984 1000 912 788 832 700 644 6l 2 496 444 480 376 360 764 176 1 44 88 176 8 60 8 40 15 120 4 32 56 .36 176 68 252 64 460 136 472 158 1000 1000 848 776 932 696 660 188 844 392 996 988 1000 1000 980 856 924 564 1000 980 948 780 964 892 996 964 127 Table 4.7. Empirical 5? 
level power (in % xlO) for tests of departure from the normal distribution n = 20 Sample sizes Statistics No. Distribution 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 t(4) t(2) t(1) Cauchy(0,1 ) 38(0.5333,0.5) TruncN(-2,1 ) Beta(3,2) Beta(2,1) TruncN(-3,2) Weibull(3.6) Weibull(4) SB(1,2) TruncN(-3,1) SB(1,1) Weibull(2.2) LoConN(0.2,3) LoConN(0.2,5) LoConN(0.2,7) Weibull(2) Half N(0,1) LoConN(0.1,3) LoConN(0.05,3) GumbeKO,1 ) LoConN(0.1,5) SU(-1,2) Chi-Square(4) LoConN(0.1,7) LoConN(0.05,5) Exponential(1) LoConN(0.05,7) Chi-square(1) Triangle 11(1) Weibull(0,5) SU(1,1) LogN(0,1,0) W r^ n = 50 1<== n =100 r^ r: 560 296 906 804 996 998 996 994 998 998 1000 972 964 416 86 1 66 244 56 108 886 576 584 48 66 26 14 50 50 40 18 38 64 90 42 486 270 210 806 602 552 260 192 152 602 516 594 1000 1000 1000 1000 1000 1000 400 310 204 944 850 722 470 574 370 310 444 190 618 606 470 978 988 900 390 492 296 958 890 804 992 992 974 854 898 536 1000 998 986 936 940 790 1000 1000 1000 884 590 580 1000 1000 1000 972 980 950 1000 1000 998 768 1000 1000 1000 1000 376 W k" Ba 0 218 0 507 0 873 0 868 0.65 2.13 725 96 -;32 2.27 0.29 2.36 63 316 -.57 2.40 -;18 2.65 46 0.00 2,72 44 -.09 2.75 40 0.28 2.77 56 164 -.55 2.78 0.73 2.91 312 0.51 3.04 111 0.68 3.09 273 1.07 3.16 887 1.25 3.20 985 0.63 3.25 173 0.97 3.78 445 0.80 4.02 258 0.68 4.35 210 1.14 5.40 314 1.54 5.45 775 0.87 5.59 216 1.41 6.00 519 1.96 6.60 878 1.65 7.44 539 2.00 9.00 832 2.42 10.4 649 2.83 15.0 989 0.57 16.4 301 6.62 87.7 1000 745 -5.3 93.4 6.18 114 934 255 158 563 442 898 865 898 869 421 555 44 78 26 54 158 225 25 45 26 52 61 32 62 37 105 106 205 232 88 89 222 263 785 848 978 966 133 127 329 289 270 208 225 133 282 240 772 611 220 162 437 384 875 803 559 336 738 703 652 551 959 952 151 221 997 996 727 676 893 860 372 812 448 948 1000 1000 1000 388 196 244 988 920 28 60 24 56 20 56 136 164 672 488 956 900 388 284 884 908 1000 1000 1000 1000 592 444 996 940 864 664 
712 328 936 784 1000 992 712 452 996 988 1000 1000 988 832 1000 1000 1000 932 1000 1000 980 912 1000 1000 1000 1000 1000 1000 128 Table 4.8. Empirical 5% level power (in % xlO) for tests of departure from the Gumbel distribution n = 20 Sample sizes Statistics No. Distribution 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 N(0,1)+N(10,1) Beta(0.5,0.5) N(0,1)+N(5,1) SB(0,0.5) N(0,1)+N(4,1) Tukey(1.5) Uniform(0,l) SB(0,0.707) Tukey(0.7) TruncN(-1,1) N(0,1)+N(3,1) Tukey(3) Beta(2,2) TruncN(-2,2) Triangle 1(1) N(0,1)+N(2,1) TruncN(-3,3) N(0,1)+N(1,1) N(0,1) SU((0,3) t(10) Logistic(0,1) SU(0,2) TukeydO) Laplace(0,1) ScConN(0.2,3) ScConN(0.05,3) ScConNCO.1,3) 29 ScConNCO.2,5) 30 31 32 33 34 ScConN(0.2,7) ScConN(0.1,5) ScConNCO.05,5) ScConN(0.1,7) ScConN(0.05,7) 35 SU(0,1) n = 50 n = 100 r" k2 r: kz r2 kz 991 400 464 226 243 1 64 1 42 116 92 104 1 24 1000 1000 543 934 1000 944 996 814 898 674 604 548 540 494 558 462 396 378 406 352 382 384 422 466 500 538 504 1000 760 792 532 626 936 982 1000 1000 1000 1000 1000 996 984 960 972 940 904 908 804 1000 1000 1000 1000 1000 62 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1 .50 1 .51 1 .63 1 .72 1 .75 1 .80 1 .87 1.92 1 .94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 2.92 3.00 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 1 2.8 16.5 20.0 21.5 31.4 36.2 83 70 86 76 87 106 109 114 138 174 192 190 766 310 386 249 318 765 337 462 263 240 215 200 206 236 167 160 176 174 156 173 170 196 223 253 253 279 963 441 422 270 334 936 81 2 676 598 544 436 426 388 384 298 244 184 204 192 176 222 200 264 298 318 340 972 500 680 432 580 930 640 656 644 468 564 560 568 668 620 636 1000 808 880 696 824 336 656 646 505 560 421 874 734 457 503 932 782 724 1000 1000 960 924 996 952 852 940 700 816 565 398 656 652 789 515 990 858 814 952 960 900 880 852 916 816 756 716 700 708 664 704 728 764 832 824 844 1000 968 952 764 872 1000 1000 960 904 984 936 988 129 Table 4.9. 
Empirical 5% level power (in % xlO) for tests of departure from the Gumbel distribution n = 20 Sample sizes Statistics No. Distribution 35 SU(0,0.9) 37 t(4) 38 t(2) 39 t(1) 40 CauGhy(0,l ) 41 88(0.5333,0.5) 42 TruncN(-2,l) 43 Beta(3,2) 44 Beta(2,l) 45 TruncN(-3,2) 45 Weibull(3.6) 47 Weibull(4) 48 SB(1,2) 49 TruncN(-3,l) 50 SB(1,1) 51 Weibull(2.2) 52 LoConN(0.2,3) 53 LoConN(0.2,5) 54 LoConN(0.2,7) 55 Weibull(2). 56 Half N(0,1) 57 LoConN(0.1,3) 58 LoConN(0.05,3) 59 LoConN(0.1,5) 60 SU(-1,2) 51 Chi-square(4) 62 LoConN(0.1,7) 53 LoConN(0.05,5) 54 ExponentiaK1) 55 LoConN(0.05,7) 55 Chi-square(l ) 57 Triangle 11(1) 68 Weibull(0.5) 59 SU(1,1) 70 LogN(0,l,0) r^ /6i 82 0 0 0 0 0 0.65 -.32 0.29 -.57 -.18 0.00 -.09 0.28 -.55 0.73 0.51 0.58 1.07 1.25 0.63 0.97 0.80 0.58 1.54 0.87 1.41 1.96 1.55 2.00 2.42 2.83 0.57 5.62 -5.3 6.18 82.1 523 291 525 862 2.13 2.27 2.36 2.40 2.65 2.72 2.75 2.77 2.78 2.91 3.04 3.09 3.16 3.20 3.25 3.78 4.02 4.35 5.45 5.59 6.00 6.60 7.44 9.00 10.4 15.0 16.4 87.7 93.4 114 844 33 251 239 590 117 70 132 35 395 5 14 13 147 506 15 13 41 53 279 79 68 731 261 225 594 533 10 874 897 512 n = 50 r^ 569 358 550 872 846 291 349 299 591 210 177 218 92 41 4 56 69 59 306 701 47 64 86 133 1 54 104 50 404 157 247 234 704 79 940 912 484 795 504 844 998 994 136 762 656 972 352 188 282 46 906 2 8 4 266 988 4 12 18 64 382 890 660 892 994 998 800 746 686 940 500 440 508 168 852 96 120 84 794 980 72 142 164 222 396 80 158 85 136 988 708 454 302 582 312 914 384 860 998 12 218 1000 1000 998 1000 768 874 n = 100 r=^ k: 960 760 972 1000 1000 728 1000 996 1000 888 558 692 136 1000 8 28 0 836 1000 8 4 15 76 736 1 40 116 1000 704 560 996 996 64 1000 1000 952 1000 904 976 1000 1000 996 980 964 1000 800 728 808 328 988 176 188 156 996 1000 120 316 252 420 684 260 188 940 472 920 644 1000 436 1000 1000 1000 130 Table 4.10. Empirical 5% level power (in % xlO) for tests of departure from the exponential distribution n = 20 Sample sizes Statisti OS No. 
Distribution 1 N(0,1)+N(10,1) 2 Beta(0.5,0.5) 3 N(0,l)+N(5,1) 4 SB(0,0.5) 5 N(0,1)+N(4,l) 6 TukeyCl.S) 7 Uniforra(0,1) 8 SB(0,0.707) 9 Tukey(0.7) 10 TruncN(-1,1) 11 N(0,1)+N(3,l) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 1 5 Triangle 1(1) 16 N(0,1)+N(2,1) 17 TruncN(-3,3) 18 N(0,1) 19 SU((0,3) 20 t(10) 21 Logistic(0,1) 22 SU(0,2) 23 Tukey(IO) 24 Laplace(0,1) 25 SoConN(0.2,3) 26 ScConN(0.05,3) 27 ScConN(0.1,3) 28 ScConN(0.2,5) 29 ScConN(0.2,7) 30 ScConN{0,1,5) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 n = 100 k^ rz k: r^ k2 955 973 1000 692 728 600 600 641 580 996 990 1000 970 611 555 529 487 505 485 511 451 477 419 417 438 439 450 441 442 453 472 782 524 532 472 508 685 793 640 625 634 625 650 682 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 632 1000 824 1000 1000 r^ /Si n = 50 62 1.15 1 .50 1.51 1.63 1.72 1 .75 1 .80 1 .87 1.92 1.94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 3.00 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 12.8 16.5 982 964 982 974 968 966 701 964 703 747 800 818 817 783 858 864 866 874 881 880 993 903 884 874 851 888 914 862 938 972 948 924 906 892 878 850 824 864 822 830 982 856 834 810 344 944 970 944 982 942 976 962 966 974 968 986 988 996 994 998 1000 994 1000 1000 998 1000 1000" 1000 1000 998 1000 994 998 1000 998 998 996 1000 1000 996 1000 980 1000 980 1000 988 1000 1000 1000 980 1000 964 1000 980 976 996 1000 996 1000 1000 1000 1000 1000 131 Table 4.11. Empirical 5% level power (in % xlO) for tests of departure from the exponential distribution n = 20 Sample sizes Statistics No. 
Distribution 0 31 ScConN(0.05,5) 0 32 ScConN(0.1,7) 0 33 ScConN(0.05,7) 3M SU(0,1) 0 0 35 SU(0,0.9) 36 t(it) 0 0 37 t(2) 38 t(1) 0 0 39 Cauchy(0,1) 40 38(0.5333,0.5) 0.,65 41 TruncN(-2,1) .32 42 Beta(3,2) 0.29 43 Beta(2,1) .57 44 TruncN(-3,2) 18 45 Weibull(3.5) 0.GO 46 Weibull(4) 09 47 SB(1,2) 0.28 48 TruncN(-3,1) 55 49 SB(1,1) 0.73 50 Weibull(2.2) 0.51 51 LoConN(0.2,3) 0.68 52 LoConN(0.2,5) 1.07 53 LoConN(0.2,7) 1. 25 54 TruncE(0,3) 0.99 55 Weibull(2) 0.63 56 LoConE(0.2,7) 1.33 57 LoConE(0.2,5) 1.25 58 Half N(0,1) 0.97 59 LoConN(0.1,3) 0.80 60 LoConE(0.2,3) 1.20 n = 50 n = 100 r^' k" r" kz r== kz 541 723 627 566 848 873 858 912 910 874 910 957 960 107 908 892 956 885 851 861 709 951 225 589 551 359 475 71 524 82 95 187 680 108 904 962 940 998 1000 998 998 1000 998 998 1000 1000 288 1000 1000 1000 998 1000 1000 990 1000 498 944 912 866 984 86 900 156 144 360 968 154 992 996 996 984 1000 1000 1000 1000 1000 1000 1000 1000 1000 632 1000 1000 1000 1000 1000 1000 1000 1000 824 1000 996 996 1000 156 1000 260 308 652 1000 256 Bz 20.0 21 .5 31.4 36.2 82.1 632 498 623 2.13 2.27 2.36 2.40 2.65 2.72 2.75 2.77 2.78 2.91 3.04 3.09 3.16 3.20 3.22 3.25 3.27 3.40 3.78 4.02 4.09 854 857 69 751 711 920 550 466 477 246 855 38 152 69 40 161 13 107 36 32 48 133 46 874 908 846 932 994 990 286 1000 1000 1000 964 854 942 650 998 88 322 150 96 554 6 260 38 38 50 194 40 992 984 1000 1000 1000 796 1000 1000 1000 1000 1000 1000 936 1000 264 704 396 428 976 32 512 44 48 48 348 44 132 Table 4.12. Empirical 5% level power (in % xlO) for tests of departure from the exponential distribution n = 20 Sample sizes Statistics No. 
Distribution r== r" /6i 61 TruncE(0,4) 1.27 62 LoConN(0.05,3) 0.68 1.50 63 TruncE(0,5) 64 Gumbel(0,1) 1.14 65 LoConN(0.1,5) 1.54 66 SU(-1,2) 0.87 67 LoConE(0.1,3) 1.62 68 Chi-Square(4) 1.41 69 LoConE(0.1,5) 1.88 1 .68 70 TruncE(0,6) 71 LoConN(0.1,7) 1.96 72 LoConE(0.05,3) 1.85 73 LoConN(0.05,5) 1.65 74 LoConE(0.05,7) 2.75 75 ScConE(0.05,2) 2.42 76 Chi-square(1) 2.83 77 ScConE(0.1,2) 2.61 78 ScConE(0.2,2) 2.71 79 LoConE(0.01 ,7) 2.94 80 Triangle 11(1) 0.57 81 ScConE(0.01,3) 2.59 82 ScConE(0.2,3) 3.57 83 ScConE(0.1,3) 3.81 84 SoConE(0.05,3) 3.60 85 ScConE(0.2,7) 4,50 86 ScConE(0.1,5) 5.38 87 ScConE(0.1,7) 6.02 88 ScConE(0.01,5) 4.81 89 ScConE(0.05,5) 6.05 90 Weibull(0.5) 6.62 91 SU(1,1) -5.3 6.18 92 LogN(0,1,0) n = 50 n = 100 r^ k2 0 76 1000 68 1000 1000 1000 84 660 112 36 1000 72 1000 32 4.20 4.35 5.26 5.40 5.45 5.59 5.86 6.00 6.02 6.29 6.60 7.29 7.44 10.9 13.6 15.0 15.3 15.6 15.9 16.4 18.0 23.8 29.4 29.8 31.5 48.7 56.2 66.7 68.2 87.7 93.4 114 131 209 38 40 42 22 349 38 269 46 785 39 471 585 708 52 177 65 48 590 50 701 59 76 51 48 374 60 90 162 223 210 598 316 89 78 54 58 64 40 280 51 209 54 58 42 156 48 57 116 128 56 48 20 212 532 644 832 70 180 152 133 62 99 77 36 218 138 84 439 978 1000 359 509 102 251 179 333 51 99 704 894 128 490 556 160 548 920 316 1000 1000 1000 256 740 48 544 59 535 98 320 336 222 493 964 122 649 994 227 1000 402 12 236 27 58 4 408 4 1 04 102 330 56 18 48 10 582 42 776 50 998 46 872 960 978 76 376 62 40 976 60 990 36 508 54 58 44 108 788 46 172 974 1000 206 628 0 144 1 40 580 60 20 24 0 968 908 1000 524 40 52 864 860 88 1000 1000 1000 44 324 1000 1000 • 424 133 2. Comparison between various versions of Pearson chi-square statisti c The numbers in Tables 4.13 ~ 4.33 indicate the proportion of simulated samples for which the null distribution was rejected. largest number in each line was printed in bold. The Only the power results for the Pearson chi-square statistic are listed in Tables 4.13 - 4.33. 
The X3 statistic is generally the most powerful when the sample size is 20. When the sample size increases to 50, the X5 statistic becomes the dominant statistic. When the sample size is 100, it is harder to pinpoint the best chi-square statistic; however, any chi-square statistic with an expected cell count of around 8 is probably close to optimal. This trend holds for all three null hypotheses investigated. The power study suggests that the number of cells ought to increase with the sample size to achieve optimum power. However, the number of cells that provides optimum power depends somewhat on the alternative distribution.

The chi-square statistics with large expected cell counts are the most powerful in detecting scale contaminated normal distributions in the normal power study. The scale contaminated normal distributions are very similar in shape to the normal distribution. In order to distinguish between the normal distribution and a close alternative distribution, the observed cell counts must be large enough to show a deviation from the cell counts expected under the null distribution. Thus, the statistics recommended by Mann and Wald (1942), X7, X13 and X17, performed very well here. For alternative distributions that differ in shape from the normal distribution, such as the Beta(0.5,0.5), uniform or exponential distribution, a

Table 4.13. Empirical 5% level power (in % ×10) for tests of departure from the normal distribution (sample size = 20)
Statistics No.
Distribution 1 N(0,1)+N(10,1) 2 Beta(0.5,0.5) 3 N(0,1)+N(5,1) 4 SB(0,0.5) 5 N(0,1)+N(4,1) 6 Tukey(1.5) 7 Uniform(0,1) 8 SB(0,0.707) 9 Tukey(0.7) 10 TruncN(-1,1) 11 N(0,1)+N(3,1) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 15 Triangle 1(1) 15 N(0,1)+N(2,1) 17 TrunoN(-3,3) 18 N(0,1)+N(1,1) 19 SU((0,3) 20 t(10) 21 Logistic(0,1) 22 SU(0,2) 23 Tukey(IO) 24 Laplace(0,1) 25 ScConN(0.2,3) 26 ScConN(0.05,3) 27 ScConN(0.1,3) 28 ScConN(0.2,5) 29 ScConN(0.2,7) 30 ScConN(0.1,5) 31 ScConN(0.05,5) 32 ScConN(0.1,7) 33 ScConN(0.05,7) 34 SU(0,1) 35 SU(0,0.9) XVa XI X3 884 988 511 375 531 X5 X7 999 997 957 264 209 529 02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1 .50 1.51 1.53 1.72 1 .75 1 .80 1 .87 1 .92 1.94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 2.92 3.53 4.00 4.20 4.51 5.38 5.00 7.54 7.65 8.33 11.2 12.8 15:5 20.0 21.5 31 .4 35.2 82.1 496 247 175 131 111 91 61 65 70 51 53 40 42 45 34 29 32 33 46 28 33 809 56 62 47 49 175 312 126 87 244 155 99 152 76 48 41 37 41 26 29 40 31 39 33 31 856 55 73 59 57 255 449 172 101 337 89 185 481 99 195 74 55 42 45 45 75 34 25 18 22 38 25 30 37 44 47 50 770 135 157 74 127 459 689 125 469 682 315 190 493 318 213 341 140 187 255 329 219 158 175 258 118 113 69 91 83 59 82 88 77 58 111 53 45 32 43 52 43 51 48 68 63 72 884 136 163 196 512 322 268 333 158 315 93 84 56 79 55 138 40 34 29 35 44 32 32 50 65 48 71 777 201 181 93 132 472 678 326 198 503 340 290 392 135 Table 4.14. Empirical 5% level power (in % xlO) for tests of departure from the normal distribution (sample size = 20) Statistics No. Distribution 36 t(4) 37 k(2) 38 t(l) 39 Cauchy(0,1) 40 86(0.5333,0.5) 41 TruncN(-2,l) 42 Beta(3,2) 43 Beta(2,1) 44 TruncN(-3,2) 45 Weibull(3.6) 46 Weibull(4) 47 SB(1,2) 48 TruncN(-3,1) 49 SB(1,1) 50 Weibull(2.2) 51 LoConN(0.2,3) 52 LoConN(0.2,5) 53 LoConN(0.2,7) 54 Weibull(2) 55 Half N(0,1) 56 LoConN(0.1,3) 57 LoConN(0.05,3) 58 GumbeKO,!) 
59 LoConN(0.1,5) 60 SUM,2) 5l Chi-Square(4) 62 LoConN(0.1,7) 63 LoConN(0.05,5) 64 Exponentiald ) 65 LoConN(0.05,7) 66 Chl-square(l) 67 Triangle 11(1) 68 Weibull(0.5) 69 SU(1,1) 70 LogN(0,1,0) /B. 0 0 0 0 0.65 -.32 0.29 -.57 -.18 0.00 -.09 0.28 -.55 0.73 0.51 0.68 1.07 1.25 0.63 0.97 0.80 0.68 1.14 1.54 0.87 1.41 1.96 1.65 2.00 2.42 2.83 0.57 6.62 -5.3 6.18 X'/a XI X3 X5 X7 44 160 569 59 199 103 94 333 325 774 774 773 773 1 45 122 370 794 800 652 647 394 55 48 127 37 34 42 31 64 94 52 65 Ba 572 2.13 2.27 2.36 2.40 2.65 2.72 2,75 2.77 2.78 2.91 3.04 3.09 3.16 3.20 3.25 3.78 4.02 4.35 5.40 5.45 5.59 6.00 6.60 7.44 9.00 10.4 15.0 16.4 87.7 93.4 114 324 • 54 63 113 45 31 33 45 52 76 37 68 219 478 50 93 54 30 62 1 66 47 95 319 92 258 177 766 89 953 226 455 298 679 51 115 63 45 81 215 62 138 415 109 364 245 831 116 964 328 586 483 71 59 176 43 55 57 50 78 138 68 143 578 898 81 239 98 77 147 385 104 305 703 211 655 456 947 148 999 517 810 29 24 54 29 21 26 28 29 53 32 71 372 765 41 51 59 61 76 286 81 101 628 187 1 82 442 436 59 802 420 457 206 44 35 69 42 40 42 35 46 90 42 57 233 414 43 81 81 70 58 266 82 82 490 180 148 403 335 81 623 262 288 136 Table 4.15. Empirical 5% level power (in % xlO) for tests of departure from the normal distribution (sample size = 50) Statistics No. 
Distribution /6: 1 2 3 4 5 6 7 8 9 10 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 N(0,1)+N(10,1) Beta(0.5,0.5) N(0,1)+N(5,1) SB(0,0.5) N(0,1)+N(4,1) Tukeyd.S) Uniform(0,1) SB(0,0.707) Tukey(0.7) TruncN(-1,1) N(0,1)+N(3,1) 12 Tukey(3) 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 Beta(2,2) TruncN(-2,2) Triangle 1(1) N(0,1)+N(2,1) TruncN(-3,3) N(0,1)+N(1,1) SU((0,3) t(10) Logistic(0,1) SU(0,2) Tukey(IO) Laplace(0,1 ) ScConN(0.2,3) ScConN(0.05,3) ScConN(0.1,3) ScConN(0.2,5) ScConN(0.2,7) ScConN(0.1,5) ScConN(0.05,5) ScConN(0.1,7) ScConN(0.05,7) SU(Q,1) SU(0,0.9) X'/z XI X3 X5 XlO X13 1000 1000 1000 738 978 1000 • 948 1000 956 1000 922 448 Bz 1.15 1 .50 1 .51 1 .63 1.72 1 .75 1.80 1 .87 1 .92 1.94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 2.92 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 12.8 16.5 20.0 21.531.4 36.2 82.1 472 234 242 238 116 108 98 92 104 42 48 26 54 32 36 22 34 28 66 986 70 72 44 74 346 634 222 152 458 286 176 260 720 636 948 724 712 976 412 436 464 402 662 708 692 368 462 410 246 202 202 218 196 168 110 182 108 98 42 50 58 32 46 36 40 44 50 200 56 50 34 44 234 198 152 120 134 206 76 56 802 276 242 212 162 98 238 1000 WOO 166 236 264 134 322 184 152 156 136 158 86 68 50 48 36 46 38 64 40 64 994 116 136 70 80 502 836 328 212 620 390 218 358 216 88 106 52 72 20 46 40 50 44 60 48 80 74 62 80 90 34 130 168 730 946 486 304 770 498 352 520 812 968 576 346 810 566 470 6l 6 54 1000 350 360 174 234 876 970 662 422 840 608 564 718 690 982 68 62 46 54 46 46 42 44 80 70 88 980 332 408 184 238 888 972 670 424 858 614 602 754 137 Table 4.16. Empirical 5% level power (in % xlO) for tests of departure from the normal distribution (sample size = 50) Statistics No. 
Distribution 36 t(4) 37 t(2) 38 t(1) 39 Cauchy(0,1) 40 38(0.5333,0.5) 41 TruncN(-2,1) 42 Beta(3,2) 43 Beta(2,1) 44 TrunoN(-3,2) 45 Weibull(3.5) 46 Weibull(4) 47 SB(1,2) 48 TruncN(-3,1 ) 49 SB(1,1) 50 Weibull(2.2) 51 LoConN(0.2,3) 52 LoConN(0.2,5) 53 LoConN(0.2,7) 54 Weibull(2) 55 Half N(0,1) 56 LoConN(0.1,3) 57 LoConN(0.05,3) 58 Gumb6l(0,1) 59 LoConN(0.1,5) 60 SU(-1,2) 61 Chi-Square(4) 62 LoConN(0.1,7) 63 LoConN(0.05,5) 64 Exponential(1 ) 65 LoConN(0.05,7) 66 Chi-square(1 ) 67 Triangle 11(1) 68 Weibull(0.5) 69 SU(1,1) 70 LogN(0,1,0) X'/z XI X3 X5 XlO XI3 82 322 90 450 940 940 880 104 84 304 50 22 46 136 180 592 978 978 880 128 86 398 54 678 986 990 946 254 744 272 762 990 994 532 62 0 0 0 0 0.65 2.13 2.27 0.29 2.36 -.57 2.40 -.18 2.65 0.00 2.72 -.09 2.75 0.28 2.77 -.55 2.78 0.73 2.91 0.51 3.04 0.68 3.09 1.07 3.16 1.25 3.20 0.63 3.25 0.97 3.78 0.80 4.02 0.68 4.35 1.14 5.40 1.54 5:45 0.87 5.59 1.41 6.00 1.96 6.60 1 .65 7.44 2.00 9.00 2.42 10.4 2.83 15.0 0.57 16.4 6.62 87.7 -5.3 93.4 6.18 114 -.32 896 874 710 58 50 210 44 40 32 44 104 114 44 90 446 906 52 220 94 38 78 388 80 1 42 740 188 602 408 128 162 54 116 726 986 72 346 112 72 92 526 82 248 840 262 788 494 998 998 58 40 58 36 132 294 76 222 940 1000 88 464 130 86 174 666 124 480 914 344 104 88 472 44 46 62 44 140 384 94 288 966 1000 • 112 638 776 698 1 22 212 948 400 964 966 492 666 478 468 804 952 11 6 830 820 1000 484 1000 858 990 390 402 794 560 920 772 984 90 56 196 866 998 142 998 1000 1000 88 32 72 148 668 986 300 64 254 942 60 176 40 46 38 186 78 260 786 1000 1 26 72 746 124 938 184 466 58 48 162 50 52 44 34 60 66 130 1 46 96 154 670 1000 992 996 178 158 98 326 1000 868 934 176 986 790 844 138 Table 4.17. Empirical 5^ level power (in % xlO) for tests of departure from the normal distribution (sample size = 100) Statistics No. 
Distribution 1 N(0,1)+N(10,l) 2 Beta(0.5,0.5) 3 N(0,1)+N(5,1) 4 SB(0,0.5) 5 N(0,1)+N(4,1) 5 Tukey(1.5) 7 Uniform(0,1) 8 SB(0,0,707) 9 Tukey(0.7) 1 0 TruncN(-1,1) 11 N(0,1)+N(3,1) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 1 5 Triangle 1(1) 16 N(0,1)+N(2,1) 17 TruncN(-3,3) 18 N(0,1)+N(1,1) 19 SU((0,3) 20 t(10) 21 Logistic(0,1) 22 SU(0,2) 23 Tukey(IO) 24 Laplace(0,1) 25 ScConN(0.2,3) 26 ScConN(0.05,3) 27 ScConN(0.1,3) 28 ScConN(0.2,5) 29 ScConN(0.2,7) 30 ScConN(0.1,5) 31 ScConN(0.05,5) 32 ScConN(0.1,7) 33 ScConN(0.05,7) 34 SU(0,1) 35 SU(0,0.9) X'/z XI X3 X5 X10 XI7 1000 1000 1000 1000 1000 944 920 1000 1000 1000 996 984 868 1000 716 696 340 456 356 200 180 208 144 132 56 36 40 60 32 24 16 28 36 48 1000 1000 1000 972 388 972 760 404 372 284 220 40 48 44 84 96 340 256 156 116 500 76 92 28 56 60 44 56 40 36 52 112 1000 268 1000 1000 320 308 104 196 368 124 260 964 492 452 152 320 976 1000 1000 780 476 952 760 672 796 • 804 568 980 6a 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1.50 1.51 1.63 1.72 1.75 1.80 1.87 1.92 1.94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 2.92 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 12.8 16.5 20.0 21.5 31.4 36.2 82,1 628 656 520 344 308 364 252 200 116 52 52 64 56 48 36 44 64 48 1000 1000 • 124 172 60 132 612 920 184 208 56 136 780 368 188 724 428 264 504 268 872 564 372 368 580 988 972 908 828 788 572 584 452 460 376 316 168 72 424 412 456 224 1 48 64 64 68 88 60 100 44 52 40 24 68 56 928 996 656 416 948 700 608 760 936 984 1000 1000 876 704 836 760 852 588 508 68 132 48 32 56 36 80 64 76 100 156 1000 576 608 220 432 980 1000 868 624 992 832 824 912 139 Table 4.18. Empirical 5% level power (in ? xlO) for tests of departure from the normal distribution (sample size = 100) Statistics No. 
Distribution 36 t(4) 37 t(2) 38 t(1) 39 Cauchy(0,1) 40 38(0.5333,0.5) 41 TruncN(-2,l) 42 Beta(3,2) 43 Beta(2,l) 44 TruncN(-3,2) 45 Weibull(3.6) 46 Weibull(4) 47 SB(1,2) 48 TruncN(-3,1) 49 SBd.l) 50 Weibull(2.2) 51 LoConN(0.2,3) 52 LoConN(0.2,5) 53 LoConN(0.2,7) 54 Wei bull(2) 55 Half N(0,1) 56 LoConN(0.1,3) 57 LoConN(0.05,3) 58 Gumbel(0,1) 59 LoConN(0.1,5) 50 SU(-1,2) 51 Chi-Square(4) 52 LoConN(0.1,7) 53 LoConN(0.05,5) 54 ExponentiaK1) 55 LoConN(0.05,7) 56 Chi-square(1) 57 Triangle II( 1 ) 58 Weibull(0.5) 59 sue 1,1) 70 LogN(0,1,0) XI/2 XI X3 X5 XlO 88 160 624 180 776 240 320 376 1000 996 992 1000 1000 1000 900 1000 1000 996 920 980 932 144 100 188 124 332 488 276 156 780 44 44 28 40 1 28 200 52 124 720 64 820 1000 1000 1000 376 196 840 84 68 948 988 1000 84 432 1 56 72 144 708 72 324 1 28 620 224 92 240 840 132 200 816 244 148 348 944 548 984 512 988 812 X17 62 0 0 0 0 0.55 -.32 480 980 -.57 -.18 0.00 -.09 0.28 -.55 0.73 0.51 2.13 2.27 2.35 2.40 2.65 2.72 2.75 2.77 2.78 2.91 3.04 0.68 3.09 0:29 1.07 3.16 1.25 3.20 0.63 3.25 0.97 3.78 0.80 4.02 0.58 4.35 1.14 5.40 1.54 5.45 0.87 5:59 1.41 6.00 1.96 5.60 1.55 7.44 2.00 9.00 2.42 10.4 2.83 15.0 0.57 16.4 5.52 87.7 -5.3 93.4 114 6.18 976 440 888 76 64 60 75 75 28 52 212 304 376 572 100 152 432 232 1000 1000 168 1000 876 1000 760 1000 968 1000 760 972 • 575 172 532 1000 1000 224 620 1000 1000 1000 1000 208 132 848 856 436 316 304 1 40 304 448 952 444 968 272 1 44 440 936 204 912 1000 512 796 52 44 44 64 1 44 1000 912 1000 836 1000 1000 356 76 256 668 575 744 160 100 412 688 1000 1000 372 1000 1000 886 1000 32 72 152 112 708 72 48 48 1000 1000 900 980 1000 256 916 1000 736 1000 940 1000 672 1000 988 1000 184 336 1 40 540 180 684 1000 764 984 956 1000 376 1000 1000 1000 140 Table 4.19. Empirical 5% level power (in % xlO) for tests of departure from the Gumbel distribution (sample size = 20) Statistics X'/g No. 
Distribution /6i 1 N(0,1)+N(10,l) 2 Beta(0.5,0.5) 3 N(0,1)+N(5,1) 4 SB(0,0.5) 5 N(0,1)+N(4,1) 6 Tukey(1.5) 7 Uniform(0,1) 8 SB(0,0.707) 9 Tukey(0.7) 10 TruncN(-1,1) 11 N(0,l)+N(3,1) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 15 Triangle 1(1) 16 N(0,1)+N(2,1) 17 TruncN(-3,3) 18 N(0,1)+N(1,1) 19 N(0,1) 20 SU((0,3) 21 t(10) 22 Logistic(0,1) 23 SU(0,2) 24 Tukey(IO) 25 Laplace(0,1) 26 ScConN(0.2,3) 27 ScConN(0.05,3) 28 ScConN(0.1,3) 29 ScConN(0.2,5) 30 ScConN(0.2,7) 31 ScConN(0.1,5) 32 ScConN(0.05,5) 33 ScConN(0.1,7) 34 ScConN(0.05,7) 35 SU(0,1) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 XI X3 X5 972 196 456 119 207 71 74 72 55 71 X7 62 1.15 1.50 1.51 1.63 1.72 1.75 1.80 1.87 1.92 1.94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 2.92 3.00 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 12.8 16.5 20.0 21.5 31 .4 36.2 916 530 295 205 181 141 1 42 92 80 90 79 69 61 48 54 49 45 59 58 67 71 82 71 849 102 95 70 91 287 445 244 150 304 236 167 983 986 548 464 355 261 244 242 171 161 117 83 101 1 00 73 83 56 57 58 48 71 54 83 76 94 374 197 190 176 86 879 162 1 48 1 08 141 393 541 314 194 385 275 213 616 145 150 1 20 128 152 170 817 147 154 845 286 328 315 185 225 167 231 578 687 576 736 418 253 504 347 402 83 110 112 49 45 51 61 60 78 75 70 116 132 127 134 134 144 182 187 189 913 299 324 226 253 565 695 430 330 101 92 91 82 81 138 46 49 38 61 56 72 60 97 107 193 125 930 243 577 152 414 273 269 492 353 384 495 351 382 126 293 141 Table 4.20. Empirical 5? level power (in % xlO) for tests of departure from the Gumbel distribution (sample size = 20) Xi/z Statistics No. 
Distribution 36 31/(0,0.9) 37 t(4) 38 t(2) 39 t(1) 40 Cauchy(0,1) 41 58(0.5333,0.5) 42 TruncN(-2,1) 43 Beta(3,2) 44 Beta(2,1) 45 TrunoN(-3,2) 46 Weibull(3.5) 47 Weibull(4) 48 SB(1,2) 49 TrunoN(-3,1) 50 SB(1,1) 51 Weibull(2.2) 52 LoConN(0.2,3) 53 LoConN(0.2,5) 54 LoConN(0.2,7) 55 Weibull(2) 56 Half N(0,1) 57 LoConN(0.1,3) 58 LoConN(0.05,3) 59 LoConN(0.1,5) 60 SU(-1,2) 61 Chi-square(4) 62 LoConN(0.1,7) 63 LoConN(0.05,5) 64 Exponential0) 65 LoConN(0.05,7) 66 Chi-square(1) 67 Triangle 11(1) 68 WeibulKO.S) 69 SUd.l) 70 LogN(0,1,0) XI X3 295 438 268 436 792 X5 X7 442 467 254 485 82 0 0 0 0 0 0.65 -.32 2.13 2.27 0.29 2.36 -.57 -.18 0.00 -.09 0.28 -.55 0.73 0.51 0.68 1.07 1.25 0.63 0.97 0.80 0.68 1 .54 0.87 1.41 1.96 1.65 2.00 2.42 2.83 0.57 6.62 -5.3 6.18 2.40 2.65 2.72 2.75 82.1 217 1 06 238 588 2.77 2.78 2.91 3.04 3.09 3.16 3.20 3.25 3.78 4.02 4.35 5.45 5.59 6.00 6.60 7.44 9.00 10.4 15.0 16.4 87.7 93.4 114 575 226 91 84 207 68 43 59 38 113 34 41 38 116 266 37 54 42 51 63 50 42 127 48 99 62 509 69 853 500 1 45 120 321 672 657 241 108 94 302 69 57 68 51 147 33 39 40 117 319 29 62 51 47 69 51 44 149 763 151 267 238 544 162 106 140 74 370 51 61 56 178 426 49 51 69 92 103 76 58 195 58 103 113 52 544 120 127 561 73 57 873 644 202 907 817 296 235 449 786 757 144 70 78 149 81 67 87 38 135 34 29 47 91 223 30 36 47 71 73 65 35 107 84 63 77 155 54 338 783 102 802 784 184 79 89 103 83 54 70 39 93 49 34 47 98 1 47 36 66 71 78 82 75 41 148 96 120 109 314 73 601 520 173 1 42 Table 4.21. Empirical 5% level power (in % xlO) for tests of departure from the Gumbel distribution (sample size = 50) Statistics No. Distribution /6. 
1 N(0,1)+N(10,1) 2 Beta(0.5,0.5) 3 N(0,1)+N(5,1) 4 SB(0,0.5) 5 N(0,1)+N(4,1) 6 Tukeyd.S) 7 Uniform(0,1) 8 SB(0,0.707) 9 Tukey(0.7) 10 TruncN(-1,1) 11 N(0,1)+N(3,1) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 15 Triangle 1(1) 15 N(0,1 )+N(2,1) 17 TruncN(-3,3) 18 N(0,1 )+N(1,1) 19 N(0,1) 20 SU((0,3) 21 t(10) 22 Logistic(0,1) 23 SU(0,2) 24 Tukey(IO) 25 Laplace(0,1 ) 26 ScConN(0.2,3) 27 ScConN(0.05,3) 28 ScConN(0.1,3) 29 SoConN(0.2,5) 30 ScConN(0.2,7) 31 ScConNCO.I,5) 32 SoConN(0.05,5) 33 ScConN(0.1,7) 34 SoCohN(0.05,7) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 SU(0,1) X'/g XI X3 X5 XlO X13 1000 1000 964 1000 1000 1000 1000 934 800 702 512 422 396 224 230 980 744 802 580 534 822 974 718 782 546 716 942 430 594 262 202 202 708 948 472 630 290 214 178 136 1 24 186 80 96 86 118 Bz 1.15 1.50 1.51 1.63 1 .72 1.75 1.80 1.87 1.92 1.94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 2.92 3.00 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 12.8 16.5 20.0 21.5 31.4 36.2 906 540 546 284 310 260 160 144 146 124 126 92 88 82 72 80 80 76 80 98 110 120 998 208 310 138 188 668 826 496 312 664 480 350 264 416 372 344 374 278 510 438 444 408 464 336 254 224 230 246 224 218 236 194 184 162 116 98 108 128 114 100 120 130 162 1 66 226 174 164 174 184 242 292 314 282 282 334 362 344 1000 1000 1000 350 404 194 524 622 384 458 856 954 702 492 826 596 326 762 908 614 408 740 556 482 252 214 638 668 156 150 246 118 1 24 150 178 142 210 216 234 364 394 404 418 998 88 182 188 210 300 350 398 404 672 674 740 998 716 762 426 522 468 592 434 576 888 916 956 756 552 830 650 716 972 798 578 928 972 802 588 874 862 702 712 796 794 143 Table 4.22. Empirical 5% level power (in % xlO) for tests of departure from the Gumbel distribution (sample size =50) Statistics No. 
Distribution 35 SU(0,0.9) 37 t(4) 38 t(2) 39 t(1) 40 Cauchy(0,1) 41 88(0.5333,0.5) 42 TruncN(-2,1) 43 Beta(3,2) 44 Beta(2,1) 45 TruncN(-3,2) 46 Weibull(3.6) 47 Weibull(4) 48 SB(1,2) 49 TruncN(-3,1 ) 50 SB(1,1) 51 Weibull(2.2) 52 LoConN(0.2,3) 53 LoConN(0.2,5) 54 LoConN(0.2,7) 55 Weibull(2) 56 Half N(0,1) 57 LoConN(0.1,3) 58 LoConN(0.05,3) 59 LoConN(0.1,5) 60 SU(-1,2) 61 Chi-square(4) 62 LoConN(0.1,7) 63 LoConN(0.05,5) 64 Exponential(l) 65 LoConN(0.05,7) 66 Chi-square(1) 67 Triangle 11(1) 68 Weibull(0.5) 69 SU(1,1) 70 LogN(0,1,0) X'/z XI X3 X5 XlO X13 450 174 482 890 926 528 186 136 370 104 48 86 50 206 62 28 60 256 670 42 88 56 58 140 54 46 388 82 192 156 922 1 20 996 852 326 586 246 620 938 958 698 304 220 656 134 102 126 52 376 98 56 62 370 858 62 138 82 74 190 68 80 488 108 292 184 962 164 1000 934 414 738 426 756 978 988 832 552 414 864 234 204 236 84 640 86 66 80 568 918 68 178 76 90 242 82 98 454 132 480 146 974 234 1000 986 634 802 502 794 980 992 566 676 578 916 320 246 278 102 764 74 60 70 594 856 72 86 96 126 188 112 88 378 152 376 160 990 1 22 1000 994 674 862 606 854 988 996 560 276 292 552 278 210 268 80 482 74 66 88 396 752 44 90 138 162 176 104 90 370 218 190 242 574 156 908 1000 354 874 582 874 990 998 594 170 1 66 292 200 180 174 80 314 94 52 70 252 632 50 106 1 26 1 44 212 1 44 62 382 222 226 276 662 196 946 986 312 2: 0 0 0 0 0 0.65 -.32 0.29 -.57 -.18 0.00 -.09 0.28 -.55 0.73 0.51 0.68 1.07 1.25 0.63 0.97 0.80 0.68 1.54 0.87 1.41 1.96 1.65 2.00 2.42 2.83 0:57 6.62 -5.3 6.18 82.1 2.13 2.27 2.36 2.40 2.65 2.72 2.75 2.77 2.78 2.91 3.04 3.09 3.16 3.20 3.25 3.78 4.02 4.35 5.45 5.59 6.00 6.60 7.44 9.00 10.4 15.0 16.4 87.7 93.4 114 144 Table 4.23. Empirical 5? level power (in % xlO) for tests of departure from the Gumbel distribution (sample size = 100) Statistics No. 
Distribution /8i 1 1 2 3 4 5 6 7 8 9 10 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 N(0,1)+N(10,1) N(0,1)+N(10,l) Beta(0.5,0.5) N(0,l)+N(5,l) SB(0,0.5) N(0,1)+N(4,l) Tukeyd.S) Uniform(0,1) SB(0,0.707) Tukey(0.7) TruncN(-l,1) N(0,l)+N(3,1) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 15 Triangle 1(1) 16 N(0,1)+N(2,l) 17 TruncN(-3,3) 18 N(0,1)+N(1,1) 19 N(0,1) 20 SU((0,3) 21 t(10) 22 Logistic(0,1) 23 SU(0,2) 24 Tukey(IO) 25 Laplace(0,1 ) 25 ScConN(0.2,3) 27 ScConN(0.05,3) 28 ScConN(0.1,3) 29 SGConN(0.2,5) 30 ScConN(0.2,7) 31 ScConN(0.1 ,5) 32 ScConN(0.05,5) 33 SGConN(0.1,7) 34 ScConN(0.05,7) 35 SU(0,1) Xi/z XI X3 X5 XlO XI7 1000 1000 1000 • 884 824 536 640 476 348 292 312 272 308 152 132 176 136 128 1 60 1 40 168 240 192 216 1000 472 564 288 424 920 976 764 1000 1000 1000 • 980 928 748 796 672 484 376 432 352 372 212 160 208 152 148 220 228 208 292 1000 1000 1000 1000 • 984 948 892 836 61 2 604 552 524 408 272 212 264 196 208 192 216 268 360 392 424 1000 732 812 460 648 984 996 852 720 944 784 900 1000 1000 1000 1000 975 988 904 872 836 764 740 800 6l 5 544 424 444 380 336 404 440 444 580 608 596 1000 876 892 576 768 992 1000 908 828 964 855 956 1000 1000 984 1000 952 988 776 756 760 680 664 864 596 564 412 476 564 436 532 588 5l 5 732 736 732 1000 932 932 596 856 996 1000 940 848 976 896 976 1000 1000 972 1000 832 952 544 440 400 368 356 504 340 304 292 320 364 432 508 552 640 748 736 776 1000 924 944 712 884 1000 1000 948 868 984 896 988 Ba 1.15 1.15 1.50 1 .51 1.63 1.72 1.75 1.80 1.87 1.92 1.94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 2.92 3.00 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 12.8 16.5 20.0 21.5 31.4 36.2 588 888 724 644 288 332 1000 624 688 384 484 952 1000 808 652 908 760 760 145 Table 4.24. Empirical 5? level power (in ? xlO) for tests of departure from the Gumbel distribution (sample size = 100) Statistics No. 
Distribution 36 SU(0,0.9) 37 t(4) 38 t(2) 39 t(l) 40 Cauchy(0,1) 41 38(0.5333,0.5) 42 TruncN(-2,1) 43 Beta(3,2) 44 Beta(2,1) 45 TruncN(-3,2) 46 Weibull(3.6) 47 Weibull(4) 48 SB(1,2) 49 TruncN(-3,1) 50 SB(1,1) 51 Weibull(2.2) 52 LoConN(0.2,3) 53 LoConN(0.2,5) 54 LoConN(0.2,7) 55 Weibull(2} 56 Half N(0,1) 57 LoConN(0.1,3) 58 LoConN(Q.05,3) 59 LoConN(0.1,5) 50 SU(-1,2) 61 Chi-square(4) 62 LoConN(0.1,7) 63 LoConN(0.05,5) 64 Exponentiaid ) 65 LoConN(0.05,7) 66 Chi-square(1) 67 Triangle 11(1) 58 Weibull(0.5) 69 SU(1,1) 70 LogN(0,1,0) X'/z XI X3 X5 XlO XI7 732 420 844 496 856 1000 1000 936 544 404 904 256 158 184 136 920 61 6 916 1000 1000 992 736 628 984 352 184 268 92 844 100 40 64 820 1000 40 240 96 64 400 80 92 788 204 748 308 1000 316 1 000 1000 968 750 952 1000 1000 996 920 832 1000 556 392 504 156 976 128 76 148 948 1000 64 400 136 160 492 1 40 136 768 272 900 360 1000 460 1000 1000 928 972 992 872 972 1000 1000 928 Bz 0 0 0 0 0 0.55 -.32 0.29 -.57 -.18 0.00 -.09 0.28 -.55 0.73 0.51 0.68 1.07 1 .25 0.63 0.97 0.80 0.68 1 .54 0.87 1 .41 1 .95 1.65 2.00 2.42 2.83 0.57 6.62 -5.3 6.18 82.1 CO CO 00 CO 2.13 2.27 2.36 2.40 2.65 2.72 2.75 2.77 2.78 2.91 3.04 3.09 3.16 3.20 3.25 3.78 4.02 4.35 5.45 5.59 6.00 6.50 7.44 9.00 10.4 15.0 16.4 87.7 93.4 114 788 996 992 824 344 324 756 176 108 1 40 92 408 124 80 104 500 932 80 172 72 84 288 80 54 836 172 416 512 7 000 248 1000 980 604 628 168 116 92 648 996 60 228 116 80 340 76 80 936 204 636 492 1000 276 1000 996 728 888 816 976 1000 1000 864 976 904 1000 644 524 616 184 988 95 100 132 952 1000 84 180 180 264 372 164 80 584 292 724 404 1000 248 1000 1000 908 656 612 944 484 452 544 1 68 864 104 116 116 772 968 76 196 192 324 312 224 112 664 404 388 508 948 324 1000 1000 •720 146 Table 4.25. Empirical 5? level power (in % xlO) for tests of departure from the exponential distribution (sample size = 20) Statistics No. 
Distribution /6i 1 N(0,1)+N(10,1) 2 Beta(0.5,0.5) 3 N(0,1)+N(5,1) 4 SB(0,0.5) 5 N(0,1)+N(4,1) 6 Tukey(1.5) 7 Uniform(0,1) 8 SB(0,0.707) 9 Tukey(0.7) 10 TruncN(^1,1) 11 N(0,1)+N(3,1) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 15 Triangle 1(1) 16 N(0,1)+N(2,1) 17 TruncN(-3,3) 18 N(0,1) 19 SU((0,3) 20 t(10) 21 Logistic(0,1 ) 22 SU(0,2) 23 Tukey(IO) 24 Laplace(0,1) 25 ScConN(0.2,3) 26 ScConN(0.05,3) 27 SoConN(0.1,3) 28 ScConN(0.2,5) 29 ScConN(0.2,7) 30 ScConN(0.1,5) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X'/a XI X3 X5 X7 908 543 445 311 331 253 226 197 211 218 264 208 211 217 253 245 260 284 335 358 353 347 963 490 493 349 425 660 750 538 967 575 613 397 468 382 304 290 302 308 342 300 331 321 350 361 407 424 467 481 507 493 973 635 6l 6 457 526 726 806 624 858 527 693 511 61 6 515 492 506 523 532 601 531 584 580 625 593 646 648 666 696 694 706 976 795 729 662 687 812 871 733 890 148 221 105 1 44 129 133 147 197 190 230 254 319 408 461 416 536 580 601 656 646 680 964 803 758 642 681 837 865 757 708 210 308 154 271 137 130 151 161 182 258 195 248 336 330 349 432 479 502 549 553 571 914 679 668 531 583 787 848 688 Bz 1.15 1 .50 1 .51 1 .63 1 .72 1 .75 1 .80 1.87 1.92 1.94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 3.00 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 12.8 16.5 147 Table 4.26. Empirical 5% level power (in % xlO) for tests of departure from the exponential distribution (sample size = 20) Statistics No. 
Distribution 31 ScConN(0.05,5) 32 ScConN(0.1,7) 33 ScConN(0.05,7) 34 SU(0,1) 35 SU(0,0.9) 36 t(4) 37 t(2) 38 t(1) 39 Cauchy(0,1) 40 58(0.5333,0.5) 41 TruncN(-2,1) 42 Beta(3,2) 43 Beta(2,1) 44 TruncN(-3,2) 45 Weibull(3.5) 46 Weibull(4) 47 SB(1,2) 48 TruncN(-3,1) 49 SB(1,1) 50 Weibull(2.2) 51 LoConN(0.2,3) 5'2 LoConN(0.2,5) 53 LoConN(0.2,7) 54 TruncE(0,3) 55 Weibull(2) 56 LoConE(0.2,7) 57 LoConE(0.2,5) 58 Half N(0,1) 59 LoConN(0.1,3) 50 LoConE(0.2,3) /6i Bz 0 0 0 0 0 0 0 0 0 0.65 -.32 0.29 -.57 -.18 0.00 -.09 0.28 -.55 0.73 0.51 0.68 1.07 1.25 0.99 0.63 1.33 1.25 0.97 0.80 1.20 20.0 21.5 31.4 36.2 82.1 CO ' 00 2.13 2.27 2.35 2.40 2.65 2.72 2.75 2.77 2.78 2.91 3.04 3.09 3.16 3.20 3.22 3.25 3.27 3.40 3.78 4.02 4.09 X'/z XI X3 X5 X7 397 590 455 536 568 442 582 815 808 108 374 363 571 304 282 267 169 478 81 122 159 146 267 41 104 • 42 37 56 191 55 504 656 550 659 686 549 682 869 861 112 498 517 732 454 409 409 266 662 97 162 200 220 344 55 170 60 36 71 268 61 584 752 702 799 809 717 810 917 906 107 799 764 938 587 551 644 451 903 101 308 293 265 356 50 262 53 51 89 432 79 659 758 597 819 821 717 834 927 912 73 464 499 512 554 499 508 364 633 80 255 334 236 288 47 217 59 62 71 474 63 562 724 626 707 747 613 762 903 911 75 393 422 491 449 409 443 287 551 70 199 344 312 305 55 174 50 50 55 383 46 148 Table 4.27. Empirical 5? level power (in % xlO) for tests of departure from the exponential distribution (sample size = 20) Statistics No. 
Distribution 61 TruncE(0,4) 52 LoConN(0.05,3) 63 TruncE(0,5) 64 Gumbel(0,1) 65 LoConN(0.1,5) 66 SU(-1,2) 67 LoConE(0.1,3) 68 Chi-square(4) 69 LoConE(0.1,5) 70 TruncE(0,6) 71 LoConN(0.1,7) 72 LoConE(0.05,3) 73 LoConN(0.05,5) 74 LoConE(0.05,7) 75 ScConE(0.05,2) 76 Chi-square(l) 77 ScConECO.1,2) 78 ScConE(0.2,2) 79 LoConE(0.01,7) 80 Triangle 11(1) 81 ScConE(0.01,3) 1.27 0.68 1.50 1.14 1.54 0.87 1.62 1.41 1.88 1.68 1.96 1.85 1.65 2.75 2.42 2.83 2.61 2.71 2.94 0.57 2.59 82 ScConECO.2,3) 3.57 83 ScConECO.1,3) 3.81 84 ScConECO.05,3) 3.60 85 ScConECO.2,7) 4.50 86 ScConECO.1,5) 5.38 87 ScConECO.1,7) 6.02 88 ScConECO.01,5) 4.81 89 ScConECO.05,5) 6.05 90 Weibull(0.5) 6.62 91 SU(1,1) -5.3 92 LogN(0,1,0) 6.18 X'/z XI X3 X5 X7 54 231 36 95 187 187 46 57 45 46 214 37 217 49 42 263 29 48 55 69 62 61 57 54 199 66 123 40 60 625 893 44 42 319 36 124 280 282 45 66 32 50 279 43 317 51 49 296 35 58 42 79 53 76 61 52 294 73 1 43 35 70 690 948 54 41 530 31 213 377 464 48 90 44 48 378 42 481 44 33 303 46 63 42 102 • 42 88 53 52 388 93 215 41 66 708 981 63 37 553 38 237 431 476 33 79 41 41 429 43 505 38 35 284 50 57 48 57 52 95 77 59 439 139 270 48 85 684 938 70 34 409 37 230 425 432 45 82 43 41 435 39 424 43 37 226 54 42 57 45 47 103 64 49 409 1 34 271 52 89 591 892 79 32 4.20 4.35 5.26 5.40 5.45 5.59 5.86 6.00 6.02 6.29 6.60 7.29 7.44 10.9 13.6 15.0 15.3 15.6 15.9 16.4 18.0 23.8 29.4 29.8 31:5 48.7 56.2 66.7 68.2 87.7 93.4 114 149 Table 4.28. Empirical 5% level power (in % xlO) for tests of departure from the exponential distribution {sample size = 50) Statistics No. 
Distribution /B Bz 1 N(0,1)+N(10,1) 2 Beta(0.5,0.5) 3 N(0,1)+N(5,1) 4 SB(0,0.5) 5 N(0,1)+N(4,1) 5 Tukeyd.S) 7 Uniform(0,1) 8 SB(0,0.707) 9 Tukey(0.7) 1 0 TruncN(-1,1) 11 N(0,1)+N(3,1) 12 TukeyO) 13 Beta(2,2) 14 TrunoN(-2,2) 15 Triangle 1(1) 15 N(0,1)+N(2,1) 17 TruncN(-3,3) 18 N(0,1) 19 SU((0,3) 20 t(10) 21 Logistic(0,1) 22 SU(0,2) 23 Tukey(IO) 24 Laplace(0,1) 25 ScConN(0.2,3) 26 ScConN(0.05,3) 27 SeConN(0.1,3) 28 ScConN(0.2,5) 29 ScConN(0.2,7) 30 ScConN(0.1,5) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1.50 1 .51 1.63 1.72 1.75 1.80 1.87 1.92 1.94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 3.00 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 12.8 16.5 XV2 XI X3 X5 XlO XI3 1000 924 804 592 666 540 488 440 422 436 590 454 494 512 590 600 682 696 736 784 780 816 1000 • 924 930 762 882 972 976 922 1000 964 944 822 878 750 708 684 656 694 782 696 718 778 822 784 866 860 896 914 940 930 1000 976 982 920 930 988 986 952 1000 960 992 888 970 922 892 934 914 918 952 932 964 972 972 952 972 972 982 992 992 992 1000 996 1000 972 982 996 994 992 1000 978 996 962 994 978 974 980 976 982 988 986 996 992 996 988 988 988 996 996 1000 996 1000 998 1000 994 990 1000 996 994 1000 522 662 386 732 506 588 722 772 776 870 874 930 972 986 958 986 990 988 994 1000 996 1000 998 1000 994 992 1000 996 1000 988 516 786 318 666 414 464 560 554 654 788 802 852 952 968 946 976 982 978 994 1000 996 1000 998 1000 980 992 1000 998 1000 150 Table 4.29. Empirical 5? level power (in % xlO) for tests of departure from the exponential distribution (sample size = 50) Statistics No. 
Distribution 31 ScConN(0.05,5) 32 ScConN(0.1,7) 33 ScConN(0.05,7) 34 SU(0,1) 35 SU(0,0.9) 36 t(4) 37 t(2) 38 t(1) 39 Cauchy(0,1) 40 SB(0.5333,0.5) 41 TruncN(-2,1) 42 Beta(3,2) 43 Beta(2,1) 44 TruncN(-3,2) 45 Weibull(3.6) 46 Weibull(4) 47 SB(1,2) 48 TruncN(-3,1) 49 SBd.l) 50 Weibull(2.2) 51 LoConN(0.2,3) 52 LoConN(0.2,5) 53 LoConN(0.2,7) 54 TruncE(0,3) 55 Weibull(2) 56 LoConE(0.2,7) 57 LoConE(0.2,5) 58 Half N(0,1) 59 LoConN(0.1,3) 60 LoConE(0.2,3) X'/: XI X3 X5 XlO X13 840 938 848 942 956 876 952 996 992 174 808 736 962 724 654 712 458 894 94 254 344 460 626 64 232 42 54 72 482 52 928 972 940 984 996 944 990 1000 •996 266 958 906 992 924 856 892 708 982 176 482 516 664 856 76 364 74 68 118 696 74 978 994 984 998 1000 988 994 1000 1000 380 998 998 1000 992 970 988 920 1000 266 760 780 852 920 60 652 82 88 150 894 96 994 998 996 998 1000 1000 1000 1000 1000 292 1000 1000 1000 998 992 996 964 1000 366 862 868 848 968 50 786 96 102 172 940 94 994 1000 996 998 1000 1000 1000 1000 1000 • 222 994 988 994 992 986 996 962 996 342 848 884 862 924 66 832 100 122 166 964 106 994 998 990 998 1000 998 1000 1000 1000 220 972 966 986 986 974 994 946 992 308 838 882 782 830 48 768 72 70 136 952 82 Bz 0 0 0 0 0 0 0 0 0 0.65 -.32 0.29 -.57 -.18 0.00 -.09 0.28 -.55 0.73 0.51 0.68 1.07 1.25 0.99 0.63 1.33 1.25 0.97 0.80 1.20 20.0 21 .5 31.4 36.2 82.1 CO CO 2.13 2.27 2.36 2.40 2.65 2.72 2.75 2.77 2.78 2.91 3.04 3.09 3.16 3.20 3.22 3.25 3.27 3.40 3.78 4.02 4.09 151 Table 4.30. Empirical 5% level power (in % xlO) for tests of departure from the exponential distribution (sample size = 50) Statistics No. 
Distribution 61 TruncE(0,if) 62 LoConN(G.05,3) 63 TruncE(0,5) 64 Gumbel(0,1) 65 LoConN(0.1,5) 66 SU(-1,2) 67 LoConE(0.1,3) 68 Chi-square(4) 69 LoConE(0.1,5) 70 TruncE(0,6) 71 LoConN(0.1,7) 72 LoConE(0.05,3) 73 LoConN(0.05,5) 74 LoConE(0.05,7) 75 ScConE(0.05,2) 76 Chi-square(l) 77 ScConE(0.1,2) 78 ScConE(0.2,2) 79 LoConE(0.01 ,7) 80 Triangle 11(1) 81 ScConE(0.01,3) 82 ScConE(0.2,3) 83 ScConE(0.1,3) 84 ScConE(0.05,3) 85 ScConE(0.2,7) 86 ScConE(0.1,5) 87 ScConE(0.1,7) 88 ScConE(0.01,5) 89 ScConE(0.05,5) 90 Weibull(0.5) 91 SUd.l) 92 LogN(0,1,0) X'/z XI X3 X5 XlO XI3 30 598 32 284 492 548 44 90 34 30 562 46 602 42 50 530 42 46 42 76 26 80 48 28 984 188 464 64 66 940 998 54 50 770 52 390 704 758 66 128 30 58 774 56 764 56 80 638 34 54 36 150 32 106 40 54 998 246 538 50 930 42 658 878 904 48 186 48 74 924 38 912 42 62 728 36 68 44 230 52 142 64 44 996 300 568 64 78 982 1000 86 48 970 38 760 932 956 62 252 58 36 942 46 954 60 64 742 44 58 42 304 50 166 92 70 998 346 626 72 68 990 1000 112 54 990 44 832 944 960 60 314 44 48 926 58 980 54 58 710 60 62 48 188 50 204 102 64 998 432 726 64 140 982 1000 140 46 982 48 846 962 966 60 322 42 52 954 52 976 46 52 662 62 62 34 120 54 178 84 68 1000 464 740 64 130 982 1000 136 Gz 1.27 0.68 1.50 1.14 1.54 0.87 1.62 1.41 1.88 1.68 1.96 1.85 1.65 2.75 2.42 2.83 2.61 2.71 2.94 0.57 2.59 3.57 3.81 3.60 4.50 5.38 6.02 4.81 6.05 6.62 -5.3 6.18 4.20 4.35 5.26 5.40 5.45 5.59 5.86 6.00 6.02 6.29 6.60 7.29 7.44 10.9 13.6 15.0 15.3 15.6 15.9 16.4 18.0 23.8 29.4 29.8 31.5 48.7 56.2 66.7 68.2 87.7 93.4 114 72 68 974 1000 • 76 152 Table ^,31- Empirical 5% level power (in % xlO) for tes ts of departure from the exponential distribution (sampl e size = 100) Statistics No. 
Distribution 1 2 3 4 5 6 7 8 9 10 11 N(0,l)+N(10,1) Beta(0.5,0.5) N(0,1)+N(5,1) SB(0,0.5) N(0,1)+N(4,1) Tukey(1.5) Uniform(0,1) SB(0,0.707) Tukey(0.7) TruncN(-1,1) N(0,l)+N(3,1) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 15 Triangle 1(1 ) 16 N(0,1)+N(2,1) 17 TruncN(-3,3) 18 N(0,1) 19 SU((0,3) 20 t(10) 21 Logistic(0,1 ) 22 SU(0,2) 23 Tukey(lO) 24 Laplace(0,1) 25 ScConN(0.2,3) 26 ScConN(0.05,3) 27 ScConN(0.1,3) 28 ScConN(0.2,5) 29 ScConN(0.2,7) 30 ScConN(0.1,5) /6 02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1 .50 1 .51 1.63 1 .72 1.75 1.80 1.87 1.92 1.94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 3.00 3.53 4,00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 12.8 16.5 XV, XI X3 X5 XlO X17 1000 992 992 836 916 796 760 732 688 708 880 756 832 868 152 892 936 936 972 968 968 984 1000 996 1000 976 992 1000 1000 992 1000 1000 996 968 988 936 944 928 920 920 980 952 960 976 208 988 1000 1000 1000 1000 988 1000 1000 1000 1000 996 996 1000 1000 1000 1000 1000 1000 1000 1000 992 1000 1000 1000 992 996 1000 1000 1000 408 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 996 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 596 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 620 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 868 1000 844 1000 924 948 992 992 984 1000 1000 1000 996 384 1000 1000 1000 1000 1000 1 000 1000 1000 1000 1000 1000 1000 1000 1000 1000 153 Table 4.32. Empirical 5% level power (in % xlO) for tests of departure from the exponential distribution (sample size = 100) Statistics No. 
Distribution 31 ScConN(0.05,5) 32 ScConN(0.1,7) 33 ScConN(0.05,7) 34 SU(0,1) 35 SU(0,0.9) 36 t(4) 37 t(2) 38 t(1 ) 39 Cauchy(0,1) 40 SB(0.5333,0.5) 41 TruncN(-2,1) 42 Beta(3,2) 43 Beta(2,1) 44 TruncN(-3,2) 45 Weibull(3.5) 45 Weibull(4) 47 SB(1,2) 48 TruncN(-3,1) 49 SB(1,1) 50 Weibull(2.2) 51 LoConN(0.2,3) 52 LoConN(0.2,5) 53 LoConN(0.2,7) 54 TruncE(0,3) 55 Wei bull(2) 56 LoConE(0.2,7) 57 LoConE(0.2,5) 58 Half N(0,1) 59 LoConN(0.1,3) 60 LoConE(0.2,3) XV, XI X3 X5 XlO X17 976 1000 1000 1000 996 1000 1000 1000 1000 312 988 988 1000 960 932 932 780 992 184 564 720 772 940 56 44 48 52 112 864 52 996 1000 1000 1000 1000 1000 1000 1000 1000 • 488 1000 1000 1000 1000 996 1000 956 1000 308 792 896 920 984 88 672 68 56 116 968 84 1000 1000 1000 1000 1000 1000 1000 1000 1000 624 1000 1000 1000 1000 1000 1000 1000 1000 • 540 992 976 996 1000 88 984 104 60 236 1000 96 1000 1000 1000 1000 1000 1000 1000 1000 1000 708 1000 1000 1000 1000 1000 1000 1000 1000 • 728 992 996 996 1000 136 984 144 1 44 296 1000 132 1000 1000 1000 1000 1000 1000 1000 1000 1000 • 444 1000 1000 1000 1000 1000 1000 1000 1000 840 1000 996 1000 1000 76 1000 192 148 392 1000 148 1000 1000 1000 1000 1000 1000 1000 1000 1000 • 404 1000 1000 1000 1000 1000 1000 1000 1000 • 804 1000 996 1000 1000 100 992 160 204 340 1000 148 62 0 0 0 0 0 0 0 0 0 0.65 -.32 0.29 -.57 -.18 0.00 -.09 0.28 -.55 0:73 0.51 0.68 1.07 1.25 0.99 0.63 1.33 1.25 0.97 0.80 1.20 20.0 21 .5 31 .4 35.2 82.1 2.13 2.27 2.36 2.40 2.65 2:72 2.75 2.77 2.78 2.91 3.04 3.09 3.16 3.20 3.22 3.25 3.27 3.40 3.78 4.02 4.09 154 Table 4.33. Empirical 5? level power (in % xlO) for tests of departure from the exponential distribution (sample size = 100) Statistics No. 
Distribution 61 TruncE(0,4) 62 LoConN(0.05,3) 63 TruncE(0,5) 64 Gumbel(0,1) 65 LoConN(0.1,5) 66 SU(-1,2) 67 LoConE(0.1,3) 68 Chi-square(4) 69 LoConE(0.1,5) 70 TruncE(0j6) 71 LoConN(0.1,7) 72 LoConE(0.05,3) 73 LoConN(0.05,5) 74 LoConE(0.05,7) 75 ScConE(0.05,2) 76 Chi-square(l) 77 ScConE(0.1,2) 78 ScConE(0.2,2) 79 LoConE(0.01,7) 80 Triangle 11(1) 81 ScConE(0.01 ,3) 82 ScConE(0.2,3) 83 ScConE(0.1,3) 84 ScConE(0.05,3) 85 ScConE(0.2,7) 86 ScConE(0.1,5) 87 ScConE(0.1,7) 88 ScConE(0.Q1 ,5) 89 ScConE(0.05,5) 90 Weibull(0.5) 91 SU(1,1) 92 LogN(0,1,0) /6i Ba 1.27 0.68 1 .50 1.14 1.54 0.87 1.62 1.41 1.88 1.68 1.96 1.85 1.65 2.75 2.42 2.83 2.61 2.71 2.94 0.57 2.59 3.57 3.81 3.60 4.50 5.38 6.02 4.81 6.05 6.62 -5.3 6.18 4.20 4.35 5.26 5.40 5.45 5.59 5.86 6.00 6.02 6.29 6.60 7.29 7.44 10.9 13.6 15.0 15.3 15.6 15.9 16.4 18.0 23.8 29.4 29.8 31.5 48.7 56.2 66.7 68.2 87.7 93.4 114 XV2 XI X3 X5 XlO XI7 36 876 36 600 844 872 52 84 48 40 904 48 916 20 60 924 28 48 24 108 44 976 840 56 996 992 996 28 104 1000 1000 • 60 56 972 36 768 956 956 40 120 44 64 988 52 984 32 52 932 28 88 36 188 44 980 848 60 1000 992 996 44 112 1000 1000 • 104 48 996 60 956 992 1000 48 332 88 44 996 76 1000 48 52 944 48 120 40 376 52 976 844 60 1000 • 992 996 40 1 28 1000 1000 112 48 1000 • 48 996 996 1000 • 64 512 76 36 1000 60 1000 28 40 952 32 136 28 576 44 980 848 76 1000 1000 996 48 172 1000 1000 1 48 52 1000 60 1000 1000 1000 80 700 88 44 1000 64 1000 40 60 964 52 180 28 612 44 984 840 52 1000 996 996 32 192 1000 1000 156 68 1000 64 1000 1000 1000 • 76 728 60 48 996 64 1000 36 28 960 32 172 32 384 60 984 844 72 1000 1000 1000 36 232 1000 1000 184 155 chi-square statistic with a smaller expected cell count is desirable. In this case, the use of more cells will provide a larger number of sizeable differences between the observed cell counts and the expected cell counts for the null hypothesis. However, there is a limit to the extent of the refinement of the partition. 
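The trade-off between the number of cells and the expected cell count can be made concrete with a minimal sketch of the two statistics being compared. It assumes k cells that are equiprobable under the fitted null distribution, so that every expected count equals n/k; the function names are illustrative and are not taken from the study.

```python
import math


def cell_counts(cdf_values, k):
    """Count observations in k cells that are equiprobable under the null.

    cdf_values are the fitted null CDF evaluated at each observation.
    """
    counts = [0] * k
    for u in cdf_values:
        j = min(int(u * k), k - 1)  # a value of exactly 1.0 goes to the last cell
        counts[j] += 1
    return counts


def pearson_x2(counts):
    """Pearson chi-square statistic with equal expected counts n/k."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)


def likelihood_ratio_g2(counts):
    """Likelihood ratio statistic; empty cells contribute zero."""
    expected = sum(counts) / len(counts)
    return 2.0 * sum(o * math.log(o / expected) for o in counts if o > 0)
```

With n fixed, increasing k lowers the expected count n/k; once n/k falls below one, most observed counts are zero or one whatever the alternative.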
Cell counts which are mostly one or zero provide little power for detecting alternatives when the expected cell counts are nearly equal under the null hypothesis. The number of one and zero counts will be similar for many alternative distributions. The X_{1/2} statistic consistently performed poorly relative to the other chi-square statistics. This suggests that the use of the chi-square or likelihood ratio statistic with an expected cell count of less than one is not desirable. When the X_m statistic is the most powerful chi-square statistic, the G_m statistic also tends to be the most powerful likelihood ratio statistic. The difference in power between the X_m and G_m statistics was generally quite small.

3. Comparison of statistics based on the empirical distribution function

The numbers in Tables 4.34 - 4.40 indicate the proportions of simulated samples for which the null distribution was rejected. The largest number in each line was printed in bold. Only the results for sample size 20 are included in Tables 4.34 - 4.40. Conclusions drawn from the results for sample sizes 50 and 100 were very similar to those for sample size 20.

The Cramer-von Mises type statistics are generally more powerful than the Kolmogorov-Smirnov type statistics. Within the Cramer-von Mises type statistics, the Anderson-Darling statistic is the most powerful for detecting a wide range of alternative distributions. The Anderson-Darling statistic for the exponential case appeared to be the weakest among all the statistics for small sample sizes. The location parameter of the exponential distribution was estimated using the minimum of the observations. The smallest of the standardized values is thus equal to zero. This poses a problem in the computation of the Anderson-Darling statistic because the formula involves log[F((x_i - α)/β)]. To overcome this problem, F((x_(1) - α)/β) was assigned the same value as F((x_(2) - α)/β) if F((x_(2) - α)/β) is less than 0.00001; otherwise it was assigned the value 0.00001.
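The adjustment just described can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the scale estimate (sample mean minus sample minimum) is an assumption, since the text only specifies the location estimate, and the function names are hypothetical. Samples with ties at the minimum are not handled.

```python
import math


def floored_smallest_cdf(f1, f2, floor=1e-5):
    """Value used in place of F at the smallest standardized observation.

    Follows the rule quoted in the text: when F at the minimum is zero, use F
    at the second smallest observation if that is below `floor`, else `floor`.
    """
    if f1 > 0.0:
        return f1
    return f2 if f2 < floor else floor


def anderson_darling_exponential(sample, floor=1e-5):
    """A^2 for a two-parameter exponential null with estimated parameters."""
    xs = sorted(sample)
    n = len(xs)
    loc = xs[0]                # location = sample minimum (as stated in the text)
    scale = sum(xs) / n - loc  # ASSUMPTION: scale = mean - minimum (not given in the text)
    z = [(x - loc) / scale for x in xs]
    cdf = [1.0 - math.exp(-zi) for zi in z]
    surv = [math.exp(-zi) for zi in z]  # 1 - F, computed directly
    cdf[0] = floored_smallest_cdf(cdf[0], cdf[1], floor)
    total = 0.0
    for i in range(1, n + 1):
        # standard A^2 sum: (2i - 1) * [log F(z_(i)) + log(1 - F(z_(n+1-i)))]
        total += (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(surv[n - i]))
    return -n - total / n
```

Because the replaced value enters through its logarithm, the first term of the sum is fixed by the floor rather than by the data. Its contribution is divided by n, so it matters most in small samples.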
The weak performance of the Anderson-Darling statistic for the exponential case is probably due to this modification. For larger sample sizes, this problem is not severe and the A² statistic is powerful. The Anderson-Darling statistic is usually more powerful than the Cramer-von Mises or the Watson statistic for detecting alternative distributions with long or heavy tails. The Anderson-Darling statistic places more emphasis on the tails of the distribution than the Cramer-von Mises statistic. For symmetrical alternatives to the normal distribution with short tails, the Cramer-von Mises and Watson statistics performed favorably. Careful examination of the columns corresponding to the Cramer-von Mises and Watson statistics in Tables 4.34 - 4.40 reveals that the Watson statistic is slightly more powerful in detecting alternative distributions with short tails. Within the class of

Table 4.34. Empirical 5% level power (in % x10) for tests of departure from the normal distribution (sample size = 20) Statistics A: No.
Distribution 1 N(0,l)+N(10,1) 2 Beta(0.5,0.5) 3 N(0,1)+N(5,1) 4 SB(0,0.5) 5 N(0,l)+N(4,1) 6 Tukey(1.5) 7 Uniform(0,1) 8 SB(0,0.707) 9 Tukey(0.7) 10 TruncN(^1,1) 11 N(0,1)+N(3,1) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 15 Triangle 1(1) 15 N(0,1)+N(2,1) 17 TruncN(-3,3) 18 N(0,1)+N(1,1) 19 SU((0,3) 20 t(10) 21 Logistic(0,1) 22 SU(0,2) 23 Tukey(IO) 24 Laplace(0,1) 25 ScConN(0.2,3) 25 ScConN(0.05,3) 27 ScConN(0.1,3) 28 ScConN(0.2,5) 29 ScConN(0.2,7) 30 ScConN(0.1,5) 31 ScConN(0.05,5) 32 ScConN(0.1,7) 33 ScConN(0.05,7) 34 SU(0,1) 35 SU(0,0.9) U: V D 62 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1 .50 1.51 1.63 1 .72 1 .75 1.80 1 .87 1.92 1 .94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 2.92 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 12.8 16.5 20.0 21 .5 31.4 36.2 82.1 1000 1000 1000 1000 1000 612 819 504 833 298 477 170 1 46 • 97 106 90 162 50 45 31 44 48 43 49 70 83 89 545 875 340 524 1 92 163 113 125 • 98 194 486 819 286 464 165 151 107 118 107 171 67 51 34 37 53 48 56 62 95 79 102 929 235 276 128 208 633 328 653 177 348 1 07 97 78 89 79 139 45 37 32 42 41 52 55 57 79 82 90 902 201 253 1 21 199 605 785 422 245 592 390 349 429 366 459 203 167 109 125 95 153 50 44 31 40 43 42 44 59 85 101 115 908 248 354 161 274 692 847 491 298 674 440 437 528 107 929 241 316 138 251 675 838 466 278 639 411 423 517 65 51 34 47 52 45 49 63 89 91 104 934 252 308 1 41 233 676 837 454 268 637 409 419 520 . 816 435 261 5l 6 400 377 471 158 Table 4.35. Empirical 5? level power (in % xlO) for tests of departure from the normal distribution (sample size = 20) Statistics A" No. 
Distribution 36 t(4) 37 t(2) 38 t(1) 39 CauGhy(0,]) 40 SB(0.5333,0.5) 41 TruncN(-2,1) 42 Beta(3,2) 43 Beta(2,l) 44 TruncN(-3,2) 45 Weibull(3.6) 46 Weibull(4) 47 SB(1,2) 48 TruncN(-3,1) 49 SB(1,1) 50 Weibull(2.2) 51 LoConN(0.2,3) 52 LoConN(0.2,5) 53 LoConN(0.2,7) 54 Weibull(2) 55 Half N(0,1) 56 LoConN(0.1,3) 57 LoConN(0.05,3) 58 GumbeKG.I) 59 LoConN(0.1,5) 60 SU(-1,2) 61 Chi-Square(4) 62 LoConN(0.1,7) 63 LoConN(0.05,5) 64 Exponentiald ) 65 LoConN(0,05,7) 66 Chi-square(1) 67 Triangle 11(1) 68 Weibull(0,5) 69 SU(1,1) 70 LogN(G,1,0) V D 163 450 860 871 550 76 61 212 39 48 51 53 92 191 74 196 771 970 97 261 168 126 210 584 151 355 824 359 687 596 954 206 998 636 853 161 427 842 843 445 64 46 169 39 44 61 45 103 201 87 214 747 962 107 228 175 128 220 595 1 49 325 815 374 565 579 884 176 982 612 778 32 0 0 0 0 0.65 -.32 0.29 -.57 -.18 0.00 -.09 0.28 -.55 0.73 0.51 0.68 1.07 1.25 0.63 0.97 0.80 0.68 1.14 1.54 0.87 1.41 1.96 1.65 2.00 2.42 2.83 0.57 6.62 -5.3 6.18 2.13 2.27 2.36 2.40 2.65 2.72 2.75 2.77 2.78 2.91 3.04 3.09 3.16 3.20 3.25 3.78 4.02 4.35 5.40 5.45 5.59 6.00 6.60 7.44 9.00 10.4 15.0 16.4 87.7 93.4 114 199 503 882 885 660 92 57 277 46 48 41 58 125 270 94 278 890 984 139 360 240 174 281 736 188 442 866 480 765 639 972 256 1000 723 908 187 493 884 885 580 82 57 238 45 48 49 59 113 241 90 271 876 983 131 308 221 1 60 259 679 184 406 854 428 728 617 956 230 998 695 875 179 487 881 889 585 91 61 239 50 55 51 61 112 225 87 251 864 983 123 284 205 146 240 658 170 380 849 406 689 613 949 229 994 679 864 159 Table 4.36. Empirical 5% level power (in ? xlO) for tests of departure from the Gumbel distribution (sample size = 20) A2 Statistics No. 
Distribution /3 1 1 N(0,1)+N(10,1) 2 Beta(0.5,0.5) 3 N(0,1)+N(5,1) H SB(0,0.5) 5 N(0,1)+N(4,1) 6 Tukeyd.S) 7 Uniform(0,1) 8 38(0,0.707) 9 Tukey(0.7) 10 TruncN(-1,1) n N(0,1)+N(3,1) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 1 5 Triangle 1(1) 16 N(0,1)+N(2,1) 17 TruncN(-3,3) 18 N(0,1)+N(1,l) 19 N(0,1) 20 SU((0,3) 21 t(10) 22 Logistic(0,1) 23 SU(0,2) 24 Tukey(IO) 25 Laplace(0,1) 25 ScConN(0.2,3) 27 SoConN(0.05,3) 28 SGConN(0.1,3) 29 ScConN(0.2,5) 30 ScConN(0.2,7) 31 SGConN(0.1,5) 32 ScConN(0.05,5) 33 ScConN(0.1,7) 34 ScConN(0.05,7) 35 SU(0,1) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 U2 V D 1000 530 808 406 520 270 266 225 196 203 249 151 149 165 162 164 185 189 212 249 272 272 297 959 461 462 302 352 678 806 526 366 575 438 531 1000 576 798 378 508 251 254 201 175 200 247 1 42 136 151 132 167 170 154 171 217 245 271 282 952 439 439 287 343 664 786 525 345 565 423 500 1000 433 750 283 466 202 212 187 153 174 230 1 24 137 136 1 44 1 41 153 1 44 179 196 213 228 241 932 415 397 258 309 639 773 495 328 558 419 468 #2 1.15 1.50 1 .51 1.63 1.72 1.75 1.80 1.87 1.92 1.94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 2.92 3.00 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 12.8 16.5 20.0 21.5 31 .4 36.2 1000 694 794 450 496 305 287 236 204 213 238 173 153 178 177 175 203 203 219 257 291 299 310 935 467 469 331 363 708 823 551 385 616 479 539 1000 615 793 394 507 277 266 230 198 206 247 161 155 169 169 165 190 194 217 253 273 282 299 954 467 461 306 358 684 808 532 371 584 451 530 160 Table 4.37. Empirical 5% level power (in % xlO) for tests of departure from the Gumbel distribution (sample size = 20) Statistics No. 
Distribution 35 SU(0,0.9) 37 t(4) 38 t(2) 39 t(1) 40 Cauchy(0,1) 4Î 38(0.5333,0.5) 42 TrunoN(-2,1) 43 Beta(3,2) 44 Beta(2,1) 45 TruncN(-3,2) 46 Weibull(3.6) 47 Weibull(4) 48 SB(1,2) 49 TruncN(-3,1) 50 SB(1,1) 51 Weibull(2.2) 52 LoConN(0.2,3) 53 LoConN(0.2,5) 54 LoConN(0.2,7) 55 Weibull(2) 56 Half N(0,1) 57 LoConN(0.1,3) 58 LoConN(0.05,3) 59 LoConN(0.1,5) 60 SU(-1,2) 61 Chi-square(4) 62 LoConN(0.1,7) 63 LoConN(0.05,5) 64 Exponential(1) 65 LoConN(0.05,7) 66 Chi-square(1) 67 Triangle 11(1) 68 Weibull(0.5) 69 SU(1,1) 70 LogN(0,1,0) /Si Gz 0 0 0 0 0 0.65 -.32 0.29 -.57 -.18 0:00 -.09 0.28 -.55 0.73 0.51 0.68 1.07 1.25 0.63 0.97 0.80 0.68 1.54 0.87 1.41 1.96 1.65 2.00 2:42 2.83 0.57 6.62 -5.3 6.18 82.1 2.13 2.27 2.36 2.40 2.65 2.72 2.75 2.77 2.78 2.91 3.04 3.09 3.16 3.20 3.25 3.78 4.02 4.35 5:45 5.59 6.00 6.60 7.44 9.00 10.4 15.0 16.4 . 87.7 93.4 114 A" W" 606 396 519 897 863 488 367 324 627 251 189 236 88 469 81 58 75 388 836 40 105 87 138 214 121 89 566 182 404 336 851 143 984 929 641 601 390 598 888 861 401 336 301 590 239 170 228 88 427 71 63 75 311 726 51 92 88 141 160 120 79 439 150 339 260 772 122 966 929 559 605 383 590 883 853 427 312 286 570 224 167 220 86 41 6 78 66 79 307 683 58 103 98 136 152 119 70 384 150 286 242 724 129 944 920 488 V D 591 366 565 873 851 365 303 265 576 196 1 60 207 78 435 77 65 75 316 683 60 98 96 124 163 105 71 380 147 215 212 689 124 942 910 430 541 316 528 861 834 286 248 251 451 1 92 1 42 184 80 352 62 59 56 245 610 54 80 68 107 1 29 91 72 376 1 37 264 212 561 103 921 861 487 151 Table 4.38. Empirical 5% level power (in % xlO) for tests of departure from the exponential distribution (sample size = 20) Statistics V No. 
Distribution /6i 1 N(0,1)+N(10,1) 2 Beta(0.5,0.5) 3 N(0,1)+N(5,1) 4 SB(0,0.5) 5 N(0,1)+N(4,1) 6 Tukeyd.S) 7 Uniform(0,1) 8 SB(0,0.707) 9 Tukey(0.7) 10 TruncN(-1,1) 11 N(0,1)+N(3,1) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 15 Triangle 1(1) 15 N(0,1)+N(2,1) 17 TruncN(-3,3) 18 N(OJ ) 19 SU((0,3) 20 t(10) 21 Logistic(0,1 ) 22 SU(0,2) 23 Tukey(IO) 24 Laplace(0,1) 25 ScConN(0.2,3) 25 ScConN(0.05,3) 27 ScConNO.1,3) 28 SoConN(0.2,5) 29 ScConN(0.2,7) 30 ScConN(0.1,5) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 D 62 1.15 1 .50 1 .51 1 .63 1 .72 1 .75 1 .80 1 .87 1 .92 1 .94 2.04 2.05 2.14 2.35 2.40 2.50 2.84 3.00 3.53 4.00 4.20 4.51 5.38 5.00 7.54 7.65 8.33 11.2 12.8 15.5 568 404 254 245 282 273 281 295 343 356 400 421 495 552 573 550 640 670 687 722 728 739 945 793 762 674 687 779 840 720 756 459 451 416 507 489 487 558 575 609 621 679 752 777 715 751 830 835 842 862 869 871 976 897 858 803 828 864 886 823 928 572 494 472 508 494 484 523 554 566 585 559 71 4 740 734 718 802 815 830 849 849 855 984 885 856 813 814 874 897 840 961 589 594 501 551 521 512 547 569 594 598 650 696 725 786 701 772 788 801 824 813 821 978 ' 860 836 854 786 863 903 81 5 838 347 451 337 438 411 391 449 483 511 501 594 527 674 716 672 758 787 788 822 815 822 975 868 842 803 777 844 877 801 162 Table 4.39. Empirical 5% level power (in % xlO) for tests of departure from the exponential distribution (sample size = 20) Statistics No. 
Distribution 31 ScConN(0.05,5) 32 ScConN(0.1,7) 33 ScConN(0.05,7) 34 SU(0,1) 35 SU(0,0.9) 35 t(4) 37 t(2) 38 t(l) 39 Cauchy(0,l) 40 SB(0.5333,0.5) 41 TruncN(-2,l) 42 Beta(3,2) 43 Beta(2,1) 44 TruncN(-3,2) 45 Weibull(3.6) 46 Weibull(4) 47 SB(1,2) 48 TruncN(-3,1) 49 SB(1,1) 50 Weibull(2.2) 51 LoConN(0.2,3) 52 LoConN(0.2,5) 53 LoConN(0.2,7) 54 TruncE(0,3) 55 Weibull(2) 56 LoConE(0.2,7) 57 LoConE(0.2,5) 58 Half N(0,1) 59 LoConN(0.1,3) 60 LoConE(0.2,3) /6i G: 0 0 0 0 0 0 0 0 0 0.65 -.32 0.29 -.57 -.18 0.00 -.09 0.28 -.55 0.73 0.51 0.68 1.07 1.25 0.99 0.63 1.33 1.25 0.97 0.80 1.20 20.0 21.5 31.4 36.2 82.1 2.13 2.27 2.36 2.40 2.65 2.72 2.75 2.77 2.78 2.91 3.04 3.09 3.16 3.20 3.22 3.25 3.27 3.40 3.78 4.02 4.09 675 732 680 807 813 733 805 896 893 121 699 696 833 690 629 648 431 831 52 288 295 111 100 26 225 42 37 41 419 52 815 831 813 899 886 848 894 942 944 97 869 857 934 860 809 830 670 923 160 526 521 256 215 44 468 51 61 124 643 60 U" V D 810 850 826 898 898 840 899 955 950 168 853 842 926 841 793 813 649 918 151 506 504 289 341 60 433 75 77 1 23 623 79 784 834 806 882 883 819 884 943 937 154 859 847 944 815 767 779 607 920 137 465 443 286 373 56 392 • 78 67 112 571 81 774 805 774 889 871 821 878 934 936 78 788 789 862 796 755 775 602 873 1 65 467 503 259 202 49 413 6l 54 1 21 606 62 163 Table 4.40. Empirical 5% level power (in % xlO) for tests of departure from the exponential distribution (sample size = 20) Statistics No. 
Distribution 61 TruncE(0,4) 52 LoConN(0.05,3) 53 TruncE(0,5) 54 Guinbel(0,1) 55 LoConN(0.1,5) 56 SU(-1,2) 57 LoConE(0.1,3) 68 Chi-square(4) 59 LoConE(0.1,5) 70 TruncE(0,5) 71 LoConN(0.1,7) 72 LoConE(0.05,3) 73 LoConN(0.05,5) 74 LoConE(0.05,7) 75 ScConE(0.05,2) 75 Chi-square(l) 77 ScConE(0.1,2) 78 ScConE(0.2,2) 79 LoConE(0.01,7) 80 Triangle 11(1) 81 ScConE(0.01,3) 82 ScConE(0.2,3) 83 ScConE(0.1,3) 84 ScConE(0.05,3) 85 ScConE(0.2,7) 85 ScConE(0.1,5) 87 ScConE(0.1,7) 88 ScConE(0.01,5) 89 ScConE(0.05,5) 90 Weibull(0.5) 91 SU(1,1) 92 LogN(0,1,0) A" /B: 62 1 .27 0.68 1 .50 1.14 1 .54 0.87 1.62 1 .41 1.88 1 .58 1.96 1.85 1.65 2.75 2.42 2.83 2.61 2.71 2.94 0.57 2.59 3.57 3.81 3.60 4.50 5.38 6.02 4.81 6.05 6.52 -5.3 6.18 4.20 4.35 5.25 5.40 5.45 5.59 5.85 5.00 5.02 5.29 6.50 7.29 7.44 10.9 13.5 15.0 15.3 15.6 15.9 16.4 18.0 23.8 29.4 29.8 31.5 48.7 56.2 56.7 68.2 87.7 93.4 114 27 532 33 219 300 477 40 39 32 45 255 43 454 45 56 649 77 101 54 52 58 218 139 • 90 714 297 481 96 188 949 981 151 31 754 41 438 502 670 45 121 49 44 418 40 647 45 62 528 87 93 57 135 63 225 142 100 730 305 499 89 190 914 990 186 40 743 43 399 537 656 60 115 51 45 509 43 657 47 47 418 55 77 57 141 57 119 78 58 563 158 299 60 93 792 991 11 4 V D 57 692 42 361 494 612 60 107 48 50 519 52 634 49 38 413 62 81 58 138 63 135 88 53 551 174 321 63 95 801 990 107 39 719 43 404 454 619 37 117 50 50 386 48 591 54 53 483 86 83 63 138 63 188 127 83 672 267 443 81 174 871 9 171

Kolmogorov-Smirnov type statistics, the Kuiper statistic generally performed better than the Kolmogorov-Smirnov statistic. For skewed distributions in the normal and exponential cases, the Kolmogorov-Smirnov statistic performed favorably. For the Gumbel case, the Kuiper statistic is almost uniformly better than the Kolmogorov-Smirnov statistic.

4. Comparison of statistics based on moments

The numbers in Tables 4.41 - 4.47 indicate the proportions of simulated samples for which the null distribution was rejected.
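For reference, the sample skewness and kurtosis on which the moment tests in this subsection are based can be sketched with the usual moment definitions. The function names are illustrative, and the standardization and critical values used in the study are not reproduced here. Each test compares the sample value with the corresponding value for the null distribution, which the tables list as about 1.14 and 5.40 for the Gumbel and 2.00 and 9.00 for the exponential.

```python
def central_moment(sample, r):
    """r-th sample central moment with divisor n."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** r for x in sample) / n


def sample_skewness(sample):
    """Third central moment divided by the 3/2 power of the second."""
    m2 = central_moment(sample, 2)
    return central_moment(sample, 3) / m2 ** 1.5


def sample_kurtosis(sample):
    """Fourth central moment divided by the square of the second."""
    m2 = central_moment(sample, 2)
    return central_moment(sample, 4) / m2 ** 2
```

Because the skewness uses third powers of the deviations, a single extreme observation from a heavy-tailed distribution can produce a large absolute value even when the parent distribution is symmetrical.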
The largest number in each line, for each sample size, was printed in bold.

The skewness and kurtosis tests are directional. Each is designed to detect a particular type of departure from the hypothesized distribution. Any extreme value of kurtosis indicates tails that are too short or too long compared with those of the hypothesized distribution. Similarly, a large absolute skewness indicates asymmetry and a small absolute skewness indicates near symmetry. Note that the skewness statistic is based on the third sample moment and it can yield a large value when the random sample is from a distribution with long or heavy tails. This is true for both symmetrical and skewed distributions with a large kurtosis value. The skewness test is in fact the most powerful test in this class of statistics for detecting skewed distributions with heavy tails for the normal and Gumbel cases. As for symmetrical distributions with heavy tails for the normal case, the skewness test compared favorably with the kurtosis test. The skewness test is the

Table 4.41. Empirical 5% level power (in % x10) for tests of departure from the normal distribution Sample Sizes n = 20 Statistics No.
Distribution /g, 1 N(0,1)+N(10,l) 2 Beta(0.5,0.5) 3 N(0,1)+N(5,1) 4 SB(0,0.5) 5 N(0,1)+N(4,l) 6 Tukey(1.5) 7 Uniform(0,l) 8 SB(0,0.707) 9 Tukey(0.7) 10 TruncN(-l,1 ) 11 N(0,1)+N(3,1) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 15 Triangle 1(1) 15 N(0,1)+N(2,l) 17 TruncN(-3,3) 18 N(0,1)+N(1,1) 19 SU((0,3) 20 t(10) 21 Logistic(0,1) 22 SU(0,2) 23 Tukey(IO) 24 Laplace(0,1) 25 ScConN(0.2,3) 26 ScConN(0.05,3) 27 ScConN(0.1,3) 28 ScConN(0.2,5) 29 ScConN(0.2,7) 30 ScConN(0.1,5) 31 ScConN(0.05,5) 32 ScConN(0.1,7) 33 ScConN(0.05,7) 34 SU(Q,1) 35 SU(0,0.9) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 n = 50 R b^ /b^ R 914 650 713 449 402 303 255 203 182 1 42 172 101 59 44 36 46 38 58 77 107 115 133 493 260 393 209 331 692 797 552 347 685 468 439 495 868 709 764 520 473 379 354 272 245 205 231 136 98 57 55 56 35 51 80 95 99 112 490 248 379 197 312 676 777 534 347 678 455 412 495 55 3 18 15 17 9 7 8 9 3 15 3 9 12 10 17 36 56 90 129 130 149 367 259 384 213 336 598 565 514 346 631 455 423 466 1000 •994 992 948 925 864 790 732 508 588 466 340 262 88 70 88 34 56 95 1 65 200 236 718 486 716 444 572 958 990 892 672 946 804 746 848 n = 1 00 bj /bj R b^ /b; 1 0 1000 4 1000 6 1000 6 1000 8 1000 2 1000 2 996 0 992 2 972 2 960 10 792 0 824 2 664 2 168 2 136 16 184 25 20 48 52 106 152 152 21 2 182 350 224 412 320 936 376 728 516 912 390 536 440 828 688 996 728 1000 696 976 586 912 778 996 732 944 558 952 686 984 1000 1000 1000 1000 996 1000 996 992 996 972 864 884 764 268 240 248 32 40 150 224 364 428 955 772 908 640 840 996 1000 980 904 996 948 968 980 12 4 12 0 4 4 0 4 0 0 12 0 0 8 0 16 12 44 100 196 g% 1..15 1..50 1..51 1..53 1..72 1.:75 1.80 1.,87 1. 92 1.94 2.04 2.06 2.14 2.36 2.40 2.50 2.84 2.92 3.53 4.00 4.20 4.51 5. 
38 5.00 7.54 7.65 8.33 n .2 12.8 16.5 20.0 21 .5 31 .4 35.2 82.1 1000 998 988 976 948 912 864 846 752 698 560 468 356 150 118 140 30 62 100 172 202 240 780 524 708 434 564 964 994 892 680 946 802 770 860 232 292 224 412 524 504 568 764 708 800 796 784 848 712 764 1 66 Table 4.42. Empirical 5% level power (in % xlO) for tests of departure from the normal distribution n = 20 Sample Sizes Statistics No. Distribution ni m o o bz 231 506 848 846 340 81 74 171 •43 38 48 52 136 194 112 187 553 748 1 43 308 243 211 284 702 210 425 848 554 649 654 872 186 962 690 844 217 488 834 836 247 92 91 139 50 33 42 54 98 95 75 98 234 768 100 174 1 41 160 167 444 160 236 590 450 339 574 541 133 780 541 585 /bi R bz n = 100 /bi R bz /bi 82 0 36 t(4) 0 37 t(2) 0 38 t(1) 0 39 CauchyCO,1) 40 SB(0.5333,0.5) 0.65 2.13 41 TruncN(-2,1) -.32 2.27 42 Beta(3,2) 0.29 2.36 43 Beta(2,1) -.57 2.40 44 TruncN(-3,2) -.18 2.65 0.00 2 . 1 2 45 Weibull(3.6) 46 Weibull(4) -.09 2.75 0.28 2.77 47 SB(1,2) 48 TruncN(-3,1) -.55 2.78 49 SB(1,1) 0.73 2.91 50 Weibull(2.2) 0.51 3.04 51 LoConN(0.2,3) 0.68 3.09 52 LoConN(0.2,5) 1.07 3.16 53 LoConN(0.2,7) 1.25 3.20 54 Weibull(2) 0.63 3.25 55 Half N(0,1) 0.97 3.78 56 LoConN(0.1,3) 0.80 4.02 57 LoConN(0.05,3) 0.68 4.35 58 Gumbel(0,1 ) 1.14 5.40 59 LoConN(0.1 ,5) 1.54 5,45 60 SU(-1,2) 0.87 5.59 61 Chi-Square(4) 1.41 6.00 62 LoConN(0.1,7) 1.96 6.60 63 1.65 7.44 64 Exponentiald ) 2.00 9.00 65 LoConN(0.05,7) 2.42 10.4 66 Chi-square(1) 2.83 15.0 67 Triangle 11(1) 0.57 16.4 68 Weibull(0.5) 6.62 87.7 69 SU(1,1) -5.3 93.4 6.18 114 70 LogN(0,1,0) o o o J R n = 50 244 486 467 858 770 996 775 992 213 746 32 192 28 178 114 412 41 60 36 35 34 38 49 76 125 282 227 478 121 206 231 410 602 960 768 1000 158 334 354 728 298 492 245 410 342 638 767 986 244 448 497 862 863 992 580 890 713 982 648 938 910 1000 135 394 973 1000 730 964 885 996 506 402 724 720 484 868 712 996 992 788 996 908 1000 1000 956 994 908 1000 1000 992 462 468 1000 700 860 240 54 480 520 140 224 36 332 400 136 258 330 
820 396 71 2 66 40 84 68 76 48 48 10 60 24 28 40 24 22 32 60 80 204 74 172 136 264 520 100 616 92 548 884 76 924 104 238 436 128 504 78 516 792 52 896 176 946 1000 1 28 1000 232 984 1000 192 1000 148 360 61 6 100 688 214 788 976 248 984 212 574 848 368 880 302 472 688 472 708 370 668 948 564 960 734 988 1000 900 1000 330 486 700 520 708 454 878 996 640 992 884 992 1000 980 1000 872 878 988 988 984 672 992 1000 896 1000 934 940 1000 1000 1000 898 1000 1000 1000 1000 230 322 816 408 652 980 1000 1000 1000 1000 840 964 1000 988 996 906 996 1000 992 1000 157 Table 4.43. Empirical 5% level power (in % xlO) for tests of departure from the Gumbel distribution Sample Sizes n = 20 Statistics No. Distribution /g^ 1 N(0,1)+N(10,1) 2 Beta(0.5,0.5) 3 N(0,1)+N(5.1) 4 88(0,0.5) 5 N(0,1)+N(4,l) 5 Tukeyd.S) 7 Uniform(0,1) 8 SB(0,0.707) 9 Tukey(0.7) 10 TruncN(-l,1) n N(0,1)+N(3,l) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 15 Triangle 1(1) 16 N(0,1)+N(2,l) 17 TruncN(-3,3) 18 N(0,1)+N(1,1) 19 N(0,1) 20 SU((0,3) 21 t(10) 22 Logistic(0,1) 23 SU(0,2) 24 Tukey(IO) 25 Laplace(0,1) 26 ScConN(0.2,3) 27 ScConN(0.05,3) 28 ScConN(0.1,3) 29 ScConN(0.2,5) 30 ScConN(0.-2,7) 31 ScConN(0.1,5) 32 ScConN(0.05,5) 33 ScConN(0.1,7) 34 ScConN(0.05,7) 35 SU(0,1) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 n = 50 n = 1 00 R b^ /b^ R bg /bj R b^ 939 779 841 647 612 521 449 407 362 349 399 279 246 240 245 265 250 271 272 278 299 300 301 400 355 396 352 373 543 580 514 424 576 517 435 896 716 757 605 544 428 370 322 293 232 276 155 119 55 60 83 37 39 29 22 28 19 29 70 30 71 72 89 218 325 267 204 385 343 138 285 236 308 221 250 256 283 236 240 248 241 264 242 265 269 278 297 306 304 314 337 335 338 411 393 417 382 402 558 584 534 454 583 532 463 1000 998 996 996 988 984 968 950 938 892 876 836 766 688 750 650 584 612 606 598 574 580 556 555 536 542 596 596 668 710 742 690 822 800 640 1000 1000 998 990 976 966 948 918 894 846 726 570 558 262 230 264 65 56 70 24 28 12 24 18 25 75 108 116 
342 468 486 394 680 604 242 706 710 728 750 724 746 722 740 770 750 750 772 722 722 748 682 642 672 666 656 598 616 606 594 570 540 634 614 608 622 678 654 726 744 634 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 996 992 976 940 908 944 848 864 772 832 764 748 672 712 708 764 836 852 860 948 920 776 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 992 1000 988 816 776 552 244 236 1 40 72 24 28 40 0 35 92 148 136 344 604 704 552 924 836 372 g^ 1.15 1 .50 1 .51 1.63 1.72 1 .75 1 .80 1 .87 1 .92 1.94 2.04 2.05 2.14 2.35 2,40 2.50 2.84 2.92 3.00 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.55 8.33 11.2 12.8 15:5 20.0 21.5 31.4 35.2 968 1000 984 1000 988 1000 1000 992 996 996 996 992 984 992 996 980 948 936 952 872 892 796 860 804 752 668 688 680 672 688 675 768 756 804 708 168 Table 4.44. Empirical 5% level power (in % xlO) for tests of departure from the Gumbel distribution n = 20 Sample Sizes Statistics No. Distribution n = 50 R bz /bi R 463 356 493 762 732 213 527 489 786 330 249 315 136 626 41 69 44 61 131 63 63 66 115 100 100 50 266 180 97 385 235 104 511 928 320 179 64 222 580 553 263 113 96 154 53 41 38 49 72 67 51 39 71 123 51 80 20 14 100 38 61 256 118 104 324 218 141 438 209 297 489 375 517 758 721 8 558 523 780 379 280 355 143 668 8 55 19 15 95 48 14 74 137 122 126 44 330 205 121 442 279 17 598 939 369 634 608 708 926 930 542 978 970 1000 840 664 780 316 994 98 150 46 92 120 102 56 58 128 30 174 60 266 138 146 588 412 278 862 992 556 n = 100 R bz /bi 618 780 624 780 682 824 840 996 854 992 8 852 978 1000 960 1000 1000 1000 862 1000 700 980 796 984 324 688 994 1000 6 252 1 44 344 12 164 0 84 24 104 70 256 8 52 68 60 154 208 44 32 182 228 60 72 364 376 154 92 176 256 656 840 482 680 32 672 904 988 994 1000 630 800 4l 6 128 600 988 960 864 828 852 708 468 344 352 320 384 284 252 144 108 132 248 72 0 8 4 40 52 28 52 164 528 400 748 888 548 652 708 772 736 896 912 36 1000 1000 1000 1000 988 992 720 1000 40 320 80 0 4 172 8 72 252 44 248 84 484 140 
288 884 740 164 996 1000 820 /bi 62 0 82.1 36 SU(0,0.9) 0 37 t(4) 0 38 t(2) 0 39 t(1) 40 Cauchy(0,1) 0 41 56(0.5333,0.5) 0.65 2.13 -.32 2.27 42 TruncN(-2,1) 43 Beta(3,2) 0.29 2.36 -.57 2.40 44 Beta(2,1) -.18 2.65 45 TruncN(-3,2) 46 Weibull(3.6) 0.00 2.72 -.09 2.75 47 Weibull(4) 48 86(1,2) 0.28 2.77 -.55 2.78 49 TruncN(-3,1) 0.73 2.91 50 SB(1,1) 0.51 3.04 51 Weibull(2.2) 52 LoConN(0.2,3) 0 .68 3.09 53 LoConN(0.2,5) 1 .07 3.16 54 LoConN(0.2,7) 1 .25 3.20 55 Weibull(2) 0.63 3.25 56 Half N(0,1) 0.97 3.78 57 LoConN(0.1,3) 0:80 4.02 58 LoConN(0.05,3) 0 .68 4.35 59 LoConN(0.1 ,5) 1 .54 5.45 60 SU(-1,2) 0 .87 5.59 61 Chi-square(4) 1 .41 6.00 62 LoConN(0.1 ,7) 1,.96 6.60 63 LoConN(0.05,5) 1,.65 7.44 64 Exponentiald) 2,.00 9.00 65 LoConN(0.05,7) 2,.42 10.4 66 Chi-square(1) 2.83 15.0 67 Triangle 11(1) 0..57 16.4 68 Weibull(0.5) 6..62 87.7 69 SU(1,1) -5.3 93.4 70 LogN(0,1,0) 6.. 1 8 . 114 312 88 432 852 890 584 390 322 338 140 130 112 146 146 140 106 48 112 130 102 66 6 4 24 48 72 182 102 124 468 300 328 714 376 464 169 Table 4.45. Empirical 5% level power (in % xlO) for tests of departure from the exponential distribution Sample Sizes n = 20 Statistics R No. 
Distribution /g^ 1 N(0,1)+N(10,1) 2 Beta(0.5,0.5) 3 N(0,1)+N(5,1) 4 SB(0,0.5) 5 N(0,1)+N(4,1) 6 Tukeyd.S) 7 Uniform(0,1) 8 SB(0,0.707) 9 Tukey(0.7) 10 TruncN(-1,1) 11 N(0,1)+N(3,1) 12 Tukey(3) 13 Beta(2,2) 14 TruncN(-2,2) 15 Triangle 1(1 ) 15 N(0,1)+N(2,1) 17 TruncN(-3,3) 18 N(0,1) 19 SU((0,3) 20 t(10) 21 Logistic(0,1 ) 22 SU(0,2) 23 Tukey(IO) 24 Laplace(0,1) 25 ScConN(0,2,3) 26 ScConN(0.05,3) 27 ScConN(0.1,3) 28 ScConN(0.2,5) 29 ScConN(0.2,7) 30 ScConN(0.1,5) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 n = 50 bj /bj R n = 100 bg /b^ R b^ /b^ 1000 1000 1000..1000 1000 1000 1000 1000 1000 998 998 998 1000 1000 1000 994 1000 998 1000 992 1000 932 1000 944 1000 926 1000 734 998 686 998 552 990 330 990 304 956 175 944 135 926 82 90 922 0 878 874 12 14 760 836 160 830 70 60 726 702 108 740 154 998 1000 1000 1000 998 998 1000 1000 1000 1000 1000 1000 1000 1000 998 998 990 990 958 952 934 930 884 876 760 836 838 724 684 722 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 996 1000 1000 1000 1000 996 976 972 992 936 892 872 868 784 756 788 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 992 996 952 832 560 454 355 336 0 20 4 236 88 40 108 228 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 996 1000 1000 1000 1000 992 980 972 992 940 900 868 868 788 724 740 6% 1.15 1 .50 1 .51 1.53 1 .72 1 .75 1 .80 1.87 1.92 1.94 2.04 2.05 2.14 2.35 2.40 2.50 2.84 3.00 3.53 4.00 4.20 4.51 5.38 6.00 7.54 7.65 8.33 11.2 12.8 15.5 943 915 882 762 885 791 832 501 830 593 826 478 817 400 807 361 801 317 824 284 800 312 798 207 804 152 765 90 81 762 747 109 40 723 38 717 708 36 22 683 669 19 559 17 502 16 605 10 585 21 46 672 510 32 81 553 596 142 526 120 741 787 794 792 796 820 827 819 832 848 809 850 841 805 806 778 758 751 735 719 697 705 619 643 606 697 645 564 598 653 170 Table 4.46. Empirical 5? level power (in % xlO) for tests of departure from the exponential distribution n = 20 Sample Sizes Statistics No. 
Distribution R n = 50 bz /b, R 658 11 8 684 234 698 185 618 56 616 103 634 34 600 122 711 390 727 401 277 276 954 138 914 130 990 159 836 57 769 57 758 54 529 68 964 94 181 81 409 60 198 33 78 90 100 125 56 53 62 328 76 63 58 76 81 145 15 253 80 52 681 691 708 637 822 732 826 784 788 840 810 906 894 802 1000 1000 1000 1000 994 1000 952 1000 584 838 628 196 206 220 734 84 84 350 492 94 n = 100 b. /bi R 806 690 782 788 788 842 780 868 864 686 1000 1000 1000 1000 852 836 924 904 876 920 990 984 968 1000 1000 1000 1000 1000 1000 1000 996 1000 91 2 984 968 592 576 620 948 152 116 648 820 156 b. /bi 62 31 ScConN(0.05,5) 0 20.0 0 21.5 32 ScConN(0.1,7) 0 31.4 33 ScConN(0.05,7) 0 36.2 34 SU(0,1) SU(0,0.9) 0 82.1 35 0 36 t(4) 0 37 t (2) 0 38 t(1) 0 39 Cauchy(0,1 ) 40 88(0.5333,0.5) 0.65 2.13 41 TruncN(-2,1) -.32 2.27 42 Beta(3,2) 0.29 2.36 43 Beta(2,1) -.57 2.40 44 TruncN(-3,2) -.18 2.65 0.00 2.72 45 Weibull(3.6) 46 Weibull(4) -.09 2.75 0.28 2.77 47 SB(1,2) 48 TruncN(-3,1 ) -.55 2:78 49 SB(1,1) 0.73 2.91 50 Weibull(2.2) 0.51 3.04 51 LoConN(0.2,3) 0.68 3.09 52 LoConN(0.2,5) 1.07 3.16 53 LoConN(0.2,7) 1.25 3.20 54 TruncE(0,3) 0.99 3.22 0.63 3.25 55 Weibull(2) 56 LoConE(0.2,7) 1.33 3.27 57 LoConE(0.2,5) 1.25 3.40 58 Half N(0,1) 0.97 3.78 59 LoConN(0.1,3) 0.80 4.02 60 LoConE(0.2,3) 1.20 4.09 626 655 611 717 733 192 956 936 992 871 804 794 577 978 194 453 227 33 34 48 369 77 80 158 290 86 258 284 386 68 146 76 244 686 682 816 820 728 704 490 438 378 454 468 386 380 226 206 220 202 320 70 60 214 46 68 994 1000 958 1000 614 850 658 112 36 208 744 82 82 358 518 86 308 476 636 144 240 128 384 932 896 1000 1000 996 996 976 968 960 900 900 832 812 772 624 620 612 712 104 96 520 224 120 816 764 840 880 840 912 848 932 896 988 1000 1000 1000 1000 1000 1000 996 1000 924 988 972 388 116 556 948 156 124 664 828 160 171 Table 4.47. Empirical 5% level power (in % xlO) for tests of departure from the exponential distribution n = 20 Sample Sizes Statistics No. Distribution n. 
= 50 R bz /bi 34 420 31 200 86 374 47 92 47 38 155 56 299 68 59 89 68 71 54 273 70 103 11 4 106 226 213 313 80 184 281 988 146 31 22 29 32 12 35 54 49 53 35 106 55 31 63 70 105 76 81 63 164 68 120 117 112 232 230 317 86 194 285 138 163 36 464 34 232 98 409 53 117 50 42 165 50 323 68 69 111 75 75 54 289 75 118 123 116 259 248 343 84 196 316 987 171 n = 100 bz /b, R bz /bi 66 64 68 632 44 56 344 102 24 0 634 74 94 76 1 42 78 68 76 36 30 8 6 68 62 120 22 66 56 62 70 110 100 84 80 84 92 66 54 788 652 94 86 164 146 236 21 4 166 1 48 116 94 392 326 508 368 110 106 384 354 486 398 998 186 288 266 66 634 46 354 24 640 96 160 68 34 10 64 120 70 66 120 82 102 66 794 90 1 40 860 72 544 8 776 92 228 76 20 8 64 88 56 108 180 120 148 72 1000 156 252 344 264 440 324 356 220 688 732 148 196 80 240 4 216 72 136 64 16 8 68 4 52 104 128 1 20 11 6 60 1000 156 276 112 244 496 72 92 11 2 868 56 556 4 780 88 236 72 24 0 60 88 52 116 200 132 1 44 72 1000 1 60 44 356 268 36 324 236 228 708 768 1000 432 R 62 61 TruncE(0,4) 1.27 4.20 62 LoConN(0.05,3) 0.68 4.35 63 TruncE(0,5) 1.50 5.26 64 GumbeKO,1 ) 1.14 5.40 65 LoConN(0.1,5) 1.54 5.45 66 SU(-1,2) 0.87 5.59 67 LoConE(0.1,3) 1.62 5.86 68 Chi-square(4) 1.41 6.00 69 LoConE(0.1,5) 1.88 6.02 70 TruncE(0,6) 1.68 6.29 71 LoConN(0.1,7) 1.96 6.60 72 LoConE(0.05,3) 1.85 7.29 73 LoConN(0.05,5) 1.65 7.44 74 LoConE(0.05,7) 2.75 10.9 75 ScConE(0.05,2) 2.42 13.6 76 Chi-square(1 ) 2.83 15.0 77 ScConE(0.1,2) 2.61 15.3 78 ScConE(0.2,2) 2.71 15.6 79 LoConE(0.01,7) 2.94 15.9 80 Triangle 11(1) 0.57 16.4 81 ScConE(0.01,3) 2.59 18.0 82 ScConE(0.2,3) 3.57 23.8 83 ScConE(0.1,3) 3.81 29.4 84 ScConE(0.05,3) 3.60 29.8 85 ScConE(0.2,7) 4.50 31 .5 86 ScConE(0.1,5) 5.38 48.7 87 ScConE(0.1,7) 6.02 56.2 88 ScConE(0.01,5) 4.81 66.7 89 ScConE(0.05,5) 6.05 68.2 90 WeibulKO.S) 6.62 87.7 91 SU(1,1) -5.3 93.4 92 LogN(0,1,0) 6.18 114 172 250 172 40 412 534 114 388 500 998 1000 300 404 612 592 340 368 360 172 weakest for symmetrical distributions with short tails for the normal 
case. For the Gumbel and exponential cases, where the null distributions are skewed, the skewness test is also the most powerful for detecting symmetrical distributions with heavy tails. The kurtosis test is very powerful in detecting alternative distributions with short tails in all three cases. It also has good power in detecting alternative distributions with long or heavy tails. As expected, the kurtosis test is weak for detecting distributions whose kurtosis is similar to that of the null distribution. The rectangle test is a combination of the skewness and the kurtosis tests. It is sensitive to both kinds of departure. Generally, it performed well when both the skewness and kurtosis tests did well. Also, it has power close to that of the better test when either the kurtosis or the skewness test performed badly.

5. Comparison of classes of statistics

The four classes of statistics used in this power comparison study are compared in this section. The alternative distributions were grouped into various subsets to illustrate how the relative performance of these statistics varies with the nature of the alternative distributions. The numbers in Tables 4.50 - 4.60 indicate the average proportions of simulated samples for which the null hypothesis was rejected. The Pearson chi-square and likelihood ratio statistics are generally not as powerful as the other three classes of statistics. The best Pearson chi-square or likelihood ratio statistic has about 70, 70 and 90 percent of the power of the best statistics from the other three classes, for the normal, Gumbel and exponential cases, respectively. The higher power achieved by the chi-square and likelihood ratio statistics for the exponential case is due to the inclusion, in the exponential power comparison study, of a larger proportion of alternative distributions which are substantially different from the exponential distribution. The correlation type statistics generally performed well.
The $r^2$ statistic is among the best statistics in detecting alternative distributions with long or heavy tails, especially for the normal case. The relative performance of the $r^2$ statistic degrades as the kurtosis of the null distribution increases. This is due to the smaller proportion of alternative distributions, used in the power study, with tails longer or heavier than those of the Gumbel and exponential distributions. The relative performance of the $r^2$ statistic is moderate or weak for alternative distributions with short tails. For testing normality, the Shapiro-Wilk statistic is the best or among the best for the four different sets of alternative distributions. The Shapiro-Wilk statistic is slightly less powerful for detecting symmetrical distributions with heavy tails when the sample size is large. The $k^2$ statistic performed moderately well for the normal case. As the kurtosis of the null distribution increases from 3 to 9, the relative performance of the $k^2$ statistic improves. For the exponential case, the $k^2$ statistic is the best statistic for detecting a wide range of distributions when the sample size is small. For larger sample sizes, the $k^2$ statistic compared favorably with the other good statistics.

The statistics based on the empirical distribution function are a class of powerful statistics for all three null distributions considered. Their relative performance is not much affected by the skewness or kurtosis of the null distribution. This property is desirable if one wishes to use the statistics for any kind of null distribution. The Anderson-Darling, Watson and Cramer-von Mises statistics usually rank high for the different sets of alternative distributions considered.

The tests based on moments have good power if used with care. The rectangle test ranked high for all four sets of alternative distributions used in the normal power study.
The performance of the tests based on moments degrades as the kurtosis and skewness of the null distribution increase. Tables 4.48 and 4.49 contain the critical points of the tests based on moments, used in the power study. The length between the upper and lower critical points increases drastically from the normal to the exponential case, especially for the kurtosis.

Table 4.48. Percentiles of the 0.05 level skewness test used in the empirical power comparison (sample size = 50)

                    normal    Gumbel    exponential
Lower percentile    -0.74     0.085     0.75
Upper percentile     0.74     2.29      3.23
Difference           1.48     2.204     2.48

Table 4.49. Percentiles of the 0.05 level kurtosis test used in the empirical power comparison (sample size = 50)

                    normal    Gumbel    exponential
Lower percentile     2.03      2.17      2.56
Upper percentile     4.97     11.12     17.21
Difference           2.94      8.95     14.65

Table 4.50. Statistics ranked by fraction of rejection of alternatives to the normal distribution (19 symmetrical distributions with kurtosis less than 3) Sample sizes Rank 1 2 3 it 5 6 7. 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 20 50 .292 b.
.574 bz .531 W .521 R .703 bz .674 R .609 Two tailed kurtosis Rectangle .455 .593 Anderson-Darling .435 .414 .558 Watson .550 .394 .390 'V .539 V .513 G3 Cramer-von Mises Kuiper .352 .493 k ' .487 X5 .486 X3 .279 .254 R .245 W .240 U" .235 A" .225 V .223 .207 kz .198 G1 .351 G3 .196 G'/z .340 X3 .183 G3 .315 D .181 D 100 .477 .293 XI3 .477 G5 .452 D .420 G17 .408 G1 .162 G7 .162 X7 .142 XVj .132 X5 .291 XI .286 r^ .283 G5 .405 X17 .400 XI .367 GIO :278 X10 .359 X10 .117 r: .018 /b, .275 X5 .322 G'/z .264 G'/z .301 XVa .229 X'/z .008 /bi .009 .166 G5 .163 XI .162 X3 .314 G13 .313 G1 .300 GIO Statistics A^ (Inflated Type I error) Likelihood ratio(3) Correlation (P-P plot) Chi-square(5) Chi-square(3) Correlation (Q-Q plot) Likelihood ratio(5) Kolmogorov-Smirnov Likelihood ratio(M&W:7,10,17) Likelihood ratio(l) Chi-square(M&W:7,10,17) Chi-square(1) Likelihood ratio(IO) Chi-square(IO) Likelihood ratiofi/g) Chi-square( Vj) Two tailed Skewness (W = Shapiro-Wilk) 177 Table 4.51. Statistics ranked by fraction of rejection of alternatives to the normal distribution (21 symmetrical distributions with kurtosis greater than 3) Sample sizes Rank 20 1 .480 r^ .470 B2 2 3 4 .444 W 5 6 .430 R .425 .422 U2 .438 A: 7 8 .417 9 .403 V 50 100 Statistics .714 rz .820 r ^ Correlation (Q-Q plot) .656 bz .799 bz Two tailed kurtosis .558 R .655 6= .793 R .732 W2 Rectangle .641 .732 .714 V .712 B2 Kuiper A^ (Inflated Type I .703 A: Anderson-Darling .583 D Kolmogorov-Smirnov .662 k^ Correlation (P-P plot) .520 .619 W .618 W: Cramer-von Mises Watson 10 .401 /bi .599 V .569 D 11 .394 k: .384 D .556 k: .646 X17 Chi-square(M&W:7,10,17) 12 .536 G13 .545 G17 Likelihood ratio(M&W:7,10,17) 13 .329 G7 .534 XI3 .602 XI0 Chi-squared0) 14 .329 X7 .528 /b, .598 GIG Likelihood ratio(IO) 15 16 .320 X3 .316 05 .525 XI0 .521 G10 .593 /b. . 
5 6 2 X5 Two tailed Skewness Chi-square(5) 17 18 .310 X5 .303 G3 .478 X5 .464 G5 .557 G5 .530 G3 Likelihood ratio(5) Likelihood ratio(3) 19 .244 01 .433 03 .525 X3 Chi-square(3) 20 .226 GV, .431 X3 .461 G1 Likelihood ratio(l) 21 .221 XI .452 XI Chi-square(1) .399 Qi/z Likelihood ratioC/^) 22 23 24 .374 01 .184 XV2 .356 XI .322 O'/z .382 X'/, .291 X'/; Chi-square( V2) (W = Shapiro-Wilk) 178 Table 4.52. Statistics ranked by fraction of rejection of alternatives to the normal distribution (9 skewed distributions with kurtosis less than 3) Sample sizes Rank 20 50 1 .220 .448 W 2 3 4 .202 W .181 A: .391 .359 A: .164 IJ: 5 6 100 .558 B .548 A: Statistics .323 .495 W .486 r (Inflated Type I error) Anderson-Darling Cramer-von Mises Correlation (Q-Q plot) .153 .158 .314 .304 k' .479 R .475 Rectangle Watson 7 .147 V .294 r^ .469 k^ Correlation (P-P plot) 8 .136 G3 .284 V .457 V Kuiper 9 .131 G'/z .273 R .451 G5 Likelihood ratio(5) 10 11 .130 D .128 X3 .259 G3 .258 D .433 D .411 X5 Kolmogor ov-Smirnov 12 .127 R -254 X5 .409 /b, Two tailed Skewness Likelihood ratio(3) Chi-square(3) 13 .125 G1 .250 G5 .400 G3 14 15 .117 r^ .101 ba .230 X3 .215 G1 .367 X3 .344 G10 16 .099 XI .204 /b] .343 X10 Likelihood ratio(IO) Chi-square(IO) Likelihood ratio(l) 17 .095 /b, .202 XI 18 19 .089 XVj .180 GVj .280 XI .068 G5 .174 b^ .255 b^ Two tailed kurtosis 20 .068 X7 .152 XVj .253 G17 Likelihood ratio(M&W:7,10,17) 21 .058 G7 .145 G13 .244 XI7 Chi-square(M&W:7,10,17) 22 .046 X5 .134 GIO .230 G'/g Likelihood ratioCVz) Chi-square( V2) (W = Shapiro-Wilk) 23 24 .293 G1 Chi-square(5) .124 X13 .216 X'/z .116 X10 Chi-square(1) 179 Table 4.53. Statistics ranked by fraction of rejection of alternatives to the normal distribution (21 skewed distributions with kurtosis greater than 3) Sample sizes Rank 20 50 100 Statistics Correlation (Q-Q plot) 1 .579 .789 W 2 3 4 .573 W .544 .538 r" .777 .774 r: .763 A: 5 6 .527 /bi .521 W2 .761 /b. 
.744 R .870 7 8 .508 U: .732 .708 U" .847 y: k2 Correlation (P-P plot) .827 D .823 V Kolmogorov-Smi rnov Kuiper .735 G5 .726 G10 Likelihood ratio(5) .724 X10 .716 X5 Chi-square(10) Chi-square(5) .709 03 Likelihood ratio(3) . 6 8 8 X3 Chi-square(3) .285 GV, .524 X10 .503 b . .277 XI .267 X5 .493 G13 .225 X ' / z .490 01 .675 X17 .674 017 .618 01 Chi-square(M&W:7,10, Likelihood ratio(M&W .206 G7 .206 X7 .598 b z .538 G ' / z .462 D 12 .417 G3 .601 G5 .408 X3 .599 X5 .571 G3 .546 X3 .525 010 15 16 17 18 19 20 21 22 23 24 .347 b z .322 G1 .285 G5 .481 X I 3 .457 XI .895 8% A^ (Inflated Type I error) A^ OC OC 11 .683 V .676 D Rectangle OO .495 R .480 V 13 14 .691 k== 9 10 Two tailed Skewness .895 R oo 00 .496 k: .907 r: .902 / b , .612 XI .417 G ' / z .525 XV2 .376 X ' / z Anderson-Dar1i ng Cramer-von Mises Watson Likelihood ratio(IO) Likelihood ratio(l) Chi-square(1) Two tailed kurtosis Likelihood ratic^'/g! Chi-square( Vj) (W = Shapiro-Wilk) 180 Table 4.54. Statistics ranked by fraction of rejection of alternatives to the Gumbel distribution (41 alternative distributions with skewness less than 1.14 and kurtosis less than 5.4) Sample sizes Rank 20 50 1 2 .345 R .653 R .839 .305 .621 .799 R 3 4 .290 . 5 8 3 /bi .289 .565 .772 .771 / b i Two tailed Skewness 5 5 .275 V .549 .761 V Kuiper 7 8 .264 /bi .243 D .527 k^ .527 V .471 D .757 .750 k^ .707 D Watson Correlation (P-P plot) . 2 3 5 G3 . 4 2 7 G5 . 6 7 0 r^ Correlation (Q-Q plot) .655 G5 . 6 3 6 X10 Likelihood ratio(5) Chi-square(IO) .635 G10 Likelihood ratio(IO) 9 10 11 12 .182 b a .181 r = .159 XI .157 G5 OC OC 15 16 .416 G3 .217 X3 .415 X5 .191 G1 .186 GV, . 3 9 4 X3 r =• 13 14 .275 100 r^ .376 b z .325 G1 . 3 1 5 X10 Statistics Anderson-Darling Rectangle Cramer-von Mises Kolmogorov-Smirnov .61 2 X5 Chi-square(5) .543 b z .537 X17 . 5 3 3 G17 Two tailed kurtosis Chi-square(M&W:7,10,17) .51 4 G3 Likelihood ratio(M&W:7,10,17) . 
4 8 6 X3 Likelihood ratio(3) Chi-square(3) 19 .145 G7 .31 4 G10 .145 X7 .290 XI .138 Xi/z .289 G13 .441 G1 Likelihood ratio(l) 20 .134 X5 .421 XI Chi-square(1) 17 18 .279 X13 21 .274 GV, .362 GV, Likelihood ratioCVj) 22 .217 XVa .344 XVj Chi-square( Vj) I8l Table 4.55. Statistics ranked by fraction of rejection of alternatives to the Gumbel distribution (19 alternative distributions with skewness less than 1.14 and kurtosis greater than 5.4) Sample sizes 3 4 5 6 7 8 11 12 13 14 15 16 17 18 19 20 21 22 100 .934 A^ Statistics Anderson-Darling .530 W" .525 U" .806 .923 Watson .806 0= .921 W: Cramer-von Mises .511 V .789 V .919 V Kuiper .502 k" .494 r^ .487 /b, .756 D .896 D .756 k: .726 X13 .890 k^' .871 G17 Kolmogorov-Smirnov Correlation (P-P plot) .484 D .723 G13 .870 X17 .474 R .418 X 3 .41 5 G5 .720 X10 XI 0 .852 GIO Likelihood ratio(IO) .699 r: .413 G3 .677 X5 .837 X5 .836 r^ Chi-square(5) Correlation (Q-Q plot) .409 X 5 . 4 0 5 X7 .405 G7 . 3 2 9 G1 .673 G5 .663 R .648 X3 . 8 3 0 G5 .793 R .776 G3 Likelihood ratio(5} Rectangle .646 G3 .773 X3 .300 XI .624 /b. .723 G1 .714 XI .700 /b, .716 G10 .295 GV, .568 G1 .239 X'/z .540 XI .217 ba TS ni 00 9 10 . 5 4 2 A" 50 OO 1 2 20 OO Rank Likelihood ratio(M&W:7,10,17) Chi-square(M&:W:7,10,17) Chi-square(10) Likelihood ratio(3) Chi-square(3) Likelihood ratio(l) Chi-square(1) Two tailed Skewness . 5 1 0 G ' / z .676 G ' / z Likelihood ratioCV^) .452 X'/z .657 X'/z .483 bz .361 bg Chi-square( Vj) Two tailed kurtosis 182 Table 4.56. Statistics ranked by fraction of rejection of alternatives to the Gumbel distribution (1 alternative distribution with skewness greater than 1.14 and kurtosis less than 5.4) Sample sizes Rank 1 2 3 4 5 5 7 8 9 10 11 12 20 50 .836 1.000 .726 .701 k^ .683 V .683 .610 D .606 r^ .478 G3 .426 X3 .419 G1 . 9 8 8 r^' .984 .980 .976 V .974 U2 .970 D .940 G3 .918 X3 .906 G5 .389 G ' / z .862 G1 . 8 5 8 XI .319 XI 100 1. 
0 0 0 X3 1.000 X5 1.000 X10 1.000 G3 1 .000 G5 1 .000 010 1 .000 D 1.000 1 .000 V 1 .000 1.000 1.000 r^ 15 16 k== .856 X5 1 .000 . 2 6 6 X ' / z .752 X10 . 9 9 6 XI .748 G ' / a .980 G1 .223 X5 .744 G10 .972 G17 .147 G7 17 18 .147 X7 .131 R 19 20 .123 bz .095 /bi 13 14 21 22 .274 G5 .670 XV, . 9 6 8 X17 .632 X13 .932 xv. .6l 6 G13 .91 2 GV, .130 bz .132 bz .120 R .024 /bi .104 R .004 /b^ Statistics Chi-square(3) Chi-square(5) Chi-square(IO) Likelihood ratio(3) Likelihood ratio(5) Likelihood ratio(IO) Kolmogor0v-Smi rnov Cramer-von Mises Kuiper Watson Anderson-Darling Correlation (Q-Q plot) Correlation (P-P plot) Chi-square(1) Likelihood ratio(1) Likelihood ratio(M&W:7,10,17) Chi-square(M&W;7,10,17) Chi-square('/%) Likelihood ratioC/,) Two tailed kurtosis Rectangle Two tailed Skewness 183 Table 4.57. Statistics ranked by fraction of rejection of alternatives to the Gumbel distribution (9 alternatives with skewness greater than 1.14 and kurtosis greater than 5.4) Sample sizes Rank 20 1 .474 A" .453 r^ .41 4 2 3 4 5 6 7 8 9 10 15 15 17 18 19 20 21 22 .377 k^ .352 D .361 V .280 G3 .540 r^' .521 .598 .588 V .578 D .480 G3 .462 X3 .450 G5 .274 X 3 . 2 7 3 G ' / z .445 X5 G1 .270 G1 .279 /bi no 13 14 .382 .727 .549 00 11 12 100 50 Statisti O S .890 A" .801 W' .795 V Anderson-Darling Cramer-von Mises .789 Watson .784 .770 D . 7 6 1 k" .582 G5 Kuiper Correlation (Q-Q plot) Kolmogorov-Smirnov Correlation (P-P plot) Likelihood ratio(5) Chi-square(5) .551 X5 .511 G3 . 6 1 0 010 Likelihood ratio(3) Likelihood ratio(W) .507 X10 Chi-square(IO) .413 XI .503 X3 .410 G ' / z .502 XI . 2 3 4 XI .216 X ' / z . 3 8 6 / b , .587 G1 .213 b z . 3 6 8 G13 . 
5 6 2 X 1 7 Chi-square(3) Chi-square(1) .187 G7 Likelihood ratio(M&W:7,10,17) .238 R .187 X7 .141 G5 .115 X5 .367 XI 3 .555 G17 .361 XV, .544 X ' / z .347 X10 .344 G10 .536 G ' / z .498 / b , .340 R .460 R .308 b z .272 bz Likelihood ratio(l) Chi-square(M&W:7,10,17) Chi-squareC/,) Likelihood ratioC/,) Two tailed Skewness Rectangle Two tailed kurtosis 184 Table 4.58. Statistics ranked by fraction of rejection of alternatives to the exponential distribution (51 alternatives with skewness less than 2 and kurtosis less than 9) Sample sizes Rank 20 50 1 2 . 5 9 8 k^ .542 .801 .800 V 3 4 .539 .536 V 5 6 .520 /bi .512 R .498 D 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 .463 X3 .444 G3 .395 .357 r^ .347 X5 .345 G5 .341 G1 .338 G7 . 3 3 3 XI 100 Statisti cs .842 Cramer-von Mises .799 .794 k^ .842 .841 V .837 D Watson Kuiper .789 .781 D .773 X5 .771 G5 .837 A^ . 8 3 6 k^ .820 G5 . 8 1 9 X10 Anderson-Darling Correlation (P-P plot) Likelihood ratio(5) .752 G3 .746 X3 .718 X10 .716 GIG .817 G10 . 8 1 5 X5 .806 X17 .701 0 1 3 . 6 9 4 X13 .680 R . 8 0 2 G17 .676 / b i .650 G1 .309 X7 . 2 8 3 GV, .642 XI .252 XV, .596 r^ .149 K .553 G'/, . 8 0 3 G3 Kolmogorov-Smirnov Chi-square(10) Likelihood ratio(lO) Chi-square(5) Chi-square(M&W:7,10,17) Likelihood ratio(3) Likelihood ratio(M&W:7,10,17) Chi-square(3) Likelihood ratio(l) .799 X3 .770 G1 . 7 5 5 XI Chi-square(1) .754 R .742 / b i .698 GV, .691 r" Rectangle Two tailed Skewness Likelihood ratioCV,) Correlation (Q-Q plot) .505 XV, .675 X ' / z .404 K .587 K Chi-square('/,) Two tailed kurtosis 185 Table 4.59. Statistics ranked by fraction of rejection of alternatives to the exponential distribution (14 alternatives with skewness less than 2 and kurtosis greater than 9) Sample sizes 100 Statistics . 9 6 6 k^ .992 .965 .989 V Cramer-von Mises Kuiper .960 V . 9 8 8 A" Anderson-Dar1i ng .820 V .960 .988 Correlation (P-P plot) .758 X5 .958 D .986 .757 X3 .752 G5 .956 .979 D . 
9 7 8 G5 .972 X10 Watson Kolmogorov-Smirnov Rank 20 1 2 .855 k" .834 3 4 .826 5 6 50 .743 .949 X5 . 9 4 6 G5 9 10 .739 D .735 G3 .944 G3 .941 X10 11 12 .715 G7 .701 X7 13 14 .660 G1 .659 XI 15 16 .654 /bi .647 r: 17 18 .642 R 7 8 19 20 Likelihood ratio(5) Chi-square(IO) .970 r" Correlation (Q-Q plot) . 9 3 9 GIG .970 X5 . 9 6 9 G10 Chi-square(5) Likelihood ratio(IO) . 9 3 9 X3 .963 G3 Likelihood ratio(3) .936 G13 .956 .955 .954 .944 Chi-square(M&W:7,10,17) Chi-square(3) .935 X 1 3 .916 G1 . 9 1 6 XI X17 X3 G17 G1 .891 GV, .942 XI .604 G ' / z .887 r^ .936 G ' / z .580 X ' / z .877 X ' / z .934 XV, .811 R .163 K .899 R 21 .795 22 .286 K •856 /bi .419 K Likelihood ratio(M&W:7,10,17) Likelihood ratio(l) Chi-square(1) Likelihood ratioCV,) Chi-square( V,) Rectangle Two tailed Skewness Two tailed kurtosis 186 Table 4.50. Statistics ranked by fraction of rejection of alternatives to the exponential distribution (17 alternatives with skewness greater than 2 and kurtosis greater than 9) Sample sizes ank 20 100 50 .233 D .195 r^ .353 r^ 5 6 .187 V .182 .317 V .310 7 8 .177 G7 . 1 6 8 k: . 2 9 9 k" 9 10 .149 X5 .147 /bi .145 G5 .140 K .283 XI 3 .282 G13 .281 G10 .138 X7 .265 G5 11 12 13 14 15 16 17 18 19 20 . 
1 3 8 X3 .136 G3 .132 R X10 .271 X5 .256 X3 .254 G3 .241 XI .127 GV, .237 G1 .125 XI .230 G ' / z .125 G1 .217 X ' / z .110 XV, .205 /bi .476 .476 r^ Cramer-von Mises Correlation (Q-Q plot) .467 D .460 A^ Kolmogorov-Smirnov .451 X17 .450 k^ .449 G17 Chi-square(M&W:7,10,17) Correlation (P-P plot) Likelihood ratio(M&W:7,10,17) OC 3 4 .377 .372 A" .357 D OC JO .257 .254 00 1 2 Statistics XI0 Anderson-D arling Chi-square(IO) .447 V .446 Kuiper Watson .445 G5 .443 X5 Likelihood ratio(5) Chi-square(5) .443 G10 .441 G1 Likelihood ratio(l) .439 G3 Likelihood ratio(3) .438 X3 Chi-square(3) Likelihood ratio(IO) .432 XI Chi-square(1) .431 G ' / g .423 X V j Likelihood ratioC/j Chi-square( V2) .286 R Rectangle 21 .202 R .259 / b , Two tailed Skewness 22 .173 K .226 K Two tailed kurtosis

V. PERCENTILES OF THE $k^2$ AND $r^2$ STATISTICS

This section describes the generation and smoothing of the Monte Carlo percentiles of the $r^2$ and $k^2$ statistics. Curves were fitted through the percentiles to obtain formulas for the percentiles of these statistics. The percentiles were generated for testing the fit of the normal, Gumbel and exponential distributions with unknown location and scale parameters. The percentiles of the $r^2$ and $k^2$ statistics were also simulated for testing the fit of the exponential distribution with only an unknown scale parameter, because this is the more frequently used probability model. A description of the random number generators can be found in Section A of Chapter 4 and Appendix B. The uniform random number generator developed by Wichmann and Hill (1982a) was used. For each of the null distributions, the $r^2$ and $k^2$ statistics were simulated at each of the sample sizes n = 5(1)50(5)100(10)200(100)1000. Table 5.1 shows the number of samples generated in each replication and the number of replications employed at each of the sample sizes. The choice of the number of samples was based on the stability of the Monte Carlo percentiles.
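The sample-size notation a(s)b used above denotes the values from a to b in steps of s. The following minimal Python sketch expands the grid under that reading of the notation; the function name `expand_grid` is hypothetical and is not part of the original FORTRAN programs.

```python
import re

def expand_grid(spec):
    """Expand a grid such as '5(1)50(5)100' into an explicit list of sample sizes.

    Assumes the notation a(s)b means 'from a to b in steps of s'.
    """
    # Splitting on '(step)' with a capturing group interleaves bounds and steps:
    # '5(1)50(5)100' -> ['5', '1', '50', '5', '100']
    parts = re.split(r"\((\d+)\)", spec)
    bounds = [int(x) for x in parts[0::2]]
    steps = [int(x) for x in parts[1::2]]

    sizes = [bounds[0]]
    for lo, step, hi in zip(bounds, steps, bounds[1:]):
        # Start one step above the lower bound so shared endpoints are not repeated.
        sizes.extend(range(lo + step, hi + 1, step))
    return sizes

sizes = expand_grid("5(1)50(5)100(10)200(100)1000")
print(len(sizes), sizes[:3], sizes[-3:])
```

Under this reading the grid contains 74 distinct sample sizes, running from 5 to 1000.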
It was observed that generating a larger number of samples than those listed in Table 5.1 for sample sizes 20 to 1000 affected the simulated percentiles only in the third or fourth decimal place. A larger number of samples ought to be used for sample sizes 5 to 20 to achieve the same stability. This was not done because of certain memory limitations of the Microsoft FORTRAN compiler. However, the use of more replications for sample sizes 5 to 20 helps to achieve the desired stability of the simulated percentiles. Replications were used for smoothing the percentiles and also to check the accuracy of the simulated percentiles.

Table 5.1. Number of samples and replications employed in the simulation of the $r^2$ and $k^2$ statistics

Sample sizes     Number of samples    Number of replications
5(1)10           15000                9
11(1)15          15000                7
16(1)20          15000                5
21(1)30          15000                3
31(1)50          15000                2
55(5)100         15000                2
110(10)200       10000                2
300(100)1000      5000                2

The percentiles were first averaged over all replications for each sample size. Figure 5.1 contains a plot of the Monte Carlo percentiles of the $k^2$ statistic against the sample sizes for the normal case. These percentiles exhibit a very smooth pattern, with the percentiles at the 0.001 significance level showing slightly more fluctuation. This plot suggested that the following models may be appropriate for approximating the percentiles,

$X_p = 1 - \alpha \exp(-\beta n)$, (5.1)

or

$X_p = 1 - 1/(\alpha + \beta n)$, (5.2)

where $X_p$ is the percentile, n is the sample size and $\alpha$ and $\beta$ are some parameters.

[Figure 5.1. Plot of percentiles against n. Horizontal axis: sample size, n. One curve for each of the significance levels 0.300, 0.200, 0.150, 0.100, 0.075, 0.050, 0.025, 0.010, 0.005 and 0.001.]

A natural logarithmic transformation of model (5.1) yields

$\ln(1 - X_p) = \beta_2 n + \beta_1$, (5.3)

where $\beta_1$ and $\beta_2$ are functions of $\alpha$ and $\beta$ respectively.
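The two candidate models imply different linearity checks: under model (5.1), $\ln(1 - X_p)$ is linear in n, while model (5.2) rearranges to $1/(1 - X_p) = \alpha + \beta n$, so the reciprocal is linear in n. The Python sketch below is an illustration only, not the procedure used in this study. It takes a subset of the smoothed Monte Carlo percentiles tabulated in Table 5.9 (normal P-P case, 0.050 level) and compares the R-squared of a straight-line fit on each scale. The helper names `line_fit` and `r_squared` are hypothetical.

```python
import math

# Smoothed Monte Carlo percentiles at the 0.050 level for the normal
# P-P case, copied from the M.C. column of Table 5.9 (subset of rows).
n_values = [5, 10, 20, 50, 100, 200, 500, 1000]
x_p = [0.79, 0.88, 0.94, 0.975, 0.987, 0.9937, 0.9974, 0.9987]

def line_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(x, y):
    """Coefficient of determination of the straight-line fit of y on x."""
    intercept, slope = line_fit(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Scale on which model (5.1) would be linear in n.
log_scale = [math.log(1.0 - p) for p in x_p]
# Scale on which model (5.2) would be linear in n.
recip_scale = [1.0 / (1.0 - p) for p in x_p]

r2_log = r_squared(n_values, log_scale)
r2_recip = r_squared(n_values, recip_scale)
print(f"R^2 on log scale:        {r2_log:.4f}")
print(f"R^2 on reciprocal scale: {r2_recip:.4f}")
```

For these values the reciprocal scale is very nearly linear in n while the log scale is clearly curved, which favors model (5.2) over model (5.1).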
Nonlinear plots were obtained when $\ln(1 - X_p)$ was plotted against n, suggesting that model (5.1) is not appropriate. Model (5.2) can be rewritten as

$1/(1 - X_p) = \beta n + \alpha$. (5.4)

The plot of $1/(1 - X_p)$ against n for the $k^2$ percentiles is shown in Figures 5.2-5.4. These plots are quite linear and model (5.4) can thus be used to smooth and fit lines to the Monte Carlo percentiles. An interactive graphical smoothing and curve fitting procedure (IGSCF) was developed using the IBM Personal Computer Plotting System (1984) on the IBM PC AT. The IGSCF procedure enables the points to be smoothed interactively and provides least-squares estimates of $\alpha$ and $\beta$ for model (5.4). The accuracy of the model can be examined by comparing the smoothed Monte Carlo percentiles with those computed from the estimated model.

[Figure 5.2. Plot of transformed percentiles against n.]

[Figure 5.3. Plot of transformed percentiles against n. Horizontal axis: sample size, n, from 0 to 200. One line for each significance level from 0.300 to 0.001.]

[Figure 5.4. Plot of transformed percentiles against n. Horizontal axis: sample size, n, from 0 to 50. One line for each significance level from 0.300 to 0.001.]

1. Procedure IGSCF

[1] For a particular probability level, enter the sample sizes and Monte Carlo percentiles into the arrays XN and XP respectively.
[2] Transform the percentiles using XP = 1/(1 - XP).
[3] Plot XP against XN. Enlarge certain portions of the plot of XP against XN if necessary.
[4] Any point that appears to deviate too much from the straight line is smoothed by linear interpolation of neighboring points or by using the best judgement based on the plot.
[5] Go to [6] if all the points are smoothed, otherwise go to [3].
[6] Obtain least-squares estimates of $\alpha$ and $\beta$ and compare the Monte Carlo percentiles with those from the estimated model.
[7] Go to [8] if the model provides a satisfactory fit to the percentiles, otherwise go to [3] or stop and consider a new transformation in [2].
[8] Output smoothed percentiles and least-squares estimates of $\alpha$ and $\beta$.

Little smoothing of points was performed since there were only very slight fluctuations about the straight line. Only those points that clearly deviated from the straight lines were adjusted using the IGSCF procedure. Figures 5.5 - 5.7 are plots of the transformed percentiles with certain points smoothed. The IGSCF procedure was then used to compute the least-squares estimates of the parameters $\alpha$ and $\beta$ of model (5.4). Estimates of $\alpha$ and $\beta$ for the $k^2$ and $r^2$ statistics are given in Tables 5.2 - 5.8 for the normal, Gumbel and exponential models.

[Figure 5.5. Plot of transformed percentiles against n. Horizontal axis: sample size, n. One line for each significance level from 0.300 to 0.001.]

[Figure 5.6. Plot of transformed percentiles against n. Horizontal axis: sample size, n. One line for each significance level from 0.300 to 0.001.]

[Figure 5.7. Plot of transformed percentiles against n. Horizontal axis: sample size, n. One line for each significance level from 0.300 to 0.001.]

Table 5.2. Least-squares estimates of $\alpha$ and $\beta$ of the model approximating the percentiles of the normal P-P probability plot correlation coefficient $k^2$

Significance levels
            0.001    0.005    0.010    0.025    0.050    0.075    0.100    0.150    0.200    0.300
$\beta$     .36772   .47846   .55045   .65915   .77333   .90071   .93794   1.1121   1.2258   1.4087
$\alpha$    .95357   .59852   .18980   .48874   .74190   -1.261   1.5880   -.9422   -.7107   2.6229

Table 5.3.
Least-squares estimates of $\alpha$ and $\beta$ of the model approximating the percentiles of the normal Q-Q probability plot correlation coefficient $r^2$

Significance levels
            0.001    0.005    0.010    0.025    0.050    0.075    0.100    0.150    0.200    0.300
$\beta_1$   .16391   .21236   .23785   .28773   .34126   .38075   .41445   .47163   .52443   .62205
$\alpha_1$  1.3062   1.6253   1.9329   2.4007   2.9597   3.4357   3.8678   4.6899   5.4256   6.8755
$\beta_2$   .12804   .20330   .22745   .26595   .31707   .34682   .38115   .42743   .46008   .55372
$\alpha_2$  5.6511   2.0049   2.8531   4.5304   5.0003   6.5800   6.6100   8.4408   11.139   11.824
$\beta_3$   .16119   .18933   .21309   .25811   .29582   .3214    .34132   .37701   .43760   .50142
$\alpha_3$  .74124   8.1060   7.7935   6.2141   10.573   13.520   15.708   21.552   15.428   21.885

Table 5.4. Least-squares estimates of $\alpha$ and $\beta$ of the model approximating the percentiles of the Gumbel P-P probability plot correlation coefficient $k^2$

Significance levels
            0.001    0.005    0.010    0.025    0.050    0.075    0.100    0.150    0.200    0.300
$\beta$     .38450   .51041   .56241   .65946   .78468   .86620   .93210   1.1136   1.2403   1.4080
$\alpha$    .39895   -.1716   .63448   1.6394   1.7074   2.5642   2.9917   .51855   .53716   4.0745

Table 5.5. Least-squares estimates of $\alpha$ and $\beta$ of the model approximating the percentiles of the Gumbel Q-Q probability plot correlation coefficient $r^2$

Significance levels
            0.001    0.005    0.010    0.025    0.050    0.075    0.100    0.150    0.200    0.300
$\beta_1$   .03375   .05749   .07486   .11122   .15404   .19064   .22292   .28105   .33432   .43343
$\alpha_1$  2.4461   2.9063   3.1936   3.7519   4.4428   4.9481   5.4104   6.2055   6.9079   8.3401
$\beta_2$   .03650   .06028   .07132   .09958   .11824   .14607   .16934   .21123   .24898   .32942
$\alpha_2$  2.0273   2.3905   3.0889   4.1082   7.1112   8.2502   9.5778   11.703   13.824   16.643
$\beta_3$   .02868   .04265   .05085   .07133   .09463   .11296   .13415   .16359   .19152   .24471
$\alpha_3$  3.3611   5.8284   7.7133   9.5867   12.875   15.840   17.490   23.615   27.607   36.089

Table 5.6.
Least-squares estimates of $\alpha$ and $\beta$ of the model approximating the percentiles of the exponential P-P probability plot correlation coefficient $k^2$

Significance levels
            0.001    0.005    0.010    0.025    0.050    0.075    0.100    0.150    0.200    0.300
$\beta$     .32065   .41448   .47839   .58998   .58370   .75315   .83799   .93214   1.1118   1.2510
$\alpha$    .55736   .64635   .43655   -.0160   .87806   1.6374   .83343   2.8499   -.9612   3.4117

Table 5.7. Least-squares estimates of $\alpha$ and $\beta$ of the model approximating the percentiles of the exponential Q-Q probability plot correlation coefficient $r^2$

Significance levels
            0.001    0.005    0.010    0.025    0.050    0.075    0.100    0.150    0.200    0.300
$\beta_1$   .02622   .04012   .05136   .07404   .10533   .13352   .15860   .20471   .24945   .33344
$\alpha_1$  2.3535   2.8207   3.1308   3.7591   4.4394   4.9942   5.4824   5.3478   7.1232   8.6841
$\beta_2$   .02752   .03733   .04167   .05559   .07349   .09408   .10549   .14979   .17713   .22381
$\alpha_2$  2.2289   3.1243   3.9495   5.4723   7.2355   8.1551   9.9034   10.347   12.697   18.034
$\beta_3$   .03325   .04135   .04745   .05590   .07100   .08423   .09479   .11979   .13995   .18285
$\alpha_3$  .30189   1.8551   2.7443   5.6312   8.5513   10.387   12.869   17.029   21.473   27.528

Table 5.8. Least-squares estimates of $\alpha$ and $\beta$ of the model approximating the percentiles of the exponential (unknown scale parameter) P-P probability plot correlation coefficient $k^2$

Significance levels
            0.001    0.005    0.010    0.025    0.050    0.075    0.100    0.150    0.200    0.300
$\beta$     .33175   .40758   .47850   .57639   .69736   .75319   .83792   .93223   1.1083   1.2609
$\alpha$    -.1434   1.1053   .48854   .48333   -.0942   1.6737   .82770   2.9445   -.7921   3.3480

By fitting three different straight lines through the transformed percentiles for three separate ranges of the sample sizes, a better approximation of the percentiles of the $r^2$ statistic was obtained. The pairs of estimates $(\alpha_1, \beta_1)$, $(\alpha_2, \beta_2)$ and $(\alpha_3, \beta_3)$ in Tables 5.3, 5.5 and 5.7 are for the sample size ranges [5,100], [101,200] and [201,1000] respectively. The percentiles of the $k^2$ or $r^2$ statistic can be approximated using the $\alpha$ and $\beta$ values listed in Tables 5.2 - 5.8 and the formula: $X_p = 1 - 1/(\beta n + \alpha)$.
(5.5)

Note that Table 5.7 for the $r^2$ statistic is used for both exponential cases: scale and location parameters unknown, and only the scale parameter unknown. These models provide very accurate estimates of the percentiles of the $k^2$ and $r^2$ statistics. The smoothed Monte Carlo percentiles and those computed using model (5.5) are tabulated in Tables 5.9 - 5.15.

Table 5.9. Comparison between the smoothed Monte Carlo percentiles and those computed from the model for the normal P-P probability plot correlation coefficient $k^2$

Significance levels
          0.001            0.010            0.050            0.100            0.300
n      M.C.(a) model    M.C.   model     M.C.   model     M.C.   model     M.C.   model
5      .59     .64      .69    .66       .79    .78       .83    .84       .90    .90
10     .75     .78      .83    .82       .88    .88       .90    .91       .94    .94
20     .87     .88      .91    .91       .94    .94       .95    .95       .97    .97
50     .950    .948     .964   .964      .975   .975      .979   .979      .986   .986
100    .974    .973     .982   .982      .987   .987      .990   .990      .993   .993
150    .982    .982     .988   .988      .994   .994      .993   .993      .995   .995
200    .9868   .9866    .9911  .9909     .9937  .9936     .9947  .9947     .9965  .9965
300    .9915   .9910    .9940  .9940     .9957  .9957     .9965  .9965     .9977  .9976
400    .9934   .9932    .9954  .9955     .9968  .9968     .9974  .9973     .9983  .9982
500    .9946   .9946    .9964  .9964     .9974  .9974     .9979  .9979     .9986  .9986
600    .9956   .9955    .9969  .9970     .9978  .9978     .9983  .9982     .9988  .9988
700    .9962   .9961    .9974  .9974     .9981  .9982     .9985  .9985     .9990  .9990
800    .9966   .9966    .9977  .9977     .9984  .9984     .9987  .9987     .9991  .9991
900    .9970   .9970    .9980  .9980     .9986  .9986     .9988  .9988     .9992  .9992
1000   .9972   .9973    .9982  .9982     .9987  .9987     .9989  .9989     .9993  .9993

(a) Monte Carlo.

Table 5.10. Comparison between the smoothed Monte Carlo percentiles and those computed from the model for the normal Q-Q probability plot correlation coefficient $r^2$ Significance levels 0.001 M.C. model n 0.010 M.C. model 0.050 M.C. model 0.1000 M.C. model 0.3000 M.C.
model 5 .59 .53 .67 .68 .77 .79 .81 10 .56 ,65 .76 .77 .84 .84 .87 20 .77 .78 .85 .85 .90 .90 .92 .83 .88 .92 50 .897 .895 .929 .928 .951 .950 1 00 .943 .943 .961 .961 .973 1 50 .960 .960 .973 .973 .981 .973 .981 .960 .978 .984 .959 .978 .984 200 .9683 .9580 .9790 .9793 300 .9799 .9796 400 .9849 .9847 .9879 .9879 .9861 .9851 .9900 .9899 .9915 .9915 .9892 .9893 .9922 .9922 .9934 .9934 500 .9879 .9877 .9914 .9913 600 .9897 700 .9911 .9937 .9937 .9946 .9946 .9963 .9963 .9897 .9927 .9926 .9947 .9947 .9955 .9955 .9969 .9969 .9912 .9937 .9936 .9954 .9954 .9961 .9961 .9973 .9973 800 .9923 .9923 .9944 .9944 .9960 900 .9932 .9931 .9950 .9950 .9964 1000 .9938 .9938 .9954 .9955 .9967 a Monte Carlo. .9854 .9854 .89 .90 .92 .92 .95 .95 .974 .974 .986 .989 .985 .989 .9918 .9918 .9943 .9942 .9955 .9955 .9960 .9965 .9965 .9976 .9976 .9964 .9969 .9969 .9979 .9979 .9967 .9972 .9972 .9981 .9981 204 Table 5.11. Comparison between the smoothed Monte Carlo percentiles and those computed from the model for the Gumbel P-P probability plot correlation coefficient Significance levels 0.001 ^M.C. model n 0.010 M.C. model 0.050 M.C. model 0.1000 0.3000 M.C. model M.C. 
model 5 .59 .57 .70 .71 .80 .82 .84 .87 .90 .91 10 .76 .76 .84 .84 .89 .90 .91 i92 .94 .94 20 .88 .88 .92 .92 .94 .94 .95 .95 .97 .97 50 .950 .949 .965 .965 .975 .980 .980 100 .975 .974 .982 .983 .982 .988 .988 .992 .990 150 .982 .988 .976 .988 .992 i990 .993 .987 .993 .995 .987 .993 .995 200 .9869 .9871 .9910 .9912 .9938 .9937 .9949 .9947 300 .991 4 .9914 .9942 .9941 .9959 .9965 .9965 400 .9934 .9935 .9956 .9956 .9958 .9969 .9968 .9974 .9973 .9966 .9965 .9977 .9977 .9983 .9982 500 .9947 .9948 .9965 .9965 .9975 .9975 .9979 500 .9957 .9957 .9971 .9970 .9979 .9979 700 .9963 .9963 .9975 .9975 .9982 .9982 .9979 .9982 .9982 .9987 .9987 .9986 .9986 .9988 .9988 .9990 .9990 800 .9968 .9968 .9978 .9978 .9984 .9984 .9987 .9987 .9991 900 .9971 .9971 .9980 .9980 .9986 .9986 .9982 .9987 .9987 .9988 .9988 .9989 .9989 1000 .9974 .9974 a Monte Carlo . .9982 .993 .9991 .9992 .9992 .9993 .9993 205 Table 5.12. Comparison between the smoothed Monte Carlo percentiles and those computed from the model for the Gumbel Q-Q probability plot correlation coefficient r^ Significance levels n 0.001 0.010 ^M.C. model M.C. model 0.050 M.C. model 0,1000 0.3000 M.C. model M.C. 
model 5 .54 .62 .67 .72 .76 .81 .81 .85 .88 .90 10 .64 .64 .73 .75 .82 .83 .86 .91 .92 20 .68 .68 .79 .79 .87 .87 .90 .87 .90 .94 .94 50 .764 .758 .859 .856 .920 .918 .941 .940 100 .824 .947 .950 .962 .964 .868 .904 .928 .906 150 .828 .867 .927 .959 .960 .971 .971 200 .8911 .8928 .9424 .9674 300 .91 66 .9164 .9758 400 .9324 .9803 .9773 .9770 .9878 .9879 .9828 .9827 .9909 .9909 .9860 .9859 .9926 .9925 500 .9424 600 .9565 .9681 .9758 .9326 .9422 .9567 .9645 .9644 .9805 .9435 .9703 .9698 .9835 .9834 .9516 .9514 .9740 .9738 700 .9576 .9573 ^9769 800 .968 .967 .980 .980 .985 .985 .9856 .9856 .9882 .9882 .9898 .9898 .9946 .9769 .9873 .9874 .9909 .9910 .9952 .9952 .9623 .9620 .9793 .9793 .9887 .9887 900 .9656 .9657 .9812 .9813 1000 .9687 .9688 .9829 .9829 .9898 .9898 .9907 .9907 .9920 .9920 .9957 .9957 .9928 .9928 .9961 .9961 .9934 .9934 .9964 .9964 a Monte Carlo. .9937 .9937 .9945 206 Table 5.13. Comparison between the smoothed Monte Carlo percentiles and those computed from the model for the exponential P-P probability plot correlation coefficient Significance levels 0.001 ®M.C. model n 0.010 M.C. model 0.050 0.1000 0.3000 M.C. model M.C. model M.C. 
model 5 .58 .54 .68 .65 .78 .77 .82 10 .72 .73 i8l .81 .87 .87 .89 .80 .89 20 .85 .86 .90 .90 .93 .93 .94 .94 50 .941 .940 .959 .972 .971 .977 100 .970 .969 .979 .986 150 .980 .979 .959 .979 .986 .986 .990 .986 .990 .988 .992 .977 .988 .992 200 .9845 .9845 .9899 .9896 .9928 .9927 .9941 .9941 .9961 300 i9930 .9931 .9951 .9951 .9960 .9960 .9974 .9974 400 .9888 .9897 .9922 .V922 .9948 .9948 .9964 .9964 .9970 .9970 .9981 500 .9938 .9938 .9958 .9958 600 .9949 .9948 700 .9956 .9956 ' .9970 .9970 800 .9961 .9961 900 1000 .89 .93 .96 .97 .985 .985 .90 .94 .992 .992 .995 .995 .9961 .9980 .9984 .9987 .9987 .9989 .9989 .9971 .9971 .9976 .9976 .9979 .9979 .9980 .9980 .9982 .9982 .9983 .9983 .9985 .9974 .9974 .9982 .9982 .9985 .9985 .9990 .9965 .9965 .9977 .9977 .9984 .9984 .9987 .9987 .9991 .9959 .9969 .9979 .9979 .9985 .9985 .9988 .9988 .9992 .9992 a Monte Carlo. .9965 .9965 .9990 .9991 207 Table 5.1Comparison between the smoothed Monte Carlo percentiles and those computed from the model for the exponential Q-Q probability plot correlation coefficient r^ Significance levels 0.010 0.001 ®M.C. model n M.C. model 0.050 M.C. model 0.1000 0.3000 M.C. model M.C. model 5 .49 .60 .65 .70 .76 .80 .81 .84 .88 10 .65 .62 .72 .72 .81 .82 .85 .86 .91 .90 .92 20 .66 .65 .76 .76 .85 .85 .89 .88 .94 .93 50 .730 .728 .828 .825 .899 .897 .927 .925 .961 100 .802 .799 .876 .879 .931 .934 .951 .953 150 .840 .843 .901 .902 .944 .945 .962 .961 .962 .975 .981 200 .8680 .8706 .9187 .9186 .9540 .9544 300 .9048 .9027 .9436 .9411 .9664 400 .9287 .9265 .9541 .9540 .9735 500 .9378 .9409 .9609 .9622 .9776 .9773 600 .9490 .9506 .9676 .9680 .9805 .9805 700 .9566 .9576 800 .9622 900 .9672 1000 .9710 .9628 .9669 .9702 a Monte Carlo. 
.9722 .9779 .9780 .9803 .9801 .981 .9677 .9839 .9841 .9665 .9758 .9758 .9878 .9844 .9729 .9808 .9803 .9878 .9879 .9722 .9828 i9828 .9754 .9754 .976 .9672 .9835 .9834 .9916 .9916 .9858 .9857 .9928 .9927 .9874 .9874 .9937 .9936 .9886 .9887 .9942 .9862 .9862 .9898 .9898 .9948 .9873 .9874 .9907 .9907 .9952 .9848 .9847 .9943 .9948 .9952 208 Table 5.15. Comparison between the smoothed Monte Carlo percentiles and those computed from the model for the exponential (unknown scale parameter) P-P probability plot correlation coefficient Significance levels n 0.001 0.010 0.050 0.1000 0.3000 ^M.C. model M.C. model M.C. model M.C. model M.C. model 5 .58 .34 .68 .55 .78 .71 .82 .80 .89 .90 10 .72 .68 .81 .81 .87 .85 .89 i89 .93 .94 20 .86 .85 .90 .90 .93 .93 .94 .94 .96 .95 50 .942 .939 .971 .971 .977 .977 100 .970 .970 1 50 .979 .979 .959 .959 .979 .979 .986 .986 .986 .990 .986 .990 .988 .992 .988 .992 .985 .985 .992 .992 .995 .995 200 .9844 .9849 .9900 .9896 .9928 .9928 .9941 .9941 .9961 .9951 300 .9889 .9899 .9930 .9931 .9952 .9952 .9960 .9960 .9974 .9974 400 .9922 .9925 .9948 .9948 .9964 .9970 .9970 .9981 .9980 500 .9940 .9940 .9958 .9976 .9975 500 .9950 .9950 .9965 .9965 700 .9957 .9957 .9970 .9970 .9971 .9971 .9976 :9976 .9979 .9980 .9985 .9984 .9987 .9987 .9989 .9989 800 .9962 .9974 .9974 .9982 .9985 .9985 900 .9967 .9962 .9966 .9977 .9977 .9984 .9970 .9970 .9979 .9979 1000 a Monte Carlo, .9958 .9964 .9982 .9984 .9986 .9986 .9980 .9980 .9983 .9983 .9990 .9990 .9987 .9987 .9991 .9991 .9988 .9988 .9992 .9992 209 VI. SUMMARY AND RECOMMENDATIONS The interesting problem of assessing the fit of probability models to data is investigated in this dissertation. A new statistic which is based on the Pearson correlation coefficient of points on a P-P probability plot is presented. The statistic measures the linearity of a P-P (percent versus percent) probability plot. 
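As a rough illustration of how such a statistic can be computed, the following Python sketch takes the squared Pearson correlation between the ordered values z_i = F((x_(i) - a)/b) and the plotting positions p_i = i/(n+1). The function names, the use of the sample mean and standard deviation as location and scale estimates, and the small sample are assumptions made for illustration only; the estimators actually used for the k^2 statistic are those defined in Chapter II.

```python
from statistics import NormalDist, fmean, stdev


def pp_correlation_squared(z):
    """Squared Pearson correlation between sorted z and p_i = i/(n+1)."""
    z = sorted(z)
    n = len(z)
    p = [i / (n + 1) for i in range(1, n + 1)]
    z_bar, p_bar = fmean(z), fmean(p)
    s_zp = sum((zi - z_bar) * (pi - p_bar) for zi, pi in zip(z, p))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_pp = sum((pi - p_bar) ** 2 for pi in p)
    return s_zp ** 2 / (s_zz * s_pp)


def normal_pp_statistic(x):
    """Normal P-P version; mean/stdev as estimates is an assumption."""
    fitted = NormalDist(fmean(x), stdev(x))
    return pp_correlation_squared([fitted.cdf(xi) for xi in x])


# Hypothetical sample, for illustration only.
sample = [0.26, -0.61, 0.16, -0.97, 0.05, -1.19, 1.18, 2.05, -0.40, 0.61]
k2 = normal_pp_statistic(sample)
```

Values of `k2` near 1 correspond to a nearly linear P-P plot; the value would then be compared with a tabulated or approximated percentile for the same sample size.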
A small value of the k^2 statistic indicates nonlinearity of the P-P probability plot and suggests that the hypothesized probability model should be rejected.

Two random samples were generated from two different distributions and are listed in Tables 6.1 and 6.2. The identities of these two distributions will be revealed at the end of the discussion. These two random samples will be referred to as data sets 1 and 2. Figures 6.1 and 6.2 are the normal P-P probability plots of these two data sets. A probability plot provides an excellent and informative tool for assessing the goodness of fit of probability models to a data set. However, decisions based on the probability plot alone are subjective and can be difficult to make when the sample size is small or the alternative distribution is similar to the hypothesized distribution. The use of the k^2 statistic with the P-P probability plot reduces the subjectivity. For the example, the k^2 statistic is 0.955 for the probability plot in Figure 6.1 and 0.991 for the probability plot in Figure 6.2. The approximate 0.05 level percentile of the k^2 statistic can be computed using formula (5.5) with β = 0.77333 and α = 0.74189 from Table 5.2, with n = 50. The value of the 0.05 level percentile is 0.975. The normal probability model is rejected for data set 1 since 0.955 is less than 0.975. As for data set 2, there is no evidence that the normal probability model is not appropriate.

Table 6.1. Data set 1

 1.6953   0.4555   0.5573   1.3159   1.3785   1.7468   0.2049   0.2476   0.4204   2.4226
 0.3470   0.2273   0.6521   0.5347   0.8718   0.7238   0.3673   0.0819   1.2704   0.9893
 1.0351   2.2661   0.9978   0.8975   0.0143   0.2356   0.9973   0.3306   0.2262   0.4336
 0.8386   1.4142   1.2112   0.0011   0.4646   0.0578   0.4962   0.0390   0.6355   0.2429
 3.6785   2.4692   0.2154   0.3001   1.7663   1.8008   1.0928   0.4558   1.3620   1.2612

Table 6.2. Data set 2

 0.7707  -0.0158  -0.3762  -1.1868   0.6096  -0.1295   0.7417   0.6641   1.6197   0.0765
 0.2647  -0.6092   0.1647  -0.9696   0.0523  -1.1886  -1.5860   0.0865   1.1788   2.0548
-0.4029  -0.2187   1.1428  -1.4062   0.6113   0.5263   1.1379   2.1265  -0.7022   0.0054
 1.4275   0.8059  -2.0366   0.5006   1.4001  -1.4730   1.2081   0.1305   0.5624  -0.5945
 0.2526  -0.0430   0.9683   0.0635  -1.2999  -0.1397   1.3533   2.3415  -0.0308   0.5219

Figure 6.1. Normal P-P probability plot (horizontal axis: uniform probabilities; distribution curves shown: exponential, Laplace, Cauchy, Gumbel)

Figure 6.2. Normal P-P probability plot (horizontal axis: uniform probabilities; distribution curves shown: exponential, Laplace, Cauchy, Gumbel)

Suppose one is interested in fitting a probability model to data set 1. The diagonal line on the P-P probability plot and the k^2 statistic enable one to make a decision about whether a normal probability model is appropriate. The k^2 statistic provides little information concerning an alternative probability model for the data set when the normal probability model is deemed inappropriate. However, the shape or curvature of the probability plot provides valuable information concerning an appropriate alternative probability model. Figure 6.1 suggests that the underlying probability model is skewed since the plot does not pass through the point (0.5,0.5). A new qualitative technique based on the distribution curves on a P-P probability plot was developed in Chapter III. Several distribution curves are displayed on the normal P-P probability plot. The diagonal line is the "curve" corresponding to the normal distribution.
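The percentile used in the normal example above follows directly from formula (5.5). As a minimal sketch (the function name and argument names are illustrative choices, not part of the original work), the computation for the normal P-P case with n = 50 can be written as:

```python
def approx_percentile(n, alpha, beta):
    """Approximate percentile from formula (5.5): x_p = 1 - 1/(beta*n + alpha)."""
    return 1.0 - 1.0 / (beta * n + alpha)


# 0.05 level, normal P-P plot: beta = 0.77333 and alpha = 0.74189 (Table 5.2).
critical_value = approx_percentile(50, alpha=0.74189, beta=0.77333)

# The hypothesized model is rejected when the observed statistic
# falls below the percentile.
reject_data_set_1 = 0.955 < critical_value  # statistic for data set 1
reject_data_set_2 = 0.991 < critical_value  # statistic for data set 2

print(round(critical_value, 3), reject_data_set_1, reject_data_set_2)
# prints: 0.975 True False
```

The same function applies to any of the tabulated (α, β) pairs, so a different hypothesized model or significance level only changes the two coefficients passed in.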
Based on the distribution curves on the normal P-P plot in Figure 6.1, an exponential probability model seems to provide a good fit to data set 1 since the points fall roughly along the exponential distribution curve. An exponential P-P probability plot can then be constructed for further examination of the data set. The exponential P-P probability plot for data set 1 is displayed as Figure 6.3. The plot is roughly a straight line, suggesting that the exponential probability model is appropriate. The k^2 statistic for the exponential probability plot is 0.990, and the 0.05 significance level percentile computed using formula (5.5), with β = 0.68370 and α = 0.87806 from Table 5.6, is 0.971. Since 0.990 is greater than 0.971, the exponential probability model is not rejected. Data set 1 was in fact generated from the exponential distribution using RANEXP(26719^7) and data set 2 was generated from the normal distribution using RANNOR(9113783) (SAS Inc., 1982).

Figure 6.3. Exponential P-P probability plot (horizontal axis: uniform probabilities; distribution curves shown: normal, Laplace, Cauchy, Gumbel)

A statistic, r^2, based on the Pearson correlation coefficient of points on a Q-Q (quantile versus quantile) probability plot was also developed. The r^2 statistic is the Shapiro-Francia statistic using the Weibull plotting position i/(n+1). The power study in Chapter IV indicated that the k^2 and the r^2 statistics have good power for detecting a wide range of alternative distributions. The r^2 statistic is very powerful in detecting alternative distributions with long or heavy tails, and the k^2 statistic is more sensitive to deviations occurring in the central region of the hypothesized distribution.

A computer program can be easily written to supply a percentile at a particular significance level given the sample size, using formulas developed in Chapter V. For those significance levels not listed in Table 5.2, linear interpolation can be used to compute the percentile. Also, a computer program can be written to supply an approximate p-value given a k^2 or r^2 value and the sample size. The intuitively easy concept of the Pearson correlation coefficient for measuring the linearity of a probability plot, the good power of the k^2 and r^2 statistics, and the easy computer implementation of the k^2 and r^2 statistics for sample sizes 5 to 1000 make the k^2 and r^2 statistics very valuable tools for assessing goodness of fit. The joint use of the P-P and Q-Q probability plots and the k^2 and r^2 statistics is a very powerful combination for determining probability models for data.

Table 6.3 contains the 0.05 level percentiles of the k^2 statistic for the normal, Gumbel and exponential distributions. One attractive and important feature of the k^2 statistic is that the percentiles are similar for different hypothesized probability models. The percentiles of the k^2 statistic for the normal probability model can thus be used for testing any other hypothesized location-scale probability model. This attractive feature is not possessed by the r^2 statistic. Table 6.4 contains the 0.05 level percentiles of the r^2 statistic for the normal, Gumbel and exponential distributions.

Table 6.3. 0.05 level percentiles of the k^2 statistic at selected sample sizes

Sample size    normal    Gumbel    exponential
  10           0.88      0.90      0.87
  50           0.975     0.976     0.971
 100           0.987     0.988     0.986
 300           0.9957    0.9958    0.9951
 500           0.9974    0.9975    0.9971
1000           0.9987    0.9987    0.9985

Table 6.4. 0.05 level percentiles of the r^2 statistic at selected sample sizes

Sample size    normal    Gumbel    exponential
  10           0.84      0.83      0.82
  50           0.950     0.918     0.897
 100           0.973     0.950     0.934
 300           0.9899    0.9758    0.9665
 500           0.9937    0.9834    0.9776
1000           0.9967    0.9907    0.9874

The Pearson chi-square and likelihood ratio statistics can be regarded as the most well known among all the goodness-of-fit statistics; however, the extensive power study in Chapter IV indicated that this class of statistics is generally not as powerful as the other statistics studied in this dissertation. Slight modifications of the Pearson chi-square and likelihood ratio statistics, like the Rao-Robson statistics (Rao and Robson, 1974) or the power divergence statistics (Cressie and Read, 1984), have power similar to that of the Pearson chi-square or the likelihood ratio statistic. In fact, the Monte Carlo power comparison in Rao and Robson (1974) showed that the improvement of the Rao-Robson statistics over the Pearson chi-square statistics is quite small. In addition to the relatively weak performance of the Pearson chi-square and likelihood ratio statistics, the problem of selecting the best choice of expected cell counts makes the application of the Pearson chi-square and likelihood ratio statistics less attractive. Based on the extensive Monte Carlo power comparison in Chapter IV, the following rule for the best choice of expected cell counts is recommended: expected cell counts of 3, 5 and 8 for sample sizes 20, 50 and 100 respectively.

Offsetting the shortcomings of the Pearson chi-square and the likelihood ratio statistics are certain attractive features. The Monte Carlo percentiles of the Pearson chi-square and likelihood ratio statistics are quite stable across null hypotheses. The Monte Carlo percentiles are presented in Tables 6.5 - 6.7.

Table 6.5.
Percentiles of the Pearson chi-square and likelihood ratio statistics used in the empirical power comparison for the testing of departures from the normal distribution Significance levels StatisticsX n a = 0.1 20 50 x'/. 48.0 XI X3 X5 X10 X7/X13/X17 24.0 8.0 3.6 G'/z G1 G3 G5 G10 G7/G13/G17 Table 6.5. 2.5 45.4 27.0 9.47 3.94 2.36 a = 0.05 100 114.0 60.0 21.4 1 2.4 5.4 3.9 108.5 65.5 23.8 13.1 5.61 224.0 114.0 24.8 12.4 6.92 213.2 125.7 44;3 26.5 12.7 3.98 6.98 39.9 20 50 a = 0.01 100 52.0 122.0 232.0 28.0 64.0 120.0 9.4 24.1 43.9 5.2 14.4 27.6 7.0 14.5 3.1 5.2 8.48 46.5 111.4 217.7 29.1 69.0 131 .4 11.1 26.6 48.2 5.87 15.4 29.6 7.12 15.0 3.53 5.31 8.56 20 50 100 50.0 134.0 248.0 34.0 72.0 132.0 13.6 28.9 51.1 7.5 18.8 33.6 9.8 19.0 5.2 7.75 12.1 50.9 117.0 225.3 33.5 74.7 139.7 15.4 31 .8 55.2 7.75 20.5 35.5 10.5 19.5 5.99 7.78 12.1 Percentiles of the chi-square and the likelihood ratio statistics used in the empirical power comparison for the testing of departures from the Gumbel distribution Significance levels StatisticsX n X'/z XI X3 X5 X10 X7/X13/X17 GV, G1 G3 G5 G10 G7/G13/G17 a = 0.1 20 50 a = 0.05 100 48.0 114.0 220,0 24.0 50.0 114.0 8.0 20.7 42.6 4.0 12.4 24.8 5.4 12.4 6.8 2.5 3.92 45.4 108.5 212.8 27.0 65.2 125.7 9.47 23.5 45.8 3.97 13.3 25.5 5.71 12.6 2.35 3.98 5.94 20 50 a = 0.01 100 52.0 122.0 228.0 28.0 64.0 120.0 9.4 23.4 45.5 5.2 14.4 27.6 6.8 14.2 3.1 5.2 8.36 46.5 111.3 216.5 29.1 58.1 131 .0 11.1 25.5 51.4 5.87 15.5 29.5 7.02 14.5 3.53 5.33 8.55 20 50 100 60.0 134.0 244.0 34.0 72.0 132.0 13.6 29.6 55.7 7.6 18.8 33.2 9.6 18.5 5.2 7.92 12.2 50.9 115.8 224.3 33.3' 74.0 139.4 15.2 32.3 60.8 7.95 20.1 35.5 10.3 19.0 6.21 8.21 12.4 219 Table 6.7. 
Percentiles of the Pearson chi-square and likelihood ratio statistics used in the empirical power comparison for the testing of departures from the exponential distribution a = 0.1 Significance levels Statistics\ n 20 X'/, 48.0 XI X3 X5 X10 X7/X13/X17 G'/, G1 G3 G5 G10 G7/G13/G17 26.0 9.4 4.8 3.1 45.4 28.1 10.6 5.13 3.53 50 114.0 60.0 22.1 13.2 6.4 4.9 109.2 66.2 24.6 14.2 6.51 5.05 a = 0.05 100 224.0 116.0 41 :2 25.6 13.2 8.0 213.9 127.6 45.6 26.5 13.7 8.12 20 50 52.0 122.0 28.0 64.0 10.8 24.8 6.0 15.2 7.6 4.3 6.16 47.1 111.9 29.8 69.7 12.3 27.5 6.49 16.5 8.08 4.05 6.22 a = 0.01 100 232.0 122.0 45.2 28.4 15.2 9.56 217.7 131 .6 49.4 30.0 15.8 9.87 20 50 60.0 1 34.0 34.0 74.0 14.3 30.2 8.4 20.0 10.6 9.2 6.7 50.9 117.9 33.8 75.4 16.1 33.2 10.6 21 .4 11.4 6.88 9.55 100 248.0 134.0 52.5 34.0 19.6 13.2 225.3 140.6 56.4 35.9 20.6 13.7 The closeness of these percentiles from the normal distribution to the exponential distribution suggests that the percentiles of the Pearson chi-square and likelihood ratio statistics are approximately distribution free. Tables 6.8 and 6.9 contain the empirical type I error levels for the Pearson chi-square statistic when the percentiles of the chi-square distribution with degrees of freedom equal to the number of cells less three were used. These empirical type I errors were computed from 5000 Monte Carlo samples, for the testing of departures from normality. 220 Table 5,8. 
Empirical Type I error of the Pearson chi-square statistics when the 0,01 level percentiles were used (based on 5000 Monte Carlo samples) Expected cell counts Number of cells 1/2 5 10 ,0095 15 1 3 5 .0084 .011 4 .0130 .0085 .0104 .0108 .0074 .0110 .0118 20 .0155 .0115 .0142 .0100 iJO .0155 ,0125 .0108 .0104 50 .0125 ,0124 .0104 .0128 80 .0118 .01 20 .0094 .0098 1 00 .0110 ,0112 .0104 .0084 1 20 .0178 .0145 .0104 .0112 1 40 .0104 ,0095 .0112 .0082 1 50 .0115 ,0112 ,0104 .0104 1 80 .0124 .0138 .01 25 .0084 200 .011 4 .0128 .0098 .0092 300 .0132 ,0118 ,0118 .0095 400 .0118 .0108 .0086 .0100 500 .0130 .0110 .0103 .0075 221 Table 6.9. Empirical Type I error of the Pearson chi-square statistics when the 0.05 level percentiles were used (based on 5000 Monte Carlo samples) Expected cell counts Number of cells 1/2 5 10 .0488 15 1 3 5 .1398 .0746 .0762 .0276 .0444 .0534 .0476 .0586 .0534 20 .0370 .0582 .0518 .0572 40 .0354 .0456 .0492 .0522 60 .0442 .0562 .0476 .0576 80 .0586 .0494 .0460 .0472 100 .0546 .0492 .0550 .0502 1 20 .0624 .0610 .0534 .0440 140 .0564 .0500 .0538 .0430 160 .0552 .0456 .0504 .0446 180 .061 6 .0506 .0484 .0470 200 .0544 .0538 .0488 ,0446 300 .041 4 .0490 .0522 .0457 400 .0510 .0476 .0430 .0450 500 .0570 .0514 .0510 .0535 222 Table 5.10. 
Empirical Type I error of the likelihood ratio statistics when the 0.05 level x^._g percentiles were used (based on 5000 Monte Carlo samples) Expected cell counts Number of cells 1/2 5 10 .0095 15 1 3 5 .1398 .1185 .0884 .0556 .0864 .0752 .0890 .1042 .0738 20 .0084 .1106 .0962 .0778 40 .0073 .1256 .1186 .081 4 50 .0036 .1488 .1360 .1042 80 .0028 .1824 .1494 .0950 100 .0030 .2376 .1580 .1012 120 .0046 .2650 .1910 .1040 140 .0032 .3030 .2080 .1048 160 .0028 .3344 .2086 .1126 180 .0032 .3640 .221 2 .1132 200 .0024 .4150 .2476 .1094 300 .0040 .5624 .3032 .1398 400 .0022 .5922 .3552 .1580 500 .0026 .7860 .4180 .1890 223 The asymptotic theoretical results provided by Watson (1957, 1958) and Roy (1956) suggest that the asymptotic distribution of the Pearson chi-square statistic for the random cell case, is stochastically larger than the chi-square distribution with k-3 degrees of freedom. However, the empirical Type I error levels achieved are close to the specified Type I error levels. Hence, the Pearson chi-square statistic together with the percentiles from the chi-square distribution can be used for the testing of the fit of general distributions to data. Table 6.10 contains the empirical Type I error levels for the likelihood ratio statistic when the percentiles of the chi-square distribution with degrees of freedom equal to the number of cells less three were used. The percentiles from the chi-square distribution do not provide a good approximation for the likelihood ratio statistic, as was also noted by Koehler and Larntz (1980). Table 6.11. 0.05 level percentiles of the Anderson-Darling statistic Sample sizes normal Gumbel exponential 20 0.822 0.737 1.946 50 0.971 0.740 1.567 100 0.786 0.727 1.468 The extensive power comparison showed that the class of statistics based on the empirical distribution function, especially the 224 Anderson-Darling, Watson and Cramer-von Mises statistics, have very good power in detecting a wide range of alternative distributions. 
The Kuiper and the Kolmogorov-Smirnov statistics have moderately good power. Also, the relative performance of the statistics based on the empirical distribution function, except for the statistic, is quite consistent from the normal null distribution to the exponential null distribution. Table 6.11 contains the 0.05 level percentiles of the A^2 statistic for normal, Gumbel and exponential distributions. The percentiles of the A^2 statistic vary from one distribution to the other. This is one drawback of the A^2 statistic, since new Monte Carlo percentiles must be generated for the testing of different hypothesized distributions. The percentiles of the other statistics based on the empirical distribution function also vary from one distribution to the other.

The statistics based on the moments can be very useful if used with care. It is recommended that a histogram be constructed when the skewness, kurtosis or rectangle test is used. This will avoid the problem of accepting the hypothesized probability model for a random sample from a distribution with shape different from that of the hypothesized distribution, but with skewness and kurtosis similar to those of the hypothesized distribution. The skewness test can be very weak for alternative distributions with skewness close to that of the hypothesized distribution. Similarly, the performance of the kurtosis test can be poor for alternative distributions with kurtosis close to that of the hypothesized distribution. In contrast, the rectangle test can detect both kinds of departures from the hypothesized probability model. In addition, the rectangle test usually has power comparable to the power of the better of the skewness and kurtosis tests. The tests based on moments performed better when the skewness and kurtosis of the hypothesized distribution are small.

Some ideas for improving the power of the test of fit based on the P-P probability plot were developed during the course of this study.
The shapes of the distribution curves on P-P probability plots suggest comparing the fit of a quadratic or cubic polynomial to the fit of a straight line through the points (0,0) and (1,1). This is similar to the suggestion made by LaBrecque (1977) for the normal Q-Q probability plot. However, the Monte Carlo study performed by LaBrecque indicates that the improvement in the power is small.

The k^2 statistic developed in Chapter II measures how closely the points lie along an unspecified straight line. However, for the P-P probability plot, the line should pass through the points (0,0) and (1,1), so that the following statistic

\[ k_0^2 = \frac{\left[ \sum (z_i - 0.5)(p_i - 0.5) \right]^2}{\sum (z_i - 0.5)^2 \, \sum (p_i - 0.5)^2} , \tag{6.1} \]

which measures how closely the points lie along a straight line through the points (0,0) and (1,1), may be more powerful than the k^2 statistic. The k_0^2 statistic is related to the k^2 statistic through

\[ k_0^2 = k^2 \Big/ \left[ 1 + \frac{n(\bar{z} - 0.5)^2}{\sum (z_i - \bar{z})^2} \right] . \tag{6.2} \]

Hence, the k_0^2 statistic will be close to k^2 when \bar{z} is close to 0.5, which occurs for symmetrical alternatives to symmetrical null hypotheses. Consequently, k_0^2 should be more powerful for detecting skewed alternatives to symmetrical null distributions. Since the distribution curves on the Gumbel or exponential P-P probability plots do not pass through (0.5,0.5), the k_0^2 statistic will be more powerful for these alternative distributions.

In light of the good performance of the Shapiro-Wilk statistic for the testing of normality, it seems reasonable to consider the corresponding statistic for the P-P probability plot. Let Z_i = F([X_(i) - α]/β) and t_i = [i/(n+1) - 0.5]. Assume the case when no parameters are estimated to obtain F(·); then Z_1, Z_2, ..., Z_n is an ordered random sample from the uniform (0,1) distribution. From David (1970, p. 28),

\[ E(Z_i) = \frac{i}{n+1} , \]

and

\[ \mathrm{Cov}(Z_i, Z_j) = \frac{1}{n+2} \cdot \frac{\min(i,j)}{n+1} \cdot \frac{n+1-\max(i,j)}{n+1} . \tag{6.3} \]

The entire covariance matrix can be written as

\[ V = \frac{1}{(n+1)^2 (n+2)}
\begin{pmatrix}
n & n-1 & n-2 & \cdots & 1 \\
n-1 & 2(n-1) & 2(n-2) & \cdots & 2 \\
n-2 & 2(n-2) & 3(n-2) & \cdots & 3 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 2 & 3 & \cdots & n
\end{pmatrix} . \tag{6.4} \]

Then from Graybill (1969, pp. 181, 182),

\[ V^{-1} = (n+1)(n+2)
\begin{pmatrix}
2 & -1 & 0 & \cdots & 0 & 0 \\
-1 & 2 & -1 & \cdots & 0 & 0 \\
0 & -1 & 2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 2 & -1 \\
0 & 0 & 0 & \cdots & -1 & 2
\end{pmatrix} . \tag{6.5} \]

The generalized least-squares estimators for the location and scale parameters, α and β, are

\[ (\hat{\alpha}, \hat{\beta})' = (T'V^{-1}T)^{-1} T'V^{-1}Z , \tag{6.6} \]

where

\[ T' = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ t_1 & t_2 & \cdots & t_n \end{pmatrix} = (1, t)' , \]

or

\[ \hat{\alpha} = (Z_1 + Z_n)/2 , \tag{6.7} \]

and

\[ \hat{\beta} = (Z_n - Z_1)/(p_n - p_1) , \tag{6.8} \]

where p_i = i/(n+1). When the null hypothesis is true, \hat{\beta}^2 is an estimator of 1, which is twelve times the variance of the U(0,1) random variable. Using \sum (Z_i - \bar{Z})^2/(n-1) as an estimator of the variance of the U(0,1) random variable, the ratio of these estimators yields the Shapiro-Wilk statistic for the uniform P-P probability plot, which is

\[ \frac{(n+1)^2 (Z_n - Z_1)^2}{12(n-1) \sum (Z_i - \bar{Z})^2} . \tag{6.9} \]

This statistic places heavy emphasis on the two extreme points and so can be expected to be weak if the P-P probability plot of the alternative distribution passes through the points (0,0) and (1,1).

Since the straight line in the P-P probability plot passes through the points (0,0) and (1,1), testing E(Z_i) = i/(n+1) is more appropriate than testing the fit of an arbitrary line. A statistic for testing this is

\[ (\mathbf{Z}_n - \mathbf{P}_n)' V^{-1} (\mathbf{Z}_n - \mathbf{P}_n) = (n+1)(n+2) \sum_{i=1}^{n+1} \left[ Z_i - Z_{i-1} - \frac{1}{n+1} \right]^2 , \tag{6.10} \]

where \mathbf{P}_n = (1/(n+1), 2/(n+1), ..., n/(n+1))', \mathbf{Z}_n = (Z_1, Z_2, ..., Z_n)', Z_0 = 0 and Z_{n+1} = 1. This statistic is based on the spacings of the elements of \mathbf{Z}_n and is worthy of further consideration. This statistic was previously suggested as a test of fit by Irwin in the discussion of Greenwood (1946) and by Kimball (1947), who computed the moments of this statistic.

VII. REFERENCES

Ahrens, J. H. and U. Dieter. 1974. Computer methods for sampling from gamma, beta, Poisson, and binomial distributions. Computing 12:223-246.
Anderson, T. W. and D. A. Darling. 1952. Asymptotic theory of certain "goodness of fit" criteria based on stochastic process. Annals of Mathematical Statistics 23:193-212.
Anderson, T. W. and D. A. Darling. 1954. A test of goodness of fit.
Journal of the American Statistical Association 49:755-769. Anscombe, F. J. and William J. Glynn. 1983. Distribution of the kurtosis statistic bg for normal samples. Biometrika 70(1):227-234. Barnett, V. 1975. Probability plotting methods and order statistics. Applied Statistics 24:95-108. Barnett, V. . 1975. Convenient probability plotting positions for the normal distribution. Applied Statistics 25:47-50. Barton, D. E. and C. L. Mallows. 1965. Some aspects of the random sequence. Annals of Mathematical Statistics 36:236-250. Beasley, J. D. and S. G. Springer. 1977. Algorithm AS111, The percentage points of the normal distribution. Applied Statistics 26:118-121 . Benard, A. and E. C. Bos-Levenbach. 1953. The plotting of observations on probability paper (in Dutch). Statistica Neerlandica 7:153-173. Biomedical Computer Programs P-Series. California Press. 1979. Berkeley: University of Birnbaum, Z. W. 1952. Numerical tabulation of the distribution of Kolmogorov's statistic for finite sample size. Journal of the American Statistical Association 47:425. Blom, G. 1958. Statistical estimates and transformed beta variables. New York: John Wiley. Bofinger, Eve. 1973. Goodness of fit test using sample quantiles. Journal of the Royal Statistical Society 835:277-284. Bowker, Albert H. and Gerald J. Lieberman. 1972. Engineering Statistics. Second edition, Englewood Cliffs, N.J.: Prentice-Hall. Bowman, K. 0. 1973. Power of the kurtosis statistic, b^, in tests of 230 departures from normality. Biometrika 60(3):523-528. Bowman, K. 0. and L. R. Shenton. 1973. Notes on the distribution of /b^ in sampling from Pearson distributions. Biometrika 60(1):155-167. Box, G. E. P. and M. E. Muller. 1958. A note on the generation of normal deviates. Annals of Mathematical Statistics 28:610-611. Brent, R. P. 1974. A Gaussian pseudo-random number generator (G5). Communications of the ACM 17:70^-706. California State Department. Works Bull., 5. 1923. Flow in California Streams. 
Public Chandra, M., N. D. Singpurwalla and M. A. Stephens. 1981. Kolmogorov Statistics for test of fit for the extreme-value and Weibull distributions. Journal of the American Statistical Association 75:729-731. Chase, G. R. 1972. Chi-square test when parameters are estimated independently of the sample. Journal of the American Statistical Association 67:609-611. Chernoff, H. and E. L. Lehmann. 1954. The use of maximum likelihood estimates in the test for goodness of fit. Annals of Mathematical Statistics 25:579-586. Chernoff, H. and G. J. Lieberraan. 1954. Use of normal probability paper. Journal of the American Statistical Association 49:778-785. Chibisov, D. M. 1971. Certain chi-square type tests for continuous distribution. Theory of Probability and Its Applications 16:1-22. Cochran, W. G. 1952. The test of goodness-of-fit. American Statistical Association 47:315-345. Journal of the Cohen, A. and H. B. Sackrowitz. 1975. Unbiasedness of the chi-square, likelihood ratio and other goodness of fit tests for the equal cell case. The Annals of Statistics 3:959-964. Cramer, H. 1928. On the composition of elementary errors. Second paper: Statistical Applications. Skand. Aktuartidskr. 11:141-180. Cramer, H. 1946. Mathematical Methods of Statistics. Princeton University Press. Princeton, N.J.: Cressie, N. and T. R. Read. 1984. Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society 46:440-463. Currie, L. D. 1980. The upper tail of the distribution of 231 W-exponential. Scandinavian Journal of Statistics. 7:1^7-1^9. D'Agostino, R. B. 1971. An Omnibus test of normality for moderate and large sample sizes. Biometrika 58:341-348. D'Agostino, Ralph B. 1973. Monte Carlo power comparison of the W and D tests of normality for n=100. Communications in Statistics 1:545-551. D'Agostino, Ralph and E. S. Pearson. 1973. Tests for departure from normality. Empirical results for the distributions of bg and /bj. Biometrika 50(3):6l3-622. D'Agostino, Ralph B. 
and Gary L. Tietjen. 1973. Approaches to the null distribution of $\sqrt{b_1}$. Biometrika 60(1):159-173.
Dahiya, R. C. and J. Gurland. 1972. Pearson chi-square test of fit with random intervals. Biometrika 59:147-153.
Dahiya, R. C. and J. Gurland. 1973. How many classes in the Pearson chi-square test? Journal of the American Statistical Association 68:707-712.
Daniel, Cuthbert. 1959. Use of half-normal plots in interpreting factorial two-level experiments. Technometrics 1:311-341.
Darling, D. A. 1955. The Cramer-Smirnov test in the parametric case. Annals of Mathematical Statistics 26:1-20.
Darling, D. A. 1957. The Kolmogorov-Smirnov, Cramer-von Mises tests. Annals of Mathematical Statistics 28:823-838.
David, H. A. 1970. Order statistics. New York: John Wiley & Sons, Inc.
Davidson, R. R. and W. E. Lever. 1970. The limiting distribution of the likelihood ratio statistic under a class of local alternatives. Sankhya 32:209-224.
Doksum, Kjell. 1975. Plotting with confidence: Graphical comparisons of two populations. Biometrika 63:421-434.
Donsker, M. D. 1952. Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems. Annals of Mathematical Statistics 23:277-281.
Doob, J. L. 1949. Heuristic approach to the Kolmogorov-Smirnov theorems. Annals of Mathematical Statistics 20:393-403.
Durbin, J. 1973a. Weak convergence of the sample distribution function when parameters are estimated. The Annals of Statistics 1:279-290.
Durbin, J. 1973b. Distribution theory of tests based on the sample distribution function. Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania.
Durbin, J. 1975. Kolmogorov-Smirnov tests when parameters are estimated with applications to test of exponentiality and tests on spacings. Biometrika 62:5-22.
Ferrell, E. B. 1958. Probability paper for plotting experimental data. Industrial Quality Control 15:1.
Filliben, James J. 1975. The probability plot correlation coefficient test for normality.
Technometrics 17(1):111-117.
Fisher, R. A. 1924. The conditions under which $\chi^2$ measures the discrepancy between observations and hypothesis. Journal of the Royal Statistical Society 87:442-450.
Fishman, G. S. 1976. Sampling from the gamma distribution on a computer. Communications of the ACM 19:407-409.
Fligner, M. A. and T. P. Hettmansperger. 1979. On the use of conditional asymptotic normality. Journal of the Royal Statistical Society B41:178-183.
Gan, F. F. 1985. Raw power comparison results and DSMCG. Unpublished manuscript. Department of Statistics, Iowa State University, Ames, Iowa.
Gerson, Marion. 1975. The techniques and uses of probability plotting. The Statistician 24:235-257.
Graybill, Franklin A. 1969. Introduction to matrices with applications in statistics. Belmont, California: Wadsworth.
Greenwood, M. 1945. The statistical study of infectious diseases. Journal of the Royal Statistical Society A109:85-110.
Gringorten, Irving I. 1963. A plotting rule for extreme probability paper. Journal of Geophysical Research 68:813-814.
Gumbel, E. J. 1943. On the reliability of the classical chi-square test. Annals of Mathematical Statistics 14:253-263.
Gumbel, E. J. 1964. Statistical theory of extreme values. Washington, D.C.: National Bureau of Standards.
Gurland, J. 1955. Distribution of definite and of indefinite quadratic forms. Annals of Mathematical Statistics 26:122-127. [Correction 33:813].
Gurland, J. 1956. Quadratic forms in normally distributed random variables. Sankhya 17:37-50.
Hacking, Ian. 1984. Trial by number. Science 84 5(9):59-70.
Hahn, Gerald J. and Samuel S. Shapiro. 1967. Statistical models in engineering. New York: John Wiley.
Halmos, P. R. 1950. Measure Theory. New York: Van Nostrand Company.
Harter, H. L. 1961. Expected values of normal order statistics. Biometrika 48:151-165.
Harter, H. Leon. 1980. Modified asymptotic formulas for critical values of the Kolmogorov test statistic. The American Statistician 34:110-111.
Hawkins, D. M. 1977. Comment on "A new statistic for testing suspected outliers". Communication in Statistics 6:435-438.
Hazen, A. 1914. Storage to be provided in the impounding reservoirs for municipal water supply. Transactions of the American Society of Civil Engineers 77:1547-1550.
Hoaglin, David C. and David F. Andrews. 1975. The reporting of computation-based results in statistics. The American Statistician 29(3):122-126.
Holst, L. 1972. Asymptotic normality and efficiency for certain goodness-of-fit tests. Biometrika 59:137-145.
Holst, L. 1976. On Multinomial Sums. Technical Summary Report No. 1629. Mathematics Research Center, University of Wisconsin-Madison.
Hutchinson, T. P. 1979. The validity of the chi-square test when expected frequencies are small: a list of recent research references. Communication in Statistics A8(4):327-335.
IBM Personal Computer Plotting System. 1984. Programmer's guide and plot system language bindings. International Business Machines Corporation, Boca Raton, Florida.
Ivchenko, I. V. and Medvedev. 1978. Separable statistics and hypothesis testing. The case of small samples. Theory of Probability and Its Applications 23:764-775.
Johnk, M. D. 1964. Erzeugung von betaverteilten und gammaverteilten Zufallszahlen. Metrika 8:5-15.
Johnson, Norman L. 1949. System of frequency curves generated by method of translation. Biometrika 36:149.
Johnson, R. A. and D. W. Wichern. 1982. Applied multivariate statistical analysis. Englewood Cliffs, New Jersey: Prentice-Hall.
Kac, M., Kiefer, J. and Wolfowitz, J. 1955. On tests of normality and other tests of goodness-of-fit based on distance methods. Annals of Mathematical Statistics 26:189-211.
Kambhampati, C. 1971. A chi-square statistic for goodness-of-fit tests. Thesis, Cornell University.
Kane, V. E. 1982. Standard and goodness-of-fit parameter estimation methods for the three-parameter lognormal distribution. Communications in Statistics 11:1935-1957.
Kempthorne, O. 1967.
The classical problem of inference-goodness of fit. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1:235-249.
Kendall, M. G. and A. Stuart. 1961. The advanced theory of statistics, 2. London: Griffin.
Kennedy, William J. Jr. and James E. Gentle. 1980. Statistical Computing. New York: Marcel Dekker, Inc.
Kimball, B. F. 1947. Some basic theorems for developing tests of fit for the case of the nonparametric probability distribution function, I. Annals of Mathematical Statistics 18:540-548.
Kimball, Bradford F. 1960. On the choice of plotting positions on probability paper. Journal of the American Statistical Association 55:546-560.
Kinderman, A. J., J. F. Monahan and J. G. Ramage. 1977. Computer methods for sampling from Student's t distribution. Mathematics of Computation 31:1009-1018.
King, James R. 1971. Probability charts for decision making. New York: Industrial Press Inc.
Koehler, Kenneth J. and Kinley Larntz. 1980. An empirical investigation of goodness-of-fit statistics for sparse multinomials. Journal of the American Statistical Association 75(370):336-344.
Kolmogorov, A. 1933. Sulla determinazione empirica di una legge di distribuzione. Giorn. Ist. Ital. Attuari. 4:83-91.
Kotz, S., N. L. Johnson and D. W. Boyd. 1967. Series representations of distributions of quadratic forms in normal variables. I. Central case. Annals of Mathematical Statistics 38:823-836.
Kuiper, N. H. 1959. Alternate proof of a theorem of Birnbaum and Pyke. Annals of Mathematical Statistics 30:251-252.
LaBrecque, J. 1977. Goodness-of-fit tests based on non-linearity in probability plots. Technometrics 19:293-306.
Lancaster, H. O. 1969. The chi-squared distribution. New York: John Wiley and Sons, Inc.
Larntz, K. 1978. Small-sample comparisons of exact levels for chi-square goodness-of-fit statistics. Journal of the American Statistical Association 73:253-263.
Larsen, Ralph I., Thomas C. Curran and William F. Hunt, Jr. 1980.
An air quality data analysis system for interrelating effects, standards, and needed source reductions: Part 6. Calculating concentration reductions needed to achieve the new national ozone standard. Journal of the Air Pollution Control Association 30:662-669.
Lawless, J. F. 1982. Statistical models and methods for lifetime data. New York: John Wiley and Sons.
Lilliefors, H. W. 1967. On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association 62:399-404.
Lilliefors, H. W. 1969. On the Kolmogorov-Smirnov test for the exponential distribution with mean unknown. Journal of the American Statistical Association 64:387-389.
Littell, Ramon C., James T. McClave and Walter W. Offen. 1979. Goodness-of-fit tests for the two parameter Weibull distribution. Communications in Statistics B8(3):257-269.
Looney, Stephen W. and Thomas R. Gulledge, Jr. 1985. Use of the correlation coefficient with normal probability plots. The American Statistician 39:75-79.
Mage, David T. 1980. An empirical model for the Kolmogorov-Smirnov statistic. Journal of Environmental Science and Health, Part A 15:139-147.
Mage, David T. 1982. An objective graphical method for testing normal distributional assumptions using probability plots. The American Statistician 36(2):116-120.
Mann, H. B. and A. Wald. 1942. On stochastic limit and order relationships. Annals of Mathematical Statistics 14:217-226.
Mann, N. R., E. M. Scheuer and K. W. Fertig. 1973. A new goodness of fit tests for the Weibull distribution or extreme value distribution with unknown parameters. Communications in Statistics 2:283-400.
Medvedev, Yu. I. 1977a. Separable statistics in a polynomial scheme, I. Theory of Probability and Its Applications 22(1):1-15.
Medvedev, Yu. I. 1977b. Separable statistics in a polynomial scheme, II. Theory of Probability and Its Applications 22(3):607-614.
Michael, John R. 1983. The stabilized probability plot. Biometrika 70:11-17.
Microsoft FORTRAN Reference Manual. 1983. Bellevue, WA: Microsoft Corporation.
Mood, Alexander M., Franklin A. Graybill and Duane C. Boes. 1974. Introduction to the theory of statistics. Third edition. New York: McGraw-Hill, Inc.
Moore, D. S. 1970. On the multivariate chi-square statistics with random cell boundaries. Purdue Statistics Department Mimeo Series No. 246.
Moore, D. S. 1971. A chi-square statistic with random cell boundaries. Annals of Mathematical Statistics 42:147-155.
Moore, D. S. and M. C. Spruill. 1975. Unified large sample theory of general chi-squared statistics for tests of fit. Annals of Statistics 3(3):599-616.
Moore, David S. 1977. Generalized Inverses, Wald's method, and the construction of chi-squared test of fit. Journal of the American Statistical Association 72(357):131-137.
Morris, C. 1975. Central limit theorems for multinomial sums. Annals of Statistics 3:165-188.
Murthy, V. K. and A. V. Gafarian. 1970. Limiting distributions of some variants of the chi-square statistics. Annals of Mathematical Statistics 41:188-194.
Pearson, E. S., R. B. D'Agostino and K. O. Bowman. 1977. Test of departure from normality; Comparison of powers. Biometrika 64(2):231-246.
Pearson, K. 1900. On the criterion that a given system of deviations from the probable case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine 50:157-175.
Quesenberry, C. P. and Craige Hales. 1980. Concentration Bands for Uniform plots. Journal of Statistical Computation and Simulation 11:41-53.
Rand Corporation. 1955. A million random digits with 100000 normal deviates. Rand Corporation, Santa Monica, California.
Rao, K. C. and D. S. Robson. 1974. A chi-square statistic for goodness-of-fit tests within the exponential family. Communications in Statistics 3(12):1139-1153.
Rayner, J. C. and D. J. Best. 1982.
The choice of class probabilities and number of classes for the simple goodness-of-fit test. Sankhya B44:28-38.
Read, Timothy R. C. 1984. Small-sample comparisons for the power divergence goodness-of-fit statistics. Journal of the American Statistical Association 79(388):929-935.
Roscoe, J. T. and J. A. Byars. 1971. An investigation of the restraints with respect to the sample size commonly imposed on the use of the chi-square statistics. Journal of the American Statistical Association 66:755-759.
Roy, A. R. 1956. On $\chi^2$ statistics with variable intervals. Technical Report No. 1. Department of Statistics, Stanford University, Stanford, California.
Royston, J. P. 1982a. An extension of Shapiro and Wilk's test for normality to large samples. Applied Statistics 31:115-124.
Royston, J. P. 1982b. Expected normal order statistics (exact and approximate). Algorithm AS177. Applied Statistics 31(2):161-165.
Royston, J. P. 1982c. The W test for normality. Applied Statistics 31:176-180.
Royston, J. P. 1983. Some techniques for assessing multivariate normality based on the Shapiro-Wilk W. Applied Statistics 2:121-133.
Ryan, Thomas A., Jr. and Brian L. Joiner. 1974. Normal probability plots and tests for normality. Technical Report. Statistics Department, Pennsylvania State University, University Park, Pennsylvania.
Sahler, W. 1968. A survey on distribution-free statistics based on distances between distribution functions. Metrika 13:149-169.
Sarkadi, K. 1975. The consistency of the Shapiro-Francia test. Biometrika 62(2):445.
SAS Institute Inc. 1982. SAS User's guide: Basics. 1982 Edition. Cary, NC: SAS Institute Inc.
Scheffé, H. 1947. A useful convergence theorem for probability distributions. Annals of Mathematical Statistics 18:434-438.
SCIENCE 84. 1984. American Association for the Advancement of Science.
Sethuraman, J. 1961. Some limit theorems for joint distributions. Sankhya 23(A):379-386.
Shapiro, S. S. 1964.
An analysis of variance test for normality (complete samples). Unpublished Ph.D. thesis, Rutgers - The State University.
Shapiro, S. S. and M. B. Wilk. 1965. An analysis of variance test for normality (complete samples). Biometrika 52:591-611.
Shapiro, S. S., M. B. Wilk and H. J. Chen. 1968. A comparative study of various tests for normality. Journal of the American Statistical Association 63:1342-1372.
Shapiro, S. S. and R. S. Francia. 1972. An approximate analysis of variance test for normality. Journal of the American Statistical Association 67(337):215-216.
Shapiro, S. S. and M. B. Wilk. 1972. An analysis of variance test for the exponential distribution (complete sample). Technometrics 14:355-370.
Shapiro, S. S. 1980. How to test normality and other distributional assumptions. ASQC Basic References in Quality Control: Statistical Techniques, Volume 3. American Society for Quality Control, Milwaukee, Wisconsin.
Shapiro, S. S. and C. W. Brain. 1982. Recommended distributional testing procedures. American Journal of Mathematical and Management Science 2:175-221.
Sinha, B. K. 1976. On unbiasedness of Mann-Wald-Gumbel $\chi^2$ test. Sankhya A38:124-130.
Slakter, M. J. 1966. Comparative validity of the chi-square and two modified chi-square goodness-of-fit tests for small but equal expected frequencies. Biometrika 53:619-622.
Smirnov, N. V. 1936. Sur la distribution de $\omega^2$ (Critérium de M. R. v. Mises). C. R. Acad. Sci. Paris 202:449-452.
Smirnov, N. V. 1937. Sur la distribution de $\omega^2$ (Critérium de M. R. v. Mises). (Russian/French summary). Mat. Sbornik (N.S.) 2(44):973-993.
Smirnov, N. V. 1939. Sur les écarts de la courbe de distribution empirique (Russian/French summary). Mat. Sbornik (N.S.) 6(18):3-26.
Smirnov, N. V. 1941. Approximate laws of distribution of random variables from empirical data. Uspekhi Mat. Nauk 10:179-206 (Russian).
Smith, Paul J., Donald S. Rae, Ronald W. Manderscheid and Sam Silbergeld. 1979.
Exact and approximate distributions of the chi-square statistic for equiprobability. Communications in Statistics B8(2):131-149.
Snedecor, George W. and William G. Cochran. 1980. Statistical Methods. Seventh edition. Ames, Iowa: Iowa State University Press.
Steck, G. P. 1957. Limit theorems for conditional distributions. University of California Publications in Statistics 2(12):237-284.
Stephens, M. A. 1970. Use of Kolmogorov-Smirnov, Cramer-Von Mises and related statistics without using extensive tables. Journal of the Royal Statistical Society B32(1):115-122.
Stephens, M. A. 1971. Asymptotic results for goodness-of-fit statistics when parameters must be estimated. Stanford Research Report No. 180, Department of Statistics, Stanford University, Stanford, California.
Stephens, M. A. 1974. EDF statistics for goodness-of-fit and some comparisons. Journal of the American Statistical Association 69:730-737.
Stephens, M. A. 1976. Asymptotic results for goodness-of-fit statistics with unknown parameters. The Annals of Statistics 4(2):357-369.
Stephens, M. A. 1977. Goodness of fit for the extreme value distribution. Biometrika 64(3):583-588.
Stirling, Douglas. 1982. Enhancements to aid interpretation of probability plots. Statistician 31:211-220.
Sukhatme, Shashikala. 1972. Fredholm determinant of a positive definite kernel of a special type and its applications. Annals of Mathematical Statistics 43:1914-1926.
Tate, M. W. and L. A. Hyer. 1973. Inaccuracy of the $\chi^2$ test of goodness of fit when expected frequencies are small. Journal of the American Statistical Association 68:836-841.
Tukey, J. W. 1962. The future of data analysis. Annals of Mathematical Statistics 33:1-67.
Von Mises, R. 1931. Wahrscheinlichkeitsrechnung. Wien, Leipzig.
Wald, Abraham. 1943. Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society 54:426-482.
Watson, G. S. 1957.
The $\chi^2$ goodness-of-fit test for normal distributions. Biometrika 44:336-348.
Watson, G. S. 1958. On chi-square goodness-of-fit tests for continuous distributions. Journal of the Royal Statistical Society, Series B 20:44-72.
Watson, G. S. 1959. Some recent results in goodness-of-fit tests. Biometrics 15:440-458.
Watson, G. S. 1961. Goodness-of-fit test on a circle. Biometrika 48:109-114.
Watson, G. S. 1962. Goodness-of-fit test on a circle II. Biometrika 49:57-63.
Weibull, W. 1939. The phenomenon of rupture in solids. Ingeniörs Vetenskaps Akademien Handlingar 153:17.
Weisberg, S. 1974. An empirical comparison of percentage points of W and W'. Biometrika 61:644-646.
Weisberg, S. and C. Bingham. 1975. An approximate analysis of variance test for non-normality suitable for machine calculation. Technometrics 17:133-134.
Wichmann, B. A. and I. D. Hill. 1982a. An efficient and portable pseudo-random number generator. Applied Statistics 31:188-190 [Correction 33:123].
Wichmann, B. A. and I. D. Hill. 1982b. A pseudo-random number generator. National Physical Laboratory Report DITC 6/82. National Physical Laboratory, Teddington, Middx, UK.
Wilk, M. B. and R. Gnanadesikan. 1968. Probability plotting methods for the analysis of data. Biometrika 55:1-17.
Witting, H. 1959. Über einen $\chi^2$-Test, dessen Klassen durch geordnete Stichprobenfunktionen festgelegt werden. Ark. Mat. 10:468-479.

VIII. ACKNOWLEDGEMENTS

I would like to express my deepest appreciation to Dr. Koehler for his guidance, assistance, and patience. The financial support given to me by the Department of Statistics and the Statistical Laboratory for my graduate program at Iowa State University will be remembered. I appreciate the computing facility provided by Dr. Kennedy and thank Bud for his thoughtfulness. Friends at Iowa State and the church at Lincoln Swing have made my stay here a very pleasant one. The sacrifice of my family cannot be described.

IX. APPENDIX A.
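PARAMETRIC FAMILIES OF DISTRIBUTIONS

Normal distribution $N(\mu,\sigma^2)$
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$, $-\infty < \mu < \infty$, $\sigma > 0$, $-\infty < x < \infty$.

Normal distribution with location contaminated $\mathrm{LoConN}(p,\mu)$
$f(x) = (1-p)\frac{1}{\sqrt{2\pi}} e^{-x^2/2} + p\,\frac{1}{\sqrt{2\pi}} e^{-(x-\mu)^2/2}$, $-\infty < \mu < \infty$, $-\infty < x < \infty$.

Normal distribution with scale contaminated $\mathrm{ScConN}(p,\sigma)$
$f(x) = (1-p)\frac{1}{\sqrt{2\pi}} e^{-x^2/2} + p\,\frac{1}{\sqrt{2\pi\sigma^2}} e^{-x^2/(2\sigma^2)}$, $\sigma > 0$, $-\infty < x < \infty$.

Truncated normal distribution $\mathrm{TruncN}(a,b)$
$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \Big/ [F(b) - F(a)]$, $-\infty < a < b < \infty$, $a < x < b$, where $F$ is the distribution function of the untruncated distribution.

Exponential distribution $\mathrm{Exponential}(\alpha,\beta)$
$f(x) = \frac{1}{\beta} e^{-(x-\alpha)/\beta}$, $-\infty < \alpha < \infty$, $\beta > 0$, $x > \alpha$.

Exponential distribution with location 0 $\mathrm{Exponential}(\beta)$
$f(x) = \frac{1}{\beta} e^{-x/\beta}$, $\beta > 0$, $x > 0$.

Exponential distribution with location contaminated $\mathrm{LoConE}(p,\alpha)$
$f(x) = (1-p)\, e^{-x} + p\, e^{-(x-\alpha)}$, $-\infty < \alpha < \infty$, $x > \alpha$.

Exponential distribution with scale contaminated $\mathrm{ScConE}(p,\beta)$
$f(x) = (1-p)\, e^{-x} + \frac{p}{\beta} e^{-x/\beta}$, $\beta > 0$, $x > 0$.

Truncated exponential distribution $\mathrm{TruncE}(0,b)$
$f(x) = e^{-x} / F(b)$, $b > 0$, $0 < x < b$, where $F$ is the distribution function of the untruncated distribution.

Logistic distribution $\mathrm{Logistic}(\alpha,\beta)$
$f(x) = \frac{\exp[-(x-\alpha)/\beta]}{\beta\,\{1 + \exp[-(x-\alpha)/\beta]\}^2}$, $-\infty < \alpha < \infty$, $\beta > 0$, $-\infty < x < \infty$.

Laplace distribution $\mathrm{Laplace}(\alpha,\beta)$
$f(x) = \exp[-|x-\alpha|/\beta]/(2\beta)$, $-\infty < \alpha < \infty$, $\beta > 0$, $-\infty < x < \infty$.

Asymmetric triangle distribution $\mathrm{Triangle\ II}(c)$
$f(x) = 2/c - 2x/c^2$, $c > 0$, $0 < x < c$.

Symmetric triangle distribution $\mathrm{Triangle\ I}(c)$
$f(x) = 1/c - x/c^2$ for $0 \le x < c$ and $f(x) = 1/c + x/c^2$ for $-c < x < 0$, $c > 0$.

Beta distribution $\mathrm{Beta}(\alpha,\beta)$
$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}$, $\alpha > 0$, $\beta > 0$, $0 < x < 1$.

Cauchy distribution $\mathrm{Cauchy}(\alpha,\beta)$
$f(x) = \frac{1}{\pi\beta\,[1 + \{(x-\alpha)/\beta\}^2]}$, $-\infty < \alpha < \infty$, $\beta > 0$, $-\infty < x < \infty$.

Gamma distribution $\mathrm{Gamma}(\lambda,\alpha,\beta)$
$f(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\,(x-\lambda)^{\alpha-1} \exp[-(x-\lambda)/\beta]$, $\alpha > 0$, $\beta > 0$, $x > \lambda$.

Chi-square distribution $\text{Chi-square}(k)$
$f(x) = \frac{1}{\Gamma(k/2)\,2^{k/2}}\, x^{k/2-1} \exp(-x/2)$, $k > 0$, $x > 0$.

Weibull distribution $\mathrm{Weibull}(c,\alpha,\beta)$
$f(x) = \frac{c}{\beta}\left[\frac{x-\alpha}{\beta}\right]^{c-1} \exp\left\{-\left[\frac{x-\alpha}{\beta}\right]^{c}\right\}$, $c > 0$, $-\infty < \alpha < \infty$, $\beta > 0$, $x > \alpha$.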

Standard Weibull distribution $\mathrm{Weibull}(c)$
$f(x) = c\,x^{c-1}\exp(-x^{c})$, $c > 0$, $x > 0$.

Gumbel or extreme value distribution $\mathrm{Gumbel}(\alpha,\beta)$
$f(x) = \frac{1}{\beta}\exp\left[-\frac{x-\alpha}{\beta}\right]\exp\left[-e^{-(x-\alpha)/\beta}\right]$, $-\infty < \alpha < \infty$, $\beta > 0$, $-\infty < x < \infty$.

Uniform distribution $\mathrm{Uniform}(\alpha,\beta)$
$f(x) = 1/(\beta-\alpha)$, $-\infty < \alpha < \beta < \infty$, $\alpha < x < \beta$.

Johnson bounded distribution $S_B(\alpha,\beta)$
The Johnson bounded random variable Y is related to the standard normal random variable X by the equation: $X = \alpha + \beta\log[Y/(1-Y)]$, $0 < Y < 1$.

Johnson unbounded distribution $S_U(\alpha,\beta)$
The Johnson unbounded random variable Y is related to the standard normal random variable X by the equation: $X = \alpha + \beta\sinh^{-1}Y$, $-\infty < Y < \infty$.

Lognormal distribution $\mathrm{Lognormal}(\lambda,\alpha,\beta)$
The lognormal random variable Y is related to the standard normal random variable X by the equation: $X = \alpha + \beta\log(Y-\lambda)$, $\lambda < Y < \infty$.

Symmetric Tukey distribution $\mathrm{Tukey}(\lambda)$
The symmetric Tukey random variable Y is related to the standard uniform random variable U (on [0,1]) by the equation: $Y = U^{\lambda} - (1-U)^{\lambda}$.

t distribution $t(\nu)$
$f_\nu(x) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\pi\nu}\,\Gamma(\nu/2)}\,[1 + x^2/\nu]^{-(\nu+1)/2}$, $\nu > 0$, $-\infty < x < \infty$.

IX. APPENDIX B. RANDOM VARIATES GENERATORS

This Appendix describes the methods used to generate random numbers from distributions other than the uniform distribution. A uniform (0,1) random variate is denoted by U. These methods are not necessarily the most efficient ones; an excellent description of efficient generators can be found in Kennedy and Gentle (1980).

Normal distribution $N(\mu,\sigma^2)$ (Box-Muller transformation, 1958)
A pair of independent standard normal variates $(X_1, X_2)$ is obtained by the transformation:
$X_1 = \cos(2\pi U_2)\sqrt{-2\ln(U_1)}$
$X_2 = \sin(2\pi U_2)\sqrt{-2\ln(U_1)}$
A $N(\mu,\sigma^2)$ variate is then $\mu + \sigma X_i$.

Normal distribution with location contaminated $\mathrm{LoConN}(p,\mu)$
1. Generate a normal random variate, X.
2. Generate a uniform random variate, U.
3. If U < p then deliver X + $\mu$, else deliver X.

Normal distribution with scale contaminated $\mathrm{ScConN}(p,\sigma)$
1. Generate a normal random variate, X.
2. Generate a uniform random variate, U.
3. If U < p then deliver $\sigma$X, else deliver X.

Truncated normal distribution $\mathrm{TruncN}(a,b)$
1. Generate a normal random variate, X.
2. If a < X < b then deliver X, else go to 1.

Exponential distribution $\mathrm{Exponential}(\alpha,\beta)$
$X = -\beta\log(U) + \alpha$

Exponential distribution with location contaminated $\mathrm{LoConE}(p,\alpha)$
1. Generate an Exponential(0,1) random variate, X.
2. Generate a uniform random variate, U.
3. If U < p then deliver X + $\alpha$, else deliver X.

Exponential distribution with scale contaminated $\mathrm{ScConE}(p,\beta)$
1. Generate an Exponential(0,1) random variate, X.
2. Generate a uniform random variate, U.
3. If U < p then deliver $\beta$X, else deliver X.

Truncated exponential distribution $\mathrm{TruncE}(0,b)$
1. Generate an Exponential(0,1) random variate, X.
2. If X < b then deliver X, else go to 1.

Logistic distribution $\mathrm{Logistic}(\alpha,\beta)$
$X = \alpha - \beta\ln(1/U - 1)$

Laplace distribution $\mathrm{Laplace}(\alpha,\beta)$
1. Generate a uniform random variate, U.
2. If U < 0.5 then deliver $\alpha + \beta\ln(2U)$, else deliver $\alpha - \beta\ln(2[1-U])$.

Asymmetric triangle distribution $\mathrm{Triangle\ II}(c)$
$X = c - c\sqrt{U}$

Symmetric triangle distribution $\mathrm{Triangle\ I}(c)$
1. Generate a Triangle II(c) random variate, Y.
2. Generate a uniform random variate, U.
3. If U < 0.5 then deliver X = -Y, else deliver X = Y.

Beta distribution $\mathrm{Beta}(\alpha,\beta)$ (Algorithm Johnk, 1964)
1. Generate $U_1$ and $U_2$.
2. Set $Y_1 = U_1^{1/\alpha}$, $Y_2 = U_2^{1/\beta}$ and $W = Y_1 + Y_2$.
3. If $W \le 1$ then deliver $X = Y_1/W$, else go to 1.

Cauchy distribution $\mathrm{Cauchy}(\alpha,\beta)$
$X = \alpha + \beta\tan[\pi(U - 0.5)]$

Gamma distribution $\mathrm{Gamma}(\lambda,\alpha,\beta)$
Gamma random variates were generated by one of two methods, depending on whether $\alpha$ is less than 1. For $\alpha$ less than 1, the method by Ahrens in Ahrens and Dieter (1974) was used. The method by Fishman (1976) was employed when $\alpha$ is greater than or equal to 1. Descriptions of these two generators can be found in Kennedy and Gentle (1980, pp. 213, 214).
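The closed-form generators above translate almost line for line into code. The sketch below is a minimal illustration in Python and is not the program used for the simulations in this study: the function names, the use of `random.Random` as the source of uniform variates, and the guard that rejects an exact U = 0 before taking a logarithm are all assumptions made for the example. The rejection step of Johnk's algorithm is written in its usual textbook form, with $Y_1 = U_1^{1/\alpha}$ and $Y_2 = U_2^{1/\beta}$.

```python
import math
import random


def open_uniform(rng):
    """Uniform variate on the open interval (0, 1); an exact 0 is rejected."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def box_muller(rng):
    """Pair of independent N(0,1) variates from two uniforms."""
    u1, u2 = open_uniform(rng), rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)


def exponential(rng, alpha, beta):
    """Exponential(alpha, beta): X = -beta*log(U) + alpha."""
    return -beta * math.log(open_uniform(rng)) + alpha


def logistic(rng, alpha, beta):
    """Logistic(alpha, beta): X = alpha - beta*ln(1/U - 1)."""
    return alpha - beta * math.log(1.0 / open_uniform(rng) - 1.0)


def cauchy(rng, alpha, beta):
    """Cauchy(alpha, beta): X = alpha + beta*tan(pi*(U - 0.5))."""
    return alpha + beta * math.tan(math.pi * (rng.random() - 0.5))


def beta_johnk(rng, a, b):
    """Beta(a, b) by Johnk's rejection method (practical for small a and b)."""
    while True:
        y1 = rng.random() ** (1.0 / a)
        y2 = rng.random() ** (1.0 / b)
        w = y1 + y2
        if 0.0 < w <= 1.0:
            return y1 / w
```

A seeded generator, for example `rng = random.Random(1)`, makes a run reproducible; `exponential(rng, 0.0, 1.0)` then plays the role of the Exponential(0,1) variate used in the contaminated and truncated exponential generators.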

Weibull distribution $\mathrm{Weibull}(c,\alpha,\beta)$
$X = \alpha + \beta(-\ln U)^{1/c}$

Gumbel or extreme value distribution $\mathrm{Gumbel}(\alpha,\beta)$
$X = \alpha - \beta\ln(-\ln U)$

Johnson bounded distribution $S_B(\alpha,\beta)$
1. Generate a N(0,1) random variate, X.
2. Solve $X = \alpha + \beta\log[Y/(1-Y)]$ for Y and deliver $Y = 1/\{1 + \exp[-(X-\alpha)/\beta]\}$.

Johnson unbounded distribution $S_U(\alpha,\beta)$
1. Generate a N(0,1) random variate, X.
2. Solve $X = \alpha + \beta\sinh^{-1}Y$ for Y and deliver $Y = \sinh[(X-\alpha)/\beta]$.

Lognormal distribution $\mathrm{Lognormal}(\lambda,\alpha,\beta)$
1. Generate a N(0,1) random variate, X.
2. Solve $X = \alpha + \beta\log(Y-\lambda)$ for Y and deliver $Y = \lambda + \exp[(X-\alpha)/\beta]$.

Symmetric Tukey distribution $\mathrm{Tukey}(\lambda)$
$Y = U^{\lambda} - (1-U)^{\lambda}$

t distribution $t(\nu)$ (Kinderman, Monahan and Ramage, 1977)
Algorithm TAR:
1. Generate $U_1$. If $U_1 < 0.5$, go to 2. Otherwise set $X = 4U_1 - 3$, generate $U_2$ and go to 3.
2. Set $X = 0.25/(U_1 - 0.25)$. Generate $U_3$ and set $U_2 = U_3/X^2$.
3. If $U_2 < 1 - |X|/2$, deliver X. If $U_2 < (1 + X^2/\nu)^{-(\nu+1)/2}$, deliver X, else go to 1.

IX. APPENDIX C. COMPUTER PROGRAMS

SAS program for the generation of normal P-P and Q-Q probability plots

//GAN JOB 13542,SASPPQQ
//SI EXEC SAS
//SYSIN DD *
* INPUT THE OBSERVATIONS X1,X2,...,XN INTO DATAX ;
DATA DATAX;
  INPUT XI;
  CARDS;
129
104
124
146
83
134
161
123
107
119
113
97
;
* SORT THE OBSERVATIONS X1,X2,...,XN ;
PROC SORT; BY XI;
* COMPUTE THE MEAN AND STANDARD DEVIATION OF X1,X2,...,XN
  STORE MEAN AND STANDARD DEVIATION IN DATAMLE ;
PROC MEANS; VAR XI;
  OUTPUT OUT=DATAMLE MEAN=XMEAN STD=XSTD N=NUM;
* MERGE THE 2 DATA SETS ;
DATA DATACOMB; MERGE DATAX DATAMLE;
* COMPUTE:
    XTRANS  = STANDARDIZED XI
    PI      = I/(N+1) I.E.
              THE PLOTTING POSITIONS
    NORPERC = INVERSE NORMAL CDF OF PI
    NORPROB = NORMAL PROBABILITY OF XTRANS ;
DATA DATAPLOT; SET DATACOMB;
  IF _N_=1 THEN ALPHA=XMEAN;
  IF _N_=1 THEN BETA=XSTD;
  IF _N_=1 THEN N=NUM;
  XTRANS=(XI-ALPHA)/BETA;
  PI=_N_/(N+1);
  NORPERC=PROBIT(PI);
  NORPROB=PROBNORM(XTRANS);
  DROP XMEAN XSTD NUM ALPHA BETA N;
  RETAIN ALPHA BETA N;
* PRINT THE OBSERVATIONS AND PLOTTING POSITIONS ;
PROC PRINT;
* CONSTRUCT P-P PLOT ;
PROC PLOT;
  PLOT NORPROB*PI='*' /
       VAXIS = 0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1
       HAXIS = 0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1
       VSPACE = 4 HSPACE = 5 VPOS = 44 HPOS = 55 HZERO VZERO;
  TITLE NORMAL P-P PROBABILITY PLOT;
* CONSTRUCT Q-Q PLOT
  NOTE: MODIFY VAXIS AND HPOS VALUES FOR OTHER Q-Q PLOTS ;
PROC PLOT;
  PLOT XI*NORPERC='*' /
       VAXIS = 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5
       HAXIS = -3 -2.4 -1.8 -1.2 -.6 0 .6 1.2 1.8 2.4 3
       VSPACE = 4 HSPACE = 5 VPOS = 44 HPOS = 55;
  TITLE NORMAL Q-Q PROBABILITY PLOT;
//

DISSPLA program for the generation of normal P-P and Q-Q probability plots

//GAN JOB 13542,DISPPQQ NORMAL P-P PROBABILITY PLOT
/*OUTPUT P001 FORMS=3001,COPIES=1
//SI EXEC FORTVD,REGION.GO=512K,TIME=1
//FORT.SYSIN DD *
C
C     XI     = OBSERVATIONS
C     PI     = PLOTTING POSITIONS
C     XTRANS = STANDARDIZED OBSERVATIONS
C     NORPER = INVERSE NORMAL CDF OF PLOTTING POSITIONS
C     NORPRO = NORMAL PROBABILITY OF STANDARDIZED OBSERVATIONS
C     NUMOBS = N, NUMBER OF OBSERVATIONS
C
      REAL XI,PI,XTRANS,NORPER,NORPRO
      DIMENSION XI(1000),PI(1000),XTRANS(1000),NORPER(1000)
      DIMENSION NORPRO(1000)
C
C     INPUT THE OBSERVATIONS X1, X2, . . ., XN INTO XI()
C
      DO 100 I=1,1000
      READ (5,*,END=200) XI(I)
  100 CONTINUE
  200 NUMOBS=I-1
C
C     SORT THE OBSERVATIONS X1, X2, . . ., XN
C
      CALL SORT(NUMOBS,XI)
C
C     COMPUTE MEAN AND STANDARD DEVIATION OF X1, X2, . .
C     ., XN
C
      CALL MLE(NUMOBS,XI,XMEAN,XSTD)
C
C     COMPUTE PLOTTING POSITIONS [PI=I/(NUMOBS+1)]
C
      NOBSP1=NUMOBS+1
      DO 300 I=1,NUMOBS
      PI(I)=I/REAL(NOBSP1)
  300 CONTINUE
C
C     COMPUTE STANDARDIZED OBSERVATIONS AND PLOTTING POSITIONS
C
      DO 400 I=1,NUMOBS
      XTRANS(I)=(XI(I)-XMEAN)/XSTD
      CALL NPROB(NORPRO(I),XTRANS(I))
      CALL NINV(PI(I),NORPER(I))
  400 CONTINUE
C
C     PREPARE LABELS FOR AXES OF Q-Q PLOT.
C     X AXIS LABELS WILL DISPLAY 2 DECIMAL PLACES.
C
      XTOP=NORPER(NUMOBS)+0.1
      XTOP=REAL(NINT(XTOP*10.0))/10.0
      XBOT=NORPER(1)-0.1
      XBOT=REAL(NINT(XBOT*10.0))/10.0
      XINCR=(XTOP-XBOT)/10.0
      YTOP=XI(NUMOBS)
      YBOT=XI(1)
      YINCR=(YTOP-YBOT)/10.0
C
C     CONSTRUCT NORMAL P-P PLOT
C
      CALL ZETA(53,11,15)
      CALL PHYSOR(2.0,4.25)
      CALL HWROT('MOVIE')
      CALL AREA2D(5.0,5.0)
      CALL COMPLX
      CALL SETCLR('BLACK')
      CALL XNAME('Uniform probabilities$',100)
      CALL YNAME('Normal probabilities$',100)
      CALL GRAF(0.0,0.1,1.0,0.0,0.1,1.0)
      CALL RLVEC(0.0,0.0,1.0,1.0,0)
      CALL THKFRM(.01)
      CALL FRAME
      CALL MARKER(15)
      CALL CURVE(PI,NORPRO,NUMOBS,-1)
      CALL ENDGR(0)
      CALL PHYSOR(2.0,1.25)
      CALL AREA2D(5.0,8.0)
      CALL MESSAG('Figure 3.1. Normal P-P probability plot$',100,
     &0.0,1.84375)
      CALL ENDPL(1)
C
C     CONSTRUCT NORMAL Q-Q PLOT
C
      CALL PHYSOR(2.0,4.25)
      CALL AREA2D(5.0,5.0)
      CALL XNAME('Normal percentiles$',100)
      CALL YNAME('Observed percentiles$',100)
      CALL GRAF(XBOT,XINCR,XTOP,YBOT,YINCR,YTOP)
      CALL THKFRM(.01)
      CALL FRAME
      CALL SETCLR('BLACK')
      CALL MARKER(15)
      CALL CURVE(NORPER,XI,NUMOBS,-1)
      CALL ENDGR(0)
      CALL PHYSOR(2.0,1.25)
      CALL AREA2D(5.0,8.0)
      CALL MESSAG('Figure 3.2. Normal Q-Q probability plot$',100,
     &0.0,1.84375)
      CALL ENDPL(2)
      CALL DONEPL
      STOP
      END
C
C     SORT OF OBSERVATIONS IN NON-DESCENDING ORDER
C     SHELL SORT ALGORITHM USED
C     SOURCE: R. LOESER, COMMUNICATIONS OF THE ACM, VOL 17, NO 3, P.
C     143
C
      SUBROUTINE SORT(N,SOBS)
      IMPLICIT DOUBLE PRECISION (A-H)
      IMPLICIT INTEGER*4 (I-N)
      IMPLICIT REAL*4 (O-W)
      DIMENSION SOBS(N)
      I=1
  101 IF(I-N) 102,102,103
  102 I=I+I
      GOTO 101
  103 M=I-1
  104 M=M/2
      IF(M) 110,110,105
  105 K=N-M
      DO 109 J=1,K
      I=J+M
  106 I=I-M
      IF(I) 109,109,107
  107 L=I+M
      IF(SOBS(L)-SOBS(I)) 108,108,109
  108 S=SOBS(I)
      SOBS(I)=SOBS(L)
      SOBS(L)=S
      GOTO 106
  109 CONTINUE
      GOTO 104
  110 RETURN
      END
C
C     MAXIMUM LIKELIHOOD ESTIMATION OF MEAN AND STANDARD DEVIATION
C     FOR THE OBSERVATIONS
C
      SUBROUTINE MLE(NUMOBS,SOBS,SMEANX,STDX)
      IMPLICIT DOUBLE PRECISION (A-H)
      IMPLICIT INTEGER*4 (I-N)
      IMPLICIT REAL*4 (O-W)
      DIMENSION SOBS(NUMOBS)
      DXBAR=(DBLE(SOBS(1))+DBLE(SOBS(2)))/2.0D+0
      DT1=DBLE(SOBS(1))-DXBAR
      DT2=DBLE(SOBS(2))-DXBAR
      DVAR=DT1*DT1+DT2*DT2
      DNPAR=DFLOAT(NUMPAR)
      NOBSM2=NUMOBS-2
      NPARM1=NUMPAR-1
      DO 200 I=1,NOBSM2
      J=I+1
      K=I+2
      DI=DFLOAT(I)
      DJ=DFLOAT(J)
      DK=DFLOAT(K)
      DXMXB=DBLE(SOBS(K))-DXBAR
      DVAR=(DI*DVAR+DXMXB*DXMXB*DJ/DK)/DJ
      DXBAR=(DJ*DXBAR+DBLE(SOBS(K)))/DK
  200 CONTINUE
      SMEANX=SNGL(DXBAR)
      STDX=SNGL(DSQRT(DVAR))
      RETURN
      END
C
C     COMPUTE NORMAL PROBABILITIES
C     CODY ALGORITHM USED
C     SOURCE: KENNEDY & GENTLE, "STATISTICAL COMPUTING", 1980
C
      SUBROUTINE NPROB(SP,SXP)
      IMPLICIT DOUBLE PRECISION (A-H)
      IMPLICIT INTEGER*4 (I-N)
      IMPLICIT REAL*4 (O-W)
      DXP=DBLE(SXP)
      ICOR=0
      IF(DXP.GE.0) GOTO 200
      ICOR=1
      DXP=-DXP
  200 CONTINUE
      ARG=DXP/DSQRT(0.2D+1)
      IF(DXP.GE.0.45875) GOTO 300
      DP=(1.0D+0+ARG*D1(ARG))/2.0D+0
      GOTO 900
  300 IF(DXP.GE.4.0) GOTO 400
      DP=(2.0D+0-DEXP(-ARG*ARG)*D2(ARG))/2.0D+0
      GOTO 900
C     D3 TAKES ARG ITSELF AND FORMS THE POWERS ARG**(-2J) INTERNALLY
  400 DP=(0.2D+1-(DEXP(-ARG*ARG)/ARG)*(1.0D+0/DSQRT(3.141592653589793
     &D+0)+D3(ARG)/(ARG*ARG)))/2.0D+0
  900 CONTINUE
      IF(ICOR.NE.1) GOTO 950
      DP=1.0D+0-DP
      DXP=-DXP
  950 CONTINUE
      SP=SNGL(DP)
      RETURN
      END
C
      DOUBLE PRECISION FUNCTION D1(F)
C
      IMPLICIT DOUBLE PRECISION (A-H)
      IMPLICIT INTEGER*4 (I-N)
      IMPLICIT REAL*4 (O-W)
      DP0=2.4266795523053175D+2
      DP1=2.1979261618294152D+1
      DP2=6.9963834886191355D+0
      DP3=-3.5609843701815385D-2
      DQ0=2.1505887586986120D+2
      DQ1=9.1164905404514901D+1
      DQ2=1.5082797630407787D+1
      DQ3=1.0D+0
      ANUM=((DP3*F*F+DP2)*F*F+DP1)*F*F+DP0
      DEN=((DQ3*F*F+DQ2)*F*F+DQ1)*F*F+DQ0
      D1=ANUM/DEN
      RETURN
      END
C
      DOUBLE PRECISION FUNCTION D2(F)
C
      IMPLICIT DOUBLE PRECISION (A-H)
      IMPLICIT INTEGER*4 (I-N)
      IMPLICIT REAL*4 (O-W)
      DP0=3.004592610201616005D+2
      DP1=4.519189537118729422D+2
      DP2=3.393208167343436870D+2
      DP3=1.529892850469404039D+2
      B4=4.316222722205673530D+1
      B5=7.211758250883093659D+0
      B6=5.641955174789739711D-1
      B7=-1.368648573827167067D-7
      DQ0=3.004592609569832933D+2
      DQ1=7.909509253278980272D+2
      DQ2=9.313540948506096211D+2
      DQ3=6.389802644656311665D+2
      G4=2.775854447439876434D+2
      G5=7.700015293522947295D+1
      G6=1.278272731962942351D+1
      G7=1.0D+0
      ANUM=((((((B7*F+B6)*F+B5)*F+B4)*F+DP3)*F+DP2)*F+DP1)*F+DP0
      DEN =((((((G7*F+G6)*F+G5)*F+G4)*F+DQ3)*F+DQ2)*F+DQ1)*F+DQ0
      D2=ANUM/DEN
      RETURN
      END
C
      DOUBLE PRECISION FUNCTION D3(F)
C
      IMPLICIT DOUBLE PRECISION (A-H)
      IMPLICIT INTEGER*4 (I-N)
      IMPLICIT REAL*4 (O-W)
      DP0=-2.99610707703542174D-3
      DP1=-4.94730910623250734D-2
      DP2=-2.26956593539686930D-1
      DP3=-2.78661308609647788D-1
      B4=-2.23192459734184686D-2
      DQ0=1.06209230528467918D-2
      DQ1=1.91308926107829841D-1
      DQ2=1.05167510706793207D+0
      DQ3=1.98733201817135256D+0
      G4=1.0D+0
      ANUM=DP0+DP1/(F*F)+DP2/(F*F*F*F)+DP3/(F*F*F*F*F*F)+B4/(F*F*F*F*
     &F*F*F*F)
      DEN =DQ0+DQ1/(F*F)+DQ2/(F*F*F*F)+DQ3/(F*F*F*F*F*F)+G4/(F*F*F*F*
     &F*F*F*F)
      D3=ANUM/DEN
      RETURN
      END
C
C     COMPUTE NORMAL PERCENTILES
C     ODEH & EVANS ALGORITHM USED
C     SOURCE: KENNEDY & GENTLE, "STATISTICAL COMPUTING", 1980
C
      SUBROUTINE NINV(S,SP)
      IMPLICIT DOUBLE PRECISION (A-H)
      IMPLICIT INTEGER*4 (I-N)
      IMPLICIT REAL*4 (O-W)
      D=DBLE(S)
      DLIM=0.1D-18
      D0=-0.322232431088
      D1=-1.0
      D2=-0.342242088547
      D3=-0.0204231210245
      D4=-0.453642210148D-4
      C0=0.0993484626060
      C1=0.588581570495
      C2=0.531103462366
      C3=0.103537752850
      C4=0.38560700634D-2
      DP=0.0D+0
      IF(D.GT.0.5) THEN
         D=1-D
         ICOR=1
      ELSE
         ICOR=0
      ENDIF
      IF(D.LT.DLIM) GOTO 200
      IF(D.EQ.0.5) GOTO 200
      B=DSQRT(DLOG(1.0/(D*D)))
      DP=B+((((B*D4+D3)*B+D2)*B+D1)*B+D0)/((((B*C4+C3)*B+C2)*B+C1)*B+C0)
      IF(ICOR.EQ.1) THEN
         D=1-D
      ELSE
         DP=-DP
         CONTINUE
      ENDIF
  200 CONTINUE
      SP=SNGL(DP)
      RETURN
      END
//GO.FT15F001 DD SYSOUT=(P,,P001)
//GO.SYSIN DD *
129 104 124 145 83 134 151 123 107 119 113 97
/*