Download Goodness-of-fit statistics for location-scale distributions

Document related concepts
no text concepts found
Transcript
Retrospective Theses and Dissertations
1985
Goodness-of-fit statistics for location-scale
distributions
Fah Fatt Gan
Iowa State University
Follow this and additional works at: http://lib.dr.iastate.edu/rtd
Part of the Statistics and Probability Commons
Recommended Citation
Gan, Fah Fatt, "Goodness-of-fit statistics for location-scale distributions " (1985). Retrospective Theses and Dissertations. Paper 12065.
This Dissertation is brought to you for free and open access by Digital Repository @ Iowa State University. It has been accepted for inclusion in
Retrospective Theses and Dissertations by an authorized administrator of Digital Repository @ Iowa State University. For more information, please
contact digirep@iastate.edu.
INFORMATION TO USERS
This reproduction was made from a copy of a document sent to us for microfilming.
While the most advanced technology has been used to photograph and reproduce
this document, the quality of the reproduction is heavily dependent upon the
quality of the material submitted.
The following explanation of techniques is provided to help clarify markings or
notations which may appear on this reproduction.
1.The sign or "target" for pages apparently lacking from the document
photographed is "Missing Page(s)". If it was possible to obtain the missing
page(s) or section, they are spliced into the film along with adjacent pages. This
may have necessitated cutting througli an image and duplicating adjacent pages
to assure complete continuity.
2. When an image on the film is obliterated with a round black mark, it is an
indication of either blurred copy because of movement during exposure,
duphcate copy, or copyrighted materials that should not have been filmed. For
blurred pages, a good image of the page can be found in the adjacent frame. If
copyrighted materials were deleted, a target note will appear listing the pages in
the adjacent frame.
3. When a map, drawing or chart, etc., is part of the material being photographed,
a definite method of "sectioning" the material has been followed. It is
customary to begin filming at the upper left hand comer of a large sheet and to
continue from left to right in equal sections with small overlaps. If necessary,
sectioning is continued again—beginning below the first row and continuing on
until complete.
4. For illustrations that cannot be satisfactorily reproduced by xerographic
means, photographic prints can be purchased at additional cost and inserted
into your xerographic copy. These prints are available upon request from the
Dissertations Customer Services Department.
5. Some pages in any document may have indistinct print. In all cases the best
available copy has been filmed.
Universl^
MicrxSilms
International
300 N. Zeeb Road
Ann Arbor, Ml 48106
8524655
Gan, Fah Fatt
GOODNESS-OF-FIT STATISTICS FOR LOCATION-SCALE DISTRIBUTIONS
Iowa State University
University
Microfilms
Int©rnSti0n& 1
PH.D. 1985
300 N. Zeeb Road, Ann Arbor, Ml 48106
Copyright 1985
by
Gan, Fah Fatt
All Rights Reserved
PLEASE NOTE:
In all cases this material has been filmed in the best possible way from the available copy.
Problems encountered with this document have been identified here with a check mark -/ .
1.
Glossy photographs or pages
2.
Colored illustrations, paper or print
3.
Photographs with dark background
4.
Illustrations are poor copy
5.
Pages with black marks, not original copy
6.
Print shows through as there is text on both sides of page
7.
Indistinct, broken or small print on several pages
8.
Print exceeds margin requirements
9.
Tightly bound copy with print lost in spine
10.
Computer printout pages with indistinct print
11.
Page(s)
author.
lacking when material received, and not available from school or
12.
Page(s)
seem to be missing in numbering only as text follows.
13.
Two pages numbered
14.
Curling and wrinkled pages
15.
Dissertation contains pages with print at a slant, filmed as received
16.
Other
. Text follows.
University
Microfilms
International
Goodness-of-fit statistics
for location-scale distributions
by
Fah Fatt Gan
A Dissertation Submitted to the
Graduate Faculty in Partial Fulfillment of the
Requirements for the Degree of
DOCTOR OF PHILOSOPHY
Major:
Statistics
Approved:
Signature was redacted for privacy.
In Charge of Major Work
Signature was redacted for privacy.
For the Major Department
Signature was redacted for privacy.
For the Graduate College
Iowa State University
Ames, Iowa
1985
Copyright (C) Fah Fatt Gan, 1985.
All rights reserved.
ii
TABLE OF CONTENTS
Page
I.
II.
III.
IV.
V.
VI,
INTRODUCTION
1
GOODNESS-OF-FIT STATISTICS
6
A.
Correlation Statistics
6
B.
Chi-square and Likelihood Ratio Statistics
18
C.
Statistics Based on the Empirical Distribution
Function
48
D.
Statistics Based on Moments
57
PROBABILITY PLOTS AND DISTRIBUTION CURVES
55
EMPIRICAL POWER COMPARISON
110
A.
Methods of Computation
110
B.
Results of the Power Comparison
121
PERCENTILES OF THE r^ AND
STATISTICS
187
SUMMARY AND RECOMMENDATIONS
209
REFERENCES
229
ACKNOWLEDGEMENTS
241
IX,
APPENDIX A.
PARAMETRIC FAMILIES OF DISTRIBUTIONS
242
X.
APPENDIX B.
RANDOM VARIATES GENERATORS
246
XI.
APPENDIX C.
COMPUTER PROGRAMS
250
VII.
VIII.
1
I. INTRODUCTION
The function of statistics is to extract and explicate the
informational content of a set of data.
Fitting a probability model to
a data set is often a useful step in this endeavor.
For example, in
survival time analysis, a probability model enables one to make
statements about the probabilities that individuals will survive
specified time intervals.
Many statistical procedures are based on
certain probability model assumptions.
Assessing the fit of a proposed
model to a data set is a necessary preliminary measure, leading perhaps
to a transformation of the data or to an alternate statistical
procedure.
The importance of assessing the fit of a probability model was
highlighted by the inclusion of the Karl Pearson's development of the
chi-square test in the list of the twenty most significant discoveries
of the current century presented by SCIENCE 84 (1984).
The chi-square
test was quoted in SCIENCE 84 as "a tiny event by itself but it was the
signal for a sweeping transformation in the ways we interpret our
numerical world" also "Karl Pearson's chi-square test measured the fit
between theory and reality, ushering in a new sort of decision making".
The traditional limiting chi-square theory for the distribution of
the Pearson chi-square statistic keeps the number of cells fixed as the
sample size is increased.
A non-traditional large sample theory for the
chi-square statistic is examined in this dissertation, where the number
of cells is allowed to increase at a certain rate as the sample size
increases.
In the later case, the asymptotic distribution of the
2
chi-square statistic may not be a chi-square distribution.
In fact, for
the case of testing simple null hypotheses, Hoist (1972) and Morris
(1975) showed that under certain regularity conditions the
goodness-of-fit statistic has a large sample normal distribution.
The
accuracy of the large sample normal and chi-square approximations for
the chi-square and likelihood ratio statistics was investigated by
Koehler and Larntz (1980).
In this dissertation, attention is focused
on tests of composite null hypotheses.
For example, one might be
interested in determining if the observed data were sampled from a
normal distribution.
An assessment could be made by partitioning the
real line into a certain number of cells and comparing observed counts
in the cells to estimates of expected counts from the hypothesized
probability model.
Attention is restricted to the case where the cells
are of equal probability since Hoist (1972) showed that this
partitioning has a certain optimum power property.
Also, Gumbel (1943)
pointed out that different conclusions can be reached by using cells
with unequal probabilities.
In testing composite null hypotheses, the
unknown parameters must be estimated and then used to approximate a
partitioning with equal probabilities.
The asymptotic theory of the
chi-square or the likelihood ratio statistic under this non-traditional
setup for testing composite null hypotheses has not been established in
the literature, but some results are given in the next chapter.
The
question of how finely the interval should be partitioned for various
sample sizes will also be investigated.
There are many other methods available for assessing the goodness
3
of fit of probability models to a data set.
One interesting test
statistic is the Pearson correlation coefficient of points on a normal
Q-Q (quantile versus quantile) probability plot.
This statistic
provides a measure of the linearity of a normal Q-Q probability plot.
If the normal probability model provides a good fit to the data set, an
approximate straight line is obtained and the correlation coefficient
will be close to one, the normal probability model will not be rejected.
On the other hand, one will reject the null hypotheses of normality for
a small value of the correlation coefficient because this indicates the
non-linearity of the probability plot.
The Q-Q probability plot is very
popular among statisticians and engineers.
tools in statistical quality control.
It is one of the important
The popularity of the Q-Q
probability plot can be largely attributed to the linear invariance
property it possesses.
In general, the P-P (percent versus percent)
probability plot is not linear invariant.
However, if the observations
are standardized, the P-P probability plot can be shown to possess the
linear invariance property.
A new statistic based on the Pearson
correlation coefficient of points on a P-P probability plot is proposed
for assessing the goodness of fit of probability models to a data set.
The logic behind this statistic is similar to that of the Shapiro-Wilk
statistic.
It is a measure of linearity of the probability plot which
provides a measure of the goodness of fit of the normal probability
model to the data set.
Since the Q-Q probability plot places more
emphasis on the tails of the distribution than the P-P probability plot,
one would expect that a correlation coefficient based on a Q-Q
4
probability plot would be more likely to detect long or heavy tailed
departures from the hypothesized distribution.
The Pearson correlation
coefficient based on a P-P probability plot may be more sensitive to
discrepancies near the center of the hypothesized distribution.
A new qualitative method based on distribution curves on a P-P
probability plot is developed for assessing the alternatives to the
hypothesized probability model for a data set.
One advantage of this
technique is that it is not limited to location-scale distributions.
The relative power of these goodness of fit statistics to detect
various alternative distributions is of major interest.
An extensive
Monte Carlo power comparison was performed to assess the power of the
chi-square and likelihood ratio statistics, correlation coefficient
statistics, statistics based on the empirical distribution function, and
statistics based on sample moments.
The power comparison also provides
some information about how finely to partition the support of the
hypothesized distribution for the chi-square and likelihood ratio
statistics so as to achieve nearly optimum power.
The extensive power comparison is performed for the normal, Gumbel
and exponential distributions.
The exponential and Weibull
distributions are used frequently in modeling the survival time or
reliability of certain individuals or components.
An interesting
relationship between the Gumbel and the Weibull distributions is that a
logarithmic transformation of the Weibull random variable produces a
Gumbel random variable.
The distributions of the statistics based on the Pearson
5
correlation coefficients are mathematically difficult to tract for
finite samples.
Consequently, the empirical percentiles of these
statistics are simulated and smoothed.
Curves are fitted through
smoothed Monte Carlo percentiles to obtain formulas for the true
percentiles as functions of the sample size.
Based on the results of these extensive power comparisons, some
recommendations will be made concerning the use of these statistics for
assessing the fit of probability models to a data set.
6
II.
GOODNESS-OF-FIT STATISTICS
A.
Correlation Statistics
Tests of fit based on correlation coefficients are reviewed in this
section.
Particular attention has been given to the empirical methods
of generating the percentiles of the statistics.
This provides
information for deciding how to generate the empirical percentiles of
the statistics based on the Pearson correlation coefficient for points
on a P-P or a Q-Q probability plot.
Shapiro and Wilk (1965) devised a statistic to test the linearity
of the probability plot of the ordered observations against the expected
values of the order statistics from a standardized version of the
hypothesized distribution.
This statistic compares two estimates of the
variance of a normal distribution.
One is the square of the generalized
least-squares estimate of the slope and the other is based on the second
sample moment about the sample mean.
Let X = (X^...,X^) be an ordered vector of n observations.
a =
.,ct^) and
Let
be the mean vector and the covariance matrix,
respectively, of the order statistics of a random sample of n
observations from a standard normal distribution.
The Shapiro-Wilk
statistic can be written as
W =
(a'fi ^X)2/(a'0
%
ï(Xj - X)2
^a)
,
(2.1)
where X = (}X^)/n and note that the numerator, (a'S2~^x)2/(a'n~^n~''^) is
the best linear unbiased estimator of the scale parameter S of the
7
normal distribution with density function
f(x) =
(x-g)^
2g:
e
< a < "
g > 0
/2it3^
-co
<
X
<
®
(2.2)
.
The Shapiro-Wilk statistic can also be written as
(I OiXi):
W =
(2.3)
%o;.I(Xi - X):
where (c^.c^,...,g^)' = a'n
-1
Shapiro and Wilk (1965) also presented this statistic in the form
h
1=1
noting that c^ =
n
in
n-i+1
(2.4)
1=1
and h denotes n/2 or (n-1)/2 according to
whether n is even or odd.
The Shapiro-Wilk statistic is location-scale invariant and
statistically independent of X and S, the maximum likelihood estimates
of the mean and the standard deviation of a normal distribution,
respectively.
The distribution of the Shapiro-Wilk statistic depends on
the sample size and the hypothesized distribution only.
The exact
distribution of the Shapiro-Wilk statistic has not been derived for
finite sample sizes except for sample sizes of 3 or 4 for which explicit
results have been derived by Shapiro and Wilk (1965) and Shapiro (1954).
The percentiles of the Shapiro-Wilk statistic for sample sizes
8
n=3(1)50 [that is 3 to 50 with increment 1] were obtained by Shapiro and
Wilk (1965) using a Monte Carlo method. Five thousand statistics were
computed for n=3(1)20 and 100000/n statistics were computed for
n=21(1)50.
The justification of the choice of the number of statistics
in the Monte Carlo study was provided by comparing the theoretical
one-half moment (E /W) and the first moment of the sample with the
corresponding empirical moments of W for n=3(l)20.
The Johnson bounded
system of curves (Johnson, 19^9) was used for smoothing the empirical
distribution of W.
Normal random variates were obtained from the Rand
Tables (Rand Corporation, 1955).
Shapiro and Wilk (1965) provided the necessary constants
and
a table of lower and upper 0.01, 0.02, 0.05 and 0.10 percentiles and the
median of W for n=3(1)50 under the null hypothesis of normality.
These
constants and the table of lower percentiles can also be found in
Shapiro and Brain (1982).
Small values of W indicate significant
departure from normality.
Extensive Monte Carlo experiments performed by Shapiro, Wilk and
Chen (1968), Pearson, D'Agostino and Bowman (1977) showed that the
Shapiro-Wilk test of normality has good power against a wide range of
alternative distributions.
Since the development of the Shapiro-Wilk
statistic, almost every new goodness-of-fit statistic proposed in the
literature is compared to the Shapiro-Wilk statistic in empirical power
studies.
The Shapiro-Wilk statistic has become a standard statistic for
the testing of normality.
Any statistic that is almost as powerful as
the Shapiro-Wilk statistic for a wide range of alternative distributions
is considered to be an excellent statistic.
In an attempt to summarize the appearance of nonlinearity in normal
probability plots, LaBrecque (1977) developed three modifications of the
Shapiro-Wilk statistic which assess the amount of certain types of
curvature in the normal Q-Q probability plot.
The Shapiro-Wilk statistic for the exponential distribution was
presented by Shapiro and Wilk (1972) as
W =
Cl'fi~^(1 a' - a 1')nX]: / [I'n'^l a'Q~^a - (l'Q~^a)^]
.
I(Xi - X)z
(2.5)
The above formula can be written neatly as
W =
n(X - X ) : / ( n - 1 )
:
KX^ - X):
,
(2.6)
and note that the numerator, n(X - X^)^/(n-1) is the best linear
unbiased estimator of the scale parameter g of the exponential
distribution with density function
X - a
f(x) = -^ e
^
-to <^ a < 00
6 > 0
X > a .
(2.7)
The empirical null distribution of the Shapiro-Wilk statistic for
the exponential distribution was obtained by Monte Carlo sampling.
thousand samples of W were generated for sample sizes n = 3(1)50 and
[250000/n] samples were used for n = 51(1)100.
The empirical
Five
10
percentiles were plotted against the sample size n and smoothed by hand
to obtain a table of approximate percentiles for the Shapiro-Wilk
statistic.
A table of smoothed percentiles for significance levels
0.005, 0.01, 0.025, 0.10, 0.5, 0.9, 0.95, 0.975, 0.99 and 0.995 and
sample sizes n = 3(1)100 can be found in Shapiro and Wilk (1972) or
Shapiro and Brain (1982).
Since the Shapiro-Wilk statistic for the
exponential distribution responds to the nonexponentiality by shifting
either to smaller or larger values,
Shapiro and Wilk suggested that
this test statistic must be two-tailed.
Shapiro and Francia (1972) modified W so that it can be used for
large samples where the covariance matrix 0 is unknown.
for large samples, the
They argue that
values may be treated as if they are
independent and hence the variance-covariance matrix
can be replaced
by the identity matrix I. The Shapiro-Francia statistic is given by
W' =
(a'X)(a'a)
=
,
(2.8)
I(Xi - X):
[% (a. - a)(X - X)]2
^
=
I(a^ - a)^'I(X^ - X)2
or
W' =
,
since
= 0 for the normal distribution.
(2.9)
Noting that
, the
Shapiro-Francia statistic can also be written as
h
n
°ln(Xn-i+1 - Xi]=/I(Xi - X): '
i=1 in n J. I
1
1
(2-10)
where h denotes n/2 or (n-1)/2 according to whether n is odd or even.
11
Percentiles of the Shaplro-Francia statistic for sample sizes n =
50(1)99 can be found in Shapiro (1980) or Shapiro and Brain (1982).
The
percentiles of the distributions of W and W were found to be very
similar by Weisberg (1974).
Weisberg pointed out that the use of the
tabulated percentiles of W for the distribution of W will result in
only a small loss of accuracy, often giving a slightly conservative
test.
The Shapiro-Wilk and the Shapiro-Francia statistics were shown by
Sarkadi (1975) to have the same asymptotic distribution under the null
hypothesis of normality.
Sarkadi also showed that both the Shapiro-Wilk
and Shapiro-Francia statistics provide consistent tests of fit.
In
fact, the consistency of the test statistics holds if any distribution
with a finite variance replaces the normal distribution as the null
hypothesis.
Weisberg and Bingham (1975) modified the Shapiro-Francia statistic
slightly by replacing the expected values of the order statistic by a
simple approximation m = (m^.,m^). This modified statistic is
given by
^
(m'X)^/(m'm)
W =
:
ICX; - X):
or
*
W =
,
(2.11)
[I(m. - m)(X. - X)]^
^
— .
I(nL - m):.%(X^ - X):
(2.12)
where m^ = $ ^[(i-0.375)/(n+0.25)] and $ ^(.) is the inverse standard
normal distribution function.
Note that (i-0.375)/(n+0.25) is the
12
plotting position suggested by Blom (1958).
They showed empirically
*
that the distribution function of W and W
are essentially identical.
*
The advantage of the statistic W
over W and W is that no storage of
constants is required if a routine for the inverse of the standard
normal distribution function is available, as is common on most computer
systems.
An algorithm for computing the inverse standard normal
distribution function $ («) is Algorithm AS111 developed by Beasley and
Springer (1977).
Royston (1982a) obtained an approximate normalizing transformation
for the Shapiro-Wilk statistic using extensive Monte Carlo simulations
for sample sizes n = 7 to 2000.
The exact covariance matrix was not
used in computing the Shapiro-Wilk statistic.
due to Shapiro and Wilk (1965) was used.
Instead, an approximation
Six thousand Shapiro-Wilk
statistics for each sample size n = 7(1)30(5)100, 125, 150, 150,
200(100)600, 750, 1000, 1250 and 2000 were simulated.
Royston (1982b)
wrote two FORTRAN programs that compute the expected values of the
normal order statistics in exact or approximate form.
The expected
values of the normal order statistics are based on a formula suggested
by Blom (1958, pp. 69-71).
An approximation for the coefficients used
in the Shapiro-Wilk statistic which does require
be computed.
to be known can thus
The practical significance of these two papers is that the
W test of normality can now be programmed on a computer for sample sizes
up to 2000 without storing tables of percentiles and coefficients.
The probability plot correlation coefficient test r was presented
by Filliben (1975).
It is essentially the Pearson correlation
13
coefficient between the ordered observations and their medians.
The
Filliben statistic measures the linearity of a normal Q-Q probability
plot.
It rejects the null hypothesis of normality for small values of r
since small values indicate non-linearity of the normal Q-Q probability
plot.
The formula for r is
r =
where
I(X. - X)(M. - M)
^
— ,
/I(X. - M):
(2.13)
is the median of the i^^ order statistic from the standard
normal distribution.
The median of the i^^ order statistic from a
standard normal distribution is exactly related to the median of the i^^
order statistic
—1
from a uniform distribution on [0,1] by
-]
where $ («) is the inverse normal distribution function.
= $ (pu)
The
approximate median of the iorder statistic from the uniform [0,1]
distribution is given by
p. =
1 - Pn
, i = 1
(i - 0.3175)/(n + 0.365)
, i = 2,3
, i = n .
n -1
(2.14)
The table of smoothed Monte Carlo percentiles for 0.005, 0.01,
0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.975, 0.99 and 0.995
levels can be found in Filliben (1975).
This table was based on
extensive Monte Carlo simulations for sample size n=3(1)50(5)100.
100000 samples were generated for n g 10 and 100000/n samples were
generated for n > 10.
Normal random variates from the Rand Tables (Rand
14
Corporation, 1955) were used.
Checks on the accuracy of the empirical
percentiles were provided by comparing the empirical mean and standard
deviation with the theoretical mean and standard deviation of r for each
sample size, and by comparing empirical and theoretical percentiles of r
for n = 3.
The Monte Carlo power comparison presented by Filliben
(1975) indicated that the Filliben statistic and the Shapiro-Francia
statistic have similar power properties.
Looney and Gulledge (1985) investigated the power of various
versions of the Shapiro-Francia statistic corresponding to different
approximations of the mean vector a of the ordered statistics from the
normal distribution.
They concluded that the Francia-Shapiro statistic
using Blom's formula as suggested by Weisberg and Bingham (1975), has
slightly better power over the other versions of the Shapiro-Francia
statistic.
Filliben called r the probability plot correlation
coefficient but the plotting positions suggested by Filliben are seldom
used by practitioners.
Motivated by the better power of the
Weisberg-Bingham statistic (Shapiro-Francia statistic using Blom's
formula) and the increasing acceptance of the Blom's plotting positions,
Looney and Gulledge (1985) generated a table of percentiles for
n=3(1)50(5)100 for the statistic at the 0.005, 0.01, 0.025, 0.5. 0.1,
0.25, 0.5 0.75, 0.9, 0.95, 0.975 ,0.99 and 0.995 levels.
Twelve sets of
10000 samples were generated for each of the sample sizes
n=3(1)50(5)100.
The percentiles were smoothed by taking the average
over all 12 samples for a particular sample size.
The uniform random
numbers were generated by the algorithm developed by Wichmann and Hill
15
(1982a) and the normal random numbers were generated by the GRAND
generator (Brent, 1974).
Looney and Gulledge duplicated some of the
work by Weisberg and Bingham (1975) since Weisberg and Bingham showed
*
the distributions of W
and the Shapiro-Francia statistics are nearly
the same and the table of percentiles for the Shapiro-Francia statistic
has already been tabulated.
In much the same spirit as the Shapiro-Wilk and the Shapiro-Francia
statistics, which attempt to measure the linearity of a normal Q-Q
probability plot, a statistic based on the Pearson correlation
coefficient of points on a P-P probability plot is proposed.
This
statistic is given as
k^ =
where
[I(z. - z)(p - p)]=
—:
:—
I(z. - z)='I(p. - p)=
,
(2.15)
= F[(x^^j - a)/B ], the p^'s are plotting positions, and a and
g are the location and scale parameters of the distribution function
F(').
The estimators a and g are taken to be the maximum likelihood
estimators unless otherwise specified.
The maximum likelihood estimators for the normal distribution are
given by
n
ot = ilx. )/n ,
i=r
(2.16)
n
B = [%(x
i=1
(2.17)
and
- a):]/n .
16
The unbiased estimator for g, ng/(n-1) is used instead of g, and shall
be denoted by g from here on.
The maximum likelihood estimators for the exponential distribution
are given by
a = min x^,
(2,18)
and
n
g = (lx.)/n - min x.
i=l
.
(2.19)
The distribution function of the Gumbel distribution is given as
F(x) = exp[-exp[(x-a)/g] ,
(2.20)
and the maximum likelihood estimators a and g can be obtained by solving
the maximum likelihood equations:
n
I exp[-(x. - a)/g] = n ,
i=1
(2.21)
n
I (x. - a){1 - exp[-(x
i=1
(2.22)
- a)/g]} = ng .
Combining the two likelihood equations, an equation involving only g can
be written as
g = (Ix^)/n - [)]x^exp(-x^/g)]Clexp(-x^/g)] ^ .
(2.23)
The maximum likelihood estimator g can thus be computed easily using the
bisection method or Newton's method.
17
A question then exists as to which plotting position to use.
The
plotting position p^ = i/(n+1) is chosen because of the theoretical
property:
E{F[(X^i) - ct)/6]} = i/(n+1) .
(2.24)
Obviously, one can perform a Monte Carlo power study to investigate
if there is any difference between various plotting positions.
The
results of Looney and Gulledge (1985) indicated that differences are
small and appear only for small sample sizes.
Barnett (1975) showed
that the choice of plotting positions can make a difference when the
object is precise estimation of a and g.
For most practical purposes,
it does not matter which plotting position to use.
It is of interest to see how this new statistic performs relative
to its counterpart, r^ based on the Q-Q probability plot.
Thus, the r^
statistic, using the plotting position p^ = i/(n+1) is also studied.
The formula for the r^ statistic is
LliX. - X)(M
- M)]:
r== =
,
(2.25)
I(X. - X):.I(M. - M)2
where
= F[i/(n+1)] and F is either the distribution function of the
standard normal, Gumbel or exponential distribution in this study.
A
similar test for normality presented in Johnson and Wichern (1982, pp.
155,156) is a Shapiro-Francia test based on the plotting position p^ =
(i - 0.5)/n.
18
B.
Chi-square and Likelihood Ratio Statistics
The chi-square statistic was first introduced by Karl Pearson
(1900).
The simplicity of the chi-square statistic and the intuitively
sound logic behind it have made the chi-square statistic one of the most
widely used tools in statistics.
Since its introduction in 1900, the
chi-square statistic has generated a tremendous amount of interest in
the problem of assessing the fit of probability models to sets of
observed data.
The fascinating idea of measuring the "goodness of fit"
of a distribution to a data set using the squares of the differences
between observed and expected counts was a great catalyst to the
development of many statistical concepts including tests of hypotheses.
The Pearson chi-square statistic can be written as
k
.% (0
i= 1
- E )VE ,
(2.25)
and the likelihood ratio statistic is given by
k
0= = 2 I 0 log(0./E.),
i=1
where 0^ and
(2.27)
are the observed and expected cell counts, respectively,
and k is the number of cells.
A great deal of research had been done on the chi-square statistic.
Extensive lists of references can be found in Cochran (1952), Lancaster
(1969) and Hutchinson (1979).
This can be atttributed to the
flexibility in the use of the chi-square statistic.
There are many
19
issues involved in use of the Pearson chi-square statistic to test the
goodness of fit of a distribution to a data set.
Some of these issues
are listed below.
[1]
How many cells or intervals should be formed?
[2]
How should the cells be formed?
[3]
Should the cells be random or predetermined?
[4] What are the consequences of using different methods to
estimate the expected cell frequencies?
[5]
What are the consequences of using different methods of
estimation for unknown parameters?
[5] Is the chi-square test unbiased?
[7]
How is power affected by small cell frequencies?
[8]
How does the discreteness of the chi-square statistic
affect the chi-square approximation?
The Pearson chi-square statistic was first proposed for testing of
goodness of fit of a known distribution with a set of predetermined or
fixed cells.
This was later extended to the more practical case of the
composite null hypotheses for which the data were sampled from some
member of a parametric family F(x;9) of distributions.
The fundamental
theorem of the Pearson chi-square testing procedure, that the Pearson
chi-square statistic is asymptotically distributed as a chi-square
random variable with degrees of freedom equal to the number of cells
less one less the number of parameters estimated, was established by
Fisher (1924).
A more rigorous proof with a set of regularity
conditions was given by Cramer (1946, pp. 477-479).
20
It was observed by Fisher (1924) that if the estimators for the
unknown parameters did not have the same efficiencies as the maximum
likelihood estimators based on the observed cell counts, then the
chi-square statistic would not have a limiting chi-square distribution.
Chernoff and Lehmann (1954) gave a precise solution to this problem.
They considered the case where the cells were predetermined and maximum
likelihood estimators based on the original (or ungrouped) data were
used, and they showed that the Pearson chi-square statistic is
asymptotically distributed as a linear combination of chi-square random
variables:
*^-<1-1 * 1,z; *
" ••• *
(2.28)
where the Z^'s are independent normal random variables, the coefficients
are contrained by 0 ^ A S 1 and may depend on the q unknown parameters,
and k is the number of cells.
This shows explicitly that the chi-square
statistic is stochastically larger than a chi-square random variable
with k-q-1 degrees of freedom.
The practical significance of Chernoff
and Lehmann's result is limited since the asymptotic distribution of the
Pearson chi-square statistic depends on the unknown parameters.
In an attempt to more closely model the procedure followed by
researchers, Roy (1956) and Watson (1957, 1958 and 1959) investigated
the case where the cell boundaries are determined from the maximum
likelihood estimators of the unknown parameters, based on the original
data.
The number of classes and the desired cell probabilities are
predetermined.
The cell boundaries vary with the composition of the
21
sample, and the cells are commonly referred to as random cells.
Roy and
Watson showed that the asymptotic distribution of the chi-square
statistic is a linear combination of independent chi-square random
variables of the form given by (2.28).
They also showed that if the
cells are chosen in a proper manner and if F(x;8) is a location-scale
parametric family of distributions, then the asymptotic distribution of
the chi-square statistic does not depend on the unknown parameters.
Some notations will be introduced to facilitate discussion.
This
notation will be applied throughout the whole dissertation unless
otherwise noted.
Let X be a continuous univariate random variable with a
distribution function F(x;0) where 9 is a column vector with q
components, that is
e'=(e^ ,02,...,0q).
Let X^,X2,
(2.29)
,X^ be a random sample of size n from this distribution.
The Pearson chi-square statistic can be written as
k (n. - np ):
X: = I
— ,
i=1
np.
(2.30)
where there are k cells, p^ is the true probability that the random
variable X will fall in the i^^ cell, n^ is the observed frequency or
count in the i
th
cell and p^ denotes an estimator of the true
probability p^.
The following theorem due to Roy (1956) will be stated without
22
proof.
This theorem is of important practical significance for applying
the chi-square test of goodness of fit when the cell boundaries are
constructed using the maximum likelihood estimator of the true
parameters based on the original or ungrouped data.
Theorem 2.1 (Roy, 1956)
(i) Let f(x;8) be the density function of X , where the parameter 9
is a column vector with q components, that is
8 ' = ( 8 , ,82,...,8 ).
(2.31)
and assume that f(x;0) is continuous in x and differentiable in 6.
(ii) Let the qxl vector 0 be an estimator of 9 based on the
original data with the property that there exist functions g^(x)
(i=1,2,...,q), which may depend on 6 such that
1
0 -0 =
n
I g(X.) + e,
(2.32)
1= 1
where g(«) and e are qxl vectors, E{g(X^)} = 0, var{g(X^)} is finite and
/ne^
> 0 in probability.
(ill) Let the range of X, namely (-",=») be partitioned into k, k>q,
mutually exclusively cells C^(0) (i=1,2,...,k) depending on 9 such that
C^(6) is the half open interval
w._^(0) < X < w.(0),
(2.33)
where w^ is a function of 0 with continuous partial derivatives.
23
(iv)
Let
= number of X^'s falling in C^(0),
p^(e)
= F(w^(e);e)
- F( w^_^(0);9),
(2.34)
where F(*;0) is the cumulative distribution function of the random
variable X.
Under the above regularity conditions, the asymptotic distribution
of the Pearson chi-square statistic is that of
(2.35)
where
,Z^,...,Z^ are mutually independent standard normal random
variates.
The coefficients
the matrix
are the characteristic roots of
^ described as follows:
(2.36)
J = Dp - pp' - UW - WU' + UGU',
where
p- 0
D
0
=
... 0
Pg . . . 0
(2.37)
P
0
0
... p.
p' = (p^,p2,...,p^), U' = (U^,U2,...,U^), W = (W^.W^,...,#^),
w (0)
U. = /
w. .(6)
1-1
ôf(x;0)
60
w (0)
dx, W. = /
w, . ( 0 )
g(x)f(x;0) dx,
1-1
and G = E(g(X)g(X)'} is the covariance matrix of g(X).
(2.38)
Here U^ and
24
are qxl vectors.
Watson (1957) derived similar results and showed that if e is the
maximum likelihood estimator of 0, then the asymptotic distribution of
the Pearson chi-square statistic is that of;
Xk-q-1 + A,:' + AgZ: + . - + Aqz; ,
where
(2.39)
are in the interval (0,1) and may depend on the true
parameter 0.
For the case where the distribution function is a member
of a parametric family of location-scale distributions, Roy (1956)
showed that the asymptotic distribution of the Pearson chi-square
statistic does not depend on the location parameter a and the scale
parameter g provided that the estimators a and B used in the chi-square
statistic are the maximum likelihood estimators and the cell boundaries
are of the form a + c\g, where the c^'s are some specified constants.
Dahiya and Gurland (1972) proved a similar theorem where the
chi-square statistic follows the asymptotic distribution of (2.39) when
the location and scale parameters are estimated by the sample mean and
standard deviation respectively.
Of course, these two results coincide
for the normal distribution.
Following the test procedure suggested by Watson and Roy, the
asymptotic distribution of the chi-square statistic is of the form:
X%_1 + AiZ; +
when the normal distribution is tested.
(2.40)
Explicit expressions for the
A's were derived by Watson (1957, 1958) for various distributions.
Specifically, for the normal distribution.
25
k
k
A . - 1 - k I ? : ( ! ) , A _ = 1 - - T T I Y ? (i),
'
i=1
(2.41 )
i=1
where
4(x) = 7:;
and that w^'s are the cell boundaries for the standard normal
ditribution.
Watson (1957) computed
and
for the number of cells k
from 2 to 10 and noted that each \ value decreases to 0 as k increases.
He suggested using at least 10 cells so that the contribution from the
terms
and
is negligible.
Watson also required that none of
the cells have small expected cell frequencies in order to avoid
deficiencies in the asymptotic theory due to small sample effects.
Dahiya and Gurland (1972) provided a straight forward solution to
the problem of contribution due to the A's.
the distribution of (2.40) by the
Instead of approximating
distribution with k-3 degrees of
freedom, Dahiya and Gurland computed a table of percentiles for the
distribution of (2.40) using Laguerrian expansions (Gurland, 1955 and
1955, and Kotz, Johnson and Boyd, 1967) for a weighted sum of
independent
random variables.
This table of percentiles is presented
as Table 2.1.
In this table, d,
and d, _
are defined as
k,a
k-3,a
*
+ AiZ! + AgZl ^
and
= Ct,
(2.42)
26
'V3,«' • «•
*
Table 2.1.
Critical points d, , d, _
and the corresponding values
k,a
k-S.ot
of
and
a = 0.10
for normal null distributions
a = 0.05
*
*
d,
k,a
k
3
4
5
6
7
8
9
10
11
12
13
14
15
d,
k-3,a
3.928
5.442
6.905
8.322
9.703
11.055
12.384
13.694
14.988
16.267
17.535
18.792
d,
k,a
d, ^
k-3,a
3.248
2.371
2.706
4.605
6.251
7.779
9.236
10.645
12.017
13.352
14.684
15.987
17.275
18.549
a = 0.01
5.107
6.844
8.479
10.038
11.543
13.007
14.438
15.843
17.226
18.589
19.937
21.270
3.841
5.991
7.815
9.488
11.070
12.592
14.067
15.507
15.917
18.307
19.675
21.026
*
d,
k,a
5.418
7.917
10.075
12.021
13.837
15.567
d, ^
k-3,a
6.635
9.210
11.341
13.277
15.086
15.812
18.475
19.090
21.666
23.209
17.234
18.852
20.431
21.977
23.495
24.990 2 4 . 7 2 5
25.464 25.217
X,
1
2
0.207
0.139
0. 7 7 9
0. 6 3 3
0 . 1 0 3 0.532
0.081 0.459
0.066 0.404
0.055 0.361
0.047 0. 3 2 6
0.041 0.298
0.036 0. 2 7 4
0.032 0. 2 5 4
0.029 0.2 3 6
0.026 0.221
0.024 0.208
The case in which the cell boundaries1 are simple quantiles was
studied by Witting (1959) and was investigated further by Bofinger
(1973).
A basic technique in deriving the asymptotic distribution of
the chi-square statistic when the cells are random, is to show that the
difference between the fixed cell and the random cell chi-square
statistics converges to zero in distribution.
This technique was first
employed by Roy (1956) and later used by Moore (1971) and Chibisov
(1971).
The asymptotic distribution of the random cell version of the
Pearson chi-square statistic, was obtained by Chibisov, under the null
hypothesis and also under sequences of Pitman alternatives.
The
multivariate version of the random cell chi-square statistic was studied
27
by Moore (1970, 1971).
Moore and Spruill (1975) presented a unified
large-sample theory of general chi-square tests of fit under composite
hypotheses and Pitman alternatives.
Wald's method (1943) of
constructing test statistics having chi-square limiting distributions
from estimators having nonsingular multivariate normal limiting
distributions, was generalized by Moore (1977) to the case where the
estimators have singular multivariate normal limiting distributions.
This generalized Wald's method was then use to construct chi-square type
statistics having a chi-square limiting null distribution, for the case
when unknown parameters have to be estimated.
The methods of proof
discussed by the above authors fail if the number of cells increases
with the number of observations at a rate faster than 0(/n). Thus, the
case where the number of cells and the number of observations increase
at the same rate, is beyond the framework of their proof.
Several authors have proposed modified or nonstandard chi-square
statistics.
Kambhampati (1971) proposed a quadratic form of the
observed minus the expected cell frequencies.
The asymptotic
distribution of his statistic is chi-square when the maximum likelihood
estimator based on the ungrouped data is used.
Rao and Robson (1974)
constructed a modified chi-square statistic based on the quadratic form
of the asymptotic multinormal conditional distribution of the cell
frequencies given the parameter estimates.
It is simply the chi-square
statistic with an-extra term added on to it.
They showed by simulation
that the distribution of their statistic agrees with the chi-square
distribution with degrees of freedom one less than the number of cells
28
after grouping, regardless of the number of parameters estimated.
They
also provided results from a small Monte Carlo power comparison of their
statistic with both the fixed cell and random cell chi-square
statistics.
The Rao-Robson statistic showed a slight improvement over
the two other chi-square statistics.
Cressie and Read (1984) investigated the family
\ e R} of
"power divergence" statistics
2nl^ =
A (A + 1)
I 0.{(0./E.)^ - 1}, A e R,
^
^ ^
(2.44)
Pearson's chi-square (A = 1) and likelihood ratio (A = 0) statistics are
special cases of the power divergence statistics.
Based on power
consideration of the power divergence statistics, they recommended 2nl^
for A E [0,3/2] to be used when no knowledge of alternative
distributions is known.
Note that the Pearson chi-square and likelihood
ratio statistics are among the set of statistics recommended.
The
small-sample properties of these statistics was investigated by Read
(1984).
He also gave similar recommendations about the choice of A for
several classes of alternatives.
Traditional discussions of the limiting distribution of the Pearson
chi-square and the likelihood ratio statistics for the goodness of fit
problem are based on the assumption that all the expected cell
frequencies become large as the sample size is increased.
Slakter
(1966), Roscoe and Byars (1971), and Tate and Hyer (1973) studied the
inaccuracy of the Pearson goodness of fit test when expected cell
frequencies are small.
In order not to violate the assumptions of the
traditional chi-square test of goodnees of fit, cells are often
collapsed to avoid small expected cell frequencies.
Otherwise, one
might feel rather uncomfortable with the large sample chi-square
approximation for the null distribution of the test statistic.
However,
information may be lost when cells are collapsed, and the choice of
cells to be collapsed introduces a certain degree of subjectivity into
the test.
Hoist (1972) expressed the view that it is rather unnatural
to keep the number of cells fixed when the sample size Increases, for
the classical goodness of fit problem.
The asymptotic distribution of the Pearson chi-square and
likelihood ratio statistics for the goodness of fit problem when the
number of cells increases as the sample size increases, was studied by
Steck (1957), Hoist (1972, 1976), Morris (1975) and Medvedev (1977a and
1977b).
They considered the case of testing a simple hypothesis and
each gave similar, but not identical sets of conditions for the
asymptotic normality of the goodness of fit statistics.
Hoist and
Medvedev used complex analysis to derive the asymptotic normal theory of
the goodness of fit statistics based on the convergence of a sequence of
characteristic functions.
Morris extended a conditioning argument of
Steck (1957) to obtain a central limit theorem for sums of functions of
multinomial counts, and used the result to obtain the limiting
distribution of the Pearson and likelihood ratio test statistics for
sparse data sets.
Certain results from Hoist (1972) and Morris (1975)
will be given here without proof.
Further notation will be introduced
30
as needed.
Theorem 2.2 (Hoist, 1972)
Let
= (N^^,
- Multinomial(p^,k,n^) i.e.,
T
"ik"'""kk'
""''kk'''
where
k = number of cells,
n^^ = sample size or the number of observations,
" (Pik' P2k'* * •' Pkk^'"
Sometimes the subscript for n^^ is suppressed to facilitate the
presentation of formulas.
Let the real measurable function f|^(v,x) be
defined for v = 0,1,2,...
and 0 ^ x S 1.
Let
1= I
Let
« Poisson(np^j^) for i=1,2,...k.
P(X,,. =
ik
X,, ) =
ik
(np^^)!
i.e.,
31
Set
k
u = I E{f,(X ,i/k)},
" i=1
(2.45)
k
[ I cov{X. ,f|^(X. ,i/k)},
°n
var{fk(X.,i/k)}
^
1=1
If
n and k
kp^^ â
>
so that n/k
> a (0 < a < »);
C < ®, for some real number C and all k and i;
jf|^(v,x)j è a*exp(bv) for some real numbers a and b; and
0 < lim inf a^/n â Ixm sup a^/n < » ,
n —>=
n —>=
then
is asymptotically N(y^,ff^) when n
(2.46)
> ®.
Theorem 2.3 (Asymptotic normality of Pearson's chi-square statistic,
Morris, 1975)
Let
Nj^ =
_ Multinomial(p^,k,n^),
where
k = number of cells,
nj^ = sample size or the number of observations,
^ k " ^^Ik' ^ 2 k ' ' P k k ^ '
32
k
> 0 , 1 p?j^
Let {p°j^ : l^iSk} be given with
max p.. =
1<i<k
0 (1)
as k
>
and that there exists e > 0 such that
n^p^^ i e for all i, k.
Denote
? "ik
,
u,, = L ZÔ— + n.
i=lPik
|! <Pik - Pik''
2
—
'^i=1
Pik
Ik
= 2
''ik
Pik
Pik
"k^njp?
k^ik
+ 2
Pik
Pik
- Tv)=P
ik
and
= I Oiki=1
Suppose the condition
max 2
iSi^k^ik
= o( 1 )
as k — >
holds.
Then,
1
k (Nik - Vik)'
5
y.}
1=1
"kPik
— { I
s
L
>
N(0,1).
as k
>
33
Define
and
»lk • ""p?/"' - °K-
(2-53)
Then a . ^ is asymptotically of the exact order of
k + ko: + "k I G^kPik'
(2-54)
1=1
and condition (2.50) is equivalent to the condition that
max ne:.Pi
^
k
33
)CO.
(2.55:
" * ""k" * "k.^lk^iK
1=1
When the "null hypothesis p^^^ = p?^" is true for every i, condition
(2.55) is trivially met and so (2.51) holds provided only that (2.47)
and (2.49) are valid.
The Morris conditions bound all expected cell frequencies away from
zero and do not allow any cell probability to remain bounded away from
zero as the sample size and the number of cells increase.
In contrast,
Hoist's conditions do not require all expected cell counts to be bounded
away from zero but requires the cell probabilities to be less than c/k
for all cells and some c.
The conditions of these theorems dictate
certain ways of refining the partitions as the sample size increases, to
ensure convergence in distribution to a normal distribution.
The
accuracy of these normal approximations for the null distribution of the
Pearson chi-square and likelihood ratio statistics was investigated by
Koehler and Larntz (1980).
One controversial issue concerning the use of the chi-square
statistic is the choice of the cell probabilities.
In regard to this
issue, Mann and Wald (1942) showed that the equiprobable chi-square test
is locally unbiased.
The equiprobable chi-square test was later shown
to be strictly unbiased by Cohen and Sackrowitz (1975) and Sinha (1976).
However, Rayner and Best (1982) demonstrated the existence of unbiased
chi-square tests with unequal cell probabilities.
The other rationale
behind using the equiprobable chi-square statistic is the fact that
strikingly different outcomes could be reached by using different
configurations of intervals with unequal probabilities, as pointed out
by Gumbel (1943).
A further attractive feature of the equiprobable
chi-square test is that Roscoe and Byars (1971), Smith et al. (1979) and
others have shown that the chi-square approximation to the null
distribution of
is more accurate than for cases with unequal cell
probabilities.
The special case of Morris's theorem with equal cell probabilities
will be stated:
Theorem 2.4 (Asymptotic normality of Pearson's chi-square statistic
for the null hypothesis of equal probabilities)
Let
^k =
Ngk»..., N^^) _ Multinomial(p^,k,n^),
35
where
k = number of cells,
= sample size or the number of observations,
Pk " (Pik' ^2k'"" Pkk^'
Let (p?^ : lâi^k} be given with
max
p.,
=0(1)
as k
= 1/k.
Suppose
(2.56)
> ®
ISlSk
and that there exists e > 0 such that
n^p^^ k t for all i, k.
(2.57)
Denote
= k + n^k % (p.^ - 1/k): ^
(2.58)
i=l
°?k -
"îk'^Plk'
1= 1
and
^k' * I ''ikSuppose the condition.
max 2
iSiSk^ik
= 0(1 )
as k
> "
(2.59)
holds.
Then,
j_i ;
1-1
" V"'. I
VK
"
M(o,,) ,
as k
>
(2.60)
Suppose the null hypothesis "Pj^|^ = P?|^ = 1 /k" for every i, then
= k,
= 2 and
= 2k.
The condition (2.59) is thus satisfied.
Morris's theorem of the asymptotic normality of the chi-square statistic
was proved for the case of a simple null hypothesis and also certain
classes of alternatives satisfying the conditions stated in the theorem.
This theorem will be extended to the case of a composite null hypotheses
for which the hypothesized distribution is a member of a parametric
family of distributions F(*;a), where a denotes the location parameter.
A conditional approach developed by Fligner and Hettmansperger
(1979) will be used.
This method of proof is based on some theorems on
the convergence of a sequence of joint distributions due to Sethuraman
(1961).
A special case of some very general theorems contained in
Sethuramen can be found in Fligner and Hettmansperger (1979) and will be
stated here.
Some definitions and theorems concerning strong and weak
convergence of probability measures will be introduced here.
Definition 2.1 (Strong convergence of probability measures)
Let p^, Pg,.... be a sequence of probability measures defined on a
measurable space (n,F).
converges strongly to p if p^(A) converges
to p(A) for each AeF.
Theorem 2.5 (Strong convergence of probability measures (Halmos, 1950))
Let p^, Pg,
be a sequence of probability measures defined on a
measurable space (0,F).
p^ converges strongly to p if and only if Tgdp^
converges to /gdp for all bounded measurable functions g on fi.
37
Theorem 2.6 (Strong convergence of probability measures (Scheffê, 1947))
Let p^, Pg,.... be a sequence of probability measures defined on a
measurable space (0,F).
If the density f^(') of p^ with respect to some
finite measure p^, converges in measure [p^] to a density f(») then
there is a measure p such that p^ converges strongly to p.
Definition 2.2 (Weak convergence of probability measures)
Let p^, Pg,.... be a sequence of probability measures defined on a
measurable space (Q,F).
p^ converges weakly to p if and only if /gdp^
converges to /gdp for all bounded continuous functions g on Q.
Theorem 2.7 (Sethuraman, 1951)
Suppose (X|^, Y|^) is a sequence of random vectors such that the
conditional distribution of
given
= c converges weakly to a normal
distribution for which the limiting conditional mean is a linear
function of c and the limiting conditional variance does not depend on
c.
If the marginal distribution of
converges strongly to a normal
distribution, then the joint distribution of (X^, Yj^) converges weakly
to a bivariate normal distribution.
A theorem concerning the asymptotic normal theory of the Pearson
chi-square statistic, where the number of cells is allowed to increase
as the sample size increases and an unknown location parameter has to be
estimated, will now be proved.
38
Theorem 2.8 (Asymptotic distribution of the Pearson chi-square
statistic when the location parameter is estimated from
the data via the sample median)
Let
k = number of cells,
= sample size or the number of observations,
Let X ,X„,...,X
be a random sample from a continuous distribution
"k
with distribution function F(x:a) and density function f(x;a), where a
is the location parameter.
Let the sample median 0 be the estimate of the population median 6.
Note that 0 is based on the ungrouped data.
Let the location parameter
a be estimated via the sample median by solving the equation:
F(0;a) = 1/2.
Let the cells be constructed as follows
F(Wik;a) - F(w\_^
= 1/k
for i=1,2,...,k,
(2.61)
where w. , , and w. , are the cell boundaries of the i^^ cell.
1-1,k
i,k
Let
be the resulting multinomial:
"^k " ^^1k' Ngk'"'"' ^kk^ ~ Multinomial(p^,k,n^)
Pk = (p^^, Pg^,'"',
is the vector of true random cell
probabilities.
Assume n^/k converges to a constant X > 0.
Denote
k
= k + n^k I (p.j^ - 1/k):
i=1
,
(2.62)
39
°ik " ZkZpïk +
-.1 PÏki'Pik ,
1=1
and
=5 - j / . ;
Then,
k (N.
- n./k)=
L
-V
1=1
->
«"•" •
k
as k
> "
Also,
1
k (N
- U,
Sk
1=1
- n /k):
TTk
L
::k)
—>
'
k
as k
> =» .
Proof
Without loss of generality, assume n^^ is an even integer, k is an
even integer, and let m = n^/2.
Let 0 be the sample median and 0 be the population median.
Let U^,U2,...,U^ be those observations less than 0 and
be those observations greater than 0.
Given 0=0+ c//n^, and noting that the sample median is
asymptotically distributed as a normal random variable with mean equal
to the population median and variance equal to 1/{4n[f(0)]^}, the
following facts concerning the conditional distributions of the U's and
V's follow and are stated without proof:
40
(1)
.,U^ are independently and identically distributed random
variables with distribution function:
f
F(t;a)
, t <
F(e;a)
(2.63)
Fy(t)
, t à e
(2)
independently and identically distributed random
variables with distribution function:
^
F(t;ot) -
F (0 ;a)
, t >
1 - F(e ;a )
(2.64)
, t <
(3)
U's and V's are mutually stochastically independent.
Given 6=9+ c//n^, where c is some constant, let the estimator a
of a be obtained by solving the equation:
(2.65)
F(6,a) = 1/2.
Let the cell boundaries
F(Wik,a) = i/k
i=0,1,...,k be constructed as follows;
, i=0,1,...,k.
(2.66)
Note that 9 = w.
k/2,k*
Consider any cell ("i-i,such that w.^^ g q and let pj^^ be the
true probability of U's falling into the cell ("i-ipk'^i.k^'
41
(2.67)
F(e ;a)
Similarly, consider any cell (w._, . ,w. ) such that 9 Û w
x^ljK IjK
lljK
let
be the true probability
of the V's falling into the cell
i
V
^
F(w.^^;a) - F(w._^^^;a)
(2.68)
1 - F(9;a)
Condition on 0 = 6 + c//n^, and let
be the true conditional
cell probability associated with the cell (w\_^ k'^i k^'
0-5 Pik
' "ik 3
(2.69)
Pik "
0-5 Pik
' ® ° "i-l,k
Let
k
^k =
/2k
"ikCp.,) -
I
i=1
(2.70)
Hk/k
where N., ,
> denotes the cell count of i^^ cell and p., is the
ik(p.j^)
Ik
associated true cell probability,
is attached to
so that the
notation will be more precise when the conditional version of
given later.
Condition on 0 = 6 + c//n^, where c is some constant, then
is
42
n^/k]:
k
I •
i=1
1
9 +c//n^ =
/2k
n^/k
(2.71)
where
= k + "kk.l^fplk -
'
1=1
and N. , + . denotes the cell count of the
cell and p'î is the
ikip.k;
.
+
associated cell probability, given that 8 = 8 + c//n^. Let
denotes
the conditional distribution of
given 8=0 +c//n^, for some constant
c.
Note that (N. , + .,..,N . + .) is not a multinomial random vector
^Pik
^ik
since the sum of the probabilities of the first half or the second half
of the cells is equal to one half.
However, using (1), (2) and (3), y|^
can be written as the sum of two independent random variables.
,v+
/2F
/2k
Y. =
s" .
*
(2.72)
+
k(Pik)
where
,u+
+ G//n^,
1
k/2
'"T'T''
I
(2.73)
1=1
2
2
k
k(Pik)
43
n
k/Z
(2.74)
k/2
1
2P,.
ik
+ 2
^«P.j = Î '
ik
i=l
rij^
2
(2.75)
' 2Plk '
2
2
k
k
2p
'^ -F - ^k)'- ^Pik '
ik(Pik)
(
k
(2.76)
and
k/2
o"
2 _ y u
k(Pik) "i:l
2
(2.77)
ik'
Similarly, define
k
•«"ik'
i=k/2+1
kL
n
2
y
^^Pik^ i=k/2+1
/
,
2
k
(2.78)
k(Pik)
( 2 p , k - 2 / k ) : ,
k
(2.79)
i=k/2+1
2Pik
— + " —T-)"Pik '
"k
2
2
I
2
1
^
\
k
—
yV
2
(2.80)
44
2P,.
+ 2
ik (Pik)
2
2
n,
2
2
Ik
(
k
k
(2.81)
and
.V
2 =
y
k(Pik)
i=k/2+1
V
2
(2.82)
Note that
(2.83)
and
Let n" and
k
greater than
e
k
be vectors of cell counts for observations less and
+
respectively.
Also, note that p
j^), given 0 = 6 + c//n^.
probability of the cell
O'SPÏk
O'SPlk
is the true cell
'"ikS
(2.84)
s w.
i-1,k
Thus,
nJ^ ~ MultinomiaKpj^, k/2, n ^ / 2 ) ,
and
~ Multinomial(p^, k/2, n ^/2).
where
'
,u
k " ^^^1k'-'"^Pk/2,k^ '
45
K-
"k.ktZp;
•
Pr " (^^k/2+1,k'*••'^^k.k^ •
Let 6 e (0,1/2),
^6k,k
^
O'SPgk.k
1/k
(2.85)
1/k
[F("6k,k:*) - F("6k-l,k:=)
2F(0;a)
F("6k,k:*) - F("5k-i,k:*)
\
f("6k,k:G)["5k.k - "gk-l.k^
2F(e,a)
f(Wgk,k:G)["ak,k ~ "6k-1,k^
-> 1 as k
> m.
Similarly, let 6 e (1/2,1), then
(2.86)
1/k
1/k
[f("6k,k:*) - F("6k-1,k:*)
2(1 - F(6;a))
f("6k,k:*) - f("6k-i,k:*)
1
"6k-1 ,k^
2F(e,a)
-> 1 as k
«"sk.k'-'tV.k - "Sk-I.k^
>
46
Using Theorem 2.3,
yJJ""
(2.87)
> N(0,1) ,
and
> N(0,1) .
Note that
k(Pik)
2ir
> 1 //2 ,
(2.88)
-> 1 //2
"TIF
Consequently,
/2k
,v+
'k(Plk)
-> N(0,1 ) as k
Let
>
(2.89)
= /n^(8 - 0) and note that
%k = %k I Xk = °
(2.90)
Applying the central limit theorem to the estimator of the median
(Mood, Graybill and Boas, 1974),
47
strongly
->
N(0,
-)
(2.91)
.
itf(0;a)'
Hence, with Theorem 2.7,
X, \
weakly
(2.92)
-> N
Since,
(2.93)
Var(Y) = E(Var(Y|X)) + Var(E(Y|X)) ,
and
al = ^ ,
then
weakly
-> N(0,1)
Note that
and
may be replaced with any asymptotically
*
equivalent formulas, say
*
*
and Sj^ such that s^/s^ converges to one and
*
- p^)/s^ converges to zero as k tends to infinity.
This is not
*
important for the asymptotic result, but the choice of s^ and
*
may
greatly influence the accuracy of the limiting normal distribution for
small samples.
its
C.
Statistics Based on the Empirical Distribution Function
This section reviews various statistics based on the empirical
distribution function.
Let
be an ordered random sample
from a distribution with distribution function F^(x:0) where 8 is a
vector of unknown parameters.
The empirical distribution function at x,
F^(x), is defined as the proportion of the x^ values less than or equal
to
X.
More explicitly, F^^x) is defined as
F^(x) =
0
, X < X.
i/n
, x^ 3 X <
1
, X a x^ .
(2.94)
The statistics based on the empirical distribution function can be
roughly divided into two broad classes of statistics typified by the
well-known Kolmogorov-Smirnov statistic,
D
=
sup
|F (x) - F (x;0)|,
(2.95)
-= 3 X 3 "
and the Cramer-von Mises statistic.
= n / [F^(x) - F^(x;e)]^ dF^(x;e) .
(2.96)
The Kolmogorov statistic was developed by Kolmogorov (1933).
Two
one-sided statistics very similar to the Kolmogorov statistic which were
proposed by Smirnov (1939, 1941) are
=
sup
-co
<
X â
[F^^x) - F^(x;e)] ,
(2.97)
49
and
D
sup
n
(2.98)
[Fg(x;8) - F^(x)] .
is commonly known as the Kolmogorov-Smirnov statistic.
The
Kolmogorov-Smirnov statistic measures the maximum discrepancy between
the empirical and the hypothesized cumulative distribution functions.
In an attempt to make full use of the discrepancy between the empirical
and hypothesized cumulative distribution function, Cramer (1928, p. 145)
developed the statistic
=
(2.99)
/ [F^(x) - F^(x;6)]^ dF^(x;6) ,
which averages the square of the difference between the empirical and
the hypothesized distribution function across all values of x.
The
spirit behind the Cramer statistic is similar to that of the chi-square
statistic which measures the square of the differences between the
expected and observed cell counts.
The Cramer statistic was later
generalized by von-Mises (1931) by introducing a weight function g(x) to
obtain
=
/ g(x) [F^(x) - F^(x:e)]^ dFg(x;8) .
(2.100)
When the weight function is identically one, this reduces to the
original Cramer statistic.
This statistic was further modified by
Smirnov (1936, 1937), who obtained
50
= n ; Y(F (x;8)) [F^(x) - Fg(x;8)]:
where
dF ^(x;e),
(2.101)
is some function.
Anderson and Darling (1952) studied a special case of Cramer-von
Mises-Smirnov statistic where the weight function is
¥(F^(x;0)) = [F^(x;0)(1 - F^(x;0))]"\
(2.102)
and the resulting statistic,
"
= n /
(F (x) - F (x;8))2
dF (x;9) .
(2.103)
F (x;8)(1 - F^(x;e))
is commonly called the Anderson-Darling statistic.
Note that the denominator approaches 0 when x approaches the
extreme ends of the distribution and it achieves the maximum value of
0.25 when x is the median of the distribution.
The Anderson-Darling
statistic gives greater weight to the tails of the distribution and can
be expected to be more powerful in detecting distributions with heavier
or longer tails than those of the null distributions.
Sometimes, the data are in the form of directions and one wishes to
test the hypothesis that the orientation of the directions is random.
Data of this type can also be represented as a set on points on the
circumference of a circle.
The testing of the hypothesis that the n
points are distributed at random on the circumference of the circle is
exactly the same as testing the hypothesis of randomness of directions.
51
Statistics developed for this kind of situation are commonly known as
tests on the circle.
One essential property of statistics of this kind
is the invariance property of the choice of reference point on the
circumference of the circle.
To be more precise, let R be any arbitrary
point on the circumference of the unit circle.
Let d^,d2,...,d^ be the
distances from the reference point R to the n sample points in a
particular direction.
The sample d^.d^j-.-.d^ completely determines the
sample for a fixed R.
Thus, it is important that the statistics
developed for this kind of situation remains unchanged with any other
choice of reference point on the circle.
A Kolmogorov-Smirnov test on the circle of the form
V
n
= D* + D~
n
n
(2.10'-1)
was proposed by Kuiper (1959).
A more powerful statistic for testing
points on the circle,
U' = n/[F^(x)-F^(x;0) - /{F^(t)-F^(t;0)ldF^(t,9)]^dF^(x;0) ,
(2.105)
was developed by Watson (1961).
The Watson statistic attempts to
measure the variance of the differences between the empirical and
hypothesized distribution functions.
These statistics for test of
points on a circle can also be used for testing of points on a line.
The Kuiper statistic is more powerful in detecting a change in scale
rather than a change in location, when compared to the
Kolmogorov-Smirnov statistic.
Similarly, the Watson statistic can be
52
expected to be more powerful in detecting shifts in the variance of a
symmetric distribution.
The EDF statistics are summarized as follows:
Kolmogorov-Smirnov statistics:
D
=
sup
|F (x) - F (x;e)| ,
-= S X a "
(2.106)
CF^(x) - F^(x;e)] ,
(2.107)
[Fg(x;8) - F^(x)] .
(2.108)
=
sup
-00 < X ^ "
D
=
sup
— CO
^
X
=
00
Kuiper statistic:
V
n
= D"*"
n
D~ .
n
(2.109)
Cramer von-Mises statistic:
.n / [F^(x) - F^(x;8)]^ dF^(x;0),
(2.110)
Anderson-Darling statistic:
»
A: = n /
"
(F„(x) - F^(x;0))2
dF (x;e)
F^(x;0)(1 - Fg(x;8))
(2.111)
53
Watson statistic:
00
00
y: = n/[F (x)-F (x;0) - /{F (t)-F^(t;0)}dF^(t,0)]^dF^(x;8)
n
no
n
0
o
o
(2.112)
These previous formulas are not necessarily the most convenient
formulas for practical computations.
formulas are useful.
Note that
The following computational
= F^(x^;e) or F^(x^;0) depending on
whether 0 is known or not under the null hypothesis, respectively.
Unless otherwise indicated, 0 is assumed to be the maximum likelihood
estimator of 0.
Only those statistics used in the subsequent power
comparison are listed below.
Kolmogorov-Smirnov statistics:
= max [ i/n - z.] ,
1gi<n
^
(2.113)
D = max [ z. - i/n] ,
" ISiSn
^
(2.114)
"
D
= max I i/n - z.| = max ( D^, D ) .
"
IZiSn
'
"
(2.115)
"
Kuiper statistic:
V
n
= D
+
n
+ D .
n
(2.116)
Cramer-von Mises statistic:
n
= % [z. - ( 2 i - 1 ) / ( 2 n ) ] : + 1/(12n) .
" i=1 ^
(2.117)
54
Anderson-Darling Statistic:
n
= - (%(2i-1)[ln
i=l
+ In (1 - z^+^_^)]}/n - n .
(2.118)
Watson Statistic:
- n(z - 1/2)2 ^
where
z = (^z^)/n.
(2.119)
Extensive references for tests based on the empirical distribution
function can be found in Darling (1957), Barton and Mallows (1965),
Sahler (1968) and Durbin (1973b).
Darling (1955) considered testing a
composite null hypothesis where one parameter has to be estimated, and
this was extended to the multiparameter case by Sukhatme (1972).
Durbin
(1973b) presented a comprehensive treatment of the theory for the
derivation of the sampling distribution of a wide range of statistics
based on the empirical distribution function.
The statistics considered
by Durbin include the Kolmogorov-Smirnov, Kuiper, Cramer-von Mises,
Anderson-Darling and Watson statistics. Treatment of the asymptotic
theory of statistics based on the empirical distribution functions can
be found in Anderson and Darling (1952, 1954), Darling (1955, 1957),
Durbin (1973a, 1973b, 1975), Kac et al. (1955), Stephens (1976, 1977)
and Watson (1961, 1952).
Asymptotic percentiles of the Anderson-Darling statistic were
tabulated by Anderson and Darling (1954) for testing simple null
hypothesis.
Stephens (1974, 1976) obtained the asymptotic percentiles
for the Cramer-von Mises, Watson and Anderson-Darling statistics, when
the distribution tested is normal with mean or variance or both unknown.
55
These percentiles were obtained by Stephens by fitting Pearson curves to
the distributions using the first four cumulants.
The asymptotic
percentiles for D and V were obtained by Stephens (1974) using
extrapolation of Monte Carlo percentiles of finite samples.
Stephens
(1974, 1976) also provided Monte Carlo percentiles for the statistics
A^,
, V and D corresponding to finite samples for the normal case,
where parameters have to be estimated.
The Weibull probability model is used widely in modelling
reliability or lifetime data because of its wide range of density
curves.
The distribution function of a Weibull random variable X is
F(x) = 1 - exp[-(x/e)^] .
(2.120)
The Weibull distribution can be transformed into a Gumbel distribution
using the simple transformation Y = -log X.
The distribution function
of the Gumbel random variable is given by
F(y) = exp[-exp{-(y - a)/g}],
(2.121)
a = - log 8 ,
(2.122)
where
and
B = 1/Y .
Thus, if one is interested in fitting a Weibull probability model to a
set of data, one can first transform the data using the minus .
logarithmic transformation and then fit a Gumbel probability model to
the transformed data.
A good fit of the Gumbel probability model to the
56
transformed data set would imply a good fit of the Weibull probability
model to the original data set. The Monte Carlo percentiles for finite
sample D^, D
and V statistics for the testing of goodness of fit of the
Gumbel probability model with unknown parameters, were provided by
Chandra et al. (1981).
Stephens (1977) provided the Monte Carlo
percentiles for the
and
statistics for the Gumbel case.
The exponential probability model has also been widely used as a
model in lifetime study.
It has a constant hazard function and this can
be useful in certain situations.
"lifetime" of glass bottles.
will not deteriorate.
One interesting example is the
Unlike many other things, a glass bottle
The simplicity of the exponential density
function often leads to many elegant derivations of properties.
Monte
Carlo percentiles for D were provided by Lilliefors (1967» 1969).
Stephens (1974, 1975) provided Monte Carlo percentiles of A^, W^, U^, V
and D for the exponential case with unknown scale parameter.
+
method for the exact distributions of D , D
sizes was developed by Durbin (1975).
—
An elegant
and D for finite sample
Tables of percentiles for D^, D
and D for a wide range of sample sizes, were also provided by Durbin.
57
D.
Statistics Based on Moments
The mean and the variance of a distribution are among the most
basic statistical concepts.
They measure the location and the spread of
a distribution, respectively.
Two less well-known measures are the
skewness and kurtosis statistics.
The skewness and kurtosis are two
measures of the shape of a distribution.
The skewness is a measure of
asymmetry and the kurtosis is a measure of the heaviness of the tails of
The skewness /Bi and the kurtosis 62 of a distribution
a distribution.
are defined as
(2.123)
and
62 =
,
where yj, U3, and
(2.124)
are the central moments defined as
U2 = E(X - y)^ ,
(2.125)
U3 = E(X - u)^ ,
(2.126)
^4 = E(X - u)"* .
(2.127)
and
An asymptotically unbiased estimate of /gj is given by
n
/bi =
n
I (X
(n-1)(n-2) i=1
_
- X)^ /
,
(2.128)
58
where
1
n
(n-1)
i=1
_
- X): ,
I (X
(2.129)
and
n
X = (I X.)/n .
i=1
(2.130)
The bias of this sample estimate is of the order of 1/n.
An
asymptotically unbiased estimate of 6% is given by
bz =
n(n+1)
:
(n-1)(n-2)(n-3)
n
il (X
i=1
3(n-l)(n-l)
- X)") / s"
+3.
(n-2)(n-3)
(2.131)
The bias of this estimate is also known to be of the order of 1/n.
Table 2.2 contains the skewness and kur-tosis of several different
distributions.
Table 2.2.
Skewness and kurtosiss of certain distributions
Distribution
Skewness
Kurtosis
uniform
0
1.8
normal
0
3
Gumbel
1.14
5.4
exponential
2
9
59
The uniform and normal distributions are symmetrical distributions
whereas the Gumbel and exponential distributions are skewed
distributions.
Flat distributions with short tails like the uniform
distribution have small kurtosis.
The exponential distribution has a
long tail and a large kurtosis value.
Other sample estimates of the skewness and kurtosis are possible.
Common ones are
*
/b, =
1
n
I (X
n i=1
- X)= / s' ,
(2.132)
and
*
ba =
1
n
I (X
n i=1
- X)- / s" .
/
(2.133)
/
*
*
The reasons for using /b^ and b^ instead of /bi and bg are the
asymptotic unbiased property of /bj and bg and the fact that these two
sample estimates can be computed easily using the procedures PROC MEANS
or PROC UNIVARIATE of SAS (SAS Inc., 1982, pp. 497,498).
kurtosis computed in SAS differs from bj by 3.
Note that the
In other words, add 3 to
the kurtosis computed by SAS to get b^.
The skewness and kurtosis statistics can be used as goodness of fit
statistics.
For testing normality, the normal probability model will be
rejected for large absolute values of skewness since this is an
indication of asymmetry.
A kurtosis value too far from 3 will either
indicate a distribution with tails shorter or longer than that of the
normal distribution, and hence the null hypothesis of normal probability
50
model will be rejected.
The skewness and kurtosis tests can also be
performed for other null probability models, bearing in mind that the
critical values for the rejection of the null hypothesis will differ
among distributions.
D'Agostino and Pearson (1973) provided charts of curves through
smooth Monte Carlo percentiles of bj, from a normal distribution.
An
approximation to the distribution of bg was obtained by Anscombe and
Glynn (1983).
Work on approximating the distribution of /b^ can be
found in Bowman and Shenton (1973) and D'Agostino and Tietjen (1973).
The skewness and kurtosis statistics are tailored for different
classes of alternative distributions.
The kurtosis test is generally
more powerful than the skewness test for detecting symmetrical
distributions with longer or heavier tails than the normal probability
model.
On the other hand, the skewness test will perform better for
skewed distributions with kurtosis near that of the null probability
model.
Pearson et al. (1977) introduced a test based on the joint use
of the skewness and kurtosis statistics.
for the data to fit in.
This test specifies a frame
An extreme deviation of the skewness or
kurtosis values from those for the null probability model will lead to
the rejection of the null hypothesis.
This test was referred to as the
rectangle test by Pearson et al. (1977).
Consider the case of testing a null hypothesis F^, let /bi(L) and
/bi(U) be the lower and upper 1006% points of /bj and let b2(L) and
b2(U) be the lower and upper 1006% points of bg.
The four points
(/b,(L),b2(L)), (/bi(L),b2(U)), (/bj(U),b2(L)) and (/b^(U).b^(U))
61
define a rectangle as shown in Figure 2.1
(/bi(L),b2(U))
(/bi(U),b2(U))
(/bi(L).bzCL))
(/bi(U),b2(L))
b,(U)
b,(L)
/bi(L)
Figure 2.1.
/bi(U)
/b.
Rectangle defined by critical values of a rectangle test
If /bj and bg are independent, then the probability of a point
falling outside this rectangle will be a.
The a and g values are
related by
a = 4 (6 - g^) ,
(2.134)
or
=[1 -/(I - a)]/2 .
Table 2.3
shows the different g values needed to achieve various a
levels for the rectangle test.
62
Table 2.3.
Relationship between a and g
a
g
0.100
0.075
0.050
0.025
0.010
0.005
0.001
0.025658
0.019115
0.012660
0.006290
0.002506
0.001256
0.000250
Note that /bi and bg are usually not independent, and a percentage
of points smaller than a will fall outside the rectangle, yielding a
conservative test.
To obtain the Monte Carlo percentiles corresponding
to the specified a level, the following algorithm implementing a
bisection method was designed.
1.
[1]
Algorithm to obtain Monte Carlo percentiles of the rectangle test
corresponding to the specified g level
Generate N sets of random samples from the null distribution
compute the skewness and kurtosis values, i.e., (/bi^.bj^),
(
/
b
•
[2] Create a sorted array of skewness values /bi^^^, /bi^^)»**»»
and a sorted array of kurtosis values bz^^^, bz^g^,
...,
C3]
Construct the first rectangle test:
(a) Obtain the lower and upper percentiles
and
/bjj-n_ng-j for the /bj component, where g corresponds to the
and
63
specified a level as shown in Table 2.
(b) Obtain the lower and upper percentiles
and
for the bj component.
(c) The first rectangle is defined by
^^^[OUTupper]' ^^[OUTlower]
[ouTlower]'
^^[OUTupper]
OUTlower = [ng] and OUTupper = [n-ng].
Note: The fraction of (/bi^.bz^) points falling outside this
rectangle will be less than a.
*
[4]
Compute
for the rectangle defined by »^t)iouTlower'
'^'^lOUTupper' ^'OUTlower
^'OUTupper'
the
fraction of (/bi^,b2^) points falling outside the rectangle.
*
[5] Select a suitable positive integer j (such that the a
achieved
by the new rectangle will be greater than a) to contruct a new
rectangle defined by /b.^Niower' '''='I»upper'''=imo»er
"'raupper "here
INlower = ([ng]+j),
and
INupper = ([n-ng]-j).
This rectangle is smaller than the previous one.
*
[6]
Compute
for the rectangle defined by /biiNlower'
/b'lNupper' ^ziNlower
bziNupper' b? computing the fraction
of (/bi^.bz^) points falling outside the rectangle.
[7]
Compute Cj/2].
If [j/2] = 0, stop
If [j/2] > 0, go to [8].
64
[8] Let
MIDlower = (OUTlower + [j/2]) ,
and
MIDupper = (OUTupper - [j/2]).
*
[9J
Compute
Type I error achieved by the rectangle defined
/biMiDlower' "^^^MIDupper' ^^MIDlower
[10] If
(a -
X
(a
-
< 0, then
OUTlower = MIDlower,
OUTupper = MIDupper,
*
*
"out " "mid
and
j = INlower - MIDlower. Go to [?].
else,
INlower = MIDlower,
INupper = MIDupper,
*
*
"in " "mid
and
j = MIDlower - OUTlower.
Go to [7].
^^MIDupper'
III.
PROBABILITY PLOTS AND DISTRIBUTION CURVES
The probability plot is a common qualitative tool used widely by
statisticians and engineers.
The scatter plot, which includes the
probability plot, is considered to be one of the "magnificent seven" in
statistical quality control.
"Probabably the single most powerful tool
with which the results of an experiment can be studied is a collection
of plots of raw and transformed data" (Gerson, 1975).
Probability plots
provide a qualitative estimate of the goodness of fit of a probability
model to a data set.
One important application is assessing the
goodness of fit of a normal probability model to the residuals from a
fitted model of some experimental data.
There are two main types of
probability plots, namely the P-P (percent versus percent) and the Q-Q
(quantile versus quantile) probability plots.
Wilk and Gnanadesikan
(1968) and Gerson (1975) have comprehensive reviews of P-P and Q-Q
probability plots and some variants.
The Q-Q probability plot seems to
enjoy a greater popularity than the P-P probability plot.
This can be
largely attributed to the linear invariance property possessed by the
Q-Q probability plot.
The linear invariance property guarantees that,
if a linear transformation is performed on the observations, the
resulting Q-Q probability plot would still be linear but with a change
in slope and intercept.
This chapter reviews the construction of P-P
and Q-Q probability plots, the choice of plotting positions and a
comparison between the Q-Q and P-P probability plots.
A new technique
based on P-P probability plots for assessing the goodness of fit of
nonhypothesized probability models to a data set is developed.
This
66
technique is not limited to location-scale distributions.
Finally, a
computer implementation of this technique is proposed.
Let
be an ordered random sample of size n from a
location-scale distribution with distribution function F^C(x-a)/B],
where a and g are the location and scale parameters, respectively
1.
Construction of an "F" Q-Q probability plot
—» 1
Plot
against a^ where a_ = F
(p^) and p^ is the plotting
position.
2.
Construction of an "F" P-P probability plot
Plot
against p^ where
= F[(x^ - a)/B].
If a and g are unknown, they are replaced by the corresponding
maximum likelihood estimators a and g.
If F is the normal distribution
function, then the resulting probability plot is known as a normal
probability plot.
Similarly, if F is the exponential distribution
function, the resulting probability plot is called an exponential
probability plot.
Different choices of plotting positions are available.
Table 3.1
contains a list of different plotting positions for the P-P probability
plots.
Plotting positions for the Q-Q probability plots are obtained by
evaluating the inverse distribution function at these plotting
positions, that is, F (p^).
57
The plotting positions are similar for large sample sizes, but
there are differences among these sets of plotting positions, especially
at the extremes when the sample size is small.
The plotting position
i/n is known to hydrologists as the California Method (California
Department, 1923).
This has generally been discarded because it was not
possible to plot the largest or the smallest observation on the Q-Q
probability plot, but this problem does not occur for the P-P
probability plots.
The plotting positions (i-0.5)/n and i/(n+l) are the
most often sited in the journals, with the former being more popular.
However, there has been an increasing acceptance of Blom's plotting
position (i-0.375)/(n+0.25) in recent years.
Blom (1958) proposed the
— "j
formula $ [(i-c)/(n-2c+1 )] as an approximation of the expectation of
the normal order statistics and recommended the compromise value c =
0.375.
Harter (1961) provided a formula for c as a function of i and n,
improving the overall accuracy of the approximation to about 0.002 for n
S 400.
The crude normal Q-Q probability plot produced by the SAS
UNIVARIATE procedure is based on this plotting position.
The plotting
•position i/(n+1) has the feature that E{F[(X^- a)/g]} = i/(n+l) since
FC(X^-a)/B] has a beta(i, n-i+1) distribution.
Kimball (I960) has a
detailed discussion on the choice of plotting positions.
Looney and
Gulledge (1985) investigated empirically the power of the
Shapiro-Francia statistic using different plotting positions.
The power
of the Shapiro-Francia statistic remains approximately the same for
different plotting positions.
68
Table 3.1.
Plotting positions for the P-P probability plots
Plotting position, p^
(i-0.5)/n
i/n
i/(n+1)
(i-0.3)/(n-0.4)
(i-0.375)/(n+0.25)
(3i-1)/(3n+1)
(i-0.44)/(n+0.12)
(i-0.3175)/(n+0.365)
(i-0.33)/(n+0.33)
(i-0.4)/(n+0.2)
(i-0.567)/(n-0.134)
References
Hazen (1914)
California State Department (1923)
Weibull (1939)
Benard and Bos-Lavenbach (1953)
Blom (1958)
Tukey (1962)
Gringorten (1963)
Filliben (1975)
Biomedical (1979)
Larsen, Curran and Hunt (1980)
Larsen, Curran and Hunt (1980)
The construction of normal P-P and Q-Q probability plots is
illustrated with a data set taken from Snedecor and Cochran (1980, p.
94).
The data set consists of gains in weight of female rats under a
high protein diet.
The location and scale parameters are estimated by
the sample mean and standard deviation a = 120 and g = 21.39.
The ordered observations are listed in Table 3.2, along with the
plotting positions {i/(n+1)}.
In order to construct a normal Q-Q
—1
probability plot, the inverse normal distribution function, $ (•) must
be evaluated at the plotting positions.
To facilitate the construction
of probability plots, special probability graph papers are available.
-1
These graph papers have a scale based on the values of F
(i/(n+1)) but
-1
labelled with an i/(n+1) scale, so the point (x\, F
Ci/(n+1)]), can be
plotted by knowing the value of the point (x%, i/(n+1)).
A plot of the
ordered observations against the inverse normal distribution function of
the plotting positions would yield a normal Q-Q probability plot, as
69
shown in Figure 3.2.
To contruct a normal probability P-P plot, the
/V
^
_1
observations are standardized using a and g, and $ («) is evaluated for
the standardized observations.
columns of Table 3.2.
These values are listed in the last two
A plot of $C(X^-ot)/g] against i/(n+1) yields the
normal P-P probability plot as shown in Figure 3.1.
Table 3.2.
i
1
2
3
4
5
6
7
8
9
10
n
12
Gains in weight of female rats under a high protein diet and
plotting positions for the normal P-P and Q-Q plots
i/(n+l)
83
97
104
107
113
119
123
124
129
134
146
161
0.077
0.154
0.231
0.308
0.385
0:462
0.538
0.61 5
0.692
0.769
0.846
0.923
$ ^[i/(n+1)] (x^-a)/3
-1.426
-1 .020
-0.736
-0.502
-0.293
-0.097
0.097
0.293
0.502
0.736
1 .020
1 .426
-5.993
-3.725
-2.591
-2.106
-1.134
-0.162
0.486
0.648
1.458
2.268
4.211
6.641
$[(x^-a)/g]
0.0000
0.0001
0.0048
0.0176
0.1285
0.4357
0.5869
0.7415
0.9275
0.9883
0.9999
1.0000
F[(X. - a)/B] = i/(n+1),
(3.1)
X. = g F"T[i/(n+1)] + a.
(3.2)
and
These approximations provide a heuristic explanation for the
linearity of the normal P-P and Q-Q probability plots respectively.
The
slope and the intercept on the vertical axis of a Q-Q probability plot
70
O
O00
6-
0 o-
s S-
M
6-
o-
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Uniform probabilities
Figure 3.1. Normal P—P probability plot
0.9
1.0
Observed percentiles
83.0 90.8 98.6 106.4 114.2 122.0 129.8 137.6 145.4 153.2 161.0
J
1
I
I
I
L
OQ
c
m
CO
k)
?
B
JD
I
3
W
S-.
I
"O
I—•
o
P CO
72
provide graphical estimates of the scale and location parameters, as is
obvious from (3.2).
Chernoff and Lieberman (1954), and Barnett (1975,
1976) discussed the problem of obtaining efficient and unbiased
estimators of the location and scale parameters using probability
plotting methods.
Figures 3.1 and 3.2 show that the P-P and Q-Q probability plots are
similar.
A common feature of the normal Q-Q probability plot is that
points near the middle of the plot usually have the smallest variance.
The opposite is true for P-P probability plots when
the form of F^.
= F regardless of
Michael (1983) considered the use of certain
transformations involving the arsin function, on the plotting positions
and {F[(X^ - a)/g]} to achieve uniform variance of the points on a
probability plot from one end to the other.
An appealing property of the Q-Q probability plot is that if y^ is
a linear transformation of x^, then the resulting Q-Q probability plot
will remain linear but with possibly changed slope and intercept.
This
linear invariance property has made Q-Q probability plots valuable and
very popular.
One geometric configuration that humans can perceive most
easily is linearity.
The general P-P probability plots discussed in Wilk and
Gnanadesikan (1968) do not necessarily possess the linear invariance
property.
However, as long as the observations are properly
standardized, the P-P probability plot can be shown to be linear
invariant.
If fact, a P-P probability plot of the original observations
and a P-P probability plot of a linear transformation of the original
73
observations are identicaly the same.
This can be proved using the
linear invariance property of maximum likelihood estimation.
A theorem
stating the property of the maximum likelihood estimators is presented
without proof.
The proof of this theorem can be found in Mood, Graybill
and Boes (1974).
Theorem 3.1 (Mood, Graybill and Boes, 1974, Ch. VII, p. 285)
Let 0 = (8^,...,8^), where 8^
likelihood estimator of
6 .
J
(X^,...,X^) is a maximum
in the density f(•;e
'
8 .
K
).
If t(0) =
((0),...,T^(0)) for 1 g r g k is a transformation of the parameter
s p a c e 0, then a maximum likelihood estimator of t(8) = (i^(0),...,t^(e))
is t(8) =
(0),...,t^(0)).
Theorem 3.2 (Invariance property of the P-P plot or the
statistic)
The P-P plot or the k^ statistic is linear invariant if the
location and scale parameters are estimated using maximum likelihood
estimators.
Pooof
Let F[(X-a)/g] be the distribution function of a location-scale
distribution with location parameter a and scale parameter g.
Let X = (X^.X^,...,X^) be an ordered random sample from the
standard distribution with location parameter 0 and scale parameter 1.
Let
= b X^ + a be any linear transformation of the X^.
The
distribution function for the transformed random variable is FC(X-a)/b].
74
It is sufficient to show that
X. - a
since the
Y. ~ a
. y
,
(3.3)
statistic or the points on a-P-P probability plot depend on
X through the transformed observations only.
Note that
and
are
the maximum likelihood estimators of the location parameter 0 and scale
parameter 1 and
and
are the maximum likelihood estimators of the
location parameter a and the scale parameter b, respectively.
By Theorem 3.1, the maximum likelihood estimators of a and b are
a = b a + a ,
y
X
(3.4)
and
By - b 'x Hence,
^i
"y
b X_ + a
- ( b
+ a)
^x
*1 - "x
3.
Problems associated with Q-Q probability plots
Mage (1982) in his paper entitled "An Objective Graphical Method
for Testing Normal Distributional Assumptions Using Probability Plots"
75
provided a good review on the problem of drawing "the" best straight
line on a Q-Q probability plot.
If one resorts to the use of a machine,
then a straight line can be drawn on the graph objectively using the
methods of least-squares, weighted least-squares, moments or maximum
likelihood.
If one uses the hand to draw a straight line on the Q-Q
probability plot, a straight line is drawn subjectively.
Some of the methods suggested in the literature for drawing a
straight line on a Q-Q probability plot are as follows:
[1]
Gumbel (1964): "After the observations have been plotted, the
straight line may be drawn by a ruler, provided that the scatter of
the observations is sufficiently small.
The question of acceptance
or rejection of the probability function may be settled by mere
inspection."
[2] Hahn and Shapiro (1967): "If a straight line appears to fit the
data, draw such a line on the graph 'by eye'."
[3] Ferrell (1958), described by King (1971): "First, make a good
'eye-ball fit', using the straightedge.
near the smallest plotted point.
Then place a pencil point
Pivot the straightedge around the
pencil point until the points in the upper half (P>0.5) of the plot
are divided into two equal parts.
(Equal numbers of points above
and below the upper half of the line.) This is readily done by
counting.
Next, shift the pencil point up near the largest plotted
point on the new trial line and divide the points in the lower half
(P<0.5) of the plot into two equal parts.
Two or three such
76
points into an upper half and a lower half with respect to the
straightedge."
Motivated by the need of a method of drawing an objective straight
line on a Q-Q probability plot, Mage (1982) suggested a set of 10 rules
for drawing a such a line.
The idea behind the 10 rules is to draw a
straight line to minimize the Kolmogorov-Smirnov statistic.
The
uncertainty and subjectivity of the drawing of a straight line is one of
the drawbacks of Q-Q probability plots.
For the P-P probability plot, there is no confusion at all.
The
unique best-fit straight line is the diagonal line joining the points
(0,0) and (1,1).
Another advantageous feature of the P-P probability
plot is that the x-coordinate values depend only on the sample size and
not upon the hypothesized distribution.
In addition, the points always
fall within the unit square and are not bunched as closely together in
various regions as with Q-Q probability plots for certain distribution.
An example is the exponential Q-Q probability plot for which the points
are usually bunched together at the left end of the exponential Q-Q
probability plot.
Furthermore, the variation of the points about the
line is relatively small for the left end of the plot since the variance
of the i^^ ordered exponential random variable is given by
i
Var(X.) = I [1/(n-k+1):],
k=1
(3.5)
and the variance of the Y-coordinate values on an exponential Q-Q
probability plot increases steadily from the lower end to the upper end.
77
On the contrary, the variance of the i^^ uniform ordered random variable
is
i(n-i+1)
Var(X ) =
,
(n+1):(n+2)
and the quantiles are evenly spaced.
(3.5)
So, the points will spread out
within an oval shape band enclosing the diagonal.
4.
Distribution curves on a P-P probability plot
The random variable F[(X^ - a)/g] is the i^^ ordered uniform random
variable and hence E{F[(X^ - a)/g]} = i/(n+1).
should be close to i/(n+1).
Thus, F[(x^ - a)/6]
If X is a random sample from a distribution
with distribution function F('), then the points {(F[(X^ - a)/6],
i/(n+1))} of the resulting "F" P-P probability plot will fall roughly
along the diagonal joining the points (0,0) and (1,1).
However, if the
sample is from a non-"F" distribution, then the points of the "F" P-P
probability plot may fall along some curve on the "F" P-P probability
plot.
Figure 3.3 shows a normal P-P probability plot of a random sample
of size twenty from the uniform (0,1) distribution generated using the
generator RANUNI(9882017) (SAS Inc., 1982, p. 195).
Note that the
points fall along a curve in Figure 3.3.
There are specific curves corresponding to various non-"F"
distributions for a particular "F" P-P probability plot.
will be called "distribution curves".
These curves
These curves can be used just
like the diagonal line as a measure of fit of a probability model to a
78
o
O00
d-
O"
o-
o-
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Uniform probabilities
Figure 3.3. Normal P—P probability plot
0.9
1.0
79
set of data that is plotted on a P-P probability plot.
The obvious
problem is to produce these distribution curves for a particular "F" P-P
probability plot.
The following method is presented for obtaining the
distribution curve for a random variable with distribution function F^ ^
F.
5.
Constructing distribution curves for "F" P-P probability plot
-1
[1] For each p^ = i/(n+1) compute F^ (pr), for i=1,2,,..,n.
[2] Compute the maximum likelihood estimates of a and g based on the
—1
F^ (p^) values.
[3] Standardized the observations by computing
- a
[4] Compute the "F" probability
[5]
Plot y^ against p^.
[5] Join the points to get a smooth curve (using a good graphics
package).
For uniform P-P probability plot, a = min x^ and 6 = max x^ - min
x..
The smooth curve obtained is called the "F^" distribution curve on
an "F" P-P probability plot.
Note that steps 2 through 5 are exactly
what is required for constructing an "F" P-P plot.
Distribution curves
80
for the normal, exponential, Gumbel and uniform P-P plots are displayed
in Figures 3.4 - 3.27.
Figures 3.28, 3.29 and 3-30 are normal P-P
probability plots of random samples of size 100 generated using
RANCAU(9874127), RANEXP(2572191) AND RANNOR(7250493) (SAS Inc., 1982, p.
195), respectively.
The same random sample from the normal distribution
is displayed on an exponential P-P probability plot in Figure 3.31.
Program codes for constructing normal P-P probability plot, in SAS and
DISSPLA languages can be found in Appendix C.
A computer implementation of the technique of using distribution
curves is now presented.
The main idea is to find the best match
between the plotted points and an "F" distribution curve.
The "F" P-P
plot is then constructed for additional support (using the diagonal
line) of the chosen probability model.
The matching procedure can be
automated by matching curves and plotted points using certain criteria
like least squares.
6.
Computer implementation
[1] Input the observations
[2] Select plotting position otherwise use the default.
[3] Select the type of P-P probability plot wanted: normal,
exponential, uniform etc.
[4] Plot the points on the screen.
[5]
Good match with the diagonal?
Yes, stop.
[6] Select an alternative distribution.
Any alternative distribution left?
[7]
Good match?
No, stop.
Yes, go to [31 or stop.
81
The graphical and the quantitative methods ought to complement each
other.
A probability plot often imparts a greater impression of the
nature of the data than a number. Shapiro and Wilk (1965) stated that
"The formal use of the (one-dimensional) test statistic as a
methodological tool in evaluating the normality of a sample is
visualized by the authors as a supplement to the normal probability plot
and not as a substitute for it."
One solution is to incorporate a test
statistic into a P-P or Q-Q probability plot using a simultaneous
confidence band.
Quesenberry and Hales, (1980) suggested using the fact
that F[(X^-a)/B] is a beta(i,n-i+1) random variable to construct (1-Y)
confidence intervals (L.
, U.
1 , T
1>
T
) for the Y-coordinate F[(X. - a)/6].
1
The end points of these confidence intervals are joined together to form
a "concentration band".
The main disadvantage of a concentration band
is that the (1-Y) concentration band does not give a (1-T) simultaneous
confidence set for the entire probability plot.
The probability that
all the points of a sample will fall inside the concentration band will
be less than (1-Y).
Stirling (1982) showed how to construct a
simultaneous confidence band on the P-P or Q-Q probability plots,
corresponding to the Kolmogorov-Smirnov test statistic but this type of
band appears to be rather conservative.
Probability plots provide a
qualitative method of assessing the goodness of fit of a probability
model to a data set, and insights into any apparent lack of fit of a
proposed probability model.
The use of probability plots, accompanied
by some quantitative tests like the r^ and
statistics often provide
an excellent tool for assessing the fit of a probability model.
82
o
O-
O-
1—4 CO
Z o-
O-
O-
O-
o-
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Uniform probabilities
exponential
Laplace
Cauchy
Gumbel
Figure 3.4. Normal P—P probability plot
0.9
1.0
83
o
o-
O"
0 O-
ja O//
O-
oi
doo
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.6
Uniform probabilities
gamma(0,l,2)
logistic
gamma(0,l,5)
gamma(0,l,10)
Figure 3.5. Normal P—P probability plot
0.9
1.0
Normal probabilities
0.3
p
5
b
C!
M
o-
oq
cd
0Û
o.
to
P
?
CÛ
g
0
D)
I
o.
cr
(D
(+
p
cr
(T>
r*-
D)
'CO
M
6
&
o
cr
%
cr
P
'to'
B
*T3
1
o.
bi
O
%
:
( - ' •
fD O.
M ^
4"
U
o
r*-
o.
bo
o.
CO
-J
0.4
0.5
0.6
0.7
0.8
0.9
1
1
1
1
I
I
1.0
Normal probabilities
0.3
P
5
b
OQ
0
M
CD
Cû
o_
o.
to
!z:
0
g s -
B
0)
o*
•-1 o.
I—»
TJ
1
•n
M
O
cr
o"
M
U
o
a* cr
(D
ÎB- r*p
P
3 'o
ai bi
CO
cr
a>
r+
P
cr
CD
r-iP
M o_
O cji
cr
Is
(D o .
Ul kl
o.
CD
o.
Ô3
-J
0.4
1
0.5
1
0.6
L_
0.7
I
0.8
I
0.9
I
1.0
Normal probabilities
0.0
0.1
J—
0.2
1
0.3
1
0.4
I
0.5
I
0.6
L
0.7
I
0.8
I
0.9
I
orq
pi
3
W
00
?
s
1
u
1
S'
4"
U
o'
Î5.
P)
o
en
m
r*OJ
o"
bi
cr
D)
1-»
o
M
0
y
I—»•
o*
•-J
cr
m
c*D)
3
bi
o
.en.
o
oo
Normal probabilities
0.2
O
3
-I
0.3
1
0.4
1
0.5
1
0.6
1
0.7
I
0.8
I
0.9
1.0
I
b
ffQ
cM
o_
P
o.
k)
(D
CO
g
?
g
O
M o.
P
I—'
Ti
I
U
o
D'
2-
^.
cr cr cr cr
(D
rf
P
'Cû
.W
(D
D)
P
13 "co
CO
CO
(b
P
M o.
bi
&
W.
Is
O"
%
Ti
t—'
O
r*-
m o.
M kl
o.
QD
O.
CO
00
-s
88
o
O"
O-
oo
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Uniform probabilities
exponential
normal
Laplace
logistic
Figure 3.10. Gumbel P-P probability plot
0.9
1.0
Gumbel probabilities
0.
o
5
b
orq
pi
M
O-
m
Cû
o.
M
g
£.
I
T)
M
0
cr
1
|:
cr cr
(D
p
p
(D
0)
'ro'
o
u'
fu
cr
fD
r+P
M OO O»
cr
lw»§
h-»»
CD P-
W -^î
4"
O.
I—'
o
CD
O.
CÛ
O
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
Gumbel probabilities
0.
O"
b
QTQ
C
y
CD
Cû
o-
î\3
o.
M
g
g
I
O*
M o.
g.
•n
I
Tl
M
O
cr"
&
>->•
I—'
hd
I—'
O
2-
cr cr cr cr
CD
CD
CD
CD
(+
f+
P) r*P)
P3
'o 'o' 'h^
CJI CJI 3
CA
M
9
M o_
O oi
cr
& p
M: O)
CD o.
w kl
o.
ôo
o.
ô)
b
0.3
- 1
0.4
1
0.5
0.6
0.7
0.8
0.9
1.0
1
1
I
'
l
l
91
o
O)
d"
00
d-
O O-
C\î
oo
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Uniform probabilities
beta(0.5,0.5)
beta(l,l) or uniform
beta(0.5,l)
beta(l,0.5)
Figure 3.13. Gumbel P—P probability plot
0.9
1.0
92
o
CO
o o-
^ o-
O-
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Uniform probabilities
beta(2,2)
beta(3,3)
beta(2,3)
beta(3,2)
Figure 3.14. Gumbel P-P probability plot
0.9
1.0
93
o
o-
O"
5 2•—H
_ O-
go
CV2
d
o
d
0.0
0.1
0.2
0,3
0.4
0.5
0.6
0.7
0.8
0.9
Uniform probabilities
Laplace
logistic
normal
Gumbel
Figure 3.15. Exponential P-P probability plot
1.0
94
o
O"
co
docO ®
mO o-
O-
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Uniform probabilities
gamma(0,l,10)
gamma(0,l,4)
gamma(0,l,3)
gamma(0,l,2)
Figure 3.16. Exponential P-P probability plot
1.0
Exponential probabilities
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
O
5
b
?
o-J
3
CO
o.
K)
g 20
p
O
CD
B
a
I—•
cr
(T>
f+
1
•n
A
CO
o
M
0
1
I
u
o
cr
m
r+p
cr
m
f+
P
'co
of
o
cr
p.
Ip.
o.
bi
o
I
'
(D pM ->I
O.
CD
O.
b
b
VO
Exponential probabilities
0.0
5
0.1
0.2
J
0.3
1
0.4
1
0.5
1
0.6
I
0.7
I
0.8
I
0.9
I
acj
0
m
CO
00
M
X
X)
0
P
(D
5;
P)
1 •
•n
I
M
o
cr
ë-
I—•
o
cr
cr cr cr
n>
CD (D
(D
r+
r*(+
(-+-
P
33
cn bi
3
P
t—^
CÛ
ro
O 01
kO
Exponential probabilities
0.0
O
5
b
OQ
c
M
CD
o-
W
Η>•
CO
o.
to
M
%
Xi
O
m
(D
gM"
P
I—'
TJ
I
T)
M
O
O"
P
cr
g SO
M o.
cr
cr cr
a>
(D
CD
(-4r*P P p
'P^ 3
o oi
3 1-^ o
0
n
o*
cr
(D
(-4P
3
CJl
o
3
3
M o.
en
g-
Is
m o.
M kl
o.
CD
%
•m
o
c-t-
o_
CO
b
0.2
J
0.3
0.4
0.5
0.6
0.7
0.8
1
1
1
I
I
I
0.9
1.
Exponential probabilities
0.
O"
3?
0.1
0.2
L
0.3
1
0.4
1
0.5
1
0.6
L_
0.7
0.8
L
b
OQ
d
o-
%
CO
M
o.
to
M
g S-
O
0
O*
CD
0
r*M»
p)
h—»
1
%
3
cr
&
cr cr cr cr
CD
CD
CD
ÇD
(+
(-H
p p fu
P
'co
"co
3^ 3 3
B S"
p-
o OI
cr"
& P-
11: oi
CD' P-
M -O
O.
•D
I—'
o
O.
b
Co
99
o
O"
oo
dOT ^
0 o~
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Uniform probabilities
Cauchy
Laplace
logistic
normal
Figure 3.21. Uniform P—P probability plot
1.0
100
/
//
/
/
/
/
/1
.•
/
/
/
/
Uniform probabilities
symmetric triangle
Gumbel
exponential
uniform
Figure 3.22. Uniform P—P probability plot
I
101
o
O-
OOT P0) 6-
O-
o-
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Uniform probabilities
gamma(0,1.10)
gamma(0,l,4)
gamma(0,l,3)
gamma(0,l,2)
Figure 3.23. Uniform P—P probability plot
1.0
Uniform probabilities
0.0
p
b
5
(m
C
M
(û
Où
O-
ro
o.
M
g
gs
O
B
nj
I
O
cr
g-
VJ
O
r*"
cr CJ* cr
(D
fD
fD
r*- r+- f+
P)
P)
'Cû 'Fo' CO
O O
3 ê
CM
(D
r+P)
J—^
B
o.
4^
M o.
O bi
cr
Is
(D O .
M
O.
ba
O.
io
o
0.1
0
0.3
—I
0.4
0.5
1
1
0.6
1
0.7
0.8
1
I
0.9
I
1.0
Uniform probabilities
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
5
(TQ
c
•I
m
Oû
bo
ai
g
o*
M
B
•n
I
TJ
•n
M
o
cr
D)
cr
M*'
H—»
o
cr
(D
r+
P
3
ai
.CJ,
or
fb
p)
'o
ai
cr cr
(D
m (+
D) p
'P'
&
o
OJ
Uniform probabilities
0.3
p
1
0.4
1
0.5
1
0.6
1
0.7
1
0.8
I
0.9
I
b
'S
o-
3
CO
ho
o.
to
pi
g
gs
o*
o
M o.
"1
cr
I
3O"
(D
f-c
p
o
.bi.
cr cr cr
(D
(D
c-i-
P
fD
p
Ol
Ol
o
cr
X
o'
M
o
P
'o'
'3
M
0
y
t-"'
P)
r+-
O
Ol.
M o.
O cn
cr
Is
CD O.
M
o.
00
o.
CO
\\ \
1.0
105
o
m
6
w
V o
IS
O W
b 6-
«
d
CM
o-
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Uniform probabilities
beta(2,2)
beta(3,3)
beta(2,3)
beta(3,2)
Figure 3.27. Uniform P—P probability plot
1.0
106
O"
00
O"
O"
Oo
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Uniform probabilities
exponential
Laplace
Cauchy
Gumbel
Figure 3.28. Normal P-P probability plot
0.9
1.0
107
o
o00
d"
0 O-
O-
o-
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Uniform probabilities
—
exponential
Laplace
Cauchy
Gumbel
Figure 3.29. Normal P-P probability plot
0.9
1.0
108
o
O-
O"
0) O "
mO O"
O"
O"
o-
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Uniform probabilities
exponential
Laplace
Cauchy
Gumbel
Figure 3.30. Normal P-P probability plot
0.9
1.0
109
o
O-
O-
O-
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Uniform probabilities
normal
Laplace
Cauchy
Gumbel
Figure 3.31. Exponential P—P probability plot
1.0
110
IV.
EMPIRICAL POWER COMPARISON
A.
Methods of Computation
This section describes concisely the computing methods behind the
Monte Carlo power comparison study.
The power study actually consists
of three separate power studies: normal, Gumbel and exponential power
comparisons.
The null composite hypothesis is
F = Fg[(X -
where
a)/g]
,
(4.1)
is either the normal, Gumbel or the exponential cumulative
distribution function.
The unknown parameters, a and g, are the
location and scale parameters, respectively.
1.
Statistics used in the power comparison study
[1] Correlation type statistics: k^, r^ and W.
[2] Pearson chi-square and likelihood ratio statistics: XVj, XI,
X3, X5, X7, XI0, XI3, X17, GVj, G1, G3, G5, G7, GIG, G13 and
017.
[3] Statistics based on the empirical distribution function:
A\ WS U\ V and D.
[4] Statistics based on moments: /bj, bg and R.
Note that Xm or Gm refers to a chi-square or likelihood ratio
statistic with expected cell count equals to m.
X7 and G7 for sample
size 20, XI0 and GIG for sample size 50 and X17 and G17 for sample size
100 are based on a recommendation by Mann and Wald (1942).
The Mann and
Wald formulation was based on the equiprobability case for the simple
null hypothesis.
The W test was used for the normal power study only.
Ill
The neccessary coefficients for the W test were tabulated for sample
sizes up to 50 only, so it is not used in the normal power study for
sample size 100.
2.
Percentiles used in the power comparison study
Except for the Anderson-Darling statistic, the percentiles for the
statistics based on the empirical distribution function for the normal
power study were obtained from Stephens (197^, 1976).
Percentiles for
all the other statistics were generated using 15000 Monte Carlo samples.
As a check of the accuracy of the percentiles used in the power study,
two sets of 5000 random samples of were generated for each statistic and
the empirical Type I errors were computed.
Table 4.1 contains the
empirical Type I errors of the statistics for the test of normality.
The empirical Type I error levels were reasonably close to the the
specified Type I error levels for all the statistics but the empirical
Type I error levels of the chi-square and likelihood statistics showed
slightly more fluctuation.
This is due to the discreteness of the
chi-square or likelihood ratio statistics.
Since the percentiles for
the Anderson-Darling statistic provided by Stephens (1975) for the test
of normality consistently showed inflated Type I error levels, new Monte
Carlo percentiles for the Anderson-Darling statistic were generated
using 15000 samples.
The Anderson-Darling statistic using these
percentiles is denoted by A^ and the one using the percentiles presented
by Stephens (1974, 1976) is denoted by B^.
112
Table 4.1.
Empirical Type I error levels of the statistics based on
two sets of 5000 random samples for the testing of departure
from the normal distribution (sample size = 20)
Set 1
Set 2
Level of significance
Level of significance
Statistics
0.1
0.05
0.01
0. 1
0.05
0.01
XV,
XI
X3
X5
X7
.077
.094
.093
.100
.078
.039
.037
.047
.032
.031
.009
.009
.007
.007
.005
.070
.090
.093
.100
.086
.032
.031
.052
.030
.036
.008
.005
.008
.006
.007
GV,
G1
G3
G5
G7
.107
.104
.098
.108
.078
.066
.050
.059
.044
.031
.008
.010
.009
.010
.008
.111
.094
.104
.113
.086
.061
.047
.058
.043
.036
.006
.008
.009
.008
.008
W
.096
.092
.101
.048
.046
.054
.009
.012
.011
.103
.092
.096
.052
.044
.050
.012
.008
.011
/bi
bz
R
.100
.105
.102
.054
.047
.051
.009
.010
.010
.103
.105
.101
.051
.046
.048
.009
.008
.010
D
V
.097
.098
.095
.098
.130
.048
.049
.048
.050
.070
.009
.008
.010
.010
.017
.099
.095
.097
.101
.131
.050
.048
.050
.050
.069
.010
.008
.009
.010
.017
Uz
8=
113
3.
Sample sizes and significant levels used in the power comparison
study
Three significant levels 0.1, 0.05 and 0.01 and three sample sizes
20, 50 and 100 were considered in the power study.
1000, 500 and 250
statistics were generated for each of the alternative distributions, for
sample sizes 20, 50 and 100, respectively.
Obviously, the estimated •
power levels have larger variances for small samples like 250 and 500.
However, with such a wide range of alternative distributions, one can
obtain good estimates of the powers of these statistics by examining
average results for various subsets of the alternatives.
4.
Alternative distributions used in the power comparison study
The alternative distributions used in the power comparison study
consist of a wide range of distributions.
These alternative
distributions include symmetrical distributions like the Laplace and
logistic distributions, skewed distributions like the chi-square and
beta distributions, short tailed distributions like the uniform
distribution, and heavy tailed distributions like the Cauchy
distribution.
Bimodal distributions, location or scale contaminated
normal distributions and location or scale contaminated exponential
distributions were also included.
The formulas for the alternative
distributions used in the study are given in Appendix A.
The skewness
and kurtosis values for the sets of alternative distributions used in
the normal, Gumbel and exponential power comparison study are given in
Tables 4.2 - 4.4.
114
Table 4.2.
Alternative distributions used in the normal power
comparison study
No. Distribution
Bz
No. Distribution
62
N(0,1)+N(10,1)
Beta(0.5,0.5)
N(0,1)+N(5,1)
SB(0,0.5)
N(0,1)+N(4,1 )
0
0
0
0
0
1.15
1 .50
1 .51
1 .63
1 .72
36
37
38
39
40
t(4)
t(2)
t(1)
Cauchy(0,1)
38(0.5333,0.5)
6
7
8
9
10
Tukeyd .5)
Uniform(0,1)
38(0,0.707)
Tukey(0.7)
TruncN(-1 ,1)
0
0
0
0
0
1.75
1 .80
1.87
1.92
1 .94
41
42
43
44
45
TruncN(-2,1 )
Beta(3,2)
Beta(2,1)
TruncN(-3,2)
Weibull(3.6)
11
12
13
14
15
N(0,1)+N(3,1 )
TukeyO)
Beta(2,2)
TruncN(-2,2)
Triangle 1(1 )
0
0
0
0
0
2.04
2.06
2.14
2.36
2.40
46
47
48
49
50
Weibull(4)
SB(1,2)
TruncN(-3,1)
SB(1,1)
Weibull(2.2)
-.09
0.28
-.55
0.73
N(0,1)+N(2,1)
2.50
2.84
2.92
3.53
4.00
51
52
53
54
55
LoConN(0.2,3)
LoConN(0.2,5)
LoConN(0.2,7)
Weibull(2)
Half N(0,1)
0.68
3.09
1.07
1.25
0.63
3.16
0.97
3.20
3.25
3.78
4.20
4.51
56
57
58
59
60
LoConNO.1,3)
LoConN(0.05,3)
Gumbel(0,1)
LoConN(0.1,5)
3U(-1,2)
0.80
0.68
1.14
1 .54
0.87
4.02
4.35
5.40
5.45
5.59
1
2
3
4
5
0
0
0
0
0.65
2.13
-.32
2.27
0.29
2.36
-.57
2.40
2.65
2.72
-.18
0.00
0.51
2.75
2.77
2.78
2.91
3.04
16
17
18
19
20
N(0,1)+N(1,1 )
SU((0,3)
t(10)
0
0
0
0
0
21
22
23
24
25
Logistic(0,1)
SU(0,2)
Tukey(IO)
Laplace(0,1)
ScConN(0.2,3)
0
0
0
0
0
6.00
7.54
25
27
29
30
SGConN(0.05,3)
ScConN(G.1,3)
ScConN(0.2,5)
ScConN(0.2,7)
ScConN(0.1,5)
0
0
0
0
0
8.33
11.2
12.8
16.5
61
62
63
64
65
Chi-Square(4)
LoConN(0.1,7)
LoConN(0.05,5)
Exponent!aid )
LoConN(0.05,7)
1.41
1.96
1.65
2.00
2.42
6.00
6.60
7.44
9.00
10.4
31
32
33
34
35
ScConN(0.05,5)
SoConN(0.1,7)
ScConN(G.05,7)
SU(0,1)
SU(0,0.9)
0
0
0
0
0
20.0
21 .5
31 .4
36.2
82.1
66
67
68
69
70
Chi-square(1)
Triangle 11(1)
Weibull(0.5)
SU(1,1)
LogN(0,1,0)
2.83
0.57
6.62
-5.3
6.18
15.0
16.4
87.7
93.4
114
28
TruncN(-3,3)
5.38
7.65
115
Table 4.3.
Alternative distributions used In the Gumbel power
comparison study
No. Distribution
1 N(0,1)+N(10,1 )
2 Beta(0.5,0.5)
3 N(0,l)+N(5,l)
4 SB(0,0.5)
5 N(0,l)+N(4,1)
No. Distribution
32
0
0
0
0
0
1.15
1 .50
1.51
1 .53
1.72
36
37
38
39
40
SU(0,0.9)
t(4)
t(2)
t(1)
Cauchy(0,1)
Bz
0
0
0
0
0
82.1
5
7
8
9
10
Tukeyd .5)
Uniform(0,1)
SB(0,0.707)
Tukey(0.7)
TruncN(-1,1)
0
0
0
0
0
1.75
1 .80
1.87
1.92
1.94
41
42
43
44
45
58(0.5333,0.5)
TruncN(-2,1)
Beta(3,2)
Beta(2,1)
TruncN(-3,2)
0.65
-.32
0.29
-.57
-.18
2.40
2.65
11
12
13
14
15
N(0,1)+N(3,1)
Tukey(3)
Beta(2,2)
TruncN(-2,2)
Triangle 1(1)
0
0
0
0
0
2.04
2.06
2.14
2.36
2.40
46
47
48
49
50
Weibull(3.6)
Weibull(4)
SB(1,2)
TruncN(-3,1)
SB(1,1)
0.00
-.09
0.28
-.55
0.73
2.75
2.77
2.78
2.91
15
17
18
19
20
N(0,l)+N(2,1)
TruncN(-3,3)
N(0,1)
SU((0,3)
0
0
0
0
0
2.50
2.84
2.92
3.00
3.53
51
52
53
54
55
Weibull(2.2)
LoConN(0.2,3)
LoConN(0.2,5)
LoConN(0.2,7)
Weibull(2)
0.51
0.68
1.07
1.25
0.63
3.16
3.20
3.25
21
22
23
24
25
t(10)
Logistic(0,1)
SU(0,2)
Tukeyd 0)
Laplace(0,1 )
0
0
0
0
0
4.00
4.20
4.51
5.38
6.00
56
57
58
59
60
Half N(0,1)
LoConN(0.1 ,3)
LoConN(0.05,3)
LoConN(0.1,5)
SU(-1,2)
0.97
0.80
0.68
1.54
0.87
3.78
4.02
4.35
5.45
5.59
25
27
30
ScConN(0.2,3)
ScConN(0.05,3)
ScConN(0.1,3)
ScConN(0.2,5)
ScConN(0.2,7)
0
0
0
0
0
7.54
7.65
8.33
11.2
12.8
61
62
63
64
65
Chi-square(4)
LoConN(0.1,7)
LoConN(0.05,5)
Exponential(1)
LoConN(0.05,7)
1,41
1.96
1.65
2.00
2.42
6.60
7.44
9.00
10.4
31
32
33
34
35
ScConNCO.1,5)
ScConN(0.05,5)
ScConN(0.1,7)
ScConN(0.G5,7)
SU(0,1)
0
0
0
0
0
16.5
20.0
21.5
31.4
36.2
66
67
Chi-square(1)
Triangle 11(1)
Weibull(0.5)
SU(1,1)
LogN(0,1,0)
2.83
0.57
6.62
-5.3
28
29
N(0,1)+N(1,1)
.
68
69
70
6.18
2,13
2.27
2.36
2.72
3.04
3:09
6.00
15.0
16.4
87.7
93.4
114
115
Table 4.4.
Alternative distributions used in the exponential power
comparison study
No. Distribution
No. Distribution
/6i
62
N(0,1)+N(10,1)
Beta(0.5,0.5)
N(0,1)+N(5,1)
SB(0,0.5)
N(0,1)+N(4,1)
0
0
0
0
0
1.15
1 .50
1 .51
1.53
1 .72
31
32
33
34
35
ScConN(0.05,5)
ScConN(0.1,7)
ScConN(0.05,7)
SU(0,1)
SU(0,0.9)
Tukeyd .5)
Uniform(0,1)
SB(0,0.707)
Tukey(0.7)
TrunoN(-1,1)
0
0
0
0
0
1 .75
1 .80
1 .87
1 .92
1 .94
36
37
38
39
40
t(4)
t (2)
t(1)
Cauchy(0,1)
SB(0.5333,0.5)
0
0
0
0
0.65
2.13
11 N(0,1)+N(3,1)
1 2 Tukey(3)
13 Beta(2,2)
1 4 TruncN(-2,2)
1 5 Triangle 1(1)
0
0
0
0
0
2.04
2.06
2.14
41
42
43
44
45
TruncN(-2,1 )
Beta(3,2)
Beta(2,1)
TruncN(-3,2)
Weibull(3.6)
-.32
0.29
-.57
-.18
0.00
2.27
2.36
2.40
2.65
2.72
16
17
18
19
20
N(0,1)+N(2,1)
TruncN(-3,3)
N(0,1)
SU((0,3)
t(10)
0
0
0
0
0
.46
47
48
49
50
Weibull(4)
SB(1,2)
TruncN(-3,1)
SB(1,1)
Weibull(2.2)
-.09
0.28
-.55
0.73
0.51
2.75
2.77
2.78
2.91
3.04
21
22
23
24
25
Logistic(0,1 )
SU(0,2)
Tukey(IO)
Laplace(0,1)
ScConN(0.2,3)
0
0
0
0
0
51
52
53
54
55
LoConN(0.2,3)
LoConN(0.2,5)
LoConN(G.2,7)
TruncE(0,3)
Weibull(2)
0.68
1 .07
1.25
0.99
0.63
3.16
3.20
3.22
3.25
26
27
28
29
30
ScConN(0.05,3)
ScConN(0.1 ,3)
ScConN(0.2,5)
ScConN(0.2,7)
ScConN(0.1 ,5)
0
0
0
0
0
56
57
58
59
60
LoConE(0,2,7)
LoConE(0.2,5)
Half N(0,1)
LoConN(0.1,3)
LoConE(0.2,3)
1.33
1 .25
0.97
0.80
1 .20
3.40
3.78
4.02
4.09
1
2
3
4
5
5
7
8
9
10
2.36
2.40
2.50
2.84
3.00
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
1 2.8
16.5
62
0
0
0
0
0
20.0
21 .5
31.4
36.2
82.1
3.09
3.27
117
Table 4.4 (continued)
No. Distribution
/6.
6:
6?
52
63
64
65
TruncE(0,4)
LoConN(0.05,3)
TrunGE(0,5)
GumbeKO,1 )
LoConN(0.1,5)
1 .27
0.68
1 .50
1 .14
1 .54
4.20
4.35
5.26
5.40
5.45
66
67
69
70
SU(-1,2)
LoConE(0.1,3)
Chi-square(4)
LoConE(0.1,5)
TruncE(0,6)
0.87
1 .62
1 .41
1 .88
1.68
5.59
5.86
6.00
5.02
6.29
71
72
73
74
75
LoConN(0.1,7)
LoConE(0.05,3)
LoConN(0.05,5)
LoConE(0.05,7)
ScConE(0.05,2)
1 .96
1,.85
1 .65
2,.75
2,.42
6.60
7.29
7.44
10.9
13.6
76
77
Chi-square( 1 )
ScConECO.I,2)
ScConE(0.2,2)
LoConE(0.01 ,7)
Triangle 11(1)
2,.83
2,.61
2..71
2.,94
0..57
15.0
15.3
15.6
15.9
1 6.4
ScConE(0.01 ,3)
ScConE(0.2,3)
ScConE(0.1,3)
ScConE(0.05,3)
ScConE(0.2,7)
2..59
3..57
3.,81
3.60
4.50
18.0
ScConE(0.1,5)
ScConE(0.1,7)
ScConE(0.01 ,5)
ScConE(0.05,5)
Weibull(0.5)
5.38
6.02
4.81
6.05
6.62
87.7
SU(1,1)
LogN(0,1,0)
-5;.3
6.18
93.4
11 4
68
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
23.8
29.4
29.8
31.5
48.7
56.2
55.7
68.2
118
5.
Random variâtes generators
Hoaglin and Andrews (1975) emphasized the importance of
comprehensive and concise reporting of computing methods.
The two
uniform pseudo random number generators, RANDOM (Wichmann and Hill,
1982a) and DSMCG, used in the power comparison study, will be described
in detail.
To obtain a uniform (0,1) random number using RANDOM, three
integers IX, lY and IZ are generated using three different
multiplicative congruential generators:
IX = MOD (171*IX, 30269) ,
lY = MOD (172*IY, 30307) and
(4.2)
IZ = MOD (170*IZ, 30323).
The uniform (0,1) random number, U is then given by the fractional part
of
U = IX/30269 + IY/30307 + IZ/30323.
(4.3)
The results of tests of uniformity and randomness of the generator
RANDOM can be found in Wichmann and Hill (1982b).
DSMCG (double shuffled multiplicative congruential generator)
employs six multiplicative congruential generators: PICKG, GENT1, GENT2,
GENT3, GENT4 AND PICKR.
DSMCG.
Figure 4.1 is a flowchart of the generator
119
PICKG
IGS7575
7576<IGg15134
15134<IGâ22701
GENT1
GENT2
GENTS
IRS7576
7576<IR<1513^
1513^<IR^22701
ST0RE1
ST0RE2
ST0RE3
I
I
I
IX
IX
IX
Figure 4.1.
Flowchart of DSMCG generator
IG>22701
GENT4
IR>22701
STORED
1
IX
1 20
The multiplicative congruential generators used by DSMCG are
IG = MOD (171*IG, 30269) ,
(4.4)
IR = MOD (172*IR, 30307) and
(4.5)
IX = MOD (170*IX, 30326) .
(4.6)
PICKG uses (4.4), PICKR uses (4.5) and GENT1, GENT2, GENT3 and GENT4 use
(4.5) to generate a uniform integer.
The steps in generating a uniform
(0,1) random number using DSMCG are as follows:
6.
Algorithm for DSMCG
[1] Supply 5 seeds in the range [10000,30000].
[2]
An initialization routine is run so that GENTi (i=1,2,3,4) will
each generate a random number and store it in STOREi.
[3] Generate IG using PICKG.
[4] Select one of the GENTi based on IG (see Figure 4.1) and generate
IX,
[5] Generate IR using PICKR.
[6] Select one of the STOREi based on IR (see Figure 4.1) and deliver
IX from STOREi as IX/30323.
[7] Put IX generated in [4] in STOREi.
[6] Go to [3] for the next IX.
Some tests of randomness on DSMCG indicated that DSMCG has very
good randomness properties.
properties.
It also has moderately good uniform
The results of these tests can be found in Gan (1985).
The
power comparison results obtained from these two generators are very
similar.
Appendix B contains a complete description of the generation
121
of random numbers from the various distributions used in the power
comparison study.
7.
Machines used in the power comparison study
The IBM personal computer (IBM PC) with an 8087 coprocessor was
used for the entire power comparison study.
All the computer programs
needed were developed from scratch since reliable subroutines like those
provided by IMSL were not available on the IBM PC.
All programs were
written in FORTRAN 77 and a description of FORTRAN 77 can be found in
Microsoft FORTRAN Reference Manual (1983).
Each subroutine developed
was thoroughly tested, using at least two different methods.
B.
Results of the Power Comparison
This section summarizes the results from the Monte Carlo power
comparison.
Some general and specific observations concerned with the
performance of these statistics will be made.
Each class of statistics
will be studied separately and then an overall comparison will be made.
1.
Comparisons among statistics of the correlation coefficient type
The numbers in Tables 4.5 - 4.12 indicate the proportions of
simulated samples for which the null distribution was rejected.
A
number in the column for W is printed in bold if this number is greater
than or equal to the corresponding number in the r^ column.
The
and
the r^ statistics were contrasted by printing in bold the larger of the
122
two numbers for each alternative distribution.
In the event of a draw,
both the numbers were printed in bold.
The Shapiro-Wilk statistic is the most powerful in detecting
alternative distributions with kurtosis less than 3-
However, the r^
statistic is more powerful than the Shapiro-Wilk statistic in detecting
symmetrical alternative distibutions with kurtosis greater than 3,
In
order to understand the difference between the Shapiro-Wilk and the r^
statistics, the coefficients used in computing the Shapiro-Wilk and the
r^ statistics ought to be examined.
The Shapiro-Wilk and the r^
statistics can be expressed as
(I "iX ):
W =
,
(it.7)
(I r X ):
^
.
- X):
(4.8)
I(X. - X):
and
r^ =
The coefficients w^ and r^ for certain selected sample sizes are listed
in Table 4.5.
The r^ statistic puts more weight in the tails than at
the center of the null distribution in the sense that ^rvEX. > ^w^EX^
although w^ i r^ and w^ g r^.
123
Table 4.5.
Comparison between the coefficients used in computing the
Shapiro-Wilk and r^ statistics
Coefficients (i=[n/2]+1,...,n)
5
10
.2413
EÏj .4950
.6460
.6646
1.1630
r. .0458
.0399
.1227
.1399
.1224
.3758
^ .2876
4
20
.2425
.2141
.6561
.3644
.3291
1.0014
.5355
.5739
1.5388
OC
r=.O
or
.0154 .0463 .0780 .1108 .1457 .1834 .2255 .2748 .3370 .4295
.0140 .0422 .0711 .1013 .1334 .1686 .2085 .2565 .3211 .4734
"i
EX^ .0620 .1870 .3149 .4483 .5903 .7454 .9210
1.867
The Shapiro-Wilk statistic performed better than the r^ statistic
in detecting skewed distributions except the location contaminated
normal distributions.
The Shapiro-Wilk statistic generally performed
better than the k^ statistic.
The
statistic generally performed better than the k^ statistic
in detecting alternative distributions with tails heavier or longer than
that of the null distribution.
As the kurtosis of the null distribution
increases from 3 to 9, the relative performance of the
improved.
statistic
The r^ statistic is more powerful in detecting alternative
distributions like the location or scale contaminated normal
distributions for the normal power comparison, and location or scale
contaminated exponential distributions for the exponential power
comparison.
1 24
1.0
0
Figure 4.2.
X
Sketch of the cumulative distribution function of
the standard normal random variable
Figure 4.2 is a sketch of the cumulative distribution function of
the standard normal random variable.
A small change in x near the
location of the normal distribution causes a larger change in the cdf
F(x) than an equivalent change in x further out in the tail.
This is
clear from the diagram since the slope of F(') is the largest at the
location.
The k^ statistic which is based on the distribution function,
is thus sensitive to deviations occurring near the location of the null
distribution.
To understand the r^ statistic, a sketch of the density
function of the standard normal random variable is helpful.
Figure 4.3
is a sketch of the density function of the standard normal random
variable.
The same change in the probability p causes a greater change
—1
of the percentile F (p) at the tails than at the location.
Consequently, the r^ statistic which is based on percentiles is more
sensitive to deviations from the tails of the hypothesized distribution.
125
0
Figure 4.3.
X
Sketch of the density function of the standard
normal random variable
For the normal case, the
statistic is slightly more powerful
than the r^ statistic in detecting very close alternative distribution
like the Weibull(4) distribution.
Unlike the normal case where the
kurtosis provided a shape division for the performance of the k^ and r^
statistics, it is less obvious for the Gumbel case.
For the Gumbel
case, the k^ statistic is usually more powerful than the r^ statistic
except for some alternative distributions with large kurtotis values.
The k^ statistic is much more powerful than the r^ statistic for most
alternative distributions in the exponential case.
The r^ statistic
performed better than the k^ statistic for some alternative
distributions with very small or large kurtosis values.
For certain
alternative distributions, the k^ statistic is more powerful than the r^
statistic when the sample size is small but the trend reverses when the
sample size is large.
126
Table 4.6.
Empirical 5? level power (in % xlO) for tests of departure
from the normal distribution
n = 20
Sample sizes
Statistics
W
No. Distribution
1
2
3
U
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
N(0,1)+N(10,1)
Beta(0.5,0.5)
N(0,1)+N(5,1)
SB(0,0.5)
N(0,1)+N(4,1)
Tukeyd .5)
Uniforra(0,1 )
SB(0,0.707)
Tukey(0.7)
TruncN(-l,1)
N(0,1)+N(3,1)
Tukey(3)
Beta(2,2)
TruncN(-2,2)
Triangle 1(1)
N(0,1)+N(2,1)
TruncN(-3,3)
N(0,1)+N(1,1)
SU((0,3)
t(10)
Logistic(0,1)
SU(0,2)
Tukey(IO)
Laplace(0,1)
ScConN(0.2,3)
ScConN(0.05,3)
ScConN(0.1,3)
ScConN(0.2,5)
ScConN(0.2,7)
ScConN(O.T,5)
ScConN(0.05,5)
ScConN(0.1,7)
ScConN(0.05,7)
SU(0,1)
SU(0,0.9)
r:
n = 100
n = 50
k:
W
r:
k:
r:
k:
Bz
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1.15
1 .50
1 .51
1.63
1 .72
1.75
1 .80
1 .87
1 .92
1 .94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
2.92
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
1 2.8
16.5
20.0
21.5
31 .4
36.2
82.1
1000
729
759
443
403
267
189
136
135
in
135
77
44
32
38
43
43
50
66
96
108
128
816
244
381
196
318
704
845
525
340
689
457
429
508
1000 1000
284 438
356 787
109 247
117 438
46 141
33 118
81
21
29
93
14
79
31 156
20
50
11
39
10
33
41
11
47
23
50
36
45
51
86
61
119 81
126 84
143 97
863 938
316 226
452 255
211 120
346 192
766 627
874 803
562 406
347 234
704 587
475 382
504 380
575 475
1000
1000
1000
992
942
920
876
746
6l 6
624
402
390
224
84
82
68
34
44
64
118
118
158
996
422
604
316
446
948
988
822
608
930
770
690
818
1000 1000
954 908
974 1000
656 662
638 930
320 388
284 372
142 266
104 206
78 152
112 364
70
30
12
68
4
46
14
48
12
50
30
54
64
54
138 66
228 102
250 82
290 11 4
1000 1000
664 488
784 458
468 162
610 250
976 890
996 982
908 654
694 402
952 850
808 596
832 666
922 814
1000
1000
1000
1000
1000
1000
1000
1000
984 1000
912 788
832 700
644 6l 2
496 444
480 376
360 764
176 1 44
88 176
8
60
8
40
15 120
4
32
56
.36
176
68
252 64
460 136
472 158
1000 1000
848 776
932 696
660 188
844 392
996 988
1000 1000
980 856
924 564
1000 980
948 780
964 892
996 964
127
Table 4.7.
Empirical 5? level power (in % xlO) for tests of departure
from the normal distribution
n = 20
Sample sizes
Statistics
No. Distribution
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
t(4)
t(2)
t(1)
Cauchy(0,1 )
38(0.5333,0.5)
TruncN(-2,1 )
Beta(3,2)
Beta(2,1)
TruncN(-3,2)
Weibull(3.6)
Weibull(4)
SB(1,2)
TruncN(-3,1)
SB(1,1)
Weibull(2.2)
LoConN(0.2,3)
LoConN(0.2,5)
LoConN(0.2,7)
Weibull(2)
Half N(0,1)
LoConN(0.1,3)
LoConN(0.05,3)
GumbeKO,1 )
LoConN(0.1,5)
SU(-1,2)
Chi-Square(4)
LoConN(0.1,7)
LoConN(0.05,5)
Exponential(1)
LoConN(0.05,7)
Chi-square(1)
Triangle 11(1)
Weibull(0,5)
SU(1,1)
LogN(0,1,0)
W
r^
n = 50
1<==
n =100
r^
r:
560 296
906 804
996 998 996
994 998 998
1000 972 964
416 86 1 66
244 56 108
886 576 584
48
66 26
14
50
50
40
18
38
64
90 42
486 270 210
806 602 552
260 192 152
602 516 594
1000 1000 1000
1000 1000 1000
400 310 204
944 850 722
470 574 370
310 444 190
618 606 470
978 988 900
390 492 296
958 890 804
992 992 974
854 898 536
1000 998 986
936 940 790
1000 1000 1000
884 590 580
1000 1000 1000
972 980 950
1000 1000 998
768
1000
1000
1000
1000
376
W
k"
Ba
0
218
0
507
0
873
0
868
0.65 2.13
725
96
-;32 2.27
0.29 2.36
63
316
-.57 2.40
-;18 2.65
46
0.00 2,72
44
-.09 2.75
40
0.28 2.77
56
164
-.55 2.78
0.73 2.91
312
0.51 3.04
111
0.68 3.09
273
1.07 3.16
887
1.25 3.20
985
0.63 3.25
173
0.97 3.78
445
0.80 4.02
258
0.68 4.35
210
1.14 5.40
314
1.54 5.45
775
0.87 5.59
216
1.41 6.00
519
1.96 6.60
878
1.65 7.44
539
2.00 9.00
832
2.42 10.4
649
2.83 15.0
989
0.57 16.4
301
6.62 87.7 1000
745
-5.3 93.4
6.18 114
934
255 158
563 442
898 865
898 869
421 555
44
78
26
54
158 225
25
45
26
52
61
32
62
37
105 106
205 232
88
89
222 263
785 848
978 966
133 127
329 289
270 208
225 133
282 240
772 611
220 162
437 384
875 803
559 336
738 703
652 551
959 952
151 221
997 996
727 676
893 860
372
812
448
948
1000
1000
1000
388
196 244
988 920
28
60
24
56
20
56
136 164
672 488
956 900
388 284
884 908
1000 1000
1000 1000
592 444
996 940
864 664
712 328
936 784
1000 992
712 452
996 988
1000 1000
988 832
1000 1000
1000 932
1000 1000
980 912
1000 1000
1000 1000
1000 1000
128
Table 4.8.
Empirical 5% level power (in % xlO) for tests of departure
from the Gumbel distribution
n = 20
Sample sizes
Statistics
No. Distribution
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
N(0,1)+N(10,1)
Beta(0.5,0.5)
N(0,1)+N(5,1)
SB(0,0.5)
N(0,1)+N(4,1)
Tukey(1.5)
Uniform(0,l)
SB(0,0.707)
Tukey(0.7)
TruncN(-1,1)
N(0,1)+N(3,1)
Tukey(3)
Beta(2,2)
TruncN(-2,2)
Triangle 1(1)
N(0,1)+N(2,1)
TruncN(-3,3)
N(0,1)+N(1,1)
N(0,1)
SU((0,3)
t(10)
Logistic(0,1)
SU(0,2)
TukeydO)
Laplace(0,1)
ScConN(0.2,3)
ScConN(0.05,3)
ScConNCO.1,3)
29 ScConNCO.2,5)
30
31
32
33
34
ScConN(0.2,7)
ScConN(0.1,5)
ScConNCO.05,5)
ScConN(0.1,7)
ScConN(0.05,7)
35 SU(0,1)
n = 50
n = 100
r"
k2
r:
kz
r2
kz
991
400
464
226
243
1 64
1 42
116
92
104
1 24
1000
1000
543
934
1000
944
996
814
898
674
604
548
540
494
558
462
396
378
406
352
382
384
422
466
500
538
504
1000
760
792
532
626
936
982
1000
1000
1000
1000
1000
996
984
960
972
940
904
908
804
1000
1000
1000
1000
1000
62
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1.15
1 .50
1 .51
1 .63
1 .72
1 .75
1 .80
1 .87
1.92
1 .94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
2.92
3.00
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
1 2.8
16.5
20.0
21.5
31.4
36.2
83
70
86
76
87
106
109
114
138
174
192
190
766
310
386
249
318
765
337
462
263
240
215
200
206
236
167
160
176
174
156
173
170
196
223
253
253
279
963
441
422
270
334
936
81 2
676
598
544
436
426
388
384
298
244
184
204
192
176
222
200
264
298
318
340
972
500
680
432
580
930
640
656
644
468
564
560
568
668
620
636
1000
808
880
696
824
336
656
646
505
560
421
874
734
457
503
932
782
724
1000
1000
960
924
996
952
852
940
700
816
565
398
656
652
789
515
990
858
814
952
960
900
880
852
916
816
756
716
700
708
664
704
728
764
832
824
844
1000
968
952
764
872
1000
1000
960
904
984
936
988
129
Table 4.9.
Empirical 5% level power (in % xlO) for tests of departure
from the Gumbel distribution
n = 20
Sample sizes
Statistics
No. Distribution
35 SU(0,0.9)
37 t(4)
38 t(2)
39 t(1)
40 CauGhy(0,l )
41 88(0.5333,0.5)
42 TruncN(-2,l)
43 Beta(3,2)
44 Beta(2,l)
45 TruncN(-3,2)
45 Weibull(3.6)
47 Weibull(4)
48 SB(1,2)
49 TruncN(-3,l)
50 SB(1,1)
51 Weibull(2.2)
52 LoConN(0.2,3)
53 LoConN(0.2,5)
54 LoConN(0.2,7)
55 Weibull(2).
56 Half N(0,1)
57 LoConN(0.1,3)
58 LoConN(0.05,3)
59 LoConN(0.1,5)
60 SU(-1,2)
51 Chi-square(4)
62 LoConN(0.1,7)
53 LoConN(0.05,5)
54 ExponentiaK1)
55 LoConN(0.05,7)
55 Chi-square(l )
57 Triangle 11(1)
68 Weibull(0.5)
59 SU(1,1)
70 LogN(0,l,0)
r^
/6i
82
0
0
0
0
0
0.65
-.32
0.29
-.57
-.18
0.00
-.09
0.28
-.55
0.73
0.51
0.58
1.07
1.25
0.63
0.97
0.80
0.58
1.54
0.87
1.41
1.96
1.55
2.00
2.42
2.83
0.57
5.62
-5.3
6.18
82.1
523
291
525
862
2.13
2.27
2.36
2.40
2.65
2.72
2.75
2.77
2.78
2.91
3.04
3.09
3.16
3.20
3.25
3.78
4.02
4.35
5.45
5.59
6.00
6.60
7.44
9.00
10.4
15.0
16.4
87.7
93.4
114
844
33
251
239
590
117
70
132
35
395
5
14
13
147
506
15
13
41
53
279
79
68
731
261
225
594
533
10
874
897
512
n = 50
r^
569
358
550
872
846
291
349
299
591
210
177
218
92
41 4
56
69
59
306
701
47
64
86
133
1 54
104
50
404
157
247
234
704
79
940
912
484
795
504
844
998
994
136
762
656
972
352
188
282
46
906
2
8
4
266
988
4
12
18
64
382
890
660
892
994
998
800
746
686
940
500
440
508
168
852
96
120
84
794
980
72
142
164
222
396
80
158
85
136
988
708
454
302
582
312
914
384
860
998
12
218
1000 1000
998 1000
768
874
n = 100
r=^
k:
960
760
972
1000
1000
728
1000
996
1000
888
558
692
136
1000
8
28
0
836
1000
8
4
15
76
736
1 40
116
1000
704
560
996
996
64
1000
1000
952
1000
904
976
1000
1000
996
980
964
1000
800
728
808
328
988
176
188
156
996
1000
120
316
252
420
684
260
188
940
472
920
644
1000
436
1000
1000
1000
130
Table 4.10.
Empirical 5% level power (in % xlO) for tests of departure
from the exponential distribution
n = 20
Sample sizes
Statisti OS
No. Distribution
1 N(0,1)+N(10,1)
2 Beta(0.5,0.5)
3 N(0,l)+N(5,1)
4 SB(0,0.5)
5 N(0,1)+N(4,l)
6 TukeyCl.S)
7 Uniforra(0,1)
8 SB(0,0.707)
9 Tukey(0.7)
10 TruncN(-1,1)
11 N(0,1)+N(3,l)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
1 5 Triangle 1(1)
16 N(0,1)+N(2,1)
17 TruncN(-3,3)
18 N(0,1)
19 SU((0,3)
20 t(10)
21 Logistic(0,1)
22 SU(0,2)
23 Tukey(IO)
24 Laplace(0,1)
25 SoConN(0.2,3)
26 ScConN(0.05,3)
27 ScConN(0.1,3)
28 ScConN(0.2,5)
29 ScConN(0.2,7)
30 ScConN{0,1,5)
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
n = 100
k^
rz
k:
r^
k2
955
973
1000
692
728
600
600
641
580
996
990
1000
970
611
555
529
487
505
485
511
451
477
419
417
438
439
450
441
442
453
472
782
524
532
472
508
685
793
640
625
634
625
650
682
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
632
1000
824
1000
1000
r^
/Si
n = 50
62
1.15
1 .50
1.51
1.63
1.72
1 .75
1 .80
1 .87
1.92
1.94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
3.00
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
12.8
16.5
982
964
982
974
968
966
701
964
703
747
800
818
817
783
858
864
866
874
881
880
993
903
884
874
851
888
914
862
938
972
948
924
906
892
878
850
824
864
822
830
982
856
834
810
344
944
970
944
982
942
976
962
966
974
968
986
988
996
994
998
1000
994
1000
1000
998
1000
1000"
1000
1000
998
1000
994
998
1000
998
998
996
1000 1000
996 1000
980 1000
980 1000
988 1000
1000 1000
980 1000
964 1000
980
976
996
1000
996
1000
1000
1000
1000
1000
131
Table 4.11.
Empirical 5% level power (in % xlO) for tests of departure
from the exponential distribution
n = 20
Sample sizes
Statistics
No. Distribution
0
31 ScConN(0.05,5)
0
32 ScConN(0.1,7)
0
33 ScConN(0.05,7)
3M SU(0,1)
0
0
35 SU(0,0.9)
36 t(it)
0
0
37 t(2)
38 t(1)
0
0
39 Cauchy(0,1)
40 38(0.5333,0.5) 0.,65
41 TruncN(-2,1)
.32
42 Beta(3,2)
0.29
43 Beta(2,1)
.57
44 TruncN(-3,2)
18
45 Weibull(3.5)
0.GO
46 Weibull(4)
09
47 SB(1,2)
0.28
48 TruncN(-3,1)
55
49 SB(1,1)
0.73
50 Weibull(2.2)
0.51
51 LoConN(0.2,3) 0.68
52 LoConN(0.2,5) 1.07
53 LoConN(0.2,7) 1. 25
54 TruncE(0,3)
0.99
55 Weibull(2)
0.63
56 LoConE(0.2,7) 1.33
57 LoConE(0.2,5) 1.25
58 Half N(0,1)
0.97
59 LoConN(0.1,3) 0.80
60 LoConE(0.2,3) 1.20
n = 50
n = 100
r^'
k"
r"
kz
r==
kz
541
723
627
566
848
873
858
912
910
874
910
957
960
107
908
892
956
885
851
861
709
951
225
589
551
359
475
71
524
82
95
187
680
108
904
962
940
998
1000
998
998
1000
998
998
1000
1000
288
1000
1000
1000
998
1000
1000
990
1000
498
944
912
866
984
86
900
156
144
360
968
154
992
996
996
984
1000
1000
1000
1000
1000
1000
1000
1000
1000
632
1000
1000
1000
1000
1000
1000
1000
1000
824
1000
996
996
1000
156
1000
260
308
652
1000
256
Bz
20.0
21 .5
31.4
36.2
82.1
632
498
623
2.13
2.27
2.36
2.40
2.65
2.72
2.75
2.77
2.78
2.91
3.04
3.09
3.16
3.20
3.22
3.25
3.27
3.40
3.78
4.02
4.09
854
857
69
751
711
920
550
466
477
246
855
38
152
69
40
161
13
107
36
32
48
133
46
874
908
846
932
994
990
286
1000
1000
1000
964
854
942
650
998
88
322
150
96
554
6
260
38
38
50
194
40
992
984
1000
1000
1000
796
1000
1000
1000
1000
1000
1000
936
1000
264
704
396
428
976
32
512
44
48
48
348
44
132
Table 4.12.
Empirical 5% level power (in % xlO) for tests of departure
from the exponential distribution
n = 20
Sample sizes
Statistics
No. Distribution
r==
r"
/6i
61 TruncE(0,4)
1.27
62 LoConN(0.05,3) 0.68
1.50
63 TruncE(0,5)
64 Gumbel(0,1)
1.14
65 LoConN(0.1,5) 1.54
66 SU(-1,2)
0.87
67 LoConE(0.1,3) 1.62
68 Chi-Square(4) 1.41
69 LoConE(0.1,5) 1.88
1 .68
70 TruncE(0,6)
71 LoConN(0.1,7) 1.96
72 LoConE(0.05,3) 1.85
73 LoConN(0.05,5) 1.65
74 LoConE(0.05,7) 2.75
75 ScConE(0.05,2) 2.42
76 Chi-square(1) 2.83
77 ScConE(0.1,2) 2.61
78 ScConE(0.2,2) 2.71
79 LoConE(0.01 ,7) 2.94
80 Triangle 11(1) 0.57
81 ScConE(0.01,3) 2.59
82 ScConE(0.2,3) 3.57
83 ScConE(0.1,3) 3.81
84 SoConE(0.05,3) 3.60
85 ScConE(0.2,7) 4,50
86 ScConE(0.1,5) 5.38
87 ScConE(0.1,7) 6.02
88 ScConE(0.01,5) 4.81
89 ScConE(0.05,5) 6.05
90 Weibull(0.5)
6.62
91 SU(1,1)
-5.3
6.18
92 LogN(0,1,0)
n = 50
n = 100
r^
k2
0
76
1000
68
1000
1000
1000
84
660
112
36
1000
72
1000
32
4.20
4.35
5.26
5.40
5.45
5.59
5.86
6.00
6.02
6.29
6.60
7.29
7.44
10.9
13.6
15.0
15.3
15.6
15.9
16.4
18.0
23.8
29.4
29.8
31.5
48.7
56.2
66.7
68.2
87.7
93.4
114
131
209
38
40
42
22
349
38
269
46
785
39
471
585
708
52
177
65
48
590
50
701
59
76
51
48
374
60
90
162
223
210
598
316
89
78
54
58
64
40
280
51
209
54
58
42
156
48
57
116
128
56
48
20
212
532
644
832
70
180
152
133
62
99
77
36
218
138
84
439
978
1000
359
509
102
251
179
333
51
99
704
894
128
490
556
160
548
920
316
1000
1000
1000
256
740
48
544
59
535
98
320
336
222
493
964
122
649
994
227
1000
402
12
236
27
58
4
408
4
1 04
102
330
56
18
48
10
582
42
776
50
998
46
872
960
978
76
376
62
40
976
60
990
36
508
54
58
44
108
788
46
172
974
1000
206
628
0
144
1 40
580
60
20
24
0
968
908
1000
524
40
52
864
860
88
1000
1000
1000
44
324
1000
1000
• 424
133
2.
Comparison between various versions of Pearson chi-square
statisti c
The numbers in Tables 4.13 ~ 4.33 indicate the proportion of
simulated samples for which the null distribution was rejected.
largest number in each line was printed in bold.
The
Only the power results
for the Pearson chi-square statistic are listed in Tables 4.13 - 4.33.
The X3 statistic is generally the most powerful when the sample size is
20.
When the sample size increases to 50, the X5 statistic becomes the
dominant statistic.
When the sample size is 100, it is harder to pin
point the best chi-square statistic, however, any chi-square statistic
with expected cell count around 8 is probably optimum.
for all the three null hypotheses investigated.
This trend holds
The power study
suggests that the number of cells ought to increase with the sample size
to achieve optimum power.
However, the choice of the number of cells
which provides optimum power depends somewhat on the alternative
distributions.
The chi-square statistics with large expected cell
counts are the most powerful in detecting scale contaminated normal
distributions for the normal power study.
The scale contaminated normal
distributions are very similar in shape to the normal distribution.
In
order to distinguish between the normal distribution and a close
alternative distribution, the observed cell counts must be sufficiently
large to provide deviation from the expected cell counts for the null
distribution.
Thus, the recommendations of Mann and Wald (1942), the
X7, XI3 and the XI7 statistics performed very well here.
For
alternative distributions that differ in shape from the normal
distribution like beta(0.5,0.5), uniform or exponential distribution, a
134
Table 4.13.
Empirical 5% level power (in % xlO) for tests of departure
from the normal distribution (sample size = 20)
Statist!OS
No. Distribution
1 N(0,1)+N(10,1)
2 Beta(0.5,0.5)
3 N(0,1)+N(5,1)
4 SB(0,0.5)
5 N(0,1)+N(4,1)
6 Tukey(1.5)
7 Uniform(0,1)
8 SB(0,0.707)
9 Tukey(0.7)
10 TruncN(-1,1)
11 N(0,1)+N(3,1)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
15 Triangle 1(1)
15 N(0,1)+N(2,1)
17 TrunoN(-3,3)
18 N(0,1)+N(1,1)
19 SU((0,3)
20 t(10)
21 Logistic(0,1)
22 SU(0,2)
23 Tukey(IO)
24 Laplace(0,1)
25 ScConN(0.2,3)
26 ScConN(0.05,3)
27 ScConN(0.1,3)
28 ScConN(0.2,5)
29 ScConN(0.2,7)
30 ScConN(0.1,5)
31 ScConN(0.05,5)
32 ScConN(0.1,7)
33 ScConN(0.05,7)
34 SU(0,1)
35 SU(0,0.9)
XVa
XI
X3
884
988
511
375
531
X5
X7
999
997
957
264
209
529
02
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1.15
1 .50
1.51
1.53
1.72
1 .75
1 .80
1 .87
1 .92
1.94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
2.92
3.53
4.00
4.20
4.51
5.38
5.00
7.54
7.65
8.33
11.2
12.8
15:5
20.0
21.5
31 .4
35.2
82.1
496
247
175
131
111
91
61
65
70
51
53
40
42
45
34
29
32
33
46
28
33
809
56
62
47
49
175
312
126
87
244
155
99
152
76
48
41
37
41
26
29
40
31
39
33
31
856
55
73
59
57
255
449
172
101
337
89
185
481
99
195
74
55
42
45
45
75
34
25
18
22
38
25
30
37
44
47
50
770
135
157
74
127
459
689
125
469
682
315
190
493
318
213
341
140
187
255
329
219
158
175
258
118
113
69
91
83
59
82
88
77
58
111
53
45
32
43
52
43
51
48
68
63
72
884
136
163
196
512
322
268
333
158
315
93
84
56
79
55
138
40
34
29
35
44
32
32
50
65
48
71
777
201
181
93
132
472
678
326
198
503
340
290
392
135
Table 4.14.
Empirical 5% level power (in % xlO) for tests of departure
from the normal distribution (sample size = 20)
Statistics
No. Distribution
36 t(4)
37 k(2)
38 t(l)
39 Cauchy(0,1)
40 86(0.5333,0.5)
41 TruncN(-2,l)
42 Beta(3,2)
43 Beta(2,1)
44 TruncN(-3,2)
45 Weibull(3.6)
46 Weibull(4)
47 SB(1,2)
48 TruncN(-3,1)
49 SB(1,1)
50 Weibull(2.2)
51 LoConN(0.2,3)
52 LoConN(0.2,5)
53 LoConN(0.2,7)
54 Weibull(2)
55 Half N(0,1)
56 LoConN(0.1,3)
57 LoConN(0.05,3)
58 GumbeKO,!)
59 LoConN(0.1,5)
60 SUM,2)
5l Chi-Square(4)
62 LoConN(0.1,7)
63 LoConN(0.05,5)
64 Exponentiald )
65 LoConN(0.05,7)
66 Chl-square(l)
67 Triangle 11(1)
68 Weibull(0.5)
69 SU(1,1)
70 LogN(0,1,0)
/B.
0
0
0
0
0.65
-.32
0.29
-.57
-.18
0.00
-.09
0.28
-.55
0.73
0.51
0.68
1.07
1.25
0.63
0.97
0.80
0.68
1.14
1.54
0.87
1.41
1.96
1.65
2.00
2.42
2.83
0.57
6.62
-5.3
6.18
X'/a
XI
X3
X5
X7
44
160
569
59
199
103
94
333
325
774
774
773
773
1 45
122
370
794
800
652
647
394
55
48
127
37
34
42
31
64
94
52
65
Ba
572
2.13
2.27
2.36
2.40
2.65
2.72
2,75
2.77
2.78
2.91
3.04
3.09
3.16
3.20
3.25
3.78
4.02
4.35
5.40
5.45
5.59
6.00
6.60
7.44
9.00
10.4
15.0
16.4
87.7
93.4
114
324
• 54
63
113
45
31
33
45
52
76
37
68
219
478
50
93
54
30
62
1 66
47
95
319
92
258
177
766
89
953
226
455
298
679
51
115
63
45
81
215
62
138
415
109
364
245
831
116
964
328
586
483
71
59
176
43
55
57
50
78
138
68
143
578
898
81
239
98
77
147
385
104
305
703
211
655
456
947
148
999
517
810
29
24
54
29
21
26
28
29
53
32
71
372
765
41
51
59
61
76
286
81
101
628
187
1 82
442
436
59
802
420
457
206
44
35
69
42
40
42
35
46
90
42
57
233
414
43
81
81
70
58
266
82
82
490
180
148
403
335
81
623
262
288
136
Table 4.15.
Empirical 5% level power (in % xlO) for tests of departure
from the normal distribution (sample size = 50)
Statistics
No. Distribution
/6:
1
2
3
4
5
6
7
8
9
10
11
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
N(0,1)+N(10,1)
Beta(0.5,0.5)
N(0,1)+N(5,1)
SB(0,0.5)
N(0,1)+N(4,1)
Tukeyd.S)
Uniform(0,1)
SB(0,0.707)
Tukey(0.7)
TruncN(-1,1)
N(0,1)+N(3,1)
12 Tukey(3)
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
Beta(2,2)
TruncN(-2,2)
Triangle 1(1)
N(0,1)+N(2,1)
TruncN(-3,3)
N(0,1)+N(1,1)
SU((0,3)
t(10)
Logistic(0,1)
SU(0,2)
Tukey(IO)
Laplace(0,1 )
ScConN(0.2,3)
ScConN(0.05,3)
ScConN(0.1,3)
ScConN(0.2,5)
ScConN(0.2,7)
ScConN(0.1,5)
ScConN(0.05,5)
ScConN(0.1,7)
ScConN(0.05,7)
SU(Q,1)
SU(0,0.9)
X'/z
XI
X3
X5
XlO
X13
1000
1000
1000
738
978
1000
• 948
1000
956
1000
922
448
Bz
1.15
1 .50
1 .51
1 .63
1.72
1 .75
1.80
1 .87
1 .92
1.94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
2.92
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
12.8
16.5
20.0
21.531.4
36.2
82.1
472
234
242
238
116
108
98
92
104
42
48
26
54
32
36
22
34
28
66
986
70
72
44
74
346
634
222
152
458
286
176
260
720
636
948
724
712
976
412
436
464
402
662
708
692
368
462
410
246
202
202
218
196
168
110
182
108
98
42
50
58
32
46
36
40
44
50
200
56
50
34
44
234
198
152
120
134
206
76
56
802
276
242
212
162
98
238
1000
WOO
166
236
264
134
322
184
152
156
136
158
86
68
50
48
36
46
38
64
40
64
994
116
136
70
80
502
836
328
212
620
390
218
358
216
88
106
52
72
20
46
40
50
44
60
48
80
74
62
80
90
34
130
168
730
946
486
304
770
498
352
520
812
968
576
346
810
566
470
6l 6
54
1000
350
360
174
234
876
970
662
422
840
608
564
718
690
982
68
62
46
54
46
46
42
44
80
70
88
980
332
408
184
238
888
972
670
424
858
614
602
754
137
Table 4.16.
Empirical 5% level power (in % xlO) for tests of departure
from the normal distribution (sample size = 50)
Statistics
No. Distribution
36 t(4)
37 t(2)
38 t(1)
39 Cauchy(0,1)
40 38(0.5333,0.5)
41 TruncN(-2,1)
42 Beta(3,2)
43 Beta(2,1)
44 TrunoN(-3,2)
45 Weibull(3.5)
46 Weibull(4)
47 SB(1,2)
48 TruncN(-3,1 )
49 SB(1,1)
50 Weibull(2.2)
51 LoConN(0.2,3)
52 LoConN(0.2,5)
53 LoConN(0.2,7)
54 Weibull(2)
55 Half N(0,1)
56 LoConN(0.1,3)
57 LoConN(0.05,3)
58 Gumb6l(0,1)
59 LoConN(0.1,5)
60 SU(-1,2)
61 Chi-Square(4)
62 LoConN(0.1,7)
63 LoConN(0.05,5)
64 Exponential(1 )
65 LoConN(0.05,7)
66 Chi-square(1 )
67 Triangle 11(1)
68 Weibull(0.5)
69 SU(1,1)
70 LogN(0,1,0)
X'/z
XI
X3
X5
XlO
XI3
82
322
90
450
940
940
880
104
84
304
50
22
46
136
180
592
978
978
880
128
86
398
54
678
986
990
946
254
744
272
762
990
994
532
62
0
0
0
0
0.65
2.13
2.27
0.29 2.36
-.57 2.40
-.18 2.65
0.00 2.72
-.09 2.75
0.28 2.77
-.55 2.78
0.73 2.91
0.51 3.04
0.68 3.09
1.07 3.16
1.25 3.20
0.63 3.25
0.97 3.78
0.80 4.02
0.68 4.35
1.14 5.40
1.54 5:45
0.87 5.59
1.41 6.00
1.96 6.60
1 .65 7.44
2.00 9.00
2.42 10.4
2.83 15.0
0.57 16.4
6.62 87.7
-5.3 93.4
6.18
114
-.32
896
874
710
58
50
210
44
40
32
44
104
114
44
90
446
906
52
220
94
38
78
388
80
1 42
740
188
602
408
128
162
54
116
726
986
72
346
112
72
92
526
82
248
840
262
788
494
998
998
58
40
58
36
132
294
76
222
940
1000
88
464
130
86
174
666
124
480
914
344
104
88
472
44
46
62
44
140
384
94
288
966
1000
• 112
638
776
698
1 22
212
948
400
964
966
492
666
478
468
804
952
11 6
830
820
1000
484
1000
858
990
390
402
794
560
920
772
984
90
56
196
866
998
142
998
1000
1000
88
32
72
148
668
986
300
64
254
942
60
176
40
46
38
186
78
260
786
1000
1 26
72
746
124
938
184
466
58
48
162
50
52
44
34
60
66
130
1 46
96
154
670
1000
992
996
178
158
98
326
1000
868
934
176
986
790
844
138
Table 4.17.
Empirical 5^ level power (in % xlO) for tests of departure
from the normal distribution (sample size = 100)
Statistics
No. Distribution
1 N(0,1)+N(10,l)
2 Beta(0.5,0.5)
3 N(0,1)+N(5,1)
4 SB(0,0.5)
5 N(0,1)+N(4,1)
5 Tukey(1.5)
7 Uniform(0,1)
8 SB(0,0,707)
9 Tukey(0.7)
1 0 TruncN(-1,1)
11 N(0,1)+N(3,1)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
1 5 Triangle 1(1)
16 N(0,1)+N(2,1)
17 TruncN(-3,3)
18 N(0,1)+N(1,1)
19 SU((0,3)
20 t(10)
21 Logistic(0,1)
22 SU(0,2)
23 Tukey(IO)
24 Laplace(0,1)
25 ScConN(0.2,3)
26 ScConN(0.05,3)
27 ScConN(0.1,3)
28 ScConN(0.2,5)
29 ScConN(0.2,7)
30 ScConN(0.1,5)
31 ScConN(0.05,5)
32 ScConN(0.1,7)
33 ScConN(0.05,7)
34 SU(0,1)
35 SU(0,0.9)
X'/z
XI
X3
X5
X10
XI7
1000
1000
1000
1000
1000
944
920
1000
1000
1000
996
984
868
1000
716
696
340
456
356
200
180
208
144
132
56
36
40
60
32
24
16
28
36
48
1000
1000
1000
972
388
972
760
404
372
284
220
40
48
44
84
96
340
256
156
116
500
76
92
28
56
60
44
56
40
36
52
112
1000
268
1000
1000
320
308
104
196
368
124
260
964
492
452
152
320
976
1000
1000
780
476
952
760
672
796
• 804
568
980
6a
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1.15
1.50
1.51
1.63
1.72
1.75
1.80
1.87
1.92
1.94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
2.92
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
12.8
16.5
20.0
21.5
31.4
36.2
82,1
628
656
520
344
308
364
252
200
116
52
52
64
56
48
36
44
64
48
1000
1000
• 124
172
60
132
612
920
184
208
56
136
780
368
188
724
428
264
504
268
872
564
372
368
580
988
972
908
828
788
572
584
452
460
376
316
168
72
424
412
456
224
1 48
64
64
68
88
60
100
44
52
40
24
68
56
928
996
656
416
948
700
608
760
936
984
1000
1000
876
704
836
760
852
588
508
68
132
48
32
56
36
80
64
76
100
156
1000
576
608
220
432
980
1000
868
624
992
832
824
912
139
Table 4.18.
Empirical 5% level power (in ? xlO) for tests of departure
from the normal distribution (sample size = 100)
Statistics
No. Distribution
36 t(4)
37 t(2)
38 t(1)
39 Cauchy(0,1)
40 38(0.5333,0.5)
41 TruncN(-2,l)
42 Beta(3,2)
43 Beta(2,l)
44 TruncN(-3,2)
45 Weibull(3.6)
46 Weibull(4)
47 SB(1,2)
48 TruncN(-3,1)
49 SBd.l)
50 Weibull(2.2)
51 LoConN(0.2,3)
52 LoConN(0.2,5)
53 LoConN(0.2,7)
54 Wei bull(2)
55 Half N(0,1)
56 LoConN(0.1,3)
57 LoConN(0.05,3)
58 Gumbel(0,1)
59 LoConN(0.1,5)
50 SU(-1,2)
51 Chi-Square(4)
52 LoConN(0.1,7)
53 LoConN(0.05,5)
54 ExponentiaK1)
55 LoConN(0.05,7)
56 Chi-square(1)
57 Triangle II( 1 )
58 Weibull(0.5)
59 sue 1,1)
70 LogN(0,1,0)
XI/2
XI
X3
X5
XlO
88
160
624
180
776
240
320
376
1000
996
992
1000
1000
1000
900
1000
1000
996
920
980
932
144
100
188
124
332
488
276
156
780
44
44
28
40
1 28
200
52
124
720
64
820
1000
1000
1000
376
196
840
84
68
948
988
1000
84
432
1 56
72
144
708
72
324
1 28
620
224
92
240
840
132
200
816
244
148
348
944
548
984
512
988
812
X17
62
0
0
0
0
0.55
-.32
480
980
-.57
-.18
0.00
-.09
0.28
-.55
0.73
0.51
2.13
2.27
2.35
2.40
2.65
2.72
2.75
2.77
2.78
2.91
3.04
0.68
3.09
0:29
1.07 3.16
1.25 3.20
0.63 3.25
0.97 3.78
0.80 4.02
0.58 4.35
1.14 5.40
1.54 5.45
0.87 5:59
1.41 6.00
1.96 5.60
1.55 7.44
2.00 9.00
2.42 10.4
2.83 15.0
0.57 16.4
5.52 87.7
-5.3 93.4
114
6.18
976
440
888
76
64
60
75
75
28
52
212
304
376
572
100
152
432
232
1000
1000
168
1000
876
1000
760
1000
968
1000
760
972
• 575
172
532
1000
1000
224
620
1000
1000
1000
1000
208
132
848
856
436
316
304
1 40
304
448
952
444
968
272
1 44
440
936
204
912
1000
512
796
52
44
44
64
1 44
1000
912
1000
836
1000
1000
356
76
256
668
575
744
160
100
412
688
1000
1000
372
1000
1000
886
1000
32
72
152
112
708
72
48
48
1000
1000
900
980
1000
256
916
1000
736
1000
940
1000
672
1000
988
1000
184
336
1 40
540
180
684
1000
764
984
956
1000
376
1000
1000
1000
140
Table 4.19.
Empirical 5% level power (in % xlO) for tests of departure
from the Gumbel distribution (sample size = 20)
Statistics
X'/g
No. Distribution
/6i
1 N(0,1)+N(10,l)
2 Beta(0.5,0.5)
3 N(0,1)+N(5,1)
4 SB(0,0.5)
5 N(0,1)+N(4,1)
6 Tukey(1.5)
7 Uniform(0,1)
8 SB(0,0.707)
9 Tukey(0.7)
10 TruncN(-1,1)
11 N(0,l)+N(3,1)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
15 Triangle 1(1)
16 N(0,1)+N(2,1)
17 TruncN(-3,3)
18 N(0,1)+N(1,1)
19 N(0,1)
20 SU((0,3)
21 t(10)
22 Logistic(0,1)
23 SU(0,2)
24 Tukey(IO)
25 Laplace(0,1)
26 ScConN(0.2,3)
27 ScConN(0.05,3)
28 ScConN(0.1,3)
29 ScConN(0.2,5)
30 ScConN(0.2,7)
31 ScConN(0.1,5)
32 ScConN(0.05,5)
33 ScConN(0.1,7)
34 ScConN(0.05,7)
35 SU(0,1)
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
XI
X3
X5
972
196
456
119
207
71
74
72
55
71
X7
62
1.15
1.50
1.51
1.63
1.72
1.75
1.80
1.87
1.92
1.94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
2.92
3.00
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
12.8
16.5
20.0
21.5
31 .4
36.2
916
530
295
205
181
141
1 42
92
80
90
79
69
61
48
54
49
45
59
58
67
71
82
71
849
102
95
70
91
287
445
244
150
304
236
167
983
986
548
464
355
261
244
242
171
161
117
83
101
1 00
73
83
56
57
58
48
71
54
83
76
94
374
197
190
176
86
879
162
1 48
1 08
141
393
541
314
194
385
275
213
616
145
150
1 20
128
152
170
817
147
154
845
286
328
315
185
225
167
231
578
687
576
736
418
253
504
347
402
83
110
112
49
45
51
61
60
78
75
70
116
132
127
134
134
144
182
187
189
913
299
324
226
253
565
695
430
330
101
92
91
82
81
138
46
49
38
61
56
72
60
97
107
193
125
930
243
577
152
414
273
269
492
353
384
495
351
382
126
293
141
Table 4.20.
Empirical 5? level power (in % xlO) for tests of departure
from the Gumbel distribution (sample size = 20)
Xi/z
Statistics
No. Distribution
36 31/(0,0.9)
37 t(4)
38 t(2)
39 t(1)
40 Cauchy(0,1)
41 58(0.5333,0.5)
42 TruncN(-2,1)
43 Beta(3,2)
44 Beta(2,1)
45 TrunoN(-3,2)
46 Weibull(3.5)
47 Weibull(4)
48 SB(1,2)
49 TrunoN(-3,1)
50 SB(1,1)
51 Weibull(2.2)
52 LoConN(0.2,3)
53 LoConN(0.2,5)
54 LoConN(0.2,7)
55 Weibull(2)
56 Half N(0,1)
57 LoConN(0.1,3)
58 LoConN(0.05,3)
59 LoConN(0.1,5)
60 SU(-1,2)
61 Chi-square(4)
62 LoConN(0.1,7)
63 LoConN(0.05,5)
64 Exponential0)
65 LoConN(0.05,7)
66 Chi-square(1)
67 Triangle 11(1)
68 WeibulKO.S)
69 SUd.l)
70 LogN(0,1,0)
XI
X3
295
438
268
436
792
X5
X7
442
467
254
485
82
0
0
0
0
0
0.65
-.32
2.13
2.27
0.29
2.36
-.57
-.18
0.00
-.09
0.28
-.55
0.73
0.51
0.68
1.07
1.25
0.63
0.97
0.80
0.68
1 .54
0.87
1.41
1.96
1.65
2.00
2.42
2.83
0.57
6.62
-5.3
6.18
2.40
2.65
2.72
2.75
82.1
217
1 06
238
588
2.77
2.78
2.91
3.04
3.09
3.16
3.20
3.25
3.78
4.02
4.35
5.45
5.59
6.00
6.60
7.44
9.00
10.4
15.0
16.4
87.7
93.4
114
575
226
91
84
207
68
43
59
38
113
34
41
38
116
266
37
54
42
51
63
50
42
127
48
99
62
509
69
853
500
1 45
120
321
672
657
241
108
94
302
69
57
68
51
147
33
39
40
117
319
29
62
51
47
69
51
44
149
763
151
267
238
544
162
106
140
74
370
51
61
56
178
426
49
51
69
92
103
76
58
195
58
103
113
52
544
120
127
561
73
57
873
644
202
907
817
296
235
449
786
757
144
70
78
149
81
67
87
38
135
34
29
47
91
223
30
36
47
71
73
65
35
107
84
63
77
155
54
338
783
102
802
784
184
79
89
103
83
54
70
39
93
49
34
47
98
1 47
36
66
71
78
82
75
41
148
96
120
109
314
73
601
520
173
1 42
Table 4.21.
Empirical 5% level power (in % xlO) for tests of departure
from the Gumbel distribution (sample size = 50)
Statistics
No. Distribution
/6.
1 N(0,1)+N(10,1)
2 Beta(0.5,0.5)
3 N(0,1)+N(5,1)
4 SB(0,0.5)
5 N(0,1)+N(4,1)
6 Tukeyd.S)
7 Uniform(0,1)
8 SB(0,0.707)
9 Tukey(0.7)
10 TruncN(-1,1)
11 N(0,1)+N(3,1)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
15 Triangle 1(1)
15 N(0,1 )+N(2,1)
17 TruncN(-3,3)
18 N(0,1 )+N(1,1)
19 N(0,1)
20 SU((0,3)
21 t(10)
22 Logistic(0,1)
23 SU(0,2)
24 Tukey(IO)
25 Laplace(0,1 )
26 ScConN(0.2,3)
27 ScConN(0.05,3)
28 ScConN(0.1,3)
29 SoConN(0.2,5)
30 ScConN(0.2,7)
31 ScConNCO.I,5)
32 SoConN(0.05,5)
33 ScConN(0.1,7)
34 SoCohN(0.05,7)
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
35 SU(0,1)
X'/g
XI
X3
X5
XlO
X13
1000
1000
964
1000
1000
1000
1000
934
800
702
512
422
396
224
230
980
744
802
580
534
822
974
718
782
546
716
942
430
594
262
202
202
708
948
472
630
290
214
178
136
1 24
186
80
96
86
118
Bz
1.15
1.50
1.51
1.63
1 .72
1.75
1.80
1.87
1.92
1.94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
2.92
3.00
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
12.8
16.5
20.0
21.5
31.4
36.2
906
540
546
284
310
260
160
144
146
124
126
92
88
82
72
80
80
76
80
98
110
120
998
208
310
138
188
668
826
496
312
664
480
350
264
416
372
344
374
278
510
438
444
408
464
336
254
224
230
246
224
218
236
194
184
162
116
98
108
128
114
100
120
130
162
1 66
226
174
164
174
184
242
292
314
282
282
334
362
344
1000
1000
1000
350
404
194
524
622
384
458
856
954
702
492
826
596
326
762
908
614
408
740
556
482
252
214
638
668
156
150
246
118
1 24
150
178
142
210
216
234
364
394
404
418
998
88
182
188
210
300
350
398
404
672
674
740
998
716
762
426
522
468
592
434
576
888
916
956
756
552
830
650
716
972
798
578
928
972
802
588
874
862
702
712
796
794
143
Table 4.22.
Empirical 5% level power (in % xlO) for tests of departure
from the Gumbel distribution (sample size =50)
Statistics
No. Distribution
35 SU(0,0.9)
37 t(4)
38 t(2)
39 t(1)
40 Cauchy(0,1)
41 88(0.5333,0.5)
42 TruncN(-2,1)
43 Beta(3,2)
44 Beta(2,1)
45 TruncN(-3,2)
46 Weibull(3.6)
47 Weibull(4)
48 SB(1,2)
49 TruncN(-3,1 )
50 SB(1,1)
51 Weibull(2.2)
52 LoConN(0.2,3)
53 LoConN(0.2,5)
54 LoConN(0.2,7)
55 Weibull(2)
56 Half N(0,1)
57 LoConN(0.1,3)
58 LoConN(0.05,3)
59 LoConN(0.1,5)
60 SU(-1,2)
61 Chi-square(4)
62 LoConN(0.1,7)
63 LoConN(0.05,5)
64 Exponential(l)
65 LoConN(0.05,7)
66 Chi-square(1)
67 Triangle 11(1)
68 Weibull(0.5)
69 SU(1,1)
70 LogN(0,1,0)
X'/z
XI
X3
X5
XlO
X13
450
174
482
890
926
528
186
136
370
104
48
86
50
206
62
28
60
256
670
42
88
56
58
140
54
46
388
82
192
156
922
1 20
996
852
326
586
246
620
938
958
698
304
220
656
134
102
126
52
376
98
56
62
370
858
62
138
82
74
190
68
80
488
108
292
184
962
164
1000
934
414
738
426
756
978
988
832
552
414
864
234
204
236
84
640
86
66
80
568
918
68
178
76
90
242
82
98
454
132
480
146
974
234
1000
986
634
802
502
794
980
992
566
676
578
916
320
246
278
102
764
74
60
70
594
856
72
86
96
126
188
112
88
378
152
376
160
990
1 22
1000
994
674
862
606
854
988
996
560
276
292
552
278
210
268
80
482
74
66
88
396
752
44
90
138
162
176
104
90
370
218
190
242
574
156
908
1000
354
874
582
874
990
998
594
170
1 66
292
200
180
174
80
314
94
52
70
252
632
50
106
1 26
1 44
212
1 44
62
382
222
226
276
662
196
946
986
312
2:
0
0
0
0
0
0.65
-.32
0.29
-.57
-.18
0.00
-.09
0.28
-.55
0.73
0.51
0.68
1.07
1.25
0.63
0.97
0.80
0.68
1.54
0.87
1.41
1.96
1.65
2.00
2.42
2.83
0:57
6.62
-5.3
6.18
82.1
2.13
2.27
2.36
2.40
2.65
2.72
2.75
2.77
2.78
2.91
3.04
3.09
3.16
3.20
3.25
3.78
4.02
4.35
5.45
5.59
6.00
6.60
7.44
9.00
10.4
15.0
16.4
87.7
93.4
114
144
Table 4.23.
Empirical 5? level power (in % xlO) for tests of departure
from the Gumbel distribution (sample size = 100)
Statistics
No. Distribution
/8i
1
1
2
3
4
5
6
7
8
9
10
11
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
N(0,1)+N(10,1)
N(0,1)+N(10,l)
Beta(0.5,0.5)
N(0,l)+N(5,l)
SB(0,0.5)
N(0,1)+N(4,l)
Tukeyd.S)
Uniform(0,1)
SB(0,0.707)
Tukey(0.7)
TruncN(-l,1)
N(0,l)+N(3,1)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
15 Triangle 1(1)
16 N(0,1)+N(2,l)
17 TruncN(-3,3)
18 N(0,1)+N(1,1)
19 N(0,1)
20 SU((0,3)
21 t(10)
22 Logistic(0,1)
23 SU(0,2)
24 Tukey(IO)
25 Laplace(0,1 )
25 ScConN(0.2,3)
27 ScConN(0.05,3)
28 ScConN(0.1,3)
29 SGConN(0.2,5)
30 ScConN(0.2,7)
31 ScConN(0.1 ,5)
32 ScConN(0.05,5)
33 SGConN(0.1,7)
34 ScConN(0.05,7)
35 SU(0,1)
Xi/z
XI
X3
X5
XlO
XI7
1000
1000
1000
• 884
824
536
640
476
348
292
312
272
308
152
132
176
136
128
1 60
1 40
168
240
192
216
1000
472
564
288
424
920
976
764
1000
1000
1000
• 980
928
748
796
672
484
376
432
352
372
212
160
208
152
148
220
228
208
292
1000
1000
1000
1000
• 984
948
892
836
61 2
604
552
524
408
272
212
264
196
208
192
216
268
360
392
424
1000
732
812
460
648
984
996
852
720
944
784
900
1000
1000
1000
1000
975
988
904
872
836
764
740
800
6l 5
544
424
444
380
336
404
440
444
580
608
596
1000
876
892
576
768
992
1000
908
828
964
855
956
1000
1000
984
1000
952
988
776
756
760
680
664
864
596
564
412
476
564
436
532
588
5l 5
732
736
732
1000
932
932
596
856
996
1000
940
848
976
896
976
1000
1000
972
1000
832
952
544
440
400
368
356
504
340
304
292
320
364
432
508
552
640
748
736
776
1000
924
944
712
884
1000
1000
948
868
984
896
988
Ba
1.15
1.15
1.50
1 .51
1.63
1.72
1.75
1.80
1.87
1.92
1.94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
2.92
3.00
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
12.8
16.5
20.0
21.5
31.4
36.2
588
888
724
644
288
332
1000
624
688
384
484
952
1000
808
652
908
760
760
145
Table 4.24.
Empirical 5? level power (in ? xlO) for tests of departure
from the Gumbel distribution (sample size = 100)
Statistics
No. Distribution
36 SU(0,0.9)
37 t(4)
38 t(2)
39 t(l)
40 Cauchy(0,1)
41 38(0.5333,0.5)
42 TruncN(-2,1)
43 Beta(3,2)
44 Beta(2,1)
45 TruncN(-3,2)
46 Weibull(3.6)
47 Weibull(4)
48 SB(1,2)
49 TruncN(-3,1)
50 SB(1,1)
51 Weibull(2.2)
52 LoConN(0.2,3)
53 LoConN(0.2,5)
54 LoConN(0.2,7)
55 Weibull(2}
56 Half N(0,1)
57 LoConN(0.1,3)
58 LoConN(Q.05,3)
59 LoConN(0.1,5)
50 SU(-1,2)
61 Chi-square(4)
62 LoConN(0.1,7)
63 LoConN(0.05,5)
64 Exponentiaid )
65 LoConN(0.05,7)
66 Chi-square(1)
67 Triangle 11(1)
58 Weibull(0.5)
69 SU(1,1)
70 LogN(0,1,0)
X'/z
XI
X3
X5
XlO
XI7
732
420
844
496
856
1000
1000
936
544
404
904
256
158
184
136
920
61 6
916
1000
1000
992
736
628
984
352
184
268
92
844
100
40
64
820
1000
40
240
96
64
400
80
92
788
204
748
308
1000
316
1 000
1000
968
750
952
1000
1000
996
920
832
1000
556
392
504
156
976
128
76
148
948
1000
64
400
136
160
492
1 40
136
768
272
900
360
1000
460
1000
1000
928
972
992
872
972
1000
1000
928
Bz
0
0
0
0
0
0.55
-.32
0.29
-.57
-.18
0.00
-.09
0.28
-.55
0.73
0.51
0.68
1.07
1 .25
0.63
0.97
0.80
0.68
1 .54
0.87
1 .41
1 .95
1.65
2.00
2.42
2.83
0.57
6.62
-5.3
6.18
82.1
CO
CO
00
CO
2.13
2.27
2.36
2.40
2.65
2.72
2.75
2.77
2.78
2.91
3.04
3.09
3.16
3.20
3.25
3.78
4.02
4.35
5.45
5.59
6.00
6.50
7.44
9.00
10.4
15.0
16.4
87.7
93.4
114
788
996
992
824
344
324
756
176
108
1 40
92
408
124
80
104
500
932
80
172
72
84
288
80
54
836
172
416
512
7 000
248
1000
980
604
628
168
116
92
648
996
60
228
116
80
340
76
80
936
204
636
492
1000
276
1000
996
728
888
816
976
1000
1000
864
976
904
1000
644
524
616
184
988
95
100
132
952
1000
84
180
180
264
372
164
80
584
292
724
404
1000
248
1000
1000
908
656
612
944
484
452
544
1 68
864
104
116
116
772
968
76
196
192
324
312
224
112
664
404
388
508
948
324
1000
1000
•720
146
Table 4.25.
Empirical 5? level power (in % xlO) for tests of departure
from the exponential distribution (sample size = 20)
Statistics
No. Distribution
/6i
1 N(0,1)+N(10,1)
2 Beta(0.5,0.5)
3 N(0,1)+N(5,1)
4 SB(0,0.5)
5 N(0,1)+N(4,1)
6 Tukey(1.5)
7 Uniform(0,1)
8 SB(0,0.707)
9 Tukey(0.7)
10 TruncN(^1,1)
11 N(0,1)+N(3,1)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
15 Triangle 1(1)
16 N(0,1)+N(2,1)
17 TruncN(-3,3)
18 N(0,1)
19 SU((0,3)
20 t(10)
21 Logistic(0,1 )
22 SU(0,2)
23 Tukey(IO)
24 Laplace(0,1)
25 ScConN(0.2,3)
26 ScConN(0.05,3)
27 SoConN(0.1,3)
28 ScConN(0.2,5)
29 ScConN(0.2,7)
30 ScConN(0.1,5)
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
X'/a
XI
X3
X5
X7
908
543
445
311
331
253
226
197
211
218
264
208
211
217
253
245
260
284
335
358
353
347
963
490
493
349
425
660
750
538
967
575
613
397
468
382
304
290
302
308
342
300
331
321
350
361
407
424
467
481
507
493
973
635
6l 6
457
526
726
806
624
858
527
693
511
61 6
515
492
506
523
532
601
531
584
580
625
593
646
648
666
696
694
706
976
795
729
662
687
812
871
733
890
148
221
105
1 44
129
133
147
197
190
230
254
319
408
461
416
536
580
601
656
646
680
964
803
758
642
681
837
865
757
708
210
308
154
271
137
130
151
161
182
258
195
248
336
330
349
432
479
502
549
553
571
914
679
668
531
583
787
848
688
Bz
1.15
1 .50
1 .51
1 .63
1 .72
1 .75
1 .80
1.87
1.92
1.94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
3.00
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
12.8
16.5
147
Table 4.26.
Empirical 5% level power (in % xlO) for tests of departure
from the exponential distribution (sample size = 20)
Statistics
No. Distribution
31 ScConN(0.05,5)
32 ScConN(0.1,7)
33 ScConN(0.05,7)
34 SU(0,1)
35 SU(0,0.9)
36 t(4)
37 t(2)
38 t(1)
39 Cauchy(0,1)
40 58(0.5333,0.5)
41 TruncN(-2,1)
42 Beta(3,2)
43 Beta(2,1)
44 TruncN(-3,2)
45 Weibull(3.5)
46 Weibull(4)
47 SB(1,2)
48 TruncN(-3,1)
49 SB(1,1)
50 Weibull(2.2)
51 LoConN(0.2,3)
5'2 LoConN(0.2,5)
53 LoConN(0.2,7)
54 TruncE(0,3)
55 Weibull(2)
56 LoConE(0.2,7)
57 LoConE(0.2,5)
58 Half N(0,1)
59 LoConN(0.1,3)
50 LoConE(0.2,3)
/6i
Bz
0
0
0
0
0
0
0
0
0
0.65
-.32
0.29
-.57
-.18
0.00
-.09
0.28
-.55
0.73
0.51
0.68
1.07
1.25
0.99
0.63
1.33
1.25
0.97
0.80
1.20
20.0
21.5
31.4
36.2
82.1
CO '
00
2.13
2.27
2.35
2.40
2.65
2.72
2.75
2.77
2.78
2.91
3.04
3.09
3.16
3.20
3.22
3.25
3.27
3.40
3.78
4.02
4.09
X'/z
XI
X3
X5
X7
397
590
455
536
568
442
582
815
808
108
374
363
571
304
282
267
169
478
81
122
159
146
267
41
104
• 42
37
56
191
55
504
656
550
659
686
549
682
869
861
112
498
517
732
454
409
409
266
662
97
162
200
220
344
55
170
60
36
71
268
61
584
752
702
799
809
717
810
917
906
107
799
764
938
587
551
644
451
903
101
308
293
265
356
50
262
53
51
89
432
79
659
758
597
819
821
717
834
927
912
73
464
499
512
554
499
508
364
633
80
255
334
236
288
47
217
59
62
71
474
63
562
724
626
707
747
613
762
903
911
75
393
422
491
449
409
443
287
551
70
199
344
312
305
55
174
50
50
55
383
46
148
Table 4.27.
Empirical 5? level power (in % xlO) for tests of departure
from the exponential distribution (sample size = 20)
Statistics
No. Distribution
61 TruncE(0,4)
52 LoConN(0.05,3)
63 TruncE(0,5)
64 Gumbel(0,1)
65 LoConN(0.1,5)
66 SU(-1,2)
67 LoConE(0.1,3)
68 Chi-square(4)
69 LoConE(0.1,5)
70 TruncE(0,6)
71 LoConN(0.1,7)
72 LoConE(0.05,3)
73 LoConN(0.05,5)
74 LoConE(0.05,7)
75 ScConE(0.05,2)
76 Chi-square(l)
77 ScConECO.1,2)
78 ScConE(0.2,2)
79 LoConE(0.01,7)
80 Triangle 11(1)
81 ScConE(0.01,3)
1.27
0.68
1.50
1.14
1.54
0.87
1.62
1.41
1.88
1.68
1.96
1.85
1.65
2.75
2.42
2.83
2.61
2.71
2.94
0.57
2.59
82 ScConECO.2,3)
3.57
83 ScConECO.1,3)
3.81
84 ScConECO.05,3) 3.60
85 ScConECO.2,7)
4.50
86 ScConECO.1,5)
5.38
87 ScConECO.1,7) 6.02
88 ScConECO.01,5) 4.81
89 ScConECO.05,5) 6.05
90 Weibull(0.5)
6.62
91 SU(1,1)
-5.3
92 LogN(0,1,0)
6.18
X'/z
XI
X3
X5
X7
54
231
36
95
187
187
46
57
45
46
214
37
217
49
42
263
29
48
55
69
62
61
57
54
199
66
123
40
60
625
893
44
42
319
36
124
280
282
45
66
32
50
279
43
317
51
49
296
35
58
42
79
53
76
61
52
294
73
1 43
35
70
690
948
54
41
530
31
213
377
464
48
90
44
48
378
42
481
44
33
303
46
63
42
102
• 42
88
53
52
388
93
215
41
66
708
981
63
37
553
38
237
431
476
33
79
41
41
429
43
505
38
35
284
50
57
48
57
52
95
77
59
439
139
270
48
85
684
938
70
34
409
37
230
425
432
45
82
43
41
435
39
424
43
37
226
54
42
57
45
47
103
64
49
409
1 34
271
52
89
591
892
79
32
4.20
4.35
5.26
5.40
5.45
5.59
5.86
6.00
6.02
6.29
6.60
7.29
7.44
10.9
13.6
15.0
15.3
15.6
15.9
16.4
18.0
23.8
29.4
29.8
31:5
48.7
56.2
66.7
68.2
87.7
93.4
114
149
Table 4.28.
Empirical 5% level power (in % xlO) for tests of departure
from the exponential distribution {sample size = 50)
Statistics
No. Distribution
/B
Bz
1 N(0,1)+N(10,1)
2 Beta(0.5,0.5)
3 N(0,1)+N(5,1)
4 SB(0,0.5)
5 N(0,1)+N(4,1)
5 Tukeyd.S)
7 Uniform(0,1)
8 SB(0,0.707)
9 Tukey(0.7)
1 0 TruncN(-1,1)
11 N(0,1)+N(3,1)
12 TukeyO)
13 Beta(2,2)
14 TrunoN(-2,2)
15 Triangle 1(1)
15 N(0,1)+N(2,1)
17 TruncN(-3,3)
18 N(0,1)
19 SU((0,3)
20 t(10)
21 Logistic(0,1)
22 SU(0,2)
23 Tukey(IO)
24 Laplace(0,1)
25 ScConN(0.2,3)
26 ScConN(0.05,3)
27 SeConN(0.1,3)
28 ScConN(0.2,5)
29 ScConN(0.2,7)
30 ScConN(0.1,5)
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1.15
1.50
1 .51
1.63
1.72
1.75
1.80
1.87
1.92
1.94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
3.00
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
12.8
16.5
XV2
XI
X3
X5
XlO
XI3
1000
924
804
592
666
540
488
440
422
436
590
454
494
512
590
600
682
696
736
784
780
816
1000
• 924
930
762
882
972
976
922
1000
964
944
822
878
750
708
684
656
694
782
696
718
778
822
784
866
860
896
914
940
930
1000
976
982
920
930
988
986
952
1000
960
992
888
970
922
892
934
914
918
952
932
964
972
972
952
972
972
982
992
992
992
1000
996
1000
972
982
996
994
992
1000
978
996
962
994
978
974
980
976
982
988
986
996
992
996
988
988
988
996
996
1000
996
1000
998
1000
994
990
1000
996
994
1000
522
662
386
732
506
588
722
772
776
870
874
930
972
986
958
986
990
988
994
1000
996
1000
998
1000
994
992
1000
996
1000
988
516
786
318
666
414
464
560
554
654
788
802
852
952
968
946
976
982
978
994
1000
996
1000
998
1000
980
992
1000
998
1000
150
Table 4.29.
Empirical 5? level power (in % xlO) for tests of departure
from the exponential distribution (sample size = 50)
Statistics
No. Distribution
31 ScConN(0.05,5)
32 ScConN(0.1,7)
33 ScConN(0.05,7)
34 SU(0,1)
35 SU(0,0.9)
36 t(4)
37 t(2)
38 t(1)
39 Cauchy(0,1)
40 SB(0.5333,0.5)
41 TruncN(-2,1)
42 Beta(3,2)
43 Beta(2,1)
44 TruncN(-3,2)
45 Weibull(3.6)
46 Weibull(4)
47 SB(1,2)
48 TruncN(-3,1)
49 SBd.l)
50 Weibull(2.2)
51 LoConN(0.2,3)
52 LoConN(0.2,5)
53 LoConN(0.2,7)
54 TruncE(0,3)
55 Weibull(2)
56 LoConE(0.2,7)
57 LoConE(0.2,5)
58 Half N(0,1)
59 LoConN(0.1,3)
60 LoConE(0.2,3)
X'/:
XI
X3
X5
XlO
X13
840
938
848
942
956
876
952
996
992
174
808
736
962
724
654
712
458
894
94
254
344
460
626
64
232
42
54
72
482
52
928
972
940
984
996
944
990
1000
•996
266
958
906
992
924
856
892
708
982
176
482
516
664
856
76
364
74
68
118
696
74
978
994
984
998
1000
988
994
1000
1000
380
998
998
1000
992
970
988
920
1000
266
760
780
852
920
60
652
82
88
150
894
96
994
998
996
998
1000
1000
1000
1000
1000
292
1000
1000
1000
998
992
996
964
1000
366
862
868
848
968
50
786
96
102
172
940
94
994
1000
996
998
1000
1000
1000
1000
1000
• 222
994
988
994
992
986
996
962
996
342
848
884
862
924
66
832
100
122
166
964
106
994
998
990
998
1000
998
1000
1000
1000
220
972
966
986
986
974
994
946
992
308
838
882
782
830
48
768
72
70
136
952
82
Bz
0
0
0
0
0
0
0
0
0
0.65
-.32
0.29
-.57
-.18
0.00
-.09
0.28
-.55
0.73
0.51
0.68
1.07
1.25
0.99
0.63
1.33
1.25
0.97
0.80
1.20
20.0
21 .5
31.4
36.2
82.1
CO
CO
2.13
2.27
2.36
2.40
2.65
2.72
2.75
2.77
2.78
2.91
3.04
3.09
3.16
3.20
3.22
3.25
3.27
3.40
3.78
4.02
4.09
151
Table 4.30.
Empirical 5% level power (in % xlO) for tests of departure
from the exponential distribution (sample size = 50)
Statistics
No. Distribution
61 TruncE(0,if)
62 LoConN(G.05,3)
63 TruncE(0,5)
64 Gumbel(0,1)
65 LoConN(0.1,5)
66 SU(-1,2)
67 LoConE(0.1,3)
68 Chi-square(4)
69 LoConE(0.1,5)
70 TruncE(0,6)
71 LoConN(0.1,7)
72 LoConE(0.05,3)
73 LoConN(0.05,5)
74 LoConE(0.05,7)
75 ScConE(0.05,2)
76 Chi-square(l)
77 ScConE(0.1,2)
78 ScConE(0.2,2)
79 LoConE(0.01 ,7)
80 Triangle 11(1)
81 ScConE(0.01,3)
82 ScConE(0.2,3)
83 ScConE(0.1,3)
84 ScConE(0.05,3)
85 ScConE(0.2,7)
86 ScConE(0.1,5)
87 ScConE(0.1,7)
88 ScConE(0.01,5)
89 ScConE(0.05,5)
90 Weibull(0.5)
91 SUd.l)
92 LogN(0,1,0)
X'/z
XI
X3
X5
XlO
XI3
30
598
32
284
492
548
44
90
34
30
562
46
602
42
50
530
42
46
42
76
26
80
48
28
984
188
464
64
66
940
998
54
50
770
52
390
704
758
66
128
30
58
774
56
764
56
80
638
34
54
36
150
32
106
40
54
998
246
538
50
930
42
658
878
904
48
186
48
74
924
38
912
42
62
728
36
68
44
230
52
142
64
44
996
300
568
64
78
982
1000
86
48
970
38
760
932
956
62
252
58
36
942
46
954
60
64
742
44
58
42
304
50
166
92
70
998
346
626
72
68
990
1000
112
54
990
44
832
944
960
60
314
44
48
926
58
980
54
58
710
60
62
48
188
50
204
102
64
998
432
726
64
140
982
1000
140
46
982
48
846
962
966
60
322
42
52
954
52
976
46
52
662
62
62
34
120
54
178
84
68
1000
464
740
64
130
982
1000
136
Gz
1.27
0.68
1.50
1.14
1.54
0.87
1.62
1.41
1.88
1.68
1.96
1.85
1.65
2.75
2.42
2.83
2.61
2.71
2.94
0.57
2.59
3.57
3.81
3.60
4.50
5.38
6.02
4.81
6.05
6.62
-5.3
6.18
4.20
4.35
5.26
5.40
5.45
5.59
5.86
6.00
6.02
6.29
6.60
7.29
7.44
10.9
13.6
15.0
15.3
15.6
15.9
16.4
18.0
23.8
29.4
29.8
31.5
48.7
56.2
66.7
68.2
87.7
93.4
114
72
68
974
1000
• 76
152
Table ^,31-
Empirical 5% level power (in % xlO) for tes ts of departure
from the exponential distribution (sampl e size = 100)
Statistics
No. Distribution
1
2
3
4
5
6
7
8
9
10
11
N(0,l)+N(10,1)
Beta(0.5,0.5)
N(0,1)+N(5,1)
SB(0,0.5)
N(0,1)+N(4,1)
Tukey(1.5)
Uniform(0,1)
SB(0,0.707)
Tukey(0.7)
TruncN(-1,1)
N(0,l)+N(3,1)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
15 Triangle 1(1 )
16 N(0,1)+N(2,1)
17 TruncN(-3,3)
18 N(0,1)
19 SU((0,3)
20 t(10)
21 Logistic(0,1 )
22 SU(0,2)
23 Tukey(lO)
24 Laplace(0,1)
25 ScConN(0.2,3)
26 ScConN(0.05,3)
27 ScConN(0.1,3)
28 ScConN(0.2,5)
29 ScConN(0.2,7)
30 ScConN(0.1,5)
/6
02
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1.15
1 .50
1 .51
1.63
1 .72
1.75
1.80
1.87
1.92
1.94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
3.00
3.53
4,00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
12.8
16.5
XV,
XI
X3
X5
XlO
X17
1000
992
992
836
916
796
760
732
688
708
880
756
832
868
152
892
936
936
972
968
968
984
1000
996
1000
976
992
1000
1000
992
1000
1000
996
968
988
936
944
928
920
920
980
952
960
976
208
988
1000
1000
1000
1000
988
1000
1000
1000
1000
996
996
1000
1000
1000
1000
1000
1000
1000
1000
992
1000
1000
1000
992
996
1000
1000
1000
408
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
996
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
596
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
620
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
868
1000
844
1000
924
948
992
992
984
1000
1000
1000
996
384
1000
1000
1000
1000
1000
1 000
1000
1000
1000
1000
1000
1000
1000
1000
1000
153
Table 4.32.
Empirical 5% level power (in % xlO) for tests of departure
from the exponential distribution (sample size = 100)
Statistics
No. Distribution
31 ScConN(0.05,5)
32 ScConN(0.1,7)
33 ScConN(0.05,7)
34 SU(0,1)
35 SU(0,0.9)
36 t(4)
37 t(2)
38 t(1 )
39 Cauchy(0,1)
40 SB(0.5333,0.5)
41 TruncN(-2,1)
42 Beta(3,2)
43 Beta(2,1)
44 TruncN(-3,2)
45 Weibull(3.5)
45 Weibull(4)
47 SB(1,2)
48 TruncN(-3,1)
49 SB(1,1)
50 Weibull(2.2)
51 LoConN(0.2,3)
52 LoConN(0.2,5)
53 LoConN(0.2,7)
54 TruncE(0,3)
55 Wei bull(2)
56 LoConE(0.2,7)
57 LoConE(0.2,5)
58 Half N(0,1)
59 LoConN(0.1,3)
60 LoConE(0.2,3)
XV,
XI
X3
X5
XlO
X17
976
1000
1000
1000
996
1000
1000
1000
1000
312
988
988
1000
960
932
932
780
992
184
564
720
772
940
56
44
48
52
112
864
52
996
1000
1000
1000
1000
1000
1000
1000
1000
• 488
1000
1000
1000
1000
996
1000
956
1000
308
792
896
920
984
88
672
68
56
116
968
84
1000
1000
1000
1000
1000
1000
1000
1000
1000
624
1000
1000
1000
1000
1000
1000
1000
1000
• 540
992
976
996
1000
88
984
104
60
236
1000
96
1000
1000
1000
1000
1000
1000
1000
1000
1000
708
1000
1000
1000
1000
1000
1000
1000
1000
• 728
992
996
996
1000
136
984
144
1 44
296
1000
132
1000
1000
1000
1000
1000
1000
1000
1000
1000
• 444
1000
1000
1000
1000
1000
1000
1000
1000
840
1000
996
1000
1000
76
1000
192
148
392
1000
148
1000
1000
1000
1000
1000
1000
1000
1000
1000
• 404
1000
1000
1000
1000
1000
1000
1000
1000
• 804
1000
996
1000
1000
100
992
160
204
340
1000
148
62
0
0
0
0
0
0
0
0
0
0.65
-.32
0.29
-.57
-.18
0.00
-.09
0.28
-.55
0:73
0.51
0.68
1.07
1.25
0.99
0.63
1.33
1.25
0.97
0.80
1.20
20.0
21 .5
31 .4
35.2
82.1
2.13
2.27
2.36
2.40
2.65
2:72
2.75
2.77
2.78
2.91
3.04
3.09
3.16
3.20
3.22
3.25
3.27
3.40
3.78
4.02
4.09
154
Table 4.33.
Empirical 5? level power (in % xlO) for tests of departure
from the exponential distribution (sample size = 100)
Statistics
No. Distribution
61 TruncE(0,4)
62 LoConN(0.05,3)
63 TruncE(0,5)
64 Gumbel(0,1)
65 LoConN(0.1,5)
66 SU(-1,2)
67 LoConE(0.1,3)
68 Chi-square(4)
69 LoConE(0.1,5)
70 TruncE(0j6)
71 LoConN(0.1,7)
72 LoConE(0.05,3)
73 LoConN(0.05,5)
74 LoConE(0.05,7)
75 ScConE(0.05,2)
76 Chi-square(l)
77 ScConE(0.1,2)
78 ScConE(0.2,2)
79 LoConE(0.01,7)
80 Triangle 11(1)
81 ScConE(0.01 ,3)
82 ScConE(0.2,3)
83 ScConE(0.1,3)
84 ScConE(0.05,3)
85 ScConE(0.2,7)
86 ScConE(0.1,5)
87 ScConE(0.1,7)
88 ScConE(0.Q1 ,5)
89 ScConE(0.05,5)
90 Weibull(0.5)
91 SU(1,1)
92 LogN(0,1,0)
/6i
Ba
1.27
0.68
1 .50
1.14
1.54
0.87
1.62
1.41
1.88
1.68
1.96
1.85
1.65
2.75
2.42
2.83
2.61
2.71
2.94
0.57
2.59
3.57
3.81
3.60
4.50
5.38
6.02
4.81
6.05
6.62
-5.3
6.18
4.20
4.35
5.26
5.40
5.45
5.59
5.86
6.00
6.02
6.29
6.60
7.29
7.44
10.9
13.6
15.0
15.3
15.6
15.9
16.4
18.0
23.8
29.4
29.8
31.5
48.7
56.2
66.7
68.2
87.7
93.4
114
XV2
XI
X3
X5
XlO
XI7
36
876
36
600
844
872
52
84
48
40
904
48
916
20
60
924
28
48
24
108
44
976
840
56
996
992
996
28
104
1000
1000
• 60
56
972
36
768
956
956
40
120
44
64
988
52
984
32
52
932
28
88
36
188
44
980
848
60
1000
992
996
44
112
1000
1000
• 104
48
996
60
956
992
1000
48
332
88
44
996
76
1000
48
52
944
48
120
40
376
52
976
844
60
1000
• 992
996
40
1 28
1000
1000
112
48
1000
• 48
996
996
1000
• 64
512
76
36
1000
60
1000
28
40
952
32
136
28
576
44
980
848
76
1000
1000
996
48
172
1000
1000
1 48
52
1000
60
1000
1000
1000
80
700
88
44
1000
64
1000
40
60
964
52
180
28
612
44
984
840
52
1000
996
996
32
192
1000
1000
156
68
1000
64
1000
1000
1000
• 76
728
60
48
996
64
1000
36
28
960
32
172
32
384
60
984
844
72
1000
1000
1000
36
232
1000
1000
184
155
chi-square statistic with a smaller expected cell count is desirable.
In this case, the use of more cells will provide a larger number of
sizeable differences between the observed cell counts and the expected
cell counts for the null hypothesis.
However, there is a limit to the
extent of the refinement of the partition.
Cell counts which are mostly
one or zero provide little power for detecting alternatives when the
expected cell counts are nearly equal under the null hypothesis.
The
number of one and zero counts will be similar for many alternative
distributions.
The XV2 statistic consistently performed poorly
relative to the other chi-square statistics.
This suggests that the use
of the chi-square or likelihood ratio statistic with expected cell count
less than one is not desirable.
When the Xm statistic is most powerful, the Gm statistic also tends
to be the most powerful likelihood ratio statistic.
The difference in
the power of the Xm and Gm statistics were generally quite small.
3.
Comparison of statistics based on the empirical distribution
function
The numbers in Tables 4.34 - 4.40 indicate the proportions of
simulated samples for which the null distribution was rejected.
largest number in each line was printed in bold.
The
Only the results for
sample size 20 are included in Tables 4.34 - 4.40.
Conclusions drawn
from the results for sample sizes 50 and 100 were very similar to those
for sample size 20.
The Cramer-von Mises type statistics are generally
more powerful than the Kolmogorov-Smirnov type statistics.
Within the
156
Cramer-von Mises type statistics, the Anderson-Darling statistic is the
most powerful for detecting a wide range of alternative distributions.
The Anderson-Darling statistic for the exponential case appeared to be
the weakest among all the statistics for small sample sizes.
The
location parameter of the exponential distribution was estimated using
the minimum of the observations.
values is thus equal to zero.
The smallest of the standardized
This poses a problem in the computation
of the Anderson-Darling statistic because the formula involves
log[F((Xi-a)/B)].
To overcome this problem, the value F((xi-a)/8) was
assigned the same value as F((x2-a)/3) if F((x2-a)/B) is less than
0.00001, otherwise it is assigned the value 0.00001.
The weak
performance of the Anderson-Darling statistic for the exponential case
is probably due to this modification.
For larger sample sizes, this
problem is not severe and A^ statistic is a powerful statistic.
The
Anderson-Darling statistic is usually more powerful than the Cramer-von
Mises or the Watson statistic for detecting alternative distributions
with long or heavy tails.
The Anderson-Darling statistic places more
emphasis to the tails of the distribution than the Cramer-von Mises
statistic.
For symmetrical alternatives to the normal distribution with
short tails, the Cramer-von Mises and Watson statistics performed
favorably.
Careful examination of columns corresponding to the
Cramer-von Mises and Watson statistics in Tables 4.34 - 4.40 reveals
that the Watson statistic is slightly more powerful in detecting
alternative distributions with short tails.
Within the class of
157
Table 4.3%.
Empirical 5% level power (in % xlO) for tests of departure
from the normal distribution (sample size = 20)
Statistics
A:
No. Distribution
1 N(0,l)+N(10,1)
2 Beta(0.5,0.5)
3 N(0,1)+N(5,1)
4 SB(0,0.5)
5 N(0,l)+N(4,1)
6 Tukey(1.5)
7 Uniform(0,1)
8 SB(0,0.707)
9 Tukey(0.7)
10 TruncN(^1,1)
11 N(0,1)+N(3,1)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
15 Triangle 1(1)
15 N(0,1)+N(2,1)
17 TruncN(-3,3)
18 N(0,1)+N(1,1)
19 SU((0,3)
20 t(10)
21 Logistic(0,1)
22 SU(0,2)
23 Tukey(IO)
24 Laplace(0,1)
25 ScConN(0.2,3)
25 ScConN(0.05,3)
27 ScConN(0.1,3)
28 ScConN(0.2,5)
29 ScConN(0.2,7)
30 ScConN(0.1,5)
31 ScConN(0.05,5)
32 ScConN(0.1,7)
33 ScConN(0.05,7)
34 SU(0,1)
35 SU(0,0.9)
U:
V
D
62
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1.15
1 .50
1.51
1.63
1 .72
1 .75
1.80
1 .87
1.92
1 .94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
2.92
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
12.8
16.5
20.0
21 .5
31.4
36.2
82.1
1000
1000
1000
1000
1000
612
819
504
833
298
477
170
1 46
• 97
106
90
162
50
45
31
44
48
43
49
70
83
89
545
875
340
524
1 92
163
113
125
• 98
194
486
819
286
464
165
151
107
118
107
171
67
51
34
37
53
48
56
62
95
79
102
929
235
276
128
208
633
328
653
177
348
1 07
97
78
89
79
139
45
37
32
42
41
52
55
57
79
82
90
902
201
253
1 21
199
605
785
422
245
592
390
349
429
366
459
203
167
109
125
95
153
50
44
31
40
43
42
44
59
85
101
115
908
248
354
161
274
692
847
491
298
674
440
437
528
107
929
241
316
138
251
675
838
466
278
639
411
423
517
65
51
34
47
52
45
49
63
89
91
104
934
252
308
1 41
233
676
837
454
268
637
409
419
520
.
816
435
261
5l 6
400
377
471
158
Table 4.35.
Empirical 5? level power (in % xlO) for tests of departure
from the normal distribution (sample size = 20)
Statistics
A"
No. Distribution
36 t(4)
37 t(2)
38 t(1)
39 CauGhy(0,])
40 SB(0.5333,0.5)
41 TruncN(-2,1)
42 Beta(3,2)
43 Beta(2,l)
44 TruncN(-3,2)
45 Weibull(3.6)
46 Weibull(4)
47 SB(1,2)
48 TruncN(-3,1)
49 SB(1,1)
50 Weibull(2.2)
51 LoConN(0.2,3)
52 LoConN(0.2,5)
53 LoConN(0.2,7)
54 Weibull(2)
55 Half N(0,1)
56 LoConN(0.1,3)
57 LoConN(0.05,3)
58 GumbeKG.I)
59 LoConN(0.1,5)
60 SU(-1,2)
61 Chi-Square(4)
62 LoConN(0.1,7)
63 LoConN(0.05,5)
64 Exponentiald )
65 LoConN(0,05,7)
66 Chi-square(1)
67 Triangle 11(1)
68 Weibull(0,5)
69 SU(1,1)
70 LogN(G,1,0)
V
D
163
450
860
871
550
76
61
212
39
48
51
53
92
191
74
196
771
970
97
261
168
126
210
584
151
355
824
359
687
596
954
206
998
636
853
161
427
842
843
445
64
46
169
39
44
61
45
103
201
87
214
747
962
107
228
175
128
220
595
1 49
325
815
374
565
579
884
176
982
612
778
32
0
0
0
0
0.65
-.32
0.29
-.57
-.18
0.00
-.09
0.28
-.55
0.73
0.51
0.68
1.07
1.25
0.63
0.97
0.80
0.68
1.14
1.54
0.87
1.41
1.96
1.65
2.00
2.42
2.83
0.57
6.62
-5.3
6.18
2.13
2.27
2.36
2.40
2.65
2.72
2.75
2.77
2.78
2.91
3.04
3.09
3.16
3.20
3.25
3.78
4.02
4.35
5.40
5.45
5.59
6.00
6.60
7.44
9.00
10.4
15.0
16.4
87.7
93.4
114
199
503
882
885
660
92
57
277
46
48
41
58
125
270
94
278
890
984
139
360
240
174
281
736
188
442
866
480
765
639
972
256
1000
723
908
187
493
884
885
580
82
57
238
45
48
49
59
113
241
90
271
876
983
131
308
221
1 60
259
679
184
406
854
428
728
617
956
230
998
695
875
179
487
881
889
585
91
61
239
50
55
51
61
112
225
87
251
864
983
123
284
205
146
240
658
170
380
849
406
689
613
949
229
994
679
864
159
Table 4.36.
Empirical 5% level power (in ? xlO) for tests of departure
from the Gumbel distribution (sample size = 20)
A2
Statistics
No. Distribution
/3 1
1 N(0,1)+N(10,1)
2 Beta(0.5,0.5)
3 N(0,1)+N(5,1)
H SB(0,0.5)
5 N(0,1)+N(4,1)
6 Tukeyd.S)
7 Uniform(0,1)
8 38(0,0.707)
9 Tukey(0.7)
10 TruncN(-1,1)
n N(0,1)+N(3,1)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
1 5 Triangle 1(1)
16 N(0,1)+N(2,1)
17 TruncN(-3,3)
18 N(0,1)+N(1,l)
19 N(0,1)
20 SU((0,3)
21 t(10)
22 Logistic(0,1)
23 SU(0,2)
24 Tukey(IO)
25 Laplace(0,1)
25 ScConN(0.2,3)
27 SoConN(0.05,3)
28 SGConN(0.1,3)
29 ScConN(0.2,5)
30 ScConN(0.2,7)
31 SGConN(0.1,5)
32 ScConN(0.05,5)
33 ScConN(0.1,7)
34 ScConN(0.05,7)
35 SU(0,1)
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
U2
V
D
1000
530
808
406
520
270
266
225
196
203
249
151
149
165
162
164
185
189
212
249
272
272
297
959
461
462
302
352
678
806
526
366
575
438
531
1000
576
798
378
508
251
254
201
175
200
247
1 42
136
151
132
167
170
154
171
217
245
271
282
952
439
439
287
343
664
786
525
345
565
423
500
1000
433
750
283
466
202
212
187
153
174
230
1 24
137
136
1 44
1 41
153
1 44
179
196
213
228
241
932
415
397
258
309
639
773
495
328
558
419
468
#2
1.15
1.50
1 .51
1.63
1.72
1.75
1.80
1.87
1.92
1.94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
2.92
3.00
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
12.8
16.5
20.0
21.5
31 .4
36.2
1000
694
794
450
496
305
287
236
204
213
238
173
153
178
177
175
203
203
219
257
291
299
310
935
467
469
331
363
708
823
551
385
616
479
539
1000
615
793
394
507
277
266
230
198
206
247
161
155
169
169
165
190
194
217
253
273
282
299
954
467
461
306
358
684
808
532
371
584
451
530
160
Table 4.37.
Empirical 5% level power (in % xlO) for tests of departure
from the Gumbel distribution (sample size = 20)
Statistics
No. Distribution
35 SU(0,0.9)
37 t(4)
38 t(2)
39 t(1)
40 Cauchy(0,1)
4Î 38(0.5333,0.5)
42 TrunoN(-2,1)
43 Beta(3,2)
44 Beta(2,1)
45 TruncN(-3,2)
46 Weibull(3.6)
47 Weibull(4)
48 SB(1,2)
49 TruncN(-3,1)
50 SB(1,1)
51 Weibull(2.2)
52 LoConN(0.2,3)
53 LoConN(0.2,5)
54 LoConN(0.2,7)
55 Weibull(2)
56 Half N(0,1)
57 LoConN(0.1,3)
58 LoConN(0.05,3)
59 LoConN(0.1,5)
60 SU(-1,2)
61 Chi-square(4)
62 LoConN(0.1,7)
63 LoConN(0.05,5)
64 Exponential(1)
65 LoConN(0.05,7)
66 Chi-square(1)
67 Triangle 11(1)
68 Weibull(0.5)
69 SU(1,1)
70 LogN(0,1,0)
/Si
Gz
0
0
0
0
0
0.65
-.32
0.29
-.57
-.18
0:00
-.09
0.28
-.55
0.73
0.51
0.68
1.07
1.25
0.63
0.97
0.80
0.68
1.54
0.87
1.41
1.96
1.65
2.00
2:42
2.83
0.57
6.62
-5.3
6.18
82.1
2.13
2.27
2.36
2.40
2.65
2.72
2.75
2.77
2.78
2.91
3.04
3.09
3.16
3.20
3.25
3.78
4.02
4.35
5:45
5.59
6.00
6.60
7.44
9.00
10.4
15.0
16.4 .
87.7
93.4
114
A"
W"
606
396
519
897
863
488
367
324
627
251
189
236
88
469
81
58
75
388
836
40
105
87
138
214
121
89
566
182
404
336
851
143
984
929
641
601
390
598
888
861
401
336
301
590
239
170
228
88
427
71
63
75
311
726
51
92
88
141
160
120
79
439
150
339
260
772
122
966
929
559
605
383
590
883
853
427
312
286
570
224
167
220
86
41 6
78
66
79
307
683
58
103
98
136
152
119
70
384
150
286
242
724
129
944
920
488
V
D
591
366
565
873
851
365
303
265
576
196
1 60
207
78
435
77
65
75
316
683
60
98
96
124
163
105
71
380
147
215
212
689
124
942
910
430
541
316
528
861
834
286
248
251
451
1 92
1 42
184
80
352
62
59
56
245
610
54
80
68
107
1 29
91
72
376
1 37
264
212
561
103
921
861
487
151
Table 4.38.
Empirical 5% level power (in % xlO) for tests of departure
from the exponential distribution (sample size = 20)
Statistics
V
No. Distribution
/6i
1 N(0,1)+N(10,1)
2 Beta(0.5,0.5)
3 N(0,1)+N(5,1)
4 SB(0,0.5)
5 N(0,1)+N(4,1)
6 Tukeyd.S)
7 Uniform(0,1)
8 SB(0,0.707)
9 Tukey(0.7)
10 TruncN(-1,1)
11 N(0,1)+N(3,1)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
15 Triangle 1(1)
15 N(0,1)+N(2,1)
17 TruncN(-3,3)
18 N(OJ )
19 SU((0,3)
20 t(10)
21 Logistic(0,1 )
22 SU(0,2)
23 Tukey(IO)
24 Laplace(0,1)
25 ScConN(0.2,3)
25 ScConN(0.05,3)
27 ScConNO.1,3)
28 SoConN(0.2,5)
29 ScConN(0.2,7)
30 ScConN(0.1,5)
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
D
62
1.15
1 .50
1 .51
1 .63
1 .72
1 .75
1 .80
1 .87
1 .92
1 .94
2.04
2.05
2.14
2.35
2.40
2.50
2.84
3.00
3.53
4.00
4.20
4.51
5.38
5.00
7.54
7.65
8.33
11.2
12.8
15.5
568
404
254
245
282
273
281
295
343
356
400
421
495
552
573
550
640
670
687
722
728
739
945
793
762
674
687
779
840
720
756
459
451
416
507
489
487
558
575
609
621
679
752
777
715
751
830
835
842
862
869
871
976
897
858
803
828
864
886
823
928
572
494
472
508
494
484
523
554
566
585
559
71 4
740
734
718
802
815
830
849
849
855
984
885
856
813
814
874
897
840
961
589
594
501
551
521
512
547
569
594
598
650
696
725
786
701
772
788
801
824
813
821
978
' 860
836
854
786
863
903
81 5
838
347
451
337
438
411
391
449
483
511
501
594
527
674
716
672
758
787
788
822
815
822
975
868
842
803
777
844
877
801
162
Table 4.39.
Empirical 5% level power (in % xlO) for tests of departure
from the exponential distribution (sample size = 20)
Statistics
No. Distribution
31 ScConN(0.05,5)
32 ScConN(0.1,7)
33 ScConN(0.05,7)
34 SU(0,1)
35 SU(0,0.9)
35 t(4)
37 t(2)
38 t(l)
39 Cauchy(0,l)
40 SB(0.5333,0.5)
41 TruncN(-2,l)
42 Beta(3,2)
43 Beta(2,1)
44 TruncN(-3,2)
45 Weibull(3.6)
46 Weibull(4)
47 SB(1,2)
48 TruncN(-3,1)
49 SB(1,1)
50 Weibull(2.2)
51 LoConN(0.2,3)
52 LoConN(0.2,5)
53 LoConN(0.2,7)
54 TruncE(0,3)
55 Weibull(2)
56 LoConE(0.2,7)
57 LoConE(0.2,5)
58 Half N(0,1)
59 LoConN(0.1,3)
60 LoConE(0.2,3)
/6i
G:
0
0
0
0
0
0
0
0
0
0.65
-.32
0.29
-.57
-.18
0.00
-.09
0.28
-.55
0.73
0.51
0.68
1.07
1.25
0.99
0.63
1.33
1.25
0.97
0.80
1.20
20.0
21.5
31.4
36.2
82.1
2.13
2.27
2.36
2.40
2.65
2.72
2.75
2.77
2.78
2.91
3.04
3.09
3.16
3.20
3.22
3.25
3.27
3.40
3.78
4.02
4.09
675
732
680
807
813
733
805
896
893
121
699
696
833
690
629
648
431
831
52
288
295
111
100
26
225
42
37
41
419
52
815
831
813
899
886
848
894
942
944
97
869
857
934
860
809
830
670
923
160
526
521
256
215
44
468
51
61
124
643
60
U"
V
D
810
850
826
898
898
840
899
955
950
168
853
842
926
841
793
813
649
918
151
506
504
289
341
60
433
75
77
1 23
623
79
784
834
806
882
883
819
884
943
937
154
859
847
944
815
767
779
607
920
137
465
443
286
373
56
392
• 78
67
112
571
81
774
805
774
889
871
821
878
934
936
78
788
789
862
796
755
775
602
873
1 65
467
503
259
202
49
413
6l
54
1 21
606
62
163
Table 4.40.
Empirical 5% level power (in % xlO) for tests of departure
from the exponential distribution (sample size = 20)
Statistics
No. Distribution
61 TruncE(0,4)
52 LoConN(0.05,3)
53 TruncE(0,5)
54 Guinbel(0,1)
55 LoConN(0.1,5)
56 SU(-1,2)
57 LoConE(0.1,3)
68 Chi-square(4)
59 LoConE(0.1,5)
70 TruncE(0,5)
71 LoConN(0.1,7)
72 LoConE(0.05,3)
73 LoConN(0.05,5)
74 LoConE(0.05,7)
75 ScConE(0.05,2)
75 Chi-square(l)
77 ScConE(0.1,2)
78 ScConE(0.2,2)
79 LoConE(0.01,7)
80 Triangle 11(1)
81 ScConE(0.01,3)
82 ScConE(0.2,3)
83 ScConE(0.1,3)
84 ScConE(0.05,3)
85 ScConE(0.2,7)
85 ScConE(0.1,5)
87 ScConE(0.1,7)
88 ScConE(0.01,5)
89 ScConE(0.05,5)
90 Weibull(0.5)
91 SU(1,1)
92 LogN(0,1,0)
A"
/B:
62
1 .27
0.68
1 .50
1.14
1 .54
0.87
1.62
1 .41
1.88
1 .58
1.96
1.85
1.65
2.75
2.42
2.83
2.61
2.71
2.94
0.57
2.59
3.57
3.81
3.60
4.50
5.38
6.02
4.81
6.05
6.52
-5.3
6.18
4.20
4.35
5.25
5.40
5.45
5.59
5.85
5.00
5.02
5.29
6.50
7.29
7.44
10.9
13.5
15.0
15.3
15.6
15.9
16.4
18.0
23.8
29.4
29.8
31.5
48.7
56.2
56.7
68.2
87.7
93.4
114
27
532
33
219
300
477
40
39
32
45
255
43
454
45
56
649
77
101
54
52
58
218
139
• 90
714
297
481
96
188
949
981
151
31
754
41
438
502
670
45
121
49
44
418
40
647
45
62
528
87
93
57
135
63
225
142
100
730
305
499
89
190
914
990
186
40
743
43
399
537
656
60
115
51
45
509
43
657
47
47
418
55
77
57
141
57
119
78
58
563
158
299
60
93
792
991
11 4
V
D
57
692
42
361
494
612
60
107
48
50
519
52
634
49
38
413
62
81
58
138
63
135
88
53
551
174
321
63
95
801
990
107
39
719
43
404
454
619
37
117
50
50
386
48
591
54
53
483
86
83
63
138
63
188
127
83
672
267
443
81
174
871
9
171
164
Kolmogorov-Smirnov type statistics, the Kuiper statistic generally
performed better than the Kolmogorov-Simirnov statistic.
For skewed
distributions in the normal and exponential cases, the
Kolmogorov-Smirnov statistic performed favorably.
For the Gumbel case,
the Kuiper statistic is alomst uniformly better than the Kolmogorov
statistic.
4.
Comparison of statistics based on moments
The numbers in Tables 4.41 - 4.47 indicate the proportions of
simulated samples for which the null distribution was rejected.
The
largest number in each line for each sample size, was printed in bold.
The skewness and kurtosis tests are directional.
Each is designed to
detect a particular type of departure from the hypothesized
distribution.
Any extreme value of kurtosis indicates tails too short
or too long compared to that of the hypothesized distribution.
Similarly, a large absolute skewness indicates asymmetry and a small
absolute skewness indicates near symmetry.
Note that the skewness
statistic is based on the third sample moment and it can yield a large
value when the random sample is from a distribution with long or heavy
tails.
This is true for both symmetrical and skewed distributions with
large kurtosis value.
The skewness test is in fact the most powerful
test in this class of statistics for detecting skewed distributions with
heavy tails for the normal and Gumbel cases.
As for symmetrical
distributions with heavy tails for the normal case, the skewness test
compared favorably with the kurtosis test.
The skewness test is the
155
Table 4.41.
Empirical 5% level power (in % xlO) for tests of departure
from the normal distribution
Sample Sizes
n = 20
Statistics
No. Distribution
/g,
1 N(0,1)+N(10,l)
2 Beta(0.5,0.5)
3 N(0,1)+N(5,1)
4 SB(0,0.5)
5 N(0,1)+N(4,l)
6 Tukey(1.5)
7 Uniform(0,l)
8 SB(0,0.707)
9 Tukey(0.7)
10 TruncN(-l,1 )
11 N(0,1)+N(3,1)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
15 Triangle 1(1)
15 N(0,1)+N(2,l)
17 TruncN(-3,3)
18 N(0,1)+N(1,1)
19 SU((0,3)
20 t(10)
21 Logistic(0,1)
22 SU(0,2)
23 Tukey(IO)
24 Laplace(0,1)
25 ScConN(0.2,3)
26 ScConN(0.05,3)
27 ScConN(0.1,3)
28 ScConN(0.2,5)
29 ScConN(0.2,7)
30 ScConN(0.1,5)
31 ScConN(0.05,5)
32 ScConN(0.1,7)
33 ScConN(0.05,7)
34 SU(Q,1)
35 SU(0,0.9)
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
n = 50
R
b^
/b^
R
914
650
713
449
402
303
255
203
182
1 42
172
101
59
44
36
46
38
58
77
107
115
133
493
260
393
209
331
692
797
552
347
685
468
439
495
868
709
764
520
473
379
354
272
245
205
231
136
98
57
55
56
35
51
80
95
99
112
490
248
379
197
312
676
777
534
347
678
455
412
495
55
3
18
15
17
9
7
8
9
3
15
3
9
12
10
17
36
56
90
129
130
149
367
259
384
213
336
598
565
514
346
631
455
423
466
1000
•994
992
948
925
864
790
732
508
588
466
340
262
88
70
88
34
56
95
1 65
200
236
718
486
716
444
572
958
990
892
672
946
804
746
848
n = 1 00
bj /bj
R
b^
/b;
1 0 1000
4 1000
6 1000
6 1000
8 1000
2 1000
2 996
0 992
2 972
2 960
10 792
0 824
2 664
2 168
2 136
16 184
25
20
48
52
106 152
152 21 2
182 350
224 412
320 936
376 728
516 912
390 536
440 828
688 996
728 1000
696 976
586 912
778 996
732 944
558 952
686 984
1000
1000
1000
1000
996
1000
996
992
996
972
864
884
764
268
240
248
32
40
150
224
364
428
955
772
908
640
840
996
1000
980
904
996
948
968
980
12
4
12
0
4
4
0
4
0
0
12
0
0
8
0
16
12
44
100
196
g%
1..15
1..50
1..51
1..53
1..72
1.:75
1.80
1.,87
1. 92
1.94
2.04
2.06
2.14
2.36
2.40
2.50
2.84
2.92
3.53
4.00
4.20
4.51
5. 38
5.00
7.54
7.65
8.33
n .2
12.8
16.5
20.0
21 .5
31 .4
35.2
82.1
1000
998
988
976
948
912
864
846
752
698
560
468
356
150
118
140
30
62
100
172
202
240
780
524
708
434
564
964
994
892
680
946
802
770
860
232
292
224
412
524
504
568
764
708
800
796
784
848
712
764
1 66
Table 4.42.
Empirical 5% level power (in % xlO) for tests of departure
from the normal distribution
n = 20
Sample Sizes
Statistics
No. Distribution
ni
m
o
o
bz
231
506
848
846
340
81
74
171
•43
38
48
52
136
194
112
187
553
748
1 43
308
243
211
284
702
210
425
848
554
649
654
872
186
962
690
844
217
488
834
836
247
92
91
139
50
33
42
54
98
95
75
98
234
768
100
174
1 41
160
167
444
160
236
590
450
339
574
541
133
780
541
585
/bi
R
bz
n = 100
/bi
R
bz
/bi
82
0
36 t(4)
0
37 t(2)
0
38 t(1)
0
39 CauchyCO,1)
40 SB(0.5333,0.5) 0.65 2.13
41 TruncN(-2,1)
-.32 2.27
42 Beta(3,2)
0.29 2.36
43 Beta(2,1)
-.57 2.40
44 TruncN(-3,2)
-.18 2.65
0.00 2 . 1 2
45 Weibull(3.6)
46 Weibull(4)
-.09 2.75
0.28 2.77
47 SB(1,2)
48 TruncN(-3,1)
-.55 2.78
49 SB(1,1)
0.73 2.91
50 Weibull(2.2)
0.51 3.04
51 LoConN(0.2,3) 0.68 3.09
52 LoConN(0.2,5) 1.07 3.16
53 LoConN(0.2,7) 1.25 3.20
54 Weibull(2)
0.63 3.25
55 Half N(0,1)
0.97 3.78
56 LoConN(0.1,3) 0.80 4.02
57 LoConN(0.05,3) 0.68 4.35
58 Gumbel(0,1 )
1.14 5.40
59 LoConN(0.1 ,5) 1.54 5,45
60 SU(-1,2)
0.87 5.59
61 Chi-Square(4) 1.41 6.00
62 LoConN(0.1,7) 1.96 6.60
63
1.65 7.44
64 Exponentiald ) 2.00 9.00
65 LoConN(0.05,7) 2.42 10.4
66 Chi-square(1) 2.83 15.0
67 Triangle 11(1) 0.57 16.4
68 Weibull(0.5)
6.62 87.7
69 SU(1,1)
-5.3 93.4
6.18 114
70 LogN(0,1,0)
o
o
o
J
R
n = 50
244 486
467 858
770 996
775 992
213 746
32 192
28 178
114 412
41
60
36
35
34
38
49
76
125 282
227 478
121 206
231 410
602 960
768 1000
158 334
354 728
298 492
245 410
342 638
767 986
244 448
497 862
863 992
580 890
713 982
648 938
910 1000
135 394
973 1000
730 964
885 996
506 402 724 720 484
868 712 996 992 788
996 908 1000 1000 956
994 908 1000 1000 992
462 468 1000 700 860
240
54 480 520 140
224
36 332 400 136
258 330 820 396 71 2
66
40
84
68
76
48
48
10
60
24
28
40
24
22
32
60
80 204
74 172
136 264 520 100 616
92 548 884
76 924
104 238 436 128 504
78 516 792
52 896
176 946 1000 1 28 1000
232 984 1000 192 1000
148 360 61 6 100 688
214 788 976 248 984
212 574 848 368 880
302 472 688 472 708
370 668 948 564 960
734 988 1000 900 1000
330 486 700 520 708
454 878 996 640 992
884 992 1000 980 1000
872 878 988 988 984
672 992 1000 896 1000
934 940 1000 1000 1000
898 1000 1000 1000 1000
230 322 816 408 652
980 1000 1000 1000 1000
840 964 1000 988 996
906 996 1000 992 1000
157
Table 4.43.
Empirical 5% level power (in % xlO) for tests of departure
from the Gumbel distribution
Sample Sizes
n = 20
Statistics
No. Distribution
/g^
1 N(0,1)+N(10,1)
2 Beta(0.5,0.5)
3 N(0,1)+N(5.1)
4 88(0,0.5)
5 N(0,1)+N(4,l)
5 Tukeyd.S)
7 Uniform(0,1)
8 SB(0,0.707)
9 Tukey(0.7)
10 TruncN(-l,1)
n N(0,1)+N(3,l)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
15 Triangle 1(1)
16 N(0,1)+N(2,l)
17 TruncN(-3,3)
18 N(0,1)+N(1,1)
19 N(0,1)
20 SU((0,3)
21 t(10)
22 Logistic(0,1)
23 SU(0,2)
24 Tukey(IO)
25 Laplace(0,1)
26 ScConN(0.2,3)
27 ScConN(0.05,3)
28 ScConN(0.1,3)
29 ScConN(0.2,5)
30 ScConN(0.-2,7)
31 ScConN(0.1,5)
32 ScConN(0.05,5)
33 ScConN(0.1,7)
34 ScConN(0.05,7)
35 SU(0,1)
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
n = 50
n = 1 00
R
b^
/b^
R
bg
/bj
R
b^
939
779
841
647
612
521
449
407
362
349
399
279
246
240
245
265
250
271
272
278
299
300
301
400
355
396
352
373
543
580
514
424
576
517
435
896
716
757
605
544
428
370
322
293
232
276
155
119
55
60
83
37
39
29
22
28
19
29
70
30
71
72
89
218
325
267
204
385
343
138
285
236
308
221
250
256
283
236
240
248
241
264
242
265
269
278
297
306
304
314
337
335
338
411
393
417
382
402
558
584
534
454
583
532
463
1000
998
996
996
988
984
968
950
938
892
876
836
766
688
750
650
584
612
606
598
574
580
556
555
536
542
596
596
668
710
742
690
822
800
640
1000
1000
998
990
976
966
948
918
894
846
726
570
558
262
230
264
65
56
70
24
28
12
24
18
25
75
108
116
342
468
486
394
680
604
242
706
710
728
750
724
746
722
740
770
750
750
772
722
722
748
682
642
672
666
656
598
616
606
594
570
540
634
614
608
622
678
654
726
744
634
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
996
992
976
940
908
944
848
864
772
832
764
748
672
712
708
764
836
852
860
948
920
776
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
992
1000
988
816
776
552
244
236
1 40
72
24
28
40
0
35
92
148
136
344
604
704
552
924
836
372
g^
1.15
1 .50
1 .51
1.63
1.72
1 .75
1 .80
1 .87
1 .92
1.94
2.04
2.05
2.14
2.35
2,40
2.50
2.84
2.92
3.00
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.55
8.33
11.2
12.8
15:5
20.0
21.5
31.4
35.2
968
1000
984
1000
988
1000
1000
992
996
996
996
992
984
992
996
980
948
936
952
872
892
796
860
804
752
668
688
680
672
688
675
768
756
804
708
168
Table 4.44.
Empirical 5% level power (in % xlO) for tests of departure
from the Gumbel distribution
n = 20
Sample Sizes
Statistics
No. Distribution
n = 50
R
bz
/bi
R
463
356
493
762
732
213
527
489
786
330
249
315
136
626
41
69
44
61
131
63
63
66
115
100
100
50
266
180
97
385
235
104
511
928
320
179
64
222
580
553
263
113
96
154
53
41
38
49
72
67
51
39
71
123
51
80
20
14
100
38
61
256
118
104
324
218
141
438
209
297
489
375
517
758
721
8
558
523
780
379
280
355
143
668
8
55
19
15
95
48
14
74
137
122
126
44
330
205
121
442
279
17
598
939
369
634
608
708
926
930
542
978
970
1000
840
664
780
316
994
98
150
46
92
120
102
56
58
128
30
174
60
266
138
146
588
412
278
862
992
556
n = 100
R
bz
/bi
618 780
624 780
682 824
840 996
854 992
8 852
978 1000
960 1000
1000 1000
862 1000
700 980
796 984
324 688
994 1000
6 252
1 44 344
12 164
0
84
24 104
70 256
8
52
68
60
154 208
44
32
182 228
60
72
364 376
154
92
176 256
656 840
482 680
32 672
904 988
994 1000
630 800
4l 6
128
600
988
960
864
828
852
708
468
344
352
320
384
284
252
144
108
132
248
72
0
8
4
40
52
28
52
164
528
400
748
888
548
652
708
772
736
896
912
36
1000
1000
1000
1000
988
992
720
1000
40
320
80
0
4
172
8
72
252
44
248
84
484
140
288
884
740
164
996
1000
820
/bi
62
0 82.1
36 SU(0,0.9)
0
37 t(4)
0
38 t(2)
0
39 t(1)
40 Cauchy(0,1)
0
41 56(0.5333,0.5) 0.65 2.13
-.32 2.27
42 TruncN(-2,1)
43 Beta(3,2)
0.29 2.36
-.57 2.40
44 Beta(2,1)
-.18 2.65
45 TruncN(-3,2)
46 Weibull(3.6)
0.00 2.72
-.09 2.75
47 Weibull(4)
48 86(1,2)
0.28 2.77
-.55 2.78
49 TruncN(-3,1)
0.73 2.91
50 SB(1,1)
0.51 3.04
51 Weibull(2.2)
52 LoConN(0.2,3) 0 .68 3.09
53 LoConN(0.2,5) 1 .07 3.16
54 LoConN(0.2,7) 1 .25 3.20
55 Weibull(2)
0.63 3.25
56 Half N(0,1)
0.97 3.78
57 LoConN(0.1,3) 0:80 4.02
58 LoConN(0.05,3) 0 .68 4.35
59 LoConN(0.1 ,5) 1 .54 5.45
60 SU(-1,2)
0 .87 5.59
61 Chi-square(4) 1 .41 6.00
62 LoConN(0.1 ,7) 1,.96 6.60
63 LoConN(0.05,5) 1,.65 7.44
64 Exponentiald) 2,.00 9.00
65 LoConN(0.05,7) 2,.42 10.4
66 Chi-square(1) 2.83 15.0
67 Triangle 11(1) 0..57 16.4
68 Weibull(0.5)
6..62 87.7
69 SU(1,1)
-5.3 93.4
70 LogN(0,1,0)
6.. 1 8 . 114
312
88
432
852
890
584
390
322
338
140
130
112
146
146
140
106
48
112
130
102
66
6
4
24
48
72
182
102
124
468
300
328
714
376
464
169
Table 4.45.
Empirical 5% level power (in % xlO) for tests of departure
from the exponential distribution
Sample Sizes
n = 20
Statistics
R
No. Distribution
/g^
1 N(0,1)+N(10,1)
2 Beta(0.5,0.5)
3 N(0,1)+N(5,1)
4 SB(0,0.5)
5 N(0,1)+N(4,1)
6 Tukeyd.S)
7 Uniform(0,1)
8 SB(0,0.707)
9 Tukey(0.7)
10 TruncN(-1,1)
11 N(0,1)+N(3,1)
12 Tukey(3)
13 Beta(2,2)
14 TruncN(-2,2)
15 Triangle 1(1 )
15 N(0,1)+N(2,1)
17 TruncN(-3,3)
18 N(0,1)
19 SU((0,3)
20 t(10)
21 Logistic(0,1 )
22 SU(0,2)
23 Tukey(IO)
24 Laplace(0,1)
25 ScConN(0,2,3)
26 ScConN(0.05,3)
27 ScConN(0.1,3)
28 ScConN(0.2,5)
29 ScConN(0.2,7)
30 ScConN(0.1,5)
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
n = 50
bj /bj
R
n = 100
bg
/b^
R
b^
/b^
1000 1000
1000..1000
1000 1000
1000 1000
1000 998
998 998
1000 1000
1000 994
1000 998
1000 992
1000 932
1000 944
1000 926
1000 734
998 686
998 552
990 330
990 304
956 175
944 135
926
82
90
922
0
878
874
12
14
760
836 160
830
70
60
726
702 108
740 154
998
1000
1000
1000
998
998
1000
1000
1000
1000
1000
1000
1000
1000
998
998
990
990
958
952
934
930
884
876
760
836
838
724
684
722
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
996
1000
1000
1000
1000
996
976
972
992
936
892
872
868
784
756
788
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
992
996
952
832
560
454
355
336
0
20
4
236
88
40
108
228
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000
996
1000
1000
1000
1000
992
980
972
992
940
900
868
868
788
724
740
6%
1.15
1 .50
1 .51
1.53
1 .72
1 .75
1 .80
1.87
1.92
1.94
2.04
2.05
2.14
2.35
2.40
2.50
2.84
3.00
3.53
4.00
4.20
4.51
5.38
6.00
7.54
7.65
8.33
11.2
12.8
15.5
943 915
882 762
885 791
832 501
830 593
826 478
817 400
807 361
801 317
824 284
800 312
798 207
804 152
765
90
81
762
747 109
40
723
38
717
708
36
22
683
669
19
559
17
502
16
605
10
585
21
46
672
510
32
81
553
596 142
526 120
741
787
794
792
796
820
827
819
832
848
809
850
841
805
806
778
758
751
735
719
697
705
619
643
606
697
645
564
598
653
170
Table 4.46.
Empirical 5? level power (in % xlO) for tests of departure
from the exponential distribution
n = 20
Sample Sizes
Statistics
No. Distribution
R
n = 50
bz
/b,
R
658 11 8
684 234
698 185
618
56
616 103
634
34
600 122
711 390
727 401
277 276
954 138
914 130
990 159
836
57
769
57
758
54
529
68
964
94
181
81
409
60
198
33
78
90
100 125
56
53
62
328
76
63
58
76
81
145
15
253
80
52
681
691
708
637
822
732
826
784
788
840
810
906
894
802
1000
1000
1000
1000
994
1000
952
1000
584
838
628
196
206
220
734
84
84
350
492
94
n = 100
b. /bi
R
806
690
782
788
788
842
780
868
864
686
1000
1000
1000
1000
852
836
924
904
876
920
990
984
968
1000
1000
1000
1000
1000
1000
1000
996
1000
91 2
984
968
592
576
620
948
152
116
648
820
156
b. /bi
62
31 ScConN(0.05,5) 0 20.0
0 21.5
32 ScConN(0.1,7)
0 31.4
33 ScConN(0.05,7)
0 36.2
34 SU(0,1)
SU(0,0.9)
0
82.1
35
0
36 t(4)
0
37 t (2)
0
38 t(1)
0
39 Cauchy(0,1 )
40 88(0.5333,0.5) 0.65 2.13
41 TruncN(-2,1)
-.32 2.27
42 Beta(3,2)
0.29 2.36
43 Beta(2,1)
-.57 2.40
44 TruncN(-3,2)
-.18 2.65
0.00 2.72
45 Weibull(3.6)
46 Weibull(4)
-.09 2.75
0.28 2.77
47 SB(1,2)
48 TruncN(-3,1 )
-.55 2:78
49 SB(1,1)
0.73 2.91
50 Weibull(2.2)
0.51 3.04
51 LoConN(0.2,3) 0.68 3.09
52 LoConN(0.2,5) 1.07 3.16
53 LoConN(0.2,7) 1.25 3.20
54 TruncE(0,3)
0.99 3.22
0.63 3.25
55 Weibull(2)
56 LoConE(0.2,7) 1.33 3.27
57 LoConE(0.2,5) 1.25 3.40
58 Half N(0,1)
0.97 3.78
59 LoConN(0.1,3) 0.80 4.02
60 LoConE(0.2,3) 1.20 4.09
626
655
611
717
733
192
956
936
992
871
804
794
577
978
194
453
227
33
34
48
369
77
80
158
290
86
258
284
386
68
146
76
244
686
682
816
820
728
704
490
438
378
454
468
386
380
226
206
220
202
320
70
60
214
46
68
994
1000
958
1000
614
850
658
112
36
208
744
82
82
358
518
86
308
476
636
144
240
128
384
932
896
1000
1000
996
996
976
968
960
900
900
832
812
772
624
620
612
712
104
96
520
224
120
816
764
840
880
840
912
848
932
896
988
1000
1000
1000
1000
1000
1000
996
1000
924
988
972
388
116
556
948
156
124
664
828
160
171
Table 4.47.
Empirical 5% level power (in % xlO) for tests of departure
from the exponential distribution
n = 20
Sample Sizes
Statistics
No. Distribution
n. = 50
R
bz
/bi
34
420
31
200
86
374
47
92
47
38
155
56
299
68
59
89
68
71
54
273
70
103
11 4
106
226
213
313
80
184
281
988
146
31
22
29
32
12
35
54
49
53
35
106
55
31
63
70
105
76
81
63
164
68
120
117
112
232
230
317
86
194
285
138
163
36
464
34
232
98
409
53
117
50
42
165
50
323
68
69
111
75
75
54
289
75
118
123
116
259
248
343
84
196
316
987
171
n = 100
bz
/b,
R
bz
/bi
66
64
68
632
44
56
344 102
24
0
634
74
94
76
1 42
78
68
76
36
30
8
6
68
62
120
22
66
56
62
70
110 100
84
80
84
92
66
54
788 652
94
86
164 146
236 21 4
166 1 48
116
94
392 326
508 368
110 106
384 354
486 398
998 186
288 266
66
634
46
354
24
640
96
160
68
34
10
64
120
70
66
120
82
102
66
794
90
1 40
860
72
544
8
776
92
228
76
20
8
64
88
56
108
180
120
148
72
1000
156
252
344
264
440
324
356
220
688
732
148
196
80
240
4
216
72
136
64
16
8
68
4
52
104
128
1 20
11 6
60
1000
156
276
112
244
496
72
92
11 2
868
56
556
4
780
88
236
72
24
0
60
88
52
116
200
132
1 44
72
1000
1 60
44
356
268
36
324
236
228
708
768
1000
432
R
62
61 TruncE(0,4)
1.27 4.20
62 LoConN(0.05,3) 0.68 4.35
63 TruncE(0,5)
1.50 5.26
64 GumbeKO,1 )
1.14 5.40
65 LoConN(0.1,5) 1.54 5.45
66 SU(-1,2)
0.87 5.59
67 LoConE(0.1,3) 1.62 5.86
68 Chi-square(4) 1.41 6.00
69 LoConE(0.1,5) 1.88 6.02
70 TruncE(0,6)
1.68 6.29
71 LoConN(0.1,7) 1.96 6.60
72 LoConE(0.05,3) 1.85 7.29
73 LoConN(0.05,5) 1.65 7.44
74 LoConE(0.05,7) 2.75 10.9
75 ScConE(0.05,2) 2.42 13.6
76 Chi-square(1 ) 2.83 15.0
77 ScConE(0.1,2) 2.61 15.3
78 ScConE(0.2,2) 2.71 15.6
79 LoConE(0.01,7) 2.94 15.9
80 Triangle 11(1) 0.57 16.4
81 ScConE(0.01,3) 2.59 18.0
82 ScConE(0.2,3) 3.57 23.8
83 ScConE(0.1,3) 3.81 29.4
84 ScConE(0.05,3) 3.60 29.8
85 ScConE(0.2,7) 4.50 31 .5
86 ScConE(0.1,5) 5.38 48.7
87 ScConE(0.1,7) 6.02 56.2
88 ScConE(0.01,5) 4.81 66.7
89 ScConE(0.05,5) 6.05 68.2
90 WeibulKO.S)
6.62 87.7
91 SU(1,1)
-5.3 93.4
92 LogN(0,1,0)
6.18 114
172
250
172
40
412
534
114
388
500
998 1000
300 404
612
592
340
368
360
172
weakest for symmetrical distributions with short tails for the normal
case.
For the Gumbel and exponential cases, where the null
distributions are skewed, the skewness test is also the most powerful
for detecting symmetrical distributions with heavy tails.
The kurtosis
test is very powerful in detecting alternative distributions with short
tails for all the three cases.
It also has good power in detecting
alternative distributions with long or heavy tails.
As expected, the
kurtosis test is weak for detecting distributions with kurtosis measure
similar to that of the null distribution.
The rectangle test is a
combination of the skewness and the kurtosis tests.
both kinds of departure.
It is sensitive to
Generally, it performed well when both the
skewness and kurtosis tests did well.
Also, it has power close to the
better one when either the kurtosis or skewness test performed badly.
5.
Comparison of classes of statistics
The four classes of statistics used in this power comparison study
are compared in this section.
The alternative distributions were
grouped into various subsets to illustrate how the relative performance
of these statistics varies with the nature of the alternative
distributions.
The numbers in Tables 4.50 - 4.50 indicate the average
proportions of simulated samples for which the null hypothesis was
rejected.
The Pearson chi-square and likelihood ratio statistics are
generally not as powerful as the other three classes of statistics.
The
best Pearson chi-square or likelihood ratio statistic has about 70, 70
173
and 90 percent the power of the best statistics from the other three
classes, for normal, Gumbel and exponential cases, respectively.
The
higher power achieved by the chi-square and likelihood ratio statistics
for the exponential case is due to the inclusion of a larger proportion
of alternative distributions which are substantially different from the
exponential distribution, in the exponential power comparison study.
The correlation type statistics generally performed well.
The r^
statistic is among the best statistics in detecting alternative
distributions with long or heavy tails, especially for the normal case.
The relative performance of the r^ statistic degrades as the kurtosis of
the null distribution increases.
This is due to the smaller proportion
of alternative distributions with tails longer or heavier than those of
the Gumbel and exponential distributions, used in the power study.
The
relative performance of the r^ statistic is moderate or weak for
alternative distributions with short tails.
For testing normality, the
Shapiro-Wilk statistic is the best or among the best for the four
different sets of alternative distributions.
The performance of the
Shapiro-Wilk statistic is slightly less powerful for detecting
symmetrical distributions with heavy tails when the sample size is
large.
The k^ statistic performed moderately well for the normal case.
As the kurtosis of the null distribution increases from 3 to 9, the
relative performance of k^ statistic improves.
The k^ statistic is the
best statistic for detecting a wide range of distributions when the
sample size is small, for the exponential case.
For larger sample
sizes, the k^ statistic compared favorably with the other good
174
statistics.
The statistics based on the empirical distribution function
is a class of powerful statistics for all the three null distributions
considered.
Their relative performance is not much affected by the
skewness or kurtosis of the null distribution.
This property is
desirable if one wishes to use the statistics for any kind of null
distribution.
The Anderson-Darling, Watson and Cramer Mises statistics
usually rank high for the different sets of alternative distributions
considered.
care.
The tests based on moments have good power if used with
The rectangle test ranked high for all four sets of alternative
distributions, used in the normal power study.
The performance of the
tests based on moments degrades as the kurtosis and skewness of the null
distribution increase.
Tables 4.48 and 4.49 contain the critical points
of the tests based on moments, used in the power study.
The length
between the upper and lower critical points increases drastically from
the normal to the exponential case, especially for the kurtosis.
Table 4.48.
Percentiles of the 0.05 level skewness test used in the
empirical power comparison (sample size = 50)
Lower percentile
normal
Gumbel
exponential
-0.74
0.085
0.75
Upper percentile
0.74
2.29
3.23
Difference
1.48
2.204
2.48
175
Table 4.49.
Percentiles of the 0.05 level kurtosis test used in the
empirical power comparison (sample size = 50)
Lower percentile
normal
Gumbel
exponential
2.03
2.17
2.56
Upper percentile
4.97
11.12
17.21
Difference
2.94
8.95
14.65
175
Table 4.50.
Statistics ranked by fraction of rejection of alternatives
to the normal distribution (19 symmetrical distributions
with kurtosis less than 3)
Sample sizes
Rank
1
2
3
it
5
6
7.
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
20
50
.292 b.
.574 bz
.531 W
.521 R
.703 bz
.674 R
.609
Two tailed kurtosis
Rectangle
.455
.593
Anderson-Darling
.435
.414
.558
Watson
.550
.394
.390 'V
.539 V
.513 G3
Cramer-von Mises
Kuiper
.352
.493 k '
.487 X5
.486 X3
.279
.254 R
.245 W
.240 U"
.235 A"
.225 V
.223
.207 kz
.198 G1
.351 G3
.196 G'/z .340 X3
.183 G3
.315 D
.181 D
100
.477
.293 XI3
.477 G5
.452 D
.420 G17
.408 G1
.162 G7
.162 X7
.142 XVj
.132 X5
.291 XI
.286 r^
.283 G5
.405 X17
.400 XI
.367 GIO
:278 X10
.359 X10
.117 r:
.018 /b,
.275 X5
.322 G'/z
.264 G'/z .301 XVa
.229 X'/z .008 /bi
.009
.166 G5
.163 XI
.162 X3
.314 G13
.313 G1
.300 GIO
Statistics
A^ (Inflated Type I error)
Likelihood ratio(3)
Correlation (P-P plot)
Chi-square(5)
Chi-square(3)
Correlation (Q-Q plot)
Likelihood ratio(5)
Kolmogorov-Smirnov
Likelihood ratio(M&W:7,10,17)
Likelihood ratio(l)
Chi-square(M&W:7,10,17)
Chi-square(1)
Likelihood ratio(IO)
Chi-square(IO)
Likelihood ratiofi/g)
Chi-square( Vj)
Two tailed Skewness
(W = Shapiro-Wilk)
177
Table 4.51.
Statistics ranked by fraction of rejection of alternatives
to the normal distribution (21 symmetrical distributions
with kurtosis greater than 3)
Sample sizes
Rank
20
1
.480 r^
.470 B2
2
3
4
.444 W
5
6
.430 R
.425
.422 U2
.438 A:
7
8
.417
9
.403 V
50
100
Statistics
.714 rz
.820 r ^
Correlation (Q-Q plot)
.656 bz
.799 bz
Two tailed kurtosis
.558 R
.655 6=
.793 R
.732 W2
Rectangle
.641
.732
.714 V
.712 B2
Kuiper
A^ (Inflated Type I
.703 A:
Anderson-Darling
.583 D
Kolmogorov-Smirnov
.662 k^
Correlation (P-P plot)
.520
.619 W
.618 W:
Cramer-von Mises
Watson
10
.401 /bi
.599 V
.569 D
11
.394 k:
.384 D
.556 k:
.646 X17
Chi-square(M&W:7,10,17)
12
.536 G13
.545 G17
Likelihood ratio(M&W:7,10,17)
13
.329 G7
.534 XI3
.602 XI0
Chi-squared0)
14
.329 X7
.528 /b, .598 GIG
Likelihood ratio(IO)
15
16
.320 X3
.316 05
.525 XI0
.521 G10
.593 /b.
. 5 6 2 X5
Two tailed Skewness
Chi-square(5)
17
18
.310 X5
.303 G3
.478 X5
.464 G5
.557 G5
.530 G3
Likelihood ratio(5)
Likelihood ratio(3)
19
.244 01
.433 03
.525 X3
Chi-square(3)
20
.226 GV, .431 X3
.461 G1
Likelihood ratio(l)
21
.221 XI
.452 XI
Chi-square(1)
.399 Qi/z
Likelihood ratioC/^)
22
23
24
.374 01
.184 XV2 .356 XI
.322 O'/z .382 X'/,
.291 X'/;
Chi-square( V2)
(W = Shapiro-Wilk)
178
Table 4.52.
Statistics ranked by fraction of rejection of alternatives
to the normal distribution (9 skewed distributions with
kurtosis less than 3)
Sample sizes
Rank
20
50
1
.220
.448 W
2
3
4
.202 W
.181 A:
.391
.359 A:
.164 IJ:
5
6
100
.558 B
.548 A:
Statistics
.323
.495 W
.486 r
(Inflated Type I error)
Anderson-Darling
Cramer-von Mises
Correlation (Q-Q plot)
.153
.158
.314
.304 k'
.479 R
.475
Rectangle
Watson
7
.147 V
.294 r^
.469 k^
Correlation (P-P plot)
8
.136 G3
.284 V
.457 V
Kuiper
9
.131 G'/z .273 R
.451 G5
Likelihood ratio(5)
10
11
.130 D
.128 X3
.259 G3
.258 D
.433 D
.411 X5
Kolmogor ov-Smirnov
12
.127 R
-254 X5
.409 /b,
Two tailed Skewness
Likelihood ratio(3)
Chi-square(3)
13
.125 G1
.250 G5
.400 G3
14
15
.117 r^
.101 ba
.230 X3
.215 G1
.367 X3
.344 G10
16
.099 XI
.204 /b] .343 X10
Likelihood ratio(IO)
Chi-square(IO)
Likelihood ratio(l)
17
.095 /b, .202 XI
18
19
.089 XVj .180 GVj .280 XI
.068 G5
.174 b^
.255 b^
Two tailed kurtosis
20
.068 X7
.152 XVj .253 G17
Likelihood ratio(M&W:7,10,17)
21
.058 G7
.145 G13
.244 XI7
Chi-square(M&W:7,10,17)
22
.046 X5
.134 GIO
.230 G'/g
Likelihood ratioCVz)
Chi-square( V2)
(W = Shapiro-Wilk)
23
24
.293 G1
Chi-square(5)
.124 X13 .216 X'/z
.116 X10
Chi-square(1)
179
Table 4.53.
Statistics ranked by fraction of rejection of alternatives
to the normal distribution (21 skewed distributions with
kurtosis greater than 3)
Sample sizes
Rank
20
50
100
Statistics
Correlation (Q-Q plot)
1
.579
.789 W
2
3
4
.573 W
.544
.538 r"
.777
.774 r:
.763 A:
5
6
.527 /bi
.521 W2
.761 /b.
.744 R
.870
7
8
.508 U:
.732
.708 U"
.847 y:
k2
Correlation (P-P plot)
.827 D
.823 V
Kolmogorov-Smi rnov
Kuiper
.735 G5
.726 G10
Likelihood ratio(5)
.724 X10
.716 X5
Chi-square(10)
Chi-square(5)
.709 03
Likelihood ratio(3)
. 6 8 8 X3
Chi-square(3)
.285 GV, .524 X10
.503 b .
.277 XI
.267 X5
.493 G13
.225 X ' / z .490 01
.675 X17
.674 017
.618 01
Chi-square(M&W:7,10,
Likelihood ratio(M&W
.206 G7
.206 X7
.598 b z
.538 G ' / z
.462 D
12
.417 G3
.601 G5
.408 X3
.599 X5
.571 G3
.546 X3
.525 010
15
16
17
18
19
20
21
22
23
24
.347 b z
.322 G1
.285 G5
.481 X I 3
.457 XI
.895 8%
A^ (Inflated Type I error)
A^
OC
OC
11
.683 V
.676 D
Rectangle
OO
.495 R
.480 V
13
14
.691
k==
9
10
Two tailed Skewness
.895 R
oo
00
.496 k:
.907 r:
.902 / b ,
.612 XI
.417 G ' / z .525 XV2
.376 X ' / z
Anderson-Dar1i ng
Cramer-von Mises
Watson
Likelihood ratio(IO)
Likelihood ratio(l)
Chi-square(1)
Two tailed kurtosis
Likelihood ratic^'/g!
Chi-square( Vj)
(W = Shapiro-Wilk)
180
Table 4.54.
Statistics ranked by fraction of rejection of alternatives
to the Gumbel distribution (41 alternative distributions
with skewness less than 1.14 and kurtosis less than 5.4)
Sample sizes
Rank
20
50
1
2
.345 R
.653 R
.839
.305
.621
.799 R
3
4
.290
. 5 8 3 /bi
.289
.565
.772
.771 / b i
Two tailed Skewness
5
5
.275 V
.549
.761 V
Kuiper
7
8
.264 /bi
.243 D
.527 k^
.527 V
.471 D
.757
.750 k^
.707 D
Watson
Correlation (P-P plot)
. 2 3 5 G3
. 4 2 7 G5
. 6 7 0 r^
Correlation (Q-Q plot)
.655 G5
. 6 3 6 X10
Likelihood ratio(5)
Chi-square(IO)
.635 G10
Likelihood ratio(IO)
9
10
11
12
.182 b a
.181 r =
.159 XI
.157 G5
OC
OC
15
16
.416 G3
.217 X3
.415 X5
.191 G1
.186 GV, . 3 9 4 X3
r =•
13
14
.275
100
r^
.376 b z
.325 G1
. 3 1 5 X10
Statistics
Anderson-Darling
Rectangle
Cramer-von Mises
Kolmogorov-Smirnov
.61 2 X5
Chi-square(5)
.543 b z
.537 X17
. 5 3 3 G17
Two tailed kurtosis
Chi-square(M&W:7,10,17)
.51 4 G3
Likelihood ratio(M&W:7,10,17)
. 4 8 6 X3
Likelihood ratio(3)
Chi-square(3)
19
.145 G7
.31 4 G10
.145 X7
.290 XI
.138 Xi/z .289 G13
.441 G1
Likelihood ratio(l)
20
.134 X5
.421 XI
Chi-square(1)
17
18
.279 X13
21
.274 GV, .362 GV,
Likelihood ratioCVj)
22
.217 XVa .344 XVj
Chi-square( Vj)
I8l
Table 4.55.
Statistics ranked by fraction of rejection of alternatives
to the Gumbel distribution (19 alternative distributions
with skewness less than 1.14 and kurtosis greater than 5.4)
Sample sizes
3
4
5
6
7
8
11
12
13
14
15
16
17
18
19
20
21
22
100
.934 A^
Statistics
Anderson-Darling
.530 W"
.525 U"
.806
.923
Watson
.806 0=
.921 W:
Cramer-von Mises
.511 V
.789 V
.919 V
Kuiper
.502 k"
.494 r^
.487 /b,
.756 D
.896 D
.756 k:
.726 X13
.890 k^'
.871 G17
Kolmogorov-Smirnov
Correlation (P-P plot)
.484 D
.723 G13
.870 X17
.474 R
.418 X 3
.41 5 G5
.720 X10
XI 0
.852 GIO
Likelihood ratio(IO)
.699 r:
.413 G3
.677 X5
.837 X5
.836 r^
Chi-square(5)
Correlation (Q-Q plot)
.409 X 5
. 4 0 5 X7
.405 G7
. 3 2 9 G1
.673 G5
.663 R
.648 X3
. 8 3 0 G5
.793 R
.776 G3
Likelihood ratio(5}
Rectangle
.646 G3
.773 X3
.300 XI
.624 /b.
.723 G1
.714 XI
.700 /b,
.716 G10
.295 GV, .568 G1
.239 X'/z .540 XI
.217 ba
TS
ni
00
9
10
. 5 4 2 A"
50
OO
1
2
20
OO
Rank
Likelihood ratio(M&W:7,10,17)
Chi-square(M&:W:7,10,17)
Chi-square(10)
Likelihood ratio(3)
Chi-square(3)
Likelihood ratio(l)
Chi-square(1)
Two tailed Skewness
. 5 1 0 G ' / z .676 G ' / z
Likelihood ratioCV^)
.452 X'/z .657 X'/z
.483 bz
.361 bg
Chi-square( Vj)
Two tailed kurtosis
182
Table 4.56.
Statistics ranked by fraction of rejection of alternatives
to the Gumbel distribution (1 alternative distribution with
skewness greater than 1.14 and kurtosis less than 5.4)
Sample sizes
Rank
1
2
3
4
5
5
7
8
9
10
11
12
20
50
.836
1.000
.726
.701 k^
.683 V
.683
.610 D
.606 r^
.478 G3
.426 X3
.419 G1
. 9 8 8 r^'
.984
.980
.976 V
.974 U2
.970 D
.940 G3
.918 X3
.906 G5
.389 G ' / z .862 G1
. 8 5 8 XI
.319 XI
100
1. 0 0 0 X3
1.000 X5
1.000 X10
1.000 G3
1 .000 G5
1 .000 010
1 .000 D
1.000
1 .000 V
1 .000
1.000
1.000 r^
15
16
k==
.856 X5 1 .000
. 2 6 6 X ' / z .752 X10 . 9 9 6 XI
.748 G ' / a .980 G1
.223 X5
.744 G10 .972 G17
.147 G7
17
18
.147 X7
.131 R
19
20
.123 bz
.095 /bi
13
14
21
22
.274 G5
.670 XV, . 9 6 8 X17
.632 X13 .932 xv.
.6l 6 G13 .91 2 GV,
.130 bz
.132 bz
.120 R
.024 /bi
.104 R
.004 /b^
Statistics
Chi-square(3)
Chi-square(5)
Chi-square(IO)
Likelihood ratio(3)
Likelihood ratio(5)
Likelihood ratio(IO)
Kolmogor0v-Smi rnov
Cramer-von Mises
Kuiper
Watson
Anderson-Darling
Correlation (Q-Q plot)
Correlation (P-P plot)
Chi-square(1)
Likelihood ratio(1)
Likelihood ratio(M&W:7,10,17)
Chi-square(M&W;7,10,17)
Chi-square('/%)
Likelihood ratioC/,)
Two tailed kurtosis
Rectangle
Two tailed Skewness
183
Table 4.57.
Statistics ranked by fraction of rejection of alternatives
to the Gumbel distribution (9 alternatives with skewness
greater than 1.14 and kurtosis greater than 5.4)
Sample sizes
Rank
20
1
.474 A"
.453 r^
.41 4
2
3
4
5
6
7
8
9
10
15
15
17
18
19
20
21
22
.377 k^
.352 D
.361 V
.280 G3
.540 r^'
.521
.598
.588 V
.578 D
.480 G3
.462 X3
.450 G5
.274 X 3
. 2 7 3 G ' / z .445 X5
G1
.270 G1
.279
/bi
no
13
14
.382
.727
.549
00
11
12
100
50
Statisti O S
.890 A"
.801 W'
.795 V
Anderson-Darling
Cramer-von Mises
.789
Watson
.784
.770 D
. 7 6 1 k"
.582 G5
Kuiper
Correlation (Q-Q plot)
Kolmogorov-Smirnov
Correlation (P-P plot)
Likelihood ratio(5)
Chi-square(5)
.551 X5
.511 G3
. 6 1 0 010
Likelihood ratio(3)
Likelihood ratio(W)
.507 X10
Chi-square(IO)
.413 XI
.503 X3
.410 G ' / z .502 XI
. 2 3 4 XI
.216 X ' / z . 3 8 6 / b , .587 G1
.213 b z
. 3 6 8 G13 . 5 6 2 X 1 7
Chi-square(3)
Chi-square(1)
.187 G7
Likelihood ratio(M&W:7,10,17)
.238 R
.187 X7
.141 G5
.115 X5
.367 XI 3 .555 G17
.361 XV, .544 X ' / z
.347 X10
.344 G10
.536 G ' / z
.498 / b ,
.340 R
.460 R
.308 b z
.272
bz
Likelihood ratio(l)
Chi-square(M&W:7,10,17)
Chi-squareC/,)
Likelihood ratioC/,)
Two tailed Skewness
Rectangle
Two tailed kurtosis
184
Table 4.58.
Statistics ranked by fraction of rejection of alternatives
to the exponential distribution (51 alternatives with
skewness less than 2 and kurtosis less than 9)
Sample sizes
Rank
20
50
1
2
. 5 9 8 k^
.542
.801
.800 V
3
4
.539
.536 V
5
6
.520 /bi
.512 R
.498 D
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
.463 X3
.444 G3
.395
.357 r^
.347 X5
.345 G5
.341 G1
.338 G7
. 3 3 3 XI
100
Statisti cs
.842
Cramer-von Mises
.799
.794 k^
.842
.841 V
.837 D
Watson
Kuiper
.789
.781 D
.773 X5
.771 G5
.837 A^
. 8 3 6 k^
.820 G5
. 8 1 9 X10
Anderson-Darling
Correlation (P-P plot)
Likelihood ratio(5)
.752 G3
.746 X3
.718 X10
.716 GIG
.817 G10
. 8 1 5 X5
.806 X17
.701 0 1 3
. 6 9 4 X13
.680 R
. 8 0 2 G17
.676 / b i
.650 G1
.309 X7
. 2 8 3 GV, .642 XI
.252 XV, .596 r^
.149 K
.553 G'/,
. 8 0 3 G3
Kolmogorov-Smirnov
Chi-square(10)
Likelihood ratio(lO)
Chi-square(5)
Chi-square(M&W:7,10,17)
Likelihood ratio(3)
Likelihood ratio(M&W:7,10,17)
Chi-square(3)
Likelihood ratio(l)
.799 X3
.770 G1
. 7 5 5 XI
Chi-square(1)
.754 R
.742 / b i
.698 GV,
.691 r"
Rectangle
Two tailed Skewness
Likelihood ratioCV,)
Correlation (Q-Q plot)
.505 XV, .675 X ' / z
.404 K
.587 K
Chi-square('/,)
Two tailed kurtosis
185
Table 4.59.
Statistics ranked by fraction of rejection of alternatives
to the exponential distribution (14 alternatives with
skewness less than 2 and kurtosis greater than 9)
Sample sizes
100
Statistics
. 9 6 6 k^
.992
.965
.989 V
Cramer-von Mises
Kuiper
.960 V
. 9 8 8 A"
Anderson-Dar1i ng
.820 V
.960
.988
Correlation (P-P plot)
.758 X5
.958 D
.986
.757 X3
.752 G5
.956
.979 D
. 9 7 8 G5
.972 X10
Watson
Kolmogorov-Smirnov
Rank
20
1
2
.855 k"
.834
3
4
.826
5
6
50
.743
.949 X5
. 9 4 6 G5
9
10
.739 D
.735 G3
.944 G3
.941 X10
11
12
.715 G7
.701 X7
13
14
.660 G1
.659 XI
15
16
.654 /bi
.647 r:
17
18
.642 R
7
8
19
20
Likelihood ratio(5)
Chi-square(IO)
.970 r"
Correlation (Q-Q plot)
. 9 3 9 GIG
.970 X5
. 9 6 9 G10
Chi-square(5)
Likelihood ratio(IO)
. 9 3 9 X3
.963 G3
Likelihood ratio(3)
.936 G13
.956
.955
.954
.944
Chi-square(M&W:7,10,17)
Chi-square(3)
.935 X 1 3
.916 G1
. 9 1 6 XI
X17
X3
G17
G1
.891 GV, .942 XI
.604 G ' / z .887 r^
.936 G ' / z
.580 X ' / z .877 X ' / z .934 XV,
.811 R
.163 K
.899 R
21
.795
22
.286 K
•856 /bi
.419 K
Likelihood ratio(M&W:7,10,17)
Likelihood ratio(l)
Chi-square(1)
Likelihood ratioCV,)
Chi-square( V,)
Rectangle
Two tailed Skewness
Two tailed kurtosis
186
Table 4.50.
Statistics ranked by fraction of rejection of alternatives
to the exponential distribution (17 alternatives with
skewness greater than 2 and kurtosis greater than 9)
Sample sizes
ank
20
100
50
.233 D
.195 r^
.353 r^
5
6
.187 V
.182
.317 V
.310
7
8
.177 G7
. 1 6 8 k:
. 2 9 9 k"
9
10
.149 X5
.147 /bi
.145 G5
.140 K
.283 XI 3
.282 G13
.281 G10
.138 X7
.265 G5
11
12
13
14
15
16
17
18
19
20
. 1 3 8 X3
.136 G3
.132 R
X10
.271 X5
.256 X3
.254 G3
.241 XI
.127 GV, .237 G1
.125 XI
.230 G ' / z
.125 G1
.217 X ' / z
.110 XV, .205 /bi
.476
.476 r^
Cramer-von Mises
Correlation (Q-Q plot)
.467 D
.460 A^
Kolmogorov-Smirnov
.451 X17
.450 k^
.449 G17
Chi-square(M&W:7,10,17)
Correlation (P-P plot)
Likelihood ratio(M&W:7,10,17)
OC
3
4
.377
.372 A"
.357 D
OC
JO
.257
.254
00
1
2
Statistics
XI0
Anderson-D arling
Chi-square(IO)
.447 V
.446
Kuiper
Watson
.445 G5
.443 X5
Likelihood ratio(5)
Chi-square(5)
.443 G10
.441 G1
Likelihood ratio(l)
.439 G3
Likelihood ratio(3)
.438 X3
Chi-square(3)
Likelihood ratio(IO)
.432 XI
Chi-square(1)
.431 G ' / g
.423 X V j
Likelihood ratioC/j
Chi-square( V2)
.286 R
Rectangle
21
.202 R
.259 / b ,
Two tailed Skewness
22
.173 K
.226 K
Two tailed kurtosis
187
V,
PERCENTILES OF THE
AND
STATISTICS
This section describes the generation and smoothing of the Monte
Carlo percentiles of the r^ and
statistics.
Curves were fitted
through the percentiles to obtain formulas for the percentiles of these
statistics.
The percentiles of these statistics were generated for
testing the fit of the normal, Gumbel and exponential distributions with
unknown location and scale parameters.
The percentiles of the r^ and
statistics were also simulated for testing the fit of the exponential
distribution with only an unknown scale parameter because this is the
more frequently used probability model.
A description of the random
number generators can be found in Section A of Chapter 4 and Appendix B.
The uniform random number generator developed by Wichmann and Hill
(1982a) was used.
For each of the null distributions, the r^ and
statistics were
simulated at each of the sample sizes, n = 5(1)50(5)100(10)200(100)1000.
Table 5.1 shows the number of samples generated for each of the
replication employed at each of the sample sizes.
The choice of the
number of samples was based on the stability of the Monte Carlo
percentiles.
It was observed that generating a larger number of samples
than those listed in Table 5.1 for sample sizes 20 to 1000 affected the
simulated percentiles only in the third or fourth decimal place.
Larger
number of samples ought to be used for sample sizes 5 to 20 to achieve
the same stability. Larger number of samples were not used for sample
sizes 5 to 20 because of certain memory limitations of the Microsoft
FORTRAN compiler.
However, the use of more replications for sample
188
sizes 5 to 20 helps to achieve the desired stability of the simulated
percentiles.
Replications were used for smoothing the percentiles and
also to check the accuracy of the simulated percentiles.
Table 5.1.
Number of samples and replications employed in the
simulation of the r^ and
statistics
Sample sizes
Number of samples
Number of replications
5(1)10
11(1)15
15000
15000
9
7
16(1)20
21(1)30
15000
15000
5
3
31(1)50
55(5)100
15000
1 5000
2
2
110(10)200
300(100)1000
10000
5000
2
2
The percentiles were first averaged over all replications for each
sample size. Figure 5.1 contains a plot of the Monte Carlo percentiles
of the
statistic against the sample sizes for the normal case.
These
percentiles exhibit a very smooth pattern, with the percentiles at the
0.001 significant level showing slightly more fluctuation.
This plot
suggested that the following models may be appropriate for approximating
the percentiles,
Xp = 1 - a exp(-gn) ,
(5.1)
189
©
O
se
«
Significant levels
o = 0.300
A = 0.200
+ = 0.150
= 0.100
= 0.075
v = 0.050
B
0.025
X =0.010
• = 0.005
® = 0.001
a
@
®
B*
20
—I—
—T"
40
60
80
100
120
140
160
Sample size, n
Figure 5.1. Plot of percentiles against n
180
200
1 90
or
(5.2)
Xp = 1 - 1/(a + 3n) ,
where X^ is the percentile, n is the sample size and a and 6 are some
parameters.
A natural logarithmic transformation of model (5.1) yields
ln(l - Xp) = ggn + Bi ,
where g^ and
(5.3)
are functions of a and g respectively.
Nonlinear plots
were obtained when log(l - X^) was plotted against n, suggesting that
model (5.1) is not appropriate.
Model (5.2) can be rewritten as
1/(1 - Xp) = gn + a .
The plot of 1/(1 - Xp) against n for the
Figures 5.2-5.%.
(5.4)
percentiles is shown in
These plots are quite linear and model (5.4) can thus
be used to smooth and fit lines to the Monte Carlo percentiles.
An
interactive graphical smoothing and curve fitting procedure (IGSCF) was
developed using the IBM Personal Computer Plotting System (1984) on the
IBM PC AT.
The IGSCF procedure enables the points to be smoothed
interactively and provides least-squares estimates of a and g for model
(5.4).
The accuracy of the model can be examined by comparing the
smoothed Monte Carlo percentiles with those computed from the estimated
model.
13
Transformed percentiles of
300
450
600
750
900
1200
1050
ai
î\3
i
?
P
Ui
o*
M
CD
PL
"73
O
(D
3
a>
M
P)
OTQ
p
I-"
y
M
o o
o
en o
o
M
oi
o
o
en
O
1350
1500
192
0.300
0.200
0.150
0.100
0.075
0.050
0.025
0.010
0.005
0.001
0
20
40
60
80
100
120
140
160
180
200
Sample size, n
Figure 5.3. Plot of transformed percentiles against n
193
ti
0.300
0.200
0.150
0.100
0.075
0.050
0.025
0.010
0.005
0.001
0
5
10
15
20
25
30
35
40
45
50
Sample size, n
Figure 5.4. Plot of transformed percentiles against n
194
1.
Procedure IGSCF
[1] For a particular probability level, enter the sample sizes and
Monte Carlo percentiles into the arrays XN and XP respectively.
[2] Transform the percentiles using XP = 1/(1 - XP).
[3]
Plot XP against XN.
Enlarge certain portions of the plot of XP
against XN if necessary.
[4]
For any point that appears to deviate to much from the straight
line, the point is smoothed by linear interpolation of
neighboring points or using the best judgement based on the plot.
[5]
Go to [6] if all the points are smoothed, otherwise go to [3].
[6]
Obtain least-squares estimates of a and 3 and compare the Monte
Carlo percentiles and those from the estimated model.
[7]
Go to [8] if the model provides a satisfactory fit to the
percentiles, otherwise go to to [3] or stop and consider a new
transformation in [2].
[8]
Output smoothed percentiles and least-squares estimates of a and g.
Little smoothing of points was performed since there were only very
slight fluctuations about the straight line.
Only those points that are
clearly deviated from the straight lines were adjusted using the IGSCF
procedure.
Figures 5.5 - 5.7 are plots of the transformed percentiles
with certain points smoothed.
The IGSCF procedure was then used to
compute the least-squares estimates of the parameters a and g of model
(5.4).
Estimates of a and g for the
and r^ statistics are given in
Tables 5.2 - 5.8 for the normal, Gumbel and exponential models.
195
o
ta
0.300
«
0.200
o
0.150
0.100
0.075
0.050
0) o
0.025
0.010
0.005
cO o
C lO
0.001
«
Sample size, n
Figure 5.5. Plot of transformed percentiles against n
196
0.300
0.200
CN
0.150
^o
0.100
0.075
0.050
0o
0.025
0.010
0.005
0.001
20
40
60
Sample size, n
Figure 5.6. Plot of transformed percentiles against n
197
0.300
0.200
0.150
0.100
0.075
0.050
0.025
0.010
0.005
0.001
20
25
30
35
40
50
Sample size, n
Figure 5.7. Plot of transformed percentiles against n
198
Table 5.2.
Least-squares estimates of a and B of the model
approximating the percentiles of the normal P-P probability
plot correlation coefficient
Significance levels
0.001
0,005
0.010
0.025
0.050
0.075
0.100
0,150
0.200
0.300
B
.36772 .47846 .55045 .65915 .77333 .90071 .93794 1.1121 1.2258 1.4087
a
.95357 ,59852 ,18980 .48874 .74190 -1.261 1.5880 -.9422 -.7107 2.6229
Table 5.3.
Least-squares estimates of a and g of the model
approximating the percentiles of the normal Q-Q probability
plot correlation coefficient r^
Significance levels
0.001
0,005
0.010
0.025
0.050
0.075
0.100
0.150
0.200
0.300
.16391 .21236 .23785 ,28773 ,34126 .38075 .41445 .47163 .52443 .62205
ai 1.3062 1.6253 1.9329 2.4007 2.9597 3.4357 3.8678 4.6899 5.4256 6.8755
Ba .12804 .20330 .22745 .26595 ,31707 .34682 .38115 .42743 .46008 .55372
ct2 5.6511 2.0049 2.8531 4,5304 5.0003 6.5800 6.6100 8.4408 11,139 11,824
Ba ,16119 ,18933 .21309 .25811 .29582 0.3214 .34132 .37701 ,43760 .50142
a3 ,74124 8.1060 7.7935 6.2141 10.573 13-520 15.708 21.552 15.428 21.885
199
Table 5.4.
Least-squares estimates of a and g of the model
approximating the percentiles of the Gumbel P-P probability
plot correlation coefficient
Significance levels
0.001
0.005
0.010
0.025
0.050
0.075
0.100
0.150
0.200
0.300
g
.38450 .51041 .56241 .65946 .78468 .86620 .93210 1.1136 1.2403 1 . 4 0 8 0
a
.39895 -.1716 .63448 1 . 6 3 9 4 1.7074 2.5642 2.9917 .51855 .53716 4 . 0 7 4 5
Table 5.5.
Least-squares estimates of a and g of the model
approximating the percentiles of the Gumbel Q-Q probability
plot correlation coefficient r^
Significance levels
0.001
0.005
0.010
0.025
0.050
0.075
0.100
0.150
0.200
0.300
61 .03375 .05749 .07486 .11122 .15404 .19064 .22292 .28105 .33432 .43343
ai 2.4461 2.9063 3.1936 3-7519 4.4428 4.9481 5.4104 6.2055 6.9079 8.3401
6 2 .03650 . 0 6 0 2 8 . 0 7 1 3 2 . 0 9 9 5 8 . 1 1 8 2 4 . 1 4 6 0 7 . 1 6 9 3 4 . 2 1 1 2 3 . 2 4 8 9 8 . 3 2 9 4 2
ca 2.0273 2.3905 3-0889 4.1082 7.1112 8.2502 9-5778 11-703 13-824 16.643
ga .02868 .04265 .05085 .07133 -09463 .11296 .13415 .16359 .19152 .24471
as 3.3611 5.8284 7-7133 9.5867 12.875 15.840 17.490 23.615 27.607 36.089
200
Table 5.6.
Least-squares estimates of a and g of the model
approximating the percentiles of the exponential P-P
probability plot correlation coefficient
Significance levels
0,001
0.005
0.010
0.025
0.050
0.075
0.100
0.150
0.200
0.300
B
.32065 .41448 .47839 .58998 .58370 .75315 .83799 .93214 1.1118 1.2510
a
.55736 .64635 .43655 -.0160 .87806 1.6374 ,83343 2.8499 -.9612 3.4117
Table 5.7.
Least-squares estimates of a and g of the model
approximating the percentiles of the exponential Q-Q
probability plot correlation coefficient r^
Significance levels
0.001
0.005
0.010
0.025
0.050
0.075
0.100
0.150
0.200
0.300
.02622 .04012 .05136 .07404 .10533 .13352 .15860 .20471 .24945 .33344
ai 2.3535 2.8207 3.1308 3.7591 4.4394 4.9942 5.4824 5.3478 7.1232 8.684]
62 .02752 .03733 .04167 .05559 .07349 .09408 .10549 .14979 .17713 .22381
az 2.2289 3.1243 3-9495 5.4723 7.2355 8.1551 9.9034 10.347 12.697 18.034
Ba .03325 .04135 .04745 .05590 .07100 .08423 .09479 .11979 .13995 .18285
as .30189 1.8551 2.7443 5.6312 8.5513 10.387 12.869 17.029 21.473 27.528
201
Table 5.8.
Least-squares estimates of a and g of the model
approximating the percentiles of the exponential (unknown
scale parameter) P-P probability plot correlation
coefficient
Significance levels
0.001
0.005
0.010
0.025
0.050
0.075
0.100
0.150
0.200
0.300
6
.33175 . 4 0 7 5 8 . 4 7 8 5 0 .57639 .69736 .75319 .83792 .93223 1.1083 1.2609
a
-.1434 1.1053 .48854 .48333 -.0942 1.6737 .82770 2.9445 -.7921 3.3480
By fitting three different straight lines through the transformed
percentiles for three separate ranges of the sample sizes, better
approximation of the percentiles of the r^ statistic was obtained.
pairs of estimates
The
(02,62) and (03,83) in Tables 5.3, 5.5 and
5.7 are for the following ranges of sample sizes [5,100], [101,200] and
[201,1000] respectively.
The percentiles of the
or r^ statistic can
be approximated using the a and g values listed in Tables 5.2 - 5.8 and
the formula:
Xp = 1 - l/(gn + a) .
(5.5)
Note that Table 5.7 for the r^ statistic is used for both exponential
cases, scale and location parameters unknown, and scale parameter
unknown.
These models provide very accurate estimates of the percentiles of
the k^ and r^ statistics.
The smoothed Monte Carlo percentiles and
those computed using the model (5.5) are tabulated in Tables 5.9 - 5.15.
202
Table 5.9.
Comparison between the smoothed Monte Carlo percentiles and
those computed from the model for the normal P-P probability
plot correlation coefficient
Significance levels
0.010
0.001
^M.C. model
n
M.C. model
0.050
M . C . model
0.1000
0. 3 0 0 0
M . C . model
M.C. model
5
.59
.64
.69
.66
.79
.78
.83
.84
.90
.90
10
.75
.78
.83
.82
.88
.88
.90
.91
.94
.94
20
.87
.88
.91
.91
.94
.94
.95
.95
.97
.97
50
.950
.948
.964
.964
.975
.975
.979
.979
.986
.986
100
.974
.973
.982
.982
.987
.987
.990
.990
.993
.993
150
.982
.982
.988
.988
.994
.994
.993
.993
.995
.995
200
.9868 .9866
.9911 .9909
.9937 . 9 9 3 6
.9947 . 9 9 4 7
.9965 . 9 9 6 5
300
.9915 .9910
.9940 .9940
.9957 .9957
.9965 . 9 9 6 5
.9977 .9976
400
. 9 9 3 4 .9932
.9954 .9955
.9968 .9968
.9974 . 9 9 7 3
.9983 .9982
500
.9946 .9946
.9964 .9964
.9974 .9974
.9979 . 9 9 7 9
.9986 .9986
500
.9956 .9955
. 9 9 6 9 .9970
.9978 .9978
.9983 .9982
.9988 .9988
700
. 9 9 6 2 .9961
.9974 . 9 9 7 4
.9981 .9982
. 9 9 8 5 .9985
. 9 9 9 0 .9990
800
.9966 . 9 9 6 6
.9977 .9977
.9984 .9984
. 9 9 8 7 .9987
.9991 .9991
900
.9970 .9970
.9980 .9980
.9986 .9986
.9988 .9988
.9992 .9992
1000
.9972 .9973
.9982 .9982
.9987 .9987
.9989 .9989
.9993 .9993
a
Monte Carlo .
203
Table 5.10.
Comparison between the smoothed Monte Carlo percentiles and
those computed from the model for the normal Q-Q
probability plot correlation coefficient r^
Significance levels
0.001
^.C. model
n
0.010
M.C. model
0.050
M.C. model
0. 1000
M.C. model
0.3000
M.C.
model
5
.59
.53
.67
.68
.77
.79
.81
10
.56
,65
.76
.77
.84
.84
.87
20
.77
.78
.85
.85
.90
.90
.92
.83
.88
.92
50
.897
.895
.929
.928
.951
.950
1 00
.943
.943
.961
.961
.973
1 50
.960
.960
.973
.973
.981
.973
.981
.960
.978
.984
.959
.978
.984
200
.9683
.9580
.9790 .9793
300
.9799 .9796
400
.9849 .9847
.9879 .9879
.9861 .9851 .9900 .9899 .9915 .9915
.9892 .9893 .9922 .9922 .9934 .9934
500
.9879 .9877
.9914 .9913
600
.9897
700
.9911
.9937 .9937 .9946 .9946 .9963 .9963
.9897 .9927 .9926 .9947 .9947 .9955 .9955 .9969 .9969
.9912 .9937 .9936 .9954 .9954 .9961 .9961 .9973 .9973
800
.9923
.9923
.9944 .9944
.9960
900
.9932 .9931
.9950 .9950
.9964
1000
.9938 .9938
.9954 .9955
.9967
a
Monte Carlo.
.9854 .9854
.89
.90
.92
.92
.95
.95
.974
.974
.986
.989
.985
.989
.9918 .9918
.9943 .9942
.9955 .9955
.9960 .9965 .9965 .9976 .9976
.9964 .9969 .9969 .9979 .9979
.9967 .9972 .9972 .9981 .9981
204
Table 5.11.
Comparison between the smoothed Monte Carlo percentiles and
those computed from the model for the Gumbel P-P
probability plot correlation coefficient
Significance levels
0.001
^M.C. model
n
0.010
M.C.
model
0.050
M.C. model
0.1000
0.3000
M.C. model
M.C. model
5
.59
.57
.70
.71
.80
.82
.84
.87
.90
.91
10
.76
.76
.84
.84
.89
.90
.91
i92
.94
.94
20
.88
.88
.92
.92
.94
.94
.95
.95
.97
.97
50
.950
.949
.965
.965
.975
.980
.980
100
.975
.974
.982
.983
.982
.988
.988
.992
.990
150
.982
.988
.976
.988
.992
i990
.993
.987
.993
.995
.987
.993
.995
200
.9869
.9871
.9910 .9912
.9938
.9937
.9949 .9947
300
.991 4 .9914
.9942 .9941
.9959
.9965 .9965
400
.9934 .9935
.9956 .9956
.9958
.9969 .9968
.9974 .9973
.9966 .9965
.9977 .9977
.9983 .9982
500
.9947 .9948
.9965 .9965
.9975 .9975
.9979
500
.9957 .9957
.9971
.9970
.9979 .9979
700
.9963 .9963
.9975 .9975
.9982 .9982
.9979
.9982 .9982
.9987 .9987
.9986 .9986
.9988 .9988
.9990 .9990
800
.9968 .9968
.9978 .9978
.9984 .9984
.9987 .9987
.9991
900
.9971 .9971
.9980 .9980
.9986 .9986
.9982
.9987 .9987
.9988 .9988
.9989 .9989
1000
.9974
.9974
a
Monte Carlo .
.9982
.993
.9991
.9992 .9992
.9993 .9993
205
Table 5.12.
Comparison between the smoothed Monte Carlo percentiles and
those computed from the model for the Gumbel Q-Q
probability plot correlation coefficient r^
Significance levels
n
0.001
0.010
^M.C. model
M.C. model
0.050
M.C. model
0,1000
0.3000
M.C. model
M.C. model
5
.54
.62
.67
.72
.76
.81
.81
.85
.88
.90
10
.64
.64
.73
.75
.82
.83
.86
.91
.92
20
.68
.68
.79
.79
.87
.87
.90
.87
.90
.94
.94
50
.764
.758
.859
.856
.920
.918
.941
.940
100
.824
.947
.950
.962
.964
.868
.904
.928
.906
150
.828
.867
.927
.959
.960
.971
.971
200
.8911 .8928
.9424
.9674
300
.91 66 .9164
.9758
400
.9324
.9803
.9773 .9770 .9878 .9879
.9828 .9827 .9909 .9909
.9860 .9859 .9926 .9925
500
.9424
600
.9565
.9681
.9758
.9326
.9422
.9567
.9645
.9644
.9805
.9435
.9703
.9698
.9835 .9834
.9516 .9514
.9740 .9738
700
.9576 .9573
^9769
800
.968 .967
.980 .980
.985 .985
.9856 .9856
.9882 .9882
.9898 .9898
.9946
.9769
.9873 .9874
.9909 .9910
.9952 .9952
.9623 .9620
.9793 .9793
.9887 .9887
900
.9656 .9657
.9812 .9813
1000
.9687 .9688
.9829 .9829
.9898 .9898
.9907 .9907
.9920 .9920 .9957 .9957
.9928 .9928 .9961 .9961
.9934 .9934 .9964 .9964
a
Monte Carlo.
.9937 .9937
.9945
206
Table 5.13.
Comparison between the smoothed Monte Carlo percentiles and
those computed from the model for the exponential P-P
probability plot correlation coefficient
Significance levels
0.001
®M.C. model
n
0.010
M.C. model
0.050
0.1000
0.3000
M.C. model
M.C. model
M.C. model
5
.58
.54
.68
.65
.78
.77
.82
10
.72
.73
i8l
.81
.87
.87
.89
.80
.89
20
.85
.86
.90
.90
.93
.93
.94
.94
50
.941
.940
.959
.972
.971
.977
100
.970
.969
.979
.986
150
.980
.979
.959
.979
.986
.986
.990
.986
.990
.988
.992
.977
.988
.992
200
.9845 .9845
.9899
.9896
.9928
.9927
.9941 .9941
.9961
300
i9930
.9931
.9951 .9951
.9960 .9960
.9974 .9974
400
.9888 .9897
.9922 .V922
.9948
.9948
.9964 .9964
.9970 .9970
.9981
500
.9938
.9938
.9958
.9958
600
.9949 .9948
700
.9956 .9956 ' .9970 .9970
800
.9961
.9961
900
1000
.89
.93
.96
.97
.985
.985
.90
.94
.992 .992
.995 .995
.9961
.9980
.9984
.9987 .9987
.9989 .9989
.9971 .9971
.9976
.9976
.9979 .9979 .9980 .9980
.9982 .9982 .9983 .9983
.9985
.9974 .9974
.9982 .9982
.9985 .9985
.9990
.9965 .9965
.9977 .9977
.9984 .9984
.9987 .9987
.9991
.9959 .9969
.9979 .9979
.9985 .9985
.9988 .9988
.9992 .9992
a
Monte Carlo.
.9965 .9965
.9990
.9991
207
Table 5.1Comparison between the smoothed Monte Carlo percentiles and
those computed from the model for the exponential Q-Q
probability plot correlation coefficient r^
Significance levels
0.010
0.001
®M.C. model
n
M.C. model
0.050
M.C. model
0.1000
0.3000
M.C. model
M.C. model
5
.49
.60
.65
.70
.76
.80
.81
.84
.88
10
.65
.62
.72
.72
.81
.82
.85
.86
.91
.90
.92
20 .66
.65
.76
.76
.85
.85
.89
.88
.94
.93
50
.730
.728
.828
.825
.899
.897
.927
.925
.961
100
.802
.799
.876
.879
.931
.934
.951
.953
150
.840
.843
.901
.902
.944
.945
.962
.961
.962
.975
.981
200
.8680 .8706
.9187 .9186
.9540 .9544
300
.9048 .9027
.9436 .9411
.9664
400
.9287 .9265
.9541 .9540
.9735
500
.9378 .9409
.9609 .9622
.9776 .9773
600
.9490
.9506
.9676 .9680
.9805 .9805
700
.9566 .9576
800
.9622
900
.9672
1000
.9710
.9628
.9669
.9702
a
Monte Carlo.
.9722
.9779 .9780
.9803 .9801
.981
.9677 .9839 .9841
.9665 .9758 .9758 .9878 .9844
.9729 .9808 .9803 .9878 .9879
.9722 .9828 i9828
.9754 .9754
.976
.9672
.9835 .9834 .9916 .9916
.9858 .9857 .9928 .9927
.9874 .9874 .9937 .9936
.9886 .9887 .9942
.9862 .9862 .9898 .9898 .9948
.9873 .9874 .9907 .9907 .9952
.9848 .9847
.9943
.9948
.9952
208
Table 5.15.
Comparison between the smoothed Monte Carlo percentiles and
those computed from the model for the exponential (unknown
scale parameter) P-P probability plot correlation
coefficient
Significance levels
n
0.001
0.010
0.050
0.1000
0.3000
^M.C. model
M.C. model
M.C. model
M.C. model
M.C. model
5
.58
.34
.68
.55
.78
.71
.82
.80
.89
.90
10
.72
.68
.81
.81
.87
.85
.89
i89
.93
.94
20
.86
.85
.90
.90
.93
.93
.94
.94
.96
.95
50
.942
.939
.971
.971
.977
.977
100
.970
.970
1 50
.979
.979
.959 .959
.979 .979
.986 .986
.986
.990
.986
.990
.988
.992
.988
.992
.985 .985
.992 .992
.995 .995
200
.9844 .9849
.9900
.9896
.9928
.9928
.9941 .9941
.9961 .9951
300
.9889 .9899
.9930
.9931
.9952 .9952
.9960
.9960
.9974 .9974
400
.9922 .9925
.9948 .9948
.9964
.9970
.9970
.9981 .9980
500
.9940 .9940
.9958
.9976 .9975
500
.9950
.9950
.9965 .9965
700
.9957 .9957
.9970 .9970
.9971 .9971
.9976 :9976
.9979 .9980
.9985 .9984
.9987 .9987
.9989 .9989
800
.9962
.9974 .9974
.9982
.9985 .9985
900
.9967
.9962
.9966
.9977 .9977
.9984
.9970 .9970
.9979 .9979
1000
a
Monte Carlo,
.9958
.9964
.9982
.9984
.9986 .9986
.9980 .9980
.9983 .9983
.9990 .9990
.9987 .9987 .9991 .9991
.9988 .9988 .9992 .9992
209
VI.
SUMMARY AND RECOMMENDATIONS
The interesting problem of assessing the fit of probability models
to data is investigated in this dissertation.
A new statistic
which
is based on the Pearson correlation coefficient of points on a P-P
probability plot is presented.
The
statistic measures the linearity
of a P-P (percent versus percent) probability plot.
A small value of
the k^ statistic indicates nonlinearity of the P-P probability plot and
suggests that hypothesized probability model should be rejected.
Two
random samples were generated from two different distributions and are
listed in Tables 5.1 and 6.2.
The identities of these two distributions will be revealed at the
end of the discussion.
data sets 1 and 2.
These two random samples will be referred to as
Figures 6.1 and 6.2 are the normal P-P probability
plots of these two data sets.
A probability plot provides an excellent
and informative tool for assessing the goodness of fit of probability
models to data set.
However, decisions based on the probability plot
alone are subjective and can be difficult to make when the sample size
is small or the alternative distribution is similar to the hypothesized
distribution.
The use of the
reduces the subjectivity.
statistic with the P-P probability plot
For the example, the
statistic is 0.955
for the probability plot in Figure 5.1 and 0.991 for the probability
plot in Figure 5.2.
The approximate 0.05 level percentile of the k^
statistic can be computed using formula (5.5) with g = 0.77333, ot =
0.74189 from Table 5.2, with n = 50.
percentile is 0.975.
The value of the 0.05 level
The normal probability model is rejected for data
210
set 1 since 0.955 is less than 0.975.
As for data set 2, there Is no
evidence that the normal probability model is not appropriate.
Table 6.1.
Data set 1
1.6953
0.4555
0.5573
1.3159
1.3785
1.7468
0.2049
0.2476
0.4204
2.4226
0.3470
0.2273
0.6521
0.5347
0.8718
0.7238
0.3673
0.0819
1.2704
Table 6.2.
0.7707
-0.0158
-0.3762
-1.1868
0.6096
-0.1295
0.7417
0.6641
1.6197
0.0765
0.9893
1,,0351
2,,2661
0.,9978
0,
.8975
0..0143
0,,2356
0.,9973
0.3306
0.2262
0.4336
0.8386
1.4142
1.2112
0.0011
0.4646
0.0578
0.4962
0.0390
0.6355
0.2429
3.6785
2.4692
0.2154
0.3001
1.7663
1.8008
1.0928
0.4558
1.3620
1.2612
Data set 2
0.2647
-0.6092
0.1647
-0.9696
0.0523
-1.1886
-1.5860
0.0865
1.1788
2.0548
-0.4029
-0.2187
1.1428
-1.4062
0.6113
0.5263
1 .1379
2.1265
-0.7022
0.0054
1.4275
0.8059
-2.0366
0.5006
1.4001
-1.4730
1.2081
0.1305
0.5624
-0.5945
0.2526
-0.0430
0.9683
0.0635
-1.2999
-0.1397
1.3533
2.3415
-0.0308
0.5219
Suppose one is interested in fitting a probability model to data
set 1.
The diagonal line on the P-P probability plot and the
statistic enable one to make a decision about whether a normal
probability model is appropriate.
The
statistic provides little
information concerning an alternative probability model for the data set
when the normal probability model is deemed inappropriate.
However, the
21 1
oOo o-
rn o -
O-
OO-
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Uniform probabilities
exponential
Laplace
Cauchy
Gumbel
Figure 6.1. Normal P-P probability plot
0.9
1.0
212
O
o-
cu O-
Oo-
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Uniform probabilities
exponential
Laplace
Cauchy
Gumbel
Figure 6.2. Normal P-P probability plot
0.9
1.0
213
shape or curvature of the probability plot provides valuable information
concerning an appropriate alternative probability model.
Figure 6.1
suggests that the underlying probability model is skewed since the plot
does not pass through the point (0.5,0.5).
A new qualitative technique
based on the distribution curves on a P-P probability plot was developed
in Chapter III.
Several distribution curves are displayed on the normal
p-p probability plot.
The diagonal line is the "curve" corresponding to
the normal distribution.
Based on the distribution curves on the normal
P-P plot in Figure 5.1, an exponential probability model seems to
provide a good fit to data set 1 since the points fell roughly along the
exponential distribution curve.
An exponential P-P probability plot can
then be constructed for further examination of the data set.
The
exponential P-P probability plot for data set 1 is displayed as Figure
6.3.
The plot is roughly a straight line suggesting that the
exponential probability model is appropriate.
The
statistic for the
exponential probability plot is 0.990 and the 0.05 significant level
percentile computed using formula (5.5), B = 0.68370 and a = 0.87806
from Table 5.6, is 0.971.
Since 0.990 is greater than 0.971, the
exponential probability model is not rejected.
Data set 1 was in fact
generated from the exponential distribution using RANEXP(26719^7) and
data set 2 was generated from the normal distribution using
RANN0R(9113783) (SAS Inc., 1982).
A statistic, r^, based on the Pearson correlation coefficient of
points on a Q-Q (quantile versus quantile) probability plot was also
developed.
The r^ statistic is the Shapiro-Francia statistic using the
o
OOXi o -
06
.@4
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Uniform probabilities
normal
Laplace
Cauchy
Gumbel
Figure 6.3. Exponential P—P probability plot
1.0
215
Weibull plotting position i/(n+1).
indicated that the the
The power study in Chapter IV
and r^ statistics have good power for
detecting a wide range of alternative distributions.
The r^ statistic
is very powerful in detecting alternative distributions with long or
heavy tails and the
statistic is more sensitive to deviations
occurring in the central region of the hypothesized distribution.
A
computer program can be easily written to supply a percentile at a
particular significant level given the sample size, using formulas
developed in Chapter V.
For those significant levels not listed in
Table 5.2, linear interpolation can be used to compute the percentile.
Also, a computer program can be written to supply an approximate p-value
given a k^ or r^ value and the sample size.
The intuitively easy
concept of the Pearson correlation coefficient for measuring the
linearity of a probability plot, good power of the k^ and the r^
statistics and the easy computer implementation of the k^ and r^
statistics for sample sizes 5 to 1000, make the
and r^ statistics
very valuable tools for assessing goodness of fit.
The joint use of the
P-P and Q-Q probability plots and the k^ and r^ statistics is a very
powerful combination for determining probability models for data.
6.3 contains the 0.05 level percentiles of the
normal, Gumbel and exponential distributions.
important feature of the
Table
statistic for the
One attractive and
statistic is that the percentiles are
similar for different hypothesized probability models.
The percentiles
216
of the
statistic for the normal probability model can thus be used
for testing any other hypothesized location-scale probability model.
This attractive feature is not possessed by the r^ statistic.
Table
6. H
contains the 0.05 level percentiles of the r^ statistic for the normal,
Gumbel and exponential distributions.
Table 6.3.
0.05 level percentiles of the
selected sample sizes
Sample sizes
0.88
0.975
0.987
0.9957
0.9974
0.9987
10
50
100
300
500
1000
Table 6.4.
Gumbel
exponential
0.90
0.976
0.988
0.87
0.971
0.9958
0.9951
0.9975
0.9987
0.9985
0.986
0.9971
0.05 level percentiles of the r^ statistic at
selected sample sizes
Sample sizes
10
50
1 00
300
500
1000
normal
statistic at
normal
0.84
0.950
0.973
0.9899
0.9937
0.9967
Gumbel
0.83
0.918
0.950
0.9758
0.9834
0.9907
exponential
0.82
0.897
0.934
0.9665
0.9776
0.9874
217
The Pearson chi-square and likelihood ratio statistics can be
regarded as the most well known among all the goodness-of-fit
statistics, however the extensive power study in Chapter IV indicated
that this class of statistics is generally not as powerful as the other
statistics studied in this dissertation.
Slight modifications of the
Pearson chi-square and likelihood ratio statistics like the Rao-Robson
statistics (Rao and Robson, 1974) or the power divergence statistics
(Cressie and Read, 1984) have power similar to that of the Pearson
chi-square or the likelihood ratio statistic.
In fact, the Monte Carlo
power comparison in Rao and Robson (1974) showed that the improvement of
the Rao-Robson statistics over the Pearson chi-square statistics is
quite small.
In addition to the relatively weak performance of the
Pearson chi-square and likelihood ratio statistics, the problem of
selecting the best choice of expected cell counts makes the application
of the Pearson chi-square and likelihood ratio statistics less
attractive.
Based on the extensive Monte Carlo power comparison in
Chapter IV, the following rule for the best choice of expected cell
counts is recommended: expected cell counts of 3, 5 and 8 for sample
sizes 20, 50 and 100 respectively.
Offsetting the shortcomings of the
Pearson chi-square and the likelihood ratio statistics, are certain
attractive features.
The Monte Carlo percentiles of the Pearson
chi-square and likelihood ratio statistics are quite stable across null
hypotheses.
The Monte Carlo percentiles are presented in Tables 6.5 -
218
Table 6.5.
Percentiles of the Pearson chi-square and likelihood ratio
statistics used in the empirical power comparison for the
testing of departures from the normal distribution
Significance
levels
StatisticsX n
a = 0.1
20
50
x'/.
48.0
XI
X3
X5
X10
X7/X13/X17
24.0
8.0
3.6
G'/z
G1
G3
G5
G10
G7/G13/G17
Table 6.5.
2.5
45.4
27.0
9.47
3.94
2.36
a = 0.05
100
114.0
60.0
21.4
1 2.4
5.4
3.9
108.5
65.5
23.8
13.1
5.61
224.0
114.0
24.8
12.4
6.92
213.2
125.7
44;3
26.5
12.7
3.98
6.98
39.9
20
50
a = 0.01
100
52.0 122.0 232.0
28.0 64.0 120.0
9.4 24.1 43.9
5.2 14.4 27.6
7.0 14.5
3.1
5.2 8.48
46.5 111.4 217.7
29.1 69.0 131 .4
11.1 26.6 48.2
5.87 15.4 29.6
7.12 15.0
3.53 5.31 8.56
20
50
100
50.0 134.0 248.0
34.0 72.0 132.0
13.6 28.9 51.1
7.5 18.8 33.6
9.8 19.0
5.2 7.75 12.1
50.9 117.0 225.3
33.5 74.7 139.7
15.4 31 .8 55.2
7.75 20.5 35.5
10.5 19.5
5.99 7.78 12.1
Percentiles of the chi-square and the likelihood ratio
statistics used in the empirical power comparison for
the testing of departures from the Gumbel distribution
Significance
levels
StatisticsX n
X'/z
XI
X3
X5
X10
X7/X13/X17
GV,
G1
G3
G5
G10
G7/G13/G17
a = 0.1
20
50
a = 0.05
100
48.0 114.0 220,0
24.0 50.0 114.0
8.0 20.7 42.6
4.0 12.4 24.8
5.4 12.4
6.8
2.5 3.92
45.4 108.5 212.8
27.0 65.2 125.7
9.47 23.5 45.8
3.97 13.3 25.5
5.71 12.6
2.35 3.98 5.94
20
50
a = 0.01
100
52.0 122.0 228.0
28.0 64.0 120.0
9.4 23.4 45.5
5.2 14.4 27.6
6.8 14.2
3.1
5.2 8.36
46.5 111.3 216.5
29.1 58.1 131 .0
11.1 25.5 51.4
5.87 15.5 29.5
7.02 14.5
3.53 5.33 8.55
20
50
100
60.0 134.0 244.0
34.0 72.0 132.0
13.6 29.6 55.7
7.6 18.8 33.2
9.6 18.5
5.2 7.92 12.2
50.9 115.8 224.3
33.3' 74.0 139.4
15.2 32.3 60.8
7.95 20.1 35.5
10.3 19.0
6.21 8.21 12.4
219
Table 6.7.
Percentiles of the Pearson chi-square and likelihood ratio
statistics used in the empirical power comparison for
the testing of departures from the exponential distribution
a = 0.1
Significance
levels
Statistics\ n
20
X'/,
48.0
XI
X3
X5
X10
X7/X13/X17
G'/,
G1
G3
G5
G10
G7/G13/G17
26.0
9.4
4.8
3.1
45.4
28.1
10.6
5.13
3.53
50
114.0
60.0
22.1
13.2
6.4
4.9
109.2
66.2
24.6
14.2
6.51
5.05
a = 0.05
100
224.0
116.0
41 :2
25.6
13.2
8.0
213.9
127.6
45.6
26.5
13.7
8.12
20
50
52.0 122.0
28.0 64.0
10.8 24.8
6.0 15.2
7.6
4.3 6.16
47.1 111.9
29.8 69.7
12.3 27.5
6.49 16.5
8.08
4.05 6.22
a = 0.01
100
232.0
122.0
45.2
28.4
15.2
9.56
217.7
131 .6
49.4
30.0
15.8
9.87
20
50
60.0 1 34.0
34.0 74.0
14.3 30.2
8.4 20.0
10.6
9.2
6.7
50.9 117.9
33.8 75.4
16.1 33.2
10.6 21 .4
11.4
6.88 9.55
100
248.0
134.0
52.5
34.0
19.6
13.2
225.3
140.6
56.4
35.9
20.6
13.7
The closeness of these percentiles from the normal distribution to
the exponential distribution suggests that the percentiles of the
Pearson chi-square and likelihood ratio statistics are approximately
distribution free.
Tables 6.8 and 6.9 contain the empirical type I
error levels for the Pearson chi-square statistic when the percentiles
of the chi-square distribution with degrees of freedom equal to the
number of cells less three were used.
These empirical type I errors
were computed from 5000 Monte Carlo samples, for the testing of
departures from normality.
220
Table 5,8.
Empirical Type I error of the Pearson chi-square statistics
when the 0,01 level
percentiles were used (based on
5000 Monte Carlo samples)
Expected cell counts
Number of cells
1/2
5
10
,0095
15
1
3
5
.0084
.011 4
.0130
.0085
.0104
.0108
.0074
.0110
.0118
20
.0155
.0115
.0142
.0100
iJO
.0155
,0125
.0108
.0104
50
.0125
,0124
.0104
.0128
80
.0118
.01 20
.0094
.0098
1 00
.0110
,0112
.0104
.0084
1 20
.0178
.0145
.0104
.0112
1 40
.0104
,0095
.0112
.0082
1 50
.0115
,0112
,0104
.0104
1 80
.0124
.0138
.01 25
.0084
200
.011 4
.0128
.0098
.0092
300
.0132
,0118
,0118
.0095
400
.0118
.0108
.0086
.0100
500
.0130
.0110
.0103
.0075
221
Table 6.9.
Empirical Type I error of the Pearson chi-square statistics
when the 0.05 level
percentiles were used (based on
5000 Monte Carlo samples)
Expected cell counts
Number of cells
1/2
5
10
.0488
15
1
3
5
.1398
.0746
.0762
.0276
.0444
.0534
.0476
.0586
.0534
20
.0370
.0582
.0518
.0572
40
.0354
.0456
.0492
.0522
60
.0442
.0562
.0476
.0576
80
.0586
.0494
.0460
.0472
100
.0546
.0492
.0550
.0502
1 20
.0624
.0610
.0534
.0440
140
.0564
.0500
.0538
.0430
160
.0552
.0456
.0504
.0446
180
.061 6
.0506
.0484
.0470
200
.0544
.0538
.0488
,0446
300
.041 4
.0490
.0522
.0457
400
.0510
.0476
.0430
.0450
500
.0570
.0514
.0510
.0535
222
Table 5.10.
Empirical Type I error of the likelihood ratio statistics
when the 0.05 level x^._g percentiles were used (based
on 5000 Monte Carlo samples)
Expected cell counts
Number of cells
1/2
5
10
.0095
15
1
3
5
.1398
.1185
.0884
.0556
.0864
.0752
.0890
.1042
.0738
20
.0084
.1106
.0962
.0778
40
.0073
.1256
.1186
.081 4
50
.0036
.1488
.1360
.1042
80
.0028
.1824
.1494
.0950
100
.0030
.2376
.1580
.1012
120
.0046
.2650
.1910
.1040
140
.0032
.3030
.2080
.1048
160
.0028
.3344
.2086
.1126
180
.0032
.3640
.221 2
.1132
200
.0024
.4150
.2476
.1094
300
.0040
.5624
.3032
.1398
400
.0022
.5922
.3552
.1580
500
.0026
.7860
.4180
.1890
223
The asymptotic theoretical results provided by Watson (1957, 1958)
and Roy (1956) suggest that the asymptotic distribution of the Pearson
chi-square statistic for the random cell case, is stochastically larger
than the chi-square distribution with k-3 degrees of freedom.
However,
the empirical Type I error levels achieved are close to the specified
Type I error levels.
Hence, the Pearson chi-square statistic together
with the percentiles from the chi-square distribution can be used for
the testing of the fit of general distributions to data.
Table 6.10
contains the empirical Type I error levels for the likelihood ratio
statistic when the percentiles of the chi-square distribution with
degrees of freedom equal to the number of cells less three were used.
The percentiles from the chi-square distribution do not provide a good
approximation for the likelihood ratio statistic, as was also noted by
Koehler and Larntz (1980).
Table 6.11.
0.05 level percentiles of the Anderson-Darling statistic
Sample sizes
normal
Gumbel
exponential
20
0.822
0.737
1.946
50
0.971
0.740
1.567
100
0.786
0.727
1.468
The extensive power comparison showed that the class of statistics
based on the empirical distribution function, especially the
224
Anderson-Darling, Watson and Cramer-von Mises statistics, have very good
power in detecting a wide range of alternative distributions.
The
Kuiper and the Kolraogorov-Smirnov statistics have moderately good power.
Also, the relative performance of the statistics based on the empirical
distribution function, except for the
statistic, is quite consistent
from the normal null distribution to the exponential null distribution.
Table 5.11 contains the 0.05 level percentiles of the A^ statistic for
normal, Gumbel and exponential distributions.
The percentiles of the
statistic vary from one distribution to the other.
This is one drawback
of the A^ statistic since new Monte Carlo percentiles must be generated
for the testing of different hypothesized distributions.
The
percentiles of the other statistics based on the empirical distribution
function also vary from one distribution to the other.
The statistics based on the moments can be very useful if used with
care.
It is recommended that a histogram be constructed when the
skewness, kurtosis or rectangle test is used.
This will avoid the
problem of accepting the hypothesized probability model for a random
sample from a distribution with shape different from that of the
hypothesized distribution, but with skewness and kurtosis similar to
those of the hypothesized distribution.
The skewness test can be very
weak for alternative distributions with skewness close to that of the
hypothesized distribution.
Similarly, the performance of the kurtosis
test can be poor for alternative distributions with kurtosis measure
close to that of the hypothesized distribution.
On the contrary, the
rectangle test can detect both kinds of departures from the hypothesized
225
probability model.
In addition, the rectangle test usually has power
comparable to the power of the better one from the skewness or kurtosis
test.
The tests based on moments performed better when the skewness and
kurtosis of the hypothesized distribution are small.
Some ideas for improving the power of the test of fit based on the
P-P probability plot were developed during the course of this study.
The shapes of the distribution curves on P-P probability plots suggest
comparing the fit of a quadratic or cubic polynomial to the fit of a
straight line through the points (0,0) and (1,1).
This is similar to
the suggestion made by LaBrecque (1977) for the normal Q-Q probability
plot.
However, the Monte Carlo study performed by LaBrecque indicates
that the improvement in the power is small.
The k^ statistic developed
in Chapter II measures how closely the points lie along an unspecified
straight line.
However, for the P-P probability plot, the line should
pass through the points (0,0) and (1,1), so that the following statistic
[^(z. - 0.5)(p. - 0.5)]=
kg =
,
(5.1)
I(z. - 0.5): I(p. - 0.5):
which measures how closely the points lie along a straight line through
the points (0,0) and (1,1) may be more powerful than the
statistic.
The ko statistic is related to the k^ statistic through
kg = kV[l + {n(i - 0.5):}/{I(z. - 5):}] .
Hence, kg statistic will be close to
(6.2)
when z is close to 0.5 which
occurs for symmetrical alternatives to symmetrical null hypotheses.
226
Consequently, k§ should be more powerful for detecting skewed
alternatives to symmetrical null distributions.
Since the distribution
curves on the Gumbel or exponential P-P probability plots do not pass
through (0.5,0.5), the k§ statistic will be more powerful for these
alternative distributions.
In light of the good performance of the Shapiro-Wilk statistic for
the testing of normality, it seems reasonable to consider the
corresponding statistic for the P-P probability plot.
F([X^-a]/B) and t. = Ci/(n+1) - 0.5].
Let
=
Assume the case when no
parameters are estimated to obtain F(*), then
, Z^, ...,Z^ is an
ordered random sample from the uniform (0,1) distribution.
From David
(1970, p. 28),
E(Z.) = i/(n+1) ,
and
min(i,j ;
(6.3)
Cov(Z.,Zj) =
n+2
n+1
n+1
n+1
The entire covariance matrix can be written as
V
n
n-1
n-2
=
n
n+2
n-1
2(n-1)
2(n-2)
n+1
Then from Graybill (1969, pp. 181, 182),
n-2
2(n-2)
3(n-2)
(6.4)
227
-1
2
-1
0
(n+1)(n+2)
-1
2
-1
0
-1
2
0
0
-1
0
0
0
.
.
.
.
. 0
. 0
. 0
(6.5)
•
•
2 - 1
0
2
-1
-1
2
The generalized least-squares estimators for the location and scale
parameters, a and g are
—1
»1
—1
(a, B)' = (T'V^^T) ^ T'V^^Z ,
where
T' =
1
S
1
^2
1
(6.6)
= (1, t)' ,
or
(Z^ + Z g ) / ^ ,
(6.7)
and
6 = (Z^ - Zi)/(Pn - P,) ,
(6.8)
where
= i/(n+1 ).
g is an estimator of 1 which is twelve times the variance of the U(0,1)
random variable, when the null hypothesis is true.
Using ^(Z^-Z)^/(n-1)
as an estimator of the variance of the U(0,1) random variable, the ratio
of these estimators yields the Shapiro-Wilk statistic for the uniform
P-P probability plot which is
228
k2 ^
"
n(n+1)(Z - Z )2
n
]
.
12(n-1) I(Z. - Z):
(6.9)
1
This statistic places heavy emphasis on the two extreme points and so
can be expected to be weak if the P-P probability plot of the
alternative distribution passes through the points (0,0) and (1,1).
Since the straight line in the P-P probability plot passes through
the points (0,0) and (1,1), the test
E(Z^) = i/(n+1) is more
appropriate than testing the fit of an arbitrary line.
testing
A statistic for
is
Ki -
V-' <Z„-P„)
(6.10)
(n+1)(n+2)
I t ^ i ^ l - Z ^ - 1 /(n+ 1 ) ] :
where P^ = (1/(n+1),2/(n+1),...,n/(n+1))•, Z^ = (Z^.Z^,
and
= 1.
Z^ = 0
This statistic is based on the spacings of the elements of
and is worthy of further consideration.
This statistic was
previously suggested as a test of fit by Irwin in the discussion of
Greenwood (1946) and by Kimball (1947) who computed the moments of this
statistic.
229
VII.
REFERENCES
Ahrens, J. H. and U. Dieter. 1974. Computer methods for sampling from
gamma, beta, Poisson, and binomial distributions. Computing
12:223-246.
Anderson, T. W. and D. A. Darling. 1952. Asymptotic theory of certain
"goodness of fit" criteria based on stochastic process. Annals of
Mathematical Statistics 23:193-212.
Anderson, T. W. and D. A. Darling. 1954. A test of goodness of fit.
Journal of the American Statistical Association 49:755-769.
Anscombe, F. J. and William J. Glynn. 1983. Distribution of the kurtosis
statistic bg for normal samples. Biometrika 70(1):227-234.
Barnett, V. 1975. Probability plotting methods and order statistics.
Applied Statistics 24:95-108.
Barnett, V. . 1975. Convenient probability plotting positions for the
normal distribution. Applied Statistics 25:47-50.
Barton, D. E. and C. L. Mallows. 1965. Some aspects of the random
sequence. Annals of Mathematical Statistics 36:236-250.
Beasley, J. D. and S. G. Springer. 1977. Algorithm AS111, The
percentage points of the normal distribution. Applied Statistics
26:118-121 .
Benard, A. and E. C. Bos-Levenbach. 1953. The plotting of observations
on probability paper (in Dutch). Statistica Neerlandica 7:153-173.
Biomedical Computer Programs P-Series.
California Press.
1979.
Berkeley: University of
Birnbaum, Z. W. 1952. Numerical tabulation of the distribution of
Kolmogorov's statistic for finite sample size. Journal of the
American Statistical Association 47:425.
Blom, G. 1958. Statistical estimates and transformed beta variables.
New York: John Wiley.
Bofinger, Eve. 1973. Goodness of fit test using sample quantiles.
Journal of the Royal Statistical Society 835:277-284.
Bowker, Albert H. and Gerald J. Lieberman. 1972. Engineering Statistics.
Second edition, Englewood Cliffs, N.J.: Prentice-Hall.
Bowman, K. 0.
1973.
Power of the kurtosis statistic, b^, in tests of
230
departures from normality.
Biometrika 60(3):523-528.
Bowman, K. 0. and L. R. Shenton. 1973. Notes on the distribution of /b^
in sampling from Pearson distributions. Biometrika 60(1):155-167.
Box, G. E. P. and M. E. Muller. 1958. A note on the generation of normal
deviates. Annals of Mathematical Statistics 28:610-611.
Brent, R. P. 1974. A Gaussian pseudo-random number generator (G5).
Communications of the ACM 17:70^-706.
California State Department.
Works Bull., 5.
1923.
Flow in California Streams.
Public
Chandra, M., N. D. Singpurwalla and M. A. Stephens. 1981. Kolmogorov
Statistics for test of fit for the extreme-value and Weibull
distributions. Journal of the American Statistical Association
75:729-731.
Chase, G. R. 1972. Chi-square test when parameters are estimated
independently of the sample. Journal of the American Statistical
Association 67:609-611.
Chernoff, H. and E. L. Lehmann. 1954. The use of maximum likelihood
estimates in the
test for goodness of fit. Annals of Mathematical
Statistics 25:579-586.
Chernoff, H. and G. J. Lieberraan. 1954. Use of normal probability paper.
Journal of the American Statistical Association 49:778-785.
Chibisov, D. M. 1971. Certain chi-square type tests for continuous
distribution. Theory of Probability and Its Applications 16:1-22.
Cochran, W. G. 1952. The
test of goodness-of-fit.
American Statistical Association 47:315-345.
Journal of the
Cohen, A. and H. B. Sackrowitz. 1975. Unbiasedness of the chi-square,
likelihood ratio and other goodness of fit tests for the equal cell
case. The Annals of Statistics 3:959-964.
Cramer, H. 1928. On the composition of elementary errors. Second paper:
Statistical Applications. Skand. Aktuartidskr. 11:141-180.
Cramer, H. 1946. Mathematical Methods of Statistics.
Princeton University Press.
Princeton, N.J.:
Cressie, N. and T. R. Read. 1984. Multinomial goodness-of-fit tests.
Journal of the Royal Statistical Society 46:440-463.
Currie, L.
D.
1980.
The upper tail of the distribution of
231
W-exponential.
Scandinavian Journal of Statistics.
7:1^7-1^9.
D'Agostino, R. B. 1971. An Omnibus test of normality for moderate and
large sample sizes. Biometrika 58:341-348.
D'Agostino, Ralph B. 1973. Monte Carlo power comparison of the W and D
tests of normality for n=100. Communications in Statistics 1:545-551.
D'Agostino, Ralph and E. S. Pearson. 1973. Tests for departure from
normality. Empirical results for the distributions of bg and /bj.
Biometrika 50(3):6l3-622.
D'Agostino, Ralph B. and Gary L. Tietjen. 1973. Approaches to the null
distribution of /bj. Biometrika 50(1):159-173•
Dahiya, R. C. and J. Gurland. 1972. Pearson chi-square test of fit with
random intervals, Biometrika 59:147-153.
Dahiya, R. C. and J. Gurland. 1973. How many classes in the Pearson
chi-square test? Journal of the American Statistical Association
58:707-712.
Daniel, Cuthbert. 1959. Use of half-normal plots in interpreting
factorial two-level experiments. Technometrics 1:311-341.
Darling, D. A. 1955. The Cramer-Smirnov test in the parametric case.
Annals of Mathematical Statistics 25:1-20.
Darling, D. A. 1957. The Kolmogorov-Smirnov, Cramer-von Mises tests.
Annals of Mathematical Statistics 28:823-838.
David, H. A.
1970.
Order statistics.
New York:
John Wiley & Sons, Inc.
Davidson, R. R. and W. E. Lever. 1970. The limiting distribution of the
likelihood ratio statistic under a class of local alternatives.
Sankhya 32:209-224.
Doksum, Kjell. 1975. Plotting with confidence:
two populations. Biometrika 63:421-434.
Graphical comparisons of
Donsker, M. D. 1952. Justisfication and extension of Doob's heusristic
approach to the Kolmogorov-Smirnov theorems. Annals of Mathematical
Statistics 23:277-281.
Doob, J. L. 1949. Heuristic approach to the Kolmogorov-Smirnov theorems.
Annals of Mathematical Statistics 20:393-403.
Durbin, J. 1 9 7 3 a . Weak convergence of the sample distribution function
when parameters are estimated. The Annals of Statistics 1 : 2 7 9 - 2 9 0 .
232
Durbin, J. 1973b. Distribution theory of tests based on the sample
distribution function. Society for Industrial and Applied
Mathematics, Philadelphia, Pennsylvania.
Durbin, J. 1975. Kolmogorov-Smirnov tests when parameters are estimated
with applications to test of exponentiality and tests on spacings.
Biometrika 62:5-22.
Ferrell, E. B. 1958. Probability paper for plotting experimental data.
Industrial Quality Control 15:1.
Filliben, James J. 1975. The probability plot correlation coefficient
test for normality. Technometrics 17(1):111-117.
Fisher, R. A. 1924. The conditions under which
measures the
discrepency between observations and hypothesis. Journal of the Royal
Statistical Society 87:442-450.
Fishman, G. S. 1976. Sampling from the gamma distribution on a computer.
Communications of the ACM 19:407-409.
Fligner, M. A. and T. P. Hettmansperger. 1979. On the use of conditional
asymptotic normality. Journal of the Royal Statistical Society
B41:178-183.
Can, F. F. 1985. Raw power comparison results and DSMCG. Unpulished
manuscript. Department of Statistics, Iowa State University, Ames,
Iowa.
Gerson, Marion. 1975. The techniques and uses of probability plotting.
The Statistician 24:235-257.
Graybill, Franklin A. 1969. Introduction to matrices with applications
in statistics. Belmont, California: Wadsworth.
Greenwood, M. 1945. The statistical study of infectious diseases.
Journal of the Royal Statistical Society A109:85-110.
Gringorten, Irving I, 1963. A plotting rule for extreme probability
paper. Journal of Geophysical Research 68:813-814.
Gumbel, E. J. 1943. On the reliability of the classical chi-square test.
Annals of Mathematical Statistics 14:253-263.
Gumbel, E. J. 1964. Statistical theory of extreme values.
D.C.: National Bureau of Standards.
Washington
Gurland, J. 1955. Distribution of definite and of indefinite quadratic
forms. Annals of Mathematical Statistics 26:122-127. [Correction
33:813].
233
Gurland, J. 1956. Quadratic forms in normally distributed random
variables. Sankhya 17:37-50.
Hacking, Ian.
1984.
Trial by number.
Science 84 5(9):59-70.
Hahn, Gerald J. and Samuel S. Shapiro. 1957.
engineering. New York: John Wiley.
Halmos, P. R.
1950.
Measure Theory.
Statistical models in
New York:
Van Nostrand Company.
Harter, H. L. 1961. Expected values of normal order statistics.
Biometrika 48:151-165.
Harter, H. Leon. 1980. Modified asymptotic formulas for critical values
of the Kolmogorov test statistic. The American Statistician
34:110-111.
Hawkins, D. M.
outliers".
1977. Comment on "A new statistic for testing suspected
Communication in Statistics 6:435-438.
Hazen, A. 1914. Storage to be provided in the impounding reservoirs for
municipal water supply. Transactions of the American Society of Civil
Engineers 77:1547-1550.
Hoaglin David C. and David F. Andrews. 1975. The reporting of
computation-based results in statistics. The American Statistician
29(3):122-126.
Hoist, L. 1972. Asymptotic normality and efficiency for certain
goodness-of-fit tests. Biometrika 59:137-145.
Hoist, L. 1976. On Multinomial Sums.
Technical Summary Report NO. 1629.
Mathematics Research Center
University of Wisconsin-Madison.
Hutchinson, T. P. 1979. The validity of the chi-square test when
expected frequencies are small: a list of recent research references.
Communication in Statistics A8(4):327-335.
IBM Personal Computer Plotting System. 1984. Programmer's guide and plot
system language bindings. International Business Machines
Corporation, Boca Raton, Florida.
Ivchenko I. V. and Medvedev. 1978. Separable statistics and hypothesis
testing. The case of small samples. Theory of Probability and Its
Applications 23:764-775.
Johnk, M. D. 1964. Erzeugung von betaverteilter and gamraaverteilter
zufallszahlen. Metrika 8:5-15.
234
Johnson, Norman L. 1949. System of frequency curves generated by method
of translation. Biometrika 36:149.
Johnson R. A. and D. W . Wichern. 1982. Applied multivariate statistical
analysis. Englewood Cliffs, New Jersey: Prentice-Hall.
Kac, M., Kiefer, J. and Wolfowitz, J. 1955. On tests of normality and
other tests of goodness-of-fit based on distance methods. Annals of
Mathematical Statistics 26:189-211.
Kambhampati, C. 1971. A chi-square statistic for goodness-of-fit tests.
Thesis, Cornell University.
Kane V. E. 1982. Standard and goodness-of-fit parameter estimation
methods for the three-parameter lognormal distribution.
Communications in Statistics 11:1935-1957.
Kempthorne, 0. 1967. The classical problem of inference-goodness of fit.
Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics
and Probability 1:235-249.
Kendall, M. G. and A. Stuart.
2. London: Griffin.
1961.
The advanced theory of statistics,
Kennedy, William J. Jr. and James E. Gentle. 1980.
Computing. New York: Marcel Dekker, Inc.
Statistical
Kimball, B. F. 1947. Some basic theorems for developing tests of fit for
the case of the nonparametric probability distribution function, I.
Annals of Mathematical Statistics 18:540-548.
Kimball, Bradford F. I960. On the choice of plotting positions on
probability paper. Journal of the American Statistical Association
55:546-560.
Kinderman, A. J., J. F. Monahan and J. G. Ramage. 1977. Computer methods
for sampling from Student's t distribution. Mathematics of
Computation 31:1009-1018.
King, James R. 1971. Probability charts for decision making.
Industrial Press Inc.
New York:
Koehler Kenneth J. and Kinley Larntz. 1980. An empirical investigation
of goodness-of-fit statistics for sparse multinomials. Journal of the
American Statistical Association 75(370):336-344.
Kolmogorov, A. 1933. Sulla determinazione empirica di una legge di
distribuzione. Giorn. 1st. Ital. Attuari. 4:83-91.
Kotz, S. , N. L. Johnson
and D. W. Boyd.
1967.
Series representations of
235
distributions of quadratic forms in normal variables.
case. Annals of Mathematical Statistics 3 8 : 8 2 3 - 8 3 6 .
I.
Central
Kuiper N. H. 1959. Alternate proof of a theorem of Birnbaum and Pyke.
Annals of Mathematical Statistics 30:251-252.
LaBrecque, J. 1977. Goodness-of-fit tests based on non-linearity in
probability plots. Technometrics 19:293-306.
Lancaster, H. 0. 1969. The chi-squared distribution.
Wiley and Sons, Inc.
New York:
John
Larntz, K. 1978. Small-sample comparisons of exact levels for chi-square
goodness-of-fit statistics. Journal of the American Statistical
Association 73:253-263.
Larsen, Ralph I., Thomas C. Curran and William F. Hunt, Jr. 1980. An air
quality data analysis system for interrelating effects, standards, and
needed source reductions: Part 6. Calculating concentration
reductions needed to achieve the new national ozone standard. Journal
of the Air Pollution Control Association 30:662-669.
Lawless, J. F. 1982. Statistical models and methods for lifetime data.
New York: John Wiley and Sons.
Lilliefors, H. W. 1967. On the Kolmogorov-Smirnov test for normality
with mean and variance unknown. Journal of the American Statistical
Association 62:399-404.
Lilliefors, H. W. 1969. On the Kolmogorov-Smirnov test for the
exponential distribution with mean unknown. Journal of the American
Statistical Association 64:387-389.
Littell, Ramon C., James T. McClave and Walter W. Offen. 1979.
Goodness-of-fit tests for the two parameter weibull distribution.
Communications in Statistics B8(3):257-269.
Looney, Stephens W. and Thomas R. Gulledge, Jr. 1985. Use of the
correlation coeffiecient with normal probability plots. The American
Statistician 39:75-79.
Mage, David T. 1980. An empirical model for the Kolmogorov-Smirnov
statistic. Journal of Environmental Science and Health, Part A
15:139-147.
Mage, David T. 1982. An objective graphical method for testing normal
distributional assumptions using probability plots. The American
Statistician 36(2):116-120.
Mann, H. B. and A. Wald.
1942.
On stochastic limit and order
236
relationships.
Annals of Mathematical Statistics 14:217-226.
Mann, N. R., E. M. Scherer and K. W. Fertig. 1973. A new goodness of fit
tests for the Weibull distribution or extreme value distribution with
unknown parameters. Communications in Statistics 2:283-400.
Medvedev, Yu. I. 1977a. Separable statistics in a polynomial scheme, I.
Theory of Probability and Its Applications 22(1):1-15.
Medvedev, Yu. I. 1977b. Separable statistics in a polynomial scheme, II.
Theory of Probability and Its Applications 22(3)=607-614.
Michael, John R.
70:11-17.
1983.
The stabilized probability plot.
Microsoft FORTRAN Reference Manual.
Corporation.
1983.
Bellevue, WA:
Biometrika
Microsoft
Mood, M. Alexander, Franklin A. Graybill and Duane C. Boas. 1974.
Introduction to the theory of statistics. Third edition. New York:
McGraw-Hill, Inc.
Moore, D. S. 1970. On the multivariate chi-square statistics with random
cell boundaries. Purdue Statistics Department Mimeo Series No. 246.
Moore, D. S. 1971. A chi-square statistic with random cell boundaries.
Annals of Mathematical Statistics 42:147-155.
Moore, D. S. and M. C. Spruill. 1975. Unified large sample theory of
general chi-squared statistics for tests of fit. Annals of Statistics
3(3):599-6l6.
Moore, David S. 1977. Generalized Inverses, Wald's method, and the
construction of chi-squared test of fit. Journal of the American
Statistical Association 72(357):131-137.
Morris, C. 1975. Central limit theorems for multinomial sums.
Statistics 3:165-188.
Annals of
Murthy, V. K. and A. V. Gafarian. 1970. Limiting distributions of some
variants of the chi-square statistics. Annals of Mathematical
Statistics 41:188-194.
Pearson, E. S., R. E. D'Agostino and K. 0. Bowman. 1977. Test of
departure from normality; Comparison of powers. Biometrika
64(2).-231-246.
Pearson, K. 1900. On the criterion that a given system of deviations
from the probable case of a correlated system of variables is such
that it can be reasonably supposed to have arisen from random
237
sampling.
Philosophical Magazine 50:157-175.
Quesenberry C. P. and Craige Hales. 1980. Concentration Bands for
Uniform plots. Journal of Statistical Computation and Simulation
11:41-53.
Rand Corporation. 1955. A million random digits with 100000 normal
deviates. Rand Corporation, Santa Monica, California.
Rao, K. C. and D. S. Robson. 1974. A chi-square statistic for
goodness-of-fit tests within the exponential family. Communications
in Statistics 3(1 2):1 1 39-1 153.
Rayner, J. C. and D. J. Best. 1982. The choice of class probabilities
and number of classes for the simple
goodness-of-fit test. Sankhya
844:28-38.
Read, Timothy B.C. 1984. Small-sample comparisons for the power
divergence goodness-of-fit statistics. Journal of the American
Statistical Association 79(388):929-935.
Roscoe, J. T. and J. A. Byars. 1971. An investigation of the restraints
with respect to the sample size commonly imposed on the use of the
chi-square statistics. Journal of the American Statistical
Association 66:755-759.
Roy, A. R. 1956. On
statistics with variable intervals. Technical
Report No. 1. Department of Statistics, Stanford University,
Stanford, California.
Royston, J. P. 1982a. An extension of Shapiro and Wilk's test for
normality to large samples. Applied Statistics 31:115-124.
Royston, J. P. 1982b. Expected normal order statistics (exact and
approximate). Algorithm AS177. Applied Statistics. 31(2):161-165.
Royston, J. P. 1982c.
31 :176-180.
The W test for normality.
Applied Statistics
Royston, J. P. 1983. Some techniques for assessing multivariate
normality based on the Shapiro-Wilk W. Applied Statistics 2:121-133.
Ryan, Thomas A., Jr. and Brain L. Joiner. 1974. Normal probability plots
and tests for normality. Technical Report. Statistics Department,
Pennsylvania State University, University Park, Pennsylvania.
Sahler, W. 1968. A survey on distribution-free statistics based on
distances between distribution functions, Metrika 13:149-169.
Sarkadi, K.
1975.
The consistency of the Shapiro-Franoia test.
238
Biometrika 62(2):445.
SAS Institute Inc.
1982. SAS User's guide: Basics.
Gary, NC: SAS Institute Inc.
1982 Edition.
Scheffé, H. 1947.
distributions.
A useful convergence theorem for probabilityAnnals of Mathematical Statistics 18:434-438.
SCIENCE 84.
American Association for the Advancement of Science.
1984.
Sethuraman^ J. 1961. Some limit theorems for joint distributions.
Sankhya 23(A):379-386.
Shapiro, S. S. 1964. An analysis of variance test for normality
(complete samples). Unpublished Ph.D. thesis, Rutgers - The State
University.
Shapiro, S. S. and M. B. Wilk. 1965. An analysis of variance test for
normality (complete samples). Biometrika 52:591-611.
Shapiro, S. S., M. B. Wilk and N. J. Chen. 1968. A comprehensive study
of various test for normality. Journal of the American Statistical
Association 53:1342-1372.
Shapiro, S. S. and R. S. Francia. 1972. An approximate analysis of
variance test for normality. Journal of the American Statistical
Association 67(337)=215-216.
Shapiro, S. S. and M. B. Wilk. 1972. An analysis of variance test for
the exponential distribution (complete sample). Technometrics
14:355-370.
Shapiro, S. S. 1980. How to test normality and other distributional
assumptions. ASQC Basic References in Quality Control: Statistical
Techniques, Volume 3- American Society for Quality Control,
Milwaukee, Wisconsin.
Shapiro, S. S. and C. W. Brain. 1982. Recommended distributional testing
procedures. American Journal of Mathematical and Management Science
2:175:221.
Sinha, B. K. 1976. On unbiased of Mann-Wald-Gumbel x^'test. Sankhya
A38:124-130.
Slakter, M. J. 1956. Comparative validity of the chi-square and two
modified chi-square goodness-of-fit tests for small but equal expected
frequencies. Biometrika 53:619-622.
1
Smirnov, N. V. 1936. Sur la distribution de w^ (Critérium de M. R. v.
Mises). C. R. Acad. Soi. Paris 202:449-452.
239
Smirnov, N. V. 1937. Sur la distribution de
(Critérium de M. R. v.
Mises). (Russian/French summary). Mat. Sbornik (N. S.).
2(44):973-993.
r
Smirnov, N. V. 1939. Sur les écarts de la courbe de distribution
empirique (Russian/French summary). Mat. Sbornik (N.S.) 6(18):3-26.
Smirnov, N. V. 1941. Approximate laws of distribution of random
variables from empirical data. Uspekhi Mat. Nauk 10:179-206
(Russian).
Smith, Paul J., Donald S. Rae, Ronald W. Manderscheid and Sam Silbergeld.
1979. Exact and approximate distributions of the chi-square statistic
for equiprobability. Communications in Statistics B8(2):131-1^9.
Snedecor, George W. and William G. Cochran. 1980. Statistical Methods.
Seventh edition. Ames, Iowa: Iowa State University Press.'
Steck, G. P. 1957. Limit theorems for conditional distributions.
University of California Publications in Statistics 2(12):237-284.
Stephens, M. A. 1970. Use of Kolmogorov-Smirnov, Cramer-Von Mises and
related statistics without using extensive tables. Journal of the
Royal Statistical Society B32(1):115-122.
Stephens, M. A. 1971. Asymptotic results for goodness-of-fit statistics
when parameters must be estimated. Stanford Research Report No. 180,
Department of Statistics, Stanford University. Stanford, California.
Stephens, M. A. 1974. EDF statistics for goodness-of-fit and some
comparisons. Journal of the American Statistical Association
69:730-737.
Stephens, M. A. 1976. Asymptotic results for goodness-of-fit statistics
with unknown parameters. The Annals of Statistics 4(2);357-369.
Stephens, M. A. 1977. Goodness of fit for the extreme value
distribution. Biometrika 64(3);583-588.
Stirling, Douglas. 1982. Enhancements to aid interpretation of
probability plots. Statistician 31:211-220.
Sukhatme, Shashikala. 1972. Fredholm determinant of a positive definite
kernal of a special type and its applications. Annals of Mathematical
Statistics 43:1914-1926.
Tate, M. W. and L. A. Hyer. 1973. Inaccuracy of the
test of goodness
of fit when expected frequencies are small. Journal of the American
Statistical Association 68:836-841.
240
Tukey, J. W. 1962. The future of data analysis.
Statistics 33:1-67.
Von Mises, R.
1931.
Wahrscheinlichkeitsrechnung.
Annals of Mathematical
Wein, Leipzig.
Wald, Abraham. 1943. Tests of statistical hypotheses concerning several
parameters when the number of observations is large. Transactions of
the American Mathematical Society 54:426-482.
Watson, G. S. 1957. The
goodness-of-fit test for normal
distributions. Biometrika 44:336-348.
Watson, G. S. 1958. On chi-square goodness-of-fit tests for continuous
distributions. Journal of the Royal Statistical Society, Series B
20:44-72.
Watson, G. S. 1959. Some recent results in
Biometrics 15:440-458.
goodness-of-fit tests.
Watson, G. S. 1951.
48:109-114.
Goodness-of-fit test on a circle.
Watson, G. S.
49:57-63.
Goodness-of-fit test on a circle II.
1962.
Weibull, W. 1939. The phenomenon of rupture in solids.
Vetenskaps Akademien Handlingar 153:17.
Biometrika
Biometrika
Ingénions
Weisberg, S. 1974. An empirical comparison of percentage points of W and
W'. Biometrika 61:644-646.
Weisberg, S. and C. Bingham. 1975. An approximate analysis test for
non-normality suitable for machine calculation. Technometrics
17:133-134.
Wichmann, B. A. and I. D. Hill. 1982a. An efficient and portable
pseudo-random number generator. Applied Statistics 31:188-190
[Correction 33:1233.
Wichmann, B. A. and I. D. Hill. 1982b. A pseudo-random number generator,
National Physical Laboratory Report DITC 6/82. National Physical
Laboratory, Teddington, Middx, UK.
Wilk, M. B. and R. Gnanadesikan. 1968. Probability plotting methods for
the analysis of data. Biometrika 55:1-17.
Witting, H. 1959. Uber einen x^-test, dessen klassen durch geordnete
Stichproben funktionen festgelegt werden. Ark. Mat. 10:468-479.
241
VIII.
ACKNOWLEDGEMENTS
I would like to express my deepest appreciation to Dr. Koehler for
his guidance, assistance, and patience.
The financial support given to
me by the Department of Statistics and the Statistical Laboratory for my
graduate program at the Iowa State University will be remembered.
I
appreciate the computing facility provided by Dr. Kennedy and thank Bud
for his thoughtfulness.
Friends at Iowa State and the church at Lincoln
Swing have made my stay here a very pleasant one.
family can not be described.
The sacrifice of my
242
IX.
APPENDIX A.
Normal distribution
1
N(y,o^)
"
f(x) =
PARAMETRIC FAMILIES OF DISTRIBUTIONS
^20*^
-co
e
<
,
p
<
/(2ira^)
,
1—
f(x) = (1-p)
e
.
+
x^
1
e
ScConN(p,a)
x^
" ~2^
e
1
+ p
/(2n)
*
, „
0 > 0 .
-« < x < " .
/(2ira^)
,
Truncated normal distribution
TruncN(a,b)
•
/(2iTa^)
,
F(b) - F(a)
Exponential distribution
X - a
^
-® < X < " ,
-= < a < b < " .
Exponential(a,B)
.
,
6
.
-" < a < " ,
B > 0 ,
X > a .
Exponential distribution with location 0
f(x) = — e
6
-" < n < " ,
-" < X < " .
/(2it)
Normal distribution with scale contaminated
f(x) = (1-p)
LoConN(p,u)
- ^*2*^
e
,
p
/(2w)
f(x) = — e
,
-" < X < " .
Normal distribution with location contaminated
f(x) =
œ
0 > 0 ,
^
,
6 > 0 ,
x > 0.
Exponential(g)
243
Exponential dlstributi on with 1ocation contaminated
f(x) = (1-p) e- X + p e- (X - =)
X > a .
Exponential distribution with scale contaminated
ScConE(p,B)
-co < a < oo J
- X
f(x) = (1-p) e
+
6 > 0 ,
X > 0 .
Truncated exponential distribution
TruncE(0,b)
- X
f(x) = e
b > 0 ,
0 < X < b .
F(b)
Logistic distribution
f(x) =
LogisticÇa,B)
< a < " ,
exp[-(x - a)/B]
g > 0 ,
-« < X < = .
6 {1 + exp[-(x - a)/6]}•
Laplace distribution
Laplace(a,B)
-0°
<
a
<
m
,
f(x) = exp[-|x - a|/B]/(2B) ,
B > 0 ,
-to < X < <° .
Asymmetric triangle distribution
Triangle 11(c)
f(x) = 2/c - 2x/c* ,
c > 0 ,
0 < X < c .
Symmetric triangle distribution
1/c - x/c^
f(x) =
1/c + x/c^
LoConE(p,a)
Triangle 1(c)
c > 0 ,
-c < X < c .
2#
Beta distribution
f<x).
Beta(a,B)
x""' (i-x)»"' ,
6 > 0:
0 < X < 1 .
r(ct)r(B)
Cauchy distribution
Cauchy(a,B)
1
f(x) =
-œ < a < " ,
g > 0 ,
-" < x < " .
,
m g [1 + {(x-a)/g}^
Gamma distribution
1
f(x) =
Gamma(\,a,B)
a-1
(x - X)
exp[-(x - A)/g],
* > 0'
g > 0 ,
x > X .
r(a)6°
Chi-square distribution
f(x) =
Chi-square(k)
k/2 - 1
vlT' ^
' sxp (-x/2),
r(k/2)2*'^
Weibull distribution
k > 0 ,
X > 0 .
Weibull(c,a,g)
,
f(x) = c/g*C(x - a)/B]
0 >0
• ex p [ - ( x - a ) / g ] ,
,
-»<%<=,
-co < X < " ,
g > 0 .
Standard Weibull distribution
f(x) = G x° ^'exp(-x°) ,
Weibull(c)
c > 0,
Gumbel or extreme value distribution
f(x) .
. expC-e-^:" "
-o> < X < " .
Gumbel(a,g)
^
-00 < x < " ,
g > 0 .
245
Uniform distribution
Uniform(a,B)
f ( x ) = 1 / ( 3 - a ) ,
-= < a < g < » ,
a < X < g .
Johnson bounded distribution
SB(a,g)
The Johnson bounded random variable Y is related to the standard
normal random variable X by the equation:
X = a +
glog[Y/(1
- Y)] ,
0 < Y < 1 .
Johnson unbounded distribution
SU(a,g)
The Johnson unbounded random variable Y is related to the standard
normal random variable X by the equation:
X = a+ Bsinh^Y,
Lognormal distribution
- = < Y < =» .
Lognormal(A,a,B)
The lognormal random variable Y is related to the standard normal
random variable X by the equation:
X = a +
Blog(Y - X) ,
X < Y < » .
Symmetric Tukey distribution
Tukey(A)
The symmetric Tukey random variable Y is related to the standard
uniform random variable U (on [0,1]) by the equation:
Y =
- (1 - U)^ .
t distribution
t(v)
f^(x) = [1 + xz/v]"(^
^
246
IX.
APPENDIX B.
RANDOM VARIATES GENERATORS
The methods of generating random numbers from distributions
different from the uniform distribution are described in this Appendix.
A uniform random (0,1) variate is denoted by U.
These methods may not
be the most efficient methods for generating random numbers.
For
efficient generators, an excellent description can be found in Kennedy
and Gentle (1980).
Normal distribution
N(y,a^) (Box-Muller transformation, 1958)
A pair of independent normal variates (Xi.Xg) is obtained by the
transformation:
Xi
= oos(2nUz)./[-2
In(Ui)]
X; = sin(2%U2)./[-2 ln(Uj]
Normal distribution with location contaminated
1.
Generate a normal random variate, X.
2.
Generate a uniform random variate, U.
3.
If U < p then deliver X + u, else deliver X.
Normal distribution with scale contaminated
LoConN(p,u)
ScConN(p,a)
1.
Generate a normal random variate, X.
2.
Generate a uniform random variate, U.
3.
If U < p then deliver aX, else deliver X.
247
Truncated normal distribution
TruncN(a,b)
1.
Generate a normal random variate, X.
2.
Generate a uniform random variate, U.
3.
If a < U < b then deliver X, else go to 1.
Exponential distribution
ExponentiaKa,g)
X = -glog(U) + a
Exponential distribution with location contaminated
1.
Generate an Exponential(0,1) random variate, X.
2.
Generate a uniform random variate, U.
3.
If U < p then deliver X + a, else deliver X.
Exponential distribution with scale contaminated
ScConE(p,6)
1.
Generate an ExponentiaKO,1 ) random variate, X.
2.
Generate a uniform random variate, U.
3.
If U < p then deliver gX, else deliver X.
Truncated exponential distribution
TruncE(0,b)
1.
Generate an ExponentiaKO,1 ) random variate, X.
2.
Generate a uniform random variate, U.
3-
If X < b then deliver X, else go to 1 •
Logistic distribution Logistic(a,B)
X = a - 61n(1/U - 1)
LoConE(p,ct)
248
Laplace distribution
Laplace(a,B)
1.
Generate two uniform random variates, Uj and Uj.
2.
If Uj < 0.5 then deliver a + 61n(2U2),
else deliver a - 31n(2[1-U2]).
Asymmetric triangle distribution
Triangle 11(c)
X = 0 - c/U.
Symmetric triangle distribution
Triangle 1(c)
1.
Generate a Triangle 11(c) random variate, Y.
2.
Generate a uniform random variate, U.
3.
If U < 0.5 then deliver X = - Y,
else deliver X = Y.
Beta distribution
Beta(a,B) (Algorithm Jojnk, 1964)
1.
Generate
and
2.
Set Yi =
3.
If W g 1 then deliver X = Yj/W, else go to 1.
Cauchy ditribution
.
Yz =
^^d W = Yj + Y^.
Cauchy(a,g)
X = a + B tan[n(U - 0.5)]
Gamma distribution
Gamma(X,a,B)
The generation of Gamma random variate was based on two methods,
depending on whether a is less than or greater than 1.
For a less than
1, the method by Ahrens in Ahrens and Dieter (1974) was used.
The
method by Fishman (1976) was employed when a is greater or equal to 1.
Descriptions of these two generators can be found in Kennedy and Gentle
( 1 9 8 0 , pp. 2 1 3 , 2 1 4 ) .
249
Welbull distribution
Weibull(c,a,B)
X = a + B(-lnU)T/°
Gumbel or extreme value distribution
Gumbel(a,B)
X = a - Bln(-lnU)
Johnson bounded distribution
SB(a,B)
1. Generate aN(0,1) random variate, Y.
2. Deliver X = a + Blog[Y/(1 - Y)].
Johnson unbounded distribution SU(a,B)
1. Generate a N(0,1) random variate, Y.
2. Deliver X = a + B slnh
Lognormal distribution
—1
Y,
Lognormal(X,a,B)
1. Generate aN(0,1) random variate, Y.
2. Deliver X = a + Blog(Y - X).
Symmetric Tukey distribution
Y =
Tukey(A)
- (1 - U)^ .
t distribution
t(v) (Kinderman, Monahan and Ramage, 1977)
Algorithm TAR:
1.
Generate U;.
Generate Uj.
If Uj < 0.5, go to 2.
Go to 3.
2.
Set X = 0.25/(Ui - 0.25).
3.
If U2 < 1 -
|x|/2,
If Uj < (1 + X^/v)
Set X =
Generate U3 and set
- 3.
= Uj/X^.
deliver X.
^ T)/'2
deliver X, else go to 1.
250
IX.
APPENDIX C.
COMPUTER PROGRAMS
SAS program for the generation of normal P-P and Q-Q probability plots
//CAN JOB 13542,SASPPQQ
//SI EXEC SAS
//SYSIN DD *
*
I
*
*
INPUT THE OBSERVATIONS X1,X2,...,XN INTO DATAX
|
* ;
DATA DATAX;
INPUT XI;
CARDS;
129
104
124
146
83
134
161
123
107
119
113
97
*
SORT THE OBSERVATIONS X1,X2,...,XN
PROC SORT; BY XI;
*
COMPUTE THE MEAN AND STANDARD DEVIATION OF X I , X 2
STORE MEAN AND STANDARD DEVIATION IN DATAMLE
PROC MEANS; VAR XI;
OUTPUT OUT=DATAMLE MEAN=XMEAN STD=XSTD N=NUM;
*
*
MERGE THE 2 DATA SETS
DATA DATACOMB;
MERGE DATAX DATAMLE;
*
COMPUTE:
XTRANS = STANDARDIZED XI
PI = I/(N+1) I.E. THE PLOTTING POSITIONS
NORPERC = INVERSE NORMAL CDF OF PI
NORPROB = NORMAL PROBABILITY OF XTRANS
DATA DATAPLOT; SET DATACOMB;
XN
251
IF _N_=1 THEN ALPHA=XMEAN;
IF _N_=1 THEN BETA=XSTD;
IF _N_=1 THEN N=NUM;
XTRANS=(XI-ALPHA)/BETA;
PI=_N_/(N+1);
NORPERC=PROBIT(PI);
NORPROB=PROBNORM(XTRANS);
DROP XMEAN XSTD NUM ALPHA BETA N;
RETAIN ALPHA BETA N;
*
I
*
*
PRINT THE OBSERVATIONS AND PLOTTING POSITIONS
|
* ;
PROC PRINT;
*
I
*
*
CONSTRUCT P-P PLOT
|
* ;
PROC PLOT;
PLOT NORPROB*PI='*' / VAXIS = 0 .1 .2 .3 .4 .5 .5 .7 .8 .9 1
HAXIS = 0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1
VSPACE = 4
HSPACE = 5
VPOS = 44
HPOS =55
HZERO
VZERO;
TITLE NORMAL P-P PROBABILITY PLOT;
*
CONSTRUCT Q-Q PLOT
NOTE: MODIFY VAXIS AND HPOS VALUES FOR OTHER Q-Q PLOTS
PROC PLOT;
PLOT XI*N0RPERC='*'/VAXIS= r .5 1 . 6 1.7 1 . 8 1.9 2 . 0 2. 1 2.2 2.3 2.4 2.5
HAXIS = -3 -2.4 -1.8 -1.2 -.6 0 .5 1.2 1.8 2.4 3
VSPACE = 4
HSPACE = 5
VPOS = 44
HPOS = 55;
TITLE NORMAL Q-Q PROBABILITY PLOT;
//
252
DISSPLA program for the generation of normal P-P and Q-Q probability
plots
//CAN JOB 13542,DISPPQQ
NORMAL P-P PROBABILITY PLOT
/*OUTPUT P001 FORMS=3001,C0PIES=1
//SI EXEC FORTVD,REGION.G0=512K,TIME=1
//FORT.SYSIN DD *
C
C
C
C
C
C
C
XI = OBSERVATIONS
PI = PLOTTING POSITIONS
XTRANS = STANDARDIZED OBSERVATIONS
NORPER = NORMAL CDF OF PLOTTING POSITIONS
NORPRO = NORMAL PROBABILITY OF STANDARDIZED OBSERVATIONS
NUMOBS = N, NUMBER OF OBSERVATIONS
REAL XI,PI,XTRANS,NORPER,NORPRO
DIMENSION XI(IOOO),PI(1000),XTRANS(1000),NORPER(1000)
DIMENSION NORPRO(WOO)
CC INPUT THE OBSERVATIONS XI, X2
XN INTO XI()
cDO 100 1=1,1000
READ (5,*,END=200) XI(I)
100 CONTINUE
200 NUM0BS=I-1
c-
C SORT THE OBSERVATIONS XI, X2, . . ., XN
CCALL SORT(NUMOBS,XI)
CC COMPUTE MEAN AND STANDARD DEVIATION OF XI, X2, . . ., XN
CCALL MLE(NUMOBS,XI,XMEAN,XSTD)
c-
COMPUTE PLOTTING POSITIONS [PI=I/(NUM0BS+1 )]
C
c-
NOBSPl=NUM0BS+1
DO 300 1=1,NUMOBS
PI(I)=I/REAL(N0BSP1)
300 CONTINUE
c-
COMPUTE STANDARDIZED OBSERVATIONS AND PLOTTING POSITIONS
C
c-
DO 400 1=1,NUMOBS
XTRANS(I)=(XI(I)-XMEAN)/XSTD
CALL NPROB(NORPRO(I),XTRANS(I))
CALL NINV(PI(I),NORPER(I))
400 CONTINUE
253
C
C
C
C
PREPARE LABELS FOR AXES OF Q-Q PLOT.
X AXIS LABELS WILL DISPLAY 2 DECIMAL PLACES.
XTOP=NORPER(NUMOBS)+0.1
XTOP=REAL(NINT(XTOP*10.0))/10.0
XB0T=N0RPER(1)-0.1
XBOT=REAL(NINT(XBOT*1 G.0))/1 0.0
XINCR=(XTOP-XBOT)/10.0
YTOP=XI(NUMOBS)
YB0T=XI(1)
YINCR=(YTOP-YBOT)/10.0
C
C
C
C
C
C
COSTRUCT NORMAL P-P PLOT
*
|
*
CALL ZETA(53,11,15)
CALL PHYSOR(2.0,4.25)
CALL HWROT('MOVIE')
CALL AREA2D(5.0,5.0)
CALL COMPLX
CALL SETCLR('BLACK')
CALL XMME('Uniform probabilities!',100)
CALL YNAME('Normal probabilities$',100)
CALL GRAF(0.0,0.1,1.0,0.0,0.1,1.0)
CALL RLVEC(0.0,0.0,1.0,1.0,0)
CALL THKFRM(.01)
CALL FRAME
CALL MARKER(15)
CALL CURVE(PI,N0RPR0,NUM0BS,-1)
CALL ENDGR(O)
CALL PHYSOR(2.0,1.25)
CALL AREA2D(5.0,8.0)
CALL MESSAGC'Figure 3.1. Normal P-P probability plot$',100,
&0.0,1.84375)
CALL ENDPL(I)
*
CONSTRUCT NORMAL Q-Q PLOT
|
*
CALL PHYSOR(2.0,4.25)
CALL AREA2D(5.0,5.0)
CALL XNAME('Normal percentiles!', 100)
CALL YNAME('Observed percentiles!',100)
CALL GRAF(XBOT,XINCR,XTOP,YBOT,YINCR,YTOP)
CALL THKFRM(.OI)
CALL FRAME
CALL SETCLR('BLACK')
CALL MARKER(15)
CALL CURVE(NORPER,XI,NUMOBS,-1)
CALL ENDGR(O)
CALL PHYSOR(2.0,1.25)
254
CALL AREA2D(5.0,8.0)
CALL MESSAG('Figure 3.2.
&0.0,1.84375)
CALL ENDPL(2)
CALL DONEPL
STOP
END
C
C
C
C
C
Normal Q-Q probability plot$',100,
*
SORT OF OBSERVATIONS IN NON-DESCENDING ORDER
SHELL SORT ALGORITHM USED
SOURCE: R. LOESER, COMMUNICATIONS OF THE ACM,
VOL 17, NO 3, P. 143
SUBROUTINE SORT(N,SOBS)
IMPLICIT DOUBLE PRECISION (A-H)
IMPLICIT INTEGER*4 (I-N)
IMPLICIT REAL*4 (0-W)
DIMENSION SOBS(N)
1=1
101
102
IF(I-N) 102,102,103
1=1+1
GOTO 101
103
M=I-1
104
M=M/2
IF(M) 110,110,105
K=N-M
DO 109 J=1 ,K
I=J+M
I=I-M
IF(I) 109,109,107
L=I+M
IF(SOBS(L)-SOBS(I)) 108,108,109
S=SOBS(I)
SOBS(I)=SOBS(L)
SOBS(L)=S
GOTO 106
CONTINUE
GOTO 104
RETURN
END
105
106
107
108
109
110
C
C
C
C
*
MAXIMUM LIKELIHOOD ESTIMATION OF MEAN AND STANDARD
DEVIATION FOR THE OBSERVATIONS
*
SUBROUTINE MLE(NUMOBS,SOBS,SMEANX,STDX)
IMPLICIT DOUBLE PRECISION (A-H)
255
200
C
C
C
C
IMPLICIT INTEGER*^ (I-N)
IMPLICIT REAL*4 (0-W)
DIMENSION SOBS(NUMOBS)
DXBAR=(DBLE(SOBS(1))+DBLE(SOBS(2)) ) / 2 . OD+0
DT1=DBLE(S0BS(1))-DXBAR
DT2=DBLE(SOBS(2))-DXBAR
DVAR=DT1*DT1+DT2*DT2
DNPAR=DFLOAT(NUMPAR)
N0BSM2=NUM0BS-2
NPARM1=NUMPAR-1
DO 200 1=1,N0BSM2
J=I+1
K=I+2
DI=DFLOAT(I)
DJ=DFLOAT(J)
DK=DFLOAT(K)
DXMXB=DBLE(SOBS(K))-DXBAR
DVAR=(DI*DVAR+DXMXB*DXMXB*DJ/DK)/DJ
DXBAR=(DJ*DXBAR+ DBLE(SOBS(K)))/DK
CONTINUE
SMEANX=SNGL(DXBAR)
STDX=SNGL(DSQRT(DVAR))
RETURN
END
*
COMPUTE NORMAL PROBABILITIES
CODY ALGORITHM USED
SOURCE: KENNEDY & GENTLE, "STATISTICAL COMPUTING", 1980
SUBROUTINE NPROB(SP,SXP)
IMPLICIT DOUBLE PRECISION (A-H)
IMPLICIT INTEGER*^) (I-N)
IMPLICIT REAL*4 (0-W)
DXP=DBLE(SXP)
IC0R=0
IF(DXP.GE.O) GOTO 200
IC0R=1
DXP=-DXP
200
CONTINUE
ARG=DXP/DSQRT(0.2D+1)
IF(DXP.GE.0.45875) GOTO 300
DP=(1.0D+0+ARG*D1(ARG))/2.0D+0
GOTO 900
300
IF(DXP.GE.4.0) GOTO 400
DP=(2.0D+0-DEXP(-ARG*ARG)*D2(ARG))/2.0D+0
GOTO 900
400
DP=(0.2D+1-(DEXP(-ARG*ARG)/ARG)*(1.0D+0/DSQRT(3.141 592653589798
&D+Û)+D3(1.OD+0/(ARG*ARG))/CARG*ARG)))/2.OD+0
256
900
950
CONTINUE
IFdCOR.NE.I ) GOTO
DP=1 .QD+G-DP
DXP=-DXP
CONTINUE
SP=SNGL(DP)
RETURN
END
950
C
DOUBLE PRECISION FUNCTION D1(F)
C
IMPLICIT DOUBLE PRECISION (A-H)
IMPLICIT INTEGER*') (I-N)
IMPLICIT REAL*4 (O-W)
DP0=2.4266795523053175D+2
DPI =2.1 9792616182941520+1
DP2=6.99638348861913550+0
DP3=-3.5609843701 Si 53850-2
DQ0=2.1505887586986120D+2
DQ1=9.1 16490540451 4901 D+1
DQ2=1.50827976304077870+1
DQ3=1.OD+0
ANUM=((DP3*F*F+DP2)*F*F+DP1)*F*F+DP0
DEN= ((DQ3*F*F+DQ2)*F*F+DQ1)*F*F+DQO
D1=ANUM/DEN
RETURN
END
C
DOUBLE PRECISION FUNCTION D2(F)
C
IMPLICIT DOUBLE PRECISION (A-H)
IMPLICIT INTEGER*4 (I-N)
IMPLICIT REAL*4 (O-W)
DP0=3.004592610201616005D+2
DPI=4.5191895371187294220+2
DP2=3.3932081 673-434368700+2
DP3=1.5298928504694040390+2
B4=4.3162227222056735300+1
55=7.2117582508830936590+0
B6=5.6419551 747897397110-1
B7=-1.3686485738271670670-7
DQ0=3.0045926095698329330+2
DQ1=7.909509253278980272D+2
DQ2=9.3135409485060962110+2
DQ3=6.3898026446563116650+2
G4=2.7758544474398764340+2
05=7.7000152935229472950+1
G6=l.2782727319629423510+1
G7=1.0D+0
ANUM=((((((B7*F+B6)*F+B5)*F+B4)*F+DP3)*F+DP2)*F+0P1)*F+DPO
257
DEN =((((((G7*F+G6)*F+G5)*F+G4)*F+DQ3)*F+DQ2)*F+DQ1)*F+DQO
D2=ANUM/DEN
RETURN
END
C
DOUBLE PRECISION FUNCTION D3(F)
C
IMPLICIT DOUBLE PRECISION (A-H)
IMPLICIT INTEGER*4 (I-N)
IMPLICIT REAL*4 (0-W)
DP0=-2.996107077035421 74D-3
DPI=-4.9473091 0623250734D-2
DP2=-2.26956593539686930D-1
DP3=-2.78661 3086096477880-1
B4=-2.231 924597341 846860-2
DQ0=1.062092305284679180-2
0Q1=1.913089261078298410-1
0Q2=1.0516751 07067932070+0
0Q3=1.987332018l 71 352560+0
04=1.00+0
ANUM=DP0+DP1 /(F*F)+0P2/(F*F*F*F)+0P3/(F*F*F*F*F*F)+B4/(F*F*F*F*
C
F*F*F*F)
DEN =DQ0+DQ1/(F*F)+0Q2/(F*F*F*F)+0Q3/(F*F*F*F*F*F)+G4/(F*F*F*F*
C
F*F*F*F)
03=ANUM/0EN
RETURN
END
C
C
C
COMPUTE NORMAL PERCENTILES
ODEH & EVANS ALGORITHM USED
SOURCE: KENNEDY & GENTLE, "STATISTICAL COMPUTING", 1980
SUBROUTINE NINV(S,SP)
IMPLICIT DOUBLE PRECISION (A-H)
IMPLICIT INTEGER*4 (I-N)
IMPLICIT REAL*4 (0-W)
D=DBLE(S)
DLIM=0.10-18
D0=-0.322232431088
D1=-1.0
D2=-0.342242088547
D3=-0.0204231210245
D4=-0.4536422101 480-4
C0=0.0993484626060
C1=0.588581570495
C2=0.531103462366
03=0.103537752850
C4=0.385607006340-2
258
DP=0.0D+0
IF(D.GT.0.5) THEN
D=1-D
IC0R=1
ELSE
ICOR=0
ENDIF
IF(D.LT.DLIM) GOTO 200
IF(D.EQ.0.5) GOTO 200
B=DSQRT(DL0G(1.0/(D*D)))
DP=B+((((B*Dil + D3)*B+D2)*B+D1 )*B+DO)/(( ( (B*C4+C3)*B+C2) *B+C1 )*B + CO)
IFdCOR.EQ.I ) THEN
D=l-D
ELSE
DP=-DP
CONTINUE
ENDIF
200
CONTINUE
SP=SNGL(DP)
RETURN
END
//G0.FT15F001 DD SYSOUT=(P,,P001)
//GO.SYSIN DD *
129
104
124
145
83
134
151
123
107
119
113
97
/*
Related documents