ST 524
One Sample Analysis
NCSU - Fall 2008
Example
An agricultural researcher plants twenty-five plots with a new variety of corn. Observed
yields at harvest are presented next.
plot
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
yield
143.219
150.563
143.966
169.428
157.639
159.884
156.655
136.115
137.195
142.593
155.402
144.999
140.924
157.058
145.634
146.801
152.344
144.392
154.599
132.021
160.792
161.992
138.577
157.531
141.047
_TYPE_
_VALUE_
N
25.000
MIN
132.021
Q1
142.593
MEAN
149.255
MEDIAN
146.801
Q3
157.058
MAX
169.428
VAR
90.953411
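A minimal SAS sketch of how these data might be read in and the summary statistics reproduced; the dataset name corn and the use of PROC MEANS are assumptions, not taken from the handout.

data corn;
   input plot yield @@;            /* @@ reads several plots per line */
   datalines;
 1 143.219  2 150.563  3 143.966  4 169.428  5 157.639
 6 159.884  7 156.655  8 136.115  9 137.195 10 142.593
11 155.402 12 144.999 13 140.924 14 157.058 15 145.634
16 146.801 17 152.344 18 144.392 19 154.599 20 132.021
21 160.792 22 161.992 23 138.577 24 157.531 25 141.047
;
run;

proc means data=corn n min q1 mean median q3 max var;
   var yield;                      /* reproduces N, MIN, Q1, MEAN, MEDIAN, Q3, MAX, VAR above */
run;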
Statistical Linear Model
$y_j = \mu + e_j,\quad j = 1,\ldots,25$, where $e_j \sim N(0,\sigma^2)$, and hence $y_j \sim N(\mu,\sigma^2)$.

Matrix representation of the data according to the linear model:

$$
\begin{bmatrix} 143.219 \\ 150.563 \\ \vdots \\ 157.531 \\ 141.047 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \\ 1 \end{bmatrix}\,[\mu]
+
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_{24} \\ e_{25} \end{bmatrix}
=
\begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \\ \mu \end{bmatrix}
+
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_{24} \\ e_{25} \end{bmatrix}
$$

$$Y = X\beta + e$$
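Since X here is a single column of ones, the least-squares (and maximum likelihood) estimator of the one fixed-effect parameter reduces to the sample mean; a short check:

$$
\hat{\beta} = (X'X)^{-1}X'Y = \frac{1}{n}\sum_{j=1}^{n} y_j = \bar{y} = 149.255 .
$$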
Log likelihood of the sample
$$
\log L\!\left(y, \mu, \sigma^2\right) = \sum_{j=1}^{n} \log f_j\!\left(y_j, \mu, \sigma^2\right)
= \sum_{j=1}^{n} \log\!\left[\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{\left(y_j-\mu\right)^2}{2\sigma^2}\right)\right]
$$
The -2 log likelihood of a random sample of n = 25 observations from a normal distribution with mean $\mu$ and variance $\sigma^2$ is given by

$$
-2\log L = n\ln(2\pi) + n\ln(\sigma^2) + \frac{1}{\sigma^2}\sum_{i=1}^{n}\left(y_i-\mu\right)^2 ,
$$

where $\mu$ and $\sigma^2$ are the parameters of the normal distribution.
Note that $-2\log L$ is minimized when

$$
\hat{\mu} = \bar{y} = 149.255, \qquad
\hat{\sigma}^2 = s^2_{ML} = \frac{\sum_{j=1}^{n}\left(y_j-\bar{y}\right)^2}{n} = 87.31527 ,
$$

but $s^2_{ML}$ is a biased estimator that tends to underestimate the variance.
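The size of the bias follows from the relation between the ML estimator and the unbiased sample variance; a quick check with the values reported here:

$$
s^2_{ML} = \frac{n-1}{n}\, s^2 = \frac{24}{25}\times 90.953411 \approx 87.3153 .
$$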
Restricted Maximum Likelihood (REML)
Since the parameters are unknown, Restricted Maximum Likelihood (REML) estimation is based on the residuals from the linear model.
The -2 log likelihood for the residuals, as computed by PROC MIXED, is given by

$$
-2\log L\!\left(\sigma^2;\, y_i - \bar{y}\right) = (n-1)\ln(\sigma^2) + \frac{1}{\sigma^2}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2 + (n-1)\ln(2\pi) + \ln(n) .
$$

And the solution is the REML estimate

$$
\hat{\sigma}^2 = (n-1)^{-1}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2 ,
$$

which is the unbiased estimate of $\sigma^2$: $\hat{\sigma}^2 = s^2 = 90.953411$.
The REML estimate of $\sigma^2$ corresponds to the sample variance, the unbiased estimate of $\sigma^2$. REML estimation does not provide estimates of the fixed-effect parameters ($\mu$); it yields only estimates of the covariance parameters (Littell et al., 2006). Unbiased estimates of these unknown parameters are

$$
\hat{\mu} = \bar{y} = 149.255, \qquad \hat{\sigma}^2 = s^2 = 90.953411 .
$$
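Using the error sum of squares reported in the analysis-of-variance table below, a quick arithmetic check of this REML estimate:

$$
\hat{\sigma}^2 = \frac{2182.8819}{25-1} = 90.9534 .
$$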
From the data, the sample estimates under this model are the predicted value $\hat{y}_j = \bar{y} = 149.255$ and the residual $y_j - \bar{y} = \left(y_j - 149.255\right)$.

The -2 residual log likelihood is then estimated as
$$
\begin{aligned}
-2\,\mathrm{Res}\log L\!\left(\sigma^2;\, y_i-\bar{y}\right)
&= (n-1)\ln(\sigma^2) + \frac{1}{\sigma^2}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2 + (n-1)\ln(2\pi) + \ln(n) \\
&= (25-1)\ln(90.9534) + \frac{1}{90.9534}\,(2182.8819) + (24)\ln(2\pi) + \ln(25) \\
&= 179.58
\end{aligned}
$$
Analysis of Variance

The GLM Procedure
Dependent Variable: yield

Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                1       556924.8831    556924.8831    6123.19    <.0001
Error               24         2182.8819        90.9534
Uncorrected Total   25       559107.7649

Here the Model sum of squares is $n\,\bar{y}^{\,2} = 25 \times 149.2548^2$, the Uncorrected Total is $\sum_{j=1}^{n} y_j^2 = 143.219^2 + \cdots + 141.047^2$, the Error sum of squares is $\sum_{j=1}^{n}\left(y_j-\bar{y}\right)^2$, and the Error mean square is $\hat{\sigma}^2 = (n-1)^{-1}\sum_{j=1}^{n}\left(y_j-\bar{y}\right)^2$.

R-Square    Coeff Var    Root MSE    yield Mean
0.000000     6.389711    9.536950      149.2548
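A sketch of a GLM call that could produce output like the table above (that an empty effect list was used in the original handout is an assumption); with no effects listed, the model contains only the intercept $\mu$, whose sum of squares $n\bar{y}^{\,2}$ appears on the Model line.

proc glm data=corn;
   model yield = ;     /* intercept-only model: yield_j = mu + e_j */
run;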
Results from PROC MIXED
Type 3 Analysis of Variance

Source      DF    Sum of Squares    Mean Square    Error DF    F Value    Pr > F
Residual    24       2182.881866      90.953411           .          .         .
Covariance Parameter Estimates

Cov Parm      Estimate
Residual       90.9534
Fit Statistics

-2 Res Log Likelihood       179.6
AIC (smaller is better)     181.6
AICC (smaller is better)    181.8
BIC (smaller is better)     182.8
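A sketch of PROC MIXED calls that could reproduce the tables above; REML is the default estimation method and gives the covariance parameter estimate and the fit statistics, while the use of method=type3 for the Type 3 analysis-of-variance table is an assumption.

proc mixed data=corn method=reml;
   model yield = / solution;   /* intercept-only model; REML estimate of sigma^2 */
run;

proc mixed data=corn method=type3;
   model yield = / solution;   /* assumed call behind the Type 3 ANOVA table */
run;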
Fit Statistics

Fit statistics are useful when comparing two different covariance structures in a mixed model with common fixed effects; select the model with the lower AIC value. If the fixed effects are not the same in both models, use Maximum Likelihood (ML) estimation instead of Restricted Maximum Likelihood (REML); see the sketch after the formulas below.
Akaike's Information Criterion (AIC), where p is the number of parameters in the covariance structure model:

$$
\mathrm{AIC} = -2\,\mathrm{Res}\log L + 2p = 179.6 + 2(1) = 181.6
$$

Schwarz's Bayesian Criterion (BIC):

$$
\mathrm{BIC} = -2\,\mathrm{Res}\log L + p \times \log(n-1) = 179.6 + 1 \times \log(25-1) = 182.8
$$

Akaike's Information Criterion corrected for small samples (AICC):

$$
\mathrm{AICC} = -2\,\mathrm{Res}\log L + 2p\left[\frac{n-1}{(n-1)-p-1}\right] = 179.6 + 2 \times 1 \times \left(\frac{24}{24-1-1}\right) = 181.8
$$
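As noted above, when candidate models differ in their fixed effects the comparison should be based on ML rather than REML fits; a minimal sketch of how an ML fit could be requested (dataset name corn assumed as before):

proc mixed data=corn method=ml;
   model yield = / solution;   /* ML fit, for comparing models with different fixed effects */
run;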
Residual analysis

Residuals, $y_j - \hat{y}_j$, are used to check the validity of model assumptions and to identify outliers and potentially influential observations.

Raw residuals are the difference between the observation and the estimated (marginal) mean.

(Internal) Studentized residual: the residual divided by the estimated value of its standard deviation, $e_j / \hat{\upsilon}_j$. It is called internal because the residual contributes to the estimation of its own standard error.
Studentized Residuals
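A sketch of how the studentized residuals could be computed and plotted in SAS; the output dataset name and the use of PROC SGPLOT are assumptions.

proc glm data=corn;
   model yield = ;
   output out=resids p=yhat r=resid student=stud_resid;   /* raw and studentized residuals */
run;

proc sgplot data=resids;
   scatter x=plot y=stud_resid;   /* studentized residuals by plot number */
   refline 0 / axis=y;
run;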