ST 524 One Sample Analysis                                NCSU - Fall 2008
Tuesday August 26, 2008

Example

An agricultural researcher plants twenty-five plots with a new variety of corn. Observed yields at harvest are presented next.

plot   yield      plot   yield      plot   yield      plot   yield      plot   yield
  1   143.219       6   159.884      11   155.402      16   146.801      21   160.792
  2   150.563       7   156.655      12   144.999      17   152.344      22   161.992
  3   143.966       8   136.115      13   140.924      18   144.392      23   138.577
  4   169.428       9   137.195      14   157.058      19   154.599      24   157.531
  5   157.639      10   142.593      15   145.634      20   132.021      25   141.047

_TYPE_     _VALUE_
N           25.000
MIN        132.021
Q1         142.593
MEAN       149.255
MEDIAN     146.801
Q3         157.058
MAX        169.428
VAR         90.953411

Statistical Linear Model

$$y_j = \mu + e_j, \quad j = 1, \ldots, 25,$$

where $e_j \sim N(0, \sigma^2)$, and hence $y_j \sim N(\mu, \sigma^2)$.

Matrix representation of the data according to the linear model:

$$
\begin{bmatrix} 143.219 \\ 150.563 \\ \vdots \\ 157.531 \\ 141.047 \end{bmatrix}
=
\begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \\ \mu \end{bmatrix}
+
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_{24} \\ e_{25} \end{bmatrix}
=
\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \\ 1 \end{bmatrix} [\mu]
+
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_{24} \\ e_{25} \end{bmatrix},
\qquad Y = X\beta + e
$$

Log likelihood of the sample

$$
\log L(y, \mu, \sigma^2) = \sum_{j=1}^{n} \log f_j(y_j, \mu, \sigma^2)
= \sum_{j=1}^{n} \log \left[ \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y_j - \mu)^2}{2\sigma^2} \right) \right]
$$

The -2 log likelihood of a random sample of n = 25 observations from a normal distribution with mean $\mu$ and variance $\sigma^2$ is given by

$$
-2 \log L = n \ln(2\pi) + n \ln(\sigma^2) + \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2
$$

with $\mu$ and $\sigma^2$ the parameters of the normal distribution. Note that -2 log L is minimized when

$$
\hat{\mu} = \bar{y} = 149.255, \qquad
s^2_{ML} = \frac{\sum_j (y_j - \bar{y})^2}{n} = 87.31527,
$$

but $s^2_{ML}$ is a biased estimate that tends to underestimate the variance.

Restricted Maximum Likelihood (REML)

Since the parameters are unknown, Restricted Maximum Likelihood (REML) estimation is based on the residuals from the linear model.
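The maximum likelihood and REML variance estimates above can be checked directly from the data. The following sketch (plain Python, no libraries) recomputes $\bar{y}$, the biased ML variance, and the unbiased sample variance:

```python
# Sketch: verify the ML and REML (sample) variance estimates for the
# 25 corn-yield observations listed in the example.
yields = [143.219, 150.563, 143.966, 169.428, 157.639,
          159.884, 156.655, 136.115, 137.195, 142.593,
          155.402, 144.999, 140.924, 157.058, 145.634,
          146.801, 152.344, 144.392, 154.599, 132.021,
          160.792, 161.992, 138.577, 157.531, 141.047]

n = len(yields)
mean = sum(yields) / n                       # mu-hat = y-bar, about 149.2548
ss = sum((y - mean) ** 2 for y in yields)    # residual sum of squares, about 2182.88

var_ml = ss / n          # biased ML estimate, about 87.315 (underestimates sigma^2)
var_reml = ss / (n - 1)  # unbiased REML estimate = sample variance s^2, about 90.9534

print(mean, var_ml, var_reml)
```

Dividing the same sum of squares by $n$ versus $n-1$ is exactly the ML/REML distinction discussed in the text.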
The -2 log likelihood for the residuals is given by

$$
-2 \log L\left(\sigma^2; \{y_i - \bar{y}\}\right)
= (n-1)\ln(\sigma^2) + \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \bar{y})^2 + (n-1)\ln(2\pi) + \ln(n)
$$

(the $(n-1)\ln(\sigma^2) + \ln(n)$ terms arise from $\log|V| + \log|X'V^{-1}X| = n\ln(\sigma^2) + \ln(n/\sigma^2)$ in the REML likelihood). The solution is the REML estimate

$$
\hat{\sigma}^2 = (n-1)^{-1} \sum_{i=1}^{n} (y_i - \bar{y})^2,
$$

which is the unbiased estimate of $\sigma^2$: $\hat{\sigma}^2 = s^2 = 90.953411$. The REML estimate of $\sigma^2$ corresponds to the sample variance, the unbiased estimate of $\sigma^2$. REML estimation does not provide estimates of the fixed-effect parameters ($\mu$); it yields only estimates of the covariance parameters (Littell et al., 2006). Unbiased estimates of the unknown parameters are

$$
\hat{\mu} = \bar{y} = 149.255, \qquad \hat{\sigma}^2 = s^2 = 90.953411.
$$

From the data, the sample estimate of the model gives predicted value $\hat{y}_j = \bar{y} = 149.255$ and residual $y_j - \bar{y} = (y_j - 149.255)$.

The -2 residual log likelihood is then estimated by

$$
-2\,\mathrm{Res}\log L
= (25-1)\ln(90.9534) + \frac{2182.8819}{90.9534} + (25-1)\ln(2\pi) + \ln(25)
= 179.58
$$

Analysis of Variance

The GLM Procedure
Dependent Variable: yield

Source              DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                1      556924.8831   556924.8831   6123.19   <.0001
Error               24        2182.8819       90.9534
Uncorrected Total   25      559107.7649

Here the Model sum of squares is $n\bar{y}^2 = 25 \times 149.2548^2$, the Uncorrected Total is $\sum_j y_j^2 = 143.219^2 + \cdots + 141.047^2$, the Error sum of squares is $\sum_i (y_i - \bar{y})^2$, and the Error mean square is $\hat{\sigma}^2 = (n-1)^{-1}\sum_i (y_i - \bar{y})^2$.

R-Square   Coeff Var   Root MSE   yield Mean
0.000000   6.389711    9.536950     149.2548

Results from PROC MIXED

Type 3 Analysis of Variance
Source     DF   Sum of Squares   Mean Square   Error DF   F Value   Pr > F
Residual   24      2182.881866     90.953411          .         .        .
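The value 179.58 can be reproduced numerically. The sketch below assumes the standard REML likelihood for this model, with $V = \sigma^2 I$ and $X = \mathbf{1}_n$ (so $\log|X'V^{-1}X| = \ln(n/\sigma^2)$), using only the sum of squares from the ANOVA table:

```python
import math

# Sketch: reproduce PROC MIXED's "-2 Res Log Likelihood" for the one-sample
# model, assuming the standard REML likelihood with V = sigma^2 * I and X = 1_n.
n = 25
ss = 2182.881866       # residual sum of squares from the ANOVA table
sigma2 = ss / (n - 1)  # REML estimate, 90.953411

neg2_res_logl = (n * math.log(sigma2)                 # log|V|
                 + math.log(n / sigma2)               # log|X'V^-1 X|
                 + ss / sigma2                        # r'V^-1 r = n - 1
                 + (n - 1) * math.log(2 * math.pi))   # (n - p) log(2*pi)

print(round(neg2_res_logl, 2))  # 179.58
```

Collecting terms, $n\ln\sigma^2 + \ln(n/\sigma^2) = (n-1)\ln\sigma^2 + \ln n$, which matches the formula in the text.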
Covariance Parameter Estimates
Cov Parm    Estimate
Residual     90.9534

Fit Statistics
-2 Res Log Likelihood      179.6
AIC (smaller is better)    181.6
AICC (smaller is better)   181.8
BIC (smaller is better)    182.8

Fit Statistics

These criteria are useful when comparing two different covariance structures in a mixed model with common fixed effects: select the model with the lower AIC value. If the fixed effects are not the same in both models, use Maximum Likelihood estimation instead of Restricted Maximum Likelihood (REML). In the formulas below, $p$ is the number of parameters in the covariance structure model.

Akaike's Information Criterion

$$
\mathrm{AIC} = -2\,\mathrm{Res}\log L + 2p = 179.6 + 2(1) = 181.6
$$

Akaike's Information Criterion corrected for small samples

$$
\mathrm{AICC} = -2\,\mathrm{Res}\log L + 2p \left[ \frac{n-1}{(n-1)-p-1} \right]
= 179.6 + 2 \times 1 \times \left( \frac{24}{24-1-1} \right) = 181.8
$$

Schwarz Bayesian Criterion

$$
\mathrm{BIC} = -2\,\mathrm{Res}\log L + p \ln(n-1) = 179.6 + 1 \times \ln(25-1) = 182.8
$$

Residual analysis

Residuals, $y_j - \hat{y}_j$, are used to check the validity of model assumptions and to identify outliers and potentially influential observations. Raw residuals are the difference between the observation and the estimated (marginal) mean. The (internal) studentized residual is the residual divided by the estimated value of its standard deviation, $e_j / \sqrt{\hat{v}_j}$, where $\hat{v}_j$ is the estimated variance of $e_j$. It is called internal because the residual itself contributes to the estimation of its standard error.

[Figure: plot of studentized residuals]
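The three information criteria in the Fit Statistics table follow directly from $-2\,\mathrm{Res}\log L = 179.6$ with a single covariance parameter. A short sketch of that arithmetic:

```python
import math

# Sketch: recompute the PROC MIXED fit statistics from -2 Res Log L = 179.6,
# with p = 1 covariance parameter (the residual variance) and n = 25.
neg2_res_logl = 179.6
p, n = 1, 25

aic = neg2_res_logl + 2 * p                                 # 181.6
aicc = neg2_res_logl + 2 * p * (n - 1) / ((n - 1) - p - 1)  # about 181.8
bic = neg2_res_logl + p * math.log(n - 1)                   # about 182.8

print(round(aic, 1), round(aicc, 1), round(bic, 1))  # 181.6 181.8 182.8
```

Note that the "log" in the BIC formula is the natural logarithm; with base-10 logs the value 182.8 would not be reproduced.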