I. Paired Sample Design: 30 pts
Download the data set bph-samp.sav and use SPSS to complete the following calculations:

1) Run a paired t test to compare if there is a mean change in QoL at the baseline and at 3 months.
H0: µD = 0
Ha: µD ≠ 0

Paired Samples Statistics
                      Mean     N    Std. Deviation   Std. Error Mean
Pair 1   qol_base     3.8000   10   1.13529          .35901
         qol_3mo      2.1000   10   1.28668          .40689

Paired Samples Correlations
                               N    Correlation   Sig.
Pair 1   qol_base & qol_3mo    10   .243          .498

Paired Samples Test
                              Paired Differences
                              Mean      Std. Deviation   Std. Error Mean   95% CI of the Difference   t       df   Sig. (2-tailed)
Pair 1   qol_base - qol_3mo   1.70000   1.49443          .47258            (.63095, 2.76905)          3.597   9    .006

Since the p-value (.006) is smaller than 0.05, we reject the null hypothesis. We have sufficient evidence to conclude that the mean QoL score changes between the baseline and 3 months: on average, QoL at 3 months is 1.70 points lower than at the baseline.

2) Run a one-sample t test to test the same hypothesis as in (1) but on the variable DELTA.
H0: µDelta = 0
Ha: µDelta ≠ 0

One-Sample Statistics
           N    Mean     Std. Deviation   Std. Error Mean
delt_qol   10   1.7000   1.49443          .47258

One-Sample Test (Test Value = 0)
           t       df   Sig. (2-tailed)   Mean Difference   95% CI of the Difference
delt_qol   3.597   9    .006              1.70000           (.6309, 2.7691)

Since the p-value (.006) is smaller than 0.05, we reject the null hypothesis. We have sufficient evidence to conclude that the mean delta is different from zero. The two methods give the same result, because a paired t test is simply a one-sample t test on the within-subject differences.

3) Check if the QoL at the baseline and at 3 months follow Normal distributions.
H0: QoL at the baseline is normally distributed
Ha: QoL at the baseline is not normally distributed

[Histogram and Normal Q-Q Plot of qol_base (Mean = 3.8, Std. Dev. = 1.135, N = 10)]

Tests of Normality
           Kolmogorov-Smirnov(a)        Shapiro-Wilk
           Statistic   df   Sig.        Statistic   df   Sig.
qol_base   .230        10   .143        .933        10   .479
a. Lilliefors Significance Correction

Since both p-values (.143 and .479) are greater than 0.05, we fail to reject the null hypothesis. So QoL at the baseline can be assumed normal.

H0: QoL at 3 months is normally distributed
Ha: QoL at 3 months is not normally distributed

[Histogram and Normal Q-Q Plot of qol_3mo (Mean = 2.1, Std. Dev. = 1.287, N = 10)]

Tests of Normality
           Kolmogorov-Smirnov(a)        Shapiro-Wilk
           Statistic   df   Sig.        Statistic   df   Sig.
qol_3mo    .231        10   .139        .824        10   .028
a. Lilliefors Significance Correction

According to the Kolmogorov-Smirnov test (p = .139 > 0.05), we fail to reject the null hypothesis, so QoL at 3 months can be assumed normal. Note, however, that the data do not pass the Shapiro-Wilk test (p = .028 < 0.05). With only 10 observations, the sample is too small to decide normality reliably, so we may want to use a nonparametric test.

4) If you have a concern about the small sample size and perhaps non-normal data, choose an appropriate nonparametric test to compare the median QoL score at the baseline and at 3 months. (Hint: You need to research nonparametric tests.)

Test Statistics (b. Wilcoxon Signed Ranks Test)
                         qol_3mo - qol_base
Z                        -2.448(a)
Asymp. Sig. (2-tailed)   .014
a. Based on positive ranks.

Test Statistics (b. Sign Test)
                         qol_3mo - qol_base
Exact Sig. (2-tailed)    .039(a)
a. Binomial distribution used.

Both the Wilcoxon Signed Ranks Test (p = .014) and the Sign Test (p = .039) indicate that the median QoL scores at the baseline and at 3 months are different. The Sign Test may be more suitable for this case, because the Wilcoxon test assumes the differences are symmetrically distributed and the histograms do not look symmetric.

II. Independent Sample Design: 30 pts
Download the data set lactation.sav and use SPSS to complete the following calculations.

1) Produce a side-by-side boxplot for the percentage of bone loss in the breast feeding group vs. the control group.

[Side-by-side boxplots of percentc by group: breast-feeding, control]

2) Produce a side-by-side histogram for the percentage of bone loss in the breast feeding group vs. the control group. (Hint: You need to search for SPSS functions not covered in the lecture to produce the histograms side-by-side.)

[Side-by-side histograms of Percent bone loss (y-axis: Percent); panels: control, breast-feeding]

3) Test if the mean percentage of bone loss in the two groups are the same using the right version of the t test based on the SPSS output.
H0: µ1 = µ2
Ha: µ1 ≠ µ2

Group Statistics (Percent bone loss, Ref = Laskey)
                 N    Mean     Std. Deviation   Std. Error Mean
control          22   .309     1.2983           .2768
breast-feeding   47   -3.587   2.5056           .3655

Independent Samples Test (Percent bone loss)
Levene's Test for Equality of Variances: F = 11.255, Sig. = .001

t-test for Equality of Means
                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference
Equal variances assumed       6.857   67       .000              3.8963            .5682                   (2.7621, 5.0305)
Equal variances not assumed   8.499   66.197   .000              3.8963            .4585                   (2.9810, 4.8116)

Levene's test (p = .001) rejects equal variances, so the right version of the t test is the "Equal variances not assumed" row: t = 8.499, df = 66.197, p < .001. Since the p-value is very small, we reject the null hypothesis of equal means. We have sufficient evidence to conclude that the control and the breast-feeding groups have different mean percentages of bone loss.

4) Calculate a 95% confidence interval for the mean difference in the percentage of bone loss between the two groups.
Because the variances are unequal, the 95% confidence interval for the mean difference (control minus breast-feeding) is taken from the "Equal variances not assumed" row of the previous table: (2.9810, 4.8116).

III. Cross-Tabulation: 30 pts
Download the data set bd1.sav and use SPSS to complete the following calculations.

1) Test the association between esophageal cancer and alcohol consumption (using the original alcohol consumption variable). Write down the hypotheses, the test used, the p-value and the interpretation.
H0: There is no association between esophageal cancer and alcohol consumption
Ha: There is an association between esophageal cancer and alcohol consumption
Test used: Pearson chi-square test of independence

Chi-Square Tests
                               Value        df   Asymp. Sig. (2-sided)
Pearson Chi-Square             158.955(a)   3    .000
Likelihood Ratio               146.498      3    .000
Linear-by-Linear Association   152.974      1    .000
N of Valid Cases               975
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 13.74.

Since the p-value (.000, i.e. p < .001) is very small (< 0.05), we reject the null hypothesis. We have sufficient evidence to conclude that there is an association between esophageal cancer and alcohol consumption.

2) Is there a need to use Fisher's Exact test? Why?

Esophageal cancer * Alcohol consumption Crosstabulation (Count)
                               Alcohol consumption
                               0-39 gm/day   40-79 gm/day   80-119 gm/day   120+ gm/day   Total
Esophageal cancer   case       29            75             51              45            200
                    control    386           280            87              22            775
Total                          415           355            138             67            975

Fisher's Exact test is not necessary: no cell has an expected count below 5 (the minimum expected count is 13.74), so the chi-square approximation is adequate.

3) Test the association between esophageal cancer and alcohol consumption (using the dichotomized alcohol consumption variable). Write down the hypotheses, the test used, the p-value and the interpretation.
H0: There is no association between esophageal cancer and dichotomized alcohol consumption
Ha: There is an association between esophageal cancer and dichotomized alcohol consumption
Test used: Pearson chi-square test of independence

Esophageal cancer * Alcohol dichotomized Crosstabulation (Count)
                               Alcohol dichotomized
                               80+ gms/day   0-79 gms/day   Total
Esophageal cancer   case       96            104            200
                    control    109           666            775
Total                          205           770            975

Chi-Square Tests
                               Value        df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             110.255(b)   1    .000
Continuity Correction(a)       108.221      1    .000
Likelihood Ratio               96.433       1    .000
Fisher's Exact Test                                                      .000                   .000
Linear-by-Linear Association   110.142      1    .000
N of Valid Cases               975
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 42.05.

Since the p-value (.000, i.e. p < .001) is very small (< 0.05), we reject the null hypothesis. We have sufficient evidence to conclude that there is an association between esophageal cancer and dichotomized alcohol consumption.

4) Hand calculate a 95% CI for the odds ratio estimate in (3).
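As a cross-check on the hand calculation below, the Woolf (log-scale) interval for the odds ratio and the Pearson chi-square statistics can be computed directly from the cell counts in the crosstabulations above. This is a minimal stdlib-only Python sketch; the function names are our own, and z = 1.96 is the usual normal quantile for 95% confidence.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio a*d / (b*c) with a Woolf (log-scale) confidence interval.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    or_hat = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(or_hat) - z * se)
    upper = math.exp(math.log(or_hat) + z * se)
    return or_hat, se, lower, upper


def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Counts from the crosstabulations (rows: case, control)
original = [[29, 75, 51, 45], [386, 280, 87, 22]]  # 0-39, 40-79, 80-119, 120+ gm/day
dichotomized = [[96, 104], [109, 666]]             # 80+, 0-79 gms/day

print(f"Chi-square (4 alcohol levels): {pearson_chi_square(original):.3f}")
print(f"Chi-square (dichotomized):     {pearson_chi_square(dichotomized):.3f}")

or_hat, se, lower, upper = odds_ratio_ci(96, 104, 109, 666)
print(f"OR = {or_hat:.3f}, SE(ln OR) = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Run as-is, it should agree with the SPSS output up to rounding (chi-square about 158.95 and 110.26) and with the hand calculation (OR 5.640, CI (4.00, 7.95)).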
a = 96, b = 104, c = 109, d = 666 (a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls; exposed = 80+ gms/day)
OR = (a*d)/(b*c) = (96*666)/(104*109) = 5.640
SE of ln(OR) = sqrt(1/a + 1/b + 1/c + 1/d) = sqrt(1/96 + 1/104 + 1/109 + 1/666) = 0.175
Upper = exp(ln(OR) + 1.96*SE) = 7.95
Lower = exp(ln(OR) - 1.96*SE) = 4.00
CI: (4.00, 7.95)

IV. ANOVA: 30 pts
Complete Exercise 13.9 items (a) through (d) found on pages 292 and 293 in the textbook. Apply the nonparametric ANOVA analysis to test if the groups are different in the decrease of body temperatures.

A trial evaluated the fever-reducing effects of three substances. Study subjects were adults seen in an emergency room with a diagnosis of flu and body temperatures between 100.0 and 100.9 degrees F. The three treatments (aspirin, ibuprofen, and acetaminophen) were assigned randomly to study subjects. Body temperatures were reevaluated 2 hours after administration of treatments. Table 13.14 lists the data.
a) Explore these data with side-by-side boxplots. Discuss your findings.
b) Calculate the mean and standard deviation of each group.
c) Complete an ANOVA for the problem. What do you conclude?
d) Conduct post hoc comparisons with the LSD method. Which groups differ significantly at α = 0.05?

Table 13.14 Data for Exercise 13.19. Decreases in body temperature (degrees Fahrenheit)
Group 1 (aspirin):     0.95  1.48  1.33  1.28
Group 2 (ibuprofen):   0.39  0.44  1.31  2.48  1.39
Group 3 (acetamin):    0.19  1.02  0.07  0.01  0.62  -0.39

a)
[Side-by-side boxplots of the temperature decrease by Group: aspirin, ibuprofen, acetamin]

Looking at the boxplots, aspirin produced larger temperature decreases than acetaminophen. Ibuprofen gave the largest decreases for some patients, but its results are much less consistent than those of the other two treatments.

b)
Descriptives (Temp_Decrease)
            N    Mean     Std. Deviation   Std. Error   95% CI for Mean    Minimum   Maximum
aspirin     4    1.2600   .22346           .11173       (.9044, 1.6156)    .95       1.48
ibuprofen   5    1.2020   .85444           .38212       (.1411, 2.2629)    .39       2.48
acetamin    6    .2533    .49657           .20273       (-.2678, .7745)    -.39      1.02
Total       15   .8380    .74301           .19184       (.4265, 1.2495)    -.39      2.48

c) As instructed, the nonparametric (Kruskal-Wallis) version of the ANOVA was used.

Ranks (Temp_Decrease)
Group       N    Mean Rank
aspirin     4    11.00
ibuprofen   5    10.00
acetamin    6    4.33
Total       15

Test Statistics(a,b)
              Temp_Decrease
Chi-Square    6.833
df            2
Asymp. Sig.   .033
a. Kruskal Wallis Test
b. Grouping Variable: Group

According to the Kruskal-Wallis test, the p-value is .033, which is less than 0.05. So we can conclude that aspirin, ibuprofen, and acetaminophen do not all have the same effect on reducing temperature; acetaminophen has by far the lowest mean rank.

d)
Multiple Comparisons
Dependent Variable: Temp_Decrease, LSD
(I) Group   (J) Group   Mean Difference (I-J)   Std. Error   Sig.   95% Confidence Interval
aspirin     ibuprofen   .05800                  .40170       .888   (-.8172, .9332)
aspirin     acetamin    1.00667*                .38654       .023   (.1645, 1.8489)
ibuprofen   aspirin     -.05800                 .40170       .888   (-.9332, .8172)
ibuprofen   acetamin    .94867*                 .36260       .023   (.1586, 1.7387)
acetamin    aspirin     -1.00667*               .38654       .023   (-1.8489, -.1645)
acetamin    ibuprofen   -.94867*                .36260       .023   (-1.7387, -.1586)
*. The mean difference is significant at the .05 level.

Post hoc analysis shows that acetaminophen has a significantly smaller mean temperature decrease than both aspirin (p = .023) and ibuprofen (p = .023), while aspirin and ibuprofen do not differ significantly (p = .888).

V. Multiple Linear Regression: 30 pts
Download the data set hdur.sav and use SPSS to complete the following calculations.
1) Fit a multiple linear regression model for predicting the hospital duration using age, sex, body temperature, white blood cell counts, antibiotic use, blood culture and service (medication vs. surgery). Use a cut-off value of 0.10 to assess the significance of the predictors. Identify all significant predictors.

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .643(a)   .414       .172                5.201
a. Predictors: (Constant), serv, Antibiotic use, Blood culture, Body temp, sex, age, White blood cell count taken

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   324.194          7    46.313        1.712   .172(a)
Residual     459.806          17   27.047
Total        784.000          24
a. Predictors: (Constant), serv, Antibiotic use, Blood culture, Body temp, sex, age, White blood cell count taken
b. Dependent Variable: Duration of hospitalization

Coefficients(a)
                               Unstandardized Coefficients   Standardized Coefficients
Model 1                        B          Std. Error         Beta                        t        Sig.
(Constant)                     -303.218   179.431                                        -1.690   .109
age                            .093       .067               .328                        1.390    .183
sex                            -1.195     2.502              -.106                       -.478    .639
Body temp                      3.307      1.827              .394                        1.810    .088
White blood cell count taken   -.168      .463               -.094                       -.363    .721
Antibiotic use                 -3.449     2.675              -.277                       -1.289   .215
Blood culture                  -1.645     2.829              -.125                       -.581    .569
serv                           -3.122     2.755              -.268                       -1.133   .273
a. Dependent Variable: Duration of hospitalization

Looking at the model, the only significant predictor at the 0.10 level is Body temp (p = .088 < .10).

2) Assess the model fit for the multiple linear regression model using appropriate statistics and graphics.
R Square (.414) means the model explains 41.4% of the variation in the duration of hospitalization, but after adjusting for seven predictors fitted to only 25 patients, the Adjusted R Square drops to .172, which is very low. So the model is not a good fit. The overall ANOVA F test (F = 1.712 on 7 and 17 df) gives a p-value of .172, which is greater than .10; this also indicates that the predictors together do not explain a significant amount of the variation.

3) Assess the assumptions of linear regression in this data using appropriate statistics and graphics.

[Normal P-P Plot of Regression Standardized Residual, Dependent Variable: Duration of hospitalization (x-axis: Observed Cum Prob, y-axis: Expected Cum Prob)]

The points lie close to the line, so the normality assumption seems to be valid.
[Scatterplot, Dependent Variable: Duration of hospitalization (x-axis: Regression Standardized Residual, y-axis: Regression Standardized Predicted Value)]

The spread of the errors seems to be increasing, so the assumption of homoscedasticity (constant variance) may be violated.
Independence of the observations (errors) is also an important assumption. In addition, the body measurements used as predictors are more or less dependent on each other; strictly speaking, this is a multicollinearity concern (correlated predictors) rather than a violation of the independence assumption, which concerns the errors. Also, the relationship between duration and predictors such as age and the body measurements may not actually be linear.

4) Identify outliers and influential observations using appropriate statistics that can be generated in SPSS. (Hint: You need to do your own research because this is not covered in the textbook or lectures.)

[Scatterplot, Dependent Variable: Duration of hospitalization (x-axis: Regression Standardized Predicted Value, y-axis: Regression Studentized Residual)]

Looking at the studentized residuals, there are several potential outliers in the data.
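The answer above looks at studentized residuals only; influence is usually judged with leverage and Cook's distance as well, and SPSS's Linear Regression Save options include studentized residuals, leverage values and Cook's distances. Because hdur.sav is not reproduced here, the NumPy sketch below runs on small made-up data (not the assignment data) just to show how these quantities come out of the hat matrix. The flagging cut-offs (|r| > 2, leverage > 2p/n, Cook's D > 4/n) are common rules of thumb, assumed here rather than taken from the course.

```python
import numpy as np


def influence_measures(X, y):
    """Leverage, internally studentized residuals and Cook's distance for OLS.

    X is an n x k array of predictors without an intercept column; an
    intercept is added here, so the fitted model has p = k + 1 parameters.
    """
    n = X.shape[0]
    X1 = np.column_stack([np.ones(n), X])
    p = X1.shape[1]
    # Hat matrix H = X1 (X1'X1)^-1 X1'; its diagonal holds the leverages h_ii
    H = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
    h = np.diag(H)
    resid = y - H @ y                     # OLS residuals
    s2 = resid @ resid / (n - p)          # residual mean square (MSE)
    student = resid / np.sqrt(s2 * (1 - h))
    cooks = student**2 * h / (p * (1 - h))
    return h, student, cooks, p


# Hypothetical data (NOT hdur.sav): 12 cases, 2 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))
X[7] = [4.0, 4.0]                         # plant a high-leverage case
y = 3 + X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=12)
y[4] += 5                                 # plant an outlying response

h, student, cooks, p = influence_measures(X, y)
n = len(y)
for i in range(n):
    flags = []
    if abs(student[i]) > 2:
        flags.append("outlier")
    if h[i] > 2 * p / n:
        flags.append("high-leverage")
    if cooks[i] > 4 / n:
        flags.append("influential")
    print(f"case {i + 1:2d}: h={h[i]:.3f}  r={student[i]:6.2f}  D={cooks[i]:.3f}  {' '.join(flags)}")
```

Note that SPSS saves centered leverage values (h − 1/n), so its leverage column would be smaller than the h printed here by 1/n.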