Chapter 12: Analysis of Categorical Data

LEARNING OBJECTIVES

This chapter presents several nonparametric statistics that can be used to analyze data, enabling you to:

1. Understand the chi-square goodness-of-fit test and how to use it.
2. Analyze data using the chi-square test of independence.

CHAPTER TEACHING STRATEGY

Chapter 12 covers the two most prevalent chi-square tests: the chi-square goodness-of-fit test and the chi-square test of independence. These two techniques are important because they give the statistician a tool that is particularly useful for analyzing nominal data (even though the categories of a variable can sometimes be measured at the ordinal level or higher). It should be emphasized that there are many instances in business research where the data gathered are merely categorical identifications. For example, in segmenting the marketplace (consumers or industrial users), information is gathered regarding gender, income level, geographical location, political affiliation, religious preference, ethnicity, occupation, size of company, type of industry, etc. On these variables, the measurement is often a tally of the frequency of occurrence of individuals, items, or companies in each category. The subject of the research is given no "score" or "measurement" other than a 0/1 for being a member or not of a given category. These two chi-square tests are well suited to analyzing such data.

The chi-square goodness-of-fit test examines the categories of one variable to determine whether the distribution of observed occurrences matches some expected or theoretical distribution of occurrences. It can be used to determine whether some standard or previously known distribution of proportions is the same as some observed distribution of proportions.
It can also be used to validate the theoretical distribution of occurrences of phenomena such as random arrivals, which are often assumed to be Poisson distributed. Note that the degrees of freedom, which are k - 1 for a given set of expected values or for the uniform distribution, change to k - 2 for an expected Poisson distribution and to k - 3 for an expected normal distribution. To conduct a chi-square goodness-of-fit test against an expected Poisson distribution, the value of lambda must be estimated from the observed data, which costs one additional degree of freedom. With the normal distribution, both the mean and the standard deviation of the expected distribution are estimated from the observed values, which costs two additional degrees of freedom beyond the k - 1 value.

The chi-square test of independence compares the observed frequencies across the categories of two variables with expected values to determine whether the two variables are independent. If the variables are not independent, they are dependent or related. This allows business researchers to reach conclusions about such questions as "Is smoking independent of gender?" or "Is the type of housing preferred independent of geographic region?" The chi-square test of independence is often used as a tool for preliminary analysis of data gathered in exploratory research, where the researcher has little idea of which variables are related to which and the data are nominal. This test is particularly useful with demographic data.

A word of warning is appropriate here. When an expected frequency is small, the observed chi-square value can be inordinately large, thus increasing the possibility of committing a Type I error.
The research on this problem has yielded varying results, with some authors indicating that expected values as low as two or three are acceptable and other researchers demanding that expected values be ten or more. In this text, we have settled on the widely accepted criterion of five or more.

CHAPTER OUTLINE

12.1 Chi-Square Goodness-of-Fit Test
     Testing a Population Proportion Using the Chi-Square Goodness-of-Fit Test as an Alternative Technique to the z Test
12.2 Contingency Analysis: Chi-Square Test of Independence

KEY TERMS

Categorical Data
Chi-Square Distribution
Chi-Square Goodness-of-Fit Test
Chi-Square Test of Independence
Contingency Analysis
Contingency Table

SOLUTIONS TO CHAPTER 12

12.1
   fo    fe    (fo - fe)²/fe
   53    68    3.309
   37    42    0.595
   32    33    0.030
   28    22    1.636
   18    10    6.400
   15     8    6.125

Ho: The observed distribution is the same as the expected distribution.
Ha: The observed distribution is not the same as the expected distribution.

Observed χ² = Σ (fo - fe)²/fe = 18.095

df = k - 1 = 6 - 1 = 5, α = .05
χ².05,5 = 11.07

Since the observed χ² = 18.095 > χ².05,5 = 11.07, the decision is to reject the null hypothesis. The observed frequencies are not distributed the same as the expected frequencies.

12.2
   fo     fe     (fo - fe)²/fe
   19     18     0.056
   17     18     0.056
   14     18     0.889
   18     18     0.000
   19     18     0.056
   21     18     0.500
   18     18     0.000
   18     18     0.000
   Σfo = 144    Σfe = 144    1.557

Ho: The observed frequencies are uniformly distributed.
Ha: The observed frequencies are not uniformly distributed.

fe = Σfo / k = 144/8 = 18

In this uniform distribution, each fe = 18.

df = k - 1 = 8 - 1 = 7, α = .01
χ².01,7 = 18.4753

Observed χ² = Σ (fo - fe)²/fe = 1.557

Since the observed χ² = 1.557 < χ².01,7 = 18.4753, the decision is to fail to reject the null hypothesis. There is no reason to conclude that the frequencies are not uniformly distributed.

12.3
   Number    fo    (Number)(fo)
   0         28     0
   1         17    17
   2         11    22
   3          5    15
   Total     61    54

Ho: The frequency distribution is Poisson.
Ha: The frequency distribution is not Poisson.

λ = 54/61 = 0.9

   Number    Expected Probability    Expected Frequency
   0         .4066                   24.803
   1         .3659                   22.312
   2         .1647                   10.047
   ≥3        .0628                    3.831

Since fe for ≥3 is less than 5, collapse categories 2 and ≥3:

   Number    fo    fe        (fo - fe)²/fe
   0         28    24.803    0.412
   1         17    22.312    1.265
   ≥2        16    13.878    0.324
   Total     61    60.993    2.001

df = k - 2 = 3 - 2 = 1, α = .05
χ².05,1 = 3.84146

Calculated χ² = Σ (fo - fe)²/fe = 2.001

Since the observed χ² = 2.001 < χ².05,1 = 3.84146, the decision is to fail to reject the null hypothesis. There is insufficient evidence to reject the distribution as Poisson distributed. The conclusion is that the distribution is Poisson distributed.

12.4
   Category    fo     Midpoint M    fM       fM²
   10-20         6    15               90      1,350
   20-30        14    25              350      8,750
   30-40        29    35            1,015     35,525
   40-50        38    45            1,710     76,950
   50-60        25    55            1,375     75,625
   60-70        10    65              650     42,250
   70-80         7    75              525     39,375
   Total     n = 129          ΣfM = 5,715    ΣfM² = 279,825

x̄ = ΣfM / Σf = 5,715/129 = 44.3

s = √[ (ΣfM² - (ΣfM)²/n) / (n - 1) ] = √[ (279,825 - (5,715)²/129) / 128 ] = 14.43

Ho: The observed frequencies are normally distributed.
Ha: The observed frequencies are not normally distributed.
For each category, z = (x - 44.3)/14.43 is computed at both class limits, and the expected probability is the area under the normal curve between the two limits:

   Category    Lower limit: z (area)    Upper limit: z (area)    Expected prob.
   10-20       -2.38 (.4913)            -1.68 (.4535)            .4913 - .4535 = .0378
   20-30       -1.68 (.4535)            -0.99 (.3389)            .4535 - .3389 = .1146
   30-40       -0.99 (.3389)            -0.30 (.1179)            .3389 - .1179 = .2210
   40-50       -0.30 (.1179)             0.40 (.1554)            .1179 + .1554 = .2733
   50-60        0.40 (.1554)             1.09 (.3621)            .3621 - .1554 = .2067
   60-70        1.09 (.3621)             1.78 (.4625)            .4625 - .3621 = .1004
   70-80        1.78 (.4625)             2.47 (.4932)            .4932 - .4625 = .0307

For <10: Probability between 10 and the mean, 44.3 = (.0378 + .1146 + .2210 + .1179) = .4913.
Probability <10 = .5000 - .4913 = .0087

For >80: Probability between 80 and the mean, 44.3 = (.0307 + .1004 + .2067 + .1554) = .4932.
Probability >80 = .5000 - .4932 = .0068

   Category    Prob     Expected frequency
   <10         .0087    .0087(129) =  1.12
   10-20       .0378    .0378(129) =  4.88
   20-30       .1146    14.78
   30-40       .2210    28.51
   40-50       .2733    35.26
   50-60       .2067    26.66
   60-70       .1004    12.95
   70-80       .0307     3.96
   >80         .0068     0.88

Due to the small sizes of the expected frequencies, category <10 is folded into 10-20 and >80 into 70-80.

   Category    fo     fe       (fo - fe)²/fe
   10-20         6     6.00    .000
   20-30        14    14.78    .041
   30-40        29    28.51    .008
   40-50        38    35.26    .213
   50-60        25    26.66    .103
   60-70        10    12.95    .672
   70-80         7     4.84    .964
   Total       129             2.001

Calculated χ² = Σ (fo - fe)²/fe = 2.001

df = k - 3 = 7 - 3 = 4, α = .05
χ².05,4 = 9.48773

Since the observed χ² = 2.001 < χ².05,4 = 9.48773, the decision is to fail to reject the null hypothesis. There is not enough evidence to declare that the observed frequencies are not normally distributed.

12.5
   Definition               fo     Exp. Prop.    fe                    (fo - fe)²/fe
   Happiness                 42    .39           227(.39) = 88.53       24.46
   Sales/Profit              95    .12           227(.12) = 27.24      168.55
   Helping Others            27    .18           40.86                   4.70
   Achievement/Challenge     63    .31           70.37                   0.77
   Total                    227                                        198.48

Ho: The observed frequencies are distributed the same as the expected frequencies.
Ha: The observed frequencies are not distributed the same as the expected frequencies.

Observed χ² = 198.48

df = k - 1 = 4 - 1 = 3, α = .05
χ².05,3 = 7.81473

Since the observed χ² = 198.48 > χ².05,3 = 7.81473, the decision is to reject the null hypothesis. The observed frequencies for men are not distributed the same as the expected frequencies, which are based on the responses of women.

12.6
   Age      fo     Prop. from survey    fe                     (fo - fe)²/fe
   10-14     22    .09                  (.09)(212) = 19.08     0.45
   15-19     50    .23                  (.23)(212) = 48.76     0.03
   20-24     43    .22                  46.64                  0.28
   25-29     29    .14                  29.68                  0.02
   30-34     19    .10                  21.20                  0.23
   ≥35       49    .22                  46.64                  0.12
   Total    212                                                1.13

Ho: The distribution of observed frequencies is the same as the distribution of expected frequencies.
Ha: The distribution of observed frequencies is not the same as the distribution of expected frequencies.

α = .01, df = k - 1 = 6 - 1 = 5
χ².01,5 = 15.0863

The observed χ² = 1.13

Since the observed χ² = 1.13 < χ².01,5 = 15.0863, the decision is to fail to reject the null hypothesis. There is not enough evidence to declare that the distribution of observed frequencies is different from the distribution of expected frequencies.

12.7
   Age      fo     Midpoint M    fM       fM²
   10-20     16    15              240      3,600
   20-30     44    25            1,100     27,500
   30-40     61    35            2,135     74,725
   40-50     56    45            2,520    113,400
   50-60     35    55            1,925    105,875
   60-70     19    65            1,235     80,275
   Total    231          ΣfM = 9,155    ΣfM² = 405,375

x̄ = ΣfM / n = 9,155/231 = 39.63

s = √[ (ΣfM² - (ΣfM)²/n) / (n - 1) ] = √[ (405,375 - (9,155)²/231) / 230 ] = 13.6

Ho: The observed frequencies are normally distributed.
Ha: The observed frequencies are not normally distributed.

For each category, z = (x - 39.63)/13.6 is computed at both class limits:

   Category    Lower limit: z (area)    Upper limit: z (area)    Expected prob.
   10-20       -2.18 (.4854)            -1.44 (.4251)            .4854 - .4251 = .0603
   20-30       -1.44 (.4251)            -0.71 (.2611)            .4251 - .2611 = .1640
   30-40       -0.71 (.2611)             0.03 (.0120)            .2611 + .0120 = .2731
   40-50        0.03 (.0120)             0.76 (.2764)            .2764 - .0120 = .2644
   50-60        0.76 (.2764)             1.50 (.4332)            .4332 - .2764 = .1568
   60-70        1.50 (.4332)             2.23 (.4871)            .4871 - .4332 = .0539

For <10: Probability between 10 and the mean = .0603 + .1640 + .2611 = .4854
Probability <10 = .5000 - .4854 = .0146

For >70: Probability between 70 and the mean = .0120 + .2644 + .1568 + .0539 = .4871
Probability >70 = .5000 - .4871 = .0129

   Age      Probability    fe
   <10      .0146          (.0146)(231) =  3.37
   10-20    .0603          (.0603)(231) = 13.93
   20-30    .1640          37.88
   30-40    .2731          63.09
   40-50    .2644          61.08
   50-60    .1568          36.22
   60-70    .0539          12.45
   >70      .0129           2.98

The expected frequencies for categories <10 and >70 are less than 5. Collapse <10 into 10-20 and >70 into 60-70.

   Age      fo     fe       (fo - fe)²/fe
   10-20     16    17.30    0.10
   20-30     44    37.88    0.99
   30-40     61    63.09    0.07
   40-50     56    61.08    0.42
   50-60     35    36.22    0.04
   60-70     19    15.43    0.83
   Total    231             2.45

df = k - 3 = 6 - 3 = 3, α = .05
χ².05,3 = 7.81473

Observed χ² = 2.45

Since the observed χ² = 2.45 < χ².05,3 = 7.81473, the decision is to fail to reject the null hypothesis. There is no reason to reject that the observed frequencies are normally distributed.

12.8
   Number       fo     (fo)(number)
   0             18      0
   1             28     28
   2             47     94
   3             21     63
   4             16     64
   5             11     55
   6 or more      9     54
   Total  Σfo = 150    Σ(fo)(number) = 358

λ = Σ(fo)(number) / Σfo = 358/150 = 2.4

Ho: The observed frequencies are Poisson distributed.
Ha: The observed frequencies are not Poisson distributed.
   Number       Probability    fo     fe                      (fo - fe)²/fe
   0            .0907           18    (.0907)(150) = 13.61     1.42
   1            .2177           28    (.2177)(150) = 32.66     0.66
   2            .2613           47    39.20                    1.55
   3            .2090           21    31.35                    3.42
   4            .1254           16    18.81                    0.42
   5            .0602           11     9.03                    0.43
   6 or more    .0358            9     5.36                    2.47
   Total                       150                            10.37

The observed χ² = 10.37

α = .01, df = k - 2 = 7 - 2 = 5
χ².01,5 = 15.0863

Since the observed χ² = 10.37 < χ².01,5 = 15.0863, the decision is to fail to reject the null hypothesis. There is not enough evidence to reject the claim that the observed frequencies are Poisson distributed.

12.9
Ho: p = .28
Ha: p ≠ .28

n = 270, x = 62

                        fo     fe                    (fo - fe)²/fe
   Spend More            62    270(.28) =  75.6      2.44656
   Don't Spend More     208    270(.72) = 194.4      0.95144
   Total                270               270.0      3.39800

The observed value of χ² is 3.398.

α = .05 and α/2 = .025
df = k - 1 = 2 - 1 = 1
χ².025,1 = 5.02389

Since the observed χ² = 3.398 < χ².025,1 = 5.02389, the decision is to fail to reject the null hypothesis.

12.10
Ho: p = .30
Ha: p ≠ .30

n = 180, x = 42

                     fo     fe                   (fo - fe)²/fe
   Provide            42    180(.30) =  54       2.6667
   Don't Provide     138    180(.70) = 126       1.1429
   Total             180               180       3.8095

The observed value of χ² is 3.8095.

α = .05 and α/2 = .025
df = k - 1 = 2 - 1 = 1
χ².025,1 = 5.02389

Since the observed χ² = 3.8095 < χ².025,1 = 5.02389, the decision is to fail to reject the null hypothesis.

12.11
                     Variable Two
   Variable One      203    326    529
                      68    110    178
                     271    436    707

Ho: Variable One is independent of Variable Two.
Ha: Variable One is not independent of Variable Two.
e11 = (529)(271)/707 = 202.77     e12 = (529)(436)/707 = 326.23
e21 = (271)(178)/707 = 68.23      e22 = (436)(178)/707 = 109.77

Observed frequencies, with expected frequencies in parentheses:

                     Variable Two
   Variable One      203 (202.77)    326 (326.23)    529
                      68 (68.23)     110 (109.77)    178
                     271             436             707

χ² = (203 - 202.77)²/202.77 + (326 - 326.23)²/326.23 + (68 - 68.23)²/68.23 + (110 - 109.77)²/109.77
   = .00 + .00 + .00 + .00 = 0.00

α = .05, df = (c-1)(r-1) = (2-1)(2-1) = 1
χ².05,1 = 3.84146

Since the observed χ² = 0.00 < χ².05,1 = 3.84146, the decision is to fail to reject the null hypothesis. Variable One is independent of Variable Two.

12.12
                     Variable Two
   Variable One       24     13     47     58    142
                      93     59    187    244    583
                     117     72    234    302    725

Ho: Variable One is independent of Variable Two.
Ha: Variable One is not independent of Variable Two.

e11 = (142)(117)/725 = 22.92      e21 = (583)(117)/725 = 94.08
e12 = (142)(72)/725 = 14.10       e22 = (583)(72)/725 = 57.90
e13 = (142)(234)/725 = 45.83      e23 = (583)(234)/725 = 188.17
e14 = (142)(302)/725 = 59.15      e24 = (583)(302)/725 = 242.85

Observed frequencies, with expected frequencies in parentheses:

                     Variable Two
   Variable One       24 (22.92)    13 (14.10)     47 (45.83)      58 (59.15)     142
                      93 (94.08)    59 (57.90)    187 (188.17)    244 (242.85)    583
                     117            72            234             302             725

χ² = (24 - 22.92)²/22.92 + (13 - 14.10)²/14.10 + (47 - 45.83)²/45.83 + (58 - 59.15)²/59.15
   + (93 - 94.08)²/94.08 + (59 - 57.90)²/57.90 + (187 - 188.17)²/188.17 + (244 - 242.85)²/242.85
   = .05 + .09 + .03 + .02 + .01 + .02 + .01 + .01 = 0.24

α = .01, df = (c-1)(r-1) = (4-1)(2-1) = 3
χ².01,3 = 11.3449

Since the observed χ² = 0.24 < χ².01,3 = 11.3449, the decision is to fail to reject the null hypothesis. Variable One is independent of Variable Two.

12.13
                            Social Class
   Number of Children    Lower    Middle    Upper    Total
   0                        7       18         6       31
   1                        9       38        23       70
   2 or 3                  34       97        58      189
   >3                      47       31        30      108
   Total                   97      184       117      398

Ho: Social Class is independent of Number of Children.
Ha: Social Class is not independent of Number of Children.
e11 = (31)(97)/398 = 7.56        e31 = (189)(97)/398 = 46.06
e12 = (31)(184)/398 = 14.33      e32 = (189)(184)/398 = 87.38
e13 = (31)(117)/398 = 9.11       e33 = (189)(117)/398 = 55.56
e21 = (70)(97)/398 = 17.06       e41 = (108)(97)/398 = 26.32
e22 = (70)(184)/398 = 32.36      e42 = (108)(184)/398 = 49.93
e23 = (70)(117)/398 = 20.58      e43 = (108)(117)/398 = 31.75

Observed frequencies, with expected frequencies in parentheses:

                            Social Class
   Number of Children    Lower         Middle        Upper         Total
   0                      7 (7.56)     18 (14.33)     6 (9.11)       31
   1                      9 (17.06)    38 (32.36)    23 (20.58)      70
   2 or 3                34 (46.06)    97 (87.38)    58 (55.56)     189
   >3                    47 (26.32)    31 (49.93)    30 (31.75)     108
   Total                 97           184           117             398

χ² = (7 - 7.56)²/7.56 + (18 - 14.33)²/14.33 + (6 - 9.11)²/9.11
   + (9 - 17.06)²/17.06 + (38 - 32.36)²/32.36 + (23 - 20.58)²/20.58
   + (34 - 46.06)²/46.06 + (97 - 87.38)²/87.38 + (58 - 55.56)²/55.56
   + (47 - 26.32)²/26.32 + (31 - 49.93)²/49.93 + (30 - 31.75)²/31.75
   = .04 + .94 + 1.06 + 3.81 + .98 + .28 + 3.16 + 1.06 + .11 + 16.25 + 7.18 + .10 = 34.97

α = .05, df = (c-1)(r-1) = (3-1)(4-1) = 6
χ².05,6 = 12.5916

Since the observed χ² = 34.97 > χ².05,6 = 12.5916, the decision is to reject the null hypothesis. Number of children is not independent of social class.

12.14
                       Type of Music Preferred
   Region    Rock    R&B    Country    Classical    Total
   NE         140     32        5         18         195
   S          134     41       52          8         235
   W          154     27        8         13         202
   Total      428    100       65         39         632

Ho: Type of music preferred is independent of region.
Ha: Type of music preferred is not independent of region.
e11 = (195)(428)/632 = 132.06     e23 = (235)(65)/632 = 24.17
e12 = (195)(100)/632 = 30.85      e24 = (235)(39)/632 = 14.50
e13 = (195)(65)/632 = 20.06       e31 = (202)(428)/632 = 136.80
e14 = (195)(39)/632 = 12.03       e32 = (202)(100)/632 = 31.96
e21 = (235)(428)/632 = 159.15     e33 = (202)(65)/632 = 20.78
e22 = (235)(100)/632 = 37.18      e34 = (202)(39)/632 = 12.47

Observed frequencies, with expected frequencies in parentheses:

                       Type of Music Preferred
   Region    Rock            R&B           Country       Classical     Total
   NE        140 (132.06)    32 (30.85)     5 (20.06)    18 (12.03)     195
   S         134 (159.15)    41 (37.18)    52 (24.17)     8 (14.50)     235
   W         154 (136.80)    27 (31.96)     8 (20.78)    13 (12.47)     202
   Total     428            100            65            39             632

χ² = (140 - 132.06)²/132.06 + (32 - 30.85)²/30.85 + (5 - 20.06)²/20.06 + (18 - 12.03)²/12.03
   + (134 - 159.15)²/159.15 + (41 - 37.18)²/37.18 + (52 - 24.17)²/24.17 + (8 - 14.50)²/14.50
   + (154 - 136.80)²/136.80 + (27 - 31.96)²/31.96 + (8 - 20.78)²/20.78 + (13 - 12.47)²/12.47
   = .48 + .04 + 11.31 + 2.96 + 3.97 + .39 + 32.04 + 2.91 + 2.16 + .77 + 7.86 + .02 = 64.91

α = .01, df = (c-1)(r-1) = (4-1)(3-1) = 6
χ².01,6 = 16.8119

Since the observed χ² = 64.91 > χ².01,6 = 16.8119, the decision is to reject the null hypothesis. Type of music preferred is not independent of region of the country.

12.15
                      Transportation Mode
   Industry       Air    Train    Truck    Total
   Publishing      32     12       41        85
   Comp. Hard.      5      6       24        35
   Total           37     18       65       120

Ho: Transportation Mode is independent of Industry.
Ha: Transportation Mode is not independent of Industry.

e11 = (85)(37)/120 = 26.21      e21 = (35)(37)/120 = 10.79
e12 = (85)(18)/120 = 12.75      e22 = (35)(18)/120 = 5.25
e13 = (85)(65)/120 = 46.04      e23 = (35)(65)/120 = 18.96

Observed frequencies, with expected frequencies in parentheses:

                      Transportation Mode
   Industry       Air           Train         Truck         Total
   Publishing     32 (26.21)    12 (12.75)    41 (46.04)      85
   Comp. Hard.     5 (10.79)     6 (5.25)     24 (18.96)      35
   Total          37            18            65             120

χ² = (32 - 26.21)²/26.21 + (12 - 12.75)²/12.75 + (41 - 46.04)²/46.04
   + (5 - 10.79)²/10.79 + (6 - 5.25)²/5.25 + (24 - 18.96)²/18.96
   = 1.28 + .04 + .55 + 3.11 + .11 + 1.34 = 6.43

α = .05, df = (c-1)(r-1) = (3-1)(2-1) = 2
χ².05,2 = 5.99147

Since the observed χ² = 6.43 > χ².05,2 = 5.99147, the decision is to reject the null hypothesis. Transportation mode is not independent of industry.

12.16
                          Number of Bedrooms
   Number of Stories     ≤2      3      ≥4     Total
   1                     116    101     57      274
   2                      90    325    160      575
   Total                 206    426    217      849

Ho: Number of Stories is independent of number of bedrooms.
Ha: Number of Stories is not independent of number of bedrooms.

e11 = (274)(206)/849 = 66.48      e21 = (575)(206)/849 = 139.52
e12 = (274)(426)/849 = 137.48     e22 = (575)(426)/849 = 288.52
e13 = (274)(217)/849 = 70.03      e23 = (575)(217)/849 = 146.97

χ² = (116 - 66.48)²/66.48 + (101 - 137.48)²/137.48 + (57 - 70.03)²/70.03
   + (90 - 139.52)²/139.52 + (325 - 288.52)²/288.52 + (160 - 146.97)²/146.97
   = 36.89 + 9.68 + 2.42 + 17.58 + 4.61 + 1.16 = 72.34

α = .10, df = (c-1)(r-1) = (3-1)(2-1) = 2
χ².10,2 = 4.60517

Since the observed χ² = 72.34 > χ².10,2 = 4.60517, the decision is to reject the null hypothesis. Number of stories is not independent of number of bedrooms.

12.17
                     Mexican Citizens
   Type of Store     Yes    No    Total
   Dept.              24    17      41
   Disc.              20    15      35
   Hard.              11    19      30
   Shoe               32    28      60
   Total              87    79     166

Ho: Citizenship is independent of store type.
Ha: Citizenship is not independent of store type.

e11 = (41)(87)/166 = 21.49      e31 = (30)(87)/166 = 15.72
e12 = (41)(79)/166 = 19.51      e32 = (30)(79)/166 = 14.28
e21 = (35)(87)/166 = 18.34      e41 = (60)(87)/166 = 31.45
e22 = (35)(79)/166 = 16.66      e42 = (60)(79)/166 = 28.55

Observed frequencies, with expected frequencies in parentheses:

                     Mexican Citizens
   Type of Store     Yes           No            Total
   Dept.             24 (21.49)    17 (19.51)      41
   Disc.             20 (18.34)    15 (16.66)      35
   Hard.             11 (15.72)    19 (14.28)      30
   Shoe              32 (31.45)    28 (28.55)      60
   Total             87            79             166

χ² = (24 - 21.49)²/21.49 + (17 - 19.51)²/19.51 + (20 - 18.34)²/18.34 + (15 - 16.66)²/16.66
   + (11 - 15.72)²/15.72 + (19 - 14.28)²/14.28 + (32 - 31.45)²/31.45 + (28 - 28.55)²/28.55
   = .29 + .32 + .15 + .17 + 1.42 + 1.56 + .01 + .01 = 3.93

α = .05, df = (c-1)(r-1) = (2-1)(4-1) = 3
χ².05,3 = 7.81473

Since the observed χ² = 3.93 < χ².05,3 = 7.81473, the decision is to fail to reject the null hypothesis. Citizenship is independent of type of store.

12.18
α = .01, k = 7, df = 6

Ho: The observed distribution is the same as the expected distribution.
Ha: The observed distribution is not the same as the expected distribution.

Use: χ² = Σ (fo - fe)²/fe

Critical χ².01,6 = 16.8119

   fo     fe     (fo - fe)²    (fo - fe)²/fe
   214    206     64           0.311
   235    232      9           0.039
   279    268    121           0.451
   281    284      9           0.032
   264    268     16           0.060
   254    232    484           2.086
   211    206     25           0.121
   Total                       3.100

χ² = Σ (fo - fe)²/fe = 3.100

Since the observed value of χ² = 3.1 < χ².01,6 = 16.8119, the decision is to fail to reject the null hypothesis. The observed distribution is not different from the expected distribution.

12.19
                   Variable 2
   Variable 1      12    23    21     56
                    8    17    20     45
                    7    11    18     36
                   27    51    59    137

e11 = 11.04     e12 = 20.85     e13 = 24.12
e21 = 8.87      e22 = 16.75     e23 = 19.38
e31 = 7.09      e32 = 13.40     e33 = 15.50

χ² = (12 - 11.04)²/11.04 + (23 - 20.85)²/20.85 + (21 - 24.12)²/24.12
   + (8 - 8.87)²/8.87 + (17 - 16.75)²/16.75 + (20 - 19.38)²/19.38
   + (7 - 7.09)²/7.09 + (11 - 13.40)²/13.40 + (18 - 15.50)²/15.50
   = .084 + .222 + .403 + .085 + .004 + .020 + .001 + .430 + .402 = 1.652

df = (c-1)(r-1) = (2)(2) = 4, α = .05
χ².05,4 = 9.48773

Since the observed value of χ² = 1.652 < χ².05,4 = 9.48773, the decision is to fail to reject the null hypothesis.
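
The expected-frequency and chi-square arithmetic used in the test-of-independence solutions can be sketched in a few lines of code. The following is a minimal illustration only, using the observed counts of Problem 12.19; the function names and the constant name are invented for this sketch, and the critical value is simply the table value quoted in that solution.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_frequencies(observed)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Observed counts from Problem 12.19 (row and column totals are recomputed).
table_12_19 = [
    [12, 23, 21],
    [8, 17, 20],
    [7, 11, 18],
]

chi2, df = chi_square_independence(table_12_19)
CRITICAL_05_4 = 9.48773  # chi-square table value for alpha = .05, df = 4

print(f"chi-square = {chi2:.3f}, df = {df}")
print("reject Ho" if chi2 > CRITICAL_05_4 else "fail to reject Ho")
```

Because the code carries full precision instead of rounding each expected frequency to two decimals, its statistic can differ from the hand calculation in the third decimal place.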
12.20
                       Location
   Customer       NE     W      S     Total
   Industrial     230    115     68     413
   Retail         185    143     89     417
   Total          415    258    157     830

e11 = (413)(415)/830 = 206.5      e21 = (417)(415)/830 = 208.5
e12 = (413)(258)/830 = 128.38     e22 = (417)(258)/830 = 129.62
e13 = (413)(157)/830 = 78.12      e23 = (417)(157)/830 = 78.88

Observed frequencies, with expected frequencies in parentheses:

                       Location
   Customer       NE             W               S             Total
   Industrial     230 (206.5)    115 (128.38)    68 (78.12)      413
   Retail         185 (208.5)    143 (129.62)    89 (78.88)      417
   Total          415            258            157              830

χ² = (230 - 206.5)²/206.5 + (115 - 128.38)²/128.38 + (68 - 78.12)²/78.12
   + (185 - 208.5)²/208.5 + (143 - 129.62)²/129.62 + (89 - 78.88)²/78.88
   = 2.67 + 1.39 + 1.31 + 2.65 + 1.38 + 1.30 = 10.70

α = .10 and df = (c - 1)(r - 1) = (3 - 1)(2 - 1) = 2
χ².10,2 = 4.60517

Since the observed χ² = 10.70 > χ².10,2 = 4.60517, the decision is to reject the null hypothesis. Type of customer is not independent of geographic region.

12.21
   Cookie Type        fo
   Chocolate Chip     189
   Peanut Butter      168
   Cheese Cracker     155
   Lemon Flavored     161
   Chocolate Mint     216
   Vanilla Filled     165
   Σfo = 1,054

Ho: Cookie Sales is uniformly distributed across kind of cookie.
Ha: Cookie Sales is not uniformly distributed across kind of cookie.

If cookie sales are uniformly distributed, then fe = Σfo / (no. of kinds) = 1,054/6 = 175.67

   fo     fe        (fo - fe)²/fe
   189    175.67     1.01
   168    175.67     0.33
   155    175.67     2.43
   161    175.67     1.23
   216    175.67     9.26
   165    175.67     0.65
   Total            14.91

The observed χ² = 14.91

α = .05, df = k - 1 = 6 - 1 = 5
χ².05,5 = 11.0705

Since the observed χ² = 14.91 > χ².05,5 = 11.0705, the decision is to reject the null hypothesis. Cookie Sales is not uniformly distributed by kind of cookie.

12.22
                    Gender
   Bought Car     M        F        Total
   Y                207       65      272
   N                811      984    1,795
   Total          1,018    1,049    2,067

Ho: Purchasing a car or not is independent of gender.
Ha: Purchasing a car or not is not independent of gender.
e11 = (272)(1,018)/2,067 = 133.96       e12 = (272)(1,049)/2,067 = 138.04
e21 = (1,795)(1,018)/2,067 = 884.04     e22 = (1,795)(1,049)/2,067 = 910.96

Observed frequencies, with expected frequencies in parentheses:

                    Gender
   Bought Car     M               F               Total
   Y              207 (133.96)     65 (138.04)      272
   N              811 (884.04)    984 (910.96)    1,795
   Total          1,018           1,049           2,067

χ² = (207 - 133.96)²/133.96 + (65 - 138.04)²/138.04 + (811 - 884.04)²/884.04 + (984 - 910.96)²/910.96
   = 39.82 + 38.65 + 6.03 + 5.86 = 90.36

α = .05, df = (c-1)(r-1) = (2-1)(2-1) = 1
χ².05,1 = 3.841

Since the observed χ² = 90.36 > χ².05,1 = 3.841, the decision is to reject the null hypothesis. Purchasing a car is not independent of gender.

12.23
   Arrivals    fo     (fo)(arrivals)
   0            26      0
   1            40     40
   2            57    114
   3            32     96
   4            17     68
   5            12     60
   6             8     48
   Total  Σfo = 192    Σ(fo)(arrivals) = 426

λ = Σ(fo)(arrivals) / Σfo = 426/192 = 2.2

Ho: The observed frequencies are Poisson distributed.
Ha: The observed frequencies are not Poisson distributed.

   Arrivals    Probability    fo     fe                      (fo - fe)²/fe
   0           .1108           26    (.1108)(192) = 21.27    1.05
   1           .2438           40    (.2438)(192) = 46.81    0.99
   2           .2681           57    51.48                   0.59
   3           .1966           32    37.75                   0.88
   4           .1082           17    20.77                   0.68
   5           .0476           12     9.14                   0.89
   6           .0249            8     4.78                   2.17
   Total                      192                            7.25

Observed χ² = 7.25

α = .05, df = k - 2 = 7 - 2 = 5
χ².05,5 = 11.0705

Since the observed χ² = 7.25 < χ².05,5 = 11.0705, the decision is to fail to reject the null hypothesis. There is not enough evidence to reject the claim that the observed frequency of arrivals is Poisson distributed.

12.24
Ho: The distribution of observed frequencies is the same as the distribution of expected frequencies.
Ha: The distribution of observed frequencies is not the same as the distribution of expected frequencies.

   Soft Drink      fo       Proportion    fe                       (fo - fe)²/fe
   Classic Coke     361     .206          (.206)(1726) = 355.56     0.08
   Pepsi            272     .145          (.145)(1726) = 250.27     1.89
   Diet Coke        192     .085          146.71                   13.98
   Mt. Dew          121     .063          108.74                    1.38
   Dr. Pepper        94     .059          101.83                    0.60
   Sprite           102     .062          107.01                    0.23
   Others           584     .380          655.86                    7.87
   Total  Σfo = 1,726                                              26.03

Calculated χ² = 26.03

α = .05, df = k - 1 = 7 - 1 = 6
χ².05,6 = 12.5916

Since the observed χ² = 26.03 > χ².05,6 = 12.5916, the decision is to reject the null hypothesis. The observed frequencies are not distributed the same as the expected frequencies from the national poll.

12.25
                   Position
   Years     Manager    Programmer    Operator    Systems Analyst    Total
   0-3          6           37           11             13             67
   4-8         28           16           23             24             91
   >8          47           10           12             19             88
   Total       81           63           46             56            246

e11 = (67)(81)/246 = 22.06      e23 = (91)(46)/246 = 17.02
e12 = (67)(63)/246 = 17.16      e24 = (91)(56)/246 = 20.72
e13 = (67)(46)/246 = 12.53      e31 = (88)(81)/246 = 28.98
e14 = (67)(56)/246 = 15.25      e32 = (88)(63)/246 = 22.54
e21 = (91)(81)/246 = 29.96      e33 = (88)(46)/246 = 16.46
e22 = (91)(63)/246 = 23.30      e34 = (88)(56)/246 = 20.03

Observed frequencies, with expected frequencies in parentheses:

                   Position
   Years     Manager       Programmer    Operator      Systems Analyst    Total
   0-3        6 (22.06)    37 (17.16)    11 (12.53)    13 (15.25)           67
   4-8       28 (29.96)    16 (23.30)    23 (17.02)    24 (20.72)           91
   >8        47 (28.98)    10 (22.54)    12 (16.46)    19 (20.03)           88
   Total     81            63            46            56                  246

χ² = (6 - 22.06)²/22.06 + (37 - 17.16)²/17.16 + (11 - 12.53)²/12.53 + (13 - 15.25)²/15.25
   + (28 - 29.96)²/29.96 + (16 - 23.30)²/23.30 + (23 - 17.02)²/17.02 + (24 - 20.72)²/20.72
   + (47 - 28.98)²/28.98 + (10 - 22.54)²/22.54 + (12 - 16.46)²/16.46 + (19 - 20.03)²/20.03
   = 11.69 + 22.94 + .19 + .33 + .13 + 2.29 + 2.10 + .52 + 11.20 + 6.98 + 1.21 + .05 = 59.63

α = .01, df = (c-1)(r-1) = (4-1)(3-1) = 6
χ².01,6 = 16.8119

Since the observed χ² = 59.63 > χ².01,6 = 16.8119, the decision is to reject the null hypothesis. Position is not independent of number of years of experience.
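
When a test of independence rejects the null hypothesis, as in Problem 12.25, it can help to see which cells contribute most to the chi-square total. The sketch below ranks the per-cell contributions (fo - fe)²/fe for the 12.25 table. The function name is invented for this illustration, and the column order (Manager, Programmer, Operator, Systems Analyst) is an assumption about how the flattened table header should be read.

```python
def cell_contributions(observed):
    """Return (contribution, row index, column index) triples, largest first."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    contributions = []
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            contributions.append(((o - e) ** 2 / e, i, j))
    return sorted(contributions, reverse=True)


# Observed counts from Problem 12.25; labels follow the assumed column order.
years = ["0-3", "4-8", ">8"]
positions = ["Manager", "Programmer", "Operator", "Systems Analyst"]
table_12_25 = [
    [6, 37, 11, 13],
    [28, 16, 23, 24],
    [47, 10, 12, 19],
]

ranked = cell_contributions(table_12_25)
total = sum(value for value, _, _ in ranked)

print(f"chi-square total = {total:.2f}")
for value, i, j in ranked[:3]:
    print(f"{years[i]:>4} years, {positions[j]:<16} contributes {value:.2f}")
```

In this table the largest single contribution comes from the 0-3 years, Programmer cell, which matches the 22.94 term in the hand calculation.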
12.26
Ho: p = .43
Ha: p ≠ .43

n = 315, x = 120, α = .05, α/2 = .025

                                 fo     fe                       (fo - fe)²/fe
   More Work, More Business      120    (.43)(315) = 135.45      1.76
   Others                        195    (.57)(315) = 179.55      1.33
   Total                         315                 315.00      3.09

The calculated value of χ² is 3.09.

α = .05 and α/2 = .025
df = k - 1 = 2 - 1 = 1
χ².025,1 = 5.02389

Since χ² = 3.09 < χ².025,1 = 5.02389, the decision is to fail to reject the null hypothesis.

12.27
                            Type of College or University
   Number of Children    Community College    Large University    Small College    Total
   0                            25                  178                 31           234
   1                            49                  141                 12           202
   2                            31                   54                  8            93
   ≥3                           22                   14                  6            42
   Total                       127                  387                 57           571

Ho: Number of Children is independent of Type of College or University.
Ha: Number of Children is not independent of Type of College or University.

e11 = (234)(127)/571 = 52.05      e31 = (93)(127)/571 = 20.68
e12 = (234)(387)/571 = 158.60     e32 = (93)(387)/571 = 63.03
e13 = (234)(57)/571 = 23.36       e33 = (93)(57)/571 = 9.28
e21 = (202)(127)/571 = 44.93      e41 = (42)(127)/571 = 9.34
e22 = (202)(387)/571 = 136.91     e42 = (42)(387)/571 = 28.47
e23 = (202)(57)/571 = 20.16       e43 = (42)(57)/571 = 4.19

Observed frequencies, with expected frequencies in parentheses:

                            Type of College or University
   Number of Children    Community College    Large University    Small College    Total
   0                      25 (52.05)          178 (158.60)        31 (23.36)        234
   1                      49 (44.93)          141 (136.91)        12 (20.16)        202
   2                      31 (20.68)           54 (63.03)          8 (9.28)          93
   ≥3                     22 (9.34)            14 (28.47)          6 (4.19)          42
   Total                 127                  387                 57                571

χ² = (25 - 52.05)²/52.05 + (178 - 158.60)²/158.60 + (31 - 23.36)²/23.36
   + (49 - 44.93)²/44.93 + (141 - 136.91)²/136.91 + (12 - 20.16)²/20.16
   + (31 - 20.68)²/20.68 + (54 - 63.03)²/63.03 + (8 - 9.28)²/9.28
   + (22 - 9.34)²/9.34 + (14 - 28.47)²/28.47 + (6 - 4.19)²/4.19
   = 14.06 + 2.37 + 2.50 + 0.37 + 0.12 + 3.30 + 5.15 + 1.29 + 0.18 + 17.16 + 7.35 + 0.78 = 54.63

α = .05, df = (c - 1)(r - 1) = (3 - 1)(4 - 1) = 6
χ².05,6 = 12.5916

Since the observed χ² = 54.63 > χ².05,6 = 12.5916, the decision is to reject the null hypothesis. Number of children is not independent of type of College or University.
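
The teaching strategy adopts the criterion that every expected frequency be five or more. In Problem 12.27 one expected frequency, e43 = 4.19, falls slightly below that threshold, so it is worth checking for such cells before interpreting the result. The sketch below is one hypothetical way to flag them; the function name and the `minimum` parameter are assumptions made for this illustration.

```python
def small_expected_cells(observed, minimum=5.0):
    """Return (row index, column index, expected frequency) for cells below `minimum`."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n  # expected frequency for cell (i, j)
            if e < minimum:
                flagged.append((i, j, e))
    return flagged


# Observed counts from Problem 12.27
# (rows: 0, 1, 2, >=3 children; columns: community, large, small).
table_12_27 = [
    [25, 178, 31],
    [49, 141, 12],
    [31, 54, 8],
    [22, 14, 6],
]

flagged = small_expected_cells(table_12_27)
for i, j, e in flagged:
    print(f"row {i + 1}, column {j + 1}: expected frequency {e:.2f} is below 5")
```

When a cell is flagged, the goodness-of-fit solutions in this chapter collapse adjacent categories; whether to do the same in a contingency table is a judgment call that this sketch does not make.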
12.28 The observed chi-square is 30.18 with a p-value of .0000043. The chi-square goodness-of-fit test indicates that there is a significant difference between the observed frequencies and the expected frequencies. The distribution of responses to the question is not the same for adults between 21 and 30 years of age as it is for others. Marketing and sales people might reorient their efforts aimed at 21- to 30-year-olds away from home improvement and pay more attention to leisure travel/vacation, clothing, and home entertainment.

12.29 The observed chi-square value for this test of independence is 5.366. The associated p-value of .252 indicates failure to reject the null hypothesis. There is not enough evidence here to say that color choice is dependent upon gender. Automobile marketing people do not have to worry about which colors especially appeal to men or to women, because these data give no evidence that car color preference depends on gender. In addition, design and production people can determine car color quotas based on other variables.
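
Problems 12.28 and 12.29 report p-values instead of comparing the statistic with a table value. For an even number of degrees of freedom, the upper-tail chi-square probability has a simple closed form, which the sketch below uses. Note the assumptions: the solution to 12.29 does not state its degrees of freedom, and df = 4 is assumed here only because it reproduces the reported p-value of .252. The function name is invented for this illustration.

```python
import math


def chi_square_p_value_even_df(chi2, df):
    """Upper-tail probability P(X >= chi2) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form holds only for positive even df")
    half = chi2 / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Problem 12.29: chi-square = 5.366; df = 4 is an assumption, not given in the text.
p_12_29 = chi_square_p_value_even_df(5.366, 4)
print(f"p-value = {p_12_29:.3f}")
print("reject Ho" if p_12_29 < 0.05 else "fail to reject Ho")
```

For odd degrees of freedom the closed form does not apply, and a statistics library or a chi-square table would be needed instead.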