CHAPTER EIGHT
TESTING HYPOTHESES REGARDING FREQUENCY DATA AND CONTINGENCY TABLES: THE CHI-SQUARE TESTS

Chapter Objectives

In this Chapter you will:
• Learn how to determine frequencies and proportions using data on nominal scales.
• Understand the concept of expected values of frequencies and proportions.
• Learn how to use the chi-squared (χ2) distribution to test hypotheses about differences between observed and expected frequencies and proportions of the various values of a nominal variable in a population.
• Learn how to use the chi-squared statistic to test hypotheses about the independence or relationship between variables measured on nominal scales.
• Learn how to determine the strength of relationships between variables measured on nominal scales.
• Learn to use SPSS to test hypotheses about frequencies, proportions, and relationships of variables measured on nominal scales.

A simple thing that we can do with data is to count them. In fact, there is some evidence that writing began developing in the Fertile Crescent as a way of keeping track of the number of domestic animals that were owned by a person or involved in a business transaction. Counting is old! When you come down to it, counting is really the only thing we can do with data that are on nominal scales. Frequencies are the results of counting. We can look at relationships between variables on nominal scales and test hypotheses concerning distributions of nominal variables in populations. These latter hypotheses are referred to as hypotheses about "goodness of fit" since they test how well the observed pattern of frequencies fits the pattern we would expect. Let's start our investigation of nominal variables by looking at these hypotheses.

Hypotheses About Goodness of Fit

Imagine that Mr. Lycanthrop, the principal of Transylvania High School, suspected that the frequency of students being referred to his office for discipline varied with the phase of the moon. For a year he kept track of the number of referrals for disciplinary action he received, noting carefully the phase the moon was in at each referral. At the end of the academic year Mr. Lycanthrop observed that the one thousand referrals teachers had made were distributed over the phases of the moon as shown in Table 8.1.

Table 8.1
Observed Referrals for Discipline During the Various Phases of the Moon

Phase of the Moon              Number of Referrals
New to waxing quarter                 206
Waxing quarter to full                220
Full to waning quarter                305
Waning quarter to new                 269
Total                                1000

Comparing Observed and Expected Values

What if Mr. Lycanthrop is wrong and the phase of the moon had nothing to do with the frequency of referrals for discipline? In this case we would expect to find the referrals distributed evenly across the different moon phases. The distribution of the 1,000 referrals would look like Table 8.2.

Table 8.2
Expected Referrals for Discipline During the Various Phases of the Moon if the Phase of the Moon and the Number of Referrals Were Not Related to Each Other

Phase of the Moon              Number of Referrals
New to waxing quarter                 250
Waxing quarter to full                250
Full to waning quarter                250
Waning quarter to new                 250
Total                                1000

These expected values are certainly different from the values actually observed over the course of the academic year.
In fact, we can see how much the distribution we observed in the study differed from the distribution we would expect if Mr. Lycanthrop were wrong and there were no relationship between the phases of the moon and discipline referrals by calculating the residuals (i.e., the differences between the two distributions), as shown in Table 8.3.

Table 8.3
Observed and Expected Distributions with Residuals

                          Observed      Expected
                          Frequency     Frequency     Residual
Moon Phase                   (fo)          (fe)       (fo – fe)
New to Waxing Quarter         206           250          -44
Waxing Quarter to Full        220           250          -30
Full to Waning Quarter        305           250           55
Waning Quarter to New         269           250           19

Note that the sum of the residuals always equals zero.

If Lycanthrop is wrong, he should have observed the same frequencies that were expected if the phase of the moon had no effect on student discipline. In that case the residuals would all be zero. We see that this is not the case here. We can think of two reasons for this. The first is that the phases of the moon affect the behavior of students at Transylvania High (Lycanthrop is correct). The second is that the differences between the observed frequencies and the expected frequencies are simply due to chance (Lycanthrop is wrong). What is the probability that the residuals can be this great (that is, that the differences between the observed and expected values can be as great as we see in Table 8.3) if Mr. Lycanthrop is wrong about the phases of the moon affecting student behavior and the differences between the observed and expected frequencies are simply due to chance?

Who Invented the Chi-Squared Test?
Karl Pearson introduced the chi-squared test, and the name for it, in an article published in 1900 in The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. Pearson had been in the habit of writing the exponent in the multivariate normal density as "-1/2 χ2."

There are two problems that we have to deal with when we evaluate the magnitude of these deviations from expectancy. One is that, regardless of how far we are from expectations, a simple sum of the deviations is not useful because it is always zero. For example, if in the table above you change the observed frequencies to 235, 245, 255, and 265, calculate the deviation of each of these values from the expected frequency for each category (250), and add the deviations up, the sum would be zero. But you can see that these four frequencies are much closer to the expected frequencies than the ones in Mr. Lycanthrop's study in Table 8.3. One way to solve this problem is to square each deviation from expectancy, that is, to examine the squared deviations (fo − fe)2 rather than the simple deviations (fo − fe). Squaring the residuals prevents them from canceling each other out.

A second problem that we have to solve is that a deviation from expectancy is only meaningful if it is rather large compared to the number we expected. For example, in Table 8.3, the expected frequency for the first category (New to Waxing Quarter) is 250, and the deviation from this expected frequency is -44. Would 44 points have the same meaning if the expected frequency were 2,500? Obviously, if we expect 250 and we are 44 points off, we have a bigger deviation from expectation than if we expect 2,500 and we are 44 points off. A solution to this problem is to calculate, for each category, the ratio of the squared deviation to the expected frequency for that category, (fo − fe)2/fe. The overall deviation from expectation in a study such as Mr. Lycanthrop's would be the sum of these ratios.
We can do this by calculating a test statistic known as chi-square (χ2; it starts with a "k" sound and rhymes with "sky") using Formula 8.1, where fo is the observed frequency, fe is the expected frequency, and the sum is taken over all of the categories:

        χ2 = Σ [ (fo − fe)2 / fe ]                    (8.1)

Table 8.4 demonstrates the procedure for calculating the elements of the formula.

Table 8.4
Calculating the Elements of the χ2 Formula

                          Observed     Expected                  Squared
                          Frequency    Frequency     Residual    Residual
Moon Phase                   (fo)         (fe)       (fo – fe)   (fo – fe)2    (fo – fe)2/fe
New to Waxing Quarter         206          250          -44         1936          7.744
Waxing Quarter to Full        220          250          -30          900          3.600
Full to Waning Quarter        305          250           55         3025         12.100
Waning Quarter to New         269          250           19          361          1.444

In the research on the effect of the phase of the moon on student behavior, the test statistic is calculated as shown below.

χ2 = Σ (fo − fe)2/fe
   = (206 − 250)2/250 + (220 − 250)2/250 + (305 − 250)2/250 + (269 − 250)2/250
   = (−44)2/250 + (−30)2/250 + (55)2/250 + (19)2/250
   = 1936/250 + 900/250 + 3025/250 + 361/250
   = 7.744 + 3.600 + 12.100 + 1.444
   = 24.888

The sampling distribution of the χ2 statistic varies based on the degrees of freedom, which is defined as the number of categories minus one. Figure 8.1 displays the shape of the χ2 distribution for various degrees of freedom.

Figure 8.1. χ2 distributions for 1, 3, 5, and 10 degrees of freedom.

Our example has four categories, so we have three degrees of freedom in this design. The table of critical values in Appendix X can tell us the value of χ2 that cuts off a particular area under the curve. For instance, in the χ2 distribution with three degrees of freedom we find that a value of 7.815 cuts off the upper 5% of the area of the distribution. Hence, we can conclude that there is less than a 5% probability that we could have obtained a χ2 value of 24.888 (as we did in this case) if there were no differences between the observed and expected distributions of disciplinary referrals. Given the obtained distribution of referrals across the four phases of the moon, Mr. Lycanthrop would have had less than a 5% chance of being correct if he had concluded that the distribution of disciplinary referrals was not related to the phases of the moon.
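If you would like to check this arithmetic outside of SPSS, a few lines of Python will do it. The sketch below is only an illustration (the variable names are ours, and SciPy is assumed to be installed); it uses the observed counts from Table 8.1 and reproduces the χ2 of 24.888 along with its probability.

    # A minimal sketch of the goodness-of-fit test for the Lycanthrop data (Table 8.1),
    # assuming SciPy is available. An illustrative check, not the book's SPSS procedure.
    from scipy.stats import chisquare

    observed = [206, 220, 305, 269]     # referrals in each moon phase
    expected = [250, 250, 250, 250]     # equal frequencies expected if phase is irrelevant

    result = chisquare(f_obs=observed, f_exp=expected)
    print(result.statistic)   # about 24.888, matching the hand calculation
    print(result.pvalue)      # far below .05, so the null hypothesis is rejected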
Chi-Square for Goodness of Fit

When data are on a nominal scale, all we can really do with them is to count the number of times a particular value occurs in a sample of data. Using appropriate theory, we can predict what the distribution of these values would be under certain circumstances. For example, in the previous example we could apply simple probability theory to predict that, if there were no relationship between the phases of the moon and the occurrence of discipline referrals, the frequencies of discipline problems should be equal during the four phases of the moon. Now, if the theory were appropriate, the observed distribution of the frequencies should be the same as the theoretically derived distribution. We can use this situation as our null hypothesis:

Ho: fo = fe

and test this null hypothesis against the alternative hypothesis that we did not observe the frequency distribution predicted by the theory (H1: fo ≠ fe). We begin by assuming that the null hypothesis is true, calculate the χ2 statistic, and determine what the probability is of getting a χ2 value as high as the one obtained if the null hypothesis were true. Remember that if the null hypothesis is true and we reject it, we will commit a Type I error…not a very smart thing to do. So, the probability that the null is true given a particular value of the obtained χ2 statistic is also the probability of our making a Type I error if we decide to reject the null hypothesis in that particular case. As described in Chapter 6, the researcher must decide on the maximum level of Type I error he or she will tolerate before deciding that the chance of the null being true is too high to risk rejecting the null hypothesis. If we find that the chance that the null is true, given our data, is less than this maximum tolerable risk, we can conclude that the chance that the null hypothesis is true is low enough for us to feel comfortable rejecting the null hypothesis and concluding that the obtained distribution is different from the distribution expected under the theory. That is, we can conclude that the observed distribution of values is not a good fit with the expected distribution. A statistically significant result would tell us that the observed distribution does not fit the theoretically expected one.

The expected frequencies or proportions are usually derived from theories (see the box concerning Mendel's theory). But they may also be derived from the proportions in the population, if those are known (see the box about the soldiers' boots). Therefore, a chi-square goodness of fit test may also be used to test hypotheses regarding the difference between obtained sample frequencies/proportions and the population proportions.

Mendel's Theory
Gregor Mendel (1822-1884), an Austrian Roman Catholic monk, is the originator of modern genetic theory. In his original work (first published in 1866), he predicted that two attributes of peas (color and seed texture) are genetically determined, and each has a dominant and a recessive form. For example, in certain situations, if two yellow pea plants with rough textured peas are cross pollinated and the resulting seeds are planted, Mendelian theory tells us we would expect to obtain offspring pea plants in the ratio 9/16 yellow, rough; 3/16 yellow, smooth; 3/16 green, rough; and 1/16 green, smooth. Suppose Mendel had cross pollinated 90 such pairs of plants and planted a randomly chosen sample of their seeds, finding that 60 of the offspring were yellow with rough seeds, 15 were yellow with smooth seeds, 10 were green with rough seeds, and 5 were green with smooth seeds. Would you conclude that his results supported his theory?
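One way to think the Mendel box through is to turn the theoretical ratios into expected counts and compute χ2 directly. The sketch below is ours, not Mendel's procedure; it uses only the 90 offspring counts given in the box and plain Python.

    # A sketch of how the Mendel box could be checked, using the 90 offspring counts
    # given in the box. Pure Python; no statistics library is needed.
    observed = [60, 15, 10, 5]            # yellow/rough, yellow/smooth, green/rough, green/smooth
    ratios = [9/16, 3/16, 3/16, 1/16]     # proportions predicted by Mendelian theory
    n = sum(observed)                     # 90 offspring in all

    expected = [n * p for p in ratios]    # 50.625, 16.875, 16.875, 5.625
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    print(round(chi_square, 3))   # about 4.815 with 4 - 1 = 3 degrees of freedom;
                                  # below the critical value of 7.815, so the data do not
                                  # contradict the 9:3:3:1 prediction at the .05 level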
Steps in Conducting a χ2 Test for Goodness of Fit

1. Determine the maximum level of Type I error you will tolerate in making decisions about your hypothesis (i.e., the significance level of your statistical test) and use this determination to construct a decision rule that tells you when to reject the null hypothesis. In our example, Mr. Lycanthrop decided that he would only reject the null hypothesis if there were less than a 5% chance of it being true (that is, if the probability of his making a Type I error when he rejected the null hypothesis was less than 5%). So, his decision rule would have been: Reject the null hypothesis if p (the probability that the null hypothesis is true) is less than .05. Fail to reject the null hypothesis if p is greater than or equal to .05.

2. Determine a theoretical distribution of the data expected under specific circumstances (the expected values). In the case of our example, in the circumstance that there was no relationship between the incidence of disciplinary problems and the phase of the moon, we would expect the incidences of disciplinary referrals to be evenly distributed among the four moon phase periods (Table 8.2).

3. Gather data and determine the frequency distribution of the data observed in the field (Table 8.1). In our example we determine how many disciplinary referrals were made during each phase of the moon by checking the records kept by Mr. Lycanthrop.

4. Calculate the value of the χ2 statistic using Formula 8.1.

5. Determine the number of degrees of freedom of the design by subtracting one from the number of categories in the frequency distribution. In our example there are four phases of the moon and, therefore, the design has three (4 – 1) degrees of freedom.

6. Use Table X to determine the critical value of χ2 for the statistical test. That is, find the value that cuts off the upper portion of the area of the χ2 distribution corresponding to the maximum Type I error rate you chose in step #1. In the case of Mr. Lycanthrop's study, the critical value that cuts off the upper 5% of the χ2 distribution with 3 degrees of freedom was found to be 7.815.

7. Compare the value of χ2 calculated from the data with the critical value found in step #6. If the calculated value exceeds the tabled critical value, the chance of obtaining a value this large when there are no differences between the observed and expected distributions is less than the maximum probability of making a Type I error that you chose.

8. Apply the decision rule you devised in step #1 to determine whether or not you should reject the null hypothesis. In the case of the possibly moonstruck students, we concluded that the probability of the null hypothesis being true was less than 5% (p < .05). Therefore, we reject the null hypothesis and conclude that the distribution of disciplinary incidents across the various phases of the moon was different from what we would have expected if there were no relationship between moon phase and the frequency of disciplinary incidents.

Army Boots
Uniform boots issued to members of the United States Army come in nine sizes. The boots of currently serving soldiers are distributed as shown in the table below.

Size                      1     2     3     4     5     6     7     8     9
Proportion of soldiers   .04   .08   .12   .16   .20   .16   .12   .08   .04

If we take a random sample of 1,000 soldiers whose last address before enlisting was in the state of Florida, we find the boots issued to these 1,000 soldiers were distributed by size in the following way.

Size                              1     2     3     4     5     6     7     8     9
Number of soldiers from Florida  75   100   160   180   180   140    80    75    10

Based on these data, would you say that the distribution of boot sizes among soldiers from Florida is the same as the distribution of boot sizes in the population of soldiers in the United States?
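The Army Boots box is a goodness-of-fit problem in which the expected frequencies come from known population proportions rather than a theory. A possible check, again only a sketch and assuming SciPy is available, multiplies the Army-wide proportions by the 1,000 sampled soldiers to get the expected counts.

    # A sketch of the goodness-of-fit test for the Army Boots box, assuming SciPy is
    # available. Expected counts are the population proportions applied to the 1,000
    # soldiers sampled from Florida.
    from scipy.stats import chisquare

    proportions = [.04, .08, .12, .16, .20, .16, .12, .08, .04]   # population of soldiers
    observed    = [75, 100, 160, 180, 180, 140, 80, 75, 10]       # Florida sample
    expected    = [1000 * p for p in proportions]                 # 40, 80, 120, 160, 200, ...

    stat, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(stat, p_value)   # the statistic is far above the .05 critical value for
                           # 9 - 1 = 8 degrees of freedom (15.507), so we would conclude
                           # the Florida distribution differs from the Army-wide one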
Assumptions for the Chi-Square for Goodness of Fit

Chi-square is one of a family of statistics known as nonparametric statistics. As this name implies, using the statistic makes no assumptions about the parameters (characteristics) of the population from which the data were obtained. So, unlike parametric statistical tests such as t-tests and analysis of variance, we do not have to be concerned about whether the data are more or less normally distributed or whether there is homogeneity of variance among samples. Nonparametric tests are reasonably easy-going. However, as in most of life, there is no such thing as a free lunch. Nonparametric tests make us pay for this easy-goingness. Compared to corresponding parametric statistical tests, nonparametric tests have lower power. That is, they have less chance of rejecting a false null hypothesis than the corresponding parametric tests. Put simply, if two distributions differ by a given amount, a nonparametric statistical test will yield a higher probability of the null hypothesis being true than would a parametric statistic. So, in the event that the null hypothesis is actually false, there is a greater chance that you would fail to reject the null (and make a Type II error) using a nonparametric statistic than if you had used a parametric statistic. Quite simply, parametric tests are more sensitive to false null hypotheses than are nonparametric statistics.

SPSS: Doing it by Computer

This is the SPSS output obtained by calculating a χ2 for goodness of fit using the Lycanthrop data. The table labeled "Phase of moon" contains the same information that is displayed in Table 8.3.

Phase of moon
                          Observed N    Expected N    Residual
New to waxing quarter         206          250.0        -44.0
Waxing quarter to full        220          250.0        -30.0
Full to waning quarter        305          250.0         55.0
Waning quarter to new         269          250.0         19.0
Total                        1000

The table labeled "Test Statistics" presents the obtained value of χ2, the number of degrees of freedom (df), and the significance (Asymp. Sig.) of the obtained χ2 statistic with the given number of degrees of freedom (p).

Test Statistics
                   Phase of moon
Chi-Square            24.888a
df                         3
Asymp. Sig.             .000
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 250.0.

Of course this significance (the probability that the null hypothesis is true) is not really zero. It couldn't be unless we had data from the entire population of interest. SPSS rounds this value to three decimal places, so we can interpret a printed value of .000 as p < .0005. In any case, using the decision rule we derived earlier, we can reject the null hypothesis and conclude that the expected frequency distribution is not equal to the distribution of the actual data.

Independence/Association

In the nation of West Atlantis, the national legislature is made up of 200 representatives. Recently, geologists have found that large deposits of oil lie off the southern coast of the country along the continental shelf. This region is a large tourist and recreational area, and a well-organized movement has arisen to convince the government not to issue leases to oil companies allowing them to drill in the area.

West Atlantis has a two-party political system that is based on environmental ideology. The Green Party considers the quality of the natural environment to be the ultimate good, and its members believe that the less technology that people use, the better off the world will be. The Brown Party's platform calls for a greater use of technology and natural resources, and its members believe that what is good for business is good for the entire country. The last time the legislature voted on a similar issue dealing with permitting offshore oil exploration, the vote was as shown in Table 8.5. Statisticians refer to tables such as this one as contingency tables since they allow us to calculate contingent (conditional) probabilities. For instance, if a legislator is a member of the Green Party, there is a 75% (60/80) chance that the legislator voted yes. Likewise, if a legislator is a member of the Brown Party, there is only a 62.5% (75/120) chance that he or she would have voted in favor of the motion.
Table 8.5
Distribution of Votes by Party

                    Party
Vote          Green      Brown
Yes             60         75
No              20         45

Clearly, if there is a relationship between party affiliation and how legislators tend to vote on issues involving offshore drilling leases, both sides in the issue could use this information to target their arguments and financial support to certain legislators. On the other hand, if voting on these issues turns out to be independent of party affiliation, the movement would be better off targeting representatives based on some other variable. In this case, does knowing a legislator's party affiliation change the probability of successfully predicting how that legislator would vote? We can answer this question by expanding Table 8.5 to show the marginals of the rows and columns in the contingency table, as shown in Table 8.6.

Table 8.6
Distribution of Votes by Party With Row and Column Marginals

                    Party
Vote          Green      Brown      Total
Yes             60         75        135
No              20         45         65
Total           80        120        200

As can be seen in Table 8.6, the marginals are simply the sums of the rows and columns. Now, assuming that we don't know the party affiliation of the particular legislator in question, our best guess of his or her vote would be that the legislator voted Yes. Quite simply, this is because, looking at the number of legislators who voted Yes and No (the row marginals), we see that more of them voted Yes (135) than voted No (65). If we were to guess that the legislator voted Yes (the smartest thing to do) we would be wrong 65 times, or 32.5% (65/200) of the time.

Let's assume now that we knew the legislator was a member of the Green Party. In this case we need only look at the Green Party column in Table 8.6. Again, our best guess of how the legislator voted would be Yes, since more Greens voted Yes (60) than voted No (20). In this case we would be wrong only 20 times, or 25% (20/80) of the time, if we so guessed. Finally, if we know that the legislator is a member of the Brown Party, our best guess is still that the legislator voted Yes, since Table 8.6 shows us that more Browns voted Yes (75) than No (45). If we were to use this decision to guess the Brown legislator's vote, we would find we were wrong 45 times out of 120, or 37.5% of the time. Table 8.7 shows us the accuracy of our guesses on votes under the varying levels of knowledge we had about the party affiliation of the legislators in question.

Table 8.7
Accuracy of Vote Guess Under Three Knowledge Conditions

Party Affiliation     Guessed Vote     Chance of Guessing Incorrectly
Unknown                   Yes                    32.5%
Green Party               Yes                    25.0%
Brown Party               Yes                    37.5%

Table 8.7 shows us that knowing that a legislator is a member of the Green Party decreases our chances of guessing incorrectly by 7.5 percentage points. In this case, knowing about party affiliation is useful in predicting how a legislator will vote. Note, however, that knowing that a legislator is a member of the Brown Party actually increases our chances of guessing this legislator's vote incorrectly. In this case, we would be better off without the information. Since knowledge of the value of the party affiliation variable changes our probability of correctly predicting the value of the vote variable, it should be easy to understand that the two variables are not independent of each other. In other words, there is a relationship between these variables.
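The reasoning behind Table 8.7 is just conditional arithmetic on the frequencies in Table 8.6, and it can be mirrored in a few lines of Python. This is only an illustration; the dictionary and variable names are ours.

    # A sketch of the "chance of guessing incorrectly" arithmetic behind Table 8.7,
    # using the cell counts from Table 8.6.
    votes = {"Green": {"Yes": 60, "No": 20},
             "Brown": {"Yes": 75, "No": 45}}

    total_yes = sum(party["Yes"] for party in votes.values())   # 135
    total_no = sum(party["No"] for party in votes.values())     # 65
    n = total_yes + total_no                                     # 200

    # Party unknown: guess the more common vote (Yes) and err whenever the vote was No.
    print("Unknown:", total_no / n)                              # 0.325

    # Party known: guess the more common vote within that party's column.
    for party, counts in votes.items():
        errors = min(counts.values())                            # wrong for the less common vote
        print(party + ":", errors / sum(counts.values()))        # Green 0.25, Brown 0.375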
Now let's look at the values in Table 8.8.

Table 8.8
Distribution of Votes by Party With Row and Column Marginals (Case 2)

                    Party
Vote          Green      Brown      Total
Yes             54         81        135
No              26         39         65
Total           80        120        200

Note that in this table, even though the values in the individual cells are different from those in Table 8.6, the marginals remain the same. Thus, there are still 80 legislators in the Green Party and 120 in the Brown Party, and 135 legislators voted Yes on the bill while 65 voted No. All that has changed is the distribution within the categories. Using the same strategy we used with the previous distribution, we can see that if we do not know what party a particular legislator belonged to, our best guess about his or her vote would be that the legislator voted Yes, since there were 135 voting Yes and only 65 voting No. We would have a 32.5% (65/200) chance of guessing incorrectly. If we know a legislator is a member of the Green Party, again our best guess would be that the legislator voted Yes, since 54 out of 80 of the Green legislators voted in that manner. We would be wrong 32.5% (26/80) of the time, giving us no advantage over guessing when we did not know the legislator's party affiliation. Finally, if we knew that the legislator was a member of the Brown Party, we would also guess that he or she voted in the affirmative, since 81 out of 120 Browns voted affirmatively. We would find we had guessed wrong 32.5% (39/120) of the time. Again we see that knowing party affiliation, in this case, does not help us in making a prediction about how a legislator will vote. Table 8.9 shows this clearly.

Table 8.9
Accuracy of Vote Guess Under Three Knowledge Conditions for Case 2

Party Affiliation     Guessed Vote     Chance of Guessing Incorrectly
Unknown                   Yes                    32.5%
Green Party               Yes                    32.5%
Brown Party               Yes                    32.5%

In this second case we can say that the two variables, party affiliation and the way a legislator voted on offshore oil drilling, are independent of each other. That is, knowing the value of one variable does not help us predict the value of the second variable. In the first case, the variables appeared to be related.

If the variables are related, then it should be possible for us to use the information on the previous vote that the West Atlantis legislature took on offshore oil drilling to predict what the next vote might be. Let's look again at the vote on the first offshore drilling bill in Table 8.10.

Table 8.10
Distribution of Votes by Party With Row and Column Marginals and Percents

                    Party
Vote          Green        Brown        Total
Yes           60           75           135
              75%          62.5%        67.5%
No            20           45            65
              25%          37.5%        32.5%
Total         80          120           200
              40%          60%          100%

In this table the percentages within each cell correspond to the percent of legislators within each column (each party) who voted positively or negatively on the bill. These percentages are often referred to as column percents, since they give us the proportions based on the column marginals for each column. For instance, we note that 60 out of 80 Green Party members voted for the bill, and 60/80 = 0.75 (or 75%). Similarly, 45 out of 120 members of the Brown Party voted against the bill, and 45/120 = 0.375 (or 37.5%). The percents in the marginals are the percents of the total number of legislators (there are 200 of them) who fall in each row or column. So, the column marginal percent for members of the Brown Party is 120/200 = 0.60, telling us that 60% of the members of the West Atlantis legislature belong to the Brown Party.
The row marginal percent for the legislators who voted Yes is 135/200 = 0.675, telling us that 67.5% of the legislators voted in favor of the bill.

Table 8.11
Distribution of Votes by Party With Row and Column Marginals and Percents (Case 2)

                    Party
Vote          Green        Brown        Total
Yes           54           81           135
              67.5%        67.5%        67.5%
No            26           39            65
              32.5%        32.5%        32.5%
Total         80          120           200
              40%          60%          100%

Now look at the same information for Case 2, shown in Table 8.11. Note that the row and column marginals and marginal percents are the same as in Case 1 (Table 8.10). What is different in the two cases is the distribution of frequencies (and column percents) within the cells. Note also that in Case 2 (where the variables are independent) the column percents in a given row are identical to the row marginal percent. This is not true in Case 1, where the variables are related to each other.

So, now let's get down to the question of interest. In the vote on the previous bill (Case 1), were the variables party affiliation and vote independent? If they were, then someone trying to influence the vote knows that he or she should not bother looking at the party affiliation of legislators in order to choose the legislators they should spend resources lobbying. On the other hand, if the variables are related to each other, knowing legislators' party affiliations might help in determining how to use limited resources to influence the vote.

Clearly, Case 1 and Case 2 (the situation where there is independence between the variables) are different. All we need to do is look at the cell and marginal percentages to see that. Of course, there are two reasons why the distributions might be different. The first is that the two variables are actually related to each other (that is, not independent). The second, however, is that they only appear different due to sampling error; that is, because the first vote is not a representative sample of how the 200 members of the legislature vote when it comes to issues of offshore oil drilling. This could have occurred if there was something special about the circumstances of the Case 1 vote. Perhaps there had recently been a huge oil spill at an offshore oil rig that polluted the ocean around it, and the newspapers and television news programs showed picture after picture of dead birds, fish, and other sea life before the vote was taken.

So, if the two variables were independent, we would expect to get cell frequencies that look like the ones in Case 2. We observed the cell frequencies in Case 1. What is the probability that we could get the observed frequencies (Case 1) if the variables really were independent and the difference between the observed and expected frequencies was simply due to sampling error?

Calculating chi-square for independence. This situation should seem familiar. Here we have a set of observed frequencies and a set of expected frequencies. If the frequencies are distributed the same way in the two sets, we can say the variables are independent. If not, there is an association (relationship) between the two variables. We can determine the probability that the two distributions come from the same population using the chi-squared statistic that we saw at the beginning of the chapter. Remember that the formula for χ2, as noted in Formula 8.1, is

        χ2 = Σ [ (fo − fe)2 / fe ]

In this case fo is the frequency in each of the cells in Case 1 (the table of the frequencies observed in the prior vote) and fe is the frequency expected, if the variables are independent, in the corresponding cell of the Case 2 table. Table 8.12 shows these observed and expected frequencies for each cell, with the expected frequency listed beneath the observed frequency in each cell.
Table 8.12
Distribution of Votes by Party With Expected Values

                       Party
Vote               Green      Brown
Yes   Observed       60         75
      Expected       54         81
No    Observed       20         45
      Expected       26         39

We can calculate χ2 in the following manner. The resulting χ2 statistic will have (R − 1)(C − 1) degrees of freedom, where R is the number of rows in the contingency table and C is the number of columns in the table. In our example, df = (2 − 1)(2 − 1) = (1)(1) = 1.

χ2 = (60 − 54)2/54 + (75 − 81)2/81 + (20 − 26)2/26 + (45 − 39)2/39
   = (6)2/54 + (−6)2/81 + (−6)2/26 + (6)2/39
   = 36/54 + 36/81 + 36/26 + 36/39
   = .67 + .44 + 1.38 + .92
   = 3.41

Now our question is, what are the chances of getting a value of χ2 as high as 3.41 or higher with one degree of freedom if how legislators voted is independent of their party affiliations? We can't determine this probability directly, but we can find out whether it is less than a certain value by looking in Table X.

First, let's look at the two possible decisions we can make. We can decide that the variables are independent (in which case we will be saying that the observed frequency distribution among the cells of the table is the same as the distribution of the expected values). We might also decide that the variables are related (i.e., are not independent). In this case, we would be saying that the observed frequency distribution is different from the expected distribution. As in most cases of using hypothesis-testing statistics, we will set our null hypothesis as the condition where no relationship exists. So, the null hypothesis is that the expected and observed cell frequencies are equal (H0: fo = fe) while the alternative is that the expected and observed frequencies are not equal (H1: fo ≠ fe).

Next we must devise a rule to use when deciding whether or not to reject the null hypothesis. In this case, let us decide that we will reject the null hypothesis when the chances of it being true are less than 5%. Using Table X, we find that, with one degree of freedom, a χ2 value of 3.841 cuts off the upper 5% of the distribution. So, any χ2 value above 3.841 has less than a 5% chance of occurring if the two variables were independent; that is, if the null hypothesis were true. Any value below this has more than a 5% chance of occurring. In our example, we obtained a χ2 value of 3.41, and now we know that there was more than a 5% chance of obtaining a value as high as this or higher if the null hypothesis were true. Using our decision rule, then, we will fail to reject the null hypothesis and conclude that we have no reason to believe that a legislator's vote on the offshore drilling bill was related to his or her party affiliation. If we were going to lobby these legislators, party affiliation would not be a good variable to use in order to target our efforts.
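The arithmetic above is easy to mirror in a short script. The sketch below is only an illustration; it uses the observed counts from Table 8.6 and the expected counts from Table 8.8, and the variable names are ours.

    # A sketch of the hand calculation of chi-square for independence, using the observed
    # cell counts (Case 1, Table 8.6) and the expected cell counts (Case 2, Table 8.8).
    observed = [60, 75, 20, 45]
    expected = [54, 81, 26, 39]

    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    print(round(chi_square, 2))   # about 3.42, matching the rounded hand total of 3.41

    # With (2 - 1) * (2 - 1) = 1 degree of freedom, this is below the critical value of
    # 3.841, so we fail to reject the null hypothesis of independence.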
Finding expected values. Unlike goodness of fit models, expected values in tests for independence do not come from the theory behind the variables. Rather, these tests use the cell frequencies that would be expected if the two variables in question were independent of each other. As hinted in Tables 8.10 and 8.11, these values are a function of the marginal frequencies of the rows and columns. We can find the expected value of a specific cell by following this simple procedure.

1. Find the row marginal of the row containing the specific cell of interest.

2. Find the column marginal of the column containing the specific cell of interest.

3. Multiply the row marginal you found in step 1 by the column marginal found in step 2.

4. Take the product you found in step 3 and divide it by the total number of subjects in the contingency table.

So, if ni. is the row marginal, n.j is the column marginal, and n is the total number of subjects,

        Eij = (ni. n.j) / n                    (8.2)

where Eij is the expected value of the cell in row i and column j. Using Equation 8.2 with the data in Table 8.6, we would obtain the expected value for members of the Green Party who voted in the affirmative (E11) in this way:

        E11 = (n1. n.1) / n = (135)(80) / 200 = 10800 / 200 = 54

as can be seen in Table 8.12.

Equation 8.2 can be derived thus. A "percent marginal" is a row or column marginal expressed as a percent of the total number of subjects. Suppose that πi. is the percent marginal for row i and π.j is the percent marginal for column j. If ni. is the row marginal for row i and n.j is the column marginal for column j, we can estimate the row percent marginal for each row by ni./n and the column percent marginal for each column by n.j/n. If the variables are independent, we can apply the multiplication rule of probabilities [p(A∩B) = p(A)p(B)] and see that the probability that a subject will fall in the cell in row i and column j, πij, is estimated by (ni./n)(n.j/n). From this we see that the expected frequency of a cell, if the variables are independent, is

        Eij = n (ni./n)(n.j/n) = (ni. n.j) / n

This, of course, is the row marginal times the column marginal, all divided by the number of subjects in the design.

To sum up, here are the steps in conducting a χ2 test for association/independence.

1. Determine the maximum level of Type I error you will tolerate in making decisions about your hypothesis and use this determination to construct a decision rule that tells you when to reject the null hypothesis. In our example we decided that we would only reject the null hypothesis if there were less than a 5% chance of it being true. So our decision rule was: Reject the null hypothesis if p (the probability that the null hypothesis is true) is less than .05. Fail to reject the null hypothesis if p is greater than or equal to .05.

2. Gather data and determine the frequency distribution of the data observed in the field (for example, Table 8.5).

3. Calculate the expected frequencies for each cell using Equation 8.2.

4. Calculate the value of the χ2 statistic using Formula 8.1.

5. Calculate the number of degrees of freedom in the design using the formula df = (R − 1)(C − 1) (the number of rows in the contingency table minus one, times the number of columns in the table minus one).

6. Use Table X to find the critical value of χ2 for the degrees of freedom found in step #5, and compare the calculated value of χ2 with this critical value. If the calculated value exceeds the tabled critical value, the chance of obtaining a value this large when there are no differences between the observed and expected distributions is less than the maximum probability of making a Type I error that you allowed.

7. Apply the decision rule you devised in step #1 to determine whether or not you should reject the null hypothesis. In the case of the votes of legislators on offshore drilling, we concluded that the probability of the null hypothesis being true was greater than 5% (p > .05). Therefore, we fail to reject the null hypothesis and conclude that the variables party affiliation and type of vote cast cannot be said to be related.
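If SciPy and NumPy are available, Equation 8.2 and the χ2 computation can be bundled into a single call. The sketch below is an illustrative alternative to SPSS: it first applies Equation 8.2 by hand to the marginals of Table 8.6 and then runs the test, with correction=False so the statistic matches the uncorrected hand calculation.

    # A sketch of the full test for independence on the Case 1 vote data (Table 8.6),
    # assuming SciPy/NumPy are available. correction=False turns off the Yates continuity
    # correction so the statistic matches the uncorrected hand calculation (about 3.42).
    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[60, 75],    # Yes votes: Green, Brown
                         [20, 45]])   # No votes:  Green, Brown

    # Equation 8.2 by hand: expected cell = (row marginal * column marginal) / n
    row_marginals = observed.sum(axis=1)
    col_marginals = observed.sum(axis=0)
    expected_by_hand = np.outer(row_marginals, col_marginals) / observed.sum()
    print(expected_by_hand)           # [[54. 81.] [26. 39.]]

    chi2, p, dof, expected = chi2_contingency(observed, correction=False)
    print(chi2, dof, p)               # about 3.42 with 1 df; p is above .05, so we
                                      # fail to reject the hypothesis of independence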
The problem of small expected values in contingency tables. Often in social science research we work with rather small groups of subjects. This often results in cells that have small expected values. Table 8.13 displays a contingency table showing the frequency of responses when a sample of 60 people were asked whether or not they had experienced an act of racism directed toward them within the past year. The responses are grouped according to the self-reported racial/ethnic group membership of the participants. In each cell the observed frequency is given first, followed in parentheses by the expected frequency, that is, the cell value you would expect to see if the two variables were not related to each other.

Table 8.13
Experienced Racism in the Past Year by Race/Ethnicity (Observed and Expected Frequencies)

                White        Black        Hispanic     Asian        Other        Total
Yes             5 (10.4)     12 (6.3)     5 (4.2)      2 (2.9)      1 (1.3)       25
No             20 (14.6)      3 (8.8)     5 (5.8)      5 (4.1)      2 (1.8)       35
Total           25           15           10            7            3            60

If the expected frequency in any of the cells of a contingency table is less than 5, it has been standard practice to attempt to increase these expected values by combining rows and/or columns of the table (see Table 8.14). The rationale for this is a bit beyond the scope of this book. Suffice it to say that the result of this problem of small expected values is that calculated probabilities of Type I errors tend to be underestimates of the actual chances of making a Type I error. In other words, when the value of χ2 obtained from the data tells you the probability of making a Type I error (the significance of the statistical test) is 4%, it might really be 6%. If p = .04 according to your statistic and you were testing at α = .05, you would reject the null hypothesis. However, if p were actually equal to .06, you should have failed to reject the null hypothesis. In other words, you would have rejected the null hypothesis when you shouldn't have…a Type I error. Theoretically, then, small expected values increase our chances of making a Type I error beyond the level of α that we initially set. However, Camilli and Hopkins (1979) have shown that Type I error is not really a problem so long as the sample size is at least equal to eight. Overall (1980) noted that the real problem with small sample sizes is more likely to have to do with Type II error (i.e., with power) than with Type I error. It probably is not a bad idea to try to keep the expected values of contingency table cells above 5, but one should not adhere to this rule slavishly if the situation warrants smaller expected frequencies.

Table 8.14
Table 8.13 With the Last Three Categories Condensed (Observed and Expected Frequencies)

                White        Black        Other        Total
Yes             5 (10.4)     12 (6.3)      8 (8.3)      25
No             20 (14.6)      3 (8.8)     12 (11.7)     35
Total           25           15           20            60

Some authors suggest the use of Yates' Correction for Continuity when using χ2 with a 2 × 2 contingency table. This adjustment to the standard formula for χ2 was suggested by Yates in 1934, who noted that the sampling distribution of the calculated values of χ2 is discrete while the theoretical sampling distribution of the statistic is continuous. This is particularly a problem with small samples, and it led Yates to devise a correction for use with 2 × 2 tables. The procedure merely subtracts .5 from the absolute difference between the observed and expected frequencies in the numerator before it is squared, giving Formula 8.3:

        χ2 = Σ [ (|fo − fe| − .5)2 / fe ]                    (8.3)

This procedure works quite nicely so long as the marginals of the contingency table are fixed. To speak of fixed marginals is to say that, if you had repeated the study with a different sample from the same population, although the individual cell frequencies might change from the original sample, the marginals would be equal to those in the first sample. This is a very unusual situation. For this reason, even though the correction factor is available in the output of most computer statistical packages, it is probably not a good idea to use Yates' Correction for Continuity.
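To see what the correction does numerically, the sketch below (ours, assuming SciPy is available) applies Formula 8.3 to the Case 1 vote table from Table 8.6. For comparison, scipy.stats.chi2_contingency applies this same correction by default (correction=True) whenever a table has one degree of freedom.

    # A sketch of Yates' correction (Formula 8.3) applied to the Case 1 vote table.
    from scipy.stats import chi2_contingency

    observed = [60, 75, 20, 45]
    expected = [54, 81, 26, 39]

    corrected = sum((abs(o - e) - .5) ** 2 / e for o, e in zip(observed, expected))
    print(round(corrected, 2))        # about 2.87, smaller than the uncorrected 3.42

    chi2, p, dof, _ = chi2_contingency([[60, 75], [20, 45]], correction=True)
    print(round(chi2, 2))             # also about 2.87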
Measures of the strength of association. Remember that the null hypothesis tested by the χ2 test for independence is that the cell frequencies observed in the sample are the same as those we would have expected to see if the two variables were independent of each other. In other words, the null hypothesis is that the variables are independent. If we reject this null hypothesis we are simply saying that there is a relationship between the two variables; that the correlation between the two is not zero. However, this does not tell us anything about the strength of the relationship. As we discussed in Chapter 1, after a finding that allows us to reject a null hypothesis of no relationship (for example, the null hypothesis that there is no relationship between subjects' race and their experiences with racism), we still need to determine the strength of that relationship, that is, whether the association is strong or weak.

In order to determine the strength of the relationship between the two variables in a 2 × 2 contingency table, we can use the phi (Φ) statistic. Phi is related to χ2 as shown in Formula 8.4, where N is the total sample size:

        Φ = √(χ2 / N)                    (8.4)

For tables larger than 2 × 2, the formula can be extended to Cramér's V as shown in Formula 8.5, where k is either the number of rows or the number of columns in the contingency table, whichever is less:

        V = √(χ2 / [N(k − 1)])                    (8.5)

Both Φ and V are correlation coefficients and may have values between 0 and 1.00, where zero indicates no relationship at all and 1.00 a perfect relationship between the two variables. From Chapter 6, you remember that correlation values close to 1.00 indicate a strong relationship, while values close to 0 indicate relative independence or lack of relationship. In other words, one may conclude that if the value of a correlation coefficient is 1.00, we can perfectly predict one variable from the second all of the time. A value of zero, however, can be interpreted as telling us that any attempt to predict the value of one variable from the value of the other would be no more accurate than taking a wild guess.

Be cautious when calculating measures of association with contingency tables!
Remember, as with any other correlation coefficient, Φ or V should only be calculated when there is reason to believe that the correlation between the two variables is non-zero. If the χ2 test for independence indicates that it is not appropriate to reject the null hypothesis (the hypothesis that tells us that the variables are independent, i.e., not related), we have no reason to believe that Φ or V is not zero. In this case we would be rather foolish to attempt to actually calculate the statistic.
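Formulas 8.4 and 8.5 are simple enough to code directly. The sketch below is ours; the helper names and the small example table are hypothetical and are included only to show the calls (SciPy and NumPy are assumed).

    # A sketch of Formulas 8.4 and 8.5, assuming SciPy/NumPy are available. The helper
    # names and the example table are ours (hypothetical), not data from this chapter.
    import numpy as np
    from scipy.stats import chi2_contingency

    def phi(chi_square, n):
        """Formula 8.4: strength of association for a 2 x 2 table."""
        return (chi_square / n) ** 0.5

    def cramers_v(chi_square, n, n_rows, n_cols):
        """Formula 8.5: k is the smaller of the number of rows and columns."""
        k = min(n_rows, n_cols)
        return (chi_square / (n * (k - 1))) ** 0.5

    # Hypothetical 2 x 2 table, used only to demonstrate the calls.
    table = np.array([[30, 10],
                      [15, 25]])
    chi2, p, dof, _ = chi2_contingency(table, correction=False)
    if p < .05:                                   # only report strength after a significant test
        print(phi(chi2, table.sum()))
        print(cramers_v(chi2, table.sum(), *table.shape))   # equals phi for a 2 x 2 table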
SPSS: Doing it by Computer

We have a randomly selected sample of 67 graduate students from a College of Education at a large public university in the southeastern United States. These students have responded to a questionnaire that included information on the colors of their hair and eyes. The information on hair and eye color has been coded as either "light" or "dark." Geneticists tell us that the genes that control hair and eye color (there are a lot of them) are close to each other on a particular chromosome. Based on this information (and what we observe every day in the people around us) we have reason to believe that there is a relationship between the hair and eye color of human beings; specifically, that people with light hair tend to have light eyes (and vice versa) and that people who have dark hair tend to have dark eyes. If we know whether a person has light or dark hair, we should be able to predict, at least to some degree, whether they have light or dark eyes. We use SPSS to produce a contingency table for these two variables and test the null hypothesis that the variables are independent. If we can reject this null hypothesis, we can use the Φ statistic to determine the strength of the relationship between these two variables. The results of this analysis are shown below.

Hair color * Eye color Crosstabulation
Count
                          Eye color
                     Light       Dark       Total
Hair color  Light       9           5          14
            Dark       13          40          53
Total                  22          45          67

The contingency table above shows the frequencies for each of the cells and the row and column marginals (labeled "Total"). In the second table we note that the calculated value of χ2 with one degree of freedom is 7.937. The column labeled "Asymp. Sig." tells us that the chance that the null hypothesis (the variables are independent) is true is only .5% (.005).

Chi-Square Tests
                                   Value      df    Asymp. Sig.    Exact Sig.    Exact Sig.
                                                      (2-sided)     (2-sided)     (1-sided)
Pearson Chi-Square                 7.937b      1        .005
Continuity Correction a            6.237       1        .013
Likelihood Ratio                   7.522       1        .006
Fisher's Exact Test                                                    .009          .007
Linear-by-Linear Association       7.819       1        .005
N of Valid Cases                      67
a. Computed only for a 2x2 table
b. 1 cells (25.0%) have expected count less than 5. The minimum expected count is 4.60.

Whether we chose to test this null hypothesis at the .05 or the .01 level, we would reject it, since there is less than a 1% chance that the null is true, and we would conclude that hair and eye color are related to each other in the population from which this sample was drawn.

Symmetric Measures
                                  Value     Approx. Sig.
Nominal by Nominal   Phi           .344         .005
                     Cramer's V    .344         .005
N of Valid Cases                     67
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Since we can now assume there is a relationship between these two variables, we turn to the third table, which shows us that the Cramér's V statistic has a value of .344, indicating a moderately weak relationship between hair and eye color in the population from which this sample was drawn.
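For readers without SPSS at hand, the key numbers in this output can be reproduced with the short sketch below (SciPy assumed; the cell counts are those in the crosstabulation above).

    # A sketch reproducing the key values of the SPSS output above, assuming SciPy is
    # available. The cell counts come from the Hair color * Eye color crosstabulation.
    from scipy.stats import chi2_contingency

    table = [[9, 5],     # light hair: light eyes, dark eyes
             [13, 40]]   # dark hair:  light eyes, dark eyes
    n = 67

    chi2, p, dof, expected = chi2_contingency(table, correction=False)
    print(chi2, p)                    # about 7.937 and .005 (the Pearson Chi-Square row)

    chi2_yates, p_yates, _, _ = chi2_contingency(table, correction=True)
    print(chi2_yates)                 # about 6.237 (the Continuity Correction row)

    print((chi2 / n) ** 0.5)          # about .344, the Phi / Cramer's V value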
Review Questions and Exercises

1. An investigator is trying to determine if there is a relationship between the marital satisfaction of husbands and wives. He identifies a group of married couples and asks each individual independently how happy he or she is in the current marriage. The data (number of respondents in each category) are reported in the following table.

                                     Wife's level of satisfaction
                                     Low        Medium       High
Husband's level      Low              42          313          16
of satisfaction      Medium           26           98          50
                     High             14           22          92

a) Is there a statistically significant relationship between husbands' and wives' satisfaction with the marriage? Please state the null and alternate hypotheses, and test at α = .05.

b) If there is a statistically significant relationship, what is its strength? Please calculate an indicator of effect size.

c) In this sample, what is the probability of having a married couple with the wife being unhappy (low satisfaction) and the husband being very happy (high satisfaction) with their marriage?

2. A sociologist is interested in the number of children in the 1,492 families that are being investigated in her study. She hypothesizes the following distribution of family size:

No children – 26%
1 child – 16%
2 children – 25%
3 children – 15%
4 children – 9%
5 children – 4%
6 children – 2%
7 children – 2%
8 or more children – 1%

Test the hypothesis that this is the distribution in the population from which the sample was drawn at the .05 level of significance. Write a paragraph describing your conclusions.

3. As a part of an action research project, a school principal administered a standardized test of critical thinking skills to her third grade classes at the beginning and at the end of a school year. She finds that the mean of all of her third graders' (n = 123) test scores at the end of the year was the same as their mean at the beginning of the year. But, unexpectedly, she also finds that the variance of the scores was not the same. At the beginning of the year the variance was 9 (s2 = 9), while at the end of the year the variance was 20.

a. Can she conclude that the variance of the scores actually increased? State the null and alternate hypotheses for answering this question, and test with α = .05.

b. Write a short report to summarize these results, and make a clear statement regarding changes in critical thinking during the third grade.