Bliss
1
CHAPTER EIGHT
TESTING HYPOTHESES REGARDING FREQUENCY DATA AND
CONTINGENCY TABLES: THE CHI-SQUARE TESTS
Chapter Objectives
In this Chapter you will:
• Learn how to determine frequencies and proportions using data on nominal scales.
• Understand the concept of expected values of frequencies and proportions.
• Learn how to use the chi-squared (χ2) distribution to test hypotheses about differences between observed and expected frequencies and proportions of the various values of a nominal variable in a population.
• Learn how to use the chi-squared statistic to test hypotheses about the independence or relationship between variables measured on nominal scales.
• Learn how to determine the strength of relationships between variables measured on nominal scales.
• Learn to use SPSS to test hypotheses about frequencies, proportions, and relationships of variables measured on nominal scales.
A simple thing that we can do with data is to count them. In fact, there is some evidence
that writing began developing in the Fertile Crescent as a way of keeping track of the number of
domestic animals that were owned by a person or involved in a business transaction. Counting is
old! When you come down to it, it is really the only thing we can do with data that is on nominal
scales. Frequencies are the results of counting. We can look at relationships between variables
on nominal scales and test hypotheses concerning distributions of nominal variables in
populations. These latter hypotheses are referred to as hypotheses about “goodness of fit” since they test how well the observed pattern of frequencies fits an expected pattern. Let’s start our investigation of
nominal variables by looking at these hypotheses.
Hypotheses About Goodness of Fit
Imagine that Mr. Lycanthrop, the principal of Transylvania High School, suspected that the frequency of students being referred to his office for discipline varied with the phase of the moon. For a year he kept track of the number of referrals for disciplinary action he received, noting carefully the phase the moon was in at each referral. At the end of the academic year Mr.
Lycanthrop observed that the one thousand referrals teachers had made were distributed over the
phases of the moon as shown in Table 8.1.
Table 8.1
Observed Referrals for Discipline During the Various Phases of the Moon

Phase of the Moon         Number of Referrals
New to waxing quarter     206
Waxing quarter to full    220
Full to waning quarter    305
Waning quarter to new     269
Total                     1000
Comparing Observed and Expected Values
What if Mr. Lycanthrop is wrong and the phase of the moon had nothing to do with the
frequency of referrals for discipline? In this case we would expect to find the referrals
distributed evenly across the different moon phases. The distribution of the 1,000 referrals
would look like Table 8.2.
Table 8.2
Expected Referrals for Discipline During the Various Phases of the Moon
if the Phase of the Moon and the Number of Referrals were not related to
each other

Phase of the Moon         Number of Referrals
New to waxing quarter     250
Waxing quarter to full    250
Full to waning quarter    250
Waning quarter to new     250
Total                     1000
These expected values are certainly different from the values actually observed over the
course of the academic year. In fact, we can see how much this distribution we observed in the
study differed from the distribution we would expect if Mr. Lycanthrop were wrong and there
was no relationship between the phases of the moon and discipline referrals by calculating the
residuals (i.e., the differences between the two distributions) as shown in Table 8.3.
Table 8.3
Observed and Expected Distributions with Residuals

                          Observed         Expected         Residual
Moon Phase                Frequency (fo)   Frequency (fe)   (fo – fe)
New to Waxing Quarter     206              250              -44
Waxing Quarter to Full    220              250              -30
Full to Waning Quarter    305              250              55
Waning Quarter to New     269              250              19

Note that the sum of the residuals always equals zero.
If Lycanthrop is wrong he should have observed the same frequencies that were expected if the phase of the moon had no effect on student discipline. Therefore, the residuals would all be zero. We see that this is not the case here. We can think of two reasons for this. The first is that the phases of the moon affect the behavior of students at Transylvania High (Lycanthrop is correct). The second is that the differences between the observed frequencies and the expected frequencies are simply due to chance (Lycanthrop is wrong). What is the probability that the residuals can be this great (that is, that the differences between the observed and expected values can be as great as we see in Table 8.3) if Mr. Lycanthrop is wrong about the phases of the moon affecting student behavior and the differences between the observed and expected frequencies are simply due to chance?

Who Invented the Chi-Squared Test?
Karl Pearson introduced the chi-squared test and the name for it in an article in 1900 in The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. Pearson had been in the habit of writing the exponent in the multivariate normal density as “-1/2 chi-squared.”
There are two problems that we have to deal with when we evaluate the magnitude of
these deviations from expectancy. One is that regardless of how far we are from expectations, a
simple sum of the deviations is not useful because it is always zero. For example, if in the table
above, you change the observed frequencies to 235, 245, 255, and 260, and calculate the
deviation of these values from expected frequency for each category (250), and add them up, the
sum would be zero. But you can see that these four frequencies are much closer to the expected
frequencies than the ones in Mr. Lycanthrop’s study in Table 8.3. One way to solve this problem
is to square each deviation from expectancy, that is, examine the squared deviations (fo – fe)2 rather than the simple deviations (fo – fe). Squaring the residuals would prevent them from canceling each other out.
A second problem that we have to solve is that deviation from expectancy is only meaningful if it is rather large as compared to the number we expected. For example, in Table 8.3, the expected frequency for the first category (New to Waxing Quarter) is 250, and the deviation from this expected frequency is -44. Would 44 points have the same meaning/implication if the expected frequency was 2500? Obviously, if we expect 250 and we are 44 points off, we have a bigger deviation from expectation than if we expect 2500 and we are 44 points off. A solution to this problem is to calculate, for each category, the ratio of the squared deviation to the expected frequency for that category, (fo – fe)2/fe. The overall deviation from expectation in a study such as Mr. Lycanthrop’s would be the sum of these ratios. We can do this by calculating a test statistic known as Chi-Square (χ2) (it starts with a “k” sound and rhymes with “sky”) using Formula 8.1.
χ2 = Σ (fo – fe)2/fe        (8.1)

where fo is the observed frequency in each category, fe is the expected frequency for that category, and the sum (Σ) is taken over all categories.
Table 8.4 demonstrates the procedure for calculating the elements of the formula.
Table 8.4
Calculating the Elements of the χ2 Formula

                          Observed         Expected         Residual    Squared Residual
Moon Phase                Frequency (fo)   Frequency (fe)   (fo – fe)   (fo – fe)2         (fo – fe)2/fe
New to Waxing Quarter     206              250              -44         1936               7.744
Waxing Quarter to Full    220              250              -30         900                3.600
Full to Waning Quarter    305              250              55          3025               12.100
Waning Quarter to New     269              250              19          361                1.444
In the research on the effect of the phase of the moon on student behavior, the test
statistic is calculated as shown below.
χ2 = Σ (fo – fe)2/fe
   = (206 – 250)2/250 + (220 – 250)2/250 + (305 – 250)2/250 + (269 – 250)2/250
   = (–44)2/250 + (–30)2/250 + (55)2/250 + (19)2/250
   = 1936/250 + 900/250 + 3025/250 + 361/250
   = 7.744 + 3.600 + 12.100 + 1.444 = 24.888
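The arithmetic above can be checked with a few lines of Python (a sketch for illustration only; the frequencies are those of Table 8.4):

```python
# Chi-square goodness-of-fit statistic for the moon-phase data, computed by hand.
observed = [206, 220, 305, 269]   # referrals observed in each moon phase
expected = [250, 250, 250, 250]   # referrals expected if phase does not matter

# Formula 8.1: sum of squared residuals divided by the expected frequencies
chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
print(round(chi_square, 3))  # 24.888
```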
The sampling distribution of the χ2 statistic varies based on the degrees of freedom, which is
defined as the number of categories minus one. Figure 8.1 displays the shape of the χ2
distribution for various degrees of freedom.
Figure 8.1
χ2 distributions for 1, 3, 5,
and 10 degrees of freedom
Our example has four categories, so we can see that we have three degrees of freedom in this
design. The table of critical values in Appendix X can tell us the value of χ2 that cuts off a
particular area under the curve. For instance, in the χ2 distribution with three degrees of freedom
we find that a value of 7.815 cuts off the upper 5% of the area of the distribution. Hence, we can
conclude that there is less than a 5% probability that we could have obtained a value of χ2 of
24.888 (as we did in this case) if there were no differences between the observed and expected
distribution of disciplinary referrals. Given the obtained distribution of referrals across the four
phases of the moon, Mr. Lycanthrop would have had less than a 5% chance of being correct if he
had concluded that the distribution of disciplinary referrals was not related to the phases of the
moon.
Chi-Square for Goodness of Fit
When data are on a nominal scale all we can really do with them is to count the number
of times a particular value occurs in a sample of data. Using appropriate theory, we can predict
what the distribution of these values would be under certain circumstances. For example, in the
previous example we could apply simple probability theory to predict that, if there were no
relationship between the phases of the moon and the occurrence of discipline referrals, we could
expect that the frequencies of discipline problems should be equal during the four phases of the
moon. Now, if the theory were appropriate, the observed distribution of the frequencies should
be the same as the theoretically derived distribution. We can use this situation as our null
hypothesis: Ho: fo = fe and test this null hypothesis against the alternative hypothesis that we did
not observe the frequency distribution predicted by the theory (H1: fo ≠ fe). We begin by
assuming that the null hypothesis is true, calculate the χ2 statistic, and determine what the
probability is of getting a χ2 value as high as the one obtained if the null hypothesis were true.
Remember that if the null hypothesis is true and we reject it, we will commit a Type I error…not
a very smart thing to do. So, the probability that the null is true given a particular value of the
obtained χ2 statistic is also the probability of our making a Type I error if we decide to reject the
null hypothesis in that particular case. As described in Chapter 6, the researcher must decide on
the maximum level of Type I error he or she will tolerate before deciding that the chance of the
null being true is too high to risk rejecting the null hypothesis. If we found that the chance that
the null is true, given our data, is less than this maximum tolerable risk, we can conclude that the
chance that the null hypothesis is true is low enough for us to feel comfortable rejecting the null
hypothesis and concluding that the obtained distribution is different from the distribution
expected under the theory. That is, we can conclude that the observed distribution of values is
not a good fit with the expected value. A statistically significant result would tell us that the
observed distribution does not fit the theoretically expected one.
The expected frequencies or proportions are usually derived from theories (see the box concerning Mendel’s theory). But they may also be derived from the proportions in the population, if these are known (see the box about the soldiers’ boots). Therefore, a chi-square
goodness of fit may also be used to test hypotheses regarding the difference between obtained
sample frequencies/proportions and the population proportions.
Mendel’s Theory
Gregor Mendel (1822-1884), an Austrian Roman Catholic monk, is the originator of modern genetic theory.
In his original work (first published in 1866), he predicted that two attributes
of peas (color and seed texture) are genetically determined, and each has a
dominant and a recessive form. For example, in certain situations if two
yellow pea plants with rough textured peas are cross pollinated and the
resulting seeds are planted, Mendelian theory tells us we would expect to
obtain offspring pea plants in the ratio 9/16 yellow, rough; 3/16 yellow,
smooth; 3/16 green, rough; and 1/16 green, smooth.
Suppose Mendel had taken the seeds resulting from 90 such cross-pollinated pairs of plants, planted a randomly chosen sample of their seeds, and found that 60 of the offspring were yellow with rough seeds, 15 were yellow
with smooth seeds, 10 were green with rough seeds, and 5 were green with
smooth seeds. Would you conclude that his results supported his theory?
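One way to answer the question in the box is to compute the statistic directly. This Python sketch (an illustration, not part of the chapter) uses the 9:3:3:1 ratios as the expected proportions:

```python
# Goodness-of-fit check for the Mendel example: expected counts come from
# the theoretical 9:3:3:1 ratio applied to the 90 offspring.
observed = [60, 15, 10, 5]                 # yellow/rough, yellow/smooth, green/rough, green/smooth
ratios = [9 / 16, 3 / 16, 3 / 16, 1 / 16]  # Mendelian proportions
n = sum(observed)                          # 90 offspring in all

expected = [n * p for p in ratios]         # 50.625, 16.875, 16.875, 5.625
chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
print(round(chi_square, 3))  # 4.815
```

Since 4.815 falls below the 5% critical value of 7.815 for 3 degrees of freedom, these counts would not lead us to reject the null hypothesis; the data would be taken to fit the theory.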
Steps in Conducting a χ2 Test for Goodness of Fit
1. Determine the maximum level of Type I error you will tolerate in making decisions about
your hypothesis (i.e., the significance of your statistical test) and use this determination to
construct a decision rule that tells you when to reject the null hypothesis. In our example,
Mr. Lycanthrop decided that he would only reject the null hypothesis if there were less than
a 5% chance of it being true (that is if the probability of his making a Type I error when he
rejected the null hypothesis was less than 5%). So, his decision rule would have been: Reject
the null hypothesis if p (the probability that the null hypothesis is true) is less than .05. Fail to reject the null hypothesis if p is greater than or equal to .05.
2. Determine a theoretical distribution of the data expected under specific circumstances (the
expected values). In the case of our example, in the circumstance that there was no
relationship between the incidence of disciplinary problems and the phase of the moon, we
would expect the incidences of disciplinary referrals to be evenly distributed among the four
moon phase periods (Table 8.2).
3. Gather data and determine the frequency distribution of the data observed in the field (Table
8.1). In our example we determine how many disciplinary referrals were made during each
phase of the moon by checking the records kept by Mr. Lycanthrop.
4. Calculate the value of the χ2 statistic using Equation 8.1.
5. Determine the number of degrees of freedom of the design by subtracting one from the
number of categories in the frequency distribution. In our example there are four phases of
the moon and, therefore, the design has three (4 – 1) degrees of freedom.
6. Use Table X to determine the critical value of χ2 for the statistical test. That is, find that
value that cuts off the upper percent of the area of the χ2 distribution that you determined in
step #1. In the case of Mr. Lycanthrop’s study, the critical value that cuts off the upper 5%
of the distribution for χ2 with 3 degrees of freedom was found to be 7.815.
7. Compare the calculated value of χ2 using the data with this critical value found in step #6. If
the calculated value exceeds the tabled critical value this tells us that the chances of obtaining
a value this large if there were no differences between the observed and expected
distributions was less than the maximum probability of making a Type I error that you chose.
8. Apply the decision rule you devised in step #1 to determine whether or not you should reject
the null hypothesis. In the case of the possibly moonstruck students, we concluded that the
probability of the null hypothesis being true was less than 5% (p<.05). Therefore, we will
reject the null hypothesis and conclude that the distribution of disciplinary incidences across
the various phases of the moon was different from what we would have expected if there
were no relationship between moon phase and the frequency of disciplinary incidents.
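As an illustration only, the eight steps above can be sketched in a few lines of Python for the Lycanthrop example; the 5% critical value of 7.815 for 3 degrees of freedom is the one quoted in step 6:

```python
# Steps 1-8 of the goodness-of-fit test, sketched for the moon-phase study.
CRITICAL_VALUE = 7.815                    # steps 1 & 6: 5% cutoff for chi-square with 3 df

observed = [206, 220, 305, 269]           # step 3: referrals recorded by Mr. Lycanthrop
expected = [sum(observed) / len(observed)] * len(observed)   # step 2: equal frequencies

chi_square = sum((fo - fe) ** 2 / fe      # step 4: Equation 8.1
                 for fo, fe in zip(observed, expected))
df = len(observed) - 1                    # step 5: categories minus one

reject_null = chi_square > CRITICAL_VALUE # steps 7 & 8: compare and apply the decision rule
print(round(chi_square, 3), df, reject_null)  # 24.888 3 True
```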
Army Boots
Uniform boots issued to members of the United States Army come in nine
sizes. The boots of currently serving soldiers are distributed as shown in the
table below.
Size                     1    2    3    4    5    6    7    8    9
Proportion of soldiers   .04  .08  .12  .16  .20  .16  .12  .08  .04
If we take a random sample of 1000 soldiers whose last address before
enlisting was in the state of Florida, we find the boots issued to these 1000
soldiers were distributed by size in the following way.
Size                              1    2    3    4    5    6    7    8    9
Number of soldiers from Florida   75   100  160  180  180  140  80   75   10
Based on this data, would you say that the distribution of boot
sizes among soldiers from Florida is the same as the
distribution of boots in the population of soldiers in the United
States?
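The boots question can be answered with the same formula, now taking the expected frequencies from the known population proportions. A Python sketch (an illustration, not part of the chapter):

```python
# Goodness of fit against known population proportions (Army boots box):
# the expected frequency for each size is n times the population proportion.
population_props = [.04, .08, .12, .16, .20, .16, .12, .08, .04]
observed = [75, 100, 160, 180, 180, 140, 80, 75, 10]   # Florida soldiers by boot size

n = sum(observed)                                      # 1000 soldiers sampled
expected = [n * p for p in population_props]           # 40, 80, 120, 160, 200, ...
chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
print(round(chi_square, 3))  # 92.104
```

With 9 – 1 = 8 degrees of freedom the 5% critical value is 15.507, so a value this large would lead us to reject the null hypothesis that the Florida distribution matches the Army-wide one.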
Assumptions for the Chi-Square for Goodness of Fit
Chi-square is one of a family of statistics known as nonparametric statistics. As this name implies, using the statistic makes no assumptions about the parameters (characteristics) of the population from which the data were obtained. So, unlike parametric statistical tests such as t-tests and analysis of variance, we do not have to be concerned about whether the data are more or less normally distributed or that there is homogeneity of variance among samples. Nonparametric tests are reasonably easy-going. However, as in most of life, there is no such thing as a free lunch, and nonparametric tests make us pay for this easy-goingness. Compared to the corresponding parametric statistical tests, nonparametric tests have lower power. That is, they have less chance of rejecting a false null hypothesis than the corresponding parametric tests. Put simply, if two distributions differ by a given amount, a nonparametric statistical test will yield a higher probability of the null hypothesis being true than would a parametric statistic. So, in the event that the null hypothesis is actually false, there is a greater chance that you would fail to reject the null (and make a Type II error) using a nonparametric statistic than if you had used a parametric statistic. Quite simply, parametric tests are more sensitive to false null hypotheses than are nonparametric statistics.
SPSS
Doing it by Computer

This is the SPSS output obtained by calculating a χ2 for goodness of fit using the Lycanthrop data. The table labeled “Phase of moon” contains the same information that is displayed in Table 8.1.

Phase of moon
                          Observed N   Expected N   Residual
New to waxing quarter     206          250.0        -44.0
Waxing quarter to full    220          250.0        -30.0
Full to waning quarter    305          250.0        55.0
Waning quarter to new     269          250.0        19.0
Total                     1000

The table labeled “Test Statistics” presents the obtained value of χ2, the number of degrees of freedom (df), and the significance (Asymp. Sig.) of the obtained χ2 statistic with the given number of degrees of freedom (p).

Test Statistics
               Phase of moon
Chi-Square(a)  24.888
df             3
Asymp. Sig.    .000
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 250.0.

Of course this significance (the probability that the null hypothesis is true) is not really zero. It couldn’t be unless we had data from the entire population of interest. SPSS rounds this value to three decimal places, so we can interpret a printed value of .000 as p<.0005. In any case, using the decision rule we derived earlier, we can reject the null hypothesis and conclude that the expected frequency distribution is not equal to the distribution of the actual data.
Independence/Association
In the nation of West Atlantis, the national legislature is made up of 200 representatives. Recently, geologists have found that large deposits of oil lie off the southern coast of the country along the continental shelf. This region is a large tourist and recreational area, and a well-organized movement has arisen to convince the government not to issue leases to oil companies allowing them to drill in the area.
West Atlantis has a two party political system that is based on environmental ideology.
The Green Party considers the quality of the natural environment to be the ultimate good and its
members believe that the less technology that people use, the better off the world will be. The
Brown Party’s platform calls for a greater use of technology and natural resources and believes
that what is good for business is good for the entire country. The last time the legislature voted
on a similar issue dealing with permitting offshore oil exploration, the vote was as shown in
Table 8.3. Statisticians refer to tables such as this one as contingency tables since they allow us
to calculate contingency probabilities. For instance, if a legislator is a member of the Green
Party, there is a 75% (60/80) chance that the legislator voted yes. Likewise, if a legislator is a
member of the Brown Party, there is only a 62.5% (75/120) chance that he or she would have
voted in favor of the motion.
Table 8.3
Distribution of Votes by Party

          Party
Vote      Green   Brown
Yes       60      75
No        20      45
Clearly, if there is a relationship between party affiliation and how legislators tend to vote on issues involving offshore
drilling leases, both sides in the issue could use this information
to target their arguments and financial support to certain
legislators. On the other hand, if voting on these issues turns out to be independent of party
affiliation, the movement would be better off targeting representatives based on some other
variable.
In this case, does knowing a legislator’s party affiliation change the probability of
successfully predicting how a legislator would vote? We can answer this question by expanding
Table 8.3 to show the marginals of the rows and columns in the contingency table as shown in
Table 8.4.
Table 8.4
Distribution of Votes by Party
With Row and Column Marginals

          Party
Vote      Green   Brown   Total
Yes       60      75      135
No        20      45      65
Total     80      120     200

As can be seen in Table 8.4, the marginals are simply the sums of the rows and columns. Now, assuming that we don’t know the party affiliation of the particular legislator in question, our best guess of his or her vote would be that the legislator voted Yes. Quite simply, this is because looking at the number of legislators who voted Yes and No (the row marginals) we see that more of them voted Yes (135) than voted No (65). If we were to guess that the legislator voted Yes (the smartest thing to do) we would be wrong 65 times or 32.5% (65/200) of the time.
of the time. Let’s assume now that we knew the legislator was a member of the Green Party. In
this case we need only look at the Green Party Column in Table 8.4. Again, our best guess of
how the legislator voted would be Yes since more Greens voted Yes (60) than voted No (20). In
this case we would be wrong only 20 times or 25% (20/80) of the time if we so guessed. Finally,
if we know that the legislator is a member of the Brown party, our best guess is still that the
legislator voted Yes since Table 8.4 shows us that more Browns voted Yes (75) than No (45). If
we were to use this decision to guess on the Brown legislator’s vote we would find we were
wrong 45 times out of 120 or 37.5% of the time. Table 8.5 shows us the accuracy of our guesses
on votes under the varying levels of knowledge we had about the party affiliation of the
legislators in question.
Table 8.5
Accuracy of Vote Guess Under Three Knowledge Conditions

Party Affiliation   Guessed Vote   Chance of Guessing Incorrectly
Unknown             Yes            32.5%
Green Party         Yes            25.0%
Brown Party         Yes            37.5%
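The error rates just described can be reproduced directly from the contingency table. This Python sketch (an illustration, not part of the chapter) uses the counts from Table 8.4:

```python
# Chance of guessing a vote incorrectly, with and without knowing the party.
table = {"Green": {"Yes": 60, "No": 20},
         "Brown": {"Yes": 75, "No": 45}}

total_no = sum(col["No"] for col in table.values())       # 65 No votes in all
total = sum(sum(col.values()) for col in table.values())  # 200 legislators

# Party unknown: guessing "Yes" is wrong whenever the vote was actually "No".
print(total_no / total)                                   # 0.325
# Party known: guess "Yes" within that party's column.
for party, col in table.items():
    print(party, col["No"] / sum(col.values()))           # Green 0.25, Brown 0.375
```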
Table 8.5 shows us that knowing that a legislator is a member of the Green party
decreases our chances of guessing incorrectly by 7.5%. In this case, knowing about party
affiliation is useful in predicting how a legislator will vote. Note, however, that knowing that a
legislator is a member of the Brown Party actually increases our chances of guessing this
legislator’s vote incorrectly. In this case, we would be better off without the information. Since
knowledge of the value of the party affiliation changes our probability of correctly predicting the
value of the vote variable it should be easy to understand that the two variables are not
independent of each other. In other words, there is a relationship between these variables. Now
let’s look at the values in Table 8.6.
Table 8.6
Distribution of Votes by Party
With Row and Column Marginals (Case 2)

          Party
Vote      Green   Brown   Total
Yes       54      81      135
No        26      39      65
Total     80      120     200

Note that in this table even though the values in the individual cells are different from those in Table 8.4, the marginals remain the same. Thus, there are still 80 legislators in the Green Party and 120 in the Brown, and 135 legislators voted Yes on the bill while 65 voted No.
All that has changed is the distribution within the categories. Using the same strategy we used with the previous distribution, we can see that if we do not know what party a particular legislator belonged to, our best guess about his or her vote would be that the legislator voted Yes since there were 135 voting Yes and only 65 voting No. We would have a 32.5% (65/200) chance of guessing incorrectly.
If we know a legislator is a member of the Green Party, again our best guess would be that the legislator voted Yes since 54 out of 80 of the Green legislators voted in that manner. We would be wrong 32.5% (26/80) of the time, giving us no advantage over guessing when we did not know the legislator’s party affiliation. Finally, if we knew that the legislator was a member of the Brown Party, we would also guess that he or she voted in the affirmative since 81 out of 120 Browns voted affirmatively. We would find we had guessed wrong 32.5% (39/120) of the
time. Again, we see that knowing party affiliation, in this case, does not help us in making a prediction about how a legislator will vote. Table 8.7 shows this clearly.
Table 8.7
Accuracy of Vote Guess Under Three Knowledge Conditions for Case 2

Party Affiliation   Guessed Vote   Chance of Guessing Incorrectly
Unknown             Yes            32.5%
Green Party         Yes            32.5%
Brown Party         Yes            32.5%
In this second case we can say that the two variables, party affiliation and the way a legislator
voted on offshore oil drilling are independent of each other. That is, knowing the value of one
variable does not help us predict the value of the second variable. In the first case, the variables
appeared to be related. If the variables are related, then it should be possible for us to use the
information on the previous vote that the West Atlantis legislature made on offshore oil drilling
to predict what the next vote might be. Let’s look again at the vote on the first offshore drilling
bill in Table 8.8.
Table 8.8
Distribution of Votes by Party With Row
and Column Marginals and Percents

          Party
Vote      Green        Brown         Total
Yes       60 (75%)     75 (62.5%)    135 (67.5%)
No        20 (25%)     45 (37.5%)    65 (32.5%)
Total     80 (40%)     120 (60%)     200 (100%)
In this table the percentages within each cell correspond to the percent of legislators within each column (each party) who voted positively and negatively on the bill, respectively. These percentages are often referred to as column percents since they give us the proportions based on the column marginal for each column. For instance, we note that 60 out of 80 Green Party members voted for the bill and 60/80 = 0.75 (or 75%). Similarly, 45 out of 120 members of the Brown Party voted against the bill and 45/120 = 0.375 (or 37.5%). The percents in the marginals are the percent of the total number of legislators (there are 200 of them) who fall in each row or column. So, the column marginal percent for members of the Brown Party is 120/200 = 0.60, telling us that 60% of the members of the West Atlantis legislature belong to the Brown Party. The row marginal percent for the legislators who voted yes is 135/200 = 0.675, telling us that 67.5% of the legislators voted in favor of the bill.
Now look at the same information for Case 2 shown in Table 8.9.

Table 8.9
Distribution of Votes by Party With Row
and Column Marginals and Percents (Case 2)

          Party
Vote      Green        Brown         Total
Yes       54 (67.5%)   81 (67.5%)    135 (67.5%)
No        26 (32.5%)   39 (32.5%)    65 (32.5%)
Total     80 (40%)     120 (60%)     200 (100%)
Note that the row and column marginals and marginal percents are the same as in Case 1 (Table 8.8). What is different in the two cases are the percentages within the cells (the column percents). Note also that in Case 2 (where the variables are independent) the column percents in a given row are identical to the row marginal percent. This is not true in Case 1, where the variables are related to each other.
So, now let’s get down to the question of interest. In the vote on the previous bill (Case
1), were the variables party affiliation and vote independent? If they were, then someone trying
to influence the vote knows that he or she should not bother looking at the party affiliation of
legislators in order to choose legislators they should spend resources lobbying. On the other
hand, if the variables are related to each other, knowing legislators’ party affiliations might help
in determining how to use limited resources to influence the vote. Clearly, Case 1 and Case 2 (the
situation where there was independence between the variables) are different. All we need to do
is look at the column and marginal percentages to see that. Of course, there are two reasons why the distributions might be different. The first is that the two variables are actually related to each other. The second, however, is that they appear different due to sampling error; that is, because the first vote is not a representative sample of how the 200 members of the legislature vote when it comes to issues of offshore oil drilling. This could have occurred if there was
something special about the circumstances of the Case 1 vote. Perhaps there had recently been a
huge oil spill at an offshore oilrig that polluted the ocean around it and the newspapers and
television news programs showed picture after picture of dead birds, fish, and other sea life
before the vote was taken. So, if the two variables were independent, we would expect to get cell
frequencies that look like the ones in Case 2. We observed the cell frequencies in Case 1. What
is the probability that we could get the observed frequencies (Case 1) if the variables really were
independent and the difference between the observed and expected frequencies was simply due
to sampling error?
Calculating chi-square for independence. This situation should seem familiar. Here we
have a set of observed frequencies and a set of expected frequencies. If the frequencies are
distributed the same in the two sets, we can say they are independent. If not, there is an
association (relationship) between the two variables. We can determine the probability that the
two distributions come from the same population using the chi-squared statistic that we saw at the beginning of the chapter. Remember that the formula is χ2 = Σ (fo – fe)2/fe, as noted in Formula 8.1. In this case fo is the frequency in each of the cells in Case 1 (the table of
the frequencies observed in the prior vote) and fe is the frequency expected if the variables are
independent in the corresponding cell in the table of Case 2. Table 8.10 shows these observed and expected frequencies for each cell.
Table 8.10
Distribution of Votes by Party
With Expected Values in Parentheses

          Party
Vote      Green     Brown
Yes       60 (54)   75 (81)
No        20 (26)   45 (39)

We can calculate χ2 in the following manner. The resulting χ2 statistic will have (R-1)(C-1) degrees of freedom, where R = the number of rows in the contingency table and C = the number of columns in the table. In our example, df = (2-1)(2-1) = (1)(1) = 1.

χ2 = (60 – 54)2/54 + (75 – 81)2/81 + (20 – 26)2/26 + (45 – 39)2/39
   = (6)2/54 + (–6)2/81 + (–6)2/26 + (6)2/39
   = 36/54 + 36/81 + 36/26 + 36/39
   = .67 + .44 + 1.38 + .92 = 3.41
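The expected frequencies and the statistic can be reproduced in a few lines. This Python sketch (an illustration, not part of the chapter) uses the standard rule that each expected cell frequency equals the cell’s row marginal times its column marginal divided by the grand total, which yields exactly the Case 2 values:

```python
# Chi-square test of independence for the West Atlantis vote (Case 1 data).
observed = {("Yes", "Green"): 60, ("Yes", "Brown"): 75,
            ("No", "Green"): 20, ("No", "Brown"): 45}

row_totals = {"Yes": 135, "No": 65}       # row marginals
col_totals = {"Green": 80, "Brown": 120}  # column marginals
n = 200                                   # grand total

# Expected frequency for each cell: (row marginal x column marginal) / n
expected = {(v, p): row_totals[v] * col_totals[p] / n for (v, p) in observed}
# Yes/Green 54, Yes/Brown 81, No/Green 26, No/Brown 39 -- the Case 2 table

chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = (len(row_totals) - 1) * (len(col_totals) - 1)   # (R-1)(C-1) = 1
print(round(chi_square, 2), df)  # 3.42 1
```

The 3.42 here differs from the chapter’s 3.41 only because the chapter rounds each of the four terms to two decimals before summing; either way the value falls below the 5% critical value of 3.841, so the null hypothesis of independence is not rejected.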
Now our question is, what are the chances of getting a value of χ2 as high as 3.41 or
higher with one degree of freedom if how legislators voted is independent of their party
affiliations? We can’t determine this directly, but we can find out whether this probability is less
than a certain value by looking in Table X.
First, let’s look at the two possible decisions we can make. We can decide that the
variables are independent (in which case we will be saying that the observed frequency
distribution among the cells of the table are the same as the distribution of the expected values).
We might also assume that the variables are related (i.e. are not independent). In this case, we
would find that the observed frequency distribution was different from the expected distribution.
As in most cases of using hypothesis-testing statistics, we will set our null hypothesis as the
condition where no relationship exists. So, the null hypothesis is that the expected and observed
cell frequencies are equal (H0: fo = fe) while the alternative is that the expected and observed
frequencies are not equal (H1: fo ≠ fe).
Next we must devise a rule to use when deciding whether or not to reject the null
hypothesis. In this case, let us decide that we will reject the null hypothesis when the chances of
it being true are less than 5%. Using Table X, we find that, with one degree of freedom, a χ2
value of 3.841 will cut off the upper 5% of the distribution. So, any χ2 value above 3.841 has
less than a 5% chance of occurring if the two variables were independent; that is, if the null
hypothesis were true. Any values below this have more than a 5% chance of occurring. In our
example, we obtained a χ2 value of 3.41 and now we know that there was more than a 5% chance
of obtaining a value as high as this or higher if the null hypothesis were true. Using our decision
rule, then, we will fail to reject the null hypothesis and conclude that we have no reason to
believe that a legislator’s vote on the offshore drilling bill was related to his or her party affiliation. If
we were going to lobby these legislators, party affiliation would not be a good variable to use in
order to target our efforts.
Finding expected values. Unlike goodness of fit models, expected values in tests for
independence do not come from the theory behind the variables. Rather, these tests use the cell
frequencies that would be expected if the two variables in question were independent of each
other. As hinted in Tables 8.8 and 8.9, these values are a function of the marginal frequencies of
the rows and columns. We can find the expected values of a specific cell by following this
simple procedure.
1. Find the row marginal of the row containing the specific cell of interest.
2. Find the column marginal of the column containing the specific cell of interest.
3. Multiply the row marginal you found in step 1 by the column marginal found in step 2.
4. Take the product you found in step 3 and divide it by the total number of subjects in the
contingency table.
So, if ni equals the row marginal, nj equals the column marginal, and n is the total number
of subjects,
Eij = ninj / n        (8.2)

where Eij is the expected value of the cell in row i and column j. Using Equation 8.2 with the
data in Table 8.4 we would obtain the expected value for members of the Green Party who voted
in the affirmative (E11) in this way:
E11 = n1n1/n = (135)(80)/200 = 10800/200 = 54
as can be seen in Table 8.10.
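The four-step procedure above amounts to one line of arithmetic per cell. A minimal Python sketch (the marginals come from Table 8.10; the function name is ours):

```python
def expected_value(row_marginal, col_marginal, n):
    """Formula 8.2: expected cell frequency if the two variables are independent."""
    return row_marginal * col_marginal / n

# Marginals from Table 8.10: Yes = 135, No = 65; Green = 80, Brown = 120; n = 200.
print(expected_value(135, 80, 200))   # E11 = 54.0 (Green, Yes)
print(expected_value(135, 120, 200))  # E12 = 81.0 (Brown, Yes)
print(expected_value(65, 80, 200))    # E21 = 26.0 (Green, No)
print(expected_value(65, 120, 200))   # E22 = 39.0 (Brown, No)
```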
Equation 8.2 can be derived thus. A “percent marginal” is a row or column marginal expressed
as a percent of the total number of subjects. Suppose that πi. is the percent marginal for row i
and π.j is the percent marginal for column j. If ni. is the row marginal for row i and n.j is the
column marginal for column j, we can estimate the row percent marginal for each row by
π̂i. = ni./n and we can estimate the column percent marginal for each column by π̂.j = n.j/n. If
the variables are independent, we can apply the multiplication rule of probabilities
[p(A∩B) = p(A)p(B)] and see that the probability that a subject will fall in the cell in row i and
column j (π̂ij) is equal to π̂i.π̂.j = (ni./n)(n.j/n). From this we see that the expected frequency of a
cell if the variables are independent is Eij = n(ni./n)(n.j/n) = ni.n.j/n. This, of course, is the row
marginal times the column marginal, all divided by the number of subjects in the design.
To sum up, here are the steps used in conducting a χ² test for association/independence.
1. Determine the maximum level of Type I error you will tolerate in making decisions about
your hypothesis and use this determination to construct a decision rule that tells you when to
reject the null hypothesis. In our example we decided that we would only reject the null
hypothesis if there were less than a 5% chance of it being true. So our decision rule was:
Reject the null hypothesis if p (the probability that the null hypothesis is true) is less than 5%.
Fail to reject the null hypothesis if p is more than or equal to .05.
2. Gather data and determine the frequency distribution of the data observed in the field (for
example, Table 8.3).
3. Calculate the expected frequencies for each cell using Equation 8.2.
4. Calculate the value of the χ2 statistic using Equation 8.1.
5. Calculate the number of degrees of freedom in the design using the formula df = (R-1) (C-1)
(the number of rows in the contingency table minus one times the number of columns in the
table minus 1).
6. Use the degrees of freedom found in step #5, together with the α level chosen in step #1, to
find the critical value of χ² in the table. Compare the calculated value of χ² with this critical
value. If the calculated value exceeds the tabled critical value, the chances of obtaining a
value this large if there were no differences between the observed and expected distributions
are less than the maximum probability that you allowed of making a Type I error.
7. Apply the decision rule you devised in step #1 to determine whether or not you should reject
the null hypothesis. In the case of the votes of legislators on offshore drilling, we concluded
that the probability of the null hypothesis being true was greater than 5% (p>.05). Therefore,
we will fail to reject the null hypothesis and conclude that the variables party affiliation and type
of vote cast cannot be said to be related.
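The seven steps can be collected into a single sketch. This is our own illustration, not the author's: the helper name is ours, and the critical value 3.841 for df = 1 at α = .05 is taken from the chi-square table discussed earlier.

```python
def chi_square_independence(table, critical_value):
    """Steps 2-6 for a contingency table given as a list of rows.

    Returns the calculated chi-square, the degrees of freedom, and
    whether the null hypothesis of independence is rejected (step 7).
    """
    n = sum(sum(row) for row in table)               # total number of subjects
    row_marginals = [sum(row) for row in table]
    col_marginals = [sum(col) for col in zip(*table)]

    # Step 3 (Equation 8.2) and step 4 (Equation 8.1) combined per cell.
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_marginals[i] * col_marginals[j] / n
            chi_square += (fo - fe) ** 2 / fe

    # Step 5: degrees of freedom = (R - 1)(C - 1).
    df = (len(table) - 1) * (len(table[0]) - 1)

    # Steps 6-7: compare with the tabled critical value and decide.
    reject = chi_square > critical_value
    return chi_square, df, reject

# The legislators' votes from Table 8.10 (rows: Yes, No; columns: Green, Brown).
chi_square, df, reject = chi_square_independence([[60, 75], [20, 45]], 3.841)
print(round(chi_square, 2), df, reject)  # 3.42 1 False -> fail to reject
```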
The problem of small expected values in contingency tables. Often in social science
research we work with rather small groups of subjects. This often results in cells that have small
expected values. Table 8.11 displays a contingency table showing the frequency of responses
when a sample of 60 people were asked whether or not they had experienced an act of racism
directed towards them within the past year. The responses are grouped according to the
self-reported racial/ethnic group membership of the participants. As in previous tables, the first
number in each cell is the observed value and the number in parentheses is the expected value,
that is, the cell value you would expect to see if the two variables were not related to each other.

Table 8.11
Experienced Racism in the Past Year by Race/Ethnicity
(expected values in parentheses)

          White      Black     Hispanic   Asian     Other    Total
Yes       5 (11)     12 (6)    5 (4)      2 (3)     1 (1)    25
No        20 (14)    3 (9)     5 (6)      5 (3)     2 (2)    35
Total     25         15        10         7         3        60
If the expected frequency in any of the cells of a contingency table is less than 5, it has been
the standard practice to combine rows and/or columns of the table to increase these expected
values (see Table 8.12). The
rationale for this is a bit beyond the scope of this book. Suffice it to say that the result of this
problem of small expected values is that calculated probabilities of Type I errors tend to be
underestimates of the actual chances of making a Type I error. In other words, when the value of
χ2 obtained from the data tells you the probability of making a Type I error (the significance of
the statistical test) is 4%, it might really be 6%. If p = .04 according to your statistic and you
were testing at α = .05, you would reject the null hypothesis. However, if p were actually equal
to .06, you should have failed to reject the null hypothesis. In other words, you would have
rejected the null hypothesis when you shouldn’t have … a Type I error. Theoretically then,
small expected values increase our chances of making a Type I error beyond the level of α that
we initially set. However, Camilli and Hopkins (1979) have shown that Type I error is not really
a problem so long as the sample size is at least equal to eight. Overall (1980) noted that the real
problem with small sample sizes is more likely to have to do with Type II error (i.e., with power)
than with Type I error. It probably is not a bad idea to try to keep expected values of
contingency table cells above 5, but one should not adhere to this rule slavishly if the situation
warrants smaller expected frequencies.
Table 8.12
Table 8.11 With the Last Three Categories Condensed
(expected values in parentheses)

          White      Black     Other      Total
Yes       5 (11)     12 (6)    8 (8)      25
No        20 (14)    3 (9)     12 (12)    35
Total     25         15        20         60
Some authors suggest the use of Yates Correction Factor for
Continuity when using χ2 with a 2 ×2 contingency table. This
adjustment to the standard formula for χ2 was suggested by Yates in
1934 who noted that, as mentioned in the discussion of small sample
sizes, the sampling distribution of the calculated values of χ2 is
discrete while the theoretical sampling distribution of the statistic
is continuous. This is particularly a problem with small samples and this led
Yates to devise this correction for use with 2×2 tables. The procedure merely
subtracts .5 from the absolute value of the difference between the observed and
expected frequencies in the numerator before it is squared, giving Formula 8.3.

χ² = Σ [(|fo − fe| − .5)² / fe]        (8.3)
This procedure works quite nicely so long as the marginals of the contingency
table are fixed. To speak of fixed marginals is to say that, if you had repeated the
study with a different sample from the same population, although the individual
cell frequencies might change from the original sample, the marginals would be
equal to those in the first sample. This is a very unusual situation. For this
reason, even though the correction factor is available in the output of most
computer statistical packages, it is probably not a good idea to use Yates'
Correction for Continuity.
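Formula 8.3 differs from Formula 8.1 only in its numerator, as this short Python sketch shows (our own illustration; the function name is ours, and the data are the vote frequencies from Table 8.10):

```python
def yates_chi_square(observed, expected):
    """Formula 8.3: chi-square with Yates' correction for continuity.

    Subtracts .5 from the absolute difference between each observed and
    expected frequency before squaring. Intended for 2x2 tables only.
    """
    return sum((abs(fo - fe) - 0.5) ** 2 / fe
               for fo, fe in zip(observed, expected))

# Applied to the vote data of Table 8.10, the corrected value (about 2.87)
# is smaller than the uncorrected 3.42, making the test more conservative.
print(round(yates_chi_square([60, 75, 20, 45], [54, 81, 26, 39]), 2))
```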
Measures of the strength of association. Remember that the null hypothesis tested by the χ²
test for independence is that the cell frequencies observed in the sample are the same as those we
would have expected to see if the two variables were independent of each other. In other words,
the null hypothesis is that the variables are independent. If we reject this null hypothesis we are
simply saying that there is a relationship between the two variables; that the correlation between
the two is not zero. However, this does not tell us anything about the strength of the
relationship. As we discussed in Chapter 1, after finding a result that allows us to reject a null
hypothesis of no relationship, we still need to determine the strength of that relationship. In our
example we rejected the null hypothesis that there was no relationship between the subjects’ race
and their experiences with racism, and now we need to determine whether the association
(relationship) was strong or weak.
In order to determine the strength of the relationship between the two variables in 2×2
contingency tables, we can use the Phi (Φ) statistic. Phi is related to χ² as shown in Formula 8.4,

Φ = √(χ²/N)        (8.4)

where N is the total sample size. For tables larger than 2×2, the formula can be extended to
Cramér’s V as shown in Formula 8.5,

V = √(χ²/[N(k − 1)])        (8.5)

where k is either the number of rows or the number of columns in the contingency table,
whichever is less. Both Φ and V are correlation coefficients, and may have values between 0 and
1.00, where zero indicates no relationship at all and 1.00 a perfect relationship between the two
variables. From Chapter 6, you remember that correlation values close to 1.00 indicate a strong
relationship, while values close to 0 indicate relative independence or lack of relationship. In
other words, one may conclude that if the value of a correlation coefficient is 1.00, we can
perfectly predict one variable from the second all of the time. A value of zero, however, can be
interpreted as telling us that any attempt to predict the value of one variable from the value of
the other would be no more accurate than taking a wild guess.

Be cautious when calculating measures of association with contingency tables! As with any
other correlation coefficient, Φ or V should only be calculated when there is reason to believe
that the correlation between the two variables is non-zero. If the χ² test for independence
indicates that it is not appropriate to reject the null hypothesis (the hypothesis that tells us that
the variables are independent, i.e., not related), we have no reason to believe that Φ or V is not
zero. In this case we would be rather foolish to attempt to actually calculate the statistic.
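Formulas 8.4 and 8.5 translate directly into code. A sketch under the assumption that the χ² test has already rejected independence; the function names and the example numbers are ours, not the author's:

```python
import math

def phi(chi_square, n):
    """Formula 8.4: strength of association for a 2x2 table."""
    return math.sqrt(chi_square / n)

def cramers_v(chi_square, n, k):
    """Formula 8.5: k is the smaller of the number of rows and columns."""
    return math.sqrt(chi_square / (n * (k - 1)))

# Hypothetical example: chi-square of 12.0 from a 3x4 table of 100 subjects.
# k = min(3, 4) = 3, so V = sqrt(12 / (100 * 2)), about .24, a weak association.
print(round(cramers_v(12.0, 100, 3), 2))
```

Note that for a 2×2 table, k = 2 and Formula 8.5 reduces to Formula 8.4, which is why Φ and V are equal in that case.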
SPSS
Doing it By Computer
We have a randomly selected sample of 67 graduate students from a College of
Education at a large public university in the southeastern United States. These students have
responded to a questionnaire that included information on the colors of their hair and eyes. The
information on hair and eye color has been coded as either “light” or “dark.” Geneticists tell us
that the genes that control hair and eye color (there are a lot of them) are close to each other on a
particular chromosome. Based on this information (and what we observe every day in the people
around us) we have reason to believe that there is a relationship between the hair and eye color of
human beings. Specifically, that people with light hair tend to have light eyes (and vice-versa) and
that people who have dark hair tend to have dark eyes. If we know whether a person has light or
dark hair, we should be able to predict, at least to some degree, whether they have light or dark
eyes. We use SPSS to devise a contingency table for these two variables and test the null hypothesis
that these two variables are independent. If we can reject this null hypothesis, we can use the Φ
statistic to determine the strength of the relationship between these two variables. The results of
this analysis are shown below.

Hair color * Eye color Crosstabulation
Count
                           Eye color
                      Light     Dark     Total
Hair color   Light        9        5        14
             Dark        13       40        53
Total                    22       45        67

The contingency table shows the frequencies for each of the cells and the row and column
marginals (labeled “Total”). In the second table we note that the calculated value of χ² with one
degree of freedom is 7.937. The column labeled “Asymp. Sig.” tells us that the chances that the null
Chi-Square Tests

                                  Value    df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                 (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square               7.937b     1      .005
Continuity Correction a          6.237      1      .013
Likelihood Ratio                 7.522      1      .006
Fisher's Exact Test                                              .009         .007
Linear-by-Linear Association     7.819      1      .005
N of Valid Cases                     67

a. Computed only for a 2x2 table
b. 1 cells (25.0%) have expected count less than 5. The minimum expected count is 4.60.
hypothesis (the variables are independent) is true is only .5% (.005).

Symmetric Measures

                              Value    Approx. Sig.
Nominal by     Phi             .344       .005
Nominal        Cramer's V      .344       .005
N of Valid Cases                67

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Whether we chose to test this null hypothesis at the .05 or .01 level, we would reject it since
there is less than a 1% chance that the null is true, and we would conclude that hair and eye color
are related to each other in the population from which this sample
was drawn. Since we can now assume there is a relationship between these two variables, we can
examine its strength. The third table shows us that Cramér’s V statistic has a value of .344,
indicating a moderately weak relationship between hair and eye color in the population from
which this sample was drawn.
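As a check on the SPSS output, the Pearson chi-square and Phi can be reproduced by hand in a short Python sketch (our own illustration; the cell frequencies come from the crosstabulation above):

```python
import math

# Hair color (rows: Light, Dark) by eye color (columns: Light, Dark).
table = [[9, 5], [13, 40]]

n = sum(sum(row) for row in table)              # 67 valid cases
row_totals = [sum(row) for row in table]        # 14, 53
col_totals = [sum(col) for col in zip(*table)]  # 22, 45

# Pearson chi-square from Equations 8.1 and 8.2.
chi_square = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(2)
)
print(round(chi_square, 3))                  # 7.937, matching the SPSS value

# Phi (Formula 8.4); for a 2x2 table Cramer's V equals Phi.
print(round(math.sqrt(chi_square / n), 3))   # 0.344
```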
Review Questions and Exercises
1. An investigator is trying to determine if there is a relationship between marital satisfaction of
husbands and wives. He identifies a group of married couples, and asks each individual
independently how happy he or she is in the current marriage. The data (number of
respondents in each category) are reported in the following table.
                            Wife’s level of satisfaction
                             Low     Medium     High
Husband’s       Low           42        313       16
level of        Medium        26         98       50
satisfaction    High          14         22       92
a) Is there a statistically significant relationship between husbands’ and wives’ satisfaction
with the marriage? Please state the null and alternate hypotheses, and test at Alpha=.05.
b) If there is a statistically significant relationship, what is its strength? Please calculate an
indicator of effect size.
c) In this sample, what is the probability of having a married couple with the wife being
unhappy (low satisfaction) and the husband being very happy (high satisfaction) with
their marriage?
2. A sociologist is interested in the number of children in a population of 1,492 families that are
being investigated. She hypothesizes the following distribution of family size:
No children – 26%
1 child – 16%
2 children – 25%
3 children – 15%
4 children – 9%
5 children – 4%
6 children – 2%
7 children – 2%
8 or more children – 1%
Test the hypothesis that this is the distribution in the population from which the sample was
drawn, at the .05 level of significance. Write a paragraph describing your conclusions.
3. As a part of an action research project, a school principal administered a standardized test of
critical thinking skills to her third grade classes in the beginning and at the end of a school
year. She finds out that the mean of all of her third graders’ (n=123) test scores at the end of
the year was the same as their mean at the beginning of the year. But, unexpectedly, she also
finds out that the variances of the scores were not the same. At the beginning of the year, the
variance was 9 (s² = 9), while at the end of the year the variance was 20.
a. Can she conclude that the variance of the scores actually increased? State the null
and alternate hypotheses answering this question, and test with Alpha=.05.
b. Write a short report to summarize these results, and make a clear statement regarding
changes in critical thinking during the third grade.