Stat 504, Lecture 6
Introduction to
Two-Way Tables
Example 1: 2 × 2 Table of counts and/or proportions
Table 1: Incidence of Common Colds involving French Skiers (Pauling (1971), as reported in Fienberg (1980))

                 Cold   No Cold   Totals
Placebo            31       109      140
Ascorbic Acid      17       122      139
Totals             48       231      279
Table 2: Incidence of Common Colds involving French Skiers, as proportions of the grand total (Pauling (1971), as reported in Fienberg (1980))

                 Cold   No Cold   Totals
Placebo         0.111     0.391    0.502
Ascorbic Acid   0.061     0.437    0.498
Totals          0.172     0.828    1
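
As a minimal sketch of how the proportions relate to the counts, the snippet below divides each cell count from Table 1 by the grand total of 279. The variable names (`counts`, `proportions`) are illustrative choices, not part of the lecture.

```python
# Counts from Table 1: rows are treatment groups, columns are (cold, no cold).
counts = {
    "Placebo": (31, 109),
    "Ascorbic Acid": (17, 122),
}

# Grand total over all four cells.
n = sum(sum(row) for row in counts.values())

# Each cell proportion is the cell count divided by the grand total,
# rounded to three decimals as in the table.
proportions = {
    group: tuple(round(c / n, 3) for c in row)
    for group, row in counts.items()
}

for group, (cold, no_cold) in proportions.items():
    print(f"{group:14s} {cold:.3f} {no_cold:.3f}")
```

Note that these are proportions of the whole sample, not of each treatment group.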
Q1: Compare the relative frequency of occurrence of some
characteristic in two groups, e.g., is the probability that a
member of the placebo group contracts a cold the same
as the probability that a member of the ascorbic acid group
contracts a cold?
Q2: Are two characteristics independent, e.g., are the
type of treatment and contracting a cold associated?
Q3: Is one characteristic a cause of another, e.g.,
does ascorbic acid (vitamin C) have therapeutic value
in preventing a cold?
Suppose that we collect data on two binary variables,
Y and Z. Binary means that each variable takes two
possible values, say 1 (e.g., "cold") and 2 (e.g., "no
cold"). Suppose we collect values of Y (e.g.,
treatment) and Z (e.g., contracting a cold) for n sample
units. The data then consist of n pairs,
(y1, z1), (y2, z2), ..., (yn, zn).
We can summarize the data in a frequency table. Let
xij be the number of sample units having Y = i and
Z = j. Then x = (x11, x12, x21, x22) is a summary of
all n responses, e.g., x11 = 31 in Table 1. We could display x as a
one-way table with four cells, but it is customary to
display x as a square table with two rows and two
columns:

         Z=1    Z=2
Y=1      x11    x12
Y=2      x21    x22
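
The tabulation described above can be sketched as follows. The list `pairs` is a small hypothetical sample invented for illustration (it is not the skier data), and `collections.Counter` is just one convenient way to count the pairs.

```python
from collections import Counter

# Hypothetical sample of n = 6 pairs (y, z); each variable is coded 1 or 2.
pairs = [(1, 1), (1, 2), (2, 2), (1, 2), (2, 1), (2, 2)]

# x[(i, j)] is the number of sample units with Y = i and Z = j.
x = Counter(pairs)

# Arrange the four counts as a table with rows Y = 1, 2 and columns Z = 1, 2.
table = [[x[(i, j)] for j in (1, 2)] for i in (1, 2)]

for i, row in zip((1, 2), table):
    print(f"Y={i}", *row)
```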
Marginal totals. When a subscript in a cell count xij
is replaced by a plus sign (+), it means that we
have taken the sum of the cell counts over that
subscript. The row totals are
x1+ = x11 + x12,
x2+ = x21 + x22,
the column totals are
x+1 = x11 + x21,
x+2 = x12 + x22,
and the grand total is
x++ = x11 + x12 + x21 + x22 = n.
These quantities are often called marginal totals,
because they are conveniently placed in the margins
of the table, like this:

         Z=1    Z=2    total
Y=1      x11    x12    x1+
Y=2      x21    x22    x2+
total    x+1    x+2    x++
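
As a quick check of the plus-sign notation, here is a minimal sketch that computes the marginal totals for the counts in Table 1. The names `row_totals`, `col_totals`, and `grand_total` are illustrative.

```python
# Cell counts from Table 1: x[i][j] holds the count for row i+1, column j+1.
x = [[31, 109],
     [17, 122]]

# Row totals x1+ and x2+: sum over the column subscript j.
row_totals = [sum(row) for row in x]

# Column totals x+1 and x+2: sum over the row subscript i.
col_totals = [sum(col) for col in zip(*x)]

# Grand total x++ = n: sum over both subscripts.
grand_total = sum(row_totals)

print(row_totals, col_totals, grand_total)
```

The results match the margins of Table 1: row totals 140 and 139, column totals 48 and 231, and grand total 279.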
If the sample units are randomly sampled from a
large population, then x = (x11, x12, x21, x22) will
have a multinomial distribution with index n = x++
and parameter vector
π = (π11, π12, π21, π22),
where πij = P(Y = i, Z = j).

         Z=1    Z=2    total
Y=1      π11    π12    π1+
Y=2      π21    π22    π2+
total    π+1    π+2    π++ = 1
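
To make the multinomial sampling model concrete, the sketch below simulates one 2 × 2 table. The probabilities in `pi` are assumed values chosen for illustration (close to the sample proportions 31/279, 109/279, 17/279, and 122/279); the true population values are unknown. The seed is arbitrary.

```python
import random
from collections import Counter

# Assumed (hypothetical) joint probabilities pi_ij, keyed by (i, j).
pi = {(1, 1): 0.111, (1, 2): 0.391, (2, 1): 0.061, (2, 2): 0.437}
n = 279  # multinomial index, i.e. the number of sample units

rng = random.Random(504)  # arbitrary fixed seed so the draw is reproducible

# Draw n units independently; each falls in cell (i, j) with probability pi_ij.
cells = list(pi)
draws = rng.choices(cells, weights=[pi[c] for c in cells], k=n)

# The resulting vector of cell counts is one draw from Multinomial(n, pi).
x = Counter(draws)
simulated = [[x[(i, j)] for j in (1, 2)] for i in (1, 2)]
print(simulated)
```

Whatever the draw, the four simulated counts always sum to n, which is the defining constraint of the multinomial index.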
The probability distribution {πij } is the joint
distribution of Y and Z.
When you sum the joint probabilities, you get a
marginal distribution, e.g., the probability distribution
{πi+} is the marginal distribution for Y, where
P(Y = 1) = π1+ and P(Y = 2) = π2+.
How does the distribution of Z change as the category
of Y changes? The conditional distribution of Z given
Y, for example, is {πj|i} = {πij / πi+}, such that Σj πj|i = 1.
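
The relationship between the joint, marginal, and conditional distributions can be sketched numerically. Here the unknown πij are replaced by the sample proportions xij / n from Table 1, which is an assumption made only for illustration.

```python
# Cell counts from Table 1; rows are Y = 1 (placebo), Y = 2 (ascorbic acid),
# columns are Z = 1 (cold), Z = 2 (no cold).
x = [[31, 109],
     [17, 122]]
n = sum(sum(row) for row in x)

# Assumption: use sample proportions x_ij / n in place of the unknown pi_ij.
pi = [[c / n for c in row] for row in x]

# Marginal distribution of Y: pi_i+ is the sum of pi_ij over j.
pi_row = [sum(row) for row in pi]

# Conditional distribution of Z given Y = i: pi_j|i = pi_ij / pi_i+.
cond = [[p / pi_row[i] for p in row] for i, row in enumerate(pi)]

for i, row in enumerate(cond, start=1):
    print(f"Y={i}:", [round(p, 3) for p in row])
```

Under this substitution, each conditional row sums to 1, and the conditional proportion of colds is 31/140 in the placebo group versus 17/139 in the ascorbic acid group.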