Statistics 111 – Lecture 14
Comparing Means from Two Samples and One-Sample Inference for Proportions
June 25, 2008
Stat 111 - Lecture 14 - Two Means
Administrative Notes
• Homework 5 is posted on website
• Due Wednesday, July 1st
Outline
• Two Sample Z-test (known variance)
• Two Sample t-test (unknown variance)
• Matched Pair Test and Examples
• Tests and Intervals for Proportions (Chapter 8)
Comparing Two Samples
• Up to now, we have looked at inference for one
sample of continuous data
• Our next focus in this course is comparing the data
from two different samples
• For now, we will assume that these two different
samples are independent of each other and come
from two distinct populations
Population 1: $\mu_1$, $\sigma_1$
Population 2: $\mu_2$, $\sigma_2$
Sample 1: $\bar{x}_1$, $s_1$
Sample 2: $\bar{x}_2$, $s_2$
Blackout Baby Boom Revisited
• Nine months (Monday, August 8th) after Nov 1965
blackout, NY Times claimed an increased birth rate
• Already looked at single two-week sample: found no
significant difference from usual rate (430 births/day)
• What if we instead look at difference between
weekends and weekdays?
    Sun   Mon   Tue   Wed   Thu   Fri   Sat
          452   470   431   448   467   377
    344   449   440   457   471   463   405
    377   453   499   461   442   444   415
    356   470   519   443   449   418   394
    399   451   468   432
Weekdays: Mon to Fri (23 days). Weekends: Sat and Sun (8 days).
Two-Sample Z test
• We want to test the null hypothesis that the two populations have equal means
• H0: $\mu_1 = \mu_2$ or equivalently, $\mu_1 - \mu_2 = 0$
• Two-sided alternative hypothesis: $\mu_1 - \mu_2 \neq 0$
• If we assume our population SDs $\sigma_1$ and $\sigma_2$ are known, we can calculate a two-sample Z statistic:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
• We can then calculate a p-value from this Z statistic using the standard normal distribution
Two-Sample Z test for Blackout Data
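• To use Z test, we need to assume that our pop. SDs are known: $\sigma_1 = s_1 = 21.7$ and $\sigma_2 = s_2 = 24.5$
• The sample means computed from the table are $\bar{x}_1 = 456.4$ (23 weekdays) and $\bar{x}_2 = 383.4$ (8 weekend days), so
$$Z = \frac{456.4 - 383.4}{\sqrt{21.7^2/23 + 24.5^2/8}} = 7.5$$
• From normal table, P(Z > 7.5) is less than 0.0002, so our p-value = 2 × P(Z > 7.5) is less than 0.0004
• Conclusion here is a significant difference between birth rates on weekends and weekdays
• We don’t usually know the population SDs, so we need a method for unknown $\sigma_1$ and $\sigma_2$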
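The Z calculation for the blackout data can be reproduced from the daily counts in the table. The following is a minimal sketch using only the Python standard library; the function name `two_sample_z` and the variable names are illustrative and not part of the lecture, and the weekday/weekend split assumes the table is read row by row with the first count falling on a Monday.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Daily birth counts from the table, split by day type (assumed layout:
# the first value is a Monday, and rows run Sunday through Saturday).
weekdays = [452, 470, 431, 448, 467,
            449, 440, 457, 471, 463,
            453, 499, 461, 442, 444,
            470, 519, 443, 449, 418,
            451, 468, 432]
weekends = [377, 344, 405, 377, 415, 356, 394, 399]


def two_sample_z(x1, x2, sigma1, sigma2):
    """Two-sample Z statistic with population SDs treated as known."""
    standard_error = sqrt(sigma1**2 / len(x1) + sigma2**2 / len(x2))
    return (mean(x1) - mean(x2)) / standard_error


# As on the slide, the sample SDs stand in for the "known" population SDs.
s1, s2 = stdev(weekdays), stdev(weekends)
z = two_sample_z(weekdays, weekends, s1, s2)

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"s1 = {s1:.1f}, s2 = {s2:.1f}, Z = {z:.1f}")
```

The sample SDs round to the 21.7 and 24.5 quoted on the slide, which is a useful check that the table was read correctly.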
Two-Sample t test
• We still want to test the null hypothesis that the two populations have equal means (H0: $\mu_1 - \mu_2 = 0$)
• If $\sigma_1$ and $\sigma_2$ are unknown, then we need to use the sample SDs $s_1$ and $s_2$ instead, which gives us the two-sample T statistic:
$$T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$
• The p-value is calculated using the t distribution, but what degrees of freedom do we use?
• df can be complicated and often is calculated by software
• Simpler and more conservative: set degrees of freedom equal to the smaller of ($n_1 - 1$) or ($n_2 - 1$)
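To make the two degrees-of-freedom choices concrete, here is a small sketch. The slide does not say which formula software uses; the Welch-Satterthwaite approximation shown here is an assumption on my part (it is a common choice), and the function names are illustrative. The inputs are the sample SDs and sample sizes from the blackout example.

```python
def conservative_df(n1, n2):
    """The simple rule from the lecture: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (assumed here to be
    the 'complicated' formula that software typically applies)."""
    a = s1**2 / n1
    b = s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


# Blackout example: 23 weekdays with s1 = 21.7, 8 weekend days with s2 = 24.5.
df_simple = conservative_df(23, 8)
df_welch = welch_df(21.7, 23, 24.5, 8)

print(df_simple, round(df_welch, 1))
```

The approximate df always lies between the conservative value and n1 + n2 - 2, so using the smaller of (n1-1) and (n2-1) can only make the p-value larger, never smaller.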
Two-Sample t test for Blackout Data
• To use t test, we need to use our sample standard deviations $s_1 = 21.7$ and $s_2 = 24.5$
$$T = \frac{456.4 - 383.4}{\sqrt{21.7^2/23 + 24.5^2/8}} = 7.5$$
• We need to look up the tail probabilities using the t distribution
• Degrees of freedom is the smaller of $n_1 - 1 = 22$ or $n_2 - 1 = 7$
Two-Sample t test for Blackout Data
• From t-table with df = 7, we see that P(T > 7.5) < 0.0005
• If our alternative hypothesis is two-sided, then we know that our p-value < 2 × 0.0005 = 0.001
• We reject the null hypothesis at $\alpha$-level of 0.05 and conclude there is a significant difference between birth rates on weekends and weekdays
• Same result as Z-test, but we are a little more conservative
Two-Sample Confidence Intervals
• In addition to two-sample t-tests, we can also use the t distribution to construct confidence intervals for the mean difference
• When $\sigma_1$ and $\sigma_2$ are unknown, we can form the following 100·C% confidence interval for the mean difference $\mu_1 - \mu_2$:
$$(\bar{x}_1 - \bar{x}_2) \pm t_k^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
• The critical value $t_k^*$ is calculated from a t distribution with degrees of freedom k
• k is equal to the smaller of ($n_1 - 1$) and ($n_2 - 1$)
Confidence Interval for Blackout Data
• We can calculate a 95% confidence interval for the mean difference between birth rates on weekdays and weekends:
$$(\bar{x}_1 - \bar{x}_2) \pm t_k^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
• Our critical value $t_k^* = 2.365$ is calculated from a t distribution with 7 degrees of freedom, so our 95% confidence interval is:
$$(456.4 - 383.4) \pm 2.365 \times \sqrt{21.7^2/23 + 24.5^2/8} = 73.0 \pm 23.1 = (49.9, 96.1)$$
• Since zero is not contained in this interval, we know the difference is statistically significant!
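The interval calculation can be sketched as a small function of the summary statistics. The function name `two_sample_t_interval` is illustrative. The SDs, sample sizes and critical value 2.365 come from the slides; the sample means 456.4 and 383.4 are not printed in the transcript and are my own computation from the daily counts in the table.

```python
from math import sqrt


def two_sample_t_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """Confidence interval for mu1 - mu2 when the population SDs are unknown.

    t_star is the critical value looked up from a t-table using
    k = min(n1 - 1, n2 - 1) degrees of freedom.
    """
    difference = mean1 - mean2
    margin = t_star * sqrt(s1**2 / n1 + s2**2 / n2)
    return difference - margin, difference + margin


# Blackout data: weekdays (n1 = 23) versus weekend days (n2 = 8).
# The means are computed from the table (an assumption of this sketch).
low, high = two_sample_t_interval(456.4, 21.7, 23, 383.4, 24.5, 8, t_star=2.365)

print(f"95% CI for the mean difference: ({low:.1f}, {high:.1f})")
```

Because the lower endpoint is well above zero, the interval leads to the same conclusion as the two-sided t test at the 0.05 level.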
Matched Pairs
• Sometimes the two samples that are being compared are matched pairs (not independent)
• Example: Sentences for crack versus powder cocaine
• We could test for the mean difference between X1 = crack sentences and X2 = powder sentences
• However, we realize that these data are paired: each row of sentences has a matching quantity of cocaine
• Our t-test for two independent samples ignores this relationship
Matched Pairs Test
• First, calculate the difference d = X1 - X2 for each pair
• Then, calculate the mean and SD of the differences d
                Sentences
    Quantity    Crack X1    Powder X2    Difference d = X1 - X2
           5        70.5         12.0          58.5
          25        87.5         18.0          69.5
         100       136.0         30.0         106.0
         200       169.5         37.0         132.5
         500       211.5         70.5         141.0
        2000       264.0         87.5         176.5
        5000       264.0        136.0         128.0
       50000       264.0        211.5          52.5
      150000       264.0        264.0           0.0
Matched Pairs Test
• Instead of a two-sample test for the difference
between X1 and X2, we do a one-sample test on the
difference d
• Null hypothesis: mean difference between the two samples is equal to zero
H0: $\mu_d = 0$ versus Ha: $\mu_d \neq 0$
• Usual test statistic when population SD is unknown:
$$T = \frac{\bar{d} - 0}{s_d / \sqrt{n}} = 5.24$$
• p-value calculated from t-distribution with df = 8
• P(T > 5.24) < 0.0005 so p-value < 0.001
• Difference between crack and powder sentences is statistically significant at $\alpha$-level of 0.05
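The matched pairs statistic can be checked directly from the sentence table. This is a minimal sketch using the Python standard library; the variable names are illustrative, and the p-value lookup is left to a t-table as in the lecture because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Sentences from the table, one entry per cocaine quantity (same row order).
crack = [70.5, 87.5, 136, 169.5, 211.5, 264, 264, 264, 264]
powder = [12, 18, 30, 37, 70.5, 87.5, 136, 211.5, 264]

# Step 1: one difference per pair.
d = [x1 - x2 for x1, x2 in zip(crack, powder)]

# Step 2: one-sample t statistic on the differences, testing mu_d = 0.
n = len(d)
d_bar = mean(d)
s_d = stdev(d)
t_stat = d_bar / (s_d / sqrt(n))
df = n - 1

print(f"mean difference = {d_bar:.2f}, SD = {s_d:.2f}, T = {t_stat:.2f}, df = {df}")
```

Treating the problem as a one-sample test uses only the 9 differences, which is why the degrees of freedom are 8 rather than something based on two separate samples.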
Matched Pairs Confidence Interval
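• We can also construct a confidence interval for the mean difference $\mu_d$ of matched pairs
• We can just use the confidence intervals we learned for the one-sample, unknown $\sigma$ case
• Example: 95% confidence interval for mean difference between crack and powder sentences (critical value $t^* = 2.306$ for df = 8):
$$\bar{d} \pm t^* \frac{s_d}{\sqrt{n}} = 96.1 \pm 2.306 \times \frac{55.0}{\sqrt{9}} = 96.1 \pm 42.3 = (53.8, 138.3)$$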
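A sketch of the matched pairs interval, computed from the differences in the sentence table. The critical value 2.306 is not stated in the transcript; it is the standard t-table value for a 95% interval with 8 degrees of freedom and is hard-coded here as an assumption. The function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def matched_pairs_interval(differences, t_star):
    """One-sample t interval applied to the pairwise differences."""
    n = len(differences)
    margin = t_star * stdev(differences) / sqrt(n)
    center = mean(differences)
    return center - margin, center + margin


# Differences d = crack sentence - powder sentence, from the table.
d = [58.5, 69.5, 106.0, 132.5, 141.0, 176.5, 128.0, 52.5, 0.0]

# t* for 95% confidence and df = 8 (taken from a standard t-table).
low, high = matched_pairs_interval(d, t_star=2.306)

print(f"95% CI for the mean difference: ({low:.1f}, {high:.1f})")
```

The interval excludes zero, which agrees with the matched pairs test rejecting the null hypothesis at the 0.05 level.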
Summary of Two-Sample Tests
• Two independent samples with known $\sigma_1$ and $\sigma_2$
• We use two-sample Z-test with p-values calculated using the
standard normal distribution
• Two independent samples with unknown $\sigma_1$ and $\sigma_2$
• We use two-sample t-test with p-values calculated using the
t distribution with degrees of freedom equal to the smaller of
n1-1 and n2-1
• Also can make confidence intervals using t distribution
• Two samples that are matched pairs
• We first calculate the differences for each pair, and then use
our usual one-sample t-test on these differences
One-Sample Inference for
Proportions
Revisiting Count Data
• Chapters 6 and 7 covered inference for the population mean of continuous data
• We now return to count data:
• Example: Opinion Polls
• Xi = 1 if you support Obama, Xi = 0 if not
• We call p the population proportion for Xi = 1
• What is the proportion of people who support the war?
• What is the proportion of Red Sox fans at Penn?
Inference for population proportion p
• We will use the sample proportion $\hat{p} = Y/n$ as our best estimate of the unknown population proportion p, where Y = sample count
• Tool 1: use our sample statistic as the center of an
entire confidence interval of likely values for our
population parameter
Confidence Interval : Estimate ± Margin of Error
• Tool 2: Use the data to perform a specific hypothesis test
1. Formulate your null and alternative hypotheses
2. Calculate the test statistic
3. Find the p-value for the test statistic
Distribution of Sample Proportion
• In Chapter 5, we learned that the sample proportion
technically has a binomial distribution
• However, we also learned that if the sample size is large, the sample proportion $\hat{p}$ approximately follows a Normal distribution with mean and standard deviation:
$$\text{mean} = p, \qquad \text{SD} = \sqrt{\frac{p(1-p)}{n}}$$
• We will essentially use this approximation throughout
chapter 8, so we can make probability calculations
using the standard normal table
Confidence Interval for a Proportion
• We could use our sample proportion as the center of a confidence interval of likely values for the population parameter p:
$$\hat{p} \pm Z^* \sqrt{\frac{p(1-p)}{n}}$$
• The width of the interval is a multiple of the standard
deviation of the sample proportion
• The multiple Z* is calculated from a normal distribution
and depends on the confidence level
Confidence Interval for a Proportion
• One Problem: this margin of error involves the
population proportion p, which we don’t actually know!
• Solution: substitute in the sample proportion $\hat{p}$ for the population proportion p, which gives us the interval:
$$\hat{p} \pm Z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Example: Red Sox fans at Penn
• What proportion of Penn students are Red Sox fans?
• Use Stat 111 class survey as sample
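• Y = 25 out of n = 192 students are Red Sox fans so $\hat{p} = 25/192 = 0.13$
• 95% confidence interval for the population proportion:
$$0.13 \pm 1.96 \sqrt{\frac{0.13(1-0.13)}{192}} = 0.13 \pm 0.05 = (0.08, 0.18)$$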
• Proportion of Red Sox fans at Penn is probably
between 8% and 18%
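The Red Sox interval can be reproduced with a few lines of Python. This is a sketch using only the standard library; the function name `proportion_interval` and its `confidence` parameter are illustrative choices, not something from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(y, n, confidence=0.95):
    """Large-sample confidence interval for a proportion.

    The sample proportion p_hat replaces the unknown p in the standard
    deviation, as described on the slide.
    """
    p_hat = y / n
    # Z* leaves (1 - confidence) / 2 in each tail of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Class survey: 25 Red Sox fans out of 192 students.
low, high = proportion_interval(25, 192)

print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Passing a different `confidence` value (for example 0.90 or 0.99) changes only the multiple Z*, which narrows or widens the interval around the same center.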
Hypothesis Test for a Proportion
• Suppose that we are now interested in using our count
data to test a hypothesized population proportion p0
• Example: an older study says that the proportion of Red
Sox fans at Penn is 0.10.
• Does our sample show a significantly different proportion?
• First Step: Null and alternative hypotheses
H0: p = 0.10 vs. Ha: $p \neq 0.10$
• Second Step: Test Statistic
$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$
Hypothesis Test for a Proportion
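• Problem: test statistic involves population proportion p
• For confidence intervals, we plugged in the sample proportion $\hat{p}$, but for test statistics, we plug in the hypothesized proportion $p_0$:
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$
• Example: test statistic for Red Sox example
$$Z = \frac{0.13 - 0.10}{\sqrt{0.10(1-0.10)/192}} = 1.39$$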
Hypothesis Test for a Proportion
• Third step: need to calculate a p-value for our test
statistic using the standard normal distribution
• Red Sox Example: Test statistic Z = 1.39
• What is the probability of getting a test statistic as extreme or more extreme than Z = 1.39? i.e., P(Z > 1.39) = 0.082
• Two-sided alternative, so p-value = 2 × P(Z > 1.39) = 0.16
• We don’t reject H0 at $\alpha = 0.05$ level, and conclude that Red Sox proportion is not significantly different from $p_0 = 0.10$
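The three steps of the Red Sox test can be sketched as follows. The function name `proportion_z_test` is illustrative. Note that the code uses the unrounded sample proportion 25/192, so its Z value can differ in the second decimal from the slide's 1.39, which appears to use the rounded value 0.13.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(y, n, p0):
    """Two-sided one-sample Z test for a proportion.

    The hypothesized proportion p0 (not the sample proportion) is used in
    the standard deviation, as described in the lecture.
    """
    p_hat = y / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Red Sox example: 25 fans out of 192, tested against p0 = 0.10.
z, p_value = proportion_z_test(25, 192, 0.10)
reject = p_value < 0.05

print(f"Z = {z:.2f}, p-value = {p_value:.2f}, reject H0: {reject}")
```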
Another Example
• Mass ESP experiment in 1977 Sunday Mirror (UK)
• Psychic hired to send readers a mental message about a
particular color (out of 5 choices). Readers then mailed back
the color that they “received” from psychic
• Newspaper declared the experiment a success because, out of 2355 responses, they received 521 correct ones ($\hat{p} = 521/2355 = 0.22$)
• Is the proportion of correct answers statistically different than we would expect by chance ($p_0 = 0.2$)?
H0: p = 0.2 vs. Ha: $p \neq 0.2$
Mass ESP Example
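• Test statistic, plugging in the hypothesized proportion $p_0 = 0.20$:
$$Z = \frac{0.22 - 0.20}{\sqrt{0.20(1-0.20)/2355}} = 2.43$$
• Calculate a p-value using the standard normal distribution: P(Z > 2.43) = 0.0075
• Two-sided alternative, so p-value = 2 × P(Z > 2.43) = 0.015
• We reject H0 at $\alpha = 0.05$ level, and conclude that the survey proportion is significantly different from $p_0 = 0.20$
• We could also calculate a 95% confidence interval for p:
$$0.22 \pm 1.96 \sqrt{\frac{0.22(1-0.22)}{2355}} = 0.22 \pm 0.017 = (0.203, 0.237)$$
Interval doesn’t contain 0.20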
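The ESP numbers can be checked with a short sketch. The helper names below are illustrative. One detail worth seeing in code: the slide's Z = 2.43 is reproduced when the sample proportion is rounded to 0.22; using the unrounded 521/2355 gives a somewhat larger Z, with the same conclusion.

```python
from math import sqrt
from statistics import NormalDist


def z_for_proportion(p_hat, p0, n):
    """Z statistic using the hypothesized proportion p0 in the SD."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def two_sided_p_value(z):
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


n, correct, p0 = 2355, 521, 0.20

# Rounded proportion, as the slide appears to use.
z_rounded = z_for_proportion(0.22, p0, n)
# Unrounded proportion for comparison.
p_hat = correct / n
z_exact = z_for_proportion(p_hat, p0, n)

# 95% confidence interval, with p_hat plugged into the SD.
margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
interval = (p_hat - margin, p_hat + margin)

print(f"Z (rounded p_hat) = {z_rounded:.2f}")
print(f"Z (unrounded p_hat) = {z_exact:.2f}")
print(f"p-value (rounded) = {two_sided_p_value(z_rounded):.3f}")
print(f"95% CI = ({interval[0]:.3f}, {interval[1]:.3f})")
```

Either way the p-value is below 0.05 and the interval lies above 0.20, so the rounding does not change the decision.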
Margin of Error
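• The confidence interval for a proportion p is centered at the sample proportion $\hat{p}$ and has a margin of error:
$$m = Z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
• Before the study begins, we can calculate the sample size needed for a desired margin of error
• Problem: don’t know sample prop. before study begins!
• Solution: use $\hat{p} = 0.5$, which gives us the maximum m:
$$m_{\max} = Z^* \sqrt{\frac{0.5 \times 0.5}{n}} = \frac{Z^*}{2\sqrt{n}}$$
• So, if we want a margin of error less than m, we need
$$n \geq \left(\frac{Z^*}{2m}\right)^2$$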
Margin of Error Examples
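• Red Sox Example: how many students should I poll in order to have a margin of error less than 5% in a 95% confidence interval?
$$n \geq \left(\frac{1.96}{2 \times 0.05}\right)^2 = 384.2$$
• We would need a sample size of 385 students
• ESP example: how many responses must newspaper receive to have a margin of error less than 1% in a 95% confidence interval?
$$n \geq \left(\frac{1.96}{2 \times 0.01}\right)^2 = 9604$$
• The newspaper would need about 9604 responses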
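The sample-size rule can be written as a one-line function. This is a sketch using the worst case p-hat = 0.5 from the previous slide; the function name `sample_size_for_margin` is illustrative, and Z* is taken from the standard normal quantile function rather than the rounded table value 1.96.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_margin(m, confidence=0.95):
    """Smallest n whose worst-case margin of error (p_hat = 0.5) is at most m."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star / (2 * m)) ** 2)


n_red_sox = sample_size_for_margin(0.05)  # margin of error under 5%
n_esp = sample_size_for_margin(0.01)      # margin of error under 1%

print(n_red_sox, n_esp)
```

Cutting the margin of error by a factor of 5 multiplies the required sample size by roughly 25, because n grows with the inverse square of m.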
Next Class - Lecture 15
• Two-Sample Inference for Proportions
• Moore, McCabe and Craig: Section 8.2