Download 501_Lecture_07

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript





The t distributions
One-sample t confidence interval
One-sample t test
Matched pairs t procedures
Robustness of the t procedures
1
When the sampling distribution of x is close to Normal, we can find probabilities
involving x by standardizing:
x -m
z=
s
n
When we don’t know σ, we can estimate it using the sample standard deviation
sx. What happens when we standardize?
?? =
x -m
sx n
This new statistic does not have a Normal distribution!
2

The standard deviation of X can be estimated by
s
SE X 
n


This quantity is called the standard error of the
sample mean.
The test statistic will no longer be normally
distributed when we use the standard error.
◦ The test statistic will have a new distribution, called the t
(or Student’s t) distribution.
The t Distributions
When we perform inference about a population mean µ using a t
distribution, the appropriate degrees of freedom are found by subtracting
1 from the sample size n, making df = n – 1.
The t Distributions: Degrees of Freedom
Draw an SRS of size n from a large population that has a Normal
distribution with mean µ and standard deviation σ. The one-sample t
statistic
x -m
t=
sx n
has the t distribution with degrees of freedom df = n – 1.
4
The t Distributions
When we standardize based on the sample standard deviation sx, our statistic
has a new distribution called a t distribution.
The t distribution has a shape similar to that of the standard Normal curve in that
it is symmetric with a single peak at 0.
However, it differs from the Normal curve in that has more area in the tails.
However, there is a different t distribution for each sample size, specified by its
degrees of freedom (df).
5
The one-sample t interval for a population mean is similar in both
reasoning and computational detail to the one-sample z interval for a
population mean.
The One-Sample t Interval for a Population Mean
Choose an SRS of size n from a population having unknown mean µ. A level C
sx
confidence interval for µ is:
x  t*
n
where t* is the critical value for the t(n – 1) distribution.
The margin of error is: t *
sx
n
This interval is exact when the population distribution is Normal and
approximately correct for large n in other cases.

8

A mutual fund is trying to estimate the return on
investment in companies that won quality
awards last year. A random sample of 20 such
companies is selected, and the return on
investment is calculated. The mean of the
sample is 14.75 and the standard deviation of
the sample is 8.18. Construct a 95% confidence
interval for the mean return on investment.



From a running production of corn soy blend
we take a sample to measure content of
vitamin C. The results are:
26 31 23 22 11 22 14 31
Find a 95% confidence interval for the content
of vitamin C in this production.
Give the margin of error.
One-Sample t Test
Choose an SRS of size n from a large population that contains an unknown
mean µ. To test the hypothesis H0 : µ = µ0, compute the one-sample t statistic:
x  0
t
sx
n
Find the P-value by calculating the probability (at degrees of freedom = n – 1)
of getting a t statistic this large or larger in the direction specified by the
alternative hypothesis Ha.

11

These P-values are exact if the population
distribution is normal and are approximately
correct for large n in other cases.

Use Table D to know the range of the P-values,
software gives exact p-values

Test whether the vitamin C content from the
sample conforms to specifications of 40.

Test whether the vitamin C content from the
sample is lower than the specifications of 40.



Inference about a parameter of a single
distribution is less common than comparative
inference.
In certain circumstances a comparative study
makes use of single-sample t procedures.
In a matched pairs study, subjects are matched in
pairs; the outcomes are compared within each
matched pair.

A matched pairs analysis is appropriate when
there are two measurements or observations per
each individual and we examine the change from
the first to the second. Typically, the observations
are “before” and “after” measures in some sense.
◦ For each individual, subtract the “before” measure from the
“after” measure.
◦ Analyze the difference using the one-sample confidence
interval or significance-testing t procedures (with H0: µdiff =
0).



20 French teachers attend a course to improve
their skills. The teachers take a Modern Language
Association’s listening test at the beginning of the
course and at it’s end. The maximum possible
score on the test is 36. The differences in each
participant’s “before” and “after” scores have
sample mean 2.5 and sample standard deviation
2.893.
Is the improvement significant?
Construct a 95% confidence interval for the mean
improvement (in the entire population).
A confidence interval or significance test is called robust if the
confidence level or P-value does not change very much when the
conditions for use of the procedure are violated.
Using the t Procedures
Except in the case of small samples, the condition that the data are an
SRS from the population of interest is more important than the condition
that the population distribution is Normal.
Sample
size less than 15: Use t procedures if the data appear close to
Normal. If the data are clearly skewed or if outliers are present, do not
use t.
Sample
size at least 15: The t procedures can be used except in the
presence of outliers or strong skewness.
Large
samples: The t procedures can be used even for clearly skewed
distributions when the sample is large, roughly n ≥ 40.
20

One sample t
◦ Proc ttest—gives confidence interval and
significance test

For matched pairs
◦ Still use similar code as one sample t
◦ Or use the paired command




Two-sample problems
The two-sample t procedures
Robustness of the two-sample t procedures
Pooled two-sample t procedures
22



Two-sample problems typically arise from a
randomized comparative experiment with two
treatment groups. (Experimental study)
Comparing random samples separately selected
from two populations is also a two-sample
problem. (Observational study)
Unlike matched pairs design, there is no
matching of the units in the two sample, and the
two samples may be of different sizes.
What if we want to compare the means of some quantitative variable for
two populations, Population 1 and Population 2?
Our parameters of interest are the population means µ1 and µ2. The best
approach is to take separate random samples from each population and to
compare the sample means.
Suppose we want to compare the average effectiveness of two treatments
in a completely randomized experiment. In this case, the parameters µ1
and µ2 are the true mean responses for Treatment 1 and Treatment 2,
respectively. We use the mean response from each of the two groups to
make the comparison. Here’s a table that summarizes this situation:
24
When data come from two random samples or two groups in a randomized
experiment, the statistic x1 - x 2 is our best guess for the value of m1 - m 2.
When the two samples are independent of each other, the standard deviation of the statistic
x1 - x2 is:
s x -x =
1
2
s 12 s 22
n1
+
n2
Since we don't know the values of the parameters s 1 and s 2 , we replace them
in the standard deviation formula with the sample standard deviations. The result
is the standard error of the statistic x1 - x 2 :
s12 s2 2
+
n1 n 2
We standardize the observed difference to obtain a t statistic that tells us how far
the observed difference is from its mean in standard deviation units:
t=
(x1 - x2 )
s12 s22
+
n1 n2
The two-sample t statistic has approximately a
t distribution. We can use technology to
determine degrees of freedom OR we can use
a conservative approach, using the smaller of
n1 – 1 and n2 – 1 for the degrees
25 of freedom.
Two-Sample t Interval for a Difference Between Means
When the Random, Normal, and Independent conditions are met, a
level C confidence interval for (m1 - m 2 ) is
s12 s22
(x1 - x2 ) ± t *
+
n1 n2
where t * is the critical value at confidence level C for the t distribution with
degrees of freedom either gotten from technology or equal to the smaller of
n1 -1 and n2 -1.
27
Obs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Gender
F
F
F
F
F
F
F
F
F
F
F
F
M
M
M
M
M
M
M
Mass
36.1
54.6
48.5
42
50.6
42
40.3
33.1
42.4
34.5
51.1
41.2
62
62.9
47.4
48.7
51.9
51.9
46.9
Rate
995
1425
1396
1418
1502
1256
1189
913
1124
1052
1347
1204
1792
1666
1362
1614
1460
1867
1439

n1  12
x1  1235.1
s1  188.3
n2  7
x2  1600
s2  189.2
Find a 95% CI for the difference in mean
metabolism rates between men and
women.
Two-Sample t Test for the Difference Between Two Means
Suppose the Random, Normal, and Independent conditions are met. To
test the hypothesis H 0 : m1 - m2 = 0, compute the t statistic
t=
(x1 - x2 )
s12 s22
+
n1 n2
Find the P-value by calculating the probability of getting a t statistic this large
or larger in the direction specified by the alternative hypothesis H a . Use the
t distribution with degrees of freedom approximated by technology or the
smaller of n1 -1 and n2 -1.
31

Perform a significance test using α=0.05.
The two-sample t procedures are more robust than the one-sample t
methods, particularly when the distributions are not symmetric.
Using the t Procedures
• Except in the case of small samples, the condition that the data are SRSs
from the populations of interest is more important than the condition that
the population distributions are Normal.
•
Sum of the sample sizes less than 15: Use t procedures if the data appear
close to Normal. If the data are clearly skewed or if outliers are present, do
not use t.
•
Sum of the sample size at least 15: The t procedures can be used except
in the presence of outliers or strong skewness.
•
Large samples: The t procedures can be used even for clearly skewed
distributions when the sum of the sample sizes is large.
33


df is the min(n1 – 1, n2 – 1)
◦ This choice is conservative.
Software will usually give smaller P-values.
◦ More correct.
◦ In our example with metabolism rates, software will
give df = 12.6 (12 in Table D)
æ s12 s22 ö
ç + ÷
class):
è n1 n2 ø
df =
2
2
2ö
2 ö
æ
æ
1 s1
1 s2
ç ÷ +
ç ÷
n1 -1 è n1 ø n2 -1 è n2 ø
2
◦ Formula (not needed in our
◦ Most of the time there is no difference in final
conclusion…
There are two versions of the two-sample t-test: one assuming equal
variance (“pooled two-sample test”) and one not assuming equal
variance (“unequal” variance, as we have studied) for the two
populations. They have slightly different formulas and degrees of
freedom.
The pooled (equal variance) twosample t-test was often used before
computers because it has exactly
the t distribution for degrees of
freedom n1 + n2 − 2.
However, the assumption of equal
variances is hard to check, and thus
the unequal variance test is safer.
Two Normally distributed populations
with unequal variances
35
Pooled Two-Sample Procedures
Suppose both populations are Normal and they have the same
standard deviations. The pooled estimator of σ2 is
2
2
(n
-1)s
+
(n
-1)s
1
2
2
s 2p = 1
(n1 + n2 - 2)
A level C confidence interval for µ1 − µ2 is
( x1 - x2 ) ± t * s p
1 1
+
n1 n2
where the degrees of freedom for t* are n1 + n2 − 2.
To test the hypothesis H0: µ1 = µ2 against a
one-sided or a two-sided alternative,
compute the pooled two-sample t statistic
for the t(n1 + n2 − 2) distribution.
x1 - x2
t=
1 1
sp
+
n1 n2
36

Two sample t
◦ Still use similar code as one sample t
◦ Add class command to designate two
groups or populations
Related documents