Download Chapter 7

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
Chapter 7
Sampling Distributions
McGraw-Hill/Irwin
Copyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved.
Sampling Distributions
7.1
The Sampling Distribution of the
Sample Mean
7.2
The Sampling Distribution of the
Sample Proportion
7-2
Sampling Distribution of the Sample
Mean
The sampling distribution of the
sample mean x is the probability
distribution of the population of the
sample means obtainable from all
possible samples of size n from a
population of size N
7-3
Example 7.1: The Risk Analysis Case
• Population of the percent returns from
six grand prizes
• In order, the values of prizes (in $,000)
are 10, 20, 30, 40, 50 and 60
– Label each prize A, B, C, …, F in order of
increasing value
– The mean prize is $35,000 with a standard
deviation of $17,078
7-4
Example 7.1: The Risk Analysis Case
Continued
• Any one prize of these prizes is as likely
to be picked as any other of the six
– Uniform distribution with N = 6
– Each stock has a probability of being
picked of 1/6 = 0.1667
7-5
Example 7.1: The Risk Analysis Case #3
7-6
Example 7.1: The Risk Analysis Case #4
• Now, select all possible samples of size
n = 2 from this population of size N = 6
– Select all possible pairs of prizes
• How to select?
– Sample randomly
– Sample without replacement
– Sample without regard to order
7-7
Example 7.1: The Risk Analysis Case #5
• Result: There are 15 possible samples
of size n = 2
• Calculate the sample mean of each and
every sample
7-8
Example 7.1: The Risk Analysis Case #6
Sample
Mean
15
20
25
30
35
40
45
50
55
Relative
Frequency Frequency
1
1/15
1
1/15
2
2/15
2
2/15
3
3/15
2
2/15
2
2/15
1
1/15
1
1/15
7-9
Observations
• Although the population of N = 6 grand
prizes has a uniform distribution, …
• … the histogram of n = 15 sample mean
prizes:
– Seems to be centered over the sample
mean return of 35,000, and
– Appears to be bell-shaped and less spread
out than the histogram of individual returns
7-10
Example: Sampling All Stocks
• Population of returns of all 1,815 stocks listed
on NYSE for 1987
– See Figure on next slide
– The mean rate of return m was –3.5% with a
standard deviation s of 26%
• Draw all possible random samples of size
n=5 and calculate the sample mean return of
each
– Sample with a computer
– See Figure on next slide
7-11
Example: Sampling All Stocks
7-12
Results from Sampling All Stocks
• Observations
– Both histograms appear to be bell-shaped and
centered over the same mean of –3.5%
– The histogram of the sample mean returns looks
less spread out than that of the individual returns
• Statistics
– Mean of all sample means: µx = µ = -3.5%
– Standard deviation of all possible means:
x 

n

26
 11 .63 %
5
7-13
General Conclusions
1. If the population of individual items is
normal, then the population of all
sample means is also normal
2. Even if the population of individual
items is not normal, there are
circumstances when the population of
all sample means is normal (Central
Limit Theorem)
7-14
General Conclusions
Continued
3. The mean of all possible sample means
equals the population mean
–
That is, m = mx
4. The standard deviation x of all sample
means is less than the standard deviation of
the population
–
That is, x < 
–
Each sample mean averages out the high and
the low measurements, and so are closer to m
than many of the individual population
measurements
7-15
The Empirical Rule
• The empirical rule holds for the
sampling distribution of the sample
mean
– 68.26% of all possible sample means are
within (plus or minus) one standard
deviation x of m
– 95.44% of all possible observed values of x
are within (plus or minus) two x of m
– 99.73% of all possible observed values of x
are within (plus or minus) three x of m
7-16
Properties of the Sampling Distribution
of the Sample Mean
• If the population being sampled is
normal, then so is the sampling
distribution of the sample mean, x
• The mean x of the sampling
distribution of x is mx = m
– That is, the mean of all possible sample
means is the same as the population mean
7-17
Properties of the Sampling
Distribution of the Sample Mean #2
• The variance 2x of the sampling
distribution of x is
 
2
x

2
n
• That is, the variance of the sampling
distribution of x is
– Directly proportional to the variance of the
population
– Inversely proportional to the sample size
7-18
Properties of the Sampling
Distribution of the Sample Mean #3
• The standard deviation x of the
sampling distribution of x is
x 

n
• That is, the standard deviation of the
sampling distribution of x is
– Directly proportional to the standard
deviation of the population
– Inversely proportional to the square root of
the sample size
7-19
Notes
• The formulas for 2x and x hold if the sampled
population is infinite
• The formulas hold approximately if the sampled
population is finite but if N is much larger (at least
20 times larger) than the n (N/n ≥ 20)
– x is the point estimate of m, and the larger the
sample size n, the more accurate the estimate
– Because as n increases, x decreases as 1/√n
• Additionally, as n increases, the more
representative is the sample of the population
– So, to reduce x, take bigger samples!
7-20
Example 7.2: Car Mileage Case
• Population of all midsize cars of a particular
make and model
– Population is normal with mean µ and standard
deviation σ
– Draw all possible samples of size n
– Then the sampling distribution of the sample mean
is normal with mean µx = µ and standard
deviation  x   n
– In particular, draw samples of size:
• n=5
• n = 50
7-21
Effect of Sample Size
So, the different possible sample means will be more
closely clustered around m if n = 50 than if n =5
7-22
Example 7.2: Car Mileage Statistical
Inference
• Recall from Chapter 3 mileage
example, x = 31.56 mpg for a sample of
size n=50 and s = 0.8
• If the population mean µ is exactly 31,
what is the probability of observing a
sample mean that is greater than or
equal to 31.56?
7-23
Example 7.2: Car Mileage Statistical
Inference #2
• Calculate the probability of observing a
sample mean that is greater than or
equal to 31.6 mpg if µ = 31 mpg
– Want P(x > 31.5531 if µ = 31)
• Use s as the point estimate for  so that
x 

0.8

 0.113
n
50
7-24
Example 7.2: Car Mileage Statistical
Inference #3
• Then

31.56  m x 

Px  31.56 if m  31  P z 
x


31.56  31 

 P z 

0.113 

 P z  4.96
• But z = 4.96 is off the standard normal
table
• The largest z value in the table is 3.99,
which has a right hand tail area of
0.00003
7-25
Probability That x > 31.56 When µ = 31
7-26
Example 7.2: Car Mileage Statistical
Inference #4
• z = 4.96 > 3.99, so P(z ≥ 4.84) < 0.00003
– If m = 31 mpg, fewer than 3 in 100,000 of all
samples have a mean at least as large as
observed
• Have either of the following explanations:
– If m is actually 31 mpg, then very unlucky in
picking this sample
– Not unlucky, m is not 31 mpg, but is really larger
• Difficult to believe such a small chance would
occur, so conclude that there is strong
evidence that m does not equal 31 mpg
– Also, m is, in fact, actually larger than 31 mpg
7-27
Central Limit Theorem
• Now consider sampling a non-normal
population
• Still have: mx=m and x=/n
– Exactly correct if infinite population
– Approximately correct if population size N finite
but much larger than sample size n
• But if population is non-normal, what is the
shape of the sampling distribution of the
sample mean?
– The sampling distribution is approximately normal
if the sample is large enough, even if the
population is non-normal (Central Limit Theorem)
7-28
The Central Limit Theorem #2
• No matter what is the probability distribution
that describes the population, if the sample
size n is large enough, then the population of
all possible sample means is approximately
normal with mean mx=m and standard
deviation x=/n
• Further, the larger the sample size n, the
closer the sampling distribution of the sample
mean is to being normal
– In other words, the larger n, the better the
approximation
7-29
The Central Limit Theorem #3
Random Sample (x1, x2, …, xn)
x
X
as n  large
Population Distribution
(m, )
(right-skewed)
m
Sampling
Distribution of
Sample Mean
x
 m, x  
n

(nearly normal)
7-30
Example: Effect of Sample Size
The larger the sample
size, the more nearly
normally distributed is
the population of all
possible sample means
Also, as the sample
size increases, the
spread of the sampling
distribution decreases
7-31
How Large?
• How large is “large enough?”
• If the sample size is at least 30, then for
most sampled populations, the sampling
distribution of sample means is
approximately normal
– Here, if n is at least 30, it will be assumed
that the sampling distribution of x is
approximately normal
• If the population is normal, then the
sampling distribution of x is normal
regardless of the sample size
7-32
Example: Central Limit Theorem
Simulation
7-33
Unbiased Estimates
• A sample statistic is an unbiased point
estimate of a population parameter if the
mean of all possible values of the sample
statistic equals the population parameter
• x is an unbiased estimate of m because mx=m
– In general, the sample mean is always an
unbiased estimate of m
– The sample median is often an unbiased estimate
of m but not always
7-34
Unbiased Estimates
Continued
• The sample variance s2 is an unbiased
estimate of 2
– That is why s2 has a divisor of n – 1 and
not n
• However, s is not an unbiased estimate
of 
– Even so, the usual practice is to use s as
an estimate of 
7-35
Minimum Variance Estimates
• Want the sample statistic to have a
small standard deviation
– All values of the sample statistic should be
clustered around the population parameter
• Then, the statistic from any sample should be
close to the population parameter
7-36
Minimum Variance Estimates
Continued
• Given a choice between unbiased
estimates, choose one with smallest
standard deviation
– The sample mean and the sample median
are both unbiased estimates of m
– The sampling distribution of sample means
generally has a smaller standard deviation
than that of sample medians
7-37
Finite Populations
• If a finite population of size N is sampled
randomly and without replacement, must use
the “finite population correction” to calculate
the correct standard deviation of the sampling
distribution of the sample mean
– If N is less than 20 times the sample size, that is,
if N < 20  n
– Otherwise
x 

n
but instead  x 

n
7-38
Finite Populations Continued
• The finite population correction is
N n
N 1
• The standard error is
x 

n
N n
N 1
7-39
Stock Example
• In the example of sampling n = 2 stock
percent returns from N = 6 stocks
– Here, N = 3 • n, so to calculate σx, must use the
finite population correction

N  n  17.078  6  2
x 


n N 1 
2  6 1
 12.076  0.8944  10.8
 x  10.8%
– Note that the finite population correction is
substantially less than one
7-40