Ch6. Sampling distribution
Dr. Deshi Ye
yedeshi@zju.edu.cn
1
Outline
 Population and sample
 The sampling distribution of the mean ($\sigma$ known)
 The sampling distribution of the mean ($\sigma$ unknown)
 The sampling distribution of the variance
2/38
Statistics
 Descriptive statistics
 Inferential statistics
 Remarks: many thanks to Paul
Resnick for some slides
3/38
Inferential Statistics
 1. Involves:
 Estimation
 Hypothesis testing
 2. Purpose:
 Make inferences about population characteristics
4/38
Inference Process
[Figure: inference process. Population → Sample → Sample statistic ($\bar{X}$) → Estimates & tests → Population]
5/38
Key terms
 Population
 All items of interest
 Sample
 Portion of population
 Parameter
 Summary Measure about Population
 Statistic
 Summary Measure about sample
6/38
6.1 Population and Sample
 Population: we refer to a population in terms of its probability distribution or frequency distribution.
 "Population f(x)" means a population described by a frequency distribution or a probability distribution f(x).
 A population may be infinite, so that it is impossible to observe all its values; even when it is finite, it may be impractical or uneconomical to observe all of it.
7/38
Sample
 Sample: a part of the population.
 Random samples (why do we need them?): results obtained from a sample are useful only if the sample is in some way "representative".
 Negative examples: judging the performance of a tire that is tested only on smooth roads; estimating family incomes from data on homeowners only.
8/38
Sampling
 Representative sample
 Same characteristics as the population
 Random sample
 Every subset of a given size has an equal chance of being selected
9/38
Random sample
 Random sample: A set of observations $X_1, X_2, \ldots, X_n$ constitutes a random sample of size n from a finite population of size N if its values are chosen so that each subset of n of the N elements of the population has the same probability of being selected.
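The definition above can be tried out with a minimal sketch. It assumes Python's standard `random` module; the helper name `draw_random_sample` and the small population are hypothetical, chosen only for illustration.

```python
import random
from itertools import combinations

def draw_random_sample(population, n, seed=None):
    """Draw n distinct elements; random.sample selects without replacement,
    so every subset of size n is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

population = [1, 2, 3, 4]   # illustrative finite population, N = 4
n = 2

# Every subset of size n that could be selected.
subsets = list(combinations(population, n))
print(len(subsets))         # 6

print(draw_random_sample(population, n, seed=0))
```

With N = 4 and n = 2 there are 6 possible subsets, so a random sample in the sense of the definition picks each of them with probability 1/6.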
10/38
Discussion
 Ways of ensuring that the selection of a sample is at least approximately random
 For both finite and infinite populations
11/38
6.2 The sampling distribution
of the Mean ($\sigma$ known)
 Take a random sample of n observations and compute its mean $\bar{x}$.
 Take another random sample of n observations and compute its mean $\bar{x}$ as well.
 Probably no two of these means are alike.
12/38
Suppose There’s a
Population ...
Population Size, N = 4
Random Variable, x,
Is # Errors in Work
Values of x: 1, 2, 3, 4
All values equally likely
Estimate $\mu$ based on a
sample of two
13/38
Checking list
 What is the experiment corresponding to the random variable $X$?
 What is the experiment corresponding to the random variable $\bar{X}$?
 What is "the sampling distribution of the mean"?
14/38
Population Characteristics
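Summary Measures
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = 2.5$$
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = 1.12$$
Population Distribution
[Figure: bar chart of the population distribution over X = 1, 2, 3, 4; vertical axis from .0 to .3]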
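The two summary measures can be checked numerically. This is a small sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, pstdev

population = [1, 2, 3, 4]     # the slide's population, all values equally likely

mu = mean(population)         # population mean
sigma = pstdev(population)    # population standard deviation (divides by N)

print(mu, round(sigma, 2))    # 2.5 1.12
```

Note that `pstdev` divides by N, matching the population formula, whereas `stdev` would divide by N - 1.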
15/38
All Possible Samples
of Size n = 2
16 Samples
1st     2nd Observation
Obs     1     2     3     4
1      1,1   1,2   1,3   1,4
2      2,1   2,2   2,3   2,4
3      3,1   3,2   3,3   3,4
4      4,1   4,2   4,3   4,4
Sample with replacement
16/38
All Possible Samples
of Size n = 2
16 Samples                            16 Sample Means
1st     2nd Observation               1st     2nd Observation
Obs     1     2     3     4           Obs     1     2     3     4
1      1,1   1,2   1,3   1,4          1      1.0   1.5   2.0   2.5
2      2,1   2,2   2,3   2,4          2      1.5   2.0   2.5   3.0
3      3,1   3,2   3,3   3,4          3      2.0   2.5   3.0   3.5
4      4,1   4,2   4,3   4,4          4      2.5   3.0   3.5   4.0
Sample with replacement
17/38
Sampling Distribution
of All Sample Means
16 Sample Means
1st     2nd Observation
Obs     1     2     3     4
1      1.0   1.5   2.0   2.5
2      1.5   2.0   2.5   3.0
3      2.0   2.5   3.0   3.5
4      2.5   3.0   3.5   4.0
Sampling Distribution
[Figure: bar chart of $P(\bar{X})$ against $\bar{X}$ = 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0; vertical axis from .0 to .3]
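The sampling distribution can be rebuilt by brute force from the 16 equally likely samples. The following is a sketch under the slide's setup (population 1, 2, 3, 4, samples of size 2 with replacement); the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
n = 2

# All ordered samples of size n drawn with replacement (4**2 = 16 of them).
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Each sample is equally likely, so P(mean) = count / number of samples.
counts = Counter(means)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, p in sampling_dist.items():
    print(float(m), p)
```

The probabilities rise from 1/16 at 1.0 to 4/16 at 2.5 and fall back to 1/16 at 4.0.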
18/38
Comparison
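Population: $\mu = 2.5$, $\sigma = 1.12$
[Figure: bar chart of $P(X)$ against X = 1, 2, 3, 4; vertical axis from .0 to .3]
Sampling Distribution: $\mu_{\bar{x}} = 2.5$, $\sigma_{\bar{x}} = \sqrt{5/8} = 0.79$
[Figure: bar chart of $P(\bar{X})$ against $\bar{X}$ = 1, 1.5, 2, 2.5, 3, 3.5, 4; vertical axis from .0 to .3]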
19/38
EX
 Suppose that 50 random samples of size n = 10 are to be taken from a population having the discrete uniform distribution
$$f(x) = \begin{cases} \frac{1}{10} & \text{for } x = 0, 1, 2, \ldots, 9 \\ 0 & \text{elsewhere} \end{cases}$$
Sampling is with replacement, so that, so to speak, we are sampling from an infinite population.
20/38
Sample means
 We get 50 samples whose means are
4.4  3.1  3.0  5.3  3.6  3.2  5.3  3.0  5.5  2.7
5.0  3.8  4.6  4.8  4.0  3.5  4.3  5.8  6.4  5.0
4.1  3.3  4.6  4.9  2.6  4.4  5.0  4.0  6.5  4.2
3.6  4.9  3.7  3.5  4.4  6.5  4.8  5.2  4.5  5.6
5.3  3.1  3.7  4.9  4.7  4.4  5.3  3.8  5.3  4.3
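As a rough check, the 50 listed means can be compared with what theory predicts for samples of size 10 from the discrete uniform distribution on 0, 1, ..., 9. This is a sketch only: the theoretical values are computed directly from f(x), and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# The 50 sample means listed above.
sample_means = [
    4.4, 3.1, 3.0, 5.3, 3.6, 3.2, 5.3, 3.0, 5.5, 2.7,
    5.0, 3.8, 4.6, 4.8, 4.0, 3.5, 4.3, 5.8, 6.4, 5.0,
    4.1, 3.3, 4.6, 4.9, 2.6, 4.4, 5.0, 4.0, 6.5, 4.2,
    3.6, 4.9, 3.7, 3.5, 4.4, 6.5, 4.8, 5.2, 4.5, 5.6,
    5.3, 3.1, 3.7, 4.9, 4.7, 4.4, 5.3, 3.8, 5.3, 4.3,
]

# Population mean and variance of the discrete uniform distribution on 0..9.
values = range(10)
mu = sum(values) / 10
sigma2 = sum((x - mu) ** 2 for x in values) / 10
n = 10  # size of each sample

print("mean of the 50 sample means:     ", round(mean(sample_means), 3))
print("population mean:                 ", mu)
print("std. dev. of the 50 sample means:", round(stdev(sample_means), 3))
print("sigma / sqrt(n):                 ", round(sqrt(sigma2 / n), 3))
```

The observed mean and spread of the 50 values should land close to, but not exactly on, the theoretical values, since only 50 samples were drawn.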
21/38
Theorem
If a random sample of size n is taken from a population having the mean $\mu$ and the variance $\sigma^2$, then $\bar{X}$ is a random variable whose distribution has the mean $\mu$.
For samples from infinite populations, the variance of this distribution is
$$\frac{\sigma^2}{n}$$
For samples of size n taken without replacement from a finite population of size N, the variance is
$$\frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}$$
22/38
Central limit theorem
 If $\bar{X}$ is the mean of a sample of size n taken from a population having the mean $\mu$ and the finite variance $\sigma^2$, then
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
is a random variable whose distribution function approaches that of the standard normal distribution as $n \to \infty$.
23/38
Central Limit Theorem
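As sample size gets large enough ($n \ge 30$), the sampling distribution becomes almost normal, with
$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
[Figure: bell-shaped sampling distribution of $\bar{X}$]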
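One way to watch this convergence is to compute the exact distribution of the standardized sample mean for growing n and compare $P(|Z| \le 1.96)$ with the standard normal value of about 0.95. The sketch below assumes the discrete uniform population on 0, ..., 9 purely as an example; the function names are hypothetical.

```python
from math import sqrt

def sum_pmf(pmf, n):
    """Exact pmf of the sum of n independent draws from pmf (dict: value -> prob)."""
    total = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for s, ps in total.items():
            for x, px in pmf.items():
                nxt[s + x] = nxt.get(s + x, 0.0) + ps * px
        total = nxt
    return total

def prob_z_within(pmf, n, z=1.96):
    """P(|Z| <= z) for Z = (sample mean - mu) / (sigma / sqrt(n)), computed exactly."""
    mu = sum(x * p for x, p in pmf.items())
    sigma = sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))
    se = sigma / sqrt(n)
    return sum(p for s, p in sum_pmf(pmf, n).items() if abs((s / n - mu) / se) <= z)

digits = {x: 0.1 for x in range(10)}   # example population: uniform on 0..9
for n in (1, 2, 5, 30):
    print(n, round(prob_z_within(digits, n), 4))
```

For n = 1 every value lies within 1.96 standard deviations, so the probability is 1; as n grows the value settles near the normal figure.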
24/38
EX
 A 1-gallon can of paint covers on average 513.3 square feet, with a standard deviation of 31.5 square feet.
 Question: what is the probability that the sample mean area covered by a sample of 40 of these 1-gallon cans will be anywhere from 510 to 520 square feet?
25/38
Solution
 We need to find the area under the normal curve between
$$z = \frac{510 - 513.3}{31.5 / \sqrt{40}} = -0.6625$$
and
$$z = \frac{520 - 513.3}{31.5 / \sqrt{40}} = 1.34$$
From the table of the cumulative standard normal distribution,
$$F(-0.6625) = 0.2538, \qquad F(1.34) = 0.9099$$
Hence, the probability is
$$F(1.34) - F(-0.6625) = 0.6561$$
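The table lookup can be cross-checked in code. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `prob_mean_between` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo < sample mean < hi), treating the sample mean as normal
    with mean mu and standard deviation sigma / sqrt(n)."""
    se = sigma / sqrt(n)
    z = NormalDist()   # standard normal distribution
    return z.cdf((hi - mu) / se) - z.cdf((lo - mu) / se)

p = prob_mean_between(510, 520, mu=513.3, sigma=31.5, n=40)
print(round(p, 4))
```

The result differs slightly from 0.6561 in the later decimals because the code does not round the z values before evaluating the normal distribution function.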
26/38
Another example
 You're an operations analyst for AT&T. Long-distance telephone calls are normally distributed with $\mu = 8$ min. and $\sigma = 2$ min. If you select random samples of 25 calls, what percentage of the sample means would be between 7.8 and 8.2 minutes?
27/38
Solution
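$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{7.8 - 8}{2 / \sqrt{25}} = -.50$$
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{8.2 - 8}{2 / \sqrt{25}} = .50$$
[Figure: sampling distribution of $\bar{X}$ with $\sigma_{\bar{X}} = .4$, shaded between 7.8 and 8.2 around the mean 8; standardized normal distribution with $\sigma = 1$, shaded between -.50 and .50, with area .1915 + .1915 = .3830]
So about 38.3% of the sample means fall between 7.8 and 8.2 minutes.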
28/38
6.3 The sampling distribution
of the Mean ($\sigma$ unknown)
 If n is large, it doesn't matter whether $\sigma$ is known or not, as it is reasonable in that case to substitute the sample standard deviation s for it.
Question: what if n is small?
We need to make the assumption that the sample comes from a normal population.
29/38
Assumption: population having
normal distribution
 If $\bar{X}$ is the mean of a random sample of size n taken from a normal population having the mean $\mu$ and the variance $\sigma^2$, and
$$S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1},$$
then
$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
is a random variable having the t distribution with the parameter $\nu = n - 1$.
30/38
t-distribution
[Figure: t density with right-tail area $\alpha$ beyond the critical value $t_\alpha(n)$]
$$P(t > t_\alpha(n)) = \alpha$$
31/38
EX.
 A manufacturer of fuses claims that with a 20% overload, the fuses will blow in 12.4 minutes on average. To test this claim, a sample of 20 of the fuses was subjected to a 20% overload, and the times it took them to blow had a mean of 10.63 minutes and a standard deviation of 2.48 minutes.
Question: assuming that the data constitute a random sample from a normal population, do they tend to support or refute the manufacturer's claim?
32/38
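Solution
 First, we calculate
$$t = \frac{10.63 - 12.4}{2.48 / \sqrt{20}} = -3.19$$
Rule to reject the claim: the t value is larger than 2.86 or less than -2.86, where
$$P(t > 2.86) = 0.005$$
and
$$P(t < -2.86) = 0.005$$
Since t = -3.19 is less than -2.86, the data tend to refute the manufacturer's claim.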
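The arithmetic of this test can be reproduced with a short sketch. The critical value 2.86 is taken from the slide rather than computed, since the Python standard library has no t distribution; the function name is hypothetical.

```python
from math import sqrt

def t_statistic(sample_mean, claimed_mean, sample_sd, n):
    """t = (x_bar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - claimed_mean) / (sample_sd / sqrt(n))

t = t_statistic(sample_mean=10.63, claimed_mean=12.4, sample_sd=2.48, n=20)
critical = 2.86                     # from the slide: P(t > 2.86) = 0.005, nu = 19
reject_claim = abs(t) > critical    # two-sided rejection rule

print(round(t, 2), reject_claim)
```

Because |t| exceeds the critical value, the rule rejects the claimed mean of 12.4 minutes.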
33/38
6.4 The Sampling distribution of
the variance
 Theorem 6.4. If $S^2$ is the variance of a random sample of size n taken from a normal population having the variance $\sigma^2$, then
$$\chi^2 = \frac{(n - 1) S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2}$$
is a random variable having the chi-square distribution with the parameter $\nu = n - 1$.
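A quick simulation gives a sanity check of the theorem. It relies on a standard fact not stated on the slide, namely that a chi-square variable with $\nu$ degrees of freedom has mean $\nu$. The population parameters, sample size, and seed below are arbitrary assumptions.

```python
import random
from statistics import mean, variance

def chi_square_statistic(sample, sigma2):
    """(n - 1) * S^2 / sigma^2 for one sample; variance() divides by n - 1."""
    n = len(sample)
    return (n - 1) * variance(sample) / sigma2

rng = random.Random(0)                      # fixed seed so the run is repeatable
mu, sigma, n, reps = 0.0, 2.0, 5, 20000     # arbitrary normal population and sample size

stats = [
    chi_square_statistic([rng.gauss(mu, sigma) for _ in range(n)], sigma ** 2)
    for _ in range(reps)
]

print("average statistic:", round(mean(stats), 2), "expected nu:", n - 1)
```

The average of the simulated statistics should be close to n - 1 = 4, whatever values of mu and sigma are chosen.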
34/38
Chi-square distribution
[Figure: chi-square density with right-tail area $\alpha$ beyond the critical value $\chi^2_\alpha(n)$]
35/38
F distribution
 Theorem. If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$, respectively, taken from two normal populations having the same variance, then
$$F = \frac{S_1^2}{S_2^2}$$
is a random variable having the F distribution with the parameters $\nu_1 = n_1 - 1$, $\nu_2 = n_2 - 1$.
36/38
F distribution
[Figure: F density with right-tail area $\alpha$ beyond the critical value $F_\alpha(n_1, n_2)$]
37/38
Thanks!
38/38