Chapters 14 & 16: Introduction to
Statistical Inferences
[Figure: the level of confidence $1 - \alpha$, the maximum error $E$, and the sample size $n$ are linked by $E = z(\alpha/2)\,\sigma/\sqrt{n}$]
Chapter Goals
• Learn the basic concepts of estimation
• Consider questions about a population mean using
two methods that assume the population standard
deviation is known
• Consider: what value or interval of values can we
use to estimate a population mean?
The Nature of Estimation
• Discuss estimation more precisely
• What makes a statistic good?
• Assume the population standard deviation, $\sigma$, is
known throughout this chapter
• Concentrate on learning the procedures for
making statistical inferences about a population
mean $\mu$
Point Estimate for a Parameter
Point Estimate for a Parameter: The value of the
corresponding statistic
Example: $\bar{x} = 14.7$ is a point estimate (single number
value) for the mean $\mu$ of the sampled population
How good is the point estimate? Is it high? Or
low? Would another sample yield the same
result?
Note: The quality of an estimation procedure is enhanced if
the sample statistic is both less variable and unbiased
Unbiased Statistic
Unbiased Statistic: A sample statistic whose sampling
distribution has a mean value equal to the value of the
population parameter being estimated. A statistic that is
not unbiased is a biased statistic.
Example: The figures on the next slide illustrate the
concept of being unbiased and the effect of
variability on a point estimate
Assume A is the parameter being estimated
Illustrations
[Figure: panels labeled "Negative bias (underestimate)", "Unbiased (on-target estimate)", and "Positive bias (overestimate)", each positioned relative to the parameter A and shown with high and low variability]
Notes
1. The sample mean, $\bar{x}$, is an unbiased statistic because the
mean value of its sampling distribution is equal to the
population mean: $\mu_{\bar{x}} = \mu$
2. Sample means vary from sample to sample. We don’t
expect the sample mean to be exactly equal to the population
mean $\mu$.
3. We do expect the sample mean to be close to the population
mean
4. Since closeness is measured in standard deviations of the
sampling distribution, we expect the sample mean to be within
2 standard deviations of the population mean about 95% of the time
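The notes above can be checked with a small simulation. This is only a sketch: it assumes a normal population with hypothetical values (mean 50, standard deviation 10) and samples of size 25, none of which come from the slides. It draws many samples and looks at how the sample means behave.

```python
import random
import statistics

# Hypothetical population and sample size, chosen only for illustration.
MU = 50.0
SIGMA = 10.0
N = 25
TRIALS = 5000


def simulate_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` random samples of size n and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(trials)
    ]


means = simulate_sample_means(MU, SIGMA, N, TRIALS)

center = statistics.fmean(means)   # should be close to MU (unbiased)
spread = statistics.stdev(means)   # should be close to SIGMA / sqrt(N)
std_error = SIGMA / N ** 0.5       # theoretical value: 2.0

# Share of sample means within 2 standard errors of the population mean.
within_two = sum(abs(m - MU) <= 2 * std_error for m in means) / len(means)

print(f"mean of sample means: {center:.3f}")
print(f"std. dev. of sample means: {spread:.3f} (theory: {std_error:.3f})")
print(f"share within 2 standard errors: {within_two:.3f}")
```

Under these assumptions the average of the sample means should land near 50, their standard deviation near 2, and roughly 95% of them within 2 standard errors of the population mean.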
Important Definitions
Interval Estimate: An interval bounded by two values and used to
estimate the value of a population parameter. The values that bound
this interval are statistics calculated from the sample that is being
used as the basis for the estimation.
Level of Confidence $1 - \alpha$: The probability that the sample to be
selected yields an interval that includes the parameter being
estimated
Confidence Interval: An interval estimate with a specified level of
confidence
Summary
• To construct a confidence interval for a population mean $\mu$, use
the central limit theorem (CLT)
• Use the point estimate $\bar{x}$ as the central value of an interval
• Since the sample mean ought to be within 2 standard deviations of
the population mean (95% of the time), we can find the bounds to
an interval centered at $\bar{x}$:
$\bar{x} - 2\sigma_{\bar{x}}$ to $\bar{x} + 2\sigma_{\bar{x}}$
• The level of confidence for the resulting interval is approximately
95%, or 0.95
• We can be more accurate in determining the level of confidence
Illustration
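[Figure: distribution of $\bar{x}$ centered at $\mu$, with the interval from $\bar{x} - 2\sigma_{\bar{x}}$ to $\bar{x} + 2\sigma_{\bar{x}}$ marked around an observed $\bar{x}$]
• The interval $\bar{x} - 2\sigma_{\bar{x}}$ to $\bar{x} + 2\sigma_{\bar{x}}$ is an approximate 95%
confidence interval for the population mean $\mu$ based on this $\bar{x}$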
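As a rough sketch of the "2 standard deviations" rule, the snippet below builds the approximate interval for hypothetical numbers (a sample mean of 100, a population standard deviation of 15, and a sample size of 36, all assumed for illustration). It also computes the exact normal probability of falling within 2 standard deviations, which shows why 95% is only approximate.

```python
from math import sqrt
from statistics import NormalDist


def two_sigma_interval(x_bar, sigma, n):
    """Approximate 95% interval: x_bar plus or minus 2 standard errors."""
    std_error = sigma / sqrt(n)  # standard deviation of the sample mean
    return x_bar - 2 * std_error, x_bar + 2 * std_error


# Hypothetical values, not taken from the slides.
lower, upper = two_sigma_interval(x_bar=100.0, sigma=15.0, n=36)
print(f"approximate 95% interval: {lower} to {upper}")

# Exact area under the standard normal curve between -2 and 2.
standard_normal = NormalDist()
coverage = standard_normal.cdf(2) - standard_normal.cdf(-2)
print(f"exact level for a multiplier of 2: {coverage:.4f}")
```

The exact level for a multiplier of 2 is about 0.9545, slightly above 0.95, which is why a more precise multiplier can be used when an exact level of confidence is wanted.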
Estimation of Mean $\mu$ ($\sigma$ Known)
• Formalize the interval estimation process as it
applies to estimating the population mean $\mu$ based on
a random sample
• Assume the population standard deviation $\sigma$ is
known
• The assumptions are the conditions that need to exist
in order to correctly apply a statistical procedure
The Assumption...
The assumption for estimating the mean $\mu$ using a known $\sigma$:
The sampling distribution of $\bar{x}$ has a normal distribution
Assumption satisfied by:
1. Knowing that the sampled population is normally distributed, or
2. Using a large enough random sample (CLT)
Note: The CLT may be applied to smaller samples (for example
n = 15) when there is evidence to suggest a unimodal distribution
that is approximately symmetric. If there is evidence of skewness,
the sample size needs to be much larger.
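The $1 - \alpha$ Confidence Interval of $\mu$
• A $1 - \alpha$ confidence interval for $\mu$ is found by
$\bar{x} - z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$ to $\bar{x} + z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$
Notes:
1. $\bar{x}$ is the point estimate and the center point of the
confidence interval
2. $z(\alpha/2)$: confidence coefficient, the number of multiples of the
standard error needed to construct an interval estimate of the
correct width to have a level of confidence $1 - \alpha$
[Figure: standard normal curve with central area $1 - \alpha$ between $-z(\alpha/2)$ and $z(\alpha/2)$, and area $\alpha/2$ in each tail]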
Notes Continued
3. $\sigma/\sqrt{n}$: standard error of the mean
The standard deviation of the distribution of $\bar{x}$
4. $z(\alpha/2)\,(\sigma/\sqrt{n})$: maximum error of estimate $E$
One-half the width of the confidence interval (the product
of the confidence coefficient and the standard error)
5. $\bar{x} - z(\alpha/2)\,(\sigma/\sqrt{n})$: lower confidence limit (LCL)
$\bar{x} + z(\alpha/2)\,(\sigma/\sqrt{n})$: upper confidence limit (UCL)
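The pieces named in these notes (confidence coefficient, standard error, maximum error, LCL, UCL) can be sketched in a few lines of Python. The function names here are illustrative, not from the slides, and the sketch assumes the normality assumption holds and that the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist


def confidence_coefficient(confidence):
    """Return z(alpha/2): the z value with area alpha/2 to its right."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(x_bar, sigma, n, confidence):
    """Return (LCL, UCL) for the mean when sigma is known."""
    z = confidence_coefficient(confidence)
    standard_error = sigma / sqrt(n)   # sigma / sqrt(n)
    max_error = z * standard_error     # E, half the interval width
    return x_bar - max_error, x_bar + max_error


# Confidence coefficients for a few common levels of confidence.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.2f}: z = {confidence_coefficient(level):.4f}")

# Hypothetical inputs, chosen only to show the call.
lcl, ucl = z_interval(x_bar=50.0, sigma=10.0, n=25, confidence=0.95)
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```

Computed coefficients can differ slightly from printed tables. For example, the 90% value is about 1.645, which tables round to either 1.64 or 1.65.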
The Confidence Interval
A Five-Step Model:
1. Describe the population parameter of concern
2. Specify the confidence interval criteria
a. Check the assumptions
b. Identify the probability distribution and the formula to be used
c. Determine the level of confidence, $1 - \alpha$
3. Collect and present sample information
4. Determine the confidence interval
a. Determine the confidence coefficient
b. Find the maximum error of estimate
c. Find the lower and upper confidence limits
5. State the confidence interval
Example
Example: The weights of full boxes of a certain kind of cereal are normally
distributed with a standard deviation of 0.27 oz. A sample of 18
randomly selected boxes produced a mean weight of 9.87 oz. Find a
95% confidence interval for the true mean weight of a box of this
cereal.
Solution:
1. Describe the population parameter of concern
The mean weight, $\mu$, of all boxes of this cereal
2. Specify the confidence interval criteria
a. Check the assumptions
The weights are normally distributed, so the distribution of $\bar{x}$ is normal
b. Identify the probability distribution and formula to be used
Use the standard normal variable $z$ with $\sigma = 0.27$
c. Determine the level of confidence, $1 - \alpha$
The question asks for 95% confidence: $1 - \alpha = 0.95$
Solution Continued
3. Collect and present information
The sample information is given in the statement of the problem
Given: $n = 18$; $\bar{x} = 9.87$
4. Determine the confidence interval
a. Determine the confidence coefficient
The confidence coefficient is found using Table A or C:
$z(\alpha/2)$:  1.15  1.28  1.65  1.96  2.33  2.58
$1 - \alpha$:   0.75  0.80  0.90  0.95  0.98  0.99
For $1 - \alpha = 0.95$: $z(\alpha/2) = z(0.025) = 1.96$
Solution Continued
b. Find the maximum error of estimate
Use the maximum error part of the formula for a CI:
$E = z(\alpha/2)\,\frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{0.27}{\sqrt{18}} = 0.1247$
c. Find the lower and upper confidence limits
Use the sample mean and the maximum error:
$\bar{x} - z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$ to $\bar{x} + z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$
$9.87 - 0.1247$ to $9.87 + 0.1247$
$9.7453$ to $9.9947$
$9.75$ to $9.99$
5. State the confidence interval
9.75 to 9.99 is a 95% confidence interval for the true mean weight, $\mu$, of
cereal boxes
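The arithmetic in the cereal example can be reproduced with a short script. This is a sketch that uses the values given in the problem (standard deviation 0.27, sample size 18, sample mean 9.87, 95% confidence); the variable names are my own. Note that the upper limit, 9.9947, rounds to 9.99 at two decimal places.

```python
from math import sqrt
from statistics import NormalDist

# Values stated in the cereal example.
sigma = 0.27        # known population standard deviation (oz)
n = 18              # sample size
x_bar = 9.87        # sample mean (oz)
confidence = 0.95   # level of confidence, 1 - alpha

# Confidence coefficient z(alpha/2); about 1.96 for 95% confidence.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

max_error = z * sigma / sqrt(n)   # E = z(alpha/2) * sigma / sqrt(n)
lower = x_bar - max_error         # lower confidence limit
upper = x_bar + max_error         # upper confidence limit

print(f"z = {z:.2f}, E = {max_error:.4f}")   # z = 1.96, E = 0.1247
print(f"{lower:.2f} to {upper:.2f}")         # 9.75 to 9.99
```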
Example
Example: A random sample of the test scores of 100 applicants for clerk-typist
positions at a large insurance company showed a mean score of
72.6. Determine a 99% confidence interval for the mean score of all
applicants at the insurance company. Assume the standard deviation
of test scores is 10.5.
Solution:
1. Parameter of concern
The mean test score, $\mu$, of all applicants at the insurance company
2. Confidence interval criteria
a. Assumptions: The distribution of the variable, test score, is not known.
However, the sample size is large enough ($n = 100$) so that the CLT applies
b. Probability distribution: standard normal variable $z$ with $\sigma = 10.5$
c. The level of confidence: 99%, or $1 - \alpha = 0.99$
Solution Continued
3. Sample information
Given: $n = 100$ and $\bar{x} = 72.6$
4. The confidence interval
a. Confidence coefficient: $z(\alpha/2) = z(0.005) = 2.58$
b. Maximum error: $E = z(\alpha/2)\,(\sigma/\sqrt{n}) = (2.58)(10.5/\sqrt{100}) = 2.709$
c. The lower and upper limits:
$72.6 - 2.709 = 69.891$ to $72.6 + 2.709 = 75.309$
5. Confidence interval
With 99% confidence we say, “The mean test score is between 69.9 and 75.3”,
or “69.9 to 75.3 is a 99% confidence interval for the true mean test score”
Note: The confidence is in the process. 99% confidence means: if we conduct
the experiment over and over, and construct lots of confidence intervals,
then about 99% of those confidence intervals will contain the true mean value $\mu$.
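The note that "the confidence is in the process" can be illustrated by simulation. The sketch below assumes a normal population with a hypothetical true mean of 70 (the slides do not give the true mean or the shape of the population) and reuses the example's standard deviation of 10.5 and sample size of 100. It repeats the sampling many times and counts how often the 99% interval captures the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_rate(mu, sigma, n, confidence, trials, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    max_error = z * sigma / sqrt(n)   # same E for every sample (sigma known)
    hits = 0
    for _ in range(trials):
        x_bar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if x_bar - max_error <= mu <= x_bar + max_error:
            hits += 1
    return hits / trials


# mu = 70.0 is a hypothetical true mean, assumed only for this simulation.
rate = coverage_rate(mu=70.0, sigma=10.5, n=100, confidence=0.99, trials=2000)
print(f"share of intervals containing the true mean: {rate:.3f}")
```

Any single interval either contains the true mean or it does not. The 99% describes how often the procedure succeeds over many repetitions, so the printed share should be close to 0.99.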
Sample Size
• Problem: Find the sample size necessary in order to
obtain a specified maximum error and level of
confidence (assume the standard deviation is known)
$E = z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$
Solve this expression for $n$:
$n = \left(\frac{z(\alpha/2)\,\sigma}{E}\right)^2$
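The algebra behind solving for the sample size is short. Written out step by step, using only the symbols from the maximum error formula:

```latex
\begin{align*}
E &= z(\alpha/2)\,\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{z(\alpha/2)\,\sigma}{E} \\
n &= \left(\frac{z(\alpha/2)\,\sigma}{E}\right)^{2}
\end{align*}
```

Because $n$ grows with the square of $1/E$, halving the maximum error requires roughly four times as many observations.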
Example
Example: Find the sample size necessary to estimate a population
mean to within 0.5 with 95% confidence if the standard
deviation is 6.2
Solution:
$n = \left(\frac{z(\alpha/2)\,\sigma}{E}\right)^2 = \left(\frac{(1.96)(6.2)}{0.5}\right)^2 = [24.304]^2 = 590.684$
Therefore, $n = 591$
Note: When solving for sample size $n$, always round up to the
next larger integer (Why?)
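A small sketch of the sample size calculation, using the numbers from this example (maximum error 0.5, standard deviation 6.2, and the table value 1.96 for 95% confidence). The function name is illustrative. It also checks the maximum error at the rounded-up size and at one observation fewer, which suggests an answer to the "Why?" above.

```python
from math import ceil, sqrt


def required_sample_size(max_error, sigma, z):
    """Smallest integer n with z * sigma / sqrt(n) <= max_error."""
    return ceil((z * sigma / max_error) ** 2)


n = required_sample_size(max_error=0.5, sigma=6.2, z=1.96)
print(n)  # 591

# Maximum error actually achieved with n observations and with n - 1.
error_at_n = 1.96 * 6.2 / sqrt(n)
error_below = 1.96 * 6.2 / sqrt(n - 1)
print(f"E with n = {n}: {error_at_n:.4f}")
print(f"E with n = {n - 1}: {error_below:.4f}")
```

With 590 observations the maximum error is slightly above 0.5, so rounding down would miss the stated precision. Rounding up is the only way to guarantee it.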