A Mathematical View
of Our World
1st ed.
Parks, Musser, Trimpe,
Maurer, and Maurer
Chapter 11
Inferential Statistics
Section 11.1
Normal Distributions
• Goals
• Study normal distributions
• Study standard normal distributions
• Find the area under a standard normal
curve
11.1 Initial Problem
• A class of 90 students had a mean test
score of 74, with a standard deviation of
8.
• If the professor curves the scores, how
many students will get As and how many
will get Fs?
• The solution will be given at the end of the section.
Statistical Inference
• The process of making predictions
about an entire population based on
information from a sample is called
statistical inference.
Data Distributions
• For large data sets, a smooth curve can
often be used to approximate the histogram.
Data Distributions, cont’d
• The larger the data set and the smaller the
bin size, the better the approximation of the
smooth curve.
Example 1
• The distribution of weights for a large
sample of college men is shown.
Example 1, cont’d
• What percent of the men have weights
between:
a) 167 and 192 pounds?
b) 137 and 192 pounds?
c) 137 and 222 pounds?
Example 1, cont’d
• Solution:
a) 167 and 192 pounds?
• The area under the curve is 0.2, so 20%
of the men are in this weight range.
Example 1, cont’d
• Solution:
b) 137 and 192 pounds?
• The area under the curve is 0.4, so 40%
of the men are in this weight range.
Example 1, cont’d
• Solution:
c) 137 and 222 pounds?
• The area under the curve is 0.6, so 60%
of the men are in this weight range.
Normal Distributions
• Data that has a symmetric, bell-shaped
distribution curve is said to have a normal
distribution.
• The mean and standard deviation determine the
exact shape and position of the curve.
Example 2
a) Which normal curve has the largest mean?
b) Which normal curve has the largest
standard deviation?
Example 2, cont’d
a) Solution: The data sets are already labeled
in order of smallest mean to largest mean.
• Data Set III has the largest mean.
Example 2, cont’d
b) Solution: Data Set III has the largest
standard deviation because it is the
shortest, widest curve.
• The order of the standard deviations is II, I, III.
Normal Distributions, cont’d
• Normal distributions with various means
and standard deviations are shown on
the following slides.
Normal Distributions, cont’d
Standard Normal Distribution
• The normal distribution with a mean of 0 and
a standard deviation of 1 is called the
standard normal distribution.
Standard Normal Distribution, cont’d
• The areas under any normal distribution
can be compared to the areas under
the standard normal distribution, as
shown in the figure on the next slide.
Standard Normal Distribution, cont’d
Area
• One way to find the area under a region
of the standard normal curve is to use a
table.
• Tables of values for the standard normal
curve are printed in textbooks to eliminate
the need to do repeated complicated
calculations.
Example 3
• What fraction of the total area under the
standard normal curve lies between
a = -0.5 and b = 1.5?
Example 3, cont’d
• Solution: Find a = -0.5 and b = 1.5 in the
table.
Example 3, cont’d
• Solution, cont’d: The value in the table is
0.6247.
• A total of 62.47% of the area is shaded.
• In any normal distribution, 62.47% of the data
lies between 0.5 standard deviations below
the mean and 1.5 standard deviations above
the mean.
• The probability that a randomly selected data
value will lie between -0.5 and 1.5 is 62.47%.
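In place of a printed table, the area under the standard normal curve between two values can be computed directly. The sketch below is one way to do this in Python with the standard library's error function; the helper names `phi` and `area_between` are illustrative and do not come from the text.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(a, b):
    """Fraction of the total area lying between a and b (with a <= b)."""
    return phi(b) - phi(a)

# Example 3: the region between a = -0.5 and b = 1.5
print(round(area_between(-0.5, 1.5), 4))  # 0.6247
```

The printed value agrees with the table entry used in Example 3.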
Example 4
• What percent of the data in a standard
normal distribution lies between 0.5 and
2.5?
Example 4, cont’d
• Solution: The value in the table for
a = 0.5 and b = 2.5 is 0.3023.
• So 30.23% of the data in a standard
normal distribution lies between 0.5 and
2.5.
Areas, cont’d
• Because the normal curve is
symmetric, the areas in the previous
table are repeated.
Areas, cont’d
• Figure 11.11 and Table 11.2
Areas, cont’d
• A more common type of table:
Example 5
• Find the percent of data points in a standard
normal distribution that lie between z = -1.8
and z = 1.3.
Example 5, cont’d
• Solution: Find the two areas in Table
11.3 and add them together.
• The area from 0 to 1.3 is 0.4032.
• The area from 0 to -1.8 is 0.4641.
• The total shaded area is 0.4032 + 0.4641
= 0.8673.
Example 6
• Find the percent of data points in a standard
normal distribution that lie between z = 1.2
and z = 1.7.
Example 6, cont’d
• Solution: Find the two areas in Table
11.3 and subtract them.
• The area from 0 to 1.2 is 0.3849.
• The area from 0 to 1.7 is 0.4554.
• The total shaded area is 0.4554 - 0.3849
= 0.0705.
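The add-or-subtract procedure of Examples 5 and 6 can be sketched in Python. The function `table_area` below is a hypothetical stand-in for a lookup in a 0-to-z table such as Table 11.3; it computes the same quantity from the error function rather than reading printed values.

```python
import math

def table_area(z):
    """Area between 0 and z, as a 0-to-z table lists it (uses |z| by symmetry)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))

# Example 5: z = -1.8 and z = 1.3 lie on opposite sides of 0, so add.
opposite_sides = table_area(-1.8) + table_area(1.3)

# Example 6: z = 1.2 and z = 1.7 lie on the same side of 0, so subtract.
same_side = table_area(1.7) - table_area(1.2)

print(round(opposite_sides, 4), round(same_side, 4))
```

Rounded to four places, the two results agree with the 0.8673 and 0.0705 found from the table.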
Question:
The value from the table associated with z =
2.1 is 0.4821. To find the percentage of data
values less than -2.1 in a standard normal
distribution, what do you need to do?
a. Add the table value to 0.5.
b. Subtract the table value from 0.5.
c. The table value is the answer.
d. Divide the table value in half.
Question:
What percentage of data values lie
between z = -1.2 and z = -0.7 in a
standard normal distribution?
a. 11.51%
b. 62.49%
c. 12.69%
d. 24.20%
11.1 Initial Problem Solution
• A class of 90 students had a mean test score
of 74 with a standard deviation of 8 points.
• The test will be curved so that all students
whose scores are at least 1.5 standard
deviations above or below the mean will
receive As and Fs, respectively.
• How many students will get As and how many
will get Fs?
Initial Problem Solution, cont’d
• Because the class is large, it is likely the
scores have a normal distribution.
• If the scores are curved:
• The mean of 74 will correspond to a score of 0 in
the standard normal distribution.
• A score that is 1.5 standard deviations above the
mean will correspond to a score of +1.5 in the
standard normal distribution, while a score that is
1.5 standard deviations below the mean will
correspond to a score of -1.5.
Initial Problem Solution, cont’d
• The percentage of As is the same as the area
to the right of z = 1.5 in the standard normal
distribution.
• Approximately 43.32% of the area is between 0
and 1.5.
• Since 50% of the area is to the right of 0, the
area above 1.5 is 50% - 43.32% = 6.68%.
• Thus, 6.68% of the students, or approximately 6
students, will receive As.
Initial Problem Solution, cont’d
• The percentage of Fs is the same as the area
to the left of z = -1.5 in the standard normal
distribution.
• Because of the symmetry of the normal
distribution, this is the same as the area above z
= 1.5, so the calculations are the same as in the
last step.
• Thus, 6.68% of the students, or approximately 6
students, will receive Fs.
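The tail-area reasoning in this solution can be checked numerically. The following is a minimal sketch, assuming (as the solution does) that the scores are normally distributed; the names `upper_tail`, `share_of_as`, and `share_of_fs` are illustrative.

```python
import math

def upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

class_size = 90
share_of_as = upper_tail(1.5)   # area to the right of z = 1.5
share_of_fs = share_of_as       # by symmetry, area to the left of z = -1.5

print(round(share_of_as, 4), round(class_size * share_of_as, 1))
```

Multiplying the tail area by the class size gives about 6 students in each tail, matching the counts above.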
Section 11.2
Applications of Normal
Distributions
• Goals
• Study normal distribution applications
• Use the 68-95-99.7 Rule
• Use the population z-score
11.2 Initial Problem
• Two suppliers make an engine part.
• Supplier A charges $120 for 100 parts which
have a standard deviation of 0.004 mm from
the mean size.
• Supplier B charges $90 for 100 parts which
have a standard deviation of 0.012 mm from
the mean size.
• Which supplier is a better choice?
• The solution will be given at the end of the
section.
Normal Distributions
• If a data set is represented by a
normal distribution with mean μ and
standard deviation σ, the percentage
of the data between μ + rσ and μ + sσ
is the same as the percentage of the
data in a standard normal distribution
that lies between r and s.
Normal Distributions, cont’d
Example 1
• Approximately 10% of the data in a
standard normal distribution lies within 1/8
of a standard deviation from the mean.
• Within 1/8 means between -0.125 and 0.125.
• Suppose the measurements of a certain
population are normally distributed with a
mean of 112 and standard deviation of 24.
What values correspond to the interval
given above?
Example 1, cont’d
• Solution: In the standard normal distribution
we are considering the interval from r = -0.125 to s = 0.125.
• For the nonstandard distribution, the interval will
be 112 + (-0.125)(24) = 109 to
112 + (0.125)(24) = 115.
• We know that 10% of the data values will lie
between 109 and 115.
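The conversion from a standard normal interval (r, s) to the matching interval μ + rσ to μ + sσ is a one-line calculation. The sketch below applies it to this example; the function name `to_data_interval` is an illustrative choice.

```python
def to_data_interval(mean, sd, r, s):
    """Map the standard normal interval (r, s) to (mean + r*sd, mean + s*sd)."""
    return (mean + r * sd, mean + s * sd)

# Example 1: mean 112, standard deviation 24, within 1/8 of a standard deviation
low, high = to_data_interval(112, 24, -0.125, 0.125)
print(low, high)  # 109.0 115.0
```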
Example 2
• The HDL cholesterol levels for a group of
women are approximately normally
distributed with a mean of 64 mg/dL and a
standard deviation of 15 mg/dL.
• Determine the percentage of these women
that have HDL cholesterol levels between
19 and 109 mg/dL.
Example 2, cont’d
• Solution: The mean of 64 mg/dL
corresponds to 0 in the standard normal
curve.
• The value of 19 is 45 less than the mean,
corresponding to 3 standard deviations
below the mean.
• The value of 109 is 45 more than the mean,
corresponding to 3 standard deviations
above the mean.
Example 2, cont’d
• Solution, cont’d: The area under the
standard normal curve between z = -3 and
z = 3 is found:
• From 0 to 3, there is 49.87% of the area.
• From 0 to -3, there is also 49.87%.
• Approximately 2(49.87%) = 99.74% of the
women will have an HDL level between 19 and
109 mg/dL.
68-95-99.7 Rule
• For all normal distributions:
• Approximately 68% of the measurements
lie within 1 standard deviation of the mean.
• Approximately 95% of the measurements
lie within 2 standard deviations of the
mean.
• Approximately 99.7% of the measurements
lie within 3 standard deviations of the
mean.
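The three percentages in the rule are rounded values of areas under the normal curve. As a quick check, the sketch below computes the unrounded areas; the function name `within` is illustrative.

```python
import math

def within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(100 * within(k), 2))
```

The computed areas are about 68.27%, 95.45%, and 99.73%, which round to the figures in the rule.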
68-95-99.7 Rule, cont’d
Example 3
• Designers of a new computer mouse
have learned that the lengths of
women’s hands are normally
distributed with a mean of 17 cm and a
standard deviation of 1 cm.
• What percentage of women have
hands in the range from 15 cm to 19
cm?
Example 3, cont’d
• Solution:
• A length of 15 cm is 2 standard deviations
below the mean of 17 cm.
• A length of 19 cm is 2 standard deviations
above the mean of 17 cm.
• According to the 68-95-99.7 Rule, the
percent of women whose hands are within
2 standard deviations of the mean length
is 95%.
Question:
Recall from the previous example
that women’s hands have a mean
length of 17 cm, with a standard
deviation of 1 cm. Use the 68-95-99.7 Rule to determine what percent
of women’s hands are between 14
cm and 18 cm long.
a. 68.00%
b. 81.50%
c. 49.85%
d. 83.85%
Example 4
• The lake sturgeon has a mean length of
114 cm and a standard deviation of 29 cm.
• If the lengths are normally distributed,
determine:
a) What percent of lake sturgeon had lengths
between 56 cm and 143 cm?
b) What percent of lake sturgeon were not
between 56 cm and 143 cm in length?
Example 4, cont’d
• Solution:
• Note that 143 cm is 1 standard deviation above the
mean, since 114 + 29 = 143.
• Also, 56 cm is 2 standard deviations below the mean,
since 114 - 2(29) = 56.
Example 4, cont’d
• Solution, cont’d:
a) By the 68-95-99.7 Rule, the area from z = -2 to z = 1 is
0.135 + 0.34 + 0.34 = 0.815.
• So 81.5% of the sturgeon were between 56 cm and
143 cm long.
Example 4, cont’d
• Solution, cont’d:
b) This is the complement of the event in part (a).
• So 100% - 81.5% = 18.5% of the sturgeon
were not between 56 cm and 143 cm in length.
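The rule-based answer to part (a) can be compared with the area computed directly from the normal curve. The sketch below does both, assuming the stated mean of 114 cm and standard deviation of 29 cm; the helper `phi` and the variable names are illustrative.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd = 114, 29
z_low = (56 - mean) / sd     # -2.0
z_high = (143 - mean) / sd   # 1.0

rule_estimate = 0.135 + 0.34 + 0.34   # pieces taken from the 68-95-99.7 Rule
direct_area = phi(z_high) - phi(z_low)

print(round(rule_estimate, 3), round(direct_area, 4))
```

The direct area is about 0.8186, so the rule's 81.5% is a close approximation.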
Population z-scores
• The formula for converting a normal
distribution value to a standard normal
distribution value is called a population
z-score.
• The population z-score of a
measurement, x, is given by:
$z = \dfrac{x - \mu}{\sigma}$
Question:
If a data value in a normal
distribution has a population z-score
of 0, we know that ______.
a. The data value is equal to the mean.
b. The data value is larger than the mean.
c. The data value is smaller than the
mean.
d. The data value is equal to the standard
deviation.
Example 5
• Suppose a normal distribution has a
mean of 4 and a standard deviation of
3.
• Find the z-scores of the measurements
-1, 2, 3, 5, and 9.
Example 5, cont’d
Example 5, cont’d
• Solution, cont’d: The relationship between
the normal values and the standard normal
values is illustrated.
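The z-scores asked for in Example 5 follow directly from the formula z = (x - μ)/σ with μ = 4 and σ = 3. The short sketch below computes them; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Population z-score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

mean, sd = 4, 3
for x in (-1, 2, 3, 5, 9):
    print(x, round(z_score(x, mean, sd), 2))
```

The measurements -1, 2, 3, 5, and 9 have z-scores of about -1.67, -0.67, -0.33, 0.33, and 1.67.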
Example 6
• In 1996, the finishing times for the New York
City Marathon were approximately normal,
with a mean of 260 minutes and a standard
deviation of about 50 minutes.
• What percentage of the finishers that year
had times between 285 minutes and 335
minutes?
Example 6, cont’d
• Solution: Find the z-scores.
• For a time of 285 minutes,
$z = \dfrac{285 - 260}{50} = \dfrac{25}{50} = 0.5$
• For a time of 335 minutes,
$z = \dfrac{335 - 260}{50} = \dfrac{75}{50} = 1.5$
Example 6, cont’d
• Solution, cont’d: Find the areas
• The area from 0 to 0.5 is 0.1915.
• The area from 0 to 1.5 is 0.4332.
Example 6, cont’d
• Solution, cont’d: Subtract the areas to find
0.4332 – 0.1915 = 0.2417.
• The conclusion is that 24.17% of the finishing
times were between 285 and 335 minutes.
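The two steps of this example, converting to z-scores and then finding the area between them, can be combined in a few lines. This is a sketch under the example's assumption that finishing times are approximately normal; `z_score` and `phi` are illustrative helper names.

```python
import math

def z_score(x, mean, sd):
    return (x - mean) / sd

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd = 260, 50
z_low = z_score(285, mean, sd)    # 0.5
z_high = z_score(335, mean, sd)   # 1.5

share = phi(z_high) - phi(z_low)
print(round(share, 4))  # 0.2417
```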
Example 7
• Recall the distribution of HDL cholesterol
levels from the previous example, with a
mean of 64 mg/dL and a standard deviation
of 15 mg/dL.
• If an HDL level of 40 mg/dL signals an
increased risk for coronary heart disease,
what percentage of the women studied are
at increased risk?
Example 7, cont’d
• Solution: Find the z-score for an HDL
level of 40 mg/dL:
$z = \dfrac{40 - 64}{15} = -1.6$
• The area between 0 and -1.6 is
0.4452.
Example 7, cont’d
• Solution: The area to the left of -1.6 is
0.5 – 0.4452 = 0.0548.
• In this group of women, 5.48% of them
are at increased risk for coronary heart
disease because of low HDL levels.
11.2 Initial Problem Solution
• Two suppliers make an engine part.
• Supplier A charges $120 for 100 parts which
have a standard deviation of 0.004 mm from
the mean size.
• Supplier B charges $90 for 100 parts which
have a standard deviation of 0.012 mm from
the mean size.
• If parts must be within 0.012 mm of the mean size to be
acceptable, which supplier is a better
choice?
Initial Problem Solution, cont’d
• Determine the cost for each acceptable
part from each supplier.
• Supplier A: Since σ = 0.004 mm, all parts
within 3 standard deviations will be
acceptable.
• We know that 99.7% of the parts are within 3
standard deviations of the mean.
• Each acceptable part costs $\dfrac{\$120}{99.7} \approx \$1.20$.
Initial Problem Solution, cont’d
• Determine the cost for each acceptable
part from each supplier.
• Supplier B: Since σ = 0.012 mm, all parts
within 1 standard deviation will be
acceptable.
• We know that 68% of the parts are within 1
standard deviation of the mean.
• Each acceptable part costs $\dfrac{\$90}{68} \approx \$1.32$.
Initial Problem Solution, cont’d
• Overall each part from supplier B costs
less than each part from supplier A, but
more parts from B will have to be
thrown away.
• Each acceptable part from supplier B
costs more than each acceptable part
from supplier A.
• They should choose supplier A.
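The comparison comes down to dividing the price of 100 parts by the number of acceptable parts expected among them. A minimal sketch of that arithmetic follows, using the acceptance percentages from the 68-95-99.7 Rule; the function name `cost_per_acceptable_part` is illustrative.

```python
def cost_per_acceptable_part(price_per_100, percent_acceptable):
    """Price of 100 parts divided by the acceptable parts expected among them."""
    return price_per_100 / percent_acceptable

cost_a = cost_per_acceptable_part(120, 99.7)  # Supplier A: within 3 standard deviations
cost_b = cost_per_acceptable_part(90, 68)     # Supplier B: within 1 standard deviation

print(round(cost_a, 2), round(cost_b, 2))
```

Supplier A works out to about $1.20 per acceptable part and Supplier B to about $1.32.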
Section 11.3
Confidence Intervals
• Goals
• Study proportions
• Study population proportions
• Study sample proportions
• Study confidence intervals
• Study margin of error
11.3 Initial Problem
• A candy company prints prize tickets
inside the wrappers of some of their
candy bars.
• Suppose you buy 400 candy bars and
find that 25 of them have prizes. If you
buy 1000 more, how many prizes would
you expect to win?
• The solution will be given at the end of the section.
Proportions
• A fraction of the population under
consideration is called a population
proportion.
• The notation for a population proportion is p.
• For example, if 65,000,000 of 130,000,000
people support the President’s budget, the
population proportion of people who support
the budget is
$p = \dfrac{65{,}000{,}000}{130{,}000{,}000} = 50\%$
Proportions, cont’d
• A fraction of the sample being
measured is called a sample proportion.
• The notation for a sample proportion is p̂.
• For example, if 198 of 413 people polled
support the President’s budget, the
sample proportion of people who
support the budget is
$\hat{p} = \dfrac{198}{413} \approx 48\%$
Example 1
• A college has 3520 freshmen, of which 1056
have consumed an alcoholic beverage in the
last 30 days.
• Of the 50 students surveyed in a health
class, 11 say they have had an alcoholic
beverage in the last 30 days.
• What are the population proportion and the
sample proportion?
Example 1, cont’d
• Solution: The population is the 3520
freshmen at the college.
• The population proportion is
$p = \dfrac{1056}{3520} = 30\%$
Example 1, cont’d
• Solution, cont’d: The sample is the 50
students who were surveyed.
• The sample proportion is
$\hat{p} = \dfrac{11}{50} = 22\%$
Example 1, cont’d
• Solution, cont’d: Notice that the
population proportion and the sample
proportion were not identical.
• The sample proportion can vary
depending on what random sample of
students is chosen.
Example 1, cont’d
• Solution, cont’d: A distribution of the sample
proportions for various possible samples of this
population is shown at right.
Sample Proportions Distribution
• If samples of size n are taken from a
population having a population
proportion p, then the set of all sample
proportions has a mean and standard
deviation of:
• $\mu = p$
• $\sigma = \sqrt{\dfrac{p(1 - p)}{n}}$
Sample Proportions, cont’d
• If two conditions are met, then n is large
enough and the distribution of sample
proportions is approximately normal.
• The conditions are:
• $p - 3\sqrt{\dfrac{p(1 - p)}{n}} > 0$
• $p + 3\sqrt{\dfrac{p(1 - p)}{n}} < 1$
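The two conditions are easy to test for any given p and n. The sketch below is one way to write the check; the function name `is_approximately_normal` and the sample values passed to it are hypothetical and chosen only for illustration.

```python
import math

def is_approximately_normal(p, n):
    """Check that p - 3*sd > 0 and p + 3*sd < 1, where sd = sqrt(p(1 - p)/n)."""
    sd = math.sqrt(p * (1 - p) / n)
    return (p - 3 * sd > 0) and (p + 3 * sd < 1)

# Hypothetical inputs, not taken from the text:
print(is_approximately_normal(0.5, 100))   # sd = 0.05, so 0.35 > 0 and 0.65 < 1
print(is_approximately_normal(0.02, 50))   # p - 3*sd falls below 0
```

The first call returns True and the second returns False, since a small p paired with a small n pushes p - 3σ below zero.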
Example 2
• Suppose the population proportion of a
group is 0.4, and we choose a simple
random sample of size 30.
• Find the mean and standard deviation
of the set of all sample proportions.
Example 2, cont’d
• Solution: In this case, p = 0.4 and n =
30.
• The mean is $\mu = p = 0.4$.
• The standard deviation is
$\sigma = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{0.4(1 - 0.4)}{30}} \approx 0.09$
Example 2, cont’d
• Solution, cont’d: The sample proportion
distribution is graphed below.
Question:
If a population proportion is known
to be 0.25, is a sample size of 20
large enough to guarantee that the
distribution of sample proportions is
approximately normal?
a. yes
b. no
Example 3
• Fox News asked 900 registered voters
whether or not they would take a smallpox
vaccine.
• Suppose it is known that 60% of all
Americans would take the vaccine. What is
the approximate percentage of samples for
which between 58% and 62% of voters in the
sample would take the shot?
Example 3, cont’d
• Solution: We know that p = 0.6, so the mean
of the sample proportion distribution is 0.6.
• The sample size is n = 900, so the standard
deviation is
$\sigma = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{0.6(1 - 0.6)}{900}} \approx 0.02$
Example 3, cont’d
• Solution, cont’d: A normal curve is shown,
labeled with sample proportion values as well
as their z-scores.
Example 3, cont’d
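• Solution, cont’d: The values 58% and 62% are each
1 standard deviation (0.02) from the mean of 60%. By the
68-95-99.7 Rule, approximately 68% of the
samples would show a sample proportion of
between 58% and 62%.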
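The 68% figure depends on rounding the standard deviation to 0.02. As a hedged check, the sketch below repeats the calculation with both the rounded value and the unrounded value of about 0.0163; the helper names `phi` and `share_between` are illustrative.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def share_between(low, high, p, sd):
    """Normal-approximation share of samples with a proportion between low and high."""
    return phi((high - p) / sd) - phi((low - p) / sd)

p, n = 0.6, 900
sd_unrounded = math.sqrt(p * (1 - p) / n)  # about 0.0163
sd_rounded = 0.02                          # value used in the example

print(round(share_between(0.58, 0.62, p, sd_rounded), 2))
print(round(share_between(0.58, 0.62, p, sd_unrounded), 2))
```

With the rounded standard deviation the share is about 68%, as in the example. With the unrounded value it comes out closer to 78%, so the rounding has a noticeable effect here.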
Standard Error
• In most situations, we do not know the
population proportion.
• The point of measuring the sample is to
estimate the population proportion.
• The standard error is an estimate of the standard deviation
of the set of all sample proportions, computed from the
sample proportion:
$\hat{s} = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
Example 4
• What is the standard error in a sample
of size 400 if the sample proportion in
one sample is 35%?
Example 4, cont’d
• Solution: Use the formula from the
previous slide:
$\hat{s} = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\dfrac{0.35(1 - 0.35)}{400}} \approx 0.024$
Confidence Intervals
• According to the 68-95-99.7 Rule, 95% of the
time the sample proportion will be within 2
standard deviations of the population
proportion.
• A 95% confidence interval is the interval
$(\hat{p} - 2\hat{s},\ \hat{p} + 2\hat{s})$
Confidence Intervals, cont’d
• For a 95% confidence interval, the
margin of error is $\pm 2\hat{s}$.
• Any value in the confidence interval is a
reasonable estimate for the population
proportion.
Confidence Intervals, cont’d
• For example, (a) and (b) below show
good estimates while (c) shows an
unlikely estimate.
Example 5
• Determine the 95% confidence interval
and the margin of error for a sample
size of 400 with a sample proportion of
35%.
Example 5, cont’d
• Solution: In a previous example we found
the standard error in this case to be 2.4%.
• Calculate the confidence interval bounds:
• $\hat{p} - 2\hat{s} = 35\% - 2(2.4\%) = 30.2\%$
• $\hat{p} + 2\hat{s} = 35\% + 2(2.4\%) = 39.8\%$
• The margin of error is
$\pm 2\hat{s} = \pm 2(2.4\%) = \pm 4.8\%$
Question:
Find the 95% confidence interval for
a sample size of 100 with a sample
proportion of 25%. Round your
answer to the nearest hundredth of a
percent.
a. (20.67%, 29.33%)
b. (24.63%, 25.38%)
c. (16.34%, 33.66%)
d. (12.01%, 37.99%)
Question:
Find the margin of error for the 95%
confidence interval in the previous
question.
Recall, the sample size was 100 and
the sample proportion was 25%.
Round to the nearest hundredth of a
percent.
a. ± 4.33%
b. ± 17.32%
c. ± 2.17%
d. ± 8.66%
Example 6
• In a sample of 600 U.S. citizens, 362
people say they drive an American-built
car.
• Find the 95% confidence interval and
the margin of error for the proportion of
the population that drive an American-built car.
Example 6, cont’d
• Solution: The sample proportion is:
$\hat{p} = \dfrac{362}{600} \approx 0.603$
• The standard error is:
$\hat{s} = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\dfrac{0.603(1 - 0.603)}{600}} \approx 0.020$
Example 6, cont’d
• Solution, cont’d: Calculate the
confidence interval bounds:
• $\hat{p} - 2\hat{s} = 60.3\% - 2(2\%) = 56.3\%$
• $\hat{p} + 2\hat{s} = 60.3\% + 2(2\%) = 64.3\%$
• The margin of error is
$\pm 2\hat{s} = \pm 2(2\%) = \pm 4\%$
Example 6, cont’d
• Solution, cont’d: With a confidence
level of 95% we can say that 60.3% of
Americans drive American-built cars,
with a margin of error of ± 4%.
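The whole calculation, from counts to interval, can be written as a short routine. The sketch below follows the p̂ ± 2ŝ rule used in this section; the function names `standard_error` and `confidence_interval_95` are illustrative choices.

```python
import math

def standard_error(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat(1 - p_hat)/n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def confidence_interval_95(p_hat, n):
    """Return (lower, upper, margin) for the interval p_hat +/- 2 standard errors."""
    margin = 2 * standard_error(p_hat, n)
    return (p_hat - margin, p_hat + margin, margin)

# Example 6: 362 of 600 people say they drive an American-built car
lower, upper, margin = confidence_interval_95(362 / 600, 600)
print(round(100 * lower, 1), round(100 * upper, 1), round(100 * margin, 1))
```

Expressed as percentages, the results round to 56.3, 64.3, and 4.0, in line with the interval and margin of error found above.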
Example 7
• In a survey of 1000 adults, 44% said they
were satisfied with the quality of health care
in the U.S.
• The margin of error was reported as ± 3%.
• Assuming a 95% confidence interval was
used, verify that the margin of error is
correct and explain what it means.
Example 7, cont’d
• Solution: We know n = 1000 and $\hat{p} = 0.44$.
• The standard error is
$\hat{s} = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\dfrac{0.44(1 - 0.44)}{1000}} \approx 0.0157$
Example 7, cont’d
• Solution, cont’d: The margin of error is
$\pm 2\hat{s} = \pm 2(0.0157) = \pm 0.0314$
• The margin of error of approximately 3%
indicates that the researchers are 95%
confident that the true percentage of adults
satisfied with health care in the U.S. is
between 41% and 47%.
Example 8
• A manufacturer tests 1000 computer
chips and finds 216 defective ones.
Find a 95% confidence interval for the
population proportion of defective
chips.
Example 8, cont’d
• Solution: We know n = 1000 and $\hat{p} = 0.216$.
• The standard error is
$\hat{s} = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\dfrac{0.216(1 - 0.216)}{1000}} \approx 0.013$
Example 8, cont’d
• Solution, cont’d: Calculate the
confidence interval bounds:
• $\hat{p} - 2\hat{s} = 21.6\% - 2(1.3\%) = 19.0\%$
• $\hat{p} + 2\hat{s} = 21.6\% + 2(1.3\%) = 24.2\%$
• The 95% confidence interval is (19.0%,
24.2%).
Example 9
• A manufacturer tests 10,000 computer
chips and finds 2160 defective ones.
Find a 95% confidence interval for the
population proportion of defective
chips.
• Does choosing a larger sample give
significantly better results?
Example 9, cont’d
• Solution: We know n = 10,000 and
$\hat{p} = 0.216$.
• The standard error is
$\hat{s} = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\dfrac{0.216(1 - 0.216)}{10{,}000}} \approx 0.004$
Example 9, cont’d
• Solution, cont’d: Calculate the
confidence interval bounds:
• $\hat{p} - 2\hat{s} = 21.6\% - 2(0.4\%) = 20.8\%$
• $\hat{p} + 2\hat{s} = 21.6\% + 2(0.4\%) = 22.4\%$
• The 95% confidence interval is (20.8%,
22.4%).
Example 9, cont’d
• Solution, cont’d: Compare the results for
sample sizes of 1000 and 10,000.
• For n = 1000, the 95% confidence interval is
(19.0%, 24.2%).
• For n = 10,000, the 95% confidence interval is
(20.8%, 22.4%).
• Increasing the sample size to 10,000 narrows the
interval from 5.2 percentage points wide to 1.6, which is a
significantly better estimate of the
population proportion.
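Because n appears under a square root in the standard error, a tenfold increase in sample size shrinks the margin of error by a factor of √10, or about 3.16. The sketch below illustrates this with the sample proportion from Examples 8 and 9; the function name `margin_of_error_95` is illustrative.

```python
import math

def margin_of_error_95(p_hat, n):
    """Margin of error for a 95% confidence interval: 2 standard errors."""
    return 2 * math.sqrt(p_hat * (1 - p_hat) / n)

m_small = margin_of_error_95(0.216, 1000)
m_large = margin_of_error_95(0.216, 10_000)

print(round(m_small, 4), round(m_large, 4), round(m_small / m_large, 2))
```

Halving the margin of error would therefore take roughly four times as many observations.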
11.3 Initial Problem Solution
• A candy company prints prize tickets
inside the wrappers of some of their
candy bars.
• Suppose you buy 400 candy bars
and find that 25 of them have prizes.
If you buy 1000 more, how many
prizes would you expect to win?
Initial Problem Solution, cont’d
• For the first 400 candy bars you
bought, the sample proportion of
winning bars is
$\hat{p} = \dfrac{25}{400} = 0.0625$
• The standard error is
$\hat{s} = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\dfrac{0.0625(1 - 0.0625)}{400}} \approx 0.0121$
Initial Problem Solution, cont’d
• Calculate the confidence interval bounds:
• $\hat{p} - 2\hat{s} = 6.25\% - 2(1.21\%) = 3.83\%$
• $\hat{p} + 2\hat{s} = 6.25\% + 2(1.21\%) = 8.67\%$
• Out of 1000 new candy bars, you should
expect between 3.83% and 8.67%, or
between 38 and 87 bars, to be winners.
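The steps of this solution can be sketched end to end as follows. The variable names are illustrative; the method is the same p̂ ± 2ŝ interval, scaled up to 1000 bars.

```python
import math

wins, bought, new_bars = 25, 400, 1000

p_hat = wins / bought                              # sample proportion, 0.0625
s_hat = math.sqrt(p_hat * (1 - p_hat) / bought)    # standard error, about 0.0121
low, high = p_hat - 2 * s_hat, p_hat + 2 * s_hat   # 95% confidence interval bounds

print(round(new_bars * low), round(new_bars * high))  # 38 87
```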