A Mathematical View of Our World, 1st ed.
Parks, Musser, Trimpe, Maurer, and Maurer

Chapter 11: Inferential Statistics

Section 11.1 Normal Distributions
• Goals
  • Study normal distributions
  • Study standard normal distributions
  • Find the area under a standard normal curve

11.1 Initial Problem
• A class of 90 students had a mean test score of 74, with a standard deviation of 8.
• If the professor curves the scores, how many students will get As and how many will get Fs?
• The solution will be given at the end of the section.

Statistical Inference
• The process of making predictions about an entire population based on information from a sample is called statistical inference.

Data Distributions
• For large data sets, a smooth curve can often be used to approximate the histogram.

Data Distributions, cont’d
• The larger the data set and the smaller the bin size, the better the smooth curve approximates the histogram.

Example 1
• The distribution of weights for a large sample of college men is shown.

Example 1, cont’d
• What percent of the men have weights between:
  a) 167 and 192 pounds?
  b) 137 and 192 pounds?
  c) 137 and 222 pounds?

Example 1, cont’d
• Solution: a) 167 and 192 pounds?
• The area under the curve is 0.2, so 20% of the men are in this weight range.

Example 1, cont’d
• Solution: b) 137 and 192 pounds?
• The area under the curve is 0.4, so 40% of the men are in this weight range.

Example 1, cont’d
• Solution: c) 137 and 222 pounds?
• The area under the curve is 0.6, so 60% of the men are in this weight range.

Normal Distributions
• Data that has a symmetric, bell-shaped distribution curve is said to have a normal distribution.
• The mean and standard deviation determine the exact shape and position of the curve.

Example 2
a) Which normal curve has the largest mean?
b) Which normal curve has the largest standard deviation?

Example 2, cont’d
a) Solution: The data sets are already labeled in order of smallest mean to largest mean.
• Data Set III has the largest mean.
Example 2, cont’d
b) Solution: Data Set III has the largest standard deviation because it is the shortest, widest curve.
• The order of the standard deviations, from smallest to largest, is II, I, III.

Normal Distributions, cont’d
• Normal distributions with various means and standard deviations are shown on the following slides.

[Figures: normal distributions with various means and standard deviations]

Standard Normal Distribution
• The normal distribution with a mean of 0 and a standard deviation of 1 is called the standard normal distribution.

Standard Normal Distribution, cont’d
• The areas under any normal distribution can be compared to the areas under the standard normal distribution, as shown in the figure on the next slide.

[Figure: areas under a normal distribution compared with areas under the standard normal distribution]

Area
• One way to find the area under a region of the standard normal curve is to use a table.
• Tables of values for the standard normal curve are printed in textbooks to eliminate the need to do repeated complicated calculations.

Example 3
• What fraction of the total area under the standard normal curve lies between a = -0.5 and b = 1.5?

Example 3, cont’d
• Solution: Find a = -0.5 and b = 1.5 in the table.

Example 3, cont’d
• Solution, cont’d: The value in the table is 0.6247.
• A total of 62.47% of the area is shaded.
• In any normal distribution, 62.47% of the data lies between 0.5 standard deviations below the mean and 1.5 standard deviations above the mean.
• The probability that a randomly selected data value from a standard normal distribution will lie between -0.5 and 1.5 is 62.47%.

Example 4
• What percent of the data in a standard normal distribution lies between 0.5 and 2.5?

Example 4, cont’d
• Solution: The value in the table for a = 0.5 and b = 2.5 is 0.3023.
• So 30.23% of the data in a standard normal distribution lies between 0.5 and 2.5.

Areas, cont’d
• Because the normal curve is symmetric, the areas in the previous table are repeated.
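The table lookups in Examples 3 and 4 can also be checked numerically. The following is a minimal Python sketch, not part of the original slides: it relies on the standard identity relating the standard normal cumulative area to the error function, and the function names `standard_normal_cdf` and `area_between` are illustrative choices.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_between(a: float, b: float) -> float:
    """Area under the standard normal curve between a and b (a <= b)."""
    return standard_normal_cdf(b) - standard_normal_cdf(a)


# Example 3: area between -0.5 and 1.5
print(round(area_between(-0.5, 1.5), 4))  # 0.6247
# Example 4: area between 0.5 and 2.5
print(round(area_between(0.5, 2.5), 4))   # 0.3023
```

Both results agree with the table values quoted in the examples to four decimal places.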
Areas, cont’d
• Figure 11.11 and Table 11.2

Areas, cont’d
• A more common type of table:

Example 5
• Find the percent of data points in a standard normal distribution that lie between z = -1.8 and z = 1.3.

Example 5, cont’d
• Solution: Find the two areas in Table 11.3 and add them together.
• The area from 0 to 1.3 is 0.4032.
• The area from -1.8 to 0 is 0.4641.
• The total shaded area is 0.4032 + 0.4641 = 0.8673, so 86.73% of the data points lie in this range.

Example 6
• Find the percent of data points in a standard normal distribution that lie between z = 1.2 and z = 1.7.

Example 6, cont’d
• Solution: Find the two areas in Table 11.3 and subtract them.
• The area from 0 to 1.2 is 0.3849.
• The area from 0 to 1.7 is 0.4554.
• The total shaded area is 0.4554 - 0.3849 = 0.0705, so 7.05% of the data points lie in this range.

Question: The value from the table associated with z = 2.1 is 0.4821. To find the percentage of data values less than -2.1 in a standard normal distribution, what do you need to do?
a. Add the table value to 0.5.
b. Subtract the table value from 0.5.
c. The table value is the answer.
d. Divide the table value in half.

Question: What percentage of data values lie between z = -1.2 and z = -0.7 in a standard normal distribution?
a. 11.51%
b. 62.49%
c. 12.69%
d. 24.20%

11.1 Initial Problem Solution
• A class of 90 students had a mean test score of 74 with a standard deviation of 8 points.
• The test will be curved so that all students whose scores are at least 1.5 standard deviations above or below the mean will receive As and Fs, respectively.
• How many students will get As and how many will get Fs?

Initial Problem Solution, cont’d
• Because the class is large, it is likely that the scores have an approximately normal distribution.
• If the scores are curved:
• The mean of 74 will correspond to a score of 0 in the standard normal distribution.
• A score that is 1.5 standard deviations above the mean will correspond to a score of +1.5 in the standard normal distribution, while a score that is 1.5 standard deviations below the mean will correspond to a score of -1.5.

Initial Problem Solution, cont’d
• The percentage of As is the same as the area to the right of z = 1.5 in the standard normal distribution.
• Approximately 43.32% of the area is between 0 and 1.5.
• Since 50% of the area is to the right of 0, the area above 1.5 is 50% - 43.32% = 6.68%.
• Thus, 6.68% of the students, or approximately 6 students (0.0668 × 90 ≈ 6), will receive As.

Initial Problem Solution, cont’d
• The percentage of Fs is the same as the area to the left of z = -1.5 in the standard normal distribution.
• Because of the symmetry of the normal distribution, this is the same as the area above z = 1.5, so the calculations are the same as in the last step.
• Thus, 6.68% of the students, or approximately 6 students, will receive Fs.

Section 11.2 Applications of Normal Distributions
• Goals
  • Study normal distribution applications
  • Use the 68-95-99.7 Rule
  • Use the population z-score

11.2 Initial Problem
• Two suppliers make an engine part.
• Supplier A charges $120 for 100 parts whose sizes have a standard deviation of 0.004 mm from the mean size.
• Supplier B charges $90 for 100 parts whose sizes have a standard deviation of 0.012 mm from the mean size.
• Which supplier is a better choice?
• The solution will be given at the end of the section.

Normal Distributions
• If a data set is represented by a normal distribution with mean μ and standard deviation σ, the percentage of the data between μ + rσ and μ + sσ is the same as the percentage of the data in a standard normal distribution that lies between r and s.

Normal Distributions, cont’d
[Figure]

Example 1
• Approximately 10% of the data in a standard normal distribution lies within 1/8 of a standard deviation of the mean.
• Within 1/8 means between -0.125 and 0.125.
• Suppose the measurements of a certain population are normally distributed with a mean of 112 and a standard deviation of 24. What values correspond to the interval given above?

Example 1, cont’d
• Solution: In the standard normal distribution we are considering the interval from r = -0.125 to s = 0.125.
• For the nonstandard distribution, the interval runs from 112 + (-0.125)(24) = 109 to 112 + (0.125)(24) = 115.
• We know that approximately 10% of the data values will lie between 109 and 115.

Example 2
• The HDL cholesterol levels for a group of women are approximately normally distributed with a mean of 64 mg/dL and a standard deviation of 15 mg/dL.
• Determine the percentage of these women that have HDL cholesterol levels between 19 and 109 mg/dL.

Example 2, cont’d
• Solution: The mean of 64 mg/dL corresponds to 0 in the standard normal curve.
• The value of 19 is 45 less than the mean, corresponding to 3 standard deviations below the mean.
• The value of 109 is 45 more than the mean, corresponding to 3 standard deviations above the mean.

Example 2, cont’d
• Solution, cont’d: The area under the standard normal curve between z = -3 and z = 3 is found:
• From 0 to 3, there is 49.87% of the area.
• From -3 to 0, there is also 49.87%.
• Approximately 2(49.87%) = 99.74% of the women will have an HDL level between 19 and 109 mg/dL.

68-95-99.7 Rule
• For all normal distributions:
• Approximately 68% of the measurements lie within 1 standard deviation of the mean.
• Approximately 95% of the measurements lie within 2 standard deviations of the mean.
• Approximately 99.7% of the measurements lie within 3 standard deviations of the mean.

68-95-99.7 Rule, cont’d
[Figure: the 68-95-99.7 Rule]

Example 3
• Designers of a new computer mouse have learned that the lengths of women’s hands are normally distributed with a mean of 17 cm and a standard deviation of 1 cm.
• What percentage of women have hands in the range from 15 cm to 19 cm?
Example 3, cont’d
• Solution:
• A length of 15 cm is 2 standard deviations below the mean of 17 cm.
• A length of 19 cm is 2 standard deviations above the mean of 17 cm.
• According to the 68-95-99.7 Rule, the percent of women whose hands are within 2 standard deviations of the mean length is 95%.

Question: Recall from the previous example that women’s hands have a mean length of 17 cm, with a standard deviation of 1 cm. Use the 68-95-99.7 Rule to determine what percent of women’s hands are between 14 cm and 18 cm long.
a. 68.00%
b. 81.50%
c. 49.85%
d. 83.85%

Example 4
• The lake sturgeon has a mean length of 114 cm and a standard deviation of 29 cm.
• If the lengths are normally distributed, determine:
  a) What percent of lake sturgeon have lengths between 56 cm and 143 cm?
  b) What percent of lake sturgeon are not between 56 cm and 143 cm in length?

Example 4, cont’d
• Solution:
• Note that 143 cm is 1 standard deviation above the mean, since 114 + 29 = 143.
• Also, 56 cm is 2 standard deviations below the mean, since 114 - 2(29) = 56.

Example 4, cont’d
• Solution, cont’d: a) The area from z = -2 to z = 1 is 0.135 + 0.34 + 0.34 = 0.815.
• So 81.5% of the sturgeon are between 56 cm and 143 cm long.

Example 4, cont’d
• Solution, cont’d: b) This is the complement of the event in part (a).
• So 100% - 81.5% = 18.5% of the sturgeon are not between 56 cm and 143 cm long.

Population z-scores
• The value obtained by converting a measurement from a normal distribution to the corresponding standard normal value is called its population z-score.
• The population z-score of a measurement, x, is given by $z = \frac{x - \mu}{\sigma}$, where μ is the population mean and σ is the population standard deviation.

Question: If a data value in a normal distribution has a population z-score of 0, we know that:
a. The data value is equal to the mean.
b. The data value is larger than the mean.
c. The data value is smaller than the mean.
d. The data value is equal to the standard deviation.

Example 5
• Suppose a normal distribution has a mean of 4 and a standard deviation of 3.
• Find the z-scores of the measurements -1, 2, 3, 5, and 9.
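The population z-score formula is simple enough to apply mechanically to the measurements in Example 5. The sketch below is an illustration rather than part of the original slides; the function name `z_score` and its parameter names are assumptions chosen for readability.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Population z-score: how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev


# Values from Example 5: mean 4, standard deviation 3
mean, std_dev = 4, 3
for x in (-1, 2, 3, 5, 9):
    print(f"x = {x:>2}: z = {z_score(x, mean, std_dev):+.2f}")
```

A measurement below the mean produces a negative z-score, and one above the mean produces a positive z-score.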
Example 5, cont’d
• Solution: Apply $z = \frac{x - \mu}{\sigma}$ with μ = 4 and σ = 3. The z-scores of -1, 2, 3, 5, and 9 are $-\frac{5}{3} \approx -1.67$, $-\frac{2}{3} \approx -0.67$, $-\frac{1}{3} \approx -0.33$, $\frac{1}{3} \approx 0.33$, and $\frac{5}{3} \approx 1.67$.

Example 5, cont’d
• Solution, cont’d: The relationship between the normal values and the standard normal values is illustrated.

Example 6
• In 1996, the finishing times for the New York City Marathon were approximately normal, with a mean of 260 minutes and a standard deviation of about 50 minutes.
• What percentage of the finishers that year had times between 285 minutes and 335 minutes?

Example 6, cont’d
• Solution: Find the z-scores.
• For a time of 285 minutes, $z = \frac{285 - 260}{50} = \frac{25}{50} = 0.5$
• For a time of 335 minutes, $z = \frac{335 - 260}{50} = \frac{75}{50} = 1.5$

Example 6, cont’d
• Solution, cont’d: Find the areas.
• The area from 0 to 0.5 is 0.1915.
• The area from 0 to 1.5 is 0.4332.

Example 6, cont’d
• Solution, cont’d: Subtract the areas to find 0.4332 - 0.1915 = 0.2417.
• The conclusion is that 24.17% of the finishing times were between 285 and 335 minutes.

Example 7
• Recall the distribution of HDL cholesterol levels from Example 2, with a mean of 64 mg/dL and a standard deviation of 15 mg/dL.
• If an HDL level of less than 40 mg/dL signals an increased risk for coronary heart disease, what percentage of the women studied are at increased risk?

Example 7, cont’d
• Solution: Find the z-score for an HDL level of 40 mg/dL: $z = \frac{40 - 64}{15} = -1.6$
• The area between -1.6 and 0 is 0.4452.

Example 7, cont’d
• Solution, cont’d: The area to the left of -1.6 is 0.5 - 0.4452 = 0.0548.
• In this group of women, 5.48% are at increased risk for coronary heart disease because of low HDL levels.

11.2 Initial Problem Solution
• Two suppliers make an engine part.
• Supplier A charges $120 for 100 parts whose sizes have a standard deviation of 0.004 mm from the mean size.
• Supplier B charges $90 for 100 parts whose sizes have a standard deviation of 0.012 mm from the mean size.
• If parts must be within 0.012 mm of the mean size to be acceptable, which supplier is a better choice?

Initial Problem Solution, cont’d
• Determine the cost for each acceptable part from each supplier.
• Supplier A: Since σ = 0.004 mm, the 0.012 mm tolerance equals 3 standard deviations, so all parts within 3 standard deviations of the mean will be acceptable.
• We know that 99.7% of the parts are within 3 standard deviations of the mean, so about 99.7 of every 100 parts are acceptable.
• Each acceptable part costs $120 / 99.7 ≈ $1.20.

Initial Problem Solution, cont’d
• Determine the cost for each acceptable part from each supplier.
• Supplier B: Since σ = 0.012 mm, the 0.012 mm tolerance equals 1 standard deviation, so all parts within 1 standard deviation of the mean will be acceptable.
• We know that 68% of the parts are within 1 standard deviation of the mean, so about 68 of every 100 parts are acceptable.
• Each acceptable part costs $90 / 68 ≈ $1.32.

Initial Problem Solution, cont’d
• Overall, each part from supplier B costs less than each part from supplier A, but more parts from B will have to be thrown away.
• Each acceptable part from supplier B costs more than each acceptable part from supplier A.
• They should choose supplier A.

Section 11.3 Confidence Intervals
• Goals
  • Study proportions
  • Study population proportions
  • Study sample proportions
  • Study confidence intervals
  • Study margin of error

11.3 Initial Problem
• A candy company prints prize tickets inside the wrappers of some of their candy bars.
• Suppose you buy 400 candy bars and find that 25 of them have prizes. If you buy 1000 more, how many prizes would you expect to win?
• The solution will be given at the end of the section.

Proportions
• A fraction of the population under consideration is called a population proportion.
• The notation for a population proportion is p.
• For example, if 65,000,000 of 130,000,000 people support the President’s budget, the population proportion of people who support the budget is $p = \frac{65{,}000{,}000}{130{,}000{,}000} = 50\%$

Proportions, cont’d
• A fraction of the sample being measured is called a sample proportion.
• The notation for a sample proportion is p̂.
• For example, if 198 of 413 people polled support the President’s budget, the sample proportion of people who support the budget is $\hat{p} = \frac{198}{413} \approx 48\%$

Example 1
• A college has 3520 freshmen, of which 1056 have consumed an alcoholic beverage in the last 30 days.
• Of the 50 students surveyed in a health class, 11 say they have had an alcoholic beverage in the last 30 days.
• What are the population proportion and the sample proportion?

Example 1, cont’d
• Solution: The population is the 3520 freshmen at the college.
• The population proportion is $p = \frac{1056}{3520} = 30\%$

Example 1, cont’d
• Solution, cont’d: The sample is the 50 students who were surveyed.
• The sample proportion is $\hat{p} = \frac{11}{50} = 22\%$

Example 1, cont’d
• Solution, cont’d: Notice that the population proportion and the sample proportion were not identical.
• The sample proportion can vary depending on what random sample of students is chosen.

Example 1, cont’d
• Solution, cont’d: A distribution of the sample proportions for various possible samples of this population is shown at right.

Sample Proportions Distribution
• If samples of size n are taken from a population having a population proportion p, then the set of all sample proportions has a mean and standard deviation of:
• Mean: $p$
• Standard deviation: $\sqrt{\frac{p(1 - p)}{n}}$

Sample Proportions, cont’d
• If two conditions are met, then n is large enough and the distribution of sample proportions is approximately normal.
• The conditions are:
• $p - 3\sqrt{\frac{p(1 - p)}{n}} > 0$
• $p + 3\sqrt{\frac{p(1 - p)}{n}} < 1$

Example 2
• Suppose the population proportion of a group is 0.4, and we choose a simple random sample of size 30.
• Find the mean and standard deviation of the set of all sample proportions.

Example 2, cont’d
• Solution: In this case, p = 0.4 and n = 30.
• The mean is $p = 0.4$
• The standard deviation is $\sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{30}} \approx 0.09$

Example 2, cont’d
• Solution, cont’d: The sample proportion distribution is graphed below.
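The mean, the standard deviation, and the two sample-size conditions for the distribution of sample proportions can be evaluated together. The following Python sketch uses the values from Example 2; the function names `sample_proportion_sd` and `is_approximately_normal` are illustrative assumptions, not names from the textbook.

```python
import math


def sample_proportion_sd(p: float, n: int) -> float:
    """Standard deviation of the set of all sample proportions."""
    return math.sqrt(p * (1 - p) / n)


def is_approximately_normal(p: float, n: int) -> bool:
    """Check both conditions: p - 3*sd > 0 and p + 3*sd < 1."""
    sd = sample_proportion_sd(p, n)
    return p - 3 * sd > 0 and p + 3 * sd < 1


# Values from Example 2: p = 0.4, n = 30
p, n = 0.4, 30
print(p)                                     # mean of the sample proportions
print(round(sample_proportion_sd(p, n), 2))  # 0.09
print(is_approximately_normal(p, n))         # True
```

For Example 2, three standard deviations on either side of the mean stay inside the interval from 0 to 1, so a sample size of 30 is large enough under these conditions.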
Question: If a population proportion is known to be 0.25, is a sample size of 20 large enough to guarantee that the distribution of sample proportions is approximately normal?
a. yes
b. no

Example 3
• Fox News asked 900 registered voters whether or not they would take a smallpox vaccine.
• Suppose it is known that 60% of all Americans would take the vaccine. What is the approximate percentage of samples for which between 58% and 62% of voters in the sample would take the shot?

Example 3, cont’d
• Solution: We know that p = 0.6, so the mean of the sample proportion distribution is 0.6.
• The sample size is n = 900, so the standard deviation is $\sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.6(1 - 0.6)}{900}} \approx 0.02$

Example 3, cont’d
• Solution, cont’d: A normal curve is shown, labeled with sample proportion values as well as their z-scores.

Example 3, cont’d
• Solution, cont’d: Since 58% and 62% are 1 standard deviation below and above the mean of 60%, the 68-95-99.7 Rule tells us that approximately 68% of the samples would show a sample proportion of between 58% and 62%.

Standard Error
• In most situations, we do not know the population proportion.
• The point of measuring the sample is to estimate the population proportion.
• The standard error is the standard deviation of the set of all sample proportions, estimated using the sample proportion: $\hat{s} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Example 4
• What is the standard error in a sample of size 400 if the sample proportion in one sample is 35%?

Example 4, cont’d
• Solution: Use the formula from the previous slide: $\hat{s} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.35(1 - 0.35)}{400}} \approx 0.024$

Confidence Intervals
• According to the 68-95-99.7 Rule, 95% of the time the sample proportion will be within 2 standard deviations of the population proportion.
• A 95% confidence interval is the interval $(\hat{p} - 2\hat{s},\ \hat{p} + 2\hat{s})$

Confidence Intervals, cont’d
• For a 95% confidence interval, the margin of error is $2\hat{s}$
• Any value in the confidence interval is a reasonable estimate for the population proportion.

Confidence Intervals, cont’d
• For example, (a) and (b) below show good estimates while (c) shows an unlikely estimate.
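The standard error, margin of error, and 95% confidence interval defined above can be computed in a few lines. This is a minimal Python sketch under the slides' convention of using 2 standard errors (from the 68-95-99.7 Rule) rather than a more precise multiplier; the function names `standard_error` and `confidence_interval_95` are illustrative assumptions. It uses the sample from Example 4 (sample size 400, sample proportion 35%).

```python
import math


def standard_error(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def confidence_interval_95(p_hat: float, n: int) -> tuple[float, float, float]:
    """Return (lower bound, upper bound, margin of error) using 2 standard errors."""
    margin = 2 * standard_error(p_hat, n)
    return p_hat - margin, p_hat + margin, margin


low, high, margin = confidence_interval_95(0.35, 400)
print(f"{low:.1%} to {high:.1%}, margin of error {margin:.1%}")
# 30.2% to 39.8%, margin of error 4.8%
```

Because the code does not round the standard error before doubling it, results can differ in the last digit from hand calculations that round at each step.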
Example 5
• Determine the 95% confidence interval and the margin of error for a sample size of 400 with a sample proportion of 35%.

Example 5, cont’d
• Solution: In a previous example we found the standard error in this case to be 2.4%.
• Calculate the confidence interval bounds:
• $\hat{p} - 2\hat{s} = 35\% - 2(2.4\%) = 30.2\%$
• $\hat{p} + 2\hat{s} = 35\% + 2(2.4\%) = 39.8\%$
• The margin of error is $2\hat{s} = 2(2.4\%) = 4.8\%$

Question: Find the 95% confidence interval for a sample size of 100 with a sample proportion of 25%. Round your answer to the nearest hundredth of a percent.
a. (20.67%, 29.33%)
b. (24.63%, 25.38%)
c. (16.34%, 33.66%)
d. (12.01%, 37.99%)

Question: Find the margin of error for the 95% confidence interval in the previous question. Recall, the sample size was 100 and the sample proportion was 25%. Round to the nearest hundredth of a percent.
a. ± 4.33%
b. ± 17.32%
c. ± 2.17%
d. ± 8.66%

Example 6
• In a sample of 600 U.S. citizens, 362 people say they drive an American-built car.
• Find the 95% confidence interval and the margin of error for the proportion of the population that drive an American-built car.

Example 6, cont’d
• Solution: The sample proportion is: $\hat{p} = \frac{362}{600} \approx 0.603$
• The standard error is: $\hat{s} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.603(1 - 0.603)}{600}} \approx 0.020$

Example 6, cont’d
• Solution, cont’d: Calculate the confidence interval bounds:
• $\hat{p} - 2\hat{s} = 60.3\% - 2(2\%) = 56.3\%$
• $\hat{p} + 2\hat{s} = 60.3\% + 2(2\%) = 64.3\%$
• The margin of error is $2\hat{s} = 2(2\%) = 4\%$

Example 6, cont’d
• Solution, cont’d: With a confidence level of 95% we can say that 60.3% of Americans drive American-built cars, with a margin of error of ± 4%.

Example 7
• In a survey of 1000 adults, 44% said they were satisfied with the quality of health care in the U.S.
• The margin of error was reported as ± 3%.
• Assuming a 95% confidence interval was used, verify that the margin of error is correct and explain what it means.
Example 7, cont’d
• Solution: We know n = 1000 and $\hat{p} = 0.44$
• The standard error is $\hat{s} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.44(1 - 0.44)}{1000}} \approx 0.0157$

Example 7, cont’d
• Solution, cont’d: The margin of error is $2\hat{s} = 2(0.0157) = 0.0314$
• The margin of error of approximately 3% indicates that the researchers are 95% confident that the true percentage of adults satisfied with health care in the U.S. is between 41% and 47%.

Example 8
• A manufacturer tests 1000 computer chips and finds 216 defective ones. Find a 95% confidence interval for the population proportion of defective chips.

Example 8, cont’d
• Solution: We know n = 1000 and $\hat{p} = 0.216$
• The standard error is $\hat{s} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.216(1 - 0.216)}{1000}} \approx 0.013$

Example 8, cont’d
• Solution, cont’d: Calculate the confidence interval bounds:
• $\hat{p} - 2\hat{s} = 21.6\% - 2(1.3\%) = 19.0\%$
• $\hat{p} + 2\hat{s} = 21.6\% + 2(1.3\%) = 24.2\%$
• The 95% confidence interval is (19.0%, 24.2%).

Example 9
• A manufacturer tests 10,000 computer chips and finds 2160 defective ones. Find a 95% confidence interval for the population proportion of defective chips.
• Does choosing a larger sample give significantly better results?

Example 9, cont’d
• Solution: We know n = 10,000 and $\hat{p} = 0.216$
• The standard error is $\hat{s} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.216(1 - 0.216)}{10{,}000}} \approx 0.004$

Example 9, cont’d
• Solution, cont’d: Calculate the confidence interval bounds:
• $\hat{p} - 2\hat{s} = 21.6\% - 2(0.4\%) = 20.8\%$
• $\hat{p} + 2\hat{s} = 21.6\% + 2(0.4\%) = 22.4\%$
• The 95% confidence interval is (20.8%, 22.4%).

Example 9, cont’d
• Solution, cont’d: Compare the results for sample sizes of 1000 and 10,000.
• For n = 1000, the 95% confidence interval is (19.0%, 24.2%).
• For n = 10,000, the 95% confidence interval is (20.8%, 22.4%).
• Increasing the sample size to 10,000 does give a significantly better estimate of the population proportion: the interval narrows from 5.2 percentage points wide to 1.6 percentage points wide.

11.3 Initial Problem Solution
• A candy company prints prize tickets inside the wrappers of some of their candy bars.
• Suppose you buy 400 candy bars and find that 25 of them have prizes. If you buy 1000 more, how many prizes would you expect to win?
Initial Problem Solution, cont’d
• For the first 400 candy bars you bought, 25 were winners, so the sample proportion of winning bars is $\hat{p} = \frac{25}{400} = 0.0625$
• The standard error is $\hat{s} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.0625(1 - 0.0625)}{400}} \approx 0.0121$

Initial Problem Solution, cont’d
• Calculate the confidence interval bounds:
• $\hat{p} - 2\hat{s} = 6.25\% - 2(1.21\%) = 3.83\%$
• $\hat{p} + 2\hat{s} = 6.25\% + 2(1.21\%) = 8.67\%$
• Out of 1000 new candy bars, you should expect between 3.83% and 8.67%, or between 38 and 87 bars, to be winners.
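The candy bar solution combines the sample proportion, the standard error, and the 95% confidence interval, then scales the interval to the new purchase. The sketch below retraces those steps in Python; the function name `expected_prize_range` and its parameter names are hypothetical, chosen only for this illustration.

```python
import math


def expected_prize_range(wins: int, n_sample: int, n_new: int) -> tuple[float, float]:
    """Scale the 95% confidence interval for the win proportion to n_new bars."""
    p_hat = wins / n_sample                          # sample proportion
    s_hat = math.sqrt(p_hat * (1 - p_hat) / n_sample)  # standard error
    low, high = p_hat - 2 * s_hat, p_hat + 2 * s_hat   # 95% confidence interval
    return low * n_new, high * n_new


low_count, high_count = expected_prize_range(wins=25, n_sample=400, n_new=1000)
print(round(low_count), round(high_count))  # 38 87
```

The unrounded bounds are close to 38.3 and 86.7 bars, which round to the 38 and 87 given in the solution.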