Chapter 3 Section 2
Measures of Dispersion
Sullivan – Fundamentals of Statistics – 2nd Edition – Chapter 3 Section 2 – Slide 1 of 27

Chapter 3 – Section 2
● Learning objectives
1. The range of a variable
2. The variance of a variable
3. The standard deviation of a variable
4. Use the Empirical Rule
5. Use Chebyshev’s inequality

● Comparing two sets of data
● The measures of central tendency (mean, median, mode) measure differences between the “average” or “typical” values of two sets of data
● The measures of dispersion in this section measure differences in how far “spread out” the data values are

● The range of a variable is the largest data value minus the smallest data value
● Compute the range of 6, 1, 2, 6, 11, 7, 3, 3
● The largest value is 11
● The smallest value is 1
● Subtracting the two: 11 − 1 = 10, so the range is 10

● The range uses only two values in the data set: the largest value and the smallest value
● The range is not resistant
● If we made a mistake and recorded 6, 1, 2 as 6000, 1, 2
● The range becomes 6000 − 1 = 5999
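The range computation above, and its sensitivity to a single recording error, can be checked with a minimal Python sketch. The helper name `data_range` is a hypothetical illustration (chosen so it does not shadow Python's built-in `range`), not something defined in the slides.

```python
def data_range(values):
    """Range of a variable: largest value minus smallest value.

    `data_range` is a hypothetical helper name, chosen so it does not
    shadow Python's built-in `range`.
    """
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


data = [6, 1, 2, 6, 11, 7, 3, 3]
print(data_range(data))  # 10

# Simulate the recording error: the first value 6 entered as 6000.
mistyped = [6000] + data[1:]
print(data_range(mistyped))  # 5999
```

Because `max` and `min` look only at the two extreme values, one bad entry changes the result from 10 to 5999, which is exactly why the range is not resistant.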
● The variance is based on the deviation from the mean: $x_i - \mu$ for populations and $x_i - \bar{x}$ for samples
● Positive and negative deviations would cancel out when added, so to treat them alike we square the deviations: $(x_i - \mu)^2$ for populations and $(x_i - \bar{x})^2$ for samples

● The population variance of a variable is the sum of these squared deviations divided by the number of values in the population

$$\sigma^2 = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2}{N} = \frac{\sum (x_i - \mu)^2}{N}$$

● The population variance is represented by $\sigma^2$
● Note: For accuracy, use as many decimal places as your calculator allows

● Compute the population variance of 6, 1, 2, 11
● Compute the population mean first: $\mu = (6 + 1 + 2 + 11)/4 = 5$
● Now compute the squared deviations: $(1 - 5)^2 = 16$, $(2 - 5)^2 = 9$, $(6 - 5)^2 = 1$, $(11 - 5)^2 = 36$
● Average the squared deviations: $(16 + 9 + 1 + 36)/4 = 15.5$
● The population variance $\sigma^2$ is 15.5

● The sample variance of a variable is the sum of these squared deviations divided by one less than the number of values in the sample

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

● The sample variance is represented by $s^2$
● We say that this statistic has $n - 1$ degrees of freedom

● Compute the sample variance of 6, 1, 2, 11
● Compute the sample mean first: $\bar{x} = (6 + 1 + 2 + 11)/4 = 5$
● Now compute the squared deviations: $(1 - 5)^2 = 16$, $(2 - 5)^2 = 9$, $(6 - 5)^2 = 1$, $(11 - 5)^2 = 36$
● Divide the sum of the squared deviations by $n - 1 = 3$: $(16 + 9 + 1 + 36)/3 = 62/3 \approx 20.7$
● The sample variance $s^2$ is approximately 20.7

● Why are the population variance (15.5) and the sample variance (20.7) different for the same set of numbers?
● In the first case, {6, 1, 2, 11} was the entire population (divide by N)
● In the second case, {6, 1, 2, 11} was just a sample from the population (divide by n − 1)
● These are two different situations

● Why do we use different formulas?
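Before turning to why, the two variance computations above can be reproduced in a short Python sketch. The helper names `population_variance` and `sample_variance` are hypothetical; the only difference between them is the divisor (N versus n − 1). The standard library's `statistics.pvariance` and `statistics.variance` implement the same two formulas and are used here only as a cross-check.

```python
import math
from statistics import pvariance, variance


def population_variance(values):
    """sigma^2: sum of squared deviations from mu, divided by N (hypothetical helper)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """s^2: sum of squared deviations from x-bar, divided by n - 1 (hypothetical helper)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [6, 1, 2, 11]
pop = population_variance(data)
samp = sample_variance(data)
print(pop)             # 15.5
print(round(samp, 1))  # 20.7

# Cross-check against the standard library, which uses the same two divisors.
print(math.isclose(pop, pvariance(data)))  # True
print(math.isclose(samp, variance(data)))  # True
```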
● The reason is that the sample variance measures deviations from the sample mean $\bar{x}$ rather than from the population mean $\mu$, and a sample's values are, on average, closer to their own mean than to $\mu$
● If we used n in the denominator for the sample variance, we would get a “biased” result
● Bias here means that we would tend to underestimate the true variance

● The standard deviation is the square root of the variance
● The population standard deviation is the square root of the population variance ($\sigma^2$) and is represented by $\sigma$
● The sample standard deviation is the square root of the sample variance ($s^2$) and is represented by $s$

● If the population is {6, 1, 2, 11}, the population variance is $\sigma^2 = 15.5$ and the population standard deviation is $\sigma = \sqrt{15.5} \approx 3.9$
● If the sample is {6, 1, 2, 11}, the sample variance is $s^2 \approx 20.7$ and the sample standard deviation is $s = \sqrt{20.7} \approx 4.5$
● The population standard deviation and the sample standard deviation apply in different situations

● The standard deviation is very useful for estimating probabilities

● The Empirical Rule: if the distribution is roughly bell shaped, then
● Approximately 68% of the data will lie within 1 standard deviation of the mean
● Approximately 95% of the data will lie within 2 standard deviations of the mean
● Approximately 99.7% of the data (that is, almost all) will lie within 3 standard deviations of the mean

● For a roughly bell-shaped variable with mean 17 and standard deviation 3.4
● Approximately 68% of the values will lie between 17 − 3.4 and 17 + 3.4, that is, between 13.6 and 20.4
● Approximately 95% of the values will lie between 17 − 2 × 3.4 and 17 + 2 × 3.4, that is, between 10.2 and 23.8
● Approximately 99.7% of the values will lie between 17 − 3 × 3.4 and 17 + 3 × 3.4, that is, between 6.8 and 27.2
● A value of 2.1 (less than 6.8) and a value of 33.2 (greater than 27.2) would both be very unusual

● Chebyshev’s inequality gives a lower bound on the percentage of observations that lie within k standard deviations of the mean (where k > 1)
● This lower bound is a guaranteed minimum rather than an estimate: the actual percentage for any variable cannot be lower than this number
● Therefore the actual percentage must be this value or higher

● Chebyshev’s inequality: for any data set, at least

$$\left(1 - \frac{1}{k^2}\right) \times 100\%$$

of the observations will lie within k standard deviations of the mean, where k is any number greater than 1

● How much of the data lies within 1.5 standard deviations of the mean?
● From Chebyshev’s inequality,

$$\left(1 - \frac{1}{1.5^2}\right) \times 100\% \approx 55.6\%$$

so at least 55.6% of the data will lie within 1.5 standard deviations of the mean

● If the mean is 20 and the standard deviation is 4, how much of the data lies between 14 and 26?
● 14 and 26 are each 1.5 standard deviations from 20, since (26 − 20)/4 = (20 − 14)/4 = 1.5
● From Chebyshev’s inequality,

$$\left(1 - \frac{1}{1.5^2}\right) \times 100\% \approx 55.6\%$$

so at least 55.6% of the data will lie between 14 and 26

Summary: Chapter 3 – Section 2
● Range: the maximum minus the minimum; not a resistant measure
● Variance and standard deviation: measure deviations from the mean; not resistant measures
● Empirical Rule: about 68% of the data is within 1 standard deviation of the mean, and about 95% is within 2 standard deviations
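The interval arithmetic in the Empirical Rule example (mean 17, standard deviation 3.4) and the two Chebyshev examples (k = 1.5) can be sketched in Python as follows. The function names are hypothetical illustrations; the Empirical Rule percentages hold only approximately and only for roughly bell-shaped data, whereas the Chebyshev bound applies to any data set.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals mean - k*sd to mean + k*sd for k = 1, 2, 3 (hypothetical helper).

    For roughly bell-shaped data these hold about 68%, 95% and 99.7%
    of the values; the percentages are approximations, not bounds.
    """
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations
    of the mean, valid for any data set (hypothetical helper, k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return (1 - 1 / k**2) * 100


# Empirical Rule example: mean 17, standard deviation 3.4.
intervals = empirical_rule_intervals(17, 3.4)
for k, (low, high) in intervals.items():
    print(k, round(low, 1), round(high, 1))
# 1 13.6 20.4
# 2 10.2 23.8
# 3 6.8 27.2

# Values outside the 3-standard-deviation interval are very unusual.
low3, high3 = intervals[3]
for value in (2.1, 33.2):
    print(value, low3 <= value <= high3)  # False for both values

# Chebyshev: at least this percentage lies within 1.5 standard deviations.
print(round(chebyshev_lower_bound(1.5), 1))  # 55.6

# Mean 20, standard deviation 4: 26 (and likewise 14) is k = 1.5 sds away.
mean, sd = 20, 4
k = (26 - mean) / sd
print(k, round(chebyshev_lower_bound(k), 1))  # 1.5 55.6
```

The two functions answer different questions: the Empirical Rule gives approximate percentages for one shape of distribution, while `chebyshev_lower_bound` gives a guaranteed minimum for any shape, which is why its numbers are lower (at k = 2 it guarantees only 75%, compared with the Empirical Rule's approximate 95%).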