Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 3 Descriptive Statistics © 2002 Thomson / South-Western Slide 3-1 Learning Objectives • Distinguish between measures of central tendency, measures of variability, and measures of shape • Understand the meanings of mean, median, mode, quartile, percentile, and range • Compute mean, median, mode, percentile, quartile, range, variance, standard deviation, and mean absolute deviation © 2002 Thomson / South-Western Slide 3-2 Learning Objectives -- Continued • Differentiate between sample and population variance and standard deviation • Understand the meaning of standard deviation as it is applied by using the empirical rule • Understand box and whisker plots, skewness, and kurtosis © 2002 Thomson / South-Western Slide 3-3 Measures of Central Tendency • Measures of central tendency yield information about “particular places or locations in a group of numbers.” • Common Measures of Location –Mode –Median –Mean –Percentiles –Quartiles © 2002 Thomson / South-Western Slide 3-4 Mode • The most frequently occurring value in a data set • Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio) • Bimodal -- Data sets that have two modes • Multimodal -- Data sets that contain more than two modes © 2002 Thomson / South-Western Slide 3-5 Mode -- Example • The mode is 44. • There are more 44s than any other value. © 2002 Thomson / South-Western 35 41 44 45 37 41 44 46 37 43 44 46 39 43 44 46 40 43 44 46 40 43 45 48 Slide 3-6 Median (ΔΙΑΜΕΣΟΣ) • Middle value in an ordered array of numbers. • Applicable for ordinal, interval, and ratio data • Not applicable for nominal data • Unaffected by extremely large and extremely small values. © 2002 Thomson / South-Western Slide 3-7 Median: Computational Procedure • First Procedure – Arrange observations in an ordered array. – If number of terms is odd, the median is the middle term of the ordered array. – If number of terms is even, the median is the average of the middle two terms. • Second Procedure – The median’s position in an ordered array is given by (n+1)/2. © 2002 Thomson / South-Western Slide 3-8 Median: Example with an Odd Number of Terms Ordered Array includes: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22 • • • • There are 17 terms in the ordered array. Position of median = (n+1)/2 = (17+1)/2 = 9 The median is the 9th term, 15. If the 22 is replaced by 100, the median remains at 15. • If the 3 is replaced by -103, the median remains at 15. © 2002 Thomson / South-Western Slide 3-9 Mean (ΜΕΣΟΣ) • Is the average of a group of numbers • Applicable for interval and ratio data, not applicable for nominal or ordinal data • Affected by each value in the data set, including extreme values • Computed by summing all values in the data set and dividing the sum by the number of values in the data set © 2002 Thomson / South-Western Slide 3-10 Population Mean X X X X ... X 1 2 3 N N N 24 13 19 26 11 5 93 5 18. 6 © 2002 Thomson / South-Western Slide 3-11 Sample Mean X X X X ... X X 1 2 3 n n n 57 86 42 38 90 66 6 379 6 63.167 © 2002 Thomson / South-Western Slide 3-12 Quartiles Measures of central tendency that divide a group of data into four subgroups • Q1: 25% of the data set is below the first quartile • Q2: 50% of the data set is below the second quartile • Q3: 75% of the data set is below the third quartile © 2002 Thomson / South-Western Slide 3-13 Quartiles, continued • Q1 is equal to the 25th percentile • Q2 is located at 50th percentile and equals the median • Q3 is equal to the 75th percentile Quartile values are not necessarily members of the data set © 2002 Thomson / South-Western Slide 3-14 Quartiles Q2 Q1 25% © 2002 Thomson / South-Western 25% Q3 25% 25% Slide 3-15 Quartiles: Example • Ordered array: 106, 109, 114, 116, 121, 122, 125, 129 • Q 1: 25 i (8) 2 100 109114 Q1 1115 . 2 • Q 2: 50 i (8) 4 100 116121 Q2 1185 . 2 • Q 3: 75 i (8) 6 100 122125 Q3 1235 . 2 © 2002 Thomson / South-Western Slide 3-16 Measures of Variability • Measures of variability describe the spread or the dispersion of a set of data. • Common Measures of Variability –Range –Interquartile Range –Mean Absolute Deviation –Variance –Standard Deviation – Z scores –Coefficient of Variation © 2002 Thomson / South-Western Slide 3-17 Variability No Variability in Cash Flow Variability in Cash Flow © 2002 Thomson / South-Western Mean Mean Mean Mean Slide 3-18 Variability Variability No Variability © 2002 Thomson / South-Western Slide 3-19 Range • The difference between the largest and the smallest values in a set of data • Simple to compute 35 41 44 45 • Ignores all data points 37 41 44 46 except the 37 43 44 46 two extremes 39 43 44 46 • Example: Range 40 43 44= 46 Largest - Smallest = 40 43 45 48 48 - 35 = 13 © 2002 Thomson / South-Western Slide 3-20 Interquartile Range • Range of values between the first and third quartiles • Range of the “middle half” • Less influenced by extremes Interquartile Range Q 3 Q1 © 2002 Thomson / South-Western Slide 3-21 Deviation from the Mean • Data set: 5, 9, 16, 17, 18 • Mean: X 65 13 N 5 • Deviations from the mean: -8, -4, 3, 4, 5 -4 -8 0 +3 +4 5 10 15 +5 20 © 2002 Thomson / South-Western Slide 3-22 Mean Absolute Deviation • Average of the absolute deviations from the mean X 5 9 16 17 18 X X -8 -4 +3 +4 +5 0 +8 +4 +3 +4 +5 24 © 2002 Thomson / South-Western M . A. D. X N 24 5 4.8 Slide 3-23 Population Variance • Average of the squared deviations from the arithmetic mean X 5 9 16 17 18 X X -8 -4 +3 +4 +5 0 © 2002 Thomson / South-Western 64 16 9 16 25 130 2 2 X N 130 5 2 6 .0 Slide 3-24 2 Population Standard Deviation • Square root of the variance X 5 9 16 17 18 X X -8 -4 +3 +4 +5 0 © 2002 Thomson / South-Western 64 16 9 16 25 130 2 2 X N 130 5 2 6 .0 2 2 6 .0 5 .1 Slide 3-25 2 Empirical Rule • Data are normally distributed (or approximately normal) Distance from the Mean 1 2 3 © 2002 Thomson / South-Western Percentage of Values Falling Within Distance 68 95 99.7 Slide 3-26 Sample Variance • Average of the squared deviations from the arithmetic mean X 2,398 1,844 1,539 1,311 7,092 X X X 625 71 -234 -462 0 © 2002 Thomson / South-Western X 390,625 5,041 54,756 213,444 663,866 2 S 2 X X n1 6 6 3 ,8 6 6 3 2 2 1 , 2 8 8 .6 7 Slide 3-27 2 Sample Standard Deviation • Square root of the sample variance X X X X X 2,398 1,844 1,539 1,311 7,092 625 71 -234 -462 0 390,625 5,041 54,756 213,444 663,866 2 S 2 X X n1 6 6 3 ,8 6 6 3 2 2 1 , 2 8 8 .6 7 S S 2 2 2 1 , 2 8 8 .6 7 4 7 0 .4 1 © 2002 Thomson / South-Western Slide 3-28 2 Coefficient of Variation • Ratio of the standard deviation to the mean, expressed as a percentage • Measurement of relative dispersion C.V . 100 © 2002 Thomson / South-Western Slide 3-29 Coefficient of Variation 84 29 1 1 4.6 CV . . 1 1 100 1 4.6 100 29 1586 . © 2002 Thomson / South-Western 2 2 10 CV . . 2 2 100 2 10 100 84 1190 . Slide 3-30 Measures of Shape • Skewness – Absence of symmetry – Extreme values in one side of a distribution • Kurtosis – Peakedness of a distribution • Box and Whisker Plots – Graphic display of a distribution – Reveals skewness © 2002 Thomson / South-Western Slide 3-31 Skewness Negatively Skewed © 2002 Thomson / South-Western Symmetric (Not Skewed) Positively Skewed Slide 3-32 Skewness Mean Median Mean Median Mode Negatively Skewed Symmetric (Not Skewed) Mode © 2002 Thomson / South-Western Mean Mode Median Positively Skewed Slide 3-33 Coefficient of Skewness • Summary measure for skewness S 3 Md • If S < 0, the distribution is negatively skewed (skewed to the left). • If S = 0, the distribution is symmetric (not skewed). • If S > 0, the distribution is positively skewed (skewed to the right). © 2002 Thomson / South-Western Slide 3-34 Coefficient of Skewness 1 M d1 S 1 1 23 26 M d2 12.3 2 3 1 M d1 1 3 23 26 12.3 0.73 © 2002 Thomson / South-Western S 2 2 26 26 M d3 12.3 3 3 2 M d2 2 3 26 26 12.3 0 S 3 3 29 26 12.3 3 3 M d3 3 3 29 26 12.3 0.73 Slide 3-35 Kurtosis • Peakedness of a distribution – Leptokurtic: high and thin – Mesokurtic: normal in shape – Platykurtic: flat and spread out Leptokurtic Mesokurtic Platykurtic © 2002 Thomson / South-Western Slide 3-36 Box and Whisker Plot • Five specific values are used: – Median, Q2 – First quartile, Q1 – Third quartile, Q3 – Minimum value in the data set – Maximum value in the data set © 2002 Thomson / South-Western Slide 3-37 Box and Whisker Plot, continued • Inner Fences – IQR = Q3 - Q1 – Lower inner fence = Q1 - 1.5 IQR – Upper inner fence = Q3 + 1.5 IQR • Outer Fences – Lower outer fence = Q1 - 3.0 IQR – Upper outer fence = Q3 + 3.0 IQR © 2002 Thomson / South-Western Slide 3-38 Box and Whisker Plot Minimum Q1 © 2002 Thomson / South-Western Q2 Q3 Maximum Slide 3-39 Skewness: Box and Whisker Plots, and Coefficient of Skewness S<0 Negatively Skewed © 2002 Thomson / South-Western S=0 Symmetric (Not Skewed) S>0 Positively Skewed Slide 3-40