Descriptive statistics
Describing data with numbers: measures of variability

What to describe?
• What is the “location” or “center” of the data?
• How do the data vary?

Measures of Variability
• Range
• Interquartile range
• Variance and standard deviation
• Coefficient of variation
All of these measures are appropriate for measurement data only.

Range
• The difference between the largest and smallest data points.
• Highly affected by outliers.
• Best for symmetric data with no outliers.

What is the range?
[Figure: “GPAs of Spring 1998 Stat 250 Students”; x-axis: GPA (2.0 to 4.0), y-axis: Frequency]

Range
Descriptive Statistics

Variable    N     Mean   Median   TrMean    StDev   SE Mean
GPA        92   3.0698   3.1200   3.0766   0.4851    0.0506

Variable   Minimum   Maximum       Q1       Q3
GPA         2.0200    3.9800   2.6725   3.4675

Range = 3.98 - 2.02 = 1.96

Interquartile range
• The difference between the “third quartile” (75th percentile) and the “first quartile” (25th percentile), that is, the spread of the “middle half” of the values.
• IQR = Q3 - Q1
• Robust to outliers and extreme observations.
• Works well for skewed data.

What is the interquartile range?
[Figure: “GPAs of Spring 1998 Stat 250 Students”; x-axis: GPA (2.0 to 4.0), y-axis: Frequency]

Interquartile range
From the GPA descriptive statistics above: Q1 = 2.6725 and Q3 = 3.4675.

IQR = 3.4675 - 2.6725 = 0.795

Variance

$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$

1. Find the difference between each data point and the mean.
2. Square the differences, and add them up.
3. Divide by one less than the number of data points.

Variance
• The variance of a population is denoted by $\sigma^2$ (“sigma-squared”).
• The variance of a sample is denoted by $s^2$ (“s-squared”).
• Measures the average squared deviation of the data points from their mean.
• Highly affected by outliers. Best for symmetric data.
• The drawback is that the units are squared.
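The range, IQR, and three-step variance recipe above can be sketched in Python. This is a minimal illustration, not the software used for the slides: the data values are hypothetical (the slides do not list the individual GPAs), the function names are made up for this example, and the quartiles use the default "exclusive" method of the standard library's `statistics.quantiles`, which may differ slightly from other packages' quartile rules.

```python
import statistics

# Hypothetical GPA-like sample (NOT the Stat 250 data, which the slides do not list).
data = [2.0, 2.5, 2.7, 3.0, 3.1, 3.4, 3.6, 4.0]


def sample_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Q3 - Q1, using the standard library's default ('exclusive') quartile rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def sample_variance(values):
    """Sample variance s^2, following the three steps on the slide."""
    n = len(values)
    mean = sum(values) / n
    # Step 1: difference between each data point and the mean.
    diffs = [x - mean for x in values]
    # Step 2: square the differences and add them up.
    sum_of_squares = sum(d ** 2 for d in diffs)
    # Step 3: divide by one less than the number of data points.
    return sum_of_squares / (n - 1)


print("range    =", sample_range(data))
print("IQR      =", interquartile_range(data))
print("variance =", sample_variance(data))
```

The hand-written `sample_variance` should agree with `statistics.variance`, which also divides by n - 1.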
Standard deviation
• The sample standard deviation is the square root of the sample variance, and so is denoted by s.
• Units are the original units.
• Measures, roughly, the typical deviation of the data points from their mean.
• Also highly affected by outliers.

What is the variance or standard deviation?
[Figure: “Fastest Ever Driving Speed”, 226 Stat 100 students, Fall '98 (100 men, 126 women); x-axis: Speed (MPH)]

Variance or standard deviation

Sex       N     Mean   Median   TrMean   StDev   SE Mean
female  126    91.23    90.00    90.83   11.32      1.01
male    100   106.79   110.00   105.62   17.39      1.74

Sex     Minimum   Maximum      Q1       Q3
female    65.00    120.00   85.00    98.25
male      75.00    162.00   95.00   118.75

Females: s = 11.32 mph and s² = 11.32² = 128.1 mph²
Males: s = 17.39 mph and s² = 17.39² = 302.4 mph²

What is the variance or standard deviation?
[Figure: “Fastest Ever Driving Speed” by sex (male, female); x-axis: KPH]

Variance or standard deviation

Sex       N     Mean   Median   TrMean   StDev   SE Mean
female  126   152.05   150.00   151.39   18.86      1.68
male    100   177.98   183.33   176.04   28.98      2.90

Sex     Minimum   Maximum       Q1       Q3
female   108.33    200.00   141.67   163.75
male     125.00    270.00   158.33   197.92

Females: s = 18.86 kph and s² = 18.86² = 355.7 kph²
Males: s = 28.98 kph and s² = 28.98² = 839.8 kph²

Coefficient of Variation
• Ratio of the sample standard deviation to the sample mean, multiplied by 100.
• Measures relative variability, that is, variability relative to the magnitude of the data.
• Unitless, so good for comparing variation between two groups.
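The MPH and KPH slides above report the same speeds in two units, which shows how s and s² respond to a change of units. The sketch below illustrates this under two assumptions: the sample is hypothetical (the individual speeds are not listed), and the conversion factor of 5/3 is inferred from the slides' own numbers (for example, a minimum of 65.00 mph appears as 108.33 kph), not from an exact mile-to-kilometre conversion.

```python
import math
import statistics

# Assumption: factor implied by the slides' tables (65 mph -> 108.33 kph).
MPH_TO_KPH = 5 / 3

# Hypothetical speeds in mph (NOT the Stat 100 data).
speeds_mph = [65.0, 80.0, 85.0, 90.0, 95.0, 100.0, 120.0]
speeds_kph = [x * MPH_TO_KPH for x in speeds_mph]

# Sample standard deviation and variance in each unit.
s_mph = statistics.stdev(speeds_mph)
s_kph = statistics.stdev(speeds_kph)
var_mph = statistics.variance(speeds_mph)
var_kph = statistics.variance(speeds_kph)

# s is the square root of s^2, so it stays in the original units.
print(f"s  (mph)   = {s_mph:.2f}, sqrt(s^2) = {math.sqrt(var_mph):.2f}")

# Rescaling the data by a factor c multiplies s by c and s^2 by c^2.
print(f"s  ratio   = {s_kph / s_mph:.4f} (c   = {MPH_TO_KPH:.4f})")
print(f"s^2 ratio  = {var_kph / var_mph:.4f} (c^2 = {MPH_TO_KPH ** 2:.4f})")
```

This matches the pattern in the tables: 11.32 mph becomes 18.86 kph, while the variance grows by roughly the square of the factor.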
Coefficient of variation (MPH)

Sex       N     Mean   Median   TrMean   StDev   SE Mean
female  126    91.23    90.00    90.83   11.32      1.01
male    100   106.79   110.00   105.62   17.39      1.74

Sex     Minimum   Maximum      Q1       Q3
female    65.00    120.00   85.00    98.25
male      75.00    162.00   95.00   118.75

Females: CV = (11.32/91.23) x 100 = 12.4
Males: CV = (17.39/106.79) x 100 = 16.3

Coefficient of variation (KPH)

Sex       N     Mean   Median   TrMean   StDev   SE Mean
female  126   152.05   150.00   151.39   18.86      1.68
male    100   177.98   183.33   176.04   28.98      2.90

Sex     Minimum   Maximum       Q1       Q3
female   108.33    200.00   141.67   163.75
male     125.00    270.00   158.33   197.92

Females: CV = (18.86/152.05) x 100 = 12.4
Males: CV = (28.98/177.98) x 100 = 16.3

The most appropriate measure of variability depends on … the shape of the data’s distribution.

Choosing Appropriate Measure of Variability
• If data are symmetric, with no serious outliers, use range and standard deviation.
• If data are skewed, and/or have serious outliers, use IQR.
• If comparing variation across two data sets, use coefficient of variation.
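The coefficient-of-variation figures above can be reproduced from the reported standard deviations and means. The sketch below is one way to do that; the function name `coefficient_of_variation` and the `summary` dictionary are made up for this illustration, while the numbers are the ones shown in the MPH and KPH tables.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the CV as a percentage: (s / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


# (standard deviation, mean) pairs as reported in the slides' tables.
summary = {
    ("female", "mph"): (11.32, 91.23),
    ("male", "mph"): (17.39, 106.79),
    ("female", "kph"): (18.86, 152.05),
    ("male", "kph"): (28.98, 177.98),
}

for (sex, unit), (s, mean) in summary.items():
    cv = coefficient_of_variation(s, mean)
    print(f"{sex:6s} ({unit}): CV = {cv:.1f}")
```

Because both s and the mean are multiplied by the same factor when the units change, the ratio is unchanged, which is why each group gets the same CV in mph and in kph.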