Descriptive statistics
Describing data with numbers:
measures of variability
What to describe?
• What is the “location” or “center” of the
data?
• How do the data vary?
Measures of Variability
• Range
• Interquartile range
• Variance and standard deviation
• Coefficient of variation
All of these measures are appropriate for
measurement data only.
Range
• The difference between largest and smallest
data point.
• Highly affected by outliers.
• Best for symmetric data with no outliers.
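A minimal sketch of the range and its sensitivity to a single outlier. The speeds below are hypothetical values invented for illustration, and `data_range` is an illustrative helper name, not something from the slides.

```python
def data_range(values):
    """Range: largest data point minus smallest data point."""
    return max(values) - min(values)

# Hypothetical speeds (mph), invented for illustration only.
speeds = [85, 88, 90, 92, 95]
speeds_with_outlier = speeds + [160]  # one extreme observation added

range_plain = data_range(speeds)                 # 95 - 85
range_outlier = data_range(speeds_with_outlier)  # 160 - 85
print(range_plain, range_outlier)
```

One added extreme value changes the range from 10 to 75, which is why the range suits data with no outliers.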
What is the range?
[Figure: GPAs of Spring 1998 Stat 250 Students. Vertical axis: Frequency; horizontal axis: GPA, 2.0 to 4.0]
Range
Descriptive Statistics

Variable    N    Mean     Median   TrMean   StDev    SE Mean
GPA         92   3.0698   3.1200   3.0766   0.4851   0.0506

Variable    Minimum   Maximum   Q1       Q3
GPA         2.0200    3.9800    2.6725   3.4675

Range = 3.98 - 2.02 = 1.96
Interquartile range
• The difference between the “third
quartile” (75th percentile) and the “first
quartile” (25th percentile). So, the
“middle-half” of the values.
• IQR = Q3-Q1
• Robust to outliers or extreme observations.
• Works well for skewed data.
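A small sketch of IQR = Q3 - Q1 and its robustness to an extreme value. The data are hypothetical, and the quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption: statistical packages use several slightly different quartile rules, so values may not match other software exactly.

```python
import statistics

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    # "inclusive" is one of several quartile conventions (an assumption here).
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical data, invented for illustration only.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
scores_with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # largest value replaced

iqr_plain = iqr(scores)
iqr_outlier = iqr(scores_with_outlier)
range_plain = max(scores) - min(scores)
range_outlier = max(scores_with_outlier) - min(scores_with_outlier)
print(iqr_plain, iqr_outlier, range_plain, range_outlier)
```

In this example the IQR is the same for both data sets, while the range jumps from 8 to 89.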
What is the Interquartile Range?
[Figure: GPAs of Spring 1998 Stat 250 Students. Vertical axis: Frequency; horizontal axis: GPA, 2.0 to 4.0]
Interquartile range
Descriptive Statistics

Variable    N    Mean     Median   TrMean   StDev    SE Mean
GPA         92   3.0698   3.1200   3.0766   0.4851   0.0506

Variable    Minimum   Maximum   Q1       Q3
GPA         2.0200    3.9800    2.6725   3.4675

IQR = 3.4675 - 2.6725 = 0.795
Variance
$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$
1. Find difference between
each data point and mean.
2. Square the differences, and
add them up.
3. Divide by one less than the
number of data points.
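The three steps above can be sketched as follows. The data set is hypothetical, and `sample_variance` is an illustrative helper name; the result is checked against the standard library's `statistics.variance`, which also divides by n - 1.

```python
import statistics

def sample_variance(values):
    """Sample variance computed with the three steps listed above."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]            # step 1: differences from the mean
    sum_of_squares = sum(d * d for d in deviations)    # step 2: square and add up
    return sum_of_squares / (n - 1)                    # step 3: divide by n - 1

# Hypothetical data, invented for illustration only (mean is 5).
data = [2, 4, 4, 4, 5, 5, 7, 9]

s2 = sample_variance(data)
print(s2, statistics.variance(data))
```

Here the squared differences add up to 32, so the sample variance is 32/7, about 4.57.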
Variance
• If measuring variance of population,
denoted by σ² (“sigma-squared”).
• If measuring variance of sample, denoted by
s² (“s-squared”).
• Measures average squared deviation of data
points from their mean.
• Highly affected by outliers. Best for
symmetric data.
• Problem is units are squared.
Standard deviation
• Sample standard deviation is square root of
sample variance, and so is denoted by s.
• Units are the original units.
• Measures, roughly, the typical deviation of data
points from their mean.
• Also, highly affected by outliers.
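A brief sketch of the relationship between the two measures: the standard deviation is the square root of the variance, so it returns to the original units. The heights are hypothetical values in centimetres, invented for illustration.

```python
import math
import statistics

# Hypothetical heights in cm, invented for illustration only.
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]

variance_cm2 = statistics.variance(heights_cm)  # units: cm squared
sd_cm = math.sqrt(variance_cm2)                 # units: cm

print(variance_cm2, sd_cm, statistics.stdev(heights_cm))
```

The variance here is 62.5 cm², and its square root, about 7.9 cm, agrees with `statistics.stdev`.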
What is the variance
or standard deviation?
[Figure: Fastest Ever Driving Speed, 226 Stat 100 Students, Fall '98. Groups: 100 men, 126 women; horizontal axis: Speed (MPH), 70 to 160]
Variance or standard deviation
Sex       N     Mean     Median   TrMean   StDev   SE Mean
female    126   91.23    90.00    90.83    11.32   1.01
male      100   106.79   110.00   105.62   17.39   1.74

Sex       Minimum   Maximum   Q1      Q3
female    65.00     120.00    85.00   98.25
male      75.00     162.00    95.00   118.75

Females: s = 11.32 mph and s² = 11.32² = 128.1 mph²
Males: s = 17.39 mph and s² = 17.39² = 302.4 mph²
What is the variance
or standard deviation?
[Figure: Fastest Ever Driving Speed by sex (male, female). Horizontal axis: KPH, 120 to 270]
Variance or standard deviation
Sex       N     Mean     Median   TrMean   StDev   SE Mean
female    126   152.05   150.00   151.39   18.86   1.68
male      100   177.98   183.33   176.04   28.98   2.90

Sex       Minimum   Maximum   Q1       Q3
female    108.33    200.00    141.67   163.75
male      125.00    270.00    158.33   197.92

Females: s = 18.86 kph and s² = 18.86² = 355.7 kph²
Males: s = 28.98 kph and s² = 28.98² = 839.8 kph²
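A rough check of how a change of units affects the two measures, using the standard deviations reported in the MPH and KPH tables. The reported values look consistent with a conversion factor of about 5/3; that factor is an inference from the numbers, not something the slides state. Rescaling the data by a constant should multiply s by that constant and s² by its square.

```python
# Standard deviations as reported in the MPH and KPH tables.
sd_mph = {"female": 11.32, "male": 17.39}
sd_kph = {"female": 18.86, "male": 28.98}

factor = 5 / 3  # assumed conversion factor, inferred from the tables

ratios = {}
for sex in sd_mph:
    sd_ratio = sd_kph[sex] / sd_mph[sex]              # expected near factor
    var_ratio = sd_kph[sex] ** 2 / sd_mph[sex] ** 2   # expected near factor squared
    ratios[sex] = (sd_ratio, var_ratio)
    print(sex, round(sd_ratio, 2), round(var_ratio, 2))
```

Both groups give ratios of about 1.67 for s and about 2.78 for s², so the variance changes far more than the standard deviation when the units change.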
Coefficient of Variation
• Ratio of sample standard deviation to
sample mean multiplied by 100.
• Measures relative variability, that is,
variability relative to the magnitude of the
data.
• Unitless, so good for comparing variation
between two groups.
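A minimal sketch of the coefficient of variation, showing that it does not change when the data are converted to other units. The weights are hypothetical, and `coefficient_of_variation` is an illustrative helper name.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by sample mean, times 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical weights in kilograms, invented for illustration only.
weights_kg = [60.0, 75.0, 90.0, 105.0, 120.0]
weights_g = [w * 1000.0 for w in weights_kg]  # same data expressed in grams

cv_kg = coefficient_of_variation(weights_kg)
cv_g = coefficient_of_variation(weights_g)
print(round(cv_kg, 1), round(cv_g, 1))
```

Both the standard deviation and the mean are multiplied by 1000, so the ratio is unchanged (about 26.4 in both cases).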
Coefficient of variation (MPH)
Sex       N     Mean     Median   TrMean   StDev   SE Mean
female    126   91.23    90.00    90.83    11.32   1.01
male      100   106.79   110.00   105.62   17.39   1.74

Sex       Minimum   Maximum   Q1      Q3
female    65.00     120.00    85.00   98.25
male      75.00     162.00    95.00   118.75

Females: CV = (11.32/91.23) × 100 = 12.4
Males: CV = (17.39/106.79) × 100 = 16.3
Coefficient of variation (KPH)
Sex       N     Mean     Median   TrMean   StDev   SE Mean
female    126   152.05   150.00   151.39   18.86   1.68
male      100   177.98   183.33   176.04   28.98   2.90

Sex       Minimum   Maximum   Q1       Q3
female    108.33    200.00    141.67   163.75
male      125.00    270.00    158.33   197.92

Females: CV = (18.86/152.05) × 100 = 12.4
Males: CV = (28.98/177.98) × 100 = 16.3
The most appropriate measure
of variability depends on …
the shape of the data’s
distribution.
Choosing Appropriate
Measure of Variability
• If data are symmetric, with no serious
outliers, use range and standard deviation.
• If data are skewed, and/or have serious
outliers, use IQR.
• If comparing variation across two data sets,
use coefficient of variation.