Summary of Prev. Lecture


Central Tendency
Mode: highest frequency; used with nominal or category data
Median: middle value; avoids the influence of outliers
Mean
  Arithmetic Mean: First and Second Moment
  Geometric Mean
  Weighted Mean
Distribution Descriptor 2
Geography
Jinmu Choi

1. Measure of Dispersion
2. Range and Percentile
3. Mean Deviation, Variance, Std. Dev.
4. Weighted Var. and Std. Dev., CV
5. Skewness and Kurtosis
Summary and Next…
Dispersion

Dispersion: how the values are concentrated or scattered around the mean and along the value line
Figure: values very similar to the mean, quite different from the mean, or just scattered around
Xa: 1, 3, 5, 7, 9, 11, 13: Mean = 7, Range = 12
Xb: -11, -5, 1, 7, 13, 19, 25: Mean = 7, Range = 36
Dispersion Measures

Magnitude of dispersion
  Range: Maximum – Minimum
  Percentiles
  Mean deviations
  Standard deviations
Direction and Sharpness
  Skewness
  Kurtosis
Range

Range: Maximum – Minimum
The greater the range in a data series, the more dispersed the data are
The range shows only how far the values are scattered, not how they are distributed in between
Xb: -11, -5, 1, 7, 13, 19, 25: Mean = 7, Range = 36
Xc: -11, -10, 6, 7, 8, 24, 25: Mean = 7, Range = 36
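The blanks on the Dispersion and Range slides can be checked with a minimal Python sketch that uses only the standard library. The helper `value_range` is an illustrative name of ours, not something defined in the lecture.

```python
# Mean and range for the example series Xa, Xb, Xc.
from statistics import mean

series = {
    "Xa": [1, 3, 5, 7, 9, 11, 13],
    "Xb": [-11, -5, 1, 7, 13, 19, 25],
    "Xc": [-11, -10, 6, 7, 8, 24, 25],
}

def value_range(values):
    """Illustrative helper (not from the lecture): range = maximum - minimum."""
    return max(values) - min(values)

for name, values in series.items():
    print(f"{name}: mean = {mean(values)}, range = {value_range(values)}")
```

All three series have a mean of 7, and Xb and Xc also share a range of 36 even though their values are spread quite differently. That is the limitation of the range noted above.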
Percentiles

Milestones within the range of data
Sort the observations and count ¼, ½, ¾ of the total observations from the minimum
Median = ½ from the minimum = 50th percentile
Xb: -11, -5, 1, 7, 13, 19, 25: Mean = 7, Range = 36
Percentiles (25th, 50th, 75th):
Xc: -11, -10, 6, 7, 8, 24, 25: Mean = 7, Range = 36
Percentiles (25th, 50th, 75th):
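The ¼, ½, ¾ counting can be sketched with Python's `statistics.quantiles`. The choice of method is an assumption on our part: the slide describes counting from the minimum but does not fix an interpolation rule, and `quantiles` offers two (`"exclusive"`, the default, and `"inclusive"`) that can give different cut points for small samples.

```python
# Quartiles (25th, 50th, 75th percentiles) for Xb and Xc.
from statistics import median, quantiles

Xb = [-11, -5, 1, 7, 13, 19, 25]
Xc = [-11, -10, 6, 7, 8, 24, 25]

for name, values in (("Xb", Xb), ("Xc", Xc)):
    # quantiles() sorts the data itself; n=4 returns the three cut points
    # Q1, Q2 (the median), Q3.
    q_excl = quantiles(values, n=4)                      # default: "exclusive"
    q_incl = quantiles(values, n=4, method="inclusive")
    print(name, "median:", median(values), "exclusive:", q_excl, "inclusive:", q_incl)
```

With the default method, Xb gives -5, 7, 19 and Xc gives -10, 7, 24; the inclusive method gives -2, 7, 16 for both. It is therefore worth stating the convention whenever percentiles are reported.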
Mean Deviation

Dispersion using all values
The average absolute difference between each value and the mean:
$D = \frac{\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert}{n}$
Xa: 1, 3, 5, 7, 9, 11, 13: Mean Dev. = 3.4286
Xb: -11, -5, 1, 7, 13, 19, 25: Mean Dev. = 10.2857
Only the distance of each value from the mean counts, not its direction (hence the absolute value)
Figure, values 1, 2, 3, 4, 5, 6, 7, 8, 9: Mean = 5, Mean Dev. = 2.22…
Figure, values 1, 2, 3, 4, 5, 6, 7, 8, 18: Mean = 6, Mean Dev. = 3.33…
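As a minimal sketch of the formula for D, the following Python reproduces the mean deviations quoted above. The function name `mean_deviation` is our own illustrative choice.

```python
# Mean (absolute) deviation: D = sum(|x_i - mean|) / n
from statistics import mean

def mean_deviation(values):
    """Illustrative helper: average absolute distance of each value from the mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)

examples = {
    "Xa": [1, 3, 5, 7, 9, 11, 13],
    "Xb": [-11, -5, 1, 7, 13, 19, 25],
    "1-9": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "1-8, 18": [1, 2, 3, 4, 5, 6, 7, 8, 18],
}
for name, values in examples.items():
    print(f"{name}: mean = {mean(values)}, mean deviation = {mean_deviation(values):.4f}")
```

Replacing the 9 with an 18 moves the mean from 5 to 6 and raises the mean deviation from 2.22 to 3.33. A single distant value affects every term through the shifted mean.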
Variance


Average of the squared differences from the mean
Population variance
$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n} = \frac{\sum_{i=1}^{n} x_i^2}{n} - \mu^2$
Sample variance
$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}$
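The two forms of each variance formula should agree. The following minimal Python sketch checks this on Xa against the standard library's `pvariance` and `variance`. The four helper functions are illustrative names of ours.

```python
# Population and sample variance: definition vs. computational shortcut.
from statistics import pvariance, variance

def population_variance(values):
    """sigma^2 = sum((x_i - mu)^2) / n"""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_variance_shortcut(values):
    """sigma^2 = sum(x_i^2) / n - mu^2"""
    n = len(values)
    mu = sum(values) / n
    return sum(x * x for x in values) / n - mu ** 2

def sample_variance(values):
    """S^2 = sum((x_i - xbar)^2) / (n - 1)"""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """S^2 = (n * sum(x_i^2) - (sum x_i)^2) / (n(n - 1))"""
    n = len(values)
    return (n * sum(x * x for x in values) - sum(values) ** 2) / (n * (n - 1))

xa = [1, 3, 5, 7, 9, 11, 13]
print(population_variance(xa), population_variance_shortcut(xa), pvariance(xa))
print(sample_variance(xa), sample_variance_shortcut(xa), variance(xa))
```

The shortcut forms are convenient by hand. In floating-point arithmetic, however, they can lose precision when the values are large relative to their spread, so the definitional form (or the `statistics` functions) is usually the safer choice in code.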
Standard Deviation

Square root of the averaged squared deviation (the variance)
Expressed in the same magnitude or scale as the original dataset
Example: Mean: 201.23, Var.: 88432.30, Std. Dev.: 297.38
$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$    $S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
For data resembling a normal distribution:
About 68% of the data values lie within $\bar{x} \pm \sigma$
About 95% of the data values lie within $\bar{x} \pm 2\sigma$
About 99.7% of the data values lie within $\bar{x} \pm 3\sigma$
Figure, values 1, 2, 3, 4, 5, 6, 7, 8, 9: Mean = 5, Std. Dev. = 2.58…
Figure, values 1, 2, 3, 4, 5, 6, 7, 8, 18: Mean = 6, Std. Dev. = 4.76…
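The following minimal Python sketch shows that the figure's 2.58 and 4.76 correspond to the population form (dividing by n). It also derives the 68% / 95% / 99.7% coverage from the standard normal distribution using `statistics.NormalDist`.

```python
# Standard deviation (population vs. sample) and normal-curve coverage.
from statistics import NormalDist, pstdev, stdev

d1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
d2 = [1, 2, 3, 4, 5, 6, 7, 8, 18]

print(f"{pstdev(d1):.2f} {pstdev(d2):.2f}")   # population (divide by n): 2.58 4.76
print(f"{stdev(d1):.2f} {stdev(d2):.2f}")     # sample (divide by n - 1): 2.74 5.05

std_normal = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    # Share of a normal distribution lying within k standard deviations of the mean.
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} std. dev.: {share:.1%}")
```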
Weighted Variance

Variance for grouped data
$S_w^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x}_w)^2}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} f_i x_i^2}{\sum_{i=1}^{k} f_i} - \bar{x}_w^2$
where $x_i$ is the mid value of class $i$, $f_i$ its frequency, $k$ the number of classes, and $\bar{x}_w$ the weighted mean
1. Get the range for each group (class)
2. Get the mid value for each group (class)
3. Put the mid value for each observation
4. Calculate the variance using the list of mid values

Range      Mid value
4~50       27
4~50       27
4~50       27
4~50       27
4~50       27
4~50       27
4~50       27
4~50       27
4~50       27
4~50       27
50~200     125
50~200     125
50~200     125
200~1000   600
200~1000   600
200~1000   600
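The following is a minimal Python sketch of the weighted formula, applied to the illustrative table above (mid values 27, 125, 600 with 10, 3 and 3 observations). It is not expected to reproduce the 1537.7615 quoted for the weighted variance, since that figure comes from a dataset whose values are not listed here. The sketch also checks step 4: the weighted formula equals the ordinary population variance of the list of mid values.

```python
# Weighted (grouped-data) variance and standard deviation from class mid values.
import math
from statistics import pvariance

mids = [27, 125, 600]   # class mid values x_i (from the table)
freqs = [10, 3, 3]      # frequencies f_i (observations per class)

total = sum(freqs)
x_w = sum(f * x for f, x in zip(freqs, mids)) / total   # weighted mean

# Definition: sum f_i (x_i - x_w)^2 / sum f_i
var_w = sum(f * (x - x_w) ** 2 for f, x in zip(freqs, mids)) / total
# Shortcut: sum f_i x_i^2 / sum f_i - x_w^2
var_w_shortcut = sum(f * x * x for f, x in zip(freqs, mids)) / total - x_w ** 2
sd_w = math.sqrt(var_w)

# Steps 3-4: one mid value per observation, then the ordinary population variance.
expanded = [x for f, x in zip(freqs, mids) for _ in range(f)]
print(f"weighted mean = {x_w}, weighted variance = {var_w:.4f}, "
      f"shortcut = {var_w_shortcut:.4f}, list of mid values = {pvariance(expanded):.4f}, "
      f"weighted std. dev. = {sd_w:.2f}")
```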
Weighted Standard Deviation


Square root of weighted variance
$S_w = \sqrt{\frac{\sum_{i=1}^{k} f_i (x_i - \bar{x}_w)^2}{\sum_{i=1}^{k} f_i}} = \sqrt{\frac{\sum_{i=1}^{k} f_i x_i^2}{\sum_{i=1}^{k} f_i} - \bar{x}_w^2}$
Unweighted vs. weighted statistics
Unweighted variance: 88432.30
Unweighted std. dev.: 297.38
Weighted variance: 1537.7615
Weighted std. dev.: 39.21
Why do they differ?
The variation within each group has been removed: every observation in a class is represented by the same mid value
Coefficient of Variation


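Problem of the mean, variance, and standard deviation: they are sensitive to scale
X: 1, 3, 5, 7, 9, 11, 13: mean 7, std. dev. 4
Y: 10, 30, 50, 70, 90, 110, 130: mean 70, std. dev. 40
Coefficient of variation
Removes the scale difference so that the dispersion of two datasets can be compared
$CV = \frac{S}{\bar{x}}$ (or $\frac{\sigma}{\bar{x}}$ with the population standard deviation)
For X and Y: CV = 4/7 = 40/70 ≈ 0.57, so the two datasets have the same relative dispersion
Mean: the center of the data
Standard deviation: how much dispersion the data have
Both (CV): dispersion relative to the magnitude of the data, for comparing multiple datasets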
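The following minimal Python sketch uses the population standard deviation (which gives the 4 and 40 quoted for X and Y). The function name `coefficient_of_variation` is illustrative, not from the lecture.

```python
# Coefficient of variation: standard deviation divided by the mean.
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Illustrative helper: CV = population std. dev. / mean."""
    return pstdev(values) / mean(values)

X = [1, 3, 5, 7, 9, 11, 13]
Y = [10 * x for x in X]   # same shape, ten times the scale

print(pstdev(X), pstdev(Y))                                      # 4.0 40.0
print(coefficient_of_variation(X), coefficient_of_variation(Y))  # both about 0.571
```

Multiplying every value by 10 multiplies both the mean and the standard deviation by 10, so their ratio does not change.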
Skewness

Third moment statistic: directional bias of the distribution of the data
$Sk = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n \sigma^3}$
Use a frequency distribution (histogram): X axis = numerical range, Y axis = frequency
Positive skewness: bulk < mean (long tail toward high values)
Negative skewness: mean < bulk (long tail toward low values)
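The following minimal Python sketch of the Sk formula uses the population standard deviation σ as in the formula. The example data reuse the two series from the mean deviation figure, and the function name `skewness` is our own.

```python
# Moment-based skewness: Sk = sum((x_i - mean)^3) / (n * sigma^3)
import math

def skewness(values):
    """Illustrative helper: third standardized moment with the population std. dev."""
    n = len(values)
    m = sum(values) / n
    sigma = math.sqrt(sum((x - m) ** 2 for x in values) / n)
    return sum((x - m) ** 3 for x in values) / (n * sigma ** 3)

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_tail = [1, 2, 3, 4, 5, 6, 7, 8, 18]   # one large value pulls the mean up

print(round(skewness(symmetric), 4))   # 0.0
print(round(skewness(right_tail), 4))  # positive: the bulk of the data lies below the mean
```

Cubing keeps the sign of each deviation, so large deviations on one side dominate the sum and set the direction of the skew.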
Kurtosis

Fourth moment statistic: sharpness of the distribution of the data
$K = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n \sigma^4} - 3$
Use a histogram: X axis = numerical range, Y axis = frequency
The kurtosis of a normal distribution is 3; subtracting 3 makes K = 0 for a normal distribution
High kurtosis (sharp peak): K > 0
Low kurtosis (flat): K < 0
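The K formula, including the "- 3" that sets a normal distribution to zero, can be sketched in Python as follows. The helper `excess_kurtosis` and the two example lists are illustrative choices of ours.

```python
# Excess kurtosis: K = sum((x_i - mean)^4) / (n * sigma^4) - 3
def excess_kurtosis(values):
    """Illustrative helper: fourth standardized moment minus 3 (normal -> K = 0)."""
    n = len(values)
    m = sum(values) / n
    var = sum((x - m) ** 2 for x in values) / n   # population variance sigma^2
    return sum((x - m) ** 4 for x in values) / (n * var ** 2) - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]       # evenly spread, no peak
peaked = [1, 2, 3, 4, 5, 6, 7, 8, 18]    # one extreme value, heavy tail

print(round(excess_kurtosis(flat), 4))    # -1.23
print(round(excess_kurtosis(peaked), 4))  # 1.6998
```

Because deviations are raised to the fourth power, this moment-based K is driven largely by extreme values. A single outlier is enough to turn the evenly spread series from K < 0 to K > 0.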
Summary

Dispersion
  Range: gives the boundary
  Percentile: gives the clustering of observations
  Mean Deviation: magnitude of dispersion
  Variance and Standard Deviation: magnitude of dispersion
  Weighted Variance and Standard Deviation: dispersion of grouped values
  Coefficient of Variation: removes scale differences
Direction and Sharpness
  Skewness: direction of asymmetry relative to the mean
  Kurtosis: sharpness compared to the normal distribution
Next


Lab 3: Additional Statistics and MAUP
Lecture 4: Relationship Descriptor 1. Correlation Analysis (Ch 3, pp. 94-107)