Summary of Prev. Lecture
Central Tendency
  Mode: highest frequency; used with nominal or categorical data
  Median: middle value; avoids the influence of outliers
  Mean
    Arithmetic Mean: First and Second Moment
    Geometric Mean
    Weighted Mean

Distribution Descriptor 2
Geography, Jinmu Choi
  1. Measure of Dispersion (2)
  2. Range and Percentile (2)
  3. Mean Deviation, Variance, Std. Dev. (3)
  4. Weighted Var. and Std. Dev., CV (3)
  5. Skewness and Kurtosis (2)
  Summary and Next…

Dispersion
Dispersion: how the values are concentrated or scattered around the mean and along the value line
  Values may be very similar to the mean
  Values may be quite different from the mean
  Values may be just scattered around
Xa: 1, 3, 5, 7, 9, 11, 13: Mean = 7, Range = 12
Xb: -11, -5, 1, 7, 13, 19, 25: Mean = 7, Range = 36

Dispersion Measures
Magnitude of dispersion
  Range: Maximum - Minimum
  Percentiles
  Mean deviations
  Standard deviations
Direction and Sharpness
  Skewness
  Kurtosis

Range
Range: Maximum - Minimum
The greater the range in a data series, the more dispersed the data are
The range shows only how far the values are scattered, not how they are distributed in between
Xb: -11, -5, 1, 7, 13, 19, 25: Mean = 7, Range = 36
Xc: -11, -10, 6, 7, 8, 24, 25: Mean = 7, Range = 36

Percentiles
Milestones within the range of data
Sort the observations and count ¼, ½, ¾ of the total from the minimum
Median = ½ from the minimum = 50%
Xb: -11, -5, 1, 7, 13, 19, 25: Mean = 7, Range = 36, Median (50%) = 7
Xc: -11, -10, 6, 7, 8, 24, 25: Mean = 7, Range = 36, Median (50%) = 7

Mean Deviation
Dispersion using all values
The average absolute difference between each value and the mean

$$D = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

Xa: 1, 3, 5, 7, 9, 11, 13: Mean Dev. = 3.4286
Xb: -11, -5, 1, 7, 13, 19, 25: Mean Dev. = 10.2857
Only the distance of the values from the mean matters, not the direction
Series 1, 2, 3, 4, 5, 6, 7, 8, 9: Mean = 5, Mean Dev. = 2.22…
Series 1, 2, 3, 4, 5, 6, 7, 8, 18: Mean = 6, Mean Dev. = 3.33…

Variance
Average squared difference from the mean
Population variance

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n} = \frac{\sum_{i=1}^{n} x_i^2}{n} - \mu^2$$

Sample variance

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2}{n - 1} - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}$$

Standard Deviation
Square root of the averaged squared deviation (the variance)
Expressed in the magnitude or scale of the original dataset
Example: Mean: 201.23, Var.: 88432.30, Std. Dev.: 297.38

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}} \qquad S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

When the data resemble a normal distribution, the standard deviation gives:
  About 68% of the data values lie between $\bar{x} - S$ and $\bar{x} + S$
  About 95% of the data values lie between $\bar{x} - 2S$ and $\bar{x} + 2S$
  About 99.7% of the data values lie between $\bar{x} - 3S$ and $\bar{x} + 3S$
Series 1, 2, 3, 4, 5, 6, 7, 8, 9: Mean = 5, Std. Dev. = 2.58…
Series 1, 2, 3, 4, 5, 6, 7, 8, 18: Mean = 6, Std. Dev. = 4.76…

Weighted Variance
Variance for grouped data
Unweighted population variance

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n} = \frac{\sum_{i=1}^{n} x_i^2}{n} - \mu^2$$

Weighted population variance, for k groups with frequencies $f_i$, mid values $x_i$, and weighted mean $\bar{x}_w$

$$\sigma_w^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x}_w)^2}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} f_i x_i^2}{\sum_{i=1}^{k} f_i} - \bar{x}_w^2$$

Unweighted sample variance

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2}{n - 1} - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}$$

Weighted sample variance

$$S_w^2 = \frac{\sum_{i=1}^{k} f_i x_i^2 - \left(\sum_{i=1}^{k} f_i\right) \bar{x}_w^2}{\sum_{i=1}^{k} f_i - 1}$$

Steps
  Get the range for each group (class)
  Get the mid value for each group (class)
  Assign the mid value to each observation in the group
  Calculate the variance using the list of mid values

  Range       Mid value   Observations
  4~50        27          10
  50~200      125         3
  200~1000    600         3

Weighted Standard Deviation
Square root of the weighted variance

$$\sigma_w = \sqrt{\frac{\sum_{i=1}^{k} f_i (x_i - \bar{x}_w)^2}{\sum_{i=1}^{k} f_i}} \qquad S_w = \sqrt{\frac{\sum_{i=1}^{k} f_i x_i^2 - \left(\sum_{i=1}^{k} f_i\right) \bar{x}_w^2}{\sum_{i=1}^{k} f_i - 1}}$$

Unweighted vs. weighted statistics
  Unweighted variance: 88432.30
  Unweighted std. dev.: 297.38
  Weighted variance: 1537.7615
  Weighted std. dev.: 39.21
Why do they differ? The variation within each group has been removed

Coefficient of Variation
Problem with the mean, variance, and standard deviation: they are sensitive to scale
  X: 1, 3, 5, 7, 9, 11, 13: mean 7, std. dev. 4
  Y: 10, 30, 50, 70, 90, 110, 130: mean 70, std. dev. 40
Coefficient of variation: checks whether two datasets differ only in scale

$$CV = \frac{\sigma}{\mu} \qquad CV = \frac{S}{\bar{x}}$$

Mean: the center of the data
Standard deviation: how much dispersion the data have
Both (CV): dispersion relative to magnitude, for comparing multiple datasets

Skewness
Third moment statistic: directional bias of the distribution of the data

$$Sk = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n \sigma^3}$$

Use a frequency distribution (histogram): X axis is the numerical range, Y axis is the frequency
  Positive skewness: Bulk < Mean
  Negative skewness: Mean < Bulk

Kurtosis
Fourth moment statistic: sharpness of the distribution of the data

$$K = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n \sigma^4} - 3$$

Use a histogram: X axis is the numerical range, Y axis is the frequency
The fourth-moment ratio of a normal distribution is 3, so subtracting 3 gives:
  Normal distribution: K = 0
  High kurtosis (sharp peak): K > 0
  Low kurtosis (flat): K < 0

Summary
Dispersion
  Range: gives the boundary
  Percentile: gives the clustering of observations
  Mean Deviation: magnitude of dispersion
  Variance and Standard Deviation: magnitude of dispersion
  Weighted Variance and Standard Deviation: dispersion of grouped values
  Coefficient of Variation: removes scale differences
Direction and Sharpness
  Skewness: direction from the mean
  Kurtosis: sharpness compared to the normal distribution

Next
Lab3: Additional Statistics and MAUP
Lecture 4: Relationship Descriptor
  1. Correlation Analysis (Ch 3, pp.94-107)
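
The dispersion measures covered in this lecture can be checked by hand with a short script. The following is a minimal Python sketch, not the lecture's own code: all function names are hypothetical, and it assumes the population forms (dividing by n) for the coefficient of variation, skewness, and kurtosis, since those reproduce the standard deviations quoted for X and Y. The series Xa, Xb, and Xc are the ones used on the slides; the grouped example reuses the mid values 27, 125, and 600 only to show that the weighted formula equals the variance of the expanded list of mid values.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def value_range(xs):
    # Range: maximum minus minimum
    return max(xs) - min(xs)

def mean_deviation(xs):
    # Average absolute distance from the mean
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def variance(xs, sample=False):
    # Population variance divides by n, sample variance by n - 1
    m = mean(xs)
    ss = sum((x - m) ** 2 for x in xs)
    return ss / (len(xs) - 1 if sample else len(xs))

def std_dev(xs, sample=False):
    return math.sqrt(variance(xs, sample))

def coefficient_of_variation(xs):
    # Assumption: population standard deviation over the mean
    return std_dev(xs) / mean(xs)

def skewness(xs):
    # Third moment about the mean, scaled by n * sigma^3
    m, s, n = mean(xs), std_dev(xs), len(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

def kurtosis(xs):
    # Fourth moment about the mean, scaled by n * sigma^4, minus 3
    m, s, n = mean(xs), std_dev(xs), len(xs)
    return sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3

def weighted_variance(mids, freqs):
    # Population variance of grouped data: mid values weighted by frequency
    total = sum(freqs)
    xw = sum(f * x for f, x in zip(freqs, mids)) / total
    return sum(f * (x - xw) ** 2 for f, x in zip(freqs, mids)) / total

xa = [1, 3, 5, 7, 9, 11, 13]
xb = [-11, -5, 1, 7, 13, 19, 25]
xc = [-11, -10, 6, 7, 8, 24, 25]
y = [10 * x for x in xa]

for name, xs in (("Xa", xa), ("Xb", xb), ("Xc", xc)):
    print(name, mean(xs), value_range(xs), round(mean_deviation(xs), 4))

print(std_dev(xa), std_dev(y))                                   # 4.0 40.0
print(coefficient_of_variation(xa), coefficient_of_variation(y))

# Grouped data: the weighted formula matches the variance of the expanded list
expanded = [27] * 10 + [125] * 3 + [600] * 3
print(weighted_variance([27, 125, 600], [10, 3, 3]), variance(expanded))
```

Xb and Xc share the same mean and range, but their mean deviations differ, which is why the range alone is a weak descriptor. X and Y have different standard deviations yet the same coefficient of variation, because Y is only a rescaled copy of X.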