Lecture 2. Data Compression for One Variable
George Duncan
90-786 Intermediate Empirical Methods for Public Policy and Management

Outline
- Forms of data compression
- Complex thinking about simple means
- Links between centers and spreads
- Use of Minitab

Forms of Data Compression: Relation to Level of Measurement

Description               Nominal                  Ordinal                Interval
Summary of observations   Frequency table          Frequency table        Frequency table
                          Bar chart                Bar chart              Histogram
                          Pie chart                                       Box plot
                                                                          One-way scatterplot
Central tendency          Mode                     Median                 Mean, median
Dispersion                Relative frequency       Interquartile range    Standard deviation
                          of the mode

Example
How prevalent is the mayor-council form of government?
- What are the units of analysis?
- How many units have been observed? How many cases are in the sample?
- What type of analysis do we have?
- What variables are being measured?
- What is the level of measurement?

Form of Government in Cities Under 25,000 Population in Kansas

No.   City         Symbolic code   Numerical code
1     Abilene      CM              1
2     Andale       MC              2
3     Andover      MC              2
4     Atchison     CM              1
5     Beloit       MC              2
6     Cherryvale   CO              3
...   ...          ...             ...
74    Winfield     CM              1

Codes: CM = 1, council-manager; MC = 2, mayor-council; CO = 3, commission.

Governance Frequency Table

Value   Form of Government   Number of Observations   Proportion   Percentage
1       Council-Manager      37                       0.50         50.0%
2       Mayor-Council        32                       0.43         43.2%
3       Commission           5                        0.07         6.8%
        Total                74                       1.00         100.0%

The number of observations is the absolute frequency; proportion and percentage are relative frequencies.

Governance Bar Chart
[Figure: number of cities by form of government: Council-Manager, Mayor-Council, Commission.]

Governance Pie Chart
[Figure: 1. Council-manager 50% (37); 2. Mayor-council 43.2% (32); 3. Commission 6.8% (5).]

Quality of Fire Departments

Fire Insurance Class   Number   Relative Frequency (%)   Cumulative Frequency (%)
1                      1        0.30                     0.30
2                      45       13.35                    13.65
3                      148      43.92                    57.57
4                      98       29.08                    86.65
5                      35       10.39                    97.03
6                      8        2.37                     99.41
7                      1        0.30                     99.70
8                      1        0.30                     100.00
9                      0        0.00                     100.00
10                     0        0.00                     100.00
Total                  337      100.00

Fire Insurance Bar Chart
[Figure: number of fire departments in each fire insurance class, 1 through 10.]

Garbage Collection
Tons of Trash Collected by the City of Normal, Oklahoma, for the Week of June 8, 1992

Tons of Garbage   Number of Observations
50-60             15
60-70             25
70-80             30
80-90             20
90-100            10
Total             100

Garbage Histogram
[Figure: frequency of observations by tons of garbage, classes 50-60 through 90-100.]

Measures of Central Tendency
Median = 73 tons
Mode = 75 tons
Mean (average of all observed values): $\bar{x} = 72.97$ tons, where
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Measures of Dispersion
Range = Max - Min
Variance:
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Standard deviation:
$$S = \sqrt{S^2}$$
Coefficient of variation:
$$\mathrm{CV} = \frac{S}{\bar{x}}$$

Measures of Dispersion: Garbage Example
Range = 97 - 50 = 47 tons
Variance = 151.3 (tons squared)
Standard deviation = 12.3 tons
Coefficient of variation = 0.17

Box Plot
Labels on the box plot diagram, from top to bottom:
- Outer fence = $Q_3 + 3.0 \times \mathrm{IQR}$
- o: outlier (extreme data value)
- Inner fence = $Q_3 + 1.5 \times \mathrm{IQR}$
- Whisker
- $Q_3$: 75th percentile
- Median
- $Q_1$: 25th percentile
- Whisker
- Inner fence = $Q_1 - 1.5 \times \mathrm{IQR}$
- Outer fence = $Q_1 - 3.0 \times \mathrm{IQR}$
Interquartile range: $\mathrm{IQR} = Q_3 - Q_1$

Garbage Box Plot
Max = 97
$Q_3$ = 82.25
Median = 73
$Q_1$ = 64
Min = 50

Shapes of Distribution
- Positive skewness: mean > median
- Symmetric distribution: mean = median
- Negative skewness: mean < median

Complex Thinking about Simple Means
- The mean time served for drug-law violations by prisoners released from U.S. federal prisons from 1965 to 1980 was 22.4 months.
- The median family income in Texas in 1975 was $12,672.
- The modal number of commercial TV stations in 1980 among the fifty U.S. states was 12 per state.

Applications of a Mean
- Earnings of workers in the automobile industry averaged $577.30 per week in the U.S. in 1986.
- The mean temperature in Minneapolis-St. Paul during January is minus 12 degrees Celsius.
- The U.S. national rate of motor-vehicle traffic deaths per 100,000 population in 1985 was 18.8.

Means Can Be Tricky! Quality of Life Index

Country   1965 Population   1965 Index   1975 Population   1975 Index
A         20                100          22                104
B         30                70           34                76
C         10                20           32                33

Calculate the average (per capita) quality of life separately for 1965 and 1975. Explain why the 1975 average is lower than the 1965 average, even though the quality of life has increased in every country.

Links between Centers and Spreads
Data = Fit + Residual
Locate the fit to minimize a function of the residuals:
- Mean and standard deviation: the average deviation is zero, and the sum of squared deviations is minimized.
- Median and average absolute deviation: no more than half of the residuals are less than zero, and no more than half are greater than zero; the sum of the absolute values of the residuals is as small as possible.
- Mode and percentage of misses: as many of the residuals as possible are zero.

Next Time ...
- Friday workshop: Minitab applications
- Lecture 3: Data Compression for Two Variables: Scatterplots
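Supplement to the Governance Frequency Table in this lecture: relative frequencies come from dividing each absolute count by the total. The course tool is Minitab; this is only a minimal Python sketch of the same arithmetic, assuming just the three counts shown on the slide. The helper name `relative_frequencies` is hypothetical.

```python
# Hypothetical helper (not a Minitab command): relative frequencies from counts.
def relative_frequencies(counts):
    """Map each category to (proportion, percentage) of the total count."""
    total = sum(counts.values())
    return {cat: (n / total, 100 * n / total) for cat, n in counts.items()}


# Absolute frequencies from the Kansas governance table (74 cities in all).
governance = {"Council-Manager": 37, "Mayor-Council": 32, "Commission": 5}
table = relative_frequencies(governance)

# Print counts, proportions (2 decimals) and percentages (1 decimal).
for form, (prop, pct) in table.items():
    print(f"{form:<16}{governance[form]:>4}{prop:>8.2f}{pct:>8.1f}%")
```

Rounded to the slide's precision, this reproduces 0.50, 0.43 and 0.07 (50.0%, 43.2%, 6.8%).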
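Supplement to the Quality of Fire Departments table in this lecture: the cumulative column is a running total of the relative frequencies. The following minimal Python sketch recomputes both columns from the ten class counts. The function name `percent_table` is hypothetical; one design choice worth noting is that cumulative percentages are taken from cumulative counts rather than by adding rounded percentages, so the last entry comes out at exactly 100.

```python
from itertools import accumulate


def percent_table(counts):
    """Return (relative, cumulative) frequencies, both in percent."""
    total = sum(counts)
    relative = [100 * n / total for n in counts]
    # Running totals of the counts, converted to percent: no rounding drift.
    cumulative = [100 * c / total for c in accumulate(counts)]
    return relative, cumulative


# Number of fire departments in insurance classes 1 through 10.
fire_counts = [1, 45, 148, 98, 35, 8, 1, 1, 0, 0]
relative, cumulative = percent_table(fire_counts)

for cls, (n, rel, cum) in enumerate(zip(fire_counts, relative, cumulative), start=1):
    print(f"{cls:>2}  {n:>4}  {rel:6.2f}  {cum:7.2f}")
```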
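Supplement to the Measures of Dispersion slides in this lecture: the formulas for the mean, the sample variance (divisor $n - 1$), the standard deviation and the coefficient of variation translate directly into code. The raw garbage data are not listed in the lecture, so this minimal Python sketch runs on a hypothetical three-value data set, then checks that the reported garbage summaries are consistent with each other ($S = \sqrt{151.3} \approx 12.3$ and $12.3 / 72.97 \approx 0.17$). The name `describe` is hypothetical.

```python
import math
import statistics


def describe(xs):
    """Range, mean, sample variance (n - 1 divisor), SD and coefficient of variation."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    s = math.sqrt(s2)
    return {"range": max(xs) - min(xs), "mean": xbar, "variance": s2, "sd": s, "cv": s / xbar}


# Hypothetical toy data, chosen so the arithmetic is easy to follow by hand.
toy = describe([2, 4, 6])
print(toy)  # range 4, mean 4.0, variance 4.0, sd 2.0, cv 0.5

# The standard library's sample variance agrees with the formula.
assert math.isclose(toy["variance"], statistics.variance([2, 4, 6]))

# Consistency check of the reported garbage figures.
print(round(math.sqrt(151.3), 1), round(12.3 / 72.97, 2))  # 12.3 0.17
```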
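Supplement to the Box Plot and Garbage Box Plot slides in this lecture: the fences follow from $Q_1$, $Q_3$ and $\mathrm{IQR} = Q_3 - Q_1$. Quartile conventions differ between packages (Python's `statistics.quantiles`, for instance, offers "exclusive" and "inclusive" methods), so this minimal Python sketch takes the slide's $Q_1 = 64$ and $Q_3 = 82.25$ as given instead of recomputing them. The helper names `fences` and `beyond_inner` are hypothetical.

```python
def fences(q1, q3):
    """Inner (1.5 x IQR) and outer (3.0 x IQR) fences for a box plot."""
    iqr = q3 - q1
    return {
        "iqr": iqr,
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }


def beyond_inner(x, f):
    """True if x lies outside the inner fences, i.e. would be plotted as an outlier point."""
    low, high = f["inner"]
    return x < low or x > high


# Quartiles reported for the garbage data, taken as given.
garbage = fences(q1=64, q3=82.25)
print(garbage)  # IQR 18.25; inner fences 36.625 and 109.625; outer 9.25 and 137.0

# The observed extremes (min 50, max 97) both lie inside the inner fences.
print([beyond_inner(x, garbage) for x in (50, 97)])  # [False, False]
```

On these figures the garbage data have no outliers, which is consistent with the box plot slide listing only Min, $Q_1$, Median, $Q_3$ and Max.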
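Supplement to the "Means Can Be Tricky!" exercise in this lecture: a per capita average is a population-weighted mean, $\sum_j p_j I_j / \sum_j p_j$, where $p_j$ is country $j$'s population and $I_j$ its index. The following minimal Python sketch works the exercise from the table. The population units are not stated on the slide, and the name `per_capita_average` is hypothetical.

```python
def per_capita_average(rows):
    """Population-weighted mean of an index; rows maps country -> (population, index)."""
    total_pop = sum(pop for pop, _ in rows.values())
    return sum(pop * idx for pop, idx in rows.values()) / total_pop


qol_1965 = {"A": (20, 100), "B": (30, 70), "C": (10, 20)}
qol_1975 = {"A": (22, 104), "B": (34, 76), "C": (32, 33)}

avg_1965 = per_capita_average(qol_1965)  # 4300 / 60
avg_1975 = per_capita_average(qol_1975)  # 5928 / 88
print(f"1965: {avg_1965:.2f}  1975: {avg_1975:.2f}")  # 1965: 71.67  1975: 67.36

# Every country's index rose...
assert all(qol_1975[c][1] > qol_1965[c][1] for c in qol_1965)
# ...but country C, with by far the lowest index, grew to a much larger share of the population.
print(f"C's population share: {100 * 10 / 60:.1f}% -> {100 * 32 / 88:.1f}%")  # 16.7% -> 36.4%
```

The weighted mean falls because the weights shift toward the low-index country, not because any country got worse.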
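Supplement to the Links between Centers and Spreads slide in this lecture: each center is the fit that minimizes a particular function of the residuals. The following minimal Python sketch makes that concrete by brute force over candidate fits on a small hypothetical data set. The integer candidate grid is an assumption that only works here because this data set's mean, median and mode are all integers; the helper names are hypothetical.

```python
import statistics


def sum_squared(data, c):
    return sum((x - c) ** 2 for x in data)


def sum_absolute(data, c):
    return sum(abs(x - c) for x in data)


def misses(data, c):
    """Number of residuals x - c that are not zero."""
    return sum(1 for x in data if x != c)


def best_center(data, loss, candidates):
    """Candidate fit with the smallest loss on the residuals."""
    return min(candidates, key=lambda c: loss(data, c))


data = [2, 3, 3, 5, 12]     # hypothetical values
candidates = range(0, 16)   # integer grid of candidate fits

print(best_center(data, sum_squared, candidates), statistics.mean(data))     # squared loss -> mean
print(best_center(data, sum_absolute, candidates), statistics.median(data))  # absolute loss -> median
print(best_center(data, misses, candidates), statistics.mode(data))          # fewest misses -> mode

# Residuals around the mean sum to zero.
print(sum(x - statistics.mean(data) for x in data))
```

Note how the single large value 12 pulls the squared-loss fit (the mean, 5) away from the median and mode (both 3).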