MEASURES OF DISPERSION (VARIABILITY)

• Range
• Variance and Standard Deviation
• Coefficient of Variation
• Non-central Locations: Inter-fractile Ranges

Range

Ungrouped data: range = maximum data value – minimum data value
Grouped data: range = upper limit of largest class – lower limit of smallest class

Standard Deviation

A statistical measure that expresses the average deviation (spread) about the mean. The sample standard deviation is given by:

$$s = \sqrt{\frac{\sum x^2 - \left(\sum x\right)^2 / n}{n - 1}} \quad \text{(ungrouped data)}$$

$$s = \sqrt{\frac{\sum f x^2 - \left(\sum f x\right)^2 / n}{n - 1}} \quad \text{(grouped data)}$$

COEFFICIENT OF VARIATION: Relative measure of dispersion

The ratio of the standard deviation to the arithmetic mean, expressed as a percentage:

$$CV = \frac{s}{\bar{x}} \times 100\%$$

The higher the value of CV, the greater the variability relative to the mean.

Example: A sample of movie theaters in a large metropolitan area tallied the total number of movies showing in a certain week.

| Class interval | Frequency (f) |
|----------------|---------------|
| 1 – 3          | 1             |
| 4 – 6          | 4             |
| 7 – 9          | 3             |
| 10 – 12        | 2             |
| 13 – 15        | 1             |
| Total          | Σf = 11       |

Compute each of the following quantities: (a) range (b) standard deviation (c) coefficient of variation.

(a) Range = 15 – 1 = 14

(b) Standard deviation, using class midpoints as x:

| Movies showing | Frequency (f) | Class midpoint (x) | fx | fx² |
|----------------|---------------|--------------------|----|-----|
| 1 – 3          | 1             | 2                  | 2  | 4   |
| 4 – 6          | 4             | 5                  | 20 | 100 |
| 7 – 9          | 3             | 8                  | 24 | 192 |
| 10 – 12        | 2             | 11                 | 22 | 242 |
| 13 – 15        | 1             | 14                 | 14 | 196 |
| Total          | 11            |                    | 82 | 734 |

$$s = \sqrt{\frac{\sum f x^2 - \left(\sum f x\right)^2 / n}{n - 1}} = \sqrt{\frac{734 - 82^2 / 11}{10}} \approx 3.5$$

(c) Coefficient of variation, with s ≈ 3.5 and mean 82/11 ≈ 7.45:

$$CV = \frac{s}{\bar{x}} \times 100\% = \frac{3.5}{7.45} \times 100\% \approx 47.0\%$$

Empirical Rule: For any symmetrical, bell-shaped distribution:

• About 68% of the observations will lie within one standard deviation of the mean
• About 95% of the observations will lie within two standard deviations of the mean
• Virtually all (about 99.7%) of the observations will lie within three standard deviations of the mean

[Figure: Bell-shaped curve showing the relationship between μ and σ. About 68% of observations lie between μ − σ and μ + σ, 95% between μ − 2σ and μ + 2σ, and 99.7% between μ − 3σ and μ + 3σ.]

Symmetric Distribution

Zero skewness: Mean = Median = Mode

[Figure: The relative positions of the mean, median, and mode in a symmetric distribution.]

Positively skewed (right skewed)

Mean > Median > Mode

[Figure: The relative positions of the mean, median, and mode in a positively skewed distribution.]
Negatively Skewed (left skewed)

Mean < Median < Mode

[Figure: The relative positions of the mean, median, and mode in a negatively skewed distribution.]

Non-Central Location Measures (Fractiles or Quantiles)

Fractiles (quantiles) divide size-ordered data sets into subsets of equal frequency:

• Quartiles
• Sextiles
• Octiles
• Deciles
• Percentiles

The most commonly used are quartiles and percentiles.

Quartiles: Divide size-ordered data sets into four equal-frequency subsets, i.e. they define the boundaries of a data set split into 4 equal-frequency classes.

Lower (first) quartile Q1: Identifies the value below which the lower 25% of the ordered data set lies. (Q1 is also the 25th percentile.)

Middle (second) quartile Q2: Identifies the value that separates the lower 50% of the data set from the upper 50% of the data set. (Q2 is also the median, or 50th percentile.)

Upper (third) quartile Q3: Identifies the value above which the top 25% of the ordered data set lies. (Q3 is also the 75th percentile.)

Percentiles divide an ordered data set into 100 equal-frequency parts. A percentile is a data value below which a specified percentage of the data values in an ordered data set fall.

CALCULATING QUARTILES FOR GROUPED DATA

The jth quartile for grouped data is given by:

$$Q_j = L + \frac{c\left(\frac{jn}{4} - F\right)}{f_{Q_j}}$$

n = sample size
L = lower limit of the jth quartile class
F = less-than cumulative frequency of the immediately preceding class
f_{Q_j} = frequency of the jth quartile class
c = class width

Measures of Dispersion using Fractiles

Interquartile Range (IQR): The difference between Q3 and Q1. It spans the middle 50% of the observations.

IQR = Q3 – Q1
Middle 80% range = P90 – P10
Middle 40% range = P70 – P30

Quartile Deviation (QD): A measure of spread about the median. It equals half the difference between Q3 and Q1.

$$QD = \frac{Q_3 - Q_1}{2} = \frac{IQR}{2}$$

CALCULATING PERCENTILES FOR GROUPED DATA

The jth percentile for grouped data is given by:

$$P_j = L + \frac{c\left(\frac{jn}{100} - F\right)}{f_{P_j}}$$

n = sample size
L = lower limit of the jth percentile class
F = less-than cumulative frequency of the immediately preceding class
f_{P_j} = frequency of the jth percentile class
c = class width

Example: A sample of movie theaters in a large metropolitan area tallied the total number of movies showing in a certain week. Find the 70th percentile and the IQR.

| Movies showing | Frequency (f) | Less-than cumulative frequency (F) |
|----------------|---------------|------------------------------------|
| 1 – 3          | 1             | 1                                  |
| 4 – 6          | 4             | 5                                  |
| 7 – 9          | 3             | 8                                  |
| 10 – 12        | 2             | 10                                 |
| 13 – 15        | 1             | 11                                 |
| Total          | Σf = 11       |                                    |

70th percentile: position of P70 = 70 × 11 / 100 = 7.7, which falls in the class 7 – 9 (L = 7, F = 5, f = 3, c = 3).

$$P_{70} = 7 + \frac{3(7.7 - 5)}{3} = 7 + 2.7 = 9.7$$

Quartiles: position of Q1 = 1 × 11 / 4 = 2.75, which falls in the class 4 – 6 (L = 4, F = 1, f = 4). Position of Q3 = 3 × 2.75 = 8.25, which falls in the class 10 – 12 (L = 10, F = 8, f = 2).

$$Q_1 = 4 + \frac{3(2.75 - 1)}{4} \approx 5.3$$

$$Q_3 = 10 + \frac{3(8.25 - 8)}{2} \approx 10.4$$

IQR = Q3 – Q1 = 10.4 – 5.3 = 5.1
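The grouped-data formulas used in the movie-theater examples can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names `grouped_summary` and `grouped_fractile` are hypothetical, the class width is assumed to be the same for every class, and L is taken as the lower class limit, as the slides do. The code carries unrounded intermediate values, so the last digit can differ slightly from hand-rounded figures.

```python
from math import sqrt

# Movie-theater example: (lower class limit, upper class limit, frequency)
classes = [(1, 3, 1), (4, 6, 4), (7, 9, 3), (10, 12, 2), (13, 15, 1)]


def grouped_summary(classes):
    """Return (range, mean, sample standard deviation, CV in %) for grouped data."""
    n = sum(f for _, _, f in classes)
    # Class midpoints stand in for the unknown individual values.
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    mean = sum_fx / n
    s = sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))
    data_range = classes[-1][1] - classes[0][0]
    return data_range, mean, s, s / mean * 100


def grouped_fractile(classes, j, parts):
    """j-th fractile when the data are split into `parts` equal-frequency parts.

    parts=4 gives quartiles, parts=100 gives percentiles.
    Assumes at least two classes, all of equal width.
    """
    n = sum(f for _, _, f in classes)
    position = j * n / parts
    width = classes[1][0] - classes[0][0]  # c: distance between lower limits
    cumulative = 0  # F: less-than cumulative frequency of preceding classes
    for lower, _, f in classes:
        if cumulative + f >= position:
            return lower + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("fractile position lies beyond the last class")


data_range, mean, s, cv = grouped_summary(classes)
p70 = grouped_fractile(classes, 70, 100)
q1 = grouped_fractile(classes, 1, 4)
q3 = grouped_fractile(classes, 3, 4)

print(f"range = {data_range}, mean = {mean:.2f}, s = {s:.1f}, CV = {cv:.1f}%")
print(f"P70 = {p70:.1f}, Q1 = {q1:.1f}, Q3 = {q3:.1f}, IQR = {q3 - q1:.1f}")
```

Passing `parts` as an argument lets one function cover quartiles, deciles, and percentiles, since the slide formulas differ only in the divisor (4 or 100).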