Measures of Dispersion
Week 4

Dispersion
• Two groups of three students:
  Group 1: 4, 7, 10
  Group 2: 7, 7, 7
• Mean mark:
  Group 1: (4 + 7 + 10)/3 = 21/3 = 7
  Group 2: (7 + 7 + 7)/3 = 21/3 = 7
• Both groups have the same mean mark, but Group 1's marks are widely spread, while Group 2's are all the same.
• The following diagram reinforces this point.

Range
• The absolute difference between the highest and lowest values of the raw data.
• Group of students: 4, 7, 10
• Range = Maximum - Minimum = 10 - 4 = 6

Interquartile Range
• The absolute difference between the upper and lower quartiles of the distribution.
• Interquartile Range = Upper Quartile - Lower Quartile
• See the following slides for estimating quartiles.

Quartiles (1)
• Upper quartile (Q3): the value with 25% of the distribution above it and 75% below it.
• Lower quartile (Q1): the value with 75% of the distribution above it and 25% below it.

Quartiles (2)
• If the data are ungrouped, put them in order in an array.
• Find the quartile position, then estimate its value, as previously for the median.
• Upper quartile (Q3): position = $\frac{3(n+1)}{4}$
• Lower quartile (Q1): position = $\frac{n+1}{4}$

Quartiles (3)
Example: ungrouped data: 3, 5, 6, 9, 15, 27, 30, 35, 37 (n = 9)
• Lower quartile: position = $\frac{n+1}{4} = \frac{9+1}{4} = 2.5$th
  Lower quartile: value = 5.5 (midway between the 2nd and 3rd numbers in the array, 5 and 6)
• Upper quartile: position = $\frac{3(n+1)}{4} = \frac{3(9+1)}{4} = 7.5$th
  Upper quartile: value = 32.5 (midway between the 7th and 8th numbers in the array, 30 and 35)

Quartiles (4)
• Grouped data: use the same approach as for estimating the median for grouped data in week 4, except this time use the quartile positions.

Semi-Interquartile Range
• This is half the interquartile range.
• It is sometimes called the Quartile Deviation.
• Semi-Interquartile Range = $\frac{\text{Upper Quartile} - \text{Lower Quartile}}{2}$

Example
Using the previous ungrouped data:
• Interquartile range = UQ - LQ = 32.5 - 5.5 = 27
• Semi-interquartile range = $\frac{UQ - LQ}{2} = \frac{32.5 - 5.5}{2} = \frac{27}{2} = 13.5$

Mean Deviation
• The average of the absolute deviations from the arithmetic mean (ignoring the sign).
• When two straight lines (rather than curved brackets) surround a number or variable, as in $|X_i - \bar{X}|$, this is called the modulus (absolute value), and we ignore the sign.

Mean Deviation of ungrouped data
• $X_1 = 2$, $X_2 = 4$, $X_3 = 3$, so $\bar{X} = 9/3 = 3$
• $MD = \frac{|X_1 - \bar{X}| + |X_2 - \bar{X}| + |X_3 - \bar{X}|}{n}$
• $MD = \frac{|2 - 3| + |4 - 3| + |3 - 3|}{3} = \frac{1 + 1 + 0}{3} = \frac{2}{3}$

Variance
• If we square all the deviations from the arithmetic mean, we no longer need to drop the signs, since all the squared values are non-negative.
• We can then replace the straight-line brackets (modulus) used for the Mean Deviation with the more usual round brackets.
• Variance is the average of the squared deviations from the arithmetic mean.

Variance: ungrouped data (1)
• Variance = $\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$
• To calculate the variance:
  1. Calculate the mean value $\bar{X}$.
  2. Subtract the mean from each value in turn, that is, find $X_i - \bar{X}$.
  3. Square each answer to get $(X_i - \bar{X})^2$.

Variance: ungrouped data (2)
  4. Add up all these squared values to get $\sum_{i=1}^{n} (X_i - \bar{X})^2$.
  5. Divide the result by n to get $\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$.
  6. You now have the average of the squared deviations from the mean (in square units).

Standard deviation (SD)
• This is simply the square root of the variance.
• An advantage is that we avoid the square units of the variance: the SD is in the same units as the data.
• The larger the SD, the greater the average dispersion of the data about the mean.
• The smaller the SD, the smaller the average dispersion of the data about the mean.

Example 1: variance/standard deviation
$X_i$ | $X_i - \bar{X}$ | $(X_i - \bar{X})^2$
4 | 4 - 7 = -3 | (-3)² = 9
7 | 7 - 7 = 0 | 0² = 0
10 | 10 - 7 = 3 | 3² = 9
Total | | 18

Solution
• Variance = $\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} = \frac{18}{3} = 6$ square units
• Standard deviation = $\sqrt{6} \approx 2.449$ units

Example 2: variance/standard deviation
$X_i$ | $X_i - \bar{X}$ | $(X_i - \bar{X})^2$
7 | 7 - 7 = 0 | 0² = 0
7 | 7 - 7 = 0 | 0² = 0
7 | 7 - 7 = 0 | 0² = 0
Total | | 0

Solution
• Variance = $\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} = \frac{0}{3} = 0$ square units
• Standard deviation = $\sqrt{0} = 0$, i.e. there is no spread of values

Variance of grouped data
$S^2 = \frac{\sum_{i=1}^{j} F_i X_i^2}{\sum_{i=1}^{j} F_i} - \left(\frac{\sum_{i=1}^{j} F_i X_i}{\sum_{i=1}^{j} F_i}\right)^2$
where $F_i$ = frequency of the ith class interval, $X_i$ = midpoint of the ith class interval, j = number of class intervals

Example: price of item (£) against number of items sold (LCB/UCB = lower/upper class boundary)
LCB (£) | UCB (£) | No. of items sold, $F_i$ | Midpoint (£), $X_i$ | $F_i X_i$ | $F_i X_i^2$
1.5 | 2.5 | 15 | 2 | 30 | 60
2.5 | 3.5 | 2 | 3 | 6 | 18
3.5 | 4.5 | 19 | 4 | 76 | 304
4.5 | 5.5 | 10 | 5 | 50 | 250
5.5 | 6.5 | 14 | 6 | 84 | 504
Total | | 60 | | 246 | 1136

$S^2 = \frac{1136}{60} - \left(\frac{246}{60}\right)^2$ (the mean price is 246/60 = £4.10)
$S^2 = 18.93 - 4.1^2 = 18.93 - 16.81$
$S^2 \approx 2.12$ (£²)
$S = \sqrt{2.12} \approx$ £1.46

Coefficient of variation (C of V)
• A measure of relative dispersion.
• Given by $\frac{S}{\bar{X}}$, i.e. the standard deviation divided by the arithmetic mean of the data.
• Data sets with a higher coefficient of variation have higher relative dispersion.
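The ungrouped calculations in these slides (range, quartiles by the (n + 1)/4 position rule, semi-interquartile range, mean deviation, variance and standard deviation with divisor n) can be checked with a short script. This is a minimal sketch, not the course's own method: the function names are illustrative, and linear interpolation between neighbouring ordered values is an assumption that generalises the "midway" rule used for the 2.5th and 7.5th positions.

```python
import math


def mean(data):
    """Arithmetic mean."""
    return sum(data) / len(data)


def data_range(data):
    """Range = maximum - minimum."""
    return max(data) - min(data)


def quartile(data, q):
    """Q1 (q=1) or Q3 (q=3) using the position rule q(n + 1)/4.

    Assumption: a fractional position is interpolated linearly between
    the neighbouring ordered values (a .5 position gives the midpoint).
    """
    xs = sorted(data)
    n = len(xs)
    pos = q * (n + 1) / 4  # 1-based position in the ordered array
    if not 1 <= pos <= n:
        raise ValueError("quartile position falls outside the data")
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return float(xs[lower - 1])
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


def mean_deviation(data):
    """Average absolute deviation from the mean."""
    m = mean(data)
    return sum(abs(x - m) for x in data) / len(data)


def variance(data):
    """Average squared deviation from the mean (divisor n, as in the slides)."""
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / len(data)


def standard_deviation(data):
    """Square root of the variance, back in the data's own units."""
    return math.sqrt(variance(data))


marks = [4, 7, 10]
print(data_range(marks), variance(marks), round(standard_deviation(marks), 3))  # 6 6.0 2.449
print(variance([7, 7, 7]))  # 0.0

ordered = [3, 5, 6, 9, 15, 27, 30, 35, 37]
q1, q3 = quartile(ordered, 1), quartile(ordered, 3)
print(q1, q3, q3 - q1, (q3 - q1) / 2)  # 5.5 32.5 27.0 13.5

print(round(mean_deviation([2, 4, 3]), 3))  # 0.667, i.e. 2/3
```

The divisor n matches the slides. Python's standard `statistics.pvariance` and `statistics.pstdev` use the same population divisor, whereas `statistics.variance` and `statistics.stdev` divide by n - 1.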
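The grouped-data variance formula and the coefficient of variation can be checked numerically against the price example. This minimal sketch assumes the grouped data are given as two parallel lists, class midpoints $X_i$ and frequencies $F_i$ (a representation chosen for illustration; the function names are hypothetical), and applies $S^2 = \sum F_i X_i^2 / \sum F_i - (\sum F_i X_i / \sum F_i)^2$ directly.

```python
import math


def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(F_i * X_i) / sum(F_i)."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)


def grouped_variance(midpoints, freqs):
    """S^2 = sum(F_i X_i^2) / sum(F_i) - (sum(F_i X_i) / sum(F_i))^2."""
    total = sum(freqs)
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    return sum_fx2 / total - grouped_mean(midpoints, freqs) ** 2


def coefficient_of_variation(sd, mean):
    """Relative dispersion: standard deviation divided by the mean."""
    return sd / mean


# Price example from the slides: class midpoints (£) and number of items sold
midpoints = [2, 3, 4, 5, 6]
freqs = [15, 2, 19, 10, 14]

m = grouped_mean(midpoints, freqs)       # 246 / 60 = 4.1
s2 = grouped_variance(midpoints, freqs)  # 1136/60 - 4.1^2, about 2.12
s = math.sqrt(s2)
print(round(m, 2), round(s2, 2), round(s, 2))     # 4.1 2.12 1.46
print(round(coefficient_of_variation(s, m), 3))   # 0.355
```

Because S and the mean are both in pounds, the coefficient of variation has no units, which is what allows it to compare the relative spread of data sets measured on different scales.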