There is a strong connection between mean and variance, and between median and MAD
Connections Between Measures of Center and Measures of Dispersion

Not all measures are created equal, and they don't necessarily play best with each other. It is pretty intuitive that if we decided to describe the dispersion of a data set using its range (the difference between its largest and its smallest entry), the natural measure of center corresponding to it would be the midrange (the midpoint between the maximum and minimum values). It is less obvious that each of the two main measures of center has a preferred “partner” as a measure of dispersion. The idea is that the median happens to be the number for which the mean absolute deviation of the data from it is the smallest. Similarly, the mean (the arithmetic average) happens to be the number for which the variance of the data, computed with reference to it, is the smallest. To illustrate this, let's look at a few concrete examples.

Mean Absolute Deviation/Median

It makes a difference whether the number of data points is even or odd.

Three data points: $-3, 1, 4$. Choosing a “measure of center” $m$, we plot $|-3-m| + |1-m| + |4-m|$:

[Figure: graph of this sum as a function of $m$]

Four data points: $-5, -1, 2, 4$. Choosing a “measure of center” $m$, we plot $|-5-m| + |-1-m| + |2-m| + |4-m|$:

[Figure: graph of this sum as a function of $m$]

The mean absolute deviation is defined, for data $x_1, x_2, \dots, x_n$, as

$$\frac{1}{n}\sum_{k=1}^{n} |x_k - m|.$$

We plotted the numerator only, since dividing by the constant number $n$ does not affect where the minimum is reached.

As you can see, if we have an odd number of data points, the mean absolute deviation is lowest at the data point that sits in the middle, that is, the one that has as many data points smaller than it as larger than it. When the number of points is even, there is no such special data point, so any number between the two data points in the middle will work.
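The behavior described for the two examples can be checked numerically. The following is only a minimal sketch: the helper name `abs_dev_sum` and the grid of candidate centers (from −6 to 6 in steps of 0.5) are illustrative assumptions, not part of the original notes.

```python
from statistics import median

def abs_dev_sum(data, m):
    """Sum of absolute deviations of the data from a candidate center m."""
    return sum(abs(x - m) for x in data)

# Hypothetical grid of candidate centers: -6.0, -5.5, ..., 6.0.
grid = [k / 2 for k in range(-12, 13)]

odd = [-3, 1, 4]        # three data points
even = [-5, -1, 2, 4]   # four data points

# Odd case: the smallest sum is reached at a single grid point.
odd_min = min(abs_dev_sum(odd, m) for m in grid)
odd_minimizers = [m for m in grid if abs_dev_sum(odd, m) == odd_min]

# Even case: every grid point between the two middle data points ties.
even_min = min(abs_dev_sum(even, m) for m in grid)
even_minimizers = [m for m in grid if abs_dev_sum(even, m) == even_min]

print(odd_minimizers, odd_min)    # [1.0] 7.0
print(even_minimizers, even_min)  # [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0] 12.0
print(median(odd), median(even))  # 1 0.5
```

With three points the only minimizer on the grid is the middle data point, 1. With four points every grid value from −1 to 2 gives the same sum, and `statistics.median` simply reports the midpoint of that interval.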
Conventionally, people give the midpoint between these two as the median (in our second example, that would be $\frac{-1+2}{2} = \frac{1}{2}$), but this is for convenience only, to avoid an apparently “vague” statement like “any number between $-1$ and $2$”. Don't be fooled by assertions that suggest that this midpoint is really special.

Variance/Mean

With the same four data points, choosing a “measure of center” $\mu$, the sum of squared distances is

$$(-5-\mu)^2 + (-1-\mu)^2 + (2-\mu)^2 + (4-\mu)^2 = 4\mu^2 - 2\cdot(-5-1+2+4)\cdot\mu + (-5)^2 + (-1)^2 + 2^2 + 4^2.$$

The vertex of this quadratic function (recall that for $ax^2 + bx + c$ it is at $-\frac{b}{2a}$) will be at

$$\mu = -\frac{-2\cdot(-5-1+2+4)}{2\cdot 4} = \frac{-5-1+2+4}{4},$$

that is, at the mean (in this case, 0). The function is $4\mu^2 + 46$:

[Figure: graph of $4\mu^2 + 46$ as a function of $\mu$]

The formula for the variance is

$$\frac{1}{n}\sum_{k=1}^{n} (x_k - \mu)^2.$$

This is an object that is much easier to handle with pencil and paper, and it is much more useful when dealing with samples coming from a normal distribution model (which we will discuss soon). Given that classical statistics had to confine itself mostly to such models, which are much simpler to handle analytically than most others, the role of this pair is dominant in the realm where we will be confined in our course. Incidentally, an equivalent formula for the variance (when $\mu$ is the mean), which is often computationally easier, is

$$\frac{1}{n}\sum_{k=1}^{n} x_k^2 - \mu^2.$$

This is what is commonly called the population variance. The so-called sample variance has the same definition, but with a denominator of $n-1$ instead of $n$. The reason for its introduction is pretty weak: if we are looking at a sample from a probabilistic model, the sample variance is an “unbiased estimator” of the variance of the model, whereas the population variance is biased. Since this property has very little, if any, practical consequence, it hardly makes it a necessity to rely on “sample” rather than “population”.
In fact, the population variance is the Maximum Likelihood Estimator for the variance of the model, and one can argue that this is a much more significant property. However, formulas developed in connection with classical statistics, especially the t distribution, which we will soon meet, are, for historical reasons, based on the sample variance, which is why we will have to work mostly with it.

As opposed to the median/mean absolute deviation pair, it is easy to show that the mean is the “best” choice for center when using the variance as the measure of dispersion. In the former case, we have to deal with a function that is piecewise defined, through absolute values, and we have to argue directly (it can be done, of course, but it's not “on automatic pilot”). In the latter, we are minimizing

$$\sum_{k=1}^{n} (x_k - \mu)^2 = \sum_{k=1}^{n} \left(\mu^2 - 2 x_k \mu + x_k^2\right) = n\mu^2 - 2\mu\sum_{k=1}^{n} x_k + \sum_{k=1}^{n} x_k^2.$$

As a function of $\mu$, this is quadratic, and we can locate its vertex using the usual formula for parabolas: the quadratic function $ax^2 + bx + c$ has its vertex at $v = -\frac{b}{2a}$. In our case $a = n$ and $b = -2\sum_{k=1}^{n} x_k$, so the “best” $\mu$ is at

$$-\frac{-2\sum_{k=1}^{n} x_k}{2n} = \frac{\sum_{k=1}^{n} x_k}{n},$$

which is, of course, the mean.
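The vertex argument can be verified numerically on the four data points used earlier. This is a minimal sketch under stated assumptions: the helper `sq_dev_sum` and the grid of candidate centers are hypothetical names and choices made for illustration, and the standard-library functions `pvariance` and `variance` stand in for the population (denominator $n$) and sample (denominator $n-1$) variances.

```python
from statistics import mean, pvariance, variance

data = [-5, -1, 2, 4]
n = len(data)

def sq_dev_sum(data, mu):
    """Sum of squared deviations of the data from a candidate center mu."""
    return sum((x - mu) ** 2 for x in data)

# Coefficients of the quadratic a*mu^2 + b*mu + c obtained by expanding the sum.
a = n
b = -2 * sum(data)
c = sum(x * x for x in data)
vertex = -b / (2 * a)   # location of the minimum of the parabola

# Brute-force check on a hypothetical grid: -6.0, -5.5, ..., 6.0.
grid = [k / 2 for k in range(-12, 13)]
best = min(grid, key=lambda mu: sq_dev_sum(data, mu))

# Population variance from the definition and from the shortcut formula.
pop_var = sq_dev_sum(data, mean(data)) / n
shortcut = c / n - mean(data) ** 2

print(vertex, best, mean(data))          # all three agree
print(pop_var, shortcut, pvariance(data))  # denominator n
print(variance(data))                    # denominator n - 1
```

The vertex formula, the grid search, and `statistics.mean` all point to the same center, and the shortcut formula reproduces the population variance computed directly from the definition.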