Lecture 4, January 14, 2004

Note
I skipped a few slides from the last lecture (January 12,
2004). I will discuss those at the end of this week.
The website for this course has not been activated yet. I
am extremely sorry for that.
During the next tutorial, please bring your book, ruler,
calculator, and of course 100% of yourself.
The first midterm syllabus covers up to Chapter 6.
Last Wednesday, I presented one way of calculating
percentiles, which raised a lot of questions (remember
adding 0.5), but I still could not find an exact justification
for doing that. Today I will show you the method given in
your textbook.
Thank you.
2.18 (text book)
The following numbers of positions have
been held by a random sample of
aerospace engineers during the 10 to 15
years since their graduation:
1 2 3 3 2 5 4 4 3 1
2 1 4 6 5 5 4 2 3 2
Calculate (a) the sample mean,
(b) the sample median,
(c) the sample mode,
(d) the 25th percentile,
(e) the 75th percentile
Solution
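A minimal sketch of one way to work this exercise, not necessarily the textbook's exact method. It assumes the two data rows are twenty single-digit observations. It also assumes a common textbook percentile rule, since the slides do not state one: sort the data and compute k = n*p/100; if k is a whole number, average the k-th and (k+1)-th ordered values, otherwise round k up and take that ordered value. The helper name `percentile` is hypothetical.

```python
import math
import statistics

# Assumption: each digit in the two data rows is one observation.
positions = [1, 2, 3, 3, 2, 5, 4, 4, 3, 1,
             2, 1, 4, 6, 5, 5, 4, 2, 3, 2]


def percentile(data, p):
    """Hypothetical helper: p-th percentile (0 < p < 100) by an assumed rule."""
    ordered = sorted(data)
    k = len(ordered) * p / 100           # position n*p/100 in the ordered list
    if k == int(k):                      # whole number: average k-th and (k+1)-th values
        k = int(k)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(k) - 1]     # otherwise round up and take that value


sample_mean = statistics.mean(positions)
sample_median = statistics.median(positions)
sample_mode = statistics.mode(positions)
p25 = percentile(positions, 25)
p75 = percentile(positions, 75)

print(sample_mean, sample_median, sample_mode, p25, p75)
```

Under these assumptions the sketch gives a mean of 3.1, a median of 3, a mode of 2, a 25th percentile of 2, and a 75th percentile of 4.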
Summary Statistical Measures: Variability
• After location, measures of variability provide the next most important descriptive summaries.
• Such a quantity expresses the degree to which individual observation values differ from each other.
Importance of Variability
• Variability and dispersion are synonymous terms used in statistics to characterize individual differences.
• The greater the variability between observations, the more spread out they will be.
• Populations or samples with high variability will have a frequency distribution involving wider class intervals or more classes than a low-variability group measured on the same scale.
MEASURES OF VARIABILITY
• Range
• Variance
• Standard Deviation
• Coefficient of Variation (CV)
MEASURES OF VARIABILITY: EXAMPLE
• Heights of the players on two teams, in inches, are as follows:
  Team I: 72, 73, 76, 76, 78, so mean = 75, median = mode = 76
  Team II: 67, 72, 76, 76, 84, so mean = 75, median = mode = 76
• How about the variation?
MEASURES OF VARIABILITY
RANGE
• The first and simplest measure of variability is the range.
• The range of a set of measurements is the numerical difference between the largest and smallest measurements.
  Range = Largest value - Smallest value
MEASURES OF VARIABILITY
RANGE
• Team I range = 78 - 72 = 6 inches
• Team II range = 84 - 67 = 17 inches
• So Team I shows less variation.
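The range comparison can be sketched in a few lines of Python. The function name `value_range` is an illustrative choice, not something from the lecture.

```python
# Heights in inches, as given in the example.
team_1 = [72, 73, 76, 76, 78]
team_2 = [67, 72, 76, 76, 84]


def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


range_1 = value_range(team_1)
range_2 = value_range(team_2)

print("Team I range:", range_1, "inches")
print("Team II range:", range_2, "inches")
```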
MEASURES OF VARIABILITY
VARIANCE, STANDARD DEVIATION, CV
• A major drawback of the range is that it uses only the two extreme values, ignores all the intermediate values, and provides no information on the dispersion of the values between the smallest and largest observations.
• On the other hand, the variance, standard deviation, and CV use all the values and provide information on the dispersion of the intermediate values.
• Computing the variance, standard deviation, or CV requires computing each value's deviation from the mean.
MEASURES OF VARIABILITY
VARIANCE, STANDARD DEVIATION, CV
• Team I deviations from the mean:
  (72-75) = -3, (73-75) = -2, (76-75) = 1, (76-75) = 1, (78-75) = 3
MEASURES OF VARIABILITY
VARIANCE, STANDARD DEVIATION, CV
• Team I deviations from the mean: -3, -2, 1, 1, 3
• The sum of deviations from the mean is always 0, e.g., -3 - 2 + 1 + 1 + 3 = 0
• The sum of squared deviations from the mean is not necessarily 0, e.g., (-3)² + (-2)² + (1)² + (1)² + (3)² = 24 inch²
• Although the sum of squared deviations increases as the dispersion increases, the sum also depends on the number of measurements. So the mean squared deviation is a preferred measure of dispersion.
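The points about deviations can be checked numerically. In the sketch below, the "doubled" data set is a made-up illustration that lists each Team I height twice, so the spread is unchanged but the number of measurements doubles.

```python
team_1 = [72, 73, 76, 76, 78]          # heights in inches


def deviations_from_mean(values):
    mean = sum(values) / len(values)
    return [x - mean for x in values]


def sum_of_squares(values):
    return sum(d ** 2 for d in deviations_from_mean(values))


deviations = deviations_from_mean(team_1)
sum_dev = sum(deviations)               # deviations from the mean cancel out
sum_sq = sum_of_squares(team_1)

# Hypothetical data: same heights listed twice (same spread, twice the count).
doubled = team_1 * 2
sum_sq_doubled = sum_of_squares(doubled)

# Dividing by the count removes the dependence on sample size.
mean_sq = sum_sq / len(team_1)
mean_sq_doubled = sum_sq_doubled / len(doubled)

print(deviations, sum_dev, sum_sq, sum_sq_doubled, mean_sq, mean_sq_doubled)
```

The sum of squared deviations doubles from 24 to 48 even though the spread is the same, while the mean squared deviation stays at 4.8.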
MEASURES OF VARIABILITY
VARIANCE, STANDARD DEVIATION, CV
• Variance is the mean squared deviation, e.g., Team I
  Variance = [(-3)² + (-2)² + (1)² + (1)² + (3)²] / 5 = 4.8 inch²
• Standard deviation is the root mean squared deviation, i.e., the square root of the variance. So, Team I
  Standard deviation = √4.8 ≈ 2.19 inches
• Coefficient of variation is the standard deviation divided by the mean. So, Team I
  Coefficient of variation = 2.19 / 75 = 0.0292 = 2.92%
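As a sketch, the same three quantities can be computed for both teams with Python's standard `statistics` module. The slide divides by the number of measurements n, which corresponds to `pvariance` and `pstdev`; `statistics.variance` divides by n - 1 and would give different numbers. The helper name `summarize` is illustrative.

```python
import statistics

team_1 = [72, 73, 76, 76, 78]   # heights in inches
team_2 = [67, 72, 76, 76, 84]


def summarize(values):
    """Return (variance, standard deviation, CV), dividing by n as on the slide."""
    mean = statistics.mean(values)
    variance = statistics.pvariance(values)   # mean squared deviation
    sd = statistics.pstdev(values)            # square root of the variance
    cv = sd / mean                            # unit-free ratio
    return variance, sd, cv


var_1, sd_1, cv_1 = summarize(team_1)
var_2, sd_2, cv_2 = summarize(team_2)

print(f"Team I:  variance={var_1:.1f}, sd={sd_1:.2f}, CV={cv_1:.2%}")
print(f"Team II: variance={var_2:.1f}, sd={sd_2:.2f}, CV={cv_2:.2%}")
```

Team I reproduces the slide's values (4.8 inch², about 2.19 inches, about 2.92%). Team II, with the same mean, comes out at 31.2 inch², about 5.59 inches, and about 7.45%, which answers the earlier question about which team varies more.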
MEASURES OF VARIABILITY
VARIANCE, STANDARD DEVIATION, CV
• Why are there three similar terms?
  - In the above example, the variance has the unit inch², but the standard deviation has the unit inch, the unit of the original data. So the standard deviation may sometimes be preferred over the variance.
  - The coefficient of variation is dimensionless. Hence, it is a useful quantity for comparing the variability in data sets having different standard deviations and different means.
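One way to see what "dimensionless" means here is to convert the Team I heights from inches to centimetres, an illustrative conversion that is not part of the lecture. In this sketch the standard deviation changes with the unit, while the coefficient of variation does not.

```python
import statistics

INCH_TO_CM = 2.54

team_1_in = [72, 73, 76, 76, 78]                   # heights in inches
team_1_cm = [h * INCH_TO_CM for h in team_1_in]    # same heights in centimetres


def cv(values):
    """Coefficient of variation: standard deviation (dividing by n) over mean."""
    return statistics.pstdev(values) / statistics.mean(values)


sd_in = statistics.pstdev(team_1_in)
sd_cm = statistics.pstdev(team_1_cm)
cv_in = cv(team_1_in)
cv_cm = cv(team_1_cm)

print(f"sd: {sd_in:.2f} in vs {sd_cm:.2f} cm")     # depends on the unit
print(f"CV: {cv_in:.4f} vs {cv_cm:.4f}")           # same in either unit
```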
MEASURES OF VARIABILITY
VARIANCE, STANDARD DEVIATION, CV
• Interpreting the standard deviation
  - It is difficult to interpret directly.
  - A larger standard deviation implies greater variability.
  - The standard deviation is widely used to approximate the proportion of measurements that fall into various intervals of values. This is especially true if the data have a bell-shaped distribution.
MEASURES OF VARIABILITY
VARIANCE, STANDARD DEVIATION, CV
• Interpreting the standard deviation
  - The Empirical Rule states that if the data have a bell-shaped distribution,
    - approximately 68% of the measurements fall within one standard deviation of the mean, i.e., between (mean - standard deviation) and (mean + standard deviation),
    - approximately 95% of the measurements fall within two standard deviations of the mean, and
    - virtually all the measurements fall within three standard deviations of the mean.
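These percentages can be checked under the assumption that "bell-shaped" means exactly normal, which is an idealization; real data will only match approximately. The sketch uses `statistics.NormalDist` from the Python standard library, and `share_within` is an illustrative helper name.

```python
from statistics import NormalDist

# Assumption: the bell-shaped distribution is an exact normal distribution.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.2%}")
```

The results are roughly 68.3%, 95.4%, and 99.7%, in line with the rule's "approximately 68%", "approximately 95%", and "virtually all".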
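MEASURES OF VARIABILITY
VARIANCE, STANDARD DEVIATION, CV
[Figure: bell-shaped curve centred on the mean, with the horizontal axis marked at -3, -2, -1, +1, +2, and +3 standard deviations from the mean. 68.26% of the area lies within 1 standard deviation, 95.44% within 2, and 99.74% within 3.]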
MEASURES OF VARIABILITY
VARIANCE, STANDARD DEVIATION, CV
• Interpreting the standard deviation
  - Example: suppose that the final marks have a bell-shaped distribution, with a mean of 75 and a standard deviation of 7. Then,
    - approximately 68% of the marks fall between (75 - 7) = 68 and (75 + 7) = 82,
    - approximately 95% of the marks fall between (75 - 2×7) = 61 and (75 + 2×7) = 89, and
    - virtually all the marks fall between (75 - 3×7) = 54 and (75 + 3×7) = 96.
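The interval arithmetic in this example can be sketched as follows. The function name `interval` is an illustrative choice.

```python
mean_mark = 75   # mean of the final marks
sd_mark = 7      # standard deviation of the final marks


def interval(mean, sd, k):
    """Return (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)


# k = 1, 2, 3 correspond to roughly 68%, 95%, and virtually all of the marks.
intervals = {k: interval(mean_mark, sd_mark, k) for k in (1, 2, 3)}

for k, (low, high) in intervals.items():
    print(f"within {k} standard deviation(s): {low} to {high}")
```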