Download Chapter 3

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Chapter 3
Descriptive
Statistics
© 2002 Thomson / South-Western
Slide 3-1
Learning Objectives
• Distinguish between measures of
central tendency, measures of
variability, and measures of shape
• Understand the meanings of mean,
median, mode, quartile, percentile, and
range
• Compute mean, median, mode,
percentile, quartile, range, variance,
standard deviation, and mean absolute
deviation
© 2002 Thomson / South-Western
Slide 3-2
Learning Objectives -- Continued
• Differentiate between sample and
population variance and standard
deviation
• Understand the meaning of standard
deviation as it is applied by using the
empirical rule
• Understand box and whisker plots,
skewness, and kurtosis
© 2002 Thomson / South-Western
Slide 3-3
Measures of Central Tendency
• Measures of central tendency yield
information about “particular places or
locations in a group of numbers.”
• Common Measures of Location
–Mode
–Median
–Mean
–Percentiles
–Quartiles
© 2002 Thomson / South-Western
Slide 3-4
Mode
• The most frequently occurring value in a
data set
• Applicable to all levels of data
measurement (nominal, ordinal, interval,
and ratio)
• Bimodal -- Data sets that have two
modes
• Multimodal -- Data sets that contain
more than two modes
© 2002 Thomson / South-Western
Slide 3-5
Mode -- Example
• The mode is 44.
• There are more 44s
than any other value.
© 2002 Thomson / South-Western
35
41
44
45
37
41
44
46
37
43
44
46
39
43
44
46
40
43
44
46
40
43
45
48
Slide 3-6
Median (ΔΙΑΜΕΣΟΣ)
• Middle value in an ordered array of
numbers.
• Applicable for ordinal, interval, and ratio
data
• Not applicable for nominal data
• Unaffected by extremely large and
extremely small values.
© 2002 Thomson / South-Western
Slide 3-7
Median: Computational Procedure
• First Procedure
– Arrange observations in an ordered array.
– If number of terms is odd, the median is
the middle term of the ordered array.
– If number of terms is even, the median is
the average of the middle two terms.
• Second Procedure
– The median’s position in an ordered array
is given by (n+1)/2.
© 2002 Thomson / South-Western
Slide 3-8
Median: Example with
an Odd Number of Terms
Ordered Array includes:
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
•
•
•
•
There are 17 terms in the ordered array.
Position of median = (n+1)/2 = (17+1)/2 = 9
The median is the 9th term, 15.
If the 22 is replaced by 100, the median
remains at 15.
• If the 3 is replaced by -103, the median
remains at 15.
© 2002 Thomson / South-Western
Slide 3-9
Mean (ΜΕΣΟΣ)
• Is the average of a group of numbers
• Applicable for interval and ratio data,
not applicable for nominal or ordinal
data
• Affected by each value in the data set,
including extreme values
• Computed by summing all values in the
data set and dividing the sum by the
number of values in the data set
© 2002 Thomson / South-Western
Slide 3-10
Population Mean
X X  X  X ...  X



1
2
3
N
N
N
24  13  19  26  11

5
93

5
 18. 6
© 2002 Thomson / South-Western
Slide 3-11
Sample Mean
X X  X  X ...  X

X

1
2
3
n
n
n
57  86  42  38  90  66

6
379

6
 63.167
© 2002 Thomson / South-Western
Slide 3-12
Quartiles
Measures of central tendency that divide
a group of data into four subgroups
• Q1: 25% of the data set is below the first
quartile
• Q2: 50% of the data set is below the
second quartile
• Q3: 75% of the data set is below the third
quartile
© 2002 Thomson / South-Western
Slide 3-13
Quartiles, continued
• Q1 is equal to the 25th percentile
• Q2 is located at 50th percentile and
equals the median
• Q3 is equal to the 75th percentile
Quartile values are not necessarily
members of the data set
© 2002 Thomson / South-Western
Slide 3-14
Quartiles
Q2
Q1
25%
© 2002 Thomson / South-Western
25%
Q3
25%
25%
Slide 3-15
Quartiles: Example
• Ordered array: 106, 109, 114, 116, 121,
122, 125, 129
• Q 1:
25
i
(8)  2
100
109114
Q1 
 1115
.
2
• Q 2:
50
i
(8)  4
100
116121
Q2 
 1185
.
2
• Q 3:
75
i
(8)  6
100
122125
Q3 
 1235
.
2
© 2002 Thomson / South-Western
Slide 3-16
Measures of Variability
• Measures of variability describe the spread
or the dispersion of a set of data.
• Common Measures of Variability
–Range
–Interquartile Range
–Mean Absolute Deviation
–Variance
–Standard Deviation
– Z scores
–Coefficient of Variation
© 2002 Thomson / South-Western
Slide 3-17
Variability
No Variability in Cash Flow
Variability in Cash Flow
© 2002 Thomson / South-Western
Mean
Mean
Mean
Mean
Slide 3-18
Variability
Variability
No Variability
© 2002 Thomson / South-Western
Slide 3-19
Range
• The difference between the largest and
the smallest values in a set of data
• Simple to compute
35
41
44
45
• Ignores all data points
37
41
44
46
except the
37
43
44
46
two extremes
39
43
44
46
• Example:
Range
40
43
44= 46
Largest - Smallest
=
40
43
45
48
48 - 35 = 13
© 2002 Thomson / South-Western
Slide 3-20
Interquartile Range
• Range of values between the first and third
quartiles
• Range of the “middle half”
• Less influenced by extremes
Interquartile Range  Q 3  Q1
© 2002 Thomson / South-Western
Slide 3-21
Deviation from the Mean
• Data set: 5, 9, 16, 17, 18
• Mean:
X 65


  13
N
5
• Deviations from the mean: -8, -4, 3, 4, 5
-4
-8
0
+3 +4
5
10
15
+5
20

© 2002 Thomson / South-Western
Slide 3-22
Mean Absolute Deviation
• Average of the absolute deviations from
the mean
X
5
9
16
17
18
X  
X  
-8
-4
+3
+4
+5
0
+8
+4
+3
+4
+5
24
© 2002 Thomson / South-Western
M . A. D. 

X 
N
24

5
 4.8
Slide 3-23
Population Variance
• Average of the squared deviations from
the arithmetic mean
X
5
9
16
17
18
X   X
-8
-4
+3
+4
+5
0
© 2002 Thomson / South-Western


64
16
9
16
25
130
2

2

 X 
N
130

5
 2 6 .0
Slide 3-24
2
Population Standard Deviation
• Square root of
the variance
X
5
9
16
17
18
X   X
-8
-4
+3
+4
+5
0
© 2002 Thomson / South-Western


64
16
9
16
25
130
2

2

 X 
N
130

5
 2 6 .0
 

2
 2 6 .0
 5 .1
Slide 3-25
2
Empirical Rule
• Data are normally distributed (or
approximately normal)
Distance from
the Mean
  1
  2
  3
© 2002 Thomson / South-Western
Percentage of Values
Falling Within Distance
68
95
99.7
Slide 3-26
Sample Variance
• Average of the squared deviations from
the arithmetic mean
X
2,398
1,844
1,539
1,311
7,092
X  X X
625
71
-234
-462
0
© 2002 Thomson / South-Western
 X

390,625
5,041
54,756
213,444
663,866
2
S
2

X  X
n1
6 6 3 ,8 6 6

3
 2 2 1 , 2 8 8 .6 7
Slide 3-27

2
Sample Standard Deviation
• Square root of the
sample variance
X X  X X  X 
2,398
1,844
1,539
1,311
7,092
625
71
-234
-462
0
390,625
5,041
54,756
213,444
663,866
2
S
2

X  X
n1
6 6 3 ,8 6 6

3
 2 2 1 , 2 8 8 .6 7
S 
S
2
 2 2 1 , 2 8 8 .6 7
 4 7 0 .4 1
© 2002 Thomson / South-Western
Slide 3-28

2
Coefficient of Variation
• Ratio of the standard deviation to the
mean, expressed as a percentage
• Measurement of relative dispersion

C.V . 100

© 2002 Thomson / South-Western
Slide 3-29
Coefficient of Variation
  84
  29

1
1

 4.6

CV
. .
1
1
100
1
4.6
100

29
 1586
.
© 2002 Thomson / South-Western
2
2
 10

CV
. .
2
2
100
2
10
 100
84
 1190
.
Slide 3-30
Measures of Shape
• Skewness
– Absence of symmetry
– Extreme values in one side of a
distribution
• Kurtosis
– Peakedness of a distribution
• Box and Whisker Plots
– Graphic display of a distribution
– Reveals skewness
© 2002 Thomson / South-Western
Slide 3-31
Skewness
Negatively
Skewed
© 2002 Thomson / South-Western
Symmetric
(Not Skewed)
Positively
Skewed
Slide 3-32
Skewness
Mean
Median
Mean
Median
Mode
Negatively
Skewed
Symmetric
(Not Skewed)
Mode
© 2002 Thomson / South-Western
Mean
Mode
Median
Positively
Skewed
Slide 3-33
Coefficient of Skewness
• Summary measure for skewness
S
3   Md 

• If S < 0, the distribution is negatively
skewed (skewed to the left).
• If S = 0, the distribution is symmetric (not
skewed).
• If S > 0, the distribution is positively
skewed (skewed to the right).
© 2002 Thomson / South-Western
Slide 3-34
Coefficient of Skewness

1
M
d1

S
1
1
 23

 26
M
d2
 12.3


2
3 1 

M

d1
1
3 23  26

12.3
 0.73
© 2002 Thomson / South-Western

S
2
2
 26

 26
M
d3
 12.3


3
3 2 

M
d2
2
3 26  26

12.3
0


S
3
3
 29
 26
 12.3


3 3 

M

d3
3
3 29  26

12.3
 0.73
Slide 3-35
Kurtosis
• Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal in shape
– Platykurtic: flat and spread out
Leptokurtic
Mesokurtic
Platykurtic
© 2002 Thomson / South-Western
Slide 3-36
Box and Whisker Plot
• Five specific values are used:
– Median, Q2
– First quartile, Q1
– Third quartile, Q3
– Minimum value in the data set
– Maximum value in the data set
© 2002 Thomson / South-Western
Slide 3-37
Box and Whisker Plot, continued
• Inner Fences
– IQR = Q3 - Q1
– Lower inner fence = Q1 - 1.5 IQR
– Upper inner fence = Q3 + 1.5 IQR
• Outer Fences
– Lower outer fence = Q1 - 3.0 IQR
– Upper outer fence = Q3 + 3.0 IQR
© 2002 Thomson / South-Western
Slide 3-38
Box and Whisker Plot
Minimum
Q1
© 2002 Thomson / South-Western
Q2
Q3
Maximum
Slide 3-39
Skewness: Box and Whisker Plots,
and Coefficient of Skewness
S<0
Negatively
Skewed
© 2002 Thomson / South-Western
S=0
Symmetric
(Not Skewed)
S>0
Positively
Skewed
Slide 3-40
Related documents