Describing Data Numerically
EC 233.01/02
Extra Lecture Notes 3
Copyright ©2011 Pearson Education, Inc. publishing as Prentice Hall

Describing Data Numerically
Central Tendency: Arithmetic Mean, Median, Mode
Variation: Range, Variance, Standard Deviation, Coefficient of Variation

Numerical Descriptive Measures

The summary measures we will learn about are relevant for a population as well as a sample. Yet, there is a distinction:
Summary measures describing a population, called parameters, are denoted with Greek letters. Population parameters are unique, and in many instances unknown.
Summary measures describing observations in a sample are called sample statistics. With each new sample, you obtain new sample statistics.
By using sample statistics, we try to make inferences about population parameters. Please think about what this means!

Sample statistics versus population parameters

Measure               Population Parameter   Sample Statistic
Mean                  μ                      x̄
Variance              σ²                     S²
Standard Deviation    σ                      S

Measures of Central Tendency

Central Tendency
Mean: Arithmetic average
Median: Midpoint of ranked values
Mode: Most frequently observed value

Arithmetic Mean

The arithmetic mean (mean) is the most common measure of central tendency
For a population of N values:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
where the x_i are the population values and N is the population size
For a sample of size n:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
where the x_i are the observed values and n is the sample size

Arithmetic Mean (continued)

The most common measure of central tendency
Mean = sum of values divided by the number of values
Affected by extreme values (outliers)
Values 1, 2, 3, 4, 5:
$$\text{Mean} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$$
Values 1, 2, 3, 4, 10:
$$\text{Mean} = \frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$$

Median

In an ordered list, the median is the “middle” number (50% above, 50% below)
[Figure: two dot plots on an axis from 0 to 10; Median = 3 in both]
Not affected by extreme values
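
The effect of an outlier on the mean, but not on the median, can be checked with a short sketch. It uses the two data sets from the mean example (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10); the variable names are illustrative assumptions, not part of the notes.

```python
import statistics

without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]   # same data, largest value replaced by 10

for data in (without_outlier, with_outlier):
    mean = sum(data) / len(data)        # sum of values / number of values
    median = statistics.median(data)    # middle value of the ordered data
    print(data, "mean =", mean, "median =", median)

# The mean moves from 3.0 to 4.0, while the median stays at 3.
```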

Finding the Median

The location of the median:
$$\text{Median position} = \frac{n + 1}{2} \text{ position in the ordered data}$$
If the number of values is odd, the median is the middle number
If the number of values is even, the median is the average of the two middle numbers
Note that (n + 1)/2 is not the value of the median, only the position of the median in the ranked data

Mode

A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes
[Figure: two dot plots; on an axis from 0 to 14, Mode = 9; on an axis from 0 to 6, No Mode]
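
A minimal sketch of the position rule and of the mode, assuming non-empty input. The function names `median_by_position` and `modes` and the small test lists are made up for illustration; they are not taken from the slides.

```python
from collections import Counter

def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position, not the median itself
    if n % 2 == 1:
        return ordered[int(position) - 1]        # odd n: the middle value
    # Even n: position ends in .5, so average the values on either side of it.
    return (ordered[int(position) - 1] + ordered[int(position)]) / 2

def modes(values):
    """All most frequent values; an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                                # no mode
    return [v for v, c in counts.items() if c == highest]

print(median_by_position([7, 1, 4]))       # 4
print(median_by_position([1, 2, 3, 4]))    # 2.5
print(modes(["red", "blue", "red"]))       # ['red']  (categorical data)
print(modes([0, 1, 2, 3, 4, 5, 6]))        # []       (no mode)
print(modes([1, 1, 2, 2, 3]))              # [1, 2]   (several modes)
```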

Review Example

Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
[Figure: the five houses, labeled $2,000 K, $500 K, $300 K, $100 K, $100 K]

Review Example: Summary Statistics

House Prices: $2,000,000; 500,000; 300,000; 100,000; 100,000
Sum 3,000,000
Mean: ($3,000,000/5) = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
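
The review example can be reproduced with Python's standard `statistics` module. This is only a cross-check of the arithmetic; the variable names are illustrative.

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

total = sum(house_prices)
mean_price = total / len(house_prices)
median_price = statistics.median(house_prices)   # middle value of ranked data
mode_price = statistics.mode(house_prices)       # most frequent value

print(total)          # 3000000
print(mean_price)     # 600000.0
print(median_price)   # 300000
print(mode_price)     # 100000
```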

Which measure of location is the “best”?

Mean is generally used, unless extreme values (outliers) exist
If outliers exist, then median is often used, since the median is not sensitive to extreme values.
Example: Median home prices may be reported for a region – less sensitive to outliers

Shape of a Distribution

Describes how data are distributed
Measures of shape: Symmetric or skewed
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
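
The mean-versus-median comparison can be written as a small rule-of-thumb check. The function name `skew_direction` is an assumption for illustration, and the comparison only indicates the direction of skew; it is not a formal measure of shape.

```python
import statistics

def skew_direction(values):
    """Label the shape by comparing mean and median (rule of thumb only)."""
    mean = sum(values) / len(values)
    median = statistics.median(values)
    if mean < median:
        return "left-skewed"
    if mean > median:
        return "right-skewed"
    return "symmetric"

# The five house prices from the review example: mean 600,000 > median 300,000.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
print(skew_direction(house_prices))      # right-skewed
print(skew_direction([1, 2, 3, 4, 5]))   # symmetric
```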

Measures of Variability

Variation
Range
Variance
Standard Deviation
Coefficient of Variation
Measures of variation give information on the spread or variability of the data values.
[Figure: two distributions with the same center, different variation]

Range

Simplest measure of variation
Difference between the largest and the smallest observations:
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
Example:
[Figure: dot plot on an axis from 0 to 14]
Range = 14 - 1 = 13

Disadvantages of the Range

Ignores the way in which data are distributed
[Figure: two dot plots on an axis from 7 to 12 with different distributions; Range = 12 - 7 = 5 for both]
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119

Quartiles

Quartiles split the ranked data into 4 segments with an equal number of values per segment
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are larger)
Only 25% of the observations are greater than the third quartile
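
A short sketch of the range's sensitivity to a single outlier, using the two 25-value data sets from the range example. The quartile call uses `statistics.quantiles` with its default method, which is only one of several quartile conventions and an assumption here, since the notes do not fix a method; the sketch therefore checks only that Q2 equals the median.

```python
import statistics

def value_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)

# Eleven 1s, eight 2s, four 3s and one 4, followed by the last value.
common = [1] * 11 + [2] * 8 + [3] * 4 + [4]
data_no_outlier = common + [5]
data_outlier = common + [120]

print(value_range(data_no_outlier))   # 4
print(value_range(data_outlier))      # 119

# Three cut points splitting the ranked data into four segments.
q1, q2, q3 = statistics.quantiles(data_no_outlier, n=4)
print(q2 == statistics.median(data_no_outlier))   # True: Q2 is the median
```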

Population Variance

Average of squared deviations of values from the mean
Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Where
μ = population mean
N = population size
x_i = ith value of the variable x

Sample Variance

Average (approximately) of squared deviations of values from the mean (why n-1?)
Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Where
x̄ = arithmetic mean
n = sample size
x_i = ith value of the variable x
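
The two variance formulas differ only in the divisor, which a minimal sketch makes concrete. The function names are illustrative, and the data (1, 2, 3, 4, 5) are reused from the mean example purely as a test case.

```python
import statistics

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data = [1, 2, 3, 4, 5]
print(population_variance(data))   # 2.0  (10 / 5)
print(sample_variance(data))       # 2.5  (10 / 4)

# The standard library offers the same two quantities.
print(statistics.pvariance(data) == population_variance(data))   # True
print(statistics.variance(data) == sample_variance(data))        # True
```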

Population Standard Deviation

Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

Sample Standard Deviation

Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Measuring variation

[Figure: two distributions, one with a small standard deviation and one with a large standard deviation]

Calculation Example: Sample Standard Deviation

Sample Data (x_i): 10 12 14 15 17 18 18 24
n = 8, Mean = x̄ = 16
$$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$$
$$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$$
A measure of the “average” scatter around the mean
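
The calculation can be checked numerically. The sketch below recomputes the mean, the sum of squared deviations, and s directly from the eight listed sample values; the variable names are illustrative.

```python
import math

sample = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(sample)
x_bar = sum(sample) / n                              # 128 / 8
squared_devs = [(x - x_bar) ** 2 for x in sample]    # 36, 16, 4, 1, 1, 4, 4, 64
sum_sq = sum(squared_devs)
s = math.sqrt(sum_sq / (n - 1))                      # divide by n - 1, then take the root

print(n, x_bar, sum_sq)   # 8 16.0 130.0
print(round(s, 4))        # 4.3095
```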

Comparing Standard Deviations

[Figure: three dot plots on an axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.570

Advantages of Variance and Standard Deviation

Each value in the data set is used in the calculation
Values far from the mean are given extra weight (because deviations from the mean are squared)

Measures of Variation: Summary Characteristics

The more the data are spread out, the greater the range, variance, and standard deviation.
The more the data are concentrated, the smaller the range, variance, and standard deviation.
If the values are all the same (no variation), all these measures will be zero.
None of these measures are ever negative.

Coefficient of Variation

Measures relative variation
Always in percentage (%)
Shows variation relative to mean
Can be used to compare two or more sets of data measured in different units
$$CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$$
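
A minimal sketch of the CV formula, applied to two illustrative price series that share a standard deviation of $5 but have means of $50 and $100. The function name `coefficient_of_variation` is an assumption for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (s / x-bar) * 100, expressed as a percentage."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(std_dev=5, mean=50)
cv_b = coefficient_of_variation(std_dev=5, mean=100)

print(cv_a)   # 10.0
print(cv_b)   # 5.0
# Same standard deviation, but the second series varies less relative to its mean.
```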

Comparing Coefficient of Variation

Stock A:
Average price last year = $50
Standard deviation = $5
$$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$
Stock B:
Average price last year = $100
Standard deviation = $5
$$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$
Both stocks have the same standard deviation, but stock B is less variable relative to its price

Using Microsoft Excel

Descriptive Statistics can be obtained from Microsoft® Excel
Use menu choice: tools / data analysis / descriptive statistics
Enter details in dialog box

Using Excel

Use menu choice: tools / data analysis / descriptive statistics

Using Excel (continued)

Enter dialog box details
Check box for summary statistics
Click OK

Excel output

Microsoft Excel descriptive statistics output, using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000