Descriptive Statistics: Presenting and Describing Data
Frequency Distribution
A table or graph describing the number of
observations in each category or class of a
data set.
Example:
Consider the number of bottles of soda sold
in a snack bar during lunch hour, on 40 days.
(The numbers have been arranged in increasing order.)
63 66 67 68 68 70 71 71 71 73
73 74 74 75 75 75 76 76 76 77
78 79 79 79 81 82 82 84 84 84
85 85 85 85 86 86 89 90 92 94
In order to get a better grasp of this
distribution of numbers, we’ll organize
them into categories or classes.
We’ll look at
absolute frequency,
relative frequency,
cumulative absolute frequency, &
cumulative relative frequency
Notation
[20, 30] denotes all real numbers between 20 & 30,
including the 20 & the 30.
(20, 30) denotes all real numbers between 20 & 30,
including neither the 20 nor the 30.
[20, 30) denotes all real numbers between 20 & 30,
including the 20 but not the 30.
(20, 30] denotes all real numbers between 20 & 30,
including the 30 but not the 20.
So the square bracket means include that endpoint &
the round parenthesis means do not include that
endpoint.
Absolute Frequency
class       abs. freq.
[60, 65)     1
[65, 70)     4
[70, 75)     8
[75, 80)    11
[80, 85)     6
[85, 90)     7
[90, 95)     3
Total       40
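As a rough check on the absolute frequency counts, the tally can be sketched in Python. The helper name absolute_frequencies and its parameters are our own, chosen for illustration; each class is treated as half-open, [lower, lower + width), matching the notation above.

```python
# Daily soda sales from the example (40 days, already sorted).
sales = [63, 66, 67, 68, 68, 70, 71, 71, 71, 73,
         73, 74, 74, 75, 75, 75, 76, 76, 76, 77,
         78, 79, 79, 79, 81, 82, 82, 84, 84, 84,
         85, 85, 85, 85, 86, 86, 89, 90, 92, 94]


def absolute_frequencies(data, start, stop, width):
    """Count the observations in each half-open class [lower, lower + width)."""
    lowers = list(range(start, stop, width))
    counts = [sum(1 for x in data if lo <= x < lo + width) for lo in lowers]
    return lowers, counts


lowers, abs_freq = absolute_frequencies(sales, 60, 95, 5)
for lo, f in zip(lowers, abs_freq):
    print(f"[{lo}, {lo + 5})  {f}")
print("total", sum(abs_freq))
```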
Histogram of Absolute Frequency
[Figure: histogram. Horizontal axis: bottles of soda, 60 to 95 in classes of width 5. Vertical axis: absolute frequency, 0 to 12.]
Relative Frequency
class       abs. freq.   rel. freq.
[60, 65)     1           0.025
[65, 70)     4           0.100
[70, 75)     8           0.200
[75, 80)    11           0.275
[80, 85)     6           0.150
[85, 90)     7           0.175
[90, 95)     3           0.075
Total       40           1.000
Relative Frequency
[Figure: histogram. Horizontal axis: bottles of soda, 60 to 95. Vertical axis: relative frequency, 0.000 to 0.300.]
This graph looks the same as the last one, except the
numbers on the vertical axis are percentages
(in decimal form) instead of integers.
Frequency Polygon
line connecting middle points of tops of bars
[Figure: frequency polygon drawn over the histogram. Horizontal axis: bottles of soda, 60 to 95. Vertical axis: absolute frequency, 0 to 12.]
Cumulative Absolute Frequency
class       abs. freq.   rel. freq.   cum. abs. freq.
[60, 65)     1           0.025          1
[65, 70)     4           0.100          5
[70, 75)     8           0.200         13
[75, 80)    11           0.275         24
[80, 85)     6           0.150         30
[85, 90)     7           0.175         37
[90, 95)     3           0.075         40
Total       40           1.000
Cumulative Absolute Frequency
[Figure: bar graph of cumulative absolute frequency. Horizontal axis: bottles of soda, 60 to 95. Vertical axis: cumulative absolute frequency, 0 to 40.]
Notice that the graph of the cumulative absolute
frequency looks like a set of stairs going
up from left to right.
Cumulative Relative Frequency
class       abs. freq.   rel. freq.   cum. abs. freq.   cum. rel. freq.
[60, 65)     1           0.025          1               0.025
[65, 70)     4           0.100          5               0.125
[70, 75)     8           0.200         13               0.325
[75, 80)    11           0.275         24               0.600
[80, 85)     6           0.150         30               0.750
[85, 90)     7           0.175         37               0.925
[90, 95)     3           0.075         40               1.000
Total       40           1.000
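The relative and cumulative columns follow mechanically from the absolute frequencies. A minimal sketch, using variable names of our own choosing:

```python
from itertools import accumulate

# Absolute frequencies for the classes [60, 65) through [90, 95).
abs_freq = [1, 4, 8, 11, 6, 7, 3]
n = sum(abs_freq)

rel_freq = [f / n for f in abs_freq]      # share of observations in each class
cum_abs = list(accumulate(abs_freq))      # running total of the counts
cum_rel = [c / n for c in cum_abs]        # running total as a share of n

for row in zip(abs_freq, rel_freq, cum_abs, cum_rel):
    print("{:>3} {:.3f} {:>3} {:.3f}".format(*row))
```

The last cumulative relative frequency is always 1, because the running total ends at n.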
Cumulative Relative Frequency
[Figure: bar graph of cumulative relative frequency. Horizontal axis: bottles of soda, 60 to 95. Vertical axis: cumulative relative frequency, 0.00 to 1.00.]
Again we have our stairs, but the numbers on the
vertical axis are percentages (in decimal form),
and the height of the last bar is always 1 (or 100%).
Cumulative Relative Frequency Ogive
Line connecting the points at the back of the steps.
[Figure: ogive drawn over the cumulative relative frequency graph. Horizontal axis: bottles of soda, 60 to 95. Vertical axis: cumulative relative frequency, 0.00 to 1.00.]
Next we will consider two types of
summary measures:
1. Measures of the center of the distribution
(also called measures of central tendency)
2. Measures of the spread of the distribution
Measures of
the center of the distribution,
or central tendency,
or typical value, or average
Measures of the Center of the Distribution
Mean or Arithmetic Mean:
add up the values of the observations;
then divide by the number of observations.
Median:
the value for which half of the observations
are above that value & half are below it.
Mode:
Most common, most frequent, or most
probable value.
Determining the location of the median
Recall that the median is the value for which
half of the observations are above that value &
half are below it. So we are looking for the
middle value.
Suppose there are n numbers in our data set.
We arrange them in order from the smallest
value to the largest, and give the smallest value
rank 1, the second smallest rank 2, and so forth
up to the largest value, which has rank n.
The rank of the median will be (n+1)/2.
Remember that n is the number of elements
in the data set.
We have two possible cases.
Case 1: n is odd.
Case 2: n is even.
Case 1: n is odd.
Recall that the rank of the median is (n+1)/2.
Example: n = 9
Then (n+1)/2 = (9+1)/2 = 10/2 = 5.
So the value of the 5th number is our median.
Case 2: n is even.
Recall that the rank of the median is (n+1)/2.
Example: n = 10
Then (n+1)/2 = (10+1)/2 = 11/2 = 5.5.
So we are looking for the value halfway
between the 5th and 6th numbers. So we add
the values of the 5th and 6th numbers together
and divide by 2. The result is our median.
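The two cases can be sketched in Python. The function name median_by_rank is ours, and the sketch assumes a non-empty list of numbers.

```python
def median_by_rank(values):
    """Median found from the rank (n + 1) / 2 of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    rank = (n + 1) / 2                  # 1-based rank of the median
    if n % 2 == 1:
        # Case 1: n is odd, so the rank is a whole number.
        return ordered[int(rank) - 1]
    # Case 2: n is even, so the rank ends in .5; average the two neighbors.
    lower = ordered[int(rank) - 1]      # value at rank n/2
    upper = ordered[int(rank)]          # value at rank n/2 + 1
    return (lower + upper) / 2


print(median_by_rank([1, 2, 3, 4, 5, 6, 7, 8, 9]))       # n = 9, rank 5
print(median_by_rank([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))   # n = 10, rank 5.5
```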
Example 1
Observations
2, 2, 3, 4, 8, 10, 13
Mean
6
Median
4
Mode
2
Example 2
Observations
-5, 8, 8, 9, 10, 12
Mean
7
Median
8.5
Mode
8
Example 3
Observations
2, 3, 4, 4, 4, 7
Mean
4
Median
4
Mode
4
Example 4
Observations
11, 9, 26, 11, 10, 11
To calculate the median, we will want to
have the observations in order:
9, 10, 11, 11, 11, 26
Mean
13
Median
11
Mode
11
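The four examples can be checked with Python's standard statistics module. The dictionary layout below is just one convenient way to organize them.

```python
import statistics

examples = {
    "Example 1": [2, 2, 3, 4, 8, 10, 13],
    "Example 2": [-5, 8, 8, 9, 10, 12],
    "Example 3": [2, 3, 4, 4, 4, 7],
    "Example 4": [11, 9, 26, 11, 10, 11],   # median() sorts the data itself
}

results = {}
for name, obs in examples.items():
    results[name] = (statistics.mean(obs),
                     statistics.median(obs),
                     statistics.mode(obs))
    print(name, results[name])
```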
Computing the Mean
for a Frequency Distribution of a Population
Salary xi    Freq. fi
 700           8
 800          23
 900          75
1000          90
1100          43
1200          11
Total        250
We will denote the number of observations in
our population as N. In this example, it’s 250.
First we need the sum of all the observations:
(700 + 700 + 700 + … + 700) + (800 + 800 + 800 + … + 800) +
… + (1200 + 1200 + 1200 + … + 1200)
= (700 • 8) + (800 • 23) + (900 • 75) + (1000 • 90) + (1100 • 43) + (1200 • 11)
Salary xi    Freq. fi    xi fi
 700           8           5,600
 800          23          18,400
 900          75          67,500
1000          90          90,000
1100          43          47,300
1200          11          13,200
Total        250         242,000
Then to get the mean, we will divide that sum
by the number of observations.
So the mean equals 242,000 / 250 = 968.0.
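The same calculation can be sketched in a few lines of Python; the variable names are ours.

```python
# Salary values and their frequencies from the table.
salaries = [700, 800, 900, 1000, 1100, 1200]
freqs = [8, 23, 75, 90, 43, 11]

N = sum(freqs)                                        # number of observations
total = sum(x * f for x, f in zip(salaries, freqs))   # sum of the xi * fi column
mu = total / N                                        # population mean

print(N, total, mu)
```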
Notation
We denote the mean of a population
by the Greek letter mu: μ
For a simple list of numbers,
we computed μ as:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
If c is the number of categories
or classes in our frequency
distribution, then we computed
μ for a frequency distribution as:
$$\mu = \frac{1}{N}\sum_{i=1}^{c} x_i f_i$$
What is the mode of this
frequency distribution?
Salary xi    Freq. fi
 700           8
 800          23
 900          75
1000          90
1100          43
1200          11
Total        250
The mode is the most frequent or most common
value, which in this example is 1000.
What is the median of this
frequency distribution?
Remember, the median is the middle value, or the
average of the two middle values, when there is an
even number of observations, as there is here.
Where is the median?
Salary value:  x  x  x  …  x    x    x    x    …  x    x    x
Position:      1  2  3  …  124  125  126  127  …  248  249  250
The middle is between the salaries in the 125th and
126th positions, where there are 125 values
below and 125 above.
So we need to determine what salaries are in the
125th and 126th positions.
Salary xi    Freq. fi
 700           8
 800          23
 900          75
1000          90
1100          43
1200          11
Total        250
In the $700 category, we have observations 1 through 8.
In the $800 category, we have observations 9 through 31 (= 8+23).
In the $900 category, we have observations 32 through 106 (= 8+23+75) .
In the $1000 category, we have observations 107 through 196 (= 8+23+75+90) .
So the 125th & 126th observations are in the $1000 category.
Averaging the values of the two middle observations together, we get
(1000+1000)/2 = 1000.
So our median is 1000.
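The search for the middle positions can be sketched with running totals of the frequencies. The helper names value_at_rank and freq_median are hypothetical, and the values are assumed to be listed in increasing order.

```python
from itertools import accumulate


def value_at_rank(values, freqs, rank):
    """Return the value that occupies the given 1-based position."""
    for x, cum in zip(values, accumulate(freqs)):
        if rank <= cum:          # first category whose running total reaches the rank
            return x
    raise ValueError("rank exceeds the number of observations")


def freq_median(values, freqs):
    """Median of a frequency distribution whose values are sorted ascending."""
    n = sum(freqs)
    lo_rank = (n + 1) // 2       # 125 when n = 250
    hi_rank = (n + 2) // 2       # 126 when n = 250 (same as lo_rank when n is odd)
    return (value_at_rank(values, freqs, lo_rank)
            + value_at_rank(values, freqs, hi_rank)) / 2


salaries = [700, 800, 900, 1000, 1100, 1200]
freqs = [8, 23, 75, 90, 43, 11]

mode = salaries[freqs.index(max(freqs))]   # value with the largest frequency
print(mode, freq_median(salaries, freqs))
```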
Calculating mean & median for interval data.
Suppose we have the following population data.
Interval    frequency f
[0, 15)     10
[15, 30)    10
[30, 45)     5
[45, 60)     5
[60, 75)     5
We will compute the mean first.
We have 35 observations.
We need a representative element from
each interval. For that we’ll use the midpoint.
Interval    frequency f    midpoint x
[0, 15)     10              7.5
[15, 30)    10             22.5
[30, 45)     5             37.5
[45, 60)     5             52.5
[60, 75)     5             67.5
Total       35
Now we continue as we did before to calculate
the mean for a frequency distribution.
Interval    frequency f    midpoint x    xf
[0, 15)     10              7.5           75.0
[15, 30)    10             22.5          225.0
[30, 45)     5             37.5          187.5
[45, 60)     5             52.5          262.5
[60, 75)     5             67.5          337.5
Total       35
Add up the xf column:
75.0 + 225.0 + 187.5 + 262.5 + 337.5 = 1087.5
Divide by the number of observations,
and we have the mean.
μ = 1087.5/35 ≈ 31.07
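A minimal sketch of the midpoint method in Python, with the intervals stored as (lower, upper) pairs. That representation is our own choice.

```python
# Each interval is half-open: [lower, upper).
intervals = [(0, 15), (15, 30), (30, 45), (45, 60), (60, 75)]
freqs = [10, 10, 5, 5, 5]

midpoints = [(lo + hi) / 2 for lo, hi in intervals]   # representative value per interval
N = sum(freqs)
total = sum(m * f for m, f in zip(midpoints, freqs))  # sum of the xf column
mu = total / N

print(midpoints, total, round(mu, 2))
```

Because the midpoints stand in for the unknown observations, the result approximates the mean of the underlying data.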
Now let’s calculate the median.
Interval    frequency f
[0, 15)     10
[15, 30)    10
[30, 45)     5
[45, 60)     5
[60, 75)     5
Total       35
To calculate the median of interval data, we
need to make an assumption.
We know the number of observations in each
interval, but not exactly what they are.
We’re going to assume that the observations
are evenly distributed in the intervals.
First, we need to figure out in which
category the median is.
There are 35 observations, so
the middle one is the 18th one.
(There are 17 observations
below the 18th and 17 above it.)
The first 10 observations are in
the first category, [0, 15).
The 11th to the 20th observations
are in the second category, [15, 30).
So the median must be in the
second category.
The formula for calculating the median for interval
data looks quite different from what we did before.
$$\text{median} = L_{md} + \left[\frac{(N/2) - \sum f_p}{f_{md}}\right](\text{width})$$
$L_{md}$ is the lower limit on the category containing the median.
N is the population size.
$\sum f_p$ is the sum of the frequencies of the categories
preceding the category containing the median.
$f_{md}$ is the frequency of the category containing the median.
width is the width of the interval containing the median.
Let’s go through the parts
of the formula, keeping in
mind that the median is in
the second category, [15, 30).
$L_{md}$ is the lower limit on the
category containing the median: 15
N is the population size: 35
$\sum f_p$ is the sum of the frequencies
of the categories preceding the
category containing the median: 10
$f_{md}$ is the frequency of the
category containing the median: 10
width is the width of the interval
containing the median: 15
Now we just assemble the pieces.
$$\text{median} = L_{md} + \left[\frac{(N/2) - \sum f_p}{f_{md}}\right](\text{width})$$
$$= 15 + \left[\frac{17.5 - 10}{10}\right](15)$$
$$= 15 + [0.75](15)$$
$$= 26.25$$
What does this mean?
Remember that the median is the 18th observation.
That means it’s the 8th observation of 10 in the second category.
So it is closer to the end of that interval than the beginning.
What the formula is telling us is that the median is 0.75 or ¾ of the way through
the distance of 15 units, in the interval starting at 15.
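The interval formula can be sketched as a small Python function. The name grouped_median is ours, and the sketch locates the median category as the first one whose running total reaches N/2, which picks out the same category as counting to the 18th observation in this example.

```python
def grouped_median(intervals, freqs):
    """Median for interval data, assuming observations are spread evenly
    within each interval. Intervals are (lower, upper) pairs in order."""
    n = sum(freqs)
    half = n / 2
    preceding = 0                         # sum of frequencies before the current interval
    for (lower, upper), f in zip(intervals, freqs):
        if f > 0 and preceding + f >= half:
            width = upper - lower
            return lower + (half - preceding) / f * width
        preceding += f
    raise ValueError("no observations")


intervals = [(0, 15), (15, 30), (30, 45), (45, 60), (60, 75)]
freqs = [10, 10, 5, 5, 5]
print(grouped_median(intervals, freqs))   # 15 + [(17.5 - 10)/10](15)
```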
Measures of dispersion
or the spread of the distribution
Measures of Dispersion
• Range
• Mean Absolute Deviation (MAD)
• Mean Squared Deviation (MSD)
• Coefficient of Variation (CV)
As we shall see, the first three are measures
of absolute dispersion, while the CV is a
measure of relative dispersion.
Range
largest value minus smallest value
Example 1
Observations: 1 2 2 2 3 4 4 5 6
The range is 6 - 1 = 5.
Example 2
Observations: 1 1 1 1 1 1 1 1 6
The range is 6 - 1 = 5.
Intuitively, this distribution seems to be less
spread out than the distribution in Example 1,
but the range doesn’t capture that.
Mean Absolute Deviation (MAD)
$$MAD = \frac{1}{N}\sum_{i=1}^{N} |x_i - \mu|$$
This formula is for the MAD for a simple list of numbers.
We’ll do the MAD for a frequency distribution shortly.
Example:
First we need the mean.
x       x − μ    |x − μ|
 4      −6       6
 8      −2       2
10       0       0
13       3       3
15       5       5
Sum 50           16
μ = 50/5 = 10
MAD = 16/5 = 3.2
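The steps of this example can be sketched as a short Python function. The name mean_absolute_deviation is ours, and the data are treated as a whole population, as in the slides.

```python
def mean_absolute_deviation(values):
    """Population MAD: the average distance of the values from their mean."""
    n = len(values)
    mu = sum(values) / n
    return sum(abs(x - mu) for x in values) / n


data = [4, 8, 10, 13, 15]
print(mean_absolute_deviation(data))   # deviations 6, 2, 0, 3, 5 sum to 16
```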
Population Variance or
Mean Squared Deviation (MSD)
$$\sigma^2 = MSD = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$
Example:
Recall μ = 10
x       x − μ    (x − μ)²
 4      −6       36
 8      −2        4
10       0        0
13       3        9
15       5       25
Sum              74
σ² = MSD = 74/5 = 14.8
Population standard deviation
$$\sigma = \sqrt{\sigma^2} = \sqrt{\text{Population Variance}}$$
Example:
population standard deviation
In the example we just did,
the population variance was 14.8.
So the standard deviation is
$$\sigma = \sqrt{\sigma^2} = \sqrt{14.8} \approx 3.847$$
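A minimal Python sketch of the variance and standard deviation for the same five numbers. The function name population_variance is ours; Python's statistics module offers pvariance and pstdev for the same purpose.

```python
import math


def population_variance(values):
    """Mean squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


data = [4, 8, 10, 13, 15]
variance = population_variance(data)   # squared deviations 36, 4, 0, 9, 25 sum to 74
std_dev = math.sqrt(variance)

print(variance, round(std_dev, 3))
```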
Calculating the MAD, MSD, & Std. Dev.
for a Frequency Distribution
The total number of observations N
is the sum of the frequencies or 10.
Calculate the population mean μ.
xi     fi     xifi
1       3      3
2       5     10
3       2      6
Sum    10     19
μ = 19/10 = 1.9
Calculate the Mean Absolute Deviation (MAD).
xi     fi     xi − μ    |xi − μ|    |xi − μ|fi
1       3     −0.9      0.9         2.7
2       5      0.1      0.1         0.5
3       2      1.1      1.1         2.2
Sum    10                           5.4
MAD = 5.4/10 = 0.54
Calculate the Mean Squared Deviation (MSD)
or Population Variance σ².
xi     fi     xi − μ    (xi − μ)²    (xi − μ)² fi
1       3     −0.9      0.81         2.43
2       5      0.1      0.01         0.05
3       2      1.1      1.21         2.42
Sum    10                            4.90
σ² = MSD = 4.90/10 = 0.49
Last, we calculate the standard deviation.
σ = √0.49 = 0.7
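The whole frequency-distribution calculation can be sketched in Python as follows; the variable names are ours.

```python
import math

xs = [1, 2, 3]      # values xi
fs = [3, 5, 2]      # frequencies fi

N = sum(fs)
mu = sum(x * f for x, f in zip(xs, fs)) / N                  # 19/10
mad = sum(abs(x - mu) * f for x, f in zip(xs, fs)) / N       # 5.4/10
msd = sum((x - mu) ** 2 * f for x, f in zip(xs, fs)) / N     # 4.90/10
sigma = math.sqrt(msd)

print(round(mu, 2), round(mad, 2), round(msd, 2), round(sigma, 2))
```

The results are rounded before printing because binary floating point does not represent values such as 1.9 exactly.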
So the formulae for calculating the MAD
and MSD (or population variance) for a frequency distribution are
$$MAD = \frac{1}{N}\sum_{i=1}^{c} |x_i - \mu| f_i$$
$$\sigma^2 = MSD = \frac{1}{N}\sum_{i=1}^{c} (x_i - \mu)^2 f_i$$
The standard deviation is still just the square root of the variance.
The formulae we have been
using are for populations.
If we have samples instead,
we have some notational
changes and one change
in the calculation process.
Notational Changes for Samples
instead of Populations
First, the sample size is n instead of N.
Next, we denote the sample mean by
“Xbar” ($\bar{X}$) instead of μ.
To calculate the sample mean for a
simple list of numbers we have:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
To calculate the sample mean for a
frequency distribution we have:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{c} x_i f_i$$
Mean Absolute Deviation (MAD)
for a sample
MAD for a simple list of numbers:
$$MAD = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{X}|$$
MAD for a frequency distribution:
$$MAD = \frac{1}{n}\sum_{i=1}^{c} |x_i - \bar{X}| f_i$$
Sample Variance
The MSD is just for the population variance.
We don’t have an MSD for the sample.
The calculation is also slightly different for
the sample variance than it was for the
population variance.
Sample Variance (denoted by s²)
Sample Variance for a simple list of numbers:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2$$
Sample Variance for a frequency distribution:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{c} (x_i - \bar{X})^2 f_i$$
The only change that is not just notational is that instead of
dividing by n, we divide by n-1.
The reason for this change is so that the sample variance will
be an unbiased estimator of the population variance. We’ll
discuss the idea of unbiasedness later in the semester.
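To see the effect of dividing by n-1, here is a small sketch that reuses the numbers 4, 8, 10, 13, 15 and treats them as a sample purely for illustration (in the slides they were a population). The function name sample_variance is ours; statistics.pvariance is the standard library's population variance.

```python
import statistics


def sample_variance(values):
    """Sample variance: squared deviations from Xbar, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


data = [4, 8, 10, 13, 15]
s2 = sample_variance(data)             # 74 / 4
sigma2 = statistics.pvariance(data)    # 74 / 5, the population formula

print(s2, sigma2)
```

The sample variance is always a little larger than the population formula applied to the same numbers, and the gap shrinks as n grows.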
Coefficient of Variation (CV)
Our other measures of the spread of the
distribution were measures of absolute
dispersion.
The CV is calculated relative to the mean,
so it is considered a relative measure of
dispersion.
Coefficient of Variation (CV)
$$CV = (s / \bar{X}) \times 100\%$$
The CV is simply the standard deviation
divided by the mean multiplied by 100 to
put it in percentage terms.
Example: Suppose you have data for a sample that
has a mean of 200 and a standard deviation of 10.
What is the coefficient of variation (CV)?
$$CV = (s / \bar{X}) \times 100\%$$
$$= (10 / 200) \times 100\%$$
$$= (0.05) \times 100\%$$
$$= 5\%$$
This result tells us that the standard deviation
is 5% as large as the mean.
At my web page,
there is a summary sheet called
“Selected Descriptive Statistics.”
It shows some of the different formulae for
simple lists of numbers and frequency
distributions, for populations and samples.
Print it out and look at how the formulae are
similar and how they’re different.
Empirical Rule
In most data sets, many of the values tend to
cluster near the center of the distribution. In
symmetric, bell-shaped distributions,
approximately 68% of the values are within 1
standard deviation of the mean;
approximately 95% of the values are within 2
standard deviations of the mean; and
approximately 99.7% of the values are within 3
standard deviations of the mean.
Thus, values that are more than 3 standard
deviations from the mean are very atypical
and are often called “outliers.”
Determining whether an observation is an outlier is
equivalent to determining if its “z-score” is less
than -3 or greater than +3.
The formula for the z-score is:
$$Z\text{-score} = \frac{x - \bar{X}}{s}$$
Example
Suppose that a particular sample has a mean of
100, and a standard deviation of 10.
Would a value of 120 be considered an outlier?
$$Z\text{-score} = \frac{x - \bar{X}}{s} = \frac{120 - 100}{10} = 2$$
Since the Z-score is not less than -3 or greater than
+3, 120 is not an outlier in this sample.
Example
Is 150 an outlier in this sample (with mean 100 and
standard deviation 10)?
$$Z\text{-score} = \frac{x - \bar{X}}{s} = \frac{150 - 100}{10} = 5$$
Since the Z-score is greater than +3,
150 is an outlier in this sample.
Example
Is 60 an outlier in this sample (with mean 100 and
standard deviation 10)?
$$Z\text{-score} = \frac{x - \bar{X}}{s} = \frac{60 - 100}{10} = -4$$
Since the Z-score is less than -3,
60 is an outlier in this sample.
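The three examples can be checked with a short sketch. The function names z_score and is_outlier are ours, and the cutoff of 3 standard deviations is the rule of thumb used above.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def is_outlier(x, mean, sd, threshold=3):
    """True when the z-score is below -threshold or above +threshold."""
    return abs(z_score(x, mean, sd)) > threshold


for value in (120, 150, 60):
    print(value, z_score(value, 100, 10), is_outlier(value, 100, 10))
```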
Symmetric versus Skewed
Distributions
Symmetric Distribution
For a symmetric distribution,
the left and right sides are
mirror images of each other.
If the distribution also has a single peak,
the mean, median, and mode are the same.
Positively or Right-Skewed Distribution
If the longer tail is to the right,
the distribution is positively
skewed or skewed to the right.
A small number of very large
values pulls the mean up, so
the mean is larger than the
median.
Negatively or Left-Skewed Distribution
If the longer tail is to the left,
the distribution is negatively
skewed or skewed to the left.
A small number of very small
values pulls the mean down,
so the mean is smaller than
the median.