Summarizing Data
Graphical Methods
Histogram
[Figure: histogram with frequency (0 to 8) on the vertical axis and the classes 70 to 80, 80 to 90, 90 to 100, 100 to 110, 110 to 120, 120 to 130 on the horizontal axis]
Stem-Leaf Diagram
 8 | 024669
 9 | 04455699
10 | 224559
11 | 189
12 |
Grouped Freq Table
Class         Verbal IQ   Math IQ
70 to 80          1          1
80 to 90          6          2
90 to 100         7         11
100 to 110        6          4
110 to 120        3          4
120 to 130        0          1
Numerical Measures
• Measures of Central Tendency (Location)
• Measures of Non Central Location
• Measure of Variability (Dispersion, Spread)
• Measures of Shape
The objective is to reduce the data to a small
number of values that completely describe the
data and certain aspects of the data.
Summation Notation
$$\sum_{i=m}^{n} (\text{expression in } i)$$
• i: the quantity changing in each term of the sum
• m: the starting value for i
• n: the final value for i
• expression in i: each term of the sum
Example
Let x1, x2, x3, x4, x5 denote the set of 5 numbers in the following table.
i     1    2    3    4    5
xi   10   15   21    7   13
Then the symbol
$$\sum_{i=2}^{4} x_i^3$$
denotes the sum of these 3 numbers:
$$x_2^3 + x_3^3 + x_4^3 = 15^3 + 21^3 + 7^3 = 3375 + 9261 + 343 = 12979$$
Then the symbol
$$\sum_{i=1}^{5} x_i$$
denotes the sum of these 5 numbers:
$$x_1 + x_2 + x_3 + x_4 + x_5 = 10 + 15 + 21 + 7 + 13 = 66$$
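The two sums above can be checked with a short Python sketch. The helper name `sigma` and its arguments are illustrative choices, not part of the lecture; it uses 1-based indices so that it reads like the summation symbol.

```python
# The five numbers x_1, ..., x_5 from the table above.
x = [10, 15, 21, 7, 13]

def sigma(values, start, stop, term=lambda v: v):
    """Sum term(x_i) for i = start, ..., stop (1-based, both ends included)."""
    return sum(term(values[i - 1]) for i in range(start, stop + 1))

sum_of_cubes = sigma(x, 2, 4, term=lambda v: v ** 3)  # x_2^3 + x_3^3 + x_4^3
total = sigma(x, 1, 5)                                # x_1 + ... + x_5
print(sum_of_cubes, total)
```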
Measures of Central Location
(Mean)
Mean
Let x1, x2, x3, … xn denote a set of n numbers.
Then the mean of the n numbers is defined as:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n}{n}$$
Example
Again let x1, x2, x3, x4, x5 denote the set of 5 numbers in the following table.
i     1    2    3    4    5
xi   10   15   21    7   13
Then the mean of the 5 numbers is:
$$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{10 + 15 + 21 + 7 + 13}{5} = \frac{66}{5} = 13.2$$
Interpretation of the Mean
Let x1, x2, x3, … xn denote a set of n numbers.
Then the mean, $\bar{x}$, is the centre of gravity of those n numbers.
That is, if we drew a horizontal line and placed a weight of one at each value of xi, then the balancing point of that system of masses is at the point $\bar{x}$.
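[Figure: horizontal line with unit weights at x1, x2, x3, x4, ..., xn, balancing at $\bar{x}$]
In the Example
[Figure: horizontal line (ticks at 0, 10, 20) with unit weights at 7, 10, 13, 15 and 21, balancing at $\bar{x}$ = 13.2]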
The mean, $\bar{x}$, is also approximately the center of gravity of a histogram.
[Figure: histogram with frequency (0 to 30) on the vertical axis and classes 60 - 70 through 140 - 150 on the horizontal axis, with $\bar{x}$ marked as the balancing point]
The Median
Let x1, x2, x3, … xn denote a set of n numbers.
Then the median of the n numbers is defined
as the number that splits the numbers into two
equal parts.
To evaluate the median we arrange the
numbers in increasing order.
If the number of observations is odd there will
be one observation in the middle.
This number is the median.
If the number of observations is even there
will be two middle observations.
The median is the average of these two
observations
Example
Again let x1, x2, x3, x4, x5 denote the set of 5 numbers in the following table.
i     1    2    3    4    5
xi   10   15   21    7   13
The numbers arranged in order are:
7 10 13 15 21
There is a unique "middle" observation, 13. This is the median.
Example 2
Let x1, x2, x3, x4, x5, x6 denote the 6 numbers:
23 41 12 19 64 8
Arranged in increasing order these observations would be:
8 12 19 23 41 64
There are two "middle" observations, 19 and 23.
Median = average of the two "middle" observations =
$$\frac{19 + 23}{2} = \frac{42}{2} = 21$$
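The rule for the median (sort, then take the middle observation or the average of the two middle ones) can be sketched in Python as follows. The function names are illustrative; Python's standard `statistics.median` follows the same rule.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: one middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two

example_1 = [10, 15, 21, 7, 13]
example_2 = [23, 41, 12, 19, 64, 8]
print(mean(example_1), median(example_1), median(example_2))
```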
Example
The data on N = 23 students
Variables
• Verbal IQ
• Math IQ
• Initial Reading Achievement Score
• Final Reading Achievement Score
Data Set #3
The following table gives data on Verbal IQ, Math IQ, Initial Reading Achievement Score, and Final Reading Achievement Score for 23 students who have recently completed a reading improvement program.
Student   Verbal IQ   Math IQ   Initial Reading Achievement   Final Reading Achievement
1            86          94              1.1                           1.7
2           104         103              1.5                           1.7
3            86          92              1.5                           1.9
4           105         100              2.0                           2.0
5           118         115              1.9                           3.5
6            96         102              1.4                           2.4
7            90          87              1.5                           1.8
8            95         100              1.4                           2.0
9           105          96              1.7                           1.7
10           84          80              1.6                           1.7
11           94          87              1.6                           1.7
12          119         116              1.7                           3.1
13           82          91              1.2                           1.8
14           80          93              1.0                           1.7
15          109         124              1.8                           2.5
16          111         119              1.4                           3.0
17           89          94              1.6                           1.8
18           99         117              1.6                           2.6
19           94          93              1.4                           1.4
20           99         110              1.4                           2.0
21           95          97              1.5                           1.3
22          102         104              1.7                           3.1
23          102          93              1.6                           1.9
Total      2244        2307             35.1                          48.3
Means
Verbal IQ   Math IQ   Initial Reading Achievement   Final Reading Achievement
97.57       100.30    1.526                         2.100
Computing the Median
Stem leaf Diagrams
Median = middle observation = 12th observation (of the n = 23 ordered observations)
Summary
                               Mean      Median
Verbal IQ                      97.57     96
Math IQ                        100.30    97
Initial Reading Achievement    1.526     1.5
Final Reading Achievement      2.100     1.9
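As a check on the summary values for the two IQ variables, the following sketch recomputes them with Python's standard `statistics` module. The lists are the Verbal IQ and Math IQ columns of Data Set #3 in student order; the variable names are illustrative.

```python
from statistics import mean, median

# Verbal IQ and Math IQ for students 1 to 23 (Data Set #3).
verbal_iq = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
             82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]
math_iq = [94, 103, 92, 100, 115, 102, 87, 100, 96, 80, 87, 116,
           91, 93, 124, 119, 94, 117, 93, 110, 97, 104, 93]

for name, scores in [("Verbal IQ", verbal_iq), ("Math IQ", math_iq)]:
    print(f"{name}: total = {sum(scores)}, "
          f"mean = {mean(scores):.2f}, median = {median(scores)}")
```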
Some Comments
• The mean is the centre of gravity of a set of
observations. The balancing point.
• The median splits the observations equally
in two parts of approximately 50%
• The median splits the area under a
histogram in two parts of 50%
• The mean is the balancing point of a
histogram
[Figure: distribution curve over 0 to 25 with the median splitting the area into two parts of 50% each, and the mean $\bar{x}$ marked as the balancing point]
• For symmetric distributions the mean and
the median will be approximately the same
value
[Figure: symmetric distribution curve over 0 to 25; the median and $\bar{x}$ coincide, with 50% of the area on each side]
• For Positively skewed distributions the
mean exceeds the median
• For Negatively skewed distributions the
median exceeds the mean
[Figure: skewed distribution curve over 0 to 25 with the median and $\bar{x}$ marked at different points; 50% of the area lies on each side of the median]
• An outlier is a “wild” observation in the
data
• Outliers occur because
– of errors (typographical and computational)
– Extreme cases in the population
• The mean is altered to a significant degree
by the presence of outliers
• Outliers have little effect on the value of the
median
• This is a reason for using the median in
place of the mean as a measure of central
location
• Alternatively the mean is the best measure
of central location when the data is
Normally distributed (Bell-shaped)
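A small sketch can illustrate how differently the mean and the median react to an outlier. It reuses the five numbers from the earlier examples; the "typing error" that turns 21 into 210 is a hypothetical value made up for this illustration and is not part of the lecture data.

```python
from statistics import mean, median

original = [10, 15, 21, 7, 13]
# Hypothetical typing error: 21 entered as 210 (illustration only).
with_outlier = [10, 15, 210, 7, 13]

print("original:    ", mean(original), median(original))
print("with outlier:", mean(with_outlier), median(with_outlier))
```

The mean moves from 13.2 to 51, while the median stays at 13.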
Measures of Non-Central Location
• Percentiles
• Quartiles (Hinges, Mid-hinges)
Definition
The P×100 Percentile is a point, xP, underneath a distribution that has a fixed proportion P of the population (or sample) below that value.
[Figure: distribution curve over 0 to 25 with an area of P×100 % to the left of the point xP]
Definition (Quartiles)
The first Quartile, Q1, is the 25th Percentile, x0.25
[Figure: distribution curve over 0 to 25 with 25 % of the area to the left of x0.25]
The second Quartile, Q2, is the 50th Percentile, x0.50
[Figure: distribution curve over 0 to 25 with 50 % of the area to the left of x0.50]
• The second Quartile , Q2 , is also the
median and the 50th percentile
The third Quartile, Q3, is the 75th Percentile, x0.75
[Figure: distribution curve over 0 to 25 with 75 % of the area to the left of x0.75]
The Quartiles Q1, Q2, Q3 divide the population into 4 equal parts of 25%.
[Figure: distribution curve over 0 to 25 cut at Q1, Q2 and Q3 into four areas of 25 % each]
Computing Percentiles and
Quartiles – Method 1
• The first step is to order the observations in
increasing order.
• We then compute the position, k, of the P×100 Percentile:
k = P × (n+1)
where n = the number of observations
Example
The data on n = 23 students
Variables
• Verbal IQ
• Math IQ
• Initial Reading Achievement Score
• Final Reading Achievement Score
We want to compute the 75th percentile and
the 90th percentile
The position, k, of the 75th Percentile.
k = P × (n+1) = .75 × (23+1) = 18
The position, k, of the 90th Percentile.
k = P × (n+1) = .90 × (23+1) = 21.6
When the position k is an integer the percentile is the kth observation (in order of magnitude) in the data set.
For example the 75th percentile is the 18th (in size) observation.
When the position k is not an integer but an integer (m) + a fraction (f), i.e.
k = m + f
then the percentile is
xP = (1-f) × (mth observation in size) + f × ((m+1)st observation in size)
In the example the position of the 90th percentile is:
k = 21.6
Then
x.90 = 0.4 × (21st observation in size) + 0.6 × (22nd observation in size)
When the position k is not an integer but an integer (m) + a fraction (f), i.e. k = m + f, the percentile is
$$x_P = (1-f)\,(m\text{th obs}) + f\,[(m+1)\text{st obs}]$$
so that
$$\frac{x_P - m\text{th obs}}{(m+1)\text{st obs} - m\text{th obs}} = \frac{(1-f)\,(m\text{th obs}) + f\,[(m+1)\text{st obs}] - m\text{th obs}}{(m+1)\text{st obs} - m\text{th obs}} = \frac{f\,[(m+1)\text{st obs}] - f\,(m\text{th obs})}{(m+1)\text{st obs} - m\text{th obs}} = f$$
Thus the position of xP is 100f% through the interval between the mth observation and the (m+1)st observation.
Example
The data Verbal IQ on n = 23 students
arranged in increasing order is:
80 82 84 86 86 89 90 94
94 95 95 96 99 99 102 102
104 105 105 109 111 118 119
x0.75 = 75th percentile = 18th observation in size = 105
(position k = 18)
x0.90 = 90th percentile
= 0.4 × (21st observation in size) + 0.6 × (22nd observation in size)
= 0.4(111) + 0.6(118) = 44.4 + 70.8 = 115.2
(position k = 21.6)
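Method 1 can be sketched in Python as below. The function name `percentile_method1` is illustrative. The lecture does not say what to do when the position k falls below 1 or above n, so clamping to the smallest or largest observation is an assumption of this sketch.

```python
import math

def percentile_method1(values, p):
    """Percentile by Method 1: position k = p * (n + 1), interpolating if needed."""
    ordered = sorted(values)
    n = len(ordered)
    k = p * (n + 1)
    m = math.floor(k)  # integer part of the position
    f = k - m          # fractional part of the position
    # Assumption (not covered in the lecture): clamp positions outside 1..n.
    if m < 1:
        return ordered[0]
    if m >= n:
        return ordered[-1]
    if f == 0:
        return ordered[m - 1]  # k is an integer: the kth observation
    # (1 - f) * (mth observation) + f * ((m+1)st observation)
    return (1 - f) * ordered[m - 1] + f * ordered[m]

verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96, 99, 99,
             102, 102, 104, 105, 105, 109, 111, 118, 119]
print(percentile_method1(verbal_iq, 0.75))
print(round(percentile_method1(verbal_iq, 0.90), 1))
```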
An Alternative method for
computing Quartiles – Method 2
• Sometimes this method will result in the same values for the quartiles as Method 1.
• Sometimes this method will result in different values for the quartiles.
• For large samples the two methods will result in approximately the same answer.
Let x1, x2, x3, … xn denote a set of n numbers.
The first step in Method 2 is to arrange the
numbers in increasing order.
From the arranged numbers we compute the
median.
This is also called the Hinge
Example
Consider the 5 numbers:
10 15 21 7 13
Arranged in increasing order:
7 10 13 15 21
The median (Hinge) is 13.
The median (or Hinge) splits the observations
in half
The lower mid-hinge (the first quartile) is the
“median” of the lower half of the observations
(excluding the median).
The upper mid-hinge (the third quartile) is the
“median” of the upper half of the observations
(excluding the median).
Consider the five numbers in increasing order:
Lower Half: 7 10
Median (Hinge): 13
Upper Half: 15 21
Lower Mid-Hinge (First Quartile) = (7+10)/2 = 8.5
Upper Mid-Hinge (Third Quartile) = (15+21)/2 = 18
Computing the median and the quartile using
the first method:
Position of the median: k = 0.5(5+1) = 3
Position of the first Quartile: k = 0.25(5+1) = 1.5
Position of the third Quartile: k = 0.75(5+1) = 4.5
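7 10 13 15 21
Q1 = 8.5 (halfway between the 1st and 2nd observations, 7 and 10)
Q2 = 13 (the 3rd observation)
Q3 = 18 (halfway between the 4th and 5th observations, 15 and 21)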
• Both methods result in the same value
• This is not always true.
Example
The data Verbal IQ on n = 23 students
arranged in increasing order is:
80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119
Lower Mid-Hinge (First Quartile) = 89
Median (Hinge) = 96
Upper Mid-Hinge (Third Quartile) = 105
Computing the median and the quartile using
the first method:
Position of the median: k = 0.5(23+1) = 12
Position of the first Quartile: k = 0.25(23+1) = 6
Position of the third Quartile: k = 0.75(23+1) = 18
80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119
Q1 = 89
Q2 = 96
Q3 = 105
• Many programs compute percentiles,
quartiles etc.
• Each may use different methods.
• It is important to know which method is
being used.
• The different methods result in answers that
are close when the sample size is large.
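Method 2 (hinge and mid-hinges) can be sketched as follows. The function name is illustrative. For an odd number of observations the median itself is left out of both halves, as described above; for an even number the data simply split into two halves, which is an assumption here since the lecture only shows odd-sized examples.

```python
from statistics import median

def quartiles_method2(values):
    """Quartiles by Method 2: the hinge (median) and the medians of each half."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # observations below the median position
    upper_half = ordered[(n + 1) // 2:]   # observations above the median position
    return median(lower_half), median(ordered), median(upper_half)

five_numbers = [10, 15, 21, 7, 13]
verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96, 99, 99,
             102, 102, 104, 105, 105, 109, 111, 118, 119]
print(quartiles_method2(five_numbers))
print(quartiles_method2(verbal_iq))
```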
Box-Plots
Box-Whisker Plots
• A graphical method of displaying data
• An alternative to the histogram and stem-leaf diagram
To Draw a Box Plot
• Compute the Hinge (Median, Q2) and the Mid-hinges (first & third quartiles, Q1 and Q3)
• We also compute the largest and smallest of the observations: the max and the min.
Example
The data Verbal IQ on n = 23 students
arranged in increasing order is:
80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119
min = 80
Q1 = 89
Q2 = 96
Q3 = 105
max = 119
The Box Plot is then drawn
• Drawing above an axis a "box" from Q1 to Q3.
• Drawing a vertical line in the box at the median, Q2.
• Drawing whiskers at the lower and upper ends of the box going down to the min and up to the max.
[Figure: box plot with its parts labelled: min, Lower Whisker, Box from Q1 to Q3 with a line at Q2, Upper Whisker, max]
Example
The data Verbal IQ on n = 23 students gives:
min = 80
Q1 = 89
Q2 = 96
Q3 = 105
max = 119
[Figure: Box Plot of Verbal IQ, drawn horizontally above an axis running from 70 to 130]
Box Plot can also be drawn vertically
[Figure: the same box plot drawn vertically beside an axis running from 70 to 130]
[Figure: Box-Whisker plots (Verbal IQ, Math IQ)]
[Figure: Box-Whisker plots (Initial RA, Final RA)]
Summary
Information contained in the box plot
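[Figure: box plot in which each whisker covers 25% of the population and each part of the box on either side of the median covers 25%, so that the box holds the middle 50% of the population]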
Measures of Variability
[Figure: distribution curves over 0 to 25 illustrating variability]
Measure of Variability
(Dispersion, Spread)
• Variance, standard deviation
• Range
• Inter-Quartile Range
• Pseudo-standard deviation
Range
Definition
Let min = the smallest observation
Let max = the largest observation
Then Range =max - min
[Figure: distribution curve over 0 to 25 with the Range marked from the smallest to the largest observation]
Inter-Quartile Range (IQR)
Definition
Let Q1 = the first quartile,
Q3 = the third quartile
Then the
Inter-Quartile Range
= IQR = Q3 - Q1
[Figure: Inter-Quartile Range on a distribution curve over 0 to 25; 25% of the area lies below Q1, 50% between Q1 and Q3, and 25% above Q3]
Example
The data Verbal IQ on n = 23 students
arranged in increasing order is:
80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119
min = 80
Q1 = 89
Q2 = 96
Q3 = 105
max = 119
Range
Range = max – min = 119 – 80 = 39
Inter-Quartile Range
= IQR = Q3 - Q1 = 105 – 89 = 16
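The two calculations above are simple enough to check in a few lines. This sketch takes the quartiles Q1 = 89 and Q3 = 105 as given from the earlier example rather than recomputing them; the variable names are illustrative.

```python
verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96, 99, 99,
             102, 102, 104, 105, 105, 109, 111, 118, 119]

minimum, maximum = min(verbal_iq), max(verbal_iq)
q1, q3 = 89, 105  # quartiles from the example above

data_range = maximum - minimum  # Range = max - min
iqr = q3 - q1                   # IQR = Q3 - Q1
print(f"Range = {data_range}, IQR = {iqr}")
```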
Some Comments
• Range and Inter-quartile range are relatively
easy to compute.
• Range is slightly easier to compute than the Inter-quartile range.
• Range is very sensitive to outliers (extreme
observations)
Sample Variance
Let x1, x2, x3, … xn denote a set of n numbers.
Recall the mean of the n numbers is defined
as:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n}{n}$$
The numbers
$$d_1 = x_1 - \bar{x},\quad d_2 = x_2 - \bar{x},\quad d_3 = x_3 - \bar{x},\quad \ldots,\quad d_n = x_n - \bar{x}$$
are called deviations from the mean.
The sum
$$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
is called the sum of squares of deviations from the mean.
Writing it out in full:
$$d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2$$
or
$$(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2$$
The Sample Variance
Is defined as the quantity:
$$\frac{\sum_{i=1}^{n} d_i^2}{n-1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
and is denoted by the symbol $s^2$.
Example
Let x1, x2, x3, x4, x5 denote the set of 5 numbers in the following table.
i     1    2    3    4    5
xi   10   15   21    7   13
Then
$$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 10 + 15 + 21 + 7 + 13 = 66$$
and
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n}{n} = \frac{66}{5} = 13.2$$
The deviations from the mean d1, d2, d3, d4, d5
are given in the following table.
i      1      2      3      4      5
xi    10     15     21      7     13
di   -3.2    1.8    7.8   -6.2   -0.2
The sum
$$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 = (-3.2)^2 + (1.8)^2 + (7.8)^2 + (-6.2)^2 + (-0.2)^2$$
$$= 10.24 + 3.24 + 60.84 + 38.44 + 0.04 = 112.80$$
and
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{112.8}{4} = 28.2$$
The Sample Standard Deviation s
Definition: The Sample Standard Deviation is
defined by:
$$s = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
Hence the Sample Standard Deviation, s, is the
square root of the sample variance.
In the last example
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{112.8}{4}} = \sqrt{28.2} = 5.31$$
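The worked example for the sample variance and standard deviation can be reproduced step by step. This is a minimal sketch that follows the definitions above directly (deviations, squared deviations, division by n - 1); the variable names are illustrative.

```python
import math

x = [10, 15, 21, 7, 13]
n = len(x)

x_bar = sum(x) / n                         # mean
deviations = [xi - x_bar for xi in x]      # d_i = x_i - x_bar
sum_sq = sum(d ** 2 for d in deviations)   # sum of squares of deviations
variance = sum_sq / (n - 1)                # sample variance s^2
std_dev = math.sqrt(variance)              # sample standard deviation s

print([round(d, 1) for d in deviations])
print(round(sum_sq, 2), round(variance, 2), round(std_dev, 2))
```

Python's standard `statistics.variance` and `statistics.stdev` use the same n - 1 divisor.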
Interpretations of s
• In Normal distributions
– Approximately 2/3 of the observations will lie
within one standard deviation of the mean
– Approximately 95% of the observations lie
within two standard deviations of the mean
– In a histogram of the Normal distribution, the
standard deviation is approximately the
distance from the mode to the inflection point
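[Figure: Normal curve over 0 to 25 with the Mode and the Inflection point marked; the horizontal distance between them is s. The region within s on either side of the centre is labelled 2/3, and the distance 2s is also marked]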
Example
A researcher collected data on 1500 males
aged 60-65.
The variables measured were cholesterol and blood pressure.
– The mean blood pressure was 155 with a
standard deviation of 12.
– The mean cholesterol level was 230 with a
standard deviation of 15
– In both cases the data was normally distributed
Interpretation of these numbers
• Blood pressure levels vary about the value
155 in males aged 60-65.
• Cholesterol levels vary about the value 230
in males aged 60-65.
• About 2/3 of males aged 60-65 have blood pressure within 12 of 155, i.e. between 155-12 = 143 and 155+12 = 167.
• About 2/3 of males aged 60-65 have Cholesterol within 15 of 230, i.e. between 230-15 = 215 and 230+15 = 245.
• About 95% of males aged 60-65 have blood pressure within 2(12) = 24 of 155, i.e. between 155-24 = 131 and 155+24 = 179.
• About 95% of males aged 60-65 have Cholesterol within 2(15) = 30 of 230, i.e. between 230-30 = 200 and 230+30 = 260.
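The intervals in this example are just mean ± s and mean ± 2s. The sketch below recomputes them and, as a side check, uses `statistics.NormalDist` from the Python standard library to show the exact Normal proportions behind the "2/3" and "95%" rules of thumb. The helper name `interval` is illustrative.

```python
from statistics import NormalDist

def interval(mean, sd, k):
    """Interval of values within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)

for name, mean, sd in [("Blood pressure", 155, 12), ("Cholesterol", 230, 15)]:
    print(name, interval(mean, sd, 1), interval(mean, sd, 2))

# Proportion of a Normal distribution within 1 and 2 standard deviations.
z = NormalDist()
within_1 = z.cdf(1) - z.cdf(-1)
within_2 = z.cdf(2) - z.cdf(-2)
print(round(within_1, 4), round(within_2, 4))
```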
A Computing formula for:
Sum of squares of deviations from the mean:
$$\sum_{i=1}^{n} (x_i - \bar{x})^2$$
The difficulty with this formula is that $\bar{x}$ will have many decimals.
The result will be that each term in the above sum will also have many decimals.
The sum of squares of deviations from the mean can also be computed using the following identity:
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
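The identity follows from expanding the square and substituting $\bar{x} = \frac{1}{n}\sum x_i$. The slides state it without proof; a short derivation in the same notation is sketched here (all sums run over i = 1, ..., n):

```latex
\begin{align*}
\sum (x_i - \bar{x})^2
  &= \sum \left( x_i^2 - 2\bar{x}\,x_i + \bar{x}^2 \right) \\
  &= \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 \\
  &= \sum x_i^2 - \frac{2\left(\sum x_i\right)^2}{n} + \frac{\left(\sum x_i\right)^2}{n} \\
  &= \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}
\end{align*}
```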
To use this identity we need to compute:
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n \quad\text{and}\quad \sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + \cdots + x_n^2$$
Then:
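$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
and
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$$
and
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}}$$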
Example
The data Verbal IQ on n = 23 students
arranged in increasing order is:
80 82 84 86 86 89 90 94
94 95 95 96 99 99 102 102
104 105 105 109 111 118 119
$$\sum_{i=1}^{n} x_i = 80 + 82 + 84 + 86 + 86 + 89 + 90 + 94 + 94 + 95 + 95 + 96 + 99 + 99 + 102 + 102 + 104 + 105 + 105 + 109 + 111 + 118 + 119 = 2244$$
$$\sum_{i=1}^{n} x_i^2 = 80^2 + 82^2 + 84^2 + 86^2 + 86^2 + 89^2 + 90^2 + 94^2 + 94^2 + 95^2 + 95^2 + 96^2 + 99^2 + 99^2 + 102^2 + 102^2 + 104^2 + 105^2 + 105^2 + 109^2 + 111^2 + 118^2 + 119^2 = 221494$$
Then:
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 221494 - \frac{(2244)^2}{23} = 2557.652$$
and
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{221494 - \frac{(2244)^2}{23}}{22} = \frac{2557.652}{22} = 116.26$$
Also
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{221494 - \frac{(2244)^2}{23}}{22}} = \sqrt{\frac{2557.652}{22}} = \sqrt{116.26} = 10.782$$
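The computing formula can be checked against the definition with a few lines of Python. This sketch computes the sum of squares both ways for the Verbal IQ data; the variable names are illustrative.

```python
import math

verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96, 99, 99,
             102, 102, 104, 105, 105, 109, 111, 118, 119]
n = len(verbal_iq)

sum_x = sum(verbal_iq)                   # sum of x_i
sum_x2 = sum(v * v for v in verbal_iq)   # sum of x_i squared

# Computing formula: sum of x_i^2 minus (sum of x_i)^2 / n
ss_formula = sum_x2 - sum_x ** 2 / n

# Definition: sum of squared deviations from the mean
x_bar = sum_x / n
ss_definition = sum((v - x_bar) ** 2 for v in verbal_iq)

s2 = ss_formula / (n - 1)
s = math.sqrt(s2)
print(sum_x, sum_x2, round(ss_formula, 3), round(s2, 2), round(s, 3))
```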
A quick (rough) calculation of s
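$$s \approx \frac{\text{Range}}{4}$$
The reason for this is that approximately all (95%) of the observations are between $\bar{x} - 2s$ and $\bar{x} + 2s$.
Thus $\max \approx \bar{x} + 2s$ and $\min \approx \bar{x} - 2s$,
and $\text{Range} = \max - \min \approx (\bar{x} + 2s) - (\bar{x} - 2s) = 4s$.
Hence
$$s \approx \frac{\text{Range}}{4}$$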
Example
Verbal IQ on n = 23 students
min = 80 and max = 119
$$s \approx \frac{119 - 80}{4} = \frac{39}{4} = 9.75$$
This compares with the exact value of s
which is 10.782.
The rough method is useful for checking
your calculation of s.
The Pseudo Standard Deviation (PSD)
Definition: The Pseudo Standard Deviation
(PSD) is defined by:
$$\text{PSD} = \frac{\text{IQR}}{1.35} = \frac{\text{Inter-Quartile Range}}{1.35}$$
Properties
• For Normal distributions the magnitude of the
pseudo standard deviation (PSD) and the standard
deviation (s) will be approximately the same value
• For leptokurtic distributions the standard deviation
(s) will be larger than the pseudo standard
deviation (PSD)
• For platykurtic distributions the standard deviation
(s) will be smaller than the pseudo standard
deviation (PSD)
Example
Verbal IQ on n = 23 students
Inter-Quartile Range
= IQR = Q3 - Q1 = 105 – 89 = 16
Pseudo standard deviation
$$\text{PSD} = \frac{\text{IQR}}{1.35} = \frac{16}{1.35} = 11.85$$
This compares with the standard deviation s = 10.782
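The two quick estimates of spread for the Verbal IQ data (Range/4 and the PSD) can be compared with s in a short sketch. The slides do not explain the divisor 1.35; the last lines show, using `statistics.NormalDist`, that the inter-quartile range of a Normal distribution is about 1.35 standard deviations, which is the usual justification. Variable names are illustrative.

```python
from statistics import NormalDist

minimum, maximum = 80, 119   # Verbal IQ extremes
q1, q3 = 89, 105             # Verbal IQ quartiles
s = 10.782                   # exact sample standard deviation from above

rough_s = (maximum - minimum) / 4   # Range / 4
psd = (q3 - q1) / 1.35              # IQR / 1.35
print(rough_s, round(psd, 2), s)

# IQR of a standard Normal distribution, in units of its standard deviation.
z = NormalDist()
normal_iqr = z.inv_cdf(0.75) - z.inv_cdf(0.25)
print(round(normal_iqr, 3))
```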
• An outlier is a “wild” observation in the
data
• Outliers occur because
– of errors (typographical and computational)
– Extreme cases in the population
• We will now consider the drawing of box-plots where outliers are identified
To Draw a Box Plot we need to:
• Compute the Hinge (Median, Q2) and the
Mid-hinges (first & third quartiles – Q1
and Q3 )
• To identify outliers we will compute the
inner and outer fences
Lower outer fence
F1 = Q1 - (3)IQR
Lower inner fence
f1 = Q1 - (1.5)IQR
Upper inner fence
f2 = Q3 + (1.5)IQR
Upper outer fence
F2 = Q3 + (3)IQR
• Observations that are between the lower and upper inner fences are considered to be non-outliers.
• Observations that are outside the inner
fences but not outside the outer fences are
considered to be mild outliers.
• Observations that are outside outer fences
are considered to be extreme outliers.
• mild outliers are plotted individually in a box-plot using one symbol
• extreme outliers are plotted individually in a box-plot using a different symbol
• non-outliers are represented with the box
and whiskers with
– Max = largest observation within the fences
– Min = smallest observation within the fences
[Figure: Box-Whisker plot representing the data that are not outliers, with the inner fences and an outer fence marked, mild outliers plotted between them and an extreme outlier plotted beyond the outer fence]
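The fence rules can be sketched in Python as follows. The function names are illustrative, and treating a value that lies exactly on a fence as being inside it is an assumption, since the slides do not address that case. The two test values 140 and 160 are hypothetical and are not part of the Verbal IQ data.

```python
def fences(q1, q3):
    """Inner fences (f1, f2) and outer fences (F1, F2) from the quartiles."""
    iqr = q3 - q1
    return {"F1": q1 - 3 * iqr, "f1": q1 - 1.5 * iqr,
            "f2": q3 + 1.5 * iqr, "F2": q3 + 3 * iqr}

def classify(value, q1, q3):
    """Label a value as a non-outlier, mild outlier or extreme outlier."""
    f = fences(q1, q3)
    # Assumption: a value exactly on a fence counts as inside it.
    if value < f["F1"] or value > f["F2"]:
        return "extreme outlier"
    if value < f["f1"] or value > f["f2"]:
        return "mild outlier"
    return "non-outlier"

verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96, 99, 99,
             102, 102, 104, 105, 105, 109, 111, 118, 119]
print(fences(89, 105))
print({classify(v, 89, 105) for v in verbal_iq})
print(classify(140, 89, 105), "/", classify(160, 89, 105))  # hypothetical values
```

With Q1 = 89 and Q3 = 105 every Verbal IQ score falls inside the inner fences, so that box plot has no individually plotted points.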
Example – Illustrating techniques
In this example we are looking at the weight
gains (grams) for rats under six diets differing
in level of protein (High or Low) and source
of protein (Beef, Cereal, or Pork).
– Ten test animals for each diet
Table
Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)
Level        High Protein                 Low Protein
Source       Beef     Cereal   Pork       Beef     Cereal   Pork
Diet         1        2        3          4        5        6
             73       98       94         90       107      49
             102      74       79         76       95       82
             118      56       96         90       97       73
             104      111      98         64       80       86
             81       95       102        86       98       81
             107      88       102        51       74       97
             100      82       108        72       74       106
             87       77       91         90       67       70
             117      86       120        95       89       61
             111      92       105        78       58       82
Median       103.0    87.0     100.0      82.0     84.5     81.5
Mean         100.0    85.9     99.5       79.2     83.9     78.7
IQR          24.0     18.0     11.0       18.0     23.0     16.0
PSD          17.78    13.33    8.15       13.33    17.04    11.85
Variance     229.11   225.66   119.17     192.84   246.77   273.79
Std. Dev.    15.14    15.02    10.92      13.89    15.71    16.55
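The summary rows of the table can be recomputed from the raw weight gains. The sketch below uses Method 2 (medians of the lower and upper halves) for the quartiles, which reproduces the tabulated IQR values, and PSD = IQR/1.35 as defined earlier. The function name `summarize` and the dictionary layout are illustrative.

```python
from statistics import mean, median, variance

# Weight gains (grams) by diet: 1-3 are high protein, 4-6 low protein.
diets = {
    1: [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],
    2: [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],
    3: [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],
    4: [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],
    5: [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],
    6: [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],
}

def summarize(gains):
    """Median, mean, IQR (Method 2), PSD, sample variance and std. dev."""
    ordered = sorted(gains)
    n = len(ordered)
    q1 = median(ordered[: n // 2])        # lower mid-hinge
    q3 = median(ordered[(n + 1) // 2:])   # upper mid-hinge
    iqr = q3 - q1
    var = variance(ordered)               # divides by n - 1
    return {"median": median(ordered), "mean": mean(ordered), "iqr": iqr,
            "psd": iqr / 1.35, "variance": var, "std_dev": var ** 0.5}

summary = {diet: summarize(gains) for diet, gains in diets.items()}
for diet, stats in summary.items():
    print(diet, {name: round(value, 2) for name, value in stats.items()})
```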
[Figure: Box Plots: Weight Gains for Six Diets. Vertical axis: Weight Gain (40 to 130); horizontal axis: Diet (1 to 6). Legend: Non-Outlier Max, Non-Outlier Min, Median, 75%, 25%]
[Figure: Box Plots: Weight Gains for Six Diets, on the same axes, with diets 1 to 3 labelled High Protein (Beef, Cereal, Pork) and diets 4 to 6 labelled Low Protein (Beef, Cereal, Pork)]
Conclusions
• Weight gain is higher for the high protein
meat diets
• Increasing the level of protein increases weight gain, but only if the source of protein is a meat source
Measures of Shape
• Skewness
[Figure: three distribution curves over 0 to 25 illustrating skewness]
• Kurtosis
[Figure: distribution curves illustrating kurtosis]
• Skewness – based on the sum of cubes
$$\sum_{i=1}^{n} (x_i - \bar{x})^3$$
• Kurtosis – based on the sum of 4th powers
$$\sum_{i=1}^{n} (x_i - \bar{x})^4$$
The Measure of Skewness
$$g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3}$$
The Measure of Kurtosis
$$g_2 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4} - 3$$
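The two shape measures can be computed directly from these formulas. In this sketch s is taken to be the sample standard deviation defined earlier (divisor n - 1), which is an assumption about the intended s; statistical packages often use slightly different divisors, so their output may not match exactly. The function name is illustrative.

```python
import math

def shape_measures(values):
    """Skewness g1 and kurtosis g2 as defined above (s uses divisor n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum((v - x_bar) ** 2 for v in values) / (n - 1))
    g1 = (sum((v - x_bar) ** 3 for v in values) / n) / s ** 3
    g2 = (sum((v - x_bar) ** 4 for v in values) / n) / s ** 4 - 3
    return g1, g2

symmetric = [1, 2, 3, 4, 5]       # illustrative symmetric data
example = [10, 15, 21, 7, 13]     # the five numbers used in earlier examples
print(shape_measures(symmetric))
print(shape_measures(example))
```

For the symmetric list the cubed deviations cancel, so g1 is 0; for the five example numbers g1 is positive.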
Interpretations of Measures of Shape
• Skewness
[Figure: three distribution curves over 0 to 25, labelled g1 > 0 (positively skewed), g1 = 0 (symmetric) and g1 < 0 (negatively skewed)]
• Kurtosis
[Figure: distribution curves labelled g2 < 0 (platykurtic), g2 = 0 (Normal) and g2 > 0 (leptokurtic)]