Summary Statistics
Jake Blanchard
Spring 2008
Uncertainty Analysis for Engineers
Summarizing and Interpreting Data
It is useful to have some metrics for summarizing statistical data (both input and output).
• Three key characteristics are:
  ◦ central tendency (mean, median, mode)
  ◦ dispersion (variance)
  ◦ shape (skewness, kurtosis)
Central Tendency
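Mean
$$E(x) = \sum_{i=1}^{n} x_i\,p_i \quad \text{(discrete)}$$
$$E(x) = \int_{-\infty}^{\infty} x\,f(x)\,dx \quad \text{(continuous)}$$
Median = point z such that exactly half of the probability is associated with lower values and half with greater values:
$$\int_{-\infty}^{z} f(x)\,dx = 0.5$$
Mode = most likely value (maximum of the pdf)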
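For a set of observed values rather than a pdf, the same three measures have direct sample counterparts. The sketch below uses Python's standard `statistics` module (Python and the five data values are illustrative assumptions; the course itself uses MATLAB). For a sample, the mode is the most frequent value, which is only meaningful when values actually repeat.

```python
import statistics

def central_tendency(data):
    """Return (mean, median, modes) for a list of numbers."""
    mean = statistics.mean(data)        # arithmetic mean
    median = statistics.median(data)    # middle value of the sorted data
    modes = statistics.multimode(data)  # every most-frequent value (Python 3.8+)
    return mean, median, modes

mean, median, modes = central_tendency([1, 2, 2, 3, 4])  # hypothetical sample
print(mean, median, modes)
```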
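For One Die
mean
$$E(x) = \sum_{i=1}^{6} x_i\,p(x_i) = 1\cdot\tfrac{1}{6} + 2\cdot\tfrac{1}{6} + 3\cdot\tfrac{1}{6} + 4\cdot\tfrac{1}{6} + 5\cdot\tfrac{1}{6} + 6\cdot\tfrac{1}{6} = 3.5$$
median
x = 3.5 (the cdf equals exactly 0.5 for every x from 3 up to 4, so by convention the midpoint is used)
mode
not unique: every face has probability 1/6, so all six values are equally likely (3.5 itself is not a possible outcome)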
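The die calculation can be checked exactly with rational arithmetic. This Python sketch (an illustrative assumption, not part of the course's MATLAB material) builds the fair-die pmf, sums x·p(x) for the mean, accumulates the cdf to locate the median, and lists every value that attains the maximum probability.

```python
from fractions import Fraction

faces = range(1, 7)
p = {x: Fraction(1, 6) for x in faces}   # fair die: each face has probability 1/6

mean = sum(x * px for x, px in p.items())   # E(x) = sum_i x_i p(x_i)

# cdf F(x) = P(X <= x); it reaches 1/2 at x = 3 and stays there until x = 4
cdf = {}
running = Fraction(0)
for x in faces:
    running += p[x]
    cdf[x] = running

max_p = max(p.values())
modes = [x for x, px in p.items() if px == max_p]   # all six faces tie

print(mean, cdf[3], modes)
```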
Radioactive Decay
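For our example (f(t) = e^{-t} for t ≥ 0), the mean, median, and mode are given by
mean
$$E(t) = \int_0^\infty t\,f(t)\,dt = \int_0^\infty t\,e^{-t}\,dt = 1$$
median
$$\int_0^z e^{-t}\,dt = 1 - e^{-z} = 0.5 \quad\Rightarrow\quad z = \ln(2) \approx 0.693$$
The mode is t = 0, where the pdf e^{-t} is largest.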
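A quick Monte Carlo check of these results: draw many decay times from f(t) = e^{-t} and compare the sample mean and median with 1 and ln(2). This Python sketch is an illustrative assumption (seed and sample size are arbitrary choices); `random.expovariate(1.0)` samples an exponential distribution with rate 1, which is exactly this pdf.

```python
import math
import random
import statistics

random.seed(2008)  # fixed seed so the run is repeatable
samples = [random.expovariate(1.0) for _ in range(200_000)]  # draws from f(t) = e^{-t}

sample_mean = statistics.fmean(samples)     # should be close to 1
sample_median = statistics.median(samples)  # should be close to ln(2)
print(sample_mean, sample_median, math.log(2))
```

With 200,000 draws, both estimates typically agree with the exact values to about two decimal places.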
Other Characteristics
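We can calculate the expected value of any function of our random variable as
$$E[h(x)] = \int_{-\infty}^{\infty} h(x)\,f(x)\,dx \quad \text{(continuous)}$$
$$E[h(x)] = \sum_i h(x_i)\,p(x_i) \quad \text{(discrete)}$$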
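As a small check of the discrete formula, the Python sketch below (Python, the fair-die pmf, and the helper name `expected_value` are illustrative assumptions) computes E[h(x)] for h(x) = x² exactly with fractions. The result, 91/6, differs from (E(x))² = 49/4, so in general E[h(x)] ≠ h(E(x)).

```python
from fractions import Fraction

def expected_value(h, pmf):
    """E[h(x)] = sum_i h(x_i) p(x_i) for a discrete pmf given as {x_i: p(x_i)}."""
    return sum(h(x) * px for x, px in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die
e_x2 = expected_value(lambda x: x ** 2, die)     # (1+4+9+16+25+36)/6 = 91/6
print(e_x2)
```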
Some Results
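$$E(c) = c$$
$$E(cx) = c\,E(x)$$
$$E\left(\sum_{j=1}^{n} x_j\right) = \sum_{j=1}^{n} E(x_j)$$
$$E\left(\sum_{j=1}^{n} b_j x_j\right) = \sum_{j=1}^{n} b_j\,E(x_j)$$
Here c and the b_j are constants.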
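The last result can be verified by brute force on two dice. This Python sketch (the constants b1 = 2, b2 = 3 and the two-dice setup are illustrative assumptions) evaluates E(b1·x1 + b2·x2) directly from a joint pmf and compares it with b1·E(x1) + b2·E(x2). It also repeats the check for the fully dependent case x2 = x1, because linearity of expectation does not require independence.

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}

# Joint pmf of two independent dice: p(x1, x2) = p(x1) p(x2)
joint = {(x1, x2): p1 * p2 for (x1, p1), (x2, p2) in product(die.items(), repeat=2)}

b1, b2 = 2, 3  # arbitrary constants for illustration
lhs = sum((b1 * x1 + b2 * x2) * p for (x1, x2), p in joint.items())  # E(b1 x1 + b2 x2)
e_x = sum(x * p for x, p in die.items())                             # E(x_j) = 7/2
rhs = b1 * e_x + b2 * e_x                                            # b1 E(x1) + b2 E(x2)

# Dependent case: x2 is the same roll as x1
lhs_dep = sum((b1 * x + b2 * x) * p for x, p in die.items())

print(lhs, rhs, lhs_dep)
```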
Moments of Distributions
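We can define many of these parameters in terms of moments of the distribution:
$$\mu_1 = \int_{-\infty}^{\infty} x\,f(x)\,dx$$
$$\mu_k = E\left[(x-\mu_1)^k\right] = \int_{-\infty}^{\infty} (x-\mu_1)^k f(x)\,dx = \sum_i (x_i-\mu_1)^k\,p(x_i), \qquad k \ge 2$$
• The mean is the first moment, μ1.
• The variance is the second (central) moment, μ2.
• The third and fourth moments are related to skewness and kurtosis.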
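For a discrete pmf, the central moments follow directly from the sum form of μk. This Python sketch (Python and the helper name `central_moment` are illustrative assumptions) evaluates μ2, μ3 and μ4 exactly for a fair die; μ3 = 0 because the pmf is symmetric about its mean.

```python
from fractions import Fraction

def central_moment(pmf, k):
    """mu_k = sum_i (x_i - mu_1)^k p(x_i) for a discrete pmf {x_i: p(x_i)}."""
    mu1 = sum(x * p for x, p in pmf.items())
    return sum((x - mu1) ** k * p for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}
for k in (2, 3, 4):
    print(k, central_moment(die, k))  # 35/12, 0, 707/48
```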
Spread (Variance)
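Variance is a measure of spread or dispersion:
$$\sigma^2 = \mu_2 = E\left[(x-\mu_1)^2\right] = \int_{-\infty}^{\infty} (x-\mu_1)^2 f(x)\,dx$$
For discrete data sets, the biased variance is
$$s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
and the unbiased variance is
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$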
The standard deviation is the square root of
the variance
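The two sample formulas differ only in the divisor. This Python sketch (the eight data values are a hypothetical sample) computes both by hand and confirms that they match the standard library's `statistics.pvariance` (divide by n) and `statistics.variance` (divide by n - 1).

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5

n = len(data)
xbar = sum(data) / n
ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations = 32

biased = ss / n              # 1/n version       -> 4.0
unbiased = ss / (n - 1)      # 1/(n-1) version   -> 32/7
std_biased = math.sqrt(biased)

# The standard library provides both forms directly
assert math.isclose(biased, statistics.pvariance(data))
assert math.isclose(unbiased, statistics.variance(data))
print(biased, unbiased, std_biased)
```

The difference shrinks as n grows; for small samples the 1/(n - 1) form is the one that is unbiased for the population variance.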
Skewness
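Skewness is a measure of asymmetry:
$$\mu_3 = E\left[(x-\mu_1)^3\right] = \int_{-\infty}^{\infty} (x-\mu_1)^3 f(x)\,dx$$
For discrete data sets, the biased skewness is related to
$$m_3 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3$$
The skewness is often defined as
$$\gamma_1 = \frac{\mu_3}{\sigma^3}$$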
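A minimal sample version of γ1, assuming the biased moments throughout (σ estimated as the square root of the 1/n variance m2), is sketched below in Python. The function name and data are illustrative; other packages may apply a small-sample bias correction instead, so results can differ slightly. MATLAB's `skewness`, for instance, applies the correction only when called with flag 0.

```python
def sample_skewness(data):
    """Biased (moment) estimate g1 = m3 / m2**1.5, with m_k = (1/n) sum (x_i - xbar)^k."""
    n = len(data)
    xbar = sum(data) / n
    m2 = sum((x - xbar) ** 2 for x in data) / n
    m3 = sum((x - xbar) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

print(sample_skewness([1, 2, 3]))      # symmetric data -> 0.0
print(sample_skewness([1, 2, 3, 10]))  # long right tail -> positive
```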
Skewness
Kurtosis
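Kurtosis is a measure of peakedness (in practice it is driven mainly by the weight of the tails):
$$\mu_4 = E\left[(x-\mu_1)^4\right] = \int_{-\infty}^{\infty} (x-\mu_1)^4 f(x)\,dx$$
For discrete data sets, the biased kurtosis is related to
$$m_4 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4$$
The kurtosis is often defined as
$$\gamma_2 = \frac{\mu_4}{\sigma^4} - 3$$
Subtracting 3 makes γ2 = 0 for a normal distribution; this form is also called the excess kurtosis.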
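A matching sample version of γ2, again assuming the biased moments m2 and m4, is sketched below in Python (function name and data are illustrative). Check which convention a package uses before comparing numbers: MATLAB's `kurtosis`, for example, reports μ4/σ⁴ without subtracting 3, so a normal distribution gives 3 there rather than 0.

```python
def excess_kurtosis(data):
    """Biased (moment) estimate g2 = m4 / m2**2 - 3, with m_k = (1/n) sum (x_i - xbar)^k."""
    n = len(data)
    xbar = sum(data) / n
    m2 = sum((x - xbar) ** 2 for x in data) / n
    m4 = sum((x - xbar) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

print(excess_kurtosis([-1, 1]))        # two-point data -> -2.0 (the minimum possible)
print(excess_kurtosis([1, 2, 3, 10]))
```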
Kurtosis
[Figure: pdf of the Pearson type VII distribution with kurtosis (γ2) of infinity (red), 2 (blue), and 0 (black)]
Using MATLAB
The sample data are the lengths of time a person was able to hold their breath (40 attempts).
• Try a scatter plot:
load RobPracticeHolds;                 % loads the vector breathholds (times in seconds)
y = ones(size(breathholds));           % put every point at height 1
h1 = figure('Position',[100 100 400 100],'Color','w');
scatter(breathholds,y);
Adding Information
disp(['The mean is ',num2str(mean(breathholds)),' seconds (green line).']);
disp(['The median is ',num2str(median(breathholds)),' seconds (red line).']);
hold on;   % keep the scatter points while adding lines
line([mean(breathholds) mean(breathholds)],[0.5 1.5],'color','g');       % vertical line at the mean
line([median(breathholds) median(breathholds)],[0.5 1.5],'color','r');   % vertical line at the median
Box Plot
title('Scatter with Mean (green) and Median (red) lines');   % label the scatter figure
xlabel('');
h2 = figure('Position',[100 100 400 100],'Color','w');        % new figure for the box plot
boxplot(breathholds,'orientation','horizontal','widths',.5);
set(gca,'XLim',[40 140]);                                      % x-range chosen to fit this data set
title('A Boxplot of the same data');
xlabel(''); ylabel('');
set(gca,'YTickLabel',[]);                                      % hide the y-axis group label
Box Plot
[Figure: annotated box plot of the breath-hold data. Labels: Min, Median, Max, Outlier; the box represents the inter-quartile range (half of the data).]
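The quantities a box plot draws can be computed directly. This Python sketch assumes the common 1.5 × IQR outlier rule (MATLAB's `boxplot` also uses a whisker length of 1.5 by default) and the "inclusive" quartile method of Python's `statistics.quantiles`. Quartile conventions differ between packages, so the box edges may not match MATLAB's exactly; the function name and data values are illustrative.

```python
import statistics

def box_plot_summary(data, whisker=1.5):
    """Quartiles, IQR, whisker ends and outliers using the whisker*IQR rule (assumed convention)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                                   # width of the box
    lo_fence, hi_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    outliers = [x for x in data if x < lo_fence or x > hi_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "min": min(inside), "max": max(inside),  # whisker ends
            "outliers": outliers}

print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # hypothetical data
```

Note that "Min" and "Max" in a box plot are the most extreme points that are not flagged as outliers.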
Empirical cdf
h3 = figure('Position',[100 100 600 400],'Color','w');
cdfplot(breathholds);
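The empirical cdf is a step function: at any x it equals the fraction of observations less than or equal to x. A minimal Python sketch (the helper name `ecdf` and the four data values are illustrative assumptions) uses binary search on the sorted data.

```python
import bisect

def ecdf(data):
    """Return F(x) = (number of samples <= x) / n as a step function."""
    xs = sorted(data)
    n = len(xs)

    def F(x):
        return bisect.bisect_right(xs, x) / n  # count of samples <= x, divided by n

    return F

F = ecdf([3, 1, 2, 2])   # hypothetical sample
print(F(0), F(2), F(3))  # 0.0 0.75 1.0
```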
Multivariate Data Sets
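When there are multiple input variables, we need some additional ways to characterize the data:
$$E[h(x,y)] = \begin{cases} \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x,y)\,f(x,y)\,dx\,dy & \text{continuous} \\ \displaystyle\sum_i \sum_j h(x_i,y_j)\,p(x_i,y_j) & \text{discrete} \end{cases}$$
$$\mathrm{Cov}(x,y) = E(xy) - E(x)\,E(y)$$
If x and y are independent, then Cov(x,y) = 0. The converse does not hold in general: zero covariance does not imply independence.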
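Both directions of the independence statement can be checked on small discrete examples. This Python sketch (the helper name `cov` and all three joint pmfs are illustrative assumptions) computes Cov(x,y) = E(xy) - E(x)E(y) exactly for two independent dice, for a die paired with itself, and for x uniform on {-1, 0, 1} with y = x², which is completely dependent on x yet has zero covariance.

```python
from fractions import Fraction
from itertools import product

def cov(joint):
    """Cov(x, y) = E(xy) - E(x)E(y) for a joint pmf {(x_i, y_j): p(x_i, y_j)}."""
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    e_x = sum(x * p for (x, _), p in joint.items())
    e_y = sum(y * p for (_, y), p in joint.items())
    return e_xy - e_x * e_y

die = {x: Fraction(1, 6) for x in range(1, 7)}

# Two independent dice: p(x, y) = p(x) p(y)
indep = {(x, y): px * py for (x, px), (y, py) in product(die.items(), repeat=2)}
# y identical to x: covariance equals the variance of one die
same = {(x, x): p for x, p in die.items()}
# Dependent but uncorrelated: x uniform on {-1, 0, 1} and y = x**2
dep = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

print(cov(indep), cov(same), cov(dep))  # 0 35/12 0
```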
Correlation Coefficients
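Two random variables may be related.
• Define the correlation coefficient of input (x) and output (y) as
$$\rho(x,y) = \frac{\sum_{k=1}^{m}(x_k-\bar{x})(y_k-\bar{y})}{\sqrt{\sum_{k=1}^{m}(x_k-\bar{x})^2}\,\sqrt{\sum_{k=1}^{m}(y_k-\bar{y})^2}} = \frac{\mathrm{Cov}(x,y)}{\sigma(x)\,\sigma(y)}$$
• ρ = 1 implies linear dependence, positive slope
• ρ = 0 implies no linear dependence (x and y may still be related nonlinearly)
• ρ = -1 implies linear dependence, negative slope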
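The sample formula translates line for line into code. This Python sketch (the helper name `pearson` and the data are illustrative assumptions; Python 3.10+ also ships `statistics.correlation`) reproduces ρ = 1 and ρ = -1 for exact linear relationships, and ρ = 0 for y = x², where y is completely determined by x but not linearly.

```python
import math

def pearson(x, y):
    """Sample correlation coefficient rho(x, y) from the sum formula."""
    m = len(x)
    xbar = sum(x) / m
    ybar = sum(y) / m
    sxy = sum((xk - xbar) * (yk - ybar) for xk, yk in zip(x, y))
    sxx = sum((xk - xbar) ** 2 for xk in x)
    syy = sum((yk - ybar) ** 2 for yk in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
print(pearson(x, [3 * v + 1 for v in x]))  # 1.0: linear, positive slope
print(pearson(x, [-v for v in x]))         # -1.0: linear, negative slope
print(pearson(x, [v * v for v in x]))      # 0.0, although y depends on x exactly
```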
Example
[Figure: four example scatter plots with ρ = 0.98, ρ = 1, ρ = -0.98, and ρ = -0.38]
Example
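x=rand(25,1)-0.5;              % 25 uniform values on (-0.5, 0.5)
y=x;                           % identical: perfect linear dependence
corrcoef(x,y)                  % 2x2 matrix; rho is the off-diagonal entry
subplot(2,2,1), plot(x,y,'o')
y2=x+0.2*rand(25,1);           % add a little noise
corrcoef(x,y2)
subplot(2,2,2), plot(x,y2,'o')
y3=-x+0.2*rand(25,1);          % negative slope plus noise
corrcoef(x,y3)
subplot(2,2,3), plot(x,y3,'o')
y4=rand(25,1)-0.5;             % generated independently of x
corrcoef(x,y4)
subplot(2,2,4), plot(x,y4,'o')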