Preliminary Data Analysis
Two Concepts of Probability
- Statistical: relative frequency in repeated experiments
- Inductive (subjective, Bayesian): based on incomplete information, judgment and logical reasoning
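The statistical (relative frequency) concept can be illustrated by simulating repeated experiments. This is a minimal Python sketch (the slides' own snippets are R). It assumes a Bernoulli experiment with success probability 0.5, such as a fair coin. The function name, seed and trial counts are illustrative choices.

```python
import random


def relative_frequency(n_trials: int, p_success: float = 0.5, seed: int = 1) -> float:
    """Fraction of successes in n_trials simulated Bernoulli experiments."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    successes = sum(1 for _ in range(n_trials) if rng.random() < p_success)
    return successes / n_trials


# As the number of repetitions grows, the relative frequency settles near p_success.
for n in (10, 100, 10_000):
    print(n, relative_frequency(n))
```

The inductive (Bayesian) concept has no such repeated-experiment interpretation, so it is not simulated here.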
Line Diagram
(Figure from Kottegoda and Rosso, 1997, p. 3)
Dot Diagram
(Figure from Kottegoda and Rosso, 1997, p. 4)
Histogram of minimum annual flow in the Po River between 1918 and 1978
(Figure: number of occurrences vs. minimum annual flow, m3/s, over roughly 200 to 1200 m3/s)
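A histogram is built by counting how many values fall into each equal-width bin. The Python sketch below assumes bins that are closed on the left and open on the right, with the right-hand edge folded into the last bin. The flow values are hypothetical, not the Po River record; the 200 m3/s start and 100 m3/s width only mimic the figure's axis.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in bins [start + k*width, start + (k+1)*width), k = 0..n_bins-1."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)  # index of the bin containing v
        if 0 <= k < n_bins:
            counts[k] += 1
        elif v == start + n_bins * width:  # include the right-hand edge in the last bin
            counts[-1] += 1
    return counts


# Hypothetical minimum annual flows (m3/s), for illustration only.
flows = [310, 455, 480, 520, 560, 575, 610, 640, 700, 890]
print(histogram_counts(flows, start=200, width=100, n_bins=10))
# [0, 1, 2, 3, 2, 1, 1, 0, 0, 0]
```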
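Alternative histogram axis scaling
- Relative Frequency: count in each bin divided by the number of values n
- Density: relative frequency divided by the bin width, so that the total area of the bars is 1
(Figure: histogram and relative frequency polygon of Po River minimum annual flow vs. minimum annual flow, m3/s; relative frequency on the left axis, density on the right axis)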
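The two rescalings differ only by constant factors: divide counts by n for relative frequency, then by the bin width for density. The Python sketch below assumes equal-width bins. The counts and the 100 m3/s bin width are hypothetical.

```python
def rescale_histogram(counts, width):
    """Relative frequency (count / n) and density (relative frequency / bin width)."""
    n = sum(counts)
    rel_freq = [c / n for c in counts]
    density = [r / width for r in rel_freq]
    return rel_freq, density


counts = [0, 1, 2, 3, 2, 1, 1, 0, 0, 0]  # hypothetical bin counts
rel_freq, density = rescale_histogram(counts, width=100)  # assumed bin width, m3/s
print(sum(rel_freq))                  # relative frequencies sum to 1 (up to rounding)
print(sum(d * 100 for d in density))  # bar areas (density * width) also sum to 1
```

Because density carries units of 1/(m3/s), its axis values are much smaller than the relative frequencies whenever bins are wide.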
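Po River, minimum annual flow: cumulative relative frequency
Cumulative relative frequency = (number of values ≤ x)/n (KR p. 8)
(Figure: cumulative relative frequency, 0 to 1, vs. minimum annual flow, m3/s)
qs <- sort(q)     # q: vector of minimum annual flows (m3/s), sorted ascending
n <- length(q)    # number of years in the record
crf <- (1:n)/n    # fraction of values <= each sorted flow
plot(qs, crf, xlab = "Minimum annual flow m3/s", ylab = "Cumulative relative frequency")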
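The cumulative relative frequency can also be computed in Python, counting for each sorted value the number of values ≤ it. This sketch assumes tied values should share the same cumulative relative frequency, which `bisect.bisect_right` gives directly. The sample flows are hypothetical.

```python
import bisect


def cumulative_relative_frequency(values):
    """Sorted values and, for each, (number of values <= it) / n."""
    qs = sorted(values)
    n = len(qs)
    # bisect_right returns how many sorted values are <= x, so ties are counted fully.
    crf = [bisect.bisect_right(qs, x) / n for x in qs]
    return qs, crf


print(cumulative_relative_frequency([550, 420, 700, 420]))
# ([420, 420, 550, 700], [0.5, 0.5, 0.75, 1.0])
```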
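Po River, minimum annual flow: quantile plot (Q-plot)
(Figure: minimum annual flow, m3/s, vs. cumulative relative frequency; the plot marks the 25% quantile (lower quartile), the median and the 75% quantile (upper quartile), and the interquartile range IQR spans the two quartiles)
qs <- sort(q)     # q: vector of minimum annual flows (m3/s)
n <- length(q)
crf <- (1:n)/n    # fraction of values <= each sorted flow
plot(crf, qs, xlab = "Cumulative relative frequency", ylab = "Minimum annual flow m3/s")  # axes swapped relative to the cumulative plot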
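The quartiles, median and IQR read off the Q-plot can be computed directly. This Python sketch relies on `statistics.quantiles`, whose default 'exclusive' method is one of several sample-quantile conventions. Other software may return slightly different quartiles for the same data.

```python
import statistics


def quartiles_and_iqr(values):
    """Lower quartile, median, upper quartile and interquartile range."""
    # With n=4, statistics.quantiles returns the three cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q1, q2, q3, q3 - q1


print(quartiles_and_iqr([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (2.5, 5.0, 7.5, 5.0)
```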
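Quantile Definition
(Figure: cumulative distribution function F(y) on roughly -3 to 3; a cumulative probability p_i on the vertical axis maps to the quantile q_i on the horizontal axis)
A quantile $q_i$ is the value of the random variable associated with a specific cumulative probability $p_i$, that is, $F(q_i) = p_i$.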
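For a continuous distribution, the quantile is found by inverting the CDF. This Python sketch assumes the standard normal distribution as F and uses `statistics.NormalDist.inv_cdf`. The probabilities 0.2 and 0.6 echo the values marked on the figure.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

for p in (0.2, 0.5, 0.6):
    q = std_normal.inv_cdf(p)  # quantile q_i such that F(q_i) = p_i
    # Feeding q back through the CDF recovers p (up to rounding).
    print(p, round(q, 4), round(std_normal.cdf(q), 4))
```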
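Numerical Quantities
Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Std Deviation: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
Mean absolute deviation: $d = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$
Skewness: $g_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$
Helsel and Hirsch page 21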
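The five numerical quantities can be computed together. This Python sketch implements the formulas as listed: variance and standard deviation with divisor n - 1, mean absolute deviation with divisor n, and skewness $g_1 = \sum (x_i - \bar{x})^3 / (n s^3)$. Other references use a bias-corrected skewness factor, so published values may differ.

```python
import math


def summary_statistics(x):
    """Mean, variance, standard deviation, mean absolute deviation and skewness g1."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]               # deviations from the mean
    var = sum(d * d for d in dev) / (n - 1)     # sample variance, divisor n - 1
    s = math.sqrt(var)                          # standard deviation
    mad = sum(abs(d) for d in dev) / n          # mean absolute deviation
    g1 = sum(d ** 3 for d in dev) / (n * s ** 3)  # skewness as defined above
    return mean, var, s, mad, g1


print(summary_statistics([2, 4, 4, 4, 5, 5, 7, 9]))
```

A positive $g_1$, as for these illustrative values, indicates a longer right-hand tail.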
Box Plot
(Figure: time series of log(alafia), about 1930 to 2000, beside a box plot of the same values; the box (red lines) encloses 50% of the values)
- Box: 25th percentile to 75th percentile
- Line: median (50th percentile), not the mean
- Whiskers: extend to the most extreme value lying within 1.5*IQR of the box
- Outliers: values more than 1.5*IQR beyond the box
Note: The range shown by the box is called the "Inter-Quartile Range" or IQR. This is a robust measure of spread: it is insensitive to outliers because it is based purely on the rank of the values.
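The box plot elements follow from the quartiles and the 1.5*IQR rule. This Python sketch assumes the default 'exclusive' method of `statistics.quantiles` for the quartiles. Plotting packages often use other quartile conventions (for example Tukey's hinges), so their whisker ends can differ slightly. The data are illustrative.

```python
import statistics


def box_plot_stats(values, k=1.5):
    """Quartiles, whisker ends and outliers for a box plot using the k*IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    # Whiskers stop at the most extreme observations still inside the fences.
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "whisker_low": min(inside), "whisker_high": max(inside),
            "outliers": outliers}


print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

Replacing the outlier 100 with 1000 leaves the median and IQR unchanged, which illustrates the robustness noted above.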
Seasonality of Flow
(Figure, top: "Monthly Subseries Plot", a time series of flow (cfs) for each month, Jan to Dec; the horizontal line in each panel is the mean, and outliers are marked)
(Figure, bottom: box plots of flow (cfs) for each month, Jan to Dec)
Compare the change in mean and median between August and September. Note the skew in September.
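The August-to-September comparison amounts to grouping flows by month and computing each month's mean and median. This Python sketch assumes the data arrive as (month, flow) pairs. The flows are hypothetical, chosen so that one wet September pulls that month's mean well above its median, a symptom of skew.

```python
import statistics
from collections import defaultdict


def monthly_mean_median(records):
    """Group (month, flow) records by month and return {month: (mean, median)}."""
    by_month = defaultdict(list)
    for month, flow in records:
        by_month[month].append(flow)
    return {m: (statistics.mean(f), statistics.median(f)) for m, f in by_month.items()}


# Hypothetical flows (cfs), for illustration only.
records = [("Aug", 300), ("Aug", 320), ("Aug", 340),
           ("Sep", 250), ("Sep", 270), ("Sep", 1200)]
for month, (mean, median) in monthly_mean_median(records).items():
    print(month, round(mean, 1), median)
```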
Scatter Plot - Flow v. Water Level
(Figure: Alafia flow (cfs) vs. MD-11 DP water level)
Multiple Scatterplots
(Figure: scatterplot matrix of Flow.ALAFIA, Pcp.S259, WL.MB11DP and Pump.MBTOTAL)
- Flow = f(Pumping) OR Pumping = f(Flow)? Causality? Co-effect?
- Water Level = f(Pumping): logical relationship
Scatterplot - between raw x and y data (figure: y vs. x)
Compares individual X and Y values.
Q-Q plot - between sorted x and y data (figure: ys vs. xs)
Compares the distributions of X and Y.
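The only difference between the two plots is how points are paired. A scatterplot keeps the original (x, y) observation pairs. A Q-Q plot pairs the i-th smallest x with the i-th smallest y. This Python sketch assumes equal sample sizes and uses hypothetical values.

```python
def scatter_pairs(x, y):
    """Scatterplot: pair each x with the y observed alongside it."""
    return list(zip(x, y))


def qq_pairs(x, y):
    """Q-Q plot: pair the i-th smallest x with the i-th smallest y."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(x), sorted(y)))


x = [14, 12, 18, 16, 20]
y = [19, 13, 12, 21, 15]  # hypothetical paired observations
print(scatter_pairs(x, y))
print(qq_pairs(x, y))
```

Shuffling y changes the scatterplot but not the Q-Q plot, because sorting discards the pairing and keeps only the two distributions.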
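Quantiles to compare to theoretical distribution
Rank the data: $x_1 \le x_2 \le \dots \le x_n$
Assign each ranked value a cumulative probability: $p_i = \operatorname{prob}(X \le x_i) \approx \frac{i}{n+1}$
(Figure: F(y) of a theoretical distribution, e.g. the standard normal, with p_i on the vertical axis mapped to q_i on the horizontal axis)
$q_i$ is the distribution-specific theoretical quantile associated with ranked data value $x_i$, that is, $F(q_i) = p_i$.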
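The steps above give the coordinates of a normal Q-Q plot. This Python sketch sorts the data, assigns $p_i = i/(n+1)$, and takes $q_i$ from the standard normal via `statistics.NormalDist.inv_cdf`. Other plotting-position formulas exist; this one follows the slide. The sample values are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """(theoretical quantile q_i, ranked value x_i) pairs using p_i = i / (n + 1)."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf(i / (n + 1)), x) for i, x in enumerate(xs, start=1)]


for q, x in normal_qq_points([520, 310, 890, 455]):
    print(round(q, 3), x)
```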
Quantile-Quantile Plots
(Figure, left: normal Q-Q plot for raw flows; sample quantiles $x_i$, 0 to about 4000, vs. theoretical quantiles $q_i$)
(Figure, right: normal Q-Q plot for log-transformed flows; sample quantiles $\ln(x_i)$ vs. theoretical quantiles $q_i$)
Used as a basis for finding a transformation that makes the raw flows normally distributed.
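Comparing transformations needs some way to judge how straight each Q-Q plot is. One simple option, which is an assumption and not taken from the slides, is the correlation between the theoretical quantiles $q_i$ and the sorted (transformed) data. Values closer to 1 indicate a straighter plot. The flows below are synthetic: they are constructed to be exactly lognormal at the plotting positions, so the log transform should straighten them.

```python
import math
from statistics import NormalDist


def qq_correlation(values, transform=lambda v: v):
    """Correlation between normal quantiles q_i (p_i = i/(n+1)) and the sorted,
    transformed data; closer to 1 means a straighter Q-Q plot."""
    ys = sorted(transform(v) for v in values)
    n = len(ys)
    qs = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    mq, my = sum(qs) / n, sum(ys) / n
    sqy = sum((q - mq) * (y - my) for q, y in zip(qs, ys))
    sqq = sum((q - mq) ** 2 for q in qs)
    syy = sum((y - my) ** 2 for y in ys)
    return sqy / math.sqrt(sqq * syy)


# Synthetic, lognormal-shaped flows (hypothetical, not observed data).
flows = [math.exp(6 + 0.8 * NormalDist().inv_cdf(i / 21)) for i in range(1, 21)]
print("raw:", round(qq_correlation(flows), 4))
print("log:", round(qq_correlation(flows, math.log), 4))
```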
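Quantile plots and Probability Plots
(Figure, left: Q-Q plot of the sorted data xs vs. the theoretical quantile q)
(Figure, right: probability plot of the same data, with the quantile axis labelled p = 0.1, 0.5, 0.8, 0.95)
Probability plot: the theoretical quantile axis is relabeled with the corresponding probability values.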
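Relabeling the axis only changes the tick labels, not the point positions. Each probability label p is placed at $q = F^{-1}(p)$. This Python sketch assumes a standard normal F and uses the probabilities shown on the figure.

```python
from statistics import NormalDist


def probability_axis_ticks(probabilities):
    """(label p, position q = F^-1(p)) on the theoretical-quantile axis."""
    std_normal = NormalDist()
    return [(p, std_normal.inv_cdf(p)) for p in probabilities]


for p, q in probability_axis_ticks([0.1, 0.5, 0.8, 0.95]):
    print(f"label p = {p:<4}  placed at q = {q:+.3f}")
```

The tick labels end up unevenly spaced: they crowd near p = 0.5 and spread out in the tails.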