Fall 2015
Sampling Frame
Sampling frame: the list of population units ("population" is used here as a
general term) from which the sample is drawn. It is important to understand how
the sampling frame defines the population that the sample represents.
Example: If the study seeks to identify the safety effects of traffic signals, the
sampling frame should include signalized intersections in a given geographical
area. If a control group is included, the sampling frame will also include the
sites categorized under this group.
[Figure: sampling frame showing signalized intersections (Sig Int #1, #2, ..., #9) and unsignalized intersections (Unsig Int #1, #2, ..., #7).]
Sampling Frame
[Figure: crashes mapped for Year 1 and for Year 2, with the number of crashes at each site in each year.]
Sampling Frame
Signalized Intersections Database

Intersection   Crashes/Year   Traffic Flow   Other Site          Year
Number                        – Major        Characteristics*
1              0              11,500                             1
2              3              12,000                             1
3              10             10,000                             1
…              …              …                                  1
9              6              6,300                              1
1              1              12,000                             2
2              0              12,200                             2
…              …              …                                  2
9              3              6,100                              2
…              …

* ex: number of lanes, actuated signals, exclusive left-turn lane, etc.
Sampling Frame
Signalized Intersections Database

Intersection 1
Year          1   2
Crash Count   0   1

Intersection 9
Year          1   2
Crash Count   6   3
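The per-intersection summaries can be derived from the database rows by grouping on the intersection number. The sketch below is only an illustration: the names `ROWS` and `crashes_by_intersection` are hypothetical, the elided rows ("…") are left out, and the year of the last row is assumed to be 2, as the Intersection 9 summary suggests.

```python
from collections import defaultdict

# Rows from the signalized-intersection database:
# (intersection number, year, crashes per year, major traffic flow).
# Elided rows are omitted; the year of the last row is an assumption
# based on the Intersection 9 summary (Year 2, crash count 3).
ROWS = [
    (1, 1, 0, 11_500),
    (2, 1, 3, 12_000),
    (3, 1, 10, 10_000),
    (9, 1, 6, 6_300),
    (1, 2, 1, 12_000),
    (2, 2, 0, 12_200),
    (9, 2, 3, 6_100),
]


def crashes_by_intersection(rows):
    """Group crash counts by intersection, keyed by year."""
    grouped = defaultdict(dict)
    for intersection, year, crashes, _flow in rows:
        grouped[intersection][year] = crashes
    return dict(grouped)


counts = crashes_by_intersection(ROWS)
print(counts[1])  # crash counts for Intersection 1, by year
print(counts[9])  # crash counts for Intersection 9, by year
```

Keeping one row per intersection and year (a "long" layout) makes it straightforward to add further years or site characteristics later.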
Sampling Frame
Unsignalized Intersections Database

Intersection   Crashes/Year   Traffic Flow   Other Site          Year
Number                        – Major        Characteristics*
1              2              8,400                              1
2              0              9,000                              1
3              1              8,500                              1
…              …              …                                  1
7              3              7,900                              1
1              5              8,600                              2
2              1              9,400                              2
…              …              …                                  2
9              7              7,800                              2
…              …

* ex: number of lanes, actuated signals, exclusive left-turn lane, etc.
Scatter Diagrams
[Figure: scatter diagram of Crashes per Year (axis from 0 to 50) against Traffic Flow (axis from 0 to 80,000).]
Source: Washington et al. (2003)
3D Bar Charts

Crash Severity /    < 5,000   5,000-9,999   ≥ 10,000
Flow Range
Fatal               10        12            15
Non-Fatal Injury    100       120           135
PDO                 550       700           900
Maps
[Figure: map with sites classified as High-Rate, Mid-Rate, and Low-Rate.]
Maps – GIS Information
http://www.saferoadmaps.org/home/
Confidence Intervals
Statistics calculated from samples, such as the sample average X̄, the
variance s², and the standard deviation s, are used to estimate the
population parameters. For instance:
X̄ is used as an estimate of the population mean μ
s² is used as an estimate of the population variance σ²
Interval estimates, defined as Confidence Intervals, allow
inferences to be drawn about the population by providing
an interval, with a lower and an upper value, within which the
unknown parameter is expected to lie with a prescribed level of
confidence. In other words, the true value of the population
parameter is assumed to be located within the estimated
interval.
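Confidence Intervals
Confidence Interval for μ and known σ²
$$0.95 = P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
95% CI: $$\bar{X} \pm 1.96\frac{\sigma}{\sqrt{n}}$$
90% CI: $$\bar{X} \pm 1.645\frac{\sigma}{\sqrt{n}}$$
Any CI: $$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$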
Confidence Intervals
Compute the 95% confidence interval for the mean vehicular
speed. Assume the data is normally distributed. The sample size
is 1,296 and the sample mean X̄ is 58.86. Suppose the
population standard deviation (σ) has previously been computed
to be 5.5.
Answer
$$\bar{X} \pm 1.96\frac{\sigma}{\sqrt{n}} = 58.86 \pm 1.96\frac{5.5}{\sqrt{1{,}296}} = 58.86 \pm 0.30$$
$$CI = [58.56,\ 59.16]$$
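The calculation with a known population standard deviation can be checked with a short script. This is a minimal sketch that assumes Python's standard-library `statistics.NormalDist` to obtain the multiplier Z_{α/2}; the function name `z_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided CI for the mean when the population sigma is known."""
    alpha = 1.0 - confidence
    # Z_{alpha/2}: about 1.96 for 95% and about 1.645 for 90%.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Vehicular speed example: n = 1,296, sample mean 58.86, sigma = 5.5.
lower, upper = z_confidence_interval(58.86, 5.5, 1296, confidence=0.95)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Passing `confidence=0.90` gives the 90% interval with the 1.645 multiplier.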
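Confidence Intervals
Confidence Interval for μ and unknown σ²
$$0.95 \approx P\left(\bar{X} - 1.96\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{s}{\sqrt{n}}\right)$$
95% CI: $$\bar{X} \pm 1.96\frac{s}{\sqrt{n}}$$
90% CI: $$\bar{X} \pm 1.645\frac{s}{\sqrt{n}}$$
The 1.96 and 1.645 multipliers are only valid if n > 30.
Any CI: $$\bar{X} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$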
Confidence Intervals
Same example: Compute the 95% confidence interval for the
mean vehicular speed. Assume the data is normally distributed.
The sample size is 1,296 and the sample mean X̄ is 58.86. Now,
suppose a sample standard deviation (s) has previously been
computed to be 4.41.
Answer
$$\bar{X} \pm 1.96\frac{s}{\sqrt{n}} = 58.86 \pm 1.96\frac{4.41}{\sqrt{1{,}296}} = 58.86 \pm 0.24$$
$$CI = [58.62,\ 59.10]$$
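When only the sample standard deviation s is available, the same calculation applies as long as the sample is large. The sketch below makes the n > 30 condition explicit by refusing smaller samples. The function name `large_sample_ci` is hypothetical, and the normal multiplier is used in place of Student's t, which is an approximation.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(sample_mean, s, n, confidence=0.95):
    """CI for the mean using the sample standard deviation s.

    Uses the normal multiplier instead of Student's t, which is only
    treated as acceptable here when n > 30.
    """
    if n <= 30:
        raise ValueError("normal multiplier needs n > 30; use Student's t")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * s / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Vehicular speed example: n = 1,296, sample mean 58.86, s = 4.41.
lower, upper = large_sample_ci(58.86, 4.41, 1296)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

For smaller samples, the multiplier would have to come from a t table with n - 1 degrees of freedom, which the standard library does not provide.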
Confidence Intervals
Confidence Interval for a Population Proportion
The relative frequency in a population may sometimes
be of interest. The confidence interval can be
computed using the following equation:
$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
Where p̂ is an estimator of the proportion in a
population, and q̂ = 1 – p̂.
The normal approximation is only good when np > 5 and
nq > 5.
Confidence Intervals
A transportation agency located in a small city wants to
know the percentage of drivers who were involved in a collision
during the last calendar year. A random sample of 1,000 drivers
is surveyed. From the sample, it was found that 110
drivers were involved in at least one collision. Compute the 90%
CI.
Answer
$$\hat{p} = 110/1000 = 0.11$$
$$\hat{q} = 1 - 0.11 = 0.89$$
$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.11 \pm 1.645\sqrt{\frac{0.11 \times 0.89}{1000}} = 0.11 \pm 0.016$$
$$CI = [0.094,\ 0.126]$$
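The proportion interval, including the check on the normal approximation, can be sketched as follows. The function name `proportion_ci` is hypothetical, and the multiplier again comes from the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.90):
    """Normal-approximation CI for a population proportion."""
    p_hat = successes / n
    q_hat = 1.0 - p_hat
    # The approximation is only considered good when np > 5 and nq > 5.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("normal approximation needs np > 5 and nq > 5")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width


# Collision example: 110 of 1,000 sampled drivers, 90% confidence.
lower, upper = proportion_ci(110, 1000, confidence=0.90)
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```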
Confidence Intervals
Confidence Interval for a Population Variance
When the population variance is of interest, the
confidence interval can be computed using the
following equation:
$$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right]$$
Where χ² is Chi-Square with n-1 degrees of freedom
Assumption: the population is normally distributed.
Confidence Intervals
Taking the same vehicular speed example as before,
compute the 95% confidence interval for the variance of the
speed distribution. A sample of 100 vehicles has shown a
variance equal to 19.51 mph².
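Answer
$$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right] = \left[\frac{99 \times 19.51}{129.56},\ \frac{99 \times 19.51}{74.22}\right] = [14.91,\ 26.02]$$
The values 129.56 and 74.22 are taken from the Chi-Square table. They match
the tabulated entries for 100 degrees of freedom, the closest table entry to
n - 1 = 99.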
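The variance interval is simple to script once the two Chi-Square critical values have been read from a table. The sketch below takes them as arguments rather than computing them, because the Python standard library has no Chi-Square quantile function. The function and parameter names are hypothetical.

```python
def variance_ci(sample_variance, n, chi2_upper, chi2_lower):
    """CI for the variance of a normally distributed population.

    chi2_upper is the critical value chi^2_{alpha/2} (the larger one)
    and chi2_lower is chi^2_{1 - alpha/2} (the smaller one); both are
    looked up in a Chi-Square table.
    """
    if chi2_upper <= chi2_lower:
        raise ValueError("chi2_upper must be larger than chi2_lower")
    numerator = (n - 1) * sample_variance
    return numerator / chi2_upper, numerator / chi2_lower


# Speed example: n = 100, s^2 = 19.51, table values 129.56 and 74.22.
lower, upper = variance_ci(19.51, 100, chi2_upper=129.56, chi2_lower=74.22)
print(f"95% CI for the variance: ({lower:.2f}, {upper:.2f})")
```

Dividing by the larger critical value gives the lower bound, which is why the two table values appear in reverse order in the interval.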
The Chi-Square Goodness-of-Fit
Non-parametric test useful for checking whether observations follow an
assumed distribution (for example, the normal distribution). Each cell needs
more than 5 observations. The test statistic is
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - P_i)^2}{P_i}$$
where O_i is the observed value and P_i is the estimated value for cell i,
and n is the number of cells.
If the value on the right-hand side is less than the critical Chi-Square value
with n-1 degrees of freedom, the observed and estimated values are not
significantly different. If not, the observed and estimated values are
significantly different.
You can also perform this test for two-way contingency tables.
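The test statistic can be computed directly from the observed and estimated cell values. In the sketch below the counts are hypothetical and purely illustrative. The critical value 7.815 is the usual tabulated Chi-Square value for a 5% significance level with 3 degrees of freedom, and the function name is an assumption.

```python
def chi_square_statistic(observed, estimated):
    """Sum of (O_i - P_i)^2 / P_i over all cells."""
    if len(observed) != len(estimated):
        raise ValueError("observed and estimated need the same length")
    return sum((o - p) ** 2 / p for o, p in zip(observed, estimated))


# Hypothetical counts for four cells (illustration only, not real data).
observed = [18, 22, 31, 29]
estimated = [25, 25, 25, 25]

statistic = chi_square_statistic(observed, estimated)
critical_value = 7.815  # table value: 5% level, 4 - 1 = 3 degrees of freedom

if statistic < critical_value:
    print(f"{statistic:.2f} < {critical_value}: no significant difference")
else:
    print(f"{statistic:.2f} >= {critical_value}: significant difference")
```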