Lecture 10. Random Sampling and Sampling Distributions
David R. Merrell
90-786 Intermediate Empirical Methods for Public Policy and Management
Agenda
Normal Approximation to Binomial
Poisson Process
Random sampling
Sampling statistics and sampling distributions
Expected values and standard errors of sample sums and sample means
Binomial Random Variable
Binomial random variable X is the number of “successes” in n trials, where
  Probability of success remains the same from trial to trial
  Trials are independent
Binomial Probability Distribution
Discrete distribution with:
  $P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}$
  n is number of trials
  x is number of successes in n trials (x = 0, 1, 2, ..., n)
  p is the probability of success on a single trial
  q is the probability of failure on a single trial
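The formula can be evaluated directly. The following is a minimal sketch in Python (not part of the original lecture, which uses Minitab); the function name `binomial_pmf` is an illustrative choice, and the values n = 10, p = 0.4 are used only as an example.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) for n independent trials with success probability p."""
    q = 1.0 - p  # probability of failure on a single trial
    return comb(n, x) * p**x * q ** (n - x)


# Example values (an assumption for illustration): n = 10 trials, p = 0.4
n, p = 10, 0.4
for x in range(n + 1):
    print(f"{x:2d}  {binomial_pmf(x, n, p):.6f}")
```

Summing the printed probabilities over x = 0, ..., n gives 1, which is a quick sanity check on any hand calculation.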
Properties of the Binomial RV
Mean: $\mu = np$
Variance: $\sigma^2 = npq$
Standard Deviation: $\sigma = \sqrt{npq}$
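As a check on these properties, the mean and variance can be computed from the probability distribution itself and compared with np and npq. This is a small sketch in Python under the assumption n = 10 and p = 0.4; the variable names are illustrative.

```python
from math import comb, sqrt

# Example values (an assumption for illustration)
n, p = 10, 0.4
q = 1 - p

# Probability of each possible count x = 0, 1, ..., n
pmf = [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]

# Mean and variance computed from the definition of expected value
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print("mean from distribution:", mean, " np =", n * p)
print("variance from distribution:", variance, " npq =", n * p * q)
print("standard deviation:", sqrt(variance))
```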

Binomial(n = 10, p = .4)
 x    P(X=x)
 0    0.006047
 1    0.040311
 2    0.120932
 3    0.214991
 4    0.250823
 5    0.200658
 6    0.111477
 7    0.042467
 8    0.010617
 9    0.001573
10    0.000105
Approximation to Binomial Distribution
Use normal distribution when:
  n is large
  np > 10
  n(1 - p) > 10
Parameters of the approximating normal distribution are the mean and standard deviation from the binomial distribution
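The two numeric conditions are easy to check mechanically. The sketch below is an illustration only: the helper name `normal_approx_ok` is hypothetical, and it tests just np > 10 and n(1 - p) > 10, since "n is large" is not quantified on the slide.

```python
def normal_approx_ok(n: int, p: float, threshold: float = 10.0) -> bool:
    """Check the rule of thumb np > 10 and n(1 - p) > 10."""
    return n * p > threshold and n * (1 - p) > threshold


# Cases taken from the lecture's own examples
print(normal_approx_ok(10, 0.4))    # np = 4, too small
print(normal_approx_ok(80, 0.4))    # np = 32, n(1 - p) = 48
print(normal_approx_ok(30, 0.375))  # np = 11.25, n(1 - p) = 18.75
```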
Approximation of Binomial Distribution
[Figure: probabilities for n = 80, p = .4; vertical axis C2 (0.00 to 0.09), horizontal axis C1]
How Good is the Approximation?
P(X < 29)
Binomial with n = 80 and p = 0.400000
      x    P( X <= x)
  28.00        0.2131
Normal with mean = 32.0000 and standard deviation = 4.38000
      x    P( X <= x)
28.0000        0.1806
28.5000        0.2121
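The comparison above can be reproduced without Minitab. The following is a sketch in Python using only the standard library; the helper names `binomial_cdf` and `normal_cdf` are illustrative. It evaluates the normal curve at 28 and at 28.5, the half-unit shift being the usual continuity correction for approximating a discrete count.

```python
from math import comb, erf, sqrt


def binomial_cdf(x: int, n: int, p: float) -> float:
    """Return P(X <= x) by summing the binomial probabilities."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Return P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


n, p = 80, 0.4
mean, sd = n * p, sqrt(n * p * (1 - p))

exact = binomial_cdf(28, n, p)                 # P(X < 29) = P(X <= 28)
without_correction = normal_cdf(28.0, mean, sd)
with_correction = normal_cdf(28.5, mean, sd)   # continuity correction

print(f"exact binomial:       {exact:.4f}")
print(f"normal at 28:         {without_correction:.4f}")
print(f"normal at 28.5:       {with_correction:.4f}")
```

The value at 28.5 lands much closer to the exact binomial probability than the value at 28, which is the point of the slide.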
Application 1
The Chicago Equal Employment Commission
believes that the Chicago Transit Authority
(CTA) discriminates against Republicans. The
records show that 37.5% of the individuals
listed as passing the CTA exam were
Republicans; the remainder were Democrats
(no one registers as an independent in
Illinois). CTA hired 30 people last year; 25 of
them were Democrats. What is the
probability that this situation could exist if
CTA did not discriminate?
Application 1 (cont.)
Success: a Republican is hired
The probability of success, p = 0.375
The number of trials, n = 30
The number of successes, x = 5
$P(X \le 5)$ = ???
Application 1 (cont.)
Mean: $\mu = np = 30 \times .375 = 11.25$
Variance: $\sigma^2 = npq = 30 \times .375 \times .625 = 7.03$
Standard Deviation: $\sigma = 2.65$
Normal with mean = 11.25 and standard deviation = 2.65
      x    P( X <= x)
 5.5000        0.0150
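For Application 1, the normal approximation can be set beside the exact binomial probability of 5 or fewer Republicans among 30 hires. This is a hedged sketch in Python, not the lecture's own calculation; the function names are illustrative, and the normal curve is evaluated at 5.5 as in the output above.

```python
from math import comb, erf, sqrt


def binomial_cdf(x: int, n: int, p: float) -> float:
    """Return P(X <= x) by summing the binomial probabilities."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Return P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


n, p = 30, 0.375
mean = n * p                   # 11.25
sd = sqrt(n * p * (1 - p))     # about 2.65

exact = binomial_cdf(5, n, p)
approx = normal_cdf(5.5, mean, sd)

print(f"exact P(X <= 5):         {exact:.4f}")
print(f"normal approx at 5.5:    {approx:.4f}")
```

Either way the probability is small, so hiring only 5 Republicans out of 30 would be unusual if hiring were unrelated to party.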
Poisson Process
[Figure: events (x) occurring at rate $\lambda$ along a time axis starting at 0]
Assumptions
time homogeneity
independence
no clumping
Poisson Process
Earthquakes strike randomly over time with a rate of $\lambda$ = 4 per year.
Model time of earthquake strike as a Poisson process
Count: How many earthquakes will strike in the next six months?
Duration: How long will it take before the next earthquake hits?
Count: Poisson Distribution
What is the probability that 3 earthquakes will strike during the next six months?
Poisson Distribution
Count in time period t
$P(Y = y) = \frac{e^{-\lambda t} (\lambda t)^y}{y!}, \quad y = 0, 1, \ldots$
Minitab Probability Calculation
Click: Calc > Probability Distributions > Poisson
Enter: For mean 2, input constant 3
Output:
Probability Density Function
Poisson with mu = 2.00000
      x    P( X = x)
   3.00       0.1804
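The same probability follows from the Poisson formula with rate 4 per year and a period of half a year, so the mean count is 4 × 0.5 = 2. Below is a minimal sketch in Python; the function name `poisson_pmf` and its parameter names are illustrative choices.

```python
from math import exp, factorial


def poisson_pmf(y: int, rate: float, t: float) -> float:
    """Return P(Y = y) for a Poisson count over a period of length t."""
    mean = rate * t  # expected number of events in the period
    return exp(-mean) * mean**y / factorial(y)


# Rate of 4 earthquakes per year, period of six months (0.5 year)
prob_three = poisson_pmf(3, rate=4.0, t=0.5)
print(f"P(Y = 3) = {prob_three:.4f}")
```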

Duration: Exponential Distribution
Time between occurrences in a Poisson process
Continuous probability distribution
Mean = $1/(\lambda t)$
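Exponential Probability Problem
What is the probability that 9 months will pass with no earthquake?
Working in months: $t = 1/12$ year, so $\lambda t = 4 \times 1/12 = 1/3$ earthquakes per month
Mean time between earthquakes: $1/(\lambda t) = 3$ months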
Minitab Probability Calculation
Click: Calc > Probability Distributions > Exponential
Enter: For mean 3, input constant 9
Output:
Cumulative Distribution Function
Exponential with mean = 3.00000
      x    P( X <= x)
 9.0000        0.9502
Probability that 9 months pass with no earthquake: 1 - 0.9502 = 0.0498
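The exponential cumulative probability has a closed form, $P(X \le x) = 1 - e^{-x/\text{mean}}$, so the Minitab value can be checked by hand. A short sketch in Python follows; the function name `exponential_cdf` is illustrative, and time is measured in months as in the problem.

```python
from math import exp


def exponential_cdf(x: float, mean: float) -> float:
    """Return P(X <= x) for an exponential distribution with the given mean."""
    return 1.0 - exp(-x / mean)


# Mean time between earthquakes is 3 months; ask about 9 months
cdf_at_9 = exponential_cdf(9.0, mean=3.0)
no_earthquake = 1.0 - cdf_at_9  # waiting time exceeds 9 months

print(f"P(X <= 9) = {cdf_at_9:.4f}")
print(f"P(X > 9)  = {no_earthquake:.4f}")
```

The second number is the answer to the question posed: the chance that the wait for the next earthquake exceeds 9 months.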

Exponential Probability Density Function
MTB > set c1
DATA > 0:12000
DATA > end
MTB > let c1 = c1/1000
Click: Calc > Probability distributions > Exponential > Probability density > Input column
Enter: Input column c1 > Optional storage c2
Click: OK > Graph > Plot
Enter: Y c2 > X c1
Click: Display > Connect > OK
Exponential Probability Density Function
[Figure: exponential probability density; vertical axis C2 (0.0 to 0.3), horizontal axis C1 (0 to 12)]
Sampling
Population - entire set of objects that we are interested in studying
Sample - a chosen subset of a population
Some Samples Are ...
random -- each item in the population has an equal chance of being selected to be part of the sample
representative -- has the same characteristics as the population under study, a microcosm of the population
Population Parameters and Sample Statistics
Population Parameter
  Numerical descriptor of a population
  Values usually uncertain
  e.g., population mean ($\mu$), population standard deviation ($\sigma$)
Sample Statistics
  Numerical descriptor of a sample
  Calculated from observations in the sample
  e.g., sample mean $\bar{X}$, sample standard deviation $S$
What is a sampling distribution?
Sample statistics are random variables
Sample statistics have probability distributions
“Sampling distribution” is the probability distribution of a sample statistic
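One way to see a sampling distribution is to draw many random samples from the same population and record the sample mean each time. The sketch below does this in Python with a synthetic population; the population (10,000 normal values with mean 5 and standard deviation 1), the sample size of 50, and the 1,000 repetitions are all assumptions made for illustration and are not the restaurant data used in the lecture.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic population (an assumption for illustration only)
population = [rng.gauss(5.0, 1.0) for _ in range(10_000)]

sample_size = 50
n_samples = 1_000

# Each random sample gives one value of the sample mean
sample_means = [
    mean(rng.sample(population, sample_size)) for _ in range(n_samples)
]

# Spread of the sample means versus population sd / sqrt(sample size)
spread_of_means = stdev(sample_means)
theoretical_se = pstdev(population) / sqrt(sample_size)

print(f"mean of sample means:    {mean(sample_means):.3f}")
print(f"population mean:         {mean(population):.3f}")
print(f"sd of sample means:      {spread_of_means:.3f}")
print(f"sd / sqrt(sample size):  {theoretical_se:.3f}")
```

The list `sample_means` is an empirical picture of the sampling distribution of the sample mean: its center sits near the population mean, and its spread is much smaller than the spread of the population.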
MTB > Retrieve 'C:\MTBWIN\DATA\RESTRNT.MTW'.
Retrieving worksheet from file: C:\MTBWIN\DATA\RESTRNT.MTW
Worksheet was saved on 5/31/1994
MTB > info
Information on the Worksheet
Column  Name      Count  Missing
C1      ID          279        0
C2      OUTLOOK     279        1
C3      SALES       279       25
C4      NEWCAP      279       55
C5      VALUE       279       39
C6      COSTGOOD    279       42
C7      WAGES       279       44
C8      ADS         279       44
C9      TYPEFOOD    279       12
C10     SEATS       279       11
C11     OWNER       279       10
C12     FT.EMPL     279       14
C13     PT.EMPL     279       13
C14     SIZE        279       16
MTB > desc 'sales'
Descriptive Statistics
Variable      N    N*    Mean   Median   TrMean    StDev   SEMean
SALES       254    25   332.6    200.0    248.9    650.5     40.8
Variable    Min      Max      Q1      Q3
SALES       0.0   8064.0    83.7   382.7
MTB > boxp 'sales'
* NOTE * N missing = 25
[Figure: boxplot of SALES, vertical axis from 0 to 8000]
MTB > hist 'sales'
* NOTE * N missing = 25
[Figure: histogram of SALES; horizontal axis SALES (0 to 8000), vertical axis Frequency (0 to 200)]
MTB > let c15 = loge('sales')
MTB > let c15 = loge('sales')
J
*** Values out of bounds during operation at J
Missing returned 1 times
MTB > let c15 = loge('sales' + 1)
MTB > name c15 'logsales'
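The first `loge` command fails for one observation because the minimum of SALES is 0 and the logarithm of 0 is undefined; adding 1 before taking the log avoids this. The sketch below shows the same effect in Python on two values taken from the descriptive statistics above (the minimum 0.0 and the maximum 8064.0); the list name `sales` is illustrative and holds only these two values, not the data set.

```python
from math import log

# Only the minimum and maximum of SALES, taken from the output above
sales = [0.0, 8064.0]

# log(0) is undefined: Python raises ValueError, Minitab returns a missing value
try:
    log(sales[0])
    log_of_zero_failed = False
except ValueError:
    log_of_zero_failed = True

# Shifting by 1 makes the transform defined for every non-negative value
logsales = [log(s + 1) for s in sales]

print("log(0) failed:", log_of_zero_failed)
print("log(sales + 1):", [round(v, 4) for v in logsales])
```

The transformed minimum and maximum agree with the Min and Max reported for 'logsales' in the output that follows.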
MTB > desc 'logsales'
Descriptive Statistics
Variable      N    N*     Mean   Median   TrMean    StDev   SEMean
logsales    254    25   5.1830   5.3033   5.2134   1.1387   0.0715
Variable       Min      Max       Q1       Q3
logsales    0.0000   8.9953   4.4394   5.9500
MTB > boxp 'logsales'
* NOTE * N missing = 25
[Figure: boxplot of logsales, vertical axis from 0 to 9]
[Figure: histogram of logsales; horizontal axis logsales (0 to 9), vertical axis Frequency (0 to 90)]
Four Samples of Size 50 From Restaurant “Logsales” Data--Histograms
[Figure: four histograms, one for each of the samples stored in C16, C17, C18 and C19, each with Frequency on the vertical axis]
Random Samples from Restaurant “Logsales” Data--Summary
MTB > Desc c16-c19
Descriptive Statistics
Variable     N    N*    Mean   Median   TrMean   StDev   SEMean
C16         43     7   5.246    5.375    5.280   0.867    0.132
C17         43     7   5.351    5.352    5.383   1.223    0.186
C18         48     2   5.366    5.461    5.388   0.888    0.128
C19         43     7   5.244    5.198    5.253   0.937    0.143
Variable     Min     Max      Q1      Q3
C16        2.773   6.621   4.625   5.787
C17        1.099   8.456   4.710   6.176
C18        2.485   7.091   4.961   5.994
C19        3.434   6.868   4.595   6.089
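The SEMean column is the standard error of the sample mean, which is the sample standard deviation divided by the square root of the number of non-missing observations. The sketch below recomputes it in Python from the N and StDev values printed above; because StDev is itself rounded to three decimals, agreement is only expected to within rounding. The dictionary name `summary` is illustrative.

```python
from math import sqrt

# (N, StDev, SEMean) for each sample, copied from the output above
summary = {
    "C16": (43, 0.867, 0.132),
    "C17": (43, 1.223, 0.186),
    "C18": (48, 0.888, 0.128),
    "C19": (43, 0.937, 0.143),
}

# Standard error of the mean = StDev / sqrt(N)
recomputed = {name: sd / sqrt(n) for name, (n, sd, _) in summary.items()}

for name, (n, sd, reported) in summary.items():
    print(f"{name}: reported {reported:.3f}, recomputed {recomputed[name]:.4f}")
```

The four sample means (5.246, 5.351, 5.366, 5.244) differ from one another by amounts comparable to these standard errors, which is what the sampling distribution of the mean describes.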
Next Time ...
Central Limit Theorem--“Sample averages are approximately normally distributed”