ECONOMICS 4630: REVIEW FOR FINAL
General:
The test will consist of two sorts of problems. The first will be a series of statements to which you will be asked to respond, deciding whether each statement is true, false, or uncertain. You must explain your reasoning in order to get any credit. The second section will involve problems much like those on your homework sets (see also the problems in the text). You should be able not only to solve problems but also to demonstrate an understanding of what you are doing and why. The chapters we covered in the 7th edition of the textbook are 1–11 and 13–14 (Topics 1–13 in the course booklet).
I will provide you with a formula sheet (see attached), so there is no need to memorize formulae. Bring a calculator that is not programmable, and make sure its batteries are fresh. NOTE: cell phones cannot be turned on for any reason during the test.
I. Introduction
   A. Probability and Statistics
      1. What is probability and what is statistics?
      2. What are probability and statistics good for?
   B. Types of Data
   C. Types of Variables
   D. Levels of Measurement
   E. Terminology
II. Descriptive Statistics
   A. Location or Central Tendency: Using Graphs
   B. Location or Central Tendency: Numerical Methods
      1. Mode
      2. Median and other percentiles
      3. Mean
   C. The Spread of a Distribution
      1. Range
      2. Interquartile range (IQR)
      3. Variance and standard deviation (population)
      4. Variance and standard deviation (sample)
III. Probability Theory
   A. What Is Probability in General?
   B. Probabilities of More Complex Events
      1. Probability Trees
      2. Outcome Sets
   C. Combinations of Events
      1. Union
      2. Intersection
      3. Complements
   D. Conditional Probability
   E. Independence
   F. Joint Distributions
IV. Discrete Probability Distributions
   A. Discrete probability distributions in general
   B. The uniform distribution
   C. The binomial (or Bernoulli) distribution
   D. The hypergeometric distribution
   E. The Poisson distribution
V. Continuous Probability Distributions
   A. Probability density
   B. Continuous distributions in general
   C. The normal distribution
      1. The standard normal
      2. The general normal
VI. Sampling
   A. Why sample?
   B. Probability sampling methods
      1. Simple random sampling
      2. Systematic random sampling
      3. Stratified random sampling
      4. Stratified cluster sampling
   C. Sampling error and sampling distributions
   D. The central limit theorem
VII. Point Estimates and Confidence Intervals
   A. When σ is known
   B. When σ is unknown
   C. Confidence intervals for proportions
   D. Selecting the proper sample size
VIII. One-Sample Hypothesis Testing
   A. Hypotheses and hypothesis testing in general
   B. Testing procedures
   C. One-tailed tests vs. two-tailed tests
   D. When σ is known
   E. When σ is unknown
   F. Hypothesis Testing Regarding Proportions
IX. Two-Sample Hypothesis Testing
   A. When σ is known
   B. When σ is unknown
   C. Hypothesis Testing Regarding Proportions
X. Correlation
   A. Correlation versus causation
   B. Scatterplots
   C. Correlation coefficient
   D. Coefficient of determination
   E. Testing hypotheses regarding the correlation coefficient
XI. Simple Regression
   A. Introduction
      1. Deterministic vs. statistical relationships
      2. Scatterplots
   B. Ordinary Least Squares
      1. Fitted values
      2. Residuals
   C. Interpretation of OLS coefficients
   D. Variance of OLS coefficients
   E. Goodness of Fit
   F. Hypothesis Testing
XII. Multiple Regression
   A. Introduction
   B. Interpreting OLS Coefficients
   C. Multiple Regression Using Excel
   D. Dummy Explanatory Variables
FORMULA SHEET
To find percentiles (grouped data)
$$q\text{th percentile} = L + \left(\frac{qn - CF}{f}\right) i$$
where:
L = the lower limit of the class containing the percentile of interest
q = the percentile of interest, stated in decimal terms (e.g., the 75th percentile would be 0.75)
n = total number of frequencies
f = frequency in the class containing the percentile of interest
CF = cumulative number of frequencies in the classes preceding the class containing the percentile of interest
i = class interval
To find percentiles (raw data)
$$\text{Position of the } Q\text{th percentile} = (n + 1)\frac{Q}{100}$$
where Q is the percentile of interest stated in percent terms (e.g., the 75th percentile would be 75).
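A short sketch of the raw-data rule, assuming that when the position is not a whole number we interpolate between the two neighbouring sorted observations (the usual textbook convention; spreadsheet functions may use a different one). The function names and the sample are hypothetical.

```python
import math


def percentile_position(n, Q):
    """Position (1-based) of the Qth percentile among n sorted observations."""
    return (n + 1) * Q / 100


def raw_percentile(data, Q):
    """Qth percentile of raw data, interpolating between neighbouring observations."""
    xs = sorted(data)
    n = len(xs)
    # Assumption: positions outside 1..n are clamped to the extreme observations
    pos = min(max(percentile_position(n, Q), 1), n)
    whole = math.floor(pos)
    frac = pos - whole
    if frac == 0:
        return xs[whole - 1]                 # convert the 1-based position to an index
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])


data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]     # hypothetical sample, n = 10
q1 = raw_percentile(data, 25)                # position 2.75
q2 = raw_percentile(data, 50)                # position 5.5
q3 = raw_percentile(data, 75)                # position 8.25
```

With these values the interquartile range is q3 − q1 = 10.5 − 4 = 6.5.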
2^k rule
When grouping data, choose the smallest number, k, such that 2^k > n, where n is the sample size.
Rule for Determining Class Interval
$$i \ge \frac{H - L}{k}$$
where i is the class interval, H and L are the largest and smallest observations, and k is the number of classes.
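The two grouping rules can be sketched together as follows. The sample size, the largest and smallest observations, and the choice to round the interval up to the next multiple of 100 are all hypothetical; any convenient width at least as large as (H − L)/k would do.

```python
import math


def number_of_classes(n):
    """2^k rule: the smallest k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def minimum_class_interval(high, low, k):
    """Lower bound on the class interval: i >= (H - L) / k."""
    return (high - low) / k


# Hypothetical data set: n = 80 observations ranging from 1200 to 3499
k = number_of_classes(80)                       # 2**6 = 64 <= 80 < 128 = 2**7, so k = 7
i_min = minimum_class_interval(3499, 1200, k)   # about 328.4
i = math.ceil(i_min / 100) * 100                # round up to a convenient width: 400
```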
Complements
If $\bar{A}$ is the complement of A, then
$$P(\bar{A}) = 1 - P(A)$$
Interquartile Range
IQR = Q3 - Q1, where Q1 is the 25th percentile and Q3 is the 75th percentile
Linear Combinations of Random Variables
If $Y = a + bX$, then
$$\mu_Y = a + b\,\mu_X \quad\text{and}\quad \sigma_Y = |b|\,\sigma_X$$
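A quick numerical check of these two rules, assuming a small hypothetical distribution for X and a negative slope b (which is why the standard deviation uses |b|). The helper `mean_sd` is hypothetical.

```python
import math


def mean_sd(values, probs):
    """Mean and standard deviation of a discrete random variable."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, math.sqrt(var)


# Hypothetical distribution of X
xs = [0, 1, 2, 3]
ps = [0.1, 0.3, 0.4, 0.2]
a, b = 5, -2                                      # Y = a + bX with a negative slope

mu_x, sd_x = mean_sd(xs, ps)                      # 1.7 and 0.9
mu_y, sd_y = mean_sd([a + b * x for x in xs], ps) # computed directly from Y's values

# The shortcut formulas agree with the direct calculation
assert math.isclose(mu_y, a + b * mu_x)
assert math.isclose(sd_y, abs(b) * sd_x)
```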
Special rule of multiplication
If A, B, C, …, Z are independent events (that is, the occurrence of any one of them has no effect on the probability of any other), then
$$P(A \cap B \cap C \cap \cdots \cap Z) = P(A)\,P(B)\,P(C)\cdots P(Z)$$
Unions and Intersections
$$P(X \cup Y) = P(X) + P(Y) - P(X \cap Y)$$
This is the “general rule of addition”. If X and Y are mutually exclusive, then
$$P(X \cup Y) = P(X) + P(Y)$$
This is the “special rule of addition”.
Conditional Probability
$$P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)}$$
or, using probability distribution notation,
$$P(x \mid y) = \frac{P(x, y)}{P(y)}$$
Rearranged as $P(X \cap Y) = P(Y)\,P(X \mid Y)$, these are the “general rule of multiplication”.
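The addition and conditional-probability rules can be checked on a small joint table. The probabilities below are hypothetical; exact fractions are used so the equalities hold without rounding error.

```python
from fractions import Fraction as F

# Hypothetical joint probabilities P(x, y) for two events:
# x = 1 means event X occurs, y = 1 means event Y occurs
joint = {
    (1, 1): F(1, 10), (1, 0): F(3, 10),
    (0, 1): F(2, 10), (0, 0): F(4, 10),
}

p_x = sum(p for (x, y), p in joint.items() if x == 1)   # P(X) = 4/10
p_y = sum(p for (x, y), p in joint.items() if y == 1)   # P(Y) = 3/10
p_x_and_y = joint[(1, 1)]                                # P(X and Y) = 1/10

# General rule of addition
p_x_or_y = p_x + p_y - p_x_and_y                         # 6/10

# Conditional probability P(X | Y) = P(X and Y) / P(Y)
p_x_given_y = p_x_and_y / p_y                            # 1/3

# General rule of multiplication, rearranged
assert p_x_and_y == p_y * p_x_given_y
```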
Independence
Using set notation, two events, X and Y, are independent if
$$P(X \cap Y) = P(X)\,P(Y) \quad\text{or if}\quad P(X \mid Y) = P(X)$$
Using probability distribution notation, X and Y are independent if
$$P(x, y) = P(x)\,P(y) \text{ for all } x, y \quad\text{or if}\quad P(x \mid y) = P(x) \text{ for all } x, y$$
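A minimal sketch of the "for all x, y" check on a joint distribution: compute the marginals, then compare every cell with the product of its marginals. Both example tables and the helper names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def marginals(joint):
    """Marginal distributions P(x) and P(y) from a joint table {(x, y): P(x, y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return px, py


def independent(joint):
    """True if P(x, y) = P(x) P(y) for every pair (x, y)."""
    px, py = marginals(joint)
    return all(joint.get((x, y), 0) == px[x] * py[y] for x, y in product(px, py))


# Hypothetical independent example: a fair coin and a die roll classified as even/odd
coin_die = {(x, y): F(1, 4) for x in ("H", "T") for y in ("even", "odd")}
# Hypothetical dependent example: P(1, 1) = 1/10 but P(X=1) P(Y=1) = 4/10 * 3/10
dependent = {(1, 1): F(1, 10), (1, 0): F(3, 10), (0, 1): F(2, 10), (0, 0): F(4, 10)}
```

Exact fractions avoid false "dependent" verdicts caused by floating-point rounding.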
Mean, Variance, and Standard Deviation (population formulae)
$$\mu = \sum x\,P(x)$$
$$\sigma^2 = \sum (x - \mu)^2 P(x) \quad\text{or}\quad \sigma^2 = \sum x^2 P(x) - \mu^2$$
$$\sigma = \sqrt{\sigma^2}$$
Mean, Variance, and Standard Deviation (sample formulae for raw data)
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$$
$$S = \sqrt{S^2}$$
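A sketch of the raw-data sample formulas on a hypothetical sample, cross-checked against Python's standard `statistics` module, whose `variance` and `stdev` also divide by n − 1.

```python
import math
import statistics


def sample_stats(xs):
    """Sample mean, variance (n - 1 divisor) and standard deviation."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return xbar, s2, math.sqrt(s2)


data = [4, 8, 6, 5, 7]                 # hypothetical sample
xbar, s2, s = sample_stats(data)       # 6.0, 2.5, about 1.581

assert math.isclose(s2, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))
```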
Mean, Variance, and Standard Deviation (sample formulae for grouped data)
$$\bar{X} = \frac{1}{n}\sum_{j=1}^{J} X_j f_j$$
$$S^2 = \frac{1}{n - 1}\sum_{j=1}^{J} f_j \left(X_j - \bar{X}\right)^2$$
$$S = \sqrt{S^2}$$
where j = 1, 2, …, J is the class number, X_j is the midpoint of class j, f_j is the frequency in class j, and n = Σ f_j.
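The grouped-data versions replace each observation with its class midpoint, weighted by the class frequency. The classes and counts below are hypothetical.

```python
import math


def grouped_stats(midpoints, freqs):
    """Sample mean, variance and standard deviation from grouped data."""
    n = sum(freqs)
    xbar = sum(m * f for m, f in zip(midpoints, freqs)) / n
    s2 = sum(f * (m - xbar) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)
    return xbar, s2, math.sqrt(s2)


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, so the midpoints are 5, 15, 25, 35
mids = [5, 15, 25, 35]
freqs = [4, 6, 8, 2]
xbar, s2, s = grouped_stats(mids, freqs)   # xbar = 380 / 20 = 19, s2 = 1680 / 19
```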
Uniform Probability Distribution
$$P(x) = \frac{1}{b - a + 1}$$
where a and b are the minimum and maximum values, respectively.
Mean and variance of uniform
$$\mu = \frac{a + b}{2}$$
$$\sigma^2 = \frac{(b - a)(b - a + 2)}{12}$$
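For a fair six-sided die (a = 1, b = 6), the closed-form mean and variance can be checked by enumerating the distribution. Exact fractions are used; the function names are hypothetical.

```python
from fractions import Fraction as F


def uniform_pmf(a, b):
    """P(x) = 1 / (b - a + 1) for each integer x from a to b."""
    return {x: F(1, b - a + 1) for x in range(a, b + 1)}


def uniform_moments(a, b):
    """Mean (a + b) / 2 and variance (b - a)(b - a + 2) / 12."""
    return F(a + b, 2), F((b - a) * (b - a + 2), 12)


pmf = uniform_pmf(1, 6)
mu, var = uniform_moments(1, 6)          # 7/2 and 35/12

# Cross-check the formulas by enumerating the distribution directly
mu_direct = sum(x * p for x, p in pmf.items())
var_direct = sum((x - mu_direct) ** 2 * p for x, p in pmf.items())
assert (mu, var) == (mu_direct, var_direct)
```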
Binomial Probability Distribution
$$P(x) = \binom{n}{x}\pi^x (1 - \pi)^{n - x}$$
where:
π = probability of success
n = # of trials
x = # of successes in n trials
$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!} = \frac{n(n - 1)(n - 2)\cdots(1)}{[x(x - 1)(x - 2)\cdots(1)]\,[(n - x)(n - x - 1)(n - x - 2)\cdots(1)]}$$
Mean and variance of binomial
$$\mu_X = n\pi$$
$$\sigma_X^2 = n\pi(1 - \pi)$$
Hypergeometric Probability Distribution
$$P(X) = \frac{\dfrac{S!}{X!\,(S - X)!}\cdot\dfrac{(N - S)!}{(n - X)!\,[(N - S) - (n - X)]!}}{\dfrac{N!}{n!\,(N - n)!}}$$
where:
S = number of successes in population
n = sample size (# of trials)
N = population size
N − S = # of failures in the population
X = number of successes in the sample
Mean and variance of hypergeometric
$$\mu_X = \frac{nS}{N}$$
$$\sigma_X^2 = \frac{nS(N - S)}{N^2}\cdot\frac{N - n}{N - 1}$$
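The three factorial ratios in the hypergeometric formula are binomial coefficients, so a sketch can write P(X) as C(S, X) C(N − S, n − X) / C(N, n). The population and sample sizes are hypothetical.

```python
import math


def hypergeom_pmf(x, N, S, n):
    """P(X = x) = C(S, x) C(N - S, n - x) / C(N, n)."""
    return math.comb(S, x) * math.comb(N - S, n - x) / math.comb(N, n)


def hypergeom_moments(N, S, n):
    """Mean nS/N and variance [nS(N - S)/N^2][(N - n)/(N - 1)]."""
    mean = n * S / N
    var = (n * S * (N - S) / N ** 2) * ((N - n) / (N - 1))
    return mean, var


# Hypothetical: 50 items, 10 of them successes, sample of 5 drawn without replacement
N, S, n = 50, 10, 5
probs = [hypergeom_pmf(x, N, S, n) for x in range(n + 1)]
mean, var = hypergeom_moments(N, S, n)

# Cross-check the closed-form moments against the full distribution
assert math.isclose(sum(probs), 1.0)
assert math.isclose(sum(x * p for x, p in enumerate(probs)), mean)
assert math.isclose(sum((x - mean) ** 2 * p for x, p in enumerate(probs)), var)
```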
Poisson Probability Distribution
$$P(X) = \frac{\mu^X e^{-\mu}}{X!}$$
where:
e = 2.7183
X = # of successes
μ = average (mean) number of successes
Mean and variance of Poisson
$$\mu_X = \mu$$
$$\sigma_X^2 = \mu$$
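A sketch of the Poisson formula with a hypothetical mean of 2.5 successes per interval. The infinite sums for the mean and variance are truncated at 60 terms, which is an assumption; the neglected tail is far below floating-point precision for this μ.

```python
import math


def poisson_pmf(x, mu):
    """P(X = x) = mu^x e^(-mu) / x!."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


mu = 2.5                                               # hypothetical mean
p0 = poisson_pmf(0, mu)                                # e^(-2.5), about 0.0821
p_at_most_2 = sum(poisson_pmf(x, mu) for x in range(3))

# Both the mean and the variance equal mu (sums truncated at x = 59)
xs = range(60)
mean = sum(x * poisson_pmf(x, mu) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in xs)
```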
Standard Normal Probability Distribution
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}$$
General Normal Probability Distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
‘Standardizing’ a Normally Distributed Random Variable
If X is distributed normally with mean μ and standard deviation σ, then
$$Z = \frac{X - \mu}{\sigma}$$
will be distributed standard normal.
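A sketch using the standard library's `statistics.NormalDist` (Python 3.8+) in place of a printed z table. It standardizes a hypothetical value, looks up the standard normal probability, and checks the hand-coded general normal density against the library's `pdf`.

```python
import math
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma


def normal_density(x, mu, sigma):
    """General normal density from the formula sheet."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


# Hypothetical example: X is normal with mu = 100 and sigma = 15; find P(X < 130)
mu, sigma = 100, 15
z = standardize(130, mu, sigma)           # 2.0
p_below = NormalDist().cdf(z)             # standard normal lookup, about 0.9772

# Working with X directly gives the same probability
assert math.isclose(p_below, NormalDist(mu, sigma).cdf(130))
# The hand-coded density agrees with the library's
assert math.isclose(normal_density(110, mu, sigma), NormalDist(mu, sigma).pdf(110))
```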
The Central Limit Theorem
In repeated random samples of a given size n, the sampling distribution of the sample mean is approximately normal (more closely so as n grows), with mean μ and standard deviation
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
(If σ is unknown, we use our estimate of it, S.)
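A small simulation sketch of the theorem, assuming a deliberately skewed hypothetical population (exponential with mean and standard deviation both 10). The spread of the simulated sample means should come out near σ/√n = 10/6.

```python
import random
import statistics

random.seed(1)                  # fixed seed so the simulation is reproducible

# Hypothetical skewed population: exponential with mean 10 and standard deviation 10
mu = sigma = 10
n = 36                          # size of each sample
reps = 20_000                   # number of repeated samples

sample_means = [
    statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(reps)
]

sim_mean = statistics.fmean(sample_means)   # should be close to mu = 10
sim_sd = statistics.stdev(sample_means)     # should be close to sigma / sqrt(n), about 1.667
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly right-skewed.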
Margin of Error
$$E = t\,\frac{S}{\sqrt{n}}$$
where:
E is the tolerable margin of error
t is the critical value associated with a given confidence level from the t table
S is the sample standard deviation
n is the sample size
Confidence Intervals, σ Known
$$\bar{X} \pm z\,\frac{\sigma}{\sqrt{n}}$$
where z depends on the level of confidence (for example, if α = 0.05, z = 1.96).
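A sketch of the σ-known interval, using `NormalDist().inv_cdf` to find the two-tailed z value (1.96 at 95% confidence) instead of a table. The sample mean, σ and n are hypothetical.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """X-bar +/- z sigma / sqrt(n) when the population standard deviation is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when conf = 0.95
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical: sample mean 50 from n = 49 observations, known sigma = 7
lo, hi = z_interval(50, 7, 49)                # roughly (48.04, 51.96)
```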
Confidence Intervals, σ Unknown
$$\bar{X} \pm t\,\frac{S}{\sqrt{n}}$$
where t depends on the level of confidence (for example, if α = 0.01 and degrees of freedom = 9, t = 3.25).
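Python's standard library has no Student t quantile function, so this sketch takes the critical value as an argument, read from the t table at n − 1 degrees of freedom. It returns both the interval and the margin of error E = tS/√n. The data are hypothetical.

```python
import math
import statistics


def t_interval(data, t_crit):
    """X-bar +/- t S / sqrt(n); t_crit comes from the t table at n - 1 df."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)                 # sample standard deviation S
    margin = t_crit * s / math.sqrt(n)         # margin of error E
    return xbar - margin, xbar + margin, margin


# Hypothetical sample of n = 10, so df = 9; 99% confidence gives t = 3.250
data = [12, 15, 11, 14, 13, 16, 12, 15, 14, 18]
lo, hi, E = t_interval(data, t_crit=3.250)     # X-bar = 14, S / sqrt(n) = 2/3
```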
Confidence Intervals, Proportions
$$p \pm z\sqrt{\frac{p(1 - p)}{n}}$$
where z depends on the level of confidence (for example, if α = 0.05, z = 1.96), and p is the sample proportion.
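A sketch of the proportion interval on hypothetical survey counts. The z value again comes from `NormalDist().inv_cdf`.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, conf=0.95):
    """p +/- z sqrt(p (1 - p) / n), where p is the sample proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Hypothetical: 120 of 400 respondents answered "yes", so p = 0.30
lo, hi = proportion_interval(120, 400)   # roughly (0.255, 0.345)
```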
Test Statistic (σ is known)
$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
Test Statistic (σ is unknown)
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
where μ_0 is the value of the mean hypothesized under the null.
NOTE: use n − 1 degrees of freedom.
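A sketch of both one-sample statistics on hypothetical data. The z test also gets a two-tailed p-value from `NormalDist`. The t statistic is returned with its degrees of freedom, to be compared with the t table because the standard library has no t distribution.

```python
import math
import statistics
from statistics import NormalDist


def z_statistic(xbar, mu0, sigma, n):
    """z = (X-bar - mu0) / (sigma / sqrt(n)), for sigma known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


def t_statistic(data, mu0):
    """t = (X-bar - mu0) / (S / sqrt(n)), for sigma unknown; returns (t, df)."""
    n = len(data)
    s = statistics.stdev(data)
    return (statistics.fmean(data) - mu0) / (s / math.sqrt(n)), n - 1


# Hypothetical z test: H0: mu = 200, known sigma = 16, n = 64, observed X-bar = 204
z = z_statistic(204, 200, 16, 64)                  # 4 / 2 = 2.0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-tailed, about 0.046

# Hypothetical t test: H0: mu = 13 on a sample of n = 10
data = [12, 15, 11, 14, 13, 16, 12, 15, 14, 18]
t, df = t_statistic(data, 13)                      # 1.5 with df = 9
```

Since 1.5 is below the two-tailed 5% critical value of 2.262 at df = 9, this hypothetical t test would not reject H0.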
Test Statistic, Proportions
$$z = \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}}$$
where p is the sample proportion and π is the population proportion hypothesized under the null.
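A sketch of the one-sample proportion test. Note that the standard error uses the hypothesized π, not the sample p. The counts and the 5% two-tailed decision rule are hypothetical choices.

```python
import math
from statistics import NormalDist


def proportion_z(successes, n, pi0):
    """z = (p - pi0) / sqrt(pi0 (1 - pi0) / n), with pi0 hypothesized under the null."""
    p = successes / n
    return (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)


# Hypothetical: H0: pi = 0.5; 290 successes in a sample of 500 (p = 0.58)
z = proportion_z(290, 500, 0.5)                   # about 3.58
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-tailed p-value
reject = abs(z) > 1.96                            # two-tailed test at alpha = 0.05
```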
Test Statistic, Comparing Two Means (σ known)
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
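A one-function sketch of the two-sample z statistic, assuming both population standard deviations are known. The summary numbers are hypothetical.

```python
import math


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """z = (X-bar1 - X-bar2) / sqrt(sigma1^2 / n1 + sigma2^2 / n2)."""
    return (xbar1 - xbar2) / math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


# Hypothetical groups with known sigma1 = 10 and sigma2 = 12
z = two_sample_z(xbar1=85, xbar2=80, sigma1=10, sigma2=12, n1=50, n2=40)   # 5 / sqrt(5.6)
```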
Test Statistic, Comparing Two Means (σ unknown)
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
NOTE: use (n1 + n2 − 2) degrees of freedom.
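A sketch of the pooled-variance t test on two hypothetical samples. It returns the statistic together with its n1 + n2 − 2 degrees of freedom for use with the t table.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Pooled-variance t statistic and its degrees of freedom (n1 + n2 - 2)."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = statistics.variance(sample1), statistics.variance(sample2)
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)   # pooled variance
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = (statistics.fmean(sample1) - statistics.fmean(sample2)) / se
    return t, n1 + n2 - 2


# Hypothetical samples: means 24.8 and 20.0, variances 3.7 and 2.0
a = [22, 25, 27, 24, 26]
b = [18, 20, 21, 19, 22, 20]
t, df = pooled_t(a, b)          # about 4.78 with df = 9
```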
Test Statistic, Comparing Two Proportions
$$z = \frac{p_1 - p_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where p1 is the sample proportion from group 1, p2 is the sample proportion from group 2, n1 is the sample size of group 1, n2 is the sample size of group 2, and
$$\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$
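A sketch of the two-proportion test. The pooled proportion p̄ equals the total number of successes divided by the total sample size. The counts are hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z = (p1 - p2) / sqrt(p_bar (1 - p_bar) (1/n1 + 1/n2)) with the pooled p_bar."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)     # same as (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Hypothetical: 60 of 200 in group 1 versus 45 of 250 in group 2
z = two_proportion_z(60, 200, 45, 250)          # about 2.99
```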
Test Statistic for Correlation Coefficient, ρ
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$
NOTE: use (n − 2) degrees of freedom.
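A sketch that computes the sample correlation r from hypothetical paired data and then the t statistic for H0: ρ = 0. Squaring r gives the coefficient of determination.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t = r sqrt(n - 2) / sqrt(1 - r^2), returned with its n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2), n - 2


xs = [1, 2, 3, 4, 5, 6]            # hypothetical paired data
ys = [2, 4, 5, 4, 6, 7]
r = pearson_r(xs, ys)              # about 0.916; r ** 2 is about 0.839
t, df = correlation_t(r, len(xs))  # about 4.56 with df = 4
```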
Test Statistics, t Distribution (testing estimated regression coefficients)
$$t = \frac{\hat{\beta}_1 - c}{SE(\hat{\beta}_1)}$$
where c is the value of β1 hypothesized under the null.
$$t = \frac{\hat{\beta}_2 - c}{SE(\hat{\beta}_2)}$$
where c is the value of β2 hypothesized under the null.
$$t = \frac{\hat{\beta}_3 - c}{SE(\hat{\beta}_3)}$$
where c is the value of β3 hypothesized under the null.
NOTE: degrees of freedom will be n − k, where k is the number of coefficients to be estimated.
Using the book’s notation,
$$t = \frac{b_i - b_0}{SE(b_i)}$$
where b_i is the estimated OLS regression coefficient, SE(b_i) is the standard error of that estimate, and b_0 is the value of b hypothesized under the null.
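Estimated OLS Regression Coefficients
$$\hat{\beta}_1 = \frac{\sum X_i^2 \sum Y_i - \sum X_i \sum X_i Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} \qquad \hat{\beta}_2 = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2}$$
OR
$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2\bar{X} \qquad \hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}$$
where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.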
Standard Errors of Estimated OLS Coefficients
$$SE(\hat{\beta}_1) = \sqrt{\frac{\sum X_i^2}{n\sum x_i^2}\cdot\frac{\sum \hat{u}_i^2}{n - 2}}$$
$$SE(\hat{\beta}_2) = \sqrt{\frac{\sum \hat{u}_i^2/(n - 2)}{\sum x_i^2}}$$
where $x_i = X_i - \bar{X}$ and $\hat{u}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$.
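A sketch of simple OLS built from the deviation formulas above, with `b1` and `b2` standing for the intercept β̂1 and slope β̂2. It also computes both standard errors and the t statistics for the hypothetical null value c = 0, using n − 2 degrees of freedom. The slope is cross-checked against the raw-sums formula. The data and the function name are hypothetical.

```python
import math


def ols_simple(xs, ys):
    """Intercept b1, slope b2, standard errors and t statistics for H0: coefficient = 0."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    x_dev = [x - xbar for x in xs]                              # x_i = X_i - X-bar
    y_dev = [y - ybar for y in ys]                              # y_i = Y_i - Y-bar
    sxx = sum(d * d for d in x_dev)
    b2 = sum(dx * dy for dx, dy in zip(x_dev, y_dev)) / sxx     # slope
    b1 = ybar - b2 * xbar                                       # intercept
    resid = [y - b1 - b2 * x for x, y in zip(xs, ys)]           # u-hat_i
    sigma2_hat = sum(u * u for u in resid) / (n - 2)            # sum u-hat^2 / (n - 2)
    se_b1 = math.sqrt(sum(x * x for x in xs) / (n * sxx) * sigma2_hat)
    se_b2 = math.sqrt(sigma2_hat / sxx)
    return {"b1": b1, "b2": b2, "se_b1": se_b1, "se_b2": se_b2,
            "t_b1": b1 / se_b1, "t_b2": b2 / se_b2, "df": n - 2}


xs = [1, 2, 3, 4, 5, 6]          # hypothetical data
ys = [2, 4, 5, 4, 6, 7]
fit = ols_simple(xs, ys)

# Cross-check against the raw-sums formulas
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx_raw = sum(x * x for x in xs)
denom = n * sxx_raw - sx ** 2
assert math.isclose(fit["b2"], (n * sxy - sx * sy) / denom)
assert math.isclose(fit["b1"], (sxx_raw * sy - sx * sxy) / denom)
```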
Binomial Coefficients
$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$
Rows give n; columns give x.
  n       0       1       2       3       4       5       6       7       8       9      10
  1       1       1
  2       1       2       1
  3       1       3       3       1
  4       1       4       6       4       1
  5       1       5      10      10       5       1
  6       1       6      15      20      15       6       1
  7       1       7      21      35      35      21       7       1
  8       1       8      28      56      70      56      28       8       1
  9       1       9      36      84     126     126      84      36       9       1
 10       1      10      45     120     210     252     210     120      45      10       1
 11       1      11      55     165     330     462     462     330     165      55      11
 12       1      12      66     220     495     792     924     792     495     220      66
 13       1      13      78     286     715   1,287   1,716   1,716   1,287     715     286
 14       1      14      91     364   1,001   2,002   3,003   3,432   3,003   2,002   1,001
 15       1      15     105     455   1,365   3,003   5,005   6,435   6,435   5,005   3,003
 16       1      16     120     560   1,820   4,368   8,008  11,440  12,870  11,440   8,008
 17       1      17     136     680   2,380   6,188  12,376  19,448  24,310  24,310  19,448
 18       1      18     153     816   3,060   8,568  18,564  31,824  43,758  48,620  43,758
 19       1      19     171     969   3,876  11,628  27,132  50,388  75,582  92,378  92,378
 20       1      20     190   1,140   4,845  15,504  38,760  77,520 125,970 167,960 184,756
Note: 0! = 1
Student’s t Distribution
Confidence Intervals
          80%      90%      95%      98%      99%    99.9%
Level of Significance for One-Tailed Test
        0.100    0.050    0.025    0.010    0.005   0.0005
Level of Significance for Two-Tailed Test
         0.20     0.10     0.05     0.02     0.01    0.001
  df
   1    3.078    6.314   12.706   31.821   63.657  636.619
   2    1.886    2.920    4.303    6.965    9.925   31.599
   3    1.638    2.353    3.182    4.541    5.841   12.924
   4    1.533    2.132    2.776    3.747    4.604    8.610
   5    1.476    2.015    2.571    3.365    4.032    6.869
   6    1.440    1.943    2.447    3.143    3.707    5.959
   7    1.415    1.895    2.365    2.998    3.499    5.408
   8    1.397    1.860    2.306    2.896    3.355    5.041
   9    1.383    1.833    2.262    2.821    3.250    4.781
  10    1.372    1.812    2.228    2.764    3.169    4.587
  11    1.363    1.796    2.201    2.718    3.106    4.437
  12    1.356    1.782    2.179    2.681    3.055    4.318
  13    1.350    1.771    2.160    2.650    3.012    4.221
  14    1.345    1.761    2.145    2.624    2.977    4.140
  15    1.341    1.753    2.131    2.602    2.947    4.073
  16    1.337    1.746    2.120    2.583    2.921    4.015
  17    1.333    1.740    2.110    2.567    2.898    3.965
  18    1.330    1.734    2.101    2.552    2.878    3.922
  19    1.328    1.729    2.093    2.539    2.861    3.883
  20    1.325    1.725    2.086    2.528    2.845    3.850
  21    1.323    1.721    2.080    2.518    2.831    3.819
  22    1.321    1.717    2.074    2.508    2.819    3.792
  23    1.319    1.714    2.069    2.500    2.807    3.768
  24    1.318    1.711    2.064    2.492    2.797    3.745
  25    1.316    1.708    2.060    2.485    2.787    3.725
  26    1.315    1.706    2.056    2.479    2.779    3.707
  27    1.314    1.703    2.052    2.473    2.771    3.690
  28    1.313    1.701    2.048    2.467    2.763    3.674
  29    1.311    1.699    2.045    2.462    2.756    3.659
  30    1.310    1.697    2.042    2.457    2.750    3.646
  40    1.303    1.684    2.021    2.423    2.704    3.551
  60    1.296    1.671    2.000    2.390    2.660    3.460
 120    1.289    1.658    1.980    2.358    2.617    3.373
   ∞    1.282    1.645    1.960    2.326    2.576    3.291