Lecture 6 - Inferential Statistics

PPA 501 – Analytical Methods
in Administration
Lecture 6a – Normal Curve, Z-Scores, and Estimation
Normal Curve


The normal curve is central to the theory that
underlies inferential statistics.
The normal curve is a theoretical model.



A frequency polygon that is perfectly symmetrical and
smooth.
Bell shaped, unimodal, with infinite tails.
Crucial point: distances along the horizontal axis,
when measured in standard deviations, always
mark off the same proportion of the area under the curve.
Normal Curve
Computing Z-Scores

To find the percentage of the total area (or
number of cases) above, below, or
between scores in an empirical
distribution, the original scores must be
expressed in units of the standard
deviation or converted into Z scores.
$$Z = \frac{X_i - \bar{X}}{s}$$
Computing Z-Scores – Mean ideology of
House delegation by state.
Computing Z-Scores: Examples

What percentage of the cases fall between -0.5 and 0.0 on
the ideology scale? (Mean = 0.01, s = 0.205.)

$$Z = \frac{X_i - \bar{X}}{s} = \frac{-0.5 - 0.01}{0.205} = \frac{-0.51}{0.205} = -2.488$$

$$Z = \frac{X_i - \bar{X}}{s} = \frac{0.00 - 0.01}{0.205} = \frac{-0.01}{0.205} = -0.049$$

From Excel, =standardize(-0.5, 0.01, 0.205) gives Z = -2.4878;
=normsdist(-2.4878) gives p = 0.006427.
From Excel, =standardize(0.0, 0.01, 0.205) gives Z = -0.04878;
=normsdist(-0.04878) gives p = 0.480547.
P(-0.5 to 0.0) = 0.480547 - 0.006427 = 0.474120.
47.4% of the distribution lies between -0.5 and 0 on the
ideology scale.
Computing Z-Scores: Examples


What percentage of the House delegations from
1953 to 2005 have more conservative scores
than 0.5? (1 - .992 = 0.008 or 0.8%)
What percentage have more liberal scores than
-0.25? (10.2%).
Xi       Xmean   s       Z            p
0.5      0.01    0.205   2.390244     0.991581
-0.25    0.01    0.205   -1.268293    0.102347
Computing Z-scores: Rules


If you want the area between a score and
the mean, subtract the probability from .5 if the Z
is negative. Subtract .5 from the probability if Z
is positive.
If you want the area beyond a score: when the score
is lower than the mean and you want the area below it,
use the probability from Excel. When the score is
higher than the mean and you want the area above it,
subtract the probability in Excel from 1.
Computing Z-scores: Rules

If you want the difference between two
scores other than the mean:

Calculate Z for each score, identify the
appropriate probability, and subtract the
smaller probability from the larger.
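
The rules above can also be checked outside Excel. What follows is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for Excel's `NORMSDIST`; the helper names (`z_score`, `area_below`, `area_above`, `area_between`) are illustrative and do not come from the lecture.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal curve: mean 0, standard deviation 1


def z_score(x, mean, s):
    """Express a raw score in standard-deviation units."""
    return (x - mean) / s


def area_below(x, mean, s):
    """Proportion of the curve below x (the cumulative probability)."""
    return STD_NORMAL.cdf(z_score(x, mean, s))


def area_above(x, mean, s):
    """Proportion above x: subtract the cumulative probability from 1."""
    return 1.0 - area_below(x, mean, s)


def area_between(x_low, x_high, mean, s):
    """Proportion between two scores: larger probability minus smaller."""
    return abs(area_below(x_high, mean, s) - area_below(x_low, mean, s))


# House ideology examples from the lecture: mean 0.01, s 0.205
between = area_between(-0.5, 0.0, 0.01, 0.205)
more_conservative = area_above(0.5, 0.01, 0.205)
more_liberal = area_below(-0.25, 0.01, 0.205)
print(round(between, 3), round(more_conservative, 3), round(more_liberal, 3))
```

The three printed proportions should land close to the lecture's 47.4%, 0.8%, and 10.2%.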
Estimation Procedures


Bias – does the mean of the sampling
distribution equal the mean of the
population?
Efficiency – how closely does the sampling
distribution cluster around the mean? You
can improve efficiency by increasing
sample size.
Estimation Procedures

Point estimate – construct a sample,
calculate a proportion or mean, and
estimate that the population has the same
value as the sample. There is always some
probability of error.
Estimation Procedures

Confidence interval – range around the sample
mean.

First step: determine a confidence level, that is, how much
error you are willing to tolerate. The common
standard is 5% or .05: you are willing to be wrong
5% of the time in estimating populations. This figure
is known as alpha or α. If an infinite number of
confidence intervals are constructed, 95% will contain
the population mean and 5% won’t.
Estimation Procedures



We now work in reverse on the normal curve.
Divide the probability of error between the upper
and lower tails of the curve (so that the 95% is in
the middle), and estimate the Z-score that will
contain 2.5% of the area under the curve on
either end. That Z-score is ±1.96.
Similar Z-scores for 90% (alpha=.10), 99%
(alpha=.01), and 99.9% (alpha=.001) are ±1.65,
±2.58, and ±3.29.
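
Working in reverse on the normal curve can be sketched in a few lines. This assumes Python's standard-library `statistics.NormalDist`; the function name `critical_z` is illustrative only.

```python
from statistics import NormalDist


def critical_z(alpha):
    """Two-tailed critical Z: leaves alpha / 2 of the area in each tail."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for alpha in (0.10, 0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: Z = +/-{critical_z(alpha):.3f}")
```

The value for alpha = .10 comes out near 1.645, which printed tables conventionally round to the ±1.65 quoted above.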
Estimation Procedures
$$\text{c.i.} = \bar{X} \pm Z\left(\frac{\sigma}{\sqrt{N}}\right)$$
where
c.i. = confidence interval
$\bar{X}$ = the sample mean
Z = the Z score as determined by the alpha level
$\sigma/\sqrt{N}$ = the population standard error of the mean
Estimation Procedures – Sample
Mean
$$\text{c.i.} = \bar{X} \pm Z\left(\frac{s}{\sqrt{n - 1}}\right)$$
where
c.i. = confidence interval
$\bar{X}$ = the sample mean
Z = the Z score as determined by the alpha level
$s/\sqrt{n - 1}$ = the standard error of the mean
Only use if sample is 100 or greater
Estimation Procedures

You can control the width of the
confidence intervals by adjusting the
confidence level or alpha or by adjusting
sample size.
Confidence Interval Examples
Mean House Ideology for presidential disaster requests (1953 to
2005) with 90%, 95%, and 99% confidence intervals.
Confidence Interval Examples from Presidential Disaster Decisions, 1953 to 2005

Variable                                       Mean     Std. Deviation   No. of cases   Std. Error   Conf. Level   Lower Bound   Upper Bound
Mean Ideology of House Delegation by State     0.006    0.205            2493           0.004        90%           0.000         0.013
Mean Ideology of House Delegation by State     0.006    0.205            2493           0.004        95%           -0.002        0.015
Mean Ideology of House Delegation by State     0.006    0.205            2493           0.004        99%           -0.004        0.017
Mean Ideology of Senate Delegation by State    -0.022   0.300            2493           0.006        90%           -0.031        -0.012
Mean Ideology of Senate Delegation by State    -0.022   0.300            2493           0.006        95%           -0.033        -0.010
Mean Ideology of Senate Delegation by State    -0.022   0.300            2493           0.006        99%           -0.037        -0.006
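
The Senate rows can be approximated from the summary statistics alone. The sketch below assumes the lecture's sample formula (s divided by the square root of n - 1) and Python's standard-library `statistics.NormalDist`; the function name `confidence_interval` is illustrative. Because the mean and standard deviation above are rounded to three decimals, the bounds may differ from the table in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(mean, s, n, confidence=0.95):
    """Return (standard error, lower bound, upper bound) for a sample mean."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-tailed critical Z
    std_error = s / sqrt(n - 1)                  # the lecture's sample formula
    margin = z * std_error
    return std_error, mean - margin, mean + margin


# Senate delegation rows: mean -0.022, s 0.300, N 2493
for level in (0.90, 0.95, 0.99):
    se, lower, upper = confidence_interval(-0.022, 0.300, 2493, level)
    print(f"{level:.0%}: SE = {se:.3f}, interval = [{lower:.3f}, {upper:.3f}]")
```

The interval widens as the confidence level rises, which is the trade-off between confidence and precision described earlier; a larger sample would narrow it again.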
PPA 501 – Analytical Methods
in Administration
Lecture 6b – One-Sample and
Two-Sample Tests
Five-step Model of Hypothesis
Testing





Step 1. Making assumptions and meeting
test requirements.
Step 2. Stating the null hypothesis.
Step 3. Selecting the sampling distribution
and establishing the critical region.
Step 4. Computing the test statistic.
Step 5. Making a decision and interpreting
the results of the test.
Five-step Model of Hypothesis
Testing – One-sample Z Scores

Step 1. Making assumptions.




Model: random sampling.
Interval-ratio measurement.
Normal sampling distribution.
Step 2. Stating the null hypothesis (no
difference) and the research hypothesis.


H0: $\mu_1 = \mu$
H1: $\mu_1 \neq \mu$ (two-tailed test), or
$\mu_1 > \mu$ (one-tailed test), or
$\mu_1 < \mu$ (one-tailed test)
Five-step Model of Hypothesis
Testing – One-sample Z Scores

Step 3. Selecting the sampling distribution
and establishing the critical region.



Sampling distribution = Z distribution.
α = 0.05.
Z(critical) = ±1.96 (two-tailed); +1.65 or -1.65
(one-tailed).
Five-step Model of Hypothesis
Testing – One-sample Z Scores

Step 4. Computing the test statistic.


Use z-formula.
Step 5. Making a decision.

Compare z-critical to z-obtained. If z-obtained is greater in magnitude than z-critical, reject the null hypothesis. Otherwise,
accept the null hypothesis.
Five-Step Model: Critical Choices


Choice of alpha level: .05, .01, .001.
Selection of research hypothesis.



Two-tailed test: research hypothesis simply states
that means of sample and population are different.
One-tailed test: mean of sample is larger or smaller
than mean of population.
Type of error to minimize: Type I or Type II.


Type I – rejecting a null hypothesis that is true.
Type II – accepting a null hypothesis that is false.
Five-Step Model: Critical Choices
Five-step Model: Example

Is the average age of voters in the 2000
National Election Study different than the
average age of all adults in the U.S.
population?
Five-step Model of Hypothesis
Testing – Large-sample Z Scores

Step 1. Making assumptions.




Model: random sampling.
Interval-ratio measurement.
Normal sampling distribution.
Step 2. Stating the null hypothesis (no
difference) and the research hypothesis.


H0: $\mu_1 = \mu = 45.24$
H1: $\mu_1 \neq \mu$ (two-tailed test)
Five-step Model of Hypothesis
Testing – Large-sample Z Scores

Step 3. Selecting the sampling distribution
and establishing the critical region.



Sampling distribution = Z distribution.
α=0.05.
Z(critical)=1.96 (two-tailed)
Five-step Model of Hypothesis
Testing – Large-sample Z Scores

Step 4. Computing the test statistic.
$$Z(\text{obtained}) = \frac{\bar{X} - \mu}{\sigma/\sqrt{N}} = \frac{47.21 - 45.24}{17.88/\sqrt{1798}} = \frac{1.97}{.4217} = 4.67$$

Step 5. Making a decision.
Z(obtained) > Z(critical): 4.67 > 1.96.
Reject the null hypothesis of no difference.
The sample is significantly older than the voting-age population.
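
Steps 3 through 5 of this example can be reproduced as follows. This is a minimal sketch using only the figures on the slide (sample mean 47.21, population mean 45.24, standard deviation 17.88, N = 1798); the function name `one_sample_z` is illustrative, and the critical value is computed with Python's standard-library `statistics.NormalDist` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, pop_mean, sigma, n):
    """Z(obtained) = (sample mean - population mean) / (sigma / sqrt(N))."""
    return (sample_mean - pop_mean) / (sigma / sqrt(n))


alpha = 0.05
z_obtained = one_sample_z(47.21, 45.24, 17.88, 1798)
z_critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-tailed test

# Reject the null hypothesis when Z(obtained) falls in the critical region
reject_null = abs(z_obtained) > z_critical
print(round(z_obtained, 2), round(z_critical, 2), reject_null)
```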
Five-Step Model: Small Sample
T-test (One Sample)

Formula
$$t(\text{obtained}) = \frac{\bar{X} - \mu}{s/\sqrt{N - 1}}$$
Five-Step Model: Small Sample
T-test (One Sample)

Step 1. Making Assumptions.




Random sampling.
Interval-ratio measurement.
Normal sampling distribution.
Step 2. Stating the null hypothesis.


H0: $\mu_1 = \mu$
H1: $\mu_1 \neq \mu$ (two-tailed test), or
$\mu_1 > \mu$ (one-tailed test), or
$\mu_1 < \mu$ (one-tailed test)
Five-step Model of Hypothesis
Testing – One-sample t Scores

Step 3. Selecting the sampling distribution
and establishing the critical region.




Sampling distribution = t distribution.
α = 0.05.
df = N - 1.
t(critical) from Appendix A, Table B in Agresti
and Franklin.
Five-step Model of Hypothesis
Testing – One-sample t Scores

Step 4. Computing the test statistic.
$$t(\text{obtained}) = \frac{\bar{X} - \mu}{s/\sqrt{N - 1}}$$
Step 5. Making a decision.

Compare t-critical to t-obtained. If t-obtained
is greater in magnitude than t-critical, reject
null hypothesis. Otherwise, accept null
hypothesis.
Five-step Model of Hypothesis
Testing – One-sample t Scores

Is the average age of individuals in the
JCHA 2000 sample survey older than the
national average age for all adults? (One-tailed.)
Five-Step Model: Small Sample T-test (One Sample) – JCHA 2000

Step 1. Making Assumptions.




Random sampling.
Interval-ratio measurement.
Normal sampling distribution.
Step 2. Stating the null hypothesis.


H0: $\mu_1 = \mu = 45.24$
H1: $\mu_1 > \mu$ (one-tailed test)
Five-Step Model: Small Sample T-test (One Sample) – JCHA 2000

Step 3. Selecting the sampling distribution
and establishing the critical region.




Sampling distribution = t distribution.
α = 0.05.
df = 41 - 1 = 40.
t(critical) = 1.684.
Five-Step Model: Small Sample T-test (One Sample) – JCHA 2000

Step 4. Computing the test statistic.
$$t(\text{obtained}) = \frac{\bar{X} - \mu}{s/\sqrt{N - 1}} = \frac{52.78 - 45.24}{20.866/\sqrt{40}} = \frac{7.54}{3.299} = 2.29$$
Step 5. Making a decision.

t(obtained) > t(critical). Therefore, reject the
null hypothesis. The sample of residents from
the Jefferson County Housing Authority is
significantly older than the adult population of
the United States.
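
The JCHA calculation can be checked with a short sketch. It uses only the slide's figures (sample mean 52.78, population mean 45.24, s = 20.866, N = 41). Python's standard library has no t distribution, so the critical value 1.684 is simply copied from the lecture as a constant; the function name `one_sample_t` is illustrative.

```python
from math import sqrt


def one_sample_t(sample_mean, pop_mean, s, n):
    """t(obtained) = (sample mean - population mean) / (s / sqrt(N - 1))."""
    return (sample_mean - pop_mean) / (s / sqrt(n - 1))


T_CRITICAL = 1.684  # lecture value: alpha = 0.05, one-tailed, df = 40

t_obtained = one_sample_t(52.78, 45.24, 20.866, 41)

# One-tailed test in the upper tail: reject only if t exceeds the critical value
reject_null = t_obtained > T_CRITICAL
print(round(t_obtained, 2), reject_null)
```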
Two-Sample Models – Large
Samples


Most of the time we do not have the
population means or proportions. All we
can do is compare the means or
proportions of population subsamples.
Adds the additional assumption of
independent random samples.
Two-Sample Models – Large
Samples

Formula.

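$$Z(\text{obtained}) = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{N_1 - 1} + \frac{s_2^2}{N_2 - 1}}$$
Five-Step Model – Large Two-Sample Tests (Z Distribution)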

Step 1. Making assumptions.




Model: Independent random samples.
Interval-ratio measurement.
Normal sampling distribution.
Step 2. Stating the null hypothesis (no
difference) and the research hypothesis.


H0: $\mu_1 = \mu_2$
H1: $\mu_1 \neq \mu_2$ (two-tailed test), or
$\mu_1 > \mu_2$ (one-tailed test), or
$\mu_1 < \mu_2$ (one-tailed test)
Five-Step Model – Large Two-Sample Tests (Z Distribution)

Step 3. Selecting the sampling distribution
and establishing the critical region.



Sampling distribution = Z distribution.
α = 0.05.
Z(critical) = ±1.96 (two-tailed); +1.65 or -1.65
(one-tailed).
Five-Step Model – Large Two-Sample Tests (Z Distribution)

Step 4. Computing the test statistic.
$$Z(\text{obtained}) = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1 - 1} + \frac{s_2^2}{N_2 - 1}}}$$
Step 5. Making a decision.

Compare z-critical to z-obtained. If z-obtained is
greater in magnitude than z-critical, reject null
hypothesis. Otherwise, accept null hypothesis.
Five-Step Model – Large Two-Sample Tests (Z Distribution)

Do non-white citizens of Birmingham,
Alabama, believe that discrimination is
more of a problem than white citizens?
Five-Step Model – Large Two-Sample Tests (Fair Housing)

Step 1. Making assumptions.




Model: Independent random samples.
Interval-ratio measurement.
Normal sampling distribution.
Step 2. Stating the null hypothesis (no
difference) and the research hypothesis.


H0: $\mu_1 = \mu_2$
H1: $\mu_1 > \mu_2$ (one-tailed test)
Five-Step Model – Large Two-Sample Tests (Z Distribution)

Step 3. Selecting the sampling distribution
and establishing the critical region.



Sampling distribution = Z distribution.
α = 0.05.
Z(critical) = +1.65 (one-tailed).
Five-Step Model – Large Two-Sample Tests (Z Distribution)

Step 4. Computing the test statistic.
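$$Z(\text{obtained}) = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1 - 1} + \frac{s_2^2}{N_2 - 1}}} = \frac{2.70 - 2.14}{\sqrt{\frac{1.058^2}{141} + \frac{.966^2}{42}}} = \frac{.56}{\sqrt{.008 + .022}} = \frac{.56}{.173} = 3.224$$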
Step 5. Making a decision.

Z(obtained) is greater than Z(critical), therefore reject
the null hypothesis of no difference. Non-whites
believe that discrimination is more of a problem in
Birmingham.
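
The fair housing calculation can be reproduced with a short sketch. Two assumptions are made here that the slide does not state outright: the sample sizes are taken to be N1 = 142 and N2 = 43 (so that N - 1 gives the 141 and 42 in the denominators), and group 1 is taken to be the non-white sample, as the conclusion implies. The function name `two_sample_z` is illustrative.

```python
from math import sqrt


def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """Z(obtained) for two large samples, using the lecture's N - 1 denominators."""
    std_error = sqrt(s1**2 / (n1 - 1) + s2**2 / (n2 - 1))
    return (mean1 - mean2) / std_error


Z_CRITICAL = 1.65  # lecture value: alpha = 0.05, one-tailed

# Assumed sample sizes: N1 = 142 and N2 = 43 (N - 1 = 141 and 42 on the slide)
z_obtained = two_sample_z(2.70, 1.058, 142, 2.14, 0.966, 43)

# One-tailed test in the upper tail
reject_null = z_obtained > Z_CRITICAL
print(round(z_obtained, 2), reject_null)
```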
Five-Step Model – Small Two-Sample Tests

If N1 + N2 < 100, use this formula.

$$t(\text{obtained}) = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$
Five-Step Model – Small Two-Sample Tests (t Distribution)

Step 1. Making assumptions.





Model: Independent random samples.
Interval-ratio measurement.
Equal population variances: $\sigma_1^2 = \sigma_2^2$.
Normal sampling distribution.
Step 2. Stating the null hypothesis (no
difference) and the research hypothesis.


H0: $\mu_1 = \mu_2$
H1: $\mu_1 \neq \mu_2$ (two-tailed test), or
$\mu_1 > \mu_2$ (one-tailed test), or
$\mu_1 < \mu_2$ (one-tailed test)
Five-Step Model – Small Two-Sample Tests (t Distribution)

Step 3. Selecting the sampling distribution
and establishing the critical region.




Sampling distribution = t distribution.
α = 0.05.
df = N1 + N2 - 2.
t(critical). See Appendix A, Table B.
Five-Step Model – Small Two-Sample Tests (t Distribution)

Step 4. Computing the test statistic.

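$$t(\text{obtained}) = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}}$$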
Step 5. Making a decision.

Compare t-critical to t-obtained. If t-obtained is
greater in magnitude than t-critical, reject null
hypothesis. Otherwise, accept null hypothesis.
Five-Step Model – Small Two-Sample Tests (t Distribution)

Did white and nonwhite residents of the
Jefferson County Housing Authority have
significantly different lengths of residence
in 2000?
Five-Step Model – Small Two-Sample Tests (JCHA 2000)

Step 1. Making assumptions.





Model: Independent random samples.
Interval-ratio measurement.
Equal population variances: $\sigma_1^2 = \sigma_2^2$.
Normal sampling distribution.
Step 2. Stating the null hypothesis (no
difference) and the research hypothesis.


H0: $\mu_1 = \mu_2$
H1: $\mu_1 \neq \mu_2$ (two-tailed test)
Five-Step Model – Small Two-Sample Tests (JCHA 2000)

Step 3. Selecting the sampling distribution
and establishing the critical region.




Sampling distribution = t distribution.
α = 0.05, two-tailed.
df = N1 + N2 - 2 = 14 + 25 - 2 = 37.
t(critical) from Appendix B = ±2.042 (the tabled value for df = 30, the nearest lower row to df = 37).
Five-Step Model – Small Two-Sample Tests (t Distribution)

Step 4. Computing the test statistic.
$$t(\text{obtained}) = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}} = \frac{70.21 - 82.84}{\sqrt{\frac{14(56.337)^2 + 25(93.744)^2}{25 + 14 - 2}} \sqrt{\frac{25 + 14}{25(14)}}}$$
$$= \frac{-12.63}{\sqrt{7138.7147}\sqrt{.1114}} = \frac{-12.63}{84.4909(.3338)} = \frac{-12.63}{28.2002} = -.448$$

Step 5. Making a decision.

t(obtained) is less than t(critical) in magnitude.
Accept the null hypothesis. Whites and nonwhites in
the JCHA 2000 survey do not have different lengths
of residence in public housing.
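
The pooled small-sample calculation can be checked with the sketch below. It uses the figures from the slide (means 70.21 and 82.84, standard deviations 56.337 and 93.744, N1 = 14, N2 = 25) and copies the critical value 2.042 from the lecture, since Python's standard library has no t distribution. The function name `pooled_t` is illustrative.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """t(obtained) for two small samples, using the lecture's pooled formula."""
    pooled_sd = sqrt((n1 * s1**2 + n2 * s2**2) / (n1 + n2 - 2))
    std_error = pooled_sd * sqrt((n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / std_error


T_CRITICAL = 2.042  # lecture value: alpha = 0.05, two-tailed

t_obtained = pooled_t(70.21, 56.337, 14, 82.84, 93.744, 25)

# Two-tailed test: compare magnitudes
reject_null = abs(t_obtained) > T_CRITICAL
print(round(t_obtained, 3), reject_null)
```

The decision does not depend on which tabled row is used for the critical value, because the obtained t is far below any value near 2 in magnitude.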
PPA 501 – Analytical Methods
in Administration
Lecture 6c – Analysis of
Variance
Introduction



Analysis of variance (ANOVA) can be
considered an extension of the t-test.
The t-test assumes that the independent
variable has only two categories.
ANOVA assumes that the nominal or
ordinal independent variable has two or
more categories.
Introduction

The null hypothesis is that the populations
from which each of the samples
(categories) is drawn are equal on the
characteristic measured (usually a mean
or proportion).
Introduction


If the null hypothesis is correct, the means
for the dependent variable within each
category of the independent variable
should be roughly equal.
ANOVA proceeds by making comparisons
across the categories of the independent
variable.
Computation of ANOVA


The computation of ANOVA compares the
amount of variation within each category
(SSW) to the amount of variation between
categories (SSB).
Total sum of squares.
$$SST = \sum (X_i - \bar{X})^2$$
$$SST = \sum X^2 - N\bar{X}^2 \quad \text{(computational)}$$
$$SST = SSB + SSW$$
Computation of ANOVA

Sum of squares within (variation within
categories).
$$SSW = \sum (X_i - \bar{X}_k)^2$$
SSW = the sum of the squares within the categories
$\bar{X}_k$ = the mean of a category

Sum of squares between (variation
between categories).

$$SSB = \sum N_k (\bar{X}_k - \bar{X})^2$$
SSB = the sum of squares between the categories
$N_k$ = the number of cases in a category
$\bar{X}_k$ = the mean of a category
Computation of ANOVA

Degrees of freedom.
dfw = N - k
dfb = k - 1
where
dfw = degrees of freedom associated with SSW
dfb = degrees of freedom associated with SSB
N = number of cases
k = number of categories
Computation of ANOVA

Mean square estimates.
$$\text{Mean square within} = \frac{SSW}{dfw}$$
$$\text{Mean square between} = \frac{SSB}{dfb}$$
$$F = \frac{\text{Mean square between}}{\text{Mean square within}}$$
Computation of ANOVA

Computational steps for shortcut.






Find SST using computation formula.
Find SSB.
Find SSW by subtraction.
Calculate degrees of freedom.
Construct the mean square estimates.
Compute the F-ratio.
Five-Step Hypothesis Test for
ANOVA.

Step 1. Making assumptions.





Independent random samples.
Interval ratio measurement.
Normally distributed populations.
Equal population variances.
Step 2. Stating the null hypothesis.
H0: $\mu_1 = \mu_2 = \cdots = \mu_k$
H1: at least one of the means is different
Five-Step Hypothesis Test for
ANOVA.

Step 3. Selecting the sampling distribution and
establishing the critical region.






Sampling distribution = F distribution.
Alpha = .05 (or .01 or . . .).
Degrees of freedom within = N – k.
Degrees of freedom between = k – 1.
F-critical=Use Appendix D, p. 499-500.
Step 4. Computing the test statistic.

Use the procedure outlined above.
Five-Step Hypothesis Test for
ANOVA.

Step 5. Making a decision.

If F(obtained) is greater than F(critical), reject
the null hypothesis of no difference. At least
one population mean is different from the
others.
ANOVA – Example 1 – JCHA
2000
What impact does marital status have on respondents' ratings
of JCHA services? The sum of the squared ratings is 615.

Report: JCHA Program Rating

Marital Status   Mean     N    Std. Deviation
Married          3.0313   8    1.70837
Separated        4.5000   2    .70711
Widowed          4.6667   6    .81650
Never Married    4.0556   9    .79822
Divorced         3.6731   13   1.20927
Total            3.8289   38   1.25082
ANOVA – Example 1 – JCHA
2000

Step 1. Making assumptions.





Independent random samples.
Interval ratio measurement.
Normally distributed populations.
Equal population variances.
Step 2. Stating the null hypothesis.
H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
H1: at least one of the means is different
ANOVA – Example 1 – JCHA
2000

Step 3. Selecting the sampling distribution
and establishing the critical region.





Sampling distribution = F distribution.
Alpha = .05.
Degrees of freedom within = N – k = 38 – 5 =
33.
Degrees of freedom between = k – 1 = 5 – 1 =
4.
F-critical=2.69.
ANOVA – Example 1 – JCHA
2000

Step 4. Computing the test statistic.
ANOVA Table
JCHA Program Rating * Marital Status

Source                      Sum of Squares   df   Mean Square   F       Sig.
Between Groups (Combined)   10.980           4    2.745         1.931   .128
Within Groups               46.908           33   1.421
Total                       57.888           37
ANOVA – Example 1 – JCHA
2000
$$SST = \sum X^2 - N\bar{X}^2 = 615 - 38(3.8289)^2$$
$$SST = 615 - 557.0981 = 57.9019$$
$$SSB = \sum N_k (\bar{X}_k - \bar{X})^2 = 8(3.0313 - 3.8289)^2 + 2(4.5 - 3.8289)^2 + 6(4.6667 - 3.8289)^2 + 9(4.0556 - 3.8289)^2 + 13(3.6731 - 3.8289)^2$$
$$= 5.0893 + 0.9008 + 4.2115 + 0.4625 + 0.3156$$
$$SSB = 10.9797$$
$$SSW = SST - SSB = 57.9019 - 10.9797 = 46.9222$$
ANOVA – Example 1 – JCHA
2000
$$dfw = N - k = 38 - 5 = 33$$
$$dfb = k - 1 = 5 - 1 = 4$$
$$\text{Mean square within} = \frac{SSW}{dfw} = \frac{46.9222}{33} = 1.4219$$
$$\text{Mean square between} = \frac{SSB}{dfb} = \frac{10.9797}{4} = 2.7449$$
$$F = \frac{\text{Mean square between}}{\text{Mean square within}} = \frac{2.7449}{1.4219} = 1.9304$$
ANOVA – Example 1 – JCHA
2000.

Step 5. Making a decision.

F(obtained) is 1.93. F(critical) is 2.69.
F(obtained) < F(critical). Therefore, we fail to
reject the null hypothesis of no difference.
Approval of JCHA services does not vary
significantly by marital status.
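
The shortcut computation for this example can be sketched from the summary statistics alone. The function name `anova_from_summary` and its parameters are illustrative, not part of the lecture; the grand mean (3.8289) and the sum of squared ratings (615) are passed in as given on the slides, and F(critical) = 2.69 is copied from the lecture because Python's standard library has no F distribution.

```python
def anova_from_summary(group_means, group_sizes, grand_mean, sum_x_squared):
    """Return (SST, SSB, SSW, F) using the lecture's shortcut steps."""
    n = sum(group_sizes)
    k = len(group_means)
    # Step 1: SST from the computational formula
    sst = sum_x_squared - n * grand_mean**2
    # Step 2: SSB from category means and category sizes
    ssb = sum(size * (mean - grand_mean) ** 2
              for mean, size in zip(group_means, group_sizes))
    # Step 3: SSW by subtraction
    ssw = sst - ssb
    # Steps 4 to 6: degrees of freedom, mean squares, F ratio
    dfw, dfb = n - k, k - 1
    f_ratio = (ssb / dfb) / (ssw / dfw)
    return sst, ssb, ssw, f_ratio


F_CRITICAL = 2.69  # lecture value: alpha = 0.05, df = 4 and 33

# Married, Separated, Widowed, Never Married, Divorced
means = [3.0313, 4.5000, 4.6667, 4.0556, 3.6731]
sizes = [8, 2, 6, 9, 13]
sst, ssb, ssw, f_obtained = anova_from_summary(means, sizes, 3.8289, 615)
reject_null = f_obtained > F_CRITICAL
print(round(sst, 4), round(ssb, 4), round(ssw, 4), round(f_obtained, 2), reject_null)
```

The hand-computed sums of squares differ slightly from the software ANOVA table (for example 57.90 versus 57.888 for the total) because the means on the slide are rounded to four decimals.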
ANOVA – Example 2 –
Presidential Disaster Data Set
What impact does Presidential administration have on
the president’s recommendation of disaster assistance?
ANOVA – Example 2 –
Presidential Disaster Data Set

Step 1. Making assumptions.





Independent random samples.
Interval ratio measurement.
Normally distributed populations.
Equal population variances.
Step 2. Stating the null hypothesis.
H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6 = \mu_7 = \mu_8 = \mu_9 = \mu_{10}$
H1: at least one of the means is different
ANOVA – Example 2 –
Presidential Disaster Data Set

Step 3. Selecting the sampling distribution
and establishing the critical region.





Sampling distribution = F distribution.
Alpha = .05.
Degrees of freedom within = N – k = 2642 –
10 = 2632.
Degrees of freedom between = k – 1 = 10 – 1
= 9.
F-critical=1.883.
ANOVA – Example 2 –
Presidential Disaster Data Set

Step 4. Computing the test statistic.
ANOVA – Example 2 –
Presidential Disaster Data Set

Step 5. Making a decision.

F(obtained) is 12.863. F(critical) is 1.883.
F(obtained) > F(critical). Therefore, we can
reject the null hypothesis of no difference.
Approval of federal disaster assistance does
vary by presidential administration.