Graphical Methods for Complex Surveys

Definitions
• Observation unit
• Target population
• Sample
• Sampled population
• Sampling unit
• Sampling frame
Target Population and Sampling Frame
Types of Surveys
Cross-sectional
• surveys a specific population at a given point in
time
• will have one or more of the design components
• stratification
• clustering with multistage sampling
• unequal probabilities of selection
Longitudinal
• surveys a specific population repeatedly over a
period of time
• panel
• rotating samples
Cross-Sectional Surveys
Sampling Design Terminology
Methods of Sample Selection
Basic methods
• simple random sampling
• systematic sampling
• unequal probability sampling
• stratified random sampling
• cluster sampling
• two-stage sampling
Simple Random Sampling
[Figure: illustration on a scale from 0 to 100]
Why?
• basic building block of sampling
• sample from a homogeneous group of units
How?
• physically make draws at random of the units under study
• computer selection methods: R, Stata
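As a rough illustration of computer selection, here is a minimal Python sketch (the slides mention R and Stata; Python is used here only for illustration). The frame of 100 unit labels, the sample size, and the function name `simple_random_sample` are assumptions, not part of the original material.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded generator for reproducibility
    return rng.sample(list(units), n)  # sampling without replacement

# Hypothetical sampling frame: unit labels 1..100
population = list(range(1, 101))
srs = simple_random_sample(population, n=10, seed=1)
print(sorted(srs))
```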
Systematic Sampling
[Figure: illustration on a scale from 0 to 100]
Why?
• easy
• can be very efficient depending on the structure of
the population
How?
• get a random start in the population
• sample every kth unit for some chosen number k
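The two steps above (random start, then every kth unit) can be sketched as follows. This is a minimal sketch assuming the frame is an ordered list whose length is a multiple of k; the function name and the example frame are hypothetical.

```python
import random

def systematic_sample(units, k, seed=None):
    """Pick a random start among the first k units, then take every kth unit."""
    units = list(units)
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1, each equally likely
    return units[start::k]    # every kth unit from the random start

# Hypothetical ordered frame of 100 units, sampling interval k = 10
frame = list(range(1, 101))
sys_sample = systematic_sample(frame, k=10, seed=2)
print(sys_sample)
```

With 100 units and k = 10, every possible start yields exactly 10 sampled units spaced 10 apart.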
Additional Note
Simplifying assumption:
• in terms of estimation a systematic sample
is often treated as a simple random sample
Key assumption:
• the order of the units is unrelated to the
measurements taken on them
Unequal Probability Sampling
Why?
• may want to give greater or lesser weight to
certain population units
• two-stage sampling with probability proportional
to size at the first stage and equal sample sizes at
the second stage provides a self-weighting design
(all units have the same chance of inclusion in the
sample)
How?
• with replacement
• without replacement
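The self-weighting claim above can be checked with a little arithmetic. The sketch below assumes a first-stage design in which primary unit i of size M_i has inclusion probability n·M_i/M (probability proportional to size, without replacement), followed by a simple random sample of m units inside each selected primary unit. The cluster sizes and the values of n and m are made up for illustration.

```python
from fractions import Fraction

def inclusion_probability(cluster_size, total_size, n_clusters, m_per_cluster):
    """Overall inclusion probability of one unit under PPS + equal-size SRS."""
    first_stage = Fraction(n_clusters * cluster_size, total_size)  # n * M_i / M
    second_stage = Fraction(m_per_cluster, cluster_size)           # m / M_i
    return first_stage * second_stage                              # n * m / M

# Hypothetical primary-unit sizes (total M = 1000)
cluster_sizes = [50, 120, 200, 330, 300]
total = sum(cluster_sizes)

probs = [inclusion_probability(M_i, total, n_clusters=2, m_per_cluster=20)
         for M_i in cluster_sizes]
print(probs)  # the cluster size cancels, so every entry is the same fraction
```

Because M_i cancels between the two stages, every unit has inclusion probability n·m/M regardless of which primary unit it belongs to.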
With or Without Replacement?
• in practice sampling is usually done without
replacement
• the formula for the variance based on without
replacement sampling is difficult to use
• the formula for with replacement sampling at the
first stage is often used as an approximation
Assumption: the population size is large and the
sample size is small – sampling fraction is less
than 10%
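One standard with-replacement formula is the Hansen-Hurwitz estimator of a population total and its variance estimate; the slides do not name the estimator, so treating it as the intended one is an assumption. In the sketch below, y holds the values (or estimated totals) for the n draws and p holds the corresponding per-draw selection probabilities; the numbers are hypothetical.

```python
def hansen_hurwitz(y, p):
    """Estimate a total and its variance, treating draws as with replacement.

    y: observed values (or estimated totals) for the n draws
    p: selection probability of each drawn unit on a single draw
    """
    n = len(y)
    z = [yi / pi for yi, pi in zip(y, p)]  # each draw's own estimate of the total
    total_hat = sum(z) / n                 # average of the single-draw estimates
    var_hat = sum((zi - total_hat) ** 2 for zi in z) / (n * (n - 1))
    return total_hat, var_hat

# Hypothetical data: three draws with unequal selection probabilities
t_hat, v_hat = hansen_hurwitz(y=[20, 45, 12], p=[0.1, 0.25, 0.05])
print(t_hat, v_hat)
```

The variance estimate depends only on the spread of the single-draw estimates, which is what makes it easy to use compared with the without-replacement formula.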
Stratified Random Sampling
[Figure: illustration on a scale from 0 to 100]
Why?
• for administrative convenience
• to improve efficiency
• estimates may be required for each stratum
How?
• independent simple random samples are chosen
within each stratum
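A minimal sketch of the selection step: an independent simple random sample within each stratum. The stratum names, their contents, and the per-stratum sample sizes are hypothetical.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of sizes[name] units within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(list(units), sizes[name])
            for name, units in strata.items()}

# Hypothetical frame split into two strata
strata = {"urban": list(range(0, 60)), "rural": list(range(60, 100))}
sizes = {"urban": 6, "rural": 4}  # here proportional to stratum size

strat_sample = stratified_sample(strata, sizes, seed=3)
print(strat_sample)
```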
Example: Survey of Youth in Custody
• first U.S. survey of youths confined to long-term,
state-operated institutions
• complemented existing Children in Custody censuses
• companion survey to the Surveys of State Prisons
• the data contain information on criminal histories,
family situations, drug and alcohol use, and peer
group activities
• survey carried out in 1989 using stratified
systematic sampling
SYC Design
strata
• type (a) groups of smaller institutions
• type (b) individual larger institutions
sampling units
• strata type (a)
  • first stage – institution by probability proportional to size of the institution
  • second stage – individual youths in custody
• strata type (b)
  • individual youths in custody
• individuals chosen by systematic random sampling
Cluster Sampling
[Figure: illustration on a scale from 0 to 100]
Why?
• convenience and cost
• the frame or list of population units may be defined only for the clusters and not the units
How?
• take a simple random sample of clusters and measure all units in the cluster
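A minimal sketch of one-stage cluster sampling as described above: select clusters by simple random sampling, then keep every unit in each selected cluster. The cluster labels and memberships are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select clusters by SRS and measure all units in each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # the frame lists clusters only
    return {c: list(clusters[c]) for c in chosen}      # every unit in the cluster

# Hypothetical clusters of unequal size
clusters = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9], "D": [10]}
measured = cluster_sample(clusters, n_clusters=2, seed=4)
print(measured)
```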
Two-Stage Sampling
[Figure: illustration on a scale from 0 to 100]
Why?
• cost and convenience
• lack of a complete frame
How?
• take either a simple random sample or an unequal probability sample of primary units and then, within each selected primary unit, take a simple random sample of secondary units
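A minimal sketch of two-stage sampling, using simple random sampling at both stages (the unequal probability option for the first stage is not shown). The primary unit labels, their contents, and the sample sizes are hypothetical.

```python
import random

def two_stage_sample(primary_units, n_primary, m_secondary, seed=None):
    """SRS of primary units, then SRS of secondary units within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(primary_units), n_primary)  # first stage
    result = {}
    for p in chosen:
        members = list(primary_units[p])
        m = min(m_secondary, len(members))  # cannot sample more than exist
        result[p] = rng.sample(members, m)  # second stage
    return result

# Hypothetical primary units with their secondary units
primary_units = {
    "P1": list(range(0, 30)),
    "P2": list(range(30, 50)),
    "P3": list(range(50, 90)),
    "P4": list(range(90, 100)),
}
two_stage = two_stage_sample(primary_units, n_primary=2, m_secondary=5, seed=5)
print(two_stage)
```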
Synthesis to a Complex Design
Stratified two-stage cluster sampling
Strata
• geographical areas
First stage units
• smaller areas within the larger areas
Second stage units
• households
Clusters
• all individuals in the household
Why a Complex Design?
• better coverage of the entire region of interest (stratification)
• efficient for interviewing: less travel, less
costly
Problem: estimation and analysis are more
complex
Ontario Health Survey
• carried out in 1990
• health status of the population was
measured
• data were collected relating to the risk
factors associated with major causes of
morbidity and mortality in Ontario
• a survey of 61,239 persons was carried out by Statistics Canada using a stratified two-stage cluster sample
OHS Sample Selection
• strata: public health units
  – divided into rural and urban strata
• first stage: enumeration areas defined by the 1986 Census of Canada and selected by pps
• second stage: dwellings selected by SRS
• cluster: all persons in the dwelling
Longitudinal Surveys
Sampling Design
Schematic Representation: Panel Survey
[Figure: respondents against time (0 to 4)]
Schematic Representation: Rotation Survey
[Figure: respondents against time (0 to 4)]
Survey Weights
Survey Weights: Definitions
initial weight
• equal to the inverse of the inclusion probability
of the unit
final weight
• initial weight adjusted for nonresponse,
poststratification and/or benchmarking
• interpreted as the number of units in the
population that the sample unit represents
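A minimal sketch of these definitions. The initial weight is the inverse of the inclusion probability, as stated above. The nonresponse adjustment shown is one common approach (redistributing the weight of nonrespondents over respondents within a single adjustment class) and is an assumption here, as are the probabilities and response pattern.

```python
def initial_weight(inclusion_probability):
    """Initial (design) weight: inverse of the unit's inclusion probability."""
    return 1.0 / inclusion_probability

def nonresponse_adjusted(weights, responded):
    """Inflate respondents' weights so they carry the nonrespondents' weight.

    Assumes all units belong to one adjustment class (a simplification).
    """
    total = sum(weights)
    respondent_total = sum(w for w, r in zip(weights, responded) if r)
    factor = total / respondent_total
    return [w * factor if r else 0.0 for w, r in zip(weights, responded)]

# Hypothetical inclusion probabilities and response indicators
probs = [0.5, 0.2, 0.2, 0.1]
init = [initial_weight(p) for p in probs]  # roughly 2, 5, 5, 10
final = nonresponse_adjusted(init, [True, True, False, True])
print(init, final)
```

The adjusted weights sum to the same total as the initial weights, so the respondents still represent the whole sampled population.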
Interpretation
• the survey weight for a particular sample unit is the number of units in the population that the unit represents
[Figure legend: Not sampled, Wt = 2, Wt = 5, Wt = 6, Wt = 7]
Effect of the Weights
• Example: age distribution, Survey of Youth in Custody

Age      Counts   Sum of Weights
11            1               28
12            9              149
13           53              764
14          167             2143
15          372             3933
16          622             5983
17          634             5189
18          334             2778
19          196             1763
20          122             1164
21           57              567
22           27              273
23           14              150
24           13              128
Totals     2621            25012
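The effect of the weights can be seen by converting the table above into proportions. The sketch below uses the counts and weight sums exactly as tabulated; the variable names are illustrative. Unweighted proportions divide each count by the sample size, and weighted proportions divide each sum of weights by the total weight.

```python
ages = list(range(11, 25))
counts = [1, 9, 53, 167, 372, 622, 634, 334, 196, 122, 57, 27, 14, 13]
weight_sums = [28, 149, 764, 2143, 3933, 5983, 5189, 2778,
               1763, 1164, 567, 273, 150, 128]

# Proportion of the sample (unweighted) and of the estimated population (weighted)
unweighted = [c / sum(counts) for c in counts]
weighted = [w / sum(weight_sums) for w in weight_sums]

# Mean age under each distribution
unweighted_mean = sum(a * p for a, p in zip(ages, unweighted))
weighted_mean = sum(a * p for a, p in zip(ages, weighted))

for a, u, w in zip(ages, unweighted, weighted):
    print(f"{a:>3}  unweighted={u:.3f}  weighted={w:.3f}")
print(f"mean age: unweighted={unweighted_mean:.2f}, weighted={weighted_mean:.2f}")
```

The unweighted mean age comes out higher than the weighted mean, which is the numerical counterpart of the histograms' shift.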
Unweighted Histogram
[Figure: Age Distribution of Youth in Custody; proportion (0 to 0.3) by age (11 to 24)]
Weighted Histogram
[Figure: Age Distribution of Youth in Custody; proportion (0 to 0.3) by age (11 to 24)]
Weighted versus Unweighted
[Figure: Weighted and Unweighted Histograms; proportion (0 to 0.3) by age (11 to 24), with separate weighted and unweighted series]
Observations
• the histograms are similar in overall shape but differ noticeably
• the design probably used approximately proportional allocation
• the distribution of ages in the unweighted
case tends to be shifted to the right when
compared to the weighted case
• older ages are over-represented in the dataset