Chapter 7
Sampling Distributions


Every probability distribution is characterized by
its parameter(s). For example, a normal
distribution is determined by μ and σ, and a binomial
distribution with a fixed number of trials n is determined by p. See pages 179 and 214.
In practical situations, we want to decide what
type of probability distribution to use as a model.
However, the parameters of that distribution are
usually not known to us. In such situations, we
obtain information about the values of the
parameters by taking a sample from the
population involved.



Examples:
1. The responses to a “true/false” question posed
to a population are believed to fit a binomial
distribution. In this case the relative frequency of
those in the sample, chosen from the population,
who answered “true” gives information about the
true value of the parameter p.
2. The heights of adult males in North America
follow a normal distribution. In this case the
mean and standard deviation of the sample
taken approximate the actual values of μ and σ.
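As a rough illustration of both examples, the sketch below simulates a sample and computes the estimates. The population values (p = 0.6, μ = 70 inches, σ = 3 inches) and the sample size are made-up assumptions for the simulation, not figures from the chapter.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable
n = 2000                # hypothetical sample size

# Example 1: true/false responses, assuming a true proportion p = 0.6
TRUE_P = 0.6
answers = [rng.random() < TRUE_P for _ in range(n)]
p_hat = sum(answers) / n  # relative frequency of "true" in the sample

# Example 2: heights, assuming a normal population with mu = 70, sigma = 3
TRUE_MU, TRUE_SIGMA = 70.0, 3.0
heights = [rng.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(n)]
x_bar = statistics.mean(heights)  # sample mean, estimates mu
s = statistics.stdev(heights)     # sample standard deviation, estimates sigma

print(f"p_hat = {p_hat:.3f}, x_bar = {x_bar:.2f}, s = {s:.2f}")
```

The estimates should land close to the assumed values, and they generally get closer as n grows.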
SAMPLING
As is now clear, sampling is important in
determining the value(s) of the parameters
involved.
If you want the sample to provide reliable
information about the population, you must select
your sample in a certain way!
SIMPLE RANDOM SAMPLING
The way a sample is selected is called the sampling
plan or experimental design. It determines the
amount of information you can extract, and often
allows you to measure the reliability of your
inference.
• Simple random sampling is a method of sampling
that gives each possible sample of size n from a
population of size N an equal chance (probability) of
being selected.



If the size of the population N is small, simple
random sampling can be performed by assigning
a number to each member of the population,
writing the numbers on pieces of paper, mixing
them, and selecting a sample of n.
Example: You have a population of 4 objects and
want to choose a sample of 2 from this
population. How would you take a simple random
sample?
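One way to see what "an equal chance for every possible sample" means here is to list all samples of size 2. A minimal sketch, with the labels A to D as hypothetical names for the four objects:

```python
import random
from itertools import combinations

population = ["A", "B", "C", "D"]  # hypothetical labels for the 4 objects

# Every possible sample of size 2 (order does not matter).
all_samples = list(combinations(population, 2))
print(len(all_samples), all_samples)

# Simple random sampling: each of these samples has the same chance, 1/6.
rng = random.Random(0)
chosen = rng.choice(all_samples)
print("selected sample:", chosen)
```

Drawing two slips of paper from a well-mixed hat of four does the same job: each of the 6 pairs is equally likely.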
EXAMPLE
• There are 89 students in a statistics class. The instructor
wants to choose 5 students to form a project group. How
should he proceed?
1. Give each student a number from
01 to 89.
2. Choose 5 pairs of random digits
from the random number table.
3. If a pair from 90 to 99, or 00, is
chosen (or a number already chosen
comes up again), choose another number.
4. The five students with those
numbers form the group.
See example 7.1
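The procedure above can be imitated in code. This is only a sketch: a pseudorandom generator stands in for the random number table, the function name pick_group is made up for illustration, and skipping repeated numbers is an assumption added so that five different students are chosen.

```python
import random

def pick_group(n_students=89, group_size=5, seed=None):
    """Draw two-digit numbers 00..99 and keep the valid, unused ones."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < group_size:
        pair = rng.randint(0, 99)          # a pair of random digits, 00 to 99
        if pair < 1 or pair > n_students:  # 00 or 90..99: no such student
            continue
        if pair in chosen:                 # assumption: skip repeats
            continue
        chosen.append(pair)
    return chosen

group = pick_group(seed=1)
print("students chosen:", group)
```

In practice, random.sample(range(1, 90), 5) gives a simple random sample of 5 student numbers in one call; the loop is written out only to mirror the table-based steps.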
TYPES OF SAMPLES
Sampling can occur in two types of practical situations:
1. Observational studies: The data existed before you
decided to study them. Individuals are
observed or certain outcomes are measured, and no attempt
is made to affect the outcome (for example, no treatment
is given).
In this case computer databases make it possible to
assign an ID number to each member of the population (even
a very large one) and so to select a simple
random sample.
Most sample surveys, where information is gathered by
asking questions, fall into this category. In such situations
watch out for:
• Nonresponse: Are the responses biased because only
opinionated people responded?
• Undercoverage: Are certain segments of the population
systematically excluded?
• Wording bias: The question may be too complicated or
poorly worded.
2. Experimentation: The data are generated by imposing an
experimental condition or treatment on the experimental
units.
• Hypothetical populations: A statistical population that
has no real existence but is imagined to be generated
by repetitions of events of a certain kind.
Examples: all possible values of tomorrow's
highest temperature; all possible pH values of some
unknown liquid; all possible heights of men.
• Hypothetical populations can make random sampling
difficult if not impossible.
• Samples must sometimes be chosen so that the
experimenter believes they are representative of the
whole population.
Experiments vs. Observational Studies
In an experiment, investigators apply
treatments to experimental units (people,
animals, plots of land, etc.) and then
proceed to observe the effect of the
treatments on the experimental units.
In an observational study, investigators
observe subjects and measure variables
of interest without assigning treatments to
the subjects. The treatment that each
subject receives is determined beyond the
control of the investigator.
For example, suppose we want to study the
effect of smoking on lung capacity in women.
Experiment
• Find 100 women age 20 who do not currently
smoke.
• Randomly assign 50 of the 100 women to the
smoking treatment and the other 50 to the no
smoking treatment.
• Those in the smoking group smoke a pack a day
for 10 years while those in the control group
remain smoke free for 10 years.
• Measure lung capacity for each of the 100
women.
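The random-assignment step of the experiment can be sketched as follows. The subject numbers 1 to 100 are placeholder IDs, assumed here only for illustration.

```python
import random

rng = random.Random(42)          # fixed seed so the split is repeatable
subjects = list(range(1, 101))   # placeholder IDs for the 100 women

# Randomly pick 50 subjects for the treatment; the rest form the control group.
smoking_group = sorted(rng.sample(subjects, 50))
treated = set(smoking_group)
control_group = [s for s in subjects if s not in treated]

print("treatment:", smoking_group[:5], "...")
print("control:  ", control_group[:5], "...")
```

Because chance alone decides who gets the treatment, the two groups should be similar on average in everything except the treatment itself.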
Observational Study
• Find 100 women age 30, of whom 50 have
been smoking a pack a day for 10 years
while the other 50 have been smoke free
for 10 years.
• Measure lung capacity for each of the 100
women.
OTHER SAMPLING PLANS
• There are several other sampling plans
that still involve randomization:
1. Stratified random sample: Divide the
population into subpopulations or strata and
select a simple random sample from each stratum.
2. Cluster sample: Divide the population into
subgroups called clusters; select a simple
random sample of clusters and take a census of
every element in the cluster.
3. 1-in-k systematic sample: Randomly select
one of the first k elements in an ordered
population, and then select every k-th element
thereafter.
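The three plans above can be compared side by side in a short sketch. The function names, the group labels ("north", "central", "south") and the population of 60 numbered elements are all hypothetical, chosen only to keep the example small.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample of n_per_stratum members from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Simple random sample of clusters, then a census of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def systematic_sample(ordered_population, k, rng):
    """Random start among the first k elements, then every k-th element."""
    start = rng.randrange(k)
    return ordered_population[start::k]

rng = random.Random(3)
# Hypothetical population of 60 elements split into three groups of 20.
groups = {
    "north": list(range(0, 20)),
    "central": list(range(20, 40)),
    "south": list(range(40, 60)),
}

strat = stratified_sample(groups, 2, rng)            # 2 from each group
clus = cluster_sample(groups, 2, rng)                # all members of 2 groups
syst = systematic_sample(list(range(60)), 10, rng)   # 1-in-10 systematic

print(strat)
print(len(clus), "elements from the chosen clusters")
print(syst)
```

Note the difference between the first two: the stratified plan samples within every group, while the cluster plan samples whole groups and then takes everyone in them.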
EXAMPLES
• Divide California into counties and
take a simple random sample within each
county. (Stratified)
• Divide California into counties and take a
simple random sample of 10 counties. (Cluster)
• Divide a city into city blocks, choose a
simple random sample of 10 city blocks,
and interview all who live there. (Cluster)
• Choose an entry at random from the phone
book, and select every 50th number
thereafter. (1-in-50 Systematic)
NON-RANDOM SAMPLING PLANS
• There are several other sampling plans that
do not involve randomization. They should
NOT be used for statistical inference!
1. Convenience sample: A sample that can be taken
easily without random selection.
• People walking by on the street
2. Judgment sample: The sampler decides who will and
won’t be included in the sample.
3. Quota sample: The makeup of the sample must reflect
the makeup of the population on some selected
characteristic.
• Race, ethnic origin, gender, etc.