CLASS NOTES on SAMPLING DISTRIBUTION and Central Limit Theorem (CLT)
Why “Sample” the Population? Why not study the whole population?
• The physical impossibility of checking all items in the population.
• The cost of studying all the items in a population.
• The sample results are usually adequate.
• Contacting the whole population would often be time-consuming.
• The destructive nature of certain tests (e.g., study of light bulb life).
Statisticians advocate Probability Sampling (not judgment sampling)
• A probability sample is a sample selected in such a way that each item or person in the
population being studied has a known likelihood of being included in the sample.
If we use judgment sampling, we have no idea about the accuracy of our estimates,
since we have no idea about the quality of the judgments. Probability sampling enables us
to construct probabilistic error bounds (to be studied in a second course in Statistics).
The aim of sampling is to get a sample that is representative of the population.
Methods of Probability Sampling
• Simple Random Sample (SRS): A sample formulated so that each item or person, and
each subset of n items, in the population has the same chance of being included. (e.g.,
from N items, the prob. that any particular one is selected on a single draw = 1/N.) A
simple way to implement this is to use a lottery or a computer program. For example, we
can write the names of the N items on N cards, shuffle the cards and select n cards. This
will yield a simple random sample of size n.
• Systematic Random Sampling (SysRS): The items or individuals of the population are
arranged in some order. A random starting point is selected (by lottery) and then
every kth member of the population is selected.
If there are N=1000 stores along Fifth Avenue and we want to select n=100 stores in
the sample, k=N/n=1000/100 or 10.
We shuffle only the first k stores and select one, say #4.
From then on we systematically select stores by adding k, 2k, 3k, 4k, etc. to 4.
So a systematic sample will have stores #4, 14, 24, 34, 44, 54, etc.
• Stratified Random Sampling (StrRS): A population is first divided into subgroups,
called strata, and a sample is selected from each stratum (e.g., 70% males, 30%
females).
If a sample of 10 is selected (n=10), 70% of n = 7, so select 7 males and 3 females. In
general, let N=population size, N1=size of stratum 1 (females), N2=size of stratum 2
(males), n=sample size desired. The sample should have (N1/N)*n from stratum 1, and so
on. Thus
Females in the sample = (N1/N)*n, Males in the sample = (N2/N)*n
Example: A population has 25 students, of whom 15 are white and 10 black. A stratified
sample of size 10 should have how many whites / blacks?
Answer: Let N=population size=25, N1=blacks=10, N2=whites=15, n=sample size=10.
Blacks in the sample: (N1/N)*n = (10/25)*10 or 4.
Whites in the sample: (N2/N)*n = (15/25)*10 or 6.
Verify that 6+4=10. We have a representative sample.
• Cluster Sampling: A population is first divided into clusters and a sample of the
clusters is selected (used in marketing). It works if the clusters are as heterogeneous
as the population. For a large country like the US it is convenient to use cluster
sampling and choose some geographical locations (e.g., Oshkosh, Wisconsin).
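The first three sampling methods above can be sketched in a few lines of Python (standard library only). This is an illustrative sketch, not part of the original notes: the function names, the seed and the stratum labels are assumptions, and the systematic sampler assumes N is a multiple of n.

```python
import random

def simple_random_sample(items, n, rng):
    """Draw n items without replacement; every subset of size n is equally likely."""
    return rng.sample(items, n)

def systematic_sample(N, n, start):
    """Take every k-th label (k = N // n) from 1..N, beginning at `start` (1 <= start <= k).

    Assumes N is a multiple of n, as in the Fifth Avenue example.
    """
    k = N // n
    return list(range(start, N + 1, k))

def proportional_allocation(stratum_sizes, n):
    """Units to draw from each stratum: (N_i / N) * n, rounded to a whole number."""
    N = sum(stratum_sizes.values())
    return {name: round(n * size / N) for name, size in stratum_sizes.items()}

rng = random.Random(2017)            # hypothetical seed, only for reproducibility
stores = list(range(1, 1001))        # stores labelled 1..1000

srs = simple_random_sample(stores, 100, rng)
systematic = systematic_sample(N=1000, n=100, start=4)
allocation = proportional_allocation({"blacks": 10, "whites": 15}, n=10)

print(systematic[:6])   # [4, 14, 24, 34, 44, 54]
print(allocation)       # {'blacks': 4, 'whites': 6}
```

In practice the starting point for the systematic sample would itself be drawn at random from 1..k (for instance with `rng.randint(1, k)`); it is fixed at 4 here only to reproduce the store numbers in the notes.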
A sampling error is the difference between a sample statistic and its corresponding
parameter. We can make probabilistic statements about this sampling error only if we
have a probability sample (not judgment sample).
In general, a sampling distribution can be defined for any sample statistic (mean, median,
mode, standard deviation, etc.) over a sample space consisting of all possible samples of
size n from the available population of size N.
Let us first study the sampling distribution of sample mean as an example.
Sampling Distribution of the Sample Mean
• The sampling distribution of the sample means is a probability distribution consisting
of all possible sample means based on specified sample sizes selected from the
population. The sampling distribution yields the probability of occurrence associated
with each sample mean over the set of all possible sample mean numbers.
EXAMPLE 1
• The law firm of Hoya and Associates has five partners (A,B,C,D,E). At their weekly
partners meeting each reported the number of hours they charged clients for their
services last week.
A 22, B 26, C 30, D 26, E 22. (eg, Mr. E charged 22 hrs)
• If n=2, two partners are selected randomly, how many different samples are possible?
This is the combination of 5 objects taken 2 at a time. That is, 5C2= 5!/(2!3!)=10.
There are 10 possible samples.
The ten sample means are given below. For example, if the sample has A and B, then
A=22, B=26 and the sample mean is Av(AB)=(22+26)/2 or 24. Similarly,
Av(AB)=24, Av(AC)=26, Av(AD)=24, Av(AE)=22, Av(BC)=28, Av(BD)=26,
Av(BE)=24, Av(CD)=28, Av(CE)=26, Av(DE)=24
Exercise: draw a picture with freq on the vertical axis for the sampling distribution of means.
Note above that the mean of A and C is 26, of B and D is 26, and of C and E is also 26,
which means $\bar{x}$=26 repeats itself three times (has frequency 3). We find the
following list of frequencies:
$\bar{x}$=22 with freq=1,
$\bar{x}$=24 with freq=4,
$\bar{x}$=26 with freq=3,
$\bar{x}$=28 with freq=2. This is almost the sampling distribution of means.
• Total frequency = 10.
• If we divide individual frequencies by the total frequency we get “relative frequency” or
probability. These probabilities add up to one, so we have a prob. distribution. The
above information says, for example, that the probability that the sample mean is 28 is
2 out of 10 or 0.2.
• The sampling distribution is simply this probability distribution defined over all
possible samples of size n from the population of size N. In real-world problems N
will be large (e.g., the US population of 200 million) and n will also be large (e.g., 1000
people surveyed), and (N C n) will be an astronomical number. Then the sampling
distribution can only be imagined. We have chosen a simple example with N=5, n=2 so
that the entire sampling distribution can be explicitly computed and visualized.
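Because N=5 and n=2 are so small, the whole sampling distribution of Example 1 can be enumerated by brute force. The following Python sketch (standard library only; the variable names are illustrative assumptions) lists every sample, tallies the sample means and converts the frequencies to probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hours charged by the five partners (Example 1)
hours = {"A": 22, "B": 26, "C": 30, "D": 26, "E": 22}
n = 2

# All 5C2 = 10 possible samples of two partners
samples = list(combinations(hours, n))

# Sample mean for each pair, e.g. "AB" -> (22 + 26) / 2 = 24
sample_means = {"".join(pair): sum(hours[p] for p in pair) / n for pair in samples}

# Frequency of each distinct sample mean, then relative frequency (probability)
freq = Counter(sample_means.values())
prob = {m: Fraction(count, len(samples)) for m, count in sorted(freq.items())}

for m, p in prob.items():
    print(f"xbar = {m:g}: freq = {freq[m]}, prob = {float(p):.1f}")
```

Running it reproduces the frequency list for the ten pairs (1, 4, 3 and 2 for means 22, 24, 26 and 28), and the probabilities sum to one.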
• This is a sampling distribution of all possible sample means. Now the random variable
is $\bar{x}$; it is no longer just X.
What are the properties of the sampling distribution of sample means $\bar{x}$? Properties
include the mean and variance of $\bar{x}$.
• Compute the mean of the sample means and compare it with the population mean: For
our simple example we can explicitly calculate the mean of means, or Expected value
of means, E($\bar{x}$) = $\mu$.
· The mean of the sample means is obtained by weighting each sample mean by its
frequency = [(22)(1) + (24)(4) + (26)(3) + (28)(2)]/10 = 25.2 [Read page 214 of your
text]
· Since we know the value of every observation in the population in this
(impractical) simple example, we can directly calculate the population mean $\mu$ =
(22+26+30+26+22)/5 = 25.2. Note that in the real world we usually cannot find
$\mu$; we can only make inferences about it from the sample mean $\bar{x}$.
· Observe that the grand mean of all 10 sample means (25.2) is equal to the
population mean (25.2).
· Since E($\bar{x}$) = $\mu$, we say that the sample mean $\bar{x}$ is an UNBIASED
estimator of the population mean $\mu$. We verified this property above for the simple
example of lawyer hours. In general, such verification is difficult and one needs to use
advanced theory.
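For a population this small the unbiasedness property can also be checked by brute force for every sample size, not only n=2. The sketch below is an illustration added to the notes (standard library only, exact arithmetic via `Fraction`; the variable names are assumptions): it averages the sample means over all possible samples of each size and compares the result with the population mean.

```python
from fractions import Fraction
from itertools import combinations

hours = [22, 26, 30, 26, 22]           # partners A..E from Example 1
N = len(hours)
mu = Fraction(sum(hours), N)           # population mean, 126/5 = 25.2

grand_means = {}
for n in range(1, N + 1):
    # Mean of every possible sample of size n (no replacement)
    sample_means = [Fraction(sum(s), n) for s in combinations(hours, n)]
    # Each sample is equally likely, so E(xbar) is the plain average of the means
    grand_means[n] = sum(sample_means) / len(sample_means)
    print(n, len(sample_means), float(grand_means[n]))
```

Every line prints 25.2 in the last column: the grand mean of the sample means equals the population mean whatever the sample size.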
Now we turn to the variance of $\bar{x}$. It is possible to verify intuitively that the
larger the sample size, the smaller the variance. For example, if X is height (known to be
a Normal random variable), we want to estimate the average height $\mu$ of all Fordham
students from a small sample of only 10 students. When we consider all possible samples we
cannot rule out the sample of very tall folks (e.g., all 10 from the Fordham basketball
team who are, say, 7 ft tall). The average height for that sample is about seven feet, so
the upper limit of the range of averages will be about seven feet. Similarly, the average
for the shortest 10 students will be smaller than five feet (say). Thus the sample averages
based on n=10 will be spread over a wide range, from the smallest to the largest. Recall
that a wide range means a large variance.
By contrast, if we choose n=100, the average height for the tallest 100 will not be seven
feet, but smaller. Similarly, the average height of the shortest 100 will be higher than for
the shortest 10, and the range for n=100 will not be as large as the range for n=10. Thus
the spread of the sampling distribution decreases as n increases. In fact, the variance can
be proved to be inversely proportional to n, as we see below.
Standard Error (SE) of the Sample Means (the square root of the sampling variance, i.e., a
standard deviation). It is customary to distinguish between the usual standard deviation
(SD) and that of a sampling distribution (SE).
• The standard error of the sample means is the standard deviation of the sampling
distribution of the sample means.
• n is the size of the sample.
• $\sigma$ is the standard deviation of the population (assumed known).
• It is computed by: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, as a first approximation if N is
not known or N is large (almost infinity).
• $\sigma_{\bar{x}}$ is the symbol for the standard error of the sample means.
• If $\sigma$ is not known and n $\geq$ 30, the standard deviation of the sample, denoted
by s, is used to approximate the population standard deviation. Then the formula for the
standard error becomes:
SE($\bar{x}$) = $s_{\bar{x}} = s/\sqrt{n}$
Always think of SE($\bar{x}$) as the standard deviation of the random variable $\bar{x}$.
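The phrase "first approximation" can be made concrete with the law-firm data, where N=5 is far from large. For sampling without replacement from a small finite population, the exact variance of the sample mean is $\sigma^2/n$ multiplied by the finite population correction $(N-n)/(N-1)$. The Python sketch below (standard library only; the variable names are illustrative assumptions) compares the directly computed standard deviation of the ten sample means with both formulas.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

hours = [22, 26, 30, 26, 22]        # Example 1 population, N = 5
N, n = len(hours), 2

sigma = pstdev(hours)               # population SD, sqrt(8.96)

# SE computed directly: SD of all ten possible sample means
sample_means = [mean(pair) for pair in combinations(hours, n)]
se_exact = pstdev(sample_means)

se_first_approx = sigma / sqrt(n)                        # ignores that N is small
se_with_fpc = se_first_approx * sqrt((N - n) / (N - 1))  # finite population correction

print(round(se_exact, 4), round(se_first_approx, 4), round(se_with_fpc, 4))
# 1.833 2.1166 1.833
```

With N=5 the uncorrected formula overstates the spread; once the correction factor is included the formula matches the brute-force value. When N is large relative to n, the factor is close to 1 and the simple $\sigma/\sqrt{n}$ is adequate.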
What is the shape of the probability distribution of $\bar{x}$? The following theorem says
that it is approximately Normal for large n, and hence the theorem enables us to solve all
kinds of practical problems.
Central Limit Theorem (CLT) See page 391 of Hawkes textbook.
[Central means it is of central importance to Statistics. Limit theorem because it studies
the behavior as n becomes large, namely as n tends to infinity; in practice, n $\geq$ 30 is
used as a rule of thumb.]
This is a powerful result, given its name by the mathematician Polya in 1920, showing that
EVEN IF X is NOT NORMAL, if n $\geq$ 30 the process of averaging is so helpful that it
yields approximate normality of the sampling distribution of $\bar{x}$ with the variance
given below.
• For a population with a mean $\mu$ and variance $\sigma^2$, the sampling distribution of
all possible means of all possible samples of size n generated from that population will be
approximately normally distributed:
$\bar{x} \sim N\{\mu,\ (\sigma^2/n)\,[(N-n)/(N-1)]\}$, assuming sufficiently large n (n $\geq$ 30).
If N is large, the finite population correction term $[(N-n)/(N-1)]$ is close to 1 and can
be ignored. Then this formula simplifies to
$\bar{x} \sim N\{\mu,\ \sigma^2/n\}$
Even if we start with a bimodal, exponential decay or uniform distribution, which are
decidedly not normal to begin with, the process of averaging gives us an approximately
normal distribution for the sample mean, provided the sample size is at least 30 (a rule of
thumb). We may know that human intelligence or human height is normally distributed, but we
have no reason to think that lawyers' hours are normally distributed. The central limit
theorem says that as long as you are averaging over 30 or more lawyers, normality can be
assumed. This is very useful since we do not have to verify the underlying shape of the
distribution.
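A small simulation illustrates the claim for an exponential decay population. This is a sketch under stated assumptions, not a proof: the rate 1 exponential (mean 1, standard deviation 1), the seed and the number of replications are all choices made for illustration. If the sample means are roughly normal, about 68% of them should fall within one standard error of the population mean and about 95% within two.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(42)          # hypothetical seed, only for reproducibility
n, reps = 30, 5000               # sample size and number of simulated samples
mu = sigma = 1.0                 # exponential with rate 1 has mean 1 and SD 1

# Mean of each simulated sample of n exponential draws
sample_means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

se_theory = sigma / sqrt(n)      # CLT standard error, about 0.1826
mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)

# Share of sample means within 1 and 2 standard errors of mu
within_1 = sum(abs(m - mu) <= se_theory for m in sample_means) / reps
within_2 = sum(abs(m - mu) <= 2 * se_theory for m in sample_means) / reps

print(f"mean of means = {mean_of_means:.3f} (theory {mu})")
print(f"SD of means   = {sd_of_means:.3f} (theory {se_theory:.3f})")
print(f"within 1 SE: {within_1:.3f}, within 2 SE: {within_2:.3f}")
```

The simulated mean and spread of the sample means should land close to $\mu$ and $\sigma/\sqrt{n}$, even though individual exponential draws are strongly skewed.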
A good practice example which highlights the difference between ordinary distribution of
X and sampling distribution of Xbar with separate word problems follows:
IQ=X ~ N(110, 10^2). Find P(IQ<80)
Intelligence Quotient (IQ) is normally distributed with mean 110 and standard deviation
of 10. A moron is a person with IQ less than 80. Find the probability that a randomly
chosen person is a moron. (Hint this random variable is for a single person X)
Let idiot be defined as one with an IQ less than 90. Find the probability that a randomly
chosen person is an idiot. (Hint this random variable is for a single person X)
If a sample of 25 students is available, what is the probability that the average IQ exceeds
105? (Hint this random variable is for an average over 25 persons or Xbar)
What is the probability that the average IQ exceeds 115 (Hint this random variable is for
an average over 25 persons or Xbar)
Answers are given below.
X=IQ ~ N(110, (10)^2)
mu = $\mu$ = 110
standard deviation = sd = $\sigma$ = 10
4 times sd = 4$\sigma$ = 40
Plausible range of X has the lower limit = $\mu - 4\sigma$ = 110 - 40 or 70
upper limit is $\mu + 4\sigma$ = 110 + 40 = 150
This corresponds with the plausible range of standard normal z (-4 to 4)
EXERCISE 1: Given that X=IQ ~ N(110, (10)^2). If a dumb moron’s IQ is 80 or less,
find the probability that a randomly chosen person is a dumb moron.
ANSWER 1: This is just a normal distribution word problem.
In symbols, we want to find: P(x<80).
Recall that probability is some area under the Normal bell-shaped curve.
We want to evaluate a shaded area from $-\infty$ to 80.
This shaded area has the lower limit of $-\infty$ and upper limit of 80.
The mapping of $-\infty$ to the z scale is -4 for all practical purposes.
Hence we need not bother with the lower limit of the desired shaded area.
We still need to map the upper limit 80 to the z scale by using the z transform:
any z = (x - $\mu$)/$\sigma$
For our upper limit x=80=IQ or moron, $\mu$=110 and $\sigma$=10
z = (80-110)/10 = -3
By symmetry, the area between -3 and 0 equals the area between 0 and 3, which is 0.4987
from Table A of your text.
Tail area is 0.5-0.4987, hence the answer is 0.0013.
In R software we compute pnorm(-3) to get 0.0013 for the left tail
EXERCISE 2: X=IQ ~ N(110, (10)^2) is given.
If a dumb idiot’s IQ is 90 or less, find the probability that a randomly chosen person is a
dumb idiot. In symbols, find: P(x<90).
ANSWER 2: For our upper limit x=90=IQ or idiot, $\mu$=110 and $\sigma$=10
Mapping 90 to the z scale is (90-110)/10 = -2
Tail area to the left of z=-2 is 0.5-0.4772 = 0.0228
In R software we compute pnorm(-2) to get 0.0228 for the left tail
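The same two left-tail areas can be cross-checked in Python, whose standard-library `statistics.NormalDist` plays the role of R's `pnorm` here. The sketch below is an added illustration (the variable names are assumptions); it works directly on the IQ scale, so the z transform is done implicitly.

```python
from statistics import NormalDist

# Distribution of a single person's IQ: X ~ N(110, 10^2)
iq = NormalDist(mu=110, sigma=10)

# z scores for the two cut-offs, as in the hand calculation
z_80 = (80 - iq.mean) / iq.stdev     # -3.0
z_90 = (90 - iq.mean) / iq.stdev     # -2.0

# Left-tail probabilities P(X < 80) and P(X < 90)
p_below_80 = iq.cdf(80)
p_below_90 = iq.cdf(90)

print(round(p_below_80, 4))   # 0.0013
print(round(p_below_90, 4))   # 0.0228
```

Equivalently, `NormalDist().cdf(-3)` and `NormalDist().cdf(-2)` give the same areas on the standard normal z scale.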
EXERCISE 3: X=IQ ~ N(110, (10)^2) is given.
Find the probability that the average IQ of 25 students exceeds 105.
ANSWER 3: Since the sample size n=25 is given, this is not a run-of-the-mill normal
distribution word problem. The random variable under consideration here is the average.
Hence, a sampling distribution is relevant when we consider average IQ as the variable of
interest: not the IQ of an individual student, but the average over 25 students.
(Because X itself is normal, the average is normal even though n=25 is less than 30.)
standard deviation of the sampling distribution = Standard Error = SE = $\sigma/\sqrt{n}$
n=25
$\sqrt{n} = \sqrt{25}$ = 5
SE = $\sigma/\sqrt{n}$ = 10/5 = 2
4SE = 8
Plausible range is 110-8 to 110+8 or 102 to 118 for xbar = average IQ
Area to the right of 105 is to be found
Must map 105 to the z scale
Mapping now is z = (xbar - $\mu$)/SE = (105-110)/2 = -2.5
Area between -2.5 and 0 (the same as between 0 and 2.5) is 0.4938
Total area 0.5 + 0.4938 = 0.9938 = probability that the average IQ exceeds 105
In R software we compute pnorm(-2.5,lower.tail=FALSE) to get 0.9938
Now find the probability that the average IQ exceeds 115.
This is the tail area to the right of z = (115-110)/2 = 2.5, which is 0.5-0.4938 = 0.0062
In R software we compute pnorm(2.5,lower.tail=FALSE) to get 0.0062
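For the two questions about the average of 25 students, the only change from the single-person problems is that the standard deviation is replaced by the standard error. A minimal Python sketch (standard library only; variable names are illustrative assumptions):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 110, 10, 25

se = sigma / sqrt(n)                 # standard error of the mean: 10/5 = 2
xbar = NormalDist(mu=mu, sigma=se)   # sampling distribution of the average IQ

z_105 = (105 - mu) / se              # -2.5
z_115 = (115 - mu) / se              #  2.5

# Right-tail probabilities P(xbar > 105) and P(xbar > 115)
p_above_105 = 1 - xbar.cdf(105)
p_above_115 = 1 - xbar.cdf(115)

print(se, z_105, z_115)              # 2.0 -2.5 2.5
print(round(p_above_105, 4))         # 0.9938
print(round(p_above_115, 4))         # 0.0062
```

The two answers add to one because 105 and 115 sit symmetrically around the mean of 110, at z = -2.5 and z = 2.5.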
EXAMPLE 4
A library usually has 13% of its books checked out.
Find the probability that in a sample of 588 books more than 14% are checked out.
ANSWER 4:
We have percentages here, so it is not the simple normal distribution word problem. It
uses the fact that p^ ~ N(p, [pq/n]), where q=1-p, which says that the sampling
distribution of the proportion p^ is approximately Normal with mean p and variance p(1-p)/n.
E(p^)=0.13, n=588
Var(p^) = $\sigma^2$(p^) = (0.13)(1-0.13)/n or 0.00019235
We need the square root of this variance for use in our z transform.
SE(p^) = sqrt(0.00019235) = 0.01386903 = 0.0139 (here we round to 4 places)
4*SE is 0.0556
Plausible range is 0.13 $\pm$ 4*0.0139
[0.0744 to 0.1856] is the plausible range.
We want the probability that more than 14% of the 588 sampled books are checked out.
Hence the desired point is to the right of the center at 0.13.
In symbols, we want to compute
P(p^ > 0.14). Now let us apply the z transform to both sides of the inequality.
P(p^ > 0.14) = P(z > (0.14 - 0.13)/SE)
or we have to compute: P(z > 0.7194) = P(z > 0.72). We must round to 2 places to the
right of the decimal since z tables are that way.
We want the tail area, but we can look up only the area from 0 to 0.72 for z, which is 0.2642.
0.5 MINUS 0.2642 or ANS = 0.2358
> pnorm(.72,lower.tail=FALSE)
[1] 0.2357625
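The whole calculation for the library example can be reproduced in Python as a cross-check (standard library only; variable names are illustrative assumptions). The sketch computes the answer twice: once with z rounded to 0.72, as the z table requires, and once with the unrounded standard error, to show how little the rounding matters.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.13, 588                      # population proportion and sample size
cutoff = 0.14

var_phat = p * (1 - p) / n            # about 0.00019235
se_phat = sqrt(var_phat)              # about 0.01386903

# Plausible range p +/- 4 SE, using SE rounded to 4 places as in the notes
se_rounded = round(se_phat, 4)        # 0.0139
plausible = (round(p - 4 * se_rounded, 4), round(p + 4 * se_rounded, 4))

std_normal = NormalDist()             # standard normal z
prob_table = 1 - std_normal.cdf(0.72)             # z rounded to 2 places
z_exact = (cutoff - p) / se_phat                  # unrounded z
prob_exact = 1 - std_normal.cdf(z_exact)

print(se_rounded, plausible)          # 0.0139 (0.0744, 0.1856)
print(round(prob_table, 4))           # 0.2358
print(round(z_exact, 4), round(prob_exact, 4))
```

The notes obtain z = 0.7194 because they divide by the rounded SE of 0.0139; with the unrounded SE the z score is slightly larger, and both round to 0.72.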
Copyright: Hrishikesh D. Vinod
Last updated 4/29/17 6:38 PM