Statistics and Quantitative Analysis U4320
Segment 5: Sampling and Inference
Prof. Sharyn O’Halloran

Sampling

A. Basics
  1. Ways to Describe Data
     - Histograms
     - Frequency tables, etc.
  2. Ways to Characterize Data
     - Central tendency: mode, median, mean
     - Dispersion: variance, standard deviation
  3. Probability of Events
     - If discrete: rely on relative frequency
     - If continuous: rely on the distribution of events (example: the standard normal distribution)
  4. Samples
     We can take a sample of the population and make inferences about the population.
  5. Central Question
     How well does the sample represent the underlying population?

B. Random Sampling
  1. Problems with Sample Bias
     The way we collect our data may bias our results. That is, the average response in our sample may not represent the average response in the whole population.
     Examples:
     - Literary Digest phone book poll
     - Primaries
     - Relation between economic growth and education, looking only at OECD countries
  2. Solution
     Random sampling

C. Moments of the Sample
  1. Characteristics of Sample Mean
     $\sigma^2$ = variance, $\mu$ = mean
     Example:
     - Draw a single observation $X$.
     - Draw two observations and take their mean $\bar{X}$.
     - Draw 4 observations and take their mean $\bar{X}$.
     [Figures: position of the observations and of their mean for each draw.]
  2. Generalization
     Every sample has an expected mean of $\mu$. But as our sample size increases, we are more confident of our results. That is, the standard deviation (or standard error, as we will call it) of our results is decreasing. So as N increases, $\bar{X}$ tends to fall closer to $\mu$.
  3. Hat Experiment
     Mean $\mu = 10.5$; standard deviation $\sigma = 5.77$.
     - Now let's take a sample of size 1 (with replacement).
     - Now one of size 2.
     - Now one of size 6.
  4. Equations
     For a sample of size n from a population with mean $\mu$ and standard deviation $\sigma$, the sample mean $\bar{X}$ has:
       $E(\bar{X}) = \mu$
       $SE(\bar{X}) = \sigma / \sqrt{n}$
     $SE(\bar{X})$ is called the standard error of the sampling process.
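The hat experiment can be checked by simulation. The sketch below assumes the hat holds slips numbered 1 to 20; that is an assumption, not something stated above, chosen because it reproduces the stated mean of 10.5 and standard deviation of 5.77. The function name `simulate_sample_means`, the number of trials, and the seed are likewise illustrative choices.

```python
import random
import statistics

def simulate_sample_means(population, n, trials=20000, seed=0):
    """Draw `trials` samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(trials)]

# ASSUMPTION: the hat holds the integers 1..20 (gives mean 10.5, sd about 5.77).
hat = list(range(1, 21))
mu = statistics.fmean(hat)
sigma = statistics.pstdev(hat)

results = {}
for n in (1, 2, 6):
    means = simulate_sample_means(hat, n)
    observed_mean = statistics.fmean(means)   # should sit near mu
    observed_se = statistics.pstdev(means)    # spread of the sample means
    predicted_se = sigma / n ** 0.5           # SE formula: sigma / sqrt(n)
    results[n] = (observed_mean, observed_se, predicted_se)
    print(f"n={n}: mean of means={observed_mean:.2f}, "
          f"observed SE={observed_se:.2f}, sigma/sqrt(n)={predicted_se:.2f}")
```

Under these assumptions the mean of the sample means should stay near 10.5 for every n, while their spread should shrink roughly in line with sigma divided by the square root of n.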
Inference

We make inferences about a population from a given sample.

A. Population and Sampling Parameters
   We have a population with parameters $\mu$ and $\sigma$. We then take a sample with parameters $\bar{X}$ and $s$. We want to know how well the sample mean $\bar{X}$ approximates the population mean $\mu$.
   [Figure: draw a sample from the population ($\mu$, $\sigma$) to get the sample ($\bar{X}$, $s$); then make an inference about how good an estimate $\bar{X}$ is of $\mu$.]
   On average the sample mean equals the population mean, with
     $SE(\bar{X}) = \sigma / \sqrt{n}$

B. Referring Back to the Hat Experiment
  1. Standard error decreases as n increases
     For instance, before we drew samples of sizes 1, 2, and 6 from the hat.
     - The first sample of size 1 had standard error $5.77/\sqrt{1} = 5.77$.
     - The second sample of size 2 had standard error $5.77/\sqrt{2} = 4.08$.
     - The third sample of size 6 had standard error $5.77/\sqrt{6} = 2.36$.

C. Shape of the Sampling Distribution
   If you take a sample and find its mean, then take another sample and find its mean, and repeat this process a large number of times, then $\bar{X}$ is a random variable with its own mean and standard error.
  1. Central Limit Theorem
     Take a large number of samples; then the sample mean $\bar{X}$ is approximately normally distributed with mean $\mu$ and standard error $\sigma / \sqrt{n}$.
  2. Example: 3 different distributions

     Example 1:
     A population of men on a small, Eastern campus has a mean height $\mu = 69$" and a standard deviation $\sigma = 3.22$". If a random sample of n = 10 men is drawn, what is the chance that the sample mean will be within 2" of the population mean?

     Answer:
     From the Central Limit Theorem, we know that $\bar{X}$ is normally distributed, with mean 69 and standard error
       $\sigma / \sqrt{n} = 3.22 / \sqrt{10} = 1.02$
     [Figure: sampling distribution of $\bar{X}$ centered at $\mu = 69$ with standard error 1.02, marked at $\bar{X} = 67$ and $\bar{X} = 71$.]
     Find the z-score: $z = (71 - 69)/1.02 = 1.96$, and P(Z > 1.96) = 0.025. Since there are two tails, the area in the middle is 1 - .025 - .025 = .95. So there's a 95% probability that the sample mean falls between 67 and 71.
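The height calculation in Example 1 can be reproduced without a Z-table. The sketch below is one way to do it, using the standard normal CDF written in terms of `math.erf`; the helper names `normal_cdf` and `prob_mean_within` are hypothetical, introduced only for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_mean_within(sigma, n, delta):
    """P(|X-bar - mu| < delta), assuming X-bar ~ Normal(mu, sigma / sqrt(n)).

    The answer does not depend on mu, only on the distance delta from it.
    """
    se = sigma / math.sqrt(n)   # standard error of the sample mean
    z = delta / se              # distance from the mean in standard errors
    return normal_cdf(z) - normal_cdf(-z)

# Example 1: sigma = 3.22, n = 10, within 2 inches of the population mean.
se_height = 3.22 / math.sqrt(10)
p_within_2 = prob_mean_within(sigma=3.22, n=10, delta=2)
print(f"SE = {se_height:.2f}, P(67 < X-bar < 71) = {p_within_2:.3f}")
```

The result should agree with the table-based answer of about 95%, up to the rounding of the standard error and z-score.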
     Example 2:
     Suppose a large class in statistics has marks normally distributed around $\mu = 72$ with $\sigma = 9$. Find the probability that:

     a) An individual student drawn at random will have a mark over 80.
        Answer: The Z-score is (80 - 72)/9 = .89. Looking this up in the table gives P(Z > .89) = .187, or about 19%.

     b) A sample of size 10 has an average of over 80.
        Answer: The standard error is $\sigma / \sqrt{n} = 9 / \sqrt{10} = 2.85$. So the Z-score becomes (80 - 72)/2.85 = 2.81, and P(Z > 2.81) = .002.
        [Figure: sampling distribution with SE = 2.85; the area above 80 is .002.]

     Example 3:
     If the number of miles per gallon achieved by all cars of a particular model has $\mu = 25$ and $\sigma = 2$, what is the probability that for a random sample of 20 such cars, average miles per gallon will be less than 24? (Assume that the population is normally distributed.)

     Step 1: Standardize $\bar{X}$
       $P(\bar{X} < 24) = P\left[ \frac{\bar{X} - \mu}{SE} < \frac{24 - 25}{SE} \right]$
       $SE = \sigma / \sqrt{n} = 2 / \sqrt{20} = .4472$
       $P(\bar{X} < 24) = P\left[ \frac{\bar{X} - \mu}{SE} < \frac{24 - 25}{.4472} = -2.24 \right]$

     Step 2: Then find the Z-scores (from the standard normal tables)
       $= P[Z < -2.24] = P[Z > 2.24] = 0.013$ (by symmetry)
     [Figure: sampling distribution with SE = 0.4472, centered at $\mu = 25$; the area below 24 is .013.]
     So there is about a 1.3 percent chance that from a sample of 20 the average will be less than 24.

D. Proportions
  1. Proportions as Means
     A proportion (P) is just the mean of a dichotomous variable.
     Example: Ask 50 people what they think of Clinton;
     - 0 if they think he's doing a poor job; and
     - 1 if they think he is doing a good job.
     Suppose 30 of the 50 respondents say he's doing a good job. Then the sample mean P is 30/50 = .60. This is just another way of saying that 60% of those surveyed approved of his job performance.
  2. Formula for Standard Error
     For a large enough sample of size n, P (the proportion) will be approximately normally distributed with mean $\pi$ and standard error SE, where:
     - Population mean $\mu$ = population proportion $\pi$
     - Sample mean $\bar{X}$ = sample proportion P
     - Population SD: $\sigma = \sqrt{\pi (1 - \pi)}$
     - $SE = \sqrt{\dfrac{\pi (1 - \pi)}{n}}$
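To make the "proportion as a mean" idea concrete, the sketch below codes the 50 survey responses as 0/1 values and applies the standard error formula for a proportion. The function name `proportion_se` is hypothetical, and the values $\pi = .5$ and n = 50 passed to it are illustrative inputs rather than results stated above.

```python
import math
import statistics

def proportion_se(pi, n):
    """Standard error of a sample proportion: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1.0 - pi) / n)

# 30 respondents coded 1 (good job) and 20 coded 0 (poor job).
responses = [1] * 30 + [0] * 20

# The sample proportion P is just the mean of the 0/1 variable.
p_hat = statistics.fmean(responses)

# For a 0/1 variable, the standard deviation equals sqrt(p * (1 - p)).
sd_direct = statistics.pstdev(responses)
sd_formula = math.sqrt(p_hat * (1.0 - p_hat))

# Illustrative inputs (assumed here): true proportion .5, sample size 50.
se_example = proportion_se(0.5, 50)

print(f"P = {p_hat:.2f}, SD = {sd_direct:.4f} (formula {sd_formula:.4f}), "
      f"SE(pi=.5, n=50) = {se_example:.4f}")
```

The direct standard deviation of the 0/1 data and the $\sqrt{p(1-p)}$ formula should agree, which is why no separate estimate of $\sigma$ is needed when working with proportions.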
  3. Example: Polling
     Suppose that the true approval rating for Clinton is .50. That is, 50 percent of the population believe he is doing a good job, so $\pi = .5$. If we sample 50 people, what is the probability that we will observe an approval rating as high as 60 percent or above?

     We know that the true population mean is $\pi = .5$.
     The standard error is $\sqrt{\dfrac{.5 (1 - .5)}{50}} = 0.0707$.
     Then the Z-score is (.6 - .5)/0.0707 = 1.414. Looking this up in the Z-table, P(Z > 1.414) = .079, or about 8%.

  4. Example
     Of your first 15 grandchildren, what is the chance that there will be more than 10 boys?

     Answer:
     Treat this as the probability that the proportion of boys is at least 10/15 = 2/3.
     We know that the population mean is $\pi = 1/2$.
     The standard error is $\sqrt{\dfrac{.5 (1 - .5)}{15}} = 0.129$.
     Then the Z-score is (.667 - .5)/0.129 = 1.29. Looking this up in the table, P(Z > 1.29) = .099, or about 10%.

Point Estimation: Properties

A. Unbiased Estimators
   An estimator is unbiased when its expected value equals the correct value: $E(\bar{X}) = \mu$.
   The related property that $\bar{X}$ converges toward $\mu$ as $N \to \infty$ is called consistency.

B. Efficient Estimators
   Def of Efficient: One estimator is more efficient than another if its standard error is lower.

C. N-1 Problem
  1. $\mu$ Known
     The population variance is
       $\sigma^2 = \dfrac{\sum (X_i - \mu)^2}{N}$
     When we take a sample of size n, if we had the real $\mu$ from the population, we could calculate
       $s^2 = \dfrac{\sum (X_i - \mu)^2}{n}$
     Then there wouldn't be a problem; $s^2$ would be a consistent estimator of $\sigma^2$, if we knew $\mu$.
  2. $\mu$ Unknown
     But we usually don't have $\mu$, so we have to use the sample mean $\bar{X}$ instead. What's the difference? Why don't we just say that
       $s^2 = \dfrac{\sum (X_i - \bar{X})^2}{n}$ ?
     It turns out that we can show that $\bar{X}$ minimizes the expression $\sum (X_i - c)^2$ over all values $c$. So if we used $\mu$ instead, the expression would be bigger: $\sum (X_i - \bar{X})^2 \le \sum (X_i - \mu)^2$. Dividing the smaller sum by n therefore tends to underestimate $\sigma^2$.
     The right way to correct for this is to multiply by $\dfrac{n}{n-1}$, so
       $s^2 = \dfrac{\sum (X_i - \bar{X})^2}{n} \cdot \dfrac{n}{n-1} = \dfrac{\sum (X_i - \bar{X})^2}{n-1}$
     The bottom line is that we use n-1 to make a consistent, unbiased estimate of the population variance.

IV. Review Homework