PubH 6414 Worksheet 7b: Sampling Distribution of Sample Means

Example 1: Calculating Sample Sizes

Assume that systolic blood pressure (SBP) for healthy adults is normally distributed with μ = 120 mm Hg and σ = 20 mm Hg.

a. What sample size is needed so that 95% of sample means are between 116 mmHg and 124 mmHg?

The z-scores that bound the middle 95% of the area under a standard normal distribution are -1.96 and 1.96. We want these z-scores to correspond to the sample means 116 and 124 on the sampling distribution, whose SEM is σ/√n = 20/√n. Now use the z-score formula for either 116 or 124 to solve for n (both give the same answer):

$$-1.96 = \frac{116 - 120}{20/\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \frac{1.96 \times 20}{4} = 1.96 \times 5 = 9.8 \;\Rightarrow\; n = 9.8^2 = 96.04$$

Round up to a sample size of 97. Because n is rounded up, slightly more than 95% of the area will lie between 116 and 124: at least 95% of the sample means will be between 116 and 124 if n = 97.

z-scores using R and Rcmdr:
Rcmdr: Distributions > Continuous Distributions > Normal Distribution > Normal Quantiles; Probabilities = 0.025, 0.975; select Lower Tail
R script: qnorm(c(0.025, 0.975))

b. What sample size is needed so that 90% of sample means are between 116 mmHg and 124 mmHg?

First find the z-scores that correspond to the middle 90% of the area under a standard normal distribution. If 90% of the area is in the middle, 5% of the area is in each of the two tails, so 90% of the area on the standard normal distribution is between -1.645 and 1.645.

Rcmdr: Distributions > Continuous Distributions > Normal Distribution > Normal Quantiles; Probabilities = 0.05, 0.95; select Lower Tail
R script: qnorm(c(0.05, 0.95))

Now use the z-score formula for either 116 or 124 to solve for n (both give the same answer):
$$1.645 = \frac{124 - 120}{20/\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \frac{1.645 \times 20}{4} = 1.645 \times 5 = 8.225 \;\Rightarrow\; n = 8.225^2 \approx 67.65$$

Round up to a sample size of 68. Because n is rounded up, slightly more than 90% of the area will lie between 116 and 124: at least 90% of the sample means will be between 116 and 124 if n = 68.

c. What sample size is needed so that the SEM of the sampling distribution = 5?

Use the formula for the SEM to solve for n:

$$\text{SEM} = \frac{\sigma}{\sqrt{n}} \;\Rightarrow\; 5 = \frac{20}{\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \frac{20}{5} = 4 \;\Rightarrow\; n = 4^2 = 16$$

A sample size of 16 will result in SEM = 5 when the population standard deviation = 20.

d. What sample size is needed so that the SEM of the sampling distribution = 2?

$$2 = \frac{20}{\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \frac{20}{2} = 10 \;\Rightarrow\; n = 10^2 = 100$$

A sample size of 100 will result in SEM = 2 when the population standard deviation = 20.

e. As sample size increases, the SEM of the sampling distribution decreases.

Example 2: Identify the sampling distribution

For each of the following population examples, identify whether the sampling distribution of the sample mean is distributed:
- normal with mean _____ and SEM _____, or
- t-distribution with __ df and SEM = ______

a. Based on long-term data collection, mean (μ) … The distribution of study hours per week is positively skewed. A sample of 60 online students has sample mean = 18.5 hours and sample SD = 3.2.
The sampling distribution of mean hours per week for samples of 60 students is: normal with SEM = 0.387. The population parameters come from long-term data, and n = 60 is large enough for the Central Limit Theorem to apply despite the skew.

b. Daily caloric intake for adolescent girls is normally distributed with mean (μ) 2200 kcal and standard deviation (σ) 40 kcal. A sample of 36 adolescent girls has sample mean = 2210 kcal and sample SD = 42.
The sampling distribution of mean caloric intake for samples of 36 adolescent girls is: normal with mean 2200 kcal and SEM = 40/sqrt(36) = 6.67 kcal.

c. Children between the ages of 2 and 5 in the U.S. watch an average (μ) of 25 hours of television per week, with unknown population standard deviation (σ). A sample of 49 children age 2 to 5 has sample mean = 24 hours and sample SD = 3.5.
The sampling distribution of mean hours per week for samples of 49 children age 2-5 is: a t-distribution with 48 df and SEM = 3.5/sqrt(49) = 0.5.

d. Mean (μ) BMI of marathon runners is 22.4, but the standard deviation is unknown. A sample of 100 marathon runners has sample mean BMI = 22.7 and sample standard deviation = 1.9.
The sampling distribution of mean BMI for samples of 100 runners is: a t-distribution with 99 df and SEM = 1.9/sqrt(100) = 0.19.

e. Mean (μ) BMI of marathon runners is 22.4, but the standard deviation is unknown. A sample of 49 marathon runners has sample mean BMI = 22.8 and sample standard deviation = 2.1.
The sampling distribution of mean BMI for samples of 49 runners is: a t-distribution with 48 df and SEM = 2.1/sqrt(49) = 0.3.

Example 3: Calculating Areas Under the t-distribution

Researchers are interested in studying systolic blood pressure (SBP) of urban police officers. It has been determined that mean (μ) SBP in this population = 120 mmHg, but the standard deviation (σ) is unknown. For each of the following, identify the sampling distribution and calculate the requested probability.

a. A sample of 49 urban police officers has sample mean SBP = 117 with sample standard deviation = 18.
i. Identify the sampling distribution.
The sampling distribution of sample means for samples of 49 officers follows a t-distribution with 48 df and SEM = 18/7 = 2.57.
ii. Calculate P(x̄ ≤ 117). Interpret.
t_117 = (117 - 120)/(18/7) = -1.166667, df = 48
Area below t_117 = 0.1245543
Interpretation: The probability that a sample of 49 has a mean SBP less than or equal to 117 when the population mean is 120 is approximately 0.125.
Rcmdr: Distributions > Continuous Distributions > t distribution > t probabilities; Variable Value = t_117; df = 48; select Lower Tail
R script: t_117 <- (117 - 120)/(18/7); pt(t_117, df = 48)

b. A different sample of 49 urban police officers has sample mean SBP = 126 with sample standard deviation = 20.
i. Identify the sampling distribution.
The sampling distribution of sample means for samples of 49 officers follows a t-distribution with 48 df and SEM = 20/7 = 2.86.
ii. Calculate P(x̄ ≥ 126). Interpret.
t_126 = (126 - 120)/(20/7) = 2.1, df = 48
Area above t_126 = 0.02050463
Interpretation: The probability that a sample of 49 has a mean SBP greater than or equal to 126 when the population mean is 120 is approximately 0.02.
Rcmdr: Distributions > Continuous Distributions > t distribution > t probabilities; Variable Value = t_126; df = 48; select Upper Tail
R script: t_126 <- (126 - 120)/(20/7); pt(t_126, df = 48, lower.tail = FALSE), or equivalently 1 - pt(t_126, df = 48)

c. A sample of 64 urban police officers has sample mean SBP = 116 and sample standard deviation = 16.
i. Identify the sampling distribution.
The sampling distribution of sample means for samples of 64 officers follows a t-distribution with 63 df and SEM = 16/8 = 2.
ii. Calculate P(x̄ ≤ 116). Interpret.
t_116 = (116 - 120)/2 = -2, df = 63
Area below t_116 = 0.02490788
Interpretation: The probability that a sample of 64 has a mean SBP less than or equal to 116 when the population mean is 120 is approximately 0.025.
Rcmdr: Distributions > Continuous Distributions > t distribution > t probabilities; Variable Value = -2; df = 63; select Lower Tail
R script: pt(-2, df = 63)

d. A different sample of 64 urban police officers has sample mean SBP = 118 and sample standard deviation = 16.
i. Identify the sampling distribution.
The sampling distribution of sample means for samples of 64 officers follows a t-distribution with 63 df and SEM = 16/8 = 2.
ii. Calculate P(x̄ ≤ 118). Interpret.
t_118 = (118 - 120)/2 = -1, df = 63
Area below t_118 = 0.160568
Interpretation: The probability that a sample of 64 has a mean SBP less than or equal to 118 when the population mean is 120 is approximately 0.16.
Rcmdr: Distributions > Continuous Distributions > t distribution > t probabilities; Variable Value = -1; df = 63; select Lower Tail
R script: pt(-1, df = 63)

Take some time to compare the results for 3c and 3d. Both examples have a sample of 64 with the same SEM and a sample mean that is less than the population mean. In 3c the sample mean SBP is 116 mmHg, and in 3d the sample mean SBP is 118 mmHg. Which is further from the true population mean of 120 mmHg? Which has a larger t-coefficient (in absolute value)? Does it make sense that the probability of having a mean SBP less than or equal to 116 mmHg is smaller than the probability of having a mean SBP less than or equal to 118 mmHg if the true population mean = 120 mmHg (given that sample size and SEM are the same for the two samples)?

Yes. All other conditions (sample size and SEM) being the same:
- Larger differences between the sample mean and the population mean result in larger (absolute value) t-coefficients.
- Larger t-coefficients are further from 0, the center of the t-distribution.
- Therefore, the tail area beyond larger t-coefficients is smaller.
  - The tail area beyond (to the left of) the t-coefficient is equal to the probability of observing the corresponding sample mean or a smaller sample mean when the population mean is known.

So there is a smaller probability of observing a sample mean (or a smaller one) when the sample mean is further from the population mean.

The same arguments apply when sample means are larger than the population mean: a larger difference between an observed sample mean and the population mean results in a larger t-coefficient and a smaller probability of observing that sample mean or a larger sample mean (given constant sample size and SEM).
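A short R sketch (base R only, using the same `pt` function as the worksheet) can confirm the 3c/3d comparison numerically and extend it to a few other sample means. The helper `t_lower_tail` and the extra means 112, 114 and 120 are illustrative choices, not part of the worksheet; every sample keeps n = 64 and s = 16 so that only the distance from μ = 120 changes.

```r
# Hypothetical helper (not from the worksheet): t-coefficient and lower-tail
# area P(xbar <= observed) for a sample mean, using the sample SD for the SEM.
t_lower_tail <- function(xbar, mu, s, n) {
  sem <- s / sqrt(n)                  # estimated SEM
  t_coef <- (xbar - mu) / sem         # t-coefficient
  p_lower <- pt(t_coef, df = n - 1)   # area to the left of t_coef
  c(t = t_coef, p = p_lower)
}

# Examples 3c and 3d: same n and SEM, different sample means
res_3c <- t_lower_tail(xbar = 116, mu = 120, s = 16, n = 64)
res_3d <- t_lower_tail(xbar = 118, mu = 120, s = 16, n = 64)
print(rbind(`3c` = res_3c, `3d` = res_3d))

# Illustrative extension: move the sample mean step by step toward mu = 120
xbars <- c(112, 114, 116, 118, 120)
p_by_mean <- sapply(xbars, function(x) t_lower_tail(x, mu = 120, s = 16, n = 64)[["p"]])
print(data.frame(xbar = xbars, p_lower = round(p_by_mean, 4)))
```

The lower-tail probability grows as the sample mean approaches 120 and reaches 0.5 at x̄ = 120 (t = 0), which is the pattern described above.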
In 3c, the probability of observing a sample mean less than or equal to 116 mmHg in a sample of 64 when the true population mean is 120 mmHg is very small: 0.025. This small probability indicates that it is unlikely to observe a sample mean of 116 mmHg or smaller if the true population mean is 120 mmHg. What are some possible explanations for this unlikely sample mean?
- Perhaps this sample is not representative of the population. If the sample wasn't randomly selected, it may represent a subset of the population of urban police officers that has lower mean SBP.
  - Maybe the sampling procedure was biased towards younger urban police officers with lower SBP, on average, resulting in this lower sample mean SBP.
- Perhaps the equipment used to measure SBP in this sample was calibrated incorrectly, so the observed SBP is lower than the actual SBP for each officer in the sample, resulting in a smaller sample mean SBP.
- Perhaps the ‘known’ mean SBP in the population of urban police officers is not actually 120 mmHg as has been stated.
  - Lesson 8 Part 2 covers hypothesis testing, which is a procedure used to test whether sample data provide significant evidence against some statement about the population from which the data were sampled.
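To make "unlikely" concrete, a small Monte Carlo sketch in R can repeat the 3c sampling many times under the premise that the population mean really is 120 mmHg. It assumes SBP is normally distributed and uses σ = 16 (the 3c sample SD) as a stand-in for the unknown population SD; both are assumptions for illustration only. Because the t-coefficient of a normal sample does not depend on σ, the simulated proportion of samples with t ≤ -2 should land close to pt(-2, df = 63), about 0.025, whichever σ is chosen.

```r
set.seed(6414)            # arbitrary seed so the run is reproducible
n_sims <- 20000           # number of simulated samples
n <- 64                   # sample size in Example 3c
mu <- 120                 # stated population mean SBP (mmHg)
sigma_assumed <- 16       # ASSUMPTION: illustrative population SD, not given in the worksheet

# For each simulated sample of 64 officers, compute its t-coefficient
t_sim <- replicate(n_sims, {
  x <- rnorm(n, mean = mu, sd = sigma_assumed)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})

prop_extreme <- mean(t_sim <= -2)  # share of samples at least as extreme as 3c (t <= -2)
theoretical <- pt(-2, df = n - 1)  # t-distribution area used in the worksheet
print(c(simulated = prop_extreme, theoretical = theoretical))
```

Roughly 1 in 40 simulated samples is as extreme as the 3c sample, which is why a sample mean of 116 mmHg prompts a search for explanations such as those listed above.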