Drawing Inferences from Data
Estimation, standard error and confidence interval
Berghold, IMI, MUG

Estimation
• The sample mean and the sample variance were used to describe a typical value and the variation in the sample.
• We may similarly use the population mean (the expected mean) and the population variance to describe the typical value and the variation in the population.
• These values are often referred to as the theoretical values, and the sample mean and the sample variance are considered estimates of the analogous population quantities (with certain properties, e.g. unbiasedness): sample information is used to draw conclusions about the value of a population parameter.

Sampling
10 random samples from the same (normally distributed) population, each with sample size n = 100:

Sample   Mean   SD
1        3.20   0.565
2        3.20   0.590
3        3.31   0.486
4        3.27   0.574
5        3.18   0.542
6        3.31   0.575
7        3.13   0.606
8        3.11   0.524
9        3.26   0.648
10       3.32   0.582

Sampling distribution
The distribution of all possible sample means is called the sampling distribution of the mean. In general, the sampling distribution of any statistic is the distribution of the values of the statistic that would arise from all possible samples.

[Figure: Distribution of x-values, with μ marked]

Sampling distribution
Given a population with mean μ and standard deviation σ, the sampling distribution of the mean based on repeated random samples of size n has the following properties:
• The expected value of the sample means is equal to the population mean μ of the individual measurements.
• The standard deviation of the sample means is $\sigma/\sqrt{n}$.
• If the distribution in the population is Normal, then the sampling distribution of the mean is also Normal.
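The properties of the sampling distribution listed above can be checked with a small simulation. The following is a minimal sketch, not part of the original slides: the population parameters `MU` and `SIGMA` and the number of repeated samples `N_SAMPLES` are assumptions, chosen only so that the simulated samples resemble the ten samples in the table.

```python
import math
import random
import statistics

# Hypothetical population parameters (NOT given on the slides); chosen only
# to resemble the sample means and SDs listed in the sampling table.
MU = 3.2
SIGMA = 0.57
N = 100            # sample size, as on the slide
N_SAMPLES = 5000   # number of repeated samples (assumption)

rng = random.Random(1)  # fixed seed so the run is reproducible


def draw_sample_mean(rng, mu, sigma, n):
    """Draw one random sample of size n from N(mu, sigma) and return its mean."""
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))


# Approximate the sampling distribution of the mean by repeated sampling.
sample_means = [draw_sample_mean(rng, MU, SIGMA, N) for _ in range(N_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = SIGMA / math.sqrt(N)

print(f"mean of sample means: {mean_of_means:.4f} (population mean {MU})")
print(f"SD of sample means:   {sd_of_means:.4f} (sigma/sqrt(n) = {theoretical_se:.4f})")
```

With these assumed values, the mean of the simulated sample means should land close to μ and their standard deviation close to σ/√n = 0.057.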
Sampling distribution
Central limit theorem: the distribution of the sample means is approximately normal, regardless of the shape of the original population distribution, as long as the samples are large enough.

Standard error of sample mean (SE)
The variability of sample means has the following properties:
• It is smaller among the means of large samples than among the means of small samples.
• It is smaller than the variability of the individual observations in the population.
• It increases with greater variability among the individual values in the population.
• It is estimated by $s/\sqrt{n}$. For sample 5: $s/\sqrt{n} = 0.542/\sqrt{100} = 0.542/10 = 0.0542$.

Standard error of sample mean (SE)
It is a measure of the uncertainty of a single sample mean as an estimate of the population mean.

Confidence Interval
To get an idea of the uncertainty associated with a single sample estimate, a $(1-\alpha)$ confidence interval is constructed from the data of the sample.

95% confidence interval (σ is known):

$$P\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha = 95\%$$

Confidence Interval
All intervals that meet the general requirement

$$P(\text{lower limit} \le \text{"true parameter"} \le \text{upper limit}) = 1 - \alpha$$

are called confidence intervals with confidence level $1 - \alpha$.

Confidence Interval
Frequency interpretation: if the experiment is repeated a large number of times and a 95% confidence interval is computed for each replication, then 95% of these confidence intervals will contain the true value of the unknown parameter.
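The standard error for sample 5 and the frequency interpretation of the confidence interval can be illustrated numerically. This is a minimal sketch under stated assumptions: the population values `MU` and `SIGMA`, the seed, and the number of replications are hypothetical and not taken from the slides; σ is treated as known, so the 1.96 interval applies.

```python
import math
import random
import statistics

# Standard error for sample 5 from the slides: s = 0.542, n = 100.
se_sample5 = 0.542 / math.sqrt(100)
print(f"SE of sample 5: {se_sample5:.4f}")

# Hypothetical population for the simulation (assumed values, not from the slides).
MU, SIGMA, N = 3.2, 0.57, 100
REPLICATIONS = 2000

# 0.975 quantile of the standard normal distribution, about 1.96.
Z = statistics.NormalDist().inv_cdf(0.975)

rng = random.Random(7)  # fixed seed so the run is reproducible

# Sigma is treated as known, so every interval has the same half-width.
half_width = Z * SIGMA / math.sqrt(N)

covered = 0
for _ in range(REPLICATIONS):
    x_bar = statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    # Count the replication if its interval contains the true mean.
    if x_bar - half_width <= MU <= x_bar + half_width:
        covered += 1

coverage = covered / REPLICATIONS
print(f"share of intervals containing mu: {coverage:.3f}")
```

The share of intervals that contain μ should be close to 0.95, though it varies slightly from run to run with the seed and the number of replications.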
95% confidence interval (σ is unknown, small samples):

$$P\left(\bar{x} - t_{n-1;\,1-\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{n-1;\,1-\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha = 95\%$$

t-distribution ("Student's t")
The critical ratio, using s as an estimate of σ, defined as

$$\frac{\bar{x} - \mu}{s/\sqrt{n}}$$

is not normally distributed but follows a t-distribution with f = n - 1 degrees of freedom (William Gosset, 1908).
The distribution is similar to the standard normal distribution in that it is symmetric with mean 0, but its standard deviation depends on a parameter called the degrees of freedom.

[Figure: t-distribution]

Example
The parameter PTT (partial thromboplastin time) is assessed in a sample of 25 children. The mean is 42 sec, the standard deviation is 4 sec. Calculate a 95% confidence interval for the expected mean.

Example
Assumptions: we assume that the PTT values are normally distributed and that σ is estimated by the standard deviation s of the sample.

$$\bar{x} - t_{n-1;\,1-\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{n-1;\,1-\alpha/2}\,\frac{s}{\sqrt{n}}$$

$$\left[\,42 - 2.06 \cdot \frac{4}{\sqrt{25}}\;;\; 42 + 2.06 \cdot \frac{4}{\sqrt{25}}\,\right] = [40.35\,;\,43.65]$$

Points to Remember
• The level of significance α: for α = 1% the interval is wider than for α = 5%.
• The sample size n: the larger the sample size n, the more precise the estimate. Halving the width of the confidence interval requires four times the sample size.

Points to Remember
• 95% reference range: $\bar{x} \pm 2s$ (about 95% of the observations lie in this interval)
• Standard error of the mean: $s/\sqrt{n}$ (tells us about the uncertainty of the estimate of the mean)
• 95% confidence interval: $P\left[\mu \in \bar{x} \pm t_{n-1;\,1-\alpha/2}\,\frac{s}{\sqrt{n}}\right] = 0.95$ (α = 5%) (tells us how often intervals constructed in this way contain the "true" value)
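The PTT example and the fourfold-sample-size rule can be reproduced with a few lines of arithmetic. This is a minimal sketch: the helper names `mean_ci` and `ci_width` are hypothetical, and the critical value 2.064 is hard-coded from a standard t table for 24 degrees of freedom (the slides round it to 2.06) rather than computed. The halving of the width is shown with a fixed critical value of 1.96, which is an assumption; with t quantiles the critical value also shrinks slightly as n grows.

```python
import math


def mean_ci(x_bar, s, n, crit):
    """Return (lower, upper) limits of the interval x_bar +/- crit * s / sqrt(n)."""
    half_width = crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


def ci_width(s, n, crit):
    """Return the full width of the interval for a given critical value."""
    return 2 * crit * s / math.sqrt(n)


# Table value of the 0.975 quantile of the t-distribution with 24 df
# (hard-coded assumption; the slides use the rounded value 2.06).
T_24_0975 = 2.064

# PTT example from the slides: n = 25 children, mean 42 sec, SD 4 sec.
lower, upper = mean_ci(x_bar=42.0, s=4.0, n=25, crit=T_24_0975)
print(f"95% CI for the mean PTT: [{lower:.2f}; {upper:.2f}] sec")

# Fourfold sample size with a fixed critical value (1.96) halves the width.
width_n = ci_width(s=4.0, n=25, crit=1.96)
width_4n = ci_width(s=4.0, n=100, crit=1.96)
print(f"width at n=25: {width_n:.3f}, width at n=100: {width_4n:.3f}")
```

Either critical value, 2.06 or 2.064, gives the same interval to two decimals.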