Hypothesis Testing – Population Mean

Z-test About One Mean

The Z-test about a population mean is applied in the following three cases:

a. The population distribution is normal and the population standard deviation σ is known. Here the sample size is irrelevant, i.e., we can use this test with large or small samples.
b. The sample size is large and the population standard deviation σ is known. In this case we are using the power of the Central Limit Theorem.
c. The sample size is large and the population standard deviation σ is unknown (the population distribution may be unknown as well). In this case we use the sample standard deviation s instead of σ, again relying on the power of the Central Limit Theorem.

The null hypothesis is made about the mean of the population as

$$H_0:\ \mu = \mu_0 \qquad (2.1)$$

and the alternative hypothesis $H_1$ is chosen as one of the following:

$$H_1:\ \mu > \mu_0, \qquad H_1:\ \mu < \mu_0, \qquad H_1:\ \mu \neq \mu_0. \qquad (2.2)$$

As before, we call the test "right-tailed," "left-tailed," or "two-tailed" depending on the choice of alternative hypothesis in (2.2). In all three cases above, under the null hypothesis the test statistic

$$\frac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}} \qquad (2.3)$$

is approximately standard normal. In fact, in case (a) the variable (2.3) is exactly standard normal. Assuming the level of significance of the test is α, the null hypothesis is rejected according to the following table (in case (c), replace σ by s):

$$
\begin{array}{llll}
\text{Test} & H_0 & H_1 & \text{Rejection criterion} \\
\hline
\text{Right-tailed} & \mu = \mu_0 & \mu > \mu_0 & \dfrac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}} > z_{\alpha} \\
\text{Left-tailed} & \mu = \mu_0 & \mu < \mu_0 & \dfrac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}} < -z_{\alpha} \\
\text{Two-tailed} & \mu = \mu_0 & \mu \neq \mu_0 & \left|\dfrac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}}\right| > z_{\alpha/2}
\end{array}
$$

In the two-tailed test the rejection criterion

$$\left|\frac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}}\right| > z_{\alpha/2}$$

simply means that

$$\frac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}} > z_{\alpha/2} \quad \text{or} \quad \frac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}} < -z_{\alpha/2},$$

i.e., the test statistic lies in one of the two tails that make up the rejection region.

Fig 1. Rejection regions for the left-tailed, right-tailed, and two-tailed tests.

Example 1. Farmer Bill has grown tomatoes for many years, and his tomatoes used to have an average weight of 150 gm. A random sample of 45 tomatoes from a new batch has an average weight of 167 gm. The population standard deviation is known to be 48 gm.
Can we conclude, at the 10% level of significance, that his tomatoes are bigger this year?

Answer: In our classification this is case (b) for the Z-test. The null hypothesis is that the tomatoes are as big this year as in the past. The alternative hypothesis is that they are bigger:

$$H_0:\ \mu = 150, \qquad H_1:\ \mu > 150,$$

$$\frac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}} = \frac{167 - 150}{\sqrt{48^2/45}} = 2.3758 > 1.282 = z_{0.10}.$$

Fig 2.

The test statistic falls in the rejection region, so we reject the null hypothesis. The answer is yes: the tomatoes are getting bigger. The p-value of the test is fairly small, only 0.0088 (see Fig 3). This means we would be able to reject the null hypothesis even at the 1% level of significance.

Fig 3.

Example 2. A sample of 120 men frequenting a certain local bar shows that their average education level is 12.3, with a standard deviation of 2.6. The average education level in the US is 13. Using a 1% significance level, can we conclude that the men in our local bar are less educated than the population on average?

Answer: This is case (c) for the Z-test. Since we do not know the population standard deviation, we use s instead of σ. The null hypothesis is that these men are as educated as the general population. The alternative hypothesis is that they are less educated:

$$H_0:\ \mu = 13, \qquad H_1:\ \mu < 13,$$

$$\frac{\bar{X} - \mu_0}{\sqrt{s^2/n}} = \frac{12.3 - 13}{\sqrt{2.6^2/120}} = -2.95 < -2.33 = -z_{0.01}.$$

The statistic lies in the rejection region, so we reject the null hypothesis: the men in the bar are less educated. The p-value of the test is 0.0016.

Example 3. An insurance company is reviewing its current policy rates. When originally setting the rates, they believed that the average claim amount was $3,500. They are concerned that the true mean is actually higher than this, because they could potentially lose a lot of money. They randomly selected 80 claims and calculated a sample mean of $3,720. Assuming that the standard deviation of claims is $1,450, use a 5% significance level to decide whether the insurance company should be concerned about the rates.

Answer: This is again a large-sample Z-test of the population mean. Since the standard deviation of claims is assumed to be known, it is case (b).

$$H_0:\ \mu = 3500, \qquad H_1:\ \mu > 3500,$$

$$\frac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}} = \frac{3720 - 3500}{\sqrt{1450^2/80}} = 1.357 < 1.645 = z_{0.05}.$$

Fig 4.
The statistic does not fall in the rejection region. Based on this, we cannot claim that the average claim amount has increased. The p-value of the test is 0.0874, which means we would be able to reject the null hypothesis at the 10% significance level.

T-test About One Mean

If the population is normally distributed and the population standard deviation σ is unknown, the test statistic to use is

$$\frac{\bar{X} - \mu_0}{\sqrt{S^2/n}}, \qquad (2.4)$$

an instance of Student's T random variable with $n - 1$ degrees of freedom, where n is the sample size. The testing table now looks only slightly different:

$$
\begin{array}{llll}
\text{Test} & H_0 & H_1 & \text{Rejection criterion} \\
\hline
\text{Right-tailed} & \mu = \mu_0 & \mu > \mu_0 & \dfrac{\bar{X} - \mu_0}{\sqrt{S^2/n}} > t_{\alpha} \\
\text{Left-tailed} & \mu = \mu_0 & \mu < \mu_0 & \dfrac{\bar{X} - \mu_0}{\sqrt{S^2/n}} < -t_{\alpha} \\
\text{Two-tailed} & \mu = \mu_0 & \mu \neq \mu_0 & \left|\dfrac{\bar{X} - \mu_0}{\sqrt{S^2/n}}\right| > t_{\alpha/2}
\end{array}
$$

The critical values are taken with respect to $n - 1$ degrees of freedom. Once again, the two-tailed criterion

$$\left|\frac{\bar{X} - \mu_0}{\sqrt{S^2/n}}\right| > t_{\alpha/2}$$

simply means that

$$\frac{\bar{X} - \mu_0}{\sqrt{S^2/n}} > t_{\alpha/2} \quad \text{or} \quad \frac{\bar{X} - \mu_0}{\sqrt{S^2/n}} < -t_{\alpha/2},$$

i.e., the test statistic lies in one of the two tails of the rejection region. The difference in testing is that we now refer to areas under the T-curve instead of the standard normal curve.

Example 4. Let X be the tumor growth (in millimeters) per day induced in a lab mouse. We have measurements on 9 consecutive days. There is reason to believe that the tumor growth follows a normal distribution with mean 4 (null hypothesis) and unknown standard deviation. However, our sample of 9 shows a sample mean of 4.7 and a sample standard deviation of 1.2. Using a level of significance of 0.05, should we reject the hypothesis (the initial belief)?

Answer:

$$H_0:\ \mu = 4, \qquad H_1:\ \mu > 4,$$

$$\frac{\bar{X} - \mu_0}{\sqrt{S^2/n}} = \frac{4.7 - 4}{\sqrt{1.2^2/9}} = 1.75 < 1.860 = t_{0.05},$$

where $t_{0.05}$ is taken with $n - 1 = 8$ degrees of freedom.

Fig 5.

Therefore, based on the data we collected, the null hypothesis cannot be rejected at the 5% level of significance. The p-value of the test is not available in tables, so we need to use computer software or a TI calculator. This value is 0.0591, so we would reject the null hypothesis if the significance level were 10%.

Example 5. The US National Research Council currently recommends that females between the ages of 11 and 50 take in 15 milligrams of iron daily.
From a sample of 25 females, researchers found a sample mean iron intake of 14.1 milligrams. The sample standard deviation was 2.367 milligrams. Can we conclude that the average iron intake of American females is less than the amount recommended by the USNRC? Use a 5% significance level and assume that the intake is normally distributed.

Answer:

$$H_0:\ \mu = 15, \qquad H_1:\ \mu < 15,$$

$$\frac{\bar{X} - \mu_0}{\sqrt{S^2/n}} = \frac{14.1 - 15}{\sqrt{2.367^2/25}} = -1.901 < -1.711 = -t_{0.05},$$

where $t_{0.05}$ is taken with $n - 1 = 24$ degrees of freedom.

Fig 6.

Yes, we can conclude that the average population intake is less than recommended. The p-value of the test is 0.0347.

Fig 8.

Homework

In a couple of these problems you will need to calculate the sample statistics (mean, standard deviation) yourself.

8.2: 4 (two-tail), 8 (left-tail), 9 (right-tail), 15.
8.3 (assume the population is normally distributed): 3, 5 (two-tail), 9 (left-tail), 13 (right-tail).

More:
1. The average age of male lions in captivity is 20 years. A survey of 52 lions that died in zoos on American soil showed an average age at death of 19.4 years, with a standard deviation of 3.4 years. Using a 5% significance level, do lions in American zoos live shorter lives? What is the value of the test statistic?
2. The average age at graduation of students at a certain state university is 23.7. The average age of graduating math students, from a sample of 83, was 24.2, with a standard deviation of 1.6. Using a 1% significance level, do math students graduate later than the rest of the students at the university? What is the value of the test statistic?
3. IQ in the population is normally distributed with a mean of 100 and a standard deviation of 15. A survey of 22 NBA players finds an average IQ of 103, with a standard deviation of 18. Using a 5% significance level, can you say that NBA players have a higher IQ than the average population? What is the value of the test statistic?
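The Z-test and T-test procedures above can be checked numerically. What follows is a minimal sketch, not a definitive implementation, and it assumes Python 3.8+ with only the standard library.

- `statistics.NormalDist` supplies the standard normal curve.
- Instead of tables or a TI calculator, the T-curve CDF comes from a hypothetical helper `t_cdf`. It integrates the Student's T density numerically with Simpson's rule.
- The function `one_mean_test` and its parameters are likewise illustrative names.
- The decision compares the p-value with α. This is equivalent to checking whether the statistic (2.3) or (2.4) falls in the rejection region.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=2000):
    """Approximate CDF of Student's T with `df` degrees of freedom.

    Hypothetical helper (sketch only): integrates the T density over [0, |x|]
    with Simpson's rule (`steps` must be even) and uses the symmetry of the
    T-curve about 0.
    """
    # log of Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), the density's constant
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # area under the T-curve between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def one_mean_test(xbar, mu0, sd, n, alpha, tail, use_t=False):
    """Test H0: mu = mu0 against a right-, left- or two-tailed alternative.

    `sd` is sigma (cases (a), (b)) or the sample standard deviation s
    (case (c), or the T-test when use_t=True).
    Returns (test statistic, p-value, whether H0 is rejected at level alpha).
    """
    stat = (xbar - mu0) / math.sqrt(sd ** 2 / n)  # statistic (2.3) or (2.4)

    if use_t:
        def cdf(x):
            return t_cdf(x, n - 1)  # T-curve with n - 1 degrees of freedom
    else:
        cdf = NormalDist().cdf      # standard normal curve

    if tail == "right":
        p = 1 - cdf(stat)
    elif tail == "left":
        p = cdf(stat)
    elif tail == "two":
        p = 2 * (1 - cdf(abs(stat)))
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")

    # p < alpha is equivalent to the statistic lying in the rejection region
    return stat, p, p < alpha


if __name__ == "__main__":
    examples = [
        ("Example 1", (167, 150, 48, 45, 0.10, "right"), False),
        ("Example 2", (12.3, 13, 2.6, 120, 0.01, "left"), False),
        ("Example 3", (3720, 3500, 1450, 80, 0.05, "right"), False),
        ("Example 4", (4.7, 4, 1.2, 9, 0.05, "right"), True),
        ("Example 5", (14.1, 15, 2.367, 25, 0.05, "left"), True),
    ]
    for name, args, use_t in examples:
        stat, p, reject = one_mean_test(*args, use_t=use_t)
        print(f"{name}: statistic = {stat:.4f}, p-value = {p:.4f}, reject H0: {reject}")
```

Running the script prints the statistic, the p-value, and the decision for each of Examples 1 to 5. Up to rounding, these should agree with the hand calculations. Example 4 is treated as right-tailed (μ > 4), as the comparison with $t_{0.05}$ indicates. The Simpson's-rule T-curve is a stdlib-only stand-in; a statistics package such as SciPy would normally be used instead.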