Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
IŞIKIE Hypothesis Testing Statistical Hypotheses A problem often confronting the scientist or engineer is the formation of a data-based decision procedure that can produce a conclusion about some scientific system. Examples A medical researcher may decide on the basis of experimental evidence whether coffee drinking increases the risk of cancer in humans. An engineer might have to decide on the basis of sample data whether there is a difference between the accuracy of two kinds of gauges. A sociologist might wish to collect appropriate data to enable him or her to decide whether a person’s blood type and eye color are independent variables. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 1 IŞIKIE Hypothesis Testing In each of these cases the scientist or engineer postulates or conjectures something about a system. In each case, the conjecture can be put in the form of a statistical hypothesis. Definition 10.1. A statistical hypothesis is an assertion or conjecture concerning one or more populations. We take a random sample from the population of interest and use the data to provide evidence that either supports or does not support the hypothesis. •Evidence from the sample that is inconsistent with the stated hypothesis leads to a rejection of the hypothesis •Evidence supporting the hypothesis leads to its acceptance. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 2 IŞIKIE Hypothesis Testing – The role of probability Probability of a Wrong Conclusion The decision procedure must be done with the awareness of the probability of a wrong conclusion. Example: The conjecture postulated by the engineer: “The fraction of defective p in a certain process is 0.10.” Suppose that 100 items are tested and 12 items are found defective. Is it likely to have 12 defective items in a sample of 100? The acceptance of a hypothesis implies that there is no sufficient evidence to refute it. Rejection implies that the sample evidence refutes it. •Rejection means that there is small probability of obtaining the sample information observed when, in fact, the hypothesis is true. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 3 Hypothesis Testing – The role of probability IŞIKIE The formal statement of a hypothesis is often influenced by the structure of the probability of a wrong conclusion. If the scientist is in strongly supporting a contention, he or she hopes to arrive at a contention in the form of a rejection of a hypothesis. •If the medical researcher wishes to show strong evidence in favor of the contention that coffee drinking increases the risk of cancer, the hypothesis should be of the form “there is no increase in cancer risk produced by drinking coffee.” •To support the claim that one kind of gauge is more accurate that another, the engineer test the hypothesis that there is no difference in the accuracy of the two kinds of gauges. The formal statement of the hypothesis is very important when the data analyst formalizes experimental evidence on the basis of hypothesis testing. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 4 IŞIKIE Hypothesis Testing – Formal Statement The Null and Alternative Hypotheses H0 : the null hypothesis; any hypothesis we wish to test is denoted by H0 H1 : alternative hypothesis; the complement of H0 A null hypothesis concerning a population parameter will always be stated so as to specify exact value of the parameter, where as the alternative hypothesis allows for the possibility of several values. p 0.05 If H0 is p = 0.05 H1 would be one of the following IE256 Engineering Statistics – Summer 2010 p 0.05 p 0.05 Hypothesis Testing 5 Testing a Statistical Hypothesis IŞIKIE Example: A certain type of cold vaccine is known to be only 25% effective after a period of 2 years. To determine if a new vaccine is superior in providing protection against the same virus, suppose that 20 people are chosen at random and inoculated. Ifmore than 8 of these receiving the new vaccine surpass the 2 year period without contracting the virus, the new vaccine will be considered superior to the one presently in use. The requirement that the number exceed represents a modest gain over the 5 people that could be expected to receive protection if the 20 people had been inoculated with the vaccine already in use. The alternative hypothesis is that the new vaccine is in fact superior. This is equivalent to testing th hypothesis that the binomial parameter for the probability of a success on a given trial is p = 0.25 against the alternative that p > 0.25. H0 : p 0.25 H1 : p 0.25 IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 6 Hypothesis Testing – Test Statistic and Critical Region IŞIKIE The Test Statistic is the value on which we base our decision. In the previous example, the test statistic is X, the number of individuals in the test group who receive protections at least 2 years. All possible scores greater than 8 constitute the critical region. The last number that we observe in passing into the critical region is called the critical value. In our illustration the critical value is the number 8. Therefore If x > 8, we reject H0 in favor of the alternative hypothesis H1 If x ≤ 8, we fail to reject H0 0 1 acceptance region critical region Do not reject H0 Reject H0 2 3 4 5 6 7 8 IE256 Engineering Statistics – Summer 2010 9 10 11 12 13 14 15 16 17 18 19 20 Hypothesis Testing 7 IŞIKIE Hypothesis Testing – Type I and II Errors Definition 10.2. Rejection of the null hypothesis when it is true is called a type I error. Definition 10.3. Nonrejection of the null hypothesis when it is falseis called a type II error. In testing any statistical hypothesis, there are four possible situations that determine whether our decision is correct or in error. Do not reject H0 Reject H0 IE256 Engineering Statistics – Summer 2010 H0 is true H0 is false Correct decision Type II error Type I error Correct decision Hypothesis Testing 8 Hypothesis Testing – Type I and II Errors IŞIKIE The Probability of a Type I Error The probability of committing a type I error, also called the level of significance, is denoted by the Greek letter a . In the previous example, a type I error will occur when more than 8 individuals surpass the 2-year period without contracting the virus using the new vaccine and the effective rate is 0.25. a P( type I error) P( X 8 when p 0.25) 20 b(x ; 20 ,0.25 ) x 9 8 1 b( x ; 20 , 0.25 ) x 0 1 b* (8 ; 20 , 0.25 ) 1 0.9591 0.0409 We say that null hypothesis, p = 0.25, is being tested at the a = 0.0409 level of significance. A critical region of size 0.0409 is very small; it would be most unusual for more than 8 individuals to remain immune for a 2-year period with the new vaccine assuming it is equivalent to the one now on the market. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 9 IŞIKIE Hypothesis Testing – Type I and II Errors The Probability of a Type II Error If we test the null hypothesis that p = 0.25 against the alternative hypothesis that p = 0.50, then we are able to compute the probability of not rejecting H0, when it is false. We simply find the probability of obtaining 8 or fewer in the group that surpass the 2-year period when p = 0. 5: P( type II error) P( X 8 when p 0.50) 8 b(x ; 20 ,0.50 ) 0.2517 x 0 Ideally, we like to use a test procedure for which the type I and type II error probabilities are both small. Assume, the only time he wishes to guard against the type II error is when the true value of p is at least 0.7. If p = 0.70, this test procedure gives P( type II error) P( X 8 when p 0.70) 8 b(x ; 20 ,0.70) 0.0051 x 0 IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 10 Hypothesis Testing – Type I and II Errors IŞIKIE The Role of a , , and Sample Size A reduction in is always possible by increasing the size of the critical region. What happens if we set our critical value to 7? a 20 b(x ; 20 ,0.25 ) 1 0.8982 0.1018 x 8 7 b(x ; 20 ,0.50 ) 0.1316 x 0 For a fixed sample size, a decrease in the probability of one error usually result in an increase in the probability of the other error. The probability of committing both type of error can be reduced by increasing the sample size. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 11 IŞIKIE Hypothesis Testing – Type I and II Errors Consider the same problem but this time with a sample of 100 individuals. If more than 36 of the group surpass the 2-year period we reject the null hypothesis that p = 0.25 and accept the alternative hypothesis that p > 0.25 Acceptance region Critical region x x x 36 x 36 Use the normal curve approximation: np (100)(0.25) 25 npq 100(0.25 )(0.75 ) 4.33 36.5 25 4.33 1 P Z 2.66 1 0.9961 0.0039 a P type I error P X 36 when p 0.25 P Z P type II error P X 36 when p 0.50 P Z P Z 2.7 0.0035 36.5 50 5.00 Obviously, the type I and type II errors will rarely occur if the experiment contsists of 100 individuals. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 12 Hypothesis Testing – Type I and II Errors IŞIKIE P X x when p 0.25 x a 0.0039 36.5 25 a P X 36 when p 0.25 P Z 4 . 33 IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 13 IŞIKIE Hypothesis Testing – Type I and II Errors 4.33 5.00 25 50 36.5 a 36.5 25 4.33 36.5 50 5.00 a Ptype I error P Z Ptype II error P Z IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 14 IŞIKIE One and Two Tailed Tests Illustration with a Continuous Random Variable Consider the null hypothesis that average weight of male students in a certain college is 68 kilograms against the alternative hypothesis that it is unequal to 68 kilograms. H0 : 68 H1 : 68 The alternative hypothesis allows for the possibility that < 68 or > 68 A sample mean that falls close to the hypothesized value of 68 would be considered evidence in favor of H0. On the other hand, a sample mean that is considerably less than or more than 68 would be evidence inconsistent with H0 and therefore favoring H1. The sample mean, X, is the text statistic in this case. A critical region for the test statistic might arbitrarily be chosen to be the two intervals x 67 and x 69 . The acceptance region will then be the interval 67 x 69 . acceptance region critical region ... Reject H0 63 64 65 66 67 IE256 Engineering Statistics – Summer 2010 Do not reject H0 68 critical region ... Reject H0 69 70 71 72 73 Hypothesis Testing 15 IŞIKIE Hypothesis Testing Assume that the standard deviation of the population of weights to be = 3.6 . For large samples we may substitute s for if no other estimate of is available. Our decision statistic, based on a random sample of size n = 36, will be X, the most efficient estimator of . From the central limit theorem, we know that the sampling distribution of X is approximately normal with standard deviation X n 3.6 6 0.6 . The probability of committing a type I error, or the level of significance of our test, is equal to the sume of the areas that have been shaded red in each tail of the distribution in the following figure. a /2 a /2 67 68 69 x a P X 67 when 68 P X 69 when 68 IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 16 Hypothesis Testing IŞIKIE a P X 67 when 68 P X 69 when 68 67 68 69 68 P Z P Z 0.6 0.6 PZ 1.67 PZ 1.67 2PZ 1.67 0.0950 Thus 9.5% of all samples of size 36 would lead us to reject = 68kilograms when, in fact, it is true. To reduce a , we have a choice of increasing the sample size or widening the fail-to-reject (acceptance) region. Suppose that we increase the sample size to n = 64. Then X 3.6 8 0.45. Now 67 68 69 68 P Z 0.45 0.45 PZ 2.22 PZ 2.22 2PZ 2.22 0.0264 a P Z The reduction in a is not sufficient by itself to guarantee a good testing procedure. We must also evaluate for various alternative hypothesis. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 17 IŞIKIE Hypothesis Testing Illustration of the probability of committing a type I error for sample sizes n = 36 and n = 64. X 68 X 0.6 a 2 a 0.0950 a 0.0475 2 67 68 0.0475 x 69 X 68 X 0.45 a 0.0264 a 2 a 0.0132 2 67 IE256 Engineering Statistics – Summer 2010 68 69 0.0132 x Hypothesis Testing 18 IŞIKIE Hypothesis Testing If it is important to reject H0 when the true mean is some value ≥ 70 or ≤ 66, then the probability of committing a type II error should be computed and examined for the alternatives = 66 and = 70. Because of symmetry, it is only necessary to consider the probability of not rejecting the null hypothesis that = 68 when the alternative = 70 is true. A type II error will result when the sample mean x̄ falls between 67 and 69 when H1 is true. Therefore (figure 10.6 p33) 69 70 67 70 Z 0 . 45 0 . 45 P 6.67 Z 2.22 PZ 2.22 PZ 6.67 0.0132 0.0000 0.0132 P67 X 69 when 70 P If the true value of is the alternative = 66 , the value of will again be 0.0132. For all possible values of < 66 or > 70 , the value of will be even smaller when n = 64, and consequently there would be little chance of not rejecting H0 when it is false. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 19 IŞIKIE Hypothesis Testing Illustration of the probability of committing a type I error for sample size n = 64 when the alternative hypothesis is = 70 X 70 X 0.45 X 68 X 0.45 67 68 69 70 71 72 x P67 X 69 when 70 0.0132 IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 20 IŞIKIE Hypothesis Testing The probability of committing a type II error increases rapidly when the true value of approaches, but is not equal to, the hypothesized value. Of course, this is usually the situation where we do not mind making a type II error. For example, if the alternative hypothesis = 68.5 is true, we do not mind committing a type II error by concluding that the true answer is = 68. The probability making such an error will be high when n = 64. 69 68.5 67 68.5 Z 0.45 0.45 P 3.33 Z 1.11 PZ 1.11 PZ 3.33 0.8665 0.0004 0.8661 P67 X 69 when 68.5 P IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 21 IŞIKIE Hypothesis Testing Illustration of the probability of committing a type I error for sample size n = 64 when the alternative hypothesis is = 70 X 68 X 0.45 X 68.5 X 0.45 67 68 68.5 69 70 71 x P67 X 69 when 68.5 0.8661 IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 22 Hypothesis Testing IŞIKIE Often different types of tests are compared by contrasting power properties. Consider the previous illustration in which we were testing H0 : = 68 vs H1 : ≠ 68 . As before, suppose we are interested in assessing the sensitivity of the test. The test is governed by the rule that we do not reject H0 if 67 ≤ ≤ 69 . We seek the capability of the test for properly rejecting when indeed = 68.5 . We have seen that the probability of a type II error is given by = 0.8661 . Thus the power of the test is 1 – 0.8661 = 0.1339 . In a sense, the power is a more succinct measure of how sensitive the test is for “detecting differences” between a mean of 68 and 68.5. In this case, if is truly 68.5, the test would not be a good one if it is important that the analyst have a reasonable chance of truly distinguishing between a mean of 68.0 (specified by H0) and a mean of 68.5. From the discussion so far, it should be clear that to produce a desirable power (say, greater than 0.8 or 0.9), one must either increase a or increase the sample size. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 23 Hypothesis Testing – Type I and II Errors IŞIKIE Important Properties of Test of Hypothesis 1. The type I error and the type II are related. An decrease in the probability of one generally results in an increase in the probability of the other. 2. The size of the critical region, and therefore the probability of committing a type I error, can always be reduced by adjusting the critical values 3. An increase in the sample size n will reduce a , simultaneously 4. If the null hypothesis is false, is a maximum when the true value of a parameter approaches the hypothesized value. The greater the distance between the true value and the hypothesized value, the smaller will be. Definition 10.4. The power of a test is the probability of rejecting H0 given that a specific alternative is true. The power of a test can be computed as 1- . IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 24 IŞIKIE One and Two Tailed Tests One-Tailed Tests A test of any statistical hypothesis, where the alternative is one sided, such as H0 : 0 H1 : 0 or perhaps H0 : 0 H1 : 0 is called a one-tailed test. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 25 IŞIKIE Hypothesis Testing Two-Tailed Tests A test of any statistical hypothesis, where the alternative is two sided, such as H0 : 0 H1 : 0 is called a two-tailed test. Example: Classify the following examples covered previously as one- or two-tailed tests: The vaccine effectiveness test The male students’ average weight test IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 26 Hypothesis Testing IŞIKIE Example 10.1. A manufacturer of a certain brand of rice cereal claims that average saturated fat content does not exceed 1.5 grams. State the null and alternative hypotheses to be used in testing this claim and determine where (which tail or tails) the critical region is located. Example 10.2. A real estate agent claims that 60% of all private residences being built today are 3-bedroom homes. To test this claim, a large sample of new residences is inspected; the proportion of these homes with 3 bedrooms is recorded and used as our test statistic. State the null and alternative hypotheses to be used in this test and determine the location of the critical region. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 27 IŞIKIE Hypothesis Testing The Use of P-Values for Decision Making In testing hypothesis in which the test statistic is discrete, the critical region may be chosen arbitrarily and its size (level of significance) determined. If a is too large, it can be reduced by making an adjustment in the critical value. It may be necessary to increase the sample size to offset the decrease that occurs automatically in the power of the test. Over a number of generations of statistical analysis, it had become customary to choose an a of 0.05 or 0.01 and select the critical region accordingly. Then, of course, strict rejection or nonrejection of H0 would depend on that critical region. For example, if the test is two tailed and a is set at 0.05 level of significance and the test statistic involves, say, the standard normal distribution, then a z-value is observed from the data and the critical region is z 1.96 or z 1.96 where the value 1.96 is found as z0.025. A value of z in the critical region prompts the statement, “The value of the test statistic is significant.” We can translate that into the user’s language. For example, if the hypothesis is given by H0 : = 10, H1 : ≠ 10 one might say: “The mean differs significantly from the value 10.” IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 28 IŞIKIE Hypothesis Testing Preselection of a Significance Level The preselection of a significance level a has its roots in the philosophy that the maximum risk of making a type I error should be controlled. However, this approach does not account for values of test statistics that are “close” to the critical region. For example, in illustration with H0 : = 10 vs H1 : ≠ 10 , suppose that a value of z = 1.87 is observed; strictly speaking, with a = 0.05 the value is not significant. But the risk of committing a type I error if one rejects H0 in this case could hardly be considered severe. In fact, in a two-tailed scenario one can quantify this risk as Pvalue 2P( Z 1.87 when 10) 2(0.0307) 0.0614 As a result, 0.0614 is the probability of obtaining a value of z as large or larger (in magnitude) than 1.87 when in fact = 10 . Although this evidence against H0 is not as strong as that which result from a rejection at an a = 0.05 level, it is important information to the user. Indeed, continued use of a = 0.05 or a = 0.01 is only a result of what standards have been passed through the generations. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 29 IŞIKIE Hypothesis Testing The P-value approach has been adopted extensively by users in applied statistics. The approach is designed to give the user an alternative in terms of probability to a mere “reject” or “do not reject” conclusion. The P-value computation also gives the user important information when the z-value falls well into the ordinary critical region. For example, if z is 2.73, it is informative for the user to observe that Pvalue 2(0.0032) 0.0064 and thus the z-value is significant at a level considerably less than 0.05. A P-value is the lowest level of significance at which the observed value of the test statistic is significant. The calculation of the P-value will be illustrated through the following two examples, 10.1 on saturated fat content of cereal boxes and 10.2 on proportion of 3bedroom houses. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 30 IŞIKIE Hypothesis Testing Shape of the Critical Region as a Function of a Let us reconsider Example 10.1 (p333) presented above. Assume that the standard deviation of the saturated fat content of all boxes of rice cereal manufactured by this manufacturer is 0.3 grams and the sample size is 40. We are testing H0 : = 1.5, H1 : > 1.5 , hence the critical region has the can be expressed as { x | x > a } where a is the critical value. The critical value, a, hence the location of the critical region determines the level of significance, a , namely the probability of a type I error. Conversely, when we are given a , we can find the critical value a that yields the given level of significance. That is exactly what we shall do now: a 1.5 P Z 0.3 40 P X a when 1.5 a Since PZ za a a a 1.5 za 0.3 40 a 1.5 za 0.3 40 a 1.5 za IE256 Engineering Statistics – Summer 2010 0.3 40 Hypothesis Testing 31 IŞIKIE Hypothesis Testing Therefore for different values of a we get different critical regions: a 10% za 1.28 a 1.561 a 10% 1.5 1.561 a 5% za 1.64 a 1.578 a 5% 1.5 1.578 a 1% za 2.33 a 1.610 a 1% 1.5 IE256 Engineering Statistics – Summer 2010 1.610 Hypothesis Testing 32 IŞIKIE Hypothesis Testing Notice how the critical region and the boundary of the critical region, namely the critical value a, moves as the significance level a changes. In order to calculate the P-value, we need to find the significance level a such that the observed test statistic is right at the boundary of the critical region. Namely, the P-value is the significance level such that the observed statistic equals the observed test statistic. P-value is the a that satisfies this equation: x 1.5 za 0.3 40 Namely we need to look up the following za to find a : za x 1.5 0.3 40 Example 10.1 p333 (cont) Assume that the standard deviation of the saturated fat content of all boxes of rice cereal manufactured by this manufacturer is 0.3 grams. If the sample average saturated fat content of 40 boxes of this brand of cereal is 1.545 grams, compute the associated P-value. 1.545 1.5 x 1.545 za 0.95 a 17.1% 0.3 40 With such a high probability of committing a type I error, we shall not reject the null hypothesis. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 33 IŞIKIE Hypothesis Testing The calculation of the P-value for a two-sided (two-tailed) test will be demonstrated using Example 10.2 (p333) with n = 100 as follows: a 10% za / 2 1.64 a 0.681 b 0.519 a / 2 5% 0.519 0.60 0.681 a 5% za / 2 1.96 a 0.696 b 0.504 a / 2 2.5% 0.504 0.60 0.696 a 1% za / 2 2.58 a 0.726 b 0.474 0.474 IE256 Engineering Statistics – Summer 2010 a / 2 0.5% 0.60 0.726 Hypothesis Testing 34 IŞIKIE Hypothesis Testing The critical values, a and b, are calculated as follows: a 0.60 za /2 0.60 0.40 100 b 0.60 za /2 0.60 0.40 100 We need to select the significance level, a , so that either a or b is equal the observed statistic value. Use the appropriate one of the following equations to solve for a/2 and obtain P-value = a If p̂ > 0.60 then solve za /2 pˆ 0.60 0.60 0.40 / 100 If p̂ < 0.60 then solve za /2 0.60 pˆ 0.60 0.40 / 100 IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 35 IŞIKIE Hypothesis Testing Example 10.2 p333 (cont) Calculate the corresponding P-value if the real estate agent found 70 homes to have 3-bedrooms out of 100 inspected. Find the probability of committing a type II error if the null hypothesis that p = 0.60 is not rejected when the alternative hypothesis p = 0.75 is true. a 0.70 0.60 70 za 2 2.04 0.0206 0.70 0.60 0.40 100 2 100 P-value = 2·0.0206 = 4.12% . With a low probability of committing a type I error we reject the null hypothesis that p = 0.60. Since we are rejecting the null hypothesis a type II error is not applicable. But if we were to calculate the probability of a type II error for the stated specific alternative when for the critical region that corresponds to the P-value (i.e. a = 4.12%) here is how: pˆ The critical region when a = 4.12% is { p̂ | p̂ > 0.70 or p̂ < 0.50} P0.50 pˆ 0.70 when p 0.75 P 0.50 0.75 Z 0.75 0.25 100 IE256 Engineering Statistics – Summer 2010 0.70 0.75 0.75 0.25 100 12.41% Hypothesis Testing 36 IŞIKIE Hypothesis Testing – Summary 3 Hypothesis Testing with Fixed Probability of Type I Error 1. State the null and alternative hypothesis. 2. Choose a fixed significance level a . 3. Choose an appropriate test statistic and establish the critical region based on a . 4. From the computed test statistic, reject H0 if the test statistic is in the critical region. Otherwise, do not reject. 5. Draw scientific or engineering conclusions. Significance Testing (P-value Approach) 1. State the null and alternative hypothesis. 2. Choose an appropriate test statistic. 3. Compute P-value based on computed value of test statistic. 4. Use judgement based on P-value and knowledge of scientific system. Homework #8 IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 37 IŞIKIE Hypothesis Testing Tests on a Single Mean, Variance Known So far we have been using the statistic X̄ in testing. We would like to standardize our testing procedure from now on and use the statistic Z instead: Z X / n Hence, for the following two-sided test H0 : 0 H1 : 0 we will use the critical region given by z za /2 or z za /2 x 0 where z and za/2 is as defined before. / n With this definition of the critical region, the probability of committing a type I error is a . IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 38 Hypothesis Testing IŞIKIE Similarly for the following one-sided test H0 : 0 H1 : 0 the critical region is given by z za And finally H0 : 0 H1 : 0 the critical region is given by z za IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 39 IŞIKIE Hypothesis Testing Example 10.3 p340 A random sample of 100 recorded deaths in the United States during the past year showed an average life span of 71.8 years. Assuming a population standard deviation of 8.9 years, does this seem to indicate that the mean life span today is greater than 70 years? Use a 0.05 level of significance. The hypothesis: Critical region: Computations: H0 : 70 years H1 : 70 years z z0.05 x 0 z 1.645 where z n 71.8 70 z 2.02 8.9 / 100 Decision: Reject H0 and conclude that the mean life span today is greater than 70 years. The P-value is P value PZ 2.02 0.0217 IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 40 IŞIKIE Hypothesis Testing Example 10.4 p340 A manufacturer of sports equipment has developed a new synthetic fishing line that he claims has a mean breaking strength of 8 kilograms with a standard deviation of 0.5 kilogram. Test the hypothesis that = 8kilograms against the alternative hypothesis that ≠ 8kilograms if a random sample of 50 lines is tested and found to have a mean breaking strength of 7.8 kilograms. Use a 0.01 level of significance. The hypothesis: H0 : 8 kilograms H1 : 8 kilograms Critical region: z > z0.005 or z < –z0.005 z > 2.575 or z < –2.575 Computations: z x 0 where z / n 7.8 8 2.83 0.5 50 Decision: Reject H0 and conclude that the average breaking strength is not equal to 8 but is, less than 8 kilograms. The P-value is Pvalue 2 PZ 2.83 2 0.0023 0.0046 IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 41 Hypothesis Testing IŞIKIE Relationship to Confidence Interval Estimation The hypothesis testing approach to statistical inference we have covered so far is very closely related to the confidence interval approach we have seen in the previous chapter. Confidence interval estimation involves computation of bounds for which it is “reasonable” that the parameter in question is inside the bounds. On the otherhand, hypothesis testing involves checking to see if the test statistic is in a “reasonable” region if the null hypothesis is in fact true. Let us demonstrate this equivalence in the case of a single population mean with 2 known. The structure of both hypothesis testing and confidence interval estimation is based on the random variable X Z / n It turns out that the testing of H0 : = 0 against H1 : ≠ 0 at a significance level a is equivalent to computing a 100(1 – a )% confidence interval on and rejecting H0 if 0 is outside the confidence interval. If 0 is inside the interval the hypothesis is not rejected. The equivalence is very intuitive and quite simple to illustrate. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 42 IŞIKIE Hypothesis Testing If H0 is not rejected then –za/2 < Z < za/2 za 2 x 0 za 2 / n which is equivalent to x za /2 n 0 x za /2 n The confidence interval equivalence to hypothesis testing extends to difference between two means, variances, ratios of variance, and so on. We should not consider confidence interval estimation and hypothesis testing as separate forms of statistical inference. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 43 IŞIKIE Hypothesis Testing Tests on a Single Mean, Variance Unknown Under the assumptions we have made regarding the confidence interval for the population mean when the variance is not known, the following T statistic has a Student t-distribution: X T S/ n In this situation, the following critical regions will be used with the corresponding tests: H0 : 0 H1 : 0 H0 : 0 H1 : 0 H0 : 0 H1 : 0 t ta /2, n1 or t ta /2, n1 t ta ,n1 t ta ,n1 x 0 where t and ta,n-1 is as defined before. s/ n IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 44 Hypothesis Testing IŞIKIE Example 10.5 p343 The Edison Electric Institute has published figures on the annual number of kilowatt hours expended by various home appliances. It is claimed that a vacuum cleaner expends an average of 46 kilowatt hours per year. If a random sample of 12 homes included in a planned study indicates that vacuum cleaners expend an average of 42 kilowatt hours per year with a standard deviation of 11.9 kilowatt hours, does this suggest at the 0.05 level of significance that vacuum cleaners expend, on the average, less than 46 kilowatt hours annually? Assume the population of kilowatt hours to be normal. The hypothesis: Critical region: H0 : 46 kilowatt hours t t0.05 x 0 H1 : 46 kilowatt hours t 1.7960 where t s n 42 46 t 1.16 11 Computations: 11.9 12 Decision: Do not reject H0 and conclude that the average number of kilowatt hours expended annually by home vacuum cleaners is not significantly less than 46. The P-value is P value PT 1.16 0.1353 IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 45 IŞIKIE Hypothesis Testing Tests on Two Means, Variances Known The test statistic is Z ( X 1 X 2 ) ( 1 2 ) 12 / n1 22 / n2 The following critical regions will be used with the corresponding tests: H0 : 1 2 d0 H 1 : 1 2 d0 z za /2 H0 : 1 2 d0 H 1 : 1 2 d0 z za H0 : 1 2 d0 H 1 : 1 2 d0 z za where z ( x 1 x2 ) d0 12 / n1 22 / n2 or z za /2 and za is as defined before. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 46 IŞIKIE Hypothesis Testing Tests on Two Means, Variances Equal But Unknown The test statistic is ( X X 2 ) ( 1 2 ) T 1 Sp 1 / n1 1 / n2 and Sp S12 (n1 1) S22 (n2 1) n1 n2 2 The following critical regions will be used with the corresponding tests: H0 : 1 2 d0 H 1 : 1 2 d0 t ta / 2,n1 n2 2 H0 : 1 2 d0 H 1 : 1 2 d0 t ta ,n1 n2 2 H0 : 1 2 d0 H 1 : 1 2 d0 t ta ,n1 n2 2 ( x 1 x 2 ) d0 t , sp where s p 1 / n1 1 / n2 or t ta / 2,n1 n2 2 s12 (n1 1) s22 (n2 1) n1 n2 2 and ta ,n1+n2-2 is as defined before. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 47 IŞIKIE Hypothesis Testing Example 10.6 p346 An experiment was performed to compare the abrasive wear of two different laminated materials. Twelve pieces of material 1 were tested by exposing each piece to a machine measuring wear. Ten pieces of material 2 were similarly tested. In each case, the depth of wear was observed. The samples of material 1 gave an average (coded) wear of 85 units with a sample standard deviation of 4, while the samples of material 2 gave an average of 81 and a sample standard deviation of 5. Can we conclude at the 0.05 level of significance that the abrasive wear of material 1 exceeds that of material 2 by more than 2 units? Assume the populations to be approximately normal with equal variances. H0 : 1 2 2 H1 : 1 2 2 t Critical region: t > 1.725 sp 42 (11) 52 (9) 4.478 12 10 2 ( x1 x2 ) d0 (85 81) 2 1.04 s p 1 n1 1 n2 4.478 1 12 1 10 20 Do not reject H0; we are unable to conclude that the abrasive wear of material 1 exceeds that of material 2 by more than 2 units. The P-value is Pvalue PT 1.04 0.1553 IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 48 IŞIKIE Hypothesis Testing Tests on Two Means, Unknown and Unequal Variances The test statistic is T ( X 1 X 2 ) ( 1 2 ) s12 / n1 s22 / n2 and s12 n1 2 s12 n1 s22 (n1 1) n2 s22 2 n2 2 (n2 1) The following critical regions will be used with the corresponding tests: H0 : 1 2 d0 H 1 : 1 2 d0 t ta / 2, or t ta / 2, H0 : 1 2 d0 H 1 : 1 2 d0 t ta , H0 : 1 2 d0 H 1 : 1 2 d0 t ta , where t ( x 1 x 2 ) d0 s12 / n1 s22 / n2 and ta ,n is as defined before. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 49 IŞIKIE Hypothesis Testing Tests on Two Means, Paired Observations The test statistic is D D T SD / n The following critical regions will be used with the corresponding tests: H0 : D d0 H 1 : D d0 t ta / 2, n1 or t ta / 2, n1 H0 : D d0 H 1 : D d0 t ta , n1 H0 : D d0 H 1 : D d0 t ta , n1 where t d d0 and ta ,n-1 is as defined before. sD / n IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 50 Hypothesis Testing IŞIKIE Example 10.7 p346 Androgen (ng/ml) In a study conducted in the forestry and Time of 30 Minutes after Deer di Injection wildlife department at Virginia Polytechnic Injection 1 2.76 7.02 4.26 Institute and State University, J. A. Watson 2 5.18 3.10 -2.08 examined the influence of the drug 3 2.68 5.44 -2.76 succinylcholine on the circulation levels of 4 3.05 3.99 0.94 5 4.10 5.21 1.11 androgens in the blood. Blood samples from 6 7.05 10.26 3.21 the wild, free-ranging deer were obtained 7 6.60 13.91 7.31 via the jugular vein immediately after an 8 4.79 18.53 13.74 9 7.39 7.91 0.52 intramuscular injection of succinylcholine 10 7.30 4.85 -2.45 using darts and a capture gun. Deer were 11 11.78 11.10 -0.68 bled again approximately 30 minutes after 12 3.90 3.74 -0.16 13 26.00 94.03 68.03 the injection and then released. The levels 14 67.48 94.03 26.55 of androgens at time of capture and 15 17.04 41.70 24.66 30 minutes later, measured in nanograms per milliliter (ng/ml), for 15 deer are given on the right. Assuming that the two populations are normally distributed, test at the 0.05 level of significance whether the androgen concentrations are altered after 30 minutes of restraint. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 51 IŞIKIE Hypothesis Testing Let 1 and 2 be the average androgen concentration at the time of injection and 30 minutes later, respectively. We proceed as follows: H0 : 1 2 or D 1 2 0 H1 : 1 2 or D 1 2 0 Critical region: t 2.145 or t 2.145 Computations: d 9.848 and sd 18.474 9.848 0 t 2.06 18.474 / 15 t d d0 sD / n 14 Though the t-statistic is not significant at the 0.05 level, the P-value is P value 2 PT 2.06 2 0.0292 0.0584 As a result, there is some evidence that there is difference in mean circulating levels of androgen. Summary of Test Procedures see page 351 of the textbook for a summary of the hypothesis testing procedures IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 52 IŞIKIE Hypothesis Testing Choice of Sample Size for Testing Means: A Single Mean Suppose that we wish to test the hypothesis H0 : 0 H1 : 0 with a significance level a when the variance 2 is known and the specific alternative is = 0 + d , is calculated as follows (see the figure on the next page) P X a when 0 d X ( 0 d ) a ( 0 d ) P when 0 d / n / n a 0 d d P Z P Z z a / n / n / n where a 0 za n . From which we conclude that z za IE256 Engineering Statistics – Summer 2010 d / n Hypothesis Testing 53 IŞIKIE Hypothesis Testing H0 true H1 true a 0 a 0d P X a when 0 a a 0 za n P X a when 0 d d P Z za n z za IE256 Engineering Statistics – Summer 2010 d / n Hypothesis Testing 54 IŞIKIE Hypothesis Testing We obtain the sample size as n ( za z ) 2 2 d2 In the case of a two-sided test we obtain the power 1 – for a specified alternative by = 0 + d when ( za /2 z ) 2 2 n 2 d Example 10.8 p352 Suppose we wish to test the hypothesis H0 : = 68kg; H1 : > 68kg for the weights of male students at a certain college using an a = 0.05 level of significance when it is known that = 5kg . Find the sample size required if the power of our test is to be 0.95 when the true mean is 69 kilograms. Since a = = 0.05 we have za = z = 1.645 . For the alternative = 69, we have d = 1 and then (1.645 1.645) 2 25 n 270.6 271 1 In plain English, samples of size 271 are required if the test is to reject the null hypothesis 95% of the time when, in fact, is as large as 69 kg. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 55 IŞIKIE Hypothesis Testing Choice of Sample Size for Testing Means: Two Means A similar procedure can be used to determine the sample size n = n1 = n2 required for a specific power of the test in which two population means are being compared. Suppose we wish to test the hypothesis H0 : 1 2 d0 H 1 : 1 2 d0 with a significance level a when the variances 1 and 2 are known and the specific alternative is 1 – 2 = d0 + d , the power of our test is 1 P X 1 X 2 d0 a when 1 2 d0 d where a za /2 ( 12 22 ) / n P a X 1 X 2 d0 a when 1 2 d0 d P a d ( 12 22 ) n X 1 X 2 (d0 d ) ( 12 22 ) n For the specific alternative 1 – 2 = d0 + d the statistic has the standard normal distribution. IE256 Engineering Statistics – Summer 2010 ( 12 22 ) n a d X 1 X 2 (d0 d ) ( 12 22 ) / n Hypothesis Testing 56 IŞIKIE Hypothesis Testing H0 true a /2 d0a H1 true a /2 d0 d0a d0d a za /2 ( 12 22 ) / n P a X 1 X 2 d0 a when 1 2 d0 d P X 1 X 2 d0 a when 1 2 d0 d IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 57 IŞIKIE Hypothesis Testing Solving for za/2 we get a za /2 ( 12 22 ) / n Then za /2 a ( 12 22 ) / n d d P za 2 Z za 2 ( 12 22 ) n ( 12 22 ) n d d P Z za 2 P Z za 2 ( 12 22 ) n ( 12 22 ) n Approximating the very small quantity P Z za /2 d / ( 12 22 ) / n as zero, we obtain d P Z za /2 ( 12 22 ) / n From which we conclude that z za /2 d ( 12 22 ) /n IE256 Engineering Statistics – Summer 2010 n ( za /2 z ) 2 ( 12 22 ) d2 Hypothesis Testing 58 IŞIKIE Hypothesis Testing Tests on a Single Proportion There are three possible tests on a single proportion: H0 : p p0 H1 : p p0 H0 : p p0 H1 : p p0 H0 : p p0 H1 : p p0 We can use the binomial test statistic X or we could just as well use P̂ = X / n . Because X is a discrete binomial variable, it is unlikely that a critical region can be established whose size is exactly equal to a prespecified value of a . For this reason it is preferable, in dealing with small samples, to base our decisions on P-values. Here is how we can calculate the P-values depending on the alternative hypothesis: H0 : p p0 H1 : p p0 P-value 2P X x when p p0 P-value 2P X x when p p0 H0 : p p0 H1 : p p0 P-value P X x when p p0 H0 : p p0 H1 : p p0 P-value P X x when p p0 IE256 Engineering Statistics – Summer 2010 if x np0 if x np0 Hypothesis Testing 59 IŞIKIE Hypothesis Testing In the preceeding formulas, x is the number of successes in our sample of size n. If the P-value is less than or equal to a , our test is significant at the a level and we reject H0 in favor of H1. Example 10.10 p363 A builder claims that heat pumps are installed in 70% of all homes being constructed today in the city of Richmont, Virginia. Would you aggree with this claim if a random survey of new homes in this city shows that 8 out of 15 had heat pumps installed? Use a 0.10 level of significance. The hypothesis: H0 : p 70% H1 : p 70% Test statistic: Binomial variable X with p = 0.7 and n = 15 Since x = 8 is less than np0 = (15)(0.7) = 10.5 the P-value is calculated as follows: 8 P value 2 P X 8 when p 0.7 2 b( x;15,0.7) 0.2622 0.10 x 0 Decision: Do not reject H0. Conclude that there is insufficient reason to doubt the builder’s claim. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 60 IŞIKIE Hypothesis Testing For small sample sizes (e.g. n < 20) we would use the binomial distribution to calculate the P-values. For bigger samples we can approximate the P-value via the normal approximation: x np0 z np0q0 or z pˆ p0 p0q0 n Hence the critical regions become: H0 : p p0 H1 : p p0 z za /2 H0 : p p0 H1 : p p0 z za H0 : p p0 H1 : p p0 z za IE256 Engineering Statistics – Summer 2010 or z za /2 Hypothesis Testing 61 IŞIKIE Hypothesis Testing Example 10.11 p363 A commonly prescribed drug for relieving nervous tension is believed to be only 60% effective. Experimental results with a new drug administered to a random sample of 100 adults who were suffering from nervous tension show that 70 received relief. Is this sufficient evidence to conclude that the new drug is superior to the one commonly described? Use a 0.05 level of significance. The hypothesis: H0 : p 60% H1 : p 60% Critical region: z z0.05 z 1.645 where z pˆ p0 p0q0 / n Computations: pˆ x 70 0.7 n 100 z 0.7 0.6 2.04 (0.6)(0.4) / 100 Pvalue PZ 2.04 0.0207 Decision: Reject H0 and conclude that the new drug is superior. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 62 IŞIKIE Hypothesis Testing Tests on Two Proportions Although we could conceive of more types of hypotheses, we will focus on the following three: H0 : p1 p2 H1 : p1 p2 H0 : p1 p2 H1 : p1 p2 H0 : p1 p2 H1 : p1 p2 We use the random variable P̂1 – P̂2 as our test statistic. As discussed before, if the sizes of the two samples from which P̂1 and P̂2 are computed are sufficiently large, then the difference of these two point estimators is approximately normally distributed with mean p1 – p2 and variance p1q1/n1 + p2q2/n2 . Therefore, our critical region can be established by using the standard normal variable (Pˆ1 Pˆ2 ) ( p1 p2 ) Z p1q1 / n1 p2 q2 / n2 When H0 is true, we can substitute p1 = p2 = p and q1 = q2 = q (where p and q are the common values) Pˆ1 Pˆ2 Z pq(1 / n1 1 / n2 ) IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 63 IŞIKIE Hypothesis Testing However, we must estimate the parameters p and q that appear in the above statistic. When the null hypothesis is assumed true, we basically assume that both samples come from populations with equal proportion values. Therefore we can pool the data from both samples to estimate the unknown parameter: pˆ 1 pˆ 2 z pˆ qˆ (1 / n1 1 / n2 ) where pˆ 1 x1 n1 x pˆ 2 2 n2 As before, the critical regions are: H0 : p1 p2 z za /2 H1 : p1 p2 H0 : H1 : p1 p2 p1 p2 z za H0 : H1 : p1 p2 p1 p2 z za IE256 Engineering Statistics – Summer 2010 x x pooled estimate of pˆ 1 2 the proportion p n1 n2 or z za /2 Hypothesis Testing 64 Hypothesis Testing IŞIKIE Example 10.12 p365 A vote is to be taken among the residents of a town and the surrounding county to determine whether a proposed chemical plant should be constructed. The construction site is within the town limits, and for this reason many voters in the county feel that the proposal will pass because the large proportion of town voters who favor the construction. To determine if there is a significant difference in the proportion of town voters and county voters favoring the proposal, a poll is taken. If 120 of 200 town voters favor the proposal and 240 of 500 county residents favor it, would you agree that the proportion of town voters favoring the proposal is higher than the proportion of county voters? Use an a = 0.05 level of significance. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 65 IŞIKIE Hypothesis Testing Let p1 and p2 be the true proportion of voters in the town and county, respectively, favoring the proposal. The hypothesis: H0 : p1 p2 H1 : p1 p2 Critical region: z z0.05 z 1.645 Computations: pˆ 1 z 120 0.60 200 pˆ 2 240 0.48 500 pˆ 120 240 0.51 200 500 0.60 0.48 2.90 (0.51)(0.49)(1 / 200 1 / 500) P value PZ 2.90 0.0019 Decision: Reject H0 and agree that the proportion of town voters favoring the proposal is higher than the proportion of county voters. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 66 IŞIKIE Hypothesis Testing Tests on a Variance When testing a population variance we need to consider the following three hypothesis: H0 : 2 02 H0 : 2 02 H0 : 2 02 H1 : 2 02 H1 : 2 02 H1 : 2 02 We base our rejection/nonrejection decision on the sample variance information obtained from the data. Assuming the population being sampled has normal distribution, the chi-squared value for testing 2 02 is given by 2 (n 1) s 2 / 02 where s2 is the sample variance, and 02 is the value of 2 given by the null hypothesis. H0 : 2 02 2 2 2 2 or a /2 1a /2 H1 : 2 02 H0 : 2 02 H1 : 2 02 2 a2 H0 : 2 02 H1 : 2 02 2 12a IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 67 IŞIKIE Hypothesis Testing Example 10.13 p367 A manufacturer of car batteries claims that the life of his batteries is approximately normally distributed with a standard deviation equal to 0.9 year. If a random sample of 10 of these batteries has a standard deviation of 1.2 years, do you think that > 0.9 year? Use a 0.05 level of significance. The hypothesis: H0 : 2 0.81 Critical region: 2 02.05 H1 : 0.81 2 Computations: 2 16.919 2 (10 1)(1.2) 2 P value P (0.9) 2 where 2 (n 1) s 2 02 16.0 2 16.0 0.07 Decision: The 2-statistic is not significant at the 0.05 level. However, based on the P-value 0.07, there is evidence that > 0.9 year. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 68 IŞIKIE Hypothesis Testing Testing the Equality of Two Variances When testing the equality of two variances the usual alternatives are: H0 : 12 22 H1 : 12 22 H0 : 12 22 H1 : 12 22 H0 : 12 22 H1 : 12 22 For independent samples of size n1 and n2, respectively, we use the f-value f s12 / s22 for testing 12 22 where s12 and s22 are the variances computed from the two samples. If the two populations are approximately normally distributed and the null hypothesis is true, the ratio f s12 / s22 is a value of the F-distribution with n1 = n1 – 1 and n2 = n2 – 1 degrees of freedom. Therefore the critical regions are: H0 : 12 22 H1 : 12 22 f fa /2( 1 , 2 ) H0 : 12 22 H1 : 12 22 f fa ( 1 , 2 ) H0 : 12 22 H1 : 12 22 f f1a ( 1 , 2 ) IE256 Engineering Statistics – Summer 2010 or f f1a /2( 1 , 2 ) Hypothesis Testing 69 IŞIKIE Hypothesis Testing Example 10.14 p369 In testing for the difference in abrasive wear of the two materials in Example 10.6, we assumed that the two unknown population variances are equal. Were we justified in making this assumption? Use a 0.10 level of significance. The hypothesis: H0 : 12 22 Critical region: f f0.05 (11,9) or f 3.11 H1 : 12 22 or f f0.95 (11,9) f 0.34 where f s 12 / s22 Computations: f 16 0.64 25 Decision: Do not reject H0. Conclude that there is insufficient evidence that the variances differ (i.e. are no equal). IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 70 Review Questions IŞIKIE Exercise 10.14. A manufacturer has developed a new fishing line, which he claims has a mean breaking strength of 15 kilograms with a standard deviation of 0.5 kilograms. To test the hypothesis that mu=15 kilograms against the alternative that mu<15 kilograms, a random sample of 50 lines were tested. The critical region was defined to be x_bar<14.9. (a) Find the probability of type I error (b) Evaluate type II error for the alternatives mu=14.8 and mu=14.9 kilograms IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 71 Review Questions IŞIKIE 10.36. A large automobile manufacturing company is trying to decide whether to purchase brand A or brand B tires for its new models. To help arrive at a decision, an experiment is conducted using 12 of each brand. The tires are run until they wear out. The results are as shown on the table. Test the hypothesis that there is no difference in the average wear of 2 brands of tires. Assume the populations are approximately normally distributed with equal variances. Use a P-value. IE256 Engineering Statistics – Summer 2010 Hypothesis Testing 72