Hypothesis testing
In research we want to get answers to posed questions (hypotheses).
• Are all coffee flavors equally popular?
• Is the use of bike helmets effective in protecting people in bicycle accidents from head injuries?
• Is there a connection between gender and alcohol consumption among the students at Umeå University?

Hypothetico-deductive method
Hypothesis, Statement, Observation
1. Deduction (logically valid argument, predictive inference): from the hypothesis to a statement. Tries to predict what will happen if the hypothesis holds.
2. "Dialogue with reality": from the statement to an observation.
3. Induction (inductive inference).

Logically valid argument (example)
Valid:
Hypothesis: It is raining.
Statement: If it is raining, the ground will be wet.
Observation: The ground is not wet.
Conclusion: It is not raining.
Invalid:
Hypothesis: It is raining.
Statement: If it is raining, the ground will be wet.
Observation: The ground is wet.
Conclusion: It is raining. This conclusion is not valid: the ground can be wet for several reasons.

Contradiction proofs
Within statistical hypothesis testing (inference theory) we are not looking for "impossible" events in order to reject posed hypotheses (e.g. it is impossible that the ground is dry if it rains; if the ground is dry, the hypothesis "it rains" is rejected). Instead we are looking for contradictions in terms of "improbable" events.

Improbable event
Assume that we suspect that the use of bicycle helmets is an effective way to protect people in bicycle accidents from skull damage.
Null hypothesis: The percentage of persons with skull damage after a bicycle accident is the same whether or not they use bicycle helmets.
Statement: If the percentage of persons with skull damage after a bicycle accident is the same whether or not they use bicycle helmets, then in a sample survey there should be only a small difference between the two groups in the percentage of people with skull damage.
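The statement can be illustrated with a small simulation sketch. All numbers here are assumptions made for illustration and do not come from any real study: two groups of 200 people each and a common skull-damage rate of 20%. The hypothetical helper `simulate_differences` draws many sample surveys in which the null hypothesis is true and records the difference in proportions between the two groups.

```python
import random


def simulate_differences(n_per_group=200, rate=0.20, n_surveys=2000, seed=1):
    """Simulate sample surveys in which both groups share the same rate.

    All defaults are illustrative assumptions, not data from a real survey.
    Returns the absolute difference in sample proportions for each survey.
    """
    rng = random.Random(seed)
    differences = []
    for _ in range(n_surveys):
        # Count injured persons in each group; the true rate is identical.
        with_helmet = sum(rng.random() < rate for _ in range(n_per_group))
        without_helmet = sum(rng.random() < rate for _ in range(n_per_group))
        differences.append(abs(with_helmet - without_helmet) / n_per_group)
    return differences


def share_at_least(differences, threshold):
    """Fraction of simulated surveys with a difference of at least `threshold`."""
    return sum(d >= threshold for d in differences) / len(differences)


differences = simulate_differences()
# Compare how often a small difference (2 percentage points or more) and a
# large difference (10 percentage points or more) occur under the null.
print("share with difference >= 0.02:", share_at_least(differences, 0.02))
print("share with difference >= 0.10:", share_at_least(differences, 0.10))
```

Under these assumed numbers, small differences between the groups should show up in a large share of the simulated surveys, while differences of 10 percentage points or more should be rare.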
If the hypothesis holds, it is improbable to observe a large percentage difference between these two groups in a sample survey.

Improbable event
Assume that we suspect that there is a difference between male and female students at Umeå University concerning their opinion about the EMU.
Null hypothesis: The percentage of students who are against the EMU is the same for males and females.
Statement: If the percentage of students who are against the EMU is the same for males and females, then in a sample survey there should be only a small difference between the two groups in the percentage of students against the EMU.
If the hypothesis holds, it is improbable to observe a large percentage difference between these groups in a sample survey.

Test statistic
Within statistical inference theory the statements are summarized in a test statistic. The value of the test statistic is calculated from a sample and varies between different samples. From our hypothesis and from probability theory we can predict which values of the test statistic are likely if the null hypothesis is true. Next, we draw a sample and calculate the value of the test statistic. If we get an improbable value, the null hypothesis is rejected.
Note: Different types of test statistics are used for different types of tests. (The computer program SPSS keeps track of that?)

P-value
• Assuming that the null hypothesis is true, the p-value is the probability of obtaining a sample result that is at least as extreme (at least as unlikely) as what was observed.
• If the p-value is small, either we have observed something improbable or the null hypothesis does not hold.
• If the p-value is small (< 0.05 or < 0.01), the null hypothesis should be rejected.

Mosquito cream example:
We have tested two anti-mosquito creams on 10 persons. Each person got cream A on a randomly chosen arm and cream B on the other arm. The persons were then forced to walk in the Amazon jungle.
The number of mosquito bites was counted on each arm. Suppose 7 out of the 10 persons had fewer mosquito bites on the arm with cream A. Is this enough evidence to say that there is a difference in effectiveness between the creams? Help me with the null hypothesis.

Example:
• Null hypothesis: The anti-mosquito creams A and B are equally effective.
• Alternative hypothesis: The anti-mosquito creams are not equally effective.
• Statement: If the null hypothesis holds, we expect about half of the people in our sample to get more mosquito bites with cream A.
• Calculations show that if the null hypothesis is true, the number of people in our sample who get more mosquito bites on the arm with cream A is binomially distributed (n = 10, p = 0.5).

If the null hypothesis is true, is 7 out of 10 an improbable event?
The probability of getting 7 or more, or 3 or fewer, is about 34% (2 × 176/1024 ≈ 0.34).

Conclusion
• The p-value is 34%. This means that it is not uncommon to get the data we got in our sample, or anything more extreme, if the null hypothesis is true.
• We cannot reject the null hypothesis.

Reasons for non-significant results
• There is no difference.
• There is a difference, but we have too few observations to detect it.
• Important: The fact that we cannot reject the null hypothesis does not mean that the null hypothesis is true.

Steps of hypothesis testing
• Develop a null hypothesis.
• Develop an alternative hypothesis (what we want to know).
• Specify the level of significance α: 0.05, 0.01, 0.001 (How certain do we want to be?).
• Select the test statistic that will be used to test the hypothesis.
• Perform the test and calculate the p-value.
• Draw a conclusion by comparing the level of significance (α) and the p-value.
- Reject the null hypothesis (p-value < α)
- Do not reject the null hypothesis (p-value ≥ α)

Choose the right test
• Hand out the summary picture.
• Different tests use different assumptions.
• Generally, the fewer assumptions a test uses, the less power the test has.
The power is the ability to reject a false null hypothesis.
• Many tests require that the sample, or some transformation of the sample, is normally distributed.

Parametric/non-parametric tests
• Parametric tests:
– if data are normally distributed
– describe your data with mean and SD
• Non-parametric tests:
– primarily if data are not normally distributed
– can also be used if data are normally distributed, but are then less powerful
– less sensitive to outliers
– describe your data with median and percentiles

Normal probability distribution
• How do I know if my variable is normally distributed?
– continuous variable, no cut-off point
– draw a histogram or a normal probability plot
– symmetric, bell-shaped, mean = median
– Unsure? Use non-parametric tests if available.

Comparing means: examples
A. Comparing means from 2 samples (using a t-test).
B. Comparing means from several samples (using ANOVA).
C. Comparing means from several samples (using blocked ANOVA).

A: Does gender affect the mean score on a statistics exam?
A: SPSS gives (t-test)
• What does the SPSS output imply?

B: Do students with different grades spend different amounts of time on their studies?
B: SPSS gives (one-way ANOVA)
• What is the simple idea behind the analysis?
• What does the SPSS output imply?
• Where is the difference?
Tukey intervals (Where do the means differ?)

C: Do math background or gender, or both, influence the time put into the course?
C: SPSS gives (two-way ANOVA)

If time: Why the tests work 1
• "The law of large numbers (LLN). Given a sample of independent and identically distributed random variables with a finite population mean, the average of these observations will eventually approach and stay close to the population mean."
• This result tells us that the larger the sample, the better the precision of the estimates.
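The law of large numbers can be tried out with a short simulation. This is only a sketch under assumed conditions that are not part of the slides: the observations are rolls of a fair six-sided die (population mean 3.5), and `sample_mean` is a hypothetical helper name.

```python
import random


def sample_mean(n, seed=0):
    """Mean of n simulated rolls of a fair six-sided die (population mean 3.5)."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n


POPULATION_MEAN = 3.5

# Print the sample mean and its distance from the population mean
# for increasingly large samples.
for n in (10, 100, 10_000, 100_000):
    mean = sample_mean(n)
    print(f"n = {n:>7}: sample mean = {mean:.4f}, "
          f"error = {abs(mean - POPULATION_MEAN):.4f}")
```

The error does not have to shrink at every single step, but it tends to become small as n grows, which is what the LLN describes.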
If time: Why the tests work 2
• The central limit theorem (CLT) states that the sum of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed (i.e., it follows a Gaussian distribution, or bell-shaped curve).
• This result (and similar ones) is important because it lets us approximate the distribution of test statistics, which is necessary to test hypotheses.

If even more time: Example
100 people took part in a survey about different brands of coffee. Each person tasted four different brands (in a blind test) and noted which one they preferred. The result of the test was as follows:

Brand:             Ellips  Gexus  Luber  Eco
Number of people:  26      28     16     30

Does the result of the survey show that any of the brands are more popular than the others, or are they all equal?
In statistical terms we can formulate the problem as:
Null hypothesis: All the coffee brands are equally popular.
Alternative hypothesis: The coffee brands are not all equally popular.
If the null hypothesis is true, we could expect the following result of the survey:

Brand:             Ellips  Gexus  Luber  Eco
Number of people:  25      25     25     25

Can we, with a significance level of 5%, say anything about whether or not the null hypothesis is true?
One way of measuring how much the observed table differs from the expected table is to look at the squared differences:

$(26-25)^2, \quad (28-25)^2, \quad (16-25)^2, \quad (30-25)^2$

However, there is a problem with the fact that the difference between 10 and 20 is relatively larger than the difference between 10000 and 10010. How can we take that into account?
Divide by the expected value and formulate a test statistic:

$\chi^2_{obs} = \frac{(26-25)^2}{25} + \frac{(28-25)^2}{25} + \frac{(16-25)^2}{25} + \frac{(30-25)^2}{25} = 4.64$

If the null hypothesis is true, $\chi^2_{obs}$ ought to be close to zero. Is 4.64 so far away from zero that we can reject the null hypothesis?
We compare the obtained p-value with our chosen level of significance.
Observed p-value: 0.20
Conclusion?
Distribution under the null hypothesis.
(To get 4.64 or more is not unusual. We cannot reject the null hypothesis.)
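The coffee calculation can be reproduced with a few lines of Python. This is a sketch, not the SPSS procedure used in the course: the helper names are hypothetical, and the p-value uses a closed-form expression that holds only for a chi-square distribution with 3 degrees of freedom (four brands minus one).

```python
import math

observed = [26, 28, 16, 30]          # counts from the coffee survey
n_total = sum(observed)              # 100 people
# Under the null hypothesis every brand is expected to get 25 votes.
expected = [n_total / len(observed)] * len(observed)


def chi_square_statistic(obs, exp):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def chi_square_sf_df3(x):
    """P(X >= x) for a chi-square distribution with exactly 3 degrees of freedom.

    Closed form valid only for df = 3; other df need a general routine.
    """
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


statistic = chi_square_statistic(observed, expected)
p_value = chi_square_sf_df3(statistic)
print(f"chi-square = {statistic:.2f}, p-value = {p_value:.2f}")
```

The printed values should agree with the 4.64 and 0.20 quoted in the example. If SciPy is available, `scipy.stats.chisquare(observed)` performs the same test, with equal expected counts as its default.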