Introduction to Hypothesis Testing
Chapter 8

Applying what we know: inferential statistics
z-scores + probability + distribution of sample means → HYPOTHESIS TESTING!

Some Familiar Concepts…
Sampling error: there is always some difference between samples and populations, even when the sample is untreated (control). M ≠ μ just by chance.
So… how can we tell whether a difference we observe is due to:
– chance (random sampling error or fluctuation), or
– a treatment effect or true group differences (differences do exist in the population)?

…and Some New Concepts
H1: Alternative hypothesis
– What we believe to be true
– There is a change, difference, or relationship
But it's easier to disprove than to prove, so…
H0: Null hypothesis
– No change, no difference, no relationship
– Try to prove this is wrong!
– Disproving H0 provides support for (but does not prove) H1.
Decide ahead of time which sample statistics (means) are:
– likely to be obtained if H0 is true
– very unlikely to be obtained if H0 is true (the critical region!)

THE HYPOTHESIZED (NULL) DISTRIBUTION
[Figure: the sampling distribution of sample means if H0 is true, with prompts asking students to name two labeled regions and one labeled value.]
Figure 8-3 (p. 236): The set of potential samples is divided into those that are likely to be obtained and those that are very unlikely to be obtained if the null hypothesis is true.

Z-scores in a new light
$z = \frac{M - \mu}{\sigma_M} = \frac{\text{obtained } M - \text{hypothesized } \mu}{\text{standard error}}$
Ratio of:
– obtained difference (distance) between M and μ, to
– the typical, expected, standard distance between M and μ (the standard error)
How far away from typical, or expected, is our sample?

Hypotheses
A hypothesis states an expected relationship between two or more variables.
– May be causal: one variable causes the other (A → B).
– May be descriptive: one variable is simply related to the other (A – B).
Much of this chapter focuses on causal hypotheses from experimental studies (treatment group and control group).

Where do Hypotheses come from?
– Personal observations, opinions
– Existing research
– Theory
– Models
  – more specific and concrete than theories
  – usually describe specific relationships among constructs/variables

Scientific Hypotheses Must Be:
– Testable: Can a test be designed?
– Falsifiable: Could it potentially be incorrect? Is there room for it to be disproven?
– Precise: Is it clearly defined?
– Rational: Does it fit with existing facts?
– Parsimonious: Is it as simple as possible?

Hypotheses cannot be proven!
– A single experiment cannot PROVE a hypothesis.
– Hypotheses are only supported or not supported by scientific data.
– We add evidence toward confirmation or disconfirmation of a hypothesis.

A Hypothesis Test vs. A Jury Trial
The null hypothesis
– Hypothesis test: We assume there is no treatment (tx) effect until there is enough evidence to show otherwise.
– Jury trial: Assume an individual is innocent until proven guilty.
The alpha level
– Hypothesis test: We conclude that the tx has an effect only when the data would be very unlikely to occur simply by chance.
– Jury trial: The jury must be convinced beyond a reasonable doubt before finding the defendant guilty.
The sample data
– Hypothesis test: The research study is conducted to gather data (evidence) to demonstrate that the treatment had an effect.
– Jury trial: The prosecutor presents evidence to demonstrate that the defendant is guilty.
The critical region
– Hypothesis test: Either the sample data fall in the critical region (enough evidence to reject H0) or they don't (not enough evidence to reject H0).
– Jury trial: Either there is enough evidence to convince the jury that the defendant is guilty, or there is not.
The conclusion
– Hypothesis test: If the data aren't in the critical region, the decision is to "fail to reject the null hypothesis." We have not proven that the null is true; we have simply failed to reject it.
– Jury trial: If there is not enough evidence, the verdict is "not guilty," not "innocent."

Directional (one-tailed) vs. Nondirectional (two-tailed) Tests
Nondirectional hypothesis/test
– Critical region is split between both tails, on either side of the mean
– Allows for a tx effect in either direction
– More common; a more conservative test
Directional hypothesis/test
– H1 specifies the direction of the effect/difference
– Critical region is in only one tail (either above or below the mean)
– Less conservative

Errors
Type I: H0 true (treatment does not have an effect), but:
– The hypothesis test detects a false treatment effect
– We reject H0 even though it's true
– We think we have support for H1 even though it's not true
Type II: H0 false (treatment does have an effect), but:
– The hypothesis test fails to detect it
– We retain H0 even though it's false

Type I and Type II Error (decision vs. ACTUAL SITUATION)
Reject H0 (decide an effect does exist)
– No effect / H0 true: Type I error, false positive (probability = α). The test detects a nonexistent effect; it is too sensitive.
– Effect exists / H0 false: True positive (effect exists = correct!). This is the ability to detect an effect, POWER: p(reject false H0) = 1 − β. Good sensitivity to detect an effect.
Retain H0 (decide no effect exists)
– No effect / H0 true: True negative (no effect = correct!). Good specificity, or selectivity, to catch a non-effect.
– Effect exists / H0 false: Type II error, false negative (probability = β). The test is too specific: it fails to detect a true effect.

Power
Probability that a test will correctly:
– reject a false null hypothesis
– detect a real treatment effect
In other words: the sensitivity of a statistical test to detect an effect that does exist.

Group Activity!
Make a graphical representation of these concepts:
– Type I error (false positive)
– Type II error (false negative)
– True positive / negative
– Alpha, Beta, Power
– Sensitivity, specificity
Some ideas:
– Draw a concept map, decision tree, or flow chart
– Sketch all possibilities using the null distribution and the alternative distribution (see pp. 266-268)
– Use sample data / a sample hypothesis (H0 and H1)
– Use an analogy (like the trial by jury analogy)

Beyond p and chance: Effect Sizes
Limitations of hypothesis tests:
– They give the ratio of the obtained difference to the expected difference
– They evaluate the relative size of the obtained difference (or tx effect)
– They are strongly influenced by sample size (big enough n → small σM → easy to reject H0!)
Effect sizes:
– Give the absolute size of the obtained difference (or tx effect)
– Are scaled with the standard deviation, not the standard error
– Thus, are not influenced by sample size
Cohen's d
$d = \frac{\text{mean difference}}{\text{standard deviation}} = \frac{M - \mu}{\sigma}$

Figure 8-15 (p. 268): The relationship between sample size and power. The top figure (a) shows a null distribution and a treatment distribution with a 20-point treatment effect, based on samples of n = 16 and a standard error of 10 points. Notice that the right-hand critical boundary is located in the middle of the treatment distribution, so roughly 50% of the treated samples fall in the critical region. In the bottom figure (b), the distributions are based on samples of n = 100, and the standard error is reduced to 4 points. In this case, essentially all of the treated samples fall in the critical region, and the hypothesis test has power of nearly 100%.

For Wednesday
– Finish reading Chapter 9
– Finish HW Chapter 9 (turn in at the start of class)
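To make the effect-size slide and Figure 8-15 concrete, here is a minimal sketch in Python (standard library only). It assumes a one-sample z-test with a known σ and a two-tailed α = .05; the figure does not state α, so that value is an assumption. The population SD of 40 points is inferred from the figure's numbers (a standard error of 10 with n = 16 means σ = 10 × √16 = 40, which also gives 40/√100 = 4 for n = 100). The hypothesized μ = 100, the 8-point observed difference, and the helper names `z_test`, `cohens_d`, and `power_z_test` are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def z_test(sample_mean, mu, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test (hypothetical helper): returns (z, critical z, reject H0?)."""
    standard_error = sigma / n ** 0.5                # sigma_M = sigma / sqrt(n)
    z = (sample_mean - mu) / standard_error          # obtained distance / expected distance
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)  # alpha is split between both tails
    return z, z_crit, abs(z) > z_crit


def cohens_d(mean_difference, sigma):
    """Effect size: the mean difference measured in standard deviations, not standard errors."""
    return mean_difference / sigma


def power_z_test(effect, sigma, n, alpha=0.05):
    """Probability that the two-tailed z-test rejects H0 when the true mean is shifted by `effect`."""
    standard_error = sigma / n ** 0.5
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / standard_error  # center of the treatment distribution, in SE units
    # Treated samples that land beyond the upper or the lower critical boundary.
    upper = 1 - STANDARD_NORMAL.cdf(z_crit - shift)
    lower = STANDARD_NORMAL.cdf(-z_crit - shift)
    return upper + lower


if __name__ == "__main__":
    mu, sigma = 100, 40       # hypothetical mu; sigma = 40 inferred from Figure 8-15
    observed_difference = 8   # hypothetical M - mu, identical at both sample sizes
    for n in (16, 100):
        z, z_crit, reject = z_test(mu + observed_difference, mu, sigma, n)
        d = cohens_d(observed_difference, sigma)
        print(f"n={n}: z={z:.2f}, reject H0={reject}, d={d:.2f}")
    # Figure 8-15: a 20-point treatment effect tested with n = 16 vs. n = 100.
    for n in (16, 100):
        print(f"n={n}: power={power_z_test(20, sigma, n):.2f}")
```

With the same 8-point difference, z is 0.80 (H0 retained) at n = 16 but 2.00 (H0 rejected) at n = 100, while d stays 0.20 at both sizes: the test result depends on n, the effect size does not. The power lines print about 0.52 and 1.00, in line with the figure's "roughly 50%" and "nearly 100%."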