Chi-Square (χ²)

Parking lot exercise
• Graph the distribution of car values for each parking lot
• Fill in the frequency and percentage tables

The "null" hypothesis
• Inferential statistics use samples to draw conclusions about a population
• Whenever we use inferential statistics, the "null hypothesis" applies
  – Null hypothesis: any apparent effect of the independent variable(s) on the dependent variable(s) was produced by chance
  – Unless you can show otherwise, THE NULL IS ASSUMED TO BE TRUE
• Researchers always want to REJECT the null hypothesis
  – Rejecting the null hypothesis supports the working hypothesis
• The only way to reject the null is for the results of statistical tests (e.g., the difference between means) to be very, very substantial
• How substantial? The test statistic (e.g., r, t, b, χ²) must be so large that it goes well beyond what one would expect from sampling error alone
• How far is that? To reject the null, the probability of getting a result this large if the null were true must be LESS than 5 in 100 (p < .05)
• How do we know whether it is?
  – If you're doing the computation, compare the test statistic to a table of critical values
  – If you're reading a study, is there an asterisk by the test statistic? Usually one asterisk (*) means p < .05: if the null were true, a result this large would occur less than 5 times in 100. Two asterisks (**) are better (p < .01, less than 1 time in 100). Three (***) are great (p < .001, less than 1 time in 1,000)
  – If there are NO asterisks, we cannot reject the null hypothesis

Chi-Square (χ²)
• Chi-square (χ²) is used when all variables are categorical (not ordinal)
• Example: Does gender affect court disposition?
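The decision rule above (compare the statistic to a table; read one, two, or three asterisks as p < .05, .01, .001) can be sketched in Python for the one-degree-of-freedom case that every 2 × 2 table in these exercises uses. This is a minimal sketch, not the course's method: the function names are hypothetical, and it relies on the fact that with df = 1 a chi-square value is the square of a standard normal score, so its p-value is `math.erfc(sqrt(x / 2))`. For other degrees of freedom (the parking-lot table, for example), use a printed table or a library routine such as SciPy's `scipy.stats.chi2.sf`.

```python
import math


def p_value_df1(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    Assumption: df = 1 only. Then chi-square = Z**2 for a standard normal Z,
    so P(X2 >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


def significance_stars(p: float) -> str:
    """Asterisk convention commonly used in published tables."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant: we cannot reject the null


# Chi-square values worked by hand in these exercises (all with df = 1).
for chi2 in (3.82, 10.5, 40.1):
    p = p_value_df1(chi2)
    print(f"X2 = {chi2:5.2f}  p = {p:.4f}  {significance_stars(p) or 'n.s.'}")
```

Run as-is, it reports that 3.82 falls just short of the .05 cutoff, while 10.5 earns two asterisks and 40.1 earns three, matching the conclusions reached by hand with the table.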
Chi-Square (χ²)
• Used with moderate-size random samples
• Tests for a relationship between two nominal variables (categorical, cannot be ordered) that have been cross-tabulated
• Evaluates the difference between Observed and Expected cell frequencies:
  – "Observed" means the cell frequencies that are actually present
  – "Expected" means the cell frequencies we would "expect" if there were no relationship between the variables (the null hypothesis is true)
  – If there is no difference, χ² is zero
  – The greater the difference, the larger the value of χ²

Class exercise
Hypothesis: Gender → Disposition

Observed cell frequencies
                 Court disposition
Gender       Jail   Released   Total
Male           84         16     100
Female         30         20      50
Total         114         36   n = 150

Creating the "Expected" table: cell frequencies if the null hypothesis is true
$$E = \frac{\text{independent variable category total}}{\text{grand total}} \times \text{dependent variable category total}$$
Start from the marginal totals only (Male 100, Female 50; Jail 114, Released 36; n = 150):
Male/Jail: 100/150 × 114 = 76
Male/Released: 100/150 × 36 = 24
Female/Jail: 50/150 × 114 = 38
Female/Released: 50/150 × 36 = 12

Expected frequencies
                 Court disposition
Gender       Jail   Released   Total
Male           76         24     100
Female         38         12      50
Total         114         36   n = 150

Obtaining χ²
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
• O = observed frequency
• E = expected frequency (what we would get if the null hypothesis is true)
• χ² is the ratio of systematic variation to chance variation
• The higher the ratio (the greater the systematic variation relative to chance variation), the more likely that we can reject the null
• Chi-square is not a good measure of the strength of a relationship, because its significance level is closely tied to sample size
  – It tends to overestimate significance with very large samples and underestimate it with very small samples

$$\chi^2 = \frac{(84-76)^2}{76} + \frac{(16-24)^2}{24} + \frac{(30-38)^2}{38} + \frac{(20-12)^2}{12} = 0.84 + 2.67 + 1.68 + 5.33 \approx 10.5$$
$$df = (r - 1) \times (c - 1) = (2 - 1) \times (2 - 1) = 1$$
Reject the null hypothesis: with df = 1, χ² = 10.5 exceeds the .01 critical value of 6.635, so if there were no relationship between gender and court disposition, a result this large would occur by chance less than once in 100 times (p < .01).

Class exercise
Hypothesis: More building alarms → Less crime
• Hypothesis: Building alarms lead to less crime
• Randomly sampled 120 businesses with alarms: 50 had crimes, 70 didn't
• Randomly sampled 90 businesses without alarms: 50 had crimes, 40 didn't
• Build an observed table, then an expected table
• Remember, they're tables, so place the values of the independent variable in rows
• Compute χ² = Σ (O − E)² / E

Observed (obtained) frequencies
               Crime
Alarm        Y      N   Total
Y           50     70     120
N           50     40      90
Total      100    110     210

Expected (by chance) frequencies
               Crime
Alarm        Y      N   Total
Y           57     63     120
N           43     47      90
Total      100    110     210

$$\chi^2 = \frac{(50-57)^2}{57} + \frac{(70-63)^2}{63} + \frac{(50-43)^2}{43} + \frac{(40-47)^2}{47} = 0.86 + 0.78 + 1.14 + 1.04 = 3.82$$
$$df = (r - 1) \times (c - 1) = (2 - 1) \times (2 - 1) = 1$$
To reject at the .05 level we need χ² = 3.841 or greater.
Fail to reject the null hypothesis: NO significant relationship; what's there could be due to chance. (This result is sensitive to rounding the expected frequencies to whole numbers: the unrounded expected values 57.14, 62.86, 42.86 and 47.14 give χ² ≈ 3.98, just above 3.841.)

Demonstrating the meaning of "expected"
Checking the expected frequencies table by converting it into row percentages:
               Crime
Alarm        Y      N   Total
Y          47%    53%     120
N          48%    52%      90
Total      100    110     210
• In a properly done expected table, as you change the value of the independent variable, the distribution across the dependent variable shouldn't change (the one-point differences here come from rounding)
• A properly done expected table will always show no relationship; it represents the null hypothesis

Back to the parking lots…
• Use the frequency (not percentage) table to create a "frequencies expected" table (meaning, expected if there is no relationship)
• This table should artificially reflect no relationship between income and car value
• Instructions on next slide…

Computing expected frequencies (from the row marginals, the column marginals and the total cases)
$$\text{Expected frequency} = \frac{\text{row marginal}}{\text{total cases}} \times \text{column marginal}$$

Expected frequencies
• Now compute the Chi-Square
• Instructions on next slide

Computing Chi-Square
1. Cell by corresponding cell, subtract EXPECTED from OBSERVED.
2. Square each difference.
3. Divide each result by the EXPECTED frequency.
4. Total them up.

In scientific research, the greatest risk we normally accept of being wrong is five in one hundred (the .05 column of the table). Our Chi-square, 8.66, is more than the minimum required of 7.815. So we can reject the NULL hypothesis and accept the WORKING hypothesis that higher-income persons drive more expensive cars.

Homework
Homework exercise
Hypothesis: Sergeants have more stress than patrol officers
                               Job stress
Position on police force    Low   High   Total
Sergeant                     30     60      90
Patrol officer               86     24     110
Total                       116     84     200
Source: Fitzgerald & Cox, Research Methods in Criminal Justice, p. 165
1. Calculate expected cell frequencies (null hypothesis of no relationship is true)
2. Compute Chi-square
3. Use the table in Appendix E to determine your chi-square's probability level
4. Can we reject the null hypothesis?

Homework answer
Observed: as in the homework table above.
Expected
                               Job stress
Position on police force    Low   High   Total
Sergeant                     52     38      90
Patrol officer               64     46     110
Total                       116     84     200

$$\chi^2 = \frac{(30-52)^2}{52} + \frac{(60-38)^2}{38} + \frac{(86-64)^2}{64} + \frac{(24-46)^2}{46} = 9.31 + 12.74 + 7.56 + 10.52 \approx 40.1$$
$$df = (r - 1) \times (c - 1) = (2 - 1) \times (2 - 1) = 1$$
To reject at the .05 level we need χ² = 3.841 or greater.
Reject the null hypothesis: if there were no relationship, a result this large would occur by chance less than once in 1,000 times (p < .001).

Practice for the final
• You will test a hypothesis using two categorical variables and determine whether the independent variable has a statistically significant effect.
• You will be asked to state the null hypothesis.
• You will use supplied data to create an Observed frequencies table, and then use it to create an Expected frequencies table. You will be given a formula but should know the procedure.
• You will compute the Chi-Square statistic and degrees of freedom. You will be given formulas but should know the procedures by heart.
• You will use the Chi-Square table to determine whether the results support the working hypothesis.
  – Print and bring to class: http://www.sagepub.com/fitzgerald/study/materials/appendices/app_e.pdf
• Sample question: The hypothesis is that alarm systems prevent burglary. A random sample includes 120 businesses with an alarm system and 90 without. Fifty businesses of each kind were burglarized.
  – Null hypothesis: No significant difference in burglary between businesses with and without alarms
  – Observed frequencies: Alarm Y 50 / 70, Alarm N 50 / 40; Expected frequencies: Alarm Y 57 / 63, Alarm N 43 / 47 (burglarized / not burglarized)
  – χ² = (50−57)²/57 + (70−63)²/63 + (50−43)²/43 + (40−47)²/47 = .86 + .78 + 1.14 + 1.04 = 3.82
  – Chi-Square = 3.82
  – df = (r − 1) × (c − 1) = 1
  – Check the table. Do the results support the working hypothesis? No. Chi-Square must be at least 3.84 to reject the null hypothesis of no relationship between alarm systems and burglary at the .05 level, where a result that large would occur by chance only five times in 100 if the null were true
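The hand procedure used in every exercise above (expected = row marginal × column marginal ÷ total cases, then Σ(O − E)²/E, then df = (r − 1) × (c − 1)) can be sketched in plain Python. This is a minimal sketch under stated assumptions: tables are lists of rows with the independent variable in rows, as the slides instruct; the helper names are hypothetical; and no continuity correction is applied. It computes the statistic twice, once with expected counts rounded to whole numbers as the slides do and once unrounded. Rounding matters for the building-alarm table: unrounded expected counts give χ² ≈ 3.98 rather than 3.82, which is just above the 3.841 cutoff.

```python
def expected_frequencies(observed):
    """Expected count per cell: row marginal x column marginal / total cases."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Multiply before dividing so whole-number results stay exact floats.
    return [[r * c / total for c in col_totals] for r in row_totals]


def round_table(table):
    """Round each expected count to a whole number, as the hand-worked slides do."""
    return [[round(x) for x in row] for row in table]


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over every cell."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


def degrees_of_freedom(observed):
    """(rows - 1) x (columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def row_percentages(table):
    """Each row as percentages; rows of an unrounded expected table all match."""
    return [[100 * x / sum(row) for x in row] for row in table]


# Independent variable in rows, dependent variable in columns.
gender = [[84, 16],   # Male: Jail, Released
          [30, 20]]   # Female: Jail, Released
alarms = [[50, 70],   # Alarm Y: Crime Y, Crime N
          [50, 40]]   # Alarm N: Crime Y, Crime N
stress = [[30, 60],   # Sergeant: Low, High
          [86, 24]]   # Patrol officer: Low, High

for name, obs in [("gender", gender), ("alarms", alarms), ("stress", stress)]:
    exact = expected_frequencies(obs)
    rounded = round_table(exact)
    print(f"{name}: expected {rounded}, df = {degrees_of_freedom(obs)}, "
          f"X2 (rounded E) = {chi_square(obs, rounded):.2f}, "
          f"X2 (unrounded E) = {chi_square(obs, exact):.2f}")

# The "meaning of expected" check: both alarm rows have the same distribution.
print(row_percentages(expected_frequencies(alarms)))
```

If you cross-check with SciPy's `scipy.stats.chi2_contingency`, pass `correction=False`; by default it applies Yates' continuity correction to tables with one degree of freedom, which gives a smaller statistic than the formula used in these slides.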