Statistical analysis - cause, probability, and effect

I) What counts as a difference - a two-part question
   A) How shall we compare measurements? - a question of statistical methodology
   B) What will count as a significant difference? - a philosophical question and, as such, subject to convention

[Figure: the cycle of scientific reasoning. Conception (inductive reasoning): a perceived problem, beliefs, previous observations, and existing theory lead, via insight, to a general hypothesis and then to specific hypotheses (and predictions). Assessment (deductive reasoning): comparison with new observations yields either falsification (HA rejected, HO supported/accepted) or confirmation (HA supported/accepted, HO rejected).]

Specific hypothesis HA: the number of mussel settlers is greater in areas inside the adult distribution than outside it.

II) Statistical methodology - general
   A) Null hypotheses - HO
      1) In most sciences, we are faced with:
         a) NOT whether something is true or false (a Popperian decision)
         b) BUT rather the degree to which an effect exists (if at all) - a statistical decision
      2) Therefore two competing statistical hypotheses are posed:
         a) HA: there is a difference in effect between the two groups (usually posed as < or >)
         b) HO: there is no difference in effect between the two groups
   B) Statistical tests: a set of rules whereby a decision about the hypotheses is reached (either reject or do not reject)
      1) Associated with the rules is some indication of the accuracy of the decision; that measure is a probability statement - the p-value
      2) Statistical hypotheses:
         a) Critical p-values indicate what counts as an acceptable level of uncertainty - set by convention, usually 0.05
         b) Example: if the critical p-value = 0.05, this means that we are unwilling to reject the null hypothesis (i.e.
no effect) unless:
            1) we are 95% sure that HA is correct, or equivalently
            2) we are willing to accept a 5% (or smaller) chance of being wrong when we do reject HO

[Figure: we sample mussel settlement and produce two frequency distributions of numbers of settlers, inside vs. outside the adult distribution, with means X̄1 and X̄2. Are these different?]

The logic of statistical tests - how they are performed
   1) Assume the null hypothesis is true (i.e., unless we detect a difference, the null HO of no difference between the distributions - inside vs. outside - stands)
   2) Compare measurements - generally this means calculating the mean difference between two sample distributions (e.g., the numbers from the experiment or set of observations), summarized as a test statistic (e.g., t or F)
   3) Determine the probability that the difference between the two distributions is due to chance alone, rather than to the hypothesized effect (translate the test statistic into a calculated p-value)
   4) Compare the calculated p-value with a critical p-value to assign significance (i.e., whether to accept or reject the null hypothesis)

Are they different (X̄outside vs. X̄inside)? How do you compare distributions?
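The four steps above can be run end to end. Below is a minimal Python sketch; the settler counts and the critical t value are invented for illustration and are not the lecture's data:

```python
import math

# Hypothetical settler counts per quadrat (invented for illustration,
# not the lecture's actual data).
inside = [36, 22, 19, 27, 41, 12, 25, 33, 23, 31]
outside = [14, 9, 17, 11, 20, 7, 13, 16, 10, 15]

def mean(xs):
    return sum(xs) / len(xs)

def sample_sd(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

# Step 1: assume HO (no difference between distributions) is true.
# Step 2: summarize the comparison as a test statistic (pooled two-sample t).
n1, n2 = len(inside), len(outside)
sp = math.sqrt(((n1 - 1) * sample_sd(inside) ** 2 +
                (n2 - 1) * sample_sd(outside) ** 2) / (n1 + n2 - 2))
t = (mean(inside) - mean(outside)) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Steps 3-4: translate t into a decision. With no stats package at hand,
# compare t to the one-tailed critical t for df = 18 at alpha = 0.05
# (about 1.734); t > t_crit is equivalent to p_calc < p_crit = 0.05.
T_CRIT = 1.734
decision = "reject HO" if t > T_CRIT else "do not reject HO"
print(f"t = {t:.2f} -> {decision}")
```

With a statistics package the last step would instead convert t to an exact calculated p-value, but the decision is the same comparison against the critical value.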
   a) Generally assume (but test) that the distributions are the same shape (usually normal - the "bell" curve - i.e., "parametric")
   b) Compare the means, taking into account the confidence you have in your estimate of each mean, by estimating the error (variation) associated with each estimate:

      Standard Error of the Mean (SEM) = variation / sample size = sd / sqrt(n)

      where the standard deviation sd = sqrt( Σ(Xi - X̄)² / (n - 1) )

Then we calculate a standardized difference between the means of the two distributions (again taking their error into account):

      t = (X̄inside - X̄outside) / sqrt( 2S² / n ),  where S² = (sd)² is the variance

and ask: "What is the likelihood that the calculated difference (given our error) is due to random chance (or, alternatively, reflects a real difference between the two sample distributions)?"

The calculated p-value is the probability of obtaining the estimated difference by random chance alone when there is no real difference (i.e., when H0 is true); it is derived from a table or a statistics package.
   If Pcalc > Pcrit (0.05), then do not reject H0
   If Pcalc < Pcrit (0.05), then reject H0

Example
   HA = more mussel settlers inside the adult distribution
   Ho = no difference in mussel settlement inside vs. outside the adult distribution

   Calculated p-value    Critical p-value (α)    Decision
   0.90                  0.05                    Do not reject Ho
   0.50                  0.05                    Do not reject Ho
   0.051                 0.05                    Do not reject Ho
   0.049                 0.05                    Reject Ho (accept HA)

But we still don't know the probability of concluding that mussel settlement is greater inside the adult distribution when in fact it truly is greater inside the distribution.

Types of statistical error - Type I and II
   1) By convention, the hypothesis tested is the null hypothesis (no difference between the sample distributions)
      a) In statistics, the assumption is made that this hypothesis is true (assume HO true = assume HA false)
      b) Accepting HO (saying it is likely to be true) is the same as rejecting HA (falsification)
      c) The scientific method is to falsify competing alternative hypotheses (alternative HA's)
   2) Errors in decision making
                          Truth
   Decision       HO true               HO false
   Accept HO      no error (1 - α)      Type II error (β)
   Reject HO      Type I error (α)      no error (1 - β)

Type I error - the probability α that we mistakenly reject a true null hypothesis (HO)
Type II error - the probability β that we mistakenly fail to reject (accept) a false null hypothesis
Power of test - the probability (1 - β) of not committing a Type II error. The more powerful the test, the more likely you are to correctly conclude that an effect exists when it really does (reject HO when HO is false = accept HA when HA is true).

Simply:
   Type I error  = α     = reject HO when HO true  = accept HA when HA false
   Type II error = β     = accept HO when HO false = reject HA when HA true
   Power of test = 1 - β = reject HO when HO false = accept HA when HA true

Control of error
   A) Type I error can be directly controlled: the acceptable level is set by the experimenter = the critical p-value (often 0.05)
   B) Type II error is indirectly controlled by the design of the experiment (more about this later), through:
      1. the level of the critical p-value (usually 0.05)
      2. the magnitude of the difference (the "effect") between the means
      3. the amount of natural variation in the data
      4. the sample size (level of replication)

Statistical power, effect size, replication and alpha

Statistical comparison of two distributions (X̄1 vs. X̄2), using a simulated distribution of mussel settlers inside the adult distribution:
   Samples = 100 cells (a 10 x 10 grid)
   Mean per sample = 25
   Total settlers = 2500

Evaluate the effect of sample size on the calculation of the mean:
   - Compare sample sizes of 5, 10, 20, 50, and 99 cells
   - Iterate each sample size 50 times
   - Example: sample 10 sites at a time to estimate the mean; two such samples might give X̄1 = 21.5 and X̄2 = 25.8, while the true (simulated) mean = 25

[Figure: frequency distributions of the sample means for means based on 5, 10, 20, 50, and 99 observations.]

Notice how the estimated mean approaches the true mean with larger sample size (the "Central Limit Theorem")!
Notice how the variation around the estimated mean increases with smaller sample size!
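The resampling exercise above can be sketched directly. A minimal Python sketch, with the population of grid cells simulated (mean 25; the spread of 8 settlers per cell is an assumed value, not from the lecture):

```python
import random
import statistics

# A minimal sketch of the slide's simulation (setup assumed): 100 grid
# cells whose settler counts average 25, resampled at the slide's sample
# sizes to show how the spread of the estimated mean shrinks as sample
# size grows.
random.seed(1)
cells = [random.gauss(25, 8) for _ in range(100)]  # simulated settlers per cell

spreads = {}
for n in (5, 10, 20, 50, 99):
    # 50 iterations per sample size, as on the slide
    means = [statistics.mean(random.sample(cells, n)) for _ in range(50)]
    spreads[n] = statistics.stdev(means)
    print(f"n = {n:2d}: mean of estimates = {statistics.mean(means):5.1f}, "
          f"spread of estimates = {spreads[n]:5.2f}")
```

The printed spread of the 50 mean estimates falls steadily as n grows, which is the pattern the frequency-distribution panels on the slide illustrate.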
Statistical power, effect size, replication and alpha

Statistical comparison of two distributions (ȳ1 vs. ȳ2)

[Figure: a frequency distribution of sample means; the area under the curve = 1.00.]

The test statistic is

   t = (ȳ1 - ȳ2) / ( sp · sqrt( 1/n1 + 1/n2 ) )

where sp is the pooled standard deviation.

Ho true: the distributions of means are truly the same
   1) Repeated sampling gives estimates of ȳ1 ≈ estimates of ȳ2, so calculated t-values cluster around 0 and follow the central t distribution (df)
   2) Assign a critical p (assume = 0.05), which fixes a critical value tc on the t distribution
   3) If the distributions are truly the same, then the area to the right of tc represents the Type I error rate: any calculated t > tc will cause the incorrect conclusion that the distributions are different

[Figure: sampling distributions of ȳ1 and ȳ2, the resulting central t distribution, and the Type I error region (α, blue) beyond tc.]

Ho false: the distributions of means are truly different
   1) Calculated t-values follow a non-central t distribution, shifted away from the central t distribution (which assumes similarity of the distributions)
   2) If the distributions are truly different, then the area to the left of tc represents the Type II error rate: any calculated t < tc will cause the incorrect conclusion that the distributions are the same

[Figure: sampling distributions of ȳ1 and ȳ2, the central and non-central t distributions, and the Type II error region (β, red) below tc.]

How to control Type II error (when the distributions are truly different) - this is what maximizes statistical power to detect real impacts:
   1) Vary the critical p-value (change the size of the rejection region)
      A) Make the critical p more stringent (smaller): Type II error increases, power decreases
      B) Relax the critical p (larger values): Type II error decreases, power increases
   2) Vary the magnitude of effect (vary the distance between ȳ1 and ȳ2, which shifts the non-central t distribution)
      A) Make the distance smaller: Type II error increases, power decreases
      B) Make the distance greater: Type II error decreases, power increases
   3) Vary replication (which controls the estimates of error) - note that here the Type I error rate is held constant and tc is allowed to vary
      A) Decrease replication: Type II error increases, power decreases
      B) Increase replication: Type II error decreases, power increases

[Figure: power increases with greater replication, with a larger (less stringent) critical alpha, and with a greater magnitude of effect.]

How to estimate optimal sample size
   1) Do a preliminary study of the variables that will be evaluated in the project
   2) Plot the mean and some estimate of the variability of the data as a function of sample size
   3) Look for the sample size where the estimates of the mean and variance (or standard deviation) converge on a stable value
   4) Calculate a "bang for the buck" relationship to determine whether a robust design (sufficient sampling) can be paid for

[Figure: mean and standard deviation plotted against number of samples (0-100); both estimates converge on stable values as sample size increases.]
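The relationship between replication and power described above can be checked numerically. Below is a rough Monte Carlo sketch; the effect size, the unit spread, and the trial count are all assumed for illustration, and the critical t values are the standard one-tailed 0.05 values for each df:

```python
import math
import random

# Estimate power = 1 - beta at several levels of replication by simulating
# experiments in which HO is truly false (the means really differ) and
# counting how often the pooled t-statistic exceeds its critical value.
random.seed(4)

# One-tailed critical t at alpha = 0.05, for df = n1 + n2 - 2
T_CRIT = {5: 1.860, 10: 1.734, 20: 1.686, 50: 1.661}  # df = 8, 18, 38, 98

def pooled_t(a, b):
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / (sp * math.sqrt(1 / na + 1 / nb))

def power(n, effect=1.0, trials=2000):
    hits = 0
    for _ in range(trials):
        y1 = [random.gauss(effect, 1) for _ in range(n)]  # truly shifted mean
        y2 = [random.gauss(0.0, 1) for _ in range(n)]
        if pooled_t(y1, y2) > T_CRIT[n]:  # correctly reject HO
            hits += 1
    return hits / trials

powers = {n: power(n) for n in (5, 10, 20, 50)}
for n, p in powers.items():
    print(f"n = {n:2d} per group: estimated power = {p:.2f}")
```

Rerunning with a larger `effect`, or a relaxed critical value, raises the estimated power in the same way the figures describe for the other two levers.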