RESEARCH PROJECT PLANNING: SAMPLE SIZE AND STATISTICAL POWER; STATISTICS OVERVIEW
Catherine R. Messina PhD
Research Associate Professor
Department of Family, Population & Preventive Medicine
September 28, 2016

CONSIDER ...
- Dr. X compares a new method for treating diaper rash to the usual care for this condition.
- Dr. X randomly assigns 5 infants to the new method group and 5 infants to the usual care group (10 infants total).
- Study findings favor the new method as more effective than usual care.
- The p value for this comparison is 0.08.
- How do you interpret this situation?

CONSIDER ...
When comparing groups and the value for p is > 0.05:
- There may truly BE NO effect, or
- There may truly BE an effect in the population, but your statistical test, which is based on your sample, suggests no significant effect.

STATISTICAL POWER
- The ability to detect a significant difference of a specific magnitude (i.e., effect size) between groups, if it actually exists
- Minimum acceptable statistical power for a proposed study is set at 80%: that is, we require at least an 80% chance that a difference that really exists will show up as a statistically significant finding
- What influences statistical power?

STATISTICAL POWER
- Statistical power is directly related to sample size, effect size, and alpha
  - Power increases as effect size increases, for a given sample size
  - Power increases as sample size increases, for a given effect size
  - Power increases as alpha increases (alpha is typically set at 0.05)
- Power is inversely related to variability
  - Power decreases as variability increases
- What is alpha? The threshold at which statistical significance is reached: that is, with alpha = 0.05, the risk of concluding that there is a difference when one does not exist cannot exceed 5%.

[Figure: POWER VS. EFFECT SIZE WHEN SAMPLE SIZE IS FIXED. y-axis: power (0 to 1, reference line at .80); x-axis: effect size]

[Figure: POWER VS. SAMPLE SIZE WHEN EFFECT SIZE IS FIXED. y-axis: power (0 to 1, reference line at .80); x-axis: sample size]

[Figure: POWER VS. ALPHA. y-axis: power (0 to 1, reference line at .80); x-axis: alpha (0 to 1, with 0.05 marked)]

[Figure: POWER VS. VARIABILITY WHEN EFFECT SIZE IS FIXED. y-axis: power (0 to 1, reference line at .80); x-axis: variability]

ESTIMATING THE SAMPLE SIZE: STATISTICAL POWER
- Critical aspect of research planning
  - The size of the sample can influence your ability to detect meaningful differences between study groups
  - An underpowered study makes it hard to detect real differences
- Not always a good idea to base your sample size on the prior literature
  - Many published studies actually have very low statistical power
  - If the power of a published study is 50%, then it had only a 50% probability of finding an effect if it really existed
  - If you use the same sample size, then you may only have a 50% chance of replicating that effect

ESTIMATING THE SAMPLE SIZE: STATISTICAL POWER
- Critical aspect of research planning (cont'd)
  - An underpowered study makes it hard to interpret differences that appear to be real: the lower the power of a study, the lower the probability that an observed effect that reaches statistical significance (e.g., p < 0.05) actually reflects a true effect. (Ioannidis, JP (2005); Ioannidis, JP, Tarone, R, McLaughlin, JK (2011))

[Figure annotation: "Stop collecting data here?"]
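The relationships on the STATISTICAL POWER slides, and the point about significant findings from underpowered studies, can be explored numerically. The sketch below is illustrative only: `power_two_means` is a hypothetical helper that assumes a two-sided comparison of two means with equal group sizes, a common standard deviation, and a normal (z-test) approximation, so it will differ slightly from an exact t-test calculation. `positive_predictive_value` uses the expression from Ioannidis (2005), where `prior_odds` is the assumed pre-study odds that the effect is real.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def power_two_means(n_per_group, difference, sd, alpha=0.05):
    """Approximate power of a two-sided comparison of two group means.

    Assumptions of this sketch: equal group sizes, a common standard
    deviation, and a normal (z-test) approximation.
    """
    standard_error = sd * sqrt(2.0 / n_per_group)
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(difference) / standard_error
    # Probability that the test statistic lands in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def positive_predictive_value(power, alpha, prior_odds):
    """Share of statistically significant findings that reflect a true effect."""
    return power * prior_odds / (power * prior_odds + alpha)


if __name__ == "__main__":
    # Hypothetical inputs: a difference of half a standard deviation.
    for n in (5, 20, 64):
        print("n per group:", n, "power:", round(power_two_means(n, 0.5, 1.0), 2))
    for pw in (0.2, 0.5, 0.8):
        print("power:", pw, "PPV:", round(positive_predictive_value(pw, 0.05, 1.0), 2))
```

With 5 infants per group, as in the Dr. X example, and an assumed difference of half a standard deviation, this approximation gives power well below 0.80, which is one way to read a p value of 0.08 from such a small study.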
ESTIMATING THE SAMPLE SIZE: STATISTICAL POWER
- Critical aspect of research planning (cont'd)
  - Not only tells you how many participants you need, it also tells you how many you don't need
    - Saves resources
    - Ethical considerations
  - An over-powered study can increase the risk of detecting meaningless differences

ESTIMATING PARAMETERS NEEDED FOR POWER CALCULATIONS
- Sample size calculations require:
  - the desired power (not less than 80%)
  - the alpha level (typically 0.05)
  - an estimate of the effect size (this should be the smallest difference that is clinically significant)
  - an estimate of population variability (e.g., standard deviations); note that as sample size increases, the variability of sample estimates (the standard error) decreases
    - Estimate from pilot data
    - Estimate from prior studies using the same outcome measure; if there is more than one study, you can use the average SD
- Estimates should be close to the true values but don't need to be perfect
- Qualified guess-timate:
  - Preliminary results / pilot data
  - Other studies / published literature
  - Smallest clinically relevant effect

ESTIMATING THE SAMPLE SIZE: REPRESENTATIVENESS
- Critical aspect of research planning
  - The size of the sample can influence the representativeness of the study sample
  - "Representativeness": how well does the sample represent the population?
- NOTE: having enough people in your sample does not necessarily guarantee representativeness if sample selection was biased in some way

ESTIMATING THE SAMPLE SIZE: REPRESENTATIVENESS
Provided sample selection is unbiased:
- In general, the larger the sample, the greater the likelihood that study findings will accurately reflect the population, because larger samples have lower sampling error
- Sampling error = differences between the sample and the population that are due solely to the particular sample that happens to have been selected
- As sample size increases, sampling error decreases.
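The inputs listed under ESTIMATING PARAMETERS NEEDED FOR POWER CALCULATIONS (power, alpha, smallest clinically relevant difference, standard deviation) can be combined into a per-group sample size. The sketch below is a rough planning aid, not a substitute for dedicated power software: `n_per_group_two_means` is a hypothetical helper that assumes a two-sided comparison of two means, equal group sizes, a common SD, and the normal approximation n = 2 (z_alpha + z_power)^2 SD^2 / difference^2. `standard_error_of_mean` illustrates why sampling error shrinks as the sample grows.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_two_means(difference, sd, alpha=0.05, power=0.80):
    """Approximate participants needed per group to compare two means.

    Assumptions of this sketch: two-sided test, equal group sizes,
    a common standard deviation, normal (z) approximation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # two-sided significance threshold
    z_power = z.inv_cdf(power)              # quantile matching the desired power
    n = 2.0 * ((z_alpha + z_power) * sd / difference) ** 2
    return ceil(n)  # round up: participants come in whole numbers


def standard_error_of_mean(sd, n):
    """Standard error of a sample mean; it falls as n rises."""
    return sd / sqrt(n)


if __name__ == "__main__":
    # Hypothetical planning values: SD of 1.0 and a smallest relevant difference of 0.5.
    print(n_per_group_two_means(difference=0.5, sd=1.0))
    for n in (25, 100, 400):
        print("n:", n, "standard error:", standard_error_of_mean(10.0, n))
```

Halving the smallest difference of interest roughly quadruples the required sample, which is why the effect size estimate deserves the most care.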
ESTIMATING THE SAMPLE SIZE: REPRESENTATIVENESS
- The size of a representative sample is based on the level of precision and confidence you want for your estimates
  - E.g., a 95% confidence level (alpha = 0.05) with high precision (narrow confidence interval) requires a greater sample size than a 95% confidence level with low precision (wide confidence interval)
  - E.g., a 95% confidence level (alpha = 0.05) with high precision (narrow confidence interval) requires a greater sample size than a 90% confidence level (alpha = 0.10) with high precision (narrow confidence interval)

ESTIMATING THE SAMPLE SIZE
Considerations!
- Sample size too small
  - May not yield precise, reliable findings
  - Clinically significant findings may be missed
- Sample size too big
  - Clinically insignificant findings may emerge as statistically significant due solely to the sample size
  - Waste of resources
  - Unethical
- The sample size you need vs. what is at hand (or your timeline): avoid spending time / resources on a project that may yield very little
- Planning ahead for subgroup analyses

STATISTICAL ANALYSIS PLAN INFORMED BY YOUR RESEARCH QUESTION!!!
- The research question is a general statement of purpose that identifies the focus of the study
  - Are you describing a set of characteristics?
  - Are you evaluating the degree of correlation between 2 measures?
  - Are you comparing measure(s) between 2 or more groups?
- Goes hand in hand with operational (i.e., measurable) definitions of variables of interest and choice of study measures
  - E.g., if you plan to compare means or compare proportions, you need to obtain appropriate data
  - E.g., a cross-sectional study or a repeated measures design: each requires a different statistical approach

STATISTICS: MAJOR TYPES
Descriptive vs. inferential statistics
- Descriptive statistics
  - Describe or summarize data and describe patterns of variability
  - Provide an overview of the attributes of a data set
  - Include:
    - summary statistics (e.g., group size, proportions, ratios, rates)
    - measures of central tendency (e.g., mean, mode, median)
    - measures of dispersion (e.g., range, variance, standard deviation)

QUESTIONS ANSWERED BY DESCRIPTIVE STATISTICS
- What is the mean age of children in the study sample?
- What is the age distribution of children who were vaccinated for flu in the past 5 years?
- What percentage of children were screened for second-hand smoke exposure?

STATISTICS: MAJOR TYPES
- Inferential statistics
  - A set of procedures for generalizing (or inferring) to a population of individuals based on information obtained from a limited number of individuals drawn from that population (i.e., the sample)
  - Provide a measure of how well your data support your hypothesis

APPLYING INFERENTIAL STATISTICS
- Select the test of significance (the method of inference used to support or reject claims based on sample data; think of this as your statistical test of choice)
- Decide whether the significance test will be one-tailed or two-tailed
- Select alpha, the maximum acceptable probability of concluding that an effect exists in the population when the sample result is actually due to chance (usually set at α = 0.05)
- Compute the test of significance (the actual p value) and compare it with alpha

RELATEDNESS VS. DIFFERENCES
Does your research question focus on associations (or relationships) among measures, or does it focus on differences between measures or groups?

DESCRIBE RELATIONSHIPS BETWEEN VARIABLES
Correlation
- Are procedure time and patient age correlated?
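A question such as "Are procedure time and patient age correlated?" is commonly answered with a Pearson correlation coefficient. The sketch below computes it from first principles so the arithmetic is visible; `pearson_r` is a hypothetical helper, and the ages and procedure times are invented purely for illustration. It assumes interval-scale data and a roughly linear relationship.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Invented example data: patient age (years) and procedure time (minutes).
    ages = [34, 45, 52, 61, 68, 73]
    times = [22, 25, 31, 30, 38, 41]
    print("Pearson r:", round(pearson_r(ages, times), 2))
```

The coefficient falls between -1 and +1; its sign gives the direction of the relationship and its magnitude gives the strength.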
DESCRIBE RELATIONSHIPS BETWEEN VARIABLES
Correlation
- Determines whether and to what degree a relationship exists between variables
- Quantifies the direction of the relationship (direct or inverse)
- Quantifies the strength of the relationship, expressed as a coefficient which ranges from -1 to +1
  - -1 or +1 = perfect correlation; 0 = no correlation
- Pearson correlation coefficient (Pearson r): a measure of correlation used for interval scale data; assumes that the relationship between variables is linear
- Spearman rho: a measure of correlation used for ordinal data
- CORRELATION DOES NOT IMPLY CAUSE / EFFECT
- Does not imply agreement; other measures such as Kappa are better

DESCRIBING RELATIONSHIPS BETWEEN VARIABLES
Simple and multiple linear regression
- Linear regression estimates or predicts values of a dependent variable for any value of one or more independent variables
- Dependent variable (DV; outcome) is continuous
  - Does patient age predict procedure time? Does procedure time increase at a constant rate for each additional year of patient age?
- Used for interval scale data
- Assumes that the relationship between DV and IV is linear (i.e., if the means of the dependent and independent variables were plotted against each other, they would fall on a straight line)
- Cannot imply causation
- "Simple" linear regression model: only one independent variable (IV) as predictor of the DV
- Multiple (multivariable) linear regression model: more than one IV as predictors of the DV

TYPES OF REGRESSION MODELS
Logistic regression
- Dependent variable (outcome) is categorical, usually dichotomous (e.g., above median vs. below median)
- Provides odds ratios (also 95% confidence intervals and p values)
- What is the probability that procedure time will be above the median (rather than below) when patients are older compared to younger?
- OR = 1.5: the odds of a procedure time above the median are 1.5 times higher for older patients than for younger patients
- Simple or multiple logistic regression modeling
- Does not assume a linear relationship between DV and IV

DESCRIBING RELATIONSHIPS BETWEEN VARIABLES
Correlation: nominal data
- Chi-square test of independence
  - Categorical data arranged in 2 x 2, 2 x 3, etc. contingency tables
  - Data in the cells are frequency counts
- Example: is patient gender associated with CRC screening use?

            Flu vaccine NO    Flu vaccine YES
  Female    10 (29%)          25 (71%)
  Male      18 (53%)          16 (47%)

  χ² = 4.25, p = 0.04

COMPARISONS BETWEEN GROUPS
- T-tests: compare means for 2 groups
  - Appropriate for interval data
- Independent samples t-test
  - To compare 2 groups that are mutually exclusive
  - E.g., Do mean values for HbA1c differ between intervention vs. control groups?
- Paired samples t-test
  - To compare pretest vs. posttest (repeated) measures for the same individual
  - E.g., Do mean HbA1c values at baseline differ from those post intervention?

COMPARISONS BETWEEN GROUPS
- Analysis of Variance (ANOVA): compare means for more than 2 groups
  - Appropriate for interval data
  - Avoids the need to compute multiple t-tests to compare groups
  - E.g., Do mean values for HbA1c vary significantly by age group (children < 6 years, children 6-12 years, and children > 12 years)?
- Evaluates all of the mean differences in a single hypothesis test using a single alpha level
  - This means that you may conclude that a difference exists, but the test will not tell you where that difference is

PLANNED VS. POST HOC COMPARISONS
- Planned comparisons
  - Multiple comparisons of means that are decided upon before, NOT AFTER, the study is conducted, and are hypothesis driven
  - E.g., We expect that mean values for HbA1c will vary significantly by age group and that children < 6 years will have significantly lower values than children > 12 years but not children 6-12 years.
- Post hoc comparisons
  - Multiple comparisons of means decided on while the study is being conducted or during the analysis stage; not driven by the original hypothesis
  - Can lead to spurious findings
  - Only conducted if the ANOVA test indicates a significant difference
  - P value corrected for multiple comparisons, e.g., Bonferroni adjustment

PARAMETRIC VS. NON-PARAMETRIC TESTS
- Parametric tests: t-test, ANOVA
  - Data measured on an interval scale
  - Underlying assumptions about the shape of the distribution of population data (normal), selection of participants (independent), etc.
- Non-parametric tests: chi-square tests; Mann-Whitney U (two independent samples); Wilcoxon test (paired samples); Kruskal-Wallis (3 or more samples)
  - Data can be nominal or ordinal (chi-square)
  - Interval data if parametric assumptions are violated (these tests are distribution-free) or the nature of the distribution of population data is not known

MULTIVARIABLE VS. BIVARIATE STATISTICAL METHODS
- Bivariate methods examine the effect of one variable at a time on an outcome
- Cannot control for potential effects of other measures which may be associated with an outcome

[Diagram: age, gender, health insurance, and your intervention, each linked separately to the outcome (CRC screening with colonoscopy)]

MULTIVARIABLE VS. BIVARIATE STATISTICAL METHODS
- Multivariable methods examine the simultaneous effect of multiple variables on an outcome variable
- Statistically control for / adjust for effects of other measures which may be associated with an outcome

[Diagram: independent contribution of your intervention on the outcome (CRC screening with colonoscopy), controlling for gender, age, and health insurance]

"SOMEWHERE, SOMETHING INCREDIBLE IS WAITING TO BE KNOWN"
CARL SAGAN PHD (AMERICAN ASTRONOMER, WRITER AND SCIENTIST, 1934-1996)

CONTACT INFORMATION
Catherine R. Messina PhD
Department of Family, Population & Preventive Medicine
HSC-L3, Rm 086
4-8266
catherine.messina@stonybrookmedicine.edu
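WORKED EXAMPLE: CHI-SQUARE TEST OF INDEPENDENCE
As a check on the 2 x 2 example in the chi-square slide (10 and 25 for females, 18 and 16 for males), the statistic can be recomputed by hand. The sketch below is illustrative: `chi_square_2x2` is a hypothetical helper that applies the Pearson chi-square formula without a continuity correction, and obtains the p value for 1 degree of freedom from the standard normal tail, which is an exact identity for that case.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square statistic and p value for a 2 x 2 table of counts.

    No continuity correction is applied (an assumption of this sketch).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail chi-square probability equals
    # the two-sided standard normal tail at sqrt(chi2).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    counts = [[10, 25], [18, 16]]  # rows: female, male; columns: no, yes
    chi2, p = chi_square_2x2(counts)
    print("chi-square:", round(chi2, 2), "p:", round(p, 2))
```

Rounded to two decimals, this reproduces the values reported on the slide (4.25 and 0.04).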