Statistics 101: Power, p-values and ………... publications.
Dr. Gordon S Doig,
Senior Lecturer in Intensive Care,
Northern Clinical School
gdoig@med.usyd.edu.au
www.EvidenceBased.net/talks
University of Sydney
Analysis 101: The basic tests
• t-test
• paired t-test
• Wilcoxon Rank Sum test (Mann-Whitney U test)
• Wilcoxon Signed Rank Sum test
• Kolmogorov-Smirnov (one and two sample test)
• Chi-square test
• Fisher’s Exact test
• ANOVA
• Kruskal-Wallis rank test
• repeated measures ANOVA
Why do we need statistics???
When we conduct any type of research, we can make at
least two major types of errors when we draw our
conclusions:
I) Type I error: we claim to have found an important treatment
effect when in reality there is no treatment effect.
II) Type II error: we claim that no treatment effect exists when in
reality there is an important treatment effect.
Why do we need statistics???
Some important definitions:
What is a p-value?
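P-value: The probability of observing a difference at least as large as
the one we observed if, in reality, there were no treatment effect (that
is, if chance alone were at work).
What is power?
Power: The probability that if there is a real difference (of a
specified size), our experiment will find it.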
Why do we need statistics???
When we conduct any type of research, we can make at
least two major types of errors when we draw our
conclusions:
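I) Type I error: we claim to have found an important treatment
effect when in reality there is no treatment effect.
P-value: The probability of observing a difference at least as large as
the one we observed if there were no treatment effect. Declaring a result
significant only when p < 0.05 limits the chance of a Type I error to 5%
when there is truly no effect.
II) Type II error: we claim that no treatment effect exists
when in reality there is an important treatment
effect.
Power: The probability that if there is a real
difference, our experiment will find it (power = 1 - the probability of
a Type II error).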
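To make the definition of power concrete, here is a minimal Python sketch (not part of the talk) that approximates the power of a two-sided comparison of two group means. It assumes a Normal approximation, a known common standard deviation and equal group sizes; the function name power_two_means and the example numbers are hypothetical.

```python
from statistics import NormalDist


def power_two_means(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Assumes a Normal approximation, a known common SD and equal group sizes.
    delta is the true difference in means that we hope to detect.
    """
    z = NormalDist()                          # standard Normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)         # about 1.96 when alpha = 0.05
    se = sd * (2 / n_per_group) ** 0.5        # standard error of the difference
    shift = abs(delta) / se                   # mean of the test statistic if the effect is real
    # Probability that the test statistic lands beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical scenario: detect a 5-unit difference when the SD is 10 units
for n in (16, 32, 64, 128):
    print(n, round(power_two_means(delta=5, sd=10, n_per_group=n), 2))
```

With these assumed numbers, 64 subjects per group give roughly 80% power, while 32 per group give only about 52%, so the smaller study would miss a real 5-unit difference almost half the time (a Type II error). When delta is 0 the function returns alpha, which is the Type I error rate.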
Sample size calculations: The use of Power
Every experiment should start with a sample size calculation.
• Having adequate power protects us from Type II errors.
• Forces us to consider a primary outcome for our
experiment.
• primary outcomes can be continuous, categorical
(interval, ordered, unordered), dichotomous
• Should consider issues of design in order to simplify
analysis.
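Turning the power calculation around gives the sample size. A minimal sketch, assuming a continuous primary outcome, a Normal approximation, equal allocation and a two-sided test; the function name and inputs are hypothetical, and software that uses the t distribution will usually return a slightly larger number.

```python
import math
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Subjects needed per group to detect a difference in means of `delta`.

    Normal-approximation formula:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) * sd / delta) ** 2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # protects against Type I error
    z_beta = z.inv_cdf(power)            # protects against Type II error
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)                  # round up to whole subjects


# Hypothetical inputs: smallest important difference 5 units, SD 10 units
print(n_per_group_means(delta=5, sd=10))              # 80% power
print(n_per_group_means(delta=5, sd=10, power=0.90))  # 90% power
```

Because delta enters the formula squared, halving the difference we want to detect roughly quadruples the required sample size.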
Analysis 101: The use of P???
Selection of appropriate study design / analytic technique:
• protects from Type I errors.
• is driven by a combination of study outcome and
study design.
Analysis 101: Basics of experimental design
1) Before and after trial
• physiological parameter/outcome measured
• intervention delivered
• physiological parameter/outcome measured again
• compare measurement before with measurement after,
usually in same subject
2) Comparison between two groups
• subjects are randomly assigned to one of two groups
• one group receives intervention
• compare outcome between two groups after intervention
3) Comparison between more than two groups
• as above but subjects are assigned to more than two groups
• could compare 3 different drugs or 3 different doses
Analysis 101: Outcome identification
Primary outcomes can be 1) continuous, 2) categorical (interval,
ordered, unordered), 3) dichotomous
1) Continuous outcomes:
• most physiological parameters (Hb, pressures, biochemistry)
• usually involves a direct measurement
• often Normally distributed
Analysis 101: Outcome identification
2) Categorical outcomes:
a) interval
• equal unit change between each ordered category
• length of stay, age, time to event, some scoring systems
• may be Normally distributed
b) ordered
• unequal unit change between each ordered category
• most scoring systems, tumor stage or grade, low-medium-high
• not usually Normally distributed
c) unordered
• no sequential order to categories
• type of tumor, location, diagnosis
• re-think outcome selection!!!!
Analysis 101: Outcome identification
3) Dichotomous outcomes:
• only two possible outcome states
• tumor / no tumor
• dead / alive
• follows a Binomial distribution
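For a dichotomous primary outcome, the sample size calculation works with the two expected event rates instead of a mean and SD. A sketch using the common Normal-approximation formula for comparing two proportions, without continuity correction; the function name and the example rates are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_props(p1, p2, alpha=0.05, power=0.80):
    """Subjects per group to detect event rates p1 vs p2 with a two-sided test.

    Normal approximation, no continuity correction.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2                                 # common rate if there is no effect
    null_sd = math.sqrt(2 * p_bar * (1 - p_bar))          # spread under "no effect"
    alt_sd = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))     # spread under the real effect
    n = ((z_alpha * null_sd + z_beta * alt_sd) / (p1 - p2)) ** 2
    return math.ceil(n)


# Hypothetical rates: 50% mortality in controls vs 30% with treatment
print(n_per_group_props(0.50, 0.30))
```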
Analysis 101: Design and Analysis
1) Before and after trial (same subjects, continuous and interval)
Step 1: Determine if outcome is Normally distributed
• plot histogram with density function line
• could ‘formally’ test using the Shapiro-Wilk statistic
[Figure: histograms with density function lines for hina and hicreat]
Analysis 101: Design and Analysis
1) Before and after trial (same subjects, continuous and interval)
[Figure: histograms with density function lines for hina and hicreat]
• hina: paired t-test
• hicreat: Wilcoxon Signed Rank Sum Test
NB - if ordered categorical outcome, use one sample Kolmogorov-Smirnov test
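The two before/after tests on this slide (paired t-test and Wilcoxon Signed Rank Sum test) can be sketched with the Python standard library alone. This is an illustration, not the talk's method: the t-distribution p-value uses a textbook continued-fraction routine for the incomplete beta function, the signed-rank p-value is exact by enumerating all 2^n sign patterns (so it assumes a small sample), and all function names and data are hypothetical. In practice a stats package would do this.

```python
import math
from itertools import product
from statistics import mean, stdev

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            step = d * c
            h *= step
        if abs(step - 1.0) < eps:  # converged after the odd step
            break
    return h

def t_two_sided_p(t, df):
    """Two-sided p-value of a t statistic: I_x(df/2, 1/2) with x = df / (df + t^2)."""
    x = df / (df + t * t)
    if x >= 1.0:
        return 1.0  # t == 0
    a, b = df / 2.0, 0.5
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def paired_t_test(before, after):
    """Paired t-test on within-subject differences (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1, t_two_sided_p(t, n - 1)

def _average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def signed_rank_exact_p(before, after):
    """Exact two-sided Wilcoxon Signed Rank test (small samples only)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # zero differences dropped
    ranks = _average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2  # expected W+ if there is no treatment effect
    observed = abs(w_plus - centre)
    # With no effect, each rank is equally likely to be + or -: enumerate all 2^n patterns
    hits = total = 0
    for signs in product((False, True), repeat=len(ranks)):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        total += 1
        hits += abs(w - centre) >= observed - 1e-9
    return w_plus, hits / total

# Hypothetical before/after measurements on the same 8 subjects
before = [140, 138, 145, 150, 136, 142, 148, 139]
after = [135, 137, 139, 146, 135, 140, 141, 138]
print(paired_t_test(before, after))
print(signed_rank_exact_p(before, after))
```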
Analysis 101: Design and Analysis
2) Comparison between two groups (continuous and interval)
Step 1: Determine if outcome is Normally distributed
• plot histogram (use all data) with density function line
• could ‘formally’ test using the Shapiro-Wilk statistic
[Figure: histograms with density function lines for apache2 and hicreat]
Analysis 101: Design and Analysis
2) Comparison between two groups (continuous and interval)
[Figure: histograms with density function lines for apache2 and hicreat]
• apache2: t-test
• hicreat: Wilcoxon Rank Sum test
NB - if ordered categorical outcome, use two sample Kolmogorov-Smirnov test
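For two independent groups with a non-Normal outcome, the Wilcoxon Rank Sum (Mann-Whitney U) test replaces the values by their ranks in the pooled data. A minimal sketch using the large-sample Normal approximation; ties get average ranks, but the variance is not tie-corrected and no continuity correction is applied (both simplifications). The function names and data are hypothetical.

```python
from statistics import NormalDist


def _average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon Rank Sum / Mann-Whitney U test (Normal approximation)."""
    ranks = _average_ranks(list(x) + list(y))   # rank the pooled data
    n1, n2 = len(x), len(y)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2     # U statistic for group x
    mu = n1 * n2 / 2                            # expected U if there is no difference
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value
    return u, z, p


# Hypothetical skewed outcome values in two independent groups
group_a = [72, 85, 90, 110, 130, 150, 210, 480]
group_b = [95, 120, 160, 175, 240, 310, 520, 760]
print(rank_sum_test(group_a, group_b))
```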
Analysis 101: Design and Analysis
3) Comparison between more than two groups
Step 1: Determine if outcome is Normally distributed
• plot histogram (use all data) with density function line
• could ‘formally’ test using the Shapiro-Wilk statistic
[Figure: histograms with density function lines for hina and hicreat]
Analysis 101: Design and Analysis
3) Comparison between more than two groups
[Figure: histograms with density function lines for hina and hicreat]
• hina: ANOVA
• hicreat: Kruskal-Wallis rank test
NB - could transform each outcome value (calculate its log or ln) and
redo the histogram. If the transformed values are Normally distributed,
we can now use the ‘more powerful’ ANOVA (or t-test if 2 samples).
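The Kruskal-Wallis rank test named on this slide extends the rank-sum idea to more than two groups. A standard-library sketch, assuming no tie correction and the usual chi-square approximation with (number of groups - 1) degrees of freedom; the chi-square tail probability is computed from the series for the incomplete gamma function. Function names and data are hypothetical.

```python
import math


def _average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability, 1 - P(df/2, x/2), via the gamma series."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total and n < 10_000:
        term *= y / (a + n)
        total += term
        n += 1
    lower = math.exp(-y + a * math.log(y) - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*groups):
    """Kruskal-Wallis H statistic (no tie correction) and its chi-square p-value."""
    pooled = [v for g in groups for v in g]
    ranks = _average_ranks(pooled)
    n_total = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])   # rank sum for this group
        h += r * r / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical skewed outcome in three dose groups
print(kruskal_wallis([80, 95, 120, 150, 400], [110, 160, 210, 260, 650], [190, 300, 350, 520, 900]))
```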
Analysis 101: Dichotomous outcomes
1) Before and after trial
• rate before intervention compared to rate after intervention
• McNemar’s chi-square
2) Comparison between two groups
• create 2x2 table, calculate rate for each Group

             Dead   Alive   Mortality
  Group A      2      8        20%
  Group B      7      3        70%

• compare using chi-square test
NB - if any cell has an expected count < 5, use Fisher’s Exact test
3) Comparison between more than two groups
• undertake a series of comparisons via 2x2 tables as above
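The dichotomous-outcome tests on this slide can be sketched in a few lines of standard-library Python. The sketch assumes the Pearson chi-square without continuity correction (with 1 degree of freedom, chi-square is a squared standard Normal, so its p-value comes from NormalDist), a two-sided Fisher's Exact test that sums every table with the same margins that is no more likely than the observed one, and the uncorrected McNemar statistic. The function names are hypothetical; the 2x2 table is the one on this slide.

```python
import math
from statistics import NormalDist


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))


def min_expected_count(a, b, c, d):
    """Smallest expected count if there is no difference: row total * column total / n."""
    n = a + b + c + d
    return min(r * k / n for r in (a + b, c + d) for k in (a + c, b + d))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's Exact test using exact integer hypergeometric weights."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):  # number of ways to get top-left cell x with these margins
        return math.comb(col1, x) * math.comb(n - col1, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / math.comb(n, row1)


def mcnemar(b, c):
    """McNemar's chi-square for paired before/after data.

    b and c are the discordant pairs (event only before, event only after).
    """
    chi2 = (b - c) ** 2 / (b + c)
    return chi2, 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))


# Slide example: rows Group A and Group B, columns Dead and Alive
table = (2, 8, 7, 3)
print(min_expected_count(*table))
print(chi_square_2x2(*table))
print(fisher_exact_2x2(*table))
```

For this table the smallest expected count is 4.5 (and two observed cells are below 5), so Fisher's Exact test is the appropriate choice. Here it gives p ≈ 0.070, whereas the chi-square approximation gives p ≈ 0.025, which shows how much the choice can matter in small samples.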
Analysis 101: Special considerations
Transformations:
[Figure: histogram of hicreat with density function line]
Sometimes it’s possible to ‘transform’ a long-tailed distribution into a
Normal distribution.
Calculate the log or ln of each outcome value and redo the histogram.
This allows us to apply the ‘more powerful’ tests based on the assumption
of Normality (paired t-test, t-test, ANOVA).
Try non-parametric test first <- fewer assumptions!!!!
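Alongside redoing the histogram, a single number such as the sample skewness (0 for a symmetric distribution, positive for a long right tail) can show whether the log transform helped. A minimal sketch, not from the talk; the function names and the example values are hypothetical.

```python
import math


def skewness(values):
    """Sample skewness g1 = m3 / m2 ** 1.5 (0 for a symmetric distribution)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Natural-log transform; only defined for strictly positive outcomes."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive values")
    return [math.log(v) for v in values]


# Hypothetical long-tailed (right-skewed) outcome values
raw = [60, 70, 75, 80, 85, 90, 100, 120, 150, 220, 400, 750]
print(round(skewness(raw), 2), round(skewness(log_transform(raw)), 2))
```

Skewness near 0 after the transform suggests the Normal-theory tests may be reasonable, but it complements the histogram rather than replacing it.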
Analysis 101: Special considerations
The t-test has 3 basic, fundamental underlying assumptions:
1) Outcomes are Normally distributed
• test assumptions of Normality
• if they are not, use non-parametric tests
2) Outcomes are independent
• if outcomes are from same subjects, use paired t-test
3) The variance of each group is similar
• stats package should formally test equality of variances
• packages usually report separate p-values for equal and unequal variances
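For assumption 3, packages typically report two versions of the t statistic: the pooled (equal-variance) version and Welch's unequal-variance version, which uses Welch-Satterthwaite degrees of freedom. A standard-library sketch of both, with a simple variance ratio as a rough first look (not the formal test the slide recommends). The function names and data are hypothetical; p-values would then come from the t distribution with the stated degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Student's t statistic assuming equal variances (pooled SD), with df."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x, y):
    """Welch's t statistic for unequal variances, with Welch-Satterthwaite df."""
    v1, v2 = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
    return t, df


def variance_ratio(x, y):
    """Larger sample variance divided by the smaller one (rough screen only)."""
    a, b = variance(x), variance(y)
    return max(a, b) / min(a, b)


# Hypothetical groups with different spreads and sizes
small_spread = [98, 101, 99, 102, 100, 97, 103, 100, 99, 101]
large_spread = [85, 120, 95, 130, 110]
print(pooled_t(small_spread, large_spread))
print(welch_t(small_spread, large_spread))
print(variance_ratio(small_spread, large_spread))
```

When the two groups are the same size, the pooled and Welch t statistics are identical and only the degrees of freedom differ. The versions diverge most when both the variances and the group sizes are unequal.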
Analysis 101: Summary
• t-test (two groups, Normally distributed)
• paired t-test (before/after, Normally distributed)
• Wilcoxon Rank Sum test (two groups, non-parametric)
• Wilcoxon Signed Rank Sum test (before/after, non-parametric)
• Kolmogorov-Smirnov (before/after, two groups, ordered categorical)
• Chi-square test (dichotomous outcome)
• Fisher’s Exact test (dichotomous outcome, any expected cell count < 5)
• ANOVA (more than two groups, Normally distributed)
• Kruskal-Wallis rank test (more than two groups, non-parametric)
• repeated measures ANOVA
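The summary can be read as a small decision table over study design and outcome type. A Python sketch of that reading, assuming hypothetical string labels for designs and outcomes; repeated measures ANOVA is left out because the talk does not spell out when to use it.

```python
def choose_test(design, outcome, normal=False, small_expected_count=False):
    """Pick a test following the summary slide (labels are hypothetical).

    design:  "before_after", "two_groups" or "more_than_two_groups"
    outcome: "continuous" (incl. interval), "ordered", "unordered" or "dichotomous"
    normal:  True if the histogram (or Shapiro-Wilk statistic) suggests Normality
    small_expected_count: True if any expected 2x2 cell count is below 5
    """
    if outcome == "unordered":
        raise ValueError("re-think outcome selection")
    if outcome == "dichotomous":
        if design == "before_after":
            return "McNemar's chi-square"
        # For more than two groups, apply this to a series of 2x2 tables
        return "Fisher's Exact test" if small_expected_count else "Chi-square test"
    if outcome == "ordered":
        ks = {"before_after": "one sample", "two_groups": "two sample"}
        if design not in ks:
            raise ValueError("no ordered-categorical test given for this design")
        return f"Kolmogorov-Smirnov ({ks[design]} test)"
    if outcome != "continuous":
        raise ValueError(f"unknown outcome type: {outcome!r}")
    parametric, non_parametric = {
        "before_after": ("paired t-test", "Wilcoxon Signed Rank Sum test"),
        "two_groups": ("t-test", "Wilcoxon Rank Sum test"),
        "more_than_two_groups": ("ANOVA", "Kruskal-Wallis rank test"),
    }[design]
    return parametric if normal else non_parametric


for case in [("two_groups", "continuous", True),
             ("before_after", "continuous", False),
             ("more_than_two_groups", "dichotomous", False)]:
    print(case, "->", choose_test(*case))
```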