Statistical analysis - cause, probability, and effect
I) What counts as a difference? - this is a statistical/philosophical question of two parts
   A) How shall we compare measurements? - statistical methodology
   B) What will count as a significant difference? - a philosophical question, and as such subject to convention

[Sidebar - Specific hypothesis HA: the number of mussel settlers is greater in areas inside the adult distribution than outside it]
II) Statistical Methodology - General
   A) Null hypotheses - HO
      1) In most sciences, we are faced with:
         a) NOT whether something is true or false (a Popperian decision)
         b) BUT rather the degree to which an effect exists (if at all) - a statistical decision
      2) Therefore 2 competing statistical hypotheses are posed:
         a) HA: there is a difference in effect between ... (usually posed as < or >)
         b) HO: there is no difference in effect between ...
            If HA is rejected, HO is supported (accepted); if HA is supported (accepted), HO is rejected

[Flowchart - Conception (inductive reasoning): Perceived Problem, Belief, Previous Observations, Existing Theory → INSIGHT → General hypothesis → Specific hypotheses (and predictions) → Comparison with new observations → Confirmation or Falsification → Assessment (deductive reasoning)]
Statistical tests: a set of rules whereby a decision about hypotheses is reached (either reject or do not reject)
1) Associated with the rules is some indication of the accuracy of the decisions -- that measure is a probability statement -- the p-value
2) Statistical hypotheses:
   a) Critical p-values indicate what counts as an acceptable level of uncertainty - set by convention, usually 0.05
   b) Example: if the critical p-value = 0.05, this means that we are unwilling to reject the null hypothesis (i.e., no effect) unless:
      1) we are 95% sure that HA is correct, or equivalently
      2) we are willing to accept an error rate of 5% or less of being wrong when we reject HO
We sample mussel settlement and produce two frequency distributions.
[Figure - Two frequency distributions of number of settlers (inside vs. outside the adult distribution; y-axis: frequency of observations; means X̄1 and X̄2). Are these different?]
The logic of statistical tests - how they are performed
1) Assume the null hypothesis is true
   (i.e., if we don't detect a difference, then the null HO of no difference between distributions - inside vs. outside - is true)
2) Compare measurements
   Generally this means calculating the mean difference between two sample distributions (e.g., the numbers from the experiment or set of observations), summarized as a test statistic (e.g., t or F).
3) Determine the probability that the difference between our two distributions is due to chance alone, rather than the hypothesized effect (translate the test statistic to a calculated p-value)
4) Compare the calculated p-value with a critical p-value to assign significance (i.e., whether to accept or reject the null hypothesis)
Are they different? How do you compare distributions (X̄inside vs. X̄outside)?
a) Generally assume (but test) that the distributions are the same shape (usually normal - "bell" - "parametric")
b) Compare means, taking into account the confidence you have in your estimate of the means...
   By estimating the error (variation) associated with your estimate:

   Standard Error of the Mean: SEM = sd / √n

   where the standard deviation sd = √( Σ(Xi − X̄)² / (n − 1) )
Then we calculate a standardized difference between the means of the two distributions (again, taking into account their error):

   t = (X̄inside − X̄outside) / √( 2s² / n )      where s² is the variance, = (sd)²

And ask: "what is the likelihood that the calculated difference (given our error) is due to random chance (or, alternatively, reflects a real difference between the two sample distributions)?"

The probability that the estimated difference is due to random chance alone (i.e., no real difference; H0 is true) is the calculated p-value, derived from a table or stats package.
   If Pcalc > Pcrit (0.05), then do not reject H0
   If Pcalc < Pcrit (0.05), then reject H0
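The sd, SEM, and standardized-difference formulas above can be sketched in a few lines of Python. The settlement counts here are invented for illustration (they are not the lecture's data), and the equal-n form of the t statistic from the slide is used:

```python
import math
from statistics import mean, stdev  # stdev uses the n - 1 denominator, as above

# Hypothetical settlement counts per cell (illustrative values only)
inside = [14, 11, 13, 12, 15, 14, 10, 13]
outside = [9, 8, 11, 7, 10, 9, 8, 10]

n = len(inside)  # equal sample sizes, as the slide's formula assumes
sd_in, sd_out = stdev(inside), stdev(outside)

# Standard Error of the Mean: sd / sqrt(n)
sem_in = sd_in / math.sqrt(n)

# Standardized difference between the means, t = (X̄in − X̄out) / sqrt(2 s² / n),
# with s² taken as the average of the two sample variances (the pooled
# variance when sample sizes are equal)
s2 = (sd_in ** 2 + sd_out ** 2) / 2
t = (mean(inside) - mean(outside)) / math.sqrt(2 * s2 / n)
print(round(t, 2))  # → 5.0
```

The resulting t is then translated to a p-value via a t table or a stats package, exactly as in the rules above.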
Example
HA = More mussel settlers inside adult distribution
HO = No difference in mussel settlement inside vs. outside adult distribution

   Calculated p-value   Critical p-value (α)   Decision
   0.90                 0.05                   Do not reject HO
   0.50                 0.05                   Do not reject HO
   0.051                0.05                   Do not reject HO
   0.049                0.05                   Reject HO, accept HA

But we still don't know the probability of concluding that mussel settlement is greater inside the adult distribution when in fact it truly is greater inside the distribution.
Types of statistical error - Type I and II

1) By convention, the hypothesis tested is the null hypothesis (no difference between sample distributions)
   a) In statistics, the assumption is made that that hypothesis is true (assume HO true = assume HA false)
   b) Accepting HO (saying it is likely to be true) is the same as rejecting HA (falsification)
   c) The scientific method is to falsify competing alternative hypotheses (alternative HA's)
2) Errors in decision making:

                        Truth
   Decision      HO true            HO false
   Accept HO     no error (1-α)     Type II error (β)
   Reject HO     Type I error (α)   no error (1-β)

Type I error - the probability α that we mistakenly reject a true null hypothesis (HO)
Type II error - the probability β that we mistakenly fail to reject (accept) a false null hypothesis
Power of Test - the probability (1-β) of not committing a Type II error. The more powerful the test, the more likely you are to correctly conclude that an effect exists when it really does (reject HO when HO false = accept HA when HA true).
Simply:
   Type I error  = α    = Reject HO when HO true  (= accept HA when HA false)
   Type II error = β    = Accept HO when HO false (= reject HA when HA true)
   Power of Test = 1-β  = Reject HO when HO false (= accept HA when HA true)

Control of Error
A) Type I error can be directly controlled: the acceptable level is set by the experimenter = critical p-value (often 0.05)
B) Type II error is indirectly controlled by the design of the experiment (more about this later). It depends on:
   1. Level of critical p-value (usually 0.05)
   2. Magnitude of difference ("effect") between means
   3. Amount of natural variation in data
   4. Sample size (level of replication)
Statistical Power, effect size, replication and alpha

Calculation of statistical distributions
[Figure - Simulated distribution of mussel settlers inside the adult distribution: a 10 x 10 grid of 100 sample cells; mean per sample = 25; total settlers = 2500]
Evaluate the effect of sample size on the calculation of the mean:
- Compare sample sizes of 5, 10, 20, 50, and 99 cells
- Iterate each sample size 50 times

Example: sample 10 sites to determine the mean (x fifty iterations)
[Figure - Two example samples of 10 cells drawn from the grid yield estimated means X̄1 = 21.5 and X̄2 = 25.8; the true (simulated) mean = 25]
Effect of number of observations on the estimate of the mean
[Figure - Frequency distributions of sample means (estimates of the mean, ~15-35) for means based on 5, 10, 20, 50, and 99 observations]

Notice how the estimated mean approaches the true mean with larger sample size! ("Central Limit Theorem")
Notice how variation around the estimated mean increases with lower sample size!
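The pattern in the figure is easy to reproduce. The sketch below draws repeated samples of 5 and of 50 observations from a population with true mean 25 (a normal population is assumed here; the grid's actual distribution is not specified) and compares the spread of the resulting sample means:

```python
import random
from statistics import mean, stdev

random.seed(2)

def sample_means(sample_size, iterations=200):
    """Repeatedly sample the population and return the estimated means.
    Population: normal with true mean 25, sd 8 (assumed values)."""
    return [mean(random.gauss(25, 8) for _ in range(sample_size))
            for _ in range(iterations)]

small = sample_means(5)
large = sample_means(50)

# Spread of the estimated means shrinks as sample size grows
print(round(stdev(small), 2), round(stdev(large), 2))
```

The standard deviation of the means from samples of 5 is several times that from samples of 50, while both sets of estimates center on the true mean.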
Statistical Power, effect size, replication and alpha
Statistical comparison of two distributions (ȳ1 vs. ȳ2)
[Figure - Frequency distribution of sample means; area under the curve = 1.00]

HO true: distributions of means are truly the same
[Figure - Two overlapping distributions of sample means (ȳ1, ȳ2) and the resulting central t distribution of t-values]

   t = (ȳ1 − ȳ2) / ( sp √(1/n1 + 1/n2) )
HO true: when the distributions of means are truly the same, the estimate of ȳ1 equals the estimate of ȳ2, so the calculated t falls near the center of the t distribution.
HO true: distributions of means are truly the same
Assign a critical p (assume = 0.05), which fixes a critical value tc on the t distribution (df).

If the distributions are truly the same, then the area to the right of the critical tc represents the Type I error rate; any calculated t > tc will cause the incorrect conclusion that the distributions are different.
HO false: distributions of means are truly different
[Figure - Two separated distributions of sample means (ȳ1, ȳ2) and the resulting distribution of t-values shifted away from 0]

   t = (ȳ1 − ȳ2) / ( sp √(1/n1 + 1/n2) )
HO false: distributions of means are truly different
When HO is false, the calculated t-values follow a non-central t distribution; the central t distribution (which assumes similarity of the distributions) is still used to assign the critical p (assume = 0.05) and hence tc.

If the distributions are truly different, then the area to the left of the critical tc represents the Type II error rate; any calculated t < tc will cause the incorrect conclusion that the distributions are the same.
How to control Type II error (when distributions are truly different)
This will maximize statistical power to detect real impacts.
1) Vary critical p-values
2) Vary magnitude of effect
3) Vary replication

1) Vary critical p-values (change the Type I area):
   A) Make the critical p more stringent (smaller): Type II error increases, power decreases
   B) Relax the critical p (larger values): Type II error decreases, power increases
2) Vary the magnitude of effect (vary the distance between ȳ1 and ȳ2, which shifts the non-central t distribution):
   A) Make the distance smaller: Type II error increases, power decreases
   B) Make the distance greater: Type II error decreases, power increases
3) Vary replication (which controls estimates of error). Note that the Type I error rate is held constant while tc is allowed to vary:
   A) Decrease replication: Type II error increases, power decreases
   B) Increase replication: Type II error decreases, power increases
Relationships between power and changes in alpha, replication and magnitude of effect
[Figure - Three panels: power increases from low to high as replication increases, as the critical alpha is relaxed (made larger), and as the magnitude of effect increases]
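All three relationships can be demonstrated with a single Monte Carlo sketch: simulate experiments in which the effect is real and count how often HO is correctly rejected. The population values and the fixed critical t (~2.1, rather than one looked up per df) are simplifying assumptions:

```python
import math
import random
from statistics import mean, stdev

random.seed(3)

def pooled_t(a, b):
    """Pooled two-sample t statistic for equal sample sizes."""
    n = len(a)
    s2 = (stdev(a) ** 2 + stdev(b) ** 2) / 2
    return (mean(a) - mean(b)) / math.sqrt(2 * s2 / n)

def power(n, effect, sd=5.0, t_crit=2.1, trials=1000):
    """Approximate power: fraction of simulated experiments that reject
    HO when the effect is real (t_crit held near 2.1 for simplicity)."""
    hits = 0
    for _ in range(trials):
        a = [random.gauss(25, sd) for _ in range(n)]
        b = [random.gauss(25 + effect, sd) for _ in range(n)]
        if abs(pooled_t(a, b)) > t_crit:
            hits += 1
    return hits / trials

print(power(5, 5), power(20, 5))   # power rises with replication
print(power(20, 2), power(20, 8))  # and with magnitude of effect
```

Raising `t_crit` (a more stringent critical p) lowers every one of these power figures, matching the alpha panel as well.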
How to estimate optimal sample size
1) Do a preliminary study of the variables that will be evaluated in the project
2) Plot the mean and some estimate of the variability of the data as a function of sample size
3) Look for the sample size where estimates of the mean and variance (or standard deviation) converge on a stable value
4) Calculate a "bang for buck" relationship to determine whether a robust design (sufficient sampling) can be paid for
[Figure - Mean and standard deviation (values ~0-30) plotted against number of samples (0-100), converging on stable values as sample size increases]
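Steps 2 and 3 amount to computing running estimates as pilot samples accumulate. A minimal sketch, using a simulated pilot dataset (the true mean 20 and sd 6 are invented for illustration):

```python
import random
from statistics import mean, stdev

random.seed(4)

# Simulated preliminary study: 100 pilot observations
pilot = [random.gauss(20, 6) for _ in range(100)]

# Running estimates of the mean and sd as a function of sample size;
# choose the smallest n beyond which the estimates stop drifting
for n in (5, 10, 20, 50, 100):
    subset = pilot[:n]
    print(n, round(mean(subset), 1), round(stdev(subset), 1))
```

In practice the printed estimates wobble at small n and settle near the true values as n grows; the point where they stabilize is the candidate sample size to weigh against cost.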