Fundamentals of Research Project Planning: Hypotheses

Basic Elements of Testing Hypotheses
Dr. M. H. Rahbar
Professor of Biostatistics
Department of Epidemiology
Director, Data Coordinating Center
College of Human Medicine
Michigan State University
Inferential Statistics
• Estimation: This includes point and
interval estimation of certain
characteristics in the population(s).
• Testing hypotheses about population
parameter(s) based on the
information contained in the sample(s).
Important Statistical Terms
• Population: A set which includes all
measurements of interest to the researcher
• Sample: Any subset of the population
• Parameter of interest: The characteristic
of interest to the researcher in the
population is called a parameter.
Estimation of Parameters
• Point estimation
• Interval estimation (confidence intervals)
• Bound on the error of estimation
• The width of a confidence interval is directly related to the bound on the error: for an interval of the form estimate ± bound, the width is twice the bound.
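To make the link between the bound and the interval concrete, here is a minimal Python sketch for a single proportion. It assumes the usual large-sample (Wald) interval, estimate ± z·√(p̂(1 − p̂)/n); the function name `proportion_ci` is hypothetical, and the counts (28 of 100) are borrowed from the hypertension example in these slides.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Return (point estimate, bound on error, lower, upper) for a proportion.

    Assumes the large-sample (Wald) interval p_hat +/- z * sqrt(p_hat(1 - p_hat)/n).
    """
    p_hat = successes / n                                # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    bound = z * sqrt(p_hat * (1 - p_hat) / n)            # bound on the error of estimation
    return p_hat, bound, p_hat - bound, p_hat + bound


p_hat, bound, lower, upper = proportion_ci(28, 100)
print(f"p_hat={p_hat:.2f}, bound={bound:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
# The interval width (upper - lower) equals 2 * bound.
```

Calling `proportion_ci(28, 100, 0.99)` returns a larger bound than the 95% call, which is the confidence-level effect described in the next slide.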
Factors influencing the Bound on
the error of estimation
• Narrow confidence intervals are preferred
• As the sample size increases the bound on
the error of estimation decreases.
• As the confidence level increases the bound
on the error of estimation increases.
• You need to plan the sample size to achieve
the desired bound on the error and confidence level.
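A sketch of the sample-size planning step for a proportion, assuming the same large-sample bound z·√(p(1 − p)/n) and solving it for n. The function `sample_size_for_bound` and the planning value `p_guess` are illustrative assumptions; p_guess = 0.5 is the conservative choice when nothing is known about p, because p(1 − p) is largest there.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_bound(p_guess: float, bound: float, confidence: float = 0.95) -> int:
    """Smallest n whose large-sample bound z*sqrt(p(1-p)/n) is at most `bound`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    return ceil(z ** 2 * p_guess * (1 - p_guess) / bound ** 2)


for conf in (0.90, 0.95, 0.99):
    print(conf, sample_size_for_bound(0.5, 0.05, conf))
```

For example, `sample_size_for_bound(0.5, 0.05)` gives 385; raising the confidence level or shrinking the bound increases the required n, matching the bullets above.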
Testing hypothesis about
population parameters
• Odds ratio (OR) or relative risk (RR)
• Mean = μ
• Standard deviation = σ
• Difference between two population means
• Proportion = p
• Difference between two population proportions
• Incidence
Testing Hypothesis about a
Population Prevalence “p”
Suppose the Government reports that the prevalence of hypertension among adults in Pakistan is at most 0.20, but you as a researcher believe that the prevalence is greater than 0.20. We now want to test these hypotheses formally.
Null Hypothesis H0: p ≤ 0.20
vs
Alternative Hypothesis Ha: p > 0.20
A sample of n = 100 adults is selected from Pakistan. In this sample, 28 adults are hypertensive. Do the data provide sufficient evidence that the Government's figure is wrong, i.e., that p > 0.20? Test at the 5% level of significance, that is, α = 0.05.
Estimated prevalence = p̂ = 0.28
Hypothesized prevalence = 0.20
Question: Is the gap of 0.08 = 0.28 − 0.20 statistically significant at the 5% level?
Testing hypothesis about p
• We need to calculate a test statistic.
• How many standard deviations does the sample estimate deviate from 0.20 if the null hypothesis p = 0.20 is true?
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.28 - 0.20}{\sqrt{(0.20)(1-0.20)/100}} = \frac{0.08}{0.04} = 2.0$$
What is the probability of observing Z = 2.0 or a more extreme value if the Government's figure were correct?
P-value = P[Z ≥ 2.0] ≈ 0.023
How does this p-value compare with α = 0.05?
If p-value < α, then reject the null hypothesis H0 in favor of the alternative hypothesis Ha.
Since 0.023 < 0.05, we reject the Government's claim in favor of the alternative hypothesis.
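The calculation above can be reproduced with a short Python sketch, assuming the large-sample normal approximation used in the slides. `one_sample_proportion_z` is a hypothetical helper; `NormalDist` comes from the standard library's `statistics` module (Python 3.8+).

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(successes: int, n: int, p0: float):
    """Large-sample z-test of H0: p <= p0 vs Ha: p > p0 (upper-tailed)."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)        # standard error computed under H0
    z = (p_hat - p0) / se0
    p_value = 1 - NormalDist().cdf(z)    # P[Z >= z] for a standard normal Z
    return z, p_value


z, p_value = one_sample_proportion_z(28, 100, 0.20)
alpha = 0.05
print(f"Z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

With 28 of 100 hypertensive adults this gives Z ≈ 2.0 and an upper-tail p-value of about 0.023, below α = 0.05.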
Elements of Testing hypothesis
• Null hypothesis
• Alternative hypothesis
• Level of significance
• Test statistic
• P-value
• Conclusion
• Power of the test
Is there an association between
Drinking and Lung Cancer?
What is the most appropriate and
feasible study design in order to
test the above research hypothesis?
Case-Control Study of Drinking and Lung Cancer
Null Hypothesis: There is no association between drinking and lung cancer, P1 = P2.
Alternative Hypothesis: There is some kind of association between drinking and lung cancer, P1 ≠ P2.
In the following contingency table, estimate the proportion and the odds of drinkers among those who developed lung cancer (cases) and among those without the disease (controls).

                 Lung Cancer
Drinker      Case        Control     Total
Yes          A = 33      B = 27      60
No           C = 1667    D = 2273    3940
Total        1700        2300        4000

P1 = 33/1700 ≈ 0.0194        P2 = 27/2300 ≈ 0.0117
Odds1 = 33/1667 ≈ 0.0198     Odds2 = 27/2273 ≈ 0.0119
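A small Python sketch of the same arithmetic, using the cell labels A to D from the table. The odds ratio on the last line is an addition (the slides list OR among the parameters of interest but do not compute it here); the variable names are illustrative.

```python
# Cell counts from the case-control table
a, b = 33, 27        # drinkers: cases (A), controls (B)
c, d = 1667, 2273    # non-drinkers: cases (C), controls (D)

p1 = a / (a + c)     # proportion of drinkers among cases    = 33/1700
p2 = b / (b + d)     # proportion of drinkers among controls = 27/2300
odds1 = a / c        # odds of drinking among cases          = 33/1667
odds2 = b / d        # odds of drinking among controls       = 27/2273
odds_ratio = odds1 / odds2   # equivalently (a*d)/(b*c)

print(f"P1={p1:.4f}  P2={p2:.4f}  Odds1={odds1:.4f}  Odds2={odds2:.4f}  OR={odds_ratio:.2f}")
```

Here the odds ratio (33 × 2273)/(27 × 1667) is about 1.67.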
QUESTION: Is there a difference
between the proportion of drinkers among
cases and controls?
Group 1 (disease, cases): P1 = proportion of drinkers
Group 2 (no disease, controls): P2 = proportion of drinkers
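One standard way to answer this question is the pooled two-proportion z-test; the slides do not specify which test they intend, so treat this Python sketch as an assumption. `two_proportion_z` is a hypothetical helper, and the large-sample normal approximation is assumed to hold.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Pooled large-sample z-test of H0: P1 = P2 vs Ha: P1 != P2 (two-sided)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                      # common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided p-value
    return z, p_value


z, p_value = two_proportion_z(33, 1700, 27, 2300)
print(f"Z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

With these counts Z is about 1.97 and the two-sided p-value is about 0.048. The uncorrected chi-square test of independence on the same 2 × 2 table gives the same p-value, because its statistic equals Z².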
Test Statistic
• A statistical yardstick computed from the information contained in the sample under the assumption that the null hypothesis is true.
• Knowledge of the sampling distribution of the test statistic is needed to determine the probability of observing extreme values of the test statistic in a given situation.
P-value
• The probability, computed assuming the null hypothesis is true, of observing a value of the test statistic at least as extreme as the one observed in the sample.
• The p-value is also known as the observed level of significance.
The level of significance (α)
• α is known as the nominal level of significance.
• If p-value < α, then we reject the null hypothesis in favor of the alternative hypothesis.
• Most statistical packages report the p-value in their output.
• α needs to be determined before the data are analyzed (usually 5%).
Type I and Type II errors
• A type I error is committed when a true null hypothesis is rejected.
• α is the probability of committing a type I error.
• A type II error is committed when a false null hypothesis is not rejected.
• β is the probability of committing a type II error.
Decision made about the validity of the null hypothesis:
Null hypothesis    Rejected         Not rejected
True               Type I error     No error
False              No error         Type II error
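Because α is the probability of a type I error, it can be checked directly for the hypertension example. This Python sketch (the function `actual_type_one_error` is hypothetical) sums exact binomial probabilities under H0: p = 0.20 over every count that the large-sample z-test would reject. Since the count is discrete, the actual rate generally differs somewhat from the nominal α.

```python
from math import comb, sqrt
from statistics import NormalDist


def actual_type_one_error(n: int, p0: float, alpha: float = 0.05) -> float:
    """Exact P(reject H0 | p = p0) for the upper-tailed large-sample z-test.

    The test rejects when Z = (x/n - p0)/sqrt(p0(1-p0)/n) exceeds z_(1-alpha);
    summing binomial probabilities over those x gives the true type I error rate.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)
    se0 = sqrt(p0 * (1 - p0) / n)
    return sum(
        comb(n, x) * p0 ** x * (1 - p0) ** (n - x)
        for x in range(n + 1)
        if (x / n - p0) / se0 > z_crit
    )


rate = actual_type_one_error(100, 0.20)
print(f"Nominal alpha = 0.05, actual type I error rate = {rate:.4f}")
```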
Power of a test
• The power of a test is the probability that
a false null hypothesis is rejected.
• Power = 1 − β, where β is the probability
of committing a type II error.
• More powerful tests are preferred. At the
design stage one should identify the
desired level of power in the given
situation.
Factors influencing the Power
• The power of a test is influenced by the magnitude of the difference between the hypothesized value and the true value of the parameter: the larger the difference, the greater the power.
• The power of a test can be improved by increasing the sample size.
• The power of a test can be improved by increasing α. (This is a very artificial way, since it also increases the probability of a type I error.)
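The three factors above can be explored with a short Python sketch of the approximate power of the upper-tailed test for a proportion. It assumes the normal approximation, and the true prevalence of 0.28 used in the example is an illustrative assumption, not a known value; `power_one_sample_proportion` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sample_proportion(p0: float, p_true: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of the upper-tailed z-test of H0: p <= p0 vs Ha: p > p0.

    H0 is rejected when p_hat > p0 + z_(1-alpha) * sqrt(p0(1-p0)/n); the power is
    the probability of that event when the true proportion is p_true.
    """
    critical_p_hat = p0 + STD_NORMAL.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    se_true = sqrt(p_true * (1 - p_true) / n)   # standard error under the true p
    return 1 - STD_NORMAL.cdf((critical_p_hat - p_true) / se_true)


for n in (50, 100, 200):
    print(n, round(power_one_sample_proportion(0.20, 0.28, n), 3))
```

Raising `n`, moving `p_true` further from `p0`, or raising `alpha` each increases the returned power.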
Minimum Required Sample Size
• A sample size formula is usually available for most well-known study designs. Software packages such as Epi-Info can also be used for sample size calculation.
• It is extremely important to consult a biostatistician at the design phase to ensure that an adequate sample size is planned for the study.
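As an illustration of such a formula, here is a Python sketch of the standard normal-approximation sample size for the upper-tailed one-sample test of a proportion, n = [z₁₋α√(p0(1 − p0)) + z₁₋β√(p1(1 − p1))]² / (p1 − p0)². It is a textbook approximation, not the method of any particular package such as Epi-Info, and `n_for_power` is a hypothetical name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_for_power(p0: float, p1: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n for an upper-tailed one-sample proportion z-test (normal approx.)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)        # quantile for the desired power 1 - beta
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


print(n_for_power(0.20, 0.28))              # 80% power
print(n_for_power(0.20, 0.28, power=0.90))  # 90% power needs more subjects
```

For p0 = 0.20, an assumed true p1 = 0.28, α = 0.05 and 80% power this gives roughly 168 adults.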
Testing hypothesis about one
population mean
• H0: μ = 16 vs Ha: μ > 16
• Z = (sample mean − hypothesized mean) / (SE of the mean), where SE of the mean = σ/√n:
$$Z = \frac{\bar{x} - 16}{\sigma/\sqrt{n}}$$
• Under the null hypothesis and when n is large (n > 30), the distribution of Z is approximately standard normal.
• P-value
• Conclusion
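A minimal Python sketch of this test. The summary statistics below (sample mean 17.0, standard deviation 4.0, n = 64) are hypothetical numbers chosen for illustration, and σ is assumed known or, for large n, replaced by the sample standard deviation; `one_sample_mean_z` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_mean_z(sample_mean: float, sd: float, n: int, mu0: float):
    """Large-sample z-test of H0: mu = mu0 vs Ha: mu > mu0 (upper-tailed)."""
    se = sd / sqrt(n)                    # standard error of the mean
    z = (sample_mean - mu0) / se
    p_value = 1 - NormalDist().cdf(z)    # upper-tail p-value
    return z, p_value


# Hypothetical summary statistics, for illustration only
z, p_value = one_sample_mean_z(sample_mean=17.0, sd=4.0, n=64, mu0=16)
print(f"Z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Do not reject H0")
```

With these made-up numbers the standard error is 4/√64 = 0.5, so Z = (17 − 16)/0.5 = 2.0 and the upper-tail p-value is about 0.023.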