Lecture 17-18 - Columbia Statistics

Review - Week 9
Read: Chapters 18-21
Review:
The purpose of a confidence interval is to estimate an unknown parameter and give an indication
of how accurate the estimate is and of how confident we are that the result is correct. Any
confidence interval consists of two parts:
(a) An interval computed from the data.
(b) A confidence level
The interval usually has the form: Estimate ± margin of error
The confidence level states the probability that the method will give the correct answer.
A level C confidence interval for a parameter is an interval computed from the data in such a way
that C% of all random samples yield intervals containing the true value of the parameter.
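The "C% of all random samples" interpretation can be checked by simulation. The sketch below is illustrative only: the true proportion, sample size, number of trials, and the function name `coverage` are all assumptions chosen for the example. It repeatedly draws samples from a population with a known p, builds the large-sample interval p̂ ± z*·sqrt(p̂(1 − p̂)/n) each time, and reports the fraction of intervals that contain p.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(true_p, n, level, trials, seed=0):
    """Fraction of simulated confidence intervals that contain true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Critical value z*: the central `level` area of N(0,1) lies within +/- z*.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials


# Hypothetical population with p = 0.3; samples of size 400.
rate = coverage(true_p=0.3, n=400, level=0.95, trials=2000)
print(f"Fraction of intervals containing p: {rate:.3f}")
```

With a 95% level the printed fraction should be close to 0.95, though it will not equal it exactly because the interval is a large-sample approximation and the simulation is finite.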
Suppose a SRS of size n is drawn from a large population with unknown proportion p of
successes. A confidence interval for p is
$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
Here z* is called the critical value and is the number of standard deviations away from the mean that
corresponds to the specified level of confidence.
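As a minimal sketch of this calculation, the Python snippet below computes the interval using only the standard library. The function name `proportion_ci` and the sample data (40 successes out of 100) are hypothetical and chosen only for illustration; the critical value z* is obtained from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, level=0.95):
    """Large-sample confidence interval for a population proportion."""
    # Critical value z*: the central `level` area of N(0,1) lies within +/- z*.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Standard error of p-hat.
    se = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin


# Hypothetical data: 40 successes in a simple random sample of 100.
low, high = proportion_ci(40 / 100, 100, level=0.95)
print(f"95% CI: ({low:.3f}, {high:.3f})")  # 95% CI: (0.304, 0.496)
```

Raising `level` increases z* and therefore widens the interval, while a larger n shrinks the standard error and narrows it.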
The sample size required to produce a confidence interval with a given margin of error m at a
given confidence level is
$$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1-\hat{p}),$$
where z* is the critical value for the confidence level you specified. The margin of error is
greatest when p̂ = 1/2, so when we want to be conservative we can use:
$$n = \left(\frac{z^*}{2m}\right)^2.$$
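The two sample-size formulas can be sketched in one small function. The name `sample_size` and the parameter `p_guess` are assumptions made for this example; setting `p_guess=0.5` reproduces the conservative formula, since p̂(1 − p̂) = 1/4 there. The result is rounded up because a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist


def sample_size(margin, level=0.95, p_guess=0.5):
    """Smallest n giving at most the requested margin of error."""
    # Critical value z* for the requested confidence level.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # n = (z*/m)^2 * p(1 - p); p_guess = 0.5 gives the conservative (z*/(2m))^2.
    n = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    return ceil(n)  # round up to the next whole observation


# Hypothetical targets: margin of error 0.03 at 95% confidence.
print(sample_size(0.03))               # conservative choice: 1068
print(sample_size(0.03, p_guess=0.2))  # with a prior guess of 0.2: 683
```

The conservative choice never gives a smaller n than a specific guess for the proportion, which is why it is the safe default when no prior information is available.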
Exercise 1: A simple random sample of size n = 182 yielded p̂ = 0.73.
(a) What is the standard error of p̂?
(b) Find a 99% confidence interval for p.
(c) Find a 95% confidence interval for p.
(d) Find a 90% confidence interval for p.
(e) How does the margin of error change as the confidence level decreases?
Exercise 2: In a clinical trial of 760 patients who received a daily dose of a certain drug, 43
reported a headache as a side effect. Construct a 90% confidence interval for the proportion of
patients receiving the drug who will experience a headache as a side effect.
Exercise 3: Assuming p is near 0.3, find the sample size required to construct a 95% confidence
interval for p with margin of error 0.01. Repeat the calculations, this time assuming p is 0.6.
Exercise 4: A politician wishes to measure her approval rating. What sample size is needed if
she wishes the estimate to be within 4 percentage points with 90% confidence if
(a) past estimates show her approval rating to be around 0.65?
(b) she has no prior information about her approval rating?
Review:
Tests of significance are used to assess the evidence provided by the data against some statement
about the population, called the null hypothesis H₀, in favor of an alternative hypothesis Hₐ.
The hypotheses are stated in terms of the population parameters. A test is based on the statistic
that estimates the parameter.
A test statistic measures compatibility between the null hypothesis and the data. The probability,
computed assuming H₀ is true, that the test statistic would take a value as or more extreme than
that actually observed is called the P-value of the test.
The smaller the P-value, the stronger the evidence against H₀ provided by the data. If the
P-value is as small as or smaller than some value α, we say that the data are statistically significant at
significance level α.
A significance test for the statement H₀: p = p₀ is based on the one-sample z statistic:
$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}},$$
with P-values calculated from the N(0,1) distribution.
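A minimal Python sketch of this test follows. The function name `one_sample_z_test`, the `alternative` labels, and the sample data (120 successes out of 200, tested against p₀ = 0.5) are hypothetical and chosen only for illustration. Note that the standard error uses p₀, not p̂, because the P-value is computed assuming H₀ is true.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(p_hat, n, p0, alternative="two-sided"):
    """One-sample z test for a proportion; returns (z, P-value)."""
    # Standard error under the null hypothesis, so it uses p0.
    se0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    std_normal = NormalDist()  # N(0,1)
    if alternative == "greater":      # H_a: p > p0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # H_a: p < p0
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # H_a: p != p0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# Hypothetical data: 120 successes in 200 trials, testing H_0: p = 0.5.
z, p_two_sided = one_sample_z_test(120 / 200, 200, 0.5)
_, p_greater = one_sample_z_test(120 / 200, 200, 0.5, alternative="greater")
print(f"z = {z:.3f}")
print(f"two-sided P-value = {p_two_sided:.4f}")
print(f"one-sided P-value = {p_greater:.4f}")
```

For the same data, the two-sided P-value is twice the one-sided P-value in the direction of the observed difference, so a result can be significant at level α for a one-sided alternative and not for the two-sided one.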
For one-sided tests, only values that differ in a specific direction from the null value count
against the null hypothesis. For two-sided tests, values that differ in either direction from the null
value count against the null hypothesis.
We can use significance tests to reject a hypothesis. But if the test does not give sufficient
evidence to reject a hypothesis, that does not mean that we accept it, only that we do not have
enough evidence to justify rejecting it.
Exercise 1: In 1998 a report showed that 42.1% of households in the US owned a personal
computer. Set up the null and alternative hypotheses for testing whether the percentage of
households that own a personal computer has
(a) changed since 1998.
(b) increased since 1998.
(c) decreased since 1998.
Exercise 2: Suppose we want to estimate the proportion of women, p, in a certain population. A
SRS of 100 people is selected from the population and we obtain p̂ = 0.65. Test H₀: p = 0.55
against Hₐ: p > 0.55.
(a) Calculate the P-value.
(b) Would you reject H₀ at the 5% level of significance?
(c) Would you reject H₀ at the 1% level of significance?
(d) Redo (a) - (c) using Hₐ: p ≠ 0.55.
Exercise 3: A poll was conducted in which a simple random sample of adult Americans was
asked whether they were for or against a certain proposal up for debate in Congress. Suppose 800
people were asked their opinion and 360 replied that they supported the proposal, while the rest
opposed it. Is there significant evidence at the 5% level that less than a majority of the population
agrees with the proposal?