SOLUTIONS
Stat 250.3 - Second Midterm Exam
Monday, November 24, 2003
Part I: (100 points) Written Problems: Show ALL work, calculations and formulas used. Partial
credit will be awarded for using the correct procedures.
Problem 1: Suppose that the age at death for all American citizens is normally distributed with a mean of 70.5
years and a standard deviation of 5.9 years. [10 points]
A. Describe the sampling distribution of $\bar{x}$ from a sample of size 100. (Distributions and parameters!)
It is Normal (exactly normal, since X is Normal), with mean $E(\bar{x}) = \mu = 70.5$ and standard
deviation $sd(\bar{x}) = \sigma/\sqrt{n} = 5.9/\sqrt{100} = 0.59$.
B. Turn your description into a picture, including as much information as possible. (Draw the pdf.)
It is a normal curve centered at 70.5. About 68% of the area under the curve falls within 70.5 ± 0.59
(69.91 to 71.09), and about 95% of the area falls within 70.5 ± 2(0.59) (69.32 to 71.68). (You should draw
this curve!)
C. If I select a sample of 100 death ages of American citizens, what is the probability that I will get a
sample mean greater than 72 years?
$P(\bar{x} > 72) = P\left(Z > \frac{72 - 70.5}{0.59}\right) = P(Z > 2.54) = 1 - P(Z < 2.54) = 1 - 0.9945 = 0.0055$
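A minimal Python sketch (standard library only) that reproduces the Problem 1 numbers. The normal CDF is built from `math.erf` instead of being read from table A.1, so the last digit can differ slightly from the table value; the helper name `normal_cdf` is ours.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 70.5, 5.9, 100

# A. Sampling distribution of the sample mean: Normal(mu, sigma / sqrt(n)).
sd_xbar = sigma / math.sqrt(n)

# B. 68-95 rule: intervals within 1 and 2 standard deviations of the mean.
one_sd = (mu - sd_xbar, mu + sd_xbar)
two_sd = (mu - 2 * sd_xbar, mu + 2 * sd_xbar)

# C. P(xbar > 72) = 1 - Phi((72 - mu) / sd_xbar).
z = (72 - mu) / sd_xbar
p_above_72 = 1.0 - normal_cdf(z)

print(f"sd(xbar) = {sd_xbar:.2f}")
print(f"68% interval: {one_sd[0]:.2f} to {one_sd[1]:.2f}")
print(f"95% interval: {two_sd[0]:.2f} to {two_sd[1]:.2f}")
print(f"z = {z:.2f}, P(xbar > 72) = {p_above_72:.4f}")
```

The printed probability should agree with the table-based 0.0055 up to rounding.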
Problem 2: A scientist is studying human evolution. He is particularly interested in changes in cranial
capacity (measured in cubic centimeters). As part of his research, the scientist is studying a sample of 25
adult Neanderthal skulls. For each skull he records the cranial capacity; the summary statistics are below:
[30 points]
Descriptive Statistics: CC Skulls
Variable     N     Mean   Median   StDev
CC Skulls   25   1386.5   1388.8   106.9
Construct a 90% confidence interval for the mean cranial capacity. If the conditions are not met, STILL
construct the confidence interval, AND state what additional information you would need to evaluate the
conditions. Show ALL the work, and include an interpretation!
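Conditions: Since n = 25 < 30, we need information about the shape of the population from which the
data were collected. For the conditions to be satisfied, the population distribution should be approximately
bell-shaped (no strong skewness or extreme outliers); a histogram or boxplot of the 25 cranial capacities
would let us check this. We also need the skulls to be a random sample of adult Neanderthal skulls.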
CI calculations:
$\bar{x} = 1386.5$, $se(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{106.9}{\sqrt{25}} = 21.38$, $t^* = 1.71$ (using table A.2, with $n - 1 = 24$ df and 90%
confidence level).
90% CI for the mean is: $1386.5 \pm (1.71)(21.38) = 1386.5 \pm 36.56 = (1349.94, 1423.06)$
Interpretation: We are 90% confident that the population mean cranial capacity of adult Neanderthal
skulls is between 1349.94 and 1423.06 cubic centimeters.
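A stdlib-only Python sketch of the interval arithmetic for Problem 2. The multiplier t* = 1.71 is hard-coded from table A.2 because Python's standard library has no t quantile function; that is an assumption of the sketch, not something the code computes.

```python
import math

# Summary statistics from the Minitab output.
n, xbar, s = 25, 1386.5, 106.9

# Multiplier read from a t table (90% confidence, df = n - 1 = 24).
# Hard-coded because the standard library has no t quantile function.
t_star = 1.71

se = s / math.sqrt(n)   # standard error of the sample mean
margin = t_star * se    # margin of error
lower, upper = xbar - margin, xbar + margin

print(f"se = {se:.2f}, margin = {margin:.2f}")
print(f"90% CI: ({lower:.2f}, {upper:.2f})")
```

If SciPy is available, `scipy.stats.t.ppf(0.95, df=24)` gives the multiplier directly (about 1.711) instead of the rounded table value.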
Problem 3: An employee of Consumer Reports is evaluating a candy bar contest. The wrapper of each
candy bar contains either a prize or no prize. The candy bar company claims that the chance of winning a
prize is 1 in 10 (that is, 10% of candy bars should have prizes). The employee suspects that the company
is lying, and that the chance of winning a prize is less than this. The employee takes a random sample of
400 bars and finds that 26 of them contain a prize. Does the employee have strong evidence to conclude
that the candy company lied? Construct an appropriate hypothesis test.
Note: Show ALL steps including an interpretation (Step 5)! If the conditions are not met just point it out
but STILL perform the hypothesis test. [35 points]
Step 1: H0: p=.1 and Ha: p < .1, where p= the proportion of candy bars with a prize.
Step 2: $np_0 = 400(0.1) = 40$ and $n(1 - p_0) = 400(0.9) = 360$ are both at least 10, and the bars are a random sample, so the conditions are satisfied.
Step 3: $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{26/400 - 0.1}{\sqrt{0.1(1 - 0.1)/400}} = \frac{0.065 - 0.1}{0.015} = -2.33$, p-value $= P(Z < -2.33) = 0.0099$ (from table A.1).
Step 4: Since the p-value (0.0099) is less than 0.05, the test is significant and we reject the null hypothesis.
Step 5: We have strong evidence that the proportion of candy bars with a prize is less than 0.1, and thus
we can claim that the company is lying about the 1-in-10 chance of winning.
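A minimal Python sketch of Steps 2 and 3 for Problem 3 (one-proportion z-test, lower-tailed), using `math.erf` for the normal CDF in place of table A.1. The helper name `normal_cdf` is ours.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, successes, p0 = 400, 26, 0.10

# Step 2: success/failure condition uses the null value p0.
conditions_ok = n * p0 >= 10 and n * (1 - p0) >= 10

# Step 3: z statistic with the null standard deviation sqrt(p0(1 - p0)/n).
p_hat = successes / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Lower-tailed alternative Ha: p < p0, so the p-value is the left-tail area.
p_value = normal_cdf(z)

print(f"conditions ok: {conditions_ok}")
print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

The unrounded z is about -2.333, whose left-tail area is about 0.0098; the table value 0.0099 comes from rounding z to -2.33 first. The conclusion is the same.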
Problem 4: A researcher is interested in the relationship between smoking and high blood pressure. He
takes a random sample of 1000 adults and records their smoking status (Y/N) and blood pressure
(High/Low). He then performs a hypothesis test in MINITAB. Use the description of the problem and the
output to answer the questions below: [35 points]
Test and CI for Two Proportions
X = High BP
Sample       X     N   Sample p
NonSmoke   154   821   0.187576
Smoking     50   179   0.279330
Estimate for p(ns) - p(s): -0.0917535
95% CI for p(ns) - p(s): (-0.162698, -0.0208087)
Test for p(ns) - p(s) = 0 (vs not = 0): Z = -2.53
A. Fill in the following:
Parameter of interest (use symbols): $p_{ns} - p_s$
Sample estimate (use symbols AND actual value): $\hat{p}_{ns} - \hat{p}_s = -0.09175$
Alternative hypothesis: $H_a: p_{ns} - p_s \neq 0$
Z-statistic: $Z = -2.53$
B. Calculate the p-value and write an appropriate conclusion. Be SPECIFIC to the problem at hand.
Simply reject or fail to reject is NOT SUFFICIENT!
p-value $= 2P(Z > |-2.53|) = 2(0.0057) = 0.0114$
Since the p-value is less than 0.05, we reject the null hypothesis and conclude that there is a significant
difference between the proportions of nonsmokers and smokers with high blood pressure (18.8% vs. 27.9% in
the sample). Therefore, there is a significant relationship between smoking and blood pressure.
C. Explain what the p-value means. Again, be specific to this problem.
If there is no relationship between smoking and blood pressure (H0 is true), the probability is 0.0114 that a
random sample would give a difference between the proportions of nonsmokers and smokers with high blood
pressure greater than 0.0917 or less than -0.0917.
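A stdlib-only Python sketch that recomputes the Problem 4 Minitab output from the raw counts. Working it through shows that the Z = -2.53 in the output corresponds to the unpooled standard error; the pooled standard error used in the test statistic on the formula sheet gives about -2.76 instead. Both give a two-sided p-value below 0.05, so the conclusion in B is unchanged. The helper name `normal_cdf` is ours.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Counts from the Minitab output: X = number with high blood pressure.
x_ns, n_ns = 154, 821   # nonsmokers
x_s, n_s = 50, 179      # smokers

p_ns, p_s = x_ns / n_ns, x_s / n_s
diff = p_ns - p_s

# Unpooled standard error (as in the CI row of the formula sheet).
se_unpooled = math.sqrt(p_ns * (1 - p_ns) / n_ns + p_s * (1 - p_s) / n_s)
z_unpooled = diff / se_unpooled

# Pooled standard error (as in the test-statistic column of the formula sheet).
p_pool = (x_ns + x_s) / (n_ns + n_s)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_ns + 1 / n_s))
z_pooled = diff / se_pooled

# Two-sided p-value for Ha: p_ns - p_s != 0, using the unpooled z.
p_value = 2 * (1 - normal_cdf(abs(z_unpooled)))

# 95% CI for the difference with z* = 1.96.
ci = (diff - 1.96 * se_unpooled, diff + 1.96 * se_unpooled)

print(f"estimate = {diff:.7f}")
print(f"z (unpooled) = {z_unpooled:.2f}, two-sided p-value = {p_value:.4f}")
print(f"z (pooled)   = {z_pooled:.2f}")
print(f"95% CI: ({ci[0]:.6f}, {ci[1]:.6f})")
```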
Midterm Exam Part II: (100 points) Multiple Choice Problems
SOLUTIONS: BCDAC CBBBB DDACA DDCDB
Questions 1-3. A random sample of 2,470 12th grade students in the United States is asked how often they wear
seatbelts when driving, and also is asked about their typical grades in school. There are three grade categories (As
and Bs, C, Ds and Fs) and five seatbelt use categories (Never, Rarely, Sometimes, Most times, and Always). The
following Minitab output is for a chi-square analysis of the relationship between typical grades and seatbelt use.
Rows = typical grades in school, Columns = how often student wears seatbelt when driving

           Never  Rarely  Sometimes  Mosttmes  Always    All
A_and_B       52     128        166       298    1056   1700
C             32      93        104       128     300    657
D_and_F       18      22          8        24      41    113
All          102     243        278       450    1397   2470

Chi-Square = 126.203, DF = 8, P-Value = 0.000
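A short Python sketch (no imports needed) that recomputes the expected counts and the chi-square statistic from the observed table, using the expected-count and chi-square formulas on the exam's formula sheet. It is a check on the Minitab output, not a description of how Minitab computes it internally.

```python
# Observed counts: rows = grades (A_and_B, C, D_and_F),
# columns = seatbelt use (Never, Rarely, Sometimes, Most times, Always).
observed = [
    [52, 128, 166, 298, 1056],
    [32, 93, 104, 128, 300],
    [18, 22, 8, 24, 41],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell = (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (Obs - Exp)^2 / Exp.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom = (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"Expected(A_and_B, Never) = {expected[0][0]:.1f}")
print(f"Chi-square = {chi_square:.3f}, df = {df}")
```

It should reproduce the 126.203 and DF = 8 in the output, and the 70.2 used in question 3. The smallest expected count, about 4.67 for D_and_F/Never, is the one to look at when checking the expected-count condition.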
1. Based on the results given above, an appropriate conclusion for a significance test is
A. The observed relationship is not statistically significant because the p-value is less than .05.
B. The observed relationship is statistically significant because the p-value is less than .05.
C. The observed relationship is statistically significant because the chi-square value is greater than 0.
D. The observed relationship is not statistically significant because the chi-square value is greater than 0.05.
2. In this problem, the connection between the chi-square value and the p-value is
A. The p-value is the area to the right of 126.203 in a chi-square distribution with df =15.
B. The p-value is the area to the left of 126.203 in a chi-square distribution with df = 15.
C. The p-value is the area to the right of 126.203 in a chi-square distribution with df = 8.
D. The p-value is the area to the left of 126.203 in a chi-square distribution with df = 8.
3. The “expected” count for the cell “grades = A_and_B, seatbelt = Never” is
A. 52
B. 2470/15 = 164.67
C. 1700/5 = 340
D. (1700)(102)/2470 = 70.2
_____________________________________________________________________________________________
Questions 4-8: Identify the proper statistical procedure for each of the following scenarios:
4. Engineers at a ceramics factory create a new process that is designed to reduce the number of flaws in the
ceramic bowls the factory produces. The engineers take a sample of 100 bowls using the old method, and 100 with
the new method. Each bowl is classified as Flawed/Non-Flawed. Is the new method superior to the old one?
A. Test of 2-proportions.
B. Test of 2-means.
C. CI for 2-means.
D. Chi-Square Test.
5. There is a debate among scientists about whether the ozone hole over Antarctica is getting smaller. Scientists
know that in 1995 the mean ozone concentration over Antarctica was 200ppm (parts per million). In January of 2003
they took a sample of 100 ozone measurements over Antarctica. Has the average ozone level increased from
1995?
A. Test of 1-proportion.
B. CI for 2-proportions.
C. Test of 1-mean.
D. Test of 2-means.
6. A stat 200 student is interested in estimating the proportion of all PSU students who favor legalization of
marijuana.
A. CI for 1 mean.
B. Test of 1 mean.
C. CI for 1-proportion.
D. Test for 1-proportion.
7. Researchers record both the smoking status (Smoke, Don’t Smoke) and blood pressure (recorded as Low, Good,
High) from 200 PSU student volunteers. Is there a relationship between Smoking Status and Blood Pressure?
A. Test of 2-means.
B. Chi-Square test.
C. Test of 2-proportions.
D. CI for 2-proportions.
8. It is known that for right-handed people, the dominant (right) hand tends to be stronger. For left-handed people
who live in a world designed for right-handed people, the same may not be true. To test this, muscle strength was
measured on the right and left hands of a random sample of 15 left-handed men. Is the dominant hand of left-handed
people stronger than the right hand?
A. A two-sample t-test.
B. A paired t-test.
C. Chi-Square test.
D. Test for 1-proportion.
_____________________________________________________________________________________________
9. Which of the following statements is true about a parameter and a statistic for samples taken from the same
population?
A. The value of the parameter varies from sample to sample.
B. The value of the statistic varies from sample to sample.
C. Both A and B are true.
D. Neither A nor B is true.
10. Suppose a researcher is interested in answering the question, “Is the percentage of all males who use drugs
different than the percentage of all females who use drugs?” Which of the following would be appropriate null and
alternative hypotheses?
A. Ho: p males = p females, Ha: p males ≠ p females
B. Ho: p males = p females, Ha: p males ≠ p females
C. Ho: p males ≠ p females, Ha: p males = p females
D. Ho: μ males = μ females, Ha: μ males ≠ μ females
__________________________________________________________________________________________
Use with Questions 11-13. A null hypothesis is that the mean nose lengths of men and women are the same. The
alternative hypothesis is that men have a larger nose length than women.
11. Which of the following is the correct way to state the null hypothesis?
A. D= 0
B. x1  x 2  0
C. p1 - p2 = 0
D. 1 - 2 = 0
12. A statistical test is done and the p-value is 0.339. Which of the following is the most correct way to appropriately
state the conclusion?
A. The mean nose lengths of men and women are identical
B. Men have a greater mean nose length.
C. The probability is 0.339 that men and women have the same mean nose length.
D. There is not enough evidence to say that men and women have different nose lengths.
13. Refer back to the information in question 12. Which of the following statements is correct?
A. A 95% confidence interval for the difference in means will include 0.
B. A 95% confidence interval for the difference in means will not include 0.
C. Not enough information is available to know if the interval for the difference in means will include 0.
__________________________________________________________________________________________
14. Suppose that the p-value for testing Ho: p = 0.5 vs. the alternative Ha: p < 0.5 was 0.002. If the alternative
hypothesis had been Ha: p ≠ 0.5, what would the p-value of the test be?
A. 0.002
B. 0.001
C. 0.004
D. 0.5
__________________________________________________________________________________________
Questions 15 and 16. Based on sample data from the 1993 General Social Survey, a 95% confidence interval for the
difference between the proportions of men and women in the United States who think marijuana should be legalized is
0.035 to 0.145. In the sample, 30% of the men favored legalization and only 21% of the women favored legalization.
15. Based on this confidence interval we can reject the null hypothesis that the population proportions are equal.
A. True
B. False
C. We cannot decide.
16. Fill in the blank. The confidence interval listed above is an estimate of _____.
A. μ1 - μ2
B. p̂1 - p̂2
C. x̄1 - x̄2
D. p1 - p2
_________________________________________________________________________________________
17. Suppose that a researcher writes that she found a statistically significant relationship between gender and
whether or not a person “smokes.” What does that statement mean?
A. She has concluded that a relationship exists between gender and “smoking status” in the population
represented by the sample.
B. She rejected the null hypothesis that the two proportions are the same when comparing the proportion that said
“yes” for each gender.
C. The p-value of a significance test was less than .05 (5%)
D. All of choices A, B, and C are correct.
18. Suppose a researcher is interested in testing a new drug (Compound X) versus an older drug (Compound A).
He initially designs a study that has 1000 subjects, but because of a lack of funds his sample size is reduced to 200.
Which of the following is true?
A. The new study design will have more power than the original study design.
B. The new study design will have a smaller chance of a type II error compared with the original design.
C. The new study design is less powerful; it has a smaller chance of detecting a difference between Compound
X and Compound A.
D. Both B and C are true
19. A random sample of 600 adults is taken from a population of over one million, in order to compute a confidence
interval for a proportion. If the researchers wanted to decrease the width of the confidence interval, they could:
A. Decrease the size of the population
B. Decrease the size of the sample
C. Increase the size of the population
D. Increase the size of the sample
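A small Python illustration for question 19: the margin of error for a proportion, $z^*\sqrt{\hat{p}(1-\hat{p})/n}$, depends on the sample size n but not on the population size. The sample proportion 0.5 below is a hypothetical value (the question gives none), and `margin_of_error` is a helper name we introduce.

```python
import math

def margin_of_error(p_hat: float, n: int, z_star: float = 1.96) -> float:
    """95% margin of error for one proportion: z* * sqrt(p_hat(1 - p_hat) / n)."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical sample proportion; question 19 does not give one.
p_hat = 0.5

# Quadrupling the sample size halves the margin of error (and the CI width).
for n in (600, 2400):
    print(f"n = {n}: margin of error = {margin_of_error(p_hat, n):.4f}")
```

The population size (over one million) never enters the formula, which is why changing it does not change the width.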
20. Suppose that the chi-square statistic equals 10.9 for a two-way table with 4 rows and 2 columns. Which range
gives the approximate p-value for this situation?
A. Less than 0.001
B. Between 0.01 and 0.025
C. Between 0.025 and 0.05
D. Between 0.10 and 0.25
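A Python sketch for question 20 that computes the chi-square p-value instead of bracketing it with a table. A 4 x 2 table has df = (4 - 1)(2 - 1) = 3. For integer df the upper tail has a known closed form (a finite Poisson-type sum for even df; a normal tail plus a finite series for odd df); that identity is swapped in here for the table lookup, and the function name `chi2_sf` is ours.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2 * P(Z > sqrt(x)) plus a finite correction series.
    z = math.sqrt(x)
    tail = math.erfc(z / math.sqrt(2))  # equals 2 * P(Z > sqrt(x))
    term = math.sqrt(2 / math.pi) * z * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * i + 1)
    return tail

# Question 20: 4 x 2 table, so df = (4 - 1)(2 - 1) = 3.
df_q20 = (4 - 1) * (2 - 1)
p_q20 = chi2_sf(10.9, df_q20)
print(f"Q20: df = {df_q20}, p-value = {p_q20:.4f}")

# Questions 1-3: chi-square = 126.203 with df = 8 (Minitab prints 0.000).
print(f"Q1-3: p-value = {chi2_sf(126.203, 8):.2e}")
```

With SciPy installed, `scipy.stats.chi2.sf(10.9, 3)` should give the same value.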
_____________________________________________________________________________________________
The following information might be useful:
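$s.d.(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$, $s.d.(\bar{x}) = \frac{\sigma}{\sqrt{n}}$
$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Obs.} - \text{Exp.})^2}{\text{Exp.}}$, where $\text{Expected} = \frac{\text{Row total} \times \text{Column total}}{\text{Total sample size } n}$, and df = (r-1)(c-1)
Special case for 2x2 tables: $\chi^2 = \frac{n(AD - BC)^2}{R_1 R_2 C_1 C_2}$

Inference table (Parameter; Statistic; Standard Error; Multiplier; Test Statistic):
One Mean (1-sample t): $\mu$ or $\mu_d$; $\bar{x}$ or $\bar{d}$; $\frac{s}{\sqrt{n}}$ or $\frac{s_d}{\sqrt{n}}$; $t^*$ with df = n-1; $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ or $t = \frac{\bar{d} - 0}{s_d/\sqrt{n}}$
Difference of two means (2-sample t): $\mu_1 - \mu_2$; $\bar{x}_1 - \bar{x}_2$; $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$; $t^*$ with df = min(n1-1, n2-1); $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
One proportion: $p$; $\hat{p}$; $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$; $z^*$; $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
Difference of two proportions: $p_1 - p_2$; $\hat{p}_1 - \hat{p}_2$; $\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$; $z^*$; $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}}}$, where $\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$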