Exam 3 STAT305A Spring 2017 Due 4/27(R) Name_______________________________________________
PROBLEM 1(30pts) You are charged with conducting an investigation in relation to herbicide pollution of IA lakes. Let
X denote the act of measuring the level of a certain chemical in any randomly chosen lake. Assume $X \sim N(\mu_X, \sigma_X)$, and that
the lakes to be chosen for testing are such that the data collection variables $\{X_k\}_{k=1}^{n}$ can be assumed mutually independent.
Let $(\hat{\mu}_X, \hat{\sigma}_X)$ denote the usual estimators of $(\mu_X, \sigma_X)$. Data collected on $n = 50$ lakes resulted in $(\hat{\mu}_X = 323, \hat{\sigma}_X = 69)$.
(a)(10pts) Compute the estimate of the 95% 2-sided confidence interval (CI) for each of $(\mu_X, \sigma_X)$. Show ALL steps.
Solution:
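One possible way to set up the computation is sketched below. It assumes that the reported 69 is the sample standard deviation with the (n-1) divisor, and it uses statistics (B) and (D) of Table 1; the variable names are illustrative only and are not taken from the exam code.

```matlab
% Sketch for 1(a): 95% two-sided CIs for mu_X and sigma_X.
% ASSUMPTION: sighat is the (n-1)-divisor sample standard deviation.
n = 50; muhat = 323; sighat = 69;   % summary statistics from the problem statement
alpha = 0.05;

% CI for mu_X from (B): (muhat - mu_X)/sqrt(sighat^2/n) ~ t_{n-1}
tcrit = tinv(1 - alpha/2, n - 1);
ci_mu = muhat + [-1 1]*tcrit*sighat/sqrt(n);

% CI for sigma_X from (D): (n-1)*sighat^2/sigma_X^2 ~ chi2_{n-1}
chi_lo = chi2inv(alpha/2, n - 1);
chi_hi = chi2inv(1 - alpha/2, n - 1);
ci_sig = sqrt((n - 1)*sighat^2 ./ [chi_hi chi_lo]);   % [lower upper]

fprintf('CI for mu_X:    [%.2f, %.2f]\n', ci_mu);
fprintf('CI for sigma_X: [%.2f, %.2f]\n', ci_sig);
```

The interval for $\sigma_X$ is not symmetric about 69 because the chi-squared pdf is skewed.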
(b)(5pts) Federal law requires that any state having $\mu_X > 300$ must develop a plan to rectify the situation. Conduct the test
$H_0: \mu_X = 300$ vs. $H_1: \mu_X > 300$ at a significance level $\alpha = 0.05$ to determine whether or not such a plan will be ordered.
Solution:
(c)(5pt) Find the p-value of the test in (b).
Solution:
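The test in (b) and its p-value in (c) can be sketched as follows. This assumes an upper-tailed alternative (mean level above 300) and the t statistic (B) of Table 1 with n-1 degrees of freedom; the variable names are hypothetical.

```matlab
% Sketch for 1(b)-(c): upper-tailed one-sample t test and its p-value.
% ASSUMPTION: H1 is mu_X > 300 and sighat is the (n-1)-divisor sample std.
n = 50; muhat = 323; sighat = 69;   % summary statistics from the problem statement
mu0 = 300; alpha = 0.05;

T = (muhat - mu0)/(sighat/sqrt(n));   % observed test statistic
tcrit = tinv(1 - alpha, n - 1);       % upper-tail threshold
reject = T > tcrit;                   % true means announce H1
pval = 1 - tcdf(T, n - 1);            % Pr[T_{n-1} > observed T]

fprintf('T = %.3f, threshold = %.3f, reject H0 = %d, p-value = %.4f\n', ...
        T, tcrit, reject, pval);
```

Announcing H1 here is equivalent to the p-value falling below 0.05, which is why (b) and (c) must agree.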
(d)(5pts) In view of (b-c), you should have found that a clean-up plan will be called for. Your company has been
contacted to submit a bid for the work. Before deciding whether you will bid on the project, you asked for, and received
the data associated with the investigation. A careful look at it revealed that for the northern half of IA the results were
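$(\hat{\mu}_1 = 316, \hat{\sigma}_1 = 52.01, n_1 = 33)$, and for the southern half they were $(\hat{\mu}_2 = 330, \hat{\sigma}_2 = 45.35, n_2 = 18)$. Test the hypotheses:
$H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \neq 0$ for $\alpha = 0.05$.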
Solution:
(e)(5pts) Test the hypotheses: $H_0: \sigma_1/\sigma_2 = 1$ vs. $H_1: \sigma_1/\sigma_2 \neq 1$ for $\alpha = 0.05$.
Solution:
PROBLEM 2(25pts) This problem addresses the relation between the weight of a package being air-shipped (X) to a
given location, and the amount of fuel used (Y). The data associated with $n = 100$ packages is included in the file named
wgtfueldata.txt located in the exam folder.
(a)(10pts) Consider the model: $\hat{Y}(x) = b_1 x + b_0$. Denote the associated model error as: $W(x) = Y(x) - \hat{Y}(x)$. Compute the
estimates $(\hat{b}_1, \hat{b}_0)$ using the method addressed in relation to linear modeling. Then overlay your model on a scatter plot of the data.
Finally, obtain an estimate, $\hat{\sigma}_W$, of the error std. deviation.
Solution: [See code @ 2(a).]
Figure 2(a) Scatter plot and linear model.
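(b)(8pts) In Lecture 19 the following fact was given:
FACT: For a given $\hat{\sigma}_x$ (resulting from a given x-data set): $T = (\hat{b}_1 - b_1)\,\hat{\sigma}_x \sqrt{n-2}\,/\,\hat{\sigma}_W \sim t_{n-2}$. [Miller & Miller p.395].
Use this fact to arrive at a 95% 2-sided CI for the slope, $b_1$.
Solution:
(c)(7pts) Formula (11-30) on p.447 gives the CI for $b_0$:
$\hat{b}_0 - t_{\alpha/2,n-2}\,\hat{\sigma}_W \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} \le b_0 \le \hat{b}_0 + t_{\alpha/2,n-2}\,\hat{\sigma}_W \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$.
Use this to arrive at the CI for $b_0$. [Note: from (11-10) we have $S_{xx} = \sum_{k=1}^{n} (x_k - \bar{x})^2$.]
Solution: [See code @ 2(c).]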
PROBLEM 3(20pts) The sample mean $\hat{\mu}_X$ is the most popular of all statistics. A close second is the sample correlation
coefficient $\hat{\rho}$. It is not a ‘pretty’ statistic, as is evident in (11-43) on p.459. Let X=the act of measuring the temperature at
which a reaction is carried out, and let Y= the act of measuring the reaction rate.
[c.f. https://en.wikipedia.org/wiki/Reaction_rate ]
(a)(6pts) To test $H_0: \rho = 0$ vs. $H_1: \rho \neq 0$ the appropriate test statistic is [see (11-46)]: $T_{n-2} = \hat{\rho}\sqrt{n-2}\,/\,\sqrt{1-\hat{\rho}^2}$. For
$n = 30$ samples of $(X, Y)$, the estimate was $\hat{\rho} = 0.248$. Conduct this test with a false alarm probability $\alpha = 0.05$.
Solution:
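A minimal sketch of the computation in (a) follows, assuming a two-sided alternative and the $t_{n-2}$ reference distribution that holds when the true correlation is zero. Variable names are illustrative.

```matlab
% Sketch for 3(a): t test for zero correlation.
% ASSUMPTION: two-sided alternative (rho ~= 0).
n = 30; rhohat = 0.248; alpha = 0.05;   % values from the problem statement

T = rhohat*sqrt(n - 2)/sqrt(1 - rhohat^2);   % statistic from (11-46)
tcrit = tinv(1 - alpha/2, n - 2);            % two-sided threshold
reject = abs(T) > tcrit;                     % true means announce H1
pval = 2*(1 - tcdf(abs(T), n - 2));          % two-sided p-value

fprintf('T = %.4f, threshold = %.4f, reject H0 = %d, p-value = %.3f\n', ...
        T, tcrit, reject, pval);
```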
(b)(8pts) The code that resulted in the estimate in (a) is given in the Appendix. Modify it to generate $nsim = 10^5$ simulations of $\hat{\rho}$. Then
use these to compute a simulation-based pdf for $T_{n-2} = \hat{\rho}\sqrt{n-2}\,/\,\sqrt{1-\hat{\rho}^2}$, and overlay the $t_{n-2}$ pdf on it. Comment on how they compare.
Solution: [See code @ 3(b).]
Figure 3(b) Plots of the simulation-based pdf for $T_{n-2}$ and the $t_{n-2}$ pdf.
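(c)(6pts) In the case where $\rho \neq 0$, the test statistic $T_{n-2} = \hat{\rho}\sqrt{n-2}\,/\,\sqrt{1-\hat{\rho}^2}$ no longer has a $t_{n-2}$ pdf. However, as noted on p.459, for
$n \ge 25$ the statistic $W = \mathrm{atanh}(\hat{\rho}) \sim N(\mu_W, \sigma_W)$ where $\mu_W = \mathrm{atanh}(\rho)$ and $\sigma_W = 1/\sqrt{n-3}$. Use W and (A) in Table 1 to arrive
at the 95% 2-sided CI for $\rho$. [Note: You still have $n = 30$ and $\hat{\rho} = 0.248$.]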
Solution:
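The steps in (c) can be sketched as below: build a normal-theory interval for $\mu_W$ from the single observation W, then map its endpoints back through tanh. This assumes the approximation $\sigma_W = 1/\sqrt{n-3}$ quoted in the problem; the variable names are hypothetical.

```matlab
% Sketch for 3(c): CI for rho via the atanh (Fisher z) transformation.
n = 30; rhohat = 0.248; alpha = 0.05;   % values from the problem statement

w = atanh(rhohat);              % observed value of W
sigW = 1/sqrt(n - 3);           % approximate std. deviation of W
zcrit = norminv(1 - alpha/2);   % standard normal quantile, as in (A)

ci_w = w + [-1 1]*zcrit*sigW;   % CI for mu_W = atanh(rho)
ci_rho = tanh(ci_w);            % tanh is increasing, so endpoints map directly

fprintf('CI for rho: [%.3f, %.3f]\n', ci_rho);
```

Since tanh is monotone increasing, the coverage probability of the interval for $\mu_W$ carries over unchanged to the interval for $\rho$.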
PROBLEM 4(25pts) This problem addresses a situation where announcing H1 does not cost anything. In fact, you can
profit by it. Example 9-10 on p.345: A semiconductor manufacturer claims that its defect rate does not exceed $p = 0.05$,
and that it demonstrates process capability at this level using $\alpha = 0.05$. A recent inspection of $n = 200$ devices found
only 4 defective ones. This corresponds to $\hat{p} = 0.02$. Management would like to use this result to convince potential
customers that its defect rate is actually lower than $p = 0.05$. To this end, consider the test
$H_0: p = 0.05$ vs. $H_1: p < 0.05$.
The decision rule is: If $\hat{p}$ is sufficiently smaller than 0.05, we will announce $H_1$, supporting the claim that the printed
maximum defect rate is actually lower than advertised. The authors carry out the test, first, assuming that the CLT holds
(i.e. they can use a normal test statistic). They then carry out the test using the fact that the number of defects
$Y = n\hat{p} \sim bino(n, p)$. We will focus on this latter approach.
(a)(5pts) The false alarm probability is $\alpha = \Pr[\hat{p} \le p_{th}] = \Pr[Y \le y_{th}]$. Show that the p-value of the test is 0.0264 (as is given
at the top of p.347).
Solution:
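As a quick numerical check of the quoted p-value, the sketch below evaluates $\Pr[Y \le 4]$ for $Y \sim bino(200, 0.05)$ in two ways: with binocdf, and by summing the binomial pmf directly. The variable names are illustrative.

```matlab
% Sketch for 4(a): p-value = Pr[Y <= 4] when Y ~ bino(200, 0.05).
n = 200; p0 = 0.05; yobs = 4;   % values from the problem statement

pval = binocdf(yobs, n, p0);    % lower-tail binomial probability

% Same quantity from the pmf: sum over k = 0..4 of C(n,k) p0^k (1-p0)^(n-k)
k = 0:yobs;
coef = arrayfun(@(j) nchoosek(n, j), k);
pval_sum = sum(coef .* p0.^k .* (1 - p0).^(n - k));

fprintf('p-value (binocdf) = %.4f, p-value (pmf sum) = %.4f\n', pval, pval_sum);
```

The lower tail is used because small values of Y are the ones that favor $H_1$.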
(b)(5pts) Compute the Type-2 error for a true value $p = 0.025$.
Solution:
(c)(10pts) Suppose that we now consider the hypotheses: $H_0: p \ge 0.025$ vs. $H_1: p < 0.025$. We chose the value 0.025
since our data defect proportion 0.02 will result in announcing $H_1$. Write $Y \sim bino(200, p) \triangleq Y(p)$. Then our false alarm
probability for this new test, $\alpha(p) = \Pr[Y(p) \le 4]$, is only valid for $p \ge 0.025$. Similarly, our Type-2 error probability
$\beta(p) = \Pr[Y(p) > 4]$ is only valid for $p < 0.025$. Since the Type-2 error is the event that we announce $H_0$ when $H_1$ is
true, then the probability of announcing $H_1$ when it is, indeed, true is $1 - \beta(p)$. Here again, we note that this probability
is only valid for $p < 0.025$. Hence, the probability that we will announce $H_1$, whether or not it is true, is:
$\pi(p) = \alpha(p)$ for $p \ge 0.025$, and $\pi(p) = 1 - \beta(p)$ for $p < 0.025$. (1)
This quantity is called the power function for our new test. Show that
$\pi(p) = binocdf(4,200,p)$. Then plot it over the range p = 0 : .001 : 0.1.
Solution:
Figure 4(c) Plot of $\pi(p)$ for $p_{th} = 0.02$.
(d)(5pts) (i) A random sample of 1000 students' statistics exam scores was drawn from the population of all possible
scores. The computed sample mean is the true population mean. TRUE / FALSE (circle your answer)
(ii) While trying to figure out the probability that the sample mean for a sample size n=10 from a population would exceed
a specified value, use of the Central Limit Theorem is usually justified. TRUE / FALSE (circle your answer)
APPENDIX Table 1 and Your Matlab Code
Table 1. Some Handy-Dandy Test Statistics
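For $X \sim N(x; \mu_X, \sigma_X^2)$ and associated iid data collection variables $\{X_k\}_{k=1}^{n}$:
(A): $Z = (\hat{\mu}_X - \mu_X)/(\sigma_X/\sqrt{n}) \sim N(0,1)$ ;
(B): $T = (\hat{\mu}_X - \mu_X)/\sqrt{\hat{\sigma}_X^2/n} \sim t_{n-1}$
(C): $n\,\hat{\sigma}_X^2/\sigma_X^2 \sim \chi^2_{n}$ when $\mu_X$ is used ;
(D): $(n-1)\,\hat{\sigma}_X^2/\sigma_X^2 \sim \chi^2_{n-1}$ when $\hat{\mu}_X$ is used.
(E): $F = \dfrac{\hat{\sigma}_{X_1}^2/\sigma_{X_1}^2}{\hat{\sigma}_{X_2}^2/\sigma_{X_2}^2} \sim f_{n_1,n_2}$ when $\mu_{X_1}$ & $\mu_{X_2}$ are used ;
(F): $F = \dfrac{\hat{\sigma}_{X_1}^2/\sigma_{X_1}^2}{\hat{\sigma}_{X_2}^2/\sigma_{X_2}^2} \sim f_{n_1-1,n_2-1}$ when $\hat{\mu}_{X_1}$ & $\hat{\mu}_{X_2}$ are used.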
%PROGRAM NAME: exam3.m (Spring 2017)
%PROBLEM 2: X=pkg wgt (lb) & Y=fuel used (gal.)
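%(a):
load wgtfueldata.txt
xy=wgtfueldata;
x=xy(:,1); y=xy(:,2); %ASSUMPTION: column 1 = weight, column 2 = fuel
c=polyfit(x,y,1); %least-squares line: c(1)=slope b1, c(2)=intercept b0
yhat=polyval(c,x); %fitted values of the linear model
figure(20)
plot(x,y,'*')
hold on
plot(x,yhat,'r','LineWidth',2)
title('Scatter Plot & Linear Model of Fuel vs. Weight')
xlabel('Weight (lbs)')
ylabel('gal.')
grid
%(c):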
%=======================================================
%PROBLEM 3: X=temperature(C) & %Y=rate(moles/ltr)/sec
%Truth Model Parameters:
muX=100; muY=20;
Mu=[muX muY];
stdX=5; stdY=2;
C=[stdX^2 0 ; 0 stdY^2]; %Assumes rho=0
n=30; %Sample Size
xy=mvnrnd(Mu,C,n);
Rhat=corrcoef(xy);
rhat=Rhat(1,2);
figure(30)
%============================================
%PROBLEM 4
%(c):
p=0:.001:.1; np=length(p);
pwr=binocdf(4,200,p); %power function: Pr[Y(p)<=4] for Y(p)~bino(200,p)
figure(40)
plot(p,pwr)
title('Power Function for p_t_h=0.025')
xlabel('p')
ylabel('power')
grid