Today’s lesson
Review of Examination 1
• Basic summary statistics
• Definition of alpha and beta
• One sample test: beta and sample size
• Distribution of sum of iid values
• Two sample test: confidence interval for E(W-Y)
• Computer output questions
Basic summary statistics
• Sample average well handled.
• Sample variance needs some work.
• P-value is crucial--don’t give away points.
2A. Find sample variance.
• A random sample of four was taken from
the random variable W. The four observed
values were 280, 210, 200, and 190. The
sample average was 220. What is the value
of the unbiased estimate of the variance
based on this sample?
Solution to 2A.
• Recognize that the problem has asked for
the usual estimate of the variance.
• Find the deviations from the mean: (280-220)=60, -10, -20, -30.
• Check that the deviations from the mean
sum to zero.
• Square the deviations from the mean: 3600,
100, 400, 900.
Solution to 2A.
• Sum the squared deviations from the mean:
5000.
• Divide by n-1 (here 3) to get correct
answer: 5000/3=1666.67.
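The steps of the solution to 2A can be retraced in a few lines of Python. This is only a sketch for checking the arithmetic; the variable names are illustrative and are not part of the examination.

```python
from statistics import mean, variance

values = [280, 210, 200, 190]            # the four observed values of W
avg = mean(values)                       # sample average
deviations = [v - avg for v in values]   # deviations from the mean
sum_sq = sum(d ** 2 for d in deviations) # sum of squared deviations
s2 = sum_sq / (len(values) - 1)          # divide by n-1, not n

print("average:", avg)
print("deviations:", deviations, "sum:", sum(deviations))
print("sum of squares:", sum_sq)
print("unbiased variance estimate:", round(s2, 2))
```

The library function `statistics.variance` uses the same n-1 divisor, so it should agree with the hand calculation.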
Problem 3A.
• The observed significance level (p-value)
reported in a computer printout for a
statistical test was 0.086. Which of the
following is a correct decision for this
result?
• Usual options.
Solution to 3A.
• Remember: when the p-value is less than or
equal to alpha, reject at the alpha level of
significance.
• When the p-value is greater than alpha,
accept at the alpha level of significance.
• P-value of 0.086 is greater than 0.05 (hence
accept at 0.05, also at 0.01)
• P-value of 0.086 is less than 0.10 (hence
reject at 0.10); option c is correct.
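The decision rule in the solution to 3A can be written as a one-line function. This is a minimal sketch; the function name `decide` is made up for the illustration.

```python
def decide(p_value, alpha):
    """Reject H0 when the p-value is less than or equal to alpha."""
    return "reject" if p_value <= alpha else "accept"

p_value = 0.086  # value reported in the printout for Problem 3A
decisions = {alpha: decide(p_value, alpha) for alpha in (0.10, 0.05, 0.01)}

for alpha, decision in decisions.items():
    print(f"alpha={alpha}: {decision}")
```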
Definition of alpha and beta.
• The description says that the test is an
eleven question true-false test and that the
test statistic E11 is the number of errors. The
rejection region is E11 less than or equal to
3. The usual tables of cdf’s follow.
Problem 4A.
• What is alpha, the probability of a Type I
error for the rejection rule E11 less than or
equal to 3?
Solution to Problem 4A.
• Alpha=Pr0{Reject H0}=Pr0{E11 less than or
equal to 3}=F0(3).
• By table look-up, find F0(3)=0.113281.
• This is alpha. The correct answer is
0.113281.
Problem 5A.
• What is beta, the probability of a Type II
error, for the rejection rule E11 less than or
equal to 3 when the examination is
administered to a student who has a 0.05
probability of incorrectly answering each
question?
Solution to 5A.
• Recognize that calculation of beta requires
you to use the cdf of the alternative.
• Beta=Pr1{Accept H0}=Pr1{E11 greater than
or equal to 4}=1- Pr1{E11 less than or equal
to 3}=1-F1(3).
• Use the tables to find that F1(3)=0.9984.
• Beta=1-0.9984=0.0016.
• The correct answer is 0.0016.
Comments on alpha and beta.
• Alpha (0.1133) is very much larger than
beta (0.0016).
• This imbalance indicates that this rejection
rule is not very wise for this problem.
• Changing the rejection rule to E11 less than
or equal to 2 would be a wise change (check
out alpha and beta for this rule).
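The table look-ups for alpha and beta can be checked by computing the binomial cdf directly. The sketch below assumes that E11 is binomial with 11 trials, with error probability 0.5 for a random guesser (the null) and 0.05 for the student in Problem 5A; the helper `binom_cdf` is written for this illustration and is not a library function.

```python
from math import comb

def binom_cdf(k, n, p):
    """Pr{X <= k} for X binomial with n trials and success probability p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n = 11
results = {}
for cutoff in (3, 2):                          # reject when E11 <= cutoff
    alpha = binom_cdf(cutoff, n, 0.5)          # Pr0{reject H0}
    beta = 1 - binom_cdf(cutoff, n, 0.05)      # Pr1{accept H0}
    results[cutoff] = (alpha, beta)
    print(f"cutoff {cutoff}: alpha={alpha:.6f}, beta={beta:.4f}")
```

With the cutoff at 2, alpha drops to about 0.033 while beta rises only to about 0.015, which is the better balance the comment refers to.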
Problem 6A.
• In a test of the null hypothesis that a student
is a random guesser against the alternative
the student is better than a random guesser
using the results of the eleven question
examination, what is the observed
significance level (that is, p-value) for a
student who gives two incorrect answers to
the eleven questions?
Solution to Problem 6A.
• Since the rejection rule is a left sided region
(reject when E11 less than or equal to 3), the
left sided p-value should be reported.
• Left-sided p-value=Pr0{E11 less than or
equal to 2}=F0(2)
• The correct answer is F0(2)=0.032715.
• The value 2 was given as the number of
errors made in the statement of the problem.
Story for Questions 7B-9B.
• A research team will test the null hypothesis
that E(Y)=2500 at the 0.01 level of
significance against the alternative that
E(Y)>2500. When the null hypothesis is
true, Y has a normal distribution with
standard deviation 800. They will take a
random sample of 64 observations and use
the sample mean to test the null hypothesis.
Problem 7B.
• What is the critical value for this test?
• Solution: The critical value is
E0 ± |zα|σ0/n^0.5, with the sign set by the direction of the test.
• Here, E0=2500, the sign is positive because the
test is right-sided, |zα|=2.326, and
σ0/n^0.5=800/64^0.5=100.
• That is, cv=2500+2.326*100=2732.6.
• The correct answer is 2732.6.
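The critical value can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library in place of the normal table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

e0, sigma0, n, alpha = 2500, 800, 64, 0.01

z_alpha = NormalDist().inv_cdf(1 - alpha)  # right-sided test, about 2.326
se = sigma0 / sqrt(n)                      # standard error of the sample mean
cv = e0 + z_alpha * se                     # critical value

print("z:", round(z_alpha, 3), "se:", se, "cv:", round(cv, 1))
```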
Problem 8B.
• What is the probability of a Type II error
when E(Y)=2800, σY=800, and α=0.01?
• Solution: β=Pr1{Accept H0}=Pr1{Sample
mean of 64 < 2732.6}.
• This is a normal probability problem in
which the sample mean of 64 is normal with
mean 2800 and standard deviation 100.
Problem 8B.
• Find probability by standardizing:
• Pr{[(mean-E(mean))/se(mean)]<[(2732.6-2800)/100]}=Pr{Z<-0.674}.
• This is the cdf of the standard normal at the
argument -0.674. Look up in table to find
that the answer is 0.25.
• The correct answer is 0.25.
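The standardization in Problem 8B can be verified as follows. This is a sketch that takes the critical value 2732.6 from Problem 7B as given and uses the standard library's `NormalDist` instead of a table.

```python
from statistics import NormalDist

cv = 2732.6   # critical value from Problem 7B
e1 = 2800     # expected value under the alternative
se = 100      # 800 / 64**0.5

z = (cv - e1) / se            # standard score of the critical value
beta = NormalDist().cdf(z)    # Pr1{sample mean < cv}

print("z:", round(z, 3), "beta:", round(beta, 2))
```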
Problem 9B.
• What is the smallest value of n, the sample
size, so that the probability of a Type II
error is no more than 0.01 when
E(Y)=2800, σY=800, and α=0.01?
• Solution: In the notation of the formula,
E0=2500, E1=2800, σ0= σ1=800,
|zα|=|zβ|=2.326.
• Then, n^0.5=(|zα|σ0+|zβ|σ1)/(E1-E0)=3721.6/300=12.4;
n must be at least 153.9, and the correct answer for
n is 154.
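The sample size calculation can be sketched as below. The slide refers to "the formula" without restating it; the code assumes the usual one-sample formula n^0.5 = (|zα|σ0 + |zβ|σ1)/|E1 - E0|, which reproduces the 12.4 given in the solution.

```python
from math import ceil

e0, e1 = 2500, 2800
sigma0 = sigma1 = 800
z_alpha = z_beta = 2.326   # alpha = beta = 0.01, from the normal table

# Assumed formula: sqrt(n) = (|z_alpha|*sigma0 + |z_beta|*sigma1) / |E1 - E0|
sqrt_n = (z_alpha * sigma0 + z_beta * sigma1) / abs(e1 - e0)
n_required = ceil(sqrt_n ** 2)   # round up to the next whole observation

print("sqrt(n):", round(sqrt_n, 1), "n:", n_required)
```

Rounding up rather than to the nearest integer matters: the smallest n that keeps beta at or below 0.01 is the next whole number above 153.9.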
Story for Questions 10B-13B.
• The winnings W in one play of a game of
chance is a normally distributed random
variable with expected value -$300 and
standard deviation $2000.
Question 10B.
• What is the probability that a gambler will
lose money in one play of this game of
chance?
• Solution: recognize that you are asked to
calculate Pr{W<0}.
• This is a normal probability; put the
inequality in standard score form.
Question 10B.
• Pr{[(W-EW)/σW]<[(0-(-300))/2000]}
• =Pr{Z<0.15}=Φ(0.15)=0.5596.
• The correct answer is 0.5596.
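A quick check of Question 10B, as a sketch using the standard library's `NormalDist` rather than the normal table:

```python
from statistics import NormalDist

mu, sigma = -300, 2000          # expected value and sd of W, in dollars

z = (0 - mu) / sigma            # standard score of zero winnings
p_lose = NormalDist().cdf(z)    # Pr{W < 0} = Phi(0.15)

print("z:", z, "Pr{W<0}:", round(p_lose, 4))
```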
Question 11B.
• What are the expected total winnings after
100 independent plays of this game of
chance?
• Solution: The basic principle is that
E(Sn)=nE(W).
• E(S100)=100(-$300)=-$30,000.
• The correct answer is -$30,000.
Question 12B.
• What is the standard deviation of the total
winnings after 100 independent plays of this
game of chance?
• Solution: Recall the basic fact that
σ(Sn)=n^0.5 σW.
• Here, σ(S100)=100^0.5($2000)=$20,000.
• The correct answer is $20,000.
Question 13B.
• What is the probability that a gambler will
have total winnings that are less than zero
after 100 independent plays of this game of
chance?
• Solution: Recognize that you have to
calculate the Pr{S100<0}.
• Recognize that S100 is normally distributed
with mean -$30,000 and standard deviation
$20,000.
Question 13B.
• Calculate every normal probability by
putting the inequality in standard score
form:
• That is, Pr{[(S100-E(S100))/σ(S100)]<
[(0-(-30,000))/20,000]}
• This is Pr{Z<1.5}=Φ(1.5).
• Do a table lookup to find that the correct
answer is 0.9332.
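Questions 11B to 13B chain together, so they can be checked in one short sketch. It assumes, as the questions state, that the 100 plays are independent, so the total S100 is normal with mean nE(W) and standard deviation n^0.5 σW.

```python
from math import sqrt
from statistics import NormalDist

mu_w, sigma_w, n = -300, 2000, 100

mean_total = n * mu_w            # E(S100) = n * E(W)
sd_total = sqrt(n) * sigma_w     # sd(S100) = n**0.5 * sd(W)
z = (0 - mean_total) / sd_total  # standard score of zero total winnings
p_behind = NormalDist().cdf(z)   # Pr{S100 < 0}

print("E(S100):", mean_total)
print("sd(S100):", sd_total)
print("z:", z, "Pr{S100<0}:", round(p_behind, 4))
```

Compare with Question 10B: the chance of being behind rises from about 0.56 after one play to about 0.93 after 100 plays, because the mean grows like n while the standard deviation grows only like n^0.5.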
Story for Questions 14C to 16C
• Each patient in a study will take a specified
medicine, and the patient’s response to that
medicine will be measured. Forty patients
will be randomly assigned to two groups of
twenty each. Group 1 will receive an
experimental medicine. The random
variable X denotes a patient’s response to
the experimental medicine and is normally
distributed with unknown expected value
E(X) and unknown standard deviation σ.
Group 2 will receive the best available
medicine. The random variable B denotes a
patient’s response to this medicine and is
normally distributed with unknown
expected value E(B) and unknown standard
deviation σ.
Story for Questions 14C to 16C
• The null hypothesis in this experiment is
that E(X-B)=0, and the alternative
hypothesis is that E(X-B)>0.
• The experiment was run. The observed
sample averages were 642.4 in the X group
and 529.8 in the B group. The observed
standard deviations were 233.7 for the X
group, and 348.0 for the B group. The
resulting pooled estimate of σ was 296.5.
Question 14C.
• What is the standard deviation of the
difference of the two means?
• Solution: Use the formula for the variance
of the difference of two random variables.
• variance(X mean of 20)=σ^2/20.
• variance(B mean of 20)=σ^2/20.
• Covariance(X mean, B mean)=0, since this
is a randomized experiment.
Question 14C.
• Variance(X mean of 20 - B mean of 20) =
var(X mean of 20) + var(B mean of 20) - 2 covariance(X mean, B mean) =
• (σ^2/20) + (σ^2/20) - 2(0) = σ^2/10
• The standard deviation is the square root of
the variance=(0.10)^0.5 σ=0.316σ
• The correct answer is 0.316σ.
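The multiplier of σ in Question 14C can be checked in a couple of lines. This sketch assumes the two group means are independent (covariance zero) and share the same σ, as in the story.

```python
from math import sqrt

n_x = n_b = 20   # patients per group

# var(X mean - B mean) = sigma^2/n_x + sigma^2/n_b - 2*0, in units of sigma^2
var_factor = 1 / n_x + 1 / n_b
sd_factor = sqrt(var_factor)     # multiplier of sigma

print("variance factor:", round(var_factor, 3))
print("sd factor:", round(sd_factor, 3))
```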
Question 15C.
• Which of the following is the correct
decision for accepting or rejecting the null
hypothesis based on the sample averages
and standard deviations given in the
common paragraph?
• Usual options.
Solution to 15C.
• The test statistic is the x sample average-b
sample average=642.4-529.8=112.6; this is
positive and in the direction supportive of
the alternative.
• The estimated standard deviation (standard
error) of the test statistic is
0.316*296.5=93.699.
• The t-statistic is (112.6-0)/93.699=1.20.
Solution to 15C.
• Next, you have to stretch the normal theory
critical values to account for the estimated
standard deviation 296.5 having 38 degrees
of freedom (40-2 df).
– 2.326 is stretched to about 2.43; 1.645 to about
1.686; and 1.282 to 1.304.
• The t-value of 1.20 is to the left of the 0.10
critical value of 1.304; hence accept at 0.10.
• D is the correct answer.
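The test in 15C can be retraced as below. This is a sketch: the critical values for 38 degrees of freedom are copied from the solution rather than computed, and the standard error is computed with the unrounded factor (0.10)^0.5, so it comes out near 93.76 instead of the 93.699 obtained with the rounded 0.316. The t-statistic still rounds to 1.20.

```python
from math import sqrt

x_bar, b_bar = 642.4, 529.8
pooled_sd = 296.5
n_x = n_b = 20

diff = x_bar - b_bar                        # test statistic, 112.6
se = pooled_sd * sqrt(1 / n_x + 1 / n_b)    # estimated sd of the difference
t_stat = (diff - 0) / se

# Right-sided critical values for 38 df, as quoted in the solution.
critical = {0.01: 2.43, 0.05: 1.686, 0.10: 1.304}
decisions = {a: ("reject" if t_stat >= cv else "accept")
             for a, cv in critical.items()}

print("t:", round(t_stat, 2))
print(decisions)
```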
Question 16C.
• What is the 95 percent confidence interval
for E(X-B)?
• Solution:
• Center the confidence interval at the x mean
minus the b mean=112.6.
• The sampling margin of error is the product
of the stretch of 1.960 for 38 df (about
2.026) and the standard error (93.699). It
equals 189.8
Solution to 16C Continued.
• The 95 percent CI for E(X-B) is 112.6 plus
and minus 189.8.
• The correct answer is that the 95 percent
confidence interval for E(X-B) ranges from
-77.2 to 302.4.
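The confidence interval arithmetic for 16C, as a sketch. The multiplier 2.026 and the standard error 93.699 are taken from the solution as given; they are inputs here, not values the code derives.

```python
center = 112.6        # x mean minus b mean
t_mult = 2.026        # stretched 1.960 for 38 df, as quoted in the solution
se = 93.699           # standard error from the solution to 15C

margin = t_mult * se
lower, upper = center - margin, center + margin

print("margin:", round(margin, 1))
print("95% CI:", round(lower, 1), "to", round(upper, 1))
```

Since 0 lies inside the interval, the result agrees with the decision in 15C that the null hypothesis E(X-B)=0 is not rejected.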
Story for Questions 17C to 19C.
• I used the Explore command in SPSS to
summarize 100 values of a variable
L1MHLOD determined from a simulation
study of a dominant trait that affected all
families in a simulated genetic study. I
reported descriptives output, histogram and
box and whiskers plot. Use the output to
answer the following three questions.
Question 17C
• Which of the following is a correct decision
about the two tests of null hypotheses about
E(L1MHLOD)?
• I: Null: E(L1MHLOD)=1, alpha=0.05; Alt:
E(L1MHLOD) not equal to 1.
• II: Null: E(L1MHLOD)=2, alpha=0.05; Alt:
E(L1MHLOD) not equal to 2.
• Usual options.
Solution to 17C.
• Read the descriptives output to find that the
95 percent CI for the mean ranges from
1.5230 to 2.4318.
• 1 is not in the 95% CI for the mean; hence
reject I.
• 2 is in the 95% CI for the mean; hence
accept II.
• The correct answer is C.
Question 18C.
• Does the distribution of L1MHLOD appear
to be normal? Support your answer with
specific references to values of statistics and
plots.
Solution to 18C.
• Examine the histogram of L1MHLOD;
observe that it is very skew with a number
of outliers; hence it appears not to be
normal.
• Standardized skewness is (4.045-0)/0.241
(from descriptives output)=16.8, way out of
range.
Solution to 18C.
• Standardized kurtosis is (18.172-0)/0.478=38.0, also way out of range.
• Every indication is that the data do
not appear to be normal.
• The correct answer is “NO”.
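The two standardized statistics in 18C are simple ratios of a statistic to its standard error. In the sketch below, the cutoff of 2 in absolute value is an assumed rule of thumb for "in range"; the slides say only that the values are way out of range.

```python
skewness, se_skewness = 4.045, 0.241     # from the descriptives output
kurtosis, se_kurtosis = 18.172, 0.478    # from the descriptives output

z_skew = (skewness - 0) / se_skewness
z_kurt = (kurtosis - 0) / se_kurtosis

threshold = 2.0   # assumed rule-of-thumb cutoff, not stated in the slides
looks_normal = abs(z_skew) <= threshold and abs(z_kurt) <= threshold

print("standardized skewness:", round(z_skew, 1))
print("standardized kurtosis:", round(z_kurt, 1))
print("consistent with normality:", looks_normal)
```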
Question 19C.
• Are there outliers or other unusual patterns
in the distribution of L1MHLOD?
• Solution: Look at the box and whiskers plot
and note that there are values indicated
beyond the whiskers.
• Correct answer: YES, there are outliers.
• Also remark on the four apparently
disconnected values in the histogram.
Story for Questions 20D to 22D.
• I used a paired t-test to compare L1MLOD
to L1MKCEXP for 100 replicates of a study
of recessive genetic trait that affected all
families in the simulated study. The
objective of the analysis was to determine
whether one of the two statistics came
closer to the trait locus than the other.
Computer output followed.
Question 20D
• Which of the following is a correct decision
about the test of the following null
hypothesis? The null is that
E(L1MLOD)=E(L1MKCEXP), and the
alternative is that E(L1MLOD) is not equal
to E(L1MKCEXP).
• Usual options.
Solution to 20D.
• Find the significance level (2-sided) in the
paired samples test output. Here it is 0.000.
• Use this as usual.
– Do not use the sig of the paired samples
correlation, 0.402.
• The correct answer is A, reject at the 0.01
level of significance.
Question 21D.
• Which of the following is a correct decision
about the following two tests about
E(L1MLOD)-E(L1MKCEXP)?
• I. Null: E(L1MLOD)-E(L1MKCEXP)=-1,
alpha=0.05; Alt: E(L1MLOD)-E(L1MKCEXP) not equal to -1.
• II. Null: E(L1MLOD)-E(L1MKCEXP)=0,
alpha=0.05; Alt: E(L1MLOD)-E(L1MKCEXP) not equal to 0.
Solution to 21D.
• Find the 95% CI for the difference in the
paired samples test output, 4.19 to 7.23.
• Use the paired samples statistics to confirm
that the difference in the output is for
E(L1MKCEXP)-E(L1MLOD).
• NOTE THAT THE QUESTION ASKS
ABOUT THE REVERSE ORDER.
Solution to 21D.
• For I, check -1*(-1)=1 (reverse the sign of the
values!); 1 is not in the 95% CI for the
expected difference; hence reject I.
• For II, check -1*0=0; 0 is not in the CI either; hence reject II.
• The correct answer is D, Reject both null
hypotheses.
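The sign reversal in 21D is easy to get wrong, so here is a sketch of the check. The function name is made up for the illustration; the interval 4.19 to 7.23 is the one read from the output, which is for the difference in the reverse order from the question.

```python
def decide_from_reversed_ci(null_value, ci_reversed):
    """Test a null value when the CI is for the difference in reverse order.

    The hypothesized value is multiplied by -1 before checking the interval.
    """
    lower, upper = ci_reversed
    flipped = -1 * null_value
    return "accept" if lower <= flipped <= upper else "reject"

ci_output = (4.19, 7.23)   # 95% CI for E(L1MKCEXP)-E(L1MLOD) from the output

decisions = {
    "I": decide_from_reversed_ci(-1, ci_output),   # checks +1 against the CI
    "II": decide_from_reversed_ci(0, ci_output),   # checks 0 against the CI
}
print(decisions)
```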
Question 22D.
• What is the value of the t-test for the null
hypothesis given in question 20D? Be sure to
tell how many degrees of freedom it has.
• Solution: t-test is the (mean-0)/std. error
mean.
• T-test=(5.71-0)/0.77=7.41.
• There are 100 pairs, so 100-1=99 degrees of
freedom.
Story for Questions 23D to 25D.
• I used the independent samples t-test to
compare the first day’s sample of 10
observations to the second day’s sample of
5 observations. I reported the output from
this analysis.
Question 23D.
• Which of the following is a correct decision
about the test of the following null
hypothesis? The null is that E(PNC on day
1)=E(PNC on day 2) against the alternative
that E(PNC on day 1) is not equal to the
E(PNC on day 2).
• Usual options.
Solution to 23D.
• Find the significance levels in the t-test for
the equality of means section. These are
0.236 for the equal variance assumption test
and 0.120 for the unequal variance
assumption.
• Compare them as before.
• The correct answer is D, accept at the 0.10
level of significance.
Question 24D.
• Which of the following is a correct decision
about the following two tests of null
hypotheses about E(PNC on day 2)-E(PNC
on day 1)?
• I. Null: E(PNC on day 2)-E(PNC on day
1)=-5, alpha=0.05; Alt: E(PNC on day 2)-E(PNC on day 1) not equal to -5.
• II. Null: E(PNC on day 2)-E(PNC on day
1)=+5, alpha=0.05; Alt: E(PNC on day 2)-E(PNC on day 1) not equal to +5.
Solution to 24D.
• Find the 95% CI for the difference of the
two means in the output for the t-test of the
equality of the means. For the equal
variance assumption this is -1.9387 to
7.1387.
• Determine the difference of means
considered in the output by checking the
group statistics. Output is for PNC on day 1
minus PNC on day 2. Reverse again!
Solution to 24D.
• For I, -1*(-5)=5; the value 5 is in the CI;
hence accept I.
• For II, -1*(+5)=-5; the value -5 is not in the
CI; hence reject II.
• The correct answer is B, accept I and reject
II.
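An equivalent way to handle the reversal in 24D is to flip the interval itself instead of the hypothesized values. The sketch below negates and swaps the endpoints of the reported interval to get an interval for day 2 minus day 1; the helper name `flip_interval` is illustrative.

```python
def flip_interval(lower, upper):
    """CI for -(difference): negate both endpoints and swap them."""
    return (-upper, -lower)

ci_day1_minus_day2 = (-1.9387, 7.1387)                   # from the output
ci_day2_minus_day1 = flip_interval(*ci_day1_minus_day2)  # order asked about

decisions = {}
for label, null_value in (("I", -5), ("II", 5)):
    lo, hi = ci_day2_minus_day1
    decisions[label] = "accept" if lo <= null_value <= hi else "reject"

print(ci_day2_minus_day1)
print(decisions)
```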
Question 25D.
• Which of the following is a correct decision
about the test of the following null
hypothesis? The null is that variance(PNC
on day 1)=variance(PNC on day 2) against
the alternative that variance(PNC on day 1)
is not equal to variance (PNC on day 2)?
• Usual options.
Solution to 25D.
• Find the observed significance level in
Levene’s test for the equality of variances.
This value is 0.066.
• Use as usual.
• The correct answer is C, reject at the 0.10
level of significance and accept at the 0.05
level.
Advice
• Study your exam.
• Don’t make the same mistake twice.
• I care about how good you are at the end of
the course.