Section 4.4 Summary and Review Exercises
4.50 Because both values refer to the percentage of all voters, they are parameters.
4.51 If we are only interested in automobile accidents in 1993 then it is a parameter. If
the population is all automobile accidents then these data are assumed to be a sample and
then 15.9 is a statistic.
4.52
Because both values refer to the percentage in the poll, they are statistics.
4.53
Let X = number of households that earn more than $ 50,000. X ~ Bin(200, .176)
P(X>40)=1-P(X<=40)
> 1-pbinom(40,200,.176)
[1] 0.1622405
# Exact answer
Let X = number of households that earn more than $ 50,000. X ~ approx N(200*.176,
sqrt(200*.176*(1-.176)) )
P(X>40)=1-P(X<=40)
> 1-pnorm(40,200*.176,sqrt(200*.176*(1-.176)))
[1] 0.1863938
Let p = proportion of households that earn more than $ 50,000. p ~ approx N( .176,
sqrt(.176*(1-.176)/200) )
P(p>40/200)=1-P(p<=40/200)
> 1-pnorm(40/200,.176,sqrt(.176*(1-.176)/200)) # approx
[1] 0.1863938
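The gap between the exact answer (0.162) and the normal approximation (0.186) can be narrowed with a continuity correction. The sketch below is an addition, not part of the original solution; it reuses n = 200 and π = 0.176 from the exercise and evaluates the normal curve at 40.5 instead of 40.

```r
# Parameters taken from Exercise 4.53: n = 200 households, pi = 0.176.
n <- 200
p <- 0.176
mu    <- n * p                     # mean of the binomial count
sigma <- sqrt(n * p * (1 - p))     # standard deviation of the binomial count

exact     <- 1 - pbinom(40, n, p)         # exact binomial P(X > 40)
approx    <- 1 - pnorm(40, mu, sigma)     # normal approximation used in the solution
approx_cc <- 1 - pnorm(40.5, mu, sigma)   # continuity correction (an addition, not in the solution)

round(c(exact = exact, approx = approx, approx_cc = approx_cc), 4)
```

The corrected value should land noticeably closer to the exact binomial probability than the uncorrected one.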
4.54 mean = 65 and standard deviation = 1.5. Because n = 100, by the Central Limit
Theorem the sampling distribution of X is approximately normal.
4.55
a
4.56
d
10, .2
b 10, .4
c 50, 1.2
d 50, 2.4
Partial Solutions Using R – Alan Arnholt – typos are completely my fault – If you find them let me know.
4.57 The sampling distribution of y should be approximately normal, so the standard
deviation of y can be approximated with Range/4. i.e., standard deviation is
approximately 800/4 = 200.
4.58
Let X = yards per reception; X ~ (17, 8); when n = 75, X̄ ~ approx N(17, 8/√75)
P(X̄ > 20) = 1 − P(X̄ ≤ 20)
> 1-pnorm(20,17,8/sqrt(75))
[1] 0.0005819235
It is highly unlikely that he would average 20 yards for a season.
4.59
Let X = time to install new brakes. X ~ (56.7, 9.3); X̄ ~ approx N(56.7, 9.3/√36)
P(X̄ > 60) = 1 − P(X̄ ≤ 60)
> 1-pnorm(60,56.7,9.3/sqrt(36))
[1] 0.01662580
OR
Z = (60 - 56.7)/(9.3/√36) = 2.13 P( y > 60) = P(Z > 2.13) = 1 - .9834 = .0166
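The by-hand route and the direct pnorm() call are the same calculation. As a small check, assuming only the values given in Exercise 4.59, the Z score can be computed in R and passed to the standard normal:

```r
# Values from Exercise 4.59.
mu <- 56.7; sigma <- 9.3; n <- 36
se <- sigma / sqrt(n)                      # standard error of the sample mean
z  <- (60 - mu) / se                       # standardized value, about 2.13

p_z      <- pnorm(z, lower.tail = FALSE)   # P(Z > z) on the standard normal scale
p_direct <- 1 - pnorm(60, mu, se)          # same probability on the original scale
c(z = z, p_z = p_z, p_direct = p_direct)
```

The table answer (.0166) differs from the R answer only because Z is rounded to two decimals before the lookup.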
4.60
Let Y be the score on the exam. Y ~ (µ = 70, σ = 15). With n = 40,
Ȳ ~ approx N(70, 15/√40)
P(Ȳ > 75) = 1 − P(Ȳ ≤ 75)
> 1-pnorm(75,70,15/sqrt(40))
[1] 0.01750749
OR
Z = ( y - 70)/2.37.
P( y > 75) = P(Z > 2.11) = 1 -.9826 = .0174 Yes, it would be unusual.
4.61 For sufficiently large n, the sampling distribution of p is approximately normally
distributed with a mean of π and a standard deviation of √(π(1 − π)/n).
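A short simulation can make this result concrete. The values π = 0.70 and n = 400 below are illustrative assumptions, not taken from the exercise, and the seed is arbitrary.

```r
set.seed(1)                         # arbitrary seed, only for reproducibility
pi0  <- 0.70                        # illustrative population proportion (assumed)
n    <- 400                         # illustrative sample size (assumed)
phat <- rbinom(20000, n, pi0) / n   # 20000 simulated sample proportions

c(sim_mean = mean(phat), theory_mean = pi0)
c(sim_sd = sd(phat), theory_sd = sqrt(pi0 * (1 - pi0) / n))
```

The simulated mean and standard deviation should agree closely with π and √(π(1 − π)/n).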
4.62.1 Let X =number of patients cured out of 40. X~Bin(40,.70)
P ( X > 30) = 1 − P ( X ≤ 30) = 0.1959254
> 1-pbinom(30,40,.7)
[1] 0.1959254
OR
Let X = number of patients cured out of 40. X ~ approx N(40 × .7, √(40 × .7 × .3))
P(X > 30) = 1 − P(X ≤ 30) = 0.2450765
> 1-pnorm(30,40*.7,sqrt(40*.7*.3))
[1] 0.2450765
OR
Let p=proportion of patients cured out of 40.
p ~ approx N(0.70, √(0.70 × 0.30 / 40))
P ( p > 30 / 40) = 1 − P ( p ≤ 0.75) = 0.2450765
> 1-pnorm(.75,.7,sqrt(.7*.3/40))
[1] 0.2450765
OR by hand…
By the Central Limit Theorem p is approximately normal with mean = .7 and standard
deviation = √(.7 × .3 / 40) = 0.0725, so Z = (p - .7)/.0725. P(p > .75) = P(Z > .69) = 1 - .7549 = .2451
4.63
√(0.5 × 0.5 / 200) = 0.0354
> sqrt(.5*.5/200)
[1] 0.03535534
4.64.1 Let X = amount spent per day by employees on lodging, rental car, and food.
X ~ (167.21, 50).
X̄ ~ approx N(167.21, 50/√60)
P(X̄ ≤ 148.75) = 0.002119468
> pnorm(148.75,167.21,50/sqrt(60))
[1] 0.002119468
OR by hand…
Z = (148.75 - 167.21)/(50/√60) = -2.86. Thus the amount they spent is 2.86 standard
deviations below the national average. The chance that they would spend that much or less
is .0021, which is very unlikely.
4.65
a
Let X = time served for homicide. X ~ (149, 40).
X̄ ~ approx N(149, 40/√100)
P(X̄ > 145) = 1 − P(X̄ ≤ 145) = 0.8413447
> 1-pnorm(145,149,4)
[1] 0.8413447
OR by hand…
Z = (145 - 149)/(40/√100) = -1
P( y > 145) = P(Z > -1) = 1 -.1587 = .8413
b
P ( X < 155) = 0.9331928
> pnorm(155,149,4)
[1] 0.9331928
Or by hand...
Z = (155 - 149)/(40/√100) = 1.5
P( y < 155) = P(Z < 1.5) = .9332
c
P (151.2 ≤ X ≤ 159.6) = P ( X ≤ 159.6) − P ( X ≤ 151.2) = 0.2871351
> pnorm(159.6,149,4)-pnorm(151.2,149,4)
[1] 0.2871351
OR by hand …
Z = (159.6 - 149)/(40/√100) = 2.65
Z = (151.2 - 149)/(40/√100) = .55
P(151.2 < ȳ < 159.6) = P(.55 < Z < 2.65) = .9960 - .7088 = .2872
4.66
a.
Let X = number of students favoring construction on campus.
X~Bin(30, 0.60).
P ( X > 15) = 1 − P ( X ≤ 15) = 0.824631
> 1-pbinom(15,30,.6)
[1] 0.824631
Let X = number of students favoring construction on campus.
X ~ approx N(30 × 0.60, √(30 × 0.60 × 0.40))
P(X > 15) = 1 − P(X ≤ 15) = 0.8682238
> 1-pnorm(15,30*.6,sqrt(30*.6*.4))
[1] 0.8682238
Let p=proportion of students favoring construction on campus.
p ~ approx N(0.60, √(0.60 × 0.40 / 30))
P(p > 0.50) = 1 − P(p ≤ 0.50) = 0.8682238
> 1-pnorm(.5,.6,sqrt(.6*.4/30))
[1] 0.8682238
OR by hand …
Z = (.5 - .6)/√((.6)(.4)/30) = -1.12
P(p > .5) = P(Z > -1.12) = 1 - .1314 = .8686
b
P ( X ≥ 25) = 1 − P ( X ≤ 24) = 0.005658796
> 1-pbinom(24,30,.6)
[1] 0.005658796
P ( X ≥ 25) = 1 − P ( X ≤ 25) = 0.004543734
> 1-pnorm(25,30*.6,sqrt(30*.6*.4))
[1] 0.004543734
P(p ≥ 25/30) = 1 − P(p ≤ 25/30) = 0.004543734
> 1-pnorm(25/30,.6,sqrt(.6*.4/30))
[1] 0.004543734
OR by hand …
Z = (.833 - .6)/√((.6)(.4)/30) = 2.61
P(p ≥ .833) = P(Z ≥ 2.61) = 1 - .9955= .0045
c
P(X = 30) = (.6)^30 = 2.210739e-07
4.67
a.
X~N(75, 15)
P(X>80)=1-P(X<=80)
> 1-pnorm(80,75,15)
[1] 0.3694413
OR…
Z = (80 - 75)/15 = .33
P(y > 80) = P(Z > .33) = 1 - .6293 = .3707
b
P(84<X<95)=P(X<95)-P(X<84)
> pnorm(95,75,15)-pnorm(84,75,15)
[1] 0.1830419
OR…
Z = (95 - 75)/15 = 1.33
Z = (84 - 75)/15 = .6
P(84 < y < 95) = P(.6 < Z < 1.33) = .9082 - .7257 = .1825
c
P(X<b)=0.80 b=87.62432
> qnorm(.80,75,15)
[1] 87.62432
OR…
75 + (.84)(15) = 87.6
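The table value .84 used by hand is the 80th percentile of the standard normal. A minimal sketch, using only the mean 75 and standard deviation 15 from the exercise, shows that rescaling that percentile reproduces the qnorm() answer:

```r
z80     <- qnorm(0.80)            # 80th percentile of the standard normal, about 0.84
b_hand  <- 75 + z80 * 15          # rescale: mean + z * sd
b_qnorm <- qnorm(0.80, 75, 15)    # direct call used in the solution
c(z80 = z80, b_hand = b_hand, b_qnorm = b_qnorm)
```

The by-hand answer (87.6) differs from 87.62432 only because z is rounded to two decimals.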
d
P (X > 80) = 1 − P (X ≤ 80 )
> 1-pnorm(80,75,15/sqrt(25))
[1] 0.04779035
OR…
Z = (80 - 75)/(15/√25) = 1.67
P( y > 80) = P(Z > 1.67) = 1 - .9525 = .0475
4.68 Let the population be the number of days spent in the hospital; we then have
µ = 9 and σ = 4. With n = 30, ȳ is approximately normal with mean 9 and
standard deviation 4/√30 = .73. So Z = (ȳ - 9)/.73.
P(ȳ > 9.5) = P(Z > .68) = 1 - .7517 = .2483
X~approx N(9, 4/sqrt(30))
P(X>9.5)=1-P(X<=9.5)
> 1-pnorm(9.5,9,4/sqrt(30))
[1] 0.2467814
4.69
standard deviation = 2.5/√n = 0.4, so n = (2.5/.4)^2 = 39.0625; round up to n = 40.
4.70 Because of the Central Limit Theorem and the fact that the sample size is 100 the
sampling distribution of y will be approximately normally distributed as long as the
bimodality is not too severe. The mean of the sampling distribution = 20 and the
standard deviation = 4/√100 = .4.
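The claim can be checked by simulation. The exercise does not specify the shape of the bimodal population, so the mixture below (two normals centered at 16.5 and 23.5) is a hypothetical choice, picked only so that the population mean is 20 and the standard deviation is 4.

```r
set.seed(1)                                   # arbitrary seed for reproducibility
# Hypothetical bimodal population: equal mixture of two normals with
# centers 16.5 and 23.5; within-component sd chosen so the overall sd is 4.
s_within <- sqrt(4^2 - 3.5^2)
rbimodal <- function(m) {
  centers <- sample(c(16.5, 23.5), m, replace = TRUE)
  rnorm(m, mean = centers, sd = s_within)
}

samples <- matrix(rbimodal(10000 * 100), nrow = 10000, ncol = 100)
ybars   <- rowMeans(samples)                  # 10000 sample means, each with n = 100

c(mean = mean(ybars), sd = sd(ybars))         # compare with 20 and 4/sqrt(100) = 0.4
```

A call such as hist(ybars) should show a single roughly bell-shaped mound even though the population itself has two modes.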
4.71
Let Y be the index score. Y ~ (45, 8) and Ȳ ~ approx N(45, 8/√35) and
Z = (ȳ - 45)/1.35
P( y < 43) = P(Z < -1.48) = .0694
> pnorm(43,45,8/sqrt(35))
[1] 0.06956749
4.72
Let Y be the number of passengers per flight. Ȳ ~ approx N(212, 42/√50) and
Z = (ȳ - 212)/5.94
P( y < 200) = P(Z < -2.02) = .0217
OR…
> pnorm(200,212,42/sqrt(50))
[1] 0.02167588
4.73
Let X= # of people that use a credit card when dining out.
X~Bin(160, 0.12)
P(X<15)=P(X<=14)
> pbinom(14,160,.12)
# Exact answer
[1] 0.1238946
OR…
Let X = # of people that use a credit card when dining out.
X ~ approx N(160*.12, sqrt(160*.12*.88))
P(X<15)=P(X<=15)
> pnorm(15,160*.12,sqrt(160*.12*.88))
[1] 0.1534426
OR…
Let p = the proportion of people who use a credit card when dining out.
p~ approx N(.12, sqrt(.12*.88/160))
P(p<15/160)
> pnorm(15/160,.12,sqrt(.12*.88/160))
[1] 0.1534426
OR by hand…and charts
Z = (p - .12)/.0257
P(p < .094) = P(Z < -1.01) = .1562
4.74
If n = 200 then the maximum standard deviation is √(.5 × .5 / 200) = 0.0354.
√(.5 × .5 / n) = 0.05 ⇒ n = 100
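Both steps of this answer can be reproduced in R. The helper name max_sd is hypothetical, introduced only for this sketch; it evaluates the standard deviation of a sample proportion at π = 0.5, where it is largest.

```r
# Largest possible standard deviation of a sample proportion (occurs at pi = 0.5).
max_sd <- function(n) sqrt(0.5 * 0.5 / n)
max_sd(200)                        # about 0.0354

# Solve sqrt(.5 * .5 / n) = 0.05 for n.
n_needed <- 0.5 * 0.5 / 0.05^2
n_needed
```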
4.75
Let X = # of crimes where a firearm was used. X~Bin(200,.29)
P(X>65)=1-P(X<=65)
> 1-pbinom(65,200,.29)
[1] 0.1219605
OR…
Let X = # of crimes where a firearm was used. X~approx
N(200*.29,sqrt(200*.29*.71) )
P(X>65)=1-P(X<=65)
> 1-pnorm(65,200*.29,sqrt(200*.29*.71))
[1] 0.1376751
OR…
Let p=the proportion of crimes where a firearm was used.
p~approx N(.29,sqrt(.29*.71/200) )
P(p>65/200)=1-P(p<=65/200)
> 1-pnorm(65/200,.29,sqrt(.29*.71/200))
[1] 0.1376751
Or by hand and charts…
Z = (p - .29)/√(.29(.71)/200)
If p = 65/200 = .325 then Z = (.325 - .29)/√(.29(.71)/200) = 1.09, which is not
unusual at all. It is very conceivable that more than 65 out of a random sample of
200 criminal victimizations would involve firearms.
NOTE: We consider something unusual if it generally occurs less than 10%
of the time. Some may use 5%.
4.76
Let X = number of US students awarded a PhD.
X ~Bin(32, .48)
P(X <=14)
> pbinom(14,32,.48)
[1] 0.3815667
which is not an unusually low probability. The percent of PhDs awarded US
students was not unusually low.
4.77
standard deviation = 15/√n = 2 thus n = (15/2)^2 = 56.25, so round up to n = 57
4.78
standard deviation = 12/√n = 1 thus n = (12/1)^2 = 144
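The sample-size answers in 4.69, 4.77 and 4.78 all follow the same pattern: solve σ/√n = target for n and round up. A minimal sketch, with n_for_se as a hypothetical helper name:

```r
# Smallest n for which sigma / sqrt(n) is at most the target standard error.
n_for_se <- function(sigma, se) ceiling((sigma / se)^2)

n_for_se(15, 2)      # Exercise 4.77
n_for_se(12, 1)      # Exercise 4.78
n_for_se(2.5, 0.4)   # Exercise 4.69
```

Rounding up rather than to the nearest integer guarantees the standard error does not exceed the target.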
4.79
Let X=amount college students study.
X ~ (8, 3)
n=100
X̄ ~ approx N(8, 0.3)
P(X̄ > 9.2) = 1 − P(X̄ ≤ 9.2) = 3.167124e-05
> 1-pnorm(9.2,8,.3)
[1] 3.167124e-05
OR…
Z = (9.2 - 8)/(3/√100) = 4. The amount these students study is a full 4 standard
deviations above what typical college students study. This almost certainly did
not happen by chance.
4.80
a
rnval <- rnorm(20000*5,2,.5)            # 20000 samples of size 5
rnvalM <- matrix(rnval,20000,5)         # 20000 * 5 matrix of rnval
xbars <- apply(rnvalM,1,mean)           # xbars
medians <- apply(rnvalM,1,median)       # medians
par(mfrow=c(2,2))
hist(xbars,breaks="Scott",xlim=c(1,3),col="pink",
     main=list(expression(paste("Simulated Sampling Distribution of ", bar(X)))),prob=TRUE)
hist(medians,breaks="Scott",xlim=c(1,3),col="lightblue",
     main="Simulated Sampling Distribution of Md")
boxplot(xbars,medians,names=c(expression(bar(X)),"Md"),col=c("pink","lightblue"))
plot(density(xbars),col="pink",lwd=2,xlab="",ylab="",main="")
lines(density(medians),col="lightblue",lwd=2)
par(mfrow=c(1,1))
[Figure: 2 × 2 panel of plots. Top left: density histogram titled "Simulated Sampling Distribution of X̄" (xbars, 1.0 to 3.0). Top right: frequency histogram titled "Simulated Sampling Distribution of Md" (medians, 1.0 to 3.0). Bottom left: side-by-side boxplots of X̄ and Md. Bottom right: overlaid density curves of xbars and medians.]
b. Both sampling distributions are centered at 2.0.
c. Both distributions are symmetrical and appear to be normally distributed.
d. The distribution of the median is more variable.
e. The values for the sample mean are more tightly clustered about the population
mean than are the values of the sample median when the population is normally
distributed.
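Parts b, d and e can also be checked numerically instead of by eye. The sketch below repeats the simulation with an arbitrary seed (the original code sets none) and compares centers and spreads; rowMeans() is used in place of apply(..., mean), which gives the same values.

```r
set.seed(1)                                    # arbitrary seed for reproducibility
samples <- matrix(rnorm(20000 * 5, 2, 0.5), nrow = 20000, ncol = 5)
xbars   <- rowMeans(samples)                   # sample means
medians <- apply(samples, 1, median)           # sample medians

c(mean_xbars = mean(xbars), mean_medians = mean(medians))  # both should be near 2
c(sd_xbars = sd(xbars), sd_medians = sd(medians))          # the median should vary more
0.5 / sqrt(5)                                              # theoretical sd of the sample mean
```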
4.81
a
[Figure: 2 × 2 panel of plots. Density histogram titled "Simulated Sampling Distribution of X̄" (xbars, 1.6 to 2.4), frequency histogram titled "Simulated Sampling Distribution of Md" (medians, 1.6 to 2.4), side-by-side boxplots of X̄ and Md, and overlaid density curves of xbars and medians.]
The results are essentially the same as the results found in Exercise 4.80 except that
both distributions are less variable. When the sample size is 5 the distributions
vary from about 1.6 to 2.4 but when the sample size is 30 they vary from about 1.8
to 2.2. Recall that as the sample size increases the standard deviation of the
statistic decreases.
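The code for this exercise is not shown, so the following is only a sketch that assumes the same N(2, 0.5) population as Exercise 4.80 and compares the spread of both statistics at n = 5 and n = 30. The helper name sim_sd and the seed are assumptions made for illustration.

```r
set.seed(1)                                    # arbitrary seed for reproducibility
# Simulated standard deviations of the sample mean and sample median
# for samples of size n from N(2, 0.5) (population assumed from 4.80).
sim_sd <- function(n, reps = 20000) {
  samples <- matrix(rnorm(reps * n, 2, 0.5), nrow = reps, ncol = n)
  c(xbar = sd(rowMeans(samples)), median = sd(apply(samples, 1, median)))
}
sd_n5  <- sim_sd(5)
sd_n30 <- sim_sd(30)
rbind(n5 = sd_n5, n30 = sd_n30)                # both columns shrink as n grows
```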
4.82
a.
rnval <- rexp(20000*30,1/5)             # 20000 samples of size 30
rnvalM <- matrix(rnval,20000,30)        # 20000 * 30 matrix of rnval
xbars <- apply(rnvalM,1,mean)           # xbars
medians <- apply(rnvalM,1,median)       # medians
par(mfrow=c(2,2))
hist(xbars,breaks="Scott",xlim=c(0,10),col="pink",
     main=list(expression(paste("Simulated Sampling Distribution of ", bar(X)))),prob=TRUE)
hist(medians,breaks="Scott",xlim=c(0,10),col="lightblue",
     main="Simulated Sampling Distribution of Md")
boxplot(xbars,medians,names=c(expression(bar(X)),"Md"),col=c("pink","lightblue"))
plot(density(xbars),col="pink",lwd=2,xlab="",ylab="",main="")
lines(density(medians),col="lightblue",lwd=2)
par(mfrow=c(1,1))
[Figure: 2 × 2 panel of plots. Top left: density histogram titled "Simulated Sampling Distribution of X̄" (xbars, 0 to 10). Top right: frequency histogram titled "Simulated Sampling Distribution of Md" (medians, 0 to 10). Bottom left: side-by-side boxplots of X̄ and Md. Bottom right: overlaid density curves of xbars and medians.]
> summary(values)
     xbars            medians
 Min.   : 2.166   Min.   :1.117
 1st Qu.: 4.346   1st Qu.:2.896
 Median : 4.937   Median :3.463
 Mean   : 4.992   Mean   :3.542
 3rd Qu.: 5.574   3rd Qu.:4.117
 Max.   :10.217   Max.   :8.777
> sd(values)
    xbars   medians
0.9192525 0.9101621
> IQR(xbars)
[1] 1.227763
> IQR(medians)
[1] 1.220481
b. Using summary and sd we see that the sampling distribution of the mean is
centered around 5 (the mean of the population) and the sampling distribution of the
median is centered around 3.5.
c. Both distributions are skewed right.
d. From the summary statistics we see that the distribution of the median is a little
less variable than is the distribution of the mean.
e. The sampling distributions are skewed right because the original population
(exponential) is skewed right. These sampling distributions are based on samples
of 30 observations from the population; consequently they are not as skewed as the
population.
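The two centers reported above (about 5 for the mean and about 3.5 for the median) match the population mean and median of an exponential distribution with rate 1/5. A quick check using only base R:

```r
rate <- 1 / 5                        # rate passed to rexp() in the simulation
pop_mean   <- 1 / rate               # mean of an exponential distribution
pop_median <- qexp(0.5, rate)        # median, equal to log(2) / rate
c(pop_mean = pop_mean, pop_median = pop_median)
```

So the sample median is centered near the population median, not the population mean, which is why the two sampling distributions sit in different places.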
4.84
[Figure: 2 × 2 panel of plots. Density histogram titled "Simulated Sampling Distribution of X̄" (xbars, 1 to 8), frequency histogram titled "Simulated Sampling Distribution of Md" (medians, 1 to 8), side-by-side boxplots of X̄ and Md, and overlaid density curves of xbars and medians.]
> summary(values)
     xbars           medians
 Min.   :3.416   Min.   :1.773
 1st Qu.:4.656   1st Qu.:3.143
 Median :4.980   Median :3.464
 Mean   :5.001   Mean   :3.490
 3rd Qu.:5.332   3rd Qu.:3.813
 Max.   :7.106   Max.   :5.829
> sd(values)
    xbars   medians
0.5003937 0.4996664
> IQR(xbars)
[1] 0.6757592
> IQR(medians)
[1] 0.6699149
From the descriptive statistics we see that the sampling distribution of the mean is
centered around 5 (the same as the population mean) and the sampling distribution
of the median is centered around 3.5. These sampling distributions are less skewed
than in Exercise 4.83 when the sample size was only 5. As the sample size
increases the sampling distributions of the mean and the median become more
normally distributed.
4.85
> library(BSDA)
> data(Kidsmoke)
> str(Kidsmoke)
`data.frame':   1000 obs. of  2 variables:
 $ gender: int  0 0 0 0 1 1 1 0 1 1 ...
 $ smoke : int  0 0 0 0 1 0 0 0 0 0 ...
> attach(Kidsmoke)
> table(gender,smoke)
      smoke
gender   0   1
     0 375 105
     1 418 102
> x <- table(gender,smoke)
> addmargins(x)
      smoke
gender    0    1  Sum
   0    375  105  480
   1    418  102  520
   Sum  793  207 1000
> addmargins(x)/1000
      smoke
gender     0     1  Sum
   0   0.375 0.105 0.48
   1   0.418 0.102 0.52
   Sum 0.793 0.207 1.00
a 20.7% of the kids smoke.
b 105 of the 480 females smoke. The percent is 105/480 = 21.875%.
c 102 of the 520 males smoke. The percent is 102/520 = 19.615%.
d There are 520 males and 480 females in the study.
e A somewhat higher percentage of females (21.875%) smoke than males (19.615%).
f These percents are considerably higher than those found in the NCHS
survey.
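The within-gender percentages in parts b and c are row proportions of the table, which prop.table() computes directly. To keep the sketch runnable without the BSDA package, the counts are typed in from the table(gender, smoke) output; treating gender 0 as female follows the answers above.

```r
# Counts copied from the table(gender, smoke) output;
# gender 0 is taken to be female, as in the answers.
x <- matrix(c(375, 105,
              418, 102),
            nrow = 2, byrow = TRUE,
            dimnames = list(gender = c("0", "1"), smoke = c("0", "1")))

prop.table(x, 1)           # row proportions: smoking rate within each gender
sum(x[, "1"]) / sum(x)     # overall proportion of smokers
```

With the Kidsmoke data loaded, prop.table(table(gender, smoke), 1) should give the same row proportions.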