Section 4.4 Summary and Review Exercises

4.50 Because both values refer to the percentage of all voters, they are parameters.

4.51 If we are interested only in automobile accidents in 1993, then 15.9 is a parameter. If the population is all automobile accidents, then these data are treated as a sample and 15.9 is a statistic.

4.52 Because both values refer to the percentage in the poll, they are statistics.

4.53 Let X = number of households that earn more than $50,000.
X ~ Bin(200, .176)
P(X > 40) = 1 - P(X <= 40)
> 1-pbinom(40,200,.176)   # Exact answer
[1] 0.1622405
OR…
Let X = number of households that earn more than $50,000.
X ~ approx N(200*.176, sqrt(200*.176*(1-.176)))
P(X > 40) = 1 - P(X <= 40)
> 1-pnorm(40,200*.176,sqrt(200*.176*(1-.176)))
[1] 0.1863938
OR…
Let p = proportion of households that earn more than $50,000.
p ~ approx N(.176, sqrt(.176*(1-.176)/200))
P(p > 40/200) = 1 - P(p <= 40/200)
> 1-pnorm(40/200,.176,sqrt(.176*(1-.176)/200))   # approx
[1] 0.1863938

4.54 mean = 65 and standard deviation = 1.5. Because n = 100, by the Central Limit Theorem the sampling distribution of X̄ is approximately normal.

4.55 d

4.56
a. 10, .2
b. 10, .4
c. 50, 1.2
d. 50, 2.4

Partial Solutions Using R – Alan Arnholt – typos are completely my fault – If you find them let me know.

4.57 The sampling distribution of ȳ should be approximately normal, so the standard deviation of ȳ can be approximated by Range/4; i.e., the standard deviation is approximately 800/4 = 200.

4.58 Let X = yards per reception; X ~ (17, 8). When n = 75, X̄ ~ approx N(17, 8/sqrt(75)).
P(X̄ > 20) = 1 − P(X̄ ≤ 20)
> 1-pnorm(20,17,8/sqrt(75))
[1] 0.0005819235
It is highly unlikely that he would average 20 yards for a season.

4.59 Let X = time to install new brakes. X ~ (56.7, 9.3); with n = 36, X̄ ~ approx N(56.7, 9.3/sqrt(36)).
P(X̄ > 60) = 1 − P(X̄ ≤ 60)
> 1-pnorm(60,56.7,9.3/sqrt(36))
[1] 0.01662580
OR…
Z = (60 - 56.7)/(9.3/√36) = 2.13
P(ȳ > 60) = P(Z > 2.13) = 1 - .9834 = .0166

4.60 Let Y be the score on the exam. Y ~ (µ = 70, σ = 15).
With n = 40, Ȳ ~ approx N(70, 15/sqrt(40)).
P(Ȳ > 75) = 1 − P(Ȳ ≤ 75)
> 1-pnorm(75,70,15/sqrt(40))
[1] 0.01750749
OR…
Z = (ȳ - 70)/2.37
P(ȳ > 75) = P(Z > 2.11) = 1 - .9826 = .0174
Yes, it would be unusual.

4.61 For sufficiently large n, the sampling distribution of p is approximately normal with a mean of π and a standard deviation of √(π(1 − π)/n).

4.62.1 Let X = number of patients cured out of 40. X ~ Bin(40, .70)
P(X > 30) = 1 − P(X ≤ 30) = 0.1959254
> 1-pbinom(30,40,.7)
[1] 0.1959254
OR…
Let X = number of patients cured out of 40. X ~ approx N(40*.7, sqrt(40*.7*.3))
P(X > 30) = 1 − P(X ≤ 30) = 0.2450765
> 1-pnorm(30,40*.7,sqrt(40*.7*.3))
[1] 0.2450765
OR…
Let p = proportion of patients cured out of 40. p ~ approx N(0.70, sqrt(0.70*0.30/40))
P(p > 30/40) = 1 − P(p ≤ 0.75) = 0.2450765
> 1-pnorm(.75,.7,sqrt(.7*.3/40))
[1] 0.2450765
OR by hand…
By the Central Limit Theorem p is approximately normal with mean = .7 and standard deviation = √(.7 × .3/40) = 0.0725, so Z = (p - .7)/.0725.
P(p > .75) = P(Z > .69) = 1 - .7549 = .2451

4.63 √(0.5 × 0.5/200) = 0.0354
> sqrt(.5*.5/200)
[1] 0.03535534

4.64.1 Let X = amount spent by employees per day for lodging, rental car, and food. X ~ (167.21, 50). With n = 60, X̄ ~ approx N(167.21, 50/sqrt(60)).
P(X̄ ≤ 148.75) = 0.002119468
> pnorm(148.75,167.21,50/sqrt(60))
[1] 0.002119468
OR by hand…
Z = (148.75 - 167.21)/(50/√60) = -2.86
Thus the amount they spent is 2.86 standard deviations below the national average. The chance that they would spend that much or less is .0021, which is very unlikely.

4.65
a. Let X = time served for homicide. X ~ (149, 40).
With n = 100, X̄ ~ approx N(149, 40/sqrt(100)).
P(X̄ > 145) = 1 − P(X̄ ≤ 145) = 0.8413447
> 1-pnorm(145,149,4)
[1] 0.8413447
OR by hand…
Z = (145 - 149)/(40/√100) = -1
P(ȳ > 145) = P(Z > -1) = 1 - .1587 = .8413

b. P(X̄ < 155) = 0.9331928
> pnorm(155,149,4)
[1] 0.9331928
OR by hand…
Z = (155 - 149)/(40/√100) = 1.5
P(ȳ < 155) = P(Z < 1.5) = .9332

c. P(151.2 ≤ X̄ ≤ 159.6) = P(X̄ ≤ 159.6) − P(X̄ ≤ 151.2) = 0.2871351
> pnorm(159.6,149,4)-pnorm(151.2,149,4)
[1] 0.2871351
OR by hand…
Z = (159.6 - 149)/(40/√100) = 2.65
Z = (151.2 - 149)/(40/√100) = .55
P(151.2 < ȳ < 159.6) = P(.55 < Z < 2.65) = .9960 - .7088 = .2872

4.66
a. Let X = number of students favoring construction on campus. X ~ Bin(30, 0.60).
P(X > 15) = 1 − P(X ≤ 15) = 0.824631
> 1-pbinom(15,30,.6)
[1] 0.824631
OR…
Let X = number of students favoring construction on campus. X ~ approx N(30*0.60, sqrt(30*0.60*0.40))
P(X > 15) = 1 − P(X ≤ 15) = 0.8682238
> 1-pnorm(15,30*.6,sqrt(30*.6*.4))
[1] 0.8682238
OR…
Let p = proportion of students favoring construction on campus. p ~ approx N(0.60, sqrt(0.60*0.40/30))
P(p > 0.50) = 1 − P(p ≤ 0.50) = 0.8682238
> 1-pnorm(.5,.6,sqrt(.6*.4/30))
[1] 0.8682238
OR by hand…
Z = (.5 - .6)/√((.6)(.4)/30) = -1.12
P(p > .5) = P(Z > -1.12) = 1 - .1314 = .8686

b. P(X ≥ 25) = 1 − P(X ≤ 24) = 0.005658796
> 1-pbinom(24,30,.6)
[1] 0.005658796
OR, by the normal approximation to the count:
P(X ≥ 25) = 1 − P(X ≤ 25) = 0.004543734
> 1-pnorm(25,30*.6,sqrt(30*.6*.4))
[1] 0.004543734
OR, by the normal approximation to the proportion:
P(p ≥ 25/30) = 1 − P(p ≤ 25/30) = 0.004543734
> 1-pnorm(25/30,.6,sqrt(.6*.4/30))
[1] 0.004543734
OR by hand…
Z = (.833 - .6)/√((.6)(.4)/30) = 2.61
P(p ≥ .833) = P(Z ≥ 2.61) = 1 - .9955 = .0045

c. P(X = 30) = (.6)^30 = 2.210739e-07

4.67
a.
X ~ N(75, 15)
P(X > 80) = 1 - P(X <= 80)
> 1-pnorm(80,75,15)
[1] 0.3694413
OR…
Z = (80 - 75)/15 = .33
P(y > 80) = P(Z > .33) = 1 - .6293 = .3707

b. P(84 < X < 95) = P(X < 95) - P(X < 84)
> pnorm(95,75,15)-pnorm(84,75,15)
[1] 0.1830419
OR…
Z = (95 - 75)/15 = 1.33
Z = (84 - 75)/15 = .6
P(84 < y < 95) = P(.6 < Z < 1.33) = .9082 - .7257 = .1825

c. P(X < b) = 0.80 gives b = 87.62432
> qnorm(.80,75,15)
[1] 87.62432
OR…
75 + (.84)(15) = 87.6

d. With n = 25, P(X̄ > 80) = 1 − P(X̄ ≤ 80)
> 1-pnorm(80,75,15/sqrt(25))
[1] 0.04779035
OR…
Z = (80 - 75)/(15/√25) = 1.67
P(ȳ > 80) = P(Z > 1.67) = 1 - .9525 = .0475

4.68 Let the population be the number of days spent in the hospital, so that µ = 9 and σ = 4. With n = 30, ȳ is approximately normal with mean 9 and standard deviation 4/√30 = .73, so Z = (ȳ - 9)/.73.
P(ȳ > 9.5) = P(Z > .68) = 1 - .7517 = .2483
OR…
X̄ ~ approx N(9, 4/sqrt(30))
P(X̄ > 9.5) = 1 - P(X̄ <= 9.5)
> 1-pnorm(9.5,9,4/sqrt(30))
[1] 0.2467814

4.69 standard deviation = 2.5/√n = 0.4, so n = (2.5/.4)^2 = 39.0625, which rounds up to 40.

4.70 Because of the Central Limit Theorem and the fact that the sample size is 100, the sampling distribution of ȳ will be approximately normal as long as the bimodality is not too severe. The mean of the sampling distribution = 20 and the standard deviation = 4/√100 = .4.

4.71 Let Y be the index score. Y ~ (45, 8) and, with n = 35, Ȳ ~ approx N(45, 8/sqrt(35)), so Z = (ȳ - 45)/1.35.
P(ȳ < 43) = P(Z < -1.48) = .0694
OR…
> pnorm(43,45,8/sqrt(35))
[1] 0.06956749

4.72 Let Y be the number of passengers per flight. With n = 50, Ȳ ~ approx N(212, 42/sqrt(50)), so Z = (ȳ - 212)/5.94.
P(ȳ < 200) = P(Z < -2.02) = .0217
OR…
> pnorm(200,212,42/sqrt(50))
[1] 0.02167588

4.73 Let X = # of people that use a credit card when dining out.
X ~ Bin(160, 0.12)
P(X < 15) = P(X <= 14)
> pbinom(14,160,.12)   # Exact answer
[1] 0.1238946
OR…
Let X = # of people that use a credit card when dining out.
X ~ approx N(160*.12, sqrt(160*.12*.88))
P(X < 15) = P(X <= 15)
> pnorm(15,160*.12,sqrt(160*.12*.88))
[1] 0.1534426
OR…
Let p = the proportion of people that use a credit card when dining out.
p ~ approx N(.12, sqrt(.12*.88/160))
P(p < 15/160)
> pnorm(15/160,.12,sqrt(.12*.88/160))
[1] 0.1534426
OR by hand and charts…
Z = (p - .12)/.0257
P(p < .094) = P(Z < -1.01) = .1562

4.74 If n = 200 then the maximum standard deviation is √(.5 × .5/200) = 0.0354.
Setting √(.5 × .5/n) = 0.05 gives n = 100.

4.75 Let X = # of crimes where a firearm was used.
X ~ Bin(200, .29)
P(X > 65) = 1 - P(X <= 65)
> 1-pbinom(65,200,.29)
[1] 0.1219605
OR…
Let X = # of crimes where a firearm was used.
X ~ approx N(200*.29, sqrt(200*.29*.71))
P(X > 65) = 1 - P(X <= 65)
> 1-pnorm(65,200*.29,sqrt(200*.29*.71))
[1] 0.1376751
OR…
Let p = the proportion of crimes where a firearm was used.
p ~ approx N(.29, sqrt(.29*.71/200))
P(p > 65/200) = 1 - P(p <= 65/200)
> 1-pnorm(65/200,.29,sqrt(.29*.71/200))
[1] 0.1376751
OR by hand and charts…
Z = (p - .29)/√(.29(.71)/200)
If p = 65/200 = .325 then Z = (.325 - .29)/√(.29(.71)/200) = 1.09, which is not unusual at all. It is very conceivable that more than 65 out of a random sample of 200 criminal victimizations would involve firearms.
NOTE: We consider something unusual if it generally occurs less than 10% of the time. Some may use 5%.

4.76 Let X = number of US students awarded a PhD.
X ~ Bin(32, .48)
P(X <= 14)
> pbinom(14,32,.48)
[1] 0.3815667
This is not an unusually low probability, so the percent of PhDs awarded to US students was not unusually low.
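The exact binomial answers in 4.73 and 4.75 differ noticeably from their normal approximations. A continuity correction, which the solutions above do not use, usually narrows that gap. The following is a minimal sketch using the 4.75 numbers; the object names are illustrative only, and the corrected value is an addition rather than part of the original solution.

```r
# Exercise 4.75 setup: X ~ Bin(200, .29), find P(X > 65)
n <- 200
p <- 0.29
k <- 65

exact <- 1 - pbinom(k, n, p)           # exact binomial tail probability

mu    <- n * p                         # mean of the count
sigma <- sqrt(n * p * (1 - p))         # standard deviation of the count

approx_plain <- 1 - pnorm(k, mu, sigma)        # approximation used in the solution
approx_cc    <- 1 - pnorm(k + 0.5, mu, sigma)  # with continuity correction (not in the solution)

# Side-by-side comparison of the three values
round(c(exact = exact, plain = approx_plain, corrected = approx_cc), 4)
```

The correction replaces the cutoff 65 with 65.5 because the event X > 65 for an integer count is the same as X ≥ 66, and the halfway point between 65 and 66 is the natural boundary on the continuous scale.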
4.77 standard deviation = 15/√n = 2, thus n = (15/2)^2 = 56.25, which rounds up to 57.

4.78 standard deviation = 12/√n = 1, thus n = (12/1)^2 = 144.

4.79 Let X = amount college students study. X ~ (8, 3) and n = 100, so X̄ ~ approx N(8, 0.3).
P(X̄ > 9.2) = 1 − P(X̄ ≤ 9.2) = 3.167124e-05
> 1-pnorm(9.2,8,.3)
[1] 3.167124e-05
OR…
Z = (9.2 - 8)/(3/√100) = 4
The amount these students study is a full 4 standard deviations above what typical college students study. This almost certainly did not happen by chance.

4.80
a.
rnval <- rnorm(20000*5,2,.5)              # 20000 samples of size 5
rnvalM <- matrix(rnval,20000,5)           # 20000 * 5 matrix of rnval
xbars <- apply(rnvalM,1,mean)             # xbars
medians <- apply(rnvalM,1,median)         # medians
par(mfrow=c(2,2))
hist(xbars,breaks="Scott",xlim=c(1,3),col="pink",main=list(expression(paste("Simulated Sampling Distribution of ", bar(X)))),prob=TRUE)
hist(medians,breaks="Scott",xlim=c(1,3),col="lightblue",main="Simulated Sampling Distribution of Md")
boxplot(xbars,medians,names=c(expression(bar(X)),"Md"),col=c("pink","lightblue"))
plot(density(xbars),col="pink",lwd=2,xlab="",ylab="",main="")
lines(density(medians),col="lightblue",lwd=2)
par(mfrow=c(1,1))

[Figure: 2 x 2 panel of plots. Histogram "Simulated Sampling Distribution of X̄" (density scale), histogram "Simulated Sampling Distribution of Md" (frequency scale), side-by-side boxplots of X̄ and Md, and overlaid density curves of xbars and medians. Horizontal axes run from 1.0 to 3.0.]

b. Both sampling distributions are centered at 2.0.
c. Both distributions are symmetrical and appear to be normally distributed.
d. The distribution of the median is more variable.
e. The values of the sample mean are more tightly clustered about the population mean than are the values of the sample median when the population is normally distributed.
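The simulated spreads in 4.80 can be checked against theory: the standard deviation of X̄ should be close to σ/√n = .5/√5. The sketch below repeats the same setup (normal population with mean 2 and standard deviation .5, 20000 samples of size 5). The seed and the object names are arbitrary choices, and rowMeans stands in for apply(..., 1, mean) only because it is faster.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
n_samples <- 20000                   # number of simulated samples
n <- 5                               # size of each sample

# One simulated sample per row
sims <- matrix(rnorm(n_samples * n, mean = 2, sd = 0.5),
               nrow = n_samples, ncol = n)

xbars   <- rowMeans(sims)            # sample means
medians <- apply(sims, 1, median)    # sample medians

theory_sd <- 0.5 / sqrt(n)           # sigma / sqrt(n) for the sample mean

# Compare the simulated spreads with the theoretical value for the mean
c(sd_xbar = sd(xbars), sd_median = sd(medians), theory_sd_xbar = theory_sd)
```

The simulated standard deviation of the means should land near the theoretical value, and the standard deviation of the medians should come out larger, which is the numerical version of parts d and e.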
4.81
a.
[Figure: 2 x 2 panel of plots. Histogram "Simulated Sampling Distribution of X̄" (density scale), histogram "Simulated Sampling Distribution of Md" (frequency scale), side-by-side boxplots of X̄ and Md, and overlaid density curves of xbars and medians. Horizontal axes run from 1.6 to 2.4.]

The results are essentially the same as the results found in Exercise 4.80 except that both distributions are less variable. When the sample size is 5 the distributions vary from about 1.6 to 2.4, but when the sample size is 30 they vary from about 1.8 to 2.2. Recall that as the sample size increases the standard deviation of the statistic decreases.

4.82
a.
rnval <- rexp(20000*30,1/5)               # 20000 samples of size 30
rnvalM <- matrix(rnval,20000,30)          # 20000 * 30 matrix of rnval
xbars <- apply(rnvalM,1,mean)             # xbars
medians <- apply(rnvalM,1,median)         # medians
par(mfrow=c(2,2))
hist(xbars,breaks="Scott",xlim=c(0,10),col="pink",main=list(expression(paste("Simulated Sampling Distribution of ", bar(X)))),prob=TRUE)
hist(medians,breaks="Scott",xlim=c(0,10),col="lightblue",main="Simulated Sampling Distribution of Md")
boxplot(xbars,medians,names=c(expression(bar(X)),"Md"),col=c("pink","lightblue"))
plot(density(xbars),col="pink",lwd=2,xlab="",ylab="",main="")
lines(density(medians),col="lightblue",lwd=2)
par(mfrow=c(1,1))

[Figure: 2 x 2 panel of plots. Histogram "Simulated Sampling Distribution of X̄" (density scale), histogram "Simulated Sampling Distribution of Md" (frequency scale), side-by-side boxplots of X̄ and Md, and overlaid density curves of xbars and medians. Horizontal axes run from 0 to 10.]

> summary(values)
     xbars            medians
 Min.   : 2.166   Min.   :1.117
 1st Qu.: 4.346   1st Qu.:2.896
 Median : 4.937   Median :3.463
 Mean   : 4.992   Mean   :3.542
 3rd Qu.: 5.574   3rd Qu.:4.117
 Max.   :10.217   Max.   :8.777
> sd(values)
    xbars   medians
0.9192525 0.9101621
> IQR(xbars)
[1] 1.227763
> IQR(medians)
[1] 1.220481

b. Using summary and sd we see that the sampling distribution of the mean is centered around 5 (the mean of the population) and the sampling distribution of the median is centered around 3.5.
c. Both distributions are skewed right.
d. From the summary statistics we see that the distribution of the median is a little less variable than the distribution of the mean.
e. The sampling distributions are skewed right because the original population (exponential) is skewed right. These sampling distributions are based on samples of 30 observations from the population (see the code in part a); consequently they are not as skewed as the population.

4.84
[Figure: 2 x 2 panel of plots. Histogram "Simulated Sampling Distribution of X̄" (density scale), histogram "Simulated Sampling Distribution of Md" (frequency scale), side-by-side boxplots of X̄ and Md, and overlaid density curves of xbars and medians. Horizontal axes run from 1 to 8.]

> summary(values)
     xbars           medians
 Min.   :3.416   Min.   :1.773
 1st Qu.:4.656   1st Qu.:3.143
 Median :4.980   Median :3.464
 Mean   :5.001   Mean   :3.490
 3rd Qu.:5.332   3rd Qu.:3.813
 Max.   :7.106   Max.   :5.829
> sd(values)
    xbars   medians
0.5003937 0.4996664
> IQR(xbars)
[1] 0.6757592
> IQR(medians)
[1] 0.6699149

From the descriptive statistics we see that the sampling distribution of the mean is centered around 5 (the same as the population mean) and the sampling distribution of the median is centered around 3.5. These sampling distributions are less skewed than in Exercise 4.83 when the sample size was only 5. As the sample size increases the sampling distributions of the mean and the median become more nearly normal.

4.85
> library(BSDA)
> data(Kidsmoke)
> str(Kidsmoke)
'data.frame': 1000 obs. of 2 variables:
 $ gender: int 0 0 0 0 1 1 1 0 1 1 ...
 $ smoke : int 0 0 0 0 1 0 0 0 0 0 ...
> attach(Kidsmoke)
> table(gender,smoke)
      smoke
gender   0   1
     0 375 105
     1 418 102
> x <- table(gender,smoke)
> addmargins(x)
      smoke
gender    0    1  Sum
   0    375  105  480
   1    418  102  520
   Sum  793  207 1000
> addmargins(x)/1000
      smoke
gender     0     1  Sum
   0   0.375 0.105 0.48
   1   0.418 0.102 0.52
   Sum 0.793 0.207 1.00

a. 20.7% of the kids smoke.
b. 105 of the 480 females smoke. The percent is 105/480 = 21.875%.
c. 102 of the 520 males smoke. The percent is 102/520 = 19.615%.
d. There are 520 males and 480 females in the study.
e. Slightly more females (21.875%) smoke than males (19.615%).
f. These percents are considerably higher than those found in the NCHS survey.
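The percentages in 4.85 parts b and c are row proportions of the gender-by-smoke table, so they can be read off in one step with prop.table. The sketch below rebuilds the table from the counts printed in the table(gender, smoke) output rather than loading BSDA, so it runs without the package; the object names are illustrative only.

```r
# Counts copied from the table(gender, smoke) output; matrix() fills by column,
# so the first two values are the smoke = 0 column and the last two are smoke = 1
counts <- as.table(matrix(c(375, 418, 105, 102), nrow = 2,
                          dimnames = list(gender = c("0", "1"),
                                          smoke  = c("0", "1"))))

overall   <- sum(counts[, "1"]) / sum(counts)   # share of all kids who smoke
by_gender <- prop.table(counts, margin = 1)     # row proportions: each row sums to 1

overall
round(100 * by_gender[, "1"], 3)                # percent smoking within each gender code
```

Using margin = 1 divides each count by its own row total, which guards against dividing by the wrong group size when the percentages are worked out by hand.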