Stat120C Homework 6
Instructor: Zhaoxia Yu

Problem 0: This problem is optional.

Let $Y = (Y_1, Y_2, Y_3)$ follow a multinomial distribution with $n$ trials and probabilities $p = (p_1, p_2, p_3)$. Note that the $p_i$'s sum to 1, i.e., $p_1 + p_2 + p_3 = 1$. The probability mass function (pmf) is

\[ \Pr(Y = (y_1, y_2, y_3)) = \frac{n!}{y_1!\, y_2!\, y_3!}\, p_1^{y_1} p_2^{y_2} p_3^{y_3}. \]

Here $y_1, y_2, y_3$ are nonnegative integers that satisfy $y_1 + y_2 + y_3 = n$.

(a) Show that $Y_1$ follows Binomial$(n, p_1)$ by showing that

\[ \Pr(Y_1 = y_1) = \sum_{y_2, y_3} \Pr(Y_1 = y_1, Y_2 = y_2, Y_3 = y_3) = \frac{n!}{y_1!\,(n - y_1)!}\, p_1^{y_1} (1 - p_1)^{n - y_1}. \]

Hint: the binomial theorem is useful:

\[ (a + b)^n = \sum_{x=0}^{n} \frac{n!}{x!\,(n - x)!}\, a^x b^{n-x}. \]

(b) Prove that $\mathrm{Cov}(Y_1, Y_2) = -n p_1 p_2$, $\mathrm{Cov}(Y_1, Y_3) = -n p_1 p_3$, and $\mathrm{Cov}(Y_2, Y_3) = -n p_2 p_3$.

Hint: $E[Y_1 Y_2] = \sum_{y_1, y_2, y_3} y_1 y_2 \Pr(Y_1 = y_1, Y_2 = y_2, Y_3 = y_3)$. Show that it equals $n(n - 1) p_1 p_2$. The trinomial theorem is useful:

\[ (a + b + c)^n = \sum_{x + y + z = n} \frac{n!}{x!\, y!\, z!}\, a^x b^y c^z. \]

Problem 1: (Modified from 13.8 of Rice, with cell values changed.)

Adult-onset diabetes is known to be highly genetically determined. A study was done comparing frequencies of a particular allele in a sample of such diabetics and a sample of nondiabetics. The data are shown in the following table:

              Bb or bb    BB
    Diabetic      3        5
    Normal        1        6

Are the relative frequencies of the allele significantly different in the two groups? State your hypotheses, test statistic, and significance level, and say whether you should reject the null hypothesis based on Fisher's exact test.

Problem 2: Suppose that 300 persons are selected at random from a large population, and each person in the sample is classified according to blood type (O, A, B, or AB) and also according to Rh (positive or negative). The observed numbers are given below.

            O     A     B     AB
    Rh+    82    89    54    19
    Rh-    13    27     7     9

(a) Conduct a Pearson's chi-square test (at level $\alpha = 0.05$) of the hypothesis that the two classifications of blood types are independent.

(b) Confirm your calculation in (a) using R.

    > rhp = c(82, 89, 54, 19)   # Rh+ counts for O, A, B, AB
    > rhn = c(13, 27, 7, 9)     # Rh- counts for O, A, B, AB
    > chisq.test(rbind(rhp, rhn), correct=FALSE)   # Pearson's test, no continuity correction

(c) Calculate the likelihood ratio statistic for testing independence. To do so, first calculate the maximized likelihood under the full model, i.e., the model with no constraint. Denote it by $L_1$. Second, calculate the maximized likelihood under the reduced model, i.e., the model that assumes independence. Denote it by $L_0$. Third, calculate $2(\log L_1 - \log L_0)$.

(d) Compare the test statistic in (c) to Pearson's chi-square statistic. Under the null hypothesis of independence, the likelihood ratio statistic approximately follows a chi-squared distribution with three degrees of freedom. Based on the likelihood ratio statistic, would you reject the null hypothesis at level $\alpha = 0.05$?

Problem 3: Consider random samples taken from $J$ populations, where each observation can be classified as one of $I$ different types. Let $n_{ij}$ be the number of subjects from population $j$ classified as type $i$. The data can be arranged in the following $I \times J$ table, with row and column totals in the margins:

\[
\begin{array}{cccc|c}
n_{11} & n_{12} & \cdots & n_{1J} & n_{1\cdot} \\
n_{21} & n_{22} & \cdots & n_{2J} & n_{2\cdot} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
n_{I1} & n_{I2} & \cdots & n_{IJ} & n_{I\cdot} \\
\hline
n_{\cdot 1} & n_{\cdot 2} & \cdots & n_{\cdot J} & n_{\cdot\cdot}
\end{array}
\]

Let $p_{ij}$ denote the probability that an observation chosen at random from the $j$th population will be of type $i$. Thus

\[ \sum_{i=1}^{I} p_{ij} = 1 \quad \text{for } j = 1, \ldots, J, \]

and the data from the $j$th population, $(n_{1j}, n_{2j}, \ldots, n_{Ij})$, come from a multinomial distribution with $n_{\cdot j}$ trials and cell probabilities $(p_{1j}, p_{2j}, \ldots, p_{Ij})$. The null hypothesis of homogeneity claims

\[ H_0: p_{i1} = p_{i2} = \cdots = p_{iJ} = p_i \quad \text{for } i = 1, \ldots, I. \]

Derive the maximum likelihood estimates of $p_1, p_2, \ldots, p_I$ under the assumption of homogeneity.

Problem 4: In a study, 1200 schoolchildren were questioned on whether they had severe colds at age 10 and at age 12.
The data are summarized in the table below:

                                   Severe colds at age 12
                                      Yes       No
    Severe colds at age 10   Yes      200      165
                             No       235      600

Conduct a statistical analysis to test whether there was a significant change in the prevalence of severe colds between age 10 and age 12.
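
The problem does not name a method. One common choice for paired yes/no responses on the same children is McNemar's test, which uses only the two discordant cells. The following is a sketch under two assumptions that are not stated in the problem: that the table is read so the discordant counts are 165 and 235, and that no continuity correction is applied. The symbols $b$ and $c$ are introduced here only for illustration.

\[
b = 165, \quad c = 235, \qquad
\chi^2 = \frac{(b - c)^2}{b + c} = \frac{(165 - 235)^2}{165 + 235} = \frac{4900}{400} = 12.25.
\]

Under the null hypothesis of no change in prevalence, this statistic would be compared with a chi-squared distribution with one degree of freedom, whose upper 5% critical value is about 3.84.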