Hw #3 – STAT 110 ~ More Descriptive Statistics and Probability (Due Sunday, Sept. 13th)

1. A random sample of n = 9 duck nests in Mississippi Wildlife Refuge yielded the following data for the number of eggs in the nest:

13  11  8  6  6  4  7  9  10

a) Find the sample mean. (2 pts.)
b) Find the sample median. (2 pts.)
c) Find the sample mode. (1 pt.)
d) Find the sample variance and sample standard deviation. (6 pts.)

2. Cigar Smoking and Cancer Death

The Journal of the National Cancer Institute (Feb. 16, 2000) published the results of a study that investigated the association between cigar smoking and death from tobacco-related cancers. Data were obtained for a national sample of 137,243 American men. The results are summarized in the table below. Each man in the study was classified according to his cigar-smoking status and whether or not he died from a tobacco-related cancer. (16 pts. – 2 pts. each)

                                  Cigars
Died from Cancer   Never Smoked   Former Smoker   Current Smoker     Total
Yes                         782              91              141     1,014
No                      120,747           7,757            7,725   136,229
Total                   121,529           7,848            7,866   137,243

a) Find the estimated probability that a randomly selected man from this population never smoked cigars and died from cancer.
b) Find the estimated probability that a randomly selected man was a former cigar smoker and died from cancer.
c) Given that a male was a current cigar smoker, find the probability that he died from cancer.
d) Given that a male was a former cigar smoker, find the probability that he died from cancer.
e) Given that a male never smoked cigars, find the probability that he died from cancer.
f) How many times more likely to die from cancer is a male who is a current cigar smoker than a male who has never smoked cigars?
g) How many times more likely to die from cancer is a male who is a former cigar smoker than a male who has never smoked cigars?
h) Do the results from parts (f) and (g) prove that cigar smoking causes cancer? Explain.

3. Bias in Graduate School Admissions at Univ. of Calif. – Berkeley

Bickel and O’Connell (1975) investigated whether there was any evidence of gender bias in graduate admission at the University of California at Berkeley. The data file Berkeley Sex Discrim 1975 contains the results of a cross-classification of 4,526 applicants to six of the largest graduate programs in 1973 by Sex (M or F) and Admission (whether or not the applicant was admitted to the program to which they applied).

a) What percent of male applicants were admitted to these programs? What percent of female applicants were admitted? What percent of all applicants were admitted? Does there appear to be any sex discrimination in graduate school admissions, i.e. are sex and admission related? (4 pts.)
b) Now consider a cross-classification of sex and admission for each of these six programs individually. Do these indicate any sex discrimination in admissions? Explain and justify your conclusions. (4 pts.)

[Cross-classification tables for Program A, Program B, Program C, Program D, Program E, and Program F are not reproduced here.]

LEFTOVERS FROM ASSIGNMENT 2

4. Comparison of Cell Characteristics of Benign and Malignant Breast Tumors

Data File: BreastDiag.JMP
Key Words: Histograms, Summary Statistics, and Comparative Displays

These data come from a study of breast tumors conducted at the University of Wisconsin-Madison. The goal was to determine if malignancy of a tumor could be established by using shape characteristics of cells obtained via fine needle aspiration (FNA) and the subsequent digitized scanning of the cells. The samples of tumor cells were examined under an electron microscope and a variety of cell shape characteristics were measured. Your goal is to use summary statistics and graphical displays to examine potential differences between benign and malignant tumor cells on two of the characteristics of cells measured as part of the study: cell radius and cell symmetry.
The variables in the data file are:

Diagnosis = diagnosis determined by biopsy (B = benign or M = malignant)
Radius = radius (mean of distances from center to points on the perimeter)
Symmetry = symmetry (measure of symmetry of the cell nucleus)

Medical literature citations:

W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) 163-171.
W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Image analysis and machine learning applied to breast cancer diagnosis and prognosis. Analytical and Quantitative Cytology and Histology, Vol. 17 No. 2, pages 77-87, April 1995.
W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. Computerized breast cancer diagnosis and prognosis from fine needle aspirates. Archives of Surgery 1995;130:511-516.
W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. Computer-derived nuclear features distinguish malignant from benign breast cytology. Human Pathology, 26:792-796, 1995.

See also:
http://www.cs.wisc.edu/~olvi/uwmp/mpml.html
http://www.cs.wisc.edu/~olvi/uwmp/cancer.html

a) Use JMP to obtain histograms, outlier boxplots, and numerical summary statistics (mean, median, quantiles, standard deviation, etc.) for the cell radii for both malignant and benign breast tumor cells. Make sure the histograms have uniform scaling. To do this in JMP select Analyze > Distribution and place Radius in the Y, Columns box and put Diagnosis in the By box. Be sure to select both Stack and Uniform Scaling from the Distributions pull-down menu as shown below. Use the results to make a comparison between malignant and benign tumor cells in terms of cell radius. Your comparison should address each of the following aspects:

  - measures of location (mean, median, quantiles) (2 pts.)
  - measures of variability (range, interquartile range, standard deviation, CV) (2 pts.)
  - distributional shape (histogram and boxplot) (1 pt.)
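The location and variability measures named in part (a) can also be sketched outside JMP. The Python below is only an illustration under stated assumptions: the function name `summarize` and the radii are hypothetical (the values are NOT taken from BreastDiag.JMP), CV is taken here as 100 × SD / mean, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the quantiles JMP reports.

```python
import statistics


def summarize(values):
    """Return location and variability measures for one group of measurements."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (divides by n - 1)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": sd,
        "cv_percent": 100 * sd / mean,  # coefficient of variation, in percent
    }


# Hypothetical radii for illustration only, NOT values from BreastDiag.JMP.
radius_by_diagnosis = {
    "B": [10.0, 11.0, 12.0, 13.0, 14.0],
    "M": [14.0, 16.0, 18.0, 20.0, 22.0],
}

for diagnosis, radii in radius_by_diagnosis.items():
    print(diagnosis, summarize(radii))
```

Because CV divides the standard deviation by the mean, it allows the spread of two groups with different typical sizes to be compared on a relative scale.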
b) Use the Fit Y by X option in the Analyze menu to construct comparative boxplots for the cell radii in the two groups. To do this place the variable Diagnosis in the X box and Radius in the Y box. Then select the appropriate display options. Briefly summarize what you see from this plot. What are the advantages/disadvantages, if any, in using comparative boxplots versus the stacked histograms you examined in part (a)? (3 pts.)

5. WSU Student Survey

For the WSU student survey, the summary statistics for book costs and GPA are given below. Use these summary statistics to determine which is more extreme: a student with a GPA of 3.85 or a student with book costs of $550. (4 pts.)

[Summary statistics for GPA and Book Costs ($) are not reproduced here.]

6. Book Problem 2.60 (5 pts.) - all parts. (Datafile: PGADRIVER)

7. Book Problem 2.98 (4 pts.)

TURN THIS PROBLEM IN ON A SEPARATE PIECE OF PAPER IN CLASS NEXT WEEK.
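One common way to judge which of two values on different scales is "more extreme", as Problem 5 asks, is to compare their z-scores, z = (value - mean) / SD. The Python below is a minimal sketch of that comparison, assuming this is the intended approach. The means and standard deviations are hypothetical placeholders, NOT the WSU survey statistics, and the helper name `z_score` is introduced here for illustration only.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from `mean`."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Hypothetical summary statistics for illustration only,
# NOT the values from the WSU student survey.
gpa_mean, gpa_sd = 3.0, 0.5
books_mean, books_sd = 400.0, 100.0

z_gpa = z_score(3.85, gpa_mean, gpa_sd)
z_books = z_score(550.0, books_mean, books_sd)

# The observation with the larger |z| lies farther from its own mean
# relative to the spread of its variable.
more_extreme = "GPA" if abs(z_gpa) > abs(z_books) else "book costs"
print(f"z(GPA) = {z_gpa:.2f}, z(book costs) = {z_books:.2f}; more extreme: {more_extreme}")
```

Substituting the actual means and standard deviations from the survey output gives the comparison the problem asks for.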