EDA Answer Keys

EDA Activities #1-3 Answers

1. Women’s Ages

Summary statistics:
Column             n   Mean    Std. dev.  Median  Min  Max  Q1  Q3  IQR
Women Age (years)  40  33.225  12.453992  31.5    12   59   23  41  18

This data describes the ages of 40 women in years. The shape of the data set was slightly skewed right. The best center was the median of 31.5 years. So the average age of these women is 31.5 years old. The best measure of spread is the IQR of 18 years. So typical women’s ages were 18 years from each other. In fact, typical ages were in between 23 years old (Q1) and 41 years old (Q3). There were no outliers or unusual ages in the data set. The youngest girl was 12 years old and the oldest woman was 59 years old, but neither was unusual.

2. Men’s Ages

Summary statistics:
Column           n   Mean    Std. dev.  Median  Min  Max  Q1    Q3  IQR
Men Age (years)  40  35.475  13.926524  32.5    17   73   25.5  45  19.5

This data describes the ages of 40 men in years. The shape of the data set was slightly skewed right. The best center was the median of 32.5 years. So the average age of these men is 32.5 years old. The best measure of spread is the IQR of 19.5 years. So typical men’s ages were 19.5 years from each other. In fact, typical ages were in between 25.5 years old (Q1) and 45 years old (Q3). There were no outliers or unusual ages in the data set. The youngest man was 17 years old and the oldest man was 73 years old, but neither was unusual.

3. Women’s Heights in Inches

Summary statistics:
Column         n   Mean    Std. dev.  Median  Min  Max  Q1     Q3    IQR
Women Ht (in)  40  63.195  2.7412284  63.35   57   68   61.35  64.9  3.55

Mean = 63.20
Standard Deviation = 2.74
63.20 - 2.74 < Typical < 63.20 + 2.74
60.46 < Typical < 65.94

The data describes the heights of 40 women in inches. The data set was bell shaped. The best center was the mean of 63.20 inches. So the average height for the women was 63.20 inches. The best measure of spread was the standard deviation of 2.74 inches. So typical women’s heights were 2.74 inches from the mean.
So typical heights were between 60.46 inches and 65.94 inches. There were no outliers (no unusual heights). The tallest woman was 68 inches and the shortest was 57 inches, but neither was unusual.

4. Men’s Heights in Inches

Summary statistics:
Column       n   Mean    Std. dev.  Median  Min   Max   Q1    Q3     IQR
Men Ht (in)  40  68.335  3.0195559  68.3    61.3  76.2  66.3  70.15  3.85

Mean = 68.34
Standard Deviation = 3.02
68.34 - 3.02 < Typical < 68.34 + 3.02
65.32 < Typical < 71.36

The data describes the heights of 40 men in inches. The data set was bell shaped. The best center was the mean of 68.34 inches. So the average height for the men was 68.34 inches. The best measure of spread was the standard deviation of 3.02 inches. So typical men’s heights were 3.02 inches from the mean. So typical heights were between 65.32 inches and 71.36 inches. The tallest man was 76.2 inches and this was an outlier. He was considered unusually tall compared to the other men in the data. The shortest man was 61.3 inches, but this was not considered unusual.

5. Women’s Weights in Pounds

Summary statistics:
Column          n   Mean    Std. dev.  Median  Min   Max    Q1     Q3      IQR
Women Wt (Lbs)  40  146.22  37.62104   135.8   94.3  255.9  116.9  162.95  46.05

This data describes the weights of 40 women in pounds. The shape of the data set was skewed right. The best center was the median of 135.8 pounds. So the average weight of these women is 135.8 pounds. The best measure of spread is the IQR of 46.05 pounds. So typical women’s weights were 46.05 pounds from each other. In fact, typical weights were in between 116.9 pounds (Q1) and 162.95 pounds (Q3). There were two outliers in the data set at 238.4 pounds and 255.9 pounds. The heaviest woman was considered unusual at 255.9 pounds. The lightest woman was 94.3 pounds, but this was not unusual.

6. Men’s Weights in Pounds

Summary statistics:
Column        n   Mean    Std. dev.  Median  Min    Max    Q1     Q3     IQR
Men Wt (Lbs)  40  172.55  26.327163  169.95  119.5  237.1  152.2  190.1  37.9

Mean = 172.55
Standard Deviation = 26.33
172.55 - 26.33 < Typical < 172.55 + 26.33
146.22 < Typical < 198.88

The data describes the weights of 40 men in pounds. The data set was bell shaped. The best center was the mean of 172.55 pounds. So the average weight for the men was 172.55 pounds. The best measure of spread was the standard deviation of 26.33 pounds. So typical men’s weights were 26.33 pounds from the mean. So typical weights were between 146.22 pounds and 198.88 pounds. There were no outliers in the data set. The heaviest man was 237.1 pounds and the lightest man was 119.5 pounds, but neither of these was unusual.

7. Women’s Pulse Rates

Summary statistics:
Column                       n   Mean  Std. dev.  Median  Min  Max  Q1  Q3  IQR
Women Pulse (Beats per min)  40  76.3  12.498615  74      60   124  68  80  12

This data describes the pulse rates of 40 women in beats per minute (bpm). The shape of the data set was skewed right. The best center was the median of 74 bpm. So the average pulse rate of these women is 74 bpm. The best measure of spread is the IQR of 12 bpm. So typical women’s pulse rates were 12 bpm from each other. In fact, typical pulse rates were in between 68 bpm (Q1) and 80 bpm (Q3). There were two outliers in the data set at 104 bpm and 124 bpm. The highest pulse rate was 124 bpm and was considered unusual. The lowest pulse rate was 60 bpm, but this was not unusual.

8. Men’s Pulse Rates

Summary statistics:
Column           n   Mean  Std. dev.  Median  Min  Max  Q1  Q3  IQR
Men Pulse (BPM)  40  69.4  11.297379  66      56   96   60  76  16

This data describes the pulse rates of 40 men in beats per minute (bpm). The shape of the data set was skewed right. The best center was the median of 66 bpm. So the average pulse rate of these men is 66 bpm. The best measure of spread is the IQR of 16 bpm. So typical men’s pulse rates were 16 bpm from each other. In fact, typical pulse rates were in between 60 bpm (Q1) and 76 bpm (Q3).
There were no outliers in the data set. The highest pulse rate was 96 bpm and the lowest pulse rate was 56 bpm, but neither of these was unusual.

9. Women’s Body Mass Index (BMI)

Summary statistics:
Column     n   Mean   Std. dev.  Median  Min   Max   Q1     Q3    IQR
Women BMI  40  25.74  6.1655702  23.9    17.7  44.9  20.95  29.4  8.45

This data describes the body mass index of 40 women. The shape of the data set was skewed right. The best center was the median of 23.9. So the average BMI for these women is 23.9. The best measure of spread is the IQR of 8.45. So typical women’s BMI values were 8.45 BMI points from each other. In fact, typical BMI scores were in between 20.95 (Q1) and 29.4 (Q3). The highest BMI of 44.9 was an outlier and unusually high, but the lowest BMI of 17.7 was not unusual.

10. Men’s Body Mass Index (BMI)

Summary statistics:
Column   n   Mean     Std. dev.  Median  Min   Max   Q1     Q3    IQR
Men BMI  40  25.9975  3.4307424  26.2    19.6  33.2  23.65  27.6  3.95

Mean = 26.00
Standard Deviation = 3.43
26.00 - 3.43 < Typical < 26.00 + 3.43
22.57 < Typical < 29.43

The data describes the body mass index (BMI) of 40 men. The data set was bell shaped. The best center was the mean of 26.00. So the average BMI for the men was 26.00. The best measure of spread was the standard deviation of 3.43. So typical men’s BMI values were 3.43 BMI points from the mean. So typical BMI scores were between 22.57 and 29.43. There were no outliers in the data set. The highest BMI score was 33.2 and the lowest BMI score was 19.6, but neither of these was unusual.

EDA Activity 4 Answers

1. For each of the following sample statistics, classify it as a measure of spread (variability), a measure of center (average), or a measure of position. Then write a sentence describing what the statistic tells us.

a) Mean: A measure of center or average. It is the balancing point in the data set in terms of distances. It is only accurate when the data is bell shaped.
b) Standard Deviation: A measure of spread.
It is how far typical values in the data are from the mean. It is only accurate when the data is bell shaped.
c) Minimum: A measure of position. The smallest number in the data set.
d) Range: A measure of spread. Does not represent typical values in the data and is influenced by outliers.
e) Median: A measure of center or average. This is the true center of the data when the values are put in order. Approximately 50% of the numbers in the data set will be less than the median and 50% of the numbers will be higher than the median. A very accurate measure of center. Often used when the data set is skewed.
f) Quartile 3 (Q3): A measure of position. Approximately 75% of the data set is less than Q3. Often used as part of the typical range for skewed data sets.
g) Interquartile Range (IQR): A measure of spread. IQR is a highly accurate measure of typical spread. It is how far typical values are from each other. It also measures the middle 50% of the data values and is the length of the box in a boxplot.
h) Maximum: A measure of position. The largest number in the data set.
i) Quartile 1 (Q1): A measure of position. Approximately 25% of the data set is less than Q1. Often used as part of the typical range for skewed data sets.
j) Mode: A measure of center. This is the number that occurs most often. It is a useful statistic in finance and sales. It is often a good measure of center for bi-modal data sets.
k) Variance: A measure of spread. The variance is the standard deviation squared and is a vital statistic in ANOVA testing, but it is only accurate if the data set is bell shaped.

2. List all the measures of center. Which is the most accurate for bell shaped (normal) data sets? Which is the most accurate for skewed data sets?

Measures of Center: Mean, Median, Mode
Bell shaped: The mean and median are both accurate, but it is customary to use the mean as your center and average when the data set is bell shaped.
Skewed: The mean is not very accurate and should not be used.
The median is the most accurate center and average for skewed data sets.

3. List all the measures of spread. Which is the most accurate for bell shaped (normal) data sets? Which is the most accurate for skewed data sets?

Measures of Spread: Range, Standard Deviation, Variance, IQR
Bell shaped: We like to use the standard deviation when the data set is bell shaped.
Skewed: We should not use the standard deviation or variance when the data set is skewed. Since quartiles are not affected by outliers or skew, we should use the IQR as the most accurate measure of spread when a data set is skewed.

4. List all the measures of position.

Min, Max, Q1, Q3

5. A very important statistic that is not a center, spread, or position is the frequency or sample size. Write a sentence describing the meaning of the sample size.

The sample size or frequency (n) counts how many numbers are in the data set.

6. Use StatCrunch and the Bear data to find all of the summary statistics we discussed for the bears’ weights. You need to give the name of the statistic, the number, and the units. All the weights are in pounds.

Summary statistics:
Column        n   Mean       Variance   Std. dev.  Median
Weight (Lbs)  54  182.88889  14835.535  121.80121  150

Summary statistics:
Column        Range  Min  Max  Q1  Q3   IQR  Mode
Weight (Lbs)  488    26   514  86  236  150  No unique
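
For readers who want to cross-check this kind of output outside StatCrunch, the statistics listed in item 6 can be sketched in Python. This is only an illustration: the function name summarize and the nine sample weights are made up (they are not the Bear data), and the quartile rule used here (method="inclusive") is an assumption. StatCrunch may compute Q1 and Q3 with a different rule, so quartiles and the IQR can differ slightly between tools.

```python
import statistics


def summarize(values):
    """Return the summary statistics discussed in Activity 4 for a list of numbers."""
    data = sorted(values)
    # Quartile rule is an assumption; other software may use a different one.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    modes = statistics.multimode(data)
    return {
        "n": len(data),                           # sample size (frequency)
        "mean": statistics.mean(data),            # center, for bell shaped data
        "variance": statistics.variance(data),    # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(data),        # square root of the sample variance
        "median": statistics.median(data),        # center, for skewed data
        "min": data[0],
        "max": data[-1],
        "range": data[-1] - data[0],
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,                           # spread, for skewed data
        # Report a mode only when exactly one value occurs most often.
        "mode": modes[0] if len(modes) == 1 else "No unique",
    }


# Hypothetical weights in pounds, invented for this example.
sample_weights = [60, 80, 90, 100, 120, 120, 150, 160, 200]
for name, value in summarize(sample_weights).items():
    print(f"{name}: {value}")
```

Which of these numbers to report as the center and spread still depends on the shape of the data, as described in questions 2 and 3 above.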