Different Distributions
• Consider the range of the data (the minimum point to the maximum point).
• If there is no mode, then the distribution is relatively uniform.
• If the mean, median, and mode are about equal, then the distribution is roughly “normal”.
• If the mean and median are not roughly equal, then the distribution is “skewed”.

Advantages/Disadvantages of Displays
• Categorical data: pictograph, bar graph, circle graph
• Numerical data (one variable): line plot, histogram, stem and leaf plot, box and whisker plot
• Two categorical variables: double bar graph
• Two of the same numerical variable: side-by-side stem and leaf plot, box and whisker plot
• Two different numerical variables: line graph, scatterplot

Scatterplot
• Used to see if two variables are related. If they are closely related, then we say that the correlation is high; on the scatterplot, the points will fall close to a straight line.

Line Graph
• Used to see if there is a trend over time.

Good place to find graphs
• http://images.google.com/images?um=1&hl=en&client=safari&rls=en&q=usa+today+graphs&btnG=Search+Images

Normal Distributions
• Advantage: they let us know quickly the mean and the spread of the data.
• Mean: the center of the distribution.
• Standard deviation: describes the spread of the distribution.

Quick illustration of standard deviation
• Here are 2 data sets:
  First data set: 3, 4, 10, 11
  Second data set: 3, 4, 5, 8
• If we look at them, we see that the data in the second set are much closer together than in the first. We also see that the data are closer to the mean in the second data set.
• We use the term standard deviation to describe how close the data are to the mean.
  First data set: 3, 4, 10, 11: Mean = 7
  Second data set: 3, 4, 5, 8: Mean = 5
• We are basically finding an average distance from the mean for each data set (strictly, the square root of the average squared distance).
• Find the distance from the mean for each data point. Then square this distance.
  First data set: (3-7)^2 + (4-7)^2 + (10-7)^2 + (11-7)^2
  Second data set: (3-5)^2 + (4-5)^2 + (5-5)^2 + (8-5)^2
• Add the squares:
  First data set: 4^2 + 3^2 + 3^2 + 4^2 = 50
  Second data set: 2^2 + 1^2 + 0^2 + 3^2 = 14
• Divide by the number of data points:
  First data set: 50/4 = 12.5
  Second data set: 14/4 = 3.5
• Now take the square root:
  First data set: square root of 12.5 ≈ 3.5
  Second data set: square root of 3.5 ≈ 1.9
• Standard deviation: about 3.5 for the first data set and about 1.9 for the second.

Example
• Graph a normal distribution with a mean of 5 and a standard deviation of 1.
• Compare to a mean of 6 and a standard deviation of 1.
• Compare to a mean of 3 and a standard deviation of 1.
• Compare to a mean of 5 and a standard deviation of 2.
• Compare to a mean of 5 and a standard deviation of 0.5.

[Figure: Normal Distribution, horizontal axis from 0 to 10]

Issues to consider
• Bias: the way the data was collected, or the way the data is displayed, misleads.
• Validity: the data answers the question asked.
• Reliability: we get the same answers each time.

Examples of bias
• Asking a sample of freshmen to learn college students’ preference of dorms.
• Asking only working mothers about childcare issues.
• Asking only men about marriage issues.
• Using categories/intervals that hide important information in displays.

Examples of poor validity
• Asking questions on the first exam that focus upon content from 302A to determine if students learned the material from Chapter 7.
• Asking questions about political party affiliation to determine likelihood of voting.

Examples of poor reliability
• Are you pleased with your life right now?
• “Select the best way to mislead in a display” vs. “What is the most important concept in not misleading the reader with a display?”
• On a scale of 1 - 10, how ready are you for a test next week?
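
Checking the standard deviation illustration
The standard deviation steps worked through earlier (square each distance from the mean, add the squares, divide by the number of data points, take the square root) can be checked with a short Python sketch. The function name population_sd and the variable names are illustrative assumptions, not part of the original slides; the sketch divides by the number of data points, as the slides do.

```python
import math
import statistics


def population_sd(data):
    """Standard deviation computed with the steps from the slides.

    Hypothetical helper: divides by the number of data points
    (the "population" form), matching the worked example.
    """
    mean = sum(data) / len(data)
    # Distance from the mean for each data point, squared.
    squared_distances = [(x - mean) ** 2 for x in data]
    # Add the squares and divide by the number of data points.
    variance = sum(squared_distances) / len(data)
    # Take the square root.
    return math.sqrt(variance)


first = [3, 4, 10, 11]
second = [3, 4, 5, 8]

sd_first = population_sd(first)
sd_second = population_sd(second)

print(round(sd_first, 1), round(sd_second, 1))  # prints: 3.5 1.9

# Cross-check against the standard library's population form.
assert math.isclose(sd_first, statistics.pstdev(first))
assert math.isclose(sd_second, statistics.pstdev(second))
```

Note that statistics.stdev in the standard library divides by one less than the number of data points, so it would give slightly larger values than the slides; statistics.pstdev is the function that matches the steps shown here.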