Descriptive Statistics and Inferential Statistics
Inferential Statistics
• Definition: Statistics, derived from sample data, that are used to make inferences about the population from which the sample was drawn.
• Generalizability is important in this type of statistic because it is the ability to use the results of data collected from a sample to reach conclusions about the characteristics of the population.
Descriptive Statistics
• Definition: Statistics used to describe the characteristics of a distribution of scores.
• They apply only to the members of the sample or population from which data have been collected.
• Generalizability to the population is not the objective of descriptive statistics.

Population
• Definition: The collection of cases that comprise the entire set of cases with the specified characteristics (e.g., “All living adult males in the United States”)
• Example: To find the average salary of Psychology majors who graduated from college in 2004, collect information about the salaries of all the 2004 Psychology graduates and derive an average from that data.
• Any value generated from or applied to the population is a parameter.

Sample
• Definition: A collection of cases selected from a larger population
• Example: To find the average salary of Psychology majors who graduated from college in 2004, you select (randomly or non-randomly) some of these graduates and derive a mean from their salaries.
• Any value derived from the sample, such as the mean, is a statistic.

Sampling Methods
RANDOM
• Definition: Selecting cases from a population in a manner that ensures each member of the population has an equal chance of being selected into the sample.
• One of the most useful sampling methods, but also one of the most difficult to use.
• The major benefit of random sampling is that any differences between the sample and the population from which the sample was selected will not be systematic.
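The difference between a parameter and a statistic, and the idea of random selection, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the salary figures are made up for the example (they are not data from these slides), and the helper name `draw_random_sample` is hypothetical.

```python
import random
import statistics


def draw_random_sample(population, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every member of the population has an equal chance of selection.
    """
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical population: invented salaries (in thousands of dollars)
# standing in for "all 2004 Psychology graduates".
population_salaries = [31, 34, 36, 38, 40, 41, 43, 45, 48, 52, 55, 60]

# A value computed from the whole population is a parameter.
parameter = statistics.mean(population_salaries)

# A value computed from a sample is a statistic.
sample = draw_random_sample(population_salaries, sample_size=4, seed=1)
statistic = statistics.mean(sample)

print("Population mean (parameter):", round(parameter, 2))
print("Sample:", sample)
print("Sample mean (statistic):", round(statistic, 2))
```

Running it with different seeds gives different sample means; with random selection those differences from the population mean are not systematic.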
REPRESENTATIVE
• Definition: A method of selecting a sample in which members are purposely selected to create a sample that represents the population on some characteristic(s) of interest (e.g., when a sample is selected to have the same percentages of various ethnic groups as the larger population).
• This type of sampling can be expensive and time consuming; however, it ensures that the sample looks like the population on some important variables, thereby increasing the generalizability of results from the sample.

CONVENIENCE
• Definition: Selecting a sample based on ease of access or availability.
• This method of selecting a sample is less labor-intensive than selecting a random or representative sample.
• For this to be an acceptable method, the sample cannot differ from the population of interest in ways that influence the outcome of the study.

Variable
• Any construct with more than one value that is examined in research.
• Examples include income, gender, age, height, attitudes about school, score on a measure of depression, etc.

Types of Variables
• Quantitative (continuous) variable: A variable that has assigned values, where the values are ordered and meaningful, such that 1 is less than 2, 2 is less than 3, etc.
• Qualitative (categorical) variable: A variable that has discrete categories. If the categories are given numerical values, the values have meaning as nominal references but not as numerical values (e.g., with 1 = “male” and 2 = “female,” 1 is not more or less than 2).

Scales of Measurement for Variables
• Nominally (or categorically) scaled variable: A variable in which the numerical values assigned to each category are simply labels rather than meaningful numbers.
• Ordinal variable: Variables measured with numerical values where the numbers are meaningful (e.g., 2 is larger than 1) but the distance between the numbers is not constant.
• Interval or Ratio variable: Variables measured with numerical values with equal distance, or space, between each number (e.g., the distance between 1 and 2 is the same as the distance between 2 and 3; on a ratio scale, which has a true zero, 2 is also twice as much as 1 and 4 is twice as much as 2).

Collecting Data
• Collecting data produces a group of scores on one or more variables.
• To get the distribution of scores, arrange the scores from lowest to highest.
• Researchers are usually interested in central tendency, a set of distribution characteristics that consists of the mean, median, and mode.

The Mean
• Definition: The arithmetic average of a distribution of scores
• Provides a single, simple number that gives a rough summary of the distribution
• The most commonly used statistic in all social science research
• Useful, but does not tell you anything about how spread out the scores are (i.e., variance) or how many scores in the distribution are close to the mean

The Median
• Definition: The score in a distribution that marks the 50th percentile. It is the score that 50% of the distribution falls below and 50% falls above.
• Used when dividing the scores in a distribution into two groups (median split)
• Useful statistic to examine when the scores in a distribution are skewed or when there are a few extreme scores at the high end or the low end of the distribution

The Mode
• Definition: The score in the distribution that occurs most frequently
• Least used of the measures of central tendency; provides the least amount of information

Calculating the Mean
1. Add, or sum, all of the scores in a distribution
2. Divide by the number of scores
Formula for calculating the mean of a distribution:
$\bar{X} = \frac{\Sigma X}{n}$ or $\mu = \frac{\Sigma X}{N}$
where
• $\bar{X}$ is the sample mean
• $\mu$ is the population mean
• $\Sigma$ means “the sum of”
• $X$ is an individual score in the distribution
• $n$ is the number of scores in the sample
• $N$ is the number of scores in the population
OR
1. Multiply each value by the frequency with which the value occurred
2. Add all of these products
3. Divide by the number of scores

Calculating The Median
1. Arrange all of the scores in the distribution in order, from smallest to largest
2. Find the middle score in the distribution
• If there is an odd number of scores, there will be a single score that marks the middle of the distribution.
• If there is an even number of scores, the median is the average of the two scores in the middle of the distribution (as long as the scores are arranged in order).
• To find that average, add the two scores in the middle together and divide by two.

Finding The Mode
• Remember, the mode is simply the category in the distribution that has the highest number of scores, or the highest frequency.
• Multimodal: When a distribution of scores has two or more values that have the highest frequency of scores
• Example, bimodal distribution: A distribution that has two values that have the highest frequency of scores; often occurs when people respond to controversial questions that tend to polarize the public

Example of bimodal distribution
On the following scale, please indicate how you feel about capital punishment.
1-----2-----3-----4-----5
(1 = Strongly Opposed, 5 = Strongly In Favor)
Frequency of Responses
Category of responses on the scale:        1    2    3    4    5
Frequency of responses in each category:   45   3    4    3    45

Example: The Mean, Median, and Mode of a Distribution
The following distribution of test scores is given: 86 90 96 96 100 105 115 121
Mean = (86 + 90 + 96 + 96 + 100 + 105 + 115 + 121) / 8 = 101.13
Calculating the mean: Add up all the scores, then divide by the number of scores. In this case, there are 8 scores.
Median = (96 + 100) / 2 = 98
Calculating the median: Because there is an even number of scores, sum the two scores that are found in the middle of the distribution when it is put into numerical order, then divide by two.
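The calculations described above can be checked with Python's standard `statistics` module. This is a minimal sketch: it uses the eight test scores and the capital punishment frequency table from the slides, and the variable names are chosen only for illustration.

```python
import statistics

# Test scores from the worked example.
scores = [86, 90, 96, 96, 100, 105, 115, 121]

# Mean: sum all scores, then divide by the number of scores.
mean = sum(scores) / len(scores)

# Median: with an even number of scores, the average of the two middle scores.
median = statistics.median(scores)

# Mode: the score that occurs most frequently.
mode = statistics.mode(scores)

print("Mean:", round(mean, 2))   # 101.13
print("Median:", median)         # 98.0
print("Mode:", mode)             # 96

# Frequency method for the mean, using the bimodal survey example:
# multiply each value by its frequency, add the products,
# then divide by the number of scores.
frequencies = {1: 45, 2: 3, 3: 4, 4: 3, 5: 45}
number_of_scores = sum(frequencies.values())
weighted_mean = (
    sum(value * count for value, count in frequencies.items()) / number_of_scores
)

# Expand the table back into individual responses to find every mode.
responses = [value for value, count in frequencies.items() for _ in range(count)]
modes = statistics.multimode(responses)

print("Survey mean:", weighted_mean)  # 3.0
print("Survey modes:", modes)         # [1, 5]
```

The survey example also shows why the mean alone can mislead: the mean is 3.0 even though almost nobody chose 3.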
Mode = 96
Calculating the mode: 96 is the score that occurs most frequently.

Skewed Distribution
• Definition: A distribution of scores that has a high number of scores clustered at one end of the distribution with relatively few scores spread out toward the other end of the distribution, forming a tail.
• When working with a skewed distribution, the mean, median, and mode are usually all at different points rather than at the center of the distribution.
• Similarities between a skewed and normal distribution: The procedures used to calculate a mean, median, and mode are the same.
• Differences between a skewed and normal distribution: The position of the three measures of central tendency in the distribution.

Skewness
[Figure: two skewed distributions, one labeled “Left or Negative” and one labeled “Right or Positive”]

Skewness Ranges
• If skewness is less than −1 or greater than +1, the distribution is highly skewed.
• If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed.
• If skewness is between −½ and +½, the distribution is approximately symmetric.
• If a distribution is symmetric, the next question is about the central peak: is it high and sharp, or short and broad?

Kurtosis
• The reference standard is a normal distribution, which has a kurtosis of 3. Often the excess kurtosis is presented: excess kurtosis = kurtosis − 3.
• A normal distribution has kurtosis exactly 3 (excess kurtosis exactly 0). Any distribution with kurtosis ≈ 3 (excess ≈ 0) is called mesokurtic.
• A distribution with kurtosis < 3 (excess kurtosis < 0) is called platykurtic. Compared to a normal distribution, its central peak is lower and broader, and its tails are shorter and thinner.
• A distribution with kurtosis > 3 (excess kurtosis > 0) is called leptokurtic. Compared to a normal distribution, its central peak is higher and sharper, and its tails are longer and fatter.
[Figure: three distributions labeled “kurtosis = 3, excess = 0”, “kurtosis = 1.8, excess = −1.2”, and “kurtosis = 4.2, excess = 1.2”]

Measures of Central Tendency vs. Measures of Variability
• Measures of central tendency provide useful information, but are limited.
• Measures of central tendency provide insufficient information on the dispersion of scores in a distribution or, in other words, the variety of the scores in a distribution.
• Three measures of dispersion that researchers typically examine: range, variance, and standard deviation.
• Standard deviation is the most informative and widely used of the three.

Range
• Definition: The difference between the largest score (maximum value) and the smallest score (minimum value) of a distribution
• Gives researchers a quick sense of how spread out the scores of a distribution are
• Not practical; misleading at times
• Helps show whether all or most of the points on a scale, such as a survey, were covered

Interquartile Range (IQR)
• Definition: The difference between the 75th percentile (third quartile) and 25th percentile (first quartile) scores in a distribution
• The IQR contains the scores in the two middle quartiles when the scores in a distribution are arranged in numerical order

Variance
• Definition: The sum of the squared deviations divided by the number of cases in the population, or by the number of cases minus one in the sample
• Provides a statistical average of the amount of dispersion in a distribution of scores
• Researchers rarely look at the variance by itself because it does not use the same scale as the original measure of a variable; even so, it is helpful for the calculation of other statistics (e.g., analysis of variance, regression)

Standard Deviation
• Definition: The average deviation between the individual scores in the distribution and the mean for the distribution
• To understand standard deviation, consider the meanings of the two words:
  • Standard: typical or average
  • Deviation: the difference between an individual score and the average score for the distribution
• Useful statistic; provides a handy measure of how spread out the scores are in the distribution
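The range and interquartile range defined above can be computed as follows. This is a sketch under two assumptions: it reuses the eight test scores from the earlier example, and it uses the `"inclusive"` quartile method of `statistics.quantiles`; other quartile conventions exist and can give slightly different values. The added score of 200 is hypothetical, included only to show how one extreme score affects each measure.

```python
import statistics

scores = [86, 90, 96, 96, 100, 105, 115, 121]


def value_range(values):
    """Range: largest score minus smallest score."""
    return max(values) - min(values)


def interquartile_range(values):
    """IQR: 75th percentile minus 25th percentile (inclusive method assumed)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


print("Range:", value_range(scores))          # 35
print("IQR:", interquartile_range(scores))    # 13.0

# One hypothetical extreme score changes the range far more than the IQR.
with_extreme_score = scores + [200]
print("Range with extreme score:", value_range(with_extreme_score))        # 114
print("IQR with extreme score:", interquartile_range(with_extreme_score))  # 19.0
```

This sensitivity to a single extreme score is one reason the range can be misleading at times.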
• When combined, the mean and standard deviation provide a pretty good picture of what the distribution of scores is like

Sample Statistics as Estimates of Population Parameters
• For the most part, researchers are concerned with what a sample tells us about the population from which the sample was drawn. This is important because most statistics, although generated from sample data, are used to make inferences about the population.
• The formulas for calculating the variance and standard deviation of sample data are designed to make sample statistics better estimates of the population parameters (i.e., the population variance and standard deviation).

Making Sense of the Formulas for Calculating the Variance
• We are not interested in the average score of the distribution, but rather in the average difference, or deviation, between each score in the distribution and the mean of the distribution.
• First, calculate a deviation score for each individual score in the distribution.
• See next slide for formula

Variance and Standard Deviation Formulas
Population
• Variance: $\sigma^2 = \frac{\Sigma (X - \mu)^2}{N}$
• Standard deviation: $\sigma = \sqrt{\frac{\Sigma (X - \mu)^2}{N}}$
where $\Sigma$ = to sum, $X$ = a score in the distribution, $\mu$ = the population mean, $N$ = the number of cases in the population
Estimate Based on a Sample
• Variance: $s^2 = \frac{\Sigma (X - \bar{X})^2}{n - 1}$
• Standard deviation: $s = \sqrt{\frac{\Sigma (X - \bar{X})^2}{n - 1}}$
where $\Sigma$ = to sum, $X$ = a score in the distribution, $\bar{X}$ = the sample mean, $n$ = the number of cases in the sample

Similarities Between the Variance and Standard Deviation Formulas
• The formulas for calculating the variance and the standard deviation are virtually identical.
• The square root in the standard deviation formula is the only difference.
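The population and sample formulas can be written out directly. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the eight test scores from the earlier example are treated first as a complete population and then as a sample, purely to compare the two denominators.

```python
import math


def sum_of_squared_deviations(scores, center):
    """Sum of (X - center)^2 over all scores."""
    return sum((x - center) ** 2 for x in scores)


def population_variance(scores):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(scores) / len(scores)
    return sum_of_squared_deviations(scores, mu) / len(scores)


def sample_variance(scores):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(scores) / len(scores)
    return sum_of_squared_deviations(scores, x_bar) / (len(scores) - 1)


scores = [86, 90, 96, 96, 100, 105, 115, 121]

pop_var = population_variance(scores)
samp_var = sample_variance(scores)

# The standard deviation is the square root of the variance.
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print("Population variance:", pop_var)   # 126.109375
print("Sample variance:", samp_var)      # 144.125
print("Population SD:", round(pop_sd, 2))
print("Sample SD:", round(samp_sd, 2))
```

The two variance functions differ only in the denominator, and each standard deviation is just the square root of the matching variance.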
• Calculating the variance is the same for both sample and population data except for the denominator, which is n − 1 in the sample formula.
• The formula for calculating the variance is known as the deviation score formula.

Differences Between the Variance and Standard Deviation Formulas: Why n − 1?
Brief explanation:
• If the population mean is unknown, use the sample mean as an estimate. But the sample mean will probably differ from the population mean.
• Whenever a number other than the actual mean is used to calculate the variance, a larger variance will be found. This is true regardless of whether the number used in the formula is smaller or larger than the actual mean.
• Because the sample mean usually differs from the population mean, the variance and standard deviation calculated around the sample mean will probably be smaller than they would have been if the population mean had been used.
• So when the sample mean is used to generate an estimate of the population variance or standard deviation, the result will tend to underestimate the size of the population variance or standard deviation.
• To adjust for this underestimation, use n − 1 in the denominator of the sample formulas.
• Smaller denominators produce larger overall variance and standard deviation statistics, making them more accurate estimates of the population parameters.

Working with a Population Distribution
• Researchers usually assume they are working with a sample that represents a larger population.
• How much difference it makes to use N or n − 1 in the denominator depends on the size of the sample:
  • If the sample is large, there is virtually no difference.
  • If the sample is small, there is a relatively large difference between the results produced by the population and sample formulas.

Why Have Variance?
Why not go straight to standard deviation?
• We need to calculate the variance before finding the standard deviation. That is because we need to square the deviation scores (so they will not sum to zero). These squared deviations produce the variance. Then we need to take the square root to find the standard deviation.
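The argument for n − 1 made earlier in this section can be checked by brute force on a tiny example. This is a sketch under stated assumptions: the population [1, 2, 3] is hypothetical, samples of size 2 are drawn with replacement, and every possible ordered sample is enumerated so that no randomness is involved.

```python
from itertools import product


def mean(values):
    return sum(values) / len(values)


def squared_deviations(values, center):
    """Sum of squared deviations of the values around a chosen center."""
    return sum((x - center) ** 2 for x in values)


# Hypothetical tiny population, so every possible sample can be listed.
population = [1, 2, 3]
population_mean = mean(population)
population_variance = squared_deviations(population, population_mean) / len(population)

# All ordered samples of size 2, drawn with replacement (9 samples).
n = 2
samples = list(product(population, repeat=n))

# Average the variance estimates over every possible sample.
average_using_n = mean(
    [squared_deviations(s, mean(s)) / n for s in samples]
)
average_using_n_minus_1 = mean(
    [squared_deviations(s, mean(s)) / (n - 1) for s in samples]
)

print("Population variance:", round(population_variance, 4))
print("Average estimate dividing by n:", round(average_using_n, 4))
print("Average estimate dividing by n - 1:", round(average_using_n_minus_1, 4))

# Deviations around a sample's own mean are never larger, in total, than
# deviations around the population mean.
never_larger = all(
    squared_deviations(s, mean(s)) <= squared_deviations(s, population_mean)
    for s in samples
)
print("Sample-mean deviations never larger:", never_larger)
```

In this setup, dividing by n gives estimates that average about half the population variance, while dividing by n − 1 gives estimates that average out to the population variance itself.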
• The fundamental piece of the variance formula, which is the sum of the squared deviations, is used in a number of other statistics, most notably analysis of variance (ANOVA).

Example: Students’ responses to the item “I would feel really good if I were the only one who could answer the teacher’s question in class.”
Sample Size = 491
Mean = 2.92
Standard Deviation = 1.43
Variance = (1.43)² = 2.04
Range = 5 − 1 = 4
• The range does not provide very much information.
• The mean of 2.92 is not particularly informative, because from the mean alone it is impossible to determine whether:
  • Most students circled a 3 on the scale
  • Roughly equal numbers of students circled each of the five numbers on the response scale
  • Almost half of the students circled 1 whereas the other half circled 5
[Figure: bar chart; x-axis: scores on desire to demonstrate ability item (1 to 5); y-axis: frequency]

Drawing Conclusions…
• Consider the standard deviation in conjunction with the mean.
• Predicting what the size of the standard deviation will be:
  • If almost all of the students circled a 2 or a 3 on the response scale, expect a fairly small standard deviation.
  • If half of the students circled 1 whereas the other half circled 5, expect a large standard deviation (about 2.0), because each score would be about two units away from the mean.
  • If the responses are fairly evenly spread out across the five response categories, expect a moderately sized standard deviation (about 1.50).

Boxplot for the desire to appear able variable
[Figure: boxplot; vertical axis from 0 to 6]
• Presented for the same variable that is represented in the previous graph, wanting to demonstrate ability
• Conclusions:
  • The distribution looks somewhat symmetrical, because the mean of 2.92 is roughly in the middle of the scale.
  • From the standard deviation of 1.43, we know that the scores are pretty well spread out across the five response categories.
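The three predictions about the size of the standard deviation can be checked with made-up response patterns. This is only a sketch: the response lists below are hypothetical (they are not the 491 actual responses), and the population formula (`statistics.pstdev`) is used for simplicity.

```python
import statistics


def describe(responses):
    """Return the mean and population standard deviation of the responses."""
    return statistics.mean(responses), statistics.pstdev(responses)


# Hypothetical response patterns on the 1-to-5 scale, 100 students each.
clustered = [2] * 50 + [3] * 50       # almost everyone circles 2 or 3
polarized = [1] * 50 + [5] * 50       # half circle 1, half circle 5
evenly_spread = [1, 2, 3, 4, 5] * 20  # equal numbers in each category

for label, responses in [
    ("Clustered on 2 and 3", clustered),
    ("Half 1s, half 5s", polarized),
    ("Evenly spread", evenly_spread),
]:
    mean, sd = describe(responses)
    print(f"{label}: mean = {mean:.2f}, SD = {sd:.2f}")
```

The polarized and evenly spread patterns have the same mean of 3 but very different standard deviations (2.0 versus roughly 1.41). The evenly spread value is close to the observed 1.43, which fits the conclusion that the actual responses are spread fairly evenly across the five categories.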
 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                            