CHAPTER 16
BIVARIATE STATISTICS: PARAMETRIC TESTS
Marketing Research, 2nd Edition, Alan T. Shao. Copyright © 2002 by South-Western

What the Experts Say
“What’s the point in doing surveys if you can’t analyze the data? Converting and reducing data into meaningful results is a marketing researcher’s key responsibility.”
--SPSS Web Page, “Analysis,” http://www.spss.com/spssmr/solutions/analysis.htm, February 19, 2001.

Learning Objectives
• Discuss the importance of parametric statistics
• Describe the difference between tests of differences and tests of associations
• Explain how to use z- and t-tests to compare two groups
• Describe and calculate the F-test
• Discuss the meaning and use of analysis of variance
• Describe correlation and regression analyses
• Calculate and interpret correlation and regression statistics
• Compute one-way analysis of variance manually and by computer

Get This! My Name Is Important to Me
• In the book How to Win Friends and Influence People, Dale Carnegie wrote, “Remember that a person’s name is to that person the sweetest and most important sound in any language.”
• A professor classified students into three groups: names (those he could remember), no-names (those he could not remember), and neutral-names (those whose names he never referred to during the conversations).
• At the end of a meeting with each student, the professor would say, “Oh, I have to ask you something else. My wife is selling cookies for the church. If you want any, they’re only 25 cents.” This offer was made to examine whether remembering a student’s name made a difference in whether he or she would comply with a request (that is, purchase the cookies).

Get This! My Name Is Important to Me – cont’d
• The results were analyzed using several different statistical techniques, one being analysis of variance.
• He found:
 – Not being able to remember a student’s name produced compliance results (that is, cookie purchases) no different from those of the condition in which the student’s name was never raised.
 – The higher purchasing rate among students whose names were remembered indicates that name remembrance facilitates compliance.

Now Ask Yourself
• Based on your knowledge of statistics, do you have faith in the findings since various statistical tools were used to analyze the data? Was it really necessary for the researchers to run statistical tests to generate their findings?
• What was meant by, “The professor decided to use this method [analysis of variance] since it tests whether there are statistically significant differences among the means of each of the student groups”?
• Were the results surprising to you? If so, what did you expect? If not, why not?

Parametric Tests
The hypothesis tests assume that the variables under investigation are measured using either interval or ratio scales. In addition, several other assumptions must hold:
• The sample data should be randomly drawn from a normally distributed population.
• The sample observations must be independent of each other.
• When examining central tendency for which two or more samples are drawn, the populations should have equal variances.

Tests of Difference
Tests of difference can be used whenever a researcher is interested in comparing some characteristic of one group with the same characteristic of another group and determining whether a significant difference exists between the two groups. The following notation is used:
• The first population and its samples are identified by subscript 1.
• The second population and its samples are identified by subscript 2.
• $\bar{X}_1$ represents the mean of the sample drawn from population 1.
• $\bar{X}_2$ represents the mean of the sample drawn from population 2.

Z-test: Difference Between Means
To determine whether two population means differ from each other, either the z-test or the t-test can be used, depending on the sample size and whether the population standard deviation is known for each group. If the sample size is at least 30 and the population standard deviations are known, the z-test should be used:
$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} \qquad \text{[Formula 16-1]}$$
where
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$(\bar{X}_1 - \bar{X}_2)$ = the difference between sample means
$(\mu_1 - \mu_2)$ = the difference between population means
$\bar{X}_1$ and $\bar{X}_2$ = sample means for the two groups
$\sigma_1$ and $\sigma_2$ = population standard deviations; $n_1$ and $n_2$ = sample sizes
$\sigma_{\bar{x}_1 - \bar{x}_2}$ = standard error of the difference between the means

t-Test: Difference Between Means
When the sample size is less than 30 and the population standard deviations are unknown, the t-test can be used to determine whether a significant difference exists between two means (that is, whether the two population means are equal):
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$
where
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left(\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\right)\left(\frac{n_1 + n_2}{n_1 n_2}\right)}$$
For this formula, $s_1^2$ and $s_2^2$ are the sample variances computed with $n_1$ and $n_2$ as divisors, so that $n_1 s_1^2$ and $n_2 s_2^2$ are the sums of squared deviations within each sample.

Difference Between Two Proportions and Independent Samples
Let $p_1$ and $p_2$ be the proportions of two samples drawn from respective populations with proportions $P_1$ and $P_2$. The null hypothesis is that there is no difference between the two population proportions; that is, $P_1 = P_2$, or stated another way, $P_1 - P_2 = 0$. If the null hypothesis is true, the two populations are really the same population. The basic concept concerning the difference between two sample proportions is analogous to that concerning the difference between two sample means:
1. The mean of the sampling distribution of $p_1 - p_2$ is equal to the difference between the two population proportions $P_1$ and $P_2$: $\mu_{p_1 - p_2} = P_1 - P_2$.

Difference Between Two Proportions and Independent Samples – cont’d
2. The variance of the difference between two sample proportions is the sum of the variances of the two sample proportions:
$$\sigma^2_{p_1 - p_2} = \sigma^2_{p_1} + \sigma^2_{p_2} = \frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}$$
where $Q_1 = 1 - P_1$ and $Q_2 = 1 - P_2$.
When the sampling distributions of $p_1$ and $p_2$ are normal, the distribution of the difference $p_1 - p_2$ is also normal. Since the mean of the sampling distribution of $p_1 - p_2$ is equal to $P_1 - P_2$, the following statistic follows the standard normal distribution:
$$z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sigma_{p_1 - p_2}}$$

Difference Between Two Proportions and Independent Samples – cont’d
$p_1$ = sample proportion of successes in the first group
$p_2$ = sample proportion of successes in the second group
$P_1$ = population proportion of the first group
$P_2$ = population proportion of the second group
$\sigma_{p_1 - p_2}$ = standard error (standard deviation) of the difference between two sample proportions
When $P_1 = P_2$, $P_1 - P_2 = 0$ and $P_1 Q_1 = P_2 Q_2 = PQ$, where $Q = 1 - P$. Thus
$$z = \frac{p_1 - p_2}{\sigma_{p_1 - p_2}}$$
where
$$\sigma_{p_1 - p_2} = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}} = \sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Analysis of Variance
• The z-test and t-test are useful for testing a null hypothesis when only two samples are involved. Analysis of variance (ANOVA) is often the preferred method to test whether there is a significant difference among the means of two or more independent samples. It is applicable whenever a study involves an interval- or ratio-scaled dependent variable.
• One-way analysis of variance is discussed in this chapter. It is a bivariate statistical technique that involves only one independent variable, although there may be multiple levels of that variable.
• The null hypothesis for ANOVA is that the means of normally distributed populations, such as three populations a, b, and c, are equal: $\mu_a = \mu_b = \mu_c$.

Analysis of Variance – cont’d
If we take a random sample from each of the three original populations, we may consider the three samples as subsets of a single large sample drawn from a single large population.
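Before turning to the ANOVA computations, the two-sample tests from the earlier slides (Formula 16-1, the pooled t-test, and the z-test for two proportions) can be sketched in Python using only the standard library. This is an illustrative sketch, not the textbook’s procedure: the function names and all figures are hypothetical, `t_two_means` assumes equal population variances (as the parametric-test assumptions require), and `z_two_proportions` takes the common proportion P under the null hypothesis as an input. When P is unknown, a pooled estimate from both samples is commonly substituted, which is an assumption beyond the slides.

```python
import math
from statistics import NormalDist


def z_two_means(mean1, mean2, sigma1, sigma2, n1, n2, mu_diff=0.0):
    """Formula 16-1: z for the difference between two sample means when the
    population standard deviations sigma1 and sigma2 are known."""
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)  # standard error of the difference
    return ((mean1 - mean2) - mu_diff) / se


def t_two_means(sample1, sample2, mu_diff=0.0):
    """Pooled t for two small samples; returns (t, degrees of freedom).
    ss1 and ss2 are sums of squared deviations, i.e. n1*s1^2 and n2*s2^2
    when s^2 is computed with n as the divisor."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample1)
    ss2 = sum((x - m2) ** 2 for x in sample2)
    se = math.sqrt((ss1 + ss2) / (n1 + n2 - 2) * (n1 + n2) / (n1 * n2))
    return ((m1 - m2) - mu_diff) / se, n1 + n2 - 2


def z_two_proportions(p1, p2, n1, n2, P):
    """z for p1 - p2 under H0: P1 = P2 = P, using sqrt(PQ(1/n1 + 1/n2))."""
    Q = 1 - P
    se = math.sqrt(P * Q * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def two_tailed_p(z):
    """Two-tailed p-value for a z statistic from the standard normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical figures, for illustration only.
z = z_two_means(51, 50, 3, 4, 100, 100)
t, df = t_two_means([5, 7, 9], [1, 3, 5])
zp = z_two_proportions(0.6, 0.4, 50, 50, P=0.5)
print(round(z, 3), round(two_tailed_p(z), 4))  # 2.0 0.0455
print(round(t, 3), df)                          # 2.449 4
print(round(zp, 3))                             # 2.0
```

The t-statistic would then be compared with a critical value from a t table with $n_1 + n_2 - 2$ degrees of freedom; the sketch stops at the statistic because the Python standard library has no t distribution.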
$$\text{Grand mean} = \bar{\bar{X}} = \frac{\sum X_a + \sum X_b + \sum X_c}{n}$$
where $n = n_a + n_b + n_c$ (13 items in the chapter’s illustration).
When the null hypothesis is true, two unbiased estimates of the variance of this single large population ($\sigma^2$) may be obtained from the samples: the variance between groups [MSA ($\hat{s}_1^2$)] and the variance within groups [MSE ($\hat{s}_2^2$)].

Variance Between Groups
The variance between groups (or between samples) is also referred to as the “mean sum of squares between (among) groups.” It is sometimes denoted as MSA or $\hat{s}_1^2$. Its general form is:
$$\hat{s}_1^2 = \text{MSA} = \frac{SS_{\text{between}}}{df} = \frac{\sum n_i (\bar{X}_i - \bar{\bar{X}})^2}{r - 1}$$
where
$i$ = individual groups or samples a, b, c, …
$n_i$ = size of group $i$, or size of the sample drawn from population $i$, such as $n_a = 4$, $n_b = 5$, $n_c = 4$ in the chapter’s illustration
$\bar{X}_i$ = mean of the items in group or sample $i$
$\bar{\bar{X}}$ = grand mean, or mean of all items in the single large sample
$\bar{X}_i - \bar{\bar{X}}$ = deviation of the group mean from the grand mean
$(\bar{X}_i - \bar{\bar{X}})^2$ = variation, or squared deviation (The term “variation” has been used loosely in previous discussions; here it is limited to the squared deviation.)
$r$ = number of groups or samples, such as the three groups in the illustration

Variance Between Groups – cont’d
Note that the deviation $\bar{X}_i - \bar{\bar{X}}$ is called the effect, and the nature of sample $i$ is called the treatment. Furthermore, whenever ANOVA is used, the independent variables are called factors, so the different levels (or categories) of a factor are the treatments.

Variance Within Groups
The variance within groups (or within individual samples) is also referred to as the mean square error (MSE), or $\hat{s}_2^2$, since it is an estimate of the random error existing in the data.
Its general form is:
$$\hat{s}_2^2 = \text{MSE} = \frac{SS_{\text{within}}}{df} = \frac{\sum_i \sum (X_i - \bar{X}_i)^2}{n - r}$$
where $X_i$ = individual items in group $i$, and $n = n_a + n_b + n_c$ = number of items in the single large sample.

F-Test
The F-test is based on the variance ratio, which shows the relationship between two independently estimated population variances:
$$F = \frac{\hat{s}_1^2}{\hat{s}_2^2}$$
where the subscripts 1 (in the numerator) and 2 (in the denominator) identify the two samples, and each term is the estimate of the population variance based on that sample.
The F-statistic is the variance between groups divided by the variance within groups. It is used to test for group differences by comparing one variance estimate with another:
$$F = \frac{\text{Variance between groups}}{\text{Variance within groups}} = \frac{\text{MSA}}{\text{MSE}}$$

Tests of Associations
• Tests of associations examine associations between two or more variables.
• When two variables are studied, one variable is used to predict the behavior of the other. The predictor variable is the independent variable, and the criterion variable is the dependent variable.
• Tests that measure statistical relationships between variables include:
 – Regression analysis
 – Correlation analysis

Scatter Diagrams
When two related variables, called bivariate data, are plotted as points on a graph, the graph is called a scatter diagram. A scatter diagram indicates whether the relationship between the two variables is positive or negative.
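The one-way ANOVA quantities defined earlier (the grand mean, MSA with $r - 1$ degrees of freedom, MSE with $n - r$ degrees of freedom, and $F = \text{MSA}/\text{MSE}$) can be computed manually in a few lines. The sketch below is a hypothetical helper in plain Python; the three groups are made-up scores, not the chapter’s illustration, and the resulting F would still need to be compared with a critical value from an F table with $(r - 1,\ n - r)$ degrees of freedom.

```python
def one_way_anova(groups):
    """Return (MSA, MSE, F) for a list of groups (one list of scores per treatment)."""
    r = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # items in the single large sample
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between groups: sum of n_i * (group mean - grand mean)^2, df = r - 1
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: squared deviations of items from their own group mean, df = n - r
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    msa = ss_between / (r - 1)
    mse = ss_within / (n - r)
    return msa, mse, msa / mse


# Hypothetical scores for three treatments a, b, c.
msa, mse, f = one_way_anova([[2, 3, 4], [5, 6, 7], [8, 9, 10]])
print(msa, mse, f)  # 27.0 1.0 27.0
```

Here the group means (3, 6, 9) are far apart relative to the small spread within each group, so MSA is large compared with MSE and F is large.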
Regression Analysis
• Regression analysis refers to statistical techniques for measuring the linear or curvilinear relationship between a dependent variable and one or more independent variables. The relationship between two variables is characterized by how they vary together.
• Given pairs of X and Y variables, regression analysis measures the direction (positive or negative) and rate of change (slope) in Y as X changes, or vice versa. Using the values of the independent variable, it attempts to predict the values of an interval- or ratio-scaled dependent variable.
• Regression analysis requires two operations: (1) deriving an equation, called the regression equation, and a line representing the equation to describe the shape of the relationship between the variables; and (2) estimating the dependent variable (Y) from the independent variable (X), based on the relationship described by the regression equation.
• The regression line is the line drawn through a scatter diagram that “best fits” the data points and most accurately describes the relationship between the two variables.

Regression Equation and Regression Line
While all shapes are informative, a straight line is especially useful because it is the easiest shape to work with in regression analysis when describing the average relationship between two variables. The straight line can be expressed by the linear equation:
$$Y_c = a + bX$$
where
$Y_c$ = computed value of the dependent variable
$a$ = Y-intercept, the value of $Y_c$ where X equals zero
$b$ = slope of the regression line, which is the increase or decrease in Y for each one-unit change in X
$X$ = a given value of the independent variable
Regression Equation and Regression Line – cont’d
To create a regression model, researchers estimate the regression line using the following equation:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where
$\beta_0$ = Y-intercept, where X equals zero
$\beta_1$ = slope of the regression line, which is the increase or decrease in Y for each one-unit change in X
$X_i$ = a given value of the independent variable
$i$ = observation number
$\varepsilon_i$ = error term associated with the ith observation

Least-Squares Method
• The least-squares method is a statistical technique that fits a straight line to a scatter diagram by finding the smallest sum of the squared vertical distances (i.e., $\sum e_i^2$) of all the points from the straight line. The equation derived by this method yields the regression line that best fits the data.
• To calculate the straight line by the least-squares method, the equation $Y_c = a + bX$ is used. We must first determine the constants a and b, which are called regression coefficients. Regression coefficients are the values that represent the effect of the individual independent variables on the dependent variable.

Least-Squares Method – cont’d
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} \quad \text{or} \quad a = \bar{Y} - b\bar{X}$$

Standard Deviation of Regression
The standard deviation of the Y values from the regression line ($Y_c$) is called the standard deviation of regression. It is also commonly called the standard error of estimate, since it can be used to measure the error of the estimates of individual Y values based on the regression line.
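The least-squares formulas for b and a given earlier translate directly into code. This Python sketch uses the same summation form; the function name and the five (X, Y) pairs are hypothetical.

```python
def least_squares(xs, ys):
    """Fit Yc = a + bX by least squares using the summation formulas for b and a."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # equivalently, a = Ybar - b * Xbar
    return a, b


# Hypothetical data: five (X, Y) pairs.
a, b = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 2), round(b, 2))  # 2.2 0.6
print(round(a + b * 6, 2))       # computed Yc at X = 6: 5.8
```

Once a and b are known, estimating Y for any X (the second operation of regression analysis) is just evaluating $a + bX$.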
The related notation is:
$s_y$ = the standard deviation of Y values from the mean $\bar{Y}$
$s_x$ = the standard deviation of X values from the mean $\bar{X}$
$s_{yx}$ = the standard deviation of regression of Y values from $Y_c$
$s_{xy}$ = the standard deviation of regression of X values from $X_c$

Standard Deviation of Regression – cont’d
The standard deviation of Y values from the regression line is based on how the Y values are scattered around the least-squares line. The closer the points are to the line, the smaller the standard deviation of regression, and thus the more reliable the estimates of Y values based on the line. On the other hand, the more widely the points are scattered around the least-squares line, the larger the standard deviation of regression and the less reliable the estimates based on the line or the regression equation. The general formula for the standard deviation of regression of Y values on X is
$$s_{yx} = \sqrt{\frac{\sum (Y - Y_c)^2}{n - k}}$$
where k = total number of (dependent and independent) variables. However, a simpler way to compute $s_{yx}$ is the following formula:
$$s_{yx} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - k}}$$

Correlation Analysis
Correlation analysis refers to the statistical techniques for measuring the closeness of the relationship between two metric (interval- or ratio-scaled) variables. It measures the degree to which changes in one variable are associated with changes in another. The computation of the degree of closeness is based on regression statistics.

Total Deviation, Coefficient of Determination, and Correlation Coefficient
Total deviation $(Y - \bar{Y})$: Assume there are two variables, X and Y.
The mean of the Y values, $\bar{Y} = \sum Y / n$, is obtained without reference to the X values. $Y_c$, representing the regression line of Y values ($Y_c = a + bX$), is obtained with the influence of the X values. If the Y values are related to the X values to some degree, the deviations of the Y values from $\bar{Y}$ must be reduced somewhat by the introduction of X values in computing the $Y_c$ values. The total deviation of Y from the mean $\bar{Y}$ is divided into two parts:
$$Y - \bar{Y} = (Y - Y_c) + (Y_c - \bar{Y})$$
Total deviation = Unexplained deviation + Explained deviation

Total Deviation, Coefficient of Determination, and Correlation Coefficient – cont’d
The explained variation $\sum (Y_c - \bar{Y})^2$ may also be referred to as the regression sum of squares (RSS). The unexplained variation $\sum (Y - Y_c)^2$ is called the error sum of squares (ESS). This relationship may be expressed as:
Total variation = Unexplained variation + Explained variation
$$\text{TSS} = \sum (Y - \bar{Y})^2 = \text{ESS} + \text{RSS} = \sum (Y - Y_c)^2 + \sum (Y_c - \bar{Y})^2$$

Coefficient of Determination ($r^2$)
The coefficient of determination ($r^2$) measures the strength of association, or the degree of closeness of the relationship between two variables, as a relative value. It shows how well the regression line fits the scattered points. It may be defined as the ratio of the explained variation to the total variation:
$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\sum (Y_c - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{\text{RSS}}{\text{TSS}}$$
The value of $r^2$ therefore ranges from 0 to 1. When $r^2$ is close to 1, the Y values are very close to the regression line. When $r^2$ is close to 0, the Y values are not close to the regression line.
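The decomposition $\text{TSS} = \text{ESS} + \text{RSS}$, the coefficient of determination $r^2 = \text{RSS}/\text{TSS}$, and the standard deviation of regression $s_{yx}$ can all be checked numerically. The sketch below refits a least-squares line and computes each quantity, including $s_{yx}$ by both the general and the shortcut formula. It assumes simple regression with one independent variable (so k = 2), uses hypothetical data, and also reports $r = \pm\sqrt{r^2}$ with the sign of b, anticipating the correlation coefficient discussed next.

```python
import math


def regression_summary(xs, ys):
    """Least-squares fit plus TSS, ESS, RSS, r^2, r and s_yx for simple regression."""
    n, k = len(xs), 2  # k = total number of variables (Y and one X)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    y_bar = sum_y / n
    yc = [a + b * x for x in xs]  # computed values on the regression line

    tss = sum((y - y_bar) ** 2 for y in ys)          # total variation
    ess = sum((y - c) ** 2 for y, c in zip(ys, yc))  # unexplained (error) variation
    rss = sum((c - y_bar) ** 2 for c in yc)          # explained (regression) variation
    return {
        "a": a, "b": b,
        "TSS": tss, "ESS": ess, "RSS": rss,
        "r2": rss / tss,
        "r": math.copysign(math.sqrt(rss / tss), b),  # sign follows the slope
        "s_yx": math.sqrt(ess / (n - k)),
        # shortcut formula: sqrt((sum Y^2 - a*sum Y - b*sum XY) / (n - k))
        "s_yx_shortcut": math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - k)),
    }


summary = regression_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in summary.items():
    print(name, round(value, 3))
# TSS 6.0, ESS 2.4, RSS 3.6, r2 0.6; s_yx and s_yx_shortcut both about 0.894
```

The two $s_{yx}$ values agree because the shortcut numerator equals $\sum (Y - Y_c)^2$ whenever a and b are the least-squares coefficients.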
Correlation Coefficient
The correlation coefficient, the square root of $r^2$ ($r = \pm\sqrt{r^2}$), is frequently computed to indicate the direction of the relationship in addition to its degree; its sign is the same as the sign of the slope b. In simple regression, its absolute value equals the correlation between the observed and predicted values of the dependent variable. Since $r^2$ ranges from 0 to 1, the correlation coefficient r ranges from 0 to $-1$ or from 0 to $+1$; that is, from $-1$ to $+1$.

Decision Time!
As a marketing manager, you want information from marketing researchers that can enhance your decision-making abilities. If correlation analysis is a popular and informative statistical method, why should researchers bother using more complex, somewhat intimidating bivariate statistical techniques? Do you feel that there is really that much to gain from these methods? Why or why not?

Net Impact
• The Internet can be a valuable tool for learning about bivariate statistical techniques.
• Using almost any search engine, you can find a variety of discussions about the topic.
• These discussions may be available on the Internet as part of a company’s promotion of its statistical services, a university professor’s statistical seminar notes, or PowerPoint slides that were used in a seminar presentation.

Chapter 16
End of Presentation