MBACATÓLICA JAN/APRIL 2006
Marketing Research
Fernando S. Machado

Week 7
• Fundamentals of Data Analysis: Selecting a Data Analysis Strategy
• Frequency Distribution and Cross-Tabulation
• Hypothesis Testing: Basic Concepts
• Hypothesis Testing: Chi-square Tests of Association and Goodness of Fit


Fundamentals of Data Analysis: Selecting a DA Strategy

✓ Overview of Statistical Techniques
✓ Choice of Statistical Technique


Overview of Statistical Techniques

Univariate Techniques
✓ Appropriate when there is a single measurement of each of the n sample objects, or when there are several measurements of each of the n observations but each variable is analyzed in isolation

Multivariate Techniques
✓ A collection of procedures for analyzing association between two or more sets of measurements that have been made on each object in one or more samples of objects
✓ Dependence or interdependence techniques


A Classification of Univariate Techniques

Univariate Techniques
   Metric Data
      One Sample: t test, Z test
      Two or More Samples
         Independent: Two-Group t test, Z test, One-Way ANOVA
         Related: Paired t test
   Non-metric Data
      One Sample: Frequency, Chi-Square, K-S, Runs, Binomial
      Two or More Samples
         Independent: Chi-Square, Mann-Whitney, Median, K-S
         Related: Sign, Wilcoxon, McNemar, Chi-Square


A Classification of Multivariate Techniques

Multivariate Techniques
   Dependence Technique
      One Dependent Variable: Cross-Tabulation, Analysis of Variance and Covariance, Multiple Regression, Conjoint Analysis
      More Than One Dependent Variable: Multivariate Analysis of Variance and Covariance, Canonical Correlation, Multiple Discriminant Analysis
   Interdependence Technique
      Variable Interdependence: Factor Analysis
      Interobject Similarity: Cluster Analysis, Multidimensional Scaling


Selecting a Data Analysis Strategy

Choice of Statistical Technique is influenced by:

✓ Research Objectives
✓ Type of Data
   • Mode is the only measure of central tendency for nominal scaling
   • Both median and mode can be used for ordinal scale
   • Mean, median and mode can all be used for interval and ratio scaled data (metric data)
   • Non-parametric tests can be run on ordinal or nominal data (non-metric data)
✓ Research Design
   • Sample independence (ex: independent sample vs. paired sample t test)
   • Number of groups being analyzed (ex: t test vs. ANOVA)
✓ Assumptions Underlying the Test Statistic
   • If the assumptions on which a statistical test is based are violated, the test may provide misleading or meaningless results


Frequency Distribution and Cross-tabulation

✓ Frequency Distribution
✓ Descriptive Statistics
✓ Cross Tabulation


Internet Usage Data

                                   Attitude toward       Usage of Internet
Resp.         Famili-  Internet
No.     Sex   arity    Usage       Internet  Technology  Shopping  Banking
  1      1      7        14           7          6          1         1
  2      2      2         2           3          3          2         2
  3      2      3         3           4          3          1         2
  4      2      3         3           7          5          1         2
  5      1      7        13           7          7          1         1
  6      2      4         6           5          4          1         2
  7      2      2         2           4          5          2         2
  8      2      3         6           5          4          2         2
  9      2      3         6           6          4          1         2
 10      1      9        15           7          6          1         2
 11      2      4         3           4          3          2         2
 12      2      5         4           6          4          2         2
 13      1      6         9           6          5          2         1
 14      1      6         8           3          2          2         2
 15      1      6         5           5          4          1         2
 16      2      4         3           4          3          2         2
 17      1      6         9           5          3          1         1
 18      1      4         4           5          4          1         2
 19      1      7        14           6          6          1         1
 20      2      6         6           6          4          2         2
 21      1      6         9           4          2          2         2
 22      1      5         5           5          4          2         1
 23      2      3         2           4          2          2         2
 24      1      7        15           6          6          1         1
 25      2      6         6           5          3          1         2
 26      1      6        13           6          6          1         1
 27      2      5         4           5          5          1         1
 28      2      4         2           3          2          2         2
 29      1      4         4           5          3          1         2
 30      1      3         3           7          5          1         2


Simple Tabulation

✓ Consists of counting the number of cases that fall into various categories

Use of Simple Tabulation
✓ Determine the empirical distribution (frequency distribution) of the variable in question
✓ Calculate summary statistics, particularly the mean or percentages
✓ Aid in "data cleaning" aspects


Frequency Distribution

✓ Reports the number of responses that each question received
✓ Organizes data into classes or groups of values
✓ Shows the number or % of observations that fall into each class

Frequency Distribution of Familiarity with the Internet

                            Frequency                Valid        Cumulative
Value label        Value    (N)        Percentage    percentage   percentage
Not so familiar      1         0          0.0           0.0           0.0
                     2         2          6.7           6.9           6.9
                     3         6         20.0          20.7          27.6
                     4         6         20.0          20.7          48.3
                     5         3         10.0          10.3          58.6
                     6         8         26.7          27.6          86.2
Very familiar        7         4         13.3          13.8         100.0
Missing              9         1          3.3
TOTAL                         30        100.0         100.0


Frequency Histogram

[Figure: histogram of Familiarity with Internet (values 2 to 7) against Frequency (0 to 8)]


Descriptive Statistics

✓ Statistics normally associated with a frequency distribution to help summarize information in the frequency table
✓ Measures of central tendency (mean, median and mode)
✓ Measures of dispersion (range, standard deviation, and coefficient of variation)
✓ Measures of shape (e.g. skewness)


Skewness of a Distribution

[Figure: (a) Symmetric Distribution, where mean, median and mode coincide; (b) Skewed Distribution, where mean, median and mode differ]


Cross Tabulations

✓ Statistical analysis technique to study the relationships among and between variables
✓ The sample is divided to learn how the dependent variable varies from subgroup to subgroup (ex: compare male with female attitudes towards a brand/product)
✓ The two variables that are analyzed must be categorical: nominally scaled, or grouped into a small number of categories (as Internet usage is grouped into light and heavy below)

Gender and Internet Usage

                        Sex
Internet Usage     Male    Female    Row Total
Light (1)            5       10         15
Heavy (2)           10        5         15
Column Total        15       15         30

Internet Usage by Sex

                        Sex
Internet Usage     Male     Female
Light              33.3%    66.7%
Heavy              66.7%    33.3%
Column total       100%     100%

Sex by Internet Usage

              Internet Usage
Sex        Light    Heavy    Total
Male       33.3%    66.7%    100.0%
Female     66.7%    33.3%    100.0%


Purchase of Fashion Clothing by Marital Status

Purchase of              Current Marital Status
Fashion Clothing         Married    Unmarried
High                       31%        52%
Low                        69%        48%
Column                    100%       100%
Number of respondents      700        300


Purchase of Fashion Clothing by Marital Status and Sex

                                      Sex
                            Male                  Female
Purchase of                      Not                    Not
Fashion Clothing      Married    Married     Married    Married
High                    35%        40%         25%        60%
Low                     65%        60%         75%        40%
Column totals          100%       100%        100%       100%
Number of cases         400        120         300        180


Ownership of Expensive Automobiles by Education Level

Own Expensive                  Education
Automobile           College Degree    No College Degree
Yes                       32%                21%
No                        68%                79%
Column totals            100%               100%
Number of cases           250                750


Ownership of Expensive Automobiles by Education Level and Income Levels

                                         Income
                            Low Income              High Income
Own Expensive          College    No College    College    No College
Automobile             Degree     Degree        Degree     Degree
Yes                      20%        20%           40%        40%
No                       80%        80%           60%        60%
Column totals           100%       100%          100%       100%
Number of respondents    100        700           150         50


Hypothesis Testing: Basic Concepts

✓ Purpose of Hypothesis Testing
✓ Steps Involved in Hypothesis Testing
✓ Critical-Value and P-value Approach to Hypothesis Testing


Hypothesis Testing

✓ A hypothesis is an assumption made about a population parameter
✓ Purpose of Hypothesis Testing
   • To make a judgement about the difference between two sample statistics, or between a sample statistic and a hypothesized population parameter


Step 1: Formulate H0 and H1

✓ The null hypothesis (H0) is tested against the alternative hypothesis (H1, also written Ha)
✓ The null and alternative hypotheses are stated

Suppose we want to test the hypothesis that the proportion of consumers who use the internet for shopping is not larger than 40%:

$H_0: \pi \le 0.40$
$H_1: \pi > 0.40$


Step 2: Select appropriate test

Select the appropriate probability distribution based on two criteria:
• Size of the sample
• Whether the population standard deviation is known or not

For a sufficiently large sample, the sample proportion (the test statistic) approximately follows a normal distribution. Therefore the appropriate test is a one-sample Z test.

$p \sim N\left(\pi,\ \sqrt{\frac{\pi(1-\pi)}{n}}\right)$


Step 3: Choose level of Significance

✓ Type I error: rejecting a null hypothesis when it is true
✓ Significance level (α): the probability of a type I error that the researcher is willing to accept
✓ Type II error: accepting (not rejecting) a null hypothesis when it is false
✓ Its probability is β
✓ Power of hypothesis test: 1 − β
✓ A good test of hypothesis ought to reject a null hypothesis when it is false (1 − β should be as high as possible)
✓ When choosing a level of significance, there is an inherent tradeoff between these two types of errors


Step 4: Collect Data and Calculate Test Statistic

We collected data from a sample of 30 consumers and obtained a sample proportion of 0.567 (56.7% use the internet for shopping).

$H_0: \pi \le 0.40$
$H_1: \pi > 0.40$

The distribution of the sample proportion is $p \sim N(\pi, \sigma_p)$, with

$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{0.4 \times 0.6}{30}} = 0.089$

Then the Z statistic is given by:

$Z = \frac{p - \pi}{\sigma_p} = \frac{0.567 - 0.4}{0.089} = 1.88$


Step 5: Find critical value

[Figure: standard normal curve with upper-tail area = 0.05 to the right of Zcrit = 1.645]


Steps 6 and 7: Compare critical value with test statistic and draw a conclusion

               p        Z
Statistic    56.7%    1.88
Critical     54.6%    1.645

The critical proportion is 0.4 + 1.645 × 0.089 = 0.546.

If Z > Zcrit → reject H0
If Z < Zcrit → do not reject H0
1.88 > 1.645 → reject H0


Trade-off Between Type I & Type II Errors

$H_0: \pi \le 0.40$
$H_1: \pi > 0.40$

[Figure: sampling distribution of p under π = 0.40, with α (level of significance) as the area to the right of the critical value 0.546 and the observed p = 0.567 marked; below it, the sampling distribution under π = 0.60, with β as the area to the left of the same critical value]

For each π > 0.40 there is a different β.


The Probability Values (P-value) Approach to Hypothesis Testing

P-value
Probability of obtaining the observed value of the statistic, or a value even more contrary to the null hypothesis, when H0 is true.

Using the p-Value
✓ The researcher can determine "how unlikely is the result that has been observed?"
✓ In general, the smaller the p-value, the greater the researcher's confidence in the sample findings (the researcher can be more confident in rejecting H0)
✓ The p-value is generally sensitive to sample size: when H0 is false, a larger sample tends to yield a lower p-value

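
Steps 4 to 7 and the trade-off between the two error types can be checked numerically. The block below is a minimal sketch using only the Python standard library; the function names are illustrative, and the count of 17 shoppers out of 30 is an assumption inferred from the reported proportion of 0.567. Because it does not round the intermediate values, it gives Z ≈ 1.86 rather than the 1.88 obtained above from the rounded figures 0.567 and 0.089.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, pi0, alpha=0.05):
    """Upper-tail one-sample Z test for a proportion (H0: pi <= pi0)."""
    p_hat = successes / n
    # Standard deviation of the sample proportion under H0.
    sigma_p = sqrt(pi0 * (1 - pi0) / n)
    z = (p_hat - pi0) / sigma_p
    # Critical value that leaves an area of alpha in the upper tail.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return {
        "p_hat": p_hat,
        "sigma_p": sigma_p,
        "z": z,
        "z_crit": z_crit,
        "p_crit": pi0 + z_crit * sigma_p,  # critical value on the proportion scale
        "reject_h0": z > z_crit,
    }


def type_ii_error(pi_true, n, pi0, alpha=0.05):
    """beta = P(do not reject H0) when the true proportion is pi_true."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    p_crit = pi0 + z_crit * sqrt(pi0 * (1 - pi0) / n)
    # Sampling distribution of p under the alternative value pi_true.
    sigma_true = sqrt(pi_true * (1 - pi_true) / n)
    return NormalDist(pi_true, sigma_true).cdf(p_crit)


# Assumed count: 17 of 30 respondents shop online (17/30 = 0.567).
result = z_test_proportion(successes=17, n=30, pi0=0.40)
beta = type_ii_error(pi_true=0.60, n=30, pi0=0.40)

print(f"Z = {result['z']:.3f}, Zcrit = {result['z_crit']:.3f}")
print(f"critical proportion = {result['p_crit']:.3f}")
print(f"reject H0: {result['reject_h0']}")
print(f"beta at pi = 0.60: {beta:.3f}, power: {1 - beta:.3f}")
```

Calling type_ii_error with other values of pi_true shows how β changes with the assumed true proportion, which is the point of the trade-off figure.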

Step 5: Find p-value (instead of critical value)

[Figure: standard normal curve with z = 1.88 marked; shaded upper-tail area = 0.0301, unshaded area = 0.9699]

P-value
The smallest level of significance at which H0 would be rejected; at any lower level of significance we would not reject H0.


Steps 6 & 7: Compare p-value with significance level and draw a conclusion

If p-value > α → do not reject H0
If p-value < α → reject H0
0.03 < 0.05 → reject H0


Steps Involved in Hypothesis Testing

• Formulate H0 and H1
• Select Appropriate Test
• Choose Level of Significance, α
• Collect Data and Calculate Test Statistic
• Then follow one of two equivalent routes:
   • P-value route: Determine Probability Associated with Test Statistic, then Compare with Level of Significance, α
   • Critical-value route: Determine Critical Value of Test Statistic (TSCR), then Determine whether the calculated test statistic falls into the (Non) Rejection Region
• Reject or do not reject H0
• Draw Marketing Research Conclusion


Hypothesis Testing: Chi-Square Tests of Association and Goodness of Fit

✓ Chi-Square Test of Independence
✓ Chi-Square Test of Goodness of Fit for a Single Sample


Cross-tabulation and Chi Square

In Marketing Applications, the Chi-square Statistic Is Used As

Test of Independence
✓ Are there associations between two or more variables in a study?

Test of Goodness of Fit
✓ Is there a significant difference between an observed frequency distribution and a theoretical frequency distribution?

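
Returning to the Z test example, the two routes in "Steps Involved in Hypothesis Testing" (p-value and critical value) lead to the same decision for any α. The sketch below illustrates this for the statistic z = 1.88, using only the Python standard library; the function names are illustrative and the test is assumed to be upper-tailed, as in the example.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def upper_tail_p_value(z):
    """P(Z >= z) when H0 is true, for an upper-tail test."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


def reject_by_p_value(z, alpha):
    """P-value route: reject H0 when the p-value is below alpha."""
    return upper_tail_p_value(z) < alpha


def reject_by_critical_value(z, alpha):
    """Critical-value route: reject H0 when z exceeds the critical value."""
    return z > STANDARD_NORMAL.inv_cdf(1 - alpha)


z = 1.88
p_value = upper_tail_p_value(z)
print(f"p-value for z = {z}: {p_value:.4f}")

# Both routes give the same decision at each significance level.
for alpha in (0.01, 0.05, 0.10):
    by_p = reject_by_p_value(z, alpha)
    by_crit = reject_by_critical_value(z, alpha)
    print(f"alpha = {alpha:.2f}: reject by p-value = {by_p}, "
          f"reject by critical value = {by_crit}")
```

At α = 0.01 neither route rejects H0, which shows how the conclusion depends on the significance level chosen in Step 3.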

Gender and Internet Usage

                        Sex
Internet Usage     Male    Female    Row Total
Light (1)            5       10         15
Heavy (2)           10        5         15
Column Total        15       15         30

                        Sex
Internet Usage     Male     Female
Light              33.3%    66.7%
Heavy              66.7%    33.3%
Column total       100%     100%


Chi-square as a Test of Independence

Statistical Independence
✓ Two variables are statistically independent if knowing one offers no information as to the identity of the other

Null Hypothesis H0
• The two (nominally scaled) variables are statistically independent (there is no association between them)

Alternative Hypothesis Ha
• The two variables are not independent

Degrees of Freedom
• v = (r − 1)(c − 1), where r = number of rows in the contingency table and c = number of columns
• Mean of chi-square distribution = degrees of freedom (v)
• Variance = 2v


Chi-square Statistic (χ²)

✓ Measures the difference between the actual number observed in cell i (Oi) and the number expected (Ei) under independence, that is, if the null hypothesis were true

$\chi^2 = \sum_{\text{all cells}} \frac{(O_i - E_i)^2}{E_i}, \qquad df = (r-1)(c-1)$

✓ Expected frequency in each cell under no association

$E_i = p_L \times p_A \times n = \frac{n_r n_c}{n}$

where pL and pA are the marginal proportions of the two variables, nr is the row total and nc is the column total.


Gender and Internet Usage: Expected Frequencies and Test

Using the observed counts in the Gender and Internet Usage table above:

$E_{11} = E_{12} = E_{21} = E_{22} = \frac{15 \times 15}{30} = 7.5$

$\chi^2 = 4 \times \frac{(2.5)^2}{7.5} = 3.333, \qquad df = 1$

[Figure: chi-square distribution with the rejection region ("Reject H0") to the right of the critical value]

Critical value = 3.841 for α = 0.05. Since 3.333 < 3.841, the statistic does not fall in the rejection region and H0 is not rejected at the 5% level.


Limitations of χ² as an Association Measure

It Is Basically Proportional to Sample Size
✓ Difficult to interpret in an absolute sense and to compare across cross-tabs of unequal size

It Has No Upper Bound
✓ Difficult to obtain a feel for its value
✓ Does not indicate how two variables are related


Indicators of Strength of Association

Phi Coefficient (2×2 tables)
$\phi = \sqrt{\frac{\chi^2}{n}}, \qquad 0 \le \phi \le 1$

Cramer's V (generalization of Phi)
$V = \sqrt{\frac{\phi^2}{\min(r-1,\ c-1)}}, \qquad 0 \le V \le 1$

Contingency Coefficient (C)
$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}, \qquad 0 \le C < 1$

• The maximum value of C depends on the size of the table, so compare only tables of the same size


Chi-square goodness of fit test

✓ Frequently used when the researcher is interested in analysing objects/responses which fall within categories (ex: respondents who are indifferent, against or for a proposal)
✓ Compares observed and expected (under the null hypothesis) numbers of observations in each category
✓ Degrees of freedom v = k − 1, where k is the number of categories
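
The chi-square test of independence and the strength-of-association measures can be reproduced for the Gender and Internet Usage table. The block below is a minimal sketch using only the Python standard library; the function names are illustrative. The p-value line relies on an identity that holds only for one degree of freedom (a chi-square variable with df = 1 is the square of a standard normal variable), so it is an assumption tied to 2×2 tables and not a general routine.

```python
from math import erfc, sqrt


def chi_square_independence(observed):
    """Return (chi2, df, expected, n) for a contingency table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected, n


def association_measures(chi2, n, rows, cols):
    """Phi, Cramer's V and the contingency coefficient C."""
    phi = sqrt(chi2 / n)
    cramers_v = sqrt((chi2 / n) / min(rows - 1, cols - 1))
    contingency_c = sqrt(chi2 / (chi2 + n))
    return phi, cramers_v, contingency_c


# Rows: light, heavy usage. Columns: male, female.
observed = [[5, 10], [10, 5]]
chi2, df, expected, n = chi_square_independence(observed)
phi, cramers_v, contingency_c = association_measures(chi2, n, 2, 2)

# Upper-tail probability; this shortcut is valid for df = 1 only.
p_value = erfc(sqrt(chi2 / 2))

CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # tabulated value quoted on the slide
print(f"chi2 = {chi2:.3f}, df = {df}, p-value = {p_value:.3f}")
print(f"reject H0 at 5%: {chi2 > CRITICAL_VALUE_DF1_ALPHA_05}")
print(f"phi = {phi:.3f}, V = {cramers_v:.3f}, C = {contingency_c:.3f}")
```

For a 2×2 table φ and V coincide, which is why Cramer's V is described as a generalization of φ to larger tables.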