MBACATÓLICA
JAN/APRIL 2006
Marketing Research
Fernando S. Machado
Week 7
• Fundamentals of Data Analysis: Selecting a Data
Analysis Strategy
• Frequency Distribution and Cross-Tabulation
• Hypothesis Testing: Basic Concepts
• Hypothesis Testing: Chi-square Tests of
Association and Goodness of fit
Fundamentals of Data Analysis:
Selecting a DA Strategy
• Overview of Statistical Techniques
• Choice of Statistical Technique
Overview of Statistical Techniques
Univariate Techniques
• Appropriate when there is a single measurement of each of the 'n' sample objects, or there are several measurements of each of the 'n' observations but each variable is analyzed in isolation
Multivariate Techniques
• A collection of procedures for analyzing association between two or more sets of measurements that have been made on each object in one or more samples of objects
• Dependence or interdependence techniques
A Classification of Univariate Techniques
Univariate Techniques
  Metric Data
    One Sample
      • t test
      • Z test
    Two or More Samples
      Independent
        • Two-Group t test
        • Z test
        • One-Way ANOVA
      Related
        • Paired t test
  Non-metric Data
    One Sample
      • Frequency
      • Chi-Square
      • K-S
      • Runs
      • Binomial
    Two or More Samples
      Independent
        • Chi-Square
        • Mann-Whitney
        • Median
        • K-S
      Related
        • Sign
        • Wilcoxon
        • McNemar
        • Chi-Square
A Classification of Multivariate Techniques
Multivariate Techniques
  Dependence Technique
    One Dependent Variable
      • Cross-Tabulation
      • Analysis of Variance and Covariance
      • Multiple Regression
      • Conjoint Analysis
    More Than One Dependent Variable
      • Multivariate Analysis of Variance and Covariance
      • Canonical Correlation
      • Multiple Discriminant Analysis
  Interdependence Technique
    Variable Interdependence
      • Factor Analysis
    Interobject Similarity
      • Cluster Analysis
      • Multidimensional Scaling
Selecting a Data Analysis Strategy
Choice of Statistical Technique is influenced by:
• Research Objectives
• Type of Data
  • Mode is the only measure of central tendency for nominal scaling
  • Both median and mode can be used for ordinal scale
  • Mean, median and mode can all be used for interval and ratio scaled data (metric data)
  • Non-parametric tests can be run on ordinal or nominal data (non-metric data)
• Research Design
  • Sample independence (ex: indep. sample vs. paired sample t test)
  • Number of groups being analyzed (ex: t test vs. ANOVA)
• Assumptions Underlying the Test Statistic
  • If assumptions on which a statistical test is based are violated, the test will provide meaningless results
Frequency Distribution and
Cross-tabulation
• Frequency Distribution
• Descriptive Statistics
• Cross Tabulation
Internet Usage Data
RESP.                      INTERNET  ATTITUDE TOWARD       USAGE OF INTERNET
NUMBER  SEX   FAMILIARITY  USAGE     Internet  Technology  Shopping  Banking
1       1.00  7.00         14.00     7.00      6.00        1.00      1.00
2       2.00  2.00         2.00      3.00      3.00        2.00      2.00
3       2.00  3.00         3.00      4.00      3.00        1.00      2.00
4       2.00  3.00         3.00      7.00      5.00        1.00      2.00
5       1.00  7.00         13.00     7.00      7.00        1.00      1.00
6       2.00  4.00         6.00      5.00      4.00        1.00      2.00
7       2.00  2.00         2.00      4.00      5.00        2.00      2.00
8       2.00  3.00         6.00      5.00      4.00        2.00      2.00
9       2.00  3.00         6.00      6.00      4.00        1.00      2.00
10      1.00  9.00         15.00     7.00      6.00        1.00      2.00
11      2.00  4.00         3.00      4.00      3.00        2.00      2.00
12      2.00  5.00         4.00      6.00      4.00        2.00      2.00
13      1.00  6.00         9.00      6.00      5.00        2.00      1.00
14      1.00  6.00         8.00      3.00      2.00        2.00      2.00
15      1.00  6.00         5.00      5.00      4.00        1.00      2.00
16      2.00  4.00         3.00      4.00      3.00        2.00      2.00
17      1.00  6.00         9.00      5.00      3.00        1.00      1.00
18      1.00  4.00         4.00      5.00      4.00        1.00      2.00
19      1.00  7.00         14.00     6.00      6.00        1.00      1.00
20      2.00  6.00         6.00      6.00      4.00        2.00      2.00
21      1.00  6.00         9.00      4.00      2.00        2.00      2.00
22      1.00  5.00         5.00      5.00      4.00        2.00      1.00
23      2.00  3.00         2.00      4.00      2.00        2.00      2.00
24      1.00  7.00         15.00     6.00      6.00        1.00      1.00
25      2.00  6.00         6.00      5.00      3.00        1.00      2.00
26      1.00  6.00         13.00     6.00      6.00        1.00      1.00
27      2.00  5.00         4.00      5.00      5.00        1.00      1.00
28      2.00  4.00         2.00      3.00      2.00        2.00      2.00
29      1.00  4.00         4.00      5.00      3.00        1.00      2.00
30      1.00  3.00         3.00      7.00      5.00        1.00      2.00
Simple Tabulation
• Consists of counting the number of cases that fall into various categories
Use of Simple Tabulation
• Determine empirical distribution (frequency distribution) of the variable in question
• Calculate summary statistics, particularly the mean or percentages
• Aid in "data cleaning" aspects
Frequency Distribution
• Reports the number of responses that each question received
• Organizes data into classes or groups of values
• Shows number or % of observations that fall into each class

Frequency Distribution of Familiarity with the Internet
Value label      Value  Frequency (N)  Percentage  Valid percentage  Cumulative percentage
Not so familiar  1      0              0.0         0.0               0.0
                 2      2              6.7         6.9               6.9
                 3      6              20.0        20.7              27.6
                 4      6              20.0        20.7              48.3
                 5      3              10.0        10.3              58.6
                 6      8              26.7        27.6              86.2
Very familiar    7      4              13.3        13.8              100.0
Missing          9      1              3.3
TOTAL                   30             100.0       100.0
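The frequency table above can be reproduced from the raw Familiarity column. The following is a minimal sketch, assuming that 9 is the missing-value code (as the table's "Missing" row suggests) and that valid and cumulative percentages are computed on the 29 non-missing answers; the function name `frequency_table` is illustrative only.

```python
from collections import Counter

# Familiarity ratings of the 30 respondents, copied from the data table.
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
MISSING = 9  # assumption: 9 codes a missing answer


def frequency_table(values, categories, missing=MISSING):
    """Return rows of (value, count, pct, valid_pct, cumulative_valid_pct)."""
    counts = Counter(values)
    n_total = len(values)
    n_valid = n_total - counts[missing]
    rows, cumulative = [], 0
    for value in categories:
        count = counts[value]
        cumulative += count
        rows.append((value, count,
                     round(100 * count / n_total, 1),      # share of all cases
                     round(100 * count / n_valid, 1),      # share of valid cases
                     round(100 * cumulative / n_valid, 1)))
    return rows


table = frequency_table(familiarity, categories=range(1, 8))
for row in table:
    print(row)
```

Counting the missing code separately is what makes the "Percentage" and "Valid percentage" columns differ.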
Frequency Histogram
[Figure: histogram of Familiarity with Internet; x-axis: familiarity rating (2 to 7), y-axis: frequency (0 to 8)]
Descriptive Statistics
• Statistics normally associated with a frequency distribution to help summarize information in the frequency table
• Measures of central tendency (mean, median and mode)
• Measures of dispersion (range, standard deviation, and coefficient of variation)
• Measures of shape (e.g. skewness)
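As an illustration of the measures listed above, this sketch computes them for the Familiarity column of the Internet usage data. It assumes the value 9 is a missing code and is dropped, uses the sample (n - 1) standard deviation, and uses the simple moment-based skewness coefficient; statistical packages may apply a small-sample correction and report a slightly different skewness.

```python
import statistics

# Familiarity ratings of the 30 respondents (9 = missing, an assumption).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
valid = [x for x in familiarity if x != 9]

# Measures of central tendency
mean = statistics.mean(valid)
median = statistics.median(valid)
mode = statistics.mode(valid)

# Measures of dispersion
value_range = max(valid) - min(valid)
std_dev = statistics.stdev(valid)   # sample standard deviation (n - 1)
coeff_var = std_dev / mean          # coefficient of variation

# Measure of shape: moment-based skewness (no small-sample correction)
n = len(valid)
m2 = sum((x - mean) ** 2 for x in valid) / n
m3 = sum((x - mean) ** 3 for x in valid) / n
skewness = m3 / m2 ** 1.5

print(f"mean={mean:.3f} median={median} mode={mode}")
print(f"range={value_range} sd={std_dev:.3f} cv={coeff_var:.3f}")
print(f"skewness={skewness:.3f}")
```

A negative skewness here indicates a slightly longer tail toward the low familiarity ratings.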
Skewness of a Distribution
[Figure: (a) Symmetric Distribution, where mean, median and mode coincide; (b) Skewed Distribution, where mean, median and mode fall at different points]
Cross Tabulations
• Statistical analysis technique to study the relationships among and between variables
• Sample is divided to learn how the dependent variable varies from subgroup to subgroup (ex: compare males' and females' attitudes towards a brand/product)
• The two variables that are analyzed must be nominally scaled

Gender and Internet Usage
                  Sex
Internet Usage    Male    Female    Row Total
Light (1)         5       10        15
Heavy (2)         10      5         15
Column Total      15      15
Internet Usage by Sex
                  Sex
Internet Usage    Male     Female
Light             33.3%    66.7%
Heavy             66.7%    33.3%
Column total      100%     100%

Sex by Internet Usage
          Internet Usage
Sex       Light    Heavy    Total
Male      33.3%    66.7%    100.0%
Female    66.7%    33.3%    100.0%
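The cross-tabulation counts and column percentages can be rebuilt from the raw Sex and Internet Usage columns. This is a sketch under two assumptions that are not stated on the slides but are consistent with the counts shown: sex is coded 1 = male and 2 = female, and "light" usage means 5 hours or fewer.

```python
from collections import Counter

# Sex and Internet usage (hours) of the 30 respondents, from the data table.
sex = [1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1,
       2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1]
usage_hours = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
               3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]

SEX_LABELS = {1: "Male", 2: "Female"}  # assumption: coding not stated on the slide
LIGHT_MAX_HOURS = 5                    # assumption: cut-off inferred from the counts

# Cell counts keyed by (usage category, sex)
counts = Counter(
    ("Light" if hours <= LIGHT_MAX_HOURS else "Heavy", SEX_LABELS[s])
    for s, hours in zip(sex, usage_hours)
)

# Column totals and column percentages (each sex sums to 100%)
column_totals = {
    label: sum(c for (_, sx), c in counts.items() if sx == label)
    for label in SEX_LABELS.values()
}
column_pct = {
    key: round(100 * c / column_totals[key[1]], 1) for key, c in counts.items()
}

for key in sorted(counts):
    print(key, counts[key], f"{column_pct[key]}%")
```

Dividing by row totals instead of column totals would give the percentages in the opposite direction (usage groups split by sex).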
Purchase of Fashion Clothing by Marital Status
Purchase of            Current Marital Status
Fashion Clothing       Married    Unmarried
High                   31%        52%
Low                    69%        48%
Column                 100%       100%
Number of respondents  700        300
Purchase of Fashion Clothing by Marital Status and Sex
                                 Sex
Purchase of        Male                    Female
Fashion Clothing   Married   Not Married   Married   Not Married
High               35%       40%           25%       60%
Low                65%       60%           75%       40%
Column totals      100%      100%          100%      100%
Number of cases    400       120           300       180
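The two-variable table and the three-variable table describe the same 1,000 respondents, so pooling the sexes (weighting each cell by its number of cases) should recover the earlier 31% and 52%. A small sketch of that check follows; the dictionary layout and the function name `pooled_high_share` are illustrative choices.

```python
# (sex, marital status) -> (share with high purchase, number of cases),
# taken from the three-variable table.
cells = {
    ("Male", "Married"): (0.35, 400),
    ("Male", "Not married"): (0.40, 120),
    ("Female", "Married"): (0.25, 300),
    ("Female", "Not married"): (0.60, 180),
}


def pooled_high_share(marital_status):
    """Case-weighted share of high purchasers for one marital status."""
    selected = [(share, n) for (_, status), (share, n) in cells.items()
                if status == marital_status]
    total_cases = sum(n for _, n in selected)
    return sum(share * n for share, n in selected) / total_cases


married = pooled_high_share("Married")
unmarried = pooled_high_share("Not married")
print(f"Married: {married:.1%}, Unmarried: {unmarried:.1%}")
```

The pooled figures hide the fact that the gap between married and unmarried respondents is much wider among women than among men.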
Ownership of Expensive Automobiles by Education Level
Own Expensive      Education
Automobile         College Degree    No College Degree
Yes                32%               21%
No                 68%               79%
Column totals      100%              100%
Number of cases    250               750
Ownership of Expensive Automobiles by Education Level and Income Levels
                                        Income
Own Expensive          Low Income                High Income
Automobile             College     No College    College     No College
                       Degree      Degree        Degree      Degree
Yes                    20%         20%           40%         40%
No                     80%         80%           60%         60%
Column totals          100%        100%          100%        100%
Number of respondents  100         700           150         50
Hypothesis Testing:
Basic Concepts
• Purpose of Hypothesis Testing
• Steps Involved in Hypothesis Testing
• Critical-Value and P-value Approach to Hypothesis Testing
Hypothesis Testing
• Assumption (hypothesis) made about a population parameter
• Purpose of Hypothesis Testing
  • To make a judgement about the difference between two sample statistics or the sample statistic and a hypothesized population parameter
Step 1: Formulate H0 and H1
• The null hypothesis (H0) is tested against the alternate hypothesis (H1, also written Ha)
• The null and alternate hypotheses are stated
Suppose we want to test the hypothesis that the proportion of consumers who use the internet for shopping is not larger than 40%
H0: π ≤ 0.40
H1: π > 0.40
Step 2: Select appropriate test
Select the appropriate probability distribution based on two criteria:
• Size of the sample
• Whether the population standard deviation is known or not
For a sufficiently large sample, the sample proportion p (the test statistic) approximately follows a normal distribution. Therefore the appropriate test is a one sample Z test.
$$p \sim N\left(\pi,\ \sqrt{\frac{\pi(1-\pi)}{n}}\right)$$
Step 3: Choose level of Significance
• Type I error: rejecting a null hypothesis when it is true
• Significance level (α):
  • Probability of type I error that researcher is willing to accept
• Type II error: accepting (not rejecting) a null hypothesis when it is false
  • Its probability is β
  • Power of hypothesis test: 1 - β
  • A good test of hypothesis ought to reject a null hypothesis when it is false (1 - β should be as high as possible)
• When choosing a level of significance, there is an inherent tradeoff between these two types of errors
Step 4: Collect Data and Calculate Test Statistic
We collected data from a sample of 30 consumers and obtained a sample proportion of 0.567 (56.7% use the internet for shopping).
H0: π ≤ 0.40
Ha: π > 0.40
The distribution of the sample proportion is
$$p \sim N(\pi, \sigma_p), \qquad \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{0.4 \times 0.6}{30}} = 0.089$$
Then the Z statistic is given by:
$$Z = \frac{p - \pi}{\sigma_p} = \frac{0.567 - 0.4}{0.089} = 1.88$$
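The arithmetic of Step 4 can be checked with a few lines of Python. This sketch uses the figures quoted on the slide (π = 0.40 under H0, p = 0.567, n = 30); the variable names are illustrative. It also shows that the slide's Z = 1.88 comes from rounding σp to 0.089 before dividing, while the unrounded standard error gives about 1.87.

```python
import math

pi_0 = 0.40     # hypothesized proportion under H0
p_hat = 0.567   # sample proportion reported on the slide
n = 30          # sample size

# Standard error of the sample proportion, evaluated at the H0 value
sigma_p = math.sqrt(pi_0 * (1 - pi_0) / n)

# Z statistic with the unrounded standard error
z_unrounded = (p_hat - pi_0) / sigma_p

# Z statistic as on the slide, with sigma_p rounded to three decimals first
z_slide = (p_hat - pi_0) / round(sigma_p, 3)

print(f"sigma_p = {sigma_p:.4f}")
print(f"Z (unrounded sigma_p) = {z_unrounded:.2f}")
print(f"Z (sigma_p = 0.089)   = {z_slide:.2f}")
```

Either value exceeds the one-tailed 5% critical value of 1.645, so the rounding does not change the conclusion.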
Step 5: Find critical value
[Figure: sampling distribution with upper-tail area = 0.05 to the right of Zcrit = 1.645]
Steps 6 and 7: Compare critical value with test statistic and draw a conclusion
             p        Z
statistic    56.7%    1.88
critical     54.7%    1.645
If Z > Zcrit → reject H0
If Z < Zcrit → do not reject H0
1.88 > 1.645 → reject H0
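The critical value does not have to be read from a table. A minimal sketch using the standard library's `statistics.NormalDist`, assuming a one-tailed test at α = 0.05 and the normal approximation from Step 2, recovers Zcrit and translates it back into a critical sample proportion:

```python
import math
from statistics import NormalDist

alpha = 0.05
pi_0, n = 0.40, 30
z = 1.88  # test statistic from Step 4

# One-tailed critical value: the point with area alpha to its right
z_crit = NormalDist().inv_cdf(1 - alpha)

# The same cut-off expressed as a sample proportion
sigma_p = math.sqrt(pi_0 * (1 - pi_0) / n)
p_crit = pi_0 + z_crit * sigma_p

reject_h0 = z > z_crit
print(f"Zcrit = {z_crit:.3f}, critical proportion = {p_crit:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Comparing Z with Zcrit and comparing the sample proportion with the critical proportion are the same decision rule on two different scales.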
Trade-off Between Type I & Type II Errors
H0: π ≤ 0.40
Ha: π > 0.40
[Figure: sampling distribution of p under π = 0.40, with α (level of significance) as the area beyond the critical value 0.547 and the observed 0.567 marked; and sampling distribution under π = 0.60, with β = ?]
For each π > 0.40 there is a different β
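The "β = ?" in the figure can be worked out for the alternative π = 0.60. The sketch below assumes the same one-tailed test at α = 0.05 with n = 30 and the normal approximation; β is then the probability that the sample proportion falls below the critical proportion when the true proportion is 0.60.

```python
import math
from statistics import NormalDist

n = 30
alpha = 0.05
pi_0 = 0.40     # value under H0
pi_alt = 0.60   # one specific alternative, as in the figure

# Critical sample proportion, fixed by H0 and alpha
se_h0 = math.sqrt(pi_0 * (1 - pi_0) / n)
p_crit = pi_0 + NormalDist().inv_cdf(1 - alpha) * se_h0

# Type II error: failing to reject H0 although pi is really pi_alt
se_alt = math.sqrt(pi_alt * (1 - pi_alt) / n)
beta = NormalDist(mu=pi_alt, sigma=se_alt).cdf(p_crit)
power = 1 - beta

print(f"beta = {beta:.3f}, power = {power:.3f}")
```

Lowering α moves the critical proportion to the right, which raises β: this is the trade-off the slide refers to.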
The Probability Value (P-value) Approach to Hypothesis Testing
P-value
• Probability of obtaining this value for the statistic or a value even more contrary to null hypothesis, when H0 is true.
Using the p-Value
• Researcher can determine "how unlikely is the result that has been observed?"
• In general, the smaller the p-value, the greater is the researcher's confidence in sample findings (researcher can be more confident in rejecting H0)
• P-value is generally sensitive to sample size: for a given difference between the sample statistic and the hypothesized value, a larger sample yields a lower p-value.
Step 5: Find p value (critical value)
[Figure: normal curve with z = 1.88 marked; shaded area to the right = 0.0301 (the p-value), unshaded area = 0.9699]
P value: largest level of significance at which we would not reject H0
Steps 6 & 7: Compare p-value with significance level and draw a conclusion
If p-value > α → do not reject H0
If p-value < α → reject H0
0.03 < 0.05 → reject H0
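The shaded area can be computed directly rather than looked up. A short sketch, assuming the one-tailed (upper-tail) test of the running example and the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

z = 1.88      # test statistic from Step 4
alpha = 0.05  # chosen level of significance

# Upper-tail probability: area to the right of z under the standard normal
p_value = 1 - NormalDist().cdf(z)

reject_h0 = p_value < alpha
print(f"p-value = {p_value:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

For a two-tailed alternative the tail area would be doubled before comparing it with α.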
Steps Involved in Hypothesis Testing
• Formulate H0 and H1
• Select Appropriate Test
• Choose Level of Significance, α
• Collect Data and Calculate Test Statistic
• Then follow one of two routes:
  • P-value route: Determine Probability Associated with Test Statistic, then Compare with Level of Significance, α
  • Critical-value route: Determine Critical Value of Test Statistic (TSCR), then Determine if the calculated test statistic falls into the (Non) Rejection Region
• Reject or do not reject H0
• Draw Marketing Research Conclusion
Hypothesis Testing:
Chi-Square Tests of Association
and Goodness of Fit
• Chi-Square Test of Independence
• Chi-Square Test of Goodness of Fit for a Single Sample
Cross-tabulation and Chi Square
In Marketing Applications, Chi-square Statistic Is Used As
Test of Independence
• Are there associations between two or more variables in a study?
Test of Goodness of Fit
• Is there a significant difference between an observed frequency distribution and a theoretical frequency distribution?
Gender and Internet Usage
                  Sex
Internet Usage    Male    Female    Row Total
Light (1)         5       10        15
Heavy (2)         10      5         15
Column Total      15      15

                  Sex
Internet Usage    Male     Female
Light             33.3%    66.7%
Heavy             66.7%    33.3%
Column total      100%     100%
Chi-square as a Test of Independence
Statistical Independence
• Two variables are statistically independent if knowing one offers no information as to the identity of the other
Null Hypothesis H0
• The two (nominally scaled) variables are statistically independent (there is no association between them)
Alternative Hypothesis Ha
• The two variables are not independent
Degrees of Freedom
• v = (r - 1) * (c - 1)
  r = number of rows in contingency table
  c = number of columns
• Mean of chi-sq. distribution = Degrees of freedom (v)
• Variance = 2v
Chi-square Statistic (χ²)
• Measures the difference between the actual number observed in cell i (Oi) and the number expected (Ei) under independence, i.e. if the null hypothesis were true.
$$\chi^2 = \sum_{\text{all cells}} \frac{(O_i - E_i)^2}{E_i}, \qquad df = (r-1) \times (c-1)$$
• Expected frequency in each cell under no association
$$E_i = p_L \times p_A \times n = \frac{n_r\, n_c}{n}$$
where pL and pA are the marginal proportions of the two categories that define the cell, nr is the row total and nc is the column total
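Gender and Internet Usage
                  Sex
Internet Usage    Male    Female    Row Total
Light (1)         5       10        15
Heavy (2)         10      5         15
Column Total      15      15

$$E_{11} = \frac{15 \times 15}{30} = 7.5, \quad E_{12} = \frac{15 \times 15}{30} = 7.5, \quad E_{21} = \frac{15 \times 15}{30} = 7.5, \quad E_{22} = \frac{15 \times 15}{30} = 7.5$$
χ² = 3.333, df = 1
Crit. Value = 3.841 for α = 0.05
[Figure: χ² distribution with df = 1; the "Reject H0" region lies to the right of the critical value]
3.333 < 3.841 → do not reject H0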
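The expected frequencies, the χ² statistic and the decision for the Gender and Internet Usage table can be verified as follows. This is a sketch using only the standard library; the critical value and p-value lines rely on the fact that a χ² variable with 1 degree of freedom is the square of a standard normal variable, so they apply to df = 1 only (for larger tables a χ² table or a statistics package would be needed).

```python
from statistics import NormalDist

# Observed counts: rows = Light, Heavy; columns = Male, Female
observed = [[5, 10],
            [10, 5]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# df = 1 only: chi-square(1) is the square of a standard normal variable
alpha = 0.05
critical_value = NormalDist().inv_cdf(1 - alpha / 2) ** 2
p_value = 2 * (1 - NormalDist().cdf(chi_square ** 0.5))

reject_h0 = chi_square > critical_value
print(f"chi-square = {chi_square:.3f}, df = {df}")
print(f"critical value = {critical_value:.3f}, p-value = {p_value:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

With only 30 respondents the 2-to-1 difference between the sexes falls just short of significance at the 5% level.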
Limitations of χ² as an Association Measure
It Is Basically Proportional to Sample Size
• Difficult to interpret in absolute sense and compare cross-tabs of unequal size
It Has No Upper Bound
• Difficult to obtain a feel for its value
• Does not indicate how two variables are related
Indicators of Strength of Association
Phi Coefficient (2×2 tables)
$$\phi = \sqrt{\frac{\chi^2}{n}}, \qquad 0 \le \phi \le 1$$
Cramer's V (generalization of Phi)
$$V = \sqrt{\frac{\phi^2}{\min\left((r-1),\ (c-1)\right)}}, \qquad 0 \le V \le 1$$
Contingency Coefficient (C)
$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}, \qquad 0 \le C \le 1$$
• Maximum value of C depends on the size of the table: compare only tables of the same size
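To give a feel for these indicators, the sketch below evaluates them for the Gender and Internet Usage example (χ² = 3.333, n = 30, a 2×2 table). Applying the formulas to that example is an illustration added here, not a result quoted on the slides.

```python
import math

chi_square = 10 / 3   # chi-square of the Gender x Internet Usage table (3.333)
n = 30                # sample size
r, c = 2, 2           # rows and columns of the contingency table

phi = math.sqrt(chi_square / n)
cramers_v = math.sqrt(phi ** 2 / min(r - 1, c - 1))
contingency_c = math.sqrt(chi_square / (chi_square + n))

print(f"phi = {phi:.3f}")
print(f"Cramer's V = {cramers_v:.3f}")
print(f"C = {contingency_c:.3f}")
```

For a 2×2 table min(r - 1, c - 1) = 1, so Cramer's V reduces to phi, as the "generalization of Phi" label implies.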
Chi-square goodness of fit test
• Frequently used when the researcher is interested in analysing objects/responses which fall within categories (ex: respondents who are indifferent, against or for a proposal).
• Compares observed and expected (under null hypothesis) numbers of observations in each category.
• Degrees of freedom v = k - 1, where k is the number of categories.
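As a small illustration of the goodness of fit test, the sketch below reuses the Internet shopping data from the earlier slides: 17 of the 30 respondents are coded 1 in the Shopping column (taken here to mean "uses the internet for shopping", which matches the 56.7% quoted earlier) and 13 are coded 2. Testing these counts against a hypothesized 40% / 60% split is an example chosen for this sketch, not one worked on the slides.

```python
# Observed counts in k = 2 categories: shops online, does not shop online
observed = [17, 13]
hypothesized = [0.40, 0.60]   # theoretical distribution under H0

n = sum(observed)
expected = [p * n for p in hypothesized]

# Same statistic as for the test of independence, summed over categories
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1        # v = k - 1

CRITICAL_5PCT_DF1 = 3.841     # tabulated chi-square value for df = 1, alpha = 0.05
reject_h0 = chi_square > CRITICAL_5PCT_DF1

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.3f}, df = {df}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Unlike the one-tailed Z test used earlier, the goodness of fit test is non-directional, which is why the same data do not reach significance at the 5% level here.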