Analysis
• Preliminary examination and classification of open-ended responses
• Coding and input
• Transformations
• Description, checking of normality
• Testing the hypotheses
• Discussion and conclusions
• Software: Excel, SPSS, SAS, Statgraphics, DataFit, E-Views, Stata, etc.
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Coding
• Numerical variables if at all possible
• Exact values first; you can classify later
• Define informative variable and value labels
• What to do with missing data (real missing or NA, cases with many missing values, variables with many missing values, imputation?)
• What is a missing value (checklists)
• Identification variables (ID number, dates, interviewer, etc.)
Transformations
• Classification of open-ended responses
• Classifying continuous variables
• Reversing items (1=5, 2=4, 4=2, 5=1)
• Computing multi-item scales
• Computing lags, logs, ratios, or other new variables
• Checking for inconsistent responses
• Removing outliers?
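The item-reversal step above (1=5, 2=4, 4=2, 5=1) can be sketched in a few lines of Python; this is a minimal illustration assuming a 1–5 Likert scale, not part of the original slides:

```python
def reverse_item(score, scale_max=5):
    """Reverse-code a Likert item: on a 1..scale_max scale, 1 becomes scale_max,
    2 becomes scale_max - 1, and so on."""
    return scale_max + 1 - score

# Hypothetical raw responses on a 1-5 scale
responses = [1, 2, 4, 5]
reversed_responses = [reverse_item(r) for r in responses]
print(reversed_responses)  # [5, 4, 2, 1]
```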
ANALYSIS, "THEORY"
• Data description: graphics and statistics
• Statistical inference: estimation, standard errors and confidence intervals
• Statistical inference: hypothesis testing
• Testing for a single mean
• Contingency tables and Chi-Square tests
• Testing for equality of two means: t-test
• Testing for equality of three or more means: ANOVA
• Correlation
Types of Data
Data are either categorical or numerical; numerical data are either discrete or continuous.
• Categorical — examples: Marital Status, Political Party, Eye Color (defined categories)
• Numerical, discrete — examples: Number of Children, Defects per hour (counted items)
• Numerical, continuous — examples: Weight, Voltage (measured characteristics)
Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.
Nominal [Categorical] Data
• The values of nominal data are categories.
E.g. responses to questions about marital status, coded
as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
Because the numbers are arbitrary, arithmetic operations
don’t make any sense (e.g. does Widowed ÷ 2 = Married?!)
Nominal data has no natural order to the values.
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Ordinal [Categorical] Data appear to be categorical in nature, but their values have an order, a ranking to them.
E.g. college course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While it's still not meaningful to do arithmetic on these data (e.g. does 2 × fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, order is maintained no matter what numeric values are assigned to each category.
Interval/Ratio Data
• Real numbers, e.g. heights, weights, prices, etc.
• Also referred to as quantitative or numerical.
Arithmetic operations can be performed on interval data, so it's meaningful to talk about 2 × Height, or Price + $1, and so on.
Interval data have no natural "0", such as temperature: 100 degrees is 50 degrees hotter than 50 degrees BUT not twice as hot.
Ratio data have a natural "0", such as weight: 100 pounds is 50 pounds heavier than 50 pounds AND is twice as heavy.
Graphical & Tabular Techniques for Nominal Data…
The only allowable calculation on nominal data is to count the frequency of each value of the variable.
We can summarize the data in a table, called a frequency distribution, that presents the categories and their counts.
A relative frequency distribution lists the categories and the proportion with which each occurs.
Refer to Example 2.1
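As a minimal sketch of building frequency and relative frequency distributions from raw nominal data (the category labels follow the marital-status coding earlier in the slides; the individual responses here are made up):

```python
from collections import Counter

data = ["Single", "Married", "Married", "Divorced", "Single", "Married"]

freq = Counter(data)                                # frequency distribution
n = len(data)
rel_freq = {cat: count / n for cat, count in freq.items()}  # relative frequencies

print(freq["Married"])      # 3
print(rel_freq["Married"])  # 0.5
```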
Nominal Data (Tabular Summary)
Nominal Data (Frequency)
Bar Charts are often used to display frequencies…
Is there a better way to order these?
Nominal Data (Relative Frequency)
Pie Charts show relative frequencies…
Ordinal Data: Cumulative Relative Frequencies…
Each class adds its own relative frequency to the running total: first class: .355; next class: .355 + .185 = .540; …; last class: .930 + .070 = 1.00.
Graphical Techniques for Interval Data
There are several graphical methods that are used when the
data are interval (i.e. numeric, non-categorical).
The most important of these graphical methods is the
histogram.
The histogram is not only a powerful graphical technique
used to summarize interval data, but it is also used to help
explain probabilities.
Building a Histogram…
1) Collect the data.
2) Create a frequency distribution for the data.
3) Draw the histogram.
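Step 2 above (binning the data into a frequency distribution) can be sketched without any plotting library; drawing is then just a matter of handing the bin counts to a charting tool. The bill amounts and class intervals here are illustrative, not the slides' data:

```python
bills = [12.5, 27.0, 88.1, 45.3, 19.9, 102.4, 33.3, 76.8, 15.0, 61.2]
bin_edges = [0, 30, 60, 90, 120]   # class intervals: [0,30), [30,60), ...

counts = [0] * (len(bin_edges) - 1)
for x in bills:
    for i in range(len(counts)):
        if bin_edges[i] <= x < bin_edges[i + 1]:
            counts[i] += 1
            break

print(counts)  # [4, 2, 3, 1]
```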
Interpret…
(18 + 28 + 14 = 60) ÷ 200 = 30%, i.e. nearly a third of the phone bills are greater than $75; about half (71 + 37 = 108) of the bills are "small", i.e. less than $30; there are only a few telephone bills in the middle range.
Shapes of Histograms: Symmetry
A histogram is said to be symmetric if, when we draw a vertical line down the center of the histogram, the two sides are identical in shape and size.
Shapes of Histograms: Skewness
A skewed histogram is one with a long tail extending to either the right (positively skewed) or the left (negatively skewed).
Shapes of Histograms: Bell Shape
A special type of symmetric unimodal histogram is one that is bell shaped. Many statistical techniques require that the population be bell shaped; drawing the histogram helps verify the shape of the population in question.
Measures of Central Tendency
Central tendency is measured by the arithmetic mean, the median, and the mode:
• Arithmetic mean: X̄ = (Σᵢ₌₁ⁿ Xᵢ) / n
• Median: the middle value in the ordered array
• Mode: the most frequently observed value
Measures of Variation
Variation is measured by the range, the variance, the standard deviation, and the coefficient of variation. Measures of variation give information on the spread, variability, or dispersion of the data values; two data sets can have the same center but different variation.
Measures of Variation: Variance
The variance is (approximately) the average of squared deviations of values from the mean.
Sample variance:
S² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1)
where X̄ = arithmetic mean, n = sample size, Xᵢ = iᵗʰ value of the variable X.
Measures of Variation: Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
Sample standard deviation:
S = √( Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1) )
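The sample variance and standard deviation formulas can be checked against Python's statistics module, which uses the same n − 1 divisor; the data here are made up:

```python
import statistics

data = [10, 12, 14, 16, 18]
mean = statistics.mean(data)                                # X-bar

s2 = sum((x - mean) ** 2 for x in data) / (len(data) - 1)   # sample variance
s = s2 ** 0.5                                               # sample standard deviation

print(s2, s)
# statistics.variance / statistics.stdev compute the same quantities
assert abs(s2 - statistics.variance(data)) < 1e-12
assert abs(s - statistics.stdev(data)) < 1e-12
```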
Locating Extreme Outliers: Z-Score
Z = (X − X̄) / S
where X represents the data value, X̄ is the sample mean, and S is the sample standard deviation.
Locating Extreme Outliers: Z-Score
Suppose the mean math SAT score is 490, with a standard deviation of 100. Compute the Z-score for a test score of 620:
Z = (X − X̄) / S = (620 − 490) / 100 = 130 / 100 = 1.3
A score of 620 is 1.3 standard deviations above the mean and would not be considered an outlier.
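The SAT example can be verified directly:

```python
def z_score(x, mean, stdev):
    """How many standard deviations x lies above (positive) or below (negative) the mean."""
    return (x - mean) / stdev

z = z_score(620, 490, 100)
print(z)  # 1.3
```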
Key Statistical Concepts…
A sample is a subset of a population. Populations have Parameters; Samples have Statistics.
Statistical Inference…
Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample. What can we infer about a Population's Parameters based on a Sample's Statistics?
Confidence & Significance Levels…
The confidence level 1 − α is the proportion of times that an estimating procedure will be correct. E.g. a confidence level of 95% means that estimates based on this form of statistical inference will be correct 95% of the time.
When the purpose of the statistical inference is to draw a conclusion about a population, the significance level α measures how frequently the conclusion will be wrong in the long run. E.g. a 5% significance level means that, in the long run, this type of conclusion will be wrong 5% of the time.
Sampling Distributions
A sampling distribution is the distribution of all possible values of a sample statistic for a given sample size selected from a population.
For example, suppose you sample 50 students from your college regarding their mean GPA. If you obtained many different samples of 50, you would compute a different mean for each sample. We are interested in the distribution of all potential mean GPAs we might calculate for any given sample of 50 students.
Basic Business Statistics, 11e © 2009
Developing a Sampling Distribution
Assume there is a population of size N = 4: individuals A, B, C, D. The random variable, X, is the age of the individuals; the values of X are 18, 20, 22, 24 (years).
Developing a Sampling Distribution (continued)
Summary measures for the population distribution:
μ = Σ Xᵢ / N = (18 + 20 + 22 + 24) / 4 = 21
σ = √( Σ (Xᵢ − μ)² / N ) = 2.236
The population distribution is uniform across the four ages.
Developing a Sampling Distribution (continued)
Now consider all possible samples of size n = 2: 16 possible samples (sampling with replacement).

1st Obs \ 2nd Obs     18      20      22      24
18                  18,18   18,20   18,22   18,24
20                  20,18   20,20   20,22   20,24
22                  22,18   22,20   22,22   22,24
24                  24,18   24,20   24,22   24,24

The 16 sample means:

1st Obs \ 2nd Obs    18    20    22    24
18                   18    19    20    21
20                   19    20    21    22
22                   20    21    22    23
24                   21    22    23    24
Developing a Sampling Distribution (continued)
Sampling distribution of all sample means: plotting P(X̄) against the 16 sample means gives a distribution that is no longer uniform; it peaks at 21 and falls off toward 18 and 24.
Developing a Sampling Distribution (continued)
Summary measures of this sampling distribution:
μ_X̄ = Σ X̄ᵢ / N = (18 + 19 + 19 + … + 24) / 16 = 21
σ_X̄ = √( Σ (X̄ᵢ − μ_X̄)² / N ) = √( ((18 − 21)² + (19 − 21)² + … + (24 − 21)²) / 16 ) = 1.58
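The 16-sample construction can be reproduced by brute force (an illustration, not part of the original slides):

```python
from itertools import product

ages = [18, 20, 22, 24]
# All samples of size n=2 drawn with replacement
sample_means = [(a + b) / 2 for a, b in product(ages, repeat=2)]

n = len(sample_means)                      # 16 possible samples
mu_xbar = sum(sample_means) / n            # mean of the sampling distribution
sigma_xbar = (sum((m - mu_xbar) ** 2 for m in sample_means) / n) ** 0.5

print(n, mu_xbar, round(sigma_xbar, 2))    # 16 21.0 1.58
```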
Comparing the Population Distribution to the Sample Means Distribution
Population (N = 4): μ = 21, σ = 2.236
Sample means distribution (n = 2): μ_X̄ = 21, σ_X̄ = 1.58
Sample Mean Sampling Distribution: Standard Error of the Mean
Different samples of the same size from the same population will yield different sample means. A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean:
σ_X̄ = σ / √n
(This assumes that sampling is with replacement or sampling is without replacement from an infinite population.)
Note that the standard error of the mean decreases as the sample size increases.
Sample Mean Sampling Distribution: If the Population is Normal
If a population is normally distributed with mean μ and standard deviation σ, the sampling distribution of X̄ is also normally distributed, with
μ_X̄ = μ   and   σ_X̄ = σ / √n
Z-value for Sampling Distribution of the Mean
Z-value for the sampling distribution of X̄:
Z = (X̄ − μ_X̄) / σ_X̄ = (X̄ − μ) / (σ / √n)
where X̄ = sample mean, μ = population mean, σ = population standard deviation, and n = sample size.
Sampling Distribution Properties
μ_X̄ = μ (i.e. X̄ is unbiased). A normal population distribution yields a normal sampling distribution with the same mean.
Sampling Distribution Properties (continued)
As n increases, σ_X̄ decreases: a larger sample size gives a sampling distribution more tightly concentrated around μ.
Determining an Interval Including a Fixed Proportion of the Sample Means
Find a symmetrically distributed interval around μ that will include 95% of the sample means when μ = 368, σ = 15, and n = 25.
Since the interval contains 95% of the sample means, 5% of the sample means will be outside the interval. Since the interval is symmetric, 2.5% will be above the upper limit and 2.5% will be below the lower limit. From the standardized normal table, the Z score with 2.5% (0.0250) below it is −1.96 and the Z score with 2.5% (0.0250) above it is 1.96.
Determining an Interval Including a Fixed Proportion of the Sample Means (continued)
Calculating the lower limit of the interval:
X_L = μ − Z · σ/√n = 368 − 1.96 · (15/√25) = 362.12
Calculating the upper limit of the interval:
X_U = μ + Z · σ/√n = 368 + 1.96 · (15/√25) = 373.88
95% of all sample means of sample size 25 are between 362.12 and 373.88. This is the confidence interval for the mean.
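The two limits follow directly from μ ± Z · σ/√n:

```python
import math

mu, sigma, n, z = 368, 15, 25, 1.96

se = sigma / math.sqrt(n)      # standard error of the mean: 15/5 = 3
lower = mu - z * se
upper = mu + z * se

print(lower, upper)  # 362.12 373.88
```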
Confidence intervals
The general formula for all confidence intervals is:
Point Estimate ± (Critical Value)(Standard Error)
where:
• Point Estimate is the sample statistic estimating the population parameter of interest
• Critical Value is a table value based on the sampling distribution of the point estimate and the desired confidence level
• Standard Error is the standard deviation of the point estimate
Hypothesis testing
In addition to estimation, hypothesis testing is a procedure for making inferences about a population. Hypothesis testing allows us to determine whether enough statistical evidence exists to conclude that a belief (i.e. hypothesis) about a parameter is supported by the data.
Nonstatistical Hypothesis Testing…
A criminal trial is an example of hypothesis testing without
the statistics.
In a trial a jury must decide between two hypotheses. The
null hypothesis is
H0: The defendant is innocent
The alternative hypothesis or research hypothesis is
H1: The defendant is guilty
The jury does not know which hypothesis is true. They must
make a decision on the basis of evidence presented.
Nonstatistical Hypothesis Testing…
In the language of statistics convicting the defendant is
called rejecting the null hypothesis in favor of the alternative
hypothesis. That is, the jury is saying that there is enough
evidence to conclude that the defendant is guilty (i.e., there
is enough evidence to support the alternative hypothesis).
If the jury acquits it is stating that there is not enough
evidence to support the alternative hypothesis. Notice that
the jury is not saying that the defendant is innocent, only that
there is not enough evidence to support the alternative
hypothesis. That is why we never say that we accept the null
hypothesis.
Nonstatistical Hypothesis Testing…
There are two possible errors.
A Type I error occurs when we reject a true null hypothesis.
That is, a Type I error occurs when the jury convicts an
innocent person.
A Type II error occurs when we don’t reject a false null
hypothesis. That occurs when a guilty defendant is acquitted.
The probability of a Type I error is denoted as α (Greek
letter alpha). The probability of a type II error is β (Greek
letter beta).
The two probabilities are inversely related. Decreasing one
increases the other.
Nonstatistical Hypothesis Testing…
In our judicial system Type I errors are regarded as more
serious. We try to avoid convicting innocent people. We are
more willing to acquit guilty people.
We arrange to make α small by requiring the prosecution to
prove its case and instructing the jury to find the defendant
guilty only if there is “evidence beyond a reasonable doubt.”
Nonstatistical Hypothesis Testing…
The critical concepts are these:
1. There are two hypotheses, the null and the alternative
hypotheses.
2. The procedure begins with the assumption that the null
hypothesis is true.
3. The goal is to determine whether there is enough evidence
to infer that the alternative hypothesis is true.
4. There are two possible decisions:
Conclude that there is enough evidence to support the
alternative hypothesis.
Conclude that there is not enough evidence to support the
alternative hypothesis.
Nonstatistical Hypothesis Testing…
5. Two possible errors can be made.
Type I error: Reject a true null hypothesis
Type II error: Do not reject a false null hypothesis.
P(Type I error) = α
P(Type II error) = β
Testing for a single mean, example 11.1
A department store manager determines that a new billing
system will be cost-effective only if the mean monthly
account is more than $170.
A random sample of 400 monthly accounts is drawn, for
which the sample mean is $178. The accounts are
approximately normally distributed with a standard deviation
of $65.
Can we conclude that the new system will be cost-effective?
Example 11.1…
The system will be cost-effective if the mean account balance for all customers is greater than $170.
We express this belief as our research hypothesis, that is:
H1: μ > 170 (this is what we want to determine)
Thus, our null hypothesis becomes:
H0: μ = 170 (this specifies a single value for the parameter of interest)
Example 11.1…
What we want to show:
H1: μ > 170
H0: μ = 170 (we'll assume this is true)
We know: n = 400, X̄ = 178, and σ = 65.
Hmm. What to do next?!
Example 11.1…
To test our hypotheses, we can use two different approaches:
The rejection region approach (typically used when
computing statistics manually), and
The p-value approach (which is generally used with a
computer and statistical software).
We will explore only the latter…
Standardized Test Statistic…
An easier method is to use the standardized test statistic:
Z = (X̄ − μ) / (σ/√n) = (178 − 170) / (65/√400) = 2.46
and compare its result to z_α (rejection region: z > z_α).
Since z = 2.46 > 1.645 (z.05), we reject H0 in favor of H1…
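The test statistic for the department store example can be checked numerically:

```python
import math

n, xbar, mu0, sigma = 400, 178, 170, 65

z = (xbar - mu0) / (sigma / math.sqrt(n))   # standardized test statistic
print(round(z, 2))   # 2.46
print(z > 1.645)     # True -> reject H0 at the 5% level (one-tail)
```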
p-Value
The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true.
In the case of our department store example: what is the probability of observing a sample mean at least as extreme as the one already observed (i.e. X̄ = 178), given that the null hypothesis (H0: μ = 170) is true? That probability is the p-value.
Interpreting the p-value…
• p < .01: overwhelming evidence (highly significant)
• .01 ≤ p < .05: strong evidence (significant)
• .05 ≤ p < .10: weak evidence (not significant)
• p ≥ .10: no evidence (not significant)
Here, p = .0069.
Interpreting the p-value…
Compare the p-value with the selected value of the significance level α:
If the p-value is less than α, we judge the p-value to be small enough to reject the null hypothesis. If the p-value is greater than α, we do not reject the null hypothesis.
Since p-value = .0069 < α = .05, we reject H0 in favor of H1.
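The p-value is the upper-tail area beyond z = 2.46 under the standard normal curve; the normal CDF can be written with math.erf, so no statistics library is needed:

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = (178 - 170) / (65 / math.sqrt(400))
p_value = 1 - normal_cdf(z)    # one-tail (upper) p-value

print(round(p_value, 4))       # ~0.0069
print(p_value < 0.05)          # True -> reject H0
```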
Example 11.2…
AT&T argues that its rates are such that customers won't see a difference in their phone bills between AT&T and its competitors. It calculates the mean and standard deviation of the bills for all its customers at $17.09 and $3.87 (respectively). It then samples 100 customers at random and recalculates a monthly phone bill based on a competitor's rates.
What we want to show is whether or not:
H1: μ ≠ 17.09. We do this by assuming that:
H0: μ = 17.09
Example 11.2…
The rejection region is set up so we can reject the null hypothesis when the test statistic is either large or small. That is, we set up a two-tail rejection region. The total area in the rejection region must sum to α, so we divide this probability by 2.
Example 11.2…
At a 5% significance level (i.e. α = .05), we have α/2 = .025. Thus, z.025 = 1.96 and our rejection region is:
z < −1.96  or  z > 1.96
Example 11.2…
From the data, we calculate X̄ = 17.55. Using our standardized test statistic, we find that:
Z = (X̄ − μ) / (σ/√n) = (17.55 − 17.09) / (3.87/√100) = 1.19
Since z = 1.19 is not greater than 1.96, nor less than −1.96, we cannot reject the null hypothesis in favor of H1. That is, "there is insufficient evidence to infer that there is a difference between the bills of AT&T and the competitor."
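Example 11.2 as a two-tail z-test:

```python
import math

n, xbar, mu0, sigma = 100, 17.55, 17.09, 3.87

z = (xbar - mu0) / (sigma / math.sqrt(n))
print(round(z, 2))        # 1.19
print(-1.96 < z < 1.96)   # True -> cannot reject H0 at the 5% level
```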
Summary of One- and Two-Tail Tests…
Tests come in three forms: one-tail (left tail), two-tail, and one-tail (right tail).
Chi-Square Goodness of Fit
13.1 Goodness-of-Fit Tests
Given the following…
1) Counts of items in each of several categories
2) A model that predicts the distribution of the relative
frequencies
…this question naturally arises:
“Does the actual distribution differ from the model
because of random error, or do the differences mean
that the model does not fit the data?”
In other words, "How good is the fit?"
© 2011 Pearson Education, Inc.
13.1 Goodness-of-Fit Tests
Example: Stock Market "Up" Days
A sample of 1000 "up" days is drawn from the population of stock market days. "Up" days appear to be more common than expected on certain days, especially on Fridays.
Null hypothesis: the distribution of "up" days is no different from the population distribution.
Test the hypothesis with a chi-square goodness-of-fit test.
13.1 Goodness-of-Fit Tests
The Chi-Square Distribution
Note that χ² "accumulates" the relative squared deviation of each cell from its expected value:
χ² = Σ (Obs − Exp)² / Exp
So χ² gets "big" when (i) the data set is large and/or (ii) the model is a poor fit.
13.1 Goodness-of-Fit Tests
The Chi-Square Calculation
13.1 Goodness-of-Fit Tests
The Chi-Square Calculation: Stock Market "Up" Days
χ² = (192 − 193.4)²/193.4 + … + (218 − 199.7)²/199.7 = 2.62
Using a chi-square table at a significance level of 0.05 and with 4 degrees of freedom:
χ²₄ = 9.488 > 2.62
Do not reject the null hypothesis. (The fit is "good".)
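The goodness-of-fit statistic is straightforward to compute from observed and expected counts. Only two of the five weekday cells are given on the slide (Monday and Friday), so the middle counts below are placeholders for illustration only:

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (obs - exp)^2 / exp over cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical 5-day counts for 1000 trading days; only the first and last
# cells (192 vs 193.4, 218 vs 199.7) come from the slide.
observed = [192, 189, 202, 199, 218]
expected = [193.4, 198.5, 202.9, 205.5, 199.7]

stat = chi_square_gof(observed, expected)
print(stat < 9.488)   # compare to the 0.05 critical value with 4 df
```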
13.2 Interpreting Chi-Square Values
The Chi-Square Distribution
The χ² distribution is right-skewed and becomes broader with increasing degrees of freedom. The test is a one-sided test.
13.3 Examining the Residuals
When we reject a null hypothesis, we can examine the
residuals in each cell to discover which values are
extraordinary.
Because we might compare residuals for cells with very
different counts, we should examine standardized
residuals:
Note that standardized residuals from goodness-of-fit
tests are actually z-scores (which we already know how to
interpret and analyze).
13.3 Examining the Residuals
Standardized residuals for the trading days data:
• None of these values is remarkable.
• The largest, Friday, at 1.292, is not impressive when
viewed as a z-score.
• The deviations are in the direction of a “weekend effect”,
but they aren’t quite large enough for us to conclude they
are real.
Chi-Square Test of Independence
Contingency Table…
In Example 2.8, a sample of newspaper readers was asked to report which newspaper they read: Globe and Mail (1), Post (2), Star (3), or Sun (4), and to indicate whether they were a blue-collar worker (1), a white-collar worker (2), or a professional (3).
Each reader's responses are captured as part of the totals in the contingency table…
Contingency Table…
Interpretation: the relative frequencies in columns 2 and 3 are similar, but there are large differences between columns 1 and 2 and between columns 1 and 3.
This tells us that blue-collar workers tend to read different newspapers from both white-collar workers and professionals, and that white-collar workers and professionals are quite similar in their newspaper choices.
Graphing the Relationship Between Two Nominal Variables…
Use the data from the contingency table to create bar charts. Professionals tend to read the Globe & Mail more than twice as often as the Star or Sun…
χ² Test of Independence
For contingency tables with r rows and c columns:
H0: The two categorical variables are independent (i.e., there is no relationship between them)
H1: The two categorical variables are dependent (i.e., there is a relationship between them)
χ² Test of Independence (continued)
The chi-square test statistic is:
χ²_STAT = Σ_all cells (fo − fe)² / fe
where:
fo = observed frequency in a particular cell of the r × c table
fe = expected frequency in a particular cell if H0 is true
χ²_STAT for the r × c case has (r − 1)(c − 1) degrees of freedom.
(Assumed: each cell in the contingency table has an expected frequency of at least 1.)
Expected Cell Frequencies
Expected cell frequencies:
fe = (row total × column total) / n
where:
row total = sum of all frequencies in the row
column total = sum of all frequencies in the column
n = overall sample size
Decision Rule
The decision rule is: if χ²_STAT > χ²_α, reject H0; otherwise, do not reject H0,
where χ²_α is from the chi-squared distribution with (r − 1)(c − 1) degrees of freedom.
Example
The meal plan selected by 200 students is shown below:

Class Standing   20/week   10/week   none   Total
Fresh.              24        32      14      70
Soph.               22        26      12      60
Junior              10        14       6      30
Senior              14        16      10      40
Total               70        88      42     200
Example
(continued)
The hypothesis to be tested is:
H0: Meal plan and class standing are independent
(i.e., there is no relationship between them)
H1: Meal plan and class standing are dependent
(i.e., there is a relationship between them)
Example: Expected Cell Frequencies (continued)
Observed frequencies are as in the table above; the expected cell frequencies if H0 is true are:

Class Standing   20/wk   10/wk   none   Total
Fresh.            24.5    30.8   14.7     70
Soph.             21.0    26.4   12.6     60
Junior            10.5    13.2    6.3     30
Senior            14.0    17.6    8.4     40
Total               70      88     42    200

Example for one cell:
fe = (row total × column total) / n = (30 × 70) / 200 = 10.5
Example: The Test Statistic (continued)
The test statistic value is:
χ²_STAT = Σ_all cells (fo − fe)²/fe = (24 − 24.5)²/24.5 + (32 − 30.8)²/30.8 + … + (10 − 8.4)²/8.4 = 0.709
χ²₀.₀₅ = 12.592 from the chi-squared distribution with (4 − 1)(3 − 1) = 6 degrees of freedom.
Example: Decision and Interpretation (continued)
The test statistic is χ²STAT = 0.709; χ²0.05 with 6 d.f. = 12.592
Decision rule: if χ²STAT > 12.592, reject H0; otherwise, do not reject H0
Here, χ²STAT = 0.709 < χ²0.05 = 12.592, so do not reject H0
Conclusion: there is not sufficient evidence that meal plan and class standing are related at α = 0.05
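The slides list Excel, SPSS, SAS, and similar packages; as a cross-check, the meal-plan chi-squared test can also be reproduced from first principles in a short Python sketch (Python is an assumption here, not one of the course's listed tools):

```python
# Chi-squared test of independence for the meal-plan example,
# computed directly from the slide formulas (no external libraries).
observed = [
    [24, 32, 14],   # Fresh.
    [22, 26, 12],   # Soph.
    [10, 14, 6],    # Junior
    [14, 16, 10],   # Senior
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell: (row total * column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2_stat = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(observed))
    for j in range(len(observed[0]))
)

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r-1)(c-1) = 6
print(round(chi2_stat, 3))   # 0.709
print(chi2_stat > 12.592)    # False -> do not reject H0 at alpha = 0.05
```

The statistic matches the slide's 0.709, and since it is below the critical value 12.592 we fail to reject independence.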
Independent samples t-test
Difference Between Two Means
Goal: test a hypothesis or form a confidence interval for the difference between two population means, μ1 – μ2.
The point estimate for the difference is X̄1 – X̄2.
For independent samples there are two cases: σ1 and σ2 unknown and assumed equal, or σ1 and σ2 unknown and not assumed equal.
Difference Between Two Means: Independent Samples
Different data sources: unrelated and independent. The sample selected from one population has no effect on the sample selected from the other population.
• σ1 and σ2 unknown, assumed equal: use Sp to estimate the common unknown σ; use a pooled-variance t test.
• σ1 and σ2 unknown, not assumed equal: use S1 and S2 to estimate the unknown σ1 and σ2; use a separate-variance t test.
Hypothesis Tests for Two Population Means
Two population means, independent samples:
Lower-tail test: H0: μ1 ≥ μ2, H1: μ1 < μ2 (i.e., H0: μ1 – μ2 ≥ 0, H1: μ1 – μ2 < 0)
Upper-tail test: H0: μ1 ≤ μ2, H1: μ1 > μ2 (i.e., H0: μ1 – μ2 ≤ 0, H1: μ1 – μ2 > 0)
Two-tail test: H0: μ1 = μ2, H1: μ1 ≠ μ2 (i.e., H0: μ1 – μ2 = 0, H1: μ1 – μ2 ≠ 0)
Hypothesis tests for μ1 – μ2
Lower-tail test (H0: μ1 – μ2 ≥ 0, H1: μ1 – μ2 < 0): reject H0 if tSTAT < –tα
Upper-tail test (H0: μ1 – μ2 ≤ 0, H1: μ1 – μ2 > 0): reject H0 if tSTAT > tα
Two-tail test (H0: μ1 – μ2 = 0, H1: μ1 – μ2 ≠ 0): reject H0 if tSTAT < –tα/2 or tSTAT > tα/2
Hypothesis tests for μ1 – μ2 with σ1 and σ2 unknown and assumed equal
Assumptions:
• Samples are randomly and independently drawn
• Populations are normally distributed or both sample sizes are at least 30
• Population variances are unknown but assumed equal
Hypothesis tests for μ1 – μ2 with σ1 and σ2 unknown and assumed equal (continued)
• The pooled variance is:
S²p = [(n1 – 1)S²1 + (n2 – 1)S²2] / [(n1 – 1) + (n2 – 1)]
• The test statistic is:
tSTAT = [(X̄1 – X̄2) – (μ1 – μ2)] / sqrt( S²p (1/n1 + 1/n2) )
• where tSTAT has d.f. = n1 + n2 – 2
Confidence interval for μ1 – μ2 with σ1 and σ2 unknown and assumed equal
The confidence interval for μ1 – μ2 is:
(X̄1 – X̄2) ± tα/2 sqrt( S²p (1/n1 + 1/n2) )
where tα/2 has d.f. = n1 + n2 – 2
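To make the interval formula concrete, here is a small Python sketch (an assumption; the slides use packages such as Excel or SPSS) using the dividend-yield summary data from the example that follows, with t0.025,44 = 2.0154 taken from the later slide:

```python
import math

# 95% confidence interval for mu1 - mu2 with a pooled variance,
# from the NYSE/NASDAQ summary statistics in the example.
n1, x1, s1 = 21, 3.27, 1.30   # NYSE
n2, x2, s2 = 25, 2.53, 1.16   # NASDAQ
t_crit = 2.0154               # t_{0.025} with n1 + n2 - 2 = 44 d.f.

# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

margin = t_crit * math.sqrt(sp2 * (1/n1 + 1/n2))
lo, hi = (x1 - x2) - margin, (x1 - x2) + margin
print(round(lo, 3), round(hi, 3))   # 0.009 1.471
```

The interval (0.009, 1.471) excludes zero, consistent with the two-tail test rejecting H0 at α = 0.05.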
Pooled-Variance t Test Example
You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ? You collect the following data:

                 NYSE    NASDAQ
Number           21      25
Sample mean      3.27    2.53
Sample std dev   1.30    1.16

Assuming both populations are approximately normal with equal variances, is there a difference in mean yield (α = 0.05)?
Pooled-Variance t Test Example: Calculating the Test Statistic (continued)
H0: μ1 – μ2 = 0 (i.e., μ1 = μ2)
H1: μ1 – μ2 ≠ 0 (i.e., μ1 ≠ μ2)
The pooled variance is:
S²p = [(21 – 1)(1.30)² + (25 – 1)(1.16)²] / [(21 – 1) + (25 – 1)] = 1.5021
The test statistic is:
tSTAT = [(3.27 – 2.53) – 0] / sqrt( 1.5021 (1/21 + 1/25) ) = 2.040
Pooled-Variance t Test Example: Hypothesis Test Solution (continued)
H0: μ1 – μ2 = 0 (i.e., μ1 = μ2)
H1: μ1 – μ2 ≠ 0 (i.e., μ1 ≠ μ2)
α = 0.05; df = 21 + 25 – 2 = 44
Critical values: t = ±2.0154
Test statistic: tSTAT = 2.040
Decision: since 2.040 > 2.0154, reject H0 at α = 0.05
Conclusion: there is evidence of a difference in means.
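The pooled-variance calculation above can be reproduced from the summary statistics alone; a minimal Python sketch (Python being an assumption, not one of the course's listed packages):

```python
import math

# Pooled-variance t test from summary statistics (NYSE vs. NASDAQ example).
n1, mean1, s1 = 21, 3.27, 1.30   # NYSE
n2, mean2, s2 = 25, 2.53, 1.16   # NASDAQ

# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))

# Test statistic with hypothesized difference mu1 - mu2 = 0
t_stat = (mean1 - mean2) / math.sqrt(sp2 * (1/n1 + 1/n2))
df = n1 + n2 - 2   # 44

print(round(sp2, 4))         # 1.5021
print(round(t_stat, 3))      # 2.04
print(abs(t_stat) > 2.0154)  # True -> reject H0 at alpha = 0.05
```

The statistic matches the slide's 2.040, leading to the same rejection of H0.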
One-way ANOVA
One-Way Analysis of Variance
Evaluate the difference among the means of three or more groups.
Examples: accident rates for 1st, 2nd, and 3rd shift; expected mileage for five brands of tires.
Assumptions:
• Populations are normally distributed
• Populations have equal variances
• Samples are randomly and independently drawn
Hypotheses of One-Way ANOVA
H0: μ1 = μ2 = μ3 = … = μc
All population means are equal, i.e., no factor effect (no variation in means among groups).
H1: Not all of the population means are the same (at least one population mean is different),
i.e., there is a factor effect.
This does not mean that all population means are different (some pairs may be the same).
Partitioning the Variation
Total variation can be split into two parts:
SST = SSA + SSW
SST = Total Sum of Squares (total variation)
SSA = Sum of Squares Among Groups (among-group variation)
SSW = Sum of Squares Within Groups (within-group variation)
Partitioning the Variation (continued)
SST = SSA + SSW
Total variation = the aggregate variation of the individual data values across the various factor levels (SST)
Among-group variation = variation among the factor sample means (SSA)
Within-group variation = variation that exists among the data values within a particular factor level (SSW)
Partition of Total Variation
Total Variation (SST) = Variation Due to Factor (SSA) + Variation Due to Random Error (SSW)
Total Sum of Squares
SST = SSA + SSW
SST = Σ (j = 1 to c) Σ (i = 1 to nj) (Xij – X̄)²
where:
SST = total sum of squares
c = number of groups or levels
nj = number of observations in group j
Xij = ith observation from group j
X̄ = grand mean (mean of all data values)
Total Variation (continued)
SST = (X11 – X̄)² + (X12 – X̄)² + … + (Xcnc – X̄)²
[Figure: response values X for Group 1, Group 2, and Group 3 plotted around the grand mean X̄]
Among-Group Variation
SST = SSA + SSW
SSA = Σ (j = 1 to c) nj (X̄j – X̄)²
where:
SSA = sum of squares among groups
c = number of groups
nj = sample size from group j
X̄j = sample mean from group j
X̄ = grand mean (mean of all data values)
Among-Group Variation (continued)
SSA = Σ (j = 1 to c) nj (X̄j – X̄)²  (variation due to differences among groups)
Mean Square Among = SSA / degrees of freedom:
MSA = SSA / (c – 1)
Among-Group Variation (continued)
SSA = n1 (X̄1 – X̄)² + n2 (X̄2 – X̄)² + … + nc (X̄c – X̄)²
[Figure: group means X̄1, X̄2, X̄3 plotted around the grand mean X̄]
Within-Group Variation
SST = SSA + SSW
SSW = Σ (j = 1 to c) Σ (i = 1 to nj) (Xij – X̄j)²
where:
SSW = sum of squares within groups
c = number of groups
nj = sample size from group j
X̄j = sample mean from group j
Xij = ith observation in group j
Within-Group Variation (continued)
SSW = Σ (j = 1 to c) Σ (i = 1 to nj) (Xij – X̄j)²
Summing the variation within each group and then adding over all groups.
Mean Square Within = SSW / degrees of freedom:
MSW = SSW / (n – c)
Within-Group Variation (continued)
SSW = (X11 – X̄1)² + (X12 – X̄1)² + … + (Xcnc – X̄c)²
(each observation's deviation is taken from its own group's mean)
[Figure: within-group deviations around each group mean X̄1, X̄2, X̄3]
Obtaining the Mean Squares
The mean squares are obtained by dividing the various sums of squares by their associated degrees of freedom:
MSA = SSA / (c – 1)   Mean Square Among (d.f. = c – 1)
MSW = SSW / (n – c)   Mean Square Within (d.f. = n – c)
MST = SST / (n – 1)   Mean Square Total (d.f. = n – 1)
One-Way ANOVA Table

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square (Variance)   F
Among Groups          c – 1                SSA              MSA = SSA / (c – 1)      FSTAT = MSA / MSW
Within Groups         n – c                SSW              MSW = SSW / (n – c)
Total                 n – 1                SST

c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom
One-Way ANOVA F Test Statistic
H0: μ1 = μ2 = … = μc
H1: At least two population means are different
Test statistic: FSTAT = MSA / MSW
MSA is the mean square among groups; MSW is the mean square within groups.
Degrees of freedom: df1 = c – 1 (c = number of groups); df2 = n – c (n = sum of sample sizes from all populations)
Interpreting the One-Way ANOVA F Statistic
The F statistic is the ratio of the among estimate of variance to the within estimate of variance.
The ratio must always be positive.
df1 = c – 1 will typically be small; df2 = n – c will typically be large.
Decision rule: reject H0 if FSTAT > Fα; otherwise do not reject H0.
One-Way ANOVA F Test Example
You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the 0.05 significance level, is there a difference in mean distance?

Club 1: 254, 263, 241, 237, 251
Club 2: 234, 218, 235, 227, 216
Club 3: 200, 222, 197, 206, 204
One-Way ANOVA Example: Scatter Plot
[Figure: scatter plot of distance (190 to 270) by club, showing each group's five measurements around its group mean]
X̄1 = 249.2, X̄2 = 226.0, X̄3 = 205.8; grand mean X̄ = 227.0
One-Way ANOVA Example: Computations
X̄1 = 249.2, n1 = 5
X̄2 = 226.0, n2 = 5
X̄3 = 205.8, n3 = 5
X̄ = 227.0, n = 15, c = 3
SSA = 5(249.2 – 227)² + 5(226 – 227)² + 5(205.8 – 227)² = 4716.4
SSW = (254 – 249.2)² + (263 – 249.2)² + … + (204 – 205.8)² = 1119.6
MSA = 4716.4 / (3 – 1) = 2358.2
MSW = 1119.6 / (15 – 3) = 93.3
FSTAT = 2358.2 / 93.3 = 25.275
One-Way ANOVA Example: Solution
H0: μ1 = μ2 = μ3
H1: μj not all equal
α = 0.05; df1 = 2, df2 = 12
Test statistic: FSTAT = MSA / MSW = 2358.2 / 93.3 = 25.275
Critical value: Fα = 3.89
Decision: since FSTAT = 25.275 > 3.89, reject H0 at α = 0.05
Conclusion: there is evidence that at least one μj differs from the rest.
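The ANOVA computations above follow directly from the sum-of-squares formulas; a minimal Python sketch (an assumption, since the course lists other software) for the golf-club data:

```python
# One-way ANOVA for the golf-club example, computed from first principles.
clubs = [
    [254, 263, 241, 237, 251],  # Club 1
    [234, 218, 235, 227, 216],  # Club 2
    [200, 222, 197, 206, 204],  # Club 3
]

c = len(clubs)
n = sum(len(g) for g in clubs)
group_means = [sum(g) / len(g) for g in clubs]
grand_mean = sum(x for g in clubs for x in g) / n

# Among-group and within-group sums of squares
ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(clubs, group_means))
ssw = sum((x - m) ** 2 for g, m in zip(clubs, group_means) for x in g)

msa = ssa / (c - 1)   # 2358.2
msw = ssw / (n - c)   # 93.3
f_stat = msa / msw

print(round(ssa, 1), round(ssw, 1))  # 4716.4 1119.6
print(round(f_stat, 3))              # 25.275
print(f_stat > 3.89)                 # True -> reject H0
```

The results match the slide: FSTAT = 25.275 far exceeds the critical value 3.89.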
ANOVA Assumptions
Randomness and independence: select random samples from the c groups (or randomly assign the levels).
Normality: the sample values for each group are from a normal population.
Homogeneity of variance: all populations sampled from have the same variance. This can be tested with Levene's test.
ANOVA Assumptions: Levene's Test
Tests the assumption that the variances of each population are equal.
First, define the null and alternative hypotheses:
H0: σ²1 = σ²2 = … = σ²c
H1: Not all σ²j are equal
Second, compute the absolute value of the difference between each value and the median of its group.
Third, perform a one-way ANOVA on these absolute differences.
Levene Homogeneity of Variance Test Example
H0: σ²1 = σ²2 = σ²3
H1: Not all σ²j are equal

Calculate the medians (values sorted within each club, absolute differences from the median alongside):

Club 1: 237→14, 241→10, 251→0 (median), 254→3, 263→12
Club 2: 216→11, 218→9, 227→0 (median), 234→7, 235→8
Club 3: 197→7, 200→4, 204→0 (median), 206→2, 222→18
Levene Homogeneity of Variance Test Example (continued)
ANOVA: Single Factor (on the absolute differences)

SUMMARY
Groups   Count   Sum   Average   Variance
Club 1   5       39    7.8       36.2
Club 2   5       35    7         17.5
Club 3   5       31    6.2       50.2

Source of Variation   SS      df   MS     F       P-value   F crit
Between Groups        6.4     2    3.2    0.092   0.912     3.885
Within Groups         415.6   12   34.6
Total                 422     14

Since the p-value (0.912) is greater than 0.05, we fail to reject H0 and conclude the variances are equal.
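The three Levene steps (medians, absolute differences, one-way ANOVA) can be sketched in Python as follows (a cross-check under the assumption that Python is available; the slide output comes from a spreadsheet ANOVA tool):

```python
# Levene's test (median-centered) for the golf-club example:
# a one-way ANOVA on absolute deviations from each group's median.
clubs = [
    [254, 263, 241, 237, 251],  # Club 1
    [234, 218, 235, 227, 216],  # Club 2
    [200, 222, 197, 206, 204],  # Club 3
]

def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

# Step 2: absolute deviations from each group's median
devs = [[abs(x - median(g)) for x in g] for g in clubs]

# Step 3: one-way ANOVA on the deviations
c = len(devs)
n = sum(len(g) for g in devs)
means = [sum(g) / len(g) for g in devs]
grand = sum(x for g in devs for x in g) / n

ssa = sum(len(g) * (m - grand) ** 2 for g, m in zip(devs, means))
ssw = sum((x - m) ** 2 for g, m in zip(devs, means) for x in g)
f_stat = (ssa / (c - 1)) / (ssw / (n - c))

print(round(ssa, 1), round(ssw, 1))  # 6.4 415.6
print(round(f_stat, 3))              # 0.092
print(f_stat > 3.885)                # False -> fail to reject H0
```

The sums of squares (6.4, 415.6) and F = 0.092 reproduce the slide's ANOVA table.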
Correlation
Scatter Diagram
It appears that there is a relationship: the greater the house size, the greater the selling price.
Types of Relationships
[Figure: example scatter plots of linear relationships and curvilinear relationships]
Types of Relationships (continued)
[Figure: example scatter plots of strong relationships and weak relationships]
Types of Relationships (continued)
[Figure: example scatter plots showing no relationship]
The Covariance
The covariance measures the strength of the linear relationship between two numerical variables (X and Y).
The sample covariance:
cov(X, Y) = Σ (i = 1 to n) (Xi – X̄)(Yi – Ȳ) / (n – 1)
It is only concerned with the strength of the relationship; no causal effect is implied.
Interpreting Covariance
cov(X, Y) > 0: X and Y tend to move in the same direction
cov(X, Y) < 0: X and Y tend to move in opposite directions
cov(X, Y) = 0: there is no linear relationship between X and Y (note this does not by itself imply independence)
The covariance has a major flaw: it is not possible to determine the relative strength of the relationship from the size of the covariance.
Coefficient of Correlation
Measures the relative strength of the linear relationship between two numerical variables.
Sample coefficient of correlation:
r = cov(X, Y) / (SX SY)
where
cov(X, Y) = Σ (i = 1 to n) (Xi – X̄)(Yi – Ȳ) / (n – 1)
SX = sqrt( Σ (i = 1 to n) (Xi – X̄)² / (n – 1) )
SY = sqrt( Σ (i = 1 to n) (Yi – Ȳ)² / (n – 1) )
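The covariance and correlation formulas can be implemented directly; the sketch below uses a small made-up (x, y) data set, loosely in the spirit of the house-size example (both the values and the Python implementation are illustrative assumptions, not the slide's data):

```python
import math

# Sample covariance and coefficient of correlation from the formulas above.
x = [1500, 1700, 2100, 2300, 2600]  # hypothetical house sizes (sq ft)
y = [200, 230, 270, 300, 340]       # hypothetical selling prices ($000s)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# cov(X, Y) = sum((Xi - Xbar)(Yi - Ybar)) / (n - 1)
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations SX and SY
s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

r = cov_xy / (s_x * s_y)
print(round(r, 3))   # 0.998 for this made-up data
```

For this data the correlation is close to +1, i.e., a very strong positive linear relationship; note that r, unlike cov(X, Y), is unit free.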
t Test for a Correlation Coefficient
Hypotheses:
H0: ρ = 0 (no correlation between X and Y)
H1: ρ ≠ 0 (correlation exists)
Test statistic (with n – 2 degrees of freedom):
tSTAT = (r – ρ) / sqrt( (1 – r²) / (n – 2) )
where
r = +sqrt(r²) if b1 > 0
r = –sqrt(r²) if b1 < 0
t Test for a Correlation Coefficient (continued)
Is there evidence of a linear relationship between square feet and house price at the 0.05 level of significance?
H0: ρ = 0 (no correlation)
H1: ρ ≠ 0 (correlation exists)
α = 0.05, df = 10 – 2 = 8
tSTAT = (r – ρ) / sqrt( (1 – r²)/(n – 2) ) = (0.762 – 0) / sqrt( (1 – 0.762²)/(10 – 2) ) = 3.329
t Test for a Correlation Coefficient (continued)
tSTAT = (0.762 – 0) / sqrt( (1 – 0.762²)/(10 – 2) ) = 3.329
d.f. = 10 – 2 = 8; α/2 = 0.025; critical values tα/2 = ±2.3060
Decision: since 3.329 > 2.3060, reject H0
Conclusion: there is evidence of a linear association at the 5% level of significance.
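The test statistic for the correlation coefficient is a one-line computation; a Python sketch (an assumed tool, as elsewhere) using the example's r = 0.762 and n = 10:

```python
import math

# t test for the correlation coefficient:
# t_STAT = (r - 0) / sqrt((1 - r^2) / (n - 2))
r = 0.762
n = 10

t_stat = r / math.sqrt((1 - r**2) / (n - 2))
df = n - 2

print(round(t_stat, 3))      # 3.328 (the slide's 3.329 reflects rounding of r)
print(abs(t_stat) > 2.3060)  # True -> reject H0 at alpha = 0.05
```

Either way, the statistic clearly exceeds the critical value 2.3060, so H0: ρ = 0 is rejected.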
Features of the Coefficient of Correlation
The population coefficient of correlation is referred to as ρ; the sample coefficient of correlation is referred to as r.
Both ρ and r have the following features:
• Unit free
• Ranges between –1 and 1
• The closer to –1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
Scatter Plots of Sample Data with Various Coefficients of Correlation
[Figure: example scatter plots for r = –1, r = –0.6, r = 0, r = +0.3, and r = +1]