University of Jordan
Faculty of Agriculture
Dept. of Agri. Econ. & Agribusiness
Agricultural Statistic (605150)
Dr. Amer Salman
Part (09):
Testing Hypotheses (Statistical Inference)
A hypothesis is a prediction about some aspect of a variable or a collection of
variables.
Hypotheses are derived from theory and serve as guides to research. When a
hypothesis can be stated in terms of one or more parameters of the appropriate population
distribution(s), statistical methods can be used to test its validity.
A statistical test involves comparing what is expected according to the hypothesis
with what is actually observed in the sample data.
Elements of a statistical test
There are five basic elements of a statistical test of a hypothesis about a parameter:
assumptions, hypotheses, test statistic, attained significance level, and conclusion.
1. Assumptions:
All statistical tests are based upon certain assumptions that must be met in order
for the tests to be valid. These assumptions usually entail considerations such as the
following:
a. The assumed scale of measurement of the variable: as with other statistical
procedures, each test is specifically designed for a certain level of
measurements.
b. The form of the population distribution: for many tests, the variable must
be continuous or even normally distributed in the population.
c. The method of sampling: the formulas for nearly every test we consider
require random sampling.
d. The sample size: many tests rely on results similar to the central limit theorem
and require a certain minimum sample size in order to be valid.
2. Hypotheses:
A statistical test focuses on two hypotheses about the value of a parameter.
The null hypothesis is the hypothesis that is directly tested. The alternative
hypothesis is accepted when the test results in rejection of the null hypothesis. It
consists of an alternative set of parameter values to those given in the null hypothesis.
Notation: the symbol $H_0$ represents the null hypothesis and the symbol $H_a$
represents the alternative hypothesis.
3. Test statistics:
After obtaining the sample, we form some sample statistic with a known sampling
distribution to help us test the null hypothesis.
The testing procedure has two possible outcomes: the null hypothesis is either
rejected or not rejected (loosely, "accepted"). If the null hypothesis is rejected,
then the alternative hypothesis is accepted.
The purpose of the test is to analyze, in probabilistic terms, how strong the
sample evidence is for rejecting the null hypothesis and hence accepting the
alternative hypothesis.
4. Attained significance level:
Consider the collection of values of the test statistic that are as favorable or
more favorable to $H_a$ than the value actually observed. The attained significance
level is the probability that the test statistic would fall in this collection of
values if $H_0$ were true.
Notation: the attained significance level is denoted by P and is sometimes referred to
as the P value of the test.
The smaller the value of P, the more contradictory the sample results are to $H_0$.
5. Conclusion:
If the attained significance level is sufficiently small, we might decide to
reject $H_0$ and therefore accept $H_a$. The smaller the attained significance level,
the stronger the case for rejecting $H_0$ and accepting $H_a$.
To illustrate, we might decide to reject $H_0$ if the attained significance level
P < 0.05, and conclude that there is not enough evidence to reject $H_0$
if P ≥ 0.05. The value 0.05 would then be referred to as the α-level of the test.
The α-level is a number such that $H_0$ is rejected if the attained significance
level is less than its value. Researchers usually choose an α-level of 0.10, 0.05, or 0.01.
The smaller the α-level is chosen to be, the stronger the evidence must be before
rejecting $H_0$.
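When the test statistic is a z value, the attained significance level can be computed directly from the standard normal distribution rather than read from a table. The following is a minimal sketch under that assumption; the function name `p_value`, the α-level of 0.05 and the value z = 2.4 are chosen here for illustration only.

```python
from statistics import NormalDist


def p_value(z_cal, tail):
    """Attained significance level (P value) of a z statistic.

    tail is "upper" (H_a: >), "lower" (H_a: <) or "two" (H_a: !=).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "upper":
        return 1 - std_normal.cdf(z_cal)
    if tail == "lower":
        return std_normal.cdf(z_cal)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z_cal)))
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


alpha = 0.05                  # illustrative alpha-level
p = p_value(2.4, "upper")     # z = 2.4 is an assumed value
print(round(p, 4), "reject H_0" if p < alpha else "do not reject H_0")
```

The decision rule in the last line is the one stated above: $H_0$ is rejected only when P falls below the chosen α-level.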
Table (9 – 1): Possible conclusions in a test of a hypothesis with α-level 0.05

Attained significance level    Conclusion about $H_0$    Conclusion about $H_a$
P < 0.05                       Reject                    Accept
P ≥ 0.05                       Do not reject             Do not accept
The collection of values that would lead a researcher to reject $H_0$ at a particular
α-level is referred to as the rejection region. For example, the rejection region for a test of level
α = 0.05 is the set of values for the test statistic that produce P < 0.05.
Large Sample Test of Hypothesis about µ
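Table (9 – 2):

One-tailed test (one-sided)
$H_0: \mu = \mu_0$
$H_a: \mu < \mu_0$ (or $H_a: \mu > \mu_0$)
Test statistic:
$$z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}}, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$$
Rejection region: $z_{cal} < -z_{\alpha}$ for $H_a: \mu < \mu_0$ (or $z_{cal} > z_{\alpha}$ for $H_a: \mu > \mu_0$); reject $H_0$, accept $H_a$
where $z_{\alpha}$ is chosen so that $P(z > z_{\alpha}) = \alpha$; the corresponding table area is $0.5 - \alpha$.

Two-tailed test (two-sided)
$H_0: \mu = \mu_0$
$H_a: \mu \neq \mu_0$
Test statistic:
$$z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}}, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$$
Rejection region: $|z_{cal}| > z_{\alpha/2}$; reject $H_0$, accept $H_a$
where $z_{\alpha/2}$ is chosen so that $P(z > z_{\alpha/2}) = \alpha/2$; the corresponding table area is $0.5 - \alpha/2$.

The factor $\sqrt{(N - n)/(N - 1)}$ applies to finite populations of size N.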
Equivalently, each pair of hypotheses can be written in terms of the difference between $\mu$ and $\mu_0$:
Two-tailed:
$H_0: \mu - \mu_0 = 0$ (the difference between them is zero)
$H_a: \mu - \mu_0 \neq 0$
Right one-tailed:
$H_0: \mu - \mu_0 = 0$
$H_a: \mu - \mu_0 > 0$
Left one-tailed:
$H_0: \mu - \mu_0 = 0$
$H_a: \mu - \mu_0 < 0$
Example (9 – 1):
The mean number of prior convictions in a country in 1970 was 2.0. The police chief
believes that the mean has increased. To check this hunch, a small investigation is
organized: a random sample of 36 records is drawn from the set of all conviction
records in the country in 1978, giving a sample mean $\bar{X} = 2.8$ and a standard
deviation σ = 2.0. Are these data sufficient for the police chief to conclude, at the
99 % confidence level, that the mean has increased?
1 – α = confidence level
α = significance level
$\mu_0 = 2.0$, $\bar{X} = 2.8$, σ = 2.0, n = 36
The estimated standard error of the sampling distribution of $\bar{X}$ is:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2.0}{\sqrt{36}} = 0.33$$
$H_0: \mu = \mu_0 = 2.0$
$H_a: \mu > 2.0$
The value of the test statistic is therefore:
$$z_{cal} = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{2.8 - 2.0}{0.33} = 2.4$$
The test is one-tailed with 1 – α = 0.99, so α = 0.01. The table area is
0.5 – α = 0.5 – 0.01 = 0.49, which gives $z_{tab} = 2.33$.
$z_{cal} = 2.4 > z_{tab} = 2.33$
As a result, reject $H_0$ and accept $H_a$.
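The steps of this example (standard error, calculated z, tabulated z, decision) can be sketched in a few lines of Python. This is an illustrative sketch, not part of the course material: the function name `z_test_mean` and its `tail` argument are assumptions, and the tabulated z is taken from the standard library's inverse normal CDF instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha, tail="upper"):
    """Large-sample z test for a mean; returns (z_cal, z_tab, reject_h0)."""
    se = sigma / sqrt(n)                       # standard error of the mean
    z_cal = (xbar - mu0) / se
    std_normal = NormalDist()
    if tail == "two":
        z_tab = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z_cal) > z_tab
    elif tail == "upper":
        z_tab = std_normal.inv_cdf(1 - alpha)
        reject = z_cal > z_tab
    elif tail == "lower":
        z_tab = std_normal.inv_cdf(1 - alpha)
        reject = z_cal < -z_tab
    else:
        raise ValueError("tail must be 'upper', 'lower' or 'two'")
    return z_cal, z_tab, reject


# Values from the conviction-records example: one-tailed test, alpha = 0.01
z_cal, z_tab, reject = z_test_mean(2.8, 2.0, 2.0, 36, 0.01, tail="upper")
print(round(z_cal, 2), round(z_tab, 2), reject)
```

Calling the same function with `tail="two"` uses α/2 in each tail, which raises the tabulated z and can change the decision for the same data.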
[Figure: one-tailed test for Example (9 – 1). $H_0$ is accepted to the left of $z_{tab} = 2.33$ and rejected to the right of it; $z_{cal} = 2.4$ lies in the rejection region of area α.]
Note: in a one-tailed test, the tabulated z takes the sign of the calculated z.
Important:
[Figure: rejection regions by form of $H_a$.
$H_a: \mu > \mu_0$ (one-tailed): accept $H_0$ for $z < z_{tab}$; reject $H_0$ and accept $H_a$ for $z > z_{tab}$.
$H_a: \mu < \mu_0$ (one-tailed): reject $H_0$ and accept $H_a$ for $z < -z_{tab}$; accept $H_0$ for $z > -z_{tab}$.
$H_a: \mu \neq \mu_0$ (two-tailed): reject $H_0$ and accept $H_a$ for $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$; accept $H_0$ in between.]
Now suppose that the police chief did not claim that the mean had increased, but only
that he could not say whether it was larger or smaller than before. The test is then two-tailed:
$H_0: \mu = \mu_0 = 2.0$
$H_a: \mu \neq 2.0$
$$z_{cal} = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{2.8 - 2.0}{0.33} = 2.4$$
The table area is $0.5 - \alpha/2 = 0.5 - 0.01/2 = 0.495$, which gives $z_{\alpha/2} = \pm 2.575$.
$|z_{cal}| = 2.4 < z_{\alpha/2} = 2.575$
Accept $H_0$, reject $H_a$.
[Figure: two-tailed test. $H_0$ is rejected for z < -2.575 or z > 2.575 and accepted in between; $z_{cal} = 2.4$ lies in the acceptance region.]
Note: the larger α is, the larger the probability of wrongly rejecting $H_0$, so $H_0$ is rejected (and $H_a$ accepted) more readily.
In the previous example, suppose that $\bar{X} = 2.8$ and σ = 2.0 had been calculated from a
sample of size n = 50 instead of n = 36.
$H_0: \mu = 2.0$
$H_a: \mu \neq 2.0$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2.0}{\sqrt{50}} = 0.283$$
$$z_{cal} = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{2.8 - 2.0}{0.283} = 2.83$$
Two-tailed test at the 99 % confidence level:
(1 – α) = confidence level
α = significance level
The table area is $0.5 - 0.01/2 = 0.495$, which gives $z_{\alpha/2} = 2.575$.
$|z_{cal}| = 2.83 > z_{\alpha/2} = 2.575$
Reject $H_0$, accept $H_a$.
Note: the larger the sample size, the more precise the test, because the standard error of the mean decreases.
[Figure: two-tailed test with n = 50. On the z scale, $H_0$ is rejected outside ±2.575 and $z_{cal} = 2.83$ lies in the upper rejection region; on the $\bar{X}$ scale, the corresponding limits are 1.272 and 2.728.]
Confidence interval around $\mu_0$ (the acceptance region on the $\bar{X}$ scale):
$$\mu_0 \pm z_{tab} \frac{\sigma}{\sqrt{n}} = 2.0 \pm 2.575 \times \frac{2.0}{\sqrt{50}} = 2.0 \pm 0.728$$
$1.272 < \bar{X} < 2.728$
The observed $\bar{X} = 2.8$ lies outside this interval, which agrees with the rejection of $H_0$.
Example (9 – 2):
Suppose that a firm wants to test whether it can claim that the light bulbs it produces last 1000
burning hours ($\mu$). The firm takes a random sample of n = 100 bulbs and finds a
sample mean $\bar{X}$ = 980 hr and a sample standard deviation s = 80 hr. The firm
wants to conduct the test at the 5 % level of significance to see whether the light bulbs last
a different number of hours than 1000.
Solution:
Since $\mu$ could be equal to, larger than, or smaller than 1000 hr, the firm should set the
null hypothesis and the alternative hypothesis as:
$H_0: \mu = 1000$
$H_a: \mu \neq 1000$
Since n > 30, the sampling distribution of the mean is approximately normal (and we can
use s as an estimate of σ). The rejection region is in both tails, so we have a two-tailed test.
The table area is $0.5 - \alpha/2 = 0.5 - 0.025 = 0.475$, which gives $z_{tab} = \pm 1.96$, so the
acceptance region at the 5 % level of significance lies within ±1.96 (95 %) under the
standard normal curve. The third step is to find the z value corresponding to $\bar{X}$:
$$z_{cal} = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{980 - 1000}{80/\sqrt{100}} = \frac{-20}{8} = -2.5$$
$|z_{cal}| > z_{\alpha/2}$
2.5 > 1.96 → reject $H_0$, accept $H_a$
[Figure: two-tailed test. On the z scale, $H_0$ is accepted between -1.96 and 1.96 and rejected outside; on the $\bar{X}$ scale, the corresponding limits are 984.32 and 1015.68, and the observed mean 980 lies in the lower rejection region.]
Confidence interval around $\mu_0$ (the acceptance region on the $\bar{X}$ scale):
$$\mu_0 \pm z_{tab} \frac{s}{\sqrt{n}} = 1000 \pm 1.96 \times \frac{80}{10} = 1000 \pm 15.68$$
$984.32 < \bar{X} < 1015.68$
The sample mean 980 is not in this interval, so the claim that the bulbs last 1000 hr is rejected.
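The interval check at the end of this example can be reproduced numerically. The sketch below assumes the same large-sample setting (s used in place of σ); the helper name `acceptance_interval` is hypothetical and only mirrors the formula $\mu_0 \pm z_{tab}\, s/\sqrt{n}$.

```python
from math import sqrt
from statistics import NormalDist


def acceptance_interval(mu0, s, n, alpha):
    """Two-tailed acceptance region for the sample mean under H_0: mu = mu0."""
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    half_width = z_tab * s / sqrt(n)
    return mu0 - half_width, mu0 + half_width


low, high = acceptance_interval(mu0=1000, s=80, n=100, alpha=0.05)
xbar = 980
inside = low <= xbar <= high    # False means H_0 is rejected
print(round(low, 2), round(high, 2), inside)
```

Checking whether the sample mean falls inside this interval is equivalent to comparing $|z_{cal}|$ with $z_{\alpha/2}$; both views lead to the same decision.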
Example (9 – 3):
An army recruiting center knows from past experience that the weight of army recruits is
normally distributed with a mean $\mu$ of 80 kg (about 176 lb) and a standard deviation σ
of 10 kg. The recruiting center wants to test, at the 1 % level of significance, whether the average
weight of this year's recruits is above 80 kg. To do this, it takes a sample of 25 recruits and
finds that the average for this sample is 85 kg. How can this test be performed?
Solution:
Since the center is interested in testing whether $\mu$ > 80 kg, it sets up the following hypotheses:
$H_0: \mu = 80$ kg
$H_a: \mu > 80$ kg
With $H_a: \mu > 80$ kg we have a right-tailed test. At the 1 % level of significance the
table area is 0.5 – 0.01 = 0.49, which gives z = 2.33, so the rejection region lies to the
right of z = 2.33.
$$z_{cal} = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{85 - 80}{10/\sqrt{25}} = 2.5$$
Note: in a one-tailed test, $z_{\alpha}$ always takes the sign of $z_{cal}$.
$z_{cal} > z_{\alpha}$
2.5 > 2.33 → reject $H_0$, accept $H_a$
Since the calculated value of z falls within the rejection region, we reject $H_0$ and accept
$H_a$ (that $\mu$ > 80 kg). This means that if $\mu$ = 80 kg, the probability of getting a random
sample from this population with $\bar{X}$ of 85 kg or more is less than 1 %. That would be an
unusual sample indeed. Thus we reject $H_0$ at the 1 % level of significance (i.e. we are
99 % confident of making the right decision).
[Figure: right-tailed test; $H_0$ is rejected and $H_a$ accepted to the right of z = 2.33.]
Example (9 – 4):
A producer of steel cables wants to test whether the cables produced have a breaking strength of
5000 lb. A breaking strength of less than 5000 lb would not be adequate, and producing
steel cables with a breaking strength of more than 5000 lb would
unnecessarily increase production costs. The producer takes a random sample of 64
pieces and finds that the average breaking strength is 5100 lb and the sample standard
deviation is 480 lb. Should the producer accept the hypothesis that its steel cable has a
breaking strength of 5000 lb at the 5 % level of significance?
Solution:
Since $\mu$ could be equal to, larger than, or smaller than 5000 lb, we set up the null and
alternative hypotheses as follows:
$H_0: \mu = 5000$ lb
$H_a: \mu \neq 5000$ lb
Since n > 30, the sampling distribution of the mean is approximately normal. The acceptance
region of the test at the 5 % level of significance lies within ±1.96 under the standard
normal curve, and the rejection (critical) region lies outside it. Since the rejection region
is in both tails, we have a two-tailed test. The third step is to find the z value
corresponding to $\bar{X}$:
$$z_{cal} = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{5100 - 5000}{480/\sqrt{64}} = 1.67$$
$|z_{cal}| < z_{\alpha/2}$
1.67 < 1.96 → accept $H_0$, reject $H_a$
Since the calculated value of z falls within the acceptance region, the producer should
accept the null hypothesis and reject $H_a$ at the 5 % level of significance (or with a 95 % level of
confidence).
Note that this does not prove that $\mu$ is indeed equal to 5000 lb; it only shows that
there is no statistical evidence that $\mu$ is not equal to 5000 lb at the 5 % level of
significance.
Confidence interval:
$$C.I. = \mu_0 \pm z_{tab}\,\sigma_{\bar{X}} = \mu_0 \pm z_{tab} \frac{\sigma}{\sqrt{n}} = 5000 \pm 1.96 \times \frac{480}{\sqrt{64}} = 5000 \pm 117.6$$
$4882.4 < \bar{X} < 5117.6$
[Figure: two-tailed test. On the z scale, $H_0$ is accepted between -1.96 and 1.96 and rejected outside; $z_{cal} = 1.67$ lies in the acceptance region. On the $\bar{X}$ scale, the corresponding limits are 4882.4 and 5117.6.]
Example (9 – 5):
Assume that a population is composed of 900 elements with a mean of 20 units and a
standard deviation of 12. Find the mean, the standard error, and the confidence interval of
the sampling distribution of the mean for a sample size of 36 units, at the 1 % level of
significance.
N = 900, $\mu$ = 20, σ = 12, n = 36, α = 1 %
$\mu_{\bar{X}} = \mu = 20$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{6} = 2$$
(Note: the correction factor is used only if n ≥ 0.05N. Here 0.05(900) = 45 and 36 < 45, so it is not needed.)
The table area is 0.5 – α/2 = 0.5 – 0.005 = 0.495, which gives $z_{tab} = 2.575$.
Confidence interval = $\mu \pm z_{tab}\,\sigma_{\bar{X}}$
= 20 ± 2.575 × 2
= 20 ± 5.15
$14.85 < \bar{X} < 25.15$
Example (9 – 6):
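A random sample of 144 with a mean of 100 and s = 60 is taken from a population (N) of
1000. Compute the 95 % confidence interval for the unknown population mean.
Solution:
N = 1000, n = 144, $\bar{X}$ = 100, s = 60, α = 5 %
Since n = 144 exceeds 0.05N = 50, the standard error includes the finite population correction factor:
$$\sigma_{\bar{X}} = \frac{s}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}} = \frac{60}{\sqrt{144}} \times \sqrt{\frac{1000 - 144}{1000 - 1}} \approx 5 \times 0.93$$
Confidence interval = $\bar{X} \pm z_{tab}\,\sigma_{\bar{X}}$
$\mu = \bar{X} \pm 1.96 \times \sigma_{\bar{X}}$
= 100 ± 1.96 × 5 × 0.93
= 100 ± 9.11
$90.89 < \mu < 109.11$
1 – α = confidence level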
Large – Sample Estimation of a Population Mean
The sample mean $\bar{X}$ represents a point estimate of the population mean $\mu$. How
can we assess the accuracy of this point estimate? We attach to it an interval of the form:
$$\bar{X} \pm z\,\sigma_{\bar{X}} = \bar{X} \pm z \frac{\sigma}{\sqrt{n}}$$
Definition 1: an interval estimator is a formula that tells us how to use sample data to
calculate an interval that estimates a population parameter.
Definition 2: the confidence coefficient is the probability that an interval estimator
encloses the population parameter if the estimator is used repeatedly a very large number
of times. The confidence level is the confidence coefficient expressed as a percentage.
Large-sample 100(1 – α) % confidence interval for $\mu$:
$$\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{X}}$$
where $z_{\alpha/2}$ is the z value with an area α/2 to its right and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$; σ is the standard
deviation of the sampled population and n is the sample size. When σ is unknown (as is
almost always the case) and n is large (say n ≥ 30), the value of σ can be approximated
by the sample standard deviation, s.
To illustrate, for a confidence coefficient of 0.90, (1 – α) = 0.90, α = 0.10, α/2 = 0.05
and $z_{0.05}$ is the value that locates 0.05 in one tail of the sampling distribution.

Confidence level 100(1 – α)    α       α/2      $z_{\alpha/2}$
90 %                           0.10    0.05     1.645
95 %                           0.05    0.025    1.96
99 %                           0.01    0.005    2.575
Example (9 – 7):
Unoccupied seats on flights cause the airline to lose revenue. Suppose a large airline
wants to estimate its average number of unoccupied seats per flight over the past year. To
accomplish this, the records of 225 flights are randomly selected and the number of
unoccupied seats is noted for each of the sampled flights. The sample mean and standard
deviation are $\bar{X}$ = 11.6 seats and s = 4.1 seats. Estimate $\mu$, the mean number of unoccupied
seats per flight during the past year, using a 90 % confidence interval.
Solution:
The general form of the large-sample 90 % confidence interval for a population mean,
with α = 0.10 and α/2 = 0.05, is:
$$\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{X}} = \bar{X} \pm z_{0.05}\,\sigma_{\bar{X}} = \bar{X} \pm 1.645 \left( \frac{\sigma}{\sqrt{n}} \right)$$
For the 225 records sampled, using s as an estimate of σ, we have
$$11.6 \pm 1.645 \left( \frac{4.1}{\sqrt{225}} \right) = 11.6 \pm 0.45$$
or from 11.15 to 12.05. That is, the airline can be 90 % confident that the mean number of
unoccupied seats per flight was between 11.15 and 12.05 during the sampled year.
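The same interval can be obtained programmatically. The following sketch assumes a large sample so that s can stand in for σ; the function name `mean_confidence_interval` is an illustrative choice, and the z value comes from the standard library instead of the table above.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(xbar, s, n, confidence):
    """Large-sample confidence interval for a population mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z value with area alpha/2 to its right
    margin = z * s / sqrt(n)                  # s approximates sigma for large n
    return xbar - margin, xbar + margin


# Airline example: 225 flights, sample mean 11.6, sample standard deviation 4.1
low, high = mean_confidence_interval(11.6, 4.1, 225, 0.90)
print(round(low, 2), round(high, 2))
```

Raising `confidence` to 0.95 or 0.99 widens the interval, since the z value grows from about 1.645 to about 1.96 or 2.576.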
Sampling distribution of the mean:
If we take repeated random samples from a population and measure the mean of each
sample, we find that most of these sample means, the $\bar{X}$ values, differ from each other. The
probability distribution of these sample means is called the sampling distribution of the
mean. The sampling distribution of the mean itself has a mean, denoted
$\mu_{\bar{X}}$, and a standard deviation, or standard error, $\sigma_{\bar{X}}$.
Two important theorems relate the sampling distribution of the mean to the parent
population.
Theorem 1: if we take repeated random samples of size n from a population:
$$\mu_{\bar{X}} = \mu \qquad (4.1)$$
and
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}} \qquad (4.2\ a, b)$$
where equation (4.2 b) is used for finite populations of size N when n ≥ 0.05N.
Theorem 2: as the sample size is increased (that is, as n → ∞), the sampling distribution
of the mean approaches the normal distribution regardless of the shape of the parent
population. The approximation is sufficiently good for n ≥ 30. This is the central limit
theorem.
We can find the probability that a random sample has a mean $\bar{X}$ in a given interval by
first calculating the z values for the interval, where:
$$z_{cal} = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$$
and then looking up these values in the z-table.
Note: the greater n is, the smaller the spread, or standard error of the mean, $\sigma_{\bar{X}}$. If
the parent population is normal, the sampling distributions of the mean are also normally
distributed, even in small samples. According to the central limit theorem, even if the
parent population is not normally distributed, the sampling distributions of the mean are
approximately normal for n ≥ 30.
[Figure: sampling distributions of the mean for n = 5 and n = 20 on the $\bar{X}$ scale, both centred at $\mu_{\bar{X}} = \mu$.]
Example (9 – 8):
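Assume that a population is composed of 900 elements with a mean of 20 units and a
standard deviation of 12. The mean and standard error of the sampling distribution of the
mean for a sample size of 36 are:
$\mu_{\bar{X}} = \mu = 20$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{36}} = 2$$
The correction factor is not needed here, because the condition n ≥ 0.05N is not met:
0.05(900) = 45 and 36 < 45.
If n had been 64 instead of 36 (so that n ≥ 0.05N), then
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}} = \frac{12}{\sqrt{64}} \times \sqrt{\frac{900 - 64}{900 - 1}} = \frac{12}{8} \times \sqrt{\frac{836}{899}} \approx (1.5) \times (0.96) = 1.44$$
instead of $\sigma_{\bar{X}} = 1.5$ without the correction factor.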
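The rule for when to apply the correction factor can be written as a small function. This is a sketch under the 5 % rule stated above; the name `standard_error` and the optional argument `N` are assumptions made for illustration.

```python
from math import sqrt


def standard_error(sigma, n, N=None):
    """Standard error of the mean, with the finite population correction
    applied when the population size N is given and n >= 0.05 * N."""
    se = sigma / sqrt(n)
    if N is not None and n >= 0.05 * N:
        se *= sqrt((N - n) / (N - 1))   # finite population correction factor
    return se


print(standard_error(12, 36, N=900))             # 36 < 45, no correction
print(round(standard_error(12, 64, N=900), 3))   # 64 >= 45, correction applied
```

Without rounding the correction factor to 0.96, the second value comes out near 1.446 rather than 1.44; the difference is only a rounding effect.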
Example (9 – 9):
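The probability that the mean $\bar{X}$ of a random sample of 36 elements from the population
in the previous example falls between 18 and 24 units is computed as follows:
$$z_1 = \frac{\bar{X}_1 - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{18 - 20}{2} = -1 \quad \text{and} \quad z_2 = \frac{\bar{X}_2 - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{24 - 20}{2} = 2$$
Looking up $z_1$ and $z_2$ in the z-table we get
$P(18 < \bar{X} < 24) = 0.3413 + 0.4772 = 0.8185$, or 81.85 %
[Figure: normal curve with the $\bar{X}$ scale values 18, 20, 24 aligned with the z scale values -1, 0, 2.]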
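As a cross-check of the table lookup, the probability can be computed from the normal CDF of the sampling distribution itself. The sketch below assumes the sampling distribution is normal with mean 20 and standard error 2, as in the example; the variable names are illustrative.

```python
from statistics import NormalDist

# Sampling distribution of the mean: mu_xbar = 20, sigma_xbar = 2
sampling_dist = NormalDist(mu=20, sigma=2)

# P(18 < X-bar < 24) = F(24) - F(18)
prob = sampling_dist.cdf(24) - sampling_dist.cdf(18)
print(round(prob, 4))
```

The result agrees with the table value 0.8185 up to the four-decimal rounding of the table entries.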
Estimation using the Normal Distribution:
A point estimate is a single number. Such a point estimate is unbiased if, in repeated
random sampling from a population, the expected or mean value of the corresponding
statistic is equal to the population parameter. For example, $\bar{X}$ is an unbiased (point)
estimate of $\mu$ because $\mu_{\bar{X}} = \mu$, where $\mu_{\bar{X}}$ is the expected value of $\bar{X}$. The sample
variance $s^2$ (computed with n – 1 in the denominator) is an unbiased estimate of $\sigma^2$, so s is
used as the estimate of σ, and the sample proportion $\hat{P}$ is an unbiased estimate of P.
An interval estimate refers to a range of values together with the probability, or
confidence level, that the interval includes the unknown population parameter. Given the
population standard deviation or its estimate, and given that the population is normal or that
the random sample is equal to or larger than 30, we can find the 95 % confidence interval
for the unknown population mean as:
$$P(\bar{X} - 1.96\,\sigma_{\bar{X}} < \mu < \bar{X} + 1.96\,\sigma_{\bar{X}}) = 0.95$$
This means that we expect that 95 out of 100 such intervals include the unknown
population mean and that our confidence interval is one of these.
A confidence interval can be constructed similarly for the population proportion,
where:
$$\mu_{\hat{P}} = \frac{\mu}{n} = P \quad \text{(the proportion of successes in the population)}$$
$$\sigma_{\hat{P}} = \sqrt{\frac{P(1 - P)}{n}} \quad \text{(the standard error of the proportion)}$$
Here $\mu = nP$ denotes the mean number of successes in n trials.
Example (9 – 10):
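A random sample of 144 with a mean of 100 and a standard deviation of 60 is taken from
a population of 1000. The 95 % confidence interval for the unknown population mean is:
$$\mu = \bar{X} \pm 1.96\,\sigma_{\bar{X}} \quad \text{(since } n > 30\text{)}$$
$$= \bar{X} \pm 1.96 \times \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}} \quad \text{(since } n > 0.05N\text{)}$$
$$= 100 \pm 1.96 \times \frac{60}{\sqrt{144}} \times \sqrt{\frac{1000 - 144}{1000 - 1}} \quad \text{(using } s \text{ as an estimate of } \sigma\text{)}$$
= 100 ± 1.96 × 5 × 0.93
= 100 ± 9.11
⇒ 90.89 < $\mu$ < 109.11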
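To see how much the rounding of the correction factor to 0.93 affects the margin, the computation can be repeated without intermediate rounding. This is only an illustrative check; the variable names are assumptions, and the z value is taken from the standard library instead of the table.

```python
from math import sqrt
from statistics import NormalDist

N, n, xbar, s = 1000, 144, 100, 60        # values from the example
z = NormalDist().inv_cdf(0.975)           # about 1.96 for 95 % confidence
fpc = sqrt((N - n) / (N - 1))             # finite population correction factor
margin = z * (s / sqrt(n)) * fpc

print(round(fpc, 4), round(margin, 2))
print(round(xbar - margin, 2), round(xbar + margin, 2))
```

The unrounded margin is about 9.07 instead of 9.11, so the interval is marginally narrower than the hand calculation suggests.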
Example (9 – 11):
A manager wishes to estimate the mean number of minutes that workers take to complete
a particular manufacturing process to within 3 min and with 90 % confidence. From past
experience, the manager knows that the standard deviation σ is 15 min. The minimum
required sample size (n > 30) is found as follows:
$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \Rightarrow z\,\sigma_{\bar{X}} = \bar{X} - \mu$$
$$1.64 \times \sigma_{\bar{X}} = \bar{X} - \mu$$
$$1.64 \times \frac{\sigma}{\sqrt{n}} = \bar{X} - \mu \quad \text{(assuming } n < 0.05N\text{)}$$
$$1.64 \times \frac{15}{\sqrt{n}} = 3 \quad \text{(since the allowed error } \bar{X} - \mu \text{ is 3 min)}$$
$$\sqrt{n} = 1.64 \times \frac{15}{3} = 8.2$$
n = 67.24, or 68 (rounded up to the next integer)
1.46 ×
Determining the Sample Size Necessary for Making Inference about a Population
Mean
In this section we see how the appropriate sample size for making an
inference about the population mean depends on the desired reliability.
Consider the following example: a sample of 100 delinquent accounts produced an
estimate X̄ that was within $18 of the true mean amount due, µ, for all delinquent
accounts at the 95 % confidence level. That is, the 95 % confidence interval for µ was $36
wide when 100 accounts were sampled.
[Figure: sampling distribution of X̄ centred at µ for n = 100, with 1.96 σ_X̄ = 18 on each side.]
Now suppose that we want to estimate µ to within $5 with 95% confidence. That is,
we want to narrow the width of the confidence interval from $36 to $10, as shown in the
following figure:
[Figure: sampling distribution of X̄ centred at µ, with 1.96 σ_X̄ = 5 on each side; the figure labels this case n = 1300.]
How much will the sample size have to be increased to accomplish this? If we want
the estimator X̄ to be within $5 of µ, we need 1.96 σ_X̄ = 5, or equivalently
$$1.96 \left( \frac{\sigma}{\sqrt{n}} \right) = 5$$
Using s = 90 from the sample of 100 accounts as an approximation of σ:
$$1.96 \left( \frac{\sigma}{\sqrt{n}} \right) \approx 1.96 \left( \frac{s}{\sqrt{n}} \right) = 1.96 \left( \frac{90}{\sqrt{n}} \right) = 5$$
$$\sqrt{n} = \frac{1.96\,(90)}{5} = 35.28$$
$$n = (35.28)^2 = 1{,}244.68$$
Approximately 1,245 accounts will have to be sampled to estimate the mean overdue
amount µ to within $5 with approximately 95 % confidence. The confidence interval
resulting from a sample of this size will be approximately $10 wide.
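Solving the margin-of-error equation for n gives n = (z σ / E)², where E is the allowed error. A short sketch of that calculation follows; the function name `required_sample_size` is hypothetical, and the result is rounded up because a fractional observation cannot be sampled.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, max_error, confidence):
    """Smallest n for which z * sigma / sqrt(n) <= max_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / max_error) ** 2)


# Delinquent accounts: s = 90 used for sigma, within 5 dollars, 95 % confidence
print(required_sample_size(90, 5, 0.95))
# Manufacturing process: sigma = 15 min, within 3 min, 90 % confidence
print(required_sample_size(15, 3, 0.90))
```

Because the required n grows with the square of σ/E, halving the allowed error roughly quadruples the sample size.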
Testing Hypothesis about a Population Proportion
(Large – sample Test for a Population Proportion P)
1. Null hypothesis:
One tailed: H o : P = Po
Two tailed: H o : P = Po
2. Alternative hypothesis:
One tailed test: H a : P > Po or H a : P < Po
Two tailed test: H a : P ≠ Po
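3. Test statistic:
$$z = \frac{\hat{P} - P_0}{\sigma_{\hat{P}}} = \frac{\hat{P} - P_0}{\sqrt{P_0 q_0 / n}}, \quad \text{where } \hat{P} = \frac{y}{n}$$
Here y is the number of successes in the sample and $q_0 = 1 - P_0$.
4. Rejection region:
One-tailed test: $z_{cal} > z_{\alpha}$ for $H_a: P > P_0$ (or $z_{cal} < -z_{\alpha}$ for $H_a: P < P_0$)
Two-tailed test: $|z_{cal}| > z_{\alpha/2}$
In each case: reject $H_0$, accept $H_a$.
[Figure: two-tailed test, with the acceptance region between $-z_{\alpha/2}$ and $z_{\alpha/2}$ and a rejection region of area α/2 in each tail; one-tailed test, with the acceptance region to the left of $z_{\alpha}$ and a rejection region of area α to the right.]
Note 1: the estimate $\hat{P}$ takes the place of $\bar{Y}$, the hypothesized value $P_0$ takes the place of
$\mu_0$, and the standard error $\sigma_{\hat{P}}$ takes the place of $\sigma_{\bar{Y}}$.
Note 2: $\hat{P} = \bar{Y} = y/n$, and
$$z = \frac{\hat{P} - P_0}{\sigma_{\hat{P}}}, \qquad \sigma_{\hat{P}} = \sqrt{\frac{P_0 (1 - P_0)}{n}}$$
In general:
$$z = \frac{\text{Estimate of parameter} - \text{Null hypothesis value of parameter}}{\text{Standard error of estimate}}$$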
Example (9 – 12):
A public organization is set up for the purpose of banning smoking in public restaurants
in Greenvale, Colorado. In an attempt to convince the city council of public support, it
conducts a statistical test of $H_0: P = 0.5$ against $H_a: P > 0.5$, where P denotes the
proportion of adults in the community who support the ban on smoking. The alternative
hypothesis $H_a$ represents the claim of the organization that this proportion exceeds
one-half. The null hypothesis contains the statement that this proportion is just
one-half. Suppose that out of a random sample of 25 adults in this community, 15 support the
smoking ban. Should we accept the null hypothesis $H_0: P = 0.5$ or the alternative
hypothesis $H_a: P > 0.5$ at the 5 % level of significance (α)? The point estimate of P is:
$H_0: P = 0.5$
$H_a: P > 0.5$
$$\hat{P} = \frac{y}{n} = \frac{15}{25} = 0.6$$
Now, $5/\min(P_0, 1 - P_0) = 5/\min(0.5, 0.5) = 10$.
Since n = 25 exceeds 10, the sampling distribution of $\hat{P}$ is approximately normal
when $H_0$ is true. The assumptions are fulfilled for the large-sample test.
The standard error of $\hat{P}$ when $H_0: P = 0.5$ is true is:
$$\sigma_{\hat{P}} = \sqrt{\frac{P_0 (1 - P_0)}{n}} = \sqrt{\frac{(0.5)(0.5)}{25}} = 0.1$$
$$z_{cal} = \frac{\hat{P} - P_0}{\sigma_{\hat{P}}} = \frac{0.6 - 0.5}{0.1} = \frac{0.1}{0.1} = 1$$
For this one-tailed test the table area is 0.5 – α = 0.5 – 0.05 = 0.45, which gives
$z_{\alpha} = 1.645$ (positive, following the sign of $z_{cal}$).
$z_{cal} < z_{\alpha}$
⇒ Accept $H_0$, reject $H_a$
$$\text{Confidence interval (in general)} = \hat{P} \pm z_{\alpha/2}\,\sigma_{\hat{P}} = \hat{P} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{P}\hat{q}}{n}}$$
$$\text{Confidence interval (special case)} = \hat{P} \pm z_{\alpha} \times \sqrt{\frac{\hat{P}\hat{q}}{n}} = 0.6 \pm 1.645 \times \sqrt{\frac{0.6 \times 0.4}{25}} = 0.6 \pm 0.161$$
which gives an upper limit of about 0.76.
[Figure: one-tailed test. $H_0$ is accepted to the left of $z_{\alpha} = 1.645$ and rejected to the right; $z_{cal} = 1$ lies in the acceptance region. On the $\hat{P}$ scale the figure marks 0.6 and 0.76.]
Example (9 – 13):
A hospital wants to test that 90 % of the dosages of a drug it purchases contain 100 mg
(1/10 g) of the drug. To do this, the hospital takes a sample of n = 100 dosages and
finds that only 85 of them contain the appropriate amount. How can the hospital test this
at:
a. α = 1 %
b. α = 5 %
c. α = 10 %
a. This problem involves the binomial distribution. However, since n > 30 and both nP
and n(1 – P) are greater than 5, we can use the normal distribution with P = 0.90. For the sample,
$H_0: P = 90\%$
$H_a: P \neq 90\%$
$$\hat{P} = \frac{y}{n} = \frac{85}{100} = 0.85 \quad \text{and} \quad \sigma_{\hat{P}} = \sqrt{\frac{P_0 (1 - P_0)}{n}} = \sqrt{\frac{(0.9)(0.1)}{100}} = 0.03$$
Since we are interested in finding whether P is smaller or larger than 90 %, the test is
two-tailed, and at the 1 % level of significance the acceptance region lies within
$z_{tab} = \pm 2.58$ standard deviation units. Since
$$z_{cal} = \frac{\hat{P} - P_0}{\sigma_{\hat{P}}} = \frac{0.85 - 0.9}{0.03} = -1.67$$
falls within this region, the hospital should accept $H_0$, that P = 0.90, at the 1 % level of significance.
[Figure: two-tailed test at α = 1 %. $H_0$ is rejected outside $z_{tab} = \pm 2.58$ and accepted in between; $z_{cal} = -1.67$ lies in the acceptance region.]
$$\text{Confidence interval} = \hat{P} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{P}\hat{q}}{n}} = 0.85 \pm 2.58 \times \sqrt{\frac{(0.85)(0.15)}{100}} \approx 0.85 \pm 0.092$$
b. At the 5 % level of significance, the acceptance region for $H_0$ lies within ±1.96
standard deviation units, and thus the hospital should accept $H_0$ and reject $H_a$ at
the 95 % level of confidence as well.
[Figure: two-tailed test at α = 5 %. $H_0$ is rejected outside $z_{tab} = \pm 1.96$; $z_{cal} = -1.67$ lies in the acceptance region.]
c. At the 10 % level of significance, the acceptance region lies within ±1.64
standard deviation units, and thus the hospital should reject $H_0$ and accept $H_a$,
that P ≠ 0.90:
$|z_{cal}| = 1.67 > z_{tab} = 1.64$ ⇒ reject $H_0$
[Figure: two-tailed test at α = 10 %. $H_0$ is rejected outside $z_{tab} = \pm 1.64$; $z_{cal} = -1.67$ lies in the lower rejection region.]
Note that larger values of α enlarge the rejection region for $H_0$ (i.e. increase the
probability of accepting $H_a$). Furthermore, the greater the value of α (i.e. the
greater the probability of rejecting $H_0$ when it is true), the smaller is β (the probability
of accepting a false hypothesis).
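The large-sample proportion test used in Examples (9 – 12) and (9 – 13) can be sketched as follows. The function name `z_test_proportion` and its `tail` argument are illustrative assumptions; the standard error is computed under $H_0$, as in the worked solutions, and the tabulated z comes from the standard library instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, p0, alpha, tail):
    """Large-sample z test for a proportion; returns (z_cal, z_tab, reject_h0)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)          # standard error when H_0 is true
    z_cal = (p_hat - p0) / se
    if tail == "two":
        z_tab = NormalDist().inv_cdf(1 - alpha / 2)
        return z_cal, z_tab, abs(z_cal) > z_tab
    z_tab = NormalDist().inv_cdf(1 - alpha)
    reject = z_cal > z_tab if tail == "upper" else z_cal < -z_tab
    return z_cal, z_tab, reject


# Smoking-ban example: 15 of 25 in favour, H_a: P > 0.5, alpha = 0.05
z_cal, z_tab, reject = z_test_proportion(15, 25, 0.5, 0.05, "upper")
print(round(z_cal, 2), round(z_tab, 3), reject)

# Hospital example: 85 of 100 correct, H_a: P != 0.9, three alpha-levels
for alpha in (0.01, 0.05, 0.10):
    z_cal, z_tab, reject = z_test_proportion(85, 100, 0.9, alpha, "two")
    print(alpha, round(z_cal, 2), round(z_tab, 3), reject)
```

Running the hospital case at the three α-levels shows the pattern described in the note above: the same $z_{cal}$ is not significant at 1 % or 5 % but becomes significant at 10 %.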
Example (9 – 14):
Find the probability of accepting H o for the previous example about the army
recruiting center if:
a. µ = µ o = 80 Kg.
b. µ = 82 Kg.
c. µ = 84 Kg.
d. µ = 85 Kg.
e. µ = 86 Kg.
f. µ = 87 Kg.
a. If $\mu = \mu_0 = 80$ kg, $\bar{X} = 85$, σ = 10 kg and n = 25:
$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{85 - 80}{10/\sqrt{25}} = \frac{5}{2} = 2.5$$
The probability of accepting $H_0$ when $\mu = \mu_0 = 80$ kg is 0.9938 (by looking up the
value of z = 2.5 in the z-table and adding 0.5 to it). Therefore, the probability of rejecting $H_0$
when $H_0$ is in fact true equals 1 – 0.9938, or 0.0062.
b. If $\mu = 82$ kg instead,
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{85 - 82}{10/\sqrt{25}} = \frac{3}{2} = 1.5$$
Therefore, the probability of accepting $H_0$ when $H_0$ is false equals β = 0.9332 (by looking up z =
1.5 and adding 0.5 to it).
c. If $\mu = 84$ kg, z = (85 – 84)/2 = 0.5 and β = 0.6915.
d. If $\mu = 85$ kg, z = (85 – 85)/2 = 0 and β = 0.5.
e. If $\mu = 86$ kg, z = (85 – 86)/2 = -0.5 and β = 0.5 – 0.1915 = 0.3085.
f. If $\mu = 87$ kg, z = (85 – 87)/2 = -1 and β = 0.5 – 0.3413 = 0.1587.
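The six cases above follow one pattern: the probability of accepting $H_0$ is the area of the sampling distribution to the left of the value 85 when the true mean is µ. A small sketch of that calculation follows; the function name `prob_accept_h0` is hypothetical, and the cutoff of 85 kg is taken from the worked solution, not derived here.

```python
from math import sqrt
from statistics import NormalDist


def prob_accept_h0(mu, cutoff=85, sigma=10, n=25):
    """P(X-bar < cutoff) when the true population mean is mu."""
    se = sigma / sqrt(n)                  # standard error, 2 kg here
    z = (cutoff - mu) / se
    return NormalDist().cdf(z)


for mu in (80, 82, 84, 85, 86, 87):
    print(mu, round(prob_accept_h0(mu), 4))
```

The values fall steadily as µ moves above 80 kg, which is the expected behaviour of β: the further the true mean is from the hypothesized one, the less likely the test is to accept $H_0$.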
[Figure (a): form of $H_a$: <. Rejection region of area α in the lower tail; acceptance region to its right, with areas 0.50 – α and 0.50 on either side of z = 0.]
[Figure (b): form of $H_a$: >. Rejection region of area α in the upper tail; acceptance region to its left, with areas 0.50 and 0.50 – α on either side of z = 0.]
(a), (b): one-tailed rejection regions for lower- and upper-tailed tests.
[Figure (c): form of $H_a$: ≠. Rejection regions of area α/2 in each tail; acceptance region in between, with area 0.50 – α/2 on each side of z = 0.]
(c): the two-tailed rejection region.
The rejection regions corresponding to typical values selected for α are shown in the
following table for one and two tailed tests. Note that the smaller α you select, the more
evidence (the larger z) is required before you can reject H o .
Table (9 – 3): Rejection regions for common values of α

            Alternative hypothesis
α           Lower-tailed    Upper-tailed    Two-tailed
α = 0.10    z < -1.28       z > 1.28        z < -1.645 or z > 1.645
α = 0.05    z < -1.645      z > 1.645       z < -1.96 or z > 1.96
α = 0.01    z < -2.33       z > 2.33        z < -2.575 or z > 2.575
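The critical values in the table can be regenerated from the inverse of the standard normal CDF. This is an illustrative check using Python's standard library, not the method used to build the printed table; the dictionary name `critical` is an assumption.

```python
from statistics import NormalDist

std_normal = NormalDist()
critical = {}
for alpha in (0.10, 0.05, 0.01):
    one_tailed = std_normal.inv_cdf(1 - alpha)        # z with area alpha to its right
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)    # z with area alpha/2 to its right
    critical[alpha] = (one_tailed, two_tailed)
    print(alpha, round(one_tailed, 3), round(two_tailed, 3))
```

The computed two-tailed value for α = 0.01 is about 2.576; the table's 2.575 is the conventional value read midway between two z-table entries.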
Example (9 – 15):
A manufacturer of cereal wants to test the performance of its filling machine. The
machine is designed to discharge a mean amount of $\mu = 12$ ounces per box, and the
manufacturer wants to detect any departure from this setting. The quality control
experiment calls for sampling 100 boxes to determine whether the machine is performing
to specifications. Set up a test of hypothesis for this quality control experiment using
α = 0.01, $\bar{X} = 11.85$, s = 0.5.
Solution:
Since the manufacturer wishes to detect a departure from the setting of $\mu = 12$ in either
direction ($\mu < 12$ or $\mu > 12$), we conduct a two-tailed test.
$H_0: \mu = 12$
$H_a: \mu \neq 12$ (i.e. $\mu < 12$ or $\mu > 12$)
Test statistic:
$$z = \frac{\bar{X} - 12}{\sigma_{\bar{X}}}$$
α = 0.01, so α/2 = 0.005 is placed in each tail. This area in the tails corresponds to
z = -2.575 and z = 2.575.
Rejection region: z < -2.575 or z > 2.575
If the sample result falls in the rejection region, $H_0$ is rejected and the manufacturer can be 99 %
confident that the machine needs adjustment.
[Figure: two-tailed test. Acceptance region for $H_0$ between -2.575 and 2.575; rejection regions ($H_a$) of area α/2 = 0.005 in each tail.]
Example (9 – 16):
Referring to the previous example, with n = 100, $\bar{X} = 11.85$ and s = 0.5 ounce:
$$z = \frac{\bar{X} - 12}{\sigma_{\bar{X}}} = \frac{11.85 - 12}{0.5/\sqrt{100}} = -3.0$$
The value z = -3.0 is less than -2.575, which
provides evidence to reject $H_0$ and conclude, at the α = 0.01 level of significance,
that the mean fill differs from the specification of $\mu = 12$ ounces. It appears that the
machine is, on average, underfilling the boxes.