LECTURE 7
8. Hypothesis Testing
8.1 Introduction
Exercise 8.1 In 1000 tosses of a coin, 560 heads and 440
tails appear. Is it reasonable to assume that the coin is fair?
We need a model of the experiment. Let X ∼ Bin(1000, p)
denote the number of heads in 1000 throws, where p is the
probability of obtaining a head in one throw.
Then we need a hypothesis about p corresponding to the
coin being fair.
Definition 8.1.1 A hypothesis is a statement
about a population parameter.
The goal of a hypothesis test is to decide which of two
complementary hypotheses is true.
Definition 8.1.2 The two complementary
hypotheses in a hypothesis testing problem are
called the null hypothesis and the alternative
hypothesis, denoted by H0 and H1, respectively.
In our example we put
H0: p = 1/2
H1: p > 1/2
How do we decide which hypothesis to hold true?
Definition 8.1.3 A hypothesis test is a rule that
specifies:
i. For which sample values the decision is made to
accept H0 as true.
ii. For which sample values H0 is rejected and H1 is
accepted as true.
The subset of the sample space for which H0 will be
rejected is called the rejection region or critical
region. The complement of the rejection region is
called the acceptance region.
A hypothesis test is specified in terms of a test statistic
W(X1,…,Xn) = W(X), a function of the sample.
In our example, such a test statistic could be the
probability of the observed number of heads, and more extreme
numbers, i.e.
P(X ≥ 560) = Σ_{x=560}^{1000} (1000 choose x) (1/2)^x (1/2)^(1000−x) ≈ 0.0000825.
Using the normal approximation we have X ∼ n(500, 250), giving
P(X ≥ 560) = P((X − 500)/√250 ≥ (559.5 − 500)/√250) ≈ P(Z ≥ 3.763) ≈ 0.0000839.
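Both the exact binomial tail and its normal approximation can be reproduced directly; a minimal sketch in Python (the numerical values are the ones quoted above):

```python
from math import comb, erf, sqrt

# Exact tail probability P(X >= 560) for X ~ Bin(1000, 1/2)
p_exact = sum(comb(1000, x) for x in range(560, 1001)) / 2**1000
print(p_exact)  # ≈ 0.0000825

# Normal approximation with continuity correction: X ~ n(500, 250)
z = (559.5 - 500) / sqrt(250)          # ≈ 3.763
p_norm = 0.5 * (1 - erf(z / sqrt(2)))  # upper tail 1 - Phi(z)
print(p_norm)  # ≈ 0.0000839
```

Both probabilities are tiny, which speaks strongly against the coin being fair.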
[Figure: histogram of 1000 simulated observations from Bin(1000, 0.5)]
[Figure: histogram of 100000 simulated observations from Bin(1000, 0.5)]
8.2 Methods of Finding Tests
8.2.1 Likelihood Ratio Tests
We are finding a test statistic using the ratio between the
value of the likelihood function when maximized under the
null hypothesis and maximized under no restriction with
respect to the parameter θ.
If this ratio is very small it means that the likelihood under
H0 is much smaller than the likelihood under H1 and we
would reject our null hypothesis. The critical limit is
determined by specifying the significance level of the test.
Definition 8.2.1 The likelihood ratio test statistic
for testing H0: θ ∈ Θ0 versus H1: θ ∈ Θ0ᶜ is
λ(x) = sup_{θ∈Θ0} L(θ|x) / sup_{θ∈Θ} L(θ|x).
A likelihood ratio test (LRT) is any test that has
a rejection region of the form {x: λ(x) ≤ c},
where c is any number satisfying 0 ≤ c ≤ 1.
“If the upper bound M of a set E belongs to that set then M is called
the maximum element” (answers.yahoo.com)
Example 8.2.2 (Normal LRT)
Given: X1,…,Xn random sample from a n(θ, 1) population.
Task: Test H0: θ = θ0 versus H1: θ ≠ θ0.
Solution: The numerator equals L(θ0|x).
The unrestricted MLE of θ is x̄, the sample mean.
The denominator is L(x̄|x).
The LRT statistic is
λ(x) = [(2π)^(−n/2) exp(−Σ_{i=1}^n (xi − θ0)²/2)] / [(2π)^(−n/2) exp(−Σ_{i=1}^n (xi − x̄)²/2)]
     = exp(−[Σ_{i=1}^n (xi − θ0)² − Σ_{i=1}^n (xi − x̄)²]/2)
     = exp(−n(x̄ − θ0)²/2).
We reject H0 for small values of λ(x), or, equivalently, for
large values of |x̄ − θ0|.
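The equivalence between small λ(x) and large |x̄ − θ0| can be checked numerically; a small sketch in Python, where θ0, n, the true mean, and the cutoff c are illustrative assumptions, not values from the text:

```python
import random, math

random.seed(1)
theta0, n = 0.0, 25
x = [random.gauss(0.3, 1.0) for _ in range(n)]  # sample from n(0.3, 1), so H0 is false

xbar = sum(x) / n
lam = math.exp(-n * (xbar - theta0) ** 2 / 2)   # LRT statistic lambda(x)
print(lam)

# Rejecting for lambda <= c is the same as |xbar - theta0| >= sqrt(-2*log(c)/n):
c = 0.1
assert (lam <= c) == (abs(xbar - theta0) >= math.sqrt(-2 * math.log(c) / n))
```

Solving exp(−n d²/2) ≤ c for d = |x̄ − θ0| gives exactly the cutoff used in the assertion.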
Example 8.2.3 (Exponential LRT)
Given: X1,…,Xn random sample from an exponential population with density
f(x|θ) = e^(−(x−θ)) for x ≥ θ, and 0 for x < θ, where −∞ < θ < ∞.
The likelihood function is
L(θ|x) = e^(−Σ_{i=1}^n xi + nθ) for θ ≤ x(1), and 0 for θ > x(1)    (x(1) = min xi).
Consider testing H0: θ ≤ θ0 versus H1: θ > θ0.
The denominator in the likelihood ratio is maximized
when θ is as large as possible in the interval −∞ < θ ≤ x(1).
This gives L(x(1)|x) = e^(−Σ xi + n·x(1)).
If x(1) ≤ θ0, then the numerator of λ(x) is also L(x(1)|x).
If x(1) > θ0, then the numerator of λ(x) is L(θ0|x).
Therefore we obtain
λ(x) = 1 for x(1) ≤ θ0, and λ(x) = e^(−n(x(1) − θ0)) for x(1) > θ0.
We reject H0 for small values of λ(x), which implies a
rejection region {x: x(1) ≥ θ0 − log(c)/n}.
Note that the rejection region depends on the sample only
through the sufficient statistic X(1).
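A minimal sketch of this statistic in Python (the sample values and θ0 are illustrative assumptions):

```python
import math

# lambda(x) from Example 8.2.3: 1 if x(1) <= theta0, else exp(-n*(x(1) - theta0))
def exp_lrt(x, theta0):
    x1 = min(x)   # the sufficient statistic x(1)
    n = len(x)
    return 1.0 if x1 <= theta0 else math.exp(-n * (x1 - theta0))

print(exp_lrt([0.5, 2.0, 1.1], 1.0))  # x(1) = 0.5 <= 1.0, so lambda = 1
print(exp_lrt([1.2, 1.5, 2.0], 1.0))  # x(1) = 1.2 > 1.0, so lambda = exp(-3*0.2)
```

Note that the function looks at the data only through min(x), exactly as the text observes.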
If there is a sufficient statistic T(X) for θ, then we can
determine an LRT based on T and its likelihood function
L*( θ|t) = g(t|θ) rather than L(θ|x).
Given that all information about θ is in T(X) such a test
should be as good as the test based on X. In fact it is!
Theorem 8.2.4 If T(X) is a sufficient statistic for
θ and λ*(t) and λ(x) are the LRT statistics based
on T and X, respectively, then λ*(T(x)) = λ(x)
for every x in the sample space.
Proof:
The Factorization Theorem ⇒ f(x|θ) = g(T(x)|θ)h(x).
sup f ( x |  )
sup L( | x)
λ(x) =
0
sup L( | x)

=
0
sup f ( x |  )

0
sup g (T ( x) |  )


0
sup g (T ( x) |  )h( x)

sup L * ( | T ( x))
sup g (T ( x) |  )
=
sup g (T ( x) |  )h( x)

0
sup L * ( | T ( x))

7
  * (T ( x))
Example 8.2.5 (LRT and sufficiency)
Suppose X1,…,Xn is a random sample from n(θ, 1). Then X̄ is sufficient for θ, and X̄ ∼ n(θ, 1/n). An
LRT based on this statistic would be obtained from
λ*(t) = [√(n/(2π)) exp(−n(x̄ − θ0)²/2)] / [√(n/(2π)) exp(−n(x̄ − x̄)²/2)] = exp(−n(x̄ − θ0)²/2)
and H0: θ = θ0 is rejected for small values of λ*(t), which is
equivalent to large values of |x̄ − θ0|.
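Theorem 8.2.4 can be verified numerically for this example; a sketch with an assumed illustrative sample, comparing the full-sample statistic of Example 8.2.2 with the statistic based on x̄:

```python
import math

x = [0.8, 1.3, 0.1, 1.9, 0.5]   # assumed illustrative sample
n, theta0 = len(x), 0.0
xbar = sum(x) / n

# lambda(x) from Example 8.2.2, based on the full sample:
lam_x = math.exp(-(sum((xi - theta0) ** 2 for xi in x)
                   - sum((xi - xbar) ** 2 for xi in x)) / 2)

# lambda*(t) from Example 8.2.5, based on xbar ~ n(theta, 1/n):
lam_t = math.exp(-n * (xbar - theta0) ** 2 / 2)

print(abs(lam_x - lam_t) < 1e-12)  # the two statistics agree
```

The agreement reflects the algebraic identity Σ(xi − θ0)² − Σ(xi − x̄)² = n(x̄ − θ0)².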
In statistics, a nuisance parameter is any parameter which is not of
immediate interest but which must be accounted for in the analysis
of those parameters which are of interest. The classic example of a
nuisance parameter is the variance, σ2, of a normal distribution,
when the mean, μ, is of primary interest. (Wikipedia)
Example 8.2.6 (Normal LRT with unknown variance)
Given: X1,…,Xn with Xi ∼ n(μ,σ2)
Hypotheses: H0: μ≤ μ0 versus H1: μ > μ0.
σ2 is a nuisance parameter.
L(  , 2 | x)
max
λ(x) =
{ , 2:    0 , 2  0}
L(  , | x)
2
max
=
{ , 2:     , 2  0}
if ˆ   0
1

=  L(  0 ,ˆ 02 | x)
if ˆ   0
 L( ˆ ,ˆ 2 | x)

This leads to a test equivalent to Student’s t statistic test.
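For reference, the t statistic that this LRT reduces to can be sketched as follows (the sample values and μ0 are assumptions for illustration):

```python
import math

# One-sample t statistic: t = (xbar - mu0) / (s / sqrt(n))
def t_statistic(x, mu0):
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # sample variance
    return (xbar - mu0) / math.sqrt(s2 / n)

print(t_statistic([5.1, 4.8, 5.6, 5.3, 4.9], 5.0))  # compare with a t(n-1) cutoff
```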
8.2.3 Union-Intersection and Intersection-Union Tests
Example 8.2.8 (Normal union-intersection test)
Given: X1,…,Xn from n(μ,σ2).
Test:
H0: μ=μ0 versus H1: μ≠μ0 .
Now we can write H0 as the intersection of two sets:
H0: {μ:μ≤ μ0} ⋂ {μ:μ≥ μ0}.
The LRT of H0L: μ ≤ μ0 versus H1L: μ > μ0 is:
reject H0L if (X̄ − μ0)/(S/√n) ≥ tL.
The LRT of H0U: μ ≥ μ0 versus H1U: μ < μ0 is:
reject H0U if (X̄ − μ0)/(S/√n) ≤ tU.
Combining the two tests, the union-intersection test of H0:
μ = μ0 versus H1: μ ≠ μ0 is:
reject H0 if (X̄ − μ0)/(S/√n) ≥ tL or (X̄ − μ0)/(S/√n) ≤ tU.
So, in general, if our null hypothesis can be written as the
intersection H0: θ ∈ ⋂_{γ∈Γ} Θγ, and tests with rejection regions
{x: Tγ(x) ∈ Rγ} are available for each H0γ: θ ∈ Θγ, then the
rejection region for the union-intersection test is
⋃_{γ∈Γ} {x: Tγ(x) ∈ Rγ}.
Alternatively, if our null hypothesis can be written as the
union of separate hypotheses, H0: θ ∈ ⋃_{γ∈Γ} Θγ, the rejection
region of the intersection-union test is given by
⋂_{γ∈Γ} {x: Tγ(x) ∈ Rγ}.
Example 8.2.9 (Acceptance sampling)
Considering the quality of upholstery, two properties are
of importance, measured by ϴ1 = mean breaking strength,
and ϴ2 = probability of passing a flammability test.
H0: {ϴ1 ≤ 50 or ϴ2 ≤ .95} versus H1: {ϴ1 > 50 and ϴ2 > .95}
A batch of material is acceptable only if H0 is rejected.
Obs. on breaking strength: X1,…,Xn assumed to be n(ϴ1,σ2)
The LRT of H01: ϴ1 ≤ 50 is rejected if (x̄ − 50)/(s/√n) ≥ t.
Obs. on flammability tests: Y1,…,Ym with Yi = 1 if it passes
the test, Yi = 0 otherwise. Each Yi is modeled Bernoulli(ϴ2).
The LRT of H02: ϴ2 ≤ .95 is rejected if Σ_{i=1}^m yi ≥ b.
Combining the two tests, the intersection-union test has
rejection region
{(x, y): (x̄ − 50)/(s/√n) ≥ t and Σ_{i=1}^m yi ≥ b}.
Exercise 8.2
No of yearly traffic accidents in a city, X, is assumed to be
Poisson(λ).
Average number in past years, λ = 15.
This year X = 10.
Does it indicate a drop in the accident rate?
Find the probability of {X≤10| λ = 15}!
15i 15
P{X≤10| λ = 15} = 
e ≈.11846.
i  0 i!
10
Using the normal approximation X ≈ n(15,15) gives
 X  15 10.5  15 
P{X≤10}=P 

 =Φ(-1.1619) =.12264
15 
 15
Exercise 8.6
Two independent samples:
X1,…,Xn ∼ exponential(θ) and Y1,…,Ym ∼ exponential(μ)
(a) Find the LRT of H0: θ = μ versus H1: θ ≠ μ.
The likelihood function under H0 is given by
L(θ|x,y) = ∏_{i=1}^n (1/θ)e^(−xi/θ) · ∏_{i=1}^m (1/θ)e^(−yi/θ) = (1/θ)^(n+m) e^(−(Σxi + Σyi)/θ).
Taking logarithms and differentiating gives the MLE
θ̂0 = (Σxi + Σyi)/(n + m),
which we insert into the likelihood function:
L(θ̂0|x,y) = (n + m)^(n+m) / (Σxi + Σyi)^(n+m) · e^(−(n+m)).
The likelihood function under H1 is given by
L(θ,μ|x,y) = ∏_{i=1}^n (1/θ)e^(−xi/θ) · ∏_{i=1}^m (1/μ)e^(−yi/μ) = (1/θ)^n (1/μ)^m e^(−Σxi/θ − Σyi/μ).
Taking logarithms and differentiating gives the MLEs
θ̂ = Σxi/n and μ̂ = Σyi/m,
which we insert into the likelihood function:
L(θ̂, μ̂|x,y) = n^n m^m / [(Σxi)^n (Σyi)^m] · e^(−(n+m)).
The likelihood ratio is
λ(x,y) = (n + m)^(n+m) (Σxi)^n (Σyi)^m / [n^n m^m (Σxi + Σyi)^(n+m)].
(b) λ(x,y) = (n + m)^(n+m)/(n^n m^m) · (Σxi/(Σxi + Σyi))^n (Σyi/(Σxi + Σyi))^m
           = (n + m)^(n+m)/(n^n m^m) · T^n (1 − T)^m, where T = Σxi/(Σxi + Σyi).
(c) The sum of n independent exponential(θ) variables Xi is
gamma(n, θ), and under H0 the sum of the m variables Yi is
gamma(m, θ). So, under H0, T is beta(n, m).
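The distributional claim in (c) can be checked by simulation; a sketch assuming θ = μ = 2 (a value chosen for illustration), using the fact that a beta(n, m) variable has mean n/(n + m):

```python
import random

random.seed(3)
n, m, theta = 8, 5, 2.0
reps = 5000

# Simulate T = sum(x) / (sum(x) + sum(y)) under H0 (common mean theta):
ts = []
for _ in range(reps):
    sx = sum(random.expovariate(1 / theta) for _ in range(n))
    sy = sum(random.expovariate(1 / theta) for _ in range(m))
    ts.append(sx / (sx + sy))

print(sum(ts) / reps)  # ≈ n/(n+m) = 8/13, the beta(n, m) mean
```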
Derived from other distributions (Wikipedia)
• The kth order statistic of a sample of size n from the uniform distribution is a beta random variable, U(k) ∼ B(k, n + 1 − k).
• A gamma(k, θ) random variable has E(X) = kθ and Var(X) = kθ².
• The gamma(1, 1/λ) distribution is the exponential distribution with rate parameter λ.
• The gamma(ν/2, 2) distribution is identical to χ²(ν), the chi-squared distribution with ν degrees of freedom.
• If k is an integer, the gamma distribution is an Erlang distribution and is the probability distribution of the waiting time until the k-th "arrival" in a one-dimensional Poisson process with intensity 1/θ.
(Wikipedia)