Model Estimation with Inequality Constraints: A Bayesian Approach Using SAS/IML® Software
Charlie Hallahan
Economic Research Service, USDA
Abstract

Economic models derived from economic theory are generally accompanied by various sets of equality and inequality restrictions. For example, in an economic relationship between consumption, c_t, and income, y_t, c_t = α + βy_t + ε_t, the coefficient β represents the marginal propensity to consume and should lie in the interval [0,1]. Treating the parameters as fixed and using constrained least squares, estimates can be obtained, but the complex distributional properties of the estimates complicate inference. The Bayesian approach treats the parameters as random variables whose prior distribution incorporates the inequality restrictions. The steps necessary to obtain Bayesian estimates via Monte Carlo integration will be discussed and illustrated with IML programs.

Bayesian Estimation of Regression Models

Suppose that the parameters, β, in the regression model y = Xβ + ε must satisfy certain inequality restrictions. Following the classical frequentist approach, a constrained least squares estimator could be obtained by solving a quadratic programming problem. A drawback to this estimator is that its distributional properties are not straightforward, making inference difficult.

An alternative approach is to treat β as a random variable and apply Bayesian techniques. The article by Griffiths [1988] is a good intuitive discussion of applying Bayesian methods to impose inequality restrictions on regression models. A simple example is given there of a consumption function, y_t = β x_t + ε_t, where y is consumption and x is income. The parameter β is called the long-run marginal propensity to consume (or mpc), and economic theory states that β should be between 0 and 1. More complicated models, yet similar in principle, are treated by Chalfant, Gray and White [1991], Fernandez-Cornejo [1992], and Geweke [1986, 1988, 1989]. These examples will be discussed further below.

Let π(β) be the prior probability density of β and L(β|y) the likelihood function given the sample y. Then by Bayes' Theorem,

(1)  p(β|y) = L(β|y) π(β) / f(y)

or p(β|y) ∝ L(β|y) π(β), where f(y) is the marginal density of y and p(β|y) is the posterior density of β. The uncertainty in β before the sample is drawn is summarized by π(β), and p(β|y) represents the uncertainty in β after the sample is drawn. A Bayesian point estimate of β can be obtained by defining a quadratic loss function and finding the value of β that minimizes expected loss (under the posterior distribution). The optimal estimate under a quadratic loss function is the mean of the posterior distribution.

For the simple consumption function example, assuming ε ~ N(0, I), the likelihood will have the form

(2)  L(β|y) ∝ exp( -(1/2) Σ_{t=1}^T (y_t - x_t β)² )

Writing y_t - x_t β = (y_t - x_t β⁰) + (x_t β⁰ - x_t β), expanding and ignoring terms not involving β, we get

(3)  L(β|y) ∝ exp( -(1/2) (Σ_{t=1}^T x_t²) (β - β⁰)² )

where β⁰ is the OLS estimate of β. The posterior density for the mpc is then

(4)  p(β|y) ∝ π(β) exp( -(1/2) (Σ_{t=1}^T x_t²) (β - β⁰)² )

The specific form of the posterior depends on the prior, π(β). The prior could be categorized as follows:
(i) informative: e.g., β ~ N(.85, .004), which is roughly equivalent to saying that P(.75 < β < .95) = 0.9.

(ii) non-informative: e.g., π(β) = 1 for -∞ < β < ∞. This is an improper density.

(iii) inequality restriction: e.g., π(β) = 1 for 0 ≤ β ≤ 1, the uniform density.

For the mpc, a prior of type (iii) seems appropriate. The posterior becomes a truncated normal distribution in this case. In general, priors reflecting inequality restrictions lead to truncated distributions for the posterior. The next step is to find the mean of the posterior distribution. We can also ask such questions as: What is the probability that β ≤ .95? Each of these questions leads to integrals of the form:

(5)  E[β|y] = ∫_0^1 β p(β|y) dβ

     P(0 ≤ β ≤ .95) = ∫_0^.95 p(β|y) dβ

In most higher dimensional cases, such integrals cannot be evaluated analytically or approximated numerically and must be estimated using Monte Carlo integration.

Monte Carlo Integration

The principle behind Monte Carlo integration is as follows: to evaluate the integral ∫ g(x) f(x) dx, where f(x) is a (possibly multivariate) density function, we note that ∫ g(x) f(x) dx = E_f[g(x)], the expectation of g(x) under the f-distribution. (A subscript on the expectation operator refers to the distribution with respect to which the expectation is being taken.) If we can make n draws {x_i} from the f-distribution, then

(6)  (1/n) Σ_{i=1}^n g(x_i) ≈ ∫ g(x) f(x) dx

To apply these ideas in the Bayesian estimation setting, let β be the parameter vector for a model with likelihood function L(β|y) and prior density π(β), and let g(β) be any function of interest. Then if {β_i}, i = 1 to n, is a sequence of independent draws from the posterior density, p(β|y) = L(β|y)π(β) / ∫ L(β|y)π(β) dβ, it follows by the strong law of large numbers that

(7)  ĝ_n = (1/n) Σ_{i=1}^n g(β_i) → ḡ = E_p[g(β)]

where → is almost sure convergence.

When g(β) = θ_j, one of the components of β, then we are just estimating E_p[θ_j] by the sample mean of a random sample from the posterior distribution.

Suppose we want to estimate the probability that β ∈ D, some region of the parameter space. Let g(β) be an indicator function equal to 1 if β ∈ D, and 0 otherwise. Then the integral

(8)  ∫ g(β) p(β|y) dβ

is Prob(β ∈ D). By making random draws from the posterior and counting the number of draws that satisfy β_i ∈ D, say n₁, we can use n₁/n as the estimate.

Equation (7) states that the Monte Carlo estimate will converge to the correct limit, but doesn't indicate the rate of convergence or how accurate the approximation is. To do that, a central limit theorem is needed. If var[g(β)], the variance of g(β) under the posterior distribution, exists, then a central limit theorem applies to show that

(9)  √n (ĝ_n - ḡ) → N(0, var[g(β)])

where → is convergence in distribution. Since var[g(β)] = E_p[g(β)²] - E_p[g(β)]², Monte Carlo integration can also be used to estimate the variance.

When random sampling from the posterior is not possible, the next best alternative may be to sample from the likelihood density, L(β|y) / ∫ L(β|y) dβ. Let c = ∫ L(β|y) dβ, the
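The truncated-posterior calculations of equations (5)-(8) can be sketched in a few lines of Python (used here only as an illustration; the paper's own examples are in SAS/IML). The numbers beta0 = 0.85 and sd = 0.1 are invented for this sketch, not taken from any data set; the type (iii) prior makes the posterior a normal density truncated to [0, 1], which we sample by simple rejection.

```python
import random

# Invented mpc example: posterior (4) is N(beta0, sd^2) truncated to [0, 1].
random.seed(0)
beta0, sd = 0.85, 0.1

# Rejection sampling: keep only N(beta0, sd) draws that satisfy the
# inequality restriction 0 <= beta <= 1.
draws = []
while len(draws) < 100_000:
    b = random.gauss(beta0, sd)
    if 0.0 <= b <= 1.0:
        draws.append(b)

post_mean = sum(draws) / len(draws)                # estimates E[beta|y], eq. (5)
prob = sum(b <= 0.95 for b in draws) / len(draws)  # estimates P(beta <= .95)
```

With this many draws the sample mean settles near the truncated-normal mean (slightly below 0.85, since truncation at 1 removes more upper than lower tail), and the probability estimate is the sample proportion of draws in the region, exactly as in equation (8).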
normalizing constant for the likelihood kernel. Then we can write

(10)  E_p[g(β)] = ∫ g(β) p(β|y) dβ
               = ∫ g(β) π(β) [L(β|y)/c] dβ / ∫ π(β) [L(β|y)/c] dβ
               = E_L[g(β) π(β)] / E_L[π(β)]

Note that as soon as we sample from a distribution other than the posterior, we must estimate a ratio of integrals. Given {β_i}, i = 1 to n, a sequence of independent draws from the likelihood density, it can be shown that

(11)  ĝ_n = Σ_{i=1}^n g(β_i) π(β_i) / Σ_{i=1}^n π(β_i) → ḡ  (almost surely)

and that

(12)  √n (ĝ_n - ḡ) → N(0, var(g(β)π(β)))

where var(g(β)π(β)) represents the variance of g(β)π(β) under the likelihood distribution.

Importance Sampling

When it is not possible to draw from either the likelihood or posterior distributions, draws are made from another density called the importance density. This works because of the simple identity

(13)  ∫ h(x) f(x) dx = ∫ [h(x) f(x) / I(x)] I(x) dx = ∫ w(x) I(x) dx = E_I[w(x)]

Therefore,

(14)  ∫ h(x) f(x) dx ≈ (1/n) Σ_{i=1}^n h(x_i) f(x_i) / I(x_i)

where the x_i are random draws from the I-distribution.

Let I(β) be the density from which the random draws {β_i} are made and define the weight function w(β) = L(β|y) π(β) / I(β). Then the Monte Carlo approximation to E_p[g(β)] is

(15)  ĝ_n = Σ_{i=1}^n g(β_i) w(β_i) / Σ_{i=1}^n w(β_i)

Simple Monte Carlo integration, as described above, has either I(β) ∝ L(β) with w(β) = k·π(β), or I(β) ∝ L(β)π(β) with w(β) = k.

The main theorem in Geweke (1989) provides a way to measure the numerical accuracy of the Monte Carlo approximation.

Theorem (Geweke). Suppose E_I[w(β)] < ∞ and E_p[g(β)² w(β)] < ∞. Let

σ² = E_p[(g(β) - ḡ)² w(β)]

Then

n^(1/2) (ĝ_n - ḡ) → N(0, σ²)

If σ̂_n² is the Monte Carlo estimate of σ², then σ̂_n/n^(1/2) is called the numerical standard error (nse) of ĝ_n and measures the
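Equations (13)-(15) can be sketched in Python (an illustration only, with invented numbers, not the paper's code). The target here is a posterior proportional to L(β|y)π(β), with a N(0.85, 0.1) likelihood kernel and π the indicator of [0, 1]; the importance density I is a deliberately wider N(0.85, 0.2). The nse formula used at the end is the standard sample version of Geweke's σ² for a self-normalized estimate.

```python
import random
import math

# Invented example: self-normalized importance sampling, eqs. (13)-(15).
random.seed(1)
n = 100_000
beta0, sd_like, sd_imp = 0.85, 0.1, 0.2

def like_kernel(b):            # unnormalized likelihood L(beta|y)
    return math.exp(-0.5 * ((b - beta0) / sd_like) ** 2)

def imp_kernel(b):             # unnormalized importance density I(beta)
    return math.exp(-0.5 * ((b - beta0) / sd_imp) ** 2)

draws = [random.gauss(beta0, sd_imp) for _ in range(n)]   # draws from I

# weight function w = L * pi / I; pi is the indicator of [0, 1]
w = [like_kernel(b) * (0.0 <= b <= 1.0) / imp_kernel(b) for b in draws]

# eq. (15): weighted estimate of E_p[g(beta)] with g(beta) = beta
g_hat = sum(b * wi for b, wi in zip(draws, w)) / sum(w)

# numerical standard error of the self-normalized estimate
nse = math.sqrt(sum((b - g_hat) ** 2 * wi ** 2
                    for b, wi in zip(draws, w))) / sum(w)
```

Because the normalizing constants of L and I are never computed, only the ratio in (15) is meaningful; the nse then quantifies how many significant digits of g_hat can be trusted.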
numerical accuracy of the estimate ĝ_n. The definition of the nse shows that it is sensitive to var[g(β)] under the posterior (which is not in the control of the statistician) and the size of the weights, w(β) (which is controllable by choice of an importance density).

As a standard of reference, we could use the nse that would occur from using the posterior itself as the Monte Carlo sampling distribution. When I(β) = p(β|y), then σ² = var[g(β)], and the nse from n_p Monte Carlo draws would be (var[g(β)]/n_p)^(1/2). For example, 10,000 draws would reduce the nse to 1% of the standard deviation of g(β) under the posterior.

We could ask the question: if another density I(β) is used as the sampling density, how many iterations, n_I, would it take to achieve the same nse? I.e., set

var[g(β)] / n_p = σ² / n_I

or

n_p / n_I = var[g(β)] / σ²

Geweke calls this ratio the relative numeric efficiency, RNE. The RNE can be interpreted as the ratio of the number of Monte Carlo draws from the posterior to the number of draws from an importance density needed to achieve the same nse. An RNE close to 1 indicates an importance density as efficient as the posterior. Geweke has shown that non-random sampling (as discussed in the next section) can result in RNEs close to 100.

Antithetic Sampling

So far we have talked about drawing independent samples from a distribution. When we sample from the posterior and our estimate is ĝ_n, as in equation (7), then var[ĝ_n] = var[g(β)]/n. Antithetic sampling consists of drawing in correlated pairs with the goal of decreasing var[ĝ_n].

To see how this would work, suppose that X and X° are draws from a distribution with mean μ and variance σ². The usual estimate of μ is μ̂ = ½(X + X°). If X and X° are independent, then var(μ̂) = ½σ². In general, var(μ̂) = ¼(var(X) + 2cov(X,X°) + var(X°)) = ½σ²(1 + corr(X,X°)). Thus if X and X° are negatively correlated, var(μ̂) can be reduced.

In many Bayesian estimation problems the likelihood is symmetric about an MLE estimate β̂, and a random draw is typically of the form β̂ + z_i, where z_i is a draw from, say, a multivariate normal or t distribution. The corresponding antithetic draw is then β̂ - z_i.

Some Examples

The series of papers by Geweke contains a number of interesting examples applying the above techniques. For example:

Geweke (1986): Three regression examples are presented with various kinds of inequality restrictions, ranging from straightforward sign constraints, such as β₂ > β₁ or β₁ > 0, to nonlinear restrictions on an autoregressive model. The restriction that an AR model be stable requires that the associated lag polynomial have all its roots outside the unit circle. The SAS/IML function POLYROOT is exactly what is needed to check each draw to see if the restriction holds.

Geweke (1988): The means of predictive densities in a vector autoregression are calculated.

Geweke (1989): Markov chain and ARCH models are studied.

The paper by Chalfant, Gray and White [1991] uses the above-described techniques to estimate a system of demand equations. Economic theory imposes a set of equality (symmetry, homogeneity) and inequality (monotonicity, concavity and substitutability) restrictions on the parameters in the system. The equality restrictions are easily handled by direct substitution. The discussion below will describe in general terms the kinds of quantities that arise in their model and prior, and show how SAS/IML can do the necessary calculations.

Let β be the vector of parameters in the system and suppose β ∈ R^k, where R = the real numbers. The inequality restrictions can be expressed as β ∈ D ⊆ R^k, for some subset D of R^k. The (inequality restriction) prior would be π(β) ∝ c, ∀ β ∈ D, c constant, i.e., the prior information is that β must satisfy the inequality restrictions, but is otherwise uninformative.

The system of equations is estimated using Zellner's Seemingly Unrelated Regression (SUR) technique, where the error vector ε is
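The variance reduction from antithetic pairs is easy to demonstrate in Python (an illustration with invented numbers, not the paper's code). Pairs (μ + z, μ - z) have correlation -1, so for a linear g each pair average equals μ exactly; for the nonlinear g below the gain is smaller but still substantial.

```python
import random
import statistics

# Invented example: antithetic pairs around a symmetric mode mu.
random.seed(2)
mu, sigma, n_pairs = 0.85, 0.1, 5_000

z = [random.gauss(0.0, sigma) for _ in range(n_pairs)]
z2 = [random.gauss(0.0, sigma) for _ in range(n_pairs)]

g = lambda b: b * b            # a nonlinear function of interest

# pair averages under independent vs. antithetic sampling
indep_avgs = [0.5 * (g(mu + a) + g(mu + b)) for a, b in zip(z, z2)]
anti_avgs = [0.5 * (g(mu + a) + g(mu - a)) for a in z]

var_indep = statistics.variance(indep_avgs)
var_anti = statistics.variance(anti_avgs)
```

Here each antithetic pair average is μ² + z², whose variance (2σ⁴) is far smaller than half the variance of g under independent draws, which is the RNE-close-to-100 effect mentioned above.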
assumed to be distributed as a multivariate normal, ε ~ N(0, Σ). A normal likelihood, diffuse prior on Σ, and inequality prior on β result in a (marginal) posterior density for β of the form:

p(β|y) ∝ |A|^(-T/2) π(β)

where T = the number of observations, A = (a_ij) with a_ij = e_i(β)`e_j(β), and e_i(β) is the T×1 vector of residuals for the i-th equation in the system. Note that p(β|y) is a truncated distribution since π(β) = 0 for β ∉ D. Since p(β|y) is not a "familiar" density, a random sample cannot be drawn from the posterior distribution, and a truncated multivariate t-distribution is used as an importance density.

The following two questions can now be addressed:

(1) What is the probability of β satisfying the inequality restrictions? I.e., evaluate the integral

(16)  P(β ∈ D) = ∫_{β∈D} [L(β|y)/I(β)] I(β) dβ

assuming a diffuse prior on Σ.

(2) What is the mean of the posterior distribution? I.e., evaluate the integral

(17)  E_p[β|data] = ∫ β π(β) [L(β|y)/I(β)] I(β) dβ / ∫ π(β) [L(β|y)/I(β)] I(β) dβ
                 = ∫_{β∈D} β [L(β|y)/I(β)] I(β) dβ / ∫_{β∈D} [L(β|y)/I(β)] I(β) dβ

The last equation follows since π(β) is just the indicator function of the set D.

An outline of the algorithm to answer these questions, along with the accompanying IML code, is given next.

Step 1: Estimate the unrestricted model by iterated SUR to obtain the maximum likelihood estimates β̂ and V(β̂). Since there is no need to reinvent the wheel in SAS/IML, this can be done with Proc Model in SAS/ETS software.

Step 2: Construct the multivariate t distribution to serve as the importance density. In the following IML code, the SAS data set 'ets.theta' contains the output from Proc Model. The first observation is β̂ and the remaining observations are the rows of V(β̂). Since β has 15 elements in this example, 'ets.theta' has 16 observations. The first section of code brings the output from Proc Model into IML.

* read estimates saved from Proc Model;
libname ets 'mydir';
* 1st read theta and transpose;
use ets.theta var{p1 p2 ... p15};
read point 1 into theta;
theta = theta`;
* next read covariance matrix;
read point (2:16) into cov;
* find Cholesky factor of cov;
H = root(cov);
* save matrices in storage catalog;
reset storage = 'mylib.unconstr';
store theta H;

Now that we have a matrix H such that V(β̂) = H`H, we can construct the multivariate t with λ degrees of freedom. Antithetic draws are used as suggested by Geweke (1988).

* set degrees of freedom for this problem;
df = 4;
* 1st draw multivariate normal, size of theta;
nparms = nrow(theta);
norm1 = normal(J(nparms,1,0));
* next draw multivariate normal, size of df;
norm2 = normal(J(df,1,0));
* construct chi-square using norm2;
chisqr = norm2`*norm2;
* now put pieces together;
add = H`*norm1/(sqrt(chisqr/df));
* get random draw from importance distn;
theta1 = theta + add;
* get antithetic draw;
theta2 = theta - add;

The above statements would appear in a loop where a total of N draws would be made.
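The Step 2 draw logic can be mirrored in Python (a sketch with hypothetical numbers, not the paper's code; a diagonal V(θ̂) keeps the Cholesky factor trivial so the example stays self-contained).

```python
import random
import math

# Hypothetical mirror of the Step 2 IML code: a multivariate t draw from
# the importance density centered at theta with scale V(theta), plus its
# antithetic partner.  All numbers are invented.
random.seed(3)
df = 4
theta = [0.50, -0.20, 0.85]             # invented SUR estimates
var_diag = [0.0400, 0.0100, 0.0025]     # invented diagonal of V(theta)
H = [math.sqrt(v) for v in var_diag]    # Cholesky factor of diagonal V(theta)

norm1 = [random.gauss(0.0, 1.0) for _ in theta]    # N(0, I), size of theta
norm2 = [random.gauss(0.0, 1.0) for _ in range(df)]
chisqr = sum(x * x for x in norm2)                 # chi-square from norm2

# multivariate t increment: H * z / sqrt(chisqr / df)
add = [h * z / math.sqrt(chisqr / df) for h, z in zip(H, norm1)]

theta1 = [t + a for t, a in zip(theta, add)]       # random draw
theta2 = [t - a for t, a in zip(theta, add)]       # antithetic draw
```

By construction the two draws average back to theta exactly, which is what makes the pair antithetic.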
Step 3: With each replication in the above loop, check if the random draws theta1 and theta2 satisfy the inequality restrictions. Using the "successful" draws, say there are n out of a total of N, we can estimate the posterior mean and probability of the restrictions holding by:

(18)  Define w(β) = L(β|y)/I(β). Then

      β̂ = Σ_{β_i ∈ D} β_i w(β_i) / Σ_{β_i ∈ D} w(β_i)

      P(β ∈ D) = Σ_{β_i ∈ D} w(β_i) / Σ_{i=1}^N w(β_i)

The denominators in the above expressions are normalizing factors to account for the fact that we did not carry along constants with the various density functions.

Step 4: The next task with each draw β_i is to evaluate the posterior and the importance densities at that value of β. Assuming the T×q matrix errs (T = number of observations and q = number of equations estimated) has already been defined to be the errors for the system of equations in the model after the drawn β_i is substituted into the equations, we have:

* define matrix A used for posterior;
A = errs`*errs;
* marginal posterior pdf for theta;
fpost = 1/(det(A))**(nobs/2);
* multivariate t, importance pdf;
ft = 1/(df + add`*inv(cov)*add)**((df + nparms)/2);

Step 5: The last calculation to make for each iteration is to check whether or not the draw, β_i, is "successful", i.e., does β_i satisfy the inequality restrictions of β ∈ D. In this example "concavity" means that the eigenvalues of a certain matrix are negative. Assuming the matrix of interest is called submat and has already been constructed, then:

* find eigenvalues of submat;
eval = eigval(submat);
* check if all eigenvalues negative;
concave = (eval[<>] <= .000000001);
* count successful draws;
count = count + concave;

Two things to note about the above code. First, SAS/IML being a matrix language, there is a built-in function to find eigenvalues of (symmetric) matrices. Second, the variable concave is defined by a logical condition, eval[<>] <= .000000001. The expression eval[<>] selects the maximum element of the matrix eval. If eval[<>] is non-positive (allowing for some round-off error), then the value of concave is 1, else concave = 0. Thus the variable count is only incremented when the inequality restriction holds for the drawn β_i.

Conclusion

SAS/IML provides all the necessary tools to implement Monte Carlo integration, with importance and antithetic sampling, in order to estimate posterior means and various probabilities associated with Bayesian inference. The series of papers by Geweke provides a good summary of these procedures.

References

[1] Chalfant, J., R. Gray and K. White, 'Evaluating Prior Beliefs in a Demand System: The Case of Meat Demand in Canada', May 1991, Amer. J. Agr. Econ., 476-490.

[2] Fernandez-Cornejo, Jorge, 'Short- and Long-Run Demand and Substitution of Agricultural Inputs', April 1992, NJARE, Vol 21, No 1, 36-49.

[3] Geweke, J., 'Exact Inference in the Inequality Constrained Normal Linear Regression Model', 1986, J. Applied Econometrics, Vol 1, 127-141.

[4] Geweke, J., 'Antithetic Acceleration of Monte Carlo Integration in Bayesian Inference', 1988, J. of Econometrics, Vol 38, 73-89.

[5] Geweke, J., 'Bayesian Inference in Econometric Models Using Monte Carlo Integration', 1989, Econometrica, Vol 57, 1317-1339.

[6] Geweke, J., 'Generic, Algorithmic Approaches to Monte Carlo Integration in Bayesian Inference', 1991, Contemporary Mathematics, Vol 115, 117-135.

[7] Griffiths, W., 'Bayesian Econometrics and How to Get Rid of Those Wrong Signs', April 1988, Rev. of Mkt. and Ag. Econ., Vol 56, No 1, 36-56.

[8] Kloek, T. and H.K. van Dijk, 'Bayesian Estimates of Equation Systems: An Application of Integration by Monte Carlo', 1978, Econometrica, Vol 46, 1-20.

[10] SAS/IML Software: Usage and Reference, Version 6, First Edition, 1990.

SAS, SAS/ETS, and SAS/IML are registered trademarks of SAS Institute Inc., Cary, NC, USA.