Model Estimation With Inequality Constraints: A Bayesian Approach Using SAS/IML Software

Charlie Hallahan, Economic Research Service, USDA

Abstract

Economic models derived from economic theory are generally accompanied by various sets of equality and inequality restrictions. For example, in an economic relationship between consumption, c, and income, y, cₜ = α + βyₜ + εₜ, the coefficient β represents the marginal propensity to consume and should lie in the interval [0,1]. Treating the parameters as fixed and using constrained least squares, estimates can be obtained, but the complex distributional properties of the estimates complicate inference. The Bayesian approach treats the parameters as random variables whose prior distribution incorporates the inequality restrictions. The steps necessary to obtain Bayesian estimates via Monte Carlo integration will be discussed and illustrated with IML programs.

Bayesian Estimation of Regression Models

Suppose that the parameters β in the regression model y = Xβ + ε must satisfy certain inequality restrictions. Following the classical frequentist approach, a constrained least squares estimator could be obtained by solving a quadratic programming problem. A drawback to this estimator is that its distributional properties are not straightforward, making inference difficult.

An alternative approach is to treat β as a random variable and apply Bayesian techniques. The article by Griffiths [1988] is a good intuitive discussion of applying Bayesian methods to impose inequality restrictions on regression models. A simple example is given there of a consumption function, yₜ = βxₜ + εₜ, where y is consumption and x is income. The parameter β is called the long-run marginal propensity to consume (or mpc), and economic theory states that β should be between 0 and 1. More complicated models, yet similar in principle, are treated by Chalfant, Gray and White [1991], Fernandez-Cornejo [1992], and Geweke [1986, 1988, 1989]. These examples will be discussed further below.

Let π(β) be the prior probability density of β and L(β|y) the likelihood function given the sample y. Then by Bayes' Theorem,

(1)  p(β|y) = L(β|y)π(β) / f(y),  or  p(β|y) ∝ L(β|y)π(β),

where f(y) is the marginal density of y and p(β|y) is the posterior density of β. The uncertainty in β before the sample is drawn is summarized by π(β), and p(β|y) represents the uncertainty in β after the sample is drawn.

A Bayesian point estimate of β can be obtained by defining a quadratic loss function and finding the value of β that minimizes expected loss (under the posterior distribution). The optimal estimate under a quadratic loss function is the mean of the posterior distribution.

For the simple consumption function example, assuming ε ~ N(0, I), the likelihood will have the form

(2)  L(β|y) ∝ exp( -½ Σₜ (yₜ - βxₜ)² ).

Writing yₜ - βxₜ = (yₜ - β⁰xₜ) + (β⁰ - β)xₜ, expanding, and ignoring terms not involving β, we get

(3)  L(β|y) ∝ exp( -½ (Σₜ xₜ²)(β - β⁰)² ),

where β⁰ is the OLS estimate of β. The posterior density for the mpc is then

(4)  p(β|y) ∝ π(β) exp( -½ (Σₜ xₜ²)(β - β⁰)² ).

The specific form of the posterior depends on the prior π(β). The prior could be categorized as follows:

(i) informative: e.g., β ~ N(.85, .004), which is roughly equivalent to saying that P(.75 < β < .95) = 0.9.
(ii) non-informative: e.g., π(β) = 1 for -∞ < β < ∞. This is an improper density.
(iii) inequality restriction: e.g., π(β) = 1 for 0 ≤ β ≤ 1, a uniform density.

For the mpc, a prior of type (iii) seems appropriate. The posterior becomes a truncated normal distribution in this case.
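With prior (iii), the posterior mean can be approximated by simple Monte Carlo: draw from the untruncated normal likelihood and keep only the draws that land in [0,1]. The following is a minimal Python sketch (Python rather than IML so it runs stand-alone); the values β⁰ = .9 and standard deviation .15 are hypothetical, not taken from any data in the paper.

```python
import random

# Hypothetical values for illustration: beta0 plays the role of the OLS
# estimate of the mpc and sd its standard error.
random.seed(1)
beta0, sd = 0.9, 0.15

# Under the inequality prior (iii), the posterior is the N(beta0, sd^2)
# likelihood truncated to [0, 1]; draws from the untruncated normal that
# land in [0, 1] are draws from that truncated posterior.
draws = [random.gauss(beta0, sd) for _ in range(100_000)]
kept = [b for b in draws if 0.0 <= b <= 1.0]

post_mean = sum(kept) / len(kept)     # Bayesian point estimate of beta
p_in_01 = len(kept) / len(draws)      # mass the likelihood puts on [0, 1]

print(round(post_mean, 3), round(p_in_01, 3))
```

Because the truncation at 1 cuts off more of the right tail than the truncation at 0 cuts off the left, the posterior mean lands somewhat below β⁰.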
In general, priors reflecting inequality restrictions lead to truncated distributions for the posterior. The next step is to find the mean of the posterior distribution. We can also ask such questions as: What is the probability that β ≤ .95? Each of these questions leads to integrals of the form

(5)  E[β|y] = ∫₀¹ β p(β|y) dβ   and   P(0 ≤ β ≤ .95) = ∫₀^·⁹⁵ p(β|y) dβ.

In most higher-dimensional cases, such integrals cannot be evaluated analytically or approximated numerically and must be estimated using Monte Carlo integration.

Monte Carlo Integration

The principle behind Monte Carlo integration is as follows. To evaluate the integral ∫ g(x)f(x) dx, where f(x) is a (possibly multivariate) density function, we note that ∫ g(x)f(x) dx = E_f[g(x)], the expectation of g(x) under the f-distribution. (A subscript on the expectation operator refers to the distribution with respect to which the expectation is being taken.) If we can make n draws {xᵢ} from the f-distribution, then

(6)  (1/n) Σᵢ₌₁ⁿ g(xᵢ) → ∫ g(x)f(x) dx.

To apply these ideas in the Bayesian estimation setting, let θ be the parameter vector for a model with likelihood function L(θ|y) and prior density π(θ), and let g(θ) be any function of interest. Then if {θᵢ}, i = 1 to n, is a sequence of independent draws from the posterior density p(θ|y) = L(θ|y)π(θ) / ∫ L(θ|y)π(θ) dθ, it follows by the strong law of large numbers that

(7)  ḡₙ = (1/n) Σᵢ₌₁ⁿ g(θᵢ) → ḡ ≡ E_p[g(θ)],

where → is almost-sure convergence.

When g(θ) = θⱼ, one of the components of θ, we are just estimating E_p[θⱼ] by the sample mean of a random sample from the posterior distribution. Suppose we want to estimate the probability that θ ∈ D, some region of the parameter space. Let g(θ) be an indicator function equal to 1 if θ ∈ D, and 0 otherwise. Then the integral

(8)  ∫ g(θ) p(θ|y) dθ

is Prob(θ ∈ D). By making random draws from the posterior and counting the number of draws that satisfy θ ∈ D, say n_D, we can use n_D/n as the estimate.

Equation (7) states that the Monte Carlo estimate will converge to the correct limit, but doesn't indicate the rate of convergence or how accurate the approximation is. For that, a central limit theorem is needed. If var[g(θ)], the variance of g(θ) under the posterior distribution, exists, then a central limit theorem applies to show that

(9)  √n (ḡₙ - ḡ) → N(0, var[g(θ)]),

where → is convergence in distribution. Since var[g(θ)] = E_p[g(θ)²] - E_p[g(θ)]², Monte Carlo integration can also be used to estimate the variance.

When random sampling from the posterior is not possible, the next best alternative may be to sample from the likelihood density, L(θ|y)/∫ L(θ|y) dθ. Let c = ∫ L(θ|y) dθ, the normalizing constant for the likelihood kernel. Then we can write

(10)  E_p[g(θ)] = ∫ g(θ) p(θ|y) dθ
               = ∫ g(θ) π(θ) L(θ|y) dθ / ∫ π(θ) L(θ|y) dθ
               = E_L[g(θ)π(θ)] / E_L[π(θ)].

Note that as soon as we sample from a distribution other than the posterior, we must estimate a ratio of integrals. Given {θᵢ}, i = 1 to n, a sequence of independent draws from the likelihood density, it can be shown that

(11)  ḡₙ = Σᵢ g(θᵢ)π(θᵢ) / Σᵢ π(θᵢ) → E_p[g(θ)]

and that

(12)  √n (ḡₙ - ḡ) → N(0, σ²),

where σ² involves var_L[g(θ)π(θ)], the variance of g(θ)π(θ) under the likelihood distribution.
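The basic principle in equations (6)-(8) can be sketched in a few lines of Python. The choices below (standard normal for f, g(x) = x², and a tail region for D) are illustrative only; none of them come from the paper's models.

```python
import random

random.seed(0)
n = 200_000

# Draws from f, here the standard normal density.
xs = [random.gauss(0.0, 1.0) for _ in range(n)]

# Equations (6)/(7): estimate E_f[g(x)] by the sample mean of g over the
# draws; for g(x) = x^2 the true value is E[x^2] = 1.
g_bar = sum(x * x for x in xs) / n

# Equation (8): estimate P(x in D) with an indicator function, here
# D = {x : x > 1.96}, whose true probability is about 0.025.
p_D = sum(1 for x in xs if x > 1.96) / n

print(round(g_bar, 2), round(p_D, 3))
```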
Importance Sampling

When it is not possible to draw from either the likelihood or the posterior distribution, draws are made from another density, called the importance density. This works because of the simple identity

(13)  ∫ h(x)f(x) dx = ∫ h(x) [f(x)/I(x)] I(x) dx = ∫ w(x) I(x) dx = E_I[w(x)],

where w(x) = h(x)f(x)/I(x), so that

(14)  ∫ h(x)f(x) dx ≈ (1/n) Σᵢ₌₁ⁿ h(xᵢ)f(xᵢ)/I(xᵢ),

where the xᵢ are random draws from the I-distribution.

Let I(θ) be the density from which the random draws {θᵢ} are made and define the weight function w(θ) = L(θ|y)π(θ)/I(θ). Then the Monte Carlo approximation to E_p[g(θ)] is

(15)  ḡₙ = Σᵢ g(θᵢ)w(θᵢ) / Σᵢ w(θᵢ).

Simple Monte Carlo integration, as described above, has either I(θ) ∝ L(θ|y) with w(θ) = k·π(θ), or I(θ) ∝ L(θ|y)π(θ) with w(θ) = k.

The main theorem in Geweke (1989) provides a way to measure the numerical accuracy of the Monte Carlo approximation.

Theorem (Geweke). Suppose E_I[w(θ)] < ∞ and E_p[g(θ)²w(θ)] < ∞. Let σ² = E_p[(g(θ) - ḡ)² w(θ)]. Then n^½(ḡₙ - ḡ) → N(0, σ²).

A consistent estimate σ̂ₙ² of σ² can be computed from the draws, and (σ̂ₙ²/n)^½ is called the numerical standard error (nse) of ḡₙ; it measures the numerical accuracy of the estimate ḡₙ. The definition of the nse shows that it is sensitive to var[g(θ)] under the posterior (which is not in the control of the statistician) and to the size of the weights w(θ) (which is controllable by choice of an importance density).

As a standard of reference, we could use the nse that would result from using the posterior itself as the Monte Carlo sampling distribution. When I(θ) = p(θ|y), then σ² = var[g(θ)], and the squared nse from n_p Monte Carlo draws would be var[g(θ)]/n_p. For example, 10,000 draws would reduce the nse to 1% of the standard deviation of g(θ) under the posterior. We could ask the question: if another density I(θ) is used as the sampling density, how many iterations n_I would it take to achieve the same nse? I.e., set

var[g(θ)]/n_p = σ²/n_I,  so that  n_p/n_I = var[g(θ)]/σ².

Geweke calls this ratio the relative numeric efficiency, RNE. The RNE can be interpreted as the ratio of the number of Monte Carlo draws from the posterior to the number of draws from an importance density needed to achieve the same nse. An RNE close to 1 indicates an importance density as efficient as the posterior. Geweke has shown that non-random sampling (as discussed in the next section) can result in RNE's close to 100.

Antithetic Sampling

So far we have talked about drawing independent samples from a distribution. When we sample from the posterior and our estimate is ḡₙ, as in equation (7), then var[ḡₙ] = var[g(θ)]/n. Antithetic sampling consists of drawing in correlated pairs with the goal of decreasing var[ḡₙ]. To see how this would work, suppose that X and X* are draws from a distribution with mean μ and variance σ². The usual estimate of μ is μ̂ = ½(X + X*). If X and X* are independent, then var(μ̂) = ½σ². In general,

var(μ̂) = ¼( var(X) + 2cov(X,X*) + var(X*) ) = ½σ²( 1 + corr(X,X*) ).

Thus if X and X* are negatively correlated, var(μ̂) can be reduced.

In many Bayesian estimation problems the likelihood is symmetric about an MLE estimate θ̂, and a random draw is typically of the form θ̂ + z, where z is a draw from, say, a multivariate normal or t distribution. The corresponding antithetic draw is then θ̂ - z.

Some Examples

The series of papers by Geweke contains a number of interesting examples applying the above techniques. For example:

Geweke (1986): Three regression examples are presented with various kinds of inequality restrictions, ranging from straightforward sign constraints, such as β₂ > β₁ or β₁ > 0, to nonlinear restrictions on an autoregressive model. The restriction that an AR model be stable requires that all roots of the associated lag polynomial lie outside the unit circle. The SAS/IML function polyroot is exactly what is needed to check each draw to see if the restriction holds.

Geweke (1988): The means of predictive densities in a vector autoregression are calculated.

Geweke (1989): Markov chain and ARCH models are studied.

The paper by Chalfant, Gray and White [1991] uses the above-described techniques to estimate a system of demand equations. Economic theory imposes a set of equality (symmetry, homogeneity) and inequality (monotonicity, concavity and substitutability) restrictions on the parameters in the system. The equality restrictions are easily handled by direct substitution. The discussion below will describe in general terms the kinds of quantities that arise in their model and prior, and show how SAS/IML can do the necessary calculations.

Let θ be the vector of parameters in the system and suppose θ ∈ Rᵏ, where R = the real numbers. The inequality restrictions can be expressed as θ ∈ D ⊆ Rᵏ, for some subset D of Rᵏ. The (inequality restriction) prior would be π(θ) ∝ c, a constant, for all θ ∈ D.
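Before turning to the demand-system code, the generic machinery of equation (15) (self-normalized importance weights, antithetic pairs, and an nse in the spirit of Geweke's theorem) can be sketched in Python. The setup is illustrative only: a normal likelihood kernel with hypothetical β⁰ = .9 and standard deviation .15, the [0,1] indicator prior, and a wider normal as the importance density. Normalizing constants are dropped throughout, since the ratio in (15) cancels them.

```python
import math, random

random.seed(2)
n = 100_000
beta0, sd = 0.9, 0.15           # hypothetical estimate and standard error

def lik(b):                     # normal likelihood kernel, constants dropped
    return math.exp(-0.5 * ((b - beta0) / sd) ** 2)

def prior(b):                   # inequality-restriction prior: indicator of [0, 1]
    return 1.0 if 0.0 <= b <= 1.0 else 0.0

i_sd = 0.30                     # importance density: wider normal, same center
def i_dens(b):
    return math.exp(-0.5 * ((b - beta0) / i_sd) ** 2) / i_sd

# Antithetic pairs: beta0 + z and beta0 - z share one underlying draw z.
draws = []
for _ in range(n // 2):
    z = random.gauss(0.0, i_sd)
    draws.extend([beta0 + z, beta0 - z])

# Equation (15): self-normalized weighted mean with w = L * prior / I,
# here with g(theta) = theta, i.e. the posterior mean of the mpc.
w = [lik(b) * prior(b) / i_dens(b) for b in draws]
g_bar = sum(wi * bi for wi, bi in zip(w, draws)) / sum(w)

# A numerical standard error for the ratio estimate, in the spirit of
# Geweke's theorem: weighted squared deviations about g_bar.
sw = sum(w)
nse = math.sqrt(sum(wi ** 2 * (bi - g_bar) ** 2 for wi, bi in zip(w, draws))) / sw

print(round(g_bar, 3))
```

With a seed fixed as above, g_bar lands close to the truncated-normal mean (about .84 for these hypothetical values) and the nse is a small fraction of the posterior standard deviation.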
That is, the prior information is that θ must satisfy the inequality restrictions, but is otherwise uninformative.

The system of equations is estimated using Zellner's Seemingly Unrelated Regression (SUR) technique, where the error vector ε is assumed to be distributed as a multivariate normal, ε ~ N(0, Σ). A normal likelihood, a diffuse prior on Σ, and the inequality prior on θ result in a (marginal) posterior density for θ of the form

p(θ|y) ∝ |A|^(-T/2),  θ ∈ D,

where T = the number of observations, A = (a_ij) with a_ij = ê_i(θ)'ê_j(θ), and ê_i(θ) is the T×1 vector of residuals for the i-th equation in the system. Note that p(θ|y) is a truncated distribution, since π(θ) = 0 for θ ∉ D. Since p(θ|y) is not a "familiar" density, a random sample cannot be drawn from the posterior distribution, and a truncated multivariate t-distribution is used as an importance density.

The following two questions can now be addressed:

(1) What is the probability of θ satisfying the inequality restrictions? I.e., evaluate the integral

(16)  P(θ ∈ D) = ∫_{θ∈D} [L(θ|y)/I(θ)] I(θ) dθ / ∫ [L(θ|y)/I(θ)] I(θ) dθ,

assuming a diffuse prior on θ.

(2) What is the mean of the posterior distribution? I.e., evaluate the integral

(17)  E_p[θ|data] = ∫ θ π(θ) [L(θ|y)/I(θ)] I(θ) dθ / ∫ π(θ) [L(θ|y)/I(θ)] I(θ) dθ
                 = ∫_{θ∈D} θ [L(θ|y)/I(θ)] I(θ) dθ / ∫_{θ∈D} [L(θ|y)/I(θ)] I(θ) dθ.

The last equation follows since π(θ) is just the indicator function of the set D.

An outline of the algorithm to answer these questions, along with the accompanying IML code, is given next.

Step 1: Estimate the unrestricted model by iterated SUR to obtain the maximum likelihood estimates θ̂ and V(θ̂). Since there is no need to reinvent the wheel in SAS/IML, this can be done with Proc Model in SAS/ETS software.

Step 2: Construct the multivariate t distribution to serve as the importance density. In the following IML code, the SAS data set ets.theta contains the output from Proc Model. The first observation is θ̂ and the remaining observations are the rows of V(θ̂). Since θ has 15 elements in this example, ets.theta has 16 observations. The first section of code brings the output from Proc Model into IML.

* read estimates saved from Proc Model;
libname ets 'mydir';
* 1st read theta and transpose;
use ets.theta var{p1 p2 ... p15};
read point 1 into theta;
theta = theta`;
* next read covariance matrix;
read point (2:16) into cov;
* find Cholesky factor of cov;
H = root(cov);
* save matrices in storage catalog;
reset storage = 'mylib.unconstr';
store theta H;

Now that we have a matrix H such that V(θ̂) = H'H, we can construct the multivariate t with df degrees of freedom. Antithetic draws are used as suggested by Geweke (1988).

* set degrees of freedom for this problem;
df = 4;
* 1st draw multivariate normal, size of theta;
nparms = nrow(theta);
norm1 = normal(J(nparms,1,0));
* next draw multivariate normal, size of df;
norm2 = normal(J(df,1,0));
* construct chi-square using norm2;
chisqr = norm2`*norm2;
* now put the pieces together;
add = H`*norm1/sqrt(chisqr/df);
* get random draw from importance distn;
theta1 = theta + add;
* get antithetic draw;
theta2 = theta - add;

The above statements would appear in a loop in which a total of N draws would be made.

Step 3: With each replication in the above loop, check whether the random draws θ₁ and θ₂ satisfy the inequality restrictions. Say there are n "successful" draws out of a total of N; the successful draws are used to estimate the posterior mean and the probability of the restrictions holding.
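The Step 2 construction (scale a normal draw by √(χ²_df/df) through the Cholesky factor, then form θ̂ ± add) can be mirrored outside IML. Below is a Python sketch; the values of theta and cov are made up stand-ins for the quantities that Proc Model would supply.

```python
import math, random

random.seed(3)

# Hypothetical stand-ins for the Proc Model output: theta is the
# unrestricted estimate and cov its covariance matrix.
theta = [0.5, -0.2, 1.1]
cov = [[0.04, 0.01, 0.00],
       [0.01, 0.09, 0.02],
       [0.00, 0.02, 0.16]]
df = 4

def chol(a):
    """Lower-triangular L with L*L' = a (transpose of IML's root() result)."""
    k = len(a)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

L = chol(cov)

def t_pair():
    """One multivariate-t draw theta + add and its antithetic mate theta - add."""
    z = [random.gauss(0.0, 1.0) for _ in range(len(theta))]
    chisqr = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))
    scale = math.sqrt(chisqr / df)
    add = [sum(L[i][j] * z[j] for j in range(len(z))) / scale
           for i in range(len(z))]
    t1 = [t + a for t, a in zip(theta, add)]
    t2 = [t - a for t, a in zip(theta, add)]
    return t1, t2

t1, t2 = t_pair()
# Each antithetic pair averages back to theta, the point of the construction.
midpoint = [(a + b) / 2 for a, b in zip(t1, t2)]
print(midpoint)
```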
Define w(θ) = L(θ|y)/I(θ). The estimates are then

(18)  θ̄ = Σ_{θᵢ∈D} θᵢ w(θᵢ) / Σ_{θᵢ∈D} w(θᵢ)   and   P(θ ∈ D) ≈ Σ_{θᵢ∈D} w(θᵢ) / Σᵢ₌₁ᴺ w(θᵢ).

The denominators in the above expressions are normalizing factors that account for the fact that we did not carry along constants with the various density functions.

Step 4: The next task with each draw θ is to evaluate the posterior and importance densities at that value of θ. Assuming the T×q matrix errs (T = number of observations and q = number of equations estimated) has already been defined to be the errors for the system of equations after the drawn θ is substituted into the equations, we have:

* define matrix A used for posterior;
A = errs`*errs;
* marginal posterior pdf for theta;
fpost = 1/(det(A))**(nobs/2);
* multivariate t, importance pdf;
ft = 1/((df + add`*inv(cov)*add)**((df + nparms)/2));

Step 5: The last calculation to make for each iteration is to check whether or not the draw θ is "successful", i.e., whether θ satisfies the inequality restrictions θ ∈ D. In this example "concavity" means that the eigenvalues of a certain matrix are negative. Assuming the matrix of interest is called submat and has already been constructed, then:

* find eigenvalues of submat;
eval = eigval(submat);
* check if all eigenvalues are negative;
concave = (eval[<>] <= .000000001);
* count successful draws;
count = count + concave;

Two things to note about the above code. First, SAS/IML being a matrix language, there is a built-in function to find the eigenvalues of (symmetric) matrices. Second, the variable concave is defined by a logical condition, eval[<>] <= .000000001. The expression eval[<>] selects the maximum element of the matrix eval. If eval[<>] is non-positive (allowing for some round-off error), then the value of concave is 1; else concave = 0. Thus the variable count is only incremented when the inequality restriction holds for the drawn θ.

Conclusion

SAS/IML provides all the necessary tools to implement Monte Carlo integration, with importance and antithetic sampling, in order to estimate posterior means and various probabilities associated with Bayesian inference. The series of papers by Geweke provides a good summary of these procedures.

References

[1] Chalfant, J., R. Gray and K. White, 'Evaluating Prior Beliefs in a Demand System: The Case of Meat Demand in Canada', Amer. J. Agr. Econ., May 1991, 476-490.
[2] Fernandez-Cornejo, Jorge, 'Short- and Long-Run Demand and Substitution of Agricultural Inputs', NJARE, Vol. 21, No. 1, April 1992, 36-49.
[3] Geweke, J., 'Exact Inference in the Inequality Constrained Normal Linear Regression Model', J. Applied Econometrics, Vol. 1, 1986, 127-141.
[4] Geweke, J., 'Antithetic Acceleration of Monte Carlo Integration in Bayesian Inference', J. of Econometrics, Vol. 38, 1988, 73-89.
[5] Geweke, J., 'Bayesian Inference in Econometric Models Using Monte Carlo Integration', Econometrica, Vol. 57, 1989, 1317-1339.
[6] Geweke, J., 'Generic, Algorithmic Approaches to Monte Carlo Integration in Bayesian Inference', Contemporary Mathematics, Vol. 115, 1991, 117-135.
[7] Griffiths, W., 'Bayesian Econometrics and How to Get Rid of Those Wrong Signs', Rev. of Mkt. and Ag. Econ., Vol. 56, No. 1, April 1988, 36-56.
[8] Kloek, T. and H. K. van Dijk, 'Bayesian Estimates of Equation Systems: An Application of Integration by Monte Carlo', Econometrica, Vol. 46, 1978, 1-20.
[10] SAS/IML Software: Usage and Reference, Version 6, First Edition, 1990.

SAS, SAS/ETS, and SAS/IML are registered trademarks of SAS Institute Inc., Cary, NC, USA.