
\chapter{Numerical methods}
\label{chap:numerical}

\section{Numerical optimization}
\label{sec:hp-numopt}

Many, if not most, cases in which an econometrician wants to use a
programming language such as hansl, rather than relying on pre-canned
routines, involve some form of numerical optimization. This could take
the form of maximization of a likelihood or similar methods of
inferential statistics. Alternatively, optimization could be used in a
more general and abstract way, for example to solve portfolio choice
or analogous resource allocation problems.

Since hansl is Turing-complete, in principle any numerical
optimization technique could be programmed in hansl itself. Some such
techniques, however, are included in hansl's set of native
functions, in the interest of both simplicity of use and
efficiency. These are geared towards the most common kind of problem
encountered in economics and econometrics, that is unconstrained
optimization of differentiable functions.

In this chapter, we will briefly review what hansl offers to solve
generic problems of the kind
\[
\hat{\mathbf{x}} \equiv \argmax_{\mathbf{x} \in \Re^k} f(\mathbf{x}; \mathbf{a}),
\]
where $f(\mathbf{x}; \mathbf{a})$ is a function of $\mathbf{x}$, whose
shape depends on a vector of parameters $\mathbf{a}$. The objective
function $f(\cdot)$ is assumed to return a scalar real value. In most
cases it will be assumed that it is also continuous and differentiable,
although this need not necessarily be the case. (Note that while
hansl's built-in functions maximize the given objective function,
minimization can be achieved simply by flipping the sign of
$f(\cdot)$.)

A special case of the above occurs when $\mathbf{x}$ is a vector of
parameters and $\mathbf{a}$ represents ``data''. In such cases, the
objective function is usually a (log-)likelihood and the problem is
one of estimation. For such cases hansl offers several special
constructs, reviewed in section~\ref{sec:est-blocks}. Here we deal
with more generic problems; nevertheless, the differences lie only in
the hansl syntax involved: the mathematical algorithms that gretl
employs to solve the optimization problem are the same.

The reader is invited to read the ``Numerical methods'' chapter of
\GUG\ for a comprehensive treatment. Here, we will only give a small
example which should give an idea of how things are done.

\begin{code}
function scalar Himmelblau(matrix x)
    /* extrema:
       f(3.0, 2.0) = 0.0,
       f(-2.805118, 3.131312) = 0.0,
       f(-3.779310, -3.283186) = 0.0,
       f(3.584428, -1.848126) = 0.0
    */
    scalar ret = (x[1]^2 + x[2] - 11)^2
    return -(ret + (x[1] + x[2]^2 - 7)^2)
end function

# ----------------------------------------------------------------------

set max_verbose 1

matrix theta1 = { 0, 0 }
y1 = BFGSmax(theta1, "Himmelblau(theta1)")
matrix theta2 = { 0, -1 }
y2 = NRmax(theta2, "Himmelblau(theta2)")

print y1 y2 theta1 theta2
\end{code}

For illustration we use here a classic ``nasty'' function from the
numerical optimization literature, namely the Himmelblau function,
$f(x, y) = (x^2+y-11)^2 + (x+y^2-7)^2$, which has four distinct
minima. The example proceeds as follows.
\begin{enumerate}
\item First we define the function to optimize: it must return a
  scalar and have among its arguments the vector to optimize. In this
  particular case that is its only argument, but there could have been
  others if necessary. Since in this case we are solving for a
  minimum, our definition returns the negative of the Himmelblau
  function proper.
\item We next set \verb|max_verbose| to 1. This is another example of
  the usage of the \cmd{set} command; its meaning is ``let me see how
  the iterations go'' and it defaults to 0. By using the \cmd{set}
  command with appropriate parameters, you control several features of
  the optimization process, such as numerical tolerances,
  visualization of the iterations, and so forth. (A small sketch of
  this mechanism is given right after this list.)
\item Define $\theta_1 = [0, 0]$ as the starting point.
\item Invoke the \cmd{BFGSmax} function; this will seek the maximum
  via the BFGS technique. Its base syntax is \texttt{BFGSmax(arg1,
    arg2)}, where \texttt{arg1} is the vector containing the
  optimization variable and \texttt{arg2} is a string containing the
  invocation of the function to maximize. BFGS will try several values
  of $\theta_1$ until the maximum is reached. On successful
  completion, the vector \texttt{theta1} will contain the final
  point. (Note: there's \emph{much} more to this. For details, be sure
  to read \GUG\ and the \GCR.)
\item Then we tackle the same problem but with a different starting
  point and a different optimization technique. We start from
  $\theta_2 = [0, -1]$ and use Newton--Raphson instead of BFGS,
  calling the \cmd{NRmax()} function instead of \cmd{BFGSmax()}. The
  syntax, however, is the same.
\item Finally we print the results.
\end{enumerate}
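As anticipated in item 2 above, here is a minimal sketch of how the
\cmd{set} command can be used to tune the optimizer; the specific
values below are arbitrary illustrations, not recommendations, and the
full list of relevant parameters is documented in \GUG\ and the \GCR.

\begin{code}
# a minimal sketch of tuning the optimizer via "set";
# the values shown are arbitrary illustrations
set max_verbose 1       # print the progress of the iterations
set bfgs_maxiter 500    # cap the number of BFGS iterations
set bfgs_toler 1.0e-10  # convergence tolerance for BFGS
\end{code}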
Table~\ref{tab:optim-output} on page \pageref{tab:optim-output}
contains a selected portion of the output. Note that the second run
converges to a different local optimum than the first one. This is a
consequence of having initialized the algorithm with a different
starting point. In this example, numerical derivatives were used, but
you can supply analytically computed derivatives to both methods if
you have a hansl function for them; see \GUG\ for more detail.

\begin{table}[ht]
\begin{footnotesize}
\begin{scode}
? matrix theta1 = { 0, 0 }
Replaced matrix theta1
? y1 = BFGSmax(theta1, "Himmelblau(theta1)")
Iteration 1: Criterion = -170.000000000
Parameters:       0.0000      0.0000
Gradients:        14.000      22.000 (norm 0.00e+00)

Iteration 2: Criterion = -128.264504038 (steplength = 0.04)
Parameters:      0.56000     0.88000
Gradients:        33.298      39.556 (norm 5.17e+00)

...

--- FINAL VALUES:
Criterion = -1.83015730011e-28 (steplength = 0.0016)
Parameters:       3.0000      2.0000
Gradients:    1.7231e-13 -3.7481e-13 (norm 7.96e-07)

Function evaluations: 39
Evaluations of gradient: 16
Replaced scalar y1 = -1.83016e-28
? matrix theta2 = { 0, -1 }
Replaced matrix theta2
? y2 = NRmax(theta2, "Himmelblau(theta2)")
Iteration 1: Criterion = -179.999876556 (steplength = 1)
Parameters:   1.0287e-05     -1.0000
Gradients:        12.000  2.8422e-06 (norm 7.95e-03)

Iteration 2: Criterion = -175.440691085 (steplength = 1)
Parameters:      0.25534     -1.0000
Gradients:        12.000  4.5475e-05 (norm 1.24e+00)

...

--- FINAL VALUES:
Criterion = -3.77420797114e-22 (steplength = 1)
Parameters:       3.5844     -1.8481
Gradients:   -2.6649e-10  2.9536e-11 (norm 2.25e-05)

Gradient within tolerance (1e-07)
Replaced scalar y2 = -1.05814e-07
? print y1 y2 theta1 theta2

             y1 = -1.8301573e-28

             y2 = -1.0581385e-07

theta1 (1 x 2)

  3   2

theta2 (1 x 2)

      3.5844      -1.8481

\end{scode}
\end{footnotesize}
\caption{Output from maximization}
\label{tab:optim-output}
\end{table}

The optimization methods hansl puts at your disposal are:
\begin{itemize}
\item BFGS, via the \cmd{BFGSmax()} function. This is in most cases
  the best compromise between performance and robustness. It assumes
  that the function to maximize is differentiable and will try to
  approximate its curvature by clever use of the change in the
  gradient between iterations. You can supply it with an
  analytically-computed gradient for speed and accuracy (a sketch is
  given after this list), but if you don't, the first derivatives
  will be computed numerically.
\item Newton--Raphson, via the \cmd{NRmax()} function. Actually, the
  name is misleading. It should be called something like
  ``curvature-based'', since it relies on the iterations
  \[
  x_{i+1} = x_i - \lambda_i C(x_i)^{-1} g(x_i)
  \]
  where $g(x)$ is the gradient and $C(x_i)$ is some measure of the
  curvature of the function to optimize; if $C(x)$ is the Hessian
  matrix, you get Newton--Raphson. Again, you can code your own
  functions for $g(\cdot)$ and $C(\cdot)$, but if you don't then
  numerical approximations to the gradient and the Hessian will be
  used, respectively. Other popular optimization methods (such as BHHH
  and the scoring algorithm) can be implemented by supplying to
  \cmd{NRmax()} the appropriate curvature matrix $C(\cdot)$. This
  method is very efficient when it works, but is rather fragile: for
  example, if $C(x_i)$ fails to be negative definite at some
  iteration, convergence may become problematic.
\item Derivative-free methods: the only method that hansl offers at
  present is simulated annealing, via the \cmd{simann()} function,
  but an implementation of the Nelder--Mead algorithm (also known as
  the ``amoeba'' method) should be just a matter of time. These
  methods work even when the function to maximize has some form of
  discontinuity or is not everywhere differentiable; however, they may
  be very slow and CPU-intensive.
\end{itemize}
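To give an idea of the first point, the following fragment sketches
one way an analytical gradient could be supplied to \cmd{BFGSmax()}
for the Himmelblau example above. The helper function
\texttt{Himgrad()} is our own invention, not a built-in; see \GCR\ for
the precise calling conventions for the optional gradient argument.

\begin{code}
# a minimal sketch of supplying an analytical gradient to BFGSmax();
# the helper Himgrad() is ours, not part of hansl
function void Himgrad(matrix *g, matrix x)
    scalar u = x[1]^2 + x[2] - 11
    scalar v = x[1] + x[2]^2 - 7
    # partial derivatives of minus the Himmelblau function
    g[1] = -(4*u*x[1] + 2*v)
    g[2] = -(2*u + 4*v*x[2])
end function

matrix g = zeros(1, 2)   # pre-allocated, will hold the gradient
matrix theta = { 0, 0 }
y = BFGSmax(theta, "Himmelblau(theta)", "Himgrad(&g, theta)")
\end{code}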
\section{Numerical differentiation}
\label{sec:hp-numdiff}

For numerical differentiation we have \texttt{fdjac}. For example:

\begin{code}
set echo off
set messages off

function scalar beta(scalar x, scalar a, scalar b)
    return x^(a-1) * (1-x)^(b-1)
end function

function scalar ad_beta(scalar x, scalar a, scalar b)
    scalar g = beta(x, a-1, b-1)
    f1 = (a-1) * (1-x)
    f2 = (b-1) * x
    return (f1 - f2) * g
end function

function scalar nd_beta(scalar x, scalar a, scalar b)
    matrix mx = {x}
    return fdjac(mx, beta(mx, a, b))
end function

a = 3.5
b = 2.5

loop for (x=0; x<=1; x+=0.1)
    printf "x = %3.1f; beta(x) = %7.5f, ", x, beta(x, a, b)
    A = ad_beta(x, a, b)
    N = nd_beta(x, a, b)
    printf "analytical der. = %8.5f, numerical der. = %8.5f\n", A, N
endloop
\end{code}

returns

\begin{code}
x = 0.0; beta(x) = 0.00000, analytical der. =  0.00000, numerical der. =  0.00000
x = 0.1; beta(x) = 0.00270, analytical der. =  0.06300, numerical der. =  0.06300
x = 0.2; beta(x) = 0.01280, analytical der. =  0.13600, numerical der. =  0.13600
x = 0.3; beta(x) = 0.02887, analytical der. =  0.17872, numerical der. =  0.17872
x = 0.4; beta(x) = 0.04703, analytical der. =  0.17636, numerical der. =  0.17636
x = 0.5; beta(x) = 0.06250, analytical der. =  0.12500, numerical der. =  0.12500
x = 0.6; beta(x) = 0.07055, analytical der. =  0.02939, numerical der. =  0.02939
x = 0.7; beta(x) = 0.06736, analytical der. = -0.09623, numerical der. = -0.09623
x = 0.8; beta(x) = 0.05120, analytical der. = -0.22400, numerical der. = -0.22400
x = 0.9; beta(x) = 0.02430, analytical der. = -0.29700, numerical der. = -0.29700
x = 1.0; beta(x) = 0.00000, analytical der. = -0.00000, numerical der. =       NA
\end{code}

Details on the algorithm used can be found in the \GCR. Suffice it to
say here that there is an \texttt{fdjac\_quality} setting that goes
from 0 to 2. The default value is 0, which gives you the
forward-difference approximation: this is the fastest algorithm, but
sometimes it may not be precise enough. A value of 1 gives you the
bilateral difference, while 2 uses Richardson extrapolation. As you go
up, you gain in accuracy, but the method becomes considerably more
CPU-intensive.
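For instance, one might trade speed for extra accuracy as follows (a
minimal sketch, reusing the \texttt{beta()} function defined in the
example above):

\begin{code}
# a minimal sketch: ask fdjac() for a more accurate approximation,
# reusing the beta() function defined above
set fdjac_quality 2    # 0 = forward, 1 = bilateral, 2 = Richardson
matrix mx = {0.3}
matrix d = fdjac(mx, beta(mx, 3.5, 2.5))
print d
\end{code}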
% \section{Random number generation}

% \begin{itemize}
% \item Mersenne Twister in its various incarnations
% \item Ziggurat vs Box--Muller
% \item Other distributions
% \end{itemize}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "hansl-primer"
%%% End: