"Fossies" - the Fresh Open Source Software Archive

Member "gretl-2020e/doc/tex/hp-numerical.tex" (17 May 2020, 11380 Bytes) of package /linux/misc/gretl-2020e.tar.xz:


As a special service "Fossies" has tried to format the requested source page into HTML format using (guessed) TeX and LaTeX source code syntax highlighting (style: standard) with prefixed line numbers. Alternatively you can here view or download the uninterpreted source code file.

\chapter{Numerical methods}
\label{chap:numerical}

\section{Numerical optimization}
\label{sec:hp-numopt}

Many, if not most, cases in which an econometrician wants to use a
programming language such as hansl, rather than relying on pre-canned
routines, involve some form of numerical optimization. This could take
the form of maximizing a likelihood or carrying out some similar
exercise in inferential statistics. Alternatively, optimization could
be used in a more general and abstract way, for example to solve
portfolio choice or analogous resource allocation problems.

Since hansl is Turing-complete, in principle any numerical
optimization technique could be programmed in hansl itself. Some such
techniques, however, are included in hansl's set of native
functions, in the interest of both simplicity of use and
efficiency. These are geared towards the most common kind of problem
encountered in economics and econometrics, that is, unconstrained
optimization of differentiable functions.

In this chapter, we will briefly review what hansl offers for solving
generic problems of the kind
\[
\hat{\mathbf{x}} \equiv \argmax_{\mathbf{x} \in \Re^k} f(\mathbf{x}; \mathbf{a}),
\]
where $f(\mathbf{x}; \mathbf{a})$ is a function of $\mathbf{x}$, whose
shape depends on a vector of parameters $\mathbf{a}$. The objective
function $f(\cdot)$ is assumed to return a scalar real value. In most
cases it will be assumed that it is also continuous and
differentiable, although this is not strictly required. (Note that
while hansl's built-in functions maximize the given objective
function, minimization can be achieved simply by flipping the sign of
$f(\cdot)$.)

A special case of the above occurs when $\mathbf{x}$ is a vector of
parameters and $\mathbf{a}$ represents ``data''. In these cases, the
objective function is usually a (log-)likelihood and the problem is
one of estimation. For such cases hansl offers several special
constructs, reviewed in section~\ref{sec:est-blocks}. Here we deal
with more generic problems; nevertheless, the differences are only in
the hansl syntax involved: the mathematical algorithms that gretl
employs to solve the optimization problem are the same.
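
Just to give a foretaste of those constructs: a simple likelihood
problem can be stated very compactly with an \cmd{mle} block, as in
the sketch below, which estimates the mean and variance of an
artificial normal sample by maximum likelihood (the data and the
parameter names \texttt{mu} and \texttt{s2} are made up for the
occasion; the proper treatment is in section~\ref{sec:est-blocks}).

\begin{code}
nulldata 200
series y = 1 + normal()    # artificial sample, y ~ N(1,1)

# starting values for the parameters
scalar mu = 0
scalar s2 = 1

mle logl = -0.5*log(2*$pi) - 0.5*log(s2) - 0.5*(y - mu)^2/s2
    params mu s2
end mle
\end{code}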

The reader is invited to read the ``Numerical methods'' chapter of
\GUG\ for a comprehensive treatment. Here we will just give a small
example to convey the flavor of how things are done.

\begin{code}
function scalar Himmelblau(matrix x)
    /* extrema:
    f(3.0, 2.0) = 0.0,
    f(-2.805118, 3.131312) = 0.0,
    f(-3.779310, -3.283186) = 0.0,
    f(3.584428, -1.848126) = 0.0
    */
    scalar ret = (x[1]^2 + x[2] - 11)^2
    return -(ret + (x[1] + x[2]^2 - 7)^2)
end function

# ----------------------------------------------------------------------

set max_verbose 1

matrix theta1 = { 0, 0 }
y1 = BFGSmax(theta1, "Himmelblau(theta1)")
matrix theta2 = { 0, -1 }
y2 = NRmax(theta2, "Himmelblau(theta2)")

print y1 y2 theta1 theta2
\end{code}

For illustration we use here a classic ``nasty'' function from the
numerical optimization literature, namely the Himmelblau function
$f(x, y) = (x^2+y-11)^2 + (x+y^2-7)^2$, which has four distinct
minima. The example proceeds as follows.
\begin{enumerate}
\item First we define the function to optimize: it must return a
  scalar and have among its arguments the vector to optimize. In this
  particular case that is its only argument, but there could have been
  others if necessary (a variant with an extra argument is sketched
  just after this list). Since in this case we are solving for a
  minimum, our definition returns the negative of the Himmelblau
  function proper.
\item We next set \verb|max_verbose| to 1. This is another example of
  the usage of the \cmd{set} command; its meaning is ``let me see how
  the iterations go'' and it defaults to 0. By using the \cmd{set}
  command with appropriate parameters, you control several features of
  the optimization process, such as numerical tolerances,
  visualization of the iterations, and so forth.
\item We define $\theta_1 = [0, 0]$ as the starting point.
\item We invoke the \cmd{BFGSmax} function; this will seek the maximum
  via the BFGS technique. Its base syntax is \texttt{BFGSmax(arg1,
    arg2)}, where \texttt{arg1} is the vector containing the
  optimization variable and \texttt{arg2} is a string containing the
  invocation of the function to maximize. BFGS will try several values
  of $\theta_1$ until the maximum is reached. On successful
  completion, the vector \texttt{theta1} will contain the final
  point. (Note: there's \emph{much} more to this. For details, be sure
  to read \GUG\ and the \GCR.)
\item Then we tackle the same problem but with a different starting
  point and a different optimization technique. We start from
  $\theta_2 = [0, -1]$ and use Newton--Raphson instead of BFGS,
  calling the \cmd{NRmax()} function instead of \cmd{BFGSmax()}. The
  syntax, however, is the same.
\item Finally we print the results.
\end{enumerate}
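
As an aside to points 1 and 2 above: extra arguments to the objective
function are simply included in the function call string, and the
\cmd{set} command can be used to tune the optimizer before invoking
it. The following variant is only a sketch: the shifted function and
the name \texttt{ShiftedHimmelblau} are inventions of ours, while
\texttt{bfgs\_maxiter} and \texttt{bfgs\_toler} are two of the
\cmd{set} parameters documented in the \GCR.

\begin{code}
function scalar ShiftedHimmelblau(matrix x, matrix c)
    # same objective as before, but translated by the vector c
    scalar ret = ((x[1]-c[1])^2 + x[2]-c[2] - 11)^2
    return -(ret + (x[1]-c[1] + (x[2]-c[2])^2 - 7)^2)
end function

set bfgs_maxiter 500     # cap on the number of BFGS iterations
set bfgs_toler 1.0e-10   # convergence tolerance

matrix c = { 1, 1 }
matrix theta = { 0, 0 }
y = BFGSmax(theta, "ShiftedHimmelblau(theta, c)")
print y theta
\end{code}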

Table~\ref{tab:optim-output} on page \pageref{tab:optim-output}
contains a selected portion of the output. Note that the second run
converges to a different local optimum than the first one. This is a
consequence of having initialized the algorithm with a different
starting point. In this example, numerical derivatives were used, but
you can supply analytically computed derivatives to both methods if
you have a hansl function for them; see \GUG\ for more detail.

\begin{table}[ht]
  \begin{footnotesize}
\begin{scode}
? matrix theta1 = { 0, 0 }
Replaced matrix theta1
? y1 = BFGSmax(theta1, "Himmelblau(theta1)")
Iteration 1: Criterion = -170.000000000
Parameters:       0.0000      0.0000
Gradients:        14.000      22.000 (norm 0.00e+00)

Iteration 2: Criterion = -128.264504038 (steplength = 0.04)
Parameters:      0.56000     0.88000
Gradients:        33.298      39.556 (norm 5.17e+00)

...

--- FINAL VALUES: 
Criterion = -1.83015730011e-28 (steplength = 0.0016)
Parameters:       3.0000      2.0000
Gradients:    1.7231e-13 -3.7481e-13 (norm 7.96e-07)

Function evaluations: 39
Evaluations of gradient: 16
Replaced scalar y1 = -1.83016e-28
? matrix theta2 = { 0, -1 }
Replaced matrix theta2
? y2 = NRmax(theta2, "Himmelblau(theta2)")
Iteration 1: Criterion = -179.999876556 (steplength = 1)
Parameters:   1.0287e-05     -1.0000
Gradients:        12.000  2.8422e-06 (norm 7.95e-03)

Iteration 2: Criterion = -175.440691085 (steplength = 1)
Parameters:      0.25534     -1.0000
Gradients:        12.000  4.5475e-05 (norm 1.24e+00)

...

--- FINAL VALUES: 
Criterion = -3.77420797114e-22 (steplength = 1)
Parameters:       3.5844     -1.8481
Gradients:   -2.6649e-10  2.9536e-11 (norm 2.25e-05)

Gradient within tolerance (1e-07)
Replaced scalar y2 = -1.05814e-07
? print y1 y2 theta1 theta2

             y1 = -1.8301573e-28

             y2 = -1.0581385e-07

theta1 (1 x 2)

  3   2 

theta2 (1 x 2)

      3.5844      -1.8481 
\end{scode}

  \end{footnotesize}
  \caption{Output from maximization}
  \label{tab:optim-output}
\end{table}

The optimization methods hansl puts at your disposal are:
\begin{itemize}
\item BFGS, via the \cmd{BFGSmax()} function. This is in most cases
  the best compromise between performance and robustness. It assumes
  that the function to maximize is differentiable and will try to
  approximate its curvature by clever use of the change in the
  gradient between iterations. You can supply it with an
  analytically-computed gradient for speed and accuracy, but if you
  don't, the first derivatives will be computed numerically.
\item Newton--Raphson, via the \cmd{NRmax()} function. Actually, the
  name is misleading. It should be called something like
  ``curvature-based'', since it relies on iterations of the form
  \[
    x_{i+1} = x_i - \lambda_i C(x_i)^{-1} g(x_i)
  \]
  where $g(x)$ is the gradient and $C(x_i)$ is some measure of
  curvature of the function to optimize; if $C(x)$ is the Hessian
  matrix, you get Newton--Raphson. Again, you can code your own
  functions for $g(\cdot)$ and $C(\cdot)$, but if you don't then
  numerical approximations to the gradient and the Hessian will be
  used, respectively. Other popular optimization methods (such as BHHH
  and the scoring algorithm) can be implemented by supplying to
  \cmd{NRmax()} the appropriate curvature matrix $C(\cdot)$. This
  method is very efficient when it works, but is rather fragile: for
  example, if $C(x_i)$ happens not to be negative definite at some
  iteration, convergence may become problematic.
\item Derivative-free methods: the only method that hansl offers
  presently is simulated annealing, via the \cmd{simann()} function
  (a minimal invocation is sketched below), but an implementation of
  the Nelder--Mead algorithm (also known as the ``amoeba'' method)
  should be just a matter of time. These methods work even when the
  function to maximize has some form of discontinuity or is not
  everywhere differentiable; however, they may be very slow and
  CPU-intensive.
\end{itemize}
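
To give an idea of the last option, \cmd{simann()} can be invoked in
much the same string-based way as the other two maximizers. The
fragment below is only a sketch: it assumes the \texttt{Himmelblau}
function defined earlier in this chapter, and the iteration cap of
20000 is an arbitrary choice.

\begin{code}
matrix theta3 = { 0, 0 }
# simulated annealing; the third argument caps the number of iterations
y3 = simann(theta3, "Himmelblau(theta3)", 20000)
# the rough solution found by simann can then be polished, e.g. via BFGS
y3 = BFGSmax(theta3, "Himmelblau(theta3)")
print y3 theta3
\end{code}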

\section{Numerical differentiation}
\label{sec:hp-numdiff}

For numerical differentiation hansl provides the \texttt{fdjac}
function. For example:

\begin{code}
set echo off
set messages off

function scalar beta(scalar x, scalar a, scalar b)
    return x^(a-1) * (1-x)^(b-1)
end function

function scalar ad_beta(scalar x, scalar a, scalar b)
    # analytical derivative, via the product rule:
    # d/dx [x^(a-1) (1-x)^(b-1)] = [(a-1)(1-x) - (b-1)x] * x^(a-2) (1-x)^(b-2)
    scalar g = beta(x, a-1, b-1)
    f1 = (a-1) * (1-x)
    f2 = (b-1) * x
    return (f1 - f2) * g
end function

function scalar nd_beta(scalar x, scalar a, scalar b)
    # numerical derivative via fdjac
    matrix mx = {x}
    return fdjac(mx, beta(mx, a, b))
end function

a = 3.5
b = 2.5

loop for (x=0; x<=1; x+=0.1)
    printf "x = %3.1f; beta(x) = %7.5f, ", x, beta(x, a, b)
    A = ad_beta(x, a, b)
    N = nd_beta(x, a, b)
    printf "analytical der. = %8.5f, numerical der. = %8.5f\n", A, N
endloop
\end{code}

returns

\begin{code}
x = 0.0; beta(x) = 0.00000, analytical der. =  0.00000, numerical der. =  0.00000
x = 0.1; beta(x) = 0.00270, analytical der. =  0.06300, numerical der. =  0.06300
x = 0.2; beta(x) = 0.01280, analytical der. =  0.13600, numerical der. =  0.13600
x = 0.3; beta(x) = 0.02887, analytical der. =  0.17872, numerical der. =  0.17872
x = 0.4; beta(x) = 0.04703, analytical der. =  0.17636, numerical der. =  0.17636
x = 0.5; beta(x) = 0.06250, analytical der. =  0.12500, numerical der. =  0.12500
x = 0.6; beta(x) = 0.07055, analytical der. =  0.02939, numerical der. =  0.02939
x = 0.7; beta(x) = 0.06736, analytical der. = -0.09623, numerical der. = -0.09623
x = 0.8; beta(x) = 0.05120, analytical der. = -0.22400, numerical der. = -0.22400
x = 0.9; beta(x) = 0.02430, analytical der. = -0.29700, numerical der. = -0.29700
x = 1.0; beta(x) = 0.00000, analytical der. = -0.00000, numerical der. =       NA
\end{code}

Details on the algorithm used can be found in the \GCR. Suffice it to
say here that there is a \texttt{fdjac\_quality} setting that goes
from 0 to 2. The default value is 0, which gives you a
forward-difference approximation: this is the fastest algorithm, but
it may not always be precise enough. A value of 1 gives you bilateral
(central) differences, while 2 uses Richardson extrapolation. As you
go up, you gain in accuracy, but the method becomes considerably more
CPU-intensive.
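
For instance, to have a single call to \texttt{fdjac} use Richardson
extrapolation and then revert to the default, one could do something
like the following (a small sketch reusing the \texttt{nd\_beta}
function defined above):

\begin{code}
set fdjac_quality 2   # Richardson extrapolation: slower but more accurate
N2 = nd_beta(0.9, a, b)
set fdjac_quality 0   # back to the default forward differences
printf "numerical derivative with quality 2: %8.5f\n", N2
\end{code}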

% \section{Random number generation}

% \begin{itemize}
% \item Mersenne Twister in its various incarnations
% \item Ziggurat vs Box--Muller
% \item Other distributions
% \end{itemize}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "hansl-primer"
%%% End: 