## "Fossies" - the Fresh Open Source Software Archive

### Member "gretl-2020e/doc/tex/hp-numerical.tex" (17 May 2020, 11380 Bytes) of package /linux/misc/gretl-2020e.tar.xz:

As a special service "Fossies" has tried to format the requested source page into HTML format using (guessed) TeX and LaTeX source code syntax highlighting (style: standard) with prefixed line numbers. Alternatively you can here view or download the uninterpreted source code file.

\chapter{Numerical methods}
\label{chap:numerical}

\section{Numerical optimization}
\label{sec:hp-numopt}

Many, if not most, cases in which an econometrician wants to use a
programming language such as hansl, rather than relying on pre-canned
routines, involve some form of numerical optimization. This could take
the form of maximization of a likelihood or similar methods of
inferential statistics. Alternatively, optimization could be used in a
more general and abstract way, for example to solve portfolio choice
or analogous resource allocation problems.

Since hansl is Turing-complete, in principle any numerical
optimization technique could be programmed in hansl itself. Some such
techniques, however, are included in hansl's set of native
functions, in the interest of both simplicity of use and
efficiency. These are geared towards the most common kind of problem
encountered in economics and econometrics, that is, unconstrained
optimization of differentiable functions.

In this chapter, we will briefly review what hansl offers to solve
generic problems of the kind
\[
  \hat{\mathbf{x}} \equiv \argmax_{\mathbf{x} \in \Re^k} f(\mathbf{x}; \mathbf{a}),
\]
where $f(\mathbf{x}; \mathbf{a})$ is a function of $\mathbf{x}$, whose
shape depends on a vector of parameters $\mathbf{a}$. The objective
function $f(\cdot)$ is assumed to return a scalar real value. In most
cases, it will be assumed that it is also continuous and differentiable,
although this need not necessarily be the case. (Note that while
hansl's built-in functions maximize the given objective function,
minimization can be achieved simply by flipping the sign of
$f(\cdot)$.)

A special case of the above occurs when $\mathbf{x}$ is a vector of
parameters and $\mathbf{a}$ represents ``data''. In these cases, the
objective function is usually a (log-)likelihood and the problem is
one of estimation. For such cases hansl offers several special
constructs, reviewed in section~\ref{sec:est-blocks}. Here we deal
with more generic problems; nevertheless, the differences are only in
the hansl syntax involved: the mathematical algorithms that gretl
employs to solve the optimization problem are the same.

The reader is invited to read the ``Numerical methods'' chapter of
\GUG\ for a comprehensive treatment. Here, we will only give a small
example which should give an idea of how things are done.

\begin{code}
function scalar Himmelblau(matrix x)
    /* extrema:
    f(3.0, 2.0) = 0.0,
    f(-2.805118, 3.131312) = 0.0,
    f(-3.779310, -3.283186) = 0.0
    f(3.584428, -1.848126) = 0.0
    */
    scalar ret = (x[1]^2 + x[2] - 11)^2
    return -(ret + (x[1] + x[2]^2 - 7)^2)
end function

# ----------------------------------------------------------------------

set max_verbose 1

matrix theta1 = { 0, 0 }
y1 = BFGSmax(theta1, "Himmelblau(theta1)")
matrix theta2 = { 0, -1 }
y2 = NRmax(theta2, "Himmelblau(theta2)")

print y1 y2 theta1 theta2
\end{code}

We use for illustration here a classic ``nasty'' function from the
numerical optimization literature, namely the Himmelblau function,
which has four different minima: $f(x, y) = (x^2+y-11)^2 +
(x+y^2-7)^2$. The example proceeds as follows.
\begin{enumerate}
\item First we define the function to optimize: it must return a
  scalar and have among its arguments the vector to optimize. In this
  particular case that is its only argument, but there could have been
  other ones if necessary (a sketch of such a case is given after this
  list). Since in this case we are solving for a minimum, our
  definition returns the negative of the Himmelblau function proper.
\item We next set \verb|max_verbose| to 1. This is another example of
  the usage of the \cmd{set} command; its meaning is ``let me see how
  the iterations go'' and it defaults to 0. By using the \cmd{set}
  command with appropriate parameters, you control several features of
  the optimization process, such as numerical tolerances,
  visualization of the iterations, and so forth.
\item Define $\theta_1 = [0, 0]$ as the starting point.
\item Invoke the \cmd{BFGSmax} function; this will seek the maximum
  via the BFGS technique. Its base syntax is \texttt{BFGSmax(arg1,
    arg2)}, where \texttt{arg1} is the vector containing the
  optimization variable and \texttt{arg2} is a string containing the
  invocation of the function to maximize. BFGS will try several values
  of $\theta_1$ until the maximum is reached. On successful
  completion, the vector \texttt{theta1} will contain the final
  point. (Note: there's \emph{much} more to this. For details, be sure
  to read \GUG\ and the \GCR.)
\item Then we tackle the same problem but with a different starting
  point and a different optimization technique. We start from
  $\theta_2 = [0, -1]$ and use Newton--Raphson instead of BFGS,
  calling the \cmd{NRmax()} function instead of \cmd{BFGSmax()}. The
  syntax, however, is the same.
\item Finally we print the results.
\end{enumerate}
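
As anticipated in the list above, the objective function may take
arguments besides the vector being optimized, and the \cmd{set}
command can be used to tune the optimizer. The following is a minimal
sketch along these lines: the function \texttt{shifted\_quad} and the
scalar \texttt{a} are made up for illustration, and the
\texttt{bfgs\_maxiter} and \texttt{bfgs\_toler} values shown are
arbitrary (see the documentation of the \cmd{set} command for the full
list of optimization-related settings).

\begin{code}
# hypothetical objective with an extra fixed parameter 'a';
# its maximum is at x = (a, a)
function scalar shifted_quad(matrix x, scalar a)
    return -((x[1] - a)^2 + (x[2] - a)^2)
end function

scalar a = 3
matrix theta = { 0, 0 }

# optional tuning of the BFGS iterations via "set"
set bfgs_maxiter 500
set bfgs_toler 1.0e-10

y = BFGSmax(theta, "shifted_quad(theta, a)")
print y theta
\end{code}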

Table~\ref{tab:optim-output} on page \pageref{tab:optim-output}
contains a selected portion of the output. Note that the second run
converges to a different local optimum than the first one. This is a
consequence of having initialized the algorithm with a different
starting point. In this example, numerical derivatives were used, but
you can supply analytically computed derivatives to both methods if
you have a hansl function for them; see \GUG\ for more detail.
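
To give the flavor of how this is done, here is a tentative sketch of
an analytical gradient for the Himmelblau example above. It assumes
the optional third argument of \cmd{BFGSmax()}, which takes a call to
a function that fills a pre-defined gradient vector passed in pointer
form; the exact calling convention should be checked against \GUG\
and the \GCR\ before relying on it.

\begin{code}
# gradient of the negated Himmelblau function, worked out by hand
function void Himmelblau_grad(matrix *g, matrix x)
    scalar u = x[1]^2 + x[2] - 11
    scalar v = x[1] + x[2]^2 - 7
    g[1] = -(4 * x[1] * u + 2 * v)
    g[2] = -(2 * u + 4 * x[2] * v)
end function

matrix theta = { 0, 0 }
matrix g = zeros(1, 2)
y = BFGSmax(theta, "Himmelblau(theta)", "Himmelblau_grad(&g, theta)")
\end{code}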

\begin{table}[ht]
  \begin{footnotesize}
\begin{scode}
? matrix theta1 = { 0, 0 }
Replaced matrix theta1
? y1 = BFGSmax(theta1, "Himmelblau(theta1)")
Iteration 1: Criterion = -170.000000000
Parameters:       0.0000      0.0000
Gradients:        14.000      22.000 (norm 0.00e+00)

Iteration 2: Criterion = -128.264504038 (steplength = 0.04)
Parameters:      0.56000     0.88000
Gradients:        33.298      39.556 (norm 5.17e+00)

...

--- FINAL VALUES:
Criterion = -1.83015730011e-28 (steplength = 0.0016)
Parameters:       3.0000      2.0000
Gradients:    1.7231e-13 -3.7481e-13 (norm 7.96e-07)

Function evaluations: 39
Replaced scalar y1 = -1.83016e-28
? matrix theta2 = { 0, -1 }
Replaced matrix theta2
? y2 = NRmax(theta2, "Himmelblau(theta2)")
Iteration 1: Criterion = -179.999876556 (steplength = 1)
Parameters:   1.0287e-05     -1.0000
Gradients:        12.000  2.8422e-06 (norm 7.95e-03)

Iteration 2: Criterion = -175.440691085 (steplength = 1)
Parameters:      0.25534     -1.0000
Gradients:        12.000  4.5475e-05 (norm 1.24e+00)

...

--- FINAL VALUES:
Criterion = -3.77420797114e-22 (steplength = 1)
Parameters:       3.5844     -1.8481
Gradients:   -2.6649e-10  2.9536e-11 (norm 2.25e-05)

Replaced scalar y2 = -1.05814e-07
? print y1 y2 theta1 theta2

             y1 = -1.8301573e-28

             y2 = -1.0581385e-07

theta1 (1 x 2)

  3   2

theta2 (1 x 2)

      3.5844      -1.8481
\end{scode}

  \end{footnotesize}
  \caption{Output from maximization}
  \label{tab:optim-output}
\end{table}

The optimization methods hansl puts at your disposal are:
\begin{itemize}
\item BFGS, via the \cmd{BFGSmax()} function. This is in most cases
  the best compromise between performance and robustness. It assumes
  that the function to maximize is differentiable and will try to
  approximate its curvature by clever use of the change in the
  gradient between iterations. You can supply it with an
  analytically-computed gradient for speed and accuracy, but if you
  don't, the first derivatives will be computed numerically.
\item Newton--Raphson, via the \cmd{NRmax()} function. Actually, the
  name is misleading. It should be called something like
  ``curvature-based'', since it relies on the iterations
  \[
  x_{i+1} = x_i - \lambda_i C(x_i)^{-1} g(x_i)
  \]
  where $g(x)$ is the gradient and $C(x_i)$ is some measure of the
  curvature of the function to optimize; if $C(x)$ is the Hessian
  matrix, you get Newton--Raphson. Again, you can code your own
  functions for $g(\cdot)$ and $C(\cdot)$, but if you don't then
  numerical approximations to the gradient and the Hessian will be
  used, respectively. Other popular optimization methods (such as BHHH
  and the scoring algorithm) can be implemented by supplying to
  \cmd{NRmax()} the appropriate curvature matrix $C(\cdot)$. This
  method is very efficient when it works, but is rather fragile: for
  example, if $C(x_i)$ fails to be negative definite at some
  iteration, convergence may become problematic.
\item Derivative-free methods: the only method that hansl offers
  presently is simulated annealing, via the \cmd{simann()} function,
  but an implementation of the Nelder--Mead algorithm (also known as
  the ``amoeba'' method) should be just a matter of time. These
  methods work even when the function to maximize has some form of
  discontinuity or is not everywhere differentiable; however, they may
  be very slow and CPU-intensive. (A small example follows this list.)
\end{itemize}
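
As an illustration of the derivative-free option, the sketch below
applies \cmd{simann()} to the Himmelblau function defined earlier. The
iteration count is arbitrary, and since simulated annealing explores
the parameter space stochastically it is common practice to refine its
result afterwards with a gradient-based method.

\begin{code}
# derivative-free search on the same objective as above
matrix theta = { 0, 0 }
y = simann(theta, "Himmelblau(theta)", 10000)
# optionally polish the result with BFGS
y = BFGSmax(theta, "Himmelblau(theta)")
print y theta
\end{code}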

\section{Numerical differentiation}
\label{sec:hp-numdiff}

For numerical differentiation we have \texttt{fdjac}. For example:

\begin{code}
set echo off
set messages off

function scalar beta(scalar x, scalar a, scalar b)
    return x^(a-1) * (1-x)^(b-1)
end function

function scalar ad_beta(scalar x, scalar a, scalar b)
    scalar g = beta(x, a-1, b-1)
    f1 = (a-1) * (1-x)
    f2 = (b-1) * x
    return (f1 - f2) * g
end function

function scalar nd_beta(scalar x, scalar a, scalar b)
    matrix mx = {x}
    return fdjac(mx, beta(mx, a, b))
end function

a = 3.5
b = 2.5

loop for (x=0; x<=1; x+=0.1)
    printf "x = %3.1f; beta(x) = %7.5f, ", x, beta(x, a, b)
    A = ad_beta(x, a, b)
    N = nd_beta(x, a, b)
    printf "analytical der. = %8.5f, numerical der. = %8.5f\n", A, N
endloop
\end{code}

returns

\begin{code}
x = 0.0; beta(x) = 0.00000, analytical der. =  0.00000, numerical der. =  0.00000
x = 0.1; beta(x) = 0.00270, analytical der. =  0.06300, numerical der. =  0.06300
x = 0.2; beta(x) = 0.01280, analytical der. =  0.13600, numerical der. =  0.13600
x = 0.3; beta(x) = 0.02887, analytical der. =  0.17872, numerical der. =  0.17872
x = 0.4; beta(x) = 0.04703, analytical der. =  0.17636, numerical der. =  0.17636
x = 0.5; beta(x) = 0.06250, analytical der. =  0.12500, numerical der. =  0.12500
x = 0.6; beta(x) = 0.07055, analytical der. =  0.02939, numerical der. =  0.02939
x = 0.7; beta(x) = 0.06736, analytical der. = -0.09623, numerical der. = -0.09623
x = 0.8; beta(x) = 0.05120, analytical der. = -0.22400, numerical der. = -0.22400
x = 0.9; beta(x) = 0.02430, analytical der. = -0.29700, numerical der. = -0.29700
x = 1.0; beta(x) = 0.00000, analytical der. = -0.00000, numerical der. =       NA
\end{code}

Details on the algorithm used can be found in the \GCR. Suffice it to
say here that there is an \texttt{fdjac\_quality} setting that goes
from 0 to 2. The default value is 0, which gives you
forward-difference approximation: this is the fastest algorithm, but
sometimes may not be precise enough. The value of 1 gives you
bilateral difference, while 2 uses Richardson extrapolation. As you go
up, you gain in accuracy, but the method becomes considerably more
CPU-intensive.
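
For instance, one could re-run the numerical derivative from the
example above with the most accurate (and most expensive) setting:

\begin{code}
# 0 = forward difference (default), 1 = bilateral, 2 = Richardson
set fdjac_quality 2
N = nd_beta(0.9, a, b)
print N
\end{code}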

% \section{Random number generation}

% \begin{itemize}
% \item Mersenne Twister in its various incarnations
% \item Ziggurat vs Box--Muller
% \item Other distributions
% \end{itemize}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "hansl-primer"
%%% End: