\chapter{Series and lists}
\label{chap:series-etc}

Scalars, matrices and strings can be used in a hansl script at any
point; series and lists, on the other hand, are inherently tied to a
dataset and therefore can be used only when a dataset is currently
open.

\section{The \texttt{series} type}
\label{sec:series}

Series are just what any applied economist would call ``variables'',
that is, repeated observations of a given quantity; a dataset is an
ordered array of series, complemented by additional information, such
as the nature of the data (time-series, cross-section or panel),
descriptive labels for the series and/or the observations, source
information and so on. Series are the basic data type on which gretl's
built-in estimation commands depend.

The series belonging to a dataset are named via standard hansl
identifiers (strings of maximum length 31 characters as described
above). In the context of commands that take series as arguments,
series may be referenced either by name or by \emph{ID number}, that
is, the index of the series within the dataset. Position 0 in a
dataset is always taken by the automatic ``variable'' known as
\texttt{const}, which is just a column of 1s. The IDs of the actual
data series can be displayed via the \cmd{varlist} command. (But note
that in \textit{function calls}, as opposed to commands, series must
be referred to by name.) A detailed description of how a dataset works
can be found in chapter 4 of \GUG.

Some basic rules regarding series follow:

\begin{itemize}
\item If \texttt{lngdp} belongs to a time series or panel dataset,
  then the syntax \texttt{lngdp(-1)} yields its first lag, and
  \texttt{lngdp(+1)} its first lead.
\item To access individual elements of a series, you use square
  brackets enclosing
  \begin{itemize}
  \item the progressive (1-based) number of the observation you want,
    as in \verb|lngdp[15]|, or
  \item the corresponding date code in the case of time-series data,
    as in \verb|lngdp[2008:4]| (for the 4th quarter of 2008), or
  \item the corresponding observation marker string, if the dataset
    contains any, as in \verb|GDP["USA"]|.
  \end{itemize}
\end{itemize}

The rules for assigning values to series are just the same as for
other objects, so the following examples should be self-explanatory:

\begin{code}
series k = 3                    # implicit conversion from scalar; a constant series
series x = normal()             # pseudo-rv via a built-in function
series s = a/b                  # element-by-element operation on existing series
series movavg = 0.5*(x + x(-1)) # using lags
series y[2012:4] = x[2011:2]    # using individual data points
series x2000 = 100*x/x[2000:1]  # constructing an index
\end{code}

\tip{In hansl, you don't have separate commands for \emph{creating}
  series and \emph{modifying} them. Other popular packages make this
  distinction, but we still struggle to understand why this is
  supposed to be useful.}

\subsection{Converting series to or from matrices}

The reason why hansl provides a specific series type, distinct from
the matrix type, is historical. However, it is also a very convenient
feature. Operations that are typically performed on series in applied
work can be awkward to implement using ``raw'' matrices: for example,
the computation of leads and lags, or regular and seasonal
differences; the treatment of missing values; the addition of
descriptive labels, and so on.

Anyway, it is straightforward to convert data in either direction
between the series and matrix types.
\begin{itemize}
\item To turn series into matrices, you use the curly braces syntax,
  as in
\begin{code}
matrix MACRO = {outputgap, unemp, infl}
\end{code}
  where you can also use lists; the number of rows of the resulting
  matrix will depend on your currently selected sample.
\item To turn matrices into series, you can just use matrix columns,
  as in
\begin{code}
series y = my_matrix[,4]
\end{code}
  But note that this will work only if the number of rows in
  \texttt{my\_matrix} matches the length of the dataset (or the
  currently selected sample range).
\end{itemize}

Also note that the \cmd{lincomb} and \cmd{filter} functions are quite
useful for creating and manipulating series in complex ways without
having to convert the data to matrix form (which could be
computationally costly with large datasets).

\subsection{The ternary operator with series}

Consider this assignment:

\begin{code}
worker_income = employed ? income : 0
\end{code}

Here we assume that \texttt{employed} is a dummy series coding for
employee status. Its value will be tested for each observation in the
current sample range and the value assigned to
\texttt{worker\_income} at that observation will be determined
accordingly. It is therefore equivalent to the following much more
verbose formulation (where \dollar{t1} and \dollar{t2} are accessors
for the start and end of the sample range):

\begin{code}
series worker_income
loop i=$t1..$t2
    if employed[i]
        worker_income[i] = income[i]
    else
        worker_income[i] = 0
    endif
endloop
\end{code}

\section{The \texttt{list} type}
\label{sec:lists}

In hansl parlance, a \textit{list} is an array of integers,
representing the ID numbers of a set (in a loose sense of the word) of
series. For this reason, the most common operations you perform on
lists are set operations such as addition or deletion of members,
union, intersection and so on. Unlike sets, however, hansl lists are
ordered, so individual list members can be accessed via the
\texttt{[]} syntax, as in \texttt{X[3]}.
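
As a minimal sketch of ordered access, the following walks through a
list by position. Everything in it is illustrative rather than taken
from a real dataset: the data are artificial (created via
\cmd{nulldata}), the series names are made up, and we assume that
\texttt{varname} accepts a list element such as \texttt{X[i]} as its
argument.

\begin{code}
nulldata 20                       # artificial dataset, 20 observations
series age = normal()             # made-up series, for illustration only
series income = normal()
series experience = normal()
list X = age income experience
loop i=1..nelem(X)                # nelem() gives the number of list members
    printf "member %d: %s\n", i, varname(X[i])
endloop
\end{code}

The loop visits the members in the order in which they were placed in
the list, which is what distinguishes a list from a set proper.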

There are several ways to assign values to a list. The most basic sort
of expression that works in this context is a space-separated list of
series, given either by name or by ID number. For example,

\begin{code}
list xlist = 1 2 3 4
list reglist = income price
\end{code}

An empty list is obtained by using the keyword \texttt{null}, as in

\begin{code}
list W = null
\end{code}

or simply by bare declaration. Some more special forms (for example,
using wildcards) are described in \GUG.

The main idea is to use lists to group, under one identifier, one or
more series that logically belong together somehow (for example, as
explanatory variables in a model). So, for example,

\begin{code}
list xlist = x1 x2 x3 x4
ols y 0 xlist
\end{code}

is an idiomatic way of specifying the OLS regression that could also
be written as

\begin{code}
ols y 0 x1 x2 x3 x4
\end{code}

Note that we used here the convention, mentioned in section
\ref{sec:series}, by which a series can be identified by its ID number
when used as an argument to a command, typing \texttt{0} instead of
\texttt{const}.

Lists can be concatenated, as in \texttt{list L3 = L1 L2} (where
\texttt{L1} and \texttt{L2} are names of existing lists). This will
not necessarily do what you want, however, since the resulting list
may contain duplicates.
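
To make the caveat about duplicates concrete, here is a minimal sketch
in which two lists share one member. It assumes an artificial dataset
created via \cmd{nulldata}; the series names \texttt{x1} to
\texttt{x3} and the list names are purely hypothetical.

\begin{code}
nulldata 20                 # artificial dataset, 20 observations
series x1 = normal()        # made-up series, for illustration only
series x2 = normal()
series x3 = normal()
list L1 = x1 x2
list L2 = x2 x3             # x2 is a member of both lists
list L3 = L1 L2             # plain concatenation
list L3 print               # print the members to check for duplicates
printf "L3 has %d members\n", nelem(L3)
\end{code}

Comparing the member count of \texttt{L3} with the number of distinct
series involved (three) is a quick way of checking whether a
concatenation has introduced repeated entries.
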
It's more common to use the following set operations:

\begin{center}
  \begin{tabular}{rl}
    \textbf{Operator} & \textbf{Meaning} \\
    \hline
    \verb,||, & Union \\
    \verb|&&| & Intersection \\
    \verb|-| & Set difference \\
    \hline
  \end{tabular}
\end{center}

So for example, if \texttt{L1} and \texttt{L2} are existing lists,
after running the following code snippet

\begin{code}
list UL = L1 || L2
list IL = L1 && L2
list DL = L1 - L2
\end{code}

the list \texttt{UL} will contain all the members of \texttt{L1}, plus
any members of \texttt{L2} that are not already in \texttt{L1};
\texttt{IL} will contain all the elements that are present in both
\texttt{L1} and \texttt{L2}; and \texttt{DL} will contain all the
elements of \texttt{L1} that are not present in \texttt{L2}.

To \textit{append} or \textit{prepend} variables to an existing list,
we can make use of the fact that a named list stands in for a
``longhand'' list. For example, assuming that a list \texttt{xlist} is
already defined (possibly as \texttt{null}), we can do

\begin{code}
list xlist = xlist 5 6 7
xlist = 9 10 xlist 11 12
\end{code}

Another option for appending terms to, or dropping terms from, an
existing list is to use \texttt{+=} or \texttt{-=}, respectively, as
in

\begin{code}
xlist += cpi
zlist -= cpi
\end{code}

A nice example of the above is provided by a common idiom: you may see
in hansl scripts something like

\begin{code}
list C -= const
list C = const C
\end{code}

which ensures that the series \texttt{const} is included (exactly
once) in the list \texttt{C}, and comes first.

\subsection{Converting lists to or from matrices}

The idea of converting from a list, as defined above, to a matrix may
be taken in either of two ways. You may want to turn a list into a
matrix (vector) by filling the latter with the ID numbers contained in
the former, or rather to create a matrix whose columns contain the
series to which the ID numbers refer.
Both interpretations are legitimate (and potentially useful in
different contexts) so hansl lets you go either way.

If you assign a list to a matrix, as in

\begin{code}
list L = moo foo boo zoo
matrix A = L
\end{code}

the matrix \texttt{A} will contain the ID numbers of the four series
as a row vector. This operation goes both ways, so the statement

\begin{code}
list C = seq(7,10)
\end{code}

is perfectly valid (provided, of course, that you have at least 10
series in the currently open dataset).

If instead you want to create a data matrix from the series which
belong to a given list, you have to enclose the list name in curly
brackets, as in

\begin{code}
matrix X = {L}
\end{code}

\subsection{The \texttt{foreach} loop variant with lists}

Lists can be used as the ``catalogue'' in the \texttt{foreach} variant
of the \cmd{loop} construct (see section \ref{sec:loop-foreach}). This
is especially handy when you have to perform some operation on
multiple series. For example, the following syntax can be used to
calculate and print the mean of each of several series:

\begin{code}
list X = age income experience
loop foreach i X
    printf "mean($i) = %g\n", mean($i)
endloop
\end{code}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "hansl-primer"
%%% End: