Member "lapack-3.9.1/DOCS/lawn81.tex" (25 Mar 2021, 71100 Bytes) of package /linux/misc/lapack-3.9.1.tar.gz:



    1 \documentclass[11pt]{report}
    2 
    3 \usepackage{indentfirst}
    4 \usepackage[body={6in,8.5in}]{geometry}
    5 \usepackage{hyperref}
    6 \usepackage{graphicx}
    7 \DeclareGraphicsRule{.ps}{eps}{}{}
    8 
    9 \renewcommand{\thesection}{\arabic{section}}
   10 \setcounter{tocdepth}{3}
   11 \setcounter{secnumdepth}{3}
   12 
   13 \begin{document}
   14 \begin{center}
   15   {\Large LAPACK Working Note 81\\
   16   Quick Installation Guide for LAPACK on Unix Systems\footnote{This work was
   17  supported by NSF Grant No. ASC-8715728  and NSF Grant No. 0444486}}
   18 \end{center}
   19 \begin{center}
   20 %  Edward Anderson\footnote{Current address:  Cray Research Inc.,
   21 %                           655F Lone Oak Drive, Eagan, MN  55121},
   22   The LAPACK Authors\\
   23   Department of Computer Science \\
   24   University of Tennessee \\
   25   Knoxville, Tennessee  37996-1301 \\
   26 \end{center}
   27 \begin{center}
   28   REVISED:  VERSION 3.1.1, February 2007 \\
   29   REVISED:  VERSION 3.2.0, November 2008
   30 \end{center}
   31 
   32 \begin{center}
   33 Abstract
   34 \end{center}
This working note describes how to install and test version 3.2.0
of LAPACK, a linear algebra package for high-performance
computers, on a Unix system.  The timing routines are not included in
release 3.2.0; the parts of this note that describe them refer to release 3.0.
Version 3.2.0 also contains many prototype routines for which user feedback is requested.
Non-Unix installation instructions and
further details of the testing and timing suites are contained only in
LAPACK Working Note 41, not in this abbreviated version.
   43 %Separate instructions are provided for the Unix and non-Unix
   44 %versions of the test package.
   45 %Further details are also given on the design of the test and timing
   46 %programs.
   47 \newpage
   48 
   49 \tableofcontents
   50 
   51 \newpage
   52 % Introduction to Implementation Guide
   53 
   54 \section{Introduction}
   55 
   56 LAPACK is a linear algebra library for high-performance
   57 computers.
   58 The library includes Fortran subroutines for
   59 the analysis and solution of systems of simultaneous linear algebraic
   60 equations, linear least-squares problems, and matrix eigenvalue
   61 problems.
   62 Our approach to achieving high efficiency is based on the use of
   63 a standard set of Basic Linear Algebra Subprograms (the BLAS),
   64 which can be optimized for each computing environment.
   65 By confining most of the computational work to the BLAS,
   66 the subroutines should be
   67 transportable and efficient across a wide range of computers.
   68 
This working note describes how to install, test, and time this
release of LAPACK on a Unix system.

The instructions for installing, testing, and timing%
\footnote{The timing programs are provided only in LAPACK 3.0 and earlier.}
are designed for a person whose
responsibility is the maintenance of a mathematical software library.
We assume the installer has experience in compiling and running
Fortran programs and in creating object libraries.
The installation process involves untarring the file, creating a set of
libraries, and compiling and running the test and timing programs%
\footnotemark[\value{footnote}].
   81 
   82 %This guide combines the instructions for the Unix and non-Unix
   83 %versions of the LAPACK test package (the non-Unix version is in Appendix
   84 %~\ref{appendixe}).
   85 %At this time, the non-Unix version of LAPACK can only be obtained
   86 %after first untarring the Unix tar tape and then following the instructions in
   87 %Appendix ~\ref{appendixe}.
   88 
Section~\ref{fileformat} describes how the files are organized in the
distribution file, and
   91 Section~\ref{overview} gives a general overview of the parts of the test package.
   92 Step-by-step instructions appear in Section~\ref{installation}.
   93 %for the Unix version and in the appendix for the non-Unix version.
   94 
Users who want additional information should refer to LAPACK
Working Note 41~\cite{WN41}.
   97 % Sections~\ref{moretesting}
   98 %and ~\ref{moretiming} give
   99 %details of the test and timing programs and their input files.
  100 %Appendices ~\ref{appendixa} and ~\ref{appendixb} briefly describe
  101 %the LAPACK routines and auxiliary routines provided
  102 %in this release.
  103 %Appendix ~\ref{appendixc} lists the operation counts we have computed
  104 %for the BLAS and for some of the LAPACK routines.
Appendix~\ref{appendixd}, entitled ``Caveats'', is a compendium of the known
problems from our own experiences, with suggestions on how to
overcome them.

\textbf{We strongly advise the user to read Appendix~\ref{appendixd}
before proceeding with the installation process.}
  111 %Appendix E contains the execution times of the different test
  112 %and timing runs on two sample machines.
  113 %Appendix ~\ref{appendixe} contains the instructions to install LAPACK on a non-Unix
  114 %system.
  115 
  116 \section{Revisions Since the First Public Release}
  117 
Since its first public release in February 1992, LAPACK has had
several updates, which have introduced new routines
and extended the functionality of existing routines.  The first
update, on June 30, 1992, was version 1.0a; the second update, on October 31, 1992,
was version 1.0b; the third update, on March 31, 1993, was version 1.1;
version 2.0, on September 30, 1994, coincided with the release of the
Second Edition of the LAPACK Users' Guide;
version 3.0, on June 30, 1999, coincided with the release of the Third Edition of
the LAPACK Users' Guide;
version 3.1 was released in November 2006;
version 3.1.1 was released in February 2007;
and version 3.2.0 was released in November 2008.
  131 
Each LAPACK routine carries the current version number, and the date
on the routine indicates when it was last modified.
  134 For more information on revisions in the latest release, please refer
  135 to the \texttt{revisions.info} file in the lapack directory on netlib.
  136 \begin{quote}
  137 \url{http://www.netlib.org/lapack/revisions.info}
  138 \end{quote}
  139 
  140 %The distribution \texttt{tar} file \texttt{lapack.tar.z} that is
  141 %available on netlib is always the most up-to-date.
  142 %
  143 %On-line manpages (troff files) for LAPACK driver and computational
  144 %routines, as well as most of the BLAS routines, are available via
  145 %the \texttt{lapack} index on netlib.
  146 
  147 \section{File Format}\label{fileformat}
  148 
  149 The software for LAPACK is distributed in the form of a
  150 gzipped tar file (via anonymous ftp or the World Wide Web),
  151 which contains the Fortran source for LAPACK,
  152 the Basic Linear Algebra Subprograms
  153 (the Level 1, 2, and 3 BLAS) needed by LAPACK, the testing programs,
  154 and the timing programs\footnotemark[\value{footnote}].
Users who want a non-Unix installation should refer to LAPACK
Working Note 41,
although the overview in Section~\ref{overview} applies to both the Unix and non-Unix
versions.
  159 %Users who wish to have a non-Unix installation should go to Appendix ~\ref{appendixe},
  160 %although the overview in section ~\ref{overview} applies to both the Unix and non-Unix
  161 %versions.
  162 
The package may be accessed via the World Wide Web at
the URL:
  165 \begin{quote}
  166 \url{http://www.netlib.org/lapack/lapack.tgz}
  167 \end{quote}
  168 
  169 Or, you can retrieve the file via anonymous ftp at netlib:
  170 
  171 \begin{verbatim}
  172      ftp ftp.netlib.org
  173      login:  anonymous
  174      password:  <your email address>
  175      cd lapack
  176      binary
  177      get lapack.tgz
  178      quit
  179 \end{verbatim}
  180 
  181 The software in the \texttt{tar} file
  182 is organized in a number of essential directories as shown
  183 in Figure 1.  Please note that this figure does not reflect everything
  184 that is contained in the \texttt{LAPACK} directory.  Input and instructional
  185 files are also located at various levels.
  186 \begin{figure}
  187 \vspace{11pt}
  188 \centerline{\includegraphics[width=6.5in,height=3in]{org2.ps}}
  189 \caption{Unix organization of LAPACK 3.0}
  190 \vspace{11pt}
  191 \end{figure}
  192 Libraries are created in the LAPACK directory and
  193 executable files are created in one of the directories BLAS, TESTING,
  194 or TIMING\footnotemark[\value{footnote}].  Input files for the test and
  195 timing\footnotemark[\value{footnote}]  programs are also
  196 found in these three directories so that testing may be carried out
  197 in the directories LAPACK/BLAS, LAPACK/TESTING, and LAPACK/TIMING \footnotemark[\value{footnote}].
  198 A top-level makefile in the LAPACK directory is provided to perform the
  199 entire installation procedure.
  200 
  201 \section{Overview of Tape Contents}\label{overview}
  202 
  203 Most routines in LAPACK occur in four versions: REAL,
  204 DOUBLE PRECISION, COMPLEX, and COMPLEX*16.
  205 The first three versions (REAL, DOUBLE PRECISION, and COMPLEX)
  206 are written in standard Fortran and are completely portable;
the COMPLEX*16 version is provided for
those compilers that support this data type.
Some routines use features of Fortran 90.
For convenience, we often refer to routines by their single precision
names; the leading `S' can be replaced by a `D' for double precision,
a `C' for complex, or a `Z' for complex*16.
For LAPACK use and testing, you must decide which version(s)
of the package you intend to install at your site (for example,
REAL and COMPLEX on a Cray computer or DOUBLE PRECISION and
COMPLEX*16 on an IBM computer).
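The precision prefix can be illustrated with a short C sketch.  The
helper \texttt{lapack\_variant} below is hypothetical (it is not part of
LAPACK); it only replaces the leading letter, so it is a mnemonic for the
naming convention rather than a lookup table.  In particular, it does not
know about routines whose names change for complex data, such as SORGQR,
whose complex counterpart is CUNGQR.
\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if c is one of the LAPACK precision prefixes
 * S (REAL), D (DOUBLE PRECISION), C (COMPLEX), Z (COMPLEX*16). */
static int is_prec_letter(char c)
{
    return c == 'S' || c == 'D' || c == 'C' || c == 'Z';
}

/* Hypothetical helper: copy the routine name into out (capacity outlen
 * bytes, including the terminating NUL) with its leading precision letter
 * replaced by prec.  The substitution is purely mechanical: it does not
 * handle names that change for complex data (e.g. SORGQR vs. CUNGQR).
 * Returns 0 on success, -1 on invalid input or insufficient space. */
int lapack_variant(const char *name, char prec, char *out, size_t outlen)
{
    size_t len = strlen(name);

    if (len == 0 || len >= outlen)
        return -1;
    if (!is_prec_letter(name[0]) || !is_prec_letter(prec))
        return -1;
    memcpy(out, name, len + 1);   /* copy including the NUL */
    out[0] = prec;                /* swap the precision prefix */
    return 0;
}
\end{verbatim}
For example, converting SGETRF with the target letter Z yields ZGETRF.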
  217 
  218 \subsection{LAPACK Routines}
  219 
  220 There are three classes of LAPACK routines:
  221 \begin{itemize}
  222 
  223 \item \textbf{driver} routines solve a complete problem, such as solving
  224 a system of linear equations or computing the eigenvalues of a real
  225 symmetric matrix.  Users are encouraged to use a driver routine if there
  226 is one that meets their requirements.  The driver routines are listed
  227 in LAPACK Working Note 41~\cite{WN41} and the LAPACK Users' Guide~\cite{LUG}.
  228 %in Appendix ~\ref{appendixa}.
  229 
  230 \item \textbf{computational} routines, also called simply LAPACK routines,
  231 perform a distinct computational task, such as computing
  232 the $LU$ decomposition of an $m$-by-$n$ matrix or finding the
  233 eigenvalues and eigenvectors of a symmetric tridiagonal matrix using
  234 the $QR$ algorithm.
  235 The LAPACK routines are listed in LAPACK Working Note 41~\cite{WN41}
  236 and the LAPACK Users' Guide~\cite{LUG}.
  237 %The LAPACK routines are listed in Appendix ~\ref{appendixa}; see also LAPACK
  238 %Working Note \#5 \cite{WN5}.
  239 
  240 \item \textbf{auxiliary} routines are all the other subroutines called
  241 by the driver routines and computational routines.
  242 %Among them are subroutines to perform subtasks of block algorithms,
  243 %in particular, the unblocked versions of the block algorithms;
  244 %extensions to the BLAS, such as matrix-vector operations involving
  245 %complex symmetric matrices;
  246 %the special routines LSAME and XERBLA which first appeared with the
  247 %BLAS;
  248 %and a number of routines to perform common low-level computations,
  249 %such as computing a matrix norm, generating an elementary Householder
  250 %transformation, and applying a sequence of plane rotations.
  251 %Many of the auxiliary routines may be of use to numerical analysts
  252 %or software developers, so we have documented the Fortran source for
  253 %these routines with the same level of detail used for the LAPACK
  254 %routines and driver routines.
  255 The auxiliary routines are listed in LAPACK Working Note 41~\cite{WN41}
  256 and the LAPACK Users' Guide~\cite{LUG}.
  257 %The auxiliary routines are listed in Appendix ~\ref{appendixb}.
  258 \end{itemize}
  259 
  260 \subsection{Level 1, 2, and 3 BLAS}
  261 
  262 The BLAS are a set of Basic Linear Algebra Subprograms that perform
  263 vector-vector, matrix-vector, and matrix-matrix operations.
  264 LAPACK is designed around the Level 1, 2, and 3 BLAS, and nearly all
  265 of the parallelism in the LAPACK routines is contained in the BLAS.
  266 Therefore,
  267 the key to getting good performance from LAPACK lies in having an
  268 efficient version of the BLAS optimized for your particular machine.
Optimized BLAS libraries are available on a variety of architectures;
refer to the BLAS FAQ on netlib for further information.
  271 \begin{quote}
  272 \url{http://www.netlib.org/blas/faq.html}
  273 \end{quote}
There are also freely available BLAS generators that automatically
tune a subset of the BLAS for a given architecture, for example ATLAS:
\begin{quote}
\url{http://www.netlib.org/atlas/}
\end{quote}
If all else fails, the Fortran~77 reference implementation
of the Level 1, 2, and 3 BLAS is available on netlib (it is also included in
the LAPACK distribution tar file):
  282 \begin{quote}
  283 \url{http://www.netlib.org/blas/blas.tgz}
  284 \end{quote}
  285 No matter which BLAS library is used, the BLAS test programs should
  286 always be run.
  287 
  288 Users should not expect too much from the Fortran~77 reference implementation
  289 BLAS; these versions were written to define the basic operations and do not
  290 employ the standard tricks for optimizing Fortran code.
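To show what defining the basic operation means in practice, here is a
C sketch of the Level~1 BLAS operation $y \leftarrow \alpha x + y$ computed
by SAXPY.  It is an illustration only, not the reference routine: the name
\texttt{saxpy\_sketch} is hypothetical, only positive increments are
supported (the reference routine also accepts negative ones), and there is
no unrolling, blocking, or vectorization of the kind an optimized BLAS
would provide.
\begin{verbatim}
#include <stddef.h>

/* Sketch of the Level 1 BLAS operation y := alpha*x + y (cf. SAXPY).
 * n elements are processed; incx and incy are the strides between
 * consecutive elements of x and y.  Assumption for brevity: strides are
 * positive (the reference routine also supports negative strides). */
void saxpy_sketch(size_t n, float alpha, const float *x, size_t incx,
                  float *y, size_t incy)
{
    if (n == 0 || alpha == 0.0f)
        return;                       /* quick return: y is unchanged */
    for (size_t i = 0; i < n; ++i)
        y[i * incy] += alpha * x[i * incx];
}
\end{verbatim}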
  291 
  292 The formal definitions of the Level 1, 2, and 3 BLAS
  293 are in \cite{BLAS1}, \cite{BLAS2}, and \cite{BLAS3}.
  294 The BLAS Quick Reference card is available on netlib.
  295 
  296 \subsection{Mixed- and Extended-Precision BLAS: XBLAS}
  297 
The XBLAS extend the BLAS to work with mixed input and output
precisions and to use extra precision internally.  The XBLAS are
used in the prototype extra-precise iterative refinement codes.
  301 
  302 The current release of the XBLAS is available through
  303 Netlib\footnote{Development versions may be available through
  304   \url{http://www.cs.berkeley.edu/~yozo/} or
  305   \url{http://www.nersc.gov/~xiaoye/XBLAS/}.}  at
  306 \begin{quote}
  307   \url{http://www.netlib.org/xblas}
  308 \end{quote}
  309 Their formal definition is in \cite{XBLAS}.
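The idea of using extra precision internally can be seen in a small C
sketch of a single-precision dot product.  The two functions below are
hypothetical illustrations, not the XBLAS interface: the first accumulates
in single precision, while the second forms each product and partial sum
in double precision and rounds to single precision once at the end, in the
spirit of the Level~1 BLAS routine SDSDOT.
\begin{verbatim}
#include <stddef.h>

/* Dot product of two float vectors, accumulated in float. */
float sdot_float_acc(size_t n, const float *x, const float *y)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}

/* Same inputs, but products and partial sums are formed in double
 * ("extra precision internally"); the result is rounded to float once. */
float sdot_double_acc(size_t n, const float *x, const float *y)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += (double)x[i] * (double)y[i];
    return (float)acc;
}
\end{verbatim}
With $x = (10^8, 1, -10^8)$ and $y = (1, 1, 1)$, single-precision
accumulation in IEEE arithmetic returns 0, because $10^8 + 1$ rounds back
to $10^8$, while double-precision accumulation returns the exact answer 1.
This assumes that single-precision expressions are evaluated in single
precision, as on most current hardware.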
  310 
  311 \subsection{LAPACK Test Routines}
  312 
  313 This release contains two distinct test programs for LAPACK routines
  314 in each data type.  One test program tests the routines for solving
  315 linear equations and linear least squares problems,
  316 and the other tests routines for the matrix eigenvalue problem.
The routines for generating test matrices are compiled into a library
that is used by both test programs.
  319 
\subsection{LAPACK Timing Routines (for LAPACK 3.0 and earlier)}

LAPACK 3.0 and earlier releases also contain two distinct timing programs for the
LAPACK routines in each data type.
  324 The linear equation timing program gathers performance data in
  325 megaflops on the factor, solve, and inverse routines for solving
  326 linear systems, the routines to generate or apply an orthogonal matrix
  327 given as a sequence of elementary transformations, and the reductions
  328 to bidiagonal, tridiagonal, or Hessenberg form for eigenvalue
  329 computations.
  330 The operation counts used in computing the megaflop rates are computed
  331 from a formula;
  332 see LAPACK Working Note 41~\cite{WN41}.
  333 % see Appendix ~\ref{appendixc}.
  334 The eigenvalue timing program is used with the eigensystem routines
  335 and returns the execution time, number of floating point operations, and
  336 megaflop rate for each of the requested subroutines.
  337 In this program, the number of operations is computed while the
  338 code is executing using special instrumented versions of the LAPACK
  339 subroutines.
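As an illustration of how a megaflop rate follows from an operation count,
here is a minimal C sketch.  It assumes the leading-order count $2n^3/3$
for the $LU$ factorization of an $n$-by-$n$ matrix; the formulas actually
used by the timing programs include lower-order terms and are given in
LAPACK Working Note 41.  The function names are hypothetical.
\begin{verbatim}
/* Approximate operation count for the LU factorization of an n-by-n
 * matrix: leading term 2n^3/3 only (lower-order terms omitted). */
double lu_flops_approx(double n)
{
    return 2.0 * n * n * n / 3.0;
}

/* Megaflop rate = operations / (seconds * 1e6).  Returns 0 when the
 * measured time is not positive (e.g. below the timer resolution)
 * instead of dividing by zero. */
double megaflops(double ops, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return ops / (seconds * 1.0e6);
}
\end{verbatim}
For example, an $n = 300$ factorization ($1.8 \times 10^7$ operations by
this count) that takes 0.5 seconds runs at 36 megaflops.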
  340 
  341 \section{Installing LAPACK on a Unix System}\label{installation}
  342 
  343 Installing, testing, and timing\footnotemark[\value{footnote}] the Unix version of LAPACK
  344 involves the following steps:
  345 \begin{enumerate}
\item Gunzip and untar the file.
  347 
\item Copy \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it.
  349 
  350 \item Edit the file \texttt{LAPACK/Makefile} and type \texttt{make}.
  351 
  352 %\item Test and Install the Machine-Dependent Routines \\
  353 %\emph{(WARNING:  You may need to supply a correct version of second.f and
  354 %dsecnd.f for your machine)}
  355 %{\tt
  356 %\begin{list}{}{}
  357 %\item cd LAPACK
  358 %\item make install
  359 %\end{list} }
  360 %
  361 %\item Create the BLAS Library, \emph{if necessary} \\
  362 %\emph{(NOTE:  For best performance, it is recommended you use the manufacturers' BLAS)}
  363 %{\tt
  364 %\begin{list}{}{}
  365 %\item \texttt{cd LAPACK}
  366 %\item \texttt{make blaslib}
  367 %\end{list} }
  368 %
  369 %\item Run the Level 1, 2, and 3 BLAS Test Programs
  370 %\begin{list}{}{}
  371 %\item \texttt{cd LAPACK}
  372 %\item \texttt{make blas\_testing}
  373 %\end{list}
  374 %
  375 %\item Create the LAPACK Library
  376 %\begin{list}{}{}
  377 %\item \texttt{cd LAPACK}
  378 %\item \texttt{make lapacklib}
  379 %\end{list}
  380 %
  381 %\item Create the Library of Test Matrix Generators
  382 %\begin{list}{}{}
  383 %\item \texttt{cd LAPACK}
  384 %\item \texttt{make tmglib}
  385 %\end{list}
  386 %
  387 %\item Run the LAPACK Test Programs
  388 %\begin{list}{}{}
  389 %\item \texttt{cd LAPACK}
  390 %\item \texttt{make testing}
  391 %\end{list}
  392 %
  393 %\item Run the LAPACK Timing Programs
  394 %\begin{list}{}{}
  395 %\item \texttt{cd LAPACK}
  396 %\item \texttt{make timing}
  397 %\end{list}
  398 %
  399 %\item Run the BLAS Timing Programs
  400 %\begin{list}{}{}
  401 %\item \texttt{cd LAPACK}
  402 %\item \texttt{make blas\_timing}
  403 %\end{list}
  404 \end{enumerate}
  405 
  406 \subsection{Untar the File}
  407 
  408 If you received a tar file of LAPACK via the World Wide
  409 Web or anonymous ftp, enter the following command:
  410 
\begin{list}{}{}
  412 \item{\texttt{gunzip -c lapack.tgz | tar xvf -}}
  413 \end{list}
  414 
  415 \noindent
  416 This will create a top-level directory called \texttt{LAPACK}, which
  417 requires approximately 34 Mbytes of disk space.
The total space required, including the object files and executables,
is approximately 100 Mbytes for all four data types.
  420 
\subsection{Copy \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it}
  422 
Before the libraries can be built, or the testing and timing\footnotemark[\value{footnote}] programs
run, you must define all machine-specific parameters for the
architecture on which you are installing LAPACK.  All machine-specific
parameters are contained in the file \texttt{LAPACK/make.inc}.
An example of \texttt{LAPACK/make.inc} for a Linux machine with GNU compilers is given
in \texttt{LAPACK/make.inc.example}; copy that file to \texttt{LAPACK/make.inc} by entering the following command:
  429 
\begin{list}{}{}
  431 \item{\texttt{cp LAPACK/make.inc.example LAPACK/make.inc}}
  432 \end{list}
  433 
  434 \noindent
  435 Now modify your \texttt{LAPACK/make.inc} by applying the following recommendations.
  436 The first line of this \texttt{make.inc} file is:
  437 \begin{quote}
\texttt{SHELL = /bin/sh}
  439 \end{quote}
  440 and it will need to be modified to \texttt{SHELL = /sbin/sh} if you are
  441 installing LAPACK on an SGI architecture.
Next, you will need to modify \texttt{FC}, \texttt{FFLAGS},
\texttt{FFLAGS\_DRV}, \texttt{FFLAGS\_NOOPT}, and \texttt{LDFLAGS} to specify
the compiler, the compiler options, the compiler options for the testing and
timing\footnotemark[\value{footnote}] main programs, the compiler options for
routines that must be compiled without optimization, and the linker options.
Next, you will have to choose the function that the
\texttt{SECOND} and \texttt{DSECND} routines will call to measure time.
  448 \begin{verbatim}
  449 #  Default:  SECOND and DSECND will use a call to the
  450 #  EXTERNAL FUNCTION ETIME
  451 #TIMER = EXT_ETIME
  452 #  For RS6K:  SECOND and DSECND will use a call to the
  453 #  EXTERNAL FUNCTION ETIME_
  454 #TIMER = EXT_ETIME_
  455 #  For gfortran compiler:  SECOND and DSECND will use a call to the
  456 #  INTERNAL FUNCTION ETIME
  457 TIMER = INT_ETIME
  458 #  If your Fortran compiler does not provide etime (like Nag Fortran
  459 #  Compiler, etc...) SECOND and DSECND will use a call to the
  460 #  INTERNAL FUNCTION CPU_TIME
  461 #TIMER = INT_CPU_TIME
  462 #  If none of these work, you can use the NONE value.
  463 #  In that case, SECOND and DSECND will always return 0.
  464 #TIMER = NONE
  465 \end{verbatim}
Refer to Section~\ref{second} for more information.
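SECOND and DSECND are expected to return user time only.  If none of the
\texttt{TIMER} options above suits your platform, a replacement timing
function has to come from elsewhere.  As an illustration only, the C sketch
below reads the user CPU time of the current process with the POSIX
\texttt{getrusage} call.  It is not one of the \texttt{TIMER} options
provided with LAPACK, the name \texttt{user\_seconds} is hypothetical, and
calling it from Fortran would require a compiler-specific interface (for
example, Fortran~2003 \texttt{BIND(C)}), which is not shown.
\begin{verbatim}
#include <sys/time.h>
#include <sys/resource.h>

/* Sketch of a SECOND-like timer: user CPU time, in seconds, consumed by
 * the calling process.  System time (ru_stime) is deliberately excluded,
 * matching the "user time only" behaviour described for SECOND.
 * Returns 0 if getrusage fails, similar to the TIMER = NONE fallback. */
double user_seconds(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return 0.0;
    return (double)ru.ru_utime.tv_sec + 1.0e-6 * (double)ru.ru_utime.tv_usec;
}
\end{verbatim}
Differences between two calls give the user time spent in the code between
them, which is how SECOND is used by the test and timing programs.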
  467 
  468 
Next, you will need to modify \texttt{AR}, \texttt{ARFLAGS}, and \texttt{RANLIB} to specify the archiver,
the archiver options, and the ranlib command for your machine.  If your architecture
does not require \texttt{ranlib} to be run after each archive command (as
is the case with CRAY computers running UNICOS, Hewlett Packard
computers running HP-UX, or SUN SPARCstations running Solaris), set
\texttt{RANLIB = echo}.  Finally, you must
modify the \texttt{BLASLIB} definition to specify the BLAS library to which
you will be linking.  If an optimized version of the BLAS is available
on your machine, we strongly recommend that you link to that library.
By default, \texttt{BLASLIB} is set to the Fortran~77 reference version.
  479 
  480 If you want to enable the XBLAS, define the variable \texttt{USEXBLAS}
  481 to some value, for example \texttt{USEXBLAS = Yes}.  Then set the
  482 variable \texttt{XBLASLIB} to point at the XBLAS library.  Note that
  483 the prototype iterative refinement routines and their testers will not
  484 be built unless \texttt{USEXBLAS} is defined.
  485 
  486 \textbf{NOTE:}  Example \texttt{make.inc} include files are contained in the
  487 \texttt{LAPACK/INSTALL} directory.  Please refer to
  488 Appendix~\ref{appendixd} for machine-specific installation hints, and/or
  489 the \texttt{release\_notes} file on \texttt{netlib}.
  490 \begin{quote}
\url{http://www.netlib.org/lapack/release_notes}
  492 \end{quote}
  493 
  494 \subsection{Edit the file \texttt{LAPACK/Makefile}}\label{toplevelmakefile}
  495 
  496 This \texttt{Makefile} can be modified to perform as much of the
  497 installation process as the user desires.  Ideally, this is the ONLY
  498 makefile the user must modify.  However, modification of lower-level
  499 makefiles may be necessary if a specific routine needs to be compiled
  500 with a different level of optimization.
  501 
  502 First, edit the definitions of \texttt{blaslib}, \texttt{lapacklib},
  503 \texttt{tmglib}, \texttt{lapack\_testing}, and \texttt{timing}\footnotemark[\value{footnote}] in the file \texttt{LAPACK/Makefile}
  504 to specify the data types desired.  For example,
  505 if you only wish to compile the single precision real version of the
  506 LAPACK library, you would modify the \texttt{lapacklib} definition to be:
  507 
  508 \begin{verbatim}
  509 lapacklib:
  510         $(MAKE) -C SRC single
  511 \end{verbatim}
  512 
Likewise, you could specify \texttt{double}, \texttt{complex}, or \texttt{complex16} to
build the double precision real, single precision complex, or double
precision complex libraries, respectively.  By default, if no arguments
follow the \texttt{make} command, all four data types are built.
The \texttt{make} command can be run more than once to add another
data type to the library if necessary.
  520 
  521 %If you are installing LAPACK on a Silicon Graphics machine, you must
  522 %modify the respective definitions of \texttt{testing} and \texttt{timing} to be
  523 %\begin{verbatim}
  524 %testing:
  525 %        ( cd TESTING; $(MAKE) -f Makefile.sgi )
  526 %\end{verbatim}
  527 %and
  528 %\begin{verbatim}
  529 %timing:
  530 %        ( cd TIMING; $(MAKE) -f Makefile.sgi )
  531 %\end{verbatim}
  532 
Next, if you will be using a locally available BLAS library, you will need
to remove \texttt{blaslib} from the \texttt{lib} definition.  Finally,
rather than building each library and running each test suite separately,
you can modify the \texttt{all} definition to specify which parts of the
installation process a single \texttt{make} command performs.  By default,
the \texttt{all} definition is set to
  540 \begin{verbatim}
  541 all: lapack_install lib lapack_testing blas_testing
  542 \end{verbatim}
which performs all phases of the installation
process: testing the machine-dependent routines, building the libraries,
LAPACK testing, and BLAS testing.
  546 
  547 The entire installation process will then be performed by typing
  548 \texttt{make}.
  549 
  550 Questions and/or comments can be directed to the
  551 authors as described in Section~\ref{sendresults}.  If test failures
  552 occur, please refer to the appropriate subsection in
  553 Section~\ref{furtherdetails}.
  554 
If disk space is limited, we suggest building each data type separately
and/or deleting all object files after building the libraries.  Likewise, all
testing and timing executables can be deleted once the testing and timing
runs are complete.  To remove all object files and executables, type:
  560 
  561 \begin{list}{}{}
  562 \item \texttt{cd LAPACK}
  563 \item \texttt{make cleanobj}
  564 \end{list}
  565 
  566 \section{Further Details of the Installation Process}\label{furtherdetails}
  567 
  568 Alternatively, you can choose to run each of the phases of the
  569 installation process separately.  The following sections give details
  570 on how this may be achieved.
  571 
\subsection{Test and Install the Machine-Dependent Routines}
  573 
  574 There are six machine-dependent functions in the test and timing
  575 package, at least three of which must be installed.  They are
  576 
  577 \begin{tabbing}
MONOMO  \=  DOUBLE PRECISION  \=  \kill
LSAME   \>  LOGICAL      \> Test if two characters are the same regardless of case \\
SLAMCH  \>  REAL  \> Determine machine-dependent parameters \\
DLAMCH  \>  DOUBLE PRECISION \> Determine machine-dependent parameters \\
SECOND  \>  REAL  \> Return time in seconds from a fixed starting time \\
DSECND  \>  DOUBLE PRECISION  \> Return time in seconds from a fixed starting time\\
ILAENV  \>  INTEGER \> Check that NaN and infinity arithmetic are IEEE-754 compliant
  585 \end{tabbing}
  586 
  587 \noindent
  588 If you are working only in single precision, you do not need to install
  589 DLAMCH and DSECND, and if you are working only in double precision,
  590 you do not need to install SLAMCH and SECOND.
  591 
  592 These six subroutines are provided in \texttt{LAPACK/INSTALL},
  593 along with six test programs.
  594 To compile the six test programs and run the tests, go to \texttt{LAPACK} and
  595 type \texttt{make lapack\_install}.  The test programs are called
  596 \texttt{testlsame, testslamch, testdlamch, testsecond, testdsecnd} and
  597 \texttt{testieee}.
  598 If you do not wish to run all tests, you will need to modify the
  599 \texttt{lapack\_install} definition in the \texttt{LAPACK/Makefile} to only include the
  600 tests you wish to run.  Otherwise, all tests will be performed.
  601 The expected results of each test program are described below.
  602 
  603 \subsubsection{Installing LSAME}
  604 
  605 LSAME is a logical function with two character parameters, A and B.
  606 It returns .TRUE. if A and B are the same regardless of case, or .FALSE.
  607 if they are different.
  608 For example, the expression
  609 
  610 \begin{list}{}{}
  611 \item \texttt{LSAME( UPLO, 'U' )}
  612 \end{list}
  613 \noindent
  614 is equivalent to
  615 \begin{list}{}{}
  616 \item \texttt{( UPLO.EQ.'U' ).OR.( UPLO.EQ.'u' )}
  617 \end{list}
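The semantics of LSAME can be mirrored in a few lines of C.  This is a
sketch only (the name \texttt{lsame\_sketch} is hypothetical); it uses the C
library function \texttt{toupper}, which depends on the current locale and
is ASCII-based in the default C locale, whereas the Fortran LSAME handles
the ASCII, EBCDIC, and Prime character sets itself.
\begin{verbatim}
#include <ctype.h>
#include <stdbool.h>

/* Sketch of LSAME semantics: true if a and b are the same character
 * regardless of case.  The casts to unsigned char keep toupper's
 * argument in its valid range for characters with the high bit set. */
bool lsame_sketch(char a, char b)
{
    return toupper((unsigned char)a) == toupper((unsigned char)b);
}
\end{verbatim}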
  618 
  619 The test program in \texttt{lsametst.f} tests all combinations of
  620 the same character in upper and lower case for A and B, and two
  621 cases where A and B are different characters.
  622 
  623 Run the test program by typing \texttt{testlsame}.
  624 If LSAME works correctly, the only message you should see after the
  625 execution of \texttt{testlsame} is
  626 \begin{verbatim}
  627  ASCII character set
  628  Tests completed
  629 \end{verbatim}
  630 The file \texttt{lsame.f} is automatically copied to
  631 \texttt{LAPACK/BLAS/SRC/} and \texttt{LAPACK/SRC/}.
  632 The function LSAME is needed by both the BLAS and LAPACK, so it is safer
  633 to have it in both libraries as long as this does not cause trouble
  634 in the link phase when both libraries are used.
  635 
  636 \subsubsection{Installing SLAMCH and DLAMCH}
  637 
  638 SLAMCH and DLAMCH are real functions with a single character parameter
  639 that indicates the machine parameter to be returned.  The test
  640 program in \texttt{slamchtst.f}
simply prints out the different values computed by SLAMCH,
so you need to know roughly what values to expect.
  643 For example, the output of the test program executable \texttt{testslamch}
  644 for SLAMCH on a Sun SPARCstation is
  645 \begin{verbatim}
  646  Epsilon                      =     5.96046E-08
  647  Safe minimum                 =     1.17549E-38
  648  Base                         =     2.00000
  649  Precision                    =     1.19209E-07
  650  Number of digits in mantissa =     24.0000
  651  Rounding mode                =     1.00000
  652  Minimum exponent             =    -125.000
  653  Underflow threshold          =     1.17549E-38
  654  Largest exponent             =     128.000
  655  Overflow threshold           =     3.40282E+38
  656  Reciprocal of safe minimum   =     8.50706E+37
  657 \end{verbatim}
  658 On a Cray machine, the safe minimum underflows its output
  659 representation and the overflow threshold overflows its output
  660 representation, so the safe minimum is printed as 0.00000 and overflow
  661 is printed as R.  This is normal.
  662 If you would prefer to print a representable number, you can modify
  663 the test program to print SFMIN*100. and RMAX/100. for the safe
  664 minimum and overflow thresholds.
  665 
  666 Likewise, the test executable \texttt{testdlamch} is run for DLAMCH.
  667 
  668 If both tests were successful, go to Section~\ref{second}.
  669 
  670 If SLAMCH (or DLAMCH) returns an invalid value, you will have to create
  671 your own version of this function.  The following options are used in
  672 LAPACK and must be set:
  673 
  674 \begin{list}{}{}
  675 \item {`B': }  Base of the machine
  676 \item {`E': }  Epsilon (relative machine precision)
  677 \item {`O': }  Overflow threshold
  678 \item {`P': }  Precision = Epsilon*Base
  679 \item {`S': }  Safe minimum (often same as underflow threshold)
  680 \item {`U': }  Underflow threshold
  681 \end{list}
  682 
Some readers may be familiar with R1MACH (D1MACH), a simple
routine for setting machine parameters in which the user must
activate (uncomment) the assignment statements for the target
machine.  If a version of R1MACH is available, the assignments in
SLAMCH can be made to refer to R1MACH using the correspondence
  688 
  689 \begin{list}{}{}
  690 \item {SLAMCH( `U' )}  $=$ R1MACH( 1 )
  691 \item {SLAMCH( `O' )}  $=$ R1MACH( 2 )
  692 \item {SLAMCH( `E' )}  $=$ R1MACH( 3 )
\item {SLAMCH( `B' )}  $=$ REAL( I1MACH( 10 ) ) \quad (R1MACH( 5 ) returns $\log_{10}$ of the base, not the base itself)
  694 \end{list}
  695 
  696 \noindent
The safe minimum returned by SLAMCH( `S' ) is initially set to the
underflow threshold, but if $1/\mathrm{overflow} \geq \mathrm{underflow}$,
it is recomputed as $(1/\mathrm{overflow})(1 + \varepsilon)$,
where $\varepsilon$ is the machine precision.  Either way, the reciprocal
of the safe minimum does not overflow, which is what makes it ``safe''.
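The rule for the safe minimum can be written out directly.  The C function
below is a minimal sketch of that rule; the name \texttt{safe\_minimum} and the
second, made-up set of machine parameters are illustrative assumptions.  For
IEEE single precision, $1/\mathrm{overflow} \approx 2.9 \times 10^{-39}$ is
smaller than the underflow threshold, so the underflow threshold is kept,
while the hypothetical machine triggers the recomputation.

\begin{verbatim}
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Safe minimum following the rule in the text: start from the underflow
   threshold and, if 1/overflow is not smaller than it, replace it by
   (1/overflow)*(1 + eps) so that 1/sfmin cannot overflow. */
static float safe_minimum(float underflow, float overflow, float eps)
{
    float sfmin = underflow;
    float small = 1.0f / overflow;
    if (small >= sfmin)
        sfmin = small * (1.0f + eps);
    return sfmin;
}

int main(void)
{
    float eps = FLT_EPSILON / FLT_RADIX;

    /* IEEE single precision: 1/FLT_MAX (about 2.9E-39) is below FLT_MIN,
       so the underflow threshold is kept. */
    float s = safe_minimum(FLT_MIN, FLT_MAX, eps);
    printf("IEEE single:  sfmin = %.5E\n", s);
    assert(s == FLT_MIN);

    /* Made-up machine whose exponent range is skewed toward small
       numbers: 1/overflow exceeds the underflow threshold, so the
       recomputation is triggered. */
    float t = safe_minimum(1.0e-30f, 1.0e20f, FLT_EPSILON);
    printf("hypothetical: sfmin = %.5E\n", t);
    assert(t >= 1.0f / 1.0e20f);
    return 0;
}
\end{verbatim}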
  701 
  702 BE AWARE that the initial call to SLAMCH or DLAMCH is expensive.
  703 We suggest that installers run it once, save the results, and hard-code
  704 the constants in the version they put in their library.
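A hard-coded replacement can be as simple as a lookup on the option letter.
The sketch below is written in C for illustration only, assuming IEEE-754
single precision with rounding; the function name \texttt{slamch\_fixed} is
hypothetical, and a real replacement would be a Fortran SLAMCH with the same
interface.  Printing each constant with nine significant digits, as the
program does, is enough to reproduce a single-precision value exactly when it
is pasted into source code.

\begin{verbatim}
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Hard-coded stand-in for the SLAMCH options LAPACK needs, assuming an
   IEEE-754 single-precision machine with rounding arithmetic.  The name
   slamch_fixed is hypothetical; it only illustrates replacing the
   run-time computation by constants. */
static float slamch_fixed(char cmach)
{
    switch (cmach) {
    case 'B': case 'b': return (float) FLT_RADIX;       /* base */
    case 'E': case 'e': return FLT_EPSILON / FLT_RADIX; /* epsilon */
    case 'O': case 'o': return FLT_MAX;                 /* overflow threshold */
    case 'P': case 'p': return FLT_EPSILON;             /* epsilon * base */
    case 'S': case 's': return FLT_MIN;                 /* safe minimum */
    case 'U': case 'u': return FLT_MIN;                 /* underflow threshold */
    default:            return 0.0f;                    /* not covered here */
    }
}

int main(void)
{
    const char *opts = "BEOPSU";
    /* %.8E prints 9 significant digits, enough to round-trip a float. */
    for (const char *c = opts; *c != '\0'; ++c)
        printf("SLAMCH('%c') = %.8E\n", *c, slamch_fixed(*c));

    assert(slamch_fixed('P') == slamch_fixed('E') * slamch_fixed('B'));
    return 0;
}
\end{verbatim}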
  705 
  706 \subsubsection{Installing SECOND and DSECND}\label{second}
  707 
  708 Both the timing routines\footnotemark[\value{footnote}]  and the test routines call SECOND
  709 (DSECND), a real function with no arguments that returns the time
  710 in seconds from some fixed starting time.
  711 Our version of this routine
  712 returns only ``user time'', and not ``user time $+$ system time''.
The versions of SECOND in \texttt{second\_EXT\_ETIME.f} and \texttt{second\_INT\_ETIME.f} call
ETIME, a Fortran library routine available on some computer systems.
  715 If ETIME is not available or a better local timing function exists,
  716 you will have to provide the correct interface to SECOND and DSECND
  717 on your machine.
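If neither ETIME nor CPU\_TIME is suitable, a user-time clock can often be
built on the POSIX call \texttt{getrusage}.  The C function below is a sketch,
not one of the LAPACK timer flavours; the name \texttt{user\_seconds} is
hypothetical.  Calling it as SECOND from Fortran would additionally require
matching the external naming convention of your Fortran compiler (often a
lowercase name with a trailing underscore), which is compiler-dependent and
not shown here.

\begin{verbatim}
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* User CPU time in seconds for the calling process, via POSIX getrusage().
   Like LAPACK's SECOND, it reports user time only, not user + system
   time.  Returns 0.0 if getrusage fails.  The name user_seconds is
   hypothetical. */
static double user_seconds(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return 0.0;
    return (double) ru.ru_utime.tv_sec + 1.0e-6 * (double) ru.ru_utime.tv_usec;
}

int main(void)
{
    double t0 = user_seconds();

    /* Burn some user time so the difference is visible. */
    volatile double acc = 0.0;
    for (long i = 0; i < 20000000L; ++i)
        acc += 1.0e-9 * (double) i;

    double t1 = user_seconds();
    printf("user time for loop: %.3f s\n", t1 - t0);
    assert(t1 >= t0);
    return 0;
}
\end{verbatim}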
  718 
Since LAPACK 3.1.1, five different flavours of the SECOND and DSECND routines are provided.
The version that is used depends on the value of the \texttt{TIMER} variable in \texttt{make.inc}:
  721 
  722 \begin{itemize}
\item If ETIME is available as an external function, set the \texttt{TIMER} variable in your
\texttt{make.inc} to \texttt{EXT\_ETIME}: \texttt{second\_EXT\_ETIME.f} and \texttt{dsecnd\_EXT\_ETIME.f} will be used.
On HPPA architectures, the compiler and linker flag \texttt{+U77} usually
has to be included to access
the function \texttt{ETIME}.
  728 
\item If ETIME\_ is available as an external function, set the \texttt{TIMER} variable in your \texttt{make.inc}
to \texttt{EXT\_ETIME\_}: \texttt{second\_EXT\_ETIME\_.f} and \texttt{dsecnd\_EXT\_ETIME\_.f} will be used.
This is the case on some IBM architectures, such as the IBM RS/6000.
  732 
\item If ETIME is available as an internal function, set the \texttt{TIMER} variable in your \texttt{make.inc}
to \texttt{INT\_ETIME}: \texttt{second\_INT\_ETIME.f} and \texttt{dsecnd\_INT\_ETIME.f} will be used.
This is the case with gfortran.
  736 
\item If CPU\_TIME is available as an internal function, set the \texttt{TIMER} variable in your \texttt{make.inc}
to \texttt{INT\_CPU\_TIME}: \texttt{second\_INT\_CPU\_TIME.f} and \texttt{dsecnd\_INT\_CPU\_TIME.f} will be used.

\item If none of these functions is available, set the \texttt{TIMER} variable in your \texttt{make.inc}
to \texttt{NONE}: \texttt{second\_NONE.f} and \texttt{dsecnd\_NONE.f} will be used.
These routines always return zero.
  743 \end{itemize}
  744 
The test program in \texttt{secondtst.f}
performs one million floating-point operations using 5000 iterations of
the SAXPY operation $y := y + \alpha x$ on vectors of length 100
(each iteration costs $2 \times 100$ operations).
The total time and megaflop rate for this test are reported; then
the operation is repeated with a call to SECOND on each of
the 5000 iterations to determine the overhead of calling SECOND.
The test program executable is called \texttt{testsecond} (or \texttt{testdsecnd}).
There is no single right answer, but the times
in seconds should be positive and the megaflop rates should be
appropriate for your machine.
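The structure of this test can be sketched in C as follows.  This is an
illustrative analogue of \texttt{secondtst.f}, not a translation of it: it uses
the ISO C \texttt{clock()} function as a stand-in for SECOND (\texttt{clock()}
measures processor time, which may include system time), and the helper names
are hypothetical.  With a coarse clock, the measured overhead per call may come
out as zero or even negative, which is one reason the reported times should be
checked for plausibility.

\begin{verbatim}
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define N     100   /* vector length, as in secondtst.f */
#define NITER 5000  /* 5000 iterations * 2 * 100 = 1,000,000 operations */

/* y := y + alpha*x */
static void saxpy(int n, float alpha, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

/* Processor time via ISO C clock(); stands in for SECOND here. */
static double cpu_seconds(void)
{
    return (double) clock() / CLOCKS_PER_SEC;
}

/* Megaflop rate, returning 0 when the measured time is not positive. */
static double mflops(double ops, double seconds)
{
    return seconds > 0.0 ? ops / seconds / 1.0e6 : 0.0;
}

int main(void)
{
    static float x[N], y[N];
    for (int i = 0; i < N; ++i) {
        x[i] = 1.0f;
        y[i] = 0.0f;
    }
    const double ops = 2.0 * N * NITER;

    /* Pass 1: the SAXPY loop alone. */
    double t0 = cpu_seconds();
    for (int it = 0; it < NITER; ++it)
        saxpy(N, 0.5f, x, y);
    double t_plain = cpu_seconds() - t0;

    /* Pass 2: the same loop with a timer call on every iteration. */
    volatile double sink = 0.0;
    t0 = cpu_seconds();
    for (int it = 0; it < NITER; ++it) {
        saxpy(N, 0.5f, x, y);
        sink += cpu_seconds();
    }
    double t_timed = cpu_seconds() - t0;

    printf("operations             = %.0f\n", ops);
    printf("time, no timer calls   = %.6f s (%.1f Mflop/s)\n",
           t_plain, mflops(ops, t_plain));
    printf("time, with timer calls = %.6f s (%.1f Mflop/s)\n",
           t_timed, mflops(ops, t_timed));
    printf("overhead per call      = %.3E s\n", (t_timed - t_plain) / NITER);
    printf("y[0] = %.1f\n", y[0]);  /* keeps the loops from being removed */
    return 0;
}
\end{verbatim}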
  755 
  756 \subsubsection{Testing IEEE arithmetic and ILAENV}\label{testieee}
  757 
  758 %\textbf{If you are installing LAPACK on a non-IEEE machine, you MUST
  759 %modify ILAENV!  Otherwise, ILAENV will crash .  By default, ILAENV
  760 %assumes an IEEE machine, and does a test for IEEE-754 compliance.}
  761 
Because some newer routines in LAPACK rely on IEEE-754 compliance,
two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
(\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
infinity arithmetic, respectively.  By default, ILAENV assumes an IEEE
machine and performs a test for IEEE-754 compliance.  \textbf{NOTE:  If you
are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
as this test inside ILAENV will crash!}
  769 
If ILAENV is called as \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )},
it returns \texttt{ILAENV=1} to signal IEEE-754 compliance
and \texttt{ILAENV=0} if the architecture is not IEEE-754 compliant.
  773 
Thus, for non-IEEE machines, you must hard-code the setting
\texttt{ILAENV=0} for \texttt{ISPEC=10} and \texttt{ISPEC=11} in the version
of \texttt{LAPACK/SRC/ilaenv.f} to be put in
your library.  Specialized testing and timing\footnotemark[\value{footnote}] versions of
ILAENV must be modified in the same way:
  779 \begin{itemize}
\item Testing version: \texttt{LAPACK/TESTING/LIN/ilaenv.f}
\item Testing version: \texttt{LAPACK/TESTING/EIG/ilaenv.f}
\item Timing version: \texttt{LAPACK/TIMING/LIN/ilaenv.f}
\item Timing version: \texttt{LAPACK/TIMING/EIG/ilaenv.f}
  784 \end{itemize}
  785 
  786 %Some new routines in LAPACK rely on IEEE-754 compliance, and if non-compliance
  787 %is detected (via a call to the function ILAENV), alternative (slower)
  788 %algorithms will be chosen.
  789 %For further details, refer to the leading comments of routines such
  790 %as \texttt{LAPACK/SRC/sstevr.f}.
  791 
The test program in \texttt{LAPACK/INSTALL/tstiee.f} checks whether
infinity arithmetic and NaN arithmetic on the installation architecture
are IEEE-754 compliant, and prints a warning message
if non-compliance is detected.
The same test is performed inside the function ILAENV when it is called
with \texttt{ISPEC=10} or \texttt{ISPEC=11}.
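The kind of checks involved can be sketched in C.  The functions below are
written in the spirit of \texttt{tstiee.f} but are not a translation of it;
their names are hypothetical, and they assume default floating-point settings
(no flush-to-zero, no trapping on division by zero, and no options such as
\texttt{-ffast-math} that let the compiler assume NaN and infinity never
occur).  Each returns 1 or 0, mirroring the convention used by ILAENV for
\texttt{ISPEC=10} and \texttt{ISPEC=11}.

\begin{verbatim}
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Checks in the spirit of tstiee.f: 1 if the arithmetic behaves as
   IEEE-754 requires, 0 otherwise.  volatile stops the compiler from
   folding the operations at compile time, and storing results in float
   variables discards any extra precision. */
static int inf_arithmetic_ok(void)
{
    volatile float zero = 0.0f, one = 1.0f, big = FLT_MAX;
    volatile float posinf = one / zero;
    volatile float neginf = -one / zero;
    volatile float ovf = big * 2.0f;        /* overflow should give +Inf */

    if (!(posinf > big))           return 0;
    if (!(neginf < -big))          return 0;
    if (posinf + posinf != posinf) return 0;
    if (ovf != posinf)             return 0;
    if (one / posinf != 0.0f)      return 0;
    return 1;
}

static int nan_arithmetic_ok(void)
{
    volatile float zero = 0.0f, one = 1.0f;
    volatile float qnan = zero / zero;      /* 0/0 should give NaN */
    volatile float sum = qnan + one;        /* NaN should propagate */

    if (qnan == qnan)             return 0; /* NaN never equals itself */
    if (sum == sum)               return 0;
    if (qnan < one || qnan > one) return 0; /* comparisons with NaN are false */
    return 1;
}

int main(void)
{
    int inf_ok = inf_arithmetic_ok();
    int nan_ok = nan_arithmetic_ok();
    printf("infinity arithmetic %s IEEE-754 compliant\n", inf_ok ? "is" : "is NOT");
    printf("NaN arithmetic %s IEEE-754 compliant\n", nan_ok ? "is" : "is NOT");
    return 0;
}
\end{verbatim}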
  800 
To avoid running the IEEE test inside ILAENV every time
\texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is called, we suggest
that you hard-code the setting
\texttt{ILAENV=1} or \texttt{ILAENV=0} in the version of \texttt{LAPACK/SRC/ilaenv.f} to be put in
your library.  As mentioned above, the specialized testing and
timing\footnotemark[\value{footnote}] versions of ILAENV will also need to be modified.
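The effect of hard-coding is that the answer is fixed when the library is
built.  A C analogue is sketched below; the macros \texttt{IEEE\_NAN\_OK} and
\texttt{IEEE\_INF\_OK} and the function \texttt{ieee\_query} are hypothetical,
while \texttt{\_\_STDC\_IEC\_559\_\_} is the standard C99 macro a compiler
defines when it claims IEC 60559 (IEEE-754) conformance.  The change LAPACK
actually requires is still the edit to the Fortran \texttt{ilaenv.f} files
listed above.

\begin{verbatim}
#include <assert.h>
#include <stdio.h>

/* Hard-coded answers to the two IEEE queries, fixed at build time
   instead of being tested on every call.  IEEE_NAN_OK and IEEE_INF_OK
   are hypothetical macros an installer would set (for example
   -DIEEE_NAN_OK=0 on a non-IEEE machine); by default they follow
   __STDC_IEC_559__, which a C99 compiler defines when it claims
   IEEE-754 (IEC 60559) conformance. */
#ifndef IEEE_NAN_OK
#  ifdef __STDC_IEC_559__
#    define IEEE_NAN_OK 1
#  else
#    define IEEE_NAN_OK 0
#  endif
#endif
#ifndef IEEE_INF_OK
#  ifdef __STDC_IEC_559__
#    define IEEE_INF_OK 1
#  else
#    define IEEE_INF_OK 0
#  endif
#endif

/* ispec 10: NaN arithmetic, ispec 11: infinity arithmetic (as in
   ILAENV).  Any other ispec returns -1 in this sketch. */
static int ieee_query(int ispec)
{
    switch (ispec) {
    case 10: return IEEE_NAN_OK;
    case 11: return IEEE_INF_OK;
    default: return -1;
    }
}

int main(void)
{
    printf("ISPEC=10 (NaN) -> %d\n", ieee_query(10));
    printf("ISPEC=11 (Inf) -> %d\n", ieee_query(11));
    assert(ieee_query(10) == 0 || ieee_query(10) == 1);
    return 0;
}
\end{verbatim}

Defaulting to 0 when the conformance macro is absent is the conservative
choice: it selects the non-IEEE code paths rather than risking a failure on a
machine whose arithmetic is not compliant.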
  807 
  808 \subsection{Create the BLAS Library}
  809 
  810 Ideally, a highly optimized version of the BLAS library already
  811 exists on your machine.
  812 In this case you can go directly to Section~\ref{testblas} to
  813 make the BLAS test programs.
  814 
  815 \begin{itemize}
  816 \item[a)]
  817 Go to \texttt{LAPACK} and edit the definition of \texttt{blaslib} in the
  818 file \texttt{Makefile} to specify the data types desired, as in the example
  819 in Section~\ref{toplevelmakefile}.
  820 
  821 If you already have some of the BLAS, you will need to edit the file
  822 \texttt{LAPACK/BLAS/SRC/Makefile} to comment out the lines
  823 defining the BLAS you have.
  824 
  825 \item[b)]
  826 Type \texttt{make blaslib}.
  827 The make command can be run more than once to add another
  828 data type to the library if necessary.
  829 \end{itemize}
  830 
  831 \noindent
  832 The BLAS library is created in \texttt{LAPACK/librefblas.a},
  833 or in the user-defined location specified by \texttt{BLASLIB} in the file
  834 \texttt{LAPACK/make.inc}.
  835 
  836 \subsection{Run the BLAS Test Programs}\label{testblas}
  837 
  838 Test programs for the Level 1, 2, and 3 BLAS are in the directory
  839 \texttt{LAPACK/BLAS/TESTING}.
  840 
To compile and run the Level 1, 2, and 3 BLAS test programs,
go to \texttt{LAPACK} and type \texttt{make blas\_testing}.  The executable
files are called \texttt{xblat\_s}, \texttt{xblat\_d}, \texttt{xblat\_c}, and
\texttt{xblat\_z}, where the \_ (underscore) is replaced by 1, 2, or 3,
depending on the level of BLAS being tested; for example, \texttt{xblat2d}
tests the Level 2 DOUBLE PRECISION BLAS.  All executable and
output files are created in \texttt{LAPACK/BLAS/}.
  847 For the Level 1 BLAS tests, the output file names are \texttt{sblat1.out},
  848 \texttt{dblat1.out}, \texttt{cblat1.out}, and \texttt{zblat1.out}.  For the Level
  849 2 and 3 BLAS, the name of the output file is indicated on the first line of the
  850 input file and is currently defined to be \texttt{sblat2.out} for
  851 the Level 2 REAL version, and \texttt{sblat3.out} for the Level 3 REAL
  852 version, with similar names for the other data types.
  853 
  854 If the tests using the supplied data files were completed successfully,
  855 consider whether the tests were sufficiently thorough.
  856 For example, on a machine with vector registers, at least one value
  857 of $N$ greater than the length of the vector registers should be used;
  858 otherwise, important parts of the compiled code may not be
  859 exercised by the tests.
  860 If the tests were not successful, either because the program did not
  861 finish or the test ratios did not pass the threshold, you will
  862 probably have to find and correct the problem before continuing.
  863 If you have been testing a system-specific
  864 BLAS library, try using the Fortran BLAS for the routines that
  865 did not pass the tests.
  866 For more details on the BLAS test programs,
  867 see \cite{BLAS2-test} and \cite{BLAS3-test}.
  868 
  869 \subsection{Create the LAPACK Library}
  870 
  871 \begin{itemize}
  872 \item[a)]
  873 Go to the directory \texttt{LAPACK} and edit the definition of
  874 \texttt{lapacklib} in the file \texttt{Makefile} to specify the data types desired,
  875 as in the example in Section~\ref{toplevelmakefile}.
  876 
  877 \item[b)]
  878 Type \texttt{make lapacklib}.
  879 The make command can be run more than once to add another
  880 data type to the library if necessary.
  881 
  882 \end{itemize}
  883 
  884 \noindent
  885 The LAPACK library is created in \texttt{LAPACK/liblapack.a},
  886 or in the user-defined location specified by \texttt{LAPACKLIB} in the file
  887 \texttt{LAPACK/make.inc}.
  888 
  889 \subsection{Create the Test Matrix Generator Library}
  890 
  891 \begin{itemize}
  892 \item[a)]
  893 Go to the directory \texttt{LAPACK} and edit the definition of \texttt{tmglib}
  894 in the file \texttt{Makefile} to specify the data types desired, as in the
  895 example in Section~\ref{toplevelmakefile}.
  896 
  897 \item[b)]
  898 Type \texttt{make tmglib}.
  899 The make command can be run more than once to add another
  900 data type to the library if necessary.
  901 
  902 \end{itemize}
  903 
  904 \noindent
  905 The test matrix generator library is created in \texttt{LAPACK/libtmglib.a},
  906 or in the user-defined location specified by \texttt{TMGLIB} in the file
  907 \texttt{LAPACK/make.inc}.
  908 
  909 \subsection{Run the LAPACK Test Programs}
  910 
  911 There are two distinct test programs for LAPACK routines
  912 in each data type, one for the linear equation routines and
  913 one for the eigensystem routines.
  914 In each data type, there is one input file for testing the linear
  915 equation routines and eighteen input files for testing the eigenvalue
  916 routines.
  917 The input files reside in \texttt{LAPACK/TESTING}.
  918 For more information on the test programs and how to modify the
  919 input files, please refer to LAPACK Working Note 41~\cite{WN41}.
  920 % see Section~\ref{moretesting}.
  921 
If you do not wish to run each of the tests individually, you can
go to \texttt{LAPACK}, edit the definition of \texttt{lapack\_testing} in the file
\texttt{Makefile} to specify the data types desired, and type \texttt{make
lapack\_testing}.  This will
compile and run the tests as described in Sections~\ref{testlin}
and~\ref{testeig}.
  928 
  929 %If you are installing LAPACK on a Silicon Graphics machine, you must
  930 %modify the definition of \texttt{testing} to be
  931 %\begin{verbatim}
  932 %testing:
  933 %        ( cd TESTING; $(MAKE) -f Makefile.sgi )
  934 %\end{verbatim}
  935 
  936 \subsubsection{Testing the Linear Equations Routines}\label{testlin}
  937 
  938 \begin{itemize}
  939 
  940 \item[a)]
  941 Go to \texttt{LAPACK/TESTING/LIN} and type \texttt{make} followed by the data types
  942 desired.  The executable files are called \texttt{xlintsts, xlintstc,
  943 xlintstd}, or \texttt{xlintstz} and are created in \texttt{LAPACK/TESTING}.
  944 
  945 \item[b)]
  946 Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
  947 For the REAL version, the command is
  948 \begin{list}{}{}
  949 \item{} \texttt{xlintsts  < stest.in > stest.out}
  950 \end{list}
  951 
  952 \noindent
  953 The tests using \texttt{xlintstd}, \texttt{xlintstc}, and \texttt{xlintstz} are similar
  954 with the leading `s' in the input and output file names replaced
  955 by `d', `c', or `z'.
  956 
  957 \end{itemize}
  958 
  959 If you encountered failures in this phase of the testing process, please
  960 refer to Section~\ref{sendresults}.
  961 
  962 \subsubsection{Testing the Eigensystem Routines}\label{testeig}
  963 
  964 \begin{itemize}
  965 
  966 \item[a)]
  967 Go to \texttt{LAPACK/TESTING/EIG} and type \texttt{make} followed by the data types
  968 desired.  The executable files are called \texttt{xeigtsts,
  969 xeigtstc, xeigtstd}, and \texttt{xeigtstz} and are created
  970 in \texttt{LAPACK/TESTING}.
  971 
  972 \item[b)]
  973 Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
  974 The tests for the eigensystem routines use eighteen separate input files
  975 for testing the nonsymmetric eigenvalue problem,
  976 the symmetric eigenvalue problem, the banded symmetric eigenvalue
  977 problem, the generalized symmetric eigenvalue
  978 problem, the generalized nonsymmetric eigenvalue problem, the
  979 singular value decomposition, the banded singular value decomposition,
  980 the generalized singular value
  981 decomposition, the generalized QR and RQ factorizations, the generalized
  982 linear regression model, and the constrained linear least squares
  983 problem.
  984 The tests for the REAL version are as follows:
  985 \begin{list}{}{}
  986 \item \texttt{xeigtsts  < nep.in > snep.out}
  987 \item \texttt{xeigtsts  < sep.in > ssep.out}
  988 \item \texttt{xeigtsts  < svd.in > ssvd.out}
  989 \item \texttt{xeigtsts  < sec.in > sec.out}
  990 \item \texttt{xeigtsts  < sed.in > sed.out}
  991 \item \texttt{xeigtsts  < sgg.in > sgg.out}
  992 \item \texttt{xeigtsts  < sgd.in > sgd.out}
  993 \item \texttt{xeigtsts  < ssg.in > ssg.out}
  994 \item \texttt{xeigtsts  < ssb.in > ssb.out}
  995 \item \texttt{xeigtsts  < sbb.in > sbb.out}
  996 \item \texttt{xeigtsts  < sbal.in > sbal.out}
  997 \item \texttt{xeigtsts  < sbak.in > sbak.out}
  998 \item \texttt{xeigtsts  < sgbal.in > sgbal.out}
  999 \item \texttt{xeigtsts  < sgbak.in > sgbak.out}
 1000 \item \texttt{xeigtsts  < glm.in > sglm.out}
 1001 \item \texttt{xeigtsts  < gqr.in > sgqr.out}
 1002 \item \texttt{xeigtsts  < gsv.in > sgsv.out}
 1003 \item \texttt{xeigtsts  < lse.in > slse.out}
 1004 \end{list}
 1005 The tests using \texttt{xeigtstc}, \texttt{xeigtstd}, and \texttt{xeigtstz} also
 1006 use the input files \texttt{nep.in}, \texttt{sep.in}, \texttt{svd.in},
 1007 \texttt{glm.in}, \texttt{gqr.in}, \texttt{gsv.in}, and \texttt{lse.in},
 1008 but the leading `s' in the other input file names must be changed
 1009 to `c', `d', or `z'.
 1010 \end{itemize}
 1011 
 1012 If you encountered failures in this phase of the testing process, please
 1013 refer to Section~\ref{sendresults}.
 1014 
 1015 \subsection{Run the LAPACK Timing Programs (For LAPACK 3.0 and before)}
 1016 
 1017 There are two distinct timing programs for LAPACK routines
 1018 in each data type, one for the linear equation routines and
 1019 one for the eigensystem routines.  The timing program for the
 1020 linear equation routines is also used to time the BLAS.
 1021 We encourage you to conduct these timing experiments
 1022 in REAL and COMPLEX or in DOUBLE PRECISION and COMPLEX*16; it is
 1023 not necessary to send timing results in all four data types.
 1024 
 1025 Two sets of input files are provided, a small set and a large set.
 1026 The small data sets are appropriate for a standard workstation or
 1027 other non-vector machine.
 1028 The large data sets are appropriate for supercomputers, vector
 1029 computers, and high-performance workstations.
 1030 We are mainly interested in results from the large data sets, and
 1031 it is not necessary to run both the large and small sets.
The values of N in the large data sets are about five times larger
than those in the small data sets,
and the large data sets use additional values for parameters such as the
block size NB and the leading array dimension LDA.
The names of the small data sets end in \texttt{\_small}, as in
\texttt{stime\_small.in}, and the names of the large data sets end in \texttt{\_large},
as in \texttt{stime\_large.in}.
 1039 Except as noted, the leading `s' in the input file name must be
 1040 replaced by `d', `c', or `z' for the other data types.
 1041 
 1042 We encourage you to obtain timing results with the large data sets,
 1043 as this allows us to compare different machines.
 1044 If this would take too much time, suggestions for paring back the large
 1045 data sets are given in the instructions below.
 1046 We also encourage you to experiment with these timing
 1047 programs and send us any interesting results, such as results for
 1048 larger problems or for a wider range of block sizes.
 1049 The main programs are dimensioned for the large data sets,
 1050 so the parameters in the main program may have to be reduced in order
 1051 to run the small data sets on a small machine, or increased to run
 1052 experiments with larger problems.
 1053 
 1054 The minimum time each subroutine will be timed is set to 0.0 in
 1055 the large data files and to 0.05 in the small data files, and on
 1056 many machines this value should be increased.
 1057 If the timing interval is not long
 1058 enough, the time for the subroutine after subtracting the overhead
 1059 may be very small or zero, resulting in megaflop rates that are
 1060 very large or zero. (To avoid division by zero, the megaflop rate is
 1061 set to zero if the time is less than or equal to zero.)
 1062 The minimum time that should be used depends on the machine and the
 1063 resolution of the clock.
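The interplay between the minimum time, the overhead subtraction, and the
zero-time rule can be illustrated with a small C sketch.  It mimics the idea
rather than the actual LAPACK timing code: a hypothetical kernel is called
repeatedly until at least \texttt{tmin} seconds have elapsed, the cost of an
equal number of loop passes that only read the clock is subtracted, and the
megaflop rate is set to zero when the net time is not positive.  With
\texttt{tmin} = 0.0, as in the large data files, the kernel runs once and the
net time is typically at or below the clock resolution.

\begin{verbatim}
#include <assert.h>
#include <stdio.h>
#include <time.h>

static double cpu_seconds(void)
{
    return (double) clock() / CLOCKS_PER_SEC;
}

/* Megaflop rate, set to zero when the net time is not positive. */
static double mflop_rate(double ops_per_call, long calls, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return ops_per_call * (double) calls / seconds / 1.0e6;
}

/* Hypothetical kernel standing in for the routine being timed;
   counted as 2 * 10000 operations per call for this illustration. */
#define KERNEL_OPS 20000.0
static volatile double sink;
static void kernel(void)
{
    double s = 0.0;
    for (int i = 0; i < 10000; ++i)
        s += (double) i * 1.0e-3;
    sink = s;
}

/* Repeat the kernel until at least tmin seconds have elapsed, then
   subtract the cost of the same number of passes that only read the
   clock.  Returns the net time and stores the number of calls. */
static double net_seconds(double tmin, long *calls_out)
{
    long calls = 0;
    double t0 = cpu_seconds(), elapsed;
    do {
        kernel();
        ++calls;
        elapsed = cpu_seconds() - t0;
    } while (elapsed < tmin);

    volatile double dummy = 0.0;
    double t1 = cpu_seconds();
    for (long i = 0; i < calls; ++i)
        dummy = cpu_seconds() - t1;
    double overhead = cpu_seconds() - t1;

    *calls_out = calls;
    return elapsed - overhead;
}

int main(void)
{
    const double tmins[] = { 0.0, 0.05, 0.2 };
    for (int k = 0; k < 3; ++k) {
        long calls;
        double t = net_seconds(tmins[k], &calls);
        printf("tmin = %4.2f s: %8ld calls, net time %.4f s, %8.1f Mflop/s\n",
               tmins[k], calls, t, mflop_rate(KERNEL_OPS, calls, t));
    }
    return 0;
}
\end{verbatim}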
 1064 
 1065 For more information on the timing programs and how to modify the
 1066 input files, please refer to LAPACK Working Note 41~\cite{WN41}.
 1067 % see Section~\ref{moretiming}.
 1068 
If you do not wish to run each of the timings individually, you can
go to \texttt{LAPACK}, edit the definition of \texttt{lapack\_timing} in the file
\texttt{Makefile} to specify the data types desired, and type \texttt{make
lapack\_timing}.  This will compile
and run the timings for the linear equation routines and the eigensystem
routines (see Sections~\ref{timelin} and~\ref{timeeig}).
 1075 
 1076 %If you are installing LAPACK on a Silicon Graphics machine, you must
 1077 %modify the definition of \texttt{timing} to be
 1078 %\begin{verbatim}
 1079 %timing:
 1080 %        ( cd TIMING; $(MAKE) -f Makefile.sgi )
 1081 %\end{verbatim}
 1082 
 1083 If you encounter failures in any phase of the timing process, please
 1084 feel free to contact the authors as directed in Section~\ref{sendresults}.
 1085 Tell us the
 1086 type of machine on which the tests were run, the version of the operating
 1087 system, the compiler and compiler options that were used,
 1088 and details of the BLAS library or libraries that you used.  You should
 1089 also include a copy of the output file in which the failure occurs.
 1090 
Please note that the BLAS
timing runs still need to be performed separately, as described in Section~\ref{timeblas}.
 1093 
 1094 \subsubsection{Timing the Linear Equations Routines}\label{timelin}
 1095 
 1096 The linear equation timing program is found in \texttt{LAPACK/TIMING/LIN}
 1097 and the input files are in \texttt{LAPACK/TIMING}.
 1098 Three input files are provided in each data type for timing the
 1099 linear equation routines, one for square matrices, one for band
 1100 matrices, and one for rectangular matrices.  The small data sets for the REAL version
 1101 are \texttt{stime\_small.in}, \texttt{sband\_small.in}, and \texttt{stime2\_small.in}, respectively,
 1102 and the large data sets are
 1103 \texttt{stime\_large.in}, \texttt{sband\_large.in}, and \texttt{stime2\_large.in}.
 1104 
 1105 The timing program for the least squares routines uses special instrumented
 1106 versions of the LAPACK routines to time individual sections of the code.
 1107 The first step in compiling the timing program is therefore to make a library
 1108 of the instrumented routines.
 1109 
 1110 \begin{itemize}
 1111 \item[a)]
 1112 \begin{sloppypar}
 1113 To make a library of the instrumented LAPACK routines, first
 1114 go to \texttt{LAPACK/TIMING/LIN/LINSRC} and type \texttt{make} followed
 1115 by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
 1116 The library of instrumented code is created in
 1117 \texttt{LAPACK/TIMING/LIN/linsrc.a}.
 1118 \end{sloppypar}
 1119 
 1120 \item[b)]
 1121 To make the linear equation timing programs,
 1122 go to \texttt{LAPACK/TIMING/LIN} and type \texttt{make} followed by the data
 1123 types desired, as in the examples in Section~\ref{toplevelmakefile}.
 1124 The executable files are called \texttt{xlintims},
 1125 \texttt{xlintimc}, \texttt{xlintimd}, and \texttt{xlintimz} and are created
 1126 in \texttt{LAPACK/TIMING}.
 1127 
 1128 \item[c)]
 1129 Go to \texttt{LAPACK/TIMING} and
 1130 make any necessary modifications to the input files.
 1131 You may need to set the minimum time a subroutine will
 1132 be timed to a positive value, or to restrict the size of the tests
 1133 if you are using a computer with performance in between that of a
 1134 workstation and that of a supercomputer.
 1135 The computational requirements can be cut in half by using only one
 1136 value of LDA.
 1137 If it is necessary to also reduce the matrix sizes or the values of
 1138 the blocksize, corresponding changes should be made to the
 1139 BLAS input files (see Section~\ref{timeblas}).
 1140 
 1141 \item[d)]
 1142 Run the programs for each data type you are using.
 1143 For the REAL version, the commands for the small data sets are
 1144 
 1145 \begin{list}{}{}
 1146 \item{} \texttt{xlintims < stime\_small.in > stime\_small.out }
 1147 \item{} \texttt{xlintims < sband\_small.in > sband\_small.out }
 1148 \item{} \texttt{xlintims < stime2\_small.in > stime2\_small.out }
 1149 \end{list}
 1150 or the commands for the large data sets are
 1151 \begin{list}{}{}
 1152 \item{} \texttt{xlintims < stime\_large.in > stime\_large.out }
 1153 \item{} \texttt{xlintims < sband\_large.in > sband\_large.out }
 1154 \item{} \texttt{xlintims < stime2\_large.in > stime2\_large.out }
 1155 \end{list}
 1156 
 1157 \noindent
 1158 Similar commands should be used for the other data types.
 1159 \end{itemize}
 1160 
 1161 \subsubsection{Timing the BLAS}\label{timeblas}
 1162 
 1163 The linear equation timing program is also used to time the BLAS.
 1164 Three input files are provided in each data type for timing the Level
 1165 2 and 3 BLAS.
 1166 These input files time the BLAS using the matrix shapes encountered
 1167 in the LAPACK routines, and we will use the results to analyze the
 1168 performance of the LAPACK routines.
 1169 For the REAL version, the small data files are
 1170 \texttt{sblasa\_small.in}, \texttt{sblasb\_small.in}, and \texttt{sblasc\_small.in}
 1171 and the large data files are
 1172 \texttt{sblasa\_large.in}, \texttt{sblasb\_large.in}, and \texttt{sblasc\_large.in}.
 1173 There are three sets of inputs because there are three
 1174 parameters in the Level 3 BLAS, M, N, and K, and
 1175 in most applications one of these parameters is small (on the order
 1176 of the blocksize) while the other two are large (on the order of the
 1177 matrix size).
 1178 In \texttt{sblasa\_small.in}, M and N are large but K is
 1179 small, while in \texttt{sblasb\_small.in} the small parameter is M, and
 1180 in \texttt{sblasc\_small.in} the small parameter is N.
 1181 The Level 2 BLAS are timed only in the first data set, where K
 1182 is also used as the bandwidth for the banded routines.
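To see why one small parameter matters, recall the standard operation count
for a matrix product (a general fact, not something read from the input
files): SGEMM computing $C := \alpha AB + \beta C$ with $A$ of size
$M \times K$ and $B$ of size $K \times N$ performs about
\[
  2MNK
\]
floating-point operations.  With the shape used in \texttt{sblasa\_small.in}
($M = N = n$ large, $K$ equal to the block size NB), this is
$2n^2\,\mathrm{NB}$; for illustrative values $n = 500$ and $\mathrm{NB} = 32$,
one call performs about $2 \cdot 500^2 \cdot 32 = 1.6 \times 10^{7}$
operations.  This is the shape of the rank-NB trailing-matrix update in a
blocked factorization such as LU.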
 1183 
 1184 \begin{itemize}
 1185 
 1186 \item[a)]
 1187 Go to \texttt{LAPACK/TIMING} and
 1188 make any necessary modifications to the input files.
 1189 You may need to set the minimum time a subroutine will
 1190 be timed to a positive value.
 1191 If you modified the values of N or NB
 1192 in Section~\ref{timelin}, set M, N, and K accordingly.
 1193 The large parameters among M, N, and K
 1194 should be the same as the matrix sizes used in timing the linear
 1195 equation routines,
 1196 and the small parameter should be the same as the
 1197 blocksizes used in timing the linear equation routines.
 1198 If necessary, the large data set can be simplified by using only one
 1199 value of LDA.
 1200 
 1201 \item[b)]
 1202 Run the programs for each data type you are using.
 1203 For the REAL version, the commands for the small data sets are
 1204 
 1205 \begin{list}{}{}
 1206 \item{} \texttt{xlintims < sblasa\_small.in > sblasa\_small.out }
 1207 \item{} \texttt{xlintims < sblasb\_small.in > sblasb\_small.out }
 1208 \item{} \texttt{xlintims < sblasc\_small.in > sblasc\_small.out }
 1209 \end{list}
 1210 or the commands for the large data sets are
 1211 \begin{list}{}{}
 1212 \item{} \texttt{xlintims < sblasa\_large.in > sblasa\_large.out }
 1213 \item{} \texttt{xlintims < sblasb\_large.in > sblasb\_large.out }
 1214 \item{} \texttt{xlintims < sblasc\_large.in > sblasc\_large.out }
 1215 \end{list}
 1216 
 1217 \noindent
 1218 Similar commands should be used for the other data types.
 1219 \end{itemize}
 1220 
 1221 \subsubsection{Timing the Eigensystem Routines}\label{timeeig}
 1222 
 1223 The eigensystem timing program is found in \texttt{LAPACK/TIMING/EIG}
 1224 and the input files are in \texttt{LAPACK/TIMING}.
 1225 Four input files are provided in each data type for timing the
 1226 eigensystem routines,
 1227 one for the generalized nonsymmetric eigenvalue problem,
 1228 one for the nonsymmetric eigenvalue problem,
 1229 one for the symmetric and generalized symmetric eigenvalue problem,
 1230 and one for the singular value decomposition.
For the REAL version, the small data sets are called \texttt{sgeptim\_small.in},
\texttt{sneptim\_small.in}, \texttt{sseptim\_small.in}, and \texttt{ssvdtim\_small.in}, respectively,
and the large data sets are called \texttt{sgeptim\_large.in}, \texttt{sneptim\_large.in},
\texttt{sseptim\_large.in}, and \texttt{ssvdtim\_large.in}.
 1235 Each of the four input files reads a different set of parameters,
 1236 and the format of the input is indicated by a 3-character code
 1237 on the first line.
 1238 
 1239 The timing program for eigenvalue/singular value routines accumulates
 1240 the operation count as the routines are executing using special
 1241 instrumented versions of the LAPACK routines.  The first step in
 1242 compiling the timing program is therefore to make a library of the
 1243 instrumented routines.
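The idea of accumulating the count at run time can be sketched in C.  The
global counter \texttt{ops\_count} and both functions are hypothetical stand-ins
for the instrumented Fortran routines: each routine does its normal work and
adds its operation count to the counter.  Because the number of passes of an
iterative method depends on the data, as it does for eigenvalue algorithms,
the total cannot be predicted from the matrix size alone.

\begin{verbatim}
#include <assert.h>
#include <stdio.h>

/* Global operation counter, analogous to the count kept by the
   instrumented timing routines.  The name ops_count is hypothetical. */
static double ops_count = 0.0;

/* Instrumented dot product: does the work and records 2n operations. */
static float sdot_counted(int n, const float *x, const float *y)
{
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
        s += x[i] * y[i];
    ops_count += 2.0 * n;
    return s;
}

/* Toy iterative "routine": halves v until its squared norm is <= tol.
   The number of passes depends on the data, so the operation count is
   only known by accumulating it while the routine runs. */
static int shrink_until_small(float *v, int n, float tol)
{
    int passes = 0;
    while (sdot_counted(n, v, v) > tol) {
        for (int i = 0; i < n; ++i)
            v[i] *= 0.5f;
        ops_count += n;   /* n multiplications for the scaling */
        ++passes;
    }
    return passes;
}

int main(void)
{
    float v[4] = { 1.0f, 1.0f, 1.0f, 1.0f };
    ops_count = 0.0;
    int passes = shrink_until_small(v, 4, 0.5f);
    printf("passes = %d, operations counted = %.0f\n", passes, ops_count);
    assert(passes == 2 && ops_count == 32.0);
    return 0;
}
\end{verbatim}

For the data in \texttt{main}, the program prints
\texttt{passes = 2, operations counted = 32}: three dot products of length 4
(8 operations each) plus two scalings of 4 multiplications each.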
 1244 
 1245 \begin{itemize}
 1246 \item[a)]
 1247 \begin{sloppypar}
 1248 To make a library of the instrumented LAPACK routines, first
 1249 go to \texttt{LAPACK/TIMING/EIG/EIGSRC} and type \texttt{make} followed
 1250 by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
 1251 The library of instrumented code is created in
 1252 \texttt{LAPACK/TIMING/EIG/eigsrc.a}.
 1253 \end{sloppypar}
 1254 
 1255 \item[b)]
 1256 To make the eigensystem timing programs,
 1257 go to \texttt{LAPACK/TIMING/EIG} and
 1258 type \texttt{make} followed by the data types desired, as in the examples
 1259 of Section~\ref{toplevelmakefile}.  The executable files are called
 1260 \texttt{xeigtims}, \texttt{xeigtimc}, \texttt{xeigtimd}, and \texttt{xeigtimz}
 1261 and are created in \texttt{LAPACK/TIMING}.
 1262 
 1263 \item[c)]
 1264 Go to \texttt{LAPACK/TIMING} and
 1265 make any necessary modifications to the input files.
 1266 You may need to set the minimum time a subroutine will
 1267 be timed to a positive value, or to restrict the number of tests
 1268 if you are using a computer with performance in between that of a
 1269 workstation and that of a supercomputer.
 1270 Instead of decreasing the matrix dimensions to reduce the time,
 1271 it would be better to reduce the number of matrix types to be timed,
 1272 since the performance varies more with the matrix size than with the
 1273 type.  For example, for the nonsymmetric eigenvalue routines,
 1274 you could use only one matrix of type 4 instead of four matrices of
 1275 types 1, 3, 4, and 6.
 1276 Refer to LAPACK Working Note 41~\cite{WN41} for further details.
 1277 %  See Section~\ref{moretiming} for further details.
 1278 
 1279 \item[d)]
 1280 Run the programs for each data type you are using.
 1281 For the REAL version, the commands for the small data sets are
 1282 
 1283 \begin{list}{}{}
 1284 \item{} \texttt{xeigtims < sgeptim\_small.in > sgeptim\_small.out }
 1285 \item{} \texttt{xeigtims < sneptim\_small.in > sneptim\_small.out }
 1286 \item{} \texttt{xeigtims < sseptim\_small.in > sseptim\_small.out }
 1287 \item{} \texttt{xeigtims < ssvdtim\_small.in > ssvdtim\_small.out }
 1288 \end{list}
 1289 or the commands for the large data sets are
 1290 \begin{list}{}{}
 1291 \item{} \texttt{xeigtims < sgeptim\_large.in > sgeptim\_large.out }
 1292 \item{} \texttt{xeigtims < sneptim\_large.in > sneptim\_large.out }
 1293 \item{} \texttt{xeigtims < sseptim\_large.in > sseptim\_large.out }
 1294 \item{} \texttt{xeigtims < ssvdtim\_large.in > ssvdtim\_large.out }
 1295 \end{list}
 1296 
 1297 \noindent
 1298 Similar commands should be used for the other data types.
 1299 \end{itemize}
 1300 
 1301 \subsection{Send the Results to Tennessee}\label{sendresults}
 1302 
 1303 Congratulations!  You have now finished installing, testing, and
 1304 timing LAPACK.  If you encountered failures in any phase of the
 1305 testing or timing process, please
 1306 consult our \texttt{release\_notes} file on netlib.
 1307 \begin{quote}
\url{http://www.netlib.org/lapack/release_notes}
 1309 \end{quote}
 1310 This file contains machine-dependent installation clues which hopefully will
 1311 alleviate your difficulties or at least let you know that other users
 1312 have had similar difficulties on that machine.  If there is not an entry
 1313 for your machine or the suggestions do not fix your problem, please feel
 1314 free to contact the authors at
 1315 \begin{list}{}{}
 1316 \item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
 1317 \end{list}
 1318 Tell us the
 1319 type of machine on which the tests were run, the version of the operating
 1320 system, the compiler and compiler options that were used,
 1321 and details of the BLAS library or libraries that you used.  You should
 1322 also include a copy of the output file in which the failure occurs.
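
The machine and operating-system details requested above can be
collected with the standard \texttt{uname} facility.  The following C
sketch is only an illustration, assuming a POSIX system; the helper name
\texttt{describe\_system} is hypothetical, and the compiler, compiler
options, and BLAS details still have to be reported by hand.

\begin{verbatim}
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: write "sysname release (machine)", that is,
   the operating system, its version, and the hardware type, into buf.
   Returns the snprintf result, or -1 if uname() fails. */
int describe_system(char *buf, size_t len)
{
    struct utsname u;

    if (uname(&u) == -1)
        return -1;
    return snprintf(buf, len, "%s %s (%s)",
                    u.sysname, u.release, u.machine);
}
\end{verbatim}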

We would like to keep our \texttt{release\_notes} file as up-to-date as possible.
Therefore, if you do not see an entry for your machine, please contact us
with your testing results.

Comments and suggestions are also welcome.

We encourage you to make the LAPACK library available to your
users and provide us with feedback from their experiences.
%This release of LAPACK is not guaranteed to be compatible
%with any previous test release.

\subsection{Get Support}\label{getsupport}
First, consult the complete installation guide, LAPACK Working Note 41~\cite{WN41}.
If you still cannot solve your problem, you have two options:
\begin{itemize}
\item
either post a message on the LAPACK forum
\begin{quote}
\url{http://icl.cs.utk.edu/lapack-forum}
\end{quote}
\item
or send an email to the LAPACK mailing list:
\begin{list}{}{}
\item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
\end{list}
\end{itemize}
\section*{Acknowledgments}

Ed Anderson and Susan Blackford contributed to previous versions of this report.

\appendix

\chapter{Caveats}\label{appendixd}

In this appendix we list a few of the machine-specific difficulties we
have encountered in our own experience with LAPACK.  A more detailed list
of machine-dependent problems, bugs, and compiler errors encountered
in the LAPACK installation process is maintained
on \emph{netlib}:
\begin{quote}
\url{http://www.netlib.org/lapack/release_notes}
\end{quote}

We assume the user has installed the machine-specific routines
correctly and that the Level 1, 2, and 3 BLAS test programs have run
successfully, so we do not list any warnings associated with those
routines.

\section{\texttt{LAPACK/make.inc}}

All machine-specific
parameters are specified in the file \texttt{LAPACK/make.inc}.

The first line of this \texttt{make.inc} file is
\begin{quote}
\texttt{SHELL = /bin/sh}
\end{quote}
and must be changed to \texttt{SHELL = /sbin/sh} if you are
installing LAPACK on an SGI architecture.

\section{ETIME}

On HPPA architectures,
the compiler and linker flag \texttt{+U77} should be included to access
the function \texttt{ETIME}.

\section{ILAENV and IEEE-754 compliance}

%By default, ILAENV (\texttt{LAPACK/SRC/ilaenv.f}) assumes an IEEE and IEEE-754
%compliant architecture, and thus sets (\texttt{ILAENV=1}) for (\texttt{ISPEC=10})
%and (\texttt{ISPEC=11}) settings in ILAENV.
%
%If you are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
%as this test inside ILAENV will crash!

Because some newer LAPACK routines rely on IEEE-754 compliance,
two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
(\texttt{LAPACK/SRC/ilaenv.f}) to indicate IEEE-754 compliance for NaN and
infinity arithmetic, respectively.  By default, ILAENV assumes an IEEE
machine and performs a run-time test for IEEE-754 compliance.  \textbf{NOTE:  If you
are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
as this test inside ILAENV will crash!}

Thus, for non-IEEE machines, the user must hard-code
\texttt{ILAENV=0} for both \texttt{ISPEC=10} and \texttt{ISPEC=11} in the copy
of \texttt{LAPACK/SRC/ilaenv.f} that is compiled into the library.
For further details, refer to Section~\ref{testieee}.

Be aware that some compilers on IEEE machines do not enforce IEEE-754
compliance by default; in that case, a compiler flag must be set explicitly.

On SGIs, for example, you must set the \texttt{-OPT:IEEE\_NaN\_inf=ON} compiler
flag to enable IEEE-754 compliance.

Finally, the test inside ILAENV that detects IEEE-754 compliance
raises the IEEE exceptions ``Divide by Zero'' and ``Invalid Operation''.
Thus, if you are installing on a machine that issues IEEE exception
warning messages (such as a Sun SPARCstation), you can disregard these
messages.  To avoid them altogether, hard-code the values
inside ILAENV as explained in Section~\ref{testieee}.
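
These exceptions arise because a compliance check has to produce an
infinity and a NaN on purpose.  The C sketch below is not the code in
\texttt{ilaenv.f}; it is a hypothetical probe (\texttt{ieee\_probe}),
written assuming a C99 compiler that provides \texttt{<fenv.h>}, that
performs the same kind of divisions and reports the exception flags
they set.

\begin{verbatim}
#include <fenv.h>

/* Hypothetical probe in the spirit of the IEEE-754 check that ILAENV
   performs (this is not the LAPACK source).  Returns 1 if infinity and
   NaN behave as IEEE-754 requires, 0 otherwise.  On return, *raised
   holds the FE_DIVBYZERO and FE_INVALID flags set by the probe. */
int ieee_probe(int *raised)
{
    /* volatile keeps the divisions from being folded at compile time
       and keeps them ordered before the fetestexcept() call */
    volatile double zero = 0.0, one = 1.0;
    volatile double posinf, qnan;
    int ok = 1;

    feclearexcept(FE_ALL_EXCEPT);
    posinf = one / zero;            /* raises "Divide by Zero"    */
    qnan = zero / zero;             /* raises "Invalid Operation" */

    if (!(posinf > 1.0e300) || posinf != posinf + 1.0)
        ok = 0;                     /* infinity arithmetic is wrong */
    if (qnan == qnan)
        ok = 0;                     /* a NaN must not equal itself  */

    *raised = fetestexcept(FE_DIVBYZERO | FE_INVALID);
    return ok;
}
\end{verbatim}

On an IEEE-754 machine the function returns 1 and reports both flags,
which is the point: the flags, and any warnings a system prints for
them, are an expected side effect of the check rather than a failure.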

\section{Lack of \texttt{/tmp} space}

If \texttt{/tmp} space is small (i.e., less than approximately 16 MB) on your
architecture, you may run out of space
when compiling.  There are a few possible solutions to this problem.
\begin{enumerate}
\item You can ask your system administrator to increase the size of the
\texttt{/tmp} partition.
\item You can change the environment variable \texttt{TMPDIR} to point to
your home directory for temporary space.  For example, in the C shell,
\begin{quote}
\texttt{setenv TMPDIR /home/userid/}
\end{quote}
where \texttt{/home/userid/} is the user's home directory.  In a
Bourne-compatible shell, use \texttt{export TMPDIR=/home/userid/} instead.
\item If your archive command has an \texttt{l} option, you can change the
archive command to \texttt{ar crl} so that it places temporary files
in the current working directory rather than in the default temporary
directory \texttt{/tmp}.
\end{enumerate}

\section{BLAS}

If you suspect a BLAS-related problem and you are linking
with an optimized version of the BLAS, we strongly suggest,
as a first step, that you link to the Fortran~77 version of
the suspected BLAS routine and see whether the error disappears.

Although test programs for the Level 1 BLAS are included,
users should beware of a common problem in machine-specific
implementations of xNRM2,
the function that computes the 2-norm of a vector.
The Fortran version of xNRM2 avoids underflow or overflow
by scaling intermediate results, but some library versions of xNRM2
are not so careful about scaling.
If xNRM2 is implemented without scaling intermediate results, some of
the LAPACK test ratios may be unusually high, or
a floating-point exception may occur in the problems scaled near
underflow or overflow.
The solution to these problems is to link the Fortran version of
xNRM2 with the test program.  \emph{On some CRAY architectures, the Fortran~77
version of xNRM2 should be used.}
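
The effect of scaling can be seen in the following C sketch, which
contrasts an unscaled 2-norm with one that keeps a running scale
factor, in the manner of the reference Fortran xNRM2.  The names
\texttt{nrm2\_naive} and \texttt{nrm2\_scaled} are hypothetical, and
neither is a drop-in BLAS replacement: both handle only contiguous
double-precision vectors, with no increment argument.

\begin{verbatim}
#include <math.h>
#include <stddef.h>

/* Unscaled 2-norm: x[i]*x[i] can overflow to Inf or underflow to 0
   even when the norm itself is representable. */
double nrm2_naive(size_t n, const double *x)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * x[i];
    return sqrt(sum);
}

/* Scaled 2-norm: maintain norm^2 = scale^2 * ssq, where scale is the
   largest |x[i]| seen so far, so every squared ratio is at most 1 and
   no intermediate square overflows. */
double nrm2_scaled(size_t n, const double *x)
{
    double scale = 0.0, ssq = 1.0;
    for (size_t i = 0; i < n; ++i) {
        if (x[i] != 0.0) {
            double absxi = fabs(x[i]);
            if (scale < absxi) {
                double r = scale / absxi;   /* rescale old sum */
                ssq = 1.0 + ssq * r * r;
                scale = absxi;
            } else {
                double r = absxi / scale;
                ssq += r * r;
            }
        }
    }
    return scale * sqrt(ssq);
}
\end{verbatim}

For the vector $(10^{200}, 10^{200})$, \texttt{nrm2\_naive} returns
infinity while \texttt{nrm2\_scaled} returns $\sqrt{2}\cdot 10^{200}$;
for $(10^{-200}, 10^{-200})$, the squares underflow and
\texttt{nrm2\_naive} returns 0.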

\section{Optimization}

If a large number of test failures occurs for a specific matrix type
or operation, there may be an optimization problem with
your compiler.  Try reducing the level of optimization, or
eliminating optimization entirely, for those routines, and rerun
the tests to see whether the failures disappear.

%LAPACK is written in Fortran 77.  Prospective users with only a
%Fortran 66 compiler will not be able to use this package.

\section{Compiling testing/timing drivers}

The testing and timing main programs (xCHKAA, xCHKEE, xTIMAA, and
xTIMEE)
allocate large amounts of local storage.  Therefore, it is vitally
important to know whether your compiler allocates local
variables statically or on the stack by default.  Compilers that
place local variables on the stack commonly cause a stack
overflow at run time during testing or timing.  You then
have two options:  increase your stack size (for example, with
\texttt{ulimit -s} in a Bourne-compatible shell or \texttt{limit stacksize}
in the C shell), or force all local variables to be allocated statically.

On HPPA architectures, the
compiler and linker flag \texttt{-K} should be used when compiling these testing
and timing main programs to avoid such a stack overflow.  That is, set
\texttt{FFLAGS\_DRV = -K} in the \texttt{LAPACK/make.inc} file.

For similar reasons,
on SGI architectures, the compiler and linker flag \texttt{-static} should be
used.  That is, set \texttt{FFLAGS\_DRV = -static} in the \texttt{LAPACK/make.inc} file.
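
On POSIX systems, the stack limit that \texttt{ulimit -s} reports can
also be queried from a program, which may help when choosing between
increasing the stack size and forcing static allocation.  The
following C sketch uses the standard \texttt{getrlimit} call with
\texttt{RLIMIT\_STACK}; the helper name \texttt{stack\_limit} is
hypothetical, and the sketch only reports the limits.  It does not
change how the Fortran drivers allocate their local variables.

\begin{verbatim}
#include <sys/resource.h>

/* Hypothetical helper: store the soft and hard stack-size limits
   (in bytes, or RLIM_INFINITY for "unlimited") and return 0 on
   success, -1 on failure. */
int stack_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}
\end{verbatim}

An unprivileged process may raise its soft limit up to, but not
beyond, the hard limit; raising the hard limit requires appropriate
privileges.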

\section{IEEE arithmetic}

Some of our test matrices are scaled near overflow or underflow,
but on the Crays, problems with the arithmetic near overflow and
underflow forced us to scale by only the square root of overflow
and underflow.
The LAPACK auxiliary routine SLABAD (or DLABAD) is called to
take the square root of underflow and overflow in cases where the
full range could cause difficulties.
We assume we are on a Cray if $\log_{10}(\mathrm{overflow})$
is greater than 2000
and take the square root of underflow and overflow in this case.
The test in SLABAD is as follows:
\begin{verbatim}
      IF( LOG10( LARGE ).GT.2000. ) THEN
         SMALL = SQRT( SMALL )
         LARGE = SQRT( LARGE )
      END IF
\end{verbatim}
Users of other machines with similar restrictions on the effective
range of usable numbers may have to modify this test so that the
square roots are also taken on their machine.  \emph{On
HPPA architectures, a similar restriction in SLABAD usually needs to be
enforced for all testing involving complex arithmetic.}
SLABAD is located in \texttt{LAPACK/SRC}.
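
For readers adapting this test to another machine, the logic of SLABAD
can be sketched in C (LAPACK itself is Fortran~77).  The function name
\texttt{labad\_sketch} and its \texttt{threshold} argument are
hypothetical: SLABAD takes only SMALL and LARGE and hard-codes the
value 2000, and the extra argument simply marks the place where that
test would be changed.

\begin{verbatim}
#include <math.h>

/* Sketch of the SLABAD logic (not the LAPACK source).  If the
   exponent range looks like a Cray's, i.e. log10(*large) exceeds the
   threshold (2000 in SLABAD), replace the underflow and overflow
   thresholds by their square roots. */
void labad_sketch(double *small, double *large, double threshold)
{
    if (log10(*large) > threshold) {
        *small = sqrt(*small);
        *large = sqrt(*large);
    }
}
\end{verbatim}

With IEEE double precision, $\log_{10}$ of the overflow threshold is
about 308, so the default threshold of 2000 leaves both values
unchanged; a smaller threshold, such as 300, shows the effect of the
square roots.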

For machines that have a narrow exponent range or lack gradual
underflow (DEC VAXes, for example), it is not uncommon to experience
failures in \texttt{sec.out} and/or \texttt{dec.out} with SLAQTR/DLAQTR or DTRSYL.
The failures in SLAQTR/DLAQTR and DTRSYL
occur with very badly scaled test problems in which the norm of
the solution is very close to the underflow
threshold (or even underflows to zero).  We believe that these failures
could probably be avoided by an even greater degree of care in scaling,
but we did not want to delay the release of LAPACK any further.  These
tests pass successfully on most other machines.  An example failure in
\texttt{dec.out} on a MicroVAX II looks like the following:

\begin{verbatim}
Tests of the Nonsymmetric eigenproblem condition estimation routines
DLALN2, DLASY2, DLANV2, DLAEXC, DTRSYL, DTREXC, DTRSNA, DTRSEN, DLAQTR

Relative machine precision (EPS) =     0.277556D-16
Safe minimum (SFMIN)             =     0.587747D-38

Routines pass computational tests if test ratio is less than   20.00

DEC routines passed the tests of the error exits ( 35 tests done)
Error in DTRSYL: RMAX =   0.155D+07
LMAX =     5323 NINFO=    1600 KNT=   27648
Error in DLAQTR: RMAX =   0.344D+04
LMAX =    15792 NINFO=   26720 KNT=   45000
\end{verbatim}

\section{Timing programs}

In the eigensystem timing program, calls are made to the LINPACK
and EISPACK equivalents of the LAPACK routines to allow a direct
comparison of performance measures.
In some cases we have increased the maximum number of
iterations allowed in the LINPACK and EISPACK routines so that
they converge for our test problems, but
even this may not be enough.
One goal of the LAPACK project is to improve the convergence
properties of these routines, so error messages in the output
file indicating that a LINPACK or EISPACK routine did not
converge should not be regarded with alarm.

In the eigensystem timing program, we have equivalenced some work
arrays and then passed them to a subroutine, where both arrays are
modified.  This is a violation of the Fortran~77 standard, which
says ``if a subprogram reference causes a dummy argument in the
referenced subprogram to become associated with another dummy
argument in the referenced subprogram, neither dummy argument may
become defined during execution of the subprogram.''\footnote{ANSI
X3.9-1978, Sec.~15.9.3.6.}
If this causes any difficulties, the equivalence
can be commented out as explained in the comments for the main
eigensystem timing programs.
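
The standard forbids this association because a compiler may assume
that two dummy arguments never overlap, and may, for example, keep a
value in a register instead of reloading it after the other argument
is stored.  The C sketch below imitates that effect.  Both function
names are hypothetical; \texttt{update\_cached} hand-codes a
transformation that a Fortran compiler is permitted to make, because
in C (without \texttt{restrict}) the compiler may not itself assume
that the two arrays are distinct.

\begin{verbatim}
#include <stddef.h>

/* Rereads y[i] after x[i] has been stored: correct even if x and y
   refer to the same storage (as with EQUIVALENCEd work arrays). */
void update_reread(double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        x[i] = y[i] + 1.0;
        y[i] = 2.0 * y[i];
    }
}

/* Keeps y[i] in a local, as a compiler may do when it assumes x and
   y do not overlap.  Agrees with update_reread only when the arrays
   are distinct. */
void update_cached(double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        double yi = y[i];
        x[i] = yi + 1.0;
        y[i] = 2.0 * yi;
    }
}
\end{verbatim}

Called with distinct arrays, the two functions agree.  Called with the
same one-element array, initially 1, for both arguments,
\texttt{update\_reread} leaves 4 and \texttt{update\_cached} leaves 2.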

%\section*{MACHINE-SPECIFIC DIFFICULTIES}
%Some IBM compilers do not recognize DBLE as a generic function as used
%in LAPACK.  The software tools we use to convert from single precision
%to double precision convert REAL(C) and AIMAG(C), where C is COMPLEX,
%to DBLE(Z) and DIMAG(Z), where Z is COMPLEX*16, but
%IBM compilers use DREAL(Z) and DIMAG(Z) to take the real and
%imaginary parts of a double complex number.
%IBM users can fix this problem by changing DBLE to DREAL when the
%argument of DBLE is COMPLEX*16.
%
%IBM compilers do not permit the data type COMPLEX*16 in a FUNCTION
%subprogram definition.  The data type on the first line of the
%function subprogram must be changed from COMPLEX*16 to DOUBLE COMPLEX
%for the following functions:
%
%\begin{tabbing}
%\dent ZLATMOO \= from the test matrix generator library \kill
%\dent ZBEG \> from the Level 2 BLAS test program  \\
%\dent ZBEG \> from the Level 3 BLAS test program  \\
%\dent ZLADIV \> from the LAPACK library \\
%\dent ZLARND \> from the test matrix generator library \\
%\dent ZLATM2 \> from the test matrix generator library \\
%\dent ZLATM3 \> from the test matrix generator library
%\end{tabbing}
%The functions ZDOTC and ZDOTU from the Level 1 BLAS are already
%declared DOUBLE COMPLEX.  If that doesn't work, try the declaration
%COMPLEX FUNCTION*16.


\newpage
\addcontentsline{toc}{section}{Bibliography}

\begin{thebibliography}{9}

\bibitem{LUG}
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra,
J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney,
S. Ostrouchov, and D. Sorensen,
\textit{LAPACK Users' Guide}, Second Edition,
{SIAM}, Philadelphia, PA, 1995.

\bibitem{WN16}
E. Anderson and J. Dongarra,
\textit{LAPACK Working Note 16:
Results from the Initial Release of LAPACK},
University of Tennessee, CS-89-89, November 1989.

\bibitem{WN41}
E. Anderson, J. Dongarra, and S. Ostrouchov,
\textit{LAPACK Working Note 41:
Installation Guide for LAPACK},
University of Tennessee, CS-92-151, February 1992 (revised June 1999).

\bibitem{WN5}
C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum,
S. Hammarling, and D. Sorensen,
\textit{LAPACK Working Note 5:  Provisional Contents},
Argonne National Laboratory, ANL-88-38, September 1988.

\bibitem{WN13}
Z. Bai, J. Demmel, and A. McKenney,
\textit{LAPACK Working Note 13: On the Conditioning of the Nonsymmetric
Eigenvalue Problem:  Theory and Software},
University of Tennessee, CS-89-86, October 1989.

\bibitem{XBLAS}
X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar,
W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung,
and D. J. Yoo,
``Design, implementation and testing of extended and mixed precision BLAS,''
\textit{ACM Trans. Math. Soft.}, 28, 2:152--205, June 2002.

\bibitem{BLAS3}
J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
``A Set of Level 3 Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 16, 1:1--17, March 1990.
%Argonne National Laboratory, ANL-MCS-P88-1, August 1988.

\bibitem{BLAS3-test}
J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
``A Set of Level 3 Basic Linear Algebra Subprograms:
Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 16, 1:18--28, March 1990.
%Argonne National Laboratory, ANL-MCS-TM-119, June 1988.

\bibitem{BLAS2}
J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
``An Extended Set of Fortran Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 14, 1:1--17, March 1988.

\bibitem{BLAS2-test}
J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
``An Extended Set of Fortran Basic Linear Algebra Subprograms:
Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 14, 1:18--32, March 1988.

\bibitem{BLAS1}
C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh,
``Basic Linear Algebra Subprograms for Fortran Usage,''
\textit{ACM Trans. Math. Soft.}, 5, 3:308--323, September 1979.

\end{thebibliography}

\end{document}