\documentclass[11pt]{report}

\usepackage{indentfirst}
\usepackage[body={6in,8.5in}]{geometry}
\usepackage{hyperref}
\usepackage{graphicx}
\DeclareGraphicsRule{.ps}{eps}{}{}

\renewcommand{\thesection}{\arabic{section}}
\setcounter{tocdepth}{3}
\setcounter{secnumdepth}{3}

\begin{document}
\begin{center}
  {\Large LAPACK Working Note 81\\
  Quick Installation Guide for LAPACK on Unix Systems\footnote{This work was
  supported by NSF Grant No. ASC-8715728 and NSF Grant No. 0444486}}
\end{center}
\begin{center}
% Edward Anderson\footnote{Current address: Cray Research Inc.,
% 655F Lone Oak Drive, Eagan, MN 55121},
  The LAPACK Authors\\
  Department of Computer Science \\
  University of Tennessee \\
  Knoxville, Tennessee 37996-1301 \\
\end{center}
\begin{center}
  REVISED: VERSION 3.1.1, February 2007 \\
  REVISED: VERSION 3.2.0, November 2008
\end{center}

\begin{center}
  Abstract
\end{center}
This working note describes how to install and test version 3.2.0
of LAPACK, a linear algebra package for high-performance
computers, on a Unix System. The timing routines are not included in
release 3.2.0; the parts of this LAWN that discuss timing refer to
release 3.0. Also,
version 3.2.0 contains many prototype routines needing user feedback.
Non-Unix installation instructions and
further details of the testing and timing suites are contained only in
LAPACK Working Note 41, and not in this abbreviated version.
%Separate instructions are provided for the Unix and non-Unix
%versions of the test package.
%Further details are also given on the design of the test and timing
%programs.
\newpage

\tableofcontents

\newpage
% Introduction to Implementation Guide

\section{Introduction}

LAPACK is a linear algebra library for high-performance
computers.
The library includes Fortran subroutines for
the analysis and solution of systems of simultaneous linear algebraic
equations, linear least-squares problems, and matrix eigenvalue
problems.
Our approach to achieving high efficiency is based on the use of
a standard set of Basic Linear Algebra Subprograms (the BLAS),
which can be optimized for each computing environment.
By confining most of the computational work to the BLAS,
the subroutines should be
transportable and efficient across a wide range of computers.

This working note describes how to install, test, and time this
release of LAPACK on a Unix System.

The instructions for installing, testing, and timing\footnote{Timing
routines are provided only in LAPACK 3.0 and earlier.}
are designed for a person whose
responsibility is the maintenance of a mathematical software library.
We assume the installer has experience in compiling and running
Fortran programs and in creating object libraries.
The installation process involves untarring the file, creating a set of
libraries, and compiling and running the test and timing
programs\footnotemark[\value{footnote}].

%This guide combines the instructions for the Unix and non-Unix
%versions of the LAPACK test package (the non-Unix version is in Appendix
%~\ref{appendixe}).
%At this time, the non-Unix version of LAPACK can only be obtained
%after first untarring the Unix tar tape and then following the instructions in
%Appendix ~\ref{appendixe}.

Section~\ref{fileformat} describes how the files are organized in the
distribution file, and
Section~\ref{overview} gives a general overview of the parts of the test package.
Step-by-step instructions appear in Section~\ref{installation}.
%for the Unix version and in the appendix for the non-Unix version.

Users who want additional information should refer to LAPACK
Working Note 41.
% Sections~\ref{moretesting}
%and ~\ref{moretiming} give
%details of the test and timing programs and their input files.
%Appendices ~\ref{appendixa} and ~\ref{appendixb} briefly describe
%the LAPACK routines and auxiliary routines provided
%in this release.
%Appendix ~\ref{appendixc} lists the operation counts we have computed
%for the BLAS and for some of the LAPACK routines.
Appendix~\ref{appendixd}, entitled ``Caveats'', is a compendium of known
problems from our own experience, with suggestions on how to
overcome them.

\textbf{It is strongly advised that the user read Appendix
A before proceeding with the installation process.}
%Appendix E contains the execution times of the different test
%and timing runs on two sample machines.
%Appendix ~\ref{appendixe} contains the instructions to install LAPACK on a non-Unix
%system.

\section{Revisions Since the First Public Release}

Since its first public release in February 1992, LAPACK has had
several updates, which have encompassed the introduction of new routines
as well as extending the functionality of existing routines. The first
update,
June 30, 1992, was version 1.0a; the second update, October 31, 1992,
was version 1.0b; the third update, March 31, 1993, was version 1.1;
version 2.0 on September 30, 1994, coincided with the release of the
Second Edition of the LAPACK Users' Guide;
version 3.0 on June 30, 1999 coincided with the release of the Third Edition of
the LAPACK Users' Guide;
version 3.1 was released in November 2006;
version 3.1.1 was released in February 2007;
and version 3.2.0 was released in November 2008.

All LAPACK routines reflect the current version number with the date
on the routine indicating when it was last modified.
For more information on revisions in the latest release, please refer
to the \texttt{revisions.info} file in the lapack directory on netlib.
\begin{quote}
\url{http://www.netlib.org/lapack/revisions.info}
\end{quote}

%The distribution \texttt{tar} file \texttt{lapack.tar.z} that is
%available on netlib is always the most up-to-date.
%
%On-line manpages (troff files) for LAPACK driver and computational
%routines, as well as most of the BLAS routines, are available via
%the \texttt{lapack} index on netlib.

\section{File Format}\label{fileformat}

The software for LAPACK is distributed in the form of a
gzipped tar file (via anonymous ftp or the World Wide Web),
which contains the Fortran source for LAPACK,
the Basic Linear Algebra Subprograms
(the Level 1, 2, and 3 BLAS) needed by LAPACK, the testing programs,
and the timing programs\footnotemark[\value{footnote}].
Users who wish to have a non-Unix installation should refer to LAPACK
Working Note 41,
although the overview in Section~\ref{overview} applies to both the Unix and non-Unix
versions.
%Users who wish to have a non-Unix installation should go to Appendix ~\ref{appendixe},
%although the overview in section ~\ref{overview} applies to both the Unix and non-Unix
%versions.

The package may be accessed via the World Wide Web through
the URL address:
\begin{quote}
\url{http://www.netlib.org/lapack/lapack.tgz}
\end{quote}

Or, you can retrieve the file via anonymous ftp at netlib:

\begin{verbatim}
   ftp ftp.netlib.org
   login: anonymous
   password: <your email address>
   cd lapack
   binary
   get lapack.tgz
   quit
\end{verbatim}

The software in the \texttt{tar} file
is organized in a number of essential directories as shown
in Figure 1.
Please note that this figure does not reflect everything
that is contained in the \texttt{LAPACK} directory. Input and instructional
files are also located at various levels.
\begin{figure}
\vspace{11pt}
\centerline{\includegraphics[width=6.5in,height=3in]{org2.ps}}
\caption{Unix organization of LAPACK 3.0}
\vspace{11pt}
\end{figure}
Libraries are created in the LAPACK directory and
executable files are created in one of the directories BLAS, TESTING,
or TIMING\footnotemark[\value{footnote}]. Input files for the test and
timing\footnotemark[\value{footnote}] programs are also
found in these three directories so that testing may be carried out
in the directories LAPACK/BLAS, LAPACK/TESTING, and LAPACK/TIMING\footnotemark[\value{footnote}].
A top-level makefile in the LAPACK directory is provided to perform the
entire installation procedure.

\section{Overview of Tape Contents}\label{overview}

Most routines in LAPACK occur in four versions: REAL,
DOUBLE PRECISION, COMPLEX, and COMPLEX*16.
The first three versions (REAL, DOUBLE PRECISION, and COMPLEX)
are written in standard Fortran and are completely portable;
the COMPLEX*16 version is provided for
those compilers which allow this data type.
Some routines use features of Fortran 90.
For convenience, we often refer to routines by their single precision
names; the leading `S' can be replaced by a `D' for double precision,
a `C' for complex, or a `Z' for complex*16.
For LAPACK use and testing you must decide which version(s)
of the package you intend to install at your site (for example,
REAL and COMPLEX on a Cray computer or DOUBLE PRECISION and
COMPLEX*16 on an IBM computer).
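
As a small illustration of this naming convention, the following sketch
solves the same 2-by-2 linear system with the REAL driver SGESV and with its
DOUBLE PRECISION counterpart DGESV. The program name \texttt{PREFIX}, the
variable names, and the data are invented for this example, and the program
assumes that it is linked against LAPACK and a BLAS library.

\begin{verbatim}
      PROGRAM PREFIX
*     Illustrative sketch only: names and data are hypothetical.
      INTEGER N, NRHS
      PARAMETER ( N = 2, NRHS = 1 )
      REAL SA( N, N ), SB( N, NRHS )
      DOUBLE PRECISION DA( N, N ), DB( N, NRHS )
      INTEGER IPIV( N ), INFO
      EXTERNAL SGESV, DGESV
*     Column-major data for A = [ 2 1 ; 1 3 ] and b = ( 3, 5 ).
*     The exact solution is x = ( 0.8, 1.4 ).
      DATA SA / 2.0E0, 1.0E0, 1.0E0, 3.0E0 /
      DATA SB / 3.0E0, 5.0E0 /
      DATA DA / 2.0D0, 1.0D0, 1.0D0, 3.0D0 /
      DATA DB / 3.0D0, 5.0D0 /
*     Single precision: leading 'S'. The solution overwrites SB.
      CALL SGESV( N, NRHS, SA, N, IPIV, SB, N, INFO )
      WRITE( *, * ) 'SGESV: INFO =', INFO, ' X =', SB
*     Double precision: leading 'D'. The solution overwrites DB.
      CALL DGESV( N, NRHS, DA, N, IPIV, DB, N, INFO )
      WRITE( *, * ) 'DGESV: INFO =', INFO, ' X =', DB
      END
\end{verbatim}

\noindent
Only the leading letter of the routine name and the declared types of the
arrays differ between the two calls; the argument lists are otherwise the same.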

\subsection{LAPACK Routines}

There are three classes of LAPACK routines:
\begin{itemize}

\item \textbf{driver} routines solve a complete problem, such as solving
a system of linear equations or computing the eigenvalues of a real
symmetric matrix. Users are encouraged to use a driver routine if there
is one that meets their requirements. The driver routines are listed
in LAPACK Working Note 41~\cite{WN41} and the LAPACK Users' Guide~\cite{LUG}.
%in Appendix ~\ref{appendixa}.

\item \textbf{computational} routines, also called simply LAPACK routines,
perform a distinct computational task, such as computing
the $LU$ decomposition of an $m$-by-$n$ matrix or finding the
eigenvalues and eigenvectors of a symmetric tridiagonal matrix using
the $QR$ algorithm.
The LAPACK routines are listed in LAPACK Working Note 41~\cite{WN41}
and the LAPACK Users' Guide~\cite{LUG}.
%The LAPACK routines are listed in Appendix ~\ref{appendixa}; see also LAPACK
%Working Note \#5 \cite{WN5}.

\item \textbf{auxiliary} routines are all the other subroutines called
by the driver routines and computational routines.
%Among them are subroutines to perform subtasks of block algorithms,
%in particular, the unblocked versions of the block algorithms;
%extensions to the BLAS, such as matrix-vector operations involving
%complex symmetric matrices;
%the special routines LSAME and XERBLA which first appeared with the
%BLAS;
%and a number of routines to perform common low-level computations,
%such as computing a matrix norm, generating an elementary Householder
%transformation, and applying a sequence of plane rotations.
%Many of the auxiliary routines may be of use to numerical analysts
%or software developers, so we have documented the Fortran source for
%these routines with the same level of detail used for the LAPACK
%routines and driver routines.
The auxiliary routines are listed in LAPACK Working Note 41~\cite{WN41}
and the LAPACK Users' Guide~\cite{LUG}.
%The auxiliary routines are listed in Appendix ~\ref{appendixb}.
\end{itemize}

\subsection{Level 1, 2, and 3 BLAS}

The BLAS are a set of Basic Linear Algebra Subprograms that perform
vector-vector, matrix-vector, and matrix-matrix operations.
LAPACK is designed around the Level 1, 2, and 3 BLAS, and nearly all
of the parallelism in the LAPACK routines is contained in the BLAS.
Therefore,
the key to getting good performance from LAPACK lies in having an
efficient version of the BLAS optimized for your particular machine.
Optimized BLAS libraries are available on a variety of architectures;
refer to the BLAS FAQ on netlib for further information.
\begin{quote}
\url{http://www.netlib.org/blas/faq.html}
\end{quote}
There are also freely available BLAS generators that automatically
tune a subset of the BLAS for a given architecture, for example:
\begin{quote}
\url{http://www.netlib.org/atlas/}
\end{quote}
If no optimized library is available, there is the Fortran~77 reference
implementation
of the Level 1, 2, and 3 BLAS available on netlib (also included in
the LAPACK distribution tar file).
\begin{quote}
\url{http://www.netlib.org/blas/blas.tgz}
\end{quote}
No matter which BLAS library is used, the BLAS test programs should
always be run.

Users should not expect too much from the Fortran~77 reference implementation
BLAS; these versions were written to define the basic operations and do not
employ the standard tricks for optimizing Fortran code.

The formal definitions of the Level 1, 2, and 3 BLAS
are in \cite{BLAS1}, \cite{BLAS2}, and \cite{BLAS3}.
The BLAS Quick Reference card is available on netlib.
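
To make the three BLAS levels described in this subsection concrete, the
following sketch calls one single precision routine from each level. It is
an illustration only: the program name \texttt{BLASLV} and the data are
invented for this example, and the program assumes that it is linked against
a BLAS library.

\begin{verbatim}
      PROGRAM BLASLV
*     Illustrative sketch only: names and data are hypothetical.
      INTEGER N
      PARAMETER ( N = 2 )
      REAL ALPHA, BETA
      REAL X( N ), Y( N ), A( N, N ), B( N, N ), C( N, N )
      EXTERNAL SAXPY, SGEMV, SGEMM
      DATA X / 1.0E0, 2.0E0 /
      DATA Y / 0.0E0, 0.0E0 /
      DATA A / 1.0E0, 0.0E0, 0.0E0, 1.0E0 /
      DATA B / 1.0E0, 2.0E0, 3.0E0, 4.0E0 /
      DATA C / 4*0.0E0 /
      ALPHA = 2.0E0
      BETA = 0.0E0
*     Level 1 (vector-vector): y := alpha*x + y
      CALL SAXPY( N, ALPHA, X, 1, Y, 1 )
*     Level 2 (matrix-vector): y := alpha*A*x + beta*y
      CALL SGEMV( 'N', N, N, ALPHA, A, N, X, 1, BETA, Y, 1 )
*     Level 3 (matrix-matrix): C := alpha*A*B + beta*C
      CALL SGEMM( 'N', 'N', N, N, N, ALPHA, A, N, B, N, BETA, C, N )
      WRITE( *, * ) 'Y =', Y
      WRITE( *, * ) 'C =', C
      END
\end{verbatim}

\noindent
The same program can be linked against the reference BLAS or an optimized
BLAS without source changes, which is why the choice of BLAS library is made
at installation time.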

\subsection{Mixed- and Extended-Precision BLAS: XBLAS}

The XBLAS extend the BLAS to work with mixed input and output
precisions as well as using extra precision internally. The XBLAS are
used in the prototype extra-precise iterative refinement codes.

The current release of the XBLAS is available through
Netlib\footnote{Development versions may be available through
  \url{http://www.cs.berkeley.edu/~yozo/} or
  \url{http://www.nersc.gov/~xiaoye/XBLAS/}.} at
\begin{quote}
\url{http://www.netlib.org/xblas}
\end{quote}
Their formal definition is in \cite{XBLAS}.

\subsection{LAPACK Test Routines}

This release contains two distinct test programs for LAPACK routines
in each data type. One test program tests the routines for solving
linear equations and linear least squares problems,
and the other tests routines for the matrix eigenvalue problem.
The routines for generating test matrices are used by both test
programs and are compiled into a library for use by both test programs.

\subsection{LAPACK Timing Routines (for LAPACK 3.0 and before)}

This release also contains two distinct timing programs for the
LAPACK routines in each data type.
The linear equation timing program gathers performance data in
megaflops on the factor, solve, and inverse routines for solving
linear systems, the routines to generate or apply an orthogonal matrix
given as a sequence of elementary transformations, and the reductions
to bidiagonal, tridiagonal, or Hessenberg form for eigenvalue
computations.
The operation counts used in computing the megaflop rates are computed
from a formula;
see LAPACK Working Note 41~\cite{WN41}.
% see Appendix ~\ref{appendixc}.
The eigenvalue timing program is used with the eigensystem routines
and returns the execution time, number of floating point operations, and
megaflop rate for each of the requested subroutines.
In this program, the number of operations is computed while the
code is executing using special instrumented versions of the LAPACK
subroutines.

\section{Installing LAPACK on a Unix System}\label{installation}

Installing, testing, and timing\footnotemark[\value{footnote}] the Unix version of LAPACK
involves the following steps:
\begin{enumerate}
\item Gunzip and untar the file.

\item Copy the file \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it.

\item Edit the file \texttt{LAPACK/Makefile} and type \texttt{make}.

%\item Test and Install the Machine-Dependent Routines \\
%\emph{(WARNING: You may need to supply a correct version of second.f and
%dsecnd.f for your machine)}
%{\tt
%\begin{list}{}{}
%\item cd LAPACK
%\item make install
%\end{list} }
%
%\item Create the BLAS Library, \emph{if necessary} \\
%\emph{(NOTE: For best performance, it is recommended you use the manufacturers' BLAS)}
%{\tt
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make blaslib}
%\end{list} }
%
%\item Run the Level 1, 2, and 3 BLAS Test Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make blas\_testing}
%\end{list}
%
%\item Create the LAPACK Library
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make lapacklib}
%\end{list}
%
%\item Create the Library of Test Matrix Generators
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make tmglib}
%\end{list}
%
%\item Run the LAPACK Test Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make testing}
%\end{list}
%
%\item Run the LAPACK Timing Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make timing}
%\end{list}
%
%\item Run the BLAS Timing Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make blas\_timing}
%\end{list}
\end{enumerate}

\subsection{Untar the File}

If you received a tar file of LAPACK via the World Wide
Web or anonymous ftp, enter the following command:

\begin{list}{}{}
\item{\texttt{gunzip -c lapack.tgz | tar xvf -}}
\end{list}

\noindent
This will create a top-level directory called \texttt{LAPACK}, which
requires approximately 34 Mbytes of disk space.
The total space requirement, including the object files and executables,
is approximately 100 Mbytes for all four data types.

\subsection{Copy \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and Edit It}

Before the libraries can be built, or the testing and timing\footnotemark[\value{footnote}] programs
run, you must define all machine-specific parameters for the
architecture on which you are installing LAPACK. All machine-specific
parameters are contained in the file \texttt{LAPACK/make.inc}.
An example of \texttt{LAPACK/make.inc} for a Linux machine with GNU compilers is given
in \texttt{LAPACK/make.inc.example}. Copy that file to \texttt{LAPACK/make.inc} by entering the following command:

\begin{list}{}{}
\item{\texttt{cp LAPACK/make.inc.example LAPACK/make.inc}}
\end{list}

\noindent
Now modify your \texttt{LAPACK/make.inc} by applying the following recommendations.
The first line of this \texttt{make.inc} file is:
\begin{quote}
SHELL = /bin/sh
\end{quote}
and it will need to be modified to \texttt{SHELL = /sbin/sh} if you are
installing LAPACK on an SGI architecture.
Next, you will need to modify \texttt{FC}, \texttt{FFLAGS},
\texttt{FFLAGS\_DRV}, \texttt{FFLAGS\_NOOPT}, and \texttt{LDFLAGS} to specify
the compiler, compiler options, compiler options for the testing and
timing\footnotemark[\value{footnote}] main programs, and linker options.
Next, choose the timing function that the
\texttt{SECOND} and \texttt{DSECND} routines will call.
\begin{verbatim}
# Default:  SECOND and DSECND will use a call to the
# EXTERNAL FUNCTION ETIME
#TIMER    = EXT_ETIME
# For RS6K: SECOND and DSECND will use a call to the
# EXTERNAL FUNCTION ETIME_
#TIMER    = EXT_ETIME_
# For gfortran compiler: SECOND and DSECND will use a call to the
# INTERNAL FUNCTION ETIME
TIMER    = INT_ETIME
# If your Fortran compiler does not provide etime (like Nag Fortran
# Compiler, etc...) SECOND and DSECND will use a call to the
# INTERNAL FUNCTION CPU_TIME
#TIMER    = INT_CPU_TIME
# If none of these work, you can use the NONE value.
# In that case, SECOND and DSECND will always return 0.
#TIMER     = NONE
\end{verbatim}
Refer to Section~\ref{second} for more information.


Next, you will need to modify \texttt{AR}, \texttt{ARFLAGS}, and \texttt{RANLIB} to specify the archiver,
archiver options, and ranlib for your machine. If your architecture
does not require \texttt{ranlib} to be run after each archive command (as
is the case with CRAY computers running UNICOS, Hewlett Packard
computers running HP-UX, or SUN SPARCstations running Solaris), set
\texttt{RANLIB = echo}. And finally, you must
modify the \texttt{BLASLIB} definition to specify the BLAS library to which
you will be linking. If an optimized version of the BLAS is available
on your machine, we strongly recommend linking to that library.
Otherwise, by default, \texttt{BLASLIB} is set to the Fortran~77 version.
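
As a rough illustration of the settings discussed above, an edited
\texttt{make.inc} might contain lines such as the following. The variable
names are those listed in this section, but every value shown is an
assumption chosen for a GNU toolchain, not a copy of the distributed
\texttt{make.inc.example}; in particular, the \texttt{BLASLIB} path is
hypothetical and must be replaced by the location of the BLAS library on
your system.

\begin{verbatim}
# Illustrative values only (assumed GNU toolchain); adapt to your site.
SHELL        = /bin/sh
FC           = gfortran
FFLAGS       = -O2
FFLAGS_DRV   = $(FFLAGS)
FFLAGS_NOOPT = -O0
LDFLAGS      =
TIMER        = INT_ETIME
AR           = ar
ARFLAGS      = cr
RANLIB       = ranlib
# Hypothetical path to a locally installed, optimized BLAS library.
BLASLIB      = /usr/local/lib/libblas.a
\end{verbatim}

\noindent
Keeping \texttt{FFLAGS\_NOOPT} free of optimization is deliberate in this
sketch, since it is the variable reserved for code that is compiled
without optimization.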

If you want to enable the XBLAS, define the variable \texttt{USEXBLAS}
to some value, for example \texttt{USEXBLAS = Yes}. Then set the
variable \texttt{XBLASLIB} to point at the XBLAS library. Note that
the prototype iterative refinement routines and their testers will not
be built unless \texttt{USEXBLAS} is defined.

\textbf{NOTE:} Example \texttt{make.inc} include files are contained in the
\texttt{LAPACK/INSTALL} directory. Please refer to
Appendix~\ref{appendixd} for machine-specific installation hints, and/or
the \texttt{release\_notes} file on \texttt{netlib}.
\begin{quote}
\url{http://www.netlib.org/lapack/release\_notes}
\end{quote}

\subsection{Edit the file \texttt{LAPACK/Makefile}}\label{toplevelmakefile}

This \texttt{Makefile} can be modified to perform as much of the
installation process as the user desires. Ideally, this is the ONLY
makefile the user must modify. However, modification of lower-level
makefiles may be necessary if a specific routine needs to be compiled
with a different level of optimization.

First, edit the definitions of \texttt{blaslib}, \texttt{lapacklib},
\texttt{tmglib}, \texttt{lapack\_testing}, and \texttt{timing}\footnotemark[\value{footnote}] in the file \texttt{LAPACK/Makefile}
to specify the data types desired. For example,
if you only wish to compile the single precision real version of the
LAPACK library, you would modify the \texttt{lapacklib} definition to be:

\begin{verbatim}
lapacklib:
        $(MAKE) -C SRC single
\end{verbatim}

Likewise, you could specify \texttt{double}, \texttt{complex}, or \texttt{complex16} to
build the double precision real, single precision complex, or double
precision complex libraries, respectively. By default, running
\texttt{make} with no arguments results in the
building of all four data types.
The make command can be run more than once to add another
data type to the library if necessary.

%If you are installing LAPACK on a Silicon Graphics machine, you must
%modify the respective definitions of \texttt{testing} and \texttt{timing} to be
%\begin{verbatim}
%testing:
%        ( cd TESTING; $(MAKE) -f Makefile.sgi )
%\end{verbatim}
%and
%\begin{verbatim}
%timing:
%        ( cd TIMING; $(MAKE) -f Makefile.sgi )
%\end{verbatim}

Next, if you will be using a locally available BLAS library, you will need
to remove \texttt{blaslib} from the \texttt{lib} definition. And finally,
if you do not wish to build all of the libraries individually and
likewise run all of the testing and timing separately, you can
modify the \texttt{all} definition to specify the amount of the
installation process that you want performed. By default,
the \texttt{all} definition is set to
\begin{verbatim}
all: lapack_install lib lapack_testing blas_testing
\end{verbatim}
which will perform all phases of the installation
process -- testing of machine-dependent routines, building the libraries,
BLAS testing and LAPACK testing.

The entire installation process will then be performed by typing
\texttt{make}.

Questions and/or comments can be directed to the
authors as described in Section~\ref{sendresults}. If test failures
occur, please refer to the appropriate subsection in
Section~\ref{furtherdetails}.

If disk space is limited, we suggest building each data type separately
and/or deleting all object files after building the libraries. Likewise, all
testing and timing executables can be deleted after the testing and timing
process is completed.
The removal of all object files and executables
can be accomplished by the following:

\begin{list}{}{}
\item \texttt{cd LAPACK}
\item \texttt{make cleanobj}
\end{list}

\section{Further Details of the Installation Process}\label{furtherdetails}

Alternatively, you can choose to run each of the phases of the
installation process separately. The following sections give details
on how this may be achieved.

\subsection{Test and Install the Machine-Dependent Routines.}

There are six machine-dependent functions in the test and timing
package, at least three of which must be installed. They are

\begin{tabbing}
MONOMO \= DOUBLE PRECYSION \= \kill
LSAME \> LOGICAL \> Test if two characters are the same regardless of case \\
SLAMCH \> REAL \> Determine machine-dependent parameters \\
DLAMCH \> DOUBLE PRECISION \> Determine machine-dependent parameters \\
SECOND \> REAL \> Return time in seconds from a fixed starting time \\
DSECND \> DOUBLE PRECISION \> Return time in seconds from a fixed starting time\\
ILAENV \> INTEGER \> Checks that NaN and infinity arithmetic are IEEE-754 compliant
\end{tabbing}

\noindent
If you are working only in single precision, you do not need to install
DLAMCH and DSECND, and if you are working only in double precision,
you do not need to install SLAMCH and SECOND.

These six subroutines are provided in \texttt{LAPACK/INSTALL},
along with six test programs.
To compile the six test programs and run the tests, go to \texttt{LAPACK} and
type \texttt{make lapack\_install}. The test programs are called
\texttt{testlsame, testslamch, testdlamch, testsecond, testdsecnd} and
\texttt{testieee}.
If you do not wish to run all tests, you will need to modify the
\texttt{lapack\_install} definition in the \texttt{LAPACK/Makefile} to only include the
tests you wish to run.
Otherwise, all tests will be performed.
The expected results of each test program are described below.

\subsubsection{Installing LSAME}

LSAME is a logical function with two character parameters, A and B.
It returns .TRUE. if A and B are the same regardless of case, or .FALSE.
if they are different.
For example, the expression

\begin{list}{}{}
\item \texttt{LSAME( UPLO, 'U' )}
\end{list}
\noindent
is equivalent to
\begin{list}{}{}
\item \texttt{( UPLO.EQ.'U' ).OR.( UPLO.EQ.'u' )}
\end{list}

The test program in \texttt{lsametst.f} tests all combinations of
the same character in upper and lower case for A and B, and two
cases where A and B are different characters.

Run the test program by typing \texttt{testlsame}.
If LSAME works correctly, the only message you should see after the
execution of \texttt{testlsame} is
\begin{verbatim}
 ASCII character set
 Tests completed
\end{verbatim}
The file \texttt{lsame.f} is automatically copied to
\texttt{LAPACK/BLAS/SRC/} and \texttt{LAPACK/SRC/}.
The function LSAME is needed by both the BLAS and LAPACK, so it is safer
to have it in both libraries as long as this does not cause trouble
in the link phase when both libraries are used.

\subsubsection{Installing SLAMCH and DLAMCH}

SLAMCH and DLAMCH are real functions with a single character parameter
that indicates the machine parameter to be returned. The test
program in \texttt{slamchtst.f}
simply prints out the different values computed by SLAMCH,
so you need to know something about what the values should be.
For example, the output of the test program executable \texttt{testslamch}
for SLAMCH on a Sun SPARCstation is
\begin{verbatim}
 Epsilon                      =   5.96046E-08
 Safe minimum                 =   1.17549E-38
 Base                         =   2.00000
 Precision                    =   1.19209E-07
 Number of digits in mantissa =   24.0000
 Rounding mode                =   1.00000
 Minimum exponent             =  -125.000
 Underflow threshold          =   1.17549E-38
 Largest exponent             =   128.000
 Overflow threshold           =   3.40282E+38
 Reciprocal of safe minimum   =   8.50706E+37
\end{verbatim}
On a Cray machine, the safe minimum underflows its output
representation and the overflow threshold overflows its output
representation, so the safe minimum is printed as 0.00000 and overflow
is printed as R. This is normal.
If you would prefer to print a representable number, you can modify
the test program to print SFMIN*100. and RMAX/100. for the safe
minimum and overflow thresholds.

Likewise, the test executable \texttt{testdlamch} is run for DLAMCH.

If both tests were successful, go to Section~\ref{second}.

If SLAMCH (or DLAMCH) returns an invalid value, you will have to create
your own version of this function. The following options are used in
LAPACK and must be set:

\begin{list}{}{}
\item {`B': } Base of the machine
\item {`E': } Epsilon (relative machine precision)
\item {`O': } Overflow threshold
\item {`P': } Precision = Epsilon*Base
\item {`S': } Safe minimum (often same as underflow threshold)
\item {`U': } Underflow threshold
\end{list}

Some people may be familiar with R1MACH (D1MACH), a primitive
routine for setting machine parameters in which the user must
comment out the appropriate assignment statements for the target
machine.
If a version of R1MACH is on hand, the assignments in
SLAMCH can be made to refer to R1MACH using the correspondence

\begin{list}{}{}
\item {SLAMCH( `U' )} $=$ R1MACH( 1 )
\item {SLAMCH( `O' )} $=$ R1MACH( 2 )
\item {SLAMCH( `E' )} $=$ R1MACH( 3 )
\item {SLAMCH( `B' )} $=$ R1MACH( 5 )
\end{list}

\noindent
The safe minimum returned by SLAMCH( 'S' ) is initially set to the
underflow value, but if $1/(\mathrm{overflow}) \geq (\mathrm{underflow})$
it is recomputed as $(1/(\mathrm{overflow})) * ( 1 + \varepsilon )$,
where $\varepsilon$ is the machine precision.

BE AWARE that the initial call to SLAMCH or DLAMCH is expensive.
We suggest that installers run it once, save the results, and hard-code
the constants in the version they put in their library.

\subsubsection{Installing SECOND and DSECND}\label{second}

Both the timing routines\footnotemark[\value{footnote}] and the test routines call SECOND
(DSECND), a real function with no arguments that returns the time
in seconds from some fixed starting time.
Our version of this routine
returns only ``user time'', and not ``user time $+$ system time''.
The versions of SECOND in \texttt{second\_EXT\_ETIME.f} and \texttt{second\_INT\_ETIME.f} call
ETIME, a Fortran library routine available on some computer systems.
If ETIME is not available or a better local timing function exists,
you will have to provide the correct interface to SECOND and DSECND
on your machine.

Since LAPACK 3.1.1 we provide five different flavours of the SECOND and DSECND routines.
The version that will be used depends on the value of the TIMER variable in the make.inc.

\begin{itemize}
\item If ETIME is available as an external function, set the value of the TIMER variable in your
make.inc to \texttt{EXT\_ETIME}: \texttt{second\_EXT\_ETIME.f} and \texttt{dsecnd\_EXT\_ETIME.f} will be used.
Usually on HPPA architectures,
the compiler and linker flag \texttt{+U77} should be included to access
the function \texttt{ETIME}.

\item If ETIME\_ is available as an external function, set the value of the TIMER variable in your make.inc
to \texttt{EXT\_ETIME\_}: \texttt{second\_EXT\_ETIME\_.f} and \texttt{dsecnd\_EXT\_ETIME\_.f} will be used.
This is the case on some IBM architectures such as IBM RS/6000s.

\item If ETIME is available as an internal function, set the value of the TIMER variable in your make.inc
to \texttt{INT\_ETIME}: \texttt{second\_INT\_ETIME.f} and \texttt{dsecnd\_INT\_ETIME.f} will be used.
This is the case with gfortran.

\item If CPU\_TIME is available as an internal function, set the value of the TIMER variable in your make.inc
to \texttt{INT\_CPU\_TIME}: \texttt{second\_INT\_CPU\_TIME.f} and \texttt{dsecnd\_INT\_CPU\_TIME.f} will be used.

\item If none of these functions is available, set the value of the TIMER variable in your make.inc
to \texttt{NONE}: \texttt{second\_NONE.f} and \texttt{dsecnd\_NONE.f} will be used.
These routines will always return zero.
\end{itemize}

The test program in \texttt{secondtst.f}
performs a million operations using 5000 iterations of
the SAXPY operation $y := y + \alpha x$ on a vector of length 100.
The total time and megaflops for this test are reported, then
the operation is repeated including a call to SECOND on each of
the 5000 iterations to determine the overhead due to calling SECOND.
The test program executable is called \texttt{testsecond} (or \texttt{testdsecnd}).
There is no single right answer, but the times
in seconds should be positive and the megaflop rates should be
appropriate for your machine.

\subsubsection{Testing IEEE arithmetic and ILAENV}\label{testieee}

%\textbf{If you are installing LAPACK on a non-IEEE machine, you MUST
%modify ILAENV!
%Otherwise, ILAENV will crash. By default, ILAENV
%assumes an IEEE machine, and does a test for IEEE-754 compliance.}

As some new routines in LAPACK rely on IEEE-754 compliance,
two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
(\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
infinity arithmetic, respectively. By default, ILAENV assumes an IEEE
machine, and does a test for IEEE-754 compliance. \textbf{NOTE: If you
are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
as this test inside ILAENV will crash!}

If \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is
issued, then \texttt{ILAENV=1} is returned to signal IEEE-754 compliance,
and \texttt{ILAENV=0} if the architecture is non-IEEE-754 compliant.

Thus, for non-IEEE machines, the user must hard-code the setting of
(\texttt{ILAENV=0}) for (\texttt{ISPEC=10} and \texttt{ISPEC=11}) in the version
of \texttt{LAPACK/SRC/ilaenv.f} to be put in
the library. There are also specialized testing and timing\footnotemark[\value{footnote}] versions of
ILAENV that will also need to be modified.
\begin{itemize}
\item Testing/timing version of \texttt{LAPACK/TESTING/LIN/ilaenv.f}
\item Testing/timing version of \texttt{LAPACK/TESTING/EIG/ilaenv.f}
\item Testing/timing version of \texttt{LAPACK/TIMING/LIN/ilaenv.f}
\item Testing/timing version of \texttt{LAPACK/TIMING/EIG/ilaenv.f}
\end{itemize}

%Some new routines in LAPACK rely on IEEE-754 compliance, and if non-compliance
%is detected (via a call to the function ILAENV), alternative (slower)
%algorithms will be chosen.
%For further details, refer to the leading comments of routines such
%as \texttt{LAPACK/SRC/sstevr.f}.
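
As a rough illustration of the hard-coding described above, the following
free-form Fortran sketch shows a stand-alone function that answers the two
IEEE queries with a fixed value instead of running a test. The name
\texttt{ieee\_setting} and the parameter \texttt{ieee\_ok} are hypothetical
and do not appear in LAPACK; in practice the corresponding change would be
made by hand in the \texttt{ISPEC=10} and \texttt{ISPEC=11} branches of each
copy of \texttt{ilaenv.f} listed above.
\begin{verbatim}
! Hypothetical sketch, not LAPACK source: returns the hard-coded
! answer that ILAENV would give for ISPEC = 10 or ISPEC = 11.
integer function ieee_setting( ispec )
   implicit none
   integer, intent(in) :: ispec
   ! Chosen once by the installer (assumption of this sketch):
   ! 1 on an IEEE-754 compliant machine, 0 otherwise.
   integer, parameter :: ieee_ok = 0

   select case ( ispec )
   case ( 10, 11 )
      ! NaN (10) and infinity (11) arithmetic: no run-time test is made.
      ieee_setting = ieee_ok
   case default
      ! Other ISPEC values are outside the scope of this sketch.
      ieee_setting = -1
   end select
end function ieee_setting
\end{verbatim}
With \texttt{ieee\_ok = 0}, both queries report non-compliance and no
infinity or NaN arithmetic is ever attempted, which is the behavior a
non-IEEE machine needs.
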

The test program in \texttt{LAPACK/INSTALL/tstiee.f} checks an installation
architecture
to see if infinity arithmetic and NaN arithmetic are IEEE-754 compliant.
A warning message to the user is printed if non-compliance is detected.
This same test is performed inside the function ILAENV. If
\texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is
issued, then \texttt{ILAENV=1} is returned to signal IEEE-754 compliance,
and \texttt{ILAENV=0} if the architecture is non-IEEE-754 compliant.

To avoid this IEEE test being run every time you call
\texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )}, we suggest
that the user hard-code the setting of
\texttt{ILAENV=1} or \texttt{ILAENV=0} in the version of \texttt{LAPACK/SRC/ilaenv.f} to be put in
the library. As mentioned above, there are also specialized testing and
timing\footnotemark[\value{footnote}] versions of ILAENV that will also need to be modified.

\subsection{Create the BLAS Library}

Ideally, a highly optimized version of the BLAS library already
exists on your machine.
In this case you can go directly to Section~\ref{testblas} to
make the BLAS test programs.
Otherwise, build the Fortran BLAS supplied with LAPACK as follows.

\begin{itemize}
\item[a)]
Go to \texttt{LAPACK} and edit the definition of \texttt{blaslib} in the
file \texttt{Makefile} to specify the data types desired, as in the example
in Section~\ref{toplevelmakefile}.

If you already have some of the BLAS, you will need to edit the file
\texttt{LAPACK/BLAS/SRC/Makefile} to comment out the lines
defining the BLAS you have.

\item[b)]
Type \texttt{make blaslib}.
The make command can be run more than once to add another
data type to the library if necessary.
\end{itemize}

\noindent
The BLAS library is created in \texttt{LAPACK/librefblas.a},
or in the user-defined location specified by \texttt{BLASLIB} in the file
\texttt{LAPACK/make.inc}.

\subsection{Run the BLAS Test Programs}\label{testblas}

Test programs for the Level 1, 2, and 3 BLAS are in the directory
\texttt{LAPACK/BLAS/TESTING}.

To compile and run the Level 1, 2, and 3 BLAS test programs,
go to \texttt{LAPACK} and type \texttt{make blas\_testing}. The executable
files are called \texttt{xblat\_s}, \texttt{xblat\_d}, \texttt{xblat\_c}, and
\texttt{xblat\_z}, where the \_ (underscore) is replaced by 1, 2, or 3,
depending upon the level of BLAS that it is testing. All executable and
output files are created in \texttt{LAPACK/BLAS/}.
For the Level 1 BLAS tests, the output file names are \texttt{sblat1.out},
\texttt{dblat1.out}, \texttt{cblat1.out}, and \texttt{zblat1.out}. For the Level
2 and 3 BLAS, the name of the output file is indicated on the first line of the
input file and is currently defined to be \texttt{sblat2.out} for
the Level 2 REAL version, and \texttt{sblat3.out} for the Level 3 REAL
version, with similar names for the other data types.

If the tests using the supplied data files were completed successfully,
consider whether the tests were sufficiently thorough.
For example, on a machine with vector registers, at least one value
of $N$ greater than the length of the vector registers should be used;
otherwise, important parts of the compiled code may not be
exercised by the tests.
If the tests were not successful, either because the program did not
finish or the test ratios did not pass the threshold, you will
probably have to find and correct the problem before continuing.
If you have been testing a system-specific
BLAS library, try using the Fortran BLAS for the routines that
did not pass the tests.
For more details on the BLAS test programs,
see \cite{BLAS2-test} and \cite{BLAS3-test}.

\subsection{Create the LAPACK Library}

\begin{itemize}
\item[a)]
Go to the directory \texttt{LAPACK} and edit the definition of
\texttt{lapacklib} in the file \texttt{Makefile} to specify the data types desired,
as in the example in Section~\ref{toplevelmakefile}.

\item[b)]
Type \texttt{make lapacklib}.
The make command can be run more than once to add another
data type to the library if necessary.

\end{itemize}

\noindent
The LAPACK library is created in \texttt{LAPACK/liblapack.a},
or in the user-defined location specified by \texttt{LAPACKLIB} in the file
\texttt{LAPACK/make.inc}.

\subsection{Create the Test Matrix Generator Library}

\begin{itemize}
\item[a)]
Go to the directory \texttt{LAPACK} and edit the definition of \texttt{tmglib}
in the file \texttt{Makefile} to specify the data types desired, as in the
example in Section~\ref{toplevelmakefile}.

\item[b)]
Type \texttt{make tmglib}.
The make command can be run more than once to add another
data type to the library if necessary.

\end{itemize}

\noindent
The test matrix generator library is created in \texttt{LAPACK/libtmglib.a},
or in the user-defined location specified by \texttt{TMGLIB} in the file
\texttt{LAPACK/make.inc}.

\subsection{Run the LAPACK Test Programs}

There are two distinct test programs for LAPACK routines
in each data type, one for the linear equation routines and
one for the eigensystem routines.
In each data type, there is one input file for testing the linear
equation routines and eighteen input files for testing the eigenvalue
routines.
The input files reside in \texttt{LAPACK/TESTING}.
For more information on the test programs and how to modify the
input files, please refer to LAPACK Working Note 41~\cite{WN41}.
% see Section~\ref{moretesting}.

If you do not wish to run each of the tests individually, you can
go to \texttt{LAPACK}, edit the definition of \texttt{lapack\_testing} in the file
\texttt{Makefile} to specify the data types desired, and type \texttt{make
lapack\_testing}. This will
compile and run the tests as described in Sections~\ref{testlin}
and~\ref{testeig}.

%If you are installing LAPACK on a Silicon Graphics machine, you must
%modify the definition of \texttt{testing} to be
%\begin{verbatim}
%testing:
%	( cd TESTING; $(MAKE) -f Makefile.sgi )
%\end{verbatim}

\subsubsection{Testing the Linear Equations Routines}\label{testlin}

\begin{itemize}

\item[a)]
Go to \texttt{LAPACK/TESTING/LIN} and type \texttt{make} followed by the data types
desired. The executable files are called \texttt{xlintsts}, \texttt{xlintstc},
\texttt{xlintstd}, and \texttt{xlintstz} and are created in \texttt{LAPACK/TESTING}.

\item[b)]
Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
For the REAL version, the command is
\begin{list}{}{}
\item{} \texttt{xlintsts < stest.in > stest.out}
\end{list}

\noindent
The tests using \texttt{xlintstd}, \texttt{xlintstc}, and \texttt{xlintstz} are similar
with the leading `s' in the input and output file names replaced
by `d', `c', or `z'.

\end{itemize}

If you encountered failures in this phase of the testing process, please
refer to Section~\ref{sendresults}.

\subsubsection{Testing the Eigensystem Routines}\label{testeig}

\begin{itemize}

\item[a)]
Go to \texttt{LAPACK/TESTING/EIG} and type \texttt{make} followed by the data types
desired.
The executable files are called \texttt{xeigtsts},
\texttt{xeigtstc}, \texttt{xeigtstd}, and \texttt{xeigtstz} and are created
in \texttt{LAPACK/TESTING}.

\item[b)]
Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
The tests for the eigensystem routines use eighteen separate input files
for testing the nonsymmetric eigenvalue problem,
the symmetric eigenvalue problem, the banded symmetric eigenvalue
problem, the generalized symmetric eigenvalue
problem, the generalized nonsymmetric eigenvalue problem, the
singular value decomposition, the banded singular value decomposition,
the generalized singular value
decomposition, the generalized QR and RQ factorizations, the generalized
linear regression model, and the constrained linear least squares
problem.
The tests for the REAL version are as follows:
\begin{list}{}{}
\item \texttt{xeigtsts < nep.in > snep.out}
\item \texttt{xeigtsts < sep.in > ssep.out}
\item \texttt{xeigtsts < svd.in > ssvd.out}
\item \texttt{xeigtsts < sec.in > sec.out}
\item \texttt{xeigtsts < sed.in > sed.out}
\item \texttt{xeigtsts < sgg.in > sgg.out}
\item \texttt{xeigtsts < sgd.in > sgd.out}
\item \texttt{xeigtsts < ssg.in > ssg.out}
\item \texttt{xeigtsts < ssb.in > ssb.out}
\item \texttt{xeigtsts < sbb.in > sbb.out}
\item \texttt{xeigtsts < sbal.in > sbal.out}
\item \texttt{xeigtsts < sbak.in > sbak.out}
\item \texttt{xeigtsts < sgbal.in > sgbal.out}
\item \texttt{xeigtsts < sgbak.in > sgbak.out}
\item \texttt{xeigtsts < glm.in > sglm.out}
\item \texttt{xeigtsts < gqr.in > sgqr.out}
\item \texttt{xeigtsts < gsv.in > sgsv.out}
\item \texttt{xeigtsts < lse.in > slse.out}
\end{list}
The tests using \texttt{xeigtstc}, \texttt{xeigtstd}, and \texttt{xeigtstz} also
use the input files \texttt{nep.in}, \texttt{sep.in}, \texttt{svd.in},
\texttt{glm.in}, \texttt{gqr.in}, \texttt{gsv.in}, and
\texttt{lse.in},
but the leading `s' in the other input file names must be changed
to `c', `d', or `z'.
\end{itemize}

If you encountered failures in this phase of the testing process, please
refer to Section~\ref{sendresults}.

\subsection{Run the LAPACK Timing Programs (For LAPACK 3.0 and before)}

There are two distinct timing programs for LAPACK routines
in each data type, one for the linear equation routines and
one for the eigensystem routines. The timing program for the
linear equation routines is also used to time the BLAS.
We encourage you to conduct these timing experiments
in REAL and COMPLEX or in DOUBLE PRECISION and COMPLEX*16; it is
not necessary to send timing results in all four data types.

Two sets of input files are provided, a small set and a large set.
The small data sets are appropriate for a standard workstation or
other non-vector machine.
The large data sets are appropriate for supercomputers, vector
computers, and high-performance workstations.
We are mainly interested in results from the large data sets, and
it is not necessary to run both the large and small sets.
The values of N in the large data sets are about five times larger
than those in the small data sets,
and the large data sets use additional values for parameters such as the
block size NB and the leading array dimension LDA.
The names of the small data sets end in \texttt{\_small}, such as
\texttt{stime\_small.in}, and the names of the large data sets end in \texttt{\_large},
such as \texttt{stime\_large.in}.
Except as noted, the leading `s' in the input file name must be
replaced by `d', `c', or `z' for the other data types.

We encourage you to obtain timing results with the large data sets,
as this allows us to compare different machines.
If this would take too much time, suggestions for paring back the large
data sets are given in the instructions below.
We also encourage you to experiment with these timing
programs and send us any interesting results, such as results for
larger problems or for a wider range of block sizes.
The main programs are dimensioned for the large data sets,
so the parameters in the main program may have to be reduced in order
to run the small data sets on a small machine, or increased to run
experiments with larger problems.

The minimum time each subroutine will be timed is set to 0.0 in
the large data files and to 0.05 in the small data files, and on
many machines this value should be increased.
If the timing interval is not long
enough, the time for the subroutine after subtracting the overhead
may be very small or zero, resulting in megaflop rates that are
very large or zero. (To avoid division by zero, the megaflop rate is
set to zero if the time is less than or equal to zero.)
The minimum time that should be used depends on the machine and the
resolution of the clock.

For more information on the timing programs and how to modify the
input files, please refer to LAPACK Working Note 41~\cite{WN41}.
% see Section~\ref{moretiming}.

If you do not wish to run each of the timings individually, you can
go to \texttt{LAPACK}, edit the definition of \texttt{lapack\_timing} in the file
\texttt{Makefile} to specify the data types desired, and type \texttt{make
lapack\_timing}. This will compile
and run the timings for the linear equation routines and the eigensystem
routines (see Sections~\ref{timelin} and~\ref{timeeig}).
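
To make the remark about the minimum time concrete, the reported rate can be
written as follows. The symbols $n_{\mathrm{ops}}$ (operation count),
$t$ (measured time in seconds), and $t_{0}$ (measured overhead in seconds)
are introduced here for illustration only and are not names used by the
timing programs.
\[
\mathrm{megaflops} = \left\{ \begin{array}{ll}
\displaystyle \frac{n_{\mathrm{ops}}}{10^{6}\,(t - t_{0})} & \mbox{if } t - t_{0} > 0, \\[2ex]
0 & \mbox{otherwise.}
\end{array} \right.
\]
As a hypothetical example, ignore the overhead ($t_{0} = 0$) and suppose a
routine performs $n_{\mathrm{ops}} = 2 \times 10^{6}$ operations in a true
time of $0.004$ seconds, which corresponds to $500$ megaflops. If the clock
has a resolution of $0.01$ seconds, a single call may be measured as $0$
seconds (reported rate $0$) or as $0.01$ seconds (reported rate $200$
megaflops). Timing over an interval well above the clock resolution reduces
this error.
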

%If you are installing LAPACK on a Silicon Graphics machine, you must
%modify the definition of \texttt{timing} to be
%\begin{verbatim}
%timing:
%	( cd TIMING; $(MAKE) -f Makefile.sgi )
%\end{verbatim}

If you encounter failures in any phase of the timing process, please
feel free to contact the authors as directed in Section~\ref{sendresults}.
Tell us the
type of machine on which the tests were run, the version of the operating
system, the compiler and compiler options that were used,
and details of the BLAS library or libraries that you used. You should
also include a copy of the output file in which the failure occurs.

Please note that the BLAS
timing runs will still need to be run as instructed in Section~\ref{timeblas}.

\subsubsection{Timing the Linear Equations Routines}\label{timelin}

The linear equation timing program is found in \texttt{LAPACK/TIMING/LIN}
and the input files are in \texttt{LAPACK/TIMING}.
Three input files are provided in each data type for timing the
linear equation routines, one for square matrices, one for band
matrices, and one for rectangular matrices. The small data sets for the REAL version
are \texttt{stime\_small.in}, \texttt{sband\_small.in}, and \texttt{stime2\_small.in}, respectively,
and the large data sets are
\texttt{stime\_large.in}, \texttt{sband\_large.in}, and \texttt{stime2\_large.in}.

The timing program for the least squares routines uses special instrumented
versions of the LAPACK routines to time individual sections of the code.
The first step in compiling the timing program is therefore to make a library
of the instrumented routines.

\begin{itemize}
\item[a)]
\begin{sloppypar}
To make a library of the instrumented LAPACK routines, first
go to \texttt{LAPACK/TIMING/LIN/LINSRC} and type \texttt{make} followed
by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
The library of instrumented code is created in
\texttt{LAPACK/TIMING/LIN/linsrc.a}.
\end{sloppypar}

\item[b)]
To make the linear equation timing programs,
go to \texttt{LAPACK/TIMING/LIN} and type \texttt{make} followed by the data
types desired, as in the examples in Section~\ref{toplevelmakefile}.
The executable files are called \texttt{xlintims},
\texttt{xlintimc}, \texttt{xlintimd}, and \texttt{xlintimz} and are created
in \texttt{LAPACK/TIMING}.

\item[c)]
Go to \texttt{LAPACK/TIMING} and
make any necessary modifications to the input files.
You may need to set the minimum time a subroutine will
be timed to a positive value, or to restrict the size of the tests
if you are using a computer with performance in between that of a
workstation and that of a supercomputer.
The computational requirements can be cut in half by using only one
value of LDA.
If it is necessary to also reduce the matrix sizes or the values of
the blocksize, corresponding changes should be made to the
BLAS input files (see Section~\ref{timeblas}).

\item[d)]
Run the programs for each data type you are using.
For the REAL version, the commands for the small data sets are

\begin{list}{}{}
\item{} \texttt{xlintims < stime\_small.in > stime\_small.out}
\item{} \texttt{xlintims < sband\_small.in > sband\_small.out}
\item{} \texttt{xlintims < stime2\_small.in > stime2\_small.out}
\end{list}
or the commands for the large data sets are
\begin{list}{}{}
\item{} \texttt{xlintims < stime\_large.in > stime\_large.out}
\item{} \texttt{xlintims < sband\_large.in > sband\_large.out}
\item{} \texttt{xlintims < stime2\_large.in > stime2\_large.out}
\end{list}

\noindent
Similar commands should be used for the other data types.
\end{itemize}

\subsubsection{Timing the BLAS}\label{timeblas}

The linear equation timing program is also used to time the BLAS.
Three input files are provided in each data type for timing the Level
2 and 3 BLAS.
These input files time the BLAS using the matrix shapes encountered
in the LAPACK routines, and we will use the results to analyze the
performance of the LAPACK routines.
For the REAL version, the small data files are
\texttt{sblasa\_small.in}, \texttt{sblasb\_small.in}, and \texttt{sblasc\_small.in}
and the large data files are
\texttt{sblasa\_large.in}, \texttt{sblasb\_large.in}, and \texttt{sblasc\_large.in}.
There are three sets of inputs because there are three
parameters in the Level 3 BLAS, M, N, and K, and
in most applications one of these parameters is small (on the order
of the blocksize) while the other two are large (on the order of the
matrix size).
In \texttt{sblasa\_small.in}, M and N are large but K is
small, while in \texttt{sblasb\_small.in} the small parameter is M, and
in \texttt{sblasc\_small.in} the small parameter is N.
The Level 2 BLAS are timed only in the first data set, where K
is also used as the bandwidth for the banded routines.

\begin{itemize}

\item[a)]
Go to \texttt{LAPACK/TIMING} and
make any necessary modifications to the input files.
You may need to set the minimum time a subroutine will
be timed to a positive value.
If you modified the values of N or NB
in Section~\ref{timelin}, set M, N, and K accordingly.
The large parameters among M, N, and K
should be the same as the matrix sizes used in timing the linear
equation routines,
and the small parameter should be the same as the
blocksizes used in timing the linear equation routines.
If necessary, the large data set can be simplified by using only one
value of LDA.

\item[b)]
Run the programs for each data type you are using.
For the REAL version, the commands for the small data sets are

\begin{list}{}{}
\item{} \texttt{xlintims < sblasa\_small.in > sblasa\_small.out}
\item{} \texttt{xlintims < sblasb\_small.in > sblasb\_small.out}
\item{} \texttt{xlintims < sblasc\_small.in > sblasc\_small.out}
\end{list}
or the commands for the large data sets are
\begin{list}{}{}
\item{} \texttt{xlintims < sblasa\_large.in > sblasa\_large.out}
\item{} \texttt{xlintims < sblasb\_large.in > sblasb\_large.out}
\item{} \texttt{xlintims < sblasc\_large.in > sblasc\_large.out}
\end{list}

\noindent
Similar commands should be used for the other data types.
\end{itemize}

\subsubsection{Timing the Eigensystem Routines}\label{timeeig}

The eigensystem timing program is found in \texttt{LAPACK/TIMING/EIG}
and the input files are in \texttt{LAPACK/TIMING}.
Four input files are provided in each data type for timing the
eigensystem routines,
one for the generalized nonsymmetric eigenvalue problem,
one for the nonsymmetric eigenvalue problem,
one for the symmetric and generalized symmetric eigenvalue problem,
and one for the singular value decomposition.
For the REAL version, the small data sets are called \texttt{sgeptim\_small.in},
\texttt{sneptim\_small.in}, \texttt{sseptim\_small.in}, and \texttt{ssvdtim\_small.in}, respectively,
and the large data sets are called \texttt{sgeptim\_large.in}, \texttt{sneptim\_large.in},
\texttt{sseptim\_large.in}, and \texttt{ssvdtim\_large.in}.
Each of the four input files reads a different set of parameters,
and the format of the input is indicated by a 3-character code
on the first line.

The timing program for eigenvalue/singular value routines accumulates
the operation count as the routines are executing using special
instrumented versions of the LAPACK routines. The first step in
compiling the timing program is therefore to make a library of the
instrumented routines.

\begin{itemize}
\item[a)]
\begin{sloppypar}
To make a library of the instrumented LAPACK routines, first
go to \texttt{LAPACK/TIMING/EIG/EIGSRC} and type \texttt{make} followed
by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
The library of instrumented code is created in
\texttt{LAPACK/TIMING/EIG/eigsrc.a}.
\end{sloppypar}

\item[b)]
To make the eigensystem timing programs,
go to \texttt{LAPACK/TIMING/EIG} and
type \texttt{make} followed by the data types desired, as in the examples
of Section~\ref{toplevelmakefile}. The executable files are called
\texttt{xeigtims}, \texttt{xeigtimc}, \texttt{xeigtimd}, and \texttt{xeigtimz}
and are created in \texttt{LAPACK/TIMING}.

\item[c)]
Go to \texttt{LAPACK/TIMING} and
make any necessary modifications to the input files.
You may need to set the minimum time a subroutine will
be timed to a positive value, or to restrict the number of tests
if you are using a computer with performance in between that of a
workstation and that of a supercomputer.
Instead of decreasing the matrix dimensions to reduce the time,
it would be better to reduce the number of matrix types to be timed,
since the performance varies more with the matrix size than with the
type. For example, for the nonsymmetric eigenvalue routines,
you could use only one matrix of type 4 instead of four matrices of
types 1, 3, 4, and 6.
Refer to LAPACK Working Note 41~\cite{WN41} for further details.
% See Section~\ref{moretiming} for further details.

\item[d)]
Run the programs for each data type you are using.
For the REAL version, the commands for the small data sets are

\begin{list}{}{}
\item{} \texttt{xeigtims < sgeptim\_small.in > sgeptim\_small.out}
\item{} \texttt{xeigtims < sneptim\_small.in > sneptim\_small.out}
\item{} \texttt{xeigtims < sseptim\_small.in > sseptim\_small.out}
\item{} \texttt{xeigtims < ssvdtim\_small.in > ssvdtim\_small.out}
\end{list}
or the commands for the large data sets are
\begin{list}{}{}
\item{} \texttt{xeigtims < sgeptim\_large.in > sgeptim\_large.out}
\item{} \texttt{xeigtims < sneptim\_large.in > sneptim\_large.out}
\item{} \texttt{xeigtims < sseptim\_large.in > sseptim\_large.out}
\item{} \texttt{xeigtims < ssvdtim\_large.in > ssvdtim\_large.out}
\end{list}

\noindent
Similar commands should be used for the other data types.
\end{itemize}

\subsection{Send the Results to Tennessee}\label{sendresults}

Congratulations! You have now finished installing, testing, and
timing LAPACK.
If you encountered failures in any phase of the
testing or timing process, please
consult our \texttt{release\_notes} file on netlib.
\begin{quote}
\url{http://www.netlib.org/lapack/release\_notes}
\end{quote}
This file contains machine-dependent installation clues which hopefully will
alleviate your difficulties or at least let you know that other users
have had similar difficulties on that machine. If there is not an entry
for your machine or the suggestions do not fix your problem, please feel
free to contact the authors at
\begin{list}{}{}
\item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
\end{list}
Tell us the
type of machine on which the tests were run, the version of the operating
system, the compiler and compiler options that were used,
and details of the BLAS library or libraries that you used. You should
also include a copy of the output file in which the failure occurs.

We would like to keep our \texttt{release\_notes} file as up-to-date as possible.
Therefore, if you do not see an entry for your machine, please contact us
with your testing results.

Comments and suggestions are also welcome.

We encourage you to make the LAPACK library available to your
users and provide us with feedback from their experiences.
%This release of LAPACK is not guaranteed to be compatible
%with any previous test release.

\subsection{Get support}\label{getsupport}
First, take a look at the complete installation manual in the LAPACK Working Note 41~\cite{WN41}.
If you still cannot solve your problem, you have two options:
\begin{itemize}
\item
Post a message on the LAPACK forum:
\begin{quote}
\url{http://icl.cs.utk.edu/lapack-forum}
\end{quote}
\item
Send an email to the LAPACK mailing list:
\begin{list}{}{}
\item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
\end{list}
\end{itemize}
\section*{Acknowledgments}

Ed Anderson and Susan Blackford contributed to previous versions of this report.

\appendix

\chapter{Caveats}\label{appendixd}

In this appendix we list a few of the machine-specific difficulties we
have
encountered in our own experience with LAPACK. A more detailed list
of machine-dependent problems, bugs, and compiler errors encountered
in the LAPACK installation process is maintained
on \emph{netlib}.
\begin{quote}
\url{http://www.netlib.org/lapack/release\_notes}
\end{quote}

We assume the user has installed the machine-specific routines
correctly and that the Level 1, 2, and 3 BLAS test programs have run
successfully, so we do not list any warnings associated with those
routines.

\section{\texttt{LAPACK/make.inc}}

All machine-specific
parameters are specified in the file \texttt{LAPACK/make.inc}.

The first line of this \texttt{make.inc} file is:
\begin{quote}
SHELL = /bin/sh
\end{quote}
and will need to be modified to \texttt{SHELL = /sbin/sh} if you are
installing LAPACK on an SGI architecture.

\section{ETIME}

On HPPA architectures,
the compiler and linker flag \texttt{+U77} should be included to access
the function \texttt{ETIME}.
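
A \texttt{make.inc} fragment for this case might look as follows. This is
only a sketch: \texttt{TIMER} and the value \texttt{EXT\_ETIME} are described
in Section~\ref{second}, whereas the variable names \texttt{FFLAGS} and
\texttt{LDFLAGS} are assumptions modeled on the \texttt{FFLAGS\_DRV} variable
mentioned later in this appendix. Use whatever names your \texttt{make.inc}
actually defines for the compiler and linker options.
\begin{verbatim}
# Sketch for an HPPA system; names other than TIMER are assumed.
FFLAGS  = +U77
LDFLAGS = +U77
TIMER   = EXT_ETIME
\end{verbatim}
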

\section{ILAENV and IEEE-754 compliance}

%By default, ILAENV (\texttt{LAPACK/SRC/ilaenv.f}) assumes an IEEE and IEEE-754
%compliant architecture, and thus sets (\texttt{ILAENV=1}) for (\texttt{ISPEC=10})
%and (\texttt{ISPEC=11}) settings in ILAENV.
%
%If you are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
%as this test inside ILAENV will crash!

As some new routines in LAPACK rely on IEEE-754 compliance,
two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
(\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
infinity arithmetic, respectively. By default, ILAENV assumes an IEEE
machine, and does a test for IEEE-754 compliance. \textbf{NOTE: If you
are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
as this test inside ILAENV will crash!}

Thus, for non-IEEE machines, the user must hard-code the setting of
(\texttt{ILAENV=0}) for (\texttt{ISPEC=10} and \texttt{ISPEC=11}) in the version
of \texttt{LAPACK/SRC/ilaenv.f} to be put in
the library. For further details, refer to Section~\ref{testieee}.

Be aware
that some IEEE compilers by default do not enforce IEEE-754 compliance, and
a compiler flag must be explicitly set by the user.

On SGIs for example, you must set the \texttt{-OPT:IEEE\_NaN\_inf=ON} compiler
flag to enable IEEE-754 compliance.

Lastly, the test inside ILAENV to detect IEEE-754 compliance will
result in IEEE exceptions for ``Divide by Zero'' and ``Invalid Operation''.
Thus, if the user is installing on a machine that issues IEEE exception
warning messages (like a Sun SPARCstation), the user can disregard these
messages. To avoid these messages, the user can hard-code the values
inside ILAENV as explained in Section~\ref{testieee}.
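
The kind of check that triggers these exceptions can be sketched as follows.
This is a simplified, hypothetical illustration in free-form Fortran and is
not the code used inside ILAENV or \texttt{tstiee.f}; the function name
\texttt{ieee\_sketch} is an assumption. The operands are passed as arguments
so that the compiler is less likely to evaluate the divisions at compile time.
\begin{verbatim}
! Hypothetical sketch: returns .true. if basic infinity and NaN
! arithmetic behave as IEEE-754 prescribes.
! Intended call: ieee_sketch( 0.0, 1.0 )
logical function ieee_sketch( zero, one )
   implicit none
   real, intent(in) :: zero, one
   real :: posinf, neginf, nan1

   ieee_sketch = .false.

   ! Infinity arithmetic: raises the "Divide by Zero" exception.
   posinf = one / zero
   neginf = -one / zero
   if ( posinf <= one ) return
   if ( neginf >= zero ) return
   if ( posinf * posinf <= one ) return

   ! NaN arithmetic: raises the "Invalid Operation" exception.
   nan1 = posinf + neginf
   ! A NaN is the only value that compares unequal to itself.
   if ( nan1 == nan1 ) return

   ieee_sketch = .true.
end function ieee_sketch
\end{verbatim}
On a machine that traps on division by zero, the first division would abort
the program, which is why the hard-coded setting is required there. Compiler
options that relax floating-point semantics may also remove the
self-comparison, so a sketch like this should be compiled with the same care
about IEEE flags as noted above.
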

\section{Lack of \texttt{/tmp} space}

If \texttt{/tmp} space is small (i.e., less than approximately 16 MB) on your
system, you may run out of space
when compiling. There are a few possible solutions to this problem.
\begin{enumerate}
\item You can ask your system administrator to increase the size of the
\texttt{/tmp} partition.
\item You can change the environment variable \texttt{TMPDIR} to point to
your home directory for temporary space. For example,
\begin{quote}
\texttt{setenv TMPDIR /home/userid/}
\end{quote}
where \texttt{/home/userid/} is the user's home directory.
\item If your archive command has an \texttt{l} option, you can change the
archive command to \texttt{ar crl} so that it
places temporary files in the current working
directory rather than in the default temporary directory \texttt{/tmp}.
\end{enumerate}

\section{BLAS}

If you suspect a BLAS-related problem and you are linking
with an optimized version of the BLAS, we strongly suggest
that, as a first step, you link to the Fortran~77 version of
the suspected BLAS routine and see whether the error disappears.

We have included test programs for the Level 1 BLAS.
Users should nevertheless beware of a common problem in machine-specific
implementations of xNRM2,
the function to compute the 2-norm of a vector.
The Fortran version of xNRM2 avoids underflow or overflow
by scaling intermediate results, but some library versions of xNRM2
are not so careful about scaling.
If xNRM2 is implemented without scaling intermediate results, some of
the LAPACK test ratios may be unusually high, or
a floating point exception may occur in the problems scaled near
underflow or overflow.
The solution to these problems is to link the Fortran version of
xNRM2 with the test program.
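
The effect of scaling can be seen from a short identity. The following is
only a sketch of the idea, not the exact recurrence used in the Fortran
version of xNRM2; the symbols $x$, $n$ and $s$ are introduced here for
illustration. For a nonzero vector $x$ of length $n$, let
$s = \max_i |x_i|$. Then
\[
  \| x \|_2 \;=\; \sqrt{\sum_{i=1}^{n} x_i^2}
            \;=\; s \, \sqrt{\sum_{i=1}^{n} \left( \frac{x_i}{s} \right)^2 } .
\]
Every term $(x_i/s)^2$ lies in $[0,1]$ and at least one term equals 1, so the
scaled sum lies between 1 and $n$. By contrast, the unscaled term $x_i^2$
overflows as soon as $|x_i|$ exceeds the square root of the overflow
threshold, and underflows when $|x_i|$ falls below the square root of the
underflow threshold.
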
\emph{On some CRAY architectures, the Fortran~77
version of xNRM2 should be used.}

\section{Optimization}

If a large number of test failures occur for a specific matrix type
or operation, there may be an optimization problem with
your compiler. In that case, try reducing the level of
optimization, or eliminating optimization entirely, for those routines,
and rerun the tests to see whether the failures disappear.

%LAPACK is written in Fortran 77. Prospective users with only a
%Fortran 66 compiler will not be able to use this package.

\section{Compiling testing/timing drivers}

The testing and timing main programs (xCHKAA, xCHKEE, xTIMAA, and
xTIMEE)
allocate a large amount of storage for local variables. Therefore, it is vitally
important that the user know whether the compiler by default allocates local
variables statically or on the stack. It is not uncommon for
compilers that place local variables on the stack to cause a stack
overflow at runtime in the testing or timing process. The user then
has two options: increase the stack size, or force all local variables
to be allocated statically.

On HPPA architectures, the
compiler and linker flag \texttt{-K} should be used when compiling these testing
and timing main programs to avoid such a stack overflow. That is, set
\texttt{FFLAGS\_DRV = -K} in the \texttt{LAPACK/make.inc} file.

For similar reasons,
on SGI architectures, the compiler and linker flag \texttt{-static} should be
used. That is, set \texttt{FFLAGS\_DRV = -static} in the \texttt{LAPACK/make.inc} file.

\section{IEEE arithmetic}

Some of our test matrices are scaled near overflow or underflow,
but on the Crays, problems with the arithmetic near overflow and
underflow forced us to scale by only the square root of overflow
and underflow.
The LAPACK auxiliary routine SLABAD (or DLABAD) is called to
take the square root of underflow and overflow in cases where the
full range could cause difficulties.
We assume we are on a Cray if $\log_{10} (\mathrm{overflow})$
is greater than 2000
and take the square root of underflow and overflow in this case.
The test in SLABAD is as follows:
\begin{verbatim}
      IF( LOG10( LARGE ).GT.2000. ) THEN
         SMALL = SQRT( SMALL )
         LARGE = SQRT( LARGE )
      END IF
\end{verbatim}
Users of other machines with similar restrictions on the effective
range of usable numbers may have to modify this test so that the
square roots are taken on their machine as well. \emph{Usually on
HPPA architectures, a similar restriction in SLABAD should be enforced
for all testing involving complex arithmetic.}
SLABAD is located in \texttt{LAPACK/SRC}.

For machines that have a narrow exponent range or lack gradual
underflow (DEC VAXes for example), it is not uncommon to experience
failures in sec.out and/or dec.out with SLAQTR/DLAQTR or DTRSYL.
The failures in SLAQTR/DLAQTR and DTRSYL
occur with very badly scaled test problems in which the norm of
the solution is very close to the underflow
threshold (or even underflows to zero). We believe that these failures
could probably be avoided by an even greater degree of care in scaling,
but we did not want to delay the release of LAPACK any further. These
tests pass successfully on most other machines.
An example failure in
dec.out on a MicroVAX II looks like the following:

\begin{verbatim}
Tests of the Nonsymmetric eigenproblem condition estimation routines
DLALN2, DLASY2, DLANV2, DLAEXC, DTRSYL, DTREXC, DTRSNA, DTRSEN, DLAQTR

Relative machine precision (EPS) = 0.277556D-16
Safe minimum (SFMIN) = 0.587747D-38

Routines pass computational tests if test ratio is less than 20.00

DEC routines passed the tests of the error exits ( 35 tests done)
Error in DTRSYL: RMAX = 0.155D+07
LMAX = 5323 NINFO= 1600 KNT= 27648
Error in DLAQTR: RMAX = 0.344D+04
LMAX = 15792 NINFO= 26720 KNT= 45000
\end{verbatim}

\section{Timing programs}

In the eigensystem timing program, calls are made to the LINPACK
and EISPACK equivalents of the LAPACK routines to allow a direct
comparison of performance measures.
In some cases we have increased the minimum number of
iterations in the LINPACK and EISPACK routines to allow
them to converge for our test problems, but
even this may not be enough.
One goal of the LAPACK project is to improve the convergence
properties of these routines, so error messages in the output
file indicating that a LINPACK or EISPACK routine did not
converge should not be regarded with alarm.

In the eigensystem timing program, we have equivalenced some work
arrays and then passed them to a subroutine, where both arrays are
modified. This is a violation of the Fortran~77 standard, which
says ``if a subprogram reference causes a dummy argument in the
referenced subprogram to become associated with another dummy
argument in the referenced subprogram, neither dummy argument may
become defined during execution of the subprogram.''\footnote{ANSI X3.9-1978, sec.
15.9.3.6}
If this causes any difficulties, the equivalence
can be commented out as explained in the comments for the main
eigensystem timing programs.

%\section*{MACHINE-SPECIFIC DIFFICULTIES}
%Some IBM compilers do not recognize DBLE as a generic function as used
%in LAPACK. The software tools we use to convert from single precision
%to double precision convert REAL(C) and AIMAG(C), where C is COMPLEX,
%to DBLE(Z) and DIMAG(Z), where Z is COMPLEX*16, but
%IBM compilers use DREAL(Z) and DIMAG(Z) to take the real and
%imaginary parts of a double complex number.
%IBM users can fix this problem by changing DBLE to DREAL when the
%argument of DBLE is COMPLEX*16.
%
%IBM compilers do not permit the data type COMPLEX*16 in a FUNCTION
%subprogram definition. The data type on the first line of the
%function subprogram must be changed from COMPLEX*16 to DOUBLE COMPLEX
%for the following functions:
%
%\begin{tabbing}
%\dent ZLATMOO \= from the test matrix generator library \kill
%\dent ZBEG \> from the Level 2 BLAS test program \\
%\dent ZBEG \> from the Level 3 BLAS test program \\
%\dent ZLADIV \> from the LAPACK library \\
%\dent ZLARND \> from the test matrix generator library \\
%\dent ZLATM2 \> from the test matrix generator library \\
%\dent ZLATM3 \> from the test matrix generator library
%\end{tabbing}
%The functions ZDOTC and ZDOTU from the Level 1 BLAS are already
%declared DOUBLE COMPLEX. If that doesn't work, try the declaration
%COMPLEX FUNCTION*16.


\newpage
\addcontentsline{toc}{section}{Bibliography}

\begin{thebibliography}{9}

\bibitem{LUG}
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra,
J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney,
S. Ostrouchov, and D.
Sorensen,
\textit{LAPACK Users' Guide}, Second Edition,
{SIAM}, Philadelphia, PA, 1995.

\bibitem{WN16}
E. Anderson and J. Dongarra,
\textit{LAPACK Working Note 16:
Results from the Initial Release of LAPACK},
University of Tennessee, CS-89-89, November 1989.

\bibitem{WN41}
E. Anderson, J. Dongarra, and S. Ostrouchov,
\textit{LAPACK Working Note 41:
Installation Guide for LAPACK},
University of Tennessee, CS-92-151, February 1992 (revised June 1999).

\bibitem{WN5}
C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum,
S. Hammarling, and D. Sorensen,
\textit{LAPACK Working Note 5: Provisional Contents},
Argonne National Laboratory, ANL-88-38, September 1988.

\bibitem{WN13}
Z. Bai, J. Demmel, and A. McKenney,
\textit{LAPACK Working Note 13: On the Conditioning of the Nonsymmetric
Eigenvalue Problem: Theory and Software},
University of Tennessee, CS-89-86, October 1989.

\bibitem{XBLAS}
X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar,
W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung,
and D. J. Yoo, \textit{Design, implementation and testing of extended
and mixed precision BLAS},
\textit{ACM Trans. Math. Soft.}, 28, 2:152--205, June 2002.

\bibitem{BLAS3}
J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
``A Set of Level 3 Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 16, 1:1--17, March 1990.
%Argonne National Laboratory, ANL-MCS-P88-1, August 1988.

\bibitem{BLAS3-test}
J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
``A Set of Level 3 Basic Linear Algebra Subprograms:
Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 16, 1:18--28, March 1990.
%Argonne National Laboratory, ANL-MCS-TM-119, June 1988.

\bibitem{BLAS2}
J.
Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
``An Extended Set of Fortran Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 14, 1:1--17, March 1988.

\bibitem{BLAS2-test}
J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
``An Extended Set of Fortran Basic Linear Algebra Subprograms:
Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 14, 1:18--32, March 1988.

\bibitem{BLAS1}
C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh,
``Basic Linear Algebra Subprograms for Fortran Usage,''
\textit{ACM Trans. Math. Soft.}, 5, 3:308--323, September 1979.

\end{thebibliography}

\end{document}