Member "hpcc-1.5.0/README.txt" (18 Mar 2016, 18343 Bytes) of package /linux/privat/hpcc-1.5.0.tar.gz:


          DARPA/DOE HPC Challenge Benchmark version 1.5.0beta
          ***************************************************
                           Piotr Luszczek (1)
                           ==================
                            October 12, 2012
                            ================



1  Introduction
*=*=*=*=*=*=*=*

   This is a suite of benchmarks that measure the performance of the
processor, memory subsystem, and interconnect. For details refer to the
HPC Challenge web site (http://icl.cs.utk.edu/hpcc/).
  In essence, HPC Challenge consists of a number of tests, each of which
measures the performance of a different aspect of the system.
  If you are familiar with the High Performance Linpack (HPL) benchmark
code (see the HPL web site: http://www.netlib.org/benchmark/hpl/) then
you can reuse the build script file (input for the make(1) command) and
the input file that you already have for HPL. The HPC Challenge
benchmark includes HPL and uses its build script and input files with
only slight modifications. The most important change is to the line that
sets the TOPdir variable. For HPC Challenge, the variable's value should
always be ../../.. regardless of what it was in the HPL build script
file.


2  Compiling
*=*=*=*=*=*=

   The first step is to create a build script file that reflects the
characteristics of your machine. This file is reused by all the
components of the HPC Challenge suite. The build script file should be
created in the hpl directory. This directory contains instructions (the
files README and INSTALL) on how to create the build script file for
your system. The hpl/setup directory contains many examples of build
script files. A recommended approach is to copy one of them to the hpl
directory and modify it if it does not work as is.
  The build script file has a name that starts with the Make. prefix and
usually ends with a suffix that identifies the target system. For
example, if the suffix chosen for the system is Unix, the file should be
named Make.Unix.
  To build the benchmark executable (for the system named Unix) type:
make arch=Unix. This command should be run in the top directory (not in
the hpl directory). It will look in the hpl directory for the build
script file and use it to build the benchmark executable.
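
  The naming convention above can be sketched in a few lines of Python.
This is only an illustration: the helper names build_script_path and
make_command are hypothetical and are not part of the HPC Challenge
distribution. The sketch derives the build script location and the make
invocation from the chosen system suffix.

```python
from pathlib import PurePosixPath


def build_script_path(arch: str) -> str:
    """Return the build script path relative to the top directory.

    Hypothetical helper: it only encodes the "hpl/Make.<suffix>" naming
    convention described in the text.
    """
    if not arch or "/" in arch:
        raise ValueError("arch must be a non-empty suffix without '/'")
    return str(PurePosixPath("hpl") / f"Make.{arch}")


def make_command(arch: str) -> list[str]:
    """Return the make invocation to run from the top directory."""
    return ["make", f"arch={arch}"]


print(build_script_path("Unix"))       # hpl/Make.Unix
print(" ".join(make_command("Unix")))  # make arch=Unix
```

  The returned command is meant to be run from the top directory, not
from the hpl directory.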
  The runtime behavior of the HPC Challenge source code may be
configured at compile time by defining a few C preprocessor symbols.
They can be defined by adding appropriate options to the CCNOOPT and
CCFLAGS make variables. The former controls options for source code
files that need to be compiled without aggressive optimizations to
ensure accurate generation of system-specific parameters. The latter
applies to the rest of the files, which need good compiler optimization
for best performance. To define a symbol S, most compilers require the
option -DS. Currently, the following options are available in the
HPC Challenge source code:


   - HPCC_FFT_235: if this symbol is defined the FFTE code (an FFT
   implementation) will use vector sizes and processor counts that are
   not limited to powers of 2. Instead, the vector sizes and processor
   counts to be used will be a product of powers of 2, 3, and 5.

   - HPCC_FFTW_ESTIMATE: if this symbol is defined it will affect the
   way the external FFTW library is called (it has no effect if the
   FFTW library is not used). When defined, this symbol causes the FFTW
   planning routine to be called with the FFTW_ESTIMATE flag (instead
   of FFTW_MEASURE). This might result in worse performance results but
   shorter execution time of the benchmark. Defining this symbol may
   also reduce the memory fragmentation caused by FFTW's planning
   routine.

   - HPCC_MEMALLCTR: if this symbol is defined a custom memory allocator
   will be used to alleviate the effects of memory fragmentation and
   allow larger data sets to be used, which may result in better
   performance.

   - HPL_USE_GETPROCESSTIMES: if this symbol is defined then the
   Windows-specific GetProcessTimes() function will be used to measure
   the elapsed CPU time.

   - USE_MULTIPLE_RECV: if this symbol is defined then multiple
   non-blocking receives will be posted simultaneously. By default only
   one non-blocking receive is posted.

   - RA_SANDIA_NOPT: if this symbol is defined the HPC Challenge
   standard algorithm for Global RandomAccess will not be used. Instead,
   an alternative implementation from Sandia National Laboratory will be
   used. It routes messages in software across a virtual hypercube
   topology formed from MPI processes.

   - RA_SANDIA_OPT2: if this symbol is defined the HPC Challenge
   standard algorithm for Global RandomAccess will not be used. Instead,
   an alternative implementation from Sandia National Laboratory will be
   used. This implementation is optimized for processor counts that are
   powers of two. The optimizations are sorting of data before sending
   and unrolling of the data update loop. If the number of processes is
   not a power of two then the code is the same as the one used with
   the RA_SANDIA_NOPT setting.

   - RA_TIME_BOUND_DISABLE: if this symbol is defined then the standard
   Global RandomAccess code will be used without time limits. This is
   discouraged for most runs because the standard algorithm tends to be
   slow for large array sizes due to a large overhead for short MPI
   messages.

   - USING_FFTW: if this symbol is defined the standard HPC Challenge
   FFT implementation (called FFTE) will not be used. Instead, the FFTW
   library will be called. Defining the USING_FFTW symbol is not
   sufficient: appropriate flags have to be added in the make script so
   that the FFTW header files can be found at compile time and the FFTW
   libraries at link time.
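
  To make the HPCC_FFT_235 size rule concrete, the following Python
sketch tests whether a number is a product of powers of 2, 3, and 5. It
is only an illustration of that rule, not the way FFTE selects its
sizes, and the function names is_235_size and largest_235_size are
hypothetical.

```python
def is_235_size(n: int) -> bool:
    """Return True if n == 2**a * 3**b * 5**c for integers a, b, c >= 0."""
    if n < 1:
        return False
    # Divide out every factor of 2, 3, and 5; anything left over is a
    # prime factor outside the allowed set.
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def largest_235_size(limit: int) -> int:
    """Return the largest size <= limit accepted by is_235_size."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    n = limit
    while not is_235_size(n):
        n -= 1
    return n


print(is_235_size(360))        # True:  360 = 2**3 * 3**2 * 5
print(is_235_size(14))         # False: 14 = 2 * 7
print(largest_235_size(1001))  # 1000 = 2**3 * 5**3
```

  Powers of 2 alone also pass the test, so sizes that are valid without
HPCC_FFT_235 remain valid with it.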



3  Runtime Configuration
*=*=*=*=*=*=*=*=*=*=*=*=

   The HPC Challenge is driven by a short input file named hpccinf.txt
that is almost the same as the input file for HPL (customarily called
HPL.dat). Refer to the file hpl/www/tuning.html for details about the
input file for HPL. A sample input file is included with the
HPC Challenge distribution.
  The differences between HPL's input file and HPC Challenge's input
file can be summarized as follows:


   - Lines 3 and 4 are ignored. The output is always appended to the
   file named hpccoutf.txt.
   - There are additional lines (starting with line 33) that may (but do
   not have to) be used to customize the HPC Challenge benchmark. They
   are described below.

  The additional lines in the HPC Challenge input file (compared to the
HPL input file) are:


   - Lines 33 and 34 describe additional matrix sizes to be used for
   running the PTRANS benchmark (one of the components of the
   HPC Challenge benchmark).
   - Lines 35 and 36 describe additional blocking factors to be used for
   running the PTRANS test.

  For completeness, here is the list of lines of the HPC Challenge
input file with a brief description of their meaning:

   - Line 1: ignored
   - Line 2: ignored
   - Line 3: ignored
   - Line 4: ignored
   - Line 5: number of matrix sizes for HPL (and PTRANS)
   - Line 6: matrix sizes for HPL (and PTRANS)
   - Line 7: number of blocking factors for HPL (and PTRANS)
   - Line 8: blocking factors for HPL (and PTRANS)
   - Line 9: type of process ordering for HPL
   - Line 10: number of process grids for HPL (and PTRANS)
   - Line 11: numbers of process rows of each process grid for HPL (and
   PTRANS)
   - Line 12: numbers of process columns of each process grid for HPL
   (and PTRANS)
   - Line 13: threshold value not to be exceeded by scaled residual for
   HPL (and PTRANS)
   - Line 14: number of panel factorization methods for HPL
   - Line 15: panel factorization methods for HPL
   - Line 16: number of recursive stopping criteria for HPL
   - Line 17: recursive stopping criteria for HPL
   - Line 18: number of recursion panel counts for HPL
   - Line 19: recursion panel counts for HPL
   - Line 20: number of recursive panel factorization methods for HPL
   - Line 21: recursive panel factorization methods for HPL
   - Line 22: number of broadcast methods for HPL
   - Line 23: broadcast methods for HPL
   - Line 24: number of look-ahead depths for HPL
   - Line 25: look-ahead depths for HPL
   - Line 26: swap methods for HPL
   - Line 27: swapping threshold for HPL
   - Line 28: form of L1 for HPL
   - Line 29: form of U for HPL
   - Line 30: value that specifies whether equilibration should be used
   by HPL
   - Line 31: memory alignment for HPL
   - Line 32: ignored
   - Line 33: number of additional problem sizes for PTRANS
   - Line 34: additional problem sizes for PTRANS
   - Line 35: number of additional blocking factors for PTRANS
   - Line 36: additional blocking factors for PTRANS
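
  As an illustration of the line layout above, the following Python
sketch extracts the optional PTRANS settings from lines 33 to 36 of an
input file. It assumes that each of these lines begins with
whitespace-separated integers and that any text after them is a
free-form description, as is customary in HPL.dat; that layout is an
assumption here, and the function names and the sample text are
hypothetical.

```python
def read_counted_values(lines, count_line_no, values_line_no):
    """Read a count from one 1-based line and that many integers from another.

    Assumes the leading tokens are integers and trailing text is a
    description that can be ignored.
    """
    count = int(lines[count_line_no - 1].split()[0])
    tokens = lines[values_line_no - 1].split()[:count]
    if len(tokens) < count:
        raise ValueError(f"line {values_line_no} has fewer than {count} values")
    return [int(t) for t in tokens]


def ptrans_extras(text):
    """Return (additional problem sizes, additional blocking factors)."""
    lines = text.splitlines()
    if len(lines) < 36:
        # The additional lines are optional.
        return [], []
    sizes = read_counted_values(lines, 33, 34)
    blocks = read_counted_values(lines, 35, 36)
    return sizes, blocks


# Hypothetical input: 32 placeholder lines followed by lines 33 to 36.
sample = "\n".join(
    ["(placeholder)"] * 32
    + [
        "2  number of additional problem sizes for PTRANS",
        "1200 10000  additional problem sizes",
        "1  number of additional blocking factors for PTRANS",
        "64  additional blocking factors",
    ]
)
print(ptrans_extras(sample))  # ([1200, 10000], [64])
```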



4  Running
*=*=*=*=*=

   The exact way to run the HPC Challenge benchmark depends on the MPI
implementation and system details. An example command to run the
benchmark could look like this: mpirun -np 4 hpcc. The meaning of the
command's components is as follows:

   - mpirun is the command that starts execution of an MPI code.
   Depending on the system, it might also be aprun, mpiexec, mprun, poe,
   or something appropriate for your computer.

   - -np 4 is the argument that specifies that 4 MPI processes should be
   started. The number of MPI processes should be large enough to
   accommodate all the process grids specified in the hpccinf.txt file.

   - hpcc is the name of the HPC Challenge executable to run.
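
  The requirement on the number of MPI processes can be checked with a
short Python sketch. It assumes, as in HPL, that a process grid with P
rows and Q columns occupies P*Q processes, so the largest grid listed on
lines 11 and 12 of hpccinf.txt determines the minimum. The function name
required_processes is hypothetical.

```python
def required_processes(rows, cols):
    """Return the smallest MPI process count that fits every P x Q grid.

    rows and cols are the values from lines 11 and 12 of the input file,
    one entry per process grid.
    """
    if not rows or len(rows) != len(cols):
        raise ValueError("rows and cols must be non-empty and equally long")
    return max(p * q for p, q in zip(rows, cols))


# Three grids: 2x2, 1x4 and 4x1 all fit in 4 processes.
print(required_processes([2, 1, 4], [2, 4, 1]))  # 4
```

  With these grids, the value passed to -np in the example command must
be at least 4.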

  After the run, a file called hpccoutf.txt is created. It contains
results of the benchmark. This file should be uploaded through the web
form at the HPC Challenge website.


5  Source Code Changes across Versions (ChangeLog)
*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=



5.1  Version 1.5.0 (2016-03-18)
===============================


   1. Fixed memory leak in STREAM code.
   2. Fixed bug in STREAM that resulted in minimum results reported as
   0.
   3. Removed some of the compilation warnings.



5.2  Version 1.5.0beta (2015-07-23)
===================================


   1. Added new targets to the main make(1) file.
   2. Fixed bug introduced while updating to MPI STREAM 1.7 with
   spurious global communicator (reported by NEC).
   3. Added make(1) file for OpenMPI from MacPorts.
   4. Fixed bug introduced while updating to MPI STREAM 1.7 that caused
   some ranks to use NULL communicator.
   5. Fixed bug introduced while updating to MPI STREAM 1.7 that caused
   syntax errors.



5.3  Version 1.5.0alpha (2015-05-22)
====================================


   1. Added global error accounting in STREAM.
   2. Updated checking to report from multiple MPI processes
   contributing to overall error.
   3. Added barrier to make sure all processes enter STREAM kernel tests
   at the same time.
   4. Updated naming conventions to match the original benchmark in
   STREAM.
   5. Changed scaling constant to prevent verification from overflowing
   in STREAM.
   6. Simplified MPI communicator code in STREAM.
   7. Substituted large constants for more descriptive compile time
   arithmetic in STREAM.
   8. Added the "restrict" keyword to the STREAM vector pointers for
   faster generated code.
   9. Updated STREAM code to the official STREAM MPI version 1.7.
   10. Removed infinite loop due to default compiler optimization in
   DLAMCH and SLAMCH.
   11. Added compiler flags to allow compiling with a C++ compiler.



5.4  Version 1.4.3 (2013-08-26)
===============================


   1. Increased the size of scratch vector for local FFT tests that was
   missed in the previous version (reported by SGI).
   2. Added Makefile for Blue Gene/P contributed by Vasil Tsanov.



5.5  Version 1.4.2 (2012-10-12)
===============================


   1. Increased sizes of scratch vectors for local FFT tests to account
   for runs on systems with large main memory (reported by IBM, SGI and
   Intel).
   2. Reduced vector size for local FFT tests due to larger scratch
   space needed.
   3. Added a type cast to prevent overflow of a 32-bit integer vector
   size in FFT data generation routine (reported by IBM).
   4. Fixed variable types to handle array sizes that overflow 32-bit
   integers in RandomAccess (reported by IBM and SGI).
   5. Changed time-bound code to be used by default in Global
   RandomAccess and allowed for it to be switched off with a compile
   time flag if necessary.
   6. Code cleanup to allow compilation without warnings of RandomAccess
   test.
   7. Changed communication code in PTRANS to avoid large message sizes
   that caused problems in some MPI implementations.
   8. Updated documentation in README.txt and README.html files.



5.6  Version 1.4.1 (2010-06-01)
===============================


   1. Added optimized variants of RandomAccess that use Linear
   Congruential Generator for random number generation.
   2. Made corrections to comments that provide definition of the
   RandomAccess test.
   3. Removed initialization of the main array from the timed section of
   optimized versions of RandomAccess.
   4. Fixed the length of the vector used to compute error when using
   MPI implementation from FFTW.
   5. Added global reduction to error calculation in MPI FFT to achieve
   more accurate error estimate.
   6. Updated documentation in README.



5.7  Version 1.4.0 (2010-03-26)
===============================


   1. Added new variant of RandomAccess that uses Linear Congruential
   Generator for random number generation.
   2. Rearranged the order of benchmarks so that HPL component runs last
   and may be aborted if the performance of other components was not
   satisfactory. RandomAccess is now first to assist in tuning the code.
   3. Added global initialization and finalization routine that allows
   external software and hardware components to be properly initialized
   and finalized without changing the rest of the HPCC testing harness.
   4. Lack of hpccinf.txt is no longer reported as error but as a
   warning.



5.8  Version 1.3.2 (2009-03-24)
===============================


   1. Fixed memory leaks in G-RandomAccess driver routine.
   2. Made the check for 32-bit vector sizes in G-FFT optional. MKL
   allows for 64-bit vector sizes in its FFTW wrapper.
   3. Fixed memory bug in single-process FFT.
   4. Updated documentation (README).



5.9  Version 1.3.1 (2008-12-09)
===============================


   1. Fixed a dead-lock problem in FFT component due to use of wrong
   communicator.
   2. Fixed the 32-bit random number generator in PTRANS that was using
   64-bit routines from HPL.



5.10  Version 1.3.0 (2008-11-13)
================================


   1. Updated HPL component to use HPL 2.0 source code

      1. Replaced 32-bit Pseudo Random Number Generator (PRNG) with a
      64-bit one.
      2. Replaced 3 numerical checks of the solution residual with a
      single one.
      3. Added support for 64-bit systems with large memory sizes
      (previously, index calculations would overflow 32-bit integers).

   2. Introduced a limit on FFT vector size so they fit in a 32-bit
   integer (only applicable when using FFTW version 2).



5.11  Version 1.2.0 (2007-06-25)
================================


   1. Changes in the FFT component:

      1. Added flexibility in choosing vector sizes and processor
      counts: now the code can do powers of 2, 3, and 5 both
      sequentially and in parallel tests.
      2. FFTW can now run with ESTIMATE (not just MEASURE) flag: it
      might produce worse performance results but often reduces time to
      run the test and causes less memory fragmentation.

   2. Changes in the DGEMM component:

      1. Added more comprehensive checking of the numerical properties
      of the test's results.

   3. Changes in the RandomAccess component:

      1. Removed time-bound functionality: only runs that perform
      complete computation are now possible.
      2. Made the timing more accurate: main array initialization is not
      counted towards performance timing.
      3. Cleaned up the code: some non-portable C language constructs
      have been removed.
      4. Added new algorithms: new algorithms from Sandia based on
      hypercube network topology can now be chosen at compile time,
      which results in much better performance on many types of
      parallel systems.
      5. Fixed potential resource leaks by adding function calls
      required by the MPI standard.

   4. Changes in the HPL component:

      1. Cleaned up reporting of numerics: more accurate printing of
      scaled residual formula.

   5. Changes in the PTRANS component:

      1. Added randomization of virtual process grids to measure
      bandwidth of the network more accurately.

   6. Miscellaneous changes:

      1. Added better support for Windows-based clusters by taking
      advantage of Win32 API.
      2. Added custom memory allocator to deal with memory fragmentation
      on some systems.
      3. Added better reporting of configuration options in the output
      file.




5.12  Version 1.0.0 (2005-06-11)
================================



5.13  Version 0.8beta (2004-10-19)
==================================



5.14  Version 0.8alpha (2004-10-15)
===================================



5.15  Version 0.6beta (2004-08-21)
==================================



5.16  Version 0.6alpha (2004-05-31)
===================================



5.17  Version 0.5beta (2003-12-01)
==================================



5.18  Version 0.4alpha (2003-11-13)
===================================



5.19  Version 0.3alpha (2004-11-05)
===================================

-----------------------------------------------------------------------

   This document was translated from LaTeX by HeVeA (2).
-----------------------------------


 (1) University of Tennessee Knoxville, Innovative Computing Laboratory
 (2) http://hevea.inria.fr/index.html