"Fossies" - the Fresh Open Source Software Archive

Member "scalasca-2.6/OPEN_ISSUES" (19 Apr 2021, 13048 Bytes) of package /linux/misc/scalasca-2.6.tar.gz:


As a special service "Fossies" has tried to format the requested text file into HTML format (style: standard) with prefixed line numbers. Alternatively you can here view or download the uninterpreted source code file. See also the last Fossies "Diffs" side-by-side code changes report for "OPEN_ISSUES": 2.3.1_vs_2.4.

                         SCALASCA v2.6 OPEN ISSUES
                         =========================
                           (Status:  April 2021)

This file lists known limitations and unimplemented features of various
Scalasca components.

-----------------------------------------------------------------------------

* Platform support

  - Scalasca has been tested on the following platforms:
     + IBM BladeCenter & iDataPlex clusters
     + Cray XC series (x86_64, AArch64)
     + various Linux-based clusters (x86_64, Power8/9, armhf, AArch64)
     + Intel Xeon Phi (KNL)

    In addition, the provided configure options (see INSTALL) may provide a
    good basis for building and testing the toolset on other systems.  Please
    report success/failure on other platforms to the Scalasca development
    team.

  - The following platforms have not been tested recently; however, the
    supplied build system might still work on those systems:
     + IBM Blue Gene/Q
     + IBM Blue Gene/P
     + AIX-based clusters
     + Cray XT, XE, XK series
     + Fujitsu FX10, FX100, and K computer
     + Intel Xeon Phi (KNC)
     + Oracle/Sun Solaris/SPARC-based clusters

  - On the Intel Xeon Phi platform, only Intel compilers and Intel MPI are
    currently supported.


-----------------------------------------------------------------------------

* SCOUT parallel trace analysis

  - The OpenMP and hybrid MPI/OpenMP versions of the SCOUT parallel trace
    analyzer (and its associated libraries) have been found to either fail
    to build or to execute incorrectly with old PGI pgCC versions (verified
    with, e.g., v10.5, v13.9, v14.1).  Consequently, Scalasca should be
    configured using '--disable-openmp' to skip building the corresponding
    scout.omp and scout.hyb executables (see the example below).  PathScale
    and other compilers may have similar problems and also need the same
    treatment.
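
    For example, a minimal sketch of such a configure invocation (the
    '--prefix' value is only illustrative; all other options are as
    described in INSTALL):

        ./configure --prefix=$HOME/scalasca --disable-openmp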

  - If it is not possible to build the required versions of SCOUT, or if
    they fail to run reliably, it may be possible to substitute a version
    built with a different compiler (such as GCC) when doing measurement
    collection & analysis (e.g., in a batch job).

  - The MPI and hybrid MPI/OpenMP versions of the SCOUT parallel analyzer
    must be run as an MPI program with exactly the same number of processes
    as contained in the experiment to analyze: typically it will be
    convenient to launch SCOUT immediately following the measurement in a
    single batch script so that the MPI launch command can be configured
    similarly for both steps.  The SCAN nexus executes SCOUT with the
    appropriate launch configuration when automatic trace analysis is
    specified.
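
    As an illustration only, a batch-script sketch of this scheme (assuming
    'mpiexec' as the launcher, 512 processes, and an experiment archive
    named scorep_trace; adapt names and paths to the actual setup):

        mpiexec -np 512 ./target arglist                    # measurement run
        mpiexec -np 512 scout.mpi scorep_trace/traces.otf2  # trace analysis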

  - If the appropriate variant of SCOUT (e.g., scout.hyb for hybrid
    OpenMP/MPI) is not located by SCAN, it attempts to substitute an
    alternative variant, which will generally result in only partial trace
    analysis (e.g., scout.mpi will ignore OpenMP events in hybrid traces).

  - SCOUT is unable to analyze hybrid OpenMP/MPI traces of applications
    using MPI_THREAD_MULTIPLE and is generally unable to handle
    MPI_THREAD_SERIALIZED; it is therefore necessary to enforce use of
    MPI_THREAD_FUNNELED.

  - SCOUT is unable to analyze traces of OpenMP or hybrid OpenMP/MPI
    applications using nested parallelism, untied tasks, or varying numbers
    of threads for parallel regions (e.g., due to "num_threads(#)" or "if"
    clauses).

  - Although SCOUT is able to handle traces including OpenMP tasking events,
    no tasking-specific analysis is performed yet.  Also, calculated wait
    states in OpenMP barriers as well as results of the root-cause and
    critical-path analysis are currently incorrect in the presence of OpenMP
    tasks.

  - For traces including OpenMP tasking events, the CubeGUI and CubeLib
    command-line tools up to and including version 4.5 show incorrect metric
    values for the whole-program call path (present in traces generated by
    Score-P v5.0 and above).  This is due to tasks being accounted twice in
    the inclusify/exclusify calculations, and also applies to derived
    metrics, for example, those calculated during report post-processing.
    This issue will be addressed in future versions of CubeLib/CubeGUI.

  - Analysis of traces using POSIX threads is performed using the OpenMP and
    OpenMP/MPI versions of SCOUT, which will try to launch one analysis
    thread for every POSIX thread recorded in the trace.  That is, if the
    application repeatedly creates and joins POSIX threads, the analysis is
    likely to oversubscribe the system (thus, potentially increasing analysis
    times significantly), and the OpenMP runtime may not even be able to
    create the required number of analysis threads.

  - SCOUT is unable to analyze traces of applications using CUDA or OpenCL
    that contain events for compute kernels and/or memory transfers recorded
    on separate locations.  However, analysis of traces including only
    host-side events from CUDA, OpenCL, or OpenACC is possible, also when
    used in combination with MPI and/or OpenMP.  Care has to be taken when
    interpreting analysis results of such experiments, though, as the
    critical-path and root-cause analysis only consider host-side events.
    For example, SCOUT may identify a GPU synchronization operation as the
    delay causing some wait states, while the real culprits are the compute
    kernels causing the synchronization operation to block.

  - SCOUT is unable to analyze traces of applications using SHMEM, even when
    used in combination with MPI and/or OpenMP.

  - SCOUT is unable to analyze traces that include per-process metric events
    on separate locations.  In particular, this applies to memory tracking
    events measured using, for example, "SCOREP_MEMORY_RECORDING=true" or
    "SCOREP_MPI_MEMORY_RECORDING=true".
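
    To avoid such per-process metric locations in a trace experiment, memory
    recording can simply be left disabled for the measurement, for example
    (a sketch that only disables the recording options named above):

        export SCOREP_MEMORY_RECORDING=false
        export SCOREP_MPI_MEMORY_RECORDING=false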

  - SCOUT is unable to analyze incomplete traces or traces that it is unable
    to load entirely into memory.  Experiment archives are portable to other
    systems where sufficient processors with additional memory are available
    and a compatible version of SCOUT is installed; however, the size of such
    experiment archives typically prohibits this.

  - In rare cases, the SCOUT analysis of traces generated with Score-P v6.0
    or earlier built against OTF2 v2.2 or higher may produce negative time
    values.  This is due to a code inconsistency in the given versions,
    which is fixed in Score-P v7.0 (using OTF2 2.2 or higher).

  - SCOUT requires user-specified instrumentation blocks to be correctly
    nested and matched in order to analyze the resulting measurement traces.
    Similarly, collective operations must be recorded by all participating
    processes, and messages recorded as sent (or received) by one process
    must be recorded as received (or sent) by another process; otherwise
    SCOUT can be expected to deadlock during trace analysis.

  - SCOUT ignores hardware counter measurements recorded in traces.  If
    measurement included simultaneous runtime summarization and tracing, the
    two reports are automatically combined during experiment post-processing.

  - SCOUT is unable to handle old EPILOG trace files stored in SIONlib
    containers.  Also, the lock contention analysis will not be performed
    for EPILOG traces.

  - SCOUT may deadlock and be unable to analyze measurement experiments:
    should you suspect this to be the case, please save the experiment
    archive and contact the Scalasca development team so that it can be
    investigated (see instructions in the User Guide with details on which
    information to include in bug reports).

  - Traces that SCOUT is unable to analyze may still be visualized and
    interactively analyzed by 3rd-party tools such as VAMPIR.

  - Issues related to Score-P's sampling/unwinding feature:

      - The current implementation of handling traces generated with
        sampling/unwinding is known to be memory inefficient.  That is,
        it may only be possible to analyze small trace files.

      - Obviously, limitations of the Score-P measurement system also
        propagate to Scalasca via the generated trace data.  In particular,
        analysis results of programs using OpenMP are likely to include
        unexpected callpaths for worker threads.

      - The aforementioned limitation also leads to the calculation of
        bogus OpenMP thread management times.

-----------------------------------------------------------------------------

* SCAN collection & analysis launcher

  This utility attempts to parse MPI launcher commands to be able to launch
  measurement collection along with subsequent trace analysis when
  appropriate.  It also attempts to determine whether measurement and
  analysis are likely to be blocked by various configuration issues, before
  performing the actual launch(es).  Such basic checks might be invalid in
  certain circumstances, and inhibit legitimate measurement and analysis
  launches.

  While it has been tested with a selection of MPI launchers (on different
  systems, interactively and via batch systems), it is not possible to test
  all versions, combinations and configuration/launch arguments, and if the
  current support is inadequate for a particular setup, details should be
  sent to the developers for investigation.  In general, launcher flags that
  require one or more arguments can be ignored by SCAN if they are quoted,
  e.g., $MPIEXEC -np 32 "-ignore arg1 arg2" target arglist would ignore the
  "-ignore arg1 arg2" flag and arguments.
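
  For instance, prefixed with the SCAN nexus for automatic trace collection
  and analysis (a sketch only; the "-ignore" flag and its arguments are
  purely illustrative placeholders):

      scan -t $MPIEXEC -np 32 "-ignore arg1 arg2" target arglist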

  Although SCAN parses launcher arguments from the given command-line (and in
  certain cases also launcher environment variables), it does not parse
  launcher configurations from command-files (regardless of whether they are
  specified on the command-line or otherwise).  Since the part of the
  launcher configuration specified in this way is ignored by SCAN, but will
  be used for the measurement and analysis steps launched, this may lead to
  undesirable discrepancies.  If command-files are used for launcher
  configuration, it may therefore be necessary or desirable to repeat some of
  their specifications on the command-line to make them visible to SCAN.

  SCAN only parses the command-line as far as the target executable, assuming
  that subsequent flags/parameters are intended solely for the target itself.
  Unfortunately, some launchers (notably POE) allow MPI configuration options
  after the target executable, where SCAN won't find them and therefore won't
  use them when launching the parallel trace analyzer.  A workaround is to
  specify POE configuration options via environment variables instead, e.g.,
  specify MP_PROCS instead of -procs.
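
  A sketch of this workaround (the process count of 64 is only an example):

      export MP_PROCS=64
      scan poe target arglist

  instead of passing "-procs 64" after the target executable.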

  SCAN uses getopt_long_only() (either from the system's C library or GNU
  libiberty) to parse launcher options.  Older versions seem to have a bug
  that fails to stop parsing when the first non-option (typically the target
  executable) is encountered: a workaround in such cases is to insert "--"
  on the command line before the target executable, e.g., scan -t mpirun
  -np 4 -- target.exe arglist.

  If an MPI launcher is used that is not recognized by SCAN, such as one that
  has been locally customized, it can be specified via an environment
  variable, e.g., SCAN_MPI_LAUNCHER=mympirun, to have SCAN accept it.
  Warning: In such a case, SCAN's parsing of the launcher's arguments may
  fail.
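
  For example (a sketch; "mympirun" and its arguments stand for the
  site-specific launcher and its actual options):

      export SCAN_MPI_LAUNCHER=mympirun
      scan -t mympirun -np 16 target arglist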

  Some MPI launchers result in some or all program output being buffered
  until execution terminates.  In such cases, SCAN_MPI_REDIRECT can be set to
  redirect program standard and error output to separate files in the
  experiment archive.
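
  For example (a sketch; see the User Guide for the accepted values of this
  variable):

      export SCAN_MPI_REDIRECT=yes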

  If necessary, or preferred, measurement and analysis launches can be
  performed without using SCAN, resulting in "default" measurement collection
  or explicit trace analysis (based on the effective Score-P configuration).
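
  A minimal sketch of such a manual workflow (the Score-P variable shown
  enables tracing for the measurement run; the launcher, process count, and
  archive name scorep_trace are again only illustrative):

      export SCOREP_ENABLE_TRACING=true
      mpiexec -np 256 ./target arglist                    # measurement
      mpiexec -np 256 scout.mpi scorep_trace/traces.otf2  # explicit analysis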

  SCAN automatic trace analysis of hybrid MPI/OpenMP applications is
  primarily done with an MPI/OpenMP version of the SCOUT trace analyzer
  (scout.hyb).  When this is not available, or when the MPI-only version of
  the trace analyzer (scout.mpi) is specified, analysis results are provided
  for the master threads only.

-----------------------------------------------------------------------------

* Other known issues

  - In contrast to Score-P v6.0 and above, the Cube metric hierarchy
    remapping specification file shipped with Scalasca currently classifies
    all MPI request finalization calls (i.e., MPI_Test[all|any|some] and
    MPI_Wait[all|any|some]) as point-to-point, regardless of whether they
    actually finalize a point-to-point request, a file I/O request, or a
    non-blocking collective operation.  This causes incorrect results
    when merging already post-processed runtime summary and trace analysis
    reports.  However, this is not an issue when post-processing reports
    after merging.  This inconsistency will be addressed in a future
    release.