"Fossies" - the Fresh Open Source Software Archive

%%*************************************************************************%%
%%  SCALASCA    http://www.scalasca.org/                                   %%
%%*************************************************************************%%
%%  Copyright (c) 1998-2019                                                %%
%%  Forschungszentrum Juelich GmbH, Juelich Supercomputing Centre          %%
%%                                                                         %%
%%  Copyright (c) 2009-2014                                                %%
%%  German Research School for Simulation Sciences GmbH,                   %%
%%  Laboratory for Parallel Programming                                    %%
%%                                                                         %%
%%  This software may be modified and distributed under the terms of       %%
%%  a BSD-style license.  See the COPYING file in the package base         %%
%%  directory for details.                                                 %%
%%*************************************************************************%%



\documentclass[a4paper]{article}


% Include configuration
\include{config}

% Extra packages
\usepackage[USenglish]{babel}
\usepackage{pslatex}
\usepackage{tabularx}
\usepackage{xspace}

% Page & text layout
\usepackage{geometry}
\geometry{twoside=false,a4paper,hmargin={2cm,2cm},vmargin={2cm,2cm}}
\setlength{\parindent}{0cm}
\setlength{\headheight}{34pt}
\setlength{\footskip}{15pt}
\renewcommand{\arraystretch}{1.1}

% Headers & footers
\usepackage{fancyhdr}
\pagestyle{fancy}
\makeatletter
\fancyhf{}
\fancyhead[L]{\bfseries\LARGE\@title}
\fancyhead[R]{\includegraphics[height=2.0\baselineskip]{scalasca_logo}}%
\fancyfoot[L]{Project URL: \texttt{\PackageUrl}}
\fancyfoot[C]{\thepage}
\fancyfoot[R]{Contact/support: \texttt{\PackageBugreport}}
\makeatother
\renewcommand{\footrulewidth}{\headrulewidth}

% Graphics
\usepackage{graphicx}

% Shortcuts
\newcommand{\Version}{v\PackageMajor.\PackageMinor\xspace}
\newcommand{\Scalasca}{\textsc{Scalasca}\xspace}
\newcommand{\Scorep}{\textsc{Score-P}\xspace}
\newcommand{\Cube}{\textsc{Cube}\xspace}

% Document setup
\title{\Scalasca \Version Quick Reference}


\begin{document}

%--- GENERAL ----------------------------------------------------------------

\subsection*{General}

\begin{itemize}
  \item \Scalasca is an open-source toolset for scalable performance analysis of
        large-scale parallel applications.
  \item \Scalasca uses the community measurement system \Scorep to generate profiles and
        tracing results.
  \item Use the \textbf{\texttt{scorep}} and \textbf{\texttt{scalasca}} commands with
        appropriate action flags to
        \textit{instrument\/} application object files and executables,
        \textit{analyze\/} execution measurements, and interactively
        \textit{examine\/} measurement/analysis experiment archives.
  \item For short usage explanations, use \Scalasca commands without arguments,
        or add `\textbf{\texttt{-v}}' for verbose commentary.
  \item View possible parameters for the \Scorep instrumenter by calling
        \textbf{\texttt{scorep --help}}.
\end{itemize}

%--- INSTRUMENTATION --------------------------------------------------------

\subsection*{\Scorep Instrumentation}
\begin{itemize}
  \item Prepend \textbf{\texttt{scorep}} and any instrumentation flags to your
        compile/link commands.  Alternatively, use the \Scorep compiler wrappers.
  \item Additional options to the instrumenter must be specified
        \textit{before} the compiler/linker command.
  \item By default, MPI and OpenMP operations are automatically instrumented, and
        many compilers are also able to instrument all routines found in source files
        (unless explicitly disabled with \textbf{\texttt{--nocompiler}}).
  \item To enable manual instrumentation (described on page \pageref{sec:manual_inst})
        using the user instrumentation API, add \textbf{\texttt{--user}},
        and/or using \textsc{pomp} directives, add \textbf{\texttt{--pomp}}.
  \item To enable automatic source-to-source function instrumentation with PDToolkit,
        use \textbf{\texttt{--pdt}}.
  \item To enable CUDA instrumentation, use \textbf{\texttt{--cuda}}.
  \item Examples: \\
    \begin{minipage}[t]{0.4\linewidth}
      Original command: \\\ttfamily
      mpicc -c foo.c \\
      mpicxx -o foo foo.cpp \\
      mpif90 -openmp -o bar bar.f90
    \end{minipage}
    \begin{minipage}[t]{0.59\linewidth}
      \Scorep instrumentation command: \\\ttfamily
      \textbf{scorep} mpicc -c foo.c \\
      \textbf{scorep --user} mpicxx -o foo foo.cpp \\
      \textbf{scorep} mpif90 -openmp -o bar bar.f90
    \end{minipage}
  \item Often it is preferable to prefix Makefile compile/link commands
        with \texttt{\$(PREP)} and set \texttt{PREP="scorep"} for
        instrumented builds (leaving \texttt{PREP} unset for uninstrumented builds).
\end{itemize}
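
A minimal Makefile sketch of the \texttt{\$(PREP)} convention described above.
The variable \texttt{MPICC} and the file names are illustrative assumptions and
not part of \Scorep:
\begin{verbatim}
# PREP is empty by default, which gives an uninstrumented build
PREP  =
MPICC = mpicc

# (recipe lines must start with a tab character)
foo: foo.o
        $(PREP) $(MPICC) -o foo foo.o

foo.o: foo.c
        $(PREP) $(MPICC) -c foo.c
\end{verbatim}
With such a Makefile, \texttt{make PREP="scorep"} would produce an instrumented
build, while a plain \texttt{make} leaves the original commands unchanged.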


%--- MEASUREMENT / ANALYSIS -------------------------------------------------

\subsection*{Measurement \& Analysis}
\begin{itemize}
  \item Prepend \textbf{\texttt{scalasca -analyze}} (or \textbf{\texttt{\em scan}})
        to the usual execution command line to perform a measurement with
        \Scalasca runtime summarization and associated automatic trace analysis
        (if applicable).
  \item \Scorep instrumented applications can be run without prefix, using only
        environment variables for control. In this case the trace analysis will
        not be started automatically.
  \item To reuse an existing measurement for analysis, add the flag \textbf{\texttt{-a}}.
  \item Each measurement is stored in a new experiment archive
        which is not overwritten by a subsequent measurement.
  \item By default, only a runtime summary (profile) is collected
        (equivalent to specifying \textbf{\texttt{-s}}).
  \item To enable trace collection \& analysis, add the flag \textbf{\texttt{-t}}.
  \item An archive directory name can be explicitly specified with
        \textbf{\texttt{scan -e {\em title}}}.
  \item To analyze MPI and hybrid OpenMP+MPI applications,
        use the usual MPI launcher command and arguments.
  \item To analyze serial and (pure) OpenMP applications, omit the MPI launcher
        command.
  \item Examples:\\
    \begin{minipage}[t]{0.255\linewidth}
      Original command: \\\ttfamily
      mpiexec -np 4 foo args \\
      OMP\_NUM\_THREADS=3 bar \\
      mpiexec -np 4 foobar
    \end{minipage}
    \begin{minipage}[t]{0.5\linewidth}
      \Scalasca measurement \& analysis command: \\\ttfamily
      \textbf{scalasca -analyze} mpiexec -np 4 foo args \\
      OMP\_NUM\_THREADS=3 \textbf{scan -t} bar \\
      \textbf{scan -s} mpiexec -np 4 foobar
    \end{minipage}
    \begin{minipage}[t]{0.325\linewidth}
      Experiment archive: \\\ttfamily
      \# scorep\_foo\_4\_sum \\
      \# scorep\_bar\_Ox3\_trace \\
      \# scorep\_foobar\_4x3\_sum \\
      \# (w/ 3 OpenMP threads)
    \end{minipage}
\end{itemize}


\subsubsection*{Measurement configuration}

Measurement is controlled by a number of configuration variables, which are set
through the corresponding environment variables; the resulting configuration is
stored in the experiment archive as \texttt{scorep.cfg}. The most important
variables are:
\\[1ex]
\begin{tabularx}{\linewidth}{lX@{\hspace*{10mm}}l}
  \textbf{Variable} & \textbf{Purpose} & \textbf{\Scorep default} \\

  \texttt{SCOREP\_EXPERIMENT\_DIRECTORY} &
    Experiment archive title, explicitly specified by \texttt{-e} or
    automatically given a reasonable name if not specified.
    &
    \texttt{scorep-}\textit{timestamp} \\

  \texttt{SCOREP\_ENABLE\_PROFILING} &
    Enables or disables profile generation.
    &
    true \\

  \texttt{SCOREP\_ENABLE\_TRACING} &
    Enables or disables trace generation.
    &
    false \\

  \texttt{SCOREP\_FILTERING\_FILE} &
    Name of file containing a specification of functions which should be
    ignored during measurement.
    &
    --- \\

  \texttt{SCOREP\_VERBOSE} &
    Controls generation of additional (debugging) output by the measurement
    system. &
    false \\

  \texttt{SCOREP\_TOTAL\_MEMORY} &
    Size of per-process memory reserved for \Scorep in bytes. &
    16\,384\,000 \\
\end{tabularx}
\\[1ex]
The full list of supported configuration variables and their values can be retrieved using
\texttt{scorep-info config-vars --full}.
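
As a sketch of how these variables might be set for a run without the
\texttt{scan} prefix (the archive name \texttt{scorep\_foo\_trace} is an
illustrative assumption; the boolean values follow the table above):
\begin{verbatim}
export SCOREP_EXPERIMENT_DIRECTORY=scorep_foo_trace
export SCOREP_ENABLE_PROFILING=false
export SCOREP_ENABLE_TRACING=true
mpiexec -np 4 foo args
\end{verbatim}
In this case only an event trace is collected, and the automatic trace analysis
is not started.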

%--- Score-P ----------------------------------------------------------

\subsection*{\Scorep experiment archives -- typical directory content}

    \begin{tabularx}{\linewidth}{lX}
      \textbf{File} & \textbf{Description} \\
        \texttt{profile.cubex} &
                Analysis report of runtime summarization. \\
        \texttt{scorep.cfg} &
                Measurement configuration when the experiment was collected.\\
        \texttt{scorep.log} &
                Output of the instrumented program and measurement system.\\
        \texttt{scout.cubex} &
                Intermediate analysis report of the parallel trace analyzer.\\
        \texttt{scout.log} &
                Output of the parallel trace analyzer.\\
        \texttt{summary.cubex} &
                Post-processed analysis report of runtime summarization.
                (May include HWC metrics.)\\
        \texttt{trace.cubex} &
                Post-processed analysis report of the trace analyzer.
                (Does not include HWC metrics.)\\
        \texttt{trace.stat} &
                Most-severe pattern instances and pattern statistics.\\
        \texttt{traces/} &
                Sub-directory containing event traces for each process/thread.\\
        \texttt{traces.def/.otf2} &
                Definitions and anchor files for the event traces.\\
    \end{tabularx}

\subsubsection*{Determining trace buffer capacity requirements}
Based on an analysis report, the required trace buffer capacity can be
estimated using
\begin{flushright}
  \fbox{\includegraphics[width=28mm]{score_tree}}\\
  \vspace*{-22mm}
\end{flushright}
\begin{center}
  \ttfamily
  scorep-score [-r] [-f \textit{\rmfamily filter\_file}]
               \textit{\rmfamily experiment\_archive}/profile.cubex
\end{center}
\begin{itemize}
  \item To get detailed information per region (i.e., function or subroutine), use
        \texttt{-r}.
  \item To take a proposed filter file into account, use
        \texttt{-f \textit{\rmfamily filter\_file}}.
  \item The report specifies the maximum estimated
        required memory per process, which can be used to set\\
        \texttt{SCOREP\_TOTAL\_MEMORY} appropriately to avoid intermediate flushes
        in subsequent tracing experiments.
\end{itemize}
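
For example, assuming the summary experiment archive
\texttt{scorep\_foo\_4\_sum} from the measurement examples and a hypothetical
filter file named \texttt{foo.filt}:
\begin{verbatim}
# per-region estimate, taking the proposed filter into account
scorep-score -r -f foo.filt scorep_foo_4_sum/profile.cubex

# illustrative value only: use the per-process estimate from the report
export SCOREP_TOTAL_MEMORY=33554432
\end{verbatim}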

% first column
\begin{minipage}[t]{0.5\textwidth}
The \Scorep filter format allows functions to be included in or excluded from
measurement by name, optionally using wildcards.
The respective commands are processed in sequence, allowing for hierarchical inclusion-exclusion schemes.
An example of a typical filter file is shown to the right.
\end{minipage}
\begin{minipage}[t]{0.49\textwidth}
\ttfamily
 \hspace*{2ex} SCOREP\_REGION\_NAMES\_BEGIN \\
 \hspace*{5ex}   EXCLUDE \\
 \hspace*{9ex}     binvcrhs*  \\
 \hspace*{9ex}     matmul\_sub*  \\
 \hspace*{2ex} SCOREP\_REGION\_NAMES\_END
\end{minipage}
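
Because the commands are processed in sequence, a later \texttt{INCLUDE} can
re-enable regions matched by an earlier \texttt{EXCLUDE}. The following sketch
excludes everything and then includes a few regions again; the region names
\texttt{main} and \texttt{solve\_*} and the file name \texttt{foo.filt} are
purely illustrative:
\begin{verbatim}
SCOREP_REGION_NAMES_BEGIN
  EXCLUDE *
  INCLUDE main
          solve_*
SCOREP_REGION_NAMES_END
\end{verbatim}
Saved as \texttt{foo.filt}, such a file would be selected for measurement by
setting \texttt{SCOREP\_FILTERING\_FILE=foo.filt}.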

\subsection*{\Cube{4} algebra and utilities}

Uniform behavioral encoding, processing and interactive examination of
parallel application execution analysis reports.

\begin{itemize}
  \item \Cube{4} provides a variety of utilities
  for differencing, combining and other operations on analysis reports, e.g.,
  \texttt{cube\_diff}, \texttt{cube\_mean}, \texttt{cube\_merge}.
  \item \texttt{cube\_cut} can be used to prune uninteresting
call-trees and/or re-root with a specified call-tree node.
  \item \texttt{cube\_stat} can be used to produce custom statistical reports in CSV or plain text format.
  \item \texttt{cube\_topoassist} can be used to add or modify topology specifications.
\end{itemize}
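
As a sketch, two summary reports could be differenced as follows. The archive
names \texttt{run\_old} and \texttt{run\_new} are hypothetical, and the exact
option spelling should be checked against the usage message of the installed
\Cube version:
\begin{verbatim}
# writes the difference (first report minus second) to diff.cubex
cube_diff -o diff.cubex run_new/profile.cubex run_old/profile.cubex
\end{verbatim}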

\clearpage

%--- ANALYSIS REPORT EXAMINATION (Qt version) -------------------------------

\subsection*{Analysis Report Examination}
\begin{itemize}
  \item To interactively examine the contents of a \Scalasca experiment,
        after final processing of runtime summary and trace analysis,
        use \textbf{\texttt{scalasca -examine}} (or \textbf{\texttt{\em square}}) with the
        experiment archive directory name as argument.
  \item To skip the graphical user interface and get a textual score output (using the
        \texttt{scorep-score} utility), add the \textbf{\texttt{-s}} flag.
  \item If multiple analysis reports are available,
        a trace analysis report is shown in preference to a runtime
        summary report: other reports can be specified directly or selected
        from the File/Open menu.
  \item Results are displayed using three coupled tree browsers showing
    \begin{itemize}
      \setlength{\itemsep}{0cm}
      \item Metrics (i.e., performance properties/problems)
      \item Call-tree or flat region profile
      \item System location (alternative: graphical displays, such as
            box/violin plot, Sunburst view, or physical/virtual Cartesian
            topologies. Topologies with more than 3 dimensions will be folded).
    \end{itemize}
\end{itemize}

\includegraphics[viewport=31 56 803 528,clip,width=\linewidth]{ctest-epik-hyb}

\vspace*{-2mm}
\begin{itemize}
  \item Analyses are presented in trees, where collapsed nodes represent
{\em inclusive\/} values (consisting of the value of the node itself and all of
its child nodes), which can be selectively expanded to reveal {\em exclusive\/}
values (i.e., the node `self' value) and child nodes.
  \item When a node is selected from any tree,
        its {\em severity\/} value (and percentage) is shown in the panel below it,
        and that value is distributed across the tree(s) to the right of it.

  \item Selective expansion of critical nodes, guided by the
color scale, can be used to home in on performance problems.

  \item Each tree browser provides additional information via a context menu
        (on the right mouse button), such as the description of the selected metric
        or source code for the selected region (where available).
  \item Metric severity values can be displayed in various modes: \\[1ex]
    \begin{tabularx}{\linewidth}{lX}
      \textbf{Mode} & \textbf{Description} \\

      Absolute &
        Absolute value in the corresponding unit of measurement. \\

      Root percent &
        Percentage relative to the inclusive value of the root node of the
        corresponding hierarchy. \\

      Selection percent &
        Percentage relative to the value of the selected node in the
        corresponding tree browser to the left. \\

      Peer percent &
        Percentage relative to the maximum of all peer values (all values of
        the current leaf level). \\

      Peer distribution &
        Percentage relative to the maximum and non-zero minimum of all peer
        values. \\

      External percent &
        Similar to ``Root percent,'' but reference values are taken from
        another experiment.
    \end{tabularx}
\end{itemize}

\clearpage


%--- MANUAL INSTRUMENTATION --------------------------------------------

\subsection*{Manual source-code instrumentation}
\label{sec:manual_inst}

\begin{itemize}
  \item Region or phase annotations manually inserted in source files can augment or
substitute automatic instrumentation, and can improve the structure of analysis reports
to make them more readily comprehensible.
  \item These annotations can be used to mark any sequence
or block of statements, such as functions, phases, loop nests, etc., and can be
nested, provided that \emph{every enter has a matching exit}.
  \item If automatic compiler instrumentation is not used (or not
available), it is typically desirable to manually instrument at least
the \texttt{main} function/program and perhaps its major phases (e.g.,
initialization, core/body, finalization).
\end{itemize}

\subsection*{{\sc Score-P} user instrumentation API}
\label{sec:Scorep_inst}
    \begin{minipage}[t]{0.49\linewidth}
      C/C++: \\\ttfamily
      \#include <scorep/SCOREP\_User.h> \\
      ... \\
      void foo() \{ \\
      \hspace*{1ex} ... // local declarations \\
      \hspace*{1ex} SCOREP\_USER\_FUNC\_BEGIN(); \\
      \hspace*{1ex} ... // executable statements \\
      \hspace*{1ex} if (...) \{ \\
      \hspace*{2ex} SCOREP\_USER\_FUNC\_END(); \\
      \hspace*{2ex} return; \\
      \hspace*{1ex} \} else \{ \\
      \hspace*{2ex} SCOREP\_USER\_REGION\_DEFINE(r\_name); \\
      \hspace*{2ex} SCOREP\_USER\_REGION\_BEGIN(r\_name, "bar", \\
      \hspace*{10ex} SCOREP\_USER\_REGION\_TYPE\_COMMON); \\
      \hspace*{2ex} ... \\
      \hspace*{2ex} SCOREP\_USER\_REGION\_END(r\_name); \\
      \hspace*{1ex} \} \\
      \hspace*{1ex} ... // executable statements \\
      \hspace*{1ex} SCOREP\_USER\_FUNC\_END(); \\
      \}
    \end{minipage}
    \begin{minipage}[t]{0.49\linewidth}
      Fortran: \\\ttfamily
      \#include <scorep/SCOREP\_User.inc> \\
      ... \\
      subroutine foo \\
      \hspace*{1ex} ... ! local declarations \\
      \hspace*{1ex} SCOREP\_USER\_FUNC\_DEFINE() \\
      \hspace*{1ex} SCOREP\_USER\_REGION\_DEFINE(r\_name) \\
      \hspace*{1ex} SCOREP\_USER\_FUNC\_BEGIN("foo") \\
      \hspace*{1ex} ... ! executable statements \\
      \hspace*{1ex} if (...) then \\
      \hspace*{2ex} SCOREP\_USER\_FUNC\_END() \\
      \hspace*{2ex} return \\
      \hspace*{1ex} else \\
      \hspace*{2ex} SCOREP\_USER\_REGION\_BEGIN(r\_name, "bar", \\
      \hspace*{10ex} SCOREP\_USER\_REGION\_TYPE\_COMMON) \\
      \hspace*{2ex} ... \\
      \hspace*{2ex} SCOREP\_USER\_REGION\_END(r\_name) \\
      \hspace*{1ex} end if \\
      \hspace*{1ex} SCOREP\_USER\_FUNC\_END() \\
      end subroutine foo
    \end{minipage}

\begin{itemize}
  \item \texttt{SCOREP\_USER\_FUNC\_BEGIN} and \texttt{SCOREP\_USER\_FUNC\_END} are
provided explicitly to mark the entry and exit(s) of functions/subroutines.
  \item In C/C++, function names are provided automatically; in
annotated Fortran functions/subroutines, however, an appropriate name should be
registered with \texttt{SCOREP\_USER\_FUNC\_BEGIN("func\_name")}.
  \item Region identifiers (e.g., \texttt{r\_name}) should be registered
with \texttt{SCOREP\_USER\_REGION\_DEFINE} in each annotated prologue before use with
\texttt{SCOREP\_USER\_REGION\_BEGIN} and \texttt{SCOREP\_USER\_REGION\_END} in the associated body.
  \item Every exit/break/continue/return/etc.{} out of each annotated region
must have corresponding \texttt{\_END()} annotation(s).
  \item Source files annotated in this way need to be compiled with the
\texttt{--user} flag given to the \Scorep instrumenter, otherwise the
annotations are ignored. Fortran source files need to be preprocessed
(e.g., by FPP or CPP).
\end{itemize}

\subsubsection*{{\sc pomp} user instrumentation API}
\label{sec:pomp_inst}

{\sc pomp} annotations provide a mechanism for preprocessors (such as
{\sc opari2}) to conditionally insert user instrumentation.

    \begin{minipage}[t]{0.5\linewidth}
      C/C++: \\\ttfamily
      \#pragma pomp inst init // once only, in main \\
      ... \\
      \#pragma pomp inst begin(name) \\
      \hspace*{2ex} ... \\
      \hspace*{2ex} [ \#pragma pomp inst altend(name) ] \\
      \hspace*{2ex} ... \\
      \#pragma pomp inst end(name)
    \end{minipage}
    \begin{minipage}[t]{0.5\linewidth}
      Fortran: \\\ttfamily
      !POMP\$ INST INIT !~once only, in main program \\
      ... \\
      !POMP\$ INST BEGIN(name) \\
      \hspace*{2ex} ... \\
      \hspace*{2ex} [ !POMP\$ INST ALTEND(name) ] \\
      \hspace*{2ex} ... \\
      !POMP\$ INST END(name)
    \end{minipage}

\begin{itemize}
  \item Every intermediate exit/break/return/etc.{} from each annotated region
must have an \texttt{altend} or \texttt{ALTEND} annotation.
  \item Source files annotated in this way need to be processed with the
\texttt{--pomp} flag given to the \Scorep instrumenter, otherwise the
annotations are ignored.
\end{itemize}


\clearpage



%--- Tips ---------------------------------------------------------

\subsection*{Tips for effective use of the \Scalasca toolset}

\begin{enumerate}

\item Determine one or more repeatable execution configurations (input
data, number of processes/threads) and time their overall execution to
have a baseline for reference.  (If possible, also identify maximum
memory requirements.)
\begin{itemize}
\item Ensure that the execution terminates cleanly, e.g., by calling
    \verb+MPI_Finalize+ rather than \texttt{STOP} or
    \texttt{exit(\mbox{\rmfamily\itshape val})}.
\item Excessively long execution durations can make
measurement and analysis inconvenient, therefore the test configuration
should be no longer than necessary to be representative.
\end{itemize}

\item Modify the application build procedure (e.g., Makefile) to prepend the
\Scorep instrumenter to compile and link commands, and produce an
instrumented executable.

\begin{itemize}
\item MPI library calls and OpenMP parallel regions will be instrumented by
default, along with user functions if supported by the compiler.
\item Serial libraries and source modules using neither MPI nor OpenMP
are generally not worth instrumenting with \Scorep, and indeed may result in
undesirable measurement overheads.
\end{itemize}

\item Prefix the usual launch/run command with the \Scalasca analyzer to
run the instrumented executable under control of the \Scalasca
measurement collection and analysis nexus to produce an experiment
archive directory.

\begin{itemize}
\item By default the experiment archive is produced in the current
working directory, and its name will start with `\verb+scorep_+' followed
by some configuration descriptors if created automatically.
The \Scalasca measurement \& analysis nexus automatically
generates a default experiment title from the target executable, compute node
mode (if appropriate), number of MPI processes (or \texttt{O} if
omitted), number of OpenMP threads (if \texttt{OMP\_NUM\_THREADS} is set),
summarization or tracing mode, and optional metric specification.
\item If a similarly configured experiment has already been run, its
existing archive directory blocks new measurement experiments
(specify a different title with \texttt{-e}, or move the existing archive aside).
\item If no path is given, e.g., in a run without \Scalasca, the name will
start with `\verb+scorep-+' followed by a unique identifier (timestamp).
In this mode an existing archive is renamed with a unique
suffix unless specified otherwise.
\item A call-path profile summary report containing Time and Visits metrics
(and when appropriate also MPI file I/O and message statistics and
hardware counters) for each process/thread is produced by default.
\item If the (default) measurement configuration is inadequate for a
complete measurement to be collected, warnings will indicate that
one or more configuration variables should be adjusted (e.g.,
\verb+SCOREP_TOTAL_MEMORY+).
\item Compare the runtime to the (uninstrumented) reference to
estimate implicit instrumentation dilation overhead.
\end{itemize}

\item Use the \Scalasca examiner to explore the analysis report in the
experiment archive.

\begin{itemize}
\item Uninstrumented or filtered routines will not appear in the
analysis report, and their associated metric severities will be attributed to
the last measured routine from which they are called (as if they were `inlined').
\item Additional structure can be included in the analysis report by
using the \Scorep user instrumentation API to specify (nested) regions or phases
as annotations in the source code.
\end{itemize}

\item Score the quality of the summary analysis report (particularly if
dilation is significant), adjust measurement configuration using a
filter file, adjust OpenMP instrumentation, or selectively instrument
source modules (or routines).

\begin{itemize}
\item Investigate use of a filter file specifying instrumented routines
to be ignored during measurement collection.
\item Routines with very high visit counts and relatively low total
times (other than MPI functions and OpenMP parallel regions)
are appropriate candidates for filtering, and can be identified from the
flat profile view in the GUI, or score reports generated with
`\verb+scalasca -examine -s+' or using `\verb+scorep-score -r+'.
\item Highly-recursive functions are typically also worth removing:
recursion is often indicated by a large maximum call path depth.
\item A prospective filter file can be passed to scoring with `\verb+-f+'
for evaluation prior to being used to re-do measurement and re-check dilation.
\item Some routines might still present excessive overhead even when
filtered, and these should not be instrumented.  The build
procedure may need to be adjusted not to prefix the \Scorep
instrumenter when compiling the associated source modules.
When \Scorep is configured with PDToolkit, it can be used to selectively instrument
entire source modules or individual routines (see PDToolkit documentation for details).
\end{itemize}

\item Use scoring on the (revised/filtered) summary analysis report
to determine an appropriate size for the \Scorep memory settings.

\begin{itemize}
\item For the estimated memory requirements per process, the \verb+SCOREP_TOTAL_MEMORY+
environment variable can be adjusted to avoid intermediate buffer flushes.
\item \verb+SCOREP_TOTAL_MEMORY+ should be set after consideration of the
memory available when measuring the instrumented application execution
and the system's I/O and filesystem performance and capacity.  (These
vary enormously from system to system and can quickly be overwhelmed by
large traces!)
\item Additional user routines can be included in a filter file to
reduce trace buffer requirements.
\end{itemize}

\item Repeat measurement specifying the `\verb+-t+' flag to the
\Scalasca analyzer (along with other configuration settings if
necessary) to collect and automatically analyze execution traces.

\begin{itemize}
\item Traces are generally written directly into the experiment archive to
avoid copying at completion.
\item A filesystem capable of efficient parallel file I/O should be used when available.
\item \Scorep messages reporting trace flushing to disk prior to
closing the experiment indicate intermediate flushes, which are often highly
disruptive.
Enlarging trace buffer sizes, adjusting
instrumentation or the measurement filter, or configuring a shorter
execution (perhaps with fewer iterations or timesteps) may be appropriate.
\item Parallel trace analysis requires several times as much memory as the
size of the respective (uncompressed process) traces, and it is currently not
possible to analyze incomplete traces.  When memory is restricted,
trace sizes should be reduced accordingly.
\item If clock condition violations are reported during trace
analysis, set the \verb+SCAN_ANALYZE_OPTS+ environment variable to
\verb+--time-correct+ to incorporate a logical clock correction step during
analysis.
\item Traces from hybrid OpenMP/MPI application executions are analyzed
in parallel by default.  If an OpenMP-aware trace analyzer is not available,
metrics are only calculated for the master thread of OpenMP teams.
\item After the analysis report has been examined and verified to be
complete, it is generally unnecessary to keep the often extremely large
trace files used to generate it (unless further analysis or conversion
is planned): these are in the \verb+traces+
subdirectory of a trace experiment archive, which can be deleted.
\end{itemize}

\item In addition to interactive exploration of analysis reports with
the \Scalasca examiner, they can be processed with a variety of
\Cube algebra tools and utilities.

\item If you encounter difficulties using \Scorep or \Scalasca to instrument applications,
configuring measurement collection and analysis, or interpreting
analysis reports, contact \texttt{\PackageBugreport} for assistance.

\end{enumerate}
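
The tips above can be condensed into the following command sequence, given as a
sketch that reuses the \texttt{foo} example from the earlier sections. The
filter file name \texttt{foo.filt} and the memory value are illustrative
assumptions; the numbers in the comments refer to the tips:
\begin{verbatim}
scorep mpicc -o foo foo.c                  # 2. instrumented build
scan mpiexec -np 4 foo args                # 3. summary measurement
square scorep_foo_4_sum                    # 4. examine the report
square -s scorep_foo_4_sum                 # 5. score the report
scorep-score -r -f foo.filt \
    scorep_foo_4_sum/profile.cubex         # 5. evaluate a proposed filter
export SCOREP_FILTERING_FILE=foo.filt      # 5. apply the filter
export SCOREP_TOTAL_MEMORY=33554432        # 6. take value from score report
scan -t mpiexec -np 4 foo args             # 7. trace collection & analysis
square scorep_foo_4_trace                  # 4. examine trace analysis report
\end{verbatim}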

\end{document}