% hevea-2.35/examples/pat.tex (16 Jan 2021), from package hevea-2.35.tar.gz
\documentclass{article}
\usepackage{fullpage}
\usepackage{hevea}

\addtolength{\topmargin}{-1cm}
\input{pat.def}


\def\showgraph{\par\medskip\centerline{\begin{toimage}
\box\graph
\end{toimage}
\imageflush}\medskip}
\def\status#1{{\tt #1}}

\title{Compiling Join-Patterns
\footnote{
This work is partly supported by the ESPRIT
    CONFER-2 WG-21836}}
\author {Luc Maranget \and Fabrice
Le~Fessant\\[.5em]
INRIA Rocquencourt,
    BP 105, 78153 Le Chesnay Cedex France.
\\[.5em]
{\tt\small \{Luc.Maranget, Fabrice.Le-Fessant\}@inria.fr}
}

\date {}


\htmlfoot{\rule{}{}\begin{center}DVI version \ahrefurl{pat.dvi}\end{center}}
\htmlhead{\begin{center}DVI version \ahrefurl{pat.dvi}\end{center}\rule{}{}}
\toplinks{suite.html}{index.html}{env.html}
\begin{document}



\maketitle
\thispagestyle{empty}

\begin{abstract}

  The join-calculus is both a name-passing calculus and a core
  language for concurrent and distributed programming.  An essential
  part of its implementation is the compilation of {\em
    join-patterns}.  Join-patterns define new channels and, at the
  same time, all the synchronizations these channels take part in.
  Drawing on the experience of our two implementations, we study the
  translation of join-patterns into deterministic finite-state
  automata, as well as some related optimizations.


\end{abstract}
\tableofcontents
\section {Introduction}

%% Blurb
The join-pattern is the distinctive feature of the join-calculus, seen
both as a process calculus and as a programming language.  On the
calculus side, the join-calculus can roughly be seen as a functional
calculus plus join-patterns, thus achieving the same expressive power
as previous name-passing process calculi~\cite{Milner-pi-calculus}.
Join-definitions are made of several clauses, each clause being a pair
of a join-pattern and a guarded process.  A join-pattern expresses
a synchronization between several {\em names} (or channels). When
messages are pending on all the names that appear in a given
join-pattern, the corresponding clause is said to be active and
its guarded process {\em may} be fired.  A definition whose
join-patterns share some names expresses sophisticated
synchronizations. In such a definition, a message on a name that
appears in several active clauses is consumed as soon as one of the
corresponding guarded processes is fired.

Join-languages are built on top of the join-calculus taken as a core
language.  Therefore, names are first-class citizens, computations are
first abstracted as collections of asynchronous processes, and
join-patterns provide a unique, clear and powerful mechanism for
synchronizing these computations.  The documentation for the
join-language~\cite{Join} includes a tutorial that shows how join
definitions may encode classical synchronization constructs such as
locks, barriers, shared counters,\ldots

%%% Conclusion of the introduction
On the implementation side, join-patterns are meant to be heavily used
by programmers, as the only synchronization primitive available.
Thus, their compilation requires much care.  At the moment, we propose
two compilers: the {\tt join} compiler~\cite{Join}, for a language of
its own, and the {\tt jocaml} compiler~\cite{Jocaml}, for an extension of the
Objective Caml functional language.

%%% Plan
Section~\ref{join-calculus} of this paper succinctly presents the
join-calculus syntax and semantics. Then, section~\ref{automatas}
introduces the kind of automata we use to compile
join-synchronization, while section~\ref{implementation} presents two
techniques for implementing them.  The first technique directly
derives from the automata description and is used in our {\tt join}
compiler; the second performs some extra runtime tests and is the one
used in our {\tt jocaml} compiler.
Sections~\ref{optimization} and~\ref{fusion} discuss
optimizations, and section~\ref{conclusion} concludes.




\section{A rapid tour of the join-calculus}
\label{join-calculus}
\subsection {Syntax}

We rephrase here the traditional presentation of the core
join-calculus~\cite{FournetGonthier96}, where names are
the only values. Thus, we ignore the issue of system primitives and
constants, since
names provide sufficient expressive power for our purpose of
describing our implementation of pattern matching (however, we do use
primitives and constants in our examples).
We slightly change the syntax of~\cite{FournetGonthier96}, in order to
match that of
the {\tt join} programming language.
We use $x$ to denote a name in general.
$$
\begin{array}{l@{\hspace{2cm}}l}
\begin{array}{rcl}
P & ::= & x\tuple{x_i\inu p}\\
  & \mid & \Def D in P\\
  & \mid & P | P
\end{array}
&
\begin{array}{rcl}
D & ::= & J |> P \\
  & \mid & D \And D\\[1em]

J & ::= &  x\tuple {x_i\inu p}\\
  & \mid & J | J
\end{array}
\end{array}
$$

A process $P$ is either a message, a defining process, or a parallel
composition of processes (note that names are polyadic, meaning that
messages may consist of several values);
%
a definition $D$ consists of one or several clauses $J |> P$ that
associate a guarded process $P$ with a specific message pattern $J$;
%
a join-pattern $J$ consists of one or several messages in parallel. We
say that the pattern $J = \ldots x\tuple {x_i \inu p}\ldots$
defines the name $x$, and that a definition defines the set of the names
defined by its patterns.
Moreover, patterns are linear, i.e.\ names may appear at most once in
a given pattern.

Processes and definitions are taken modulo renaming of bound
variables, as substitution performs $\alpha$-conversion to avoid
captures.

\subsection{Semantics}


The semantics is specified as a reflexive chemical abstract machine
(RCHAM)~\cite{BeBo92}. The state of the computation
is a {\em chemical soup} $\dd \tth \pp$ that consists of two
multisets: active definitions~$\dd$ and running processes~$\pp$.

The chemical soup evolves according to two families of rules: {\em
  structural rules} $\redstruct$ are reversible ($\heat$ is heating,
$\cool$ is cooling); they represent the syntactical rearrangement of
terms (heating breaks terms into smaller ones, cooling builds larger
terms from their components).  {\em Reduction rules} $\redsoupe$
consume specific processes present in the soup, replacing them by some
others; they are the basic computation steps.
%

We present simplified chemical rules
(see~\cite{FournetGonthier96,FGLMR96_Calculus_mobile_agent} for
the complete set of rules).
Following the chemical tradition, every rule applies to any matching
subpart of the soup, with non-matching sub-parts of the soup left
implicit.
$$
\let \N \name
\let \su \varphi
\begin{array}{rl@{~}c@{~}rl@{\quad}l}
  &\tth P_1 | P_2   &\redstruct&           &\tth P_1, P_2 &\mbox{S-Par}\\
D_1 \And D_2 &\tth             &\redstruct& D_1, D_2 &\tth   &\mbox{S-And}\\
             &\tth \Def D in P &\redstruct&         D  &\tth P
             &\mbox{S-Def} \\[1em]
      J |> P &\tth \su  (J)    &\redsoupe & J |> P    &\tth \su (P)
      &\mbox{R-$\beta$}
\end{array}
$$
Two of the rules above have side conditions:
\begin{itemize}
\item (\name{S-Def})
the names defined in $D$ must not appear anywhere
in the solution except in the reacting definition~$D$ and process~$P$.
This condition is global; in combination with $\alpha$-renaming it
enforces lexical scoping.

\item (\name{R-$\beta$}) $\varphi (\cdot)$ substitutes actual names for
the received variables in $J$ and $P$.

\end{itemize}
Additionally, we only consider well-typed terms and reductions.
See~\cite{join-typing-97} for details on a rich polymorphic type system
for the join-calculus. Here, this mostly amounts to assuming that
message and name arities always agree.


In this paper, we take particular interest in the reduction (\name{R-$\beta$}).
Informally, when there are messages pending on all the names defined
in a given join-pattern, the process guarded by this join-pattern
may be fired. When firing is performed, we say that a {\em matching} occurs.
On the semantics level, there is a message $\tuple{x_i\inu p}$ pending
on a name~$x$ when there is an active
molecule~$x\tuple {x_i\inu p}$ in the chemical soup.

Thus, we may define the reactivity status of a given chemical soup as
the multiset of the active molecules in it.
Later on in this paper, we shall consider various abstractions of this
reactivity status.


\subsection{The join programming languages}

Apart from primitives, join-languages support {\em synchronous}
names, which the core join-calculus does not provide directly.
Synchronous names send back results, a bit like functions.
However, synchronous names may engage in any kind of
join-synchronization, just as asynchronous names do.
A program written using synchronous names can be translated
into the core join-calculus alone.
The translation is analogous to the continuation-passing style
transformation in the $\lambda$-calculus.
In our implementation, as far as pattern matching is concerned,
a synchronous name
behaves as if it were asynchronous and carried one additional continuation
argument. All implementation difficulties concentrate in managing this extra
argument, whose presence has no effect on pattern matching itself.
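This encoding can be sketched in a few lines of Python. The sketch below
is purely illustrative (the names {\tt make\_sync} and {\tt succ\_async}
are ours, and this is not the actual runtime): a synchronous name is
modeled as an asynchronous handler carrying an extra continuation
argument, and replying amounts to invoking that continuation.

```python
# Illustrative sketch only (hypothetical names, not the actual runtime):
# a synchronous name is encoded as an asynchronous one carrying an extra
# continuation argument; "reply ... to ..." becomes calling the continuation.

def make_sync(async_handler):
    """Wrap an asynchronous handler taking (message, continuation) so it
    can be called like a synchronous name returning a result."""
    def sync(msg):
        result = []
        async_handler(msg, result.append)  # the continuation stores the reply
        return result[0]
    return sync

# An "asynchronous" successor name: it replies by invoking the continuation.
def succ_async(n, reply):
    reply(n + 1)

succ = make_sync(succ_async)
print(succ(41))  # prints 42
```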



The {\tt join} language~\cite{Join} is our first prototype. All
examples in this paper are in {\tt join} syntax.  The system consists
of a bytecode compiler and a bytecode interpreter.  Both compiler and
interpreter are Objective Caml~\cite{Objective-Caml} programs, and it
is easy to lift Objective Caml data types and functions into {\tt
  join} abstract data types and primitives.  For instance, {\tt join}
programs easily draw graphics, using the Objective Caml graphics
library.  As a consequence, {\tt join} can be seen either as a
language of its own, featuring many primitives, or as a distributed
layer on top of Objective Caml.  Continuations are encoded using
ad hoc threads, which are created and scheduled by the {\tt join}
bytecode interpreter.


The {\tt jocaml} system is our second implementation. In {\tt jocaml},
all join-calculus constructs for concurrency, communication,
synchronization and process mobility are directly available as
syntactical extensions to Objective Caml.  On the runtime environment
side, we have supplemented the original Objective Caml runtime system
(which already provides a thread library) with a special ``join''
library and a distributed garbage collector~\cite{GCDistri}. On the
compiler side, the Objective Caml compiler has been extended to
translate join-calculus source code into function calls to the
``join'' library.  We also introduced a few new Objective Caml
bytecode instructions, but only to handle code mobility, a
feature orthogonal to pattern matching.  The {\tt jocaml} system is
currently available as a prototype version~\cite{Jocaml}.
\section{Pattern matching in join definitions}
\label{automatas}
\subsection{Principle}

Consider the following join definition:
\begin{verbatim}
let A(n) | B() = P(n)
and A(n) | C() = Q(n)
;;
\end{verbatim}
This defines three names $A$, $B$ and $C$. Name $A$ has arity one,
whereas names $B$ and $C$ have arity zero.  Names may be synchronous
or asynchronous, depending on whether there are
\verb+reply ... to ...+
constructs applying to them inside the guarded processes~$P(n)$
and $Q(n)$ or not.

According to the general join-calculus semantics, the guarded process
$P(n)$ may be fired whenever there are messages pending on both $A$ and
$B$. Similarly, $Q(n)$ may be fired whenever there are messages
pending on both $A$ and $C$. In both cases, the formal parameter $n$ is
replaced by (or, in the implementation, bound to) one of the messages
pending on $A$.

Reactivity information is to be considered at the definition level,
since matching is indeed performed at this level.  Moreover, in order
to use finite-state automata, we want this information to range over a
finite set of possible values.  As far as matching is concerned, and by
the linearity of patterns, only the presence or absence of messages
matters.  Thus, let us call \status{0} the status of a name without
any message pending, and \status{N} the status of a name with at least
one message pending.  Then, the status of a definition is a tuple
consisting of the statuses of the names it defines, once some
arbitrary order of these names has been adopted.

For instance, if some messages are pending on $B$ and $C$, whereas
none is pending on $A$, then the status of the $A$, $B$, $C$
definition is a three-tuple written \status{0NN}.

A {\em matching status} is defined as a status that holds enough
\status{N}s, so that at least one guarded process can be fired.
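To make the abstraction concrete, here is a small Python model (an
illustrative sketch with names of our own choosing, not our
implementation) that represents a definition status as a tuple of
\status{0}/\status{N} flags and tests whether a status is a matching one:

```python
# Illustrative model of definition statuses (not the actual implementation).
# A status is a tuple of "0"/"N" flags, one per defined name, in the fixed
# order A, B, C.  A pattern is the set of name indices it synchronizes.

PATTERNS = [{0, 1},   # A(n) | B()  ->  P(n)
            {0, 2}]   # A(n) | C()  ->  Q(n)

def is_matching(status):
    """A status is matching when every name of some pattern holds 'N'."""
    return any(all(status[i] == "N" for i in p) for p in PATTERNS)

print(is_matching(("0", "N", "N")))  # False: no message pending on A
print(is_matching(("N", "N", "0")))  # True: P(n) may be fired
```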

Definition status evolves towards matching status as messages arrive.
This yields a first kind of ``increasing'' transitions.  More
specifically, when a message arrives on some name, the status of this
name either evolves from \status{0} to \status{N} or remains
\status{N}.  Definition status evolves accordingly.  In the $A$, $B$,
$C$ case, we get the following transitions.  (In this diagram,
transitions are labeled by the name that gets a new message and
matching statuses are filled in gray.)
\begin{toimage}
.PS
cm = 0.3937;
pt = 1/72;
hu = 2 * cm;
vu = 1.5 * cm
circlerad = 12 * pt
arcrad = circlerad
rblabht = 2 * circlerad
rblabwid = 4 * circlerad
fillval=0.3
pi=3.1415927

define clab {
    {circle at Here $1 $2 $3}
}

define ilab {
    {box invis $1 $2 $3 at Here}
}

define ln {
    L: [circle rad 0]
    if $4 < 0 then {
      box $3 rjust wid 0 ht 0 at L.c + ($1*(hu/3)-circlerad/4, $2*(vu/3))
    } else {if $4 > 0 then {
      box $3 ljust wid 0 ht 0 at L.c + ($1*(hu/3)+circlerad/4, $2*(vu/3))
    } else {
      box $3 ljust wid 0 ht 0 at L.c + ($1*(hu/3)+ 0.001 * cm, $2*(vu/2.5))
    }}
    {line  ->  chop from L.c to L.c + ($1*hu, $2*vu)}
    move to  L.c + ($1*hu, $2*vu)
}

define dln {
    {line  -> dotted right $1*hu up $2*vu chop}
    move right $1*hu up $2*vu
}

define htccw {
  psi=$1
  phi=$2
  {
  c0=circlerad * 2 * cos(psi/2) * cos(phi+psi/2)
  s0=circlerad * 2 * cos(psi/2) * sin(phi+psi/2)
  c1=circlerad * cos(phi) ; s1 = circlerad * sin(phi)
  c2=circlerad * (2 * cos(psi/2) + 1) * cos(phi+psi/2)
  s2=circlerad * (2 * cos(psi/2) + 1) * sin(phi+psi/2)
  c3=circlerad * cos(phi+psi) ; s3 = circlerad * sin(phi+psi)
  A: [circle rad 0.0]
  move to A.c + (c1,s1)
  arc $4 rad circlerad to A.c + (c2,s2)
  arc -> $4 rad circlerad to A.c + (c3,s3)
  circle invis $3 with.c at A.c + (c0,s0)}
}

define htcw {
  psi=$1
  phi=$2
  {
  c0=circlerad * 2 * cos(psi/2) * cos(phi-psi/2)
  s0=circlerad * 2 * cos(psi/2) * sin(phi-psi/2)
  c1=circlerad * cos(phi) ; s1 = circlerad * sin(phi)
  c2=circlerad * (2 * cos(psi/2) + 1) * cos(phi-psi/2)
  s2=circlerad * (2 * cos(psi/2) + 1) * sin(phi-psi/2)
  c3=circlerad * cos(phi-psi) ; s3 = circlerad * sin(phi-psi)
  A: [circle rad 0.0]
  move to A.c + (c1,s1)
  arc cw $4 rad circlerad to A.c + (c2,s2)
  arc cw -> $4 rad circlerad to A.c + (c3,s3)
  circle invis $3 with.c at A.c + (c0,s0)}
}

define tourccw {
  htccw(pi/2,$1,$2,$3)
}

define tourcw {
  htcw(pi/2,$1,$2,$3)
}

hu=3*cm
[
vu=2*cm
circlerad=15*pt
clab("\status{000}")
{ln(-1,-1,"$B$",-1) ; clab("\status{0N0}") ; tourccw(pi/2,"$B$")
  {ln(0,-1,"$A$",0) ; clab("\status{NN0}",fill)
   tourccw(pi,"$A, B$")
   ln(1,-1,"$C$",-11)}
  {ln(1,-1,"$C$",1) ; clab("\status{0NN}")
    tourcw(pi/8,"$\,B, C$")
    ln(0,-1,"$A$",-1) ; clab("\status{NNN}",fill)
    tourccw(5*pi/4,"$A,B,C$")}}
{ln(0,-1,"$A$",0) ; clab("\status{N00}") ; tourcw(pi/2.5,"$A$")
  {ln(-1,-1,"$B$",-1)}
  {ln(1,-1,"$C$",1)}}
{ln(1,-1,"$C$",1) ; clab("\status{00N}") ; tourcw(pi/2,"$C$")
  {ln(-1,-1,"$B$",1)}
  {ln(0,-1,"$A$",0) ; clab("\status{N0N}",fill) tourcw(0,"$A, C$")
   ln(-1,-1,"$B$",1)}}
]
.PE
\end{toimage}%$
\showgraph


Definition status also evolves when matching occurs. This yields
new, ``decreasing'', transitions that we call matching transitions.
According to the join-calculus semantics,
matching may occur at any moment (provided of course that
matching is possible).
As a consequence, matching transitions start from matching statuses and
they are unlabelled.
In the $A$, $B$, $C$ case, they are as follows:
\begin{toimage}
.PS
lab=0.5

define iln {
    move right $1*hu up $2*vu
}

aphi=pi/4

define alncw {
 c= circlerad*cos($1) ; s = circlerad*sin($1)
 c2 =  circlerad*cos($2) ; s2 = circlerad*sin($2)
 C: [circle rad 0]
 if $4 < 0 then {abs = -$4} else {abs = $4}
 x1 = c ; y1 = s ; x2 = $3*hu+c2 ; y2 = $4*vu+s2
 d = (x2-x1) * (x2-x1) + (y2-y1)*(y2-y1)
 ar = sqrt(d) / (2*sin(aphi))
 D: arc cw -> $6 rad ar from C.c + (x1,y1) to C.c + (x2,y2)
 r=last arc.rad
 x3 = C.c.x + lab*(x1+x2) ; y3 = C.c.y + lab*(y2+y1)
 d = sqrt((x3-D.c.x) * (x3-D.c.x) + (y3 -D.c.y)*(y3-D.c.y))
 line invis from D.c to r/d  <D.c,(x3,y3)> ; $5
}

define aln {
 c= circlerad*cos($1) ; s = circlerad*sin($1)
 c2 =  circlerad*cos($2) ; s2 = circlerad*sin($2)
 C: [circle rad 0]
 if $4 < 0 then {abs = -$4} else {abs = $4}
 x1=c ; y1 = s ; x2 = $3*hu+c2 ; y2 = $4*vu+s2
 d = (x2-x1) * (x2-x1) + (y2-y1)*(y2-y1)
  ar = sqrt(d) / (2*sin(aphi))
 D: arc -> $6 rad ar from C.c + (x1,y1) to C.c + (x2,y2)
 r=last arc.rad
 x3 = C.c.x + lab*(x1+x2) ; y3 = C.c.y + lab*(y2+y1)
 d = sqrt((x3-D.c.x) * (x3-D.c.x) + (y3 -D.c.y)*(y3-D.c.y))
 line invis from D.c to r/d  <D.c,(x3,y3)> ; $5
}

clab("\status{000}")
{iln(-1,-1,invis) ; clab("\status{0N0}")
  {iln(0,-1,"$B$") ; clab("\status{NN0}",fill) ;
    {dln(0,1)} ; {dln(1,1)} ; {dln(1,2)}
    tourccw(pi,"",dotted) ; iln(1,-1,"$C$")}
  {iln(1,-1,"$C$") ; clab("\status{0NN}")
    iln(0,-1,"$B$") ; clab("\status{NNN}",fill)
     tourccw(5*pi/4,"",dotted)
     {dln(0,1)} ; {dln(-1,1)} ; {dln(1,1)} ;
     {aln(3*pi/8,-pi/3,0,2,"",dotted)}
     {dln(-1,2)} ; {dln(1,2)}}}
{iln(0,-1,"$B$") ; clab("\status{N00}")
  {iln(-1,-1,"$A$")}
  {iln(1,-1,"$C$")}}
{iln(1,-1,"$C$") ; clab("\status{00N}")
  {iln(-1,-1,"$A$")}
  {iln(0,-1,"$B$") ; clab("\status{N0N}",fill) ;
    {dln(0,1)} ; {dln(-1,1)} ; {dln(-1,2)}
    tourcw(0,"",dotted) ; iln(-1,-1,"$A$")}}
.PE
\end{toimage}
\showgraph


Observe that there may be several matching transitions starting from a
given status.  This is not always a consequence of the
non-deterministic semantics of the join-calculus.

Often, the ambiguity is only apparent.  For instance, matching transitions
starting from \status{NN0} lead to \status{NN0}, \status{N00},
\status{0N0} and \status{000}.  When such a matching occurs, two
messages are consumed (one pending on $A$ and one pending on $B$);
then, depending on whether there are some messages left pending on $A$
and $B$ or not, the status evolves to \status{NN0}, \status{N00},
\status{0N0} or \status{000}.  From the implementation point of view,
this means that a little runtime testing is required once matching has
been performed. Here, we pay a price for using finite-state automata.

However, some true non-determinism is still present.  Consider
status~\status{NNN} for instance.  Then, both guarded processes of the
$A$, $B$, $C$ definition can be fired.  The choice of firing
either $P(n)$ or $Q(n)$ will result in either consuming one message
pending on $A$ and one on $B$, or consuming one message pending on $A$
and one on $C$.

Finally, a view of join-matching compilation can be given by taking
together both kinds of transitions. This yields a non-deterministic
automaton.

Note that matching of non-linear patterns can also be compiled using automata.
For instance, if a name appears at most twice in one or more patterns,
then its status will range over $\status{0}$, $\status{1}$ and $\status{N}$.
We do not present this extension in greater detail, for simplicity, and
because we do not implement non-linear patterns.

\subsection{Towards deterministic automata}

For efficiency and simplicity reasons, we choose to implement matching
using deterministic automata that react to message reception.

Fortunately, it is quite possible to do so. It suffices to perform
matching as soon as possible.  More precisely, when a message arrives
and carries definition status into matching status, matching is
performed immediately, while definition status is adjusted to reflect
message consumption.  This results in pruning the status space just
below matching statuses.
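The ``match as soon as possible'' policy can be sketched as follows (an
illustrative Python model with hypothetical names, not the actual
compiler output): on each message arrival the new status is examined;
if it is a matching one, a clause is selected and one message is
consumed from each of its queues at once, so matching statuses are
never stored.

```python
from collections import deque

# Illustrative model of "match as soon as possible" (hypothetical code,
# not the actual join/jocaml implementation).  Names A, B, C have
# indices 0, 1, 2; a pattern is the set of name indices it synchronizes.
PATTERNS = [({0, 1}, "P"),   # A(n) | B()  ->  P(n)
            ({0, 2}, "Q")]   # A(n) | C()  ->  Q(n)

queues = [deque(), deque(), deque()]
fired = []

def send(index, message):
    """React to a message arrival; fire a guarded process immediately
    whenever the new status would be a matching one."""
    queues[index].append(message)
    for names, process in PATTERNS:
        if all(queues[i] for i in names):           # matching status reached
            args = [queues[i].popleft() for i in sorted(names)]
            fired.append((process, args))           # fire the guarded process
            return
    # Otherwise the message just stays queued; the status stays non-matching.

send(0, 1)        # A(1): no match yet, status N00
send(2, None)     # C(): matches A | C, fires Q(1)
print(fired)      # [('Q', [1, None])]
```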


In practice, in the $A$, $B$, $C$ case, we get the
automaton of figure~\ref{abc}.
\begin{figure}[th]
\begin{toimage}
.PS
circlerad=16*pt
hu=3 *cm
vu=3 *cm
clab("\status{000}")
{ln(-1,-1,"$B$",-1) ; clab("\status{0N0}")
  tourccw(-9*pi/10,"$B$")
  tourccw(pi/2,"$A$",dotted)
  {alncw(pi/4,-3*pi/4,1,1,"$A$",dotted)}
  {ln(1,-1,"$C$",1) ; clab("\status{0NN}")
    htccw(pi/4,-pi/2-pi/8,"$B, C$")
    {alncw(3*pi/4,-pi/4,-1,1,"$A$",dashed)}
    {aln(pi/4,-3*pi/4,1,1,"$A$",dotted)}
    tourccw(-5*pi/12,"$A$",dotted)
    tourcw(-7*pi/12,"$A$",dashed)
}}
{ln(0,-1,"$A$",0) ; clab("\status{N00}")
  tourcw(5*pi/12,"$A$")
  tourccw(-3*pi/4,"$B$",dotted)
  tourccw(-17*pi/12,"$C$",dashed)
  {alncw(pi/2,-pi/2,0,1,"$B$",dotted)}
  {aln(pi/2,-pi/2,0,1,"$C$",dashed)}
}
{ln(1,-1,"$C$",1) ; clab("\status{00N}") ;
  tourcw(-pi/10,"$C$")
  tourcw(pi/2,"$A$",dashed)
  {aln(3*pi/4,-pi/4,-1,1,"$A$",dashed)}
  {ln(-1,-1,"$B$",-1)}}
.PE
\end{toimage}
\showgraph
\caption{\label{abc} Automaton in the $A$, $B$, $C$ case}
\end{figure}

Observe that all transitions are now labeled and that a name labels a
transition when message reception on this name triggers that
transition.  Furthermore, matching transitions that correspond to
firing $P(n)$ or firing $Q(n)$ are now represented differently (the
former by a dotted arrow, the latter by a dashed arrow).  This
highlights the difference between false and true non-deterministic
transitions: real non-determinism is present when there are both
dotted and dashed edges with the same label starting from the same
node.

For instance, there are two $B$-labeled dotted transitions starting
from $\status{N00}$.  Non-determinism is only apparent here, since
$P(n)$ is fired in both cases, and the selected transition depends
only on whether there is at least one message left pending on $A$
or not after firing $P(n)$.


By contrast, from status \status{0NN}, the automaton may react to the
arrival of a message on $A$ in two truly different manners, by firing
either $P(n)$ or $Q(n)$.  This is clearly shown in figure~\ref{abc} by
the $A$-labeled edges that start from status~\status{0NN}, some of
them being dashed and the others being dotted.  A simple way to avoid
such a non-deterministic choice at run-time is to perform it at
compile-time.  That is, here, we suppress either the dotted or the dashed
$A$-labeled transitions that start from $\status{0NN}$.

In the rest of the paper, we take automata such as the one of
figure~\ref{abc} as suitable abstractions
of join-pattern compilation output.

\subsection{Automata and semantics}

Both the ``match as soon as possible'' behavior and the deletion of
some matching transitions have a price in terms of semantics.
More precisely, some CHAM behaviors can no longer be observed.
However,
the CHAM semantics is a non-deterministic one: an initial
configuration of the CHAM {\em may} evolve into a variety of configurations.
Furthermore, there is no fairness constraint of any kind, and
no particular event is required to occur.

As an example of the consequence of the ``match as soon as possible''
behavior, consider this definition:
\begin{verbatim}
let A() = print(1);
and B() = print(2);
and A() | B() = print(3);
;;
\end{verbatim}
Then, we get the following automaton:
\begin{toimage}
.PS
clab("\status{00}")
 tourcw(-3*pi/4,"$A$",dotted)
 tourccw(-pi/4,"$B$",dashed)
.PE
\end{toimage}
\showgraph
Status \status{NN}, which is preceded by the two matching statuses
\status{0N} and \status{N0}, cannot be reached. As a consequence, the
above program will never print a {\tt 3}, no matter how many messages
are sent on $A$ and $B$.
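This behavior is easy to reproduce in a toy Python model of the ``match
as soon as possible'' policy (an illustrative sketch with made-up names,
not our implementation): every arrival on $A$ or $B$ immediately matches
the corresponding single-name clause, so the joint clause never sees two
pending messages.

```python
# Toy model (hypothetical code, not the actual implementation): under the
# match-as-soon-as-possible policy, each arrival fires a singleton clause
# at once, so the clause A() | B() can never fire.
pending = {"A": 0, "B": 0}
printed = []

def send(name):
    pending[name] += 1
    # Clauses tried in order: A() alone, B() alone, then A() | B().
    if name == "A" and pending["A"] > 0:
        pending["A"] -= 1; printed.append(1)
    elif name == "B" and pending["B"] > 0:
        pending["B"] -= 1; printed.append(2)
    elif pending["A"] > 0 and pending["B"] > 0:
        pending["A"] -= 1; pending["B"] -= 1; printed.append(3)

for name in ["A", "B", "A", "B", "B", "A"]:
    send(name)
print(3 in printed)  # False: 3 is never printed
```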

Next, to illustrate the effect of deleting ambiguous matching
transitions, consider the following definition:
\begin{verbatim}
let A() = print(1);
and A() | B() = print(2);
\end{verbatim}
Such a definition will get compiled into one of the following
deterministic automata:
\begin{toimage}
.PS
{clab("\status{00}")
 tourccw(pi/4,"$A$",dotted)
 {ln(0,-1,"$B$",0)
   clab("\status{0N}")
   tourcw(pi/4,"$B$");
   tourcw(-3*pi/4,"$A$",dotted)
 }
}
move right 1.5*hu
{clab("\status{00}")
 tourccw(pi/4,"$A$",dotted)
 {ln(0,-1,"$B$",0)
   clab("\status{0N}")
   tourcw(pi/4,"$B$");
   tourcw(-3*pi/4,"$A$",dashed)
   {alncw(pi/2,-pi/2,0,1,"$A$",dashed)}
 }
}
.PE
\end{toimage}
\showgraph
In the case of the left automaton, only {\tt 1} will ever get printed.
In the case of the right
automaton, {\tt 2} will be printed when a message arrives on~$B$
and then another on~$A$.
Both automata lead to correct implementations of the semantics. However,
the second automaton looks like a better choice than the first one, since it
yields more program behaviors.


\section{Runtime definitions}

\label{implementation}
\subsection{Basics}

Names are the basic values of the join-calculus, and thus any
implementation of the join-calculus must supply a runtime
representation for them. For instance, a name can be sent on some
appropriate channel. At runtime, we must indeed send something.

However, names that are defined together in the same join definition
interact when matching is tested for and performed.  Moreover, by the
very idea behind the join-calculus, matching is the only
synchronization primitive.  In other words, only names that are
defined by the same join definition have some kind of interaction that
is the responsibility of the runtime system.


This makes it possible and desirable to compile a source definition into
a runtime ``definition'', a single vector structure that groups all
the names defined in a given definition.  Names must still exist as
individuals; they can be represented as infix pointers into their
definition (as in {\tt join}), or as a definition pointer and an index
(as in {\tt jocaml}).

Both the {\tt join} and {\tt jocaml} systems implement the automata of
the previous section.  However, they do so in quite different manners.
The former focuses on minimizing runtime testing, while the latter
performs a systematic runtime test of the current status at every
message arrival.

\subsection{Definitions in {\tt join}}

Runtime definitions are vector structures.  Every name defined in a
definition occupies two slots in the vector structure.  The first
entry holds a code pointer that stands for the name itself, while the
second entry holds a pointer to a queue of pending messages, queues
being organized as linked lists.  Runtime definitions include
additional slots that hold the values of the variables that are free
in guarded processes. This technique closely resembles the one used by
the SML/NJ compiler~\cite{Appel-compiling} to represent mutually
recursive functions.  Message sending on name~$x$ is performed by
stacking message values and then calling the code for name~$x$.  This
code is retrieved by dereferencing twice the infix pointer that
represents~$x$ at runtime.


However, there is a big difference between mutually recursive
functions and join definitions.  The code for name~$x$ is automaton
code that reacts to the arrival of a new message on that name.  The
compiler issues various versions of name code, one per possible status
of the definition that defines~$x$.  Typically, name code either saves
a message into the queue for~$x$ (in the non-matching case), or
retrieves messages from other queues before firing a guarded process
(in the matching case). In all cases, definition status may then
need an update, which is performed by updating all code entries in
the definition.
  744 
  745 
  746 \subsection{Definitions in {\tt jocaml}}
  747 
  748 In the {\tt jocaml} system, a name is a pointer to a definition plus
an index.  Definitions are still vector structures, but there is only
  750 one entry per name for message queues. Additionally, definitions hold
  751 guarded closures (i.e. guarded process code plus free variable
  752 values), a status field and a matching data structure.
  753 
The status field holds the current status of the definition as a
bit-field.  Each name status is encoded by one bit, with value 1 for
\status{N} and value 0 for \status{0}; the bit position is given by
the name index.
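As an illustration, the bit-field operations involved can be sketched in Python (helper names are ours, not {\tt jocaml}'s):

```python
# Sketch of the status bit-field described above (helper names are ours).
# Bit value 1 at position i means "messages pending" (status N) for the
# name of index i; bit value 0 means "no message" (status 0).

def set_pending(status: int, index: int) -> int:
    """A message arrived on the name of the given index."""
    return status | (1 << index)

def clear_pending(status: int, index: int) -> int:
    """The queue of the name of the given index became empty."""
    return status & ~(1 << index)

def has_pending(status: int, index: int) -> bool:
    return (status >> index) & 1 == 1

# Example: names S, get, add with indexes 0, 1, 2.
status = 0                        # nothing pending anywhere
status = set_pending(status, 0)   # one or more messages pending on S
```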
  758 %  The matching data structure is an array of maps from statuses to
  759 %  clause indexes. 
  760 
Message sending is performed by calling a generic C~function from the
``join'' library, which takes a message value, a definition pointer and a
name index as arguments.  When a message is received on name~$x$, the
bit for~$x$ is checked in the current status bit-field. If the bit is
set, some messages on name~$x$ are already present, so the definition
status does not change. Since the current status before message
sending is guaranteed to be a non-matching one, the message is queued
and the function returns.
  769 
Otherwise, the current definition status is searched in the matching
structure for~$x$. This matching structure is an array of (pattern
encoding, guarded process index) pairs. Pattern encodings are
bit-fields, just like status encodings; each corresponds to a
join-pattern that contains name~$x$, from which name~$x$ has been
removed. Using a sequential search by a bitwise ``and'' with each
pattern encoding, the current status can be identified as matching or
non-matching in at most $N_x$ tests, where $N_x$ is the number of
clauses whose pattern contains~$x$.
  778 
If no match is found, the automaton state is updated and the message
value is queued in the queue for~$x$.  Otherwise, a guarded process
index has been found, and is used to retrieve the associated guarded
closure.  Arguments to the guarded process are extracted from the
queues identified by the matching status found.  Status is updated at
the same time (when a queue becomes empty, its bit is cleared).  Finally,
the guarded process is fired.
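The whole sending procedure of the two preceding paragraphs can be sketched as follows. This is a speculative Python model, not {\tt jocaml}'s actual C code, and all identifiers are ours; a call returns either {\tt None} (message queued) or the clause to fire together with its arguments.

```python
# Speculative model of the generic send routine (identifiers are ours).
# matching[x] lists (pattern_bits, clause) pairs, where pattern_bits
# encodes a join-pattern containing x, with x itself removed.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Definition:
    queues: List[list]                      # one message queue per name
    matching: List[List[Tuple[int, int]]]   # one matching structure per name
    status: int = 0                         # bit i set: messages pending on i

def bits(n):
    """Indexes of the set bits of n."""
    i = 0
    while n:
        if n & 1:
            yield i
        n >>= 1
        i += 1

def send(d: Definition, x: int, value):
    if d.status & (1 << x):            # messages already pending on x:
        d.queues[x].append(value)      # non-matching status, just enqueue
        return None
    for pattern, clause in d.matching[x]:
        if d.status & pattern == pattern:    # all partner names have messages
            args = {x: value}
            for y in bits(pattern):
                args[y] = d.queues[y].pop(0)
                if not d.queues[y]:          # queue emptied:
                    d.status &= ~(1 << y)    # clear its bit
            return clause, args              # fire guarded process `clause`
    d.status |= 1 << x                 # no match: record the pending message
    d.queues[x].append(value)
    return None
```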
  786 
Therefore, the performance of this technique relies heavily on fast
comparisons and modifications of definition statuses.  The best result
is achieved when statuses are encoded by machine integers.  In that
case, the number of names that a definition can define is limited by
the integer size of the hosting Objective Caml system (which
typically is 31 or 63~bits).  If this is deemed insufficient, then
statuses have to be encoded using several integers or one string. Both
kinds of status encodings can be mixed, using integers for small
definitions and strings for larger ones. However, in the current {\tt
  jocaml} system, we use a single integer to hold status, and a
technique (described in section~\ref{fusion}) is used to associate
several channels with the same bit in the status bit-field.
  799 
  800 \section{The pragmatics of compilation}\label{opt}
  801 \label{optimization}
  802 
This section is dedicated to optimizations that are primarily
pertinent to the {\tt join} technique and that are performed by the
current version of the {\tt join} compiler.
  806 
  807 We first introduce optimizations that improve the runtime behavior of
  808 programs, both in speed and dynamic memory usage. Then, we show how to
  809 reduce emitted code size.
We focus on optimizing definitions written in
object-oriented style, as described in the tutorial~\cite{Join}. As
this programming style has proved quite frequent, it is natural for
us, as compiler writers, to concentrate our efforts on such
definitions.
  815 
In this style, a definition is an object. Object state is encoded by
asynchronous {\em state names}, while synchronous {\em methods} access
or modify object state.
  819 For instance, given one state name $S$ and $n$ methods
  820 $m_1$, $m_2$,\ldots $m_n$ taken in that order, we get:
  821 \begin{verbatim}
  822 let create(x_0) =
  823   let S(x) | m_1() = P_1(x)
  824   and S(x) | m_2() = P_2(x)
  825     ....
  826   and S(x) | m_n() = P_n(x) in
  827   S(x_0) | reply m_1,m_2,...,m_n to create
  828 ;;
  829 \end{verbatim}
\noindent The synchronous call \verb+create(v)+ creates a new
object (i.e.\ a new $S$, $m_1$, $m_2$,\ldots $m_n$ definition)
and then sends back an $n$-tuple of its methods. Moreover, this call
initializes the object state with the value~$v$.
  834 
  835 
  836 \subsection{Refined status}
  837 As a working example of an object-style definition, consider the
  838 following adder:
  839 \begin{verbatim}
  840 let create(x_0) =
  841   let S(x) | get() = S(x) | reply x to get
  842   and S(x) | add(y) = S(x+y) | reply to add in
  843   S(x_0) | reply get,add to create
  844 ;;
  845 \end{verbatim}
  846 
  847 The adder has one state name $S$ and two methods $get$ and $add$.  We
  848 then try to figure out some ``normal'' runtime behavior for it.  As
  849 the initial \verb+S(x_0)+ is forked as soon as the adder definition
  850 has been created, a highly likely initial situation is that there is
  851 one message pending on $S$ and none on the other names.  Later on,
  852 as some external agent invokes the $get$ or $add$ method, the message
  853 pending on $S$ is consumed and the appropriate guarded process is
  854 fired.  Either process quickly sends a message on $S$.  Thus, a likely
  855 behavior is for the queue of $S$ to alternate between being empty and
holding one element, the queue being empty for short periods.  Due to
some aspects of the compilation of ``\verb+|+'' and of our scheduling
policy, which we will not examine here, this behavior is almost certain.
  859 
As a matter of fact, this ``normal'' life cycle involves a blatant
waste of memory: queue elements (cons cells) are allocated and
deallocated in the general dynamic fashion, while the runtime usage of
these cells would allow a more efficient policy.  It is wiser not to
allocate a cell for the only message pending on $S$, and to use the
queue entry attributed to $S$~in the runtime definition as a
placeholder.  On the status side, this new situation is rendered by a
new ``\status{1}'' status.  Hence, $S$ now possesses a three-valued
status: \status{0} (no message), \status{1} (one message in the queue
slot) or \status{N} (several messages organized in a linked list).  Thus,
assuming, for the time being, that there may be an arbitrary number of
messages pending on $S$, the adder compiles into the automaton of
figure~\ref{adder} (adder names are taken in the order $S$, $get$,
$add$). This new automaton can be seen as an evolution of the $A$,
$B$, $C$ automaton of figure~\ref{abc}, with a slight change in
channel names.
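The \status{0}/\status{1}/\status{N} queue discipline for a state name such as $S$ can be modelled as follows; this is a minimal Python sketch under our own naming, whereas the real implementation works on the definition's queue slot directly.

```python
# Minimal model of the 0/1/N queue of a state name (naming is ours).
# The queue slot of the definition doubles as a placeholder for a single
# message; a list cell is only allocated when a second message arrives.

class StateQueue:
    def __init__(self):
        self.status = "0"    # "0": empty, "1": one message, "N": several
        self.slot = None     # the placeholder entry in the definition
        self.rest = []       # dynamically allocated cells ("N" case only)

    def push(self, v):
        if self.status == "0":
            self.slot, self.status = v, "1"   # no cell allocated
        else:
            self.rest.append(v)
            self.status = "N"

    def pop(self):
        v = self.slot
        if self.status == "1":
            self.slot, self.status = None, "0"  # emptiness known for free
        else:
            self.slot = self.rest.pop(0)
            if not self.rest:
                self.status = "1"
        return v
```

Note how popping in status \status{1} needs no test on any list: the automaton knows the queue is empty afterwards, which is the source of the saved runtime tests discussed below.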
  876 
  877 \begin{figure}[th]
  878 \begin{toimage}
  879 .PS
  880 [
  881 circlerad=16*pt
  882 hu=3 *cm
  883 vu=3.5 *cm
  884 clab("\status{000}")
  885 {ln(-1,-1,"$get$",-1) ; clab("\status{0N0}")
  886   tourccw(-9*pi/10,"$get$")
  887   tourccw(pi/2,"$S$",dotted)
  888   {alncw(pi/4,-3*pi/4,1,1,"$S$",dotted)}
  889   {ln(1,-1,"$add$",1) ; clab("\status{0NN}")
  890     {alncw(3*pi/4,-pi/4,-1,1,"$S$",dashed)}
  891     {aln(pi/4,-3*pi/4,1,1,"$S$",dotted)}
  892     htccw(pi/3,-pi/2-pi/6,"$\begin{array}{l} get\\ add\end{array}$")
  893     tourccw(-5*pi/12,"$S$",dotted)
  894     tourcw(-7*pi/12,"$S$",dashed)
  895 }}
  896 {
  897   ln(0,-.67,"$S$",0) ; clab("\status{100}")
  898   {alncw(pi/2,-pi/2-pi/12,0,0.67,"$get$",dotted)}
  899   {aln(pi/2,-pi/2+pi/12,0,0.67,"$add$",dashed)}
  900   ln(0,-.67,"$S$",0) ; clab("\status{N00}")
  901   tourcw(5*pi/12,"$add$",dashed)
  902   tourccw(-3*pi/4,"$S$")
  903   tourccw(-17*pi/12,"$get$",dotted)
  904   {alncw(pi/2,-pi/2-pi/6,0,1.33,"$get$",dotted)}
  905   {aln(pi/2,-pi/2+pi/6,0,1.33,"$add$",dashed)}
  906 }
  907 {ln(1,-1,"$add$",1) ; clab("\status{00N}") ;
  908   tourcw(-pi/10,"$add$")
  909   tourcw(pi/2,"$S$",dashed)
  910   {aln(3*pi/4,-pi/4,-1,1,"$S$",dashed)}
  911   {ln(-1,-1,"$get$",-1)}}
  912 ]
  913 .PE
  914 \end{toimage}
  915 \showgraph
  916 \caption{\label{adder} Full automaton for the adder}
  917 \end{figure}
  918 
Using the status~\status{1} not only spares memory, it also avoids
some of the runtime tests that compute the post-matching status.
Basically, when a matching consumes the sole message pending on a name
with status~\status{1}, the automaton already knows that this
name's queue is empty.  For instance, when the automaton of
figure~\ref{adder} is in the \status{100} status and a message
arrives on either one of the two methods, the appropriate process
is fired and status goes back to \status{000} without any runtime
test.  By contrast, when the automaton is in the \status{00N} status
and a message arrives on $S$, the second guarded process is fired
immediately, but a test on the $add$ queue is then performed: if this
queue is now empty then status goes back to \status{000}, otherwise
status remains \status{00N}.  Receiving a message on $S$ when status
is \status{0NN} is a bit more complicated. First, the automaton
chooses to consume a message pending on either one of the two methods
and to fire the appropriate process (figure~\ref{adder} does not
specify this choice).  Then, the queue of the selected method has to
be tested, in order to determine the post-matching status.
  937 
Status \status{1} is easy to implement using the {\tt join}
compilation technique. The compiler issues different method codes for
\status{100} and \status{N00}, and different codes can find the
$S$~argument at different places.  Implementing status \status{1} in
{\tt jocaml} would be trickier, since the encoding of states using
bit-patterns would be far less straightforward than with
\status{0}/\status{N} statuses only.  As a consequence, the {\tt
  jocaml} compiler does not perform the space optimization described
in this section.
  947 
  948 \subsection{Taking advantage of semantical analysis}
  949 
The automaton of figure~\ref{adder} has a \status{N00} status, to
reflect the case when there are two messages or more pending on $S$.
However, one quite easily sees that status \status{N00} is
useless.  First, as $S$~does not escape from the scope of its
definition, message sending on~$S$ is performed at three places only:
once initially (by \verb+S(x_0)+) and once in each guarded process.
Thus, there is one message pending on $S$ initially.  A single message
pending on~$S$ is consumed by any match, and the process fired on that
occasion is the only one to send one message on $S$. Therefore, there
cannot be two messages or more pending on $S$. As a consequence, the
full automaton can be simplified by suppressing the \status{N00} node
and every edge that starts from it or leads to it.
  962 
In particular, there is no longer an $S$-labeled edge starting from node
\status{100}. In the {\tt join} implementation this means that
the code entry for $S$ need not be updated when going from status
\status{000} to \status{100}. This entry is simply left as it is.
Symmetrically, the code entry for $S$ does not have to be restored when
status goes back to \status{000} after matching.
  969 
  970 
  971 
  972 
  973 Another important usage of semantical analysis is determining which names
  974 are state names.  For a given definition, the output of the
  975 analyzer is a status set~${\cal S}$, which is a safe approximation of
  976 the actual runtime statuses of that definition.  State names are the
  977 asynchronous names such that all statuses in ${\cal S}$ give them the
  978 status \status{0}~or~\status{1}.
  979 
The current {\tt join} compiler includes a rudimentary
name usage analyzer, which suffices for object definitions given in
the style of the $S$, $m_1$, $m_2$, \ldots, $m_n$ definitions above,
where all state variables are asynchronous and do not escape from the
scope of their definition.
  986 
A promising alternative would be to design an ad hoc syntax for
distributed objects, or, more ambitiously, a full
object-oriented join-calculus. Then, the state variables of
object definitions would be apparent directly from user programs.
  991 
  992 
  993 
  994 \subsection{Avoiding status space explosion}\label{space}
  995 
Consider any definition that defines $n$ names.  Ignoring
\status{1}~statuses, the size of the status space of a given
definition is already $2^n$.  The size of the non-matching status
space is thus bounded by $2^n$.  As a full automaton for this
definition has one state per non-matching status, status space
explosion would be a real nuisance in the case of the {\tt join}
compiler.  In particular, the number of automaton code entries to
create is $n$~times the number of non-matching statuses.
 1004 
Unfortunately, the exponential upper bound is reached by practical
programs, as demonstrated by the general object-oriented definition
given at the beginning of section~\ref{opt}.  In that case, all
definition statuses such that $S$ has the \status{0} status are
non-matching.  In such a situation, using runtime testing, as {\tt
  jocaml} does, is not much of a penalty when compared to code size
explosion.
 1012 
We thus introduce dynamic behavior in the automata.  We do so on a
per-name basis: the status of state names will be encoded by
automata states as before, whereas method statuses will now be
explicitly checked at runtime.  Thus, we introduce a new status,
``\status{?}'', which means that nothing is known about the number of
messages pending on a name.  Additionally, we state that all methods
have the \status{?} status as soon as there is one message or
more pending on any of the methods.
 1021 
This technique can be seen as merging some states of the full
automaton, compiled by considering complete status information, into
new states that include \status{?} statuses.
 1025 
 1026 For instance, in the adder example, we get the automaton of
 1027 figure~\ref{adder2}, where the three statuses \status{0N0},
 1028 \status{0NN} and \status{00N} of figure~\ref{adder} merge into the new
 1029 status~\status{0??}.  (Note that we also take advantage of name usage
 1030 analysis to delete status \status{N00}.)
 1031 
 1032 \begin{figure}[th]
 1033 \begin{toimage}
 1034 .PS
 1035 clab("\status{000}")
 1036 
 1037 {{iln(-.4,-.4) ; "$\begin{array}{l}get\\add\\[.5em]\end{array}$"}
 1038  ln(-1,-1,"",0) ; clab("\status{0??}")
 1039  tourccw(pi,"$\begin{array}{l}get\\add\\[.5em]\end{array}$")
 1040  tourcw(pi/4-pi/12,"$S$",dashed)
 1041  tourccw(pi/4+pi/12,"$S$",dotted)
 1042  {alncw(pi/4,-3*pi/4-pi/12,1,1,"$S$",dotted)}
 1043  {aln(pi/4,-3*pi/4+pi/12,1,1,"$S$",dashed)}
 1044 }
 1045 
 1046 {ln(1,-1,"$S$",0)
 1047    clab("\status{100}")
 1048    {aln(3*pi/4,-pi/4+pi/12,-1,1,"$add$",dashed)}
 1049    {alncw(3*pi/4,-pi/4-pi/12,-1,1,"$get$",dotted)}
 1050 }
 1051 .PE
 1052 \end{toimage}
 1053 \showgraph
 1054 \caption{\label{adder2} Final automaton for the adder}
 1055 \end{figure}
 1056 
Information on where runtime testing has to be performed can be
inferred from the diagram of figure~\ref{adder2}.  For instance,
assume that the current status is \status{0??} and that a message arrives
on~$S$.  Since there is at least one message pending on a method, a
matching will occur.  Tests are needed though, before matching to
determine the matching clause, and after matching to determine the
post-matching status.  Abstractly, the first series of tests turns
the \status{?} statuses into either \status{0} or \status{N} statuses,
while the second series checks whether there are still messages pending on
names with \status{?} status.  We are still investigating how to
organize these tests efficiently without producing too much code
(see~\cite{Augustsson85,Maranget92} for a discussion of the size of
such code in the context of compiling ML~pattern-matching).
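For this particular transition, the two series of tests could look as follows; this is an illustrative Python sketch with invented names, not compiler output.

```python
# Illustrative sketch (invented names) of the tests run when a message
# arrives on S in status 0??: a match is guaranteed, but runtime tests
# must pick the clause and then recompute the status of the methods.

def on_S_in_0qq(method_queues, fire):
    # First series: resolve each ? into 0 or N to find a matching clause.
    for i, queue in enumerate(method_queues):
        if queue:
            fire(i, queue.pop(0))     # fire the clause for method i
            break
    # Second series: any message still pending on a ?-status name?
    return "0??" if any(method_queues) else "000"
```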
 1070 
By contrast, when status is \status{100} and a message arrives on
$get$ or $add$, the corresponding matching is known immediately
and the message pending on $S$ is consumed. Then, the queue for $S$ is
known to be empty and status can be restored to \status{000} with no
runtime testing at all.  As the likely message arrival order is
first one message on $S$ and then one message on $get$ or $add$, the
final automaton of figure~\ref{adder2} responds efficiently to the most
frequent case, while still being able to respond to less frequent cases (for
instance, two messages on methods may arrive in a row).  Furthermore,
when trouble is over, the automaton has status \status{000} and is
thus ready for the normal case. In this example, a penalty in code
size is paid for improving code speed in the frequent, ``normal''
case, whereas this penalty is avoided in non-frequent cases, which are
treated by less efficient code.
 1085 
We introduced the \status{?} status on a per-name basis.  However,
other choices are possible: {\em a priori}, there are many ways
to merge full automata states into final automata states. However, if
one really wants to avoid status space explosion, the final automaton
should be constructed directly, without first constructing the full
automaton.  Adopting our per-name \status{?} status makes this direct
construction possible. Additionally, the \status{?} status can be used
by the simple static analyzer as a status for the names it cannot
trace (e.g.\ names that escape their definition scope).  This
dramatically decreases the size of the analyzer output and its running
time.
 1097 
 1098 \section{Optimizing further}
 1099 \label{fusion}
 1100 
 1101 We here describe a simple transformation on
 1102 join definitions, which does not rely on a full semantical analysis
 1103 (such as name usage analysis),
 1104 but only on a simple, local, syntactical analysis of join-patterns.
 1105 
 1106 
 1107 Let us take a simple example:
 1108 \begin{verbatim}
 1109 let S(x) | m_1(y) = P_1(x,y)
 1110 and S(x) | m_2(y) = P_2(x,y)
 1111   ....
 1112 and S(x) | m_n(y) = P_n(x,y)
 1113 ;;
 1114 \end{verbatim}
 1115 
In this example, a match occurs only when there are messages pending
both on $S$ and on one of the methods $m_1$, $m_2$,\ldots Thus, from the
synchronization point of view, all the methods are equivalent. And
indeed, we can regroup them into a single method channel $m$ by
transforming this join definition into:
 1122 
 1123 \begin{verbatim}
 1124 let S(x) | m(p,y) = P[p] (x,y);;
 1125 let m_1(y) = m(1,y);;
 1126 let m_2(y) = m(2,y);;
 1127  ....
 1128 let m_n(y) = m(n,y);;
 1129 \end{verbatim}
where $P$ is the vector of processes $[P_1,P_2,\ldots,P_n]$.
 1131 
Methods $m_1$, $m_2$,\ldots are now simple wrappers.  Method $m_i$
calls $m$ with an additional $i$~argument, which basically is the
index of $P_i$ in array $P$.  At this point, we must emphasize that we
describe this technique as a source-to-source transformation only for
clarity. However, the produced source code may not be correct with
respect to the join type system, when the types of methods
differ.  Anyhow, this optimization is implemented using ad hoc
mechanisms; this both improves efficiency and solves the typing
problem.
 1141 
Currently, this optimization is performed by the {\tt jocaml}
compiler.  This leads to a new interpretation of name indexes by the
``join'' library.  The least significant bits in name indexes still
encode names that actually take part in synchronization (here $S$ and
$m$), while their most significant bits (which were previously unused)
now encode the extra $i$~argument.  This yields two benefits. First,
the number of status checks decreases, as the number of matching
statuses decreases.  Second, the number of channels that can be
defined in one definition can now exceed the hosting system integer
size, provided some names can be grouped together for synchronization.
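The two-part name indexes can be sketched as follows; the field width and helper names are our own assumptions, not {\tt jocaml}'s actual layout.

```python
# Hypothetical layout of a fused name index (the width is our assumption):
# low bits select the synchronizing channel, high bits carry the extra i.

SYNC_BITS = 5                        # up to 2**5 synchronizing channels

def pack(sync_index: int, process_index: int) -> int:
    return (process_index << SYNC_BITS) | sync_index

def unpack(name_index: int) -> tuple:
    return name_index & ((1 << SYNC_BITS) - 1), name_index >> SYNC_BITS

# The wrapper m_3(y) = m(3, y) then amounts to sending on pack(M, 3),
# where M is the synchronization index of the fused channel m.
```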
 1152 
In the {\tt join} compiler, this technique might be used to reduce
automata size, since it lowers the number of non-matching statuses by
reducing the number of synchronizing names.  Code entries for methods
$m_1$, $m_2$,\ldots would still be contained in the definition
structure; they would only stack the index of the process to fire, and
then call the code for method $m$. Moreover, they would not need to be
updated after each transition of the automaton.
 1160 
 1161 
Finally, this technique can also be applied to more complex
synchronizations.  Consider a definition that defines names $x_1$, $x_2$,
\ldots, $x_n$, using patterns $J_1$, $J_2$, \ldots $J_m$.  We say that
two names are equivalent when swapping them in the patterns yields
the same set of patterns.  We then replace equivalent names by a
single name, plus an index.
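This equivalence criterion is purely syntactic, so it is cheap to check. A naive sketch (the modelling is ours; patterns are represented as sets of names, which ignores repeated names in a pattern):

```python
# Naive check of the name-equivalence criterion (modelling is ours):
# a and b are equivalent iff swapping them in every pattern leaves the
# overall set of patterns unchanged. Patterns are frozensets of names.

def equivalent(a, b, patterns):
    def swap(p):
        return frozenset(b if n == a else a if n == b else n for n in p)
    pats = set(patterns)
    return {swap(p) for p in pats} == pats

# The S_1/S_2/m_1/m_2 definition below yields these four patterns:
patterns = [frozenset({"S1", "m1"}), frozenset({"S1", "m2"}),
            frozenset({"S2", "m1"}), frozenset({"S2", "m2"})]
```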
 1168 
Consider the following definition:
 1170 \begin{verbatim}
 1171 let S_1(x) | m_1(y) = P_1(x,y)
 1172 and S_1(x) | m_2(y) = P_2(x,y)
 1173 and S_2(x) | m_1(y) = P_3(x,y)
 1174 and S_2(x) | m_2(y) = P_4(x,y)
 1175 ;;
 1176 \end{verbatim}
 1177 Then the set of defined names $\{S_1, S_2, m_1, m_2\}$ can be
 1178 partitioned into $\{S_1, S_2\}$ and $\{m_1, m_2\}$.
 1179 Then, the above program can be transformed into:
 1180 \begin{verbatim}
 1181 let S(p,x) | m(q,y) = P[p,q] (x,y);;
 1182 let m_1(y) = m(1,y);;
 1183 let m_2(y) = m(2,y);;
 1184 let S_1(y) = S(1,y);;
 1185 let S_2(y) = S(2,y);;
 1186 \end{verbatim}
 1187 (with $P[1,1] = P_1$, $P[1,2] = P_2$, $P[2,1] = P_3$ and $P[2,2] = P_4$)
 1188 
 1189 
 1190 \section{Conclusion and future work}\cutname{conclusion.html}
 1191 \label{conclusion}
 1192 
In the join-calculus, a name definition, all receptors on that name and all
possible synchronizations on that name are grouped together in a single
join definition. This enables the compilation of synchronization
between concurrent or even distributed processes, using finite-state
automata.
 1198 In the distributed case, a message transport phase to the
 1199 machine that currently hosts the join definition (and hence the
 1200 automaton) is first performed.  This strengthens our point of view
 1201 that the join-calculus is the core of a distributed programming
 1202 language that can be compiled in practice, mainly because it restricts
 1203 reception on a channel to statically known parts of the program.  The
 1204 same argument applied to \`a la ML polymorphic typing
 1205 in~\cite{join-typing-97}.
 1206 
 1207 \begin{cutflow}{Benchmarks}\cutname{benchmarks.html}
 1208 \aname{bench}{}
 1209 \begin{table}[ht]
 1210 \begin{center}
 1211 \begin{tabular}{l@{\quad}r@{\quad}r@{\quad}r@{\quad}r@{\quad}r}
 1212 & \texttt{fib} &
 1213 \texttt{afib} &
 1214 \texttt{pat} &
 1215 \texttt{qsort} &
 1216 \texttt{count}\\ \hline
 1217 \texttt{join} &  32.0 & 14.5 & 37.2 & 9.9 & 16.4 \\
 1218 \texttt{jocaml} & 5.7 &  3.5 & 5.4 & 1.4 & 4.2 \\
 1219 \texttt{Bologna} & 11.9 & 6.2 &  9.4 & 16.8 & 5.3 \\
 1220 \end{tabular}
 1221 \end{center}
 1222 \caption{\label{perf}Some performance measures (wall-clock time, in seconds)}
 1223 \end{table}
Benchmark sources are available on the \footahref{http://join.inria.fr/speed/}{web}.
 1225 \end{cutflow}
Taking a few benchmarks (see table~\ref{perf}, or \ahrefloc{bench}{here})
as a set of sensible join programs, both the
{\tt join} and the {\tt jocaml} pattern matching compilation schemes
prove adequate.  In particular, neither of the two schemes falls into the
pitfall associated with its compilation technique.
 1231 
In the {\tt join} case, one might fear code size explosion; the technique
exposed in section~\ref{space} successfully avoids it
in practical cases.  The {\tt jocaml} technique appears expensive in
runtime checks and thus {\em a priori} produces slow code.  We chose
such a scheme of implementing automata using generic code because it
could be implemented simply by adding code to the Objective Caml
bytecode interpreter.  Using bytecode specialized for automata
manipulation would have implied more extensive modifications of the
Objective Caml bytecode interpreter. Moreover, the {\tt jocaml} system
runs faster than the {\tt join} system, even for pure join programs,
showing that its weaker compilation of join definitions is more than
compensated for by its total integration into the Objective Caml system.
 1244 
Comparison with the Bologna implementation~\cite{Bologna} of the
join-calculus is also instructive.  This system also produces
bytecode, which is interpreted by a C~program.  It proves faster than
{\tt join} and slower than {\tt jocaml} on most of the examples.
A glance at the Bologna source code reveals that it uses a
scheme very similar to the one of {\tt jocaml}, with two slight
differences.  First, status is systematically encoded as an array of
integers.  Second, when a message arrives on a name $x$ with an empty
queue, all patterns are tested (whereas in {\tt jocaml} only the
patterns that contain $x$ are tested).
 1255 
 1256 Performance of a given join system depends on many factors. In
 1257 particular, scheduling policy and message queue management have a
 1258 dramatic impact on it. Furthermore, a policy that gives good results
 1259 on one benchmark can be defeated by another.  For these reasons, we
 1260 cannot tell which compilation technique is the best by comparing
 1261 different implementations.
 1262 
 1263 
 1264 
Clearly, we now need to integrate all our compilation techniques into
the same system, in order to compare them more thoroughly and to
experiment further.  However, the definition of reactivity status and
the automata of section~\ref{automatas} provide a sound
basis for these future investigations.  Apart from future language
development and fully implementing the failure semantics of the
join-calculus, we also plan to investigate the implementation of
threads further, attempting to minimize thread suspension and creation.
 1273 
 1274 
 1275 
 1276 \bibliography{bib}
 1277 \bibliographystyle{abbrv}
 1278 
 1279 
 1280 \end{document}
 1281 
 1282 %\subsection{Playing with the runtime scheduler}
 1283 %
 1284 %Concentrating on the {\it object style} of programming, lots of
 1285 %methods do not modify the automaton status, but only read or modify
 1286 %state names (such as {\tt set} and {\tt get}), with only little
 1287 %non-blocking computations. Most of the time, these methods are never
 1288 %interrupted, and thus behave as in critical sections from the
 1289 %threads scheduler point of view.
 1290 %
 1291 %As a consequence, when such a method is called, an optimization would
 1292 %be not to modify the automaton status at all, since the method will
 1293 %end by giving back its original status to the automaton. To prevent
 1294 %the automaton to receive other messages from other methods while in an
 1295 %incoherent state, these methods are executed in a critical section.
 1296 %
 1297 %This optimization relies on a simple analysis of methods, which can
 1298 %also be used to decrease the number of forked threads, since such
 1299 %methods do not need to be executed in a forked thread.