## "Fossies" - the Fresh Open Source Software Archive

### Member "hevea-2.35/examples/pat.tex" (16 Jan 2021, 53021 Bytes) of package /linux/www/hevea-2.35.tar.gz:


\documentclass{article}
\usepackage{fullpage}
\usepackage{hevea}

\input{pat.def}


\def\showgraph{\par\medskip\centerline{\begin{toimage}
\box\graph
\end{toimage}
\imageflush}\medskip}
\def\status#1{{\tt #1}}

\title{Compiling Join-Patterns
\footnote{
This work is partly supported by the ESPRIT
    CONFER-2 WG-21836}}
\author {Luc Maranget \and Fabrice
Le~Fessant\\[.5em]
INRIA Rocquencourt,
    BP 105, 78153 Le Chesnay Cedex France.
\\[.5em]
{\tt\small \{Luc.Maranget, Fabrice.Le-Fessant\}@inria.fr}
}

\date {}


\htmlfoot{\rule{}{}\begin{center}DVI version \ahrefurl{pat.dvi}\end{center}}
\htmlhead{\begin{center}DVI version \ahrefurl{pat.dvi}\end{center}\rule{}{}}
\begin{document}



\maketitle
\thispagestyle{empty}

\begin{abstract}

  The join-calculus is both a name-passing calculus and a core
  language for concurrent and distributed programming.  An essential
  part of its implementation is the compilation of {\em
    join-patterns}.  Join-patterns define new channels and all the
  synchronizations they take part in at the same time.  Relying on the
  experience gained from our two implementations, we study the
  translation of join-patterns into deterministic finite-state
  automata as well as some related optimizations.


\end{abstract}
\tableofcontents

\section {Introduction}

%%Baratin
The join-pattern is the distinctive feature of the join-calculus, seen
both as a process calculus and as a programming language.  On the
calculus side, the join-calculus can roughly be seen as a functional
calculus plus join-patterns, thus achieving the same expressive power
as previous name-passing process calculi~\cite{Milner-pi-calculus}.
Join-definitions are made of several clauses, each clause being a pair
of a join-pattern and a guarded process.  A join-pattern expresses
a synchronization between several {\em names} (or channels). When
messages are pending on all the names that appear in a given
join-pattern, the corresponding clause is said to be active and
its guarded process {\em may} be fired.  A definition whose
join-patterns share some names expresses sophisticated
synchronizations. In such a definition, a message on a name that
appears in several active clauses is consumed as soon as one of the
corresponding guarded processes is fired.

Join-languages are built on top of the join-calculus taken as a core
language.  Therefore, names are first-class citizens, computations are
first abstracted as collections of asynchronous processes, and
join-patterns provide a unique, clear and powerful mechanism for
synchronizing these computations.  The documentation for the
join-language~\cite{Join} includes a tutorial that shows how join
definitions may encode classical synchronization constructs such as
locks, barriers, shared counters,\ldots

%%% Conclusion of the introduction
On the implementation side, join-patterns are meant to be heavily used
by programmers, as the only synchronization primitive available.
Thus, their compilation requires much care.  At the moment, we propose
two compilers: the {\tt join} compiler~\cite{Join}, for a language of its
own, and the {\tt jocaml} compiler~\cite{Jocaml}, an extension of the
Objective Caml functional language.

%%% Plan
Section~\ref{join-calculus} of this paper succinctly presents the
join-calculus syntax and semantics. Then, section~\ref{automatas}
introduces the kind of automata we use to compile
join-synchronization, while section~\ref{implementation} presents two
techniques for implementing them.  The first technique directly
derives from the automata description and is used in our {\tt join}
compiler. The second technique performs some extra runtime tests; it
is the technique used in our {\tt jocaml} compiler.
Sections~\ref{optimization} and~\ref{fusion} discuss
optimizations and section~\ref{conclusion} concludes.




\section{A rapid tour of the join-calculus}
\label{join-calculus}
\subsection {Syntax}

We here rephrase the traditional presentation of the core
join-calculus~\cite{FournetGonthier96}, where names are
the only values. Thus, we ignore the issue of system primitives and
constants, since
names provide sufficient expressive power for our purpose of
describing our implementation of pattern matching (however, we use
primitives and constants in our examples).
We slightly change the syntax of~\cite{FournetGonthier96}, in order to
match that of
the {\tt join} programming language.

We use $x$ to denote a name in general.
$$
\begin{array}{l@{\hspace{2cm}}l}
\begin{array}{rcl}
P & ::= & x\tuple{x_i\inu p}\\
  & \mid & \Def D in P\\
  & \mid & P | P
\end{array}
&
\begin{array}{rcl}
D & ::= & J |> P \\
  & \mid & D \And D\\[1em]
J & ::= & x\tuple {x_i\inu p}\\
  & \mid & J | J
\end{array}
\end{array}
$$

A process $P$ is either a message, a defining process, or a parallel
composition of processes (note that names are polyadic, meaning that
messages may be made of several values);
%
a definition $D$ consists of one or several clauses $J = P$ that
associate a guarded process $P$ to a specific message pattern $J$;
%
a join-pattern $J$ consists of one or several messages in parallel. We
say that the pattern $J = \ldots x\tuple {x_i \inu p}\ldots$
defines the name $x$ and that a definition defines the set of the names
defined by its patterns.
Moreover, patterns are linear, i.e.\ names may appear at most once in
a given pattern.

Processes and definitions are known modulo renaming of bound
variables, as substitution performs $\alpha$-conversion to avoid
captures.

\subsection{Semantics}


This semantics is specified as a reflexive chemical abstract machine
(RCHAM)~\cite{BeBo92}. The state of the computation
is a {\em chemical soup} $\dd \tth \pp$ that consists of two
multisets: active definitions~$\dd$ and running processes~$\pp$.

The chemical soup evolves according to two families of rules: {\em
  Structural rules} $\redstruct$ are reversible ($\heat$ is heating,
$\cool$ is cooling); they represent the syntactical rearrangement of
terms (heating breaks terms into smaller ones, cooling builds larger
terms from their components).  {\em Reduction rules} $\redsoupe$
consume specific processes present in the soup, replacing them by some
others; they are the basic computation steps.
%

We present simplified chemical rules
(see~\cite{FournetGonthier96,FGLMR96_Calculus_mobile_agent} for
the complete set of rules).
Following the chemical tradition, every rule applies to any matching
subpart of the soup, non-matching subparts of the soup being left
implicit.
181 $$182 \let \N \name 183 \let \su \varphi 184 \begin{array}{rl@{~}c@{~}rl@{\quad}l} 185 &\tth P_1 | P_2 &\redstruct& &\tth P_1, P_2 &\mbox{S-Par}\\ 186 D_1 \And D_2 &\tth &\redstruct& D_ 1, D_2 &\tth &\mbox{S-And}\\ 187 &\tth \Def D in P &\redstruct& D &\tth P 188 &\mbox{S-Def} \\[1em] 189 J |> P &\tth \su (J) &\redsoupe & J |> P &\tth \su (P) 190 &\mbox{R-\beta} 191 \end{array} 192$$
Two of the rules above have side conditions:
\begin{itemize}
\item (\name{S-Def})
the names defined in $D$ must not appear anywhere
in the solution except in the reacting definition and process, $D$ and $P$.
This condition is global; in combination with $\alpha$-renaming it
enforces lexical scoping.

\item (\name{R-$\beta$}) $\varphi (\cdot)$ substitutes actual names for
the received variables in $J$ and $P$.

\end{itemize}
Additionally, we only consider well-typed terms and reductions.
See~\cite{join-typing-97} for details on a rich polymorphic type system
for the join-calculus. Here, this mostly amounts to assuming that
message and name arities always agree.


In this paper, we take particular interest in the reduction (\name{R-$\beta$}).
Informally, when there are messages pending on all the names defined
in a given join-pattern, the process guarded by this join-pattern
may be fired. When firing is performed, we say that a {\em matching} occurs.
At the semantic level, there is a message $\tuple{x_i\inu p}$ pending
on a name~$x$ when there is an active
molecule~$x\tuple {x_i\inu p}$ in the chemical soup.

Thus, we may define the reactivity status of a given chemical soup as
the multiset of the active molecules in it.
Later on in this paper, we shall consider various abstractions of this
reactivity status.


\subsection{The join programming languages}

Apart from primitives, join-languages support {\em synchronous}
names, which the core join-calculus does not provide directly.
Synchronous names send back results, a bit like functions.
However, synchronous names may engage in any kind of
join-synchronization, just as asynchronous names do.
A program written using synchronous names can be translated
into the core join-calculus alone.
The translation is analogous to the continuation-passing-style
transformation in the $\lambda$-calculus.
In our implementation, as far as pattern matching is concerned,
a synchronous name
behaves as if it were asynchronous and carried one additional continuation
argument. All implementation difficulties concentrate in managing this extra
argument, whose presence has no effect on pattern matching itself.
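To make this translation concrete, here is a minimal Python sketch (purely illustrative; the names {\tt make\_sync} and {\tt double\_async} are hypothetical, and this is not the actual {\tt join} or {\tt jocaml} translation): a synchronous name is modeled as an asynchronous one carrying one extra continuation argument, and replying amounts to invoking that continuation.

```python
import queue

# Hypothetical sketch of the continuation-passing translation: a synchronous
# name becomes an asynchronous one with one extra continuation argument.
def make_sync(async_send):
    """Turn an asynchronous name into a synchronous (blocking) call."""
    def sync_call(*args):
        result = queue.Queue(maxsize=1)
        async_send(*args, result.put)  # pass the continuation explicitly
        return result.get()            # block until the reply arrives
    return sync_call

# An asynchronous "name" whose guarded process replies with twice its argument.
def double_async(n, reply):
    reply(2 * n)

double = make_sync(double_async)
```

Calling {\tt double(21)} sends the message and waits on the continuation, which illustrates why, for pattern matching purposes, a synchronous name behaves exactly like an asynchronous one of arity one more.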



The {\tt join} language~\cite{Join} is our first prototype. All
examples in this paper are in {\tt join} syntax.  The system consists
of a bytecode compiler and a bytecode interpreter.  Both compiler and
interpreter are Objective Caml~\cite{Objective-Caml} programs, and it
is easy to lift Objective Caml data types and functions into {\tt
  join} abstract data types and primitives.  For instance, {\tt join}
programs easily draw graphics, using the graphics Objective Caml
library.  As a consequence, {\tt join} can be seen either as a
language of its own, featuring many primitives, or as a distributed
layer on top of Objective Caml.  Continuations are encoded using
ad hoc threads, which are created and scheduled by the {\tt join}
bytecode interpreter.
256
257
258 The {\tt jocaml} system is our second implementation. In {\tt jocaml},
259 all join-calculus constructs for concurrency, communication,
260 synchronization and process mobility are directly available as
261 syntactical extensions to Objective Caml.  On the runtime environment
262 side, we have supplemented the original Objective Caml runtime system
263 (which already provides a thread library) with a special join''
264 library and a distributed garbage collector~\cite{GCDistri}. On the
265 compiler side, the Objective Caml compiler has been extended to
266 translate join-calculus source code into functions calls to the
267 join'' library.  However, we also introduced a few new instructions
268 to Objective Caml bytecode, but only to handle code mobility, a
269 feature orthogonal to pattern matching.  The {\tt jocaml} system is
270 currently available as a prototype version~\cite{Jocaml}.

\section{Pattern matching in join definitions}
\label{automatas}
\subsection{Principle}

Consider the following join definition:
\begin{verbatim}
let A(n) | B() = P(n)
and A(n) | C() = Q(n)
;;
\end{verbatim}
This defines three names $A$, $B$ and $C$. Name $A$ has arity one,
whereas names $B$ and $C$ have arity zero.  Names may be synchronous
or asynchronous, depending on whether or not there are
\verb+reply ... to ...+
constructs applying to them inside the guarded processes~$P(n)$
and $Q(n)$.

According to the general join-calculus semantics, the guarded process
$P(n)$ may be fired whenever there are some messages pending on $A$ and
$B$. Similarly, $Q(n)$ may be fired whenever there are some messages
pending on $A$ and $C$. In both cases, the formal parameter $n$ is
replaced by (or bound to, in the implementation) one of the messages
pending on $A$.

Reactivity information is to be considered at the definition level,
since matching is indeed performed at this level.  Moreover, in order
to use finite-state automata, we want this information to range over a
finite set of possible values.  As far as matching is concerned, and by
the linearity of patterns, only the presence or absence of messages
matters.  Thus, let us call \status{0} the status of a name without
any message pending, and \status{N} the status of a name with at least
one message pending.  Then, the status of a definition is a tuple
consisting of the statuses of the names it defines, once some
arbitrary order of these names has been adopted.

For instance, if some messages are pending on $B$ and $C$, whereas
none is pending on $A$, then the status of the $A$, $B$, $C$
definition is a three-tuple written \status{0NN}.

A {\em matching status} is defined as a status that holds enough
\status{N}, so that at least one guarded process can be fired.
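This abstraction is easy to state concretely; the following Python fragment (an illustration of the definitions above, not compiler code) computes the status of the $A$, $B$, $C$ definition and decides whether it is a matching one:

```python
# Illustrative sketch of the status abstraction for the A, B, C definition:
# a name's status is "0" (no pending message) or "N" (at least one), and
# the definition status is the tuple of name statuses, in the order A, B, C.
PATTERNS = [{"A", "B"}, {"A", "C"}]   # the two join-patterns of the clauses
NAMES = ["A", "B", "C"]

def status(queues):
    """Abstract concrete message queues into a status such as '0NN'."""
    return "".join("N" if queues[x] else "0" for x in NAMES)

def is_matching(st):
    """Matching iff every name of some pattern has status N."""
    pending = {x for x, s in zip(NAMES, st) if s == "N"}
    return any(pattern <= pending for pattern in PATTERNS)
```

For the $A$, $B$, $C$ definition, the matching statuses computed this way are exactly \status{NN0}, \status{N0N} and \status{NNN}, the statuses filled in gray in the diagrams that follow.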

Definition status evolves towards matching status as messages arrive.
This yields a first kind of ``increasing'' transitions.  More
specifically, when a message arrives on some name, this name's
status either evolves from \status{0} to \status{N} or remains
\status{N}.  Definition status evolves accordingly.  In the $A$, $B$,
$C$ case we get the following transitions.  (In this diagram,
transitions are labeled by the name that gets a new message, and
matching statuses are filled in gray.)
\begin{toimage}
.PS
cm = 0.3937;
pt = 1/72;
hu = 2 * cm;
vu = 1.5 * cm
circlerad = 12 * pt
rblabht = 2 * circlerad
rblabwid = 4 * circlerad
fillval=0.3
pi=3.1415927

define clab {
    {circle at Here $1 $2 $3}
}

define ilab {
    {box invis $1 $2 $3 at Here}
}

define ln {
    L: [circle rad 0]
    if $4 < 0 then {
      box $3 rjust wid 0 ht 0 at L.c + ($1*(hu/3)-circlerad/4, $2*(vu/3))
    } else {if $4 > 0 then {
      box $3 ljust wid 0 ht 0 at L.c + ($1*(hu/3)+circlerad/4, $2*(vu/3))
    } else {
      box $3 ljust wid 0 ht 0 at L.c + ($1*(hu/3)+ 0.001 * cm, $2*(vu/2.5))
    }}
    {line -> chop from L.c to L.c + ($1*hu, $2*vu)}
    move to L.c + ($1*hu, $2*vu)
}

define dln {
    {line -> dotted right $1*hu up $2*vu chop}
    move right $1*hu up $2*vu
}

define htccw {
  psi=$1
  phi=$2
  {
  c0=circlerad * 2 * cos(psi/2) * cos(phi+psi/2)
  s0=circlerad * 2 * cos(psi/2) * sin(phi+psi/2)
  c1=circlerad * cos(phi) ; s1 = circlerad * sin(phi)
  c2=circlerad * (2 * cos(psi/2) + 1) * cos(phi+psi/2)
  s2=circlerad * (2 * cos(psi/2) + 1) * sin(phi+psi/2)
  c3=circlerad * cos(phi+psi) ; s3 = circlerad * sin(phi+psi)
  A: [circle rad 0.0]
  move to A.c + (c1,s1)
  arc $4 rad circlerad to A.c + (c2,s2)
  arc -> $4 rad circlerad to A.c + (c3,s3)
  circle invis $3 with.c at A.c + (c0,s0)}
}

define htcw {
  psi=$1
  phi=$2
  {
  c0=circlerad * 2 * cos(psi/2) * cos(phi-psi/2)
  s0=circlerad * 2 * cos(psi/2) * sin(phi-psi/2)
  c1=circlerad * cos(phi) ; s1 = circlerad * sin(phi)
  c2=circlerad * (2 * cos(psi/2) + 1) * cos(phi-psi/2)
  s2=circlerad * (2 * cos(psi/2) + 1) * sin(phi-psi/2)
  c3=circlerad * cos(phi-psi) ; s3 = circlerad * sin(phi-psi)
  A: [circle rad 0.0]
  move to A.c + (c1,s1)
  arc cw $4 rad circlerad to A.c + (c2,s2)
  arc cw -> $4 rad circlerad to A.c + (c3,s3)
  circle invis $3 with.c at A.c + (c0,s0)}
}

define tourccw {
  htccw(pi/2,$1,$2,$3)
}

define tourcw {
  htcw(pi/2,$1,$2,$3)
}

hu=3*cm
[
vu=2*cm
circlerad=15*pt
clab("\status{000}")
{ln(-1,-1,"$B$",-1) ; clab("\status{0N0}") ; tourccw(pi/2,"$B$")
  {ln(0,-1,"$A$",0) ; clab("\status{NN0}",fill)
    tourccw(pi,"$A, B$")
    ln(1,-1,"$C$",-1)}
  {ln(1,-1,"$C$",1) ; clab("\status{0NN}")
    tourcw(pi/8,"$\,B, C$")
    ln(0,-1,"$A$",-1) ; clab("\status{NNN}",fill)
    tourccw(5*pi/4,"$A,B,C$")}}
{ln(0,-1,"$A$",0) ; clab("\status{N00}") ; tourcw(pi/2.5,"$A$")
  {ln(-1,-1,"$B$",-1)}
  {ln(1,-1,"$C$",1)}}
{ln(1,-1,"$C$",1) ; clab("\status{00N}") ; tourcw(pi/2,"$C$")
  {ln(-1,-1,"$B$",1)}
  {ln(0,-1,"$A$",0) ; clab("\status{N0N}",fill) tourcw(0,"$A, C$")
    ln(-1,-1,"$B$",1)}}
]
.PE
\end{toimage}%
\showgraph


Definition status also evolves when matching occurs. This yields
new, ``decreasing'', transitions that we call matching transitions.
According to the join-calculus semantics,
matching may occur at any moment (provided of course that
matching is possible).
As a consequence, matching transitions start from matching statuses and
they are unlabelled.
In the $A$, $B$, $C$ case, they are as follows:
\begin{toimage}
.PS
lab=0.5

define iln {
    move right $1*hu up $2*vu
}

aphi=pi/4

define alncw {
 c= circlerad*cos($1) ; s = circlerad*sin($1)
 c2 =  circlerad*cos($2) ; s2 = circlerad*sin($2)
 C: [circle rad 0]
 if $4 < 0 then {abs = -$4} else {abs = $4}
 x1 = c ; y1 = s ; x2 = $3*hu+c2 ; y2 = $4*vu+s2
 d = (x2-x1) * (x2-x1) + (y2-y1)*(y2-y1)
 ar = sqrt(d) / (2*sin(aphi))
 D: arc cw -> $6 rad ar from C.c + (x1,y1) to C.c + (x2,y2)
 r=last arc.rad
 x3 = C.c.x + lab*(x1+x2) ; y3 = C.c.y + lab*(y2+y1)
 d = sqrt((x3-D.c.x) * (x3-D.c.x) + (y3 -D.c.y)*(y3-D.c.y))
 line invis from D.c to r/d  <D.c,(x3,y3)> ; $5
}

define aln {
 c= circlerad*cos($1) ; s = circlerad*sin($1)
 c2 =  circlerad*cos($2) ; s2 = circlerad*sin($2)
 C: [circle rad 0]
 if $4 < 0 then {abs = -$4} else {abs = $4}
 x1=c ; y1 = s ; x2 = $3*hu+c2 ; y2 = $4*vu+s2
 d = (x2-x1) * (x2-x1) + (y2-y1)*(y2-y1)
 ar = sqrt(d) / (2*sin(aphi))
 D: arc -> $6 rad ar from C.c + (x1,y1) to C.c + (x2,y2)
 r=last arc.rad
 x3 = C.c.x + lab*(x1+x2) ; y3 = C.c.y + lab*(y2+y1)
 d = sqrt((x3-D.c.x) * (x3-D.c.x) + (y3 -D.c.y)*(y3-D.c.y))
 line invis from D.c to r/d <D.c,(x3,y3)> ; $5
}

clab("\status{000}")
{iln(-1,-1,invis) ; clab("\status{0N0}")
  {iln(0,-1,"$B$") ; clab("\status{NN0}",fill) ;
    {dln(0,1)} ; {dln(1,1)} ; {dln(1,2)}
    tourccw(pi,"",dotted) ; iln(1,-1,"$C$")}
  {iln(1,-1,"$C$") ; clab("\status{0NN}")
    iln(0,-1,"$B$") ; clab("\status{NNN}",fill)
     tourccw(5*pi/4,"",dotted)
     {dln(0,1)} ; {dln(-1,1)} ; {dln(1,1)} ;
     {aln(3*pi/8,-pi/3,0,2,"",dotted)}
     {dln(-1,2)} ; {dln(1,2)}}}
{iln(0,-1,"$B$") ; clab("\status{N00}")
  {iln(-1,-1,"$A$")}
  {iln(1,-1,"$C$")}}
{iln(1,-1,"$C$") ; clab("\status{00N}")
  {iln(-1,-1,"$A$")}
  {iln(0,-1,"$B$") ; clab("\status{N0N}",fill) ;
    {dln(0,1)} ; {dln(-1,1)} ; {dln(-1,2)}
    tourcw(0,"",dotted) ; iln(-1,-1,"$A$")}}
.PE
\end{toimage}
\showgraph


Observe that there may be several matching transitions starting from a
given status.  This is not always a consequence of the
non-deterministic semantics of the join-calculus.

Often, the ambiguity is only apparent.  For instance, matching transitions
starting from \status{NN0} lead to \status{NN0}, \status{N00},
\status{0N0} and \status{000}.  When such a matching occurs, two
messages are consumed (one pending on $A$ and one pending on $B$);
then, depending on whether there are some messages left pending on $A$
and $B$ or not, the status evolves to \status{NN0}, \status{N00},
\status{0N0} or \status{000}.  From the implementation point of view,
this means that a little runtime testing is required once matching has
been performed. Here, we pay a price for using finite-state automata.
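This little runtime test can be sketched as follows (Python, purely illustrative): after the $A$, $B$ clause fires, one message is consumed on each of $A$ and $B$, and the next status is recomputed from the remaining queues.

```python
# Illustrative sketch of the post-matching adjustment: consume one message on
# A and one on B, then a little runtime testing recomputes the next status
# (NN0, N00, 0N0 or 000) from the remaining queues.
def consume_and_restatus(queues):
    queues["A"].pop(0)          # one message consumed on A
    queues["B"].pop(0)          # one message consumed on B
    return "".join("N" if queues[x] else "0" for x in ("A", "B", "C"))
```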

However, some true non-determinism is still present.  Consider
status~\status{NNN} for instance.  Then, both guarded processes of the
$A$, $B$, $C$ definition can be fired.  The choice of firing
either $P(n)$ or $Q(n)$ will result in either consuming one message
pending on $A$ and one on $B$, or consuming one message pending on $A$
and one on $C$.

Finally, a view of join-matching compilation can be given by taking
both kinds of transitions together. This yields a non-deterministic
automaton.

Note that matching of non-linear patterns can also be compiled using automata.
For instance, if a name appears at most twice in one or more patterns,
then its status will range over $\status{0}$, $\status{1}$ and $\status{N}$.
We do not present this extension in greater detail, for simplicity, and
because we do not implement non-linear patterns.

\subsection{Towards deterministic automata}

For efficiency and simplicity reasons, we choose to implement matching
using deterministic automata that react to message reception.

Fortunately, it is quite possible to do so. It suffices to perform
matching as soon as possible.  More precisely, when a message arrives
and carries the definition status into a matching status, matching is
performed immediately, while the definition status is adjusted to reflect
message consumption.  This results in pruning the status space just
below matching statuses.
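The ``match as soon as possible'' policy can be sketched as follows (a Python illustration under simplifying assumptions, with the hypothetical helper {\tt make\_definition}; this is not actual compiler output): every reception tries the clauses in a fixed compile-time order and fires at once when possible, so the automaton never rests in a matching status.

```python
# Illustrative sketch (not compiler output) of the "match as soon as
# possible" policy: each reception tries the clauses in compile-time order
# and fires at once when possible, so no matching status is ever entered.
def make_definition(clauses):
    """clauses: list of (set of names, guarded process) pairs."""
    queues = {x: [] for pattern, _ in clauses for x in pattern}

    def send(name, value):
        queues[name].append(value)
        for pattern, process in clauses:          # fixed compile-time order
            if all(queues[x] for x in pattern):   # new status is matching
                # consume one message per name; arguments in alphabetical
                # order of names (a simplifying assumption of this sketch)
                process(*[queues[x].pop(0) for x in sorted(pattern)])
                return
    return send, queues
```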


In practice, in the $A$, $B$, $C$ case, we get the
automaton of figure~\ref{abc}.
\begin{figure}[th]
\begin{toimage}
.PS
hu=3 *cm
vu=3 *cm
clab("\status{000}")
{ln(-1,-1,"$B$",-1) ; clab("\status{0N0}")
  tourccw(-9*pi/10,"$B$")
  tourccw(pi/2,"$A$",dotted)
  {alncw(pi/4,-3*pi/4,1,1,"$A$",dotted)}
  {ln(1,-1,"$C$",1) ; clab("\status{0NN}")
    htccw(pi/4,-pi/2-pi/8,"$B, C$")
    {alncw(3*pi/4,-pi/4,-1,1,"$A$",dashed)}
    {aln(pi/4,-3*pi/4,1,1,"$A$",dotted)}
    tourccw(-5*pi/12,"$A$",dotted)
    tourcw(-7*pi/12,"$A$",dashed)
}}
{ln(0,-1,"$A$",0) ; clab("\status{N00}")
  tourcw(5*pi/12,"$A$")
  tourccw(-3*pi/4,"$B$",dotted)
  tourccw(-17*pi/12,"$C$",dashed)
  {alncw(pi/2,-pi/2,0,1,"$B$",dotted)}
  {aln(pi/2,-pi/2,0,1,"$C$",dashed)}
}
{ln(1,-1,"$C$",1) ; clab("\status{00N}") ;
  tourcw(-pi/10,"$C$")
  tourcw(pi/2,"$A$",dashed)
  {aln(3*pi/4,-pi/4,-1,1,"$A$",dashed)}
  {ln(-1,-1,"$B$",-1)}}
.PE
\end{toimage}
\showgraph
\caption{\label{abc} Automaton in the $A$, $B$, $C$ case}
\end{figure}

Observe that all transitions are now labeled, and that a name labels a
transition when message reception on this name triggers that
transition.  Furthermore, matching transitions that correspond to
firing $P(n)$ or firing $Q(n)$ are now represented differently (the
former by a dotted arrow, the latter by a dashed arrow).  This
highlights the difference between false and true non-deterministic
transitions: real non-determinism is present when there are both
dotted and dashed edges with the same label starting from the same
node.

For instance, there are two $B$-labeled dotted transitions starting
from $\status{N00}$.  Non-determinism is only apparent here, since
$P(n)$ is fired in both cases, and the selected transition depends
only on whether there is at least one message left pending on $A$
or not after firing $P(n)$.


By contrast, from status \status{0NN}, the automaton may react to the
arrival of a message on $A$ in two truly different manners, by firing
either $P(n)$ or $Q(n)$.  This is clearly shown in figure~\ref{abc} by
the $A$-labeled edges that start from status~\status{0NN}, some of
them being dashed and the others dotted.  A simple way to avoid
such a non-deterministic choice at run-time is to perform it at
compile-time.  That is, here, we suppress either the dotted or the dashed
$A$-labeled transitions that start from $\status{0NN}$.

In the rest of the paper, we take automata such as the one of
figure~\ref{abc} as suitable abstractions
of join-pattern compilation output.

\subsection{Automata and semantics}

Both the ``match as soon as possible'' behavior and the deletion of
some matching transitions have a price in terms of semantics.
More precisely, some CHAM behaviors now just cannot be observed
anymore.
However,
the CHAM semantics is a non-deterministic one: an initial
configuration of the CHAM {\em may} evolve into a variety of configurations.
Furthermore, there is no fairness constraint of any kind and
no particular event is required to occur.

As an example of the consequences of the ``match as soon as possible''
behavior, consider this definition:
\begin{verbatim}
let A() = print(1);
and B() = print(2);
and A() | B() = print(3);
;;
\end{verbatim}
Then, we get the following automaton:
\begin{toimage}
.PS
clab("\status{00}")
 tourcw(-3*pi/4,"$A$",dotted)
 tourccw(-pi/4,"$B$",dashed)
.PE
\end{toimage}
\showgraph
Status \status{NN}, which is preceded by the two matching statuses
\status{0N} and \status{N0}, cannot be reached. As a consequence, the
above program will never print a {\tt 3}, no matter how many messages
are sent on $A$ and $B$.
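A small Python simulation (illustrative only, not the {\tt join} runtime) reproduces this behavior: since clauses are tried at every reception, the single-name clauses always fire first, status \status{NN} is never reached, and {\tt 3} is never printed.

```python
# Simulation of the definition above under "match as soon as possible".
# Clauses are tried in order on every reception; the single-name clauses
# for A and B always match first, so 3 is never printed (sketch only).
def run(messages):
    printed = []
    queues = {"A": [], "B": []}
    clauses = [({"A"}, 1), ({"B"}, 2), ({"A", "B"}, 3)]
    for name in messages:
        queues[name].append(())
        for pattern, output in clauses:
            if all(queues[x] for x in pattern):
                for x in pattern:
                    queues[x].pop(0)
                printed.append(output)
                break
    return printed
```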

Next, to illustrate the effect of deleting ambiguous matching
transitions, consider the following definition:
\begin{verbatim}
let A() = print(1);
and A() | B() = print(2);
\end{verbatim}
Such a definition will get compiled into one of the following
deterministic automata:
\begin{toimage}
.PS
{clab("\status{00}")
 tourccw(pi/4,"$A$",dotted)
 {ln(0,-1,"$B$",0)
   clab("\status{0N}")
   tourcw(pi/4,"$B$");
   tourcw(-3*pi/4,"$A$",dotted)
 }
}
move right 1.5*hu
{clab("\status{00}")
 tourccw(pi/4,"$A$",dotted)
 {ln(0,-1,"$B$",0)
   clab("\status{0N}")
   tourcw(pi/4,"$B$");
   tourcw(-3*pi/4,"$A$",dashed)
   {alncw(pi/2,-pi/2,0,1,"$A$",dashed)}
 }
}
.PE
\end{toimage}
\showgraph
In the case of the left automaton, only {\tt 1} will ever get printed.
In the case of the right
automaton, {\tt 2} will be printed when a message arrives on~$B$
and then another on~$A$.
Both automata lead to correct implementations of the semantics. However,
the second automaton looks like a better choice than the first one, since it
yields more program behaviors.


\section{Runtime definitions}

\label{implementation}
\subsection{Basics}

Names are the basic values of the join-calculus, and thus any
implementation of the join-calculus must supply a runtime
representation for them. For instance, a name can be sent on some
appropriate channel. At runtime, we must indeed send something.

However, names that are defined together in the same join definition
interact when matching is tested for and performed.  Moreover, by the
very idea behind the join-calculus, matching is the only
synchronization primitive.  In other words, only names that are
defined by the same join definition have some kind of interaction that
is the runtime system's responsibility.


This makes it both possible and desirable to compile a source definition into
a ``runtime definition'', a single vector structure that groups all
the names defined in a given definition.  Names must still exist as
individuals; they can be represented as infix pointers into their
definition (as in {\tt join}), or as a definition pointer and an index
(as in {\tt jocaml}).

Both the {\tt join} and {\tt jocaml} systems implement the automata of
the previous section.  However, they do so in quite different manners.
The former focuses on minimizing runtime testing, while the latter
performs a systematic runtime test of the current status at every
message arrival.

\subsection{Definitions in {\tt join}}

Runtime definitions are vector structures.  Every name defined in a
definition occupies two slots in the vector structure.  The first
entry holds a code pointer that stands for the name itself, while the
second entry holds a pointer to a queue of pending messages, queues
being organized as linked lists.  Runtime definitions include
additional slots that hold the values of the variables that are free
in guarded processes. This technique closely resembles the one used by
the SML/NJ compiler~\cite{Appel-compiling} to represent mutually
recursive functions.  Message sending on name~$x$ is performed by
stacking message values and then calling the code for name~$x$.  This
code is retrieved by dereferencing twice the infix pointer that
represents~$x$ at runtime.


However, there is a big difference between mutually recursive
functions and join definitions.  The code for name~$x$ is automaton
code that reacts to the arrival of a new message on that name.  The
compiler issues various versions of name code, one per possible status
of the definition that defines~$x$.  Typically, name code either saves
a message into the queue for~$x$ (in the non-matching case), or
retrieves messages from other queues before firing a guarded process
(in the matching case). In all cases, definition status may then
need an update, which is performed by updating all code entries in
the definition.
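For the two-name definition {\tt let A(v) | B(w) = P(v,w)}, this technique can be sketched in Python as follows (hypothetical class and method names; the actual {\tt join} runtime stores code pointers in the definition vector): each name entry holds the automaton code for the current status, and every transition re-points the entries.

```python
# Sketch of per-status name code for "let A(v) | B(w) = P(v, w)":
# one code version per (name, status) pair; firing or queuing a message
# swaps in the code entries of the new status.
class ABDefinition:
    def __init__(self, fire):
        self.fire = fire                       # the guarded process P
        self.qa, self.qb = [], []              # pending-message queues
        self.send_A, self.send_B = self._a_00, self._b_00  # status 00

    def _a_00(self, v):                        # 00, message on A -> N0
        self.qa.append(v)
        self.send_A, self.send_B = self._a_n0, self._b_n0

    def _b_00(self, w):                        # 00, message on B -> 0N
        self.qb.append(w)
        self.send_A, self.send_B = self._a_0n, self._b_0n

    def _a_n0(self, v):                        # N0, another message on A
        self.qa.append(v)

    def _b_n0(self, w):                        # N0, message on B: matching!
        v = self.qa.pop(0)
        if not self.qa:                        # runtime test after matching
            self.send_A, self.send_B = self._a_00, self._b_00
        self.fire(v, w)

    def _a_0n(self, v):                        # 0N, message on A: matching!
        w = self.qb.pop(0)
        if not self.qb:
            self.send_A, self.send_B = self._a_00, self._b_00
        self.fire(v, w)

    def _b_0n(self, w):                        # 0N, another message on B
        self.qb.append(w)
```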


\subsection{Definitions in {\tt jocaml}}

In the {\tt jocaml} system, a name is a pointer to a definition plus
an index.  Definitions are still vector structures, but there is only
one entry per name for message queues. Additionally, definitions hold
guarded closures (i.e.\ guarded process code plus free variable
values), a status field and a matching data structure.

The status field holds the current status of the definition as a
bit-field.  Each name's status is encoded by one bit, using value 1 for
\status{N} and 0 for \status{0}; the bit position is given by the name's
index.
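For instance, with names $A$, $B$ and $C$ at indexes 0, 1 and 2, the bit-field operations look as follows (a Python sketch of the encoding, not actual {\tt jocaml} code):

```python
# Sketch of the jocaml status encoding for the A, B, C definition:
# one bit per name, bit position = name index, bit set means status N.
INDEX = {"A": 0, "B": 1, "C": 2}

def set_pending(status, name):      # a first message arrives on `name`
    return status | (1 << INDEX[name])

def clear_pending(status, name):    # the queue of `name` becomes empty
    return status & ~(1 << INDEX[name])

def has_pending(status, name):      # is the status of `name` equal to N?
    return bool(status & (1 << INDEX[name]))
```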
%  The matching data structure is an array of maps from statuses to
%  clause indexes.

Message sending is performed by calling a generic C~function from the
``join'' library, taking a message value, a definition pointer and a
name index as arguments.  When a message is received on name~$x$, the
bit for~$x$ is checked in the current status bit-field. If the bit is
set, some messages on name~$x$ are already present, so the definition
status does not change. Since the current status before message
sending is guaranteed to be a non-matching one, the message is queued
and the function exits.

Otherwise, the current definition status is searched for in the matching
structure for~$x$. This matching structure is an array of (pattern
encoding, guarded process index) pairs. Pattern encodings are
bit-fields, just like status encodings; each pattern encoding
corresponds to a join-pattern containing name~$x$, from which name~$x$
has been removed. Using a sequential
search by a bitwise ``and'' with each pattern encoding, the current
status can be identified as matching or non-matching in at most $N_x$
tests, where $N_x$ is the number of clauses whose pattern contains~$x$.
778
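A minimal C sketch of this sequential search, with hypothetical structure
and function names of our own choosing:

```c
#include <stddef.h>

/* Sketch of the matching structure for a name x (identifiers ours):
   each entry pairs the encoding of a pattern containing x, with x's
   own bit removed, and the index of the guarded process to fire. */
struct match_entry {
    unsigned long pattern;  /* required bits, excluding x's bit */
    int clause;             /* guarded process index */
};

/* Sequential search: at most as many bitwise tests as there are
   clauses whose pattern contains x.  Returns the clause index, or -1
   when the status is non-matching. */
static int find_match(unsigned long status,
                      const struct match_entry *entries, size_t n) {
    for (size_t i = 0; i < n; i++)
        if ((status & entries[i].pattern) == entries[i].pattern)
            return entries[i].clause;
    return -1;
}

/* Example table for a message on a name joined with two others:
   a pending message on name 1 selects clause 0, on name 2 clause 1. */
static const struct match_entry example_entries[2] = {
    { 1UL << 1, 0 },
    { 1UL << 2, 1 },
};
```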
If no match is found, the automaton state is updated and the message
value is put in the queue for $x$.  Otherwise, a guarded process
index has been found, and is used to retrieve the associated guarded
closure.  Arguments to the guarded process are extracted from the
queues identified by the matching status found.  The status is updated at
the same time (when a queue becomes empty, a bit is erased).  Finally,
the guarded process is fired.

Therefore, the performance of this technique relies heavily on fast
comparisons and modifications of definition statuses.  The best result
is achieved when statuses are encoded by machine integers.  In that
case, the number of names that a definition can define is limited by
the integer size of the hosting Objective Caml system (which
typically is 31 or 63~bits).  If this is not considered enough, then
statuses have to be encoded using several integers or one string. Both
kinds of status encodings can be mixed, using integers for small
definitions and strings for larger ones. However, in the current {\tt
  jocaml} system, we use a single integer to hold the status, and a
technique (described in section~\ref{fusion}) is used to associate
several channels with the same bit in the status bit-field.

\section{The pragmatics of compilation}\label{opt}
\label{optimization}

This section is dedicated to optimizations that are pertinent primarily
to the {\tt join} technique and that are performed by the
current version of the {\tt join} compiler.

We first introduce optimizations that improve the runtime behavior of
programs, both in speed and in dynamic memory usage. Then, we show how to
reduce emitted code size.
We focus on optimizing definitions written in
object-oriented style, as described in the tutorial~\cite{Join}. As
this programming style has proved quite frequent,
it is natural for us, as compiler writers, to concentrate
our efforts on such definitions.

In this style, a definition is an object. Object state is encoded by
asynchronous {\em state names}, while synchronous {\em methods} access
or modify object state.
For instance, given one state name $S$ and $n$ methods
$m_1$, $m_2$,\ldots $m_n$ taken in that order, we get:
\begin{verbatim}
let create(x_0) =
  let S(x) | m_1() = P_1(x)
  and S(x) | m_2() = P_2(x)
    ....
  and S(x) | m_n() = P_n(x) in
  S(x_0) | reply m_1,m_2,...,m_n to create
;;
\end{verbatim}
\noindent The synchronous call \verb+create(v)+ creates a new
object (i.e. a new $S$, $m_1$, $m_2$,\ldots $m_n$ definition)
and then sends back an $n$-tuple of its methods. Moreover, this call
initializes the object state with the value~$v$.


\subsection{Refined status}
As a working example of an object-style definition, consider the
following adder:
\begin{verbatim}
let create(x_0) =
  let S(x) | get() = S(x) | reply x to get
  and S(x) | add(y) = S(x+y) | reply to add in
  S(x_0) | reply get,add to create
;;
\end{verbatim}

The adder has one state name $S$ and two methods, $get$ and $add$.  We
then try to figure out some ``normal'' runtime behavior for it.  As
the initial \verb+S(x_0)+ is forked as soon as the adder definition
has been created, a highly likely initial situation is that there is
one message pending on $S$ and none on the other names.  Later on,
as some external agent invokes the $get$ or $add$ method, the message
pending on $S$ is consumed and the appropriate guarded process is
fired.  Either process quickly sends a message back on $S$.  Thus, a likely
behavior is for the queue of $S$ to alternate between being empty and
holding one element, the queue being empty for short periods.  By some
aspects of the compilation of ``\verb+|+'' and of our scheduling
policy that we will not examine here, this behavior is almost certain.
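Outside the join-calculus, this life cycle is reminiscent of a monitor:
holding the message on $S$ plays the role of holding a lock on the object
state. A loose C analogy of the adder (ours, not compiler output), using
POSIX threads:

```c
#include <pthread.h>

/* Loose analogy (ours, not compiler output): the adder behaves like a
   monitor.  Holding the mutex corresponds to having consumed the
   message on S; releasing it corresponds to sending S(x) again. */
struct adder {
    pthread_mutex_t lock;  /* plays the role of the S message */
    int x;                 /* the value carried by S */
};

static void adder_init(struct adder *a, int x0) {
    pthread_mutex_init(&a->lock, NULL);
    a->x = x0;                       /* the initial S(x_0) */
}

static int adder_get(struct adder *a) {
    pthread_mutex_lock(&a->lock);    /* consume S(x) */
    int v = a->x;
    pthread_mutex_unlock(&a->lock);  /* send S(x) back */
    return v;                        /* reply x to get */
}

static void adder_add(struct adder *a, int y) {
    pthread_mutex_lock(&a->lock);    /* consume S(x) */
    a->x += y;
    pthread_mutex_unlock(&a->lock);  /* send S(x+y), reply to add */
}
```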

As a matter of fact, this ``normal'' life cycle involves a blatant
waste of memory: queue elements (cons cells) are allocated and
deallocated in the general dynamic fashion, while the runtime usage of
these cells would allow a more efficient policy.  It is more clever
not to allocate a cell for the only message pending on $S$ and to use
the queue entry attributed to $S$~in the runtime definition as a
placeholder.  On the status side, this new situation is rendered by a
new ``\status{1}'' status.  Hence, $S$ now possesses a three-valued
status: \status{0} (no message), \status{1} (one message in the queue
slot) or \status{N} (some messages organized in a linked list).  Thus,
assuming, for the time being, that there may be an arbitrary number of
messages pending on $S$, the adder compiles into the automaton of
figure~\ref{adder} (adder names are taken in the order $S$, $get$,
$add$). This new automaton can be seen as an evolution of the $A$,
$B$, $C$ automaton of figure~\ref{abc}, with a slight change in
channel names.
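This three-valued queue policy can be sketched in C as follows; all names
are ours, message payloads are simplified to integers, and retrieval order
in the \status{N} case is simplified:

```c
#include <stdlib.h>

/* Sketch of the 0/1/N queue policy for a state name such as S (all
   names ours).  The single slot inside the definition stores the lone
   pending message; cons cells are allocated only from the second
   message on. */
struct cell { int msg; struct cell *next; };

struct slot {
    int status;        /* 0 = empty, 1 = message in `single', 2 = "N" */
    int single;        /* the message when status == 1 */
    struct cell *list; /* the messages when status == 2 */
};

/* Send a message: the frequent 0 -> 1 transition allocates nothing. */
static void slot_send(struct slot *s, int msg) {
    if (s->status == 0) {
        s->single = msg;
        s->status = 1;
    } else {
        struct cell *c = malloc(sizeof *c);
        c->msg = msg;
        c->next = s->list;
        s->list = c;
        if (s->status == 1) {   /* 1 -> N: move the single message too */
            struct cell *d = malloc(sizeof *d);
            d->msg = s->single;
            d->next = s->list;
            s->list = d;
            s->status = 2;
        }
    }
}

/* Consume one message; the 1 -> 0 transition needs no queue test. */
static int slot_consume(struct slot *s) {
    if (s->status == 1) {
        s->status = 0;
        return s->single;
    }
    struct cell *c = s->list;
    int msg = c->msg;
    s->list = c->next;
    free(c);
    if (s->list == NULL) s->status = 0;  /* runtime test in the N case */
    return msg;
}
```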

\begin{figure}[th]
\begin{toimage}
.PS
[
hu=3 *cm
vu=3.5 *cm
clab("\status{000}")
{ln(-1,-1,"$get$",-1) ; clab("\status{0N0}")
  tourccw(-9*pi/10,"$get$")
  tourccw(pi/2,"$S$",dotted)
  {alncw(pi/4,-3*pi/4,1,1,"$S$",dotted)}
  {ln(1,-1,"$add$",1) ; clab("\status{0NN}")
    {alncw(3*pi/4,-pi/4,-1,1,"$S$",dashed)}
    {aln(pi/4,-3*pi/4,1,1,"$S$",dotted)}
    htccw(pi/3,-pi/2-pi/6,"$\begin{array}{l} get\\ add\end{array}$")
    tourccw(-5*pi/12,"$S$",dotted)
    tourcw(-7*pi/12,"$S$",dashed)
}}
{
  ln(0,-.67,"$S$",0) ; clab("\status{100}")
  {alncw(pi/2,-pi/2-pi/12,0,0.67,"$get$",dotted)}
  {aln(pi/2,-pi/2+pi/12,0,0.67,"$add$",dashed)}
  ln(0,-.67,"$S$",0) ; clab("\status{N00}")
  tourcw(5*pi/12,"$add$",dashed)
  tourccw(-3*pi/4,"$S$")
  tourccw(-17*pi/12,"$get$",dotted)
  {alncw(pi/2,-pi/2-pi/6,0,1.33,"$get$",dotted)}
  {aln(pi/2,-pi/2+pi/6,0,1.33,"$add$",dashed)}
}
{ln(1,-1,"$add$",1) ; clab("\status{00N}") ;
  tourcw(-pi/10,"$add$")
  tourcw(pi/2,"$S$",dashed)
  {aln(3*pi/4,-pi/4,-1,1,"$S$",dashed)}
  {ln(-1,-1,"$get$",-1)}}
]
.PE
\end{toimage}
\showgraph
\caption{\label{adder} Full automaton for the adder}
\end{figure}

Using the status~\status{1} not only spares memory, it also avoids
some of the runtime tests that compute the post-matching status.
Basically, when a matching consumes the sole message pending on a name
with status~\status{1}, the automaton already knows that this
name queue is empty.  For instance, when the automaton of
figure~\ref{adder} is in the \status{100} status and a message
arrives on either one of the two methods, the appropriate process
is fired and the status goes back to \status{000} without any runtime
test.  By contrast, when the automaton is in the \status{00N} status
and a message arrives on $S$, the second guarded process is fired
immediately, but a test on the $add$ queue is then performed: if this
queue is now empty then the status goes back to \status{000}, otherwise
the status remains \status{00N}.  Receiving a message on $S$ when the status
is \status{0NN} is a bit more complicated. First, the automaton
chooses to consume a message pending on either one of the two methods
and to fire the appropriate process (figure~\ref{adder} does not
specify this choice).  Then, the queue of the selected method has to
be tested, in order to determine the post-matching status.

Status \status{1} is easy to implement using the {\tt join}
compilation technique. The compiler issues different method codes for
\status{100} and \status{N00}, and different codes can find
the $S$~argument at different places.  Implementing status \status{1} in
{\tt jocaml} would be trickier, since the encoding of states using
bit-patterns would be far less straightforward than with
\status{0}/\status{N} statuses only.  As a consequence, the {\tt
  jocaml} compiler does not perform the space optimization described
in this section.

\subsection{Taking advantage of semantical analysis}

The automaton of figure~\ref{adder} has a \status{N00} status, to
reflect the case when there are two messages or more pending on $S$.
However, one quite easily sees that status \status{N00} is
useless.  First, as $S$~does not escape from the scope of its
definition, message sending on~$S$ is performed at three places only:
once initially (by \verb+S(x_0)+) and once in each guarded process.
Thus, there is one message pending on $S$ initially.  A single message
pending on~$S$ is consumed by any match, and the process fired on that
occasion is the only one to send one message on $S$. Therefore, there
cannot be two messages or more pending on $S$. As a consequence, the
full automaton can be simplified by suppressing the \status{N00} node
and every edge that starts from it or leads to it.

In particular, there is no longer any $S$-labeled edge starting from node
\status{100}. In the {\tt join} implementation this means that
the code entry for $S$ need not be updated when going from status
\status{000} to \status{100}. This entry is simply left as it is.
Symmetrically, the code entry for $S$ will not have to be restored when
the status goes back to \status{000} after matching.

Another important usage of semantical analysis is determining which names
are state names.  For a given definition, the output of the
analyzer is a status set~${\cal S}$, which is a safe approximation of
the actual runtime statuses of that definition.  State names are the
asynchronous names such that all statuses in ${\cal S}$ give them
status \status{0}~or~\status{1}.

The current {\tt join} compiler includes a rudimentary
name usage analyzer, which suffices for object definitions given in
the style of
the $S$, $m_1$, $m_2$, \ldots, $m_n$ definitions above, where all state
variables are asynchronous and do not escape from the scope of their
definition.

A promising alternative would be to design an ad hoc syntax for
distributed objects, or, more ambitiously, a full
object-oriented join-calculus. Then, the state variables of
object definitions would be apparent directly from user programs.

\subsection{Avoiding status space explosion}\label{space}

Consider any definition that defines $n$ names.  Ignoring
\status{1}~statuses, the size of the status space of a given
definition is already $2^n$.  The size of the non-matching status
space is thus bounded by $2^n$.  As a full automaton for this
definition has one state per non-matching status, status space
explosion would be a real nuisance in the case of the {\tt join}
compiler.  In particular, there are $n$~times as many automaton code
entries to create as there are non-matching statuses.

Unfortunately, the exponential upper bound is reached by practical
programs, as demonstrated by the general object-oriented definition
given at the beginning of section~\ref{opt}.  In that case, all
definition statuses such that $S$ has the \status{0} status are
non-matching.  In such a situation, using runtime testing, as {\tt
  jocaml} does, is not that much of a penalty, when compared to code
size explosion.

We thus introduce dynamic behavior in the automata.  We do so on a
per-name basis: the status of state names will be encoded by
automata states as before, whereas method statuses will now be
explicitly checked at runtime.  Thus, we introduce ``\status{?}'', a
new status, which means that nothing is known about the number of
messages pending on a name.  Additionally, we state that all methods
will have the \status{?} status as soon as there is one message or
more pending on any of the methods.
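For the adder, this per-name policy can be sketched in C as follows; the
structure and the fixed clause numbering are our own illustration, not
{\tt jocaml} code:

```c
/* Sketch of the per-name "?" policy for the adder (structure and
   clause numbering are ours).  S keeps an exact status, while the two
   methods share one summary flag `maybe'; their queues are inspected
   only when that flag is set. */
enum s_status { S_EMPTY, S_ONE };

struct adder_state {
    enum s_status s;       /* exact status of S */
    int maybe;             /* 1 iff some method may have pending messages */
    int get_len, add_len;  /* method queue lengths, read only when maybe */
};

/* Arrival of a message on S: returns the clause to fire (0 = get
   clause, 1 = add clause) or -1 when the S message must be queued. */
static int on_S(struct adder_state *a) {
    if (!a->maybe) {           /* status 000: queue S, go to 100 */
        a->s = S_ONE;
        return -1;
    }
    int clause;
    if (a->get_len > 0) {      /* first series of tests: pick a clause */
        a->get_len--;
        clause = 0;
    } else {
        a->add_len--;
        clause = 1;
    }
    /* second series: post-matching test of the "?" queues */
    a->maybe = (a->get_len > 0 || a->add_len > 0);
    return clause;
}
```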

This technique can be seen as merging some states of
the full automaton (the one compiled by considering complete status
information) into new states with \status{?} statuses in them.

For instance, in the adder example, we get the automaton of
figure~\ref{adder2}, where the three statuses \status{0N0},
\status{0NN} and \status{00N} of figure~\ref{adder} merge into the new
status~\status{0??}.  (Note that we also take advantage of name usage
analysis to delete status \status{N00}.)

\begin{figure}[th]
\begin{toimage}
.PS
clab("\status{000}")

{{iln(-.4,-.4) ; "$\begin{array}{l}get\\add\\[.5em]\end{array}$"}
 ln(-1,-1,"",0) ; clab("\status{0??}")
 tourccw(pi,"$\begin{array}{l}get\\add\\[.5em]\end{array}$")
 tourcw(pi/4-pi/12,"$S$",dashed)
 tourccw(pi/4+pi/12,"$S$",dotted)
 {alncw(pi/4,-3*pi/4-pi/12,1,1,"$S$",dotted)}
 {aln(pi/4,-3*pi/4+pi/12,1,1,"$S$",dashed)}
}

{ln(1,-1,"$S$",0)
   clab("\status{100}")
   {aln(3*pi/4,-pi/4+pi/12,-1,1,"$add$",dashed)}
   {alncw(3*pi/4,-pi/4-pi/12,-1,1,"$get$",dotted)}
}
.PE
\end{toimage}
\showgraph
\caption{\label{adder2} Final automaton for the adder}
\end{figure}

Information on where runtime testing has to be performed can be
inferred from the diagram of figure~\ref{adder2}.  For instance,
assume that the current status is \status{0??} and that a message arrives
on~$S$.  Since there is at least one message pending on a method, a
matching will occur.  Tests are needed, though, before matching to
determine the matching clause, and after matching to determine the
post-matching status.  Abstractly, the first series of tests changes
the \status{?} statuses into either \status{0} or \status{N} statuses,
while the second series checks whether there are still messages pending on
names with \status{?} status.  We are still investigating how to
organize these tests efficiently without producing too much code
(see~\cite{Augustsson85,Maranget92} for a discussion of the size of
such code in the context of compiling ML~pattern-matching).

By contrast, when the status is \status{100} and a message arrives on
$get$ or $add$, the corresponding matching is known immediately
and the message pending on $S$ is consumed. Then, the queue for $S$ is
known to be empty and the status can be restored to \status{000} with no
runtime testing at all.  As the message arrival order is likely to be
first one message on $S$ and then one message on $get$ or $add$, the
final automaton of figure~\ref{adder2} responds efficiently to the most
frequent case, while still being able to respond to less frequent cases
(for instance, two messages on methods may arrive in a row).  Furthermore,
when trouble is over, the automaton has status \status{000} and is
thus ready for the normal case. In this example, a penalty in code
size is paid for improving code speed in the frequent, ``normal''
case, whereas this penalty is avoided in non-frequent cases, which are
treated by less efficient code.

We introduced the \status{?} status on a per-name basis.  However,
other choices are possible: {\em a priori}, there are many ways
to merge full automata states into final automata states. However, if
one really wants to avoid status space explosion, the final automaton
should be constructed directly, without first constructing the full
automaton.  Adopting our per-name \status{?} status makes this direct
construction possible. Additionally, the \status{?} status can be used
by the simple static analyzer as a status for the names it cannot
trace (e.g. names that escape their definition scope).  This
dramatically decreases the size of the analyzer output and its running
time.

\section{Optimizing further}
\label{fusion}

We describe here a simple transformation on
join definitions, which does not rely on a full semantical analysis
(such as name usage analysis),
but only on a simple, local, syntactical analysis of join-patterns.

Let us take a simple example:
\begin{verbatim}
let S(x) | m_1(y) = P_1(x,y)
and S(x) | m_2(y) = P_2(x,y)
  ....
and S(x) | m_n(y) = P_n(x,y)
;;
\end{verbatim}

In this example, a match occurs only when there are messages pending
both on $S$ and on one of the methods $m_1$, $m_2$,\ldots Thus, from the
synchronization
point of view, all the methods are equivalent. And indeed, we can
regroup them into one single method channel $m$ by
transforming this join definition into:

\begin{verbatim}
let S(x) | m(p,y) = P[p] (x,y);;
let m_1(y) = m(1,y);;
let m_2(y) = m(2,y);;
 ....
let m_n(y) = m(n,y);;
\end{verbatim}
where $P$ is the vector of processes $[P_1,P_2,...,P_n]$.

Methods $m_1$, $m_2$,\ldots are now simple wrappers.  Method $m_i$ now
calls $m$ with an additional argument~$i$, which basically is the
index of $P_i$ in the array $P$.  At this point, we must emphasize that
we describe this technique as a source-to-source transformation only for
clarity. The produced source code may not be correct with
respect to the join type system when the types of the methods are
different.  In practice, this optimization is implemented using ad hoc
mechanisms, which both improves efficiency and solves the typing
problem.
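The effect of the transformation can be sketched in C with a dispatch
table; the names and the two toy guarded processes below are ours, for
illustration only:

```c
/* Sketch of the grouped-method transformation (all names ours): the
   guarded processes become a dispatch table P, and each wrapper m_i
   sends its index i along with the message on the single channel m. */
typedef int (*process)(int x, int y);

/* Two toy guarded processes standing for P_1 and P_2. */
static int p1(int x, int y) { return x + y; }
static int p2(int x, int y) { return x * y; }

static process P[] = { p1, p2 };   /* the vector [P_1, P_2] */

/* The single clause S(x) | m(p, y) = P[p](x, y); indices are 1-based
   as in the text. */
static int on_m(int x, int p, int y) {
    return P[p - 1](x, y);
}

/* The wrappers m_i(y) = m(i, y); the state x is passed explicitly
   here since this sketch has no real channel S. */
static int m_1(int x, int y) { return on_m(x, 1, y); }
static int m_2(int x, int y) { return on_m(x, 2, y); }
```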

Currently, this optimization is performed by the {\tt jocaml}
compiler.  This leads to a new interpretation of name indexes by the
``join'' library.  The least significant bits in name indexes still
encode names that actually take part in synchronization (here $S$ and
$m$), while their most significant bits (which were previously unused)
now encode the extra argument~$i$.  This yields two benefits. First,
the number of status checks decreases, as the number of matching
statuses decreases.  Second, the number of channels that can be
defined in one definition can now exceed the hosting system integer
size, provided some names can be grouped together for synchronization.
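This refined name index can be sketched in C as simple bit-packing; the
field width below is our choice, for illustration:

```c
/* Sketch of the refined name index (field width is our choice): the
   low bits name a real synchronizing channel, the high bits carry the
   wrapper index i that selects the guarded process. */
#define SYNC_BITS 4   /* room for up to 16 synchronizing names */

static int make_index(int sync_name, int wrapper) {
    return (wrapper << SYNC_BITS) | sync_name;
}

static int sync_part(int index)    { return index & ((1 << SYNC_BITS) - 1); }
static int wrapper_part(int index) { return index >> SYNC_BITS; }
```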

In the {\tt join} compiler, this technique might be used to reduce
automata size, since it lowers the number of non-matching statuses by
reducing the number of synchronizing names.  Code entries for methods
$m_1$, $m_2$,\ldots would still be contained in the definition
structure; they would only push the index of the process to fire, and
then call the code for method $m$. Moreover, they would not need to be
updated after each transition of the automaton.

Finally, this technique can also be applied to more complex
synchronizations.  Given a definition that defines names $x_1$, $x_2$,
\ldots, $x_n$, using patterns $J_1$, $J_2$, \ldots $J_m$, we say that
two names are equivalent when swapping them in the patterns yields
the same set of patterns.  We then replace equivalent names by a
single name plus an index.

Consider the following definition:
\begin{verbatim}
let S_1(x) | m_1(y) = P_1(x,y)
and S_1(x) | m_2(y) = P_2(x,y)
and S_2(x) | m_1(y) = P_3(x,y)
and S_2(x) | m_2(y) = P_4(x,y)
;;
\end{verbatim}
The set of defined names $\{S_1, S_2, m_1, m_2\}$ can be
partitioned into $\{S_1, S_2\}$ and $\{m_1, m_2\}$.
Then, the above program can be transformed into:
\begin{verbatim}
let S(p,x) | m(q,y) = P[p,q] (x,y);;
let m_1(y) = m(1,y);;
let m_2(y) = m(2,y);;
let S_1(y) = S(1,y);;
let S_2(y) = S(2,y);;
\end{verbatim}
(with $P[1,1] = P_1$, $P[1,2] = P_2$, $P[2,1] = P_3$ and $P[2,2] = P_4$)


\section{Conclusion and future work}\cutname{conclusion.html}
\label{conclusion}

In the join-calculus, a name definition, all receptors on that name and
all possible synchronizations on that name are grouped together in a
single join definition. This enables the compilation of synchronization
between concurrent or even distributed processes, using finite-state
automata.
In the distributed case, a message transport phase to the
machine that currently hosts the join definition (and hence the
automaton) is first performed.  This strengthens our point of view
that the join-calculus is the core of a distributed programming
language that can be compiled in practice, mainly because it restricts
reception on a channel to statically known parts of the program.  The
same argument applied to \`a~la~ML polymorphic typing
in~\cite{join-typing-97}.

\begin{cutflow}{Benchmarks}\cutname{benchmarks.html}
\aname{bench}{}
\begin{table}[ht]
\begin{center}
\begin{tabular}{l|ccccc}
& \texttt{fib} &
\texttt{afib} &
\texttt{pat} &
\texttt{qsort} &
\texttt{count}\\ \hline
\texttt{join} &  32.0 & 14.5 & 37.2 & 9.9 & 16.4 \\
\texttt{jocaml} & 5.7 &  3.5 & 5.4 & 1.4 & 4.2 \\
\texttt{Bologna} & 11.9 & 6.2 &  9.4 & 16.8 & 5.3 \\
\end{tabular}
\end{center}
\caption{\label{perf}Some performance measures (wall-clock time, in seconds)}
\end{table}
Benchmark sources are available on the \footahref{http://join.inria.fr/speed/}{web}.
\end{cutflow}
Taking a few benchmarks (see table~\ref{perf}, or \ahrefloc{bench}{here})
as a set of sensible join programs, both the
{\tt join} and the {\tt jocaml} pattern matching compilation schemes
prove adequate.  In particular, neither scheme falls into the
pitfall associated with its compilation technique.

In the {\tt join} case, one might fear code size explosion; the technique
exposed in section~\ref{space} successfully avoids it
in practical cases.  The {\tt jocaml} technique appears expensive in
runtime checks and thus {\em a priori} produces slow code.  We chose
such a scheme of implementing automata using generic code because it
can be implemented simply by adding code to the Objective Caml
bytecode interpreter.  Using bytecode specialized for automata
manipulation would have implied more substantial modifications of the
Objective Caml bytecode interpreter. Moreover, the {\tt jocaml} system
runs faster than the {\tt join} system, even for pure join programs,
showing that its weaker compilation of join definitions is more than
compensated for by its complete integration into the Objective Caml
system.

Comparison with the Bologna implementation~\cite{Bologna} of the
join-calculus is also instructive.  This system also produces
bytecode, which is interpreted by a C~program.  It proves faster than
{\tt join} and slower than {\tt jocaml} on most of the examples.
Taking a glance at the Bologna source code reveals that it uses a
scheme very similar to that of {\tt jocaml}, with two slight
differences.  First, status is systematically encoded as an array of
integers.  Second, when a message arrives on a name $x$ with an empty
queue, all patterns are tested (whereas in {\tt jocaml} only the
patterns that contain $x$ are tested).

Performance of a given join system depends on many factors. In
particular, scheduling policy and message queue management have a
dramatic impact on it. Furthermore, a policy that gives good results
on one benchmark can be defeated by another.  For these reasons, we
cannot tell which compilation technique is best by comparing
different implementations.

Clearly, we now need to integrate all our compilation techniques in
the same system, in order to compare them more thoroughly and to
experiment further.  However, the definition of reactivity status and
the automata of section~\ref{automatas} provide a sound
basis for these future investigations.  Apart from future language
development and fully implementing the failure semantics of the
join-calculus, we also plan to investigate further the implementation
of threads, attempting to minimize thread suspension and creation.

\bibliography{bib}
\bibliographystyle{abbrv}

\end{document}

%\subsection{Playing with the runtime scheduler}
%
%Concentrating on the {\it object style} of programming, lots of
%methods do not modify the automaton status, but only read or modify
%state names (such as {\tt set} and {\tt get}), with only little
%non-blocking computations. Most of the time, these methods are never
%interrupted, and thus behave as in critical sections from the
%threads scheduler point of view.
%
%As a consequence, when such a method is called, an optimization would
%be not to modify the automaton status at all, since the method will
%end by giving back its original status to the automaton. To prevent
%the automaton to receive other messages from other methods while in an
%incoherent state, these methods are executed in a critical section.
%
%This optimization relies on a simple analysis of methods, which can
%also be used to decrease the number of forked threads, since such
%methods do not need to be executed in a forked thread.