
\documentclass{article}
\usepackage{fullpage}
\usepackage{hevea}

\addtolength{\topmargin}{-1cm}
\input{pat.def}


\def\showgraph{\par\medskip\centerline{\begin{toimage}
\box\graph
\end{toimage}
\imageflush}\medskip}
\def\status#1{{\tt #1}}

\title{Compiling Join-Patterns
\footnote{
This work is partly supported by the ESPRIT
CONFER-2 WG-21836}}
\author {Luc Maranget \and Fabrice
Le~Fessant\\[.5em]
INRIA Rocquencourt,
BP 105, 78153 Le Chesnay Cedex France.
\\[.5em]
{\tt\small \{Luc.Maranget, Fabrice.Le-Fessant\}@inria.fr}
}

\date {}


\htmlfoot{\rule{}{}\begin{center}DVI version \ahrefurl{pat.dvi}\end{center}}
\htmlhead{\begin{center}DVI version \ahrefurl{pat.dvi}\end{center}\rule{}{}}
\toplinks{suite.html}{index.html}{env.html}
\begin{document}


\maketitle
\thispagestyle{empty}

\begin{abstract}

The join-calculus is both a name-passing calculus and a core
language for concurrent and distributed programming. An essential
part of its implementation is the compilation of {\em
join-patterns}. Join-patterns define new channels and, at the same
time, all the synchronizations these channels take part in. Relying on the
experience gained with our two implementations, we study the
translation of join-patterns into deterministic finite-state
automata, as well as some related optimizations.

\end{abstract}
\tableofcontents

\section {Introduction}

%% Intro spiel
The join-pattern is the distinctive feature of the join-calculus, seen
both as a process calculus and as a programming language. On the
calculus side, the join-calculus can roughly be seen as a functional
calculus plus join-patterns, thus achieving the same expressive power
as previous name-passing process calculi~\cite{Milner-pi-calculus}.
Join-definitions are made of several clauses, each clause being a pair
of a join-pattern and of a guarded process.
A join-pattern expresses
a synchronization between several {\em names} (or channels). When
messages are pending on all the names that appear in a given
join-pattern, then the corresponding clause is said to be active and
its guarded process {\em may} be fired. A definition whose
join-patterns share some names expresses sophisticated
synchronizations. In such a definition, a message on a name that
appears in several active clauses is consumed as soon as one of the
corresponding guarded processes is fired.

Join-languages are built on top of the join-calculus taken as a core
language. Therefore, names are first-class citizens, computations are
first abstracted as collections of asynchronous processes, and
join-patterns provide a unique, clear and powerful mechanism for
synchronizing these computations. The documentation for the
join-language~\cite{Join} includes a tutorial that shows how join
definitions may encode classical synchronization constructs such as
locks, barriers, shared counters,\ldots

%%% Conclusion of the intro
On the implementation side, join-patterns are meant to be heavily used
by programmers, as the only synchronization primitive available.
Thus, their compilation requires much care. At the moment, we propose
two compilers: the {\tt join} compiler~\cite{Join}, for a language of its
own, and the {\tt jocaml} compiler~\cite{Jocaml}, for an extension of the
Objective Caml functional language.

%%% Outline
Section~\ref{join-calculus} of this paper succinctly presents the
join-calculus syntax and semantics. Then, section~\ref{automatas}
introduces the kind of automata we use to compile
join-synchronization, while section~\ref{implementation} presents two
techniques for implementing them. The first technique directly
derives from the automata description and is used in our {\tt join}
compiler.
The second technique performs some extra runtime tests; it is
the technique used in our {\tt jocaml} compiler.
Sections~\ref{optimization} and~\ref{fusion} discuss
optimizations and section~\ref{conclusion} concludes.




\section{A rapid tour of the join-calculus}
\label{join-calculus}
\subsection {Syntax}

We here rephrase the traditional presentation of the core
join-calculus~\cite{FournetGonthier96}, where names are
the only values. Thus, we ignore the issue of system primitives and
constants, since
names provide sufficient expressive power for our purpose of
describing our implementation of pattern matching (however, we use
primitives and constants in our examples).
We slightly change the syntax of~\cite{FournetGonthier96}, in order to
match the one of
the {\tt join} programming language.

We use $x$ to denote a name in general.
$$
\begin{array}{l@{\hspace{2cm}}l}
\begin{array}{rcl}
P & ::= & x\tuple{x_i\inu p}\\
  & \mid & \Def D in P\\
  & \mid & P | P
\end{array}
&
\begin{array}{rcl}
D & ::= & J |> P \\
  & \mid & D \And D\\[1em]

J & ::= & x\tuple {x_i\inu p}\\
  & \mid & J | J
\end{array}
\end{array}
$$

A process $P$ is either a message, a defining process, or a parallel
composition of processes (note that names are polyadic, meaning that
messages may be made of several values);
%
a definition $D$ consists of one or several clauses $J |> P$ that
associate a guarded process $P$ to a specific message pattern $J$;
%
a join-pattern $J$ consists of one or several messages in parallel. We
say that the pattern $J = \ldots x\tuple {x_i \inu p}\ldots$
defines the name $x$ and that a definition defines the set of the names
defined by its patterns.
Moreover, patterns are linear, i.e. names may appear at most once in
a given pattern.
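For readers who prefer code, the grammar above transcribes directly into an
Objective Caml datatype. This is an illustrative sketch only; the constructor
names are ours and are not taken from either of our compilers.

```ocaml
(* Hypothetical OCaml rendering of the core syntax above. *)
type name = string

type proc =
  | Message of name * name list    (* x<x1,...,xp> *)
  | Def of defn * proc             (* let D in P   *)
  | Par of proc * proc             (* P | P        *)

and defn =
  | Clause of pat * proc           (* J |> P       *)
  | And of defn * defn             (* D and D      *)

and pat =
  | Jmsg of name * name list       (* x<x1,...,xp> *)
  | Join of pat * pat              (* J | J        *)

(* The names defined by a pattern, as in the text. *)
let rec defined = function
  | Jmsg (x, _) -> [x]
  | Join (j1, j2) -> defined j1 @ defined j2
```

For instance, the pattern $A\tuple{n} | B\tuple{}$ is `Join (Jmsg ("A", ["n"]), Jmsg ("B", []))`, and `defined` returns the two names it defines.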
Processes and definitions are known modulo renaming of bound
variables, as substitution performs $\alpha$-conversion to avoid
captures.

\subsection{Semantics}


This semantics is specified as a reflexive chemical abstract machine
(RCHAM)~\cite{BeBo92}. The state of the computation
is a {\em chemical soup} $\dd \tth \pp$ that consists of two
multisets: active definitions~$\dd$ and running processes~$\pp$.

The chemical soup evolves according to two families of rules: {\em
Structural rules} $\redstruct$ are reversible ($\heat$ is heating,
$\cool$ is cooling); they represent the syntactical rearrangement of
terms (heating breaks terms into smaller ones, cooling builds larger
terms from their components). {\em Reduction rules} $\redsoupe$
consume specific processes present in the soup, replacing them by some
others; they are the basic computation steps.
%

We present simplified chemical rules
(see~\cite{FournetGonthier96,FGLMR96_Calculus_mobile_agent} for
the complete set of rules).
Following the chemical tradition, every rule applies to any matching
subpart of the soup, non-matching subparts of the soup being left
implicit.
$$
\let \N \name
\let \su \varphi
\begin{array}{rl@{~}c@{~}rl@{\quad}l}
&\tth P_1 | P_2 &\redstruct& &\tth P_1, P_2 &\mbox{S-Par}\\
D_1 \And D_2 &\tth &\redstruct& D_1, D_2 &\tth &\mbox{S-And}\\
&\tth \Def D in P &\redstruct& D &\tth P
&\mbox{S-Def} \\[1em]
J |> P &\tth \su (J) &\redsoupe & J |> P &\tth \su (P)
&\mbox{R-$\beta$}
\end{array}
$$
Two of the rules above have side conditions:
\begin{itemize}
\item (\name{S-Def})
the names defined in $D$ must not appear anywhere else
in the solution but in the reacting definition and process~$D$ and~$P$.
This condition is global; in combination with $\alpha$-renaming it
enforces lexical scoping.
\item (\name{R-$\beta$}) $\varphi (\cdot)$ substitutes actual names for
the received variables in $J$ and $P$.

\end{itemize}
Additionally, we only consider well-typed terms and reductions.
See~\cite{join-typing-97} for details on a rich polymorphic type system
for the join-calculus. Here, this mostly amounts to assuming that
message and name arities always agree.


In this paper, we take particular interest in the reduction (\name{R-$\beta$}).
Informally, when there are messages pending on all the names defined
in a given join-pattern, then the process guarded by this join-pattern
may be fired. When firing is performed, we say that a {\em matching} occurs.
On the semantics level, there is a message $\tuple{x_i\inu p}$ pending
on a name~$x$ when there is an active molecule
~$x\tuple {x_i\inu p}$ in the chemical soup.

Thus, we may define the reactivity status of a given chemical soup as
the multiset of the active molecules in it.
Later on in this paper, we shall consider various abstractions of this
reactivity status.


\subsection{The join programming languages}

Apart from primitives, join-languages support {\em synchronous}
names, which the core join-calculus does not provide directly.
Synchronous names send back results, a bit like functions.
However, synchronous names may engage in any kind of
join-synchronization, just as asynchronous names do.
A program written using synchronous names can be translated
into the core join-calculus alone.
The translation is analogous to the continuation-passing-style
transformation in the $\lambda$-calculus.
In our implementation, as far as pattern matching is concerned,
a synchronous name
behaves as if it were asynchronous and carried one additional continuation
argument.
All implementation difficulties concentrate in managing this extra
argument, whose presence has no effect on pattern matching itself.



The {\tt join} language~\cite{Join} is our first prototype. All
examples in this paper are in {\tt join} syntax. The system consists
of a bytecode compiler and a bytecode interpreter. Both compiler and
interpreter are Objective Caml~\cite{Objective-Caml} programs, and it
is easy to lift Objective Caml data types and functions into {\tt
join} abstract data types and primitives. For instance, {\tt join}
programs easily draw graphics, using the graphics Objective Caml
library. As a consequence, {\tt join} can be seen either as a
language of its own, featuring many primitives, or as a distributed
layer on top of Objective Caml. Continuations are encoded using
ad hoc threads, which are created and scheduled by the {\tt join}
bytecode interpreter.


The {\tt jocaml} system is our second implementation. In {\tt jocaml},
all join-calculus constructs for concurrency, communication,
synchronization and process mobility are directly available as
syntactical extensions to Objective Caml. On the runtime environment
side, we have supplemented the original Objective Caml runtime system
(which already provides a thread library) with a special ``join''
library and a distributed garbage collector~\cite{GCDistri}. On the
compiler side, the Objective Caml compiler has been extended to
translate join-calculus source code into function calls to the
``join'' library. We also introduced a few new instructions
to the Objective Caml bytecode, but only to handle code mobility, a
feature orthogonal to pattern matching. The {\tt jocaml} system is
currently available as a prototype version~\cite{Jocaml}.
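The treatment of a synchronous name as an asynchronous one carrying an extra
continuation argument can be illustrated in plain Objective Caml. This is a
toy sketch only: ordinary functions stand in for channels, and no actual
join-synchronization takes place; the names `double` and `result` are ours.

```ocaml
(* Destination for the "replied" value, for demonstration purposes. *)
let result = ref 0

(* Asynchronous encoding of a synchronous name: [double] receives its
   argument [n] together with a continuation [k], and "replies" by
   invoking [k] instead of returning a value. *)
let double (n, k) = k (2 * n)

(* The synchronous call [let r = double(21) in ...] becomes a message
   carrying the continuation [fun r -> ...]. *)
let () = double (21, fun r -> result := r)
```

After the call, `result` holds 42, exactly as a direct-style synchronous call would have returned it.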
\section{Pattern matching in join definitions}
\label{automatas}
\subsection{Principle}

Consider the following join definition:
\begin{verbatim}
let A(n) | B() = P(n)
and A(n) | C() = Q(n)
;;
\end{verbatim}
This defines three names $A$, $B$ and $C$. Name $A$ has arity one,
whereas names $B$ and $C$ have arity zero. Names may be synchronous
or asynchronous, depending on whether there are
\verb+reply ... to ...+
constructs applying to them inside the guarded processes~$P(n)$
and $Q(n)$ or not.

According to the general join-calculus semantics, the guarded process
$P(n)$ may be fired whenever there are some messages pending on $A$ and
$B$. Similarly, $Q(n)$ may be fired whenever there are some messages
pending on $A$ and $C$. In both cases, the formal parameter $n$ is
replaced by (or bound to, in the implementation) one of the messages
pending on $A$.

Reactivity information is to be considered at the definition level,
since matching is indeed performed at this level. Moreover, in order
to use finite-state automata, we want this information to range over a
finite set of possible values. As far as matching is concerned, and by
the linearity of patterns, only the presence or absence of messages
matters. Thus, let us call \status{0} the status of a name without
any message pending, and \status{N} the status of a name with at least
one message pending. Then, the status of a definition is a tuple
consisting of the statuses of the names it defines, once some
arbitrary order of these names has been adopted.

For instance, if some messages are pending on $B$ and $C$, whereas
none is pending on $A$, then the status of the $A$, $B$, $C$
definition is a three-tuple written \status{0NN}.

A {\em matching status} is defined as a status that holds enough
\status{N}'s so that at least one guarded process can be fired.
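This status abstraction is easy to state as code. The following Objective
Caml sketch (ours, purely illustrative) fixes the name order $A$, $B$, $C$
and tests whether a definition status is a matching one.

```ocaml
(* A name status abstracts away the exact number of pending messages. *)
type name_status = S0 | SN    (* 0: no message, N: at least one *)

(* Definition status for the A, B, C example, in that order. *)
type def_status = name_status * name_status * name_status

(* A status is matching when some clause can fire:
   clause A|B needs messages on A and B, clause A|C on A and C. *)
let is_matching ((a, b, c) : def_status) =
  (a = SN && b = SN) || (a = SN && c = SN)
```

Under this test, \status{0NN} is not a matching status (no message on $A$), while \status{N0N} and \status{NN0} are.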
Definition status evolves towards matching status as messages arrive.
This yields a first kind of ``increasing'' transitions. More
specifically, when a message arrives on some name, then this name
status either evolves from \status{0} to \status{N} or remains
\status{N}. Definition status evolves accordingly. In the $A$, $B$,
$C$ case we get the following transitions. (In this diagram,
transitions are labeled by the name that gets a new message and
matching statuses are filled in gray.)
\begin{toimage}
.PS
cm = 0.3937;
pt = 1/72;
hu = 2 * cm;
vu = 1.5 * cm
circlerad = 12 * pt
arcrad = circlerad
rblabht = 2 * circlerad
rblabwid = 4 * circlerad
fillval=0.3
pi=3.1415927

define clab {
{circle at Here $1 $2 $3}
}

define ilab {
{box invis $1 $2 $3 at Here}
}

define ln {
L: [circle rad 0]
if $4 < 0 then {
box $3 rjust wid 0 ht 0 at L.c + ($1*(hu/3)-circlerad/4, $2*(vu/3))
} else {if $4 > 0 then {
box $3 ljust wid 0 ht 0 at L.c + ($1*(hu/3)+circlerad/4, $2*(vu/3))
} else {
box $3 ljust wid 0 ht 0 at L.c + ($1*(hu/3)+ 0.001 * cm, $2*(vu/2.5))
}}
{line -> chop from L.c to L.c + ($1*hu, $2*vu)}
move to L.c + ($1*hu, $2*vu)
}

define dln {
{line -> dotted right $1*hu up $2*vu chop}
move right $1*hu up $2*vu
}

define htccw {
psi=$1
phi=$2
{
c0=circlerad * 2 * cos(psi/2) * cos(phi+psi/2)
s0=circlerad * 2 * cos(psi/2) * sin(phi+psi/2)
c1=circlerad * cos(phi) ; s1 = circlerad * sin(phi)
c2=circlerad * (2 * cos(psi/2) + 1) * cos(phi+psi/2)
s2=circlerad * (2 * cos(psi/2) + 1) * sin(phi+psi/2)
c3=circlerad * cos(phi+psi) ; s3 = circlerad * sin(phi+psi)
A: [circle rad 0.0]
move to A.c + (c1,s1)
arc $4 rad circlerad to A.c + (c2,s2)
arc -> $4 rad circlerad to A.c + (c3,s3)
circle invis $3 with.c at A.c + (c0,s0)}
}

define htcw
{
psi=$1
phi=$2
{
c0=circlerad * 2 * cos(psi/2) * cos(phi-psi/2)
s0=circlerad * 2 * cos(psi/2) * sin(phi-psi/2)
c1=circlerad * cos(phi) ; s1 = circlerad * sin(phi)
c2=circlerad * (2 * cos(psi/2) + 1) * cos(phi-psi/2)
s2=circlerad * (2 * cos(psi/2) + 1) * sin(phi-psi/2)
c3=circlerad * cos(phi-psi) ; s3 = circlerad * sin(phi-psi)
A: [circle rad 0.0]
move to A.c + (c1,s1)
arc cw $4 rad circlerad to A.c + (c2,s2)
arc cw -> $4 rad circlerad to A.c + (c3,s3)
circle invis $3 with.c at A.c + (c0,s0)}
}

define tourccw {
htccw(pi/2,$1,$2,$3)
}

define tourcw {
htcw(pi/2,$1,$2,$3)
}

hu=3*cm
[
vu=2*cm
circlerad=15*pt
clab("\status{000}")
{ln(-1,-1,"$B$",-1) ; clab("\status{0N0}") ; tourccw(pi/2,"$B$")
{ln(0,-1,"$A$",0) ; clab("\status{NN0}",fill)
tourccw(pi,"$A, B$")
ln(1,-1,"$C$",-11)}
{ln(1,-1,"$C$",1) ; clab("\status{0NN}")
tourcw(pi/8,"$\,B, C$")
ln(0,-1,"$A$",-1) ; clab("\status{NNN}",fill)
tourccw(5*pi/4,"$A,B,C$")}}
{ln(0,-1,"$A$",0) ; clab("\status{N00}") ; tourcw(pi/2.5,"$A$")
{ln(-1,-1,"$B$",-1)}
{ln(1,-1,"$C$",1)}}
{ln(1,-1,"$C$",1) ; clab("\status{00N}") ; tourcw(pi/2,"$C$")
{ln(-1,-1,"$B$",1)}
{ln(0,-1,"$A$",0) ; clab("\status{N0N}",fill) tourcw(0,"$A, C$")
ln(-1,-1,"$B$",1)}}
]
.PE
\end{toimage}%$
\showgraph


Definition status also evolves when matching occurs. This yields
new, ``decreasing'', transitions that we call matching transitions.
According to the join-calculus semantics,
matching may occur at any moment (provided of course that
matching is possible).
As a consequence, matching transitions start from matching states and
they are unlabelled.
In the $A$, $B$, $C$ case, they are as follows:
\begin{toimage}
.PS
lab=0.5

define iln {
move right $1*hu up $2*vu
}

aphi=pi/4

define alncw {
c= circlerad*cos($1) ; s = circlerad*sin($1)
c2 = circlerad*cos($2) ; s2 = circlerad*sin($2)
C: [circle rad 0]
if $4 < 0 then {abs = -$4} else {abs = $4}
x1 = c ; y1 = s ; x2 = $3*hu+c2 ; y2 = $4*vu+s2
d = (x2-x1) * (x2-x1) + (y2-y1)*(y2-y1)
ar = sqrt(d) / (2*sin(aphi))
D: arc cw -> $6 rad ar from C.c + (x1,y1) to C.c + (x2,y2)
r=last arc.rad
x3 = C.c.x + lab*(x1+x2) ; y3 = C.c.y + lab*(y2+y1)
d = sqrt((x3-D.c.x) * (x3-D.c.x) + (y3 -D.c.y)*(y3-D.c.y))
line invis from D.c to r/d <D.c,(x3,y3)> ; $5
}

define aln {
c= circlerad*cos($1) ; s = circlerad*sin($1)
c2 = circlerad*cos($2) ; s2 = circlerad*sin($2)
C: [circle rad 0]
if $4 < 0 then {abs = -$4} else {abs = $4}
x1=c ; y1 = s ; x2 = $3*hu+c2 ; y2 = $4*vu+s2
d = (x2-x1) * (x2-x1) + (y2-y1)*(y2-y1)
ar = sqrt(d) / (2*sin(aphi))
D: arc -> $6 rad ar from C.c + (x1,y1) to C.c + (x2,y2)
r=last arc.rad
x3 = C.c.x + lab*(x1+x2) ; y3 = C.c.y + lab*(y2+y1)
d = sqrt((x3-D.c.x) * (x3-D.c.x) + (y3 -D.c.y)*(y3-D.c.y))
line invis from D.c to r/d <D.c,(x3,y3)> ; $5
}

clab("\status{000}")
{iln(-1,-1,invis) ; clab("\status{0N0}")
{iln(0,-1,"$B$") ; clab("\status{NN0}",fill) ;
{dln(0,1)} ; {dln(1,1)} ; {dln(1,2)}
tourccw(pi,"",dotted) ; iln(1,-1,"$C$")}
{iln(1,-1,"$C$") ; clab("\status{0NN}")
iln(0,-1,"$B$") ; clab("\status{NNN}",fill)
tourccw(5*pi/4,"",dotted)
{dln(0,1)} ; {dln(-1,1)} ; {dln(1,1)} ;
{aln(3*pi/8,-pi/3,0,2,"",dotted)}
{dln(-1,2)} ; {dln(1,2)}}}
{iln(0,-1,"$B$") ; clab("\status{N00}")
{iln(-1,-1,"$A$")}
{iln(1,-1,"$C$")}}
{iln(1,-1,"$C$") ; clab("\status{00N}")
{iln(-1,-1,"$A$")}
{iln(0,-1,"$B$") ; clab("\status{N0N}",fill) ;
{dln(0,1)} ;
{dln(-1,1)} ; {dln(-1,2)}
tourcw(0,"",dotted) ; iln(-1,-1,"$A$")}}
.PE
\end{toimage}
\showgraph


Observe that there may be several matching transitions starting from a
given status. This is not always a consequence of the
non-deterministic semantics of the join-calculus.

Often, the ambiguity is only apparent. For instance, matching transitions
starting from \status{NN0} lead to \status{NN0}, \status{N00},
\status{0N0} and \status{000}. When such a matching occurs, two
messages are consumed (one pending on $A$ and one pending on $B$);
then, depending on whether there are some messages left pending on $A$
and $B$ or not, the status evolves to \status{NN0}, \status{N00},
\status{0N0} or \status{000}. From the implementation point of view,
this means that a little runtime testing is required once matching has
been performed. Here, we pay a price for using finite-state automata.

However, some true non-determinism is still present. Consider
status~\status{NNN} for instance. Then, both guarded processes of the
$A$, $B$, $C$ definition can now be fired. The choice of firing
either $P(n)$ or $Q(n)$ will result in either consuming one message
pending on $A$ and one on $B$, or consuming one message pending on $A$
and one on $C$.

Finally, a global view of join-matching compilation can be given by taking
both kinds of transitions together. This yields a non-deterministic
automaton.

Note that matching of non-linear patterns can also be compiled using automata.
For instance, if a name appears at most twice in one or more patterns,
then its status will range over $\status{0}$, $\status{1}$ and $\status{N}$.
We do not present this extension in greater detail, for simplicity and
because we do not implement non-linear patterns.
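For that non-linear extension, the abstraction of a name that may occur at
most twice simply gains one intermediate value, as the following sketch
shows (illustrative only; recall that our compilers do not implement it).

```ocaml
(* Status of a name occurring at most twice in some pattern. *)
type status2 = Zero | One | Many    (* 0 / 1 / N *)

(* The "increasing" transition taken on message arrival. *)
let receive = function
  | Zero -> One
  | One  -> Many
  | Many -> Many
```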
\subsection{Towards deterministic automata}

For efficiency and simplicity reasons, we choose to implement matching
using deterministic automata that react to message reception.

Fortunately, it is quite possible to do so. It suffices to perform
matching as soon as possible. More precisely, when a message arrives
and carries the definition status into a matching status, matching is
performed immediately, while the definition status is adjusted to reflect
message consumption. This results in pruning the status space just
below matching statuses.


In practice, in the $A$, $B$, $C$ case, we get the
automaton of figure~\ref{abc}.
\begin{figure}[th]
\begin{toimage}
.PS
circlerad=16*pt
hu=3 *cm
vu=3 *cm
clab("\status{000}")
{ln(-1,-1,"$B$",-1) ; clab("\status{0N0}")
tourccw(-9*pi/10,"$B$")
tourccw(pi/2,"$A$",dotted)
{alncw(pi/4,-3*pi/4,1,1,"$A$",dotted)}
{ln(1,-1,"$C$",1) ; clab("\status{0NN}")
htccw(pi/4,-pi/2-pi/8,"$B, C$")
{alncw(3*pi/4,-pi/4,-1,1,"$A$",dashed)}
{aln(pi/4,-3*pi/4,1,1,"$A$",dotted)}
tourccw(-5*pi/12,"$A$",dotted)
tourcw(-7*pi/12,"$A$",dashed)
}}
{ln(0,-1,"$A$",0) ; clab("\status{N00}")
tourcw(5*pi/12,"$A$")
tourccw(-3*pi/4,"$B$",dotted)
tourccw(-17*pi/12,"$C$",dashed)
{alncw(pi/2,-pi/2,0,1,"$B$",dotted)}
{aln(pi/2,-pi/2,0,1,"$C$",dashed)}
}
{ln(1,-1,"$C$",1) ; clab("\status{00N}") ;
tourcw(-pi/10,"$C$")
tourcw(pi/2,"$A$",dashed)
{aln(3*pi/4,-pi/4,-1,1,"$A$",dashed)}
{ln(-1,-1,"$B$",-1)}}
.PE
\end{toimage}
\showgraph
\caption{\label{abc} Automaton in the $A$, $B$, $C$ case}
\end{figure}

Observe that all transitions are now labeled and that a name labels a
transition when message reception on this name triggers that
transition.
Furthermore, matching transitions that correspond to
firing $P(n)$ or firing $Q(n)$ are now represented differently (the
former by a dotted arrow, the latter by a dashed arrow). This
highlights the difference between false and true non-deterministic
transitions: real non-determinism is present when there are both
dotted and dashed edges with the same label starting from the same
node.

For instance, there are two $B$-labeled dotted transitions starting
from $\status{N00}$. Non-determinism is only apparent here, since
$P(n)$ is fired in both cases and the selected transition depends
only on whether there is at least one message left pending on $A$
or not after firing $P(n)$.


By contrast, from status \status{0NN}, the automaton may react to the
arrival of a message on $A$ in two truly different manners, by firing
either $P(n)$ or $Q(n)$. This is clearly shown in figure~\ref{abc} by
the $A$-labeled edges that start from status~\status{0NN}, some of
them being dashed and the others being dotted. A simple way to avoid
such a non-deterministic choice at run-time is to perform it at
compile-time. That is, here, we suppress either the dotted or the dashed
$A$-labeled transitions that start from $\status{0NN}$.

In the rest of the paper, we take automata such as the one of
figure~\ref{abc} as suitable abstractions
of join-pattern compilation output.

\subsection{Automata and semantics}

Both the ``match as soon as possible'' behavior and the deletion of
some matching transitions have a price in terms of semantics.
More precisely, some CHAM behaviors now just cannot be observed
anymore.
However,
the CHAM semantics is a non-deterministic one: an initial
configuration of the CHAM {\em may} evolve into a variety of configurations.
Furthermore, there is no fairness constraint of any kind and
no particular event is required to occur.
As an example of the consequences of the ``match as soon as possible''
behavior, consider this definition:
\begin{verbatim}
let A() = print(1);
and B() = print(2);
and A() | B() = print(3);
;;
\end{verbatim}
Then, we get the following automaton:
\begin{toimage}
.PS
clab("\status{00}")
tourcw(-3*pi/4,"$A$",dotted)
tourccw(-pi/4,"$B$",dashed)
.PE
\end{toimage}
\showgraph
Status \status{NN}, which is preceded by the two matching statuses
\status{0N} and \status{N0}, cannot be reached. As a consequence, the
above program will never print a {\tt 3}, no matter how many messages
are sent on $A$ and $B$.

Next, to illustrate the effect of deleting ambiguous matching
transitions, consider the following definition:
\begin{verbatim}
let A() = print(1);
and A() | B() = print(2);
\end{verbatim}
Such a definition will get compiled into one of the following
deterministic automata:
\begin{toimage}
.PS
{clab("\status{00}")
tourccw(pi/4,"$A$",dotted)
{ln(0,-1,"$B$",0)
clab("\status{0N}")
tourcw(pi/4,"$B$");
tourcw(-3*pi/4,"$A$",dotted)
}
}
move right 1.5*hu
{clab("\status{00}")
tourccw(pi/4,"$A$",dotted)
{ln(0,-1,"$B$",0)
clab("\status{0N}")
tourcw(pi/4,"$B$");
tourcw(-3*pi/4,"$A$",dashed)
{alncw(pi/2,-pi/2,0,1,"$A$",dashed)}
}
}
.PE
\end{toimage}
\showgraph
In the case of the left automaton, only {\tt 1} will ever get printed.
In the case of the right
automaton, {\tt 2} will be printed when a message arrives on~$B$
and then another on~$A$.
Both automata lead to correct implementations of the semantics. However,
the second automaton looks a better choice than the first one, since it
yields more program behaviors.
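The ``match as soon as possible'' policy, combined with a compile-time choice
among ambiguous matchings, can be simulated over message queues. The
following Objective Caml sketch (ours, not compiler output) replays the
$A$, $B$, $C$ definition, always preferring the $A|B$ clause when both apply.

```ocaml
type msg = A of int | B | C

let qa : int Queue.t = Queue.create ()   (* pending messages on A *)
let qb = ref 0 and qc = ref 0            (* B and C carry no value *)
let fired = ref []                       (* trace of fired processes *)

(* Message reception: record the message, then match immediately
   whenever the new status is a matching one. *)
let send m =
  (match m with A n -> Queue.add n qa | B -> incr qb | C -> incr qc);
  if not (Queue.is_empty qa) && !qb > 0 then begin
    let n = Queue.pop qa in decr qb;     (* consume A and B, fire P(n) *)
    fired := ("P", n) :: !fired
  end else if not (Queue.is_empty qa) && !qc > 0 then begin
    let n = Queue.pop qa in decr qc;     (* consume A and C, fire Q(n) *)
    fired := ("Q", n) :: !fired
  end
```

Sending `A 1` then `C` fires $Q(1)$ immediately; a subsequent `B` followed by `A 2` fires $P(2)$, mirroring the pruned automaton of figure~\ref{abc}.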
\section{Runtime definitions}

\label{implementation}
\subsection{Basics}

Names are the basic values of the join-calculus, and thus any
implementation of the join-calculus must supply a runtime
representation for them. For instance, a name can be sent on some
appropriate channel. At runtime, we must indeed send something.

However, names that are defined together in the same join definition
interact when matching is tested for and performed. Moreover, by the
very idea behind the join-calculus, matching is the only
synchronization primitive. In other words, only names that are
defined by the same join definition have some kind of interaction that
is the runtime system's responsibility.


This makes it both possible and desirable to compile a source definition into
a runtime ``definition'', a single vector structure that groups all
the names defined in a given definition. Names must still exist as
individuals; they can be represented as infix pointers into their
definition (as in {\tt join}), or as a definition pointer and an index
(as in {\tt jocaml}).

Both the {\tt join} and {\tt jocaml} systems implement the automata of
the previous section. However, they do so in quite different manners.
The former focuses on minimizing runtime testing, while the latter
involves a systematic runtime test of the current status at every
message arrival.

\subsection{Definitions in {\tt join}}

Runtime definitions are vector structures. Every name defined in a
definition occupies two slots in the vector structure. The first
entry holds a code pointer that stands for the name itself, while the
second entry holds a pointer to a queue of pending messages, queues
being organized as linked lists. Runtime definitions include
additional slots that hold the values of the variables that are free
in the guarded processes.
This technique closely resembles the one used by
the SML/NJ compiler~\cite{Appel-compiling} to represent mutually
recursive functions. Message sending on name~$x$ is performed by
stacking message values and then calling the code for name~$x$. This
code is retrieved by dereferencing twice the infix pointer that
represents~$x$ at runtime.


However, there is a big difference between mutually recursive
functions and join definitions. The code for name~$x$ is automaton
code that reacts to the arrival of a new message on that name. The
compiler issues various versions of name code, one per possible status
of the definition that defines~$x$. Typically, name code either saves
a message into the queue for~$x$ (in the non-matching case), or
retrieves messages from other queues before firing a guarded process
(in the matching case). In all cases, the definition status may then
need an update, which is performed by updating all code entries in
the definition.


\subsection{Definitions in {\tt jocaml}}

In the {\tt jocaml} system, a name is a pointer to a definition plus
an index. Definitions are still vector structures, but there is only
one entry per name for message queues. Additionally, definitions hold
guarded closures (i.e. guarded process code plus free variable
values), a status field and a matching data structure.

The status field holds the current status of the definition as a
bit-field. Each name status is encoded by one bit, using value 1 for
\status{N} and value 0 for \status{0}; the bit position is given by the name
index.
% The matching data structure is an array of maps from statuses to
% clause indexes.

Message sending is performed by calling a generic C~function from the
``join'' library, taking the message value, a definition pointer and a
name index as arguments.
When a message is received on name~$x$, the
bit for~$x$ is checked in the current status bit-field. If the bit is
set, some messages on name~$x$ are already present. Thus, the definition
status does not change. Since the current status before message
sending is guaranteed to be a non-matching one, the message is queued
and the function is exited.

Otherwise, the current definition status is searched for in the matching
structure for~$x$. This matching structure is an array of (pattern
encoding, guarded process index) pairs. Pattern encodings are
bit-fields, just like status encodings, each
corresponding to a join-pattern containing name~$x$, from which
name~$x$ has been removed. Using a sequential
search by a bitwise ``and'' with each pattern encoding, the current
status can be identified as matching or non-matching in at most $N_x$
tests, where $N_x$ is the number of clauses whose pattern contains~$x$.

If no match is found, the automaton state is updated and the message
value is queued in the queue for $x$. Otherwise, a guarded process
index has been found, and it is used to retrieve the associated guarded
closure. Arguments to the guarded process are extracted from the
queues identified by the matching status found. The status is updated at
the same moment (when a queue becomes empty, a bit is erased). Finally,
the guarded process is fired.

Therefore, the performance of this technique relies much on fast
comparisons and modifications of definition statuses. The best result
is achieved when statuses are encoded by machine integers. In that
case, the number of names that a definition can define is limited by
the integer size of the hosting Objective Caml system (which
typically is 31 or 63~bits). If this is not considered enough, then
statuses have to be encoded using several integers or one string.
Both
kinds of status encodings can be mixed, using integers for small
definitions and strings for larger ones. However, in the current {\tt
jocaml} system, we use a single integer to hold the status, and a
technique (described in section~\ref{fusion}) is used to associate
several channels with the same bit in the status bit-field.

\section{The pragmatics of compilation}\label{opt}
\label{optimization}

This section is dedicated to optimizations that are primarily pertinent
to the {\tt join} technique and that are performed by the
current version of the {\tt join} compiler.

We first introduce optimizations that improve the runtime behavior of
programs, both in speed and in dynamic memory usage. Then, we show how to
reduce emitted code size.
We focus on optimizing definitions written in
object-oriented style, as described in the tutorial~\cite{Join}. As
this programming style has proved quite frequent,
it is natural for us, as compiler writers, to concentrate
our efforts on such definitions.

In this style, a definition is an object. Object state is encoded by
asynchronous {\em state names}, while synchronous {\em methods} access
or modify object state.
For instance, given one state name $S$ and $n$ methods
$m_1$, $m_2$,\ldots{} $m_n$ taken in that order, we get:
\begin{verbatim}
let create(x_0) =
  let S(x) | m_1() = P_1(x)
  and S(x) | m_2() = P_2(x)
  ....
  and S(x) | m_n() = P_n(x) in
  S(x_0) | reply m_1,m_2,...,m_n to create
;;
\end{verbatim}
\noindent The synchronous call \verb+create(v)+ creates a new
object (i.e.\ a new $S$, $m_1$, $m_2$,\ldots{} $m_n$ definition)
and then sends back an $n$-tuple of its methods. Moreover, this call
initializes the object state with the value~$v$.
\subsection{Refined status}
As a working example of an object-style definition, consider the
following adder:
\begin{verbatim}
let create(x_0) =
  let S(x) | get() = S(x) | reply x to get
  and S(x) | add(y) = S(x+y) | reply to add in
  S(x_0) | reply get,add to create
;;
\end{verbatim}

The adder has one state name $S$ and two methods $get$ and $add$. We
then try to figure out some ``normal'' runtime behavior for it. As
the initial \verb+S(x_0)+ is forked as soon as the adder definition
has been created, a highly likely initial situation is that there is
one message pending on $S$ and none on the other names. Later on,
as some external agent invokes the $get$ or $add$ method, the message
pending on $S$ is consumed and the appropriate guarded process is
fired. Either process quickly sends a message on $S$. Thus, a likely
behavior is for the queue of $S$ to alternate between being empty and
holding one element, the queue being empty for short periods. By some
aspects of the compilation of ``\verb+|+'' and of our scheduling
policy that we will not examine here, this behavior is almost certain.

As a matter of fact, this ``normal'' life cycle involves a blatant
waste of memory: queue elements (cons cells) are allocated and
deallocated in the general dynamic fashion, while the runtime usage of
these cells would allow a more efficient policy. It is wiser
not to allocate a cell for the only message pending on $S$ and to use
the queue entry attributed to $S$~in the runtime definition as a
placeholder. On the status side, this new situation is rendered by a
new ``\status{1}'' status. Hence, $S$ now possesses a three-valued
status: \status{0} (no message), \status{1} (one message in the queue
slot) or \status{N} (some messages organized in a linked list).
Thus,
assuming for the time being that there may be an arbitrary number of
messages pending on $S$, the adder compiles into the automaton of
figure~\ref{adder} (adder names are taken in the order $S$, $get$,
$add$). This new automaton can be seen as an evolution of the $A$,
$B$, $C$ automaton of figure~\ref{abc}, with a slight change in
channel names.

\begin{figure}[th]
\begin{toimage}
.PS
[
circlerad=16*pt
hu=3 *cm
vu=3.5 *cm
clab("\status{000}")
{ln(-1,-1,"$get$",-1) ; clab("\status{0N0}")
tourccw(-9*pi/10,"$get$")
tourccw(pi/2,"$S$",dotted)
{alncw(pi/4,-3*pi/4,1,1,"$S$",dotted)}
{ln(1,-1,"$add$",1) ; clab("\status{0NN}")
{alncw(3*pi/4,-pi/4,-1,1,"$S$",dashed)}
{aln(pi/4,-3*pi/4,1,1,"$S$",dotted)}
htccw(pi/3,-pi/2-pi/6,"$\begin{array}{l} get\\ add\end{array}$")
tourccw(-5*pi/12,"$S$",dotted)
tourcw(-7*pi/12,"$S$",dashed)
}}
{
ln(0,-.67,"$S$",0) ; clab("\status{100}")
{alncw(pi/2,-pi/2-pi/12,0,0.67,"$get$",dotted)}
{aln(pi/2,-pi/2+pi/12,0,0.67,"$add$",dashed)}
ln(0,-.67,"$S$",0) ; clab("\status{N00}")
tourcw(5*pi/12,"$add$",dashed)
tourccw(-3*pi/4,"$S$")
tourccw(-17*pi/12,"$get$",dotted)
{alncw(pi/2,-pi/2-pi/6,0,1.33,"$get$",dotted)}
{aln(pi/2,-pi/2+pi/6,0,1.33,"$add$",dashed)}
}
{ln(1,-1,"$add$",1) ; clab("\status{00N}") ;
tourcw(-pi/10,"$add$")
tourcw(pi/2,"$S$",dashed)
{aln(3*pi/4,-pi/4,-1,1,"$S$",dashed)}
{ln(-1,-1,"$get$",-1)}}
]
.PE
\end{toimage}
\showgraph
\caption{\label{adder} Full automaton for the adder}
\end{figure}

Using status~\status{1} not only spares memory; it also avoids
some of the runtime tests that compute the post-matching status.
Basically, when a matching consumes the sole message pending on a name
with status~\status{1}, the automaton already knows that this
name's queue is empty.
For instance, when the automaton of
figure~\ref{adder} is in the \status{100} status and a message
arrives on either one of the two methods, the appropriate process
is fired and the status goes back to \status{000} without any runtime
test. By contrast, when the automaton is in the \status{00N} status
and a message arrives on $S$, the second guarded process is fired
immediately, but a test on the $add$ queue is then performed: if this
queue is now empty, the status goes back to \status{000}; otherwise,
it remains \status{00N}. Receiving a message on $S$ when the status
is \status{0NN} is a bit more complicated. First, the automaton
chooses to consume a message pending on either one of the two methods
and fires the appropriate process (figure~\ref{adder} does not
specify this choice). Then, the queue of the selected method has to
be tested, in order to determine the post-matching status.

Status \status{1} is easy to implement using the {\tt join}
compilation technique. The compiler issues different method codes for
\status{100} and \status{N00}, and the different codes can find
the $S$~argument at different places. Implementing status \status{1} in
{\tt jocaml} would be trickier, since the encoding of states using
bit-patterns would be far less straightforward than with
\status{0}/\status{N} statuses only. As a consequence, the {\tt
jocaml} compiler does not perform the space optimization described
in this section.

\subsection{Taking advantage of semantical analysis}

The automaton of figure~\ref{adder} has a \status{N00} status, to
reflect the case when there are two messages or more pending on $S$.
However, one quite easily sees that status \status{N00} is
useless.
First, as $S$~does not escape from the scope of its
definition, message sending on~$S$ is performed at three places only:
once initially (by \verb+S(x_0)+) and once in each guarded process.
Thus, there is one message pending on $S$ initially. A single message
pending on~$S$ is consumed by any match, and the process fired on that
occasion is the only one to send a message on $S$. Therefore, there
cannot be two messages or more pending on $S$. As a consequence, the
full automaton can be simplified by suppressing the \status{N00} node
and every edge that starts from it or leads to it.

In particular, there is no longer an $S$-labeled edge starting from node
\status{100}. In the {\tt join} implementation, this means that
the code entry for $S$ need not be updated when going from status
\status{000} to \status{100}. This entry is simply left as it is.
Symmetrically, the code entry for $S$ does not have to be restored when
the status goes back to \status{000} after matching.

Another important usage of semantical analysis is determining which names
are state names. For a given definition, the output of the
analyzer is a status set~${\cal S}$, which is a safe approximation of
the actual runtime statuses of that definition. State names are the
asynchronous names such that all statuses in ${\cal S}$ give them the
status \status{0}~or~\status{1}.

The current {\tt join} compiler includes a rudimentary
name usage analyzer, which suffices for object definitions given in
the style of
the $S$, $m_1$, $m_2$, \ldots, $m_n$ definitions, where all state
variables are asynchronous and do not escape from the scope of their
definition.

A promising alternative would be to design an ad hoc syntax for
distributed objects, or, more ambitiously, a full
object-oriented join-calculus.
Then, the state variables of
object-definitions would be apparent directly from user programs.

\subsection{Avoiding status space explosion}\label{space}

Consider any definition that defines $n$ names. Ignoring
\status{1}~statuses, the size of the status space of a given
definition already is $2^n$. The size of the non-matching status
space is thus bounded by $2^n$. As a full automaton for this
definition has one state per non-matching status, status space
explosion would be a real nuisance in the case of the {\tt join}
compiler. In particular, the number of automaton code entries to
create is $n$~times the number of non-matching statuses.

Unfortunately, the exponential upper bound is reached by practical
programs, as demonstrated by the general object-oriented definition
given at the beginning of section~\ref{opt}. In that case, all
definition statuses such that $S$ has the \status{0} status are
non-matching. In such a situation, using runtime testing, as {\tt
jocaml} does, is not that much of a penalty, when compared to code size
explosion.

We thus introduce dynamic behavior in the automata. We do so on a
per-name basis: the statuses of state names will be encoded by
automaton states as before, whereas method statuses will now be
explicitly checked at runtime. Thus, we introduce ``\status{?}'', a
new status, which means that nothing is known about the number of
messages pending on a name. Additionally, we state that all methods
have the \status{?} status as soon as there is one message or
more pending on any of the methods.

This technique can be seen as merging some states of
the full automaton, compiled by considering complete status
information, into new states with \status{?} statuses in them.
For instance, in the adder example, we get the automaton of
figure~\ref{adder2}, where the three statuses \status{0N0},
\status{0NN} and \status{00N} of figure~\ref{adder} merge into the new
status~\status{0??}. (Note that we also take advantage of name usage
analysis to delete status \status{N00}.)

\begin{figure}[th]
\begin{toimage}
.PS
clab("\status{000}")

{{iln(-.4,-.4) ; "$\begin{array}{l}get\\add\\[.5em]\end{array}$"}
ln(-1,-1,"",0) ; clab("\status{0??}")
tourccw(pi,"$\begin{array}{l}get\\add\\[.5em]\end{array}$")
tourcw(pi/4-pi/12,"$S$",dashed)
tourccw(pi/4+pi/12,"$S$",dotted)
{alncw(pi/4,-3*pi/4-pi/12,1,1,"$S$",dotted)}
{aln(pi/4,-3*pi/4+pi/12,1,1,"$S$",dashed)}
}

{ln(1,-1,"$S$",0)
clab("\status{100}")
{aln(3*pi/4,-pi/4+pi/12,-1,1,"$add$",dashed)}
{alncw(3*pi/4,-pi/4-pi/12,-1,1,"$get$",dotted)}
}
.PE
\end{toimage}
\showgraph
\caption{\label{adder2} Final automaton for the adder}
\end{figure}

Information on where runtime testing has to be performed can be
inferred from the diagram of figure~\ref{adder2}. For instance,
assume that the current status is \status{0??} and that a message arrives
on~$S$. Since there is at least one message pending on a method, a
matching will occur. Tests are needed, though, before matching to
determine the matching clause, and after matching to determine the
post-matching status. Abstractly, the first series of tests changes
the \status{?} statuses into either \status{0} or \status{N} statuses,
while the second series checks whether there are still messages pending on
names with \status{?} status. We are still investigating how to
organize these tests efficiently without producing too much code
(see~\cite{Augustsson85,Maranget92} for a discussion of the size of
such code in the context of compiling ML~pattern-matching).
By contrast, when the status is \status{100} and a message arrives on
$get$ or $add$, the corresponding matching is known immediately
and the message pending on $S$ is consumed. Then, the queue for $S$ is
known to be empty and the status can be restored to \status{000} with no
runtime testing at all. As the likely message arrival order is
first one message on $S$ and then one message on $get$ or $add$, the
final automaton of figure~\ref{adder2} responds efficiently to the more
frequent case, while still being able to respond to less frequent cases (for
instance, two messages on methods may arrive in a row). Furthermore,
when trouble is over, the automaton has status \status{000} and is
thus ready for the normal case. In this example, a penalty in code
size is paid for improving code speed in the frequent, ``normal''
case, whereas this penalty is avoided in non-frequent cases, which are
treated by less efficient code.

We introduced the \status{?} status on a per-name basis. However,
other choices are possible: {\em a priori}, there are many ways
to merge full automaton states into final automaton states. However, if
one really wants to avoid status space explosion, the final automaton
should be constructed directly, without first constructing the full
automaton. Adopting our per-name \status{?} status makes this direct
construction possible. Additionally, the \status{?} status can be used
by the simple static analyzer as a status for the names it cannot
trace (e.g.\ names that escape their definition scope). This
dramatically decreases the size of the analyzer output and its running
time.
\section{Optimizing further}
\label{fusion}

We here describe a simple transformation on
join definitions, which does not rely on a full semantical analysis
(such as name usage analysis),
but only on a simple, local, syntactical analysis of join-patterns.

Let us take a simple example:
\begin{verbatim}
let S(x) | m_1(y) = P_1(x,y)
and S(x) | m_2(y) = P_2(x,y)
....
and S(x) | m_n(y) = P_n(x,y)
;;
\end{verbatim}

In this example, a match occurs only when there are messages pending
both on $S$ and on one of the methods $m_1$, $m_2$,\ldots{} Thus, from the
synchronization point of view, all the methods are equivalent. And
indeed, we can regroup them into one single method channel $m$ by
transforming this join definition into:
\begin{verbatim}
let S(x) | m(p,y) = P[p] (x,y);;
let m_1(y) = m(1,y);;
let m_2(y) = m(2,y);;
....
let m_n(y) = m(n,y);;
\end{verbatim}
where $P$ is the vector of processes $[P_1,P_2,...,P_n]$.

Methods $m_1$, $m_2$,\ldots{} are now simple wrappers. Method $m_i$ now
calls $m$ with an additional $i$~argument, which basically is the
index of $P_i$ in array $P$. At this point, we must emphasize that we
describe this technique as a source-to-source transformation only for
clarity. However, the produced source code may not be correct with
respect to the join type system, when the types of the methods
differ. Anyhow, this optimization is implemented using ad hoc
mechanisms; this both improves efficiency and solves the typing
problem.

Currently, this optimization is performed by the {\tt jocaml}
compiler. This leads to a new interpretation of name indexes by the
``join'' library.
The least significant bits in name indexes still
encode names that actually take part in synchronization (here $S$ and
$m$), while the most significant bits (which were previously unused)
now encode the extra $i$~argument. This yields two benefits. First,
the number of status checks decreases, as the number of matching
statuses decreases. Second, the number of channels that can be
defined in one definition can now exceed the hosting system integer
size, provided some names can be grouped together for synchronization.

In the {\tt join} compiler, this technique might be used to reduce
automata size, since it lowers the number of non-matching statuses by
reducing the number of synchronizing names. Code entries for methods
$m_1$, $m_2$,\ldots{} would still be contained in the definition
structure; they would only stack the index of the process to fire, and
then call the code for method $m$. Moreover, they would not need to be
updated after each transition of the automaton.

Finally, this technique can also be applied to more complex
synchronizations. Given a definition that defines names $x_1$, $x_2$,
\ldots, $x_n$, using patterns $J_1$, $J_2$, \ldots{} $J_m$, we say that
two names are equivalent when swapping them in the patterns yields
the same set of patterns. We then replace equivalent names by a
single name, plus an index.

Consider the following definition:
\begin{verbatim}
let S_1(x) | m_1(y) = P_1(x,y)
and S_1(x) | m_2(y) = P_2(x,y)
and S_2(x) | m_1(y) = P_3(x,y)
and S_2(x) | m_2(y) = P_4(x,y)
;;
\end{verbatim}
Then the set of defined names $\{S_1, S_2, m_1, m_2\}$ can be
partitioned into $\{S_1, S_2\}$ and $\{m_1, m_2\}$.
Then, the above program can be transformed into:
\begin{verbatim}
let S(p,x) | m(q,y) = P[p,q] (x,y);;
let m_1(y) = m(1,y);;
let m_2(y) = m(2,y);;
let S_1(y) = S(1,y);;
let S_2(y) = S(2,y);;
\end{verbatim}
(with $P[1,1] = P_1$, $P[1,2] = P_2$, $P[2,1] = P_3$ and $P[2,2] = P_4$)


\section{Conclusion and future work}\cutname{conclusion.html}
\label{conclusion}

In the join-calculus, a name definition, all receptors on that name and all
possible synchronizations on that name are grouped together in a single
join definition. This enables the compilation of synchronization
between concurrent or even distributed processes, using finite-state
automata.
In the distributed case, a message transport phase to the
machine that currently hosts the join definition (and hence the
automaton) is performed first. This strengthens our view
that the join-calculus is the core of a distributed programming
language that can be compiled in practice, mainly because it restricts
reception on a channel to statically known parts of the program. The
same argument applied to \`a la ML polymorphic typing
in~\cite{join-typing-97}.
\begin{cutflow}{Benchmarks}\cutname{benchmarks.html}
\aname{bench}{}
\begin{table}[ht]
\begin{center}
\begin{tabular}{l@{\quad}r@{\quad}r@{\quad}r@{\quad}r@{\quad}r}
 & \texttt{fib} &
\texttt{afib} &
\texttt{pat} &
\texttt{qsort} &
\texttt{count}\\ \hline
\texttt{join} & 32.0 & 14.5 & 37.2 & 9.9 & 16.4 \\
\texttt{jocaml} & 5.7 & 3.5 & 5.4 & 1.4 & 4.2 \\
\texttt{Bologna} & 11.9 & 6.2 & 9.4 & 16.8 & 5.3 \\
\end{tabular}
\end{center}
\caption{\label{perf}Some performance measures (wall-clock time, in seconds)}
\end{table}
Benchmark sources are available on the \footahref{http://join.inria.fr/speed/}{web}.
\end{cutflow}
Taking a few benchmarks (see table~\ref{perf}, or \ahrefloc{bench}{here})
as a set of sensible join programs, both the
{\tt join} and the {\tt jocaml} pattern-matching compilation schemes
prove adequate. In particular, neither scheme falls into the
pitfall associated with its compilation technique.

In the {\tt join} case, one could fear code size growth; the technique
exposed in section~\ref{space} successfully avoids code size explosion
in practical cases. The {\tt jocaml} technique appears expensive in
runtime checks and thus {\em a priori} produces slow code. We chose
such a scheme of implementing automata using generic code because it
can be implemented simply by adding code to the Objective Caml
bytecode interpreter; using bytecode specialized for automata
manipulation would have implied more important modifications of the
Objective Caml bytecode interpreter. Moreover, the {\tt jocaml} system
runs faster than the {\tt join} system, even for pure join programs,
showing that its weaker compilation of join definitions is more than
compensated by its total integration into the Objective Caml system.
Comparison with the Bologna implementation~\cite{Bologna} of the
join-calculus is also instructive. This system also produces
bytecode, which is interpreted by a C~program. It proves faster than
{\tt join} and slower than {\tt jocaml} on most of the examples.
A glance at the Bologna source code reveals that it uses a
scheme very similar to that of {\tt jocaml}, with two slight
differences. First, status is systematically encoded as an array of
integers. Second, when a message arrives on a name $x$ with an empty
queue, all patterns are tested (whereas in {\tt jocaml} only the
patterns that contain $x$ are tested).

The performance of a given join system depends on many factors. In
particular, scheduling policy and message queue management have a
dramatic impact on it. Furthermore, a policy that gives good results
on one benchmark can be defeated by another. For these reasons, we
cannot tell which compilation technique is the best by comparing
different implementations.

Clearly, we now need to integrate all our compilation techniques into
the same system, in order to compare them more thoroughly and to
experiment further. However, the definition of reactivity statuses and
the automata of section~\ref{automatas} provide a sound
basis for these future investigations. Apart from future language
development and fully implementing the failure semantics of the
join-calculus, we also plan to investigate further the implementation
of threads, attempting to minimize thread suspension and creation.
\bibliography{bib}
\bibliographystyle{abbrv}


\end{document}

%\subsection{Playing with the runtime scheduler}
%
%Concentrating on the {\it object style} of programming, lots of
%methods do not modify the automaton status, but only read or modify
%state names (such as {\tt set} and {\tt get}), with only little
%non-blocking computations. Most of the time, these methods are never
%interrupted, and thus behave as in critical sections from the
%threads scheduler point of view.
%
%As a consequence, when such a method is called, an optimization would
%be not to modify the automaton status at all, since the method will
%end by giving back its original status to the automaton. To prevent
%the automaton to receive other messages from other methods while in an
%incoherent state, these methods are executed in a critical section.
%
%This optimization relies on a simple analysis of methods, which can
%also be used to decrease the number of forked threads, since such
%methods do not need to be executed in a forked thread.