"Fossies" - the Fresh Open Source Software Archive

Member "go/doc/asm.html" (9 Sep 2020, 34427 Bytes) of package /windows/misc/go1.14.9.windows-386.zip:



    1 <!--{
    2     "Title": "A Quick Guide to Go's Assembler",
    3     "Path":  "/doc/asm"
    4 }-->
    5 
    6 <h2 id="introduction">A Quick Guide to Go's Assembler</h2>
    7 
    8 <p>
    9 This document is a quick outline of the unusual form of assembly language used by the <code>gc</code> Go compiler.
   10 The document is not comprehensive.
   11 </p>
   12 
   13 <p>
   14 The assembler is based on the input style of the Plan 9 assemblers, which is documented in detail
   15 <a href="https://9p.io/sys/doc/asm.html">elsewhere</a>.
   16 If you plan to write assembly language, you should read that document although much of it is Plan 9-specific.
The current document provides a summary of the syntax and the differences from
   18 what is explained in that document, and
   19 describes the peculiarities that apply when writing assembly code to interact with Go.
   20 </p>
   21 
   22 <p>
   23 The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine.
   24 Some of the details map precisely to the machine, but some do not.
   25 This is because the compiler suite (see
   26 <a href="https://9p.io/sys/doc/compiler.html">this description</a>)
   27 needs no assembler pass in the usual pipeline.
   28 Instead, the compiler operates on a kind of semi-abstract instruction set,
   29 and instruction selection occurs partly after code generation.
   30 The assembler works on the semi-abstract form, so
   31 when you see an instruction like <code>MOV</code>
   32 what the toolchain actually generates for that operation might
   33 not be a move instruction at all, perhaps a clear or load.
   34 Or it might correspond exactly to the machine instruction with that name.
   35 In general, machine-specific operations tend to appear as themselves, while more general concepts like
   36 memory move and subroutine call and return are more abstract.
   37 The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined.
   38 </p>
   39 
   40 <p>
   41 The assembler program is a way to parse a description of that
   42 semi-abstract instruction set and turn it into instructions to be
   43 input to the linker.
   44 If you want to see what the instructions look like in assembly for a given architecture, say amd64, there
   45 are many examples in the sources of the standard library, in packages such as
   46 <a href="/pkg/runtime/"><code>runtime</code></a> and
   47 <a href="/pkg/math/big/"><code>math/big</code></a>.
   48 You can also examine what the compiler emits as assembly code
   49 (the actual output may differ from what you see here):
   50 </p>
   51 
   52 <pre>
   53 $ cat x.go
   54 package main
   55 
   56 func main() {
   57     println(3)
   58 }
   59 $ GOOS=linux GOARCH=amd64 go tool compile -S x.go        # or: go build -gcflags -S x.go
   60 "".main STEXT size=74 args=0x0 locals=0x10
   61     0x0000 00000 (x.go:3)   TEXT    "".main(SB), $16-0
   62     0x0000 00000 (x.go:3)   MOVQ    (TLS), CX
   63     0x0009 00009 (x.go:3)   CMPQ    SP, 16(CX)
   64     0x000d 00013 (x.go:3)   JLS 67
   65     0x000f 00015 (x.go:3)   SUBQ    $16, SP
   66     0x0013 00019 (x.go:3)   MOVQ    BP, 8(SP)
   67     0x0018 00024 (x.go:3)   LEAQ    8(SP), BP
   68     0x001d 00029 (x.go:3)   FUNCDATA    $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
   69     0x001d 00029 (x.go:3)   FUNCDATA    $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
   70     0x001d 00029 (x.go:3)   FUNCDATA    $2, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
   71     0x001d 00029 (x.go:4)   PCDATA  $0, $0
   72     0x001d 00029 (x.go:4)   PCDATA  $1, $0
   73     0x001d 00029 (x.go:4)   CALL    runtime.printlock(SB)
   74     0x0022 00034 (x.go:4)   MOVQ    $3, (SP)
   75     0x002a 00042 (x.go:4)   CALL    runtime.printint(SB)
   76     0x002f 00047 (x.go:4)   CALL    runtime.printnl(SB)
   77     0x0034 00052 (x.go:4)   CALL    runtime.printunlock(SB)
   78     0x0039 00057 (x.go:5)   MOVQ    8(SP), BP
   79     0x003e 00062 (x.go:5)   ADDQ    $16, SP
   80     0x0042 00066 (x.go:5)   RET
   81     0x0043 00067 (x.go:5)   NOP
   82     0x0043 00067 (x.go:3)   PCDATA  $1, $-1
   83     0x0043 00067 (x.go:3)   PCDATA  $0, $-1
   84     0x0043 00067 (x.go:3)   CALL    runtime.morestack_noctxt(SB)
   85     0x0048 00072 (x.go:3)   JMP 0
   86 ...
   87 </pre>
   88 
   89 <p>
   90 The <code>FUNCDATA</code> and <code>PCDATA</code> directives contain information
   91 for use by the garbage collector; they are introduced by the compiler.
   92 </p>
   93 
   94 <p>
   95 To see what gets put in the binary after linking, use <code>go tool objdump</code>:
   96 </p>
   97 
   98 <pre>
   99 $ go build -o x.exe x.go
  100 $ go tool objdump -s main.main x.exe
  101 TEXT main.main(SB) /tmp/x.go
  102   x.go:3        0x10501c0       65488b0c2530000000  MOVQ GS:0x30, CX
  103   x.go:3        0x10501c9       483b6110        CMPQ 0x10(CX), SP
  104   x.go:3        0x10501cd       7634            JBE 0x1050203
  105   x.go:3        0x10501cf       4883ec10        SUBQ $0x10, SP
  106   x.go:3        0x10501d3       48896c2408      MOVQ BP, 0x8(SP)
  107   x.go:3        0x10501d8       488d6c2408      LEAQ 0x8(SP), BP
  108   x.go:4        0x10501dd       e86e45fdff      CALL runtime.printlock(SB)
  109   x.go:4        0x10501e2       48c7042403000000    MOVQ $0x3, 0(SP)
  110   x.go:4        0x10501ea       e8e14cfdff      CALL runtime.printint(SB)
  111   x.go:4        0x10501ef       e8ec47fdff      CALL runtime.printnl(SB)
  112   x.go:4        0x10501f4       e8d745fdff      CALL runtime.printunlock(SB)
  113   x.go:5        0x10501f9       488b6c2408      MOVQ 0x8(SP), BP
  114   x.go:5        0x10501fe       4883c410        ADDQ $0x10, SP
  115   x.go:5        0x1050202       c3          RET
  116   x.go:3        0x1050203       e83882ffff      CALL runtime.morestack_noctxt(SB)
  117   x.go:3        0x1050208       ebb6            JMP main.main(SB)
  118 </pre>
  119 
  120 <h3 id="constants">Constants</h3>
  121 
  122 <p>
  123 Although the assembler takes its guidance from the Plan 9 assemblers,
  124 it is a distinct program, so there are some differences.
  125 One is in constant evaluation.
  126 Constant expressions in the assembler are parsed using Go's operator
  127 precedence, not the C-like precedence of the original.
Thus <code>3&amp;1&lt;&lt;2</code> is 4, not 0: it parses as <code>(3&amp;1)&lt;&lt;2</code>,
not <code>3&amp;(1&lt;&lt;2)</code>.
  130 Also, constants are always evaluated as 64-bit unsigned integers.
  131 Thus <code>-2</code> is not the integer value minus two,
  132 but the unsigned 64-bit integer with the same bit pattern.
The distinction rarely matters, but to avoid ambiguity,
division or right shift where the right operand's
high bit is set is rejected.
  136 </p>
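
<p>
As a rough illustration of the unsigned rule, the following Go sketch models how a constant such as
<code>-2</code> is viewed and how a division could be rejected.
It is not the assembler's implementation: the function names <code>asUnsigned</code> and
<code>divide</code> and the error text are assumptions made for this example.
</p>

<pre>
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// asUnsigned returns the unsigned 64-bit integer with the same bit
// pattern as v, which is how the text above says constants are viewed.
func asUnsigned(v int64) uint64 {
	return uint64(v)
}

// divide models the rejection rule: a division whose right operand has
// its high bit set is an error. The error messages are hypothetical.
func divide(left, right uint64) (uint64, error) {
	if bits.LeadingZeros64(right) == 0 {
		return 0, errors.New("right operand has high bit set")
	}
	if right == 0 {
		return 0, errors.New("division by zero")
	}
	return left / right, nil
}

func main() {
	minusTwo := asUnsigned(-2)
	fmt.Println(minusTwo) // 18446744073709551614

	_, err := divide(8, minusTwo)
	fmt.Println(err) // right operand has high bit set

	q, _ := divide(8, 2)
	fmt.Println(q) // 4
}
</pre>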
  137 
  138 <h3 id="symbols">Symbols</h3>
  139 
  140 <p>
  141 Some symbols, such as <code>R1</code> or <code>LR</code>,
  142 are predefined and refer to registers.
  143 The exact set depends on the architecture.
  144 </p>
  145 
  146 <p>
  147 There are four predeclared symbols that refer to pseudo-registers.
  148 These are not real registers, but rather virtual registers maintained by
  149 the toolchain, such as a frame pointer.
  150 The set of pseudo-registers is the same for all architectures:
  151 </p>
  152 
  153 <ul>
  154 
  155 <li>
  156 <code>FP</code>: Frame pointer: arguments and locals.
  157 </li>
  158 
  159 <li>
  160 <code>PC</code>: Program counter:
  161 jumps and branches.
  162 </li>
  163 
  164 <li>
  165 <code>SB</code>: Static base pointer: global symbols.
  166 </li>
  167 
  168 <li>
  169 <code>SP</code>: Stack pointer: top of stack.
  170 </li>
  171 
  172 </ul>
  173 
  174 <p>
  175 All user-defined symbols are written as offsets to the pseudo-registers
  176 <code>FP</code> (arguments and locals) and <code>SB</code> (globals).
  177 </p>
  178 
  179 <p>
  180 The <code>SB</code> pseudo-register can be thought of as the origin of memory, so the symbol <code>foo(SB)</code>
  181 is the name <code>foo</code> as an address in memory.
  182 This form is used to name global functions and data.
  183 Adding <code>&lt;&gt;</code> to the name, as in <span style="white-space: nowrap"><code>foo&lt;&gt;(SB)</code></span>, makes the name
  184 visible only in the current source file, like a top-level <code>static</code> declaration in a C file.
  185 Adding an offset to the name refers to that offset from the symbol's address, so
  186 <code>foo+4(SB)</code> is four bytes past the start of <code>foo</code>.
  187 </p>
  188 
  189 <p>
  190 The <code>FP</code> pseudo-register is a virtual frame pointer
  191 used to refer to function arguments.
  192 The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register.
  193 Thus <code>0(FP)</code> is the first argument to the function,
  194 <code>8(FP)</code> is the second (on a 64-bit machine), and so on.
  195 However, when referring to a function argument this way, it is necessary to place a name
  196 at the beginning, as in <code>first_arg+0(FP)</code> and <code>second_arg+8(FP)</code>.
(The meaning of the offset, an offset from the frame pointer, is distinct
from its use with <code>SB</code>, where it is an offset from the symbol.)
  199 The assembler enforces this convention, rejecting plain <code>0(FP)</code> and <code>8(FP)</code>.
  200 The actual name is semantically irrelevant but should be used to document
  201 the argument's name.
  202 It is worth stressing that <code>FP</code> is always a
  203 pseudo-register, not a hardware
  204 register, even on architectures with a hardware frame pointer.
  205 </p>
  206 
  207 <p>
  208 For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the argument names
  209 and offsets match.
  210 On 32-bit systems, the low and high 32 bits of a 64-bit value are distinguished by adding
  211 a <code>_lo</code> or <code>_hi</code> suffix to the name, as in <code>arg_lo+0(FP)</code> or <code>arg_hi+4(FP)</code>.
  212 If a Go prototype does not name its result, the expected assembly name is <code>ret</code>.
  213 </p>
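
<p>
The naming convention for argument references can be sketched in Go as follows.
The helpers <code>argRefs</code> and <code>splitRefs</code> are hypothetical, and the sketch assumes
that every argument is exactly one machine word wide and that the 32-bit system is little-endian,
so the low half sits at the lower offset.
</p>

<pre>
package main

import "fmt"

// argRefs returns the operand used to read each argument of a function
// whose arguments are all one machine word wide. wordSize is 8 on a
// 64-bit machine. The names only document the argument, but the
// assembler requires that some name be present.
func argRefs(names []string, wordSize int) []string {
	refs := make([]string, 0, len(names))
	for i, name := range names {
		refs = append(refs, fmt.Sprintf("%s+%d(FP)", name, i*wordSize))
	}
	return refs
}

// splitRefs returns the operands for the low and high 32 bits of a
// 64-bit argument at the given offset on a 32-bit system (assumed
// little-endian in this sketch).
func splitRefs(name string, offset int) (lo, hi string) {
	lo = fmt.Sprintf("%s_lo+%d(FP)", name, offset)
	hi = fmt.Sprintf("%s_hi+%d(FP)", name, offset+4)
	return lo, hi
}

func main() {
	fmt.Println(argRefs([]string{"first_arg", "second_arg"}, 8))
	// [first_arg+0(FP) second_arg+8(FP)]

	lo, hi := splitRefs("arg", 0)
	fmt.Println(lo, hi) // arg_lo+0(FP) arg_hi+4(FP)
}
</pre>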
  214 
  215 <p>
  216 The <code>SP</code> pseudo-register is a virtual stack pointer
  217 used to refer to frame-local variables and the arguments being
  218 prepared for function calls.
  219 It points to the top of the local stack frame, so references should use negative offsets
  220 in the range [−framesize, 0):
  221 <code>x-8(SP)</code>, <code>y-4(SP)</code>, and so on.
  222 </p>
  223 
  224 <p>
  225 On architectures with a hardware register named <code>SP</code>,
  226 the name prefix distinguishes
  227 references to the virtual stack pointer from references to the architectural
  228 <code>SP</code> register.
  229 That is, <code>x-8(SP)</code> and <code>-8(SP)</code>
  230 are different memory locations:
  231 the first refers to the virtual stack pointer pseudo-register,
  232 while the second refers to the
  233 hardware's <code>SP</code> register.
  234 </p>
  235 
  236 <p>
  237 On machines where <code>SP</code> and <code>PC</code> are
  238 traditionally aliases for a physical, numbered register,
  239 in the Go assembler the names <code>SP</code> and <code>PC</code>
  240 are still treated specially;
  241 for instance, references to <code>SP</code> require a symbol,
  242 much like <code>FP</code>.
  243 To access the actual hardware register use the true <code>R</code> name.
  244 For example, on the ARM architecture the hardware
  245 <code>SP</code> and <code>PC</code> are accessible as
  246 <code>R13</code> and <code>R15</code>.
  247 </p>
  248 
  249 <p>
  250 Branches and direct jumps are always written as offsets to the PC, or as
  251 jumps to labels:
  252 </p>
  253 
  254 <pre>
  255 label:
  256     MOVW $0, R1
  257     JMP label
  258 </pre>
  259 
  260 <p>
  261 Each label is visible only within the function in which it is defined.
  262 It is therefore permitted for multiple functions in a file to define
  263 and use the same label names.
  264 Direct jumps and call instructions can target text symbols,
  265 such as <code>name(SB)</code>, but not offsets from symbols,
  266 such as <code>name+4(SB)</code>.
  267 </p>
  268 
  269 <p>
  270 Instructions, registers, and assembler directives are always in UPPER CASE to remind you
  271 that assembly programming is a fraught endeavor.
  272 (Exception: the <code>g</code> register renaming on ARM.)
  273 </p>
  274 
  275 <p>
  276 In Go object files and binaries, the full name of a symbol is the
  277 package path followed by a period and the symbol name:
  278 <code>fmt.Printf</code> or <code>math/rand.Int</code>.
  279 Because the assembler's parser treats period and slash as punctuation,
  280 those strings cannot be used directly as identifier names.
  281 Instead, the assembler allows the middle dot character U+00B7
  282 and the division slash U+2215 in identifiers and rewrites them to
  283 plain period and slash.
  284 Within an assembler source file, the symbols above are written as
  285 <code>fmt·Printf</code> and <code>math∕rand·Int</code>.
  286 The assembly listings generated by the compilers when using the <code>-S</code> flag
  287 show the period and slash directly instead of the Unicode replacements
  288 required by the assemblers.
  289 </p>
  290 
  291 <p>
  292 Most hand-written assembly files do not include the full package path
  293 in symbol names, because the linker inserts the package path of the current
  294 object file at the beginning of any name starting with a period:
  295 in an assembly source file within the math/rand package implementation,
  296 the package's Int function can be referred to as <code>·Int</code>.
  297 This convention avoids the need to hard-code a package's import path in its
  298 own source code, making it easier to move the code from one location to another.
  299 </p>
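
<p>
The two rewriting steps described above can be modeled with a small Go function.
This is only a sketch of the naming rules, not the toolchain's code; the name
<code>objectName</code> and its signature are invented for the example.
</p>

<pre>
package main

import (
	"fmt"
	"strings"
)

// objectName sketches the rewriting described above. It turns the
// middle dot (U+00B7) into a period and the division slash (U+2215)
// into a slash, then prefixes pkgPath when the result starts with a
// period.
func objectName(pkgPath, asmName string) string {
	name := strings.NewReplacer("\u00b7", ".", "\u2215", "/").Replace(asmName)
	if strings.HasPrefix(name, ".") {
		name = pkgPath + name
	}
	return name
}

func main() {
	fmt.Println(objectName("math/rand", "math\u2215rand\u00b7Int")) // math/rand.Int
	fmt.Println(objectName("math/rand", "\u00b7Int"))               // math/rand.Int
	fmt.Println(objectName("fmt", "fmt\u00b7Printf"))               // fmt.Printf
}
</pre>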
  300 
  301 <h3 id="directives">Directives</h3>
  302 
  303 <p>
  304 The assembler uses various directives to bind text and data to symbol names.
  305 For example, here is a simple complete function definition. The <code>TEXT</code>
  306 directive declares the symbol <code>runtime·profileloop</code> and the instructions
  307 that follow form the body of the function.
  308 The last instruction in a <code>TEXT</code> block must be some sort of jump, usually a <code>RET</code> (pseudo-)instruction.
  309 (If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in <code>TEXTs</code>.)
  310 After the symbol, the arguments are flags (see below)
  311 and the frame size, a constant (but see below):
  312 </p>
  313 
  314 <pre>
  315 TEXT runtime·profileloop(SB),NOSPLIT,$8
  316     MOVQ    $runtime·profileloop1(SB), CX
  317     MOVQ    CX, 0(SP)
  318     CALL    runtime·externalthreadhandler(SB)
  319     RET
  320 </pre>
  321 
  322 <p>
  323 In the general case, the frame size is followed by an argument size, separated by a minus sign.
  324 (It's not a subtraction, just idiosyncratic syntax.)
  325 The frame size <code>$24-8</code> states that the function has a 24-byte frame
  326 and is called with 8 bytes of argument, which live on the caller's frame.
  327 If <code>NOSPLIT</code> is not specified for the <code>TEXT</code>,
  328 the argument size must be provided.
  329 For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the
  330 argument size is correct.
  331 </p>
  332 
  333 <p>
  334 Note that the symbol name uses a middle dot to separate the components and is specified as an offset from the
  335 static base pseudo-register <code>SB</code>.
  336 This function would be called from Go source for package <code>runtime</code> using the
  337 simple name <code>profileloop</code>.
  338 </p>
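
<p>
To make the layout of the <code>TEXT</code> line concrete, here is a small Go sketch that formats one
from its parts.
The helper <code>textDirective</code> and the function name <code>add</code> are hypothetical,
and the sketch assumes a 64-bit machine where the argument size counts both the arguments and the result.
</p>

<pre>
package main

import (
	"fmt"
	"strings"
)

// textDirective formats a TEXT line for symbol with the given flag
// names, local frame size, and argument size, all in bytes. The minus
// sign between the two sizes is a separator, not a subtraction.
func textDirective(symbol string, flags []string, frameSize, argSize int) string {
	flagList := "0" // a numeric flag expression meaning "no flags"
	if len(flags) != 0 {
		flagList = strings.Join(flags, "|")
	}
	return fmt.Sprintf("TEXT %s(SB),%s,$%d-%d", symbol, flagList, frameSize, argSize)
}

func main() {
	// func add(a, b int64) int64: two 8-byte arguments plus one
	// 8-byte result give a 24-byte argument size and no local frame.
	fmt.Println(textDirective("\u00b7add", []string{"NOSPLIT"}, 0, 3*8))
	// TEXT ·add(SB),NOSPLIT,$0-24
}
</pre>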
  339 
  340 <p>
  341 Global data symbols are defined by a sequence of initializing
  342 <code>DATA</code> directives followed by a <code>GLOBL</code> directive.
  343 Each <code>DATA</code> directive initializes a section of the
  344 corresponding memory.
  345 The memory not explicitly initialized is zeroed.
The general form of the <code>DATA</code> directive is
</p>
  347 
  348 <pre>
  349 DATA    symbol+offset(SB)/width, value
  350 </pre>
  351 
  352 <p>
  353 which initializes the symbol memory at the given offset and width with the given value.
  354 The <code>DATA</code> directives for a given symbol must be written with increasing offsets.
  355 </p>
  356 
  357 <p>
  358 The <code>GLOBL</code> directive declares a symbol to be global.
  359 The arguments are optional flags and the size of the data being declared as a global,
  360 which will have initial value all zeros unless a <code>DATA</code> directive
  361 has initialized it.
  362 The <code>GLOBL</code> directive must follow any corresponding <code>DATA</code> directives.
  363 </p>
  364 
  365 <p>
  366 For example,
  367 </p>
  368 
  369 <pre>
  370 DATA divtab&lt;&gt;+0x00(SB)/4, $0xf4f8fcff
  371 DATA divtab&lt;&gt;+0x04(SB)/4, $0xe6eaedf0
  372 ...
  373 DATA divtab&lt;&gt;+0x3c(SB)/4, $0x81828384
  374 GLOBL divtab&lt;&gt;(SB), RODATA, $64
  375 
  376 GLOBL runtime·tlsoffset(SB), NOPTR, $4
  377 </pre>
  378 
  379 <p>
  380 declares and initializes <code>divtab&lt;&gt;</code>, a read-only 64-byte table of 4-byte integer values,
  381 and declares <code>runtime·tlsoffset</code>, a 4-byte, implicitly zeroed variable that
  382 contains no pointers.
  383 </p>
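
<p>
Tables like this are tedious to write by hand, so they are sometimes generated.
The following Go sketch emits the directives for a table of 4-byte values, keeping the offsets
increasing and placing <code>GLOBL</code> last.
The helper <code>dataDirectives</code>, the symbol name <code>·table</code>, and the choice of
sample values are placeholders for the example.
</p>

<pre>
package main

import "fmt"

// dataDirectives emits one DATA directive per 4-byte value, in
// increasing offset order, followed by the GLOBL directive that
// declares the symbol with its total size in bytes.
func dataDirectives(symbol, flags string, values []uint32) []string {
	const width = 4
	lines := make([]string, 0, len(values)+1)
	for i, v := range values {
		lines = append(lines,
			fmt.Sprintf("DATA %s+0x%02x(SB)/%d, $0x%08x", symbol, i*width, width, v))
	}
	lines = append(lines,
		fmt.Sprintf("GLOBL %s(SB), %s, $%d", symbol, flags, len(values)*width))
	return lines
}

func main() {
	values := []uint32{0xf4f8fcff, 0xe6eaedf0}
	for _, line := range dataDirectives("\u00b7table", "RODATA", values) {
		fmt.Println(line)
	}
	// DATA ·table+0x00(SB)/4, $0xf4f8fcff
	// DATA ·table+0x04(SB)/4, $0xe6eaedf0
	// GLOBL ·table(SB), RODATA, $8
}
</pre>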
  384 
  385 <p>
  386 There may be one or two arguments to the directives.
  387 If there are two, the first is a bit mask of flags,
  388 which can be written as numeric expressions, added or or-ed together,
  389 or can be set symbolically for easier absorption by a human.
  390 Their values, defined in the standard <code>#include</code>  file <code>textflag.h</code>, are:
  391 </p>
  392 
  393 <ul>
  394 <li>
  395 <code>NOPROF</code> = 1
  396 <br>
  397 (For <code>TEXT</code> items.)
  398 Don't profile the marked function.  This flag is deprecated.
  399 </li>
  400 <li>
  401 <code>DUPOK</code> = 2
  402 <br>
  403 It is legal to have multiple instances of this symbol in a single binary.
  404 The linker will choose one of the duplicates to use.
  405 </li>
  406 <li>
  407 <code>NOSPLIT</code> = 4
  408 <br>
  409 (For <code>TEXT</code> items.)
  410 Don't insert the preamble to check if the stack must be split.
  411 The frame for the routine, plus anything it calls, must fit in the
  412 spare space at the top of the stack segment.
  413 Used to protect routines such as the stack splitting code itself.
  414 </li>
  415 <li>
  416 <code>RODATA</code> = 8
  417 <br>
  418 (For <code>DATA</code> and <code>GLOBL</code> items.)
  419 Put this data in a read-only section.
  420 </li>
  421 <li>
  422 <code>NOPTR</code> = 16
  423 <br>
  424 (For <code>DATA</code> and <code>GLOBL</code> items.)
  425 This data contains no pointers and therefore does not need to be
  426 scanned by the garbage collector.
  427 </li>
  428 <li>
  429 <code>WRAPPER</code> = 32
  430 <br>
  431 (For <code>TEXT</code> items.)
  432 This is a wrapper function and should not count as disabling <code>recover</code>.
  433 </li>
  434 <li>
  435 <code>NEEDCTXT</code> = 64
  436 <br>
  437 (For <code>TEXT</code> items.)
  438 This function is a closure so it uses its incoming context register.
  439 </li>
  440 </ul>
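
<p>
Because each flag occupies its own bit, adding distinct flags and or-ing them give the same mask.
The Go sketch below mirrors the values listed above to show this; the helper <code>combine</code>
is an invention of the example, not part of <code>textflag.h</code>.
</p>

<pre>
package main

import "fmt"

// Flag values as listed above.
const (
	NOPROF   = 1
	DUPOK    = 2
	NOSPLIT  = 4
	RODATA   = 8
	NOPTR    = 16
	WRAPPER  = 32
	NEEDCTXT = 64
)

// combine ors the given flags into a single bit mask.
func combine(flags ...int) int {
	mask := 0
	for _, f := range flags {
		mask |= f
	}
	return mask
}

func main() {
	fmt.Println(combine(RODATA, NOPTR), RODATA+NOPTR) // 24 24
	fmt.Println(combine(NOSPLIT, WRAPPER))            // 36
}
</pre>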
  441 
  442 <h3 id="runtime">Runtime Coordination</h3>
  443 
  444 <p>
  445 For garbage collection to run correctly, the runtime must know the
  446 location of pointers in all global data and in most stack frames.
  447 The Go compiler emits this information when compiling Go source files,
  448 but assembly programs must define it explicitly.
  449 </p>
  450 
  451 <p>
  452 A data symbol marked with the <code>NOPTR</code> flag (see above)
  453 is treated as containing no pointers to runtime-allocated data.
  454 A data symbol with the <code>RODATA</code> flag
  455 is allocated in read-only memory and is therefore treated
  456 as implicitly marked <code>NOPTR</code>.
  457 A data symbol with a total size smaller than a pointer
  458 is also treated as implicitly marked <code>NOPTR</code>.
  459 It is not possible to define a symbol containing pointers in an assembly source file;
  460 such a symbol must be defined in a Go source file instead.
  461 Assembly source can still refer to the symbol by name
  462 even without <code>DATA</code> and <code>GLOBL</code> directives.
  463 A good general rule of thumb is to define all non-<code>RODATA</code>
  464 symbols in Go instead of in assembly.
  465 </p>
  466 
  467 <p>
  468 Each function also needs annotations giving the location of
  469 live pointers in its arguments, results, and local stack frame.
  470 For an assembly function with no pointer results and
  471 either no local stack frame or no function calls,
  472 the only requirement is to define a Go prototype for the function
  473 in a Go source file in the same package. The name of the assembly
  474 function must not contain the package name component (for example,
  475 function <code>Syscall</code> in package <code>syscall</code> should
  476 use the name <code>·Syscall</code> instead of the equivalent name
  477 <code>syscall·Syscall</code> in its <code>TEXT</code> directive).
  478 For more complex situations, explicit annotation is needed.
  479 These annotations use pseudo-instructions defined in the standard
  480 <code>#include</code> file <code>funcdata.h</code>.
  481 </p>
  482 
  483 <p>
  484 If a function has no arguments and no results,
  485 the pointer information can be omitted.
  486 This is indicated by an argument size annotation of <code>$<i>n</i>-0</code>
  487 on the <code>TEXT</code> instruction.
  488 Otherwise, pointer information must be provided by
  489 a Go prototype for the function in a Go source file,
  490 even for assembly functions not called directly from Go.
  491 (The prototype will also let <code>go</code> <code>vet</code> check the argument references.)
  492 At the start of the function, the arguments are assumed
  493 to be initialized but the results are assumed uninitialized.
  494 If the results will hold live pointers during a call instruction,
  495 the function should start by zeroing the results and then
  496 executing the pseudo-instruction <code>GO_RESULTS_INITIALIZED</code>.
  497 This instruction records that the results are now initialized
  498 and should be scanned during stack movement and garbage collection.
  499 It is typically easier to arrange that assembly functions do not
  500 return pointers or do not contain call instructions;
  501 no assembly functions in the standard library use
  502 <code>GO_RESULTS_INITIALIZED</code>.
  503 </p>
  504 
  505 <p>
  506 If a function has no local stack frame,
  507 the pointer information can be omitted.
  508 This is indicated by a local frame size annotation of <code>$0-<i>n</i></code>
  509 on the <code>TEXT</code> instruction.
  510 The pointer information can also be omitted if the
  511 function contains no call instructions.
  512 Otherwise, the local stack frame must not contain pointers,
  513 and the assembly must confirm this fact by executing the
  514 pseudo-instruction <code>NO_LOCAL_POINTERS</code>.
  515 Because stack resizing is implemented by moving the stack,
  516 the stack pointer may change during any function call:
  517 even pointers to stack data must not be kept in local variables.
  518 </p>
  519 
  520 <p>
  521 Assembly functions should always be given Go prototypes,
  522 both to provide pointer information for the arguments and results
  523 and to let <code>go</code> <code>vet</code> check that
  524 the offsets being used to access them are correct.
  525 </p>
  526 
  527 <h2 id="architectures">Architecture-specific details</h2>
  528 
  529 <p>
  530 It is impractical to list all the instructions and other details for each machine.
  531 To see what instructions are defined for a given machine, say ARM,
  532 look in the source for the <code>obj</code> support library for
  533 that architecture, located in the directory <code>src/cmd/internal/obj/arm</code>.
  534 In that directory is a file <code>a.out.go</code>; it contains
  535 a long list of constants starting with <code>A</code>, like this:
  536 </p>
  537 
  538 <pre>
  539 const (
  540     AAND = obj.ABaseARM + obj.A_ARCHSPECIFIC + iota
  541     AEOR
  542     ASUB
  543     ARSB
  544     AADD
  545     ...
  546 </pre>
  547 
  548 <p>
  549 This is the list of instructions and their spellings as known to the assembler and linker for that architecture.
  550 Each instruction begins with an initial capital <code>A</code> in this list, so <code>AAND</code>
  551 represents the bitwise and instruction,
  552 <code>AND</code> (without the leading <code>A</code>),
  553 and is written in assembly source as <code>AND</code>.
  554 The enumeration is mostly in alphabetical order.
  555 (The architecture-independent <code>AXXX</code>, defined in the
  556 <code>cmd/internal/obj</code> package,
  557 represents an invalid instruction).
  558 The sequence of the <code>A</code> names has nothing to do with the actual
  559 encoding of the machine instructions.
  560 The <code>cmd/internal/obj</code> package takes care of that detail.
  561 </p>
  562 
  563 <p>
  564 The instructions for both the 386 and AMD64 architectures are listed in
  565 <code>cmd/internal/obj/x86/a.out.go</code>.
  566 </p>
  567 
  568 <p>
  569 The architectures share syntax for common addressing modes such as
  570 <code>(R1)</code> (register indirect),
  571 <code>4(R1)</code> (register indirect with offset), and
  572 <code>$foo(SB)</code> (absolute address).
  573 The assembler also supports some (not necessarily all) addressing modes
  574 specific to each architecture.
  575 The sections below list these.
  576 </p>
  577 
  578 <p>
  579 One detail evident in the examples from the previous sections is that data in the instructions flows from left to right:
  580 <code>MOVQ</code> <code>$0,</code> <code>CX</code> clears <code>CX</code>.
  581 This rule applies even on architectures where the conventional notation uses the opposite direction.
  582 </p>
  583 
  584 <p>
  585 Here follow some descriptions of key Go-specific details for the supported architectures.
  586 </p>
  587 
  588 <h3 id="x86">32-bit Intel 386</h3>
  589 
  590 <p>
  591 The runtime pointer to the <code>g</code> structure is maintained
  592 through the value of an otherwise unused (as far as Go is concerned) register in the MMU.
  593 An OS-dependent macro <code>get_tls</code> is defined for the assembler if the source is
  594 in the <code>runtime</code> package and includes a special header, <code>go_tls.h</code>:
  595 </p>
  596 
  597 <pre>
  598 #include "go_tls.h"
  599 </pre>
  600 
  601 <p>
  602 Within the runtime, the <code>get_tls</code> macro loads its argument register
  603 with a pointer to the <code>g</code> pointer, and the <code>g</code> struct
  604 contains the <code>m</code> pointer.
  605 There's another special header containing the offsets for each
  606 element of <code>g</code>, called <code>go_asm.h</code>.
  607 The sequence to load <code>g</code> and <code>m</code> using <code>CX</code> looks like this:
  608 </p>
  609 
  610 <pre>
  611 #include "go_tls.h"
  612 #include "go_asm.h"
  613 ...
  614 get_tls(CX)
  615 MOVL    g(CX), AX     // Move g into AX.
  616 MOVL    g_m(AX), BX   // Move g.m into BX.
  617 </pre>
  618 
  619 <p>
  620 Note: The code above works only in the <code>runtime</code> package, while <code>go_tls.h</code> also
  621 applies to <a href="#arm">arm</a>, <a href="#amd64">amd64</a> and amd64p32, and <code>go_asm.h</code> applies to all architectures.
  622 </p>
  623 
  624 <p>
  625 Addressing modes:
  626 </p>
  627 
  628 <ul>
  629 
  630 <li>
  631 <code>(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code>.
  632 </li>
  633 
  634 <li>
  635 <code>64(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code> plus 64.
  636 These modes accept only 1, 2, 4, and 8 as scale factors.
  637 </li>
  638 
  639 </ul>
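
<p>
The arithmetic behind these operands can be written out in Go.
This is a sketch of the address calculation only, with a hypothetical helper
<code>effectiveAddress</code>; it assumes 32-bit registers and wraps on overflow.
</p>

<pre>
package main

import "fmt"

// effectiveAddress computes the address named by an operand of the
// form disp(base)(index*scale). Only 1, 2, 4, and 8 are accepted as
// scale factors.
func effectiveAddress(base, index, scale uint32, disp int32) (uint32, error) {
	switch scale {
	case 1, 2, 4, 8:
		return base + index*scale + uint32(disp), nil
	default:
		return 0, fmt.Errorf("invalid scale factor %d", scale)
	}
}

func main() {
	// 64(DI)(BX*2) with DI = 0x1000 and BX = 3.
	addr, err := effectiveAddress(0x1000, 3, 2, 64)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#x\n", addr) // 0x1046

	_, err = effectiveAddress(0x1000, 3, 3, 0)
	fmt.Println(err) // invalid scale factor 3
}
</pre>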
  640 
  641 <p>
  642 When using the compiler and assembler's
  643 <code>-dynlink</code> or <code>-shared</code> modes,
  644 any load or store of a fixed memory location such as a global variable
  645 must be assumed to overwrite <code>CX</code>.
  646 Therefore, to be safe for use with these modes,
assembly sources should typically avoid <code>CX</code> except between memory references.
  648 </p>
  649 
  650 <h3 id="amd64">64-bit Intel 386 (a.k.a. amd64)</h3>
  651 
  652 <p>
  653 The two architectures behave largely the same at the assembler level.
  654 Assembly code to access the <code>m</code> and <code>g</code>
  655 pointers on the 64-bit version is the same as on the 32-bit 386,
  656 except it uses <code>MOVQ</code> rather than <code>MOVL</code>:
  657 </p>
  658 
  659 <pre>
  660 get_tls(CX)
  661 MOVQ    g(CX), AX     // Move g into AX.
  662 MOVQ    g_m(AX), BX   // Move g.m into BX.
  663 </pre>
  664 
  665 <h3 id="arm">ARM</h3>
  666 
  667 <p>
  668 The registers <code>R10</code> and <code>R11</code>
  669 are reserved by the compiler and linker.
  670 </p>
  671 
  672 <p>
  673 <code>R10</code> points to the <code>g</code> (goroutine) structure.
  674 Within assembler source code, this pointer must be referred to as <code>g</code>;
  675 the name <code>R10</code> is not recognized.
  676 </p>
  677 
  678 <p>
  679 To make it easier for people and compilers to write assembly, the ARM linker
  680 allows general addressing forms and pseudo-operations like <code>DIV</code> or <code>MOD</code>
  681 that may not be expressible using a single hardware instruction.
  682 It implements these forms as multiple instructions, often using the <code>R11</code> register
  683 to hold temporary values.
  684 Hand-written assembly can use <code>R11</code>, but doing so requires
  685 being sure that the linker is not also using it to implement any of the other
  686 instructions in the function.
  687 </p>
  688 
  689 <p>
  690 When defining a <code>TEXT</code>, specifying frame size <code>$-4</code>
  691 tells the linker that this is a leaf function that does not need to save <code>LR</code> on entry.
  692 </p>
  693 
  694 <p>
  695 The name <code>SP</code> always refers to the virtual stack pointer described earlier.
  696 For the hardware register, use <code>R13</code>.
  697 </p>
  698 
  699 <p>
  700 Condition code syntax is to append a period and the one- or two-letter code to the instruction,
  701 as in <code>MOVW.EQ</code>.
  702 Multiple codes may be appended: <code>MOVM.IA.W</code>.
  703 The order of the code modifiers is irrelevant.
  704 </p>
  705 
  706 <p>
  707 Addressing modes:
  708 </p>
  709 
  710 <ul>
  711 
  712 <li>
  713 <code>R0-&gt;16</code>
  714 <br>
  715 <code>R0&gt;&gt;16</code>
  716 <br>
  717 <code>R0&lt;&lt;16</code>
  718 <br>
  719 <code>R0@&gt;16</code>:
  720 For <code>&lt;&lt;</code>, left shift <code>R0</code> by 16 bits.
  721 The other codes are <code>-&gt;</code> (arithmetic right shift),
  722 <code>&gt;&gt;</code> (logical right shift), and
  723 <code>@&gt;</code> (rotate right).
  724 </li>
  725 
  726 <li>
  727 <code>R0-&gt;R1</code>
  728 <br>
  729 <code>R0&gt;&gt;R1</code>
  730 <br>
  731 <code>R0&lt;&lt;R1</code>
  732 <br>
  733 <code>R0@&gt;R1</code>:
  734 For <code>&lt;&lt;</code>, left shift <code>R0</code> by the count in <code>R1</code>.
  735 The other codes are <code>-&gt;</code> (arithmetic right shift),
  736 <code>&gt;&gt;</code> (logical right shift), and
  737 <code>@&gt;</code> (rotate right).
  738 
  739 </li>
  740 
  741 <li>
  742 <code>[R0,g,R12-R15]</code>: For multi-register instructions, the set comprising
  743 <code>R0</code>, <code>g</code>, and <code>R12</code> through <code>R15</code> inclusive.
  744 </li>
  745 
  746 <li>
  747 <code>(R5, R6)</code>: Destination register pair.
  748 </li>
  749 
  750 </ul>
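<p>
As a rough illustration of the four shift operators above, the following Go sketch models each one on a 32-bit value with a constant shift count.
The function names are hypothetical, and the model assumes a count in the range 0 to 31; it does not try to reproduce hardware behavior for larger or register-supplied counts.
</p>

```go
package main

import (
	"fmt"
	"math/bits"
)

// shiftLeft models R0<<n: a logical left shift.
func shiftLeft(r0 uint32, n uint) uint32 { return r0 << n }

// logicalRight models R0>>n: vacated high bits are filled with zeros.
func logicalRight(r0 uint32, n uint) uint32 { return r0 >> n }

// arithmeticRight models R0->n: vacated high bits are copies of the sign bit.
func arithmeticRight(r0 uint32, n uint) uint32 { return uint32(int32(r0) >> n) }

// rotateRight models R0@>n: bits shifted out on the right re-enter on the left.
func rotateRight(r0 uint32, n uint) uint32 { return bits.RotateLeft32(r0, -int(n)) }

func main() {
	const r0 = 0x80010002
	fmt.Printf("R0<<16  = %08x\n", shiftLeft(r0, 16))
	fmt.Printf("R0>>16  = %08x\n", logicalRight(r0, 16))
	fmt.Printf("R0->16  = %08x\n", arithmeticRight(r0, 16))
	fmt.Printf("R0@>16  = %08x\n", rotateRight(r0, 16))
}
```

<p>
The sample value has its top bit set, so the logical and arithmetic right shifts give visibly different results.
</p>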
  751 
  752 <h3 id="arm64">ARM64</h3>
  753 
  754 <p>
  755 The ARM64 port is in an experimental state.
  756 </p>
  757 
  758 <p>
  759 <code>R18</code> is the "platform register", reserved on the Apple platform.
  760 To prevent accidental misuse, the register is named <code>R18_PLATFORM</code>.
  761 <code>R27</code> and <code>R28</code> are reserved by the compiler and linker.
  762 <code>R29</code> is the frame pointer.
  763 <code>R30</code> is the link register.
  764 </p>
  765 
  766 <p>
  767 Instruction modifiers are appended to the instruction following a period.
  768 The only modifiers are <code>P</code> (postincrement) and <code>W</code>
  769 (preincrement):
  770 <code>MOVW.P</code>, <code>MOVW.W</code>.
  771 </p>
  772 
  773 <p>
  774 Addressing modes:
  775 </p>
  776 
  777 <ul>
  778 
  779 <li>
  780 <code>R0-&gt;16</code>
  781 <br>
  782 <code>R0&gt;&gt;16</code>
  783 <br>
  784 <code>R0&lt;&lt;16</code>
  785 <br>
  786 <code>R0@&gt;16</code>:
  787 These are the same as on the 32-bit ARM.
  788 </li>
  789 
  790 <li>
  791 <code>$(8&lt;&lt;12)</code>:
  792 Left shift the immediate value <code>8</code> by <code>12</code> bits.
  793 </li>
  794 
  795 <li>
  796 <code>8(R0)</code>:
  797 The location at <code>R0</code> plus <code>8</code>.
  798 </li>
  799 
  800 <li>
  801 <code>(R2)(R0)</code>:
  802 The location at <code>R0</code> plus <code>R2</code>.
  803 </li>
  804 
  805 <li>
  806 <code>R0.UXTB</code>
  807 <br>
  808 <code>R0.UXTB&lt;&lt;imm</code>:
  809 <code>UXTB</code>: extract an 8-bit value from the low-order bits of <code>R0</code> and zero-extend it to the size of <code>R0</code>.
  810 <code>R0.UXTB&lt;&lt;imm</code>: left shift the result of <code>R0.UXTB</code> by <code>imm</code> bits.
  811 The <code>imm</code> value can be 0, 1, 2, 3, or 4.
  812 The other extensions include <code>UXTH</code> (16-bit), <code>UXTW</code> (32-bit), and <code>UXTX</code> (64-bit).
  813 </li>
  814 
  815 <li>
  816 <code>R0.SXTB</code>
  817 <br>
  818 <code>R0.SXTB&lt;&lt;imm</code>:
  819 <code>SXTB</code>: extract an 8-bit value from the low-order bits of <code>R0</code> and sign-extend it to the size of <code>R0</code>.
  820 <code>R0.SXTB&lt;&lt;imm</code>: left shift the result of <code>R0.SXTB</code> by <code>imm</code> bits.
  821 The <code>imm</code> value can be 0, 1, 2, 3, or 4.
  822 The other extensions include <code>SXTH</code> (16-bit), <code>SXTW</code> (32-bit), and <code>SXTX</code> (64-bit).
  823 </li>
  824 
  825 <li>
  826 <code>(R5, R6)</code>: Register pair for <code>LDAXP</code>/<code>LDP</code>/<code>LDXP</code>/<code>STLXP</code>/<code>STP</code>/<code>STXP</code>.
  827 </li>
  828 
  829 </ul>
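<p>
The extension operands can be read as "truncate, extend, then shift".
The Go sketch below models only the byte variants, <code>UXTB</code> and <code>SXTB</code>, on a 64-bit register; the function names are hypothetical and the other widths would follow the same pattern with wider intermediate types.
</p>

```go
package main

import "fmt"

// uxtb models R0.UXTB<<imm: keep the low 8 bits of r0, zero-extend
// them to 64 bits, then shift the result left by imm bits.
func uxtb(r0 uint64, imm uint) uint64 { return uint64(uint8(r0)) << imm }

// sxtb models R0.SXTB<<imm: keep the low 8 bits of r0, sign-extend
// them to 64 bits, then shift the result left by imm bits.
func sxtb(r0 uint64, imm uint) uint64 { return uint64(int64(int8(r0))) << imm }

func main() {
	const r0 = 0x123456F0 // the low byte 0xF0 has its sign bit set
	fmt.Printf("R0.UXTB    = %016x\n", uxtb(r0, 0))
	fmt.Printf("R0.UXTB<<4 = %016x\n", uxtb(r0, 4))
	fmt.Printf("R0.SXTB    = %016x\n", sxtb(r0, 0))
	fmt.Printf("R0.SXTB<<4 = %016x\n", sxtb(r0, 4))
}
```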
  830 
  831 <p>
  832 Reference: <a href="/pkg/cmd/internal/obj/arm64">Go ARM64 Assembly Instructions Reference Manual</a>
  833 </p>
  834 
  835 <h3 id="ppc64">PPC64</h3>
  836 
  837 <p>
  838 This assembler is used by GOARCH values ppc64 and ppc64le.
  839 </p>
  840 
  841 <p>
  842 Reference: <a href="/pkg/cmd/internal/obj/ppc64">Go PPC64 Assembly Instructions Reference Manual</a>
  843 </p>
  844 
  846 
  847 <h3 id="s390x">IBM z/Architecture, a.k.a. s390x</h3>
  848 
  849 <p>
  850 The registers <code>R10</code> and <code>R11</code> are reserved.
  851 The assembler uses them to hold temporary values when assembling some instructions.
  852 </p>
  853 
  854 <p>
  855 <code>R13</code> points to the <code>g</code> (goroutine) structure.
  856 This register must be referred to as <code>g</code>; the name <code>R13</code> is not recognized.
  857 </p>
  858 
  859 <p>
  860 <code>R15</code> points to the stack frame and should typically only be accessed using the
  861 virtual registers <code>SP</code> and <code>FP</code>.
  862 </p>
  863 
  864 <p>
  865 Load- and store-multiple instructions operate on a range of registers.
  866 The range of registers is specified by a start register and an end register.
  867 For example, <code>LMG</code> <code>(R9),</code> <code>R5,</code> <code>R7</code> would load
  868 <code>R5</code>, <code>R6</code> and <code>R7</code> with the 64-bit values at
  869 <code>0(R9)</code>, <code>8(R9)</code> and <code>16(R9)</code> respectively.
  870 </p>
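<p>
The effect of the <code>LMG</code> example can be sketched in Go as a loop over consecutive 8-byte slots.
This is only a model under stated assumptions: the function name and the register-file array are hypothetical, memory is read big-endian because s390x is a big-endian architecture, and the model handles only the case where the start register is not greater than the end register.
</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// lmg models LMG (base), Rfirst, Rlast: registers first through last
// are loaded from consecutive 64-bit big-endian values starting at mem[0].
func lmg(mem []byte, regs *[16]uint64, first, last int) {
	for r, off := first, 0; r <= last; r, off = r+1, off+8 {
		regs[r] = binary.BigEndian.Uint64(mem[off:])
	}
}

func main() {
	// mem stands in for the 24 bytes that R9 points to.
	mem := make([]byte, 24)
	for i, v := range []uint64{0x11, 0x22, 0x33} {
		binary.BigEndian.PutUint64(mem[8*i:], v)
	}

	var regs [16]uint64
	lmg(mem, &regs, 5, 7) // LMG (R9), R5, R7
	fmt.Printf("R5=%#x R6=%#x R7=%#x\n", regs[5], regs[6], regs[7])
}
```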
  871 
  872 <p>
  873 Storage-and-storage instructions such as <code>MVC</code> and <code>XC</code> are written
  874 with the length as the first argument.
  875 For example, <code>XC</code> <code>$8,</code> <code>(R9),</code> <code>(R9)</code> would clear
  876 eight bytes at the address specified in <code>R9</code>.
  877 </p>
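<p>
The <code>XC</code> example clears memory because any value exclusive-ORed with itself is zero.
A minimal Go sketch of that idea follows; the function name and parameter names are hypothetical, and the two slices stand in for the two storage operands.
</p>

```go
package main

import "fmt"

// xc models XC $n, (a), (b): the first n bytes of b are replaced by
// the exclusive OR of the corresponding bytes of a and b.
func xc(n int, a, b []byte) {
	for i := 0; i < n; i++ {
		b[i] ^= a[i]
	}
}

func main() {
	buf := []byte{1, 2, 3, 4, 5, 6, 7, 8, 9}
	xc(8, buf, buf) // both operands name the same storage, as in XC $8, (R9), (R9)
	fmt.Println(buf)
}
```

<p>
Only the first eight bytes are cleared; the ninth byte is outside the given length and is left alone.
</p>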
  878 
  879 <p>
  880 If a vector instruction takes a length or an index as an argument then it will be the
  881 first argument.
  882 For example, <code>VLEIF</code> <code>$1,</code> <code>$16,</code> <code>V2</code> will load
  883 the value sixteen into index one of <code>V2</code>.
  884 Care should be taken when using vector instructions to ensure that they are available at
  885 runtime.
  886 To use vector instructions a machine must have both the vector facility (bit 129 in the
  887 facility list) and kernel support.
  888 Without kernel support a vector instruction will have no effect (it will be equivalent
  889 to a <code>NOP</code> instruction).
  890 </p>
  891 
  892 <p>
  893 Addressing modes:
  894 </p>
  895 
  896 <ul>
  897 
  898 <li>
  899 <code>(R5)(R6*1)</code>: The location at <code>R5</code> plus <code>R6</code>.
  900 It is a scaled mode as on the x86, but the only scale allowed is <code>1</code>.
  901 </li>
  902 
  903 </ul>
  904 
  905 <h3 id="mips">MIPS, MIPS64</h3>
  906 
  907 <p>
  908 General-purpose registers are named <code>R0</code> through <code>R31</code>;
  909 floating-point registers are <code>F0</code> through <code>F31</code>.
  910 </p>
  911 
  912 <p>
  913 <code>R30</code> is reserved to point to <code>g</code>.
  914 <code>R23</code> is used as a temporary register.
  915 </p>
  916 
  917 <p>
  918 In a <code>TEXT</code> directive, the frame size <code>$-4</code> for MIPS or
  919 <code>$-8</code> for MIPS64 instructs the linker not to save <code>LR</code>.
  920 </p>
  921 
  922 <p>
  923 <code>SP</code> refers to the virtual stack pointer.
  924 For the hardware register, use <code>R29</code>.
  925 </p>
  926 
  927 <p>
  928 Addressing modes:
  929 </p>
  930 
  931 <ul>
  932 
  933 <li>
  934 <code>16(R1)</code>: The location at <code>R1</code> plus 16.
  935 </li>
  936 
  937 <li>
  938 <code>(R1)</code>: Alias for <code>0(R1)</code>.
  939 </li>
  940 
  941 </ul>
  942 
  943 <p>
  944 The value of the <code>GOMIPS</code> environment variable (<code>hardfloat</code> or
  945 <code>softfloat</code>) is made available to assembly code by predefining either
  946 <code>GOMIPS_hardfloat</code> or <code>GOMIPS_softfloat</code>.
  947 </p>
  948 
  949 <p>
  950 The value of the <code>GOMIPS64</code> environment variable (<code>hardfloat</code> or
  951 <code>softfloat</code>) is made available to assembly code by predefining either
  952 <code>GOMIPS64_hardfloat</code> or <code>GOMIPS64_softfloat</code>.
  953 </p>
  954 
  955 <h3 id="unsupported_opcodes">Unsupported opcodes</h3>
  956 
  957 <p>
  958 The assemblers are designed to support the compiler, so not all hardware instructions
  959 are defined for all architectures: if the compiler doesn't generate an instruction, it might not be there.
  960 If you need to use a missing instruction, there are two ways to proceed.
  961 One is to update the assembler to support that instruction, which is straightforward
  962 but only worthwhile if it's likely the instruction will be used again.
  963 The other, for simple one-off cases, is to use the <code>BYTE</code>
  964 and <code>WORD</code> directives
  965 to lay down explicit data into the instruction stream within a <code>TEXT</code>.
  966 Here's how the 386 runtime defines the 64-bit atomic load function.
  967 </p>
  968 
  969 <pre>
  970 // uint64 atomicload64(uint64 volatile* addr);
  971 // so actually
  972 // void atomicload64(uint64 *res, uint64 volatile *addr);
  973 TEXT runtime·atomicload64(SB), NOSPLIT, $0-12
  974     MOVL    ptr+0(FP), AX
  975     TESTL   $7, AX
  976     JZ  2(PC)
  977     MOVL    0, AX // crash with nil ptr deref
  978     LEAL    ret_lo+4(FP), BX
  979     // MOVQ (%EAX), %MM0
  980     BYTE $0x0f; BYTE $0x6f; BYTE $0x00
  981     // MOVQ %MM0, 0(%EBX)
  982     BYTE $0x0f; BYTE $0x7f; BYTE $0x03
  983     // EMMS
  984     BYTE $0x0F; BYTE $0x77
  985     RET
  986 </pre>
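<p>
In the two hand-encoded <code>MOVQ</code> instructions above, the third byte is an x86 ModR/M byte that selects the operands.
The Go sketch below splits such a byte into its three fields so the encodings can be checked against the comments; the function name is hypothetical.
With <code>mod</code> equal to 0, <code>reg</code> 0 names <code>MM0</code> in this instruction, while <code>rm</code> values 0 and 3 name memory addressed through <code>EAX</code> and <code>EBX</code> respectively.
</p>

```go
package main

import "fmt"

// modRM splits an x86 ModR/M byte into its fields:
// mod (top 2 bits), reg (middle 3 bits), and rm (low 3 bits).
func modRM(b byte) (mod, reg, rm byte) {
	return b >> 6, (b >> 3) & 7, b & 7
}

func main() {
	// 0x00 follows the 0x0f 0x6f opcode; 0x03 follows the 0x0f 0x7f opcode.
	for _, b := range []byte{0x00, 0x03} {
		mod, reg, rm := modRM(b)
		fmt.Printf("ModR/M 0x%02x: mod=%d reg=%d rm=%d\n", b, mod, reg, rm)
	}
}
```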