"Fossies" - the Fresh Open Source Software Archive

Member "gmp-6.2.1/mpn/pa32/README" (14 Nov 2020, 3526 Bytes) of package /linux/misc/gmp-6.2.1.tar.xz:


As a special service "Fossies" has tried to format the requested text file into HTML format (style: standard) with prefixed line numbers.

    1 Copyright 1996, 1999, 2001, 2002, 2004 Free Software Foundation, Inc.
    2 
    3 This file is part of the GNU MP Library.
    4 
    5 The GNU MP Library is free software; you can redistribute it and/or modify
    6 it under the terms of either:
    7 
    8   * the GNU Lesser General Public License as published by the Free
    9     Software Foundation; either version 3 of the License, or (at your
   10     option) any later version.
   11 
   12 or
   13 
   14   * the GNU General Public License as published by the Free Software
   15     Foundation; either version 2 of the License, or (at your option) any
   16     later version.
   17 
   18 or both in parallel, as here.
   19 
   20 The GNU MP Library is distributed in the hope that it will be useful, but
   21 WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
   22 or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
   23 for more details.
   24 
   25 You should have received copies of the GNU General Public License and the
   26 GNU Lesser General Public License along with the GNU MP Library.  If not,
   27 see https://www.gnu.org/licenses/.
   28 
   29 
   30 
   31 
   32 
   33 
   34 This directory contains mpn functions for various HP PA-RISC chips.  Code
   35 that runs faster on the PA7100 and later implementations is in the pa7100
   36 directory.
   37 
   38 RELEVANT OPTIMIZATION ISSUES
   39 
   40   Load and Store timing
   41 
   42 On the PA7000, no memory instruction can issue in the two cycles after a
   43 store.  For the PA7100, this is reduced to one cycle.
   44 
   45 The PA7100 has a lockup-free cache, so it helps to schedule each load and
   46 its dependent instruction far apart.
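
The scheduling idea can be illustrated in C, as a rough sketch only.  The
load for the next element is issued before the current element is used, so
the load latency can overlap with arithmetic.  The name sum_pipelined is
hypothetical and not part of GMP, and a compiler is free to reorder this
code; on these chips the scheduling is done by hand in assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not GMP code: sum n 32-bit words, issuing the
   load for element i before the value of element i-1 is consumed. */
uint32_t sum_pipelined(const uint32_t *p, size_t n)
{
    uint32_t sum = 0;
    uint32_t cur;

    if (n == 0)
        return 0;

    cur = p[0];                     /* prologue: first load */
    for (size_t i = 1; i < n; i++) {
        uint32_t next = p[i];       /* load issued early ... */
        sum += cur;                 /* ... while the previous value is used */
        cur = next;
    }
    return sum + cur;               /* epilogue: consume the last load */
}
```

The prologue and epilogue are the cost of the technique: one load happens
before the loop and one use happens after it.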
   47 
   48 STATUS
   49 
   50 1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
   51    instructions below (but some software pipelining is needed to avoid the
   52    xmpyu-fstds delay):
   53 
   54 	fldds	s1_ptr
   55 
   56 	xmpyu
   57 	fstds	N(%r30)
   58 	xmpyu
   59 	fstds	N(%r30)
   60 
   61 	ldws	N(%r30)
   62 	ldws	N(%r30)
   63 	ldws	N(%r30)
   64 	ldws	N(%r30)
   65 
   66 	addc
   67 	stws	res_ptr
   68 	addc
   69 	stws	res_ptr
   70 
   71 	addib	Loop
   72 
   73 2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
   74    (asymptotically) on the PA7100, using the instructions below.  With proper
   75    software pipelining and the unrolling level below, the speed becomes 8
   76    cycles/limb.
   77 
   78 	fldds	s1_ptr
   79 	fldds	s1_ptr
   80 
   81 	xmpyu
   82 	fstds	N(%r30)
   83 	xmpyu
   84 	fstds	N(%r30)
   85 	xmpyu
   86 	fstds	N(%r30)
   87 	xmpyu
   88 	fstds	N(%r30)
   89 
   90 	ldws	N(%r30)
   91 	ldws	N(%r30)
   92 	ldws	N(%r30)
   93 	ldws	N(%r30)
   94 	ldws	N(%r30)
   95 	ldws	N(%r30)
   96 	ldws	N(%r30)
   97 	ldws	N(%r30)
   98 	addc
   99 	addc
  100 	addc
  101 	addc
  102 	addc	%r0,%r0,cy-limb
  103 
  104 	ldws	res_ptr
  105 	ldws	res_ptr
  106 	ldws	res_ptr
  107 	ldws	res_ptr
  108 	add
  109 	stws	res_ptr
  110 	addc
  111 	stws	res_ptr
  112 	addc
  113 	stws	res_ptr
  114 	addc
  115 	stws	res_ptr
  116 
  117 	addib
  118 
  119 3. For the PA8000 we have to stick to 32-bit limbs until compiler support
  120    emerges.  But we want to use 64-bit operations whenever possible, in
  121    particular for loads and stores.  It is possible to handle mpn_add_n
  122    efficiently by rotating (when s1/s2 are aligned) or by masking+bit field
  123    inserting (when they are not).  The speed should double compared to the
  124    code used today.
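
For readers unfamiliar with the two routines in items 1 and 2, the
following C sketch shows what they compute with 32-bit limbs.  It is a
plain reference, not the scheduled code discussed above; the names
ref_mul_1 and ref_addmul_1 and the limb_t typedef are hypothetical.  The
64-bit product in each iteration corresponds to what xmpyu produces.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;    /* assumption: 32-bit limbs, as in this directory */

/* Hypothetical reference: rp[0..n-1] = up[0..n-1] * v, least significant
   limb first.  Returns the carry-out limb. */
limb_t ref_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* 32x32->64 unsigned product plus the carry-in limb */
        uint64_t p = (uint64_t)up[i] * v + cy;
        rp[i] = (limb_t)p;
        cy = (limb_t)(p >> 32);
    }
    return cy;
}

/* Hypothetical reference: rp[0..n-1] += up[0..n-1] * v.  Returns the
   carry-out limb. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* product + old result limb + carry-in still fits in 64 bits */
        uint64_t p = (uint64_t)up[i] * v + rp[i] + cy;
        rp[i] = (limb_t)p;
        cy = (limb_t)(p >> 32);
    }
    return cy;
}
```

The extra loads of res_ptr in the item 2 listing correspond to the
"+ rp[i]" term here.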
  125 
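
The arithmetic side of item 3 can be sketched in C as follows: two 32-bit
limbs are combined into one 64-bit value so that each add handles a pair.
This is only an illustration under stated assumptions.  The name
add_n_by_pairs is hypothetical, and the sketch builds the 64-bit values
from individually indexed limbs, so it sidesteps the alignment and limb
order handling (rotating, masking, bit field inserting) that real 64-bit
loads and stores would need.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not GMP code: rp = s1 + s2 over n 32-bit limbs,
   least significant limb first, using one 64-bit add per pair of limbs.
   Returns the carry-out (0 or 1). */
uint32_t add_n_by_pairs(uint32_t *rp, const uint32_t *s1,
                        const uint32_t *s2, size_t n)
{
    uint32_t cy = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        uint64_t a = (uint64_t)s1[i] | ((uint64_t)s1[i + 1] << 32);
        uint64_t b = (uint64_t)s2[i] | ((uint64_t)s2[i + 1] << 32);
        uint64_t t = a + cy;
        uint32_t c1 = t < a;        /* carry out of adding the carry-in */
        uint64_t s = t + b;
        uint32_t c2 = s < t;        /* carry out of adding b */
        rp[i] = (uint32_t)s;
        rp[i + 1] = (uint32_t)(s >> 32);
        cy = c1 | c2;               /* at most one of the two can be set */
    }
    if (i < n) {                    /* odd limb count: one limb left over */
        uint64_t s = (uint64_t)s1[i] + s2[i] + cy;
        rp[i] = (uint32_t)s;
        cy = (uint32_t)(s >> 32);
    }
    return cy;
}
```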
  126 
  127 
  128 
  129 LABEL SYNTAX
  130 
  131 The HP-UX assembler takes labels starting in column 0 with no colon,
  132 
  133 	L$loop  ldws,mb -4(0,%r25),%r22
  134 
  135 Gas on hppa GNU/Linux, however, requires a colon,
  136 
  137 	L$loop: ldws,mb -4(0,%r25),%r22
  138 
  139 This is covered by using LDEF() from asm-defs.m4.  An alternative would be
  140 to use ".label" which is accepted by both,
  141 
  142 		.label  L$loop
  143 		ldws,mb -4(0,%r25),%r22
  144 
  145 but that's not as nice to look at, at least not if you're used to assembler
  146 code having labels in column 0.
  147 
  148 
  149 
  150 
  151 REFERENCES
  152 
  153 Hewlett Packard, "HP Assembler Reference Manual", 9th edition, June 1998,
  154 part number 92432-90012.
  155 
  156 
  157 
  158 ----------------
  159 Local variables:
  160 mode: text
  161 fill-column: 76
  162 End: