"Fossies" - the Fresh Open Source Software Archive

Member "xxgdb-1.12/regex.c" (19 Apr 1995, 48047 Bytes) of package /linux/misc/old/xxgdb-1.12.tar.gz:


As a special service "Fossies" has tried to format the requested source page into HTML format using (guessed) C and C++ source code syntax highlighting (style: standard) with prefixed line numbers and code folding option. Alternatively you can here view or download the uninterpreted source code file.

    1 /* Extended regular expression matching and search.
    2    Copyright (C) 1985 Free Software Foundation, Inc.
    3 
    4                NO WARRANTY
    5 
    6   BECAUSE THIS PROGRAM IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY
    7 NO WARRANTY, TO THE EXTENT PERMITTED BY APPLICABLE STATE LAW.  EXCEPT
    8 WHEN OTHERWISE STATED IN WRITING, FREE SOFTWARE FOUNDATION, INC,
    9 RICHARD M. STALLMAN AND/OR OTHER PARTIES PROVIDE THIS PROGRAM "AS IS"
   10 WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING,
   11 BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
   12 FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY
   13 AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE PROGRAM PROVE
   14 DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR
   15 CORRECTION.
   16 
   17  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW WILL RICHARD M.
   18 STALLMAN, THE FREE SOFTWARE FOUNDATION, INC., AND/OR ANY OTHER PARTY
   19 WHO MAY MODIFY AND REDISTRIBUTE THIS PROGRAM AS PERMITTED BELOW, BE
   20 LIABLE TO YOU FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR
   21 OTHER SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
   22 USE OR INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR
   23 DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES OR
   24 A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) THIS
   25 PROGRAM, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
   26 DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY.
   27 
   28         GENERAL PUBLIC LICENSE TO COPY
   29 
   30   1. You may copy and distribute verbatim copies of this source file
   31 as you receive it, in any medium, provided that you conspicuously and
   32 appropriately publish on each copy a valid copyright notice "Copyright
   33 (C) 1985 Free Software Foundation, Inc."; and include following the
   34 copyright notice a verbatim copy of the above disclaimer of warranty
   35 and of this License.  You may charge a distribution fee for the
   36 physical act of transferring a copy.
   37 
   38   2. You may modify your copy or copies of this source file or
   39 any portion of it, and copy and distribute such modifications under
   40 the terms of Paragraph 1 above, provided that you also do the following:
   41 
   42     a) cause the modified files to carry prominent notices stating
   43     that you changed the files and the date of any change; and
   44 
   45     b) cause the whole of any work that you distribute or publish,
   46     that in whole or in part contains or is a derivative of this
   47     program or any part thereof, to be licensed at no charge to all
   48     third parties on terms identical to those contained in this
   49     License Agreement (except that you may choose to grant more extensive
   50     warranty protection to some or all third parties, at your option).
   51 
   52     c) You may charge a distribution fee for the physical act of
   53     transferring a copy, and you may at your option offer warranty
   54     protection in exchange for a fee.
   55 
   56 Mere aggregation of another unrelated program with this program (or its
   57 derivative) on a volume of a storage or distribution medium does not bring
   58 the other program under the scope of these terms.
   59 
   60   3. You may copy and distribute this program (or a portion or derivative
   61 of it, under Paragraph 2) in object code or executable form under the terms
   62 of Paragraphs 1 and 2 above provided that you also do one of the following:
   63 
   64     a) accompany it with the complete corresponding machine-readable
   65     source code, which must be distributed under the terms of
   66     Paragraphs 1 and 2 above; or,
   67 
   68     b) accompany it with a written offer, valid for at least three
   69     years, to give any third party free (except for a nominal
   70     shipping charge) a complete machine-readable copy of the
   71     corresponding source code, to be distributed under the terms of
   72     Paragraphs 1 and 2 above; or,
   73 
   74     c) accompany it with the information you received as to where the
   75     corresponding source code may be obtained.  (This alternative is
   76     allowed only for noncommercial distribution and only if you
   77     received the program in object code or executable form alone.)
   78 
   79 For an executable file, complete source code means all the source code for
   80 all modules it contains; but, as a special exception, it need not include
   81 source code for modules which are standard libraries that accompany the
   82 operating system on which the executable file runs.
   83 
   84   4. You may not copy, sublicense, distribute or transfer this program
   85 except as expressly provided under this License Agreement.  Any attempt
   86 otherwise to copy, sublicense, distribute or transfer this program is void and
   87 your rights to use the program under this License agreement shall be
   88 automatically terminated.  However, parties who have received computer
   89 software programs from you with this License Agreement will not have
   90 their licenses terminated so long as such parties remain in full compliance.
   91 
   92   5. If you wish to incorporate parts of this program into other free
   93 programs whose distribution conditions are different, write to the Free
   94 Software Foundation at 675 Mass Ave, Cambridge, MA 02139.  We have not yet
   95 worked out a simple rule that can be stated here, but we will often permit
   96 this.  We will be guided by the two goals of preserving the free status of
   97 all derivatives of our free software and of promoting the sharing and reuse of
   98 software.
   99 
  100 
  101 In other words, you are welcome to use, share and improve this program.
  102 You are forbidden to forbid anyone else to use, share and improve
  103 what you give them.   Help stamp out software-hoarding!  */
  104 
  105 
  106 /* To test, compile with -Dtest.
  107  This Dtestable feature turns this into a self-contained program
  108  which reads a pattern, describes how it compiles,
  109  then reads a string and searches for it.  */
  110 
  111 /* 
  112  * Modification: alloca.h included for sparc machines
  113  * By :      Po Cheung, po@cerc.utexas.edu
  114  * Date :    July 27, 1990
  115  */
  116 
  117 #ifndef NeXT
  118 #include <malloc.h>
  119 #endif
  120 #include <string.h>
  121 #include <stdio.h>
  122 #include <stdlib.h>
  123 #include <assert.h>
  124 
  125 #ifdef sparc
  126 #include <alloca.h>
  127 #else
  128 #pragma alloca
  129 #endif
  130 #define FAILURE_STACK   20000           /* max failure stack size */
  131 
  132 #ifdef __GNUC__
  133 #ifdef alloca
  134 #undef alloca
  135 #endif
  136 #define alloca __builtin_alloca
  137 #endif
  138 
  139 #ifdef __STDC__
  140 static int bcmp_translate (unsigned char *, unsigned char *s2,
  141                            int len, unsigned char *translate);  /* (MJH) */
  142 #endif /* __STDC__ */
  143 
  144 /* bcopy, bzero and bcmp are needed if we are on Solaris */
  145 
  146 #ifdef SYSV
  147 #ifdef SUNOS4
  148 
  149 void bcopy(s1, s2, len)
  150     register char *s1, *s2;
  151     int len;
  152 {
  153     for (; len; len--)
  154         *(s2++) = *(s1++);
  155 }
  156 
  157 int bcmp(s1, s2, len)
  158     register char *s1, *s2;
  159     int len;
  160 {
  161     for (; len; len--)
  162         if (*(s1++) != *(s2++))
  163             return 1;
  164     return 0;
  165 }
  166 
  167 void bzero(sp, len)
  168     register char *sp;
  169     int len;
  170 {
  171     for (; len; len--)
  172         *(sp++) = 0;
  173 }
  174 
  175 #endif /* SUNOS4 */
  176 #endif /* SYSV */
  177 
  178 #ifdef emacs
  179 
  180 /* The `emacs' switch turns on certain special matching commands
  181  that make sense only in emacs. */
  182 
  183 #include "config.h"
  184 #include "lisp.h"
  185 #include "buffer.h"
  186 #include "syntax.h"
  187 
  188 #else  /* not emacs */
  189 
  190 /*
  191  * Define the syntax stuff, so we can do the \<...\> things.
  192  */
  193 
  194 #ifndef Sword /* must be non-zero in some of the tests below... */
  195 #define Sword 1
  196 #endif
  197 
  198 #define SYNTAX(c) re_syntax_table[c]
  199 
  200 #ifdef SYNTAX_TABLE
  201 
  202 char *re_syntax_table;
  203 
  204 #else
  205 
  206 static char re_syntax_table[256];
  207 
  208 static void
  209 init_syntax_once ()
  210 {
  211    register int c;
  212    static int done = 0;
  213 
  214    if (done)
  215      return;
  216 
  217    bzero (re_syntax_table, sizeof re_syntax_table);
  218 
  219    for (c = 'a'; c <= 'z'; c++)
  220      re_syntax_table[c] = Sword;
  221 
  222    for (c = 'A'; c <= 'Z'; c++)
  223      re_syntax_table[c] = Sword;
  224 
  225    for (c = '0'; c <= '9'; c++)
  226      re_syntax_table[c] = Sword;
  227 
  228    done = 1;
  229 }
  230 
  231 #endif /* SYNTAX_TABLE */
  232 #endif /* not emacs */
  233 
  234 #include "regex.h"
  235 
  236 /* Number of failure points to allocate space for initially,
  237  when matching.  If this number is exceeded, more space is allocated,
  238  so it is not a hard limit.  */
  239 
  240 #ifndef NFAILURES
  241 #define NFAILURES 80
  242 #endif /* NFAILURES */
  243 
  244 /* width of a byte in bits */
  245 
  246 #define BYTEWIDTH 8
  247 
  248 #ifndef SIGN_EXTEND_CHAR
  249 #define SIGN_EXTEND_CHAR(x) ((signed char)(x))
  250 #endif
  251 
  252 /* compile_pattern takes a regular-expression descriptor string in the 
  253   user's format and converts it into a buffer full of byte commands 
  254   for matching.
  255 
  256   pattern   is the address of the pattern string
  257   size      is the length of it.
  258   bufp      is a  struct re_pattern_buffer *  which points to the info
  259         on where to store the byte commands.
  260         This structure contains a  char *  which points to the
  261         actual space, which should have been obtained with malloc.
  262         compile_pattern may use  realloc  to grow the buffer space.
  263 
  264   The number of bytes of commands can be found out by looking in
  265   the  struct re_pattern_buffer  that bufp pointed to,
  266   after compile_pattern returns.
  267 */
  268 
  269 #define PATPUSH(ch) (*b++ = (char) (ch))
  270 
  271 #define PATFETCH(c) \
  272  {if (p == pend) goto end_of_pattern; \
  273   c = * (unsigned char *) p++; \
  274   if (translate) c = translate[c]; }
  275 
  276 #define PATFETCH_RAW(c) \
  277  {if (p == pend) goto end_of_pattern; \
  278   c = * (unsigned char *) p++; }
  279 
  280 #define PATUNFETCH p--
  281 
  282 /* This version of EXTEND_BUFFER will now work Alpha Machines,
  283    and non Alpha Machines.
  284 
  285    It isn't quite clean but it is cleaner than the previous version
  286    that really only worked at all with 32bit pointers.
  287 
  288    Dean Michaels, TRLabs.
  289    dean@trlabs.ca
  290 */
  291 
  292 #define EXTEND_BUFFER \
  293   { char *old_buffer = bufp->buffer; \
  294     if (bufp->allocated == (1<<16)) goto too_big; \
  295     bufp->allocated *= 2; \
  296     if (bufp->allocated > (1<<16)) bufp->allocated = (1<<16); \
  297     if (!(bufp->buffer = (char *) realloc (bufp->buffer, bufp->allocated))) \
  298       goto memory_exhausted; \
  299     c = bufp->buffer - old_buffer; \
  300     b -= (-(int)c); \
  301     if (fixup_jump) \
  302       fixup_jump -= (-(int)c); \
  303     if (laststart) \
  304       laststart -= (-(int)c); \
  305     begalt -= (-(int)c); \
  306     if (pending_exact) \
  307       pending_exact -= (-(int)c); \
  308   }
  309 
  310 static int store_jump (), insert_jump ();
  311 
  312 char *
  313 re_compile_pattern (pattern, size, bufp)
  314      char *pattern;
  315      int size;
  316      struct re_pattern_buffer *bufp;
  317 {
  318   register char *b = bufp->buffer;
  319   register char *p = pattern;
  320   char *pend = pattern + size;
  321   register unsigned c, c1;
  322   char *p1;
  323   unsigned char *translate = (unsigned char *) bufp->translate;
  324 
  325   /* address of the count-byte of the most recently inserted "exactn" command.
  326     This makes it possible to tell whether a new exact-match character
  327     can be added to that command or requires a new "exactn" command. */
  328      
  329   char *pending_exact = 0;
  330 
  331   /* address of the place where a forward-jump should go
  332     to the end of the containing expression.
  333     Each alternative of an "or", except the last, ends with a forward-jump
  334     of this sort. */
  335 
  336   char *fixup_jump = 0;
  337 
  338   /* address of start of the most recently finished expression.
  339     This tells postfix * where to find the start of its operand. */
  340 
  341   char *laststart = 0;
  342 
  343   /* In processing a repeat, 1 means zero matches is allowed */
  344 
  345   char zero_times_ok;
  346 
  347   /* In processing a repeat, 1 means many matches is allowed */
  348 
  349   char many_times_ok;
  350 
  351   /* address of beginning of regexp, or inside of last \( */
  352 
  353   char *begalt = b;
  354 
  355   /* Stack of information saved by \( and restored by \).
  356      Four stack elements are pushed by each \(:
  357        First, the value of b.
  358        Second, the value of fixup_jump.
  359        Third, the value of regnum.
  360        Fourth, the value of begalt.  */
  361 
  362   int stackb[40];
  363   int *stackp = stackb;
  364   int *stacke = stackb + 40;
  365   int *stackt;
  366 
  367   /* Counts \('s as they are encountered.  Remembered for the matching \),
  368      where it becomes the "register number" to put in the stop_memory command */
  369 
  370   int regnum = 1;
  371 
  372   bufp->fastmap_accurate = 0;
  373 
  374 #ifndef emacs
  375 #ifndef SYNTAX_TABLE
  376   /*
  377    * Initialize the syntax table.
  378    */
  379    init_syntax_once();
  380 #endif
  381 #endif
  382 
  383   if (bufp->allocated == 0)
  384     {
  385       bufp->allocated = 28;
  386       if (bufp->buffer)
  387     /* EXTEND_BUFFER loses when bufp->allocated is 0 */
  388     bufp->buffer = (char *) realloc (bufp->buffer, 28);
  389       else
  390     /* Caller did not allocate a buffer.  Do it for him */
  391     bufp->buffer = (char *) malloc (28);
  392       if (!bufp->buffer) goto memory_exhausted;
  393       begalt = b = bufp->buffer;
  394     }
  395 
  396   while (p != pend)
  397     {
  398       if (b - bufp->buffer > bufp->allocated - 10)
  399     /* Note that EXTEND_BUFFER clobbers c */
  400     EXTEND_BUFFER;
  401 
  402       PATFETCH (c);
  403 
  404       switch (c)
  405     {
  406     case '$':
  407       /* $ means succeed if at end of line, but only in special contexts.
  408         If randonly in the middle of a pattern, it is a normal character. */
  409       if (p == pend || (*p == '\\' && (p[1] == ')' || p[1] == '|')))
  410         {
  411           PATPUSH (endline);
  412           break;
  413         }
  414       goto normal_char;
  415 
  416     case '^':
  417       /* ^ means succeed if at beg of line, but only if no preceding pattern. */
  418       if (laststart) goto normal_char;
  419       PATPUSH (begline);
  420       break;
  421 
  422     case '*':
  423     case '+':
  424     case '?':
  425       /* If there is no previous pattern, char not special. */
  426       if (!laststart)
  427         goto normal_char;
  428       /* If there is a sequence of repetition chars,
  429          collapse it down to equivalent to just one.  */
  430       zero_times_ok = 0;
  431       many_times_ok = 0;
  432       while (1)
  433         {
  434           zero_times_ok |= c != '+';
  435           many_times_ok |= c != '?';
  436           if (p == pend)
  437         break;
  438           PATFETCH (c);
  439           if (!(c == '*' || c == '+' || c == '?'))
  440         {
  441           PATUNFETCH;
  442           break;
  443         }
  444         }
  445 
  446       /* Now we know whether 0 matches is allowed,
  447          and whether 2 or more matches is allowed.  */
  448       if (many_times_ok)
  449         {
  450           /* If more than one repetition is allowed,
  451          put in a backward jump at the end.  */
  452           store_jump (b, maybe_finalize_jump, laststart - 3);
  453           b += 3;
  454         }
  455       insert_jump (on_failure_jump, laststart, b + 3, b);
  456       pending_exact = 0;
  457       b += 3;
  458       if (!zero_times_ok)
  459         {
  460           /* At least one repetition required: insert before the loop
  461          a skip over the initial on-failure-jump instruction */
  462           insert_jump (dummy_failure_jump, laststart, laststart + 6, b);
  463           b += 3;
  464         }
  465       break;
  466 
  467     case '.':
  468       laststart = b;
  469       PATPUSH (anychar);
  470       break;
  471 
  472     case '[':
  473       if (b - bufp->buffer
  474           > bufp->allocated - 3 - (1 << BYTEWIDTH) / BYTEWIDTH)
  475         /* Note that EXTEND_BUFFER clobbers c */
  476         EXTEND_BUFFER;
  477 
  478       laststart = b;
  479       if (*p == '^')
  480         PATPUSH (charset_not), p++;
  481       else
  482         PATPUSH (charset);
  483       p1 = p;
  484 
  485       PATPUSH ((1 << BYTEWIDTH) / BYTEWIDTH);
  486       /* Clear the whole map */
  487       bzero (b, (1 << BYTEWIDTH) / BYTEWIDTH);
  488       /* Read in characters and ranges, setting map bits */
  489       while (1)
  490         {
  491           PATFETCH (c);
  492           if (c == ']' && p != p1 + 1) break;
  493           if (*p == '-')
  494         {
  495           PATFETCH (c1);
  496           PATFETCH (c1);
  497           while (c <= c1)
  498             b[c / BYTEWIDTH] |= 1 << (c % BYTEWIDTH), c++;
  499         }
  500           else
  501         {
  502           b[c / BYTEWIDTH] |= 1 << (c % BYTEWIDTH);
  503         }
  504         }
  505       /* Discard any bitmap bytes that are all 0 at the end of the map.
  506          Decrement the map-length byte too. */
  507       while (b[-1] > 0 && b[b[-1] - 1] == 0)
  508         b[-1]--;
  509       b += b[-1];
  510       break;
  511 
  512         case '\\':
  513       if (p == pend) goto invalid_pattern;
  514       PATFETCH_RAW (c);
  515       switch (c)
  516         {
  517         case '(':
  518           if (stackp == stacke) goto nesting_too_deep;
  519           if (regnum < RE_NREGS)
  520             {
  521           PATPUSH (start_memory);
  522           PATPUSH (regnum);
  523             }
  524           *stackp++ = b - bufp->buffer;
  525           *stackp++ = fixup_jump ? fixup_jump - bufp->buffer + 1 : 0;
  526           *stackp++ = regnum++;
  527           *stackp++ = begalt - bufp->buffer;
  528           fixup_jump = 0;
  529           laststart = 0;
  530           begalt = b;
  531           break;
  532 
  533         case ')':
  534           if (stackp == stackb) goto unmatched_close;
  535           begalt = *--stackp + bufp->buffer;
  536           if (fixup_jump)
  537         store_jump (fixup_jump, jump, b);
  538           if (stackp[-1] < RE_NREGS)
  539         {
  540           PATPUSH (stop_memory);
  541           PATPUSH (stackp[-1]);
  542         }
  543           stackp -= 2;
  544           fixup_jump = 0;
  545           if (*stackp)
  546         fixup_jump = *stackp + bufp->buffer - 1;
  547           laststart = *--stackp + bufp->buffer;
  548           break;
  549 
  550         case '|':
  551           insert_jump (on_failure_jump, begalt, b + 6, b);
  552           pending_exact = 0;
  553           b += 3;
  554           if (fixup_jump)
  555         store_jump (fixup_jump, jump, b);
  556           fixup_jump = b;
  557           b += 3;
  558           laststart = 0;
  559           begalt = b;
  560           break;
  561 
  562 #ifdef emacs
  563         case '=':
  564           PATPUSH (at_dot);
  565           break;
  566 
  567         case 's':   
  568           laststart = b;
  569           PATPUSH (syntaxspec);
  570           PATFETCH (c);
  571           PATPUSH (syntax_spec_code[c]);
  572           break;
  573 
  574         case 'S':
  575           laststart = b;
  576           PATPUSH (notsyntaxspec);
  577           PATFETCH (c);
  578           PATPUSH (syntax_spec_code[c]);
  579           break;
  580 #endif /* emacs */
  581 
  582         case 'w':
  583           laststart = b;
  584           PATPUSH (wordchar);
  585           break;
  586 
  587         case 'W':
  588           laststart = b;
  589           PATPUSH (notwordchar);
  590           break;
  591 
  592         case '<':
  593           PATPUSH (wordbeg);
  594           break;
  595 
  596         case '>':
  597           PATPUSH (wordend);
  598           break;
  599 
  600         case 'b':
  601           PATPUSH (wordbound);
  602           break;
  603 
  604         case 'B':
  605           PATPUSH (notwordbound);
  606           break;
  607 
  608         case '`':
  609           PATPUSH (begbuf);
  610           break;
  611 
  612         case '\'':
  613           PATPUSH (endbuf);
  614           break;
  615 
  616         case '1':
  617         case '2':
  618         case '3':
  619         case '4':
  620         case '5':
  621         case '6':
  622         case '7':
  623         case '8':
  624         case '9':
  625           c1 = c - '0';
  626           if (c1 >= regnum)
  627         goto normal_char;
  628           for (stackt = stackp - 2;  stackt > stackb;  stackt -= 4)
  629         if (*stackt == c1)
  630           goto normal_char;
  631           laststart = b;
  632           PATPUSH (duplicate);
  633           PATPUSH (c1);
  634           break;
  635         default:
  636           /* You might think it wuld be useful for \ to mean
  637          not to translate; but if we don't translate it
  638          it will never match anything.  */
  639           if (translate) c = translate[c];
  640           goto normal_char;
  641         }
  642       break;
  643 
  644     default:
  645     normal_char:
  646       if (!pending_exact || pending_exact + *pending_exact + 1 != b
  647           || *pending_exact == 0177 || *p == '*' || *p == '^'
  648           || *p == '+' || *p == '?')
  649         {
  650           laststart = b;
  651           PATPUSH (exactn);
  652           pending_exact = b;
  653           PATPUSH (0);
  654         }
  655       PATPUSH (c);
  656       (*pending_exact)++;
  657     }
  658     }
  659 
  660   if (fixup_jump)
  661     store_jump (fixup_jump, jump, b);
  662 
  663   if (stackp != stackb) goto unmatched_open;
  664 
  665   bufp->used = b - bufp->buffer;
  666   return 0;
  667 
  668  invalid_pattern:
  669   return "Invalid regular expression";
  670 
  671  unmatched_open:
  672   return "Unmatched \\(";
  673 
  674  unmatched_close:
  675   return "Unmatched \\)";
  676 
  677  end_of_pattern:
  678   return "Premature end of regular expression";
  679 
  680  nesting_too_deep:
  681   return "Nesting too deep";
  682 
  683  too_big:
  684   return "Regular expression too big";
  685 
  686  memory_exhausted:
  687   return "Memory exhausted";
  688 }
  689 
  690 /* Store where `from' points a jump operation to jump to where `to' points.
  691   `opcode' is the opcode to store. */
  692 
  693 static int
  694 store_jump (from, opcode, to)
  695      char *from, *to;
  696      char opcode;
  697 {
  698   from[0] = opcode;
  699   from[1] = (to - (from + 3)) & 0377;
  700   from[2] = (to - (from + 3)) >> 8;
  701 }
  702 
  703 /* Open up space at char FROM, and insert there a jump to TO.
  704    CURRENT_END gives te end of the storage no in use,
  705    so we know how much data to copy up.
  706    OP is the opcode of the jump to insert.
  707 
  708    If you call this function, you must zero out pending_exact.  */
  709 
  710 static int
  711 insert_jump (op, from, to, current_end)
  712      char op;
  713      char *from, *to, *current_end;
  714 {
  715   register char *pto = current_end + 3;
  716   register char *pfrom = current_end;
  717   while (pfrom != from)
  718     *--pto = *--pfrom;
  719   store_jump (from, op, to);
  720 }
  721 
  722 /* Given a pattern, compute a fastmap from it.
  723  The fastmap records which of the (1 << BYTEWIDTH) possible characters
  724  can start a string that matches the pattern.
  725  This fastmap is used by re_search to skip quickly over totally 
  726  implausible text.
  727 
  728  The caller must supply the address of a (1 << BYTEWIDTH)-byte data area
  729  as bufp->fastmap.
  730  The other components of bufp describe the pattern to be used.  */
  731 
  732 void
  733 re_compile_fastmap (bufp)
  734      struct re_pattern_buffer *bufp;
  735 {
  736   unsigned char *pattern = (unsigned char *) bufp->buffer;
  737   int size = bufp->used;
  738   register char *fastmap = bufp->fastmap;
  739   register unsigned char *p = pattern;
  740   register unsigned char *pend = pattern + size;
  741   register int j, k;
  742   unsigned char *translate = (unsigned char *) bufp->translate;
  743 
  744   unsigned char *stackb[NFAILURES];
  745   unsigned char **stackp = stackb;
  746 
  747   bzero (fastmap, (1 << BYTEWIDTH));
  748   bufp->fastmap_accurate = 1;
  749   bufp->can_be_null = 0;
  750       
  751   while (p)
  752     {
  753       if (p == pend)
  754     {
  755       bufp->can_be_null = 1;
  756       break;
  757     }
  758 #ifdef SWITCH_ENUM_BUG
  759       switch ((int) ((enum regexpcode) *p++))
  760 #else
  761       switch ((enum regexpcode) *p++)
  762 #endif
  763     {
  764     case exactn:
  765       if (translate)
  766         fastmap[translate[p[1]]] = 1;
  767       else
  768         fastmap[p[1]] = 1;
  769       break;
  770 
  771         case begline:
  772         case before_dot:
  773     case at_dot:
  774     case after_dot:
  775     case begbuf:
  776     case endbuf:
  777     case wordbound:
  778     case notwordbound:
  779     case wordbeg:
  780     case wordend:
  781       continue;
  782 
  783     case endline:
  784       if (translate)
  785         fastmap[translate['\n']] = 1;
  786       else
  787         fastmap['\n'] = 1;
  788       if (bufp->can_be_null != 1)
  789         bufp->can_be_null = 2;
  790       break;
  791 
  792     case finalize_jump:
  793     case maybe_finalize_jump:
  794     case jump:
  795     case dummy_failure_jump:
  796       bufp->can_be_null = 1;
  797       j = *p++ & 0377;
  798       j += SIGN_EXTEND_CHAR (*(char *)p) << 8;
  799       p += j + 1;       /* The 1 compensates for missing ++ above */
  800       if (j > 0)
  801         continue;
  802       /* Jump backward reached implies we just went through
  803          the body of a loop and matched nothing.
  804          Opcode jumped to should be an on_failure_jump.
  805          Just treat it like an ordinary jump.
  806          For a * loop, it has pushed its failure point already;
  807          if so, discard that as redundant.  */
  808       if ((enum regexpcode) *p != on_failure_jump)
  809         continue;
  810       p++;
  811       j = *p++ & 0377;
  812       j += SIGN_EXTEND_CHAR (*(char *)p) << 8;
  813       p += j + 1;       /* The 1 compensates for missing ++ above */
  814       if (stackp != stackb && *stackp == p)
  815         stackp--;
  816       continue;
  817       
  818     case on_failure_jump:
  819       j = *p++ & 0377;
  820       j += SIGN_EXTEND_CHAR (*(char *)p) << 8;
  821       p++;
  822       *++stackp = p + j;
  823       continue;
  824 
  825     case start_memory:
  826     case stop_memory:
  827       p++;
  828       continue;
  829 
  830     case duplicate:
  831       bufp->can_be_null = 1;
  832       fastmap['\n'] = 1;
  833     case anychar:
  834       for (j = 0; j < (1 << BYTEWIDTH); j++)
  835         if (j != '\n')
  836           fastmap[j] = 1;
  837       if (bufp->can_be_null)
  838         return;
  839       /* Don't return; check the alternative paths
  840          so we can set can_be_null if appropriate.  */
  841       break;
  842 
  843     case wordchar:
  844       for (j = 0; j < (1 << BYTEWIDTH); j++)
  845         if (SYNTAX (j) == Sword)
  846           fastmap[j] = 1;
  847       break;
  848 
  849     case notwordchar:
  850       for (j = 0; j < (1 << BYTEWIDTH); j++)
  851         if (SYNTAX (j) != Sword)
  852           fastmap[j] = 1;
  853       break;
  854 
  855 #ifdef emacs
  856     case syntaxspec:
  857       k = *p++;
  858       for (j = 0; j < (1 << BYTEWIDTH); j++)
  859         if (SYNTAX (j) == (enum syntaxcode) k)
  860           fastmap[j] = 1;
  861       break;
  862 
  863     case notsyntaxspec:
  864       k = *p++;
  865       for (j = 0; j < (1 << BYTEWIDTH); j++)
  866         if (SYNTAX (j) != (enum syntaxcode) k)
  867           fastmap[j] = 1;
  868       break;
  869 #endif /* emacs */
  870 
  871     case charset:
  872       for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
  873         if (p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH)))
  874           {
  875         if (translate)
  876           fastmap[translate[j]] = 1;
  877         else
  878           fastmap[j] = 1;
  879           }
  880       break;
  881 
  882     case charset_not:
  883       /* Chars beyond end of map must be allowed */
  884       for (j = *p * BYTEWIDTH; j < (1 << BYTEWIDTH); j++)
  885         if (translate)
  886           fastmap[translate[j]] = 1;
  887         else
  888           fastmap[j] = 1;
  889 
  890       for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
  891         if (!(p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH))))
  892           {
  893         if (translate)
  894           fastmap[translate[j]] = 1;
  895         else
  896           fastmap[j] = 1;
  897           }
  898       break;
  899     }
  900 
  901       /* Get here means we have successfully found the possible starting
  902      characters of one path of the pattern.  We need not follow this 
  903      path any farther. Instead, look at the next alternative 
  904      remembered in the stack. */
  905       if (stackp != stackb)
  906     p = *stackp--;
  907       else
  908     break;
  909     }
  910 }
  911 
  912 /* Like re_search_2, below, but only one string is specified. */
  913 
  914 int
  915 re_search (pbufp, string, size, startpos, range, regs)
  916      struct re_pattern_buffer *pbufp;
  917      char *string;
  918      int size, startpos, range;
  919      struct re_registers *regs;
  920 {
  921   return re_search_2 (pbufp, 0, 0, string, size, startpos, range, regs, size);
  922 }
  923 
  924 /* Like re_match_2 but tries first a match starting at index STARTPOS,
  925    then at STARTPOS + 1, and so on.
  926    RANGE is the number of places to try before giving up.
  927    If RANGE is negative, the starting positions tried are
  928     STARTPOS, STARTPOS - 1, etc.
  929    It is up to the caller to make sure that range is not so large
  930    as to take the starting position outside of the input strings.
  931 
  932 The value returned is the position at which the match was found,
  933  or -1 if no match was found,
  934  or -2 if error (such as failure stack overflow).  */
  935 
  936 int
  937 re_search_2 (pbufp, string1, size1, string2, size2, startpos, range, regs, 
  938          mstop)
  939      struct re_pattern_buffer *pbufp;
  940      char *string1, *string2;
  941      int size1, size2;
  942      int startpos;
  943      register int range;
  944      struct re_registers *regs;
  945      int mstop;
  946 {
  947   register char *fastmap = pbufp->fastmap;
  948   register unsigned char *translate = (unsigned char *) pbufp->translate;
  949   int total = size1 + size2;
  950   int val;
  951 
  952   /* Update the fastmap now if not correct already */
  953   if (fastmap && !pbufp->fastmap_accurate)
  954     re_compile_fastmap (pbufp);
  955   
  956   /* Don't waste time in a long search for a pattern
  957      that says it is anchored.  */
  958   if (pbufp->used > 0 && (enum regexpcode) pbufp->buffer[0] == begbuf
  959       && range > 0)
  960     {
  961       if (startpos > 0)
  962     return -1;
  963       else
  964     range = 1;
  965     }
  966 
  967   while (1)
  968     {
  969       /* If a fastmap is supplied, skip quickly over characters
  970      that cannot possibly be the start of a match.
  971      Note, however, that if the pattern can possibly match
  972      the null string, we must test it at each starting point
  973      so that we take the first null string we get.  */
  974 
  975       if (fastmap && startpos < total && pbufp->can_be_null != 1)
  976     {
  977       if (range > 0)
  978         {
  979           register int lim = 0;
  980           register unsigned char *p;
  981           int irange = range;
  982           if (startpos < size1 && startpos + range >= size1)
  983         lim = range - (size1 - startpos);
  984 
  985           p = ((unsigned char *)
  986            &(startpos >= size1 ? string2 - size1 : string1)[startpos]);
  987 
  988           if (translate)
  989         {
  990           while (range > lim && !fastmap[translate[*p++]])
  991             range--;
  992         }
  993           else
  994         {
  995           while (range > lim && !fastmap[*p++])
  996             range--;
  997         }
  998           startpos += irange - range;
  999         }
 1000       else
 1001         {
 1002           register unsigned char c;
 1003           if (startpos >= size1) c = string2[startpos - size1];
 1004           else c = string1[startpos];
 1005           c &= 0xff;
 1006           if (translate ? !fastmap[translate[c]] : !fastmap[c])
 1007         goto advance;
 1008         }
 1009     }
 1010 
 1011       if (range >= 0 && startpos == total
 1012       && fastmap && pbufp->can_be_null == 0)
 1013     return -1;
 1014 
 1015       val = re_match_2 (pbufp, string1, size1, string2, size2, startpos, 
 1016             regs, mstop);
 1017       if (0 <= val)
 1018     {
 1019       if (val == -2)
 1020         return -2;
 1021       return startpos;
 1022     }
 1023 
 1024 #ifdef C_ALLOCA
 1025       alloca (0);
 1026 #endif /* C_ALLOCA */
 1027 
 1028     advance:
 1029       if (!range) break;
 1030       if (range > 0) range--, startpos++; else range++, startpos--;
 1031     }
 1032   return -1;
 1033 }
 1034 
 1035 #ifndef emacs   /* emacs never uses this */
 1036 int
 1037 re_match (pbufp, string, size, pos, regs)
 1038      struct re_pattern_buffer *pbufp;
 1039      char *string;
 1040      int size, pos;
 1041      struct re_registers *regs;
 1042 {
 1043   return re_match_2 (pbufp, 0, 0, string, size, pos, regs, size);
 1044 }
 1045 #endif /* emacs */
 1046 
 1047 /* Maximum size of failure stack.  Beyond this, overflow is an error.  */
 1048 /* 
 1049  * Modification: failure stack size increased from 2000 to FAILURE_STACK
 1050  * By :      Po Cheung, po@cerc.utexas.edu
 1051  * Date :    April 15, 1990
 1052  */
 1053 
 1054 int re_max_failures = FAILURE_STACK;
 1055 
 1056 /* Match the pattern described by PBUFP
 1057    against data which is the virtual concatenation of STRING1 and STRING2.
 1058    SIZE1 and SIZE2 are the sizes of the two data strings.
 1059    Start the match at position POS.
 1060    Do not consider matching past the position MSTOP.
 1061 
 1062    If pbufp->fastmap is nonzero, then it had better be up to date.
 1063 
 1064    The reason that the data to match are specified as two components
 1065    which are to be regarded as concatenated
 1066    is so this function can be used directly on the contents of an Emacs buffer.
 1067 
 1068    -1 is returned if there is no match.  -2 is returned if there is
 1069    an error (such as match stack overflow).  Otherwise the value is the length
 1070    of the substring which was matched.  */
 1071 
 1072 #define ALLOCA_CHECK(pt) {if ((pt) == NULL || (pt) == (char**) 0xdeadbeef) { fprintf(stderr,"alloca failed\n"); exit(1);}}
 1073 
 1074 int
 1075 re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
 1076      struct re_pattern_buffer *pbufp;
 1077      char *string1, *string2;
 1078      int size1, size2;
 1079      int pos;
 1080      struct re_registers *regs;
 1081      int mstop;
 1082 {
 1083   register char *p = pbufp->buffer;
 1084   register char *pend = p + pbufp->used;
 1085   /* End of first string */
 1086   char *end1;
 1087   /* End of second string */
 1088   char *end2;
 1089   /* Pointer just past last char to consider matching */
 1090   char *end_match_1, *end_match_2;
 1091   register char *d, *dend;
 1092   register int mcnt;
 1093   unsigned char *translate = (unsigned char *) pbufp->translate;
 1094 
 1095  /* Failure point stack.  Each place that can handle a failure further 
 1096     down the line pushes a failure point on this stack.  It consists of 
 1097     two char *'s.
 1098     The first one pushed is where to resume scanning the pattern;
 1099     the second pushed is where to resume scanning the strings.
 1100     If the latter is zero, the failure point is a "dummy".
 1101     If a failure happens and the innermost failure point is dormant,
 1102     it discards that failure point and tries the next one. */
 1103 
 1104   char **stackb = (char **) alloca (2 * NFAILURES * sizeof (char *));
 1105   char **stackp = stackb, **stacke = &stackb[2 * NFAILURES];
 1106 
 1107   /* Information on the "contents" of registers.
 1108      These are pointers into the input strings; they record
 1109      just what was matched (on this attempt) by some part of the pattern.
 1110      The start_memory command stores the start of a register's contents
 1111      and the stop_memory command stores the end.
 1112 
 1113      At that point, regstart[regnum] points to the first character in the
 1114      register, regend[regnum] points to the first character beyond the end 
 1115      of the register,
 1116      regstart_seg1[regnum] is true iff regstart[regnum] points into string1,
 1117      and regend_seg1[regnum] is true iff regend[regnum] points into string1.  */
 1118 
 1119   char *regstart[RE_NREGS];
 1120   char *regend[RE_NREGS];
 1121   char regstart_seg1[RE_NREGS], regend_seg1[RE_NREGS];
 1122 
 1123   ALLOCA_CHECK(stackb);
 1124 
 1125   /* Set up pointers to ends of strings.
 1126      Don't allow the second string to be empty unless both are empty.  */
 1127   if (!size2)
 1128     {
 1129       string2 = string1;
 1130       size2 = size1;
 1131       string1 = 0;
 1132       size1 = 0;
 1133     }
 1134   end1 = string1 + size1;
 1135   end2 = string2 + size2;
 1136 
 1137   /* Compute where to stop matching, within the two strings */
 1138   if (mstop <= size1)
 1139     {
 1140       end_match_1 = string1 + mstop;
 1141       end_match_2 = string2;
 1142     }
 1143   else
 1144     {
 1145       end_match_1 = end1;
 1146       end_match_2 = string2 + mstop - size1;
 1147     }
 1148 
 1149   /* Initialize \( and \) text positions to -1
 1150      to mark ones that no \( or \) has been seen for.  */
 1151 
 1152   for (mcnt = 0; mcnt < sizeof (regstart) / sizeof (*regstart); mcnt++)
 1153     regstart[mcnt] = (char *) -1;
 1154 
 1155   assert(regstart[0] == (char *) -1);
 1156 
 1157   /* `p' scans through the pattern as `d' scans through the data.
 1158      `dend' is the end of the input string that `d' points within.
 1159      `d' is advanced into the following input string whenever necessary,
 1160      but this happens before fetching;
 1161      therefore, at the beginning of the loop,
 1162      `d' can be pointing at the end of a string,
 1163      but it cannot equal string2.  */
 1164 
 1165   if (pos <= size1)
 1166     d = string1 + pos, dend = end_match_1;
 1167   else
 1168     d = string2 + pos - size1, dend = end_match_2;
 1169 
 1170 /* Write PREFETCH; just before fetching a character with *d.  */
 1171 #define PREFETCH \
 1172  while (d == dend)                          \
 1173   { if (dend == end_match_2) goto fail;  /* end of string2 => failure */   \
 1174     d = string2;  /* end of string1 => advance to string2. */       \
 1175     dend = end_match_2; }
 1176 
 1177   /* This loop loops over pattern commands.
 1178      It exits by returning from the function if match is complete,
 1179      or it drops through if match fails at this starting point in the 
 1180      input data. */
 1181 
 1182   while (1)
 1183     {
 1184       if (p == pend)
 1185     /* End of pattern means we have succeeded! */
 1186     {
 1187       /* If caller wants register contents data back, convert it to indices */
 1188       if (regs)
 1189         {
 1190           regs->start[0] = pos;
 1191           if (dend == end_match_1)
 1192         regs->end[0] = d - string1;
 1193           else
 1194         regs->end[0] = d - string2 + size1;
 1195           for (mcnt = 1; mcnt < RE_NREGS; mcnt++)
 1196         {
 1197           if (regstart[mcnt] == (char *) -1)
 1198             {
 1199               regs->start[mcnt] = -1;
 1200               regs->end[mcnt] = -1;
 1201               continue;
 1202             }
 1203           if (regstart_seg1[mcnt])
 1204             regs->start[mcnt] = regstart[mcnt] - string1;
 1205           else
 1206             regs->start[mcnt] = regstart[mcnt] - string2 + size1;
 1207           if (regend_seg1[mcnt])
 1208             regs->end[mcnt] = regend[mcnt] - string1;
 1209           else
 1210             regs->end[mcnt] = regend[mcnt] - string2 + size1;
 1211         }
 1212         }
 1213       if (dend == end_match_1)
 1214         return (d - string1 - pos);
 1215       else
 1216         return d - string2 + size1 - pos;
 1217     }
 1218 
 1219       /* Otherwise match next pattern command */
 1220 #ifdef SWITCH_ENUM_BUG
 1221       switch ((int) ((enum regexpcode) *p++))
 1222 #else
 1223       switch ((enum regexpcode) *p++)
 1224 #endif
 1225     {
 1226 
 1227     /* \( is represented by a start_memory, \) by a stop_memory.
 1228         Both of those commands contain a "register number" argument.
 1229         The text matched within the \( and \) is recorded under that number.
 1230         Then, \<digit> turns into a `duplicate' command which
 1231         is followed by the numeric value of <digit> as the register number. */
 1232 
 1233     case start_memory:
 1234       regstart[*p] = d;
 1235       regstart_seg1[*p++] = (dend == end_match_1);
 1236       break;
 1237 
 1238     case stop_memory:
 1239       regend[*p] = d;
 1240       regend_seg1[*p++] = (dend == end_match_1);
 1241       break;
 1242 
 1243     case duplicate:
 1244       {
 1245         int regno = *p++;   /* Get which register to match against */
 1246         register char *d2, *dend2;
 1247 
 1248         d2 = regstart[regno];
 1249         dend2 = (regstart_seg1[regno] == regend_seg1[regno])
 1250                 ? regend[regno]
 1251             : end_match_1;
 1252         while (1)
 1253           {
 1254         /* Advance to next segment in register contents, if necessary */
 1255         while (d2 == dend2)
 1256           {
 1257             if (dend2 == end_match_2) break;
 1258             if (dend2 == regend[regno]) break;
 1259 /* end of string1 => advance to string2. */
 1260             d2 = string2, dend2 = regend[regno];  
 1261           }
 1262         /* At end of register contents => success */
 1263         if (d2 == dend2) break;
 1264 
 1265         /* Advance to next segment in data being matched, if necessary */
 1266         PREFETCH;
 1267 
 1268         /* mcnt gets # consecutive chars to compare */
 1269         mcnt = dend - d;
 1270         if (mcnt > dend2 - d2)
 1271           mcnt = dend2 - d2;
 1272         /* Compare that many; failure if mismatch, else skip them. */
 1273 #ifndef __STDC__    /* (MJH) */
 1274         if (translate ? bcmp_translate (d, d2, mcnt, translate) : bcmp (d, d2, mcnt))
 1275 #else
 1276         if (translate ? bcmp_translate ((unsigned char *)d,
 1277                         (unsigned char *)d2,
 1278                         mcnt,
 1279                         translate)
 1280             : bcmp (d, d2, mcnt))
 1281 #endif  /* __STDC__ */
 1282           goto fail;
 1283         d += mcnt, d2 += mcnt;
 1284           }
 1285       }
 1286       break;
 1287 
 1288     case anychar:
 1289       /* fetch a data character */
 1290       PREFETCH;
 1291       /* Match anything but a newline.  */
 1292       if ((translate ? translate[*(unsigned char *)d++] : *d++) == '\n')
 1293         goto fail;
 1294       break;
 1295 
 1296     case charset:
 1297     case charset_not:
 1298       {
 1299         /* Nonzero for charset_not */
 1300         int not = 0;
 1301         register int c;
 1302         if (*(p - 1) == (char) charset_not)
 1303           not = 1;
 1304 
 1305         /* fetch a data character */
 1306         PREFETCH;
 1307 
 1308         if (translate)
 1309           c = translate [*(unsigned char *)d];
 1310         else
 1311           c = *(unsigned char *)d;
 1312 
 1313         if (c < *p * BYTEWIDTH
 1314         && p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
 1315           not = !not;
 1316 
 1317         p += 1 + *p;
 1318 
 1319         if (!not) goto fail;
 1320         d++;
 1321         break;
 1322       }
 1323 
 1324     case begline:
 1325       if (d == string1 || d[-1] == '\n')
 1326         break;
 1327       goto fail;
 1328 
 1329     case endline:
 1330       if (d == end2
 1331           || (d == end1 ? (size2 == 0 || *string2 == '\n') : *d == '\n'))
 1332         break;
 1333       goto fail;
 1334 
 1335     /* "or" constructs ("|") are handled by starting each alternative
 1336         with an on_failure_jump that points to the start of the next alternative.
 1337         Each alternative except the last ends with a jump to the joining point.
 1338         (Actually, each jump except for the last one really jumps
 1339          to the following jump, because tensioning the jumps is a hassle.) */
 1340 
 1341     /* The start of a stupid repeat has an on_failure_jump that points
 1342        past the end of the repeat text.
 1343        This makes a failure point so that, on failure to match a repetition,
 1344        matching restarts past as many repetitions have been found
 1345        with no way to fail and look for another one.  */
 1346 
 1347     /* A smart repeat is similar but loops back to the on_failure_jump
 1348        so that each repetition makes another failure point. */
 1349 
 1350     case on_failure_jump:
 1351       if (stackp == stacke)
 1352         {
 1353           char **stackx;
 1354           if (stacke - stackb > re_max_failures)
 1355         return -2;
 1356           stackx = (char **) alloca (2 * (stacke - stackb) * sizeof (char *));
 1357           ALLOCA_CHECK(stackx);
 1358           bcopy (stackb, stackx, (stacke - stackb) * sizeof (char *));
 1359           stackp += stackx - stackb;
 1360           stacke = stackx + 2 * (stacke - stackb);
 1361           stackb = stackx;
 1362         }
 1363       mcnt = *p++ & 0377;
 1364       mcnt += SIGN_EXTEND_CHAR (*p) << 8;
 1365       p++;
 1366       *stackp++ = mcnt + p;
 1367       *stackp++ = d;
 1368       break;
 1369 
 1370     /* The end of a smart repeat has an maybe_finalize_jump back.
 1371        Change it either to a finalize_jump or an ordinary jump. */
 1372 
 1373     case maybe_finalize_jump:
 1374       mcnt = *p++ & 0377;
 1375       mcnt += SIGN_EXTEND_CHAR (*p) << 8;
 1376       p++;
 1377       /* Compare what follows with the begining of the repeat.
 1378          If we can establish that there is nothing that they would
 1379          both match, we can change to finalize_jump */
 1380       if (p == pend)
 1381         p[-3] = (char) finalize_jump;
 1382       else if (*p == (char) exactn || *p == (char) endline)
 1383         {
 1384           register int c = *p == (char) endline ? '\n' : p[2];
 1385           register char *p1 = p + mcnt;
 1386           /* p1[0] ... p1[2] are an on_failure_jump.
 1387          Examine what follows that */
 1388           if (p1[3] == (char) exactn && p1[5] != c)
 1389         p[-3] = (char) finalize_jump;
 1390           else if (p1[3] == (char) charset || p1[3] == (char) charset_not)
 1391         {
 1392           int not = p1[3] == (char) charset_not;
 1393           if (c < p1[4] * BYTEWIDTH
 1394               && p1[5 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
 1395             not = !not;
 1396           /* not is 1 if c would match */
 1397           /* That means it is not safe to finalize */
 1398           if (!not)
 1399             p[-3] = (char) finalize_jump;
 1400         }
 1401         }
 1402       p -= 2;
 1403       if (p[-1] != (char) finalize_jump)
 1404         {
 1405           p[-1] = (char) jump;
 1406           goto nofinalize;
 1407         }
 1408 
 1409     /* The end of a stupid repeat has a finalize-jump
 1410        back to the start, where another failure point will be made
 1411        which will point after all the repetitions found so far. */
 1412 
 1413     case finalize_jump:
 1414       stackp -= 2;
 1415 
 1416     case jump:
 1417     nofinalize:
 1418       mcnt = *p++ & 0377;
 1419       mcnt += SIGN_EXTEND_CHAR (*p) << 8;
 1420       p += mcnt + 1;    /* The 1 compensates for missing ++ above */
 1421       break;
 1422 
 1423     case dummy_failure_jump:
 1424       if (stackp == stacke)
 1425         {
 1426           char **stackx = (char **) alloca (2 * (stacke - stackb) *
 1427                         sizeof (char *));
 1428           ALLOCA_CHECK(stackx);
 1429           bcopy (stackb, stackx, (stacke - stackb) * sizeof (char *));
 1430           stackp += stackx - stackb;
 1431           stacke = stackx + 2 * (stacke - stackb);
 1432           stackb = stackx;
 1433         }
 1434       *stackp++ = 0;
 1435       *stackp++ = 0;
 1436       goto nofinalize;
 1437 
 1438     case wordbound:
 1439       if (d == string1  /* Points to first char */
 1440           || d == end2  /* Points to end */
 1441           || (d == end1 && size2 == 0)) /* Points to end */
 1442         break;
 1443       if ((SYNTAX (((unsigned char *)d)[-1]) == Sword)
 1444           != (SYNTAX (d == end1 ? *(unsigned char *)string2 : 
 1445               *(unsigned char *)d) == Sword))
 1446         break;
 1447       goto fail;
 1448 
 1449     case notwordbound:
 1450       if (d == string1  /* Points to first char */
 1451           || d == end2  /* Points to end */
 1452           || (d == end1 && size2 == 0)) /* Points to end */
 1453         goto fail;
 1454       if ((SYNTAX (((unsigned char *)d)[-1]) == Sword)
 1455           != (SYNTAX (d == end1 ? *(unsigned char *)string2 : 
 1456               *(unsigned char *)d) == Sword))
 1457         goto fail;
 1458       break;
 1459 
 1460     case wordbeg:
 1461       if (d == end2  /* Points to end */
 1462           || (d == end1 && size2 == 0) /* Points to end */
 1463           || SYNTAX (*(unsigned char *) (d == end1 ? string2 : d)) != 
 1464           Sword) /* Next char not a letter */
 1465         goto fail;
 1466  /* prev char not letter */
 1467       if (d == string1  /* Points to first char */
 1468           || SYNTAX (((unsigned char *)d)[-1]) != Sword) 
 1469         break;
 1470       goto fail;
 1471 
 1472     case wordend:
 1473       if (d == string1  /* Points to first char */
 1474           || SYNTAX (((unsigned char *)d)[-1]) != 
 1475           Sword)  /* prev char not letter */
 1476         goto fail;
 1477       if (d == end2  /* Points to end */
 1478           || (d == end1 && size2 == 0) /* Points to end */
 1479           || SYNTAX (d == end1 ? *(unsigned char *)string2 : 
 1480              *(unsigned char *)d) != 
 1481           Sword) /* Next char not a letter */
 1482         break;
 1483       goto fail;
 1484 
 1485 #ifdef emacs
 1486     case before_dot:
 1487       if (((d - string2 <= (unsigned) size2)
 1488            ? d - (char *) bf_p2 : d - (char *) bf_p1)
 1489           <= point)
 1490         goto fail;
 1491       break;
 1492 
 1493     case at_dot:
 1494       if (((d - string2 <= (unsigned) size2)
 1495            ? d - (char *) bf_p2 : d - (char *) bf_p1)
 1496           == point)
 1497         goto fail;
 1498       break;
 1499 
 1500     case after_dot:
 1501       if (((d - string2 <= (unsigned) size2)
 1502            ? d - (char *) bf_p2 : d - (char *) bf_p1)
 1503           >= point)
 1504         goto fail;
 1505       break;
 1506 
 1507     case wordchar:
 1508       mcnt = (int) Sword;
 1509       goto matchsyntax;
 1510 
 1511     case syntaxspec:
 1512       mcnt = *p++;
 1513     matchsyntax:
 1514       PREFETCH;
 1515       if (SYNTAX (*(unsigned char *)d++) != (enum syntaxcode) mcnt) goto fail;
 1516       break;
 1517       
 1518     case notwordchar:
 1519       mcnt = (int) Sword;
 1520       goto matchnotsyntax;
 1521 
 1522     case notsyntaxspec:
 1523       mcnt = *p++;
 1524     matchnotsyntax:
 1525       PREFETCH;
 1526       if (SYNTAX (*(unsigned char *)d++) == (enum syntaxcode) mcnt) goto fail;
 1527       break;
 1528 #else
 1529     case wordchar:
 1530       PREFETCH;
 1531       if (SYNTAX (*(unsigned char *)d++) == 0) goto fail;
 1532       break;
 1533       
 1534     case notwordchar:
 1535       PREFETCH;
 1536       if (SYNTAX (*(unsigned char *)d++) != 0) goto fail;
 1537       break;
 1538 #endif /* not emacs */
 1539 
 1540     case begbuf:
 1541       if (d == string1) /* Note, d cannot equal string2 */
 1542         break;      /* unless string1 == string2.  */
 1543       goto fail;
 1544 
 1545     case endbuf:
 1546       if (d == end2 || (d == end1 && size2 == 0))
 1547         break;
 1548       goto fail;
 1549 
 1550     case exactn:
 1551       /* Match the next few pattern characters exactly.
 1552          mcnt is how many characters to match. */
 1553       mcnt = *p++;
 1554       if (translate)
 1555         {
 1556           do
 1557         {
 1558           PREFETCH;
 1559           if (translate[*(unsigned char *)d++] != *p++) goto fail;
 1560         }
 1561           while (--mcnt);
 1562         }
 1563       else
 1564         {
 1565           do
 1566         {
 1567           PREFETCH;
 1568           if (*d++ != *p++) goto fail;
 1569         }
 1570           while (--mcnt);
 1571         }
 1572       break;
 1573     }
 1574       continue;    /* Successfully matched one pattern command; keep matching */
 1575 
 1576       /* Jump here if any matching operation fails. */
 1577     fail:
 1578       if (stackp != stackb)
 1579     /* A restart point is known.  Restart there and pop it. */
 1580     {
 1581       if (!stackp[-2])
 1582         {   /* If innermost failure point is dormant, flush it and keep looking */
 1583           stackp -= 2;
 1584           goto fail;
 1585         }
 1586       d = *--stackp;
 1587       p = *--stackp;
 1588       if (d >= string1 && d <= end1)
 1589         dend = end_match_1;
 1590     }
 1591       else break;   /* Matching at this starting point really fails! */
 1592     }
 1593   return -1;         /* Failure to match */
 1594 }
 1595 
 1596 static int
 1597 bcmp_translate (s1, s2, len, translate)
 1598      unsigned char *s1, *s2;
 1599      register int len;
 1600      unsigned char *translate;
 1601 {
 1602   register unsigned char *p1 = s1, *p2 = s2;
 1603   while (len)
 1604     {
 1605       if (translate [*p1++] != translate [*p2++]) return 1;
 1606       len--;
 1607     }
 1608   return 0;
 1609 }
 1610 
 1611 /* Entry points compatible with bsd4.2 regex library */
 1612 
 1613 #ifndef emacs
 1614 
 1615 static struct re_pattern_buffer re_comp_buf;
 1616 
 1617 char *
 1618 re_comp (s)
 1619      char *s;
 1620 {
 1621   if (!s)
 1622     {
 1623       if (!re_comp_buf.buffer)
 1624     return "No previous regular expression";
 1625       return 0;
 1626     }
 1627 
 1628   if (!re_comp_buf.buffer)
 1629     {
 1630       if (!(re_comp_buf.buffer = (char *) malloc (200)))
 1631     return "Memory exhausted";
 1632       re_comp_buf.allocated = 200;
 1633       if (!(re_comp_buf.fastmap = (char *) malloc (1 << BYTEWIDTH)))
 1634     return "Memory exhausted";
 1635     }
 1636   return re_compile_pattern (s, strlen (s), &re_comp_buf);
 1637 }
 1638 
 1639 int
 1640 re_exec (s)
 1641      char *s;
 1642 {
 1643   int len = strlen (s);
 1644   return 0 <= re_search (&re_comp_buf, s, len, 0, len, 0);
 1645 }
 1646 
 1647 #endif /* emacs */
 1648 
 1649 #ifdef test
 1650 
 1651 #include <stdio.h>
 1652 
 1653 /* Indexed by a character, gives the upper case equivalent of the character */
 1654 
 1655 static char upcase[0400] = 
 1656   { 000, 001, 002, 003, 004, 005, 006, 007,
 1657     010, 011, 012, 013, 014, 015, 016, 017,
 1658     020, 021, 022, 023, 024, 025, 026, 027,
 1659     030, 031, 032, 033, 034, 035, 036, 037,
 1660     040, 041, 042, 043, 044, 045, 046, 047,
 1661     050, 051, 052, 053, 054, 055, 056, 057,
 1662     060, 061, 062, 063, 064, 065, 066, 067,
 1663     070, 071, 072, 073, 074, 075, 076, 077,
 1664     0100, 0101, 0102, 0103, 0104, 0105, 0106, 0107,
 1665     0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117,
 1666     0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127,
 1667     0130, 0131, 0132, 0133, 0134, 0135, 0136, 0137,
 1668     0140, 0101, 0102, 0103, 0104, 0105, 0106, 0107,
 1669     0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117,
 1670     0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127,
 1671     0130, 0131, 0132, 0173, 0174, 0175, 0176, 0177,
 1672     0200, 0201, 0202, 0203, 0204, 0205, 0206, 0207,
 1673     0210, 0211, 0212, 0213, 0214, 0215, 0216, 0217,
 1674     0220, 0221, 0222, 0223, 0224, 0225, 0226, 0227,
 1675     0230, 0231, 0232, 0233, 0234, 0235, 0236, 0237,
 1676     0240, 0241, 0242, 0243, 0244, 0245, 0246, 0247,
 1677     0250, 0251, 0252, 0253, 0254, 0255, 0256, 0257,
 1678     0260, 0261, 0262, 0263, 0264, 0265, 0266, 0267,
 1679     0270, 0271, 0272, 0273, 0274, 0275, 0276, 0277,
 1680     0300, 0301, 0302, 0303, 0304, 0305, 0306, 0307,
 1681     0310, 0311, 0312, 0313, 0314, 0315, 0316, 0317,
 1682     0320, 0321, 0322, 0323, 0324, 0325, 0326, 0327,
 1683     0330, 0331, 0332, 0333, 0334, 0335, 0336, 0337,
 1684     0340, 0341, 0342, 0343, 0344, 0345, 0346, 0347,
 1685     0350, 0351, 0352, 0353, 0354, 0355, 0356, 0357,
 1686     0360, 0361, 0362, 0363, 0364, 0365, 0366, 0367,
 1687     0370, 0371, 0372, 0373, 0374, 0375, 0376, 0377
 1688   };
 1689 
 1690 main ()
 1691 {
 1692   char pat[80];
 1693   struct re_pattern_buffer buf;
 1694   int i;
 1695   char c;
 1696   char fastmap[(1 << BYTEWIDTH)];
 1697 
 1698   buf.allocated = 40;
 1699   buf.buffer = (char *) malloc (buf.allocated);
 1700   buf.fastmap = fastmap;
 1701   buf.translate = upcase;
 1702 
 1703   while (1)
 1704     {
 1705       gets (pat);
 1706 
 1707       if (*pat)
 1708     {
 1709           re_compile_pattern (pat, strlen(pat), &buf);
 1710 
 1711       for (i = 0; i < buf.used; i++)
 1712         printchar (buf.buffer[i]);
 1713 
 1714       putchar ('\n');
 1715 
 1716       printf ("%d allocated, %d used.\n", buf.allocated, buf.used);
 1717 
 1718       re_compile_fastmap (&buf);
 1719       printf ("Allowed by fastmap: ");
 1720       for (i = 0; i < (1 << BYTEWIDTH); i++)
 1721         if (fastmap[i]) printchar (i);
 1722       putchar ('\n');
 1723     }
 1724 
 1725       gets (pat);   /* Now read the string to match against */
 1726 
 1727       i = re_match (&buf, pat, strlen (pat), 0, 0);
 1728       printf ("Match value %d.\n", i);
 1729     }
 1730 }
 1731 
 1732 #ifdef NOTDEF
 1733 print_buf (bufp)
 1734      struct re_pattern_buffer *bufp;
 1735 {
 1736   int i;
 1737 
 1738   printf ("buf is :\n----------------\n");
 1739   for (i = 0; i < bufp->used; i++)
 1740     printchar (bufp->buffer[i]);
 1741   
 1742   printf ("\n%d allocated, %d used.\n", bufp->allocated, bufp->used);
 1743   
 1744   printf ("Allowed by fastmap: ");
 1745   for (i = 0; i < (1 << BYTEWIDTH); i++)
 1746     if (bufp->fastmap[i])
 1747       printchar (i);
 1748   printf ("\nAllowed by translate: ");
 1749   if (bufp->translate)
 1750     for (i = 0; i < (1 << BYTEWIDTH); i++)
 1751       if (bufp->translate[i])
 1752     printchar (i);
 1753   printf ("\nfastmap is%s accurate\n", bufp->fastmap_accurate ? "" : "n't");
 1754   printf ("can %s be null\n----------", bufp->can_be_null ? "" : "not");
 1755 }
 1756 #endif
 1757 
 1758 printchar (c)
 1759      char c;
 1760 {
 1761   if (c < 041 || c >= 0177)
 1762     {
 1763       putchar ('\\');
 1764       putchar (((c >> 6) & 3) + '0');
 1765       putchar (((c >> 3) & 7) + '0');
 1766       putchar ((c & 7) + '0');
 1767     }
 1768   else
 1769     putchar (c);
 1770 }
 1771 
 1772 error (string)
 1773      char *string;
 1774 {
 1775   puts (string);
 1776 #ifdef CREATE_IO_WINDOW
 1777   if (iowinpid) kill(iowinpid, SIGKILL);
 1778   iowinpid = 0;
 1779 #endif /* CREATE_IO_WINDOW */
 1780   exit (1);
 1781 }
 1782 
 1783 #endif /* test */