"Fossies" - the Fresh Open Source Software Archive 
Member "xxgdb-1.12/regex.c" (19 Apr 1995, 48047 Bytes) of package /linux/misc/old/xxgdb-1.12.tar.gz:
As a special service "Fossies" has tried to format the requested source page into HTML format using (guessed) C and C++ source code syntax highlighting (style:
standard) with prefixed line numbers and
code folding option.
Alternatively you can here
view or
download the uninterpreted source code file.
1 /* Extended regular expression matching and search.
2 Copyright (C) 1985 Free Software Foundation, Inc.
3
4 NO WARRANTY
5
6 BECAUSE THIS PROGRAM IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY
7 NO WARRANTY, TO THE EXTENT PERMITTED BY APPLICABLE STATE LAW. EXCEPT
8 WHEN OTHERWISE STATED IN WRITING, FREE SOFTWARE FOUNDATION, INC,
9 RICHARD M. STALLMAN AND/OR OTHER PARTIES PROVIDE THIS PROGRAM "AS IS"
10 WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING,
11 BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
12 FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY
13 AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE
14 DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR
15 CORRECTION.
16
17 IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW WILL RICHARD M.
18 STALLMAN, THE FREE SOFTWARE FOUNDATION, INC., AND/OR ANY OTHER PARTY
19 WHO MAY MODIFY AND REDISTRIBUTE THIS PROGRAM AS PERMITTED BELOW, BE
20 LIABLE TO YOU FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR
21 OTHER SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
22 USE OR INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR
23 DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES OR
24 A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) THIS
25 PROGRAM, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
26 DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY.
27
28 GENERAL PUBLIC LICENSE TO COPY
29
30 1. You may copy and distribute verbatim copies of this source file
31 as you receive it, in any medium, provided that you conspicuously and
32 appropriately publish on each copy a valid copyright notice "Copyright
33 (C) 1985 Free Software Foundation, Inc."; and include following the
34 copyright notice a verbatim copy of the above disclaimer of warranty
35 and of this License. You may charge a distribution fee for the
36 physical act of transferring a copy.
37
38 2. You may modify your copy or copies of this source file or
39 any portion of it, and copy and distribute such modifications under
40 the terms of Paragraph 1 above, provided that you also do the following:
41
42 a) cause the modified files to carry prominent notices stating
43 that you changed the files and the date of any change; and
44
45 b) cause the whole of any work that you distribute or publish,
46 that in whole or in part contains or is a derivative of this
47 program or any part thereof, to be licensed at no charge to all
48 third parties on terms identical to those contained in this
49 License Agreement (except that you may choose to grant more extensive
50 warranty protection to some or all third parties, at your option).
51
52 c) You may charge a distribution fee for the physical act of
53 transferring a copy, and you may at your option offer warranty
54 protection in exchange for a fee.
55
56 Mere aggregation of another unrelated program with this program (or its
57 derivative) on a volume of a storage or distribution medium does not bring
58 the other program under the scope of these terms.
59
60 3. You may copy and distribute this program (or a portion or derivative
61 of it, under Paragraph 2) in object code or executable form under the terms
62 of Paragraphs 1 and 2 above provided that you also do one of the following:
63
64 a) accompany it with the complete corresponding machine-readable
65 source code, which must be distributed under the terms of
66 Paragraphs 1 and 2 above; or,
67
68 b) accompany it with a written offer, valid for at least three
69 years, to give any third party free (except for a nominal
70 shipping charge) a complete machine-readable copy of the
71 corresponding source code, to be distributed under the terms of
72 Paragraphs 1 and 2 above; or,
73
74 c) accompany it with the information you received as to where the
75 corresponding source code may be obtained. (This alternative is
76 allowed only for noncommercial distribution and only if you
77 received the program in object code or executable form alone.)
78
79 For an executable file, complete source code means all the source code for
80 all modules it contains; but, as a special exception, it need not include
81 source code for modules which are standard libraries that accompany the
82 operating system on which the executable file runs.
83
84 4. You may not copy, sublicense, distribute or transfer this program
85 except as expressly provided under this License Agreement. Any attempt
86 otherwise to copy, sublicense, distribute or transfer this program is void and
87 your rights to use the program under this License agreement shall be
88 automatically terminated. However, parties who have received computer
89 software programs from you with this License Agreement will not have
90 their licenses terminated so long as such parties remain in full compliance.
91
92 5. If you wish to incorporate parts of this program into other free
93 programs whose distribution conditions are different, write to the Free
94 Software Foundation at 675 Mass Ave, Cambridge, MA 02139. We have not yet
95 worked out a simple rule that can be stated here, but we will often permit
96 this. We will be guided by the two goals of preserving the free status of
97 all derivatives of our free software and of promoting the sharing and reuse of
98 software.
99
100
101 In other words, you are welcome to use, share and improve this program.
102 You are forbidden to forbid anyone else to use, share and improve
103 what you give them. Help stamp out software-hoarding! */
104
105
106 /* To test, compile with -Dtest.
107 This Dtestable feature turns this into a self-contained program
108 which reads a pattern, describes how it compiles,
109 then reads a string and searches for it. */
110
111 /*
112 * Modification: alloca.h included for sparc machines
113 * By : Po Cheung, po@cerc.utexas.edu
114 * Date : July 27, 1990
115 */
116
117 #ifndef NeXT
118 #include <malloc.h>
119 #endif
120 #include <string.h>
121 #include <stdio.h>
122 #include <stdlib.h>
123 #include <assert.h>
124
125 #ifdef sparc
126 #include <alloca.h>
127 #else
128 #pragma alloca
129 #endif
130 #define FAILURE_STACK 20000 /* max failure stack size */
131
132 #ifdef __GNUC__
133 #ifdef alloca
134 #undef alloca
135 #endif
136 #define alloca __builtin_alloca
137 #endif
138
139 #ifdef __STDC__
140 static int bcmp_translate (unsigned char *, unsigned char *s2,
141 int len, unsigned char *translate); /* (MJH) */
142 #endif /* __STDC__ */
143
144 /* bcopy, bzero and bcmp are needed if we are on Solaris */
145
146 #ifdef SYSV
147 #ifdef SUNOS4
148
149 void bcopy(s1, s2, len)
150 register char *s1, *s2;
151 int len;
152 {
153 for (; len; len--)
154 *(s2++) = *(s1++);
155 }
156
157 int bcmp(s1, s2, len)
158 register char *s1, *s2;
159 int len;
160 {
161 for (; len; len--)
162 if (*(s1++) != *(s2++))
163 return 1;
164 return 0;
165 }
166
167 void bzero(sp, len)
168 register char *sp;
169 int len;
170 {
171 for (; len; len--)
172 *(sp++) = 0;
173 }
174
175 #endif /* SUNOS4 */
176 #endif /* SYSV */
177
178 #ifdef emacs
179
180 /* The `emacs' switch turns on certain special matching commands
181 that make sense only in emacs. */
182
183 #include "config.h"
184 #include "lisp.h"
185 #include "buffer.h"
186 #include "syntax.h"
187
188 #else /* not emacs */
189
190 /*
191 * Define the syntax stuff, so we can do the \<...\> things.
192 */
193
194 #ifndef Sword /* must be non-zero in some of the tests below... */
195 #define Sword 1
196 #endif
197
198 #define SYNTAX(c) re_syntax_table[c]
199
200 #ifdef SYNTAX_TABLE
201
202 char *re_syntax_table;
203
204 #else
205
206 static char re_syntax_table[256];
207
208 static void
209 init_syntax_once ()
210 {
211 register int c;
212 static int done = 0;
213
214 if (done)
215 return;
216
217 bzero (re_syntax_table, sizeof re_syntax_table);
218
219 for (c = 'a'; c <= 'z'; c++)
220 re_syntax_table[c] = Sword;
221
222 for (c = 'A'; c <= 'Z'; c++)
223 re_syntax_table[c] = Sword;
224
225 for (c = '0'; c <= '9'; c++)
226 re_syntax_table[c] = Sword;
227
228 done = 1;
229 }
230
231 #endif /* SYNTAX_TABLE */
232 #endif /* not emacs */
233
234 #include "regex.h"
235
236 /* Number of failure points to allocate space for initially,
237 when matching. If this number is exceeded, more space is allocated,
238 so it is not a hard limit. */
239
240 #ifndef NFAILURES
241 #define NFAILURES 80
242 #endif /* NFAILURES */
243
244 /* width of a byte in bits */
245
246 #define BYTEWIDTH 8
247
248 #ifndef SIGN_EXTEND_CHAR
249 #define SIGN_EXTEND_CHAR(x) ((signed char)(x))
250 #endif
251
252 /* compile_pattern takes a regular-expression descriptor string in the
253 user's format and converts it into a buffer full of byte commands
254 for matching.
255
256 pattern is the address of the pattern string
257 size is the length of it.
258 bufp is a struct re_pattern_buffer * which points to the info
259 on where to store the byte commands.
260 This structure contains a char * which points to the
261 actual space, which should have been obtained with malloc.
262 compile_pattern may use realloc to grow the buffer space.
263
264 The number of bytes of commands can be found out by looking in
265 the struct re_pattern_buffer that bufp pointed to,
266 after compile_pattern returns.
267 */
268
269 #define PATPUSH(ch) (*b++ = (char) (ch))
270
271 #define PATFETCH(c) \
272 {if (p == pend) goto end_of_pattern; \
273 c = * (unsigned char *) p++; \
274 if (translate) c = translate[c]; }
275
276 #define PATFETCH_RAW(c) \
277 {if (p == pend) goto end_of_pattern; \
278 c = * (unsigned char *) p++; }
279
280 #define PATUNFETCH p--
281
282 /* This version of EXTEND_BUFFER will now work Alpha Machines,
283 and non Alpha Machines.
284
285 It isn't quite clean but it is cleaner than the previous version
286 that really only worked at all with 32bit pointers.
287
288 Dean Michaels, TRLabs.
289 dean@trlabs.ca
290 */
291
292 #define EXTEND_BUFFER \
293 { char *old_buffer = bufp->buffer; \
294 if (bufp->allocated == (1<<16)) goto too_big; \
295 bufp->allocated *= 2; \
296 if (bufp->allocated > (1<<16)) bufp->allocated = (1<<16); \
297 if (!(bufp->buffer = (char *) realloc (bufp->buffer, bufp->allocated))) \
298 goto memory_exhausted; \
299 c = bufp->buffer - old_buffer; \
300 b -= (-(int)c); \
301 if (fixup_jump) \
302 fixup_jump -= (-(int)c); \
303 if (laststart) \
304 laststart -= (-(int)c); \
305 begalt -= (-(int)c); \
306 if (pending_exact) \
307 pending_exact -= (-(int)c); \
308 }
309
310 static int store_jump (), insert_jump ();
311
312 char *
313 re_compile_pattern (pattern, size, bufp)
314 char *pattern;
315 int size;
316 struct re_pattern_buffer *bufp;
317 {
318 register char *b = bufp->buffer;
319 register char *p = pattern;
320 char *pend = pattern + size;
321 register unsigned c, c1;
322 char *p1;
323 unsigned char *translate = (unsigned char *) bufp->translate;
324
325 /* address of the count-byte of the most recently inserted "exactn" command.
326 This makes it possible to tell whether a new exact-match character
327 can be added to that command or requires a new "exactn" command. */
328
329 char *pending_exact = 0;
330
331 /* address of the place where a forward-jump should go
332 to the end of the containing expression.
333 Each alternative of an "or", except the last, ends with a forward-jump
334 of this sort. */
335
336 char *fixup_jump = 0;
337
338 /* address of start of the most recently finished expression.
339 This tells postfix * where to find the start of its operand. */
340
341 char *laststart = 0;
342
343 /* In processing a repeat, 1 means zero matches is allowed */
344
345 char zero_times_ok;
346
347 /* In processing a repeat, 1 means many matches is allowed */
348
349 char many_times_ok;
350
351 /* address of beginning of regexp, or inside of last \( */
352
353 char *begalt = b;
354
355 /* Stack of information saved by \( and restored by \).
356 Four stack elements are pushed by each \(:
357 First, the value of b.
358 Second, the value of fixup_jump.
359 Third, the value of regnum.
360 Fourth, the value of begalt. */
361
362 int stackb[40];
363 int *stackp = stackb;
364 int *stacke = stackb + 40;
365 int *stackt;
366
367 /* Counts \('s as they are encountered. Remembered for the matching \),
368 where it becomes the "register number" to put in the stop_memory command */
369
370 int regnum = 1;
371
372 bufp->fastmap_accurate = 0;
373
374 #ifndef emacs
375 #ifndef SYNTAX_TABLE
376 /*
377 * Initialize the syntax table.
378 */
379 init_syntax_once();
380 #endif
381 #endif
382
383 if (bufp->allocated == 0)
384 {
385 bufp->allocated = 28;
386 if (bufp->buffer)
387 /* EXTEND_BUFFER loses when bufp->allocated is 0 */
388 bufp->buffer = (char *) realloc (bufp->buffer, 28);
389 else
390 /* Caller did not allocate a buffer. Do it for him */
391 bufp->buffer = (char *) malloc (28);
392 if (!bufp->buffer) goto memory_exhausted;
393 begalt = b = bufp->buffer;
394 }
395
396 while (p != pend)
397 {
398 if (b - bufp->buffer > bufp->allocated - 10)
399 /* Note that EXTEND_BUFFER clobbers c */
400 EXTEND_BUFFER;
401
402 PATFETCH (c);
403
404 switch (c)
405 {
406 case '$':
407 /* $ means succeed if at end of line, but only in special contexts.
408 If randonly in the middle of a pattern, it is a normal character. */
409 if (p == pend || (*p == '\\' && (p[1] == ')' || p[1] == '|')))
410 {
411 PATPUSH (endline);
412 break;
413 }
414 goto normal_char;
415
416 case '^':
417 /* ^ means succeed if at beg of line, but only if no preceding pattern. */
418 if (laststart) goto normal_char;
419 PATPUSH (begline);
420 break;
421
422 case '*':
423 case '+':
424 case '?':
425 /* If there is no previous pattern, char not special. */
426 if (!laststart)
427 goto normal_char;
428 /* If there is a sequence of repetition chars,
429 collapse it down to equivalent to just one. */
430 zero_times_ok = 0;
431 many_times_ok = 0;
432 while (1)
433 {
434 zero_times_ok |= c != '+';
435 many_times_ok |= c != '?';
436 if (p == pend)
437 break;
438 PATFETCH (c);
439 if (!(c == '*' || c == '+' || c == '?'))
440 {
441 PATUNFETCH;
442 break;
443 }
444 }
445
446 /* Now we know whether 0 matches is allowed,
447 and whether 2 or more matches is allowed. */
448 if (many_times_ok)
449 {
450 /* If more than one repetition is allowed,
451 put in a backward jump at the end. */
452 store_jump (b, maybe_finalize_jump, laststart - 3);
453 b += 3;
454 }
455 insert_jump (on_failure_jump, laststart, b + 3, b);
456 pending_exact = 0;
457 b += 3;
458 if (!zero_times_ok)
459 {
460 /* At least one repetition required: insert before the loop
461 a skip over the initial on-failure-jump instruction */
462 insert_jump (dummy_failure_jump, laststart, laststart + 6, b);
463 b += 3;
464 }
465 break;
466
467 case '.':
468 laststart = b;
469 PATPUSH (anychar);
470 break;
471
472 case '[':
473 if (b - bufp->buffer
474 > bufp->allocated - 3 - (1 << BYTEWIDTH) / BYTEWIDTH)
475 /* Note that EXTEND_BUFFER clobbers c */
476 EXTEND_BUFFER;
477
478 laststart = b;
479 if (*p == '^')
480 PATPUSH (charset_not), p++;
481 else
482 PATPUSH (charset);
483 p1 = p;
484
485 PATPUSH ((1 << BYTEWIDTH) / BYTEWIDTH);
486 /* Clear the whole map */
487 bzero (b, (1 << BYTEWIDTH) / BYTEWIDTH);
488 /* Read in characters and ranges, setting map bits */
489 while (1)
490 {
491 PATFETCH (c);
492 if (c == ']' && p != p1 + 1) break;
493 if (*p == '-')
494 {
495 PATFETCH (c1);
496 PATFETCH (c1);
497 while (c <= c1)
498 b[c / BYTEWIDTH] |= 1 << (c % BYTEWIDTH), c++;
499 }
500 else
501 {
502 b[c / BYTEWIDTH] |= 1 << (c % BYTEWIDTH);
503 }
504 }
505 /* Discard any bitmap bytes that are all 0 at the end of the map.
506 Decrement the map-length byte too. */
507 while (b[-1] > 0 && b[b[-1] - 1] == 0)
508 b[-1]--;
509 b += b[-1];
510 break;
511
512 case '\\':
513 if (p == pend) goto invalid_pattern;
514 PATFETCH_RAW (c);
515 switch (c)
516 {
517 case '(':
518 if (stackp == stacke) goto nesting_too_deep;
519 if (regnum < RE_NREGS)
520 {
521 PATPUSH (start_memory);
522 PATPUSH (regnum);
523 }
524 *stackp++ = b - bufp->buffer;
525 *stackp++ = fixup_jump ? fixup_jump - bufp->buffer + 1 : 0;
526 *stackp++ = regnum++;
527 *stackp++ = begalt - bufp->buffer;
528 fixup_jump = 0;
529 laststart = 0;
530 begalt = b;
531 break;
532
533 case ')':
534 if (stackp == stackb) goto unmatched_close;
535 begalt = *--stackp + bufp->buffer;
536 if (fixup_jump)
537 store_jump (fixup_jump, jump, b);
538 if (stackp[-1] < RE_NREGS)
539 {
540 PATPUSH (stop_memory);
541 PATPUSH (stackp[-1]);
542 }
543 stackp -= 2;
544 fixup_jump = 0;
545 if (*stackp)
546 fixup_jump = *stackp + bufp->buffer - 1;
547 laststart = *--stackp + bufp->buffer;
548 break;
549
550 case '|':
551 insert_jump (on_failure_jump, begalt, b + 6, b);
552 pending_exact = 0;
553 b += 3;
554 if (fixup_jump)
555 store_jump (fixup_jump, jump, b);
556 fixup_jump = b;
557 b += 3;
558 laststart = 0;
559 begalt = b;
560 break;
561
562 #ifdef emacs
563 case '=':
564 PATPUSH (at_dot);
565 break;
566
567 case 's':
568 laststart = b;
569 PATPUSH (syntaxspec);
570 PATFETCH (c);
571 PATPUSH (syntax_spec_code[c]);
572 break;
573
574 case 'S':
575 laststart = b;
576 PATPUSH (notsyntaxspec);
577 PATFETCH (c);
578 PATPUSH (syntax_spec_code[c]);
579 break;
580 #endif /* emacs */
581
582 case 'w':
583 laststart = b;
584 PATPUSH (wordchar);
585 break;
586
587 case 'W':
588 laststart = b;
589 PATPUSH (notwordchar);
590 break;
591
592 case '<':
593 PATPUSH (wordbeg);
594 break;
595
596 case '>':
597 PATPUSH (wordend);
598 break;
599
600 case 'b':
601 PATPUSH (wordbound);
602 break;
603
604 case 'B':
605 PATPUSH (notwordbound);
606 break;
607
608 case '`':
609 PATPUSH (begbuf);
610 break;
611
612 case '\'':
613 PATPUSH (endbuf);
614 break;
615
616 case '1':
617 case '2':
618 case '3':
619 case '4':
620 case '5':
621 case '6':
622 case '7':
623 case '8':
624 case '9':
625 c1 = c - '0';
626 if (c1 >= regnum)
627 goto normal_char;
628 for (stackt = stackp - 2; stackt > stackb; stackt -= 4)
629 if (*stackt == c1)
630 goto normal_char;
631 laststart = b;
632 PATPUSH (duplicate);
633 PATPUSH (c1);
634 break;
635 default:
636 /* You might think it wuld be useful for \ to mean
637 not to translate; but if we don't translate it
638 it will never match anything. */
639 if (translate) c = translate[c];
640 goto normal_char;
641 }
642 break;
643
644 default:
645 normal_char:
646 if (!pending_exact || pending_exact + *pending_exact + 1 != b
647 || *pending_exact == 0177 || *p == '*' || *p == '^'
648 || *p == '+' || *p == '?')
649 {
650 laststart = b;
651 PATPUSH (exactn);
652 pending_exact = b;
653 PATPUSH (0);
654 }
655 PATPUSH (c);
656 (*pending_exact)++;
657 }
658 }
659
660 if (fixup_jump)
661 store_jump (fixup_jump, jump, b);
662
663 if (stackp != stackb) goto unmatched_open;
664
665 bufp->used = b - bufp->buffer;
666 return 0;
667
668 invalid_pattern:
669 return "Invalid regular expression";
670
671 unmatched_open:
672 return "Unmatched \\(";
673
674 unmatched_close:
675 return "Unmatched \\)";
676
677 end_of_pattern:
678 return "Premature end of regular expression";
679
680 nesting_too_deep:
681 return "Nesting too deep";
682
683 too_big:
684 return "Regular expression too big";
685
686 memory_exhausted:
687 return "Memory exhausted";
688 }
689
690 /* Store where `from' points a jump operation to jump to where `to' points.
691 `opcode' is the opcode to store. */
692
693 static int
694 store_jump (from, opcode, to)
695 char *from, *to;
696 char opcode;
697 {
698 from[0] = opcode;
699 from[1] = (to - (from + 3)) & 0377;
700 from[2] = (to - (from + 3)) >> 8;
701 }
702
703 /* Open up space at char FROM, and insert there a jump to TO.
704 CURRENT_END gives te end of the storage no in use,
705 so we know how much data to copy up.
706 OP is the opcode of the jump to insert.
707
708 If you call this function, you must zero out pending_exact. */
709
710 static int
711 insert_jump (op, from, to, current_end)
712 char op;
713 char *from, *to, *current_end;
714 {
715 register char *pto = current_end + 3;
716 register char *pfrom = current_end;
717 while (pfrom != from)
718 *--pto = *--pfrom;
719 store_jump (from, op, to);
720 }
721
722 /* Given a pattern, compute a fastmap from it.
723 The fastmap records which of the (1 << BYTEWIDTH) possible characters
724 can start a string that matches the pattern.
725 This fastmap is used by re_search to skip quickly over totally
726 implausible text.
727
728 The caller must supply the address of a (1 << BYTEWIDTH)-byte data area
729 as bufp->fastmap.
730 The other components of bufp describe the pattern to be used. */
731
732 void
733 re_compile_fastmap (bufp)
734 struct re_pattern_buffer *bufp;
735 {
736 unsigned char *pattern = (unsigned char *) bufp->buffer;
737 int size = bufp->used;
738 register char *fastmap = bufp->fastmap;
739 register unsigned char *p = pattern;
740 register unsigned char *pend = pattern + size;
741 register int j, k;
742 unsigned char *translate = (unsigned char *) bufp->translate;
743
744 unsigned char *stackb[NFAILURES];
745 unsigned char **stackp = stackb;
746
747 bzero (fastmap, (1 << BYTEWIDTH));
748 bufp->fastmap_accurate = 1;
749 bufp->can_be_null = 0;
750
751 while (p)
752 {
753 if (p == pend)
754 {
755 bufp->can_be_null = 1;
756 break;
757 }
758 #ifdef SWITCH_ENUM_BUG
759 switch ((int) ((enum regexpcode) *p++))
760 #else
761 switch ((enum regexpcode) *p++)
762 #endif
763 {
764 case exactn:
765 if (translate)
766 fastmap[translate[p[1]]] = 1;
767 else
768 fastmap[p[1]] = 1;
769 break;
770
771 case begline:
772 case before_dot:
773 case at_dot:
774 case after_dot:
775 case begbuf:
776 case endbuf:
777 case wordbound:
778 case notwordbound:
779 case wordbeg:
780 case wordend:
781 continue;
782
783 case endline:
784 if (translate)
785 fastmap[translate['\n']] = 1;
786 else
787 fastmap['\n'] = 1;
788 if (bufp->can_be_null != 1)
789 bufp->can_be_null = 2;
790 break;
791
792 case finalize_jump:
793 case maybe_finalize_jump:
794 case jump:
795 case dummy_failure_jump:
796 bufp->can_be_null = 1;
797 j = *p++ & 0377;
798 j += SIGN_EXTEND_CHAR (*(char *)p) << 8;
799 p += j + 1; /* The 1 compensates for missing ++ above */
800 if (j > 0)
801 continue;
802 /* Jump backward reached implies we just went through
803 the body of a loop and matched nothing.
804 Opcode jumped to should be an on_failure_jump.
805 Just treat it like an ordinary jump.
806 For a * loop, it has pushed its failure point already;
807 if so, discard that as redundant. */
808 if ((enum regexpcode) *p != on_failure_jump)
809 continue;
810 p++;
811 j = *p++ & 0377;
812 j += SIGN_EXTEND_CHAR (*(char *)p) << 8;
813 p += j + 1; /* The 1 compensates for missing ++ above */
814 if (stackp != stackb && *stackp == p)
815 stackp--;
816 continue;
817
818 case on_failure_jump:
819 j = *p++ & 0377;
820 j += SIGN_EXTEND_CHAR (*(char *)p) << 8;
821 p++;
822 *++stackp = p + j;
823 continue;
824
825 case start_memory:
826 case stop_memory:
827 p++;
828 continue;
829
830 case duplicate:
831 bufp->can_be_null = 1;
832 fastmap['\n'] = 1;
833 case anychar:
834 for (j = 0; j < (1 << BYTEWIDTH); j++)
835 if (j != '\n')
836 fastmap[j] = 1;
837 if (bufp->can_be_null)
838 return;
839 /* Don't return; check the alternative paths
840 so we can set can_be_null if appropriate. */
841 break;
842
843 case wordchar:
844 for (j = 0; j < (1 << BYTEWIDTH); j++)
845 if (SYNTAX (j) == Sword)
846 fastmap[j] = 1;
847 break;
848
849 case notwordchar:
850 for (j = 0; j < (1 << BYTEWIDTH); j++)
851 if (SYNTAX (j) != Sword)
852 fastmap[j] = 1;
853 break;
854
855 #ifdef emacs
856 case syntaxspec:
857 k = *p++;
858 for (j = 0; j < (1 << BYTEWIDTH); j++)
859 if (SYNTAX (j) == (enum syntaxcode) k)
860 fastmap[j] = 1;
861 break;
862
863 case notsyntaxspec:
864 k = *p++;
865 for (j = 0; j < (1 << BYTEWIDTH); j++)
866 if (SYNTAX (j) != (enum syntaxcode) k)
867 fastmap[j] = 1;
868 break;
869 #endif /* emacs */
870
871 case charset:
872 for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
873 if (p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH)))
874 {
875 if (translate)
876 fastmap[translate[j]] = 1;
877 else
878 fastmap[j] = 1;
879 }
880 break;
881
882 case charset_not:
883 /* Chars beyond end of map must be allowed */
884 for (j = *p * BYTEWIDTH; j < (1 << BYTEWIDTH); j++)
885 if (translate)
886 fastmap[translate[j]] = 1;
887 else
888 fastmap[j] = 1;
889
890 for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
891 if (!(p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH))))
892 {
893 if (translate)
894 fastmap[translate[j]] = 1;
895 else
896 fastmap[j] = 1;
897 }
898 break;
899 }
900
901 /* Get here means we have successfully found the possible starting
902 characters of one path of the pattern. We need not follow this
903 path any farther. Instead, look at the next alternative
904 remembered in the stack. */
905 if (stackp != stackb)
906 p = *stackp--;
907 else
908 break;
909 }
910 }
911
912 /* Like re_search_2, below, but only one string is specified. */
913
914 int
915 re_search (pbufp, string, size, startpos, range, regs)
916 struct re_pattern_buffer *pbufp;
917 char *string;
918 int size, startpos, range;
919 struct re_registers *regs;
920 {
921 return re_search_2 (pbufp, 0, 0, string, size, startpos, range, regs, size);
922 }
923
924 /* Like re_match_2 but tries first a match starting at index STARTPOS,
925 then at STARTPOS + 1, and so on.
926 RANGE is the number of places to try before giving up.
927 If RANGE is negative, the starting positions tried are
928 STARTPOS, STARTPOS - 1, etc.
929 It is up to the caller to make sure that range is not so large
930 as to take the starting position outside of the input strings.
931
932 The value returned is the position at which the match was found,
933 or -1 if no match was found,
934 or -2 if error (such as failure stack overflow). */
935
936 int
937 re_search_2 (pbufp, string1, size1, string2, size2, startpos, range, regs,
938 mstop)
939 struct re_pattern_buffer *pbufp;
940 char *string1, *string2;
941 int size1, size2;
942 int startpos;
943 register int range;
944 struct re_registers *regs;
945 int mstop;
946 {
947 register char *fastmap = pbufp->fastmap;
948 register unsigned char *translate = (unsigned char *) pbufp->translate;
949 int total = size1 + size2;
950 int val;
951
952 /* Update the fastmap now if not correct already */
953 if (fastmap && !pbufp->fastmap_accurate)
954 re_compile_fastmap (pbufp);
955
956 /* Don't waste time in a long search for a pattern
957 that says it is anchored. */
958 if (pbufp->used > 0 && (enum regexpcode) pbufp->buffer[0] == begbuf
959 && range > 0)
960 {
961 if (startpos > 0)
962 return -1;
963 else
964 range = 1;
965 }
966
967 while (1)
968 {
969 /* If a fastmap is supplied, skip quickly over characters
970 that cannot possibly be the start of a match.
971 Note, however, that if the pattern can possibly match
972 the null string, we must test it at each starting point
973 so that we take the first null string we get. */
974
975 if (fastmap && startpos < total && pbufp->can_be_null != 1)
976 {
977 if (range > 0)
978 {
979 register int lim = 0;
980 register unsigned char *p;
981 int irange = range;
982 if (startpos < size1 && startpos + range >= size1)
983 lim = range - (size1 - startpos);
984
985 p = ((unsigned char *)
986 &(startpos >= size1 ? string2 - size1 : string1)[startpos]);
987
988 if (translate)
989 {
990 while (range > lim && !fastmap[translate[*p++]])
991 range--;
992 }
993 else
994 {
995 while (range > lim && !fastmap[*p++])
996 range--;
997 }
998 startpos += irange - range;
999 }
1000 else
1001 {
1002 register unsigned char c;
1003 if (startpos >= size1) c = string2[startpos - size1];
1004 else c = string1[startpos];
1005 c &= 0xff;
1006 if (translate ? !fastmap[translate[c]] : !fastmap[c])
1007 goto advance;
1008 }
1009 }
1010
1011 if (range >= 0 && startpos == total
1012 && fastmap && pbufp->can_be_null == 0)
1013 return -1;
1014
1015 val = re_match_2 (pbufp, string1, size1, string2, size2, startpos,
1016 regs, mstop);
1017 if (0 <= val)
1018 {
1019 if (val == -2)
1020 return -2;
1021 return startpos;
1022 }
1023
1024 #ifdef C_ALLOCA
1025 alloca (0);
1026 #endif /* C_ALLOCA */
1027
1028 advance:
1029 if (!range) break;
1030 if (range > 0) range--, startpos++; else range++, startpos--;
1031 }
1032 return -1;
1033 }
1034
1035 #ifndef emacs /* emacs never uses this */
1036 int
1037 re_match (pbufp, string, size, pos, regs)
1038 struct re_pattern_buffer *pbufp;
1039 char *string;
1040 int size, pos;
1041 struct re_registers *regs;
1042 {
1043 return re_match_2 (pbufp, 0, 0, string, size, pos, regs, size);
1044 }
1045 #endif /* emacs */
1046
1047 /* Maximum size of failure stack. Beyond this, overflow is an error. */
1048 /*
1049 * Modification: failure stack size increased from 2000 to FAILURE_STACK
1050 * By : Po Cheung, po@cerc.utexas.edu
1051 * Date : April 15, 1990
1052 */
1053
1054 int re_max_failures = FAILURE_STACK;
1055
1056 /* Match the pattern described by PBUFP
1057 against data which is the virtual concatenation of STRING1 and STRING2.
1058 SIZE1 and SIZE2 are the sizes of the two data strings.
1059 Start the match at position POS.
1060 Do not consider matching past the position MSTOP.
1061
1062 If pbufp->fastmap is nonzero, then it had better be up to date.
1063
1064 The reason that the data to match are specified as two components
1065 which are to be regarded as concatenated
1066 is so this function can be used directly on the contents of an Emacs buffer.
1067
1068 -1 is returned if there is no match. -2 is returned if there is
1069 an error (such as match stack overflow). Otherwise the value is the length
1070 of the substring which was matched. */
1071
1072 #define ALLOCA_CHECK(pt) {if ((pt) == NULL || (pt) == (char**) 0xdeadbeef) { fprintf(stderr,"alloca failed\n"); exit(1);}}
1073
1074 int
1075 re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
1076 struct re_pattern_buffer *pbufp;
1077 char *string1, *string2;
1078 int size1, size2;
1079 int pos;
1080 struct re_registers *regs;
1081 int mstop;
1082 {
1083 register char *p = pbufp->buffer;
1084 register char *pend = p + pbufp->used;
1085 /* End of first string */
1086 char *end1;
1087 /* End of second string */
1088 char *end2;
1089 /* Pointer just past last char to consider matching */
1090 char *end_match_1, *end_match_2;
1091 register char *d, *dend;
1092 register int mcnt;
1093 unsigned char *translate = (unsigned char *) pbufp->translate;
1094
1095 /* Failure point stack. Each place that can handle a failure further
1096 down the line pushes a failure point on this stack. It consists of
1097 two char *'s.
1098 The first one pushed is where to resume scanning the pattern;
1099 the second pushed is where to resume scanning the strings.
1100 If the latter is zero, the failure point is a "dummy".
1101 If a failure happens and the innermost failure point is dormant,
1102 it discards that failure point and tries the next one. */
1103
1104 char **stackb = (char **) alloca (2 * NFAILURES * sizeof (char *));
1105 char **stackp = stackb, **stacke = &stackb[2 * NFAILURES];
1106
1107 /* Information on the "contents" of registers.
1108 These are pointers into the input strings; they record
1109 just what was matched (on this attempt) by some part of the pattern.
1110 The start_memory command stores the start of a register's contents
1111 and the stop_memory command stores the end.
1112
1113 At that point, regstart[regnum] points to the first character in the
1114 register, regend[regnum] points to the first character beyond the end
1115 of the register,
1116 regstart_seg1[regnum] is true iff regstart[regnum] points into string1,
1117 and regend_seg1[regnum] is true iff regend[regnum] points into string1. */
1118
1119 char *regstart[RE_NREGS];
1120 char *regend[RE_NREGS];
1121 char regstart_seg1[RE_NREGS], regend_seg1[RE_NREGS];
1122
1123 ALLOCA_CHECK(stackb);
1124
1125 /* Set up pointers to ends of strings.
1126 Don't allow the second string to be empty unless both are empty. */
1127 if (!size2)
1128 {
1129 string2 = string1;
1130 size2 = size1;
1131 string1 = 0;
1132 size1 = 0;
1133 }
1134 end1 = string1 + size1;
1135 end2 = string2 + size2;
1136
1137 /* Compute where to stop matching, within the two strings */
1138 if (mstop <= size1)
1139 {
1140 end_match_1 = string1 + mstop;
1141 end_match_2 = string2;
1142 }
1143 else
1144 {
1145 end_match_1 = end1;
1146 end_match_2 = string2 + mstop - size1;
1147 }
1148
1149 /* Initialize \( and \) text positions to -1
1150 to mark ones that no \( or \) has been seen for. */
1151
1152 for (mcnt = 0; mcnt < sizeof (regstart) / sizeof (*regstart); mcnt++)
1153 regstart[mcnt] = (char *) -1;
1154
1155 assert(regstart[0] == (char *) -1);
1156
1157 /* `p' scans through the pattern as `d' scans through the data.
1158 `dend' is the end of the input string that `d' points within.
1159 `d' is advanced into the following input string whenever necessary,
1160 but this happens before fetching;
1161 therefore, at the beginning of the loop,
1162 `d' can be pointing at the end of a string,
1163 but it cannot equal string2. */
1164
1165 if (pos <= size1)
1166 d = string1 + pos, dend = end_match_1;
1167 else
1168 d = string2 + pos - size1, dend = end_match_2;
1169
1170 /* Write PREFETCH; just before fetching a character with *d. */
1171 #define PREFETCH \
1172 while (d == dend) \
1173 { if (dend == end_match_2) goto fail; /* end of string2 => failure */ \
1174 d = string2; /* end of string1 => advance to string2. */ \
1175 dend = end_match_2; }
1176
1177 /* This loop loops over pattern commands.
1178 It exits by returning from the function if match is complete,
1179 or it drops through if match fails at this starting point in the
1180 input data. */
1181
1182 while (1)
1183 {
1184 if (p == pend)
1185 /* End of pattern means we have succeeded! */
1186 {
1187 /* If caller wants register contents data back, convert it to indices */
1188 if (regs)
1189 {
1190 regs->start[0] = pos;
1191 if (dend == end_match_1)
1192 regs->end[0] = d - string1;
1193 else
1194 regs->end[0] = d - string2 + size1;
1195 for (mcnt = 1; mcnt < RE_NREGS; mcnt++)
1196 {
1197 if (regstart[mcnt] == (char *) -1)
1198 {
1199 regs->start[mcnt] = -1;
1200 regs->end[mcnt] = -1;
1201 continue;
1202 }
1203 if (regstart_seg1[mcnt])
1204 regs->start[mcnt] = regstart[mcnt] - string1;
1205 else
1206 regs->start[mcnt] = regstart[mcnt] - string2 + size1;
1207 if (regend_seg1[mcnt])
1208 regs->end[mcnt] = regend[mcnt] - string1;
1209 else
1210 regs->end[mcnt] = regend[mcnt] - string2 + size1;
1211 }
1212 }
1213 if (dend == end_match_1)
1214 return (d - string1 - pos);
1215 else
1216 return d - string2 + size1 - pos;
1217 }
1218
1219 /* Otherwise match next pattern command */
1220 #ifdef SWITCH_ENUM_BUG
1221 switch ((int) ((enum regexpcode) *p++))
1222 #else
1223 switch ((enum regexpcode) *p++)
1224 #endif
1225 {
1226
1227 /* \( is represented by a start_memory, \) by a stop_memory.
1228 Both of those commands contain a "register number" argument.
1229 The text matched within the \( and \) is recorded under that number.
1230 Then, \<digit> turns into a `duplicate' command which
1231 is followed by the numeric value of <digit> as the register number. */
1232
1233 case start_memory:
1234 regstart[*p] = d;
1235 regstart_seg1[*p++] = (dend == end_match_1);
1236 break;
1237
1238 case stop_memory:
1239 regend[*p] = d;
1240 regend_seg1[*p++] = (dend == end_match_1);
1241 break;
1242
1243 case duplicate:
1244 {
1245 int regno = *p++; /* Get which register to match against */
1246 register char *d2, *dend2;
1247
1248 d2 = regstart[regno];
1249 dend2 = (regstart_seg1[regno] == regend_seg1[regno])
1250 ? regend[regno]
1251 : end_match_1;
1252 while (1)
1253 {
1254 /* Advance to next segment in register contents, if necessary */
1255 while (d2 == dend2)
1256 {
1257 if (dend2 == end_match_2) break;
1258 if (dend2 == regend[regno]) break;
1259 /* end of string1 => advance to string2. */
1260 d2 = string2, dend2 = regend[regno];
1261 }
1262 /* At end of register contents => success */
1263 if (d2 == dend2) break;
1264
1265 /* Advance to next segment in data being matched, if necessary */
1266 PREFETCH;
1267
1268 /* mcnt gets # consecutive chars to compare */
1269 mcnt = dend - d;
1270 if (mcnt > dend2 - d2)
1271 mcnt = dend2 - d2;
1272 /* Compare that many; failure if mismatch, else skip them. */
1273 #ifndef __STDC__ /* (MJH) */
1274 if (translate ? bcmp_translate (d, d2, mcnt, translate) : bcmp (d, d2, mcnt))
1275 #else
1276 if (translate ? bcmp_translate ((unsigned char *)d,
1277 (unsigned char *)d2,
1278 mcnt,
1279 translate)
1280 : bcmp (d, d2, mcnt))
1281 #endif /* __STDC__ */
1282 goto fail;
1283 d += mcnt, d2 += mcnt;
1284 }
1285 }
1286 break;
1287
1288 case anychar:
1289 /* fetch a data character */
1290 PREFETCH;
1291 /* Match anything but a newline. */
1292 if ((translate ? translate[*(unsigned char *)d++] : *d++) == '\n')
1293 goto fail;
1294 break;
1295
1296 case charset:
1297 case charset_not:
1298 {
1299 /* Nonzero for charset_not */
1300 int not = 0;
1301 register int c;
1302 if (*(p - 1) == (char) charset_not)
1303 not = 1;
1304
1305 /* fetch a data character */
1306 PREFETCH;
1307
1308 if (translate)
1309 c = translate [*(unsigned char *)d];
1310 else
1311 c = *(unsigned char *)d;
1312
1313 if (c < *p * BYTEWIDTH
1314 && p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
1315 not = !not;
1316
1317 p += 1 + *p;
1318
1319 if (!not) goto fail;
1320 d++;
1321 break;
1322 }
1323
1324 case begline:
1325 if (d == string1 || d[-1] == '\n')
1326 break;
1327 goto fail;
1328
1329 case endline:
1330 if (d == end2
1331 || (d == end1 ? (size2 == 0 || *string2 == '\n') : *d == '\n'))
1332 break;
1333 goto fail;
1334
1335 /* "or" constructs ("|") are handled by starting each alternative
1336 with an on_failure_jump that points to the start of the next alternative.
1337 Each alternative except the last ends with a jump to the joining point.
1338 (Actually, each jump except for the last one really jumps
1339 to the following jump, because tensioning the jumps is a hassle.) */
1340
1341 /* The start of a stupid repeat has an on_failure_jump that points
1342 past the end of the repeat text.
1343 This makes a failure point so that, on failure to match a repetition,
1344 matching restarts past as many repetitions have been found
1345 with no way to fail and look for another one. */
1346
1347 /* A smart repeat is similar but loops back to the on_failure_jump
1348 so that each repetition makes another failure point. */
1349
1350 case on_failure_jump:
1351 if (stackp == stacke)
1352 {
1353 char **stackx;
1354 if (stacke - stackb > re_max_failures)
1355 return -2;
1356 stackx = (char **) alloca (2 * (stacke - stackb) * sizeof (char *));
1357 ALLOCA_CHECK(stackx);
1358 bcopy (stackb, stackx, (stacke - stackb) * sizeof (char *));
1359 stackp += stackx - stackb;
1360 stacke = stackx + 2 * (stacke - stackb);
1361 stackb = stackx;
1362 }
1363 mcnt = *p++ & 0377;
1364 mcnt += SIGN_EXTEND_CHAR (*p) << 8;
1365 p++;
1366 *stackp++ = mcnt + p;
1367 *stackp++ = d;
1368 break;
1369
1370 /* The end of a smart repeat has an maybe_finalize_jump back.
1371 Change it either to a finalize_jump or an ordinary jump. */
1372
1373 case maybe_finalize_jump:
1374 mcnt = *p++ & 0377;
1375 mcnt += SIGN_EXTEND_CHAR (*p) << 8;
1376 p++;
1377 /* Compare what follows with the begining of the repeat.
1378 If we can establish that there is nothing that they would
1379 both match, we can change to finalize_jump */
1380 if (p == pend)
1381 p[-3] = (char) finalize_jump;
1382 else if (*p == (char) exactn || *p == (char) endline)
1383 {
1384 register int c = *p == (char) endline ? '\n' : p[2];
1385 register char *p1 = p + mcnt;
1386 /* p1[0] ... p1[2] are an on_failure_jump.
1387 Examine what follows that */
1388 if (p1[3] == (char) exactn && p1[5] != c)
1389 p[-3] = (char) finalize_jump;
1390 else if (p1[3] == (char) charset || p1[3] == (char) charset_not)
1391 {
1392 int not = p1[3] == (char) charset_not;
1393 if (c < p1[4] * BYTEWIDTH
1394 && p1[5 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
1395 not = !not;
1396 /* not is 1 if c would match */
1397 /* That means it is not safe to finalize */
1398 if (!not)
1399 p[-3] = (char) finalize_jump;
1400 }
1401 }
1402 p -= 2;
1403 if (p[-1] != (char) finalize_jump)
1404 {
1405 p[-1] = (char) jump;
1406 goto nofinalize;
1407 }
1408
1409 /* The end of a stupid repeat has a finalize-jump
1410 back to the start, where another failure point will be made
1411 which will point after all the repetitions found so far. */
1412
1413 case finalize_jump:
1414 stackp -= 2;
1415
1416 case jump:
1417 nofinalize:
1418 mcnt = *p++ & 0377;
1419 mcnt += SIGN_EXTEND_CHAR (*p) << 8;
1420 p += mcnt + 1; /* The 1 compensates for missing ++ above */
1421 break;
1422
1423 case dummy_failure_jump:
1424 if (stackp == stacke)
1425 {
1426 char **stackx = (char **) alloca (2 * (stacke - stackb) *
1427 sizeof (char *));
1428 ALLOCA_CHECK(stackx);
1429 bcopy (stackb, stackx, (stacke - stackb) * sizeof (char *));
1430 stackp += stackx - stackb;
1431 stacke = stackx + 2 * (stacke - stackb);
1432 stackb = stackx;
1433 }
1434 *stackp++ = 0;
1435 *stackp++ = 0;
1436 goto nofinalize;
1437
1438 case wordbound:
1439 if (d == string1 /* Points to first char */
1440 || d == end2 /* Points to end */
1441 || (d == end1 && size2 == 0)) /* Points to end */
1442 break;
1443 if ((SYNTAX (((unsigned char *)d)[-1]) == Sword)
1444 != (SYNTAX (d == end1 ? *(unsigned char *)string2 :
1445 *(unsigned char *)d) == Sword))
1446 break;
1447 goto fail;
1448
1449 case notwordbound:
1450 if (d == string1 /* Points to first char */
1451 || d == end2 /* Points to end */
1452 || (d == end1 && size2 == 0)) /* Points to end */
1453 goto fail;
1454 if ((SYNTAX (((unsigned char *)d)[-1]) == Sword)
1455 != (SYNTAX (d == end1 ? *(unsigned char *)string2 :
1456 *(unsigned char *)d) == Sword))
1457 goto fail;
1458 break;
1459
1460 case wordbeg:
1461 if (d == end2 /* Points to end */
1462 || (d == end1 && size2 == 0) /* Points to end */
1463 || SYNTAX (*(unsigned char *) (d == end1 ? string2 : d)) !=
1464 Sword) /* Next char not a letter */
1465 goto fail;
1466 /* prev char not letter */
1467 if (d == string1 /* Points to first char */
1468 || SYNTAX (((unsigned char *)d)[-1]) != Sword)
1469 break;
1470 goto fail;
1471
1472 case wordend:
1473 if (d == string1 /* Points to first char */
1474 || SYNTAX (((unsigned char *)d)[-1]) !=
1475 Sword) /* prev char not letter */
1476 goto fail;
1477 if (d == end2 /* Points to end */
1478 || (d == end1 && size2 == 0) /* Points to end */
1479 || SYNTAX (d == end1 ? *(unsigned char *)string2 :
1480 *(unsigned char *)d) !=
1481 Sword) /* Next char not a letter */
1482 break;
1483 goto fail;
1484
1485 #ifdef emacs
1486 case before_dot:
1487 if (((d - string2 <= (unsigned) size2)
1488 ? d - (char *) bf_p2 : d - (char *) bf_p1)
1489 <= point)
1490 goto fail;
1491 break;
1492
1493 case at_dot:
1494 if (((d - string2 <= (unsigned) size2)
1495 ? d - (char *) bf_p2 : d - (char *) bf_p1)
1496 == point)
1497 goto fail;
1498 break;
1499
1500 case after_dot:
1501 if (((d - string2 <= (unsigned) size2)
1502 ? d - (char *) bf_p2 : d - (char *) bf_p1)
1503 >= point)
1504 goto fail;
1505 break;
1506
1507 case wordchar:
1508 mcnt = (int) Sword;
1509 goto matchsyntax;
1510
1511 case syntaxspec:
1512 mcnt = *p++;
1513 matchsyntax:
1514 PREFETCH;
1515 if (SYNTAX (*(unsigned char *)d++) != (enum syntaxcode) mcnt) goto fail;
1516 break;
1517
1518 case notwordchar:
1519 mcnt = (int) Sword;
1520 goto matchnotsyntax;
1521
1522 case notsyntaxspec:
1523 mcnt = *p++;
1524 matchnotsyntax:
1525 PREFETCH;
1526 if (SYNTAX (*(unsigned char *)d++) == (enum syntaxcode) mcnt) goto fail;
1527 break;
1528 #else
1529 case wordchar:
1530 PREFETCH;
1531 if (SYNTAX (*(unsigned char *)d++) == 0) goto fail;
1532 break;
1533
1534 case notwordchar:
1535 PREFETCH;
1536 if (SYNTAX (*(unsigned char *)d++) != 0) goto fail;
1537 break;
1538 #endif /* not emacs */
1539
1540 case begbuf:
1541 if (d == string1) /* Note, d cannot equal string2 */
1542 break; /* unless string1 == string2. */
1543 goto fail;
1544
1545 case endbuf:
1546 if (d == end2 || (d == end1 && size2 == 0))
1547 break;
1548 goto fail;
1549
1550 case exactn:
1551 /* Match the next few pattern characters exactly.
1552 mcnt is how many characters to match. */
1553 mcnt = *p++;
1554 if (translate)
1555 {
1556 do
1557 {
1558 PREFETCH;
1559 if (translate[*(unsigned char *)d++] != *p++) goto fail;
1560 }
1561 while (--mcnt);
1562 }
1563 else
1564 {
1565 do
1566 {
1567 PREFETCH;
1568 if (*d++ != *p++) goto fail;
1569 }
1570 while (--mcnt);
1571 }
1572 break;
1573 }
1574 continue; /* Successfully matched one pattern command; keep matching */
1575
1576 /* Jump here if any matching operation fails. */
1577 fail:
1578 if (stackp != stackb)
1579 /* A restart point is known. Restart there and pop it. */
1580 {
1581 if (!stackp[-2])
1582 { /* If innermost failure point is dormant, flush it and keep looking */
1583 stackp -= 2;
1584 goto fail;
1585 }
1586 d = *--stackp;
1587 p = *--stackp;
1588 if (d >= string1 && d <= end1)
1589 dend = end_match_1;
1590 }
1591 else break; /* Matching at this starting point really fails! */
1592 }
1593 return -1; /* Failure to match */
1594 }
1595
1596 static int
1597 bcmp_translate (s1, s2, len, translate)
1598 unsigned char *s1, *s2;
1599 register int len;
1600 unsigned char *translate;
1601 {
1602 register unsigned char *p1 = s1, *p2 = s2;
1603 while (len)
1604 {
1605 if (translate [*p1++] != translate [*p2++]) return 1;
1606 len--;
1607 }
1608 return 0;
1609 }
1610
1611 /* Entry points compatible with bsd4.2 regex library */
1612
1613 #ifndef emacs
1614
1615 static struct re_pattern_buffer re_comp_buf;
1616
1617 char *
1618 re_comp (s)
1619 char *s;
1620 {
1621 if (!s)
1622 {
1623 if (!re_comp_buf.buffer)
1624 return "No previous regular expression";
1625 return 0;
1626 }
1627
1628 if (!re_comp_buf.buffer)
1629 {
1630 if (!(re_comp_buf.buffer = (char *) malloc (200)))
1631 return "Memory exhausted";
1632 re_comp_buf.allocated = 200;
1633 if (!(re_comp_buf.fastmap = (char *) malloc (1 << BYTEWIDTH)))
1634 return "Memory exhausted";
1635 }
1636 return re_compile_pattern (s, strlen (s), &re_comp_buf);
1637 }
1638
1639 int
1640 re_exec (s)
1641 char *s;
1642 {
1643 int len = strlen (s);
1644 return 0 <= re_search (&re_comp_buf, s, len, 0, len, 0);
1645 }
1646
1647 #endif /* emacs */
1648
1649 #ifdef test
1650
1651 #include <stdio.h>
1652
1653 /* Indexed by a character, gives the upper case equivalent of the character */
1654
1655 static char upcase[0400] =
1656 { 000, 001, 002, 003, 004, 005, 006, 007,
1657 010, 011, 012, 013, 014, 015, 016, 017,
1658 020, 021, 022, 023, 024, 025, 026, 027,
1659 030, 031, 032, 033, 034, 035, 036, 037,
1660 040, 041, 042, 043, 044, 045, 046, 047,
1661 050, 051, 052, 053, 054, 055, 056, 057,
1662 060, 061, 062, 063, 064, 065, 066, 067,
1663 070, 071, 072, 073, 074, 075, 076, 077,
1664 0100, 0101, 0102, 0103, 0104, 0105, 0106, 0107,
1665 0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117,
1666 0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127,
1667 0130, 0131, 0132, 0133, 0134, 0135, 0136, 0137,
1668 0140, 0101, 0102, 0103, 0104, 0105, 0106, 0107,
1669 0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117,
1670 0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127,
1671 0130, 0131, 0132, 0173, 0174, 0175, 0176, 0177,
1672 0200, 0201, 0202, 0203, 0204, 0205, 0206, 0207,
1673 0210, 0211, 0212, 0213, 0214, 0215, 0216, 0217,
1674 0220, 0221, 0222, 0223, 0224, 0225, 0226, 0227,
1675 0230, 0231, 0232, 0233, 0234, 0235, 0236, 0237,
1676 0240, 0241, 0242, 0243, 0244, 0245, 0246, 0247,
1677 0250, 0251, 0252, 0253, 0254, 0255, 0256, 0257,
1678 0260, 0261, 0262, 0263, 0264, 0265, 0266, 0267,
1679 0270, 0271, 0272, 0273, 0274, 0275, 0276, 0277,
1680 0300, 0301, 0302, 0303, 0304, 0305, 0306, 0307,
1681 0310, 0311, 0312, 0313, 0314, 0315, 0316, 0317,
1682 0320, 0321, 0322, 0323, 0324, 0325, 0326, 0327,
1683 0330, 0331, 0332, 0333, 0334, 0335, 0336, 0337,
1684 0340, 0341, 0342, 0343, 0344, 0345, 0346, 0347,
1685 0350, 0351, 0352, 0353, 0354, 0355, 0356, 0357,
1686 0360, 0361, 0362, 0363, 0364, 0365, 0366, 0367,
1687 0370, 0371, 0372, 0373, 0374, 0375, 0376, 0377
1688 };
1689
1690 main ()
1691 {
1692 char pat[80];
1693 struct re_pattern_buffer buf;
1694 int i;
1695 char c;
1696 char fastmap[(1 << BYTEWIDTH)];
1697
1698 buf.allocated = 40;
1699 buf.buffer = (char *) malloc (buf.allocated);
1700 buf.fastmap = fastmap;
1701 buf.translate = upcase;
1702
1703 while (1)
1704 {
1705 gets (pat);
1706
1707 if (*pat)
1708 {
1709 re_compile_pattern (pat, strlen(pat), &buf);
1710
1711 for (i = 0; i < buf.used; i++)
1712 printchar (buf.buffer[i]);
1713
1714 putchar ('\n');
1715
1716 printf ("%d allocated, %d used.\n", buf.allocated, buf.used);
1717
1718 re_compile_fastmap (&buf);
1719 printf ("Allowed by fastmap: ");
1720 for (i = 0; i < (1 << BYTEWIDTH); i++)
1721 if (fastmap[i]) printchar (i);
1722 putchar ('\n');
1723 }
1724
1725 gets (pat); /* Now read the string to match against */
1726
1727 i = re_match (&buf, pat, strlen (pat), 0, 0);
1728 printf ("Match value %d.\n", i);
1729 }
1730 }
1731
1732 #ifdef NOTDEF
1733 print_buf (bufp)
1734 struct re_pattern_buffer *bufp;
1735 {
1736 int i;
1737
1738 printf ("buf is :\n----------------\n");
1739 for (i = 0; i < bufp->used; i++)
1740 printchar (bufp->buffer[i]);
1741
1742 printf ("\n%d allocated, %d used.\n", bufp->allocated, bufp->used);
1743
1744 printf ("Allowed by fastmap: ");
1745 for (i = 0; i < (1 << BYTEWIDTH); i++)
1746 if (bufp->fastmap[i])
1747 printchar (i);
1748 printf ("\nAllowed by translate: ");
1749 if (bufp->translate)
1750 for (i = 0; i < (1 << BYTEWIDTH); i++)
1751 if (bufp->translate[i])
1752 printchar (i);
1753 printf ("\nfastmap is%s accurate\n", bufp->fastmap_accurate ? "" : "n't");
1754 printf ("can %s be null\n----------", bufp->can_be_null ? "" : "not");
1755 }
1756 #endif
1757
1758 printchar (c)
1759 char c;
1760 {
1761 if (c < 041 || c >= 0177)
1762 {
1763 putchar ('\\');
1764 putchar (((c >> 6) & 3) + '0');
1765 putchar (((c >> 3) & 7) + '0');
1766 putchar ((c & 7) + '0');
1767 }
1768 else
1769 putchar (c);
1770 }
1771
1772 error (string)
1773 char *string;
1774 {
1775 puts (string);
1776 #ifdef CREATE_IO_WINDOW
1777 if (iowinpid) kill(iowinpid, SIGKILL);
1778 iowinpid = 0;
1779 #endif /* CREATE_IO_WINDOW */
1780 exit (1);
1781 }
1782
1783 #endif /* test */