Member "tre-0.8.0/doc/tre-api.html" (16 Apr 2009, 31050 Bytes) of package /linux/misc/tre-0.8.0.tar.gz:


    1 <h1>TRE API reference manual</h1>
    2 
    3 <h2>The <tt>regcomp()</tt> functions</h2>
    4 <a name="regcomp"></a>
    5 
    6 <div class="code">
    7 <code>
    8 #include &lt;tre/regex.h&gt;
    9 <br>
   10 <br>
   11 <font class="type">int</font>
   12 <font class="func">regcomp</font>(<font
   13 class="type">regex_t</font> *<font class="arg">preg</font>,
   14 <font class="qual">const</font> <font class="type">char</font>
   15 *<font class="arg">regex</font>, <font class="type">int</font>
   16 <font class="arg">cflags</font>);
   17 <br>
   18 <font class="type">int</font> <font
   19 class="func">regncomp</font>(<font class="type">regex_t</font>
   20 *<font class="arg">preg</font>, <font class="qual">const</font>
   21 <font class="type">char</font> *<font class="arg">regex</font>,
   22 <font class="type">size_t</font> <font class="arg">len</font>,
   23 <font class="type">int</font> <font class="arg">cflags</font>);
   24 <br>
   25 <font class="type">int</font> <font
   26 class="func">regwcomp</font>(<font class="type">regex_t</font>
   27 *<font class="arg">preg</font>, <font class="qual">const</font>
   28 <font class="type">wchar_t</font> *<font
   29 class="arg">regex</font>, <font class="type">int</font> <font
   30 class="arg">cflags</font>);
   31 <br>
   32 <font class="type">int</font> <font
   33 class="func">regwncomp</font>(<font class="type">regex_t</font>
   34 *<font class="arg">preg</font>, <font class="qual">const</font>
   35 <font class="type">wchar_t</font> *<font
   36 class="arg">regex</font>, <font class="type">size_t</font>
   37 <font class="arg">len</font>, <font class="type">int</font>
   38 <font class="arg">cflags</font>);
   39 <br>
   40 <font class="type">void</font> <font
   41 class="func">regfree</font>(<font class="type">regex_t</font>
   42 *<font class="arg">preg</font>);
   43 <br>
   44 </code>
   45 </div>
   46 
   47 <p>
   48 The <tt><font class="func">regcomp</font>()</tt> function compiles
   49 the regex string pointed to by <tt><font
   50 class="arg">regex</font></tt> to an internal representation and
   51 stores the result in the pattern buffer structure pointed to by
   52 <tt><font class="arg">preg</font></tt>.  The <tt><font
   53 class="func">regncomp</font>()</tt> function is like <tt><font
   54 class="func">regcomp</font>()</tt>, but <tt><font
   55 class="arg">regex</font></tt> is not terminated with the null
   56 byte.  Instead, the <tt><font class="arg">len</font></tt> argument
   57 is used to give the length of the string, and the string may contain
   58 null bytes.  The <tt><font class="func">regwcomp</font>()</tt> and
   59 <tt><font class="func">regwncomp</font>()</tt> functions work like
   60 <tt><font class="func">regcomp</font>()</tt> and <tt><font
   61 class="func">regncomp</font>()</tt>, respectively, but take a wide
   62 character (<tt><font class="type">wchar_t</font></tt>) string
   63 instead of a byte string.
   64 </p>
   65 
   66 <p>
   67 The <tt><font class="arg">cflags</font></tt> argument is the
   68 bitwise inclusive OR of zero or more of the following flags (defined
   69 in the header <tt>&lt;tre/regex.h&gt;</tt>):
   70 </p>
   71 
   72 <blockquote>
   73 <dl>
   74 <dt><tt>REG_EXTENDED</tt></dt>
   75 <dd>Use POSIX Extended Regular Expression (ERE) compatible syntax when
   76 compiling <tt><font class="arg">regex</font></tt>.  The default
   77 syntax is the POSIX Basic Regular Expression (BRE) syntax, but it is
   78 considered obsolete.</dd>
   79 
   80 <dt><tt>REG_ICASE</tt></dt>
   81 <dd>Ignore case.  Subsequent searches with the <a
   82 href="#regexec"><tt>regexec</tt></a> family of functions using this
   83 pattern buffer will be case insensitive.</dd>
   84 
   85 <dt><tt>REG_NOSUB</tt></dt>
   86 <dd>Do not report submatches.  Subsequent searches with the <a
   87 href="#regexec"><tt>regexec</tt></a> family of functions will only
   88 report whether a match was found or not and will not fill the submatch
   89 array.</dd>
   90 
   91 <dt><tt>REG_NEWLINE</tt></dt>
   92 <dd>Normally the newline character is treated as an ordinary
   93 character.  When this flag is used, the newline character
   94 (<tt>'\n'</tt>, ASCII code 10) is treated specially as follows:
   95 <ol>
   96 <li>The match-any-character operator (dot <tt>"."</tt> outside a
   97 bracket expression) does not match a newline.</li>
   98 <li>A non-matching list (<tt>[^...]</tt>) not containing a newline
   99 does not match a newline.</li>
  100 <li>The match-beginning-of-line operator <tt>^</tt> matches the empty
  101 string immediately after a newline as well as the empty string at the
  102 beginning of the string (but see the <code>REG_NOTBOL</code>
  103 <code>regexec()</code> flag below).</li>
  104 <li>The match-end-of-line operator <tt>$</tt> matches the empty
  105 string immediately before a newline as well as the empty string at the
  106 end of the string (but see the <code>REG_NOTEOL</code>
  107 <code>regexec()</code> flag below).</li>
  108 </ol>
  109 </dd>
  110 
  111 <dt><tt>REG_LITERAL</tt></dt>
  112 <dd>Interpret the entire <tt><font class="arg">regex</font></tt>
  113 argument as a literal string, that is, all characters will be
  114 considered ordinary.  This is a nonstandard extension, compatible with
  115 but not specified by POSIX.</dd>
  116 
  117 <dt><tt>REG_NOSPEC</tt></dt>
  118 <dd>Same as <tt>REG_LITERAL</tt>.  This flag is provided for
  119 compatibility with BSD.</dd>
  120 
  121 <dt><tt>REG_RIGHT_ASSOC</tt></dt>
  122 <dd>By default, concatenation is left associative in TRE, as per
  123 the grammar given in the <a
  124 href="http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html">base
  125 specifications on regular expressions</a> of Std 1003.1-2001 (POSIX).
  126 This flag flips associativity of concatenation to right associative.
  127 Associativity can have an effect on how a match is divided into
  128 submatches, but does not change what is matched by the entire regexp.
  129 </dd>
  130 
  131 <dt><tt>REG_UNGREEDY</tt></dt>
  132 <dd>By default, repetition operators are greedy in TRE as per Std 1003.1-2001 (POSIX) and
  133 can be forced to be non-greedy by appending a <tt>?</tt> character. This flag reverses this behavior
  134 by making the operators non-greedy by default and greedy when a <tt>?</tt> is specified.</dd>
  135 </dl>
  136 </blockquote>
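<p>
The effect of the compile flags can be checked with a small helper such
as the one sketched below.  The name <tt>search_with_flags</tt> is
hypothetical and not part of TRE.  The sketch includes the system
<tt>&lt;regex.h&gt;</tt> so that it builds without TRE; it only uses the
POSIX-compatible calls and flags, so including
<tt>&lt;tre/regex.h&gt;</tt> instead should work the same way (an
assumption, not verified here).
</p>

```c
#include <assert.h>
#include <regex.h>   /* system POSIX header; assumed interchangeable with <tre/regex.h> */
#include <stddef.h>

/* Hypothetical helper: compile `pattern` with `cflags`, search `text` once.
 * Returns 1 on match, 0 on no match, -1 if the pattern does not compile. */
int search_with_flags(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_NOSUB: only the yes/no answer is needed here. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

<p>
With this helper, <tt>^b$</tt> finds the middle line of
<tt>"a\nb\nc"</tt> only when <tt>REG_NEWLINE</tt> is set, and
<tt>HELLO</tt> finds <tt>"say hello"</tt> only when <tt>REG_ICASE</tt>
is set.
</p>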
  137 
  138 <p>
  139 After a successful call to <tt><font class="func">regcomp</font></tt> it is
  140 possible to use the <tt><font class="arg">preg</font></tt> pattern buffer for
  141 searching for matches in strings (see below).  Once the pattern buffer is no
  142 longer needed, it should be freed with <tt><font
  143 class="func">regfree</font></tt> to free the memory allocated for it.
  144 </p>
  145 
  146 
  147 <p>
  148 The <tt><font class="type">regex_t</font></tt> structure has the
  149 following fields that the application can read:
  150 </p>
  151 <blockquote>
  152 <dl>
  153 <dt><tt><font class="type">size_t</font> <font
  154 class="arg">re_nsub</font></tt></dt>
  155 <dd>Number of parenthesized subexpressions in <tt><font
  156 class="arg">regex</font></tt>.
  157 </dd>
  158 </dl>
  159 </blockquote>
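<p>
The compile, inspect, and free sequence can be sketched as follows.
The helper name <tt>count_groups</tt> is hypothetical.  The sketch
includes the system <tt>&lt;regex.h&gt;</tt> so that it builds without
TRE, on the assumption that <tt>&lt;tre/regex.h&gt;</tt> can be
substituted unchanged.
</p>

```c
#include <assert.h>
#include <regex.h>   /* system POSIX header; assumed interchangeable with <tre/regex.h> */
#include <stddef.h>

/* Hypothetical helper: number of parenthesized subexpressions in an ERE,
 * or -1 if the pattern does not compile. */
int count_groups(const char *pattern)
{
    regex_t re;
    int nsub;

    assert(pattern != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;              /* nothing to free after a failed compile */

    nsub = (int)re.re_nsub;     /* read the field before freeing the buffer */
    regfree(&re);
    return nsub;
}
```

<p>
A caller that wants every submatch reported can size the
<tt>pmatch</tt> array as <tt>re_nsub + 1</tt>: one element for the
whole match plus one per subexpression.
</p>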
  160 
  161 <p>
  162 The <tt><font class="func">regcomp</font></tt> function returns
  163 zero if the compilation was successful, or one of the following error
  164 codes if there was an error:
  165 </p>
  166 <blockquote>
  167 <dl>
  168 <dt><tt>REG_BADPAT</tt></dt>
  169 <dd>Invalid regexp.  TRE returns this only if a multibyte character
  170 set is used in the current locale, and <tt><font
  171 class="arg">regex</font></tt> contained an invalid multibyte
  172 sequence.</dd>
  173 <dt><tt>REG_ECOLLATE</tt></dt>
  174 <dd>Invalid collating element referenced.  TRE returns this whenever
  175 equivalence classes or multicharacter collating elements are used in
  176 bracket expressions (they are not supported yet).</dd>
  177 <dt><tt>REG_ECTYPE</tt></dt>
  178 <dd>Unknown character class name in <tt>[[:<i>name</i>:]]</tt>.</dd>
  179 <dt><tt>REG_EESCAPE</tt></dt>
  180 <dd>The last character of <tt><font class="arg">regex</font></tt>
  181 was a backslash (<tt>\</tt>).</dd>
  182 <dt><tt>REG_ESUBREG</tt></dt>
  183 <dd>Invalid back reference; number in <tt>\<i>digit</i></tt>
  184 invalid.</dd>
  185 <dt><tt>REG_EBRACK</tt></dt>
  186 <dd><tt>[]</tt> imbalance.</dd>
  187 <dt><tt>REG_EPAREN</tt></dt>
  188 <dd><tt>\(\)</tt> or <tt>()</tt> imbalance.</dd>
  189 <dt><tt>REG_EBRACE</tt></dt>
  190 <dd><tt>\{\}</tt> or <tt>{}</tt> imbalance.</dd>
  191 <dt><tt>REG_BADBR</tt></dt>
  192 <dd><tt>{}</tt> content invalid: not a number, more than two numbers,
  193 first larger than second, or number too large.</dd>
  194 <dt><tt>REG_ERANGE</tt></dt>
  195 <dd>Invalid character range, e.g. ending point is earlier in the
  196 collating order than the starting point.</dd>
  197 <dt><tt>REG_ESPACE</tt></dt>
  198 <dd>Out of memory, or an internal limit exceeded.</dd>
  199 <dt><tt>REG_BADRPT</tt></dt>
  200 <dd>Invalid use of repetition operators: two or more repetition operators have
  201 been chained in an undefined way.</dd>
  202 </dl>
  203 </blockquote>
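<p>
A caller usually inspects the return value of <tt>regcomp()</tt>
before using the pattern buffer.  The sketch below maps the codes
listed above to their names; <tt>compile_status</tt> and
<tt>error_name</tt> are hypothetical helpers, and the system
<tt>&lt;regex.h&gt;</tt> is used so that the code builds without TRE.
Which code a given malformed pattern produces may differ between
implementations, so the sketch does not rely on a particular one.
</p>

```c
#include <assert.h>
#include <regex.h>   /* system POSIX header; assumed interchangeable with <tre/regex.h> */
#include <stddef.h>

/* Hypothetical helper: symbolic name for a regcomp() return value. */
const char *error_name(int code)
{
    switch (code) {
    case 0:            return "success";
    case REG_BADPAT:   return "REG_BADPAT";
    case REG_ECOLLATE: return "REG_ECOLLATE";
    case REG_ECTYPE:   return "REG_ECTYPE";
    case REG_EESCAPE:  return "REG_EESCAPE";
    case REG_ESUBREG:  return "REG_ESUBREG";
    case REG_EBRACK:   return "REG_EBRACK";
    case REG_EPAREN:   return "REG_EPAREN";
    case REG_EBRACE:   return "REG_EBRACE";
    case REG_BADBR:    return "REG_BADBR";
    case REG_ERANGE:   return "REG_ERANGE";
    case REG_ESPACE:   return "REG_ESPACE";
    case REG_BADRPT:   return "REG_BADRPT";
    default:           return "unknown";
    }
}

/* Hypothetical helper: compile and immediately discard a pattern,
 * returning whatever regcomp() returned. */
int compile_status(const char *pattern, int cflags)
{
    regex_t re;
    int rc;

    assert(pattern != NULL);

    rc = regcomp(&re, pattern, cflags);
    if (rc == 0)
        regfree(&re);   /* only a successfully compiled buffer is freed */
    return rc;
}
```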
  204 
  205 
  206 <h2>The <tt>regexec()</tt> functions</h2>
  207 <a name="regexec"></a>
  208 
  209 <div class="code">
  210 <code>
  211 #include &lt;tre/regex.h&gt;
  212 <br>
  213 <br>
  214 <font class="type">int</font> <font
  215 class="func">regexec</font>(<font class="qual">const</font>
  216 <font class="type">regex_t</font> *<font
  217 class="arg">preg</font>, <font class="qual">const</font> <font
  218 class="type">char</font> *<font class="arg">string</font>,
  219 <font class="type">size_t</font> <font
  220 class="arg">nmatch</font>,
  221 <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  222 <font class="type">regmatch_t</font> <font
  223 class="arg">pmatch</font>[], <font class="type">int</font>
  224 <font class="arg">eflags</font>);
  225 <br>
  226 <font class="type">int</font> <font
  227 class="func">regnexec</font>(<font class="qual">const</font>
  228 <font class="type">regex_t</font> *<font
  229 class="arg">preg</font>, <font class="qual">const</font> <font
  230 class="type">char</font> *<font class="arg">string</font>,
  231 <font class="type">size_t</font> <font class="arg">len</font>,
  232 <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  233 <font class="type">size_t</font> <font
  234 class="arg">nmatch</font>, <font class="type">regmatch_t</font>
  235 <font class="arg">pmatch</font>[], <font
  236 class="type">int</font> <font class="arg">eflags</font>);
  237 <br>
  238 <font class="type">int</font> <font
  239 class="func">regwexec</font>(<font class="qual">const</font>
  240 <font class="type">regex_t</font> *<font
  241 class="arg">preg</font>, <font class="qual">const</font> <font
  242 class="type">wchar_t</font> *<font class="arg">string</font>,
  243 <font class="type">size_t</font> <font
  244 class="arg">nmatch</font>,
  245 <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  246 <font class="type">regmatch_t</font> <font
  247 class="arg">pmatch</font>[], <font class="type">int</font>
  248 <font class="arg">eflags</font>);
  249 <br>
  250 <font class="type">int</font> <font
  251 class="func">regwnexec</font>(<font class="qual">const</font>
  252 <font class="type">regex_t</font> *<font
  253 class="arg">preg</font>, <font class="qual">const</font> <font
  254 class="type">wchar_t</font> *<font class="arg">string</font>,
  255 <font class="type">size_t</font> <font class="arg">len</font>,
  256 <br>
  257 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  258 <font class="type">size_t</font> <font
  259 class="arg">nmatch</font>, <font class="type">regmatch_t</font>
  260 <font class="arg">pmatch</font>[], <font
  261 class="type">int</font> <font class="arg">eflags</font>);
  262 </code>
  263 </div>
  264 
  265 <p>
  266 The <tt><font class="func">regexec</font>()</tt> function matches
  267 the null-terminated string against the compiled regexp <tt><font
  268 class="arg">preg</font></tt>, initialized by a previous call to
  269 any one of the <a href="#regcomp"><tt>regcomp</tt></a> functions.  The
  270 <tt><font class="func">regnexec</font>()</tt> function is like
  271 <tt><font class="func">regexec</font>()</tt>, but <tt><font
  272 class="arg">string</font></tt> is not terminated with a null byte.
  273 Instead, the <tt><font class="arg">len</font></tt> argument is used
  274 to give the length of the string, and the string may contain null
  275 bytes.  The <tt><font class="func">regwexec</font>()</tt> and
  276 <tt><font class="func">regwnexec</font>()</tt> functions work like
  277 <tt><font class="func">regexec</font>()</tt> and <tt><font
  278 class="func">regnexec</font>()</tt>, respectively, but take a wide
  279 character (<tt><font class="type">wchar_t</font></tt>) string
  280 instead of a byte string. The <tt><font
  281 class="arg">eflags</font></tt> argument is a bitwise OR of zero or
  282 more of the following flags:
  283 </p>
  284 <blockquote>
  285 <dl>
  286 <dt><code>REG_NOTBOL</code></dt>
  287 <dd>
  288 <p>
  289 When this flag is used, the match-beginning-of-line operator
  290 <tt>^</tt> does not match the empty string at the beginning of
  291 <tt><font class="arg">string</font></tt>.  If
  292 <code>REG_NEWLINE</code> was used when compiling
  293 <tt><font class="arg">preg</font></tt> the empty string
  294 immediately after a newline character will still be matched.
  295 </p>
  296 </dd>
  297 
  298 <dt><code>REG_NOTEOL</code></dt>
  299 <dd>
  300 <p>
  301 When this flag is used, the match-end-of-line operator
  302 <tt>$</tt> does not match the empty string at the end of
  303 <tt><font class="arg">string</font></tt>.  If
  304 <code>REG_NEWLINE</code> was used when compiling
  305 <tt><font class="arg">preg</font></tt> the empty string
  306 immediately before a newline character will still be matched.
  307 </p>
  308 </dd>
  309 </dl>
  310 
  311 <p>
  312 These flags are useful when different portions of a string are passed
  313 to <code>regexec</code> and the beginning or end of the partial string
  314 should not be interpreted as the beginning or end of a line.
  315 </p>
  316 
  317 </blockquote>
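<p>
The typical use of <tt>REG_NOTBOL</tt> is a loop that searches a string
piece by piece.  The sketch below counts non-overlapping matches and
passes <tt>REG_NOTBOL</tt> for every piece after the first, so that
<tt>^</tt> only matches at the true start of the text.  The helper name
<tt>count_matches</tt> is hypothetical, and the system
<tt>&lt;regex.h&gt;</tt> is used so that the code builds without TRE.
</p>

```c
#include <assert.h>
#include <regex.h>   /* system POSIX header; assumed interchangeable with <tre/regex.h> */
#include <stddef.h>

/* Hypothetical helper: count non-overlapping matches of an ERE in `text`.
 * Returns -1 if the pattern does not compile. */
int count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[1];
    const char *p = text;
    int eflags = 0;
    int count = 0;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, m, eflags) == 0) {
        count++;
        if (m[0].rm_eo > 0)
            p += m[0].rm_eo;    /* continue right after this match */
        else if (*p != '\0')
            p++;                /* empty match at the start: step one char */
        else
            break;              /* empty match at the end of the text */
        eflags = REG_NOTBOL;    /* p no longer points at a line start */
    }

    regfree(&re);
    return count;
}
```

<p>
For the pattern <tt>^a</tt> and the text <tt>"aaa"</tt> this counts one
match.  Without <tt>REG_NOTBOL</tt> each remaining piece would look
like the beginning of a line and the count would be three.
</p>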
  318 
  319 <p>
  320 If <code>REG_NOSUB</code> was used when compiling <tt><font
  321 class="arg">preg</font></tt>, <tt><font
  322 class="arg">nmatch</font></tt> is zero, or <tt><font
  323 class="arg">pmatch</font></tt> is <code>NULL</code>, then the
  324 <tt><font class="arg">pmatch</font></tt> argument is ignored.
  325 Otherwise, the submatches corresponding to the parenthesized
  326 subexpressions are filled in the elements of <tt><font
  327 class="arg">pmatch</font></tt>, which must be dimensioned to have
  328 at least <tt><font class="arg">nmatch</font></tt> elements.
  329 </p>
  330 
  331 <p>
  332 The <tt><font class="type">regmatch_t</font></tt> structure contains
  333 at least the following fields:
  334 </p>
  335 <blockquote>
  336 <dl>
  337 <dt><tt><font class="type">regoff_t</font> <font
  338 class="arg">rm_so</font></tt></dt>
  339 <dd>Offset from start of <tt><font class="arg">string</font></tt> to start of
  340 substring.  </dd>
  341 <dt><tt><font class="type">regoff_t</font> <font
  342 class="arg">rm_eo</font></tt></dt>
  343 <dd>Offset from start of <tt><font class="arg">string</font></tt> to the first
  344 character after the substring.  </dd>
  345 </dl>
  346 </blockquote>
  347 
  348 <p>
  349 The length of a submatch can be computed by subtracting <code>rm_so</code> from
  350 <code>rm_eo</code>.  If a parenthesized subexpression did not participate in a
  351 match, the <code>rm_so</code> and <code>rm_eo</code> fields for the
  352 corresponding <code>pmatch</code> element are set to <code>-1</code>.  Note
  353 that when a multibyte character set is in effect, the submatch offsets are
  354 given as byte offsets, not character offsets.
  355 </p>
  356 
  357 <p>
  358 The <code>regexec()</code> functions return zero if a match was found,
  359 otherwise they return <code>REG_NOMATCH</code> to indicate no match,
  360 or <code>REG_ESPACE</code> to indicate that enough temporary memory
  361 could not be allocated to complete the matching operation.
  362 </p>
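<p>
Putting the pieces together, the sketch below copies one submatch out
of the searched string using the <tt>rm_so</tt> and <tt>rm_eo</tt>
offsets.  The helper name <tt>copy_submatch</tt> and the fixed limit
<tt>MAX_GROUPS</tt> are hypothetical, and the system
<tt>&lt;regex.h&gt;</tt> is used so that the code builds without TRE.
</p>

```c
#include <assert.h>
#include <regex.h>   /* system POSIX header; assumed interchangeable with <tre/regex.h> */
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 10   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: copy submatch `idx` (0 = whole match) into `out`.
 * Returns the submatch length, or -1 on compile error, no match,
 * a non-participating subexpression, or a too-small buffer. */
int copy_submatch(const char *pattern, const char *text, size_t idx,
                  char *out, size_t outsz)
{
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    int len = -1;

    assert(out != NULL && outsz > 0);

    if (idx >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, text, MAX_GROUPS, pm, 0) == 0 && pm[idx].rm_so != -1) {
        size_t n = (size_t)(pm[idx].rm_eo - pm[idx].rm_so);  /* length */
        if (n < outsz) {
            memcpy(out, text + pm[idx].rm_so, n);
            out[n] = '\0';
            len = (int)n;
        }
    }

    regfree(&re);
    return len;
}
```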
  363 
  364 
  365 
  366 <h3>reguexec()</h3>
  367 
  368 <div class="code">
  369 <code>
  370 #include &lt;tre/regex.h&gt;
  371 <br>
  372 <br>
  373 <font class="qual">typedef struct</font> {
  374 <br>
  375 &nbsp;&nbsp;<font class="type">int</font> (*get_next_char)(<font
  376 class="type">tre_char_t</font> *<font class="arg">c</font>, <font
  377 class="type">unsigned int</font> *<font class="arg">pos_add</font>,
  378 <font class="type">void</font> *<font class="arg">context</font>);
  379 <br>
  380 &nbsp;&nbsp;<font class="type">void</font> (*rewind)(<font
  381 class="type">size_t</font> <font class="arg">pos</font>, <font
  382 class="type">void</font> *<font class="arg">context</font>);
  383 <br>
  384 &nbsp;&nbsp;<font class="type">int</font> (*compare)(<font
  385 class="type">size_t</font> <font class="arg">pos1</font>, <font
  386 class="type">size_t</font> <font class="arg">pos2</font>, <font
  387 class="type">size_t</font> <font class="arg">len</font>, <font
  388 class="type">void</font> *<font class="arg">context</font>);
  389 <br>
  390 &nbsp;&nbsp;<font class="type">void</font> *<font
  391 class="arg">context</font>;
  392 <br>
  393 } <font class="type">tre_str_source</font>;
  394 <br>
  395 <br>
  396 <font class="type">int</font> <font
  397 class="func">reguexec</font>(<font class="qual">const</font>
  398 <font class="type">regex_t</font> *<font
  399 class="arg">preg</font>, <font class="qual">const</font> <font
  400 class="type">tre_str_source</font> *<font class="arg">string</font>,
  401 <font class="type">size_t</font> <font class="arg">nmatch</font>,
  402 <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  403 <font class="type">regmatch_t</font> <font
  404 class="arg">pmatch</font>[], <font class="type">int</font>
  405 <font class="arg">eflags</font>);
  406 </code>
  407 </div>
  408 
  409 <p>
  410 The <tt><font class="func">reguexec</font>()</tt> function works just
  411 like the other <tt>regexec()</tt> functions, except that the input
  412 string is read from user-specified callback functions instead of a
  413 character array.  This makes it possible, for example, to match
  414 regexps over arbitrary user-specified data structures.
  415 </p>
  416 
  417 <p>
  418 The <tt><font class="type">tre_str_source</font></tt> structure
  419 contains the following fields:
  420 </p>
  421 <blockquote>
  422 <dl>
  423 <dt><tt>get_next_char</tt></dt>
  424 <dd>This function must retrieve the next available character.  If a
  425 character is not available, the space pointed to by
  426 <tt><font class="arg">c</font></tt> must be set to zero and it must return
  427 a nonzero value.  If a character is available, it must be stored
  428 to the space pointed to by
  429 <tt><font class="arg">c</font></tt>, and the integer pointed to by
  430 <tt><font class="arg">pos_add</font></tt> must be set to the
  431 number of units advanced in the input (the value must be
  432 <tt>&gt;=1</tt>), and zero must be returned.</dd>
  433 
  434 <dt><tt>rewind</tt></dt>
  435 <dd>This function must rewind the input stream to the position
  436 specified by <tt><font class="arg">pos</font></tt>.  Unless the regexp
  437 uses back references, <tt>rewind</tt> is not needed and can be set to
  438 <tt>NULL</tt>.</dd>
  439 
  440 <dt><tt>compare</tt></dt>
  441 <dd>This function compares two substrings in the input stream
  442 starting at the positions specified by <tt><font
  443 class="arg">pos1</font></tt> and <tt><font
  444 class="arg">pos2</font></tt> of length <tt><font
  445 class="arg">len</font></tt>.  If the substrings are equal,
  446 <tt>compare</tt> must return zero, otherwise a nonzero value must be
  447 returned.  Unless the regexp uses back references, <tt>compare</tt> is
  448 not needed and can be set to <tt>NULL</tt>.</dd>
  449 
  450 <dt><tt>context</tt></dt>
  451 <dd>This is a context variable, passed as the last argument to
  452 all of the above functions for keeping track of the internal state of
  453 the user's code.</dd>
  454 
  455 </dl>
  456 </blockquote>
  457 
  458 <p>
  459 The position in the input stream is measured in <tt><font
  460 class="type">size_t</font></tt> units.  The current position is the
  461 sum of the increments obtained from <tt><font
  462 class="arg">pos_add</font></tt> (plus the position of the last
  463 <tt>rewind</tt>, if any).  The starting position is zero.  Submatch
  464 positions filled in the <tt><font class="arg">pmatch</font>[]</tt>
  465 array are, of course, given using positions computed in this way.
  466 </p>
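<p>
The callback contract described above can be sketched for the simplest
possible source, a byte buffer held in memory.  Everything named here
(<tt>buffer_source</tt>, <tt>buf_get_next_char</tt>,
<tt>buf_rewind</tt>, <tt>buf_compare</tt>) is hypothetical.  So that the
sketch builds without TRE, it uses a local stand-in type
<tt>sketch_char_t</tt> in place of <tt>tre_char_t</tt> and does not
call <tt>reguexec()</tt> itself; with TRE, the three functions would be
stored in a <tt>tre_str_source</tt> together with a pointer to the
<tt>buffer_source</tt> as <tt>context</tt>.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for tre_char_t so the sketch compiles without TRE (assumption:
 * with TRE, use tre_char_t from <tre/regex.h> instead). */
typedef wchar_t sketch_char_t;

/* Hypothetical context: a byte buffer plus the current read position. */
typedef struct {
    const char *data;
    size_t len;
    size_t pos;
} buffer_source;

/* get_next_char: return 0 and store the character if one is available,
 * otherwise store zero and return nonzero. */
int buf_get_next_char(sketch_char_t *c, unsigned int *pos_add, void *context)
{
    buffer_source *src = context;

    assert(src != NULL);
    if (src->pos >= src->len) {
        *c = 0;
        return 1;
    }
    *c = (unsigned char)src->data[src->pos++];
    *pos_add = 1;               /* advanced by one unit; must be >= 1 */
    return 0;
}

/* rewind: move the read position back (or forward) to `pos`. */
void buf_rewind(size_t pos, void *context)
{
    buffer_source *src = context;
    src->pos = pos;
}

/* compare: zero if the two substrings of length `len` are equal. */
int buf_compare(size_t pos1, size_t pos2, size_t len, void *context)
{
    buffer_source *src = context;

    if (pos1 + len > src->len || pos2 + len > src->len)
        return 1;               /* out of range counts as "not equal" */
    return memcmp(src->data + pos1, src->data + pos2, len) != 0;
}
```

<p>
Because each call advances by exactly one unit, the positions that end
up in <tt>pmatch[]</tt> for such a source are plain byte offsets into
the buffer.
</p>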
  467 
  468 <p>
  469 For an example of how to use <tt>reguexec()</tt>, see the
  470 <tt>tests/test-str-source.c</tt> file in the TRE source code
  471 distribution.
  472 </p>
  473 
  474 <h2>The approximate matching functions</h2>
  475 <a name="regaexec"></a>
  476 
  477 <div class="code">
  478 <code>
  479 #include &lt;tre/regex.h&gt;
  480 <br>
  481 <br>
  482 <font class="qual">typedef struct</font> {<br>
  483 &nbsp;&nbsp;<font class="type">int</font>
  484 <font class="arg">cost_ins</font>;<br>
  485 &nbsp;&nbsp;<font class="type">int</font>
  486 <font class="arg">cost_del</font>;<br>
  487 &nbsp;&nbsp;<font class="type">int</font>
  488 <font class="arg">cost_subst</font>;<br>
  489 &nbsp;&nbsp;<font class="type">int</font>
  490 <font class="arg">max_cost</font>;<br><br>
  491 &nbsp;&nbsp;<font class="type">int</font>
  492 <font class="arg">max_ins</font>;<br>
  493 &nbsp;&nbsp;<font class="type">int</font>
  494 <font class="arg">max_del</font>;<br>
  495 &nbsp;&nbsp;<font class="type">int</font>
  496 <font class="arg">max_subst</font>;<br>
  497 &nbsp;&nbsp;<font class="type">int</font>
  498 <font class="arg">max_err</font>;<br>
  499 } <font class="type">regaparams_t</font>;<br>
  500 <br>
  501 <font class="qual">typedef struct</font> {<br>
  502 &nbsp;&nbsp;<font class="type">size_t</font>
  503 <font class="arg">nmatch</font>;<br>
  504 &nbsp;&nbsp;<font class="type">regmatch_t</font>
  505 *<font class="arg">pmatch</font>;<br>
  506 &nbsp;&nbsp;<font class="type">int</font>
  507 <font class="arg">cost</font>;<br>
  508 &nbsp;&nbsp;<font class="type">int</font>
  509 <font class="arg">num_ins</font>;<br>
  510 &nbsp;&nbsp;<font class="type">int</font>
  511 <font class="arg">num_del</font>;<br>
  512 &nbsp;&nbsp;<font class="type">int</font>
  513 <font class="arg">num_subst</font>;<br>
  514 } <font class="type">regamatch_t</font>;<br>
  515 <br>
  516 <font class="type">int</font> <font
  517 class="func">regaexec</font>(<font class="qual">const</font>
  518 <font class="type">regex_t</font> *<font
  519 class="arg">preg</font>, <font class="qual">const</font> <font
  520 class="type">char</font> *<font class="arg">string</font>,<br>
  521 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  522 <font class="type">regamatch_t</font>
  523 *<font class="arg">match</font>,
  524 <font class="type">regaparams_t</font>
  525 <font class="arg">params</font>,
  526 <font class="type">int</font>
  527 <font class="arg">eflags</font>);
  528 <br>
  529 <font class="type">int</font> <font
  530 class="func">reganexec</font>(<font class="qual">const</font>
  531 <font class="type">regex_t</font> *<font
  532 class="arg">preg</font>, <font class="qual">const</font> <font
  533 class="type">char</font> *<font class="arg">string</font>,
  534 <font class="type">size_t</font> <font class="arg">len</font>,<br>
  535 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  536 <font class="type">regamatch_t</font>
  537 *<font class="arg">match</font>,
  538 <font class="type">regaparams_t</font>
  539 <font class="arg">params</font>,
  540 <font class="type">int</font> <font class="arg">eflags</font>);
  541 <br>
  542 <font class="type">int</font> <font
  543 class="func">regawexec</font>(<font class="qual">const</font>
  544 <font class="type">regex_t</font> *<font
  545 class="arg">preg</font>, <font class="qual">const</font> <font
  546 class="type">wchar_t</font> *<font class="arg">string</font>,<br>
  547 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  548 <font class="type">regamatch_t</font>
  549 *<font class="arg">match</font>,
  550 <font class="type">regaparams_t</font>
  551 <font class="arg">params</font>,
  552 <font class="type">int</font>
  553 <font class="arg">eflags</font>);
  554 <br>
  555 <font class="type">int</font>
  556 <font class="func">regawnexec</font>(
  557 <font class="qual">const</font>
  558 <font class="type">regex_t</font>
  559 *<font class="arg">preg</font>,
  560 <font class="qual">const</font>
  561 <font class="type">wchar_t</font>
  562 *<font class="arg">string</font>,
  563 <font class="type">size_t</font>
  564 <font class="arg">len</font>,<br>
  565 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  566 <font class="type">regamatch_t</font>
  567 *<font class="arg">match</font>,
  568 <font class="type">regaparams_t</font>
  569 <font class="arg">params</font>,
  570 <font class="type">int</font>
  571 <font class="arg">eflags</font>);
  572 <br>
  573 </code>
  574 </div>
  575 
  576 <p>
  577 The <tt><font class="func">regaexec</font>()</tt> function searches for
  578 the best match in <tt><font class="arg">string</font></tt>
  579 against the compiled regexp <tt><font
  580 class="arg">preg</font></tt>, initialized by a previous call to
  581 any one of the <a href="#regcomp"><tt>regcomp</tt></a> functions.
  582 </p>
  583 
  584 <p>
  585 The <tt><font class="func">reganexec</font>()</tt> function is like
  586 <tt><font class="func">regaexec</font>()</tt>, but <tt><font
  587 class="arg">string</font></tt> is not terminated by a null byte.
  588 Instead, the <tt><font class="arg">len</font></tt> argument is used to
  589 tell the length of the string, and the string may contain null
  590 bytes. The <tt><font class="func">regawexec</font>()</tt> and
  591 <tt><font class="func">regawnexec</font>()</tt> functions work like
  592 <tt><font class="func">regaexec</font>()</tt> and <tt><font
  593 class="func">reganexec</font>()</tt>, respectively, but take a wide
  594 character (<tt><font class="type">wchar_t</font></tt>) string instead
  595 of a byte string.
  596 </p>
  597 
  598 <p>
  599 The <tt><font class="arg">eflags</font></tt> argument has the same
  600 meaning as for the <tt>regexec()</tt> functions.
  601 </p>
  602 
  603 <p>
  604 The <tt><font class="arg">params</font></tt> struct controls the
  605 approximate matching parameters:</p>
  606 <blockquote>
  607 <dl>
  608   <dt><tt><font class="type">int</font></tt>
  609       <tt><font class="arg">cost_ins</font></tt></dt>
  610   <dd>The default cost of an inserted character, that is, an extra
  611       character in <tt><font class="arg">string</font></tt>.</dd>
  612 
  613   <dt><tt><font class="type">int</font></tt>
  614       <tt><font class="arg">cost_del</font></tt></dt>
  615   <dd>The default cost of a deleted character, that is, a character
  616       missing from <tt><font class="arg">string</font></tt>.</dd>
  617 
  618   <dt><tt><font class="type">int</font></tt>
  619       <tt><font class="arg">cost_subst</font></tt></dt>
  620   <dd>The default cost of a substituted character.</dd>
  621 
  622   <dt><tt><font class="type">int</font></tt>
  623       <tt><font class="arg">max_cost</font></tt></dt>
  624   <dd>The maximum allowed cost of a match.  If this is set to zero,
  625       an exact match is searched for, and results equivalent to
  626       those returned by the <tt>regexec()</tt> functions are
  627       returned.</dd>
  628 
  629   <dt><tt><font class="type">int</font></tt>
  630       <tt><font class="arg">max_ins</font></tt></dt>
  631   <dd>Maximum allowed number of inserted characters.</dd>
  632 
  633   <dt><tt><font class="type">int</font></tt>
  634       <tt><font class="arg">max_del</font></tt></dt>
  635   <dd>Maximum allowed number of deleted characters.</dd>
  636 
  637   <dt><tt><font class="type">int</font></tt>
  638       <tt><font class="arg">max_subst</font></tt></dt>
  639   <dd>Maximum allowed number of substituted characters.</dd>
  640 
  641   <dt><tt><font class="type">int</font></tt>
  642       <tt><font class="arg">max_err</font></tt></dt>
  643   <dd>Maximum allowed number of errors (inserts + deletes +
  644       substitutes).</dd>
  645 </dl>
  646 </blockquote>
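<p>
How the cost and limit fields interact can be illustrated with a little
arithmetic.  The sketch below is not TRE code: <tt>sketch_params</tt>
is a local mirror of the fields listed above, and <tt>edit_cost</tt>
and <tt>within_limits</tt> are hypothetical helpers.  It assumes that
only the default costs apply, that the total cost is the sum of the
per-edit costs, and that all limits are inclusive (a
<tt>max_cost</tt> of zero is documented to admit exact matches, whose
cost is zero).
</p>

```c
#include <assert.h>

/* Local mirror of the regaparams_t fields (assumption: same meaning). */
typedef struct {
    int cost_ins, cost_del, cost_subst, max_cost;
    int max_ins, max_del, max_subst, max_err;
} sketch_params;

/* Total cost of a candidate match under the default per-edit costs. */
int edit_cost(const sketch_params *p, int ins, int del, int subst)
{
    return ins * p->cost_ins + del * p->cost_del + subst * p->cost_subst;
}

/* 1 if a candidate with the given edit counts respects every limit. */
int within_limits(const sketch_params *p, int ins, int del, int subst)
{
    assert(p != 0 && ins >= 0 && del >= 0 && subst >= 0);

    return edit_cost(p, ins, del, subst) <= p->max_cost
        && ins <= p->max_ins
        && del <= p->max_del
        && subst <= p->max_subst
        && ins + del + subst <= p->max_err;
}
```

<p>
For example, with costs of 1 per insert, 1 per delete and 2 per
substitute, one insert plus one substitute costs 3.  That candidate is
accepted when <tt>max_cost</tt> is 3 and <tt>max_err</tt> is 2, while
two substitutes (cost 4) are rejected.
</p>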
  647 
  648 <p>
  649 The <tt><font class="arg">match</font></tt> argument points to a
  650 <tt><font class="type">regamatch_t</font></tt> structure.  The
  651 <tt><font class="arg">nmatch</font></tt> and <tt><font
  652 class="arg">pmatch</font></tt> fields must be filled by the caller.  If
  653 <code>REG_NOSUB</code> was used when compiling the regexp, or
  654 <code>match-&gt;nmatch</code> is zero, or
  655 <code>match-&gt;pmatch</code> is <code>NULL</code>, the
  656 <code>match-&gt;pmatch</code> argument is ignored.  Otherwise, the
  657 submatches corresponding to the parenthesized subexpressions are
  658 filled in the elements of <code>match-&gt;pmatch</code>, which must be
  659 dimensioned to have at least <code>match-&gt;nmatch</code> elements.
  660 The <code>match-&gt;cost</code> field is set to the cost of the match
  661 found, and the <code>match-&gt;num_ins</code>,
  662 <code>match-&gt;num_del</code>, and <code>match-&gt;num_subst</code>
  663 fields are set to the number of inserts, deletes, and substitutes in
  664 the match, respectively.
  665 </p>
  666 
  667 <p>
  668 The <tt>regaexec()</tt> functions return zero if a match with cost
  669 not exceeding <code>params.max_cost</code> was found, otherwise
  670 they return <code>REG_NOMATCH</code> to indicate no match, or
  671 <code>REG_ESPACE</code> to indicate that enough temporary memory could
  672 not be allocated to complete the matching operation.
  673 </p>
  674 
  675 <h2>Miscellaneous</h2>
  676 
  677 <div class="code">
  678 <code>
  679 #include &lt;tre/regex.h&gt;
  680 <br>
  681 <br>
  682 <font class="type">int</font> <font
  683 class="func">tre_have_backrefs</font>(<font class="qual">const</font>
  684 <font class="type">regex_t</font> *<font class="arg">preg</font>);
  685 <br>
  686 <font class="type">int</font> <font
  687 class="func">tre_have_approx</font>(<font class="qual">const</font>
  688 <font class="type">regex_t</font> *<font class="arg">preg</font>);
  689 <br>
  690 </code>
  691 </div>
  692 
  693 <p>
  694 The <tt><font class="func">tre_have_backrefs</font>()</tt> and
  695 <tt><font class="func">tre_have_approx</font>()</tt> functions return
  696 1 if the compiled pattern has back references or uses approximate
  697 matching, respectively, and 0 if not.
  698 </p>


<h2>Checking build time options</h2>

<a name="tre_config"></a>
<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">char</font> *<font
class="func">tre_version</font>(<font class="type">void</font>);
<br>
<font class="type">int</font> <font
class="func">tre_config</font>(<font class="type">int</font> <font
class="arg">query</font>, <font class="type">void</font> *<font
class="arg">result</font>);
<br>
</code>
</div>

<p>
The <tt><font class="func">tre_config</font>()</tt> function can be
used to find out which optional features have been compiled into the
TRE library, and to retrieve other parameters that may change between
releases.
</p>

<p>
The <tt><font class="arg">query</font></tt> argument is an integer
that selects which piece of information is requested.  The <tt><font
class="arg">result</font></tt> argument is a pointer to a variable
where the information is returned.  The return value of a call to
<tt><font class="func">tre_config</font>()</tt> is zero if <tt><font
class="arg">query</font></tt> was recognized, and
<code>REG_NOMATCH</code> otherwise.
</p>

<p>
The following values are recognized for <tt><font
class="arg">query</font></tt>:
</p>

<blockquote>
<dl>
<dt><tt>TRE_CONFIG_APPROX</tt></dt>
<dd>The result is an integer that is set to one if approximate
matching support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_WCHAR</tt></dt>
<dd>The result is an integer that is set to one if wide character
support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_MULTIBYTE</tt></dt>
<dd>The result is an integer that is set to one if multibyte character
set support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_SYSTEM_ABI</tt></dt>
<dd>The result is an integer that is set to one if TRE has been
compiled to be compatible with the system regex ABI, zero if not.</dd>
<dt><tt>TRE_CONFIG_VERSION</tt></dt>
<dd>The result is a pointer to a static character string that gives
the version of the TRE library.</dd>
</dl>
</blockquote>


<p>
The <tt><font class="func">tre_version</font>()</tt> function returns
a short human-readable character string that shows the software name,
version, and license.
</p>

<h2>Preprocessor definitions</h2>

<p>The header <tt>&lt;tre/regex.h&gt;</tt> defines certain
C preprocessor symbols.</p>

<h3>Version information</h3>

<p>The following definitions may be useful for checking whether a new
enough version is being used.  For version and other checks in
Autoconf scripts, the <tt>pkg-config</tt> tool is recommended
instead.</p>

<blockquote>
<dl>
<dt><tt>TRE_VERSION</tt></dt>
<dd>The version string.</dd>

<dt><tt>TRE_VERSION_1</tt></dt>
<dd>The major version number (first part of version string).</dd>

<dt><tt>TRE_VERSION_2</tt></dt>
<dd>The minor version number (second part of version string).</dd>

<dt><tt>TRE_VERSION_3</tt></dt>
<dd>The micro version number (third part of version string).</dd>

</dl>
</blockquote>

<h3>Features</h3>

<p>The following definitions may be useful for checking whether all
necessary features are enabled.  Use these only if compile-time
checking suffices (that is, when linking statically with TRE).  When
linking dynamically, <a href="#tre_config"><tt>tre_config()</tt></a>
should be used instead.</p>

<blockquote>
<dl>
<dt><tt>TRE_APPROX</tt></dt>
<dd>This is defined if approximate matching support is enabled.  The
prototypes for approximate matching functions are declared only if
<tt>TRE_APPROX</tt> is defined.</dd>

<dt><tt>TRE_WCHAR</tt></dt>
<dd>This is defined if wide character support is enabled.  The
prototypes for wide character matching functions are declared only if
<tt>TRE_WCHAR</tt> is defined.</dd>

<dt><tt>TRE_MULTIBYTE</tt></dt>
<dd>This is defined if multibyte character set support is enabled.
If this is not defined, any locale settings are ignored, and the
default locale is used when parsing regexps and matching strings.</dd>

</dl>
</blockquote>