Member "tin-2.4.1/pcre/README" (28 Aug 2013, 24136 Bytes) of package /linux/misc/tin-2.4.1.tar.gz:


    1 README file for PCRE (Perl-compatible regular expression library)
    2 -----------------------------------------------------------------
    3 
    4 The latest release of PCRE is always available from
    5 
    6   ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
    7 
    8 Please read the NEWS file if you are upgrading from a previous release.
    9 
   10 
   11 The PCRE APIs
   12 -------------
   13 
   14 PCRE is written in C, and it has its own API. The distribution now includes a
   15 set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page
   16 for details).
   17 
   18 Also included is a set of C wrapper functions that are based on the POSIX
   19 API. These end up in the library called libpcreposix. Note that this just
   20 provides a POSIX calling interface to PCRE: the regular expressions themselves
   21 still follow Perl syntax and semantics. The header file for the POSIX-style
   22 functions is called pcreposix.h. The official POSIX name is regex.h, but I
   23 didn't want to risk possible problems with existing files of that name by
   24 distributing it that way. To use it with an existing program that uses the
   25 POSIX API, it will have to be renamed or pointed at by a link.
   26 
   27 If you are using the POSIX interface to PCRE and there is already a POSIX regex
   28 library installed on your system, you must take care when linking programs to
   29 ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
   30 up the "real" POSIX functions of the same name.
   31 
   32 
   33 Documentation for PCRE
   34 ----------------------
   35 
   36 If you install PCRE in the normal way, you will end up with an installed set of
   37 man pages whose names all start with "pcre". The one that is just called "pcre"
   38 lists all the others. In addition to these man pages, the PCRE documentation is
   39 supplied in two other forms; however, as there is no standard place to install
   40 them, they are left in the doc directory of the unpacked source distribution.
   41 These forms are:
   42 
   43   1. Files called doc/pcre.txt, doc/pcregrep.txt, and doc/pcretest.txt. The
   44      first of these is a concatenation of the text forms of all the section 3
   45      man pages except those that summarize individual functions. The other two
   46      are the text forms of the section 1 man pages for the pcregrep and
   47      pcretest commands. Text forms are provided for ease of scanning with text
   48      editors or similar tools.
   49 
   50   2. A subdirectory called doc/html contains all the documentation in HTML
   51      form, hyperlinked in various ways, and rooted in a file called
   52      doc/index.html.
   53 
   54 
   55 Contributions by users of PCRE
   56 ------------------------------
   57 
   58 You can find contributions from PCRE users in the directory
   59 
   60   ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
   61 
   62 where there is also a README file giving brief descriptions of what they are.
   63 Several of them provide support for compiling PCRE on various flavours of
   64 Windows systems (I myself do not use Windows). Some are complete in themselves;
   65 others are pointers to URLs containing relevant files.
   66 
   67 
   68 Building PCRE on a Unix-like system
   69 -----------------------------------
   70 
   71 If you are using HP's ANSI C++ compiler (aCC), please see the special note
   72 in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
   73 
   74 To build PCRE on a Unix-like system, first run the "configure" command from the
   75 PCRE distribution directory, with your current directory set to the directory
   76 where you want the files to be created. This command is a standard GNU
   77 "autoconf" configuration script, for which generic instructions are supplied in
   78 INSTALL.
   79 
   80 Most commonly, people build PCRE within its own distribution directory, and in
   81 this case, on many systems, just running "./configure" is sufficient, but the
   82 usual methods of changing standard defaults are available. For example:
   83 
   84 CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
   85 
   86 specifies that the C compiler should be run with the flags '-O2 -Wall' instead
   87 of the default, and that "make install" should install PCRE under /opt/local
   88 instead of the default /usr/local.
   89 
   90 If you want to build in a different directory, just run "configure" with that
   91 directory as current. For example, suppose you have unpacked the PCRE source
   92 into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:
   93 
   94 cd /build/pcre/pcre-xxx
   95 /source/pcre/pcre-xxx/configure
   96 
   97 PCRE is written in C and is normally compiled as a C library. However, it is
   98 possible to build it as a C++ library, though the provided building apparatus
   99 does not have any features to support this.
  100 
  101 There are some optional features that can be included or omitted from the PCRE
  102 library. You can read more about them in the pcrebuild man page.
  103 
  104 . If you want to suppress the building of the C++ wrapper library, you can add
  105   --disable-cpp to the "configure" command. Otherwise, when "configure" is run,
  106   it will try to find a C++ compiler and C++ header files, and if it succeeds,
  107   it will try to build the C++ wrapper.
  108 
  109 . If you want to make use of the support for UTF-8 character strings in PCRE,
  110   you must add --enable-utf8 to the "configure" command. Without it, the code
  111   for handling UTF-8 is not included in the library. (Even when included, it
  112   still has to be enabled by an option at run time.)
  113 
  114 . If, in addition to support for UTF-8 character strings, you want to include
  115   support for the \P, \p, and \X sequences that recognize Unicode character
  116   properties, you must add --enable-unicode-properties to the "configure"
  117   command. This adds about 30K to the size of the library (in the form of a
  118   property table); only the basic two-letter properties such as Lu are
  119   supported.
  120 
  121 . You can build PCRE to recognize either CR or LF or the sequence CRLF or any
  122   of the Unicode newline sequences as indicating the end of a line. Whatever
  123   you specify at build time is the default; the caller of PCRE can change the
  124   selection at run time. The default newline indicator is a single LF character
  125   (the Unix standard). You can specify the default newline indicator by adding
  126   --newline-is-cr or --newline-is-lf or --newline-is-crlf or --newline-is-any
  127   to the "configure" command, respectively.
  128 
  129 . When called via the POSIX interface, PCRE uses malloc() to get additional
  130   storage for processing capturing parentheses if there are more than 10 of
  131   them. You can increase this threshold by setting, for example,
  132 
  133   --with-posix-malloc-threshold=20
  134 
  135   on the "configure" command.
  136 
  137 . PCRE has a counter that can be set to limit the amount of resources it uses.
  138   If the limit is exceeded during a match, the match fails. The default is ten
  139   million. You can change the default by setting, for example,
  140 
  141   --with-match-limit=500000
  142 
  143   on the "configure" command. This is just the default; individual calls to
  144   pcre_exec() can supply their own value. There is discussion on the pcreapi
  145   man page.
  146 
  147 . There is a separate counter that limits the depth of recursive function calls
  148   during a matching process. This also has a default of ten million, which is
  149   essentially "unlimited". You can change the default by setting, for example,
  150 
  151   --with-match-limit-recursion=500000
  152 
  153   Recursive function calls use up the runtime stack; running out of stack can
  154   cause programs to crash in strange ways. There is a discussion about stack
  155   sizes in the pcrestack man page.
  156 
  157 . The default maximum compiled pattern size is around 64K. You can increase
  158   this by adding --with-link-size=3 to the "configure" command. You can
  159   increase it even more by setting --with-link-size=4, but this is unlikely
  160   ever to be necessary. If you build PCRE with an increased link size, test 2
  161   (and 5 if you are using UTF-8) will fail. Part of the output of these tests
  162   is a representation of the compiled pattern, and this changes with the link
  163   size.
  164 
  165 . You can build PCRE so that its internal match() function that is called from
  166   pcre_exec() does not call itself recursively. Instead, it uses blocks of data
  167   from the heap via special functions pcre_stack_malloc() and pcre_stack_free()
  168   to save data that would otherwise be saved on the stack. To build PCRE like
  169   this, use
  170 
  171   --disable-stack-for-recursion
  172 
  173   on the "configure" command. PCRE runs more slowly in this mode, but it may be
  174   necessary in environments with limited stack sizes. This applies only to the
  175   pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
  176   use deeply nested recursion.
  177 
  178 The "configure" script builds the following files for the basic C library:
  179 
  180 . Makefile is the makefile that builds the library
  181 . config.h contains build-time configuration options for the library
  182 . pcre-config is a script that shows the settings of "configure" options
  183 . libpcre.pc is data for the pkg-config command
  184 . libtool is a script that builds shared and/or static libraries
  185 . RunTest is a script for running tests on the library
  186 . RunGrepTest is a script for running tests on the pcregrep command
  187 
  188 In addition, if a C++ compiler is found, the following are also built:
  189 
  190 . pcrecpp.h is the header file for programs that call PCRE via the C++ wrapper
  191 . pcre_stringpiece.h is the header for the C++ "stringpiece" functions
  192 
  193 The "configure" script also creates config.status, which is an executable
  194 script that can be run to recreate the configuration, and config.log, which
  195 contains compiler output from tests that "configure" runs.
  196 
  197 Once "configure" has run, you can run "make". It builds two libraries, called
  198 libpcre and libpcreposix, a test program called pcretest, and the pcregrep
  199 command. If a C++ compiler was found on your system, it also builds the C++
  200 wrapper library, which is called libpcrecpp, and some test programs called
  201 pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.
  202 
  203 The command "make test" runs all the appropriate tests. Details of the PCRE
  204 tests are given in a separate section of this document, below.
  205 
  206 You can use "make install" to copy the libraries, the public header files
  207 pcre.h, pcreposix.h, pcrecpp.h, and pcre_stringpiece.h (the last two only if
  208 the C++ wrapper was built), and the man pages to appropriate live directories
  209 on your system, in the normal way.
  210 
  211 If you want to remove PCRE from your system, you can run "make uninstall".
  212 This removes all the files that "make install" installed. However, it does not
  213 remove any directories, because these are often shared with other programs.
  214 
  215 
  216 Retrieving configuration information on Unix-like systems
  217 ---------------------------------------------------------
  218 
  219 Running "make install" also installs the command pcre-config, which can be used
  220 to recall information about the PCRE configuration and installation. For
  221 example:
  222 
  223   pcre-config --version
  224 
  225 prints the version number, and
  226 
  227   pcre-config --libs
  228 
  229 outputs information about where the library is installed. This command can be
  230 included in makefiles for programs that use PCRE, saving the programmer from
  231 having to remember too many details.
  232 
  233 The pkg-config command is another system for saving and retrieving information
  234 about installed libraries. Instead of separate commands for each library, a
  235 single command is used. For example:
  236 
  237   pkg-config --cflags pcre
  238 
  239 The data is held in *.pc files that are installed in a directory called
  240 pkgconfig.
  241 
  242 
  243 Shared libraries on Unix-like systems
  244 -------------------------------------
  245 
  246 The default distribution builds PCRE as shared libraries and static libraries,
  247 as long as the operating system supports shared libraries. Shared library
  248 support relies on the "libtool" script which is built as part of the
  249 "configure" process.
  250 
  251 The libtool script is used to compile and link both shared and static
  252 libraries. They are placed in a subdirectory called .libs when they are newly
  253 built. The programs pcretest and pcregrep are built to use these uninstalled
  254 libraries (by means of wrapper scripts in the case of shared libraries). When
  255 you use "make install" to install shared libraries, pcregrep and pcretest are
  256 automatically re-built to use the newly installed shared libraries before being
  257 installed themselves. However, the versions left in the source directory still
  258 use the uninstalled libraries.
  259 
  260 To build PCRE using static libraries only you must use --disable-shared when
  261 configuring it. For example:
  262 
  263 ./configure --prefix=/usr/gnu --disable-shared
  264 
  265 Then run "make" in the usual way. Similarly, you can use --disable-static to
  266 build only shared libraries.
  267 
  268 
  269 Cross-compiling on a Unix-like system
  270 -------------------------------------
  271 
  272 You can specify CC and CFLAGS in the normal way to the "configure" command, in
  273 order to cross-compile PCRE for some other host. However, during the building
  274 process, the dftables.c source file is compiled *and run* on the local host, in
  275 order to generate the default character tables (the chartables.c file). It
  276 therefore needs to be compiled with the local compiler, not the cross compiler.
  277 You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD;
  278 there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper)
  279 when calling the "configure" command. If they are not specified, they default
  280 to the values of CC and CFLAGS.
  281 
  282 
  283 Using HP's ANSI C++ compiler (aCC)
  284 ----------------------------------
  285 
  286 Unless C++ support is disabled by specifying the "--disable-cpp" option of the
  287 "configure" script, you *must* include the "-AA" option in the CXXFLAGS
  288 environment variable in order for the C++ components to compile correctly.
  289 
  290 Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
  291 needed libraries fail to get included when specifying the "-AA" compiler
  292 option. If you experience unresolved symbols when linking the C++ programs,
  293 use the workaround of specifying the following environment variable prior to
  294 running the "configure" script:
  295 
  296   CXXLDFLAGS="-lstd_v2 -lCsup_v2"
  297 
  298 
  299 Building on non-Unix systems
  300 ----------------------------
  301 
  302 For a non-Unix system, read the comments in the file NON-UNIX-USE, though if
  303 the system supports the use of "configure" and "make" you may be able to build
  304 PCRE in the same way as for Unix systems.
  305 
  306 PCRE has been compiled on Windows systems and on Macintoshes, but I don't know
  307 the details because I don't use those systems. It should be straightforward to
  308 build PCRE on any system that has a Standard C compiler and library, because it
  309 uses only Standard C functions.
  310 
  311 
  312 Testing PCRE
  313 ------------
  314 
  315 To test PCRE on a Unix system, run the RunTest script that is created by the
  316 configuring process. There is also a script called RunGrepTest that tests the
  317 options of the pcregrep command. If the C++ wrapper library is built, three
  318 test programs called pcrecpp_unittest, pcre_scanner_unittest, and
  319 pcre_stringpiece_unittest are provided.
  320 
  321 Both the scripts and all the program tests are run if you use "make runtest",
  322 "make check", or "make test". For other systems, see the instructions in
  323 NON-UNIX-USE.
  324 
  325 The RunTest script runs the pcretest test program (which is documented in its
  326 own man page) on each of the testinput files (in the testdata directory) in
  327 turn, and compares the output with the contents of the corresponding testoutput
  328 files. A file called testtry is used to hold the main output from pcretest
  329 (testsavedregex is also used as a working file). To run pcretest on just one of
  330 the test files, give its number as an argument to RunTest, for example:
  331 
  332   RunTest 2
  333 
  334 The first test file can also be fed directly into the perltest script to check
  335 that Perl gives the same results. The only difference you should see is in the
  336 first few lines, where the Perl version is given instead of the PCRE version.
  337 
  338 The second set of tests checks pcre_fullinfo(), pcre_info(), pcre_study(),
  339 pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
  340 detection, and run-time flags that are specific to PCRE, as well as the POSIX
  341 wrapper API. It also uses the debugging flag to check some of the internals of
  342 pcre_compile().
  343 
  344 If you build PCRE with a locale setting that is not the standard C locale, the
  345 character tables may be different (see next paragraph). In some cases, this may
  346 cause failures in the second set of tests. For example, in a locale where the
  347 isprint() function yields TRUE for characters in the range 128-255, the use of
  348 [:isascii:] inside a character class defines a different set of characters, and
  349 this shows up in this test as a difference in the compiled code, which is being
  350 listed for checking. Where the expected output contains [\x00-\x7f], the actual
  351 test output will contain [\x00-\xff], and similarly in some other cases. This is
  352 not a bug in PCRE.
  353 
  354 The third set of tests checks pcre_maketables(), the facility for building a
  355 set of character tables for a specific locale and using them instead of the
  356 default tables. The tests make use of the "fr_FR" (French) locale. Before
  357 running the test, the script checks for the presence of this locale by running
  358 the "locale" command. If that command fails, or if it doesn't include "fr_FR"
  359 in the list of available locales, the third test cannot be run, and a comment
  360 is output to say why. If running this test produces instances of the error
  361 
  362   ** Failed to set locale "fr_FR"
  363 
  364 in the comparison output, it means that locale is not available on your system,
  365 despite being listed by "locale". This does not mean that PCRE is broken.
  366 
  367 The fourth test checks the UTF-8 support. It is not run automatically unless
  368 PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
  369 running "configure". This file can also be fed directly to the perltest script,
  370 provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
  371 commented in the script, can be used.)
  372 
  373 The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
  374 features of PCRE that are not relevant to Perl.
  375 
  376 The sixth test checks the support for Unicode character properties. It is
  377 not run automatically unless PCRE is built with Unicode property support. To do
  378 this you must set --enable-unicode-properties when running "configure".
  379 
  380 The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
  381 matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
  382 property support, respectively. The eighth and ninth tests are not run
  383 automatically unless PCRE is built with the relevant support.
  384 
  385 
  386 Character tables
  387 ----------------
  388 
  389 PCRE uses four tables for manipulating and identifying characters whose values
  390 are less than 256. The final argument of the pcre_compile() function is a
  391 pointer to a block of memory containing the concatenated tables. A call to
  392 pcre_maketables() can be used to generate a set of tables in the current
  393 locale. If the final argument for pcre_compile() is passed as NULL, a set of
  394 default tables that is built into the binary is used.
  395 
  396 The source file called chartables.c contains the default set of tables. This is
  397 not supplied in the distribution, but is built by the program dftables
  398 (compiled from dftables.c), which uses the ANSI C character handling functions
  399 such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
  400 sources. This means that the default C locale which is set for your system will
  401 control the contents of these default tables. You can change the default tables
  402 by editing chartables.c and then re-building PCRE. If you do this, you should
  403 probably also edit Makefile to ensure that the file doesn't ever get
  404 re-generated.
  405 
  406 The first two 256-byte tables provide lower casing and case flipping functions,
  407 respectively. The next table consists of three 32-byte bit maps which identify
  408 digits, "word" characters, and white space, respectively. These are used when
  409 building 32-byte bit maps that represent character classes.
  410 
  411 The final 256-byte table has bits indicating various character types, as
  412 follows:
  413 
  414     1   white space character
  415     2   letter
  416     4   decimal digit
  417     8   hexadecimal digit
  418    16   alphanumeric or '_'
  419   128   regular expression metacharacter or binary zero
  420 
  421 You should not alter the set of characters that contain the 128 bit, as that
  422 will cause PCRE to malfunction.
  423 
  424 
  425 Manifest
  426 --------
  427 
  428 The distribution should contain the following files:
  429 
  430 (A) The actual source files of the PCRE library functions and their
  431     headers:
  432 
  433   dftables.c            auxiliary program for building chartables.c
  434 
  435   pcreposix.c           )
  436   pcre_compile.c        )
  437   pcre_config.c         )
  438   pcre_dfa_exec.c       )
  439   pcre_exec.c           )
  440   pcre_fullinfo.c       )
  441   pcre_get.c            ) sources for the functions in the library,
  442   pcre_globals.c        )   and some internal functions that they use
  443   pcre_info.c           )
  444   pcre_maketables.c     )
  445   pcre_newline.c        )
  446   pcre_ord2utf8.c       )
  447   pcre_refcount.c       )
  448   pcre_study.c          )
  449   pcre_tables.c         )
  450   pcre_try_flipped.c    )
  451   pcre_ucp_searchfuncs.c)
  452   pcre_valid_utf8.c     )
  453   pcre_version.c        )
  454   pcre_xclass.c         )
  455   ucptable.c            )
  456 
  457   pcre_printint.src     ) debugging function that is #included in pcretest, and
  458                         )   can also be #included in pcre_compile()
  459 
  460   pcre.h                the public PCRE header file
  461   pcreposix.h           header for the external POSIX wrapper API
  462   pcre_internal.h       header for internal use
  463   ucp.h                 ) headers concerned with
  464   ucpinternal.h         )   Unicode property handling
  465   config.in             template for config.h, which is built by configure
  466 
  467   pcrecpp.h             the header file for the C++ wrapper
  468   pcrecpparg.h.in       "source" for another C++ header file
  469   pcrecpp.cc            )
  470   pcre_scanner.cc       ) source for the C++ wrapper library
  471 
  472   pcre_stringpiece.h.in "source" for pcre_stringpiece.h, the header for the
  473                           C++ stringpiece functions
  474   pcre_stringpiece.cc   source for the C++ stringpiece functions
  475 
  476 (B) Auxiliary files:
  477 
  478   AUTHORS               information about the author of PCRE
  479   ChangeLog             log of changes to the code
  480   INSTALL               generic installation instructions
  481   LICENCE               conditions for the use of PCRE
  482   COPYING               the same, using GNU's standard name
  483   Makefile.in           template for Unix Makefile, which is built by configure
  484   NEWS                  important changes in this release
  485   NON-UNIX-USE          notes on building PCRE on non-Unix systems
  486   README                this file
  487   RunTest.in            template for a Unix shell script for running tests
  488   RunGrepTest.in        template for a Unix shell script for pcregrep tests
  489   config.guess          ) files used by libtool,
  490   config.sub            )   used only when building a shared library
  491   config.h.in           "source" for the config.h header file
  492   configure             a configuring shell script (built by autoconf)
  493   configure.ac          the autoconf input used to build configure
  494   doc/Tech.Notes        notes on the encoding
  495   doc/*.3               man page sources for the PCRE functions
  496   doc/*.1               man page sources for pcregrep and pcretest
  497   doc/html/*            HTML documentation
  498   doc/pcre.txt          plain text version of the man pages
  499   doc/pcretest.txt      plain text documentation of test program
  500   doc/perltest.txt      plain text documentation of Perl test program
  501   install-sh            a shell script for installing files
  502   libpcre.pc.in         "source" for libpcre.pc for pkg-config
  503   ltmain.sh             file used to build a libtool script
  504   mkinstalldirs         script for making install directories
  505   pcretest.c            comprehensive test program
  506   pcredemo.c            simple demonstration of coding calls to PCRE
  507   perltest              Perl test program
  508   pcregrep.c            source of a grep utility that uses PCRE
  509   pcre-config.in        source of script which retains PCRE information
  510   pcrecpp_unittest.c           )
  511   pcre_scanner_unittest.c      ) test programs for the C++ wrapper
  512   pcre_stringpiece_unittest.c  )
  513   testdata/testinput*   test data for main library tests
  514   testdata/testoutput*  expected test results
  515   testdata/grep*        input and output for pcregrep tests
  516 
  517 (C) Auxiliary files for Win32 DLL
  518 
  519   libpcre.def
  520   libpcreposix.def
  521 
  522 (D) Auxiliary file for VPASCAL
  523 
  524   makevp.bat
  525 
  526 Philip Hazel
  527 Email local part: ph10
  528 Email domain: cam.ac.uk
  529 November 2006