Member "pcre2-10.36/doc/html/README.txt" (4 Dec 2020, 43112 Bytes) of package /linux/misc/pcre2-10.36.tar.bz2:

    1 README file for PCRE2 (Perl-compatible regular expression library)
    2 ------------------------------------------------------------------
    4 PCRE2 is a re-working of the original PCRE1 library to provide an entirely new
    5 API. Since its initial release in 2015, there has been further development of
    6 the code and it now differs from PCRE1 in more than just the API. There are new
    7 features and the internals have been improved. The latest release of PCRE2 is
    8 available in three alternative formats from:
   10 https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.gz
   11 https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.bz2
   12 https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.zip
   14 There is a mailing list for discussion about the development of PCRE (both the
   15 original and new APIs) at pcre-dev@exim.org. You can access the archives and
   16 subscribe or manage your subscription here:
   18    https://lists.exim.org/mailman/listinfo/pcre-dev
   20 Please read the NEWS file if you are upgrading from a previous release. The
   21 contents of this README file are:
   23   The PCRE2 APIs
   24   Documentation for PCRE2
   25   Contributions by users of PCRE2
   26   Building PCRE2 on non-Unix-like systems
   27   Building PCRE2 without using autotools
   28   Building PCRE2 using autotools
   29   Retrieving configuration information
   30   Shared libraries
   31   Cross-compiling using autotools
   32   Making new tarballs
   33   Testing PCRE2
   34   Character tables
   35   File manifest
   38 The PCRE2 APIs
   39 --------------
   41 PCRE2 is written in C, and it has its own API. There are three sets of
   42 functions, one for the 8-bit library, which processes strings of bytes, one for
   43 the 16-bit library, which processes strings of 16-bit values, and one for the
   44 32-bit library, which processes strings of 32-bit values. Unlike PCRE1, there
   45 are no C++ wrappers.
   47 The distribution does contain a set of C wrapper functions for the 8-bit
   48 library that are based on the POSIX regular expression API (see the pcre2posix
   49 man page). These are built into a library called libpcre2-posix. Note that this
   50 just provides a POSIX calling interface to PCRE2; the regular expressions
   51 themselves still follow Perl syntax and semantics. The POSIX API is restricted,
   52 and does not give full access to all of PCRE2's facilities.
   54 The header file for the POSIX-style functions is called pcre2posix.h. The
   55 official POSIX name is regex.h, but I did not want to risk possible problems
   56 with existing files of that name by distributing it that way. To use PCRE2 with
   57 an existing program that uses the POSIX API, pcre2posix.h will have to be
   58 renamed or pointed at by a link (or the program modified, of course). See the
   59 pcre2posix documentation for more details.
   62 Documentation for PCRE2
   63 -----------------------
   65 If you install PCRE2 in the normal way on a Unix-like system, you will end up
   66 with a set of man pages whose names all start with "pcre2". The one that is
   67 just called "pcre2" lists all the others. In addition to these man pages, the
   68 PCRE2 documentation is supplied in two other forms:
   70   1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
   71      doc/pcre2test.txt in the source distribution. The first of these is a
   72      concatenation of the text forms of all the section 3 man pages except the
   73      listing of pcre2demo.c and those that summarize individual functions. The
   74      other two are the text forms of the section 1 man pages for the pcre2grep
   75      and pcre2test commands. These text forms are provided for ease of scanning
   76      with text editors or similar tools. They are installed in
   77      <prefix>/share/doc/pcre2, where <prefix> is the installation prefix
   78      (defaulting to /usr/local).
   80   2. A set of files containing all the documentation in HTML form, hyperlinked
   81      in various ways, and rooted in a file called index.html, is distributed in
   82      doc/html and installed in <prefix>/share/doc/pcre2/html.
   85 Building PCRE2 on non-Unix-like systems
   86 ---------------------------------------
   88 For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
   89 your system supports the use of "configure" and "make" you may be able to build
   90 PCRE2 using autotools in the same way as for many Unix-like systems.
   92 PCRE2 can also be configured using CMake, which can be run in various ways
   93 (command line, GUI, etc). This creates Makefiles, solution files, etc. The file
   94 NON-AUTOTOOLS-BUILD has information about CMake.
   96 PCRE2 has been compiled on many different operating systems. It should be
   97 straightforward to build PCRE2 on any system that has a Standard C compiler and
   98 library, because it uses only Standard C functions.
  101 Building PCRE2 without using autotools
  102 --------------------------------------
  104 The use of autotools (in particular, libtool) is problematic in some
  105 environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
  106 file for ways of building PCRE2 without using autotools.
  109 Building PCRE2 using autotools
  110 ------------------------------
  112 The following instructions assume the use of the widely used "configure; make;
  113 make install" (autotools) process.
  115 To build PCRE2 on a system that supports autotools, first run the "configure"
  116 command from the PCRE2 distribution directory, with your current directory set
  117 to the directory where you want the files to be created. This command is a
  118 standard GNU "autoconf" configuration script, for which generic instructions
  119 are supplied in the file INSTALL.
  121 Most commonly, people build PCRE2 within its own distribution directory, and in
  122 this case, on many systems, just running "./configure" is sufficient. However,
  123 the usual methods of changing standard defaults are available. For example:
  125 CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
  127 This command specifies that the C compiler should be run with the flags '-O2
  128 -Wall' instead of the default, and that "make install" should install PCRE2
  129 under /opt/local instead of the default /usr/local.
  131 If you want to build in a different directory, just run "configure" with that
  132 directory as current. For example, suppose you have unpacked the PCRE2 source
  133 into /source/pcre2/pcre2-xxx, but you want to build it in
  134 /build/pcre2/pcre2-xxx:
  136 cd /build/pcre2/pcre2-xxx
  137 /source/pcre2/pcre2-xxx/configure
  139 PCRE2 is written in C and is normally compiled as a C library. However, it is
  140 possible to build it as a C++ library, though the provided building apparatus
  141 does not have any features to support this.
  143 There are some optional features that can be included or omitted from the PCRE2
  144 library. They are also documented in the pcre2build man page.
  146 . By default, both shared and static libraries are built. You can change this
  147   by adding one of these options to the "configure" command:
  149   --disable-shared
  150   --disable-static
  152   (See also "Shared libraries" below.)
  154 . By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
  155   the "configure" command, the 16-bit library is also built. If you add
  156   --enable-pcre2-32 to the "configure" command, the 32-bit library is also
  157   built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
  158   to disable building the 8-bit library.
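As a rough illustration of how these width options combine, the following shell sketch turns a list of wanted code unit widths into "configure" options. The helper name width_options is hypothetical and not part of PCRE2; only the option names are taken from the text above.

```shell
# Hypothetical helper: print the "configure" options needed to build exactly
# the requested code unit widths (any of 8, 16, 32).
width_options() {
  want8=no; opts=""
  for w in "$@"; do
    case "$w" in
      8)  want8=yes ;;
      16) opts="${opts:+$opts }--enable-pcre2-16" ;;
      32) opts="${opts:+$opts }--enable-pcre2-32" ;;
      *)  echo "unknown width: $w" >&2; return 1 ;;
    esac
  done
  # The 8-bit library is built by default, so leaving it out needs an
  # explicit --disable-pcre2-8.
  if [ "$want8" = no ]; then
    opts="${opts:+$opts }--disable-pcre2-8"
  fi
  echo "$opts"
}

width_options 8 16   # prints: --enable-pcre2-16
width_options 32     # prints: --enable-pcre2-32 --disable-pcre2-8
```

The output can be passed straight to "configure", for example ./configure $(width_options 8 16).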
  160 . If you want to include support for just-in-time (JIT) compiling, which can
  161   give large performance improvements on certain platforms, add --enable-jit to
  162   the "configure" command. This support is available only for certain hardware
  163   architectures. If you try to enable it on an unsupported architecture, there
  164   will be a compile time error. If in doubt, use --enable-jit=auto, which
  165   enables JIT only if the current hardware is supported.
  167 . If you are enabling JIT under an SELinux environment, you may also want to
  168   add --enable-jit-sealloc, which enables the use of an executable memory
  169   allocator that is compatible with SELinux. Warning: this allocator is
  170   experimental! It does not support the fork() operation and may crash when
  171   no disk space is available. This option has no effect if JIT is disabled.
  173 . If you do not want to make use of the default support for UTF-8 Unicode
  174   character strings in the 8-bit library, UTF-16 Unicode character strings in
  175   the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
  176   library, you can add --disable-unicode to the "configure" command. This
  177   reduces the size of the libraries. It is not possible to configure one
  178   library with Unicode support, and another without, in the same configuration.
  179   It is also not possible to use --enable-ebcdic (see below) with Unicode
  180   support, so if this option is set, you must also use --disable-unicode.
  182   When Unicode support is available, the use of a UTF encoding still has to be
  183   enabled by setting the PCRE2_UTF option at run time or starting a pattern
  184   with (*UTF). When PCRE2 is compiled with Unicode support, its input can be
  185   only ASCII or UTF-8/16/32, even when running on EBCDIC platforms.
  187   As well as supporting UTF strings, Unicode support includes support for the
  188   \P, \p, and \X sequences that recognize Unicode character properties.
  189   However, only a subset of properties is supported (see pcre2pattern).
  190   Escape sequences such as \d and \w in patterns do not by default make use of
  191   Unicode properties, but can be made to do so by setting the PCRE2_UCP option
  192   or starting a pattern with (*UCP).
  194 . You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
  195   of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
  196   character as indicating the end of a line. Whatever you specify at build time
  197   is the default; the caller of PCRE2 can change the selection at run time. The
  198   default newline indicator is a single LF character (the Unix standard). You
  199   can specify the default newline indicator by adding --enable-newline-is-cr,
  200   --enable-newline-is-lf, --enable-newline-is-crlf,
  201   --enable-newline-is-anycrlf, --enable-newline-is-any, or
  202   --enable-newline-is-nul to the "configure" command, respectively.
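The option names follow a regular pattern, which the following sketch makes explicit. The helper name newline_option is hypothetical; the six convention names are the ones listed above.

```shell
# Hypothetical helper: map a newline convention name to its "configure"
# option, rejecting anything that is not one of the six documented names.
newline_option() {
  case "$1" in
    cr|lf|crlf|anycrlf|any|nul) echo "--enable-newline-is-$1" ;;
    *) echo "unknown newline convention: $1" >&2; return 1 ;;
  esac
}

newline_option anycrlf   # prints: --enable-newline-is-anycrlf
```

Remember that this sets only the build-time default; callers can still choose a different convention at run time.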
  204 . By default, the sequence \R in a pattern matches any Unicode line ending
  205   sequence. This is independent of the option specifying what PCRE2 considers
  206   to be the end of a line (see above). However, the caller of PCRE2 can
  207   restrict \R to match only CR, LF, or CRLF. You can make this the default by
  208   adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").
  210 . In a pattern, the escape sequence \C matches a single code unit, even in a
  211   UTF mode. This can be dangerous because it breaks up multi-code-unit
  212   characters. You can build PCRE2 with the use of \C permanently locked out by
  213   adding --enable-never-backslash-C (note the upper case C) to the "configure"
  214   command. When \C is allowed by the library, individual applications can lock
  215   it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.
  217 . PCRE2 has a counter that limits the depth of nesting of parentheses in a
  218   pattern. This limits the amount of system stack that a pattern uses when it
  219   is compiled. The default is 250, but you can change it by setting, for
  220   example,
  222   --with-parens-nest-limit=500
  224 . PCRE2 has a counter that can be set to limit the amount of computing resource
  225   it uses when matching a pattern. If the limit is exceeded during a match, the
  226   match fails. The default is ten million. You can change the default by
  227   setting, for example,
  229   --with-match-limit=500000
  231   on the "configure" command. This is just the default; individual calls to
  232   pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
  233   discussion in the pcre2api man page (search for pcre2_set_match_limit).
  235 . There is a separate counter that limits the depth of nested backtracking
  236   (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
  237   matching process, which indirectly limits the amount of heap memory that is
  238   used, and in the case of pcre2_dfa_match() the amount of stack as well. This
  239   counter also has a default of ten million, which is essentially "unlimited".
  240   You can change the default by setting, for example,
  242   --with-match-limit-depth=5000
  244   There is more discussion in the pcre2api man page (search for
  245   pcre2_set_depth_limit).
  247 . You can also set an explicit limit on the amount of heap memory used by
  248   the pcre2_match() and pcre2_dfa_match() interpreters:
  250   --with-heap-limit=500
  252   The units are kibibytes (units of 1024 bytes). This limit does not apply when
  253   the JIT optimization (which has its own memory control features) is used.
  254   There is more discussion on the pcre2api man page (search for
  255   pcre2_set_heap_limit).
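Because the unit is kibibytes rather than bytes, it can help to check the arithmetic. A minimal sketch, using a hypothetical helper name:

```shell
# Hypothetical helper: convert a --with-heap-limit value (kibibytes) to bytes.
heap_limit_bytes() {
  echo $(( $1 * 1024 ))
}

heap_limit_bytes 500   # prints: 512000
```

So --with-heap-limit=500 allows the interpreters roughly half a megabyte of heap by default.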
  257 . In the 8-bit library, the default maximum compiled pattern size is around
  258   64 kibibytes. You can increase this by adding --with-link-size=3 to the
  259   "configure" command. PCRE2 then uses three bytes instead of two for offsets
  260   to different parts of the compiled pattern. In the 16-bit library,
  261   --with-link-size=3 is the same as --with-link-size=4, which (in both
  262   libraries) uses four-byte offsets. Increasing the internal link size reduces
  263   performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
  264   link size setting is ignored, as 4-byte offsets are always used.
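The relationship between link size and pattern size in the 8-bit library can be sketched with simple arithmetic: an offset of n bytes can distinguish 2^(8n) positions. The helper name below is hypothetical, and the results are upper bounds on what an offset can address, not exact pattern size limits.

```shell
# Hypothetical helper: number of distinct values an offset of $1 bytes can hold.
link_offset_range() {
  echo $(( 1 << (8 * $1) ))
}

link_offset_range 2   # prints: 65536     (the "around 64 kibibytes" default)
link_offset_range 3   # prints: 16777216  (16 mebibytes)
```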
  266 . For speed, PCRE2 uses four tables for manipulating and identifying characters
  267   whose code point values are less than 256. By default, it uses a set of
  268   tables for ASCII encoding that is part of the distribution. If you specify
  270   --enable-rebuild-chartables
  272   a program called pcre2_dftables is compiled and run in the default C locale
  273   when you obey "make". It builds a source file called pcre2_chartables.c. If
  274   you do not specify this option, pcre2_chartables.c is created as a copy of
  275   pcre2_chartables.c.dist. See "Character tables" below for further
  276   information.
  278 . It is possible to compile PCRE2 for use on systems that use EBCDIC as their
  279   character code (as opposed to ASCII/Unicode) by specifying
  281   --enable-ebcdic --disable-unicode
  283   This automatically implies --enable-rebuild-chartables (see above). However,
  284   when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
  285   both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
  286   which specifies that the code value for the EBCDIC NL character is 0x25
  287   instead of the default 0x15.
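Since --enable-ebcdic cannot be combined with Unicode support, a build script may want to check its option list before calling "configure". The following is only a sketch of such a check; the function name is hypothetical and "configure" itself remains the authority on which combinations are valid.

```shell
# Hypothetical pre-flight check for a list of "configure" options: reject
# --enable-ebcdic unless --disable-unicode is also present.
check_ebcdic_options() {
  ebcdic=no; unicode=yes
  for opt in "$@"; do
    case "$opt" in
      --enable-ebcdic)   ebcdic=yes ;;
      --disable-unicode) unicode=no ;;
    esac
  done
  if [ "$ebcdic" = yes ] && [ "$unicode" = yes ]; then
    echo "--enable-ebcdic requires --disable-unicode" >&2
    return 1
  fi
  return 0
}

check_ebcdic_options --enable-ebcdic --disable-unicode && echo "options look consistent"
```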
  289 . If you specify --enable-debug, additional debugging code is included in the
  290   build. This option is intended for use by the PCRE2 maintainers.
  292 . In environments where valgrind is installed, if you specify
  294   --enable-valgrind
  296   PCRE2 will use valgrind annotations to mark certain memory regions as
  297   unaddressable. This allows it to detect invalid memory accesses, and is
  298   mostly useful for debugging PCRE2 itself.
  300 . In environments where the gcc compiler is used and lcov is installed, if you
  301   specify
  303   --enable-coverage
  305   the build process implements a code coverage report for the test suite. The
  306   report is generated by running "make coverage". If ccache is installed on
  307   your system, it must be disabled when building PCRE2 for coverage reporting.
  308   You can do this by setting the environment variable CCACHE_DISABLE=1 before
  309   running "make" to build PCRE2. There is more information about coverage
  310   reporting in the "pcre2build" documentation.
  312 . When JIT support is enabled, pcre2grep automatically makes use of it, unless
  313   you add --disable-pcre2grep-jit to the "configure" command.
  315 . There is support for calling external programs during matching in the
  316   pcre2grep command, using PCRE2's callout facility with string arguments. This
  317   support can be disabled by adding --disable-pcre2grep-callout to the
  318   "configure" command. There are two kinds of callout: one that generates
  319   output from inbuilt code, and another that calls an external program. The
  320   latter has special support for Windows and VMS; otherwise it assumes the
  321   existence of the fork() function. This facility can be disabled by adding
  322   --disable-pcre2grep-callout-fork to the "configure" command.
  324 . The pcre2grep program currently supports only 8-bit data files, and so
  325   requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
  326   libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
  327   specifying one or both of
  329   --enable-pcre2grep-libz
  330   --enable-pcre2grep-libbz2
  332   Of course, the relevant libraries must be installed on your system.
  334 . The default starting size (in bytes) of the internal buffer used by pcre2grep
  335   can be set by, for example:
  337   --with-pcre2grep-bufsize=51200
  339   The value must be a plain integer. The default is 20480. The amount of memory
  340   used by pcre2grep is actually three times this number, to allow for "before"
  341   and "after" lines. If very long lines are encountered, the buffer is
  342   automatically enlarged, up to a fixed maximum size.
  344 . The default maximum size of pcre2grep's internal buffer can be set by, for
  345   example:
  347   --with-pcre2grep-max-bufsize=2097152
  349   The default is either 1048576 or the value of --with-pcre2grep-bufsize,
  350   whichever is the larger.
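The two buffer rules above (memory use is three times the starting size, and the default maximum is the larger of 1048576 and the starting size) can be sketched as follows. Both helper names are hypothetical.

```shell
# Hypothetical helper: memory initially used by pcre2grep for a given
# starting buffer size, allowing for "before" and "after" lines.
pcre2grep_memory() {
  echo $(( 3 * $1 ))
}

# Hypothetical helper: default maximum buffer size for a given starting size.
pcre2grep_default_max() {
  if [ "$1" -gt 1048576 ]; then echo "$1"; else echo 1048576; fi
}

pcre2grep_memory 20480          # prints: 61440
pcre2grep_memory 51200          # prints: 153600
pcre2grep_default_max 51200     # prints: 1048576
pcre2grep_default_max 2097152   # prints: 2097152
```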
  352 . It is possible to compile pcre2test so that it links with the libreadline
  353   or libedit libraries, by specifying, respectively,
  355   --enable-pcre2test-libreadline or --enable-pcre2test-libedit
  357   If this is done, when pcre2test's input is from a terminal, it reads it using
  358   the readline() function. This provides line-editing and history facilities.
  359   Note that libreadline is GPL-licenced, so if you distribute a binary of
  360   pcre2test linked in this way, there may be licensing issues. These can be
  361   avoided by linking with libedit (which has a BSD licence) instead.
  363   Enabling libreadline causes the -lreadline option to be added to the
  364   pcre2test build. In many operating environments with a system-installed
  365   readline library this is sufficient. However, in some environments (e.g. if
  366   an unmodified distribution version of readline is in use), it may be
  367   necessary to specify something like LIBS="-lncurses" as well. This is
  368   because, to quote the readline INSTALL, "Readline uses the termcap functions,
  369   but does not link with the termcap or curses library itself, allowing
  370   applications which link with readline the to choose an appropriate library."
  371   If you get error messages about missing functions tgetstr, tgetent, tputs,
  372   tgetflag, or tgoto, this is the problem, and linking with the ncurses library
  373   should fix it.
  375 . The C99 standard defines formatting modifiers z and t for size_t and
  376   ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in
  377   environments other than Microsoft Visual Studio when __STDC_VERSION__ is
  378   defined and has a value greater than or equal to 199901L (indicating C99).
  379   However, there is at least one environment that claims to be C99 but does not
  380   support these modifiers. If --disable-percent-zt is specified, no use is made
  381   of the z or t modifiers. Instead of %td or %zu, %lu is used, with a cast for
  382   size_t values.
  384 . There is a special option called --enable-fuzz-support for use by people who
  385   want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
  386   library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
  387   be built, but not installed. This contains a single function called
  388   LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
  389   length of the string. When called, this function tries to compile the string
  390   as a pattern, and if that succeeds, to match it. This is done both with no
  391   options and with some random option bits that are generated from the string.
  392   Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
  393   be created. This is normally run under valgrind or used when PCRE2 is
  394   compiled with address sanitizing enabled. It calls the fuzzing function and
  395   outputs information about what it is doing. The input strings are specified by
  396   arguments: if an argument starts with "=" the rest of it is a literal input
  397   string. Otherwise, it is assumed to be a file name, and the contents of the
  398   file are the test string.
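The argument convention described for pcre2fuzzcheck can be illustrated with a short shell sketch. This mimics only the rule stated above (a leading "=" marks a literal string, anything else is a file name); the function name is hypothetical and the code does not call pcre2fuzzcheck itself.

```shell
# Hypothetical helper: show how one pcre2fuzzcheck argument would be read.
classify_fuzz_arg() {
  case "$1" in
    =*) echo "literal: ${1#=}" ;;   # strip the leading "="
    *)  echo "file: $1" ;;
  esac
}

classify_fuzz_arg '=a(b)c'       # prints: literal: a(b)c
classify_fuzz_arg patterns.txt   # prints: file: patterns.txt
```

Quoting a literal argument, as in the first call, keeps the shell from interpreting pattern metacharacters.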
  400 . Releases before 10.30 could be compiled with --disable-stack-for-recursion,
  401   which caused pcre2_match() to use individual blocks on the heap for
  402   backtracking instead of recursive function calls (which use the stack). This
  403   is now obsolete since pcre2_match() was refactored always to use the heap (in
  404   a much more efficient way than before). This option is retained for backwards
  405   compatibility, but has no effect other than to output a warning.
  407 The "configure" script builds the following files for the basic C library:
  409 . Makefile             the makefile that builds the library
  410 . src/config.h         build-time configuration options for the library
  411 . src/pcre2.h          the public PCRE2 header file
  412 . pcre2-config         script that shows the building settings such as CFLAGS
  413                        that were set for "configure"
  414 . libpcre2-8.pc        )
  415 . libpcre2-16.pc       ) data for the pkg-config command
  416 . libpcre2-32.pc       )
  417 . libpcre2-posix.pc    )
  418 . libtool              script that builds shared and/or static libraries
  420 Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
  421 tarballs under the names config.h.generic and pcre2.h.generic. These are
  422 provided for those who have to build PCRE2 without using "configure" or CMake.
  423 If you use "configure" or CMake, the .generic versions are not used.
  425 The "configure" script also creates config.status, which is an executable
  426 script that can be run to recreate the configuration, and config.log, which
  427 contains compiler output from tests that "configure" runs.
  429 Once "configure" has run, you can run "make". This builds whichever of the
  430 libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
  431 program called pcre2test. If you enabled JIT support with --enable-jit, another
  432 test program called pcre2_jit_test is built as well. If the 8-bit library is
  433 built, libpcre2-posix and the pcre2grep command are also built. Running
  434 "make" with the -j option may speed up compilation on multiprocessor systems.
  436 The command "make check" runs all the appropriate tests. Details of the PCRE2
  437 tests are given below in a separate section of this document. The -j option of
  438 "make" can also be used when running the tests.
  440 You can use "make install" to install PCRE2 into live directories on your
  441 system. The following are installed (file names are all relative to the
  442 <prefix> that is set when "configure" is run):
  444   Commands (bin):
  445     pcre2test
  446     pcre2grep (if 8-bit support is enabled)
  447     pcre2-config
  449   Libraries (lib):
  450     libpcre2-8      (if 8-bit support is enabled)
  451     libpcre2-16     (if 16-bit support is enabled)
  452     libpcre2-32     (if 32-bit support is enabled)
  453     libpcre2-posix  (if 8-bit support is enabled)
  455   Configuration information (lib/pkgconfig):
  456     libpcre2-8.pc
  457     libpcre2-16.pc
  458     libpcre2-32.pc
  459     libpcre2-posix.pc
  461   Header files (include):
  462     pcre2.h
  463     pcre2posix.h
  465   Man pages (share/man/man{1,3}):
  466     pcre2grep.1
  467     pcre2test.1
  468     pcre2-config.1
  469     pcre2.3
  470     pcre2*.3 (lots more pages, all starting "pcre2")
  472   HTML documentation (share/doc/pcre2/html):
  473     index.html
  474     *.html (lots more pages, hyperlinked from index.html)
  476   Text file documentation (share/doc/pcre2):
  477     AUTHORS
  478     COPYING
  479     ChangeLog
  480     LICENCE
  481     NEWS
  482     README
  483     pcre2.txt         (a concatenation of the man(3) pages)
  484     pcre2test.txt     the pcre2test man page
  485     pcre2grep.txt     the pcre2grep man page
  486     pcre2-config.txt  the pcre2-config man page
  488 If you want to remove PCRE2 from your system, you can run "make uninstall".
  489 This removes all the files that "make install" installed. However, it does not
  490 remove any directories, because these are often shared with other programs.
  493 Retrieving configuration information
  494 ------------------------------------
  496 Running "make install" installs the command pcre2-config, which can be used to
  497 recall information about the PCRE2 configuration and installation. For example:
  499   pcre2-config --version
  501 prints the version number, and
  503   pcre2-config --libs8
  505 outputs information about where the 8-bit library is installed. This command
  506 can be included in makefiles for programs that use PCRE2, saving the programmer
  507 from having to remember too many details. Run pcre2-config with no arguments to
  508 obtain a list of possible arguments.
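As a sketch of how this might look in a build script, the following function prints a compile line for a program that uses the 8-bit library. The file name demo.c and the function name are hypothetical, and the --cflags argument is assumed to be supported by your installed pcre2-config (run it with no arguments to confirm).

```shell
# Hypothetical helper: print a compile command for demo.c (a made-up file
# name) using flags reported by pcre2-config, if it is installed.
show_pcre2_build_line() {
  if command -v pcre2-config >/dev/null 2>&1; then
    echo "cc $(pcre2-config --cflags) demo.c $(pcre2-config --libs8) -o demo"
  else
    echo "pcre2-config not found on PATH; is PCRE2 installed?"
  fi
}

show_pcre2_build_line
```

The function only prints the command, so it is safe to run on a system where PCRE2 is not yet installed.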
  510 The pkg-config command is another system for saving and retrieving information
  511 about installed libraries. Instead of separate commands for each library, a
  512 single command is used. For example:
  514   pkg-config --libs libpcre2-16
  516 The data is held in *.pc files that are installed in a directory called
  517 <prefix>/lib/pkgconfig.
  520 Shared libraries
  521 ----------------
  523 The default distribution builds PCRE2 as shared libraries and static libraries,
  524 as long as the operating system supports shared libraries. Shared library
  525 support relies on the "libtool" script which is built as part of the
  526 "configure" process.
  528 The libtool script is used to compile and link both shared and static
  529 libraries. They are placed in a subdirectory called .libs when they are newly
  530 built. The programs pcre2test and pcre2grep are built to use these uninstalled
  531 libraries (by means of wrapper scripts in the case of shared libraries). When
  532 you use "make install" to install shared libraries, pcre2grep and pcre2test are
  533 automatically re-built to use the newly installed shared libraries before being
  534 installed themselves. However, the versions left in the build directory still
  535 use the uninstalled libraries.
  537 To build PCRE2 using static libraries only you must use --disable-shared when
  538 configuring it. For example:
  540 ./configure --prefix=/usr/gnu --disable-shared
  542 Then run "make" in the usual way. Similarly, you can use --disable-static to
  543 build only shared libraries.
  546 Cross-compiling using autotools
  547 -------------------------------
  549 You can specify CC and CFLAGS in the normal way to the "configure" command, in
  550 order to cross-compile PCRE2 for some other host. However, you should NOT
  551 specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c
  552 source file is compiled and run on the local host, in order to generate the
  553 inbuilt character tables (the pcre2_chartables.c file). This will probably not
  554 work, because pcre2_dftables.c needs to be compiled with the local compiler,
  555 not the cross compiler.
  557 When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
  558 created by making a copy of pcre2_chartables.c.dist, which is a default set of
  559 tables that assumes ASCII code. Cross-compiling with the default tables should
  560 not be a problem.
  562 If you need to modify the character tables when cross-compiling, you should
  563 move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
  564 hand and run it on the local host to make a new version of
  565 pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
  566 at build time" for more details.
  569 Making new tarballs
  570 -------------------
  572 The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
  573 zip formats. The command "make distcheck" does the same, but then does a trial
  574 build of the new distribution to ensure that it works.
  576 If you have modified any of the man page sources in the doc directory, you
  577 should first run the PrepareRelease script before making a distribution. This
  578 script creates the .txt and HTML forms of the documentation from the man pages.
  581 Testing PCRE2
  582 -------------
  584 To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
  585 There is another script called RunGrepTest that tests the pcre2grep command.
  586 When JIT support is enabled, a third test program called pcre2_jit_test is
  587 built. Both the scripts and all the program tests are run if you obey "make
  588 check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
  590 The RunTest script runs the pcre2test test program (which is documented in its
  591 own man page) on each of the relevant testinput files in the testdata
  592 directory, and compares the output with the contents of the corresponding
  593 testoutput files. RunTest uses a file called testtry to hold the main output
  594 from pcre2test. Other files whose names begin with "test" are used as working
  595 files in some tests.
  597 Some tests are relevant only when certain build-time options were selected. For
  598 example, the tests for UTF-8/16/32 features are run only when Unicode support
  599 is available. RunTest outputs a comment when it skips a test.
  601 Many (but not all) of the tests that are not skipped are run twice if JIT
  602 support is available. On the second run, JIT compilation is forced. This
  603 testing can be suppressed by putting "nojit" on the RunTest command line.
  605 The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
  606 libraries that are enabled. If you want to run just one set of tests, call
  607 RunTest with either the -8, -16 or -32 option.
  609 If valgrind is installed, you can run the tests under it by putting "valgrind"
  610 on the RunTest command line. To run pcre2test on just one or more specific test
  611 files, give their numbers as arguments to RunTest, for example:
  613   RunTest 2 7 11
  615 You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
  616 end), or a number preceded by ~ to exclude a test. For example:
  618   RunTest 3-15 ~10
  620 This runs tests 3 to 15, excluding test 10. Giving just ~13 as the argument runs
  621 all the tests except test 13. Whatever order the arguments are in, the tests are
  622 always run in numerical order.
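The selection rules above (single numbers, ranges, open-ended ranges, and ~ exclusions, always run in numerical order) can be sketched in shell as follows. This is not taken from RunTest. The function name select_tests and the MAXTEST upper bound are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch (not the RunTest source): expand selection arguments such
# as "3-15", "3-" or "~10" into the list of test numbers, one per line.
# MAXTEST is an assumed upper bound, used for open-ended ranges like "3-".
MAXTEST=${MAXTEST:-25}

select_tests() {
  include=""
  exclude=""
  for arg in "$@"; do
    case $arg in
      "~"*) exclude="$exclude ${arg#?}" ;;      # ~N excludes test N
      *-*)  first=${arg%%-*}
            last=${arg#*-}
            if [ -z "$last" ]; then last=$MAXTEST; fi
            n=$first
            while [ "$n" -le "$last" ]; do
              include="$include $n"
              n=$((n + 1))
            done ;;
      *)    include="$include $arg" ;;          # a single test number
    esac
  done
  # If only exclusions were given, start from every test.
  if [ -z "$include" ]; then
    n=0
    while [ "$n" -le "$MAXTEST" ]; do
      include="$include $n"
      n=$((n + 1))
    done
  fi
  # Drop excluded tests, then sort numerically whatever the argument order was.
  for n in $include; do
    skip=no
    for x in $exclude; do
      if [ "$n" -eq "$x" ]; then skip=yes; fi
    done
    if [ "$skip" = no ]; then echo "$n"; fi
  done | sort -n -u
}
```

For example, "select_tests 11 7 2" lists 2, 7 and 11 in that order. In the sketch's own tests the ~ arguments are quoted so that the shell never attempts tilde expansion on them.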
  624 You can also call RunTest with the single argument "list" to cause it to output
  625 a list of tests.
  627 The test sequence starts with "test 0", which is a special test that has no
  628 input file, and whose output is not checked. This is because it will be
  629 different on different hardware and with different configurations. The test
  630 exists in order to exercise some of pcre2test's code that would not otherwise
  631 be run.
  633 Tests 1 and 2 can always be run, as they expect only plain text strings (not
  634 UTF) and make no use of Unicode properties. The first test file can be fed
  635 directly into the perltest.sh script to check that Perl gives the same results.
  636 The only difference you should see is in the first few lines, where the Perl
  637 version is given instead of the PCRE2 version. The second set of tests checks
  638 auxiliary functions, error detection, and run-time flags that are specific to
  639 PCRE2. It also uses the debugging flags to check some of the internals of
  640 pcre2_compile().
  642 If you build PCRE2 with a locale setting that is not the standard C locale, the
  643 character tables may be different (see next paragraph). In some cases, this may
  644 cause failures in the second set of tests. For example, in a locale where the
  645 isprint() function yields TRUE for characters in the range 128-255, the use of
  646 [:isascii:] inside a character class defines a different set of characters, and
  647 this shows up in this test as a difference in the compiled code, which is being
  648 listed for checking. For example, where the comparison test output contains
  649 [\x00-\x7f] the actual output might contain [\x00-\xff], and similarly in some other
  650 cases. This is not a bug in PCRE2.
  652 Test 3 checks pcre2_maketables(), the facility for building a set of character
  653 tables for a specific locale and using them instead of the default tables. The
  654 script uses the "locale" command to check for the availability of the "fr_FR",
  655 "french", or "fr" locale, and uses the first one that it finds. If the "locale"
  656 command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
  657 the list of available locales, the third test cannot be run, and a comment is
  658 output to say why. If running this test produces an error like this:
  660   ** Failed to set locale "fr_FR"
  662 it means that the given locale is not available on your system, despite being
  663 listed by "locale". This does not mean that PCRE2 is broken. There are three
  664 alternative output files for the third test, because three different versions
  665 of the French locale have been encountered. The test passes if its output
  666 matches any one of them.
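The locale check described above can be sketched as a small shell function. This is an illustration only, not the code in RunTest: the function name pick_french_locale is hypothetical, and an exact match against each line of the locale list is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch (not the RunTest source): read a list of locale names,
# one per line, on standard input and print the first of "fr_FR", "french" or
# "fr" that appears in it. Returns non-zero if none of them is listed.
pick_french_locale() {
  available=$(cat)
  for candidate in fr_FR french fr; do
    # -x matches whole lines only, -q suppresses grep's own output.
    if printf '%s\n' "$available" | grep -qx "$candidate"; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}
```

A possible use is "locale -a | pick_french_locale", treating a non-zero exit status as the signal that the third test cannot be run.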
  668 Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
  669 with the perltest.sh script, and test 5 checking PCRE2-specific things.
  671 Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
  672 non-UTF mode and UTF mode with Unicode property support, respectively.
  674 Test 8 checks some internal offsets and code size features, but it is run only
  675 when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
  676 32-bit modes and for different link sizes, so there are different output files
  677 for each mode and link size.
  679 Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
  680 16-bit and 32-bit modes. These are tests that generate different output in
  681 8-bit mode. Each pair are for general cases and Unicode support, respectively.
  683 Test 13 checks the handling of non-UTF characters greater than 255 by
  684 pcre2_dfa_match() in 16-bit and 32-bit modes.
  686 Test 14 contains some special UTF and UCP tests that give different output for
  687 different code unit widths.
  689 Test 15 contains a number of tests that must not be run with JIT. They check,
  690 among other non-JIT things, the match-limiting features of the interpretive
  691 matcher.
  693 Test 16 is run only when JIT support is not available. It checks that an
  694 attempt to use JIT has the expected behaviour.
  696 Test 17 is run only when JIT support is available. It checks JIT complete and
  697 partial modes, match-limiting under JIT, and other JIT-specific features.
  699 Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
  700 the 8-bit library, without and with Unicode support, respectively.
  702 Test 20 checks the serialization functions by writing a set of compiled
  703 patterns to a file, and then reloading and checking them.
  705 Tests 21 and 22 test \C support when the use of \C is not locked out, without
  706 and with UTF support, respectively. Test 23 tests \C when it is locked out.
  708 Tests 24 and 25 test the experimental pattern conversion functions, without and
  709 with UTF support, respectively.
  712 Character tables
  713 ----------------
  715 For speed, PCRE2 uses four tables for manipulating and identifying characters
  716 whose code point values are less than 256. By default, a set of tables that is
  717 built into the library is used. The pcre2_maketables() function can be called
  718 by an application to create a new set of tables in the current locale. These are
  719 passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer into a
  720 compile context.
  722 The source file called pcre2_chartables.c contains the default set of tables.
  723 By default, this is created as a copy of pcre2_chartables.c.dist, which
  724 contains tables for ASCII coding. However, if --enable-rebuild-chartables is
  725 specified for ./configure, a new version of pcre2_chartables.c is built by the
  726 program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C
  727 character handling functions such as isalnum(), isalpha(), isupper(),
  728 islower(), etc. to build the table sources. This means that the default C
  729 locale that is set for your system will control the contents of these default
  730 tables. You can change the default tables by editing pcre2_chartables.c and
  731 then re-building PCRE2. If you do this, you should take care to ensure that the
  732 file does not get automatically re-generated. The best way to do this is to
  733 move pcre2_chartables.c.dist out of the way and replace it with your customized
  734 tables.
  736 When the pcre2_dftables program is run as a result of specifying
  737 --enable-rebuild-chartables, it uses the default C locale that is set on your
  738 system. It does not pay attention to the LC_xxx environment variables. In other
  739 words, it uses the system's default locale rather than whatever the compiling
  740 user happens to have set. If you really do want to build a source set of
  741 character tables in a locale that is specified by the LC_xxx variables, you can
  742 run the pcre2_dftables program by hand with the -L option. For example:
  744   ./pcre2_dftables -L pcre2_chartables.c.special
  746 The second argument names the file where the source code for the tables is
  747 written. The first two 256-byte tables provide lower casing and case flipping
  748 functions, respectively. The next table consists of a number of 32-byte bit
  749 maps which identify certain character classes such as digits, "word"
  750 characters, white space, etc. These are used when building 32-byte bit maps
  751 that represent character classes for code points less than 256. The final
  752 256-byte table has bits indicating various character types, as follows:
  754     1   white space character
  755     2   letter
  756     4   lower case letter
  757     8   decimal digit
  758    16   alphanumeric or '_'
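Because each entry in the final table is a sum of the bit values listed above, an entry can be decoded with a few bitwise tests. The shell sketch below is only an illustration. The function name describe_ctype and the short labels it prints are assumptions, not names used by PCRE2.

```shell
#!/bin/sh
# Hypothetical helper: list the character types recorded in one entry of the
# final table, using the bit values given in the list above.
describe_ctype() {
  value=$1
  result=""
  for pair in 1:space 2:letter 4:lower 8:digit 16:word; do
    bit=${pair%%:*}     # numeric bit value
    name=${pair#*:}     # short label for that bit
    if [ $((value & bit)) -ne 0 ]; then
      result="$result $name"
    fi
  done
  echo "${result# }"
}
```

Under these bit values, an entry of 22 (2 + 4 + 16) decodes as "letter lower word", which is what would be expected for a lower case letter, and 24 (8 + 16) decodes as "digit word".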
  760 You can also specify -b (with or without -L) when running pcre2_dftables. This
  761 causes the tables to be written in binary instead of as source code. A set of
  762 binary tables can be loaded into memory by an application and passed to
  763 pcre2_compile() in the same way as tables created dynamically by calling
  764 pcre2_maketables(). The tables are just a string of bytes, independent of
  765 hardware characteristics such as endianness. This means they can be bundled
  766 with an application that runs in different environments, to ensure consistent
  767 behaviour.
  769 See also the pcre2build section "Creating character tables at build time".
  772 File manifest
  773 -------------
  775 The distribution should contain the files listed below.
  777 (A) Source files for the PCRE2 library functions and their headers are found in
  778     the src directory:
  780   src/pcre2_dftables.c     auxiliary program for building pcre2_chartables.c
  781                            when --enable-rebuild-chartables is specified
  783   src/pcre2_chartables.c.dist  a default set of character tables that assume
  784                            ASCII coding; unless --enable-rebuild-chartables is
  785                            specified, used by copying to pcre2_chartables.c
  787   src/pcre2posix.c         )
  788   src/pcre2_auto_possess.c )
  789   src/pcre2_compile.c      )
  790   src/pcre2_config.c       )
  791   src/pcre2_context.c      )
  792   src/pcre2_convert.c      )
  793   src/pcre2_dfa_match.c    )
  794   src/pcre2_error.c        )
  795   src/pcre2_extuni.c       )
  796   src/pcre2_find_bracket.c )
  797   src/pcre2_jit_compile.c  )
  798   src/pcre2_jit_match.c    ) sources for the functions in the library,
  799   src/pcre2_jit_misc.c     )   and some internal functions that they use
  800   src/pcre2_maketables.c   )
  801   src/pcre2_match.c        )
  802   src/pcre2_match_data.c   )
  803   src/pcre2_newline.c      )
  804   src/pcre2_ord2utf.c      )
  805   src/pcre2_pattern_info.c )
  806   src/pcre2_script_run.c   )
  807   src/pcre2_serialize.c    )
  808   src/pcre2_string_utils.c )
  809   src/pcre2_study.c        )
  810   src/pcre2_substitute.c   )
  811   src/pcre2_substring.c    )
  812   src/pcre2_tables.c       )
  813   src/pcre2_ucd.c          )
  814   src/pcre2_valid_utf.c    )
  815   src/pcre2_xclass.c       )
  817   src/pcre2_printint.c     debugging function that is used by pcre2test
  818   src/pcre2_fuzzsupport.c  function for (optional) fuzzing support
  820   src/config.h.in          template for config.h, when built by "configure"
  821   src/pcre2.h.in           template for pcre2.h when built by "configure"
  822   src/pcre2posix.h         header for the external POSIX wrapper API
  823   src/pcre2_internal.h     header for internal use
  824   src/pcre2_intmodedep.h   a mode-specific internal header
  825   src/pcre2_ucp.h          header for Unicode property handling
  827   sljit/*                  source files for the JIT compiler
  829 (B) Source files for programs that use PCRE2:
  831   src/pcre2demo.c          simple demonstration of coding calls to PCRE2
  832   src/pcre2grep.c          source of a grep utility that uses PCRE2
  833   src/pcre2test.c          comprehensive test program
  834   src/pcre2_jit_test.c     JIT test program
  836 (C) Auxiliary files:
  838   132html                  script to turn "man" pages into HTML
  839   AUTHORS                  information about the author of PCRE2
  840   ChangeLog                log of changes to the code
  841   CleanTxt                 script to clean nroff output for txt man pages
  842   Detrail                  script to remove trailing spaces
  843   HACKING                  some notes about the internals of PCRE2
  844   INSTALL                  generic installation instructions
  845   LICENCE                  conditions for the use of PCRE2
  846   COPYING                  the same, using GNU's standard name
  847   Makefile.in              ) template for Unix Makefile, which is built by
  848                            )   "configure"
  849   Makefile.am              ) the automake input that was used to create
  850                            )   Makefile.in
  851   NEWS                     important changes in this release
  852   NON-AUTOTOOLS-BUILD      notes on building PCRE2 without using autotools
  853   PrepareRelease           script to make preparations for "make dist"
  854   README                   this file
  855   RunTest                  a Unix shell script for running tests
  856   RunGrepTest              a Unix shell script for pcre2grep tests
  857   aclocal.m4               m4 macros (generated by "aclocal")
  858   config.guess             ) files used by libtool,
  859   config.sub               )   used only when building a shared library
  860   configure                a configuring shell script (built by autoconf)
  861   configure.ac             ) the autoconf input that was used to build
  862                            )   "configure" and config.h
  863   depcomp                  ) script to find program dependencies, generated by
  864                            )   automake
  865   doc/*.3                  man page sources for PCRE2
  866   doc/*.1                  man page sources for pcre2grep and pcre2test
  867   doc/index.html.src       the base HTML page
  868   doc/html/*               HTML documentation
  869   doc/pcre2.txt            plain text version of the man pages
  870   doc/pcre2test.txt        plain text documentation of test program
  871   install-sh               a shell script for installing files
  872   libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
  873   libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
  874   libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
  875   libpcre2-posix.pc.in     template for libpcre2-posix.pc for pkg-config
  876   ltmain.sh                file used to build a libtool script
  877   missing                  ) common stub for a few missing GNU programs while
  878                            )   installing, generated by automake
  879   mkinstalldirs            script for making install directories
  880   perltest.sh              script for running a Perl test program
  881   pcre2-config.in          source of script which retains PCRE2 information
  882   testdata/testinput*      test data for main library tests
  883   testdata/testoutput*     expected test results
  884   testdata/grep*           input and output for pcre2grep tests
  885   testdata/*               other supporting test files
  887 (D) Auxiliary files for CMake support:
  890   cmake/FindPackageHandleStandardArgs.cmake
  891   cmake/FindEditline.cmake
  892   cmake/FindReadline.cmake
  893   CMakeLists.txt
  894   config-cmake.h.in
  896 (E) Auxiliary files for building PCRE2 "by hand":
  898   src/pcre2.h.generic     ) a version of the public PCRE2 header file
  899                           )   for use in non-"configure" environments
  900   src/config.h.generic    ) a version of config.h for use in non-"configure"
  901                           )   environments
  903 Philip Hazel
  904 Email local part: Philip.Hazel
  905 Email domain: gmail.com
  906 Last updated: 04 December 2020