Member "xapian-core-1.4.14/HACKING" (23 Nov 2019, 69786 Bytes) of package /linux/www/xapian-core-1.4.14.tar.xz:


Instructions for hacking on Xapian
==================================

.. contents:: Table of contents

This file aims to help developers get started with working on
Xapian.  The documentation contains a section covering various internal
aspects of the library - this can also be found on the Xapian website
<https://xapian.org/>.

Extra options to give to configure
==================================

Note: Non-developer configure options are described in INSTALL.

You will probably want to use some of these if you're going to be developing
Xapian.

--enable-assertions
	This enables compiling of assertion code which will throw
	Xapian::AssertionError if the code detects a violation of its
	preconditions or postconditions, or if other consistency checks fail.

--enable-assertions=partial
	This option enables a subset of the assertions enabled by
	"--enable-assertions", but not the most expensive.  The intention is
	that it should be suitable for use in a real-world system for tracking
	down problems without imposing too much of an overhead (but note that
	we haven't yet performed timings to measure the overhead).

--enable-log
	This enables compiling code into the library which generates verbose
	debugging messages.  See "Debugging Messages", below.

--enable-log=profile
	In 1.2.0 and earlier, this used the debug logging macros to
	report to stderr how long each method takes to execute.  This feature
	was removed in 1.2.1 - you are likely to get better results using
	dedicated profiling tools - for more information see:
	https://trac.xapian.org/wiki/ProfilingXapian

--enable-maintainer-mode
	This tells configure to enable make dependencies for regenerating build
	system files (such as configure, Makefile.in, and Makefile) and other
	generated files (such as the stemmers and query parser) when required.
	These are disabled by default as some make programs try to rebuild them
	when it's not appropriate (e.g. BSD make doesn't handle VPATH except
	for implicit rules).  For this reason, we recommend GNU make if you
	enable maintainer mode.  You'll also need a non-cross-compiling C
	compiler for compiling the Lemon parser generator and the Snowball
	stemming algorithm compiler.  The configure script will attempt to
	locate one, but you can override this autodetection by passing
	CC_FOR_BUILD on the command line like so::

	    ./configure CC_FOR_BUILD=/opt/bin/gcc

--enable-documentation
	This tells configure to enable make dependencies for regenerating
	documentation files.  By default it uses the same setting as
	--enable-maintainer-mode.

Debugging Messages
==================

If you configure with --enable-log, lots of places in the code generate
debugging messages to tell us what they're up to - this information can be
very useful for debugging both the Xapian library and code which uses it.  But
the quantity of information generated is potentially vast so there's a
mechanism to allow you to select where to store the log and which types of
message you're interested in by setting environment variables.  You can:

 * set XAPIAN_DEBUG_LOG to be the path to a file that you would like debugging
   output to be appended to, or to the special value ``-`` to indicate that you
   would like debugging output to be sent to stderr.  Unless XAPIAN_DEBUG_LOG
   is set, no debug logging will be performed.  Occurrences of ``%p`` in
   XAPIAN_DEBUG_LOG will be replaced with the current process-id.

   If you're debugging a crash and want to avoid losing the most recent log
   messages then include ``%!`` in XAPIAN_DEBUG_LOG (which is replaced with
   the empty string).  This will cause the log file to be opened with
   ``O_DSYNC`` or ``O_SYNC`` or similar if running on a platform that supports
   a suitable mechanism.  In 1.4.10 and earlier this was on by default (and
   ``%!`` has no special meaning) but it can incur a significant performance
   overhead and in most cases isn't necessary.

 * set XAPIAN_DEBUG_FLAGS to a string of capital letters indicating the types
   of debugging message you would like to display (the default is to log calls
   to API functions and methods).  These letters are shown in the first column
   of the log output, and are also listed in ``common/debuglog.h``.  If the
   first character is ``-``, then the letters indicate those categories of
   message *not* to be shown instead.  As a consequence of this, setting
   ``XAPIAN_DEBUG_FLAGS=-`` will give you all debugging messages.

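The selection rule for ``XAPIAN_DEBUG_FLAGS`` can be sketched in a few lines
of C++.  This is an illustration only, not code from the Xapian sources: the
function name ``category_enabled`` is hypothetical, and an unset variable
(which gets the default described above) is assumed to be handled by the
caller.  The sketch shows why a lone ``-`` enables everything: the list of
excluded letters is empty.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the Xapian sources): given the value of
// XAPIAN_DEBUG_FLAGS, would messages in `category` be logged?
bool category_enabled(const std::string& flags, char category)
{
    if (!flags.empty() && flags[0] == '-') {
        // Inverted form: log everything except the letters after the '-'.
        return flags.find(category, 1) == std::string::npos;
    }
    // Plain form: log only the listed letters.
    return flags.find(category) != std::string::npos;
}

// Usage sketch.
void category_enabled_examples()
{
    assert(category_enabled("-", 'A'));    // nothing excluded
    assert(!category_enabled("-A", 'A'));  // 'A' explicitly excluded
    assert(category_enabled("AD", 'D'));   // 'D' explicitly listed
}
```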
These environment variables only have any effect if you ran configure with the
--enable-log option.

The format is::

    <message type> <pid> [<this>] <message>

For example::

    A 16747 [0x57ad1e0] void Xapian::Query::Internal::validate_query()

Each nested call adds another space before the ``[`` so you can easily see
which function call and return messages correspond.

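If you want to post-process a log, the format above is easy to split up.
The following C++ sketch is not part of Xapian; ``LogLine`` and
``parse_log_line`` are hypothetical names.  It assumes that the ``[`` is
always present and that a top-level call has exactly one space before it, so
the nesting depth is the number of extra spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical record for one line of debug log output.
struct LogLine {
    char type;          // message type letter from the first column
    std::string pid;    // process id, kept as text
    std::size_t depth;  // nesting depth: extra spaces before '['
    std::string rest;   // "[<this>] <message>"
};

// Returns false if the line doesn't look like "<type> <pid> [...".
bool parse_log_line(const std::string& line, LogLine& out)
{
    if (line.size() < 4 || line[1] != ' ') return false;
    std::size_t pid_end = line.find(' ', 2);
    if (pid_end == std::string::npos || pid_end == 2) return false;
    std::size_t bracket = line.find_first_not_of(' ', pid_end);
    if (bracket == std::string::npos || line[bracket] != '[') return false;
    out.type = line[0];
    out.pid = line.substr(2, pid_end - 2);
    out.depth = bracket - pid_end - 1;  // one separating space is always there
    out.rest = line.substr(bracket);
    return true;
}

// Usage sketch, using the example line from the text above.
void parse_log_line_example()
{
    LogLine l;
    bool ok = parse_log_line(
        "A 16747 [0x57ad1e0] void Xapian::Query::Internal::validate_query()",
        l);
    assert(ok && l.type == 'A' && l.pid == "16747" && l.depth == 0);
}
```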
Debugging memory allocations
============================

The testsuite can make use of valgrind 3.3.0 or newer to check for memory
leaks, reads from uninitialised memory, and some other bugs during tests.

Valgrind doesn't support every platform, but Xapian contains very little
platform-specific code (and most of what there is is Microsoft Windows
specific) so even just testing with valgrind on one platform gives good
coverage.

If you have a new enough version of valgrind installed, it's automatically
detected by configure and used when running the testsuite.  The testsuite runs
more slowly under valgrind, so if you wish to disable this auto-detection you
can run configure with::

    ./configure VALGRIND=

Or you can disable use of valgrind during a particular run of "make check"
like so::

    make check VALGRIND=

Or disable it while running a test directly (under sh or bash)::

    VALGRIND= ./runtest ./apitest

Current versions of valgrind result in false positives on current versions
of macOS, so on this platform configure only enables use of valgrind if
it's specified explicitly, for example if valgrind is on your ``PATH``
you can just use::

    ./configure VALGRIND=valgrind

Running test programs
=====================

To run all tests, use ``make check``.  You can also run just the subset of
tests which exercise the inmemory, remote progserver, remote TCP,
multi-database, glass, or chert backends using ``make check-inmemory``,
``make check-remoteprog``, ``make check-remotetcp``, ``make check-multi``,
``make check-glass``, or ``make check-chert``
respectively.

Also, ``make check-remote`` will run the tests on both variants of the remote
backend, and ``make check-none`` will run those tests which don't use any
backend.  These are handy shortcuts when doing development work on a particular
backend.

The runtest script (in the tests subdirectory) takes care of the details of
running the test programs (including setting up the environment so they work
when srcdir != builddir and handling libtool dynamically linked binaries).  To
run a test program by hand (rather than via make) just use::

    ./runtest ./apitest

You can specify options and arguments.  Individual test programs optionally
take one or more test names as arguments, and you can also pass ``-v`` to get
more verbose output from failing tests, e.g.::

    ./runtest ./apitest -v deldoc1

If the number of the test is omitted, all tests with that basename are run,
so to run deldoc1, deldoc2, etc::

    ./runtest ./apitest deldoc

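The basename rule can be pictured with a small C++ sketch.  This is an
illustration of the rule as described above, not the harness's actual code;
``selects_test`` is a hypothetical name.  It assumes that a test's "number"
is simply the run of digits at the end of its name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the Xapian sources): does the name given on
// the command line select the test called `test_name`?
bool selects_test(const std::string& arg, const std::string& test_name)
{
    // A full name such as "deldoc1" selects exactly that test.
    if (arg == test_name) return true;
    // Otherwise drop the trailing number from the test's name and compare
    // what is left (the basename) with the argument.
    std::string::size_type last = test_name.find_last_not_of("0123456789");
    if (last == std::string::npos) return false;  // empty or all digits
    return arg == test_name.substr(0, last + 1);
}

// Usage sketch.
void selects_test_examples()
{
    assert(selects_test("deldoc", "deldoc1"));    // basename matches
    assert(selects_test("deldoc", "deldoc2"));
    assert(!selects_test("deldoc1", "deldoc10")); // full names must be equal
}
```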
You can also use runtest to run a test program under gdb (or most other
tools)::

    ./runtest gdb ./apitest -v deldoc1
    ./runtest valgrind ./apitest -v deldoc1

Some test programs take special arguments - for example, you can restrict
apitest to the glass backend using ``-bglass``.

There are a few environment variables which the testsuite harness checks for
which you might find useful:

  XAPIAN_TESTSUITE_SIG_DFL:
    By default, the testsuite harness catches signals and handles them
    gracefully - the current test is failed, and the testsuite moves on to the
    next test.  If you want to suppress this (some debugging tools may work
    better if the signal is not caught) set the environment variable
    XAPIAN_TESTSUITE_SIG_DFL to any value to prevent the testsuite harness
    from installing its own signal handling.

  XAPIAN_TESTSUITE_OUTPUT:
    By default, the testsuite harness uses ANSI escape sequences to give
    colour output if stdout is a tty.  You can disable this feature by setting
    XAPIAN_TESTSUITE_OUTPUT=plain (alternatively, piping the output (e.g.
    through ``cat`` or ``more``) will have the same effect).  Auto-detection
    can be explicitly specified with XAPIAN_TESTSUITE_OUTPUT=auto (or empty).
    Any other value forces the use of colour.  Colour output is always disabled
    on Microsoft Windows, so XAPIAN_TESTSUITE_OUTPUT has no effect there.

  XAPIAN_TESTSUITE_LD_PRELOAD:
    The runtest script will add this to LD_PRELOAD if it is set, allowing you
    to easily load LD_PRELOAD libraries when running the testsuite.  The
    original intended use was to allow use of libeatmydata
    (https://www.flamingspork.com/projects/libeatmydata/) which makes fsync
    and related calls no-ops, but configure now checks for the eatmydata
    wrapper script and this is used automatically.  However, there may be
    other LD_PRELOAD libraries which are useful, so we've left the machinery
    in place.

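The rules for ``XAPIAN_TESTSUITE_OUTPUT`` amount to a small decision table,
sketched here in C++.  This is not the harness's actual code: ``use_colour``
and its parameters are hypothetical, and an unset variable is assumed to
behave like an empty one.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the Xapian sources).  `setting` is the value
// of XAPIAN_TESTSUITE_OUTPUT ("" if unset), `stdout_is_tty` says whether
// stdout is a terminal, and `on_windows` whether we're on Microsoft Windows.
bool use_colour(const std::string& setting, bool stdout_is_tty,
                bool on_windows = false)
{
    if (on_windows) return false;           // never colour on Windows
    if (setting == "plain") return false;   // explicitly disabled
    if (setting.empty() || setting == "auto")
        return stdout_is_tty;               // auto-detect
    return true;                            // any other value forces colour
}

// Usage sketch.
void use_colour_examples()
{
    assert(!use_colour("plain", true));
    assert(use_colour("auto", true));
    assert(!use_colour("", false));   // e.g. output piped through cat
    assert(use_colour("yes", false)); // forced
}
```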
Speeding up the testsuite with eatmydata
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The testsuite does a lot of small database operations, and the calls to fsync,
fdatasync, etc which Xapian makes by default can slow down testsuite runs
substantially.  There's a handy LD_PRELOAD library called eatmydata
(https://www.flamingspork.com/projects/libeatmydata/), which can help here, by
turning fsync and related calls into no-ops.

You need a version of eatmydata with the eatmydata wrapper script (version 37
or newer), and then configure should auto-detect it and it'll get used when
running the testsuite (via runtest).  If you wish to disable this
auto-detection for some reason, you can run configure with::

    ./configure EATMYDATA=

Or you can disable use of eatmydata during a particular run of "make check"
like so::

    make check EATMYDATA=

Or disable it while running a test directly (under sh or bash)::

    EATMYDATA= ./runtest ./apitest

Using various debugging, profiling, and leak-finding tools
==========================================================

GCC's libstdc++ supports a debug mode, which checks for various misuses of
the STL - to enable this, define _GLIBCXX_DEBUG when building Xapian::

  ./configure CPPFLAGS=-D_GLIBCXX_DEBUG

For documentation of this option, see:
https://gcc.gnu.org/onlinedocs/libstdc++/manual/debug_mode.html

Note: all C++ code must be compiled with this defined or you'll get problems.
Xapian's API headers include a check that the same setting is used when
building code using Xapian as was used to build Xapian.

To use valgrind (http://www.valgrind.org/), no special build options are
required, but make sure you compile with debugging information (on by default
for GCC) and the valgrind documentation recommends disabling optimisation (with
optimisation, line numbers in error messages can be confusing due to code
inlining, etc)::

  ./configure CXXFLAGS='-O0 -g'

To use gdb (https://www.gnu.org/software/gdb/), no special build options are
required, but make sure you compile with debugging information (on by default
for GCC).  You'll probably find debugging easier if you compile without
optimisation (with optimisation, line numbers in error messages can be
confusing due to code inlining, etc, and the values of some variables can't be
printed because they've been eliminated from the code completely)::

  ./configure CXXFLAGS='-O0 -g'

To enable profiling for gprof::

  ./configure CXXFLAGS=-pg LDFLAGS=-pg

To use Purify (a proprietary tool)::

  ./configure CXXLD='purify c++' --disable-shared

To use Insure (another proprietary tool)::

  ./configure CXX=insure

To use lcov (at least version 1.10) to generate a test coverage report (see
`lcov.xapian.org <http://lcov.xapian.org/>`_ for reports) there are three make
targets (all in the ``xapian-core`` directory):

  * ``make coverage-reconfigure``: reruns configure in the source tree.  See
    Makefile.am for details of the configure options used and why they
    are needed.  If you're using ccache, make sure it's at least version
    3.0, and ideally at least 3.2.2.

  * ``make coverage-reconfigure-maintainer-mode``: does the same thing, except
    the tree is configured in "maintainer mode", which is what you want if
    generating coverage reports while working on the code.

  * ``make coverage-check``: runs ``make check`` and generates an HTML report
    in a directory called ``lcov``.

    + You can specify extra arguments to pass to the ``genhtml`` tool using
      ``GENHTML_ARGS``, so for example if you plan to serve the generated HTML
      coverage report from a webserver, you might use:
      ``make coverage-check GENHTML_ARGS=--html-gzip``

You ideally want lcov 1.11 or later, since 1.11 includes patches to reduce
memory usage significantly - lcov 1.10 would run out of memory in a 1GB VM.

If you have runes for using other tools, please add them above, or send them
to us so we can.

Snapshots
=========

If you want to try unreleased Xapian code, you can fetch it from our git
repository.  For convenience, we also provide bootstrapped tarballs (much like
the source code download for any release version) which get built every 20
minutes if there have been any changes checked in.  These tarballs need to
pass "make distcheck" to be automatically uploaded, so using them will help
to ensure that you don't pick a "bad" version.  The snapshots are available
from the "Bleeding Edge" page of the Xapian website.

Building from git
=================

When building from a git checkout, we *strongly* recommend that you use
the ``bootstrap`` script in the top level directory to set up the tree ready
for building.  This script will check which directories you have checked out,
so you can bootstrap a partial tree.  You can also ``touch .nobootstrap`` in
a subdirectory to tell bootstrap to ignore it.

You will need the following tools installed to build from git:

* GNU m4 >= 1.4.6 (for autoconf)
* perl >= 5.6 (for automake; also for various maintainer scripts)
* python >= 2.3 (for generating the Python bindings)
* GNU make (or another make which supports VPATH for explicit rules)
* GNU bison (for building SWIG, used for generating the bindings)
* Tcl (to generate unicode/unicode-data.cc)

For a recent version of Debian or Ubuntu, this command should ensure you have
all the necessary tools and libraries::

    apt-get install build-essential m4 perl python zlib1g-dev uuid-dev wget bison tcl

If you want to build Omega, you'll also need::

    apt-get install libpcre3-dev libmagic-dev

On Fedora, the uuid library can be installed by doing::

    yum install libuuid-devel

On Mac OS X, if you're using macports you'll want the following:

  * file (magic.h in configure)

If you're using homebrew you'll want the following::

    brew install libmagic pcre

If you're doing much development work, you'll probably also want the following
tools installed:

* valgrind for better testsuite error finding
* ccache for faster rebuilds
* eatmydata for faster testsuite runs

The repository does not contain any automatically generated files
(such as configure, Makefile.in, Snowball-generated stemmers, Lemon-generated
parsers, SWIG-generated code, etc) because experience shows it's best to keep
these out of version control.  To avoid requiring you to install the correct
versions of the tools required, we either include the source to these tools in
the repo directly (in the case of Snowball and Lemon), or the bootstrap script
will download them as tarballs (autoconf, automake, libtool) or
from git (SWIG), build them, and install them within the source tree.

To download source tarballs, bootstrap will use wget, curl or lwp-request if
installed.  If not, it will give an error telling you the URL to download from
by hand and where to copy the file to.

Bootstrap will then run autoreconf on each of the checked-out subdirectories,
and generate a top-level configure script.  This configure script allows you to
configure xapian-core and any other modules you've checked out with a single
simple command, such that the other modules link against the uninstalled
xapian-core (which is very handy for development work and a bit fiddly to set
up by hand).  It automatically passes --enable-maintainer-mode to the
subprojects so that the autotools will be rerun if configure.ac, Makefile.am,
etc are modified.

The bootstrap script doesn't care what the current directory is.  The top-level
configure script generated by it supports building in a separate directory to
the sources: simply create the directory you want to build in, and then run the
configure script from inside that directory.  For example, to build in a
directory called "build" (starting in the top level source directory)::

  ./bootstrap
  mkdir build
  cd build
  ../configure

When running bootstrap, if you need to add any extra macro directories to the
path searched by aclocal (which is part of automake), you can do this by
specifying these in the ACLOCAL_FLAGS environment variable, e.g.::

  ACLOCAL_FLAGS=-I/extra/macro/directory ./bootstrap

If you wish to prevent bootstrap from downloading and building the autotools
pass the --without-autotools option.  You can force it to delete the downloaded
and installed versions by passing --clean.

If you are tracking development in git, there will sometimes be changes
to the build system sources which require regeneration of the generated
makefiles and associated machinery.  We aim to make the build system
automatically regenerate the necessary files, but in the event that a build
fails after an update, it may be worth re-running the bootstrap script to
regenerate the build system from scratch, before looking for the cause of the
error elsewhere.

Tools required to build documentation
-------------------------------------

If you want to be able to build distribution tarballs (with "make dist") then
you'll also need some further tools.  If you don't want to have to install all
these tools, then pass --disable-documentation to configure to disable these
rules (the default state of this follows the setting of
--enable-maintainer-mode, so in a non-maintainer-mode tree, you can pass
--enable-documentation to enable these rules).  Without the documentation,
"make dist" will fail (to prevent accidentally distributing tarballs without
documentation), but you can configure and build.

The documentation tools are:

* doxygen (v1.8.8 is used for 1.3.x snapshots and releases; 1.7.6.1 fails to
  process trunk after PL2Weight was added).
* dot (part of Graphviz.  Doxygen's DOT_MULTI_TARGETS option apparently needs
  ">1.8.10")
* help2man
* rst2html or rst2html.py (in python-docutils on Debian/Ubuntu)
* pngcrush (optional - used to reduce the size of PNG files in the HTML
  apidocs)
* sphinx-doc (in python-sphinx and python3-sphinx on Debian/Ubuntu, or as
  sphinx via pip install)

For a recent version of Debian or Ubuntu, this command should install all the
required documentation tools::

    apt-get install doxygen graphviz help2man python-docutils pngcrush python-sphinx python3-sphinx

Documentation builds on OS X
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Mac OS X, if you're using homebrew, you'll want the following::

    brew install doxygen help2man graphviz pngcrush

(Ensure you're up to date with brew, as earlier packaging of graphviz
didn't properly install dot.)

You also need sphinx and docutils, which are python packages; you can
install them via pip::

    pip install sphinx docutils

You may find it easier to use homebrew to install python first, so
these packages are separate from the system python::

    brew install python

If you install both python (v2) and python3 (v3) via homebrew, you
will be able to build bindings for both; you'll then need to install
sphinx for python3::

    pip3 install sphinx

PDF versions of docs
~~~~~~~~~~~~~~~~~~~~

As of 1.3.2, we no longer build PDF versions of the API docs by default, but
you can build them yourself with::

    make -C docs apidoc.pdf

Additional tools are needed for these:

* gs (part of Ghostscript)
* pdflatex (in texlive-latex-base on Debian/Ubuntu)
* epstopdf (in texlive-extra-utils on Debian/Ubuntu)
* makeindex (in texlive-binaries on Debian/Ubuntu, or texlive-base-bin for older releases)

Note that pdflatex, epstopdf, gs, and makeindex must all currently be on your
path (as specified by the environment variable PATH), since doxygen will look
for them there.

For a recent version of Debian or Ubuntu, this command should install these
extra tools::

    apt-get install ghostscript texlive-latex-base texlive-extra-utils texlive-binaries texlive-fonts-extra texlive-fonts-recommended texlive-latex-extra texlive-latex-recommended

On Mac OS X, if you're using macports you'll want the following:

  * texlive (pdflatex during build)
  * texlive-basic (for makeindex in configure)
  * texlive-latex-extra (latex style)

Alternatively, you can install MacTeX from https://www.tug.org/mactex/ instead
of texlive, texlive-basic and texlive-latex-extra.

The homebrew texlive package only supports 32-bit systems, so even if you're
using homebrew, you'll probably want to install MacTeX from
https://www.tug.org/mactex/ instead.

Autotools versions
------------------

* autoconf 2.69 is used to generate snapshots and releases.

  autoconf 2.64 is a hard minimum requirement.

  autoconf 2.60 is required for docdir support and AC_TYPE_SSIZE_T.

  autoconf 2.62 generates faster configure scripts and warns about unrecognised
  options passed to configure.

  autoconf 2.63 fixes a regression in AC_C_BIGENDIAN introduced in 2.62
  (Omega uses this macro).

  autoconf 2.64 generates smaller configure scripts by using shell functions.

* automake 1.15.1 is used to generate snapshots and releases.

  automake 1.12.2 is a hard minimum requirement.  This version fixes a
  security issue (CVE-2012-3386) in the generated ``make distcheck`` rules.

  automake 1.12 is needed to support using LOG_COMPILER to specify a testsuite
  driver (used by xapian-bindings).

* libtool 2.4.6 is used to generate snapshots and releases.

  libtool 2.2.8 is the current hard minimum requirement.

  libtool 2.2 is required for us to be able to override link_all_deplibs_CXX
  and sys_lib_dlsearch_path_spec in configure.  It also fixes some
  long-standing issues and is significantly faster.

Please tell us if you find that newer versions of any of these tools work or
fail to work.

There is a good GNU autotools tutorial at
<https://www.lrde.epita.fr/~adl/autotools.html>.

Building from git on Windows with MSVC
--------------------------------------

Building using MSVC is now supported by the autotools build system.  You need
to install a set of Unix-like tools first - we recommend MSYS2:
https://www.msys2.org/

For details of how to specify MSVC to ``configure`` see the "INSTALL" document.

When building from git, by default you'll need some extra tools to generate
Unicode tables (Tcl) and build documentation (doxygen, help2man, sphinx-doc).
We don't currently have detailed advice on how to do this (if you can provide
some then please send a patch).

You can avoid needing Tcl by copying ``xapian-core/unicode/unicode-data.cc``
from another platform or a release which uses the same Unicode version.  You
can avoid needing most of the documentation tools by running configure with
the ``--disable-documentation`` option.

Using a Vagrant-driven Ubuntu virtual machine
---------------------------------------------

Note: Vagrant support is experimental. Please report bugs in the
normal fashion, to https://trac.xapian.org/newticket, or ask for help
on the #xapian IRC channel on Freenode.

If you have Vagrant (https://www.vagrantup.com/, tested on version
1.5.2) and VirtualBox (https://www.virtualbox.org/, tested on version
4.3.10) installed, ``vagrant up`` will make a virtual machine suitable
for developing Xapian:

 * Ubuntu 13.04 with all packages needed to build Xapian and its
   documentation

 * eatmydata (to speed up test runs) and valgrind (for debugging
   memory allocations) both also installed

 * source code from this checkout in /vagrant; edit it on your host
   operating system and changes are reflected in the VM. The source
   tree is bootstrapped automatically (ensuring that the right
   versions of the build tools are available on the VM)

 * build tree in /home/vagrant/build, configured to install into
   /home/vagrant/install, with maintainer mode and documentation
   both enabled

Setting up can take a long time, as it downloads a minimal base box
and then installs all the required packages; once this is done you
don't have to wait so long if you need to reprovision the VM. (Once
Ubuntu 14.04 is released the plan is to build our own base box with
these packages already installed, which should make the process much
faster.)

``vagrant ssh`` will log you into the VM, and you can type
``cd build && make`` to build Xapian. ``make check`` will run the tests.

(As noted above, in maintainer mode most changes that require
reconfiguration will happen automatically. If you need to do it by
hand you can either run the configure command yourself, or you can run
``vagrant provision``, which also checks for any system package
updates.)

The VM has a single 64-bit virtual processor, with 384M of memory; it
takes about 8G of disk space once up and running.

  614 Use of C++ Features
  615 ===================
  616 
  617 * As of Xapian 1.3.3, a compiler with decent support for C++11 is required to
  618   build Xapian.  We currently aim to allow users to use a non-C++11 compiler
  619   to build code which uses Xapian.
  620 
  621   There are now several compilers with good C++11 support, but there are a
  622   few shortfalls in commonly deployed versions of most of them.  Often we can
  623   work around this, and we should do where the effort is low compared to the
  624   gain (so a compiler version which is widely used is more worth supporting
  625   than one which is hardly used by anyone).
  626 
  627   However, we shouldn't have to jump through hoops to cater for compilers where
  628   their authors aren't putting in the effort to keep up with the language
  629   standards.
  630 
  631   Please avoid the following C++11 features for the time being:
  632 
  633   * ``std::to_string()`` - this is completely missing on current versions of
  634     mingw and cygwin - in the library, you can ``#include "str.h"`` and then
  635     use the ``str()`` function instead for most cases.  This is also usually
  636     faster than ``std::to_string()``.
  637 
  638 * C++ features we currently assume:
  639 
  640   * We assume <sstream> is available.  GCC < 2.95.3 didn't have it but GCC
  641     2.95.3 includes a backported version.  We aren't aware of any other
  642     compilers still in use which lack it.
  643 
  644   * Non-".h" versions of standard ISO C++ headers (e.g. ``#include <list>``
  645     rather than ``#include <list.h>``).  We aren't aware of any compiler still
  646     in use which lacks these, and GCC 4.3 no longer has the old versions.  If
  647     there are any, we could add a directory full of forwarding headers to work
  648     around this issue.
  649 
  650   * Standard header ``<limits>`` (for ``numeric_limits<>``) - for GCC, this was
  651     added in GCC 3.0.
  652 
  653   * Standard header ``<streambuf>`` (GCC < 3.0 only has ``<streambuf.h>``).

* RTTI (dynamic_cast<>, typeid, etc):  Needing to use RTTI features in the
  library most likely indicates a design flaw, and you should avoid use
  of these features.  Where necessary, you can use a technique similar to
  Database::as_networkdatabase() to replace dynamic_cast<>.

* Exceptions: In hindsight, throwing exceptions in the library seems to have
  been a poor design decision.  GCC on Solaris can't cope with exceptions in
  shared libraries (though it appears this may have been fixed in more recent
  versions), and we've also had test failures on other platforms which only
  occur with shared libraries - possibly with a similar cause.  Exceptions can
  also be a pain to handle elegantly in the bindings.  We intend to investigate
  modifying the library to return error codes internally, and then offering the
  user the choice of exception throwing or error code returning API methods
  (with the exception being thrown by an inlined wrapper in the externally
  visible header files).  With this in mind, please don't complicate the
  internal handling of exceptions...

* "using namespace std;" and "using std::XXX;" - it's OK to use these in
  applications, library code, and internal library headers.  But in externally
  visible headers (such as anything included by "#include <xapian.h>") you MUST
  use explicit "std::" qualifiers - it's not acceptable to pull anything from
  namespace std into the namespace of an application which uses Xapian.

* Use C++ style casts (static_cast<>, reinterpret_cast<>, and const_cast<>)
  or constructor-syntax (e.g. ``double(value)``) in preference to C style
  casts.  The syntax of the C++ casts is ugly, but they do make the
  intent much clearer which is definitely a good thing, and they avoid issues
  such as casting away const when you only meant to cast the type of a pointer.

* std::pair<> with an STL class as one (or both) of the members can produce
  very long symbols (over 4KB!) after name mangling - long enough to overflow
  the size limits of some vendor compilers or toolchains (so this can affect
  GCC if it is using the system ld or as).  Even where the compiler works, the
  symbol bloat in an unstripped build is probably best avoided, so it's
  preferable to use a simple two member struct instead.  The code is probably
  more readable anyway, and easier to extend if more members are needed later.

* We try to avoid putting the full definition of virtual methods in header
  files.  This is because current compilers can't (as far as we know) inline
  virtual methods, so putting the definition in the header file simply slows
  down compilation (and, because method definitions often require further
  header files to be included, this can result in many more files needing
  recompilation after a change to a header file than is really necessary).
  Just put the declaration in the header file, and put the definition in a .cc
  file with the same basename.
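
As a rough illustration of the RTTI guideline above, the following sketch
replaces a ``dynamic_cast<>`` with a virtual method which returns a pointer to
the derived type (or a null pointer).  The names ``BackendBase``,
``RemoteBackend``, ``as_remote()`` and ``timeout_or_default()`` are
hypothetical and exist only for this example; the actual Xapian internals may
well differ:

```cpp
class RemoteBackend;

// Hypothetical base class, standing in for an internal database class.
class BackendBase {
  public:
    virtual ~BackendBase() { }

    // Default answer: "I'm not a RemoteBackend".
    virtual RemoteBackend* as_remote() { return nullptr; }
};

// Hypothetical derived class which overrides the query method.
class RemoteBackend : public BackendBase {
  public:
    RemoteBackend* as_remote() override { return this; }

    int timeout() const { return 10; }
};

// Returns the timeout for a remote backend, or -1 for any other backend.
// No dynamic_cast<> or typeid is needed.
inline int timeout_or_default(BackendBase& db) {
    RemoteBackend* remote = db.as_remote();
    return remote ? remote->timeout() : -1;
}
```

The base class supplies the "not this type" default, so only the derived class
which callers need to detect has to override the method.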

Include ordering for source files
---------------------------------

To help us move towards a consistent ordering of #include lines in source
files, please use the following policy when ordering them:

* #include <config.h> should be first, and use <> not "" (as recommended by the
  autoconf manual).  Always include config.h from C/C++ source files, but don't
  include it from header files - the autoconf manual recommends that it should
  be included first, so including it from headers is either redundant, or may
  hide a missing config.h include in the source file the header was included
  from (better to get an error in this case).

* The header corresponding to the source file should be next. This means that
  compilation of the library ensures that each header with a corresponding
  source file is "self supporting" (i.e. it implicitly or explicitly includes
  all of the headers it requires).

* External xapian-core headers, alphabetically. When included from other
  external headers, use <> to reduce problems with finding headers in the
  user's source tree by mistake. In sources and internal headers, use ""
  (practically this makes no difference as we have -I for srcdir and builddir,
  but <> suggests installed header files so "" seems more natural).

* Internal headers, alphabetically (using "").

* "Safe" versions of library headers (include these first to avoid issues if
  other library headers include the ones we want to wrap). Use "" and order
  alphabetically.

* Library headers, alphabetically.

* Standard C++ headers, alphabetically. Use the modern (no .h suffix) names.

C++ Portability Issues
======================

Web Resources
-------------

The "C++ Super-FAQ" covers many frequently asked C++ questions:
https://isocpp.org/faq

Header Portability Issues
-------------------------

<fcntl.h>:
----------

Don't directly '#include <fcntl.h>' - instead '#include "safefcntl.h"'.

The main reason for this is that when using certain compilers on certain
versions of Solaris, fcntl.h does '#define open open64'.  Sadly this breaks C++
code which has methods called open (as we do).  There's a cunning workaround
for this problem in common/safefcntl.h.

Also, safefcntl.h ensures that O_BINARY is defined (to 0 if not required) so
calls to open() and creat() can specify O_BINARY unconditionally for the
benefit of platforms which discriminate between text and binary files.
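
To illustrate the macro problem and one possible style of workaround, the
sketch below simulates a system header which renames a function using a macro.
All of the names here (``frob``, ``frob64``, ``frob_via_macro`` and
``Database``) are hypothetical, and this is only an assumption about the
general shape of such a fix; see common/safefcntl.h for what Xapian actually
does:

```cpp
// --- Simulated system header (hypothetical names) ---
inline int frob64(int x) { return x * 2; }
#define frob frob64

// --- Workaround, in the style of a "safe" wrapper header ---
#ifdef frob
// Capture whatever the macro expands to while it's still defined.
inline int frob_via_macro(int x) { return frob(x); }
// Remove the macro so it can no longer rewrite our own identifiers.
# undef frob
// Provide a real function with the expected name instead.
inline int frob(int x) { return frob_via_macro(x); }
#endif

// --- Our own code: a method called frob keeps its name ---
class Database {
  public:
    int frob(int x) { return ::frob(x) + 1; }
};
```

Without the ``#undef``, the method above would silently be renamed to
``frob64`` in any source file which saw the macro, and keep its real name in
any file which didn't.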

<windows.h>:
------------

Don't directly '#include <windows.h>' - instead '#include "safewindows.h"'
which reduces the bloat of header files included and prevents some of the
more egregious namespace pollution.  It also defines any constants we need
which might be missing in older versions of the mingw headers.

<winsock2.h>:
-------------

Don't directly '#include <winsock2.h>' - instead '#include "safewinsock2.h"'.
This ensures that safewindows.h is included before <winsock2.h> to avoid
winsock2.h including windows.h without our namespace pollution reducing
workarounds.

<sys/select.h>:
---------------

Don't directly '#include <sys/select.h>' - instead '#include "safesysselect.h"'
which supports older UNIX platforms which predate POSIX 1003.1-2001 and works
around a problem on Solaris.

<sys/socket.h>:
---------------

Don't directly '#include <sys/socket.h>' - instead '#include "safesyssocket.h"'
which supports older UNIX platforms which predate POSIX 1003.1-2001 and works
on Windows too.

<sys/stat.h>:
-------------

Don't directly '#include <sys/stat.h>' - instead '#include "safesysstat.h"'
which under MSVC enables stat to work on files > 2GB, defines the missing
POSIX macros S_ISDIR and S_ISREG, pulls in <direct.h> for mkdir() (which is
provided by sys/stat.h under UNIX) and provides a compatibility wrapper for
mkdir() which takes 2 arguments (so code using mkdir can always just pass
two arguments).

<sys/wait.h>:
-------------

To get ``WEXITSTATUS`` or ``WIFEXITED`` defined, '#include "safesyswait.h"'.
Note that this won't provide ``waitpid()``, etc on Microsoft Windows, since
these functions are only really useful to use when ``fork()`` is available.

<unistd.h>:
-----------

Don't directly '#include <unistd.h>' - instead '#include "safeunistd.h"'
- MSVC doesn't even HAVE unistd.h!

The various "safe" headers are maintained in xapian-core/common, but also used
by Omega.  Currently bootstrap sorts out setting up a copy of this subdirectory
via a secondary git checkout.

Warning-Free Compilation
------------------------

Compiling without warnings on every platform is our goal, though it's not
always possible to achieve.  For example, some GCC 3.x compilers produce the
occasional bogus warning (e.g.  warning that a variable may be used
uninitialised, despite it being initialised at the point of declaration!)

You should consider configure-ing with::

    ./configure CXXFLAGS=-Werror

when doing development work on Xapian.  This promotes warnings to errors,
which should ensure you at least don't introduce new warnings for the compiler
you're using.

If you configure with --enable-maintainer-mode, and are using GCC 4.1 or newer,
this is done for you automatically.  This is intended to be an aid rather than
a form of automated punishment - it's all too easy to miss a new warning as
once a file is compiled, you don't see it unless you modify that file or one of
its dependencies.

With Intel's C++ compiler, --enable-maintainer-mode also enables -Werror.
If you know the equivalent of -Werror for other compilers, please add a note
here, or tell us so that we can add a note.

Miscellaneous Portability Issues
--------------------------------

Make sure that the last line of any source file ends with a linefeed character
since it's undefined behaviour if it doesn't (most compilers accept it, though
at least GCC gives a warning).

Branch Prediction Hints
=======================

For compilers which support ``__builtin_expect()`` (GCC >= 3.0 and some others)
you can provide manual hints to assist branch prediction.  We've wrapped these
in macros which evaluate to just their argument for compilers which don't
support ``__builtin_expect()``.

Within the xapian-core library code, you can mark the expressions in ``if`` and
``while`` statements as ``rare`` (if the condition is rarely true) or ``usual``
(if the condition is usually true).

For example::

    if (rare(something_unusual())) deal_with_it();

    while (usual(!end_condition())) keep_going();

It's easy to make incorrect assumptions about where hotspots are and which
branches are usually taken or not, so except for really obvious cases (such
as ``if (!consistency_check()) throw_exception();``) you should benchmark
that new ``rare`` and ``usual`` hints help rather than hinder before committing
them to the repository.  It's also likely to be a waste of effort to add them
outside of areas of code which are executed very frequently.

Don't expect miracles - the first 15 uses added saved approximately 1%.

If you know how to implement the ``rare`` and ``usual`` macros for other
compilers, please let us know.
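
For readers unfamiliar with ``__builtin_expect()``, here is a minimal sketch of
how macros like these could be defined.  It assumes a GCC-compatible compiler
is detected via ``__GNUC__``; the exact definitions and compiler checks in the
xapian-core sources may differ, and ``count_nonzero()`` is a hypothetical
function used only to show the macros in use:

```cpp
// Sketch only: not necessarily how xapian-core defines these macros.
#if defined __GNUC__
// The !! normalises the condition to 0 or 1, since __builtin_expect()
// compares its first argument against the expected value.
# define rare(COND) __builtin_expect(!!(COND), 0)
# define usual(COND) __builtin_expect(!!(COND), 1)
#else
// Fallback for other compilers: evaluate to just the argument.
# define rare(COND) (COND)
# define usual(COND) (COND)
#endif

// Counts the non-zero bytes in a buffer.  A null buffer is treated as the
// unusual case and reported by returning -1.
inline int count_nonzero(const unsigned char* p, int len) {
    if (rare(p == nullptr)) return -1;
    int n = 0;
    for (int i = 0; usual(i < len); ++i) {
        if (p[i] != 0) ++n;
    }
    return n;
}
```

The hints only affect how the compiler lays out the generated code; the result
of each condition is unchanged.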

Configure Options
=================

Especially for a library, compile-time options aren't a good solution for
how to integrate a new feature.  An increasingly large number of users install
pre-built binary packages rather than building from source, and unless the
package is capable of being split into modules, the packager has to choose a
set of compile-time options to use.  And they'll tend to choose either the
standard ones, or perhaps a broader set to try to keep everyone happy.  For a
library, similar issues occur when installing from source as well - the
sysadmin must choose the options which will keep all users happy.

Another problem with compile-time options is that it's hard to ensure that
a change doesn't break compilation under some combination of options without
actually building and running the test-suite on all combinations.  The fewer
compile-time options, the more likely the code will compile with every
combination of them.

So please think carefully before adding more compile-time options.  They're
probably OK for experimental features (but should go away once a feature is no
longer experimental).  Options to instrument a build for special purposes
(debug, profiling, etc) are also acceptable.  Disabling whole features probably
isn't (e.g. the --disable-backend-XXX options we already have are dubious,
though being able to disable the remote backend can be useful when trying to
get Xapian going on a platform).

Makefile Portability
====================

We don't want to force those building Xapian from the source distribution to
have to use GNU make.  Requiring GNU make for "make dist" isn't such a problem
but it's probably better to use portable constructs everywhere to avoid
problems when people move or copy code between targets.  If you do make use
of non-portable constructs where it's OK, add a comment noting the special
circumstances which justify doing so.

Here's an incomplete list of things to avoid:

* Don't use "$(RM)" - it's defined by GNU make, but using it actually harms
  portability as other makes don't define it.  Use plain "rm" instead.

* Don't use "%" pattern rules - these are GNU make specific.  Use an
  implicit rule (e.g. ".c.o:") if you can.  Otherwise, write out each version
  explicitly.

* Don't use "$<" except in implicit rules.  This is an annoying restriction,
  as using "$<" makes it much easier to make VPATH builds work.  But it's only
  portable in implicit rules.  Tips for rewriting - if it's a source file,
  write it as::

    $(srcdir)/foo.ext

  If it's a generated object file or similar, just write the name as is.  The
  tricky case is a generated file which isn't in git but is shipped in the
  distribution tarball, as such a file could be in either the source or build
  tree.  Use this trick to make sure it's found whichever directory it's in::

    `test -f foo.ext || echo '$(srcdir)/'`foo.ext

* Don't use "exit 0" to make a rule fail.  Use "false" instead.  BSD make
  doesn't like "exit 0" in a rule.

* Don't use make conditionals.  Automake offers conditionals which may be
  of use, and these are implemented to work with any make.  See the automake
  manual for details, and a few caveats.

* The list of portable utilities is:

    cat cmp cp diff echo egrep expr false grep install-info
    ln ls mkdir mv pwd rm rmdir sed sleep sort tar test touch true

  Note that versions of these (GNU versions in particular) support switches
  which aren't portable - notably, "test -r" isn't portable; neither is
  "cp -a".  And note that "mkdir -p" isn't portable - the semantics vary.
  The autoconf manual has some useful information about writing portable
  shell code (most of it not specific to autoconf)::

    https://www.gnu.org/software/autoconf/manual/autoconf.html#Portable-Shell

* Don't use "include" - it's not present in BSD make (at least some versions
  have ".include" instead, but that doesn't really seem to help...)  Automake
  provides a configure-time include, which may provide a replacement for some
  uses of "include".

* It appears that BSD make only supports VPATH for implicit rules (e.g.
  ".c.o:") - there's certainly a restriction there which is not present in GNU
  make.  We used to try to work around this, but now we use AM_MAINTAINER_MODE
  to disable rules which are only needed by those developing Xapian (these were
  the rules which caused problems).  And we recommend those developing Xapian
  use GNU make to avoid problems.

* Rules with multiple targets can cause problems for parallel builds.  These
  rules are really just a shorthand for multiple rules with the same
  prerequisites and commands, and it is fine to use them in this way.  However,
  a common temptation is to use them when a single invocation of a command
  generates multiple output files, by adding each of the output files as a
  target.  E.g., if a swig language module generates xapian_wrap.cc and
  xapian_wrap.h, it is tempting to add a single rule something like::

    # This rule has a problem
    xapian_wrap.cc xapian_wrap.h: xapian.i
            SWIG_commands

  This can result in SWIG_commands being run twice, in parallel.  If
  SWIG_commands generates any temporary files, the two invocations can
  interfere causing one of them to fail.

  Instead of this rule, one solution is to pick one of the output files as a
  primary target, and add a dependency for the second output file on the first
  output file::

    # This rule also has a problem
    xapian_wrap.h: xapian_wrap.cc
    xapian_wrap.cc: xapian.i
            SWIG_commands

  This ensures that make knows that only one invocation of SWIG_commands is
  necessary, but could result in problems if the invocation of SWIG_commands
  failed after creating xapian_wrap.cc, but before creating xapian_wrap.h.
  Instead, we recommend creating an intermediate target::

    # This rule works in most cases
    xapian_wrap.cc xapian_wrap.h: xapian_wrap.stamp
    xapian_wrap.stamp: xapian.i
            SWIG_commands
            touch $@

  Because the intermediate target is only touched after the commands have
  executed successfully, subsequent builds will always retry the commands if an
  error occurs.  Note that the intermediate target cannot be a "phony" target
  because this would result in the commands being re-run for every build.

  However, this rule still has a problem - if the xapian_wrap.cc and
  xapian_wrap.h files are removed, but the xapian_wrap.stamp file is not, the
  .cc and .h files will not be regenerated.  There is no simple solution to
  this, but the following is a recipe taken from the automake manual which
  works.  For details of *why* it works, see the section in the automake manual
  titled "Multiple Outputs"::

    # This rule works even if some of the output files were removed
    xapian_wrap.cc xapian_wrap.h: xapian_wrap.stamp
    ## Recover from the removal of $@.  A full explanation of these rules is in
    ## the automake manual under the heading "Multiple Outputs".
            @if test -f $@; then :; else \
              trap 'rm -rf xapian_wrap.lock xapian_wrap.stamp' 1 2 13 15; \
              if mkdir xapian_wrap.lock 2>/dev/null; then \
                rm -f xapian_wrap.stamp; \
                $(MAKE) $(AM_MAKEFLAGS) xapian_wrap.stamp; \
                rmdir xapian_wrap.lock; \
              else \
                while test -d xapian_wrap.lock; do sleep 1; done; \
                test -f xapian_wrap.stamp; exit $$?; \
              fi; \
            fi
    xapian_wrap.stamp: xapian.i
            SWIG_commands
            touch $@

* This is actually a robustness point, not portability per se.  Rules which
  generate files should be careful not to leave a partial file in place if
  there's an error as it will have a timestamp which leads make to believe it's
  up-to-date.  So this is bad::

    foo.cc: script.pl
            $(PERL) script.pl > foo.cc

  This is better::

    foo.cc: script.pl
            $(PERL) script.pl > foo.tmp
            mv foo.tmp foo.cc

  Alternatively, pass the output filename to the script and make sure you
  delete the output on error or a signal (although this approach can leave
  a partial file in place if the power fails).  All used Makefile.am-s and
  scripts have been checked (and fixed if required) as of 2003-07-10 (didn't
  check xapian-bindings).

* Another robustness point - if you add a non-file target to a makefile, you
  should also list it in ".PHONY".  Otherwise your target won't get remade
  reliably if someone creates a file with the same name in their tree.  For
  example::

    .PHONY: hello goodbye

    hello:
            echo hello

    goodbye:
            echo goodbye

And lastly a style point - using "@" to suppress echoing of commands being
executed removes choice from the user - they may want to see what commands
are being executed.  And if they don't want to, many versions of make support
the use of "make -s" to suppress the echoing of commands.

Using @echo on a message sent to stdout or stderr is acceptable (since it
avoids showing the message twice).  Otherwise don't use "@" - it makes it
harder to track down problems in the makefiles.

Naming of Scripts
=================

Scripts generally should *not* have an extension indicating the language they
are currently implemented in (e.g. ``runtest`` rather than ``runtest.sh`` or
``runtest.pl``).  The problem with such an extension is that if we decide
to reimplement the script in a different language, we either have to rename
the script (which is annoying as people will be used to the name, and may
have embedded it in their own scripts), or we have a script with a confusing
name (e.g. a Python script with extension ``.pl``).

The above reasoning doesn't apply to scripts which have to be in a particular
language for some reason, though for consistency they probably shouldn't get
an extension either, unless there's a good reason to have one.

Use of Assert
=============

Use Assert to perform internal consistency checks, and to check for invalid
arguments to functions and methods (e.g. passing a NULL pointer when this isn't
permitted).  It should *NOT* be used to check for error conditions such as
file read errors, memory allocation failing, etc (since we want to perform such
checks in non-debug builds too).

File format errors should also not be tested with Assert - we want to catch
a corrupted database or a malformed input file in a non-debug build too.

There are several variants of Assert:

- Assert(P) -- asserts that expression P is true.

- AssertRel(a,rel,b) -- asserts that (a rel b) is true - rel can be a boolean
  relational operator, i.e. one of ``==``, ``!=``, ``>``, ``>=``, ``<``,
  ``<=``.  The message given if the assertion fails reports the values of
  a and b, so ``AssertRel(a,<,b);`` is more helpful than ``Assert(a < b);``

- AssertEq(a,b) -- shorthand for AssertRel(a,==,b).

- AssertEqDouble(a,b) -- asserts a and b differ by less than DBL_EPSILON

- AssertParanoid(P) -- a particularly expensive assertion.  If you want a build
  with Asserts enabled, but without a great performance overhead, then
  pass --enable-assertions=partial to configure: AssertParanoids won't be
  checked, but Asserts will.  You can also use AssertRelParanoid
  and AssertEqParanoid.

- CompileTimeAssert(P) -- this has now been removed, since we require C++11
  support from the compiler, and C++11 added ``static_assert``.
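
To show why passing the operator separately lets ``AssertRel`` report the
operand values, here is a minimal sketch of a macro in the same spirit.  The
names ``SKETCH_ASSERT_REL`` and ``check_less()`` are hypothetical, and
``std::logic_error`` merely stands in for Xapian::AssertionError; this isn't
the library's actual implementation:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for AssertRel(): on failure, throw an exception whose
// message contains the expression text and the values of both operands.
// Note that this simple version evaluates A and B twice when the check fails.
#define SKETCH_ASSERT_REL(A, REL, B) \
    do { \
        if (!((A) REL (B))) { \
            std::ostringstream msg; \
            msg << #A " " #REL " " #B << " : values were " \
                << (A) << " and " << (B); \
            throw std::logic_error(msg.str()); \
        } \
    } while (0)

// Returns the failure message, or an empty string if the assertion holds.
inline std::string check_less(int a, int b) {
    try {
        SKETCH_ASSERT_REL(a, <, b);
    } catch (const std::logic_error& e) {
        return e.what();
    }
    return std::string();
}
```

Because the macro receives ``a``, ``<`` and ``b`` as separate arguments it can
print each operand, which a plain ``Assert(a < b)`` can't do as it only sees
the result of the comparison.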

Marking Features as Deprecated
==============================

In the API headers, a feature (a class, method, function, enum, typedef, etc)
can be marked as deprecated by using the XAPIAN_DEPRECATED() or
XAPIAN_DEPRECATED_CLASS macros.  Note that you can't deprecate a preprocessor
macro.

For compilers with a suitable mechanism (such as GCC, clang and MSVC) this
causes compile-time warning messages to be emitted for any use of the
deprecated feature.  For compilers without support, the macro just expands to
its argument.

Sometimes a deprecated feature will also be removed from the library itself
(particularly something like a typedef), but if the feature is still used
inside the library (for example, so we can define class methods), then use
XAPIAN_DEPRECATED_EX() or XAPIAN_DEPRECATED_CLASS_EX instead, which will only
issue a warning in user code (this relies on user code including xapian.h
and library code including individual headers).

You must add this line to any API header which uses XAPIAN_DEPRECATED() or
XAPIAN_DEPRECATED_CLASS::

    #include <xapian/deprecated.h>

When marking a feature as deprecated, document the deprecation in
docs/deprecation.rst.  When actually removing deprecated features, please tidy
up by removing the inclusion of <xapian/deprecated.h> from any file which no
longer marks any features as deprecated.

The XAPIAN_DEPRECATED() macro should wrap the whole declaration except for the
semicolon and any "definition" part, for example::

    XAPIAN_DEPRECATED(int old_function(double arg));

    class Foo {
      public:
        XAPIAN_DEPRECATED(int old_method());

        XAPIAN_DEPRECATED(int old_const_method() const);

        XAPIAN_DEPRECATED(virtual int old_virt_method()) = 0;

        XAPIAN_DEPRECATED(static int old_static_method());

        XAPIAN_DEPRECATED(static const int OLD_CONSTANT) = 42;
    };

Mark a class as deprecated by inserting ``XAPIAN_DEPRECATED_CLASS`` after the
class keyword like so::

    class XAPIAN_DEPRECATED_CLASS Foo {
      public:
        Foo() { }

        // ...
    };

With recent versions of GCC (4.4.7 allows this, 3.3.5 doesn't), you can
simply mark a method defined inline in a class with ``XAPIAN_DEPRECATED()``
like so::

    class Foo {
      public:
        // This failed to compile with GCC 3.3.5.
        XAPIAN_DEPRECATED(int old_inline_method()) { return 42; }
    };

Xapian 1.3.x and later require at least GCC 4.7, so you can now just use the
approach above.
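
The placement rule (wrap the declaration, but not the semicolon or any
definition) is easier to see with a sketch of how such a macro might expand.
``SKETCH_DEPRECATED``, ``old_function()`` and ``new_function()`` are
hypothetical names for this example, and the definitions in
<xapian/deprecated.h> may differ from what's shown here:

```cpp
// Sketch only: not necessarily how <xapian/deprecated.h> defines its macros.
#if defined __GNUC__
// GCC and clang accept the attribute after the declarator, so the macro must
// end before any "= 0", initialiser or function body.
# define SKETCH_DEPRECATED(D) D __attribute__((__deprecated__))
#elif defined _MSC_VER
// MSVC wants the marker before the declaration instead.
# define SKETCH_DEPRECATED(D) __declspec(deprecated) D
#else
// Compilers without a suitable mechanism: expand to just the argument.
# define SKETCH_DEPRECATED(D) D
#endif

// Declarations, as they might appear in a header.
SKETCH_DEPRECATED(int old_function(double arg));
int new_function(double arg);

// Definitions, as they might appear in a source file.  Defining a deprecated
// function isn't a use of it, so this doesn't trigger the warning.
int new_function(double arg) { return static_cast<int>(arg); }
int old_function(double arg) { return new_function(arg); }
```

Code which calls ``old_function()`` should then get a compile-time warning on
compilers with a suitable mechanism, while calls to ``new_function()`` compile
quietly.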

Submitting Patches:
===================

If you have a patch to fix a problem in Xapian, or to add a new feature,
please send it to us for inclusion.  Any major changes should be discussed
on the xapian-devel mailing list first:
<https://xapian.org/lists>

Also, please read the following section on licensing of patches before
submitting a patch.

We find patches in unified diff format easiest to read.  If you're using
git, then "git diff" is good (or "git format-patch" for a patch series).  If
you're working from a tarball, you can unpack a second clean copy of the files
and compare the two versions with "diff -pruN" (-p reports the function name
for each chunk, -r acts recursively, -u does a unified diff, and -N shows
new files in the diff).  Alternatively "ptardiff" (which comes with perl, at
least on Debian and Ubuntu) can diff against the original tarball, unpacking
it on the fly.

Please set the width of a tab character in your editor to 8 spaces, and use
Unix line endings (i.e. LF, not CR+LF).  Failing to do so will make it much
harder for us to merge in your changes.

We don't currently have a formal coding standards document, but please try
to follow the style of the existing code.  In particular:

* Indent C++ code by 4 spaces for a new indentation level, and set your editor
  to tab-fill indentation (with a tab being 8 spaces wide).

  As an exception, "public", "protected" and "private" declarations in classes
  and structs should be indented by 2 spaces, and the following code should be
  indented by 2 more spaces::

    class Foo {
      public:
        void method();
    };

  The rationale for this exception is that class definitions in header files
  often have fairly long lines, so losing an indent level to the access
  specifier tends to make class definitions less readable.

  The default access for a class is always "private", so there's no need
  to specify that explicitly - in other words, write this::

    class Foo {
        int internal_method();

      public:
        int external_method();
    };

  Don't write this::

    class Foo {
      private:
        int internal_method();

      public:
        int external_method();
    };

  If a class only contains public methods and data, consider declaring it as a
  "struct" (the only difference in C++ is that the default access for a
  struct is "public").

* Put a space before the "(" after control flow constructs like "for", "if",
  "while", etc.  Don't put a space before the "(" in function calls.  So
  write "if (strlen(p) > 10)" not "if(strlen (p) > 10)".

* When "if", "else", "for", "while", "do", "switch", "case", "default", "try",
  or "catch" is followed by a block enclosed in braces, the opening brace
  should be on the same line, like so::

    if (x > 12) {
        foo(x);
        x = 12;
    } else {
        bar(x);
    }

  The rationale for this is that it conserves vertical space (allowing more
  code to fit on screen) without reducing readability.

* If you have an empty loop body, use ``{ }`` rather than ``;`` as the former
  stands out more clearly to the reader (but also consider if the code might be
  clearer written a different way).

* Prefer "++i;" to "i++;", "i += 1;", or "i = i + 1;".  For simple integer
  variables these should generate equivalent (if not identical) code, but if i
  is an iterator object then the pre-increment form can be more efficient in
  some cases with some compilers.  It's simpler and more consistent to always
  use the pre-increment form (unless you make use of the old value which the
  post-increment form returns).  For the same reasons, prefer "--i;" to "i--;",
  "i -= 1;", or "i = i - 1;".

* Prefer "container.empty()" to "container.size() == 0" (and
  "!container.empty()" to "container.size() != 0" or "container.size() > 0").
  Some containers (e.g. std::forward_list) support "empty()" but not "size()".
  Pre-C++11 finding the size of a container wasn't necessarily a constant time
  operation for some containers (e.g. std::list with GCC) - that's no longer
  the case for any STL containers since C++11, but it could still be true for
  non-STL containers.  Also the "empty()" form is a little more concise and
  makes the intent of the test more explicit.

* Prefer not to use "else" when the control flow is diverted elsewhere at the
  end of the "if" block (e.g. by "return", "continue", "break", "throw").  This
  eliminates a level of indentation from the code in the "else" block, and
  typically makes the control flow logic clearer.  For example::

    if (x == 0) {
        foo();
        return;
    }

    while (x--) {
        bar();
    }

  rather than::

    if (x == 0) {
        foo();
        return;
    } else {
        while (x--) {
            bar();
        }
    }
 1331 
 1332 * For standard ISO C headers, prefer the C++ form for ISO C headers (e.g.
 1333   "#include <cstdlib>" rather than "#include <stdlib.h>") unless there's a good
 1334   reason (e.g. portability) to do otherwise.  Be sure to document such
 1335   exceptions to avoid another developer changing them to the standard form.
 1336   Global exceptions: <signal.h> (lots of POSIX stuff which e.g. Sun's compiler
 1337   doesn't provide in <csignal>).
 1338 
 1339 * For standard ISO C++ headers, *always* use the ISO C++ form '#include <list>'
 1340   (pre-ISO compilers used '#include <list.h>', but GCC has generated a warning
 1341   for this form for years, and GCC 4.3 dropped support entirely).
 1342 
 1343 * Some guidelines for efficient use of std::string:
 1344 
 1345   + When passing an empty string to a method expecting ``const std::string &``
 1346     prefer ``std::string()`` to ``""`` or ``std::string("")`` as the first form
 1347     is more likely to directly use a special "empty string representation" (it
 1348     does with GCC at least).
 1349 
 1350   + To make a string object empty, ``s.resize(0)`` (if you want to keep the
 1351     current reserved space) or ``s = string()`` (if you don't) seem the best
 1352     options.
 1353 
 1354   + Use ``std::string::assign()`` rather than building a temporary string
 1355     object and assigning that.  For example, ``foo = std::string(ptr, len);``
 1356     is better written as ``foo.assign(ptr, len);``.
 1357 
 1358   + It's generally better to build up strings using ``+=`` rather than
 1359     combining series of components with ``+``.  So ``foo = a + " and " + c`` is
 1360     better written as ``foo = a; foo += " and "; foo += c;``.  It's possible
 1361     for compilers to handle the former without a lot of temporary string
 1362     objects by returning a proxy object to allow the concatenation to happen
 1363     lazily, but not all compilers do this, and it's likely to still have some
 1364     overhead.  Note that GCC 4.1 seems to produce larger code in some cases for
 1365     the latter approach, but it's a definite win with GCC 4.4.
 1366 
 1367   * ``std::string(1, '\0')`` seems to be slightly more efficient than
 1368     ``std::string("", 1)`` for constructing a std::string containing a single
 1369     ASCII nul character.
 1370 
 1371 * Prefer ``new SomeClass`` to ``new SomeClass()``, since the latter tends to
 1372   lead one to write ``SomeClass foo();` which is a function prototype, and not
 1373   equivalent to the variable definition ``SomeClass foo``.  However, note that
 1374   ``new SomePODType()`` is *not* the same as ``new SomePODType`` (if
 1375   SomePODType is a POD (Plain Old Data) type) - the former will zero-initialise
 1376   scalar members of SomePODType.
 1377 
 1378 * When catching an exception which is an object, do it by const reference, so
 1379   like this::
 1380 
 1381       try {
 1382 	  foo();
 1383       } catch (const ErrorClass &e) {
 1384 	  bar(e);
 1385       }
 1386 
 1387   Catching by value is bad because it "slices" the object if an object of a
 1388   derived type is thrown.  Even if derived types aren't a worry, it also causes
 1389   the copy constructor to be called needlessly.
 1390 
 1391   See also: https://isocpp.org/wiki/faq/exceptions#what-to-catch
 1392 
 1393   A const reference is preferable to a non-const reference as it stops the
 1394   object being inadvertently modified.  In the rare cases when you want to
 1395   modify the caught object, a non-const reference is OK.
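
As a rough illustration, several of the guidelines above can be combined in
one small sketch.  The names used here (BaseError, DerivedError,
thrown_name(), join(), and set_from_buffer()) are hypothetical and exist only
for this example - they are not part of Xapian.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <string>

// Hypothetical error classes, used only to show that catching by const
// reference preserves the dynamic type of the thrown object.
struct BaseError {
    virtual ~BaseError() { }
    virtual std::string name() const { return "BaseError"; }
};

struct DerivedError : BaseError {
    std::string name() const override { return "DerivedError"; }
};

// Throw a DerivedError and report the name seen by the handler.
std::string thrown_name() {
    std::string result;
    try {
        throw DerivedError();
    } catch (const BaseError &e) {
        // No slicing: the virtual call reaches DerivedError::name().
        result = e.name();
    }
    return result;
}

// Join words with ", " between them.
std::string join(const std::forward_list<std::string> &words) {
    // std::forward_list provides empty() but not size().
    if (words.empty()) {
        return std::string();  // Rather than "" or std::string("").
    }

    // No "else" here: the "if" block above always returns.
    std::string result;
    for (auto i = words.begin(); i != words.end(); ++i) {  // Pre-increment.
        if (!result.empty()) result += ", ";
        result += *i;  // Build up with += rather than chaining +.
    }
    return result;
}

// Replace the contents of s with len bytes starting at ptr.
void set_from_buffer(std::string &s, const char *ptr, std::size_t len) {
    assert(ptr != nullptr || len == 0);
    s.assign(ptr, len);  // Rather than s = std::string(ptr, len);
}
```

For instance, calling ``join()`` on a list holding "a", "b", and "c" should
give "a, b, c", and ``thrown_name()`` should report the derived class because
the handler sees the original object rather than a sliced copy.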
 1396 
We will do our best to give credit where credit is due - if we have used
patches from you, or received helpful reports or advice, we will add your name
to the AUTHORS file (unless you specifically request us not to).  If you see we
have forgotten to do this, please draw it to our attention so that we can
address the omission.

Licensing of patches
====================

If you want a patch to be considered for inclusion in the Xapian sources, you
must own the copyright on this patch.  Employers often claim copyright on code
written by their employees (even if the code is written in their spare time),
so please check with your employer if this applies.  Be aware that even if you
are a student your university may try to claim some rights on code which you
write.

Patches which are submitted to Xapian will only be included if the copyright
holder(s) dual-license them under each of the following licences:

 - GPL version 2 and all later versions (see the file "COPYING" for details).
 - MIT/X license::

     Copyright (c) <year> <copyright holders>

     Permission is hereby granted, free of charge, to any person obtaining a copy
     of this software and associated documentation files (the "Software"), to
     deal in the Software without restriction, including without limitation the
     rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
     sell copies of the Software, and to permit persons to whom the Software is
     furnished to do so, subject to the following conditions:

     The above copyright notice and this permission notice shall be included in
     all copies or substantial portions of the Software.

     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
     IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
     FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
     AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
     LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
     IN THE SOFTWARE.

The current distribution of Xapian contains many files which are only licensed
under the GPL, but we are working towards being able to distribute Xapian under
a more permissive license, and are not willing to accept patches which we will
have to rewrite before this can happen.
 1443 
Tips for Submitting a Good Patch
================================

1) Make sure that the documentation is updated
----------------------------------------------

 * API classes, methods, functions, and types must be documented by
   documentation comments alongside the declaration in ``include/xapian/*.h``.
   These are collated by doxygen - see doxygen's documentation for details
   of the supported syntax.  We've decided to prefer to use @ rather than \
   to introduce doxygen commands (the choice is essentially arbitrary, but
   \ introduces C/C++ escape sequences so @ is likely to make for easier to
   read mark up for C/C++ coders).

 * The documentation comments don't give users a good overview, so we also
   need documentation which gives a good overview of how to achieve particular
   tasks.  In particular, major new functionality should have its own "topic"
   document, or extend an existing topic document if more appropriate.

 * Internal classes, etc. should also be documented by documentation comments
   where they are declared.

2) Make sure the tests are right
--------------------------------

 * If you're adding a feature, also add feature tests for it.  These both
   ensure that the feature isn't broken to start with and detect if later
   changes stop it working as intended.

 * If you've fixed a bug, make sure there's a regression test which
   fails on the existing code and succeeds after your changes.

 * If you're adding a new testcase to demonstrate an existing bug, and not
   checking a fix in at the same time, mark the testcase as a known failure by
   calling ``XFAIL("explanatory message")`` at the start of your testcase (if
   necessary this can be conditional on backend or other factors - the backend
   case has explicit support via ``XFAIL_FOR_BACKEND("backend", "message")``).

   This will mean that this testcase failing will be reported as "XFAIL" which
   won't cause the test run to fail.  If such a testcase in fact passes, that
   gets reported as "XPASS" and *will* cause the test run to fail.  A testcase
   should not be flagged as "XFAIL" for a long time, but it can be useful to be
   able to add such testcases during development.  It also allows a patch
   series which fixes a bug to first demonstrate the bug via a new testcase
   marked as "XFAIL", then fix the bug and remove the "XFAIL" - this makes it
   clear that the regression test actually failed before the fix.

   Note that failures which are due to valgrind errors or leaked fds are not
   affected by this macro - such errors are inherently not suitable for "XFAIL"
   as they go away when the testsuite is run without valgrind or on a platform
   where our fd leak detector code isn't supported.

 * Make sure all existing tests continue to pass.

If you don't know how to write tests using the Xapian test rig, then
ask.  It's reasonably simple once you've done it once.  There is a brief
introduction to the Xapian test system in ``docs/tests.html``.

3) Make sure the attributions are right
---------------------------------------

 * If necessary, modify the copyright statement at the top of any
   files you've altered. If there is no copyright statement, you may
   add one (there are a couple of Makefile.am's and similar that don't
   have copyright statements; anything that small doesn't really need
   one anyway, so it's a judgement call).  If you've added files which
   you've written from scratch, they should include the GPL boilerplate
   with your name only.

 * If you're not in there, add yourself to the AUTHORS file.

4) Commit
---------

 * Commit:

   + If there's a trac ticket or other reference for the bug, mention it in the
     commit message - it's a great help to future developers trying to work out
     why a change was made.

5) Consider backporting
-----------------------

 * If there's an active release branch, check if the bug is present in that
   branch, and if the fix is appropriate to backport - if the fix breaks ABI
   compatibility or is very invasive, you need to fix it in a different way
   for the release branch, or decide not to backport the fix.

6) Update trac
--------------

 * If there's a related trac ticket, update it (if the issue is completely
   addressed by the changes you've made, then close it).

 * Update the release notes for the most recent release with a copy of the
   patch.  If the commit from git applies cleanly, you can just link to
   it.  If it fails to apply, please attach an adjusted patch which does.
   If there are conflicts in test cases which aren't easy to resolve, it is
   acceptable to just drop those changes from the patch if we can still be
   confident that the issue is actually fixed by the patch.
 1544 
API Structure Notes
===================

We use reference counted pointers for most API classes.  These are implemented
using Xapian::Internal::intrusive_ptr, the implementation of which is exposed
for efficiency, and because it's unlikely we'll need to change it frequently,
if at all.

For the reference counted classes, the API class (e.g. Xapian::Enquire) is
really just a wrapper around a reference counted pointer.  This points to an
internal class (e.g. Xapian::Enquire::Internal).  The reference counted
pointer is a member variable of the API class called internal.  Conceptually
this member is private, though it typically isn't declared as private (this
is to avoid littering the external headers with friend declarations for
non-API classes).
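
The general shape of this pattern can be sketched as follows.  This is a
simplified illustration only: Widget and Widget::Internal are hypothetical
names, and the counting is done by hand with a raw pointer rather than with
Xapian::Internal::intrusive_ptr, whose actual interface is not shown here.

```cpp
#include <cassert>
#include <string>

// Hypothetical API class: a thin handle around a shared internal object.
class Widget {
  public:
    // Hypothetical internal class which holds the state and its own count.
    class Internal {
      public:
        unsigned ref_count;
        std::string label;

        explicit Internal(const std::string &label_)
            : ref_count(0), label(label_) { }
    };

    // Conceptually private, but left public as described above.
    Internal *internal;

    explicit Widget(const std::string &label)
        : internal(new Internal(label)) {
        ++internal->ref_count;
    }

    // Copying a handle shares the internal object instead of cloning it.
    Widget(const Widget &o) : internal(o.internal) {
        ++internal->ref_count;
    }

    Widget &operator=(const Widget &o) {
        // Take the new reference first so that self-assignment is safe.
        ++o.internal->ref_count;
        release();
        internal = o.internal;
        return *this;
    }

    ~Widget() { release(); }

    const std::string &get_label() const { return internal->label; }
    void set_label(const std::string &label) { internal->label = label; }
    unsigned use_count() const { return internal->ref_count; }

  private:
    // Drop one reference, deleting the internal object with the last one.
    void release() {
        assert(internal->ref_count > 0);
        if (--internal->ref_count == 0) delete internal;
    }
};
```

With this shape, copying a handle is cheap, and a change made through one
copy (for example via ``set_label()``) is visible through every other copy
which shares the same internal object.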
 1560 
There are a few exceptions to the reference counted structure, such as
MSetIterator and ESetIterator which have an exposed implementation.  Tests show
this makes a substantial difference to speed (it's ~20% faster) in typical
cases of iterator use.

The postfix operator++ for iterators should be implemented inline in terms
of the prefix form as described by Joe Buck on the gcc mailing list
- excerpt from https://article.gmane.org/gmane.comp.gcc.devel/50201 ::

        class some_iterator {
        public:
            // ...
            some_iterator& operator++();

            some_iterator operator++(int) {
                some_iterator tmp = *this;
                operator++();
                return tmp;
            }
        };

    The compiler is allowed to assume that the copy constructor only does
    a copy, and to optimize away unneeded copy operations.  The result
    in this case should be that, for some_iterator above, using the
    postfix operator without using the result should give code equivalent
    to using the prefix operator.

    Now, for [GCC 3.4], you'll find that the dead uses of tmp are only
    completely optimized away if tmp has only one data member that can fit in a
    register.  [GCC 4.0 will do] better, and you should find that this style
    comes very close to eliminating any penalty from "incorrect" use of the
    postfix form.

Xapian's PostingIterator, TermIterator, PositionIterator, and ValueIterator all
have only one data member which fits in a register.
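
To make the excerpt concrete, here is a small self-contained sketch of an
iterator written in this style.  CountingIterator, take_and_advance(), and
sum_range() are hypothetical names invented for this example - they are not
Xapian classes or functions.

```cpp
#include <cassert>

// Hypothetical iterator over consecutive integers.  It has a single data
// member which fits in a register, as the excerpt above recommends.
class CountingIterator {
    int value;

  public:
    explicit CountingIterator(int start) : value(start) { }

    int operator*() const { return value; }

    // The prefix form does the real work.
    CountingIterator &operator++() {
        ++value;
        return *this;
    }

    // The postfix form is inline and written in terms of the prefix form.
    CountingIterator operator++(int) {
        CountingIterator tmp = *this;
        operator++();
        return tmp;
    }

    bool operator==(const CountingIterator &o) const {
        return value == o.value;
    }

    bool operator!=(const CountingIterator &o) const {
        return !(*this == o);
    }
};

// Return the value at it, leaving it advanced by one.  This is a case where
// the old value is needed, so the postfix form is appropriate.
int take_and_advance(CountingIterator &it) {
    CountingIterator old = it++;
    assert(*it == *old + 1);  // it has moved on; old has not.
    return *old;
}

// Sum the half-open range [begin, end) using the prefix form.
int sum_range(CountingIterator begin, CountingIterator end) {
    int total = 0;
    for (CountingIterator i = begin; i != end; ++i) {
        total += *i;
    }
    return total;
}
```

Here ``sum_range()`` never needs the old iterator value so it uses ``++i``,
while ``take_and_advance()`` does need it and so uses the postfix form.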
 1596 
Handy tips for aiding development
=================================

If you find you are repeatedly changing the API headers (in include/)
during development, then you may become annoyed that the docs/ subdirectory
will rebuild the doxygen documentation every time you run "make" since this
takes a while.  You can disable this temporarily (if you're using GNU make),
by creating a file "docs/GNUmakefile" containing these two lines::

    %:
    	@echo "Skipping 'make $@' in docs"

Note that the whitespace at the start of the second line needs to be a
single "tab" character!

Don't forget to remove (or rename) this and check the documentation builds
before committing or generating a patch though!

If you are using an editor or other tool capable of running syntax checks as
you work, you can use the ``make`` target ``check-syntax``.  For Emacs users
this works well with flymake.  Usage from a shell::

    make check-syntax check_sources=api/omdatabase.cc


How to make a release
=====================

See https://github.com/xapian/xapian-developer-guide/blob/master/releases/index.rst
where the documentation for this now lives.

.. vim: syntax=rst