"Fossies" - the Fresh Open Source Software Archive

Member "xdelta3-3.0.11/xdelta3.c" (27 Dec 2015, 137058 Bytes) of package /linux/misc/xdelta3-3.0.11.tar.gz:


As a special service "Fossies" has tried to format the requested source page into HTML format using (guessed) C and C++ source code syntax highlighting (style: standard) with prefixed line numbers and code folding option. Alternatively you can here view or download the uninterpreted source code file. For more information about "xdelta3.c" see the Fossies "Dox" file reference documentation.

    1 /* xdelta 3 - delta compression tools and library
    2  * Copyright (C) 2001, 2003, 2004, 2005, 2006, 2007,
    3  * 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015. Joshua P. MacDonald
    4  *
    5  *  This program is free software; you can redistribute it and/or modify
    6  *  it under the terms of the GNU General Public License as published by
    7  *  the Free Software Foundation; either version 2 of the License, or
    8  *  (at your option) any later version.
    9  *
   10  *  This program is distributed in the hope that it will be useful,
   11  *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   12  *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   13  *  GNU General Public License for more details.
   14  *
   15  *  You should have received a copy of the GNU General Public License
   16  *  along with this program; if not, write to the Free Software
   17  *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
   18 
   19    -------------------------------------------------------------------
   20 
   21                    Xdelta 3
   22 
   23    The goal of this library is to to implement both the (stand-alone)
   24    data-compression and delta-compression aspects of VCDIFF encoding, and
   25    to support a programming interface that works like Zlib
   26    (http://www.gzip.org/zlib.html). See RFC3284: The VCDIFF Generic
   27    Differencing and Compression Data Format.
   28 
   29    VCDIFF is a unified encoding that combines data-compression and
   30    delta-encoding ("differencing").
   31 
   32    VCDIFF has a detailed byte-code instruction set with many features.
   33    The instruction format supports an immediate size operand for small
   34    COPYs and ADDs (e.g., under 18 bytes).  There are also instruction
   35    "modes", which are used to compress COPY addresses by using two
   36    address caches.  An instruction mode refers to slots in the NEAR
   37    and SAME caches for recent addresses.  NEAR remembers the
   38    previous 4 (by default) COPY addresses, and SAME catches
   39    frequent re-uses of the same address using a 3-way (by default)
   40    256-entry associative cache of [ADDR mod 256], the encoded byte.
   41    A hit in the NEAR/SAME cache requires 0/1 ADDR bytes.
   42 
   43    VCDIFF has a default instruction table, but an alternate
   44    instruction tables may themselves be be delta-compressed and
   45    included in the encoding header.  This allows even more freedom.
   46    There are 9 instruction modes in the default code table, 4 near, 3
   47    same, VCD_SELF (absolute encoding) and VCD_HERE (relative to the
   48    current position).
   49 
   50    ----------------------------------------------------------------------
   51 
   52                   Algorithms
   53 
   54    Aside from the details of encoding and decoding, there are a bunch
   55    of algorithms needed.
   56 
   57    1. STRING-MATCH.  A two-level fingerprinting approach is used.  A
   58    single loop computes the two checksums -- small and large -- at
   59    successive offsets in the TARGET file.  The large checksum is more
   60    accurate and is used to discover SOURCE matches, which are
   61    potentially very long.  The small checksum is used to discover
   62    copies within the TARGET.  Small matching, which is more expensive,
   63    usually dominates the large STRING-MATCH costs in this code - the
   64    more exhaustive the search, the better the results.  Either of the
   65    two string-matching mechanisms may be disabled.
   66 
   67    2. INSTRUCTION SELECTION.  The IOPT buffer here represents a queue
   68    used to store overlapping copy instructions.  There are two possible
   69    optimizations that go beyond a greedy search.  Both of these fall
   70    into the category of "non-greedy matching" optimizations.
   71 
   72    The first optimization stems from backward SOURCE-COPY matching.
   73    When a new SOURCE-COPY instruction covers a previous instruction in
   74    the target completely, it is erased from the queue.  Randal Burns
   75    originally analyzed these algorithms and did a lot of related work
   76    (\cite the 1.5-pass algorithm).
   77 
   78    The second optimization comes by the encoding of common very-small
   79    COPY and ADD instructions, for which there are special DOUBLE-code
   80    instructions, which code two instructions in a single byte.
   81 
   82    The cost of bad instruction-selection overhead is relatively high
   83    for data-compression, relative to delta-compression, so this second
   84    optimization is fairly important.  With "lazy" matching (the name
   85    used in Zlib for a similar optimization), the string-match
   86    algorithm searches after a match for potential overlapping copy
   87    instructions.  In Xdelta and by default, VCDIFF, the minimum match
   88    size is 4 bytes, whereas Zlib searches with a 3-byte minimum.  This
   89    feature, combined with double instructions, provides a nice
   90    challenge.  Search in this file for "black magic", a heuristic.
   91 
   92    3. STREAM ALIGNMENT.  Stream alignment is needed to compress large
   93    inputs in constant space.  See xd3_srcwin_move_point().
   94 
   95    4. WINDOW SELECTION.  When the IOPT buffer flushes, in the first call
   96    to xd3_iopt_finish_encoding containing any kind of copy instruction,
   97    the parameters of the source window must be decided: the offset into
   98    the source and the length of the window.  Since the IOPT buffer is
   99    finite, the program may be forced to fix these values before knowing
  100    the best offset/length.
  101 
  102    5. SECONDARY COMPRESSION.  VCDIFF supports a secondary encoding to
  103    be applied to the individual sections of the data format, which are
  104    ADDRess, INSTruction, and DATA.  Several secondary compressor
  105    variations are implemented here, although none is standardized yet.
  106 
  107    One is an adaptive huffman algorithm -- the FGK algorithm (Faller,
  108    Gallager, and Knuth, 1985).  This compressor is extremely slow.
  109 
  110    The other is a simple static Huffman routine, which is the base
  111    case of a semi-adaptive scheme published by D.J. Wheeler and first
  112    widely used in bzip2 (by Julian Seward).  This is a very
  113    interesting algorithm, originally published in nearly cryptic form
  114    by D.J. Wheeler. !!!NOTE!!! Because these are not standardized,
  115    secondary compression remains off by default.
  116    ftp://ftp.cl.cam.ac.uk/users/djw3/bred3.{c,ps}
  117    --------------------------------------------------------------------
  118 
  119                 Other Features
  120 
  121    1. USER CONVENIENCE
  122 
  123    For user convenience, it is essential to recognize Gzip-compressed
  124    files and automatically Gzip-decompress them prior to
  125    delta-compression (or else no delta-compression will be achieved
  126    unless the user manually decompresses the inputs).  The compressed
  127    represention competes with Xdelta, and this must be hidden from the
  128    command-line user interface.  The Xdelta-1.x encoding was simple, not
  129    compressed itself, so Xdelta-1.x uses Zlib internally to compress the
  130    representation.
  131 
  132    This implementation supports external compression, which implements
  133    the necessary fork() and pipe() mechanics.  There is a tricky step
  134    involved to support automatic detection of a compressed input in a
  135    non-seekable input.  First you read a bit of the input to detect
  136    magic headers.  When a compressed format is recognized, exec() the
  137    external compression program and create a second child process to
  138    copy the original input stream. [Footnote: There is a difficulty
  139    related to using Gzip externally. It is not possible to decompress
  140    and recompress a Gzip file transparently.  If FILE.GZ had a
  141    cryptographic signature, then, after: (1) Gzip-decompression, (2)
  142    Xdelta-encoding, (3) Gzip-compression the signature could be
  143    broken.  The only way to solve this problem is to guess at Gzip's
  144    compression level or control it by other means.  I recommend that
  145    specific implementations of any compression scheme store
  146    information needed to exactly re-compress the input, that way
  147    external compression is transparent - however, this won't happen
  148    here until it has stabilized.]
  149 
  150    2. APPLICATION-HEADER
  151 
  152    This feature was introduced in RFC3284.  It allows any application
  153    to include a header within the VCDIFF file format.  This allows
  154    general inter-application data exchange with support for
  155    application-specific extensions to communicate metadata.
  156 
  157    3. VCDIFF CHECKSUM
  158 
  159    An optional checksum value is included with each window, which can
  160    be used to validate the final result.  This verifies the correct source
  161    file was used for decompression as well as the obvious advantage:
  162    checking the implementation (and underlying) correctness.
  163 
  164    4. LIGHT WEIGHT
  165 
  166    The code makes efforts to avoid copying data more than necessary.
  167    The code delays many initialization tasks until the first use, it
  168    optimizes for identical (perfectly matching) inputs.  It does not
  169    compute any checksums until the first lookup misses.  Memory usage
  170    is reduced.  String-matching is templatized (by slightly gross use
  171    of CPP) to hard-code alternative compile-time defaults.  The code
  172    has few outside dependencies.
  173    ----------------------------------------------------------------------
  174 
  175         The default rfc3284 instruction table:
  176             (see RFC for the explanation)
  177 
  178            TYPE      SIZE     MODE    TYPE     SIZE     MODE     INDEX
  179    --------------------------------------------------------------------
  180        1.  Run         0        0     Noop       0        0        0
  181        2.  Add    0, [1,17]     0     Noop       0        0      [1,18]
  182        3.  Copy   0, [4,18]     0     Noop       0        0     [19,34]
  183        4.  Copy   0, [4,18]     1     Noop       0        0     [35,50]
  184        5.  Copy   0, [4,18]     2     Noop       0        0     [51,66]
  185        6.  Copy   0, [4,18]     3     Noop       0        0     [67,82]
  186        7.  Copy   0, [4,18]     4     Noop       0        0     [83,98]
  187        8.  Copy   0, [4,18]     5     Noop       0        0     [99,114]
  188        9.  Copy   0, [4,18]     6     Noop       0        0    [115,130]
  189       10.  Copy   0, [4,18]     7     Noop       0        0    [131,146]
  190       11.  Copy   0, [4,18]     8     Noop       0        0    [147,162]
  191       12.  Add       [1,4]      0     Copy     [4,6]      0    [163,174]
  192       13.  Add       [1,4]      0     Copy     [4,6]      1    [175,186]
  193       14.  Add       [1,4]      0     Copy     [4,6]      2    [187,198]
  194       15.  Add       [1,4]      0     Copy     [4,6]      3    [199,210]
  195       16.  Add       [1,4]      0     Copy     [4,6]      4    [211,222]
  196       17.  Add       [1,4]      0     Copy     [4,6]      5    [223,234]
  197       18.  Add       [1,4]      0     Copy       4        6    [235,238]
  198       19.  Add       [1,4]      0     Copy       4        7    [239,242]
  199       20.  Add       [1,4]      0     Copy       4        8    [243,246]
  200       21.  Copy        4      [0,8]   Add        1        0    [247,255]
  201    --------------------------------------------------------------------
  202 
  203              Reading the source: Overview
  204 
  205    This file includes itself in several passes to macro-expand certain
  206    sections with variable forms.  Just read ahead, there's only a
  207    little confusion.  I know this sounds ugly, but hard-coding some of
  208    the string-matching parameters results in a 10-15% increase in
  209    string-match performance.  The only time this hurts is when you have
  210    unbalanced #if/endifs.
  211 
  212    A single compilation unit tames the Makefile.  In short, this is to
  213    allow the above-described hack without an explodingMakefile.  The
  214    single compilation unit includes the core library features,
  215    configurable string-match templates, optional main() command-line
  216    tool, misc optional features, and a regression test.  Features are
  217    controled with CPP #defines, see Makefile.am.
  218 
  219    The initial __XDELTA3_C_HEADER_PASS__ starts first, the _INLINE_ and
  220    _TEMPLATE_ sections follow.  Easy stuff first, hard stuff last.
  221 
  222    Optional features include:
  223 
  224      xdelta3-main.h     The command-line interface, external compression
  225                         support, POSIX-specific, info & VCDIFF-debug tools.
  226      xdelta3-second.h   The common secondary compression routines.
  227      xdelta3-decoder.h  All decoding routines.
  228      xdelta3-djw.h      The semi-adaptive huffman secondary encoder.
  229      xdelta3-fgk.h      The adaptive huffman secondary encoder.
  230      xdelta3-test.h     The unit test covers major algorithms,
  231                         encoding and decoding.  There are single-bit
  232                         error decoding tests.  There are 32/64-bit file size
  233                         boundary tests.  There are command-line tests.
  234                         There are compression tests.  There are external
  235                         compression tests.  There are string-matching tests.
  236             There should be more tests...
  237 
  238    Additional headers include:
  239 
  240      xdelta3.h          The public header file.
  241      xdelta3-cfgs.h     The default settings for default, built-in
  242                         encoders.  These are hard-coded at
  243                         compile-time.  There is also a single
  244                         soft-coded string matcher for experimenting
  245                         with arbitrary values.
  246      xdelta3-list.h     A cyclic list template
  247 
  248    Misc little debug utilities:
  249 
  250      badcopy.c          Randomly modifies an input file based on two
  251                         parameters: (1) the probability that a byte in
  252                         the file is replaced with a pseudo-random value,
  253                         and (2) the mean change size.  Changes are
  254                         generated using an expoential distribution
  255                         which approximates the expected error_prob
  256             distribution.
  257    --------------------------------------------------------------------
  258 
  259    This file itself is unusually large.  I hope to defend this layout
  260    with lots of comments.  Everything in this file is related to
  261    encoding and decoding.  I like it all together - the template stuff
  262    is just a hack. */
  263 
  264 #ifndef __XDELTA3_C_HEADER_PASS__
  265 #define __XDELTA3_C_HEADER_PASS__
  266 
  267 #include "xdelta3.h"
  268 #include "xdelta3-internal.h"
  269 
  270 /***********************************************************************
  271  STATIC CONFIGURATION
  272  ***********************************************************************/
  273 
  274 #ifndef XD3_MAIN                  /* the main application */
  275 #define XD3_MAIN 0
  276 #endif
  277 
  278 #ifndef VCDIFF_TOOLS
  279 #define VCDIFF_TOOLS XD3_MAIN
  280 #endif
  281 
  282 #ifndef SECONDARY_FGK    /* one from the algorithm preservation department: */
  283 #define SECONDARY_FGK 0  /* adaptive Huffman routines */
  284 #endif
  285 
  286 #ifndef SECONDARY_DJW    /* semi-adaptive/static Huffman for the eventual */
  287 #define SECONDARY_DJW 0  /* standardization, off by default until such time. */
  288 #endif
  289 
  290 #ifndef SECONDARY_LZMA
  291 #ifdef HAVE_LZMA_H
  292 #define SECONDARY_LZMA 1
  293 #else
  294 #define SECONDARY_LZMA 0
  295 #endif
  296 #endif
  297 
  298 #if XD3_ENCODER
  299 #define IF_ENCODER(x) x
  300 #else
  301 #define IF_ENCODER(x)
  302 #endif
  303 
  304 /***********************************************************************/
  305 
  306   /* header indicator bits */
  307 #define VCD_SECONDARY (1U << 0)  /* uses secondary compressor */
  308 #define VCD_CODETABLE (1U << 1)  /* supplies code table data */
  309 #define VCD_APPHEADER (1U << 2)  /* supplies application data */
  310 #define VCD_INVHDR    (~0x7U)
  311 
  312   /* window indicator bits */
  313 #define VCD_SOURCE   (1U << 0)  /* copy window in source file */
  314 #define VCD_TARGET   (1U << 1)  /* copy window in target file */
  315 #define VCD_ADLER32  (1U << 2)  /* has adler32 checksum */
  316 #define VCD_INVWIN   (~0x7U)
  317 
  318 #define VCD_SRCORTGT (VCD_SOURCE | VCD_TARGET)
  319 
  320   /* delta indicator bits */
  321 #define VCD_DATACOMP (1U << 0)
  322 #define VCD_INSTCOMP (1U << 1)
  323 #define VCD_ADDRCOMP (1U << 2)
  324 #define VCD_INVDEL   (~0x7U)
  325 
  326 typedef enum {
  327   VCD_DJW_ID    = 1,
  328   VCD_LZMA_ID   = 2,
  329   VCD_FGK_ID    = 16  /* Note: these are not standard IANA-allocated IDs! */
  330 } xd3_secondary_ids;
  331 
  332 typedef enum {
  333   SEC_NOFLAGS     = 0,
  334 
  335   /* Note: SEC_COUNT_FREQS Not implemented (to eliminate 1st Huffman pass) */
  336   SEC_COUNT_FREQS = (1 << 0)
  337 } xd3_secondary_flags;
  338 
  339 typedef enum {
  340   DATA_SECTION, /* These indicate which section to the secondary
  341                  * compressor. */
  342   INST_SECTION, /* The header section is not compressed, therefore not
  343                  * listed here. */
  344   ADDR_SECTION
  345 } xd3_section_type;
  346 
  347 typedef unsigned int xd3_rtype;
  348 
  349 /***********************************************************************/
  350 
  351 #include "xdelta3-list.h"
  352 
  353 XD3_MAKELIST(xd3_rlist, xd3_rinst, link);
  354 
  355 /***********************************************************************/
  356 
  357 #define SECONDARY_MIN_SAVINGS 2  /* Secondary compression has to save
  358                     at least this many bytes. */
  359 #define SECONDARY_MIN_INPUT   10 /* Secondary compression needs at
  360                     least this many bytes. */
  361 
  362 #define VCDIFF_MAGIC1  0xd6  /* 1st file byte */
  363 #define VCDIFF_MAGIC2  0xc3  /* 2nd file byte */
  364 #define VCDIFF_MAGIC3  0xc4  /* 3rd file byte */
  365 #define VCDIFF_VERSION 0x00  /* 4th file byte */
  366 
  367 #define VCD_SELF       0     /* 1st address mode */
  368 #define VCD_HERE       1     /* 2nd address mode */
  369 
  370 #define CODE_TABLE_STRING_SIZE (6 * 256) /* Should fit a code table string. */
  371 #define CODE_TABLE_VCDIFF_SIZE (6 * 256) /* Should fit a compressed code
  372                       * table string */
  373 
  374 #define SECONDARY_ANY (SECONDARY_DJW || SECONDARY_FGK || SECONDARY_LZMA)
  375 
  376 #define ALPHABET_SIZE      256  /* Used in test code--size of the secondary
  377                  * compressor alphabet. */
  378 
  379 #define HASH_PERMUTE       1    /* The input is permuted by random nums */
  380 #define ADLER_LARGE_CKSUM  1    /* Adler checksum vs. RK checksum */
  381 
  382 #define HASH_CKOFFSET      1U   /* Table entries distinguish "no-entry" from
  383                  * offset 0 using this offset. */
  384 
  385 #define MIN_SMALL_LOOK    2U    /* Match-optimization stuff. */
  386 #define MIN_LARGE_LOOK    2U
  387 #define MIN_MATCH_OFFSET  1U
  388 #define MAX_MATCH_SPLIT   18U   /* VCDIFF code table: 18 is the default limit
  389                  * for direct-coded ADD sizes */
  390 
  391 #define LEAST_MATCH_INCR  0   /* The least number of bytes an overlapping
  392                    * match must beat the preceding match by.  This
  393                    * is a bias for the lazy match optimization.  A
  394                    * non-zero value means that an adjacent match
  395                    * has to be better by more than the step
  396                    * between them.  0. */
  397 
  398 #define MIN_MATCH         4U  /* VCDIFF code table: MIN_MATCH=4 */
  399 #define MIN_ADD           1U  /* 1 */
  400 #define MIN_RUN           8U  /* The shortest run, if it is shorter than this
  401                    * an immediate add/copy will be just as good.
  402                    * ADD1/COPY6 = 1I+1D+1A bytes, RUN18 =
  403                    * 1I+1D+1A. */
  404 
  405 #define MAX_MODES         9  /* Maximum number of nodes used for
  406                   * compression--does not limit decompression. */
  407 
  408 #define ENC_SECTS         4  /* Number of separate output sections. */
  409 
  410 #define HDR_TAIL(s)  ((s)->enc_tails[0])
  411 #define DATA_TAIL(s) ((s)->enc_tails[1])
  412 #define INST_TAIL(s) ((s)->enc_tails[2])
  413 #define ADDR_TAIL(s) ((s)->enc_tails[3])
  414 
  415 #define HDR_HEAD(s)  ((s)->enc_heads[0])
  416 #define DATA_HEAD(s) ((s)->enc_heads[1])
  417 #define INST_HEAD(s) ((s)->enc_heads[2])
  418 #define ADDR_HEAD(s) ((s)->enc_heads[3])
  419 
  420 #define TOTAL_MODES(x) (2+(x)->acache.s_same+(x)->acache.s_near)
  421 
  422 /* Template instances. */
  423 #if XD3_BUILD_SLOW
  424 #define IF_BUILD_SLOW(x) x
  425 #else
  426 #define IF_BUILD_SLOW(x)
  427 #endif
  428 #if XD3_BUILD_FAST
  429 #define IF_BUILD_FAST(x) x
  430 #else
  431 #define IF_BUILD_FAST(x)
  432 #endif
  433 #if XD3_BUILD_FASTER
  434 #define IF_BUILD_FASTER(x) x
  435 #else
  436 #define IF_BUILD_FASTER(x)
  437 #endif
  438 #if XD3_BUILD_FASTEST
  439 #define IF_BUILD_FASTEST(x) x
  440 #else
  441 #define IF_BUILD_FASTEST(x)
  442 #endif
  443 #if XD3_BUILD_SOFT
  444 #define IF_BUILD_SOFT(x) x
  445 #else
  446 #define IF_BUILD_SOFT(x)
  447 #endif
  448 #if XD3_BUILD_DEFAULT
  449 #define IF_BUILD_DEFAULT(x) x
  450 #else
  451 #define IF_BUILD_DEFAULT(x)
  452 #endif
  453 
  454 /* Update the run-length state */
  455 #define NEXTRUN(c) do { if ((c) == run_c) { run_l += 1; } \
  456   else { run_c = (c); run_l = 1; } } while (0)
  457 
  458 /* This CPP-conditional stuff can be cleaned up... */
  459 #if REGRESSION_TEST
  460 #define IF_REGRESSION(x) x
  461 #else
  462 #define IF_REGRESSION(x)
  463 #endif
  464 
  465 /***********************************************************************/
  466 
  467 #if XD3_ENCODER
  468 static void*       xd3_alloc0 (xd3_stream *stream,
  469                    usize_t      elts,
  470                    usize_t      size);
  471 
  472 
  473 static int         xd3_alloc_iopt (xd3_stream *stream, usize_t elts);
  474 
  475 static void        xd3_free_output (xd3_stream *stream,
  476                     xd3_output *output);
  477 
  478 static int         xd3_emit_double (xd3_stream *stream, xd3_rinst *first,
  479                     xd3_rinst *second, usize_t code);
  480 static int         xd3_emit_single (xd3_stream *stream, xd3_rinst *single,
  481                     usize_t code);
  482 
  483 static usize_t      xd3_sizeof_output (xd3_output *output);
  484 static void        xd3_encode_reset (xd3_stream *stream);
  485 
  486 static int         xd3_source_match_setup (xd3_stream *stream, xoff_t srcpos);
  487 static int         xd3_source_extend_match (xd3_stream *stream);
  488 static int         xd3_srcwin_setup (xd3_stream *stream);
  489 static usize_t     xd3_iopt_last_matched (xd3_stream *stream);
  490 static int         xd3_emit_uint32_t (xd3_stream *stream, xd3_output **output,
  491                       uint32_t num);
  492 
  493 static usize_t xd3_smatch (xd3_stream *stream,
  494                usize_t base,
  495                usize_t scksum,
  496                usize_t *match_offset);
  497 static int xd3_string_match_init (xd3_stream *stream);
  498 static uint32_t xd3_scksum (uint32_t *state, const uint8_t *seg,
  499                 const usize_t ln);
  500 static usize_t xd3_comprun (const uint8_t *seg, usize_t slook, uint8_t *run_cp);
  501 static int xd3_srcwin_move_point (xd3_stream *stream,
  502                   usize_t *next_move_point);
  503 
  504 static int xd3_emit_run (xd3_stream *stream, usize_t pos,
  505              usize_t size, uint8_t *run_c);
  506 static usize_t xd3_checksum_hash (const xd3_hash_cfg *cfg,
  507                   const usize_t cksum);
  508 static xoff_t xd3_source_cksum_offset(xd3_stream *stream, usize_t low);
  509 static void xd3_scksum_insert (xd3_stream *stream,
  510                    usize_t inx,
  511                    usize_t scksum,
  512                    usize_t pos);
  513 
  514 
  515 #if XD3_DEBUG
  516 static void xd3_verify_run_state (xd3_stream    *stream,
  517                   const uint8_t *inp,
  518                   usize_t        x_run_l,
  519                   uint8_t       *x_run_c);
  520 static void xd3_verify_large_state (xd3_stream *stream,
  521                     const uint8_t *inp,
  522                     uint32_t x_cksum);
  523 static void xd3_verify_small_state (xd3_stream    *stream,
  524                     const uint8_t *inp,
  525                     uint32_t       x_cksum);
  526 
  527 #endif /* XD3_DEBUG */
  528 #endif /* XD3_ENCODER */
  529 
  530 static int         xd3_decode_allocate (xd3_stream *stream, usize_t size,
  531                     uint8_t **copied1, usize_t *alloc1);
  532 
  533 static void*       xd3_alloc (xd3_stream *stream, usize_t elts, usize_t size);
  534 static void        xd3_free  (xd3_stream *stream, void *ptr);
  535 
  536 static int         xd3_read_uint32_t (xd3_stream *stream, const uint8_t **inpp,
  537                       const uint8_t *max, uint32_t *valp);
  538 
  539 #if REGRESSION_TEST
  540 static int         xd3_selftest      (void);
  541 #endif
  542 
  543 /***********************************************************************/
  544 
  545 const char* xd3_strerror (int ret)
  546 {
  547   switch (ret)
  548     {
  549     case XD3_INPUT: return "XD3_INPUT";
  550     case XD3_OUTPUT: return "XD3_OUTPUT";
  551     case XD3_GETSRCBLK: return "XD3_GETSRCBLK";
  552     case XD3_GOTHEADER: return "XD3_GOTHEADER";
  553     case XD3_WINSTART: return "XD3_WINSTART";
  554     case XD3_WINFINISH: return "XD3_WINFINISH";
  555     case XD3_TOOFARBACK: return "XD3_TOOFARBACK";
  556     case XD3_INTERNAL: return "XD3_INTERNAL";
  557     case XD3_INVALID: return "XD3_INVALID";
  558     case XD3_INVALID_INPUT: return "XD3_INVALID_INPUT";
  559     case XD3_NOSECOND: return "XD3_NOSECOND";
  560     case XD3_UNIMPLEMENTED: return "XD3_UNIMPLEMENTED";
  561     }
  562   return NULL;
  563 }
  564 
  565 /***********************************************************************/
  566 
  567 #define xd3_sec_data(s) ((s)->sec_stream_d)
  568 #define xd3_sec_inst(s) ((s)->sec_stream_i)
  569 #define xd3_sec_addr(s) ((s)->sec_stream_a)
  570 
  571 struct _xd3_sec_type
  572 {
  573   int         id;
  574   const char *name;
  575   xd3_secondary_flags flags;
  576 
  577   /* xd3_sec_stream is opaque to the generic code */
  578   xd3_sec_stream* (*alloc)   (xd3_stream     *stream);
  579   void            (*destroy) (xd3_stream     *stream,
  580                   xd3_sec_stream *sec);
  581   int             (*init)    (xd3_stream     *stream,
  582                   xd3_sec_stream *sec_stream,
  583                   int             is_encode);
  584   int             (*decode)  (xd3_stream     *stream,
  585                   xd3_sec_stream *sec_stream,
  586                   const uint8_t **input,
  587                   const uint8_t  *input_end,
  588                   uint8_t       **output,
  589                   const uint8_t  *output_end);
  590 #if XD3_ENCODER
  591   int             (*encode)  (xd3_stream     *stream,
  592                   xd3_sec_stream *sec_stream,
  593                   xd3_output     *input,
  594                   xd3_output     *output,
  595                   xd3_sec_cfg    *cfg);
  596 #endif
  597 };
  598 
  599 #define BIT_STATE_ENCODE_INIT { 0, 1 }
  600 #define BIT_STATE_DECODE_INIT { 0, 0x100 }
  601 
  602 typedef struct _bit_state bit_state;
  603 struct _bit_state
  604 {
  605   usize_t cur_byte;
  606   usize_t cur_mask;
  607 };
  608 
  609 #if SECONDARY_ANY == 0
  610 #define IF_SEC(x)
  611 #define IF_NSEC(x) x
  612 #else /* yuck */
  613 #define IF_SEC(x) x
  614 #define IF_NSEC(x)
  615 static int
  616 xd3_decode_secondary (xd3_stream      *stream,
  617               xd3_desect      *sect,
  618               xd3_sec_stream **sec_streamp);
  619 #if XD3_ENCODER
  620 static int
  621 xd3_encode_secondary (xd3_stream      *stream,
  622               xd3_output     **head,
  623               xd3_output     **tail,
  624               xd3_sec_stream **sec_streamp,
  625               xd3_sec_cfg     *cfg,
  626               int             *did_it);
  627 #endif
  628 #endif /* SECONDARY_ANY */
  629 
  630 #if SECONDARY_FGK
  631 extern const xd3_sec_type fgk_sec_type;
  632 #define IF_FGK(x) x
  633 #define FGK_CASE(s) \
  634   s->sec_type = & fgk_sec_type; \
  635   break;
  636 #else
  637 #define IF_FGK(x)
  638 #define FGK_CASE(s) \
  639   s->msg = "unavailable secondary compressor: FGK Adaptive Huffman"; \
  640   return XD3_INTERNAL;
  641 #endif
  642 
  643 #if SECONDARY_DJW
  644 extern const xd3_sec_type djw_sec_type;
  645 #define IF_DJW(x) x
  646 #define DJW_CASE(s) \
  647   s->sec_type = & djw_sec_type; \
  648   break;
  649 #else
  650 #define IF_DJW(x)
  651 #define DJW_CASE(s) \
  652   s->msg = "unavailable secondary compressor: DJW Static Huffman"; \
  653   return XD3_INTERNAL;
  654 #endif
  655 
  656 #if SECONDARY_LZMA
  657 extern const xd3_sec_type lzma_sec_type;
  658 #define IF_LZMA(x) x
  659 #define LZMA_CASE(s) \
  660   s->sec_type = & lzma_sec_type; \
  661   break;
  662 #else
  663 #define IF_LZMA(x)
  664 #define LZMA_CASE(s) \
  665   s->msg = "unavailable secondary compressor: LZMA"; \
  666   return XD3_INTERNAL;
  667 #endif
  668 
  669 /***********************************************************************/
  670 
  671 #include "xdelta3-hash.h"
  672 
  673 /* Process template passes - this includes xdelta3.c several times. */
  674 #define __XDELTA3_C_TEMPLATE_PASS__
  675 #include "xdelta3-cfgs.h"
  676 #undef __XDELTA3_C_TEMPLATE_PASS__
  677 
  678 /* Process the inline pass. */
  679 #define __XDELTA3_C_INLINE_PASS__
  680 #include "xdelta3.c"
  681 #undef __XDELTA3_C_INLINE_PASS__
  682 
  683 /* Secondary compression */
  684 #if SECONDARY_ANY
  685 #include "xdelta3-second.h"
  686 #endif
  687 
  688 #if SECONDARY_FGK
  689 #include "xdelta3-fgk.h"
  690 const xd3_sec_type fgk_sec_type =
  691 {
  692   VCD_FGK_ID,
  693   "FGK Adaptive Huffman",
  694   SEC_NOFLAGS,
  695   (xd3_sec_stream* (*)(xd3_stream*)) fgk_alloc,
  696   (void (*)(xd3_stream*, xd3_sec_stream*)) fgk_destroy,
  697   (int (*)(xd3_stream*, xd3_sec_stream*, int)) fgk_init,
  698   (int (*)(xd3_stream*, xd3_sec_stream*, const uint8_t**, const uint8_t*,
  699        uint8_t**, const uint8_t*)) xd3_decode_fgk,
  700   IF_ENCODER((int (*)(xd3_stream*, xd3_sec_stream*, xd3_output*,
  701               xd3_output*, xd3_sec_cfg*))   xd3_encode_fgk)
  702 };
  703 #endif
  704 
  705 #if SECONDARY_DJW
  706 #include "xdelta3-djw.h"
  707 const xd3_sec_type djw_sec_type =
  708 {
  709   VCD_DJW_ID,
  710   "Static Huffman",
  711   SEC_COUNT_FREQS,
  712   (xd3_sec_stream* (*)(xd3_stream*)) djw_alloc,
  713   (void (*)(xd3_stream*, xd3_sec_stream*)) djw_destroy,
  714   (int (*)(xd3_stream*, xd3_sec_stream*, int)) djw_init,
  715   (int (*)(xd3_stream*, xd3_sec_stream*, const uint8_t**, const uint8_t*,
  716        uint8_t**, const uint8_t*)) xd3_decode_huff,
  717   IF_ENCODER((int (*)(xd3_stream*, xd3_sec_stream*, xd3_output*,
  718               xd3_output*, xd3_sec_cfg*))   xd3_encode_huff)
  719 };
  720 #endif
  721 
  722 #if SECONDARY_LZMA
  723 #include "xdelta3-lzma.h"
  724 const xd3_sec_type lzma_sec_type =
  725 {
  726   VCD_LZMA_ID,
  727   "lzma",
  728   SEC_NOFLAGS,
  729   (xd3_sec_stream* (*)(xd3_stream*)) xd3_lzma_alloc,
  730   (void (*)(xd3_stream*, xd3_sec_stream*)) xd3_lzma_destroy,
  731   (int (*)(xd3_stream*, xd3_sec_stream*, int)) xd3_lzma_init,
  732   (int (*)(xd3_stream*, xd3_sec_stream*, const uint8_t**, const uint8_t*,
  733        uint8_t**, const uint8_t*)) xd3_decode_lzma,
  734   IF_ENCODER((int (*)(xd3_stream*, xd3_sec_stream*, xd3_output*,
  735               xd3_output*, xd3_sec_cfg*))   xd3_encode_lzma)
  736 };
  737 #endif
  738 
  739 #if XD3_MAIN || PYTHON_MODULE || SWIG_MODULE || NOT_MAIN
  740 #include "xdelta3-main.h"
  741 #endif
  742 
  743 #if REGRESSION_TEST
  744 #include "xdelta3-test.h"
  745 #endif
  746 
  747 #endif /* __XDELTA3_C_HEADER_PASS__ */
  748 #ifdef __XDELTA3_C_INLINE_PASS__
  749 
  750 const uint16_t __single_hash[256] =
  751 {
  752   /* Random numbers generated using SLIB's pseudo-random number generator.
  753    * This hashes the input alphabet. */
  754   0xbcd1, 0xbb65, 0x42c2, 0xdffe, 0x9666, 0x431b, 0x8504, 0xeb46,
  755   0x6379, 0xd460, 0xcf14, 0x53cf, 0xdb51, 0xdb08, 0x12c8, 0xf602,
  756   0xe766, 0x2394, 0x250d, 0xdcbb, 0xa678, 0x02af, 0xa5c6, 0x7ea6,
  757   0xb645, 0xcb4d, 0xc44b, 0xe5dc, 0x9fe6, 0x5b5c, 0x35f5, 0x701a,
  758   0x220f, 0x6c38, 0x1a56, 0x4ca3, 0xffc6, 0xb152, 0x8d61, 0x7a58,
  759   0x9025, 0x8b3d, 0xbf0f, 0x95a3, 0xe5f4, 0xc127, 0x3bed, 0x320b,
  760   0xb7f3, 0x6054, 0x333c, 0xd383, 0x8154, 0x5242, 0x4e0d, 0x0a94,
  761   0x7028, 0x8689, 0x3a22, 0x0980, 0x1847, 0xb0f1, 0x9b5c, 0x4176,
  762   0xb858, 0xd542, 0x1f6c, 0x2497, 0x6a5a, 0x9fa9, 0x8c5a, 0x7743,
  763   0xa8a9, 0x9a02, 0x4918, 0x438c, 0xc388, 0x9e2b, 0x4cad, 0x01b6,
  764   0xab19, 0xf777, 0x365f, 0x1eb2, 0x091e, 0x7bf8, 0x7a8e, 0x5227,
  765   0xeab1, 0x2074, 0x4523, 0xe781, 0x01a3, 0x163d, 0x3b2e, 0x287d,
  766   0x5e7f, 0xa063, 0xb134, 0x8fae, 0x5e8e, 0xb7b7, 0x4548, 0x1f5a,
  767   0xfa56, 0x7a24, 0x900f, 0x42dc, 0xcc69, 0x02a0, 0x0b22, 0xdb31,
  768   0x71fe, 0x0c7d, 0x1732, 0x1159, 0xcb09, 0xe1d2, 0x1351, 0x52e9,
  769   0xf536, 0x5a4f, 0xc316, 0x6bf9, 0x8994, 0xb774, 0x5f3e, 0xf6d6,
  770   0x3a61, 0xf82c, 0xcc22, 0x9d06, 0x299c, 0x09e5, 0x1eec, 0x514f,
  771   0x8d53, 0xa650, 0x5c6e, 0xc577, 0x7958, 0x71ac, 0x8916, 0x9b4f,
  772   0x2c09, 0x5211, 0xf6d8, 0xcaaa, 0xf7ef, 0x287f, 0x7a94, 0xab49,
  773   0xfa2c, 0x7222, 0xe457, 0xd71a, 0x00c3, 0x1a76, 0xe98c, 0xc037,
  774   0x8208, 0x5c2d, 0xdfda, 0xe5f5, 0x0b45, 0x15ce, 0x8a7e, 0xfcad,
  775   0xaa2d, 0x4b5c, 0xd42e, 0xb251, 0x907e, 0x9a47, 0xc9a6, 0xd93f,
  776   0x085e, 0x35ce, 0xa153, 0x7e7b, 0x9f0b, 0x25aa, 0x5d9f, 0xc04d,
  777   0x8a0e, 0x2875, 0x4a1c, 0x295f, 0x1393, 0xf760, 0x9178, 0x0f5b,
  778   0xfa7d, 0x83b4, 0x2082, 0x721d, 0x6462, 0x0368, 0x67e2, 0x8624,
  779   0x194d, 0x22f6, 0x78fb, 0x6791, 0xb238, 0xb332, 0x7276, 0xf272,
  780   0x47ec, 0x4504, 0xa961, 0x9fc8, 0x3fdc, 0xb413, 0x007a, 0x0806,
  781   0x7458, 0x95c6, 0xccaa, 0x18d6, 0xe2ae, 0x1b06, 0xf3f6, 0x5050,
  782   0xc8e8, 0xf4ac, 0xc04c, 0xf41c, 0x992f, 0xae44, 0x5f1b, 0x1113,
  783   0x1738, 0xd9a8, 0x19ea, 0x2d33, 0x9698, 0x2fe9, 0x323f, 0xcde2,
  784   0x6d71, 0xe37d, 0xb697, 0x2c4f, 0x4373, 0x9102, 0x075d, 0x8e25,
  785   0x1672, 0xec28, 0x6acb, 0x86cc, 0x186e, 0x9414, 0xd674, 0xd1a5
  786 };
  787 
  788 /****************************************************************
  789  Instruction tables
  790  *****************************************************************/
  791 
  792 /* The following code implements a parametrized description of the
  793  * code table given above for a few reasons.  It is not necessary for
  794  * implementing the standard, to support compression with variable
  795  * tables, so an implementation is only required to know the default
  796  * code table to begin decompression.  (If the encoder uses an
  797  * alternate table, the table is included in compressed form inside
  798  * the VCDIFF file.)
  799  *
  800  * Before adding variable-table support there were two functions which
  801  * were hard-coded to the default table above.
  802  * xd3_compute_default_table() would create the default table by
  803  * filling a 256-elt array of xd3_dinst values.  The corresponding
  804  * function, xd3_choose_instruction(), would choose an instruction
  805  * based on the hard-coded parameters of the default code table.
  806  *
  807  * Notes: The parametrized code table description here only generates
  808  * tables of a certain regularity similar to the default table by
  809  * allowing to vary the distribution of single- and
  810  * double-instructions and change the number of near and same copy
  811  * modes.  More exotic tables are only possible by extending this
  812  * code.
  813  *
  814  * For performance reasons, both the parametrized and non-parametrized
  815  * versions of xd3_choose_instruction remain.  The parametrized
  816  * version is only needed for testing multi-table decoding support.
  817  * If ever multi-table encoding is required, this can be optimized by
  818  * compiling static functions for each table.
  819  */
  820 
  821 /* The XD3_CHOOSE_INSTRUCTION calls xd3_choose_instruction with the
  822  * table description when GENERIC_ENCODE_TABLES are in use.  The
  823  * IF_GENCODETBL macro enables generic-code-table specific code
  824  * (removed 10/2014). */
  825 #define XD3_CHOOSE_INSTRUCTION(stream,prev,inst) \
  826   xd3_choose_instruction (prev, inst)
  827 
  828 /* This structure maintains information needed by
  829  * xd3_choose_instruction to compute the code for a double instruction
  830  * by first indexing an array of code_table_sizes by copy mode, then
  831  * using (offset + (muliplier * X)) */
  832 struct _xd3_code_table_sizes {
  833   uint8_t cpy_max;
  834   uint8_t offset;
  835   uint8_t mult;
  836 };
  837 
  838 /* This contains a complete description of a code table. */
  839 struct _xd3_code_table_desc
  840 {
  841   /* Assumes a single RUN instruction */
  842   /* Assumes that MIN_MATCH is 4 */
  843 
  844   uint8_t add_sizes;            /* Number of immediate-size single adds (default 17) */
  845   uint8_t near_modes;           /* Number of near copy modes (default 4) */
  846   uint8_t same_modes;           /* Number of same copy modes (default 3) */
  847   uint8_t cpy_sizes;            /* Number of immediate-size single copies (default 15) */
  848 
  849   uint8_t addcopy_add_max;      /* Maximum add size for an add-copy double instruction,
  850                    all modes (default 4) */
  851   uint8_t addcopy_near_cpy_max; /* Maximum cpy size for an add-copy double instruction,
  852                    up through VCD_NEAR modes (default 6) */
  853   uint8_t addcopy_same_cpy_max; /* Maximum cpy size for an add-copy double instruction,
  854                    VCD_SAME modes (default 4) */
  855 
  856   uint8_t copyadd_add_max;      /* Maximum add size for a copy-add double instruction,
  857                    all modes (default 1) */
  858   uint8_t copyadd_near_cpy_max; /* Maximum cpy size for a copy-add double instruction,
  859                    up through VCD_NEAR modes (default 4) */
  860   uint8_t copyadd_same_cpy_max; /* Maximum cpy size for a copy-add double instruction,
  861                    VCD_SAME modes (default 4) */
  862 
  863   xd3_code_table_sizes addcopy_max_sizes[MAX_MODES];
  864   xd3_code_table_sizes copyadd_max_sizes[MAX_MODES];
  865 };
  866 
  867 /* The rfc3284 code table is represented: */
  868 static const xd3_code_table_desc __rfc3284_code_table_desc = {
  869   17, /* add sizes */
  870   4,  /* near modes */
  871   3,  /* same modes */
  872   15, /* copy sizes */
  873 
  874   4,  /* add-copy max add */
  875   6,  /* add-copy max cpy, near */
  876   4,  /* add-copy max cpy, same */
  877 
  878   1,  /* copy-add max add */
  879   4,  /* copy-add max cpy, near */
  880   4,  /* copy-add max cpy, same */
  881 
  882   /* addcopy */
  883   { {6,163,3},{6,175,3},{6,187,3},{6,199,3},{6,211,3},{6,223,3},{4,235,1},{4,239,1},{4,243,1} },
  884   /* copyadd */
  885   { {4,247,1},{4,248,1},{4,249,1},{4,250,1},{4,251,1},{4,252,1},{4,253,1},{4,254,1},{4,255,1} },
  886 };
  887 
  888 /* Computes code table entries of TBL using the specified description. */
  889 static void
  890 xd3_build_code_table (const xd3_code_table_desc *desc, xd3_dinst *tbl)
  891 {
  892   usize_t size1, size2, mode;
  893   usize_t cpy_modes = 2 + desc->near_modes + desc->same_modes;
  894   xd3_dinst *d = tbl;
  895 
  896   (d++)->type1 = XD3_RUN;
  897   (d++)->type1 = XD3_ADD;
  898 
  899   for (size1 = 1; size1 <= desc->add_sizes; size1 += 1, d += 1)
  900     {
  901       d->type1 = XD3_ADD;
  902       d->size1 = size1;
  903     }
  904 
  905   for (mode = 0; mode < cpy_modes; mode += 1)
  906     {
  907       (d++)->type1 = XD3_CPY + mode;
  908 
  909       for (size1 = MIN_MATCH; size1 < MIN_MATCH + desc->cpy_sizes; size1 += 1, d += 1)
  910     {
  911       d->type1 = XD3_CPY + mode;
  912       d->size1 = size1;
  913     }
  914     }
  915 
  916   for (mode = 0; mode < cpy_modes; mode += 1)
  917     {
  918       for (size1 = 1; size1 <= desc->addcopy_add_max; size1 += 1)
  919     {
  920       usize_t max = (mode < 2U + desc->near_modes) ?
  921         desc->addcopy_near_cpy_max :
  922         desc->addcopy_same_cpy_max;
  923 
  924       for (size2 = MIN_MATCH; size2 <= max; size2 += 1, d += 1)
  925         {
  926           d->type1 = XD3_ADD;
  927           d->size1 = size1;
  928           d->type2 = XD3_CPY + mode;
  929           d->size2 = size2;
  930         }
  931     }
  932     }
  933 
  934   for (mode = 0; mode < cpy_modes; mode += 1)
  935     {
  936       usize_t max = (mode < 2U + desc->near_modes) ?
  937     desc->copyadd_near_cpy_max :
  938     desc->copyadd_same_cpy_max;
  939 
  940       for (size1 = MIN_MATCH; size1 <= max; size1 += 1)
  941     {
  942       for (size2 = 1; size2 <= desc->copyadd_add_max; size2 += 1, d += 1)
  943         {
  944           d->type1 = XD3_CPY + mode;
  945           d->size1 = size1;
  946           d->type2 = XD3_ADD;
  947           d->size2 = size2;
  948         }
  949     }
  950     }
  951 
  952   XD3_ASSERT (d - tbl == 256);
  953 }
  954 
  955 /* This function generates the static default code table. */
  956 static const xd3_dinst*
  957 xd3_rfc3284_code_table (void)
  958 {
  959   static xd3_dinst __rfc3284_code_table[256];
  960 
  961   if (__rfc3284_code_table[0].type1 != XD3_RUN)
  962     {
  963       xd3_build_code_table (& __rfc3284_code_table_desc, __rfc3284_code_table);
  964     }
  965 
  966   return __rfc3284_code_table;
  967 }
  968 
  969 #if XD3_ENCODER
  970 /* This version of xd3_choose_instruction is hard-coded for the default
  971    table. */
  972 static void
  973 xd3_choose_instruction (xd3_rinst *prev, xd3_rinst *inst)
  974 {
  975   switch (inst->type)
  976     {
  977     case XD3_RUN:
  978       inst->code1 = 0;
  979       break;
  980 
  981     case XD3_ADD:
  982       inst->code1 = 1;
  983 
  984       if (inst->size <= 17)
  985     {
  986       inst->code1 += inst->size;
  987 
  988       if ( (inst->size == 1) &&
  989            (prev != NULL) &&
  990            (prev->size == 4) &&
  991            (prev->type >= XD3_CPY) )
  992         {
  993           prev->code2 = 247 + (prev->type - XD3_CPY);
  994         }
  995     }
  996 
  997       break;
  998 
  999     default:
 1000       {
 1001     int mode = inst->type - XD3_CPY;
 1002 
 1003     XD3_ASSERT (inst->type >= XD3_CPY && inst->type < 12);
 1004 
 1005     inst->code1 = 19 + 16 * mode;
 1006 
 1007     if (inst->size <= 18 && inst->size >= 4)
 1008       {
 1009         inst->code1 += inst->size - 3;
 1010 
 1011         if ( (prev != NULL) &&
 1012          (prev->type == XD3_ADD) &&
 1013          (prev->size <= 4) )
 1014           {
 1015         if ( (inst->size <= 6) &&
 1016              (mode       <= 5) )
 1017           {
 1018             prev->code2 = 163 + (mode * 12) + (3 * (prev->size - 1)) + (inst->size - 4);
 1019 
 1020             XD3_ASSERT (prev->code2 <= 234);
 1021           }
 1022         else if ( (inst->size == 4) &&
 1023               (mode       >= 6) )
 1024           {
 1025             prev->code2 = 235 + ((mode - 6) * 4) + (prev->size - 1);
 1026 
 1027             XD3_ASSERT (prev->code2 <= 246);
 1028           }
 1029           }
 1030       }
 1031 
 1032     XD3_ASSERT (inst->code1 <= 162);
 1033       }
 1034       break;
 1035     }
 1036 }
 1037 #endif /* XD3_ENCODER */
 1038 
 1039 /***********************************************************************/
 1040 
 1041 static inline void
 1042 xd3_swap_uint8p (uint8_t** p1, uint8_t** p2)
 1043 {
 1044   uint8_t *t = (*p1);
 1045   (*p1) = (*p2);
 1046   (*p2) = t;
 1047 }
 1048 
 1049 static inline void
 1050 xd3_swap_usize_t (usize_t* p1, usize_t* p2)
 1051 {
 1052   usize_t t = (*p1);
 1053   (*p1) = (*p2);
 1054   (*p2) = t;
 1055 }
 1056 
 1057 /* It's not constant time, but it computes the log. */
 1058 static int
 1059 xd3_check_pow2 (xoff_t value, usize_t *logof)
 1060 {
 1061   xoff_t x = 1;
 1062   usize_t nolog;
 1063   if (logof == NULL) {
 1064     logof = &nolog;
 1065   }
 1066 
 1067   *logof = 0;
 1068 
 1069   for (; x != 0; x <<= 1, *logof += 1)
 1070     {
 1071       if (x == value)
 1072     {
 1073       return 0;
 1074     }
 1075     }
 1076 
 1077   return XD3_INTERNAL;
 1078 }
 1079 
 1080 size_t
 1081 xd3_pow2_roundup (size_t x)
 1082 {
 1083   size_t i = 1;
 1084   while (x > i) {
 1085     i <<= 1U;
 1086   }
 1087   return i;
 1088 }
 1089 
 1090 static xoff_t
 1091 xd3_xoff_roundup (xoff_t x)
 1092 {
 1093   xoff_t i = 1;
 1094   while (x > i) {
 1095     i <<= 1U;
 1096   }
 1097   return i;
 1098 }
 1099 
 1100 static usize_t
 1101 xd3_round_blksize (usize_t sz, usize_t blksz)
 1102 {
 1103   usize_t mod = sz & (blksz-1);
 1104 
 1105   XD3_ASSERT (xd3_check_pow2 (blksz, NULL) == 0);
 1106 
 1107   if (mod == 0)
 1108     {
 1109       return sz;
 1110     }
 1111 
 1112   if (sz > USIZE_T_MAXBLKSZ)
 1113     {
 1114       return USIZE_T_MAXBLKSZ;
 1115     }
 1116 
 1117   return sz + (blksz - mod);
 1118 }
 1119 
 1120 /***********************************************************************
 1121  Adler32 stream function: code copied from Zlib, defined in RFC1950
 1122  ***********************************************************************/
 1123 
 1124 #define A32_BASE 65521L /* Largest prime smaller than 2^16 */
 1125 #define A32_NMAX 5552   /* NMAX is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1 */
 1126 
 1127 #define A32_DO1(buf,i)  {s1 += buf[i]; s2 += s1;}
 1128 #define A32_DO2(buf,i)  A32_DO1(buf,i); A32_DO1(buf,i+1);
 1129 #define A32_DO4(buf,i)  A32_DO2(buf,i); A32_DO2(buf,i+2);
 1130 #define A32_DO8(buf,i)  A32_DO4(buf,i); A32_DO4(buf,i+4);
 1131 #define A32_DO16(buf)   A32_DO8(buf,0); A32_DO8(buf,8);
 1132 
 1133 static unsigned long adler32 (unsigned long adler, const uint8_t *buf, 
 1134                   usize_t len)
 1135 {
 1136     unsigned long s1 = adler & 0xffff;
 1137     unsigned long s2 = (adler >> 16) & 0xffff;
 1138     int k;
 1139 
 1140     while (len > 0)
 1141       {
 1142         k    = (len < A32_NMAX) ? len : A32_NMAX;
 1143         len -= k;
 1144 
 1145     while (k >= 16)
 1146       {
 1147         A32_DO16(buf);
 1148         buf += 16;
 1149             k -= 16;
 1150       }
 1151 
 1152     if (k != 0)
 1153       {
 1154         do
 1155           {
 1156         s1 += *buf++;
 1157         s2 += s1;
 1158           }
 1159         while (--k);
 1160       }
 1161 
 1162         s1 %= A32_BASE;
 1163         s2 %= A32_BASE;
 1164     }
 1165 
 1166     return (s2 << 16) | s1;
 1167 }
 1168 
 1169 /***********************************************************************
 1170  Run-length function
 1171  ***********************************************************************/
 1172 
 1173 #if XD3_ENCODER
 1174 static usize_t
 1175 xd3_comprun (const uint8_t *seg, usize_t slook, uint8_t *run_cp)
 1176 {
 1177   usize_t i;
 1178   usize_t run_l = 0;
 1179   uint8_t run_c = 0;
 1180 
 1181   for (i = 0; i < slook; i += 1)
 1182     {
 1183       NEXTRUN(seg[i]);
 1184     }
 1185 
 1186   (*run_cp) = run_c;
 1187 
 1188   return run_l;
 1189 }
 1190 #endif
 1191 
 1192 /***********************************************************************
 1193  Basic encoder/decoder functions
 1194  ***********************************************************************/
 1195 
 1196 #if XD3_ENCODER
 1197 inline int
 1198 xd3_emit_byte (xd3_stream  *stream,
 1199            xd3_output **outputp,
 1200            uint8_t      code)
 1201 {
 1202   xd3_output *output = (*outputp);
 1203 
 1204   if (output->next == output->avail)
 1205     {
 1206       xd3_output *aoutput;
 1207 
 1208       if ((aoutput = xd3_alloc_output (stream, output)) == NULL)
 1209     {
 1210       return ENOMEM;
 1211     }
 1212 
 1213       output = (*outputp) = aoutput;
 1214     }
 1215 
 1216   output->base[output->next++] = code;
 1217 
 1218   return 0;
 1219 }
 1220 
 1221 inline int
 1222 xd3_emit_bytes (xd3_stream     *stream,
 1223         xd3_output    **outputp,
 1224         const uint8_t  *base,
 1225         usize_t         size)
 1226 {
 1227   xd3_output *output = (*outputp);
 1228 
 1229   do
 1230     {
 1231       usize_t take;
 1232 
 1233       if (output->next == output->avail)
 1234     {
 1235       xd3_output *aoutput;
 1236 
 1237       if ((aoutput = xd3_alloc_output (stream, output)) == NULL)
 1238         {
 1239           return ENOMEM;
 1240         }
 1241 
 1242       output = (*outputp) = aoutput;
 1243     }
 1244 
 1245       take = xd3_min (output->avail - output->next, size);
 1246 
 1247       memcpy (output->base + output->next, base, (size_t) take);
 1248 
 1249       output->next += take;
 1250       size -= take;
 1251       base += take;
 1252     }
 1253   while (size > 0);
 1254 
 1255   return 0;
 1256 }
 1257 #endif /* XD3_ENCODER */
 1258 
 1259 /***********************************************************************
 1260  Address cache stuff
 1261  ***********************************************************************/
 1262 
 1263 static int
 1264 xd3_alloc_cache (xd3_stream *stream)
 1265 {
 1266   if (stream->acache.near_array != NULL)
 1267     {
 1268       xd3_free (stream, stream->acache.near_array);
 1269     }
 1270 
 1271   if (stream->acache.same_array != NULL)
 1272     {
 1273       xd3_free (stream, stream->acache.same_array);
 1274     }
 1275 
 1276   if (((stream->acache.s_near > 0) &&
 1277        (stream->acache.near_array = (usize_t*)
 1278     xd3_alloc (stream, stream->acache.s_near,
 1279            (usize_t) sizeof (usize_t)))
 1280        == NULL) ||
 1281       ((stream->acache.s_same > 0) &&
 1282        (stream->acache.same_array = (usize_t*)
 1283     xd3_alloc (stream, stream->acache.s_same * 256,
 1284            (usize_t) sizeof (usize_t)))
 1285        == NULL))
 1286     {
 1287       return ENOMEM;
 1288     }
 1289 
 1290   return 0;
 1291 }
 1292 
 1293 void
 1294 xd3_init_cache (xd3_addr_cache* acache)
 1295 {
 1296   if (acache->s_near > 0)
 1297     {
 1298       memset (acache->near_array, 0, acache->s_near * sizeof (usize_t));
 1299       acache->next_slot = 0;
 1300     }
 1301 
 1302   if (acache->s_same > 0)
 1303     {
 1304       memset (acache->same_array, 0, acache->s_same * 256 * sizeof (usize_t));
 1305     }
 1306 }
 1307 
 1308 static void
 1309 xd3_update_cache (xd3_addr_cache* acache, usize_t addr)
 1310 {
 1311   if (acache->s_near > 0)
 1312     {
 1313       acache->near_array[acache->next_slot] = addr;
 1314       acache->next_slot = (acache->next_slot + 1) % acache->s_near;
 1315     }
 1316 
 1317   if (acache->s_same > 0)
 1318     {
 1319       acache->same_array[addr % (acache->s_same*256)] = addr;
 1320     }
 1321 }
 1322 
 1323 #if XD3_ENCODER
 1324 /* OPT: this gets called a lot, can it be optimized? */
 1325 static int
 1326 xd3_encode_address (xd3_stream *stream,
 1327             usize_t addr,
 1328             usize_t here,
 1329             uint8_t* mode)
 1330 {
 1331   usize_t d, bestd;
 1332   usize_t i, bestm, ret;
 1333   xd3_addr_cache* acache = & stream->acache;
 1334 
 1335 #define SMALLEST_INT(x) do { if (((x) & ~127U) == 0) { goto good; } } while (0)
 1336 
 1337   /* Attempt to find the address mode that yields the smallest integer value
 1338    * for "d", the encoded address value, thereby minimizing the encoded size
 1339    * of the address. */
 1340   bestd = addr;
 1341   bestm = VCD_SELF;
 1342 
 1343   XD3_ASSERT (addr < here);
 1344 
 1345   SMALLEST_INT (bestd);
 1346 
 1347   if ((d = here-addr) < bestd)
 1348     {
 1349       bestd = d;
 1350       bestm = VCD_HERE;
 1351 
 1352       SMALLEST_INT (bestd);
 1353     }
 1354 
 1355   for (i = 0; i < acache->s_near; i += 1)
 1356     {
 1357       /* Note: If we used signed computation here, we'd could compte d
 1358        * and then check (d >= 0 && d < bestd). */
 1359       if (addr >= acache->near_array[i])
 1360     {
 1361       d = addr - acache->near_array[i];
 1362 
 1363       if (d < bestd)
 1364         {
 1365           bestd = d;
 1366           bestm = i+2; /* 2 counts the VCD_SELF, VCD_HERE modes */
 1367 
 1368           SMALLEST_INT (bestd);
 1369         }
 1370     }
 1371     }
 1372 
 1373   if (acache->s_same > 0 &&
 1374       acache->same_array[d = addr%(acache->s_same*256)] == addr)
 1375     {
 1376       bestd = d%256;
 1377       /* 2 + s_near offsets past the VCD_NEAR modes */
 1378       bestm = acache->s_near + 2 + d/256;
 1379 
 1380       if ((ret = xd3_emit_byte (stream, & ADDR_TAIL (stream), bestd)))
 1381     {
 1382       return ret;
 1383     }
 1384     }
 1385   else
 1386     {
 1387     good:
 1388 
 1389       if ((ret = xd3_emit_size (stream, & ADDR_TAIL (stream), bestd)))
 1390     {
 1391       return ret;
 1392     }
 1393     }
 1394 
 1395   xd3_update_cache (acache, addr);
 1396 
 1397   (*mode) += bestm;
 1398 
 1399   return 0;
 1400 }
 1401 #endif
 1402 
 1403 static int
 1404 xd3_decode_address (xd3_stream *stream, usize_t here,
 1405             usize_t mode, const uint8_t **inpp,
 1406             const uint8_t *max, uint32_t *valp)
 1407 {
 1408   int ret;
 1409   usize_t same_start = 2 + stream->acache.s_near;
 1410 
 1411   if (mode < same_start)
 1412     {
 1413       if ((ret = xd3_read_size (stream, inpp, max, valp))) { return ret; }
 1414 
 1415       switch (mode)
 1416     {
 1417     case VCD_SELF:
 1418       break;
 1419     case VCD_HERE:
 1420       (*valp) = here - (*valp);
 1421       break;
 1422     default:
 1423       (*valp) += stream->acache.near_array[mode - 2];
 1424       break;
 1425     }
 1426     }
 1427   else
 1428     {
 1429       if (*inpp == max)
 1430     {
 1431       stream->msg = "address underflow";
 1432       return XD3_INVALID_INPUT;
 1433     }
 1434 
 1435       mode -= same_start;
 1436 
 1437       (*valp) = stream->acache.same_array[mode*256 + (**inpp)];
 1438 
 1439       (*inpp) += 1;
 1440     }
 1441 
 1442   xd3_update_cache (& stream->acache, *valp);
 1443 
 1444   return 0;
 1445 }
 1446 
 1447 /***********************************************************************
 1448  Alloc/free
 1449 ***********************************************************************/
 1450 
 1451 static void*
 1452 __xd3_alloc_func (void* opaque, size_t items, usize_t size)
 1453 {
 1454   return malloc (items * (size_t) size);
 1455 }
 1456 
 1457 static void
 1458 __xd3_free_func (void* opaque, void* address)
 1459 {
 1460   free (address);
 1461 }
 1462 
 1463 static void*
 1464 xd3_alloc (xd3_stream *stream,
 1465        usize_t      elts,
 1466        usize_t      size)
 1467 {
 1468   void *a = stream->alloc (stream->opaque, elts, size);
 1469 
 1470   if (a != NULL)
 1471     {
 1472       IF_DEBUG (stream->alloc_cnt += 1);
 1473       IF_DEBUG2 (DP(RINT "[stream %p malloc] size %u ptr %p\n",
 1474             stream, elts * size, a));
 1475     }
 1476   else
 1477     {
 1478       stream->msg = "out of memory";
 1479     }
 1480 
 1481   return a;
 1482 }
 1483 
 1484 static void
 1485 xd3_free (xd3_stream *stream,
 1486       void       *ptr)
 1487 {
 1488   if (ptr != NULL)
 1489     {
 1490       IF_DEBUG (stream->free_cnt += 1);
 1491       XD3_ASSERT (stream->free_cnt <= stream->alloc_cnt);
 1492       IF_DEBUG2 (DP(RINT "[stream %p free] %p\n",
 1493             stream, ptr));
 1494       stream->free (stream->opaque, ptr);
 1495     }
 1496 }
 1497 
 1498 #if XD3_ENCODER
 1499 static void*
 1500 xd3_alloc0 (xd3_stream *stream,
 1501         usize_t      elts,
 1502         usize_t      size)
 1503 {
 1504   void *a = xd3_alloc (stream, elts, size);
 1505 
 1506   if (a != NULL)
 1507     {
 1508       memset (a, 0, (size_t) (elts * size));
 1509     }
 1510 
 1511   return a;
 1512 }
 1513 
 1514 xd3_output*
 1515 xd3_alloc_output (xd3_stream *stream,
 1516           xd3_output *old_output)
 1517 {
 1518   xd3_output *output;
 1519   uint8_t    *base;
 1520 
 1521   if (stream->enc_free != NULL)
 1522     {
 1523       output = stream->enc_free;
 1524       stream->enc_free = output->next_page;
 1525     }
 1526   else
 1527     {
 1528       if ((output = (xd3_output*) xd3_alloc (stream, 1,
 1529                          (usize_t) sizeof (xd3_output)))
 1530       == NULL)
 1531     {
 1532       return NULL;
 1533     }
 1534 
 1535       if ((base = (uint8_t*) xd3_alloc (stream, XD3_ALLOCSIZE,
 1536                     sizeof (uint8_t))) == NULL)
 1537     {
 1538       xd3_free (stream, output);
 1539       return NULL;
 1540     }
 1541 
 1542       output->base  = base;
 1543       output->avail = XD3_ALLOCSIZE;
 1544     }
 1545 
 1546   output->next = 0;
 1547 
 1548   if (old_output)
 1549     {
 1550       old_output->next_page = output;
 1551     }
 1552 
 1553   output->next_page = NULL;
 1554 
 1555   return output;
 1556 }
 1557 
 1558 static usize_t
 1559 xd3_sizeof_output (xd3_output *output)
 1560 {
 1561   usize_t s = 0;
 1562 
 1563   for (; output; output = output->next_page)
 1564     {
 1565       s += output->next;
 1566     }
 1567 
 1568   return s;
 1569 }
 1570 
 1571 static void
 1572 xd3_freelist_output (xd3_stream *stream,
 1573              xd3_output *output)
 1574 {
 1575   xd3_output *tmp;
 1576 
 1577   while (output)
 1578     {
 1579       tmp    = output;
 1580       output = output->next_page;
 1581 
 1582       tmp->next = 0;
 1583       tmp->next_page = stream->enc_free;
 1584       stream->enc_free = tmp;
 1585     }
 1586 }
 1587 
 1588 static void
 1589 xd3_free_output (xd3_stream *stream,
 1590          xd3_output *output)
 1591 {
 1592   xd3_output *next;
 1593 
 1594  again:
 1595   if (output == NULL)
 1596     {
 1597       return;
 1598     }
 1599 
 1600   next = output->next_page;
 1601 
 1602   xd3_free (stream, output->base);
 1603   xd3_free (stream, output);
 1604 
 1605   output = next;
 1606   goto again;
 1607 }
 1608 #endif /* XD3_ENCODER */
 1609 
 1610 void
 1611 xd3_free_stream (xd3_stream *stream)
 1612 {
 1613   xd3_iopt_buflist *blist = stream->iopt_alloc;
 1614 
 1615   while (blist != NULL)
 1616     {
 1617       xd3_iopt_buflist *tmp = blist;
 1618       blist = blist->next;
 1619       xd3_free (stream, tmp->buffer);
 1620       xd3_free (stream, tmp);
 1621     }
 1622 
 1623   xd3_free (stream, stream->large_table);
 1624   xd3_free (stream, stream->small_table);
 1625   xd3_free (stream, stream->small_prev);
 1626 
 1627 #if XD3_ENCODER
 1628   {
 1629     int i;
 1630     for (i = 0; i < ENC_SECTS; i += 1)
 1631       {
 1632     xd3_free_output (stream, stream->enc_heads[i]);
 1633       }
 1634     xd3_free_output (stream, stream->enc_free);
 1635   }
 1636 #endif
 1637 
 1638   xd3_free (stream, stream->acache.near_array);
 1639   xd3_free (stream, stream->acache.same_array);
 1640 
 1641   xd3_free (stream, stream->inst_sect.copied1);
 1642   xd3_free (stream, stream->addr_sect.copied1);
 1643   xd3_free (stream, stream->data_sect.copied1);
 1644 
 1645   if (stream->dec_lastwin != stream->dec_buffer)
 1646     {
 1647       xd3_free (stream, (uint8_t*) stream->dec_lastwin);
 1648     }
 1649   xd3_free (stream, stream->dec_buffer);
 1650 
 1651   xd3_free (stream, stream->buf_in);
 1652   xd3_free (stream, stream->dec_appheader);
 1653   xd3_free (stream, stream->dec_codetbl);
 1654   xd3_free (stream, stream->code_table_alloc);
 1655 
 1656 #if SECONDARY_ANY
 1657   xd3_free (stream, stream->inst_sect.copied2);
 1658   xd3_free (stream, stream->addr_sect.copied2);
 1659   xd3_free (stream, stream->data_sect.copied2);
 1660 
 1661   if (stream->sec_type != NULL)
 1662     {
 1663       stream->sec_type->destroy (stream, stream->sec_stream_d);
 1664       stream->sec_type->destroy (stream, stream->sec_stream_i);
 1665       stream->sec_type->destroy (stream, stream->sec_stream_a);
 1666     }
 1667 #endif
 1668 
 1669   xd3_free (stream, stream->whole_target.adds);
 1670   xd3_free (stream, stream->whole_target.inst);
 1671   xd3_free (stream, stream->whole_target.wininfo);
 1672 
 1673   XD3_ASSERT (stream->alloc_cnt == stream->free_cnt);
 1674 
 1675   memset (stream, 0, sizeof (xd3_stream));
 1676 }
 1677 
 1678 #if (XD3_DEBUG > 1 || VCDIFF_TOOLS)
 1679 static const char*
 1680 xd3_rtype_to_string (xd3_rtype type, int print_mode)
 1681 {
 1682   switch (type)
 1683     {
 1684     case XD3_NOOP:
 1685       return "NOOP ";
 1686     case XD3_RUN:
 1687       return "RUN  ";
 1688     case XD3_ADD:
 1689       return "ADD  ";
 1690     default: break;
 1691     }
 1692   if (! print_mode)
 1693     {
 1694       return "CPY  ";
 1695     }
 1696   switch (type)
 1697     {
 1698     case XD3_CPY + 0: return "CPY_0";
 1699     case XD3_CPY + 1: return "CPY_1";
 1700     case XD3_CPY + 2: return "CPY_2";
 1701     case XD3_CPY + 3: return "CPY_3";
 1702     case XD3_CPY + 4: return "CPY_4";
 1703     case XD3_CPY + 5: return "CPY_5";
 1704     case XD3_CPY + 6: return "CPY_6";
 1705     case XD3_CPY + 7: return "CPY_7";
 1706     case XD3_CPY + 8: return "CPY_8";
 1707     case XD3_CPY + 9: return "CPY_9";
 1708     default:          return "CPY>9";
 1709     }
 1710 }
 1711 #endif
 1712 
 1713 /****************************************************************
 1714  Stream configuration
 1715  ******************************************************************/
 1716 
 1717 int
 1718 xd3_config_stream(xd3_stream *stream,
 1719           xd3_config *config)
 1720 {
 1721   int ret;
 1722   xd3_config defcfg;
 1723   xd3_smatcher *smatcher = &stream->smatcher;
 1724 
 1725   if (config == NULL)
 1726     {
 1727       config = & defcfg;
 1728       memset (config, 0, sizeof (*config));
 1729     }
 1730 
 1731   /* Initial setup: no error checks yet */
 1732   memset (stream, 0, sizeof (*stream));
 1733 
 1734   stream->winsize = config->winsize ? config->winsize : XD3_DEFAULT_WINSIZE;
 1735   stream->sprevsz = config->sprevsz ? config->sprevsz : XD3_DEFAULT_SPREVSZ;
 1736 
 1737   if (config->iopt_size == 0)
 1738     {
 1739       stream->iopt_size = XD3_ALLOCSIZE / sizeof(xd3_rinst);
 1740       stream->iopt_unlimited = 1;
 1741     }
 1742   else
 1743     {
 1744       stream->iopt_size = config->iopt_size;
 1745     }
 1746 
 1747   stream->getblk    = config->getblk;
 1748   stream->alloc     = config->alloc ? config->alloc : __xd3_alloc_func;
 1749   stream->free      = config->freef ? config->freef : __xd3_free_func;
 1750   stream->opaque    = config->opaque;
 1751   stream->flags     = config->flags;
 1752 
 1753   /* Secondary setup. */
 1754   stream->sec_data  = config->sec_data;
 1755   stream->sec_inst  = config->sec_inst;
 1756   stream->sec_addr  = config->sec_addr;
 1757 
 1758   stream->sec_data.data_type = DATA_SECTION;
 1759   stream->sec_inst.data_type = INST_SECTION;
 1760   stream->sec_addr.data_type = ADDR_SECTION;
 1761 
 1762   /* Check static sizes. */
 1763   if (sizeof (usize_t) != SIZEOF_USIZE_T ||
 1764       sizeof (xoff_t) != SIZEOF_XOFF_T ||
 1765       (ret = xd3_check_pow2(XD3_ALLOCSIZE, NULL)))
 1766     {
 1767       stream->msg = "incorrect compilation: wrong integer sizes";
 1768       return XD3_INTERNAL;
 1769     }
 1770 
 1771   /* Check/set secondary compressor. */
 1772   switch (stream->flags & XD3_SEC_TYPE)
 1773     {
 1774     case 0:
 1775       if (stream->flags & XD3_SEC_NOALL)
 1776     {
 1777       stream->msg = "XD3_SEC flags require a secondary compressor type";
 1778       return XD3_INTERNAL;
 1779     }
 1780       break;
 1781     case XD3_SEC_FGK:
 1782       FGK_CASE (stream);
 1783     case XD3_SEC_DJW:
 1784       DJW_CASE (stream);
 1785     case XD3_SEC_LZMA:
 1786       LZMA_CASE (stream);
 1787     default:
 1788       stream->msg = "too many secondary compressor types set";
 1789       return XD3_INTERNAL;
 1790     }
 1791 
 1792   stream->code_table_desc = & __rfc3284_code_table_desc;
 1793   stream->code_table_func = xd3_rfc3284_code_table;
 1794 
 1795   /* Check sprevsz */
 1796   if (smatcher->small_chain == 1 &&
 1797       smatcher->small_lchain == 1)
 1798     {
 1799       stream->sprevsz = 0;
 1800     }
 1801   else
 1802     {
 1803       if ((ret = xd3_check_pow2 (stream->sprevsz, NULL)))
 1804     {
 1805       stream->msg = "sprevsz is required to be a power of two";
 1806       return XD3_INTERNAL;
 1807     }
 1808 
 1809       stream->sprevmask = stream->sprevsz - 1;
 1810     }
 1811 
 1812   /* Default scanner settings. */
 1813 #if XD3_ENCODER
 1814   switch (config->smatch_cfg)
 1815     {
 1816       IF_BUILD_SOFT(case XD3_SMATCH_SOFT:
 1817       {
 1818     *smatcher = config->smatcher_soft;
 1819     smatcher->string_match = __smatcher_soft.string_match;
 1820     smatcher->name = __smatcher_soft.name;
 1821     if (smatcher->large_look  < MIN_MATCH ||
 1822         smatcher->large_step  < 1         ||
 1823         smatcher->small_look  < MIN_MATCH)
 1824       {
 1825         stream->msg = "invalid soft string-match config";
 1826         return XD3_INVALID;
 1827       }
 1828     break;
 1829       })
 1830 
 1831       IF_BUILD_DEFAULT(case XD3_SMATCH_DEFAULT:
 1832             *smatcher = __smatcher_default;
 1833             break;)
 1834       IF_BUILD_SLOW(case XD3_SMATCH_SLOW:
 1835             *smatcher = __smatcher_slow;
 1836             break;)
 1837       IF_BUILD_FASTEST(case XD3_SMATCH_FASTEST:
 1838             *smatcher = __smatcher_fastest;
 1839             break;)
 1840       IF_BUILD_FASTER(case XD3_SMATCH_FASTER:
 1841             *smatcher = __smatcher_faster;
 1842             break;)
 1843       IF_BUILD_FAST(case XD3_SMATCH_FAST:
 1844             *smatcher = __smatcher_fast;
 1845             break;)
 1846     default:
 1847       stream->msg = "invalid string match config type";
 1848       return XD3_INTERNAL;
 1849     }
 1850 
 1851   if (config->smatch_cfg == XD3_SMATCH_DEFAULT &&
 1852       (stream->flags & XD3_COMPLEVEL_MASK) != 0)
 1853     {
 1854       int level = (stream->flags & XD3_COMPLEVEL_MASK) >> XD3_COMPLEVEL_SHIFT;
 1855 
 1856       switch (level)
 1857     {
 1858     case 1:
 1859       IF_BUILD_FASTEST(*smatcher = __smatcher_fastest;
 1860                break;)
 1861     case 2:
 1862       IF_BUILD_FASTER(*smatcher = __smatcher_faster;
 1863                break;)
 1864     case 3: case 4: case 5:
 1865       IF_BUILD_FAST(*smatcher = __smatcher_fast;
 1866             break;)
 1867     case 6:
 1868       IF_BUILD_DEFAULT(*smatcher = __smatcher_default;
 1869                break;)
 1870     default:
 1871       IF_BUILD_SLOW(*smatcher = __smatcher_slow;
 1872             break;)
 1873       IF_BUILD_DEFAULT(*smatcher = __smatcher_default;
 1874                break;)
 1875       IF_BUILD_FAST(*smatcher = __smatcher_fast;
 1876             break;)
 1877       IF_BUILD_FASTER(*smatcher = __smatcher_faster;
 1878             break;)
 1879       IF_BUILD_FASTEST(*smatcher = __smatcher_fastest;
 1880                break;)
 1881     }
 1882     }
 1883 #endif
 1884 
 1885   return 0;
 1886 }
 1887 
 1888 /***********************************************************
 1889  Getblk interface
 1890  ***********************************************************/
 1891 
 1892 inline
 1893 xoff_t xd3_source_eof(const xd3_source *src)
 1894 {
 1895   xoff_t r = (src->max_blkno << src->shiftby) + (xoff_t)src->onlastblk;
 1896   return r;
 1897 }
 1898 
 1899 inline
 1900 usize_t xd3_bytes_on_srcblk (xd3_source *src, xoff_t blkno)
 1901 {
 1902   usize_t r = (blkno == src->max_blkno ?
 1903            src->onlastblk :
 1904            src->blksize);
 1905   return r;
 1906 }
 1907 
 1908 /* This function interfaces with the client getblk function, checks
 1909  * its results, updates max_blkno, onlastblk, eof_known. */
 1910 static int
 1911 xd3_getblk (xd3_stream *stream, xoff_t blkno)
 1912 {
 1913   int ret;
 1914   xd3_source *source = stream->src;
 1915 
 1916   if (source->curblk == NULL || blkno != source->curblkno)
 1917     {
 1918       source->getblkno = blkno;
 1919 
 1920       if (stream->getblk == NULL)
 1921     {
 1922       IF_DEBUG2 (DP(RINT "[getblk] XD3_GETSRCBLK %"Q"u\n", blkno));
 1923       stream->msg = "getblk source input";
 1924       return XD3_GETSRCBLK;
 1925     }
 1926 
 1927       ret = stream->getblk (stream, source, blkno);
 1928       if (ret != 0)
 1929     {
 1930       IF_DEBUG2 (DP(RINT "[getblk] app error blkno %"Q"u: %s\n",
 1931             blkno, xd3_strerror (ret)));
 1932       return ret;
 1933     }
 1934 
 1935       IF_DEBUG2 (DP(RINT "[getblk] read source block %"Q"u onblk "
 1936             "%u blksize %u max_blkno %"Q"u\n", blkno, source->onblk,
 1937             source->blksize, source->max_blkno));
 1938     }
 1939 
 1940   if (blkno > source->max_blkno)
 1941     {
 1942       source->max_blkno = blkno;
 1943 
 1944       if (source->onblk == source->blksize)
 1945     {
 1946       IF_DEBUG1 (DP(RINT "[getblk] full source blkno %"Q"u: "
 1947             "source length unknown %"Q"u\n",
 1948             blkno,
 1949             xd3_source_eof (source)));
 1950     }
 1951       else if (!source->eof_known)
 1952     {
 1953       IF_DEBUG1 (DP(RINT "[getblk] eof block has %d bytes; "
 1954             "source length known %"Q"u\n",
 1955             xd3_bytes_on_srcblk (source, blkno),
 1956             xd3_source_eof (source)));
 1957       source->eof_known = 1;
 1958     }
 1959     }
 1960 
 1961   XD3_ASSERT (source->curblk != NULL);
 1962 
 1963   if (blkno == source->max_blkno)
 1964     {
 1965       /* In case the application sets the source as 1 block w/ a
 1966        * preset buffer. */
 1967       source->onlastblk = source->onblk;
 1968     }
 1969   return 0;
 1970 }
 1971 
 1972 /***********************************************************
 1973  Stream open/close
 1974  ***************************************************************/
 1975 
 1976 int
 1977 xd3_set_source (xd3_stream *stream,
 1978         xd3_source *src)
 1979 {
 1980   usize_t shiftby;
 1981 
 1982   stream->src = src;
 1983   src->srclen  = 0;
 1984   src->srcbase = 0;
 1985 
 1986   /* Enforce power-of-two blocksize so that source-block number
 1987    * calculations are cheap. */
 1988   if (xd3_check_pow2 (src->blksize, &shiftby) != 0)
 1989     {
 1990       src->blksize = xd3_pow2_roundup(src->blksize);
 1991       xd3_check_pow2 (src->blksize, &shiftby);
 1992       IF_DEBUG1 (DP(RINT "raising src_blksz to %u\n", src->blksize));
 1993     }
 1994 
 1995   src->shiftby = shiftby;
 1996   src->maskby = (1 << shiftby) - 1;
 1997 
 1998   if (xd3_check_pow2 (src->max_winsize, NULL) != 0)
 1999     {
 2000       src->max_winsize = xd3_xoff_roundup(src->max_winsize);
 2001       IF_DEBUG1 (DP(RINT "raising src_maxsize to %u\n", src->blksize));
 2002     }
 2003   src->max_winsize = xd3_max (src->max_winsize, XD3_ALLOCSIZE);
 2004   return 0;
 2005 }
 2006 
 2007 int
 2008 xd3_set_source_and_size (xd3_stream *stream,
 2009              xd3_source *user_source,
 2010              xoff_t source_size) {
 2011   int ret = xd3_set_source (stream, user_source);
 2012   if (ret == 0)
 2013     {
 2014       stream->src->eof_known = 1;
 2015 
 2016       xd3_blksize_div(source_size,
 2017               stream->src,
 2018               &stream->src->max_blkno,
 2019               &stream->src->onlastblk);
 2020 
 2021       IF_DEBUG1 (DP(RINT "[set source] size known %"Q"u max_blkno %"Q"u\n",
 2022             source_size, stream->src->max_blkno));
 2023     }
 2024   return ret;
 2025 }
 2026 
 2027 void
 2028 xd3_abort_stream (xd3_stream *stream)
 2029 {
 2030   stream->dec_state = DEC_ABORTED;
 2031   stream->enc_state = ENC_ABORTED;
 2032 }
 2033 
 2034 int
 2035 xd3_close_stream (xd3_stream *stream)
 2036 {
 2037   if (stream->enc_state != 0 && stream->enc_state != ENC_ABORTED)
 2038     {
 2039       if (stream->buf_leftover != NULL)
 2040     {
 2041       stream->msg = "encoding is incomplete";
 2042       return XD3_INTERNAL;
 2043     }
 2044 
 2045       if (stream->enc_state == ENC_POSTWIN)
 2046     {
 2047 #if XD3_ENCODER
 2048       xd3_encode_reset (stream);
 2049 #endif
 2050       stream->current_window += 1;
 2051       stream->enc_state = ENC_INPUT;
 2052     }
 2053 
 2054       /* If encoding, should be ready for more input but not actually
 2055      have any. */
 2056       if (stream->enc_state != ENC_INPUT || stream->avail_in != 0)
 2057     {
 2058       stream->msg = "encoding is incomplete";
 2059       return XD3_INTERNAL;
 2060     }
 2061     }
 2062   else
 2063     {
 2064       switch (stream->dec_state)
 2065     {
 2066     case DEC_VCHEAD:
 2067     case DEC_WININD:
 2068       /* TODO: Address the zero-byte ambiguity.  Does the encoder
 2069        * emit a window or not?  If so, then catch an error here.
 2070        * If not, need another routine to say
 2071        * decode_at_least_one_if_empty. */
 2072     case DEC_ABORTED:
 2073       break;
 2074     default:
 2075       /* If decoding, should be ready for the next window. */
 2076       stream->msg = "eof in decode";
 2077       return XD3_INVALID_INPUT;
 2078     }
 2079     }
 2080 
 2081   return 0;
 2082 }
 2083 
 2084 /**************************************************************
 2085  Application header
 2086  ****************************************************************/
 2087 
 2088 int
 2089 xd3_get_appheader (xd3_stream  *stream,
 2090            uint8_t    **data,
 2091            usize_t      *size)
 2092 {
 2093   if (stream->dec_state < DEC_WININD)
 2094     {
 2095       stream->msg = "application header not available";
 2096       return XD3_INTERNAL;
 2097     }
 2098 
 2099   (*data) = stream->dec_appheader;
 2100   (*size) = stream->dec_appheadsz;
 2101   return 0;
 2102 }
 2103 
 2104 /**********************************************************
 2105  Decoder stuff
 2106  *************************************************/
 2107 
 2108 #include "xdelta3-decode.h"
 2109 
 2110 /****************************************************************
 2111  Encoder stuff
 2112  *****************************************************************/
 2113 
 2114 #if XD3_ENCODER
 2115 void
 2116 xd3_set_appheader (xd3_stream    *stream,
 2117            const uint8_t *data,
 2118            usize_t         size)
 2119 {
 2120   stream->enc_appheader = data;
 2121   stream->enc_appheadsz = size;
 2122 }
 2123 
 2124 #if XD3_DEBUG
 2125 static int
 2126 xd3_iopt_check (xd3_stream *stream)
 2127 {
 2128   usize_t ul = xd3_rlist_length (& stream->iopt_used);
 2129   usize_t fl = xd3_rlist_length (& stream->iopt_free);
 2130 
 2131   return (ul + fl + (stream->iout ? 1 : 0)) == stream->iopt_size;
 2132 }
 2133 #endif
 2134 
 2135 static xd3_rinst*
 2136 xd3_iopt_free (xd3_stream *stream, xd3_rinst *i)
 2137 {
 2138   xd3_rinst *n = xd3_rlist_remove (i);
 2139   xd3_rlist_push_back (& stream->iopt_free, i);
 2140   return n;
 2141 }
 2142 
 2143 static void
 2144 xd3_iopt_free_nonadd (xd3_stream *stream, xd3_rinst *i)
 2145 {
 2146   if (i->type != XD3_ADD)
 2147     {
 2148       xd3_rlist_push_back (& stream->iopt_free, i);
 2149     }
 2150 }
 2151 
 2152 /* When an instruction is ready to flush from the iopt buffer, this
 2153  * function is called to produce an encoding.  It writes the
 2154  * instruction plus size, address, and data to the various encoding
 2155  * sections. */
 2156 static int
 2157 xd3_iopt_finish_encoding (xd3_stream *stream, xd3_rinst *inst)
 2158 {
 2159   int ret;
 2160 
 2161   /* Check for input overflow. */
 2162   XD3_ASSERT (inst->pos + inst->size <= stream->avail_in);
 2163 
 2164   switch (inst->type)
 2165     {
 2166     case XD3_CPY:
 2167       {
 2168     /* the address may have an offset if there is a source window. */
 2169     usize_t addr;
 2170     xd3_source *src = stream->src;
 2171 
 2172     if (src != NULL)
 2173       {
 2174         /* If there is a source copy, the source must have its
 2175          * source window decided before we can encode.  This can
 2176          * be bad -- we have to make this decision even if no
 2177          * source matches have been found. */
 2178         if (stream->srcwin_decided == 0)
 2179           {
 2180         if ((ret = xd3_srcwin_setup (stream))) { return ret; }
 2181           }
 2182         else
 2183           {
 2184         stream->srcwin_decided_early = (!stream->src->eof_known ||
 2185                         (stream->srcwin_cksum_pos <
 2186                          xd3_source_eof (stream->src)));
 2187           }
 2188 
 2189         /* xtra field indicates the copy is from the source */
 2190         if (inst->xtra)
 2191           {
 2192         XD3_ASSERT (inst->addr >= src->srcbase);
 2193         XD3_ASSERT (inst->addr + inst->size <=
 2194                 src->srcbase + src->srclen);
 2195         addr = (usize_t)(inst->addr - src->srcbase);
 2196         stream->n_scpy += 1;
 2197         stream->l_scpy += (xoff_t) inst->size;
 2198           }
 2199         else
 2200           {
 2201         /* with source window: target copy address is offset
 2202          * by taroff. */
 2203         addr = stream->taroff + (usize_t) inst->addr;
 2204         stream->n_tcpy += 1;
 2205         stream->l_tcpy += (xoff_t) inst->size;
 2206           }
 2207       }
 2208     else
 2209       {
 2210         addr = (usize_t) inst->addr;
 2211         stream->n_tcpy += 1;
 2212         stream->l_tcpy += inst->size;
 2213       }
 2214 
 2215     /* Note: used to assert inst->size >= MIN_MATCH, but not true
 2216      * for merge operations & identical match heuristics. */
 2217     /* the "here" position is always offset by taroff */
 2218     if ((ret = xd3_encode_address (stream, addr, inst->pos + stream->taroff,
 2219                        & inst->type)))
 2220       {
 2221         return ret;
 2222       }
 2223 
 2224     IF_DEBUG2 ({
 2225       static int cnt;
 2226       DP(RINT "[iopt copy:%d] pos %"Q"u-%"Q"u addr %"Q"u-%"Q"u size %u\n",
 2227            cnt++,
 2228            stream->total_in + inst->pos,
 2229            stream->total_in + inst->pos + inst->size,
 2230            inst->addr, inst->addr + inst->size, inst->size);
 2231     });
 2232     break;
 2233       }
 2234     case XD3_RUN:
 2235       {
 2236     if ((ret = xd3_emit_byte (stream, & DATA_TAIL (stream), inst->xtra))) { return ret; }
 2237 
 2238     stream->n_run += 1;
 2239     stream->l_run += inst->size;
 2240 
 2241     IF_DEBUG2 ({
 2242       static int cnt;
 2243       DP(RINT "[iopt run:%d] pos %"Q"u size %u\n", cnt++, stream->total_in + inst->pos, inst->size);
 2244     });
 2245     break;
 2246       }
 2247     case XD3_ADD:
 2248       {
 2249     if ((ret = xd3_emit_bytes (stream, & DATA_TAIL (stream),
 2250                    stream->next_in + inst->pos, inst->size))) { return ret; }
 2251 
 2252     stream->n_add += 1;
 2253     stream->l_add += inst->size;
 2254 
 2255     IF_DEBUG2 ({
 2256       static int cnt;
 2257       DP(RINT "[iopt add:%d] pos %"Q"u size %u\n", cnt++, stream->total_in + inst->pos, inst->size);
 2258     });
 2259 
 2260     break;
 2261       }
 2262     }
 2263 
 2264   /* This is the only place stream->unencoded_offset is incremented. */
 2265   XD3_ASSERT (stream->unencoded_offset == inst->pos);
 2266   stream->unencoded_offset += inst->size;
 2267 
 2268   inst->code2 = 0;
 2269 
 2270   XD3_CHOOSE_INSTRUCTION (stream, stream->iout, inst);
 2271 
 2272   if (stream->iout != NULL)
 2273     {
 2274       if (stream->iout->code2 != 0)
 2275     {
 2276       if ((ret = xd3_emit_double (stream, stream->iout, inst, stream->iout->code2))) { return ret; }
 2277 
 2278       xd3_iopt_free_nonadd (stream, stream->iout);
 2279       xd3_iopt_free_nonadd (stream, inst);
 2280       stream->iout = NULL;
 2281       return 0;
 2282     }
 2283       else
 2284     {
 2285       if ((ret = xd3_emit_single (stream, stream->iout, stream->iout->code1))) { return ret; }
 2286 
 2287       xd3_iopt_free_nonadd (stream, stream->iout);
 2288     }
 2289     }
 2290 
 2291   stream->iout = inst;
 2292 
 2293   return 0;
 2294 }
 2295 
 2296 /* This possibly encodes an add instruction, iadd, which must remain
 2297  * on the stack until the following call to
 2298  * xd3_iopt_finish_encoding. */
 2299 static int
 2300 xd3_iopt_add (xd3_stream *stream, usize_t pos, xd3_rinst *iadd)
 2301 {
 2302   int ret;
 2303   usize_t off = stream->unencoded_offset;
 2304 
 2305   if (pos > off)
 2306     {
 2307       iadd->type = XD3_ADD;
 2308       iadd->pos  = off;
 2309       iadd->size = pos - off;
 2310 
 2311       if ((ret = xd3_iopt_finish_encoding (stream, iadd))) { return ret; }
 2312     }
 2313 
 2314   return 0;
 2315 }
 2316 
 2317 /* This function calls xd3_iopt_finish_encoding to finish encoding an
 2318  * instruction, and it may also produce an add instruction for an
 2319  * unmatched region. */
 2320 static int
 2321 xd3_iopt_add_encoding (xd3_stream *stream, xd3_rinst *inst)
 2322 {
 2323   int ret;
 2324   xd3_rinst iadd;
 2325 
 2326   if ((ret = xd3_iopt_add (stream, inst->pos, & iadd))) { return ret; }
 2327 
 2328   if ((ret = xd3_iopt_finish_encoding (stream, inst))) { return ret; }
 2329 
 2330   return 0;
 2331 }
 2332 
 2333 /* Generates a final add instruction to encode the remaining input. */
 2334 static int
 2335 xd3_iopt_add_finalize (xd3_stream *stream)
 2336 {
 2337   int ret;
 2338   xd3_rinst iadd;
 2339 
 2340   if ((ret = xd3_iopt_add (stream, stream->avail_in, & iadd))) { return ret; }
 2341 
 2342   if (stream->iout)
 2343     {
 2344       if ((ret = xd3_emit_single (stream, stream->iout, stream->iout->code1))) { return ret; }
 2345 
 2346       xd3_iopt_free_nonadd (stream, stream->iout);
 2347       stream->iout = NULL;
 2348     }
 2349 
 2350   return 0;
 2351 }
 2352 
 2353 /* Compact the instruction buffer by choosing the best non-overlapping
 2354  * instructions when lazy string-matching.  There are no ADDs in the
 2355  * iopt buffer because those are synthesized in xd3_iopt_add_encoding
 2356  * and during xd3_iopt_add_finalize. */
 2357 static int
 2358 xd3_iopt_flush_instructions (xd3_stream *stream, int force)
 2359 {
 2360   xd3_rinst *r1 = xd3_rlist_front (& stream->iopt_used);
 2361   xd3_rinst *r2;
 2362   xd3_rinst *r3;
 2363   usize_t r1end;
 2364   usize_t r2end;
 2365   usize_t r2off;
 2366   usize_t r2moff;
 2367   usize_t gap;
 2368   usize_t flushed;
 2369   int ret;
 2370 
 2371   XD3_ASSERT (xd3_iopt_check (stream));
 2372 
 2373   /* Note: once tried to skip this step if it's possible to assert
 2374    * there are no overlapping instructions.  Doesn't work because
 2375    * xd3_opt_erase leaves overlapping instructions. */
 2376   while (! xd3_rlist_end (& stream->iopt_used, r1) &&
 2377      ! xd3_rlist_end (& stream->iopt_used, r2 = xd3_rlist_next (r1)))
 2378     {
 2379       r1end = r1->pos + r1->size;
 2380 
 2381       /* If the instructions do not overlap, continue. */
 2382       if (r1end <= r2->pos)
 2383     {
 2384       r1 = r2;
 2385       continue;
 2386     }
 2387 
 2388       r2end = r2->pos + r2->size;
 2389 
 2390       /* The min_match adjustments prevent this. */
 2391       XD3_ASSERT (r2end > (r1end + LEAST_MATCH_INCR));
 2392 
 2393       /* If r3 is available... */
 2394       if (! xd3_rlist_end (& stream->iopt_used, r3 = xd3_rlist_next (r2)))
 2395     {
 2396       /* If r3 starts before r1 finishes or just about, r2 is irrelevant */
 2397       if (r3->pos <= r1end + 1)
 2398         {
 2399           xd3_iopt_free (stream, r2);
 2400           continue;
 2401         }
 2402     }
 2403       else if (! force)
 2404     {
 2405       /* Unless force, end the loop when r3 is not available. */
 2406       break;
 2407     }
 2408 
 2409       r2off  = r2->pos - r1->pos;
 2410       r2moff = r2end - r1end;
 2411       gap    = r2end - r1->pos;
 2412 
 2413       /* If the two matches overlap almost entirely, choose the better match
 2414        * and discard the other.  The else branch can still create inefficient
 2415        * copies, e.g., a 4-byte copy that takes 4 bytes to encode, which
 2416        * xd3_smatch() wouldn't allow by its crude efficiency check.  However,
 2417        * in this case there are adjacent copies which mean the add would cost
 2418        * one extra byte.  Allow the inefficiency here. */
 2419       if (gap < 2*MIN_MATCH || r2moff <= 2 || r2off <= 2)
 2420     {
 2421       /* Only one match should be used, choose the longer one. */
 2422       if (r1->size < r2->size)
 2423         {
 2424           xd3_iopt_free (stream, r1);
 2425           r1 = r2;
 2426         }
 2427       else
 2428         {
 2429           /* We are guaranteed that r1 does not overlap now, so advance past r2 */
 2430           r1 = xd3_iopt_free (stream, r2);
 2431         }
 2432       continue;
 2433     }
 2434       else
 2435     {
 2436       /* Shorten one of the instructions -- could be optimized
 2437        * based on the address cache. */
 2438       usize_t average;
 2439       usize_t newsize;
 2440       usize_t adjust1;
 2441 
 2442       XD3_ASSERT (r1end > r2->pos && r2end > r1->pos);
 2443 
 2444       /* Try to balance the length of both instructions, but avoid
 2445        * making both longer than MAX_MATCH_SPLIT . */
 2446       average = gap / 2;
 2447       newsize = xd3_min (MAX_MATCH_SPLIT, gap - average);
 2448 
 2449       /* Should be possible to simplify this code. */
 2450       if (newsize > r1->size)
 2451         {
 2452           /* shorten r2 */
 2453           adjust1 = r1end - r2->pos;
 2454         }
 2455       else if (newsize > r2->size)
 2456         {
 2457           /* shorten r1 */
 2458           adjust1 = r1end - r2->pos;
 2459 
 2460           XD3_ASSERT (r1->size > adjust1);
 2461 
 2462           r1->size -= adjust1;
 2463 
 2464           /* don't shorten r2 */
 2465           adjust1 = 0;
 2466         }
 2467       else
 2468         {
 2469           /* shorten r1 */
 2470           adjust1 = r1->size - newsize;
 2471 
 2472           if (r2->pos > r1end - adjust1)
 2473         {
 2474           adjust1 -= r2->pos - (r1end - adjust1);
 2475         }
 2476 
 2477           XD3_ASSERT (r1->size > adjust1);
 2478 
 2479           r1->size -= adjust1;
 2480 
 2481           /* shorten r2 */
 2482           XD3_ASSERT (r1->pos + r1->size >= r2->pos);
 2483 
 2484           adjust1 = r1->pos + r1->size - r2->pos;
 2485         }
 2486 
 2487       /* Fallthrough above if-else, shorten r2 */
 2488       XD3_ASSERT (r2->size > adjust1);
 2489 
 2490       r2->size -= adjust1;
 2491       r2->pos  += adjust1;
 2492       r2->addr += adjust1;
 2493 
 2494       XD3_ASSERT (r1->size >= MIN_MATCH);
 2495       XD3_ASSERT (r2->size >= MIN_MATCH);
 2496 
 2497       r1 = r2;
 2498     }
 2499     }
 2500 
 2501   XD3_ASSERT (xd3_iopt_check (stream));
 2502 
 2503   /* If forcing, pick instructions until the list is empty, otherwise
 2504    * this empties 50% of the queue. */
 2505   for (flushed = 0; ! xd3_rlist_empty (& stream->iopt_used); )
 2506     {
 2507       xd3_rinst *renc = xd3_rlist_pop_front (& stream->iopt_used);
 2508       if ((ret = xd3_iopt_add_encoding (stream, renc)))
 2509     {
 2510       return ret;
 2511     }
 2512 
 2513       if (! force)
 2514     {
 2515       if (++flushed > stream->iopt_size / 2)
 2516         {
 2517           break;
 2518         }
 2519 
 2520       /* If there are only two instructions remaining, break,
 2521        * because they were not optimized.  This means there were
 2522        * more than 50% eliminated by the loop above. */
 2523       r1 = xd3_rlist_front (& stream->iopt_used);
 2524       if (xd3_rlist_end(& stream->iopt_used, r1) ||
 2525           xd3_rlist_end(& stream->iopt_used, r2 = xd3_rlist_next (r1)) ||
 2526           xd3_rlist_end(& stream->iopt_used, r3 = xd3_rlist_next (r2)))
 2527         {
 2528           break;
 2529         }
 2530     }
 2531     }
 2532 
 2533   XD3_ASSERT (xd3_iopt_check (stream));
 2534 
 2535   XD3_ASSERT (!force || xd3_rlist_length (& stream->iopt_used) == 0);
 2536 
 2537   return 0;
 2538 }
 2539 
 2540 static int
 2541 xd3_iopt_get_slot (xd3_stream *stream, xd3_rinst** iptr)
 2542 {
 2543   xd3_rinst *i;
 2544   int ret;
 2545 
 2546   if (xd3_rlist_empty (& stream->iopt_free))
 2547     {
 2548       if (stream->iopt_unlimited)
 2549     {
 2550       usize_t elts = XD3_ALLOCSIZE / sizeof(xd3_rinst);
 2551 
 2552       if ((ret = xd3_alloc_iopt (stream, elts)))
 2553         {
 2554           return ret;
 2555         }
 2556 
 2557       stream->iopt_size += elts;
 2558     }
 2559       else
 2560     {
 2561       if ((ret = xd3_iopt_flush_instructions (stream, 0))) { return ret; }
 2562 
 2563       XD3_ASSERT (! xd3_rlist_empty (& stream->iopt_free));
 2564     }
 2565     }
 2566 
 2567   i = xd3_rlist_pop_back (& stream->iopt_free);
 2568 
 2569   xd3_rlist_push_back (& stream->iopt_used, i);
 2570 
 2571   (*iptr) = i;
 2572 
 2573   ++stream->i_slots_used;
 2574 
 2575   return 0;
 2576 }
 2577 
 2578 /* A copy is about to be emitted that extends backwards to POS,
 2579  * therefore it may completely cover some existing instructions in the
 2580  * buffer.  If an instruction is completely covered by this new match,
 2581  * erase it.  If the new instruction is covered by the previous one,
 2582  * return 1 to skip it. */
 2583 static void
 2584 xd3_iopt_erase (xd3_stream *stream, usize_t pos, usize_t size)
 2585 {
 2586   while (! xd3_rlist_empty (& stream->iopt_used))
 2587     {
 2588       xd3_rinst *r = xd3_rlist_back (& stream->iopt_used);
 2589 
 2590       /* Verify that greedy is working.  The previous instruction
 2591        * should end before the new one begins. */
 2592       XD3_ASSERT ((stream->flags & XD3_BEGREEDY) == 0 || (r->pos + r->size <= pos));
 2593       /* Verify that min_match is working.  The previous instruction
 2594        * should end before the new one ends. */
 2595       XD3_ASSERT ((stream->flags & XD3_BEGREEDY) != 0 || (r->pos + r->size < pos + size));
 2596 
 2597       /* See if the last instruction starts before the new
 2598        * instruction.  If so, there is nothing to erase. */
 2599       if (r->pos < pos)
 2600     {
 2601       return;
 2602     }
 2603 
 2604       /* Otherwise, the new instruction covers the old one, delete it
 2605      and repeat. */
 2606       xd3_rlist_remove (r);
 2607       xd3_rlist_push_back (& stream->iopt_free, r);
 2608       --stream->i_slots_used;
 2609     }
 2610 }
 2611 
 2612 /* This function tells the last matched input position. */
 2613 static usize_t
 2614 xd3_iopt_last_matched (xd3_stream *stream)
 2615 {
 2616   xd3_rinst *r;
 2617 
 2618   if (xd3_rlist_empty (& stream->iopt_used))
 2619     {
 2620       return 0;
 2621     }
 2622 
 2623   r = xd3_rlist_back (& stream->iopt_used);
 2624 
 2625   return r->pos + r->size;
 2626 }
 2627 
 2628 /*********************************************************
 2629  Emit routines
 2630  ***********************************************************/
 2631 
 2632 static int
 2633 xd3_emit_single (xd3_stream *stream, xd3_rinst *single, usize_t code)
 2634 {
 2635   int has_size = stream->code_table[code].size1 == 0;
 2636   int ret;
 2637 
 2638   IF_DEBUG2 (DP(RINT "[emit1] %u %s (%u) code %u\n",
 2639            single->pos,
 2640         xd3_rtype_to_string ((xd3_rtype) single->type, 0),
 2641            single->size,
 2642            code));
 2643 
 2644   if ((ret = xd3_emit_byte (stream, & INST_TAIL (stream), code)))
 2645     {
 2646       return ret;
 2647     }
 2648 
 2649   if (has_size)
 2650     {
 2651       if ((ret = xd3_emit_size (stream, & INST_TAIL (stream), single->size)))
 2652         {
 2653           return ret;
 2654         }
 2655     }
 2656 
 2657   return 0;
 2658 }
 2659 
 2660 static int
 2661 xd3_emit_double (xd3_stream *stream, xd3_rinst *first,
 2662                  xd3_rinst *second, usize_t code)
 2663 {
 2664   int ret;
 2665 
 2666   /* All double instructions use fixed sizes, so all we need to do is
 2667    * output the instruction code, no sizes. */
 2668   XD3_ASSERT (stream->code_table[code].size1 != 0 &&
 2669           stream->code_table[code].size2 != 0);
 2670 
 2671   if ((ret = xd3_emit_byte (stream, & INST_TAIL (stream), code)))
 2672     {
 2673       return ret;
 2674     }
 2675 
 2676   IF_DEBUG2 (DP(RINT "[emit2]: %u %s (%u) %s (%u) code %u\n",
 2677            first->pos,
 2678         xd3_rtype_to_string ((xd3_rtype) first->type, 0),
 2679            first->size,
 2680         xd3_rtype_to_string ((xd3_rtype) second->type, 0),
 2681            second->size,
 2682            code));
 2683 
 2684   return 0;
 2685 }
 2686 
 2687 /* This enters a potential run instruction into the iopt buffer.  The
 2688  * position argument is relative to the target window. */
 2689 static int
 2690 xd3_emit_run (xd3_stream *stream, usize_t pos, usize_t size, uint8_t *run_c)
 2691 {
 2692   xd3_rinst* ri;
 2693   int ret;
 2694 
 2695   if ((ret = xd3_iopt_get_slot (stream, & ri))) { return ret; }
 2696 
 2697   ri->type = XD3_RUN;
 2698   ri->xtra = *run_c;
 2699   ri->pos  = pos;
 2700   ri->size = size;
 2701 
 2702   return 0;
 2703 }
 2704 
 2705 /* This enters a potential copy instruction into the iopt buffer.  The
 2706  * position argument is relative to the target window.. */
 2707 int
 2708 xd3_found_match (xd3_stream *stream, usize_t pos,
 2709          usize_t size, xoff_t addr, int is_source)
 2710 {
 2711   xd3_rinst* ri;
 2712   int ret;
 2713 
 2714   if ((ret = xd3_iopt_get_slot (stream, & ri))) { return ret; }
 2715 
 2716   ri->type = XD3_CPY;
 2717   ri->xtra = is_source;
 2718   ri->pos  = pos;
 2719   ri->size = size;
 2720   ri->addr = addr;
 2721 
 2722   return 0;
 2723 }
 2724 
 2725 static int
 2726 xd3_emit_hdr (xd3_stream *stream)
 2727 {
 2728   int  ret;
 2729   int  use_secondary = stream->sec_type != NULL;
 2730   int  use_adler32   = stream->flags & (XD3_ADLER32 | XD3_ADLER32_RECODE);
 2731   int  vcd_source    = xd3_encoder_used_source (stream);
 2732   usize_t win_ind = 0;
 2733   usize_t del_ind = 0;
 2734   usize_t enc_len;
 2735   usize_t tgt_len;
 2736   usize_t data_len;
 2737   usize_t inst_len;
 2738   usize_t addr_len;
 2739 
 2740   if (stream->current_window == 0)
 2741     {
 2742       usize_t hdr_ind = 0;
 2743       int use_appheader  = stream->enc_appheader != NULL;
 2744 
 2745       if (use_secondary)  { hdr_ind |= VCD_SECONDARY; }
 2746       if (use_appheader)  { hdr_ind |= VCD_APPHEADER; }
 2747 
 2748       if ((ret = xd3_emit_byte (stream, & HDR_TAIL (stream),
 2749                 VCDIFF_MAGIC1)) != 0 ||
 2750       (ret = xd3_emit_byte (stream, & HDR_TAIL (stream),
 2751                 VCDIFF_MAGIC2)) != 0 ||
 2752       (ret = xd3_emit_byte (stream, & HDR_TAIL (stream),
 2753                 VCDIFF_MAGIC3)) != 0 ||
 2754       (ret = xd3_emit_byte (stream, & HDR_TAIL (stream),
 2755                 VCDIFF_VERSION)) != 0 ||
 2756       (ret = xd3_emit_byte (stream, & HDR_TAIL (stream), hdr_ind)) != 0)
 2757     {
 2758       return ret;
 2759     }
 2760 
 2761       /* Secondary compressor ID */
 2762 #if SECONDARY_ANY
 2763       if (use_secondary &&
 2764       (ret = xd3_emit_byte (stream, & HDR_TAIL (stream),
 2765                 stream->sec_type->id)))
 2766     {
 2767       return ret;
 2768     }
 2769 #endif
 2770 
 2771       /* Application header */
 2772       if (use_appheader)
 2773     {
 2774       if ((ret = xd3_emit_size (stream, & HDR_TAIL (stream),
 2775                     stream->enc_appheadsz)) ||
 2776           (ret = xd3_emit_bytes (stream, & HDR_TAIL (stream),
 2777                      stream->enc_appheader,
 2778                      stream->enc_appheadsz)))
 2779         {
 2780           return ret;
 2781         }
 2782     }
 2783     }
 2784 
 2785   /* try to compress this window */
 2786 #if SECONDARY_ANY
 2787   if (use_secondary)
 2788     {
 2789       int data_sec = 0;
 2790       int inst_sec = 0;
 2791       int addr_sec = 0;
 2792 
 2793 #     define ENCODE_SECONDARY_SECTION(UPPER,LOWER) \
 2794              ((stream->flags & XD3_SEC_NO ## UPPER) == 0 && \
 2795               (ret = xd3_encode_secondary (stream, \
 2796                        & UPPER ## _HEAD (stream), \
 2797                        & UPPER ## _TAIL (stream), \
 2798                     & xd3_sec_ ## LOWER (stream), \
 2799                         & stream->sec_ ## LOWER, \
 2800                        & LOWER ## _sec)))
 2801 
 2802       if (ENCODE_SECONDARY_SECTION (DATA, data) ||
 2803       ENCODE_SECONDARY_SECTION (INST, inst) ||
 2804       ENCODE_SECONDARY_SECTION (ADDR, addr))
 2805     {
 2806       return ret;
 2807     }
 2808 
 2809       del_ind |= (data_sec ? VCD_DATACOMP : 0);
 2810       del_ind |= (inst_sec ? VCD_INSTCOMP : 0);
 2811       del_ind |= (addr_sec ? VCD_ADDRCOMP : 0);
 2812     }
 2813 #endif
 2814 
 2815   /* if (vcd_target) { win_ind |= VCD_TARGET; } */
 2816   if (vcd_source)  { win_ind |= VCD_SOURCE; }
 2817   if (use_adler32) { win_ind |= VCD_ADLER32; }
 2818 
 2819   /* window indicator */
 2820   if ((ret = xd3_emit_byte (stream, & HDR_TAIL (stream), win_ind)))
 2821     {
 2822       return ret;
 2823     }
 2824 
 2825   /* source window */
 2826   if (vcd_source)
 2827     {
 2828       /* or (vcd_target) { ... } */
 2829       if ((ret = xd3_emit_size (stream, & HDR_TAIL (stream),
 2830                 stream->src->srclen)) ||
 2831       (ret = xd3_emit_offset (stream, & HDR_TAIL (stream),
 2832                   stream->src->srcbase))) { return ret; }
 2833     }
 2834 
 2835   tgt_len  = stream->avail_in;
 2836   data_len = xd3_sizeof_output (DATA_HEAD (stream));
 2837   inst_len = xd3_sizeof_output (INST_HEAD (stream));
 2838   addr_len = xd3_sizeof_output (ADDR_HEAD (stream));
 2839 
 2840   /* The enc_len field is a redundency for future extensions.*/
 2841   enc_len = (1 + (xd3_sizeof_size (tgt_len) +
 2842           xd3_sizeof_size (data_len) +
 2843           xd3_sizeof_size (inst_len) +
 2844           xd3_sizeof_size (addr_len)) +
 2845          data_len +
 2846          inst_len +
 2847          addr_len +
 2848          (use_adler32 ? 4 : 0));
 2849 
 2850   if ((ret = xd3_emit_size (stream, & HDR_TAIL (stream), enc_len)) ||
 2851       (ret = xd3_emit_size (stream, & HDR_TAIL (stream), tgt_len)) ||
 2852       (ret = xd3_emit_byte (stream, & HDR_TAIL (stream), del_ind)) ||
 2853       (ret = xd3_emit_size (stream, & HDR_TAIL (stream), data_len)) ||
 2854       (ret = xd3_emit_size (stream, & HDR_TAIL (stream), inst_len)) ||
 2855       (ret = xd3_emit_size (stream, & HDR_TAIL (stream), addr_len)))
 2856     {
 2857       return ret;
 2858     }
 2859 
 2860   if (use_adler32)
 2861     {
 2862       uint8_t  send[4];
 2863       uint32_t a32;
 2864 
 2865       if (stream->flags & XD3_ADLER32)
 2866     {
 2867       a32 = adler32 (1L, stream->next_in, stream->avail_in);
 2868     }
 2869       else
 2870     {
 2871       a32 = stream->recode_adler32;
 2872     }
 2873 
 2874       /* Four bytes. */
 2875       send[0] = (uint8_t) (a32 >> 24);
 2876       send[1] = (uint8_t) (a32 >> 16);
 2877       send[2] = (uint8_t) (a32 >> 8);
 2878       send[3] = (uint8_t) (a32 & 0x000000FFU);
 2879 
 2880       if ((ret = xd3_emit_bytes (stream, & HDR_TAIL (stream), send, 4)))
 2881     {
 2882       return ret;
 2883     }
 2884     }
 2885 
 2886   return 0;
 2887 }
 2888 
 2889 /****************************************************************
 2890  Encode routines
 2891  ****************************************************************/
 2892 
 2893 static int
 2894 xd3_encode_buffer_leftover (xd3_stream *stream)
 2895 {
 2896   usize_t take;
 2897   usize_t room;
 2898 
 2899   /* Allocate the buffer. */
 2900   if (stream->buf_in == NULL &&
 2901       (stream->buf_in = (uint8_t*) xd3_alloc (stream, stream->winsize, 1)) == NULL)
 2902     {
 2903       return ENOMEM;
 2904     }
 2905 
 2906   IF_DEBUG2 (DP(RINT "[leftover] flush?=%s\n", (stream->flags & XD3_FLUSH) ? "yes" : "no"));
 2907 
 2908   /* Take leftover input first. */
 2909   if (stream->buf_leftover != NULL)
 2910     {
 2911       XD3_ASSERT (stream->buf_avail == 0);
 2912       XD3_ASSERT (stream->buf_leftavail < stream->winsize);
 2913 
 2914       IF_DEBUG2 (DP(RINT "[leftover] previous %u avail %u\n", stream->buf_leftavail, stream->avail_in));
 2915 
 2916       memcpy (stream->buf_in, stream->buf_leftover, stream->buf_leftavail);
 2917 
 2918       stream->buf_leftover = NULL;
 2919       stream->buf_avail    = stream->buf_leftavail;
 2920     }
 2921 
 2922   /* Copy into the buffer. */
 2923   room = stream->winsize - stream->buf_avail;
 2924   take = xd3_min (room, stream->avail_in);
 2925 
 2926   memcpy (stream->buf_in + stream->buf_avail, stream->next_in, take);
 2927 
 2928   stream->buf_avail += take;
 2929 
 2930   if (take < stream->avail_in)
 2931     {
 2932       /* Buffer is full */
 2933       stream->buf_leftover  = stream->next_in  + take;
 2934       stream->buf_leftavail = stream->avail_in - take;
 2935     }
 2936   else if ((stream->buf_avail < stream->winsize) && !(stream->flags & XD3_FLUSH))
 2937     {
 2938       /* Buffer has space */
 2939       IF_DEBUG2 (DP(RINT "[leftover] emptied %u\n", take));
 2940       return XD3_INPUT;
 2941     }
 2942 
 2943   /* Use the buffer: */
 2944   IF_DEBUG2 (DP(RINT "[leftover] take %u remaining %u\n", take, stream->buf_leftavail));
 2945   stream->next_in   = stream->buf_in;
 2946   stream->avail_in  = stream->buf_avail;
 2947   stream->buf_avail = 0;
 2948 
 2949   return 0;
 2950 }
 2951 
 2952 /* Allocates one block of xd3_rlist elements */
 2953 static int
 2954 xd3_alloc_iopt (xd3_stream *stream, usize_t elts)
 2955 {
 2956   usize_t i;
 2957   xd3_iopt_buflist* last =
 2958     (xd3_iopt_buflist*) xd3_alloc (stream, sizeof (xd3_iopt_buflist), 1);
 2959 
 2960   if (last == NULL ||
 2961       (last->buffer = (xd3_rinst*) xd3_alloc (stream, sizeof (xd3_rinst), elts)) == NULL)
 2962     {
 2963       return ENOMEM;
 2964     }
 2965 
 2966   last->next = stream->iopt_alloc;
 2967   stream->iopt_alloc = last;
 2968 
 2969   for (i = 0; i < elts; i += 1)
 2970     {
 2971       xd3_rlist_push_back (& stream->iopt_free, & last->buffer[i]);
 2972     }
 2973 
 2974   return 0;
 2975 }
 2976 
 2977 /* This function allocates all memory initially used by the encoder. */
 2978 static int
 2979 xd3_encode_init (xd3_stream *stream, int full_init)
 2980 {
 2981   int i;
 2982 
 2983   if (full_init)
 2984     {
 2985       int large_comp = (stream->src != NULL);
 2986       int small_comp = ! (stream->flags & XD3_NOCOMPRESS);
 2987 
 2988       /* Memory allocations for checksum tables are delayed until
 2989        * xd3_string_match_init in the first call to string_match--that way
 2990        * identical or short inputs require no table allocation. */
 2991       if (large_comp)
 2992     {
 2993       usize_t hash_values = stream->src->max_winsize /
 2994                             stream->smatcher.large_step;
 2995 
 2996       xd3_size_hashtable (stream,
 2997                   hash_values,
 2998                   & stream->large_hash);
 2999     }
 3000 
 3001       if (small_comp)
 3002     {
 3003       /* TODO: This is under devel: used to have min (sprevsz) here, which sort
 3004        * of makes sense, but observed fast performance w/ larger tables, which
 3005        * also sort of makes sense. @@@ */
 3006       usize_t hash_values = stream->winsize;
 3007 
 3008       xd3_size_hashtable (stream,
 3009                   hash_values,
 3010                   & stream->small_hash);
 3011     }
 3012     }
 3013 
 3014   /* data buffers */
 3015   for (i = 0; i < ENC_SECTS; i += 1)
 3016     {
 3017       if ((stream->enc_heads[i] =
 3018        stream->enc_tails[i] =
 3019        xd3_alloc_output (stream, NULL)) == NULL)
 3020     {
 3021       return ENOMEM;
 3022     }
 3023     }
 3024 
 3025   /* iopt buffer */
 3026   xd3_rlist_init (& stream->iopt_used);
 3027   xd3_rlist_init (& stream->iopt_free);
 3028 
 3029   if (xd3_alloc_iopt (stream, stream->iopt_size) != 0) { goto fail; }
 3030 
 3031   XD3_ASSERT (xd3_rlist_length (& stream->iopt_free) == stream->iopt_size);
 3032   XD3_ASSERT (xd3_rlist_length (& stream->iopt_used) == 0);
 3033 
 3034   /* address cache, code table */
 3035   stream->acache.s_near = stream->code_table_desc->near_modes;
 3036   stream->acache.s_same = stream->code_table_desc->same_modes;
 3037   stream->code_table    = stream->code_table_func ();
 3038 
 3039   return xd3_alloc_cache (stream);
 3040 
 3041  fail:
 3042 
 3043   return ENOMEM;
 3044 }
 3045 
 3046 int
 3047 xd3_encode_init_full (xd3_stream *stream)
 3048 {
 3049   return xd3_encode_init (stream, 1);
 3050 }
 3051 
 3052 int
 3053 xd3_encode_init_partial (xd3_stream *stream)
 3054 {
 3055   return xd3_encode_init (stream, 0);
 3056 }
 3057 
 3058 /* Called after the ENC_POSTOUT state, this puts the output buffers
 3059  * back into separate lists and re-initializes some variables.  (The
 3060  * output lists were spliced together during the ENC_FLUSH state.) */
 3061 static void
 3062 xd3_encode_reset (xd3_stream *stream)
 3063 {
 3064   int i;
 3065   xd3_output *olist;
 3066 
 3067   stream->avail_in     = 0;
 3068   stream->small_reset  = 1;
 3069   stream->i_slots_used = 0;
 3070 
 3071   if (stream->src != NULL)
 3072     {
 3073       stream->src->srcbase   = 0;
 3074       stream->src->srclen    = 0;
 3075       stream->srcwin_decided = 0;
 3076       stream->srcwin_decided_early = 0;
 3077       stream->match_minaddr  = 0;
 3078       stream->match_maxaddr  = 0;
 3079       stream->taroff         = 0;
 3080     }
 3081 
 3082   /* Reset output chains. */
 3083   olist = stream->enc_heads[0];
 3084 
 3085   for (i = 0; i < ENC_SECTS; i += 1)
 3086     {
 3087       XD3_ASSERT (olist != NULL);
 3088 
 3089       stream->enc_heads[i] = olist;
 3090       stream->enc_tails[i] = olist;
 3091       olist = olist->next_page;
 3092 
 3093       stream->enc_heads[i]->next = 0;
 3094       stream->enc_heads[i]->next_page = NULL;
 3095 
 3096       stream->enc_tails[i]->next_page = NULL;
 3097       stream->enc_tails[i] = stream->enc_heads[i];
 3098     }
 3099 
 3100   xd3_freelist_output (stream, olist);
 3101 }
 3102 
 3103 /* The main encoding routine. */
 3104 int
 3105 xd3_encode_input (xd3_stream *stream)
 3106 {
 3107   int ret, i;
 3108 
 3109   if (stream->dec_state != 0)
 3110     {
 3111       stream->msg = "encoder/decoder transition";
 3112       return XD3_INTERNAL;
 3113     }
 3114 
 3115   switch (stream->enc_state)
 3116     {
 3117     case ENC_INIT:
 3118       /* Only reached on first time through: memory setup. */
 3119       if ((ret = xd3_encode_init_full (stream))) { return ret; }
 3120 
 3121       stream->enc_state = ENC_INPUT;
 3122 
 3123     case ENC_INPUT:
 3124 
 3125       /* If there is no input yet, just return.  This checks for
 3126        * next_in == NULL, not avail_in == 0 since zero bytes is a
 3127        * valid input.  There is an assertion in xd3_avail_input() that
 3128        * next_in != NULL for this reason.  By returning right away we
 3129        * avoid creating an input buffer before the caller has supplied
 3130        * its first data.  It is possible for xd3_avail_input to be
 3131        * called both before and after the first call to
 3132        * xd3_encode_input(). */
 3133       if (stream->next_in == NULL)
 3134     {
 3135       return XD3_INPUT;
 3136     }
 3137 
 3138     enc_flush:
 3139       /* See if we should buffer the input: either if there is already
 3140        * a leftover buffer, or if the input is short of winsize
 3141        * without flush.  The label at this point is reached by a goto
 3142        * below, when there is leftover input after postout. */
 3143       if ((stream->buf_leftover != NULL) ||
 3144       (stream->buf_avail != 0) ||
 3145       (stream->avail_in < stream->winsize && ! (stream->flags & XD3_FLUSH)))
 3146     {
 3147       if ((ret = xd3_encode_buffer_leftover (stream))) { return ret; }
 3148     }
 3149 
 3150       /* Initalize the address cache before each window. */
 3151       xd3_init_cache (& stream->acache);
 3152 
 3153       stream->input_position    = 0;
 3154       stream->min_match = MIN_MATCH;
 3155       stream->unencoded_offset = 0;
 3156 
 3157       stream->enc_state = ENC_SEARCH;
 3158 
 3159       IF_DEBUG2 (DP(RINT "[WINSTART:%"Q"u] input bytes %u offset %"Q"u\n",
 3160             stream->current_window, stream->avail_in,
 3161             stream->total_in));
 3162       return XD3_WINSTART;
 3163 
 3164     case ENC_SEARCH:
 3165       IF_DEBUG2 (DP(RINT "[SEARCH] match_state %d avail_in %u %s\n",
 3166             stream->match_state, stream->avail_in,
 3167             stream->src ? "source" : "no source"));
 3168 
 3169       /* Reentrant matching. */
 3170       if (stream->src != NULL)
 3171     {
 3172       switch (stream->match_state)
 3173         {
 3174         case MATCH_TARGET:
 3175           /* Try matching forward at the start of the target.
 3176            * This is entered the first time through, to check for
 3177            * a perfect match, and whenever there is a source match
 3178            * that extends to the end of the previous window.  The
 3179            * match_srcpos field is initially zero and later set
 3180            * during xd3_source_extend_match. */
 3181 
 3182           if (stream->avail_in > 0)
 3183         {
 3184           /* This call can't fail because the source window is
 3185            * unrestricted. */
 3186           ret = xd3_source_match_setup (stream, stream->match_srcpos);
 3187           XD3_ASSERT (ret == 0);
 3188           stream->match_state = MATCH_FORWARD;
 3189         }
 3190           else
 3191         {
 3192           stream->match_state = MATCH_SEARCHING;
 3193           stream->match_fwd = 0;
 3194         }
 3195           XD3_ASSERT (stream->match_fwd == 0);
 3196 
 3197         case MATCH_FORWARD:
 3198         case MATCH_BACKWARD:
 3199           if (stream->avail_in != 0)
 3200         {
 3201           if ((ret = xd3_source_extend_match (stream)) != 0)
 3202             {
 3203               return ret;
 3204             }
 3205 
 3206           /* The search has to make forward progress here
 3207            * or else it can get stuck in a match-backward
 3208            * (getsrcblk) then match-forward (getsrcblk),
 3209            * find insufficient match length, then repeat
 3210 
 3211            * exactly the same search.
 3212            */
 3213           stream->input_position += stream->match_fwd;
 3214         }
 3215 
 3216         case MATCH_SEARCHING:
 3217           /* Continue string matching.  (It's possible that the
 3218            * initial match continued through the entire input, in
 3219            * which case we're still in MATCH_FORWARD and should
 3220            * remain so for the next input window.) */
 3221           break;
 3222         }
 3223     }
 3224 
 3225       /* String matching... */
 3226       if (stream->avail_in != 0 &&
 3227       (ret = stream->smatcher.string_match (stream)))
 3228     {
 3229       return ret;
 3230     }
 3231 
 3232       stream->enc_state = ENC_INSTR;
 3233 
 3234     case ENC_INSTR:
 3235       /* Note: Jump here to encode VCDIFF deltas w/o using this
 3236        * string-matching code.  Merging code enters here. */
 3237 
 3238       /* Flush the instrution buffer, then possibly add one more
 3239        * instruction, then emit the header. */
 3240       if ((ret = xd3_iopt_flush_instructions (stream, 1)) ||
 3241           (ret = xd3_iopt_add_finalize (stream)))
 3242     {
 3243       return ret;
 3244     }
 3245 
 3246       stream->enc_state = ENC_FLUSH;
 3247 
 3248     case ENC_FLUSH:
 3249       /* Note: main_recode_func() bypasses string-matching by setting
 3250        * ENC_FLUSH. */
 3251       if ((ret = xd3_emit_hdr (stream)))
 3252     {
 3253       return ret;
 3254     }
 3255 
 3256       /* Begin output. */
 3257       stream->enc_current = HDR_HEAD (stream);
 3258 
 3259       /* Chain all the outputs together.  After doing this, it looks
 3260        * as if there is only one section.  The other enc_heads are set
 3261        * to NULL to avoid freeing them more than once. */
 3262        for (i = 1; i < ENC_SECTS; i += 1)
 3263     {
 3264       stream->enc_tails[i-1]->next_page = stream->enc_heads[i];
 3265       stream->enc_heads[i] = NULL;
 3266     }
 3267 
 3268     enc_output:
 3269 
 3270       stream->enc_state  = ENC_POSTOUT;
 3271       stream->next_out   = stream->enc_current->base;
 3272       stream->avail_out  = stream->enc_current->next;
 3273       stream->total_out += (xoff_t) stream->avail_out;
 3274 
 3275       /* If there is any output in this buffer, return it, otherwise
 3276        * fall through to handle the next buffer or finish the window
 3277        * after all buffers have been output. */
 3278       if (stream->avail_out > 0)
 3279     {
 3280       /* This is the only place xd3_encode returns XD3_OUTPUT */
 3281       return XD3_OUTPUT;
 3282     }
 3283 
 3284     case ENC_POSTOUT:
 3285 
 3286       if (stream->avail_out != 0)
 3287     {
 3288       stream->msg = "missed call to consume output";
 3289       return XD3_INTERNAL;
 3290     }
 3291 
 3292       /* Continue outputting one buffer at a time, until the next is NULL. */
 3293       if ((stream->enc_current = stream->enc_current->next_page) != NULL)
 3294     {
 3295       goto enc_output;
 3296     }
 3297 
 3298       stream->total_in += (xoff_t) stream->avail_in;
 3299       stream->enc_state = ENC_POSTWIN;
 3300 
 3301       IF_DEBUG2 (DP(RINT "[WINFINISH:%"Q"u] in=%"Q"u\n",
 3302             stream->current_window,
 3303             stream->total_in));
 3304       return XD3_WINFINISH;
 3305 
 3306     case ENC_POSTWIN:
 3307 
 3308       xd3_encode_reset (stream);
 3309 
 3310       stream->current_window += 1;
 3311       stream->enc_state = ENC_INPUT;
 3312 
 3313       /* If there is leftover input to flush, repeat. */
 3314       if (stream->buf_leftover != NULL)
 3315     {
 3316       goto enc_flush;
 3317     }
 3318 
 3319       /* Ready for more input. */
 3320       return XD3_INPUT;
 3321 
 3322     default:
 3323       stream->msg = "invalid state";
 3324       return XD3_INTERNAL;
 3325     }
 3326 }
 3327 #endif /* XD3_ENCODER */
 3328 
 3329 /*****************************************************************
 3330  Client convenience functions
 3331  ******************************************************************/
 3332 
 3333 int
 3334 xd3_process_stream (int            is_encode,
 3335             xd3_stream    *stream,
 3336             int          (*func) (xd3_stream *),
 3337             int            close_stream,
 3338             const uint8_t *input,
 3339             usize_t        input_size,
 3340             uint8_t       *output,
 3341             usize_t       *output_size,
 3342             usize_t        output_size_max)
 3343 {
 3344   usize_t ipos = 0;
 3345   usize_t n = xd3_min (stream->winsize, input_size);
 3346 
 3347   (*output_size) = 0;
 3348 
 3349   stream->flags |= XD3_FLUSH;
 3350 
 3351   xd3_avail_input (stream, input + ipos, n);
 3352   ipos += n;
 3353 
 3354   for (;;)
 3355     {
 3356       int ret;
 3357       switch ((ret = func (stream)))
 3358     {
 3359     case XD3_OUTPUT: { /* memcpy below */ break; }
 3360     case XD3_INPUT: {
 3361       n = xd3_min(stream->winsize, input_size - ipos);
 3362       if (n == 0) 
 3363         {
 3364           goto done;
 3365         }
 3366       xd3_avail_input (stream, input + ipos, n);
 3367       ipos += n;
 3368       continue;
 3369     }
 3370     case XD3_GOTHEADER: { /* ignore */ continue; }
 3371     case XD3_WINSTART: { /* ignore */ continue; }
 3372     case XD3_WINFINISH: { /* ignore */ continue; }
 3373     case XD3_GETSRCBLK:
 3374       {
 3375         /* When the getblk function is NULL, it is necessary to
 3376          * provide the complete source as a single block using
 3377          * xd3_set_source_and_size, otherwise this error.  The
 3378          * library should never ask for another source block. */
 3379         stream->msg = "library requested source block";
 3380         return XD3_INTERNAL;
 3381       }
 3382     case 0:
 3383       {
 3384         /* xd3_encode_input/xd3_decode_input never return 0 */
 3385         stream->msg = "invalid return: 0";
 3386         return XD3_INTERNAL;
 3387       }
 3388     default:
 3389       return ret;
 3390     }
 3391 
 3392       if (*output_size + stream->avail_out > output_size_max)
 3393     {
 3394       stream->msg = "insufficient output space";
 3395       return ENOSPC;
 3396     }
 3397 
 3398       memcpy (output + *output_size, stream->next_out, stream->avail_out);
 3399 
 3400       *output_size += stream->avail_out;
 3401 
 3402       xd3_consume_output (stream);
 3403     }
 3404  done:
 3405   return (close_stream == 0) ? 0 : xd3_close_stream (stream);
 3406 }
 3407 
 3408 static int
 3409 xd3_process_memory (int            is_encode,
 3410             int          (*func) (xd3_stream *),
 3411             const uint8_t *input,
 3412             usize_t        input_size,
 3413             const uint8_t *source,
 3414             usize_t        source_size,
 3415             uint8_t       *output,
 3416             usize_t       *output_size,
 3417             usize_t        output_size_max,
 3418             int            flags) {
 3419   xd3_stream stream;
 3420   xd3_config config;
 3421   xd3_source src;
 3422   int ret;
 3423 
 3424   memset (& stream, 0, sizeof (stream));
 3425   memset (& config, 0, sizeof (config));
 3426 
 3427   if (input == NULL || output == NULL) {
 3428     stream.msg = "invalid input/output buffer";
 3429     ret = XD3_INTERNAL;
 3430     goto exit;
 3431   }
 3432 
 3433   config.flags = flags;
 3434 
 3435   if (is_encode)
 3436     {
 3437       config.winsize = xd3_min(input_size, (usize_t) XD3_DEFAULT_WINSIZE);
 3438       config.sprevsz = xd3_pow2_roundup (config.winsize);
 3439     }
 3440 
 3441   if ((ret = xd3_config_stream (&stream, &config)) != 0)
 3442     {
 3443       goto exit;
 3444     }
 3445 
 3446   if (source != NULL)
 3447     {
 3448       memset (& src, 0, sizeof (src));
 3449 
 3450       src.blksize = source_size;
 3451       src.onblk = source_size;
 3452       src.curblk = source;
 3453       src.curblkno = 0;
 3454       src.max_winsize = source_size;
 3455 
 3456       if ((ret = xd3_set_source_and_size (&stream, &src, source_size)) != 0)
 3457     {
 3458       goto exit;
 3459     }
 3460     }
 3461 
 3462   if ((ret = xd3_process_stream (is_encode,
 3463                  & stream,
 3464                  func, 1,
 3465                  input, input_size,
 3466                  output,
 3467                  output_size,
 3468                  output_size_max)) != 0)
 3469     {
 3470       goto exit;
 3471     }
 3472 
 3473  exit:
 3474   if (ret != 0)
 3475     {
 3476       IF_DEBUG2 (DP(RINT "process_memory: %d: %s\n", ret, stream.msg));
 3477     }
 3478   xd3_free_stream(&stream);
 3479   return ret;
 3480 }
 3481 
 3482 int
 3483 xd3_decode_stream (xd3_stream    *stream,
 3484            const uint8_t *input,
 3485            usize_t        input_size,
 3486            uint8_t       *output,
 3487            usize_t       *output_size,
 3488            usize_t        output_size_max)
 3489 {
 3490   return xd3_process_stream (0, stream, & xd3_decode_input, 1,
 3491                  input, input_size,
 3492                  output, output_size, output_size_max);
 3493 }
 3494 
 3495 int
 3496 xd3_decode_memory (const uint8_t *input,
 3497            usize_t        input_size,
 3498            const uint8_t *source,
 3499            usize_t        source_size,
 3500            uint8_t       *output,
 3501            usize_t       *output_size,
 3502            usize_t        output_size_max,
 3503            int            flags) {
 3504   return xd3_process_memory (0, & xd3_decode_input,
 3505                  input, input_size,
 3506                  source, source_size,
 3507                  output, output_size, output_size_max,
 3508                  flags);
 3509 }
 3510 
 3511 
 3512 #if XD3_ENCODER
 3513 int
 3514 xd3_encode_stream (xd3_stream    *stream,
 3515            const uint8_t *input,
 3516            usize_t         input_size,
 3517            uint8_t       *output,
 3518            usize_t        *output_size,
 3519            usize_t         output_size_max)
 3520 {
 3521   return xd3_process_stream (1, stream, & xd3_encode_input, 1,
 3522                  input, input_size,
 3523                  output, output_size, output_size_max);
 3524 }
 3525 
 3526 int
 3527 xd3_encode_memory (const uint8_t *input,
 3528            usize_t        input_size,
 3529            const uint8_t *source,
 3530            usize_t        source_size,
 3531            uint8_t       *output,
 3532            usize_t        *output_size,
 3533            usize_t        output_size_max,
 3534            int            flags) {
 3535   return xd3_process_memory (1, & xd3_encode_input,
 3536                  input, input_size,
 3537                  source, source_size,
 3538                  output, output_size, output_size_max,
 3539                  flags);
 3540 }
 3541 #endif
 3542 
 3543 
 3544 /*************************************************************
 3545  String matching helpers
 3546  *************************************************************/
 3547 
 3548 #if XD3_ENCODER
 3549 /* Do the initial xd3_string_match() checksum table setup.
 3550  * Allocations are delayed until first use to avoid allocation
 3551  * sometimes (e.g., perfect matches, zero-length inputs). */
 3552 static int
 3553 xd3_string_match_init (xd3_stream *stream)
 3554 {
 3555   const int DO_SMALL = ! (stream->flags & XD3_NOCOMPRESS);
 3556   const int DO_LARGE = (stream->src != NULL);
 3557 
 3558   if (DO_LARGE && stream->large_table == NULL)
 3559     {
 3560       if ((stream->large_table =
 3561        (usize_t*) xd3_alloc0 (stream, stream->large_hash.size, sizeof (usize_t))) == NULL)
 3562     {
 3563       return ENOMEM;
 3564     }
 3565     }
 3566 
 3567   if (DO_SMALL)
 3568     {
 3569       /* Subsequent calls can return immediately after checking reset. */
 3570       if (stream->small_table != NULL)
 3571     {
 3572       /* The target hash table is reinitialized once per window. */
 3573       /* TODO: This would not have to be reinitialized if absolute
 3574        * offsets were being stored. */
 3575       if (stream->small_reset)
 3576         {
 3577           stream->small_reset = 0;
 3578           memset (stream->small_table, 0,
 3579               sizeof (usize_t) * stream->small_hash.size);
 3580         }
 3581 
 3582       return 0;
 3583     }
 3584 
 3585       if ((stream->small_table =
 3586        (usize_t*) xd3_alloc0 (stream,
 3587                   stream->small_hash.size,
 3588                   sizeof (usize_t))) == NULL)
 3589     {
 3590       return ENOMEM;
 3591     }
 3592 
 3593       /* If there is a previous table needed. */
 3594       if (stream->smatcher.small_lchain > 1 ||
 3595       stream->smatcher.small_chain > 1)
 3596     {
 3597       if ((stream->small_prev =
 3598            (xd3_slist*) xd3_alloc (stream,
 3599                        stream->sprevsz,
 3600                        sizeof (xd3_slist))) == NULL)
 3601         {
 3602           return ENOMEM;
 3603         }
 3604     }
 3605     }
 3606 
 3607   return 0;
 3608 }
 3609 
 3610 #if XD3_USE_LARGEFILE64
 3611 /* This function handles the 32/64bit ambiguity -- file positions are 64bit
 3612  * but the hash table for source-offsets is 32bit. */
 3613 static xoff_t
 3614 xd3_source_cksum_offset(xd3_stream *stream, usize_t low)
 3615 {
 3616   xoff_t scp = stream->srcwin_cksum_pos;
 3617   xoff_t s0 = scp >> 32;
 3618 
 3619   usize_t sr = (usize_t) scp;
 3620 
 3621   if (s0 == 0) {
 3622     return low;
 3623   }
 3624 
 3625   /* This should not be >= because srcwin_cksum_pos is the next
 3626    * position to index. */
 3627   if (low > sr) {
 3628     return (--s0 << 32) | low;
 3629   }
 3630 
 3631   return (s0 << 32) | low;
 3632 }
 3633 #else
 3634 static xoff_t
 3635 xd3_source_cksum_offset(xd3_stream *stream, usize_t low)
 3636 {
 3637   return (xoff_t) low;
 3638 }
 3639 #endif
 3640 
 3641 /* This function sets up the stream->src fields srcbase, srclen.  The
 3642  * call is delayed until these values are needed to encode a copy
 3643  * address.  At this point the decision has to be made. */
 3644 static int
 3645 xd3_srcwin_setup (xd3_stream *stream)
 3646 {
 3647   xd3_source *src = stream->src;
 3648   xoff_t length, x;
 3649 
 3650   /* Check the undecided state. */
 3651   XD3_ASSERT (src->srclen == 0 && src->srcbase == 0);
 3652 
 3653   /* Avoid repeating this call. */
 3654   stream->srcwin_decided = 1;
 3655 
 3656   /* If the stream is flushing, then the iopt buffer was able to
 3657    * contain the complete encoding.  If no copies were issued no
 3658    * source window is actually needed.  This prevents the VCDIFF
 3659    * header from including source base/len.  xd3_emit_hdr checks for
 3660    * srclen == 0. */
 3661   if (stream->enc_state == ENC_INSTR && stream->match_maxaddr == 0)
 3662     {
 3663       goto done;
 3664     }
 3665 
 3666   /* Check for overflow, srclen is usize_t - this can't happen unless
 3667    * XD3_DEFAULT_SRCBACK and related parameters are extreme - should
 3668    * use smaller windows. */
 3669   length = stream->match_maxaddr - stream->match_minaddr;
 3670 
 3671   x = (xoff_t) USIZE_T_MAX;
 3672   if (length > x)
 3673     {
 3674       stream->msg = "source window length overflow (not 64bit)";
 3675       return XD3_INTERNAL;
 3676     }
 3677 
 3678   /* If ENC_INSTR, then we know the exact source window to use because
 3679    * no more copies can be issued. */
 3680   if (stream->enc_state == ENC_INSTR)
 3681     {
 3682       src->srcbase = stream->match_minaddr;
 3683       src->srclen  = (usize_t) length;
 3684       XD3_ASSERT (src->srclen);
 3685       goto done;
 3686     }
 3687 
 3688   /* Otherwise, we have to make a guess.  More copies may still be
 3689    * issued, but we have to decide the source window base and length
 3690    * now.  */
 3691   /* TODO: This may not working well in practice, more testing needed. */
 3692   src->srcbase = stream->match_minaddr;
 3693   src->srclen  = xd3_max ((usize_t) length,
 3694               stream->avail_in + (stream->avail_in >> 2));
 3695 
 3696   if (src->eof_known)
 3697     {
 3698       /* Note: if the source size is known, we must reduce srclen or 
 3699        * code that expects to pass a single block w/ getblk == NULL
 3700        * will not function, as the code will return GETSRCBLK asking
 3701        * for the second block. */
 3702       src->srclen = xd3_min (src->srclen, xd3_source_eof(src) - src->srcbase);
 3703     }
 3704   
 3705   IF_DEBUG1 (DP(RINT "[srcwin_setup_constrained] base %"Q"u len %u\n",
 3706         src->srcbase, src->srclen));
 3707 
 3708   XD3_ASSERT (src->srclen);
 3709  done:
 3710   /* Set the taroff.  This convenience variable is used even when
 3711      stream->src == NULL. */
 3712   stream->taroff = src->srclen;
 3713   return 0;
 3714 }
 3715 
 3716 /* Sets the bounding region for a newly discovered source match, prior
 3717  * to calling xd3_source_extend_match().  This sets the match_maxfwd,
 3718  * match_maxback variables.  Note: srcpos is an absolute position
 3719  * (xoff_t) but the match_maxfwd, match_maxback variables are usize_t.
 3720  * Returns 0 if the setup succeeds, or 1 if the source position lies
 3721  * outside an already-decided srcbase/srclen window. */
 3722 static int
 3723 xd3_source_match_setup (xd3_stream *stream, xoff_t srcpos)
 3724 {
 3725   xd3_source *const src = stream->src;
 3726   usize_t greedy_or_not;
 3727 
 3728   stream->match_maxback = 0;
 3729   stream->match_maxfwd  = 0;
 3730   stream->match_back    = 0;
 3731   stream->match_fwd     = 0;
 3732 
 3733   /* This avoids a non-blocking endless loop caused by scanning
 3734    * backwards across a block boundary, only to find not enough
 3735    * matching bytes to beat the current min_match due to a better lazy
 3736    * target match: the re-entry to xd3_string_match() repeats the same
 3737    * long match because the input position hasn't changed.  TODO: if
 3738    * ever duplicates are added to the source hash table, this logic
 3739    * won't suffice to avoid loops.  See testing/regtest.cc's
 3740    * TestNonBlockingProgress test! */
 3741   if (srcpos != 0 && srcpos == stream->match_last_srcpos)
 3742     {
 3743       IF_DEBUG2(DP(RINT "[match_setup] looping failure\n"));
 3744       goto bad;
 3745     }
 3746 
 3747   /* Implement src->max_winsize, which prevents the encoder from seeking
 3748    * back further than the LRU cache maintaining FIFO discipline, (to
 3749    * avoid seeking). */
 3750   if (srcpos < stream->srcwin_cksum_pos &&
 3751       stream->srcwin_cksum_pos - srcpos > src->max_winsize)
 3752     {
 3753       IF_DEBUG2(DP(RINT "[match_setup] rejected due to src->max_winsize "
 3754            "distance eof=%"Q"u srcpos=%"Q"u max_winsz=%"Q"u\n",
 3755            xd3_source_eof (src),
 3756            srcpos, src->max_winsize));
 3757       goto bad;
 3758     }
 3759 
 3760   /* There are cases where the above test does not reject a match that
 3761    * will experience XD3_TOOFARBACK at the first xd3_getblk call
 3762    * because the input may have advanced up to one block beyond the
 3763    * actual EOF. */
 3764   IF_DEBUG2(DP(RINT "[match_setup] %"Q"u srcpos %"Q"u, "
 3765            "src->max_winsize %"Q"u\n",
 3766            stream->total_in + stream->input_position,
 3767            srcpos, src->max_winsize));
 3768 
 3769   /* Going backwards, the 1.5-pass algorithm allows some
 3770    * already-matched input may be covered by a longer source match.
 3771    * The greedy algorithm does not allow this. */
 3772   if (stream->flags & XD3_BEGREEDY)
 3773     {
 3774       /* The greedy algorithm allows backward matching to the last
 3775      matched position. */
 3776       greedy_or_not = xd3_iopt_last_matched (stream);
 3777     }
 3778   else
 3779     {
 3780       /* The 1.5-pass algorithm allows backward matching to go back as
 3781        * far as the unencoded offset, which is updated as instructions
 3782        * pass out of the iopt buffer.  If this (default) is chosen, it
 3783        * means xd3_iopt_erase may be called to eliminate instructions
 3784        * when a covering source match is found. */
 3785       greedy_or_not = stream->unencoded_offset;
 3786     }
 3787 
 3788   /* Backward target match limit. */
 3789   XD3_ASSERT (stream->input_position >= greedy_or_not);
 3790   stream->match_maxback = stream->input_position - greedy_or_not;
 3791 
 3792   /* Forward target match limit. */
 3793   XD3_ASSERT (stream->avail_in > stream->input_position);
 3794   stream->match_maxfwd = stream->avail_in - stream->input_position;
 3795 
 3796   /* Now we take the source position into account.  It depends whether
 3797    * the srclen/srcbase have been decided yet. */
 3798   if (stream->srcwin_decided == 0)
 3799     {
 3800       /* Unrestricted case: the match can cover the entire source,
 3801        * 0--src->size.  We compare the usize_t
 3802        * match_maxfwd/match_maxback against the xoff_t
 3803        * src->size/srcpos values and take the min. */
 3804       if (srcpos < (xoff_t) stream->match_maxback)
 3805     {
 3806       stream->match_maxback = (usize_t) srcpos;
 3807     }
 3808 
 3809       if (src->eof_known)
 3810     {
 3811       xoff_t srcavail = xd3_source_eof (src) - srcpos;
 3812 
 3813       if (srcavail < (xoff_t) stream->match_maxfwd)
 3814         {
 3815           stream->match_maxfwd = (usize_t) srcavail;
 3816         }
 3817     }
 3818 
 3819       IF_DEBUG2(DP(RINT
 3820            "[match_setup] srcpos %"Q"u (tgtpos %"Q"u) "
 3821            "unrestricted maxback %u maxfwd %u\n",
 3822            srcpos,
 3823            stream->total_in + stream->input_position,
 3824            stream->match_maxback,
 3825            stream->match_maxfwd));
 3826       goto good;
 3827     }
 3828 
 3829   /* Decided some source window. */
 3830   XD3_ASSERT (src->srclen > 0);
 3831 
 3832   /* Restricted case: fail if the srcpos lies outside the source window */
 3833   if ((srcpos < src->srcbase) ||
 3834       (srcpos > (src->srcbase + (xoff_t) src->srclen)))
 3835     {
 3836       IF_DEBUG1(DP(RINT "[match_setup] restricted source window failure\n"));
 3837       goto bad;
 3838     }
 3839   else
 3840     {
 3841       usize_t srcavail;
 3842 
 3843       srcavail = (usize_t) (srcpos - src->srcbase);
 3844       if (srcavail < stream->match_maxback)
 3845     {
 3846       stream->match_maxback = srcavail;
 3847     }
 3848 
 3849       srcavail = (usize_t) (src->srcbase + (xoff_t) src->srclen - srcpos);
 3850       if (srcavail < stream->match_maxfwd)
 3851     {
 3852       stream->match_maxfwd = srcavail;
 3853     }
 3854 
 3855       IF_DEBUG2(DP(RINT
 3856            "[match_setup] srcpos %"Q"u (tgtpos %"Q"u) "
 3857            "restricted maxback %u maxfwd %u\n",
 3858            srcpos,
 3859            stream->total_in + stream->input_position,
 3860            stream->match_maxback,
 3861            stream->match_maxfwd));
 3862       goto good;
 3863     }
 3864 
 3865  good:
 3866   stream->match_state  = MATCH_BACKWARD;
 3867   stream->match_srcpos = srcpos;
 3868   stream->match_last_srcpos = srcpos;
 3869   return 0;
 3870 
 3871  bad:
 3872   stream->match_state  = MATCH_SEARCHING;
 3873   stream->match_last_srcpos = srcpos;
 3874   return 1;
 3875 }
 3876 
 3877 static inline int
 3878 xd3_forward_match(const uint8_t *s1c, const uint8_t *s2c, int n)
 3879 {
 3880   int i = 0;
 3881 #if UNALIGNED_OK
 3882   int nint = n / sizeof(int);
 3883 
 3884   if (nint >> 3)
 3885     {
 3886       int j = 0;
 3887       const int *s1 = (const int*)s1c;
 3888       const int *s2 = (const int*)s2c;
 3889       int nint_8 = nint - 8;
 3890 
 3891       while (i <= nint_8 &&
 3892          s1[i++] == s2[j++] &&
 3893          s1[i++] == s2[j++] &&
 3894          s1[i++] == s2[j++] &&
 3895          s1[i++] == s2[j++] &&
 3896          s1[i++] == s2[j++] &&
 3897          s1[i++] == s2[j++] &&
 3898          s1[i++] == s2[j++] &&
 3899          s1[i++] == s2[j++]) { }
 3900 
 3901       i = (i - 1) * sizeof(int);
 3902     }
 3903 #endif
 3904 
 3905   while (i < n && s1c[i] == s2c[i])
 3906     {
 3907       i++;
 3908     }
 3909   return i;
 3910 }
 3911 
 3912 /* This function expands the source match backward and forward.  It is
 3913  * reentrant, since xd3_getblk may return XD3_GETSRCBLK, so most
 3914  * variables are kept in xd3_stream.  There are two callers of this
 3915  * function, the string_matching routine when a checksum match is
 3916  * discovered, and xd3_encode_input whenever a continuing (or initial)
 3917  * match is suspected.  The two callers do different things with the
 3918  * input_position, thus this function leaves that variable untouched.
 3919  * If a match is taken the resulting stream->match_fwd is left
 3920  * non-zero. */
 3921 static int
 3922 xd3_source_extend_match (xd3_stream *stream)
 3923 {
 3924   int ret;
 3925   xd3_source *const src = stream->src;
 3926   xoff_t matchoff;  /* matchoff is the current right/left-boundary of
 3927                the source match being tested. */
 3928   usize_t streamoff; /* streamoff is the current right/left-boundary
 3929             of the input match being tested. */
 3930   xoff_t tryblk;    /* tryblk, tryoff are the block, offset position
 3931                of matchoff */
 3932   usize_t tryoff;
 3933   usize_t tryrem;    /* tryrem is the number of matchable bytes */
 3934   usize_t matched;
 3935 
 3936   XD3_ASSERT (src != NULL);
 3937 
 3938   /* Does it make sense to compute backward match AFTER forward match? */
 3939   if (stream->match_state == MATCH_BACKWARD)
 3940     {
 3941       /* Note: this code is practically duplicated below, substituting
 3942        * match_fwd/match_back and direction. */
 3943       matchoff  = stream->match_srcpos - stream->match_back;
 3944       streamoff = stream->input_position - stream->match_back;
 3945       xd3_blksize_div (matchoff, src, &tryblk, &tryoff);
 3946 
 3947       /* this loops backward over source blocks */
 3948       while (stream->match_back < stream->match_maxback)
 3949     {
 3950       /* see if we're backing across a source block boundary */
 3951       if (tryoff == 0)
 3952         {
 3953           tryoff  = src->blksize;
 3954           tryblk -= 1;
 3955         }
 3956 
 3957       if ((ret = xd3_getblk (stream, tryblk)))
 3958         {
 3959           if (ret == XD3_TOOFARBACK)
 3960         {
 3961           IF_DEBUG2(DP(RINT "[maxback] %"Q"u TOOFARBACK: %u INP %"Q"u CKSUM %"Q"u\n",
 3962                    tryblk, stream->match_back,
 3963                    stream->total_in + stream->input_position,
 3964                    stream->srcwin_cksum_pos));
 3965 
 3966           /* the starting position is too far back. */
 3967           if (stream->match_back == 0)
 3968             {
 3969               XD3_ASSERT(stream->match_fwd == 0);
 3970               goto donefwd;
 3971             }
 3972 
 3973           /* search went too far back, continue forward. */
 3974           goto doneback;
 3975         }
 3976 
 3977           /* could be a XD3_GETSRCBLK failure. */
 3978           return ret;
 3979         }
 3980 
 3981       tryrem = xd3_min (tryoff, stream->match_maxback - stream->match_back);
 3982 
 3983       IF_DEBUG2(DP(RINT "[maxback] maxback %u trysrc %"Q"u/%u tgt %u tryrem %u\n",
 3984                stream->match_maxback, tryblk, tryoff, streamoff, tryrem));
 3985 
 3986       /* TODO: This code can be optimized similar to xd3_match_forward() */
 3987       for (; tryrem != 0; tryrem -= 1, stream->match_back += 1)
 3988         {
 3989           if (src->curblk[tryoff-1] != stream->next_in[streamoff-1])
 3990         {
 3991           goto doneback;
 3992         }
 3993 
 3994           tryoff    -= 1;
 3995           streamoff -= 1;
 3996         }
 3997     }
 3998 
 3999     doneback:
 4000       stream->match_state = MATCH_FORWARD;
 4001     }
 4002 
 4003   XD3_ASSERT (stream->match_state == MATCH_FORWARD);
 4004 
 4005   matchoff  = stream->match_srcpos + stream->match_fwd;
 4006   streamoff = stream->input_position + stream->match_fwd;
 4007   xd3_blksize_div (matchoff, src, & tryblk, & tryoff);
 4008 
 4009   /* Note: practically the same code as backwards case above: same comments */
 4010   while (stream->match_fwd < stream->match_maxfwd)
 4011     {
 4012       if (tryoff == src->blksize)
 4013     {
 4014       tryoff  = 0;
 4015       tryblk += 1;
 4016     }
 4017 
 4018       if ((ret = xd3_getblk (stream, tryblk)))
 4019     {
 4020       if (ret == XD3_TOOFARBACK)
 4021         {
 4022           IF_DEBUG2(DP(RINT "[maxfwd] %"Q"u TOOFARBACK: %u INP %"Q"u CKSUM %"Q"u\n",
 4023                tryblk, stream->match_fwd,
 4024                stream->total_in + stream->input_position,
 4025                stream->srcwin_cksum_pos));
 4026           goto donefwd;
 4027         }
 4028 
 4029       /* could be a XD3_GETSRCBLK failure. */
 4030       return ret;
 4031     }
 4032 
 4033       tryrem = xd3_min(stream->match_maxfwd - stream->match_fwd,
 4034            src->onblk - tryoff);
 4035 
 4036       if (tryrem == 0)
 4037     {
 4038       /* Generally, this means we have a power-of-two size source
 4039        * and we just found the end-of-file, in this case it's an
 4040        * empty block. */
 4041       XD3_ASSERT (src->onblk < src->blksize);
 4042       break;
 4043     }
 4044 
 4045       matched = xd3_forward_match(src->curblk + tryoff,
 4046                   stream->next_in + streamoff,
 4047                   tryrem);
 4048       tryoff += matched;
 4049       streamoff += matched;
 4050       stream->match_fwd += matched;
 4051 
 4052       if (tryrem != matched)
 4053     {
 4054       break;
 4055     }
 4056     }
 4057 
 4058  donefwd:
 4059   stream->match_state = MATCH_SEARCHING;
 4060 
 4061   IF_DEBUG2(DP(RINT "[extend match] input %"Q"u srcpos %"Q"u len %u\n",
 4062            stream->input_position + stream->total_in,
 4063            stream->match_srcpos,
 4064            stream->match_fwd));
 4065 
 4066   /* If the match ends short of the last instruction end, we probably
 4067    * don't want it.  There is the possibility that a copy ends short
 4068    * of the last copy but also goes further back, in which case we
 4069    * might want it.  This code does not implement such: if so we would
 4070    * need more complicated xd3_iopt_erase logic. */
 4071   if (stream->match_fwd < stream->min_match)
 4072     {
 4073       stream->match_fwd = 0;
 4074     }
 4075   else
 4076     {
 4077       usize_t total  = stream->match_fwd + stream->match_back;
 4078 
 4079       /* Correct the variables to remove match_back from the equation. */
 4080       usize_t target_position = stream->input_position - stream->match_back;
 4081       usize_t match_length   = stream->match_back      + stream->match_fwd;
 4082       xoff_t match_position  = stream->match_srcpos    - stream->match_back;
 4083       xoff_t match_end       = stream->match_srcpos    + stream->match_fwd;
 4084 
 4085       /* At this point we may have to erase any iopt-buffer
 4086        * instructions that are fully covered by a backward-extending
 4087        * copy. */
 4088       if (stream->match_back > 0)
 4089     {
 4090       xd3_iopt_erase (stream, target_position, total);
 4091     }
 4092 
 4093       stream->match_back = 0;
 4094 
 4095       /* Update ranges.  The first source match occurs with both
 4096      values set to 0. */
 4097       if (stream->match_maxaddr == 0 ||
 4098       match_position < stream->match_minaddr)
 4099     {
 4100       stream->match_minaddr = match_position;
 4101     }
 4102 
 4103       if (match_end > stream->match_maxaddr)
 4104     {
 4105       /* Note: per-window */
 4106       stream->match_maxaddr = match_end;
 4107     }
 4108 
 4109       if (match_end > stream->maxsrcaddr)
 4110     {
 4111       /* Note: across windows */
 4112       stream->maxsrcaddr = match_end;
 4113     }
 4114 
 4115       IF_DEBUG2 ({
 4116     static int x = 0;
 4117     DP(RINT "[source match:%d] length %u <inp %"Q"u %"Q"u>  <src %"Q"u %"Q"u> (%s) [ %u bytes ]\n",
 4118        x++,
 4119        match_length,
 4120        stream->total_in + target_position,
 4121        stream->total_in + target_position + match_length,
 4122        match_position,
 4123        match_position + match_length,
 4124        (stream->total_in + target_position == match_position) ? "same" : "diff",
 4125        match_length);
 4126       });
 4127 
 4128       if ((ret = xd3_found_match (stream,
 4129                   /* decoder position */ target_position,
 4130                   /* length */ match_length,
 4131                   /* address */ match_position,
 4132                   /* is_source */ 1)))
 4133     {
 4134       return ret;
 4135     }
 4136 
 4137       /* If the match ends with the available input: */
 4138       if (target_position + match_length == stream->avail_in)
 4139     {
 4140       /* Setup continuing match for the next window. */
 4141       stream->match_state  = MATCH_TARGET;
 4142       stream->match_srcpos = match_end;
 4143     }
 4144     }
 4145 
 4146   return 0;
 4147 }
 4148 
 4149 /* Update the small hash.  Values in the small_table are offset by
 4150  * HASH_CKOFFSET (1) to distinguish empty buckets from real offsets. */
 4151 static void
 4152 xd3_scksum_insert (xd3_stream *stream,
 4153            usize_t inx,
 4154            usize_t scksum,
 4155            usize_t pos)
 4156 {
 4157   /* If we are maintaining previous duplicates. */
 4158   if (stream->small_prev)
 4159     {
 4160       usize_t    last_pos = stream->small_table[inx];
 4161       xd3_slist *pos_list = & stream->small_prev[pos & stream->sprevmask];
 4162 
 4163       /* Note last_pos is offset by HASH_CKOFFSET. */
 4164       pos_list->last_pos = last_pos;
 4165     }
 4166 
 4167   /* Enter the new position into the hash bucket. */
 4168   stream->small_table[inx] = pos + HASH_CKOFFSET;
 4169 }
 4170 
 4171 #if XD3_DEBUG
 4172 static int
 4173 xd3_check_smatch (const uint8_t *ref0, const uint8_t *inp0,
 4174           const uint8_t *inp_max, usize_t cmp_len)
 4175 {
 4176   usize_t i;
 4177 
 4178   for (i = 0; i < cmp_len; i += 1)
 4179     {
 4180       XD3_ASSERT (ref0[i] == inp0[i]);
 4181     }
 4182 
 4183   if (inp0 + cmp_len < inp_max)
 4184     {
 4185       XD3_ASSERT (inp0[i] != ref0[i]);
 4186     }
 4187 
 4188   return 1;
 4189 }
 4190 #endif /* XD3_DEBUG */
 4191 
 4192 /* When the hash table indicates a possible small string match, it
 4193  * calls this routine to find the best match.  The first matching
 4194  * position is taken from the small_table, HASH_CKOFFSET is subtracted
 4195  * to get the actual position.  After checking that match, if previous
 4196  * linked lists are in use (because stream->smatcher.small_chain > 1),
 4197  * previous matches are tested searching for the longest match.  If
 4198  * (stream->min_match > MIN_MATCH) then a lazy match is in effect.
 4199  */
 4200 static usize_t
 4201 xd3_smatch (xd3_stream *stream,
 4202         usize_t base,
 4203         usize_t scksum,
 4204         usize_t *match_offset)
 4205 {
 4206   usize_t cmp_len;
 4207   usize_t match_length = 0;
 4208   usize_t chain = (stream->min_match == MIN_MATCH ?
 4209                    stream->smatcher.small_chain :
 4210                    stream->smatcher.small_lchain);
 4211   const uint8_t *inp_max = stream->next_in + stream->avail_in;
 4212   const uint8_t *inp;
 4213   const uint8_t *ref;
 4214 
 4215   SMALL_HASH_DEBUG1 (stream, stream->next_in + stream->input_position);
 4216 
 4217   XD3_ASSERT (stream->min_match + stream->input_position <= stream->avail_in);
 4218 
 4219   base -= HASH_CKOFFSET;
 4220 
 4221  again:
 4222 
 4223   IF_DEBUG2 (DP(RINT "smatch at base=%u inp=%u cksum=%u\n", base,
 4224                 stream->input_position, scksum));
 4225 
 4226   /* For small matches, we can always go to the end-of-input because
 4227    * the matching position must be less than the input position. */
 4228   XD3_ASSERT (base < stream->input_position);
 4229 
 4230   ref = stream->next_in + base;
 4231   inp = stream->next_in + stream->input_position;
 4232 
 4233   SMALL_HASH_DEBUG2 (stream, ref);
 4234 
 4235   /* Expand potential match forward. */
 4236   while (inp < inp_max && *inp == *ref)
 4237     {
 4238       ++inp;
 4239       ++ref;
 4240     }
 4241 
 4242   cmp_len = (usize_t)(inp - (stream->next_in + stream->input_position));
 4243 
 4244   /* Verify correctness */
 4245   XD3_ASSERT (xd3_check_smatch (stream->next_in + base,
 4246                 stream->next_in + stream->input_position,
 4247                 inp_max, cmp_len));
 4248 
 4249   /* Update longest match */
 4250   if (cmp_len > match_length)
 4251     {
 4252       ( match_length) = cmp_len;
 4253       (*match_offset) = base;
 4254 
 4255       /* Stop if we match the entire input or have a long_enough match. */
 4256       if (inp == inp_max || cmp_len >= stream->smatcher.long_enough)
 4257     {
 4258       goto done;
 4259     }
 4260     }
 4261 
 4262   /* If we have not reached the chain limit, see if there is another
 4263      previous position. */
 4264   while (--chain != 0)
 4265     {
 4266       /* Calculate the previous offset. */
 4267       usize_t prev_pos = stream->small_prev[base & stream->sprevmask].last_pos;
 4268       usize_t diff_pos;
 4269 
 4270        if (prev_pos == 0)
 4271     {
 4272       break;
 4273     }
 4274 
 4275       prev_pos -= HASH_CKOFFSET;
 4276 
 4277       if (prev_pos > base)
 4278         {
 4279           break;
 4280         }
 4281 
 4282       base = prev_pos;
 4283 
 4284       XD3_ASSERT (stream->input_position > base);
 4285       diff_pos = stream->input_position - base;
 4286 
 4287       /* Stop searching if we go beyond sprevsz, since those entries
 4288        * are for unrelated checksum entries. */
 4289       if (diff_pos & ~stream->sprevmask)
 4290         {
 4291           break;
 4292         }
 4293 
 4294       goto again;
 4295     }
 4296 
 4297  done:
 4298   /* Crude efficiency test: if the match is very short and very far back, it's
 4299    * unlikely to help, but the exact calculation requires knowing the state of
 4300    * the address cache and adjacent instructions, which we can't do here.
 4301    * Rather than encode a probably inefficient copy here and check it later
 4302    * (which complicates the code a lot), do this:
 4303    */
 4304   if (match_length == 4 && stream->input_position - (*match_offset) >= 1<<14)
 4305     {
 4306       /* It probably takes >2 bytes to encode an address >= 2^14 from here */
 4307       return 0;
 4308     }
 4309   if (match_length == 5 && stream->input_position - (*match_offset) >= 1<<21)
 4310     {
 4311       /* It probably takes >3 bytes to encode an address >= 2^21 from here */
 4312       return 0;
 4313     }
 4314 
 4315   /* It's unlikely that a window is large enough for the (match_length == 6 &&
 4316    * address >= 2^28) check */
 4317   return match_length;
 4318 }
 4319 
 4320 #if XD3_DEBUG
 4321 static void
 4322 xd3_verify_small_state (xd3_stream    *stream,
 4323             const uint8_t *inp,
 4324             uint32_t          x_cksum)
 4325 {
 4326   uint32_t state;
 4327   uint32_t cksum = xd3_scksum (&state, inp, stream->smatcher.small_look);
 4328 
 4329   XD3_ASSERT (cksum == x_cksum);
 4330 }
 4331 
 4332 static void
 4333 xd3_verify_large_state (xd3_stream    *stream,
 4334             const uint8_t *inp,
 4335             uint32_t          x_cksum)
 4336 {
 4337   uint32_t cksum = xd3_lcksum (inp, stream->smatcher.large_look);
 4338   XD3_ASSERT (cksum == x_cksum);
 4339 }
 4340 static void
 4341 xd3_verify_run_state (xd3_stream    *stream,
 4342               const uint8_t *inp,
 4343               usize_t        x_run_l,
 4344               uint8_t       *x_run_c)
 4345 {
 4346   usize_t slook = stream->smatcher.small_look;
 4347   uint8_t run_c;
 4348   usize_t run_l = xd3_comprun (inp, slook, &run_c);
 4349 
 4350   XD3_ASSERT (run_l == 0 || run_c == *x_run_c);
 4351   XD3_ASSERT (x_run_l > slook || run_l == x_run_l);
 4352 }
 4353 #endif /* XD3_DEBUG */
 4354 
 4355 /* This function computes more source checksums to advance the window.
 4356  * Called at every entrance to the string-match loop and each time
 4357  * stream->input_position reaches the value returned as
 4358  * *next_move_point.  NB: this is one of the most expensive functions
 4359  * in this code and also the most critical for good compression.
 4360  */
 4361 static int
 4362 xd3_srcwin_move_point (xd3_stream *stream, usize_t *next_move_point)
 4363 {
 4364   /* the source file is indexed until this point */
 4365   xoff_t target_cksum_pos;
 4366   /* the absolute target file input position */
 4367   xoff_t absolute_input_pos;
 4368 
 4369   if (stream->src->eof_known)
 4370     {
 4371       xoff_t source_size = xd3_source_eof (stream->src);
 4372       XD3_ASSERT(stream->srcwin_cksum_pos <= source_size);
 4373 
 4374       if (stream->srcwin_cksum_pos == source_size)
 4375     {
 4376       *next_move_point = USIZE_T_MAX;
 4377       return 0;
 4378     }
 4379     }
 4380 
 4381   absolute_input_pos = stream->total_in + stream->input_position;
 4382 
 4383   /* Immediately read the entire window. 
 4384    *
 4385    * Note: this reverses a long held policy, at this point in the
 4386    * code, of advancing relatively slowly as the input is read, which
 4387    * results in better compression for very-similar inputs, but worse
 4388    * compression where data is deleted near the beginning of the file.
 4389    * 
 4390    * The new policy is simpler, somewhat slower and can benefit, or
 4391    * slightly worsen, compression performance. */
 4392   if (absolute_input_pos < stream->src->max_winsize / 2)
 4393     {
 4394       target_cksum_pos = stream->src->max_winsize;
 4395     }
 4396   else
 4397     {
 4398       /* TODO: The addition of 2 blocks here is arbitrary.  Do a
 4399        * better job of stream alignment based on observed source copy
 4400        * addresses, and when both input sizes are known, the
 4401        * difference in size. */
 4402       target_cksum_pos = absolute_input_pos +
 4403     stream->src->max_winsize / 2 +
 4404     stream->src->blksize * 2;
 4405       target_cksum_pos &= ~stream->src->maskby;
 4406     }
 4407 
 4408   /* A long match may have extended past srcwin_cksum_pos.  Don't
 4409    * start checksumming already-matched source data. */
 4410   if (stream->maxsrcaddr > stream->srcwin_cksum_pos)
 4411     {
 4412       stream->srcwin_cksum_pos = stream->maxsrcaddr;
 4413     }
 4414 
 4415   if (target_cksum_pos < stream->srcwin_cksum_pos)
 4416     {
 4417       target_cksum_pos = stream->srcwin_cksum_pos;
 4418     }
 4419 
 4420   while (stream->srcwin_cksum_pos < target_cksum_pos &&
 4421      (!stream->src->eof_known ||
 4422       stream->srcwin_cksum_pos < xd3_source_eof (stream->src)))
 4423     {
 4424       xoff_t  blkno;
 4425       xoff_t  blkbaseoffset;
 4426       usize_t blkrem;
 4427       ssize_t oldpos;  /* Using ssize_t because of a  */
 4428       ssize_t blkpos;  /* do { blkpos-- }
 4429               while (blkpos >= oldpos); */
 4430       int ret;
 4431       xd3_blksize_div (stream->srcwin_cksum_pos,
 4432                stream->src, &blkno, &blkrem);
 4433       oldpos = blkrem;
 4434 
 4435       if ((ret = xd3_getblk (stream, blkno)))
 4436     {
 4437       /* TOOFARBACK should never occur here, since we read forward. */
 4438       if (ret == XD3_TOOFARBACK)
 4439         {
 4440           ret = XD3_INTERNAL;
 4441         }
 4442 
 4443       IF_DEBUG1 (DP(RINT
 4444             "[srcwin_move_point] async getblk return for %"Q"u: %s\n",
 4445             blkno, xd3_strerror (ret)));
 4446       return ret;
 4447     }
 4448 
 4449       IF_DEBUG1 (DP(RINT
 4450             "[srcwin_move_point] block %"Q"u T=%"Q"u S=%"Q"u L=%"Q"u EOF=%"Q"u %s\n",
 4451             blkno,
 4452             stream->total_in + stream->input_position,
 4453             stream->srcwin_cksum_pos,
 4454             target_cksum_pos,
 4455             xd3_source_eof (stream->src),
 4456             stream->src->eof_known ? "known" : "unknown"));
 4457 
 4458       blkpos = xd3_bytes_on_srcblk (stream->src, blkno);
 4459 
 4460       if (blkpos < (ssize_t) stream->smatcher.large_look)
 4461     {
 4462       stream->srcwin_cksum_pos = (blkno + 1) * stream->src->blksize;
 4463       IF_DEBUG2 (DP(RINT "[srcwin_move_point] continue (end-of-block): %"Q"u\n", blkpos));
 4464       continue;
 4465     }
 4466 
 4467       /* This inserts checksums for the entire block, in reverse,
 4468        * starting from the end of the block.  This logic does not test
 4469        * stream->srcwin_cksum_pos because it always advances it to the
 4470        * start of the next block.
 4471        *
 4472        * oldpos is the srcwin_cksum_pos within this block.  blkpos is
 4473        * the number of bytes available.  Each iteration inspects
 4474        * large_look bytes then steps back large_step bytes.  The
 4475        * if-stmt above ensures at least one large_look of data. */
 4476       blkpos -= stream->smatcher.large_look;
 4477       blkbaseoffset = stream->src->blksize * blkno;
 4478 
 4479       do
 4480     {
 4481       uint32_t cksum = xd3_lcksum (stream->src->curblk + blkpos,
 4482                        stream->smatcher.large_look);
 4483       usize_t hval = xd3_checksum_hash (& stream->large_hash, cksum);
 4484 
 4485       stream->large_table[hval] =
 4486         (usize_t) (blkbaseoffset +
 4487                (xoff_t)(blkpos + HASH_CKOFFSET));
 4488 
 4489       IF_DEBUG (stream->large_ckcnt += 1);
 4490 
 4491       blkpos -= stream->smatcher.large_step;
 4492     }
 4493       while (blkpos >= oldpos);
 4494 
 4495       stream->srcwin_cksum_pos = (blkno + 1) * stream->src->blksize;
 4496     }
 4497 
 4498   if (stream->src->eof_known)
 4499     {
 4500       xoff_t source_size = xd3_source_eof (stream->src);
 4501       if (stream->srcwin_cksum_pos >= source_size)
 4502     {
 4503       /* This invariant is needed for xd3_source_cksum_offset() */
 4504       stream->srcwin_cksum_pos = source_size;
 4505       *next_move_point = USIZE_T_MAX;
 4506       IF_DEBUG1 (DP(RINT
 4507             "[srcwin_move_point] finished with source input\n"));
 4508       return 0;
 4509     }
 4510     }
 4511 
 4512   /* How long until this function should be called again. */
 4513   XD3_ASSERT(stream->srcwin_cksum_pos >= target_cksum_pos);
 4514 
 4515   *next_move_point = stream->input_position +
 4516     stream->src->blksize -
 4517     ((stream->srcwin_cksum_pos - target_cksum_pos) & stream->src->maskby);
 4518   
 4519   IF_DEBUG2 (DP(RINT
 4520         "[srcwin_move_point] finished T=%"Q"u "
 4521         "S=%"Q"u L=%"Q"u EOF=%"Q"u %s again in %u\n",
 4522         stream->total_in + stream->input_position,
 4523         stream->srcwin_cksum_pos,
 4524         target_cksum_pos,
 4525         xd3_source_eof (stream->src),
 4526         stream->src->eof_known ? "known" : "unknown",
 4527         *next_move_point - stream->input_position));
 4528 
 4529   return 0;
 4530 }
 4531 
 4532 #endif /* XD3_ENCODER */
 4533 
 4534 /********************************************************************
 4535  TEMPLATE pass
 4536  *********************************************************************/
 4537 
 4538 #endif /* __XDELTA3_C_INLINE_PASS__ */
 4539 #ifdef __XDELTA3_C_TEMPLATE_PASS__
 4540 
 4541 #if XD3_ENCODER
 4542 
 4543 /********************************************************************
 4544  Templates
 4545  *******************************************************************/
 4546 
 4547 /* Template macros */
 4548 #define XD3_TEMPLATE(x)      XD3_TEMPLATE2(x,TEMPLATE)
 4549 #define XD3_TEMPLATE2(x,n)   XD3_TEMPLATE3(x,n)
 4550 #define XD3_TEMPLATE3(x,n)   x ## n
 4551 #define XD3_STRINGIFY(x)     XD3_STRINGIFY2(x)
 4552 #define XD3_STRINGIFY2(x)    #x
 4553 
 4554 static int XD3_TEMPLATE(xd3_string_match_) (xd3_stream *stream);
 4555 
 4556 static const xd3_smatcher XD3_TEMPLATE(__smatcher_) =
 4557 {
 4558   XD3_STRINGIFY(TEMPLATE),
 4559   XD3_TEMPLATE(xd3_string_match_),
 4560 #if SOFTCFG == 1
 4561   0, 0, 0, 0, 0, 0, 0
 4562 #else
 4563   LLOOK, LSTEP, SLOOK, SCHAIN, SLCHAIN, MAXLAZY, LONGENOUGH
 4564 #endif
 4565 };
 4566 
 4567 static int
 4568 XD3_TEMPLATE(xd3_string_match_) (xd3_stream *stream)
 4569 {
 4570   const int      DO_SMALL = ! (stream->flags & XD3_NOCOMPRESS);
 4571   const int      DO_LARGE = (stream->src != NULL);
 4572   const int      DO_RUN   = (1);
 4573 
 4574   const uint8_t *inp;
 4575   uint32_t       scksum = 0;
 4576   uint32_t       scksum_state = 0;
 4577   uint32_t       lcksum = 0;
 4578   usize_t        sinx;
 4579   usize_t        linx;
 4580   uint8_t        run_c;
 4581   usize_t        run_l;
 4582   int            ret;
 4583   usize_t        match_length;
 4584   usize_t        match_offset = 0;
 4585   usize_t        next_move_point;
 4586 
 4587   IF_DEBUG2(DP(RINT "[string_match] initial entry %u\n", stream->input_position));
 4588 
 4589   /* If there will be no compression due to settings or short input,
 4590    * skip it entirely. */
 4591   if (! (DO_SMALL || DO_LARGE || DO_RUN) ||
 4592       stream->input_position + SLOOK > stream->avail_in) { goto loopnomore; }
 4593 
 4594   if ((ret = xd3_string_match_init (stream))) { return ret; }
 4595 
 4596   /* The restartloop label is reached when the incremental loop state
 4597    * needs to be reset. */
 4598  restartloop:
 4599 
 4600   IF_DEBUG2(DP(RINT "[string_match] restartloop %u\n", stream->input_position));
 4601 
 4602   /* If there is not enough input remaining for any kind of match,
 4603      skip it. */
 4604   if (stream->input_position + SLOOK > stream->avail_in) { goto loopnomore; }
 4605 
 4606   /* Now reset the incremental loop state: */
 4607 
 4608   /* The min_match variable is updated to avoid matching the same lazy
 4609    * match over and over again.  For example, if you find a (small)
 4610    * match of length 9 at one position, you will likely find a match
 4611    * of length 8 at the next position. */
 4612   if (xd3_iopt_last_matched (stream) > stream->input_position)
 4613     {
 4614       stream->min_match = xd3_max (MIN_MATCH,
 4615                    1 + xd3_iopt_last_matched(stream) -
 4616                    stream->input_position);
 4617     }
 4618   else
 4619     {
 4620       stream->min_match = MIN_MATCH;
 4621     }
 4622 
 4623   /* The current input byte. */
 4624   inp = stream->next_in + stream->input_position;
 4625 
 4626   /* Small match state. */
 4627   if (DO_SMALL)
 4628     {
 4629       scksum = xd3_scksum (&scksum_state, inp, SLOOK);
 4630     }
 4631 
 4632   /* Run state. */
 4633   if (DO_RUN)
 4634     {
 4635       run_l = xd3_comprun (inp, SLOOK, & run_c);
 4636     }
 4637 
 4638   /* Large match state.  We continue the loop even after not enough
 4639    * bytes for LLOOK remain, so always check stream->input_position in
 4640    * DO_LARGE code. */
 4641   if (DO_LARGE && (stream->input_position + LLOOK <= stream->avail_in))
 4642     {
 4643       /* Source window: next_move_point is the point that
 4644        * stream->input_position must reach before computing more
 4645        * source checksum.  Note: this is called unconditionally
 4646        * the first time after reentry, subsequent calls will be
 4647        * avoided if next_move_point is > input_position */
 4648       if ((ret = xd3_srcwin_move_point (stream, & next_move_point)))
 4649     {
 4650       return ret;
 4651     }
 4652 
 4653       lcksum = xd3_lcksum (inp, LLOOK);
 4654     }
 4655 
 4656   /* TRYLAZYLEN: True if a certain length match should be followed by
 4657    * lazy search.  This checks that LEN is shorter than MAXLAZY and
 4658    * that there is enough leftover data to consider lazy matching.
 4659    * "Enough" is set to 2 since the next match will start at the next
 4660    * offset, it must match two extra characters. */
 4661 #define TRYLAZYLEN(LEN,POS,MAX) ((MAXLAZY) > 0 && (LEN) < (MAXLAZY) \
 4662                  && (POS) + (LEN) <= (MAX) - 2)
 4663 
 4664   /* HANDLELAZY: This statement is called each time an instruciton is
 4665    * emitted (three cases).  If the instruction is large enough, the
 4666    * loop is restarted, otherwise lazy matching may ensue. */
 4667 #define HANDLELAZY(mlen) \
 4668   if (TRYLAZYLEN ((mlen), (stream->input_position), (stream->avail_in))) \
 4669     { stream->min_match = (mlen) + LEAST_MATCH_INCR; goto updateone; } \
 4670   else \
 4671     { stream->input_position += (mlen); goto restartloop; }
 4672 
 4673   /* Now loop over one input byte at a time until a match is found... */
 4674   for (;; inp += 1, stream->input_position += 1)
 4675     {
 4676       /* Now we try three kinds of string match in order of expense:
 4677        * run, large match, small match. */
 4678 
 4679       /* Expand the start of a RUN.  The test for (run_l == SLOOK)
 4680        * avoids repeating this check when we pass through a run area
 4681        * performing lazy matching.  The run is only expanded once when
 4682        * the min_match is first reached.  If lazy matching is
 4683        * performed, the run_l variable will remain inconsistent until
 4684        * the first non-running input character is reached, at which
 4685        * time the run_l may then again grow to SLOOK. */
 4686       if (DO_RUN && run_l == SLOOK)
 4687     {
 4688       usize_t max_len = stream->avail_in - stream->input_position;
 4689 
 4690       IF_DEBUG (xd3_verify_run_state (stream, inp, run_l, &run_c));
 4691 
 4692       while (run_l < max_len && inp[run_l] == run_c) { run_l += 1; }
 4693 
 4694       /* Output a RUN instruction. */
 4695       if (run_l >= stream->min_match && run_l >= MIN_RUN)
 4696         {
 4697           if ((ret = xd3_emit_run (stream, stream->input_position,
 4698                        run_l, &run_c))) { return ret; }
 4699 
 4700           HANDLELAZY (run_l);
 4701         }
 4702     }
 4703 
 4704       /* If there is enough input remaining. */
 4705       if (DO_LARGE && (stream->input_position + LLOOK <= stream->avail_in))
 4706     {
 4707       if ((stream->input_position >= next_move_point) &&
 4708           (ret = xd3_srcwin_move_point (stream, & next_move_point)))
 4709         {
 4710           return ret;
 4711         }
 4712 
 4713       linx = xd3_checksum_hash (& stream->large_hash, lcksum);
 4714 
 4715       IF_DEBUG (xd3_verify_large_state (stream, inp, lcksum));
 4716 
 4717       if (stream->large_table[linx] != 0)
 4718         {
 4719           /* the match_setup will fail if the source window has
 4720            * been decided and the match lies outside it.
 4721            * OPT: Consider forcing a window at this point to
 4722            * permit a new source window. */
 4723           xoff_t adj_offset =
 4724         xd3_source_cksum_offset(stream,
 4725                     stream->large_table[linx] -
 4726                     HASH_CKOFFSET);
 4727           if (xd3_source_match_setup (stream, adj_offset) == 0)
 4728         {
 4729           if ((ret = xd3_source_extend_match (stream)))
 4730             {
 4731               return ret;
 4732             }
 4733 
 4734           /* Update stream position.  match_fwd is zero if no
 4735            * match. */
 4736           if (stream->match_fwd > 0)
 4737             {
 4738               HANDLELAZY (stream->match_fwd);
 4739             }
 4740         }
 4741         }
 4742     }
 4743 
 4744       /* Small matches. */
 4745       if (DO_SMALL)
 4746     {
 4747       sinx = xd3_checksum_hash (& stream->small_hash, scksum);
 4748 
 4749       /* Verify incremental state in debugging mode. */
 4750       IF_DEBUG (xd3_verify_small_state (stream, inp, scksum));
 4751 
 4752       /* Search for the longest match */
 4753       if (stream->small_table[sinx] != 0)
 4754         {
 4755           match_length = xd3_smatch (stream,
 4756                      stream->small_table[sinx],
 4757                      scksum,
 4758                      & match_offset);
 4759         }
 4760       else
 4761         {
 4762           match_length = 0;
 4763         }
 4764 
 4765       /* Insert a hash for this string. */
 4766       xd3_scksum_insert (stream, sinx, scksum, stream->input_position);
 4767 
 4768       /* Maybe output a COPY instruction */
 4769       if (match_length >= stream->min_match)
 4770         {
 4771           IF_DEBUG2 ({
 4772         static int x = 0;
 4773         DP(RINT "[target match:%d] <inp %u %u>  <cpy %u %u> "
 4774            "(-%d) [ %u bytes ]\n",
 4775            x++,
 4776            stream->input_position,
 4777            stream->input_position + match_length,
 4778            match_offset,
 4779            match_offset + match_length,
 4780            stream->input_position - match_offset,
 4781            match_length);
 4782           });
 4783 
 4784           if ((ret = xd3_found_match (stream,
 4785                       /* decoder position */
 4786                       stream->input_position,
 4787                       /* length */ match_length,
 4788                       /* address */ (xoff_t) match_offset,
 4789                       /* is_source */ 0)))
 4790         {
 4791           return ret;
 4792         }
 4793 
 4794           /* Copy instruction. */
 4795           HANDLELAZY (match_length);
 4796         }
 4797     }
 4798 
 4799       /* The logic above prevents excess work during lazy matching by
 4800        * increasing min_match to avoid smaller matches.  Each time we
 4801        * advance stream->input_position by one, the minimum match
 4802        * shortens as well.  */
 4803       if (stream->min_match > MIN_MATCH)
 4804     {
 4805       stream->min_match -= 1;
 4806     }
 4807 
 4808     updateone:
 4809 
 4810       /* See if there are no more incremental cksums to compute. */
 4811       if (stream->input_position + SLOOK == stream->avail_in)
 4812     {
 4813       goto loopnomore;
 4814     }
 4815 
 4816       /* Compute next RUN, CKSUM */
 4817       if (DO_RUN)
 4818     {
 4819       NEXTRUN (inp[SLOOK]);
 4820     }
 4821 
 4822       if (DO_SMALL)
 4823     {
 4824       scksum = xd3_small_cksum_update (&scksum_state, inp, SLOOK);
 4825     }
 4826 
 4827       if (DO_LARGE && (stream->input_position + LLOOK < stream->avail_in))
 4828     {
 4829       lcksum = xd3_large_cksum_update (lcksum, inp, LLOOK);
 4830     }
 4831     }
 4832 
 4833  loopnomore:
 4834   return 0;
 4835 }
 4836 
 4837 #endif /* XD3_ENCODER */
 4838 #endif /* __XDELTA3_C_TEMPLATE_PASS__ */