"Fossies" - the Fresh Open Source Software Archive 
Member "pigz-2.8/pigz.c" (20 Aug 2023, 180222 Bytes) of package /linux/privat/pigz-2.8.tar.gz:
As a special service "Fossies" has tried to format the requested source page into HTML format using (guessed) C and C++ source code syntax highlighting (style:
standard) with prefixed line numbers and
code folding option.
Alternatively you can here
view or
download the uninterpreted source code file.
For more information about "pigz.c" see the
Fossies "Dox" file reference documentation and the latest
Fossies "Diffs" side-by-side code changes report:
2.7_vs_2.8.
1 /* pigz.c -- parallel implementation of gzip
2 * Copyright (C) 2007-2023 Mark Adler
3 * Version 2.8 19 Aug 2023 Mark Adler
4 */
5
6 /*
7 This software is provided 'as-is', without any express or implied
8 warranty. In no event will the author be held liable for any damages
9 arising from the use of this software.
10
11 Permission is granted to anyone to use this software for any purpose,
12 including commercial applications, and to alter it and redistribute it
13 freely, subject to the following restrictions:
14
15 1. The origin of this software must not be misrepresented; you must not
16 claim that you wrote the original software. If you use this software
17 in a product, an acknowledgment in the product documentation would be
18 appreciated but is not required.
19 2. Altered source versions must be plainly marked as such, and must not be
20 misrepresented as being the original software.
21 3. This notice may not be removed or altered from any source distribution.
22
23 Mark Adler
24 madler@alumni.caltech.edu
25
26 */
27
28 /* Version history:
29 1.0 17 Jan 2007 First version, pipe only
30 1.1 28 Jan 2007 Avoid void * arithmetic (some compilers don't get that)
31 Add note about requiring zlib 1.2.3
32 Allow compression level 0 (no compression)
33 Completely rewrite parallelism -- add a write thread
34 Use deflateSetDictionary() to make use of history
35 Tune argument defaults to best performance on four cores
36 1.2.1 1 Feb 2007 Add long command line options, add all gzip options
37 Add debugging options
38 1.2.2 19 Feb 2007 Add list (--list) function
39 Process file names on command line, write .gz output
40 Write name and time in gzip header, set output file time
41 Implement all command line options except --recursive
42 Add --keep option to prevent deleting input files
43 Add thread tracing information with -vv used
44 Copy crc32_combine() from zlib (shared libraries issue)
45 1.3 25 Feb 2007 Implement --recursive
46 Expand help to show all options
47 Show help if no arguments or output piping are provided
48 Process options in GZIP environment variable
49 Add progress indicator to write thread if --verbose
50 1.4 4 Mar 2007 Add --independent to facilitate damaged file recovery
51 Reallocate jobs for new --blocksize or --processes
52 Do not delete original if writing to stdout
53 Allow --processes 1, which does no threading
54 Add NOTHREAD define to compile without threads
55 Incorporate license text from zlib in source code
56 1.5 25 Mar 2007 Reinitialize jobs for new compression level
57 Copy attributes and owner from input file to output file
58 Add decompression and testing
59 Add -lt (or -ltv) to show all entries and proper lengths
60 Add decompression, testing, listing of LZW (.Z) files
61 Only generate and show trace log if DEBUG defined
62 Take "-" argument to mean read file from stdin
63 1.6 30 Mar 2007 Add zlib stream compression (--zlib), and decompression
64 1.7 29 Apr 2007 Decompress first entry of a zip file (if deflated)
65 Avoid empty deflate blocks at end of deflate stream
66 Show zlib check value (Adler-32) when listing
67 Don't complain when decompressing empty file
68 Warn about trailing junk for gzip and zlib streams
69 Make listings consistent, ignore gzip extra flags
70 Add zip stream compression (--zip)
71 1.8 13 May 2007 Document --zip option in help output
72 2.0 19 Oct 2008 Complete rewrite of thread usage and synchronization
73 Use polling threads and a pool of memory buffers
74 Remove direct pthread library use, hide in yarn.c
75 2.0.1 20 Oct 2008 Check version of zlib at compile time, need >= 1.2.3
76 2.1 24 Oct 2008 Decompress with read, write, inflate, and check threads
77 Remove spurious use of ctime_r(), ctime() more portable
78 Change application of job->calc lock to be a semaphore
79 Detect size of off_t at run time to select %lu vs. %llu
80 #define large file support macro even if not __linux__
81 Remove _LARGEFILE64_SOURCE, _FILE_OFFSET_BITS is enough
82 Detect file-too-large error and report, blame build
83 Replace check combination routines with those from zlib
84 2.1.1 28 Oct 2008 Fix a leak for files with an integer number of blocks
85 Update for yarn 1.1 (yarn_prefix and yarn_abort)
86 2.1.2 30 Oct 2008 Work around use of beta zlib in production systems
87 2.1.3 8 Nov 2008 Don't use zlib combination routines, put back in pigz
88 2.1.4 9 Nov 2008 Fix bug when decompressing very short files
89 2.1.5 20 Jul 2009 Added 2008, 2009 to --license statement
90 Allow numeric parameter immediately after -p or -b
91 Enforce parameter after -p, -b, -s, before other options
92 Enforce numeric parameters to have only numeric digits
93 Try to determine the number of processors for -p default
94 Fix --suffix short option to be -S to match gzip [Bloch]
95 Decompress if executable named "unpigz" [Amundsen]
96 Add a little bit of testing to Makefile
97 2.1.6 17 Jan 2010 Added pigz.spec to distribution for RPM systems [Brown]
98 Avoid some compiler warnings
99 Process symbolic links if piping to stdout [Hoffstätte]
100 Decompress if executable named "gunzip" [Hoffstätte]
101 Allow ".tgz" suffix [Chernookiy]
102 Fix adler32 comparison on .zz files
103 2.1.7 17 Dec 2011 Avoid unused parameter warning in reenter()
104 Don't assume 2's complement ints in compress_thread()
105 Replicate gzip -cdf cat-like behavior
106 Replicate gzip -- option to suppress option decoding
107 Test output from make test instead of showing it
108 Updated pigz.spec to install unpigz, pigz.1 [Obermaier]
109 Add PIGZ environment variable [Mueller]
110 Replicate gzip suffix search when decoding or listing
111 Fix bug in load() to set in_left to zero on end of file
112 Do not check suffix when input file won't be modified
113 Decompress to stdout if name is "*cat" [Hayasaka]
114 Write data descriptor signature to be like Info-ZIP
115 Update and sort options list in help
116 Use CC variable for compiler in Makefile
117 Exit with code 2 if a warning has been issued
118 Fix thread synchronization problem when tracing
119 Change macro name MAX to MAX2 to avoid library conflicts
120 Determine number of processors on HP-UX [Lloyd]
121 2.2 31 Dec 2011 Check for expansion bound busting (e.g. modified zlib)
122 Make the "threads" list head global variable volatile
123 Fix construction and printing of 32-bit check values
124 Add --rsyncable functionality
125 2.2.1 1 Jan 2012 Fix bug in --rsyncable buffer management
126 2.2.2 1 Jan 2012 Fix another bug in --rsyncable buffer management
127 2.2.3 15 Jan 2012 Remove volatile in yarn.c
128 Reduce the number of input buffers
129 Change initial rsyncable hash to comparison value
130 Improve the efficiency of arriving at a byte boundary
131 Add thread portability #defines from yarn.c
132 Have rsyncable compression be independent of threading
133 Fix bug where constructed dictionaries not being used
134 2.2.4 11 Mar 2012 Avoid some return value warnings
135 Improve the portability of printing the off_t type
136 Check for existence of compress binary before using
137 Update zlib version checking to 1.2.6 for new functions
138 Fix bug in zip (-K) output
139 Fix license in pigz.spec
140 Remove thread portability #defines in pigz.c
141 2.2.5 28 Jul 2012 Avoid race condition in free_pool()
142 Change suffix to .tar when decompressing or listing .tgz
143 Print name of executable in error messages
144 Show help properly when the name is unpigz or gunzip
145 Fix permissions security problem before output is closed
146 2.3 3 Mar 2013 Don't complain about missing suffix on stdout
147 Put all global variables in a structure for readability
148 Do not decompress concatenated zlib streams (just gzip)
149 Add option for compression level 11 to use zopfli
150 Fix handling of junk after compressed data
151 2.3.1 9 Oct 2013 Fix builds of pigzt and pigzn to include zopfli
152 Add -lm, needed to link log function on some systems
153 Respect LDFLAGS in Makefile, use CFLAGS consistently
154 Add memory allocation tracking
155 Fix casting error in uncompressed length calculation
156 Update zopfli to Mar 10, 2013 Google state
157 Support zopfli in single thread case
158 Add -F, -I, -M, and -O options for zopfli tuning
159 2.3.2 24 Jan 2015 Change whereis to which in Makefile for portability
160 Return zero exit code when only warnings are issued
161 Increase speed of unlzw (Unix compress decompression)
162 Update zopfli to current google state
163 Allow larger maximum blocksize (-b), now 512 MiB
164 Do not require that -d precede -N, -n, -T options
165 Strip any path from header name for -dN or -dNT
166 Remove use of PATH_MAX (PATH_MAX is not reliable)
167 Do not abort on inflate data error, do remaining files
168 Check gzip header CRC if present
169 Improve decompression error detection and reporting
170 2.3.3 24 Jan 2015 Portability improvements
171 Update copyright years in documentation
172 2.3.4 1 Oct 2016 Fix an out of bounds access due to invalid LZW input
173 Add an extra sync marker between independent blocks
174 Add zlib version for verbose version option (-vV)
175 Permit named pipes as input (e.g. made by mkfifo())
176 Fix a bug in -r directory traversal
177 Add warning for a zip file entry 4 GiB or larger
178 2.4 26 Dec 2017 Portability improvements
179 Produce Zip64 format when needed for --zip (>= 4 GiB)
180 Make -no-name compatible with gzip, add --time option
181 Add -m as a short option for --no-time
182 Check run-time zlib version to handle weak linking
183 Fix a concurrent read bug in --list operation
184 Process options first, for gzip compatibility
185 Add --synchronous (-Y) option to force device write
186 Disallow an empty suffix (e.g. --suffix '')
187 Return an exit code of 1 if any issues are encountered
188 Fix sign error in compression reduction percentage
189 2.5 23 Jan 2021 Add --alias/-A option to set .zip name for stdin input
190 Add --comment/-C option to add comment in .gz or .zip
191 Fix a bug that misidentified a multi-entry .zip
192 Fix a bug that did not emit double syncs for -i -p 1
193 Fix a bug in yarn that could try to access freed data
194 Do not delete multi-entry .zip files when extracting
195 Do not reject .zip entries with bit 11 set
196 Avoid a possible threads lock-order inversion
197 Ignore trailing junk after a gzip stream by default
198 2.6 6 Feb 2021 Add --huffman/-H and --rle/U strategy options
199 Fix issue when compiling for no threads
200 Fail silently on a broken pipe
201 2.7 15 Jan 2022 Show time stamp only for the first gzip member
202 Show totals when listing more than one gzip member
203 Don't unlink input file if it has other links
204 Add documentation for environment variables
205 Fix bug when combining -l with -d
206 Exit with status of zero if skipping non .gz files
207 Permit Huffman only (-H) when not compiling with zopfli
208 2.8 19 Aug 2023 Fix version bug when compiling with zlib 1.3
209 Save a modification time only for regular files
210 Write all available uncompressed data on an error
211 */
212
213 #define VERSION "pigz 2.8"
214
215 /* To-do:
216 - make source portable for Windows, VMS, etc. (see gzip source code)
217 - make build portable (currently good for Unixish)
218 */
219
220 /*
221 pigz compresses using threads to make use of multiple processors and cores.
222 The input is broken up into 128 KB chunks with each compressed in parallel.
223 The individual check value for each chunk is also calculated in parallel.
224 The compressed data is written in order to the output, and a combined check
225 value is calculated from the individual check values.
226
227 The compressed data format generated is in the gzip, zlib, or single-entry
228 zip format using the deflate compression method. The compression produces
229 partial raw deflate streams which are concatenated by a single write thread
230 and wrapped with the appropriate header and trailer, where the trailer
231 contains the combined check value.
232
233 Each partial raw deflate stream is terminated by an empty stored block
234 (using the Z_SYNC_FLUSH option of zlib), in order to end that partial bit
235 stream at a byte boundary, unless that partial stream happens to already end
236 at a byte boundary (the latter requires zlib 1.2.6 or later). Ending on a
237 byte boundary allows the partial streams to be concatenated simply as
238 sequences of bytes. This adds a very small four to five byte overhead
239 (average 3.75 bytes) to the output for each input chunk.
240
241 The default input block size is 128K, but can be changed with the -b option.
242 The number of compress threads is set by default to 8, which can be changed
243 using the -p option. Specifying -p 1 avoids the use of threads entirely.
244 pigz will try to determine the number of processors in the machine, in which
245 case if that number is two or greater, pigz will use that as the default for
246 -p instead of 8.
247
248 The input blocks, while compressed independently, have the last 32K of the
249 previous block loaded as a preset dictionary to preserve the compression
250 effectiveness of deflating in a single thread. This can be turned off using
251 the --independent or -i option, so that the blocks can be decompressed
252 independently for partial error recovery or for random access.
253
254 Decompression can't be parallelized over an arbitrary number of processors
255 like compression can be, at least not without specially prepared deflate
256 streams for that purpose. As a result, pigz uses a single thread (the main
257 thread) for decompression, but will create three other threads for reading,
258 writing, and check calculation, which can speed up decompression under some
259 circumstances. Parallel decompression can be turned off by specifying one
260 process (-dp 1 or -tp 1).
261
262 pigz requires zlib 1.2.1 or later to allow setting the dictionary when doing
263 raw deflate. Since zlib 1.2.3 corrects security vulnerabilities in zlib
264 version 1.2.1 and 1.2.2, conditionals check for zlib 1.2.3 or later during
265 the compilation of pigz.c. zlib 1.2.4 includes some improvements to
266 Z_FULL_FLUSH and deflateSetDictionary() that permit identical output for
267 pigz with and without threads, which is not possible with zlib 1.2.3. This
268 may be important for uses of pigz -R where small changes in the contents
269 should result in small changes in the archive for rsync. Note that due to
270 the details of how the lower levels of compression result in greater speed,
271 compression level 3 and below does not permit identical pigz output with and
272 without threads.
273
274 pigz uses the POSIX pthread library for thread control and communication,
275 through the yarn.h interface to yarn.c. yarn.c can be replaced with
276 equivalent implementations using other thread libraries. pigz can be
277 compiled with NOTHREAD #defined to not use threads at all (in which case
278 pigz will not be able to live up to the "parallel" in its name).
279 */
280
281 /*
282 Details of parallel compression implementation:
283
284 When doing parallel compression, pigz uses the main thread to read the input
285 in 'size' sized chunks (see -b), and puts those in a compression job list,
286 each with a sequence number to keep track of the ordering. If it is not the
287 first chunk, then that job also points to the previous input buffer, from
288 which the last 32K will be used as a dictionary (unless -i is specified).
289 This sets a lower limit of 32K on 'size'.
290
291 pigz launches up to 'procs' compression threads (see -p). Each compression
292 thread continues to look for jobs in the compression list and perform those
293 jobs until instructed to return. When a job is pulled, the dictionary, if
294 provided, will be loaded into the deflate engine and then that input buffer
295 is dropped for reuse. Then the input data is compressed into an output
296 buffer that grows in size if necessary to hold the compressed data. The job
297 is then put into the write job list, sorted by the sequence number. The
298 compress thread however continues to calculate the check value on the input
299 data, either a CRC-32 or Adler-32, possibly in parallel with the write
300 thread writing the output data. Once that's done, the compress thread drops
301 the input buffer and also releases the lock on the check value so that the
302 write thread can combine it with the previous check values. The compress
303 thread has then completed that job, and goes to look for another.
304
305 All of the compress threads are left running and waiting even after the last
306 chunk is processed, so that they can support the next input to be compressed
307 (more than one input file on the command line). Once pigz is done, it will
308 call all the compress threads home (that'll do pig, that'll do).
309
310 Before starting to read the input, the main thread launches the write thread
311 so that it is ready pick up jobs immediately. The compress thread puts the
312 write jobs in the list in sequence sorted order, so that the first job in
313 the list is always has the lowest sequence number. The write thread waits
314 for the next write job in sequence, and then gets that job. The job still
315 holds its input buffer, from which the write thread gets the input buffer
316 length for use in check value combination. Then the write thread drops that
317 input buffer to allow its reuse. Holding on to the input buffer until the
318 write thread starts also has the benefit that the read and compress threads
319 can't get way ahead of the write thread and build up a large backlog of
320 unwritten compressed data. The write thread will write the compressed data,
321 drop the output buffer, and then wait for the check value to be unlocked by
322 the compress thread. Then the write thread combines the check value for this
323 chunk with the total check value for eventual use in the trailer. If this is
324 not the last chunk, the write thread then goes back to look for the next
325 output chunk in sequence. After the last chunk, the write thread returns and
326 joins the main thread. Unlike the compress threads, a new write thread is
327 launched for each input stream. The write thread writes the appropriate
328 header and trailer around the compressed data.
329
330 The input and output buffers are reused through their collection in pools.
331 Each buffer has a use count, which when decremented to zero returns the
332 buffer to the respective pool. Each input buffer has up to three parallel
333 uses: as the input for compression, as the data for the check value
334 calculation, and as a dictionary for compression. Each output buffer has
335 only one use, which is as the output of compression followed serially as
336 data to be written. The input pool is limited in the number of buffers, so
337 that reading does not get way ahead of compression and eat up memory with
338 more input than can be used. The limit is approximately two times the number
339 of compression threads. In the case that reading is fast as compared to
340 compression, that number allows a second set of buffers to be read while the
341 first set of compressions are being performed. The number of output buffers
342 is not directly limited, but is indirectly limited by the release of input
343 buffers to about the same number.
344 */
345
346 // Portability defines.
347 #define _FILE_OFFSET_BITS 64 // Use large file functions
348 #define _LARGE_FILES // Same thing for AIX
349 #define _XOPEN_SOURCE 700 // For POSIX 2008
350
351 // Included headers and what is expected from each.
352 #include <stdio.h> // fflush(), fprintf(), fputs(), getchar(), putc(),
353 // puts(), printf(), vasprintf(), stderr, EOF, NULL,
354 // SEEK_END, size_t, off_t
355 #include <stdlib.h> // exit(), malloc(), free(), realloc(), atol(), atoi(),
356 // getenv()
357 #include <stdarg.h> // va_start(), va_arg(), va_end(), va_list
358 #include <string.h> // memset(), memchr(), memcpy(), strcmp(), strcpy(),
359 // strncpy(), strlen(), strcat(), strrchr(),
360 // strerror()
361 #include <errno.h> // errno, EEXIST
362 #include <assert.h> // assert()
363 #include <time.h> // ctime(), time(), time_t, mktime()
364 #include <signal.h> // signal(), SIGINT
365 #include <sys/types.h> // ssize_t
366 #include <sys/stat.h> // chmod(), stat(), fstat(), lstat(), struct stat,
367 // S_IFDIR, S_IFLNK, S_IFMT, S_IFREG
368 #include <sys/time.h> // utimes(), gettimeofday(), struct timeval
369 #include <unistd.h> // unlink(), _exit(), read(), write(), close(),
370 // lseek(), isatty(), chown(), fsync()
371 #include <fcntl.h> // open(), O_CREAT, O_EXCL, O_RDONLY, O_TRUNC,
372 // O_WRONLY, fcntl(), F_FULLFSYNC
373 #include <dirent.h> // opendir(), readdir(), closedir(), DIR,
374 // struct dirent
375 #include <limits.h> // UINT_MAX, INT_MAX
376 #if __STDC_VERSION__-0 >= 199901L || __GNUC__-0 >= 3
377 # include <inttypes.h> // intmax_t, uintmax_t
378 typedef uintmax_t length_t;
379 typedef uint32_t crc_t;
380 typedef uint_least16_t prefix_t;
381 #else
382 typedef unsigned long length_t;
383 typedef unsigned long crc_t;
384 typedef unsigned prefix_t;
385 #endif
386
387 #ifdef PIGZ_DEBUG
388 # if defined(__APPLE__)
389 # include <malloc/malloc.h>
390 # define MALLOC_SIZE(p) malloc_size(p)
391 # elif defined (__linux)
392 # include <malloc.h>
393 # define MALLOC_SIZE(p) malloc_usable_size(p)
394 # elif defined (_WIN32) || defined(_WIN64)
395 # include <malloc.h>
396 # define MALLOC_SIZE(p) _msize(p)
397 # else
398 # define MALLOC_SIZE(p) (0)
399 # endif
400 #endif
401
402 #ifdef __hpux
403 # include <sys/param.h>
404 # include <sys/pstat.h>
405 #endif
406
407 #ifndef S_IFLNK
408 # define S_IFLNK 0
409 #endif
410
411 #ifdef __MINGW32__
412 # define chown(p,o,g) 0
413 # define utimes(p,t) 0
414 # define lstat(p,s) stat(p,s)
415 # define _exit(s) exit(s)
416 #endif
417
418 #include "zlib.h" // deflateInit2(), deflateReset(), deflate(),
419 // deflateEnd(), deflateSetDictionary(), crc32(),
420 // adler32(), inflateBackInit(), inflateBack(),
421 // inflateBackEnd(), Z_DEFAULT_COMPRESSION,
422 // Z_DEFAULT_STRATEGY, Z_DEFLATED, Z_NO_FLUSH, Z_NULL,
423 // Z_OK, Z_SYNC_FLUSH, z_stream
424 #if !defined(ZLIB_VERNUM) || ZLIB_VERNUM < 0x1230
425 # error "Need zlib version 1.2.3 or later"
426 #endif
427
428 #ifndef NOTHREAD
429 # include "yarn.h" // thread, launch(), join(), join_all(), lock,
430 // new_lock(), possess(), twist(), wait_for(),
431 // release(), peek_lock(), free_lock(), yarn_name
432 #endif
433
434 #ifndef NOZOPFLI
435 # include "zopfli/src/zopfli/deflate.h" // ZopfliDeflatePart(),
436 // ZopfliInitOptions(),
437 // ZopfliOptions
438 #endif
439
440 #include "try.h" // try, catch, always, throw, drop, punt, ball_t
441
442 // For local functions and globals.
443 #define local static
444
445 // Prevent end-of-line conversions on MSDOSish operating systems.
446 #if defined(MSDOS) || defined(OS2) || defined(_WIN32) || defined(__CYGWIN__)
447 # include <io.h> // setmode(), O_BINARY, _commit() for _WIN32
448 # define SET_BINARY_MODE(fd) setmode(fd, O_BINARY)
449 #else
450 # define SET_BINARY_MODE(fd)
451 #endif
452
453 // Release an allocated pointer, if allocated, and mark as unallocated.
454 #define RELEASE(ptr) \
455 do { \
456 if ((ptr) != NULL) { \
457 FREE(ptr); \
458 ptr = NULL; \
459 } \
460 } while (0)
461
462 // Sliding dictionary size for deflate.
463 #define DICT 32768U
464
465 // Largest power of 2 that fits in an unsigned int. Used to limit requests to
466 // zlib functions that use unsigned int lengths.
467 #define MAXP2 (UINT_MAX - (UINT_MAX >> 1))
468
469 /* rsyncable constants -- RSYNCBITS is the number of bits in the mask for
470 comparison. For random input data, there will be a hit on average every
471 1<<RSYNCBITS bytes. So for an RSYNCBITS of 12, there will be an average of
472 one hit every 4096 bytes, resulting in a mean block size of 4096. RSYNCMASK
473 is the resulting bit mask. RSYNCHIT is what the hash value is compared to
474 after applying the mask.
475
476 The choice of 12 for RSYNCBITS is consistent with the original rsyncable
477 patch for gzip which also uses a 12-bit mask. This results in a relatively
478 small hit to compression, on the order of 1.5% to 3%. A mask of 13 bits can
479 be used instead if a hit of less than 1% to the compression is desired, at
480 the expense of more blocks transmitted for rsync updates. (Your mileage may
481 vary.)
482
483 This implementation of rsyncable uses a different hash algorithm than what
484 the gzip rsyncable patch uses in order to provide better performance in
485 several regards. The algorithm is simply to shift the hash value left one
486 bit and exclusive-or that with the next byte. This is masked to the number
487 of hash bits (RSYNCMASK) and compared to all ones except for a zero in the
488 top bit (RSYNCHIT). This rolling hash has a very small window of 19 bytes
489 (RSYNCBITS+7). The small window provides the benefit of much more rapid
490 resynchronization after a change, than does the 4096-byte window of the gzip
491 rsyncable patch.
492
493 The comparison value is chosen to avoid matching any repeated bytes or short
494 sequences. The gzip rsyncable patch on the other hand uses a sum and zero
495 for comparison, which results in certain bad behaviors, such as always
496 matching everywhere in a long sequence of zeros. Such sequences occur
497 frequently in tar files.
498
499 This hash efficiently discards history older than 19 bytes simply by
500 shifting that data past the top of the mask -- no history needs to be
501 retained to undo its impact on the hash value, as is needed for a sum.
502
503 The choice of the comparison value (RSYNCHIT) has the virtue of avoiding
504 extremely short blocks. The shortest block is five bytes (RSYNCBITS-7) from
505 hit to hit, and is unlikely. Whereas with the gzip rsyncable algorithm,
506 blocks of one byte are not only possible, but in fact are the most likely
507 block size.
508
509 Thanks and acknowledgement to Kevin Day for his experimentation and insights
510 on rsyncable hash characteristics that led to some of the choices here.
511 */
512 #define RSYNCBITS 12
513 #define RSYNCMASK ((1U << RSYNCBITS) - 1)
514 #define RSYNCHIT (RSYNCMASK >> 1)
515
516 // Initial pool counts and sizes -- INBUFS is the limit on the number of input
517 // spaces as a function of the number of processors (used to throttle the
518 // creation of compression jobs), OUTPOOL is the initial size of the output
519 // data buffer, chosen to make resizing of the buffer very unlikely and to
520 // allow prepending with a dictionary for use as an input buffer for zopfli.
521 #define INBUFS(p) (((p)<<1)+3)
522 #define OUTPOOL(s) ((s)+((s)>>4)+DICT)
523
524 // Input buffer size, and augmentation for re-inserting a central header.
525 #define BUF 32768
526 #define CEN 42
527 #define EXT (BUF + CEN) // provide enough room to unget a header
528
529 // Globals (modified by main thread only when it's the only thread).
530 local struct {
531 int ret; // pigz return code
532 char *prog; // name by which pigz was invoked
533 int ind; // input file descriptor
534 int outd; // output file descriptor
535 char *inf; // input file name (allocated)
536 size_t inz; // input file name allocated size
537 char *outf; // output file name (allocated)
538 int verbosity; // 0 = quiet, 1 = normal, 2 = verbose, 3 = trace
539 int headis; // 1 to store name, 2 to store date, 3 both
540 int pipeout; // write output to stdout even if file
541 int keep; // true to prevent deletion of input file
542 int force; // true to overwrite, compress links, cat
543 int sync; // true to flush output file
544 int form; // gzip = 0, zlib = 1, zip = 2 or 3
545 int magic1; // first byte of possible header when decoding
546 int recurse; // true to dive down into directory structure
547 char *sufx; // suffix to use (".gz" or user supplied)
548 char *name; // name for gzip or zip header
549 char *alias; // name for zip header when input is stdin
550 char *comment; // comment for gzip or zip header.
551 time_t mtime; // time stamp from input file for gzip header
552 int list; // true to list files instead of compress
553 int first; // true if we need to print listing header
554 int decode; // 0 to compress, 1 to decompress, 2 to test
555 int level; // compression level
556 int strategy; // compression strategy
557 #ifndef NOZOPFLI
558 ZopfliOptions zopts; // zopfli compression options
559 #endif
560 int rsync; // true for rsync blocking
561 int procs; // maximum number of compression threads (>= 1)
562 int setdict; // true to initialize dictionary in each thread
563 size_t block; // uncompressed input size per thread (>= 32K)
564 crc_t shift; // pre-calculated CRC-32 shift for length block
565
566 // saved gzip/zip header data for decompression, testing, and listing
567 time_t stamp; // time stamp from gzip header
568 char *hname; // name from header (allocated)
569 char *hcomm; // comment from header (allocated)
570 unsigned long zip_crc; // local header crc
571 length_t zip_clen; // local header compressed length
572 length_t zip_ulen; // local header uncompressed length
573 int zip64; // true if has zip64 extended information
574
575 // globals for decompression and listing buffered reading
576 unsigned char in_buf[EXT]; // input buffer
577 unsigned char *in_next; // next unused byte in buffer
578 size_t in_left; // number of unused bytes in buffer
579 int in_eof; // true if reached end of file on input
580 int in_short; // true if last read didn't fill buffer
581 length_t in_tot; // total bytes read from input
582 length_t out_tot; // total bytes written to output
583 unsigned long out_check; // check value of output
584
585 #ifndef NOTHREAD
586 // globals for decompression parallel reading
587 unsigned char in_buf2[EXT]; // second buffer for parallel reads
588 size_t in_len; // data waiting in next buffer
589 int in_which; // -1: start, 0: in_buf2, 1: in_buf
590 lock *load_state; // value = 0 to wait, 1 to read a buffer
591 thread *load_thread; // load_read() thread for joining
592 #endif
593 } g;
594
595 local void message(char *fmt, va_list ap) {
596 if (g.verbosity > 0) {
597 fprintf(stderr, "%s: ", g.prog);
598 vfprintf(stderr, fmt, ap);
599 putc('\n', stderr);
600 fflush(stderr);
601 }
602 }
603
604 // Display a complaint with the program name on stderr.
605 local int complain(char *fmt, ...) {
606 g.ret = 1;
607 va_list ap;
608 va_start(ap, fmt);
609 message(fmt, ap);
610 va_end(ap);
611 return 0;
612 }
613
614 // Same as complain(), but don't force a bad return code.
615 local int grumble(char *fmt, ...) {
616 va_list ap;
617 va_start(ap, fmt);
618 message(fmt, ap);
619 va_end(ap);
620 return 0;
621 }
622
623 #ifdef PIGZ_DEBUG
624
625 // Memory tracking.
626
627 #define MAXMEM 131072 // maximum number of tracked pointers
628
629 local struct mem_track_s {
630 size_t num; // current number of allocations
631 size_t size; // total size of current allocations
632 size_t tot; // maximum number of allocations
633 size_t max; // maximum size of allocations
634 #ifndef NOTHREAD
635 lock *lock; // lock for access across threads
636 #endif
637 size_t have; // number in array (possibly != num)
638 void *mem[MAXMEM]; // sorted array of allocated pointers
639 } mem_track;
640
641 #ifndef NOTHREAD
642 # define mem_track_grab(m) possess((m)->lock)
643 # define mem_track_drop(m) release((m)->lock)
644 #else
645 # define mem_track_grab(m)
646 # define mem_track_drop(m)
647 #endif
648
649 // Return the leftmost insert location of ptr in the sorted list mem->mem[],
650 // which currently has mem->have elements. If ptr is already in the list, the
651 // returned value will point to its first occurrence. The return location will
652 // be one after the last element if ptr is greater than all of the elements.
653 local size_t search_track(struct mem_track_s *mem, void *ptr) {
654 ptrdiff_t left = 0;
655 ptrdiff_t right = mem->have - 1;
656 while (left <= right) {
657 ptrdiff_t mid = (left + right) >> 1;
658 if (mem->mem[mid] < ptr)
659 left = mid + 1;
660 else
661 right = mid - 1;
662 }
663 return left;
664 }
665
666 // Insert ptr in the sorted list mem->mem[] and update the memory allocation
667 // statistics.
668 local void insert_track(struct mem_track_s *mem, void *ptr) {
669 mem_track_grab(mem);
670 assert(mem->have < MAXMEM && "increase MAXMEM in source and try again");
671 size_t i = search_track(mem, ptr);
672 if (i < mem->have && mem->mem[i] == ptr)
673 complain("mem_track: duplicate pointer %p\n", ptr);
674 memmove(&mem->mem[i + 1], &mem->mem[i],
675 (mem->have - i) * sizeof(void *));
676 mem->mem[i] = ptr;
677 mem->have++;
678 mem->num++;
679 mem->size += MALLOC_SIZE(ptr);
680 if (mem->num > mem->tot)
681 mem->tot = mem->num;
682 if (mem->size > mem->max)
683 mem->max = mem->size;
684 mem_track_drop(mem);
685 }
686
687 // Find and delete ptr from the sorted list mem->mem[] and update the memory
688 // allocation statistics.
689 local void delete_track(struct mem_track_s *mem, void *ptr) {
690 mem_track_grab(mem);
691 size_t i = search_track(mem, ptr);
692 if (i < mem->num && mem->mem[i] == ptr) {
693 memmove(&mem->mem[i], &mem->mem[i + 1],
694 (mem->have - (i + 1)) * sizeof(void *));
695 mem->have--;
696 }
697 else
698 complain("mem_track: missing pointer %p\n", ptr);
699 mem->num--;
700 mem->size -= MALLOC_SIZE(ptr);
701 mem_track_drop(mem);
702 }
703
704 local void *malloc_track(struct mem_track_s *mem, size_t size) {
705 void *ptr = malloc(size);
706 if (ptr != NULL)
707 insert_track(mem, ptr);
708 return ptr;
709 }
710
711 local void *realloc_track(struct mem_track_s *mem, void *ptr, size_t size) {
712 if (ptr == NULL)
713 return malloc_track(mem, size);
714 delete_track(mem, ptr);
715 void *got = realloc(ptr, size);
716 insert_track(mem, got == NULL ? ptr : got);
717 return got;
718 }
719
720 local void free_track(struct mem_track_s *mem, void *ptr) {
721 if (ptr != NULL) {
722 delete_track(mem, ptr);
723 free(ptr);
724 }
725 }
726
727 #ifndef NOTHREAD
728 local void *yarn_malloc(size_t size) {
729 return malloc_track(&mem_track, size);
730 }
731
732 local void yarn_free(void *ptr) {
733 free_track(&mem_track, ptr);
734 }
735 #endif
736
737 local voidpf zlib_alloc(voidpf opaque, uInt items, uInt size) {
738 return malloc_track(opaque, items * (size_t)size);
739 }
740
741 local void zlib_free(voidpf opaque, voidpf address) {
742 free_track(opaque, address);
743 }
744
745 #define REALLOC(p, s) realloc_track(&mem_track, p, s)
746 #define FREE(p) free_track(&mem_track, p)
747 #define OPAQUE (&mem_track)
748 #define ZALLOC zlib_alloc
749 #define ZFREE zlib_free
750
751 #else // !PIGZ_DEBUG
752
753 #define REALLOC realloc
754 #define FREE free
755 #define OPAQUE Z_NULL
756 #define ZALLOC Z_NULL
757 #define ZFREE Z_NULL
758
759 #endif
760
761 // Assured memory allocation.
762 local void *alloc(void *ptr, size_t size) {
763 ptr = REALLOC(ptr, size);
764 if (ptr == NULL)
765 throw(ENOMEM, "not enough memory");
766 return ptr;
767 }
768
769 #ifdef PIGZ_DEBUG
770
771 // Logging.
772
773 // Starting time of day for tracing.
774 local struct timeval start;
775
776 // Trace log.
777 local struct log {
778 struct timeval when; // time of entry
779 char *msg; // message
780 struct log *next; // next entry
781 } *log_head, **log_tail = NULL;
782 #ifndef NOTHREAD
783 local lock *log_lock = NULL;
784 #endif
785
786 // Maximum log entry length.
787 #define MAXMSG 256
788
789 // Set up log (call from main thread before other threads launched).
790 local void log_init(void) {
791 if (log_tail == NULL) {
792 mem_track.num = 0;
793 mem_track.size = 0;
794 mem_track.num = 0;
795 mem_track.max = 0;
796 mem_track.have = 0;
797 #ifndef NOTHREAD
798 mem_track.lock = new_lock(0);
799 yarn_mem(yarn_malloc, yarn_free);
800 log_lock = new_lock(0);
801 #endif
802 log_head = NULL;
803 log_tail = &log_head;
804 }
805 }
806
807 // Add entry to trace log.
808 local void log_add(char *fmt, ...) {
809 struct timeval now;
810 struct log *me;
811 va_list ap;
812 char msg[MAXMSG];
813
814 gettimeofday(&now, NULL);
815 me = alloc(NULL, sizeof(struct log));
816 me->when = now;
817 va_start(ap, fmt);
818 vsnprintf(msg, MAXMSG, fmt, ap);
819 va_end(ap);
820 me->msg = alloc(NULL, strlen(msg) + 1);
821 strcpy(me->msg, msg);
822 me->next = NULL;
823 #ifndef NOTHREAD
824 assert(log_lock != NULL);
825 possess(log_lock);
826 #endif
827 *log_tail = me;
828 log_tail = &(me->next);
829 #ifndef NOTHREAD
830 twist(log_lock, BY, +1);
831 #endif
832 }
833
834 // Pull entry from trace log and print it, return false if empty.
835 local int log_show(void) {
836 struct log *me;
837 struct timeval diff;
838
839 if (log_tail == NULL)
840 return 0;
841 #ifndef NOTHREAD
842 possess(log_lock);
843 #endif
844 me = log_head;
845 if (me == NULL) {
846 #ifndef NOTHREAD
847 release(log_lock);
848 #endif
849 return 0;
850 }
851 log_head = me->next;
852 if (me->next == NULL)
853 log_tail = &log_head;
854 #ifndef NOTHREAD
855 twist(log_lock, BY, -1);
856 #endif
857 diff.tv_usec = me->when.tv_usec - start.tv_usec;
858 diff.tv_sec = me->when.tv_sec - start.tv_sec;
859 if (diff.tv_usec < 0) {
860 diff.tv_usec += 1000000L;
861 diff.tv_sec--;
862 }
863 fprintf(stderr, "trace %ld.%06ld %s\n",
864 (long)diff.tv_sec, (long)diff.tv_usec, me->msg);
865 fflush(stderr);
866 FREE(me->msg);
867 FREE(me);
868 return 1;
869 }
870
871 // Release log resources (need to do log_init() to use again).
872 local void log_free(void) {
873 struct log *me;
874
875 if (log_tail != NULL) {
876 #ifndef NOTHREAD
877 possess(log_lock);
878 #endif
879 while ((me = log_head) != NULL) {
880 log_head = me->next;
881 FREE(me->msg);
882 FREE(me);
883 }
884 #ifndef NOTHREAD
885 twist(log_lock, TO, 0);
886 free_lock(log_lock);
887 log_lock = NULL;
888 yarn_mem(malloc, free);
889 free_lock(mem_track.lock);
890 #endif
891 log_tail = NULL;
892 }
893 }
894
895 // Show entries until no more, free log.
896 local void log_dump(void) {
897 if (log_tail == NULL)
898 return;
899 while (log_show())
900 ;
901 log_free();
902 if (mem_track.num || mem_track.size)
903 complain("memory leak: %zu allocs of %zu bytes total",
904 mem_track.num, mem_track.size);
905 if (mem_track.max)
906 fprintf(stderr, "%zu bytes of memory used in %zu allocs\n",
907 mem_track.max, mem_track.tot);
908 }
909
910 // Debugging macro.
911 #define Trace(x) \
912 do { \
913 if (g.verbosity > 2) { \
914 log_add x; \
915 } \
916 } while (0)
917
918 #else // !PIGZ_DEBUG
919
920 #define log_dump()
921 #define Trace(x)
922
923 #endif
924
925 // Abort or catch termination signal.
926 local void cut_short(int sig) {
927 if (sig == SIGINT) {
928 Trace(("termination by user"));
929 }
930 if (g.outd != -1 && g.outd != 1) {
931 unlink(g.outf);
932 RELEASE(g.outf);
933 g.outd = -1;
934 }
935 log_dump();
936 _exit(sig < 0 ? -sig : EINTR);
937 }
938
939 // Common code for catch block of top routine in the thread.
940 #define THREADABORT(ball) \
941 do { \
942 if ((ball).code != EPIPE) \
943 complain("abort: %s", (ball).why); \
944 drop(ball); \
945 cut_short(-(ball).code); \
946 } while (0)
947
948 // Compute next size up by multiplying by about 2**(1/3) and rounding to the
949 // next power of 2 if close (three applications results in doubling). If small,
950 // go up to at least 16, if overflow, go to max size_t value.
951 local inline size_t grow(size_t size) {
952 size_t was, top;
953 int shift;
954
955 was = size;
956 size += size >> 2;
957 top = size;
958 for (shift = 0; top > 7; shift++)
959 top >>= 1;
960 if (top == 7)
961 size = (size_t)1 << (shift + 3);
962 if (size < 16)
963 size = 16;
964 if (size <= was)
965 size = (size_t)0 - 1;
966 return size;
967 }
968
969 // Copy cpy[0..len-1] to *mem + off, growing *mem if necessary, where *size is
970 // the allocated size of *mem. Return the number of bytes in the result.
971 local inline size_t vmemcpy(char **mem, size_t *size, size_t off,
972 void *cpy, size_t len) {
973 size_t need;
974
975 need = off + len;
976 if (need < off)
977 throw(ERANGE, "overflow");
978 if (need > *size) {
979 need = grow(need);
980 if (off == 0) {
981 RELEASE(*mem);
982 *size = 0;
983 }
984 *mem = alloc(*mem, need);
985 *size = need;
986 }
987 memcpy(*mem + off, cpy, len);
988 return off + len;
989 }
990
991 // Copy the zero-terminated string cpy to *str + off, growing *str if
992 // necessary, where *size is the allocated size of *str. Return the length of
993 // the string plus one.
994 local inline size_t vstrcpy(char **str, size_t *size, size_t off, void *cpy) {
995 return vmemcpy(str, size, off, cpy, strlen(cpy) + 1);
996 }
997
998 // Read up to len bytes into buf, repeating read() calls as needed.
999 local size_t readn(int desc, unsigned char *buf, size_t len) {
1000 ssize_t ret;
1001 size_t got;
1002
1003 got = 0;
1004 while (len) {
1005 ret = read(desc, buf, len);
1006 if (ret < 0)
1007 throw(errno, "read error on %s (%s)", g.inf, strerror(errno));
1008 if (ret == 0)
1009 break;
1010 buf += ret;
1011 len -= (size_t)ret;
1012 got += (size_t)ret;
1013 }
1014 return got;
1015 }
1016
1017 // Write len bytes, repeating write() calls as needed. Return len.
1018 local size_t writen(int desc, void const *buf, size_t len) {
1019 char const *next = buf;
1020 size_t left = len;
1021
1022 while (left) {
1023 size_t const max = SSIZE_MAX;
1024 ssize_t ret = write(desc, next, left > max ? max : left);
1025 if (ret < 1)
1026 throw(errno, "write error on %s (%s)", g.outf, strerror(errno));
1027 next += ret;
1028 left -= (size_t)ret;
1029 }
1030 return len;
1031 }
1032
1033 // Convert Unix time to MS-DOS date and time, assuming the current timezone.
1034 // (You got a better idea?)
1035 local unsigned long time2dos(time_t t) {
1036 struct tm *tm;
1037 unsigned long dos;
1038
1039 if (t == 0)
1040 t = time(NULL);
1041 tm = localtime(&t);
1042 if (tm->tm_year < 80 || tm->tm_year > 207)
1043 return 0;
1044 dos = (unsigned long)(tm->tm_year - 80) << 25;
1045 dos += (unsigned long)(tm->tm_mon + 1) << 21;
1046 dos += (unsigned long)tm->tm_mday << 16;
1047 dos += (unsigned long)tm->tm_hour << 11;
1048 dos += (unsigned long)tm->tm_min << 5;
1049 dos += (unsigned long)(tm->tm_sec + 1) >> 1; // round to even seconds
1050 return dos;
1051 }
1052
1053 // Value type for put() value arguments. All value arguments for put() must be
1054 // cast to this type in order for va_arg() to pull the correct type from the
1055 // argument list.
1056 typedef length_t val_t;
1057
1058 // Write a set of header or trailer values to out, which is a file descriptor.
1059 // The values are specified by a series of arguments in pairs, where the first
1060 // argument in each pair is the number of bytes, and the second argument in
1061 // each pair is the unsigned integer value to write. The first argument in each
1062 // pair must be an int, and the second argument in each pair must be a val_t.
1063 // The arguments are terminated by a single zero (an int). If the number of
1064 // bytes is positive, then the value is written in little-endian order. If the
1065 // number of bytes is negative, then the value is written in big-endian order.
1066 // The total number of bytes written is returned. This makes the long and
1067 // tiresome zip format headers and trailers more readable, maintainable, and
1068 // verifiable.
1069 local unsigned put(int out, ...) {
1070 // compute the total number of bytes
1071 unsigned count = 0;
1072 int n;
1073 va_list ap;
1074 va_start(ap, out);
1075 while ((n = va_arg(ap, int)) != 0) {
1076 va_arg(ap, val_t);
1077 count += (unsigned)abs(n);
1078 }
1079 va_end(ap);
1080
1081 // allocate memory for the data
1082 unsigned char *wrap = alloc(NULL, count);
1083 unsigned char *next = wrap;
1084
1085 // write the requested data to wrap[]
1086 va_start(ap, out);
1087 while ((n = va_arg(ap, int)) != 0) {
1088 val_t val = va_arg(ap, val_t);
1089 if (n < 0) { // big endian
1090 n = -n << 3;
1091 do {
1092 n -= 8;
1093 *next++ = (unsigned char)(val >> n);
1094 } while (n);
1095 }
1096 else // little endian
1097 do {
1098 *next++ = (unsigned char)val;
1099 val >>= 8;
1100 } while (--n);
1101 }
1102 va_end(ap);
1103
1104 // write wrap[] to out and return the number of bytes written
1105 writen(out, wrap, count);
1106 FREE(wrap);
1107 return count;
1108 }
1109
1110 // Low 32-bits set to all ones.
1111 #define LOW32 0xffffffff
1112
1113 // Write a gzip, zlib, or zip header using the information in the globals.
1114 local length_t put_header(void) {
1115 length_t len;
1116
1117 if (g.form > 1) { // zip
1118 // write local header -- we don't know yet whether the lengths will fit
1119 // in 32 bits or not, so we have to assume that they might not and put
1120 // in a Zip64 extra field so that the data descriptor that appears
1121 // after the compressed data is interpreted with 64-bit lengths
1122 len = put(g.outd,
1123 4, (val_t)0x04034b50, // local header signature
1124 2, (val_t)45, // version needed to extract (4.5)
1125 2, (val_t)8, // flags: data descriptor follows data
1126 2, (val_t)8, // deflate
1127 4, (val_t)time2dos(g.mtime),
1128 4, (val_t)0, // crc (not here)
1129 4, (val_t)LOW32, // compressed length (not here)
1130 4, (val_t)LOW32, // uncompressed length (not here)
1131 2, (val_t)(strlen(g.name == NULL ? g.alias : g.name)), // name len
1132 2, (val_t)29, // length of extra field (see below)
1133 0);
1134
1135 // write file name (use g.alias for stdin)
1136 len += writen(g.outd, g.name == NULL ? g.alias : g.name,
1137 strlen(g.name == NULL ? g.alias : g.name));
1138
1139 // write Zip64 and extended timestamp extra field blocks (29 bytes)
1140 len += put(g.outd,
1141 2, (val_t)0x0001, // Zip64 extended information ID
1142 2, (val_t)16, // number of data bytes in this block
1143 8, (val_t)0, // uncompressed length (not here)
1144 8, (val_t)0, // compressed length (not here)
1145 2, (val_t)0x5455, // extended timestamp ID
1146 2, (val_t)5, // number of data bytes in this block
1147 1, (val_t)1, // flag presence of mod time
1148 4, (val_t)g.mtime, // mod time
1149 0);
1150 }
1151 else if (g.form) { // zlib
1152 if (g.comment != NULL)
1153 complain("can't store comment in zlib format -- ignoring");
1154 unsigned head;
1155 head = (0x78 << 8) + // deflate, 32K window
1156 (g.level >= 9 ? 3 << 6 :
1157 g.level == 1 ? 0 << 6:
1158 g.level >= 6 || g.level == Z_DEFAULT_COMPRESSION ? 1 << 6 :
1159 2 << 6); // optional compression level clue
1160 head += 31 - (head % 31); // make it a multiple of 31
1161 len = put(g.outd,
1162 -2, (val_t)head, // zlib format uses big-endian order
1163 0);
1164 }
1165 else { // gzip
1166 len = put(g.outd,
1167 1, (val_t)31,
1168 1, (val_t)139,
1169 1, (val_t)8, // deflate
1170 1, (val_t)((g.name != NULL ? 8 : 0) +
1171 (g.comment != NULL ? 16 : 0)),
1172 4, (val_t)g.mtime,
1173 1, (val_t)(g.level >= 9 ? 2 : g.level == 1 ? 4 : 0),
1174 1, (val_t)3, // unix
1175 0);
1176 if (g.name != NULL)
1177 len += writen(g.outd, g.name, strlen(g.name) + 1);
1178 if (g.comment != NULL)
1179 len += writen(g.outd, g.comment, strlen(g.comment) + 1);
1180 }
1181 return len;
1182 }
1183
1184 // Write a gzip, zlib, or zip trailer.
1185 local void put_trailer(length_t ulen, length_t clen,
1186 unsigned long check, length_t head) {
1187 if (g.form > 1) { // zip
1188 // write Zip64 data descriptor, as promised in the local header
1189 length_t desc = put(g.outd,
1190 4, (val_t)0x08074b50,
1191 4, (val_t)check,
1192 8, (val_t)clen,
1193 8, (val_t)ulen,
1194 0);
1195
1196 // zip64 is true if either the compressed or the uncompressed length
1197 // does not fit in 32 bits, in which case there needs to be a Zip64
1198 // extra block in the central directory entry
1199 int zip64 = ulen >= LOW32 || clen >= LOW32;
1200
1201 // write central file header
1202 length_t cent = put(g.outd,
1203 4, (val_t)0x02014b50, // central header signature
1204 1, (val_t)45, // made by 4.5 for Zip64 V1 end record
1205 1, (val_t)255, // ignore external attributes
1206 2, (val_t)45, // version needed to extract (4.5)
1207 2, (val_t)8, // data descriptor is present
1208 2, (val_t)8, // deflate
1209 4, (val_t)time2dos(g.mtime),
1210 4, (val_t)check, // crc
1211 4, (val_t)(zip64 ? LOW32 : clen), // compressed length
1212 4, (val_t)(zip64 ? LOW32 : ulen), // uncompressed length
1213 2, (val_t)(strlen(g.name == NULL ? g.alias : g.name)), // name len
1214 2, (val_t)(zip64 ? 29 : 9), // extra field size (see below)
1215 2, (val_t)(g.comment == NULL ? 0 : strlen(g.comment)), // comment
1216 2, (val_t)0, // disk number 0
1217 2, (val_t)0, // internal file attributes
1218 4, (val_t)0, // external file attributes (ignored)
1219 4, (val_t)0, // offset of local header
1220 0);
1221
1222 // write file name (use g.alias for stdin)
1223 cent += writen(g.outd, g.name == NULL ? g.alias : g.name,
1224 strlen(g.name == NULL ? g.alias : g.name));
1225
1226 // write Zip64 extra field block (20 bytes)
1227 if (zip64)
1228 cent += put(g.outd,
1229 2, (val_t)0x0001, // Zip64 extended information ID
1230 2, (val_t)16, // number of data bytes in this block
1231 8, (val_t)ulen, // uncompressed length
1232 8, (val_t)clen, // compressed length
1233 0);
1234
1235 // write extended timestamp extra field block (9 bytes)
1236 cent += put(g.outd,
1237 2, (val_t)0x5455, // extended timestamp signature
1238 2, (val_t)5, // number of data bytes in this block
1239 1, (val_t)1, // flag presence of mod time
1240 4, (val_t)g.mtime, // mod time
1241 0);
1242
1243 // write comment, if requested
1244 if (g.comment != NULL)
1245 cent += writen(g.outd, g.comment, strlen(g.comment));
1246
1247 // here zip64 is true if the offset of the central directory does not
1248 // fit in 32 bits, in which case insert the Zip64 end records to
1249 // provide a 64-bit offset
1250 zip64 = head + clen + desc >= LOW32;
1251 if (zip64) {
1252 // write Zip64 end of central directory record and locator
1253 put(g.outd,
1254 4, (val_t)0x06064b50, // Zip64 end of central dir sig
1255 8, (val_t)44, // size of the remainder of this record
1256 2, (val_t)45, // version made by
1257 2, (val_t)45, // version needed to extract
1258 4, (val_t)0, // number of this disk
1259 4, (val_t)0, // disk with start of central directory
1260 8, (val_t)1, // number of entries on this disk
1261 8, (val_t)1, // total number of entries
1262 8, (val_t)cent, // size of central directory
1263 8, (val_t)(head + clen + desc), // central dir offset
1264 4, (val_t)0x07064b50, // Zip64 end locator signature
1265 4, (val_t)0, // disk with Zip64 end of central dir
1266 8, (val_t)(head + clen + desc + cent), // location
1267 4, (val_t)1, // total number of disks
1268 0);
1269 }
1270
1271 // write end of central directory record
1272 put(g.outd,
1273 4, (val_t)0x06054b50, // end of central directory signature
1274 2, (val_t)0, // number of this disk
1275 2, (val_t)0, // disk with start of central directory
1276 2, (val_t)(zip64 ? 0xffff : 1), // entries on this disk
1277 2, (val_t)(zip64 ? 0xffff : 1), // total number of entries
1278 4, (val_t)(zip64 ? LOW32 : cent), // size of central directory
1279 4, (val_t)(zip64 ? LOW32 : head + clen + desc), // offset
1280 2, (val_t)0, // no zip file comment
1281 0);
1282 }
1283 else if (g.form) // zlib
1284 put(g.outd,
1285 -4, (val_t)check, // zlib format uses big-endian order
1286 0);
1287 else // gzip
1288 put(g.outd,
1289 4, (val_t)check,
1290 4, (val_t)ulen,
1291 0);
1292 }
1293
1294 // Compute an Adler-32, allowing a size_t length.
1295 local unsigned long adler32z(unsigned long adler,
1296 unsigned char const *buf, size_t len) {
1297 while (len > UINT_MAX && buf != NULL) {
1298 adler = adler32(adler, buf, UINT_MAX);
1299 buf += UINT_MAX;
1300 len -= UINT_MAX;
1301 }
1302 return adler32(adler, buf, (unsigned)len);
1303 }
1304
1305 // Compute a CRC-32, allowing a size_t length.
1306 local unsigned long crc32z(unsigned long crc,
1307 unsigned char const *buf, size_t len) {
1308 while (len > UINT_MAX && buf != NULL) {
1309 crc = crc32(crc, buf, UINT_MAX);
1310 buf += UINT_MAX;
1311 len -= UINT_MAX;
1312 }
1313 return crc32(crc, buf, (unsigned)len);
1314 }
1315
1316 // Compute check value depending on format.
1317 #define CHECK(a,b,c) (g.form == 1 ? adler32z(a,b,c) : crc32z(a,b,c))
1318
1319 // Return the zlib version as an integer, where each component is interpreted
1320 // as a decimal number and converted to four hexadecimal digits. E.g.
1321 // '1.2.11.1' -> 0x12b1, or return -1 if the string is not a valid version.
1322 local long zlib_vernum(void) {
1323 char const *ver = zlibVersion();
1324 long num = 0;
1325 int left = 4;
1326 int comp = 0;
1327 do {
1328 if (*ver >= '0' && *ver <= '9')
1329 comp = 10 * comp + *ver - '0';
1330 else {
1331 num = (num << 4) + (comp > 0xf ? 0xf : comp);
1332 left--;
1333 if (*ver != '.')
1334 break;
1335 comp = 0;
1336 }
1337 ver++;
1338 } while (left);
1339 return left < 3 ? num << (left << 2) : -1;
1340 }
1341
1342 // -- check value combination routines for parallel calculation --
1343
1344 #define COMB(a,b,c) (g.form == 1 ? adler32_comb(a,b,c) : crc32_comb(a,b,c))
1345 // Combine two crc-32's or two adler-32's (copied from zlib 1.2.3 so that pigz
1346 // can be compatible with older versions of zlib).
1347
1348 // We copy the combination routines from zlib here, in order to avoid linkage
1349 // issues with the zlib 1.2.3 builds on Sun, Ubuntu, and others.
1350
1351 // CRC-32 polynomial, reflected.
1352 #define POLY 0xedb88320
1353
1354 // Return a(x) multiplied by b(x) modulo p(x), where p(x) is the CRC
1355 // polynomial, reflected. For speed, this requires that a not be zero.
1356 local crc_t multmodp(crc_t a, crc_t b) {
1357 crc_t m = (crc_t)1 << 31;
1358 crc_t p = 0;
1359 for (;;) {
1360 if (a & m) {
1361 p ^= b;
1362 if ((a & (m - 1)) == 0)
1363 break;
1364 }
1365 m >>= 1;
1366 b = b & 1 ? (b >> 1) ^ POLY : b >> 1;
1367 }
1368 return p;
1369 }
1370
1371 // Table of x^2^n modulo p(x).
1372 local const crc_t x2n_table[] = {
1373 0x40000000, 0x20000000, 0x08000000, 0x00800000, 0x00008000,
1374 0xedb88320, 0xb1e6b092, 0xa06a2517, 0xed627dae, 0x88d14467,
1375 0xd7bbfe6a, 0xec447f11, 0x8e7ea170, 0x6427800e, 0x4d47bae0,
1376 0x09fe548f, 0x83852d0f, 0x30362f1a, 0x7b5a9cc3, 0x31fec169,
1377 0x9fec022a, 0x6c8dedc4, 0x15d6874d, 0x5fde7a4e, 0xbad90e37,
1378 0x2e4e5eef, 0x4eaba214, 0xa8a472c0, 0x429a969e, 0x148d302a,
1379 0xc40ba6d0, 0xc4e22c3c};
1380
1381 // Return x^(n*2^k) modulo p(x).
1382 local crc_t x2nmodp(size_t n, unsigned k) {
1383 crc_t p = (crc_t)1 << 31; // x^0 == 1
1384 while (n) {
1385 if (n & 1)
1386 p = multmodp(x2n_table[k & 31], p);
1387 n >>= 1;
1388 k++;
1389 }
1390 return p;
1391 }
1392
1393 // This uses the pre-computed g.shift value most of the time. Only the last
1394 // combination requires a new x2nmodp() calculation.
1395 local unsigned long crc32_comb(unsigned long crc1, unsigned long crc2,
1396 size_t len2) {
1397 return multmodp(len2 == g.block ? g.shift : x2nmodp(len2, 3), crc1) ^ crc2;
1398 }
1399
1400 #define BASE 65521U // largest prime smaller than 65536
1401 #define LOW16 0xffff // mask lower 16 bits
1402
1403 local unsigned long adler32_comb(unsigned long adler1, unsigned long adler2,
1404 size_t len2) {
1405 unsigned long sum1;
1406 unsigned long sum2;
1407 unsigned rem;
1408
1409 // the derivation of this formula is left as an exercise for the reader
1410 rem = (unsigned)(len2 % BASE);
1411 sum1 = adler1 & LOW16;
1412 sum2 = (rem * sum1) % BASE;
1413 sum1 += (adler2 & LOW16) + BASE - 1;
1414 sum2 += ((adler1 >> 16) & LOW16) + ((adler2 >> 16) & LOW16) + BASE - rem;
1415 if (sum1 >= BASE) sum1 -= BASE;
1416 if (sum1 >= BASE) sum1 -= BASE;
1417 if (sum2 >= (BASE << 1)) sum2 -= (BASE << 1);
1418 if (sum2 >= BASE) sum2 -= BASE;
1419 return sum1 | (sum2 << 16);
1420 }
1421
1422 #ifndef NOTHREAD
1423 // -- threaded portions of pigz --
1424
1425 // -- pool of spaces for buffer management --
1426
1427 // These routines manage a pool of spaces. Each pool specifies a fixed size
1428 // buffer to be contained in each space. Each space has a use count, which when
1429 // decremented to zero returns the space to the pool. If a space is requested
1430 // from the pool and the pool is empty, a space is immediately created unless a
1431 // specified limit on the number of spaces has been reached. Only if the limit
1432 // is reached will it wait for a space to be returned to the pool. Each space
1433 // knows what pool it belongs to, so that it can be returned.
1434
1435 // A space (one buffer for each space).
1436 struct space {
1437 lock *use; // use count -- return to pool when zero
1438 unsigned char *buf; // buffer of size size
1439 size_t size; // current size of this buffer
1440 size_t len; // for application usage (initially zero)
1441 struct pool *pool; // pool to return to
1442 struct space *next; // for pool linked list
1443 };
1444
1445 // Pool of spaces (one pool for each type needed).
1446 struct pool {
1447 lock *have; // unused spaces available, lock for list
1448 struct space *head; // linked list of available buffers
1449 size_t size; // size of new buffers in this pool
1450 int limit; // number of new spaces allowed, or -1
1451 int made; // number of buffers made
1452 };
1453
1454 // Initialize a pool (pool structure itself provided, not allocated). The limit
1455 // is the maximum number of spaces in the pool, or -1 to indicate no limit,
1456 // i.e., to never wait for a buffer to return to the pool.
1457 local void new_pool(struct pool *pool, size_t size, int limit) {
1458 pool->have = new_lock(0);
1459 pool->head = NULL;
1460 pool->size = size;
1461 pool->limit = limit;
1462 pool->made = 0;
1463 }
1464
1465 // Get a space from a pool. The use count is initially set to one, so there is
1466 // no need to call use_space() for the first use.
1467 local struct space *get_space(struct pool *pool) {
1468 struct space *space;
1469
1470 // if can't create any more, wait for a space to show up
1471 possess(pool->have);
1472 if (pool->limit == 0)
1473 wait_for(pool->have, NOT_TO_BE, 0);
1474
1475 // if a space is available, pull it from the list and return it
1476 if (pool->head != NULL) {
1477 space = pool->head;
1478 pool->head = space->next;
1479 twist(pool->have, BY, -1); // one less in pool
1480 possess(space->use);
1481 twist(space->use, TO, 1); // initially one user
1482 space->len = 0;
1483 return space;
1484 }
1485
1486 // nothing available, don't want to wait, make a new space
1487 assert(pool->limit != 0);
1488 if (pool->limit > 0)
1489 pool->limit--;
1490 pool->made++;
1491 release(pool->have);
1492 space = alloc(NULL, sizeof(struct space));
1493 space->use = new_lock(1); // initially one user
1494 space->buf = alloc(NULL, pool->size);
1495 space->size = pool->size;
1496 space->len = 0;
1497 space->pool = pool; // remember the pool this belongs to
1498 return space;
1499 }
1500
1501 // Increase the size of the buffer in space.
1502 local void grow_space(struct space *space) {
1503 size_t more;
1504
1505 // compute next size up
1506 more = grow(space->size);
1507 if (more == space->size)
1508 throw(ERANGE, "overflow");
1509
1510 // reallocate the buffer
1511 space->buf = alloc(space->buf, more);
1512 space->size = more;
1513 }
1514
1515 // Increment the use count to require one more drop before returning this space
1516 // to the pool.
1517 local void use_space(struct space *space) {
1518 long use;
1519
1520 possess(space->use);
1521 use = peek_lock(space->use);
1522 assert(use != 0);
1523 twist(space->use, BY, +1);
1524 }
1525
1526 // Drop a space, returning it to the pool if the use count is zero.
1527 local void drop_space(struct space *space) {
1528 long use;
1529 struct pool *pool;
1530
1531 if (space == NULL)
1532 return;
1533 possess(space->use);
1534 use = peek_lock(space->use);
1535 assert(use != 0);
1536 twist(space->use, BY, -1);
1537 if (use == 1) {
1538 pool = space->pool;
1539 possess(pool->have);
1540 space->next = pool->head;
1541 pool->head = space;
1542 twist(pool->have, BY, +1);
1543 }
1544 }
1545
1546 // Free the memory and lock resources of a pool. Return number of spaces for
1547 // debugging and resource usage measurement.
1548 local int free_pool(struct pool *pool) {
1549 int count;
1550 struct space *space;
1551
1552 possess(pool->have);
1553 count = 0;
1554 while ((space = pool->head) != NULL) {
1555 pool->head = space->next;
1556 FREE(space->buf);
1557 free_lock(space->use);
1558 FREE(space);
1559 count++;
1560 }
1561 assert(count == pool->made);
1562 release(pool->have);
1563 free_lock(pool->have);
1564 return count;
1565 }
1566
1567 // Input and output buffer pools.
1568 local struct pool in_pool;
1569 local struct pool out_pool;
1570 local struct pool dict_pool;
1571 local struct pool lens_pool;
1572
1573 // -- parallel compression --
1574
1575 // Compress or write job (passed from compress list to write list). If seq is
1576 // equal to -1, compress_thread is instructed to return; if more is false then
1577 // this is the last chunk, which after writing tells write_thread to return.
1578 struct job {
1579 long seq; // sequence number
1580 int more; // true if this is not the last chunk
1581 struct space *in; // input data to compress
1582 struct space *out; // dictionary or resulting compressed data
1583 struct space *lens; // coded list of flush block lengths
1584 unsigned long check; // check value for input data
1585 lock *calc; // released when check calculation complete
1586 struct job *next; // next job in the list (either list)
1587 };
1588
1589 // List of compress jobs (with tail for appending to list).
1590 local lock *compress_have = NULL; // number of compress jobs waiting
1591 local struct job *compress_head, **compress_tail;
1592
1593 // List of write jobs.
1594 local lock *write_first; // lowest sequence number in list
1595 local struct job *write_head;
1596
1597 // Number of compression threads running.
1598 local int cthreads = 0;
1599
1600 // Write thread if running.
1601 local thread *writeth = NULL;
1602
1603 // Setup job lists (call from main thread).
1604 local void setup_jobs(void) {
1605 // set up only if not already set up
1606 if (compress_have != NULL)
1607 return;
1608
1609 // allocate locks and initialize lists
1610 compress_have = new_lock(0);
1611 compress_head = NULL;
1612 compress_tail = &compress_head;
1613 write_first = new_lock(-1);
1614 write_head = NULL;
1615
1616 // initialize buffer pools (initial size for out_pool not critical, since
1617 // buffers will be grown in size if needed -- the initial size chosen to
1618 // make this unlikely, the same for lens_pool)
1619 new_pool(&in_pool, g.block, INBUFS(g.procs));
1620 new_pool(&out_pool, OUTPOOL(g.block), -1);
1621 new_pool(&dict_pool, DICT, -1);
1622 new_pool(&lens_pool, g.block >> (RSYNCBITS - 1), -1);
1623 }
1624
1625 // Command the compress threads to all return, then join them all (call from
1626 // main thread), free all the thread-related resources.
1627 local void finish_jobs(void) {
1628 struct job job;
1629 int caught;
1630
1631 // only do this once
1632 if (compress_have == NULL)
1633 return;
1634
1635 // command all of the extant compress threads to return
1636 possess(compress_have);
1637 job.seq = -1;
1638 job.next = NULL;
1639 compress_head = &job;
1640 compress_tail = &(job.next);
1641 twist(compress_have, BY, +1); // will wake them all up
1642
1643 // join all of the compress threads, verify they all came back
1644 caught = join_all();
1645 Trace(("-- joined %d compress threads", caught));
1646 assert(caught == cthreads);
1647 cthreads = 0;
1648
1649 // free the resources
1650 caught = free_pool(&lens_pool);
1651 Trace(("-- freed %d block lengths buffers", caught));
1652 caught = free_pool(&dict_pool);
1653 Trace(("-- freed %d dictionary buffers", caught));
1654 caught = free_pool(&out_pool);
1655 Trace(("-- freed %d output buffers", caught));
1656 caught = free_pool(&in_pool);
1657 Trace(("-- freed %d input buffers", caught));
1658 free_lock(write_first);
1659 free_lock(compress_have);
1660 compress_have = NULL;
1661 }
1662
1663 // Compress all strm->avail_in bytes at strm->next_in to out->buf, updating
1664 // out->len, grow the size of the buffer (out->size) if necessary. Respect the
1665 // size limitations of the zlib stream data types (size_t may be larger than
1666 // unsigned).
1667 local void deflate_engine(z_stream *strm, struct space *out, int flush) {
1668 size_t room;
1669
1670 do {
1671 room = out->size - out->len;
1672 if (room == 0) {
1673 grow_space(out);
1674 room = out->size - out->len;
1675 }
1676 strm->next_out = out->buf + out->len;
1677 strm->avail_out = room < UINT_MAX ? (unsigned)room : UINT_MAX;
1678 (void)deflate(strm, flush);
1679 out->len = (size_t)(strm->next_out - out->buf);
1680 } while (strm->avail_out == 0);
1681 assert(strm->avail_in == 0);
1682 }
1683
1684 // Get the next compression job from the head of the list, compress and compute
1685 // the check value on the input, and put a job in the write list with the
1686 // results. Keep looking for more jobs, returning when a job is found with a
1687 // sequence number of -1 (leave that job in the list for other incarnations to
1688 // find).
1689 local void compress_thread(void *dummy) {
1690 struct job *job; // job pulled and working on
1691 struct job *here, **prior; // pointers for inserting in write list
1692 unsigned long check; // check value of input
1693 unsigned char *next; // pointer for blocks, check value data
1694 size_t left; // input left to process
1695 size_t len; // remaining bytes to compress/check
1696 #if ZLIB_VERNUM >= 0x1260
1697 int bits; // deflate pending bits
1698 #endif
1699 int ret; // zlib return code
1700 ball_t err; // error information from throw()
1701
1702 (void)dummy;
1703
1704 try {
1705 z_stream strm; // deflate stream
1706 #ifndef NOZOPFLI
1707 struct space *temp = NULL;
1708 // get temporary space for zopfli input
1709 if (g.level > 9)
1710 temp = get_space(&out_pool);
1711 else
1712 #endif
1713 {
1714 // initialize the deflate stream for this thread
1715 strm.zfree = ZFREE;
1716 strm.zalloc = ZALLOC;
1717 strm.opaque = OPAQUE;
1718 ret = deflateInit2(&strm, 6, Z_DEFLATED, -15, 8, g.strategy);
1719 if (ret == Z_MEM_ERROR)
1720 throw(ENOMEM, "not enough memory");
1721 if (ret != Z_OK)
1722 throw(EINVAL, "internal error");
1723 }
1724
1725 // keep looking for work
1726 for (;;) {
1727 // get a job (like I tell my son)
1728 possess(compress_have);
1729 wait_for(compress_have, NOT_TO_BE, 0);
1730 job = compress_head;
1731 assert(job != NULL);
1732 if (job->seq == -1)
1733 break;
1734 compress_head = job->next;
1735 if (job->next == NULL)
1736 compress_tail = &compress_head;
1737 twist(compress_have, BY, -1);
1738
1739 // got a job -- initialize and set the compression level (note that
1740 // if deflateParams() is called immediately after deflateReset(),
1741 // there is no need to initialize input/output for the stream)
1742 Trace(("-- compressing #%ld", job->seq));
1743 #ifndef NOZOPFLI
1744 if (g.level <= 9) {
1745 #endif
1746 (void)deflateReset(&strm);
1747 (void)deflateParams(&strm, g.level, g.strategy);
1748 #ifndef NOZOPFLI
1749 }
1750 else
1751 temp->len = 0;
1752 #endif
1753
1754 // set dictionary if provided, release that input or dictionary
1755 // buffer (not NULL if g.setdict is true and if this is not the
1756 // first work unit)
1757 if (job->out != NULL) {
1758 len = job->out->len;
1759 left = len < DICT ? len : DICT;
1760 #ifndef NOZOPFLI
1761 if (g.level <= 9)
1762 #endif
1763 deflateSetDictionary(&strm, job->out->buf + (len - left),
1764 (unsigned)left);
1765 #ifndef NOZOPFLI
1766 else {
1767 memcpy(temp->buf, job->out->buf + (len - left), left);
1768 temp->len = left;
1769 }
1770 #endif
1771 drop_space(job->out);
1772 }
1773
1774 // set up input and output
1775 job->out = get_space(&out_pool);
1776 #ifndef NOZOPFLI
1777 if (g.level <= 9) {
1778 #endif
1779 strm.next_in = job->in->buf;
1780 strm.next_out = job->out->buf;
1781 #ifndef NOZOPFLI
1782 }
1783 else
1784 memcpy(temp->buf + temp->len, job->in->buf, job->in->len);
1785 #endif
1786
1787 // compress each block, either flushing or finishing
1788 next = job->lens == NULL ? NULL : job->lens->buf;
1789 left = job->in->len;
1790 job->out->len = 0;
1791 do {
1792 // decode next block length from blocks list
1793 len = next == NULL ? 128 : *next++;
1794 if (len < 128) // 64..32831
1795 len = (len << 8) + (*next++) + 64;
1796 else if (len == 128) // end of list
1797 len = left;
1798 else if (len < 192) // 1..63
1799 len &= 0x3f;
1800 else if (len < 224){ // 32832..2129983
1801 len = ((len & 0x1f) << 16) + ((size_t)*next++ << 8);
1802 len += *next++ + 32832U;
1803 }
1804 else { // 2129984..539000895
1805 len = ((len & 0x1f) << 24) + ((size_t)*next++ << 16);
1806 len += (size_t)*next++ << 8;
1807 len += (size_t)*next++ + 2129984UL;
1808 }
1809 left -= len;
1810
1811 #ifndef NOZOPFLI
1812 if (g.level <= 9) {
1813 #endif
1814 // run MAXP2-sized amounts of input through deflate -- this
1815 // loop is needed for those cases where the unsigned type
1816 // is smaller than the size_t type, or when len is close to
1817 // the limit of the size_t type
1818 while (len > MAXP2) {
1819 strm.avail_in = MAXP2;
1820 deflate_engine(&strm, job->out, Z_NO_FLUSH);
1821 len -= MAXP2;
1822 }
1823
1824 // run the last piece through deflate -- end on a byte
1825 // boundary, using a sync marker if necessary, or finish
1826 // the deflate stream if this is the last block
1827 strm.avail_in = (unsigned)len;
1828 if (left || job->more) {
1829 #if ZLIB_VERNUM >= 0x1260
1830 if (zlib_vernum() >= 0x1260) {
1831 deflate_engine(&strm, job->out, Z_BLOCK);
1832
1833 // add enough empty blocks to get to a byte
1834 // boundary
1835 (void)deflatePending(&strm, Z_NULL, &bits);
1836 if ((bits & 1) || !g.setdict)
1837 deflate_engine(&strm, job->out, Z_SYNC_FLUSH);
1838 else if (bits & 7) {
1839 do { // add static empty blocks
1840 bits = deflatePrime(&strm, 10, 2);
1841 assert(bits == Z_OK);
1842 (void)deflatePending(&strm, Z_NULL, &bits);
1843 } while (bits & 7);
1844 deflate_engine(&strm, job->out, Z_BLOCK);
1845 }
1846 }
1847 else
1848 #endif
1849 {
1850 deflate_engine(&strm, job->out, Z_SYNC_FLUSH);
1851 }
1852 if (!g.setdict) // two markers when independent
1853 deflate_engine(&strm, job->out, Z_FULL_FLUSH);
1854 }
1855 else
1856 deflate_engine(&strm, job->out, Z_FINISH);
1857 #ifndef NOZOPFLI
1858 }
1859 else {
1860 // compress len bytes using zopfli, end at byte boundary
1861 unsigned char bits, *out;
1862 size_t outsize;
1863
1864 out = NULL;
1865 outsize = 0;
1866 bits = 0;
1867 ZopfliDeflatePart(&g.zopts, 2, !(left || job->more),
1868 temp->buf, temp->len, temp->len + len,
1869 &bits, &out, &outsize);
1870 assert(job->out->len + outsize + 5 <= job->out->size);
1871 memcpy(job->out->buf + job->out->len, out, outsize);
1872 free(out);
1873 job->out->len += outsize;
1874 if (left || job->more) {
1875 bits &= 7;
1876 if ((bits & 1) || !g.setdict) {
1877 if (bits == 0 || bits > 5)
1878 job->out->buf[job->out->len++] = 0;
1879 job->out->buf[job->out->len++] = 0;
1880 job->out->buf[job->out->len++] = 0;
1881 job->out->buf[job->out->len++] = 0xff;
1882 job->out->buf[job->out->len++] = 0xff;
1883 }
1884 else if (bits) {
1885 do {
1886 job->out->buf[job->out->len - 1] += 2 << bits;
1887 job->out->buf[job->out->len++] = 0;
1888 bits += 2;
1889 } while (bits < 8);
1890 }
1891 if (!g.setdict) { // two markers when independent
1892 job->out->buf[job->out->len++] = 0;
1893 job->out->buf[job->out->len++] = 0;
1894 job->out->buf[job->out->len++] = 0;
1895 job->out->buf[job->out->len++] = 0xff;
1896 job->out->buf[job->out->len++] = 0xff;
1897 }
1898 }
1899 temp->len += len;
1900 }
1901 #endif
1902 } while (left);
1903 drop_space(job->lens);
1904 job->lens = NULL;
1905 Trace(("-- compressed #%ld%s", job->seq,
1906 job->more ? "" : " (last)"));
1907
1908 // reserve input buffer until check value has been calculated
1909 use_space(job->in);
1910
1911 // insert write job in list in sorted order, alert write thread
1912 possess(write_first);
1913 prior = &write_head;
1914 while ((here = *prior) != NULL) {
1915 if (here->seq > job->seq)
1916 break;
1917 prior = &(here->next);
1918 }
1919 job->next = here;
1920 *prior = job;
1921 twist(write_first, TO, write_head->seq);
1922
1923 // calculate the check value in parallel with writing, alert the
1924 // write thread that the calculation is complete, and drop this
1925 // usage of the input buffer
1926 len = job->in->len;
1927 next = job->in->buf;
1928 check = CHECK(0L, Z_NULL, 0);
1929 while (len > MAXP2) {
1930 check = CHECK(check, next, MAXP2);
1931 len -= MAXP2;
1932 next += MAXP2;
1933 }
1934 check = CHECK(check, next, (unsigned)len);
1935 drop_space(job->in);
1936 job->check = check;
1937 Trace(("-- checked #%ld%s", job->seq, job->more ? "" : " (last)"));
1938 possess(job->calc);
1939 twist(job->calc, TO, 1);
1940
1941 // done with that one -- go find another job
1942 }
1943
1944 // found job with seq == -1 -- return to join
1945 release(compress_have);
1946 #ifndef NOZOPFLI
1947 if (g.level > 9)
1948 drop_space(temp);
1949 else
1950 #endif
1951 {
1952 (void)deflateEnd(&strm);
1953 }
1954 }
1955 catch (err) {
1956 THREADABORT(err);
1957 }
1958 }
1959
1960 // Collect the write jobs off of the list in sequence order and write out the
1961 // compressed data until the last chunk is written. Also write the header and
1962 // trailer and combine the individual check values of the input buffers.
1963 local void write_thread(void *dummy) {
1964 long seq; // next sequence number looking for
1965 struct job *job; // job pulled and working on
1966 size_t len; // input length
1967 int more; // true if more chunks to write
1968 length_t head; // header length
1969 length_t ulen; // total uncompressed size (overflow ok)
1970 length_t clen; // total compressed size (overflow ok)
1971 unsigned long check; // check value of uncompressed data
1972 ball_t err; // error information from throw()
1973
1974 (void)dummy;
1975
1976 try {
1977 // build and write header
1978 Trace(("-- write thread running"));
1979 head = put_header();
1980
1981 // process output of compress threads until end of input
1982 ulen = clen = 0;
1983 check = CHECK(0L, Z_NULL, 0);
1984 seq = 0;
1985 do {
1986 // get next write job in order
1987 possess(write_first);
1988 wait_for(write_first, TO_BE, seq);
1989 job = write_head;
1990 write_head = job->next;
1991 twist(write_first, TO, write_head == NULL ? -1 : write_head->seq);
1992
1993 // update lengths, save uncompressed length for COMB
1994 more = job->more;
1995 len = job->in->len;
1996 drop_space(job->in);
1997 ulen += len;
1998 clen += job->out->len;
1999
2000 // write the compressed data and drop the output buffer
2001 Trace(("-- writing #%ld", seq));
2002 writen(g.outd, job->out->buf, job->out->len);
2003 drop_space(job->out);
2004 Trace(("-- wrote #%ld%s", seq, more ? "" : " (last)"));
2005
2006 // wait for check calculation to complete, then combine, once the
2007 // compress thread is done with the input, release it
2008 possess(job->calc);
2009 wait_for(job->calc, TO_BE, 1);
2010 release(job->calc);
2011 check = COMB(check, job->check, len);
2012 Trace(("-- combined #%ld%s", seq, more ? "" : " (last)"));
2013
2014 // free the job
2015 free_lock(job->calc);
2016 FREE(job);
2017
2018 // get the next buffer in sequence
2019 seq++;
2020 } while (more);
2021
2022 // write trailer
2023 put_trailer(ulen, clen, check, head);
2024
2025 // verify no more jobs, prepare for next use
2026 possess(compress_have);
2027 assert(compress_head == NULL && peek_lock(compress_have) == 0);
2028 release(compress_have);
2029 possess(write_first);
2030 assert(write_head == NULL);
2031 twist(write_first, TO, -1);
2032 }
2033 catch (err) {
2034 THREADABORT(err);
2035 }
2036 }
2037
2038 // Encode a hash hit to the block lengths list. hit == 0 ends the list.
2039 local void append_len(struct job *job, size_t len) {
2040 struct space *lens;
2041
2042 assert(len < 539000896UL);
2043 if (job->lens == NULL)
2044 job->lens = get_space(&lens_pool);
2045 lens = job->lens;
2046 if (lens->size < lens->len + 3)
2047 grow_space(lens);
2048 if (len < 64)
2049 lens->buf[lens->len++] = (unsigned char)(len + 128);
2050 else if (len < 32832U) {
2051 len -= 64;
2052 lens->buf[lens->len++] = (unsigned char)(len >> 8);
2053 lens->buf[lens->len++] = (unsigned char)len;
2054 }
2055 else if (len < 2129984UL) {
2056 len -= 32832U;
2057 lens->buf[lens->len++] = (unsigned char)((len >> 16) + 192);
2058 lens->buf[lens->len++] = (unsigned char)(len >> 8);
2059 lens->buf[lens->len++] = (unsigned char)len;
2060 }
2061 else {
2062 len -= 2129984UL;
2063 lens->buf[lens->len++] = (unsigned char)((len >> 24) + 224);
2064 lens->buf[lens->len++] = (unsigned char)(len >> 16);
2065 lens->buf[lens->len++] = (unsigned char)(len >> 8);
2066 lens->buf[lens->len++] = (unsigned char)len;
2067 }
2068 }
2069
2070 // Compress ind to outd, using multiple threads for the compression and check
2071 // value calculations and one other thread for writing the output. Compress
2072 // threads will be launched and left running (waiting actually) to support
2073 // subsequent calls of parallel_compress().
2074 local void parallel_compress(void) {
2075 long seq; // sequence number
2076 struct space *curr; // input data to compress
2077 struct space *next; // input data that follows curr
2078 struct space *hold; // input data that follows next
2079 struct space *dict; // dictionary for next compression
2080 struct job *job; // job for compress, then write
2081 int more; // true if more input to read
2082 unsigned hash; // hash for rsyncable
2083 unsigned char *scan; // next byte to compute hash on
2084 unsigned char *end; // after end of data to compute hash on
2085 unsigned char *last; // position after last hit
2086 size_t left; // last hit in curr to end of curr
2087 size_t len; // for various length computations
2088
2089 // if first time or after an option change, setup the job lists
2090 setup_jobs();
2091
2092 // start write thread
2093 writeth = launch(write_thread, NULL);
2094
2095 // read from input and start compress threads (write thread will pick up
2096 // the output of the compress threads)
2097 seq = 0;
2098 next = get_space(&in_pool);
2099 next->len = readn(g.ind, next->buf, next->size);
2100 hold = NULL;
2101 dict = NULL;
2102 scan = next->buf;
2103 hash = RSYNCHIT;
2104 left = 0;
2105 do {
2106 // create a new job
2107 job = alloc(NULL, sizeof(struct job));
2108 job->calc = new_lock(0);
2109
2110 // update input spaces
2111 curr = next;
2112 next = hold;
2113 hold = NULL;
2114
2115 // get more input if we don't already have some
2116 if (next == NULL) {
2117 next = get_space(&in_pool);
2118 next->len = readn(g.ind, next->buf, next->size);
2119 }
2120
2121 // if rsyncable, generate block lengths and prepare curr for job to
2122 // likely have less than size bytes (up to the last hash hit)
2123 job->lens = NULL;
2124 if (g.rsync && curr->len) {
2125 // compute the hash function starting where we last left off to
2126 // cover either size bytes or to EOF, whichever is less, through
2127 // the data in curr (and in the next loop, through next) -- save
2128 // the block lengths resulting from the hash hits in the job->lens
2129 // list
2130 if (left == 0) {
2131 // scan is in curr
2132 last = curr->buf;
2133 end = curr->buf + curr->len;
2134 while (scan < end) {
2135 hash = ((hash << 1) ^ *scan++) & RSYNCMASK;
2136 if (hash == RSYNCHIT) {
2137 len = (size_t)(scan - last);
2138 append_len(job, len);
2139 last = scan;
2140 }
2141 }
2142
2143 // continue scan in next
2144 left = (size_t)(scan - last);
2145 scan = next->buf;
2146 }
2147
2148 // scan in next for enough bytes to fill curr, or what is available
2149 // in next, whichever is less (if next isn't full, then we're at
2150 // the end of the file) -- the bytes in curr since the last hit,
2151 // stored in left, counts towards the size of the first block
2152 last = next->buf;
2153 len = curr->size - curr->len;
2154 if (len > next->len)
2155 len = next->len;
2156 end = next->buf + len;
2157 while (scan < end) {
2158 hash = ((hash << 1) ^ *scan++) & RSYNCMASK;
2159 if (hash == RSYNCHIT) {
2160 len = (size_t)(scan - last) + left;
2161 left = 0;
2162 append_len(job, len);
2163 last = scan;
2164 }
2165 }
2166 append_len(job, 0);
2167
2168 // create input in curr for job up to last hit or entire buffer if
2169 // no hits at all -- save remainder in next and possibly hold
2170 len = (size_t)((job->lens->len == 1 ? scan : last) - next->buf);
2171 if (len) {
2172 // got hits in next, or no hits in either -- copy to curr
2173 memcpy(curr->buf + curr->len, next->buf, len);
2174 curr->len += len;
2175 memmove(next->buf, next->buf + len, next->len - len);
2176 next->len -= len;
2177 scan -= len;
2178 left = 0;
2179 }
2180 else if (job->lens->len != 1 && left && next->len) {
2181 // had hits in curr, but none in next, and last hit in curr
2182 // wasn't right at the end, so we have input there to save --
2183 // use curr up to the last hit, save the rest, moving next to
2184 // hold
2185 hold = next;
2186 next = get_space(&in_pool);
2187 memcpy(next->buf, curr->buf + (curr->len - left), left);
2188 next->len = left;
2189 curr->len -= left;
2190 }
2191 else {
2192 // else, last match happened to be right at the end of curr, or
2193 // we're at the end of the input compressing the rest
2194 left = 0;
2195 }
2196 }
2197
2198 // compress curr->buf to curr->len -- compress thread will drop curr
2199 job->in = curr;
2200
2201 // set job->more if there is more to compress after curr
2202 more = next->len != 0;
2203 job->more = more;
2204
2205 // provide dictionary for this job, prepare dictionary for next job
2206 job->out = dict;
2207 if (more && g.setdict) {
2208 if (curr->len >= DICT || job->out == NULL) {
2209 dict = curr;
2210 use_space(dict);
2211 }
2212 else {
2213 dict = get_space(&dict_pool);
2214 len = DICT - curr->len;
2215 memcpy(dict->buf, job->out->buf + (job->out->len - len), len);
2216 memcpy(dict->buf + len, curr->buf, curr->len);
2217 dict->len = DICT;
2218 }
2219 }
2220
2221 // preparation of job is complete
2222 job->seq = seq;
2223 Trace(("-- read #%ld%s", seq, more ? "" : " (last)"));
2224 if (++seq < 1)
2225 throw(ERANGE, "overflow");
2226
2227 // start another compress thread if needed
2228 if (cthreads < seq && cthreads < g.procs) {
2229 (void)launch(compress_thread, NULL);
2230 cthreads++;
2231 }
2232
2233 // put job at end of compress list, let all the compressors know
2234 possess(compress_have);
2235 job->next = NULL;
2236 *compress_tail = job;
2237 compress_tail = &(job->next);
2238 twist(compress_have, BY, +1);
2239 } while (more);
2240 drop_space(next);
2241
2242 // wait for the write thread to complete (we leave the compress threads out
2243 // there and waiting in case there is another stream to compress)
2244 join(writeth);
2245 writeth = NULL;
2246 Trace(("-- write thread joined"));
2247 }
2248
2249 #endif
2250
2251 // Repeated code in single_compress to compress available input and write it.
2252 #define DEFLATE_WRITE(flush) \
2253 do { \
2254 do { \
2255 strm->avail_out = out_size; \
2256 strm->next_out = out; \
2257 (void)deflate(strm, flush); \
2258 clen += writen(g.outd, out, out_size - strm->avail_out); \
2259 } while (strm->avail_out == 0); \
2260 assert(strm->avail_in == 0); \
2261 } while (0)
2262
2263 // Do a simple compression in a single thread from ind to outd. If reset is
2264 // true, instead free the memory that was allocated and retained for input,
2265 // output, and deflate.
2266 local void single_compress(int reset) {
2267 size_t got; // amount of data in in[]
2268 size_t more; // amount of data in next[] (0 if eof)
2269 size_t start; // start of data in next[]
2270 size_t have; // bytes in current block for -i
2271 size_t hist; // offset of permitted history
2272 int fresh; // if true, reset compression history
2273 unsigned hash; // hash for rsyncable
2274 unsigned char *scan; // pointer for hash computation
2275 size_t left; // bytes left to compress after hash hit
2276 unsigned long head; // header length
2277 length_t ulen; // total uncompressed size
2278 length_t clen; // total compressed size
2279 unsigned long check; // check value of uncompressed data
2280 static unsigned out_size; // size of output buffer
2281 static unsigned char *in, *next, *out; // reused i/o buffers
2282 static z_stream *strm = NULL; // reused deflate structure
2283
2284 // if requested, just release the allocations and return
2285 if (reset) {
2286 if (strm != NULL) {
2287 (void)deflateEnd(strm);
2288 FREE(strm);
2289 FREE(out);
2290 FREE(next);
2291 FREE(in);
2292 strm = NULL;
2293 }
2294 return;
2295 }
2296
2297 // initialize the deflate structure if this is the first time
2298 if (strm == NULL) {
2299 int ret; // zlib return code
2300
2301 out_size = g.block > MAXP2 ? MAXP2 : (unsigned)g.block;
2302 in = alloc(NULL, g.block + DICT);
2303 next = alloc(NULL, g.block + DICT);
2304 out = alloc(NULL, out_size);
2305 strm = alloc(NULL, sizeof(z_stream));
2306 strm->zfree = ZFREE;
2307 strm->zalloc = ZALLOC;
2308 strm->opaque = OPAQUE;
2309 ret = deflateInit2(strm, 6, Z_DEFLATED, -15, 8, g.strategy);
2310 if (ret == Z_MEM_ERROR)
2311 throw(ENOMEM, "not enough memory");
2312 if (ret != Z_OK)
2313 throw(EINVAL, "internal error");
2314 }
2315
2316 // write header
2317 head = put_header();
2318
2319 // set compression level in case it changed
2320 #ifndef NOZOPFLI
2321 if (g.level <= 9) {
2322 #endif
2323 (void)deflateReset(strm);
2324 (void)deflateParams(strm, g.level, g.strategy);
2325 #ifndef NOZOPFLI
2326 }
2327 #endif
2328
2329 // do raw deflate and calculate check value
2330 got = 0;
2331 more = readn(g.ind, next, g.block);
2332 ulen = more;
2333 start = 0;
2334 hist = 0;
2335 clen = 0;
2336 have = 0;
2337 check = CHECK(0L, Z_NULL, 0);
2338 hash = RSYNCHIT;
2339 do {
2340 // get data to compress, see if there is any more input
2341 if (got == 0) {
2342 scan = in; in = next; next = scan;
2343 strm->next_in = in + start;
2344 got = more;
2345 if (g.level > 9) {
2346 left = start + more - hist;
2347 if (left > DICT)
2348 left = DICT;
2349 memcpy(next, in + ((start + more) - left), left);
2350 start = left;
2351 hist = 0;
2352 }
2353 else
2354 start = 0;
2355 more = readn(g.ind, next + start, g.block);
2356 ulen += more;
2357 }
2358
2359 // if rsyncable, compute hash until a hit or the end of the block
2360 left = 0;
2361 if (g.rsync && got) {
2362 scan = strm->next_in;
2363 left = got;
2364 do {
2365 if (left == 0) {
2366 // went to the end -- if no more or no hit in size bytes,
2367 // then proceed to do a flush or finish with got bytes
2368 if (more == 0 || got == g.block)
2369 break;
2370
2371 // fill in[] with what's left there and as much as possible
2372 // from next[] -- set up to continue hash hit search
2373 if (g.level > 9) {
2374 left = (size_t)(strm->next_in - in) - hist;
2375 if (left > DICT)
2376 left = DICT;
2377 }
2378 memmove(in, strm->next_in - left, left + got);
2379 hist = 0;
2380 strm->next_in = in + left;
2381 scan = in + left + got;
2382 left = more > g.block - got ? g.block - got : more;
2383 memcpy(scan, next + start, left);
2384 got += left;
2385 more -= left;
2386 start += left;
2387
2388 // if that emptied the next buffer, try to refill it
2389 if (more == 0) {
2390 more = readn(g.ind, next, g.block);
2391 ulen += more;
2392 start = 0;
2393 }
2394 }
2395 left--;
2396 hash = ((hash << 1) ^ *scan++) & RSYNCMASK;
2397 } while (hash != RSYNCHIT);
2398 got -= left;
2399 }
2400
2401 // clear history for --independent option
2402 fresh = 0;
2403 if (!g.setdict) {
2404 have += got;
2405 if (have > g.block) {
2406 fresh = 1;
2407 have = got;
2408 }
2409 }
2410
2411 #ifndef NOZOPFLI
2412 if (g.level <= 9) {
2413 #endif
2414 // clear history if requested
2415 if (fresh)
2416 (void)deflateReset(strm);
2417
2418 // compress MAXP2-size chunks in case unsigned type is small
2419 while (got > MAXP2) {
2420 strm->avail_in = MAXP2;
2421 check = CHECK(check, strm->next_in, strm->avail_in);
2422 DEFLATE_WRITE(Z_NO_FLUSH);
2423 got -= MAXP2;
2424 }
2425
2426 // compress the remainder, emit a block, finish if end of input
2427 strm->avail_in = (unsigned)got;
2428 got = left;
2429 check = CHECK(check, strm->next_in, strm->avail_in);
2430 if (more || got) {
2431 #if ZLIB_VERNUM >= 0x1260
2432 if (zlib_vernum() >= 0x1260) {
2433 int bits;
2434
2435 DEFLATE_WRITE(Z_BLOCK);
2436 (void)deflatePending(strm, Z_NULL, &bits);
2437 if ((bits & 1) || !g.setdict)
2438 DEFLATE_WRITE(Z_SYNC_FLUSH);
2439 else if (bits & 7) {
2440 do {
2441 bits = deflatePrime(strm, 10, 2);
2442 assert(bits == Z_OK);
2443 (void)deflatePending(strm, Z_NULL, &bits);
2444 } while (bits & 7);
2445 DEFLATE_WRITE(Z_NO_FLUSH);
2446 }
2447 }
2448 else
2449 DEFLATE_WRITE(Z_SYNC_FLUSH);
2450 #else
2451 DEFLATE_WRITE(Z_SYNC_FLUSH);
2452 #endif
2453 if (!g.setdict) // two markers when independent
2454 DEFLATE_WRITE(Z_FULL_FLUSH);
2455 }
2456 else
2457 DEFLATE_WRITE(Z_FINISH);
2458 #ifndef NOZOPFLI
2459 }
2460 else {
2461 // compress got bytes using zopfli, bring to byte boundary
2462 unsigned char bits, *def;
2463 size_t size, off;
2464
2465 // discard history if requested
2466 off = (size_t)(strm->next_in - in);
2467 if (fresh)
2468 hist = off;
2469
2470 def = NULL;
2471 size = 0;
2472 bits = 0;
2473 ZopfliDeflatePart(&g.zopts, 2, !(more || left),
2474 in + hist, off - hist, (off - hist) + got,
2475 &bits, &def, &size);
2476 bits &= 7;
2477 if (more || left) {
2478 if ((bits & 1) || !g.setdict) {
2479 writen(g.outd, def, size);
2480 if (bits == 0 || bits > 5)
2481 writen(g.outd, (unsigned char *)"\0", 1);
2482 writen(g.outd, (unsigned char *)"\0\0\xff\xff", 4);
2483 }
2484 else {
2485 assert(size > 0);
2486 writen(g.outd, def, size - 1);
2487 if (bits)
2488 do {
2489 def[size - 1] += 2 << bits;
2490 writen(g.outd, def + size - 1, 1);
2491 def[size - 1] = 0;
2492 bits += 2;
2493 } while (bits < 8);
2494 writen(g.outd, def + size - 1, 1);
2495 }
2496 if (!g.setdict) // two markers when independent
2497 writen(g.outd, (unsigned char *)"\0\0\0\xff\xff", 5);
2498 }
2499 else
2500 writen(g.outd, def, size);
2501 free(def);
2502 while (got > MAXP2) {
2503 check = CHECK(check, strm->next_in, MAXP2);
2504 strm->next_in += MAXP2;
2505 got -= MAXP2;
2506 }
2507 check = CHECK(check, strm->next_in, (unsigned)got);
2508 strm->next_in += got;
2509 got = left;
2510 }
2511 #endif
2512
2513 // do until no more input
2514 } while (more || got);
2515
2516 // write trailer
2517 put_trailer(ulen, clen, check, head);
2518 }
2519
2520 // --- decompression ---
2521
2522 #ifndef NOTHREAD
2523 // Parallel read thread. If the state is 1, then read a buffer and set the
2524 // state to 0 when done, if the state is > 1, then end this thread.
2525 local void load_read(void *dummy) {
2526 size_t len;
2527 ball_t err; // error information from throw()
2528
2529 (void)dummy;
2530
2531 Trace(("-- launched decompress read thread"));
2532 try {
2533 do {
2534 possess(g.load_state);
2535 wait_for(g.load_state, NOT_TO_BE, 0);
2536 if (peek_lock(g.load_state) > 1) {
2537 release(g.load_state);
2538 break;
2539 }
2540 g.in_len = len = readn(g.ind, g.in_which ? g.in_buf : g.in_buf2,
2541 BUF);
2542 Trace(("-- decompress read thread read %lu bytes", len));
2543 twist(g.load_state, TO, 0);
2544 } while (len == BUF);
2545 }
2546 catch (err) {
2547 THREADABORT(err);
2548 }
2549 Trace(("-- exited decompress read thread"));
2550 }
2551
2552 // Wait for load_read() to complete the current read operation. If the
2553 // load_read() thread is not active, then return immediately.
2554 local void load_wait(void) {
2555 if (g.in_which == -1)
2556 return;
2557 possess(g.load_state);
2558 wait_for(g.load_state, TO_BE, 0);
2559 release(g.load_state);
2560 }
2561 #endif
2562
2563 // load() is called when the input has been consumed in order to provide more
2564 // input data: load the input buffer with BUF or fewer bytes (fewer if at end
2565 // of file) from the file g.ind, set g.in_next to point to the g.in_left bytes
2566 // read, update g.in_tot, and return g.in_left. g.in_eof is set to true when
2567 // g.in_left has gone to zero and there is no more data left to read.
2568 local size_t load(void) {
2569 // if already detected end of file, do nothing
2570 if (g.in_short) {
2571 g.in_eof = 1;
2572 g.in_left = 0;
2573 return 0;
2574 }
2575
2576 #ifndef NOTHREAD
2577 // if first time in or procs == 1, read a buffer to have something to
2578 // return, otherwise wait for the previous read job to complete
2579 if (g.procs > 1) {
2580 // if first time, fire up the read thread, ask for a read
2581 if (g.in_which == -1) {
2582 g.in_which = 1;
2583 g.load_state = new_lock(1);
2584 g.load_thread = launch(load_read, NULL);
2585 }
2586
2587 // wait for the previously requested read to complete
2588 load_wait();
2589
2590 // set up input buffer with the data just read
2591 g.in_next = g.in_which ? g.in_buf : g.in_buf2;
2592 g.in_left = g.in_len;
2593
2594 // if not at end of file, alert read thread to load next buffer,
2595 // alternate between g.in_buf and g.in_buf2
2596 if (g.in_len == BUF) {
2597 g.in_which = 1 - g.in_which;
2598 possess(g.load_state);
2599 twist(g.load_state, TO, 1);
2600 }
2601
2602 // at end of file -- join read thread (already exited), clean up
2603 else {
2604 join(g.load_thread);
2605 free_lock(g.load_state);
2606 g.in_which = -1;
2607 }
2608 }
2609 else
2610 #endif
2611 {
2612 // don't use threads -- simply read a buffer into g.in_buf
2613 g.in_left = readn(g.ind, g.in_next = g.in_buf, BUF);
2614 }
2615
2616 // note end of file
2617 if (g.in_left < BUF) {
2618 g.in_short = 1;
2619
2620 // if we got bupkis, now is the time to mark eof
2621 if (g.in_left == 0)
2622 g.in_eof = 1;
2623 }
2624
2625 // update the total and return the available bytes
2626 g.in_tot += g.in_left;
2627 return g.in_left;
2628 }
2629
2630 // Terminate the load() operation. Empty buffer, mark end, close file (if not
2631 // stdin), and free the name and comment obtained from the header, if present.
2632 local void load_end(void) {
2633 #ifndef NOTHREAD
2634 // if the read thread is running, then end it
2635 if (g.in_which != -1) {
2636 // wait for the previously requested read to complete and send the
2637 // thread a message to exit
2638 possess(g.load_state);
2639 wait_for(g.load_state, TO_BE, 0);
2640 twist(g.load_state, TO, 2);
2641
2642 // join the thread (which has exited or will very shortly) and clean up
2643 join(g.load_thread);
2644 free_lock(g.load_state);
2645 g.in_which = -1;
2646 }
2647 #endif
2648 g.in_left = 0;
2649 g.in_short = 1;
2650 g.in_eof = 1;
2651 if (g.ind != 0)
2652 close(g.ind);
2653 RELEASE(g.hname);
2654 RELEASE(g.hcomm);
2655 }
2656
2657 // Initialize for reading new input.
2658 local void in_init(void) {
2659 g.in_left = 0;
2660 g.in_eof = 0;
2661 g.in_short = 0;
2662 g.in_tot = 0;
2663 #ifndef NOTHREAD
2664 g.in_which = -1;
2665 #endif
2666 }
2667
2668 // Buffered reading macros for decompression and listing.
2669 #define GET() (g.in_left == 0 && (g.in_eof || load() == 0) ? 0 : \
2670 (g.in_left--, *g.in_next++))
2671 #define GET2() (tmp2 = GET(), tmp2 + ((unsigned)(GET()) << 8))
2672 #define GET4() (tmp4 = GET2(), tmp4 + ((unsigned long)(GET2()) << 16))
2673 #define SKIP(dist) \
2674 do { \
2675 size_t togo = (dist); \
2676 while (togo > g.in_left) { \
2677 togo -= g.in_left; \
2678 if (load() == 0) \
2679 return -3; \
2680 } \
2681 g.in_left -= togo; \
2682 g.in_next += togo; \
2683 } while (0)
2684
2685 // GET(), GET2(), GET4() and SKIP() equivalents, with crc update.
2686 #define GETC() (g.in_left == 0 && (g.in_eof || load() == 0) ? 0 : \
2687 (g.in_left--, crc = crc32z(crc, g.in_next, 1), *g.in_next++))
2688 #define GET2C() (tmp2 = GETC(), tmp2 + ((unsigned)(GETC()) << 8))
2689 #define GET4C() (tmp4 = GET2C(), tmp4 + ((unsigned long)(GET2C()) << 16))
2690 #define SKIPC(dist) \
2691 do { \
2692 size_t togo = (dist); \
2693 while (togo > g.in_left) { \
2694 crc = crc32z(crc, g.in_next, g.in_left); \
2695 togo -= g.in_left; \
2696 if (load() == 0) \
2697 return -3; \
2698 } \
2699 crc = crc32z(crc, g.in_next, togo); \
2700 g.in_left -= togo; \
2701 g.in_next += togo; \
2702 } while (0)
2703
2704 // Get a zero-terminated string into allocated memory, with crc update.
2705 #define GETZC(str) \
2706 do { \
2707 unsigned char *end; \
2708 size_t copy, have, size = 0; \
2709 have = 0; \
2710 do { \
2711 if (g.in_left == 0 && load() == 0) \
2712 return -3; \
2713 end = memchr(g.in_next, 0, g.in_left); \
2714 copy = end == NULL ? g.in_left : (size_t)(end - g.in_next) + 1; \
2715 have = vmemcpy(&str, &size, have, g.in_next, copy); \
2716 g.in_left -= copy; \
2717 g.in_next += copy; \
2718 } while (end == NULL); \
2719 crc = crc32z(crc, (unsigned char *)str, have); \
2720 } while (0)
2721
2722 // Pull LSB order or MSB order integers from an unsigned char buffer.
2723 #define PULL2L(p) ((p)[0] + ((unsigned)((p)[1]) << 8))
2724 #define PULL4L(p) (PULL2L(p) + ((unsigned long)(PULL2L((p) + 2)) << 16))
2725 #define PULL2M(p) (((unsigned)((p)[0]) << 8) + (p)[1])
2726 #define PULL4M(p) (((unsigned long)(PULL2M(p)) << 16) + PULL2M((p) + 2))
2727
2728 // Convert MS-DOS date and time to a Unix time, assuming current timezone.
2729 // (You got a better idea?)
2730 local time_t dos2time(unsigned long dos) {
2731 struct tm tm;
2732
2733 if (dos == 0)
2734 return time(NULL);
2735 tm.tm_year = ((int)(dos >> 25) & 0x7f) + 80;
2736 tm.tm_mon = ((int)(dos >> 21) & 0xf) - 1;
2737 tm.tm_mday = (int)(dos >> 16) & 0x1f;
2738 tm.tm_hour = (int)(dos >> 11) & 0x1f;
2739 tm.tm_min = (int)(dos >> 5) & 0x3f;
2740 tm.tm_sec = (int)(dos << 1) & 0x3e;
2741 tm.tm_isdst = -1; // figure out if DST or not
2742 return mktime(&tm);
2743 }
2744
2745 // Convert an unsigned 32-bit integer to signed, even if long > 32 bits.
2746 local long tolong(unsigned long val) {
2747 return (long)(val & 0x7fffffffUL) - (long)(val & 0x80000000UL);
2748 }
2749
2750 // Process zip extra field to extract zip64 lengths and Unix mod time.
2751 local int read_extra(unsigned len, int save) {
2752 unsigned id, size, tmp2;
2753 unsigned long tmp4;
2754
2755 // process extra blocks
2756 while (len >= 4) {
2757 id = GET2();
2758 size = GET2();
2759 if (g.in_eof)
2760 return -1;
2761 len -= 4;
2762 if (size > len)
2763 break;
2764 len -= size;
2765 if (id == 0x0001) {
2766 // Zip64 Extended Information Extra Field
2767 g.zip64 = 1;
2768 if (g.zip_ulen == LOW32 && size >= 8) {
2769 g.zip_ulen = GET4();
2770 SKIP(4);
2771 size -= 8;
2772 }
2773 if (g.zip_clen == LOW32 && size >= 8) {
2774 g.zip_clen = GET4();
2775 SKIP(4);
2776 size -= 8;
2777 }
2778 }
2779 if (save) {
2780 if ((id == 0x000d || id == 0x5855) && size >= 8) {
2781 // PKWare Unix or Info-ZIP Type 1 Unix block
2782 SKIP(4);
2783 g.stamp = tolong(GET4());
2784 size -= 8;
2785 }
2786 if (id == 0x5455 && size >= 5) {
2787 // Extended Timestamp block
2788 size--;
2789 if (GET() & 1) {
2790 g.stamp = tolong(GET4());
2791 size -= 4;
2792 }
2793 }
2794 }
2795 SKIP(size);
2796 }
2797 SKIP(len);
2798 return 0;
2799 }
2800
2801 // Read a gzip, zip, zlib, or Unix compress header from ind and return the
2802 // compression method in the range 0..257. 8 is deflate, 256 is a zip method
2803 // greater than 255, and 257 is LZW (compress). The only methods decompressed
2804 // by pigz are 8 and 257. On error, return negative: -1 is immediate EOF, -2 is
2805 // not a recognized compressed format (considering only the first two bytes of
2806 // input), -3 is premature EOF within the header, -4 is unexpected header flag
2807 // values, -5 is the zip central directory, and -6 is a failed gzip header crc
2808 // check. If -2 is returned, the input pointer has been reset to the beginning.
2809 // If the return value is not negative, then get_header() sets g.form to
2810 // indicate gzip (0), zlib (1), or zip (2, or 3 if the entry is followed by a
2811 // data descriptor), and the input points to the first byte of compressed data.
2812 local int get_header(int save) {
2813 unsigned magic; // magic header
2814 unsigned method; // compression method
2815 unsigned flags; // header flags
2816 unsigned fname, extra; // name and extra field lengths
2817 unsigned tmp2; // for macro
2818 unsigned long tmp4; // for macro
2819 unsigned long crc; // gzip header crc
2820
2821 // clear return information
2822 if (save) {
2823 g.stamp = 0;
2824 RELEASE(g.hname);
2825 RELEASE(g.hcomm);
2826 }
2827
2828 // see if it's a gzip, zlib, or lzw file
2829 g.magic1 = GET();
2830 if (g.in_eof) {
2831 g.magic1 = -1;
2832 return -1;
2833 }
2834 magic = (unsigned)g.magic1 << 8;
2835 magic += GET();
2836 if (g.in_eof)
2837 return -2;
2838 if (magic % 31 == 0 && (magic & 0x8f20) == 0x0800) {
2839 // it's zlib
2840 g.form = 1;
2841 return 8;
2842 }
2843 if (magic == 0x1f9d) { // it's lzw
2844 g.form = -1;
2845 return 257;
2846 }
2847 if (magic == 0x504b) { // it's zip
2848 magic = GET2(); // the rest of the signature
2849 if (g.in_eof)
2850 return -3;
2851 if (magic == 0x0201 || magic == 0x0806)
2852 return -5; // central header or archive extra
2853 if (magic != 0x0403)
2854 return -4; // not a local header
2855 g.zip64 = 0;
2856 SKIP(2);
2857 flags = GET2();
2858 if (flags & 0xf7f0)
2859 return -4;
2860 method = GET(); // return low byte of method or 256
2861 if (GET() != 0 || flags & 1)
2862 method = 256; // unknown or encrypted
2863 if (save)
2864 g.stamp = dos2time(GET4());
2865 else
2866 SKIP(4);
2867 g.zip_crc = GET4();
2868 g.zip_clen = GET4();
2869 g.zip_ulen = GET4();
2870 fname = GET2();
2871 extra = GET2();
2872 if (save) {
2873 char *next;
2874
2875 if (g.in_eof)
2876 return -3;
2877 next = g.hname = alloc(NULL, fname + 1);
2878 while (fname > g.in_left) {
2879 memcpy(next, g.in_next, g.in_left);
2880 fname -= g.in_left;
2881 next += g.in_left;
2882 if (load() == 0)
2883 return -3;
2884 }
2885 memcpy(next, g.in_next, fname);
2886 g.in_left -= fname;
2887 g.in_next += fname;
2888 next += fname;
2889 *next = 0;
2890 }
2891 else
2892 SKIP(fname);
2893 read_extra(extra, save);
2894 g.form = 2 + ((flags & 8) >> 3);
2895 return g.in_eof ? -3 : (int)method;
2896 }
2897 if (magic != 0x1f8b) { // not gzip
2898 g.in_left++; // return the second byte
2899 g.in_next--;
2900 return -2;
2901 }
2902
2903 // it's gzip -- get method and flags
2904 crc = 0xf6e946c9; // crc of 0x1f 0x8b
2905 method = GETC();
2906 flags = GETC();
2907 if (flags & 0xe0)
2908 return -4;
2909
2910 // get time stamp
2911 if (save)
2912 g.stamp = tolong(GET4C());
2913 else
2914 SKIPC(4);
2915
2916 // skip extra field and OS
2917 SKIPC(2);
2918
2919 // skip extra field, if present
2920 if (flags & 4)
2921 SKIPC(GET2C());
2922
2923 // read file name, if present, into allocated memory
2924 if (flags & 8) {
2925 if (save)
2926 GETZC(g.hname);
2927 else
2928 while (GETC() != 0)
2929 ;
2930 }
2931
2932 // read comment, if present, into allocated memory
2933 if (flags & 16) {
2934 if (save)
2935 GETZC(g.hcomm);
2936 else
2937 while (GETC() != 0)
2938 ;
2939 }
2940
2941 // check header crc
2942 if ((flags & 2) && GET2() != (crc & 0xffff))
2943 return -6;
2944
2945 // return gzip compression method
2946 g.form = 0;
2947 return g.in_eof ? -3 : (int)method;
2948 }
2949
2950 // Process the remainder of a zip file after the first entry. Return true if
2951 // the next signature is another local file header. If listing verbosely, then
2952 // search the remainder of the zip file for the central file header
2953 // corresponding to the first zip entry, and save the file comment, if any.
2954 local int more_zip_entries(void) {
2955 unsigned long sig;
2956 int ret, n;
2957 unsigned char *first;
2958 unsigned tmp2; // for macro
2959 unsigned long tmp4; // for macro
2960 unsigned char const central[] = {0x50, 0x4b, 1, 2};
2961
2962 sig = GET4();
2963 ret = !g.in_eof && sig == 0x04034b50; // true if another entry follows
2964 if (!g.list || g.verbosity < 2)
2965 return ret;
2966
2967 // if it was a central file header signature, then already four bytes
2968 // into a central directory header -- otherwise search for the next one
2969 n = sig == 0x02014b50 ? 4 : 0; // number of bytes into central header
2970 for (;;) {
2971 // assure that more input is available
2972 if (g.in_left == 0 && load() == 0) // never found it!
2973 return ret;
2974 if (n == 0) {
2975 // look for first byte in central signature
2976 first = memchr(g.in_next, central[0], g.in_left);
2977 if (first == NULL) {
2978 // not found -- go get the next buffer and keep looking
2979 g.in_left = 0;
2980 }
2981 else {
2982 // found -- continue search at next byte
2983 n++;
2984 g.in_left -= first - g.in_next + 1;
2985 g.in_next = first + 1;
2986 }
2987 }
2988 else if (n < 4) {
2989 // look for the remaining bytes in the central signature
2990 if (g.in_next[0] == central[n]) {
2991 n++;
2992 g.in_next++;
2993 g.in_left--;
2994 }
2995 else
2996 n = 0; // mismatch -- restart search with this byte
2997 }
2998 else {
2999 // Now in a suspected central file header, just past the signature.
3000 // Read the rest of the fixed-length portion of the header.
3001 unsigned char head[CEN];
3002 size_t need = CEN, part = 0, len, i;
3003
3004 if (need > g.in_left) { // will only need to do this once
3005 part = g.in_left;
3006 memcpy(head + CEN - need, g.in_next, part);
3007 need -= part;
3008 g.in_left = 0;
3009 if (load() == 0) // never found it!
3010 return ret;
3011 }
3012 memcpy(head + CEN - need, g.in_next, need);
3013
3014 // Determine to sufficient probability that this is the droid we're
3015 // looking for, by checking the CRC and the local header offset.
3016 if (PULL4L(head + 12) == g.out_check && PULL4L(head + 38) == 0) {
3017 // Update the number of bytes consumed from the current buffer.
3018 g.in_next += need;
3019 g.in_left -= need;
3020
3021 // Get the comment length.
3022 len = PULL2L(head + 28);
3023 if (len == 0) // no comment
3024 return ret;
3025
3026 // Skip the file name and extra field.
3027 SKIP(PULL2L(head + 24) + (unsigned long)PULL2L(head + 26));
3028
3029 // Save the comment field.
3030 need = len;
3031 g.hcomm = alloc(NULL, len + 1);
3032 while (need > g.in_left) {
3033 memcpy(g.hcomm + len - need, g.in_next, g.in_left);
3034 need -= g.in_left;
3035 g.in_left = 0;
3036 if (load() == 0) { // premature EOF
3037 RELEASE(g.hcomm);
3038 return ret;
3039 }
3040 }
3041 memcpy(g.hcomm + len - need, g.in_next, need);
3042 g.in_next += need;
3043 g.in_left -= need;
3044 for (i = 0; i < len; i++)
3045 if (g.hcomm[i] == 0)
3046 g.hcomm[i] = ' ';
3047 g.hcomm[len] = 0;
3048 return ret;
3049 }
3050 else {
3051 // Nope, false alarm. Restart the search at the first byte
3052 // after what we thought was the central file header signature.
3053 if (part) {
3054 // Move buffer data up and insert the part of the header
3055 // data read from the previous buffer.
3056 memmove(g.in_next + part, g.in_next, g.in_left);
3057 memcpy(g.in_next, head, part);
3058 g.in_left += part;
3059 }
3060 n = 0;
3061 }
3062 }
3063 }
3064 }
3065
3066 // --- list contents of compressed input (gzip, zlib, or lzw) ---
3067
3068 // Find standard compressed file suffix, return length of suffix.
3069 local size_t compressed_suffix(char *nm) {
3070 size_t len;
3071
3072 len = strlen(nm);
3073 if (len > 4) {
3074 nm += len - 4;
3075 len = 4;
3076 if (strcmp(nm, ".zip") == 0 || strcmp(nm, ".ZIP") == 0 ||
3077 strcmp(nm, ".tgz") == 0)
3078 return 4;
3079 }
3080 if (len > 3) {
3081 nm += len - 3;
3082 len = 3;
3083 if (strcmp(nm, ".gz") == 0 || strcmp(nm, "-gz") == 0 ||
3084 strcmp(nm, ".zz") == 0 || strcmp(nm, "-zz") == 0)
3085 return 3;
3086 }
3087 if (len > 2) {
3088 nm += len - 2;
3089 if (strcmp(nm, ".z") == 0 || strcmp(nm, "-z") == 0 ||
3090 strcmp(nm, "_z") == 0 || strcmp(nm, ".Z") == 0)
3091 return 2;
3092 }
3093 return 0;
3094 }
3095
3096 // Listing file name lengths for -l and -lv.
3097 #define NAMEMAX1 48 // name display limit at verbosity 1
3098 #define NAMEMAX2 16 // name display limit at verbosity 2
3099
3100 // Print gzip, lzw, zlib, or zip file information.
3101 local void show_info(int method, unsigned long check, length_t len, int cont) {
3102 size_t max; // maximum name length for current verbosity
3103 size_t n; // name length without suffix
3104 time_t now; // for getting current year
3105 char mod[26]; // modification time in text
3106 char tag[NAMEMAX1+1]; // header or file name, possibly truncated
3107
3108 // create abbreviated name from header file name or actual file name
3109 max = g.verbosity > 1 ? NAMEMAX2 : NAMEMAX1;
3110 memset(tag, 0, max + 1);
3111 if (cont)
3112 strncpy(tag, "<...>", max + 1);
3113 else if (g.hname == NULL) {
3114 n = strlen(g.inf) - compressed_suffix(g.inf);
3115 memcpy(tag, g.inf, n > max + 1 ? max + 1 : n);
3116 if (strcmp(g.inf + n, ".tgz") == 0 && n < max + 1)
3117 strncpy(tag + n, ".tar", max + 1 - n);
3118 }
3119 else
3120 strncpy(tag, g.hname, max + 1);
3121 if (tag[max])
3122 strcpy(tag + max - 3, "...");
3123
3124 // convert time stamp to text
3125 if (g.stamp && !cont) {
3126 strcpy(mod, ctime(&g.stamp));
3127 now = time(NULL);
3128 if (strcmp(mod + 20, ctime(&now) + 20) != 0)
3129 strcpy(mod + 11, mod + 19);
3130 }
3131 else
3132 strcpy(mod + 4, "------ -----");
3133 mod[16] = 0;
3134
3135 // if first time, print header
3136 if (g.first) {
3137 if (g.verbosity > 1)
3138 fputs("method check timestamp ", stdout);
3139 if (g.verbosity > 0)
3140 puts("compressed original reduced name");
3141 g.first = 0;
3142 }
3143
3144 // print information
3145 if (g.verbosity > 1) {
3146 if (g.form == 3 && !g.decode)
3147 printf("zip%3d -------- %s ", method, mod + 4);
3148 else if (g.form > 1)
3149 printf("zip%3d %08lx %s ", method, check, mod + 4);
3150 else if (g.form == 1)
3151 printf("zlib%2d %08lx %s ", method, check, mod + 4);
3152 else if (method == 257)
3153 printf("lzw -------- %s ", mod + 4);
3154 else
3155 printf("gzip%2d %08lx %s ", method, check, mod + 4);
3156 }
3157 if (g.verbosity > 0) {
3158 // compute reduction percent -- allow divide-by-zero, displays as -inf%
3159 double red = 100. * (len - (double)g.in_tot) / len;
3160 if ((g.form == 3 && !g.decode) ||
3161 (method == 8 && g.in_tot > (len + (len >> 10) + 12)) ||
3162 (method == 257 && g.in_tot > len + (len >> 1) + 3))
3163 #if __STDC_VERSION__-0 >= 199901L || __GNUC__-0 >= 3
3164 printf("%10ju %10ju? unk %s\n", g.in_tot, len, tag);
3165 else
3166 printf("%10ju %10ju %6.1f%% %s\n", g.in_tot, len, red, tag);
3167 #else
3168 printf("%10lu %10lu? unk %s\n", g.in_tot, len, tag);
3169 else
3170 printf("%10lu %10lu %6.1f%% %s\n", g.in_tot, len, red, tag);
3171 #endif
3172 }
3173 if (g.verbosity > 1 && g.hcomm != NULL)
3174 puts(g.hcomm);
3175 }
3176
3177 // List content information about the gzip file at ind (only works if the gzip
3178 // file contains a single gzip stream with no junk at the end, and only works
3179 // well if the uncompressed length is less than 4 GB).
3180 local void list_info(void) {
3181 int method; // get_header() return value
3182 size_t n; // available trailer bytes
3183 off_t at; // used to calculate compressed length
3184 unsigned char tail[8]; // trailer containing check and length
3185 unsigned long check; // check value
3186 length_t len; // length from trailer
3187
3188 // initialize input buffer
3189 in_init();
3190
3191 // read header information and position input after header
3192 method = get_header(1);
3193 if (method < 0) {
3194 complain(method == -6 ? "skipping: %s corrupt: header crc error" :
3195 method == -1 ? "skipping: %s empty" :
3196 "skipping: %s unrecognized format", g.inf);
3197 return;
3198 }
3199
3200 #ifndef NOTHREAD
3201 // wait for read thread to complete current read() operation, to permit
3202 // seeking and reading on g.ind here in the main thread
3203 load_wait();
3204 #endif
3205
3206 // list zip file
3207 if (g.form > 1) {
3208 more_zip_entries(); // get first entry comment, if any
3209 g.in_tot = g.zip_clen;
3210 show_info(method, g.zip_crc, g.zip_ulen, 0);
3211 return;
3212 }
3213
3214 // list zlib file
3215 if (g.form == 1) {
3216 at = lseek(g.ind, 0, SEEK_END);
3217 if (at == -1) {
3218 check = 0;
3219 do {
3220 len = g.in_left < 4 ? g.in_left : 4;
3221 g.in_next += g.in_left - len;
3222 while (len--)
3223 check = (check << 8) + *g.in_next++;
3224 } while (load() != 0);
3225 check &= LOW32;
3226 }
3227 else {
3228 g.in_tot = (length_t)at;
3229 lseek(g.ind, -4, SEEK_END);
3230 readn(g.ind, tail, 4);
3231 check = PULL4M(tail);
3232 }
3233 g.in_tot -= 6;
3234 show_info(method, check, 0, 0);
3235 return;
3236 }
3237
3238 // list lzw file
3239 if (method == 257) {
3240 at = lseek(g.ind, 0, SEEK_END);
3241 if (at == -1)
3242 while (load() != 0)
3243 ;
3244 else
3245 g.in_tot = (length_t)at;
3246 g.in_tot -= 3;
3247 show_info(method, 0, 0, 0);
3248 return;
3249 }
3250
3251 // skip to end to get trailer (8 bytes), compute compressed length
3252 if (g.in_short) { // whole thing already read
3253 if (g.in_left < 8) {
3254 complain("skipping: %s not a valid gzip file", g.inf);
3255 return;
3256 }
3257 g.in_tot = g.in_left - 8; // compressed size
3258 memcpy(tail, g.in_next + (g.in_left - 8), 8);
3259 }
3260 else if ((at = lseek(g.ind, -8, SEEK_END)) != -1) {
3261 g.in_tot = (length_t)at - g.in_tot + g.in_left; // compressed size
3262 readn(g.ind, tail, 8); // get trailer
3263 }
3264 else { // can't seek
3265 len = g.in_tot - g.in_left; // save header size
3266 do {
3267 n = g.in_left < 8 ? g.in_left : 8;
3268 memcpy(tail, g.in_next + (g.in_left - n), n);
3269 load();
3270 } while (g.in_left == BUF); // read until end
3271 if (g.in_left < 8) {
3272 if (n + g.in_left < 8) {
3273 complain("skipping: %s not a valid gzip file", g.inf);
3274 return;
3275 }
3276 if (g.in_left) {
3277 if (n + g.in_left > 8)
3278 memcpy(tail, tail + n - (8 - g.in_left), 8 - g.in_left);
3279 memcpy(tail + 8 - g.in_left, g.in_next, g.in_left);
3280 }
3281 }
3282 else
3283 memcpy(tail, g.in_next + (g.in_left - 8), 8);
3284 g.in_tot -= len + 8;
3285 }
3286 if (g.in_tot < 2) {
3287 complain("skipping: %s not a valid gzip file", g.inf);
3288 return;
3289 }
3290
3291 // convert trailer to check and uncompressed length (modulo 2^32)
3292 check = PULL4L(tail);
3293 len = PULL4L(tail + 4);
3294
3295 // list information about contents
3296 show_info(method, check, len, 0);
3297 }
3298
3299 // --- copy input to output (when acting like cat) ---
3300
3301 local void cat(void) {
3302 // copy the first header byte read, if any
3303 if (g.magic1 != -1) {
3304 unsigned char buf[1] = {g.magic1};
3305 g.out_tot += writen(g.outd, buf, 1);
3306 }
3307
3308 // copy the remainder of the input to the output
3309 while (g.in_left) {
3310 g.out_tot += writen(g.outd, g.in_next, g.in_left);
3311 g.in_left = 0;
3312 load();
3313 }
3314 }
3315
3316 // --- decompress deflate input ---
3317
3318 // Call-back input function for inflateBack().
3319 local unsigned inb(void *desc, unsigned char **buf) {
3320 (void)desc;
3321 if (g.in_left == 0)
3322 load();
3323 *buf = g.in_next;
3324 unsigned len = g.in_left > UINT_MAX ? UINT_MAX : (unsigned)g.in_left;
3325 g.in_next += len;
3326 g.in_left -= len;
3327 return len;
3328 }
3329
3330 // Output buffers and window for infchk() and unlzw().
3331 #define OUTSIZE 32768U // must be at least 32K for inflateBack() window
3332 local unsigned char out_buf[OUTSIZE];
3333
3334 #ifndef NOTHREAD
3335 // Output data for parallel write and check.
3336 local unsigned char out_copy[OUTSIZE];
3337 local size_t out_len;
3338
3339 // outb threads states.
3340 local lock *outb_write_more = NULL;
3341 local lock *outb_check_more;
3342
3343 // Output write thread.
3344 local void outb_write(void *dummy) {
3345 size_t len;
3346 ball_t err; // error information from throw()
3347
3348 (void)dummy;
3349
3350 Trace(("-- launched decompress write thread"));
3351 try {
3352 do {
3353 possess(outb_write_more);
3354 wait_for(outb_write_more, TO_BE, 1);
3355 len = out_len;
3356 if (len && g.decode == 1)
3357 writen(g.outd, out_copy, len);
3358 Trace(("-- decompress wrote %lu bytes", len));
3359 twist(outb_write_more, TO, 0);
3360 } while (len);
3361 }
3362 catch (err) {
3363 THREADABORT(err);
3364 }
3365 Trace(("-- exited decompress write thread"));
3366 }
3367
3368 // Output check thread.
3369 local void outb_check(void *dummy) {
3370 size_t len;
3371 ball_t err; // error information from throw()
3372
3373 (void)dummy;
3374
3375 Trace(("-- launched decompress check thread"));
3376 try {
3377 do {
3378 possess(outb_check_more);
3379 wait_for(outb_check_more, TO_BE, 1);
3380 len = out_len;
3381 g.out_check = CHECK(g.out_check, out_copy, len);
3382 Trace(("-- decompress checked %lu bytes", len));
3383 twist(outb_check_more, TO, 0);
3384 } while (len);
3385 }
3386 catch (err) {
3387 THREADABORT(err);
3388 }
3389 Trace(("-- exited decompress check thread"));
3390 }
3391 #endif
3392
3393 // Call-back output function for inflateBack(). Wait for the last write and
3394 // check calculation to complete, copy the write buffer, and then alert the
3395 // write and check threads and return for more decompression while that's going
3396 // on (or just write and check if no threads or if proc == 1).
3397 local int outb(void *desc, unsigned char *buf, unsigned len) {
3398 (void)desc;
3399
3400 #ifndef NOTHREAD
3401 static thread *wr, *ch;
3402
3403 if (g.procs > 1) {
3404 // if first time, initialize state and launch threads
3405 if (outb_write_more == NULL) {
3406 outb_write_more = new_lock(0);
3407 outb_check_more = new_lock(0);
3408 wr = launch(outb_write, NULL);
3409 ch = launch(outb_check, NULL);
3410 }
3411
3412 // wait for previous write and check threads to complete
3413 possess(outb_check_more);
3414 wait_for(outb_check_more, TO_BE, 0);
3415 possess(outb_write_more);
3416 wait_for(outb_write_more, TO_BE, 0);
3417
3418 // copy the output and alert the worker bees
3419 out_len = len;
3420 if (len) {
3421 g.out_tot += len;
3422 memcpy(out_copy, buf, len);
3423 }
3424 twist(outb_write_more, TO, 1);
3425 twist(outb_check_more, TO, 1);
3426
3427 // if requested with len == 0, clean up -- terminate and join write and
3428 // check threads, free lock
3429 if (len == 0 && outb_write_more != NULL) {
3430 join(ch);
3431 join(wr);
3432 free_lock(outb_check_more);
3433 free_lock(outb_write_more);
3434 outb_write_more = NULL;
3435 }
3436
3437 // return for more decompression while last buffer is being written and
3438 // having its check value calculated -- we wait for those to finish the
3439 // next time this function is called
3440 return 0;
3441 }
3442 #endif
3443
3444 // if just one process or no threads, then do it without threads
3445 if (len) {
3446 if (g.decode == 1)
3447 writen(g.outd, buf, len);
3448 g.out_check = CHECK(g.out_check, buf, len);
3449 g.out_tot += len;
3450 }
3451 return 0;
3452 }
3453
3454 // Zip file data descriptor signature. This signature may or may not precede
3455 // the CRC and lengths, with either resulting in a valid zip file! There is
3456 // some odd code below that tries to detect and accommodate both cases.
3457 #define SIG 0x08074b50
3458
3459 // Inflate for decompression or testing. Decompress from ind to outd unless
3460 // decode != 1, in which case just test ind, and then also list if list != 0;
3461 // look for and decode multiple, concatenated gzip and/or zlib streams; read
3462 // and check the gzip, zlib, or zip trailer.
3463 local void infchk(void) {
3464 int ret, cont, more;
3465 unsigned long check, len, ktot;
3466 z_stream strm;
3467 unsigned tmp2;
3468 unsigned long tmp4;
3469 length_t clen, ctot, utot;
3470
3471 ctot = utot = 0;
3472 ktot = CHECK(0L, Z_NULL, 0);
3473 cont = more = 0;
3474 do {
3475 // header already read -- set up for decompression
3476 g.in_tot = g.in_left; // track compressed data length
3477 g.out_tot = 0;
3478 g.out_check = CHECK(0L, Z_NULL, 0);
3479 strm.zalloc = ZALLOC;
3480 strm.zfree = ZFREE;
3481 strm.opaque = OPAQUE;
3482 ret = inflateBackInit(&strm, 15, out_buf);
3483 if (ret == Z_MEM_ERROR)
3484 throw(ENOMEM, "not enough memory");
3485 if (ret != Z_OK)
3486 throw(EINVAL, "internal error");
3487
3488 // decompress, compute lengths and check value
3489 strm.avail_in = 0;
3490 strm.next_in = Z_NULL;
3491 ret = inflateBack(&strm, inb, NULL, outb, NULL);
3492 inflateBackEnd(&strm);
3493 g.in_left += strm.avail_in;
3494 g.in_next = strm.next_in;
3495 outb(NULL, NULL, 0); // finish off final write and check
3496 if (ret == Z_DATA_ERROR)
3497 throw(EDOM, "%s: corrupted -- invalid deflate data (%s)",
3498 g.inf, strm.msg);
3499 if (ret == Z_BUF_ERROR)
3500 throw(EDOM, "%s: corrupted -- incomplete deflate data", g.inf);
3501 if (ret != Z_STREAM_END)
3502 throw(EINVAL, "internal error");
3503
3504 // compute compressed data length
3505 clen = g.in_tot - g.in_left;
3506
3507 // read and check trailer
3508 if (g.form > 1) { // zip local trailer (if any)
3509 if (g.form == 3) { // data descriptor follows
3510 // get data descriptor values, assuming no signature
3511 g.zip_crc = GET4();
3512 g.zip_clen = GET4();
3513 g.zip_ulen = GET4(); // ZIP64 -> high clen, not ulen
3514
3515 // deduce whether or not a signature precedes the values
3516 if (g.zip_crc == SIG && // might be the signature
3517 // if the expected CRC is not SIG, then it's a signature
3518 (g.out_check != SIG || // assume signature
3519 // now we're in a very rare case where CRC == SIG -- the
3520 // first four bytes could be the signature or the CRC
3521 (g.zip_clen == SIG && // if not, then no signature
3522 // now we have the first two words are SIG and the
3523 // expected CRC is SIG, so it could be a signature and
3524 // the CRC, or it could be the CRC and a compressed
3525 // length that is *also* SIG (!) -- so check the low 32
3526 // bits of the expected compressed length for SIG
3527 ((clen & LOW32) != SIG || // assume signature and CRC
3528 // now the expected CRC *and* the expected low 32 bits
3529 // of the compressed length are SIG -- this is so
3530 // incredibly unlikely, clearly someone is messing with
3531 // us, but we continue ... if the next four bytes are
3532 // not SIG, then there is not a signature -- check those
3533 // bytes, currently in g.zip_ulen:
3534 (g.zip_ulen == SIG && // if not, then no signature
3535 // we have three SIGs in a row in the descriptor, and
3536 // both the expected CRC and the expected clen are SIG
3537 // -- the first one is a signature if we don't expect
3538 // the third word to be SIG, which is either the low 32
3539 // bits of ulen, or if ZIP64, the high 32 bits of clen:
3540 (g.zip64 ? clen >> 32 : g.out_tot) != SIG
3541 // if that last compare was equal, then the expected
3542 // values for the CRC, the low 32 bits of clen, *and*
3543 // the low 32 bits of ulen are all SIG (!!), or in the
3544 // case of ZIP64, even crazier, the CRC and *both*
3545 // 32-bit halves of clen are all SIG (clen > 500
3546 // petabytes!!!) ... we can no longer discriminate the
3547 // hypotheses, so we will assume no signature
3548 ))))) {
3549 // first four bytes were actually the descriptor -- shift
3550 // the values down and get another four bytes
3551 g.zip_crc = g.zip_clen;
3552 g.zip_clen = g.zip_ulen;
3553 g.zip_ulen = GET4();
3554 }
3555
3556 // if ZIP64, then ulen is really the high word of clen -- get
3557 // the actual ulen and skip its high word as well (we only
3558 // compare the low 32 bits of the lengths to verify)
3559 if (g.zip64) {
3560 g.zip_ulen = GET4();
3561 (void)GET4();
3562 }
3563 if (g.in_eof)
3564 throw(EDOM, "%s: corrupted entry -- missing trailer",
3565 g.inf);
3566 }
3567 check = g.zip_crc;
3568 if (check != g.out_check)
3569 throw(EDOM, "%s: corrupted entry -- crc32 mismatch", g.inf);
3570 if (g.zip_clen != (clen & LOW32) ||
3571 g.zip_ulen != (g.out_tot & LOW32))
3572 throw(EDOM, "%s: corrupted entry -- length mismatch",
3573 g.inf);
3574 more = more_zip_entries(); // see if more entries, get comment
3575 }
3576 else if (g.form == 1) { // zlib (big-endian) trailer
3577 check = (unsigned long)(GET()) << 24;
3578 check += (unsigned long)(GET()) << 16;
3579 check += (unsigned)(GET()) << 8;
3580 check += GET();
3581 if (g.in_eof)
3582 throw(EDOM, "%s: corrupted -- missing trailer", g.inf);
3583 if (check != g.out_check)
3584 throw(EDOM, "%s: corrupted -- adler32 mismatch", g.inf);
3585 }
3586 else { // gzip trailer
3587 check = GET4();
3588 len = GET4();
3589 if (g.in_eof)
3590 throw(EDOM, "%s: corrupted -- missing trailer", g.inf);
3591 if (check != g.out_check)
3592 throw(EDOM, "%s: corrupted -- crc32 mismatch", g.inf);
3593 if (len != (g.out_tot & LOW32))
3594 throw(EDOM, "%s: corrupted -- length mismatch", g.inf);
3595 }
3596
3597 // show file information if requested
3598 if (g.list) {
3599 ctot += clen;
3600 utot += g.out_tot;
3601 ktot = COMB(ktot, check, g.out_tot);
3602 g.in_tot = clen;
3603 show_info(8, check, g.out_tot, cont);
3604 cont = cont ? 2 : 1;
3605 }
3606
3607 // if a gzip entry follows a gzip entry, decompress it (don't replace
3608 // saved header information from first entry)
3609 } while (g.form == 0 && (ret = get_header(0)) == 8);
3610
3611 // show totals if more than one gzip member
3612 if (cont > 1 && g.verbosity > 0) {
3613 if (g.verbosity > 1)
3614 printf(" %08lx ", ktot);
3615 printf(
3616 #if __STDC_VERSION__-0 >= 199901L || __GNUC__-0 >= 3
3617 "%10ju %10ju %6.1f%% (total)\n",
3618 #else
3619 "%10lu %10lu %6.1f%% (total)\n",
3620 #endif
3621 ctot, utot, 100. * (utot - (double)ctot) / utot);
3622 }
3623
3624 // gzip -cdf copies junk after gzip stream directly to output
3625 if (g.form == 0 && ret == -2 && g.force && g.pipeout && g.decode != 2 &&
3626 !g.list)
3627 cat();
3628
3629 // check for more entries in zip file
3630 else if (more) {
3631 complain("warning: %s: entries after the first were ignored", g.inf);
3632 g.keep = 1; // don't delete the .zip file
3633 }
3634
3635 // check for non-gzip after gzip stream, or anything after zlib stream
3636 else if ((g.verbosity > 1 && g.form == 0 && ret != -1) ||
3637 (g.form == 1 && (GET(), !g.in_eof)))
3638 complain("warning: %s: trailing junk was ignored", g.inf);
3639 }
3640
3641 // --- decompress Unix compress (LZW) input ---
3642
3643 // Type for accumulating bits. 23 bits will be used to accumulate up to 16-bit
3644 // symbols.
3645 typedef unsigned long bits_t;
3646
3647 #define NOMORE() (g.in_left == 0 && (g.in_eof || load() == 0))
3648 #define NEXT() (g.in_left--, (unsigned)*g.in_next++)
3649
3650 // Decompress a compress (LZW) file from ind to outd. The compress magic header
3651 // (two bytes) has already been read and verified.
3652 local void unlzw(void) {
3653 unsigned bits; // current bits per code (9..16)
3654 unsigned mask; // mask for current bits codes = (1<<bits)-1
3655 bits_t buf; // bit buffer (need 23 bits)
3656 unsigned left; // bits left in buf (0..7 after code pulled)
3657 length_t mark; // offset where last change in bits began
3658 unsigned code; // code, table traversal index
3659 unsigned max; // maximum bits per code for this stream
3660 unsigned flags; // compress flags, then block compress flag
3661 unsigned end; // last valid entry in prefix/suffix tables
3662 unsigned prev; // previous code
3663 unsigned final; // last character written for previous code
3664 unsigned stack; // next position for reversed string
3665 unsigned outcnt; // bytes in output buffer
3666 // memory for unlzw() -- the first 256 entries of prefix[] and suffix[] are
3667 // never used, so could have offset the index but it's faster to waste a
3668 // little memory
3669 prefix_t prefix[65536]; // index to LZW prefix string
3670 unsigned char suffix[65536]; // one-character LZW suffix
3671 unsigned char match[65280 + 2]; // buffer for reversed match
3672
3673 // process remainder of compress header -- a flags byte
3674 g.out_tot = 0;
3675 if (NOMORE())
3676 throw(EDOM, "%s: lzw premature end", g.inf);
3677 flags = NEXT();
3678 if (flags & 0x60)
3679 throw(EDOM, "%s: unknown lzw flags set", g.inf);
3680 max = flags & 0x1f;
3681 if (max < 9 || max > 16)
3682 throw(EDOM, "%s: lzw bits out of range", g.inf);
3683 if (max == 9) // 9 doesn't really mean 9
3684 max = 10;
3685 flags &= 0x80; // true if block compress
3686
3687 // mark the start of the compressed data for computing the first flush
3688 mark = g.in_tot - g.in_left;
3689
3690 // clear table, start at nine bits per symbol
3691 bits = 9;
3692 mask = 0x1ff;
3693 end = flags ? 256 : 255;
3694
3695 // set up: get first 9-bit code, which is the first decompressed byte, but
3696 // don't create a table entry until the next code
3697 if (NOMORE()) // no compressed data is ok
3698 return;
3699 buf = NEXT();
3700 if (NOMORE())
3701 throw(EDOM, "%s: lzw premature end", g.inf); // need nine bits
3702 buf += NEXT() << 8;
3703 final = prev = buf & mask; // code
3704 buf >>= bits;
3705 left = 16 - bits;
3706 if (prev > 255)
3707 throw(EDOM, "%s: invalid lzw code", g.inf);
3708 out_buf[0] = (unsigned char)final; // write first decompressed byte
3709 outcnt = 1;
3710
3711 // decode codes
3712 stack = 0;
3713 for (;;) {
3714 // if the table will be full after this, increment the code size
3715 if (end >= mask && bits < max) {
3716 // flush unused input bits and bytes to next 8*bits bit boundary
3717 // (this is a vestigial aspect of the compressed data format
3718 // derived from an implementation that made use of a special VAX
3719 // machine instruction!)
3720 {
3721 unsigned rem = ((g.in_tot - g.in_left) - mark) % bits;
3722 if (rem) {
3723 rem = bits - rem;
3724 if (NOMORE())
3725 break; // end of compressed data
3726 while (rem > g.in_left) {
3727 rem -= g.in_left;
3728 if (load() == 0)
3729 throw(EDOM, "%s: lzw premature end", g.inf);
3730 }
3731 g.in_left -= rem;
3732 g.in_next += rem;
3733 }
3734 }
3735 buf = 0;
3736 left = 0;
3737
3738 // mark this new location for computing the next flush
3739 mark = g.in_tot - g.in_left;
3740
3741 // go to the next number of bits per symbol
3742 bits++;
3743 mask <<= 1;
3744 mask++;
3745 }
3746
3747 // get a code of bits bits
3748 if (NOMORE())
3749 break; // end of compressed data
3750 buf += (bits_t)(NEXT()) << left;
3751 left += 8;
3752 if (left < bits) {
3753 if (NOMORE())
3754 throw(EDOM, "%s: lzw premature end", g.inf);
3755 buf += (bits_t)(NEXT()) << left;
3756 left += 8;
3757 }
3758 code = buf & mask;
3759 buf >>= bits;
3760 left -= bits;
3761
3762 // process clear code (256)
3763 if (code == 256 && flags) {
3764 // flush unused input bits and bytes to next 8*bits bit boundary
3765 {
3766 unsigned rem = ((g.in_tot - g.in_left) - mark) % bits;
3767 if (rem) {
3768 rem = bits - rem;
3769 while (rem > g.in_left) {
3770 rem -= g.in_left;
3771 if (load() == 0)
3772 throw(EDOM, "%s: lzw premature end", g.inf);
3773 }
3774 g.in_left -= rem;
3775 g.in_next += rem;
3776 }
3777 }
3778 buf = 0;
3779 left = 0;
3780
3781 // mark this new location for computing the next flush
3782 mark = g.in_tot - g.in_left;
3783
3784 // go back to nine bits per symbol
3785 bits = 9; // initialize bits and mask
3786 mask = 0x1ff;
3787 end = 255; // empty table
3788 continue; // get next code
3789 }
3790
3791 // special code to reuse last match
3792 {
3793 unsigned temp = code; // save the current code
3794 if (code > end) {
3795 // be picky on the allowed code here, and make sure that the
3796 // code we drop through (prev) will be a valid index so that
3797 // random input does not cause an exception
3798 if (code != end + 1 || prev > end)
3799 throw(EDOM, "%s: invalid lzw code", g.inf);
3800 match[stack++] = (unsigned char)final;
3801 code = prev;
3802 }
3803
3804 // walk through linked list to generate output in reverse order
3805 while (code >= 256) {
3806 match[stack++] = suffix[code];
3807 code = prefix[code];
3808 }
3809 match[stack++] = (unsigned char)code;
3810 final = code;
3811
3812 // link new table entry
3813 if (end < mask) {
3814 end++;
3815 prefix[end] = (prefix_t)prev;
3816 suffix[end] = (unsigned char)final;
3817 }
3818
3819 // set previous code for next iteration
3820 prev = temp;
3821 }
3822
3823 // write output in forward order
3824 while (stack > OUTSIZE - outcnt) {
3825 while (outcnt < OUTSIZE)
3826 out_buf[outcnt++] = match[--stack];
3827 g.out_tot += outcnt;
3828 if (g.decode == 1)
3829 writen(g.outd, out_buf, outcnt);
3830 outcnt = 0;
3831 }
3832 do {
3833 out_buf[outcnt++] = match[--stack];
3834 } while (stack);
3835 }
3836
3837 // write any remaining buffered output
3838 g.out_tot += outcnt;
3839 if (outcnt && g.decode == 1)
3840 writen(g.outd, out_buf, outcnt);
3841 }
3842
3843 // --- file processing ---
3844
3845 // Extract file name from path.
3846 local char *justname(char *path) {
3847 char *p;
3848
3849 p = strrchr(path, '/');
3850 return p == NULL ? path : p + 1;
3851 }
3852
3853 // Copy file attributes, from -> to, as best we can. This is best effort, so no
3854 // errors are reported. The mode bits, including suid, sgid, and the sticky bit
3855 // are copied (if allowed), the owner's user id and group id are copied (again
3856 // if allowed), and the access and modify times are copied.
3857 local int copymeta(char *from, char *to) {
3858 struct stat st;
3859 struct timeval times[2];
3860
3861 // get all of from's Unix meta data, return if not a regular file
3862 if (stat(from, &st) != 0 || (st.st_mode & S_IFMT) != S_IFREG)
3863 return -4;
3864
3865 // set to's mode bits, ignore errors
3866 int ret = chmod(to, st.st_mode & 07777);
3867
3868 // copy owner's user and group, ignore errors
3869 ret += chown(to, st.st_uid, st.st_gid);
3870
3871 // copy access and modify times, ignore errors
3872 times[0].tv_sec = st.st_atime;
3873 times[0].tv_usec = 0;
3874 times[1].tv_sec = st.st_mtime;
3875 times[1].tv_usec = 0;
3876 ret += utimes(to, times);
3877 return ret;
3878 }
3879
3880 // Set the access and modify times of fd to t.
3881 local void touch(char *path, time_t t) {
3882 struct timeval times[2];
3883
3884 times[0].tv_sec = t;
3885 times[0].tv_usec = 0;
3886 times[1].tv_sec = t;
3887 times[1].tv_usec = 0;
3888 (void)utimes(path, times);
3889 }
3890
3891 // Request that all data buffered by the operating system for g.outd be written
3892 // to the permanent storage device. If fsync(fd) is used (POSIX), then all of
3893 // the data is sent to the device, but will likely be buffered in volatile
3894 // memory on the device itself, leaving open a window of vulnerability.
3895 // fcntl(fd, F_FULLSYNC) on the other hand, available in macOS only, will
3896 // request and wait for the device to write out its buffered data to permanent
3897 // storage. On Windows, _commit() is used.
3898 local void out_push(void) {
3899 if (g.outd == -1)
3900 return;
3901 #if defined(F_FULLSYNC)
3902 int ret = fcntl(g.outd, F_FULLSYNC);
3903 #elif defined(_WIN32)
3904 int ret = _commit(g.outd);
3905 #else
3906 int ret = fsync(g.outd);
3907 #endif
3908 if (ret == -1)
3909 throw(errno, "sync error on %s (%s)", g.outf, strerror(errno));
3910 }
3911
3912 // Process provided input file, or stdin if path is NULL. process() can call
3913 // itself for recursive directory processing.
3914 local void process(char *path) {
3915 volatile int method = -1; // get_header() return value
3916 size_t len; // length of base name (minus suffix)
3917 struct stat st; // to get file type and mod time
3918 ball_t err; // error information from throw()
3919 // all compressed suffixes for decoding search, in length order
3920 static char *sufs[] = {".z", "-z", "_z", ".Z", ".gz", "-gz", ".zz", "-zz",
3921 ".zip", ".ZIP", ".tgz", NULL};
3922
3923 // open input file with name in, descriptor ind -- set name and mtime
3924 if (path == NULL) {
3925 vstrcpy(&g.inf, &g.inz, 0, "<stdin>");
3926 g.ind = 0;
3927 g.name = NULL;
3928 g.mtime = (g.headis & 2) && fstat(g.ind, &st) == 0 &&
3929 S_ISREG(st.st_mode) ? st.st_mtime : 0;
3930 len = 0;
3931 }
3932 else {
3933 // set input file name (already set if recursed here)
3934 if (path != g.inf)
3935 vstrcpy(&g.inf, &g.inz, 0, path);
3936 len = strlen(g.inf);
3937
3938 // try to stat input file -- if not there and decoding, look for that
3939 // name with compressed suffixes
3940 if (lstat(g.inf, &st)) {
3941 if (errno == ENOENT && (g.list || g.decode)) {
3942 char **sufx = sufs;
3943 do {
3944 if (*sufx == NULL)
3945 break;
3946 vstrcpy(&g.inf, &g.inz, len, *sufx++);
3947 errno = 0;
3948 } while (lstat(g.inf, &st) && errno == ENOENT);
3949 }
3950 #if defined(EOVERFLOW) && defined(EFBIG)
3951 if (errno == EOVERFLOW || errno == EFBIG)
3952 throw(EDOM, "%s too large -- "
3953 "not compiled with large file support", g.inf);
3954 #endif
3955 if (errno) {
3956 g.inf[len] = 0;
3957 complain("skipping: %s does not exist", g.inf);
3958 return;
3959 }
3960 len = strlen(g.inf);
3961 }
3962
3963 // only process regular files or named pipes, but allow symbolic links
3964 // if -f, recurse into directory if -r
3965 if ((st.st_mode & S_IFMT) != S_IFREG &&
3966 (st.st_mode & S_IFMT) != S_IFIFO &&
3967 (st.st_mode & S_IFMT) != S_IFLNK &&
3968 (st.st_mode & S_IFMT) != S_IFDIR) {
3969 complain("skipping: %s is a special file or device", g.inf);
3970 return;
3971 }
3972 if ((st.st_mode & S_IFMT) == S_IFLNK && !g.force && !g.pipeout) {
3973 complain("skipping: %s is a symbolic link", g.inf);
3974 return;
3975 }
3976 if ((st.st_mode & S_IFMT) == S_IFDIR && !g.recurse) {
3977 complain("skipping: %s is a directory", g.inf);
3978 return;
3979 }
3980
3981 // recurse into directory (assumes Unix)
3982 if ((st.st_mode & S_IFMT) == S_IFDIR) {
3983 char *roll = NULL;
3984 size_t size = 0, off = 0, base;
3985 DIR *here;
3986 struct dirent *next;
3987
3988 // accumulate list of entries (need to do this, since readdir()
3989 // behavior not defined if directory modified between calls)
3990 here = opendir(g.inf);
3991 if (here == NULL)
3992 return;
3993 while ((next = readdir(here)) != NULL) {
3994 if (next->d_name[0] == 0 ||
3995 (next->d_name[0] == '.' && (next->d_name[1] == 0 ||
3996 (next->d_name[1] == '.' && next->d_name[2] == 0))))
3997 continue;
3998 off = vstrcpy(&roll, &size, off, next->d_name);
3999 }
4000 closedir(here);
4001 vstrcpy(&roll, &size, off, "");
4002
4003 // run process() for each entry in the directory
4004 base = len && g.inf[len - 1] != (unsigned char)'/' ?
4005 vstrcpy(&g.inf, &g.inz, len, "/") - 1 : len;
4006 for (off = 0; roll[off]; off += strlen(roll + off) + 1) {
4007 vstrcpy(&g.inf, &g.inz, base, roll + off);
4008 process(g.inf);
4009 }
4010 g.inf[len] = 0;
4011
4012 // release list of entries
4013 FREE(roll);
4014 return;
4015 }
4016
4017 // don't compress .gz (or provided suffix) files, unless -f
4018 if (!(g.force || g.list || g.decode) && len >= strlen(g.sufx) &&
4019 strcmp(g.inf + len - strlen(g.sufx), g.sufx) == 0) {
4020 grumble("skipping: %s ends with %s", g.inf, g.sufx);
4021 return;
4022 }
4023
4024 // create output file only if input file has compressed suffix
4025 if (g.decode == 1 && !g.pipeout && !g.list) {
4026 size_t suf = compressed_suffix(g.inf);
4027 if (suf == 0) {
4028 complain("skipping: %s does not have compressed suffix",
4029 g.inf);
4030 return;
4031 }
4032 len -= suf;
4033 }
4034
4035 // open input file
4036 g.ind = open(g.inf, O_RDONLY, 0);
4037 if (g.ind < 0)
4038 throw(errno, "read error on %s (%s)", g.inf, strerror(errno));
4039
4040 // prepare gzip header information for compression
4041 g.name = g.headis & 1 ? justname(g.inf) : NULL;
4042 g.mtime = g.headis & 2 ? st.st_mtime : 0;
4043 }
4044 SET_BINARY_MODE(g.ind);
4045
4046 // if requested, just list information about the input file
4047 if (g.list && g.decode != 2) {
4048 list_info();
4049 load_end();
4050 return;
4051 }
4052
4053 // if decoding or testing, try to read gzip header
4054 if (g.decode) {
4055 in_init();
4056 method = get_header(1);
4057 if (method != 8 && method != 257 &&
4058 // gzip -cdf acts like cat on uncompressed input
4059 !((method == -1 || method == -2) && g.force && g.pipeout &&
4060 g.decode != 2 && !g.list)) {
4061 load_end();
4062 complain(method == -6 ? "skipping: %s corrupt: header crc error" :
4063 method == -1 ? "skipping: %s empty" :
4064 method < 0 ? "skipping: %s unrecognized format" :
4065 "skipping: %s unknown compression method", g.inf);
4066 return;
4067 }
4068
4069 // if requested, test input file (possibly a test list)
4070 if (g.decode == 2) {
4071 try {
4072 if (method == 8)
4073 infchk();
4074 else {
4075 unlzw();
4076 if (g.list) {
4077 g.in_tot -= 3;
4078 show_info(method, 0, g.out_tot, 0);
4079 }
4080 }
4081 }
4082 catch (err) {
4083 if (err.code != EDOM)
4084 punt(err);
4085 complain("skipping: %s", err.why);
4086 drop(err);
4087 outb(NULL, NULL, 0);
4088 }
4089 load_end();
4090 return;
4091 }
4092 }
4093
4094 // create output file out, descriptor outd
4095 if (path == NULL || g.pipeout) {
4096 // write to stdout
4097 g.outf = alloc(NULL, strlen("<stdout>") + 1);
4098 strcpy(g.outf, "<stdout>");
4099 g.outd = 1;
4100 if (!g.decode && !g.force && isatty(g.outd))
4101 throw(EINVAL, "trying to write compressed data to a terminal"
4102 " (use -f to force)");
4103 }
4104 else {
4105 char *to = g.inf, *sufx = "";
4106 size_t pre = 0;
4107
4108 // select parts of the output file name
4109 if (g.decode) {
4110 // for -dN or -dNT, use the path from the input file and the name
4111 // from the header, stripping any path in the header name
4112 if ((g.headis & 1) != 0 && g.hname != NULL) {
4113 pre = (size_t)(justname(g.inf) - g.inf);
4114 to = justname(g.hname);
4115 len = strlen(to);
4116 }
4117 // for -d or -dNn, replace abbreviated suffixes
4118 else if (strcmp(to + len, ".tgz") == 0)
4119 sufx = ".tar";
4120 }
4121 else
4122 // add appropriate suffix when compressing
4123 sufx = g.sufx;
4124
4125 // create output file and open to write, overwriting any existing file
4126 // of the same name only if requested with --force or -f
4127 g.outf = alloc(NULL, pre + len + strlen(sufx) + 1);
4128 memcpy(g.outf, g.inf, pre);
4129 memcpy(g.outf + pre, to, len);
4130 strcpy(g.outf + pre + len, sufx);
4131 g.outd = open(g.outf, O_CREAT | O_TRUNC | O_WRONLY |
4132 (g.force ? 0 : O_EXCL), 0600);
4133
4134 // if it exists and wasn't forced, give the user a chance to overwrite
4135 if (g.outd < 0 && errno == EEXIST) {
4136 int overwrite = 0;
4137 if (isatty(0) && g.verbosity) {
4138 // get a response from the user -- the first non-blank
4139 // character has to be a "y" or a "Y" to permit an overwrite
4140 fprintf(stderr, "%s exists -- overwrite (y/n)? ", g.outf);
4141 fflush(stderr);
4142 int ch, first = 1;
4143 do {
4144 ch = getchar();
4145 if (first == 1) {
4146 if (ch == ' ' || ch == '\t')
4147 continue;
4148 if (ch == 'y' || ch == 'Y')
4149 overwrite = 1;
4150 first = 0;
4151 }
4152 } while (ch != EOF && ch != '\n' && ch != '\r');
4153 }
4154 if (!overwrite) {
4155 complain("skipping: %s exists", g.outf);
4156 RELEASE(g.outf);
4157 load_end();
4158 return;
4159 }
4160 g.outd = open(g.outf, O_CREAT | O_TRUNC | O_WRONLY, 0600);
4161 }
4162
4163 // if some other error, give up
4164 if (g.outd < 0)
4165 throw(errno, "write error on %s (%s)", g.outf, strerror(errno));
4166 }
4167 SET_BINARY_MODE(g.outd);
4168
4169 // process ind to outd
4170 if (g.verbosity > 1)
4171 fprintf(stderr, "%s to %s ", g.inf, g.outf);
4172 if (g.decode) {
4173 try {
4174 if (method == 8)
4175 infchk();
4176 else if (method == 257)
4177 unlzw();
4178 else
4179 cat();
4180 }
4181 catch (err) {
4182 if (err.code != EDOM)
4183 punt(err);
4184 complain("skipping: %s", err.why);
4185 drop(err);
4186 outb(NULL, NULL, 0);
4187 if (g.outd != -1 && g.outd != 1) {
4188 close(g.outd);
4189 g.outd = -1;
4190 unlink(g.outf);
4191 RELEASE(g.outf);
4192 }
4193 }
4194 }
4195 #ifndef NOTHREAD
4196 else if (g.procs > 1)
4197 parallel_compress();
4198 #endif
4199 else
4200 single_compress(0);
4201 if (g.verbosity > 1) {
4202 putc('\n', stderr);
4203 fflush(stderr);
4204 }
4205
4206 // finish up, copy attributes, set times, delete original
4207 load_end();
4208 if (g.outd != -1 && g.outd != 1) {
4209 if (g.sync)
4210 out_push(); // push to permanent storage
4211 if (close(g.outd))
4212 throw(errno, "write error on %s (%s)", g.outf, strerror(errno));
4213 g.outd = -1; // now prevent deletion on interrupt
4214 if (g.ind != 0) {
4215 copymeta(g.inf, g.outf);
4216 if (!g.keep) {
4217 if (st.st_nlink > 1 && !g.force)
4218 complain("%s has hard links -- not unlinking", g.inf);
4219 else
4220 unlink(g.inf);
4221 }
4222 }
4223 if (g.decode && (g.headis & 2) != 0 && g.stamp)
4224 touch(g.outf, g.stamp);
4225 }
4226 RELEASE(g.outf);
4227 }
4228
4229 local char *helptext[] = {
4230 "Usage: pigz [options] [files ...]",
4231 " will compress files in place, adding the suffix '.gz'. If no files are",
4232 #ifdef NOTHREAD
4233 " specified, stdin will be compressed to stdout. pigz does what gzip does.",
4234 #else
4235 " specified, stdin will be compressed to stdout. pigz does what gzip does,",
4236 " but spreads the work over multiple processors and cores when compressing.",
4237 #endif
4238 "",
4239 "Options:",
4240 #ifdef NOZOPFLI
4241 " -0 to -9 Compression level",
4242 #else
4243 " -0 to -9, -11 Compression level (level 11, zopfli, is much slower)",
4244 #endif
4245 " --fast, --best Compression levels 1 and 9 respectively",
4246 " -A, --alias xxx Use xxx as the name for any --zip entry from stdin",
4247 " -b, --blocksize mmm Set compression block size to mmmK (default 128K)",
4248 " -c, --stdout Write all processed output to stdout (won't delete)",
4249 " -C, --comment ccc Put comment ccc in the gzip or zip header",
4250 " -d, --decompress Decompress the compressed input",
4251 " -f, --force Force overwrite, compress .gz, links, and to terminal",
4252 #ifndef NOZOPFLI
4253 " -F --first Do iterations first, before block split for -11",
4254 #endif
4255 " -h, --help Display a help screen and quit",
4256 " -H, --huffman Use only Huffman coding for compression",
4257 " -i, --independent Compress blocks independently for damage recovery",
4258 #ifndef NOZOPFLI
4259 " -I, --iterations n Number of iterations for -11 optimization",
4260 " -J, --maxsplits n Maximum number of split blocks for -11",
4261 #endif
4262 " -k, --keep Do not delete original file after processing",
4263 " -K, --zip Compress to PKWare zip (.zip) single entry format",
4264 " -l, --list List the contents of the compressed input",
4265 " -L, --license Display the pigz license and quit",
4266 " -m, --no-time Do not store or restore mod time",
4267 " -M, --time Store or restore mod time",
4268 " -n, --no-name Do not store or restore file name or mod time",
4269 " -N, --name Store or restore file name and mod time",
4270 #ifndef NOZOPFLI
4271 " -O --oneblock Do not split into smaller blocks for -11",
4272 #endif
4273 #ifndef NOTHREAD
4274 " -p, --processes n Allow up to n compression threads (default is the",
4275 " number of online processors, or 8 if unknown)",
4276 #endif
4277 " -q, --quiet Print no messages, even on error",
4278 " -r, --recursive Process the contents of all subdirectories",
4279 " -R, --rsyncable Input-determined block locations for rsync",
4280 " -S, --suffix .sss Use suffix .sss instead of .gz (for compression)",
4281 " -t, --test Test the integrity of the compressed input",
4282 " -U, --rle Use run-length encoding for compression",
4283 #ifdef PIGZ_DEBUG
4284 " -v, --verbose Provide more verbose output (-vv to debug)",
4285 #else
4286 " -v, --verbose Provide more verbose output",
4287 #endif
4288 " -V --version Show the version of pigz",
4289 " -Y --synchronous Force output file write to permanent storage",
4290 " -z, --zlib Compress to zlib (.zz) instead of gzip format",
4291 " -- All arguments after \"--\" are treated as files"
4292 };
4293
4294 // Display the help text above.
4295 local void help(void) {
4296 int n;
4297
4298 if (g.verbosity == 0)
4299 return;
4300 for (n = 0; n < (int)(sizeof(helptext) / sizeof(char *)); n++)
4301 fprintf(stderr, "%s\n", helptext[n]);
4302 fflush(stderr);
4303 exit(0);
4304 }
4305
4306 #ifndef NOTHREAD
4307
4308 // Try to determine the number of processors.
4309 local int nprocs(int n) {
4310 # ifdef _SC_NPROCESSORS_ONLN
4311 n = (int)sysconf(_SC_NPROCESSORS_ONLN);
4312 # else
4313 # ifdef _SC_NPROC_ONLN
4314 n = (int)sysconf(_SC_NPROC_ONLN);
4315 # else
4316 # ifdef __hpux
4317 struct pst_dynamic psd;
4318
4319 if (pstat_getdynamic(&psd, sizeof(psd), (size_t)1, 0) != -1)
4320 n = psd.psd_proc_cnt;
4321 # endif
4322 # endif
4323 # endif
4324 return n;
4325 }
4326
4327 #endif
4328
4329 // Set option defaults.
4330 local void defaults(void) {
4331 g.level = Z_DEFAULT_COMPRESSION;
4332 g.strategy = Z_DEFAULT_STRATEGY;
4333 #ifndef NOZOPFLI
4334 // default zopfli options as set by ZopfliInitOptions():
4335 // verbose = 0
4336 // numiterations = 15
4337 // blocksplitting = 1
4338 // blocksplittinglast = 0
4339 // blocksplittingmax = 15
4340 ZopfliInitOptions(&g.zopts);
4341 #endif
4342 g.block = 131072UL; // 128K
4343 g.shift = x2nmodp(g.block, 3);
4344 #ifdef NOTHREAD
4345 g.procs = 1;
4346 #else
4347 g.procs = nprocs(8);
4348 #endif
4349 g.rsync = 0; // don't do rsync blocking
4350 g.setdict = 1; // initialize dictionary each thread
4351 g.verbosity = 1; // normal message level
4352 g.headis = 3; // store name and time (low bits == 11),
4353 // restore neither (next bits == 00),
4354 // where 01 is name and 10 is time
4355 g.pipeout = 0; // don't force output to stdout
4356 g.sufx = ".gz"; // compressed file suffix
4357 g.comment = NULL; // no comment
4358 g.decode = 0; // compress
4359 g.list = 0; // compress
4360 g.keep = 0; // delete input file once compressed
4361 g.force = 0; // don't overwrite, don't compress links
4362 g.sync = 0; // don't force a flush on output
4363 g.recurse = 0; // don't go into directories
4364 g.form = 0; // use gzip format
4365 }
4366
4367 // Long options conversion to short options.
4368 local char *longopts[][2] = {
4369 {"LZW", "Z"}, {"lzw", "Z"}, {"alias", "A"}, {"ascii", "a"}, {"best", "9"},
4370 {"bits", "Z"}, {"blocksize", "b"}, {"decompress", "d"}, {"fast", "1"},
4371 {"force", "f"}, {"comment", "C"},
4372 #ifndef NOZOPFLI
4373 {"first", "F"}, {"iterations", "I"}, {"maxsplits", "J"}, {"oneblock", "O"},
4374 #endif
4375 {"help", "h"}, {"independent", "i"}, {"keep", "k"}, {"license", "L"},
4376 {"list", "l"}, {"name", "N"}, {"no-name", "n"}, {"no-time", "m"},
4377 {"processes", "p"}, {"quiet", "q"}, {"recursive", "r"}, {"rsyncable", "R"},
4378 {"silent", "q"}, {"stdout", "c"}, {"suffix", "S"}, {"synchronous", "Y"},
4379 {"test", "t"}, {"time", "M"}, {"to-stdout", "c"}, {"uncompress", "d"},
4380 {"verbose", "v"}, {"version", "V"}, {"zip", "K"}, {"zlib", "z"},
4381 {"huffman", "H"}, {"rle", "U"}};
4382 #define NLOPTS (sizeof(longopts) / (sizeof(char *) << 1))
4383
4384 // Either new buffer size, new compression level, or new number of processes.
4385 // Get rid of old buffers and threads to force the creation of new ones with
4386 // the new settings.
4387 local void new_opts(void) {
4388 single_compress(1);
4389 #ifndef NOTHREAD
4390 finish_jobs();
4391 #endif
4392 }
4393
4394 // Verify that arg is only digits, and if so, return the decimal value.
4395 local size_t num(char *arg) {
4396 char *str = arg;
4397 size_t val = 0;
4398
4399 if (*str == 0)
4400 throw(EINVAL, "internal error: empty parameter");
4401 do {
4402 if (*str < '0' || *str > '9' ||
4403 (val && ((~(size_t)0) - (size_t)(*str - '0')) / val < 10))
4404 throw(EINVAL, "invalid numeric parameter: %s", arg);
4405 val = val * 10 + (size_t)(*str - '0');
4406 } while (*++str);
4407 return val;
4408 }
4409
4410 // Process an argument, return true if it is an option (not a filename)
4411 local int option(char *arg) {
4412 static int get = 0; // if not zero, look for option parameter
4413 char bad[3] = "-X"; // for error messages (X is replaced)
4414
4415 // if no argument or dash option, check status of get
4416 if (get && (arg == NULL || *arg == '-')) {
4417 bad[1] = "bpSIJAC"[get - 1];
4418 throw(EINVAL, "missing parameter after %s", bad);
4419 }
4420 if (arg == NULL)
4421 return 1;
4422
4423 // process long option or short options
4424 if (*arg == '-') {
4425 // a single dash will be interpreted as stdin
4426 if (*++arg == 0)
4427 return 0;
4428
4429 // process long option (fall through with equivalent short option)
4430 if (*arg == '-') {
4431 int j;
4432
4433 arg++;
4434 for (j = NLOPTS - 1; j >= 0; j--)
4435 if (strcmp(arg, longopts[j][0]) == 0) {
4436 arg = longopts[j][1];
4437 break;
4438 }
4439 if (j < 0)
4440 throw(EINVAL, "invalid option: %s", arg - 2);
4441 }
4442
4443 // process short options (more than one allowed after dash)
4444 do {
4445 // if looking for a parameter, don't process more single character
4446 // options until we have the parameter
4447 if (get) {
4448 if (get == 3)
4449 throw(EINVAL,
4450 "invalid usage: -S must be followed by space");
4451 if (get == 7)
4452 throw(EINVAL,
4453 "invalid usage: -C must be followed by space");
4454 break; // allow -*nnn to fall to parameter code
4455 }
4456
4457 // process next single character option or compression level
4458 bad[1] = *arg;
4459 switch (*arg) {
4460 case '0': case '1': case '2': case '3': case '4':
4461 case '5': case '6': case '7': case '8': case '9':
4462 g.level = *arg - '0';
4463 while (arg[1] >= '0' && arg[1] <= '9') {
4464 if (g.level && (INT_MAX - (arg[1] - '0')) / g.level < 10)
4465 throw(EINVAL, "only levels 0..9 and 11 are allowed");
4466 g.level = g.level * 10 + *++arg - '0';
4467 }
4468 if (g.level == 10 || g.level > 11)
4469 throw(EINVAL, "only levels 0..9 and 11 are allowed");
4470 break;
4471 case 'A': get = 6; break;
4472 case 'C': get = 7; break;
4473 #ifndef NOZOPFLI
4474 case 'F': g.zopts.blocksplittinglast = 1; break;
4475 #endif
4476 case 'H': g.strategy = Z_HUFFMAN_ONLY; break;
4477 #ifndef NOZOPFLI
4478 case 'I': get = 4; break;
4479 case 'J': get = 5; break;
4480 #endif
4481 case 'K': g.form = 2; g.sufx = ".zip"; break;
4482 case 'L':
4483 puts(VERSION);
4484 puts("Copyright (C) 2007-2023 Mark Adler");
4485 puts("Subject to the terms of the zlib license.");
4486 puts("No warranty is provided or implied.");
4487 exit(0);
4488 break; // avoid warning
4489 case 'M': g.headis |= 0xa; break;
4490 case 'N': g.headis = 0xf; break;
4491 #ifndef NOZOPFLI
4492 case 'O': g.zopts.blocksplitting = 0; break;
4493 #endif
4494 case 'R': g.rsync = 1; break;
4495 case 'S': get = 3; break;
4496 // -T defined below as an alternative for -m
4497 case 'V':
4498 puts(VERSION);
4499 if (g.verbosity > 1)
4500 printf("zlib %s\n", zlibVersion());
4501 exit(0);
4502 break; // avoid warning
4503 case 'Y': g.sync = 1; break;
4504 case 'Z':
4505 throw(EINVAL, "invalid option: LZW output not supported: %s",
4506 bad);
4507 break; // avoid warning
4508 case 'a':
4509 throw(EINVAL, "invalid option: no ascii conversion: %s",
4510 bad);
4511 break; // avoid warning
4512 case 'b': get = 1; break;
4513 case 'c': g.pipeout = 1; break;
4514 case 'd': if (!g.decode) g.headis >>= 2; g.decode = 1; break;
4515 case 'f': g.force = 1; break;
4516 case 'h': help(); break;
4517 case 'i': g.setdict = 0; break;
4518 case 'k': g.keep = 1; break;
4519 case 'l': g.list = 1; break;
4520 case 'n': g.headis = 0; break;
4521 case 'T':
4522 case 'm': g.headis &= ~0xa; break;
4523 case 'p': get = 2; break;
4524 case 'q': g.verbosity = 0; break;
4525 case 'r': g.recurse = 1; break;
4526 case 't': g.decode = 2; break;
4527 case 'U': g.strategy = Z_RLE; break;
4528 case 'v': g.verbosity++; break;
4529 case 'z': g.form = 1; g.sufx = ".zz"; break;
4530 default:
4531 throw(EINVAL, "invalid option: %s", bad);
4532 }
4533 } while (*++arg);
4534 if (*arg == 0)
4535 return 1;
4536 }
4537
4538 // process option parameter for -b, -p, -A, -S, -I, or -J
4539 if (get) {
4540 size_t n;
4541
4542 if (get == 1) {
4543 n = num(arg);
4544 g.block = n << 10; // chunk size
4545 #ifndef NOTHREAD
4546 g.shift = x2nmodp(g.block, 3);
4547 #endif
4548 if (g.block < DICT)
4549 throw(EINVAL, "block size too small (must be >= 32K)");
4550 if (n != g.block >> 10 ||
4551 OUTPOOL(g.block) < g.block ||
4552 (ssize_t)OUTPOOL(g.block) < 0 ||
4553 g.block > (1UL << 29)) // limited by append_len()
4554 throw(EINVAL, "block size too large: %s", arg);
4555 }
4556 else if (get == 2) {
4557 n = num(arg);
4558 g.procs = (int)n; // # processes
4559 if (g.procs < 1)
4560 throw(EINVAL, "invalid number of processes: %s", arg);
4561 if ((size_t)g.procs != n || INBUFS(g.procs) < 1)
4562 throw(EINVAL, "too many processes: %s", arg);
4563 #ifdef NOTHREAD
4564 if (g.procs > 1)
4565 throw(EINVAL, "compiled without threads");
4566 #endif
4567 }
4568 else if (get == 3) {
4569 if (*arg == 0)
4570 throw(EINVAL, "suffix cannot be empty");
4571 g.sufx = arg; // gz suffix
4572 }
4573 #ifndef NOZOPFLI
4574 else if (get == 4)
4575 g.zopts.numiterations = (int)num(arg); // optimize iterations
4576 else if (get == 5)
4577 g.zopts.blocksplittingmax = (int)num(arg); // max block splits
4578 else if (get == 6)
4579 g.alias = arg; // zip name for stdin
4580 #endif
4581 else if (get == 7)
4582 g.comment = arg; // header comment
4583 get = 0;
4584 return 1;
4585 }
4586
4587 // neither an option nor parameter
4588 return 0;
4589 }
4590
4591 #ifndef NOTHREAD
4592 // handle error received from yarn function
4593 local void cut_yarn(int err) {
4594 throw(err, "internal threads error");
4595 }
4596 #endif
4597
4598 // Process command line arguments.
4599 int main(int argc, char **argv) {
4600 int n; // general index
4601 int nop; // index before which "-" means stdin
4602 int done; // number of named files processed
4603 size_t k; // program name length
4604 char *opts, *p; // environment default options, marker
4605 ball_t err; // error information from throw()
4606
4607 g.ret = 0;
4608 try {
4609 // initialize globals
4610 g.inf = NULL;
4611 g.inz = 0;
4612 #ifndef NOTHREAD
4613 g.in_which = -1;
4614 #endif
4615 g.alias = "-";
4616 g.outf = NULL;
4617 g.first = 1;
4618 g.hname = NULL;
4619 g.hcomm = NULL;
4620
4621 // save pointer to program name for error messages
4622 p = strrchr(argv[0], '/');
4623 p = p == NULL ? argv[0] : p + 1;
4624 g.prog = *p ? p : "pigz";
4625
4626 // prepare for interrupts and logging
4627 signal(SIGINT, cut_short);
4628 #ifndef NOTHREAD
4629 yarn_prefix = g.prog; // prefix for yarn error messages
4630 yarn_abort = cut_yarn; // call on thread error
4631 #endif
4632 #ifdef PIGZ_DEBUG
4633 gettimeofday(&start, NULL); // starting time for log entries
4634 log_init(); // initialize logging
4635 #endif
4636
4637 // set all options to defaults
4638 defaults();
4639
4640 // check zlib version
4641 if (zlib_vernum() < 0x1230)
4642 throw(EINVAL, "zlib version less than 1.2.3");
4643
4644 // create CRC table, in case zlib compiled with dynamic tables
4645 get_crc_table();
4646
4647 // process user environment variable defaults in GZIP
4648 opts = getenv("GZIP");
4649 if (opts != NULL) {
4650 while (*opts) {
4651 while (*opts == ' ' || *opts == '\t')
4652 opts++;
4653 p = opts;
4654 while (*p && *p != ' ' && *p != '\t')
4655 p++;
4656 n = *p;
4657 *p = 0;
4658 if (!option(opts))
4659 throw(EINVAL, "cannot provide files in "
4660 "GZIP environment variable");
4661 opts = p + (n ? 1 : 0);
4662 }
4663 option(NULL); // check for missing parameter
4664 }
4665
4666 // process user environment variable defaults in PIGZ as well
4667 opts = getenv("PIGZ");
4668 if (opts != NULL) {
4669 while (*opts) {
4670 while (*opts == ' ' || *opts == '\t')
4671 opts++;
4672 p = opts;
4673 while (*p && *p != ' ' && *p != '\t')
4674 p++;
4675 n = *p;
4676 *p = 0;
4677 if (!option(opts))
4678 throw(EINVAL, "cannot provide files in "
4679 "PIGZ environment variable");
4680 opts = p + (n ? 1 : 0);
4681 }
4682 option(NULL); // check for missing parameter
4683 }
4684
4685 // decompress if named "unpigz" or "gunzip", to stdout if "*cat"
4686 if (strcmp(g.prog, "unpigz") == 0 || strcmp(g.prog, "gunzip") == 0) {
4687 if (!g.decode)
4688 g.headis >>= 2;
4689 g.decode = 1;
4690 }
4691 if ((k = strlen(g.prog)) > 2 && strcmp(g.prog + k - 3, "cat") == 0) {
4692 if (!g.decode)
4693 g.headis >>= 2;
4694 g.decode = 1;
4695 g.pipeout = 1;
4696 }
4697
4698 // if no arguments and compressed data to/from terminal, show help
4699 if (argc < 2 && isatty(g.decode ? 0 : 1))
4700 help();
4701
4702 // process all command-line options first
4703 nop = argc;
4704 for (n = 1; n < argc; n++)
4705 if (strcmp(argv[n], "--") == 0) {
4706 nop = n; // after this, "-" is the name "-"
4707 argv[n] = NULL; // remove option
4708 break; // ignore options after "--"
4709 }
4710 else if (option(argv[n])) // process argument
4711 argv[n] = NULL; // remove if option
4712 option(NULL); // check for missing parameter
4713
4714 // process command-line filenames
4715 done = 0;
4716 for (n = 1; n < argc; n++)
4717 if (argv[n] != NULL) {
4718 if (done == 1 && g.pipeout && !g.decode && !g.list &&
4719 g.form > 1)
4720 complain("warning: output will be concatenated zip files"
4721 " -- %s will not be able to extract", g.prog);
4722 process(n < nop && strcmp(argv[n], "-") == 0 ? NULL : argv[n]);
4723 done++;
4724 }
4725
4726 // list stdin or compress stdin to stdout if no file names provided
4727 if (done == 0)
4728 process(NULL);
4729 }
4730 always {
4731 // release resources
4732 RELEASE(g.inf);
4733 g.inz = 0;
4734 new_opts();
4735 }
4736 catch (err) {
4737 THREADABORT(err);
4738 }
4739
4740 // show log (if any)
4741 log_dump();
4742 return g.ret;
4743 }