w32tex
About: TeX Live provides a comprehensive TeX system including all the major TeX-related programs, macro packages, and fonts that are free software. Windows sources.
  Fossies Dox: w32tex-src.tar.xz  ("unofficial" and yet experimental doxygen-generated source code documentation)  

utillzw.c
Go to the documentation of this file.
1 /* lzw implementation for postscript/pdf filters
2 # Notes on LZW
3 
4 # Encoder
5 
6 Initially the table contains 256 entires for single bytes. Encoder consumes
7 input bytes trying to find the longest sequence stored so far in the table.
8 Once it finds a sequence that is not present in the table, it outputs the table
9 index of the longest sequence found (accumulated bytes except the last
10 consumed) and pushes the new sequence (accumulated bytes including the last
11 one) on the top of the table. The last taken byte is not yet written to the
12 output, it becomes the beginning of the new sequence to accumulate. Initially,
13 encoder outputs 9-bit codes. While the table grows, the number of bits for each
14 code increases up to 12. In example, after adding a table entry of index 511 it
15 is high time to switch to 10-bit bytes. /EarlyChange=true parameter in stream
16 dictionary (both postscript and pdf) informs to increase the number of bits one
17 code earlier then necessary. Looks pretty much like an early days bug that
18 became a specification :) I have never found a PDF having /EarlyChange key
19 specified anyway.
20 
21 Once the table becomes full (or when encoder decides it is worthy),
22 a clear-table marker (code 256) purges the table and restores codes length to
23 9. End-of-data marker (code 257) ends the stream. Conventionally, the beginning
24 of the stream starts with clear-table marker.
25 
26 Postscript allows to provide a /UnitLength which determines the bit length of
27 codes. The above description assumes UnitLength=8 (default). Allowed values are
28 from 3 to 8. Different UnitLength also affects markers; clear-table is then
29 2^UnitLength and end-of-data marker is 2^UnitLenth+1.
30 
31 Encoder outputs 9-12bit codes that are packed into bytes using high-bits-first
32 scheme (default) or low-bits-scheme.
33 
34 PDF spec p. 73 (PS spec p. 135 gives an mistaken output sequence and so
35 mistaken output bytes)
36 
37 Input character sequence (decimal)
38 45 45 45 45 45 65 45 45 45 66
39 
40 Output 9bit codes (decimal)
41 256 45 258 258 65 259 66 257
42 
43 Output 9bit codes (binary)
44 100000000 000101101 100000010 100000010 001000001 100000011 001000010 100000001
45 
46 Output bytes (LowBitsFirst=false); eight high-order bits of code becomes
47 the first byte, remaining low-order bit of code becomes the high-order bit of the
48 next byte;
49 10000000 00001011 01100000 01010000 00100010 00001100 00001100 10000101 00000001
50 -> 80 0B 60 50 22 0C 0C 85 01
51 
52 Output bytes (binary, LowBitsFirst=true); eight low-order bits of code becomes
53 the first byte, remaining high-order bit of code becomes low-order bit of the
54 next byte;
55 00000000 01011011 00001000 00010100 00011000 01100100 10100000 10000000 10010000
56 -> 00 5B 08 14 18 64 A0 80 90
57 
58 # Decoder
59 
60 Decoder consumes input bytes transforming them to 9 to 12 bit codes. Initially
61 it starts with 9bit codes and the table of 258 fixed codes (same as encoder).
62 Basically, it interprets incoming codes as table indices (except 256 and 257
63 markers) and it outputs byte sequences stored at given indices. It also
64 upbuilds the table and changes the number of bits of codes when necessary. The
65 key point on lzw is that both encoder and decoder builds the table
66 synchronously.
67 
68 However, decoder needs some "knowledge" about how encoder works to be able to
69 interpret a table index that it doesn't have so far. Look that the output from
70 encoder in the example above. The first output code is conventional clear-table
71 (256). Then comes a code 45. So far so good, decoder interprets code 45 as
72 a (fixed) entry of the table, emitting byte 45. The next code is 258, which is
73 should be interpreted as an index in the table. Oops, encoder doesn't have one
74 yet. If that occurs, it means that encoder was able to output the new entry
75 code just after adding it to a table. It means that
76 
77  sequence_before + next_byte == next_byte + sequence_after
78 
79 This may happen not only for sequences like 45 45 45, but also symmetric series
80 such as abcbabcba; abcb + a == a + bcba. Decoder must be aware of that and if
81 it gets a code one larger than the top table index, it should create one on-fly
82 by appending last entry sequence by the first by of the last entry.
83 
84 # UnitLength
85 
86 Postscript specification mentions about UnitLength parameter that can be used
87 in LZW decoder (not allowed in encoder), with possible values from 3 to 8. This
88 parameter determines the number of bits per code; form UnitLength + 1 to 12. It
89 also determines which codes are used for clear-table marker (2^UnitLength) and
90 end-of-data marker ((2^UnitLength)+1). Postscript specification says (page 134):
91 
92 "Initially, the code length is (UnitLength + 1) bits and the table contains only
93 entries for the (2^UnitLength + 2) fixed codes. As encoding proceeds, entries are
94 appended to the table, associating new codes with longer and longer input character
95 sequences. The encoding and decoding filters maintain identical copies of
96 this table."
97 
98 Later on page 136 Postscript specification says:
99 
100 "Data that has been LZW-encoded with a UnitLength less than 8 consists only of
101 codes in the range 0 to 2^UnitLength - 1; consequently, the LZWDecode filter produces
102 only codes in that range when read. UnitLength also affects the encoded
103 representation, as described above."
104 
105 UnitLength (Postscript only) and LowBitsFirst are used only by decoder.
106 EarlyChange should obviously be respected by both encoder and decoder. When
107 table index reaches current bit length boundary (511, 1023, ...) it must react
108 by increasing the number of bits of input code. But if the index reaches it
109 maximum value (when the table is full), decoder is NOT supposed to clear the
110 table. When the table is full, encoder must emit clear-table marker and it
111 emits this code using 12 bits and reinitialize code bits after that. It means
112 that, when the table is full, decoder should get one more 12-bit code (which
113 should be clear-table marker) and actually clear the table and reinitialize
114 code bits after that.
115 
116 # Clear-table vs last entry track (after tries and checks)
117 
118 It is also not quite clear what should actually happen when encoder gets a full
119 table and it is supposed to emit clear-table marker. When it gets full, it
120 means that it has just appended another entry to the table. And that happens
121 only the input sequence collected so far plus the last byte is not present in
122 the table. Encoder is supposed to output the table index of the present
123 sequence and set the recent byte as a starting index of the new sequence to be
124 collected. Even if it is time to clear the table, encoder is still supposed to
125 keep the track of the last table entry. Decoder, however, must drop the track of the
126 last code on clear-table.
127 
128 # Decoder table vs encoder table
129 
130 While decoding we need query lzw table by (subsequent) numeric codes and output
131 character sequences stored in the table. While encoding we need to query the
132 table on every input byte and fetch indices pointing to character sequences.
133 Note that we never need to query the entire table for the longest sequence
134 found so far. The encoder table do not need to access the longest character
135 sequence at one piece. It is enough to keep the track of the current table
136 index and the very next byte. We organize an encoder table into a search tree,
137 where every node contains its table index (value) and last byte (key). Except
138 initial tree content, every node is created on the base of the previous node
139 and it conceptually point the sequence represented by that nodo consists of the
140 previous node sequence plus the next byte.
141 
142 Every new node is a descendant of the node it has been derived from. Every node
143 has a map (a search subtree) indexed by suffix byte value, pointing to
144 descendants nodes. Every node also has binary tentackles (left/right fields)
145 necessary to search the map (except initials, every node lives in a map of some
146 ancestor node). The key point is that on every input byte we don't search the
147 entire tree, but only the map of the current node children. The map tree is
148 a simple binary tree with no balancing mechanism (not worthy to optimize an
149 ephemeric structure that may be upbuilt more often then queried).
150 
151 In our implementation, decoder table requires 4069 entries (topmost index 4095).
152 Encoder table, however, needs 4097 entries to handle the case when EarlyIndex
153 parameter is 0 (I have never a chance to test that in practise). The node of index
154 4096 might be added to a search tree, but its code is never emitted; the lookup
155 is purged just after adding that node.
156 
157 todo:
158 - support for LowBitsFirst encoding
159 */
160 
161 #include "utilmem.h"
162 #include "utillzw.h"
163 
164 /* filter state struct */
165 
166 typedef struct lzw_entry {
167  union {
168  const char *rdata; // to be able to init with string literal
169  char *data;
170  };
171  int size;
173 
174 #define lzw_index short
175 
176 typedef struct lzw_node lzw_node;
177 
178 struct lzw_node {
180  unsigned char suffix;
184 };
185 
186 struct lzw_state {
187  union {
188  lzw_node *lookup; /* encoder table */
189  lzw_entry *table; /* decoder table */
190  };
191  lzw_index index; /* table index */
192  union {
193  lzw_node *lastnode; /* previous encoder table node */
194  struct {
195  lzw_entry *lastentry; /* previous decoder table entry */
196  int tailbytes; /* num of bytes of lastentry not yet written out */
197  };
198  };
199  int basebits; /* /UnitLength parameter (8) */
200  int codebits; /* current code bits */
201  int lastbyte; /* previosly read byte */
202  int tailbits; /* lastbyte bits not yet consumed */
203  int flush; /* encoder */
204  int flags; /* options */
205 };
206 
207 typedef union { lzw_state *lzwstate; void *voidstate; } lzw_state_pointer; // to avoid 'dereferencing type-puned ...' warnings
208 
209 #define LZW_INIT_STATE { { 0 }, 0, { 0 }, 0, 0, 0, 0, 0, 0 }
210 
211 /* macros */
212 
213 #define LZW_MIN_BITS 3
214 #define LZW_MAX_BITS 12
215 #define LZW_TABLE_SIZE (1 << LZW_MAX_BITS)
216 #define LZW_LOOKUP_SIZE (LZW_TABLE_SIZE + 1)
217 
218 #define lzw_bit_range(bits) (bits >= LZW_MIN_BITS && bits <= LZW_BASE_BITS)
219 #define lzw_base_bits(flags) (flags & ((1 << 4) - 1)) // 4 low bits of flags is basebits (UnitLength)
220 
221 #define lzw_initial_codes(state) (1 << state->basebits)
222 #define lzw_clear_code(state) lzw_initial_codes(state)
223 #define lzw_eod_code(state) (lzw_initial_codes(state) + 1)
224 #define lzw_initial_index(state) (lzw_initial_codes(state) + 2)
225 
226 #define lzw_max_index(state) ((1 << state->codebits) - ((state->flags & LZW_EARLY_INDEX) ? 1 : 0))
227 #define lzw_check_bits(state) ((void)(state->index == lzw_max_index(state) && state->codebits < LZW_MAX_BITS && ++state->codebits))
228 
229 #define lzw_malloc util_malloc
230 #define lzw_free util_free
231 
232 /* decoder */
233 
234 static struct lzw_entry lzw_initial_table[] = {
235  {{"\x00"}, 1}, {{"\x01"}, 1}, {{"\x02"}, 1}, {{"\x03"}, 1}, {{"\x04"}, 1}, {{"\x05"}, 1}, {{"\x06"}, 1}, {{"\x07"}, 1}, {{"\x08"}, 1}, {{"\x09"}, 1}, {{"\x0A"}, 1}, {{"\x0B"}, 1}, {{"\x0C"}, 1}, {{"\x0D"}, 1}, {{"\x0E"}, 1}, {{"\x0F"}, 1},
236  {{"\x10"}, 1}, {{"\x11"}, 1}, {{"\x12"}, 1}, {{"\x13"}, 1}, {{"\x14"}, 1}, {{"\x15"}, 1}, {{"\x16"}, 1}, {{"\x17"}, 1}, {{"\x18"}, 1}, {{"\x19"}, 1}, {{"\x1A"}, 1}, {{"\x1B"}, 1}, {{"\x1C"}, 1}, {{"\x1D"}, 1}, {{"\x1E"}, 1}, {{"\x1F"}, 1},
237  {{"\x20"}, 1}, {{"\x21"}, 1}, {{"\x22"}, 1}, {{"\x23"}, 1}, {{"\x24"}, 1}, {{"\x25"}, 1}, {{"\x26"}, 1}, {{"\x27"}, 1}, {{"\x28"}, 1}, {{"\x29"}, 1}, {{"\x2A"}, 1}, {{"\x2B"}, 1}, {{"\x2C"}, 1}, {{"\x2D"}, 1}, {{"\x2E"}, 1}, {{"\x2F"}, 1},
238  {{"\x30"}, 1}, {{"\x31"}, 1}, {{"\x32"}, 1}, {{"\x33"}, 1}, {{"\x34"}, 1}, {{"\x35"}, 1}, {{"\x36"}, 1}, {{"\x37"}, 1}, {{"\x38"}, 1}, {{"\x39"}, 1}, {{"\x3A"}, 1}, {{"\x3B"}, 1}, {{"\x3C"}, 1}, {{"\x3D"}, 1}, {{"\x3E"}, 1}, {{"\x3F"}, 1},
239  {{"\x40"}, 1}, {{"\x41"}, 1}, {{"\x42"}, 1}, {{"\x43"}, 1}, {{"\x44"}, 1}, {{"\x45"}, 1}, {{"\x46"}, 1}, {{"\x47"}, 1}, {{"\x48"}, 1}, {{"\x49"}, 1}, {{"\x4A"}, 1}, {{"\x4B"}, 1}, {{"\x4C"}, 1}, {{"\x4D"}, 1}, {{"\x4E"}, 1}, {{"\x4F"}, 1},
240  {{"\x50"}, 1}, {{"\x51"}, 1}, {{"\x52"}, 1}, {{"\x53"}, 1}, {{"\x54"}, 1}, {{"\x55"}, 1}, {{"\x56"}, 1}, {{"\x57"}, 1}, {{"\x58"}, 1}, {{"\x59"}, 1}, {{"\x5A"}, 1}, {{"\x5B"}, 1}, {{"\x5C"}, 1}, {{"\x5D"}, 1}, {{"\x5E"}, 1}, {{"\x5F"}, 1},
241  {{"\x60"}, 1}, {{"\x61"}, 1}, {{"\x62"}, 1}, {{"\x63"}, 1}, {{"\x64"}, 1}, {{"\x65"}, 1}, {{"\x66"}, 1}, {{"\x67"}, 1}, {{"\x68"}, 1}, {{"\x69"}, 1}, {{"\x6A"}, 1}, {{"\x6B"}, 1}, {{"\x6C"}, 1}, {{"\x6D"}, 1}, {{"\x6E"}, 1}, {{"\x6F"}, 1},
242  {{"\x70"}, 1}, {{"\x71"}, 1}, {{"\x72"}, 1}, {{"\x73"}, 1}, {{"\x74"}, 1}, {{"\x75"}, 1}, {{"\x76"}, 1}, {{"\x77"}, 1}, {{"\x78"}, 1}, {{"\x79"}, 1}, {{"\x7A"}, 1}, {{"\x7B"}, 1}, {{"\x7C"}, 1}, {{"\x7D"}, 1}, {{"\x7E"}, 1}, {{"\x7F"}, 1},
243  {{"\x80"}, 1}, {{"\x81"}, 1}, {{"\x82"}, 1}, {{"\x83"}, 1}, {{"\x84"}, 1}, {{"\x85"}, 1}, {{"\x86"}, 1}, {{"\x87"}, 1}, {{"\x88"}, 1}, {{"\x89"}, 1}, {{"\x8A"}, 1}, {{"\x8B"}, 1}, {{"\x8C"}, 1}, {{"\x8D"}, 1}, {{"\x8E"}, 1}, {{"\x8F"}, 1},
244  {{"\x90"}, 1}, {{"\x91"}, 1}, {{"\x92"}, 1}, {{"\x93"}, 1}, {{"\x94"}, 1}, {{"\x95"}, 1}, {{"\x96"}, 1}, {{"\x97"}, 1}, {{"\x98"}, 1}, {{"\x99"}, 1}, {{"\x9A"}, 1}, {{"\x9B"}, 1}, {{"\x9C"}, 1}, {{"\x9D"}, 1}, {{"\x9E"}, 1}, {{"\x9F"}, 1},
245  {{"\xA0"}, 1}, {{"\xA1"}, 1}, {{"\xA2"}, 1}, {{"\xA3"}, 1}, {{"\xA4"}, 1}, {{"\xA5"}, 1}, {{"\xA6"}, 1}, {{"\xA7"}, 1}, {{"\xA8"}, 1}, {{"\xA9"}, 1}, {{"\xAA"}, 1}, {{"\xAB"}, 1}, {{"\xAC"}, 1}, {{"\xAD"}, 1}, {{"\xAE"}, 1}, {{"\xAF"}, 1},
246  {{"\xB0"}, 1}, {{"\xB1"}, 1}, {{"\xB2"}, 1}, {{"\xB3"}, 1}, {{"\xB4"}, 1}, {{"\xB5"}, 1}, {{"\xB6"}, 1}, {{"\xB7"}, 1}, {{"\xB8"}, 1}, {{"\xB9"}, 1}, {{"\xBA"}, 1}, {{"\xBB"}, 1}, {{"\xBC"}, 1}, {{"\xBD"}, 1}, {{"\xBE"}, 1}, {{"\xBF"}, 1},
247  {{"\xC0"}, 1}, {{"\xC1"}, 1}, {{"\xC2"}, 1}, {{"\xC3"}, 1}, {{"\xC4"}, 1}, {{"\xC5"}, 1}, {{"\xC6"}, 1}, {{"\xC7"}, 1}, {{"\xC8"}, 1}, {{"\xC9"}, 1}, {{"\xCA"}, 1}, {{"\xCB"}, 1}, {{"\xCC"}, 1}, {{"\xCD"}, 1}, {{"\xCE"}, 1}, {{"\xCF"}, 1},
248  {{"\xD0"}, 1}, {{"\xD1"}, 1}, {{"\xD2"}, 1}, {{"\xD3"}, 1}, {{"\xD4"}, 1}, {{"\xD5"}, 1}, {{"\xD6"}, 1}, {{"\xD7"}, 1}, {{"\xD8"}, 1}, {{"\xD9"}, 1}, {{"\xDA"}, 1}, {{"\xDB"}, 1}, {{"\xDC"}, 1}, {{"\xDD"}, 1}, {{"\xDE"}, 1}, {{"\xDF"}, 1},
249  {{"\xE0"}, 1}, {{"\xE1"}, 1}, {{"\xE2"}, 1}, {{"\xE3"}, 1}, {{"\xE4"}, 1}, {{"\xE5"}, 1}, {{"\xE6"}, 1}, {{"\xE7"}, 1}, {{"\xE8"}, 1}, {{"\xE9"}, 1}, {{"\xEA"}, 1}, {{"\xEB"}, 1}, {{"\xEC"}, 1}, {{"\xED"}, 1}, {{"\xEE"}, 1}, {{"\xEF"}, 1},
250  {{"\xF0"}, 1}, {{"\xF1"}, 1}, {{"\xF2"}, 1}, {{"\xF3"}, 1}, {{"\xF4"}, 1}, {{"\xF5"}, 1}, {{"\xF6"}, 1}, {{"\xF7"}, 1}, {{"\xF8"}, 1}, {{"\xF9"}, 1}, {{"\xFA"}, 1}, {{"\xFB"}, 1}, {{"\xFC"}, 1}, {{"\xFD"}, 1}, {{"\xFE"}, 1}, {{"\xFF"}, 1}
251 };
252 
253 #define lzw_entry_at(state, index) (&state->table[index])
254 
256 {
257  state->basebits = lzw_base_bits(flags); // first four bits or flags
258  if (!lzw_bit_range(state->basebits))
259  return NULL;
260  state->flags = flags;
261  if ((state->table = table) == NULL)
262  {
263  state->table = (lzw_entry *)lzw_malloc(LZW_TABLE_SIZE * sizeof(lzw_entry));
264  state->flags |= LZW_TABLE_ALLOC;
265  }
266  memcpy(state->table, lzw_initial_table, (size_t)lzw_initial_codes(state)*sizeof(lzw_entry));
267  // memset(&state->table[lzw_initial_codes(state)], 0, 2*sizeof(lzw_entry)); // eod and clear entries never accessed
268  state->codebits = state->basebits + 1;
269  state->index = lzw_initial_index(state);
270  state->lastentry = NULL;
271  state->tailbytes = 0;
272  state->lastbyte = 0;
273  state->tailbits = 0;
274  return state;
275 }
276 
278 {
280 }
281 
283 {
284  lzw_entry *entry;
285  lzw_index initindex = lzw_initial_index(state);
286  while (state->index > initindex)
287  {
288  entry = lzw_entry_at(state, --state->index);
289  lzw_free(entry->data);
290  // entry->data = NULL;
291  // entry->size = 0;
292  }
293  state->lastentry = NULL;
294  state->tailbytes = 0;
295  state->codebits = state->basebits + 1;
296 }
297 
299 {
301  if (state->flags & LZW_TABLE_ALLOC)
302  lzw_free(state->table);
303 }
304 
305 static int lzw_next_entry (lzw_state *state, lzw_entry *nextentry)
306 {
307  lzw_entry *lastentry, *newentry;
308  if ((lastentry = state->lastentry) == NULL)
309  return 1; /* its ok */
310  if (state->index == LZW_TABLE_SIZE)
311  return 0; /* invalid input; eod marker expected earlier */
312  /* put the new entry on the top of the table */
313  newentry = lzw_entry_at(state, state->index++);
314  /* its size is the last entrtyy size plus 1 */
315  newentry->size = lastentry->size + 1;
316  /* its content is the content of the last entry, */
317  newentry->data = (char *)lzw_malloc((size_t)newentry->size);
318  memcpy(newentry->data, lastentry->data, lastentry->size);
319  /* plus the first byte of the new entry (usually fixed code entry) */
320  newentry->data[newentry->size - 1] = nextentry->data[0];
321  return 1;
322 }
323 
324 #define lzw_write_bytes(O, state) ((state->tailbytes -= (int)iof_write(O, state->lastentry->data, (size_t)state->tailbytes)) == 0)
325 
327 {
329  lzw_index code;
330  lzw_entry *entry;
331  if (state->lastentry != NULL)
332  { /* write out the tail from the last call */
333  if (state->tailbytes > 0 && !lzw_write_bytes(O, state))
334  return IOFFULL;
335  /* do what we normally do at the end of the loop body below */
337  }
338  // if (state->flags & LZW_LOW_BITS_FIRST)
339  // return IOFERR;
340  while (1)
341  {
342  /* get input code of length state->codebits */
343  code = (state->lastbyte & ((1 << state->tailbits) - 1)) << (state->codebits - state->tailbits);
344  for (state->tailbits -= state->codebits; state->tailbits < 0; )
345  {
346  get_code:
347  if ((state->lastbyte = iof_get(I)) < 0)
348  return state->flush ? IOFEOF : state->lastbyte;
349  state->tailbits += 8;
350  if (state->tailbits < 0)
351  {
352  code |= (state->lastbyte << (-state->tailbits));
353  goto get_code;
354  }
355  else
356  {
357  code |= (state->lastbyte >> state->tailbits);
358  break;
359  }
360  }
361  /* interpret the code */
363  { /* single byte code or special marker */
364  if (code == clear)
365  {
367  continue;
368  }
369  if (code == eod)
370  return IOFEOF;
372  if (!lzw_next_entry(state, entry))
373  return IOFERR;
374  }
375  else if (code == state->index)
376  { /* apparently encoder has emitted the code of the key just created (see notes) */
377  if (!lzw_next_entry(state, state->lastentry))
378  return IOFERR;
379  entry = lzw_entry_at(state, state->index - 1);
380  }
381  else
382  { /* invalid input code */
383  return IOFERR;
384  }
385  /* record the entry found */
386  state->lastentry = entry;
387  /* emit the sequence pointed by that entry */
388  state->tailbytes = entry->size;
389  if (!lzw_write_bytes(O, state))
390  return IOFFULL;
391  /* check and update code bits */
393  }
394  return state->lastbyte; // never reached
395 }
396 
397 /* encoder */
398 
399 #define lzw_node_at(state, index) (&state->lookup[index])
400 
401 #define lzw_node_init(node, i, c) (node->index = i, node->suffix = c, node->left = NULL, node->right = NULL, node->map = NULL)
402 
404 {
406  lzw_node *node;
407  state->basebits = lzw_base_bits(flags); // first four bits of flags is base bits of code (default 8)
408  if (!lzw_bit_range(state->basebits))
409  return NULL;
410  state->flags = flags;
411  if ((state->lookup = lookup) == NULL)
412  {
413  state->lookup = lzw_malloc(LZW_LOOKUP_SIZE*sizeof(lzw_node));
414  state->flags |= LZW_TABLE_ALLOC;
415  }
416  state->index = lzw_initial_index(state);
417  for (index = 0; index < lzw_initial_codes(state); ++index)
418  {
420  lzw_node_init(node, index, (unsigned char)index);
421  }
422  state->codebits = state->basebits + 1;
423  state->lastnode = NULL;
424  state->lastbyte = 0;
425  state->tailbits = 0;
426  return state;
427 }
428 
430 {
432 }
433 
435 {
436  if (state->flags & LZW_TABLE_ALLOC)
437  lzw_free(state->lookup);
438 }
439 
441 {
442  lzw_node *node;
444  /* clear fixed nodes */
445  for (index = 0; index < lzw_initial_codes(state); ++index)
446  {
448  lzw_node_init(node, index, (unsigned char)index);
449  }
450  /* reset table index */
451  state->index = lzw_initial_index(state);
452  /* reset code bits */
453  state->codebits = state->basebits + 1;
454 }
455 
456 static void lzw_put_code (iof *O, lzw_state *state, lzw_index code, int todobits)
457 {
458  int leftbits, rightbits;
459  do
460  {
461  leftbits = 8 - state->tailbits;
462  rightbits = todobits - leftbits;
463  if (rightbits >= 0)
464  {
465  state->lastbyte |= (code >> rightbits);
466  iof_put(O, state->lastbyte);
467  code = code & ((1 << rightbits) - 1);
468  todobits -= leftbits;
469  state->lastbyte = 0;
470  state->tailbits = 0;
471  }
472  else
473  {
474  state->lastbyte |= (code << (-rightbits));
475  state->tailbits += todobits;
476  return;
477  }
478  } while (1);
479 }
480 
482 {
483  if (state->flush)
484  {
485  /* put the last code if any */
486  if (state->lastnode != NULL)
487  lzw_put_code(O, state, state->lastnode->index, state->codebits);
488  /* put eod marker, */
489  lzw_put_code(O, state, lzw_eod_code(state), state->codebits);
490  /* with tail bits set to 0 */
491  if (state->tailbits > 0)
492  lzw_put_code(O, state, 0, 8 - state->tailbits);
493  return IOFEOF;
494  }
495  return IOFEMPTY;
496 }
497 
498 static lzw_node * lzw_node_push (lzw_state *state, unsigned char suffix)
499 {
500  lzw_node *node;
501  node = lzw_node_at(state, state->index);
502  lzw_node_init(node, state->index, suffix);
503  ++state->index;
504  return node;
505 }
506 
507 static int lzw_next_node (lzw_state *state, unsigned char suffix)
508 {
509  lzw_node *node;
510  if ((node = state->lastnode->map) == NULL)
511  {
512  state->lastnode->map = lzw_node_push(state, suffix);
513  return 0;
514  }
515  while (1)
516  {
517  if (suffix < node->suffix)
518  {
519  if (node->left == NULL)
520  {
521  node->left = lzw_node_push(state, suffix);
522  return 0;
523  }
524  node = node->left;
525  }
526  else if (suffix > node->suffix)
527  {
528  if (node->right == NULL)
529  {
530  node->right = lzw_node_push(state, suffix);
531  return 0;
532  }
533  node = node->right;
534  }
535  else
536  {
537  state->lastnode = node;
538  return 1;
539  }
540  }
541  return 0; // never reached
542 }
543 
545 {
546  int byte;
547  if (state->lastnode == NULL)
548  { /* first call only; following convention, put clear-table marker */
549  if (!iof_ensure(O, 2))
550  return IOFFULL;
551  lzw_put_code(O, state, lzw_clear_code(state), state->codebits);
552  /* get the first input byte and initialize the current table entry */
553  if ((byte = iof_get(I)) < 0)
554  return lzw_encode_last(O, state);
555  state->lastnode = lzw_node_at(state, byte);
556  }
557  while (iof_ensure(O, 2))
558  { /* we need to write at most 2 bytes on each iteration */
559  if ((byte = iof_get(I)) < 0)
560  return lzw_encode_last(O, state);
561  if (lzw_next_node(state, (unsigned char)byte) == 0)
562  { /* means that the key hasn't been found and the new entry has just been created */
563  /* output the code pointing the longest sequence so far */
564  lzw_put_code(O, state, state->lastnode->index, state->codebits);
565  /* update code bits */
566  if (state->index == lzw_max_index(state) + 1)
567  {
568  if (state->codebits < LZW_MAX_BITS)
569  ++state->codebits;
570  else
571  {
572  /* put clear-table marker */
573  lzw_put_code(O, state, lzw_clear_code(state), state->codebits);
574  /* reset the table */
576  }
577  }
578  /* in any case, recent byte becomes the current table code */
579  state->lastnode = lzw_node_at(state, byte);
580  }
581  /* otherwise no new entry is appended and state->lastnode points the longer sequence just found */
582  }
583  return IOFFULL;
584 }
585 
586 /* single call codecs */
587 
589 {
592  int ret;
594  state.flush = 1;
595  ret = lzw_decode_state(I, O, &state);
596  // iof_flush(O); // ?
598  return ret;
599 }
600 
602 {
605  int ret;
607  state.flush = 1;
608  ret = lzw_encode_state(I, O, &state);
609  // iof_flush(O); // ?
611  return ret;
612 }
613 
614 /* filters */
615 
616 // lzw decoder function
617 
618 static size_t lzw_decoder (iof *F, iof_mode mode)
619 {
620  lzw_state *state;
622  size_t tail;
623 
625  switch(mode)
626  {
627  case IOFLOAD:
628  case IOFREAD:
629  if (F->flags & IOF_STOPPED)
630  return 0;
631  tail = iof_tail(F);
632  F->pos = F->buf + tail;
633  F->end = F->buf + F->space;
634  do {
635  status = lzw_decode_state(F->next, F, state);
636  } while (mode == IOFLOAD && status == IOFFULL && iof_resize_buffer(F));
637  return iof_decoder_retval(F, "lzw", status);
638  case IOFCLOSE:
640  iof_free(F);
641  return 0;
642  default:
643  break;
644  }
645  return 0;
646 }
647 
648 // lzw encoder function
649 
650 static size_t lzw_encoder (iof *F, iof_mode mode)
651 {
652  lzw_state *state;
654 
656  switch (mode)
657  {
658  case IOFFLUSH:
659  state->flush = 1;
660  FALLTHRU // fall through
661  case IOFWRITE:
662  F->end = F->pos;
663  F->pos = F->buf;
664  status = lzw_encode_state(F, F->next, state);
665  return iof_encoder_retval(F, "lzw", status);
666  case IOFCLOSE:
667  if (!state->flush)
670  iof_free(F);
671  return 0;
672  default:
673  break;
674  }
675  return 0;
676 }
677 
679 {
680  iof *I;
682  I = iof_filter_reader(lzw_decoder, sizeof(lzw_state), &P.voidstate);
683  iof_setup_next(I, N);
684  if (lzw_decoder_init(P.lzwstate, flags) == NULL)
685  {
686  iof_discard(I);
687  return NULL;
688  }
689  P.lzwstate->flush = 1;
690  return I;
691 }
692 
694 {
695  iof *O;
697  O = iof_filter_writer(lzw_encoder, sizeof(lzw_state), &P.voidstate);
698  iof_setup_next(O, N);
699  if (lzw_encoder_init(P.lzwstate, flags) == NULL)
700  {
701  iof_discard(O);
702  return NULL;
703  }
704  return O;
705 }
int code
Definition: aftopl.c:52
#define state
Definition: aptex-macros.h:996
#define tail
Definition: aptex-macros.h:514
#define mode
Definition: aptex-macros.h:510
mrb_ast_node node
Definition: codegen.c:30
paragraph P
#define memcpy(d, s, n)
Definition: gsftopk.c:64
static void clear()
#define byte
Definition: in_pcx.cpp:28
#define NULL
Definition: ftobjs.h:61
#define F(x, y, z)
Definition: md5.c:51
#define I(x, y, z)
Definition: md5.c:55
pdf_obj * entry
Definition: pdfdoc.c:64
static int ret
Definition: convert.c:72
#define O(_t, _a, _f)
Definition: makeint.h:501
const char * suffix
Definition: pkg_icu.cpp:27
#define index(s, c)
Definition: plain2.h:351
#define status
#define flags
size_t iof_encoder_retval(iof *O, const char *type, iof_status status)
Definition: utiliof.c:2343
size_t iof_decoder_retval(iof *I, const char *type, iof_status status)
Definition: utiliof.c:2322
void iof_discard(iof *F)
Definition: utiliof.c:2276
void iof_free(iof *F)
Definition: utiliof.c:2261
#define iof_ensure(O, n)
Definition: utiliof.h:352
#define iof_put(O, c)
Definition: utiliof.h:462
#define iof_filter_state(statetype, F)
Definition: utiliof.h:635
#define iof_filter_reader(handler, statesize, pstate)
Definition: utiliof.h:627
iof_status
Definition: utiliof.h:25
@ IOFEOF
Definition: utiliof.h:26
@ IOFFULL
Definition: utiliof.h:28
@ IOFERR
Definition: utiliof.h:29
@ IOFEMPTY
Definition: utiliof.h:27
#define iof_setup_next(I, N)
Definition: utiliof.h:293
iof_mode
Definition: utiliof.h:15
@ IOFCLOSE
Definition: utiliof.h:20
@ IOFREAD
Definition: utiliof.h:16
@ IOFLOAD
Definition: utiliof.h:17
@ IOFFLUSH
Definition: utiliof.h:19
@ IOFWRITE
Definition: utiliof.h:18
#define IOF_STOPPED
Definition: utiliof.h:111
#define iof_filter_writer(handler, statesize, pstate)
Definition: utiliof.h:631
#define iof_resize_buffer(O)
Definition: utiliof.h:641
#define iof_get(I)
Definition: utiliof.h:385
#define iof_tail(I)
Definition: utiliof.h:197
lzw_state * lzw_encoder_init(lzw_state *state, int flags)
Definition: utillzw.c:429
static lzw_node * lzw_node_push(lzw_state *state, unsigned char suffix)
Definition: utillzw.c:498
#define lzw_node_init(node, i, c)
Definition: utillzw.c:401
#define lzw_bit_range(bits)
Definition: utillzw.c:218
#define lzw_write_bytes(O, state)
Definition: utillzw.c:324
#define lzw_initial_index(state)
Definition: utillzw.c:224
static iof_status lzw_encode_last(iof *O, lzw_state *state)
Definition: utillzw.c:481
#define LZW_LOOKUP_SIZE
Definition: utillzw.c:216
#define LZW_TABLE_SIZE
Definition: utillzw.c:215
#define lzw_malloc
Definition: utillzw.c:229
iof_status lzw_decode_state(iof *I, iof *O, lzw_state *state)
Definition: utillzw.c:326
void lzw_decoder_close(lzw_state *state)
Definition: utillzw.c:298
static size_t lzw_encoder(iof *F, iof_mode mode)
Definition: utillzw.c:650
static int lzw_next_node(lzw_state *state, unsigned char suffix)
Definition: utillzw.c:507
iof_status lzw_encode(iof *I, iof *O, int flags)
Definition: utillzw.c:601
static lzw_state * lzw_decoder_init_table(lzw_state *state, lzw_entry *table, int flags)
Definition: utillzw.c:255
#define lzw_node_at(state, index)
Definition: utillzw.c:399
iof_status lzw_encode_state(iof *I, iof *O, lzw_state *state)
Definition: utillzw.c:544
static int lzw_next_entry(lzw_state *state, lzw_entry *nextentry)
Definition: utillzw.c:305
static void lzw_put_code(iof *O, lzw_state *state, short code, int todobits)
Definition: utillzw.c:456
#define lzw_free
Definition: utillzw.c:230
lzw_state * lzw_decoder_init(lzw_state *state, int flags)
Definition: utillzw.c:277
#define lzw_initial_codes(state)
Definition: utillzw.c:221
#define LZW_MAX_BITS
Definition: utillzw.c:214
static void lzw_decoder_clear(lzw_state *state)
Definition: utillzw.c:282
#define lzw_eod_code(state)
Definition: utillzw.c:223
void lzw_encoder_close(lzw_state *state)
Definition: utillzw.c:434
#define lzw_index
Definition: utillzw.c:174
iof * iof_filter_lzw_encoder(iof *N, int flags)
Definition: utillzw.c:693
struct lzw_entry lzw_entry
iof * iof_filter_lzw_decoder(iof *N, int flags)
Definition: utillzw.c:678
#define lzw_base_bits(flags)
Definition: utillzw.c:219
#define lzw_entry_at(state, index)
Definition: utillzw.c:253
#define lzw_max_index(state)
Definition: utillzw.c:226
static struct lzw_entry lzw_initial_table[]
Definition: utillzw.c:234
#define lzw_check_bits(state)
Definition: utillzw.c:227
static size_t lzw_decoder(iof *F, iof_mode mode)
Definition: utillzw.c:618
iof_status lzw_decode(iof *I, iof *O, int flags)
Definition: utillzw.c:588
#define LZW_INIT_STATE
Definition: utillzw.c:209
static void lzw_encoder_clear(lzw_state *state)
Definition: utillzw.c:440
static lzw_state * lzw_encoder_init_table(lzw_state *state, lzw_node *lookup, int flags)
Definition: utillzw.c:403
#define lzw_clear_code(state)
Definition: utillzw.c:222
#define LZW_TABLE_ALLOC
Definition: utillzw.h:9
#define FALLTHRU
Definition: utilplat.h:28
Definition: inftrees.h:24
Definition: mendex.h:20
Definition: utiliof.h:83
Definition: zic.c:306
Definition: utillzw.c:166
const char * rdata
Definition: utillzw.c:168
char * data
Definition: utillzw.c:169
int size
Definition: utillzw.c:171
short index
Definition: utillzw.c:179
lzw_node * map
Definition: utillzw.c:183
unsigned char suffix
Definition: utillzw.c:180
lzw_node * left
Definition: utillzw.c:181
lzw_node * right
Definition: utillzw.c:182
int flags
Definition: utillzw.c:204
int tailbits
Definition: utillzw.c:202
lzw_entry * table
Definition: utillzw.c:189
int lastbyte
Definition: utillzw.c:201
int tailbytes
Definition: utillzw.c:196
int basebits
Definition: utillzw.c:199
short index
Definition: utillzw.c:191
lzw_node * lastnode
Definition: utillzw.c:193
int codebits
Definition: utillzw.c:200
int flush
Definition: utillzw.c:203
lzw_node * lookup
Definition: utillzw.c:188
lzw_entry * lastentry
Definition: utillzw.c:195
void * data
Definition: pdfobj.c:70
Definition: table.h:30
long unsigned N
Definition: tex4ht.c:2765
lzw_state * lzwstate
Definition: utillzw.c:207