~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~
decompress_bunzip2.c

Version: ~ [ linux-6.12-rc7 ] ~ [ linux-6.11.7 ] ~ [ linux-6.10.14 ] ~ [ linux-6.9.12 ] ~ [ linux-6.8.12 ] ~ [ linux-6.7.12 ] ~ [ linux-6.6.60 ] ~ [ linux-6.5.13 ] ~ [ linux-6.4.16 ] ~ [ linux-6.3.13 ] ~ [ linux-6.2.16 ] ~ [ linux-6.1.116 ] ~ [ linux-6.0.19 ] ~ [ linux-5.19.17 ] ~ [ linux-5.18.19 ] ~ [ linux-5.17.15 ] ~ [ linux-5.16.20 ] ~ [ linux-5.15.171 ] ~ [ linux-5.14.21 ] ~ [ linux-5.13.19 ] ~ [ linux-5.12.19 ] ~ [ linux-5.11.22 ] ~ [ linux-5.10.229 ] ~ [ linux-5.9.16 ] ~ [ linux-5.8.18 ] ~ [ linux-5.7.19 ] ~ [ linux-5.6.19 ] ~ [ linux-5.5.19 ] ~ [ linux-5.4.285 ] ~ [ linux-5.3.18 ] ~ [ linux-5.2.21 ] ~ [ linux-5.1.21 ] ~ [ linux-5.0.21 ] ~ [ linux-4.20.17 ] ~ [ linux-4.19.323 ] ~ [ linux-4.18.20 ] ~ [ linux-4.17.19 ] ~ [ linux-4.16.18 ] ~ [ linux-4.15.18 ] ~ [ linux-4.14.336 ] ~ [ linux-4.13.16 ] ~ [ linux-4.12.14 ] ~ [ linux-4.11.12 ] ~ [ linux-4.10.17 ] ~ [ linux-4.9.337 ] ~ [ linux-4.4.302 ] ~ [ linux-3.10.108 ] ~ [ linux-2.6.32.71 ] ~ [ linux-2.6.0 ] ~ [ linux-2.4.37.11 ] ~ [ unix-v6-master ] ~ [ ccs-tools-1.8.12 ] ~ [ policy-sample ] ~
Architecture: ~ [ i386 ] ~ [ alpha ] ~ [ m68k ] ~ [ mips ] ~ [ ppc ] ~ [ sparc ] ~ [ sparc64 ] ~
Diff markup

Differences between /lib/decompress_bunzip2.c (Version linux-6.12-rc7) and /lib/decompress_bunzip2.c (Version linux-4.15.18)

  1 /*      Small bzip2 deflate implementation, by      1 /*      Small bzip2 deflate implementation, by Rob Landley (rob@landley.net).
  2                                                     2 
  3         Based on bzip2 decompression code by J      3         Based on bzip2 decompression code by Julian R Seward (jseward@acm.org),
  4         which also acknowledges contributions       4         which also acknowledges contributions by Mike Burrows, David Wheeler,
  5         Peter Fenwick, Alistair Moffat, Radfor      5         Peter Fenwick, Alistair Moffat, Radford Neal, Ian H. Witten,
  6         Robert Sedgewick, and Jon L. Bentley.       6         Robert Sedgewick, and Jon L. Bentley.
  7                                                     7 
  8         This code is licensed under the LGPLv2      8         This code is licensed under the LGPLv2:
  9                 LGPL (http://www.gnu.org/copyl      9                 LGPL (http://www.gnu.org/copyleft/lgpl.html
 10 */                                                 10 */
 11                                                    11 
 12 /*                                                 12 /*
 13         Size and speed optimizations by Manuel     13         Size and speed optimizations by Manuel Novoa III  (mjn3@codepoet.org).
 14                                                    14 
 15         More efficient reading of Huffman code     15         More efficient reading of Huffman codes, a streamlined read_bunzip()
 16         function, and various other tweaks.  I     16         function, and various other tweaks.  In (limited) tests, approximately
 17         20% faster than bzcat on x86 and about     17         20% faster than bzcat on x86 and about 10% faster on arm.
 18                                                    18 
 19         Note that about 2/3 of the time is spe     19         Note that about 2/3 of the time is spent in read_unzip() reversing
 20         the Burrows-Wheeler transformation.  M     20         the Burrows-Wheeler transformation.  Much of that time is delay
 21         resulting from cache misses.               21         resulting from cache misses.
 22                                                    22 
 23         I would ask that anyone benefiting fro     23         I would ask that anyone benefiting from this work, especially those
 24         using it in commercial products, consi     24         using it in commercial products, consider making a donation to my local
 25         non-profit hospice organization in the     25         non-profit hospice organization in the name of the woman I loved, who
 26         passed away Feb. 12, 2003.                 26         passed away Feb. 12, 2003.
 27                                                    27 
 28                 In memory of Toni W. Hagan         28                 In memory of Toni W. Hagan
 29                                                    29 
 30                 Hospice of Acadiana, Inc.          30                 Hospice of Acadiana, Inc.
 31                 2600 Johnston St., Suite 200       31                 2600 Johnston St., Suite 200
 32                 Lafayette, LA 70503-3240           32                 Lafayette, LA 70503-3240
 33                                                    33 
 34                 Phone (337) 232-1234 or 1-800-     34                 Phone (337) 232-1234 or 1-800-738-2226
 35                 Fax   (337) 232-1297               35                 Fax   (337) 232-1297
 36                                                    36 
 37                 https://www.hospiceacadiana.co !!  37                 http://www.hospiceacadiana.com/
 38                                                    38 
 39         Manuel                                     39         Manuel
 40  */                                                40  */
 41                                                    41 
 42 /*                                                 42 /*
 43         Made it fit for running in Linux Kerne     43         Made it fit for running in Linux Kernel by Alain Knaff (alain@knaff.lu)
 44 */                                                 44 */
 45                                                    45 
 46                                                    46 
 47 #ifdef STATIC                                      47 #ifdef STATIC
 48 #define PREBOOT                                    48 #define PREBOOT
 49 #else                                              49 #else
 50 #include <linux/decompress/bunzip2.h>              50 #include <linux/decompress/bunzip2.h>
 51 #endif /* STATIC */                                51 #endif /* STATIC */
 52                                                    52 
 53 #include <linux/decompress/mm.h>                   53 #include <linux/decompress/mm.h>
 54 #include <linux/crc32poly.h>                   << 
 55                                                    54 
 56 #ifndef INT_MAX                                    55 #ifndef INT_MAX
 57 #define INT_MAX 0x7fffffff                         56 #define INT_MAX 0x7fffffff
 58 #endif                                             57 #endif
 59                                                    58 
 60 /* Constants for Huffman coding */                 59 /* Constants for Huffman coding */
 61 #define MAX_GROUPS              6                  60 #define MAX_GROUPS              6
 62 #define GROUP_SIZE              50      /* 64      61 #define GROUP_SIZE              50      /* 64 would have been more efficient */
 63 #define MAX_HUFCODE_BITS        20      /* Lon     62 #define MAX_HUFCODE_BITS        20      /* Longest Huffman code allowed */
 64 #define MAX_SYMBOLS             258     /* 256     63 #define MAX_SYMBOLS             258     /* 256 literals + RUNA + RUNB */
 65 #define SYMBOL_RUNA             0                  64 #define SYMBOL_RUNA             0
 66 #define SYMBOL_RUNB             1                  65 #define SYMBOL_RUNB             1
 67                                                    66 
 68 /* Status return values */                         67 /* Status return values */
 69 #define RETVAL_OK                       0          68 #define RETVAL_OK                       0
 70 #define RETVAL_LAST_BLOCK               (-1)       69 #define RETVAL_LAST_BLOCK               (-1)
 71 #define RETVAL_NOT_BZIP_DATA            (-2)       70 #define RETVAL_NOT_BZIP_DATA            (-2)
 72 #define RETVAL_UNEXPECTED_INPUT_EOF     (-3)       71 #define RETVAL_UNEXPECTED_INPUT_EOF     (-3)
 73 #define RETVAL_UNEXPECTED_OUTPUT_EOF    (-4)       72 #define RETVAL_UNEXPECTED_OUTPUT_EOF    (-4)
 74 #define RETVAL_DATA_ERROR               (-5)       73 #define RETVAL_DATA_ERROR               (-5)
 75 #define RETVAL_OUT_OF_MEMORY            (-6)       74 #define RETVAL_OUT_OF_MEMORY            (-6)
 76 #define RETVAL_OBSOLETE_INPUT           (-7)       75 #define RETVAL_OBSOLETE_INPUT           (-7)
 77                                                    76 
 78 /* Other housekeeping constants */                 77 /* Other housekeeping constants */
 79 #define BZIP2_IOBUF_SIZE                4096       78 #define BZIP2_IOBUF_SIZE                4096
 80                                                    79 
 81 /* This is what we know about each Huffman cod     80 /* This is what we know about each Huffman coding group */
 82 struct group_data {                                81 struct group_data {
 83         /* We have an extra slot at the end of !!  82         /* We have an extra slot at the end of limit[] for a sentinal value. */
 84         int limit[MAX_HUFCODE_BITS+1];             83         int limit[MAX_HUFCODE_BITS+1];
 85         int base[MAX_HUFCODE_BITS];                84         int base[MAX_HUFCODE_BITS];
 86         int permute[MAX_SYMBOLS];                  85         int permute[MAX_SYMBOLS];
 87         int minLen, maxLen;                        86         int minLen, maxLen;
 88 };                                                 87 };
 89                                                    88 
 90 /* Structure holding all the housekeeping data     89 /* Structure holding all the housekeeping data, including IO buffers and
 91    memory that persists between calls to bunzi     90    memory that persists between calls to bunzip */
 92 struct bunzip_data {                               91 struct bunzip_data {
 93         /* State for interrupting output loop      92         /* State for interrupting output loop */
 94         int writeCopies, writePos, writeRunCou     93         int writeCopies, writePos, writeRunCountdown, writeCount, writeCurrent;
 95         /* I/O tracking data (file handles, bu     94         /* I/O tracking data (file handles, buffers, positions, etc.) */
 96         long (*fill)(void*, unsigned long);        95         long (*fill)(void*, unsigned long);
 97         long inbufCount, inbufPos /*, outbufPo     96         long inbufCount, inbufPos /*, outbufPos*/;
 98         unsigned char *inbuf /*,*outbuf*/;         97         unsigned char *inbuf /*,*outbuf*/;
 99         unsigned int inbufBitCount, inbufBits;     98         unsigned int inbufBitCount, inbufBits;
100         /* The CRC values stored in the block      99         /* The CRC values stored in the block header and calculated from the
101         data */                                   100         data */
102         unsigned int crc32Table[256], headerCR    101         unsigned int crc32Table[256], headerCRC, totalCRC, writeCRC;
103         /* Intermediate buffer and its size (i    102         /* Intermediate buffer and its size (in bytes) */
104         unsigned int *dbuf, dbufSize;             103         unsigned int *dbuf, dbufSize;
105         /* These things are a bit too big to g    104         /* These things are a bit too big to go on the stack */
106         unsigned char selectors[32768];           105         unsigned char selectors[32768];         /* nSelectors = 15 bits */
107         struct group_data groups[MAX_GROUPS];     106         struct group_data groups[MAX_GROUPS];   /* Huffman coding tables */
108         int io_error;                   /* non    107         int io_error;                   /* non-zero if we have IO error */
109         int byteCount[256];                       108         int byteCount[256];
110         unsigned char symToByte[256], mtfSymbo    109         unsigned char symToByte[256], mtfSymbol[256];
111 };                                                110 };
112                                                   111 
113                                                   112 
114 /* Return the next nnn bits of input.  All rea    113 /* Return the next nnn bits of input.  All reads from the compressed input
115    are done through this function.  All reads     114    are done through this function.  All reads are big endian */
116 static unsigned int INIT get_bits(struct bunzi    115 static unsigned int INIT get_bits(struct bunzip_data *bd, char bits_wanted)
117 {                                                 116 {
118         unsigned int bits = 0;                    117         unsigned int bits = 0;
119                                                   118 
120         /* If we need to get more data from th    119         /* If we need to get more data from the byte buffer, do so.
121            (Loop getting one byte at a time to    120            (Loop getting one byte at a time to enforce endianness and avoid
122            unaligned access.) */                  121            unaligned access.) */
123         while (bd->inbufBitCount < bits_wanted    122         while (bd->inbufBitCount < bits_wanted) {
124                 /* If we need to read more dat    123                 /* If we need to read more data from file into byte buffer, do
125                    so */                          124                    so */
126                 if (bd->inbufPos == bd->inbufC    125                 if (bd->inbufPos == bd->inbufCount) {
127                         if (bd->io_error)         126                         if (bd->io_error)
128                                 return 0;         127                                 return 0;
129                         bd->inbufCount = bd->f    128                         bd->inbufCount = bd->fill(bd->inbuf, BZIP2_IOBUF_SIZE);
130                         if (bd->inbufCount <=     129                         if (bd->inbufCount <= 0) {
131                                 bd->io_error =    130                                 bd->io_error = RETVAL_UNEXPECTED_INPUT_EOF;
132                                 return 0;         131                                 return 0;
133                         }                         132                         }
134                         bd->inbufPos = 0;         133                         bd->inbufPos = 0;
135                 }                                 134                 }
136                 /* Avoid 32-bit overflow (dump    135                 /* Avoid 32-bit overflow (dump bit buffer to top of output) */
137                 if (bd->inbufBitCount >= 24) {    136                 if (bd->inbufBitCount >= 24) {
138                         bits = bd->inbufBits&(    137                         bits = bd->inbufBits&((1 << bd->inbufBitCount)-1);
139                         bits_wanted -= bd->inb    138                         bits_wanted -= bd->inbufBitCount;
140                         bits <<= bits_wanted;     139                         bits <<= bits_wanted;
141                         bd->inbufBitCount = 0;    140                         bd->inbufBitCount = 0;
142                 }                                 141                 }
143                 /* Grab next 8 bits of input f    142                 /* Grab next 8 bits of input from buffer. */
144                 bd->inbufBits = (bd->inbufBits    143                 bd->inbufBits = (bd->inbufBits << 8)|bd->inbuf[bd->inbufPos++];
145                 bd->inbufBitCount += 8;           144                 bd->inbufBitCount += 8;
146         }                                         145         }
147         /* Calculate result */                    146         /* Calculate result */
148         bd->inbufBitCount -= bits_wanted;         147         bd->inbufBitCount -= bits_wanted;
149         bits |= (bd->inbufBits >> bd->inbufBit    148         bits |= (bd->inbufBits >> bd->inbufBitCount)&((1 << bits_wanted)-1);
150                                                   149 
151         return bits;                              150         return bits;
152 }                                                 151 }
153                                                   152 
154 /* Unpacks the next block and sets up for the     153 /* Unpacks the next block and sets up for the inverse burrows-wheeler step. */
155                                                   154 
156 static int INIT get_next_block(struct bunzip_d    155 static int INIT get_next_block(struct bunzip_data *bd)
157 {                                                 156 {
158         struct group_data *hufGroup = NULL;       157         struct group_data *hufGroup = NULL;
159         int *base = NULL;                         158         int *base = NULL;
160         int *limit = NULL;                        159         int *limit = NULL;
161         int dbufCount, nextSym, dbufSize, grou    160         int dbufCount, nextSym, dbufSize, groupCount, selector,
162                 i, j, k, t, runPos, symCount,     161                 i, j, k, t, runPos, symCount, symTotal, nSelectors, *byteCount;
163         unsigned char uc, *symToByte, *mtfSymb    162         unsigned char uc, *symToByte, *mtfSymbol, *selectors;
164         unsigned int *dbuf, origPtr;              163         unsigned int *dbuf, origPtr;
165                                                   164 
166         dbuf = bd->dbuf;                          165         dbuf = bd->dbuf;
167         dbufSize = bd->dbufSize;                  166         dbufSize = bd->dbufSize;
168         selectors = bd->selectors;                167         selectors = bd->selectors;
169         byteCount = bd->byteCount;                168         byteCount = bd->byteCount;
170         symToByte = bd->symToByte;                169         symToByte = bd->symToByte;
171         mtfSymbol = bd->mtfSymbol;                170         mtfSymbol = bd->mtfSymbol;
172                                                   171 
173         /* Read in header signature and CRC, t    172         /* Read in header signature and CRC, then validate signature.
174            (last block signature means CRC is     173            (last block signature means CRC is for whole file, return now) */
175         i = get_bits(bd, 24);                     174         i = get_bits(bd, 24);
176         j = get_bits(bd, 24);                     175         j = get_bits(bd, 24);
177         bd->headerCRC = get_bits(bd, 32);         176         bd->headerCRC = get_bits(bd, 32);
178         if ((i == 0x177245) && (j == 0x385090)    177         if ((i == 0x177245) && (j == 0x385090))
179                 return RETVAL_LAST_BLOCK;         178                 return RETVAL_LAST_BLOCK;
180         if ((i != 0x314159) || (j != 0x265359)    179         if ((i != 0x314159) || (j != 0x265359))
181                 return RETVAL_NOT_BZIP_DATA;      180                 return RETVAL_NOT_BZIP_DATA;
182         /* We can add support for blockRandomi    181         /* We can add support for blockRandomised if anybody complains.
183            There was some code for this in bus    182            There was some code for this in busybox 1.0.0-pre3, but nobody ever
184            noticed that it didn't actually wor    183            noticed that it didn't actually work. */
185         if (get_bits(bd, 1))                      184         if (get_bits(bd, 1))
186                 return RETVAL_OBSOLETE_INPUT;     185                 return RETVAL_OBSOLETE_INPUT;
187         origPtr = get_bits(bd, 24);               186         origPtr = get_bits(bd, 24);
188         if (origPtr >= dbufSize)                  187         if (origPtr >= dbufSize)
189                 return RETVAL_DATA_ERROR;         188                 return RETVAL_DATA_ERROR;
190         /* mapping table: if some byte values     189         /* mapping table: if some byte values are never used (encoding things
191            like ascii text), the compression c    190            like ascii text), the compression code removes the gaps to have fewer
192            symbols to deal with, and writes a     191            symbols to deal with, and writes a sparse bitfield indicating which
193            values were present.  We make a tra    192            values were present.  We make a translation table to convert the
194            symbols back to the corresponding b    193            symbols back to the corresponding bytes. */
195         t = get_bits(bd, 16);                     194         t = get_bits(bd, 16);
196         symTotal = 0;                             195         symTotal = 0;
197         for (i = 0; i < 16; i++) {                196         for (i = 0; i < 16; i++) {
198                 if (t&(1 << (15-i))) {            197                 if (t&(1 << (15-i))) {
199                         k = get_bits(bd, 16);     198                         k = get_bits(bd, 16);
200                         for (j = 0; j < 16; j+    199                         for (j = 0; j < 16; j++)
201                                 if (k&(1 << (1    200                                 if (k&(1 << (15-j)))
202                                         symToB    201                                         symToByte[symTotal++] = (16*i)+j;
203                 }                                 202                 }
204         }                                         203         }
205         /* How many different Huffman coding g    204         /* How many different Huffman coding groups does this block use? */
206         groupCount = get_bits(bd, 3);             205         groupCount = get_bits(bd, 3);
207         if (groupCount < 2 || groupCount > MAX    206         if (groupCount < 2 || groupCount > MAX_GROUPS)
208                 return RETVAL_DATA_ERROR;         207                 return RETVAL_DATA_ERROR;
209         /* nSelectors: Every GROUP_SIZE many s    208         /* nSelectors: Every GROUP_SIZE many symbols we select a new
210            Huffman coding group.  Read in the     209            Huffman coding group.  Read in the group selector list,
211            which is stored as MTF encoded bit     210            which is stored as MTF encoded bit runs.  (MTF = Move To
212            Front, as each value is used it's m    211            Front, as each value is used it's moved to the start of the
213            list.) */                              212            list.) */
214         nSelectors = get_bits(bd, 15);            213         nSelectors = get_bits(bd, 15);
215         if (!nSelectors)                          214         if (!nSelectors)
216                 return RETVAL_DATA_ERROR;         215                 return RETVAL_DATA_ERROR;
217         for (i = 0; i < groupCount; i++)          216         for (i = 0; i < groupCount; i++)
218                 mtfSymbol[i] = i;                 217                 mtfSymbol[i] = i;
219         for (i = 0; i < nSelectors; i++) {        218         for (i = 0; i < nSelectors; i++) {
220                 /* Get next value */              219                 /* Get next value */
221                 for (j = 0; get_bits(bd, 1); j    220                 for (j = 0; get_bits(bd, 1); j++)
222                         if (j >= groupCount)      221                         if (j >= groupCount)
223                                 return RETVAL_    222                                 return RETVAL_DATA_ERROR;
224                 /* Decode MTF to get the next     223                 /* Decode MTF to get the next selector */
225                 uc = mtfSymbol[j];                224                 uc = mtfSymbol[j];
226                 for (; j; j--)                    225                 for (; j; j--)
227                         mtfSymbol[j] = mtfSymb    226                         mtfSymbol[j] = mtfSymbol[j-1];
228                 mtfSymbol[0] = selectors[i] =     227                 mtfSymbol[0] = selectors[i] = uc;
229         }                                         228         }
230         /* Read the Huffman coding tables for     229         /* Read the Huffman coding tables for each group, which code
231            for symTotal literal symbols, plus     230            for symTotal literal symbols, plus two run symbols (RUNA,
232            RUNB) */                               231            RUNB) */
233         symCount = symTotal+2;                    232         symCount = symTotal+2;
234         for (j = 0; j < groupCount; j++) {        233         for (j = 0; j < groupCount; j++) {
235                 unsigned char length[MAX_SYMBO !! 234                 unsigned char length[MAX_SYMBOLS], temp[MAX_HUFCODE_BITS+1];
236                 unsigned short temp[MAX_HUFCOD << 
237                 int     minLen, maxLen, pp;       235                 int     minLen, maxLen, pp;
238                 /* Read Huffman code lengths f    236                 /* Read Huffman code lengths for each symbol.  They're
239                    stored in a way similar to     237                    stored in a way similar to mtf; record a starting
240                    value for the first symbol,    238                    value for the first symbol, and an offset from the
241                    previous value for everys s    239                    previous value for everys symbol after that.
242                    (Subtracting 1 before the l    240                    (Subtracting 1 before the loop and then adding it
243                    back at the end is an optim    241                    back at the end is an optimization that makes the
244                    test inside the loop simple    242                    test inside the loop simpler: symbol length 0
245                    becomes negative, so an uns    243                    becomes negative, so an unsigned inequality catches
246                    it.) */                        244                    it.) */
247                 t = get_bits(bd, 5)-1;            245                 t = get_bits(bd, 5)-1;
248                 for (i = 0; i < symCount; i++)    246                 for (i = 0; i < symCount; i++) {
249                         for (;;) {                247                         for (;;) {
250                                 if (((unsigned    248                                 if (((unsigned)t) > (MAX_HUFCODE_BITS-1))
251                                         return    249                                         return RETVAL_DATA_ERROR;
252                                                   250 
253                                 /* If first bi    251                                 /* If first bit is 0, stop.  Else
254                                    second bit     252                                    second bit indicates whether to
255                                    increment o    253                                    increment or decrement the value.
256                                    Optimizatio    254                                    Optimization: grab 2 bits and unget
257                                    the second     255                                    the second if the first was 0. */
258                                                   256 
259                                 k = get_bits(b    257                                 k = get_bits(bd, 2);
260                                 if (k < 2) {      258                                 if (k < 2) {
261                                         bd->in    259                                         bd->inbufBitCount++;
262                                         break;    260                                         break;
263                                 }                 261                                 }
264                                 /* Add one if     262                                 /* Add one if second bit 1, else
265                                  * subtract 1.    263                                  * subtract 1.  Avoids if/else */
266                                 t += (((k+1)&2    264                                 t += (((k+1)&2)-1);
267                         }                         265                         }
268                         /* Correct for the ini    266                         /* Correct for the initial -1, to get the
269                          * final symbol length    267                          * final symbol length */
270                         length[i] = t+1;          268                         length[i] = t+1;
271                 }                                 269                 }
272                 /* Find largest and smallest l    270                 /* Find largest and smallest lengths in this group */
273                 minLen = maxLen = length[0];      271                 minLen = maxLen = length[0];
274                                                   272 
275                 for (i = 1; i < symCount; i++)    273                 for (i = 1; i < symCount; i++) {
276                         if (length[i] > maxLen    274                         if (length[i] > maxLen)
277                                 maxLen = lengt    275                                 maxLen = length[i];
278                         else if (length[i] < m    276                         else if (length[i] < minLen)
279                                 minLen = lengt    277                                 minLen = length[i];
280                 }                                 278                 }
281                                                   279 
282                 /* Calculate permute[], base[]    280                 /* Calculate permute[], base[], and limit[] tables from
283                  * length[].                      281                  * length[].
284                  *                                282                  *
285                  * permute[] is the lookup tab    283                  * permute[] is the lookup table for converting
286                  * Huffman coded symbols into     284                  * Huffman coded symbols into decoded symbols.  base[]
287                  * is the amount to subtract f    285                  * is the amount to subtract from the value of a
288                  * Huffman symbol of a given l    286                  * Huffman symbol of a given length when using
289                  * permute[].                     287                  * permute[].
290                  *                                288                  *
291                  * limit[] indicates the large    289                  * limit[] indicates the largest numerical value a
292                  * symbol with a given number     290                  * symbol with a given number of bits can have.  This
293                  * is how the Huffman codes ca    291                  * is how the Huffman codes can vary in length: each
294                  * code with a value > limit[l    292                  * code with a value > limit[length] needs another
295                  * bit.                           293                  * bit.
296                  */                               294                  */
297                 hufGroup = bd->groups+j;          295                 hufGroup = bd->groups+j;
298                 hufGroup->minLen = minLen;        296                 hufGroup->minLen = minLen;
299                 hufGroup->maxLen = maxLen;        297                 hufGroup->maxLen = maxLen;
300                 /* Note that minLen can't be s    298                 /* Note that minLen can't be smaller than 1, so we
301                    adjust the base and limit a    299                    adjust the base and limit array pointers so we're
302                    not always wasting the firs    300                    not always wasting the first entry.  We do this
303                    again when using them (duri    301                    again when using them (during symbol decoding).*/
304                 base = hufGroup->base-1;          302                 base = hufGroup->base-1;
305                 limit = hufGroup->limit-1;        303                 limit = hufGroup->limit-1;
306                 /* Calculate permute[].  Concu    304                 /* Calculate permute[].  Concurrently, initialize
307                  * temp[] and limit[]. */         305                  * temp[] and limit[]. */
308                 pp = 0;                           306                 pp = 0;
309                 for (i = minLen; i <= maxLen;     307                 for (i = minLen; i <= maxLen; i++) {
310                         temp[i] = limit[i] = 0    308                         temp[i] = limit[i] = 0;
311                         for (t = 0; t < symCou    309                         for (t = 0; t < symCount; t++)
312                                 if (length[t]     310                                 if (length[t] == i)
313                                         hufGro    311                                         hufGroup->permute[pp++] = t;
314                 }                                 312                 }
315                 /* Count symbols coded for at     313                 /* Count symbols coded for at each bit length */
316                 for (i = 0; i < symCount; i++)    314                 for (i = 0; i < symCount; i++)
317                         temp[length[i]]++;        315                         temp[length[i]]++;
318                 /* Calculate limit[] (the larg    316                 /* Calculate limit[] (the largest symbol-coding value
319                  *at each bit length, which is    317                  *at each bit length, which is (previous limit <<
320                  *1)+symbols at this level), a    318                  *1)+symbols at this level), and base[] (number of
321                  *symbols to ignore at each bi    319                  *symbols to ignore at each bit length, which is limit
322                  *minus the cumulative count o    320                  *minus the cumulative count of symbols coded for
323                  *already). */                    321                  *already). */
324                 pp = t = 0;                       322                 pp = t = 0;
325                 for (i = minLen; i < maxLen; i    323                 for (i = minLen; i < maxLen; i++) {
326                         pp += temp[i];            324                         pp += temp[i];
327                         /* We read the largest    325                         /* We read the largest possible symbol size
328                            and then unget bits    326                            and then unget bits after determining how
329                            many we need, and t    327                            many we need, and those extra bits could be
330                            set to anything.  (    328                            set to anything.  (They're noise from
331                            future symbols.)  A    329                            future symbols.)  At each level we're
332                            really only interes    330                            really only interested in the first few
333                            bits, so here we se    331                            bits, so here we set all the trailing
334                            to-be-ignored bits     332                            to-be-ignored bits to 1 so they don't
335                            affect the value >     333                            affect the value > limit[length]
336                            comparison. */         334                            comparison. */
337                         limit[i] = (pp << (max    335                         limit[i] = (pp << (maxLen - i)) - 1;
338                         pp <<= 1;                 336                         pp <<= 1;
339                         base[i+1] = pp-(t += t    337                         base[i+1] = pp-(t += temp[i]);
340                 }                                 338                 }
341                 limit[maxLen+1] = INT_MAX; /*  !! 339                 limit[maxLen+1] = INT_MAX; /* Sentinal value for
342                                             *     340                                             * reading next sym. */
343                 limit[maxLen] = pp+temp[maxLen    341                 limit[maxLen] = pp+temp[maxLen]-1;
344                 base[minLen] = 0;                 342                 base[minLen] = 0;
345         }                                         343         }
346         /* We've finished reading and digestin    344         /* We've finished reading and digesting the block header.  Now
347            read this block's Huffman coded sym    345            read this block's Huffman coded symbols from the file and
348            undo the Huffman coding and run len    346            undo the Huffman coding and run length encoding, saving the
349            result into dbuf[dbufCount++] = uc     347            result into dbuf[dbufCount++] = uc */
350                                                   348 
351         /* Initialize symbol occurrence counte    349         /* Initialize symbol occurrence counters and symbol Move To
352          * Front table */                         350          * Front table */
353         for (i = 0; i < 256; i++) {               351         for (i = 0; i < 256; i++) {
354                 byteCount[i] = 0;                 352                 byteCount[i] = 0;
355                 mtfSymbol[i] = (unsigned char)    353                 mtfSymbol[i] = (unsigned char)i;
356         }                                         354         }
357         /* Loop through compressed symbols. */    355         /* Loop through compressed symbols. */
358         runPos = dbufCount = symCount = select    356         runPos = dbufCount = symCount = selector = 0;
359         for (;;) {                                357         for (;;) {
360                 /* Determine which Huffman cod    358                 /* Determine which Huffman coding group to use. */
361                 if (!(symCount--)) {              359                 if (!(symCount--)) {
362                         symCount = GROUP_SIZE-    360                         symCount = GROUP_SIZE-1;
363                         if (selector >= nSelec    361                         if (selector >= nSelectors)
364                                 return RETVAL_    362                                 return RETVAL_DATA_ERROR;
365                         hufGroup = bd->groups+    363                         hufGroup = bd->groups+selectors[selector++];
366                         base = hufGroup->base-    364                         base = hufGroup->base-1;
367                         limit = hufGroup->limi    365                         limit = hufGroup->limit-1;
368                 }                                 366                 }
369                 /* Read next Huffman-coded sym    367                 /* Read next Huffman-coded symbol. */
370                 /* Note: It is far cheaper to     368                 /* Note: It is far cheaper to read maxLen bits and
371                    back up than it is to read     369                    back up than it is to read minLen bits and then an
372                    additional bit at a time, t    370                    additional bit at a time, testing as we go.
373                    Because there is a trailing    371                    Because there is a trailing last block (with file
374                    CRC), there is no danger of    372                    CRC), there is no danger of the overread causing an
375                    unexpected EOF for a valid     373                    unexpected EOF for a valid compressed file.  As a
376                    further optimization, we do    374                    further optimization, we do the read inline
377                    (falling back to a call to     375                    (falling back to a call to get_bits if the buffer
378                    runs dry).  The following (    376                    runs dry).  The following (up to got_huff_bits:) is
379                    equivalent to j = get_bits(    377                    equivalent to j = get_bits(bd, hufGroup->maxLen);
380                  */                               378                  */
381                 while (bd->inbufBitCount < huf    379                 while (bd->inbufBitCount < hufGroup->maxLen) {
382                         if (bd->inbufPos == bd    380                         if (bd->inbufPos == bd->inbufCount) {
383                                 j = get_bits(b    381                                 j = get_bits(bd, hufGroup->maxLen);
384                                 goto got_huff_    382                                 goto got_huff_bits;
385                         }                         383                         }
386                         bd->inbufBits =           384                         bd->inbufBits =
387                                 (bd->inbufBits    385                                 (bd->inbufBits << 8)|bd->inbuf[bd->inbufPos++];
388                         bd->inbufBitCount += 8    386                         bd->inbufBitCount += 8;
389                 }                              !! 387                 };
390                 bd->inbufBitCount -= hufGroup-    388                 bd->inbufBitCount -= hufGroup->maxLen;
391                 j = (bd->inbufBits >> bd->inbu    389                 j = (bd->inbufBits >> bd->inbufBitCount)&
392                         ((1 << hufGroup->maxLe    390                         ((1 << hufGroup->maxLen)-1);
393 got_huff_bits:                                    391 got_huff_bits:
394                 /* Figure how many bits are in !! 392                 /* Figure how how many bits are in next symbol and
395                  * unget extras */                393                  * unget extras */
396                 i = hufGroup->minLen;             394                 i = hufGroup->minLen;
397                 while (j > limit[i])              395                 while (j > limit[i])
398                         ++i;                      396                         ++i;
399                 bd->inbufBitCount += (hufGroup    397                 bd->inbufBitCount += (hufGroup->maxLen - i);
400                 /* Huffman decode value to get    398                 /* Huffman decode value to get nextSym (with bounds checking) */
401                 if ((i > hufGroup->maxLen)        399                 if ((i > hufGroup->maxLen)
402                         || (((unsigned)(j = (j    400                         || (((unsigned)(j = (j>>(hufGroup->maxLen-i))-base[i]))
403                                 >= MAX_SYMBOLS    401                                 >= MAX_SYMBOLS))
404                         return RETVAL_DATA_ERR    402                         return RETVAL_DATA_ERROR;
405                 nextSym = hufGroup->permute[j]    403                 nextSym = hufGroup->permute[j];
406                 /* We have now decoded the sym    404                 /* We have now decoded the symbol, which indicates
407                    either a new literal byte,     405                    either a new literal byte, or a repeated run of the
408                    most recent literal byte.      406                    most recent literal byte.  First, check if nextSym
409                    indicates a repeated run, a    407                    indicates a repeated run, and if so loop collecting
410                    how many times to repeat th    408                    how many times to repeat the last literal. */
411                 if (((unsigned)nextSym) <= SYM    409                 if (((unsigned)nextSym) <= SYMBOL_RUNB) { /* RUNA or RUNB */
412                         /* If this is the star    410                         /* If this is the start of a new run, zero out
413                          * counter */             411                          * counter */
414                         if (!runPos) {            412                         if (!runPos) {
415                                 runPos = 1;       413                                 runPos = 1;
416                                 t = 0;            414                                 t = 0;
417                         }                         415                         }
418                         /* Neat trick that sav    416                         /* Neat trick that saves 1 symbol: instead of
419                            or-ing 0 or 1 at ea    417                            or-ing 0 or 1 at each bit position, add 1
420                            or 2 instead.  For     418                            or 2 instead.  For example, 1011 is 1 << 0
421                            + 1 << 1 + 2 << 2.     419                            + 1 << 1 + 2 << 2.  1010 is 2 << 0 + 2 << 1
422                            + 1 << 2.  You can     420                            + 1 << 2.  You can make any bit pattern
423                            that way using 1 le    421                            that way using 1 less symbol than the basic
424                            or 0/1 method (exce    422                            or 0/1 method (except all bits 0, which
425                            would use no symbol    423                            would use no symbols, but a run of length 0
426                            doesn't mean anythi    424                            doesn't mean anything in this context).
427                            Thus space is saved    425                            Thus space is saved. */
428                         t += (runPos << nextSy    426                         t += (runPos << nextSym);
429                         /* +runPos if RUNA; +2    427                         /* +runPos if RUNA; +2*runPos if RUNB */
430                                                   428 
431                         runPos <<= 1;             429                         runPos <<= 1;
432                         continue;                 430                         continue;
433                 }                                 431                 }
434                 /* When we hit the first non-r    432                 /* When we hit the first non-run symbol after a run,
435                    we now know how many times     433                    we now know how many times to repeat the last
436                    literal, so append that man    434                    literal, so append that many copies to our buffer
437                    of decoded symbols (dbuf) n    435                    of decoded symbols (dbuf) now.  (The last literal
438                    used is the one at the head    436                    used is the one at the head of the mtfSymbol
439                    array.) */                     437                    array.) */
440                 if (runPos) {                     438                 if (runPos) {
441                         runPos = 0;               439                         runPos = 0;
442                         if (dbufCount+t >= dbu    440                         if (dbufCount+t >= dbufSize)
443                                 return RETVAL_    441                                 return RETVAL_DATA_ERROR;
444                                                   442 
445                         uc = symToByte[mtfSymb    443                         uc = symToByte[mtfSymbol[0]];
446                         byteCount[uc] += t;       444                         byteCount[uc] += t;
447                         while (t--)               445                         while (t--)
448                                 dbuf[dbufCount    446                                 dbuf[dbufCount++] = uc;
449                 }                                 447                 }
450                 /* Is this the terminating sym    448                 /* Is this the terminating symbol? */
451                 if (nextSym > symTotal)           449                 if (nextSym > symTotal)
452                         break;                    450                         break;
453                 /* At this point, nextSym indi    451                 /* At this point, nextSym indicates a new literal
454                    character.  Subtract one to    452                    character.  Subtract one to get the position in the
455                    MTF array at which this lit    453                    MTF array at which this literal is currently to be
456                    found.  (Note that the resu    454                    found.  (Note that the result can't be -1 or 0,
457                    because 0 and 1 are RUNA an    455                    because 0 and 1 are RUNA and RUNB.  But another
458                    instance of the first symbo    456                    instance of the first symbol in the mtf array,
459                    position 0, would have been    457                    position 0, would have been handled as part of a
460                    run above.  Therefore 1 unu    458                    run above.  Therefore 1 unused mtf position minus 2
461                    non-literal nextSym values     459                    non-literal nextSym values equals -1.) */
462                 if (dbufCount >= dbufSize)        460                 if (dbufCount >= dbufSize)
463                         return RETVAL_DATA_ERR    461                         return RETVAL_DATA_ERROR;
464                 i = nextSym - 1;                  462                 i = nextSym - 1;
465                 uc = mtfSymbol[i];                463                 uc = mtfSymbol[i];
466                 /* Adjust the MTF array.  Sinc    464                 /* Adjust the MTF array.  Since we typically expect to
467                  *move only a small number of     465                  *move only a small number of symbols, and are bound
468                  *by 256 in any case, using me    466                  *by 256 in any case, using memmove here would
469                  *typically be bigger and slow    467                  *typically be bigger and slower due to function call
470                  *overhead and other assorted     468                  *overhead and other assorted setup costs. */
471                 do {                              469                 do {
472                         mtfSymbol[i] = mtfSymb    470                         mtfSymbol[i] = mtfSymbol[i-1];
473                 } while (--i);                    471                 } while (--i);
474                 mtfSymbol[0] = uc;                472                 mtfSymbol[0] = uc;
475                 uc = symToByte[uc];               473                 uc = symToByte[uc];
476                 /* We have our literal byte.      474                 /* We have our literal byte.  Save it into dbuf. */
477                 byteCount[uc]++;                  475                 byteCount[uc]++;
478                 dbuf[dbufCount++] = (unsigned     476                 dbuf[dbufCount++] = (unsigned int)uc;
479         }                                         477         }
480         /* At this point, we've read all the H    478         /* At this point, we've read all the Huffman-coded symbols
481            (and repeated runs) for this block     479            (and repeated runs) for this block from the input stream,
482            and decoded them into the intermedi    480            and decoded them into the intermediate buffer.  There are
483            dbufCount many decoded bytes in dbu    481            dbufCount many decoded bytes in dbuf[].  Now undo the
484            Burrows-Wheeler transform on dbuf.     482            Burrows-Wheeler transform on dbuf.  See
485            http://dogma.net/markn/articles/bwt    483            http://dogma.net/markn/articles/bwt/bwt.htm
486          */                                       484          */
487         /* Turn byteCount into cumulative occu    485         /* Turn byteCount into cumulative occurrence counts of 0 to n-1. */
488         j = 0;                                    486         j = 0;
489         for (i = 0; i < 256; i++) {               487         for (i = 0; i < 256; i++) {
490                 k = j+byteCount[i];               488                 k = j+byteCount[i];
491                 byteCount[i] = j;                 489                 byteCount[i] = j;
492                 j = k;                            490                 j = k;
493         }                                         491         }
494         /* Figure out what order dbuf would be    492         /* Figure out what order dbuf would be in if we sorted it. */
495         for (i = 0; i < dbufCount; i++) {         493         for (i = 0; i < dbufCount; i++) {
496                 uc = (unsigned char)(dbuf[i] &    494                 uc = (unsigned char)(dbuf[i] & 0xff);
497                 dbuf[byteCount[uc]] |= (i << 8    495                 dbuf[byteCount[uc]] |= (i << 8);
498                 byteCount[uc]++;                  496                 byteCount[uc]++;
499         }                                         497         }
500         /* Decode first byte by hand to initia    498         /* Decode first byte by hand to initialize "previous" byte.
501            Note that it doesn't get output, an    499            Note that it doesn't get output, and if the first three
502            characters are identical it doesn't    500            characters are identical it doesn't qualify as a run (hence
503            writeRunCountdown = 5). */             501            writeRunCountdown = 5). */
504         if (dbufCount) {                          502         if (dbufCount) {
505                 if (origPtr >= dbufCount)         503                 if (origPtr >= dbufCount)
506                         return RETVAL_DATA_ERR    504                         return RETVAL_DATA_ERROR;
507                 bd->writePos = dbuf[origPtr];     505                 bd->writePos = dbuf[origPtr];
508                 bd->writeCurrent = (unsigned c    506                 bd->writeCurrent = (unsigned char)(bd->writePos&0xff);
509                 bd->writePos >>= 8;               507                 bd->writePos >>= 8;
510                 bd->writeRunCountdown = 5;        508                 bd->writeRunCountdown = 5;
511         }                                         509         }
512         bd->writeCount = dbufCount;               510         bd->writeCount = dbufCount;
513                                                   511 
514         return RETVAL_OK;                         512         return RETVAL_OK;
515 }                                                 513 }
516                                                   514 
517 /* Undo burrows-wheeler transform on intermedi    515 /* Undo burrows-wheeler transform on intermediate buffer to produce output.
518    If start_bunzip was initialized with out_fd    516    If start_bunzip was initialized with out_fd =-1, then up to len bytes of
519    data are written to outbuf.  Return value i    517    data are written to outbuf.  Return value is number of bytes written or
520    error (all errors are negative numbers).  I    518    error (all errors are negative numbers).  If out_fd!=-1, outbuf and len
521    are ignored, data is written to out_fd and     519    are ignored, data is written to out_fd and return is RETVAL_OK or error.
522 */                                                520 */
523                                                   521 
524 static int INIT read_bunzip(struct bunzip_data    522 static int INIT read_bunzip(struct bunzip_data *bd, char *outbuf, int len)
525 {                                                 523 {
526         const unsigned int *dbuf;                 524         const unsigned int *dbuf;
527         int pos, xcurrent, previous, gotcount;    525         int pos, xcurrent, previous, gotcount;
528                                                   526 
529         /* If last read was short due to end o    527         /* If last read was short due to end of file, return last block now */
530         if (bd->writeCount < 0)                   528         if (bd->writeCount < 0)
531                 return bd->writeCount;            529                 return bd->writeCount;
532                                                   530 
533         gotcount = 0;                             531         gotcount = 0;
534         dbuf = bd->dbuf;                          532         dbuf = bd->dbuf;
535         pos = bd->writePos;                       533         pos = bd->writePos;
536         xcurrent = bd->writeCurrent;              534         xcurrent = bd->writeCurrent;
537                                                   535 
538         /* We will always have pending decoded    536         /* We will always have pending decoded data to write into the output
539            buffer unless this is the very firs    537            buffer unless this is the very first call (in which case we haven't
540            Huffman-decoded a block into the in    538            Huffman-decoded a block into the intermediate buffer yet). */
541                                                   539 
542         if (bd->writeCopies) {                    540         if (bd->writeCopies) {
543                 /* Inside the loop, writeCopie    541                 /* Inside the loop, writeCopies means extra copies (beyond 1) */
544                 --bd->writeCopies;                542                 --bd->writeCopies;
545                 /* Loop outputting bytes */       543                 /* Loop outputting bytes */
546                 for (;;) {                        544                 for (;;) {
547                         /* If the output buffe    545                         /* If the output buffer is full, snapshot
548                          * state and return */    546                          * state and return */
549                         if (gotcount >= len) {    547                         if (gotcount >= len) {
550                                 bd->writePos =    548                                 bd->writePos = pos;
551                                 bd->writeCurre    549                                 bd->writeCurrent = xcurrent;
552                                 bd->writeCopie    550                                 bd->writeCopies++;
553                                 return len;       551                                 return len;
554                         }                         552                         }
555                         /* Write next byte int    553                         /* Write next byte into output buffer, updating CRC */
556                         outbuf[gotcount++] = x    554                         outbuf[gotcount++] = xcurrent;
557                         bd->writeCRC = (((bd->    555                         bd->writeCRC = (((bd->writeCRC) << 8)
558                                 ^bd->crc32Tabl    556                                 ^bd->crc32Table[((bd->writeCRC) >> 24)
559                                 ^xcurrent]);      557                                 ^xcurrent]);
560                         /* Loop now if we're o    558                         /* Loop now if we're outputting multiple
561                          * copies of this byte    559                          * copies of this byte */
562                         if (bd->writeCopies) {    560                         if (bd->writeCopies) {
563                                 --bd->writeCop    561                                 --bd->writeCopies;
564                                 continue;         562                                 continue;
565                         }                         563                         }
566 decode_next_byte:                                 564 decode_next_byte:
567                         if (!bd->writeCount--)    565                         if (!bd->writeCount--)
568                                 break;            566                                 break;
569                         /* Follow sequence vec    567                         /* Follow sequence vector to undo
570                          * Burrows-Wheeler tra    568                          * Burrows-Wheeler transform */
571                         previous = xcurrent;      569                         previous = xcurrent;
572                         pos = dbuf[pos];          570                         pos = dbuf[pos];
573                         xcurrent = pos&0xff;      571                         xcurrent = pos&0xff;
574                         pos >>= 8;                572                         pos >>= 8;
575                         /* After 3 consecutive    573                         /* After 3 consecutive copies of the same
576                            byte, the 4th is a     574                            byte, the 4th is a repeat count.  We count
577                            down from 4 instead    575                            down from 4 instead *of counting up because
578                            testing for non-zer    576                            testing for non-zero is faster */
579                         if (--bd->writeRunCoun    577                         if (--bd->writeRunCountdown) {
580                                 if (xcurrent !    578                                 if (xcurrent != previous)
581                                         bd->wr    579                                         bd->writeRunCountdown = 4;
582                         } else {                  580                         } else {
583                                 /* We have a r    581                                 /* We have a repeated run, this byte
584                                  * indicates t    582                                  * indicates the count */
585                                 bd->writeCopie    583                                 bd->writeCopies = xcurrent;
586                                 xcurrent = pre    584                                 xcurrent = previous;
587                                 bd->writeRunCo    585                                 bd->writeRunCountdown = 5;
588                                 /* Sometimes t    586                                 /* Sometimes there are just 3 bytes
589                                  * (run length    587                                  * (run length 0) */
590                                 if (!bd->write    588                                 if (!bd->writeCopies)
591                                         goto d    589                                         goto decode_next_byte;
592                                 /* Subtract th    590                                 /* Subtract the 1 copy we'd output
593                                  * anyway to g    591                                  * anyway to get extras */
594                                 --bd->writeCop    592                                 --bd->writeCopies;
595                         }                         593                         }
596                 }                                 594                 }
597                 /* Decompression of this block    595                 /* Decompression of this block completed successfully */
598                 bd->writeCRC = ~bd->writeCRC;     596                 bd->writeCRC = ~bd->writeCRC;
599                 bd->totalCRC = ((bd->totalCRC     597                 bd->totalCRC = ((bd->totalCRC << 1) |
600                                 (bd->totalCRC     598                                 (bd->totalCRC >> 31)) ^ bd->writeCRC;
601                 /* If this block had a CRC err    599                 /* If this block had a CRC error, force file level CRC error. */
602                 if (bd->writeCRC != bd->header    600                 if (bd->writeCRC != bd->headerCRC) {
603                         bd->totalCRC = bd->hea    601                         bd->totalCRC = bd->headerCRC+1;
604                         return RETVAL_LAST_BLO    602                         return RETVAL_LAST_BLOCK;
605                 }                                 603                 }
606         }                                         604         }
607                                                   605 
608         /* Refill the intermediate buffer by H    606         /* Refill the intermediate buffer by Huffman-decoding next
609          * block of input */                      607          * block of input */
610         /* (previous is just a convenient unus    608         /* (previous is just a convenient unused temp variable here) */
611         previous = get_next_block(bd);            609         previous = get_next_block(bd);
612         if (previous) {                           610         if (previous) {
613                 bd->writeCount = previous;        611                 bd->writeCount = previous;
614                 return (previous != RETVAL_LAS    612                 return (previous != RETVAL_LAST_BLOCK) ? previous : gotcount;
615         }                                         613         }
616         bd->writeCRC = 0xffffffffUL;              614         bd->writeCRC = 0xffffffffUL;
617         pos = bd->writePos;                       615         pos = bd->writePos;
618         xcurrent = bd->writeCurrent;              616         xcurrent = bd->writeCurrent;
619         goto decode_next_byte;                    617         goto decode_next_byte;
620 }                                                 618 }
621                                                   619 
622 static long INIT nofill(void *buf, unsigned lo    620 static long INIT nofill(void *buf, unsigned long len)
623 {                                                 621 {
624         return -1;                                622         return -1;
625 }                                                 623 }
626                                                   624 
627 /* Allocate the structure, read file header.      625 /* Allocate the structure, read file header.  If in_fd ==-1, inbuf must contain
628    a complete bunzip file (len bytes long).  I    626    a complete bunzip file (len bytes long).  If in_fd!=-1, inbuf and len are
629    ignored, and data is read from file handle     627    ignored, and data is read from file handle into temporary buffer. */
630 static int INIT start_bunzip(struct bunzip_dat    628 static int INIT start_bunzip(struct bunzip_data **bdp, void *inbuf, long len,
631                              long (*fill)(void    629                              long (*fill)(void*, unsigned long))
632 {                                                 630 {
633         struct bunzip_data *bd;                   631         struct bunzip_data *bd;
634         unsigned int i, j, c;                     632         unsigned int i, j, c;
635         const unsigned int BZh0 =                 633         const unsigned int BZh0 =
636                 (((unsigned int)'B') << 24)+((    634                 (((unsigned int)'B') << 24)+(((unsigned int)'Z') << 16)
637                 +(((unsigned int)'h') << 8)+(u    635                 +(((unsigned int)'h') << 8)+(unsigned int)'';
638                                                   636 
639         /* Figure out how much data to allocat    637         /* Figure out how much data to allocate */
640         i = sizeof(struct bunzip_data);           638         i = sizeof(struct bunzip_data);
641                                                   639 
642         /* Allocate bunzip_data.  Most fields     640         /* Allocate bunzip_data.  Most fields initialize to zero. */
643         bd = *bdp = malloc(i);                    641         bd = *bdp = malloc(i);
644         if (!bd)                                  642         if (!bd)
645                 return RETVAL_OUT_OF_MEMORY;      643                 return RETVAL_OUT_OF_MEMORY;
646         memset(bd, 0, sizeof(struct bunzip_dat    644         memset(bd, 0, sizeof(struct bunzip_data));
647         /* Setup input buffer */                  645         /* Setup input buffer */
648         bd->inbuf = inbuf;                        646         bd->inbuf = inbuf;
649         bd->inbufCount = len;                     647         bd->inbufCount = len;
650         if (fill != NULL)                         648         if (fill != NULL)
651                 bd->fill = fill;                  649                 bd->fill = fill;
652         else                                      650         else
653                 bd->fill = nofill;                651                 bd->fill = nofill;
654                                                   652 
655         /* Init the CRC32 table (big endian) *    653         /* Init the CRC32 table (big endian) */
656         for (i = 0; i < 256; i++) {               654         for (i = 0; i < 256; i++) {
657                 c = i << 24;                      655                 c = i << 24;
658                 for (j = 8; j; j--)               656                 for (j = 8; j; j--)
659                         c = c&0x80000000 ? (c  !! 657                         c = c&0x80000000 ? (c << 1)^0x04c11db7 : (c << 1);
660                 bd->crc32Table[i] = c;            658                 bd->crc32Table[i] = c;
661         }                                         659         }
662                                                   660 
663         /* Ensure that file starts with "BZh['    661         /* Ensure that file starts with "BZh['1'-'9']." */
664         i = get_bits(bd, 32);                     662         i = get_bits(bd, 32);
665         if (((unsigned int)(i-BZh0-1)) >= 9)      663         if (((unsigned int)(i-BZh0-1)) >= 9)
666                 return RETVAL_NOT_BZIP_DATA;      664                 return RETVAL_NOT_BZIP_DATA;
667                                                   665 
668         /* Fourth byte (ascii '1'-'9'), indica    666         /* Fourth byte (ascii '1'-'9'), indicates block size in units of 100k of
669            uncompressed data.  Allocate interm    667            uncompressed data.  Allocate intermediate buffer for block. */
670         bd->dbufSize = 100000*(i-BZh0);           668         bd->dbufSize = 100000*(i-BZh0);
671                                                   669 
672         bd->dbuf = large_malloc(bd->dbufSize *    670         bd->dbuf = large_malloc(bd->dbufSize * sizeof(int));
673         if (!bd->dbuf)                            671         if (!bd->dbuf)
674                 return RETVAL_OUT_OF_MEMORY;      672                 return RETVAL_OUT_OF_MEMORY;
675         return RETVAL_OK;                         673         return RETVAL_OK;
676 }                                                 674 }
677                                                   675 
678 /* Example usage: decompress src_fd to dst_fd.    676 /* Example usage: decompress src_fd to dst_fd.  (Stops at end of bzip2 data,
679    not end of file.) */                           677    not end of file.) */
680 STATIC int INIT bunzip2(unsigned char *buf, lo    678 STATIC int INIT bunzip2(unsigned char *buf, long len,
681                         long (*fill)(void*, un    679                         long (*fill)(void*, unsigned long),
682                         long (*flush)(void*, u    680                         long (*flush)(void*, unsigned long),
683                         unsigned char *outbuf,    681                         unsigned char *outbuf,
684                         long *pos,                682                         long *pos,
685                         void(*error)(char *x))    683                         void(*error)(char *x))
686 {                                                 684 {
687         struct bunzip_data *bd;                   685         struct bunzip_data *bd;
688         int i = -1;                               686         int i = -1;
689         unsigned char *inbuf;                     687         unsigned char *inbuf;
690                                                   688 
691         if (flush)                                689         if (flush)
692                 outbuf = malloc(BZIP2_IOBUF_SI    690                 outbuf = malloc(BZIP2_IOBUF_SIZE);
693                                                   691 
694         if (!outbuf) {                            692         if (!outbuf) {
695                 error("Could not allocate outp    693                 error("Could not allocate output buffer");
696                 return RETVAL_OUT_OF_MEMORY;      694                 return RETVAL_OUT_OF_MEMORY;
697         }                                         695         }
698         if (buf)                                  696         if (buf)
699                 inbuf = buf;                      697                 inbuf = buf;
700         else                                      698         else
701                 inbuf = malloc(BZIP2_IOBUF_SIZ    699                 inbuf = malloc(BZIP2_IOBUF_SIZE);
702         if (!inbuf) {                             700         if (!inbuf) {
703                 error("Could not allocate inpu    701                 error("Could not allocate input buffer");
704                 i = RETVAL_OUT_OF_MEMORY;         702                 i = RETVAL_OUT_OF_MEMORY;
705                 goto exit_0;                      703                 goto exit_0;
706         }                                         704         }
707         i = start_bunzip(&bd, inbuf, len, fill    705         i = start_bunzip(&bd, inbuf, len, fill);
708         if (!i) {                                 706         if (!i) {
709                 for (;;) {                        707                 for (;;) {
710                         i = read_bunzip(bd, ou    708                         i = read_bunzip(bd, outbuf, BZIP2_IOBUF_SIZE);
711                         if (i <= 0)               709                         if (i <= 0)
712                                 break;            710                                 break;
713                         if (!flush)               711                         if (!flush)
714                                 outbuf += i;      712                                 outbuf += i;
715                         else                      713                         else
716                                 if (i != flush    714                                 if (i != flush(outbuf, i)) {
717                                         i = RE    715                                         i = RETVAL_UNEXPECTED_OUTPUT_EOF;
718                                         break;    716                                         break;
719                                 }                 717                                 }
720                 }                                 718                 }
721         }                                         719         }
722         /* Check CRC and release memory */        720         /* Check CRC and release memory */
723         if (i == RETVAL_LAST_BLOCK) {             721         if (i == RETVAL_LAST_BLOCK) {
724                 if (bd->headerCRC != bd->total    722                 if (bd->headerCRC != bd->totalCRC)
725                         error("Data integrity     723                         error("Data integrity error when decompressing.");
726                 else                              724                 else
727                         i = RETVAL_OK;            725                         i = RETVAL_OK;
728         } else if (i == RETVAL_UNEXPECTED_OUTP    726         } else if (i == RETVAL_UNEXPECTED_OUTPUT_EOF) {
729                 error("Compressed file ends un    727                 error("Compressed file ends unexpectedly");
730         }                                         728         }
731         if (!bd)                                  729         if (!bd)
732                 goto exit_1;                      730                 goto exit_1;
733         if (bd->dbuf)                             731         if (bd->dbuf)
734                 large_free(bd->dbuf);             732                 large_free(bd->dbuf);
735         if (pos)                                  733         if (pos)
736                 *pos = bd->inbufPos;              734                 *pos = bd->inbufPos;
737         free(bd);                                 735         free(bd);
738 exit_1:                                           736 exit_1:
739         if (!buf)                                 737         if (!buf)
740                 free(inbuf);                      738                 free(inbuf);
741 exit_0:                                           739 exit_0:
742         if (flush)                                740         if (flush)
743                 free(outbuf);                     741                 free(outbuf);
744         return i;                                 742         return i;
745 }                                                 743 }
746                                                   744 
747 #ifdef PREBOOT                                    745 #ifdef PREBOOT
748 STATIC int INIT __decompress(unsigned char *bu    746 STATIC int INIT __decompress(unsigned char *buf, long len,
749                         long (*fill)(void*, un    747                         long (*fill)(void*, unsigned long),
750                         long (*flush)(void*, u    748                         long (*flush)(void*, unsigned long),
751                         unsigned char *outbuf,    749                         unsigned char *outbuf, long olen,
752                         long *pos,                750                         long *pos,
753                         void (*error)(char *x)    751                         void (*error)(char *x))
754 {                                                 752 {
755         return bunzip2(buf, len - 4, fill, flu    753         return bunzip2(buf, len - 4, fill, flush, outbuf, pos, error);
756 }                                                 754 }
757 #endif                                            755 #endif
758                                                   756
~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~
Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.
TOMOYO Linux Cross Reference Linux/lib/decompress_bunzip2.c

Diff markup

Differences between /lib/decompress_bunzip2.c (Version linux-6.12-rc7) and /lib/decompress_bunzip2.c (Version linux-4.15.18)

TOMOYO Linux Cross Reference
Linux/lib/decompress_bunzip2.c