Image Packaging & Compression

The Raster Data stream that represents the actual output image can be
represented as:

7 6 5 4 3 2 1 0
┌───────────────┐
│ code size │
├───────────────┤ ───┐
│blok byte count│ │
├───────────────┤ │
│ │ ├── Repeated as many times as necessary
│ data bytes │ │
│ │ │
├───────────────┤ ───┛
| . . . . |
├───────────────┤
│0 0 0 0 0 0 0 0│ zero byte count (terminates data stream)
└───────────────┛

The conversion of the image from a series of pixel values to a
transmitted or stored character stream involves several steps. In brief
these steps are:

1. Establish the Code Size - Define the number of bits needed to
represent the actual data.

2. Compress the Data - Compress the series of image pixels to a series
of compression codes.

3. Build a Series of Bytes - Take the set of compression codes and
convert to a string of 8-bit bytes.

4. Package the Bytes - Package sets of bytes into blocks preceeded by
character counts and output.



ESTABLISH CODE SIZE
The first byte of the GIF Raster Data stream is a value indicating the
minimum number of bits required to represent the set of actual pixel
values. Normally this will be the same as the number of colour bits.
Because of some algorithmic constraints however, black & white images
which have one colour bit must be indicated as having a code size of 2.
This code size value also implies that the compression codes must start
out one bit longer.


COMPRESSION
The LZW algorithm converts a series of data values into a series of
codes which may be raw values or a code designating a series of values.
Using text characters as an analogy, the output code consists of a
character or a code representing a string of characters.

The LZW algorithm used in GIF matches algorithmically with the
standard LZW algorithm with the following differences:

1. A special Clear code is defined which resets all
compression/decompression parameters and tables to a start-up state.
The value of this code is 2**<code size>. For example if the code
size indicated was 4 (image was 4 bits/pixel) the Clear code value
would be 16 (10000 binary). The Clear code can appear at any point
in the image data stream and therefore requires the LZW algorithm to
process succeeding codes as if a new data stream was starting.
Encoders should output a Clear code as the first code of each image
data stream.

2. An End of Information code is defined that explicitly indicates the
end of the image data stream. LZW processing terminates when this
code is encountered. It must be the last code output by the encoder
for an image. The value of this code is <Clear code>+1.

3. The first available compression code value is <Clear code>+2.

4. The output codes are of variable length, starting at <code size>+1
bits per code, up to 12 bits per code. This defines a maximum code
value of 4095 (hex FFF). Whenever the LZW code value would exceed
the current code length, the code length is increased by one. The
packing/unpacking of these codes must then be altered to reflect the
new code length.


BUILD 8-BIT BYTES
Because the LZW compression used for GIF creates a series of
variable length codes, of between 3 and 12 bits each, these codes must
be reformed into a series of 8-bit bytes that will be the characters
actually stored or transmitted. This provides additional compression of
the image. The codes are formed into a stream of bits as if they were
packed right to left and then picked off 8 bits at a time to be output.
Assuming a character array of 8 bits per character and using 5 bit codes
to be packed, an example layout would be similar to:

byte n byte 5 byte 4 byte 3 byte 2 byte 1
┌─-----─────┬────────┬────────┬────────┬────────┬────────┐
│ and so on │hhhhhggg│ggfffffe│eeeedddd│dcccccbb│bbbaaaaa│
└─-----─────┴────────┴────────┴────────┴────────┴────────┛

Note that the physical packing arrangement will change as the
number of bits per compression code change but the concept remains the
same.

PACKAGE THE BYTES
Once the bytes have been created, they are grouped into blocks for
output by preceeding each block of 0 to 255 bytes with a character count
byte. A block with a zero byte count terminates the Raster Data stream
for a given image. These blocks are what are actually output for the

GIF image. This block format has the side effect of allowing a decoding
program the ability to read past the actual image data if necessary by
reading block counts and then skipping over the data.