johnm@trsvax.UUCP (03/10/88)
I looked over the docs that I have on GIF and decided that there was no good reason not to post them (like big legal notices saying, "Distribute this and you die" :-) so here they are. Cut out the following and unshar it to get the three files. The first details the actual picture standard and the last two explain the LZW compression that is applied to the pictures. Have fun with the info! ----------------------------------/Cut here/------------------------------------ #!/bin/sh # This is a shell archive, meaning: # 1. Remove everything above the #!/bin/sh line. # 2. Save the resulting text in a file. # 3. Execute the file with /bin/sh (not csh) to create the files: # gifstd.txt # howgif.doc # lzwexp.txt # This archive created: Wed Mar 9 13:33:22 1988 PATH=/bin:$PATH; export PATH if test -f 'gifstd.txt' then echo shar: over-writing existing file "'gifstd.txt'" fi cat << \SHAR_EOF > 'gifstd.txt' G I F (tm) Graphics Interchange Format (tm) A standard defining a mechanism for the storage and transmission of raster-based graphics information June 15, 1987 (c) CompuServe Incorporated, 1987 All rights reserved While this document is copyrighted, the information contained within is made available for use in computer software without royalties, or licensing restrictions. GIF and 'Graphics Interchange Format' are trademarks of CompuServe, Incorporated. an H&R Block Company 5000 Arlington Centre Blvd. Columbus, Ohio 43220 (614) 457-8600 Page 2 Graphics Interchange Format (GIF) Specification Table of Contents INTRODUCTION . . . . . . . . . . . . . . . . . page 3 GENERAL FILE FORMAT . . . . . . . . . . . . . page 3 GIF SIGNATURE . . . . . . . . . . . . . . . . page 4 SCREEN DESCRIPTOR . . . . . . . . . . . . . . page 4 GLOBAL COLOR MAP . . . . . . . . . . . . . . . page 5 IMAGE DESCRIPTOR . . . . . . . . . . . . . . . page 6 LOCAL COLOR MAP . . . . . . . . . . . . . . . page 7 RASTER DATA . . . . . . . . . . . . . . . . . page 7 GIF TERMINATOR . . . . . . . . . . . . . . . . page 8 GIF EXTENSION BLOCKS . . . . . . . . . . . . . page 8 APPENDIX A - GLOSSARY . . . . . . . . . . . . page 9 APPENDIX B - INTERACTIVE SEQUENCES . . . . . . page 10 APPENDIX C - IMAGE PACKAGING & COMPRESSION . . page 12 APPENDIX D - MULTIPLE IMAGE PROCESSING . . . . page 15 Graphics Interchange Format (GIF) Page 3 Specification INTRODUCTION 'GIF' (tm) is CompuServe's standard for defining generalized color raster images. This 'Graphics Interchange Format' (tm) allows high-quality, high-resolution graphics to be displayed on a variety of graphics hardware and is intended as an exchange and display mechanism for graphics images. The image format described in this document is designed to support current and future image technology and will in addition serve as a basis for future CompuServe graphics products. The main focus of this document is to provide the technical information necessary for a programmer to implement GIF encoders and decoders. As such, some assumptions are made as to terminology relavent to graphics and programming in general. The first section of this document describes the GIF data format and its components and applies to all GIF decoders, either as standalone programs or as part of a communications package. Appendix B is a section relavent to decoders that are part of a communications software package and describes the protocol requirements for entering and exiting GIF mode, and responding to host interrogations. A glossary in Appendix A defines some of the terminology used in this document. Appendix C gives a detailed explanation of how the graphics image itself is packaged as a series of data bytes. Graphics Interchange Format Data Definition GENERAL FILE FORMAT +-----------------------+ | +-------------------+ | | | GIF Signature | | | +-------------------+ | | +-------------------+ | | | Screen Descriptor | | | +-------------------+ | | +-------------------+ | | | Global Color Map | | | +-------------------+ | . . . . . . | +-------------------+ | ---+ | | Image Descriptor | | | | +-------------------+ | | | +-------------------+ | | | | Local Color Map | | |- Repeated 1 to n times | +-------------------+ | | | +-------------------+ | | | | Raster Data | | | | +-------------------+ | ---+ . . . . . . |- GIF Terminator -| +-----------------------+ Graphics Interchange Format (GIF) Page 4 Specification GIF SIGNATURE The following GIF Signature identifies the data following as a valid GIF image stream. It consists of the following six characters: G I F 8 7 a The last three characters '87a' may be viewed as a version number for this particular GIF definition and will be used in general as a reference in documents regarding GIF that address any version dependencies. SCREEN DESCRIPTOR The Screen Descriptor describes the overall parameters for all GIF images following. It defines the overall dimensions of the image space or logical screen required, the existance of color mapping information, background screen color, and color depth information. This information is stored in a series of 8-bit bytes as described below. bits 7 6 5 4 3 2 1 0 Byte # +---------------+ | | 1 +-Screen Width -+ Raster width in pixels (LSB first) | | 2 +---------------+ | | 3 +-Screen Height-+ Raster height in pixels (LSB first) | | 4 +-+-----+-+-----+ M = 1, Global color map follows Descriptor |M| cr |0|pixel| 5 cr+1 = # bits of color resolution +-+-----+-+-----+ pixel+1 = # bits/pixel in image | background | 6 background=Color index of screen background +---------------+ (color is defined from the Global color |0 0 0 0 0 0 0 0| 7 map or default map if none specified) +---------------+ The logical screen width and height can both be larger than the physical display. How images larger than the physical display are handled is implementation dependent and can take advantage of hardware characteristics (e.g. Macintosh scrolling windows). Otherwise images can be clipped to the edges of the display. The value of 'pixel' also defines the maximum number of colors within an image. The range of values for 'pixel' is 0 to 7 which represents 1 to 8 bits. This translates to a range of 2 (B & W) to 256 colors. Bit 3 of word 5 is reserved for future definition and must be zero. Graphics Interchange Format (GIF) Page 5 Specification GLOBAL COLOR MAP The Global Color Map is optional but recommended for images where accurate color rendition is desired. The existence of this color map is indicated in the 'M' field of byte 5 of the Screen Descriptor. A color map can also be associated with each image in a GIF file as described later. However this global map will normally be used because of hardware restrictions in equipment available today. In the individual Image Descriptors the 'M' flag will normally be zero. If the Global Color Map is present, it's definition immediately follows the Screen Descriptor. The number of color map entries following a Screen Descriptor is equal to 2**(# bits per pixel), where each entry consists of three byte values representing the relative intensities of red, green and blue respectively. The structure of the Color Map block is: bits 7 6 5 4 3 2 1 0 Byte # +---------------+ | red intensity | 1 Red value for color index 0 +---------------+ |green intensity| 2 Green value for color index 0 +---------------+ | blue intensity| 3 Blue value for color index 0 +---------------+ | red intensity | 4 Red value for color index 1 +---------------+ |green intensity| 5 Green value for color index 1 +---------------+ | blue intensity| 6 Blue value for color index 1 +---------------+ : : (Continues for remaining colors) Each image pixel value received will be displayed according to its closest match with an available color of the display based on this color map. The color components represent a fractional intensity value from none (0) to full (255). White would be represented as (255,255,255), black as (0,0,0) and medium yellow as (180,180,0). For display, if the device supports fewer than 8 bits per color component, the higher order bits of each component are used. In the creation of a GIF color map entry with hardware supporting fewer than 8 bits per component, the component values for the hardware should be converted to the 8-bit format with the following calculation: <map_value> = <component_value>*255/(2**<nbits> -1) This assures accurate translation of colors for all displays. In the cases of creating GIF images from hardware without color palette capability, a fixed palette should be created based on the available display colors for that hardware. If no Global Color Map is indicated, a default color map is generated internally which maps each possible incoming color index to the same hardware color index modulo <n> where <n> is the number of available hardware colors. Graphics Interchange Format (GIF) Page 6 Specification IMAGE DESCRIPTOR The Image Descriptor defines the actual placement and extents of the following image within the space defined in the Screen Descriptor. Also defined are flags to indicate the presence of a local color lookup map, and to define the pixel display sequence. Each Image Descriptor is introduced by an image separator character. The role of the Image Separator is simply to provide a synchronization character to introduce an Image Descriptor. This is desirable if a GIF file happens to contain more than one image. This character is defined as 0x2C hex or ',' (comma). When this character is encountered between images, the Image Descriptor will follow immediately. Any characters encountered between the end of a previous image and the image separator character are to be ignored. This allows future GIF enhancements to be present in newer image formats and yet ignored safely by older software decoders. bits 7 6 5 4 3 2 1 0 Byte # +---------------+ |0 0 1 0 1 1 0 0| 1 ',' - Image separator character +---------------+ | | 2 Start of image in pixels from the +- Image Left -+ left side of the screen (LSB first) | | 3 +---------------+ | | 4 +- Image Top -+ Start of image in pixels from the | | 5 top of the screen (LSB first) +---------------+ | | 6 +- Image Width -+ Width of the image in pixels (LSB first) | | 7 +---------------+ | | 8 +- Image Height-+ Height of the image in pixels (LSB first) | | 9 +-+-+-+-+-+-----+ M=0 - Use global color map, ignore 'pixel' |M|I|0|0|0|pixel| 10 M=1 - Local color map follows, use 'pixel' +-+-+-+-+-+-----+ I=0 - Image formatted in Sequential order I=1 - Image formatted in Interlaced order pixel+1 - # bits per pixel for this image The specifications for the image position and size must be confined to the dimensions defined by the Screen Descriptor. On the other hand it is not necessary that the image fill the entire screen defined. LOCAL COLOR MAP Graphics Interchange Format (GIF) Page 7 Specification A Local Color Map is optional and defined here for future use. If the 'M' bit of byte 10 of the Image Descriptor is set, then a color map follows the Image Descriptor that applies only to the following image. At the end of the image, the color map will revert to that defined after the Screen Descriptor. Note that the 'pixel' field of byte 10 of the Image Descriptor is used only if a Local Color Map is indicated. This defines the parameters not only for the image pixel size, but determines the number of color map entries that follow. The bits per pixel value will also revert to the value specified in the Screen Descriptor when processing of the image is complete. RASTER DATA The format of the actual image is defined as the series of pixel color index values that make up the image. The pixels are stored left to right sequentially for an image row. By default each image row is written sequentially, top to bottom. In the case that the Interlace or 'I' bit is set in byte 10 of the Image Descriptor then the row order of the image display follows a four-pass process in which the image is filled in by widely spaced rows. The first pass writes every 8th row, starting with the top row of the image window. The second pass writes every 8th row starting at the fifth row from the top. The third pass writes every 4th row starting at the third row from the top. The fourth pass completes the image, writing every other row, starting at the second row from the top. A graphic description of this process follows: Image Row Pass 1 Pass 2 Pass 3 Pass 4 Result --------------------------------------------------- 0 **1a** **1a** 1 **4a** **4a** 2 **3a** **3a** 3 **4b** **4b** 4 **2a** **2a** 5 **4c** **4c** 6 **3b** **3b** 7 **4d** **4d** 8 **1b** **1b** 9 **4e** **4e** 10 **3c** **3c** 11 **4f** **4f** 12 **2b** **2b** . . . The image pixel values are processed as a series of color indices which map into the existing color map. The resulting color value from the map is what is actually displayed. This series of pixel indices, the number of which is equal to image-width*image-height pixels, are passed to the GIF image data stream one value per pixel, compressed and packaged according to a version of the LZW compression algorithm as defined in Appendix C. Graphics Interchange Format (GIF) Page 8 Specification GIF TERMINATOR In order to provide a synchronization for the termination of a GIF image file, a GIF decoder will process the end of GIF mode when the character 0x3B hex or ';' is found after an image has been processed. By convention the decoding software will pause and wait for an action indicating that the user is ready to continue. This may be a carriage return entered at the keyboard or a mouse click. For interactive applications this user action must be passed on to the host as a carriage return character so that the host application can continue. The decoding software will then typically leave graphics mode and resume any previous process. GIF EXTENSION BLOCKS To provide for orderly extension of the GIF definition, a mechanism for defining the packaging of extensions within a GIF data stream is necessary. Specific GIF extensions are to be defined and documented by CompuServe in order to provide a controlled enhancement path. GIF Extension Blocks are packaged in a manner similar to that used by the raster data though not compressed. The basic structure is: 7 6 5 4 3 2 1 0 Byte # +---------------+ |0 0 1 0 0 0 0 1| 1 '!' - GIF Extension Block Introducer +---------------+ | function code | 2 Extension function code (0 to 255) +---------------+ ---+ | byte count | | +---------------+ | : : +-- Repeated as many times as necessary |func data bytes| | : : | +---------------+ ---+ . . . . . . +---------------+ |0 0 0 0 0 0 0 0| zero byte count (terminates block) +---------------+ A GIF Extension Block may immediately preceed any Image Descriptor or occur before the GIF Terminator. All GIF decoders must be able to recognize the existence of GIF Extension Blocks and read past them if unable to process the function code. This ensures that older decoders will be able to process extended GIF image files in the future, though without the additional functionality. Graphics Interchange Format (GIF) Page 9 Appendix A - Glossary GLOSSARY Pixel - The smallest picture element of a graphics image. This usually corresponds to a single dot on a graphics screen. Image resolution is typically given in units of pixels. For example a fairly standard graphics screen format is one 320 pixels across and 200 pixels high. Each pixel can appear as one of several colors depending on the capabilities of the graphics hardware. Raster - A horizontal row of pixels representing one line of an image. A typical method of working with images since most hardware is oriented to work most efficiently in this manner. LSB - Least Significant Byte. Refers to a convention for two byte numeric values in which the less significant byte of the value preceeds the more significant byte. This convention is typical on many microcomputers. Color Map - The list of definitions of each color used in a GIF image. These desired colors are converted to available colors through a table which is derived by assigning an incoming color index (from the image) to an output color index (of the hardware). While the color map definitons are specified in a GIF image, the output pixel colors will vary based on the hardware used and its ability to match the defined color. Interlace - The method of displaying a GIF image in which multiple passes are made, outputting raster lines spaced apart to provide a way of visualizing the general content of an entire image before all of the data has been processed. B Protocol - A CompuServe-developed error-correcting file transfer protocol available in the public domain and implemented in CompuServe VIDTEX products. This error checking mechanism will be used in transfers of GIF images for interactive applications. LZW - A sophisticated data compression algorithm based on work done by Lempel-Ziv & Welch which has the feature of very efficient one-pass encoding and decoding. This allows the image to be decompressed and displayed at the same time. The original article from which this technique was adapted is: Terry A. Welch, "A Technique for High Performance Data Compression", IEEE Computer, vol 17 no 6 (June 1984) This basic algorithm is also used in the public domain ARC file compression utilities. The CompuServe adaptation of LZW for GIF is described in Appendix C. Graphics Interchange Format (GIF) Page 10 Appendix B - Interactive Sequences GIF Sequence Exchanges for an Interactive Environment The following sequences are defined for use in mediating control between a GIF sender and GIF receiver over an interactive communications line. These sequences do not apply to applications that involve downloading of static GIF files and are not considered part of a GIF file. GIF CAPABILITIES ENQUIRY The GCE sequence is issued from a host and requests an interactive GIF decoder to return a response message that defines the graphics parameters for the decoder. This involves returning information about available screen sizes, number of bits/color supported and the amount of color detail supported. The escape sequence for the GCE is defined as: ESC [ > 0 g (g is lower case, spaces inserted for clarity) (0x1B 0x5B 0x3E 0x30 0x67) GIF CAPABILITIES RESPONSE The GIF Capabilities Response message is returned by an interactive GIF decoder and defines the decoder's display capabilities for all graphics modes that are supported by the software. Note that this can also include graphics printers as well as a monitor screen. The general format of this message is: #version;protocol{;dev, width, height, color-bits, color-res}... <CR> '#' - GCR identifier character (Number Sign) version - GIF format version number; initially '87a' protocol='0' - No end-to-end protocol supported by decoder Transfer as direct 8-bit data stream. protocol='1' - Can use an error correction protocol to transfer GIF data interactively from the host directly to the display. dev = '0' - Screen parameter set follows dev = '1' - Printer parameter set follows width - Maximum supported display width in pixels height - Maximum supported display height in pixels color-bits - Number of bits per pixel supported. The number of supported colors is therefore 2**color-bits. color-res - Number of bits per color component supported in the hardware color palette. If color-res is '0' then no hardware palette table is available. Note that all values in the GCR are returned as ASCII decimal numbers and the message is terminated by a Carriage Return character. Graphics Interchange Format (GIF) Page 11 Appendix B - Interactive Sequences The following GCR message describes three standard EGA configurations with no printer; the GIF data stream can be processed within an error correcting protocol: #87a;1 ;0,320,200,4,0 ;0,640,200,2,2 ;0,640,350,4,2<CR> ENTER GIF GRAPHICS MODE Two sequences are currently defined to invoke an interactive GIF decoder into action. The only difference between them is that different output media are selected. These sequences are: ESC [ > 1 g Display GIF image on screen (0x1B 0x5B 0x3E 0x31 0x67) ESC [ > 2 g Display image directly to an attached graphics printer. The image may optionally be displayed on the screen as well. (0x1B 0x5B 0x3E 0x32 0x67) Note that the 'g' character terminating each sequence is in lower case. INTERACTIVE ENVIRONMENT The assumed environment for the transmission of GIF image data from an interactive application is a full 8-bit data stream from host to micro. All 256 character codes must be transferrable. The establishing of an 8-bit data path for communications will normally be taken care of by the host application programs. It is however up to the receiving communications programs supporting GIF to be able to receive and pass on all 256 8-bit codes to the GIF decoder software. Graphics Interchange Format (GIF) Page 12 Appendix C - Image Packaging & Compression The Raster Data stream that represents the actual output image can be represented as: 7 6 5 4 3 2 1 0 +---------------+ | code size | +---------------+ ---+ |blok byte count| | +---------------+ | : : +-- Repeated as many times as necessary | data bytes | | : : | +---------------+ ---+ . . . . . . +---------------+ |0 0 0 0 0 0 0 0| zero byte count (terminates data stream) +---------------+ The conversion of the image from a series of pixel values to a transmitted or stored character stream involves several steps. In brief these steps are: 1. Establish the Code Size - Define the number of bits needed to represent the actual data. 2. Compress the Data - Compress the series of image pixels to a series of compression codes. 3. Build a Series of Bytes - Take the set of compression codes and convert to a string of 8-bit bytes. 4. Package the Bytes - Package sets of bytes into blocks preceeded by character counts and output. ESTABLISH CODE SIZE The first byte of the GIF Raster Data stream is a value indicating the minimum number of bits required to represent the set of actual pixel values. Normally this will be the same as the number of color bits. Because of some algorithmic constraints however, black & white images which have one color bit must be indicated as having a code size of 2. This code size value also implies that the compression codes must start out one bit longer. COMPRESSION The LZW algorithm converts a series of data values into a series of codes which may be raw values or a code designating a series of values. Using text characters as an analogy, the output code consists of a character or a code representing a string of characters. Graphics Interchange Format (GIF) Page 13 Appendix C - Image Packaging & Compression The LZW algorithm used in GIF matches algorithmically with the standard LZW algorithm with the following differences: 1. A special Clear code is defined which resets all compression/decompression parameters and tables to a start-up state. The value of this code is 2**<code size>. For example if the code size indicated was 4 (image was 4 bits/pixel) the Clear code value would be 16 (10000 binary). The Clear code can appear at any point in the image data stream and therefore requires the LZW algorithm to process succeeding codes as if a new data stream was starting. Encoders should output a Clear code as the first code of each image data stream. 2. An End of Information code is defined that explicitly indicates the end of the image data stream. LZW processing terminates when this code is encountered. It must be the last code output by the encoder for an image. The value of this code is <Clear code>+1. 3. The first available compression code value is <Clear code>+2. 4. The output codes are of variable length, starting at <code size>+1 bits per code, up to 12 bits per code. This defines a maximum code value of 4095 (hex FFF). Whenever the LZW code value would exceed the current code length, the code length is increased by one. The packing/unpacking of these codes must then be altered to reflect the new code length. BUILD 8-BIT BYTES Because the LZW compression used for GIF creates a series of variable length codes, of between 3 and 12 bits each, these codes must be reformed into a series of 8-bit bytes that will be the characters actually stored or transmitted. This provides additional compression of the image. The codes are formed into a stream of bits as if they were packed right to left and then picked off 8 bits at a time to be output. Assuming a character array of 8 bits per character and using 5 bit codes to be packed, an example layout would be similar to: byte n byte 5 byte 4 byte 3 byte 2 byte 1 +-.....-----+--------+--------+--------+--------+--------+ | and so on |hhhhhggg|ggfffffe|eeeedddd|dcccccbb|bbbaaaaa| +-.....-----+--------+--------+--------+--------+--------+ Note that the physical packing arrangement will change as the number of bits per compression code change but the concept remains the same. PACKAGE THE BYTES Once the bytes have been created, they are grouped into blocks for output by preceeding each block of 0 to 255 bytes with a character count byte. A block with a zero byte count terminates the Raster Data stream for a given image. These blocks are what are actually output for the Graphics Interchange Format (GIF) Page 14 Appendix C - Image Packaging & Compression GIF image. This block format has the side effect of allowing a decoding program the ability to read past the actual image data if necessary by reading block counts and then skipping over the data. Graphics Interchange Format (GIF) Page 15 Appendix D - Multiple Image Processing Since a GIF data stream can contain multiple images, it is necessary to describe processing and display of such a file. Because the image descriptor allows for placement of the image within the logical screen, it is possible to define a sequence of images that may each be a partial screen, but in total fill the entire screen. The guidelines for handling the multiple image situation are: 1. There is no pause between images. Each is processed immediately as seen by the decoder. 2. Each image explicitly overwrites any image already on the screen inside of its window. The only screen clears are at the beginning and end of the GIF image process. See discussion on the GIF terminator. SHAR_EOF if test -f 'howgif.doc' then echo shar: over-writing existing file "'howgif.doc'" fi cat << \SHAR_EOF > 'howgif.doc' LZW compression used to encode/decode a GIF file by Bob Montgomery [73357,3140] ENCODER Consider the following input data stream in a 4 color (A, B, C, D) system. We will build a table of codes which represent strings of colors. Each time we find a new string, we will give it the next code, and break it into a prefix string and a suffix color. The symbols \ or --- represent the prefix string, and / represents the suffix color. The first 4 entries in the table are the 4 colors with codes 0 thru 3, each of which represents a single color. The next 2 codes (4 and 5) are the clear code and the end of image code. The first available code to represent a string of colors is 6. Each time we find a new code, we will send the prefix for that code to the output data stream. A B A B A B A B B B A B A B A A C D A C D A D C A B A A A B A B ..... \/\/---/-----/\/---/-------/\/ \/ \ /\/---/---/\ /\/-----/---/---/ 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 Code 6 8 10 8 14 16 8 13 7 Prefix The encoder table is built from input data stream. Always start with the suffix of last code, and keep getting colors until you have a new combination. Color Code Prefix Suffix String Output A 0 - B 1 - C 2 - D 3 - Clear 4 - End 5 - A \ A A First color is a special case. B / \ 6 A B AB A A | / 7 B A BA B B | A / | 8 6 A ABA 6 B | A | B \ / 9 8 B ABAB 8 B / | 10 B B BB B B | A | / 11 10 A BBA 10 B | A | B | A / \ 12 9 A ABABA 9 A \ / 13 A A AA A C / \ 14 A C AC A D \ / 15 C D CD C A / | 16 D A DA D C | D | / 17 14 D ACD 14 A | D / \ 18 16 D DAD 16 C \ / 19 D C DC D A / | 20 C A CA C B | A | A | / 21 8 A ABAA 8 A | B / | 22 13 B AAB 13 A | B / 23 7 B BAB 7 The resultant output stream is: A B 6 8 B 10 9 A A C D 14 16 D C 8 .... The GIF encoder starts with a code length of 2+1=3 bits for 4 colors, so when the code reaches 8 we will have to increase the code size to 4 bits. Similarly, when the code gets to 16 we will have to increse the code size to 5 bits, etc. If the code gets to 13 bits, we send a clear code and start over. See GIFENCOD.GIF for a flow diagram of the encoding process. This uses a tree method to search if a new string is already in the table, which is much simpler, faster, and easier to understand than hashing. =========================================================================== DECODER We will now see if we can regenerate the original data stream and duplicate the table looking only at the output data stream generated by the encoder on the previous page. The output data stream is: A B 6 8 B 10 9 A A C D 14 16 D C 8 .... The docoding process is harder to see, but easier to implement, than the encoding process. The data is taken in pairs, and a new code is assigned to each pair. The prefix is the left side of the pair, and the suffix is the color that the right side of the pair decomposes to from the table. The decomposition is done by outputing the suffix of the code, and using the prefix as the new code. The process repeats until the prefix is a single color, and it is output too. The output of the decomposition is pushed onto a stack, and then poped off the stack to the display, which restores the original order that the colors were seen by the encoder. We will go thru the first few entries in detail, which will hopefully make the process clearer. The first pair is (A B), so the prefix of code 6 is A and the suffix is B, and 6 represents the string AB. The color A is sent to the display. The 2nd pair is (B 6), so the prefix of code 7 is B and the suffix is the the last color in the decomposition of code 6. Code 6 decomposes into BA, so code 7 = BA, and has a suffix A. The color B is sent to the display. The 3rd pair is (6 8) and the next code is 8. How can we decompose 8. We know that the prefix of code 8 is 6, but we don't know the suffix. The answer is that we use the suffix of the prefix code; A in this case since the suffix of 6 is A. Thus, code 8 = ABA and has a suffix A. We decompose 6 to get BA, which becomes AB when we pop it off the stack to the display. The 4th pair is (8 B), so code 9 has a prefix of 8 and a suffix of B, and code 9 = ABAB. We output ABA to the stack, and pop it off to the display as ABA. The 5th pair is (B 10) and the next code is 10. The prefix of code 10 is B and the suffix is B (since the prefix is B). Code 10 = BB, and we output the prefix B to the display. The 6th pair is (10 9) and the next code is 11. Thus the prefix of code 11 is 10 and the suffix is the last color in the decomposition of 9, which is A. Thus code 11 = BBA, And we output BB to the display. So far, we have output the correct colors stream A B AB ABA B BB to the display, and have duplicated the codes 6 thru 11 in the encoder table. This process is repeated for the whole data stream to reconstruct the original color stream and build a table identical to the one built by the encoder. We start the table with codes 0-5 representing the 4 colors, the clear code, and the end code. When we get to code 8, we must increse the code size to 5 bits, etc. See GIFDECOD.GIF for a flow diagram of the decoding process. I Hope this helps take some of the mystery out of LZW compression, which is really quite easy once you 'see' it. Bob Montgomery SHAR_EOF if test -f 'lzwexp.txt' then echo shar: over-writing existing file "'lzwexp.txt'" fi cat << \SHAR_EOF > 'lzwexp.txt' LZW and GIF explained----Steve Blackstock I hope this little document will help enlighten those of you out there who want to know more about the Lempel-Ziv Welch compression algorithm, and, specifically, the implementation that GIF uses. Before we start, here's a little terminology, for the purposes of this document: "character": a fundamental data element. In normal text files, this is just a single byte. In raster images, which is what we're interested in, it's an index that specifies the color of a given pixel. I'll refer to an arbitray character as "K". "charstream": a stream of characters, as in a data file. "string": a number of continuous characters, anywhere from one to very many characters in length. I can specify an arbitrary string as "[...]K". "prefix": almost the same as a string, but with the implication that a prefix immediately precedes a character, and a prefix can have a length of zero. So, a prefix and a character make up a string. I will refer to an arbitrary prefix as "[...]". "root": a single-character string. For most purposes, this is a character, but we may occasionally make a distinction. It is [...]K, where [...] is empty. "code": a number, specified by a known number of bits, which maps to a string. "codestream": the output stream of codes, as in the "raster data" "entry": a code and its string. "string table": a list of entries; usually, but not necessarily, unique. That should be enough of that. LZW is a way of compressing data that takes advantage of repetition of strings in the data. Since raster data usually contains a lot of this repetition, LZW is a good way of compressing and decompressing it. For the moment, lets consider normal LZW encoding and decoding. GIF's variation on the concept is just an extension from there. LZW manipulates three objects in both compression and decompression: the charstream, the codestream, and the string table. In compression, the charstream is the input and the codestream is the output. In decompression, the codestream is the input and the charstream is the output. The string table is a product of both compression and decompression, but is never passed from one to the other. The first thing we do in LZW compression is initialize our string table. To do this, we need to choose a code size (how many bits) and know how many values our characters can possibly take. Let's say our code size is 12 bits, meaning we can store 0->FFF, or 4096 entries in our string table. Lets also say that we have 32 possible different characters. (This corresponds to, say, a picture in which there are 32 different colors possible for each pixel.) To initialize the table, we set code#0 to character#0, code #1 to character#1, and so on, until code#31 to character#31. Actually, we are specifying that each code from 0 to 31 maps to a root. There will be no more entries in the table that have this property. Now we start compressing data. Let's first define something called the "current prefix". It's just a prefix that we'll store things in and compare things to now and then. I will refer to it as "[.c.]". Initially, the current prefix has nothing in it. Let's also define a "current string", which will be the current prefix plus the next character in the charstream. I will refer to the current string as "[.c.]K", where K is some character. OK, look at the first character in the charstream. Call it P. Make [.c.]P the current string. (At this point, of course, it's just the root P.) Now search through the string table to see if [.c.]P appears in it. Of course, it does now, because our string table is initialized to have all roots. So we don't do anything. Now make [.c.]P the current prefix. Look at the next character in the charstream. Call it Q. Add it to the current prefix to form [.c.]Q, the current string. Now search through the string table to see if [.c.]Q appears in it. In this case, of course, it doesn't. Aha! Now we get to do something. Add [.c.]Q (which is PQ in this case) to the string table for code#32, and output the code for [.c.] to the codestream. Now start over again with the current prefix being just the root P. Keep adding characters to [.c.] to form [.c.]K, until you can't find [.c.]K in the string table. Then output the code for [.c.] and add [.c.]K to the string table. In pseudo-code, the algorithm goes something like this: [1] Initialize string table; [2] [.c.] <- empty; [3] K <- next character in charstream; [4] Is [.c.]K in string table? (yes: [.c.] <- [.c.]K; go to [3]; ) (no: add [.c.]K to the string table; output the code for [.c.] to the codestream; [.c.] <- K; go to [3]; ) It's as simple as that! Of course, when you get to step [3] and there aren't any more characters left, you just output the code for [.c.] and throw the table away. You're done. Wanna do an example? Let's pretend we have a four-character alphabet: A,B,C,D. The charstream looks like ABACABA. Let's compress it. First, we initialize our string table to: #0=A, #1=B, #2=C, #3=D. The first character is A, which is in the string table, so [.c.] becomes A. Next we get AB, which is not in the table, so we output code #0 (for [.c.]), and add AB to the string table as code #4. [.c.] becomes B. Next we get [.c.]A = BA, which is not in the string table, so output code #1, and add BA to the string table as code #5. [.c.] becomes A. Next we get AC, which is not in the string table. Output code #0, and add AC to the string table as code #6. Now [.c.] becomes C. Next we get [.c.]A = CA, which is not in the table. Output #2 for C, and add CA to table as code#7. Now [.c.] becomes A. Next we get AB, which IS in the string table, so [.c.] gets AB, and we look at ABA, which is not in the string table, so output the code for AB, which is #4, and add ABA to the string table as code #8. [.c.] becomes A. We can't get any more characters, so we just output #0 for the code for A, and we're done. So, the codestream is #0#1#0#2#4#0. A few words (four) should be said here about efficiency: use a hashing strategy. The search through the string table can be computationally intensive, and some hashing is well worth the effort. Also, note that "straight LZW" compression runs the risk of overflowing the string table - getting to a code which can't be represented in the number of bits you've set aside for codes. There are several ways of dealing with this problem, and GIF implements a very clever one, but we'll get to that. An important thing to notice is that, at any point during the compression, if [...]K is in the string table, [...] is there also. This fact suggests an efficient method for storing strings in the table. Rather than store the entire string of K's in the table, realize that any string can be expressed as a prefix plus a character: [...]K. If we're about to store [...]K in the table, we know that [...] is already there, so we can just store the code for [...] plus the final character K. Ok, that takes care of compression. Decompression is perhaps more difficult conceptually, but it is really easier to program. Here's how it goes: We again have to start with an initialized string table. This table comes from what knowledge we have about the charstream that we will eventually get, like what possible values the characters can take. In GIF files, this information is in the header as the number of possible pixel values. The beauty of LZW, though, is that this is all we need to know. We will build the rest of the string table as we decompress the codestream. The compression is done in such a way that we will never encounter a code in the codestream that we can't translate into a string. We need to define something called a "current code", which I will refer to as "<code>", and an "old-code", which I will refer to as "<old>". To start things off, look at the first code. This is now <code>. This code will be in the intialized string table as the code for a root. Output the root to the charstream. Make this code the old-code <old>. *Now look at the next code, and make it <code>. It is possible that this code will not be in the string table, but let's assume for now that it is. Output the string corresponding to <code> to the codestream. Now find the first character in the string you just translated. Call this K. Add this to the prefix [...] generated by <old> to form a new string [...]K. Add this string [...]K to the string table, and set the old-code <old> to the current code <code>. Repeat from where I typed the asterisk, and you're all set. Read this paragraph again if you just skimmed it!!! Now let's consider the possibility that <code> is not in the string table. Think back to compression, and try to understand what happens when you have a string like P[...]P[...]PQ appear in the charstream. Suppose P[...] is already in the string table, but P[...]P is not. The compressor will parse out P[...], and find that P[...]P is not in the string table. It will output the code for P[...], and add P[...]P to the string table. Then it will get up to P[...]P for the next string, and find that P[...]P is in the table, as the code just added. So it will output the code for P[...]P if it finds that P[...]PQ is not in the table. The decompressor is always "one step behind" the compressor. When the decompressor sees the code for P[...]P, it will not have added that code to it's string table yet because it needed the beginning character of P[...]P to add to the string for the last code, P[...], to form the code for P[...]P. However, when a decompressor finds a code that it doesn't know yet, it will always be the very next one to be added to the string table. So it can guess at what the string for the code should be, and, in fact, it will always be correct. If I am a decompressor, and I see code#124, and yet my string table has entries only up to code#123, I can figure out what code#124 must be, add it to my string table, and output the string. If code#123 generated the string, which I will refer to here as a prefix, [...], then code#124, in this special case, will be [...] plus the first character of [...]. So just add the first character of [...] to the end of itself. Not too bad. As an example (and a very common one) of this special case, let's assume we have a raster image in which the first three pixels have the same color value. That is, my charstream looks like: QQQ.... For the sake of argument, let's say we have 32 colors, and Q is the color#12. The compressor will generate the code sequence 12,32,.... (if you don't know why, take a minute to understand it.) Remember that #32 is not in the initial table, which goes from #0 to #31. The decompressor will see #12 and translate it just fine as color Q. Then it will see #32 and not yet know what that means. But if it thinks about it long enough, it can figure out that QQ should be entry#32 in the table and QQ should be the next string output. So the decompression pseudo-code goes something like: [1] Initialize string table; [2] get first code: <code>; [3] output the string for <code> to the charstream; [4] <old> = <code>; [5] <code> <- next code in codestream; [6] does <code> exist in the string table? (yes: output the string for <code> to the charstream; [...] <- translation for <old>; K <- first character of translation for <code>; add [...]K to the string table; <old> <- <code>; ) (no: [...] <- translation for <old>; K <- first character of [...]; output [...]K to charstream and add it to string table; <old> <- <code> ) [7] go to [5]; Again, when you get to step [5] and there are no more codes, you're finished. Outputting of strings, and finding of initial characters in strings are efficiency problems all to themselves, but I'm not going to suggest ways to do them here. Half the fun of programming is figuring these things out! --- Now for the GIF variations on the theme. In part of the header of a GIF file, there is a field, in the Raster Data stream, called "code size". This is a very misleading name for the field, but we have to live with it. What it is really is the "root size". The actual size, in bits, of the compression codes actually changes during compression/decompression, and I will refer to that size here as the "compression size". The initial table is just the codes for all the roots, as usual, but two special codes are added on top of those. Suppose you have a "code size", which is usually the number of bits per pixel in the image, of N. If the number of bits/pixel is one, then N must be 2: the roots take up slots #0 and #1 in the initial table, and the two special codes will take up slots #4 and #5. In any other case, N is the number of bits per pixel, and the roots take up slots #0 through #(2**N-1), and the special codes are (2**N) and (2**N + 1). The initial compression size will be N+1 bits per code. If you're encoding, you output the codes (N+1) bits at a time to start with, and if you're decoding, you grab (N+1) bits from the codestream at a time. As for the special codes: <CC> or the clear code, is (2**N), and <EOI>, or end-of-information, is (2**N + 1). <CC> tells the compressor to re- initialize the string table, and to reset the compression size to (N+1). <EOI> means there's no more in the codestream. If you're encoding or decoding, you should start adding things to the string table at <CC> + 2. If you're encoding, you should output <CC> as the very first code, and then whenever after that you reach code #4095 (hex FFF), because GIF does not allow compression sizes to be greater than 12 bits. If you're decoding, you should reinitialize your string table when you observe <CC>. The variable compression sizes are really no big deal. If you're encoding, you start with a compression size of (N+1) bits, and, whenever you output the code (2**(compression size)-1), you bump the compression size up one bit. So the next code you output will be one bit longer. Remember that the largest compression size is 12 bits, corresponding to a code of 4095. If you get that far, you must output <CC> as the next code, and start over. If you're decoding, you must increase your compression size AS SOON AS YOU write entry #(2**(compression size) - 1) to the string table. The next code you READ will be one bit longer. Don't make the mistake of waiting until you need to add the code (2**compression size) to the table. You'll have already missed a bit from the last code. The packaging of codes into a bitsream for the raster data is also a potential stumbling block for the novice encoder or decoder. The lowest order bit in the code should coincide with the lowest available bit in the first available byte in the codestream. For example, if you're starting with 5-bit compression codes, and your first three codes are, say, <abcde>, <fghij>, <klmno>, where e, j, and o are bit#0, then your codestream will start off like: byte#0: hijabcde byte#1: .klmnofg So the differences between straight LZW and GIF LZW are: two additional special codes and variable compression sizes. If you understand LZW, and you understand those variations, you understand it all! Just as sort of a P.S., you may have noticed that a compressor has a little bit of flexibility at compression time. I specified a "greedy" approach to the compression, grabbing as many characters as possible before outputting codes. This is, in fact, the standard LZW way of doing things, and it will yield the best compression ratio. But there's no rule saying you can't stop anywhere along the line and just output the code for the current prefix, whether it's already in the table or not, and add that string plus the next character to the string table. There are various reasons for wanting to do this, especially if the strings get extremely long and make hashing difficult. If you need to, do it. Hope this helps out.----steve blackstock SHAR_EOF # End of shell archive exit 0
steve@raspail.UUCP (Steve Schonberger) (03/22/88)
I posted an offer to email people that information, and got twenty or thirty replies. I attempted to mail out copies of that GIF information (after getting formal clearance from its owner, by the way), and got "barfmail" on several of the attempts to reply by mail. Thanks for posting the information for those I couldn't reach. Also, I have not seen the middle article, I don't think. That will come in handy, I think. To everyone I didn't get GIF information to: sorry about that. Steve Schonberger Control Data, Arden Hills, MN ...shamash!raspail!steve "Any similarity between the opinions shamash.UUCP!raspail!steve expressed here and those of my raspail.UUCP!steve employer are purely coincidental."