jaw@ames.UUCP (James A. Woods) (07/26/84)
# "It's fresh squozen daily." -- official slogan of the Ames fast file finder (yet another data compression technique) Herein we report on a simple and fleet implementation of the LZW algorithm now making the rounds on UNIX. For equivalent compression rates (2:1 for unix-wizards, 3:1 and more for numerical data), it runs at least 1.6 times faster for large files than the code offered by Spencer Thomas. This is a C language comparison only; a greater speed differential is expected if both versions were written in machine language). This method may be impractical for output codes of length greater than 13 bits, though, fortunately, the basic algorithm offers only diminishing returns beyond this. Like Mr. Thomas, I adapted the Welch implementation to a different data structure, believing that the hashing method reported in the June IEEE Computer to be "too slow". Essentially, the code below is a direct-access lookup of the alternate string table pictured in the article. There are 2 ** BITS codes which serve as prefixes to be associated with any of 256 ascii characters. For the typical BITS == 12, this looks unwieldy, but is actually workable on a 32-bit processor with the enhancements I describe. A first pass took the form: short table [(1 << BITS) * 256]; /* 2 megabytes for BITS = 12 */ squash ( ) { register int K, i, p, prefix; register long code; bzero ( table, 1 << BITS ); /* clear table, but see below */ for ( i = 0; i < 256; i++ ) table [i] = i; prefix = getchar ( ); while ( (K = getchar ( )) != EOF ) { code = (long) ((K << BITS) + prefix); /* test for code in "string" table */ if ( (p = table [code]) != 0 ) prefix = p; else { /* put code in table */ output ( prefix ); prefix = K; if ( i < (1 << BITS) ) /* ran out of codes */ table [code] = (short) i++; } } output ( prefix ); /* last one */ } The table initialization time (Is bcopy() the fastest way to do this in C?) is disconcerting, some 6 seconds of system time (2 MB) on our 11/750. This is unacceptable for small files. It turns out that, using a variant of the sparse array initialization trick discussed by J. Bentley (CACM, 8/83), we can associate a bit-vector with the large code table, and save time by setting a bit in this map before each read access to the table, and by testing the bit before each write. The macros provided in M. D. McIlroy's now defunct V7 "spell" code, can be used for these bit tests: #define SHIFT 5 #define get(h) (bits[h>>SHIFT]&(1<<((long)h&((1<<SHIFT)-1)))) #define set(h) bits[h>>SHIFT] |= 1<<((long)h&((1<<SHIFT)-1)) How's that for spiff? It compiles into just a few VAX instructions and beats the mod operator hands down. If you give up, look at p. 47 in K & R. Though probabilistic Bloom filtering for spelling error detection fades away, the bit macros live on! Integrating this in the above fragment is omitted here for clarity. Another subtle point: originally the indexing code = (long) ((K << BITS) + prefix); was, keeping in spirit with Welch's alternate string table exposition: code = (long) ((prefix << BITS) + K); But we discovered that (on a virtual memory machine at least), the first form easily halves (!) system and user time (and space) for text files, since array accesses, in the absence of parity marked characters, neatly clusters things in the top half (one meg here) of the array. Just for a lark, I also wired in special-case code to keep prefix / character 040 (space) pairs in their own fast storage, since most net English [sic] (and also numerical tables) are 15% space. With these modifications and using Thomas's exact output() routine but with a doubled fwrite() buffer size, the following obtains: Input: 200 KB unix-wizards Machine: VAX 11/750, BSD 4.2 program user system % compression time time pack 15.2 4.4 34.0 Sys V.1 port compress (16 bit) 45.2 13.2 59.6 compact 221.2 4.9 34.0 compress (12 bit) 35.0 1.4 43.1 assembler output() compress (13 bit) 35.7 1.8 48.3 function squash (12 bit) 18.4 2.9 43.1 " squash (13 bit) 19.1 4.5 48.3 " Comments, largely of a philosophical nature, are continued in part 2 ...