drtiller@uokmax.ecn.uoknor.edu (Donald Richard Tillery Jr) (02/17/91)
I was playing around with IFF ILBM files and began toying with the idea of improving a bit on the standard ByteRun1 compression technique by including a primitive sort of vertical compression. Before I describe it, I better ask if anyone can send me a list of the registered compression schemes. I know 0x00 = None, and 0x01 = ByteRun1. I've also heard that all the other values up to 9 are used as well. I even heard that 0x02 was some type of Huffman encoding (like LZH). I achieve anywhere from 8 to 20% improvement over ByteRun1 compression with this technique and the decompression is from 20 to 50% slower (extreme cases can get up to 80%+ improvement and the file is NEVER larger than ByteRun1). Anyway, for anyone interested, here's how it works: (It turns out, purely unintentionally, that the decompression code for this new scheme will also correctly decode a properly and efficiently encoded normal ByteRun1 compressed image...) I call the new technique (is anything really "new", I'm just adding to another) ByteRun1Plus and the key byte is 0x2B ('+'). (For those unfamiliar with ByteRun1 I'll go into excruciating detail on this so the rest of you bear with me). If you assume that there are X horizontal pixels, Rows vertical pixels, and the image is Depth bitplanes deep, then BytesPerRow = INT( X / 8 ). There are BytesPerRow x Rows x Depth bytes in the entire image. ILBM encoding starts at the top of the image in bitplane 0 and writes the first BytesPerRow bytes of each plane (encoding if necessary). Then the next row is done in a similar manner until the entire image is written to the storage device. ByteRun1 (and ByteRun1Plus) compression scheme encodes each BytesPerRow bytes in the following manner: If there is a run of _3_ or more bytes in this row of the plane, then this run is encoded as a byte with the value of the number of duplicate bytes subtracted from 257 followed by the byte's value. A limit of 128 run bytes is set to keep the value in the 129-254 range for that byte, thus distinguishing it from the encoding of a length of unique data. This data is encoded by a value representing the number of unique bytes minus 1 (also limited to a string of 128 to limit its value to 0-127) followed by that number of data bytes with those values. This represents a horizontal compression. This leaves out 128 and 255. Normally 255 is used for a run of length 2. A properly efficient encoder will never use this however, since is results in an additional byte being needed to return to the unique data string following the run. It is more efficient in this case to ignore this 2 byte run and encode it as if it were unique data. The 128 was denoted a NOP (no operation) and of course no efficient encoder would throw in useless bytes when trying to compress something. I have thus found two byte values to steel for my own use in a primitive vertical compression technique. The ByteRun1Plus scheme uses the 255 value to denote a repeat of the entire previous BytesPerRow bytes (row) of the same bitplane(!). The 128 value represents a flag indicating that there were sufficient repeats from the previous line to warrant this vertical compression (as determined at compress time by the brute force method of compressing the row in both ways and and comparing their respective lengths). The following BytesPerRow / 8 (rounded UP!) bytes are a bit encoded set of flags which determine which bytes are duplicates of the one BytesPerRow before it. A 1 indicates that the corresponding byte is a duplicate and a 0 indicates that it is not. Whenever a 0 is encountered the next byte of those immediately following the last flag byte is taken as the delta byte with the new value of the byte. For example(values in decimal unless preceeded by $ [hex] or % [binary]): 235 100 165 207 92 34 38 42 235 98 165 207 242 34 223 42 would encode as: 128 %10110101 98 242 223 That's it. Any comments or suggestions are welcome. And anyone who knows how I can register this please let me know. If anyone cares, I'll make the translater and viewing programs available when they are debugged (trans is done, but the viewer will be a while...school you know :-) Rick Tillery (drtiller@uokmax.ecn.uoknor.edu)
ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar) (02/18/91)
In article <1991Feb17.015223.25784@uokmax.ecn.uoknor.edu> drtiller@uokmax.ecn.uoknor.edu (Donald Richard Tillery Jr) writes: >I've also heard that all the other values up to 9 are used as well. I >even heard that 0x02 was some type of Huffman encoding (like LZH). No offense, but if you're going ti play about with compression, you'd better study a it bit more. Huffman and LZW are entirely different systems. Huffman maps a fixed-length character set into a variable length character set on the provision that the more commonly used characters in the source are the smaller ones in the destination, and LZW builds a dictionary of recently used character streams which can be referenced to reduce redundancy. BTW, LZW is under a bit of a legal cloud, owing to a patent reportly obtained upon it. I have not heard the outcome of the case. [ Description of the system deleted ] >That's it. Any comments or suggestions are welcome. And anyone who knows >how I can register this please let me know. One of the areas where IFF will get into real trouble, and this is starting to happen now, is if standardised formats end up with an enormous number of variants that require continual modification of the reader program. One such danger is if ILBM starts to have a large number of different compression systems assigned to it. Now I would be the first to agree that the current compression system in IFF is inefficient and poorly thought out, but I would strongly resist any attempt to improve it without a careful consideration of the options available (ie. huffman, lzw, jpeg etc.) to choose a *SMALL* subset of efficient and suitable compression systems. I just don't think that the system described above has enough advantages to warrant its official registration. -- Ian Farquhar Phone : + 61 2 805-9400 Office of Computing Services Fax : + 61 2 805-7433 Macquarie University NSW 2109 Also : + 61 2 805-7420 Australia EMail : ifarqhar@suna.mqcc.mq.oz.au