peraino@gmu90x.UUCP ( ) (09/09/88)
Compression Schemes for the HP-28S So what do you do when your 28S gives you a low memory warning? What happens when you've only got 7k left in the little wonder, and your directories are REALLY full? Well, you could add a hard drive or some extra RAM chips, if you've got a degree in micro-electronics and are a hardware whiz! The rest of us have to just bone up and PURGE some variables. Or do we? What about data compression? This short paper, and the up-coming software, is part of a project by Bob Peraino and Tom Affinito to design and implement two data compression methods on an HP28s. Considering the size and complexity of current compression schemes, is compression on a 28S feasable? With 32k, one could write a very robust compression algorithm implementation. But how useful would it be if it took up half, or even more, of existing memory? The real trick is to develop compression that not only works, but is useable. The two implementations discussed here each use about 3k bytes....which one MIGHT think is a lot to pay. After the initial version of this code was written, a quick test was performed to test the ability of the software to produce a good return for the 3k investment. Various 28S programs, totalling 6,839 bytes, were compressed. The compressed code took up only 2,890 bytes...this is a 3949 byte savings, which pays for the compression code already! How much "effective" memory can one have through compression? In the above example, we see code compressed down to 42% of its original size. If only a 50% compression rate is achieved, the "effective" memory on a 28S is 48k bytes! Tests have shown that compression rates better than 50% are easily achieved. Description What could a compression utility look like? Well, what are the major objects that fill up space? Programs and their related data variables certainly count highest in this category for some (most?) 28S users. Some users might have a reversal though...with more data-intensive storage, like BIG matrices, and relatively few programs. Clearly, both of these types of needs should be met by any candidate compression utility. Presented below are some performance statistics for TWO systems, both of which are used in approximately the same way. First is the ARC/XARC system (listed as ARC below). This system has several parts. PAK/XPAK, MASH/XMASH, and ARC/XARC/ARCL PAK compresses programs through a process of tokenization of keywords and can benefit the user who has a lot of programs. Put a program on the stack and invoke PAK, and the program will be reduced to a compressed string. XPAK will take a compressed string and return the original program to the stack. MASH compresses vectors/matrices, again through a process of tokenization of the limited number of characters that can exist in a vector/matrix. Put a vector/matrix on the stack and invoke MASH, and the data will be reduced to a compressed string. XMASH will take a compressed string and return the original vector/matrix to the stack. ARC manages PAK/XPAK/MASH/XMASH so that the user does not have to worry about keeping track of what is compressed, and what is not. The second system listed below is the VAP-PAV-MAP-PAM (or VAP) system. This system is functionally similar to ARC as described above; internally, VAP does not use tokenization at the word level, but focuses instead on compressing bits together due to limiting its recognizable alphabet. The statistics below are an operational summary of the strengths and weaknesses of each system. Statistics Code size: ARC -- 2933 bytes in 10 variables VAP -- 3244 bytes in 31 variables ___________________________________________________________________________ | seconds |seconds |compressed|original| object description | to |to de- |string |object | | compress|compress|SIZE |size | ___________________________________|_________|________|__________|________| 411 directory utility: | | | | 2280 | ARC | 681 | 99 | 1437 | | VAP | 417 | 359 | 1377 | | ___________________________________|_________|________|__________|________| CHK checksum program: | | | | 132.5 | (command intensive) ARC | 58 | 6 | 50 | | VAP | 31 | 20 | 106 | | ___________________________________|_________|________|__________|________| sample program (name intensive) | | | | 67 | << X1 Y2 + Z1 / SWAP X ARC | 17 | 2 | 24 | | * - PRINTIT >> VAP | 8 | 6 | 26 | | ___________________________________|_________|________|__________|________| sample algebraic | | | | 93 | '(U-X)^2+(V-Y)^2-((U-X)*C+ ARC | 1 | 2 | 38 | | (V-Y)*S)^2': VAP | 7 | 7 | 31 | | ___________________________________|_________|________|__________|________| solo matrix data: Each char | | | | | counts as "length" of number | | | | | Example: "(-5.E3,3.12)" is a | | | | | length 12 number. | | | | | | | | | | 10 X 10 real matrix: | | | | 850 | average number length 3: ARC | 32 | 42 | 257 | | VAP | 89 | 53 | 207 | | | | | | | average number length 7: ARC | 53 | 81 | 457 | | VAP | 155 | 104 | 407 | | | | | | | average number length 11: ARC | 74 | 120 | 657 | | VAP | 222 | 155 | 607 | | ___________________________________|_________|________|__________|________| 10 X 10 complex matrix: | | | | 1620.5 | average number length 5: ARC | 42 | 50 | 307 | | VAP | 129 | 75 | 207 | | | | | | | average number length 9: ARC | 61 | 89 | 457 | | VAP | 199 | 126 | 407 | | | | | | | avarage number length 13: ARC | 83 | 128 | 657 | | VAP | 269 | 178 | 607 | | ___________________________________|_________|________|__________|________| Through the above stats and other tests performed by the authors, the following observations have been made. VAP advantages --------------- Compresses program objects quicker than ARC. Compresses matrix objects consistently slightly better than ARC. Will even compress text strings to a degree, which ARC will not do. ARC advantages --------------- Program objects are usually compressed slightly better than VAP. Compressed programs uncompress blindingly fast. Matrix objects are consistenly compressed and uncompressed faster than VAP. ARC code is about 300 bytes smaller than VAP. Both ARC and VAP will be posted in the coming weeks; these statistics and descriptions are given with the intent of helping net readers discover which, if any, of these systems can help them in their HP 28S use. Readers should also be aware that familiarity with the specific techniques used by ARC and VAP can lead to "data massaging" that can be used to further decrease the storage values or times listed above. In the meantime, questions/ flames/etc can be sent to either: ARC: ---- Bob Peraino George Mason University, Fairfax, VA UUCP :uunet!pyrdc!gmu90x!peraino Internet:peraino@gmuvax.gmu.edu Bitnet :peraino@gmuvax (preferred, but not essential) Postal :4400 University Dr., Thompson Hall, rm. 2, UCIS. Fairfax, VA 22030 VAP: ---- Thomas J. Affinito UC Santa Cruz, Board of Computer & Information Sciences Internet: taff@saturn.ucsc.edu or taff%saturn@ucscc.ucsc.edu Bitnet: taff@ucsccrls (probably) Postal addr: Box 14 Grad Student Housing, 401 Heller Dr., Santa Cruz CA 95064 -- ----------------------------------------------- |UUCP : uunet!pyrdc!gmu90x!peraino | |INTERNET: peraino@gmuvax.gmu.edu | |BITNET : peraino@gmuvax |
williamo@hpcupt1.HP.COM (William O'Shaughnessy) (09/14/88)
Be careful S. E. A. Inc. thinks it owns the word "ARC". Bill O'Shaughnessy