kgdykes@watmath.waterloo.edu (Ken Dykes) (01/09/91)
is there an effective algorithm (what is it? :-) or even lazier a C source for a function (pascal, 286 assembler can also be survived :-) which will reduce in-memory requirements for strings. My application has several-K of strings of ascii subset used for messages and diagnostics. (they vary from 1-2 chars long to about 90chars in length, probably average length of about 20-40chars) I am hoping to reduce it's already large memory requirements at the expense of a little CPU. the bytes may be considered 7bit ascii subset (alphabetics,digits,punctuation, white-space, ie: no ctrl-chars or \0177rubouts) (the ctrl-chars are represented by C printf-like sequences such as \n for newline, \f for formfeed, etc) actually the sequence "\n" occurs in >50% of the strings, so perhaps a special representation like ctrl-J could be used :-) :-) the in memory byte-capacity is actually 9bit not 8bit (honeywell bull dps8 architecture, or if you prefer pretend it's DEC Tops-10 or Tops-20, words of 36 bits (four 9bit bytes)) although i would be interested in a more-protable 8-bit, not 9bit, algorithm too. i can think of some crude shift-&-pack algorithms, but am hoping someone has something nifty & efficient. If i can get a 33% space reduction without obscene amounts of cpu required i will be ecstatic. the simplest shift-2bits-&-pack would give me about 28% with a lot of fairly expensive shift operations. actually i suppose an accurate description of a compress/decompress alogrithm would be prefered come to think of it, it will after all be a teeny part of what is essentially a commercial package and i wouldnt want to be accused of stealing code :-) -ken, not at all versed in data compression -- - Ken Dykes, Software Development Group, UofWaterloo, Canada [43.47N 80.52W] kgdykes@watmath.waterloo.edu [129.97.128.1] watmath!kgdykes postmaster@watbun.waterloo.edu B8 P6/6 s+ f+ m t w e r p
kdq@demott.com (Kevin D. Quitt) (01/10/91)
In article <1991Jan9.053459.13231@watmath.waterloo.edu> kgdykes@watmath.waterloo.edu (Ken Dykes) writes: >is there an effective algorithm (what is it? :-) or even lazier a C source >for a function (pascal, 286 assembler can also be survived :-) >which will reduce in-memory requirements for strings. You can try DEC's RAD50, or a home-brewed equivalent. SPACE is 0, A-Z, then 0-9, followed by three more characters. I've used these latter as an escape to other character sets. This encoding will allow putting 3 characters into 2 bytes. Not the greatest, but it's easy. -- _ Kevin D. Quitt demott!kdq kdq@demott.com DeMott Electronics Co. 14707 Keswick St. Van Nuys, CA 91405-1266 VOICE (818) 988-4975 FAX (818) 997-1190 MODEM (818) 997-4496 PEP last
cur022%cluster@ukc.ac.uk (Bob Eager) (01/10/91)
In article <1991Jan9.053459.13231@watmath.waterloo.edu>, kgdykes@watmath.waterloo.edu (Ken Dykes) writes: > is there an effective algorithm (what is it? :-) or even lazier a C source > for a function (pascal, 286 assembler can also be survived :-) > which will reduce in-memory requirements for strings. I have a small C program that takes a file of strings (and message numbers) as input, and generates two constant arrays and a tiny function for retrieving any given message by number. It stores each English word once, and each message is stored as a set of pointers to the words. Letters are stored as 6 bit characters with a shift character to get the punctuation stuff. If there is enough interest, I'll post it. -------------------------+------------------------------------------------- Bob Eager | University of Kent at Canterbury | +44 227 764000 ext 7589 -------------------------+-------------------------------------------------