jsm@vax1.UUCP (03/25/87)
Needed: information on data compression techniques and algorithms, such as
Huffman coding and Lempel-Ziv, that are used in programs such as SQ and ARC.
I need something that is fast enough to be transparent to the user in an
interactive system. I don't know anything about the topic, and would
appreciate suggestions of good, comprehensive textbooks containing
algorithms.

A few weeks ago a version of ARC was posted to either net.sources or
mod.sources (source code for UNIX*, I believe). If anyone saved this
posting, I would appreciate a copy.

Thank you ... please email responses ...

Jon Meltzer
Dept. of Modern Languages and Linguistics, Cornell University

*UNIX is a trademark of you-know-who.
akk2@ur-tut.UUCP (03/27/87)
Two references that you could look at:

1. Welch, Terry A., "A Technique for High-Performance Data Compression",
   IEEE Computer, Vol. 17, June 1984.

2. Huffman, D., "A Method for the Construction of Minimum-Redundancy Codes",
   Proceedings of the Institute of Radio Engineers, Vol. 40, September 1952,
   pp. 1098-1101.

There are a couple of other good refs. that escape me right now.
--
Atul Kacker
UUCP: ...seismo!rochester!ur-tut!akk2
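[Editor's note: for readers who want to experiment before tracking down the papers, here is a minimal sketch of the construction Huffman's 1952 paper describes — repeatedly merge the two least-frequent subtrees, so frequent symbols end up near the root with short codewords. This is an illustration only, not the code used by SQ or ARC.]

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    """Build a prefix-free code: frequent symbols get shorter codewords."""
    # Each heap entry is [frequency, tiebreak, {symbol: codeword-so-far}].
    heap = [[freq, i, {ch: ""}] for i, (ch, freq) in
            enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tick = len(heap)                      # unique tiebreaker for merged nodes
    while len(heap) > 1:
        lo = heapq.heappop(heap)          # two least-frequent subtrees
        hi = heapq.heappop(heap)
        # Merging prepends one more bit to every codeword in each subtree.
        merged = {ch: "0" + code for ch, code in lo[2].items()}
        merged.update({ch: "1" + code for ch, code in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], tick, merged])
        tick += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
# 'a' occurs 5 times, 'd' only once, so 'a' must get the shorter codeword.
assert len(codes["a"]) < len(codes["d"])
```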
eppstein@tom.columbia.edu.UUCP (03/27/87)
Another good reference that is perhaps less well known:

	Victor S. Miller and Mark N. Wegman, "Variations on a Theme by
	Ziv and Lempel", in Combinatorial Algorithms on Words
	(A. Apostolico and Z. Galil, eds.), Springer-Verlag, 1985.

References to a number of related works can be found in this paper.
--
David Eppstein, eppstein@cs.columbia.edu, Columbia U. Computer Science Dept.
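[Editor's note: as a concrete companion to the Ziv-Lempel family that paper surveys, here is a hedged sketch of the LZW variant (the one described in Welch's 1984 article). Real implementations such as compress and ARC differ in table sizes, code widths, and reset policy; this shows only the core idea of growing a dictionary of previously seen strings.]

```python
def lzw_compress(data: bytes) -> list:
    """Emit the dictionary index of each longest previously-seen string."""
    table = {bytes([i]): i for i in range(256)}   # start with all single bytes
    out, run = [], b""
    for byte in data:
        candidate = run + bytes([byte])
        if candidate in table:
            run = candidate                       # keep extending the match
        else:
            out.append(table[run])                # emit longest match
            table[candidate] = len(table)         # learn the new string
            run = bytes([byte])
    if run:
        out.append(table[run])
    return out

def lzw_decompress(codes: list) -> bytes:
    """Rebuild the same dictionary on the fly; no table is transmitted."""
    table = {i: bytes([i]) for i in range(256)}
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # The only code that can arrive before it is defined is the
        # classic KwKwK case: the entry is prev + first byte of prev.
        entry = table[code] if code in table else prev + prev[:1]
        out.append(entry)
        table[len(table)] = prev + entry[:1]
        prev = entry
    return b"".join(out)

sample = b"TOBEORNOTTOBEORTOBEORNOT"
assert lzw_decompress(lzw_compress(sample)) == sample
```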
mct@praxis.UUCP (03/30/87)
Reply-To: mct%praxis.uucp@ukc.ac.uk

Once upon a time, about five years ago, a UK company called DJAI (the
authors of The Last One - a 4GL program generator) said that they had an
algorithm which would (reversibly) compress arbitrary text into 40-80 bytes.
They even had a demonstration system which claimed to show the algorithm in
operation.

They planned to demonstrate the system by compressing the complete source
text of The Last One, copying the <80 bytes to a hand-held micro with a
printer, then decompressing and producing the total listing from the micro.

I expect the algorithm has been suppressed by the UK MoD :-)
rolf@warwick.UUCP (03/31/87)
Another reference:

	Shannon, C.E., "Prediction and Entropy of Printed English",
	Bell Sys. Tech. J., Vol. 30, pp. 50-64, Jan 1951.

If you get many other replies perhaps you'd like to post a summary?
--
Rolf
Dept. of Computer Science,     Tel: +44 203 523523 ext 2485
Warwick University,            JANET: rolf@uk.ac.warwick.flame
Coventry, CV4 7AL, England.    UUCP: {seismo,mcvax}!ukc!warwick!rolf
"Three pints? At lunchtime?!"
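[Editor's note: Shannon's paper estimates how predictable English is. A zeroth-order, single-character estimate is easy to compute and gives an upper bound; Shannon's context-based experiments put English far lower, at roughly one bit per letter. A minimal sketch:]

```python
import math
from collections import Counter

def entropy_per_char(text: str) -> float:
    """Zeroth-order entropy estimate, in bits per character.

    Ignores all context between characters, so this is an upper bound
    on the true entropy rate Shannon's 1951 paper estimates.
    """
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# A constant string carries no information per character.
assert entropy_per_char("aaaa") == 0.0
# English prose lands well under the 8 bits/char of raw byte storage.
print(entropy_per_char("prediction and entropy of printed english"))
```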
jsm@vax1.UUCP (04/01/87)
In article <512@ubu.warwick.UUCP> rolf@warwick.UUCP (Rolf Howarth) writes:
>Another reference:
>
>	Shannon, C.E. "Prediction and Entropy of Printed English",
>	Bell Sys. Tech. J. vol.30, pp.50-64, Jan 1951.
>
>If you get many other replies perhaps you'd like to post a summary?

Stuff is still coming in (thanks, everyone!), and I haven't had time to
check all the references out. Give me about two weeks.
keeshu@nikhefk.UUCP (04/03/87)
In article <327@vax1.ccs.cornell.edu> jsm@vax1.UUCP (Jon Meltzer) writes:
>Stuff is still coming in (thanks, everyone!), and I haven't had time to
>check all the references out. Give me about two weeks.

Another reference:

	White, Ronald G., "Compressing Image Data With Quadtrees",
	Dr. Dobb's Journal, March 1987, page 16 ->
--
Kees
| UUCP   : keeshu@nikhefk.uucp or : {[wherever]!seismo}!mcvax!nikhefk!keeshu
| FIDO   : kees huyser at 28/9 or 500/11
| BITNET : u00212@hasara5.bitnet
| SNAIL  : kees huyser, NIKHEF-K, PO Box 4395, 1009 AJ Amsterdam, Netherlands
dick@zaphod.UUCP (04/03/87)
In article <512@ubu.warwick.UUCP> rolf@warwick.UUCP (Rolf Howarth) writes:
>Another reference:
>[...]
>If you get many other replies perhaps you'd like to post a summary?

I would very much appreciate a summary being posted. Thanks very much in
advance.
--
Dick Flanagan, W6OLD           UUCP: ...!ucbvax!decwrl!sun!plx!dick
GEnie: C.FLANAGAN              (The usual disclaimers apply)
john@viper.UUCP (04/05/87)
In article <383@newton.praxis.co.uk> mct%praxis.uucp@ukc.ac.uk writes:
>Once apon a time, about five years ago, a UK company called DJAI, (the
>authors of The Last One - a 4GL program generator) said that they had an
>algorithm which would (reversibly) compress arbitrary text into 40-80 bytes.
>They even had a demonstration system which claimed to show the algorithm in
>operation.
>
>They planned to demonstrate the system by compressing the complete source
>text of The Last One, copying the <80 bytes to a hand-held micro with a
>printer, then decompressing and producing the total listing from the micro.
>
>I expect the algorithm has been suppressed by the UK MoD :-)

This is a fascinating story... Anyone else ever hear about this?
Anyone know -anything- (or have any guesses) about the algorithm?
---
John Stanley (john@viper.UUCP)
Software Consultant - DynaSoft Systems
UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john
pitaro@savax.UUCP (04/05/87)
>> Once apon a time, about five years ago, a UK company called DJAI, (the
>> authors of The Last One - a 4GL program generator) said that they had an
>> algorithm which would (reversibly) compress arbitrary text into 40-80 bytes.
>> They even had a demonstration system which claimed to show the algorithm in
>> operation.
>>
>> They planned to demonstrate the system by compressing the complete source
>> text of The Last One, copying the <80 bytes to a hand-held micro with a
>> printer, then decompressing and producing the total listing from the micro.
>>
>> I expect the algorithm has been suppressed by the UK MoD :-)
> This is a facinating story... Anyone else ever hear about this?
> Anyone know -anything- (or have any guesses) about the algorithm?

My guess is that it only handles certain text, i.e., text in its list. If
the text is indexed in the list then it prints it out; otherwise it stops
with an error. The "arbitrary text" this algorithm compresses is arbitrary
as long as it's in the database. Or maybe it's a hoax. What was the date on
the original article? 4-1-87 maybe :-)

Michael Pitaro
USMail: Sanders Associates        UUCP:  decvax!savax!pitaro
        MER24-1583C               PHONE: +1 (603) 885-9036
        CS 2034                   HOME:  46-D Hampshire Drive
        Nashua, NH 03063-2034            Nashua, NH 03063
lambert@mcvax.UUCP (04/06/87)
]>> Once apon a time, about five years ago, a UK company called DJAI, (the
]>> authors of The Last One - a 4GL program generator) said that they had an
]>> algorithm which would (reversibly) compress arbitrary text into 40-80 bytes.
]> This is a facinating story... Anyone else ever hear about this?
] Or maybe it's a hoax. What was the date on the original article? 4-1-87
] maybe :-)

No, Mar-30-87. But think: when was "about five years ago"?
--
Lambert Meertens, CWI, Amsterdam; lambert@cwi.nl
lindsay@cheviot.UUCP (04/07/87)
In article <528@savax.UUCP> pitaro@savax.UUCP (Michael Pitaro) writes:
>compresses is arbitrary as long as it's in the database. Or maybe
>it's a hoax. What was the date on the original article? 4-1-87 maybe :-)

As far as I remember it wasn't a hoax but a con, and I don't think it was
to do with The Last One people either. I recollect that someone was touring
the country charging 500 pounds for a seminar on this technique. Several
people paid him "just in case". The man I am thinking of subsequently
vanished with a large sum of money.....

Lindsay
--
Lindsay F. Marshall, Computing Lab., U of Newcastle upon Tyne, Tyne & Wear, UK
JANET: lindsay@uk.ac.newcastle.cheviot   ARPA: lindsay%cheviot.newcastle@ucl-cs
PHONE: +44-91-2329233                    UUCP: <UK>!ukc!cheviot!lindsay
"How can you be in two places at once when you're not anywhere at all?"
hollombe@ttidca.UUCP (04/10/87)
>> Once apon a time, about five years ago, a UK company called DJAI, (the
>> authors of The Last One - a 4GL program generator) said that they had an
>> algorithm which would (reversibly) compress arbitrary text into 40-80 bytes.
>> They even had a demonstration system which claimed to show the algorithm in
>> operation.
>>
>> They planned to demonstrate the system by compressing the complete source
>> text of The Last One, copying the <80 bytes to a hand-held micro with a
>> printer, then decompressing and producing the total listing from the micro.
> Anyone know -anything- (or have any guesses) about the algorithm?

There are algorithms that can do this, though whether they could be
implemented on a micro is problematic. The one that comes to mind is
converting the arbitrary text into one _very_ large number and then
representing that number as a combination of powers of smaller numbers.
E.g.:

	1234^5678 + 456^123 + 90^20 - 8^2 - 3

Note the above amount probably couldn't be contained in the known universe
if you attempted to write it out in full, but the above representation is
highly compact. Sorry, I'm not aware of the details of how the text
conversion and restoration is done. I think Kurt Goedel (as in _Goedel,
Escher, Bach: an Eternal Golden Braid_) first proposed the technique
(occasionally referred to as "Goedelizing").
--
The Polymath (aka: Jerry Hollombe, hollombe@TTI.COM)
Citicorp(+)TTI                    (213) 450-9111, x2483
3100 Ocean Park Blvd.
Santa Monica, CA 90405
{csun|philabs|psivax|trwrb}!ttidca!hollombe
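[Editor's note: the text-to-number half of this idea is well defined and perfectly reversible; the sketch below is my own illustration, not anything DJAI demonstrated. The catch is a counting argument: the resulting integer needs exactly as many bits as the original text, and almost no such integers happen to have a short power-sum representation, which is why the scheme cannot compress arbitrary text.]

```python
def text_to_number(s: str) -> int:
    # Treat the UTF-8 bytes of the text as digits of one big base-256
    # number (a simple form of Goedel numbering).
    return int.from_bytes(s.encode("utf-8"), "big")

def number_to_text(n: int) -> str:
    # Invert: peel the integer back into bytes, then decode.
    # (Caveat of this toy version: text beginning with a NUL byte
    # would lose its leading zeros.)
    length = (n.bit_length() + 7) // 8
    return n.to_bytes(length, "big").decode("utf-8")

msg = "arbitrary text"
n = text_to_number(msg)
assert number_to_text(n) == msg
# The integer is exactly as big as the text -- nothing was compressed.
assert n.bit_length() <= 8 * len(msg.encode("utf-8"))
```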
pdg@ihdev.UUCP (04/13/87)
In article <4542@columbia.UUCP> metzger@garfield.columbia.edu.UUCP (Perry Metzger) writes:
>Look folks, as we all know from being cryptographers (you do actually
>know a bit about the subject, don't you?) that the redundancy of
>english is NEVER estimated by anyone in his right mind at more than
>75%, so thus it is impossible to compress it beyond that in general.

Unless you take symbolic representation into account. Say you numbered the
(2^14)-1 words, and used that to replace just the absolute matches
(forgetting about endings etc.); you would already be eliminating more than
you claim, before even implementing redundancy techniques. If you read a
little more on the subject, you would realize there is more than one way to
skin a cat.
--
Paul Guthrie
ihnp4!ihdev!pdg
This Brain left intentionally blank.
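[Editor's note: a toy version of this word-numbering scheme might look like the sketch below, with a five-word stand-in for the proposed (2^14)-1-entry table. A real scheme would also have to preserve case, punctuation, and spacing; this one only round-trips single-spaced lowercase text, which foreshadows the objections in the follow-ups.]

```python
# Hypothetical word-index scheme: each table word gets a small integer
# code; words not in the table pass through as literals.
WORDS = ["the", "of", "and", "to", "a"]        # stand-in for (2^14)-1 words
CODE = {w: i for i, w in enumerate(WORDS)}

def compress(text: str) -> list:
    # Int code for exact matches, literal string otherwise.
    return [CODE.get(tok, tok) for tok in text.split()]

def decompress(tokens: list) -> str:
    # NB: joins with single spaces, so only single-spaced text survives.
    return " ".join(WORDS[t] if isinstance(t, int) else t for t in tokens)

sample = "the cat and the hat"
assert decompress(compress(sample)) == sample
```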
metzger@garfield.columbia.edu.UUCP (04/16/87)
In article <1323@ihdev.ATT.COM> pdg@ihdev.UUCP (Joe Isuzu) writes:
>Unless you take symbolic representation into account. Say you
>numbered the (2^14)-1 words, and used that to replace just the
>absolute matches (forgetting about endings etc), you would already be
>elimintating more than you claim, before even implementing redundancy
>techniques. If you read a little more on the subject, you would
>realize there is more than one way to skin a cat.

Sorry, you still lose. If you do that, you are not compressing an ARBITRARY
text that much. For instance, if I have 256 books in my library, and each
one of them contains say a million words (I know that is big), you can then
compress any book in the library down to 8 bits, provided that you know
that the text you are transmitting is in the library. But then, you see,
you are not compressing an ARBITRARY text, only a text you can find in that
limited library. If I pick completely random text I can't do that. Enough
of this.

.pm
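[Editor's note: the 256-book argument can be made concrete in a few lines. The "compression" is real only because sender and receiver share the library in advance, which is exactly the point being made: the 8 bits name a text, they do not encode it.]

```python
# Shared context masquerading as compression: with 256 agreed-upon books,
# any ONE of them can be named in a single byte -- but only those 256.
library = [f"full text of book number {i} ..." for i in range(256)]

def compress(book: str) -> bytes:
    return bytes([library.index(book)])     # 8 bits, by prior agreement

def decompress(code: bytes) -> str:
    return library[code[0]]

assert decompress(compress(library[42])) == library[42]
# An arbitrary text outside the library cannot be encoded at all:
# compress("completely random text") raises ValueError.
```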
rab@well.UUCP (04/22/87)
In a previous article Perry Metzger writes:
+ Joe Isuzu writes:
+>Unless you take symbolic representation into account. Say you
+>numbered the (2^14)-1 words, and used that to replace just the
+>absolute matches (forgetting about endings etc), you would already be
+>elimintating more than you claim, before even implementing redundancy
+>techniques. If you read a little more on the subject, you would
+>realize there is more than one way to skin a cat.
+Sorry, you still lose. If you do that, you are not compressing an
+ARBITRARY text that much. For instance, if I have 256 books in my
+library, and each one of them contains say a million words (I know
+that is big) you can then compress any book in the library down to 8
+bits, provided that you know that the text you are transmitting is in
+the library. But then, you see, you are not compressing an ARBITRARY
+text, only a text you can find in that limited library. If I pick
+completely random text I can't do that.

The subject was English text. Last time I checked, English was not
completely arbitrary. Illogical, perhaps, but not *completely* arbitrary.
The above-described technique would work fairly well on most text, although
I'm not sure whether it would allow you to break the 75% barrier you
previously referred to. Maybe.
--
Robert Bickford
{hplabs, ucbvax, lll-lcc, ptsfa}!well!rab
terrorist cryptography DES drugs cipher secret decode NSA CIA NRO IRS
coke crack pot LSD russian missile atom nuclear assassinate libyan RSA
mouse@mcgill-vision.UUCP (der Mouse) (04/26/87)
In article <1323@ihdev.ATT.COM>, pdg@ihdev.ATT.COM (Joe Isuzu) writes:
> In article <4542@columbia.UUCP> metzger@garfield.columbia.edu.UUCP (Perry Metzger) writes:
>> Look folks, as we all know from being cryptographers [...] that the
>> redundancy of english is NEVER estimated [...] at more than 75%, so
>> thus it is impossible to compress it beyond that in general.
> Unless you take symbolic representation into account. Say you
> numbered the (2^14)-1 words, and used that to replace just the
> absolute matches (forgetting about endings etc),

What you appear to be suggesting is not just compressing English text but
representing it at a higher level. If you "compress" beyond the point of
discarding all redundancy you are losing information; in other words, there
will be distinct input texts which compress to the same compressed text,
which means it will be impossible to correctly regenerate both of them from
the compressed text. For example, in English there are often two (or more)
ways to represent the same concept. We are not interested in merely storing
ideas (or if we are, we should be explicit in admitting it); we want to
compress text and then uncompress to the exact same sequence of bytes.

Alternatively, your posting could be taken to mean that you believe there
is more than 75% redundancy in English. In the case of Usenet postings I
fear you are correct - maybe your software would improve netnews
performance! What's that? You don't have this implemented? Then how do you
know it works? Go ahead and try it; I suspect you will find it doesn't
compress by as much as you think it will.

der Mouse (mouse@mcgill-vision.uucp)
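[Editor's note: der Mouse's "go ahead and try it" is easy to take up with a later general-purpose compressor; here zlib stands in for the LZ-style schemes the thread discusses. The key properties the thread argues about are both checkable: the round trip is exactly lossless, and the achieved ratio on ordinary prose is measurable rather than guessed at.]

```python
import zlib

# Compress a few hundred bytes of ordinary English prose and measure
# the result (the exact ratio depends on the input, of course).
text = ("What you appear to be suggesting is not just compressing English "
        "text but representing it at a higher level. We want to compress "
        "text and then uncompress to the exact same sequence of bytes, "
        "not merely store the ideas the text happens to express.")
raw = text.encode("utf-8")
packed = zlib.compress(raw, 9)

assert zlib.decompress(packed) == raw     # nothing was lost
assert len(packed) < len(raw)             # but it did get smaller
print(f"{len(raw)} -> {len(packed)} bytes")
```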