nr@notecnirp.Princeton.EDU (Norman Ramsey) (01/17/89)
Someone suggested to me that it might pay off to store my data files in
compressed format, then uncompress them when I get ready to use them.
The claim was that uncompression is faster than the associated disk I/O.
So here's the $64 question: has anybody substantiated this claim for the
IBM PC, XT, AT, or PS/2?  (Remember that computation and I/O speeds
differ on these machines, so your mileage may vary.)

In particular, how hard would it be to adapt the zoo source to do the
following?

    f = zopen(file, pathname, "r")   open pathname in the zoo archive
                                     file, for read only
    zread(f, ...)                    read from the zoo archive
    zgets(f, ...)                    like fgets, only compressed
    zgetc(f, ...)                    like fgetc, only compressed
    zeof(f, ...)                     you get the idea...

If I had this facility, I might actually be able to speed up my programs
and have them use less disk space at the same time (provided the claim
is true).

So, has anybody built a library like this?  My immediate desire is for a
fast Boyer-Moore grep on files in a zoo archive.  Does anybody have
that?  Does anybody who has messed with the zoo source (or Rahul, of
course) have any idea how hard it would be to twiddle zoo to do
something like this?  Obviously the space overhead would be low, since
sez is just a few Kbytes extra...

Norman Ramsey
nr@princeton.edu
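[The interface Norman sketches is easy to flesh out.  The names zopen,
zgetc, and zeof come from his post; everything else below is an
assumption.  In particular, instead of zoo's real LZW decoder, this
stand-in inflates a trivial (count, byte) run-length stream into
memory, just to show where a decompressor would plug in — it is a
hypothetical sketch, not the zoo source, and the zopen signature is
simplified to take an already-open FILE *.]

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    unsigned char *buf;   /* fully decompressed member contents */
    size_t len, pos;      /* length and current read position */
} ZFILE;

/* Inflate a (count, byte) run-length stream from fp into memory.
 * A real version would run zoo's LZW decoder here instead. */
ZFILE *zopen(FILE *fp)
{
    ZFILE *z = malloc(sizeof *z);
    size_t cap = 256;
    int count, byte;

    if (z == NULL)
        return NULL;
    z->buf = malloc(cap);
    z->len = z->pos = 0;
    while ((count = getc(fp)) != EOF && (byte = getc(fp)) != EOF) {
        while (z->len + (size_t)count > cap) {
            cap *= 2;
            z->buf = realloc(z->buf, cap);
        }
        while (count-- > 0)
            z->buf[z->len++] = (unsigned char)byte;
    }
    return z;
}

/* like fgetc, only compressed */
int zgetc(ZFILE *z)
{
    return z->pos < z->len ? z->buf[z->pos++] : EOF;
}

/* you get the idea... */
int zeof(ZFILE *z)    { return z->pos >= z->len; }
void zclose(ZFILE *z) { free(z->buf); free(z); }
```

[Once the member is in memory, zgets and zread are just loops over
zgetc, and a Boyer-Moore grep could scan z->buf directly.]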
dmurdoch@watdcsu.waterloo.edu (D.J. Murdoch - Statistics) (01/18/89)
In article <14227@princeton.Princeton.EDU> nr@princeton.Princeton.EDU (Norman Ramsey) writes:
>Someone suggested to me that it might pay off to store my data files
>in compressed format, then uncompress them when I get ready to use
>them. The claim was that uncompression is faster than the associated
>disk I/O. so here's the $64 question: has anybody substantiated this
>claim for IBM PC, XT, AT, or PS/2 (remember computation and I/O
>speeds differ on these machines, so your mileage may vary)

I've found this to be true in one very special case: searching a lot of
tiny files for text.  I've got about 200 files containing 1-2K each, and
found that Borland's GREP on all the little files takes about twice as
long as Buerg's ARCF on a big ARC file.  I'm using a 16 MHz 386 clone
with a 28 ms disk.

Duncan Murdoch
leonard@bucket.UUCP (Leonard Erickson) (01/18/89)
Sounds like what you want is a copy of looz.  One of looz's options is
"extract and execute".

    looz xx {zoo name} (unknown)

will extract the file to *memory* and run it.  Why reinvent the wheel?
-- 
Leonard Erickson        ...!tektronix!reed!percival!bucket!leonard
CIS: [70465,203]
"I used to be a hacker.  Now I'm a 'microcomputer specialist'.
You know... I'd rather be a hacker."
ttp@beta.lanl.gov (T T Phillips) (01/20/89)
In article <14227@princeton.Princeton.EDU>, nr@notecnirp.Princeton.EDU (Norman Ramsey) writes:
> Someone suggested to me that it might pay off to store my data files
> in compressed format, then uncompress them when I get ready to use
> them. ... has anybody substantiated this
> claim for IBM PC, XT, AT, or PS/2

I have found that sometimes this is true, other times not.  Working with
Turbo C on a Kaypro 2000 portable with one of the slowest 3.5" floppy
drives ever created, I found that it was several times faster to ARChive
the .H files and then extract them to a RAM disk for use than it was to
copy the individual files over to the RAM disk.  This was a case with a
lot of little files, where the time spent accessing the directory
between reads really slowed down the file-by-file copy.

On the same machine, working with some of the large files like the
libraries for Turbo C, the ARC overhead slowed things down.  I never
tried options like ARCing the files in an uncompressed format.
Incidentally, I was using a pretty old and now probably illegal version
of pkxarc.
nanook@novavax.UUCP (Keith Dickinson) (01/23/89)
in article <14227@princeton.Princeton.EDU>, nr@notecnirp.Princeton.EDU (Norman Ramsey) says:
> Someone suggested to me that it might pay off to store my data files
> in compressed format, then uncompress them when I get ready to use
> them. The claim was that uncompression is faster than the associated
> disk I/O. so here's the $64 question: has anybody substantiated this
> claim for IBM PC, XT, AT, or PS/2 (remember computation and I/O
> speeds differ on these machines, so your mileage may vary)
>
> Norman Ramsey
> nr@princeton.edu

Norman, if you were running your software off of floppy, I'd say it's
possible.  But even then I'd suggest that you find some way to obtain
the quick compress routines from either PKPAK or SEA (ARC).  A long
time ago, someone came out with a program that would compress the
spreadsheet files for Lotus.  This worked just fine for saving space,
but took forever and a day to save.

Your inherent problem is that data compression takes up more CPU cycles
than writing the file probably ever could.  If you're writing to a hard
disk, I'd say there is NO way you could compress faster than you could
write.  With the new, high-speed compression programs, it is possible to
do minimal compression (i.e. packing) quickly enough that the slowdown
wouldn't be noticeable, but it would still be a little slower.

Keith
 _     /|  | Fidonet  : 369/2 [(305) 421-8593] Brave Mew World South
 \'o.O'    | Internet : nanook@muadib.FIDONET.ORG
 =(___)=   | UUCP     : (novavax,hoptoad!ankh,muadib)!nanook
    U      | USNail   : 433 SE 13th CT. J-202, Deerfield Beach, Fl. 33441
  Ack!     | Disclaimer: This message was created by a faulty AI program.
           |   Don't blame me...I voted for Bill'n'Opus in '88
simon@ms.uky.edu (Simon Gales) (01/23/89)
LOOZ (part of the zoo set of programs) allows you to execute programs
that are stored in a ZOO archive.  It doesn't help you access archived
data files, but if you are using floppies, you could put most of the DOS
stuff and your utilities into an archive.
-- 
/--------------------------------------------------------------------------\
  Simon Gales@University of Ky       UUCP:   {rutgers, uunet}!ukma!simon
                                     Arpa:   simon@ms.uky.edu
  (- Insert Virus Code Here -)       MaBell: (606) 263-2285/257-3597
                                     BitNet: simon@UKMA.BITNET
john@stiatl.UUCP (John DeArmond) (01/23/89)
In article <929@novavax.UUCP> nanook@novavax.UUCP (Keith Dickinson) writes:
>in article <14227@princeton.Princeton.EDU>, nr@notecnirp.Princeton.EDU (Norman Ramsey) says:
>> Someone suggested to me that it might pay off to store my data files
>> in compressed format, then uncompress them when I get ready to use
>> them.
>
>Norman. If you were running your software off of floppy, I'd say it's
>possible.  But even then I'd suggest that you find some way to obtain
>either the quick compress routines from PKPAK or from SEA (Arc).
>
>Your inherent problem is that data compression takes up more cpu cycles
>than writing the file probably ever could.  If you're writing to a hard
>disk, I'd say that there was NO way you could compress faster than you
>could write.

It really does not matter that data compression takes more machine
cycles than writes.  What does matter is whether or not a given block of
data can be compressed during the interval a program would wait for disk
I/O.

A concrete example: about a year ago a friend and I wrote a
high-resolution Mandelbrot map generator.  This program calculates maps
to 8-bit resolution and thus stores a single pixel per byte.  An EGA
resolution map occupies about 220K bytes.  Aside from the obvious
hassles and waste in storing and transmitting such maps, the load time
into the display program is significant.

I implemented a simple run-length (RLE) compression.  Depending on the
complexity of the map, the data file is reduced in size by anywhere from
50% to a factor of 5 or more.  I've found that on my Compaq 386, it is
MUCH faster to decompress on the fly than to read the raw data.  The CPU
is so much faster than the I/O system that there is really no contest.
It's interesting to note that even with the image file totally in disk
cache space, the decompressing program runs faster than the program that
uses raw pixel maps.  The difference is not large but nonetheless
significant.
So Norman, the answer to your question is: IT DEPENDS!  General-purpose
algorithms such as LZW (ARC, compress, etc.) and Huffman do a pretty
good job in the general sense, but you can do much better if you can
exploit some characteristic of your data set.  KISS principles apply
fully here.  Many times a very simple algorithm will achieve a high
fraction of the compression of more complicated routines, but with a
vastly smaller implementation and execution time.  So take your compiler
in one hand and an editor in the other, and EXPERIMENT!

John
-- 
John De Armond, WD4OQC                  | "I can't drive 85!"
Sales Technologies, Inc.  Atlanta, GA   | Sammy Hagar driving
...!gatech!stiatl!john                  | thru Atlanta!
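[The run-length scheme John describes for 8-bit pixel maps can be
sketched in a few lines of C.  This is a guess at the shape of such a
coder, not his actual code: each run of identical pixels becomes a
(count, value) byte pair, with runs capped at 255 so the count fits in
one byte.]

```c
#include <stddef.h>

/* Encode n bytes from src into dst as (count, value) pairs.
 * Returns the encoded length.  Worst case (no runs at all) doubles
 * the data, so dst must have room for 2*n bytes. */
size_t rle_encode(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i = 0, out = 0;
    while (i < n) {
        unsigned char v = src[i];
        size_t run = 1;
        while (i + run < n && src[i + run] == v && run < 255)
            run++;
        dst[out++] = (unsigned char)run;
        dst[out++] = v;
        i += run;
    }
    return out;
}

/* Expand (count, value) pairs back into pixels; returns decoded length.
 * Note the decoder is a trivial loop -- this is why decompressing on
 * the fly can beat raw disk reads. */
size_t rle_decode(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i, out = 0;
    for (i = 0; i + 1 < n; i += 2) {
        unsigned char run = src[i], v = src[i + 1];
        while (run-- > 0)
            dst[out++] = v;
    }
    return out;
}
```

[On a map that is mostly one color, each 255-pixel run collapses to two
bytes; on a maximally complex map the "compressed" file grows, which is
the kind of data-set-dependent tradeoff John is pointing at.]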
wcs@skep2.ATT.COM (Bill.Stewart.[ho95c]) (01/24/89)
In article <2863@stiatl.UUCP> john@stiatl.UUCP (John DeArmond) writes:
} >> Someone suggested to me that it might pay off to store my data files
} >> in compressed format, then uncompress them when I get ready to use
} I've found that on my Compaq 386, it is MUCH faster to decompress
} on the fly than to read the raw data.  The CPU is so much faster than
} the I/O system that there is really no contest.
        ........
} So Norman, the answer to your question is - IT Depends!

That's really the correct answer.  With the more sophisticated
algorithms, compression is often slow, but decompression is table-driven
and very fast.  If you are going to generate a data set that you read a
lot but seldom write to, and don't need to access randomly from the
disk, storing it compressed can be a big win (assuming you've got enough
memory around).  If you're going to do a lot of both, try LZW, but also
try simple run-length encoding (RLE) for images.

} general purpose algorithms such as LZW (ARC, compress, etc) and
} Huffman do a pretty good job in the general sense but you can do
} much better if you can exploit some characteristic of your data
} set.  KISS principles apply fully here.  Many times a very
} simple algorithm will achieve a high fraction of more complicated
} routines but with a vastly smaller implementation and execution
} time.  So, take your compiler in one hand and an editor in the
} other and EXPERIMENT!

I've seldom found anything to beat LZW, even when I know a *lot* about
the special characteristics of the data.  For some data (e.g. images
with lots of empty space) a simple RLE can be worthwhile: a small, fast
90% solution may be better than a big, slow 98% one, especially on a PC.
But document your algorithms *very thoroughly*, and don't waste too much
time tweaking.
-- 
# Thanks;
# Bill Stewart, AT&T Bell Labs 2G218 Holmdel NJ 201-949-0705 ho95c.att.com!wcs
#
# News.  Don't talk to me about News.
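[The experiment the thread keeps recommending can be set up in portable
C.  Everything here is hypothetical: a synthetic "mostly blank" pixel
map, tmpfile() scratch files, and a toy (count, value) run-length coder
standing in for zoo or ARC's LZW.  It only shows the shape of the
raw-read versus read-and-decompress race; real numbers belong on the
actual PC/XT/AT disk in question.]

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

enum { MAP = 256 * 1024 };   /* synthetic 256K "pixel map" */

/* Toy RLE coder: (count, value) pairs, runs capped at 255. */
size_t encode(const unsigned char *s, size_t n, unsigned char *d)
{
    size_t i = 0, o = 0;
    while (i < n) {
        size_t r = 1;
        while (i + r < n && s[i + r] == s[i] && r < 255)
            r++;
        d[o++] = (unsigned char)r;
        d[o++] = s[i];
        i += r;
    }
    return o;
}

size_t decode(const unsigned char *s, size_t n, unsigned char *d)
{
    size_t i, o = 0;
    for (i = 0; i + 1 < n; i += 2) {
        unsigned char r = s[i];
        while (r-- > 0)
            d[o++] = s[i + 1];
    }
    return o;
}

/* Time a raw read of the map against reading its compressed form and
 * decoding in memory.  Results land in *raw_s and *rle_s (seconds). */
void race(double *raw_s, double *rle_s)
{
    unsigned char *map = malloc(MAP), *enc = malloc(2 * MAP),
                  *back = malloc(MAP);
    size_t elen;
    size_t i;
    FILE *raw = tmpfile(), *rle = tmpfile();
    clock_t t;

    memset(map, 0, MAP);                    /* lots of empty space... */
    for (i = 0; i < MAP; i += 997)
        map[i] = (unsigned char)i;          /* ...plus scattered detail */
    elen = encode(map, MAP, enc);

    fwrite(map, 1, MAP, raw);
    fwrite(enc, 1, elen, rle);

    rewind(raw);
    t = clock();
    if (fread(back, 1, MAP, raw) != MAP)
        perror("fread raw");
    *raw_s = (double)(clock() - t) / CLOCKS_PER_SEC;

    rewind(rle);
    t = clock();
    if (fread(enc, 1, elen, rle) != elen)
        perror("fread rle");
    decode(enc, elen, back);
    *rle_s = (double)(clock() - t) / CLOCKS_PER_SEC;

    free(map); free(enc); free(back);
    fclose(raw); fclose(rle);
}
```

[On a modern machine with the file in cache, both times round to nearly
zero; the point of the harness is its structure.  Swapping in real data
files and a real decompressor is the experiment John suggests.]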