nicholaA@moravian.EDU (04/28/89)
>>I believe sparse files are handled the only way possible on an 8-bit
>>machine -- you just ask ProDOS to read them, ProDOS says, "hey, there is a
>>lot of 00 data here," and gives it to you anyway.

>It has nothing to do with bits, Andy. Sparse files are designed to reduce
>disk usage of empty blocks. ProDOS is SUPPOSED to return 00's when you read
>from a gap in the file. The gaps represent all of those 00's. What I was
>suggesting (or maybe just jesting) was that you track down and shrink only
>those blocks which actually contain data.

What *I* am suggesting is that trying to "track down and shrink only those
blocks which actually contain data" is a bit much for the current ShrinkIt,
let alone the ][+ versions, which have almost no space left for
improvements. (The IIe version has about 3-4k left.) I was just trying to
let you know the way ShrinkIt _currently_ works.

>Suppose you have to shrink /USERS.DISK/HELPSCREENS.
>
>Open the parent directory for the file to shrink.
>     OPEN /USERS.DISK
>     descriptor = directory header entry
>
>Read the directory until we come to the file descriptor for /HELPSCREENS.
>     while descriptor is not for HELPSCREENS
>          READ [next descriptor]
>
>Depending on the file's storage type: find and read blocks of data into the
>input stream, shrink them, and write the shrunk data to the output stream.
>
>     If a seedling, read the single data block.
>          BLOCK_READ [data block]
>          shrink [data block]
>
>     If a sapling, read the index block and use it to read single data
>     blocks.
>          BLOCK_READ [index block]
>          while more data blocks
>               if non-contiguous data block
>                    output SET_MARK for next file position
>               BLOCK_READ [next data block]
>               shrink [data block]
>
>     If a tree, read the master index block. Use that to read index blocks,
>     and use those to read single data blocks.
>          BLOCK_READ [master index block]
>          while more index blocks
>               BLOCK_READ [next index block]
>               while more data blocks
>                    if non-contiguous data block
>                         output SET_MARK for next file position
>                    BLOCK_READ [next data block]
>                    shrink [data block]
>
>Of course, this method relies on the current ProDOS file system's directory
>and file structures. If Apple ever decided to come out with a new filing
>system, you would have to update ShrinkIt. As your program stands now, it
>is basically file system independent, but sparse files are not handled very
>elegantly.

I think the easiest method to handle what you are suggesting would be to
include the entire set of index blocks within the archive. This is probably
also the most space-efficient way, unless you consider the alternative,
file-system-independent method.

Alternatively, even though the ProDOS filing system will probably _never_
(so says Apple) change, a "better" way of handling sparse files would be to
have a series of threads within a NuFX archive which describe the next mark
within a file which actually contains data. The overhead for this is 16
bytes per blank area, be it 1 byte or 4 gigabytes long. This would allow
the archive to remain file-system independent.

Thus, your example of a single byte preceded by 16 million bytes of
"blankness" would be reduced to the header record (64 bytes), the length of
the filename (xx bytes), a single sparse record (16 bytes), a single data
record (16 bytes), and 1 byte of non-compressed data. If the filename is
another 10 bytes, we have a grand total of 107 bytes to represent a
16-megabyte file in a NuFX archive.

>>ShrinkIt run-length encodes the block before it does LZW on it, so at
>>most, there is an extra 6 bytes of overhead per 512 bytes of "blankness"
>>-- this is probably less for small files which would have to have a "map"
>>of the sparse blocks included in the data anyway...

>ProDOS currently allows for a file of length 16777215 bytes.
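To make the arithmetic concrete, here is a minimal sketch of the two
figures under discussion: the worst-case RLE overhead for ShrinkIt (6 bytes
per 512-byte block of blankness, before LZW compresses that output further)
and the size of the proposed NuFX sparse-thread representation. The
function names are mine, and the 64-byte header and 16-byte thread sizes
are taken from the figures above; treat the whole thing as illustrative
rather than as an actual NuFX implementation.

```python
MAX_PRODOS_FILE = 16777215   # largest ProDOS file length, in bytes
BLOCK = 512                  # ProDOS block size

def rle_worst_case(file_len, overhead_per_block=6):
    """Worst-case archive size if every 512-byte block of 00's still
    costs ~6 RLE bytes (LZW then shrinks this output further)."""
    blocks = -(-file_len // BLOCK)   # ceiling division: blocks in file
    return blocks * overhead_per_block

def nufx_sparse_size(filename_len, data_len):
    """Archive size using one NuFX sparse thread plus one data thread,
    with the data stored uncompressed."""
    header = 64           # NuFX header record
    sparse_thread = 16    # one 16-byte thread per blank area
    data_thread = 16      # one 16-byte thread for the real data
    return header + filename_len + sparse_thread + data_thread + data_len

# The 16-million-byte file with one real byte at the end:
print(rle_worst_case(MAX_PRODOS_FILE))   # 196608 -- the quoted worst case
print(nufx_sparse_size(10, 1))           # 107 -- with a 10-byte filename
```

The sparse-thread figure stays constant no matter how long the blank area
is, which is the whole point of keeping the gap description in the archive
format rather than in the compressed data stream.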
>While RLE and LZW are good for the majority of cases, in the extreme case,
>a file with one byte at the end, you still end up with a file which is
>approximately
>
>                      block       6 bytes
>     16777215 bytes * --------- * ------- = 196608 bytes long!
>                      512 bytes    block

Um, not really. The 6 bytes of overhead I quoted come from the RLE'ing of
each 512 bytes. The RLE output is then LZW'd to save space, so the actual
overhead is more like 27 bytes per 4096 bytes read. This is roughly double
the space needed to represent the blank area via a NuFX sparse thread (16
bytes per 4k represented). This applies to any run of the same byte, be
they 00's or $20's or whatever...

                      chunk        27 bytes
     16777215 bytes * ---------- * -------- = 110592 bytes
                      4096 bytes    chunk

And chances are that if you have a sparse file _that_ long, the user will
also have storage space for a measly 110k more. That is, the MAXimum
overhead to handle sparse files under the ProDOS filing system is 110k.
Never more. AND this is an extreme case -- most users will never generate a
file with a single non-00 byte on the end preceded by 16 million 00's, so
the actual space savings would be somewhat more substantial.

ShrinkIt is made to be a fairly simple archive program. It is _not_ made to
be the answer to all of man's problems. Right now, I consider the handling
of sparse files to be adequate. Perhaps in the future, when the HFS FST
becomes available for GS/OS, the GS version may have to deal with sparse
files, if only because the HFS file system can deal with files up to 4 gig
in length. But by then I hope to have the GS version of ShrinkIt done, and
it will always have more memory available to handle the full implementation
of NuFX as it now stands (since IIe ShrinkIt actually implements a subset
of it).

andy

----
Andy Nicholas                   CsNET: shrinkit@moravian.edu
Box 435                      InterNET: shrinkit%moravian.edu@relay.cs.net
Moravian College                 uucp: rutgers!lafcol!lehi3b15!mc70!shrinkit
Bethlehem, PA 18018             GEnie: shrinkit
----                     AppleLink PE: shrinkit