[comp.sys.amiga.misc] a compression filesystem

cpc@czaeap.UUCP (Chris Cebelenski) (02/12/91)

Here's an idea I've been tossing around:

	FCFS: Fast Compression File System

	Purpose: Reduce storage needed for seldom accessed files.

	Implementation: Pseudo or real device driver. (Probably pseudo.)

	How it would work:
		Basically, you MOUNT it just like any other device,
	and it works just like any other drive would, EXCEPT that
	it accesses a pre-allocated disk file on your hard drive (pseudo
	device, a real device would be more difficult to implement due
	to different HD controllers.)  This "drive" (called DC0: maybe)
	would look just like a regular AmigaDOS disk to applications, but
	anything stored on it would be compressed (LZW or whatever) rather
	than stored "straight."


	Advantages:  No special "data decoder" or headers required to
	de-compress files when they are loaded, the device takes care
	of it itself. Reduced storage needed for seldom-changed files.
	Pseudo file can be backed up just like any other file.

	Disadvantages: A pre-allocated chunk of your drive may need to
	be set up. Write speeds may be slow, depending on the depth of
	compression. (A good algorithm should keep the read speed at a
	reasonable rate.)
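The pseudo-device idea above can be sketched compactly: the handler compresses each logical block on write and decompresses it on read, so applications never see anything unusual. A minimal Python model (zlib standing in for LZW or whatever; the container here is an in-memory dict rather than a real pre-allocated disk file, and all names are inventions of this sketch):

```python
import zlib

class CompressedContainer:
    """Toy model of the FCFS idea: logical blocks are compressed before
    being stored and decompressed transparently on read."""
    BLOCK = 4096  # logical block size

    def __init__(self):
        self.blocks = {}  # logical block number -> compressed bytes

    def write_block(self, n, data):
        assert len(data) <= self.BLOCK
        self.blocks[n] = zlib.compress(data)

    def read_block(self, n):
        return zlib.decompress(self.blocks[n])

    def stored_size(self):
        return sum(len(b) for b in self.blocks.values())

c = CompressedContainer()
c.write_block(0, b"AmigaDOS " * 400)            # 3600 bytes of redundant data
assert c.read_block(0) == b"AmigaDOS " * 400    # round-trips exactly
assert c.stored_size() < 3600                   # stored form is smaller
```

A real handler would also need a block allocation map for the container file, since compressed blocks vary in size.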



	Anything else I might have forgotten to take into consideration?
	Anyone want to volunteer to write this beastie??? (Heh heh, why
	is everyone looking at me so strangely??)

--
==========================================================================
    Chris Cebelenski	    UUCP: portal.com!gdc!aminet!czaeap!cpc
    The Red Mage	    Internet: czaeap!cpc@aminet.gdc.portal.com
			    GEnie: C.CEBELENSKI
				 // "Amiga - The way REAL people compute"
 "Better dead than mellow"     \X/
==========================================================================
NOTE: Due to brain dead mailers, this message can *NOT* be REPLIED to, to
reach me you MUST send a NEW message.  Sorry!

jap@convex.cl.msu.edu (Joe Porkka) (02/13/91)

cpc@czaeap.UUCP (Chris Cebelenski) writes:

>Here's an idea I've been tossing around:

>	FCFS: Fast Compression File System

>	Purpose: Reduce storage needed for seldom accessed files.

>	Implementation: Pseudo or real device driver. (Probably Psuedo)

There is a program for Macs that does something like this.
Basically, you specify some or all files to be compressed.
Said files are then compressed.
When an application opens a compressed file, this utility intercepts
the Open call, uncompresses the file, and then proceeds to do a normal
Open. On Close it recompresses the file.

Kind of yucky, but it will work on top of any filesystem, with any kind
of file, with any application.
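That intercept-on-Open scheme can be modeled in a few lines. A hedged Python sketch (gzip stands in for whatever compressor the Mac utility actually used; `open_compressed` and the `.gz` suffix are inventions of this sketch, and recompression on Close is left out):

```python
import gzip
import os
import shutil

def open_compressed(path, mode="rb"):
    """Model of the intercepted Open call: if only a compressed
    sibling exists, decompress it in place, then do a normal open."""
    packed = path + ".gz"
    if not os.path.exists(path) and os.path.exists(packed):
        with gzip.open(packed, "rb") as src, open(path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # one-shot transparent decompress
        os.remove(packed)
    return open(path, mode)
```

The "yucky" part Joe mentions is visible even here: the whole file must be expanded before the application sees its first byte.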

dave@cs.arizona.edu (Dave P. Schaumann) (02/13/91)

In article <cpc.3184@czaeap.UUCP> cpc@czaeap.UUCP (Chris Cebelenski) writes:
|Here's an idea I've been tossing around:
|
|	FCFS: Fast Compression File System
|[...]
|	Anything else I might have forgotten to take into consideration?
|	Anyone want to volunteer to write this beastie??? (Heh heh, why
|	is everyone looking at me so strangely??)


I've seen this idea proposed periodically since I first started reading the
net a year ago.  Nothing ever seems to come of it though.  Maybe it's not
really worth the effort?

|    Chris Cebelenski	    UUCP: portal.com!gdc!aminet!czaeap!cpc
|    The Red Mage	    Internet: czaeap!cpc@aminet.gdc.portal.com
-- 
Dave Schaumann      | DANGER: Access holes may tear easily.  Use of the access
		    | holes for lifting or carrying may result in damage to the
dave@cs.arizona.edu | carton and subsequent injury to the user.

jap@convex.cl.msu.edu (Joe Porkka) (02/13/91)

dave@cs.arizona.edu (Dave P. Schaumann) writes:

>In article <cpc.3184@czaeap.UUCP> cpc@czaeap.UUCP (Chris Cebelenski) writes:
>|Here's an idea I've been tossing around:
>|


>I've seen this idea proposed periodically since I first started reading the
>net a year ago.  Nothing ever seems to come of it though.  Maybe it's not
>really worth the effort?

Well, you got it :-)

I've frequently thought of taking the source to something like ZOO and
turning it into a filesystem. That would probably be the easiest way.

Unfortunately, I haven't much need for it, so I don't have much ambition
to try it :-(

I guess it will take somebody who needs it badly enough to write it themselves.

peter@sugar.hackercorp.com (Peter da Silva) (02/13/91)

In article <cpc.3184@czaeap.UUCP> cpc@czaeap.UUCP (Chris Cebelenski) writes:
> 	Anything else I might have forgotten to take into consideration?

Yeah: the reason Unisys dropped compression on their drives. Programs
get weird when the size of a disk depends on what's on it.

One thing that would be cute, though, would be a read-only FS that
was based on a Zipped/Zooed/LHArced/whatever file...

	1> dir work:archives
		fred.zoo
	1> dir zoo:work=archives/fred.zoo	; assume "=" maps to :
		fred			fred.info
		fred.doc		fred.doc.info
	1> zoo:work=archives/fred.zoo/fred
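Peter's path mapping can be prototyped without any filesystem code at all: split the pseudo-path at the archive name, map "=" back to ":", and hand the remainder to an archive reader. A Python sketch (zip standing in for zoo, since the stdlib can read it; the `resolve` helper and its `archives` dict are inventions of this sketch):

```python
import io
import zipfile

def resolve(pseudo_path, archives):
    """Model of a read-only archive filesystem lookup. A path like
    'zoo:work=archives/fred.zip/fred' splits into the real archive
    path (with '=' mapped back to ':') and the member inside it.
    'archives' is a dict of real path -> archive bytes, standing in
    for the actual disk."""
    assert pseudo_path.startswith("zoo:")
    rest = pseudo_path[len("zoo:"):].replace("=", ":", 1)
    parts = rest.split("/")
    for i, part in enumerate(parts):
        if part.endswith((".zip", ".zoo", ".lha")):
            real = "/".join(parts[:i + 1])
            member = "/".join(parts[i + 1:])
            with zipfile.ZipFile(io.BytesIO(archives[real])) as zf:
                return zf.read(member)
    raise FileNotFoundError(pseudo_path)
```

Being read-only neatly sidesteps the Unisys problem above: the "disk" never changes size underneath anybody.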
-- 
Peter da Silva.   `-_-'
<peter@sugar.hackercorp.com>.

ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar) (02/13/91)

In article <cpc.3184@czaeap.UUCP> cpc@czaeap.UUCP (Chris Cebelenski) writes:
>Here's an idea I've been tossing around:
>
>	FCFS: Fast Compression File System
>	How it would work:
>		Basically, you MOUNT it just like any other device,
>	and it works just like any other drive would, EXCEPT that
>	it access a pre-allocated disk-file on your hard drive (Pseudo
>	device, a real device would be more difficult to implement due
>	to different HD controllers.)  This "drive" (called DC0: maybe)
>	would look just like a regular Amidos disk to applications, but
>	anything stored on it would be compressed (LZW or whatever) rather
>	than stored "straight."

I do not think you are differentiating between a handler and a device
here.  The device driver deals with the hardware, in this case accessing
the disk physically.  The handler adds the file system on top.  As
such, you really only want a handler; you can use the current hard disk
drivers, or the trackdisk.device for floppies.

>	Advantages:  No special "data decoder" or headers required to
>	de-compress files when they are loaded, the device takes care
>	of it itself. Reduced storage needed for seldom-changed files.
>	Pseudo file can be backed up just like any other file.

No headers?  Depends on what compression system you use.  A standard
Huffman encoder becomes quite a reasonable encryption system if you
lose the decoding tree.  You also probably need an index (see below).

No data decoder?  I presume that you mean hardware, because you are sure
going to need something (hardware or software) to turn your file back into 
raw data.

>	Disadvantage: Possible that a pre-allocated chunk of your drive
>	needs to be set up. Possible slow write speeds, depending on
>	depth of compression. ( A good algorithm should keep the read
>	speed at a reasonable rate.)

Definitely slow write speed, and read speed.  Additionally, random seeks
become very difficult: you will either have to decompress the file
from the start and roll forward to the point, or keep some sort of
index.  The latter would probably be a good idea, as many compression
systems read bit streams divided into variable-length parts, and if
you lose part of the file it is often very difficult to reconstruct
past that point, and definitely impossible if the system is adaptive
(arguments welcome on this last point).
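The index Ian suggests falls out naturally if the file is compressed in independent chunks: the index records where each compressed chunk starts, so a seek decompresses one chunk instead of rolling forward from the beginning of the file. A Python sketch (zlib as the stand-in compressor; the chunk size and function names are arbitrary choices of this sketch, and reads that cross a chunk boundary are not handled):

```python
import zlib

CHUNK = 8192  # uncompressed bytes per independently compressed chunk

def pack_with_index(data):
    """Compress fixed-size chunks independently; the index maps each
    chunk to (offset, length) within the packed stream. Per-chunk
    compression costs some ratio, since each chunk's model restarts."""
    out, index = bytearray(), []
    for off in range(0, len(data), CHUNK):
        comp = zlib.compress(data[off:off + CHUNK])
        index.append((len(out), len(comp)))
        out += comp
    return bytes(out), index

def read_at(packed, index, pos, n):
    """Random read within one chunk: the index avoids decompressing
    everything before pos."""
    chunk_no = pos // CHUNK
    start, length = index[chunk_no]
    plain = zlib.decompress(packed[start:start + length])
    return plain[pos % CHUNK : pos % CHUNK + n]
```

This also limits the damage from corruption that Ian describes: losing one chunk leaves every other chunk recoverable, at least with a non-adaptive (or per-chunk-reset) compressor.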

You should also consider that many compression systems need large
amounts of memory for storing tables and so forth.  Usually, the more
memory they are given, the better their compression gets, and the slower
they get at compressing.  The run-time requirements of such a
compressing handler may be very significant.  Adaptive compression
systems are the worst, and it is an adaptive system that would probably
need to be used, as multi-pass compressors would be somewhat inconvenient
in this application.

>	Anything else I might have forgotten to take into consideration?
>	Anyone want to volunteer to write this beastie??? (Heh heh, why
>	is everyone looking at me so strangely??)

Excuse the sound of hysterical laughter... :-)

You will notice that many PC hardware products that function as
suggested here have failed dismally, usually taking the company with
them.  One of the commonest causes, I have read, was the number of
damage suits leveled against the manufacturers for lost data.

--
Ian Farquhar                      Phone : + 61 2 805-9400
Office of Computing Services      Fax   : + 61 2 805-7433
Macquarie University  NSW  2109   Also  : + 61 2 805-7420
Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au

GUTEST8@cc1.kuleuven.ac.be (Ives Aerts) (02/14/91)

Just be patient... SOMEBODY's working on it now 8^)
------------------------------------------------------------------------
      Ives Aerts           |          IBM definition SY-34378
GUTEST8@BLEKUL11.BITNET    |   A signature consists of sequences of
gutest8@cc1.kuleuven.ac.be | non-blank characters separated by blanks.
------------------------------------------------------------------------

drool@bisco.kodak.COM (Drool Rockworm) (02/14/91)

> 
> Here's an idea I've been tossing around:
> 
> 	FCFS: Fast Compression File System
> 
> 	Purpose: Reduce storage needed for seldom accessed files.
> 
> 	Implementation: Pseudo or real device driver. (Probably Psuedo)

I had thought that this would have been available already for quite some
time.  With speedier CPUs and quicker disks, why not take advantage of these
high-speed compression utilities under the hood?  I second the motion for
volunteers..


NOTE: Please IGNORE the return address on this message if it says
nobody@kodak.com.  Use one of the return addresses specified below.

										<: Drool :>

UUCP:	kodak!bisco!retreat!drool
ARPA:	retreat!drool@bisco.kodak.com

andy@cbmvax.commodore.com (Andy Finkel) (02/14/91)

In article <1991Feb13.052608.23920@msuinfo.cl.msu.edu> jap@convex.cl.msu.edu (Joe Porkka) writes:
>dave@cs.arizona.edu (Dave P. Schaumann) writes:
>
>>In article <cpc.3184@czaeap.UUCP> cpc@czaeap.UUCP (Chris Cebelenski) writes:
>>|Here's an idea I've been tossing around:
>>|
>
>
>>I've seen this idea proposed periodically since I first started reading the
>>net a year ago.  Nothing ever seems to come of it though.  Maybe it's not
>>really worth the effort?
>
>Well, you got it :-)
>
>Iv'e frequently thought of taking the source to something like ZOO, and
>turning it into a filesystem. That would probly be the easiest way.
>
>Unfortunatly, I haven't much need for it, so I don't have much ambition
>to try it :-(
>
>I guess it will take somebody who needs it bad enuf to write it themselves.


I think the sequence goes something like this:

A two-drive floppy programmer/developer has a hard time fitting a development
environment on disks...sees a solution in automatic compression/decompression,
notes that the Amiga supports alternate filesystems, and *poof* another
idea for COMPRESS: or ZOO: or ARC: is born.

The intrepid programmer bravely begins development of the handler...soon
runs into the usual problems of floppy based development.  Looks into
the budget, decides to spring for a small hard disk...it will make
development work so much faster...

Then suddenly, the intense need for COMPRESS: seems to decrease, and
there are many more interesting projects out there....

		andy

-- 
andy finkel		{uunet|rutgers|amiga}!cbmvax!andy
Commodore-Amiga, Inc.

"God was able to create the world in only seven days because there
 was no installed base to consider."

Any expressed opinions are mine; but feel free to share.
I disclaim all responsibilities, all shapes, all sizes, all colors.

David.Plummer@f70.n140.z1.FIDONET.ORG (David Plummer) (02/15/91)

You know, for a very simplistic compression system, it should be able to 
keep up with hard drive speed, shouldn't it?  We're not talking 
Lempel-Ziv at 400K/sec, but even repeated-pattern checking.  I know
that a data compression loader we wrote for the C64 years back really
sped up transfer, but of course the Amiga is nothing like a 1541 (hate 
to even mention them both in the same message). 
 
There is a HD card (albeit this is hardware) in the AT world that does 
something similar, and reportedly almost doubles your HD capacity in a 
quite transparent manner.
 
- Dave



--  
David Plummer - via FidoNet node 1:140/22
UUCP: ...!herald!weyr!70!David.Plummer
Domain: David.Plummer@f70.n140.z1.FIDONET.ORG
Standard Disclaimers Apply...

dac@prolix.pub.uu.oz.au (Andrew Clayton) (02/17/91)

In article <71.27BCCBB8@weyr.FIDONET.ORG>, David Plummer writes:

> You know, for a very simplistic compression system, it should be able to 
> keep up with hard drive speed, shouldn't it?  We're not talking 
> Lempel-Zev at 400K/sec, but even repeated-pattern checking.  I know
> that a data compression loader we wrote for the 64 years back really
> sped up transfer, but of course the Amiga is nothing lioke a 1541 (hate 
> to even mention them both in the same message). 
>  
> There is a HD card (albeit this is hardware) in the AT world that does 
> something similar, and reportedly almost doubles your HD capacity in a 
> quite transparent manner.

And it is universally derided as the worst piece of junk people ever used.

The 'KISS' approach is best. Keep It Simple, Stupid. If people want to put
more data on their devices, they should BUY larger storage capability
devices. As simple as that.

Dac
--

ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar) (02/18/91)

In article <71.27BCCBB8@weyr.FIDONET.ORG> David.Plummer@f70.n140.z1.FIDONET.ORG (David Plummer) writes:
>You know, for a very simplistic compression system, it should be able to 
>keep up with hard drive speed, shouldn't it?  

Only if you've got a very slow hard disk.

>We're not talking 
>Lempel-Zev at 400K/sec, but even repeated-pattern checking.  

LZ is really repeated pattern checking: the reduction of redundancy.

>There is a HD card (albeit this is hardware) in the AT world that does 
>something similar, and reportedly almost doubles your HD capacity in a 
>quite transparent manner.

Yes, there was.  The company sank without a trace under a mass of
lawsuits resulting from lost data on its product.  As for the system,
it was apparently just run-length encoding anyway, and in real use only
got about 20% packing, which was not worth the money or the hassle.
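For reference, run-length encoding really is that weak on ordinary data: it wins big on long runs of a single byte and can actually *double* mixed data, which is consistent with the ~20% figure quoted above. A minimal sketch (byte-pair format chosen for this sketch; real RLE schemes use escape codes to avoid expanding literal data):

```python
def rle_encode(data: bytes) -> bytes:
    """Naive run-length encoding: each run becomes a (count, byte)
    pair, with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        out += bytes([j - i, data[i]])
        i = j
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    for k in range(0, len(data), 2):
        out += bytes([data[k + 1]]) * data[k]
    return bytes(out)
```

A kilobyte of zeros packs into a handful of bytes, while text with no repeated adjacent characters doubles in size: hence hardware that shipped RLE and advertised "doubled capacity" disappointed badly in real use.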

--
Ian Farquhar                      Phone : + 61 2 805-9400
Office of Computing Services      Fax   : + 61 2 805-7433
Macquarie University  NSW  2109   Also  : + 61 2 805-7420
Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au

doug@eris.berkeley.edu (Doug Merritt) (02/18/91)

In article <18b2ca89.ARN2bfd@prolix.pub.uu.oz.au> dac@prolix.pub.uu.oz.au writes:
>The 'KISS' approach is best. Keep It Simple, Stupid. If people want to put
>more data on their devices, they should BUY larger storage capability
>devices. As simple as that.

Not quite. One approach still makes sense (this is something I was
thinking of doing for several years, but it doesn't look like I'll
ever get to it): have a "compression file system" which works on
top of the regular one, so that individual files can be read/written
as compressed files transparently. Access would be something
like "type pak:dh0/directory/file.Z", where "pak:" is the device
that knows about compression, and "file.Z" is a previously compressed
file.

This lets all compressed files be directly visible to all programs,
something you can't really get any other way.

The same approach makes similar sense (maybe even more so) for
archives (zoo, lharc, arc, etc). Including cd'ing into archives.
	Doug
--
Doug Merritt		doug@eris.berkeley.edu (ucbvax!eris!doug)
		or	uunet.uu.net!crossck!dougm

dac@prolix.pub.uu.oz.au (Andrew Clayton) (02/19/91)

In article <1991Feb18.005236.8760@agate.berkeley.edu>, Doug Merritt writes:

> In article <18b2ca89.ARN2bfd@prolix.pub.uu.oz.au> dac@prolix.pub.uu.oz.au writes:
> >The 'KISS' approach is best. Keep It Simple, Stupid. If people want to put
> >more data on their devices, they should BUY larger storage capability
> >devices. As simple as that.
> 
> Not quite. One approach still makes sense (this is something I was
> thinking of doing for several years, but it doesn't look like I'll
> ever get to it): have a "compression file system" which works on
> top of the regular one, so that individual files can be read/written
> as compressed files transparently. Access would be something
> like "type pak:dh0/directory/file.Z", where "pak:" is the device
> that knows about compression, and "file.Z" is a previously compressed
> file.


I dunno Doug... I get _awfully_ peeved when it takes longer to unscramble a
file than to load it from the HD in the first place. Especially when I cannot
store it on the drive in unpacked format (i.e. I got it packed from one of the
European groups). In some cases, it is only the packing/unpacking routine that
stops a particular 'program' working on an accelerated platform.

Compression is fine for text, for stuff that you don't really mind having
spinning on your disk but would hate to have wasting lots of unnecessary
space, for files that you want to archive off, BUT for executables I am a
firm believer in -speed, speed, speed-. No timewasting compression for this
little black duck!

Dac
--
 _l _  _   // Andrew Clayton. Canberra, Australia.         I Post  .
(_](_l(_ \X/  Send mail to dac@prolix.pub.uu.oz.au                . .  I am.

dave@unislc.uucp (Dave Martin) (02/19/91)

From article <1213@macuni.mqcc.mq.oz>, by ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar):
> In article <cpc.3184@czaeap.UUCP> cpc@czaeap.UUCP (Chris Cebelenski) writes:
>>	Advantages:  No special "data decoder" or headers required to
>>	de-compress files when they are loaded, the device takes care
                                                    ^^^^^^^^^^^^^^^^^
>>	of it itself. Reduced storage needed for seldom-changed files.
        ^^^^^^^^^^^^
>>	Pseudo file can be backed up just like any other file.
> 
> No headers?  Depends on what compression system you use.  A standard
> huffman encoder becomes quite a reasonable encryption system if you
> forgot the decoding tree.  You also probably need an index (see later.)
> 
> No data decoder?  I presume that you mean hardware, because you are sure
> going to need something (hardware or software) to turn your file back into 
> raw data.

Of course there are headers and "data decoders" but they are handled in the
handler.  The application programs would never see them, need to deal with
them, or even know they exist.  The idea is that applications/utilities
would have no idea that the files are compressed, or care.

Personally, I would like to see a bit in the protection mask that tells the
handler that this file is compressed.  Changing the bit with the protect
command (without a NOACTION keyword) would cause the file to be converted
to the compressed/uncompressed state.  If there were insufficient disk space,
then an error would be given and the action would not occur.  If protect
were used with some sort of NOACTION keyword, then the bit state
would be changed without anything else being done; this would assume
that the user knew what s/he was doing.  You could even disable random
access on (C)ompressed files, or (preferred) document that there will be
a performance penalty for seeks.  Files that get seeked a lot could
just have their C bit cleared first by the user.
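The C-bit semantics described above amount to a tiny state machine per file: setting the bit converts the stored data, unless NOACTION is given, in which case only the flag flips. A hedged Python model (class and method names are inventions of this sketch; zlib stands in for the handler's compressor, and the disk-space check is omitted):

```python
import zlib

class Entry:
    """One file entry in a toy handler with a C (compressed)
    protection bit."""
    def __init__(self, data: bytes):
        self.data = data
        self.compressed = False

    def protect(self, c_bit: bool, noaction: bool = False):
        """Set or clear the C bit. Without noaction, the stored data
        is converted to match; with noaction, only the flag changes,
        assuming the user knows what s/he is doing."""
        if noaction or c_bit == self.compressed:
            self.compressed = c_bit
            return
        self.data = (zlib.compress(self.data) if c_bit
                     else zlib.decompress(self.data))
        self.compressed = c_bit

    def read(self) -> bytes:
        """Transparent read: applications never see compressed bytes."""
        return zlib.decompress(self.data) if self.compressed else self.data
```

The NOACTION path is the dangerous one, exactly as the post implies: flip the flag without converting and the next read hands an application garbage.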

Well, just some wishful thinking on my part, perhaps I'll experiment
with this when I have some time.

-- 
VAX Headroom	Speaking for myself only... blah blah blahblah blah...
Internet: DMARTIN@CC.WEBER.EDU                 dave@saltlcy-unisys.army.mil
uucp:     dave@unislc.uucp or use the Path: line.
Now was that civilized?  No, clearly not.  Fun, but in no sense civilized.

bombadil@diku.dk (Kristian Nielsen) (02/19/91)

 Can anybody provide some pointers to source for effective compressors such
as lharc (ftp, fish disk etc.), or to literature describing the algorithms?
This would help people who (like me) would consider doing a compression
file system, but can't afford the time to delve into the deep realms of
AmigaDOS AND at the same time do extensive math/statistics analysis or
whatever might be necessary to provide good/fast compression.

	Thanks in advance

	Kristian.

==========================================================================
Kristian Nielsen                          |      ///   Only the AMIGA
Student at DIKU, University of Copenhagen |     ///
(Department of Computer Science)          | \\\///     makes it possible!
Denmark                                   |  \XX/
==========================================================================

HOWELL@whitman.BITNET (DAVE HOWELL) (02/19/91)

Date sent:  19-FEB-1991 08:39:12

>Not quite. One approach still makes sense (this is something I was
>thinking of doing for several years, but it doesn't look like I'll
>ever get to it): have a "compression file system" which works on
>top of the regular one, so that individual files can be read/written
>as compressed files transparently.
>
>This lets all compressed files be directly visible to all programs,
>something you can't really get any other way.

There's still the performance-hit problem. How about a ".5 compression
filesystem"? I'll run a compressor program (zoo would be good, lharc would
make me happier) on most of the stuff on my hard disk, particularly the
stuff that I don't think I'll use any time soon. Six weeks from now,
I'll call it up in a program, and the file system will notice it's
compressed and uncompress it. The first access is thus slow, but future
accesses are uncompressed.

Now, I'm a little short on room, so I run the autosqueeze program, which
finds all files that haven't been accessed in n days (14 perhaps) and
compresses them in the background.

It's the automatic decompression that is the critical advantage of such a
filesystem. With data files (say, my 120K book data base) the performance
hit on a compressed file could really hurt. One-time decompression seems a
more reasonable solution.
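The background "autosqueeze" half of this is just an access-time sweep over the disk. A hedged Python sketch (the n-day threshold and the `.gz` suffix are choices of this sketch, gzip stands in for zoo/lharc, and the transparent decompress-on-first-open half would live in the filesystem):

```python
import gzip
import os
import shutil
import time

def autosqueeze(root: str, days: int = 14):
    """Compress every file under root not accessed in the last
    `days` days, replacing it with a .gz sibling."""
    cutoff = time.time() - days * 86400
    for dirpath, _, names in os.walk(root):
        for name in names:
            if name.endswith(".gz"):
                continue  # already squeezed
            path = os.path.join(dirpath, name)
            if os.stat(path).st_atime < cutoff:
                with open(path, "rb") as src, \
                        gzip.open(path + ".gz", "wb") as dst:
                    shutil.copyfileobj(src, dst)
                os.remove(path)
```

One caveat for a real implementation: the sweep itself must not touch access times, or every file looks freshly used after the first run.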

| Dave Howell                                                    bix: dhowell |
|  User Support Specialist                            Bitnet: HOWELL@WHITMAN  |
|   Whitman College                         Internet: howell@whitman.bitnet   |
|    "I'll get you a Satanic Mechanic." - Frank, Rocky Horror Picture Show    |