[comp.os.aos] The *real* file size

kohli@gemed (Jim Kohli, but my friends call me) (09/23/89)

> Richard Alan Brown @ Comp Sci, Melbourne Uni, Australia notes:
>	But I have noticed DG's sneaky compression of files
>(executables) with large blocks of nulls in them (am I right?),
>so that a file can seem large (in bytes), while actually taking
>up much less disk space. (Is this only for PRV files?  Why
>doesn't the file system tell users the 'correct' size?)
>
AOS/VS "compresses" *ANY* kind of file which has a complete
"element" (i.e., contiguous disk allocation) of zeroes.  This
happens to save disk space -- and it does have some advantages
which I think outweigh the disadvantages.  This is also one
reason DG discourages making .PR files contiguous: you may get
faster page faulting, but you will take up more disk space,
though that may be worth the tradeoff.
Why doesn't the file system tell you the 'correct' size?  It
doesn't know!  What is the 'correct' size, after all?  The size
which AOS/VS gives you is the amount of space that the data *IN*
the file would occupy if it had no compression applied to it.  This
number should be treated as the size of the file in all cases
where disk space allocation is not critical.  Obviously, if a
file is contiguous, and has no non-zero data in it, the only
space it will occupy is directory overhead space.
If you really need to find out exactly how much space is
occupied by a single file, your best recourse is to create
a CPD, make a junk file in it (to init the directory),
delete the junk file, *NOW* note the space in the directory,
move your file into it and note the difference in space.
This method will not work for a series of files unless you
recreate the CPD for each file (because the directory
overhead space is not reclaimed).

>Now for the *really* tricky part. Create an empty CPD. put a
>file in it (length 0). Start adding data. Who knows how much
>space the file takes up!?
>
You are pretty close-- see the above...

>  Does the SPACE command include the
>space taken up by directory entries?
>
You betcha

> How does one calculate
>that (Note that when one deletes the file, the CPD is not
>'empty'. This presumably is the directory entry...?).
>
Correct, again!

How badly do you need to know?

Dumping stuff won't really do much because the same "rule of
compression" applies to dumpfiles.

DG gets a lot of flak from people who really need to know
how much space their file is really taking, and I believe
their method is "make CPD, make junk file, delete junk
file, write down space, move file into CPD, subtract new
space from old space".

Jim

""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
decrepitate: 1. to roast or calcine (salt, minerals, etc.) so as to
cause crackling or until crackling ceases...

In a context: "Oh, my brain is decrepitating!  Aaaaaaarrrgggghh!!!!"
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Jim Kohli                    | "Oh Grammar!  Water bag icer gut!
GE Medical Systems           | A nervous sausage bag ice!"
PO Box 414                   |
Milwaukee, WI	53201-414    |
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
gemed!hal!kohli@crd.ge.com
sun!sunbird!gemed!hal!kohli
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

mjn@sbcs.sunysb.edu (The Sixth Replicant) (09/23/89)

In article <1055@mrsvr.UUCP> kohli@gemed.med.ge.com (Jim Kohli, but my friends call me) writes:
>AOS/VS "compresses" *ANY* kind of file which has a complete
>"element" (i.e., contiguous disk allocation) of zeroes.  This
>...

I just want to clarify one point here. AOS/VS doesn't do any compression.
The file system allows users to have unallocated blocks in files. When
read, these blocks will be returned as zeros. Certain commands (MOVE,
DUMP, DUMP_II) exploit this property to save space. If, however, I write
an element of zeros to disk, I'll get an element of zeros on disk; the
file system doesn't scan for zeros to see if it can optimize the write.
This would take far too much CPU.
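
A minimal sketch of the distinction drawn above, using POSIX calls from
Python as a stand-in for the AOS/VS ones. The helper names and the
64 KiB size are illustrative assumptions, not anything taken from
AOS/VS. Whether the seek-created file really occupies fewer disk blocks
depends on the host file system, so the block counts are printed rather
than promised.

```python
import os
import tempfile

SIZE = 64 * 1024  # arbitrary illustrative size, not an AOS/VS element size


def make_hole_file(path, size=SIZE):
    """Seek past the end and write one byte; the skipped range is never written."""
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\x01")


def make_zero_file(path, size=SIZE):
    """Explicitly write zeros, followed by the same final byte."""
    with open(path, "wb") as f:
        f.write(b"\x00" * (size - 1))
        f.write(b"\x01")


def read_all(path):
    """Return the whole file as bytes."""
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    hole = os.path.join(d, "hole")
    zero = os.path.join(d, "zero")
    make_hole_file(hole)
    make_zero_file(zero)
    # Both files read back identically: the unwritten range comes back as zeros.
    print(read_all(hole) == read_all(zero))
    # The allocated block counts may differ (file-system dependent).
    print(os.stat(hole).st_blocks, os.stat(zero).st_blocks)
```

The point of the comparison is that the two files are indistinguishable
to a reader; only the allocation underneath can differ.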

I don't honestly know what UNIX does when I write to a high block number
in an empty file, but I'd be somewhat surprised if it doesn't allow for
unallocated blocks. The description of WRITE in "The design of the UNIX
Operating System" by Bach (p 101) suggests to me that the intervening
space is left unallocated. I think the difference between AOS/VS and
UNIX may be that AOS/VS provides the CPD in which you are told _exact_
space usage, where in UNIX you only have du, which I tend to suspect
doesn't give the precise number.
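
For the UNIX side of that comparison, a rough sketch of reading both
numbers for one file. The function name is hypothetical. It assumes a
platform where os.stat() exposes st_blocks counted in 512-byte units
(Linux and the BSDs do); to my knowledge this is the figure du works
from, and it may or may not include file-system overhead such as
indirect blocks.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (length in bytes, bytes allocated on disk) for one file.

    Assumes st_blocks is available and counted in 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    p = os.path.join(d, "sparse")
    with open(p, "wb") as f:
        f.seek(1024 * 1024)  # skip 1 MiB without writing it
        f.write(b"x")
    size, allocated = apparent_and_allocated(p)
    print("apparent:", size, "allocated:", allocated)
```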
-----------------------------------------------------------------------------
Marc Neuberger                                            mjn@sbcs.sunysb.edu

kohli@gemed (Jim Kohli, but my friends call me) (09/24/89)

Path: mrsvr.UUCP!csd4.csd.uwm.edu!uwm.edu!mailrus!ncar!boulder!sunybcs!sbcs!mjn
From: mjn@sbcs.sunysb.edu (The Sixth Replicant)
Newsgroups: comp.os.aos
Subject: Re: The *real* file size (was: How to find *real* file sizes in AOS/VS...?)
Keywords: AOS/VS filesize fubar
Message-ID: <3540@sbcs.sunysb.edu>
Date: 23 Sep 89 00:29:22 GMT
References: <1055@mrsvr.UUCP>
Sender: news@sbcs.sunysb.edu
Reply-To: mjn@sbstaff2.UUCP (The Sixth Replicant)
Organization: Tyrell Corp.
Lines: 23

>In article <1055@mrsvr.UUCP> I (Jim Kohli) wrote
>>AOS/VS "compresses" *ANY* kind of file which has a complete
>>"element" (i.e., contiguous disk allocation) of zeroes.  This
>>...

>I just want to clarify one point here. AOS/VS doesn't do any compression.
>The file system allows users to have unallocated blocks in files. When
>read, these blocks will be returned as zeros. Certain commands (MOVE,
>DUMP, DUMP_II) exploit this property to save space. If, however, I write
>an element of zeros to disk, I'll get an element of zeros on disk; the
>file system doesn't scan for zeros to see if it can optimize the write.
>This would take far too much CPU.
>

You are right!  I had originally believed this fairy tale
because I was involved in doing some I/O benchmarking which
involved a lot of ?RDB's/?WRB's, and DG had one of their high
power support dudes (Dave Barrows) criticize our results as
follows: "well, if you're only reading and writing zeroes, they
aren't actually written to the disk..." (this is a
recollection, but it isn't vague on this point).  I guess it
was easy to rationalize at the time because it seemed possible
that the PTE's might maintain a "zero page" bit (no such bit
has been documented to my knowledge, but this was in 1982 when
the soul of the new machine was more like an unfriendly
spirit).

Sorry about that bbbbbboard readers!

""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
A sun 3/50 with 8 MB isn't *JUST* a conspicuous consumption of
silicon!
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Jim Kohli                    | "Oh Grammar!  Water bag icer gut!
GE Medical Systems           | A nervous sausage bag ice!"
PO Box 414                   | (Oar aesthete groin-murder???)
Milwaukee, WI	53201-414    |
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
gemed!hal!kohli@crd.ge.com
sun!sunbird!gemed!hal!kohli
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

meissner@dg-rtp.dg.com (Michael Meissner) (10/14/89)

>  From: kohli@gemed (Jim Kohli, but my friends call me)
>  Newsgroups: comp.os.aos
>  Keywords: AOS/VS filesize fubar
>  Date: 22 Sep 89 20:08:47 GMT
>  Reply-To: kohli@gemed.med.ge.com (Jim Kohli, but my friends call me)
>  Organization: GE Medical (Applied Science Lab)
>  
>  > Richard Alan Brown @ Comp Sci, Melbourne Uni, Australia notes:
>  >	But I have noticed DG's sneaky compression of files
>  >(executables) with large blocks of nulls in them (am I right?),
>  >so that a file can seem large (in bytes), while actually taking
>  >up much less disk space. (Is this only for PRV files?  Why
>  >doesn't the file system tell users the 'correct' size?)
>  >
>  AOS/VS "compresses" *ANY* kind of file which has a complete
>  "element" (i.e., contiguous disk allocation) of zeroes.  This
>  happens to save disk space -- and it does have some advantages
>  which I think outweigh the disadvantages.  

Wrong.  If you actually write an element's worth of zeros, you will
get a block allocated on the disk.  If you call ?ALLOCATE it will
allocate blocks on the disk.  The only way you get holes is by:

    1)	You do a ?SPOS to a location that is at least an element-
	size beyond the current end of the file;  or

    2)	You run a 32-bit program that ?SPAGES more than a couple
	of pages (I forget what the threshold is between where
	the system allocates the pages for you, and where the
	pages are not created until you touch them).  I'm not
	entirely positive about this last case.

The linker will use ?SPOS when linking common sections together that
don't have any initializations.  All of the DUMP/LOAD commands will
check for zero blocks and not write the blocks on the tape, and use
?SPOS to create the hole.  I'm not sure about COPY.
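
The check-for-zeros-then-seek approach described for DUMP/LOAD can be
sketched as follows. This is a POSIX analogue in Python, not the DG
implementation: the function name is made up, a file seek stands in for
?SPOS, and the 512-byte block size is an assumption rather than the
AOS/VS element size.

```python
import os
import tempfile

BLOCK = 512  # illustrative block size; an assumption, not the AOS/VS element size


def copy_skipping_zero_blocks(src, dst, block=BLOCK):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    zero = bytes(block)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block)
            if not chunk:
                break
            if chunk == zero[:len(chunk)]:
                # Nothing but zeros: move the write position, leaving a hole.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # A trailing hole does not extend the file, so set the length explicitly.
        fout.truncate(fin.tell())


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    src = os.path.join(d, "src")
    dst = os.path.join(d, "dst")
    with open(src, "wb") as f:
        f.write(b"A" * 700 + bytes(2048) + b"B" * 10 + bytes(1500))
    copy_skipping_zero_blocks(src, dst)
    with open(src, "rb") as a, open(dst, "rb") as b:
        print(a.read() == b.read())
```

Note that the scan itself costs CPU on every block read, which is the
price of not knowing in advance where the holes are.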

The newer DUMP commands will use the ?BLKIO system call to bypass any
holes in a file, while the older DUMP and early versions of DUMP_II
would become CPU intensive (the kernel would realize that the element
was not on disk, and zero fill the page -- meanwhile DUMP would then
rapidly search the page to see if it contained all zeros, and if it
did, chuck it out the window).  On my development system, we once had
a 40+ Meg file that had been created by a bad ?SPOS (the file was only
about 1 Meg), and it took quite some time before the system load went
back to normal.
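
As a loose UNIX analogue of bypassing holes rather than reading and
scanning them, some systems let a program ask the file system where the
data is via lseek() with SEEK_DATA and SEEK_HOLE. The sketch below is
not a description of how ?BLKIO works; the function name is
hypothetical, and on platforms or file systems without hole reporting
the whole file is simply reported as one data range.

```python
import errno
import os
import tempfile


def data_extents(path):
    """Return [(start, end), ...] byte ranges the file system reports as data."""
    size = os.path.getsize(path)
    if not hasattr(os, "SEEK_DATA"):
        # No hole reporting available: treat the whole file as data.
        return [(0, size)] if size else []
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # no data at or after pos
                    break
                raise
            # The end of the file counts as a hole, so this always succeeds.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            pos = end
    finally:
        os.close(fd)
    return extents


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    p = os.path.join(d, "sparse")
    with open(p, "wb") as f:
        f.seek(1 << 20)  # leave the first 1 MiB unwritten
        f.write(b"x")
    print(data_extents(p))
```

A dump utility built on this would read only the reported ranges, so a
mostly empty file like the 40+ Meg one above would cost almost nothing
to walk.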

Normal disclaimer -- I only speak for myself, and not for Data General
--

Michael Meissner, Data General.				If compiles were much
Uucp:		...!mcnc!rti!xyzzy!meissner		faster, when would we
Internet:	meissner@dg-rtp.DG.COM			have time for netnews?