[comp.os.aos] How to find *real* file sizes in AOS/VS...?

rab@murdu.oz (Richard Alan Brown) (09/21/89)

	After trying to understand the AOS/VS filesystem, I have become
more confused than ever. What I have tried to do is show how much _real_
disk space is used by a file or group of files. I have written a CLI command
to list 'f/len/index/elem/typ/l=filename', and then I have used 'awk' to
process this and calculate the numbers. However, the results are clearly
incorrect, and reflect my poor understanding of AOS/VS. Here's what I think
I know:

Each file has a length given in bytes. Given the element size for that file
(default 4 on our system), the real file size is just the length in bytes
rounded up to the next multiple of the element size. (e.g. An element size of 4
means 4*512 = 2k bytes, so such files are allocated in 2k chunks).
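
Expressed in C, the rounding I have in mind is just this (a sketch of mine,
using the 512-byte blocks and default element size of 4 mentioned above):

    /* Bytes actually allocated for the data portion of a file:
       the length rounded up to a whole number of elements. */
    unsigned long alloc_bytes(unsigned long len_bytes, unsigned long elem_blocks)
    {
        unsigned long elem_bytes = elem_blocks * 512UL;   /* 4 * 512 = 2k */
        unsigned long elems = (len_bytes + elem_bytes - 1) / elem_bytes;
        return elems * elem_bytes;  /* e.g. 3000 bytes -> 2 elements = 4k */
    }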

But what about index blocks? OK, so I also count the number of index levels
in a file, and allow a block for each level (Is this correct? Are index blocks
true blocks, or blocks within a 2k chunk? In other words, does the system lose
4 blocks on the first index 'block' and use this for the next three?)

BUT! I have noticed DG's sneaky compression of files (executables) with large
blocks of nulls in them (am I right?), so that a file can seem large (in bytes),
while actually taking up much less disk space. (Is this only for PRV files?
Why doesn't the file system tell users the 'correct' size?)

Now for the *really* tricky part. Create an empty CPD. Put a file in it
(length 0). Start adding data. Who knows how much space the file takes up!?
Does the SPACE command include the space taken up by directory entries? How
does one calculate that? (Note that when one deletes the file, the CPD is not
'empty'. This presumably is the directory entry...?)

So if there are any Data General employees or hackers out there, maybe you
could enlighten me?
						Richard Brown

						(rab@murdu.mu.OZ.AU)
						   or
						(rab@murdu.ucs.unimelb.OZ.AU)

jba@harald.ruc.dk (Jan B. Andersen) (09/21/89)

I think it's impossible, but you can get pretty close by DUMP'ing the
files to disk and then checking the size of the dumpfile. The overhead is
only for the filename, the ACL, time and date, etc.

      /|  /       Postmaster@RUC.dk               /^^^\     .----------------.
     / | /        DG-passer@RUC.dk               { o_o }    | SIMULA does it |
    /--|/         jba@meza.RUC.dk                 \ o / --> | with CLASS     |
`--'   '          rucjb@os1100.uni-c.dk        --mm---mm--  `----------------'

dik@cwi.nl (Dik T. Winter) (09/22/89)

About getting the disk space used by a specific file.

In article <116@harald.UUCP> jba@harald.ruc.dk (Jan B. Andersen) writes:
 > I think it's impossible, but you can get pretty close by DUMP'ing the
 > files to disk and then checking the size of the dumpfile. The overhead is
 > only for the filename, the ACL, time and date, etc.
I would think that a dumped file might use more space than a file that is
not dumped!  The previous poster alluded to compression techniques where
large blocks of 0's were not written to disk.  They might very well expand
when dumping in AOS (I do not know).

Anyhow, there are many OSes where it is impossible to obtain the number of
disk blocks used by a specific file.  E.g. under Unix (is this heresy?)
it is possible to create a file where ls (list files) gives a size of
several hundred megabytes and du (disk usage) on the directory reveals
that only 2 blocks are used.  (And of course under Unix it is possible that
the file that uses the most space on the system is invisible because, although
it is still open, it is already unlinked!)
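
Such a file takes only a moment to make. A sketch in C for a typical Unix
(on the systems I know st_blocks counts 512-byte units, but check yours):

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(void)
    {
        struct stat st;
        int fd = open("holey", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        lseek(fd, 200L * 1024 * 1024, SEEK_SET);  /* seek far past EOF  */
        write(fd, "x", 1);                        /* one real data byte */
        fstat(fd, &st);
        printf("ls size: %ld bytes, du size: %ld blocks\n",
               (long)st.st_size, (long)st.st_blocks);
        close(fd);
        return 0;
    }
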
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

mjn@sbcs.sunysb.edu (The Sixth Replicant) (09/22/89)

In article <8416@boring.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
>About getting the disk space used by a specific file.
>
>I would think that a dumped file might use more space than a file that is
>not dumped!  The previous poster alluded to compression techniques where
>large blocks of 0's were not written to disk.  They might very well expand
>when dumping in AOS (I do not know).

In point of fact, DUMP, DUMP_II, and, I believe, MOVE do zero elimination.
Whenever there are blocks of zeros, these are not written to the output.
There is a system call which has an option to give the next _allocated_
block. I believe it's ?BLKIO, but it's been several years. This is used
by DUMP_II to avoid spending large quantities of CPU time scanning
zeros in large, sparsely allocated files.
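
I can't vouch for the exact ?BLKIO interface anymore, but the idea is the
same as in this Unix sketch. SEEK_DATA and SEEK_HOLE are my stand-ins here
(an extension, not available on every Unix):

    #define _GNU_SOURCE            /* for SEEK_DATA/SEEK_HOLE on Linux */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>

    /* Visit only the allocated extents of a file, skipping the holes. */
    void walk_extents(int fd)
    {
        off_t data = 0, hole;

        for (;;) {
            data = lseek(fd, data, SEEK_DATA);  /* next allocated byte */
            if (data == (off_t)-1)
                break;                          /* past EOF: all done  */
            hole = lseek(fd, data, SEEK_HOLE);  /* end of this extent  */
            printf("data: %ld..%ld\n", (long)data, (long)hole);
            data = hole;
        }
    }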

-------------------------------------------------------------------------
Marc Neuberger                                      mjn@sbcs.sunysb.edu

rgs@igw.megatek.uucp (Rusty Sanders) (09/23/89)

> 	After trying to understand the AOS/VS filesystem, I have become
> more confused than ever. What I have tried to do is show how much _real_
> disk space is used by a file or group of files.
>
> [...]
> 
> So if there are any Data General employees or hackers out there, maybe you
> could enlighten me?

Well, I'm not currently a DG hacker, but in a previous life I used to be
(I do unix/Sun stuff now, much nicer as far as I'm concerned). Anyway,
you're right. What you are trying to do is VERY difficult with the AOS/VS
I remember (it's been a few years, so things MIGHT have changed, but I doubt
it).

Anyway, quite a long time ago I wrote a little program to do just what you're
asking about. You ran it on a filesystem and it generated a nice listing
of all files (in size order), and their actual sizes. It is possible to do,
but it's not at all easy.

What do you have to do? Well, first thing is to dismount the filesystem.
This means you can't do it on the root, but that's just tough (actually,
I re-wrote it to run under MP-AOS, and booted it from a floppy when I wanted
to size the root, but I won't ever admit it). Anyway, the trick is to read
the raw disk device, traversing the directory structures, and reading all the
raw index blocks for all the files.

Sounds like a pain? You bet it was. Fortunately, DG provides (or at least
used to provide) an internals document that described the file structure.
If you're still interested contact your local friendly neighborhood DG
sales engineer (does DG call them applications engineers?) and see what you
can shake loose.

Either that, or beat them up to provide a decent O/S interface that allows
you to find things like this out.

And don't ask if I still have that program handy. It was lost long ago in the
abyss of corporate hijinks and layoffs. Of course, if you wanted to hire me
as a consultant.....
----
Rusty Sanders, Megatek Corp. --> rgs@megatek or...
         ...ucsd!    ...hplabs!hp-sdd!    ...ames!scubed!   ...uunet!

gary@dgcad.SV.DG.COM (Gary Bridgewater) (09/24/89)

In article <1702@murdu.oz> rab@murdu.oz (Richard Alan Brown) writes:
 >I know:
 >
 >Each file has a length given in bytes. Given the element size for that file
 >(default 4 on our system), the real file size is just the length in bytes
 >rounded up to the next multiple of the element size. (e.g. An element size of 4
 >means 4*512 = 2k bytes, so such files are allocated in 2k chunks).

	Yes.

 >But what about index blocks? OK, so I also count the number of index levels
 >in a file, and allow a block for each level (Is this correct? Are index blocks
 >true blocks, or blocks within a 2k chunk? In other words, does the system lose
 >4 blocks on the first index 'block' and use this for the next three?)

	A 0 level file has no index blocks. It is a direct file and its size
	is one element.
	A 1 level file has one disk block (512 bytes) containing 128 four-byte
	logical disk addresses pointing to data elements.
	A 2 level file has one disk block pointing to 128 'level 1' index blocks.
	A 3 level file has one disk block pointing to 128 'level 2' index blocks.
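
	So, ignoring the empty-element trick described below, the space for a
	fully allocated file works out as follows (my sketch, not an Official
	Formula):

	    /* Blocks used by a FULLY allocated file: data elements plus
	       index blocks at each level (128 pointers per 512-byte
	       index block).  Hollow elements make the real number smaller. */
	    unsigned long total_blocks(unsigned long len_bytes,
	                               unsigned long elem_blocks)
	    {
	        unsigned long elem_bytes = elem_blocks * 512UL;
	        unsigned long elems = (len_bytes + elem_bytes - 1) / elem_bytes;
	        unsigned long index = 0, n;

	        if (elems == 0)
	            elems = 1;              /* a direct file is one element */
	        for (n = elems; n > 1; index += n)
	            n = (n + 127) / 128;    /* index blocks, level by level */
	        return elems * elem_blocks + index;
	    }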

 >BUT! I have noticed DG's sneaky compression of files (executables) with large
 >blocks of nulls in them (am I right?), so that a file can seem large (in bytes),
 >while actually taking up much less disk space. (Is this only for PRV files?

	We prefer "clever".
	In the above index block scheme you can have element pointers that are
	zero. AOS/VS takes this to mean that the entire element is empty and
	it provides the 0 bytes if you try to read these blocks. This is a
	great disk space savings for executables and databases. It will work
	on ANY kind of file, but only if you A) write at least a whole index
	block's worth of 0s at once or B) use some form of file positioning
	command to skip data.
	Note that it is important when moving such files over the net to use the
	MOVE/FTA/COMPRESS command rather than just MOVE (RMA form) or MOVE/FTA
	(no compression). Without both FTA and COMPRESS the transfer takes place
	a byte at a time and the system won't notice the null blocks. A way to
	fix files which have been incorrectly grown this way is to DUMP and LOAD
	them since DUMP will squeeze out the 0s and LOAD will do positional
	block writes.
	This 'compression' makes exact space computation tricky.
	It also makes reading such files interesting - study the ?BLKIO system
	call, for instance. Its main feature is the ability to skip these
	empty spaces - that is why DUMP_II can dump such files MUCH faster
	than DUMP, which reads a byte at a time. It is also why the system
	can seem to "go into its navel" when READing such a file - no disk
	activity, and the expansion is done at system priority.
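
	The difference between the two transfer styles is easy to see on any
	system with this 'hollow' file trick. A Unix sketch (AOS/VS does the
	equivalent at element granularity):

	    #include <sys/stat.h>
	    #include <fcntl.h>
	    #include <unistd.h>
	    #include <string.h>

	    /* A gets its zeros written out explicitly (every block ends up
	       allocated); B gets a positional write past the same zeros
	       (those blocks are never allocated).  Both read identically. */
	    int main(void)
	    {
	        char zeros[512];
	        int i;
	        int a = open("A", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	        int b = open("B", O_WRONLY | O_CREAT | O_TRUNC, 0644);

	        memset(zeros, 0, sizeof zeros);
	        for (i = 0; i < 2048; i++)          /* 1 MB of explicit 0s */
	            write(a, zeros, sizeof zeros);
	        write(a, "x", 1);

	        lseek(b, 2048L * 512, SEEK_SET);    /* skip the same 1 MB  */
	        write(b, "x", 1);                /* du: A ~1 MB, B ~1 block */
	        return 0;
	    }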

 >Why doesn't the file system tell users the 'correct' size?)

	What is the real size? If you read it a byte at a time you will get an
	EOF after the Nth byte so the file is N bytes long. If you ?BLKIO the
	file an element at a time you can discover how many blocks it is taking
	up and from that you can infer the element structure if you map the
	empty elements. But do you want the CLI to do that every time you say
	F/LEN? You could submit an STR to have another switch added to the
	VSII CLI to perform this activity.  You could also submit an STR
	asking that the system maintain a count of the number of elements
	allocated to a file which would make the system bigger and slower
	to provide a rarely needed piece of information.
	If you want to know how much space on the disk the file takes then
	create a CPD, move the file there, do a SPACE, delete the file, do
	another SPACE to get the size of the CPD itself, and subtract that
	from the first size. Crude but exact.

 >Now for the *really* tricky part. Create an empty CPD. Put a file in it
 >(length 0). Start adding data. Who knows how much space the file takes up!?
 >Does the SPACE command include the space taken up by directory entries? How
 >does one calculate that? (Note that when one deletes the file, the CPD is not
 >'empty'. This presumably is the directory entry...?)

	The size of the directory is the second number above. Directory space
	is a function of the number of files, any UDAs and the length of the
	filenames. Directories are also files so they have index blocks too!
	Yes, the space a directory takes is included in the size of the
	directory.

 >So if there are any Data General employees or hackers out there, maybe you
 >could enlighten me?

	You could also order a Filesystem Internals manual which goes into all
	the gory details of this.
	Another poster mentions using DUMP to discover a file's true size. Won't
	work - DUMP compresses nulls wherever it finds them irrespective of
	block boundaries.
	And some Unix versions also do this sort of 'hollow' file optimization.
	We may very well have inherited it from MULTICS, which is the inspiration
	for both Unix and AOS(/VS).

The above is my interpretation of How It All Works and should not be interpreted
as an Official Version. See the manual and the Release notices. Buy the sources
and KNOW enlightenment. Your mileage may vary.
-- 
Gary Bridgewater, Data General Corp., Sunnyvale Ca.
gary@sv4.ceo.sv.dg.com or 
{amdahl,aeras,amdcad,mas1,matra3}!dgcad.SV.DG.COM!gary
No good deed goes unpunished.

guestx@wave4.webo.dg.com (Guest login for misc) (10/18/89)

In article <741@megatek.UUCP> rgs@igw.megatek.uucp (Rusty Sanders) writes:
> 	After trying to understand the AOS/VS filesystem, I have become
> more confused than ever. What I have tried to do is show how much _real_
> disk space is used by a file or group of files.
>
> [...]
> 
> So if there are any Data General employees or hackers out there, maybe you
> could enlighten me?

A little-known feature of AOS/VS II is that this information is available
through the ?FSTAT system call. If you have an AOS/VS II system, issue a
FILESTATUS/PACKET CLI command and check out locations 25 and 26; these will
give you the actual number of blocks taken up by the file (in octal). This
used to work only with CPDs and LDUs, but we made it work with all files.
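
If you want the number programmatically rather than from the CLI, the
packet arithmetic is simple. A sketch (treating 25 and 26 as octal word
offsets, with the high word first -- both assumptions of mine, so check
the manual):

    /* Combine the two 16-bit ?FSTAT packet words into a 32-bit
       block count; multiply by 512 for bytes. */
    unsigned long fstat_blocks(unsigned short pkt[])
    {
        return ((unsigned long)pkt[025] << 16) | (unsigned long)pkt[026];
    }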

Standard Disclaimers Apply

Don Lehman
AOS/VS II Development.
Internet: don@tzone.ceo.dg.com