[mod.computers.apollo] Apollo disk space erosion

dennis%cod@NOSC.MIL.UUCP (02/04/87)

The disk on my DN3000 (SR9.2.3, IX9.2.3, TCP2.1) has been gradually
filling up (down to ~2 Mbytes free as reported by LVOLFS), as well as
seeming to have a slower response time.  After being "up" for several
weeks, I had to reboot the node for an unrelated reason, and noticed
that I had recovered 10 Mbytes on the disk!  Machine seems faster,
too.

As a check, we did a /SYSTEST/LST on a node that had been up for about
45 days, rebooted, and "found" another 5.5 MBytes.  But a subsequent
LST could account for only a few tens of Kbytes in the user-visible
file system.  Where does all this disk space disappear to?  How can I
get it back without rebooting?

       Dennis Cottel  Naval Ocean Systems Center, San Diego, CA  92152
       (619) 225-2406     dennis@NOSC.MIL       sdcsvax!noscvax!dennis

Erstad@HI-MULTICS.ARPA.UUCP (02/05/87)

There are a couple places where storage seems to mysteriously disappear.
The first place (aside from formatting losses) is the VTOC (Volume
Table Of Contents?).  The total size shown in a LVOLFS takes into
account formatting losses, but not the VTOC.  This is around 5% or so,
and depends in part on the parameters used in INVOLing the disk.

There is also file space which is not catalogued and thus is not "user
visible".  Most of this hangs around only for the duration of a process
(transcripts, process paging, etc.)  and should not be a long term
effect, although for a given task this can be a problem.

Another common thing to watch out for is excessive use of the EDACL
command (which creates a new ACL object each time; EDACLing an entire
tree with 10K objects can use 10MB!).  Use ACL to assign ACLs (EDACLing
only a "template") and/or use SALACL to merge ACL structures.  If you
have processes which die horribly on a frequent basis, a FIND_ORPHANS
might help.
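
To put a rough number on the EDACL figure above, here is a small Python
sketch of the arithmetic.  The 10K-object and 10MB figures come from this
post; the function names are made up for illustration, and the idea that
cost grows linearly with the number of distinct ACL objects is an
assumption, not something documented for the file system.

```python
# Back-of-the-envelope arithmetic only.  The 10K objects / 10 MB figures are
# from the post; linear scaling per distinct ACL object is an assumption.

def acl_overhead_bytes(total_bytes, acl_objects):
    """Average disk cost of one ACL object."""
    return total_bytes / acl_objects


def tree_acl_cost(num_files, distinct_acls, bytes_per_acl):
    """Disk cost when num_files files share distinct_acls ACL objects."""
    return min(num_files, distinct_acls) * bytes_per_acl


# Roughly 1 KB per ACL object if 10K objects cost 10 MB.
per_acl = acl_overhead_bytes(10 * 1024 * 1024, 10_000)

# One ACL per file (EDACL on every object) versus a single shared template.
cost_unique = tree_acl_cost(10_000, 10_000, per_acl)
cost_shared = tree_acl_cost(10_000, 1, per_acl)

print(f"per ACL object: {per_acl:.0f} bytes")
print(f"one ACL per file: {cost_unique / 2**20:.1f} MB")
print(f"one shared ACL:   {cost_shared / 1024:.1f} KB")
```

If the assumption holds, sharing one template ACL across the tree costs
about one ACL object instead of ten thousand.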

The above explains only where disk space "disappears" to.  These are
things to watch out for, but won't cause disk space to be 'found' on
rebooting.  A little is normal (if you SALVOL, the amount you get is a
meg or two lower than what the node will show you once booted) but we
have never seen (unexplained) loss of free space (our site has 40+ Apollos
of almost every type).  If I had to guess (and this is ONLY a guess) you
may have an application which is creating uncatalogued permanent
objects.  Does the disk space reappear with a boot only, or with a
SALVOL/boot combination only?  If the latter, look at the SALVOL report
and see if it tells you anything.

HI-MULTICS

"Disclaimer:  My employer doesn't believe a word I say"

peterson@UTAH-CS.ARPA.UUCP (02/05/87)

Unlike native Unix boxes, Apollos do not have an explicit swap area.
Instead, user paging space is allocated from the general pool of free
disk space.

If a diskless node crashes or a process dies abnormally (e.g., from a
"sigp -blast", or by vanishing with a "process not found"), space used by
the process may not be reclaimed.  The result is that over a period of
weeks disk space leaks away.
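
For a feel of how fast such a leak runs, here is a rough Python sketch
using the figures reported earlier in this thread (5.5 MBytes recovered
after about 45 days of uptime, about 2 Mbytes free).  It assumes the loss
is linear over time, which is only a guess; the function names are
hypothetical.

```python
# Rough estimate only.  Figures are from the original question in this thread;
# a constant leak rate is an assumption, real loss is probably bursty.

def leak_rate_mb_per_day(recovered_mb, uptime_days):
    """Average space lost per day, judged by what a reboot gave back."""
    return recovered_mb / uptime_days


def days_until_full(free_mb, rate_mb_per_day):
    """Days before free space runs out at a constant leak rate."""
    if rate_mb_per_day <= 0:
        return float("inf")
    return free_mb / rate_mb_per_day


rate = leak_rate_mb_per_day(5.5, 45)
remaining = days_until_full(2.0, rate)

print(f"leak rate: {rate:.3f} MB/day")
print(f"days until a disk with 2 MB free fills: {remaining:.1f}")
```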

The best way to reclaim this space is to take the node down and run
SALVOL. There is another program, find_orphans, that tracks down objects
that are allocated on the disk but don't have a directory entry (these
can also result from crashes or abnormal process terminations).
However, run find_orphans only when there is no activity on the disk.
Otherwise, it might decide a file is an orphan if it finds it while it's
being created.
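
The idea of an orphan, and the reason for the idle-disk caution above, can
be sketched in a few lines of Python.  This is NOT how find_orphans is
actually implemented (I don't know its internals); it just models "allocated
on disk but no directory entry" as a set difference, with made-up object
names.

```python
# Conceptual model only: object IDs and the two sets are hypothetical, and the
# real find_orphans utility may work quite differently.

def find_orphan_ids(allocated_ids, catalogued_ids):
    """Return objects that occupy disk space but have no directory entry."""
    return set(allocated_ids) - set(catalogued_ids)


# Quiet disk: obj3 was left behind by a crashed process.
allocated = {"obj1", "obj2", "obj3"}
catalogued = {"obj1", "obj2"}
orphans = find_orphan_ids(allocated, catalogued)

# Busy disk: obj4 is allocated, but its directory entry has not been written
# yet, so a scan taken at this instant wrongly reports it as an orphan.
allocated_busy = allocated | {"obj4"}
suspects = find_orphan_ids(allocated_busy, catalogued)

print("orphans on a quiet disk:", sorted(orphans))
print("suspects on a busy disk:", sorted(suspects))
```

In this model obj4 is a false positive that exists only because the scan
raced with file creation, which is why the disk should be quiet first.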

There is a major advantage to this way of allocating swap space - it makes
adding or removing diskless partners a fairly trivial process.  (If you've
ever watched a Sun administrator reformat a disk all night to build new swap
partitions you'll know what I mean.  I think Sun is switching to the global
paging pool scheme in their next release...)