[comp.sys.hp] Huge directories

pearmana@prlhp1.prl.philips.co.uk (Andy Pearman) (11/14/88)

Can anyone help ?

I keep a check on the size of our news directories by issuing

    du      ( this is on an HP9000-350 running HP-UX6.0 )

This works fine except I get the message

    Huge directory < dir-name >--call administrator

Even after most of the files in these directories disappear
it still says "Huge directory".

Do I have to compress the directory in some way and if so
how do I go about doing this ??


Thanks for any help you can give,

     Andy.
   
-- 

Andy Pearman, Computer Dept, Philips Research Labs, Redhill, Surrey, England. 

guy@auspex.UUCP (Guy Harris) (11/26/88)

>Do I have to compress the directory in some way

Probably.  The message appears (at least in the S5R3 source) to be
advisory; it doesn't prevent "du" from continuing to work.

>and if so how do I go about doing this ??

1) Make sure nothing is running that could create new files in the
   directory.

2) Make a new directory in the same parent directory as the directory
   in question.

3) Move all the files from the directory in question into the new
   directory.

4) Remove the now-empty directory in question.

5) Rename the new directory to have the same name as the just-removed
   directory.
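
One possible rendering of the five steps above as a shell function, offered
as a sketch rather than a tested administrative tool. The function name
rebuild_dir and the scratch name ".rebuild.$$" are made up for illustration;
step 1 (keeping other writers out of the directory) is left to the caller.

```shell
# rebuild_dir DIR: recreate DIR so that it holds only its current entries.
# Assumes nothing else is creating files in DIR while this runs (step 1).
# Note: the new directory gets the default mode and owner, not the old ones.
rebuild_dir() {
    dir=${1%/}
    parent=$(dirname "$dir")
    tmp="$parent/.rebuild.$$"          # hypothetical scratch name

    # step 2: new directory in the same parent (same filesystem)
    mkdir -- "$tmp" || return 1

    # step 3: move every entry, dot files included; find is used instead
    # of a glob so a very large directory cannot overflow the argument list
    find "$dir/." ! -name . -prune -exec mv -- {} "$tmp"/ \;

    # step 4: rmdir refuses to remove a non-empty directory, so this also
    # checks that step 3 moved everything
    rmdir -- "$dir" || return 1

    # step 5: give the new directory the old name
    mv -- "$tmp" "$dir"
}
```

It would be called as "rebuild_dir /path/to/the/directory". If the rmdir
fails, the entries already moved are left in the scratch directory.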

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/26/88)

In article <706@prlhp1.prl.philips.co.uk> pearmana@prlhp1.prl.philips.co.uk (Andy Pearman) writes:
>Even after most of the files in these directories disappear
>it still says "Huge directory".
>Do I have to compress the directory in some way and if so
>how do I go about doing this ??

Except on systems that perform directory compaction (rumored to be true
for 4.nBSD, n >= 2, although I've never seen it happen), UNIX directories
never shrink even though entries are "deleted" from them.  A deleted
entry merely has its inumber changed to 0.  The simplest way in such
circumstances to reclaim some disk storage is to make a new directory,
plant the same inode links in it (using "mv" or "ln"), then remove the
old directory and rename the new one to the original name.

By the way, you can look at the allocated size of a directory (in
blocks) by "ls -sd dir_path_name".
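
To find which directories under a tree have grown large, the same "ls -sd"
can be run over every directory and the results sorted. This is a small
sketch; the function name dirs_by_size is invented here, and the block unit
is whatever the local "ls -s" reports.

```shell
# dirs_by_size ROOT: print "blocks path" for every directory under ROOT,
# largest allocation first. Only the directory files themselves are
# measured, not the files they contain.
dirs_by_size() {
    find "$1" -type d -exec ls -sd {} \; | sort -rn
}
```

For example, "dirs_by_size /path/to/news/spool | head" would show the ten
directories with the largest allocations (the path is a placeholder).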

dave@elandes.UUCP (D. Mathis) (11/27/88)

In article <515@auspex.UUCP>, guy@auspex.UUCP (Guy Harris) writes:
.
. gives a method for compressing directories
. 
.
> 1) Make sure nothing is running that could create new files in the
>    directory.
> 2) Make a new directory in the same parent directory as the directory
>    in question.
> 3) Move all the files from the directory in question into the new
>    directory.
> 4) Remove the now-empty directory in question.
> 5) Rename the new directory to have the same name as the just-removed
>    directory.


	When I have to do this operation, I 'move' the directory to a new
name and make a new directory with the correct name in a single command
line, i.e.
$ mv oldname newname ; mkdir oldname

	The window of vulnerability seems small enough that I have never
seen a process fail for not having the 'correct' directory available.  It
also allows current processes to keep their files open, since the directory
name isn't part of the file descriptor.
	Then I move the files from newname to oldname and remove newname.
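
The whole sequence described here might be wrapped up as follows. This is a
sketch only: the function name swap_and_refill and the scratch suffix
".old.$$" are invented for illustration, and the recreated directory gets
the default mode and owner rather than those of the original.

```shell
# swap_and_refill DIR: rename DIR out of the way, recreate it at once,
# then move the entries back and remove the old, oversized directory.
swap_and_refill() {
    dir=${1%/}
    old="$dir.old.$$"                  # hypothetical scratch name

    # the window in which DIR does not exist is just these two commands
    mv -- "$dir" "$old" && mkdir -- "$dir" || return 1

    # move everything back, dot files included, one entry at a time
    find "$old/." ! -name . -prune -exec mv -- {} "$dir"/ \;

    # rmdir fails, leaving $old in place, if anything could not be moved
    rmdir -- "$old"
}
```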

	Are there problems with this sequence that I have just never been
bitten by?
-- 
	Dave Mathis, ELAN designs           UUCP  ...oliveb!elandes!dave

guy@auspex.UUCP (Guy Harris) (11/27/88)

>Except on systems that perform directory compaction (rumored to be true
>for 4.nBSD, n >= 2, although I've never seen it happen),

I don't know that I've seen 4.xBSD perform directory compacting, in the
sense of shuffling entries within a block to coalesce several empty
spaces in that block.  Then again, I've never looked for it....

I have, however, seen a system with the 4.3BSD directory *shrinking*
code shrink a directory, i.e. do the moral equivalent of a "truncate"
on it when the last block is empty.  4.2BSD doesn't do that; n must be
>= 3 for that to happen.

Note that directories are compacted when a new entry is made, not when
an entry is removed.  I can think of two reasons why they might have
chosen to do it then, as opposed to doing it when entries are removed:

	1) It may cut down on "hunting", i.e. if a directory is just
	   barely big enough to require N blocks, deleting the last
	   entry and then creating a new entry that goes into the last
	   block won't cause the last block to be deleted and then
	   reallocated.

	2) It was easier that way.  When removing an entry, you stop the
	   directory search when you find the entry, rather than
	   scanning to the end of the directory; in order to decide
	   whether to truncate a directory, you have to scan all the way
	   to the end.  When creating a new directory entry, the
	   directory is scanned all the way to the end (to make sure
	   there isn't already an entry with that name).

I have no idea which, if any, of those is the reason why; there may well
be a third reason.  I currently have no opinion on whether it should be
done when an entry is created or when one is removed.
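
Whether a given system shrinks a directory on removal, on the next creation,
or not at all can be checked with a small experiment. The sketch below
assumes a mktemp command is available and that field 5 of "ls -ld" is the
size of the directory file in bytes; the names dir_bytes and shrink_probe
are invented here. On filesystems that do not store a directory as a simple
list of blocks the numbers may not mean much.

```shell
# dir_bytes DIR: size of the directory file itself (field 5 of "ls -ld")
dir_bytes() {
    ls -ld "$1" | awk '{ print $5 }'
}

# shrink_probe: grow a scratch directory, empty it, then add one entry,
# printing the directory's size after each stage.
shrink_probe() {
    probe=$(mktemp -d) || return 1

    i=0
    while [ "$i" -lt 2000 ]; do
        : > "$probe/a-fairly-long-file-name-$i"
        i=$((i + 1))
    done
    echo "full:         $(dir_bytes "$probe")"

    # remove every entry; shrinking at removal time would show up here
    find "$probe/." ! -name . -prune -exec rm -f {} +
    echo "after remove: $(dir_bytes "$probe")"

    # make one new entry; shrinking at creation time would show up here
    : > "$probe/one-new-entry"
    echo "after create: $(dir_bytes "$probe")"

    rm -rf "$probe"
}
```

Running "shrink_probe" prints three sizes. If only the last one drops, the
system is shrinking at creation time as described above; if none drops, the
directory has to be rebuilt by hand.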