[comp.unix.questions] File space allocation/deallocation under Unix

rich@eddie.MIT.EDU (Richard Caloggero) (07/12/88)

     If you create a directory "d", then create a file "d/f" 1 megabyte
long (big), then "rm d/f", is it true that the space remains allocated
to directory "d" and cannot be garbage collected or otherwise reclaimed
until another file is created in directory "d" or "d" is deleted?  If this is true, why?
What good does it do to delete files, say, in your home directory
(if, for example, disk resources are low and your system administrator
  keeps hounding you to "clean up your directory")?



-- 
						-- Rich (rich@eddie.mit.edu).
	The circle is open, but unbroken.
	Merry meet, merry part,
	and merry meet again.

ron@topaz.rutgers.edu.UUCP (07/13/88)

Not true.  When the last reference to a file goes away, so does the space
it took up.  What is probably misleading you is the fact that typical UNIX
directories themselves never get smaller when you delete files.  First, you
need to realize what a UNIX directory contains.  A UNIX directory only contains
file names (1-14 characters on most, variable length on BSD variants) and
pointers to the inode that that name refers to.  An inode is a data structure
that contains the "essence" of the file, information such as the owner,
permissions, access times, and where on the disk the actual data contained
in the file is stored.  When the last directory reference to that inode
is removed (there is a count kept in the inode), the inode and the data
it represents are freed.  On traditional UNIX file systems, when you delete
a directory entry, the entry is just zeroed (actually only part is zeroed)
but its slot is left in the directory for future reuse.
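
A rough way to watch the link count at work.  This is only a sketch: the
scratch directory and the names "f" and "g" are made up for the example,
and it assumes mktemp is available.

```shell
#!/bin/sh
# Sketch: a file's data survives until its last name is removed.
# The temp directory and the names "f" and "g" are illustrative only.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

echo "some data" > f
ln f g                                   # second name for the same inode

# Field 2 of "ls -l" is the link count kept in the inode.
links_before=$(ls -l g | awk '{print $2}')

rm f                                     # removes one name, not the data
links_after=$(ls -l g | awk '{print $2}')
contents=$(cat g)

echo "links before rm: $links_before, after rm: $links_after"
echo "data still readable through g: $contents"

rm g                                     # last reference gone; space is freed
cd / && rmdir "$tmp"
```

The count drops by one for each name removed; only when it reaches zero
(and no process has the file open) are the inode and data blocks released.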

Consider that you create a directory with 1000 files in it.  On a System V
file system a directory entry is 16 bytes long (14 characters for the name,
2 bytes for the inode number).  The directory would grow to about 16000 bytes
(16032 counting the "." and ".." entries).  If you removed all the files, the
entries would be zapped, but your directory would still be that size,
representing 1000 free entries.  You can compress
these "holes" by removing the directory (if it were empty) or copying the
directory over fresh by moving each file or subdirectory to a fresh directory
and discarding the old one.  However, the data blocks actually consumed by the
files disappeared as soon as no one was still pointing to them.
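
If you want to see what your own system does, something like the following
should show it.  Treat it as a sketch: the scratch directory and the names
f1..f1000 are invented, mktemp is assumed to exist, and the numbers printed
depend entirely on the file system underneath.

```shell
#!/bin/sh
# Sketch: watch a directory's own size while names are added and removed.
# The scratch directory and the names f1..f1000 are illustrative only.
tmp=$(mktemp -d) || exit 1
d="$tmp/d"
mkdir "$d"

dir_size() {
    # Field 5 of "ls -ld" is the size in bytes of the directory itself.
    ls -ld "$1" | awk '{print $5}'
}

empty_size=$(dir_size "$d")

i=1
while [ "$i" -le 1000 ]; do
    : > "$d/f$i"                 # empty file; only the name costs space
    i=$((i + 1))
done
full_size=$(dir_size "$d")

rm -f "$d"/f*
remaining=$(ls -A "$d" | wc -l | tr -d ' ')
after_rm_size=$(dir_size "$d")

echo "empty: $empty_size  with 1000 names: $full_size  after rm: $after_rm_size"
rm -r "$tmp"
```

On a traditional System V file system the last two numbers should match.
A file system that shrinks directories will report a smaller size after
the rm.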

-Ron

chris@mimsy.UUCP (Chris Torek) (07/13/88)

In article <9662@eddie.MIT.EDU> rich@eddie.MIT.EDU (Richard Caloggero) writes:
>     If you create a directory "d", then create a file "d/f" 1 megabyte
>long (big), then "rm d/f", is it true that the space remains allocated
>to directory "d" and cannot be garbage collected or otherwise reclaimed
>until another file is created in directory "d" or "d" is deleted?

The answer is `yes', but given the way you phrased the question, I think
you misunderstand what is going on.  The space FOR THE NAME `f' remains
allocated.  Since file names are typically very small, this makes little
difference (space is allocated in 512 byte or 1 kbyte units).  On the
other hand, if a directory grows very large (many file names) and the
file names are subsequently removed, the space remains occupied, except
in 4.3BSD and later versions of BSD, where any new file creation or
renaming will truncate any unused blocks off the end.  For this to
work, the remaining file names need to be near the front of the directory;
this can be arranged by renaming those files near the end:

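	# approximately (assumes ls -f lists entries in on-disk order
	# and that no file name contains a newline):
	while :; do
		last=`ls -f | tail -1`
		mv "./$last" "./x.$$" && mv "./x.$$" "./$last"
		# stop once last_name_shown does not move.
		[ "`ls -f | tail -1`" = "$last" ] && break
	done
	# the directory has now been compacted.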
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gpasq@picuxa.UUCP (Greg Pasquariello X1190) (07/13/88)

In article <9662@eddie.MIT.EDU> rich@eddie.MIT.EDU (Richard Caloggero) writes:
>
>
>     If you create a directory "d", then create a file "d/f" 1 megabyte
>long (big), then "rm d/f", is it true that the space remains allocated
>to directory "d" and cannot be garbage collected or otherwise reclaimed
>until another file is created in directory "d" or "d" is deleted?  If this is true, why?
>-- 
>						-- Rich (rich@eddie.mit.edu).


No, this is not true.  What really happens is the inode is removed (thereby
"freeing" the disk space), but the filename remains in the directory (which
is really just a file with special permissions that holds other file names
and i-numbers).  The i-number in the directory entry is zeroed, signifying
a free slot in the directory.  When a new file is created in that directory,
any free slots are used for the new name and i-number rather than adding onto
the directory file itself.
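
The slot reuse can be modelled with a toy "directory" kept in a text file,
one "i-number name" line per slot.  This is only an illustration of the
idea, not how any kernel does it; the helper names dir_add and dir_rm and
all the i-numbers are invented.

```shell
#!/bin/sh
# Toy model of an old-style directory: one "i-number name" line per slot.
# An i-number of 0 marks a free slot.  Names and i-numbers are invented.
dir=$(mktemp) || exit 1

dir_add() {      # dir_add INUM NAME: reuse the first free slot, else append
    awk -v inum="$1" -v name="$2" '
        !used && $1 == 0 { print inum, name; used = 1; next }
        { print }
        END { if (!used) print inum, name }
    ' "$dir" > "$dir.new" && mv "$dir.new" "$dir"
}

dir_rm() {       # dir_rm NAME: zero the i-number but keep the slot
    awk -v name="$1" '
        $2 == name && $1 != 0 { print 0, $2; next }
        { print }
    ' "$dir" > "$dir.new" && mv "$dir.new" "$dir"
}

dir_add 2 .
dir_add 2 ..
dir_add 17 a
dir_add 18 b
dir_add 19 c
dir_rm b                         # slot 4 becomes "0 b"
slots_after_rm=$(wc -l < "$dir" | tr -d ' ')
dir_add 20 d                     # lands in slot 4; the directory does not grow
slots_after_add=$(wc -l < "$dir" | tr -d ' ')
slot4=$(sed -n 4p "$dir")

cat "$dir"
rm -f "$dir"
```

The slot count stays at five throughout: removing "b" frees a slot without
shrinking the file, and "d" then takes that slot over.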
-- 
=========================================================================
Greg Pasquariello                   AT&T Product Integration Center
att!picuxa!gpasq                299 Jefferson Rd, Parsippany, NJ 07054
=========================================================================

rjd@occrsh.ATT.COM (Randy_Davis) (07/13/88)

In article <9662@eddie.MIT.EDU> rich@eddie.MIT.EDU (Richard Caloggero) writes:
:     If you create a directory "d", then create a file "d/f" 1 megabyte
:long (big), then "rm d/f", is it true that the space remains allocated
:to directory "d" and cannot be garbage collected or otherwise reclaimed
:until another file is created in directory "d" or "d" is deleted?  If this is
:true, why?  What good does it do to delete files, say, in your home directory
:(if, for example, disk resources are low and your system administrator
:keeps hounding you to "clean up your directory")?
:						-- Rich (rich@eddie.mit.edu).

  When you remove the file, the space taken by the file *contents* is reclaimed,
i.e., the 1 megabyte above is available for system use again.  The 16 bytes
(for System V) taken by the file entry in the directory "file" is *not*
reclaimed unless you remove the directory.
  To explain the directory part further: if you make the directory then ls -al
the directory, you will see that the directory entry, ".", takes up 32 bytes
of space; 16 bytes for the "." entry, and 16 bytes for the ".." entry (you can
check this under System V via the 'od -c .' command).  If you create a file
under this directory, the directory "file" will increase by 16 bytes to 48
bytes.  If you then remove this file, the directory "file" does *not* reduce
by 16 bytes....

Randy

plipp@tugiig (Lipp Peter) (07/13/88)

In article <9662@eddie.MIT.EDU>, rich@eddie.MIT.EDU (Richard Caloggero) writes:
> 
> 
>      If you create a directory "d", then create a file "d/f" 1 megabyte
> long (big), then "rm d/f", is it true that the space remains allocated
> to directory "d" and cannot be garbage collected or otherwise reclaimed
> until another file is created in directory "d" or "d" is deleted?  If this is true, why?
> What good does it do to delete files, say, in your home directory
> (if, for example, disk resources are low and your system administrator
>   keeps hounding you to "clean up your directory")?

I have never heard such a ridiculous theory.  I do not KNOW how 4.2BSD does
it, but (see Maurice J. Bach, The Design of the UNIX Operating System,
Prentice Hall) I am very sure that, as in all Unix systems, it works like this:

Every file is represented by an inode.  This inode contains a list of the disk
blocks the file consists of (about 2000 for your MB, assuming 512-byte blocks).
If you remove the file, all the disk blocks are freed.

The only way the MB can still be in use after the rm is if somebody has made
a (hard) link to your file.  You wouldn't notice this, and when you do the rm,
the other user's directory still has a reference to the file's inode, so the
inode and the disk blocks would not be freed.  But the inode and the data
still belong to you.
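
One way to check beforehand whether any of your files has another name
somewhere is to look at link counts with find.  A small sketch; the
directory and file names ("mine", "theirs", "big", "small", "copy") are
hypothetical, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Sketch: spot files that still have another name somewhere.
# Directory and file names here are hypothetical.
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/mine" "$tmp/theirs"

echo "pretend this is a megabyte" > "$tmp/mine/big"
echo "ordinary file" > "$tmp/mine/small"
ln "$tmp/mine/big" "$tmp/theirs/copy"    # the other user's hard link

# -links +1 selects files whose inode link count is greater than one.
multi=$(cd "$tmp/mine" && find . -type f -links +1)
echo "files with extra links: $multi"

rm -r "$tmp"
```

Removing a file listed this way drops only your name for it; the blocks
stay allocated until the other links are gone too.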

Peter Lipp (plipp@tugiig.uucp)

eao@anumb.UUCP (e.a.olson) (07/16/88)

In article <17@tugiig> plipp@tugiig (Lipp Peter) writes:
>In article <9662@eddie.MIT.EDU>, rich@eddie.MIT.EDU (Richard Caloggero) writes:
>>      If you create a directory "d", then create a file "d/f" 1 megabyte
>> long (big), then "rm d/f", is it true that the space remains allocated
>> to directory "d" and cannot be garbage collected or otherwise reclaimed
>> until another file is created in directory "d" or "d" is deleted?  If this is true, why?

    I believe that directory blocks are never reclaimed until
    the directory is deleted.  If you have many files in a directory,
    (i.e. more entries than can fit into a directory block), one
    entry in the first block points to another disk block for
    more name-inode entries.  Even if you later clean up that directory
    so that there are only enough entries to fit into one disk block,
    the indirect block is retained.

ge@hobbit.sci.kun.nl (Ge' Weijers) (07/19/88)

From article <292@anumb.UUCP>, by eao@anumb.UUCP (e.a.olson):
>     I believe that directory blocks are never reclaimed until
>     the directory is deleted.  If you have many files in a directory,
>     (i.e. more entries than can fit into a directory block), one
>     entry in the first block points to another disk block for
>     more name-inode entries.  Even if you later clean up that directory
>     so that there are only enough entries to fit into one disk block,
>     the indirect block is retained.

This was true for BSD 4.2 at least.  I just looked at the directory
/usr/spool/news/.rnews on our BSD 4.3 system, and it was only 512 bytes
long, so there must be some space reclaiming going on (after a news problem,
1500 articles were queued there last week).
-- 
Ge' Weijers, Informatics dept., Nijmegen University, the Netherlands
UUCP: {uunet!,}mcvax!kunivv1!hobbit!ge

guy@gorodish.Sun.COM (Guy Harris) (07/21/88)

> I just looked at the directory /usr/spool/news/.rnews on our BSD 4.3 system,
> and it was only 512 bytes long, so there must be some space reclaiming going
> on (after a news problem 1500 articles were queued last week)

This is true.  4.3BSD will reclaim empty space at the end of a directory when
you create a new entry in the directory.  (One benefit of this, as you
discovered, is that it reclaims space in huge spool directories....)