[comp.unix.wizards] Dynamic Inode lists

mangler@cit-vax.Caltech.Edu (Don Speck) (04/09/89)

In article <28454@apple.Apple.COM>, fair@Apple.COM (Erik E. Fair) writes:
> In the referenced article, mangler@cit-vax.Caltech.Edu (Don Speck) writes:
>
>	By the way, will someone please put the ilist into a file, so
>	it can be dynamically extended?
>
> If you do that, how do you propose to allocate it, such that you don't
> have to go seeking all over the disk to find the inode you're looking
> for?

A file containing all the inodes would probably need only one indirect
block on a BSD or V9 filesystem.  That indirect block is likely to remain
in the buffer cache.

The kernel itod() macro would be replaced by a lookup in an indirect block.

For large files such as this, the BSD filesystem allocates 256K from
each cylinder group before switching to the next.  That's 2048 inodes
allocated from each cylinder group, same as mkfs allocates now.

BSD programs that read inodes sequentially generally do it one filesystem
block at a time.  Having inode blocks rotationally placed instead of
contiguous would speed up such sequential reading.  Having it in a
file means that utility programs also get the benefit of readahead.

The kernel likewise reads (and caches) inodes a block at a time, hence
is not taking advantage of the current contiguous allocation of inodes.

ialloc() would want to allocate related files to the same inode block
(BSD ialloc halfway does this now) and scan the indirect block of the
file-of-inodes when it needed another inode block, looking for block
pointers near the desired value.

Granted, the file-of-inodes could get scattered if you have to extend
it or fill in holes while the disk is almost full.  You'd probably want
to preallocate a best guess, like with SunOS 4.0 NFS swap files.  But
that's better than being unable to extend it and consequently having
to allocate something huge just in case.  That could get expensive on
filesystems with large inodes, such as in Unix V9.

I suggest that inode 1 be used for the file-of-inodes (so it will be
in the same block as the root inode, hence cached together).

I'd mention where this idea has been used before, but some might find
the example to be odious.

jfh@rpp386.Dallas.TX.US (John F. Haugh II) (04/11/89)

In article <10301@cit-vax.Caltech.Edu> mangler@cit-vax.Caltech.Edu (Don Speck) writes:
>In article <28454@apple.Apple.COM>, fair@Apple.COM (Erik E. Fair) writes:
>> In the referenced article, mangler@cit-vax.Caltech.Edu (Don Speck) writes:
>>
>>	By the way, will someone please put the ilist into a file, so
>>	it can be dynamically extended?
>>
>> If you do that, how do you propose to allocate it, such that you don't
>> have to go seeking all over the disk to find the inode you're looking
>> for?
>
>A file containing all the inodes would probably need only one indirect
>block on a BSD or V9 filesystem.  That indirect block is likely to remain
>in the buffer cache.

Why not have the inumber contain information concerning the location on
the partition of itself?  Inumbers could be expanded to 32 bits, with
the first 28 giving the block number [ 1K blocks ] on the device and the
last 4 giving the inode within the block.  Keep that list of blocks
organized in a file.

Now you have no need to check the indirect block since you can directly
locate the inode given the inumber.  You may also expand the file by
grabbing a new block and making up a new group of inumbers.  You would
only have to read the ilist file when looking for new inumbers to use
and the inumber cache is empty [ same as now, except the search is
more expensive ].

>I'd mention where this idea has been used before, but some might find
>the example to be odious.

Yup.  It was sort-of used in VMS.  Directories contained pointers into
a giant index file.  An excellent example of how not to write a file
system.
-- 
John F. Haugh II                        +-Quote of the Week:-------------------
VoiceNet: (214) 250-3311   Data: -6272  | "Porsche does not recommend
InterNet: jfh@rpp386.Dallas.TX.US       |  exceeding any speed limits"
UucpNet : <backbone>!killer!rpp386!jfh  +--        -- Porsche Ad   ------------

karl@asylum.SF.CA.US (Karl Auerbach) (04/11/89)

A couple of years ago we did a Unix for the Crays out at the Livermore
labs.  As part of this system we allowed the Unix file systems to grow
(and contract) dynamically to change the number of available inodes
and disk blocks.  (The reason for this was because we were competing
for disk space with a competing OS on the same machine.)

In any case, I have forgotten the details myself, but I'm sure you can
follow up, if you are interested, with the author, Karen Schaeffer, at
Sandia Labs, Livermore (no, I don't know her net address.)

				--karl--

allbery@ncoast.ORG (Brandon S. Allbery) (04/15/89)

As quoted from <10301@cit-vax.Caltech.Edu> by mangler@cit-vax.Caltech.Edu (Don Speck):
+---------------
| In article <28454@apple.Apple.COM>, fair@Apple.COM (Erik E. Fair) writes:
| > In the referenced article, mangler@cit-vax.Caltech.Edu (Don Speck) writes:
| >	By the way, will someone please put the ilist into a file, so
| >	it can be dynamically extended?
| >
| > If you do that, how do you propose to allocate it, such that you don't
| > have to go seeking all over the disk to find the inode you're looking
| > for?
| 
| A file containing all the inodes would probably need only one indirect
| block on a BSD or V9 filesystem.  That indirect block is likely to remain
| in the buffer cache.
| (...)
| I'd mention where this idea has been used before, but some might find
| the example to be odious.
+---------------

"Damn the torpedoes; full speed ahead!"  It had a lot of good ideas, but
nobody took the time to implement them properly.  Pity, that.  The source is
TRSDOS (file DIR/SYS is the disk directory).

++Brandon
-- 
Brandon S. Allbery, moderator of comp.sources.misc	     allbery@ncoast.org
uunet!hal.cwru.edu!ncoast!allbery		    ncoast!allbery@hal.cwru.edu
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>
NCoast Public Access UN*X - (216) 781-6201, 300/1200/2400 baud, login: makeuser