[comp.arch] OS Disk services,

gsarff@meph.UUCP (Gary Sarff) (01/30/91)

In article <1991Jan24.193458.16429@dirtydog.ima.isc.com>, suitti@ima.isc.com (Stephen Uitti) writes:
>In article <PCG.91Jan23183334@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>>On 21 Jan 91 22:52:11 GMT, dennis@gpu.utcs.utoronto.ca (Dennis Ferguson) said:
>>dennis> And I distinctly remember arguments being made at the time to
>>dennis> the effect that the speed of the Berkeley fast file system
> ... [excised]
>Interactive Systems has a file system based on the V.3.2
>(essentially V7) file system, but with bitmapped block
>allocation.  You can increase the static block size, but that
>doesn't buy anything, anymore.  It improves file system
>performance by a factor of 10 to 15, on good hardware by my
>benchmarks.  There is a vast difference between a good disk
>controller and a lesser disk controller.  The drivers make use of
>track buffering, if available.  A controller which can read a
>track in one revolution performs as if rotational latency is not
>an issue.  With read-ahead, and contiguous or near contiguous
>files, sequential access is very fast.  Random access is still
>pretty fast, mostly because the blocks are still close to each
>other.  The overhead of figuring out where they are is still an issue.
>
> ... [excised]
>I used to think that using a file system as complicated as the
>Berkeley Fast File System would not be as fast as a simpler file
>system with a bitmap access.  ...
> ... [excised]
>Perhaps a relatively simple file system using sorted extent lists would
>provide a faster system for sequential and random access.  If the file
>system were implemented in a user process under UNIX, you'd be able to
>profile the system, and figure out what the latencies really are for each
>part.
>
>It would also be nice if the inode information were near something
>else.  For that matter it would be nice if inodes had a larger
>name space than 16 bits (V7) and could be allocated dynamically,
>rather than statically (at mkfs time).  A good spot would be in
>directories.  This makes hard links more difficult.  This should
>not be a deterrent.

I am sorry, I could not resist 8-).  I have read the previous poster's
comments several times, and the fact is that I am working as an OS developer
at a company (WICAT) that does many of the things he is talking about:
bitmap sector mapping, extent lists, even an editor like the one he mentions
below that can edit arbitrarily large files.  Funny thing, though: it isn't
UNIX, so to many people it is bad, evil, never mind that it might be faster.
We have tested our OS, called WMCS, against the Unix Sys V we sell, and WMCS
was always faster, more responsive, and could handle a larger number of
users, _on the same hardware_.  Now maybe our UNIX (we got it from Unisoft)
wasn't ported well to Motorola CPUs; I don't know, since I am not in the
unix group.

The way the file system works has echoes of VMS in it.  We have a file,
FCB.SYS, that holds the file control blocks (FCBs) of all files on the disk.
Then there is BITMAP.SYS, which holds two interleaved bitmaps, 2 bits per
sector: one bit says whether the sector is free or used by some file; the
other is for bad-sector avoidance, turned on if some error occurred during a
read or write or whatever, after which the OS shuns that sector for writing.
And then there is FCBBITMAP.SYS, which has 2 bits for each FCB, same idea:
1 bit for "is this FCB being used by some file", 1 bit for "is this FCB good
or bad, i.e. writeable".  An FCB can hold up to 32 extents, and if the file
grows beyond that, the FCB chains on continuation FCBs to hold the rest.
Note there is no limit to how many extents, or FCBs, one file can have.
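
To make that concrete, here is roughly what the bookkeeping above might
look like in C.  This is purely my own illustrative layout -- the field
names, widths, and bit encoding are assumptions, not the real WMCS on-disk
structures:

    /* Illustrative sketch only: field names, widths and the bit
     * encoding are assumptions, not the actual WMCS layout. */
    #include <stdint.h>

    #define EXTENTS_PER_FCB 32

    struct extent {
        uint32_t start_sector;            /* first sector of this run */
        uint32_t sector_count;            /* length of the run */
    };

    struct fcb {                          /* one file control block */
        uint32_t fcb_number;              /* 32-bit FCB number */
        uint32_t next_fcb;                /* continuation FCB, 0 = none */
        uint32_t checksum;                /* lets recovery spot an FCB */
        struct extent ext[EXTENTS_PER_FCB];
    };

    /* BITMAP.SYS keeps two interleaved bits per sector: one "in use"
     * bit and one "bad, shun for writes" bit.  FCBBITMAP.SYS does the
     * same per FCB.  One plausible way to read the pair: */
    enum { MAP_USED = 1, MAP_BAD = 2 };

    static unsigned map_bits(const uint8_t *map, uint32_t index)
    {
        uint32_t bit = index * 2;                  /* 2 bits per entry */
        return (map[bit / 8] >> (bit % 8)) & 0x3;  /* MAP_USED | MAP_BAD */
    }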

The file system is rather robust, at least in my view, about machine crashes
(for whatever reason) while processes are doing things on the disk.  There
is a disk recovery utility that can rebuild the disk and directory structure
in the vast majority of circumstances, because there is some redundancy and
there are multiple ways of checking some things out.  For example, the
recovery program can find the file FCB.SYS by looking in the disk header at
sector 0 for the sector number of FCB.SYS.  If that does not work, sector 1
contains a duplicate of the first four entries in FCB.SYS, namely #0
FCB.SYS, #1 the root directory, #2 BITMAP.SYS, and #3 FCBBITMAP.SYS.  If
that sector is gone or invalid too, recover can calculate roughly where
FCB.SYS ought to be given the size of the disk (we keep it in the middle of
the disk to average out head movement).  So it goes there and searches
forwards and backwards looking for any FCB (these can be detected because
they contain a checksum).  If it finds any FCB, it can then calculate where
FCB #0 should be, seek to that position, and look.  If that doesn't work,
the disk is toast.  FCB.SYS contains the sector extents for files, which can
be checked for validity against BITMAP.SYS and FCBBITMAP.SYS, in case a file
was being deleted, for example, when the machine went down.  At recovery
time it is sort of majority rules, best two out of three that agree, on
whether a file is there or not, whether some FCB is used or not, and so on.
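
For the curious, the three-step hunt for FCB.SYS could be pictured something
like the sketch below.  Every helper routine here (read_sector,
fcb_checksum_ok, and friends) is a hypothetical stand-in, not the recovery
utility's real code; only the overall strategy is as described above:

    /* Sketch of the recovery search described above.  All helpers
     * are hypothetical placeholders. */
    #include <stdint.h>

    #define SECTOR_SIZE 512
    #define NO_SECTOR   0xFFFFFFFFu

    int      read_sector(uint32_t sec, uint8_t *buf);     /* 0 on success */
    int      header_valid(const uint8_t *buf);
    uint32_t header_fcb_sys_sector(const uint8_t *buf);
    int      dup_entries_valid(const uint8_t *buf);
    uint32_t dup_fcb_sys_sector(const uint8_t *buf);
    int      fcb_checksum_ok(const uint8_t *buf);
    uint32_t fcb_zero_sector_from(const uint8_t *buf, uint32_t where);

    uint32_t find_fcb_sys(uint32_t disk_sectors)
    {
        uint8_t  buf[SECTOR_SIZE];
        uint32_t mid = disk_sectors / 2;

        /* 1. The disk header at sector 0 records where FCB.SYS lives. */
        if (read_sector(0, buf) == 0 && header_valid(buf))
            return header_fcb_sys_sector(buf);

        /* 2. Sector 1 duplicates the first four FCB.SYS entries:
         *    #0 FCB.SYS, #1 root dir, #2 BITMAP.SYS, #3 FCBBITMAP.SYS. */
        if (read_sector(1, buf) == 0 && dup_entries_valid(buf))
            return dup_fcb_sys_sector(buf);

        /* 3. FCB.SYS normally sits near the middle of the disk: scan
         *    outward from there for anything whose checksum says "FCB",
         *    then work back to where FCB #0 ought to start. */
        for (uint32_t off = 0; mid + off < disk_sectors || off <= mid; off++) {
            if (mid + off < disk_sectors &&
                read_sector(mid + off, buf) == 0 && fcb_checksum_ok(buf))
                return fcb_zero_sector_from(buf, mid + off);
            if (off > 0 && off <= mid &&
                read_sector(mid - off, buf) == 0 && fcb_checksum_ok(buf))
                return fcb_zero_sector_from(buf, mid - off);
        }
        return NO_SECTOR;    /* the disk is toast */
    }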

Also note that we do not have the problem mentioned above about a 16-bit
name space for inodes.  FCB numbers are 32 bits, and FCB.SYS is just a file
too, so if you create more files than there were FCBs initially, FCB.SYS
itself will be grown, and the FCB for FCB.SYS (which is #0, at the start of
the file) will get another extent plugged into it.

Ah, but all this sounds very time consuming and computationally expensive,
and yes, it can be on a badly fragmented disk.  But when a file is opened,
the disk routines read in all the FCBs for that file, so no disk accesses
are needed to compute file access positions, and if another process opens
the same file, the in-memory FCBs are shared.  They are periodically flushed
to the disk, as sectors in the disk cache are.
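
A toy version of the sharing idea, just to illustrate: an open table keyed
by FCB number, with a reference count so a second open of the same file
reuses the FCB chain already in memory.  The names and structure below are
mine, not WMCS's, and locking is left out entirely:

    /* Toy sketch of sharing in-memory FCBs between opens. */
    #include <stdint.h>
    #include <stdlib.h>

    struct mem_fcb {                 /* one file's in-core FCB chain */
        uint32_t fcb_number;
        int      refcount;           /* how many opens share this */
        int      dirty;              /* flushed periodically, like cache */
        struct mem_fcb *next_open;   /* linked list of open files */
        /* ... extent list read in from disk ... */
    };

    static struct mem_fcb *open_list;

    struct mem_fcb *fcb_open(uint32_t fcb_number)
    {
        for (struct mem_fcb *p = open_list; p; p = p->next_open)
            if (p->fcb_number == fcb_number) {
                p->refcount++;               /* second opener shares it */
                return p;
            }

        struct mem_fcb *p = calloc(1, sizeof *p);
        if (!p)
            return NULL;
        p->fcb_number = fcb_number;
        p->refcount   = 1;
        /* ...read all of the file's FCBs/extents from disk here... */
        p->next_open  = open_list;
        open_list     = p;
        return p;
    }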

To handle fragmentation, there is an OS utility that the user can run, or
have run periodically, which de-fragments the disk: it attempts to rearrange
the sectors of all files so that each file has only one FCB containing one
extent.  Then file position calculations just deal with one number and a few
shifts and adds.  Not very expensive at all.
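
With a fully defragmented file the offset math really is just a shift and an
add.  Something along these lines, assuming 512-byte sectors for the sake of
the example:

    /* Offset math for a file with a single extent: the "few shifts
     * and adds" case.  512-byte sectors assumed for illustration. */
    #include <stdint.h>

    #define SECTOR_SHIFT 9
    #define SECTOR_MASK  ((1u << SECTOR_SHIFT) - 1)

    /* Map a byte offset in the file to (disk sector, offset in sector). */
    static void file_pos(uint32_t extent_start, uint32_t byte_offset,
                         uint32_t *sector, uint32_t *in_sector)
    {
        *sector    = extent_start + (byte_offset >> SECTOR_SHIFT);
        *in_sector = byte_offset & SECTOR_MASK;
    }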

>
>I wrote a text editor for my record-oriented calculator.  Memory
>and file space are shared in this environment.  I decided that I
>couldn't afford two copies of the file.  I only read enough of
> ... [excised]  
>I decided that a similar system could be used for a
>block oriented editor for larger systems.  It would handle
>arbitrarily large files, be able to make changes to single files
>that nearly fill disks, have very fast startup (since it only has
>to read the first screen), 

Our system text editor, called VEW (for Virtual Edit Window), is like this:
it scrolls through the file as it sits on disk, so the file can be
arbitrarily large.
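
In the same spirit, the bare-bones idea of editing "through a window" is
just: seek to the part of the file the screen shows and read only that much,
instead of slurping in the whole file.  Purely illustrative, nothing to do
with VEW's actual code:

    /* Bare-bones sketch of a "virtual window" over a file on disk:
     * read just the screenful being displayed. */
    #include <stdio.h>

    #define SCREEN_BYTES 2000           /* roughly 24 lines x 80 cols */

    /* Fill buf with the part of the file visible at byte offset 'top'.
     * Only one screenful is ever read, so any size file can be edited. */
    size_t read_window(FILE *fp, long top, char *buf)
    {
        if (fseek(fp, top, SEEK_SET) != 0)
            return 0;
        return fread(buf, 1, SCREEN_BYTES, fp);
    }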

Sorry for the long followup.  I have redirected follow-ups to my message, at
least to comp.os.misc, where it probably belongs.  Even though I prefer WMCS
to unix, I may be biased ;-) and don't need another unix-vs-something war; I
can get that here at work. 8-)

---------------------------------------------------------------------------
                          I _don't_ live for the Leap!
     ..uplherc!wicat!sarek!gsarff