[mod.os] Who needs files. Really "Apollo's been doing it for 6 years"

darrell@sdcsvax.UUCP (03/23/87)

Jim Rees's article (which may or may not have been posted yet) gives
the outlines of how Apollo uses the single-level store idea to implement
its distributed filesystem.

Something else to note is that the mapped I/O model did cause performance
problems at the sequential file access level.  The kernel knows nothing
about sequential I/O, open files, and such, and originally did not do
any read-ahead.  Later (a while ago
now), we added "touch ahead":  Mapped segments (32K regions of virtual
address space) can be marked with an integer that is the number of pages
the pager should read in when any page in the segment gets faulted on.
This gives read-ahead of a sort.
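
The effect of "touch ahead" on a sequential scan can be illustrated with a
toy model.  This is only a sketch, not Aegis code: the names Segment, touch,
and sequential_scan are hypothetical, and the 1K page size is an assumption
made for the arithmetic (the 32K segment size is from the text above).

```python
PAGE_SIZE = 1024           # assumed page size, for illustration only
SEGMENT_SIZE = 32 * 1024   # 32K segments, as described in the post
PAGES_PER_SEGMENT = SEGMENT_SIZE // PAGE_SIZE


class Segment:
    """Toy model of one mapped segment carrying a touch-ahead count."""

    def __init__(self, touch_ahead=1):
        self.touch_ahead = touch_ahead  # pages to read in on each fault
        self.resident = set()           # page numbers currently in memory
        self.faults = 0                 # page faults taken so far

    def touch(self, page):
        """Reference a page; on a fault, bring in touch_ahead pages."""
        if page in self.resident:
            return
        self.faults += 1
        last = min(page + self.touch_ahead, PAGES_PER_SEGMENT)
        self.resident.update(range(page, last))


def sequential_scan(touch_ahead):
    """Touch every page of one segment in order; return the fault count."""
    seg = Segment(touch_ahead)
    for page in range(PAGES_PER_SEGMENT):
        seg.touch(page)
    return seg.faults
```

Under these assumed sizes, sequential_scan(1) takes 32 faults and
sequential_scan(8) takes 4.  The model counts faults only; it says nothing
about disk latency or about memory wasted when access is not sequential.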

Sequential output was a bit trickier.  There is no parallel to page
faulting that occurs when you're "done" writing through a page.  What
we did do was make it so that segments can be marked "flush-behind";
i.e. the physical memory pages should be treated as good candidates for
re-use as soon as the segment mapped over them gets unmapped from the
virtual address space.  Another change we made to make sequential output
better is "grow-ahead":  Note that when you're writing a new file, the
pages that are mapped in do not correspond to real disk pages until they
are touched.  Touching one of these pages is called a "growth fault".
The system now optionally grows the file by more than one page on growth
faults.  The Stream I/O library takes care of truncating off any extra
pages when the stream is closed.
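
A rough modern analogue of "grow-ahead" plus truncate-on-close can be
sketched with a mapped file on a POSIX-like system.  This is an
illustration under stated assumptions, not the Stream I/O library: the
class GrowAheadWriter and the GROW_PAGES value are hypothetical, and
growth is done explicitly with ftruncate rather than by a growth fault.

```python
import mmap
import os

GROW_PAGES = 8  # hypothetical grow-ahead amount, in pages


class GrowAheadWriter:
    """Append through a mapping, growing the file several pages at a time."""

    def __init__(self, path, grow_pages=GROW_PAGES):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
        self.chunk = grow_pages * mmap.PAGESIZE
        self.capacity = 0   # bytes currently allocated to the file
        self.length = 0     # bytes actually written
        self.grows = 0      # number of times the file was extended
        self.map = None

    def _grow(self, needed):
        """Extend the file in whole chunks until `needed` bytes fit."""
        while self.capacity < needed:
            self.capacity += self.chunk
        if self.map is not None:
            self.map.flush()
            self.map.close()
        os.ftruncate(self.fd, self.capacity)
        self.grows += 1
        self.map = mmap.mmap(self.fd, self.capacity)

    def write(self, data):
        end = self.length + len(data)
        if end > self.capacity:
            self._grow(end)
        self.map[self.length:end] = data
        self.length = end

    def close(self):
        """Flush, then truncate off pages that were grown but never used."""
        if self.map is not None:
            self.map.flush()
            self.map.close()
        os.ftruncate(self.fd, self.length)
        os.close(self.fd)
```

Writing many small records through this class extends the file once per
chunk rather than once per page, and close() leaves the file at exactly
the number of bytes written.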

All in all, mapped I/O is a nice and useful idea, but anyone who thinks
that it will perform just like a traditional sequential I/O system withOUT
any special purpose features like the ones described above is whistling
in the dark.  Also, some of the features don't always work like you'd
think/hope.

To deal with cache consistency, the file locking mechanism is tied into
the remote file caching mechanism.  Although it is possible to bypass
the locking mechanism and map a file withOUT locking, this is strongly
discouraged (i.e. not a documented feature and not used by the Stream
I/O library).  When a node locks a file, it contacts the home node of
the file and gets back the current date-time-modified (DTM) for the file.
It uses this value to determine whether any pages the using node has are
still OK (i.e. whether it can avoid re-reading the pages from the home
node).  When the using node unlocks the file, dirty pages are sent back
to the home node before the lock is released.  If you use the mechanisms
as intended, you NEVER get bad (stale) data.  We consider this property
a necessity.
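
The lock/DTM protocol described above can be modelled in a few lines.
This is a toy sketch, not the Apollo implementation: the classes HomeNode
and UsingNode are hypothetical, the DTM is a simple counter instead of a
timestamp, mutual exclusion between nodes is not modelled, and having the
using node adopt the new DTM at unlock time is an assumption.

```python
class HomeNode:
    """Toy home node: holds the authoritative pages and the file's DTM."""

    def __init__(self):
        self.pages = {}   # page number -> bytes
        self.dtm = 0      # stand-in for date-time-modified (a counter here)

    def lock(self):
        """Grant the lock and report the current DTM."""
        return self.dtm

    def read_page(self, n):
        return self.pages.get(n, b"")

    def unlock(self, dirty):
        """Accept dirty pages, release the lock, and return the new DTM."""
        if dirty:
            self.pages.update(dirty)
            self.dtm += 1
        return self.dtm


class UsingNode:
    """Toy client node whose page cache is validated by DTM at lock time."""

    def __init__(self, home):
        self.home = home
        self.cache = {}          # page number -> bytes
        self.cached_dtm = None   # DTM the cached pages correspond to
        self.dirty = set()
        self.remote_reads = 0    # pages fetched from the home node

    def lock(self):
        dtm = self.home.lock()
        if dtm != self.cached_dtm:
            self.cache.clear()   # cached pages may be stale; drop them
            self.cached_dtm = dtm

    def read(self, n):
        if n not in self.cache:
            self.cache[n] = self.home.read_page(n)
            self.remote_reads += 1
        return self.cache[n]

    def write(self, n, data):
        self.cache[n] = data
        self.dirty.add(n)

    def unlock(self):
        # Dirty pages are sent home before the lock is released.
        sent = {n: self.cache[n] for n in self.dirty}
        self.cached_dtm = self.home.unlock(sent)
        self.dirty.clear()
```

In this model a node that re-locks an unmodified file keeps its cached
pages, while a node that locks after another node's write sees a changed
DTM and re-reads from the home node.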

Another thing to remember about the single-level store:  Don't be seduced.
I.e. don't use it for things for which it is not intended.  E.g. just
because you can access some database file all over the network doesn't mean
that you should implement a DBMS this way.  You probably really want
to use RPC.  The problem with using a file system in the "wrong" way
is that the interface to file systems is just generally not designed
to deal with failure -- e.g. to let you know just how much of your
I/O succeeded when the network partitioned.  Things can be made even worse
in a mapped I/O system since any random memory reference can cause an
exception to be raised.  (At least when you make "normal" filesystem
calls, you generally get an error returned.  You can simulate this with
mapped I/O, but not always cleanly and faithfully.)

                -- Nat Mishkin
                   Apollo Computer Inc.
-------

darrell@sdcsvax.UUCP (03/27/87)

In article <2906@sdcsvax.UCSD.EDU> mishkin@apollo.UUCP (Nathaniel Mishkin) 
writes:
>Jim Rees's article (which may or may not have been posted yet) gives
>the outlines of how Apollo uses the single-level store idea to implement
>its distributed filesystem.
>
:
>now), we added "touch ahead":  Mapped segments (32K regions of virtual
:
>we did do was make it so that segments can be marked "flush-behind";
:
>better is "grow-ahead":  Note that when you're writing a new file, the
:
>All in all, mapped I/O is a nice and useful idea, but anyone who thinks
>that it will perform just like a traditional sequential I/O system withOUT
>any special purpose features like the ones described above is whistling
>in the dark.  Also, some of the features don't always work like you'd
>think/hope.
:

Thank you for writing such an informative summary.  These are exactly the
features that are needed.  The third, new-file "grow-ahead", is something that
is missing, but is sorely needed, in VSOS.  VSOS actually pages in new
untouched pages from disk.  Your article gave me hope: not all the lessons of
the past have been forgotten (or, for the more emotional: MULTICS and TSS
live!   :-)

Two more questions:  

1)  Does Aegis have an advise function that allows a process to give
    advance information to the system about which pages are needed?
    VSOS has this, and it turns out to be very useful.  Example:  A
    "large" fluid dynamics code which doesn't fit in memory (we only
    have 64 MBytes :-)  )  is structured so that it touches memory
    semi-sequentially.  The programmer calls ADVISE periodically to
    let the system know which pages will be needed next.  The 
    resulting system call sets PAGER going, and the process continues
    to run.  When the process gets to that memory, it is fortuitously
    already there.  This can result in unbelievable performance
    improvements in some cases.

2)  How much of this is available in Apollo's Un*x environment?  It
    would be nice if Aegis provided a "real" BSD Unix environment,
    but still had the "advanced" facilities available as system
    services.
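
For readers on present-day Unix-like systems, the advise pattern in
question 1 can be sketched with madvise on a mapped file.  This is a
modern analogue under stated assumptions, not the VSOS ADVISE call or
any Aegis interface: the function name checksum_with_advice and the
window size are hypothetical, Python 3.8 or later is assumed for
mmap.madvise, and the hint is skipped where MADV_WILLNEED is missing.

```python
import mmap


def checksum_with_advice(path, window=64 * mmap.PAGESIZE):
    """Sum a file's bytes through a mapping, advising one window ahead.

    The file must be non-empty, since an empty file cannot be mapped.
    """
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            size = len(m)
            can_advise = (hasattr(m, "madvise")
                          and hasattr(mmap, "MADV_WILLNEED"))
            for start in range(0, size, window):
                nxt = start + window
                if can_advise and nxt < size:
                    # Hint that the next window will be needed soon, so
                    # the pager can work while this window is processed.
                    m.madvise(mmap.MADV_WILLNEED, nxt,
                              min(window, size - nxt))
                total += sum(m[start:nxt])
    return total
```

The window is kept a multiple of the page size so that each advised
range starts on a page boundary.  Whether the hint helps depends on the
system and the workload; it is advice, and the kernel is free to ignore
it.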



  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov

"In order to promise genuine progress, the acronym RISC should stand 
for REGULAR (not reduced) instruction set computer." - Wirth

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")

darrell@sdcsvax.UUCP (03/31/87)

> Two more questions:  
> 
> 1)  Does Aegis have an advise function that allows a process to give
>     advance information to the system about which pages are needed?

There is an advice call, but only one that is related to the touchahead
and growahead features I mentioned in my previous article.

> 2)  How much of this is available in Apollo's Un*x environment?  It
>     would be nice if Aegis provided a "real" BSD Unix environment,
>     but still had the "advanced" facilities available as system
>     services.

Well, it's ALL available from the Unix environment.  One might argue
that we could do a better job of making the interfaces to the features
more culturally compatible with Unix (e.g. omit return parameters
that tell you more than "errno" and make the documentation more
opaque :), but that's really just noise.  Seriously though, I think we
could do a better job of making it clear that these are extensions
available to all, not just some weirdo "Aegis users".  Of course, using
the features might make your code less portable, but sometimes you just
have to decide how long coding with the constraints that existed in
1970 minicomputers is appropriate.

                    -- Nat Mishkin
                       Apollo Computer Inc.
                       Chelmsford, MA
                       {wanginst,yale,mit-eddie}!apollo!mishkin

-------