[comp.arch] VM used for filesystems

rouellet@crhc.uiuc.edu (Roland G. Ouellette) (12/21/90)

> Including output that you must guarantee has been written to
> nonvolatile store?  In other words, output that survives
> operating system crashes?

If I'm not mistaken, Multics laid to rest all kinds of problems of
using VM for a filesystem, including this one.
--
= Roland G. Ouellette			ouellette@tarkin.enet.dec.com	=
= 1203 E. Florida Ave			rouellet@[dwarfs.]crhc.uiuc.edu	=
= Urbana, IL 61801	   "You rescued me; I didn't want to be saved." =
=							- Cyndi Lauper	=

barmar@think.com (Barry Margolin) (12/22/90)

In article <ROUELLET.90Dec21025732@pinnacle.crhc.uiuc.edu> rouellet@crhc.uiuc.edu (Roland G. Ouellette) writes:
>> Including output that you must guarantee has been written to
>> nonvolatile store?  In other words, output that survives
>> operating system crashes?
>If I'm not mistaken, Multics laid to rest all kinds of problems of
>using VM for a filesystem, including this one.

While Multics does a good job of this, it does have a few problems.

Multics's general solution to the above problem is a mechanism called
"Emergency ShutDown" (ESD).  This is a procedure in the wired
bootstrap/shutdown code that is normally invoked during a crash or manually
by an operator doing a non-graceful shutdown, and walks through the page
tables flushing pages to disk.  It's somewhat analogous to Unix sync().
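
For readers more familiar with Unix-style systems, the idea of pushing
dirty mapped pages out to disk can be sketched as below.  This is only a
rough per-file analogy in Python, not Multics code: ESD is run by the
system over all page tables at shutdown, whereas here one process flushes
one mapping.  The helper names write_through_mapping and demo are made up
for the illustration.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

def write_through_mapping(path, offset, data):
    """Modify a file through a memory mapping, then force the pages out."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset:offset + len(data)] = data  # changes the in-memory page
            m.flush()                            # like msync(): write mapped pages back
        os.fsync(f.fileno())                     # ask the OS to push data to the device

def demo():
    """Create a one-page scratch file, update it via the mapping, read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * PAGE)
        os.close(fd)
        write_through_mapping(path, 16, b"hello")
        with open(path, "rb") as f:
            f.seek(16)
            return f.read(5)
    finally:
        os.remove(path)
```

Without the explicit flush, the modified page would reach the disk only
whenever the pager chose to write it, which is exactly the window that a
crash-time flush such as ESD is meant to close.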

However, when we were enhancing our database system (MRDS - Multics
Relational Data Store, the first commercially-available relational database
system) to include transactions and journaling, we realized that this
wasn't good enough.  First of all, it is possible to manually bypass ESD
during a shutdown (you might do this if you suspect a kernel bug has
corrupted the page tables, so you'd rather lose a few pages than overwrite
the wrong disk sectors).  Second, proper before-journaling requires that
pages be written to permanent storage in a particular order: the journal
entry for a change must precede the change itself.  Third, we wanted to be
able to recover from some forms of media error.
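
The ordering requirement in the second point can be illustrated with
ordinary file I/O.  The sketch below is a minimal before-image journal in
Python under assumed conventions (a 12-byte header of offset and length,
then the old bytes); it is not the MRDS journal format, and the function
names are hypothetical.  It also ignores updates that extend the file.

```python
import os
import tempfile

def journaled_update(data_path, journal_path, offset, new_bytes):
    """Overwrite bytes in data_path, logging the old contents first."""
    with open(data_path, "r+b") as data, open(journal_path, "ab") as journal:
        data.seek(offset)
        before = data.read(len(new_bytes))

        # 1. The journal entry (offset, length, before-image) is forced
        #    to stable storage first.
        header = offset.to_bytes(8, "big") + len(before).to_bytes(4, "big")
        journal.write(header + before)
        journal.flush()
        os.fsync(journal.fileno())

        # 2. Only then is the change itself written.
        data.seek(offset)
        data.write(new_bytes)
        data.flush()
        os.fsync(data.fileno())

def roll_back(data_path, journal_path):
    """Undo journaled changes, newest first, by restoring before-images."""
    entries = []
    with open(journal_path, "rb") as journal:
        while header := journal.read(12):
            offset = int.from_bytes(header[:8], "big")
            length = int.from_bytes(header[8:], "big")
            entries.append((offset, journal.read(length)))
    with open(data_path, "r+b") as data:
        for offset, before in reversed(entries):
            data.seek(offset)
            data.write(before)
        data.flush()
        os.fsync(data.fileno())

def demo():
    """Apply one journaled update to a scratch file, then roll it back."""
    with tempfile.TemporaryDirectory() as d:
        data_path = os.path.join(d, "data")
        journal_path = os.path.join(d, "journal")
        with open(data_path, "wb") as f:
            f.write(b"AAAAAAAA")
        journaled_update(data_path, journal_path, 2, b"xyz")
        with open(data_path, "rb") as f:
            after = f.read()
        roll_back(data_path, journal_path)
        with open(data_path, "rb") as f:
            restored = f.read()
        return after, restored
```

If the two steps were reversed and the system crashed between them, the
data file would hold a change with no before-image to undo it.  With
plain memory-mapped files the pager, not the program, picks the write
order, which is why the ordering could not be guaranteed.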

We didn't want to abandon memory-mapped files, though, so we enhanced
them.  I don't remember all the details (I was only peripherally involved
in this project, mostly because my officemate was the project leader), but
I'll describe it as best I can remember.  I'm pretty sure the real design
is more general than what follows, and I'm simplifying as well.  We added an
attribute to files called "synchronized"; only the routines for creating a
database file could set this attribute.  There is also a system-wide
journal file.  When mapping a synchronized file, a process must also map the
journal file.  I think there is a system call to indicate that a journal
page is a snapshot of a particular synchronized file page, and the paging
code that writes to disk guarantees that it will not write a page of a
synchronized file before the corresponding snapshot page.  There are also
some system calls that are used during transaction commit and abort, but I
can't remember the specifics.
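
The write-ordering guarantee described above can be modeled with a toy
pager.  This Python sketch is purely illustrative: the class ToyPager, the
method declare_snapshot (standing in for the system call that may have
existed), and the page naming are all invented here and say nothing about
how the Multics paging code was actually structured.

```python
class ToyPager:
    """Toy model of a pager that writes a snapshot before its data page."""

    def __init__(self):
        self.dirty = set()       # pages modified in memory, not yet on disk
        self.snapshot_of = {}    # data page -> journal page with its snapshot
        self.disk_log = []       # order in which pages reached "disk"

    def touch(self, page):
        """Mark a page as modified in memory."""
        self.dirty.add(page)

    def declare_snapshot(self, journal_page, data_page):
        """Record that journal_page holds the snapshot of data_page."""
        self.snapshot_of[data_page] = journal_page
        self.dirty.add(journal_page)

    def write_out(self, page):
        """Write one dirty page, forcing out its snapshot first if needed."""
        if page not in self.dirty:
            return
        journal_page = self.snapshot_of.get(page)
        if journal_page in self.dirty:
            self.write_out(journal_page)
        self.disk_log.append(page)
        self.dirty.discard(page)
        self.snapshot_of.pop(page, None)   # constraint satisfied, drop it

    def flush_all(self):
        """Write every dirty page, in an order the caller does not control."""
        for page in sorted(self.dirty):
            self.write_out(page)

def demo():
    """Dirty a data page and its snapshot, flush, and report the write order."""
    pager = ToyPager()
    data_page = ("db", 7)
    journal_page = ("journal", 3)
    pager.touch(data_page)
    pager.declare_snapshot(journal_page, data_page)
    pager.flush_all()
    return pager.disk_log
```

Here flush_all reaches the data page first, but write_out still emits the
journal page ahead of it, so the process keeps using plain memory
references while the pager enforces the order.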

ESD is good enough for most uses of files (although there was the one time
when a confused administrator got an error from ESD, so he cleared memory
and then invoked it again, thus writing zero pages all over the place).
Synchronized files handle the extraordinary cases, without throwing out
the elegance and performance of memory-mapped files.

--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar