[net.arch] Why Not Virtual Files?

steveh@hammer.UUCP (Stephen Hemminger) (04/26/84)

Ever heard of Multics?

tjt@kobold.UUCP (04/26/84)

Virtual files (the ability to have data in a file appear as part of
your address space) are used extensively in Multics, and are available
on several other operating systems to varying extents.  The
"demand-loaded" text programs of 4BSD are an example of this, as is the
ill-fated vread system call from 4.1BSD.  The original 4.2BSD
specification included more facilities for mapping files into an
address space, although this has been deferred to 4.3BSD.

By the way, "virtual files" isn't all beer and skittles.  Ever try
mapping your tty into memory?

-- 
	Tom Teixeira,  Massachusetts Computer Corporation.  Westford MA
	...!{ihnp4,harpo,decvax}!masscomp!tjt   (617) 692-6200 x275

clyde@ut-ngp.UUCP (04/26/84)

It's been done: TENEX/TWENEX (TOPS-20).  Space for memory and files
is allocated in 'pages', and pages of disk can be mapped into
pages of memory so that changes to memory are reflected in the
corresponding disk file.

-- 
Clyde W. Hoover @ Univ. of Texas Computation Center; Austin, Texas  
(Shouter-To-Dead-Parrots)
"The ennui is overpowering" - Marvin 
clyde@ut-ngp.{UUCP,ARPA} clyde@ut-sally.{UUCP,ARPA} ihnp4!ut-ngp!clyde

ka@hou3c.UUCP (Kenneth Almquist) (04/26/84)

MULTICS does what you suggest.

The problem with the approach is that while it unifies memory and
disk files, it does not unify disk files and devices.  In MULTICS,
there are two ways to read a file.  One is to map it into memory.
The other is to use I/O system calls.  If you choose the former
approach then you are assuming that the user will never want to
substitute a device (such as his terminal) for the disk file.  I
would argue that good programming practice dictates using the I/O
system calls to access a file if you simply want to read or write
the file sequentially.
				Kenneth Almquist

jsq@ut-sally.UUCP (John Quarterman) (04/27/84)

MULTICS.
-- 
John Quarterman, CS Dept., University of Texas, Austin, Texas 78712 USA
jsq@ut-sally.ARPA, jsq@ut-sally.UUCP, {ihnp4,seismo,ctvax}!ut-sally!jsq

david@iwu1a.UUCP (david) (04/27/84)

Has there ever been an operating system that used paging to  make
disk files appear resident in main memory?  Yes, MULTICS did that
and much more.  In MULTICS  a  file  could  be  referenced  as  a
segment  in  memory or as a stream (as in the UNIX (TM) operating
system).  Pages from the file were brought into main memory  from
disk using demand paging.  MULTICS, however, went further than to
just access data files with paging--programs  were  brought  into
memory  for  execution  with  this  same  mechanism.   And when a
program made a function call that function could  be  in  another
file  which  would  be dynamically paged into memory when needed.
MULTICS demand paging therefore  eliminated  the  need  for  link
editors  and  loaders,  and  allowed  any  library to be a shared
library.

The cost for all these features was a complex  memory  management
mechanism  which  reduced  performance.   MULTICS,  however,  was
developed over ten years ago and hardware technology continues to
advance.  Perhaps it is now worth another try.

		David Scheibelhut

koved@umcp-cs.UUCP (04/27/84)

There have been several systems which treat ALL memory as a single
level store.  All memory references are to objects in the system,
and the underlying hardware/software takes care of the rest.
If the object being referenced is not in main memory (equivalent
to a page or segment fault), the system retrieves the object from
an external source (ie: disk, network).

This scheme is essentially the scheme proposed by the Smalltalk-80
LOOM (Large Object Oriented Memory) mechanism used to manage objects
in memory or secondary storage.  IBM System/38 does the same thing
in microcode (another obscure object oriented system).  The Intel 432
may work the same way.

The way around the addressing problem is to only allow references to
objects and offsets (ie: segment number, offset), and only the underlying
hardware knows about bits and bytes.

The uniform addressing approach has a number of advantages.  If you are
interested, I can send you a paper which I am writing on a closely related
topic. 

Several systems use uniform addressing to refer to objects in main
memory, disk files, or on remote (distributed) systems.

Larry
-- 
Spoken: Larry Koved
Arpa:   koved.umcp-cs@CSNet-relay
Uucp:...{allegra,seismo}!umcp-cs!koved

fostel@ncsu.UUCP (Gary Fostel) (04/27/84)

    Well, the idea of virtual files is not very new; MULTICS stands out as
    probably the first to give the idea a good work-out.  But I think there 
    is a related issue of programming style: most files are processed in a
    strictly sequential fashion.  If you are going to treat my file like an
    array of bytes (or somesuch), then who will maintain the file pointer for
    all us old-timers who still think sequential files have something to offer?
    Are you going to force me to maintain my own file pointer?   Yeccch!   
    Give me a nice simple get/putchar or give me death!

    And anyway, what of io redirection and pipes and things?  How am I going
    to randomly access a virtual file which is coming in little dribbles
    out of an inherently sequential virtual device like a pipe?  Or a terminal?

    An elegant idea, but I think there is still an a priori desire to have
    good-ol' sequential little-bit-at-a-time processing.
    ----GaryFostel----

darrelj@sdcrdcf.UUCP (Darrel VanBuer) (04/27/84)

While it's based on address spaces of only about a megabyte (2^18 36-bit words),
TENEX running on DEC PDP-10s and 20s has a system call called PMAP which
declares that a set of pages in a file are to be mapped into a set of memory
pages.  Following this call, the mapped pages are accessed by memory
references (including all normal demand paging effects directly to the
mapped file) for both reads and writ.  In fact, executing a program is based
on the same mechanism--do a PMAP of the file with the program to memory.
PMAP has a mode which simplifies sharing programs--a file can be mapped
read-only, with writes going into a delta file (the latter is a common case,
so large systems like interlisp generally map two files on start up--a
makesys file [usually a major version] and a sysout file [a saved user delta
file]).
-- 
Darrel J. Van Buer, PhD
System Development Corp.
2500 Colorado Ave
Santa Monica, CA 90406
(213)820-4111 x5449
...{allegra,burdvax,cbosgd,hplabs,ihnp4,sdccsu3,trw-unix}!sdcrdcf!darrelj
VANBUER@USC-ECL.ARPA

zrm@mit-eddie.UUCP (Zigurd R. Mednieks) (04/28/84)

Kenneth Almquist stated that files-as-segments leaves out the notion
of devices. True, but devices are not needed for terminal i/o. One
could use the model-view-controller idea in Smalltalk-80.  Let's all
hear it for a personal Multics with 64 bit addressing and object
oriented systems programming!

Cheers,
Zig

spaf@gatech.UUCP (Gene Spafford) (04/29/84)

Yes, mapping files into memory has been done before.  Multics is
one of the prime examples of this.  

The Clouds system, which I am helping to design, will map files
into the address space of users.  Our prototype implementation
is being done on Vaxen.  The design also implements mapping of
abstract data objects (including code) into memory, and shared
data objects.  We'll just have to wait and see how efficient
it all is and how natural it is to use.

-- 
Off the Wall of Gene Spafford
The Clouds Project, School of ICS, Georgia Tech, Atlanta GA 30332
CSNet:	Spaf @ GATech		ARPA:	Spaf.GATech @ CSNet-Relay
uucp:	...!{akgua,allegra,rlgvax,sb1,unmvax,ulysses,ut-sally}!gatech!spaf

ac4@pucc-h (Putnam) (04/30/84)

The CDC CYBER 205 (a descendant of the Star-100 and the CYBER 203) allows
you as a programmer to decide how best to address your "file" space: you can use
the normal sequential file I/O, but you can also "map-in" a file to a
specified location in your virtual address space.  Another interesting element
of this architecture: the machine addresses all of memory using virtual *BIT*
addresses... 48 bits in every address!  Makes for interesting programming.

	Tom Putnam
	...!pur-ee!pucc-h:ac4

tom@umcp-cs.UUCP (05/02/84)

Traditional, disk based, filesystems have several important
advantages over memory-mapped filesystems in the event
of a crash.

Since traditional filesystems force the process to wait for
the actual disk access, they are kept more up to date (on disk)
than a memory-mapped filesystem would be (unless you force 
a disk write after each file write, in which case you lose
some of the performance of a memory-mapped filesystem).
More importantly, your process must have a way to know
when data it has written is safely non-volatile.

A file that is in memory is an easy victim to the runaway
rabid cpu, and will have to be restored to its previously
saved state on disk if it is to be salvaged reasonably.
A disk controller is far safer from accidental file munging,
and disk files usually weather cpu crashes.

Clearly the memory-mapped filesystem is much faster than
a traditional filesystem, but these issues must be
considered.  The problem of cpu plague will even remain
when we can afford to use non-volatile main memory.

With programmers getting more and more used to large chunks
of (virtual) memory, a hardware breakthrough will be required
to build a real machine with non-volatile memory.

	Tom Melton
	University of Maryland
	tom@maryland
	...decvax!harpo!seismo!umcp-cs!tom

guy@rlgvax.UUCP (Guy Harris) (05/02/84)

> Traditional, disk based, filesystems have several important
> advantages over memory-mapped filesystems in the event
> of a crash.

> Since traditional filesystems force the process to wait for
> the actual disk access, they are kept more up to date (on disk)
> than a memory-mapped filesystem would be (unless you force 
> a disk write after each file write, in which case you lose
> some of the performance of a memory-mapped filesystem).
> More importantly, your process must have a way to know
> when data it has written is safely non-volatile.

> A file that is in memory is an easy victim to the runaway
> rabid cpu, and will have to be restored to its previously
> saved state on disk if it is to be salvaged reasonably.
> A disk controller is far safer from accidental file munging,
> and disk files usually weather cpu crashes.

Well, since the UNIX file system has a cache in the kernel, this
problem occurs with UNIX's non-memory-mapped file system.  What is
needed is a way to force a given "disk block", whether it's in your
process' virtual memory or in the kernel buffer cache, to be flushed
to disk on command (and for the process to be blocked until it is).
4.2BSD has the "fsync" system call, and at least the VAX-11 version
of System V has an undocumented file descriptor flag (settable by
"fcntl") which forces all writes to that file descriptor to be synchronous
(it's undocumented because it's a possibly-unintentional side-effect
of the way they implemented synchronous writes of file system data
structures).

I suspect these issues have been discussed, both for memory-mapped and
non-memory-mapped filesystems, by the same people who came up with
notions like "stable storage" when implementing crash-resistant file
and database systems.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

mat@hou5d.UUCP (05/04/84)

Y'know, this has been done.  A paper describing an implementation for
v6 (!) was given at the second Usenix (then UUG) conference (the one
at Columbia U.)


It IS a useful file model, but there are things that the stream model
does better.  If we are going to make UN*X a ``big system'', there's room
for both.  Otherwise, let's forget it.  With computer hardware getting
cheaper, I kind of like the idea ..
-- 

					from Mole End
					Mark Terribile
		     (scrape..dig)	hou5d!mat
    ,..      .,,       ,,,   ..,***_*.

berenson@regal.DEC (Hal Berenson DTN 381 2694) (05/04/84)

> Traditional, disk based, filesystems have several important
> advantages over memory-mapped filesystems in the event
> of a crash.
> 
> Since traditional filesystems force the process to wait for
> the actual disk access, they are kept more up to date (on disk)
> than a memory-mapped filesystem would be (unless you force 
> a disk write after each file write, in which case you lose
> some of the performance of a memory-mapped filesystem).
> More importantly, your process must have a way to know
> when data it has written is safely non-volatile.


Most commercial operating systems have added caching to their 
"traditional" filesystems in order to improve performance.  However, 
most of these implementations seem to be write-through, so the 
odds of losing any data are minimal.  The real problems show up because 
these same o/s's, which provide write-through for data, use a write-back 
scheme for file-system data structures.  The result is bad eof pointers 
and lost or multiply-allocated sectors.

TOPS-20 is a memory mapped system which simulates traditional i/o for 
programmers who so desire.  You do take some risks however:  It's a 
write-back scheme.  Depending on the resource situation, data pages can 
end up in the page file, instead of the data file, for seconds, minutes, 
or hours unless the file is closed.  If the system crashes, bye bye 
data.

If you use TOPS-20's memory mapping correctly, you can avoid write-back
problems AND obtain some major benefits.  If one process has read a page
off of disk into memory, other processes just map to it, avoiding any
actual i/o.  There is a system service to force pages back to the data
file.  There is also the capability to say "DON'T write any pages back
to the data file unless I explicitly force them back" and "Reset the
pointer to this page to point back to the copy on disk" (i.e., forget
about those updates I just made).  DBMS-20 uses memory-mapped I/O quite
effectively to provide true dbms data integrity with high performance,
particularly in the simultaneous access environment. 

Hal Berenson
Digital Equipment Corporation
...decvax!decwrl!rhea!regal!berenson

chris@umcp-cs.UUCP (05/08/84)

Actually, you don't need a special system call to map a file into
virtual memory.  All you need is ``read'' and ``write'', and a way
to protect memory areas.  If ``read'' just marks each page as
fill on demand (similar to the way ``vread'' works but with the
bugs fixed), and you then take away write permission for yourself
on those pages, and catch addressing faults, you can set things up
so that anytime you modify such a page, your routine takes over,
redoes the modify with permission turned on, write()s the page back
wherever it belongs, and then turns permissions off again.

See, no extra system calls beyond the ones already in System V.
(I think - I haven't read the System V manuals, but they do have
memory protection stuff, don't they?)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

sharma@uicsg.UUCP (05/08/84)



Wouldn't you expect the system to save its core before it goes down?

lincoln@eosp1.UUCP (Dick Lincoln) (05/09/84)

> Wouldn't you expect the system to save its core before it goes down?

Generally speaking this feature disappeared when physical memory
stopped being real "core" and became some form of dynamic RAM - many
moons ago.

opus@drutx.UUCP (05/10/84)

> > Wouldn't you expect the system to save its core before it goes down?

> Generally speaking this feature disappeared when physical memory
> stopped being real "core" and became some form of dynamic RAM - many
> moons ago.

It did?  Then what's on those tapes the operators make when one of
our systems crashes?  (Hint:  it's not chopped liver.)

Jim Shankland
..!ihnp4!druxy!opus

rde@ukc.UUCP (R.D.Eager) (06/20/84)

    I have used, for about 5 years, a mainframe system (not MULTICS) which
uses virtual files exclusively.  The answer is that getchar/putchar
are still there; they simply treat the file as one large buffer.
Normally, the C library (or whatever) still has a pointer, but it only
relates to the current buffer load.  Thus there is no difference as the
old-timers see it.

    Having  got  that  out  of  the  way,  let  me  say  that for *some*
applications virtual files are pretty neat.  Trouble is, most  languages
don't  have  any facility to get at them!  We have a FORTRAN system (OK,
OK, let's assume it's all been said) which allows you to map COMMON onto
a file.  Great for large amounts of data which are updated in itty bitty
bits.  I could mention other benefits, but I think  my  main  point  was
really  that  conventional  I/O is still possible and no less efficient;
the language library actually has less to do.

      Bob Eager University of Kent UK

      (...ukc!rde)