boba@iscuva.ISCS.COM (Bob Alexander) (06/09/87)
Modern, memory-managed operating systems (like UNIX) have addressed quite nicely certain special requirements of executable files. In particular, (1) the file (text and data) need not be loaded into memory in its entirety to begin executing, and (2) the pages can be shared among processes that are executing them (both on disk and in memory). As far as I know, those capabilities are not made available to interpreters for their pseudo-code and data, even though they would be equally applicable there as they are to "real" programs. If 15 users are running a program written in an interpretive language, the interpreter code is shared, but the p-code exists separately for each user. This is a major disadvantage in using interpretive languages to produce production programs. Interpretive systems are in quite wide use today (e.g. shells, SQLs, (((Lisp))), Icon, etc., etc., [even BASIC]), and as processor speeds increase, use of interpreters will likely continue to grow.

There are a few ways of attacking this problem with existing UNIX facilities, but the ones I've come up with so far are kluges. My reason for posting to this newsgroup is to get your reaction to a possible new UNIX facility for this purpose. I'll express my suggestion in SVID format, sort of:

------------------------------
NAME
     vread -- read from a file into memory [but not really, maybe].

SYNOPSIS
     int vread(fildes, bufptr, nbyte)
     int fildes;
     char **bufptr;
     unsigned nbyte;

DESCRIPTION
     The function "vread" attempts to read "nbyte" bytes from the file
     associated with "fildes" into an allocated buffer whose address is
     returned in "bufptr".  This function is similar to read(ba_os)
     [read(ba_os) is SVIDese for read(2)] except for its implications
     concerning virtual memory, and in that it allocates a buffer
     rather than being given one.

     In a memory-managed system, the contents of the file are not
     transferred into the program's memory space.  Instead, the file is
     "mapped" into an area of the caller's data space (involving no
     actual data transfer) and demand-paged into real memory, directly
     from its disk file, as the program accesses it.  As long as any
     such page remains pure, it never needs to be swapped out to disk,
     and can always be swapped in from its original location on disk.
     If a page becomes dirty, separate swap space is allocated for it
     on disk and the page is re-mapped to that space.  [This technique
     is often used for the initialized data portion of executing
     programs.]  Therefore, "vread" produces the appearance of reading
     from a file into memory, but no data is actually transferred (in a
     memory-managed system), and the system is afforded the opportunity
     to optimize by sharing the data among all processes accessing the
     file.  From the program's point of view, this operation is
     indistinguishable from an actual data transfer.

     In non-memory-managed versions of UNIX, "vread" is implemented as
     a true data transfer, so "vread" calls are portable between
     memory-managed and non-memory-managed systems.  Since the system
     decides the address at which the space will be allocated, specific
     memory management requirements (such as page size and alignment)
     are hidden from the caller and are therefore of no concern to a
     program using this facility.

     In a memory-managed system, use of "vread" can provide a
     significant optimization when large portions of files must be
     available in their entirety but are sparsely and/or randomly
     accessed (such as the pseudo-code for an interpreter), and when it
     is desirable to share large, read-only files.

RETURN VALUE
     Same as read(ba_os).

ERRORS
     Same as read(ba_os).
-------------------------------------

For interpreters to take full advantage of this facility, they would have to interpret their p-code "as is" as it sits on disk. If they modify the code, much of the advantage would be lost.
I'd be interested in hearing your comments and suggestions regarding this idea: alternative ideas to solve this problem, ways other OSs have dealt with it, implementation problems, or gross oversights. What would you think of a "read only" option for this function (a fourth argument?), where the data would be mapped as read-only (i.e. protected)?
--
Bob Alexander
ISC Systems Corp.
Spokane, WA  (509)927-5445
UUCP: ihnp4!tektronix!reed!iscuva!boba
guy%gorodish@Sun.COM (Guy Harris) (06/10/87)
> vread -- read from a file into memory [but not really, maybe].
Wow, *deja vu*. Check out the manual page VREAD(2V) in the 4.1BSD
manuals. Same name, same calling sequence, and, I believe, pretty
much the same semantics.
However, the manual page also says it "is likely to be replaced by
more general virtual memory facilities in the near future." They
were, presumably, referring to the 4.2BSD "mmap" system call;
however, "mmap" wasn't really implemented for 4.2BSD.
Some systems *do* implement a real live "mmap" that permits you to
map files into your address space. This is probably the way to go;
yes, it means you have to use different code on systems that support
"mmap" and systems that don't, but you may very well want to do so
*anyway* for performance reasons. (Sometimes the appropriate layer
to put portable interfaces in isn't the system call layer or the
system library layer; it may be better to put it at a low layer in
the application.)
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com
mike@peregrine.peregrine.com (Mike Wexler) (06/10/87)
In article <540@iscuva.ISCS.COM> boba@iscuva.ISCS.COM (Bob Alexander) writes:
>
>As far as I know, those capabilities are not made available to
>interpreters for their pseudo-code and data, even though they would be
>equally as applicable as they are to "real" programs.

First, System V release 2 and above have a general-purpose shared memory facility. We use this in our fourth-generation-language interpreter to cache pseudo-code and allow centralized access to it.

Second, some versions of 4.2BSD (Sequent's, for example) have implemented an mmap call that allows you to map a file into your address space. This almost exactly matches your proposed system call.
--
Mike Wexler
UUCP: (trwrb|scgvaxd)!felix!peregrine!mike
INTERNET: mike@peregrine.com
ATT: (714)855-3923
daveb@geac.UUCP (Dave Brown) (06/11/87)
In article <20776@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>Some systems *do* implement a real live "mmap" that permits you to
>map files into your address space. This is probably the way to go.

Unix's pa did that, and it worked quite well: the file system was mostly managed by the VMM, and the code was smaller than a separate file system and VMM (it was larger than the v6 swapper & file system, though). It is also trivial to build a file-system-flavored interface to a VMM-based disk system, which is what both Multics and presumably Apollo did.

--dave (Multics is alive and living on the riviera) brown
wagner@iaoobelix.UUCP (06/11/87)
I endorse Guy Harris' posting. In addition to his remarks, note the following: in most cases, programs written in an interpreted language (LISP, PROLOG, even BASIC) can be modified by the user from within some sort of toplevel. Shared files are OK as long as users do not modify their contents! It is the same with UNIX's shared images: only the text portions are shared among processes; data areas reside in process-private space. I think, however, that memory mapping of files is a good idea anyway, if used with large (read-only) data files accessed by several users (e.g. font descriptor files).

Juergen Wagner, (USENET) ...seismo!unido!iaoobel!wagner
("Gandalf")
Fraunhofer Institute IAO, Stuttgart
rwhite@nu3b2.UUCP (Robert C. White Jr.) (06/16/87)
In article <8300006@iaoobelix.UUCP>, wagner@iaoobelix.UUCP writes:
> I endorse Guy Harris' posting. In addition to his remarks, note the following:
>
> with UNIX' shared images: only the text portions are shared among processes,
> data areas reside in a process-private space. I think, however memory mapping
> of files is a good idea, anyway, if used with large (read-only) data files
> accessed by several users (e.g. font descriptor files).

My question still stands... if the following are intended to be true:

1) Files will be read into system-wide shared memory space.
2) Previously unused files and/or file segments will be read into
   virtual memory as such parts are requested.
3) As long as some "open"s are valid, all possible attempts are made
   to keep said files in memory.
4) Request/space conflicts will be handled in an intelligent manner.

WHAT is the benefit over an appropriately set quantity of disk-block buffers? If 2 is false, either the choice of files will be limited or the entire file will be read into memory even if only part is needed. If 3 is false, the system will devour its resources. If 1 is false, then there is no point to any of it, because the data space would be private. If 4 is false, the system would halt the instant the shared space was full. 1-4 are the disk-block buffering rules [as near as I learned them].

As the call defines a local buffer as the second parameter, as opposed to "one being assigned by the system" [according to the man page], it would seem to my uneducated mind that the individual in question is overlooking the fact that the normal disk read already takes place with two levels of buffering:

1) disk block buffering <as in the tunable parameter>
and
2) local process buffering <as in stdio.h>

His two-layer buffering scheme is already taking place on a system-wide level. His solution would simply add a third level of buffering in the middle.

His "interpreter" point does not seem to make sense either, because most interpreters load a file in its entirety and "pseudo-compile" it to get critical information <line numbers in BASIC and such>. This extra layer of buffering would seem an unnecessary drain on services to me.

NOTE: It is possible that I have COMPLETELY misunderstood his intent, but after three or four readings of the proposed command/function description, someone is going to have to laboriously explain, point by point, the difference between what I have expressed and what he intended, if I have in fact got it wrong.

Robert.
guy@gorodish.UUCP (06/16/87)
> WHAT is the benefit over an appropriately set quantity of disk-block buffers?

1) You have a less kludgy interface (sorry, "vread"-type calls are kludges).

2) You don't have to worry about the quantity of disk-block buffers; if the file takes more blocks than are in your buffer cache, no problem.

3) You don't tie up parts of your buffer cache for long periods of time while the pages are in your address space after a "vread".

(Note also that there may be systems where there *is* no conventional buffer cache *per se*; the system might just do something similar to "mmap" inside the kernel to do "read"s and "write"s. Don't make assumptions about how your OS works!)

> As the call defines a local buffer as the second parameter, as opposed
> to "one being assigned by the system" [according to the man page],
> it would seem to my uneducated mind that the individual in question
> is overlooking the fact that the normal disk read already takes place
> with two levels of buffering:
>
> 1) disk block buffering <as in the tunable parameter>
> and
> 2) local process buffering <as in stdio.h>
>
> His two-layer buffering scheme is already taking place on a system-wide
> level. His solution would simply add a third level of buffering in
> the middle.

Huh? If you "mmap" a file, there's only one level of buffering; systems generally do not do paging I/O through the buffer cache. Using "mmap" *reduces* the number of buffering layers, since the local process buffer *is* the disk buffer.

> His "interpreter" point does not seem to make sense either,
> because most interpreters load a file in its entirety and "pseudo-compile"
> it to get critical information <line numbers in BASIC and such>.

Huh? What is "his 'interpreter' point"? The only thing I can find in any of the responses that fits this description is Juergen Wagner's point that interpreters usually let you modify the program being interpreted, and thus that you can't share all of it under all circumstances. This point still stands; with copy-on-write, you can probably arrange to share most of the code, assuming that the size of the program being interpreted, in its internal form (most interpreters don't run directly off the source code), is large relative to the page size (or some similar quantum) of the system.

Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com
chris@mimsy.UUCP (06/17/87)
>>WHAT is the benefit over an appropriately set quantity of disk-block buffers?

In article <21204@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>3) You don't tie up parts of your buffer cache for long periods of
>time while the pages are in your address space after a "vread".
>(Note also that there may be systems where there *is* no conventional
>buffer cache *per se* ....)

And indeed, the new VM system under development for SunOS (and eventually 4BSD) does away with the buffer cache, as have other experimental Unix kernels. The old PDP-11 kernels did not have core maps, and buffer caches were sensible; but more real memory means that you should do something useful with it, which requires something like a core map. There is no good reason to have both a core map (which remembers which pages of memory came from where on what disks) and a buffer cache (which remembers which pages of memory came from where on what disks).

	for (i = 0; i < count; i += CLBYTES / DEV_BSIZE)
		if (mfind(dev, bn + i))
			munhash(dev, bn + i);
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain: chris@mimsy.umd.edu	Path: seismo!mimsy!chris
boba@iscuva.ISCS.COM (Bob Alexander) (06/17/87)
Robert, I think you're reading a lot more complexity into this file-mapping business than there needs to be. The key points are:

Although the call looks something like a read, no data is actually transferred. The file's data is just mapped into the caller's address space. The mechanism is completely independent of the file system buffer cache.

The file's data is accessed just like a memory access. Decisions about what parts of the file are read into real memory, and when, are made by exactly the same mechanisms as regular virtual memory. Parts (i.e. pages) of the file are simply paged in as needed. No special tuning is required.

No consideration need be given to whether the file is open or not (in fact, the mapped file can be accessed after it is closed -- just like a "read" buffer can be accessed after the file is closed). If a part of the file is frequently accessed, it will likely stay in real memory; if not, it becomes a candidate to be swapped out.

The nice thing about mapped files is that, for the most part, they are just a repackaging of existing facilities.
--
Bob Alexander
ISC Systems Corp.
Spokane, WA  (509)927-5445
UUCP: ihnp4!tektronix!reed!iscuva!boba
rwhite@nu3b2.UUCP (06/18/87)
In article <544@iscuva.ISCS.COM>, boba@iscuva.ISCS.COM (Bob Alexander) writes:
> Robert, I think your reading a lot more complexity into this file
> mapping business than there needs to be. The key points are:
>
> Although the call looks something like a read, no data is actually
> transferred. This file data is just mapped in to the caller's address
> space.

OK, so where does the file come from?? Is it in a special disk partition, and loaded at system boot time? If no data is transferred, why does the call require an established buffer pointer to an established buffer?

> The mechanism is completely independent from the file system buffer
> cache.

As I gathered, but for an open file the above questions imply that the system buffer cache & read could, and does, perform all the functions of vread.

> The file's data is accessed just like a memory access.

Once again, what's the buffer for? If it moves the pointer, what about the allocated memory pointed to by the buffer pointer? [required for compatibility]

> Decisions about what parts of the file are read into real memory when
> are made by exactly the same mechanisms as regular virtual memory.
> Parts (i.e. pages) of the file are simply paged in as needed. No
> special tuning is required.

Can the file still be used normally? How does the system know to page it? Does this pass through the system cache buffers? If it does, isn't this an extra buffering level? Lastly, what if someone modifies the file in a normal manner while it is in the page buffer?

> No consideration need be given to whether
> the file is open or not (in fact, the mapped file can be accessed
> after it is closed -- just like a "read" buffer can be accessed after
> the file is closed). If a part of the file is frequently accessed, it
> will likely stay in real memory; if not it becomes a candidate to be
> swapped out.

This sounds like the purpose of the system buffers and read to me.

> The nice thing about mapped files is that, for the most part, they are
> just a repackaging of existing facilities.

As I said... what's the point? Well, I feel that I have pounded that into the ground. I sort of get the picture, but it seems like one more thing to needlessly go wrong. With the 600-block buffer cache on my 4MB 3B2, the likelihood that my second read of the same material will be in cache is high enough, whether my last operation was a read or a write. For the applications mentioned, it doesn't sound like much of an asset, because it is an unstable addition to a stable functionality.

Thank you for the comments.

Robert.
(-: Feeling dense or smug, but not sure which, in S.D. :-)
chris@mimsy.UUCP (Chris Torek) (06/23/87)
>In article <544@iscuva.ISCS.COM> boba@iscuva.ISCS.COM (Bob Alexander) writes:
>>Robert, I think your reading a lot more complexity into this file
>>mapping business than there needs to be. ... Although the call
>>looks something like a read, no data is actually transferred.
>>This file data is just mapped in to the caller's address space.

In article <684@nu3b2.UUCP> rwhite@nu3b2.UUCP (Robert C. White Jr.) writes:
>OK, so where does the file come from?? Is it in a special disk
>partition, and loaded at system boot time?

The file comes from the file system, of course. With mmap, you write

	fd = open(filename, mode);
	if (fd < 0)
		... /* error */
	/* mmap(addr, len, protection, share, fd, offset); */
	mmap(buf, filesize, PROT_READ, MAP_SHARED, fd, (off_t) 0);

>If no data is transferred, why does the call require an established
>buffer pointer to an established buffer?

So that data can be transferred later:

	char c = buf[n];

causes a page fault, identifying some particular byte that must now be read from the file.

>>The mechanism is completely independent from the file system buffer
>>cache.

>As I gathered, but for an open file the above questions imply that
>the system buffer cache & read could, and does, perform all the
>functions of vread.

Not quite. In particular, the semantics are different. The character buf[n] is automatically associated with the current contents of location offset+n in file fd, where `current' means `at the time the byte is read from memory'. To do the same thing on a traditional Unix system, you must do this:

	if (lseek(fd, offset+n, 0) == -1)
		... /* error */
	if (read(fd, &c, 1) != 1)
		... /* error */

In addition to being clumsier to code, this is much less efficient on a paging machine than simply using the paging hardware. In this particular example, the kernel would mark invalid the appropriate pages associated with the user buffer whenever any other program wrote over that file. Until then, those pages would remain valid and readable; afterward, once referenced, those pages would be reread automatically and again be valid and readable.

>>The file's data is accessed just like a memory access.

>Once again, what's the buffer for?

It names the addresses the user program wants to have reflect the contents of the file. For this reason, `addr' must be page-aligned (a restriction I consider bogus, although for efficiency. . .).

>>Decisions about what parts of the file are read into real memory when
>>are made by exactly the same mechanisms as regular virtual memory.
>>Parts (i.e. pages) of the file are simply paged in as needed. No
>>special tuning is required.

>Can the file still be used normally?

*These*, now, are the sticky questions. Yes.

>How does the system know to page it?

Programs that do not use mmap() must not see that it is being paged. Programs that do use mmap() must see it being paged. It is up to the kernel to maintain the proper illusions.

>Does this pass through the system cache buffers?

Typically, there are no system cache buffers. Everything is done with mirrors (page mapping, with copies made as necessary).

>Lastly, what if someone modifies the file in a normal manner while
>it is in the page buffer?

As to this I am uncertain. If you have specified MAP_SHARED, it seems that you should see such changes. If you have said MAP_PRIVATE, should the system copy the old pages and update the PTEs to point to the new copies before overwriting the old data? Or do you get the changes anyway, MAP_PRIVATE meaning only that if you write to the pages, they are copied?

One question you missed is `what happens if I map a 100K file and then someone truncates it to zero bytes?' (Answer: who knows? Some say SIGBUS.)

>>No consideration need be given to whether the file is open or not
>>(in fact, the mapped file can be accessed after it is closed --
>>just like a "read" buffer can be accessed after the file is closed).

This sounds somewhat suspicious, but certainly convenient.

>>The nice thing about mapped files is that, for the most part, they are
>>just a repackaging of existing facilities.

>As I said... What's the point?

There are several:

	- The new kernel runs faster (for some selected set of benchmarks);
	- The new kernel code is simpler;
	- Mapped file semantics are more convenient for some programs;
	- Mapped files provide shared memory.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain: chris@mimsy.umd.edu	Path: seismo!mimsy!chris
daveb@geac.UUCP (Dave Brown) (06/24/87)
Bob Alexander asked:
>Can the file still be used normally?
Maybe even better...
On Multics I once saw a demonstration (done by Paul Stachour, if
memory serves), of two persons editing the same place in the same
file at the same time... the mapping mechanism serialized the
accesses so they didn't mash the same character at the same time,
and Emacs laboured mightily to keep its screens updated in the face
of the file changing *under* it.
It was impressive, but not directly useful: it only worked because
the two people were in the same office discussing the changes on a
secondary channel (voice). But on a TP system, it would be a joy!
Instead of complex locks & commits on disk blocks, you have them
on memory.
Of course, this opens a whole new pandora's box of implementation
problems. (:-))
--dave
--
Computer Science | David (Collier-) Brown
loses its memory | Geac Computers International Inc.
every 6 months | 350 Steelcase Road,Markham, Ontario,
-me. | CANADA, L3R 1B3 (416) 475-0525 x3279