[comp.os.research] UNIX Facilities for Interpreters

boba@iscuva.ISCS.COM (Bob Alexander) (06/09/87)

Modern, memory managed operating systems (like UNIX) have addressed
quite nicely certain special requirements of executable files.  In
particular (1) the file (text and data) need not be loaded into memory
in its entirety to begin executing, and (2) the pages can be shared
among processes that are executing them (both on disk and in memory).

As far as I know, those capabilities are not made available to
interpreters for their pseudo-code and data, even though they would be
just as applicable there as they are to "real" programs.  If 15 users are
running a program written in an interpretive language, the interpreter
code is shared, but the p-code exists separately for each user.  This
is a major disadvantage when interpretive languages are used to
produce production programs.  Interpretive systems are in quite wide
use today (e.g. shells, SQLs, (((Lisp))), Icon, etc., etc., [even
BASIC]), and as processor speeds increase, use of interpreters will
likely continue to grow.

There are a few ways of working this problem with existing UNIX
facilities, but the ones I've come up with so far are kluges.  My
reason for posting to this newsgroup is to get your reaction to a
possible new UNIX facility for this purpose.  I'll express my
suggestion in SVID format, sort of:

------------------------------

NAME

   vread -- read from a file into memory [but not really, maybe].

SYNOPSIS

   int vread(fildes, bufptr, nbyte)
   int fildes;
   char **bufptr;
   unsigned nbyte;

DESCRIPTION

   The function "vread" attempts to read "nbyte" bytes from the file
   associated with "fildes" into an allocated buffer whose address is
   returned in "bufptr".  This function is similar to read(ba_os)
   [read(ba_os) is SVIDese for read(2)] except for its implications
   concerning virtual memory and that it allocates a buffer rather than
   being given one.

   In a memory managed system, the contents of the file are not
   transferred into the program's memory space.  Instead, the file is
   "mapped" into an area of the caller's data space (involving no
   actual data transfer) and demand-paged into real memory, directly
   from its disk file, as accessed by the program.  As long as any such
   page remains pure, it never needs to be swapped out to disk, and can
   always be swapped in from its original location on disk.  If a page
   becomes dirty, it will have separate swap space allocated for it on
   disk and the page will be re-mapped to that space.  [This technique
   is often used for the initialized data portion of executing
   programs].

   Therefore, "vread" produces the appearance of reading from a file
   into memory, but no data is actually transferred (in a memory managed
   system), and the system is afforded the opportunity to optimize by
   sharing the data among all processes accessing the file.  From the
   program's point of view, this operation is indistinguishable from an
   actual data transfer.  In non-memory-managed versions of UNIX,
   "vread" is implemented as a true data transfer.  Therefore, "vread"
   calls are portable between memory-managed and non-memory-managed
   systems.

   Since the system decides the address at which the space will be
   allocated, specific memory management requirements (such as page
   size and alignment) are hidden from the caller and are therefore of
   no concern to a program using this facility.

   In a memory managed system, use of "vread" can provide a significant
   optimization when large portions of files must be available in their
   entirety, but are sparsely and/or randomly accessed (such as the
   pseudo-code for an interpreter), and when it is desirable to share
   large, read-only files.

RETURN VALUE

   Same as read(ba_os).

ERRORS

   Same as read(ba_os).

-------------------------------------
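
On a present-day POSIX system, the behavior in the DESCRIPTION above can
be approximated with mmap(2) and a private (copy-on-write) mapping.  The
following is only a minimal sketch under that assumption, not the facility
as proposed: the use of mmap, the ssize_t/size_t types, the restriction to
regular files, and the helper vread_release() are illustrative choices
that do not appear in the proposal.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open(), for callers */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch of vread() on top of mmap(2) (an assumed stand-in mechanism).
 * "Reads" up to nbyte bytes at the current file offset by mapping them
 * copy-on-write, stores the address in *bufptr, advances the offset,
 * and returns the byte count like read(2): 0 at end of file, -1 on error.
 */
ssize_t vread(int fildes, char **bufptr, size_t nbyte)
{
    struct stat st;
    off_t pos, base;
    size_t avail, len, delta;
    long page;
    char *map;

    assert(bufptr != NULL);
    *bufptr = NULL;

    if (fstat(fildes, &st) < 0)
        return -1;
    if (!S_ISREG(st.st_mode)) {         /* sketch handles regular files only */
        errno = EINVAL;
        return -1;
    }
    pos = lseek(fildes, 0, SEEK_CUR);
    if (pos < 0)
        return -1;
    if (pos >= st.st_size || nbyte == 0)
        return 0;                       /* nothing to read */

    avail = (size_t)(st.st_size - pos);
    len = nbyte < avail ? nbyte : avail;

    /* mmap offsets must be page aligned, so map from the page boundary
     * below the file offset and hand back a pointer into the mapping. */
    page = sysconf(_SC_PAGESIZE);
    delta = (size_t)(pos % page);
    base = pos - (off_t)delta;

    /* MAP_PRIVATE: clean pages come straight from the file and may be
     * shared; a store gives this process its own copy of that page. */
    map = mmap(NULL, len + delta, PROT_READ | PROT_WRITE, MAP_PRIVATE,
               fildes, base);
    if (map == MAP_FAILED)
        return -1;
    if (lseek(fildes, pos + (off_t)len, SEEK_SET) < 0) {
        munmap(map, len + delta);
        return -1;
    }
    *bufptr = map + delta;
    return (ssize_t)len;
}

/* Hypothetical companion: unmap a buffer returned by vread().
 * nbyte must be the count that vread() returned for that buffer. */
int vread_release(char *buf, size_t nbyte)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    size_t delta = (size_t)((uintptr_t)buf % page);

    return munmap(buf - delta, nbyte + delta);
}
```

An interpreter using this sketch would open its p-code file, call
vread(fd, &code, size), and interpret directly from "code".  Passing
PROT_READ alone instead of PROT_READ | PROT_WRITE would be one way to
get the protected, read-only variant.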

For interpreters to take full advantage of this facility, they would
have to interpret their p-code "as is" as it sits on disk.  If they
modify the code, much of the advantage would be lost.

I'd be interested in hearing your comments and suggestions regarding
this idea; alternative ideas to solve this problem, ways other OSs have
dealt with it, implementation problems, or gross oversights.  What
would you think of a "read only" option for this function (a fourth
argument?), where the data would be mapped as read only (i.e.
protected)?
-- 

Bob Alexander	   ISC Systems Corp.  Spokane, WA  (509)927-5445
		   UUCP: ihnp4!tektronix!reed!iscuva!boba

darrell@sdcsvax.UUCP (06/11/87)

This (a virtual read operation) can be simulated on SystemV with the
shared memory operations (shmctl(2), shmget(2) and shmop(2)).  This is
not to say that vread is a bad idea, but that it could be a library routine
rather than a system call, at least on SystemV.
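
A library routine along those lines might look roughly like the sketch
below.  It is an illustration only: the names shm_vread() and
shm_vdiscard(), the use of ftok(3) to derive a key from the file, and the
0600 mode are assumptions, not anything specified in this thread.  Note
that the first caller still copies the file into the segment, so this
shares the data but does not page it from the original file, and a second
process attaching while the first is still filling the segment could see
incomplete data (a real routine would need a semaphore or similar).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the whole file into a freshly created segment; 0 on success. */
static int fill_segment(int id, const char *path, size_t size)
{
    size_t done = 0;
    char *rw;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    rw = shmat(id, NULL, 0);
    if (rw == (void *)-1) {
        close(fd);
        return -1;
    }
    while (done < size) {
        ssize_t n = read(fd, rw + done, size - done);
        if (n <= 0)
            break;
        done += (size_t)n;
    }
    shmdt(rw);
    close(fd);
    return done == size ? 0 : -1;
}

/*
 * Hypothetical library-level "virtual read": returns a read-only
 * attachment of a segment holding the file's contents, creating and
 * filling the segment if this is the first caller.  NULL on failure.
 */
const char *shm_vread(const char *path, size_t *lenp)
{
    struct stat st;
    key_t key;
    int id;
    void *ro;

    assert(path != NULL && lenp != NULL);
    if (stat(path, &st) < 0 || st.st_size == 0)
        return NULL;
    key = ftok(path, 'v');              /* key derived from the file */
    if (key == (key_t)-1)
        return NULL;

    id = shmget(key, (size_t)st.st_size, IPC_CREAT | IPC_EXCL | 0600);
    if (id >= 0) {                      /* we created it: load the file */
        if (fill_segment(id, path, (size_t)st.st_size) < 0) {
            shmctl(id, IPC_RMID, NULL);
            return NULL;
        }
    } else {
        if (errno != EEXIST)
            return NULL;
        id = shmget(key, (size_t)st.st_size, 0);  /* someone else did */
        if (id < 0)
            return NULL;
    }
    ro = shmat(id, NULL, SHM_RDONLY);
    if (ro == (void *)-1)
        return NULL;
    *lenp = (size_t)st.st_size;
    return ro;
}

/* Hypothetical cleanup: mark the file's segment for removal. */
int shm_vdiscard(const char *path)
{
    key_t key = ftok(path, 'v');
    int id;

    if (key == (key_t)-1)
        return -1;
    id = shmget(key, 0, 0);
    if (id < 0)
        return -1;
    return shmctl(id, IPC_RMID, NULL);
}
```

Each caller releases its attachment with shmdt(2); the segment itself
persists until something like shm_vdiscard() removes it, which is one of
the bookkeeping burdens a system-call version would not have.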

	../Dave Mason,	TM Software Associates	(Compilers & System Consulting)
	..!{utzoo seismo!mnetor utcsri utgpu lsuc}!tmsoft!mason

darrell@sdcsvax.UUCP (06/13/87)

In article <3293@sdcsvax.UCSD.EDU>, boba@iscuva.ISCS.COM (Bob Alexander) writes:
+> Modern, memory managed operating systems (like UNIX) have addressed
+> quite nicely certain special requirements of executable files.  In
+> particular (1) the file (text and data) need not be loaded into memory
+> in its entirety to begin executing, and (2) the pages can be shared
+> among processes that are executing them (both on disk and in memory).
+> 
				:
+> 
+> For interpreters to take full advantage of this facility, they would
+> have to interpret their p-code "as is" as it sits on disk.  If they
+> modify the code, much of the advantage would be lost.
+> 
+> I'd be interested in hearing your comments and suggestions regarding
+> this idea; alternative ideas to solve this problem, ways other OSs have
+> dealt with it, implementation problems, or gross oversights.  What
+> would you think of a "read only" option for this function (a fourth
+> argument?), where the data would be mapped as read only (i.e.
+> protected).
+> -- 
+> 
more recent versions of system v have improved support for interpreters.
for one, there is a new region type called "doubly mapped memory".  a
single region can be treated as both text and data (exactly what interpreters
do with their p-code).  add that with the copy on write feature of
the demand paging system and you have most of the advantages that you
desire.

danny chen
ihnp4!homxc!dwc



-- 
Darrell Long
Department of Computer Science & Engineering, UC San Diego, La Jolla CA 92093
ARPA: Darrell@Beowulf.UCSD.EDU  UUCP: darrell@sdcsvax.uucp
Operating Systems submissions to: mod-os@sdcsvax.uucp

Avadis.Tevanian@wb1.cs.cmu.edu (Avie) (06/27/87)

In article <3293@sdcsvax.UCSD.EDU>, boba@iscuva.ISCS.COM (Bob Alexander) writes:
| Modern, memory managed operating systems (like UNIX) have addressed
| quite nicely certain special requirements of executable files.  In
| particular (1) the file (text and data) need not be loaded into memory
| in its entirety to begin executing, and (2) the pages can be shared
| among processes that are executing them (both on disk and in memory).
| 
			:
| 
| For interpreters to take full advantage of this facility, they would
| have to interpret their p-code "as is" as it sits on disk.  If they
| modify the code, much of the advantage would be lost.
| 
| I'd be interested in hearing your comments and suggestions regarding
| this idea; alternative ideas to solve this problem, ways other OSs have
| dealt with it, implementation problems, or gross oversights.  What
| would you think of a "read only" option for this function (a fourth
| argument?), where the data would be mapped as read only (i.e.
| protected).

This type of stuff is trivial in modern operating systems like Mach (even
natural).  You simply map your file copy-on-write - all unmodified pages are
automatically demand paged and shared - modified pages are automatically
copied as necessary.  Using a system like Mach, which maintains a virtual
memory cache of pages even after they are no longer in use, you can rerun a
program later (or access a file later) without any disk I/Os - tremendously
reducing total elapsed time of operations.  On machines with large memories
and/or high ratios of CPU speed to disk access time, performance wins can be
stunning.

	Avie
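
The copy-on-write behavior described in the last post can be observed on
a POSIX system with two private mappings of the same file.  This is a
small sketch using mmap(2) rather than Mach's own interfaces, which are
not shown in the thread; map_cow() and cow_write_is_private() are
hypothetical names chosen for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole regular file copy-on-write.  Returns NULL on failure
 * (an empty file also counts as failure in this sketch). */
char *map_cow(const char *path, size_t *lenp)
{
    struct stat st;
    char *p;
    int fd;

    assert(path != NULL && lenp != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return NULL;
    *lenp = (size_t)st.st_size;
    return p;
}

/* Returns 1 if a store through one mapping is invisible through a
 * second mapping of the same file, 0 if it is visible, -1 on error. */
int cow_write_is_private(const char *path)
{
    size_t la = 0, lb = 0;
    char *a = map_cow(path, &la);
    char *b = map_cow(path, &lb);
    int result = -1;

    if (a != NULL && b != NULL) {
        char before = b[0];

        a[0] = (char)(before ^ 0x20);   /* dirties a's page only */
        result = (b[0] == before);
    }
    if (a != NULL)
        munmap(a, la);
    if (b != NULL)
        munmap(b, lb);
    return result;
}
```

Until the store, both mappings can be backed by the same physical pages;
afterwards only the written page has been copied, and the file on disk is
left unchanged.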