boba@iscuva.ISCS.COM (Bob Alexander) (06/09/87)
Modern, memory-managed operating systems (like UNIX) have addressed quite nicely certain special requirements of executable files. In particular, (1) the file (text and data) need not be loaded into memory in its entirety to begin executing, and (2) the pages can be shared among processes that are executing them (both on disk and in memory). As far as I know, those capabilities are not made available to interpreters for their pseudo-code and data, even though they would be equally applicable there as they are to "real" programs. If 15 users are running a program written in an interpretive language, the interpreter code is shared, but the p-code exists separately for each user. This is a major disadvantage in using interpretive languages to produce production programs. Interpretive systems are in quite wide use today (e.g. shells, SQLs, (((Lisp))), Icon, etc., etc., [even BASIC]), and as processor speeds increase, use of interpreters will likely continue to grow.

There are a few ways of attacking this problem with existing UNIX facilities, but the ones I've come up with so far are kluges. My reason for posting to this newsgroup is to get your reaction to a possible new UNIX facility for this purpose. I'll express my suggestion in SVID format, sort of:

------------------------------
NAME
     vread -- read from a file into memory [but not really, maybe].

SYNOPSIS
     int vread(fildes, bufptr, nbyte)
     int fildes;
     char **bufptr;
     unsigned nbyte;

DESCRIPTION
     The function "vread" attempts to read "nbyte" bytes from the file
     associated with "fildes" into an allocated buffer whose address is
     returned in "bufptr".  This function is similar to read(ba_os)
     [read(ba_os) is SVIDese for read(2)] except for its implications
     concerning virtual memory, and in that it allocates a buffer
     rather than being given one.

     In a memory-managed system, the contents of the file are not
     transferred into the program's memory space.  Instead, the file is
     "mapped" into an area of the caller's data space (involving no
     actual data transfer) and demand-paged into real memory, directly
     from its disk file, as the program accesses it.  As long as any
     such page remains pure, it never needs to be swapped out to disk,
     and can always be swapped in from its original location on disk.
     If a page becomes dirty, separate swap space is allocated for it
     on disk and the page is re-mapped to that space.  [This technique
     is often used for the initialized data portion of executing
     programs.]  Therefore, "vread" produces the appearance of reading
     from a file into memory, but no data is actually transferred (in a
     memory-managed system), and the system is afforded the opportunity
     to optimize by sharing the data among all processes accessing the
     file.  From the program's point of view, this operation is
     indistinguishable from an actual data transfer.

     In non-memory-managed versions of UNIX, "vread" is implemented as
     a true data transfer, so "vread" calls are portable between
     memory-managed and non-memory-managed systems.  Since the system
     decides the address at which the space will be allocated, specific
     memory management requirements (such as page size and alignment)
     are hidden from the caller and are therefore of no concern to a
     program using this facility.

     In a memory-managed system, use of "vread" can provide a
     significant optimization when large portions of files must be
     available in their entirety but are sparsely and/or randomly
     accessed (such as the pseudo-code for an interpreter), and when it
     is desirable to share large, read-only files.

RETURN VALUE
     Same as read(ba_os).

ERRORS
     Same as read(ba_os).
-------------------------------------

For interpreters to take full advantage of this facility, they would have to interpret their p-code "as is" as it sits on disk. If they modify the code, much of the advantage would be lost.
I'd be interested in hearing your comments and suggestions regarding this idea: alternative ideas to solve this problem, ways other OSs have dealt with it, implementation problems, or gross oversights. What would you think of a "read only" option for this function (a fourth argument?), where the data would be mapped as read-only (i.e. protected)?
--
Bob Alexander
ISC Systems Corp.
Spokane, WA  (509)927-5445
UUCP: ihnp4!tektronix!reed!iscuva!boba
guy%gorodish@Sun.COM (Guy Harris) (06/10/87)
> vread -- read from a file into memory [but not really, maybe].
Wow, *deja vu*. Check out the manual page VREAD(2V) in the 4.1BSD
manuals. Same name, same calling sequence, and, I believe, pretty
much the same semantics.
However, the manual page also says it "is likely to be replaced by
more general virtual memory facilities in the near future." They
were, presumably, referring to the 4.2BSD "mmap" system call;
however, "mmap" wasn't really implemented for 4.2BSD.
Some systems *do* implement a real live "mmap" that permits you to
map files into your address space. This is probably the way to go;
yes, it means you have to use different code on systems that support
"mmap" and systems that don't, but you may very well want to do so
*anyway* for performance reasons. (Sometimes the appropriate layer
to put portable interfaces in isn't the system call layer or the
system library layer; it may be better to put it at a low layer in
the application.)
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com
mike@peregrine.peregrine.com (Mike Wexler) (06/10/87)
In article <540@iscuva.ISCS.COM> boba@iscuva.ISCS.COM (Bob Alexander) writes:
>
>As far as I know, those capabilities are not made available to
>interpreters for their pseudo-code and data, even though they would be
>equally as applicable as they are to "real" programs.

First, System V release 2 and above have a general-purpose shared memory facility. We use this in our fourth-generation-language interpreter to cache pseudo-code and allow centralized access to it.

Second, some versions of 4.2BSD (Sequent's, for example) have implemented an mmap call that allows you to map a file into your address space. This almost exactly matches your proposed system call.
--
Mike Wexler
UUCP: (trwrb|scgvaxd)!felix!peregrine!mike
INTERNET: mike@peregrine.com
ATT: (714)855-3923
daveb@geac.UUCP (Dave Brown) (06/11/87)
In article <20776@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>Some systems *do* implement a real live "mmap" that permits you to
>map files into your address space. This is probably the way to go.

Unix's pa did that, and it worked quite well: the file system was mostly managed by the VMM, and the code was smaller than a separate file system and VMM (it was larger than the v6 swapper & file system, though). It is also trivial to build a file-system-flavored interface to a VMM-based disk system, which is what both Multics and presumably Apollo did.

--dave (Multics is alive and living on the riviera) brown
wagner@iaoobelix.UUCP (06/11/87)
I endorse Guy Harris' posting. In addition to his remarks, note the following: in most cases, programs written in an interpreted language (LISP, PROLOG, even BASIC) can be modified by the user from within some sort of toplevel. Shared files are OK as long as users do not modify their contents! It is the same with UNIX's shared images: only the text portions are shared among processes; data areas reside in process-private space. I think, however, that memory mapping of files is a good idea anyway, if used with large (read-only) data files accessed by several users (e.g. font descriptor files).

Juergen Wagner, (USENET) ...seismo!unido!iaoobel!wagner
("Gandalf")
Fraunhofer Institute IAO, Stuttgart
rwhite@nu3b2.UUCP (Robert C. White Jr.) (06/16/87)
In article <8300006@iaoobelix.UUCP>, wagner@iaoobelix.UUCP writes:
> I endorse Guy Harris' posting. In addition to his remarks, note the following:
>
> with UNIX' shared images: only the text portions are shared among processes,
> data areas reside in a process-private space. I think, however memory mapping
> of files is a good idea, anyway, if used with large (read-only) data files
> accessed by several users (e.g. font descriptor files).

My question still stands... if the following are intended to be true:

1) Files will be read into system-wide shared memory space.
2) Previously unused files and/or file segments will be read into
   virtual memory as such parts are requested.
3) As long as some "open"s are valid, all possible attempts are made
   to keep said files in memory.
4) Request/space conflicts will be handled in an intelligent manner.

WHAT is the benefit over an appropriately set quantity of disk-block buffers? If 2 is false, either the choice of files will be limited or the entire file will be read into memory even if only part is needed. If 3 is false, the system will devour its resources. If 1 is false, then there is no point to any of it, because the data space would be private. If 4 is false, the system would halt the instant the shared space was full. 1-4 are the disk-block buffering rules [as near as I learned them].

As the call defines a local buffer as the second parameter, as opposed to "one being assigned by the system" [according to the man page], it would seem to my uneducated mind that the individual in question is overlooking the fact that the normal disk read already takes place with two levels of buffering:

1) disk block buffering <as in the tunable parameter>
and
2) local process buffering <as in stdio.h>

His two-layer buffering scheme is already taking place on a system-wide level. His solution would simply add a third level of buffering in the middle.

His "interpreter" point does not seem to make sense either, because most interpreters load a file in its entirety and "pseudo-compile" it to get critical information <line numbers in BASIC and such>. This extra layer of buffering would seem an unnecessary drain on services to me.

NOTE: It is possible that I have COMPLETELY misunderstood his intent, but after three or four readings of the proposed command/function description, someone is going to have to laboriously explain, point by point, the difference between what I have expressed and what he intended, if I have in fact got it wrong.

Robert.
guy@gorodish.UUCP (06/16/87)
> WHAT is the benefit over an appropriately set quantity of disk-block buffers?

1) You have a less kludgy interface (sorry, "vread"-type calls are kludges).

2) You don't have to worry about the quantity of disk-block buffers; if the file takes more blocks than are in your buffer cache, no problem.

3) You don't tie up parts of your buffer cache for long periods of time while the pages are in your address space after a "vread".

(Note also that there may be systems where there *is* no conventional buffer cache *per se*; the system might just do something similar to "mmap" inside the kernel to do "read"s and "write"s. Don't make assumptions about how your OS works!)

> As the call defines a local buffer as the second parameter, as opposed
> to "one being assigned by the system" [according to the man page],
> it would seem to my uneducated mind that the individual in question
> is overlooking the fact that the normal disk read already takes place
> with two levels of buffering:
>
> 1) disk block buffering <as in the tunable parameter>
> and
> 2) local process buffering <as in stdio.h>
>
> His two-layer buffering scheme is already taking place on a system-wide
> level. His solution would simply add a third level of buffering in
> the middle.

Huh? If you "mmap" a file, there's only one level of buffering; systems generally do not do paging I/O through the buffer cache. Using "mmap" *reduces* the number of buffering layers, since the local process buffer *is* the disk buffer.

> His "interpreter" point does not seem to make sense either,
> because most interpreters load a file in its entirety and "pseudo-compile"
> it to get critical information <line numbers in BASIC and such>.

Huh? What is "his 'interpreter' point"? The only thing I can find in any of the responses that fits this description is Juergen Wagner's point that interpreters usually let you modify the program being interpreted, and thus that you can't share all of it under all circumstances. This point still stands; with copy-on-write, you can probably arrange to share most of the code, assuming that the size of the program being interpreted, in its internal form (most interpreters don't run directly off the source code), is large relative to the page size (or some similar quantum) of the system.

Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com
chris@mimsy.UUCP (06/17/87)
>>WHAT is the benefit over an appropriately set quantity of disk-block buffers?

In article <21204@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>3) You don't tie up parts of your buffer cache for long periods of
>time while the pages are in your address space after a "vread".
>(Note also that there may be systems where there *is* no conventional
>buffer cache *per se* ....)

And indeed, the new VM system under development for SunOS (and eventually 4BSD) does away with the buffer cache, as have other experimental Unix kernels. The old PDP-11 kernels did not have core maps, and buffer caches were sensible; but more real memory means that you should do something useful with it, which requires something like a core map. There is no good reason to have both a core map (which remembers which pages of memory came from where on what disks) and a buffer cache (which remembers which pages of memory came from where on what disks).

	for (i = 0; i < count; i += CLBYTES / DEV_BSIZE)
		if (mfind(dev, bn + i))
			munhash(dev, bn + i);
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain: chris@mimsy.umd.edu	Path: seismo!mimsy!chris
boba@iscuva.ISCS.COM (Bob Alexander) (06/17/87)
Robert, I think you're reading a lot more complexity into this file-mapping business than there needs to be. The key points are:

Although the call looks something like a read, no data is actually transferred. The file's data is just mapped into the caller's address space. The mechanism is completely independent of the file system buffer cache.

The file's data is accessed just like a memory access. Decisions about what parts of the file are read into real memory, and when, are made by exactly the same mechanisms as regular virtual memory. Parts (i.e. pages) of the file are simply paged in as needed. No special tuning is required.

No consideration need be given to whether the file is open or not (in fact, the mapped file can be accessed after it is closed -- just like a "read" buffer can be accessed after the file is closed). If a part of the file is frequently accessed, it will likely stay in real memory; if not, it becomes a candidate to be swapped out.

The nice thing about mapped files is that, for the most part, they are just a repackaging of existing facilities.
--
Bob Alexander
ISC Systems Corp.
Spokane, WA  (509)927-5445
UUCP: ihnp4!tektronix!reed!iscuva!boba
rwhite@nu3b2.UUCP (06/18/87)
In article <544@iscuva.ISCS.COM>, boba@iscuva.ISCS.COM (Bob Alexander) writes:
> Robert, I think your reading a lot more complexity into this file
> mapping business than there needs to be. The key points are:
>
> Although the call looks something like a read, no data is actually
> transferred. This file data is just mapped in to the caller's address
> space.

OK, so where does the file come from?? Is it in a special disk partition, and loaded at system boot time? If no data is transferred, why does the call require an established buffer pointer to an established buffer?

> The mechanism is completely independent from the file system buffer
> cache.

As I gathered, but for an open file the above questions imply that the system buffer cache & read could, and does, perform all the functions of vread.

> The file's data is accessed just like a memory access.

Once again, what's the buffer for? If it moves the pointer, what about the allocated memory pointed to by the buffer pointer? [required for compatibility]

> Decisions about what parts of the file are read into real memory when
> are made by exactly the same mechanisms as regular virtual memory.
> Parts (i.e. pages) of the file are simply paged in as needed. No
> special tuning is required.

Can the file still be used normally? How does the system know to page it? Does this pass through the system cache buffers? If it does, isn't this an extra buffering level? Lastly, what if someone modifies the file in a normal manner while it is in the page buffer?

> No consideration need be given to whether
> the file is open or not (in fact, the mapped file can be accessed
> after it is closed -- just like a "read" buffer can be accessed after
> the file is closed). If a part of the file is frequently accessed, it
> will likely stay in real memory; if not it becomes a candidate to be
> swapped out.

This sounds like the purpose of the system buffers and read to me.

> The nice thing about mapped files is that, for the most part, they are
> just a repackaging of existing facilities.

As I said... what's the point? Well, I feel that I have pounded that into the ground. I sort of get the picture, but it seems like one more thing to needlessly go wrong. With the 600-block buffer cache on my 4MB 3B2, the likelihood that my second read of the same material will be in cache is high enough, whether my last operation was a read or a write. For the applications mentioned, it doesn't sound like much of an asset, because it is an unstable addition to a stable functionality.

Thank you for the comments.

Robert.
(-: Feeling dense or smug, but not sure which, in S.D. :-)
chris@mimsy.UUCP (Chris Torek) (06/23/87)
>In article <544@iscuva.ISCS.COM> boba@iscuva.ISCS.COM (Bob Alexander) writes:
>>Robert, I think your reading a lot more complexity into this file
>>mapping business than there needs to be. ... Although the call
>>looks something like a read, no data is actually transferred.
>>This file data is just mapped in to the caller's address space.

In article <684@nu3b2.UUCP> rwhite@nu3b2.UUCP (Robert C. White Jr.) writes:
>OK, so where does the file come from?? Is it in a special disk
>partition, and loaded at system boot time?

The file comes from the file system, of course. With mmap, you write

	fd = open(filename, mode);
	if (fd < 0)
		... /* error */
	/* mmap(addr, len, protection, share, fd, offset); */
	mmap(buf, filesize, PROT_READ, MAP_SHARED, fd, (off_t) 0);

>If no data is transferred, why does the call require an established
>buffer pointer to an established buffer?

So that data can be transferred later:

	char c = buf[n];

causes a page fault, identifying some particular byte that must now be read from the file.

>>The mechanism is completely independent from the file system buffer
>>cache.

>As I gathered, but for an open file the above questions imply that
>the system buffer cache & read could, and does, perform all the
>functions of vread.

Not quite. In particular, the semantics are different. The character buf[n] is automatically associated with the current contents of location offset+n in file fd, where `current' means `at the time the byte is read from memory'. To do the same thing on a traditional Unix system, you must do this:

	if (lseek(fd, offset+n, 0) == -1)
		... /* error */
	if (read(fd, &c, 1) != 1)
		... /* error */

In addition to being clumsier to code, this is much less efficient on a paging machine than simply using the paging hardware. In this particular example, the kernel would mark invalid the appropriate pages associated with the user buffer whenever any other program wrote over that file. Until then, those pages would remain valid and readable; afterward, once referenced, those pages would be reread automatically and again be valid and readable.

>>The file's data is accessed just like a memory access.

>Once again, what's the buffer for?

It names the addresses the user program wants to have reflect the contents of the file. For this reason, `addr' must be page-aligned (a restriction I consider bogus, although for efficiency. . .).

>>Decisions about what parts of the file are read into real memory when
>>are made by exactly the same mechanisms as regular virtual memory.
>>Parts (i.e. pages) of the file are simply paged in as needed. No
>>special tuning is required.

>Can the file still be used normally?

*These*, now, are the sticky questions. Yes.

>How does the system know to page it?

Programs that do not use mmap() must not see that it is being paged. Programs that do use mmap() must see it being paged. It is up to the kernel to maintain the proper illusions.

>Does this pass through the system cache buffers?

Typically, there are no system cache buffers. Everything is done with mirrors (page mapping, with copies made as necessary).

>Lastly, what if someone modifies the file in a normal manner while
>it is in the page buffer?

As to this I am uncertain. If you have specified MAP_SHARED, it seems that you should see such changes. If you have said MAP_PRIVATE, should the system copy the old pages and update the PTEs to point to the new copies before overwriting the old data? Or do you get the changes anyway, MAP_PRIVATE meaning only that if you write to the pages, they are copied?

One question you missed is `what happens if I map a 100K file and then someone truncates it to zero bytes?' (Answer: who knows? Some say SIGBUS.)

>>No consideration need be given to whether the file is open or not
>>(in fact, the mapped file can be accessed after it is closed --
>>just like a "read" buffer can be accessed after the file is closed).

This sounds somewhat suspicious, but certainly convenient.

>>The nice thing about mapped files is that, for the most part, they are
>>just a repackaging of existing facilities.

>As I said... What's the point?

There are several:

	- The new kernel runs faster (for some selected set of benchmarks);
	- The new kernel code is simpler;
	- Mapped file semantics are more convenient for some programs;
	- Mapped files provide shared memory.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain: chris@mimsy.umd.edu	Path: seismo!mimsy!chris
daveb@geac.UUCP (Dave Brown) (06/24/87)
Bob Alexander asked:
>Can the file still be used normally?
Maybe even better...
On Multics I once saw a demonstration (done by Paul Stachour, if
memory serves), of two persons editing the same place in the same
file at the same time... the mapping mechanism serialized the
accesses so they didn't mash the same character at the same time,
and Emacs laboured mightily to keep its screens updated in the face
of the file changing *under* it.
It was impressive, but not directly useful: it only worked because
the two people were in the same office discussing the changes on a
secondary channel (voice). But on a TP system, it would be a joy!
Instead of complex locks & commits on disk blocks, you have them
on memory.
Of course, this opens a whole new pandora's box of implementation
problems. (:-))
--dave
--
Computer Science | David (Collier-) Brown
loses its memory | Geac Computers International Inc.
every 6 months | 350 Steelcase Road,Markham, Ontario,
-me. | CANADA, L3R 1B3 (416) 475-0525 x3279