darrell@sdcsvax.UUCP (03/23/87)
Jim Rees's article (which may or may not have been posted yet) gives
the outlines of how Apollo uses the single-level store idea to implement
its distributed filesystem.

Something else to note is that the mapped I/O model did present
performance problems at the sequential file access level. The kernel
knows nothing about sequential I/O and open files and such and
originally did not do any read-ahead. Later (a while ago
now), we added "touch ahead": Mapped segments (32K regions of virtual
address space) can be marked with an integer that is the number of
pages the pager should read in when any page in the segment gets
faulted on. This gives read-ahead of a sort.

Sequential output was a bit trickier. There is no parallel to page
faulting that occurs when you're "done" writing through a page. What
we did do was make it so that segments can be marked "flush-behind";
i.e. the physical memory pages should be treated as good candidates
for re-use as soon as the segment mapped over them gets unmapped from
the virtual address space.

Another change we made to make sequential output
better is "grow-ahead": Note that when you're writing a new file, the
pages that are mapped in do not correspond to real disk pages until
they are touched. Touching one of these pages is called a "growth
fault". The system now optionally grows the file by more than one page
on growth faults. The Stream I/O library takes care of truncating off
any extra pages when the stream is closed.

All in all, mapped I/O is a nice and useful idea, but anyone who thinks
that it will perform just like a traditional sequential I/O system withOUT
any special purpose features like the ones described above is whistling
in the dark. Also, some of the features don't always work like you'd
think/hope.

To deal with cache consistency, the file locking mechanism is tied into
the remote file caching mechanism. Although it is possible to bypass
the locking mechanism and map a file withOUT locking, this is strongly
discouraged (i.e. not a documented feature and not used by the Stream
I/O library). When a node locks a file, it contacts the home node of
the file and gets back the current date-time-modified (DTM) for the
file. It uses this value to determine whether any pages it already has
are still OK (i.e. whether it can avoid re-reading the pages from the
home node). When the using node unlocks the file, dirty pages are sent
back to the home node before the lock is released. If you use the
mechanisms as intended, you NEVER get bad (stale) data. We consider
this property a necessity.

Another thing to remember about the single-level store: Don't be
seduced, i.e. don't use it for things for which it is not intended.
Just because you can access some database file all over the network
doesn't mean that you should implement a DBMS this way. You probably
really want to use RPC. The problem with using a file system in the
"wrong" way is that the interface to file systems is just generally
not designed to deal with failure -- e.g. to let you know just how
much of your I/O succeeded when the network partitioned. Things can be
made even worse in a mapped I/O system since any random memory
reference can cause an exception to be raised. (At least when you make
"normal" filesystem calls, you generally get an error returned. You
can simulate this with mapped I/O, but not always cleanly and
faithfully.)

--
Nat Mishkin
Apollo Computer Inc.
-------
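The lock/DTM scheme described in the article above can be pictured with
a small toy model. This is only a sketch under stated assumptions, not
Apollo's code: the class and method names (HomeNode, UsingNode, lock,
unlock) are made up for illustration, the DTM is a plain counter rather
than a real timestamp, and the step where the writer remembers the new
DTM after unlocking is a guess that the article does not confirm.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HomeNode:
    """Hypothetical home node: owns the file's pages and its DTM."""
    pages: dict = field(default_factory=dict)  # page number -> bytes
    dtm: int = 0                               # stand-in for date-time-modified
    locked_by: Optional[str] = None

    def lock(self, node_name):
        # Grant the lock and hand back the current DTM.
        if self.locked_by is not None:
            raise RuntimeError("file is already locked")
        self.locked_by = node_name
        return self.dtm

    def read_page(self, n):
        return self.pages.get(n, b"")

    def unlock(self, node_name, dirty_pages):
        # Dirty pages are written back before the lock is released.
        if self.locked_by != node_name:
            raise RuntimeError("not the lock holder")
        if dirty_pages:
            self.pages.update(dirty_pages)
            self.dtm += 1
        self.locked_by = None
        return self.dtm


class UsingNode:
    """Hypothetical using node with a local page cache."""

    def __init__(self, name, home):
        self.name = name
        self.home = home
        self.cache = {}
        self.cached_dtm = None
        self.dirty = set()
        self.remote_reads = 0

    def lock(self):
        dtm = self.home.lock(self.name)
        if dtm != self.cached_dtm:
            # The file changed since we last held it: cached pages are stale.
            self.cache.clear()
            self.cached_dtm = dtm

    def read(self, n):
        if n not in self.cache:
            self.cache[n] = self.home.read_page(n)
            self.remote_reads += 1
        return self.cache[n]

    def write(self, n, data):
        self.cache[n] = data
        self.dirty.add(n)

    def unlock(self):
        dirty_pages = {n: self.cache[n] for n in self.dirty}
        # Assumption: the writer records the new DTM so its own cache stays valid.
        self.cached_dtm = self.home.unlock(self.name, dirty_pages)
        self.dirty.clear()
```

In this model a node that re-locks an unchanged file reads nothing from
the home node, while a node whose cached DTM is out of date throws its
pages away and re-reads them, which is the "never stale if you lock"
property the article insists on.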
darrell@sdcsvax.UUCP (03/27/87)
In article <2906@sdcsvax.UCSD.EDU> mishkin@apollo.UUCP (Nathaniel Mishkin) writes:
>Jim Rees's article (which may or may not have been posted yet) gives
>the outlines of how Apollo uses the single-level store idea to implement
>its distributed filesystem.
>
:
>now), we added "touch ahead": Mapped segments (32K regions of virtual
:
>we did do was make it so that segments can be marked "flush-behind";
:
>better is "grow-ahead": Note that when you're writing a new file, the
:
>All in all, mapped I/O is a nice and useful idea, but anyone who thinks
>that it will perform just like a traditional sequential I/O system withOUT
>any special purpose features like the ones described above is whistling
>in the dark. Also, some of the features don't always work like you'd
>think/hope.
:

Thank you for writing such an informative summary. These are exactly
the features that are needed. The third, new-file "grow-ahead", is
something that is missing, but is sorely needed, in VSOS. VSOS actually
pages in new untouched pages from disk. Your article gave me hope: not
all the lessons of the past have been forgotten (or, for the more
emotional: MULTICS and TSS live! :-)

Two more questions:

1) Does Aegis have an advise function that allows a process to give
   advance information to the system about which pages are needed?
   VSOS has this, and it turns out to be very useful. Example: A
   "large" fluid dynamics code which doesn't fit in memory (we only
   have 64 MBytes :-) ) is structured so that it touches memory
   semi-sequentially. The programmer calls ADVISE periodically to let
   the system know which pages will be needed next. The resulting
   system call sets PAGER going, and the process continues to run.
   When the process gets to that memory, it is fortuitously already
   there. This can result in unbelievable performance improvements in
   some cases.

2) How much of this is available in Apollo's Un*x environment? It
   would be nice if Aegis provided a "real" BSD Unix environment,
   but still had the "advanced" facilities available as system
   services.

  Hugh LaMaster, m/s 233-9,     UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035       ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117         ARPA lamaster@pioneer.arc.nasa.gov

"In order to promise genuine progress, the acronym RISC should stand
for REGULAR (not reduced) instruction set computer." - Wirth

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")
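The ADVISE pattern in question 1 can be illustrated with a toy pager.
This is a rough sketch, not the VSOS interface: AdvisedPager, advise,
touch and sweep are hypothetical names, the prefetch is assumed to
finish before the process reaches the advised pages, and memory is
assumed large enough that nothing is ever paged out again.

```python
class AdvisedPager:
    """Toy pager that counts only the faults a process has to wait for."""

    def __init__(self):
        self.resident = set()
        self.blocking_faults = 0

    def advise(self, pages):
        # Hypothetical ADVISE: start bringing pages in; the caller does not wait.
        # Assumption: the transfer completes before the pages are touched.
        self.resident.update(pages)

    def touch(self, page):
        # A touch of a non-resident page is a fault the process must wait for.
        if page not in self.resident:
            self.blocking_faults += 1
            self.resident.add(page)


def sweep(pager, n_pages, window=0):
    """Touch pages 0..n_pages-1 in order, advising one window ahead."""
    for p in range(n_pages):
        if window and p % window == 0:
            # Tell the pager about the *next* window, then keep computing.
            pager.advise(range(p + window, min(p + 2 * window, n_pages)))
        pager.touch(p)
    return pager.blocking_faults
```

With no advice, a 64-page sweep in this model waits on all 64 pages;
advising 8 pages ahead leaves only the first window of 8 as blocking
faults. Real gains depend on how much computation there is to overlap
with the paging, which the model does not attempt to capture.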
darrell@sdcsvax.UUCP (03/31/87)
> Two more questions:
>
> 1) Does Aegis have an advise function that allows a process to give
>    advance information to the system about which pages are needed?

There is an advice call, but only one that is related to the touchahead
and growahead features I mentioned in my previous article.

> 2) How much of this is available in Apollo's Un*x environment? It
>    would be nice if Aegis provided a "real" BSD Unix environment,
>    but still had the "advanced" facilities available as system
>    services.

Well, it's ALL available from the Unix environment. One might argue
that we could do a better job of making the interfaces to the features
more culturally compatible with Unix (e.g. omitted return parameters
that tell you more than "errno" and made the documentation more
opaque :), but that's really just noise.

Seriously though, I think we could do a better job of making it clear
that these are extensions available to all, not just some weirdo "Aegis
users". Of course, using the features might make your code less
portable, but sometimes you just have to decide how long coding with
the constraints that existed in 1970 minicomputers is appropriate.

--
Nat Mishkin
Apollo Computer Inc.
Chelmsford, MA
{wanginst,yale,mit-eddie}!apollo!mishkin
-------
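Since the advice call is tied to touch-ahead, a toy model of that
feature may help make the thread concrete. It is a sketch only: the
Segment class and its touch_ahead attribute are invented for
illustration, the 1K page size is an assumption (the thread gives only
the 32K segment size), and reading forward from the faulted page,
clipped at the segment boundary, is one plausible reading of "the
number of pages the pager should read in".

```python
SEGMENT_BYTES = 32 * 1024  # from the thread: segments are 32K regions
PAGE_BYTES = 1024          # assumption: the page size is not given in the thread
PAGES_PER_SEGMENT = SEGMENT_BYTES // PAGE_BYTES


class Segment:
    """Toy mapped segment marked with a touch-ahead count."""

    def __init__(self, touch_ahead=1):
        self.touch_ahead = touch_ahead
        self.resident = set()
        self.faults = 0

    def touch(self, page):
        if not 0 <= page < PAGES_PER_SEGMENT:
            raise IndexError("page outside segment")
        if page in self.resident:
            return
        # Page fault: read touch_ahead pages starting at the faulted one,
        # without running past the end of the segment.
        self.faults += 1
        last = min(page + self.touch_ahead, PAGES_PER_SEGMENT)
        self.resident.update(range(page, last))


def sequential_read_faults(touch_ahead):
    """Count faults for one front-to-back pass over a segment."""
    seg = Segment(touch_ahead)
    for page in range(PAGES_PER_SEGMENT):
        seg.touch(page)
    return seg.faults
```

Under these assumptions a sequential pass over one segment takes 32
faults with a touch-ahead of 1 and 4 faults with a touch-ahead of 8,
which is the "read-ahead of a sort" effect.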