gingell%opus@Sun.COM (Rob Gingell) (02/13/90)
In article <3556@rti.UUCP> trt@rti.UUCP (Thomas Truscott) writes:

>[someone wrote a replacement for "sum", called "fastsum" that uses mmap().]
>
>You are comparing your efficient "fastsum" that happens to use mmap()
>against a sluggardly "sum" that happens to use read().
>(Actually it uses getchar(), which calls _filbuf(),
>maybe _filbuf() uses mmap()?!)

As it happens, no.  This is always a potential change; however, we have not
made it because to date we have not found that stdio would benefit.  The
principal advantage would be to save buffer-copy time and memory loading,
but we haven't found a large population of programs where these factors are
dominant.  Perhaps it is because our stdio is otherwise so inefficient,
perhaps because the applications themselves are inherently not I/O
buffer-copy limited, or perhaps simply because those programs that were
already so limited long ago converted to direct read()/write() operations.

>The following would be a more appropriate test:
>Change your fastsum routine so that instead of mmap()ing
>a megabyte at a time, it does a read() of a megabyte at a time.
>Compare the mmap() and read() versions of this program.
>I suspect you will find they take about the same amount of time.

I don't think so.  At the very least, the read() version will be slower
than the mmap() version by the amount of time required to effect the copies
from kernel to program buffers.  And this assumes an "optimum" situation in
which the overhead of buffer management in the kernel does not become
significant -- which it will for a large amount of data.  It also ignores
the system effect of essentially doubling the memory load: the system must
hold both the original file pages and the pages used to buffer the copies
in the application.

>On a Sparcstation 1, try timing "cp" vs.
>the following program:
>
>	main()
>	{
>		char	bfr[8192];
>		register int	n;
>
>		while ((n = read(0, bfr, sizeof(bfr))) > 0)
>			write(1, bfr, n);
>	}
>
>I did "/bin/time cp /vmunix /tmp/x"
>and "/bin/time a.out < /vmunix > /tmp/x" several times.
>The results were essentially identical.
>(I did not experiment with buffer sizes, I suspect 16k would be faster.)

I'd be astonished if the results did not always show that access through
mmap() is faster (and they do for this program running on my 3/160).  To be
a valid experiment, you should be sure that both /vmunix and /tmp/x are
completely flushed from memory after each test run -- otherwise the
system's buffering of the two files will skew the results.  I've never
observed a proper experiment in which mmap() was not faster, though the
difference is not always large.

>There is no inherent reason that read() should be slower
>than mmap() for sequential I/O, since read() is doing precisely
>what is wanted.  Indeed read() should be faster since
>it is conceptually simpler.

Not true.  read() operates by mapping the file and copying it.  And, due to
limitations in the address space available inside the kernel, read() must
often perform more, smaller "mmap()-like" chunk operations than a single
application mmap() could use, using even more CPU time in the process.

>Note that read() can be implemented with memory mapping, in some cases:
>it could map the address of "bfr" to a copy-on-modify kernel page.

This is also not true, though it is a common belief and one that arose
repeatedly during development.  read() gives you a copy of the file data at
the time that the call is executed.  That copy is immutable save for any
action performed by your program.  If read() were implemented *as* mmap(),
then while it is possible to deal with side effects introduced in *your*
machine, it is not, in general, possible to deal with side effects
introduced by other machines -- such as file modifications performed by
DOS PCs living on your network.
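For reference, the mmap() side of the timing experiment above looks roughly
like the following.  This is a sketch in today's POSIX spelling (the SunOS 4
mmap() differed in detail), and `copy_via_mmap` is a name invented here, not
SunOS's cp:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy file `src` to descriptor `out` by mapping the source and
 * writing straight from the mapped pages -- no intermediate
 * kernel-to-user buffer copy, which is the point being argued. */
int copy_via_mmap(const char *src, int out)
{
    int fd = open(src, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }

    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    /* One write() directly from the mapped file pages. */
    ssize_t n = write(out, p, (size_t)st.st_size);

    munmap(p, (size_t)st.st_size);
    close(fd);
    return n == (ssize_t)st.st_size ? 0 : -1;
}
```

Compare this with the read()/write() loop quoted above: both move the same
bytes, but the read() version pays for one extra copy per buffer-full.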
It might be possible to make such an assumption save for heterogeneous
environments.  However, it should be noted that neither MULTICS nor
TENEX/TOPS-20 (the latter being the more direct parent of mmap(), with
MULTICS as a more remote ancestor) attempted such an optimization either.

>As others have pointed out, read() and write() are generally useful
>on streams, and mmap() is not.
>(The SunOS "cp" command falls back to read/write if mmap() fails.
>But since read/write is as fast as mmap(),
>why bother with mmap() in the first place?!)
>
>So what is mmap() good for?  Plenty.
>But it is NOT a replacement for read/write.

Nor is it advertised as such.  Though Mr. Truscott has not done so, those
deprecating mmap() for not being "device independent" or lacking other
attributes of read()/write() miss the point -- which was never that mmap()
replace read() or write(), or otherwise represent some "grail" in the
search for computing enlightenment.  Rather, it was to provide an
abstraction of operations in which the system was already engaged (namely
file buffering and physical store multiplexing) in a way that is accessible
to applications and can increase their flexibility.

A good test of the sufficiency of such an abstraction is whether it is
capable of becoming a primitive with which you can replace older and varied
implementations within a common framework -- and in this we believe mmap()
to have been a success.  We also believe it to be an effective abstraction
for those requiring its properties.  But neither do we believe that
everyone does, for mmap() is certainly a "lower-level" abstraction than
read()/write(): a primitive out of which the latter can be constructed on
memory objects, in the same way device drivers provide a primitive for
transfer operations.  Because mmap() is *more* primitive than
read()/write(), it can be (as Dennis Ritchie points out) more cumbersome to
use than the equivalent sequence of read()s or write()s -- but so would
access to raw devices.
If you find yourself programming around it, that's probably an indication
that operating at this level of the system isn't suitable for your needs,
and you should use the higher abstractions.  That the system supplies an
abstraction unsuitable for your use does not make it any less effective for
others, nor any less effective for the system itself to use in implementing
the abstractions that *are* appropriate for you.  It's been my experience
that most frustrations with memory mapping techniques in MULTICS,
TENEX/TOPS-20, and now with mmap() have come from the expectation that
mmap() was somehow a higher-level operation than it really is.
henry@utzoo.uucp (Henry Spencer) (02/13/90)
In article <131682@sun.Eng.Sun.COM> gingell@sun.UUCP (Rob Gingell) writes:
>... At the very least, the read() version will be slower
>than the mmap() version by the amount of time required to effect the
>copies from kernel to program buffers...

Assuming your MMU can do copy-on-write, why copy?

>... read() gives you a copy of the file data
>at the time that the call is executed.  That copy is immutable save any
>action performed by your program.  If read() were implemented *as* mmap(),
>then while it is possible to deal with side effects introduced in *your*
>machine, it is not, in general, possible to deal with side effects introduced
>in other machines...

Why are other machines relevant?  Can they reach into your machine and mess
with memory?  Or are you assuming an implementation where other machines
are not told "I'm using this data, and want to be told before anyone else
starts to change it"?

Clearly, implementing read with copy-on-write mapping requires a proper
implementation of copy-on-write, in which *any* attempt to mess with the
data triggers a copy operation.  If some defective file system, call it NFS
(name picked purely at random :-)), is not capable of supporting
copy-on-write, then you can't do this optimization when the file is on the
other end of NFS.  If, on the other hand, you have a real file system, it's
not a problem.  Concurrent access to files is actually quite rare except
for directories and certain system files; a copy-on-write read would almost
never need to copy.
-- 
"The N in NFS stands for Not, | Henry Spencer at U of Toronto Zoology
 or Need, or perhaps Nightmare"| uunet!attcan!utzoo!henry henry@zoo.toronto.edu
mrc@Tomobiki-Cho.CAC.Washington.EDU (Mark Crispin) (02/13/90)
In article <1990Feb13.003010.23356@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <131682@sun.Eng.Sun.COM> gingell@sun.UUCP (Rob Gingell) writes:
>>... At the very least, the read() version will be slower
>>than the mmap() version by the amount of time required to effect the
>>copies from kernel to program buffers...
>
>Assuming your MMU can do copy-on-write, why copy?

I don't see anything that says that read() has to be on a page or even a
word boundary.  I haven't seen a description of mmap(), but usually
file/memory mapping is done in whole pages on page boundaries, so the
swapper can swap the pages in/out directly from/to the file.
-- 
Mark Crispin / 6158 Lariat Loop NE / Bainbridge Island, WA  USA 98110-2098
+1 (206) 842-2385 / mrc@CAC.Washington.EDU / Atheist & Proud / R90/6 pilot
ccplumb@lion.waterloo.edu (Colin Plumb) (02/13/90)
In article <1990Feb13.003010.23356@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <131682@sun.Eng.Sun.COM> gingell@sun.UUCP (Rob Gingell) writes:
>>... At the very least, the read() version will be slower
>>than the mmap() version by the amount of time required to effect the
>>copies from kernel to program buffers...
>
>Assuming your MMU can do copy-on-write, why copy?

Because the odds that the user's buffers are properly aligned are not very
good.  If read() allocated a buffer and returned a pointer to that, they'd
be pretty much the same call.

One thing Mach is showing is that Unix was *not* designed with paging in
mind, and that you can do all sorts of neat tricks if you have it.
-- 
	-Colin
henry@utzoo.uucp (Henry Spencer) (02/14/90)
In article <20837@watdragon.waterloo.edu> ccplumb@lion.waterloo.edu (Colin Plumb) writes:
>>[read()] Assuming your MMU can do copy-on-write, why copy?
>
>Because the odds that the user's buffers are properly aligned are not very
>good.  If read() allocated a buffer and returned a pointer to that, they'd
>be pretty much the same call.

The odds are very high that if the user is pulling in large chunks of data,
either he's malloced the buffer or it's a static variable.  Either way,
it's no big trick, given a bit of cooperation from library and compiler, to
ensure that any large buffer is page-aligned.  Sure, once in a while you'll
have to copy; not often.

More generally, there is *NO REASON* why the worst-case code and the
typical-case code have to be one and the same.  Usually it is a massive
performance win to optimize the bejesus out of the typical case and let the
worst case just muddle along as best it can.  Once in a long while you run
into a real application which triggers the worst case, and it has to be
fixed (usually in some trivial way) for acceptable performance.

For example, read() clearly has to be able to handle reads into buffers at
any byte alignment.  Back when Unix ran mostly on the pdp11, very few
people ever realized what a colossal performance hit pdp11 Unix took if the
buffer was at an odd address.  I suspect it's fairly significant even on
most modern machines.  Nobody cares, because it's very rare for a program
ever to do that.  I found out about it because one of the early Boyer-Moore
egreps did odd-aligned reads, and was actually slower than the old egrep on
the 11.  Twenty lines of code fixed it.
-- 
"The N in NFS stands for Not, | Henry Spencer at U of Toronto Zoology
 or Need, or perhaps Nightmare"| uunet!attcan!utzoo!henry henry@zoo.toronto.edu
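The "bit of cooperation from the library" Henry mentions amounts to very
little code.  A sketch, using posix_memalign() -- a later standard's
spelling of the idea; valloc() was the contemporary one, and the function
name here is invented:

```c
#include <stdlib.h>
#include <unistd.h>

/* Return a buffer of `size` bytes starting on a page boundary, so a
 * copy-on-write or page-flipping read() implementation could remap
 * the pages instead of copying into them. */
void *alloc_page_aligned(size_t size)
{
    void *buf = NULL;
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    if (posix_memalign(&buf, pg, size) != 0)
        return NULL;
    return buf;
}
```

A library could route all sufficiently large allocations through something
like this, making the page-aligned case the typical one, exactly as the
post argues.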
gingell%opus@Sun.COM (Rob Gingell) (02/14/90)
In article <1990Feb13.003010.23356@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <131682@sun.Eng.Sun.COM> gingell@sun.UUCP (Rob Gingell) writes:
>>... At the very least, the read() version will be slower
>>than the mmap() version by the amount of time required to effect the
>>copies from kernel to program buffers...
>
>Assuming your MMU can do copy-on-write, why copy?

See below...

>>... read() gives you a copy of the file data
>>at the time that the call is executed.  That copy is immutable save any
>>action performed by your program.  If read() were implemented *as* mmap(),
>>then while it is possible to deal with side effects introduced in *your*
>>machine, it is not, in general, possible to deal with side effects introduced
>>in other machines...
>
>Why are other machines relevant?  Can they reach into your machine and
>mess with memory?  Or are you assuming an implementation where other
>machines are not told "I'm using this data, and want to be told before
>anyone else starts to change it"?

I'm assuming an environment in which it is not possible to tell such
things.  It is, of course, possible to assume other environments.  TOPS-20,
for instance, assumed that all machines sharing a mapping to a file were
TOPS-20, and therefore had all the TOPS-20 semantics about sharing,
copy-on-write, etc.  TOPS-20 had other convenient properties, such as a
page-based file system, that helped simplify the problem and provide a
reasonably powerful and "pure" environment.  However, such assumptions made
it impossible for other systems to share access to the files involved --
which, in my experience anyway, was a real and important shortcoming.

>Clearly, implementing read with copy-on-write mapping requires a proper
>implementation of copy-on-write, in which *any* attempt to mess with the
>data triggers a copy operation.

Yes.  It also assumes those few cases in which the data is so conveniently
aligned as to make this optimization possible.
One has to question whether the complexity is worth it for something as
semantically simple as the read() call.  It also excludes every system
incapable of cooperating with the somewhat rigorous semantic requirements:
PCs, most IBM operating systems running on any hardware -- in short, the
vast majority of hardware in the world.

>If some defective file system, call it
>NFS (name picked purely at random :-)), is not capable of supporting
>copy-on-write, then you can't do this optimization when the file is on
>the other end of NFS.

Gee, you still got any axe left after all these years of grinding? :-)

The argument has nothing to do with NFS.  It has to do with heterogeneity.
It may be that accommodating heterogeneous hardware and software
environments is a complexity that some can live without.  Others have not
been so fortunate, and must instead deal with a melting-pot reality.  You
can certainly design everything around simplifying assumptions, but at some
point you have to wonder whether the assumption set has excluded most of
the real world.

As it happens, the viewpoint I'm describing predates any personal
familiarity with NFS, and was derived from experiences in trying to mix the
relative "purity" of the TOPS-20 assumption set with real installations
where TOPS-20 (or any system's) exclusivity was not possible.  And,
independent of any particular notion of exclusivity, it's also a
recognition that transparent cache coherence is not always possible given
heterogeneous hardware and interconnects.  In the general environment it
may not even be desirable, given the cost of maintaining the coherence.
While this is most evident today in systems involving networks such as CI
busses, Ethernet, FDDI, etc., it seems increasingly likely that we will
have to build architectures around a variety of assumptions involving
weakly-ordered operations as we deal with more and more independent caches
and buffers.
It is these experiences that have driven much of the semantic definition,
and less so the existence of NFS.  That NFS also deals with heterogeneity,
and that the mutual assumption sets mesh well, is either serendipity or
congruence.

>If, on the other hand, you have a real file system,
>it's not a problem.

This of course assumes we have a definition of a "real" file system.  I
guess NFS isn't "real" because it isn't "UNIX", and you're assuming that
"real" == "UNIX".  But NFS isn't a UNIX file system over networks; it's a
network file system of which UNIX can be a client.  Yes, it looks a lot
like UNIX, but that's its heritage -- most people setting out to design
something new carry most of their experience into it.  Just as UNIX can be
an NFS client, so can MVS, PC-DOS, VMS, and a variety of other non-UNIX
systems -- and even if NFS (or whatever) supported the "copy-on-write"
semantics required, such systems would still be incapable of responding to
them.
tihor@acf4.NYU.EDU (Stephen Tihor) (02/14/90)
Whoa.  At this point, heterogeneity of NFS-supported OSes counts as a
significant design goal [at least by reasonable metrics].  I thought MS-DOS
compatibility (at least) was a day-one goal, but someone closer can
probably correct that.
dworkin@salgado.Solbourne.COM (Dieter Muller) (02/14/90)
In article <131682@sun.Eng.Sun.COM> gingell@sun.UUCP (Rob Gingell) writes:
>... At the very least, the read() version will be slower
>than the mmap() version by the amount of time required to effect the
>copies from kernel to program buffers...

A few days ago, I posted the results of sum versus fastsum (which used
mmap).  Someone rightly pointed out that the former was going through lots
of grotty stdio code.  Well, I just took fastsum and (effectively) replaced
the mmap calls with read calls.

salgado {47} time fastsum /image/os/4.0C/upgrade/USR.tar
36.1u 11.8s 0:53 89% 0+640k 2+0io 3614pf+0w
salgado {48} time ./gerbil /image/os/4.0C/upgrade/USR.tar
35.2u 14.0s 1:18 63% 0+1136k 3601+0io 3645pf+0w

User time is a little less, but system time is significantly greater.  I
suspect most of the difference is in copying the data around in the kernel.
The data buffer, btw, was declared global and static, so alignment should
be the best you can expect w/o hand-tuning things.  The data file and
general conditions were the same as in my previous posting.

	Dworkin
-- 
Martha, the platypus is into the rutabagas again!
boulder!stan!dworkin  dworkin%stan@boulder.colorado.edu  dworkin@solbourne.com
Flamer's Hotline: (303) 678-4624 (1000 - 1800 Mountain Time)
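For concreteness, the two inner loops being timed here look roughly like
this.  The actual fastsum/gerbil sources were not posted, so these are
sketches in modern POSIX spelling with invented function names; only the
read-vs-map structure reflects the description above:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* The gerbil approach: read() into a large static (hence likely
 * well-aligned) buffer and sum the bytes. */
unsigned sum_via_read(int fd)
{
    static char bfr[1 << 20];           /* one megabyte at a time */
    unsigned sum = 0;
    ssize_t n;
    while ((n = read(fd, bfr, sizeof bfr)) > 0)
        for (ssize_t i = 0; i < n; i++)
            sum += (unsigned char)bfr[i];
    return sum;
}

/* The fastsum approach: map the whole file and walk the pages
 * directly, with no kernel-to-user copy. */
unsigned sum_via_mmap(int fd)
{
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0)
        return 0;
    unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 0;
    unsigned sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];
    munmap(p, (size_t)st.st_size);
    return sum;
}
```

The user-time loops are nearly identical; the system-time difference
Dworkin measured comes from the read() version's extra copy into `bfr`.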