rbk@sequent.UUCP (Bob Beck) (06/24/87)
Sequent's version of UNIX (Dynix) implements a version of mmap(), with some of the "blanks" from the original 4.2bsd system description filled in. This supports paged shared memory (nice for parallel programming ;-), and can support SystemV shared memory via library code (libc).

If more than one process maps overlapping portions of a file for PROT_RDWR and MAP_SHARED, you get shared memory. MAP_PRIVATE ==> local mods to the data. There are no address-space alignment restrictions other than page boundaries. Mapped files are "coherent" with normal reads and writes, so "cp mapped-file somewhere" will get the latest data. Mapped address space is dumped into core files, providing a "snapshot" of the data when/if the program dies.

In the upcoming release of Dynix (v3.0), "shared text" is finally gone and is done as a sub-case of mapped files. Also, mmap() can map over any previously existing address space (or it can populate new address space) -- the last thing mapped is what's there. munmap() makes address space go away (i.e., creates a "hole"). mmap() also supports device drivers providing physical or paged maps for various "custom" reasons.

If anyone wants more details, let me know.

Bob Beck
Sequent Computer Systems
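The shared-memory case described above (two processes mapping the same part of a file with MAP_SHARED) can be sketched roughly as follows. This is a minimal illustration written against the later POSIX interface, not Dynix code: it uses PROT_READ | PROT_WRITE where the post says PROT_RDWR, and the function name and scratch-file path are made up for the example.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: parent and child each map the same file
 * MAP_SHARED; returns the value the parent sees after the child has
 * stored 42 through its own mapping, or -1 on error. */
int shared_mapping_demo(void)
{
    char path[] = "/tmp/mmap_shared_XXXXXX";  /* arbitrary scratch file */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                    /* file persists while fd is open */

    if (ftruncate(fd, sizeof(int)) == -1) {   /* room for one int */
        close(fd);
        return -1;
    }

    int *parent_view = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (parent_view == MAP_FAILED) {
        close(fd);
        return -1;
    }
    *parent_view = 0;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(parent_view, sizeof(int));
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: establish a second, independent mapping of the file. */
        int *child_view = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
        if (child_view == MAP_FAILED)
            _exit(1);
        *child_view = 42;            /* visible through every shared view */
        _exit(0);
    }

    waitpid(pid, NULL, 0);           /* let the child finish its store */
    int seen = *parent_view;         /* read through the parent's view */
    munmap(parent_view, sizeof(int));
    close(fd);
    return seen;
}
```

Calling shared_mapping_demo() from a small main() and printing the result is enough to try it. With MAP_PRIVATE in place of MAP_SHARED in the child, the store would stay local to the child, which is the "local mods" case from the post.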
throopw@xyzzy.UUCP (Wayne A. Throop) (06/27/87)
> chris@mimsy.UUCP (Chris Torek)
>> rwhite@nu3b2.UUCP (Robert C. White Jr.)

>>OK, so where does the file come from?? Is it in a special disk
>>partition, and loaded at system boot time?
> The file comes from the file system, of course.

Indeed. Quite like the text area of an executable file, it is paged directly from the filesystem. Unlike the text area of an executable file, it may have to have pages written to it from memory that has been modified. An interesting question is whether sync() should force this to happen, or whether some other kernel hint should be used.

Chris's summary is of course excellent:

>>As I said... What's the point?
> There are several:
> - The new kernel runs faster (for some selected set of benchmarks);
> - The new kernel code is simpler;
> - Mapped file semantics are more convenient for some programs;
> - Mapped files provide shared memory.

... but I thought I'd expand on the "mapped files provide shared memory" point a little. In fact, they provide it in a much cleaner and more convenient form than the shm calls from SysV.

To touch on only one point, the very first problem one runs into when actually using SysV shared memory is how to allocate shmids and communicate them to all the cooperating processes. Usually, some scheme with a central administrator process and some fifos or whatnot has to be kludged together.

On the other hand, with the file system providing the "shmids", which are just files (or inodes if you will), there doesn't need to be any complication. Just create a file where it will do the most good, in a place in the file system that has the protections you want, and the overflow checking you want, and all the other nice things the file system already provides. It even gives you a shared memory segment that persists longer than the machine stays up, without having to utter any extraneous incantations in the /etc/rc or equivalent.

All in all, mmap-like shared memory and file-to-memory mapping is far superior, far more flexible, and could be far better integrated into the rest of unix than is SysV shmem.

--
Adam and Eve had many advantages, but the principal one was, that they
escaped teething.
        --- Pudd'nhead Wilson's Calendar (Mark Twain)
--
Wayne Throop <the-known-world>!mcnc!rti!xyzzy!throopw
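The point about files serving as the "shmids" can be illustrated with a small sketch in which the pathname is the only rendezvous the cooperating programs need, and the data outlives the process. It assumes the later POSIX mmap() interface rather than any system discussed in the thread, and the helper name is hypothetical.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: increment a counter stored in the file at
 * `path` and return the new value, or -1 on error. Any process that
 * can open the file can share the counter; file permissions decide
 * who that is. */
long bump_persistent_counter(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return -1;

    /* A newly created file is extended with zero bytes, so the
     * counter starts at 0; an existing file keeps its contents. */
    if (ftruncate(fd, sizeof(long)) == -1) {
        close(fd);
        return -1;
    }

    long *counter = mmap(NULL, sizeof(long), PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping stays valid */
    if (counter == MAP_FAILED)
        return -1;

    long value = ++*counter;          /* plain store into the file */
    msync(counter, sizeof(long), MS_SYNC);   /* push it to the file now */
    munmap(counter, sizeof(long));
    return value;
}
```

Running the helper twice on the same path returns 1 and then 2, across separate runs of the program as well as within one. The sketch does no locking; concurrent callers would need some synchronization on top of it.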
Kemp@dockmaster.ncsc.mil (02/07/90)
Doug Gwyn writes:
> Larry McVoy writes:
>> Anyway, it's tough to do this otherwise. Protection is implemented
>> via the MMU. I can't think of any other reasonable (performance)
>> way to do it.
>
> I appreciate that, but you do see the point, I hope.
> A user-mode application has no reasonable handle on the notion of
> "page alignment"; . . .
>
> . . . lacking a system-provided function like
>     void *PageAlign( void *base_pointer, size_t extent );
> . . . there is no way I can see to use mmap() portably even among
> systems on which it exists.

I fail to understand why mmap(2) can't be used portably.

    caddr_t mmap( caddr_t addr, int len, int prot, int flags,
                  int fd, off_t off );

  " mmap() establishes a mapping between the process's address space
    at an address paddr for len bytes to the memory object represented
    by fd at off for len bytes. The value of paddr is an implementation
    dependent function of the parameter addr and the values of flags,
    . . . A successful mmap() returns paddr as its result."

In other words, a user mode application should have *no* handle at all on the notion of "page alignment". It should just regard the value returned by mmap as a pointer to memory that is valid for the particular hardware on which it is running. In fact, page alignment alone may not be a sufficient condition for validity.

What's non-portable about that?

Dave Kemp <Kemp@dockmaster.ncsc.mil>
gwyn@smoke.BRL.MIL (Doug Gwyn) (02/07/90)
In article <22368@adm.BRL.MIL> Kemp@dockmaster.ncsc.mil writes:
> caddr_t mmap( caddr_t addr, int len, int prot, int flags,
>               int fd, off_t off );
> " mmap() establishes a mapping between the process's address space
> at an address paddr for len bytes to the memory object represented
> by fd at off for len bytes. The value of paddr is an implementation
> dependent function of the parameter addr and the values of flags,
> . . . A successful mmap() returns paddr as its result."
>What's non-portable about that?

Gee, suppose I need N bytes mapped. I cannot just use N for the `len' argument. What should I use? Presumably N+{page_size}-1 would suffice, assuming of course that my application has arranged for addr to point to that much valid storage. However, {page_size} is not generally available to me. Some *non-POSIX conformant* systems seem to provide a way to find out via sysconf(PAGESIZE). (PAGESIZE should not be defined in <unistd.h>.)

SVID3 also seems to say that using an addr of 0 will result in a random location in the middle of my process being used, which would surely be a horrible design botch; one hopes it means that the necessary storage will be allocated from the process's heap, in which case that would be the best way to use this function.

Note also that SVID3 says that -1 is returned on error. I have no idea what that means, as -1 is not a caddr_t. You may recall that we discuss this sort of botch (as with sbrk()) every so often, and yet here these idiots go and make the same mistake yet again. The proper error return should have been a null pointer.
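For readers trying the page-size arithmetic discussed above on a current system: later POSIX editions did standardize a query, spelled sysconf(_SC_PAGESIZE), and gave the -1 error return the name MAP_FAILED. The sketch below assumes that later interface, which postdates this post, and the helper name is invented for illustration.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: round n up to a whole number of pages.
 * Returns 0 if the page size cannot be determined (or if n is 0). */
size_t round_up_to_page(size_t n)
{
    long page = sysconf(_SC_PAGESIZE);   /* later POSIX spelling */
    if (page <= 0)
        return 0;

    size_t p = (size_t)page;
    return (n + p - 1) / p * p;          /* smallest multiple of p >= n */
}
```

For example, round_up_to_page(1) yields one full page and round_up_to_page(page + 1) yields two. Whether an application needs this rounding at all for the `len' argument is exactly what the rest of the thread argues about.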
lwa@osf.org (Larry Allen) (02/08/90)
If you want to map "len" bytes using mmap, you ask to map "len" bytes. If mmap is successful, it guarantees to return a pointer to an address a such that addresses a through a+len-1 are valid in your address space. Addresses beyond this (say, up to the next page boundary) may also be valid, but you can't depend on that.

In any case, I'm not sure what your point about non-Posix conformant systems is. An application using mmap is not Posix conformant. If Posix is ever extended to include mmap, presumably it will also include an interface to get the page size.

Now, I think there *are* a couple of things about mmap to which you could have raised valid objections (what happens on a machine with multiple page sizes, such as the ETA-10? what are the semantics for memory sharing on a multiprocessor machine? how do accesses to a file via mapping and accesses to a file via the file system interact?) But I don't see the page size dependency as being a big problem.

-Larry Allen
Open Software Foundation
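The "ask for len bytes, use a through a+len-1" rule above can be tried with a length that is deliberately not a multiple of any page size. This is a sketch against the later POSIX interface; the helper name and scratch-file path are assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map exactly len bytes of a scratch file, store
 * into every byte a[0] .. a[len-1], and return 0 on success or -1 on
 * failure. The caller never looks at the page size. */
int map_exact_length(size_t len)
{
    char path[] = "/tmp/mmap_len_XXXXXX";    /* arbitrary scratch file */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    if (ftruncate(fd, (off_t)len) == -1) {   /* file is len bytes long */
        close(fd);
        return -1;
    }

    /* addr is NULL: the system picks the address. */
    char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (a == MAP_FAILED)                     /* also covers len == 0 */
        return -1;

    memset(a, 'x', len);                     /* touch a[0] .. a[len-1] */
    int ok = (a[0] == 'x' && a[len - 1] == 'x');

    munmap(a, len);
    return ok ? 0 : -1;
}
```

map_exact_length(1000) and map_exact_length(4097) both go through the same path; the code stays inside the len bytes it asked for and does not rely on anything beyond them.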
gwyn@smoke.BRL.MIL (Doug Gwyn) (02/08/90)
In article <3399@paperboy.OSF.ORG> lwa@osf.org (Larry Allen) writes:
>If you want to map "len" bytes using mmap, you ask to
>map "len" bytes. If mmap is successful, it guarantees
>to return a pointer to an address a such that addresses
>a through a+len-1 are valid in your address space.

That doesn't mean they're safe to use! They could overlay program variables. That is why I wanted a function to help determine page alignment, so I could safely set up an array to use.

>In any case, I'm not sure what your point about non-Posix
>conformant systems is. An application using mmap is not
>Posix conformant.

It is the implementation that is not POSIX conformant, due to sticking PAGESIZE gratuitously in <unistd.h> (according to SVID3).
ndjc@hobbit.UUCP (Nick Crossley) (02/09/90)
In article <12087@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>Gee, suppose I need N bytes mapped. I cannot just use N for the `len'
>argument.

SVID Edition 3 is very clear that 'the parameter len need not meet a size or alignment constraint'. Any partial page at the end is zero-filled, and is not written back out to the file if modified.

>SVID3 also seems to say that using an addr of 0 will result
>in a random location in the middle of my process being used, which would
>surely be a horrible design botch; one hopes it means that the necessary
>storage will be allocated from the process's heap, in which case that
>would be the best way to use this function.

Again, SVID 3 is clear: 'When the system selects <an address>, it will never place a mapping at address zero, nor replace any extant mapping, nor map into areas considered part of the potential stack or data segments.'

I take this to mean that the default mmap virtual addresses must be far distant from both stack and sbrk/malloc addresses. This is not guaranteed, and is frequently not true, for shmat, which makes it difficult to use shared memory and malloc extensively in the same program. I would expect most systems to use a similar choice of mappings for mmap and shmat, so in practice, that problem may also disappear.

--
<<< standard disclaimers >>>
Nick Crossley, ICL NA, 9801 Muirlands, Irvine, CA 92718-2521, USA 714-458-7282
uunet!ccicpg!ndjc / ndjc@ccicpg.UUCP
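The partial-page behaviour quoted from SVID 3 above (zero fill past the end of the file, and no write-back of changes made there) also appears in the later POSIX description of mmap(), so it can be checked on a current system. The sketch below assumes that POSIX interface; the helper name and the 10-byte scratch file are made up for the example.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the single page holding a 10-byte file.
 * Returns 1 if the rest of the page reads as zero and a store past
 * the end leaves the file 10 bytes long, 0 if not, -1 on error. */
int tail_zero_fill_demo(void)
{
    char path[] = "/tmp/mmap_tail_XXXXXX";   /* arbitrary scratch file */
    long page = sysconf(_SC_PAGESIZE);
    struct stat sb;
    int ok = 1;

    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    if (page <= 10 || write(fd, "0123456789", 10) != 10) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    for (long i = 10; i < page; i++)   /* bytes past end of file */
        if (p[i] != 0)
            ok = 0;

    p[10] = 'X';                       /* modify the partial page */
    msync(p, (size_t)page, MS_SYNC);   /* ask for write-back */

    if (fstat(fd, &sb) == -1 || sb.st_size != 10)
        ok = 0;                        /* file must not have grown */

    munmap(p, (size_t)page);
    close(fd);
    return ok;
}
```

The mapping here is one whole page rather than 10 bytes so that every access stays inside the requested length; only the page containing the end of the file is touched.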
guy@auspex.auspex.com (Guy Harris) (02/10/90)
>Gee, suppose I need N bytes mapped. I cannot just use N for the `len'
>argument.

SVID89 says (and so does SunOS 4.x, whence both man page and implementation were derived):

    When MAP_FIXED is not set, the system uses addr as a hint in an
    implementation-defined manner to arrive at paddr. The paddr so
    chosen will be an area of the address space which the system deems
    suitable for a mapping of len bytes to the specified object.

Said area may be larger than "len", if that's what's necessary to be suitable for such a mapping, even if "N" isn't a multiple of a page size. The documentation should perhaps be clearer on this, but that *is* how it works....

>SVID3 also seems to say that using an addr of 0 will result
>in a random location in the middle of my process being used, which would
>surely be a horrible design botch; one hopes it means that the necessary
>storage will be allocated from the process's heap, in which case that
>would be the best way to use this function.

It picks a location which is, in most implementations, a large distance below the bottom of the area reserved for the stack. (I don't know what it does on the WE32K, or other machines where the stack grows upward; I would guess that it would place it a large distance above the top of the area reserved for the stack.) The way that area is reserved for the stack is by setting the stack limit to some value. The default stack size seems to be 2MB on Sun-3s and 8MB on Sun-4s; if you need a larger stack, you can set it to a larger value.

There are reasons why having the system choose the address is a win; one reason is that machines exist where alignment constraints other than page-alignment constraints affect cacheability of shared segments (Sun-3/2xx and Sun-4/2xx, for example).

>Note also that SVID3 says that -1 is returned on error. I have no idea
>what that means, as -1 is not a caddr_t. You may recall that we discuss
>this sort of botch (as with sbrk()) every so often, and yet here these
>idiots go and make the same mistake yet again. The proper error return
>should have been a null pointer.

I won't defend that decision as being right; it was probably done in SunOS 4.x for binary compatibility with the limited "mmap" in earlier releases, but unless source compatibility was important (and I don't know that source compatibility is provided for "mmap" at all) it could have been given a different trap number and done differently.
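To see the system-chosen address described above in action, one can map without MAP_FIXED and with addr 0, then check where the mapping landed relative to an ordinary program array. This is a sketch against the later POSIX interface, where the -1 error return is spelled MAP_FAILED; the function name and the program_data array are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Stands in for "program variables" that a mapping must not overlay. */
static char program_data[8192];

/* Hypothetical helper: let the system choose the address for a
 * 100-byte mapping. Returns 1 if the address is page-aligned and
 * disjoint from program_data, 0 if not, -1 on error. */
int system_chosen_address_demo(void)
{
    char path[] = "/tmp/mmap_addr_XXXXXX";   /* arbitrary scratch file */
    long page = sysconf(_SC_PAGESIZE);

    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    if (page <= 0 || ftruncate(fd, 100) == -1) {
        close(fd);
        return -1;
    }

    /* No MAP_FIXED and addr == NULL: the kernel picks paddr. */
    char *p = mmap(NULL, 100, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)            /* (void *)-1, not a null pointer */
        return -1;

    uintptr_t lo = (uintptr_t)p, hi = lo + 100;
    uintptr_t dlo = (uintptr_t)program_data;
    uintptr_t dhi = dlo + sizeof program_data;

    int aligned = (lo % (uintptr_t)page) == 0;
    int disjoint = (hi <= dlo) || (dhi <= lo);

    munmap(p, 100);
    return aligned && disjoint;
}
```

The application supplies neither an address nor a page size; both the placement and the alignment come back from the system, which is the arrangement argued for above.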
coleman@cam.nist.gov (Sean Sheridan Coleman X5672) (02/05/91)
Does anyone have any examples of mmap for Suns? I am especially interested in being able to open one file and copy it to another one. I would also like to see some examples that utilize the EXEC proto. I am also looking for examples that use madvise, mcntl, and mlock.

Thanks very much,
Sean Coleman
NIST
coleman@bldrdoc.gov
lm@slovax.Eng.Sun.COM (Larry McVoy) (02/05/91)
In article <6991@alpha.cam.nist.gov> coleman@cam.nist.gov (Sean Sheridan Coleman X5672) writes:
>Does anyone have any examples of mmap for sun's. I am especially
>interested in being able to open one file and copy it to
>another one. I also would like to see some examples that
>utilize the EXEC proto.

Don't have a quickie for EXEC. Check this out, just some gunk I'm playing with.

# This is a shell archive. Remove anything before this line, then
# unpack it by saving it in a file and typing "sh file". (Files
# unpacked will be owned by you and have default permissions.)
#
# This archive contains:
# mmapcp.c mmaplib.c

echo x - mmapcp.c
cat > "mmapcp.c" << '//E*O*F mmapcp.c//'
/*
 * a version of copy that maps in the data & writes it with mmap.
 *
 * Both sets of data are synced of memory. This is designed to move data
 * fairly quickly but still disturb the system as little as possible.
 *
 * @(#)mmapcp.c 1.2
 */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#define SYNCSIZ (256*1024)

main(ac, av)
    char **av;
{
    int i, bytes;
    int in, out;
    char *ibuf, *obuf, *ip, *op;

    if (ac != 3) {
        printf("usage: %s src dest\n", av[0]);
        exit(0);
    }
    if ((in = open(av[1], 0)) == -1) {
        perror(av[1]);
        exit(1);
    }
    if ((out = open(av[2], O_RDWR|O_CREAT, 0644)) == -1) {
        perror(av[2]);
        exit(1);
    }
    if (mmap_init(in, &ibuf, 0, -1, 0) == -1) {
        perror("mmap_init in");
        exit(1);
    }
    if (mmap_init(out, &obuf, 0, size(in), 1) == -1) {
        perror("mmap_init out");
        exit(1);
    }
    ip = ibuf;
    op = obuf;
    for (i = size(in); i > 0; i -= SYNCSIZ) {
        bytes = SYNCSIZ < i ? SYNCSIZ : i;
        bcopy(ip, op, bytes);
        mmap_flush(ip, bytes, 1);
        mmap_flush(op, bytes, 0); /* 1 takes longer */
        ip += bytes;
        op += bytes;
    }
    close(in);
    close(out);
    exit(0);
}
//E*O*F mmapcp.c//

echo x - mmaplib.c
cat > "mmaplib.c" << '//E*O*F mmaplib.c//'
/*
 * @(#)mmaplib.c 1.3
 */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

/*
 * Set things up to do I/O over the indicated range
 *
 * XXX - up to user to be sure that mmap is an OK thing to do (tapes).
 */
mmap_init(fd, basepp, off, bytes, writeable)
    caddr_t *basepp;
    off_t bytes;
{
    caddr_t base;
    int protbits;

    protbits = PROT_READ;
    if (writeable) {
        protbits |= PROT_WRITE;
    }
    if (!regfile(fd)) {
        return (-1);
    }
    if (bytes == (off_t)-1) {
        bytes = size(fd);
    }
    if (writeable && ftruncate(fd, off + bytes) == -1) {
        return (-1);
    }
    base = mmap((caddr_t)0, bytes, protbits, MAP_SHARED, fd, off);
    if (base == (caddr_t)-1) {
        return (-1);
    }
    madvise(base, bytes, MADV_SEQUENTIAL);
    *basepp = base;
    return (0);
}

/*
 * flush writes
 */
mmap_flush(addr, len, clean)
    caddr_t addr;
    off_t len;
{
    msync(addr, len, MS_ASYNC);
    if (clean)
        madvise(addr, len, MADV_DONTNEED);
}

static struct stat sb;
static lastfd = -1;

regfile(fd)
{
    if (lastfd != fd) {
        if (fstat(fd, &sb) == -1) {
            return (-1);
        }
        lastfd = fd;
    }
    return (S_ISREG(sb.st_mode));
}

size(fd)
{
    if (lastfd != fd) {
        if (fstat(fd, &sb) == -1) {
            return (-1);
        }
        lastfd = fd;
    }
    return (sb.st_size);
}
//E*O*F mmaplib.c//

echo Possible errors detected by \'wc\' [hopefully none]:
temp=/tmp/shar$$
trap "rm -f $temp; exit" 0 1 2 3 15
cat > $temp <<\!!!
      57     188    1124 mmapcp.c
      78     198    1239 mmaplib.c
     135     386    2363 total
!!!
wc mmapcp.c mmaplib.c | sed 's=[^ ]*/==' | diff -b $temp -
exit 0
---
Larry McVoy, Sun Microsystems (415) 336-7627 ...!sun!lm or lm@sun.com
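The archive above is pre-ANSI C written for SunOS and will not build unchanged with most current compilers. The same idea (map both files, copy in SYNCSIZ chunks, start write-back of each chunk as you go) might look roughly like this against the later POSIX interface. It is a sketch, not a port: the function name mmap_copy is invented, memcpy() stands in for bcopy(), and the madvise() hints of the original are left out because madvise() is not part of POSIX.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SYNCSIZ (256 * 1024)    /* same chunk size as the posted code */

/* Hypothetical helper: copy regular file src to dst through two
 * shared mappings. Returns 0 on success, -1 on failure. */
int mmap_copy(const char *src, const char *dst)
{
    struct stat sb;
    char *ibuf, *obuf;
    size_t size, off, bytes;
    int in, out, rc = -1;

    if ((in = open(src, O_RDONLY)) == -1)
        return -1;
    if ((out = open(dst, O_RDWR | O_CREAT | O_TRUNC, 0644)) == -1) {
        close(in);
        return -1;
    }
    if (fstat(in, &sb) == -1 || !S_ISREG(sb.st_mode))
        goto done;              /* only regular files, as in regfile() */
    size = (size_t)sb.st_size;
    if (ftruncate(out, sb.st_size) == -1)   /* size the output first */
        goto done;
    if (size == 0) {            /* mmap rejects a zero length */
        rc = 0;
        goto done;
    }

    ibuf = mmap(NULL, size, PROT_READ, MAP_SHARED, in, 0);
    if (ibuf == MAP_FAILED)
        goto done;
    obuf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, out, 0);
    if (obuf == MAP_FAILED) {
        munmap(ibuf, size);
        goto done;
    }

    for (off = 0; off < size; off += bytes) {
        bytes = size - off < SYNCSIZ ? size - off : SYNCSIZ;
        memcpy(obuf + off, ibuf + off, bytes);
        msync(obuf + off, bytes, MS_ASYNC);  /* start write-back early */
    }
    rc = (msync(obuf, size, MS_SYNC) == 0) ? 0 : -1;

    munmap(ibuf, size);
    munmap(obuf, size);
done:
    close(in);
    close(out);
    return rc;
}
```

A main() that calls mmap_copy(argv[1], argv[2]) gives the same command-line shape as mmapcp. Unlike the original, this version opens the destination with O_TRUNC and does a final synchronous msync() before reporting success; both are choices made for the sketch.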
cgy@cs.brown.edu (Curtis Yarvin) (03/26/91)
There is nothing in mmap(2) on Suns to indicate that I cannot mmap a socket. What will happen if I try? Will new pages be mapped, sequentially, into memory as the socket receives data?

Curtis

"I tried living in the real world
Instead of a shell
But I was bored before I even began."
        - The Smiths
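The thread as archived here does not answer this question for SunOS. One way to find out what a given system does is simply to try it. The sketch below does so on a UNIX-domain socket pair using the later POSIX interface. POSIX lists ENODEV for file types that mmap() does not support, and failure is the expected outcome on current systems, but that is an assumption about modern kernels, not a statement about the Suns of 1991. The function name is invented for the example.

```c
#include <errno.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: attempt to mmap one end of a socket pair.
 * Returns 0 if the mapping succeeded, otherwise the errno value
 * reported by mmap() (or by socketpair() if that failed first). */
int try_mmap_socket(void)
{
    int sv[2];
    int err = 0;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        return EINVAL;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return errno;

    void *p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, sv[0], 0);
    if (p == MAP_FAILED)
        err = errno;            /* record why the system refused */
    else
        munmap(p, (size_t)page);

    close(sv[0]);
    close(sv[1]);
    return err;
}
```

Printing strerror() of the returned value shows the reason the local system gives for refusing the mapping.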