stevo@jane.Jpl.Nasa.Gov (Steve Groom) (08/26/88)
I have a mostly academic question about how much transparency the system provides for an mmap'ed VME memory board. My situation involves a Sun 4/280 running SunOS Sys4-3.2, although this should be of interest to Unix folks in general. This is a long story, but there is a question and/or point for discussion at the end.

For those of you unfamiliar with VME-based Suns, the VME bus is not where the system memory resides. However, devices can be added to the VME bus and mmap'ed into kernel and/or user space.

(BEGINNING OF LONG STORY)

My memory board supports installation in both 16- and 32-bit VME data spaces. My original installation was as a 32-bit data device. My driver for the board worked fine, so I could mmap() the memory on the board into a user process and access it with no problem. I also provided a read()/write() interface to the memory, to treat it like a big file. All the read/write interface does is copyin/copyout between the user data area and the board memory, which was mmap'ed into the kernel. Again, no problem.

Until...

One day I noticed that the values written (using write(), not user-level mmap/bcopy) to the memory board were garbage. This was a new problem; it had been working fine. (I have another board on the same VME bus which can also access that memory, so I had a means of 'independent verification' of the problem.) I traced through the kernel into the kernel's copyin() routine, and on into the kernel's bcopy(). As expected, this bcopy() has all sorts of special cases to speed things up where possible (unrolled loops, etc.). The Sun4 (SPARC) CPU also supports doubleword (8-byte) load and store instructions, and these are used if the alignment is right. What happened to me was that I had added another variable to my user program, which changed the alignment of my data in user space to be on an 8-byte boundary, which caused the doubleword loads and stores to kick in.
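
To make the alignment effect concrete, here is a minimal sketch in C of how a copy routine might pick its access width. This is not Sun's bcopy(); the name copy_width and the all-or-nothing alignment rule are assumptions made for illustration only, and a real routine would typically copy a few leading bytes to reach alignment first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not Sun's code: return the widest access size in
 * bytes (8, 4, 2 or 1) that suits a copy of len bytes from src to dst.
 * Simplification: source, destination and length must all already be
 * multiples of the chosen width.
 */
static size_t copy_width(const void *dst, const void *src, size_t len)
{
    uintptr_t bits;

    assert(len > 0);  /* assumption: empty copies are handled by the caller */

    /* OR-ing the three values exposes the lowest set bit of any of them. */
    bits = (uintptr_t)dst | (uintptr_t)src | (uintptr_t)len;

    if ((bits & 7) == 0) return 8;  /* doubleword path (ldd/std on SPARC) */
    if ((bits & 3) == 0) return 4;  /* 32-bit word path */
    if ((bits & 1) == 0) return 2;  /* 16-bit halfword path */
    return 1;                       /* byte-at-a-time fallback */
}
```

With made-up addresses: a user buffer at 0x1004 copied to a board address of 0x2000 gets width 4, but if one extra variable pushes the buffer to 0x1008, the same call returns 8 and the doubleword path is taken.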
The symptom was that 0123456789ABCDEF was being written as 4567xxxxCDEFxxxx, where xxxx was whatever was there before. Obviously, the CPU or MMU is translating the std (store doubleword) instruction into some really weird VME bus cycles. Normal 8-, 16- and 32-bit writes to the memory work fine, indicating that the memory itself is OK. But the Sun hardware is trashing the std's and turning them into I don't know what (I don't have a logic analyzer handy). I don't know whether it is the CPU or the MMU that is causing the trouble, because I don't know the Sun4 CPU hardware too well, but I tend to think that since the CPU has only a 32-bit data bus, it must be the one breaking up the accesses. I don't know; it's not really relevant to the question I (eventually) ask below.

As a workaround, I configured the board as a 16-bit data device. Lo and behold, no more problem, but only because bcopy() took another route instead of using the ldd's and std's. I took a 50% hit on throughput, but at least I didn't have to do funny things with my buffer addresses in user space.

So, it seems we have a hardware bug. Sun told me that they knew of some other problems with the CPU, and suggested I get the latest rev. Problem is, we're not on contract, and that would cost a *minimum* of about 4k for a 30-day exchange. I had several long talks with various folks at Sun, trying to get them to stand behind their product, at least enough to fix the bug or send me some new PALs or something. I mean, after all, this isn't a flaw in manufacturing or a burn-out or something, it's a design flaw! Well, they didn't buy that. Someday we'll get the thing upgraded, whatever it does or doesn't cost. But for one thing, I'm not even certain (and neither is Sun) that the latest rev *does* fix the problem.

So here I am paying a 50% penalty for their bug, but at least it still works without the user process doing funny things. That is, I thought so until today.
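
For anyone trying to picture the corruption pattern described earlier (0123456789ABCDEF landing as 4567xxxxCDEFxxxx), here is a small C model of it. It reproduces only the observed result; the actual bus cycles are unknown, and the function name faulty_std_model is made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the observed symptom only, not of the real hardware behavior.
 * mem is 8 bytes of board memory; data is the 8 bytes the std should store.
 * Within each 4-byte word, data bytes 2-3 land at offsets 0-1, and
 * offsets 2-3 of memory keep whatever they held before.
 */
static void faulty_std_model(uint8_t mem[8], const uint8_t data[8])
{
    int w;

    assert(mem != NULL && data != NULL);

    for (w = 0; w < 8; w += 4) {
        mem[w + 0] = data[w + 2];
        mem[w + 1] = data[w + 3];
        /* mem[w + 2] and mem[w + 3] are deliberately left untouched */
    }
}
```

Feeding it the bytes 01 23 45 67 89 AB CD EF over memory holding some old value gives 45 67 (old) (old) CD EF (old) (old), which matches what the second board saw.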
Now I discover that when I mmap the memory, I can't do 32-bit transfers to that memory. I have to talk to it 8 or 16 bits at a time, or I get segmentation faults. It seems that the MMU imposes the same limitations on the user process as are imposed on it by the VME bus.

As far as the question below is concerned, let me say that I'm not whining to try to get Sun to fix my CPU. I really want to know what *should* be happening in the MMU.

(END OF LONG STORY)

(drum roll please)

My question is, is it reasonable to expect that the MMU provide transparent 32-bit access to a 16-bit device?

My initial reaction was that of course the MMU should take care of that. Sun seems to think not, and that if the memory says to the MMU that 16 bits is the max, then the MMU says the same to the CPU. Now I'm not so sure.

Discussion?

-steve

/* Steve Groom, Jet Propulsion Laboratory, Pasadena, CA 91109
 * Internet: stevo@elroy.jpl.nasa.gov   UUCP: {ames,cit-vax}!elroy!stevo
 * Disclaimer: (thick German accent) "I know noothingg! Noothingg!"
 */