tcb@hou2d.UUCP (T.BALLISTER) (03/21/84)
I've added an mmstrategy() section to our mem.c driver to allow use of some portion of addressable memory as a high-speed block device. The intent is to provide fast response for numerous *RTI* Ingres commands. (Many of these are 300-500 Kbytes in length, and take 20-30 seconds to load via up.c devices on the UNIBUS.) The way the scheme works is to hack the magic number 8096 in locore.s down to the number of Kbytes you want made available to the operating system, and then use the contents of (physmem) as the start address of the block device. The programs you put in this device should not have their sticky bits set, because that would cause them to be swapped out to real disk, and you'd wind up gaining little on subsequent loads.

So far the driver works to the extent that I can make file systems, copy things into them and around in them, mount the device on /tmp to speed up compilations, etc. I just can't load (exec) from it. If I put in a trace printf to dump fields of the passed buf structure -- b_flags, b_blkno, b_bcount, b_un.b_addr -- I notice that the problem comes as the text segment begins to be loaded. While the inode/directory blocks are being read, the only flags up are B_BUSY and B_READ (defined in buf.h), and b_un.b_addr is in system space, i.e. 0x8XXXXXXX. When the first text block is read, b_flags is B_PGIN | B_PHYS | B_BUSY | B_READ, and b_un.b_addr is 0. At this point I get a protected segment error and things come to a stop.

Poking around, I've learned that the trouble is with the (user) b_un.b_addr = 0. If I look at the p0br and p0lr registers at this point, the length register roughly matches the size of the program I'm trying to load, but the pte's pointed to by p0br are all marked 0x790XXXXX, i.e. read-only at all priority levels, with the valid bit, 0x80000000, off. So the question is: what is my driver not taking care of?
Obviously bio() and swap(), which kick things off, work for other devices, but looking around I haven't yet discovered what routine(s) I should be calling to get these pages validated. Any help out there? Thanks.

Tom Ballister
hou2d!wb2!tcb
(201)807-7498 (collect)
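The non-paging part of what Tom describes is just a copy between the reserved memory region and the buffer's data area. A user-space sketch of that strategy logic (mm_strategy, mm_base, and the cut-down struct buf are simplified stand-ins for the kernel structures, not the real 4.x declarations):

```c
#include <stdio.h>
#include <string.h>

#define B_READ  0x1                 /* transfer is device-to-memory */
#define BSIZE   512                 /* block size */
#define NBLK    16                  /* toy device size, in blocks */

static char mm_base[NBLK * BSIZE];  /* stands in for the memory above (physmem) */

/* Simplified stand-in for the kernel's struct buf. */
struct buf {
    int   b_flags;
    long  b_blkno;                  /* device block number */
    long  b_bcount;                 /* byte count */
    char *b_addr;                   /* data area (b_un.b_addr in 4.x) */
};

/* The heart of a RAM-disk strategy routine: copy between the
 * reserved region and the buffer's data area.  A real mmstrategy()
 * must also handle B_PGIN|B_PHYS requests, where the address is a
 * user virtual address whose ptes may not yet be valid -- exactly
 * the case Tom is tripping over. */
void mm_strategy(struct buf *bp)
{
    char *dev = mm_base + bp->b_blkno * BSIZE;

    if (bp->b_flags & B_READ)
        memcpy(bp->b_addr, dev, (size_t)bp->b_bcount);
    else
        memcpy(dev, bp->b_addr, (size_t)bp->b_bcount);
    /* in the kernel: iodone(bp); */
}
```

This is the path that works for Tom's ordinary file-system I/O; the pagein case is what the follow-ups address.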
jmcg@decvax.UUCP (Jim McGinness) (03/22/84)
The invalid fill-on-demand-from-inode pages are made valid in the uba or mba maps by the other drivers before the transfer occurs, then validated by `pagein'. Your driver should just validate the pages in place and move the bytes. It is probably necessary to raise your priority while you do it, since another process running that image might sneak in, find the valid page, and try to execute the old bytes. Better would be to invert the sense of the `copyseg' routine, to move from a physical address to a user virtual address.

If you have memory to burn, however, I've found that it makes sense to increase your number of buffers (and, on 4.2, to increase bufpages). This has the effect of keeping more of the disk in memory, though it is not as direct as targeting a particular file system to actually reside in memory.

Jim McGinness
decvax!jmcg
Digital Equipment Corp.
(603)844-5703
MKO2-1/H10, Merrimack, NH 03054
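The bookkeeping Jim describes can be illustrated in user space. PG_V below is the 0x80000000 valid bit Tom observed; the 0x790XXXXX pattern and the validate_ptes name are illustrative, not taken from the real pte.h, and the raised-priority requirement becomes only a comment here:

```c
#include <stdio.h>

#define PG_V 0x80000000u   /* pte valid bit -- the one Tom found off */

/* Validate the ptes covering a pagein transfer before moving the
 * bytes.  In the kernel this loop plus the copy would run at raised
 * spl: another process sharing the text image could otherwise fault
 * on the page, find it newly valid, and execute half-copied bytes.
 * Protection bits are left untouched. */
void validate_ptes(unsigned int *pte, int npages)
{
    int i;
    for (i = 0; i < npages; i++)
        pte[i] |= PG_V;
}
```

After this, the driver can copy from the RAM-disk region into the now-valid user pages (or, as Jim suggests, use an inverted `copyseg' to do the move).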
rpw3@fortune.UUCP (03/25/84)
#R:hou2d:-22200:fortune:11600076:000:1579
fortune!rpw3    Mar 24 20:43:00 1984

Jim McGinness says "just make the buffers bigger," rather than having Tom Ballister's "RAM disk" device. Well, there's a problem with that. As long as exec'ing a program flushes the cache (indirectly, by loading it), with any kind of reasonable interactive user load the cache never has any useful directory, i-node, or random-access file data left in it. All it has is copies of monsters like "vi" (which I use, don't get me wrong). Ironically, the large programs will flush their OWN directory entries, used earlier in the reading, and so get flushed (partially) in turn when somebody runs the program again. (It gets incestuous in there!)

In our most recent O/S release, we managed to get a quite significant improvement in multi-user performance by throwing away any blocks that had been involved in read-ahead once they were used. Practically, that means that an exec never consumes more than two buffer blocks, and one can exec lots of large programs all day and never disturb the various pieces of useful random-access data. (Much of our software is large, due to a heavy "menu" orientation.)

I have no hard data to prove it in the environment being discussed (Ingres), but I suspect that a "RAM disk" plus our "flush read-ahead" strategy would give better performance than just an equivalent number of additional buffers, especially since Ballister mentioned the file segments were several hundred Kbytes.

Rob Warnock
UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphin Drive, Redwood City, CA 94065
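A toy model of the "throw away consumed read-ahead" policy Rob describes: release such buffers to the head of the free list so they are the first reused, while long-lived directory and i-node blocks drift to the tail and stay cached. (brelse_sim, getnewbuf_sim, and the B_AGE value here are simplified inventions for illustration; a real brelse() juggles several queues.)

```c
#include <stddef.h>

#define B_AGE 0x1    /* buffer's data is unlikely to be wanted again soon */

struct buf {
    struct buf *next;
    int b_flags;
    int b_blkno;
};

static struct buf *freehead, *freetail;

/* Release a buffer.  Consumed read-ahead blocks (B_AGE) go to the
 * HEAD of the free list, everything else to the tail, so an exec's
 * read-ahead is recycled first instead of evicting useful data. */
void brelse_sim(struct buf *bp)
{
    if (freehead == NULL) {
        bp->next = NULL;
        freehead = freetail = bp;
    } else if (bp->b_flags & B_AGE) {
        bp->next = freehead;
        freehead = bp;
    } else {
        bp->next = NULL;
        freetail->next = bp;
        freetail = bp;
    }
}

/* Take the next eviction victim from the head of the free list. */
struct buf *getnewbuf_sim(void)
{
    struct buf *bp = freehead;
    if (bp != NULL) {
        freehead = bp->next;
        if (freehead == NULL)
            freetail = NULL;
    }
    return bp;
}
```

With this policy an exec stream churns only the head of the list, which is the practical upshot of "exec's never consume more than two buffer blocks."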
jmcg@decvax.UUCP (Jim McGinness) (03/25/84)
You're off the mark, Rob. The fill-on-demand-from-inode programs are not read in through the buffer cache; if they were, Ballister would not have had the problem he reported.

My comment about increasing the size of the system buffer cache was intended to head off efforts to implement the in-memory device in situations where it isn't justified. I would prefer that attention be given to how to improve the automatic management of the storage hierarchy. Until that golden day, it's quite possible for someone to know better than the default routines what the system's needs are, and in some of those cases it can make sense to lock things into physical memory. Ballister had some specific performance enhancements he was trying to achieve, and might eventually be able to tune his system statically for the proper balance between memory devoted to the buffer cache and memory devoted to the in-memory device.

One thing in particular that is better handled through the buffer cache than through an in-memory device is /tmp. An anecdote is appropriate here: several months ago, we brought decvax's system pack up on a 750 with 8Mb. In single-user mode, we started to rebuild the kernel to pick up some of the new devices this 750 had. Partway through, there was a power glitch. When the system came back up, we discovered an unholy mess in the file system: scads of zero-length .o files. Decvax's kernel had been "tuned" to use 400 system buffers, so we had essentially been compiling to memory. With `update' not running, only the inodes and directory blocks were actually being written out to disk. I guess the moral of this story is that putting /tmp into volatile storage may cause you to lose the value of `expreserve'.

Jim McGinness
decvax!jmcg
Digital Equipment Corp.
(603)844-5703
MKO2-1/H10, Merrimack, NH 03054
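The safety net missing in Jim's anecdote is the `update' daemon, whose whole job is to call sync(2) periodically so delayed writes eventually reach the disk. A minimal sketch of that loop -- bounded here so it terminates, where the real daemon loops forever (run_update and its parameters are invented for this illustration):

```c
#include <unistd.h>

/* Periodically push dirty buffers, superblocks, and inodes toward
 * the disk via sync(2), the way update(8) does.  With 400 buffers
 * and no update running, everything short of inodes and directory
 * blocks can sit in memory until a power glitch eats it. */
int run_update(int rounds, unsigned interval)
{
    int i;
    for (i = 0; i < rounds; i++) {
        sync();                 /* schedule all delayed writes */
        if (i + 1 < rounds)
            sleep(interval);
    }
    return i;                   /* number of sync passes made */
}
```

An in-memory /tmp device gets no such flushing at all, which is why Jim warns about `expreserve'.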