[net.unix-wizards] Question on strategy

4363tcb@hou2d.UUCP (T.BALLISTER) (03/21/84)

I've added an mmstrategy() section to our mem.c driver to allow use of
some portion of addressable memory as a high speed block device.  The
intent is to provide fast response for numerous *RTI* Ingres commands.
(Many of these are 300-500 Kbytes in length, and take 20-30 seconds to
load in via up.c devices on the Unibus.)  The way the scheme works is to hack
the magic number 8096 in locore.s down to the number of kbytes you want
made available to the operating system, and then use the contents of
(physmem) as the start address of the block device.  The programs you put
in this device should not have their sticky bits set, because this would
cause them to be swapped out to real disk, and you wind up gaining little
on subsequent loads.
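The scheme above boils down to a strategy routine that treats the reserved
memory as an array of DEV_BSIZE blocks.  A rough user-space sketch of that
core idea (the struct is a cut-down stand-in for 4BSD's buf, where the
transfer address is really b_un.b_addr; this is a model, not the actual
driver):

```c
#include <string.h>

#define DEV_BSIZE 512          /* bytes per device block */
#define B_READ    0x1          /* transfer-direction flags (simplified) */
#define B_WRITE   0x0

/* Cut-down stand-in for the kernel buf structure. */
struct buf {
    int   b_flags;
    long  b_blkno;             /* starting block on the "device" */
    long  b_bcount;            /* bytes to transfer */
    char *b_addr;              /* where to copy to/from */
};

static char ramdisk[64 * DEV_BSIZE];   /* memory carved out past physmem */

/* Core of a RAM-disk strategy routine: block arithmetic plus a copy.
 * The real driver would also range-check b_blkno and call iodone(). */
void mmstrategy(struct buf *bp)
{
    char *base = ramdisk + bp->b_blkno * DEV_BSIZE;

    if (bp->b_flags & B_READ)
        memcpy(bp->b_addr, base, bp->b_bcount);   /* device -> caller */
    else
        memcpy(base, bp->b_addr, bp->b_bcount);   /* caller -> device */
}
```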

So far the driver works to the extent that I can make file systems, copy
things into them and around in them, mount the device to /tmp to speed up
compilations, etc.  I just can't load from it.  If I put in a trace printf
to get out things in the passed buf structure like b_flags, b_blkno,
b_bcount, b_un.b_addr, I notice that the problem comes as the text segment
begins to be loaded.  While the inode/directory blocks are being read, the
only flags up are B_BUSY and B_READ (defined in buf.h), and b_un.b_addr is
in system space, i.e. 0x8XXXXXXX.  When the first text block is read b_flags
is B_PGIN | B_PHYS | B_BUSY | B_READ, and b_un.b_addr is 0.  At this point
I get a protected segment error and things come to a stop.  Poking around
I've learned that the trouble is with the (user) b_un.b_addr = 0.  If I look
at the p0br and p0lr registers at this point, it looks like the size
register matches roughly the size of the program I'm trying to load in,
but if I examine the pte's pointed to by p0br they're all marked
0x790XXXXX.  i.e. they're all marked read only at all priority levels,
and the valid bit, 0x80000000, is off.

So the question is what is my driver not taking care of?  Obviously
bio() and swap(), which kick things off, work for other devices, but
looking around I haven't yet discovered what routine(s) I should be
calling to get these pages validated.  Any help out there?

				Tom Ballister
				hou2d!wb2!tcb
				(201)807-7498 (collect)
				   Thanks

jmcg@decvax.UUCP (Jim McGinness) (03/22/84)

For the other drivers, the invalid fill-on-demand-from-inode pages are
mapped through the uba or mba maps before the transfer occurs, then
validated by `pagein'.  Your driver should just validate the pages in
place and move the bytes.  It is probably necessary to raise your
priority while you do it, since another process running that image
might sneak in and find the valid page and try to execute the old
bytes.  Better would be to invert the sense of the `copyseg' routine
to move from physical address to user virtual address.
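Bit-wise, "validate the pages in place" comes down to flipping the pte bits
the original post quotes.  A toy sketch of that step (PG_V 0x80000000 is the
valid bit the post cites; the PG_PROT, PG_UW, and PG_FOD values below are
assumed for illustration, not checked against pte.h -- and in the kernel the
whole thing would happen at raised spl, as noted above):

```c
/* Illustrative VAX pte bits; only PG_V is taken from the post itself. */
#define PG_V    0x80000000u   /* valid */
#define PG_PROT 0x78000000u   /* protection field (0x78... = read-only) */
#define PG_UW   0x20000000u   /* user read/write (assumed value) */
#define PG_FOD  0x01000000u   /* fill-on-demand marker (assumed value) */

/* After the driver has copied the text into the page frame: drop the
 * read-only protection and the fill-on-demand marker, turn on valid
 * and user-write.  This models a 0x790XXXXX pte becoming usable. */
unsigned validate_pte(unsigned pte)
{
    pte &= ~(PG_PROT | PG_FOD);
    return pte | PG_V | PG_UW;
}
```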

If you have memory to burn, however, I've found that it makes sense to
increase your number of buffers (and on 4.2 to increase bufpages).
This has the effect of keeping more of the disk in memory, but is not
as direct as targeting a particular file system to actually reside in
memory.
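For reference, on 4.2 these are compile-time tunables.  A sketch of the sort
of change meant here, as a configuration fragment rather than runnable code
(nbuf and bufpages are the real 4.2BSD variable names; the numbers are
purely illustrative):

```c
/* param.c-style kernel tunables; 0 means "let the system autosize". */
int bufpages = 400;   /* pages of physical memory given to the buffer cache */
int nbuf     = 0;     /* number of buffer headers; 0 = derived from bufpages */
```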
						Jim McGinness
	decvax!jmcg				Digital Equipment Corp.
	(603)844-5703				MKO2-1/H10
						Merrimack, NH, 03054

rpw3@fortune.UUCP (03/25/84)

fortune!rpw3    Mar 24 20:43:00 1984

Jim McGinness says "just make the buffers bigger", rather than having
Tom Ballister's "RAM disk" device.

Well, there's a problem with that. As long as exec'ing a program flushes
the cache (indirectly, by loading it), with any kind of reasonable interactive
user load the cache never has any useful directory, i-node, or random-access
file data left in it.  All it has is copies of monsters like "vi" (which I
use, don't get me wrong). Ironically, the large programs will flush their OWN
directory entries, used earlier in the reading, and so get flushed (partially)
in turn when somebody runs the program again (it gets incestuous in there!).

In our most recent O/S release, we managed to get a quite significant
improvement in multi-user performance by throwing away any blocks that
had been involved in read-ahead once they were used. Practically, that
means that exec's never consume more than two buffer blocks, and one can
exec lots of large programs all day and never disturb the various pieces of
useful random-access data. (Much of our software is large, due to a heavy
"menu" orientation.)

I have no hard data to prove it in the environment being discussed (Ingres),
but I suspect that a "RAM disk" plus our "flush read-ahead" strategy would
give better performance than just an equivalent number of additional buffers,
especially since Ballister mentioned the file segments were several hundred K.

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphin Drive, Redwood City, CA 94065

jmcg@decvax.UUCP (Jim McGinness) (03/25/84)

You're off the mark, Rob.  The fill-on-demand-from-inode programs are
not read in through the buffer cache.  If they were, then Ballister
would not have had the problem he reported.

My comment about increasing the size of the system buffer cache was intended
to head off efforts to implement the in-memory device in situations where
it isn't justified.  I would prefer that attention be given to how to improve
the automatic management of the storage hierarchy.

Until that golden day, it's quite possible for someone to know better
than the default routines what the system's needs are.  In some of
those cases, it can make sense to lock things into physical memory.
Ballister had some specific performance enhancements he was trying to
achieve and might eventually be able to statically tune his system for
the proper balance between memory devoted to system memory and memory
devoted to the in-memory device.

One thing in particular that is better handled through the buffer cache than
through an in-memory device is /tmp.  An anecdote is appropriate here:

	Several months ago, we brought decvax's system pack up on a 750
	with 8Mb.  In single-user-mode, we started to rebuild the
	kernel to pick up some of the new devices this 750 had.
	Partway through, there was a power glitch.  When the system
	came back up, we discovered an unholy mess in the file system.

	The "mess" was scads of zero-length .o files.  Decvax's kernel
	had been "tuned" to use 400 system buffers, so we had
	essentially been compiling to memory.  With `update' not
	running, only the inodes and directory blocks were actually
	being written out to disk.

I guess the moral of this story is that putting /tmp into volatile
storage may cause you to lose the value of `expreserve'.

						Jim McGinness
	decvax!jmcg				Digital Equipment Corp.
	(603)844-5703				MKO2-1/H10
						Merrimack, NH, 03054