donn@utah-cs.UUCP (Donn Seeley) (07/16/84)
I spent some time looking at the paging code after seeing Terry Laskodi's
report on 'pagein mfind' panics. Since only one 'pagein mfind' panic is
recorded in /usr/adm/messages on any of the four VAXen I have access to
(where by my calculations a total of 888 days of machine time is
represented), I have concluded that we just don't get them much.
Consequently we don't have a core file handy -- which is a pity because it
would be interesting to know which of the conditions that are tested before
the panic turned out to be true...

While I was poking around in the code I noticed a peculiarity about the
routine mfind() in vm_mem.c. Here is the code involved:

------------------------------------------------------------------------
/*
 * Look for block bn of device dev in the free pool.
 * Currently it should not be possible to find it unless it is
 * c_free and c_gone, although this may later not be true.
 * (This is because active texts are locked against file system
 * writes by the system.)
 */
struct cmap *
mfind(dev, bn)
	dev_t dev;
	daddr_t bn;
{
	register struct cmap *c1 = &cmap[cmhash[CMHASH(bn)]];
	int si = splimp();

	while (c1 != ecmap) {
		if (c1->c_blkno == bn && c1->c_mdev == getfsx(dev))
			return (c1);
		c1 = &cmap[c1->c_hlink];
	}
	splx(si);
	return ((struct cmap *)0);
}
------------------------------------------------------------------------

When mfind() returns successfully, it doesn't reset the priority! The
return inside the loop skips the splx(si) at the bottom of the routine. I
checked the several places where mfind() is called in the kernel, and while
in most cases mfind() is called at 'splimp' priority, there are at least
two routines which don't appear to expect mfind() to alter their priority,
namely rwip() in sys_inode.c and kluster() in vm_page.c.

It appears that kluster() only calls mfind() if it wants to look for text
pages in swap space, which won't occur except for old-style 0410
executables, so the bug won't get exercised very often here.
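To make the effect of that early return concrete, here is a minimal user-space sketch, not kernel code. Everything outside the body of mfind() is an assumption made for illustration: cur_ipl, IPL_LOW and IPL_HIGH model the processor priority, splimp(), splx() and getfsx() are trivial stand-ins, the cmap structure is cut down to the three fields mfind() reads, and cm_init(), cm_insert(), mfind_orig(), mfind_fixed() and demo() are hypothetical names. mfind_fixed() shows one possible way to restore the priority on both paths; it is not a claim about how the real kernel was or should be patched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's priority primitives. */
#define IPL_LOW  0
#define IPL_HIGH 1                      /* plays the role of 'splimp' priority */
int cur_ipl = IPL_LOW;

static int  splimp(void)    { int old = cur_ipl; cur_ipl = IPL_HIGH; return old; }
static void splx(int s)     { cur_ipl = s; }
static int  getfsx(int dev) { return dev; }   /* assumed: identity mapping */

/* Simplified core map entry: only the fields mfind() looks at. */
struct cmap { long c_blkno; int c_mdev; int c_hlink; };

#define NCMAP      4
#define CMHASHSIZE 8
#define CMHASH(bn) ((int)((bn) & (CMHASHSIZE - 1)))

struct cmap cmap[NCMAP + 1];
struct cmap *ecmap = &cmap[NCMAP];      /* end-of-chain sentinel */
int cmhash[CMHASHSIZE];

/* Empty every hash chain and drop to low priority. */
void cm_init(void)
{
    int i;
    for (i = 0; i < CMHASHSIZE; i++)
        cmhash[i] = NCMAP;
    cur_ipl = IPL_LOW;
}

/* Put entry idx at the head of the hash chain for block bn. */
void cm_insert(int idx, int dev, long bn)
{
    cmap[idx].c_blkno = bn;
    cmap[idx].c_mdev = getfsx(dev);
    cmap[idx].c_hlink = cmhash[CMHASH(bn)];
    cmhash[CMHASH(bn)] = idx;
}

/* Same control flow as the posted routine: the hit path skips splx(). */
struct cmap *mfind_orig(int dev, long bn)
{
    struct cmap *c1 = &cmap[cmhash[CMHASH(bn)]];
    int si = splimp();

    while (c1 != ecmap) {
        if (c1->c_blkno == bn && c1->c_mdev == getfsx(dev))
            return c1;                  /* priority is still raised here */
        c1 = &cmap[c1->c_hlink];
    }
    splx(si);
    return NULL;
}

/* One possible fix: a single exit, so splx() runs on hit and miss alike. */
struct cmap *mfind_fixed(int dev, long bn)
{
    struct cmap *c1 = &cmap[cmhash[CMHASH(bn)]];
    struct cmap *found = NULL;
    int si = splimp();

    while (c1 != ecmap) {
        if (c1->c_blkno == bn && c1->c_mdev == getfsx(dev)) {
            found = c1;
            break;
        }
        c1 = &cmap[c1->c_hlink];
    }
    splx(si);
    return found;
}

/* Exercise both versions; returns 0 if every check holds. */
int demo(void)
{
    cm_init();
    cm_insert(0, 1, 42L);
    assert(mfind_orig(1, 42L) == &cmap[0]);
    assert(cur_ipl == IPL_HIGH);        /* original leaks the raised priority */
    cur_ipl = IPL_LOW;
    assert(mfind_fixed(1, 42L) == &cmap[0]);
    assert(cur_ipl == IPL_LOW);         /* fixed version restores it */
    return 0;
}
```

Calling demo() from a small main() runs the comparison. Callers that already run at 'splimp' priority would see no difference with the single-exit version, since splx(si) just puts back whatever priority they entered with.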
In rwip() (the generalized read/write-through-inode-pointer routine) mfind
is used to find freed pages that are going to be written on; munhash() is
then used to remove them from the pool. In this case I couldn't see any
evidence that the code knows it is at high priority when mfind() succeeds,
so this is evidence for my contention that it's a bug. It looks like quite
a bit of code could be executed at high priority in rwip() after mfind() is
called.

Is this really a bug, or am I all wet? I don't have my wizard's diploma yet
(they told me it was put on hold because of an overdue library book, but I
know better) so if anyone who really knows what's going on (and can point
out the boners in the exposition) is reading this, please speak up...

Donn Seeley    University of Utah CS Dept    donn@utah-cs.arpa
40 46' 6"N 111 50' 34"W    (801) 581-5668    decvax!utah-cs!donn

PS -- Thanks to Spencer Thomas for the lat and long.