brankley@usfvax1.UUCP (Bob Brankley) (04/01/88)
I have been having a pretty wild problem on my VAX 11/750 running 4.3 BSD, and I would like to see if anybody else is having the same problem. It seems that 4.3 BSD is not periodically syncing in-core inodes out to disk, resulting in crashes.

My VAX has an RA60 partitioned a-b-f and an RA81 partitioned a-b-g-h. The RA81 is the disk giving me the trouble. I originally found the problem when my nightly "fsck" of the file system detected multiple UNREFerenced files in the partition containing my user files (/dev/ra0g). Attempts to fix the mounted file system ALWAYS resulted in crashing the system and, hence, I learned not to do that any more. At the same time the system would also sporadically crash with the panic "pagein mfind" during times of heavy usage.

The last time I racked up about 20 UNREFerenced files in my user file system, I decided to check the bad inodes against those already resident in core. ALL of the UNREFerenced files were pure text images whose inodes were kept in core. To make matters worse, the inodes in core reported 0 link counts. Somehow this does not seem right to me.

I have tried fixing the problem by calling sync several dozen times, but this does not always seem to work. In fact, the only sure-fire way to fix the problem seems to be unmounting and remounting the file system. Besides, I thought /etc/update was supposed to flush in-core inodes, or does it just flush the superblock?

What is "panic: pagein mfind" supposed to indicate anyway? The source code would seem to suggest that the kernel could not find a page of text that it was supposed to find. Is that correct?

Any insight on the matter would be of great help. Although it is not a MAJOR inconvenience, I would like to run my system without having to remount /dev/ra0g every few days. Thanks for your help in advance.

Bob Brankley
University of South Florida, Engineering Computing Services
CSNET: usfvax1!brankley@usf.edu
UUCP: {ihnp4!codas, gatech}!usfvax2!usfvax1!brankley
chris@mimsy.UUCP (Chris Torek) (04/02/88)
This was posted on 1 April, but on the off chance it was serious, here is an answer:

In article <271@usfvax1.UUCP> brankley@usfvax1.UUCP (Bob Brankley) writes:
>I have been having a pretty wild problem on my VAX 11/750 running 4.3
>BSD .... I originally found the problem when my nightly "fsck" of the
>file system detected multiple UNREFerenced files in the partition
>containing my user files(/dev/ra0g).

You cannot run fsck on an active file system. Among other things, it should not be necessary. Stop doing it.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
dce@mips.COM (David Elliott) (04/03/88)
In article <10900@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <271@usfvax1.UUCP> brankley@usfvax1.UUCP (Bob Brankley) writes:
>>I have been having a pretty wild problem on my VAX 11/750 running 4.3
>>BSD .... I originally found the problem when my nightly "fsck" of the
>>file system detected multiple UNREFerenced files in the partition
>>containing my user files(/dev/ra0g).
>
>You cannot run fsck on an active file system. Among other things,
>it should not be necessary. Stop doing it.

Sadly, 4.3BSD comes this way. /usr/adm/daily.sh (an otherwise great way of doing periodic chores, superior to crontab, anyway) runs /etc/fsck with the -n option. Sure, the sync command is executed first, but that doesn't guarantee anything at all.

When we first brought up 4.3BSD, we kept this new "feature". After a while, it got really annoying when we happened to be running news expires at the same time.

As a side note, our next System V-based release contains a special "periodic execution" interface for administrators, using an interface similar to the rc directory interface in System V. Anyone wanting information can contact me.
--
David Elliott
dce@mips.com or {ames,prls,pyramid,decwrl}!mips!dce
ron@topaz.rutgers.edu (Ron Natalie) (04/03/88)
Don't run fsck on mounted and busy file systems. You'll destroy things in progress. Generally, one should never run FSCK on a mounted filesystem at all (except for the root, where you have no choice).

There are several causes of unreferenced files that are still in use: pipes are one, and certain programs will create them as well. Blowing them away with an FSCK is a BAD idea.

-Ron
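A minimal sketch of one such cause, assuming a POSIX system and a local file system: a file that is unlinked while a process still holds it open keeps its inode and its data, but its link count drops to 0. A check of the live file system would see that inode as unreferenced even though nothing is wrong. The function name `link_count_after_unlink` is hypothetical, and Python is used only for illustration; nothing here comes from the 4.3BSD sources.

```python
import os
import tempfile


def link_count_after_unlink():
    """Create a file, unlink it while it is open, and report what remains.

    Returns (st_nlink, data) as seen through the still-open descriptor.
    Assumes POSIX unlink semantics on a local file system.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still in use\n")
        os.unlink(path)                # remove the only directory entry
        nlink = os.fstat(fd).st_nlink  # the inode survives without a name
        os.lseek(fd, 0, os.SEEK_SET)   # the data is still readable
        data = os.read(fd, 64)
        return nlink, data
    finally:
        os.close(fd)                   # last close frees the inode


if __name__ == "__main__":
    nlink, data = link_count_after_unlink()
    print("link count while open:", nlink)
    print("data still readable:", data)
```

Until the descriptor is closed, the inode is in use but has no directory entry, so "repairing" it on a mounted file system removes something a running program still depends on.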
deke@socrates.ee.rochester.edu (Deke Kassabian) (04/05/88)
In article <1967@quacky.mips.COM> dce@quacky.UUCP (David Elliott) writes:
>In article <10900@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>In article <271@usfvax1.UUCP> brankley@usfvax1.UUCP (Bob Brankley) writes:
>>>I have been having a pretty wild problem on my VAX 11/750 running 4.3
>>>BSD .... I originally found the problem when my nightly "fsck" of the
>>>file system detected multiple UNREFerenced files in the partition
>>>containing my user files(/dev/ra0g).
>>
>>You cannot run fsck on an active file system. Among other things,
>>it should not be necessary. Stop doing it.
>
>Sadly, 4.3BSD comes this way. /usr/adm/daily.sh (an otherwise great
>way of doing periodic chores, superior to crontab, anyway) runs
>/etc/fsck with the -n option. Sure, the sync command is executed
>first, but that doesn't guarantee anything at all.

Sync may not guarantee anything, but the -n option does. What's the problem here? Using the -n option does not open the file system for writing. How wrong can you go?

I find using fsck this way extremely useful, and the worst that's happened so far is a couple of reports of file system problems that were clearly the result of an "active" system. If they "go away" the next time fsck is run (at 4am) then I don't worry. If they hang around for a few days, it's probably a legitimate problem and I'll deal with it then. And there have been a few, and I've caught them quickly this way.

Overall this is far better than waiting for the next time a system crashes or otherwise reboots to run fsck. Is it really "smarter" to bring a system down to single user every X days to check file system consistency??

\\\ Deke Kassabian, URochester Department of Electrical Engineering \\\
\\\ deke@ee.rochester.edu "I never metacharacter \\\
\\\ or ...!rochester!ur-valhalla!deke I didn't like......" \\\
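The triage rule described in this post (ignore a report that disappears on the next run, act on one that persists for several nights) can be sketched as below. This is only an illustration under stated assumptions: the function name `persistent_findings`, the `min_runs` threshold of 3, and the finding strings are all hypothetical, and the code does not parse real `fsck -n` output.

```python
def persistent_findings(nightly_reports, min_runs=3):
    """Return findings present in every one of the last `min_runs` reports.

    `nightly_reports` is a list, oldest first, of collections of finding
    strings, one collection per nightly read-only check. A finding that
    shows up once and then vanishes is treated as noise from an active
    file system; one that survives `min_runs` consecutive checks is kept.
    """
    if len(nightly_reports) < min_runs:
        return set()  # not enough history to call anything persistent
    recent = nightly_reports[-min_runs:]
    return set.intersection(*(set(report) for report in recent))


if __name__ == "__main__":
    # Placeholder finding labels, not actual fsck output.
    reports = [
        {"inode 1201 unreferenced"},
        {"inode 1201 unreferenced", "inode 77 unreferenced"},
        {"inode 1201 unreferenced"},
    ]
    print(persistent_findings(reports))
```

Here the finding for inode 77 appears in only one night's report and is dropped, while the finding for inode 1201 appears in all three and is kept for follow-up.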
chris@mimsy.UUCP (Chris Torek) (04/06/88)
In article <1237@valhalla.ee.rochester.edu> deke@socrates.ee.rochester.edu (Deke Kassabian) writes:
>Sync may not guarantee anything, but the -n [fsck] option does.
>What's the problem here?

None, really, except that any error report is misleading:

>worst that's happened so far is a couple of reports of file system problems
>that were clearly the result of an "active" system. If they "go away" the
>next time fsck is run (at 4am) then I don't worry. If they hang around for
>a few days, it's probably a legitimate problem and I'll deal with it then.
>And there have been a few, and I've caught them quickly this way.

We do not run nightly checks, although we do run checks after crashes and before each level 0 single-user dump (biweekly). Only after crashes, which nearly always are the result of power failures, have we had anything that needed fixing. The 4.3BSD file system code is just plain stable. (Of course, we have one kernel development machine which is sometimes more down than up....)

In summary, if you are willing to put up with bogus error reports, the nightly `fsck -n's may be worthwhile. We have not found this to be the case here.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris