[comp.unix.wizards] 4.3 BSD VAX 11/750 does not seem to sync its disks.

brankley@usfvax1.UUCP (Bob Brankley) (04/01/88)

I have been having a pretty wild problem on my VAX 11/750 running 4.3
BSD and I would like to see if anybody else is having the same problem.
It seems that 4.3 BSD is not periodically syncing in-core inodes out to
disk, resulting in crashes.  My VAX has an RA60 partitioned a-b-f and an
RA81 partitioned a-b-g-h.  The RA81 is the disk giving me the trouble.

I originally found the problem when my nightly "fsck" of the file system
detected multiple UNREFerenced files in the partition containing my user
files (/dev/ra0g).  Attempts to fix the mounted file system ALWAYS resulted in
crashing the system and, hence, I learned not to do that any more.  At
the same time the system would also sporadically crash due to panic "pagein
mfind" during times of heavy usage.

The last time I racked up about 20 UNREFerenced files in my user file
system I decided to check the bad inodes against those already resident
in core.  ALL of the UNREFerenced files were pure text images whose
inodes were kept in core.  To make matters worse, the inodes in core
reported 0 link counts.  Somehow this does not seem right to me.

I have tried fixing the problem by calling sync several dozen times, but
this does not always seem to work.  In fact, the only sure-fire way to
fix the problem seems to be unmounting and remounting the file system.
Besides, I thought /etc/update was supposed to flush in-core inodes, or
does it just flush the superblock?  What is "panic:pagein mfind"
supposed to indicate anyway?  The source code would seem to suggest that
the kernel could not find a page of text that it was supposed to find.
Is that correct?

Any insight on the matter would be of great help.  Although it is not a
MAJOR inconvenience, I would like to run my system without having to
remount /dev/ra0g every few days.  Thanks for your help in advance.


Bob Brankley
University of South Florida, Engineering Computing Services
CSNET:  usfvax1!brankley@usf.edu
UUCP:   {ihnp4!codas, gatech}!usfvax2!usfvax1!brankley

chris@mimsy.UUCP (Chris Torek) (04/02/88)

This was posted on 1 April, but on the off chance it was serious, here
is an answer:

In article <271@usfvax1.UUCP> brankley@usfvax1.UUCP (Bob Brankley) writes:
>I have been having a pretty wild problem on my VAX 11/750 running 4.3
>BSD ....  I originally found the problem when my nightly "fsck" of the
>file system detected multiple UNREFerenced files in the partition
>containing my user files(/dev/ra0g).

You cannot run fsck on an active file system.  Among other things,
it should not be necessary.  Stop doing it.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

dce@mips.COM (David Elliott) (04/03/88)

In article <10900@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <271@usfvax1.UUCP> brankley@usfvax1.UUCP (Bob Brankley) writes:
>>I have been having a pretty wild problem on my VAX 11/750 running 4.3
>>BSD ....  I originally found the problem when my nightly "fsck" of the
>>file system detected multiple UNREFerenced files in the partition
>>containing my user files(/dev/ra0g).
>
>You cannot run fsck on an active file system.  Among other things,
>it should not be necessary.  Stop doing it.

Sadly, 4.3BSD comes this way.  /usr/adm/daily.sh (an otherwise great
way of doing periodic chores, superior to crontab, anyway) runs
/etc/fsck with the -n option.  Sure, the sync command is executed
first, but that doesn't guarantee anything at all.
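
For reference, sync(2) only asks the kernel to schedule modified super
blocks, modified inodes, and delayed block I/O for writing, and it may
return before those writes are complete.  /etc/update is in essence a
loop that makes that call every 30 seconds.  A rough sketch of such a
loop follows; it is not the actual update source, the function name
periodic_sync is made up, and the loop is bounded only so that the
example terminates.

```c
#define _XOPEN_SOURCE 700   /* ask for the sync() declaration under strict compilers */
#include <unistd.h>

/*
 * Call sync(2) `rounds` times, sleeping `interval` seconds between calls.
 * Returns the number of sync() calls made.
 *
 * sync() has no return value: it schedules the writes and gives the
 * caller no way to learn when (or whether) they reached the disk.
 */
unsigned periodic_sync(unsigned rounds, unsigned interval)
{
    unsigned done = 0;

    while (done < rounds) {
        sync();                 /* schedule dirty data for writing */
        done++;
        if (done < rounds)
            sleep(interval);    /* update uses 30 seconds here */
    }
    return done;
}
```

Calling periodic_sync(3, 30) would behave like a minute and a half of
update.  Nothing in it makes the disk consistent at the instant fsck
starts reading, which is why a sync just before fsck proves little on
a busy system.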

When we first brought up 4.3BSD, we kept this new "feature".  After
a while, it got really annoying when we happened to be running news
expires at the same time.

As a side note, our next System V-based release contains a special
"periodic execution" facility for administrators, with an interface
similar to the rc directory interface in System V.  Anyone wanting
information can contact me.
-- 
David Elliott		dce@mips.com  or  {ames,prls,pyramid,decwrl}!mips!dce

ron@topaz.rutgers.edu (Ron Natalie) (04/03/88)

Don't run fsck on mounted and busy file systems.  You'll destroy
things in progress.  Generally, one should never run fsck on a
mounted file system at all (except for the root, where you have
no choice).  There are several causes of unreferenced files that
are still in use: pipes are one, and certain programs create them
as well.  Blowing them away with fsck is a BAD idea.
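
A minimal C sketch of one such case, assuming a local Unix file system:
a file that is unlinked while still open keeps its inode, with a link
count of 0, until the last close.  A check run in that window would
typically see an allocated inode that no directory references.  The
function name and the scratch path passed to it are made up for
illustration.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a scratch file, unlink it while it is still open, and return
 * the link count that fstat(2) reports for the open descriptor.
 * Returns -1 on any error.
 */
long open_but_unlinked_nlink(const char *path)
{
    struct stat st;
    long nlink;
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);

    if (fd < 0)
        return -1;
    if (unlink(path) < 0) {         /* remove the only directory entry */
        close(fd);
        return -1;
    }
    /* The inode is still usable because a descriptor refers to it. */
    if (write(fd, "still here\n", 11) != 11 || fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    nlink = (long)st.st_nlink;      /* no names left for this inode */
    close(fd);                      /* last reference: the kernel frees the inode */
    return nlink;
}
```

The same thing happens when a running program's binary is removed or
replaced: the in-core inode stays around with no links until the last
process using it exits.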

-Ron

deke@socrates.ee.rochester.edu (Deke Kassabian) (04/05/88)

In article <1967@quacky.mips.COM> dce@quacky.UUCP (David Elliott) writes:
>In article <10900@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>In article <271@usfvax1.UUCP> brankley@usfvax1.UUCP (Bob Brankley) writes:
>>>I have been having a pretty wild problem on my VAX 11/750 running 4.3
>>>BSD ....  I originally found the problem when my nightly "fsck" of the
>>>file system detected multiple UNREFerenced files in the partition
>>>containing my user files(/dev/ra0g).
>>
>>You cannot run fsck on an active file system.  Among other things,
>>it should not be necessary.  Stop doing it.
>
>Sadly, 4.3BSD comes this way.  /usr/adm/daily.sh (an otherwise great
>way of doing periodic chores, superior to crontab, anyway) runs
>/etc/fsck with the -n option.  Sure, the sync command is executed
>first, but that doesn't guarantee anything at all.

Sync may not guarantee anything, but the -n option does.  What's the problem
here?  Using the -n option does not open the file system for writing. How
wrong can you go?  I find using fsck this way extremely useful, and the
worst that's happened so far is a couple of reports of file system problems
that were clearly the result of an "active" system.  If they "go away" the
next time fsck is run (at 4am) then I don't worry.  If they hang around for
a few days, it's probably a legitimate problem and I'll deal with it then.
And there have been a few, and I've caught them quickly this way.
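
The "see whether it hangs around" rule can be sketched as a filter over
successive nightly reports.  This is only an illustration: it assumes
the inode numbers have already been pulled out of each night's fsck -n
output by some other means, and the function names and data layout are
hypothetical.

```c
#include <stddef.h>

/* Return 1 if ino appears in list[0..n-1], otherwise 0. */
static int contains(const unsigned long *list, size_t n, unsigned long ino)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (list[i] == ino)
            return 1;
    return 0;
}

/*
 * reports[r] holds the counts[r] inode numbers flagged on night r, with
 * the most recent night last.  Copy into out[] every inode from the most
 * recent night that was also flagged on all earlier nights, and return
 * how many were copied.  out must have room for counts[nreports - 1]
 * entries.
 */
size_t persistent_inodes(const unsigned long *const *reports,
                         const size_t *counts, size_t nreports,
                         unsigned long *out)
{
    size_t i, r, last, found = 0;

    if (nreports == 0)
        return 0;
    last = nreports - 1;
    for (i = 0; i < counts[last]; i++) {
        unsigned long ino = reports[last][i];
        int everywhere = 1;

        for (r = 0; r < last && everywhere; r++)
            everywhere = contains(reports[r], counts[r], ino);
        if (everywhere)
            out[found++] = ino;     /* flagged every night so far */
    }
    return found;
}
```

Inodes that drop out of the result from one night to the next were
most likely just files in use while fsck was reading the disk.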

Overall this is far better than waiting for the next time a system crashes
or otherwise reboots to run fsck.  Is it really "smarter" to bring a system
down to single user every X days to check file system consistency?
 
 \\\  Deke Kassabian, URochester Department of Electrical Engineering  \\\
  \\\ deke@ee.rochester.edu                  "I never metacharacter     \\\
   \\\   or ...!rochester!ur-valhalla!deke     I didn't like......"      \\\

chris@mimsy.UUCP (Chris Torek) (04/06/88)

In article <1237@valhalla.ee.rochester.edu> deke@socrates.ee.rochester.edu
(Deke Kassabian) writes:
>Sync may not guarantee anything, but the -n [fsck] option does.
>What's the problem here?

None, really, except that any error report is misleading:

>worst that's happened so far is a couple of reports of file system problems
>that were clearly the result of an "active" system.  If they "go away" the
>next time fsck is run (at 4am) then I don't worry.  If they hang around for
>a few days, it's probably a legitimate problem and I'll deal with it then.
>And there have been a few, and I've caught them quickly this way.

We do not run nightly checks, although we do run checks after crashes
and before each level 0 single-user dump (biweekly).  Only after
crashes, which nearly always are the result of power failures, have we
had anything that needed fixing.  The 4.3BSD file system code is
just plain stable.

(Of course, we have one kernel development machine which is sometimes
more down than up....)

In summary, if you are willing to put up with bogus error reports,
the nightly `fsck -n's may be worthwhile.  We have not found this to
be the case here.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris