[comp.sys.mips] root file system not checked on reboot

peregier@watvlsi.waterloo.edu (Phil Regier) (08/01/89)

On our MIPS M-2000 running RISC/os 4.00, the root file system is not
being checked when the system reboots.  MIPS has been notified and is
looking at it, but I thought others may want to check if the same
thing is happening to them.  The message on the console says that the
root file system is being checked, but it really isn't (we use the
fast file system).  We get the following on the console:

Checking root file system () automatically.
The system is coming up.  Please wait.

If one looks in the /etc/bcheckrc file where this checking is supposed to
start, one finds that every return code of fsck should cause a message to
be echoed after the "Checking root ..." message, yet none is ever printed
(as can be seen from the relevant portion of the bcheckrc file below).
Also, no messages from fsck itself are printed.
This is because the fsstat command has completed with exit code 0 and
the script runs exit before it gets to fsck.  This is not good, since
we had a corrupted root file system; to fix it I had to boot from another
root partition and fsck the original root file system manually.
How a command like fsstat can determine when a root file system needs
checking without actually running fsck is beyond me.  Just removing
the fsstat line does not help.  The root file system does get checked,
but even if it is OK, fsck exits with exit code 4 and the system reboots,
checks the root file system, exits with exit code 4, ad infinitum.

bcheckrc_ffs() {
	fsstat $ROOTFS  >/dev/null 2>&1 && exit 0; 

	fsck.ffs -d -y $ROOTFS 
	case $? in
	0)	echo "fsck.ffs: finished normally"
		;;
	2)	#
		# Berkeley fsck semantics say enter single user now.
		# Wish I could.
		#
		echo "fsck.ffs: received SIGQUIT -- ignored."
		;;
	4)	echo "\nAutomatically Rebooting UNIX."
		uadmin 1 1;
		echo "Auto-reboot failed.  Reboot UNIX manually."
		;;
	8)	echo "fsck.ffs: abnormal exit."
		;;
	12)	echo "fsck.ffs: received SIGINTR -- disk checks terminated."
		;;
	*)	echo "fsck.ffs: unknown exit status."
		;;
	esac
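
The early exit described above can be tried without touching a disk.  The
following is only a sketch: fsstat_stub and fsck_stub are hypothetical
stand-ins for the real commands, and "return" is used in place of "exit" so
that the example does not terminate the shell that runs it.

```shell
#!/bin/sh
# Hypothetical stand-ins for the real commands.
fsstat_stub() { return 0; }          # pretend fsstat reports success
fsck_stub()   { echo "fsck ran"; }   # pretend fsck does its work

check_root() {
	# Same shape as the bcheckrc line: a zero status from the first
	# command ends the check before fsck is ever reached.
	fsstat_stub >/dev/null 2>&1 && return 0
	fsck_stub
}

check_root            # prints nothing: fsck_stub is never called
echo "status: $?"     # prints "status: 0"
```

Changing fsstat_stub to return a non-zero status makes check_root fall
through and print "fsck ran", which is the path the case statement in the
excerpt expects.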


P.S.
The message should read
Checking root file system (/dev/root) automatically.
The variable in the shell script is in the wrong case, but this is
just cosmetic.
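
The empty parentheses are what one would expect from a misspelled variable
name, since shell variable names are case sensitive and an unset variable
expands to nothing.  A small sketch, in which the lowercase name "rootfs" is
only an assumed example of the wrong-case spelling (the actual name used in
the script is not quoted here):

```shell
#!/bin/sh
# ROOTFS is set; "rootfs" is a hypothetical wrong-case spelling and is unset.
unset rootfs
ROOTFS=/dev/root

wrong="Checking root file system ($rootfs) automatically."
right="Checking root file system ($ROOTFS) automatically."

echo "$wrong"    # the parentheses come out empty
echo "$right"    # the parentheses contain /dev/root
```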

rogerk@mips.COM (Roger B.A. Klorese) (08/02/89)

In article <4529@watvlsi.waterloo.edu> peregier@watvlsi.waterloo.edu (Phil Regier) writes:
>How a command like fsstat can determine when a root file system needs
>checking without actually running fsck is beyond me.

It does some of the preliminary work of fsck.  It checks for a superblock, and
tests the BFS_CLEAN and BFS_MOUNT bits.  As BFS_CLEAN is set by fsck, and
reset by mount if it discovers that it was mounted at the time the system 
went down, this is being used to predict the cleanness of the filesystem.

The logic seems to be that the only way to have a damaged filesystem if it
was dismounted cleanly would be if something scribbled on the unmounted
device file.
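
The kind of flag test described above might be sketched as follows.  The bit
values are hypothetical, since the real BFS_CLEAN and BFS_MOUNT definitions
are not given in this thread, and the rule for combining the two bits is an
assumption made for illustration, not the actual fsstat source.

```shell
#!/bin/sh
# Hypothetical bit values; not taken from any header file.
BFS_CLEAN=1
BFS_MOUNT=2

# looks_clean FLAGS
# Returns 0 (skip fsck) when the clean bit is set and the mounted bit is
# clear, non-zero (run fsck) otherwise.  This rule is an assumption.
looks_clean() {
	f=$1
	[ $(( f & BFS_CLEAN )) -ne 0 ] && [ $(( f & BFS_MOUNT )) -eq 0 ]
}

for flags in 0 1 2 3; do
	if looks_clean "$flags"; then
		echo "flags=$flags: skip fsck"
	else
		echo "flags=$flags: run fsck"
	fi
done
```

Note that a test of this shape looks only at the flags.  It says nothing
about the state of the blocks themselves, so a file system that is marked
clean but is in fact damaged would be skipped.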
-- 
ROGER B.A. KLORESE      MIPS Computer Systems, Inc.      phone: +1 408 720-2939
928 E. Arques Ave.  Sunnyvale, CA  94086                        rogerk@mips.COM
{ames,decwrl,pyramid}!mips!rogerk
"I want to live where it's always Saturday."  -- Guadalcanal Diary

lamy@ai.utoronto.ca (Jean-Francois Lamy) (08/02/89)

Our SGI 4d/240 just did similar deeds (SysV admin stuff, like the MIPS).  Root
was fscked, machine rebooted.  fsck of root is skipped, even though it is in
fact still corrupt.  /usr is then found dirty, machine fsck's it.  The reset
button is pressed, the machine reboots. Both root and /usr fscks are skipped,
except both are still corrupt.

The people who came up with this brilliant scheme apparently have never seen a
partition that requires 3 fscks before coming out clean.  We will most
likely put in a good old forced fsck of root and usr no matter what, and find
a way to prevent the machine from going multi-user when anything but the most
minor damage is repaired.
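
A forced check of that sort might look roughly like the sketch below.  The
function name check_until_clean, the limit on passes, and the convention
that the checker exits 0 only when nothing is left to fix are all assumptions
of the sketch.  fake_fsck is a hypothetical stand-in so the loop can be tried
without a disk; a real startup script would pass its own fsck wrapper.

```shell
#!/bin/sh
# check_until_clean CHECKER DEVICE [MAX_PASSES]
# Runs CHECKER on DEVICE until it exits 0 or MAX_PASSES is used up.
# Returns 0 if some pass came back clean, 1 otherwise.
check_until_clean() {
	checker=$1 dev=$2 max=${3:-3}
	pass=1
	while [ "$pass" -le "$max" ]; do
		if "$checker" "$dev"; then
			echo "$dev: clean after pass $pass"
			return 0
		fi
		pass=$((pass + 1))
	done
	echo "$dev: still dirty after $max passes; stay single-user"
	return 1
}

# Hypothetical stand-in for fsck: reports damage twice, then succeeds.
calls=0
fake_fsck() {
	calls=$((calls + 1))
	[ "$calls" -ge 3 ]
}

check_until_clean fake_fsck /dev/root 5   # prints "/dev/root: clean after pass 3"
```

The non-zero return is the hook for refusing to go multi-user: the caller can
test it and stop instead of continuing with the rest of the startup.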

This likely belongs in comp.bugs.sys5

Jean-Francois Lamy               lamy@ai.utoronto.ca, uunet!ai.utoronto.ca!lamy
AI Group, Department of Computer Science, University of Toronto, Canada M5S 1A4

jabarby@watvlsi.waterloo.edu (Jim Barby) (08/02/89)

In article <24497@abbott.mips.COM>, rogerk@mips.COM (Roger B.A. Klorese) writes:
> In article <4529@watvlsi.waterloo.edu> peregier@watvlsi.waterloo.edu (Phil Regier) writes:
> >How a command like fsstat can determine when a root file system needs
> >checking without actually running fsck is beyond me.
> 
> It does some of the preliminary work of fsck.  It checks for a superblock, and
> tests the BFS_CLEAN and BFS_MOUNT bits.  As BFS_CLEAN is set by fsck, and
> reset by mount if it discovers that it was mounted at the time the system 
> went down, this is being used to predict the cleanness of the filesystem.
> 
> The logic seems to be that the only way to have a damaged filesystem if it
> was dismounted cleanly would be if something scribbled on the unmounted
> device file.
> -- 
> ROGER B.A. KLORESE      MIPS Computer Systems, Inc.      phone: +1 408 720-2939

Roger, is the root filesystem not an unmounted filesystem (ie always
there)?  If so, the above logic just does not wash.

In the case Phil mentioned, the root filesystem was corrupted at the
MIPS plant.  That is, MIPS shipped a hot disk that had not been fsck'd
properly.


-- 
	Jim Barby  (U of Waterloo VLSI Group, Waterloo Ont.)
	jabarby@watvlsi.waterloo.{cdn,edu,bitnet}
	jabarby@watvlsi.uwaterloo.ca

rogerk@mips.COM (Roger B.A. Klorese) (08/03/89)

In article <4533@watvlsi.waterloo.edu> jabarby@watvlsi.waterloo.edu (Jim Barby) writes:
>In article <24497@abbott.mips.COM>, rogerk@mips.COM (Roger B.A. Klorese) writes:
>> The logic seems to be that the only way to have a damaged filesystem if it
>> was dismounted cleanly would be if something scribbled on the unmounted
>> device file.
>Roger, is the root filesystem not an unmounted filesystem (ie always
>there)?  If so, the above logic just does not wash.

I agree.  I was just explaining the logic in the code.  I believe that it
is not sufficiently pessimistic.  In fact, the special check for root is
simply "is it not marked clean"; I'm not sure this can ever not be the case.

We are looking for solutions.  We believe we have a general improvement in 
mind for fsck at startup, but it is too early to tell if it will make it into
the next major release.
-- 
ROGER B.A. KLORESE      MIPS Computer Systems, Inc.      phone: +1 408 720-2939
928 E. Arques Ave.  Sunnyvale, CA  94086                        rogerk@mips.COM
{ames,decwrl,pyramid}!mips!rogerk
"I want to live where it's always Saturday."  -- Guadalcanal Diary