[comp.unix.xenix.sco] fsck fails on /dev/root under SCO

pf@artcom0.north.de (Peter Funk) (06/03/91)

In <1991Jun02.184143.15566@virtech.uucp> 
	cpcahil@virtech.uucp (Conor P. Cahill) writes:
[....on problems with fsck...]
cpc> >Opinions accepted, but please document them as such!

cpc> All this stuff is opinions.

Taking up this thread, I'll try to tell you something about our experiences
with several Systems running SCO XENIX 386 rel. 2.3.2.
Every once in a while a customer presses RESET, or turns the power off, or...
Sigh... you surely know this kind of PeeCee-User.

We've experienced the following Problem when running fsck on the
root file system (which is invoked automatically during the next boot):

The superblock or the free list header (I'm not familiar enough with the
internals) is NOT correctly rewritten by fsck, which under certain
circumstances leaves the whole file system corrupt.

In some rare situations the file system ends up in a state
where subsequent attempts with 'fsck' exit during phase 1 without
any further error message (only the nonzero exit code indicates an error).

Since discovering this problem, we instruct our customers to use
a specially prepared "emergency boot" floppy disk, which automatically runs
'fsck' from floppy on /dev/hd0root.  This workaround solves the problem,
but the manual handling is ... uh... cumbersome? In German I would
call it "nervig" (annoying) ;-)

Regards, Peter
P.S.: Excuse my awful English grammar and spelling
-=-=-
Peter Funk \\ ArtCom GmbH, Schwachhauser Heerstr. 78, D-2800 Bremen 1
Work at home: Oldenburger Str.86, D-2875 Ganderkesee 1/phone : no chance ;-)

bill@franklin.com (bill) (06/04/91)

When fsck'ing the root file system, you must always reboot your
system immediately after the fsck, if fsck changes anything, and
you must do so without syncing. (There are some exceptions, but
why bother?)

There is probably an option on your fsck that will cause it to do
the necessary reboot automatically. You should use it if it is
there. If fsck won't do an automatic reboot but will return an
indication that it has changed the file system, you should have
your startup script immediately bring your system down when fsck
indicates a change. NB: the shutdown script will do the sync, so
you can't use it. You may have a command like uadmin that can do the reboot instead.
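
A minimal sketch of the startup logic described above, written in Python purely to illustrate the decision (on XENIX this would live in a shell startup script). The exit-status bits are an ASSUMPTION borrowed from the later Linux fsck(8) convention (1 = errors corrected, 2 = reboot required, 4 = errors left uncorrected, 8 = operational error); XENIX fsck's own codes may differ, and the "-y" flag is assumed for illustration. The reboot itself (via uadmin or similar) is left to the caller.

```python
import subprocess

# ASSUMPTION: exit-status bits follow the later Linux fsck(8) convention,
# not anything documented for XENIX.
FSCK_CORRECTED = 1      # file system errors were corrected
FSCK_REBOOT = 2         # system should be rebooted
FSCK_UNCORRECTED = 4    # errors left uncorrected
FSCK_OPERROR = 8        # operational error (e.g. a silent phase-1 exit)


def action_for_root_fsck(status: int) -> str:
    """Map fsck's exit status on the root file system to a startup action."""
    if status < 0 or status & (FSCK_UNCORRECTED | FSCK_OPERROR):
        # Killed by a signal or could not finish: keep the operator in
        # single-user mode rather than continuing on a damaged root.
        return "stay-single-user"
    if status & (FSCK_CORRECTED | FSCK_REBOOT):
        # fsck changed the disk behind the kernel's back: reboot at once,
        # WITHOUT sync, so the stale in-core superblock is never written.
        return "reboot-no-sync"
    return "continue"


def check_root(device: str = "/dev/root", fsck: str = "/bin/fsck") -> str:
    # HYPOTHETICAL invocation; "-y" (answer yes to all repairs) is assumed.
    result = subprocess.run([fsck, "-y", device])
    return action_for_root_fsck(result.returncode)
```

Under these assumptions, a run that quits silently but sets the 4 or 8 bit keeps the machine in single-user mode instead of bringing it up on a damaged root.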

In article <3206@artcom0.north.de> pf@artcom0.north.de (Peter Funk) writes:
: The superblock or the free list header (I'm not familiar enough with the
: internals) is NOT correctly rewritten by fsck, which under certain
: circumstances leaves the whole file system corrupt.

What happens is this: fsck is doing I/O directly to the disk. When
it updates the superblock, it writes the correct thing to the
disk. However, there is a copy of the superblock sitting in the
kernel somewhere, which is *not* updated (for what may have been
an adequate reason way back when but for what is now no good
reason at all). If anything causes that old copy of the superblock
to be written to the disk, you lose.
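
The failure mode can be modeled in a few lines. This is a deliberately simplified sketch, not kernel code: the `Volume` class, the `free_blocks` field and the method names are hypothetical, and a real kernel writes its superblock back only when it considers the in-core copy modified.

```python
from dataclasses import dataclass, replace


@dataclass
class Superblock:
    free_blocks: int  # HYPOTHETICAL stand-in for the free-list header


class Volume:
    """Toy model: the on-disk superblock plus the kernel's in-core copy."""

    def __init__(self, free_blocks: int):
        self.on_disk = Superblock(free_blocks)
        # Read once at mount time and kept in kernel memory.
        self.in_core = replace(self.on_disk)

    def fsck_repair(self, correct_free: int) -> None:
        # fsck does raw I/O, so only the disk copy is fixed.
        self.on_disk = Superblock(correct_free)

    def sync(self) -> None:
        # Flushing the in-core copy overwrites fsck's repair.
        self.on_disk = replace(self.in_core)

    def reboot_without_sync(self) -> None:
        # The in-core copy is discarded; the next mount rereads the disk.
        self.in_core = replace(self.on_disk)


lost = Volume(free_blocks=1200)      # stale count after a power failure
lost.fsck_repair(correct_free=1150)
lost.sync()
print(lost.on_disk.free_blocks)      # 1200: the repair is gone

kept = Volume(free_blocks=1200)
kept.fsck_repair(correct_free=1150)
kept.reboot_without_sync()
print(kept.in_core.free_blocks)      # 1150: the repair survives
```

The second case is why the reboot must come before any sync: once the in-core copy is thrown away, the kernel can only pick up fsck's corrected superblock.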

: Since discovering this problem, we instruct our customers to use
: a specially prepared "emergency boot" floppy disk, which automatically runs
: 'fsck' from floppy on /dev/hd0root.  This workaround solves the problem,
: but the manual handling is ... uh... cumbersome? In German I would
: call it "nervig" (annoying) ;-)

If your fsck has no way to force a reboot, what you
want to do is this: make sure that your systems come up in single-user
mode. Instruct your customers to run fsck on the root
manually, reboot without entering *any* other commands after
the fsck, and only then enter the normal operating mode (via init 3 or
whatever).

: P.S.: Excuse my awful English grammar and spelling

Actually, it is quite good. There are many supposedly literate
natives who don't do as well. The only glaring things were those
extra capital letters. We're nowhere near as consistent as you
Germans on what we capitalize. :-)

jeffl@comix.UUCP (Jeff Liebermann) (06/06/91)

In article <3206@artcom0.north.de> pf@artcom0.north.de (Peter Funk) writes:
>In <1991Jun02.184143.15566@virtech.uucp> 

[ good info on fsck deleted]

>In some rare situations the file system ends up in a state
>where subsequent attempts with 'fsck' exit during phase 1 without
>any further error message (only the nonzero exit code indicates an error).
 
From SCO's sosco support fix xnx124....
-----------------------------------------------------------------------
The enclosed Support Level Supplement contains a new /bin/fsck which
solves problems experienced with fsck quitting silently under SCO XENIX
386 Operating System Releases 2.3.0, 2.3.1, or 2.3.3 for AT architecture
machines. It is engineered specifically for these releases of the Operating
System, and should not be installed on a system running a release other
than these.
-----------------------------------------------------------------------

-- 
# Jeff Liebermann   Box 272   1540 Jackson Ave     Ben Lomond    CA  95005
# 408.336.2558  voice           WB6SSY @ KI6EH.#NOCAL.CA.USA   packet radio
# 408.699.0483  digital pager   73557,2074       cis
# jeffl@comix.santa-cruz.ca.us  uunet!comix!jeffl  jeffl%comix@ucscc.ucsc.edu