[comp.unix.xenix] Bug in SCO 3.2 kernel -- please read

abs@sco.COM (Amy Snader) (10/06/89)

Dear Netfolk:

We have found a bug in SCO UNIX 3.2.0f that can cause filesystem
corruption and/or panics under some rather unusual circumstances.
Very few sites will ever be bitten by this bug, but since it has a
simple workaround, I urge you all to install the workaround even if
you are not at risk.

Provoking the bug requires that you bring the system down dirty, without
running /etc/haltsys or /etc/shutdown, and then boot a kernel
*different* from the one you were running before.
Switching between kernels that differ only in how their kernel
parameters have been tuned should be fine, but switching between
kernels that contain different sets of device drivers can sometimes
trigger the problem.

When you reboot the system using the new kernel, the filesystem
will be dirty from the previous abnormal shutdown, and fsck will
ask you if you want to clean your filesystem.  If you
answer "yes" to this question, fsck will clean the filesystem
and then attempt to remount the root device.
The bug is in the code that remounts the root -- some data
can be lost across the remount.  Symptoms of this can include 
a panic, a trap, or (rarely) actual filesystem corruption.

If a panic occurs, it will be a "panic: trying to free
already free block".  If a trap occurs, the `eip' register
will point to an address in the kernel routine `getfblk'.  
There's not much I can say about the filesystem corruption,
except to assure you that it is the least likely
of the possible scenarios.

This bug has a very simple workaround.  If the filesystem has
already been modified by the time fsck is run, the bug will not
occur.  The shell script that causes fsck to be run is called
`/etc/bcheckrc'.  Adding a line to this script that slightly
modifies the filesystem just before fsck runs prevents the bug.

In the file `/etc/bcheckrc', immediately before the lines
that read:
	[ "$dofsck" ] &&
	/bin/su root -c "/etc/fsck -y -s -D -b -a ${rootfs}"

place the line:
	> /lost+found/magic_file ; rm /lost+found/magic_file

Note that the name "magic_file" is not significant.  You
can rename this file as you like, but I do recommend that you place
the file within "/lost+found", because that directory
has slots preallocated for some new files.   
If perchance the directory /lost+found does not exist, 
please create it.
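
For those who would rather have the script take care of a missing
/lost+found itself, here is a rough sketch of the same workaround
wrapped in a shell function.  The function name `dirty_fs' and its
optional directory argument are my own invention for illustration;
they are not part of the stock /etc/bcheckrc, and I am assuming your
/bin/sh supports shell functions.

```shell
# Hypothetical helper; not part of the stock /etc/bcheckrc.
# Usage: dirty_fs [directory]    (directory defaults to /lost+found)
dirty_fs() {
	dir=${1:-/lost+found}

	# Create the directory if it is missing.  Note that a plain
	# mkdir does not preallocate any directory slots.
	[ -d "$dir" ] || mkdir "$dir" || return 1

	# Create an empty file and remove it again.  This is the small
	# filesystem modification that the workaround relies on.
	: > "$dir/magic_file" && rm "$dir/magic_file"
}
```

With this definition placed earlier in the script, a bare call to
`dirty_fs' on the line immediately before the `[ "$dofsck" ] &&' test
would stand in for the one-line version shown above.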

This bug has been fixed in the upcoming Open Desktop release
of 3.2.  No support-level kernel fix is planned, though,
because the bug can so easily be worked around.
The bug does not affect any release of Xenix.

If any of you have any questions about this bug, 
please feel free to mail me.
	--Amy 
	(uunet!sco!abs decvax!microsof!sco!abs)