[unix-pc.general] Endless bummer...

jcm@mtunb.ATT.COM (was-John McMillan) (08/03/89)

In article <802@bagend.UUCP> jan@bagend.UUCP (Jan Isley) writes:
>In article <9832@csli.Stanford.EDU> crimmins@csli.stanford.edu (Mark Crimmins) writes:
>>This has happened to me a couple of times, and I wonder if anyone
>>knows why.  I turn on my 3b1 (3.5M 67HD rev. 3.5 sys and utils) and it
>>goes through the normal boot procedure until the "checking stored
>>files" screen turns to gibberish and then the boot procedure starts
		^^^^^^^^^^^^^^^^^^
>>over (and over and over).  The problem goes away when I "upgrade" all
>>system files from floppy, including utilities.
:
>FSCK has a nasty habit of saying it fixed a problem it found in the file
			   ^^^^^^^^^^^^^^^
>system when it really did not fix it.  You know it found a problem if the
>system does a reboot after the "checking stored files" routine.  Usually
>the system will come up after the second time through.  But, sometimes
>the problem was not really fixed, and fsck will *never* be able to fix
>it on a mounted file system.
:

Maybe someone has clarified this by now.  (Or maybe I'm missing the
exact scenario: I get lost in the technical jargon of "turns to gibberish".)
(Well, I guess I understand that term regarding a friend's daughter....)

Occasionally, you get folks who miss the point that you can fool the
File System some of the time -- but not ALL of the time:
      +	WRITING A FILE uses/consumes/alters the File System FREE-LIST.
      +	FSCK (often) re-writes the File System FREE-LIST.
    Ergo:
      +	It's a less than brilliant move to have FSCK write a log FILE
		on the File System it's manipulating -- this intrinsically
		attempts to alter the very data fsck's correcting.
    Example:
      + If the Free List is corrupted -- perhaps it was even the CAUSE
		of the crash -- the FSCK log file is building an INODE
		(file) using that corrupted data, and building it in
		RAM while the DISK is being fixed.  Then it gets moved
		to the disk....
      + Or, maybe, that INODE is written to disk, but the Superblock,
		as created by FSCK, still marks some of those now-used
		blocks as FREE....  Time for "Duplicate BLOCKS"
		(or whatever).

I DON'T know of any FSCK errors -- FSCK probably DOES correct the problems.
But SOMEONE wrote an RC script that corrupts the data AS IT IS BEING
CORRECTED.  This has been discussed, here, many times.  It will be discussed
many more.  I have requested this be fixed w/in AT&T sources.

	The only correction is to ELIMINATE any saving of FSCK
	output IN A FILE on the same FS being checked.  Period.
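
A minimal sketch of that correction, not the stock script: the same two
fsck passes used by the installed /etc/rc (quoted later in this thread),
but with the repair output sent to the console instead of a file on the
file system being repaired. The function name check_filesystems and the
CONSOLE variable are made up for illustration, and it is an assumption
that /dev/console is writable at this point in the boot.

```shell
# Hypothetical boot-time check: fsck output never lands in a file on
# the file system that fsck is busy repairing.
FSCK=${FSCK:-/etc/fsck}            # path as used in the thread's /etc/rc
CONSOLE=${CONSOLE:-/dev/console}   # assumption: console accepts output here

check_filesystems() {
    # Quiet preen pass first.  Only if it exits non-zero do we run the
    # full repair, and its messages go to the console, not to a log file.
    "$FSCK" -pq > /dev/null || "$FSCK" -y > "$CONSOLE"
}
```

The cost is that the repair messages are not kept anywhere; the gain is
that nothing allocates blocks from the free list while fsck rebuilds it.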

So far, I trust FSCK far more than most C programmers I know !-)

john mcmillan	-- att!mtunb!jcm

jan@bagend.UUCP (Jan Isley) (08/04/89)

In article <1582@mtunb.ATT.COM> jcm@mtunb.UUCP (John McMillan) writes:
>In article <802@bagend.UUCP> jan@bagend.UUCP (Jan Isley) writes:
>>In article <9832@csli.Stanford.EDU> crimmins@csli.stanford.edu (Mark Crimmins) writes:

To summarize:

Mark describes a problem...

I offer a simple suggestion to fix his problem...

John offers a series of comments that are, IMHO, quite clearly designed to
make himself look totally enlightened about Mark's problem and anything 
else one could possibly think of, while adding smug commentary about our
use of the wonderfully rich English language.

John then describes what I understand to be a correct assessment of the
behavior of fsck, far better than I did of course, then offers:

>	The only correction is to ELIMINATE any saving of FSCK
>	output IN A FILE on the same FS being checked.  Period.

This is *exactly* what my suggestion does.  I offered a way to do this.
Where is your suggestion?  What is your point?

John, you may be smarter than the average bear, but the bears that I have
met have had better manners.

Jan
---
jan@bagend | gatech!bagend!jan | h (404)434-1335 | w (404)425-5700
	Humankind cannot bear very much reality.   T. S. Eliot

wilkes@mips.COM (John Wilkes) (08/05/89)

In article <822@bagend.UUCP> jan@bagend.UUCP (Jan Isley) writes:
>In article <1582@mtunb.ATT.COM> jcm@mtunb.UUCP (John McMillan) writes:
>>In article <802@bagend.UUCP> jan@bagend.UUCP (Jan Isley) writes:
>>>In article <9832@csli.Stanford.EDU> crimmins@csli.stanford.edu (Mark Crimmins) writes:
>
>To summarize:

[summary, flame, and whining deleted]

I was going to send private e-mail to Jan, but against my better judgement
decided to flame him publicly.

Jan, your simple suggestion to fix Mark's problem is certainly useful;
however, I believe John's description of the problem was concise and also
useful.  You did not explain the nature of the problem at all, merely
offered a way to solve it.  John did not really offer a concrete solution,
but he did describe what is going on when you redirect the output of fsck
to the file system being checked.  He also suggested that he's made some
sort of "official" request within the bowels of ATT to have this
addressed by those who maintain the sources.

Both commentaries have value.  What's your problem?  I did not infer that
John was saying your solution was incorrect in any way; did you?  Were your
feathers ruffled somehow?  Are you that thin-skinned?  You will not find
satori that way.

Your response sure sounded like a personal attack to me.  If you and John
have some personal history that has now flared up in public, please put it
back in a box or pick nits somewhere else.  This is not appropriate for the
unix-pc groups (not that I stake any claim to being a net.policeman.)

Jan, your manners are every bit as bad as John's (and mine aren't visibly
better, either.)

Take note that followups have been directed to alt.dev.null and that I will
post no more on your or John McMillan's personal problems.  I will respond
to private e-mail, however.

-- 
-wilkes

wilkes@mips.com   -OR-   {ames, decwrl, pyramid}!mips!wilkes

jbm@uncle.UUCP (John B. Milton) (08/07/89)

In article <1582@mtunb.ATT.COM> jcm@mtunb.UUCP (John McMillan) writes:
>In article <802@bagend.UUCP> jan@bagend.UUCP (Jan Isley) writes:
>>In article <9832@csli.Stanford.EDU> crimmins@csli.stanford.edu (Mark Crimmins) writes:
>>>This has happened to me a couple of times, and I wonder if anyone
>>>knows why.  I turn on my 3b1 (3.5M 67HD rev. 3.5 sys and utils) and it
>>>goes through the normal boot procedure until the "checking stored
>>>files" screen turns to gibberish and then the boot procedure starts
>		^^^^^^^^^^^^^^^^^^
>>>over (and over and over).  The problem goes away when I "upgrade" all
>>>system files from floppy, including utilities.
>:
Hmm. Turn to gibberish. I would read that to mean the binary count pattern
the boot ROM puts up when it tests video RAM.

I modified my /etc/rc a long time ago. Yeah yeah. Well, let's take a closer look.
Let's look at the relevant code without the bogus comments.

/etc/fsck -pq > /dev/null || (
  if [ -r /etc/.installdate ]; then
    date > /etc/.lastfsck
    /etc/fsck -y >> /etc/.lastfsck
  else
    /bin/sh
  fi
)

The -p for preen switch seems to be unique to the UNIXpc. The man page for fsck
specifically mentions this feature should be used in /etc/rc for un-attended
booting. Booting does not always happen when the stupid comments say it will.
When the "fsck -pq" finds and fixes minor problems it WILL reboot the system.
The vast majority of all file system problems are minor. When minor fixes are
complete and the system reboots, the || part obviously never gets run. If the
"fsck -pq" finds something real nasty, it exits with a bad status and the ||
part does get executed. If you are just now installing the system, the file
.installdate will not exist, and it will just dump you at a shell prompt with
scary error messages. If your system has been installed, the redirection to
/etc/.lastfsck is done with an "fsck -y". I do very much agree that this is
very stupid. The correct way to do this would have been to overwrite a pre-
existing, pre-allocated file, much the same way /lost+found is used, using a
special switch to fsck, say -L. If too much is written, output just stops.
The way it is with the installed /etc/rc though, when things are really
bad, /etc/rc makes it worse. Hmm. Maybe bad enough for a, ah, service call?
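
The pre-allocated log idea can be approximated from the shell, without
the proposed -L switch (which does not exist). The sketch below is only
an illustration: log_in_place is a made-up name, and it assumes a dd
that supports conv=notrunc and a head that supports -c, neither of which
is confirmed for the 3.5 UNIX PC utilities. It overwrites an existing
file in place and never grows it, so on a file system that updates
blocks in place no new blocks should be taken from the free list.

```shell
# Hypothetical helper: copy stdin over an existing, pre-sized log file
# without truncating or extending it.
log_in_place() {
    logfile=$1
    # Refuse to create the file: creating it would allocate an inode.
    [ -f "$logfile" ] || return 1
    # Current size in bytes; this is the most we will ever write.
    size=$(wc -c < "$logfile" | tr -d ' ')
    # head passes on at most $size bytes; cat then drains the rest so
    # the writer is not killed by a broken pipe.  dd overwrites from the
    # start of the file and, with conv=notrunc, leaves the length alone.
    { head -c "$size"; cat > /dev/null; } |
        dd of="$logfile" conv=notrunc 2> /dev/null
}
```

Usage would look like "/etc/fsck -y | log_in_place /etc/.lastfsck",
with /etc/.lastfsck created at a fixed size ahead of time. If too much
is written, the extra output is simply dropped. Writing still updates
the inode's timestamps, so this narrows the problem rather than removing
it; sending the output somewhere off the checked file system stays the
safer fix.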

John
-- 
John Bly Milton IV, jbm@uncle.UUCP, n8emr!uncle!jbm@osu-cis.cis.ohio-state.edu
(614) h:294-4823, w:785-1110; N8KSN, AMPR: 44.70.0.52; Don't FLAME, inform!