[net.unix-wizards] Unix Error Messages at Crash Time

ben%brandeis%csnet-relay@sri-unix.UUCP (12/20/83)

Is there anyone out there who knows what Unix error messages at crash time
mean? I am talking about the ones not explained in section 8 of volume 1.
Messages like "panic: mba, zero entry", "unit 0: random interrupt", or 
"machine check". 

CRASH(8) says the following about machine checks:

	 "We should describe machine checks, and will someday. For now,
	  ask someone who knows (like your friendly field service people)."

!!!????!!!

jsq@ut-sally.UUCP (John Quarterman) (12/22/83)

DEC has apparently just set up a facility at Colorado Springs for
interpreting Unix error messages.  Our DEC rep tried some soft ECC
errors on them, and they came up with the correct interpretation.  We
were also having machine checks, and they managed those, too.
These were 4.2BSD errors on a VAX-11/780.
-- 
John Quarterman, CS Dept., University of Texas, Austin, Texas
{ihnp4,seismo,ctvax}!ut-sally!jsq, jsq@ut-sally.{ARPA,UUCP}

ggs@ulysses.UUCP (Griff Smith) (12/22/83)

With regard to the following:

    >Is there anyone out there who knows what Unix error messages at crash time
    >mean? I am talking about the ones not explained in section 8 of volume 1.
    >Messages like "panic: mba, zero entry", "unit 0: random interrupt", or 
    >"machine check". 

I suppose a direct reply would have been more appropriate, but with a path
like "...!sri-unix!ben%brandeis@csnet-relay" a mail response wouldn't stand
a snowball's chance in Hell of getting there.

"panic: mba, zero entry" happens under 4.1BSD and 4.2BSD when you read a
mag tape that has a hard read error.  It is caused by some brain damage in
mt.c that makes it assume that mba.c knows how to "read backwards".  When
mt.c gets the "read opposite" status from the tape controller, it passes
a "read backwards" request to mba.c, along with the buffer address and
buffer size.  Since this is "read backwards", mba.c is supposed to map
the pages of the buffer into the mba address space and then set the
initial input address to be the end of the buffer.  Unfortunately, it
leaves the starting address unchanged.  Tape input therefore starts at
the beginning of the buffer and runs backwards through memory, erasing
any innocent static or stack variables in front of the buffer until it
reaches the beginning of the page, then falls off the end of the
world.  If you are lucky, your process then aborts with a
strange error message resulting from using the text in those variables
as binary numbers.  If you are unlucky, the kernel is deranged and panics
when it tries to use a bent table.  As far as I can tell, you get the
panic if the input buffer is smaller than the input block and you get the
mangled static area if the buffer is larger than the input block.
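
A toy user-space model of the failure described above may make it easier
to picture.  It is only a sketch, not the actual mba.c or mt.c code: the
names reverse_transfer, clobbered_below, and simulate, the memory size,
and the buffer offsets are all made up for illustration, and the model
assumes the device stores each byte at the next lower address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MEM_SIZE 64   /* toy "memory"; size is an arbitrary assumption */
#define BUF_OFF  32   /* hypothetical offset of the I/O buffer */
#define BUF_LEN  16   /* hypothetical buffer length */

/*
 * Simulate a reverse transfer: bytes are stored at descending
 * addresses, beginning just below `start`.  Returns the number of
 * bytes stored before running off the bottom of memory.
 */
size_t reverse_transfer(unsigned char *mem, size_t start,
                        const unsigned char *data, size_t len)
{
    size_t n = 0;

    assert(start <= MEM_SIZE);
    while (n < len && start > 0)
        mem[--start] = data[n++];
    return n;
}

/* Count the bytes in front of the buffer that were overwritten. */
size_t clobbered_below(const unsigned char *mem)
{
    size_t i, count = 0;

    for (i = 0; i < BUF_OFF; i++)
        if (mem[i] != 0)
            count++;
    return count;
}

/*
 * Run one simulated read of `len` bytes.  With buggy != 0 the start
 * address is left at the front of the buffer, as in the post; otherwise
 * it is moved to the end of the buffer, as a reverse transfer needs.
 */
size_t simulate(int buggy, size_t len)
{
    unsigned char mem[MEM_SIZE];
    unsigned char data[MEM_SIZE];
    size_t start = buggy ? BUF_OFF : BUF_OFF + BUF_LEN;

    assert(len <= sizeof data);
    memset(mem, 0, sizeof mem);      /* zero means "untouched" */
    memset(data, 0xAA, sizeof data); /* stand-in for tape data */
    reverse_transfer(mem, start, data, len);
    return clobbered_below(mem);
}
```

In this model, simulate(0, BUF_LEN) leaves everything in front of the
buffer alone, while simulate(1, BUF_LEN) overwrites BUF_LEN bytes that
were never part of the buffer.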

"unit 0: random interrupt" should be "unit 0: non-data transfer error
interrupt, error status = xxxxxx".  I changed my mt.c to print something
like that, and found that the error status code is usually 32 (base 8).
My DEC tape controller manual says this means "TM fault B", otherwise
known as "I am broken, please fix me".  The error code in the LED display
inside the TM front panel gives further help to the DEC CE that you call
in when this happens.
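
For illustration, a message of that shape could be built as in the
sketch below.  This is not the real mt.c change: format_mt_error is a
hypothetical user-space helper, and snprintf stands in for whatever the
kernel's own printf would do.  The only point is that the status is
printed in octal, so that it can be looked up directly in the controller
manual (octal 32 is decimal 26).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the more informative message into buf.
 * The status is printed with %o because the manual lists the codes in
 * octal.  Returns what snprintf returns (the length of the full
 * message, not counting the terminating NUL).
 */
int format_mt_error(char *buf, size_t size, int unit, unsigned int status)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
        "unit %d: non-data transfer error interrupt, error status = %o",
        unit, status);
}
```

Called with unit 0 and status 032, this yields "unit 0: non-data
transfer error interrupt, error status = 32".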

I intend to fix these problems soon, unless someone posts reasonable
solutions and saves me the trouble.  Whether the fixes can escape the
proprietary black hole of AT&T Bell Laboratories is another matter.
-- 

Griff Smith	AT&T Bell Laboratories, Murray Hill
Phone:		(201) 582-7736
Internet:	ggs@ulysses.uucp
UUCP:		ulysses!ggs