ben%brandeis%csnet-relay@sri-unix.UUCP (12/20/83)
Is there anyone out there who knows what Unix error messages at crash time mean? I am talking about the ones not explained in section 8 of volume 1. Messages like "panic: mba, zero entry", "unit 0: random interrupt", or "machine check". CRASH(8) says the following about machine checks: "We should describe machine checks, and will someday. For now, ask someone who knows (like your friendly field service people)." !!!????!!!
jsq@ut-sally.UUCP (John Quarterman) (12/22/83)
DEC has apparently just set up a facility at Colorado Springs for interpreting Unix error messages. Our DEC rep tried some soft ecc errors on them and came up with the correct interpretation. We were also having machine checks and they managed those, too. These were 4.2BSD errors on a VAX-11/780.

--
John Quarterman, CS Dept., University of Texas, Austin, Texas
{ihnp4,seismo,ctvax}!ut-sally!jsq, jsq@ut-sally.{ARPA,UUCP}
ggs@ulysses.UUCP (Griff Smith) (12/22/83)
With regard to the following:

>Is there anyone out there who knows what Unix error messages at crash time
>mean? I am talking about the ones not explained in section 8 of volume 1.
>Messages like "panic: mba, zero entry", "unit 0: random interrupt", or
>"machine check".

I suppose a direct reply would have been more appropriate, but with a path like "...!sri-unix!ben%brandeis@csnet-relay" a mail response wouldn't stand a snowball's chance in Hell of getting there.

"panic: mba, zero entry" happens under 4.1BSD and 4.2BSD when you read a mag tape that has a hard read error. It is caused by some brain damage in mt.c that makes it assume that mba.c knows how to "read backwards". When mt.c gets the "read opposite" status from the tape controller, it passes a "read backwards" request to mba.c, along with the buffer address and buffer size. Since this is "read backwards", mba.c is supposed to map the pages of the buffer into the mba address space and then set the initial input address to be the end of the buffer. Unfortunately, it leaves the starting address unchanged. Tape input starts at the beginning of the buffer, erases any innocent static or stack variables in front of the buffer until it reaches the beginning of the page, then falls off the end of the world. If you are lucky, your process then aborts with a strange error message resulting from using the text in those variables as binary numbers. If you are unlucky, the kernel is deranged and panics when it tries to use a bent table. As far as I can tell, you get the panic if the input buffer is smaller than the input block and you get the mangled static area if the buffer is larger than the input block.

"unit 0: random interrupt" should be "unit 0: non-data transfer error interrupt, error status = xxxxxx". I changed my mt.c to be something like that, and found that the error status code is usually 32 (base 8).
My DEC tape controller manual says this means "TM fault B", otherwise known as "I am broken, please fix me". The error code in the LED display inside the TM front panel gives further help to the DEC CE that you call in when this happens.

I intend to fix these problems soon, unless someone posts reasonable solutions and saves me the trouble. Whether the fixes can escape the proprietary black hole of AT&T Bell Laboratories is another matter.

--
Griff Smith
AT&T Bell Laboratories, Murray Hill
Phone: (201) 582-7736
Internet: ggs@ulysses.uucp
UUCP: ulysses!ggs
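
The "read backwards" addressing problem described in the post above can be illustrated with a small model. This is only a sketch under stated assumptions: the names `xfer_dir`, `xfer_start_offset`, and `xfer_simulate` are hypothetical and are not taken from mt.c or mba.c, and the model ignores page mapping entirely. It shows only the one point the post makes, that a reverse transfer should begin at the end of the buffer and fill downward, whereas leaving the start address at the beginning would send the stores below the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is a model, not the mba.c interface. */
enum xfer_dir { XFER_FORWARD, XFER_REVERSE };

/*
 * Offset of the first byte the controller stores.
 * Forward transfers fill upward from the start of the buffer;
 * reverse transfers fill downward from its last byte.
 */
size_t xfer_start_offset(enum xfer_dir dir, size_t buflen)
{
    assert(buflen > 0);
    return dir == XFER_REVERSE ? buflen - 1 : 0;
}

/*
 * Model the controller storing an n-byte tape block into buf.
 * On a reverse read the block arrives last byte first.
 * Stops at the buffer boundary; returns the number of bytes stored.
 */
size_t xfer_simulate(enum xfer_dir dir, unsigned char *buf, size_t buflen,
                     const unsigned char *block, size_t n)
{
    size_t start = xfer_start_offset(dir, buflen);
    size_t stored;

    for (stored = 0; stored < n && stored < buflen; stored++) {
        if (dir == XFER_REVERSE)
            buf[start - stored] = block[n - 1 - stored];
        else
            buf[start + stored] = block[stored];
    }
    return stored;
}
```

With a 4-byte buffer and a 4-byte block, a reverse transfer in this model starts at offset 3 and leaves the block in its normal byte order. Starting a reverse transfer at offset 0 instead would put every byte after the first below the buffer, which matches the overwriting of neighbouring variables described in the post.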