[net.unix-wizards] VAX soft error messages

rivero@kovacs.UUCP (Michael Foster Rivero) (07/09/85)

	Anybody out there know what the soft error messages on the system
	console really mean? So far, nobody can tell us anything more
	than, "It's some kinda soft memory error message from the
	error correction system." Big help! We can't figure out where the
	flaky memory is supposed to be.

	Typical message on the console looks like....

	Msoft  ecc 19af sym (58) corrected

	We're starting to get more of them. Whatever bug is in the system
	is BREEDING!

					Thanks in advance
						Mike Rivero

chris@umcp-cs.UUCP (Chris Torek) (07/13/85)

"soft ecc" messages come from all sorts of places in all sorts of
Unix kernels.  However, the one you mentioned sounds like a 780 or
750 memory controller (mcr) ecc error.  The 4BSD kernels print
something like this:

	mcr%d: soft ecc addr %x syn %x

You can then use your manufacturer's tables to look up the "address"
and "syndrome number"; this will point you to the bad chip.  The
tables vary for different memory boards and systems.  4.2/4.3 BSD
kernels will even (optionally) point you to the chip, IF you have
all Trendata boards....
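
For anyone who wants to collect these lines from a console log before
going to the tables, here is a rough sketch of pulling the three fields
back out of a line in the format quoted above. This is not the kernel's
code; the struct and function names are made up for illustration, and
the only thing taken from the kernel is the format string itself. It
does not attempt the address/syndrome-to-chip lookup, since that depends
on the board.

```c
#include <assert.h>
#include <stdio.h>

/* Fields carried by one "mcr%d: soft ecc addr %x syn %x" console line.
 * (Hypothetical struct, named here only for the example.) */
struct mcr_soft_ecc {
    int unit;               /* memory controller number (the %d) */
    unsigned int addr;      /* address as reported (the first %x) */
    unsigned int syndrome;  /* ECC syndrome (the second %x) */
};

/*
 * Parse one console line into *out.
 * Returns 1 if all three fields were read, 0 if the line is not in
 * the expected format.  The leading space in the format lets sscanf
 * skip any indentation in front of "mcr".
 */
int parse_mcr_soft_ecc(const char *line, struct mcr_soft_ecc *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, " mcr%d: soft ecc addr %x syn %x",
                  &out->unit, &out->addr, &out->syndrome) == 3;
}
```

Fed an invented line such as "mcr0: soft ecc addr 19af syn 58", this
fills in unit 0, addr 0x19af and syndrome 0x58. The console line quoted
in the original question is worded differently, so it would not match
this pattern and the function would return 0; a kernel that prints that
wording would need its own format string.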

(Maybe someday I'll put in the National Semi board tables.  Too bad
there's no way to pull a board identifier out of the controller.
Also too bad DEC doesn't seem to give out ECC tables---then again
DEC solders the chips in directly (better for stability, worse for
repairs).)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland