[net.unix-wizards] VAX 750 machine checks

jim (11/01/82)

Ah yes, the old translation buffer parity fault problem.  This is apparently
quite common on 750s running Unix.  There was a flurry of info about it
several months ago, and it seems no one had a good solution other than
calling DEC out repeatedly.  Did anyone ever come up with a better solution?
If so, please post.  Our latest 750 has this problem; two older ones don't.

elman (11/03/82)

We had the same problem: periodic machine checks from translation buffer
parity faults.  They occurred at random until, by chance, I wrote a program
that reliably induces the check.

After asking around I discovered that the problem is fairly common, and
is associated with an early release of CPU board #3 (rev C or earlier).
Fortunately (for us) two other local sites had had this problem and
already convinced DEC to replace the board, even though the problem does
not show up under DEC diagnostics.  So DEC readily came out and swapped
our rev C board for a rev E, and the problem went away.  This was
a month ago.

I'm not sure what the basis for the problem is, but there is some
interaction with memory.  The program that failed on our 750 (which
has 1 MB) ran ok on another that has more memory; furthermore, when
pared down (it had huge arrays of structures) it would even run ok
on our machine.  
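
For anyone who wants the flavor of the access pattern without the full
sources, here is a minimal sketch in modern C.  It is NOT the program
described above, and there is no claim that it will trip the fault on
any particular board.  The names (struct rec, sweep_records) are made up
for illustration; the only hard fact in it is the 512-byte VAX page size.
Each record is padded to one page, so sweeping the array touches a
different page on every access and keeps asking for fresh address
translations once the array outgrows the translation buffer.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_BYTES 512          /* VAX page size in bytes */

/* Hypothetical record, padded so that each one fills exactly one page. */
struct rec {
    long key;
    char pad[PAGE_BYTES - sizeof(long)];
};

/*
 * Allocate nrec page-sized records, fill in the keys, then read every
 * record back `passes` times.  Each read lands on a different page.
 * Returns the sum of all keys read, or -1 if the allocation fails.
 */
long sweep_records(size_t nrec, int passes)
{
    struct rec *tab;
    size_t i;
    int p;
    long sum = 0;

    /* The sketch relies on one record per page. */
    assert(sizeof(struct rec) == PAGE_BYTES);

    tab = malloc(nrec * sizeof *tab);
    if (tab == NULL)
        return -1;

    for (i = 0; i < nrec; i++)
        tab[i].key = (long)i;

    for (p = 0; p < passes; p++)
        for (i = 0; i < nrec; i++)
            sum += tab[i].key;

    free(tab);
    return sum;
}
```

Raising nrec grows the working set and lowering it is the "pared down"
case; where the threshold falls on a given machine is something you
would have to find by experiment.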

I suggest you ask DEC for a new board 3.  The program that produces
the error on demand is large, and its sources are huge.  I will be happy
to mail them out if someone really wants to force the failure, but would
prefer not to: it'd be a big file transfer.

	Jeff Elman
	Phonetics Lab., Dept. of Linguistics; U. C. San Diego
	ucbvax!sdcsvax!sdamos!elman
		or
	elman@nprdc