[net.unix-wizards] 750 soft memory ECC and 4.1BSD

dmr (04/23/83)

A couple of weeks ago several articles mentioned problems
in 4.1BSD's handling of soft memory errors on 11/750's.  I posted
a note observing that though some things were clearly wrong, one
suggestion involving the enable/inhibit bit for ECC traps was at variance
with the VAX hardware book.  Through correspondence with the authors
of the original articles, a look at 4.1c, and actual trial, I learned
the following:

1) The definition of bit 28 of memory CSR1 on p. 118 of the 82/83 VAX
Hardware book is flat wrong: the bit ENABLES soft ECC traps and does
not inhibit them.

2) The 4.1c and thus, presumably, future 4.2 tapes distributed by Berkeley
are correct, so mail appealing to Leffler is redundant.

3) My machine has at least 2 bad memory chips.

4) Unless the hardware handbook is wrong yet again, the masks
and shift in M750_SYN and M750_ADDR in 4.1BSD are also incorrect; thanks
to Tom Ferrin for the observation.  4.1c is correct.

For reference, here are the correct definitions of the 750 macros
in h/mem.h:

#if VAX750
#define	M750_ICRD	0x10000000	/* enable[sic] crd interrupts, in [1] */
#define	M750_UNCORR	0xc0000000	/* uncorrectable error, in [0] */
#define	M750_CORERR	0x20000000	/* correctable error, in [0] */

#define	M750_INH(mcr)	((mcr)->mc_reg[1] = 0)
#define	M750_ENA(mcr)	((mcr)->mc_reg[0] = (M750_UNCORR|M750_CORERR), \
			    (mcr)->mc_reg[1] = M750_ICRD)
#define	M750_ERR(mcr)	((mcr)->mc_reg[0] & (M750_UNCORR|M750_CORERR))

#define	M750_SYN(mcr)	((mcr)->mc_reg[0] & 0x7f)
#define	M750_ADDR(mcr)	(((mcr)->mc_reg[0] >> 9) & 0x7fff)
#endif
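
To illustrate how the corrected masks and shift pull the fields apart,
here is a small user-level sketch.  It is not kernel code: the struct
below only stands in for the kernel's register block (its layout is an
assumption), the registers are widened to unsigned 32-bit values so the
shifts are well defined, and fake_csr0() and print_soft_error() are
hypothetical helpers invented for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the memory controller register block (assumed layout). */
struct mcr {
	uint32_t mc_reg[3];
};

/* Same values as the macros above, with an unsigned suffix added. */
#define	M750_UNCORR	0xc0000000u	/* uncorrectable error, in [0] */
#define	M750_CORERR	0x20000000u	/* correctable error, in [0] */

#define	M750_ERR(mcr)	((mcr)->mc_reg[0] & (M750_UNCORR|M750_CORERR))
#define	M750_SYN(mcr)	((mcr)->mc_reg[0] & 0x7f)
#define	M750_ADDR(mcr)	(((mcr)->mc_reg[0] >> 9) & 0x7fff)

/*
 * Hypothetical helper: compose a register [0] value that reports a
 * correctable error with the given address and syndrome fields.
 */
uint32_t
fake_csr0(uint32_t addr, uint32_t syn)
{
	return M750_CORERR | ((addr & 0x7fff) << 9) | (syn & 0x7f);
}

/* Hypothetical helper: print the decoded fields if an error is latched. */
void
print_soft_error(const struct mcr *mcr)
{
	if (M750_ERR(mcr) == 0)
		return;
	printf("soft ecc: addr field %x, syndrome %x\n",
	    (unsigned)M750_ADDR(mcr), (unsigned)M750_SYN(mcr));
}
```

Filling mc_reg[0] with fake_csr0(0x1234, 0x55) and passing the struct
to print_soft_error() should report address field 1234 and syndrome 55,
since the syndrome occupies the low 7 bits and the address field the
15 bits starting at bit 9.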


				Dennis Ritchie