[net.unix-wizards] Vax 750 correctable memory errors

haynes%ucsce.ucscc.UUCP%Berkeley@ucscc.UUCP (02/05/85)


Under 4.2BSD:  Sometimes we get a correctable memory error reported on
the console, and it seems to persist until the system is rebooted.  Presumably
it is in an area of memory that is only read, such as the kernel or locore.
This suggests that (1) the memory controller hardware/firmware doesn't
write the corrected word back into memory when it corrects an error, but
leaves the bad word in place, and (2) an appropriate thing for the software
to do on a correctable memory error would be to read the contents of the
location and write them back, thus making the correction 'permanent'.

Can anybody confirm that (1) is true and that (2) is appropriate?  Has
anybody patched the kernel code to do this?  Is this problem peculiar
to 750s?

ucbvax!ucscc!haynes

charisse@infoswx.UUCP (02/22/85)

If you are using DEC memory, we have experienced the same thing.
We get soft ECC errors on a particular board and they just go
on and on.  I've never tried just rebooting; our DEC rep said it
was probably a heat problem, so we shut down the machine,
yank each of the boards out and stuff them back in. We have run
for as long as 10 days before bothering to shut the system down
and have never had a hard error.  I'd be interested in any
information about this problem. Thanx


charisse castagnoli
{ihnp4, allegra!convex, ctvax}!infoswx!charisse