jim (04/04/83)
I have performed a few experiments with memory known to be bad, and have discovered the following unsettling facts about 4.1bsd on a 750.

Neither single-bit (correctable) nor multi-bit (uncorrectable) errors are caught by the kernel! Single-bit errors will be corrected, but you won't be notified that anything is wrong until the memory deteriorates to the point of uncorrectable errors. Multi-bit errors will not be caught. If they happen in user space, you will get strange, unpredictable behavior, usually core-dumps from ordinary programs like 'ls' and 'cat'. If the errors happen in kernel space, and you are lucky, you will get a crash, but there won't be any indication of why you got a crash. If you are unlucky, some kernel code or data will become corrupted, resulting in all kinds of strange behavior.

I was sent the following fix by Peter Collinson. If you apply this fix, single-bit errors will be caught, but multi-bit errors will still be ignored (at least on my machine). If I find a fix for multi-bit errors, I will let you know. If you have a 750, I strongly urge you to put this fix in now, before your machine goes nuts.

>From microsof!decvax!harpo!lime!ukc!pc Thu Mar 31 16:15:29 1983
>Date: Thu Mar 31 18:29:01 1983
To: lime!orion!ariel!vax135!floyd!harpo!decvax!microsof!uw-beave!jim
Subject: Re: I have memory problems on my 750

You probably have lots of answers to this by now, but there is a bug in the define statements for the memory controller on 4.1BSD for the 750.
The appropriate lines in mem.h should read:

#if VAX750
/* FIXES FROM JOHN SHEMELD - UKC */
#define M750_ICRD       0               /* Fix: inhibit crd interrupts, in [1] */
#define M750_UNCORR     0xc0000000      /* uncorrectable error, in [0] */
#define M750_CORERR     0x20000000      /* Fix: correctable error, in [0] */

#define M750_INH(mcr)   ((mcr)->mc_reg[1] = M750_ICRD)
#define M750_ENA(mcr)   ((mcr)->mc_reg[0] = (M750_UNCORR|M750_CORERR), \
                            (mcr)->mc_reg[1] = 0x10000000) /* Fix */
#define M750_ERR(mcr)   ((mcr)->mc_reg[0] & (M750_UNCORR|M750_CORERR))

#define M750_SYN(mcr)   ((mcr)->mc_reg[0] & 0x3f)
#define M750_ADDR(mcr)  (((mcr)->mc_reg[0] >> 8) & 0x7fff)
#endif

Of course, if you were a member of the European UNIX User group, you would have already got these fixes.

I am Peter Collinson
lime!ukc!pc or phillabs!mcvax!ukc!pc
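
To see what the definitions above pick out of register 0, here is a small user-space sketch that applies them to a made-up register value. It is not kernel code and is not taken from the 4.1BSD sources. The three-word layout of struct mcr, the function names classify_error and report_error, the message format, and the sample value 0x2012342a are all assumptions for illustration. The split into "uncorrectable" and "correctable" follows only the comments on the defines. The registers here are ordinary memory, so M750_ENA and M750_INH are plain stores and may not behave the way writes to the real controller do.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed layout: three longword registers per memory controller. */
struct mcr {
    unsigned int mc_reg[3];
};

/* Definitions as given in the post. */
#define M750_ICRD       0               /* inhibit crd interrupts, in [1] */
#define M750_UNCORR     0xc0000000      /* uncorrectable error, in [0] */
#define M750_CORERR     0x20000000      /* correctable error, in [0] */

#define M750_INH(mcr)   ((mcr)->mc_reg[1] = M750_ICRD)
#define M750_ENA(mcr)   ((mcr)->mc_reg[0] = (M750_UNCORR|M750_CORERR), \
                            (mcr)->mc_reg[1] = 0x10000000)
#define M750_ERR(mcr)   ((mcr)->mc_reg[0] & (M750_UNCORR|M750_CORERR))

#define M750_SYN(mcr)   ((mcr)->mc_reg[0] & 0x3f)
#define M750_ADDR(mcr)  (((mcr)->mc_reg[0] >> 8) & 0x7fff)

/* Hypothetical result codes for this sketch only. */
enum mem_status { MEM_OK = 0, MEM_CORRECTABLE = 1, MEM_UNCORRECTABLE = 2 };

/* Classify the error bits in register 0, going by the define comments. */
enum mem_status classify_error(const struct mcr *mcr)
{
    assert(mcr != NULL);
    if (mcr->mc_reg[0] & M750_UNCORR)
        return MEM_UNCORRECTABLE;
    if (mcr->mc_reg[0] & M750_CORERR)
        return MEM_CORRECTABLE;
    return MEM_OK;
}

/* Print the fields the macros extract; the wording is invented. */
void report_error(const struct mcr *mcr)
{
    if (M750_ERR(mcr) == 0) {
        printf("no memory error latched\n");
        return;
    }
    printf("%s memory error: addr field 0x%x, syndrome 0x%x\n",
        classify_error(mcr) == MEM_UNCORRECTABLE ? "hard" : "soft",
        M750_ADDR(mcr), M750_SYN(mcr));
}
```

For example, with struct mcr m = { { 0x2012342a, 0, 0 } }; a call to report_error(&m) takes the "soft" branch, because only the M750_CORERR bit is set, and the macros give an address field of 0x1234 and a syndrome of 0x2a.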