bossert@Thalatta.COM (John Bossert) (02/21/88)
Rn wouldn't let me cancel my previous article. Sorry.

The true error message I was getting on a 11/780 with
8 Mb of interleaved memory was:

	mcr0: soft ecc addr 110dd syn 26

Again, using the formula addr = 0110dd26 (0xxxxxyy) this translates
to an address way above 8 Megs. Does the interleaving have anything
to do with this? Where is the bad board?
-- 
In-Real-Life: John Bossert, Thalatta Corporation, (+1 206 643 7187)
Domain: bossert@Thalatta.COM
Path: uw-beaver!uw-entropy!thebes!bossert
parmelee@wayback.cs.cornell.edu (Larry Parmelee) (02/22/88)
In article <128@thebes.Thalatta.COM> bossert@Thalatta.COM (John Bossert) writes:
> The true error message I was getting on a 11/780 with
> 8 Mb of interleaved memory was:
>
>	mcr0: soft ecc addr 110dd syn 26
>

Credentials: We have an 11/780, with 8Mb of memory interleaved on two
memory controllers, using 256kb memory boards, running 4.3BSD unix.
Until recently, we had lots of memory errors; finally, with the help of
our DEC Technician, we were able to figure out how to interpret that
message down to the board level.

First, "mcr#" is "Memory Controller #". (That was easy, right?)

The "addr" part is a little more interesting. We have 256kb memory
boards, which works out to 0x40000 bytes per board. Memory error
correction/detection works on 4-byte "globs", so a 256kb memory board has

	0x40000 (bytes) / 4 (bytes-per-glob) = 0x10000 (globs-per-board).

Now, the number following "addr" in the message above is the "glob"
number where the error occurred, so

	0x110dd'th glob / 0x10000 (globs-per-board) = board number 1

(Ignore the remainder for now.) For these purposes, the boards are
numbered starting with 0 (the memory in each memory controller is
considered individually). Physically, board 0 is the leftmost memory
board in the given controller.

The remainder from above and the syndrome (the number following "syn")
can be used to figure out which chip had the problem, but with 256kb
memory going for $25 a board (used) nowadays, it wasn't worth figuring
out the rest. We just replaced the board(s).

I think you said you had 1Mb boards; in that case, it would work out
like this: 0x100000 bytes-per-board, or 0x40000 "globs" per board. Then

	0x110dd / 0x40000 = board number 0.

Have fun!
-Larry Parmelee
parmelee@wayback.cs.cornell.edu
parmelee@cornell.uucp
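
The board arithmetic described in the reply above can be sketched in a few lines of Python. This is only an illustration, assuming (as the post states) that the "addr" value counts 4-byte ECC units and that boards within one controller are numbered from 0. The function name board_for_error and its parameters are made up for this example and are not part of 4.3BSD.

```python
ECC_UNIT_BYTES = 4  # per the post: ECC works on 4-byte units ("globs")


def board_for_error(addr, board_bytes):
    """Map the 'addr' value from an mcr soft-ecc message to a board.

    addr        -- the number after "addr", assumed to be a 4-byte-unit index
    board_bytes -- capacity of one memory board in bytes (assumption: all
                   boards in the controller are the same size)

    Returns (board_number, remainder); the remainder is the unit index
    within that board, which the post leaves uninterpreted.
    """
    units_per_board = board_bytes // ECC_UNIT_BYTES
    return divmod(addr, units_per_board)


if __name__ == "__main__":
    addr = 0x110DD  # from "mcr0: soft ecc addr 110dd syn 26"

    # 256 KB boards: 0x40000 bytes, 0x10000 units per board
    board, rest = board_for_error(addr, 256 * 1024)
    print(f"256 KB boards: board {board}, remainder 0x{rest:x}")

    # 1 MB boards: 0x100000 bytes, 0x40000 units per board
    board, rest = board_for_error(addr, 1024 * 1024)
    print(f"1 MB boards:   board {board}, remainder 0x{rest:x}")
```

With the address from the message, this gives board 1 for 256 KB boards and board 0 for 1 MB boards, matching the hand calculation in the reply. The syndrome is not used here, since the post does not say how to decode it.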