bruce@godot.UUCP (Bruce Nemnich) (12/15/84)
Nor did I mean to jump on irwin; I realize he was quoting DEC. When I first started fighting this problem, several DEC folks as much as said it was a Unix bug: they will usually say something like, "It is a problem with 4.2bsd on a 750" rather than, "It is a problem with the 750."

During the 5 months this machine ran VMS prior to conversion to Unix, it crashed once every three or four weeks because of the same problem. Other times it would simply kill the process which was running when the machine check occurred. After switching to Unix, the frequency increased by a factor of about 3. When I fixed the incorrect mask for the tbuf error in machdep.c, it cut the crashes in half. After trying many board swaps, I finally got one whose error incidence was much lower.

But on to better things.... I had Rev 7 installed on Tuesday. Barry Lustig was kind enough to send me code from Jim McKie (mcvax!jim) to load the patchable control store file off disk as part of the boot sequence. I have had no problems after 3+ days.

However, one of the guys doing the installation said that the Rev 7 upgrade actually makes the problem worse by altering the way the machine checks (?!) on tbuf and/or cache errors. I don't know what he meant by this (and I am not sure he did), and I have not tried to verify it. The Rev 7 FCO description says it is fixed:

"TB parity error machine checks, due to TB RAM soft errors, are fixed by CMT098 micro-code.... CMT098 will perform single-retry recovery attempt, per macro-instruction, without error-report; but will generate standard Machine Check on subsequent or hard errors."

-- 
--Bruce Nemnich, Thinking Machines Corporation, Cambridge, MA
ihnp4!godot!bruce, bjn@mit-mc.arpa ... soon to be bruce@godot.arpa