dan@msdc.UUCP (Dan Forsyth) (02/13/86)
We're having a problem with one of our 785s that manifests itself in the
following way:

	mcr0: soft ecc addr 515e syn 16
	mcr0: soft ecc addr 515e syn 16
	mcr0: soft ecc addr 515e syn 16
	mcr0: soft ecc addr 515e syn 16
	mcr0: soft ecc addr 515e syn 16
	mcr0: soft ecc addr 515e syn 16
	mcr0: soft ecc addr 515e syn 16
	mcr0: soft ecc addr 515e syn 16
	?INT STK INVAL
	...
	<normal, error free reboot>

We have nothing but DEC memory on the thing, so I'm making the assumption
that the BRL release of 4.2 that we're running should be able to handle
these errors correctly. I've looked at it and it seems to do reasonable
things (but who am I to judge).

This scenario is now occurring at least three times a week; no other memory
errors show up at all. Each time we get a single (different) value for
"addr" and always a "syn" of 16. And the system crashes.

DEC has interpreted these addresses to refer to array 0. We're now on the
third board. This weekend DEC replaced array 0 and the lower memory
controller. The system stayed up about 9 hours.

Does anyone have any experience with such behavior? Is it definitely
hardware, or is the kernel doing something it shouldn't? Do we go for a new
memory backplane next?

Thanks,
Dan Forsyth ({agkua,gatech,mcnc}!msdc!dan)
Medical Systems Development Corporation, Atlanta, GA