SIT.BUSH@CU20B.COLUMBIA.EDU.UUCP (03/02/87)
Has anyone seen this problem? We have been getting a fair number of SBI faults logged in the error log on a 785. There do not seem to be any effects from this except the error log entries. The entries claim that the error is an unexpected read fault by TR#8 (a Massbus adapter). The only Massbus device is a single TM78, which is not in use when the errors occur.

Digital Field Service has tried replacing the RH780 boards and the memory boards. This has not made the problem go away. Of course, it is an intermittent problem, so the diagnostics never see it.

Has this happened to any systems you know of? If so, how was it resolved?

- Nick Bush
Sterling-Winthrop Research Institute
ARPA: SIT.BUSH@CU20B.COLUMBIA.EDU
BITNET: SIT.BUSH@CU20B
-------
dp@JASPER.PALLADIAN.COM.UUCP (03/03/87)
Date: Mon 2 Mar 87 13:25:23-EST
From: Nick Bush <SIT.BUSH@CU20B.COLUMBIA.EDU>
> Has anyone seen this problem? We have been getting a fair number of SBI faults logged in the error log on a 785.
> [...]

Have them check the coaxial ribbon jumpers on the backplane. They fail eventually (the usual corrosion problems, plus a hard limit of 12 insertions when wiggling them). The first machine I had it happen on was showing bizarre memory errors (not so benign; the machine crashed instead). Adding to the short life was the fact that the crews in "Touch Up" used to remove them to give the backplane a final dusting before shipping the machines. They seemed to have about a 2.5-year life in my machine room. (After I quit, I ran into my FS tech, who told me when the second machine needed new ones.)

As far as I know (though I have not seen the back of a BI machine), these cables are only present on 780/782/785 series machines.

<dp>
tencati@JPL-VLSI.ARPA.UUCP (03/03/87)
Nick,

Please promise you won't laugh...

I had a problem where my 780 would bugcheck about 3 times a week. DEC Field Service escalated it to the point where I had 2 District guys sitting in the computer room waiting for the system to crash so they could run dumps and look at stuff. It was very interesting because the device that was causing the problem was an RM03 that was spun down and had been for about a month.

Here's what the problem was: Our computer room is extra-cold due to another computer room sharing the A/C with us. It is also a water-cooled system as opposed to freon or some other chemical. This caused the humidity to be higher than usual in the computer room. Over an extended period of time, this humidity caused microscopic MOLD to grow on the gold plate of a couple of pins on the Massbus adapter for the RM03. This mold would then cause electrical variations on the board, which would cause it to write a bogus value into memory. VMS would come along and try to execute the instruction... BUGCHECK...

It took PAINSTAKING diligence on the part of DEC. They showed me the mold and I believed... They replaced the "moldy" board and my system worked fine (until the next problem), but they had fixed the bugcheck problem.

So as weird as it may seem, have them clean the pins on your boards. It may solve your problem. It couldn't hurt in any case.

Good Luck,
Ron Tencati
System Mgr, JPL-VLSI.ARPA
art@MITRE.ARPA.UUCP (03/04/87)
>Our computer room is extra-cold due to another computer room sharing the A/C
>with us. It is also a water-cooled system as opposed to freon or some other
>chemical. This caused the humidity to be higher than usual in the computer
>room. Over an extended period of time, this humidity caused microscopic MOLD

The use of water-cooled A/C should not in and of itself cause the humidity to be higher. Sharing the A/C with another computer room is probably the reason that your room is colder. Because your room is colder, its relative humidity will be higher than the other room's.

You really should work on correcting your environment. If you have a poor environment, you are just asking for trouble. It is possible to balance the two rooms, but it is not easy. You need to determine the heat loads in the two rooms and control the air flow. I have seen some automatic systems that use a second thermostat to control air flow to the second room.

*
*---Art
*
*Arthur T. McClinton Jr.      ARPA: ART@MITRE.ARPA
*Mitre Corporation MS-Z305    Phone: 703-883-6356
*1820 Dolley Madison Blvd     Internal Mitre: ART@MWVMS or M10319@MWVM
*McLean, Va. 22102            DECUS DCS: MCCLINTON
*
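Art's point about the colder room comes down to relative humidity. If both rooms share air with the same moisture content, the colder room sits closer to saturation, so its relative humidity is higher. The minimal sketch below illustrates this. It is an illustration only and rests on several assumptions. The room temperatures (22 °C and 16 °C) and the 50% starting humidity are hypothetical values, not figures from the thread. Saturation vapor pressure is estimated with the commonly used Magnus approximation.

```python
import math


def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    """Approximate saturation vapor pressure over water (Magnus formula, hPa)."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def relative_humidity(vapor_pressure_hpa: float, temp_c: float) -> float:
    """Relative humidity (%) for a given actual vapor pressure and air temperature."""
    return 100.0 * vapor_pressure_hpa / saturation_vapor_pressure_hpa(temp_c)


# Hypothetical values: the shared air leaves the warmer room at 50% RH.
warm_room_c = 22.0
cold_room_c = 16.0
warm_rh = 50.0

# Assumption: the same air (same absolute moisture, i.e. same vapor pressure)
# reaches the colder room without moisture being added or removed.
vapor_pressure = warm_rh / 100.0 * saturation_vapor_pressure_hpa(warm_room_c)
cold_rh = relative_humidity(vapor_pressure, cold_room_c)

print(f"warm room: {warm_rh:.0f}% RH, cold room: {cold_rh:.0f}% RH")
```

Under these assumed numbers, the colder room ends up at roughly 73% relative humidity, even though no water has been added to the air.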