tarsa@decvax.dec.com (06/26/90)
We have a hardware problem and I need to lean on the collective experience of this group for help. I am primarily a software weenie and what I know of hardware I have learned by trying to keep our machines running on our shoestring budget. This problem is stretching my abilities.

The configuration in this story consists of a SUN 3/140 and a SUN 2/120 368M disk expansion pedestal containing Fujitsu 2322's. The cabling between them consists of twisted pair ribbon cables directly connecting the drives to the controller.

Last week we had a problem where the disk pedestal was plugged into a socket that had no ground on the grounding pin. The system was useless and examination with a VM showed between 30 and 110V AC flowing between the two chassis via the data and command cables. The 30V flowed when the power switch was 'off', the 110V when it was 'on'. The voltage leak appeared to be coming from the AC line filter (I think it is an AC line filter--it's the metal box that the incoming AC runs through prior to entering the power supply). Reconnecting ground solved the AC current flow problem. Is it normal for the AC filter to be conducting current to ground regardless of the power switch setting?

Anyway, the system would not boot. Somehow that didn't seem like too much of a surprise given the situation, though I was not aware that simply losing a ground could be so catastrophic. However, I need to determine exactly what is broken so that I can get it fixed.

Test #1 consisted of replacing the controller and Multibus-to-VME controller assy, assuming that the controller board got fried by the current that would have flowed through it to ground in the CPU box. No change.

Test #2 consisted of leaving the new controller assy in place and taking my only spare 2322 and plugging it into the disk box in place of one of the questionable drives. Booting the drive from the console appeared to give me a readable disk, though not usable at the time, since it is loaded with SUN2 images.
Booting stand/diag over the network from a hastily-established 'server' showed the drive as readable and properly formatted (diag could read the labels). Unfortunately, I used this as a kind of boolean test and immediately went on to another test rather than 'playing around' with this working disk.

On the strength of Test #2, I assumed that some of the disk electronics were kaput, and so Test #3 consisted of replacing the most easily accessible of the boards on one of the bad drives with one from the good drive. This resulted in a disk that diag showed to work for about 20-30 seconds before I began to get 'lost interrupt' errors and then 'no return status' errors--the same errors that the broken drives exhibited at boot time.

In an attempt to back out, I put the 'good' drive board back on the good drive and re-ran Test #2. My 'good' disk now works for only about 3-5 minutes before exhibiting the same problems as the bad disks. But since I didn't spend a lot of time in Test #2 the first time, I am not sure if I destroyed anything or not.

I then checked the power supply outputs for the disks (something I now know I should have done prior to Test #1), but the voltages appear to be fine.

Anyone out there recognize anything from this scenario? Not being electronically trained, I have no idea what kind of problems a floating ground could cause, and my "Field Service Engineer" experience has led me to a blank wall.

Any help would be appreciated. Send mail to me at tarsa@abyss.dec.com for now, since our mail gateway is the afflicted machine.

Thanks,

Greg Tarsa
33 Seabee Street
Bedford, NH 03102
(603)668-9226
tarsa@elijah.mv.com (no mail to this address today)
{decuac,decvax}!elijah!tarsa