root@icus.islp.ny.us (ICUS Administrator) (01/12/90)
For those trying to reach me: ICUS (this machine) has been up and down (mostly down) for the past 3 days. The machine died, and the cause is still unknown. I _do_ have a good backup. I did a reformat and restore tonight hoping to solve the hard disk problems, but hard disk errors still appear.

Something happened that caused my external drive, a Miniscribe 6085, to die completely. The PCBA (read/write board) died, and now it only "blinks error codes". I did have a spare 67MB drive, and we (thanx Gil) swapped boards. That saved the "media" on the external drive. Of course my main drive, a CDC Wren II, is still a problem, although I feel good to have a complete backup on two full tapes.

Initially, when the system first died, I woke up to find just a [working icon], a phone manager and a window manager window on the screen. Just dots, with no smgr and getty (both of which access the disk periodically). The keyboard was still responding, and the system was still beating (c/o LEDs). I thought it took some weird power hit, so I pressed RESET. Then I got the loader, booted /unix and got:

    *****Disk read error.

This was consistent. Eventually we removed the drive from the case (and yes, the spindle was spinning and the heads sounded like they were moving), and it sometimes passed diags, sometimes it didn't. Eventually we got it to pass diags more often. So we figured either dust or some other problem (like a loose power connection) might have caused the problems. We put the machine back together and it booted. It ran for about 1 hour and then died with the same problems. My sysinfo program spat out at me:

    sysinfo: cannot read superblock for /dev/rfp002

And everything else got:

    "Killed"

I thought there were bad blocks in the swap partition, but unix.log showed nothing (at that time), and now I know the reformat didn't help. Tomorrow I'm going to swap power supplies, and after all that we might swap the TTL chips on the ICUS extension board for the two drives.
Something might be flaking, and I might have taken a power surge after all... Time to buy a UPS, any suggestions? :-}

Right now the errors I'm getting are...

HDERR ST:51 EF:10 CL:78 CH:0 SN:8 SC:2 SDH:27 DMACNT:FFFF DCRREG:9F MCRREG:C900 Thu Jan 11 23:10:41 1990
WD2010 ST=/Sekg/Err/ EF=/Id?/ cy=120. sc=8. hd=7. dr#=0. MCR2:0x0 Thu Jan 11 23:10:42 1990
drv:0 part:2 blk:3628 rpts:1 Thu Jan 11 23:10:42 1990
HDERR ST:51 EF:10 CL:5C CH:0 SN:E SC:2 SDH:26 DMACNT:FFFF DCRREG:9E MCRREG:8300 Thu Jan 11 23:11:36 1990
WD2010 ST=/Sekg/Err/ EF=/Id?/ cy=92. sc=14. hd=6. dr#=0. MCR2:0x0 Thu Jan 11 23:11:37 1990
Drv:0 part:2 blk:1607 rpts:1 Thu Jan 11 23:11:38 1990
HDERR ST:51 EF:10 CL:5C CH:0 SN:2 SC:2 SDH:27 DMACNT:FFFF DCRREG:9F MCRREG:8900 Thu Jan 11 23:11:38 1990
HDERR ST:51 EF:10 CL:46 CH:0 SN:4 SC:2 SDH:24 DMACNT:FFFF DCRREG:9C MCRREG:C900 Thu Jan 11 23:11:39 1990
WD2010 ST=/Sekg/Err/ EF=/Id?/ cy=70. sc=4. hd=4. dr#=0. MCR2:0x0 Thu Jan 11 23:11:39 1990
HDERR ST:51 EF:10 CL:9E CH:2 SN:2 SC:2 SDH:27 DMACNT:FFFF DCRREG:9F MCRREG:8900 Thu Jan 11 23:15:02 1990
WD2010 ST=/Sekg/Err/ EF=/Id?/ cy=670. sc=2. hd=7. dr#=0. MCR2:0x0 Thu Jan 11 23:15:05 1990
HDERR ST:51 EF:10 CL:9E CH:2 SN:2 SC:2 SDH:27 DMACNT:FFFF DCRREG:9F MCRREG:8900 Thu Jan 11 23:15:06 1990
WD2010 ST=/Sekg/Err/ EF=/Id?/ cy=670. sc=2. hd=7. dr#=0. MCR2:0x0 Thu Jan 11 23:15:06 1990
HDERR ST:51 EF:10 CL:9F CH:2 SN:0 SC:2 SDH:24 DMACNT:FFFF DCRREG:9C MCRREG:CD00 Thu Jan 11 23:18:25 1990
WD2010 ST=/Sekg/Err/ EF=/Id?/ cy=671. sc=0. hd=4. dr#=0. MCR2:0x0 Thu Jan 11 23:18:27 1990
[...]

And when I run elm2.2 I get:

    Killed

and...

HDERR ST:51 EF:40 CL:B1 CH:1 SN:1 SC:1 SDH:20 DMACNT:FFFF DCRREG:98 MCRREG:8700 Fri Jan 12 00:03:29 1990
WD2010 ST=/Sekg/Err/ EF=/CRC/ cy=433. sc=1. hd=0. dr#=0. MCR2:0x0 Fri Jan 12 00:03:34 1990
HDERR ST:51 EF:40 CL:B1 CH:1 SN:1 SC:1 SDH:20 DMACNT:FFFF DCRREG:98 MCRREG:8700 Fri Jan 12 00:03:36 1990
WD2010 ST=/Sekg/Err/ EF=/CRC/ cy=433. sc=1. hd=0. dr#=0. MCR2:0x0 Fri Jan 12 00:03:39 1990
HDERR ST:51 EF:40 CL:B1 CH:1 SN:1 SC:1 SDH:20 DMACNT:FFFF DCRREG:98 MCRREG:8300 Fri Jan 12 00:05:05 1990
WD2010 ST=/Sekg/Err/ EF=/CRC/ cy=433. sc=1. hd=0. dr#=0. MCR2:0x0 Fri Jan 12 00:05:10 1990

[Please note the extra diagnostics are from the UNIX 3.51dG1 I'm running; it gives some more verbose output...]

Oh well. Suggestions?

Of course ICUS will remain down for now. I hope to have it running ASAP. Please forward any *important* mail to another Email address: ...lenny@sbcs.sunysb.edu, sbcs!alps!lenny, ames!limbic!alps!lenny.

It looks like we all get struck by this once in a while... :-(

-Lenny
[A UNIX pc hacker looking to get a UNIX/386 machine these days...] :-)
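
A note for anyone reading the logs in this post: each HDERR line appears to carry the raw controller registers in hex, and the WD2010 line after it looks like the driver's decoded version of the same error. Below is a minimal Python sketch of that decoding. The helper name `decode_hderr` is hypothetical. The bit masks (cylinder = CH:CL, head = low three bits of SDH, EF 0x10 = "Id?", EF 0x40 = "CRC") are assumptions inferred only from the HDERR/WD2010 pairs shown here, not taken from the driver source.

```python
import re

# Matches NAME:HEX pairs such as "CL:78" or "MCRREG:C900".
# Timestamps like "23:10:41" are skipped because the name must be letters.
FIELD_RE = re.compile(r"\b([A-Z]+):([0-9A-F]+)\b")


def decode_hderr(line):
    """Decode one HDERR line into the fields the WD2010 line reports.

    Assumed layout (inferred from the log pairs in this post):
      cylinder = CH (high byte) and CL (low byte)
      sector   = SN
      head     = low 3 bits of SDH
      EF 0x10  = ID not found ("Id?"), EF 0x40 = CRC error ("CRC")
    """
    regs = {name: int(value, 16) for name, value in FIELD_RE.findall(line)}
    return {
        "cy": (regs["CH"] << 8) | regs["CL"],
        "sc": regs["SN"],
        "hd": regs["SDH"] & 0x07,
        "id_not_found": bool(regs["EF"] & 0x10),
        "crc_error": bool(regs["EF"] & 0x40),
    }


if __name__ == "__main__":
    sample = ("HDERR ST:51 EF:10 CL:78 CH:0 SN:8 SC:2 SDH:27 "
              "DMACNT:FFFF DCRREG:9F MCRREG:C900")
    print(decode_hderr(sample))
```

Run against the first HDERR line of the post, this gives cy=120, sc=8, hd=7 with the ID-not-found flag set, which agrees with the WD2010 line that follows it in the log.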