[icus.general] ICUS is very sick -- it's been down for 3 days now.

root@icus.islp.ny.us (ICUS Administrator) (01/12/90)

For those trying to reach me, ICUS (this machine) has been up and down
(mostly down) for the past 3 days.  The machine died.  The reason/cause
is still unknown.   I _do_ have a good backup, and I did a reformat and
restore tonight hoping to solve the hard disk problems, but hard disk
errors still appear.

Something happened that caused my external drive, a Miniscribe 6085,
to die completely.  The PCBA (read/write board) died, and now it only
"blinks error codes".  I did have a spare 67MB drive, and we (thanx Gil)
swapped boards.  That saved the "media" on the external drive.  Of course
my main drive, a CDC Wren II, is still a problem, although I feel good
to have a complete backup on two full tapes.

Initially, when the system first died, I woke up with just a [working icon],
a phone manager and a window manager window on the screen.  Just dots,
with no smgr and getty (both of which access the disk periodically).  The
keyboard was still responding, and the system was still beating (c/o LEDs).

I thought it took some weird power hit so I pressed the RESET.  Then I got
the loader, booted /unix and got:

*****Disk read error.

This was consistent.   Eventually we removed the drive from the case
(and yes, the spindle was spinning and the heads sounded like they were
moving), and it sometimes passed diags, sometimes it didn't.  Eventually
we got it to pass diags more often.  So we figured either dust or some
other problem (like a loose power connection) might have caused the
problems.  We put the
machine back together and it booted.  It ran for about 1 hour and then
died with the same problems.  My sysinfo program spat out at me:

	sysinfo: cannot read superblock for /dev/rfp002

And everything else got: "Killed"

I thought there were bad blocks in the swap partition, but unix.log showed
nothing (at that time) and now I know the reformat didn't help.  Tomorrow
I'm going to swap power supplies, and after all that we might swap the
TTL chips on the ICUS extension board for the two drives.  Something might
be flaking, and I might have taken a power surge after all...   Time to
buy a UPS, any suggestions? :-}

Right now the errors I'm getting are...

HDERR ST:51 EF:10 CL:78 CH:0 SN:8 SC:2 SDH:27 DMACNT:FFFF DCRREG:9F MCRREG:C900 Thu Jan 11 23:10:41 1990
WD2010 ST=/Sekg/Err/ EF=/Id?/ cy=120. sc=8. hd=7. dr#=0. MCR2:0x0 Thu Jan 11 23:10:42 1990
drv:0 part:2 blk:3628 rpts:1 Thu Jan 11 23:10:42 1990
HDERR ST:51 EF:10 CL:5C CH:0 SN:E SC:2 SDH:26 DMACNT:FFFF DCRREG:9E MCRREG:8300 Thu Jan 11 23:11:36 1990
WD2010 ST=/Sekg/Err/ EF=/Id?/ cy=92. sc=14. hd=6. dr#=0. MCR2:0x0 Thu Jan 11 23:11:37 1990
Drv:0 part:2 blk:1607 rpts:1 Thu Jan 11 23:11:38 1990
HDERR ST:51 EF:10 CL:5C CH:0 SN:2 SC:2 SDH:27 DMACNT:FFFF DCRREG:9F MCRREG:8900 Thu Jan 11 23:11:38 1990
HDERR ST:51 EF:10 CL:46 CH:0 SN:4 SC:2 SDH:24 DMACNT:FFFF DCRREG:9C MCRREG:C900 Thu Jan 11 23:11:39 1990
WD2010 ST=/Sekg/Err/ EF=/Id?/ cy=70. sc=4. hd=4. dr#=0. MCR2:0x0 Thu Jan 11 23:11:39 1990
HDERR ST:51 EF:10 CL:9E CH:2 SN:2 SC:2 SDH:27 DMACNT:FFFF DCRREG:9F MCRREG:8900 Thu Jan 11 23:15:02 1990
WD2010 ST=/Sekg/Err/ EF=/Id?/ cy=670. sc=2. hd=7. dr#=0. MCR2:0x0 Thu Jan 11 23:15:05 1990
HDERR ST:51 EF:10 CL:9E CH:2 SN:2 SC:2 SDH:27 DMACNT:FFFF DCRREG:9F MCRREG:8900 Thu Jan 11 23:15:06 1990
WD2010 ST=/Sekg/Err/ EF=/Id?/ cy=670. sc=2. hd=7. dr#=0. MCR2:0x0 Thu Jan 11 23:15:06 1990
HDERR ST:51 EF:10 CL:9F CH:2 SN:0 SC:2 SDH:24 DMACNT:FFFF DCRREG:9C MCRREG:CD00 Thu Jan 11 23:18:25 1990
WD2010 ST=/Sekg/Err/ EF=/Id?/ cy=671. sc=0. hd=4. dr#=0. MCR2:0x0 Thu Jan 11 23:18:27 1990
[...]
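
For anyone who wants to cross-check the raw HDERR registers against the
decoded WD2010 lines, here is a minimal Python sketch.  The field meanings
are an assumption inferred from the paired lines above (CH/CL as the high
and low cylinder bytes, SN as the sector, the low three bits of SDH as the
head, bits 3-4 of SDH as the drive), not taken from the driver source.
decode_hderr() and EF_NAMES are hypothetical names used for illustration.

```python
import re

# Assumption: register names are upper-case and values are hexadecimal.
REG_RE = re.compile(r"\b([A-Z]+):([0-9A-F]+)\b")

# Only the two error-flag values that appear in this log; the labels are
# copied from the decoded WD2010 lines, not from a controller data sheet.
EF_NAMES = {0x10: "Id?", 0x40: "CRC"}


def decode_hderr(line):
    """Decode cylinder, sector, head and drive from one HDERR log line."""
    if not line.startswith("HDERR"):
        raise ValueError("not an HDERR line: %r" % line)
    # The timestamp is skipped because the name part must be letters.
    regs = {name: int(value, 16) for name, value in REG_RE.findall(line)}
    return {
        "cylinder": (regs["CH"] << 8) | regs["CL"],  # high byte, low byte
        "sector": regs["SN"],
        "head": regs["SDH"] & 0x07,                  # assumed: low 3 bits
        "drive": (regs["SDH"] >> 3) & 0x03,          # assumed: bits 3-4
        "error": EF_NAMES.get(regs["EF"], hex(regs["EF"])),
    }
```

Fed the first HDERR line above, this gives cylinder 120, sector 8, head 7
and drive 0, which agrees with the WD2010 line that follows it.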

And when I run elm2.2 I get:

Killed and...

HDERR ST:51 EF:40 CL:B1 CH:1 SN:1 SC:1 SDH:20 DMACNT:FFFF DCRREG:98 MCRREG:8700 Fri Jan 12 00:03:29 1990
WD2010 ST=/Sekg/Err/ EF=/CRC/ cy=433. sc=1. hd=0. dr#=0. MCR2:0x0 Fri Jan 12 00:03:34 1990
HDERR ST:51 EF:40 CL:B1 CH:1 SN:1 SC:1 SDH:20 DMACNT:FFFF DCRREG:98 MCRREG:8700 Fri Jan 12 00:03:36 1990
WD2010 ST=/Sekg/Err/ EF=/CRC/ cy=433. sc=1. hd=0. dr#=0. MCR2:0x0 Fri Jan 12 00:03:39 1990
HDERR ST:51 EF:40 CL:B1 CH:1 SN:1 SC:1 SDH:20 DMACNT:FFFF DCRREG:98 MCRREG:8300 Fri Jan 12 00:05:05 1990
WD2010 ST=/Sekg/Err/ EF=/CRC/ cy=433. sc=1. hd=0. dr#=0. MCR2:0x0 Fri Jan 12 00:05:10 1990
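
One quick way to see whether errors like these cluster on one spot or are
scattered across the disk is to tally the decoded WD2010 lines.  This is
only a rough Python sketch that assumes the line layout shown above;
tally_wd2010() is a hypothetical helper, not part of any system tool.

```python
import re
from collections import Counter

# Assumption: every WD2010 line follows the layout seen in this log,
# e.g. "EF=/Id?/ cy=120. sc=8. hd=7."
WD_RE = re.compile(r"EF=/([^/]*)/ cy=(\d+)\. sc=(\d+)\. hd=(\d+)\.")


def tally_wd2010(lines):
    """Count WD2010 errors by (error flag, cylinder, head)."""
    counts = Counter()
    for line in lines:
        if not line.startswith("WD2010"):
            continue  # skip HDERR and drv/part/blk lines
        match = WD_RE.search(line)
        if match:
            flag, cyl, _sector, head = match.groups()
            counts[(flag, int(cyl), int(head))] += 1
    return counts
```

Calling tally_wd2010(open("unix.log")) would group repeats such as the
CRC errors at cylinder 433, head 0, while one-off entries on other
cylinders and heads would each show up with a count of 1.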

[Please note the extra diagnostics are from the UNIX 3.51dG1 I'm running;
it gives some more verbose output...]

Oh well.  Suggestions?  Of course ICUS will remain down for now.  I hope
to have it running ASAP.  Please forward any *important* mail to another
Email address:  ...lenny@sbcs.sunysb.edu, sbcs!alps!lenny,
ames!limbic!alps!lenny.

It looks like we all get struck by this once in a while... :-(

-Lenny
[A UNIX pc hacker looking to get a UNIX/386 machine these days...] :-)