[comp.sys.sun] Intermittent failure of CDC Wren-IV

henkel@ncsuvx.ncsu.edu (Chuck Henkel) (10/03/89)

Context: Sun 386i 150XP w/ 327 MB CDC Wren IV

Problem: After the Sun has been up for a while (~2 days) I begin to get
messages like the following appearing on the console:

Oct  2 03:24:51 nepjt sd2a:  read failed,  blk 6728  (abs blk 6728)
Oct  2 03:24:51 nepjt        sense key(0x4): hardware error,  error code(0x9): servo error
Oct  2 03:24:51 nepjt sd2h:  write recoverable,  blk 16  (abs blk 87784)
Oct  2 03:24:51 nepjt        sense key(0x1): soft error,  error code(0x9): servo error

If I power down the system for an hour or so, it will reboot fine and no
problems occur for another 2 days or so.

Here are some other errors that begin to appear:

Oct  2 03:19:25 nepjt sd2:  block 0x1d960 needs mapping
Oct  2 03:19:25 nepjt sd2:  warning, block 0x1d960 has failed 16 times
Oct  2 03:19:25 nepjt sd2h:  read recoverable,  blk 33416  (abs blk 121184)
Oct  2 03:19:25 nepjt        sense key(0x1): soft error,  error code(0x17): recoverable error

And:

Oct  2 03:32:22 nepjt sd2h:  write retry,  blk 33416  (abs blk 121184)
Oct  2 03:32:22 nepjt        sense key(0x4): hardware error,  error code(0x44): unknown error
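The block numbers in these messages appear to tie together. Assuming the SunOS sd driver reports "blk" relative to the partition and "abs blk" from the start of the disk, and that the hex number in the "needs mapping" lines is also an absolute block, a small Python sketch can cross-check them. The names LOG, REL, HEX and analyze are made up for illustration, and the regexes only cover the message shapes quoted above:

```python
import re

# Block-bearing console messages from the post (the "sense key" detail
# lines are left out because they carry no block numbers).
LOG = """\
Oct  2 03:24:51 nepjt sd2a:  read failed,  blk 6728  (abs blk 6728)
Oct  2 03:24:51 nepjt sd2h:  write recoverable,  blk 16  (abs blk 87784)
Oct  2 03:19:25 nepjt sd2:  block 0x1d960 needs mapping
Oct  2 03:19:25 nepjt sd2:  warning, block 0x1d960 has failed 16 times
Oct  2 03:19:25 nepjt sd2h:  read recoverable,  blk 33416  (abs blk 121184)
Oct  2 03:32:22 nepjt sd2h:  write retry,  blk 33416  (abs blk 121184)
"""

# Assumed format: "sd2h: ... blk REL (abs blk ABS)", decimal numbers.
REL = re.compile(r"(sd\d+[a-h]):.*?blk (\d+)\s+\(abs blk (\d+)\)")
# Assumed format: "sd2: ... block 0xHEX ...", an absolute block in hex.
HEX = re.compile(r"(sd\d+):.*?block (0x[0-9a-fA-F]+)")


def analyze(log):
    """Return (partition start blocks, message count per absolute block)."""
    offsets = {}  # partition name -> absolute block where it starts
    hits = {}     # absolute block -> number of messages naming it
    for line in log.splitlines():
        m = REL.search(line)
        if m:
            part, rel, absolute = m.group(1), int(m.group(2)), int(m.group(3))
            start = absolute - rel
            # Every message for one partition should imply the same start.
            if offsets.setdefault(part, start) != start:
                raise ValueError(f"inconsistent offset for {part}")
        else:
            m = HEX.search(line)
            if m is None:
                continue
            absolute = int(m.group(2), 16)
        hits[absolute] = hits.get(absolute, 0) + 1
    return offsets, hits


offsets, hits = analyze(LOG)
print(offsets)  # {'sd2a': 0, 'sd2h': 87768}
print(hits)     # {6728: 1, 87784: 1, 121184: 4}
```

Run on the messages above, it gives a start of 0 for sd2a and 87768 for sd2h (both sd2h messages agree on that), and it counts four messages against absolute block 121184, which is exactly 0x1d960. In other words, the "needs mapping" warnings and the sd2h read/write retries appear to be about the same physical block.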

I invoked format and did a "refresh" under the "analyze" menu. I hesitate
to do more since it takes *so long* to restore SunOS from floppies, and
it's not even clear that I'd be able to, given this erratic behavior.
Anyway, the refresh seemed to help, but it didn't last.

Incidentally, according to format, the defect list is empty for this drive.
Is that reasonable?

Also, while these errors are spewing out, the disk is making a funny sound,
as if there were a marble periodically being whacked around inside it.

I'd appreciate any suggestions. It's best to reply to the following address,
since my Sun address (nepjt) may be defunct:

henkel%nevsa@ncsuvx.ncsu.edu

| Chuck Henkel                      | Quote from Flight 232 survivor:   |
| N.C. State University             |   "And God opened up a hole in    |
| Department of Nuclear Engineering |   the floor so we could escape."  |
| henkel%nepjt@ncsuvx.ncsu.edu      | OK, so who crashed the plane?     |

poffen@sj.ate.slb.com (Russ Poffenberger) (10/04/89)

In article <1881@brazos.Rice.edu> nepjt!henkel@ncsuvx.ncsu.edu (Chuck Henkel) writes:
>X-Sun-Spots-Digest: Volume 8, Issue 149, message 6 of 10
>
>Context: Sun 386i 150XP w/ 327 MB CDC Wren IV
>
>Problem: After the Sun has been up for a while (~2 days) I begin to get
>messages like the following appearing on the console:

  ...error messages deleted

>If I power down the system for an hour or so, it will reboot fine and no
>problems occur for another 2 days or so.

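Might be a thermal hardware problem, since it only shows up after a couple
of days of uptime and goes away after the drive has been powered off for a
while.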

>Incidentally, according to format the defect list is empty for this drive.
>Is that reasonable?

I don't think I have EVER run across a drive that didn't at least have
some defects in it. Sounds fishy to me. My CDC Wren IV has 82 defects.

Russ Poffenberger               DOMAIN: poffen@sj.ate.slb.com
Schlumberger Technologies       UUCP:   {uunet,decwrl,amdahl}!sjsca4!poffen
1601 Technology Drive           CIS:    72401,276
San Jose, Ca. 95110
(408)437-5254