[comp.sys.sgi] kernel and tar errors

frobinso@cirm.northrop.COM (Fletcher Robinson) (06/15/88)

I have come up against two problems on our new 4D 70G. One machine with
very little user activity (waiting for software), after sitting idle
for several days, will begin to display the following error:
   error in kernel #42 severity=2 , etc.
This locks me out of the system, and the only way to recover is to
power down. After power is restored, it functions as before for several
days until it displays the same error again.


Another problem comes up when using tar to back up files. There is a
profusion of the following error message:
   1ps0d0s6: error csr=0x4000 bn=?????? statcode=83 recovered, errcode=0x1
accompanied occasionally by:
   tp7: (18) correctable data error


Anyone have any insight into these errors? Are they avoidable?

perry@PHOENIX.PRINCETON.EDU ("Kevin R. Perry") (06/16/88)

>From info-iris-request@brl-vmb.arpa Thu Jun 16 10:03:04 1988
>Date:     Wed, 15 Jun 88 9:40:43 PDT
>From: Fletcher Robinson <frobinso@cirm.northrop.com>
>To: info-iris@BRL.ARPA
>Subject:  kernel and tar errors
>Message-Id:  <8806151244.aa15986@SMOKE.BRL.ARPA>
>
>I have come up against two problems on our new 4D 70G. One machine with
>very little user activity(..waiting for software..), after sitting idle
>for several days will begin to display the following error :
>   error in kernel #42 severity=2 , etc.

Haven't seen this one, sorry.

>
>Another problem is using tar to backup files. There is a profusion of
>the following error message:
>   1ps0d0s6: error csr=0x4000 bn=?????? statcode=83 recovered, errcode=0x1
>accompanied sparingly by :
>   tp7: (18) correctable data error
	
We have experienced this problem on our 4Ds.
I recently asked an SGI field service person about it.
He claims the problem is known to SGI, and they're working on it.
It's in the hardware, and they supposedly understand what is wrong:
something about impedance matching on something in the
tape-drive unit.  He says it's perfectly safe to ignore these messages,
and maybe someday they'll have a fix.  Hey, at least it gives you
something to watch while you're doing backups! :-)

K.Perry
perry%phoenix@princeton.edu
Sys Prog
Computing & Info. Technology
Princeton Univ.

rpaul@dasys1.UUCP (Rodian Paul) (06/17/88)

> Another problem is using tar to backup files. There is a profusion of
> the following error message:
>    1ps0d0s6: error csr=0x4000 bn=?????? statcode=83 recovered, errcode=0x1
> accompanied sparingly by :
>    tp7: (18) correctable data error

You are probably running a 380 meg Hitachi drive on the 4D in question?
SGI will be replacing cables for these drives pretty soon. It seems the
controller can't keep up with the drive sometimes (so I've been told), hence
the drive error messages. Apparently there's nothing to worry about so long
as 'recovered' comes up.
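
If 'recovered' is the thing to watch for, a small filter can pick out any
disk error lines that lack it. This is only a rough sketch: the pattern is
modeled on the single message quoted above, the function name is made up
for illustration, and the real driver output may well be laid out
differently.

```python
import re

# Assumption: disk errors look like "<device>: error <details>", as in the
# one message quoted in this thread. Other drivers may format them differently.
DISK_ERROR = re.compile(r"^\s*(?P<dev>\S+):\s+error\s+(?P<rest>.*)$")

# Whole-word match, so a word such as "unrecovered" does not count.
RECOVERED = re.compile(r"\brecovered\b")


def unrecovered_errors(lines):
    """Return disk error lines that do not contain the word 'recovered'."""
    flagged = []
    for line in lines:
        match = DISK_ERROR.match(line)
        if match and not RECOVERED.search(match.group("rest")):
            flagged.append(line.strip())
    return flagged
```

Feed it the captured console output one line at a time. The tp7 messages
do not match the pattern, so they are skipped rather than flagged.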

The tp7: (18) has been a real bitch for me. I get the message on about 7 out
of 10 'tar cv' runs. The message is specific to the tape drive.

Several weeks ago I had some corrupted data on a couple of tapes.
'tar' read them off fine, but the data was trashed.

After chatting with SGI, someone said that they'd had the same problem at SGI
(later on I was told by many people there that this could never happen).
Anyway, a field engineer came out and we ran a whole lot of tests.
We still got lots of 'correctable data error' messages, but no trashed data.

Since then things seem fine. I don't know what trashed my data before, or if
it's related to the flaky controller/disk cables. Only time will tell...
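
Since tar read the bad tapes without complaint, one way to catch this kind
of silent corruption is to compare what is in the archive against the files
still on disk. The following is a minimal sketch, not anything SGI supplied.
It assumes the archive is available as an ordinary file (for example, a copy
read back off the tape) and that the source files have not changed since the
backup. The names verify_archive and _digest are made up for illustration.

```python
import hashlib
import os
import tarfile


def _digest(fileobj, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a binary file object."""
    h = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def verify_archive(archive_path, source_root):
    """Return names of regular files whose archived bytes differ from disk.

    Members that no longer exist under source_root are reported too.
    """
    mismatched = []
    with tarfile.open(archive_path, "r") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, devices
            on_disk = os.path.join(source_root, member.name)
            if not os.path.isfile(on_disk):
                mismatched.append(member.name)
                continue
            archived = archive.extractfile(member)
            with open(on_disk, "rb") as original:
                if _digest(archived) != _digest(original):
                    mismatched.append(member.name)
    return mismatched
```

An empty list back means every regular file in the archive matched its copy
on disk. Anything listed is worth re-reading or backing up again.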