[comp.unix.questions] TU77 ate tape, then hung device.

lou@hoxna.UUCP (03/14/87)

	I have a backup that runs every night by itself.  Last
night the drive ate the tape, stalling the reel half-way through.
When I showed up in the morning, I cleared the mess out of the drive,
&& restarted the backup, but I couldn't create /dev/rmt0 .

	I looked around, and the original cpio was still running, so
I did kill -9 20279. No problem, right ?  Funny thing is, the process
was still alive.  I could 'kill -9' it all I wanted, and it wouldn't
die.  I killed the parent, && eventually it was adopted by init, 
(ppid was 1 ), but I *still* couldn't touch it.  And of course, *that* 
process was still using /dev/rmt0, so I couldn't access the device.....  
Anyone have any idea what happened ? Am I even right in assuming that 
this 'immortal' cpio was stopping me from writing to the device ?

It's a 780 running 5.2, TU77 drive.  I finally re-booted, && the problem
went away.  Anything less drastic I could have done ?

                                                     lou @ hoxna

ps - No, the backup procedure doesn't catch any signals.

ken@rochester.UUCP (03/14/87)

What you have is a controller hang. You can't kill the process because
it is waiting for a controller event and the software signal can't
touch the process until that event happens.
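
A toy model may make this concrete. The sketch below is Python, not
kernel code, and every name in it (Proc, post_signal, wakeup, the
"tape_io_done" event) is invented for illustration. It assumes only
what is described above: a signal sent to a process in an
uninterruptible wait is recorded as pending, and is acted on when the
awaited event finally arrives.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

SIGKILL = 9  # the signal number sent by `kill -9`


@dataclass
class Proc:
    pid: int
    waiting_on: Optional[str] = None  # event slept on; None means runnable
    interruptible: bool = True        # may a signal cut the sleep short?
    pending: Set[int] = field(default_factory=set)
    alive: bool = True


def deliver(p: Proc) -> None:
    """Act on pending signals; only SIGKILL is modelled here."""
    if SIGKILL in p.pending:
        p.alive = False


def post_signal(p: Proc, sig: int) -> None:
    """Record the signal; act on it at once only if nothing blocks it."""
    p.pending.add(sig)
    if p.waiting_on is None or p.interruptible:
        p.waiting_on = None
        deliver(p)


def wakeup(p: Proc, event: str) -> None:
    """The device (or its error path) reports the event the process awaits."""
    if p.waiting_on == event:
        p.waiting_on = None
        deliver(p)
```

With p = Proc(20279, waiting_on="tape_io_done", interruptible=False),
any number of post_signal(p, SIGKILL) calls leave p.alive true; only
wakeup(p, "tape_io_done") lets the kill take effect. In this model a
hung controller is simply a wakeup() that never gets called.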

If you know how to reset the controller, you can twiddle bits by
adb'ing /dev/kmem. Sometimes a power cycle of the tape controller will
work, but I won't guarantee that it won't crash your system.  Otherwise
the only recourse is to power cycle and reboot.

	Ken

gdw@ssl-macc.co.uk (Grenville Whelan) (03/17/87)

In article <1105@hoxna.UUCP> lou@hoxna.UUCP ( L. Marco ) writes:
>
>	I looked around, and the original cpio was still running, so
>I did kill -9 20279. No problem, right ?  Funny thing is, the process
>was still alive.  I could 'kill -9' it all I wanted, and it wouldn't
>die.  I killed the parent, && eventually it was adopted by init, 
>(ppid was 1 ), but I *still* couldn't touch it.  And of course, *that* 
>process was still using /dev/rmt0, so I couldn't access the device.....  
>Anyone have any idea what happened ? Am I even right in assuming that 
>this 'immortal' cpio was stopping me from writing to the device ?

I've encountered similar problems on a VAX (BSD4.2). I used "mt" to skip
n files on a tape, but the tape only contained (n-1) files, so
the process ended up reading to the end of tape. As I couldn't CTRL-C
the "mt" command, I attempted to kill the process from another terminal,
and got the same problem you did; the process wouldn't die.

>It's a 780 running 5.2, TU77 drive.  I finally re-booted, && the problem
>went away.  Anything less drastic I could have done ?

I powered the tape drive off and on again; the "mt" process then exited
and everything carried on as normal.

Anyone have any ideas as to why "mt" can't be interrupted or killed while
the device is being accessed?
-- 
    /  Grenville Whelan                |   Software Sciences Ltd,     \
   /   TEL   -  +44 625 29241          |   London & Manchester House,  \
   \   EMAIL -  gdw@ssl-macc.co.uk     |   Park Street,                /
    \  UUCP  -  !mcvax!ukc!sslvax!gdw  |   Macclesfield, UK.          /

rbl@nitrex.UUCP ( Dr. Robin Lake ) (03/17/87)

In article <1105@hoxna.UUCP> lou@hoxna.UUCP ( L. Marco ) writes:
>
>	...
>Anyone have any idea what happened ? Am I even right in assuming that 
>this 'immortal' cpio was stopping me from writing to the device ?
>

We've seen similar behavior on other machines.  It acts like cpio is waiting
for an "interrupt" from the tape drive.  Sometimes we can fix it by turning
the power to the tape drive off and on, forcing an error.  If your driver and/or
controller are not set up for that  ---  or if the drive is mechanically fouled
up  ---  that event might not get back to the cpio task.

Another rarely used trick is to use a piece of foil to fool the drive into
thinking it has reached physical end-of-tape.

Disclaimer:  This advice comes from the Dark Ages before joining my current
employer, when I designed device controllers.

Rob Lake
decvax!cwruecmp!nitrex!rbl

rbj@icst-cmr.arpa (03/24/87)

   In article <1105@hoxna.UUCP> lou@hoxna.UUCP ( L. Marco ) writes:
   >	I looked around, and the original cpio was still running, so
   >I did kill -9 20279. No problem, right ?  Funny thing is, the process
   >was still alive.  I could 'kill -9' it all I wanted, and it wouldn't
   >die.  I killed the parent, && eventually it was adopted by init, 
   >(ppid was 1 ), but I *still* couldn't touch it.  And of course, *that* 
   >process was still using /dev/rmt0, so I couldn't access the device.....  
   >Anyone have any idea what happened ? Am I even right in assuming that 
   >this 'immortal' cpio was stopping me from writing to the device ?

   I've encountered similar problems on a VAX (BSD4.2). I used "mt" to skip
   n files on a tape, but the tape only contained (n-1) files, so
   the process ended up reading to the end of tape. As I couldn't CTRL-C
   the "mt" command, I attempted to kill the process from another terminal,
   and got the same problem you did; the process wouldn't die.

   >It's a 780 running 5.2, TU77 drive.  I finally re-booted, && the problem
   >went away.  Anything less drastic I could have done ?

   I powered the tape drive off and on again; the "mt" process then exited
   and everything carried on as normal.

This is a better solution than rebooting. An even better one is to punch
the offline button. The tape controller will report an error and terminate
the I/O, which lets the system deliver the signal and kill the process.
Then you rewind the tape.

   Anyone have any ideas as to why "mt" can't be interrupted or killed while
   the device is being accessed?

My guess is that the process is sleeping at a priority numerically at or
below the magic constant PZERO (a lower number means a higher priority).
What that means is that I/O to `fast' devices like disks (and by extension
tapes) is not interrupted by signals. I don't know if you can ^Z it either,
so you just have to learn to put all your mt commands in the background
with `&'. Rewind seems to be an exception.
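
The PZERO rule can be written as a one-line comparison. The Python
sketch below encodes it, on the assumption that the kernel's sleep()
looks for signals only when the requested priority is numerically
greater than PZERO. The priority values are recalled from 4.2BSD's
<sys/param.h> and should be treated as assumptions; System V 5.2 may
use different numbers. The function names are made up for the example.

```python
# Sleep priorities: a LOWER number is a HIGHER priority.
# Values assumed from 4.2BSD <sys/param.h>; other kernels may differ.
PRIBIO = 20   # waiting for block I/O to finish (disk, tape)
PZERO = 25    # the dividing line
TTIPRI = 28   # waiting for terminal input
PWAIT = 30    # waiting for a child to exit


def signal_can_break_sleep(pri: int, pzero: int = PZERO) -> bool:
    """True if a signal may wake a process sleeping at priority `pri`."""
    return pri > pzero


def describe(name: str, pri: int) -> str:
    """One-line summary of whether a sleep at `pri` can be interrupted."""
    kind = "interruptible" if signal_can_break_sleep(pri) else "uninterruptible"
    return f"{name} (pri {pri}): {kind}"
```

Under these assumed values, a process waiting for tape I/O at PRIBIO
falls on the uninterruptible side of the line, while one waiting for
terminal input at TTIPRI can be interrupted.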

(Root Boy) Jim "Just Say Yes" Cottrell	<rbj@icst-cmr.arpa>
I want to kill everyone here with a cute colorful Hydrogen Bomb!!

P.S. It won't recognize ^C either.

ken@rochester.UUCP (03/24/87)

|My guess is that the process is sleeping at a priority numerically at or
|below the magic constant PZERO (a lower number means a higher priority).
|What that means is that I/O to `fast' devices like disks (and by extension
|tapes) is not interrupted by signals. I don't know if you can ^Z it either,
|so you just have to learn to put all your mt commands in the background
|with `&'. Rewind seems to be an exception.

Yes that will prevent it from eating your login session, but it won't
prevent the tape drive from becoming unavailable.
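
For scripted backups, the same idea can be sketched in Python: start
the tape command as a child, wait a bounded time, and carry on if it
has not finished. The helper name run_in_background and the timeout
are hypothetical. As noted above, this only keeps the caller from
getting stuck; it does nothing to free a hung drive.

```python
import subprocess


def run_in_background(argv, wait_seconds):
    """Start argv as a child process and wait at most wait_seconds.

    Returns (proc, status). status is the exit code, or None if the
    command is still running when the wait expires.
    """
    proc = subprocess.Popen(argv)
    try:
        status = proc.wait(timeout=wait_seconds)
    except subprocess.TimeoutExpired:
        status = None  # still running, possibly stuck on the device
    return proc, status
```

A call such as run_in_background(["mt", "-f", "/dev/rmt0", "fsf", "3"], 30)
would be one way to use it, assuming an mt that accepts those
arguments. A status of None after the wait means the command has not
come back and the drive probably needs attention.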

	Ken