lou@hoxna.UUCP (03/14/87)
I have a backup that runs every night by itself. Last night the drive ate the tape, stalling the reel halfway through. When I showed up in the morning, I cleared the mess out of the drive, && restarted the backup, but I couldn't create /dev/rmt0.

I looked around, and the original cpio was still running, so I did kill -9 20279. No problem, right? Funny thing is, the process was still alive. I could 'kill -9' it all I wanted, and it wouldn't die. I killed the parent, && eventually it was adopted by init (ppid was 1), but I *still* couldn't touch it. And of course, *that* process was still using /dev/rmt0, so I couldn't access the device...

Anyone have any idea what happened? Am I even right in assuming that this 'immortal' cpio was stopping me from writing to the device?

It's a 780 running 5.2, TU77 drive. I finally rebooted, && the problem went away. Anything less drastic I could have done?

lou @ hoxna

ps - No, the backup procedure doesn't catch any signals.
ken@rochester.UUCP (03/14/87)
What you have is a controller hang. You can't kill the process because it is waiting for a controller event, and the software signal can't touch the process until that event happens. If you know how to reset the controller, you can twiddle its bits by adb'ing /dev/kmem. Sometimes power-cycling the tape controller will work, but I won't guarantee that it won't crash your system. Otherwise, the only recourse is to power-cycle the machine and reboot.

Ken
gdw@ssl-macc.co.uk (Grenville Whelan) (03/17/87)
In article <1105@hoxna.UUCP> lou@hoxna.UUCP ( L. Marco ) writes:
>
> I looked around, and the original cpio was still running, so
>I did kill -9 20279. No problem, right ? Funny thing is, the process
>was still alive. I could 'kill -9' it all I wanted, and it wouldn't
>die. I killed the parent, && eventually it was adopted by init,
>(ppid was 1 ), but I *still* couldn't touch it. And of course, *that*
>process was still using /dev/rmt0, so I couldn't access the device.....
>Anyone have any idea what happened ? Am I even right in assuming that
>this 'immortal' cpio was stopping me from writing to the device ?

I've encountered similar problems on a VAX (BSD4.2). I used "mt" to skip n files on a tape, but the tape only contained (n-1) files, so the process ended up reading to the end of the tape. As I couldn't CTRL-C out of the "mt" command, I attempted to kill the process from another terminal and got the same problem you did: the process wouldn't die.

>It's a 780 running 5.2, TU77 drive. I finally re-booted, && the problem
>went away. Anything less drastic I could have done ?

I powered off (and on) the tape drive; the "mt" process then exited and everything carried on as normal.

Anyone have any ideas as to why "mt" can't be interrupted or killed while the device is being accessed?
--
  / Grenville Whelan              | Software Sciences Ltd,       \
 /  TEL   - +44 625 29241         | London & Manchester House,    \
 \  EMAIL - gdw@ssl-macc.co.uk    | Park Street,                  /
  \ UUCP  - !mcvax!ukc!sslvax!gdw | Macclesfield, UK.            /
rbl@nitrex.UUCP ( Dr. Robin Lake ) (03/17/87)
In article <1105@hoxna.UUCP> lou@hoxna.UUCP ( L. Marco ) writes:
>
> ...
>Anyone have any idea what happened ? Am I even right in assuming that
>this 'immortal' cpio was stopping me from writing to the device ?
>

We've seen similar behavior on other machines. It acts as if cpio is waiting for an "interrupt" from the tape drive. Sometimes we can fix it by turning the power to the tape drive off and on, forcing an error. If your driver and/or controller are not set up for that, or if the drive is mechanically fouled up, that event might not get back to the cpio task.

Another rarely used trick is to take a piece of foil and fool the drive into thinking it has reached physical end-of-tape.

Disclaimer: This advice comes from the Dark Ages before joining my current employer, when I designed device controllers.

Rob Lake
decvax!cwruecmp!nitrex!rbl
rbj@icst-cmr.arpa (03/24/87)
>In article <1105@hoxna.UUCP> lou@hoxna.UUCP ( L. Marco ) writes:
>> I looked around, and the original cpio was still running, so
>>I did kill -9 20279. No problem, right ? Funny thing is, the process
>>was still alive. I could 'kill -9' it all I wanted, and it wouldn't
>>die. I killed the parent, && eventually it was adopted by init,
>>(ppid was 1 ), but I *still* couldn't touch it. And of course, *that*
>>process was still using /dev/rmt0, so I couldn't access the device.....
>>Anyone have any idea what happened ? Am I even right in assuming that
>>this 'immortal' cpio was stopping me from writing to the device ?
>
>I've encountered similar problems on a VAX, (BSD4.2). I used "mt" to skip
>n number of files on a tape, but the tape only contained (n-1) files, so
>the process ended up reading to the end of tape. As i couldn't CTRL-C from
>the "mt" command, i attempted to kill the process from another terminal,
>and got the same problem you did; the process wouldn't die.
>
>>It's a 780 running 5.2, TU77 drive. I finally re-booted, && the problem
>>went away. Anything less drastic I could have done ?
>
>I powered off, (and on), the tape drive, the "mt" process then exited and
>everything carried on as normal.

This is a better solution. An even better one is to punch the offline button. The tape controller will report an error and terminate the I/O, allowing the system to deliver the signal, which kills the process. Then you rewind the tape.

>Anyone have any ideas as to why "mt" can't be interrupted or killed while
>the device is being accessed?

My guess is that the process is sleeping at a priority more urgent than the magic constant PZERO (numerically, a value at or below it). What that means is that I/O to `fast' devices like disks (and by extension tapes) is not interrupted by signals. I don't know if you can ^Z it either, so you just have to learn to put all your mt commands in the background with `&'. Rewind seems to be an exception.
>  / Grenville Whelan              | Software Sciences Ltd,       \
> /  TEL   - +44 625 29241         | London & Manchester House,    \
> \  EMAIL - gdw@ssl-macc.co.uk    | Park Street,                  /
>  \ UUCP  - !mcvax!ukc!sslvax!gdw | Macclesfield, UK.            /

(Root Boy) Jim "Just Say Yes" Cottrell	<rbj@icst-cmr.arpa>
I want to kill everyone here with a cute colorful Hydrogen Bomb!!

P.S. It won't recognize ^C either.
ken@rochester.UUCP (03/24/87)
|My guess is that the process is sleeping at a priority greater (less than?)
|the magic constant PZERO. What that means is that I/O to `fast' devices
|like disks (and by extension tapes) are not interrupted by signals. I don't
|know if you can ^Z it either, so you just have to learn to put all your
|mt commands in background with `&'. Rewind seems to be an exception.

Yes, that will prevent it from eating your login session, but it won't prevent the tape drive from becoming unavailable.

Ken