braun@drivax.UUCP (Kral) (09/21/87)
I posted this once before, about 3 or 4 months ago, and never really got an answer to it, so I'm going to try one more time. Surely this isn't unique to my system, and surely someone has found a way around it.

The problem is in killing the line printer daemon when one of the printers has gone off line. The particular situation arises when the system automatically comes down for weekly backups (no discussions on the merits of single- vs multi-user, stand alone vs live backups, please). Last weekend, while testing a new version of the scripts, one of the printers was offline and still had a couple of jobs scheduled for it. My script attempted to fix this situation by performing '/etc/lpc abort all' before bringing the system down. The command reports that it is killing the daemon (both when run from the take-down script and when run manually). But when shutdown occurs, I get the "something won't die" message, and /usr won't unmount.

After rebooting and recreating the situation, I tried to kill the daemon manually with kill -9. It just won't die. So what's an admin to do? (I just can NOT believe an operating system with this much development experience behind it can't kill a process, even one waiting on I/O).

System:
    Vax 11/780, Berkeley Unix 4.2, RP07, RM03, No Network.

--
kral 408/647-6112 ...{ism780|amdahl}!drivax!braun
"Dream lightyears... Challenge miles... Walk in steps"
DISCLAIMER: If DRI knew I was saying this stuff, they would shut me d~-~oxx
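
For anyone trying to reproduce the "kill -9 does nothing" symptom, one way to confirm that a process really survived a SIGKILL is to probe it with signal 0 afterwards. What follows is only a minimal sketch in Python, not part of the original take-down scripts; the helper names `is_alive` and `kill_and_check` are made up for illustration, and it assumes a POSIX system on which the caller is allowed to signal the target.

```python
import errno
import os
import signal
import time


def is_alive(pid):
    """Return True if a process with this pid still exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except OSError as err:
        if err.errno == errno.ESRCH:
            return False  # no such process
        if err.errno == errno.EPERM:
            return True   # exists, but we may not signal it
        raise
    return True


def kill_and_check(pid, timeout=5.0, poll=0.1):
    """Send SIGKILL, then poll; return True if the process went away in time."""
    try:
        os.kill(pid, signal.SIGKILL)
    except OSError as err:
        if err.errno == errno.ESRCH:
            return True  # already gone before we signalled it
        raise
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not is_alive(pid):
            return True
        time.sleep(poll)
    return not is_alive(pid)
```

A daemon stuck in an uninterruptible sleep would make `kill_and_check` return False. Note that an unreaped child (a zombie) also still answers to signal 0, so the check is only meaningful for processes that are not the caller's own unreaped children.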
chris@mimsy.UUCP (Chris Torek) (09/23/87)
In article <2419@drivax.UUCP> braun@drivax.UUCP (Kral) writes:
>... But when shutdown occurs, I get the "something won't die" message,
>and /usr won't unmount.
>After rebooting and recreating the situation, I tried to kill the daemon
>manually with kill -9. It just won't die. So what's an admin to do?

Fix the kernel bug. The process is hung in some driver close() routine. These routines have a tendency not to be thoroughly tested, and to break under unusual conditions. If you ever get a process stuck such that not even shutdown clears it, this is a kernel bug: not necessarily a serious one, but a bug nonetheless.

> System:
> Vax 11/780, Berkeley Unix 4.2, RP07, RM03, No Network.

Well, at least you (should) have source. You might try switching to 4.3BSD, which has numerous bugs fixed over 4.2BSD.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
amos@taux01.UUCP (Amos Shapir) (09/25/87)
You did not say what type of tty hardware you use, but this problem usually happens with tty line drivers that wait for the carrier signal on input. Since the driver expects to be sleeping for only a short time, it sleeps at a negative priority, which makes the process virtually un-killable.

Some solutions:

Off Line:
* BSD kernels have a compile-time 'soft carrier' flag that ensures the driver will never wait for the hardware carrier bit;
* Fix the tty driver so its logic fits the hardware.

On Line:
* Connect a terminal at the printer's end and watch lpr empty out on the screen;
* Hack /dev/kmem to make the driver think the required bit is on (wizards only);
* Reboot.

I hope that helps.
--
Amos Shapir (My other cpu is a NS32532)
National Semiconductor (Israel)
6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel Tel. +972 52 522261
amos%taux01@nsc.com (used to be amos%nsta@nsc.com) 34 48 E / 32 10 N
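
The point about sleep priority can be illustrated with a toy model. As commonly described for BSD-derived kernels, sleep() only looks for pending signals when the requested priority is numerically above PZERO; a sleep at or below that threshold is not woken by any signal, SIGKILL included. The sketch below is Python, not kernel code. The value of PZERO and the two example priorities are assumptions chosen for illustration and should be checked against the param.h on the system in question.

```python
PZERO = 25  # assumed threshold; check param.h on your own system


def sleep_is_interruptible(pri, pzero=PZERO):
    """A sleep can be interrupted by a signal only above the threshold."""
    return pri > pzero


def signal_sleeper(pri, sig_pending=True):
    """Model what happens to a process sleeping at priority `pri`."""
    if sig_pending and sleep_is_interruptible(pri):
        return "woken to handle signal"
    return "still asleep"


if __name__ == "__main__":
    # Both priorities below are hypothetical examples, not values
    # taken from any particular driver.
    for name, pri in [("interruptible wait", 28),
                      ("uninterruptible wait", 20)]:
        print(name, "->", signal_sleeper(pri))
```

A process in the second state stays there until the driver's own wakeup arrives, which for an offline printer may be never. That is consistent with kill -9 appearing to do nothing.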