bobvan (11/17/82)
Consider this scenario: I run halt or shutdown -h from the terminal at
my desk. Shutdown time arrives, the final shutdown messages go around,
and my terminal stops echoing characters. I head for the machine room
to work on the machine, finding the usual "use console to halt system"
message on the console. I get back to my desk hours later and find
this message on my terminal screen:

	CAUTION: some process(es) wouldn't die

What happened was that some process(es) were hung (usually in an I/O
driver) and halt couldn't get them to die in the 30 seconds allowed
for this. The message was written to my terminal 30 seconds after I
left for the machine room.

At the very least, the message should have gone to the console rather
than my terminal. Preferably, halt should have forked a shell for the
console so that I could do a ps axl and see what it was that refused
to die.

The preferred behavior I describe above is exactly what init does if
it runs into the same snag when trying to shut the system down. Much
of the process killing code in halt and reboot has analogous code in
init. I think it is wrong to duplicate this code (and associated bugs)
all over the system. The halt and reboot commands should just signal
init and let it do all the work. I don't see any good reason why it
wasn't done this way in the first place.

As a quick fix, I've routed the message to the console. Diff -c is
below. This diff also shows the fix for the first bug I found in halt.
Since installing that fix, errors from fsck have dropped from an
average of 2.4 errors per boot to about 0.07 errors per boot! Have any
other Berkeley sites noticed a similar drop?

Bob Van Valzah (...!decvax!ittvax!tpdcvax!bobvan)

*** halt.orig.c	Fri Oct 29 18:01:22 1982
--- halt.c	Wed Nov 17 09:08:24 1982
***************
*** 18,24
  	int howto;
  	char *ttyn = (char *)ttyname(2);
  	register i;
! 	register qflag;
  
  	howto = RB_HALT;
  	argc--, argv++;
--- 18,24 -----
  	int howto;
  	char *ttyn = (char *)ttyname(2);
  	register i;
! 	register qflag = 0;	/* RAV bug fix - have to init qflag to false */
  
  	howto = RB_HALT;
  	argc--, argv++;
***************
*** 57,63
  			exit(1);
  		}
  		if (i > 5) {
! 			fprintf(stderr, "CAUTION: some process(es) wouldn't die\n");
  			break;
  		}
  		setalarm(2 * i);
--- 57,68 -----
  			exit(1);
  		}
  		if (i > 5) {
! 			FILE *con;	/* RAV important messages go to console */
! 
! 			if ((con=fopen("/dev/console", "w")) == NULL)
! 				con = stderr;
! 			fprintf(con,
! 			    "CAUTION: some process(es) wouldn't die\r\n");
  			break;
  		}
  		setalarm(2 * i);
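
For anyone who wants the console fallback from the diff above as a
reusable routine, here is a minimal sketch in modern C. The names
open_console and warn_console are hypothetical (they are not in halt.c),
and the device path is passed in as a parameter so the routine can be
tried against an ordinary file. Unlike the quick fix, this version also
flushes and closes the stream, on the assumption that the caller wants
the text written out before it goes on to halt.

```c
#include <stdio.h>

/*
 * Hypothetical helper: open the named console device for writing.
 * If it cannot be opened, fall back to stderr, as the quick fix does.
 */
FILE *open_console(const char *path)
{
    FILE *con = fopen(path, "w");

    return con != NULL ? con : stderr;
}

/*
 * Hypothetical helper: write one message line to the console (or to
 * stderr as a fallback).  The line ends in "\r\n", as in the diff.
 * Returns 1 if the message was written and flushed, 0 otherwise.
 */
int warn_console(const char *path, const char *msg)
{
    FILE *con = open_console(path);
    int ok = fprintf(con, "%s\r\n", msg) >= 0;

    if (fflush(con) != 0)
        ok = 0;
    /* Only close what we opened ourselves; never close stderr. */
    if (con != stderr && fclose(con) != 0)
        ok = 0;
    return ok;
}
```

A caller in the spirit of the patched halt would do something like
warn_console("/dev/console", "CAUTION: some process(es) wouldn't die").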
ecn-pa:bruner (11/18/82)
Our PDP-11 "init" was locally modified to perform shutdowns. It kills
everyone off, first with a signal 15 and then (after 15 seconds) with
a sequence of signal 9's. It then prints a message to the console,
turns off accounting, does a sync() and exits. [It has to turn off
accounting to prevent the final accounting record from being written
after the last sync().] A different signal causes "init" to initiate a
reboot after everything has been killed. (The reboot is done by a
user-mode program via "phys"; it's dirty but it works very well.)

When any of our systems is rebooted, the "/etc/rc" file looks for the
file "/down". If "/down" exists, it is removed and the system comes up
right away. If not, an automatic filesystem recovery is performed. The
shutdown program (on the VAX) and "init" (on our PDP-11's) create
"/down" just before the final sync() if and only if they could kill
everything off (and hence the shutdown was "clean").

This system has worked very well for us.

--John Bruner
Purdue/EE
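
The "/down" marker scheme described above can be sketched in C roughly
as follows. This is an illustration only, not the Purdue code: the boot
side there lives in the "/etc/rc" shell script, and the function names
mark_clean_shutdown and needs_recovery are hypothetical. The marker
path is a parameter so the logic can be tried on a scratch file, and
the final sync() is left to the caller.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Shutdown side: create the marker only if every process was killed,
 * i.e. the shutdown was "clean".  The caller is expected to do the
 * final sync() afterwards.  Returns 1 if the marker was created.
 */
int mark_clean_shutdown(const char *marker, int all_killed)
{
    FILE *f;

    if (!all_killed)
        return 0;               /* dirty shutdown: leave no marker */
    if ((f = fopen(marker, "w")) == NULL)
        return 0;
    return fclose(f) == 0;
}

/*
 * Boot side: remove the marker if it is there.  Returns 0 if the
 * marker existed (come up right away) and 1 if filesystem recovery
 * is needed.  Any failure to remove the marker is treated as
 * "recovery needed", which is the conservative choice.
 */
int needs_recovery(const char *marker)
{
    if (unlink(marker) == 0)
        return 0;               /* marker found and consumed */
    return 1;
}
```

Because the boot side removes the marker as it checks for it, a crash
after a clean boot leaves no marker behind, so the next boot runs the
recovery.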