bobvan (10/30/82)
Yow! I've just found a rather serious bug in our Berkeley 4.1bsd halt.c. I'm posting to net.unix-wizards as well as to net.bugs.4bsd, for the benefit of many ARPA folks who don't get net.bugs.

Last week, I found a bug in init that was preventing logout records from getting written when the system was taken down to single user with "kill -TERM 1". This week I noticed that logout records were still not getting written if we went down with "shutdown -h". I've traced the cause to the halt command always doing a "quick halt" instead of a "clean halt".

Halt normally broadcasts a SIGKILL to all processes and waits for them to die before pulling the reboot system call that really halts the machine. However, it has a "-q" option for when you are in a hurry and want to skip the killing and just halt. An uninitialized register variable was causing it to skip the killing in all cases.

The reboot system call *does* do a sync() before halting, but running user processes can fill the disk queues as fast as sync() can empty them. Hence using the halt command as distributed isn't much safer than halting the system with the console or just kicking the plug out of the wall.

Perhaps this fix will cut down on the number of unreferenced inode messages from fsck. We've been averaging about one or two unreferenced inodes every time we reboot. I will report in a few weeks if our console logs show that there is a correlation.

A condensed form of halt.c that shows the missing initialization follows the signature. Here is another great case for using lint more often. I guess that there aren't many sites that go down as often as we do.
Waiting to find the next shutdown bug,
Bob Van Valzah (...!decvax!ittvax!tpdcvax!bobvan)

main(argc, argv)
	int argc;
	char **argv;
{
	register qflag;		/* should be "register qflag = 0;" */

	argc--, argv++;
	while (argc > 0) {
		if (!strcmp(*argv, "-q"))
			qflag++;
		argc--, argv++;
	}
	if (!qflag) {
		/* kill procs and wait for them to die */
	}
	syscall(55, howto);	/* this could be "reboot(howto);" */
}
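
For anyone who wants to see the fix in isolation, here is a minimal sketch of the flag parsing with the initialization in place. It is written in ANSI-style C rather than the K&R style of the listing above, and the function name parse_qflag is made up for illustration; it is not part of halt.c. The only point is that qflag starts at zero, so the "clean halt" path is taken unless "-q" really was given.

```c
#include <string.h>

/*
 * Hypothetical helper, not from halt.c: returns nonzero if "-q"
 * appears among the arguments. argv[0] is taken to be the program name.
 */
int parse_qflag(int argc, char **argv)
{
    int qflag = 0;          /* the initialization the distributed code lacked */

    argc--, argv++;         /* skip the program name */
    while (argc > 0) {
        if (strcmp(*argv, "-q") == 0)
            qflag++;
        argc--, argv++;     /* advance to the next argument */
    }
    return qflag;
}
```

With the variable left uninitialized, the register may hold whatever was in it before, so the "if (!qflag)" test can fail even when no "-q" was typed. With the explicit zero, a plain "halt" returns 0 from this helper and the kill-and-wait step would run.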