trb (12/13/82)
In the middle of November someone suggested that fsck needn't be run at boot time if UNIX was halted gracefully. This was around the time that Bob Van Valzah submitted some halt.c fixes for 4.1bsd, but I seem to recall a reference to implementing this on 2.81bsd.

I hacked my 4.1bsd halt.c to create /etc/downclean if all user processes died properly, and I hacked /etc/rc to test for and remove /etc/downclean before (conditionally) checking the filesystems. When I bring the system down cleanly, it only takes a few seconds to autoreboot (you should SEE the smile on my face), and when UNIX dies a horrible death it does the proper fsck.

This scheme has worked well for me so far. Does anyone out there know if I'm neglecting some horrible possible condition? If not, then I think that everyone should install this code in their halt.c and /etc/rc. Fscks take SO long. I'll post the simple diffs if no one comes up with a reason that my suggestion is dangerous.

Andy Tannenbaum
Bell Labs
Whippany, NJ
(201) 386-6491
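For readers who want to see the shape of the scheme before any diffs are posted, here is a minimal sketch in C. It is not the actual halt.c or /etc/rc change: the function names mark_clean and consume_clean are made up for illustration, the calls are modern POSIX ones rather than 4.1bsd's, and the boot-time half (which the post does in /etc/rc) is written in C only so both halves can sit side by side. The path is a parameter so the sketch can be tried on a scratch file instead of /etc/downclean.

```c
#include <fcntl.h>
#include <unistd.h>

/* Shutdown side (hypothetical helper): call this only after all user
 * processes have exited cleanly. Creates an empty marker file.
 * Returns 0 on success, -1 on failure. */
int mark_clean(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    (void)fsync(fd);            /* best effort: push the marker to disk */
    return close(fd);
}

/* Boot side (hypothetical helper): test for the marker and remove it in
 * one step. Returns 1 if the marker was present (the last shutdown was
 * clean, so fsck may be skipped) and 0 otherwise (run fsck). Removing
 * the marker here means a later crash cannot find a stale one. */
int consume_clean(const char *path)
{
    return unlink(path) == 0 ? 1 : 0;
}
```

The point of removing the marker before deciding is that the file only ever vouches for the one shutdown that created it. If the system then crashes, the next boot finds no marker and does the full check.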
bobvan (12/15/82)
We (only?) have about 200 Mbytes of files split across 2 RM05s taking about 39,000 inodes. Fsck gets thru these in less than 5 minutes, meaning it checks about 130 files/second (11/780 w/fpa). Make sure you've got the pass number field in /etc/fstab set correctly if you're not getting similar thruput.

I'd like to suggest that we consider speeding up fsck rather than eliminating it on clean shutdowns. The current scheme, where it forks one copy per arm per pass number, is a great improvement over just checking serially, but I think more could be done. Fsck is much faster on nearly empty filesystems than on nearly full ones. (It's much faster to check the free list than inodes and directories.) If you have two partitions to check on the same pass, where one is 90% full and the other is 10% full, fsck waits for BOTH checks to finish before going on to the next pass. The arm containing the 10% full filesystem sits idle while fsck slugs away on the full filesystem. I'd like to see fsck fork one copy of itself per ARM rather than one copy per arm per pass number. This should give you the maximum overlap of CPU and I/O.

Even though we average three boots per day, I'm not considering skipping the fsck. I've been keeping track of the errors fixed by fsck. Errors fell from about 3 per boot to about 1 in 20 boots when I fixed halt.c, but they didn't fall to zero. My current judgement is that the potential harm of running on a damaged filesystem is greater than the 5 minutes "lost" while fsck runs. This judgement would no doubt change if we had more drives or even if our existing disks filled up.

P.S. Our Symbolics laser printer just arrived! No manuals and no software. Both are "on the way", but the novelty of watching the self test has worn off.
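To make the per-arm argument concrete, here is a small sketch in C that compares the two schedules. It is only a model, not fsck code: the struct and function names are made up, each filesystem's check time is taken as a given number of seconds, checks on the same arm are assumed to run one after another, and CPU contention between the forked copies is ignored. Arm and pass numbers are assumed to be small non-negative integers below the two limits.

```c
#include <stddef.h>

#define MAX_ARMS   8   /* assumed upper bound on drive arms  */
#define MAX_PASSES 8   /* assumed upper bound on pass number */

/* One filesystem to check (hypothetical model of an fstab entry). */
struct fs_check {
    int arm;      /* which drive the partition lives on          */
    int pass;     /* pass number field, 0 .. MAX_PASSES-1        */
    int seconds;  /* how long fsck takes on this filesystem      */
};

/* Current scheme: within a pass the arms run in parallel, but the next
 * pass cannot start until the slowest arm of this pass is done. */
int time_per_pass(const struct fs_check *fs, size_t n)
{
    int total = 0;
    for (int pass = 0; pass < MAX_PASSES; pass++) {
        int busy[MAX_ARMS] = {0};
        int slowest = 0;
        for (size_t i = 0; i < n; i++)
            if (fs[i].pass == pass)
                busy[fs[i].arm] += fs[i].seconds;
        for (int a = 0; a < MAX_ARMS; a++)
            if (busy[a] > slowest)
                slowest = busy[a];
        total += slowest;
    }
    return total;
}

/* Proposed scheme: one copy per arm works through all of that arm's
 * filesystems without waiting for the other arms. */
int time_per_arm(const struct fs_check *fs, size_t n)
{
    int busy[MAX_ARMS] = {0};
    int slowest = 0;
    for (size_t i = 0; i < n; i++)
        busy[fs[i].arm] += fs[i].seconds;
    for (int a = 0; a < MAX_ARMS; a++)
        if (busy[a] > slowest)
            slowest = busy[a];
    return slowest;
}
```

With invented timings of 180 s and 30 s for the two filesystems on arm 0 (passes 1 and 2) and 20 s and 150 s for those on arm 1, the per-pass schedule costs 180 + 150 = 330 s, while the per-arm schedule costs max(210, 170) = 210 s. The gap comes entirely from the arm that sits idle at the end of each pass.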
mahler (12/16/82)
At Purdue we have had the shutdown program making a "/down" file for a long time. Even under v6 UN*X, whether fsck ran depended on the presence or absence of "/down". We have not had any problems with this approach (we changed to our own copies of shutdown and a modified halt under 4.1). When a system has 1.2 GB, 100 terminals, and 64 pseudo ports (on our network) you can't take a 20 minute recovery lightly.

I do suggest that we pick a common name for the file. Installing it in the root directory keeps the search for the file minimal.

(Do you remember icheck, dcheck, ncheck, clri, and hand patching vanilla v6 disk ... thanks to all involved with the creation of fsck)

Steve Mahler, Network Service Mgr., ECN, Purdue Univ.