[net.unix-wizards] quicker system startup

trb (12/13/82)

In the middle of November someone suggested that fsck needn't be run at
boot time if UNIX was halted gracefully.  This was around the time that
Bob Van Valzah submitted some halt.c fixes for 4.1bsd, but I seem to
recall reference to implementing this on 2.81bsd.

I hacked my 4.1bsd halt.c to create /etc/downclean if all user
processes died properly, and I hacked /etc/rc to test for and remove
/etc/downclean before (conditionally) checking the filesystems.  When I
bring the system down cleanly, it only takes a few seconds to autoreboot
(you should SEE the smile on my face) and when UNIX dies a horrible
death it does the proper fsck.
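
The marker-file idea above can be sketched in a few lines of C. This is only an illustration, not the diffs mentioned in the post: it uses modern POSIX calls rather than 4.1bsd ones, the function names mark_clean_shutdown and consume_clean_marker are hypothetical, and in the scheme described the boot-time test lives in the /etc/rc shell script rather than in C.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical shutdown-side helper: leave an empty marker file behind
 * once the system has been brought down cleanly.
 * Returns 0 on success, -1 on failure. */
int mark_clean_shutdown(const char *marker)
{
    int fd;

    assert(marker != NULL);
    fd = open(marker, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    return close(fd);
}

/* Hypothetical boot-side helper: returns 1 if the marker was present
 * (the filesystem check may be skipped), 0 otherwise.  The marker is
 * removed as part of the test, so a crash after this boot cannot be
 * mistaken for a clean halt. */
int consume_clean_marker(const char *marker)
{
    assert(marker != NULL);
    return unlink(marker) == 0;
}
```

Removing the marker in the same step as testing for it is the important part: if the file survived the boot, the next crash would look like a graceful shutdown.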

This scheme has worked well for me so far.  Does anyone out there know
if I'm neglecting some horrible possible condition?  If not, then I
think that everyone should install this code in their halt.c and
/etc/rc.  Fscks take SO long.

I'll post the simple diffs if no one comes up with a reason that my
suggestion is dangerous.

	Andy Tannenbaum   Bell Labs  Whippany, NJ   (201) 386-6491

bobvan (12/15/82)

We (only?) have about 200 Mbytes of files split across 2 RM05's taking
about 39,000 inodes.  Fsck gets thru these in less than 5 minutes,
meaning it checks about 130 files/second (11/780 w/fpa).  Make sure
you've got the pass number field in /etc/fstab set correctly if you're
not getting similar thruput.
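
The throughput figure follows from the numbers quoted above: 39,000 inodes in 5 minutes (300 seconds). A small sketch of the arithmetic, with a hypothetical function name; since the run took "less than 5 minutes", the result is a lower bound.

```c
#include <assert.h>

/* Average number of files checked per second, rounded down. */
long files_per_second(long inodes, long seconds)
{
    assert(seconds > 0);
    return inodes / seconds;
}
```

Calling files_per_second(39000, 300) gives 130, matching the figure in the post.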

I'd like to suggest that we consider speeding up fsck rather than
eliminating it on clean shutdowns.  The current scheme where it forks one copy
per arm per pass number is a great improvement over just checking
serially, but I think more could be done.  Fsck is much faster on nearly
empty filesystems than on nearly full ones.  (It's much faster to check
the free list than inodes and directories.) If you have two partitions
to check on the same pass, where one is 90% full and the other is 10%
full, fsck waits for BOTH checks to finish before going on to the next
pass.  The arm containing the 10% full filesystem sits idle while fsck
slugs away on the full filesystem.  I'd like to see fsck fork one copy
of itself per ARM rather than one copy per arm per pass number.  This
should give you the maximum CPU I/O overlap.
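
To make the difference concrete, here is a rough model of the two schedules, not fsck's actual code. It assumes each filesystem has a known check time, that checks on different arms overlap perfectly, and that CPU contention can be ignored; the struct and function names and the array limits are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARMS   8
#define MAX_PASSES 8

struct fs_check {
    int pass;     /* pass number, 1..MAX_PASSES */
    int arm;      /* drive index, 0..MAX_ARMS-1 */
    int seconds;  /* assumed time to check this filesystem */
};

/* Wall time when every pass waits for its slowest arm before the
 * next pass starts (one copy per arm per pass number). */
int time_per_pass(const struct fs_check *fs, size_t n)
{
    int total = 0;

    for (int pass = 1; pass <= MAX_PASSES; pass++) {
        int busy[MAX_ARMS] = {0};
        int slowest = 0;

        for (size_t i = 0; i < n; i++) {
            assert(fs[i].arm >= 0 && fs[i].arm < MAX_ARMS);
            assert(fs[i].pass >= 1 && fs[i].pass <= MAX_PASSES);
            if (fs[i].pass == pass)
                busy[fs[i].arm] += fs[i].seconds;
        }
        for (int a = 0; a < MAX_ARMS; a++)
            if (busy[a] > slowest)
                slowest = busy[a];
        total += slowest;
    }
    return total;
}

/* Wall time when each arm works through its own filesystems
 * without waiting for the other arms (one copy per arm). */
int time_per_arm(const struct fs_check *fs, size_t n)
{
    int busy[MAX_ARMS] = {0};
    int slowest = 0;

    for (size_t i = 0; i < n; i++) {
        assert(fs[i].arm >= 0 && fs[i].arm < MAX_ARMS);
        busy[fs[i].arm] += fs[i].seconds;
    }
    for (int a = 0; a < MAX_ARMS; a++)
        if (busy[a] > slowest)
            slowest = busy[a];
    return slowest;
}
```

With made-up timings of two arms and two passes, where arm 0 holds a 90-second and then a 10-second filesystem and arm 1 holds a 10-second and then a 90-second one, the per-pass schedule takes 90 + 90 = 180 seconds while the per-arm schedule takes 100.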

Even though we average three boots per day, I'm not considering
skipping the fsck.  I've been keeping track of the errors fixed by fsck.
Errors fell from about 3 per boot to about 1 in 20 boots when I fixed
halt.c, but they didn't fall to zero.  My current
judgement is that the potential harm of running on a damaged filesystem
is greater than the 5 minutes "lost" while fsck runs.  This judgement
would no doubt change if we had more drives or even if our existing
disks filled up.


P.S. Our Symbolics laser printer just arrived!  No manuals and no
software.  Both are "on the way", but the novelty of watching the
self test has worn off.

mahler (12/16/82)

At Purdue we have had the shutdown program making a "/down" file for
a long time.  Even under v6 UN*X, whether fsck ran depended on
the presence or absence of "/down".  We have not had any problems
with this approach (we changed to our own copies of a shutdown and
modified halt under 4.1).  When a system has 1.2 GB, 100 terminals,
and 64 pseudo ports (on our network) you can't take a 20 minute
recovery lightly.   I do suggest that we pick a common name for the
file; installing it in the root directory means only a minimal
search is needed to find it.   (Do you remember icheck, dcheck,
ncheck, clri, and hand patching vanilla v6 disk ... thanks
to all involved with the creation of fsck)

Steve Mahler, Network Service Mgr., ECN, Purdue Univ.