[net.unix-wizards] 4.2 fsck and /etc/rc question

edmund@turtlevax.UUCP (Ed Trujillo) (02/07/85)

Why does the following scenario happen under 4.2 running on a vax 750?

At boot time,  fsck finds out that the root file system was modified so 
it exits with a condition code of 4 that is passed on to /etc/rc which 
immediately does an /etc/reboot -n.  According to the documentation for
reboot the -n option avoids the sync.  Why then does fsck do a sync() 
before the call to exit(4) ???  Is there a logical reason for this?

BTW, this apparent bug didn't appear in the 4.2 buglist.
-- 

Ed(mund) Trujillo @ CADLINC, Menlo Park, CA
{amd,decwrl,nsc,seismo,spar}!turtlevax!edmund

tim@callan.UUCP (Tim Smith) (02/09/85)

In article <649@turtlevax.UUCP> edmund@turtlevax.UUCP (Ed Trujillo) writes:
>reboot the -n option avoids the sync.  Why then does fsck do a sync() 
>before the call to exit(4) ???  Is there a logical reason for this?

The above is for 4.2bsd, so what I say may be wrong.  On Sys V it goes
like this....

When fsck is run on the root, the cooked file system is used.  This is
because of the way fsck determines when it is doing root.  So fsck must
do a sync() at the end to make sure the disk gets changed.

This can cause problems.  If there is a problem with the inode for the
console, fsck will fix it on the disk, but since it is using the console,
the modify time on the in-core copy of the inode gets changed, and so
the final sync() writes it out, putting you right back where you started!

Here at Callan, we changed fsck to NOT do the final sync when doing a
raw device.  Then problems like the above can be fixed by using fsck
on the raw root.
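
A toy model of the console-inode scenario above may make the sequence
easier to follow.  It is only a sketch: the function name, the string
states, and the assumption that the final sync writes the whole stale
in-core inode over the repaired one are illustrative, not taken from
Sys V or from Callan's modified fsck.

```python
def fsck_console_inode(final_sync):
    """Return the on-disk console inode state after a toy fsck run.

    Hypothetical model: "damaged"/"repaired" stand in for inode contents.
    """
    disk_inode = "damaged"
    incore_inode = "damaged"   # kernel's copy, held because the console is open

    disk_inode = "repaired"    # fsck fixes the inode on the disk
    incore_modified = True     # console output changes the in-core modify time

    if final_sync and incore_modified:
        # Assumption: sync flushes the stale in-core inode over the fix.
        disk_inode = incore_inode
    return disk_inode


with_sync = fsck_console_inode(final_sync=True)
without_sync = fsck_console_inode(final_sync=False)
print(with_sync, without_sync)
```

In this model the run with the final sync ends with the damaged inode
back on disk, while the run without it keeps the repair.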
-- 
Duty Now for the Future
					Tim Smith
			ihnp4!wlbr!callan!tim or ihnp4!cithep!tim

steveg@hammer.UUCP (Steve Glaser) (02/10/85)

In article <649@turtlevax.UUCP> Ed(mund) Trujillo @ CADLINC, Menlo Park writes:

>Why does the following scenario happen under 4.2 running on a vax 750?
>
>At boot time,  fsck finds out that the root file system was modified so 
>it exits with a condition code of 4 that is passed on to /etc/rc which 
>immediately does an /etc/reboot -n.  According to the documentation for
>reboot the -n option avoids the sync.  Why then does fsck do a sync() 
>before the call to exit(4) ???  Is there a logical reason for this?

4.2 fsck uses the cooked device on the fsck of the root filesystem.
When fsck is done, if it had to modify the superblock of the filesystem
there will be TWO copies of the superblock around in kernel memory.
One is the copy kept by the kernel cause the file system is mounted and
the other is a normal block in the block buffer cache due to the write
by fsck when it fixed the superblock.  The sync(2) system call will
write out BOTH of these copies (and always has).  The trick on 4.2 (4.1
too?) is that the kernel makes sure that the block buffer version gets
written out *after* the other one.  Thus the sync inside fsck is
correct and gets the updated stuff onto the disk.

You must then avoid all syncs until the reboot cause the copy in the
block buffer cache is no longer marked dirty (cause the sync in fsck
wrote it out and nobody has changed it since then).  Thus another sync
would write out the wrong copy of the superblock onto disk, undoing
some of the work that fsck just did.

Summary: this is a case where one sync (inside fsck) is correct and
more than one will undo some of the work fsck just did for you.
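
The one-sync-good, two-syncs-bad behaviour can be sketched with a toy
model.  This is not kernel code: the class, its method names, and the
assumption that every sync rewrites the mounted file system's in-core
superblock are all hypothetical, chosen only to mirror the ordering
described above.

```python
class ToyKernel:
    """Toy model of the two superblock copies (hypothetical, not 4.2BSD code)."""

    def __init__(self, superblock):
        self.disk = superblock      # what is actually on the disk
        self.incore = superblock    # copy kept because the root is mounted
        self.buffer = None          # copy in the block buffer cache, if any
        self.buffer_dirty = False

    def fsck_write(self, superblock):
        # fsck writes the repaired superblock through the cooked device,
        # so it lands in the buffer cache and is marked dirty.
        self.buffer = superblock
        self.buffer_dirty = True

    def sync(self):
        # Assumption: the in-core superblock is written on every sync.
        self.disk = self.incore
        # Dirty buffers go out afterwards, so a dirty buffer copy wins.
        if self.buffer_dirty:
            self.disk = self.buffer
            self.buffer_dirty = False


kernel = ToyKernel("damaged")
kernel.fsck_write("repaired")
kernel.sync()                 # the sync inside fsck
after_first = kernel.disk
kernel.sync()                 # a second sync before the reboot
after_second = kernel.disk
print(after_first, after_second)
```

Under these assumptions the first sync leaves the repaired superblock
on disk and the second puts the damaged one back, which is why
/etc/rc uses reboot -n.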

Disclaimer: I'm not saying that I *like* this scheme.  It works, but
seems kinda fragile.  At bare minimum, it should be documented and the
"new expanded" semantics of sync(2) should be guaranteed by all future
systems.

	Steve Glaser
	tektronix!steveg

Ron Natalie <ron@BRL-TGR> (02/10/85)

It's not a bug.  The sync doesn't kill you.  When working on "hotroot"
fsck uses the cooked device.  The sync is desirable in this instance
as it forces the changed superblock, etc. back to the disk.

-Ron