[comp.unix.wizards] O_SYNC and filesystem updating

hope@gatech.UUCP (02/06/87)

I realize that this subject may have already been beaten to death, but:

When a (regular) file is opened with the O_SYNC flag, the write
syscalls "will not return until both the file data and file status
have been physically updated." (System V 'write' man page)

By "file status" are we to assume that the write syscall also waits for
the superblock to be updated due to inode information changing?  Also, does
it wait for the update of the directory where the file resides (e.g. in
the event of file creation)?

I don't really have the time to look through the source code, and would
appreciate any insight or knowledge on the subject.

Thanks.
-- 
Theodore Hope
School of Information & Computer Science, Georgia Tech, Atlanta GA 30332
CSNet:	Hope @ gatech		ARPA:	Hope@ics.GATECH.EDU
uucp:	...!{akgua,decvax,hplabs,ihnp4,linus,seismo,ulysses}!gatech!hope

guy@gorodish.UUCP (02/09/87)

>By "file status" are we to assume that the write syscall also waits for
>the superblock to be updated due to inode information changing?

No, why?  The only thing in the superblock that would change would be
the free lists of various flavors, and whenever you do an "fsck" they
get rebuilt, so there's no point in guaranteeing their correctness
(they're not state information, they're just hints to the file system
code).

>Also, does it wait for the update of the directory where the file resides
>(e.g. in the event of file creation)?

No.  The only thing it does is guarantee 1) that all writes to the
data blocks of the file are done synchronously, 2) all writes of
indirect blocks of the file are done synchronously, and 3) the inode
is updated synchronously after every write.
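
As a rough illustration of the guarantee described above, here is a minimal sketch in C of writing through a descriptor opened with O_SYNC. The helper name sync_write_file and the 0644 mode are assumptions made for the example; they do not come from the thread or from any man page.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): write len bytes to path
 * through a descriptor opened with O_SYNC.
 * Returns 0 on success, -1 on error. */
int sync_write_file(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);

    if (fd < 0)
        return -1;

    while (len > 0) {
        /* With O_SYNC set, write() is not supposed to return until the
         * file data and file status have been physically updated. */
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return close(fd);
}
```

Going by the reply above, this covers the data blocks, indirect blocks and inode of the file itself; it says nothing about the directory entry of a newly created file.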

rcodi@yabbie.UUCP (02/12/87)

In article <12946@sun.uucp>, guy%gorodish@Sun.COM (Guy Harris) writes:
> No.  The only thing it does is guarantee 1) that all writes to the
> data blocks of the file are done synchronously, 2) all writes of
> indirect blocks of the file are done synchronously, and 3) the inode
> is updated synchronously after every write.

This O_SYNC feature sounds like humongous overkill, and not
very well thought out.  I'll bet it's there because it could be added
to most kernels in less than half an hour (by basically changing
calls to bdwrite() into calls to bwrite() if O_SYNC is set).

It is not sensible to ensure that *every* write be synchronously updated
to disk.  To do this would incur enormous disk overheads.  Picture
doing a write of 2 bytes here and then 3 bytes there -- both in the same
disk block -- it requires 2 disk writes to do it.  If it wasn't important
that the 2 bytes be synchronously written before the 3 bytes then why
go to the effort to do it?  If it was important, then fine, we must live
with it.  The thing is, that most times it *isn't* necessary.

For an application that requires that at "certain times" the disk
image must be correct, 4BSD's fsync() call is much more sensible, and
only incurs overhead when you call it -- you can do an unlimited
number of writes to the buffer cache between fsync() calls, and
your application will fly during that time.  The data is physically
written to disk only when the update daemon does a sync(), when the
system needs the buffers for something else, or when the user calls
fsync().
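
The pattern described above (many cheap buffered writes, then one explicit flush) might look something like the following sketch. The helper name write_batch_then_fsync, its parameters and the use of O_APPEND are assumptions made for the example, not anything specified in the thread.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): append `count` copies of
 * the string `rec` to `path` through the buffer cache, then force them
 * to disk with a single fsync().
 * Returns 0 on success, -1 on error. */
int write_batch_then_fsync(const char *path, const char *rec, int count)
{
    size_t len = strlen(rec);
    int i;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (fd < 0)
        return -1;

    /* No O_SYNC here: these writes only need to reach the buffer cache. */
    for (i = 0; i < count; i++) {
        if (write(fd, rec, len) != (ssize_t)len) {
            close(fd);
            return -1;
        }
    }

    /* The one point where the caller pays for physical I/O. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The trade-off is the one argued in the post: nothing written since the last fsync() is guaranteed to be on disk if the system crashes in between.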

I suppose that you could open the file twice with SVR[23], once
with O_SYNC and once without, to achieve a similar effect, but then you 
have all sorts of problems if you use a buffered package such as stdio 
to do I/O on them, unless you use the O_SYNC fd for just flushing
blocks (even that won't work properly unless you rewrite all the
blocks you wrote with the non O_SYNC fd).

Ian D.

guy@gorodish.UUCP (02/15/87)

>It is not sensible to ensure that *every* write be synchronously updated
>to disk.

The O_SYNC flag is controllable by "fcntl".  You can turn it on when
opening the file (or not - I hope that writes done when *creating* a
file are done so that there won't be any serious file system
inconsistencies *regardless* of whether O_SYNC is set), turn it off
for most writes, and turn it back on when you are about to do a write
that must be flushed to disk at that time.  There's no need to open
two file descriptors.
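
A sketch of the fcntl approach, under the assumption that the system lets F_SETFL change O_SYNC as described here for System V. That is not universal: the Linux fcntl documentation, for one, says F_SETFL changes only O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME and O_NONBLOCK, so a request to set O_SYNC is silently ignored there. The helper name set_osync is hypothetical, and it re-reads the flags so the caller can tell whether the request took effect.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): ask the kernel to turn
 * O_SYNC on (on != 0) or off (on == 0) for an open descriptor.
 * Returns  1 if the flag ended up in the requested state,
 *          0 if the call succeeded but the flag did not change,
 *         -1 on error. */
int set_osync(int fd, int on)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;

    flags = on ? (flags | O_SYNC) : (flags & ~O_SYNC);
    if (fcntl(fd, F_SETFL, flags) < 0)
        return -1;

    /* Re-read the flags: some kernels accept F_SETFL but ignore O_SYNC. */
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;

    return ((flags & O_SYNC) == O_SYNC) == (on != 0);
}
```

A caller would bracket the critical write with set_osync(fd, 1) and set_osync(fd, 0), and fall back to fsync() after the write whenever the first call returns 0.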

meissner@dg_rtp.UUCP (02/24/87)

In article <413@yabbie.rmit.oz> rcodi@yabbie.rmit.oz (Ian Donaldson) writes:
> 
> It is not sensible to ensure that *every* write be synchronously updated
> to disk.  To do this would incur enormous disk overheads.  Picture
> doing a write of 2 bytes here and then 3 bytes there -- both in the same
> disk block -- it requires 2 disk writes to do it.  If it wasn't important
> that the 2 bytes be synchronously written before the 3 bytes then why
> go to the effort to do it?  If it was important, then fine, we must live
> with it.  The thing is, that most times it *isn't* necessary.

Then don't set O_SYNC for that file.  For your typical database with
transactions, writes to the log file must be guaranteed.

> I suppose that you could open the file twice with SVR[23], once
> with O_SYNC and once without, to achieve a similar effect, but then you 
> have all sorts of problems if you use a buffered package such as stdio 
> to do I/O on them, unless you use the O_SYNC fd for just flushing
> blocks (even that won't work properly unless you rewrite all the
> blocks you wrote with the non O_SYNC fd).

I believe you can also set O_SYNC on/off with fcntl at any time.  This
avoids the problems you mentioned above.
-- 
	Michael Meissner, Data General	Uucp: ...mcnc!rti-sel!dg_rtp!meissner

It is 11pm, do you know what your sendmail and uucico are doing?