daveb@llama.rtech.UUCP ("It takes a clear mind to make it") (05/06/88)
One problem with the access (and modify) time fields is that when you are running in SV's O_SYNC mode, you end up doing two disk writes all the time: once for the data, and once for the inode. This can be frightfully expensive. It would be nice if there were some way for O_SYNCed files to avoid this: maybe just defer the access/mod time write until the file is closed.

On another issue, a number of us here have wanted another chunk of info in the inode for a long time. We call it the "ratfink" field, which would contain the real user id of the last process that modified the file. This would make it a lot easier to track some things down...

-dB
{amdahl, cpsc6a, mtxinu, ptsfa, sun, hoptoad}!rtech!daveb daveb@rtech.uucp
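For readers who have not used the flag, here is a minimal sketch of the pattern under discussion. It assumes a POSIX system that defines O_SYNC; the function name append_synced is hypothetical and not from the post. With O_SYNC set, write() does not return until the data and the file's metadata have reached the disk, which is where the post's "two disk writes" come from on the System V implementations it describes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append `len` bytes to `path` with synchronous
 * writes. Returns 0 on success, -1 on any failure. */
int append_synced(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    /* O_SYNC: each write() blocks until data and inode are on disk. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);
    int rc = (n == (ssize_t)len) ? 0 : -1;   /* treat a short write as failure */

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Calling append_synced("journal.dat", rec, sizeof rec) once per record pays the synchronous cost on every call, which is the overhead the post complains about.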
karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) (05/06/88)
daveb@llama.rtech.UUCP says...
   It would be nice if there were some way for
   O_SYNCed files to avoid [2 writes/call]: maybe just defer the
   access/mod time write until the file is closed.

I would disagree. If data blocks are updated but the inode is left in-core-only, what value have I gained by using O_SYNC? If the system crashes right now, with half a megabyte of data written, but no record of the changes to the inode yet out to disc, O_SYNC has become useless.

I tend to think that O_SYNC is specifically intended for those folks who are explicitly willing to put up with the performance hit of 2 physical writes/call.

--Karl
daveb@llama.rtech.UUCP (It takes a clear mind to make it) (05/08/88)
In article <12606@tut.cis.ohio-state.edu> karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) writes:
>daveb@llama.rtech.UUCP says...
>   It would be nice if there were some way for
>   O_SYNCed files to avoid [2 writes/call]: maybe just defer the
>   access/mod time write until the file is closed.
>
>I would disagree. If data blocks are updated but the inode is left
>in-core-only, what value have I gained by using O_SYNC? If the system
>crashes right now, with half a megabyte of data written, but no record
>of the changes to the inode yet out to disc, O_SYNC has become
>useless.
>
>I tend to think that O_SYNC is specifically intended for those folks
>who are explicitly willing to put up with the performance hit of 2
>physical writes/call.

OK, then you write the inode when the file size changes, and update the mod/access time at close. Maybe you have O_SYNC work this way if an additional (or alternative) option, say O_FASTSYNC, were provided.

The specific example we have in mind is this. There is a 1M file lying around, within which we are doing seeks, reads, and writes to perform updates (as you might imagine, this is DBMS activity). We don't give a hoot about the access/mod time on the file, but we care a lot that the data actually makes it to the disk. That extra disk i/o can cut your throughput in half, which is not acceptable.

It would also be perfectly acceptable to have an additional system call that explicitly said, "I am extending this file to this length, get the pages and write the inode to disk." To a certain extent, this would be a generalization of the existing BSD ftruncate call.

-dB
{amdahl, cpsc6a, mtxinu, ptsfa, sun, hoptoad}!rtech!daveb daveb@rtech.uucp
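The two requests above (extend the file once and commit the inode, then do in-place updates that sync only the data) can be sketched with interfaces that POSIX standardized after this thread. This is an illustration under stated assumptions, not what any 1988 kernel offered: O_DSYNC asks that each write wait for the data plus only the metadata needed to read it back, which is roughly the proposed O_FASTSYNC, and posix_fallocate() reserves the blocks up front. Whether timestamp writes are actually skipped depends on the kernel and filesystem. The helper names open_db_file, extend_and_commit, and update_in_place are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open (creating if needed) a file whose writes wait
 * for data integrity only, not for timestamp updates. */
int open_db_file(const char *path)
{
    return open(path, O_RDWR | O_CREAT | O_DSYNC, 0644);
}

/* Hypothetical helper: "I am extending this file to this length, get the
 * pages and write the inode to disk." Done once, up front. */
int extend_and_commit(int fd, off_t length)
{
    assert(length > 0);
    /* posix_fallocate() returns an error number on failure, not -1. */
    if (posix_fallocate(fd, 0, length) != 0)
        return -1;
    return fsync(fd);   /* flush data and inode (size, block map) */
}

/* Hypothetical helper: overwrite `len` bytes at `offset` inside the
 * already-allocated region; the file size does not change. */
int update_in_place(int fd, off_t offset, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n <= 0)
            return -1;
        p += n;
        offset += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A DBMS-style caller would run extend_and_commit(fd, 1 << 20) once when the file is created and update_in_place() for every record update afterward, so the inode is forced out when the size changes rather than on every write.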