ok@quintus.uucp (Richard A. O'Keefe) (07/21/88)
In article <302@infmx.UUCP> aland@infmx.UUCP (Alan S. Denney @ Informix) writes:
>The funniest thing is that they choose the Suns to run their benchmarks.
>Sun 3.X machines DO NOT SUPPORT synchronous writes anyway (no O_SYNC flag
>here, folks), so any claim that their benchmarks are hurt by "integrity
>issues" on these machines is BOGUS.  The only way to force i/o is to use
>raw devices; Oracle decries raw devices as being "complex" in their current
>ad.  (If my understanding of Sun 3.X synchronicity is wrong, I will post
>a followup and apology.  I confirmed this with a friend at Sun about a month
>ago).  The current ads do *not* indicate that this integrity claim
>applies only to certain (e.g. SVR2+) ports, that I recall.

I'm using SunOS 3.2.  A quick "grep" through /usr/include/*/*.h confirmed
the absence of an O_SYNC flag.  But "man -k sync" turned up

    fsync (2)       - synchronize a file's in-core state with that on disk

and "man 2 fsync" says that

    fsync(fd)

    fsync moves all modified data and attributes of fd to a permanent
    storage device:  all in-core modified copies of buffers for the
    associated file have been written to a disk when the call returns.
    Note that this is different than sync(2), which schedules disk I/O
    for all files (as though an fsync had been done on all files) but
    returns before the I/O completes.

    fsync should be used by programs which require a file to be in a
    known state; for example, a program which contains a simple
    transaction facility might use it to ensure that all modifications
    to a file or files caused by a transaction were recorded on disk.

    [Sun Release 3.2    Last Change: 16 July 1986]
                                     ^^^^^^^^^^^^

This appears to claim that the changed information has actually been
written on the disc, not merely scheduled for writing.  What more do
you want, exactly?
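For illustration only, here is a minimal sketch of the kind of "simple
transaction facility" the man page describes: append a log record, then
fsync() before treating the transaction as committed.  The log file name
and record text are invented; only open/write/fsync are assumed.

    /*
     * Sketch of the man page's "simple transaction facility" idea.
     * "txn.log" and the record text are made up for illustration.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int commit_record(int logfd, const char *rec)
    {
        /* write() alone may leave the data sitting in the buffer cache */
        if (write(logfd, rec, strlen(rec)) < 0)
            return -1;
        /* fsync() returns only after the dirty buffers are on the disk */
        if (fsync(logfd) < 0)
            return -1;
        return 0;   /* now it is safe to report the transaction committed */
    }

    int main(void)
    {
        int fd = open("txn.log", O_WRONLY | O_CREAT | O_APPEND, 0644);

        if (fd < 0) {
            perror("txn.log");
            return 1;
        }
        if (commit_record(fd, "COMMIT transaction 42\n") < 0)
            perror("commit_record");
        close(fd);
        return 0;
    }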
eric@pyrps5 (Eric Bergan) (07/21/88)
In article <179@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>In article <302@infmx.UUCP> aland@infmx.UUCP (Alan S. Denney @ Informix)
>writes:
>>Sun 3.X machines DO NOT SUPPORT synchronous writes anyway (no O_SYNC flag
>>here, folks),
>I'm using SunOS 3.2.  A quick "grep" through /usr/include/*/*.h confirmed
>the absence of an O_SYNC flag.  But "man -k sync" turned up
>    fsync (2)       - synchronize a file's in-core state with that on disk
>and "man 2 fsync" says that fsync(fd)
>This appears to claim that the changed information has actually been
>written on the disc, not merely scheduled for writing.  What more do
>you want, exactly?

The problem is that fsync is very inefficient compared with O_SYNC, which
is itself slower than writing directly to a raw disk.  The reason for the
performance difference is that fsync must flush all dirty blocks for the
file, so it has to find every dirty block belonging to the file
descriptor.  O_SYNC guarantees each individual write, and avoids the
overhead of either scanning for, or maintaining a per-file-descriptor
list of, dirty blocks.  Raw disk skips the file system code entirely,
which gives you guaranteed contiguous disk space and no worries about
indirect blocks, but the DBMS code then has to do the disk management
(which blocks are used, which are free, etc.) itself.

Since the write that must be assured is the log write, and since
transactions can't finish their commit and release their locks until the
log entry is on disk, this becomes a big bottleneck.  I believe INGRES
used to rely on fsync, but switched to O_SYNC on systems that support it.
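To make the contrast concrete, a sketch under the same assumptions as the
earlier example (file and function names are invented): where O_SYNC
exists, opening the log with it makes every write() synchronous; where it
does not (as on SunOS 3.x, per the discussion above), the code falls back
to an fsync() after each log write.

    /*
     * Open the log O_SYNC if the flag exists, else fsync() per write.
     * "txn.log", open_log() and write_log() are illustrative names only.
     */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    static int log_fd = -1;
    static int need_fsync;      /* nonzero when O_SYNC is unavailable */

    int open_log(const char *path)
    {
    #ifdef O_SYNC
        log_fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    #else
        need_fsync = 1;
        log_fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    #endif
        return log_fd;
    }

    int write_log(const char *rec)
    {
        if (write(log_fd, rec, strlen(rec)) < 0)
            return -1;
        /*
         * With O_SYNC the write above already waited for the disk; with
         * fsync() the kernel must instead hunt down every dirty block
         * belonging to the descriptor, which is the extra cost noted above.
         */
        if (need_fsync && fsync(log_fd) < 0)
            return -1;
        return 0;
    }

    int main(void)
    {
        if (open_log("txn.log") < 0)
            return 1;
        return write_log("COMMIT transaction 43\n") < 0 ? 1 : 0;
    }

A raw-device log avoids even this bookkeeping by bypassing the file system
and buffer cache entirely, at the cost of the DBMS doing its own block
management, as described above.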
john@anasaz.UUCP (John Moore) (07/23/88)
In article <32133@pyramid.pyramid.com> eric@pyrps5.UUCP (Eric Bergan) writes:
%In article <179@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
%>In article <302@infmx.UUCP> aland@infmx.UUCP (Alan S. Denney @ Informix)
%>writes:
%maintaining a per-file-descriptor list of, dirty blocks.  Raw disk
%skips the file system code entirely, which gives you guaranteed
%contiguous disk space and no worries about indirect blocks, but the
%DBMS code then has to do the disk management (which blocks are used,

On a couple of machines, we have observed that raw device I/O does not
seem to get any seek optimization and is extremely unfair between
processes (one process gets lots of I/Os, another gets none, even when
both are at the same priority and doing the same thing).  Apparently the
raw device driver sends only one request at a time to the strategy
routine and waits for it to finish before sending another, thus defeating
the seek optimization.  Does anyone know how common this is, and which
systems do provide seek optimization on raw devices?
-- 
John Moore (NJ7E)            {decvax, ncar, ihnp4}!noao!mcdsun!nud!anasaz!john
(602) 861-7607 (day or eve)  {gatech, ames, rutgers}!ncar!...
The opinions expressed here are obviously not mine, so they must be
someone else's.