[comp.unix.questions] [fl]seek mechanism

cs00chs@unccvax.UUCP (charles spell) (09/01/89)

Does the kernal optimize seeks within an open file?

E.g.,
if you have a file descriptor that is currently at offset 500,000 of a
1,000,000 byte file, which would be faster (to get to byte 500,001)?:

lseek(fd, 1L, 1);             -OR-               lseek(fd, 500001L, 0);

with file descriptors:
fseek(fp, 1L, 1);             -OR-               fseek(fp, 500001L, 0);

_____________________________________________________________________________
Clemson IPTAY: It's Probation Time Again Y'all....

cpcahil@virtech.UUCP (Conor P. Cahill) (09/02/89)

In article <1631@unccvax.UUCP>, cs00chs@unccvax.UUCP (charles spell) writes:
> Does the kernal optimize seeks within an open file?

There is not much to optimize because the seek operation is one of the
simplest (both in overhead & implementation) system calls in the kernel.
It simply sets the file offset to the new value.  The difference between
adding one byte to the current value and storing the new value would
be unmeasurable (especially when compared to the overhead of the 
context switch into kernel mode to perform the system call).

I have worked on systems that had special system calls to perform
an lseek and read/write in a single system call.  The addition
of these system calls had a significant (positive) impact on the
performance of database software, which routinely performs an lseek
with just about every read/write.
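A present-day analog of such a combined call is POSIX pread(), which takes
the offset as an argument. The sketch below is only an illustration of the
idea, not the vendor-specific call from the systems mentioned above; the
helper names (make_scratch_file, read_at_two_calls, read_at_one_call) are
hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch file holding "0123456789".
 * The name is unlinked at once, so nothing is left behind on close. */
int make_scratch_file(void)
{
	char path[] = "/tmp/seekdemoXXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	if (write(fd, "0123456789", 10) != 10) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Two system calls: position the descriptor, then read from there. */
ssize_t read_at_two_calls(int fd, void *buf, size_t len, off_t where)
{
	assert(buf != NULL);
	if (lseek(fd, where, SEEK_SET) == (off_t)-1)
		return -1;
	return read(fd, buf, len);
}

/* One system call: pread() reads at `where` and leaves the
 * descriptor's own file offset untouched. */
ssize_t read_at_one_call(int fd, void *buf, size_t len, off_t where)
{
	assert(buf != NULL);
	return pread(fd, buf, len, where);
}
```

Calling read_at_two_calls(fd, buf, 3, 4) and read_at_one_call(fd, buf, 3, 4)
on the scratch file should both yield "456"; only the first moves the file
offset. Whether the saved system call matters depends on the workload.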

> if you have a file descriptor that is currently at offset 500,000 of a
> 1,000,000 byte file, which would be faster (to get to byte 500,001)?:
> 
> lseek(fd, 1L, 1);             -OR-               lseek(fd, 500001L, 0);

See above.

> 
> with file descriptors:
> fseek(fp, 1L, 1);             -OR-               fseek(fp, 500001L, 0);

For the file POINTERS (not descriptors) I'm not sure whether there are
any local (stdio) operations associated with discarding the current buffer
and getting a new one.  My *guess* would be that there is no measurable
difference, but that is only an uneducated guess.

-- 
+-----------------------------------------------------------------------+
| Conor P. Cahill     uunet!virtech!cpcahil      703-430-9247           |
| Virtual Technologies Inc.,    P. O. Box 876,   Sterling, VA 22170     |
+-----------------------------------------------------------------------+

chris@mimsy.UUCP (Chris Torek) (09/02/89)

In article <1631@unccvax.UUCP> cs00chs@unccvax.UUCP (charles spell) writes:
>Does the kernal optimize seeks within an open file?

This question is basically meaningless, because the kernel (note spelling)
code for lseek---minus error checks, and with names expanded---is:

	fp = this_process.open_files[file_descriptor];
	switch (whence) {
	case 0: fp->f_offset = offset; break;		/* L_SET: absolute */
	case 1: fp->f_offset += offset; break;		/* L_INCR: relative to current */
	case 2: fp->f_offset = fp->f_inode->i_file_size + offset; break;	/* L_XTND: relative to end */
	}
	return;

Offsets from the end of the file are a tiny bit slower than other offsets
due to the extra indirection required to get the file size.  If a system
call requires 100 machine instructions (this estimate is probably a bit
low), case 2 might be 1% slower.
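Seen from user level, the three cases can be checked against each other.
This is a minimal sketch, assuming a POSIX system; make_sized_file and
three_ways_agree are hypothetical names introduced for illustration. Note
that for whence 2 the offset is added to the file size, so positions
before end of file take a negative offset.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make an unlinked scratch file of `size` bytes. */
int make_sized_file(off_t size)
{
	char path[] = "/tmp/lseekdemoXXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, size) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Reach `target` three ways in a file of `size` bytes.
 * Returns 1 if all three calls report `target` as the new offset. */
int three_ways_agree(int fd, off_t start, off_t target, off_t size)
{
	off_t abs_pos, rel_pos, end_pos;

	assert(fd >= 0);
	abs_pos = lseek(fd, target, SEEK_SET);         /* whence 0 */
	lseek(fd, start, SEEK_SET);                    /* back to the start point */
	rel_pos = lseek(fd, target - start, SEEK_CUR); /* whence 1 */
	end_pos = lseek(fd, target - size, SEEK_END);  /* whence 2: size + offset */
	return abs_pos == target && rel_pos == target && end_pos == target;
}
```

For the case in the question, three_ways_agree(fd, 500000, 500001, 1000000)
on a 1,000,000 byte file should return 1.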

>[to go from byte 500000 to byte 500001] with file descriptors:
>fseek(fp, 1L, 1);             -OR-               fseek(fp, 500001L, 0);

Presumably you mean `with stdio'.  In general, existing stdio
implementations are better with offsets from 0 than with offsets from
`current point' or `end of file', so the latter would be faster.  But
`(void) getc(fp)' would be faster still.  Stdio has to make two
lseek calls per fseek, in the most general case, since it needs to
first discover where it is (consider, e.g., `prog >> output', which
might be at byte 5131 when it begins).
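The three ways of moving forward one byte can be compared with ftell().
This is a sketch only; advance_one_byte is a hypothetical name, and it
shows that the results agree, not how fast each one is on a given stdio.

```c
#include <assert.h>
#include <stdio.h>

/* Advance one byte from `start` by each method and record ftell():
 *   out[0]  fseek relative to the current point
 *   out[1]  fseek from offset 0
 *   out[2]  reading and discarding one character
 * Returns 0 on success, -1 if any call fails. */
int advance_one_byte(FILE *fp, long start, long out[3])
{
	assert(fp != NULL);

	if (fseek(fp, start, SEEK_SET) != 0)
		return -1;
	if (fseek(fp, 1L, SEEK_CUR) != 0)
		return -1;
	out[0] = ftell(fp);

	if (fseek(fp, start, SEEK_SET) != 0)
		return -1;
	if (fseek(fp, start + 1, SEEK_SET) != 0)
		return -1;
	out[1] = ftell(fp);

	if (fseek(fp, start, SEEK_SET) != 0)
		return -1;
	if (getc(fp) == EOF)	/* fails at end of file, unlike fseek */
		return -1;
	out[2] = ftell(fp);
	return 0;
}
```

With a file of at least 500,001 bytes, advance_one_byte(fp, 500000L, out)
should leave 500001 in all three slots. The getc form only works when
there is a byte to read, and it may trigger a buffer fill.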
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/03/89)

In article <1631@unccvax.UUCP> cs00chs@unccvax.UUCP (charles spell) writes:
>lseek(fd, 1L, 1);             -OR-               lseek(fd, 500001L, 0);

The kernel does essentially the same operation in both cases, the only
difference being a minuscule amount of extra arithmetic in the
relative-seek case.  Why are you worrying about such things, anyway?

jjb@sequent.UUCP (Jeff Berkowitz) (09/03/89)

In reference to the question about lseek (..., L_SET), Chris Torek writes:
>
>This question is basically meaningless, because the kernel
>code for lseek---minus error checks, and with names expanded---is:
>
>	switch (whence) {
>	case 0: fp->f_offset = offset; break;
>	case 1: fp->f_offset += offset; break;
>	case 2: fp->f_offset = fp->f_inode->i_file_size + offset; break;
>	}
>	return;

This is true for 4.3BSD, but slightly misleading for systems that include
NFS.  The difference is only in the L_XTND code ("case 2:" in the example).
The reference to the file size - "fp->f_inode->i_file_size" - requires a
VOP_GETATTR() call into the underlying virtual file system code on systems
which include NFS.

If the underlying file type is ufs (local disk), the VOP_GETATTR call
will be reasonably inexpensive (although it will cost a bit more than
the two pointer references in the example).

If the underlying file is being served from another machine, though, the
VOP_GETATTR() call may require an RPC to the file server.  This will cost
much more than L_SET or L_INCR.  (Caching by the NFS implementation may
eliminate some of the RPC calls, but can't eliminate all of them.)
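A program that seeks from the end many times on a file it knows is not
changing could fetch the size once and issue absolute seeks afterwards.
This is a sketch under that assumption; struct sized_fd and both functions
are hypothetical, and whether it saves anything measurable depends on the
implementation's attribute caching.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: remembers the file size so that later
 * end-relative seeks can be issued as absolute (L_SET) seeks. */
struct sized_fd {
	int   fd;
	off_t size;	/* snapshot; stale if the file grows or shrinks */
};

/* Ask for the size once, via fstat(). Returns 0 on success, -1 on error. */
int sized_fd_init(struct sized_fd *s, int fd)
{
	struct stat st;

	assert(s != NULL);
	if (fstat(fd, &st) != 0)
		return -1;
	s->fd = fd;
	s->size = st.st_size;
	return 0;
}

/* Same result as lseek(fd, from_end, SEEK_END) while the snapshot is
 * still accurate, but without asking for the current file size. */
off_t seek_from_cached_end(const struct sized_fd *s, off_t from_end)
{
	return lseek(s->fd, s->size + from_end, SEEK_SET);
}
```

After sized_fd_init(&s, fd), a call such as seek_from_cached_end(&s, -10)
positions the descriptor ten bytes before the remembered end of file. If
another process may extend the file, the snapshot is wrong and a real
L_XTND seek is the only safe choice.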
-- 
Jeff Berkowitz N6QOM			uunet!sequent!jjb
Sequent Computer Systems		Custom Systems Group