[comp.unix.wizards] NFS 3.2 bug: holes in files after large fwrite

hugo@acorn.co.uk (Hugo "Bignose" Tyson) (01/15/90)
NFS kernel builders - A PLEA FOR HELP:

We are trying to discover the cause of what appears to be a very
rarely occurring bug in the NFS system in our machines, which are
running development versions of our BSD_4.3 + Tahoe + NFS_3.2 kernel.

It manifests as follows:

When you do a single fwrite() call of more than about 250 kilobytes,
as a single 'item' i.e. fwrite( buffer, 260000, 1, file ), into a file
you just fopen()ed with "w" i.e. a new file, which is on an NFS
mounted file system, remote from this machine over ethernet,
occasionally a hole appears in the file.  This hole is 8k bytes long,
8k aligned in the file, and inspection of the inode on the server
reveals that the missing 8k is not actually allocated on the disc,
i.e. this block has never been written.  The file system blocksize is
8k.  NFS slices up large writes into 8k chunks, one per RPC.
/usr/etc/nfsstat reports no badcalls at either end, but some retrans,
badxids and timeouts at the client (about 2,000 retrans in 4,000,000
RPCs).  The code that implements the RPC retries and reply handling
appears to work.
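
For anyone who wants to try this on their own kit, the test described
above might be sketched roughly as follows.  This is only an
illustration, not our actual test code: the names write_pattern() and
count_zero_blocks() and the fill pattern are invented for the example,
and BLOCK_SIZE assumes the 8k block size mentioned above.  It writes a
buffer containing no zero bytes with a single fwrite(), then scans the
file in 8k-aligned blocks, since an unallocated block reads back as
zeros.

```c
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SIZE 8192   /* assumed: the 8k file system block size */

/* Write `size` bytes of a non-zero pattern into a newly created file
 * using ONE fwrite() call of a single item.  Returns 0 on success,
 * -1 on any error.  (Hypothetical helper, for illustration only.) */
int write_pattern(const char *path, size_t size)
{
    char *buf = malloc(size);
    FILE *fp;
    size_t i, n;
    int rc;

    if (buf == NULL)
        return -1;
    for (i = 0; i < size; i++)
        buf[i] = (char)('A' + (i % 26));    /* never a zero byte */

    fp = fopen(path, "w");                  /* new file, as in the report */
    if (fp == NULL) {
        free(buf);
        return -1;
    }
    n = fwrite(buf, size, 1, fp);           /* one item of `size` bytes */
    rc = fclose(fp);
    free(buf);
    return (n == 1 && rc == 0) ? 0 : -1;
}

/* Read the file back in BLOCK_SIZE-aligned blocks and count the blocks
 * that are entirely zero.  Returns the count, or -1 if the file cannot
 * be opened.  (Hypothetical helper, for illustration only.) */
long count_zero_blocks(const char *path)
{
    static unsigned char block[BLOCK_SIZE];
    FILE *fp = fopen(path, "rb");
    long offset = 0, holes = 0;
    size_t n, i;

    if (fp == NULL)
        return -1;
    while ((n = fread(block, 1, BLOCK_SIZE, fp)) > 0) {
        for (i = 0; i < n && block[i] == 0; i++)
            ;                               /* scan for a non-zero byte */
        if (i == n) {
            printf("all-zero block at offset %ld (%lu bytes)\n",
                   offset, (unsigned long)n);
            holes++;
        }
        offset += (long)n;
    }
    fclose(fp);
    return holes;
}
```

Calling write_pattern(path, 260000) on an NFS-mounted path and then
count_zero_blocks(path) in a loop of a few thousand iterations would
be one way to look for the problem.  A 260000-byte write spans 32
blocks of 8k (31 full ones plus a partial), so presumably about 32
write RPCs.  Note that reading the file back on the writing client may
be satisfied from that client's cache, so it is probably safer to run
the check on the server or from another client.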

We have heard rumours of this type of feature before, with both our
equipment and other people's (S*N for example), but never in a
repeatable form (1 in ~1000 operations, say).  Does anybody else out
there know
anything about this?  Do other manufacturers' computers exhibit this
behaviour?  Is there a fix?

Any relevant information will be appreciated.

	- Huge

Hugo Tyson, Acorn Computers Ltd., Cambridge, England.