hugo@acorn.co.uk (Hugo "Bignose" Tyson) (01/15/90)
NFS kernel builders - A PLEA FOR HELP:

We are trying to discover the cause of what appears to be a very rarely occurring bug in the NFS system in our machines, which are running development versions of our BSD_4.3 + Tahoe + NFS_3.2 kernel.

It manifests as follows: when you do a single fwrite() call of more than about 250 kilobytes as a single 'item', i.e. fwrite( buffer, 260000, 1, file ), into a file you have just fopen()ed with "w" (i.e. a new file) on an NFS-mounted file system, remote from this machine over ethernet, a hole occasionally appears in the file. The hole is 8k bytes long and 8k aligned in the file, and inspection of the inode on the server reveals that the missing 8k is not actually allocated on the disc, i.e. this block has never been written.

The file system blocksize is 8k. NFS slices up large writes into 8k chunks, one per RPC. /usr/etc/nfsstat reports no badcalls at either end, but some retrans, badxids and timeouts at the client (about 2,000 retrans in 4,000,000 RPCs). The code that implements the RPC retries and reply handling appears to work.

We have heard rumours of this type of feature before, but never repeatable (1 in ~1000 operations, say), with both our equipment and other people's (S*N for example).

Does anybody else out there know anything about this? Do other manufacturers' computers exhibit this behaviour? Is there a fix? Any relevant information will be appreciated.

- Huge Hugo Tyson, Acorn Computers Ltd., Cambridge, England.
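
The symptom described above could be checked for with a small test along the following lines. This is only a sketch, not the program used at Acorn: the function names write_test_file() and find_zero_block() and the 0xA5 fill byte are made up for illustration. It assumes that an unallocated 8k block reads back as zeros, and that the read-back really reaches the server; a client that satisfies the read from its own cache could hide the hole, so inspecting the inode on the server remains the more reliable check.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 8192   /* file system block size quoted in the report */

/* Write `size` bytes of non-zero data into a new file with a single
   fwrite() call, as in the report.  Returns 0 on success, -1 on failure. */
int write_test_file(const char *path, size_t size)
{
    char *buf = malloc(size);
    FILE *fp;
    int ok;

    if (buf == NULL)
        return -1;
    memset(buf, 0xA5, size);            /* hypothetical fill; any non-zero byte works */

    fp = fopen(path, "w");              /* "w": create a new file */
    if (fp == NULL) {
        free(buf);
        return -1;
    }
    ok = (fwrite(buf, size, 1, fp) == 1);   /* one 'item' of `size` bytes */
    if (fclose(fp) != 0)
        ok = 0;
    free(buf);
    return ok ? 0 : -1;
}

/* Scan the file in 8k-aligned blocks.  Returns the byte offset of the
   first block that reads back as all zeros, -1 if there is none, or -2
   if the file cannot be opened. */
long find_zero_block(const char *path)
{
    unsigned char block[BLOCK_SIZE];
    FILE *fp = fopen(path, "r");
    long offset = 0;
    size_t n, i;

    if (fp == NULL)
        return -2;
    while ((n = fread(block, 1, sizeof block, fp)) > 0) {
        for (i = 0; i < n && block[i] == 0; i++)
            ;                           /* skip over zero bytes */
        if (i == n) {                   /* every byte in this block was zero */
            fclose(fp);
            return offset;
        }
        offset += (long)n;
    }
    fclose(fp);
    return -1;
}
```

To hunt for the bug one would call write_test_file(path, 260000) on a path in the NFS-mounted directory, follow it with find_zero_block(path), and repeat the pair a few thousand times, since the failure is reported as roughly 1 in 1000 operations.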