zumbachl@norand.UUCP (lyle zumbach) (01/26/91)
I am frustrated with Apollo Tech. Support! I am trying to use NFS from my
PC to an Apollo DN4500 running Domain/OS 10.3 or 10.2 and NFS version 2.1.
Using either Sun's PC-NFS or FTP's InterDrive as the client I encounter the
following problem:

	When I try to do an "erase *.*" in a directory on the mounted
	drive (with about 70 files in it), 6 files remain after the
	"erase *.*".

Remember, this occurs using both Sun's and FTP's NFS clients. Here's the
kicker: Apollo's tech. support rep., Bob Spear, tells me that their NFS
server product is only supported with other BSD Unix clients. So, if you
have an NFS client on a Macintosh, VMS, DOS, or other non-BSD system, you
are out of luck if you have a problem that points to their server.

Can anyone else help with this problem?

Thanks,
Lyle Zumbach
Norand Data Systems
uunet!norand!zumbachl
beame@maccs.dcss.mcmaster.ca (Carl Beame) (01/26/91)
In article <43@norand.UUCP> zumbachl@norand.UUCP (lyle zumbach) writes:
>I am frustrated with Apollo Tech. Support! I am trying to use NFS
>from my PC to an Apollo DN4500 running Domain/OS 10.3 or 10.2 and NFS version
>2.1. Using either Sun's PC-NFS or FTP's InterDrive as the client I
>encounter the following problem:
>
>	When I try to do an "erase *.*" in a directory on the mounted
>	drive (with about 70 files in it), 6 files remain after the
>	"erase *.*".
>
>Remember, this occurs using both Sun's and FTP's NFS clients.
>Here's the kicker: Apollo's tech. support rep., Bob Spear, tells
>me that their NFS server product is only supported with other
>BSD Unix clients. So, if you have an NFS client on a Macintosh,
>VMS, DOS, or other non-BSD system, you are out of luck if you have a
>problem that points to their server.
>
>Can anyone else help with this problem?

If the 6 remaining files can then be deleted by another "erase *.*", this
sounds like the "cookie" problem. All PC-based NFS clients have this
feature/problem, including mine (BWNFS).

The problem is that the length of time the directory "cookie" is valid is
not in the NFS spec. The directory "cookie" is used by the READ DIRECTORY
NFS RPC call to determine where to start reading the directory. This
"cookie" is server specific; on many servers it is either the byte offset
or the entry number, in the directory file, of the next file in the
directory. Each returned directory entry contains the "cookie" of the next
directory entry. Each READ DIRECTORY NFS RPC call specifies how much space
is available for returned entries (on PC-based NFS clients this is around
1024 bytes). When the PC exhausts the current returned data, a new request
is sent with the last "cookie" found in the returned data. On an Apollo
system, my guess would be that the "cookie" returned is the directory
entry number of the file.
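The READ DIRECTORY loop Carl describes can be sketched in a few lines of
Python. This is an illustrative model, not any vendor's source: it assumes
the simple "cookie = entry number" server he guesses at, with each returned
entry carrying the cookie of the entry after it.

```python
# Hypothetical model of index-style READDIR cookies: the server hands back
# (name, next_cookie) pairs, and the client resumes from the last cookie.

def read_directory(entries, cookie, count):
    """Server side: return up to `count` (name, next_cookie) pairs
    starting at entry number `cookie`."""
    return [(entries[i], i + 1)  # cookie of the *next* entry
            for i in range(cookie, min(cookie + count, len(entries)))]

def list_all(entries, count=3):
    """Client side: keep issuing READ DIRECTORY with the last cookie
    returned, until an empty block signals end-of-directory."""
    names, cookie = [], 0
    while True:
        block = read_directory(entries, cookie, count)
        if not block:
            return names
        names.extend(name for name, _ in block)
        cookie = block[-1][1]  # resume from the last entry's cookie
```

As long as the directory never changes between calls, this loop enumerates
every entry exactly once; the trouble described below starts when the
server rearranges entries while cookies are still outstanding.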
Now the bug/feature: under UNIX (csh) a command of "rm *" is parsed by the
shell into "rm file1 file2 file3 ...", expanding the "*" before the remove
command is executed. On the PC, an "erase *.*" generates the sequence:

	"directory first *.*"
	Erase result.
	"directory next"      <---------+
	Erase result.                   |
	loop until no more files  ------+

On the PC an initial READ DIRECTORY NFS RPC call is made, and 1024 bytes of
directory entries are returned. The PC extracts each entry and then
performs an NFS DELETE FILE RPC call on each file as it is extracted. When
the 1024 bytes are exhausted, a new READ DIRECTORY NFS RPC call is made
with the "cookie" of the last file deleted. (On an Apollo, this would be
the directory entry number of the last file?)

Consider the possibility that the Apollo compacts its directories when
files are removed. If this is true, a file in the directory would suddenly
have a new cookie. The second READ DIRECTORY NFS RPC call made by the PC
might ask for the directory starting at file 10 (cookie of 10), but since
the directory has been compacted, what the PC thought was file 10 is now
file 0, and file 11 is file 1, and so on; the request by the PC would pick
up file 20, and the original files 10 through 19 would not have been
deleted. My guess is that if you limited the transfer size on a Unix
system to 1024 bytes and wrote a program which deleted files as described
for the PC, you would see the same behaviour.

One solution (kludge) would be for the PC to re-execute the delete loop
until no more files are deleted.

Is anyone willing to modify the NFS spec. to specify the length of time a
cookie is valid?

- Carl Beame
  Beame@McMaster.CA
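Carl's compaction theory is easy to reproduce in miniature. The sketch
below (assumed behaviour, not Apollo source) uses a 10-file directory and
5-entry READDIR blocks in place of 70 files and 1024-byte blocks; the
server compacts on every delete, so the resumed cookie skips the entries
that slid down.

```python
# Simulation of the "erase *.*" bug: delete while iterating with
# index-style cookies against a server that compacts on each delete.

def readdir(directory, cookie, count):
    """Index-cookie server: entries numbered from `cookie` upward."""
    return [(directory[i], i + 1)
            for i in range(cookie, min(cookie + count, len(directory)))]

def erase_star(directory, count):
    """PC-style 'erase *.*': read a block, delete each entry, then resume
    from the last cookie -- which compaction has silently invalidated."""
    cookie = 0
    while True:
        block = readdir(directory, cookie, count)
        if not block:
            return directory            # survivors, if any
        for name, cookie in block:
            directory.remove(name)      # server compacts: entries shift down

files = ["file%d" % i for i in range(10)]
left = erase_star(files, count=5)       # file5..file9 survive the "erase"
```

Running `erase_star` again on the survivors (Carl's kludge) does finish
the job, since each pass deletes at least the first block it reads.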
geoff@tandoori.East.Sun.COM (Geoff Arnold @ Sun BOS - R.H. coast near the top) (01/28/91)
[problem with "del *.*" on a PC leaving files undeleted on an Apollo server]

The most likely explanation is that the Apollo is resorting the directory
when it receives the unlink request. This invalidates the "cookie" in the
directory entries, so that when the PC requests the "next" entry it gets
something completely different.

On a BSD Unix system, when you type "rm *.*" the expansion of the "*.*"
into a list of names takes place first, in the shell, and "rm" then
processes the list of names and issues an unlink for each one. Under DOS,
the "unlink" system call can be given a filename which includes wildcard
characters, and the pattern matching is done within the system call.

How do we (Sun, FTP and others) implement this? We can't do what BSD does
and expand the pattern first, because we can't guarantee that we could
buffer the resulting list. Instead we read successive blocks of directory
entries, scan each block, and for each name that matches the pattern we
issue an unlink over the wire. When we get to the end of a block, we
request the next block using the "cookie" token which was appended to the
previous block. If the directory has been reordered, we lose.

There are really two problems here. The first is that the Apollo
implementation shouldn't reorder a directory if there's a reasonable
chance that a cookie which it issued is likely to be "played back" in a
subsequent request. This means waiting until after a "read directory
entries" request returns the last directory block, or after some suitable
interval.

However, there is also a protocol problem. It's clearly impossible (and
unwise) to forbid a server from ever reordering a directory between
successive reads (which may be separated by an arbitrary amount of time),
and with the present protocol there's no good way to cope with this. At
present a server is required to make sure that it never returns the same
file twice in a search, so there's always a risk that in doing so it will
miss some entries.
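The BSD-style strategy Geoff contrasts with the DOS one can be sketched
against the same assumed index-cookie server: because the full name list
is expanded before any unlink, no cookie is ever replayed after the
directory changes, so nothing is skipped. (Illustrative code, not PC-NFS
source; it ignores the buffering constraint he mentions.)

```python
# Expand-first deletion: enumerate the whole directory, *then* unlink.

def readdir(directory, cookie, count):
    """Index-cookie server, as in the earlier sketches."""
    return [(directory[i], i + 1)
            for i in range(cookie, min(cookie + count, len(directory)))]

def rm_star_bsd(directory, count=5):
    """Shell-globbing style: build the complete name list first, so the
    directory is untouched while cookies are in play."""
    names, cookie = [], 0
    while True:
        block = readdir(directory, cookie, count)
        if not block:
            break
        names.extend(name for name, _ in block)
        cookie = block[-1][1]
    for name in names:          # only now does the directory change
        directory.remove(name)
    return directory
```

The safety comes purely from ordering: all READ DIRECTORY traffic
completes before the first unlink, which is exactly what a PC client
cannot guarantee when it must stream rather than buffer the list.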
There are a couple of fixes which have been contemplated. The "correct"
protocol-level fix would require that each "read directory" request
include a version number or timestamp for the directory (obtained from a
"stat" of the directory); if the directory changed, the server could
reject the request with a suitable error code. That would take a
full-blown protocol rev.

A slightly simpler change, which would require only the definition of a
new error code within the existing protocol, would be for a server to
encode a version number within the cookie (easy; it's opaque) and return
an error if an out-of-date cookie was received. The client could then
choose whether to restart the search with a cookie of 0 (and risk getting
duplicate entries) or give up. The behaviour on both sides could be
configurable.

A cookie is simply an unsigned 32-bit integer; 0 has a special meaning, so
one could divide the 32 bits into an 8-bit version and a 24-bit index.
This would still allow 2^24 entries in a directory, while correctly
handling most stale cookies. (Hopefully an implementor can resist the
temptation to reorder a directory 256 times between requests!)

-- Geoff Arnold, PC-NFS architect, Sun Microsystems. (geoff@East.Sun.COM) --
------------------------------------------------------------------------------
--                  No cute comments. War isn't cute.                       --
------------------------------------------------------------------------------
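For concreteness, Geoff's proposed 8-bit-version / 24-bit-index cookie
layout could look like the following sketch. The field split and the error
name are illustrative (he proposes a new error code but doesn't name one);
only the 32-bit opaque cookie itself comes from the protocol.

```python
# Sketch of a versioned READDIR cookie: top 8 bits hold a directory
# version, low 24 bits hold the entry index.

VERSION_BITS, INDEX_BITS = 8, 24
INDEX_MASK = (1 << INDEX_BITS) - 1      # up to 2^24 entries per directory

def make_cookie(version, index):
    """Server: mint a cookie stamped with the directory's current version."""
    assert 0 <= index <= INDEX_MASK
    return ((version & 0xFF) << INDEX_BITS) | index

def check_cookie(cookie, current_version):
    """Server: reject a cookie minted under an older directory version,
    instead of silently returning the wrong entries."""
    if (cookie >> INDEX_BITS) != (current_version & 0xFF):
        raise ValueError("stale cookie")    # hypothetical new error code
    return cookie & INDEX_MASK
```

On a stale-cookie error the client can then make Geoff's choice
explicitly: restart from cookie 0 (risking duplicates) or give up, rather
than skipping entries without ever knowing it.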