mccalpin@masig3.ocean.fsu.edu (John D. McCalpin) (11/27/89)
In article <MIKE.89Nov27094420@cfdl.larc.nasa.gov> mike@cfdl.larc.nasa.gov (Mike Walker) writes:
>I am having a strange error occur on a NFS mounted partition on our PI.
>First a little info about the machines involved:
[ ... details deleted ... ]
>Symptoms:
> - ls works everywhere
> - cat, grep, etc. (normal file access) works everywhere
> - echo * fails only on the Gould file-system (error: ``no match'')
> - find fails only on the Gould file-system
>   (error: ``getwd: read error in ..'')
> - None of these problems show up on the Sun or the Gould using local,
>   Sun NFS, or Irix NFS file-systems.
>Thanks for any help,
>Mike
>--
>Mike Walker AS&M Inc/NASA LaRC (804) 864-2305

I have had very similar problems trying to get NFS to work between our
PI and our NeXT (which we bought as a cheap file server).

NFS works pretty well between our 3030's and our NeXT, though often
when the NeXT tries to write a file on the IRIS's disk, it ends up with
user-id=-1 and group-id=-1. I would understand this if a setuid program
were trying to write the file, but it happens when almost any program
writes.... Examples are `cp' and `emacs', both owned by root, but
neither with setuid or setgroupid bits set....
--
John D. McCalpin - mccalpin@masig1.ocean.fsu.edu
                   mccalpin@scri1.scri.fsu.edu
                   mccalpin@delocn.udel.edu
mike@cfdl.larc.nasa.gov (Mike Walker) (11/27/89)
I am having a strange error occur on a NFS mounted partition on our PI.
First a little info about the machines involved:

  1) Personal Iris 4D-20 w/ Irix 3.2
  2) Gould NP1 w/ UTX/32 3.1 (BSD w/ SVR3 extensions)
  3) Sun 3/280 w/ Sun Unix 3.4

I have one file system from machines 2 and 3 above mounted on the PI.
Everything seems to work fine with the Sun based fs, but certain
operations fail on the fs mounted off of the Gould. Symptoms:

 - ls works everywhere
 - cat, grep, etc. (normal file access) works everywhere
 - echo * fails only on the Gould file-system (error: ``no match'')
 - find fails only on the Gould file-system
   (error: ``getwd: read error in ..'')
 - None of these problems show up on the Sun or the Gould using local,
   Sun NFS, or Irix NFS file-systems.

I noticed the problem when I tried running the X11 lndir.sh script on
the Iris to set up links to the X distribution mounted from my Gould.

Does anyone have any clues as to what could be wrong here? As I said,
many things work fine (after running lndir on the Gould, I was able to
compile the X library on the Iris without any [NFS related] problems).

Thanks for any help,
Mike
--
Mike Walker AS&M Inc/NASA LaRC (804) 864-2305
brendan@illyria.wpd.sgi.com (Brendan Eich) (12/25/89)
In article <MIKE.89Nov27094420@cfdl.larc.nasa.gov>, mike@cfdl.larc.nasa.gov (Mike Walker) writes:
> I am having a strange error occur on a NFS mounted partition on our PI.
> First a little info about the machines involved:
>
>   1) Personal Iris 4D-20 w/ Irix 3.2
>   2) Gould NP1 w/ UTX/32 3.1 (BSD w/ SVR3 extensions)
>   3) Sun 3/280 w/ Sun Unix 3.4
>
> I have one file system from machines 2 and 3 above mounted on the PI.
> Everything seems to work fine with the Sun based fs, but certain
> operations fail on the fs mounted off of the Gould. Symptoms:
>
> - ls works everywhere
> - cat, grep, etc. (normal file access) works everywhere
> - echo * fails only on the Gould file-system (error: ``no match'')
> - find fails only on the Gould file-system
>   (error: ``getwd: read error in ..'')
> - None of these problems show up on the Sun or the Gould using local,
>   Sun NFS, or Irix NFS file-systems.

Mike informed me via private communication that only the C-shell's echo
failed to match * against visible filenames; 'echo *' in the Bourne
shell worked as expected. This clue, plus Ethernet packet traces
captured by Mike (thanks!), exposed a server bug seen at previous
Connectathons (a Connectathon is an annual NFS interoperation
conference thrown by Sun, attended by most NFS vendors).

Clients may call the NFS readdir remote procedure with an arbitrary
byte count indicating the number of bytes allocated for
filesystem-independent directory entries. The reference NFS server code
uses this byte count to allocate space for server-dependent directory
entries, and calls the local filesystem to read the directory.

Older reference NFS ports contained BSD Fast File System (FFS) readdir
code that failed with EINVAL if the requested byte count was less than,
or not congruent with, DIRBLKSIZ. DIRBLKSIZ is typically 512. SGI's
C-shell, and several other BSD-derived programs that SGI ships, use a
byte count of 512 when they call the BSD version of readdir(3B).
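The byte-count check described above can be sketched in C. This is only
an illustration of the behavior as described, not the reference port's
source: the function name ffs_readdir_check and its dirblksiz parameter
are hypothetical, and "not congruent with DIRBLKSIZ" is assumed here to
mean "not a multiple of DIRBLKSIZ".

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the old FFS readdir byte-count check.
 * Returns 0 if the request would be accepted, EINVAL if rejected.
 * Assumption: "not congruent with DIRBLKSIZ" means "not a multiple
 * of DIRBLKSIZ".
 */
int ffs_readdir_check(size_t count, size_t dirblksiz)
{
    if (dirblksiz == 0)
        return EINVAL;          /* guard against division by zero */
    if (count < dirblksiz || count % dirblksiz != 0)
        return EINVAL;          /* too small, or not block-aligned */
    return 0;
}
```

Under these assumptions, a 512-byte request passes when dirblksiz is
512 but is rejected with EINVAL when dirblksiz is 1024, while a
4096-byte request passes in both cases.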
If the directory is remote, and if its NFS server is based on an older
NFS reference port and has a DIRBLKSIZ of, say, 1024, the server will
reject the client's readdir call with a status code equal to EINVAL
(22). This is exactly what Mike's Gould server does, so it is likely
that Gould has defined their DIRBLKSIZ to be 1024 (perhaps because
their disks use 1024-byte sectors).

Our C-shell, a straight port of 4.3BSD csh, doesn't check for readdir
errors, so the EINVAL causes 'echo *' to silently complete, apparently
successfully, but with "No match". The Bourne shell uses the AT&T-based
readdir(3C) routine, which asks for 4096 bytes' worth of directory
entries, thus avoiding the bug.

Note that the NFS protocol doesn't define EINVAL as a well-known status
code -- however, the protocol's status codes are defined by enumerating
certain 4.2BSD/SunOS intro(2) error numbers, and all NFS
implementations that I've seen from Sun fail to check for error numbers
not in the status enumeration, in order to avoid sending them. Almost
any server error code could leak through the protocol. Our NFS maps
unspecified error numbers such as EINVAL onto the NFSERR_IO status
code. Gould's NFS does not.

NFS implementors have always relied on the Sun reference ports of NFS
to 4.3BSD for standardization, lacking a complete spec (the NFS version
2 protocol has an RFC, but it doesn't place any restrictions on
readdir's byte count argument; it doesn't even distinguish between
client and server uses of this number). The latest reference port
(NFSSRC4.0.x) that Sun has shipped to licensed NFS vendors has fixed
BSD FFS readdir to accept any byte count. Perhaps Gould has, or will
soon have, a version of NFS based on this release.

Brendan Eich
Silicon Graphics, Inc.
brendan@sgi.com
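
The error mapping mentioned in the post above (folding unspecified
error numbers such as EINVAL onto NFSERR_IO) might look roughly like
the following C sketch. It is not SGI's code: the helper name
nfs_status_from_errno is hypothetical. The status values are those
listed in the NFS version 2 RFC, and the argument is assumed to use
4.2BSD error numbering (where EINVAL is 22) rather than the host's
own <errno.h> values.

```c
/* NFS version 2 status values; they reuse 4.2BSD error numbers. */
enum nfsstat {
    NFS_OK = 0,
    NFSERR_PERM = 1,
    NFSERR_NOENT = 2,
    NFSERR_IO = 5,
    NFSERR_NXIO = 6,
    NFSERR_ACCES = 13,
    NFSERR_EXIST = 17,
    NFSERR_NODEV = 19,
    NFSERR_NOTDIR = 20,
    NFSERR_ISDIR = 21,
    NFSERR_FBIG = 27,
    NFSERR_NOSPC = 28,
    NFSERR_ROFS = 30,
    NFSERR_NAMETOOLONG = 63,
    NFSERR_NOTEMPTY = 66,
    NFSERR_DQUOT = 69,
    NFSERR_STALE = 70,
    NFSERR_WFLUSH = 99
};

/*
 * Hypothetical helper: pass through error numbers that appear in the
 * status enumeration and fold everything else onto NFSERR_IO, so that
 * values such as EINVAL (22) never reach the client.
 */
enum nfsstat nfs_status_from_errno(int bsd_errno)
{
    switch (bsd_errno) {
    case NFS_OK:
    case NFSERR_PERM:
    case NFSERR_NOENT:
    case NFSERR_IO:
    case NFSERR_NXIO:
    case NFSERR_ACCES:
    case NFSERR_EXIST:
    case NFSERR_NODEV:
    case NFSERR_NOTDIR:
    case NFSERR_ISDIR:
    case NFSERR_FBIG:
    case NFSERR_NOSPC:
    case NFSERR_ROFS:
    case NFSERR_NAMETOOLONG:
    case NFSERR_NOTEMPTY:
    case NFSERR_DQUOT:
    case NFSERR_STALE:
    case NFSERR_WFLUSH:
        return (enum nfsstat)bsd_errno;   /* defined by the protocol */
    default:
        return NFSERR_IO;                 /* unspecified: fold to I/O error */
    }
}
```

A server that skips this step and returns the raw error number is the
kind that lets a value like 22 leak through to the client.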