whm@sunquest.UUCP (Bill Mitchell) (02/23/90)
I found that on a particular DECstation 3100, "setld -i" when redirected
to an NFS-mounted filesystem on a Sun-3/280 would produce a file with some
null characters in it. The file would always have the right number of
bytes, but some bytes would be null.

The 3100 is a loaner and our DEC rep inquired about this problem on some
sort of DEC internal network. He forwarded me a couple of responses and
they boiled down to "looks like a Sun bug".

I tried to reproduce the problem using another 3100 as an NFS server, but
the problem didn't appear. I investigated further and found that
"setld -i" has non-deterministic output: the lines aren't always in the
same order. setld is actually a script and for some reason it does a lot
of echo's in the background.

I created a shell script that does a bunch of echo's in the background (it
follows at the end of this message). When run on a 3100 and redirected to
an NFS filesystem on another 3100, the observed failure rate is 100%. When
run on a Sun and redirected to an NFS filesystem on a Sun or a 3100, the
observed failure rate is 0%.

So, it looks like some sort of client-side Ultrix NFS bug.

If you'd like to try to reproduce this on your system, here's the script:

-------------------------------------------------------------------------
echo '.xx xxx x' &
echo '.xx xxxx' &
echo 'xxx \- xxxxxxxxxxx xxx xxxxx xxxx' &
echo '.xx xxxxxx' &
echo '.x xxx' &
echo '[\xx\-x\xx] [\xx\-x\xx] [\xx\-x\xx] [\xx\-x\xx] ' &
echo '[\xx\-x\xx\|] [\xx\-x\xx] [\xx\-x\xx] \xxxxxx...\xx' &
echo '.xx' &
echo '.xx xxxxxxxxxxx' &
echo '.xxx "xxx xxxxxxx"' &
echo '.xxx "xxxx" "xxxxxxxxxx"' &
echo 'xxx' &
echo '.xx xxx' &
echo 'xxxxxxx xxxxx xxxx' &
echo '.x xxxx' &
echo 'xx xxxxxxxx xxx xxxxxxxx xx xx xxx xxxxxxxx xxxxxx. 
xxxxxxxxx,' &
echo 'xx xxxxxxx xxx xxxx xx xxx xxxxxxxx xxxxxx xxx xxxx:' &
echo '.xx' &
echo 'xxx xxxx' &
echo '.xx' &
echo 'xx xxxxxxxxxxx xxx xxxxx xxx xxxxx xxx xxxxxx' &
echo 'xx xxx xxxxx xxx xxxx:' &
echo '.xx' &
echo 'xxx xxxxx xxxxn >xxxxn' &
echo '.xx' &
echo 'xx xx xxxxx xxxx xx xxxxx, xx xx xx xx xxxx (\-) xx xxxxxxxxxxx xx' &
echo 'xx xxxxxxxx,' &
echo '.xx xxx' &
--------------------------------------------------------------------

Bill Mitchell                   whm@sunquest.com
Sunquest Information Systems    sunquest!whm@arizona.edu
930 N. Finance Center Dr.       {arizona,uunet}!sunquest!whm
Tucson, AZ, 85710               602-885-7700
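
A quick way to check an output file for the symptom described above is to
count its NUL bytes. What follows is a minimal sketch and not part of the
original post; the function name count_nuls is a hypothetical helper, and
it assumes a tr that accepts octal escapes such as '\000'.

```shell
#!/bin/sh
# count_nuls FILE
# Print the number of NUL (zero) bytes found in FILE.
# Hypothetical helper name; it does not come from the original post.
count_nuls() {
    # tr -cd '\000' deletes every byte except NUL, so only the
    # NUL bytes reach wc -c, which counts them. awk normalizes
    # the padding some wc implementations put around the number.
    tr -cd '\000' < "$1" | wc -c | awk '{ print $1 + 0 }'
}
```

For example, after running the script with its output redirected to a file
on the NFS mount (say /mnt/nfs/out, a placeholder path), "count_nuls
/mnt/nfs/out" printing anything other than 0 would indicate the corruption.
Because the echoes run in the background, it may be worth waiting a moment
before checking the file.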
iglesias@orion.oac.uci.edu (Mike Iglesias) (02/24/90)
We just found out that there is a known Sun NFS bug (ref # 1014577)
for SunOS 4.0. Here's the info we have:

Synopsis: NFS mounted files occasionally get garbage/nulls written to them.

Description:
Occasionally when writing to NFS mounted files, parts of a file are replaced
exactly (no insertions or deletions) with garbage, usually nulls. This can
span several appends to the file by distinct processes running minutes apart.

Does that sound like the problem you're having? We've seen the results
of this bug, but have no idea (until now) how to cause it.

Mike Iglesias
University of California, Irvine
iglesias@orion.oac.uci.edu (Mike Iglesias) (02/24/90)
Well, I guess it isn't the Sun NFS bug. I tried it between a DECstation
3100 (Ultrix 3.1) and a Sun Sparc 1 (SunOS 4.0.3) and it does fail.
Looking at the packets with etherfind, I see nulls in the packets at the
exact places they ended up in the file. Looks like Ultrix is messing up
the file.

Mike Iglesias
University of California, Irvine
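
To line up the damaged file against a packet trace as described above, it
helps to know the byte offsets of the nulls in the file. A minimal sketch,
assuming a POSIX od and awk; nul_offsets is a hypothetical helper name, and
the offsets it prints are zero-based.

```shell
#!/bin/sh
# nul_offsets FILE
# Print the zero-based byte offset of every NUL byte in FILE, one per line.
# Hypothetical helper name; it does not come from the original thread.
nul_offsets() {
    # od -v      : do not collapse repeated lines into "*"
    # od -An     : suppress the address column
    # od -tu1    : print each byte as an unsigned decimal number
    # awk walks the decimal values in order, tracking the byte position n.
    od -v -An -tu1 "$1" | awk '
        BEGIN { n = 0 }
        {
            for (i = 1; i <= NF; i++) {
                if ($i == 0) print n
                n++
            }
        }'
}
```

Runs of consecutive offsets in the output correspond to the null-filled
stretches, which can then be compared with where the nulls sit in the
captured write requests.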
meissner@osf.org (Michael Meissner) (02/24/90)
In article <25E5B8AD.23477@orion.oac.uci.edu> iglesias@orion.oac.uci.edu (Mike Iglesias) writes:
| We just found out that there is a known Sun NFS bug (ref # 1014577)
| for SunOS 4.0. Here's the info we have:
|
|
| Synopsis: NFS mounted files occasionally get garbage/nulls written to them.
|
| Description:
| Occasionally when writing to NFS mounted files, parts of a file are replaced
| exactly (no insertions or deletions) with garbage, usually nulls. This can
| span several appends to the file by distinct processes running minutes apart.
|
|
| Does that sound like the problem you're having? We've seen the results
| of this bug, but have no idea (until now) how to cause it.

When I was at Data General, we had the same problem with SunOS 3.5
that we were using to bootstrap the AViiON software. Our network
people discovered Sun was not turning on checksumming on the NFS UDP
packets. We kludged around it, by taking the NFS source for the
module which opens the socket, and turning on checksumming, and
rebuilding the kernel with this module. I would hope that Ultrix
turns on checksumming, but you never know....

--
Michael Meissner   email: meissner@osf.org   phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so