[comp.unix.ultrix] RISC Ultrix 3.1 NFS bug

whm@sunquest.UUCP (Bill Mitchell) (02/23/90)

I found that on a particular DECstation 3100, "setld -i" when redirected to
an NFS-mounted filesystem on a Sun-3/280 would produce a file with some null
characters in it.  The file would always have the right number of bytes, but
some bytes would be null.  The 3100 is a loaner and our DEC rep inquired about
this problem on some sort of DEC internal network.  He forwarded me a couple
of responses and they boiled down to "looks like a Sun bug".

I tried to reproduce the problem using another 3100 as an NFS server, but the
problem didn't appear.

I investigated further and found that "setld -i" has non-deterministic output:
the lines aren't always in the same order.  setld is actually a shell script,
and for some reason it runs a lot of echo commands in the background.  I
created a shell script that does the same thing (it follows at the
end of this message).  When run on a 3100 and redirected to an NFS filesystem
on another 3100, the observed failure rate is 100%.  When run on a Sun and
redirected to an NFS filesystem on a Sun or a 3100, the observed failure rate
is 0%.  So, it looks like some sort of client-side Ultrix NFS bug.

If you'd like to try to reproduce this on your system, here's the script:
-------------------------------------------------------------------------
echo '.xx xxx x'  &
echo '.xx xxxx'  &
echo 'xxx \- xxxxxxxxxxx xxx xxxxx xxxx'  &
echo '.xx xxxxxx'  &
echo '.x xxx'  &
echo '[\xx\-x\xx] [\xx\-x\xx] [\xx\-x\xx] [\xx\-x\xx] '  &
echo '[\xx\-x\xx\|] [\xx\-x\xx] [\xx\-x\xx] \xxxxxx...\xx'  &
echo '.xx'  &
echo '.xx xxxxxxxxxxx'  &
echo '.xxx "xxx xxxxxxx"'  &
echo '.xxx "xxxx" "xxxxxxxxxx"'  &
echo 'xxx'  &
echo '.xx xxx'  &
echo 'xxxxxxx xxxxx xxxx'  &
echo '.x xxxx'  &
echo 'xx xxxxxxxx xxx xxxxxxxx xx xx xxx xxxxxxxx xxxxxx.  xxxxxxxxx,'  &
echo 'xx xxxxxxx xxx xxxx xx xxx xxxxxxxx xxxxxx xxx xxxx:'  &
echo '.xx'  &
echo 'xxx xxxx'  &
echo '.xx'  &
echo 'xx xxxxxxxxxxx xxx xxxxx xxx xxxxx xxx xxxxxx'  &
echo 'xx xxx xxxxx xxx xxxx:'  &
echo '.xx'  &
echo 'xxx xxxxx xxxxn >xxxxn'  &
echo '.xx'  &
echo 'xx xx xxxxx xxxx xx xxxxx, xx xx xx xx xxxx (\-) xx xxxxxxxxxxx xx'  &
echo 'xx xxxxxxxx,'  &
echo '.xx xxx'  &
--------------------------------------------------------------------
Bill Mitchell				whm@sunquest.com
Sunquest Information Systems		sunquest!whm@arizona.edu
930 N. Finance Center Dr.               {arizona,uunet}!sunquest!whm
Tucson, AZ, 85710                       602-885-7700
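
Since the damaged file has the right length and differs only in that some
bytes are null, a quick way to score a run is to count the null bytes in the
output.  What follows is a minimal sketch, not part of the original post.  It
assumes a POSIX shell with functions and $(( )) arithmetic (which the stock
sh of this era may lack), and the function name count_nuls and the file names
in the usage comment are made up for illustration.

```shell
# count_nuls FILE
# Print the number of NUL (zero) bytes in FILE.
# Hypothetical helper: compares the file's size with its size after
# every NUL byte has been deleted.
count_nuls() {
    total=$(wc -c < "$1")                  # bytes in the file as written
    kept=$(tr -d '\000' < "$1" | wc -c)    # bytes left once NULs are removed
    echo $((total - kept))
}

# Usage (hypothetical paths): save the reproduction script as ./echotest,
# then
#   sh ./echotest > /nfs/mnt/out; sleep 5
#   count_nuls /nfs/mnt/out
# Any result other than 0 counts as a failed run.
```

The sleep in the usage comment is only there to give the background echo
commands time to finish before the file is examined.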

iglesias@orion.oac.uci.edu (Mike Iglesias) (02/24/90)

We just found out that there is a known Sun NFS bug (ref # 1014577)
for SunOS 4.0.  Here's the info we have:


Synopsis: NFS mounted files occasionally get garbage/nulls written to them.

Description:
Occasionally when writing to NFS mounted files, parts of a file are replaced
exactly (no insertions or deletions) with garbage, usually nulls.  This can
span several appends to the file by distinct processes running minutes apart.


Does that sound like the problem you're having?  We've seen the results
of this bug, but have no idea (until now) how to cause it.



Mike Iglesias
University of California, Irvine

iglesias@orion.oac.uci.edu (Mike Iglesias) (02/24/90)

Well, I guess it isn't the Sun NFS bug.  I tried it between a DECstation
3100 (Ultrix 3.1) and a Sun Sparc 1 (SunOS 4.0.3) and it does fail.
Looking at the packets with etherfind, I see nulls in the packets
at the exact places they ended up in the file.  Looks like Ultrix
is messing up the file.


Mike Iglesias
University of California, Irvine
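
To line up the damaged positions in the file against a packet trace, it helps
to have the byte offsets of the nulls.  The following is a rough sketch, not
something from the original posts.  It assumes a POSIX od and awk, and the
function name nul_offsets and the path in the usage comment are hypothetical.

```shell
# nul_offsets FILE
# Print the zero-based byte offset of every NUL byte in FILE, one per line.
# Hypothetical helper: dumps the file as unsigned decimal bytes, puts one
# value on each line, and numbers them.
nul_offsets() {
    od -An -v -tu1 "$1" |
        tr -s ' ' '\n' |
        awk 'BEGIN { n = 0 } NF { if ($1 == 0) print n; n++ }'
}

# Usage (hypothetical path):
#   nul_offsets /nfs/mnt/out
```

The -v flag keeps od from collapsing repeated lines of output into a "*",
which would throw the offsets off.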

meissner@osf.org (Michael Meissner) (02/24/90)

In article <25E5B8AD.23477@orion.oac.uci.edu>
iglesias@orion.oac.uci.edu (Mike Iglesias) writes:

| We just found out that there is a known Sun NFS bug (ref # 1014577)
| for SunOS 4.0.  Here's the info we have:
| 
| 
| Synopsis: NFS mounted files occasionally get garbage/nulls written to them.
| 
| Description:
| Occasionally when writing to NFS mounted files, parts of a file are replaced
| exactly (no insertions or deletions) with garbage, usually nulls.  This can
| span several appends to the file by distinct processes running minutes apart.
| 
| 
| Does that sound like the problem you're having?  We've seen the results
| of this bug, but have no idea (until now) how to cause it.

When I was at Data General, we had the same problem with the SunOS 3.5
systems that we were using to bootstrap the AViiON software.  Our network
people discovered that Sun was not turning on checksumming on the NFS UDP
packets.  We kludged around it by taking the NFS source for the
module which opens the socket, turning on checksumming, and
rebuilding the kernel with this module.  I would hope that Ultrix
turns on checksumming, but you never know....
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so