[comp.unix.wizards] Another reason I hate NFS: Silent data loss!

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (06/15/91)

I just ran about twenty processes simultaneously, each feeding into its
own output file in the same NFS-mounted directory. About half the data
was lost: truncated files, blocks full of zeros, etc. The NFS client and
NFS server both had load averages under 2, though I was pounding rather
heavily on the network (ten TCP connections or so a second from one
machine). The data loss was completely silent.

I know the official answer: people on Suns aren't supposed to send
twenty I/O requests in a fraction of a second. But such things will
happen occasionally on any multiuser machine. What do Sun's protocol
designers have against TCP? What do they think is so important that they
have to sacrifice TCP streams, TCP reliability, TCP efficiency?

---Dan

jik@cats.ucsc.edu (Jonathan I. Kamens) (06/17/91)

At Project Athena, we encountered both of the bugs mentioned by Dan in his
last posting (truncation of files and blocks full of nulls when writing to NFS
filesystems).  We fixed both of them, and sent patches back to Sun.  However,
I believe that our fixes required minor changes to the NFS protocol (I seem to
recall something about timestamping truncation requests so that retransmitted
requests would not cause files to be truncated after they had already started
to get data written into them), and I don't think Sun ever did anything with
them.
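
(For illustration only -- a rough sketch of the idea as described above,
with invented names, not the actual Athena patch.  The server compares a
timestamp carried in the truncate request against the file's last-write
time and drops the request if it is stale:)

    /* Hypothetical sketch only; the types and field names are made up. */
    #include <sys/types.h>

    struct file_state { time_t mtime; };         /* server's last-write time  */
    struct trunc_req  { time_t client_stamp; };  /* timestamp in the request  */

    /*
     * Return 1 if the truncate should be honored, 0 if it is a stale
     * retransmission that arrived after new data was written to the file.
     */
    int
    should_truncate(struct file_state *fs, struct trunc_req *rq)
    {
        return fs->mtime <= rq->client_stamp;
    }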

My point is that the problem is fixable, and it isn't even difficult to fix. 
Whether Sun (not to mention other vendors) has or will ever fix it is another
question entirely....

AFS is your friend. :-)

-- 
Jonathan Kamens					jik@CATS.UCSC.EDU

boyd@prl.dec.com (Boyd Roberts) (06/17/91)

In article <17105@darkstar.ucsc.edu>, jik@cats.ucsc.edu (Jonathan I. Kamens) writes:
> My point is that the problem is fixable, and it isn't even difficult to fix. 
> Whether Sun (not to mention other vendors) has or will ever fix it is another
> question entirely....
> 

Sure, the problem is fixable.  The protocol is the problem!
It would appear that's the last thing Sun are likely to change.

All this nonsense about statelessness is just a smoke screen.
As soon as anyone proposes a change the immediate response is
`but then it's not _stateless_'.  Well, as far as I'm concerned:

    s/stateless/bug-full/

The whole thing is a charade.  You see that real disk there?  What's
contained on it?  Is it files?  Is it data?  Is it state?  Yes it is!

I could never understand this nonsense.  What makes them so sure that
when a crashed server comes up your data will still be intact?  If
a server crashes your system calls should error, no re-trying;  error --
plain and simple.  How will NFS ensure that the kernel or fsck or
the buffer cache won't have trashed my file as a result of the
crash?  Don't say `inode generation number',  it's just not a defense.

That UDP `protocol' really sucks the mop.  Soft/hard mounts.  What a joke.
What's needed is a connection based stream protocol.  Then you know the
difference between remote slow and remote dead.  It's all a question
of flow control.  NFS has none.  Not even sequence numbers.  

We run a lot of NFS here, and it's as flakey as C shell.  Two of the machines
here just go to sleep every once in a while, when the traffic gets a little
strong.  God knows why.  It's going to take a lot of pondering to track it
down.  Even then, it's probably a fundamental design problem that can't,
or won't, be fixed.



Boyd Roberts			boyd@prl.dec.com

``When the going gets weird, the weird turn pro...''

auvsaff@auvc8.tamu.edu (David Safford) (06/17/91)

In article <1991Jun17.084533.15905@prl.dec.com>, boyd@prl.dec.com (Boyd
Roberts) writes:
|>All this nonsense about statelessness is just a smoke screen.
|>As soon as anyone proposes a change the immediate response is
|>`but then it's not _stateless_'.  Well, as far as I'm concerned:
|>
|>    s/stateless/bug-full/
|>
|>I could never understand this nonsense.  What makes them so sure that
|>when a crashed server comes up your data will still be intact?  

It is nearly impossible for any system, stateless or stateful, to 
guarantee data integrity.  There are many approaches, and all have
certain advantages and disadvantages.  Stateless designs tend to be
simpler, faster, and less reliable.  For my research lab, Suns with
NFS have proven to be fast, convenient, and sufficiently reliable.
In fact, in four years we have not lost a single byte of data due to NFS.
Yes, there have been bugs, such as actimeo, and the nfs-confused-client
problem, but they were rather rapidly patched.  Conversely, because
NFS is stateless, we have NEVER been bothered by workstation crashes,
which, with our demanding distributed research applications, have
occurred all too frequently :-).  

|>That UDP `protocol' really sucks the mop.  Soft/hard mounts.  What a joke.
|>What's needed is a connection based stream protocol.  Then you know the
|>difference between remote slow and remote dead.  It's all a question
|>of flow control.  NFS has none.  Not even sequence numbers.  

Hmm. Simply switching to TCP or another stream protocol will not differentiate
between "remote slow and remote dead".  If anything, TCP tends to hide
remote failures.  If your connection is explicitly dropped by the remote
host, you will know immediately, but other link or kernel failures can
be hidden for a long time by TCP retries and adaptive algorithms.
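
(For what it's worth, even over TCP an application that wants to tell
``slow'' from ``dead'' ends up doing something like the following sketch --
my illustration, nothing to do with the NFS code itself -- and a timeout
still can't tell you which of the two it was:)

    #include <sys/types.h>
    #include <sys/time.h>
    #include <unistd.h>

    /*
     * Wait up to `sec' seconds for the connected socket to become
     * readable.  Returns >0 if readable, 0 on timeout (slow or dead --
     * TCP alone can't say which), -1 on error.
     */
    int
    readable_within(int sock, int sec)
    {
        fd_set rd;
        struct timeval tv;

        FD_ZERO(&rd);
        FD_SET(sock, &rd);
        tv.tv_sec = sec;
        tv.tv_usec = 0;
        return select(sock + 1, &rd, (fd_set *)0, (fd_set *)0, &tv);
    }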

|>We run a lot of NFS here, and it's as flakey as C shell.  Two of the machines
|>here just go to sleep every once in a while, when the traffic gets a little
|>strong.  God knows why.  It's going to take a lot of pondering to track it
|>down.  Even then, it's probably a fundamental design problem that can't,
|>or won't, be fixed.

We run a lot of NFS here, too, and are very happy with it.  We certainly don't
go blaming all of our application failures on it without some evidence.
If your needs dictate tighter, stateful service, then by all means feel
free to use AFS, but realize that many other people are happy with NFS.

dave safford
Texas A&M University
auvsaff@auvsun1.tamu.edu                                       

mike@BRL.MIL ( Mike Muuss) (06/18/91)

NFS is designed as a reliable protocol.  I have pounded more than 250
NFS requests/sec against a fileserver, and no data loss.  Things you
should check are the number of retransmits you authorized in /etc/fstab,
the error logs, the output of nfsstat, "netstat -s", etc., on both
machines, and see what the problem is.
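
(For reference, the options in question live on the mount's fstab line.
A made-up example with SunOS-style option names -- the server, paths, and
values here are placeholders, and exact spellings vary by vendor:)

    # device              mount point  type  options                          freq pass
    server:/export/data   /data        nfs   rw,hard,intr,timeo=11,retrans=5  0    0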

You didn't mention what kind of machine you were using (exactly), nor
did you indicate what OS you were running (yes, UNIX, but which one?).
That information might prove helpful.

	Best,
	 -Mike

mouse@thunder.mcrcim.mcgill.edu (der Mouse) (06/18/91)

In article <4339.Jun1501.31.5191@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

> I just ran about twenty processes simultaneously, each feeding into
> its own output file in the same NFS-mounted directory.  About half
> the data was lost: truncated files, blocks full of zeros, etc.

Was it a hard mount?  Then report a bug to your vendor.  Otherwise, you
asked for it, you got it.

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu

gwyn@smoke.brl.mil (Doug Gwyn) (06/18/91)

In article <27226@adm.brl.mil> mike@BRL.MIL ( Mike Muuss) writes:
>NFS is designed as a reliable protocol.

But its caching is all messed up, at least on some of our systems it is.
Moss & I discovered a way for an unprivileged user to exploit this to
browse through data in files that were not supposed to be accessible.

lance@motcsd.csd.mot.com (lance.norskog) (06/19/91)

AT&T's RFS runs solid as a rock, once you figure out how to configure it.
(The latter is a muthuh of a learning curve.)  Use it if you can.

It would be nice if AT&T told us how it worked, though...

Lance Norskog

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (06/19/91)

In article <1991Jun18.064615.21165@thunder.mcrcim.mcgill.edu> mouse@thunder.mcrcim.mcgill.edu (der Mouse) writes:
> In article <4339.Jun1501.31.5191@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> > I just ran about twenty processes simultaneously, each feeding into
> > its own output file in the same NFS-mounted directory.  About half
> > the data was lost: truncated files, blocks full of zeros, etc.
> Was it a hard mount?  Then report a bug to your vendor.  Otherwise, you
> asked for it, you got it.

Uh, nothing in the NFS documentation says ``soft mounts are buggy, do
not use them.'' Hard mounts and soft mounts show similar failures.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (06/19/91)

In article <27226@adm.brl.mil> mike@BRL.MIL ( Mike Muuss) writes:
> NFS is designed as a reliable protocol.  I have pounded more than 250
> NFS requests/sec against a fileserver, and no data loss.

In this case the 20 requests came in under 1/50 of a second (somewhat
smaller, I think, but I don't have good measuring tools). I can't
sustain this load from one Sun, but a single burst was enough to lose
data.

> Things you
> should check are the number of retransmit's you authorized in /etc/fstab,

If the number of retransmits runs out, the writing process ``should''
get an error. Otherwise the implementation is (obviously) buggy.

> the error logs on both machines (run NFSSTAT), "netstat -s",

Nope. As far as I can tell, the loss was completely silent. I'm working
on a test program to exercise the problem thoroughly; I'll post it when
it's done.

---Dan

mcneill@eplrx7.uucp (Keith McNeill) (06/19/91)

From article <16553.Jun1903.00.5691@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <1991Jun18.064615.21165@thunder.mcrcim.mcgill.edu> mouse@thunder.mcrcim.mcgill.edu (der Mouse) writes:
>> In article <4339.Jun1501.31.5191@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> > I just ran about twenty processes simultaneously, each feeding into
>> > its own output file in the same NFS-mounted directory.  About half
>> > the data was lost: truncated files, blocks full of zeros, etc.
>> Was it a hard mount?  Then report a bug to your vendor.  Otherwise, you
>> asked for it, you got it.
> 
> Uh, nothing in the NFS documentation says ``soft mounts are buggy, do
> not use them.'' Hard mounts and soft mounts show similar failures.
> 
> ---Dan

But it does say...

From the SunOS Systems Admin Manual:

"Use the hard option with any file hierarchies you mount read-write."

If you have problems with hard mounts destroying data then you have
a buggy NFS version.
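
(In fstab terms -- hypothetical server and paths -- the rule of thumb
quoted above reads something like this:)

    # read-write data: hard mount (retry forever), interruptible
    server:/export/home   /home      nfs   rw,hard,intr   0  0
    # read-only, non-critical stuff: soft is tolerable here
    server:/export/man    /usr/man   nfs   ro,soft        0  0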

Keith

-- 
    Keith McNeill                 |    Du Pont Company
    eplrx7!mcneill@uunet.uu.net   |    Engineering Physics Laboratory
    (302) 695-9353/7395           |    P.O. Box 80357
                                  |    Wilmington, Delaware 19880-0357

truesdel@nas.nasa.gov (David A. Truesdell) (06/20/91)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>In article <27226@adm.brl.mil> mike@BRL.MIL ( Mike Muuss) writes:
>> NFS is designed as a reliable protocol.  I have pounded more than 250
>> NFS requests/sec against a fileserver, and no data loss.

>In this case the 20 requests came in under 1/50 of a second (somewhat
>smaller, I think, but I don't have good measuring tools). I can't
>sustain this load from one Sun, but a single burst was enough to lose
>data.

>> Things you
>> should check are the number of retransmit's you authorized in /etc/fstab,

>If the number of retransmits runs out, the writing process ``should''
>get an error. Otherwise the implementation is (obviously) buggy.

Why ``should'' it?  Your writes probably put their data into the buffer
cache just fine; it's the subsequent flushing of the buffer cache that
failed.  And guess what?  The write had probably already returned by then.
Or do you always use O_SYNC when opening files for writing?
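
(A minimal sketch of the distinction, not code from either poster: with
O_SYNC and every return value checked, a write the server never accepts at
least has a chance of being reported to the process, instead of vanishing
from the buffer cache later.)

    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main()
    {
        char buf[] = "precious data\n";
        int fd = open("out", O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);

        if (fd < 0) {
            perror("open");
            return 1;
        }
        /* With O_SYNC the write is supposed to be pushed through before
         * returning, so a server failure can surface right here... */
        if (write(fd, buf, strlen(buf)) != (ssize_t)strlen(buf)) {
            perror("write");
            return 1;
        }
        /* ...and close() is the last chance to hear about it otherwise. */
        if (close(fd) < 0) {
            perror("close");
            return 1;
        }
        return 0;
    }
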
--
T.T.F.N.,
dave truesdell (truesdel@nas.nasa.gov)
"Carpe Noctem"

mouse@thunder.mcrcim.mcgill.edu (der Mouse) (06/22/91)

In article <16553.Jun1903.00.5691@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> In article <1991Jun18.064615.21165@thunder.mcrcim.mcgill.edu> mouse@thunder.mcrcim.mcgill.edu (der Mouse) writes:
>> In article <4339.Jun1501.31.5191@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>>> I just ran about twenty processes simultaneously, each feeding into
>>> its own output file in the same NFS-mounted directory.  [...]
>> Was it a hard mount?  Then report a bug to your vendor.  Otherwise,
>> you asked for it, you got it.
> Uh, nothing in the NFS documentation says ``soft mounts are buggy, do
> not use them.''

Well, it does say don't use them for read/write filesystems.  Since you
were writing to the filesystem....

> Hard mounts and soft mounts show similar failures.

Then bug your vendor.  We used to get blocks of nulls when our
cross-mounts were soft; once we made them hard, the only data loss I've
ever seen was due to two different client machines writing to the same
file at once.

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu

mouse@thunder.mcrcim.mcgill.edu (der Mouse) (06/22/91)

In article <truesdel.677362688@sun418>, truesdel@nas.nasa.gov (David A. Truesdell) writes:
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> In article <27226@adm.brl.mil> mike@BRL.MIL ( Mike Muuss) writes:
>>> NFS is designed as a reliable protocol.  I have pounded more than
>>> 250 NFS requests/sec against a fileserver, and no data loss.
>>> Things you should check are the number of retransmit's you
>>> authorized in /etc/fstab, [...]
>> If the number of retransmits runs out, the writing process
>> ``should'' get an error.  Otherwise the implementation is
>> (obviously) buggy.
> Why ``should'' it?  Your writes probably put their data into the
> buffer cache just fine, it's the subsequent flushing of the buffer
> cache that failed.  And guess what?  The write had probably already
> returned by then.

Consider a real disk.  What happens if a real disk doesn't respond when
the kernel writes a buffer from the buffer cache to it?

Right.  The kernel panics.

So a case could be made that if the number of retransmits runs out
(where a hard mount could be considered as specifying infinite
retransmission), the kernel should panic.

Unfortunately, fileservers die much more often than disks do.  The
current behavior is a compromise between preserving disk semantics and
practicality.

(No, I don't particularly like NFS either.  For us, unfortunately, it
is pretty much the only game in town.)

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu

darryl@lemuria.MV.COM (Darryl P. Wagoner) (06/22/91)

In article <4339.Jun1501.31.5191@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>I just ran about twenty processes simultaneously, each feeding into its
>own output file in the same NFS-mounted directory. About half the data
>was lost: truncated files, blocks full of zeros, etc. The NFS client and
>NFS server both had load averages under 2, though I was pounding rather
>heavily on the network (ten TCP connections or so a second from one
>machine). The data loss was completely silent.

The only time that I have seen this happen is when there was a bug in
the NFS port or the server file system code.  Is this on Suns?  The
only other thing I could think of is that the server has too many open
files.  But this is just a SWAG!

>happen occasionally on any multiuser machine. What do Sun's protocol
>designers have against TCP? What do they think is so important that they
>have to sacrifice TCP streams, TCP reliability, TCP efficiency?

Sun's protocol has nothing against TCP or any other protocol.  NFS sits
above the RPC layer, which in turn sits above UDP.  RPC can be ported
to use TCP or any other protocol within reason.  TCP might work well with
a few clients, but if you have a few hundred then hang it up.
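
(To illustrate the layering -- a rough sketch with a made-up program
number and minimal error handling; the point is that the same RPC program
is reached over UDP or TCP just by picking a different client-creation
call:)

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netdb.h>
    #include <rpc/rpc.h>
    #include <string.h>

    #define MYPROG  0x20000099L   /* placeholder program number */
    #define MYVERS  1L

    CLIENT *
    make_client(char *host, int use_tcp)
    {
        struct sockaddr_in addr;
        struct hostent *hp = gethostbyname(host);
        struct timeval wait = { 5, 0 };   /* UDP retry interval */
        int sock = RPC_ANYSOCK;

        if (hp == 0)
            return (CLIENT *)0;
        memset((char *)&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_port = 0;                /* let the portmapper fill it in */
        memcpy((char *)&addr.sin_addr, hp->h_addr_list[0], hp->h_length);

        /* Same program, same calls above this point; only the transport
         * below the RPC layer differs. */
        if (use_tcp)
            return clnttcp_create(&addr, MYPROG, MYVERS, &sock, 0, 0);
        return clntudp_create(&addr, MYPROG, MYVERS, wait, &sock);
    }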




-- 
Darryl Wagoner		darryl@lemuria.MV.COM or uunet!virgin!lemuria!darryl
12 Oak Hill Road
Brookline, NH 03033
Office: 603.672.0736   		Home: 603.673.0578

mike@unix.cis.pitt.edu (Mike Elliot) (06/23/91)

In article <1991Jun22.152801.9774@lemuria.MV.COM> darryl@lemuria.UUCP (Darryl P. Wagoner) writes:
>In article <4339.Jun1501.31.5191@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
[description of circumstance deleted]
>>machine). The data loss was completely silent.
>
>The only time that I have seen this happen is when there was a bug in
>the NFS port or the server file system code.  Is this on Suns?  The
>only other thing I could think of is that the server has too many open
>files.  But this is just a SWAG!

Unfortunately, I have seen this all too often.  We run a heterogeneous
network of Apollos, DECs, HPs, IBMs, Suns, etc., all running NFS.  We
mount all of our file systems hard so that, when things are slow, our
software reading and writing across the network will only hang instead
of just dying.  We have run this way for years without any problems.

Then we got in the IBM RS6000. Under AIX 3.1 (3001) NFS failed silently
at least 5% of the time. In fact it got so bad that we stopped running
on the IBM unless we were using the local disk. Then we upgraded to
AIX 3.1 (3005) and now NFS seems to fail 25% of the time, but at least
now it doesn't do it silently.

-mje

akira@atson.asahi-np.co.jp (Akira Takiguchi) (06/23/91)

     There's another reason for NFS to fail:  hardware bugs.  I heard that
at least one machine did have a bug in its ethernet controller, and what
at first seemed to be an NFS bug was caused by data mangled in ethernet
collisions [so the bug appeared only under heavy load].
-- 
       |    Akira Takiguchi  at ATSON, Inc. (a subsidiary of the Asahi Shimbun)
       |                WAKO GINZA bldg.  8-10-4 Ginza Chuo-ku Tokyo 104  Japan
       | Phone +81 3 3289 7051  Fax +81 3 3289 7066  SORRY, EMAIL NOT AVAILABLE

les@chinet.chi.il.us (Leslie Mikesell) (06/24/91)

In article <4282@motcsd.csd.mot.com> lance@motcsd.csd.mot.com (lance.norskog) writes:

>AT&T's RFS runs solid as a rock, once you figure out how to configure it.
>(The latter is a muthuh of a learning curve.)  Use it if you can.

The semantics of killing any process that happens to have a file open over
RFS, or is in an RFS-mounted directory, when the RFS mount is broken are
a little annoying, though.  It's especially annoying in combination with
the AT&T DOS server software, which handles several users per server
process.  If one user has a file open across an RFS mount when the
RFS link is broken, suddenly 6 DOS users lose their server (and their
work).  If the DOS server in question happens to be the parent process,
everyone gets disconnected.

Les Mikesell
  les@chinet.chi.il.us