[comp.sys.pyramid] Help with "NFS server write failed"

ras@sgfb.ssd.ray.com (Ralph A. Shaw) (11/09/90)

For the past few days, we've been getting messages like:

>NFS server write failed: (err=69, dev=0xf0648650, ino=0x2).
>NFS server write failed: (err=69, dev=0xf0722bf4, ino=0x2).
>NFS server write failed: (err=69, dev=0xf0648650, ino=0x2).

every few seconds on our Pyramid 9820's console.  Nothing obvious is wrong;
there are no local or remote file systems that are above 90% full,
but I am at a loss as to how to trace the dev= information to find
which system has the problem. 
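
A minimal sketch, assuming the console messages have been captured as
lines of text, of tallying the failures per (err, dev, ino) triple so
that the most frequent dev= value stands out.  The function name and
the sample list are illustrative only; the message format is copied
from the lines quoted above.

```python
import re
from collections import Counter

# Matches the console message format quoted in this post.
MSG_RE = re.compile(
    r"NFS server write failed: "
    r"\(err=(\d+), dev=(0x[0-9a-fA-F]+), ino=(0x[0-9a-fA-F]+)\)"
)

def tally_failures(lines):
    """Count failed writes per (err, dev, ino) triple."""
    counts = Counter()
    for line in lines:
        m = MSG_RE.search(line)
        if m:
            err, dev, ino = m.groups()
            counts[(int(err), int(dev, 16), int(ino, 16))] += 1
    return counts

# The three sample messages from the post.
sample = [
    "NFS server write failed: (err=69, dev=0xf0648650, ino=0x2).",
    "NFS server write failed: (err=69, dev=0xf0722bf4, ino=0x2).",
    "NFS server write failed: (err=69, dev=0xf0648650, ino=0x2).",
]

for (err, dev, ino), n in tally_failures(sample).most_common():
    print(f"err={err} dev={dev:#x} ino={ino:#x} count={n}")
```

This only groups the messages; it does not say which file system a
given dev= value belongs to.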

We have a Pyramid 9820 running OSx 5.0-0824, mounting file systems from
(and being mounted by) various PCs, Sun2s, Sun3s, Sun4s and a Mt Xinu
VAX 11/785.

Does anybody know how to trace this information?
-- 
Ralph Shaw		ras@sgfb.ssd.ray.com
Raytheon Company, Submarine Signal Division, Portsmouth, RI

booga@polyslo.CalPoly.EDU (Steve Jankowski [nectary]) (11/09/90)

In article <292@sgfb.ssd.ray.com> ras@sgfb.ssd.ray.com (Ralph A. Shaw) writes:
>For the past few days, we've been getting messages like:
>
>>NFS server write failed: (err=69, dev=0xf0648650, ino=0x2).
>>NFS server write failed: (err=69, dev=0xf0722bf4, ino=0x2).
>>NFS server write failed: (err=69, dev=0xf0648650, ino=0x2).
>
>every few seconds on our Pyramid 9820's console.  Nothing obvious is wrong,
>there are no local or remote file systems that are above 90% full,
>but I am at a loss as to how to trace the dev= information to find
>which system has the problem. 

We've been getting similar messages since upgrading to 5.0, though
not very frequently.  They remind me of the debugging statement
that was left in 4.4.  That message complained of an error
69 whenever a user wrote a block that put them over quota.  Given
that we service 600 undergrads with about the same number of megs,
our console was often unusable...

I digress.  I suspect the message is caused by an NFS-inspired write
that is putting the user over their quota.  If you don't run quotas,
this diagnosis is completely useless.

booga

-- 
Steve Jankowski --------------------------------------------------------------
booga@polyslo.CalPoly.EDU        |V|   |)              Are we scientists? 
                                 | |r  |)ooga      Are we even engineers?
joobyjoobyjoobycornholerschnapperpooperpooperschnapper

geoff@bodleian.East.Sun.COM (Geoff Arnold @ Sun BOS - R.H. coast near the top) (11/09/90)

Quoth ras@sgfb.ssd.ray.com (Ralph A. Shaw) (in <292@sgfb.ssd.ray.com>):
#For the past few days, we've been getting messages like:
#
#>NFS server write failed: (err=69, dev=0xf0648650, ino=0x2).

Well, according to my /usr/include/sys/errno.h, error 69 is "disc [sic]
quota exceeded", so I'd try looking there. Try "/usr/etc/quotaoff -av"
(or whatever your system's equivalent is) on the pertinent systems.
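
A minimal sketch of the header lookup described above: it pulls error
numbers out of errno.h-style "#define" lines.  The excerpt embedded in
the code is an illustrative BSD-style fragment, not the Pyramid header,
and the function name is hypothetical.  Error numbering differs between
systems, so the table should be built from the header of the machine
that printed the message.

```python
import re

# One "#define ENAME number /* description */" entry per line.
DEFINE_RE = re.compile(
    r"^\s*#\s*define\s+(E\w+)\s+(\d+)\s*/\*\s*(.*?)\s*\*/",
    re.MULTILINE,
)

def parse_errno_header(text):
    """Map error number -> (symbol, description) from header text."""
    return {
        int(num): (name, desc)
        for name, num, desc in DEFINE_RE.findall(text)
    }

# Illustrative BSD-style excerpt (an assumption, not the OSx header).
EXCERPT = """
#define EUSERS 68 /* Too many users */
#define EDQUOT 69 /* Disc quota exceeded */
#define ESTALE 70 /* Stale NFS file handle */
"""

table = parse_errno_header(EXCERPT)
print(table.get(69))

# To use a real header instead (path as given in the post):
#   with open("/usr/include/sys/errno.h") as f:
#       table = parse_errno_header(f.read())
```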


-- Geoff Arnold, PC-NFS architect, Sun Microsystems. (geoff@East.Sun.COM)   --
   *** "Now is no time to speculate or hypothecate, but rather a time ***
   *** for action, or at least not a time to rule it out, though not  ***
   *** necessarily a time to rule it in, either." - George Bush       ***

wlyle@sjuphil.uucp (Wayne Lyle) (11/12/90)

In article <273a2efb.3d02@petunia.CalPoly.EDU> booga@polyslo.CalPoly.EDU (Steve Jankowski [nectary]) writes:
>
>
>In article <292@sgfb.ssd.ray.com> ras@sgfb.ssd.ray.com (Ralph A. Shaw) writes:
>>For the past few days, we've been getting messages like:
>>
>>>NFS server write failed: (err=69, dev=0xf0648650, ino=0x2).
>>>NFS server write failed: (err=69, dev=0xf0722bf4, ino=0x2).
>>>NFS server write failed: (err=69, dev=0xf0648650, ino=0x2).
>>
>>every few seconds on our Pyramid 9820's console.  Nothing obvious is wrong,
>>there are no local or remote file systems that are above 90% full,
>>but I am at a loss as to how to trace the dev= information to find
>>which system has the problem. 
>
>We've been getting similar messages since upgrading to 5.0, though
>not very frequently.  They remind me of the leftover debugging
>statement that was left in 4.4.  That message complained of an error
>69 whenever a user wrote a block that put them over quota.  Given
>that we service 600 undergrads with about the same number of megs,
>our console was often unusable...
>
>I digress.  I suspect the message is caused by an NFS inspired write
>that is putting the user over their quota.  If you don't run quotas,
>this prognosis is completely useless.

	This happens to us a lot (9825s, 4.4c), mostly when a PC running
PC-NFS makes a big write.  From time to time it also happens when our Mac
users (through a GatorBox) make a write.  We don't use quotas, so it isn't
that for us.  It doesn't seem to corrupt anything; my guess is that it tells
the client just to send it again.  It is annoying, and shocking to an
unsuspecting operator.  We are hoping to go to 5.0 soon.  I hope that the
reason you don't see the message any more is that it has been fixed, rather
than just being disregarded.


Wayne
-- 

Wayne J. Lyle
Dilworth, Paxson, Kalish & Kauffman
Philadelphia, PA 19109
(215) 875-8583

sandel@SW.MCC.COM (Charles Sandel) (11/13/90)

Well, offhand, I'd say that err=69 refers to a quota exceeded and
the ino=0x2 means that someone is trying to write in the root of a
filesystem....

Charles

ras@rayssd.ssd.ray.com (Ralph A. Shaw) (11/14/90)

Just a little followup on my earlier message about getting hundreds
of messages like:

>NFS server write failed: (err=69, dev=0xf0648650, ino=0x2).
>NFS server write failed: (err=69, dev=0xf0722bf4, ino=0x2).
>NFS server write failed: (err=69, dev=0xf0648650, ino=0x2).

1) Yes, error 69 is probably an "over quota" error from one of the many
client systems we have (and we do use quotas), on one of many file systems.

[As an aside, did anyone notice that the Pyramid 5.x man page is
truncated in the middle, after errno 64?  This is both in the online
and hardcopy version of the manual page.]

2) Yes, I know the significance of inode 2.

3) Someone made a comment about it disappearing when they upgrade
to release 5.0 - wrong.  We are running OSx version 5.0c-891204,
and we still have the problem.

What I really want to know is how to trace the dev= information
contained in the error message.  There isn't any obvious correlation
with major/minor device numbers; maybe they are vnode numbers?
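
A minimal sketch of the major/minor check, assuming the traditional
16-bit dev_t layout with 8 bits each for major and minor.  Whether OSx
uses that layout is an assumption, and the value 0x0701 is a made-up
example of a device number that does fit.  Both dev= values from the
console are wider than 16 bits, which is consistent with the lack of
any obvious correlation; what they actually are is not established here.

```python
def split_classic_dev(dev):
    """Split a 16-bit dev_t into (major, minor), 8 bits each.

    Returns None when the value does not fit in 16 bits.
    """
    if dev < 0 or dev > 0xFFFF:
        return None
    return (dev >> 8) & 0xFF, dev & 0xFF

# The two console values, plus one hypothetical 16-bit device number.
for dev in (0xf0648650, 0xf0722bf4, 0x0701):
    print(f"{dev:#x} -> {split_classic_dev(dev)}")
```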

Any suggestions (besides restating #1-3) would be appreciated.
-- 
Ralph Shaw		ras@sgfb.ssd.ray.com
Raytheon Company, Submarine Signal Division, Portsmouth, RI