[comp.unix.ultrix] Gnode table full

dietrich@cernvax.UUCP (dietrich wiegandt) (09/14/90)

Hardware: VAX8530
Software: ULTRIX 3.1

Hello,

Recently our system got into an almost unusable state, complaining
that the gnode table was full.

Running pstat -i revealed that 875 of the 1392 entries in our gnode table
belonged to a single user, and the remaining entries were not enough to
provide decent service for the 60 or so other users who were logged on.
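One way to spot a single owner hogging the table is to tally the in-use entries by owner and look at each owner's share. Here one user held about 63% of the table, leaving 517 entries for everyone else. The sketch below is a hypothetical helper (`tally_owners` is not an ULTRIX tool). It assumes the owner uid of each in-use entry has already been extracted from the pstat -i listing. The ULTRIX column layout is not shown in this thread, so that parsing step is left out.

```python
from collections import Counter

def tally_owners(owner_uids, table_size):
    """Count in-use gnode entries per owner uid (hypothetical helper).

    owner_uids: one uid per in-use entry, assumed to have been extracted
                from pstat -i output by some other means.
    table_size: total number of slots in the gnode table.
    Returns (uid, count, percent_of_table) tuples, largest owner first.
    """
    counts = Counter(owner_uids)
    return [(uid, n, 100.0 * n / table_size) for uid, n in counts.most_common()]

# Illustrative input only: uid 1001 stands in for the user holding 875 of
# 1392 entries; uid 1002 is a made-up placeholder for all other owners.
sample = [1001] * 875 + [1002] * 517
for uid, n, pct in tally_owners(sample, 1392):
    print(f"uid {uid}: {n} entries ({pct:.1f}% of table)")
```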

The user with all these busy entries in the gnode table was NOT logged on, and
there was no process on the system belonging to him. Of course he claimed
to be totally innocent.

Any idea what could have got us into such a state?  We don't know how long
we had been running with a gnode table close to overflowing, so the problem
may have started days before we noticed it.

We have a rather special environment, with many machines from different
manufacturers and lots of NFS mounts to remote machines (possibly including
this user's files).

Any hints would be very much appreciated.

Dietrich Wiegandt
CERN CN Division

grr@cbmvax.commodore.com (George Robbins) (09/17/90)

In article <2731@cernvax.UUCP> dietrich@cernvax.UUCP (dietrich wiegandt) writes:
> Hardware: VAX8530
> Software: ULTRIX 3.1
> 
> recently our system got into an almost unusable state complaining

I've seen gnode table full messages when a disk drive goes offline or otherwise
goes wacky.  Once this happened when a management type was pushing buttons;
several other times it happened when the HSC decided to run ILDISK (in-line
error analysis) because of drive errors.

The HSC support works nicely, but doesn't seem all that rugged.

> We have a rather special environment with many machines from different
> manufacturers and lots of NFS mounts (including possibly this user's files)
> to remote machines going on.

It seems possible that you might see the same kind of problem if you have
lots of NFS mounts and run into network delays or problems, but this is
only guessing....

-- 
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)

saus@bijou.media.mit.edu (Mark Sausville) (09/19/90)

In article <2731@cernvax.UUCP> dietrich@cernvax.UUCP (dietrich wiegandt) writes:

   From: dietrich@cernvax.UUCP (dietrich wiegandt)
   Newsgroups: comp.unix.ultrix,comp.unix.internals
   Keywords: gnode table overflow
   Date: 14 Sep 90 10:03:15 GMT
   Followup-To: comp.unix.ultrix

   Hardware: VAX8530
   Software: ULTRIX 3.1

   Any idea by what manipulation we might have got in such a state?  We don't
   know for how long we have been running with a gnode table threatening to
   overflow, so the problem may have arisen days before we noticed it.

   We have a rather special environment with many machines from different
   manufacturers and lots of NFS mounts (including possibly this user's files)
   to remote machines going on.

   Any hints would be very much appreciated.

   Dietrich Wiegandt
   CERN CN Division

A patch to version 3.1 exists (available from the support center)
which purports to fix a panic related to gnodes, NFS mounts and quotas.

We had a problem similar to yours caused by an application which
seemed to corrupt gnodes.  

After some investigation (the gnode code is way hairy), we decided
to host the application elsewhere.  It is clear to me that it's possible
to wedge gnodes over NFS, but since I couldn't find a simple way
to recreate the problem, I didn't pursue it with DEC.

I would suggest that you try to correlate with some application.  In
our case, it was an ethertalk file service serving files which were
NFS mounted on the ethertalk server.

					Mark.

Mark Sausville                           MIT Media Laboratory
617-253-0325                             Room E15-354
Fax: 617-258-6264                        20 Ames Street
saus@media-lab.media.mit.edu             Cambridge, MA 02139