[net.unix-wizards] inode table full

bzs%bostonu.csnet@csnet-relay.arpa (Barry Shein) (03/20/86)

There's definitely an inode bug in the original 4.2 tape distribution.
Not sure what the fix is tho it's been discussed a few times on this
list. The way to find out if that is your problem is to use pstat
to determine if any inodes have a ref count of -1 (will, I believe,
appear as ff or 255 on the output of pstat.) If so, you got it and
I believe it can lead to inode table full messages. Temporary fix?
Re-boot and pray for the best. Increasing the number of inodes in
the system (as Chris Torek suggested, by upping MAXUSERS) will probably
help (seems to have here at BU) but won't completely cure it. Still,
if you have the memory it's a fine suggestion. In general it seems like
a good idea to set MAXUSERS in 4.2 about as high as you can afford
anyhow, since it increases buffers etc. Of course this is a tuning
issue I don't want to go into here; suffice it to say that if you
average 16 users, MAXUSERS is 16, and you have at least 4MB of memory,
you should increase MAXUSERS to at least 32. I have 8MB on a 750,
typically 20 users [I know, ugh!], and am happy with MAXUSERS at 48.
Given the memory, about double the actual number of users is not
wild. This is all kinda irresponsible advice, but maybe it will point
you towards some common sense findings.
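
For a rough feel of how MAXUSERS feeds the inode table size, here is a
small C sketch. The two formulas are an assumption, modeled on the way
conf/param.c is generally described as scaling its tables; the toy_*
names are made up for illustration, and you should check the param.c on
your own system before trusting any of the numbers.

```c
/*
 * Sketch only: ASSUMED table-sizing formulas, modeled on conf/param.c.
 * The toy_* names are hypothetical; verify against your own kernel source.
 */

/* Assumed process-table size for a given MAXUSERS. */
static int toy_nproc(int maxusers)
{
    return 20 + 8 * maxusers;
}

/* Assumed in-core inode-table size for a given MAXUSERS. */
static int toy_ninode(int maxusers)
{
    return toy_nproc(maxusers) + 16 + maxusers + 32;
}
```

Under these assumed formulas, going from MAXUSERS 16 to 32 takes the
inode table from 212 to 356 slots, and 48 gives 500. That buys time
before the table fills but does nothing about entries that leak.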

	-Barry Shein, Boston University

Me, I'm hoping it will all be fixed in 4.3...hah!

rick@nyit.UUCP (Rick Ace) (03/24/86)

Barry Shein writes,

> There's definitely an inode bug in the original 4.2 tape distribution.
> Not sure what the fix is tho it's been discussed a few times on this
> list. The way to find out if that is your problem is to use pstat
> to determine if any inodes have a ref count of -1 (will, I believe,
> appear as ff or 255 on the output of pstat.) If so, you got it and
> I believe it can lead to inode table full messages. Temporary fix?
> Re-boot and pray for the best...

Yes, there are several bugs, all of which revolve around the subject
of file descriptor management and its interaction with devices such
as terminals, whose open and close routines can sleep at a priority
greater than PZERO.

Consider the case where a 4.2bsd user program issues a close() syscall.
The kernel can (not necessarily in this order):

	1.  Free the file descriptor (clear u.u_ofile[fd] and
	    u.u_pofile[fd]).
	2.  Decrement the f_count value for the corresponding "file"
	    table entry.  If the count goes to zero, release the entry.
	3.  Decrement the i_count value for the "inode" structure.
	4.  In the case of a character-special device like a tty,
	    call the driver's d_close routine.

Problem:  there are cases where the kernel gets halfway through doing
an open() or close(), sleeps at a priority greater than PZERO (and so
can be woken by a signal), and gets interrupted by a signal before the
rest of the operation is complete, leaving the file/inode/user tables
in an inconsistent state.

One scenario that can cause i_count to go below zero goes like this:
A user program calls close() to close a tty file descriptor.  UNIX
decrements f_count and i_count and then calls ttyclose().  If the tty's
output character queue is not empty, the kernel sleep()s at a priority
greater than PZERO, waiting for the queue to drain.  Normally, once the
queue has drained, the kernel awakens and proceeds to clear the u_ofile
and u_pofile entries for the file descriptor.  Assume, though, that while
the process is sleep()ing on t_outq, it receives a signal.  The kernel
aborts the sleep AND NEVER CLEARS U_OFILE AND U_POFILE.  When the process
subsequently issues another close() call to that file descriptor (either
explicitly, or implicitly via the "exit" syscall), f_count and i_count
are decremented AGAIN, SPURIOUSLY.  i_count can fall below zero, behaving
like a very large count that will never reach zero.  Result:  jammed
inode till next reboot.
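
The scenario above can be sketched as a toy user-space model in C.
This is not the 4.2bsd source: every toy_* name is hypothetical, the
counts are assumed to be unsigned 16-bit fields, and the interrupted
sleep in ttyclose() is faked with a flag.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the kernel tables; counts ASSUMED unsigned 16-bit. */
struct toy_inode { uint16_t i_count; };
struct toy_file  { uint16_t f_count; struct toy_inode *f_inode; };

#define TOY_NOFILE 4
static struct toy_file *toy_ofile[TOY_NOFILE];  /* models u.u_ofile[] */

/* Models a tty d_close: nonzero signal_pending means the sleep was aborted. */
static int toy_ttyclose(int signal_pending)
{
    return signal_pending ? EINTR : 0;
}

/*
 * Buggy ordering: drop the counts, call the driver, and clear the
 * descriptor slot only if the driver returns normally.
 */
static int toy_close_buggy(int fd, int signal_pending)
{
    struct toy_file *fp;

    if (fd < 0 || fd >= TOY_NOFILE || toy_ofile[fd] == NULL)
        return EBADF;
    fp = toy_ofile[fd];
    fp->f_count--;
    fp->f_inode->i_count--;
    if (toy_ttyclose(signal_pending) == EINTR)
        return EINTR;           /* slot still points at fp */
    toy_ofile[fd] = NULL;
    return 0;
}

/* Runs the double close and returns the final i_count. */
uint16_t toy_demo_double_close(void)
{
    static struct toy_inode ip;
    static struct toy_file f;

    ip.i_count = 1;
    f.f_count = 1;
    f.f_inode = &ip;
    toy_ofile[0] = &f;

    toy_close_buggy(0, 1);      /* interrupted close: i_count 1 -> 0 */
    toy_close_buggy(0, 0);      /* close again at exit: 0 wraps around */
    return ip.i_count;
}
```

In this model toy_demo_double_close() returns 65535: the second
decrement wraps the unsigned count, which is the "very large count
that will never reach zero" described above.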

The kernel performs two main tasks during close():
	1.  Adjust all share counts on "inode" and "file" table entries,
	    freeing these entries when appropriate.
	2.  Call device-specific logic to close the device.

When the kernel calls the device's d_close routine, it assumes the risk
that the routine will sleep and be interrupted by a signal.  It is
therefore imperative that the kernel do either:  all of #1 followed by
all of #2, or all of #2 followed by all of #1.  4.2bsd begins some of
the work in #1, then does #2, and finally finishes #1, giving rise to
the bugs.  There are places where a process can reference "file" table
entries it does not own anymore.

The essence of our fix was to rearrange the kernel's close() logic to
do task #1 completely first, and then do task #2.  It is possible in
this case for close() to return an EINTR error code while closing a
tty file descriptor, even though u_ofile and u_pofile have been cleared.
This seems preferable to the other alternative (#2, followed by #1)
because most programs don't examine the value returned by close().

-----
Rick Ace
Computer Graphics Laboratory
New York Institute of Technology
Old Westbury, NY  11568
(516) 686-7644

{decvax,seismo}!philabs!nyit!rick

kjs%tufts.csnet@.arpa (Kevin Sullivan) (04/09/86)

After running our Vax 780 under 4.2BSD for several days (6 or 7 maybe) we
begin getting 'inode: inode table full' messages on the console,  even though
it doesn't seem that the table should really be used up.  Has anyone else
noticed this kind of behavior?  Is the kernel corrupting itself after running
for a while?  Is there a fix if this is actually the problem?  Thanks.

Kevin Sullivan
Tufts University
kjs%tufts@csnet-relay