[comp.unix.questions] Ultrix panic: iput i_count < 1

david@varian.UUCP (02/11/87)

Here's a question that I would post to an Ultrix group (if there
were one...), though it may be of interest to other 4.xBSD users.

We're running Ultrix 1.2 (with DECnet/Ultrix) on a VAX 750 (RA81, TU80,
2 DMZ32, and DELUA).  In the last couple of weeks we started crashing
about once a day or two with the message

	panic: iput i_count < 1 !

The only thing I can find in common between the crashes is a logout
from a pty (telnet from IBM PCs running MIT/CMU PCIP) a minute or two
before each crash. However, I can't reproduce a crash just by logging out.

Does anyone have any suggestions?  Any help would be appreciated.


-- 
David Brown	 (415) 945-2199
Varian Instruments 2700 Mitchell Dr.  Walnut Creek, Ca. 94598
{ptsfa,lll-crg,zehntel,dual,amd,fortune,ista,rtech,csi,normac}!varian!david

avolio@decuac.UUCP (02/12/87)

In article <636@varian.UUCP>, david@varian.UUCP (David Brown) writes:

> We're running Ultrix 1.2 (with DECnet/Ultrix) on a VAX 750 (RA81, TU80,
> 2 DMZ32, and DELUA).  In the last couple of weeks we started crashing
> about once a day or two with the message
> 
> 	panic: iput i_count < 1 !

Hmmm.  The error comes from the iput routine in ufs_inode.o.  This
routine is to "Decrement reference count of an inode structure.  On
the last reference, write the inode out and if necessary, truncate and
deallocate the file."  Basically it is in a section of code that
should not happen.  (But I guess it may ... I mean it isn't marked
"YOU CAN NEVER GET HERE" :-))  The routine just found out that the
reference count -- the link count I guess -- is less than 1.  I guess
my first guess would be hardware problems (since I am in software).
Are "fsck's" being done properly?  Any other error messages on the
console or in /usr/adm/messages?  I am led to believe by Digital
Review that there are some perceived problems with a disk you mention
in your list.  You might have that checked out.  (Our RA81 has never
given us a problem.  I am just reporting what I read...)

-Fred-

terryl@tekcrl.UUCP (02/13/87)

In article <1172@decuac.DEC.COM> avolio@decuac.DEC.COM (Frederick M. Avolio) writes:
>In article <636@varian.UUCP>, david@varian.UUCP (David Brown) writes:
>
>> We're running Ultrix 1.2 (with DECnet/Ultrix) on a VAX 750 (RA81, TU80,
>> 2 DMZ32, and DELUA).  In the last couple of weeks we started crashing
>> about once a day or two with the message
>> 
>> 	panic: iput i_count < 1 !
>
>Hmmm.  The error comes from the iput routine in ufs_inode.o.  This
>routine is to "Decrement reference count of an inode structure.  On
>the last reference, write the inode out and if necessary, truncate and
>deallocate the file."  Basically it is in a section of code that
>should not happen.  (But I guess it may ... I mean it isn't marked
>"YOU CAN NEVER GET HERE" :-))  The routine just found out that the
>reference count -- the link count I guess -- is less than 1.  I guess
>my first guess would be hardware problems (since I am in software).
>Are "fsck's" being done properly?  Any other error messages on the
>console or in /usr/adm/messages?  I am led to believe by Digital
>Review that there are some perceived problems with a disk you mention
>in your list.  You might have that checked out.  (Our RA81 has never
>given us a problem.  I am just reporting what I read...)

     I don't think so. From the symptoms reported, it sounds like the
bug is in close(). Mr. Brown states that he is running Ultrix 1.2; Ultrix
is based on 4.2 BSD. There was a bug in EARLY releases of 4.2 in the
routine close() (found in /sys/sys/kern_descrip.c on 4.2 systems; your
path names may vary). Below is a (somewhat brief) excerpt of the original
4.2 code in close():

	closef(fp);
	u.u_ofile[uap->i] = NULL;
	*pf = 0;

     The problem only shows up on the last close of a file descriptor.
close() calls closef(); on the last close, closef() calls ino_close() for
NORMAL files (i.e. not sockets). ino_close() calls iput() to clean up the
inode as Mr. Avolio describes, and then calls the device-specific close
routine if the descriptor refers to a block-special or character-special
device. This is where the problem can occur. If the device-specific close
routine blocks for ANY reason (e.g. a tty driver has to wait for its
character queues to empty), it can be interrupted. When that happens, the
code after the call to closef() in close() is never executed, and that is
the real bug.

     The panic happens later, when the process exits. At exit time,
closef() is called again on the file descriptor that the user already
close()'ed, because the per-process data area (u.u_ofile) still shows it
as open. iput() then runs a second time on an inode whose reference was
already released. The upshot is to change the routine close() in
/sys/sys/kern_descrip.c, rearranging the three lines quoted above to look
like this:

	u.u_ofile[uap->i] = NULL;
	*pf = 0;
	closef(fp);

     BTW, whoever wrote the original 4.2 code had at least a slight clue
that there was a problem in close(), because there's a comment after the
call to closef() that says:

	/* WHAT IF u.u_error? */



				Terry Laskodi
				     of
				Tektronix