david@varian.UUCP (02/11/87)
Here's a question that I would be posting to an Ultrix group (if there were one...), though it may be of interest to other 4.xBSD users.

We're running Ultrix 1.2 (with DECnet/Ultrix) on a VAX 750 (RA81, TU80, 2 DMZ32, and DELUA). In the last couple of weeks we started crashing about once a day or two with the message

    panic: iput i_count < 1 !

The only thing that I can find in common between the crashes is logouts from ptys (telnet from IBM PCs running MIT/CMU PCIP) about a minute or two before the crash. However, I can't re-create a crash by just logging out.

Does anyone have any suggestions? Any help would be appreciated.

--
David Brown (415) 945-2199
Varian Instruments 2700 Mitchell Dr. Walnut Creek, Ca. 94598
{ptsfa,lll-crg,zehntel,dual,amd,fortune,ista,rtech,csi,normac}!varian!david
avolio@decuac.UUCP (02/12/87)
In article <636@varian.UUCP>, david@varian.UUCP (David Brown) writes:
> We're running Ultrix 1.2 (with DECnet/Ultrix) on a VAX 750 (RA81, TU80,
> 2 DMZ32, and DELUA). In the last couple of weeks we started crashing
> about once a day or two with the message
>
> panic: iput i_count < 1 !

Hmmm. The error comes from the iput routine in ufs_inode.o. This
routine is to "Decrement reference count of an inode structure. On
the last reference, write the inode out and if necessary, truncate and
deallocate the file." Basically it is in a section of code that
should not happen. (But I guess it may ... I mean it isn't marked
"YOU CAN NEVER GET HERE" :-)) The routine just found out that the
reference count -- the link count I guess -- is less than 1.

I guess my first guess would be hardware problems (since I am in software).
Are "fsck's" being done properly? Any other error messages on the
console or in /usr/adm/messages? I am led to believe by Digital
Review that there are some perceived problems with a disk you mention
in your list. You might have that checked out. (Our RA81 has never
given us a problem. I am just reporting what I read...)

-Fred-
terryl@tekcrl.UUCP (02/13/87)
In article <1172@decuac.DEC.COM> avolio@decuac.DEC.COM (Frederick M. Avolio) writes:
>In article <636@varian.UUCP>, david@varian.UUCP (David Brown) writes:
>
>> We're running Ultrix 1.2 (with DECnet/Ultrix) on a VAX 750 (RA81, TU80,
>> 2 DMZ32, and DELUA). In the last couple of weeks we started crashing
>> about once a day or two with the message
>>
>> panic: iput i_count < 1 !
>
>Hmmm. The error comes from the iput routine in ufs_inode.o. This
>routine is to "Decrement reference count of an inode structure. On
>the last reference, write the inode out and if necessary, truncate and
>deallocate the file." Basically it is in a section of code that
>should not happen. (But I guess it may ... I mean it isn't marked
>"YOU CAN NEVER GET HERE" :-)) The routine just found out that the
>reference count -- the link count I guess -- is less than 1. I guess
>my first guess would be hardware problems (since I am in software).
>Are "fsck's" being done properly? Any other error messages on the
>console or in /usr/adm/messages? I am led to believe by Digital
>Review that there are some perceived problems with a disk you mention
>in your list. You might have that checked out. (Our RA81 has never
>given us a problem. I am just reporting what I read...)

I don't think so. From the symptoms reported, it sounds like the bug is in close(). Mr. Brown states that he is running Ultrix 1.2; Ultrix is based on 4.2 BSD. There was a bug in EARLY releases of 4.2 in the routine close() (found in /sys/sys/kern_descrip.c on 4.2 systems; your path names may vary). Below is a (somewhat brief) excerpt of the original 4.2 code in close():

	closef(fp);
	u.u_ofile[uap->i] = NULL;
	*pf = 0;

The problem only shows up on the last close of a file descriptor. What happens is that close() calls closef(); if this is the last close of a file descriptor, closef() calls ino_close() for NORMAL files (i.e. not sockets). ino_close() calls iput() to clean up the inode as Mr. Avolio describes.
ino_close() then calls a device-specific close if this file descriptor refers to either a block-special device or a character-special device, and this is where the problem can occur. If the device-specific close routine blocks for ANY reason (e.g. a tty driver has to wait for character queues to empty), then the device-specific close routine can be interrupted. If the device-specific close routine is interrupted, then the code after the call to closef() in close() is never executed, and that is the real bug.

The panic will happen when the process exits; what happens at exit time is that closef() is called again to close the file descriptor that was previously close()'ed by the user (because the information in the per-process data area u.u_ofile still thinks the file descriptor hasn't been close()'ed yet).

Anyway, the whole upshot of this is to change the routine close() in /sys/sys/kern_descrip.c, re-arranging the three lines quoted above to look like this:

	u.u_ofile[uap->i] = NULL;
	*pf = 0;
	closef(fp);

BTW, whoever did the original 4.2 code had a (somewhat) slight clue that there was a problem in close(), because there's a comment after the call to closef() that says:

	/* WHAT IF u.u_error? */

Terry Laskodi of Tektronix
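
The ordering problem described above can be illustrated with a small user-space sketch. This is not the 4.2BSD kernel code: every name below (sim_file, sim_ofile, sim_iput, sim_closef, sim_close_original, sim_close_fixed, sim_exit, sim_run) is a hypothetical stand-in, and an early return is assumed as a rough model of the interrupted device close abandoning the rest of close(). It only shows why clearing the descriptor slot before calling closef() avoids a second reference drop at exit time.

```c
#include <stddef.h>

/* Hypothetical stand-in for an open file and its inode reference count. */
struct sim_file {
    int i_count;                 /* models the inode reference count */
};

#define SIM_NOFILE 4
static struct sim_file *sim_ofile[SIM_NOFILE]; /* models u.u_ofile[] */
static int sim_panicked;         /* set where the real kernel would panic */

/* Drop one reference; record a "panic" if the count is already < 1. */
static void sim_iput(struct sim_file *fp)
{
    if (fp->i_count < 1) {
        sim_panicked = 1;        /* "panic: iput i_count < 1" */
        return;
    }
    fp->i_count--;
}

/* Drop the reference, then run the (simulated) device close.
 * Returns nonzero if that device close was interrupted. */
static int sim_closef(struct sim_file *fp, int interrupted)
{
    sim_iput(fp);
    return interrupted;
}

/* Original ordering: the slot is cleared only after closef() returns. */
static void sim_close_original(int fd, int interrupted)
{
    struct sim_file *fp = sim_ofile[fd];
    if (fp == NULL)
        return;
    if (sim_closef(fp, interrupted))
        return;                  /* models the interrupt: the line below never runs */
    sim_ofile[fd] = NULL;
}

/* Rearranged ordering: clear the slot first, then call closef(). */
static void sim_close_fixed(int fd, int interrupted)
{
    struct sim_file *fp = sim_ofile[fd];
    if (fp == NULL)
        return;
    sim_ofile[fd] = NULL;
    (void)sim_closef(fp, interrupted);
}

/* Process exit: close every descriptor still recorded as open. */
static void sim_exit(void)
{
    int fd;
    for (fd = 0; fd < SIM_NOFILE; fd++) {
        if (sim_ofile[fd] != NULL) {
            (void)sim_closef(sim_ofile[fd], 0);
            sim_ofile[fd] = NULL;
        }
    }
}

/* Open one file, close it, exit. Returns 1 if the simulated panic fired. */
int sim_run(int use_fixed, int interrupted)
{
    struct sim_file f = { 1 };
    int fd;

    sim_panicked = 0;
    for (fd = 0; fd < SIM_NOFILE; fd++)
        sim_ofile[fd] = NULL;
    sim_ofile[0] = &f;

    if (use_fixed)
        sim_close_fixed(0, interrupted);
    else
        sim_close_original(0, interrupted);
    sim_exit();
    return sim_panicked;
}
```

In this model, calling sim_run(0, 1) (original ordering, interrupted close) reports the simulated panic, while sim_run(1, 1) (rearranged ordering) does not, which matches the behaviour the post attributes to the two orderings.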