v.wales%ucla-locus@sri-unix.UUCP (10/19/83)
From: Rich Wales <v.wales@ucla-locus>

First, a general technique for localizing "panic"s.

"panic" messages come from calls to a kernel routine named "panic". For example, the message "panic: iput" would come from the following call:

    panic ("iput");

By convention, "panic" is usually called with an argument identifying the routine in the kernel where the problem occurred. For example, this particular fatal error occurred in a kernel routine named "iput". This is helpful if you are familiar with the kernel -- and since you're going to have to dive into the kernel sources to track down the problem anyway . . . .

The most straightforward way to find a random "panic" is to search through the kernel sources for the quoted character string argument to "panic", via a command sequence similar to the following example:

    cd /usr/src/sys
    egrep '"iput"' */*.[cs]

After determining where the "panic" call in question is, you must then look through that part of the kernel and figure out the circumstances under which that particular "panic" happens.

If you get a "panic" message in UNIX, by the way, it will generally do you no good at all to ask your DEC CE what it means (since "panic"s are specific to the UNIX kernel and he/she most likely knows only VMS).

Now, since you specifically asked about "panic: iput" --

The "panic: iput" message comes from the "iput" routine in sys/iget.c in the kernel. "iput" is called when someone is through with the in-core copy of an inode. In particular, when the reference count of an in-core inode is about to be decremented to zero, any changes made to the inode are written back to the disk, and the in-core inode structure is deallocated so that it can be reused later.

To speed up the process of locating the in-core copy of an inode (or determining that no such copy exists right now), a hashing scheme is used. Inodes which "hash" to the same value are connected in a singly linked list.
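
For anyone who has not looked at this part of the kernel, here is a rough sketch of the kind of hashing scheme I mean. This is NOT the actual kernel code: the structure is cut down to three fields, and NHASH, ihash, hash_insert, hash_find and hash_remove are names made up for illustration. It only shows the general shape: inodes with the same hash value are chained together in a singly linked list, a lookup walks one chain, and a removal can fail if the inode is not on the chain where its hash code says it should be.

```c
#include <stddef.h>

#define NHASH 8                 /* hypothetical number of hash chains */

/* Hypothetical, cut-down in-core inode; the real structure has far more in it. */
struct inode {
    int           i_dev;        /* device the inode lives on */
    int           i_number;     /* inode number on that device */
    struct inode *i_next;       /* next inode on the same hash chain */
};

static struct inode *hashtab[NHASH];    /* heads of the hash chains */

/* Hypothetical hash function: any cheap mix of device and inode number. */
int ihash(int dev, int ino)
{
    return (int)(((unsigned)dev + (unsigned)ino) % NHASH);
}

/* Put an in-core inode at the head of the chain for its hash code. */
void hash_insert(struct inode *ip)
{
    int h = ihash(ip->i_dev, ip->i_number);

    ip->i_next = hashtab[h];
    hashtab[h] = ip;
}

/* Find the in-core copy of (dev, ino); NULL means no such copy exists. */
struct inode *hash_find(int dev, int ino)
{
    struct inode *ip;

    for (ip = hashtab[ihash(dev, ino)]; ip != NULL; ip = ip->i_next)
        if (ip->i_dev == dev && ip->i_number == ino)
            return ip;
    return NULL;
}

/*
 * Unlink ip from the chain for its hash code.
 * Returns 0 on success, or -1 if ip was not found on that chain.
 */
int hash_remove(struct inode *ip)
{
    struct inode **pp;

    for (pp = &hashtab[ihash(ip->i_dev, ip->i_number)]; *pp != NULL;
         pp = &(*pp)->i_next) {
        if (*pp == ip) {
            *pp = ip->i_next;   /* splice ip out of the list */
            ip->i_next = NULL;
            return 0;
        }
    }
    return -1;
}
```

In this sketch the failure case is just a -1 return, so that it can be tried out in a user program; a kernel routine has no sensible way to carry on at that point.
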
One of the things "iput" has to do when an in-core inode is deallocated, therefore, is remove the inode from the linked list corresponding to its hash code. "panic: iput" happens to mean that an in-core inode could not be found on the linked list corresponding to its hash code. Needless to say, this should never happen in a bug-free system.

The only scenario I can imagine at the moment which might explain this is that a call to "iput" might be getting interrupted by another call to "iput" occurring at a raised IPL (interrupt priority level), where both "iput" calls reference either the same inode or else two different inodes with the same hash code. If this is the case, the second "iput" call could be tracked down by analyzing a kernel dump, or by halting the system after the "panic" and examining the registers and interrupt stack (yecch!).

One thing you might do to help you track the problem down is add a "printf" to tell you the device and inode number of the offending in-core inode. You can then use "ncheck -i" to map this information into a file name; this may be helpful. Be sure to put your "printf" BEFORE the "panic", not after, since calls to "panic" never return!

Perhaps the part of "iput" which removes the inode from the linked list corresponding to its hash code should itself be done at a raised IPL, by enclosing it between "s = spl6();" and "splx(s);" statements.

What do the other wizards out there in UNIX-Wizard-Land think about this?

-- Rich <wales@UCLA-LOCUS>