[net.unix-wizards] Possible file table kernel problem

pc (03/02/83)

We run 4.1BSD and have installed several device drivers to
support our Cambridge ring. Typically, a /dev entry is
mapped dynamically onto a printer (say) on our network.

Most of the time the device driver works perfectly, but there
seem to be occasions (about one every two or three days, when the
ring interface is in continual use) where the close routine in the
driver isn't called.

The program which drives the printer opens it as standard output and
then forks off pr to do the actual printing. So the file table entry
for the device has a reference count of two, but there is only one
inode entry in memory.
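To see the "two references, one file table entry" situation from user level, here is a small sketch for a modern POSIX system (not 4.1BSD). offset_after_child_write() is a hypothetical helper: it opens a scratch file, forks, lets the child write through the descriptor it inherited, and then reads the parent's offset. Because parent and child share one open file entry, the offset stored in that entry is shared too.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent's file offset after the child has
 * written len bytes through the descriptor it inherited across fork(). */
static off_t offset_after_child_write(const char *msg, size_t len)
{
	char path[] = "/tmp/sharedfdXXXXXX";
	int fd = mkstemp(path);          /* one open file entry, one reference */
	if (fd < 0)
		return -1;
	unlink(path);                    /* the open descriptor keeps the file alive */

	pid_t pid = fork();              /* the child now holds a second reference */
	if (pid < 0) {
		close(fd);
		return -1;
	}
	if (pid == 0) {
		ssize_t n = write(fd, msg, len); /* advances the shared offset */
		_exit(n == (ssize_t)len ? 0 : 1);
	}

	int status;
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0) {
		close(fd);
		return -1;
	}
	/* The child's exit dropped its reference; the parent sees its writes. */
	off_t off = lseek(fd, 0, SEEK_CUR);
	close(fd);                       /* last reference: the entry is released */
	return off;
}
```

Calling offset_after_child_write("abc", 3) should return 3: the parent never wrote anything, but the offset lives in the entry both processes share.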

Now, sometime during the shutdown of this pair of processes, a user
sends an interrupt signal to the parent. The parent traps this (using sigset)
and passes an interrupt on to the child. I suspect (but can't prove) that
this signal somehow screws up the file table mechanism so that the code
which calls the close routine is never reached. Or is it something else...?
There are two places (sys1.c, sys3.c) where the call to closef is not of the
form
	fp = u.u_ofile[i];
	u.u_ofile[i] = NULL;
	closef(fp);
Is this the problem?
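As a rough illustration of why the three-line form above matters, here is a toy user-space model in C. It is not the 4.1BSD code: struct file, closef(), dev_close(), sys_close(), sys_exit() and run() below are simplified, hypothetical stand-ins, and a signal that aborts an interruptible sleep is modeled with longjmp() to a saved context, loosely following the kernel's u.u_qsav mechanism. The sketch assumes the interrupt arrives while the driver's close routine is sleeping on the last reference.

```c
#include <setjmp.h>
#include <stddef.h>

/* Toy file table entry (hypothetical, not the real struct file). */
struct file {
	int f_count;       /* references from descriptor slots */
	int release_calls; /* times closef() was entered for this entry */
	int dev_closes;    /* times the driver close routine ran */
};

#define NOFILE 4

static struct file *u_ofile[NOFILE]; /* models u.u_ofile[] for one process */
static jmp_buf u_qsav;               /* models the signal-abort context */
static int signal_pending;           /* models a pending interrupt */
static struct file shared;           /* the entry under test */

/* Hypothetical driver close: it "sleeps" draining output, and a pending
 * signal abandons the whole system call via longjmp(). */
static void dev_close(struct file *fp)
{
	fp->dev_closes++;
	if (signal_pending) {
		signal_pending = 0;
		longjmp(u_qsav, 1);
	}
}

/* Simplified closef(): drop one reference; run the driver close on the last. */
static void closef(struct file *fp)
{
	if (fp == NULL)
		return;
	fp->release_calls++;
	if (fp->f_count > 1) {
		fp->f_count--;
		return;
	}
	fp->f_count = 0;
	dev_close(fp);
}

/* close(fd): clear the slot before closef() (the idiom) or after it. */
static void sys_close(int fd, int clear_first)
{
	struct file *fp = u_ofile[fd];
	if (clear_first) {
		u_ofile[fd] = NULL;
		closef(fp);
	} else {
		closef(fp);            /* may never return if a signal aborts it */
		u_ofile[fd] = NULL;
	}
}

/* exit(): release every slot that is still set. */
static void sys_exit(void)
{
	for (int i = 0; i < NOFILE; i++) {
		struct file *fp = u_ofile[i];
		u_ofile[i] = NULL;
		closef(fp);
	}
}

/* One scenario: the last close of the device is interrupted, then the process exits. */
static struct file run(int clear_first)
{
	shared.f_count = 1;        /* the child has already closed its copy */
	shared.release_calls = 0;
	shared.dev_closes = 0;
	u_ofile[1] = &shared;
	signal_pending = 1;        /* the user hits interrupt during the close */
	if (setjmp(u_qsav) == 0)
		sys_close(1, clear_first);
	sys_exit();
	return shared;
}
```

In this model, run(1) enters closef() once and runs the driver close once, while run(0) leaves the slot set when the close is abandoned, so the exit sweep enters closef() a second time for the same entry and runs the driver close again. That only shows the two forms can diverge when a close is interrupted; it does not show that this is the bug described here, where the close routine is never called at all.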

At the end of the day, the file table entry and the inode it referenced
have gone, but the device is locked because its internal tables say that
it is busy. The evidence is that the close routine has never been called.

Has anyone out there in uucp-land experienced this sort of problem? Please
don't just dismiss it and say "oh, it must be the device driver", because I have
looked and looked and looked and looked. If anyone has had this problem and
can give me a fix (or can just say they might have had it), please reply to
	lime!ukc!pc	(best)
	or
	philabs!mcvax!ukc!pc
	or
	decvax!mcvax!ukc!pc

	Peter Collinson, University of Kent UK

obrien@Rand-Unix (03/30/83)

This "close not called" bug has been in UNIX since at least research
Version 6.  I've had several stabs myself at finding it and have never
managed the trick.  I'm hoping the new signal stuff in 4.2 will make it
go away.  Berkeley's aware of the problem but they haven't found it either.

greep@Su-Dsn (03/31/83)

I've also had problems with close routines apparently not being called,
especially when signals were being used. (This was on a driver for the
Arpanet.) I think this is a known (but not too well known) bug in 4.1bsd.
I don't know any fix for it.

Also, I once saw a tty left with the exclusive-open bit on, even though the
only process talking to it had exited. This may also have been caused
by the close routine not being called. This was with the standard driver
(DZ), so no non-Berkeley code was involved.