[net.bugs.4bsd] uuxqt bug with a badly logjammed spool directory

mark@cbosgd.UUCP (Mark Horton) (09/30/84)

Index:	usr.bin/uucp/uuxqt.c 4.2

Description:
	When there are a large number of X. files with no corresponding
	D. files, uuxqt exits without doing any work.
Repeat-By:
	Not easily reproduced.  It happens when the first 20 X. files
	in /usr/spool/uucp/X. all have F lines requiring files that
	are not present.
Fix:
	I don't have a fix, but can offer some insight into the bug.
	There is a variable "rechecked" in gtxfile in uuxqt.c.  It is set the
	first time gtwrkf returns failure, so that the second time,
	it will assume there is nothing to do and exit.  Each gtwrkf
	returns the first 20 files in the directory; normally each
	time through this loop uuxqt will run 20 more programs and
	remove them from the directory, and the next time 20 more
	files will be found.  However, X files that need a D file,
	if the D file isn't present, will just be left there, cutting
	into the 20 files per batch.  If the first 20 files in the
	directory are all X. files that need D. files that aren't there,
	rechecked will be set and uuxqt will exit.

	Just having it not set rechecked makes things worse - uuxqt
	goes into a loop looking at the SAME 20 FILES over and over.

	Note that the symptoms of this bug suggest the old "I forgot to
	close a file descriptor" bug, but this bug has nothing to do
	with unclosed file descriptors.

	What I did as a temporary fix was to insert
		unlink(subfile(file));
	at the bottom of the routine gtxfile (right before the "goto retry"
	at the bottom) to remove the dead X. files.  I do not recommend
	this fix for normal situations, since this would remove files that
	are in transit or right after a phone line burp.  An alternative
	might be to stat the file, and if it's older than some cutoff
	(a few hours, perhaps) unlink it.
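
A minimal sketch of that age check, assuming a POSIX-style stat(). The helper name older_than() and its parameters are illustrative and do not come from uuxqt.c; the current time is passed in so the caller chooses the clock.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Return 1 if the file at path was last modified at least max_age
 * seconds before now, 0 if it is newer, and -1 if stat() fails.
 * The name and signature are hypothetical, for illustration only.
 */
int
older_than(const char *path, time_t now, time_t max_age)
{
	struct stat stbuf;

	if (stat(path, &stbuf) != 0)
		return -1;
	return (now - stbuf.st_mtime >= max_age) ? 1 : 0;
}
```

In gtxfile the unlink(subfile(file)) would then be guarded by something like older_than(subfile(file), time((time_t *)0), 4 * 3600L) == 1, with the cutoff picked to suit the site.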

	The right fix is much harder - uuxqt has to remember all the
	names of files it's already rejected and ignore them in
	subsequent calls to gtwrkf.
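
One way the "remember what was rejected" idea could look, as a sketch only. The table size, the name length limit, and the function names are all assumptions made for illustration; none of them appear in uuxqt.c or anlwrk.c.

```c
#include <string.h>

#define MAXREJ   250	/* assumed table capacity */
#define REJNAMSZ 15	/* assumed maximum name length, including NUL */

static char Rejected[MAXREJ][REJNAMSZ];
static int Nrejected = 0;

/* Return 1 if name has already been rejected, 0 otherwise. */
int
is_rejected(const char *name)
{
	int i;

	for (i = 0; i < Nrejected; i++)
		if (strcmp(Rejected[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Remember name as rejected.  Returns 0 on success or if the name is
 * already recorded, -1 if the table is full or the name is too long.
 */
int
reject(const char *name)
{
	if (is_rejected(name))
		return 0;
	if (Nrejected >= MAXREJ || strlen(name) >= REJNAMSZ)
		return -1;
	strcpy(Rejected[Nrejected++], name);
	return 0;
}
```

The directory scan would call is_rejected() on each candidate before adding it to the batch, and gtxfile would call reject() whenever gotfiles() fails. A fixed table keeps the sketch short; it still overflows on a sufficiently jammed spool directory, which is part of why this fix is the harder one.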

	I can think of some other possibilities as well.  Running uuclean
	before uuxqt would help.  (This bug only appears when the spool
	directory is badly jammed up; in this situation the filesystem had
	been badly damaged.)  gtwrkf might return a considerably larger
	buffer than 20 files.  Or it might just keep the directory open
	and keep scanning, then restart when it runs off the end (with
	some provision for stopping if a complete pass of the directory
	is made with no work done.)
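
The stopping rule in that last suggestion can be sketched without touching the real directory code. Here struct xentry is a hypothetical stand-in for a spool entry: "ready" models gotfiles() succeeding and "done" models the command having been run. Nothing below is taken from uuxqt.c.

```c
#include <stddef.h>

/* Hypothetical model of one X. file in the spool directory. */
struct xentry {
	const char *name;
	int ready;	/* all required D. files are present */
	int done;	/* already executed */
};

/*
 * Scan all n entries repeatedly.  Run every entry that is ready and
 * not yet done, skipping the rest without stopping the scan.  Quit
 * after a complete pass in which nothing was run.  Returns the total
 * number of entries run.
 */
int
scan_until_idle(struct xentry *ent, size_t n)
{
	int total = 0, ran;
	size_t i;

	do {
		ran = 0;
		for (i = 0; i < n; i++) {
			if (ent[i].done || !ent[i].ready)
				continue;	/* skip, keep scanning */
			ent[i].done = 1;	/* stands in for running it */
			ran++;
		}
		total += ran;
	} while (ran > 0);
	return total;
}
```

With 20 blocked entries ahead of 5 runnable ones, this runs the 5 and then stops, where the batch-of-20 logic would have given up without doing any work.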

joe@fluke.UUCP (Joe Kelsey) (10/02/84)

This bug was discussed and solutions posted last January.  The original
article was from damonp@tektronix, date 13-Jan-84, subject: bug in 4.2
UUXQT.  There were about 4 or 5 followups posted, and I posted two
slightly different fixes, the second of which we have been using ever
since then.  Please search back through your archives of net.bugs.uucp
and I'm sure that you can easily find the articles.  If you do not have
these articles archived, send me mail and I will send you the fix ASAP.

/Joe

joe@fluke.UUCP (Joe Kelsey) (11/06/84)

er logs a message or unlinks the
	files, depending on a compile switch.  There are also some changes
	to anlwrk.c and uucp.h.  First, uuxqt.c


uuxqt.c log:
----------------------------
version 1.6        
date: 84/10/08 14:43:21;  author: joe;  state: Exp;  lines added/del: 2/1
Fix editing (really formatting) mistake which crept into v1.5
----------------------------
version 1.5        
date: 84/10/04 12:00:27;  author: joe;  state: Exp;  lines added/del: 3/2
Add log message for old X. files if UNLINK_OLDX is not defined.
This way, you can check the LOGFILE for old files and unlink them
by hand.
----------------------------
version 1.4        
date: 84/04/10 15:02:34;  author: joe;  state: Exp;  lines added/del: 2/0
Made unlinking of work files conditionally compiled on switch
UNLINK_OLDX.  Default is not to include the code.
----------------------------
version 1.3        
date: 84/02/29 14:04:53;  author: joe;  state: Exp;  lines added/del: 3/2
Fixed problem with previous fix whereby uuxqt would not process any
work files if the first one it encounters did not 'gotfiles()'.  I
moved the second iswrk call in gtxfile() to after the while loop which
looks for old X. files and conditionally return 0 if no work, otherwise
I go back and loop some more.

----------------------------
version 1.2        
date: 84/02/21 13:33:51;  author: joe;  state: Exp;  lines added/del: 23/0
Modified gtxfile() so that if an X. file exists but does not have all
of its associated D. files and it is older than 1 day, we try to unlink
it to keep the X. queue from overflowing and messing up things.  Change
suggested by Tom Truscott via rlgvax.
----------------------------
version 1.1        
date: 84/01/24 11:29:15;  author: lcp;  state: Exp;  
Initial version
=============================================================================

< uuxqt.c.r1.1
> uuxqt.c.r1.6
27a28,30
> /* Nfiles is set in anlwrk.c. fluke!joe */
> extern int Nfiles;
> 
306a310
>  * Mod to check for old X. files, Feb. 1984, fluke!joe.
313a318,320
> 	time_t ystrdy;		/* yesterday */
> 	extern time_t time();
> 	struct stat stbuf;	/* for X file age */
335a343,363
> 	/* check for old X. file with no work files and remove them. */
> 	/* suggested by Tom Truscott. fluke!joe */
> 	if (Nfiles > LLEN/2) {
> 	    time(&ystrdy);
> 	    ystrdy -= (24 * 3600);		/* yesterday */
> 	    DEBUG(4, "gtxfile: Nfiles > LLEN/2\n", "");
> 	    while (gtwrkf(Spool, file) && !gotfiles(file)) {
> 		if (stat(subfile(file), &stbuf) == 0)
> 		    if (stbuf.st_mtime <= ystrdy) {
> 			DEBUG(4, "gtxfile: unlink %s \n", file);
> #ifdef UNLINK_OLDX
> 			unlink(subfile(file));
> #else
> 			logent(file, "OLD X. FILE");
> #endif UNLINK_OLDX
> 		    }
> 	    }
> 	    DEBUG(4, "iswrk\n", "");
> 	    if (!iswrk(file, "get", Spool, pre))
> 		return 0;
> 	}

anlwrk.c log:
----------------------------
version 1.2        
date: 84/02/21 13:32:26;  author: joe;  state: Exp;  lines added/del: 8/6
Moved LLEN and MAXRQST to uucp.h.  Made Nfiles global for use
by uuxqt.c/gtxfile().
----------------------------
version 1.1        
date: 84/01/24 11:14:11;  author: lcp;  state: Exp;  
Initial version
=============================================================================

< anlwrk.c.r1.1
> anlwrk.c.r1.2
52,54c52,54
< 
< #define LLEN 20
< #define MAXRQST 250
---
> /*
>  * fluke!joe moved LLEN and MAXRQST to uucp.h.
>  */
58a59,60
>  * Not now they aren't!  uuxqt.c/gtxfile() uses Nfiles and LLEN to avoid
>  * problems with too many X files with no D files.  fluke!joe Feb. 1984
60,61c62,63
< static	int Nfiles = 0;
< static	char Filent[LLEN][NAMESIZE];
---
> int Nfiles = 0;
> char Filent[LLEN][NAMESIZE];
248c250
< /* LOCAL only */
---
> /* EXTERNALLY CALLED fluke!joe */

Obviously, you also have to #define LLEN and MAXRQST in uucp.h.  I leave
that as an exercise for the reader.
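
For reference, the two definitions deleted from anlwrk.c in the diff above carry the values 20 and 250, so the uucp.h side would presumably be just the following. The comment wording is illustrative and not from the original sources.

```c
/* In uucp.h: moved here from anlwrk.c so that uuxqt.c can see LLEN too. */
#define LLEN	20	/* first dimension of Filent[] in anlwrk.c */
#define MAXRQST	250
```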

/Joe Kelsey	John Fluke Mfg. Co., Inc.	PO Box C9090
(206)356 5933	Everett, WA  98206

lmcl@ukc.UUCP (L.M.McLoughlin) (11/10/84)

When faced with this problem I came up with a different solution.
If the first LLEN X. files did not yet have their D. files then
uuxqt gave up trying to do a sorted scan of the directory and tried
a linear scan instead.  That way it could skip the entries awaiting
the D. files.  Eventually uuclean would stomp on them.
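
A rough sketch of such a linear scan, assuming POSIX opendir()/readdir(). It is not the code that was actually used at ukc. It also makes one loud simplification: an X.<id> file counts as runnable when a file named D.<id> sits in the same directory, whereas real X. files name their data files on F lines. Both function names are hypothetical.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Simplified readiness test (an assumption): X.<id> needs D.<id>. */
static int
has_data_file(const char *dir, const char *xname)
{
	char path[1024];
	int n;

	n = snprintf(path, sizeof path, "%s/D.%s", dir, xname + 2);
	if (n < 0 || (size_t)n >= sizeof path)
		return 0;
	return access(path, F_OK) == 0;
}

/*
 * Walk dir in directory order and copy the name of the first runnable
 * X. file into out.  Returns 1 if one was found, 0 if none qualifies,
 * and -1 if the directory cannot be opened.
 */
int
first_ready_x(const char *dir, char *out, size_t outsz)
{
	DIR *dp;
	struct dirent *de;
	int found = 0;

	if ((dp = opendir(dir)) == NULL)
		return -1;
	while ((de = readdir(dp)) != NULL) {
		if (strncmp(de->d_name, "X.", 2) != 0)
			continue;	/* not an execute file */
		if (strlen(de->d_name) >= outsz)
			continue;	/* name would not fit in out */
		if (!has_data_file(dir, de->d_name))
			continue;	/* still awaiting its D. file */
		strcpy(out, de->d_name);
		found = 1;
		break;
	}
	closedir(dp);
	return found;
}
```

Because the scan is not limited to the first LLEN sorted names, blocked entries at the front of the directory cost only a skipped iteration each.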