[news.software.b] C news problem: spacefor script

phil@eecs.nwu.edu (William LeFebvre) (03/06/91)

This one problem is preventing us from being a fully functional C news site.
Any immediate help will be much appreciated.

We are running C news on an Encore Multimax running UMAX 4.3 (release 4.0).

Sometimes, but not all the time, when uuxqt runs rnews on an incoming batch,
the script "spacefor" as invoked by rnews does not write anything to
standard output.  As a result, the check for free space fails and the
batch is (eventually) dropped.  Once spacefor starts doing this, it continues
to happen until I modify rnews or otherwise interfere with the goings on.

I have added some amount of debugging to rnews (thus the modifications)
but I then usually have to wait an hour or so before the problem recurs.
I have not been saving standard error, so I don't know if there are any
error messages associated with the invocation.

I have not been able to get spacefor to fail by invoking it myself.  I
have not been able to get things to fail by invoking uuxqt by hand.

Has anyone seen this?  Can anyone lend some advice?

		William LeFebvre
		Computing Facilities Manager and Analyst
		Department of Electrical Engineering and Computer Science
		Northwestern University
		<phil@eecs.nwu.edu>

henry@zoo.toronto.edu (Henry Spencer) (03/07/91)

In article <1991Mar6.153527.10070@casbah.acns.nwu.edu> phil@eecs.nwu.edu (William LeFebvre) writes:
>Sometimes, but not all the time, when uuxqt runs rnews on an incoming batch,
>the script "spacefor" as invoked by rnews does not write anything to
>standard output.  As a result, the check for free space fails and the
>batch is (eventually) dropped.  Once spacefor starts doing this, it continues
>to happen until I modify rnews or otherwise interfere with the goings on.
> ...
>I have not been able to get spacefor to fail by invoking it myself.  I
>have not been able to get things to fail by invoking uuxqt by hand.

I've seen things like this before, although not in the C News context.
Some resource is being exhausted, and spacefor is failing to run.
A strong possibility is that you're exceeding the limit on the number
of processes a given userid can have running.  Exhaustion of swap space
is another idea.  Also of note is that some very old uuxqts were sloppy
about closing file descriptors and eventually ran out.
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry

phil@eecs.nwu.edu (William LeFebvre) (03/07/91)

In article <1991Mar6.213302.17283@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
|> In article <1991Mar6.153527.10070@casbah.acns.nwu.edu> phil@eecs.nwu.edu (William LeFebvre) writes:
|> >Sometimes, but not all the time, when uuxqt runs rnews on an incoming batch,
|> >the script "spacefor" as invoked by rnews does not write anything to
|> >standard output....
|> 
|> I've seen things like this before, although not in the C News context.
|> Some resource is being exhausted, and spacefor is failing to run.

This is a possibility on this machine, since it is typically supporting
more users and virtual memory than it should.  However, it was doing
this at a time when the machine was not being stressed.  I have since
added debugging to (attempt to) capture stderr on certain command lines.
This has turned up nothing.  Is there any way to capture stderr for the
entire shell script?
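One standard Bourne shell answer: an `exec` that names no command applies its redirection to the shell itself, so every later command in the script inherits the new standard error. A minimal sketch of a debugging preamble for the top of a script such as spacefor, assuming a log file at `/tmp/spacefor.err` (a hypothetical path; any file the news user can write will do):

```shell
#!/bin/sh
# Debugging preamble (sketch) for a script such as spacefor.
# ERRLOG is a hypothetical path; it can be overridden from the environment.
ERRLOG=${ERRLOG-/tmp/spacefor.err}

# exec with only a redirection changes the shell's own file descriptors:
# from here on, stderr of this script and of every command it runs
# is appended to $ERRLOG.
exec 2>>"$ERRLOG"

# Stamp each invocation so entries can be matched against uuxqt runs.
echo "`date`: $0 $* (pid $$)" >&2

# Optional: trace each command as it runs (the trace also goes to stderr).
# set -x
```

The rest of the script follows unchanged. If the shell cannot even start, nothing will be logged, which is itself a clue that the failure happens before the script runs.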

|> A strong possibility is that you're exceeding the limit on the number
|> of processes a given userid can have running.

Not likely.

|> Exhaustion of swap space is another idea.

Much more likely.  I didn't explicitly check for that, but there were
times when I saw this while the machine was not overloaded.

|> Also of note is that some very old uuxqts were sloppy
|> about closing file descriptors and eventually ran out.

A HA!  This is a very plausible explanation.  I will look at that, too.

Someone has suggested via email that I look at the C News extension
package.  This includes (among other things) C replacements for rnews
and spacefor.  I will try at least the spacefor replacement and see if
that fixes it.  In the meantime I have merely disabled the check.

		William LeFebvre
		Computing Facilities Manager and Analyst
		Department of Electrical Engineering and Computer Science
		Northwestern University
		<phil@eecs.nwu.edu>

marcelo@deadzone.uucp (Marcelo Gallardo) (03/08/91)

In article <1991Mar6.213302.17283@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>
>is another idea.  Also of note is that some very old uuxqts were sloppy
>about closing file descriptors and eventually ran out.

	I would opt for the file descriptors running out. 

	I know this situation all too well. A/UX 2.0 has this problem,
	but thanks to the works of people on the Net (Alexis Rosen comes
	to mind), I have managed to stay one step ahead of this problem
	(most of the time).

	Now for my problem. I'm running c-news and rn on A/UX. Things
	seem to be going smoothly (now), except for a small problem
	which bothers me. Once news has been passed on to rnews and
	uuxqt has done its job, a mail message is being sent back to my
	feed (for each batch) containing the following...

	uuxqt cmd (rnews ) status (exit 0, signal 0)

	Anyone have any ideas why, and possibly a way to stop it?


-- 
Marcelo Gallardo				...!princeton!deadzone!marcelo
Test and Evaluation Specialist			marcelo@sparcwood.princeton.edu
Princeton University				marcelo@phoenix.princeton.edu
Advanced Technologies and Applications		(609) 258-5661

molenda@s1.msi.umn.edu (Jason Molenda) (03/08/91)

marcelo@deadzone.uucp (Marcelo Gallardo) writes:

>	uuxqt has done its job, a mail message is being sent back to my
>	feed (for each batch) containing the following...

>	uuxqt cmd (rnews ) status (exit 0, signal 0)

>	Anyone have any ideas why, and possibly a way to stop it?

See newsbatch(8).  Try specifying viauuxz instead of the normal
'viauux' in batchparams.

(tried to mail a reply but mail to user@deadzone.uucp didn't get
past our local site that 'Does' UUCP paths).
-- 
More information that you couldn't have existed another day without, from:
Jason Molenda, Tech Support, Iris & News Admin, Minnesota Supercomputer Inst
molenda@s1.msi.umn.edu || molenda%msi.umn.edu@umnacvx.bitnet
"And remember: Evil will always prevail, because Good is dumb." -- Spaceballs

alexis@panix.uucp (Alexis Rosen) (03/11/91)

phil@eecs.nwu.edu (William LeFebvre) writes:
>We are running C news on an Encore Multimax running UMAX 4.3 (release 4.0).
>
>Sometimes, but not all the time, when uuxqt runs rnews on an incoming batch,
>the script "spacefor" as invoked by rnews does not write anything to
>standard output.  As a result, the check for free space fails and the
>batch is (eventually) dropped.  Once spacefor starts doing this, it continues
>to happen until I modify rnews or otherwise interfere with the goings on.
>
>I have not been able to get spacefor to fail by invoking it myself.  I
>have not been able to get things to fail by invoking uuxqt by hand.
>
>Has anyone seen this?  Can anyone lend some advice?

Henry Spencer suggests, in a followup message, that some old uuxqt's were
sloppy about closing files. This is the case with A/UX, and (after months
of headscratching) when I finally came up with a workaround, Richard Todd
mentioned in a followup posting that MultiMaxes had this problem too.

If your uucp load is low, you can get away with running uuxqt every few minutes
while a uucico is running. This will prevent the buildup of incoming jobs
that leads to uuxqt failing. The better solution is to use the script I'm
including at the end of this message.
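The low-load workaround (running uuxqt every few minutes while a uucico is up) could look roughly like this. It is only a sketch: the function names are hypothetical, and it assumes BSD-style `ps ax` (System V would use `ps -e`), uuxqt in /usr/lib/uucp, and a three-minute interval; adjust all three for your system.

```shell
#!/bin/sh
# Sketch: keep each uuxqt run short by starting one every few minutes
# while any uucico is alive, so incoming X. files never pile up.
# Assumed defaults; override from the environment as needed.
UUXQT=${UUXQT-/usr/lib/uucp/uuxqt}
INTERVAL=${INTERVAL-180}
PS_CMD=${PS_CMD-"ps ax"}

# Succeeds if a uucico process appears in the process table.
# The [u] keeps grep from matching its own command line.
uucico_running() {
	$PS_CMD 2>/dev/null | grep '[u]ucico' >/dev/null
}

# Run uuxqt repeatedly for as long as uucico is running, then once
# more to catch jobs that arrived at the very end of the conversation.
drain_while_uucico() {
	while uucico_running ; do
		$UUXQT
		sleep $INTERVAL
	done
	$UUXQT
}
```

It would be started in the background just after uucico (for example `drain_while_uucico &`); if it starts before uucico does, the loop ends at once and only the final pass runs.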

A few notes:
1) This script or something like it probably belongs in Cnews' contrib
directory. There are a lot of ancient doddering UUCPs out there!
2) Kudos to Henry and Geoff. We had this problem for months before we even
realized it. Mail and news were vanishing without a trace, and we never knew.
Then we installed Cnews, and started getting error messages. Their paranoid
programming helped us tremendously.
3) You're also losing mail, just like we did.
4) This problem is fixed in A/UX 2.0.1. (so you can throw out the multimax
and get a Mac IIfx :-))

To use the uuxqt.wrap script, rename your uuxqt to uuxqt.real. Then install
uuxqt.wrap under the name uuxqt in /usr/lib/uucp (or the corresponding
directory on your system) with the same protections as uuxqt.real. You may
need to revise the script a bit if you have an old or broken Bourne shell. In
particular you may need to replace the square brackets in the if statements
with "test" commands. You should check to make sure that all file and
directory names are appropriate for your system.
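The installation amounts to a rename and a copy. A minimal sketch, written as a function so it can be tried on a scratch directory first; the function name and its arguments are hypothetical, and mode 770 is taken from the wrapper's own header comment (the paragraph above instead says to match uuxqt.real's protections, so check which is right for your uucp):

```shell
#!/bin/sh
# install_uuxqt_wrap DIR WRAPPER   (hypothetical helper)
# Moves DIR/uuxqt aside to DIR/uuxqt.real and installs WRAPPER as DIR/uuxqt.
install_uuxqt_wrap() {
	dir=$1
	wrapper=$2
	# Refuse to run twice: a second run would overwrite the real uuxqt
	# with the wrapper and lose it.
	if [ -f "$dir/uuxqt.real" ] ; then
		echo "install_uuxqt_wrap: $dir/uuxqt.real already exists" >&2
		return 1
	fi
	mv "$dir/uuxqt" "$dir/uuxqt.real" || return 1
	cp "$wrapper" "$dir/uuxqt" || return 1
	# Ownership normally needs root:
	#   chown uucp "$dir/uuxqt" ; chgrp uucp "$dir/uuxqt"
	chmod 770 "$dir/uuxqt"
}
```

For example, `install_uuxqt_wrap /usr/lib/uucp ./uuxqt.wrap` run as root.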

The busier your system, the fewer jobs each invocation of uuxqt should see.
I run a fairly busy site with as many as three uucicos going on at once, so
I allow seven X files at a time. If you never have more than one connection
at a time, ten is safe. The issue is that a single uuxqt can always handle
about ten jobs, but if you have several simultaneous connections, new X files
can accumulate while a uuxqt is running and push it over the ten-job limit.
Once you start hitting the Cnews errors, you're hosed. The uuxqt won't go away
until it chews up fifteen more jobs or so, at which point it dies entirely.

Anyway, use and enjoy.
---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NY
{cmcl2,apple}!panix!alexis
For brain dead mailers which can't send to uucp sites: rosen@nyu.edu

---------------------------->% cut here %<--------------------------------
#!/bin/sh
# uuxqt.wrap - V1.0 written by Alexis Rosen 9-30-90
# This bourne shell script is a wrapper for uuxqt which will prevent it from
# crapping out in the middle of a long run, almost certainly losing a file
# in the process. Rename the original uuxqt to uuxqt.real and change this
# file's name to uuxqt. This should be owned by uucp (user and group), mode 770.
# Bugs or comments to panix!alexis or alexis@panix.uucp or rosen@nyu.edu

# modified 12/11/90 by Alexis to do only seven files per XQT

cd /usr/spool/uucp
ls X.* >/dev/null 2>&1 || exit 0		# nothing to do ([ -f X.* ] breaks on >1 match)

HIDEDIR=/usr/spool/uucp/hidden-x-files		# stick excess X files here

if [ ! -d $HIDEDIR ] ; then mkdir $HIDEDIR ; chmod 770 $HIDEDIR ; fi

# check for a LCK.WXQT file. If it exists, see if it's stale or not.
# There is a very small window of time in which this locking system could
# fail. So wait ten seconds (probably way conservative) and inspect the lock.
if [ -f LCK.WXQT ] ; then
	kill -0 `cat LCK.WXQT` 2>/dev/null
	if [ $? != 0 ] ; then		# stale lock
		rm -f LCK.WXQT
	else
		exit 0
	fi
fi
trap 'rm -f LCK.WXQT /tmp/xw$$' 0 1 2 15
echo "$$" >LCK.WXQT			# make the lock

# Check the lock to make sure we kept it. If not let the other guy do the work.
sleep 10
if [ "$$" != "`cat LCK.WXQT 2>/dev/null`" ] ; then trap 0 1 2 15 ; exit 0 ; fi

# Now move all the X. files into the hidden directory and then move seven
# back out. When there aren't many X. files this won't matter, but when there
# are hundreds, it's much more efficient than moving all but seven. Put the mv
# inside the loop to pick up any new X. files that might have just arrived.
while : ; do
	mv -f X.* $HIDEDIR 2>/dev/null	# all X. files into the hole
	ls $HIDEDIR >/tmp/xw$$		# make a list
	for i in `head -7 /tmp/xw$$` ; do	# pull out first bunch
		mv $HIDEDIR/$i $i
	done
	ls X.* >/dev/null 2>&1 || exit 0	# normal exit here

	/usr/lib/uucp/uuxqt.real $*	# fire up the real uuxqt
	XEXIT=$?
	if [ $XEXIT != 0 ] ; then exit $XEXIT ; fi
done

rees@pisa.citi.umich.edu (Jim Rees) (03/11/91)

In article <1991Mar10.204709.10460@panix.uucp>, alexis@panix.uucp (Alexis Rosen) writes:

  Henry Spencer suggests, in a followup message, that some old uuxqt's were
  sloppy about closing files. This is the case with A/UX, and (after months
  of headscratching) when I finally came up with a workaround, Richard Todd
  mentioned in a followup posting that MultiMaxes had this problem too.

This is an extreme example of vendors not fixing known bugs.  I dug into my
extensive uucp archives (Prof. Honeyman's office is next to mine) and found
that this was fixed nearly ten years ago, at least in the Berkeley lineage.
Here's the earliest version of this fix that I could find (who was aef?)

diff -c -r old/uuxqt.c /usr/src/cmd/uucp/uuxqt.c
*** old/uuxqt.c	Wed Mar  9 07:54:26 1983
--- /usr/src/cmd/uucp/uuxqt.c	Wed Feb 23 11:00:02 1983
***************
*** 202,207
  			sscanf(&buf[1], "%s", file);
  			unlink(file);
  		}
  		unlink(xfile);
  	}
  

--- 217,224 -----
  			sscanf(&buf[1], "%s", file);
  			unlink(file);
  		}
+ 		/* fix the hanging open fd! (dpk.bmd70@BRL 10-26-81) --aef */
+ 		fclose(xfp);
  		unlink(xfile);
  	}
  

henry@zoo.toronto.edu (Henry Spencer) (03/12/91)

In article <1991Mar10.204709.10460@panix.uucp> alexis@panix.uucp (Alexis Rosen) writes:
>Henry Spencer suggests, in a followup message, that some old uuxqt's were
>sloppy about closing files. This is the case with A/UX, and (after months
>of headscratching) when I finally came up with a workaround, Richard Todd
>mentioned in a followup posting that MultiMaxes had this problem too.

My stars.  I'd thought that bug had been eradicated long ago; I mentioned
it just on the off chance.

>1) This script or something like it probably belongs in Cnews' contrib
>directory. There are a lot of ancient doddering UUCPs out there!

I'll take a look at it and consider this.  At the very least we need to
document the problem.
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry

phil@eecs.nwu.edu (William LeFebvre) (03/12/91)

In article <1991Mar11.180913.29468@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
|> In article <1991Mar10.204709.10460@panix.uucp> alexis@panix.uucp (Alexis Rosen) writes:
|> >Henry Spencer suggests, in a followup message, that some old uuxqt's were
|> >sloppy about closing files. This is the case with A/UX, and (after months
|> >of headscratching) when I finally came up with a workaround, Richard Todd
|> >mentioned in a followup posting that MultiMaxes had this problem too.
|> 
|> My stars.  I'd thought that bug had been eradicated long ago; I mentioned
|> it just on the off chance.

Well, just to make matters more confusing, my predecessor installed
and used a different uucp than the one Encore distributes.  I am not
positive, but I think that it came from the 4.3 BSD source tapes.  I
never had the courage to switch back to Encore's version, because I
didn't (and still don't) know what subtle changes he made to make it
all work, and I didn't want to risk breaking everything.  I'll look
at what I believe are the sources for what we are running and see if
I can find the problem there.  Thanks for the tips.

		William LeFebvre
		Computing Facilities Manager and Analyst
		Department of Electrical Engineering and Computer Science
		Northwestern University
		<phil@eecs.nwu.edu>

rmtodd@chinet.chi.il.us (Richard Todd) (03/12/91)

In article <1991Mar11.190920.3655@casbah.acns.nwu.edu> phil@eecs.nwu.edu (William LeFebvre) writes:
>In article <1991Mar11.180913.29468@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
>|> In article <1991Mar10.204709.10460@panix.uucp> alexis@panix.uucp (Alexis Rosen) writes:
>|> >Henry Spencer suggests, in a followup message, that some old uuxqt's were
>|> >sloppy about closing files. This is the case with A/UX, and (after months
>|> >of headscratching) when I finally came up with a workaround, Richard Todd
>|> >mentioned in a followup posting that MultiMaxes had this problem too.

I did?  I don't recall saying any such thing.  I did mention that I had heard
of other systems that had the infamous uuxqt bug, but I don't really know if 
Encore is one of them, for reasons explained below:

>|> My stars.  I'd thought that bug had been eradicated long ago; I mentioned
>|> it just on the off chance.
Alas, at least on one moderately recent system (A/UX 2.0, SVR2), it's still
alive and thriving.  Thanks to Alexis and me yelling about it, I think it
won't be in 2.0.1, though...

>Well, just to make matters more confusing, my predecessor installed
>and used a different uucp than the one Encore distributes.  I am not
>positive, but I think that it came from the 4.3 BSD source tapes.  I
>never had the courage to switch back to Encore's version, because I
>didn't (and still don't) know what subtle changes he made to make it
>all work, and I didn't want to risk breaking everything.  I'll look
>at what I believe are the sources for what we are running and see if
>I can find the problem there.  Thanks for the tips.

The plot thickens.  You see, the Multimax I know the most about,
uokmax.ecn.uoknor.edu, doesn't run Encore UUCP either.  It runs a UUCP off of
the 4.3 BSD tapes.  And, interestingly enough, I don't think it has the "uuxqt
bug"; I'm pretty sure I've seen its uuxqt go through 20-30 jobs at a time
without problem, and I don't recall that Encore has an unusually large number
of file descriptors per process.  Interesting, no?

Now, I'm not certain that it was indeed a 4.3BSD tape that it came off of;
it might be a Mt Xinu 4.3/NFS tape instead, and it's entirely possible that
someone years ago hacked on this code at OU ECN and stomped on this bug.  And,
alas, I can't get anyone to check on it right now because uokmax is down for
disk repartitioning over Spring Break....

BTW, if I remember correctly, you *don't* really want to go back to the 
Encore UUCP.  It had all sorts of reliability problems last time they tried it
at ECN, notably uucico hanging in midtransfer for no obvious reason.  Also,
they found it startling that a UUCP shipped on a system purporting to be 
BSD 4.3-tahoe didn't seem to know about the D., X., C., etc. subdirectories of
/usr/spool/uucp....
--
Richard Todd   rmtodd@chinet.chi.il.us   rmtodd@servalan.UUCP