phil@eecs.nwu.edu (William LeFebvre) (03/06/91)
This one problem is preventing us from being a fully functional C news
site. Any immediate help will be much appreciated.

We are running C news on an Encore Multimax running UMAX 4.3 (release 4.0).

Some times, but not all the time, when uuxqt runs rnews on an incoming batch,
the script "spacefor" as invoked by rnews does not write anything to
standard output. As a result, the check for free space fails and the
batch is (eventually) dropped. Once spacefor starts doing this, it continues
to happen until I modify rnews or otherwise interfere with the goings on.
I have added some amount of debugging to rnews (thus the modifications)
but I then usually have to wait an hour or so before the problem recurs.
I have not been saving standard error, so I don't know if there are any
error messages associated with the invocation.

I have not been able to get spacefor to fail by invoking it myself. I
have not been able to get things to fail by invoking uuxqt by hand.

Has anyone seen this? Can anyone lend some advice?

		William LeFebvre
		Computing Facilities Manager and Analyst
		Department of Electrical Engineering and Computer Science
		Northwestern University
		<phil@eecs.nwu.edu>
henry@zoo.toronto.edu (Henry Spencer) (03/07/91)
In article <1991Mar6.153527.10070@casbah.acns.nwu.edu> phil@eecs.nwu.edu (William LeFebvre) writes:
>Some times, but not all the time, when uuxqt runs rnews on an incoming batch,
>the script "spacefor" as invoked by rnews does not write anything to
>standard output. As a result, the check for free space fails and the
>batch is (eventually) dropped. Once spacefor starts doing this, it continues
>to happen until I modify rnews or otherwise interfere with the goings on.
> ...
>I have not been able to get spacefor to fail by invoking it myself. I
>have not been able to get things to fail by invoking uuxqt by hand.

I've seen things like this before, although not in the C News context.
Some resource is being exhausted, and spacefor is failing to run. A
strong possibility is that you're exceeding the limit on the number
of processes a given userid can have running. Exhaustion of swap space
is another idea. Also of note is that some very old uuxqts were sloppy
about closing file descriptors and eventually ran out.
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry
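Whatever the exhausted resource turns out to be, the symptom in this thread is that a spacefor which never ran looks the same as "no space". The following is a minimal sketch of a check that tells the two apart, assuming the space-checking command prints a single non-negative integer when it works. The names check_space, fake_ok, and fake_silent are hypothetical and are not part of C News.

```shell
#!/bin/sh
# Sketch only: check_space is a hypothetical helper, not part of C News.
# It runs a space-checking command and insists on a numeric answer, so that
# "the command never ran" is reported instead of being mistaken for "no space".
check_space() {
    if out=$("$@" 2>/dev/null); then
        status=0
    else
        status=$?
    fi
    case $out in
        '' | *[!0-9]*)
            echo "space check '$*' gave no usable output (exit $status)" >&2
            return 2
            ;;
    esac
    echo "$out"
}

# Stand-ins for a working and a silently failing space checker (assumptions).
fake_ok() { echo 42; }
fake_silent() { :; }

check_space fake_ok
check_space fake_silent || echo "would requeue the batch instead of dropping it"
```

The point of the distinct return code is that a caller can keep the batch for a later attempt when the checker itself misbehaves, rather than treating silence as a full disk.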
phil@eecs.nwu.edu (William LeFebvre) (03/07/91)
In article <1991Mar6.213302.17283@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
|> In article <1991Mar6.153527.10070@casbah.acns.nwu.edu> phil@eecs.nwu.edu (William LeFebvre) writes:
|> >Some times, but not all the time, when uuxqt runs rnews on an incoming batch,
|> >the script "spacefor" as invoked by rnews does not write anything to
|> >standard output....
|>
|> I've seen things like this before, although not in the C News context.
|> Some resource is being exhausted, and spacefor is failing to run.

This is a possibility on this machine, since it is typically supporting
more users and virtual memory than it should. However, it was doing
this at a time when the machine was not being stressed. I have since
added debugging to (attempt to) capture stderr on certain command lines.
This has turned up nothing. Is there any way to capture stderr for
the entire shell script?

|> A strong possibility is that you're exceeding the limit on the number
|> of processes a given userid can have running.

Not likely.

|> Exhaustion of swap space is another idea.

Much more likely. I didn't explicitly check for that, but there were
times when I saw this that the machine was not overloaded.

|> Also of note is that some very old uuxqts were sloppy
|> about closing file descriptors and eventually ran out.

A HA! This is a very plausible explanation. I will look at that, too.

Someone has suggested via email that I look at the C News extension
package. This includes (among other things) C replacements for rnews
and spacefor. I will try at least the spacefor replacement and see if
that fixes it. In the meantime I have merely disabled the check.

		William LeFebvre
		Computing Facilities Manager and Analyst
		Department of Electrical Engineering and Computer Science
		Northwestern University
		<phil@eecs.nwu.edu>
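On the question of capturing stderr for an entire shell script: in the Bourne shell, exec given only a redirection changes the descriptors of the running shell itself, so every command started afterwards inherits the change. A minimal sketch, assuming a writable log location; ERRLOG and its path are hypothetical and not anything rnews defines.

```shell
#!/bin/sh
# Sketch only: ERRLOG is a hypothetical log path, not anything rnews defines.
ERRLOG=${ERRLOG:-/tmp/rnews.stderr.$$}

(
    # In a real script this line would go near the top, outside any subshell;
    # the parentheses here only keep the demonstration from redirecting the
    # stderr of whatever shell runs it.
    exec 2>>"$ERRLOG"

    echo "debug: starting" >&2
    ls /no/such/directory || echo "debug: ls exited with status $?" >&2
    echo "debug: finished" >&2
)

echo "stderr was captured in $ERRLOG"
```

Appending with >> rather than > keeps output from earlier runs, which matters when the failure only shows up once an hour or so.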
marcelo@deadzone.uucp (Marcelo Gallardo) (03/08/91)
In article <1991Mar6.213302.17283@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>
>is another idea. Also of note is that some very old uuxqts were sloppy
>about closing file descriptors and eventually ran out.

I would opt for the file descriptors running out. I know this
situation all too well. A/UX 2.0 has this problem, but thanks to
the work of people on the Net (Alexis Rosen comes to mind), I have
managed to stay one step ahead of this problem (most of the time).

Now for my problem. I'm running c-news and rn on A/UX. Things seem
to be going smoothly (now), except for a small problem which
bothers me. Once news has been passed on to rnews and
uuxqt has done its job, a mail message is being sent back to my
feed (for each batch) containing the following...

	uuxqt cmd (rnews ) status (exit 0, signal 0)

Anyone have any ideas why, and possibly a way to stop it?

-- 
Marcelo Gallardo                          ...!princeton!deadzone!marcelo
Test and Evaluation Specialist            marcelo@sparcwood.princeton.edu
Princeton University                      marcelo@phoenix.princeton.edu
Advanced Technologies and Applications    (609) 258-5661
molenda@s1.msi.umn.edu (Jason Molenda) (03/08/91)
marcelo@deadzone.uucp (Marcelo Gallardo) writes:
> uuxqt has done its job, a mail message is being sent back to my
> feed (for each batch) containing the following...
> uuxqt cmd (rnews ) status (exit 0, signal 0)
> Anyone have any ideas why, and possibly a way to stop it?

See newsbatch(8). Try specifying viauuxz instead of the normal 'viauux'
in batchparams.

(tried to mail a reply but mail to user@deadzone.uucp didn't get past
our local site that 'Does' UUCP paths).
-- 
More information that you couldn't have existed another day without, from:
Jason Molenda, Tech Support, Iris & News Admin, Minnesota Supercomputer Inst
molenda@s1.msi.umn.edu || molenda%msi.umn.edu@umnacvx.bitnet
"And remember: Evil will always prevail, because Good is dumb." -- Spaceballs
alexis@panix.uucp (Alexis Rosen) (03/11/91)
phil@eecs.nwu.edu (William LeFebvre) writes:
>We are running C news on an Encore Multimax running UMAX 4.3 (release 4.0).
>
>Some times, but not all the time, when uuxqt runs rnews on an incoming batch,
>the script "spacefor" as invoked by rnews does not write anything to
>standard output. As a result, the check for free space fails and the
>batch is (eventually) dropped. Once spacefor starts doing this, it continues
>to happen until I modify rnews or otherwise interfere with the goings on.
>
>I have not been able to get spacefor to fail by invoking it myself. I
>have not been able to get things to fail by invoking uuxqt by hand.
>
>Has anyone seen this? Can anyone lend some advice?

Henry Spencer suggests, in a followup message, that some old uuxqt's were
sloppy about closing files. This is the case with A/UX, and (after months
of headscratching) when I finally came up with a workaround, Richard Todd
mentioned in a followup posting that MultiMaxes had this problem too.

If your uucp load is low, you can get away with running uuxqt every few
minutes while a uucico is running. This will prevent the buildup of incoming
jobs that leads to uuxqt failing.

The better solution is to use the script I'm including at the end of this
message. A few notes:

1) This script or something like it probably belongs in Cnews' contrib
directory. There are a lot of ancient doddering UUCPs out there!

2) Kudos to Henry and Geoff. We had this problem for months before we even
realized it. Mail and news were vanishing without a trace, and we never knew.
Then we installed Cnews, and started getting error messages. Their paranoid
programming helped us tremendously.

3) You're also losing mail, just like we did.

4) This problem is fixed in A/UX 2.0.1. (so you can throw out the multimax
and get a Mac IIfx :-))

To use the uuxqt.wrap script, rename your uuxqt to uuxqt.real.
Then install this uuxqt.wrap as uuxqt in /usr/lib/uucp (or the corresponding
directory on your system) with the same protections as uuxqt.real.

You may need to revise the script a bit if you have an old or broken Bourne
shell. In particular you may need to replace the square brackets in the ifs
with "test"s. You should check to make sure that all file and directory names
are appropriate for your system.

The busier your system, the fewer jobs each invocation of uuxqt should see.
I run a fairly busy site with as many as three uucicos going on at once, so
I allow seven X files at a time. If you never have more than one connection
at a time ten is safe. Basically, the problem is that ten is always safe, but
if you have several simultaneous connections, you can accumulate a bunch of
new X files while a uuxqt is running, bringing it over the ten job limit.
Once you start hitting the Cnews errors, you're hosed. The uuxqt won't go
away until it chews up fifteen more jobs or so, at which point it dies
entirely.

Anyway, use and enjoy.

---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NY
{cmcl2,apple}!panix!alexis
For brain dead mailers which can't send to uucp sites: rosen@nyu.edu

---------------------------->% cut here %<--------------------------------
#!/bin/sh
# uuxqt.wrap - V1.0 written by Alexis Rosen 9-30-90
# This Bourne shell script is a wrapper for uuxqt which will prevent it from
# crapping out in the middle of a long run, almost certainly losing a file
# in the process. Rename the original uuxqt to uuxqt.real and change this
# file's name to uuxqt. This should be owned by user and group uucp, mode 770.
# Bugs or comments to panix!alexis or alexis@panix.uucp or rosen@nyu.edu
# modified 12/11/90 by Alexis to do only seven files per XQT

cd /usr/spool/uucp
if [ ! -f X.* ] ; then exit 0 ; fi		# nothing to do

HIDEDIR=/usr/spool/uucp/hidden-x-files		# stick excess X files here
if [ ! -d $HIDEDIR ] ; then mkdir $HIDEDIR ; chmod 770 $HIDEDIR ; fi

# check for a LCK.WXQT file. If it exists, see if it's stale or not.
# There is a very small window of time in which this locking system could
# fail. So wait ten seconds (probably way conservative) and inspect the lock.
if [ -f LCK.WXQT ] ; then
	kill -0 `cat LCK.WXQT` 2>/dev/null
	if [ $? != 0 ] ; then			# stale lock
		rm -f LCK.WXQT
	else
		exit 0
	fi
fi
trap 'rm -f LCK.WXQT /tmp/xw$$' 0 1 2 15
echo "$$" >LCK.WXQT				# make the lock
# Check the lock to make sure we kept it. If not let the other guy do the work.
sleep 10
if [ $$ != `cat LCK.WXQT` ] ; then trap 0 1 2 15 ; exit 0 ; fi

# Now move all the X. files into the hidden directory and then move seven back
# out. When there aren't many X. files this won't matter but when there are
# hundreds, it's much more efficient than moving all but seven. Put the mv
# inside the loop to pick up any new X. files that might have just arrived.
while : ; do
	mv -f X.* $HIDEDIR 2>/dev/null		# all X. files into the hole
	ls $HIDEDIR >/tmp/xw$$			# make a list
	for i in `head -7 /tmp/xw$$` ; do	# pull out first bunch
		mv $HIDEDIR/$i $i
	done
	if [ ! -f X.* ] ; then exit 0 ; fi	# normal exit here
	/usr/lib/uucp/uuxqt.real $*		# fire up the real uuxqt
	XEXIT=$?
	if [ $XEXIT != 0 ] ; then exit $XEXIT ; fi
done
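Two steps in the script above can be sketched in present-day POSIX sh: testing whether any X. file exists, and releasing a bounded batch from the holding directory. The test `[ ! -f X.* ]` is handed several operands as soon as more than one file matches, which some shells complain about on stderr. What follows is an illustration under that assumption, not a replacement for the author's script. The names have_x_files and release_batch are hypothetical, the demonstration assumes mktemp -d is available, and the $((...)) arithmetic was not available in the Bourne shells of the time.

```shell
#!/bin/sh
# have_x_files DIR: succeed if DIR holds at least one regular file named X.*
have_x_files() {
    for f in "$1"/X.*; do
        if [ -f "$f" ]; then
            return 0
        fi
    done
    return 1
}

# release_batch HIDEDIR SPOOLDIR MAX: move at most MAX X.* files from HIDEDIR
# to SPOOLDIR and print how many were moved.
release_batch() {
    moved=0
    for f in "$1"/X.*; do
        [ -f "$f" ] || continue
        if [ "$moved" -ge "$3" ]; then
            break
        fi
        mv "$f" "$2"/ && moved=$((moved + 1))
    done
    echo "$moved"
}

# Demonstration in a scratch directory (mktemp -d is assumed to be available).
scratch=$(mktemp -d)
mkdir "$scratch/hidden" "$scratch/spool"
for n in 01 02 03 04 05 06 07 08 09 10; do
    : > "$scratch/hidden/X.job$n"
done
release_batch "$scratch/hidden" "$scratch/spool" 7
have_x_files "$scratch/spool" && echo "spool has work for uuxqt"
rm -rf "$scratch"
```

Looping over the glob and testing each name individually avoids both the multiple-operand problem and the case where nothing matches and the pattern is left as a literal string.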
rees@pisa.citi.umich.edu (Jim Rees) (03/11/91)
In article <1991Mar10.204709.10460@panix.uucp>, alexis@panix.uucp (Alexis Rosen) writes:

  Henry Spencer suggests, in a followup message, that some old uuxqt's were
  sloppy about closing files. This is the case with A/UX, and (after months
  of headscratching) when I finally came up with a workaround, Richard Todd
  mentioned in a followup posting that MultiMaxes had this problem too.

This is an extreme example of vendors not fixing known bugs. I dug into my
extensive uucp archives (Prof. Honeyman's office is next to mine) and found
that this was fixed nearly ten years ago, at least in the Berkeley lineage.
Here's the earliest version of this fix that I could find (who was aef?)

diff -c -r old/uuxqt.c /usr/src/cmd/uucp/uuxqt.c
*** old/uuxqt.c	Wed Mar  9 07:54:26 1983
--- /usr/src/cmd/uucp/uuxqt.c	Wed Feb 23 11:00:02 1983
***************
*** 202,207
  			sscanf(&buf[1], "%s", file);
  			unlink(file);
  		}
  		unlink(xfile);
  	}
--- 217,224 -----
  			sscanf(&buf[1], "%s", file);
  			unlink(file);
  		}
+ 		/* fix the hanging open fd!  (dpk.bmd70@BRL 10-26-81) --aef */
+ 		fclose(xfp);
  		unlink(xfile);
  	}
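The leak that this patch closes can be watched from outside the process on a present-day Linux system, where each process lists its open descriptors under /proc. That is an assumption well outside anything the systems in this thread offered, and count_fds is a hypothetical helper name. A minimal sketch:

```shell
#!/bin/sh
# count_fds PID: print the number of file descriptors process PID has open.
# Assumes a Linux-style /proc; returns 1 if /proc/PID/fd cannot be read.
count_fds() {
    fd_dir=/proc/$1/fd
    if [ ! -d "$fd_dir" ] || [ ! -r "$fd_dir" ]; then
        echo "count_fds: cannot read $fd_dir" >&2
        return 1
    fi
    ls "$fd_dir" | wc -l | tr -d ' '
}

# Example: report on this shell itself. For the bug in this thread one would
# pass the PID of a long-running uuxqt and look for a count that keeps growing.
if n=$(count_fds "$$"); then
    echo "shell $$ has $n descriptors open"
else
    echo "no readable /proc entry here; count_fds is not usable on this system"
fi
```

A count that rises by one for every X. file processed and never falls would point at a missing fclose of the kind shown in the diff.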
henry@zoo.toronto.edu (Henry Spencer) (03/12/91)
In article <1991Mar10.204709.10460@panix.uucp> alexis@panix.uucp (Alexis Rosen) writes:
>Henry Spencer suggests, in a followup message, that some old uuxqt's were
>sloppy about closing files. This is the case with A/UX, and (after months
>of headscratching) when I finally came up with a workaround, Richard Todd
>mentioned in a followup posting that MultiMaxes had this problem too.

My stars. I'd thought that bug had been eradicated long ago; I mentioned
it just on the off chance.

>1) This script or something like it probably belongs in Cnews' contrib
>directory. There are a lot of ancient doddering UUCPs out there!

I'll take a look at it and consider this. At the very least we need to
document the problem.
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry
phil@eecs.nwu.edu (William LeFebvre) (03/12/91)
In article <1991Mar11.180913.29468@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
|> In article <1991Mar10.204709.10460@panix.uucp> alexis@panix.uucp (Alexis Rosen) writes:
|> >Henry Spencer suggests, in a followup message, that some old uuxqt's were
|> >sloppy about closing files. This is the case with A/UX, and (after months
|> >of headscratching) when I finally came up with a workaround, Richard Todd
|> >mentioned in a followup posting that MultiMaxes had this problem too.
|>
|> My stars. I'd thought that bug had been eradicated long ago; I mentioned
|> it just on the off chance.

Well, just to make matters more confusing, my predecessor installed
and used a different uucp than the one Encore distributes. I am not
positive, but I think that it came from the 4.3 BSD source tapes. I
never had the courage to switch back to Encore's version, because I
didn't (and still don't) know what subtle changes he made to make it
all work, and I didn't want to risk breaking everything. I'll look
at what I believe are the sources for what we are running and see if
I can find the problem there. Thanks for the tips.

		William LeFebvre
		Computing Facilities Manager and Analyst
		Department of Electrical Engineering and Computer Science
		Northwestern University
		<phil@eecs.nwu.edu>
rmtodd@chinet.chi.il.us (Richard Todd) (03/12/91)
In article <1991Mar11.190920.3655@casbah.acns.nwu.edu> phil@eecs.nwu.edu (William LeFebvre) writes:
>In article <1991Mar11.180913.29468@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
>|> In article <1991Mar10.204709.10460@panix.uucp> alexis@panix.uucp (Alexis Rosen) writes:
>|> >Henry Spencer suggests, in a followup message, that some old uuxqt's were
>|> >sloppy about closing files. This is the case with A/UX, and (after months
>|> >of headscratching) when I finally came up with a workaround, Richard Todd
>|> >mentioned in a followup posting that MultiMaxes had this problem too.

I did? I don't recall saying any such thing. I did mention that I had
heard of other systems that had the infamous uuxqt bug, but I don't really
know if Encore is one of them, for reasons explained below:

>|> My stars. I'd thought that bug had been eradicated long ago; I mentioned
>|> it just on the off chance.

Alas, at least on one moderately recent system (A/UX 2.0, SVR2), it's still
alive and thriving. Thanks to Alexis and me yelling about it, I think it
won't be in 2.0.1, though...

>Well, just to make matters more confusing, my predecessor installed
>and used a different uucp than the one Encore distributes. I am not
>positive, but I think that it came from the 4.3 BSD source tapes. I
>never had the courage to switch back to Encore's version, because I
>didn't (and still don't) know what subtle changes he made to make it
>all work, and I didn't want to risk breaking everything. I'll look
>at what I believe are the sources for what we are running and see if
>I can find the problem there. Thanks for the tips.

The plot thickens. You see, the Multimax I know the most about,
uokmax.ecn.uoknor.edu, doesn't run Encore UUCP either. It runs a UUCP off
of the 4.3 BSD tapes. And, interestingly enough, I don't think it has the
"uuxqt bug"; I'm pretty sure I've seen its uuxqt go thru 20-30 jobs at a
time without problem, and I don't recall that Encore has an unusually large
# of file descriptors per process. Interesting, no?

Now, I'm not certain that it was indeed a 4.3BSD tape that it came off of;
it might be a Mt Xinu 4.3/NFS tape instead, and it's entirely possible that
someone years ago hacked on this code at OU ECN and stomped on this bug.
And, alas, I can't get anyone to check on it right now because uokmax is
down for disk repartitioning over Spring Break....

BTW, if I remember correctly, you *don't* really want to go back to the
Encore UUCP. It had all sorts of reliability problems last time they tried
it at ECN, notably uucico hanging in midtransfer for no obvious reason.
Also, they found it startling that a UUCP shipped on a system purporting to
be BSD 4.3-tahoe didn't seem to know about the D., X., C., etc
subdirectories of /usr/spool/uucp....
-- 
Richard Todd	rmtodd@chinet.chi.il.us		rmtodd@servalan.UUCP