reid@glacier.ARPA (Brian Reid) (03/07/86)
I upgraded to 2.10.3 about a month ago, and except for the "expire" bug that
has already been reported here, I am quite happy with it. However, about a
week ago I made a stupid mistake that caused all incoming news from all of
our feeds to vanish into a pit. It was 3 days before I was able to figure
out what I had done (forgotten to turn on the setuid bit on a piece of
home-grown distribution software), so we lost 3 days' worth of news. Once I
fixed the bug, I got one of our feeds (decwrl) to retransmit the 3 days'
worth of news. This was 155 batches of compressed news, and thus 155
separate calls to rnews via uuxqt. This was 3 days ago.

What happened then was both horrible and fascinating. Our machine has had
load factors of 10 to 12 around the clock for the last 3 days, and as I
watch the queues and analyze things, it looks as though it is not likely to
quiesce soon. Even as I type this message, it is about midnight, I am the
only user on the machine, and the load factor is 9.95.

The increased rate of news-flow, a pulse of 3 days' worth of news shipped in
one 8-hour burst, gave our system an impulse that is not damping out.
Glacier is now loaded down badly enough that it is unable to process news as
fast as the ordinary news is coming in, and the situation is actually
getting worse rather than better. In 2 days the number of unprocessed uucp
"X." files has grown from 155 to 220.

This is an elementary queueing problem, of course. The rate at which 2.10.3
rnews (in conjunction with compress 4.0, uuxqt, /bin/sh, and all of the
other accessories) seems to be able to process incoming messages on our
VAX-750 is just barely fast enough to keep up with the rate at which
ordinary messages are arriving. Giving it a sudden burst of 1000 articles
has saturated things, and increased some cost (perhaps the history file
search?) enough that the uucico|uuxqt|sh|compress|rnews pipeline is not
able to process messages as fast as they are arriving.

What I am going to do to fix this problem is to run news at nice -16 all
weekend; that should give it enough of an increased speed that I can drain
the queues.
--
	Brian Reid	decwrl!glacier!reid
	Stanford	reid@SU-Glacier.ARPA
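[A minimal sketch of the queue-watching Brian describes, assuming the stock
spool layout; the paths and the ten-minute interval are guesses, not part
of the original post:

	#!/bin/sh
	# log the depth of the uucp X. backlog every ten minutes;
	# a count that only grows means arrivals are outpacing rnews
	while :
	do
		n=`ls /usr/spool/uucp | grep '^X\.' | wc -l`
		echo "`date`: $n X. files"
		sleep 600
	done

If the count rises steadily across a day, the service rate has fallen below
the arrival rate and the queue will not drain on its own.]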
reid@glacier.ARPA (Brian Reid) (03/08/86)
Well, I've managed to get my netnews under control, after several days of
wrestling with it. I'm worried, though, that this is just the first of many
hiccups like this, and that perhaps a Vax 750 is not a big enough computer
to be a backbone site running this news software.

The problem is that we have 2 primary feeds; each potentially feeds us
about 500 articles a day. In the steady state, when we make regular contact
with both feeds, we end up with about 150 from oliveb and about 350 from
decwrl. The compress/inews pipeline is able to handle about 3 articles a
minute, so our normal load is about 3 hours per day (out of 24) devoted to
incoming news.

When we were out of action for a couple of days, the steady-state rhythm
was broken, and both oliveb and decwrl queued all of the messages for us.
They arrived and sat in UUXQT queues, and while they sat there there was no
record that we had the article, and we didn't send it out to the other
feed, and so on: a vicious cycle. The net result was that we were getting a
torrent of 1000 articles per day, with many duplicates, from our two feeds.
Since we had a backlog of 2500 articles waiting to be processed, we never
caught up; glacier was not able to process 1000 articles per day, but only
by catching up, by getting all 3500 articles processed, would the input
flow be reduced back down to 500 per day.

What I finally did to fix it was to hotwire inews/rnews to run at nice -18,
and then to replace /bin/csh with a program that said "sorry, no logins
permitted at this time". This kept users off the machine while letting
uucico through. With no other users on the machine, but with uucp running
full bore, I was able to get my backlog of 3500 messages processed in about
20 hours. Then I reset everything back to normal; we've been running this
way for 3 hours now and the UUXQT queues are normal.

I don't yet have a solution to this problem, but I predict that we are not
the last site that it will happen to. If I had not been able to kick all of
my users off the machine for a day, then I really don't know how we could
have recovered from this, other than by asking our feeds to remove
net.religion and net.politics and that sort of thing, in order to decrease
the arrival traffic.
--
	Brian Reid	decwrl!glacier!reid
	Stanford	reid@SU-Glacier.ARPA
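[A sketch of the "no logins" stand-in Brian describes, assuming a system
whose kernel honors #! scripts and that it replaces /bin/csh only for the
duration (keep a root shell open, or you lock yourself out too):

	#!/bin/sh
	# temporary replacement login shell: turn users away
	# while uucico/uuxqt drain the news backlog
	echo "sorry, no logins permitted at this time"
	exit 1

Restoring the real csh afterward is the whole of the cleanup.]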
jon@msunix.UUCP (Otto) (03/08/86)
In article <5044@glacier.ARPA>, reid@glacier.ARPA (Brian Reid) writes:
> The increased rate of news-flow, a pulse of 3 days' worth of news shipped in
> one 8-hour burst, gave our system an impulse that is not damping out.
> Glacier is now loaded down badly enough that it is unable to process news as
> fast as the ordinary news is coming in, and the situation is actually
> getting worse rather than better. In 2 days the number of unprocessed uucp
> "X." files has grown from 155 to 220.

Ouch! I only keep two weeks' worth of news on our machine, which is a
Sun-2/120. It can do a uuxqt -> sh -c compress -d | rnews on a 50K batch in
about three and a half minutes (unloaded). Guess I don't have anything to
complain about.

Once I saw someone type "la" (used to be a link to w, like uptime) three
times on the console of a VAX-11/750 that was having major problems. It
looked something like:

	root% la
	load average: 27.48, 20.68, 15.34
	root% la
	load average: 54.23, 41.94, 33.74
	root% la
	load average: 81.23, 74.23, 64.23
	panic: (some suicide note from the OS, in hex)
	(crash, die, incinerate, more hex)

All I wanted was a Pepsi!		Jonathan Hue
And she wouldn't get it for me!		LMSC-Mechanisms & Servos
(c) Suicidal Tendencies, 1984		{amdcad!cae780,sun!sunncal}!leadsv!msunix!jon
phil@amdcad.UUCP (Phil Ngai) (03/08/86)
Brian, your analysis sounds quite intriguing at first, but I can't seem to
make the numbers really work out.

To start with, you say a normal day's load of news takes 3 hours to
process. While you are down or trying to catch up, your feeds send you
everything, and since you have two feeds, I would expect your load of news
to double from 500 to 1000 articles per day. This is only 6 hours of news
in the worst case, so you should be able to make considerable progress in
the other 18 hours catching up. If you were down for a week, you'd have to
spend 7 * 6 or 42 hours to catch up. During that time, you'd get shipped
another 12 hours of news, and while processing that 12 hours you get 3
hours, etc. Although the spike takes a while to damp out, it should indeed
do so.

There is another factor. Looking through my logs, I see that my 780
processes 5 articles/min but rejects 19 articles/min. While you are
catching up you should be rejecting 50% of your articles, and if a 750
behaves similarly, a day's worth of news ought to take 3 (normal
processing) + 3/4 (rejects run four times as fast) hours.

I haven't considered modem time here; it too is subject to a latch-up
effect under the right conditions. We have four dialers, so we probably
don't have to worry.
--
 "We must welcome the future, remembering that soon it will become the
 present, and respect the past, knowing that once it was all that was
 humanly possible."

 Phil Ngai +1 408 749 5720
 UUCP: {ucbvax,decwrl,ihnp4,allegra}!amdcad!phil
 ARPA: amdcad!phil@decwrl.dec.com
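[Phil's argument is a geometric series: each round of catching up generates
6/24 as much new work as it clears. A back-of-envelope check of his
numbers, as a sketch; the figures come from his post, the script does not:

	# a week's backlog = 42 hours; each round shrinks by 6/24
	awk 'BEGIN { t = 42; total = 0
		for (i = 0; i < 20; i++) { total += t; t = t * 6 / 24 }
		printf "total catch-up time: %.1f hours\n", total
	}'

This prints 56.0 hours, so on paper the spike does damp out, provided the
per-article cost stays constant; that constancy is exactly what Brian
disputes in his follow-up.]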
reid@glacier.ARPA (Brian Reid) (03/09/86)
Grumble. Don't tell me that this didn't happen. It did.

It is absolutely true that the time to process one news article increased
substantially while the torrent was arriving. Or, more precisely, the
number of articles per minute that the machine could process was
decreasing. In particular, the time to process one message became longer
than the expected interval between arrivals of messages, with the result
that the queues were increasing in length. The particular queue that was
increasing in length was the /usr/spool/uucp/X. queue.

Although I spent my energy on trying to fix this problem rather than trying
to diagnose it, I did make a few measurements and form a few theories.

I believe that the extra time to process articles was caused by some
combination of these two factors:
    (1) The huge /usr/spool/uucp/X. directory (200+ entries at the
	beginning; 900+ entries before I took drastic action). I believe
	that this slowed uuxqt down considerably.
    (2) The huge history file. At the moment my /usr/lib/news/history
	file is 1.2 megabytes, and has 17000 messages in it. This is about
	double what it usually runs. I believe that inews is running a bit
	slower when the history file gets this big (it still has to search
	to resolve hash conflicts).

Unfortunately, I forgot to turn on system accounting before all of this, so
I don't have any hard data about how well it was working.
--
	Brian Reid	decwrl!glacier!reid
	Stanford	reid@SU-Glacier.ARPA
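[A quick way to keep an eye on both of Brian's suspects at once; a sketch,
assuming the stock file locations:

	#!/bin/sh
	# directory bloat: each X. file is a directory entry that
	# uuxqt must scan past linearly on old filesystems
	xq=`ls /usr/spool/uucp | grep '^X\.' | wc -l`
	# history bloat: the lines and bytes inews searches through
	hist=`wc -lc < /usr/lib/news/history`
	echo "X. files: $xq   history (lines, bytes): $hist"

Run hourly from cron and logged, this supplies roughly the hard numbers
that the missing accounting data would have.]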
mc68020@gilbbs.UUCP (Tom Keller) (03/10/86)
In article <5128@glacier.ARPA>, reid@glacier.ARPA (Brian Reid) writes:
>
> Grumble. Don't tell me that this didn't happen. It did.
>
> It is absolutely true that the time to process one news article increased
> substantially while the torrent was arriving. Or, more precisely, the
> number of articles per minute that the machine could process was
> decreasing. In particular, the time to process one message became longer
> than the expected interval between arrivals of messages, with the result
> that the queues were increasing in length. The particular queue that was
> increasing in length was the /usr/spool/uucp/X. queue.
>
> Although I spent my energy on trying to fix this problem rather than trying
> to diagnose it, I did make a few measurements and form a few theories.
>
> I believe that the extra time to process articles was caused by some
> combination of these two factors:
>     (1) The huge /usr/spool/uucp/X. directory (200+ entries at the
>         beginning; 900+ entries before I took drastic action). I believe
>         that this slowed uuxqt down considerably.
>     (2) The huge history file. At the moment my /usr/lib/news/history
>         file is 1.2 megabytes, and has 17000 messages in it. This is about
>         double what it usually runs. I believe that inews is running a bit
>         slower when the history file gets this big (it still has to search
>         to resolve hash conflicts).
>
> Unfortunately, I forgot to turn on system accounting before all of this, so
> I don't have any hard data about how well it was working.
> --
>	Brian Reid	decwrl!glacier!reid
>	Stanford	reid@SU-Glacier.ARPA

I also have problems with the UUXQT program bogging down. Now, I have to
tell you that I am running a machine which is not in the same UNIVERSE with
VAXen. My machine is a Tandy 6000.

Part of the problem *I* am having is that when I have 12-20 processes all
clamoring for disk I/O, a couple of modems clamoring for I/O, plus me
bitching at my console about molasses in January, my poor 'lil braindamaged
Z-80A IOP goes virtually catatonic (though not in the canonical sense). It
*DOES* bog down badly (as well you might imagine).

I don't receive tons of traffic, and I only receive traffic during the off
hours at night. Which means that my system is working *VERY* hard to keep
up with what *DOES* come down the pike (I have over 110 newsgroups that my
feed site does *NOT* feed me).

I understand there is some way to ensure that only one instantiation of
UUXQT is executed at any one time. Can someone elaborate on this for me? I
am very new to all this (I've never had to consider such things in my 3
years of using XENIX. I have no gurus to point out the True Path to me).

Any other suggestions for relieving my system of terminal indigestion?
(Please, do not crack wise about getting a more powerful system. I would
dearly love to do so...but first I need a job to pay rent, etc., *THEN*
maybe I can consider getting a new system...maybe)
--
====================================

Disclaimer:  I hereby disclaim any and all responsibility for disclaimers.

tom keller
{ihnp4, dual}!ptsfa!gilbbs!mc68020

(* we may not be big, but we're small! *)
geoff@desint.UUCP (Geoff Kuenning) (03/12/86)
Before I get to my real point, I can't resist defending Phil Ngai:

In article <5128@glacier.ARPA>, reid@glacier.ARPA (Brian Reid) writes:
>
> Grumble. Don't tell me that this didn't happen. It did.
>
> It is absolutely true that the time to process one news article increased
> substantially while the torrent was arriving.

Brian, I don't think Phil ever said that your load spike didn't happen. He
simply said that the facts you posted in your original article didn't
support blaming queueing for all of your problems. This second article
(5128@glacier) gave more data, explaining that the arrival rate in your
queues had increased due to the increased service time; this makes your
explanation much more plausible. However, I still think that if you have
more than one UUXQT running you have a problem, because (a) you're quite
possibly thrashing, and (b) with the possible exception of Honey Danber
(Peter? any comment?) no uucp system should allow more than one UUXQT at a
time.

Which brings me to my main point:

In article <63@gilbbs.UUCP> mc68020@gilbbs.UUCP (Tom Keller) writes:
> I understand there is some way to ensure that only one instantiation of
> UUXQT is executed at any one time. Can someone elaborate on this for me?
> I am very new to all this (I've never had to consider such things in my
> 3 years of using XENIX. I have no gurus to point out the True Path to me).

Actually, Tom, the rumor reached you in a bit of a confused state. All
systems should limit uuxqt's to one-at-a-time. Unfortunately, there's a bug
in older systems: the original uucp authors simply never foresaw the day of
hours-long uuxqt's, and they wrote uuxqt to ignore the lock file if it was
more than 50 minutes old. Later systems either fixed the bug, or switched
to a more reliable locking mechanism that doesn't depend on timeouts to
break hung locks.

Fortunately, the workaround is extremely easy: you simply have to ensure
that the uuxqt lock file is always less than an hour old. This can be done
with the following crontab entry:

	15,45 * * * * if [ -r /usr/spool/uucp/LCK.XQT ] ; then
		touch /usr/spool/uucp/LCK.XQT ; fi

(note that this has to fit all on one line, unlike the way I typed it).

One minor problem: make sure your /etc/rc file has the line

	rm -f /usr/spool/uucp/LCK.XQT

or, once you have had a uuxqt crash, you will never run uuxqt's again!
--
	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff
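[The timeout-free scheme Geoff alludes to works roughly like this; a
sketch, assuming the lock file holds the owner's pid in ASCII (real uucps
store it in varying formats, so check yours before borrowing this):

	#!/bin/sh
	# honor the lock only while its owner is still alive;
	# no age limit, hence no need for the touch workaround
	LOCK=/usr/spool/uucp/LCK.XQT
	if [ -f $LOCK ] && kill -0 `cat $LOCK` 2>/dev/null
	then
		exit 0		# a live uuxqt holds the lock
	fi
	echo $$ > $LOCK

A crashed uuxqt leaves a dead pid behind, so the stale lock is ignored on
the very next attempt instead of an hour (or a reboot) later.]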
grr@unirot.UUCP (George Robbins) (03/12/86)
In article <5044@glacier.ARPA> reid@glacier.ARPA (Brian Reid) writes:
>I upgraded to 2.10.3 about a month ago, and except for the "expire" bug that
>has already been reported here, I am quite happy with it.
>
>What happened then was both horrible and fascinating. Our machine has had
>load factors of 10 to 12 around the clock for the last 3 days, and as I
>watch the queues and analyze things, it looks as though it is not likely to
>quiesce soon. Even as I type this message, it is about midnight, I am the
>only user on the machine, and the load factor is 9.95.
>
>The increased rate of news-flow, a pulse of 3 days' worth of news shipped in
>one 8-hour burst, gave our system an impulse that is not damping out.
>Glacier is now loaded down badly enough that it is unable to process news as
>fast as the ordinary news is coming in, and the situation is actually
>getting worse rather than better. In 2 days the number of unprocessed uucp
>"X." files has grown from 155 to 220.
>
>What I am going to do to fix this problem is to run news at nice -16 all
>weekend; that should give it enough of an increased speed that I can drain
>the queues.
>--
>	Brian Reid	decwrl!glacier!reid
>	Stanford	reid@SU-Glacier.ARPA

You don't indicate which program is hogging your system the most, but here
are several suggestions.

1) Compress, as distributed with news, is a real core and cpu hog. This is
   fine if you have the resources, but if not will cause thrashing and
   general sleepiness. There are options in the program for smaller systems
   that are easy to set up; see the sketch after this post.

2) History file processing, unless using the BSD style indexed file access,
   is really inefficient. Consider kludging the BSD option to use any
   indexed file stuff you have handy.

3) All the piping that goes on with compressed news can't be helping. One
   of those unix features that's nice to have, but screws you in
   high-volume situations.

4) If you are running mod.sources vnews, afbuild isn't the fastest thing in
   town. Check the hashing algorithm - you mean you don't like to do divide
   instructions for every character?
--
George Robbins			uucp: ...!ihnp4!tapa!grr
P.O. Box 177			      ...!caip!unirot!grr
Lincoln U, PA 19352		[Any ideas herein are not responsible for themselves!]
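[One of the options George means is the code-size flag; a sketch, assuming
compress 4.0 on both ends. The sending feed chooses the size, since the
bit width is recorded in the batch itself:

	# 12-bit codes need far less table space than the 16-bit
	# default, at some cost in compression ratio
	compress -b 12 < batch > batch.Z

Recompiling compress with a smaller BITS value shrinks the memory footprint
on the receiving side as well.]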
msc@saber.UUCP (Mark Callow) (03/13/86)
> I understand there is some way to ensure that only one instantiation of
> UUXQT is executed at any one time. Can someone elaborate on this for me?

There had better be only one instance of uuxqt or you'll have real
problems. uuxqt uses the lock file /usr/spool/uucp/LCK.XQT to ensure only
one uuxqt is running.

UNNNNFortunately uuxqt ignores the lock file if it is more than a certain
age. The default age (in 4.2 BSD's uucp at least) is 60 minutes. This is
far too short if you are receiving batched news, especially if you have
unbatchnews niced.
--
From the TARDIS of Mark Callow
msc@saber.uucp, sun!saber!msc@decwrl.dec.com
...{ihnp4,sun}!saber!msc

"Boards are long and hard and made of wood"
wedgingt@udenva.UUCP (Will Edgington/Ejeo) (03/13/86)
In article <175@desint.UUCP> geoff@desint.UUCP (Geoff Kuenning) writes:
>One minor problem: make sure your /etc/rc file has the line
>
>	rm -f /usr/spool/uucp/LCK.XQT
>
>or, once you have had a uuxqt crash, you will never run uuxqt's again!

While you're at it, you might as well:

	rm -f /usr/spool/uucp/LCK.*

There sure wasn't *ANY* uucp activity while the system was down, so you
might as well remove them all while you're there. Terminal lock files
(LCK..tty??) are especially critical; removing them at reboot often means
uucp knows the ports are available for use that much sooner. Sites with
heavy uucp use will find things just a little bit easier, especially if
they also go down 6 nights a week for backup (as mine do).
--
Will Edgington            | Phone: (303) 871-2081 (work), 722-5738 (home)
Computing Services Staff  | USnail: BA 469, 2020 S. Race, Denver CO 80210
University of Denver      | Home: 2035 S. Josephine #102, Denver CO 80210
Electronic Address (UUCP only): {hplabs, seismo}!hao!udenva!wedgingt
	or {boulder, cires, ucbvax!nbires, cisden}!udenva!wedgingt
dsp@ptsfa.UUCP (David St. Pierre) (03/14/86)
In article <1951@saber.UUCP> msc@saber.UUCP (Mark Callow) writes:
>
>There had better be only one instance of uuxqt or you'll have real
>problems. uuxqt uses the lock file /usr/spool/uucp/LCK.XQT to ensure
>only one uuxqt is running.
>
>UNNNNFortunately uuxqt ignores the lock file if it is more than a
>certain age. The default age (in 4.2 BSD's uucp at least) is 60
>minutes. This is far too short if you are receiving batched news,
>especially if you have unbatchnews niced.

For both of these reasons and others, I have taken an extreme approach. I
just checked my (BNU) logfiles and found 1038 executions today. If I were
to let cunbatch grab hours of that time during prime time, I'd probably be
lynched. I can run multiple UUXQTs but don't think it really helps this
problem much.

Here's my "cunbatch":
=====================
cat > news.$$
=====================

I just let files drop into my .XQTDIR/.Xqtdir; UUXQT goes off to find more
interesting things to do. I can keep up with incoming traffic without
having queueing problems in UUCP or ageing problems with LCK files. AND I
can try to do some sensible things with incoming news.

Here's something I call "cunbatch.sh" - I'm System V only so it may not
work on your system. Then again it may:
==================================
trap 'rm -f LCK..news; trap 0; exit' 1 2 3 15
cd /usr/spool/uucp/.Xqtdir
if [ -f LCK..news ]
then
	ps -p `cat LCK..news` > /dev/null
	if [ $? -eq 0 ]
	then
		exit 0
	fi
fi
echo $$ > LCK..news
while :
do
	for FILE in `ls -tr news.* 2>/dev/null`
	do
		/usr/lbin/compress -d < $FILE | /usr/lbin/rnews
		rm $FILE
	done
	FILES=`ls -tr news.* 2>/dev/null`
	if [ -z "$FILES" ]
	then
		rm LCK..news
		exit 0
	fi
done
==================================

I haven't been too thrilled with system performance or rnews article
insertion speed when there are multiple batch files being processed
concurrently. We run a lot of background daemons on our system which tend
to get higher priority than rnews during the daytime. So I've built a shell
LCK-file around the unbatching process. A second instance of cunbatch.sh
finding the first still running will just go away. If the LCK file is not
present or the process is no longer active, it will start up. A single
instance of cunbatch.sh will process all files present when it started or
new ones which arrived after it started.

I drive cunbatch.sh out of crontab hourly during the evening - it was just
too SLLOOOWWWW during the day. 'Sides, people are supposed to be working
part of the time around here, right? I can process a day's inputs in 1-2
hours so I don't slow down the sites I feed too much. Others who don't have
those constraints could probably run cunbatch.sh nohup right out of
cunbatch. You'd still gain the effects of not tying up UUCP and being able
to single-thread batch file processing.

One additional benefit I derived from the LCK-file processing is that I
also "taught" expire to honor it:
============================
trap 'rm -f $LOCKFILE; trap 0; exit' 1 2 3 15
LOCKFILE=/usr/spool/uucp/.Xqtdir/LCK..news
if [ -f $LOCKFILE ]
then
	ps -p `cat $LOCKFILE` > /dev/null
	if [ $? -eq 0 ]
	then
		echo expire.sh | at now + 30 minutes >>/dev/null
		exit 0
	fi
fi
echo $$ > $LOCKFILE
PATH=/usr/lib/news:$PATH
...shortened for brevity
expire -e 10 ... > /usr/lib/news/ohistory
rm -f $LOCKFILE
============================

I don't have to worry about one process trying to add to the history file
while another is trying to destroy it.
--
David St. Pierre		{ihnp4,dual,qantel,lll-crg}!ptsfa!dsp
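[A plausible crontab line for driving cunbatch.sh hourly in the evening, as
David describes; the installed path is a guess:

	0 18-23 * * * sh /usr/spool/uucp/.Xqtdir/cunbatch.sh

Because the script takes its own LCK..news lock, overlapping invocations
are harmless; a second copy simply exits.]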
boyce@daemen.uucp (DWB) (03/16/86)
In article <1951@saber.UUCP>, msc@saber.UUCP (Mark Callow) writes:
> > I understand there is some way to ensure that only one instantiation of
> > UUXQT is executed at any one time. Can someone elaborate on this for me?
>
> UNNNNFortunately uuxqt ignores the lock file if it is more than a
> certain age. The default age (in 4.2 BSD's uucp at least) is 60
> minutes. This is far too short if you are receiving batched news,
> especially if you have unbatchnews niced.

If you have the source, I'd suggest you fix it that way like we did. In
file "uucp.h" (BSD of course):
----------------
change X_LOCKTIME from 3600 to maybe 28800 (8 hours) or thereabouts.

This would certainly remedy the problem for the next time you're bombarded
with news.
--
Doug Boyce   Daemen College, Buffalo NY
UUCP : decvax!sunybcs!daemen!boyce or ihnp4!kitty!daemen!boyce
ARPA : boyce%buffalo@csnet-relay.arpa

"The mind is a terrible thing....and must be stopped in our lifetime."
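[A sketch of making Doug's change mechanically, assuming the stock
definition reads "#define X_LOCKTIME 3600"; check your own source first:

	sed 's/X_LOCKTIME.*3600/X_LOCKTIME 28800/' uucp.h > uucp.h.new
	mv uucp.h.new uucp.h
	# then rebuild and reinstall uuxqt

Eight hours is generous; any value comfortably longer than your worst
observed unbatching run will do.]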