david@ms.uky.edu (David Herron -- One of the vertebrae) (08/09/88)
As Matt pointed out before, I may lose big time by installing a patch like the one I'd made. He was concerned that I'd have a lot of articles arriving from many different directions at the same time, and that since they weren't being inserted immediately upon arrival, I'd have 2 or 3 copies arrive and 1 or 2 junked almost immediately. So I decided to give it a look-see, and the early results are discouraging.

I used to have my background scripts run every 15 minutes. I later moved that up to 10 minutes, and later to 5, each time because I was seeing duplicate articles in SPOOLDIR/.rnews. (I have one script for unbatching news and another for batching, with locking to ensure that only one is running at a time; the two scripts are offset from each other by a couple of minutes.)

To examine the logs I ran the following script:

	awk <log '$5 == "Duplicate" {print $7}' | sort -u

which gives me a list of message-ids which are duplicated. A different version has just "print" and "sort +6 -7" instead, which gives me all the lines about duplicate articles from the log file, sorted by message-id. I haven't taken the output of the above command to its logical conclusion yet:

	while read id; do
		grep $id log
	done

which would give me all the lines talking about the message-ids that had been duplicated.

To make the rest of this story short ... I saw what seems like a lot of articles arrive here >2 times. A total of 515 message-ids were duplicated, with 653 total duplicates, on the traffic from about 4 am until around midnight; I don't know offhand how many total articles arrived in that time. Of those 653, 188 were from psuvm.bitnet, a neighbor known to give us lots of duplicates and one I'll have to deal with sometime Real Soon Now. I don't know how many of the duplicates *only* came through psuvm.
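The batch/unbatch mutual exclusion mentioned above can be sketched with a mkdir-based lock, since mkdir is atomic on the filesystem. This is only one way to do it, and the lock directory name is my own invention, not anything from the post:

```shell
# Sketch of the locking described above: mkdir(1) either creates the
# directory or fails, atomically, so only one of the batch/unbatch
# scripts can hold the lock at a time.
# LOCKDIR is an assumed name, not taken from the post.
LOCKDIR="${TMPDIR:-/tmp}/news.lock.$$"

if mkdir "$LOCKDIR" 2>/dev/null; then
	trap 'rmdir "$LOCKDIR"' 0          # release the lock on exit
	echo "lock acquired; safe to batch or unbatch"
	# ... batching or unbatching work would go here ...
else
	echo "another news script holds the lock; try again next cron run"
	exit 0
fi
```

Each script would use the same LOCKDIR, so whichever cron job fires second simply exits and waits for its next scheduled run.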
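Putting the awk step and the while loop together gives one pipeline for the whole duplicate report. The sample log lines below are fabricated to match the field positions the awk script assumes (field 5 = "Duplicate", field 7 = the message-id); the real log format may differ. I've also used grep -F so the dots in a message-id aren't treated as regex metacharacters:

```shell
# Fabricated log excerpt in the field layout the post's awk assumes:
# $5 = "Duplicate", $7 = message-id.
cat > /tmp/newslog <<'EOF'
Aug 9 04:12:01 psuvm Duplicate article <123@psuvm.bitnet>
Aug 9 04:14:33 rutgers Duplicate article <123@psuvm.bitnet>
Aug 9 04:20:05 uunet Accepted article <456@uunet.uu.net>
EOF

# Step 1: unique message-ids that were rejected as duplicates.
awk '$5 == "Duplicate" {print $7}' /tmp/newslog | sort -u > /tmp/dupids

# Step 2: every log line mentioning each duplicated id.
while read -r id; do
	grep -F "$id" /tmp/newslog
done < /tmp/dupids > /tmp/duplines
```

On the sample input this finds one duplicated message-id and pulls out both log lines that mention it.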
A quick reading of the log file showed that, when we had duplicates, we were most often getting two copies of the same article within 2-3 minutes of each other, less often 3 copies, and even less often 4. Since I don't know what the total traffic for that period was, I don't know how significant those numbers are.

A look at some slightly old reports (June) shows ~25000 articles per week, or ~3500 articles per day. If that's accurate, then we're seeing a 10-20 percent rejection rate, which maybe isn't so bad after all. How does this stack up against other people's experiences?
--
<---- David Herron -- The E-Mail guy            <david@ms.uky.edu>
<---- ska: David le casse\*' {rutgers,uunet}!ukma!david, david@UKMA.BITNET
<----
<---- Looking forward to a particularly blatant, talkative and period bikini ...
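The rejection-rate estimate above checks out. Using the figures from the post (~25000 articles/week from the June reports, 653 duplicates in roughly a day of traffic):

```shell
# Figures from the post: ~25000 articles/week, 653 duplicates in ~one day.
weekly=25000
dups=653
rate=$(awk -v w="$weekly" -v d="$dups" 'BEGIN { printf "%.1f", 100 * d / (w / 7) }')
echo "~$((weekly / 7)) articles/day, duplicate rate ~= ${rate}%"
```

That works out to roughly 3571 articles/day and a duplicate rate just under 19%, i.e. at the high end of the 10-20 percent range quoted.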