ggw%wolves@cs.duke.edu (Gregory G. Woodbury) (02/22/91)
I have found that maintaining specific subset feeds is an interesting
situation. The sites that I feed tend to want specific groups and not
others, in non-simple patterns. I figured that it would not be a problem
to have multiple entries in the sys file for a system, each dealing with
an appropriate subset of the namespace, so that no entry would hit the
"line length" limits that occasionally bite on very long sys entries.

This generates duplicated entries in the togo file for certain
crosspostings (e.g. an article crossposted between alt.* and sci.* will
be duplicated if the alt groups are in a different sys entry from the
sci groups). This, I thought, would not be a terrible situation: the
second copy will be dropped on the floor by the receiver. This is,
indeed, what happens; no site reports duplicate items in the spool.

It turns out, though, that for some of the groups one of the sites is
getting, crossposting can account for almost 25% of the articles in the
togo file! That makes it very inefficient to use with uucp batching.

I did find a solution. In the $NEWSCTL/batch/batchsplit script, adding a
"sort -u" before running the togo file through the splitting awk script
removes the duplicates. (This works because the file name written to the
togo file is, in all cases for crossposted articles, the one under the
first group in the Newsgroups header, so the duplicated lines are
identical.)

I did it with these lines in batchsplit:
---------
# main processing
+ # wolves special!
+ # sort the input to remove duplicates
+ tmp=/tmp/bsort$$
+ cat $input | sort -u >$tmp
+ rm $input ; mv $tmp $input
+ #
+ # now, back to the regularly scheduled program
+ #
rm -f togo.overflow togo.count
awk 'BEGIN { total = 0 ; ninbatch = 0 ; bno = 1 ; limit = '$1'
---------

The cost of sorting should be weighed carefully against the number of
crossposts duplicated in your togo files by the sys files that you are
using. If your sys files are small and simple, and don't use multiple
entries for any system, then you won't need to sort at all.
-- 
Gregory G. Woodbury @ The Wolves Den UNIX, Durham NC
UUCP: ...dukcds!wolves!ggw   ...mcnc!wolves!ggw   [use the maps!]
Domain: ggw@cds.duke.edu  ggw%wolves@mcnc.mcnc.org
[The line eater is a boojum snark! ]      <standard disclaimers apply>
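One rough way to weigh that cost is to count how many lines of a togo
file are exact repeats before deciding whether to sort. The sketch below
assumes one article per line and that duplicated crossposts show up as
identical lines, as described above. The function name togo_dup_stats is
made up for illustration and is not part of C News.

```shell
# togo_dup_stats FILE
# Print the total line count, the unique line count, and the percentage
# of lines that are duplicates. Hypothetical helper, not part of C News.
togo_dup_stats() {
    file=$1
    total=$(wc -l < "$file")
    unique=$(sort -u "$file" | wc -l)
    # Let awk do the arithmetic so an empty file does not divide by zero.
    awk -v t="$total" -v u="$unique" 'BEGIN {
        pct = (t > 0) ? 100 * (t - u) / t : 0
        printf "total=%d unique=%d duplicate_pct=%.1f\n", t, u, pct
    }'
}
```

Pointing it at one site's togo file gives a single summary line. If the
duplicate percentage is near zero, the extra sort in batchsplit is
probably not earning its keep.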
henry@zoo.toronto.edu (Henry Spencer) (02/23/91)
In article <1991Feb22.034319.9805@wolves.uucp> ggw%wolves@cs.duke.edu writes:
>I figured that it would not be a problem to have multiple entries in the
>sys file for a system that dealt with the appropriate subsets of the
>namespace so that it would not hit the "line length" limits that
>occasionally bite on very long sys entries.

Uh, *what* "line length" limits? There are, by intent, none of any kind
in C News. We've got no shortage of customers using truly ridiculous line
lengths for very selective feeds.
-- 
"Read the OSI protocol specifications?  | Henry Spencer @ U of Toronto Zoology
I can't even *lift* them!"              | henry@zoo.toronto.edu  utzoo!henry
ggw%wolves@cs.duke.edu (Gregory G. Woodbury) (02/23/91)
Sorry, I should have pointed out that I am running on an 80386 Unix
platform and using the vendor-supplied stdio rather than the Cnews stdio
versions.

It is not a bug in Cnews; it's a bug in the choices that I made in
setting up this system! (Is an inherent limitation a "bug"?)
brad@looking.on.ca (Brad Templeton) (02/23/91)
If your sys file is that complex, you should really be feeding with
dynafeed. It's in uunet:~/ClariNet/dynafeed.tar.Z

This gives newsgroup by newsgroup control of the feeding, remote control,
and best of all, automatic remote control based on arbitron-like output
from the recipient site.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
scs@lokkur.dexter.mi.us (Steve Simmons) (02/24/91)
In article <1991Feb22.034319.9805@wolves.uucp> ggw%wolves@cs.duke.edu writes:
>I figured that it would not be a problem to have multiple entries in the
>sys file for a system that dealt with the appropriate subsets of the
>namespace so that it would not hit the "line length" limits that
>occasionally bite on very long sys entries.

henry@zoo.toronto.edu (Henry Spencer) writes:
>Uh, *what* "line length" limits? There are, by intent, none of any kind
>in C News. We've got no shortage of customers using truly ridiculous line
>lengths for very selective feeds.

How about the human reader ones? My sysfiles were horrible (yes, I
know about dynafeed, Brad!) until I split the entries across multiple
lines (i.e., multiple sysfile entries for a single site). 1K-long lines
just don't make it.
-- 
"Perl is the BASIC of UNIX." -- Tom Christiansen
henry@zoo.toronto.edu (Henry Spencer) (02/24/91)
In article <1991Feb23.221103.13320@lokkur.dexter.mi.us> scs@lokkur.dexter.mi.us (Steve Simmons) writes:
>>Uh, *what* "line length" limits? ...
>
>How about the human reader ones? My sysfiles were horrible (yes, I
>know about dynafeed, Brad!) until I split the entries across multiple
>lines (ie, multiple sysfile entries for a single site). 1K-long lines
>just don't make it.

I don't see any advantage to using multiple sysfile entries over just
using backslash-newline to split a single entry.
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public." -S. Harris     | henry@zoo.toronto.edu  utzoo!henry
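To make the backslash-newline point concrete: a single logical entry can
be spread over several physical lines, and joining the continuations
recovers the one long entry. The sketch below does only that joining, so
a split entry can be inspected as the single line it stands for. It
assumes continuation lines carry no leading whitespace; how C News
itself treats whitespace after a continuation should be checked against
its own documentation. The function name is made up for illustration.

```shell
# join_continuations: read a sys-style file on stdin and print each
# logical entry on one line by removing backslash-newline pairs.
# Hypothetical helper for inspection only, not part of C News.
join_continuations() {
    awk '{
        if (sub(/\\$/, "")) {   # line ended in a backslash: keep accumulating
            buf = buf $0
        } else {                # last physical line of the entry
            print buf $0
            buf = ""
        }
    }
    END { if (buf != "") print buf }'
}
```

Fed an entry for a made-up site written as "site:comp.lang.c,\" on one
line and "sci.space:f:" on the next, it prints
"site:comp.lang.c,sci.space:f:", which is one entry rather than two, so
no crosspost is queued twice.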
ggw%wolves@cs.duke.edu (Gregory G. Woodbury) (02/24/91)
<1991Feb23.064618.6167@looking.on.ca> brad@looking.on.ca (Brad Templeton):
>If your sys file is that complex, you should really be feeding with
>dynafeed. It's in uunet:~/ClariNet/dynafeed.tar.Z
>
>This gives newsgroup by newsgroup control of the feeding, remote control,
>and best of all, automatic remote control based on arbitron-like output
>from the recipient site.

Brad,
	I appreciate your eagerness to promote your software, but not
all the world is on UNIX systems or on systems where they can simply
plug in some package and hope it will "go". Feeding gateways to other
networks, and systems other than fairly vanilla B- or C-news derived
systems, is not helped by dynafeed.

If you can convince Waffle and FSUUCP and all the DOS-based newsreaders
and news consumers to support the dynafeed messages, then I will look at
it again.

You're a nice guy (at least you seem that way, even in the flesh) but
just a bit too flip on the delivery of your solution "pronouncements".
You might qualify as a real net.god, but don't forget the
net.feet.of.clay.
brad@looking.on.ca (Brad Templeton) (02/25/91)
In article <1991Feb24.023410.5490@wolves.uucp> ggw%wolves@cs.duke.edu (Gregory G. Woodbury) writes:
>Brad,
>	I appreciate your eagerness to promote your software, but not
>all the world is on UNIX systems or on systems where they can simply
>plug in some package and hope it will "go", feeding gateways to other
>networks and systems other than fairly vanilla B- or C- news derived
>systems is not helped by dynafeed.
>
>If you can convince Waffle and FSUUCP and all the DOS-based newsreaders
>and news consumers to support the dynafeed messages, then I will look at
>it again.

Dynafeed uses a software tools approach, so it can indeed help those
sites. It consists of three parts:

a) A feeding program that takes a .newsrc and spits out news batches of
   "unread" articles, either as true batches or as file lists, AKA the
   "togo" file of C news.

   This part is dependent on the Bnews/Cnews (unix) news directory
   structure, one article per file, etc. But to be honest, that could be
   changed pretty easily. This is run on the *feed* site, so it doesn't
   have to be ported to deal with DOS-based *leaf* sites.

b) A "newsrc merging" program which takes a subscription request (I want
   group X, or groups that match regular expression R, etc.) and updates
   the .newsrc file.

   This one is not unix dependent at all, although it depends on the
   fairly simple B/C "active" file and its format. It is also only run
   on the feed site. It is optional, in that you can add and delete
   newsgroups from the .newsrc file by hand with a text editor.

c) An arbitron-like program that runs on the recipient site to generate
   subscription requests to be emailed or uux'd to the merging program.

   This is not unix dependent either. It knows the .newsrc format, but
   could be adapted really easily to understand other formats. For
   subscription purposes, all it really wants is to know what groups
   people subscribe to. You could write a batch file to do this with
   just about any subscription file in the world in 5 minutes -- gather
   up group names, and remove duplicates. (2nd step optional.)

The only things that are unix dependent are the feed program's use of
the standard news database and active file, and its use of pipes to
output news batches directly, as well as the merging program and
arbitron program's use of the active file.

There are various shell scripts included to put it all together, but
they are just one example of ways to use the tools.
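For .newsrc-style files, the "gather up group names, and remove
duplicates" step might look like the sketch below. It is only an
illustration of that one step, not dynafeed's own program. It assumes
the usual .newsrc convention in which a subscribed group's line is the
group name followed by a colon and an unsubscribed group's line uses "!"
instead. The function name is made up.

```shell
# subscribed_groups [FILE...]
# Print the subscribed group names found in the given .newsrc-style
# files (or on stdin if none are given), one per line, without repeats.
# Hypothetical helper, not part of dynafeed or C News.
subscribed_groups() {
    # Keep only lines shaped like "group:..." and print the group name;
    # "group!..." lines (unsubscribed) do not match and are dropped.
    sed -n 's/^\([^:![:space:]]\{1,\}\):.*/\1/p' "$@" | sort -u
}
```

Run over every user's .newsrc on the recipient site, the output is the
list of groups somebody there actually reads, which is all a
subscription request needs.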
duncan@comp.vuw.ac.nz (Duncan McEwan) (02/27/91)
henry@zoo.toronto.edu (Henry Spencer) writes:
>Uh, *what* "line length" limits? There are, by intent, none of any kind
>in C News. We've got no shortage of customers using truly ridiculous line
>lengths for very selective feeds.

I understood long sys file lines to be a problem for some of the awk
scripts on systems with awks that limit line lengths. I realise that
this is not a *bug* in Cnews, but it is still a *problem* with Cnews
that splitting into multiple sys file entries works around, and that use
of continuation lines does not. If I have misunderstood the situation I
am open to correction ...

Having said that, I'm not sure that I like the original poster's
suggestion of the "sort -u". News arrives out of order often enough
already without making it worse... Perhaps a program that reads the togo
file, stat()ing each file and remembering which inodes it has already
seen, would do the job.

Duncan
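The inode idea can be sketched in a few lines of shell. This is only an
illustration of the suggestion, under several assumptions: each togo
line starts with the article's path followed by a space and any other
fields, paths resolve from the current directory, lines contain no tab
characters, and everything lives on one filesystem (the device number is
ignored). The function name is made up.

```shell
# dedup_togo_by_inode: read togo lines on stdin and print only the first
# line seen for each inode, keeping the original order.
# Hypothetical helper, not part of C News.
dedup_togo_by_inode() {
    while IFS= read -r line; do
        path=${line%% *}
        # ls -di prints "inode path"; keep just the inode number.
        ino=$(ls -di -- "$path" 2>/dev/null | awk '{ print $1 }')
        # A file that cannot be found is keyed by its path instead,
        # so repeated lines for it still collapse to one.
        printf '%s\t%s\n' "${ino:-missing:$path}" "$line"
    done | awk -F'\t' '!seen[$1]++ { print $2 }'
}
```

Unlike "sort -u", this leaves the surviving lines in their original
order, at the price of one ls per article. If the duplicated lines are
always textually identical, keying on the path alone with
awk '!seen[$1]++' would do the same job without touching the disk.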
henry@zoo.toronto.edu (Henry Spencer) (02/28/91)
In article <1991Feb27.064651.1600@comp.vuw.ac.nz> duncan@comp.vuw.ac.nz (Duncan McEwan) writes:
>>Uh, *what* "line length" limits? ...
>
>I understood long sys file lines to be a problem for some of the
>awk scripts on systems with awks that limit line lengths...

This is a nuisance but, last I looked, doesn't actually cause any dire
problems.
garyb@abekrd.co.uk (Gary Bartlett) (03/01/91)
In <1991Feb24.000606.3890@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>I don't see any advantage to using multiple sysfile entries over just
>using backslash-newline to split a single entry.

What about when you need to send different 'classes' of news to the same
site, e.g.
	i)  all files in the 'comp' newsgroup tree, and
	ii) all local files, no matter which group,
so:

	site:comp:f:
	site-local/site:all,!comp:Lf:site/togo

would be the relevant entries in the sys file. ('/site' is used so that
news from 'site' is not sent back to 'site' via the 'site-local' entry,
and '!comp' so that locally generated 'comp' articles are sent only
once, via 'site'.)

Can _this_ be done without using multiple sys file entries?

Gary

---------------------------------------------------------------------------
Gary C. Bartlett               NET: garyb@abekrd.co.uk
Abekas Video Systems Ltd.     UUCP: ...!uunet!mcsun!ukc!pyrltd!abekrd!garyb
12 Portman Rd, Reading,      PHONE: +44 734 585421
Berkshire.  RG3 1EA.           FAX: +44 734 567904
United Kingdom.              TELEX: 847579
henry@zoo.toronto.edu (Henry Spencer) (03/03/91)
In article <1991Feb28.194541.9400@abekrd.co.uk> garyb@abekrd.co.uk (Gary Bartlett) writes:
>>I don't see any advantage to using multiple sysfile entries over just
>>using backslash-newline to split a single entry.
>
>What about when you need to send different 'classes' of news to the same
>site... Can
>_this_ be done without using multiple sys file entries?

No, but that's not what the original question was about.