[news.software.b] Cnews - A small assist for managing complex sys files

ggw%wolves@cs.duke.edu (Gregory G. Woodbury) (02/22/91)

I have found that having specific subset feeds is an interesting
situation.  The sites that I feed tend to want specific groups and not
others in non-simple patterns.

I figured that it would not be a problem to have multiple entries in the
sys file for a system that dealt with the appropriate subsets of the
namespace so that it would not hit the "line length" limits that
occasionally bite on very long sys entries.

This generates duplicated entries in the togo file for certain
crosspostings (e.g. an article crossposted to alt.* and sci.* is listed
twice if the alt groups are in a different sys entry from the sci
groups).  This, I thought, would not be a terrible situation: the second
copy will be dropped on the floor by the receiver.  This is, indeed,
what happens; no site reports duplicate items in the spool.

It turns out, though, that crossposting between some of the groups one
of the sites gets can account for almost 25% of the articles in the
togo file!  That makes it very inefficient to use with uucp batching.

I did find a solution.

In the $NEWSCTL/batch/batchsplit script, adding a "sort -u" before
running the togo file through the splitting awk script removes the
duplicates.  (This works because the file name written to the togo file
is always the one under the first group in the Newsgroups header, so the
duplicate entries for a crossposted article are identical lines.)

I did it with these lines in batchsplit:
---------
	# main processing
+	# wolves special!
+	# sort the input to remove duplicates
+	tmp=/tmp/bsort$$
+	# replace the input only if the sort succeeded
+	sort -u "$input" >"$tmp" && mv -f "$tmp" "$input"
+	#
+	# now, back to the regularly scheduled program
+	#
	rm -f togo.overflow togo.count
	awk 'BEGIN { total = 0 ; ninbatch = 0 ; bno = 1 ; limit = '$1'
---------
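
For sites that would rather not reorder the togo file, here is a minimal sketch of a variation on the same idea: drop repeated lines but keep the first occurrence of each in its original position. The name `togo_uniq` is made up for this illustration and is not part of C News; the sketch assumes duplicate entries are byte-for-byte identical lines, as described above.

```sh
# togo_uniq: hypothetical helper, not part of C News.
# Print each distinct line once, in order of first appearance.
# Reads the named files, or standard input if none are given.
togo_uniq() {
    awk '!seen[$0]++' "$@"
}
```

In the patch above it could stand in for the sort, e.g. `togo_uniq "$input" >"$tmp"`. It keeps every distinct line in memory, which should be fine for togo files of ordinary size.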

The cost of sorting should be weighed carefully against the number of
crossposts duplicated in your togo files by the sys files that you are
using.  If your sys files are small and simple, and don't use multiple
entries for any system, then you won't need to sort at all.
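
One way to do that weighing is to measure how many lines in a togo file are repeats. This is a rough sketch: `togo_dup_stats` is a name invented here, and it counts two lines as duplicates only if they are identical.

```sh
# togo_dup_stats: hypothetical helper, not part of C News.
# Report the total number of lines and how many of them repeat
# a line already seen earlier in the input.
togo_dup_stats() {
    awk '
        { total++; if (seen[$0]++) dups++ }
        END {
            pct = (total > 0) ? 100 * dups / total : 0
            printf "%d lines, %d duplicates (%.1f%%)\n", total, dups, pct
        }
    ' "$@"
}
```

Run it as `togo_dup_stats togo` in a site's batch directory. If the percentage is close to zero, the extra pass is probably not worth it.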
-- 
Gregory G. Woodbury @ The Wolves Den UNIX, Durham NC
UUCP: ...dukcds!wolves!ggw   ...mcnc!wolves!ggw           [use the maps!]
Domain: ggw@cds.duke.edu     ggw%wolves@mcnc.mcnc.org
[The line eater is a boojum snark! ]           <standard disclaimers apply>

henry@zoo.toronto.edu (Henry Spencer) (02/23/91)

In article <1991Feb22.034319.9805@wolves.uucp> ggw%wolves@cs.duke.edu writes:
>I figured that it would not be a problem to have multiple entries in the
>sys file for a system that dealt with the appropriate subsets of the
>namespace so that it would not hit the "line length" limits that
>occasionally bite on very long sys entries.

Uh, *what* "line length" limits?  There are, by intent, none of any kind
in C News.  We've got no shortage of customers using truly ridiculous line
lengths for very selective feeds.
-- 
"Read the OSI protocol specifications?  | Henry Spencer @ U of Toronto Zoology
I can't even *lift* them!"              |  henry@zoo.toronto.edu  utzoo!henry

ggw%wolves@cs.duke.edu (Gregory G. Woodbury) (02/23/91)

Sorry, I should have pointed out that I am running on an 80386 Unix
platform and using the vendor supplied stdio rather than the Cnews stdio
versions.

It is not a bug in Cnews, it's a bug in the choices that I made in
setting up this system! (Is an inherent limitation a "bug"?)

brad@looking.on.ca (Brad Templeton) (02/23/91)

If your sys file is that complex, you should really be feeding with
dynafeed.  It's in uunet:~/ClariNet/dynafeed.tar.Z

This gives newsgroup by newsgroup control of the feeding, remote control,
and best of all, automatic remote control based on arbitron-like output
from the recipient site.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

scs@lokkur.dexter.mi.us (Steve Simmons) (02/24/91)

In article <1991Feb22.034319.9805@wolves.uucp> ggw%wolves@cs.duke.edu writes:
>I figured that it would not be a problem to have multiple entries in the
>sys file for a system that dealt with the appropriate subsets of the
>namespace so that it would not hit the "line length" limits that
>occasionally bite on very long sys entries.

henry@zoo.toronto.edu (Henry Spencer) writes:

>Uh, *what* "line length" limits?  There are, by intent, none of any kind
>in C News.  We've got no shortage of customers using truly ridiculous line
>lengths for very selective feeds.

How about the human reader ones?  My sysfiles were horrible (yes, I
know about dynafeed, Brad!) until I split the entries across multiple
lines (ie, multiple sysfile entries for a single site).  1K-long lines
just don't make it.
-- 
	"Perl is the BASIC of UNIX."  -- Tom Christiansen

henry@zoo.toronto.edu (Henry Spencer) (02/24/91)

In article <1991Feb23.221103.13320@lokkur.dexter.mi.us> scs@lokkur.dexter.mi.us (Steve Simmons) writes:
>>Uh, *what* "line length" limits? ...
>
>How about the human reader ones?  My sysfiles were horrible (yes, I
>know about dynafeed, Brad!) until I split the entries across multiple
>lines (ie, multiple sysfile entries for a single site).  1K-long lines
>just don't make it.

I don't see any advantage to using multiple sysfile entries over just
using backslash-newline to split a single entry.
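
For illustration, here is a small filter that folds backslash-newline continuations back into single logical lines, which shows what a split entry amounts to. It is a sketch, not C News code: the name `join_continuations` is invented, and discarding leading whitespace on continuation lines is an assumption about how the sys file is read that should be checked against your C News documentation.

```sh
# join_continuations: hypothetical helper, not part of C News.
# Fold lines ending in a backslash together with the line that follows.
# ASSUMPTION: leading blanks and tabs on a continuation line are dropped.
join_continuations() {
    awk '
        {
            if (cont) sub(/^[ \t]+/, "")     # trim indent of a continuation
            if (sub(/\\$/, "")) {            # trailing backslash: keep going
                buf = buf $0; cont = 1; next
            }
            print buf $0                     # entry complete
            buf = ""; cont = 0
        }
        END { if (cont) print buf }          # file ended mid-entry
    ' "$@"
}
```

Given an entry written as

    site:comp,news,\
    	sci.physics:f:

it prints the single line `site:comp,news,sci.physics:f:`.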
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry

ggw%wolves@cs.duke.edu (Gregory G. Woodbury) (02/24/91)

<1991Feb23.064618.6167@looking.on.ca> brad@looking.on.ca (Brad Templeton):
>If your sys file is that complex, you should really be feeding with
>dynafeed.  It's in uunet:~/ClariNet/dynafeed.tar.Z
>
>This gives newsgroup by newsgroup control of the feeding, remote control,
>and best of all, automatic remote control based on arbitron-like output
>from the recipient site.

Brad,
	I appreciate your eagerness to promote your software, but not
all the world is on UNIX systems or on systems where they can simply
plug in some package and hope it will "go".  Feeding gateways to other
networks, and systems other than fairly vanilla B- or C-news derived
systems, is not helped by dynafeed.

If you can convince Waffle and FSUUCP and all the DOS-based newsreaders
and news consumers to support the dynafeed messages, then I will look at
it again.

You're a nice guy (at least you seem that way, even in the flesh) but
just a bit too flip in the delivery of your solution "pronouncements".
You might qualify as a real net.god, but don't forget the
net.feet.of.clay.

brad@looking.on.ca (Brad Templeton) (02/25/91)

In article <1991Feb24.023410.5490@wolves.uucp> ggw%wolves@cs.duke.edu (Gregory G. Woodbury) writes:
>Brad,
>	I appreciate your eagerness to promote your software, but not
>all the world is on UNIX systems or on systems where they can simply
>plug in some package and hope it will "go".  Feeding gateways to other
>networks, and systems other than fairly vanilla B- or C-news derived
>systems, is not helped by dynafeed.
>
>If you can convince Waffle and FSUUCP and all the DOS-based newsreaders
>and news consumers to support the dynafeed messages, then I will look at
>it again.

Dynafeed uses a software tools approach, so it can indeed help those sites.
It consists of three parts:

a) A feeding program that takes a .newsrc and spits out news batches of
"unread" articles, either as true batches, or file lists AKA the "togo"
file of C news.   This part is dependent on the Bnews/Cnews (unix) news
directory structure, one article per file, etc.   But to be honest, that
could be changed pretty easily.   This is run on the *feed* site, so it
doesn't have to be ported to deal with DOS based *leaf* sites.

b) A "newsrc merging" program which takes a subscription request (I want
group X, or groups that match regular expression R, etc.) and updates the
.newsrc file.  This one is not unix dependent at all, although it depends
on the fairly simple B/C "active" file and its format.   It is also only
run on the feed site.  It is optional, in that you can add and delete
newsgroups from the .newsrc file by hand with a text editor.

c) An arbitron-like program that runs on the recipient site to generate
subscription requests to be emailed or uux'd to the merging program.
This is not unix dependent either.  It knows the .newsrc format, but could
be adapted really easily to understand other formats.   For subscription
purposes, all it really wants is to know what groups people subscribe to.
You could write a batch file to do this with just about any subscription
file in the world in 5 minutes -- gather up group names, and remove duplicates.
(2nd step optional.)
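
The "gather up group names, and remove duplicates" step might look like this for .newsrc-style files. This is a sketch of the idea only, not dynafeed's code or its request format. The name `subscribed_groups` is made up, and the sketch assumes the usual .newsrc convention that `group:` marks a subscribed group and `group!` an unsubscribed one.

```sh
# subscribed_groups: hypothetical helper, not part of dynafeed.
# Print the distinct subscribed group names found in the given
# .newsrc-style files (lines of the form "group: article-ranges").
subscribed_groups() {
    awk -F: '/^[^!: ]+:/ { print $1 }' "$@" | sort -u
}
```

For example, `subscribed_groups /home/*/.newsrc` would list every group that at least one user on the recipient site reads, assuming home directories live under /home.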

The only things that are unix dependent are the feed program's use of the
standard news database and active file, and its use of pipes to output
news batches directly, as well as the merging program and arbitron program's
use of the active file.   There are various shell scripts included to put
it all together, but they are just one example of ways to use the tools.

duncan@comp.vuw.ac.nz (Duncan McEwan) (02/27/91)

henry@zoo.toronto.edu (Henry Spencer) writes:

>Uh, *what* "line length" limits?  There are, by intent, none of any kind
>in C News.  We've got no shortage of customers using truly ridiculous line
>lengths for very selective feeds.

I understood long sys file lines to be a problem for some of the
awk scripts on systems with awks that limit line lengths.  I realise
that this is not a *bug* in Cnews, but it is still a *problem* with
Cnews that splitting into multiple sys file entries works around,
and that use of continuation lines does not.  If I have misunderstood
the situation I am open to correction ...

Having said that, I'm not sure that I like the original poster's
suggestion of the "sort -u".  News arrives out of order often enough
already without making it worse...  Perhaps a program that reads
the togo file, stat()ing each file and remembering which inodes it has
already seen, would do the job.
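
A minimal sketch of that idea in shell, under stated assumptions: the first field of each togo line is an article path relative to the spool directory given as the argument, and the several names of a crossposted article are hard links to one file, so they share an inode. The name `togo_uniq_inode` is invented here; nothing like it ships with C News.

```sh
# togo_uniq_inode: hypothetical helper, not part of C News.
# Usage: togo_uniq_inode SPOOLDIR <togo >togo.new
# Keeps the first togo line for each distinct article inode and
# preserves the original order of the lines that remain.
togo_uniq_inode() {
    spool=$1
    while IFS= read -r line; do
        path=${line%% *}                  # first field: article path
        # "ls -i" prints the inode number in front of the name
        inode=$(ls -di "$spool/$path" 2>/dev/null | awk '{ print $1 }')
        # If the article has vanished, fall back to the path as the key
        if [ -n "$inode" ]; then key="i$inode"; else key="p$path"; fi
        printf '%s %s\n' "$key" "$line"
    done |
    awk '{
        key = $1
        sub(/^[^ ]+ /, "")                # strip the key again
        if (!(key in seen)) { seen[key] = 1; print }
    }'
}
```

It runs one `ls` per togo line, so it is slower than a plain sort, but it leaves the surviving lines in the order they arrived.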

Duncan

henry@zoo.toronto.edu (Henry Spencer) (02/28/91)

In article <1991Feb27.064651.1600@comp.vuw.ac.nz> duncan@comp.vuw.ac.nz (Duncan McEwan) writes:
>>Uh, *what* "line length" limits? ...
>
>I understood long sys file lines to be a problem for some of the
>awk scripts on systems with awks that limit line lengths...

This is a nuisance but, last I looked, doesn't actually cause any dire
problems.

garyb@abekrd.co.uk (Gary Bartlett) (03/01/91)

In <1991Feb24.000606.3890@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>I don't see any advantage to using multiple sysfile entries over just
>using backslash-newline to split a single entry.

What about when you need to send different 'classes' of news to the same site,
eg
	i)  all files in the 'comp' newsgroup tree and
	ii) all local files no matter which group

so:

	site:comp:f:
	site-local/site:all,!comp:Lf:site/togo

would be relevant entries in the sys file. ('/site' used so that news from
'site' would not be sent back to 'site' via the 'site-local' entry, and '!comp'
so that locally generated 'comp' articles were only sent once via 'site'). Can
_this_ be done without using multiple sys file entries?

Gary

---------------------------------------------------------------------------
Gary C. Bartlett               NET: garyb@abekrd.co.uk
Abekas Video Systems Ltd.     UUCP: ...!uunet!mcsun!ukc!pyrltd!abekrd!garyb
12 Portman Rd,   Reading,    PHONE: +44 734 585421
Berkshire.       RG3 1EA.      FAX: +44 734 567904
United Kingdom.              TELEX: 847579

henry@zoo.toronto.edu (Henry Spencer) (03/03/91)

In article <1991Feb28.194541.9400@abekrd.co.uk> garyb@abekrd.co.uk (Gary Bartlett) writes:
>>I don't see any advantage to using multiple sysfile entries over just
>>using backslash-newline to split a single entry.
>
>What about when you need to send different 'classes' of news to the same
>site...  Can
>_this_ be done without using multiple sys file entries?

No, but that's not what the original question was about.