[news.software.b] What to do with unknown newsgroups.

rick@seismo.UUCP (03/13/87)

Currently, 2.11 news goes to great pains to keep unknown newsgroups
on the Newsgroups line, but not localize them. This has caused many
people to change their mind and claim that this is really wrong.

A favorite example is something posted to misc.jobs,ut.jobs from
ut-sally ending up in the "local" ut jobs group at U of Toronto.

Does anyone still think that the unknown groups should be preserved or
should a future patch remove the unknown newsgroups?

---rick

stephen@comp.lancs.ac.uk (Stephen J. Muir) (03/15/87)

In article <43152@beno.seismo.CSS.GOV> rick@seismo.CSS.GOV (Rick Adams) writes:
>Does anyone still think that the unknown groups should be preserved or
>should a future patch remove the unknown newsgroups?

It should remove them.  A more important reason is that, for example, although
Europe doesn't get talk.* newsgroups, many sites think we do because of the
cross-posted articles from other newsgroups we *do* get!  Then these users post
to a talk.* newsgroup and those articles get junked with no warning to the
poster.

Also, my /usr/lib/news/errlog keeps filling up with "unknown newsgroup
talk.whatever not localized".  This file should not grow quickly or news
administrators will be discouraged from looking at it.
-- 
EMAIL:	stephen@comp.lancs.ac.uk	| Post: University of Lancaster,
UUCP:	...!mcvax!ukc!dcl-cs!stephen	|	Department of Computing,
Phone:	+44 524 65201 Ext. 4120		|	Bailrigg, Lancaster, UK.
Project:Alvey ECLIPSE Distribution	|	LA1 4YR

jbuck@epimass.UUCP (Joe Buck) (03/16/87)

Sorry this is so long, but I think a detailed discussion is needed.

In article <43152@beno.seismo.CSS.GOV> rick@seismo.CSS.GOV (Rick Adams) writes:
>Currently, 2.11 news goes to great pains to keep unknown newsgroups
>on the Newsgroups line, but not localize them. This has caused many
>people to change their mind and claim that this is really wrong.

I think it's right.  The standards document said this was how news
was supposed to behave; 2.11 is the first one to get this right.
It's now less critical for every non-leaf site to have a correct
active file; either the article will be rejected (and possibly flow
around the bad site) or the Newsgroups line will be left alone.

For example, let's say site "hao" (for example) decides not to
accept talk groups.  Under 2.10.x, to avoid trashing the Newsgroups
line on cross-posted articles, they'd have to maintain a line in
active, and a directory, for all the talk groups.  Local readers
(including the manager who ordered "talk" to be cut off) could still
read the talk groups, though only cross-posted articles would be
present.  The alternative (Unknown group talk.philosophy.misc removed)
would be to ruin the Newsgroups line, since the article may
eventually get to a part of the network where talk exists.

Under the new regime, hao can simply remove the talk groups.  No
hassle.  The sys admin doesn't have to keep track of "talk" group
creations and deletions, and other sites don't get mad at them.
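
The difference between the two behaviours can be sketched in a few lines
of Python.  This is only an illustration of the argument above, not the
B news code: the function name `receive`, the `strip_unknown` flag, and
the sample active set are all made up for the example.

```python
def receive(newsgroups, active, strip_unknown):
    """Return (header to pass on, groups the article is filed under locally).

    newsgroups    -- the incoming Newsgroups: value, comma separated
    active        -- set of group names in this site's active file
    strip_unknown -- True models 2.10.x as described above, False models 2.11
    """
    groups = [g.strip() for g in newsgroups.split(",") if g.strip()]
    known = [g for g in groups if g in active]
    if not known:
        # No valid group at this site: the article is not accepted here.
        return None, []
    kept = known if strip_unknown else groups
    return ",".join(kept), known


# Hypothetical active file for a site that has removed the talk groups.
hao_active = {"sci.math", "misc.jobs"}
header = "sci.math,talk.philosophy.misc"

print(receive(header, hao_active, strip_unknown=True))
# ('sci.math', ['sci.math'])
print(receive(header, hao_active, strip_unknown=False))
# ('sci.math,talk.philosophy.misc', ['sci.math'])
```

In both cases the article is filed only under sci.math locally; the
difference is what the next site downstream gets to see.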

>A favorite example is something posted to misc.jobs,ut.jobs from
>ut-sally ending up in the "local" ut jobs group at U of Toronto.

Hmm.  Well, nothing's perfect.  The problem is that there's no way to
say that the two ut.jobs groups are different.  What if the two
groups were "sci.math,talk.philosophy.misc" (this has been a fairly
common cross-posting at times)?  Then this is exactly the behavior
you want.  Go back and read Mark Horton's document (in the doc
directory with the 2.11 distribution).  He explains this point.  

One solution that's sure to make everyone scream: make all
distributions unique.  But it's too late for that.

>Does anyone still think that the unknown groups should be preserved or
>should a future patch remove the unknown newsgroups?

I think it's more critical than ever (with overloaded backbone sites
dropping whole categories of groups) that the current 2.11 behavior
be preserved.  I think the new approach causes fewer problems than
the old approach.

Another point.  Say you're a leaf site (or otherwise totally
dependent on a single upstream site, though you may feed others) and
the upstream administrator has a missing group.  Under 2.10.x the
upstream site strips names; you have no idea anything's wrong.  Under
2.11 the name of the new group is there, in cross-postings.  You get
messages in the log file about it (unknown group not localized) so
you can ask the upstream administrator what is going on.



-- 
- Joe Buck 	{hplabs,ihnp4,sun,ames}!oliveb!epimass!jbuck
		seismo!epiwrl!epimass!jbuck  {pesnta,tymix,apple}!epimass!jbuck
  Entropic Processing, Inc., Cupertino, California

rick@seismo.CSS.GOV (Rick Adams) (03/17/87)

> you want.  Go back and read Mark Horton's document (in the doc
> directory with the 2.11 distribution).  He explains this point.  

Unfortunately Horton has reversed his position and favors removal. That's
what prompted the general question.

(The current opinion is about 60% for removal and 40% for keeping. I was
hoping for something more definitive)

---rick

grr@cbmvax.UUCP (03/17/87)

In article <43159@beno.seismo.CSS.GOV> rick@seismo.CSS.GOV (Rick Adams) writes:
>
>Unfortunately Horton has reversed his position and favors removal. That's
>what prompted the general question.
>
>(The current opinion is about 60% for removal and 40% for keeping. I was
>hoping for something more definitive)

I definitely favor keeping the groups -

1) it bridges gaps in distributions, allowing the articles to fall into
   their intended groups no matter how they arrive.  this allows the reader
   to distinguish a cross posted article from an apparently mis-posted one.

2) it allows news to flow during name changes and similar confusions by
   simply putting both groups in the header.  the current moribund state
   of mod.amiga.{sources,binaries} is a case in point.


Perhaps some thought should be given to naming conventions for regional and
local groups and stripping groups that are being exported from a distribution
class i.e. local, regional, net...
-- 
George Robbins - now working for,	uucp: {ihnp4|seismo|rutgers}!cbmvax!grr
but no way officially representing	arpa: cbmvax!grr@seismo.css.GOV
Commodore, Engineering Department	fone: 215-431-9255 (only by moonlite)

dhb@rayssd.RAY.COM (David H. Brierley) (03/17/87)

In article <297@dcl-csvax.comp.lancs.ac.uk> stephen@comp.lancs.ac.uk (Stephen J. Muir) writes:
>In article <43152@beno.seismo.CSS.GOV> rick@seismo.CSS.GOV (Rick Adams) writes:
>>Does anyone still think that the unknown groups should be preserved or
>>should a future patch remove the unknown newsgroups?
>
>It should remove them.  A more important reason is that, for example, although
>Europe doesn't get talk.* newsgroups, many sites think we do because of the
>cross-posted articles from other newsgroups we *do* get!  Then these users post
>to a talk.* newsgroup and those articles get junked with no warning to the
>poster.

To get the most functionality, I think the software should do both.  The unknown
group should be preserved in the copy of the message that is sent on to other
sites but should be removed in the copy that is stored locally.  This way, the
other sites can still see the cross posting if they want to but local users are
not confused by the strange newsgroups name.  Unfortunately, I realize that this
would probably require some extensive changes to rnews/inews since the copy passed
along to other machines would now be different than the one stored locally.
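
A rough sketch of the "do both" idea, in Python rather than the C of
rnews/inews.  The name `split_copies` and the dictionary representation
of the headers are assumptions made for the illustration; nothing like
this exists in 2.11.

```python
def split_copies(headers, active):
    """Return (forwarded_headers, stored_headers) for one incoming article.

    headers -- dict of header name to value, as received
    active  -- set of group names this site carries
    """
    groups = [g.strip() for g in headers["Newsgroups"].split(",")]
    local = [g for g in groups if g in active]

    forwarded = dict(headers)                # passed downstream untouched
    stored = dict(headers)
    stored["Newsgroups"] = ",".join(local)   # what local readers see
    return forwarded, stored


# Hypothetical site that carries sci.math but no talk groups.
incoming = {"Newsgroups": "sci.math,talk.philosophy.misc", "Subject": "test"}
forwarded, stored = split_copies(incoming, {"sci.math"})
print(forwarded["Newsgroups"])   # sci.math,talk.philosophy.misc
print(stored["Newsgroups"])      # sci.math
```

The cost is visible even here: every article now exists in two forms,
which is the extra work in rnews/inews mentioned above.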

>Also, my /usr/lib/news/errlog keeps filling up with "unknown newsgroup
>talk.whatever not localized".  This file should not grow quickly or news
>administrators will be discouraged from looking at it.

This doesn't seem like a real big problem to me.  I have an awk program that runs
every night that analyzes the log file and sends me a summary via email.  If my
site did not support the talk. groups I would just add a line to the awk script
that recognized "unknown newsgroup talk." and threw it away.  It doesn't take
much effort to write an awk script to analyze the log file; you can probably even
use the ones Erik Fair wrote (they have been posted to the net a couple of times).
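
For anyone who wants a starting point, the filtering step might look
something like the following.  It is written in Python here, not awk,
and it is not Erik Fair's script.  It assumes each errlog line is just
the bare message text quoted earlier in this thread; a real log with
timestamps or site names would need the line normalised first.  The
names `IGNORE` and `summarize` are invented for the example.

```python
import re
from collections import Counter

# Messages this (hypothetical) site has decided are routine.
IGNORE = re.compile(r"unknown newsgroup talk\.", re.IGNORECASE)


def summarize(lines):
    """Count each distinct log message, skipping the ones we ignore."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or IGNORE.search(line):
            continue
        counts[line] += 1
    return counts


sample = [
    "unknown newsgroup talk.whatever not localized",
    "unknown newsgroup ut.jobs not localized",
    "unknown newsgroup ut.jobs not localized",
]
for message, n in summarize(sample).most_common():
    print(n, message)
# 2 unknown newsgroup ut.jobs not localized
```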
-- 
David H. Brierley
Raytheon Submarine Signal Division; Portsmouth RI; (401)-847-8000 x4073
smart mailer or arpanet: dhb@rayssd.ray.com
old dumb mailer or uucp: {cbosgd,gatech,ihnp4,linus!raybed2} !rayssd!dhb

eppstein@tom.columbia.edu (David Eppstein) (03/17/87)

Obviously some groups in some contexts (talk in na) should be kept in
the message regardless of whether the link wants that group.

Obviously other groups and contexts (the two different uts outside their
respective universities) would be better stripped.

So, the obvious solution is to make both possible on a group-by-group basis.
Why has no one else already proposed such a solution?
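
As a sketch of what "group-by-group" could mean in practice, assume a
small table of patterns with a keep-or-strip action for each.  The
table format, the pattern syntax (shell-style wildcards), and the names
`POLICY`, `policy_for`, and `rewrite` are all hypothetical; no such
file exists in 2.11.

```python
from fnmatch import fnmatchcase

# Hypothetical policy table; the first matching pattern wins.
POLICY = [
    ("talk.*", "keep"),    # unknown here, but valid elsewhere on the net
    ("ut.*", "strip"),     # regional name that clashes between sites
    ("*", "keep"),         # default: leave the header alone
]


def policy_for(group, table=POLICY):
    """Return "keep" or "strip" for a group this site does not carry."""
    for pattern, action in table:
        if fnmatchcase(group, pattern):
            return action
    return "keep"


def rewrite(newsgroups, active, table=POLICY):
    """Return the Newsgroups value to pass on to other sites."""
    out = []
    for group in newsgroups.split(","):
        group = group.strip()
        if group in active or policy_for(group, table) == "keep":
            out.append(group)
    return ",".join(out)


print(rewrite("sci.math,talk.philosophy.misc", {"sci.math"}))
# sci.math,talk.philosophy.misc
print(rewrite("misc.jobs,ut.jobs", {"misc.jobs"}))
# misc.jobs
```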
-- 
David Eppstein, eppstein@cs.columbia.edu, Columbia U. Computer Science Dept.

piet@mcvax.cwi.nl (Piet Beertema) (03/17/87)

	>Does anyone still think that the unknown groups should be preserved or
	>should a future patch remove the unknown newsgroups?
Remove 'em. I see no reason to preserve newsgroups that aren't
received locally (or that aren't even received in e.g. Europe).
On the contrary: it's utterly confusing if such newsgroups are
preserved. That's also exactly the reason why worldwide
checkgroups control messages are unwanted: large parts of the
world do *not* receive all newsgroups.


-- 
	Piet Beertema, CWI, Amsterdam
	(piet@cwi.nl  or  mcvax!piet)

heiby@mcdchg.UUCP (Ron Heiby) (03/18/87)

I also favor the current 2.11 behaviour of maintaining the newsgroups
in the header as posted (or, of course, modified by aliases) for the
reasons given by several others in this forum.

In article <297@dcl-csvax.comp.lancs.ac.uk> stephen@comp.lancs.ac.uk (Stephen J. Muir) writes:
>Also, my /usr/lib/news/errlog keeps filling up with "unknown newsgroup
>talk.whatever not localized".  This file should not grow quickly or news
>administrators will be discouraged from looking at it.

The way I deal with this is the following code in the .profile for the netnews
administrator login.  (Note that my LIBDIR is not in the default location.)
Maybe I'm trimming too much out of errlog.  This seems to get rid of the trash.
-----
# Strip routine messages from errlog, then report whatever is left.
ERRLOG=$HOME/lib/errlog		# LIBDIR is not in the default location here
if [ -s "$ERRLOG" ]
then
	# egrep can read the file itself; the extra cat is not needed.
	egrep -v 'Unknown newsgroup|Newsgroups in active|Duplicate|Aliased|Orphaned|unopenable' "$ERRLOG" > "$ERRLOG.new"
	mv "$ERRLOG.new" "$ERRLOG"
fi
if [ -s "$ERRLOG" ]
then
	echo "Stuff in $ERRLOG!"	# if there's still something
fi
-----
-- 
Ron Heiby, mcdchg!heiby		Moderator: mod.newprod & mod.os.unix
Motorola Microcomputer Division (MCD), Schaumburg, IL
"Save your energy.  Save yourselves.  Avoid the planet 'cuae2' at all costs!"

henry@utzoo.UUCP (Henry Spencer) (03/18/87)

The C news implementors (Geoff and I) favor retaining unknown groups, due
to a fundamental belief that the right thing to do with headers is to leave
them alone.  (The need to compromise this for Path and Xrefs is annoying.)
The problem of name clashes between regional groups is indeed troublesome,
but just stripping out unknown groups is not a good solution.
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

dave@rsch.wisc.edu (Dave Cohrs) (03/19/87)

It seems that what we need is a way to tell inews what to localize
and what not to localize, without changing the headers, of course.

A quick (meaning I didn't spend hours thinking about this) solution
would be to add another control file which lists distributions and
the sites that can post to them.  Something like:

comp    all
soc     all
uw      *.wisc.edu
etc,etc

[ let's assume such a file lives on uwvax, aka rsch.wisc.edu ]

This way, if uwvax receives a message for "misc.forsale,uw.general", it
would look to see which site originated the message.  If it isn't in the
list of "localizable" sites, it doesn't localize the distribution.
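
To make the lookup concrete, here is a minimal sketch in Python.  It
reads the three lines shown above and nothing more.  Several things are
assumptions of the sketch and not part of the proposal: the
distribution is taken to be the first component of the group name,
"all" means any site, site patterns are shell-style wildcards,
distributions missing from the file are never localized, and the
question of how the originating site is found (Path, From, or something
else) is left open.  The names `parse_control` and `may_localize` are
invented.

```python
from fnmatch import fnmatchcase

CONTROL = """\
comp    all
soc     all
uw      *.wisc.edu
"""


def parse_control(text):
    """Map each distribution name to the site pattern allowed to post to it."""
    rules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            rules[fields[0]] = fields[1]
    return rules


def may_localize(group, origin, rules):
    """True if an article for `group` from site `origin` should be localized."""
    distribution = group.split(".", 1)[0]
    pattern = rules.get(distribution)
    if pattern is None:
        return False   # assumption: unlisted distributions are not localized
    return pattern == "all" or fnmatchcase(origin, pattern)


rules = parse_control(CONTROL)
print(may_localize("uw.general", "rsch.wisc.edu", rules))    # True
print(may_localize("uw.general", "seismo.css.gov", rules))   # False
print(may_localize("comp.misc", "seismo.css.gov", rules))    # True
```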

Comments?

Dave Cohrs                                             Proud member of NOTHING
+1 608 262-2196                             UW-Madison Computer Sciences Dept.
dave@rsch.wisc.edu               ...!{harvard,ihnp4,seismo,rutgers}!uwvax!dave

jgd@uwmcsd1.UUCP (03/22/87)

Numerous previous articles have made a case for zapping "unknown" newsgroups
from the Newsgroups: header line.

[Lighting a match... ]

Much as I hate to do this [ :-) ], I will argue against this change by citing
the *documented STANDARD*.  (It's a dirty trick, but someone has to do it!)


==> RFC 850                                         June 1983
==>        Standard for Interchange of USENET Messages
==>                       Mark R. Horton
==> 
==> 
==> [ This memo is distributed as an RFC  only  to  make  this
==> information  easily  accessible to researchers in the ARPA
==> community.  It does not specify  an  Internet  standard. ] 
==> 
==> 1.  Introduction
==> 
==> This document defines the standard format for  interchange
==> of Network News articles among USENET sites.  It describes
==> the format for  articles  themselves,  and  gives  partial
==> standards for transmission of news.  ...
==> 
==> 	[Non-germane text deleted]
==> 
==> 2.1.5  Newsgroups  The  Newsgroups  line  specifies  which
==> newsgroup  or newsgroups the article belongs in.  ...
==> 
==> If an article is received with a Newsgroups  line  listing
==> some  valid newsgroups and some invalid newsgroups, a site
==> should  not  remove  invalid  newsgroups  from  the  list.
==> Instead,  the  invalid  newsgroups should be ignored.  For
==> example,  suppose  site  A  subscribes  to   the   classes
==> "btl.all"   and   "net.all",   and exchanges news articles
==> with site B,  which  subscribes  to   "net.all"   but  not
==> "btl.all".      Suppose   A   receives   an  article  with
==> "Newsgroups: net.micro,btl.general".     This  article  is
==> passed  on  to  B because B receives net.micro, but B does
==> not receive btl.general.  A must leave the Newsgroup  line
==> unchanged.   If  it  were  to  remove  "btl.general",  the
==> edited header could  eventually  reenter  the    "btl.all" 
==> class,  resulting in an article that is not shown to users
==> subscribing  to   "btl.general".    Also,  followups  from
==> outside  "btl.all"  would not be shown to such users.

Now, although the above cited RFC does not purport to be an Internet
standard, it *does* claim to reflect USENET standards.  I submit that
News B.2.11 conforms to this standard (at least insofar as section
2.1.5 is concerned.)
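
The RFC's example can be traced with a toy relay.  This is a sketch
only: sites are reduced to a set of group names (the RFC speaks of
classes such as "btl.all"), site C is a hypothetical third site added
to show the article re-entering the btl class, and `relay` is an
invented name.

```python
def relay(newsgroups, sites, strip_unknown):
    """Pass one article through `sites` in order.

    Returns a dict mapping each site name to the groups it files the
    article under.  With strip_unknown=True every site removes the
    groups it does not know before passing the article on.
    """
    header = [g.strip() for g in newsgroups.split(",")]
    filed = {}
    for name, active in sites:
        known = [g for g in header if g in active]
        filed[name] = known
        if strip_unknown:
            header = known
    return filed


sites = [
    ("B", {"net.micro"}),                  # gets net.all only
    ("C", {"net.micro", "btl.general"}),   # hypothetical: back inside btl.all
]

print(relay("net.micro,btl.general", sites, strip_unknown=True))
# {'B': ['net.micro'], 'C': ['net.micro']}
print(relay("net.micro,btl.general", sites, strip_unknown=False))
# {'B': ['net.micro'], 'C': ['net.micro', 'btl.general']}
```

With stripping, readers of btl.general at C never see the article, which
is exactly the loss section 2.1.5 warns about.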

Recent proposals are to "break" News 2.11.  (By removing "unknown" newsgroups
from the "Newsgroups:" header.)

If people want to change the way News works, they should change the
standards first.  (Or at least *propose* changing the standards, *THEN*
change the software.)  [This is a *religious* position -- don't argue! :-)]

Please keep in mind that as more and more sites start dropping branches of
the news directory tree, the situation described in the example of 2.1.5
will become more common.  If we start dropping "unknown" newsgroups from
the headers, (more?) little "black holes" will start appearing in USENET.
Enough articles get dropped on the floor as it is.  Besides, "junk" will
start getting more activity, and who *really* wants to read "junk"?

[Extinguishing match... just spotted Smokey the Bear.]
-- 
John G Dobnick
Computing Services Division @ University of Wisconsin - Milwaukee
UUCP: {ihnp4|uwvax|uwmacc}!uwmcsd1!jgd
INTERNET: jgd@csd1.milw.wisc.edu

"Knowing how things work is the basis for appreciation,
and is thus a source of civilized delight."  -- William Safire