[news.config] Solution to news dup site names

emv@mailrus.cc.umich.edu (Edward Vielmetti) (08/20/88)

In article <1474@datapg.MN.ORG> sewilco@datapg.MN.ORG (Scot E Wilcoxon) writes:
>In article <20246@tut.cis.ohio-state.edu> karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) writes:
>>sewilco@datapg.MN.ORG writes:
>>   News sites with the same name are invisible to each other.
>>
>>Why not just have fully-qualified domain names in the Path: header?
>
>To keep the Path: header from becoming inconveniently long.  This
>might be defined as either "too large for a buffer" or "longer than
>the text of the message".

True enough.  Long path names are a problem.  As matt@oddjob proved
some time this summer (matt?) there's a point at which the long
path line will cause the existing news software to break.  I don't
think it was # of hops, more like total character length (on
the order of 3 lines or 240 characters as I recall.)  Even
if the existing news software works, the longer the paths
in terms of # of hops and also # of characters the harder it
is for people with brain-dead news software and mailers (i.e.
without INTERNET defined) to reply to news postings easily.
That describes most of the AT&T news network, for instance.

I doubt that the addition of 7 characters to your name will
make a big difference, considering the leaf-ness of your site.
But people like Karl who shoot a lot of news around with
22 character site names might make a dent in the limits
some time, if the net grows too big.

The real problem is that "the net is getting too big", or at
the very least it's not unlikely that path lengths will grow,
not shrink, in the near future.  That is, unless you take
action to keep them in check.

What you can do to help this situation out in the long run is
to be sure that your own articles get propagated as widely and
cheaply as possible.  One good metric of this is "how many hops
does it take my posting to get to uunet?"  (I'm guilty in this
respect - we stopped feeding uunet when umix, our aging vax,
started to have mail back up because of news feeds, thus violating
the Prime Directive.)  If you know that you're one or two or
even three hops away from uunet, you could have a system name
like starbarlounge.upie.cc.umich.edu and no one would bat an eye.

If you're on the internet, and you have multiple feeds with NNTP,
and you've noticed that some of them "aren't worth it" because you
never end up sending any articles across that link - that's 
a perfect opportunity to cut down the size of the network.
Reduce the link to an L4 or L5 style connection instead of a
full feed, and you'll have only relatively local traffic pass
over that connection.  That'll reduce load on both machines,
not queueing up articles that don't have a good chance of 
being accepted on the far end.

I'm cross posting this to news.config because it's not just
a software issue, it's a topology issue.  

--Ed
usenet news admin, U of Michigan.

(mailrus - 7 characters, not bad....)

hokey@plus5.UUCP (Hokey) (08/21/88)

I haven't been following this discussion all that carefully, but has anybody
else considered that the problem would disappear if articles were only posted
from or relayed between registered sites?

This is a simple matter of administration.

Unregistered leaf sites can cause articles to be posted from their registered
hub using mail.
-- 
Hokey

peter@ficc.UUCP (Peter da Silva) (08/22/88)

How about a smart application of domainised paths?

If you have a domain name, or are registered, check the path. Only put your
full name in if nobody has already.

So, if you're tut.cis.ohio-state.edu, and the path looks like:

	Path: zardoz.omaha.nebraska.us!luser

Then you just tack your uucp name on, but if it looks like:

	Path: weerd!flamer!myibmpc!luser

Then you tack on the whole thing...

That way, paths would have at least one valid, reachable, name.
-- 
Peter da Silva  `-_-'  Ferranti International Controls Corporation.
"Have you hugged  U  your wolf today?"     sugar.uu.net!ficc!peter.

karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) (08/23/88)

emv@mailrus.cc.umich.edu writes:
   True enough.  Long path names are a problem.

Not in terms of the software.  I just grep'd the Path: header out of
971 articles in comp.unix.wizards and stuffed them through `wc.'  The
result is an average Path: length of a fraction over 80 chars/line.
That's including the "Path: " text, so make it 74 chars/line of actual
hostname-and-!'s.  Allowing for deeply `leafed' nodes might knock that
average length up to, say, 120.

In the news software, src/header.h defines the max length as char
path[PATHLEN], where src/defs.h contains
	#define PATHLEN 512     /* length of longest source string */
and thus we could quadruple or quintuple the length of a Path: header
before breaking software.

If the metric is instead "longer than the text of the article," then
the much larger problem is that every news article carries around 8 or
10 lines worth of stuff to keep track of a 2-line comment.  Longer
Path: lines are not significant in the face of all those other
headers.

--Karl

bob@allosaur.cis.ohio-state.edu (Bob Sutterfield) (08/25/88)

In article <1290@ficc.UUCP> peter@ficc.UUCP (Peter da Silva) writes:
>How about a smart application of domainised paths?
>
>If you have a domain name, or are registered, check the path. Only
>put your full name in if nobody has already.
>...
>That way, paths would have at least one valid, reachable, name.

But Path: lines aren't to be used for mail replies, only as an audit
trail ("Don't put that in your mouth, you don't know where it's been!" :-)
to describe the path of the article.  If the Path: already contains a
fully-qualified domain name and I therefore don't add to it, then the
connectivity information is lost.  There might as well be no Path:
line.

If you need a valid, reachable address for mail, you're supposed to
use the From: line.  Similarly, you're not supposed to put anything
else in a From: line.

We must be taking a lonely stance on this one - we're starting to be
used as a counterexample :-).
-=-
									--Bob
He probably just wants to take over my CELLS and then EXPLODE
 inside me like a BARREL of runny CHOPPED LIVER!  Or maybe he'd
 like to PSYCHOLOGICALLY TERRORISE ME until I have no objection
 to a RIGHT-WING MILITARY TAKEOVER of my apartment!!  I guess
 I should call AL PACINO!

peter@ficc.uu.net (Peter da Silva) (08/25/88)

In article <20673@tut.cis.ohio-state.edu>, bob@allosaur.cis.ohio-state.edu
	(Bob Sutterfield) writes:

> But Path: lines aren't to be used for mail replies, only as an audit
> trail ("Don't put that in your mouth, you don't know where it's been!" :-)

Well, if the mailpaths file worked right, there would be no problem.
Unfortunately, mailpaths doesn't work. In fact, if you use GENERICPATH
and INTERNET together, it tries to send local mail to your internet
location (because it doesn't figure out that site.domain and site
match). This also causes problems for cancel().

Where does one get 3.0 or C news?

Anyway, cheap or segmentised sites have to depend on the path.

> We must be taking a lonely stance on this one - we're starting to be
> used as a counterexample :-).

You were a handy example, suitable for cutting and pasting.
--
	Kent Paul Dolan/Zippy the Pinhead in '92.
-- 
Peter da Silva  `-_-'  Ferranti International Controls Corporation.
"Have you hugged  U  your wolf today?"     sugar.uu.net!ficc!peter.

ane@hal.UUCP (Aydin "Bif" Edguer) (08/25/88)

In article <20673@tut.cis.ohio-state.edu> bob@allosaur.cis.ohio-state.edu (Bob Sutterfield) writes:
 > In article <1290@ficc.UUCP> peter@ficc.UUCP (Peter da Silva) writes:
 > > If you have a domain name, or are registered, check the path. Only
 > > put your full name in if nobody has already.
 > > That way, paths would have at least one valid, reachable, name.
 > If the Path: already contains a fully-qualified domain name and 
 > I therefore don't add to it, then the connectivity information is lost.
 > There might as well be no Path: line.
Bob, you didn't quite read Peter correctly I think.  He said only
put your full name if no one has.  Otherwise just use your short uucp
name.  You still appear in the Path: line, no connectivity information
has been lost BUT only a short name (up to 7 chars rather than say
22 chars as in tut.cis.ohio-state.edu) has been added to the path.
I do not think he was saying add nothing to the path.

 > But Path: lines aren't to be used for mail replies, only as an audit
 > trail ("Don't put that in your mouth, you don't know where it's been!" :-)
 > to describe the path of the article.
That is the only reason why this would be okay.  However, I don't completely
agree that Path: lines are not used for mail replies.  If you have a
smart mailer (smail w/pathalias) or a convenient smarthost (uunet) who
has agreed to be your smarthost then Path: is an audit.  In the default
configuration though, Path: is used for mail replies, and thus MUST BE
VALID.  If the host ONLY responds to its fully qualified domain name
then it SHOULD use it.  With that additional qualification, I agree
with Peter.

 > We must be taking a lonely stance on this one - we're starting to be
 > used as a counterexample :-).
Your great connectivity helps to get you noticed.  Along with fame
comes notoriety :-)

Aydin Edguer					hal!ane or ane@hal.cwru.edu

peter@ficc.uu.net (Peter da Silva) (08/25/88)

I just rewrote the replyname function in funcs2.c to do a better job of
hacking the mailpaths file. This is only useful for people who have
INTERNET defined, and of course it's missing all the special code that
the Sun people put in to handle .OZ...

It also makes the assumption that the first element of the path is
the site name, but that seems safe.

----/*SNIP SNIP */----
/*
 * Drop-in replacement for replyname() in funcs2.c.  Relies on news
 * 2.11's struct hbuf, PATHLEN, LIB, PREFIX() and xfopen().
 */
char *
replyname(hptr)
struct hbuf *hptr;
{
	register char *name;
	register char *ptr;
	static char user[PATHLEN], path[PATHLEN];
	static char buf[PATHLEN], fmt[PATHLEN];
	FILE *mfd;

	/* Figure out where to send mail */
	if (hptr->replyto[0])
		name = hptr->replyto;
	else if (hptr->from[0])
		name = hptr->from;
	else {	/* Should never happen: use the path minus our site name */
		if ((name = strchr(hptr->path, '!')) != NULL)
			name++;
		else
			name = hptr->path;
	}

	/* Remove (User name @ Organisation) */
	strcpy(buf, name);
	name = buf;
	if ((ptr = strchr(name, ' ')) != NULL)
		*ptr = '\0';

	/* Break it up into path/site and user name */
	if ((ptr = strchr(name, '@')) != NULL) {		/* user@site */
		strncpy(user, name, ptr-name);
		user[ptr-name] = '\0';
		strcpy(path, ptr+1);
	} else if ((ptr = strrchr(name, '!')) != NULL) {	/* path!user */
		strcpy(user, ptr+1);
		strncpy(path, name, ptr-name);
		path[ptr-name] = '\0';
	} else	/* A local user */
		return name;

	sprintf(buf, "%s/mailpaths", LIB);
	mfd = xfopen(buf, "r"); /* Should probably do an fopen,
				   and just return path if not found.
				   This would make INTERNET unnecessary */

	/* If all else fails, fall back on the path minus our site name */
	if ((name = strchr(hptr->path, '!')) != NULL)
		name++;
	else
		name = hptr->path;

	/* Look for a mailpaths line starting with the poster's site/path */
	while (fgets(buf, sizeof buf, mfd)) {
		if (PREFIX(buf, path)) {
			sscanf(buf, "%*s %s", fmt);
			sprintf(buf, fmt, user);
			name = buf;
			break;
		} else if (PREFIX(buf, "internet")) { /* Should be last entry */
			sscanf(buf, "%*s %s", fmt);
			strcat(path, "!");
			strcat(path, user);
			sprintf(buf, fmt, path);
			name = buf;
			break;
		}
	}
	fclose(mfd);	/* don't leak a descriptor on every reply */
	return name;
}
-- 
Peter da Silva  `-_-'  Ferranti International Controls Corporation.
"Have you hugged  U  your wolf today?"            peter@ficc.uu.net

bob@allosaur.cis.ohio-state.edu (Bob Sutterfield) (08/26/88)

In article <281@hal.UUCP> ane@hal.cwru.edu (Aydin "Bif" Edguer) writes:
>Bob, you didn't quite read Peter correctly I think.  

That's entirely possible.

>He said only put your full name if no one has.  Otherwise just use
>your short uucp name.

There are several problems with that: feasibility (for UUCP-only
sites) and cost (for name resolver-capable sites) to validate that
upstream name, and the fact that only one of our machines is
registered in the UUCP maps, though that with a short name.

> > But Path: lines aren't to be used for mail replies, only as an
> > audit trail ("Don't put that in your mouth, you don't know where
> > it's been!" :-) to describe the path of the article.
>
>That is the only reason why this would be okay.  However, I don't
>completely agree that Path: lines are not used for mail replies.  If
>you have a smart mailer (smail w/pathalias) or a convenient smarthost
>(uunet) who has agreed to be your smarthost then Path: is an audit.
>In the default configuration though, Path: is used for mail replies,
>and thus MUST BE VALID.

I can only stand on the comments in section 2.1.6 of RFC 1036 (Horton
and Adams, December 1987) on the Path line.  To excerpt:

    This line shows the path the message took to reach the current
    system.
    ...
    The "Path" line is not used for replies, and should not be taken
    as a mailing address.  It is intended to show the route the
    message traveled to reach the local host.
    ... [though unfortunately] ...
    Special upward compatibility note: Since the "From", "Sender", and
    "Reply-To" lines are in Internet format, and since many USENET
    hosts do not yet have mailers capable of understanding Internet
    format, it would break the reply capability to completely sever
    the connection between the "Path" header and the reply function.
    [but still]
    It is recognized that the path is not always a valid reply string
    in older implementations, and no requirement to fix this problem
    is placed on implementations.

So, use of Path: instead of Reply: is described as a recognized and
tolerated thing that people out there tend to do, especially those
with older versions of the software.  Even those older versions may
not do it right to help each other.  Path: is used for replies in the
default configuration of 2.11 if you leave INTERNET undefined, but
there's no reason not to define INTERNET if you have LIBDIR/mailpaths
set up correctly.

Maintenance of valid mail links in the Path: line is encouraged, but
not required.  It's even common these days for the link not to exist
in a form usable for UUCP mail.  For example, your note, as it arrived
on our system, has a Path line like
tut.cis.ohio-state.edu!cwjcc!hal!ane, but there's no UUCP connection
between OSU and Case.  It came across NNTP.

Oh well, this is going on far too long and not really getting
anywhere.  It's sounding too much like a religious argument when
someone has to resort to excerpting quotations of scriptures :-)
-=-
									--Bob
YOW!!  What should the entire human race DO??  Consume a fifth
 of CHIVAS REGAL, ski NUDE down MT. EVEREST, and have a wild
 SEX WEEKEND!

emv@mailrus.cc.umich.edu (Edward Vielmetti) (08/26/88)

In article <20729@tut.cis.ohio-state.edu> bob@allosaur.cis.ohio-state.edu (Bob Sutterfield) writes:
>
>So, use of Path: instead of Reply: is described as a recognized and
>tolerated thing that people out there tend to do, especially those
>with older versions of the software.  Even those older versions may
>not do it right to help each other.  Path: is used for replies in the
>default configuration of 2.11 if you leave INTERNET undefined, but
>there's no reason not to define INTERNET if you have LIBDIR/mailpaths
>set up correctly.
>
>Maintenance of valid mail links in the Path: line is encouraged, but
>not required.  It's even common these days for the link not to exist
>in a form usable for UUCP mail.  For example, your note, as it arrived
>on our system, has a Path line like
>tut.cis.ohio-state.edu!cwjcc!hal!ane, but there's no UUCP connection
>between OSU and Case.  It came across NNTP.

Is "older versions of the software" also a reference to the uucp
that you're running on tut?  If you have t protocol support, it's
real easy for the Path: line to be a replyable one.

One alternative that no one has mentioned from the RFC is
this one, which would break I don't know how much software:

	tut.cis.ohio-state.edu, cwjcc!hal!ane

A quote from Scripture:

"Letters, digits, periods and hyphens are considered part of host
names; other punctuation, including blanks, are considered
separators."

As I say, I hate to think how much software relies on a "!"
character, but any other separator is legal.

--Ed
Edward Vielmetti, usenet news admin, U of Michigan.

Obligatory anti-Ohio slur: 
	Columbus: South until you smell it, east until you step in it.

mrm@sceard.UUCP (M.R.Murphy) (08/26/88)

In article <2530@plus5.UUCP> hokey@plus5.com (Hokey) writes:
>I haven't been following this discussion all that carefully, but has anybody
>else considered that the problem would disappear if articles were only posted
>from or relayed between registered sites?
>
>This is a simple matter of administration.
>
>Unregistered leaf sites can cause articles to be posted from their registered
>hub using mail.
>-- 
>Hokey

Speaking of att.com's recent decision to cease support for third-party mail,
must they burden the maps with their (DEDICATED) internal machines?-)
As far as saving the names for att, who cares (much:-) what internal names
they use. dweebish.att.com is a nice fully qualified domain name; it shouldn't
conflict with dweebish.drelb.com even a teensy. Let 'em unregister the names
so we can all grab the good ones. (All the good ones are taken ...)

Reading the maps is wholesome fun. Did you know there is a site named
f**kup in Europe (where is left as an exercise for the reader).

Regards,
Mike


-- 
Mike Murphy  Sceard Systems, Inc.  544 South Pacific St.  San Marcos, CA  92069
ARPA: sceard!mrm@nosc.MIL   BITNET: MURPHY@UCLACH
UUCP: ucsd!sceard!mrm     INTERNET: mrm%sceard.UUCP@ucsd.ucsd.edu

henry@utzoo.uucp (Henry Spencer) (08/26/88)

In article <20673@tut.cis.ohio-state.edu> bob@allosaur.cis.ohio-state.edu (Bob Sutterfield) writes:
>But Path: lines aren't to be used for mail replies, only as an audit
>trail ...
>If you need a valid, reachable address for mail, you're supposed to
>use the From: line...

That's the theory.  The practice is somewhat different.

(Don't forget the use of Path: for loop prevention, too.)
-- 
Intel CPUs are not defective,  |     Henry Spencer at U of Toronto Zoology
they just act that way.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

david@ms.uky.edu (David Herron -- One of the vertebrae) (08/27/88)

Another way to get short path names is to get yourself lots of news
feeds ... it does wonders!
-- 
<---- David Herron -- The E-Mail guy                         <david@ms.uky.edu>
<---- ska: David le casse\*'      {rutgers,uunet}!ukma!david, david@UKMA.BITNET
<---- Problem: how to get people to call ...; Solution: Completely reconfigure 
<---- your mail system then leave for a week's vacation when 90% done.