[news.software.b] What should go in the References: line?

jef@well.UUCP (Jef Poskanzer) (11/19/89)

Email from Brad and the posting from Karl reveal that the only reason
they think there is a problem with  References: %i  is that their
software is badly written.  A competent programmer will have no
difficulty handling a message tree with only immediate parent
references.  A competent programmer will realize that this is all he
or she can reasonably depend on having.  The net should not feel
obliged to bear the burden of "% interp buffer overflow" and > 256
character lines just so that Brad and Karl can get by with simple
but fragile code.  The net is not a simple place, guys.  I'm sorry,
but you'll just have to adapt.

I have been using  References: %i  for three years, everyone I have
set up to use netnews uses it too, and we will continue to use it.
If you don't want *your* message trees fragmented, then fix *your*
software.
---
Jef

      Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
"...but no sooner does he take a pen in his hand than it becomes a torpedo to
            him, and benumbs all his faculties." -- Samuel Johnson

wcf@psuhcx.psu.edu (Bill Fenner) (11/19/89)

In article <14619@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
|If you don't want *your* message trees fragmented, then fix *your*
|software.

And what do I do about my feed's feed's feed, who drops every third article?
-- 
Bill Fenner                   wcf@hcx.psu.edu             ..!psuvax1!psuhcx!wcf
sysop@hogbbs.fidonet.org (1:129/87 - 814/238-9633)     ..!lll-winken!/

jef@well.UUCP (Jef Poskanzer) (11/19/89)

In the referenced message, wcf@psuhcx.psu.edu (Bill Fenner) wrote:
}In article <14619@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
}|If you don't want *your* message trees fragmented, then fix *your*
}|software.
}
}And what do I do about my feed's feed's feed, who drops every third article?

You are losing a third of the news and you want to complain about a
few missing message-ids?  You have a strange sense of priorities.

Get a new feed.  Better yet, get two.  Then install a tree-structured
news reader.  *Then* you can complain.
---
Jef

      Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
                         Close cover before striking.

wcf@psuhcx.psu.edu (Bill Fenner) (11/19/89)

In article <14624@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
|In the referenced message, wcf@psuhcx.psu.edu (Bill Fenner) wrote:
|}In article <14619@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
|}|If you don't want *your* message trees fragmented, then fix *your*
|}|software.
|}
|}And what do I do about my feed's feed's feed, who drops every third article?
|
|You are losing a third of the news and you want to complain about a
|few missing message-ids?  You have a strange sense of priorities.

Sorry, I suppose it wasn't clear enough that that was an exaggerated
hypothetical situation.
-- 
Bill Fenner                   wcf@hcx.psu.edu             ..!psuvax1!psuhcx!wcf
sysop@hogbbs.fidonet.org (1:129/87 - 814/238-9633)     ..!lll-winken!/

brad@looking.on.ca (Brad Templeton) (11/19/89)

It is not a matter of being a competent programmer -- let's not get
insulting.

It's a matter of being an efficient programmer.  Yes, you can chain
back through references, *if* you have gotten the whole chain (which
you quite often haven't, due to dropped articles, expired articles and
articles that come out of order.)

But chaining back one at a time is very inefficient, even in the (more
rare than you think) case where the whole chain is on your machine.

Let's say I've killed the tree with root <x>.  You're suggesting that I
either:
	a) For every article that comes through, chain backwards up to
	   its root, one article at a time, to see if it's <x>, or
	b) Every time an article comes in that references <x> or any
	   child of <x>, save that in the kill directory.

The first is too slow to be usable, the second uses immense amounts of
disk space, particularly since it is stored per user, not per site.
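
Option (b) can be sketched roughly as below.  This is an illustration only,
not code from any real news reader: `update_kill_set` and the in-memory
`killed` set are hypothetical names, and the sketch assumes articles arrive
after their parents.  Every article under <x> ends up stored, which is the
disk-space cost described above.

```python
def update_kill_set(killed, msg_id, references):
    """Option (b): remember every article that descends from a killed one.

    killed     -- set of message ids already known to be in a killed tree
    msg_id     -- id of the article that just arrived
    references -- contents of its References: line (may be a single id)
    Returns True if the article belongs to a killed tree.
    """
    if any(ref in killed for ref in references.split()):
        # The new article becomes a kill entry too, so that its own
        # children can be recognised from a parent-only References: line.
        killed.add(msg_id)
        return True
    return False


killed = {"<x>"}
update_kill_set(killed, "<y>", "<x>")   # child of <x>: killed and stored
update_kill_set(killed, "<z>", "<y>")   # grandchild: killed and stored
update_kill_set(killed, "<q>", "<p>")   # unrelated: left alone
```

The set grows by one entry per killed article, per user, for as long as the
tree stays alive.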

In fact, I want to display a series of articles as a tree with an
X client.  Now I have to chain through dozens of already-read articles
one article at a time just to see the tree?

If enough people are going to be annoying and break the references line,
then the only choice may well be to have rnews programs repair them
as they come in.  But repairs will never be perfect, particularly if
*any* part of the chain gets lost.

Or, in the end, the readers may have to just throw away articles that
don't have a usable header.   That's what humans do on the net
these days.  Put a subject of '<none>' or "forwarded from bitnet" or
"orphaned response" on your message, and I just toss it.  I haven't the
time to piece through to find out what the article's about.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

brad@looking.on.ca (Brad Templeton) (11/19/89)

It occurs to me that if we can't stop people from breaking the references
line, can we at least define a standard for indicating that info has
been removed?

If we *know* that the line has been truncated or removed, we can usually
do some repair.

And the nice thing is, only a few sites have to go about repairing, as
everybody downstream gets the repaired article.

For example, if there is NO references line and a "re:" subject, we know
we need to repair.  We can usually search for the subject in other messages
in the group to find the root.

Or if that fails (or in addition) we can search for things like:
	"In <xxx@yyy>, foo writes"
to try to get the parent.

Or failing that, look at included text and try to find the parent(s) with
it.

In other words, repair is possible -- if we know we need to do it.  This is
too slow a process to do on every incoming article.
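
The attribution-line search could look something like the sketch below.  It
is a minimal illustration, not real rnews code: `ATTRIBUTION` and
`guess_parent` are hypothetical names, and the pattern is assumed to cover
only the "In <id>" and "In article <id>" styles.

```python
import re

# Matches an unquoted attribution line such as
#   In article <14619@well.UUCP> Jef Poskanzer writes:
#   In <xxx@yyy>, foo writes
ATTRIBUTION = re.compile(r"^In (?:article )?(<[^<>\s]+@[^<>\s]+>)", re.IGNORECASE)


def guess_parent(body_lines):
    """Return the first message id named in an attribution line, or None."""
    for line in body_lines:
        match = ATTRIBUTION.match(line.strip())
        if match:
            return match.group(1)
    return None  # nothing usable; the caller must try another heuristic


body = [
    "In article <14619@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:",
    "|If you don't want *your* message trees fragmented, then fix *your*",
    "|software.",
]
parent = guess_parent(body)
```

A poster who writes "In the referenced message, ..." gives this search
nothing to find, so it can only ever be a fallback.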

Unfortunately, even these repairs may not work if the parent of a
References: %i
article never arrives, or comes in late.  Sigh.


Perhaps people should use:

References: <DELETED@DELETED> %i

instead?   or <?> or somesuch.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

trp@b-tech.ann-arbor.mi.us (Thomas Parker) (11/20/89)

>software is badly written.  A competent programmer will have no
>difficulty handling a message tree with only immediate parent
>references.  A competent programmer will realize that this is all he

Even a competent programmer cannot write software to follow a linked 
list where links are missing.  This will be the case for news.

karl@ddsw1.MCS.COM (Karl Denninger) (11/21/89)

In article <14619@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
>Email from Brad and the posting from Karl reveal that the only reason
>they think there is a problem with  References: %i  is that their
>software is badly written.  A competent programmer will have no
>difficulty handling a message tree with only immediate parent
>references.  A competent programmer will realize that this is all he
>or she can reasonably depend on having.  The net should not feel
>obliged to bear the burden of "% interp buffer overflow" and > 256
>character lines just so that Brad and Karl can get by with simple
>but fragile code.  The net is not a simple place, guys.  I'm sorry,
^^^^^^^^^^^^^^^^^^
>but you'll just have to adapt.
>
>I have been using  References: %i  for three years, everyone I have
>set up to use netnews uses it too, and we will continue to use it.
>If you don't want *your* message trees fragmented, then fix *your*
>software.

I'll be more than happy to modify our software (we do it all the time) when
you explain how a computer can manage to mind-read when parts of a tree are
lost.... and all the software has is a reference to another article WHICH 
HAS NOT ARRIVED HERE AND NEVER WILL.

Now how does my system, or any other, figure out which thread that article
belongs to?  "Guessing" based on the subject line is a bad answer, as it's
often user-modified, and may be repeated even when the actual subject is 
DIFFERENT.

Hell, I can't mind-read.  How can a computer manage it?

My solution for now?  DUMP THE OFFENDING ARTICLE.  Sure, I could file it as
another "base"..... but have chosen not to do that, as I'd rather have fewer
articles in the message base than 50 threads all on the same subject.

I've no problem with handling immediate references >if< propagation is
perfect.  It isn't, and that is something you are IGNORING.

Bad coding?  No.  Operating under the (correct) assumption that articles are
delayed, lost, and mangled?  Yep.

You aren't going to get >256 character lines if you trim the references line
to ONE parent article and the BASE ITEM.  You also won't get them if your 
software does the recommended thing with oversize lines and uses the 
continuation capabilities (hint: try "<tab> xxxxxx" after the first line;
that's in the RFC too).  
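
Trimming to the base item plus one parent might look like this when a
followup is generated.  A sketch only: `followup_references` is a
hypothetical helper, and it assumes the ids on the parent's References: line
are ordered oldest first, so that the first one is the base item.

```python
def followup_references(parent_references, parent_id):
    """Build a References: value holding the base item and the immediate parent.

    parent_references -- the parent's own References: value ("" if it had none)
    parent_id         -- the parent's Message-ID
    """
    ids = parent_references.split()
    base = ids[0] if ids else None
    if base is None or base == parent_id:
        # The parent is itself the base item.
        return parent_id
    return f"{base} {parent_id}"


line = followup_references("<1@a> <2@b> <3@c>", "<4@d>")
```

The result never holds more than two ids, whatever the depth of the thread.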

Our software deals with both situations in a rational manner, and I bet 
Brad's does too.

Blaming Brad's and my software as an example of "poor coding" is a bunch of
nonsense when you can easily output something reasonable in the reference
line.

--
Karl Denninger (karl@ddsw1.MCS.COM, <well-connected>!ddsw1!karl)
Public Access Data Line: [+1 708 566-8911], Voice: [+1 708 566-8910]
Macro Computer Solutions, Inc.		"Quality Solutions at a Fair Price"

jef@well.UUCP (Jef Poskanzer) (11/21/89)

Brad, normally I charge $75/hour for advice like this, but since you're
being such a dick I'll make an exception.

Yes, Usenet is unreliable.  Yes, articles arrive out of order or don't
arrive at all.  That's why you *can't* do it the way you want to.  The
basic problem with your approach is, what do you do if the "root"
arrives late or not at all?  And by induction the same problem applies
to the rest of the tree.  Your basic conceptual model of the problem is
useless.

You have to view it as a database problem.  Look at the questions you
want to ask, and figure out what kind of database you need to answer
them.  For instance, one question is: given a current article X and
a killed message id Y, is Y an ancestor of X?  The database you have
designed cannot answer this question in general.  It can only answer
the question if Y happens to be what you think of as a "root".  This is
a very bad design.

One database that can answer this question correctly is, given a
message id, return a list of ancestor message ids.  (I apologize to the
audience for belaboring the obvious, but Brad seems to need a little
hand-holding.)

Another question that one might want answered is, given a current
article X, what is the immediate parent article?  Your database can
only answer the less interesting question, what is the closest
preceding article that happens to be somewhere in the same "tree"?

The same database mentioned two paragraphs ago can handle this query,
assuming the message id lists are returned sorted.  And when the
question cannot be answered, it can tell you that, instead of lying.
In case this slipped by you, Brad, I'll emphasize it.  It is better
to report a failure when a question cannot be answered than to give
an incorrect answer.  Then the user can decide whether to ask a more
general question or not.
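
A rough sketch of such a database follows.  It is an illustration under
assumptions, not the design of "nn" or of any other real reader:
`ThreadIndex` and its methods are hypothetical names, the index lives in
memory, and the last id on a References: line is taken to be the immediate
parent.  The ancestor query has three possible answers: yes, no, and
"cannot tell".

```python
class ThreadIndex:
    def __init__(self):
        # message id -> parent id, or None for an article with no References:
        self._parent = {}

    def add(self, msg_id, references=""):
        ids = references.split()
        self._parent[msg_id] = ids[-1] if ids else None

    def ancestors(self, msg_id):
        """Return (ancestor ids, nearest first; whether the chain is complete)."""
        chain = []
        current = msg_id
        while True:
            if current not in self._parent:
                return chain, False      # this article has not arrived
            current = self._parent[current]
            if current is None:
                return chain, True       # reached an article with no parent
            if current == msg_id or current in chain:
                return chain, False      # malformed loop; claim nothing
            chain.append(current)

    def is_ancestor(self, candidate, msg_id):
        """True, False, or None when the question cannot be answered."""
        chain, complete = self.ancestors(msg_id)
        if candidate in chain:
            return True
        return False if complete else None


index = ThreadIndex()
index.add("<23@kudu.uucp>")
index.add("<666@elk.uucp>", "<23@kudu.uucp>")
index.add("<9@gnu.uucp>", "<666@elk.uucp>")
index.add("<77@yak.uucp>", "<70@yak.uucp>")   # <70@yak.uucp> never arrived
```

The immediate parent is the first element of the returned list.  A None
from `is_ancestor` is the reported failure: the chain broke before the
question could be settled.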

Brad, I suggest you take a look at "nn" for a sample implementation
of a database derived from netnews.
---
Jef
                                   
  Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
                         "How's it goin', eh?"

jef@well.UUCP (Jef Poskanzer) (11/22/89)

In the referenced message, karl@ddsw1.MCS.COM (Karl Denninger) wrote:
}My solution for now?  DUMP THE OFFENDING ARTICLE.

Posting 1:

    From: joe
    Subject: Cows
    Message-Id: <23@kudu.uucp>

Posting 2:

    From: bob
    Subject: Re: Cows
    Message-Id: <666@elk.uucp>
    References: <23@kudu.uucp>

Tell us, Karl, what your oh so robust software will do if posting 2 happens
to arrive before posting 1?

"Duhhhhhhhhhhhhh........"
---
Jef
                                   
  Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
 "We are all here on earth to help others.  What I can't figure out is
             what the others are here for." -- W. H. Auden

amanda@intercon.com (Amanda Walker) (11/22/89)

In article <19Nov89.535AAE456@b-tech.mi.org>, trp@b-tech.ann-arbor.mi.us
(Thomas Parker) writes:
> Even a competent programmer cannot write software to follow a linked 
> list where links are missing.  This will be the case for news.

Even aside from this pragmatic issue, there's another issue here that
has been peeking out from underneath Brad's & Karl's comments.  If a
perfectly valid news article causes RN to have problems, then a good
thing to do is to fix RN, *whatever workarounds may exist at the moment*.
Trimming the references line is just that: a workaround to a BUG in RN
that makes RN sensitive to line lengths.

It seems to be the case that other software has the same problem (BITNET
gateways, etc.), and this may mean that fixing RN is not enough.  Perhaps
we should change rnews & inews so that they wrap (not truncate) long header
lines to 80 (or 256, or whatever) columns.
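
Wrapping rather than truncating could be sketched like this.  It is an
assumption-laden illustration, not code from rnews or inews: `fold_header`
is a hypothetical name, breaks are made only between whitespace-separated
tokens, and each continuation line starts with a tab.

```python
def fold_header(name, value, width=80):
    """Fold a header onto continuation lines no longer than `width` columns.

    A single token longer than `width` is left on a line of its own rather
    than being split.
    """
    lines = []
    current = name + ":"
    has_token = False
    for token in value.split():
        if has_token and len(current) + 1 + len(token) > width:
            lines.append(current)
            current = "\t" + token   # continuation line begins with whitespace
        else:
            current += " " + token
        has_token = True
    lines.append(current)
    return "\n".join(lines)


folded = fold_header("References", "<1@a> <2@b> <3@c>", width=24)
unfolded = " ".join(folded.split())
```

Joining the pieces on whitespace recovers the original line, so no id is
lost in the process.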

Jef's version of the References line makes the problem invisible for most
people, but it doesn't solve it.  Eliminating References altogether would
make the problem invisible for most people, too...

Amanda Walker
InterCon Systems Corporation
--