[news.software.b] Some possible fixes for the references line

brad@looking.UUCP (Brad Templeton) (05/13/89)

Here are some additional, easier, ways that I have thought of to deal
with the references line question.

a) Define a special message-id <?> which means, "this message refers to
	some message, but I'm not sure what."  It would go at the front
	of a References chain.

b) If software absolutely can't find the parent chain for an article it
   puts this <?> in.  (I still don't like this)

c) A patch to inews is made that detects either "re: " messages with
	no references line, or lines that have <?> in them.  This inews
	checks the newsgroups the article is posted in for articles that
	match the subject line.  If it finds one, it inserts the
	References line for that article into the chain, replacing the <?>.

This modified inews need only run at major feed sites.  They will repair
broken chains from smaller sites that don't bother upgrading.

This should fix most of the problem.  Of course, these 'orphaned responses' 8-)
may well get linked up with the wrong chain, but at least the root parent
will be there.
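A minimal Python sketch of the repair in (c). It assumes articles are plain dicts with Message-ID, Subject and References keys, and that "the newsgroups the article is posted in" are available as a list of stored articles, oldest first. The `spool` list and all function names are hypothetical, not real inews internals. Subjects are compared after stripping leading "Re:" prefixes, and the newest match wins:

```python
import re

UNKNOWN = "<?>"  # proposed placeholder: "refers to something, not sure what"


def base_subject(subject):
    # "Re: Re: foo" and "foo" should be treated as the same thread.
    return re.sub(r"^(re:\s*)+", "", subject.strip(), flags=re.IGNORECASE).strip()


def needs_repair(refs, subject):
    # Case (c): a "Re:" article with no References, or one containing <?>.
    return (subject.lower().startswith("re:") and not refs) or UNKNOWN in refs


def repair_references(article, spool):
    """Return a References list with <?> replaced by a guessed parent chain.

    `spool` (hypothetical) holds stored articles from the same newsgroups,
    oldest first.  If nothing matches, the line is returned unchanged.
    """
    refs = article.get("References", "").split()
    if not needs_repair(refs, article["Subject"]):
        return refs
    wanted = base_subject(article["Subject"])
    for cand in reversed(spool):  # newest first
        if base_subject(cand["Subject"]) == wanted:
            chain = cand.get("References", "").split() + [cand["Message-ID"]]
            out = []
            for ref in refs or [UNKNOWN]:  # no References at all: treat as "<?>"
                out.extend(chain if ref == UNKNOWN else [ref])
            return out
    return refs
```

As noted, a subject match can land in the wrong chain; this sketch simply trusts the most recent article with the same base subject.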
--------------------

Proposal #2
	
	Right now inews programs are allowed to reduce the size of the
	References: line.   The next release of the usenet article format
	will specify that the root message-id always be kept, that it is
	advised that the immediate parent be kept, and that deletions should
	come in the middle of the chain.

	I propose we add a flag to the message-id syntax.
	Currently the syntax is <unique@domain>.  If this flag is set,
	then the message would be considered a sub-root message.

	Inews programs would be advised not to delete sub-root message-ids
	from a chain, and if they must be deleted, all other ids should go
	first, and then the most recent sub-roots.
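The deletion order described here could be sketched in Python as follows. How a sub-root is recognized is left to a caller-supplied predicate. Deleting ordinary ids oldest-first is this sketch's own assumption; the proposal only says deletions come from the middle of the chain.

```python
def shorten_references(refs, max_ids, is_subroot):
    """Cut a References list down to max_ids entries where possible.

    The root (first) and immediate parent (last) are never removed, so the
    result is never shorter than two ids.  Ordinary ids between them go
    first (oldest first, an assumption); sub-roots go only after that,
    most recent first.
    """
    if len(refs) <= max_ids:
        return list(refs)
    middle = range(1, len(refs) - 1)
    ordinary = [i for i in middle if not is_subroot(refs[i])]
    subroots = [i for i in middle if is_subroot(refs[i])]
    victims = ordinary + subroots[::-1]  # deletion order
    drop = set(victims[: len(refs) - max_ids])
    return [ref for i, ref in enumerate(refs) if i not in drop]
```

For example, `shorten_references(["<1@a>", "<2@b>", "<SR3@c>", "<4@d>", "<5@e>"], 3, lambda m: m.startswith("<SR"))` keeps `["<1@a>", "<SR3@c>", "<5@e>"]`.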

	A message gets the sub-root flag if the poster explicitly requests
	it, or perhaps if the posting program detects that the subject
	line has been changed.

	(Actually, a separate notion of sub-root isn't strictly necessary.  The
	root message would also have this flag.  It gets distinguished
	by being the first message in the chain.)

	I propose something as simple as:
		<SRunique@domain>
	unless somebody is already using that as a prefix.
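A Python sketch of how a posting program might generate such ids, assuming the poster can request the flag explicitly and that "the subject line has been changed" means it differs from the parent's after stripping "Re:" prefixes. The function names are hypothetical. Note that any existing id whose unique part happens to begin with "SR" would be misread, which is exactly the caveat above.

```python
import re


def base_subject(subject):
    # Drop leading "Re:" prefixes so "Re: Re: foo" compares equal to "foo".
    return re.sub(r"^(re:\s*)+", "", subject.strip(), flags=re.IGNORECASE).strip()


def make_message_id(unique, domain, requested=False,
                    parent_subject=None, subject=None):
    """Build <unique@domain>, prefixed with SR when the article is a sub-root.

    Sub-root if the poster asked for it, or (heuristically) if the Subject
    differs from the parent's once Re: prefixes are ignored.
    """
    changed = (parent_subject is not None and subject is not None
               and base_subject(parent_subject) != base_subject(subject))
    prefix = "SR" if requested or changed else ""
    return f"<{prefix}{unique}@{domain}>"
```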

What would be the penalties for having stuff outside the angle brackets?
(We know nobody uses that.)  For example <?> could just be ??, and the
flag could go outside the brackets as in <unique@domain>+.  Would this
break existing stuff?  If not it might be the way to go.
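Whether outside-the-brackets markers break existing software depends on how that software pulls ids out of the line, and nothing here says which approach current readers take. A small Python sketch contrasts two hypothetical readers with a parser for the proposed syntax (`??` for an unknown parent, a trailing `+` for the flag):

```python
import re


def ids_by_regex(line):
    # Hypothetical reader 1: take anything between angle brackets.
    return re.findall(r"<[^<>]+>", line)


def ids_by_split(line):
    # Hypothetical reader 2: treat every whitespace-separated word as an id.
    return line.split()


def parse_flagged(line):
    """Proposed reading: '??' = unknown parent, trailing '+' = flagged id.

    Returns (message_id_or_None, flagged) pairs; unrecognized words are skipped.
    """
    out = []
    for tok in line.split():
        if tok == "??":
            out.append((None, False))
            continue
        m = re.fullmatch(r"(<[^<>]+>)(\+?)", tok)
        if m:
            out.append((m.group(1), m.group(2) == "+"))
    return out
```

On `"?? <1@a> <2@b>+ <3@c>"` the regex reader still sees three clean ids, while the split reader gets `"??"` and `"<2@b>+"` as bogus ids. So the markers are harmless to the first kind of software and not to the second.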



-- 
Brad Templeton, Looking Glass Software Ltd.  --  Waterloo, Ontario 519/884-7473

doug@xdos.UUCP (Doug Merritt) (05/13/89)

In article <3243@looking.UUCP> brad@looking.UUCP (Brad Templeton) writes:
>	References: line.   The next release of the usenet article format
>	will specify that the root message-id always be kept, that it is
>	advised that the immediate parent be kept, and that deletions should
>	come in the middle of the chain.

I hope this is just a suggestion, and not what's really going to happen.
If anything is kept at all, the immediate parent is what is critical.
It's the only thing that (disregarding pathological cases) allows
reconstruction of the chain.

For instance, for some reason lately I've been receiving news in many
groups in reverse order (1 2 3 4 5 is received and presented by rn
as 5 4 3 2 1). A smart news reader can figure out the right sequence
from the immediate parents on the References line, whereas if all you
require to be kept is the root message-id, then you've destroyed all
sequence info. Bogus!
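Doug's point can be checked with a short Python sketch. It assumes each article is a dict holding its id and References list, with the last entry taken as the immediate parent. A depth-first walk then places every parent before its replies, whatever the arrival order:

```python
def thread_order(articles):
    """Order message-ids so each parent precedes its replies.

    `articles` is a list of {"id": ..., "refs": [...]} in arrival order.
    Recursive, which is fine for ordinary thread depths.
    """
    by_id = {a["id"]: a for a in articles}
    ordered, seen = [], set()

    def visit(mid):
        if mid in seen:
            return
        seen.add(mid)
        refs = by_id[mid]["refs"]
        if refs and refs[-1] in by_id:
            visit(refs[-1])  # place the immediate parent first
        ordered.append(mid)

    for a in articles:
        visit(a["id"])
    return ordered
```

With full References lines, articles 5 4 3 2 1 come back as 1 2 3 4 5. If every line keeps only the root, the replies all look like direct children of 1 and stay in arrival order (1 5 4 3 2), which is the lost sequence information.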

Don't be so single-minded about thread-deletion, Brad. There are *other*
features that are important, too.

(Speaking of which, time for me to figure out how I used to get rn to
fix the sequencing; I've forgotten how that worked...)
	Doug
-- 
Doug Merritt		{pyramid,apple}!xdos!doug	doug@xdos.com
Member, Crusaders for a Better Tomorrow		Professional Wildeyed Visionary

"Of course, I'm no rocket scientist" -- Randell Jesup, Capt. Boinger Corps

brad@looking.UUCP (Brad Templeton) (05/14/89)

Yes, I agree that both the root and immediate parent should be required if
you shorten the References line.

Another idea that I think might be valuable is to put in a placeholder
message-id when you delete some.  So if you have
	<1@a> <2@b> <3@c> ....  <26@z>
you might turn it into
	<1@a> <D> <26@z>

or somesuch, at the minimum.  Again, you should try to preserve sub-roots
wherever possible.  This lets back-following software know that items have
been deleted, and it should try other methods (like the history database)
to go backwards.
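A minimal Python sketch of such back-following software. It assumes the history database can be modelled as a dict mapping each message-id to its parent's id (a hypothetical stand-in; real history files differ). On hitting `<D>` (or `<Dnn>`), it walks back from the newest id it has until it reaches the id listed just before the gap. If the history runs out first, the marker is kept.

```python
import re


def is_placeholder(token):
    # <D> or <Dnn>: "ids were deleted here" (no @, so never a real message-id).
    return re.fullmatch(r"<D\d*>", token) is not None


def recover_chain(refs, history):
    """Expand a References list, filling <D> gaps from `history`.

    `history` is a hypothetical map message-id -> parent message-id.
    Returns ids oldest first; unbridgeable gaps keep their <D> marker.
    """
    out = []  # built newest first
    for i in range(len(refs) - 1, -1, -1):
        tok = refs[i]
        if not is_placeholder(tok):
            out.append(tok)
            continue
        # The walk should stop at the last real id before the gap.
        target = next((t for t in reversed(refs[:i]) if not is_placeholder(t)), None)
        cur = out[-1] if out else None
        for _ in range(len(history)):  # bounded, in case history has a cycle
            if cur not in history or history[cur] == target:
                break
            cur = history[cur]
            out.append(cur)
        if history.get(cur) != target:
            out.append(tok)  # could not bridge the gap
    return out[::-1]
```

With a complete history, `["<1@x>", "<D>", "<26@x>"]` expands back to all 26 ids. With a partial one, the `<D>` stays in place to show where the chain is still broken.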


If we are going to have 'special' message-ids, does anyone have comments on
whether it is better to put them outside the angle brackets (or use no
brackets at all), or to put special codes inside the angle brackets?  Or both?

(Both because the <?> and <D> are truly special and should not be interpreted
by current software as a message id, while sub-root message-ids do need to
be interpreted by current software.)
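A Python sketch of the "both" option, assuming placeholders stay inside brackets but never contain "@", while sub-roots use the SR prefix proposed earlier. Because a real id is always `<unique@domain>`, a placeholder can never collide with one, and an SR id is still an ordinary id to old software. As with the prefix itself, a plain id whose unique part starts with "SR" would be misclassified.

```python
def classify(token):
    """Sort a References token into the categories discussed in this thread."""
    if not (token.startswith("<") and token.endswith(">")):
        return "other"      # e.g. an outside-the-brackets form like "??"
    body = token[1:-1]
    if "@" not in body:
        return "special"    # <?>, <D>, <D12>: placeholders, never real ids
    if body.startswith("SR"):
        return "subroot"    # still a valid <unique@domain> to old software
    return "ordinary"
```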

lear@NET.BIO.NET (Eliot Lear) (05/15/89)

About the ``root'' of a thread - it changes as articles get expired
off the system.  Thus, using the concept of a root is probably shaky.

That said, there seems to be no reason why one couldn't limit the
number of message ids listed in a reference line.  You need more than
one for redundancy's sake, but there is a point where you could stop.
Simply climb up the tree to find the root.
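Climbing the tree with shortened lines might look like this Python sketch. It assumes each locally stored article keeps the last few ids of its References line, held in a hypothetical dict. Each step jumps to the oldest id on the current line, so a line of k ids climbs k levels at a time. The walk stops at an article with no References (the root) or at one that isn't stored locally.

```python
def find_root(message_id, refs_of):
    """Climb to the oldest reachable ancestor of `message_id`.

    `refs_of` maps each locally stored message-id to its (possibly
    truncated) References list.
    """
    cur, seen = message_id, set()
    while refs_of.get(cur) and cur not in seen:  # missing or empty line stops
        seen.add(cur)
        cur = refs_of[cur][0]  # oldest id listed on this line
    return cur
```

In a ten-article thread where every line keeps three ids, the walk goes 10, 7, 4, 1. If article 4 has expired locally, it stops at 4, so whether "the root" is findable depends on what is still on the machine.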

Speaking of trees, here is another monkey wrench.  What should happen
if someone wants to reply to two messages?  Why not have the ability
to list all the relevant messages in the references line?  So much for
trees.
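Once a reply can have several parents, the structure is a directed acyclic graph rather than a tree. A Python sketch, assuming a References line lists only the messages directly replied to (so it can be read as a parent list), collects every ancestor with an explicit stack:

```python
def ancestors(msg, parents):
    """All ancestors of `msg` when a reply may have several parents.

    `parents` maps message-id -> list of ids it directly replies to.
    """
    seen = set()
    stack = list(parents.get(msg, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen
```

If `<3@c>` answers both `<1@a>` and `<2@b>`, which share the root `<0@r>`, the root is reachable by two paths. That is what a tree cannot represent and why the `seen` set is needed.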

By the way, if you are looking to save space, I suggest starting with
.signature files.  But that's an argument for another day (far far
away from now).
-- 
Eliot Lear
[lear@net.bio.net]

brad@looking.UUCP (Brad Templeton) (05/15/89)

No, the message tree is something that exists independent of your individual
machine.  The root is still the root no matter if it has expired off
every machine on the net.

To chain up through parents using information on your own machine, only the
immediate parent is required -- unless you want to chain past expired articles
to their parents that somehow didn't expire.

(You could have expire, when deleting a message and not deleting the parent,
leave the parent around in the database.)

The full tree is needed to kill it, ask for it or work with parts of it.
At the least, 'important' nodes are needed.   I would define important
nodes as the root, and any nodes at which the subject truly changed.
Sadly we can never perfectly spot the latter.
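A heuristic for these "important" nodes, sketched in Python: keep the root, plus every article whose subject (ignoring "Re:" prefixes and case) differs from its parent's. The tuple layout is an assumption. As said, this can't be perfect. It misses topic changes where the Subject was left alone and flags purely cosmetic subject edits.

```python
import re


def base_subject(subject):
    # Ignore "Re:" prefixes and case when comparing subjects.
    return re.sub(r"^(re:\s*)+", "", subject.strip(), flags=re.IGNORECASE).strip().lower()


def important_nodes(thread):
    """Return the root plus every article whose subject differs from its parent's.

    `thread` is a list of (message_id, parent_id_or_None, subject), parents
    before children.  An article whose parent isn't in the list is treated
    as important, since nothing is known about the parent's subject.
    """
    subjects = {}
    important = []
    for mid, parent, subject in thread:
        subjects[mid] = base_subject(subject)
        if parent not in subjects or subjects[parent] != subjects[mid]:
            important.append(mid)
    return important
```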

paul@devon.LNS.PA.US (Paul Sutcliffe Jr.) (05/15/89)

In article <3248@looking.UUCP> brad@looking.UUCP (Brad Templeton) writes:
+---------
| Another idea that I think might be valuable is to put in a placeholder
| message-id when you delete some.  So if you have
| 	<1@a> <2@b> <3@c> ....  <26@z>
| you might turn it into
| 	<1@a> <D> <26@z>
| 
| or somesuch, at the minimum.  Again, you should try to preserve sub-roots
| wherever possible.  This lets back-following software know that items have
| been deleted, and it should try other methods (like the history database)
| to go backwards.
+---------

Of course, all of Brad's ideas assume that child articles always refer
back to the original parent (e.g. <1@a> above).  Many times I've seen
(and started) followups that change the subject thread.  In my own
case, I edit the References: header to only show message-ids that I
know my response is referencing (isn't that what References: is for?).

Which is my point:  I also edit the Subject: header when I do this,
though most people don't.  If you can't get people to properly edit the
Subject: header (when the discussion thread changes), how are you going
to get them to correctly handle the References: one?

The Notes software package has "base-notes" from which the responses
(followups) are "attached."  How is that done, and could the same (or a
similar) method be incorporated into B-news 2.11, 3.0 or C-news?

- paul
-- 
INTERNET:  paul@devon.LNS.PA.US		|   How many whales do you have to
UUCP:	   ...!rutgers!devon!paul	|	save to get a toaster?

davecb@yunexus.UUCP (David Collier-Brown) (05/16/89)

>In article <3243@looking.UUCP> brad@looking.UUCP (Brad Templeton) writes:
| References: line.   The next release of the usenet article format
| will specify that the root message-id always be kept, that it is
| advised that the immediate parent be kept, and that deletions should
| come in the middle of the chain.

In article <283@xdos.UUCP> doug@xdos.UUCP (Doug Merritt) writes:
| For instance, for some reason lately I've been receiving news in many
| groups in reverse order (1 2 3 4 5 is received and presented by rn
| as 5 4 3 2 1). A smart news reader can figure out the right sequence
| from the immediate parents on the References line, whereas if all you
| require to be kept is the root message-id, then you've destroyed all
| sequence info.

  Please select a "fix" which retains as much of the reference
information as possible (there is an extension technique in the RFC,
which most mailers use).  Since people actually see articles **in
different orders**, reconstructing the sequence is difficult without the
ordering information the References line carries.
  I plan on using this very information for building citation trees, so
I'd appreciate it if it isn't lost!

--dave (only four more days to monomania!) c-b

brad@looking.UUCP (Brad Templeton) (05/17/89)

Of course, if you are TRULY changing the subject, you should not post your
message with a Followup command.  If you must use a followup command, then you
should, as you say, delete the References line and type in a new subject.

If you are only partially changing the subject, to a sub-topic, then you
should use the followup command, keep the references line, change the
Subject: line and mark your message as a 'sub-root' message.

If you are just adding a new element to the discussion, you should keep the
subject, put in a proper references line and put in a distinct summary line
for your message.

If you are not adding anything to the discussion, you should not post.
(Few people follow that rule.)
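These four cases can be written down as a decision function. The Python sketch below assumes a parent article dict with a References list. `Change`, `followup_headers` and the `mark_subroot` key are hypothetical names, the last standing in for whatever sub-root marking is finally adopted.

```python
from enum import Enum


class Change(Enum):
    NEW_TOPIC = "truly new subject"
    SUBTOPIC = "partial change to a sub-topic"
    ADDITION = "new element, same discussion"
    NOTHING = "nothing added"


def followup_headers(parent, change, new_subject=None, summary=None):
    """Headers for a followup to `parent`, or None if it shouldn't be posted.

    `parent` is a dict with Message-ID, Subject and References (a list).
    """
    chain = parent["References"] + [parent["Message-ID"]]
    if change is Change.NOTHING:
        return None
    if change is Change.NEW_TOPIC:
        # Delete the References line and type in a new subject.
        return {"Subject": new_subject, "References": [], "mark_subroot": False}
    if change is Change.SUBTOPIC:
        # Keep the references, change Subject, mark as a sub-root.
        return {"Subject": new_subject, "References": chain, "mark_subroot": True}
    # ADDITION: keep the subject and references, add a distinct summary.
    subject = parent["Subject"]
    if not subject.lower().startswith("re:"):
        subject = "Re: " + subject
    return {"Subject": subject, "References": chain, "Summary": summary,
            "mark_subroot": False}
```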

brad@looking.UUCP (Brad Templeton) (05/17/89)

Yes, clearly you want to keep as much of the References line as possible,
but it is noted that in a long back-and-forth discussion, this can easily
get huge.

It was thus put into the standard that sites could edit the line as
required.

I suggested that this not be so broad, and that it say that at the
very least the root and immediate parent be kept.

I have been making the further suggestion that we find some way of
spotting sub-root articles (where the topic changes a bit) and asking
that they be kept, if possible.

Unless we require the whole line, which I doubt we can do, you won't
be able to sort articles based on it if the chain gets too long.
The posting date can still be used.   If the <D> convention gets used
to mark deleted items,  (perhaps <Dnn> where nn is the number deleted)
then you will be able to sort on the References line, _most_ of the time.
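A Python sketch of that sort, assuming `<Dnn>` records how many ids were cut and a bare `<D>` means an unknown number. The bare form is counted as one here, which is one reason it only works most of the time. Each article's depth is the number of ancestors its line implies, and the posting date breaks ties. Field names are hypothetical.

```python
import re


def depth(refs):
    """Number of ancestors implied by a References list."""
    total = 0
    for tok in refs:
        m = re.fullmatch(r"<D(\d*)>", tok)
        if m:
            # <Dnn>: nn ids deleted; bare <D>: unknown, count as 1.
            total += int(m.group(1)) if m.group(1) else 1
        else:
            total += 1
    return total


def sort_thread(articles):
    # Shallower articles first; equal depth falls back to posting date.
    return sorted(articles, key=lambda a: (depth(a["refs"]), a["date"]))
```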

matt@oddjob.uchicago.edu (Matt Crawford) (05/23/89)

) Speaking of trees, here is another monkey wrench.  What should happen
) if someone wants to reply to two messages?  Why not have the ability
) to list all the relevant messages in the references line?  So much for
) trees.

This is not hypothetical.  People have merged references lines together
"by hand".
________________________________________________________
Matt Crawford	     		matt@oddjob.uchicago.edu

pokey@well.UUCP (Jef Poskanzer) (05/23/89)

In the referenced message, lear@NET.BIO.NET (Eliot Lear) wrote:
}Speaking of trees, here is another monkey wrench.  What should happen
}if someone wants to reply to two messages?

Yes, I do this all the time.  It's the secondary reason that I list only
the message(s) I am directly replying to in my References line.

The primary reason is, of course, rn's idiotic fixed-length interp buffer.
---
Jef

            Jef Poskanzer   jef@helios.ee.lbl.gov   ...well!pokey
          "Who's going to believe you? You're just a talking head."

ulmo@ssyx.ucsc.edu (Brad Allen) (05/25/89)

> No, the message tree is something that exists independent of your individual
> machine.  The root is still the root no matter if it has expired off
> every machine on the net.

But the root usually has very little relevance, at least on the tree
systems I'm used to:  there is one root message, the very topmost node
of the tree, and all 50,000 messages or so are underneath that one message.

But that's a very controlled tree at that.  On something as large as 
the world, I can't expect a single root, and I continue to shun the idea
of multiple roots since that only entices stupid people to think that
any particular discussion ought to stay right on some particular track
(which they shouldn't, or not usually).

ulmo@ssyx.ucsc.edu (Brad Allen) (05/25/89)

> Of course, if you are TRULY changing the subject, [...]

This is rarely the case.

ulmo@ssyx.ucsc.edu (Brad Allen) (05/25/89)

> I have been making the further suggestion that we find some way of
> spotting sub-root articles (where the topic changes a bit) and asking
> that they be kept, if possible.

This sounds silly and confusing, though you are fairly close to the point.

More sophisticated links, links which have authorship, authentication,
programmability, date, version#, scope, locality, etc.,
or something more on that level, would probably be a better direction now.
We've been using a single References line for a long time now;
it's time something better were designed and developed.