[net.news.b] new algorithm for expire

honey@down.FUN (Peter Honeyman) (08/03/85)

recently, someone suggested a new way to expire.  the gist of it was to
expire "conversations" not articles.  (it is mandatory to reference larry
peterson's work whenever mentioning conversations in the context of
electronic communication, but i forget when he presented his dragonmail
usenix paper, sometime in the last year.  sorry, larry, i tried.)

in a nutshell, expire only those articles that haven't been reference*-ed
for N days.  (reference* is the transitive and symmetric closure of the
relation induced by the References: header.)

this fairly shouts "union-find!  union-find!" (which can be distracting
late at night, and might even wake the baby).  since applications of
union-find don't crop up every day, it's important not to let this one
get away.

i plan to assign this to the next junior that walks up to me and says
"don't hurt me professor honey; do you have any ideas for independent
study projects?"  "no problem, grasshopper, write me a program."  just
think, path compression in netnews code.

can anyone tell me why references should contain anything other than the
last article referenced?  larry would be disappointed with any practice
other than shipping the transitive reduction of the references relation.
since netnews conversations are tree-like, this is just the article being
up-followed.

i don't recall ever having seen an article that merited a non-standard
Expires: header.  anyone else?

	peter

jaw@ames.UUCP (James A. Woods) (08/05/85)

		
#  Fools ignore complexity.  Pragmatists suffer it.
   Some can avoid it.  Geniuses remove it.
	-- Perlis's Programming Proverb #58, SIGPLAN Notices, Sept. l982
.....
		Professor Honeyman's expire nightmare (in one-act)

simpleton:
	why not just do find /usr/spool/news -mtime -14 -exec rm {} \;
	the old-fashioned unix way?
honey: 
	but what about conversations, you dolt?  we need reflexive transitive
	closure for this job -- hmm ... four russians algorithm, no, valiant's
	sub-cubic method, no wait, union-find!  that's it, union-find!

puppy dog undergrad:
	ah professor, you know, even though bell labs tried some of these
	sixth-generation data structures (like hash tables), it's tough
	to grind out sophisticated production code that won't break.  we
	might need more than two credits of independent study for this...
honey:
	nonsense, i might even have something leftover from pathalias.
	[(scratching head) uh well, maybe pathalias does take a few hours to
	regenerate; remind me to work on that so we can play what-if
	games with nsc better].  listen, don't expect to graduate unless you
	understand this algorithm stuff like the back of your hand.  look,
	not only all that crap you learned about huffman codes over in ee
	has been replaced by lempel-ziv, but berkeley does simulated annealing
	in the kernel to get around some tough np-complete register allocation
	probs.  remember, we gotta get cray v8 out of beta test next week.
	just go away.

grad student replacement:
	hey, look at this neato splay tree stuff sleator & tarjan published.
	in pseudo-pascal, even.  my little 9-year-old sister just typed it
	in on her amiga, and it runs rings around union-find. 

honey: 	say what?

9-year old:
	yeah, nice average-case linear behavior, but i had to junk it.
	you're all thinkin' too hard.  i just put horspool's dynamic perfect
	hasher into crontab, and now we expire conversations in one probe!
	don't even need boyer-moore grep to catch false drops.
honey:
	say, where did that simple simon go?

--ames!jaw

avolio@decuac.UUCP (Frederick M. Avolio) (08/08/85)

I agree with what Peter says in here.  Since we haven't agreed on much in
the recent past, I just throught I'd mention it.... :-)

Fred.

fair@ucbvax.ARPA (Erik E. Fair) (08/18/85)

I think that this is an excellent idea. Among the other things, we
could eliminate the horrendous business of including the previous
article in a followup article, because that article would still be
online everywhere else, if the discussion was current. (if it wasn't
why are you kicking a dead dog, anyway?)

Do you think you can find the necessary, uh `volunteer', graduate or
undergraduate labor this fall, Peter?

While that is being done, it might also be nice to have rnews construct
a database from the `References:' header line.  You end up with a tree
that way:
                        base
                        /|\
                       / | \
                     r1  r2 r3
                    / \     /|\
                 r1.1 r1.2 / | \
                          /  |  \
                      r3.1  r3.2 r3.3

and so on. We can use rnews and expire to maintain this database.

The three prevalent user-interfaces (readnews, vnews, and rn) should
use it for presentation ordering, such that on each responses level,
the articles are presented in the order they were originally posted
(chronologically). When you have exhausted one level, you go on to the
next level (responses to responses), and so on.

Some nice things fall out of having this functionality:

ordering
	You actually get the conversation in the order in which it
	occurred, rather than in the order it arrived on your machine.

change of Subject:
	You can change the subject line to reflect the current content
	of the article you're posting, without worrying that no one will
	find it (things will relate by `References:', not this primitive
	`search for the same subject' junk).

response restriction
	``You haven't read everything currently online related to this
	  subject/topic, do you really want to followup now, no doubt to
	  give the same response that 100 other people did?''

no more inclusion of previous articles
	Why include when your readers can all just pop back up the tree
	and re-read the thing you responded to?

killing a discussion
	Given that people do change the subject lines, you can kill an
	entire discussion easily, including responses that have
	completely different subject lines. Alternately, you can ask
	your user interface to kill all the articles with the identical
	subject line, but stop when the subject changes (indicating a
	shift in the conversation, presumably).

Larry Wall (rn), Kenneth Almquist (vnews), are you listening?

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.BERKELEY.EDU

P.S.	Peter, I re-read the DRAGONMAIL paper. When I first heard it at
	SLC USENIX, it sounded, well, like a version of netnews for mail.
	Now I see the wisdom of the thing, and wish I had a copy to
	deal with my 50-100 letters per day... Larry Peterson is at U of
	Arizona, busily working on version 2, according to his response
	to my status query.

david@ukma.UUCP (David Herron, NPR Lover) (08/19/85)

It's a nice idea Eric, but you need to avoid the "Orphaned Response" problem.
-- 
--- David Herron
--- ARPA-> ukma!david@ANL-MCS.ARPA
--- UUCP-> {ucbvax,unmvax,boulder,oddjob}!anlams!ukma!david
---        {ihnp4,decvax,ucbvax}!cbosgd!ukma!david

Hackin's in me blood.  My mother was known as Miss Hacker before she married!

fair@ucbvax.ARPA (Erik E. Fair) (08/21/85)

We can't have any such problem, unless the authors of software do
something foolish (e.g. throw away the subject line in a response).

I am suggesting a database which contains all the discussion trees.
I am not suggesting that we monkey in any way with the format of the
header of a netnews article. Thus, the subject line is always there,
even in a response.

Does anyone have further comments?

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.BERKELEY.EDU

jaw@ames.UUCP (James A. Woods) (08/23/85)

'tis actually a Capital idea, especially now that mr. fair has delineated its
myriad uses so nicely.

so, bring on the bitmaps, the sixth-generation data structures,
the menu mnemonics, and the animiconic dancing girls.

however, whatever method prevails upon the slave coder, we hereby
request that it become known as "honeyman normal form".

sincerely,
   ames!jaw