[net.news] Information Overload and What We Can Do About It

fair@ucbvax.ARPA (Erik E. Fair) (09/14/85)

Have you ever wondered why the notesfiles people are so smug about
the superiority of their system over netnews?

Or why has `rn' been such a big hit with the USENET user community?
(of course, if you're using it, you probably know, but bear with me
for the moment anyway).

The USENET user community as a whole is suffering from information
overload; that is, there are more items coursing the paths of the
network than any single individual can read in a reasonable period
of time.

As the volume of messages in the newsgroups that I choose to read
increases, there are two steps I can take to be more efficient:

1) I can arrange to read netnews at a higher baud rate
	(instead of 1200 baud, how about 9600 or 19200?).
	This will allow me to make my article selections faster,
	and hopefully be able to handle more articles per unit time
	than I did at 1200 baud.

2) I can prioritize the list of newsgroups that I read and
	remove some newsgroups from the bottom of the list,
	until the volume is manageable again.

However, these traditional mechanisms for limiting time spent reading
netnews are no longer sufficient, because they're not specific enough.
What I need now is a set of automatic structuring and filtering
mechanisms for articles.

Remember my original questions about notesfiles & rn? The reason that
these two user interfaces are popular is that in addition to providing
the usual amenities (screen oriented interface &c), they also structure
the information presented to the user, and `rn' provides the first of
many possible filtering mechanisms for removing from view articles that
the user is not interested in.

If you were to grep for the Subject line in any high volume newsgroup,
my observation is that you would find 80% or more of the articles are
responses, rather than original articles. To the notesfiles user, the
`base note' (the first article) and all the responses appear as one
item in the presentation menu.

It is considerably more daunting to hit `=' in rn, in a newsgroup you
haven't read in many weeks and see the list of hundreds of individual
articles that have accumulated. Fortunately, `rn' provides you with the
facility to `kill' (remove from the list of unread articles) all of the
articles with a specific subject (including the `Re:' subjects). This
brings us to:

	I N F O R M A T I O N   S T R U C T U R E

Right now (with the exception of rn & notes) netnews articles are
presented to the user in the order they arrived on the system. This is
not optimal. To create structure in the way that netnews articles are
presented, we can start (as rn does) with the Subject line, and follow
that along, presenting articles whose subjects match. This gives us the
thread of a discussion.

However, since responses can and frequently do arrive at a system out of
order, we should sort by date of submission (i.e. the contents of the
`Date:' field). This will give us the discussion in the chronological
order in which it occurred.
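
As an illustration only (this is not code from any existing news reader,
and the sample articles below are invented), a minimal sketch in Python of
the subject-plus-date ordering just described:

    from datetime import datetime

    # Invented sample articles; only the headers we need here.
    articles = [
        {"subject": "Re: information overload", "date": "1985-09-15 10:04"},
        {"subject": "information overload",     "date": "1985-09-14 09:00"},
        {"subject": "Re: information overload", "date": "1985-09-14 21:30"},
    ]

    def base_subject(subject):
        # Strip any number of leading "Re: " prefixes so that responses
        # group together with the original article.
        while subject.lower().startswith("re: "):
            subject = subject[4:]
        return subject

    def thread_order(articles):
        # Sort by (base subject, date of submission): articles on the same
        # subject come out together, in the chronological order in which
        # they were posted, regardless of the order of arrival.
        return sorted(articles,
                      key=lambda a: (base_subject(a["subject"]),
                                     datetime.strptime(a["date"],
                                                       "%Y-%m-%d %H:%M")))

    for a in thread_order(articles):
        print(a["date"], a["subject"])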

There is even more information in the header that we can use to order
the articles into a discussion more accurately than with `Subject:'
and `Date:'.  I mean the `References:' line.

Presently, the only use that any of the user interfaces make of this
field is for finding the `parent' article of the current article (that
is, the article to which the current article is a response).

We can use this information for following discussions by building the
tree that discussions form:

			  a
			 /|\
			b c d
			   / \
			  e   f
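
A minimal sketch (again, illustrative Python only; the message-IDs are
invented) of building that tree from the `References:' and `Message-ID:'
headers, taking the last ID on each References: line as the immediate
parent:

    # Invented message-IDs mapped to the IDs on their References: lines,
    # matching the tree drawn above (a is the base article).
    references = {
        "<a@site>": [],
        "<b@site>": ["<a@site>"],
        "<c@site>": ["<a@site>"],
        "<d@site>": ["<a@site>"],
        "<e@site>": ["<a@site>", "<d@site>"],
        "<f@site>": ["<a@site>", "<d@site>"],
    }

    def build_tree(refs_by_id):
        # children maps each article to its direct followups; an article
        # whose parent we have never seen becomes a root (a base article).
        children = {msgid: [] for msgid in refs_by_id}
        roots = []
        for msgid, refs in refs_by_id.items():
            if refs and refs[-1] in children:
                children[refs[-1]].append(msgid)
            else:
                roots.append(msgid)
        return roots, children

    def show(msgid, children, depth=0):
        print("  " * depth + msgid)
        for child in children[msgid]:
            show(child, children, depth + 1)

    roots, children = build_tree(references)
    for root in roots:
        show(root, children)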

If this information is put into a database that is easily used by the
various user-interfaces, the following things are possible:

1) accurate ordering and presentation of the discussions that take
	place on the network

2) differentiation between the various sub-branches of the tree of
	discussion (one branch goes off discussing foo from foobar,
	the other discussing bar from foobar)

3) change of subjects to reflect actual message content to facilitate #2,
	without affecting #1 (i.e. no more `Re: foo (really bar)')

4) delay posting of responses until the user has read the entire
	tree (or at least as much of it as is online at his site).
	We have a problem with users asking a trivial question, to
	which everyone knows the answer (and everyone immediately
	responds!). If the user-interface holds the followup until the
	user has read all the articles in the tree, and asks again
	whether the submitted response is still appropriate, the
	incidence of this problem should drop significantly. This
	should also cause a drop in network traffic.

5) lessen the necessity of including the text of the article to which
	one is responding (the `parent' command of vnews and `^P' in rn
	already provide some of this functionality).

It is this particular structure that makes the netnews data storage
structure superior to notesfiles.

However, we still have the problem of too much information to read and
understand, which leads into:

		F I L T E R I N G   M E C H A N I S M S 

As I mentioned, rn provides for removing from view articles whose subjects
you are not interested in. However, given users' proclivity for changing
the subject line over less-than-titanic shifts in topic (in which you
probably still have no interest), rn's current mechanism for killing
discussions misses the mark. Given the database described above, rn would
never miss.

A subject, however, is not the only criterion that you might wish to
filter with. Consider the following information that might be useful
to filter by (a sketch of such a filter follows the list):

author		(also known as the `bozo' filter)
site		(they're all bozos on that bus)
date		(kill articles that are four days old)
time		(kill articles composed between 0000 and 0600?)
transit-time	(kill articles that took more than x days to get here)
length		(anything too small or too big)
newsgroups	(in a multiple group posting,
		  skip if `net.flame' is one of the other groups)
keywords	(suppose that postnews mungs up a set of keywords
		  from the body of the article when it was first posted...)
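
By way of illustration only, a small sketch in Python of a kill/selection
filter combining several of these criteria; every name, limit, and sample
header value below is invented:

    AGE_LIMIT_DAYS = 4                      # kill articles older than this
    BOZO_AUTHORS = {"bozo@example.UUCP"}    # the `bozo' filter
    BOZO_SITES   = {"bus.UUCP"}             # they're all bozos on that bus
    KILL_GROUPS  = {"net.flame"}            # skip cross-posts into these
    MIN_LINES, MAX_LINES = 3, 1000          # anything too small or too big

    def keep(article, age_days):
        # True only if the article survives every de-selection criterion.
        if article["from"] in BOZO_AUTHORS:
            return False
        if article["site"] in BOZO_SITES:
            return False
        if age_days > AGE_LIMIT_DAYS:
            return False
        if not MIN_LINES <= article["lines"] <= MAX_LINES:
            return False
        if KILL_GROUPS & set(article["newsgroups"]):
            return False
        return True

    sample = {"from": "someone@somewhere.UUCP", "site": "somewhere.UUCP",
              "lines": 120, "newsgroups": ["net.news", "net.news.notes"]}
    print(keep(sample, age_days=2))         # True: passes every test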

Consider also that any of these criteria can be used for article
selection (i.e. to *find* articles) as well as in article de-selection.

Finally, one more mechanism: we use moderators as a filtering
mechanism, in that they select appropriate articles to broadcast to the
network.  In our electronic publishing medium, they are the editors.

With the appropriate statistical information gathered by the
user-interfaces on the system, other users on your system can act as
editors for you. Ideally, I should be able to tell the user-interface,
`show me all the articles that John Smith <jsmith> thought were
interesting'. In this way, John Smith becomes my editor. Alternatively,
`show me everything that John Smith and Jane A. Nonymous did not look
at' should also be a valid filter.
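
A sketch, again illustrative only, of the `other users as editors'
selection, assuming (hypothetically) that the user-interface has been
recording which articles each local reader looked at and which ones they
marked as interesting; all names and article-IDs here are invented:

    # Invented per-reader records kept by the user-interface.
    interesting = {
        "jsmith":    {"<1@a>", "<2@a>", "<5@a>"},
        "jnonymous": {"<2@a>", "<3@a>"},
    }
    read = {
        "jsmith":    {"<1@a>", "<2@a>", "<4@a>", "<5@a>"},
        "jnonymous": {"<2@a>", "<3@a>", "<4@a>"},
    }
    all_articles = {"<1@a>", "<2@a>", "<3@a>", "<4@a>", "<5@a>", "<6@a>"}

    # `Show me all the articles that John Smith thought were interesting.'
    print(sorted(interesting["jsmith"]))

    # `Show me everything that John Smith and Jane A. Nonymous did not
    #  look at.'
    print(sorted(all_articles - (read["jsmith"] | read["jnonymous"])))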

		W H A T   D O   W E   D O   N O W ?

The structuring of netnews articles should be easy to implement; all of
the necessary hooks are there, we're just not using the information
contained in the header as yet. Clearly this is a database function
that should go into rnews and expire for update & maintenance, rather
than into the user-interfaces.

The more mundane filtering mechanisms that I suggested should also be
relatively easy to implement, given `rn' as a base. The `other local
users as editors' idea will take some work.

With the volume of network traffic increasing, there is no doubt in my
mind that we will have a test of fire (site death by network byte?).
However, I think that the mechanisms I have outlined, coupled with
sensible naming of groups (and management of that namespace as a whole)
will `save' the network that we know as USENET. The key is getting this
software implemented, and distributed network wide as soon as possible,
so that the peak of the deluge of information will be that much sooner,
and that much lower, than if we do nothing.

	your comments and observations are solicited,

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.BERKELEY.EDU


	S U G G E S T E D   R E A D I N G S

DRAGONMAIL: A Prototype Conversation-Based Mail System
	Douglas E. Comer, Larry L. Peterson, Purdue University
	SLC USENIX Conference Proceedings, June 1984, p. 42

The Readers Workbench -  A System for Computer Assisted Reading
	Evan L. Ivie, Brigham Young University
	SLC USENIX Conference Proceedings, June 1984, p. 270

Structuring Computer-Mediated Communication Systems
	to Avoid Information Overload

	Starr Roxanne Hiltz, Murray Turoff
	CACM, July 1985, Vol 28, #7, p. 680

Conversation-Based Mail
	DRAFT TR August 26, 1985

	Douglas E. Comer, Purdue University
	Larry L. Peterson, University of Arizona

tim@k.cs.cmu.edu.ARPA (Tim Maroney) (09/15/85)

Your idea for newsgroup "reviewing" (e.g., "Let me see all articles which
Mr. Foo and Mr. Bar thought were interesting") is currently being
implemented by Jon Rosenberg and Nathaniel Borenstein at the CMU Information
Technology Center, for the bulletin board system to be used on our giant
distributed file system, VICE.  You should be able to contact them at
Jon.Rosenberg (or Nathaniel.Borenstein) at either cmu-itc-linus.ARPA or
cmu-vice-postoffice.ARPA .  There is also a "wish list" document for the
system which I will post here if there is interest.
-=-
Tim Maroney, Carnegie-Mellon University, Networking
ARPA:	Tim.Maroney@CMU-CS-K	uucp:	seismo!cmu-cs-k!tim
CompuServe:	74176,1360	audio:	shout "Hey, Tim!"

mark@cbosgd.UUCP (Mark Horton) (09/16/85)

These are some good points.  I'd like to expand on one of them.

If, within a newsgroup, for each article you form
	concat(references, message-id)
and sort by the result, you'll have all the discussions in order.
Ties are broken by date of submission.  There is no need to look
at the subject line anymore (is there?)
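
A tiny illustration of that sort key (Python; the message-IDs, References
lines, and dates are all invented):

    # Invented sample headers for one discussion with a followup chain.
    articles = [
        {"msgid": "<3@x>", "refs": "<1@x> <2@x>", "date": "85-09-16 10:00"},
        {"msgid": "<1@x>", "refs": "",            "date": "85-09-14 09:00"},
        {"msgid": "<2@x>", "refs": "<1@x>",       "date": "85-09-15 12:00"},
    ]

    # concat(references, message-id): every followup sorts after its parent
    # and all articles in one discussion share a common prefix; ties are
    # broken by date of submission.
    def key(a):
        return ((a["refs"] + " " + a["msgid"]).strip(), a["date"])

    for a in sorted(articles, key=key):
        print(a["msgid"])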

Also, it sure would be nice if the discussions (identifiable by
having the same prefix in the above concatenation) were grouped, so that
when you asked "what's next" it showed one line per conversation,
possibly with an article count.

	Mark

rees@apollo.uucp (Jim Rees) (09/16/85)

As Peter Honeyman likes to point out, the news system is a database and
should be treated as such.  Maybe we could use the Maryland multiple-key
dbm library.  If articles were keyed by author, subject, references,
submission date, and message-id, we would be part way there.  Or maybe we
need something more elaborate to keep track of the discussion trees.
This might make a good thesis if someone is interested (or can be strong-
armed).

freed@aum.UUCP (Erik Freed) (09/17/85)

> 
> With the volume of network traffic increasing, there is no doubt in my
> mind that we will have a test of fire (site death by network byte?).
> However, I think that the mechanisms I have outlined, coupled with
> sensible naming of groups (and management of that namespace as a whole)
> will `save' the network that we know as USENET. The key is getting this
> software implemented, and distributed network wide as soon as possible,
> so that the peak of the deluge of information will be that much sooner,
> and that much lower, than if we do nothing.
> 
> 	your comments and observations are solicited,
> 
> 	Erik E. Fair	ucbvax!fair	fair@ucbarpa.BERKELEY.EDU
> 

Erik,
	Your mission, should you choose to accept it... Sounds like the perfect
weekend project for you :-)
-- 
-------------------------------------------------------------------------------
                           Erik James Freed
			   Aurora Systems
			   San Francisco, CA
			   {dual,ptsfa}!aum!freed

chuqui@nsc.UUCP (Chuq Von Rospach) (09/17/85)

In general, Erik is right on the mark. He and I seem to be coming to more
or less parallel conclusions about the same problems, as I've been working
for about the last two months (on and off) on something I'm calling NNTN
(Not Necessarily The Net). It matches Erik's thoughts to a high degree, so
rather than bore you with lots of verbiage (for once) I'll just make a few
quick comments. 

In article <10381@ucbvax.ARPA> fair@ucbvax.ARPA (Erik E. Fair) writes:

>1) I can arrange to read netnews at a higher baud rate
>	(instead of 1200 baud, how about 9600 or 19200?).

I've actually found that reading at 1200 baud is better, faster, and/or
more efficient for me (with rn) because I get more bloodthirsty about
zapping stuff.

>	I N F O R M A T I O N   S T R U C T U R E

>Right now (with the exception of rn & notes) netnews articles are
>presented to the user in the order they arrived on the system. This is
>not optimal. To create structure in the way that netnews articles are
>presented, we can start (as rn does) with the Subject line, and follow
>that along, presenting articles whose subjects match. This gives us the
>thread of a discussion.

This isn't quite accurate. rn shows articles sorted first by newsgroup
preference, then by subject line, then by arrival; other news programs
show articles sorted by newsgroup and then by arrival.

>		F I L T E R I N G   M E C H A N I S M S 

>Consider the following information that might be useful
>to filter by:
>
>author		(also known as the `bozo' filter)
>site		(they're all bozos on that bus)
>date		(kill articles that are four days old)
>time		(kill articles composed between 0000 and 0600?)
>transit-time	(kill articles that took more than x days to get here)
>length		(anything too small or too big)
>newsgroups	(in a multiple group posting,
>		  skip if `net.flame' is one of the other groups)
>keywords	(suppose that postnews mungs up a set of keywords
>		  from the body of the article when it was first posted...)

One of the things that looks very attractive to me right now is
disassociating the concept of a 'newsgroup' from the user interface
completely. The ONLY thing the user should see is the subject line. Right
now all news programs do their primary keying on the newsgroup, when the
primary piece of information of interest is the subject. I suggest trashing
the concept of a 'newsgroup' completely, and switching to a set of
distributions (world, continent, country, region, state, city, and site;
let the program worry about the details of what they are), a set of
'required keywords' [known system-wide, at least one required per message],
and a set of 'optional keywords' chosen by the user.

One thing you can then do is attach a default distribution to a required
keyword as well: net.flame could translate into {region,flame} and
net.unix-wizards into {world,unix|expert}, or some such. Newsgroups can
then be mapped into the required keyword set, and the filtering mechanism
will do that part of the work for you. You no longer have to worry about
seeing the same 'M'arked message popping up in both net.news and
net.news.adm, because you don't see the groups anymore; you just deal with
the messages.
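
Purely as an illustration, the kind of translation table this implies
(Python; only the first two entries come from the examples above, and the
fallback is invented):

    # Default translation from an old newsgroup name to a distribution
    # plus a required keyword.  Only the first two entries come from the
    # examples above; the fallback is invented for illustration.
    GROUP_MAP = {
        "net.flame":        ("region", "flame"),
        "net.unix-wizards": ("world",  "unix|expert"),
    }

    def translate(newsgroup):
        return GROUP_MAP.get(newsgroup, ("world", "misc"))

    print(translate("net.flame"))     # ('region', 'flame')
    print(translate("net.news"))      # unmapped: falls back to ('world', 'misc')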

>		W H A T   D O   W E   D O   N O W ?
>Clearly this is a database function
>that should go into rnews and expire for update & maintainance, rather
>than in the user-interfaces.

One other thing that ought to be considered is moving the filtering
mechanism out of the user interface. One of the important design elements
of NNTN is that there is now an NNTN_reader and an NNTN_daemon for each
user -- the daemon is spawned when you log on. There is not only a system
database, but a user database as well. The daemon deals with the system
database, updates the user database, and does 'biffing' as requested, and
the reader program reads. This gets rid of the irritating waits when rn
needs to rebuild the .newsrc stuff, apply group or global kill files, or
process the 'k'ill key, since it is all done in the background.

The other thing I've found about the user interface is that there is no
reason why news and mail ought to have separate programs/interfaces.
Whether the message is news or mail should be part of the
filtering/prioritizing setup, but is irrelevant to 99.44% of the user
interface. A new filtering bit would be whether the message is public or
private, but whatever interface deals with news should deal with email as
well.

-- 
Chuq Von Rospach nsc!chuqui@decwrl.ARPA {decwrl,hplabs,ihnp4}!nsc!chuqui

Take time to stop and count the ewoks...

lwall@sdcrdcf.UUCP (Larry Wall) (09/19/85)

I'd love to make rn run off of a multi-key dbm file.  Who'll rewrite inews?
I wouldn't mind, but I don't think I have the time.  I can't even keep up
with my mail on rn, and I'm also maintaining patch and warp.  In a few days
I plan to post an automatic Configure script generator, and that will take
all the more time.  Every now and then I have to do some "real" work too.

Is this multi-key dbms that was mentioned public-domain, portable, reliable,
and efficient?  How much memory would it steal from a process on a dinky
machine?  How does it do on disk space?  Does it rely on "holes" in files?
How can I get a copy?

Partly yours,

Larry Wall
{allegra,burdvax,cbosgd,hplabs,ihnp4,sdcsvax}!sdcrdcf!lwall

mason@utcsri.UUCP (Dave Mason) (09/20/85)

Chuqui's news article bore a marked resemblance to mail I sent to
Erik following his posting.  I agree with cancelling the newsgroups...
there is just too much cross-correlation.  Maybe it would even cut down on
total net traffic when people in net.micro.a saw the discussion in
net.micro.b.

The other thing I mentioned is that I had my doubts regarding using other
people as filters.  The only use I could see would be office mates choosing
to read different things & filling each other in on the useful bits,
otherwise I can't see someone else's reading being a useful guide to me.

As I wrote that last I realized something that might be useful: some
kind of highlight file where each user could mark articles they felt
were particularly good or insightful.  I COULD definitely see reading
stuff that people I respect felt were important.

-- 
Usenet:	{dalcs dciem garfield musocs qucis sask titan trigraph ubc-vision
 	 utzoo watmath allegra cornell decvax decwrl ihnp4 uw-beaver}
	!utcsri!mason		Dave Mason, U. Toronto CSRI
CSNET:	mason@Toronto
ARPA:	mason%Toronto@CSNet-Relay

inc@fluke.UUCP (Gary Benson) (09/21/85)

Chuq Von Rospach {nsc!chuqui@decwrl.ARPA} recently wrote:

> One of the things that looks very attractive to me right now is
> disassociating the concept of a 'newsgroup' from the user interface
> completely. The ONLY thing the user should see is the subject line.

As one user, I object to this. I make my 'n' decisions at least partly on
the sender, and in some cases the originating site.
 
> One thing you can then do is define a default distribution to a required
> keyword as well. net.flame could translate into {region,flame} and
> net.unix-wizards would translate into {world,unix|expert} or some such.
> Newsgroups can then be mapped into the required keyword set, and the
> filtering mechanism will do that part of the work for you. You no longer
> have to worry about seeing the same 'M'arked message popping up in both
> net.news and net.news.adm because you don't see the groups anymore, you
> just deal with the messages.

Part of this I can go for -- it provides one way out of the wilderness of
cross-posted messages. If I read it in net.flame, there is no need for
the same message to be presented to me in net.nlang.celts. HOWEVER, I do
like having my messages grouped by my different interests. I have fun
"popping in and out" of various groups just to check out what they're
discussing. If the "newsgroup" concept disappears, it is my opinion that
the only option left is mailing lists, a clearly inefficient solution to the
"grouping by interest" dilemna.

 
> The other thing I've found about the user interface is that there is no
> reason why news and mail ought to have separate programs/interfaces.
> Whether the message is news or mail should be part of the
> filtering/prioritizing setup, but is irrelevant to 99.44% of the user
> interface. A new filtering bit would be whether the message is public or
> private, but whatever interface deals with news should deal with email as
> well.

YES, YES, YES!! Not only should mail and news be part of the same interface,
but it would be nice if it also fired up a background process to get the
editor of choice fired up and ready. As things are here, it's a pain to
always wait for 2 minutes for the editor to load for each reply, followup or
new posting.

-- 
 Gary Benson  *  John Fluke Mfg. Co.  *  PO Box C9090  *  Everett WA  *  98206
   MS/232-E  = =   {allegra} {uw-beaver} !fluke!inc   = =   (206)356-5367
 _-_-_-_-_-_-_-_-ascii is our god and unix is his profit-_-_-_-_-_-_-_-_-_-_-_ 

chuqui@nsc.UUCP (Chuq Von Rospach) (09/21/85)

In article <1408@utcsri.UUCP> mason@utcsri.UUCP (Dave mason) writes:
>The other thing I mentioned is that I had my doubts regarding using other
>people as filters.  The only use I could see would be office mates choosing
>to read different things & filling each other in on the useful bits,
>otherwise I can't see someone else's reading being a useful guide to me.

On a netwide basis, I also see a problem with this from the point of view
of privacy. I don't particularly want the net to know what I do or don't
read, so unless it is an optional system [best bet would be to set up a
special "accolade" command that generates a control message to be
transmitted, but imagine the overhead requirements of 1000 control messages
going out for every 'real' message to pass around the accolades...]
I don't see how I can support it.
-- 
Chuq Von Rospach nsc!chuqui@decwrl.ARPA {decwrl,hplabs,ihnp4}!nsc!chuqui

Take time to stop and count the ewoks...

chuqui@nsc.UUCP (Chuq Von Rospach) (09/22/85)

In article <698@tpvax.fluke.UUCP> inc@fluke.UUCP (Gary Benson) writes:
>Chuq Von Rospach {nsc!chuqui@decwrl.ARPA} recently wrote:
>>The ONLY thing the user should see is the subject line.
>
>As one user, I object to this. I make my 'n' decisions at least partly on
>the sender, and in some cases the originating site.

I don't think I was quite clear. That information will still be there 
and available. You would be able to do primary filtering before it ever
hits the user interface. Also, once you get down to the level of an
individual article the same general setup as 'rn' now gives would likely
apply. My comment was aimed at the level of the interface where you decide
whether or not to read an article. Currently, you first have to decide to
read a newsgroup, then you have to decide to read a given article. I'm
proposing making the first decision based on the subject line instead, and
at that point the sender and site information isn't available yet since you
aren't looking at a specific article.

>HOWEVER, I do
>like having my messages grouped by my different interests. I have fun
>"popping in and out" of various groups just to check out what they're
>discussing. If the "newsgroup" concept disappears, it is my opinion that
>the only option left is mailing lists, a clearly inefficient solution to the
>"grouping by interest" dilemna.

I disagree, since the current setup will be replaced by a set of keywords
that will allow you to define and filter material in what I hope will be a
more efficient way. Newsgroups as they are currently defined would map into
keywords pretty well, and you could set up your filtering mechanisms to let
you browse through a set of interests.

If it comes together as I hope, you ought to be able to do things pretty
much the way we do them now if you want, but I also expect that you'd be
able to do them a lot better. You don't lose any functionality; you just
gain a lot more flexibility.
-- 
Chuq Von Rospach nsc!chuqui@decwrl.ARPA {decwrl,hplabs,ihnp4}!nsc!chuqui

Take time to stop and count the ewoks...

chuck@dartvax.UUCP (Chuck Simmons) (09/23/85)

> > The other thing I've found about the user interface is that there is no
> > reason why news and mail ought to have separate programs/interfaces.
> > Whether the message is news or mail should be part of the
> > filtering/prioritizing setup, but is irrelevant to 99.44% of the user
> > interface. A new filtering bit would be whether the message is public or
> > private, but whatever interface deals with news should deal with email as
> > well.
> 
> YES, YES, YES!! Not only should mail and news be part of the same interface,
> but it would be nice if it also fired up a background process to get the
> editor of choice fired up and ready. As things are here, it's a pain to
> always wait for 2 minutes for the editor to load for each reply, followup or
> new posting.

My state of mind when I am reading mail is quite different from my state
of mind when I am browsing through newsgroups.  When I am working with mail,
I generally intend to reply to each individual message immediately.  When
I am browsing through newsgroups, I generally intend to ignore most of the
articles.

Perhaps an analogy can be made with paper mail.  I sort my paper mail into
three piles -- junk mail, mail from friends that I intend to read right
away and answer quickly, and magazines that I intend to put aside for leisure
reading.  While it is no doubt reasonable for mail and news to have similar
interfaces, I think it is also quite reasonable for the computer to know
whether I am interested in reading and replying to my mail, or interested in
browsing through the news.

-- Chuck

dave@garfield.UUCP (David Janes) (09/23/85)

In article <2355@sdcrdcf.UUCP> lwall@sdcrdcf.UUCP (Larry Wall) writes:
| I'd love to make rn run off of a multi-key dbm file.  Who'll rewrite inews?

	I *have* rewritten inews and rnews totally from scratch, and it
does basically everything that Erik E. Fair suggested. It uses a different
delivery system (for saving articles on the local site). It extensively
uses dbm-type files. Newsgroups are considered just another type of keyword.
	It is still in the final stages of debugging and, of course,
will need 'real-world' testing. Source code for interested people in 4-8
weeks (depending on my work load). It still needs an 'expire' program and
a 'readnews'-type program, which are the next things I will work on. A few
details (I'll post more later, in a week or so):

o	I use ndbm for database functions. I *might* replace this
	with mdbm, depending...
o	The main program (rnews) is rather small: I have an extensive
	news handling library (which I use a lot, and which is quite nice).
o	Almost all memory is dynamically allocated; headers can be of
	infinite length (if you have the memory). No more truncated
	header problems. Headers can be extended across multiple lines.
o	It keeps track of all the followups to a single article in the
	ndbm file in (posting time) sorted order (along with other info).
o	It is essentially keyword-based; right now I use the
	Newsgroups: line as the keywords line, with an option to
	also use the Keywords: field (Brad Templeton's Knews). It would be
	trivial to add Bozo filters if needed; I didn't because I felt
	it would be dangerous, and basically unnecessary (Bozos tend
	to stick to the same conversations, which would get killed anyway).
o	It can support a very intelligent 'expire' [soon to be done].
o	lots of other neat stuff!

I have been working on this (on and off) for > 1 year. My main inspirations
were the discussions in 'net.news' over the last 18 months, Knews (which it
mostly implements), down!honey's ideas of news being a giant database
(it is!), and the code in 2.10.2 Bnews (it had to be rewritten.)

It will still need extra work for things like batching, etc. It also
doesn't really care too much about Control: messages either, but that will
be taken care of. Anyone interested?

dave
-- 
The             UUCP: {utcsri,ihnp4,allegra,mcvax}!garfield!dave
Mercenary   INTERNET: dave@garfield.uucp
Programmer    CDNNET: dave@garfield.mun.cdn

"There are two types of people in the world, those who divide
the people of the world into two types, and those who can't"

chuqui@nsc.UUCP (Chuq Von Rospach) (09/24/85)

>Perhaps an analogy can be made with paper mail.  I sort my paper mail into
>three piles -- junk mail, mail from friends that I intend to read right
>away and answer quickly, and magazines that I intend to put aside for leisure
>reading.  While it is no doubt reasonable for mail and news to have similar
>interfaces, I think it is also quite reasonable for the computer to know
>whether I am interested in reading and replying to my mail, or interested in
>browsing through the news.

Actually, NNTN does deal with this concept through the use of prioritizing
features and the ability to view only messages with a given or higher
priority, so you can look at just the kinds of messages you're interested
in. By prioritizing mail differently from news, you can do just what you
want to do.

-- 
:From the shores of Avalon:     Chuq Von Rospach 
nsc!chuqui@decwrl.ARPA          {decwrl,hplabs,ihnp4,pyramid}!nsc!chuqui

Closing your mind is not a prerequisite to opening your mouth.

ian@darwin.UUCP (09/24/85)

>> The other thing I've found about the user interface is that there is no
>> reason why news and mail ought to have separate programs/interfaces.
>> Whether the message is news or mail should be part of the
>> filtering/prioritizing setup, but is irrelevant to 99.44% of the user
>> interface. A new filtering bit would be whether the message is public or
>> private, but whatever interface deals with news should deal with email as
>> well.
>
>YES, YES, YES!! Not only should mail and news be part of the same interface,

NO NO! Please don't put all that news in my $MAIL file! It's all I can
do to read all the mail that people send me and still get some work done!

On the other hand, using the same set of tools for reading/writing
mail and news makes perfect sense.

Actually, I've just switched over to the _mh_ mail system (the ``one
true way'' to read mail on UNIX, by the way), in whose
terminology the news could go into a `folder' that you only read
when you have time.  Even so, the saving throw of news has recently
been that you can ignore it (and if you're lucky some of the really
useless articles will expire along with the three that you really
need and thus have to restore from tape... :=} ).

Be careful of overloading your inbox when trying to escape from
information overload.

henry@utzoo.UUCP (Henry Spencer) (09/26/85)

> ... As things are here, it's a pain to
> always wait for 2 minutes for the editor to load for each reply, followup or
> new posting.

You don't need a better news system, you need a less elephantine editor!
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

peter@graffiti.UUCP (Peter da Silva) (09/26/85)

Chuq:

	Your required keyword *is* a newsgroup.

mark@sdencore.UUCP (Mark DiVecchio) (09/26/85)

One simple step, which has to have been suggested before, is to prohibit
posting the same message to multiple newsgroups.

-- 
Mark C. DiVecchio    K3FWT
[ihnp4|akgua|decvax|dcdwest|ucbvax]sdcsvax!sdencore!mark

henry@utzoo.UUCP (Henry Spencer) (09/27/85)

> I disagree, since the current setup will be replaced by a set of keywords
> that will allow you to define and filter material in what I hope will be a
> more efficient way. Newsgroups as they are currently defined would map into
> keywords pretty well, and you could set up your filtering mechanisms to let
> you browse through a set of interests.
> 
> If it comes together as I hope, you ought to be able to do things pretty
> much the way we do them now if you want, but I also expect that you'd be
> able to do them a lot better. You don't lose any functionality; you just
> gain a lot more flexibility.

Chuq, I'd love to see an explanation of how people who can't even get stuff
into the right newsgroup half the time are going to cope with a more complex
and more flexible interface.  Keywords lose big if half the messages that
come in have inappropriate or just-plain-wrong keywords on them.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

chuqui@nsc.UUCP (Chuq Von Rospach) (09/29/85)

In article <251@graffiti.UUCP> peter@graffiti.UUCP (Peter da Silva) writes:
>Chuq:
>
>	Your required keyword *is* a newsgroup.

Well, yes and no. Required keywords give you the utility of the newsgroup
setup without its getting in your way. Keywords can be implemented a LOT
more flexibly, and be a lot more dynamic, than a newsgroup (if you want to
start a new group, simply use a required keyword of 'misc' and some set of
defining keywords, and if those keywords come into general usage allow them
to migrate to the required set; if a keyword goes out of favor, take it off
the required list...). Moving to keywords also allows us to disassociate
ourselves from the newsgroup as a place holder -- the subject should be the
place holder, and the keywords the filtering and selection mechanisms. They
do many things the same, but semantically they are much different.

chuq
-- 
:From under the bar at Callahan's:   Chuq Von Rospach 
nsc!chuqui@decwrl.ARPA               {decwrl,hplabs,ihnp4,pyramid}!nsc!chuqui

If you can't talk below a bellow, you can't talk...

chuqui@nsc.UUCP (Chuq Von Rospach) (09/30/85)

In article <5999@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>Chuq, I'd love to see an explanation of how people who can't even get stuff
>into the right newsgroup half the time are going to cope with a more complex
>and more flexible interface.  Keywords lose big if half the messages that
>come in have inappropriate or just-plain-wrong keywords on them.

Henry, please don't interrupt me with facts... (*grin*) It is intuitively
obvious that rewriting things with keywords in mind will solve the problems
of the net, build an environment for nuclear disarmament, and cure acne.

Seriously, if the user interface is done right (which is a big if -- as
anyone who has spent time on Unix will attest, it is much easier to do a
user interface wrong, or even mediocre, than to do it right) we can get the
added flexibility and power while making it easier to do things right.
That's my hope. Since I haven't even finished my design, much less
implemented it, I don't know how well it will succeed, or whether it'll flop.

There ARE a number of things we can do. Keywords can be generated
automatically for the user and given to him as a list of keywords to choose
from, for example. I've done some research in this area, and keyword
generation is tricky, but with a little help from the net-gurus and some
decent heuristics it can probably be done quite well (the trick is not
getting a list of words, the trick is getting a list of *useful* words...)
Until I get a prototype up, I simply won't know for sure whether I'm
breathing sillygas or not, but taking an evolutionary step forward in the
user interface should give us both more power and a simpler interface at
the same time. Easy to say, but not so easy to do (that's why I like user
interfaces -- real challenges, since the idea is to not be noticed...)
-- 
:From under the bar at Callahan's:   Chuq Von Rospach 
nsc!chuqui@decwrl.ARPA               {decwrl,hplabs,ihnp4,pyramid}!nsc!chuqui

If you can't talk below a bellow, you can't talk...

rogerh@bocklin.UUCP (09/30/85)

Ummm, Henry, logical fallacy: because users can't cope with a baroque
user interface, we are going to deny them a better one?  DNF (does not
follow).

The fundamental disagreement is whether a keyword system is bound
to be "more complex".  I would argue that it can be more flexible
without being more "complex", in the sense that it will take less
knowledge to use it well (compared to the present system).

lawrence@encore.UUCP (Scott Lawrence) (10/02/85)

>
> prohibit posting the same message to multiple newsgroups.
>
>Mark C. DiVecchio    K3FWT

An even simpler one would be to prohibit the posting of followups to
multiple groups, and to require that a multiple-group posting specify a
single group for followups.


-- 

    Scott Lawrence
    UUCP: {decvax,allegra,linus,ihnp4}!encore!lawrence
    

nazgul@apollo.uucp (Kee Hinckley) (10/02/85)

....
Just as a nice generic followup that really applies to any code, but which
I find particularly relevant here, since I have gotten frustrated over news
and this issue many times.  

If anyone does get around to implementing any of the things discussed here
(and I certainly hope they do) please, pleaSE, plEASE, PLEASE keep the user
interface separate from the rest of the code!

Why?
    So those of us who don't like the interface you implement (or want to
    modify it to take advantage of the capabilities of our machines)
    can easily plug in a new interface.  Every once in a while I get
    frustrated enough to want to do this to readnews; just pop off
    and modify the command handling routine to take its commands in
    a different form (DOMAIN/Dialogue to be specific).  Soooorrrry.

How?
    It's simple.  Just make one routine that gets the input from the user.
    Make all of the actions performed subroutines that are called from
    that routine.  Likewise, make all of the output routines separate from
    the rest of the code.  No printfs scattered about the program, and no
    reads either.  Then if someone wants to use curses, or some other
    smart display handler, they can easily find the pieces that they will
    have to modify to make it work.
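
A toy illustration of that separation (Python, with every name invented);
the reader loop below knows nothing about how input and output actually
happen:

    ACTIONS = {}

    def action(name):
        # Register a command handler under a single-letter command name.
        def register(fn):
            ACTIONS[name] = fn
            return fn
        return register

    def get_command():
        # The ONLY routine that reads from the user; replace it to take
        # commands in a different form.
        return input("? ").strip()

    def show(text):
        # The ONLY routine that writes to the user; replace it with a
        # curses (or other smart display) version without touching the rest.
        print(text)

    @action("n")
    def next_article():
        show("next unread article would be displayed here")

    @action("q")
    def quit_reader():
        raise SystemExit

    def reader_loop():
        while True:
            ACTIONS.get(get_command(), lambda: show("unknown command"))()

    if __name__ == "__main__":
        reader_loop()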

So why don't I do it myself, if I'm so stuck up about how it should be done?
    Sure!  Just give me a small sum of money and some spare time.  In a
    crunch I'll skip the money, but the spare time is non-negotiable.

Some other notes, since I'm here.

    o   I like the suggestion of using a real database to store stuff (mdbm
        perhaps?).  That could greatly speed up some things, and allow the
        user to specify some very complicated restrictions on getting
        articles.  Presumably the database would only store the news
        headers, with a pointer to the actual article location.

    o   I also definitely like the idea of having a separate server to gather
        the stuff.  I can see two kinds of servers: one a global server that
        gathers the news article headers and stores them in a global database,
        the other a user-run server that updates the local database from the
        global one (deleting old entries and adding new ones).  Then along
        comes the user, who can access everything fast and at his/her leisure.

This is all kind of fragmented (I have some more specific stuff, but it's on
paper), but anyway, there's my two cents.


                                            Kee Hinckley
                                            User Environment
                                            Apollo Computer
                                            ...decvax!wanginst!apollo!nazgul

days@glasgow.glasgow.UUCP (Judge Dredd) (10/03/85)

> I disagree, since the current setup will be replaced by a set of keywords
> that will allow you to define and filter material in what I hope will be a
> more efficient way. Newsgroups as they are currently defined would map into
> keywords pretty well, and you could set up your filtering mechanisms to let
> you browse through a set of interests. 

	The problem as I see it with keywords is that people would abuse them.
A better method to cut down traffic would be to default to local distribution.
The number of adverts for cars being sold in NJ which we receive in EUROPE is
truly horrendous. Also, cancel articles which don't contain anything except
quotes from a previous article. We get quite a few long articles which, after
you've waded through pages of lines beginning with "> ", have a
REPLACE LINE WITH MESSAGE line and finally a signature. An alternative (which
should appeal to all net.fascists) is to prevent users from posting stuff, or
posting it off-site, until they've proved that they know what they're doing.
	Little things like prompting the user for a 'y' response after a
message would also help :-)
-- 
Stephen Day, Comp Sci Dept, University of Glasgow, Scotland

seismo!mcvax!ukc!glasgow!days		If time were like a treacle bun,
					I would enjoy it so,
					But now it seems it's on the run,
					I'd really better go.

jss@sjuvax.UUCP (J. Shapiro) (10/03/85)

Chuq, I didn't see your original posting, but I am familiar with
several keyword schemes, and in general, I find they work pretty
well.  The catches that I see are as follows:

	1. All keyword systems in successful use at the moment have some
	   central authority controlling the meaning of keywords and the
	   selection of which keywords properly convey a topic.  Otherwise
	   there is no way to reliably search for stuff on a given topic.
	   I don't see how this can be maintained on the net.

	2. Given 1., how is a given user to keep informed of the current
	   set of active keywords, particularly since your article on
	   the Amiga posted under the keywords "micro" and "amiga"
	   would go right through my keyword scan for "cbm" and I
	   probably would not find out about this until too late.

	3. Is there a strategy which can be adopted with respect to
	   informing users about new keywords which does not in
	   effect automatically resubscribe the user to a bad newsgroup
	   each time a keyword is invented?

I am also not convinced that this solves the newsgroup proliferation
problem, as the invention of new keywords which duplicate old ones will
happen all the time, with the net effect of increasing my .newsrc
indefinitely.

These problems can be solved in a number of ways, and the best seems
to be to keep a history on keywords and expire them if they fall below
a certain usage level.

I am curious, though, how you plan to do the keyword-to-file
mapping.  Any information you might provide would be appreciated.

Jon
-- 
Jonathan S. Shapiro
Haverford College

	"It doesn't compile pseudo code... What do you expect for fifty
		dollars?" - M. Tiemann

biep@klipper.UUCP (J. A. "Biep" Durieux) (10/04/85)

>> ... As things are here, it's a pain to
>> always wait for 2 minutes for the editor to load for each reply, followup or
>> new posting.

I think the slowness of Pnews is one of the more important reasons net collapse
hasn't happened yet.
-- 
							  Biep.
	{seismo|decvax|philabs|garfield|okstate}!mcvax!vu44!biep

	To be the question or not to be the question, that is.

fair@ucbarpa.berkeley.edu (Erik E. Fair) (10/05/85)

With regard to Chuq's comment about an `accolade' control message for
propagating `I liked that' marks on articles; I thought briefly about
that and decided that it's too hard to collect such information on a
netwide basis (nor would you really want to; there are just too many of
us), which is why I advocated it on a site wide basis only.

The other thing is that people with similar professional interests are
loosely grouped by site.

With regard to keywords, it should be noted that I was advocating
automatic generation of a list of keywords from the text of the
article.  While this technique has some obvious problems (how many
keywords would you label this article with? How many of those words did
I actually use in the body of the article?), it is clearly superior to
people doing it on this network for two reasons:

	1. consistency
	2. higher probability of the selected keywords
		actually reflecting message content.

As has been exhaustively pointed out, people are bad at selecting
keywords. In this area, we can expect the network community to be
better than average, but considerably worse than our expectations. All
you have to do for proof of this is look at the keywords that people
are attaching to articles now, even though the software does nothing
with them!
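
A crude sketch of what automatic keyword generation might look like
(illustrative Python only; the stop-word list and cutoffs are arbitrary),
and exactly the kind of naive frequency count whose output is consistent
without necessarily being useful:

    import re
    from collections import Counter

    # A tiny, arbitrary list of words too common to be worth anything
    # as keywords.
    STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
                  "that", "it", "for", "on", "with", "this", "are", "than",
                  "any", "can", "from", "there", "more"}

    def keywords(body, how_many=5):
        # Keep the most frequent remaining words.  Consistent, yes;
        # whether they actually reflect message content is another matter.
        words = re.findall(r"[a-z][a-z'-]{3,}", body.lower())
        counts = Counter(w for w in words if w not in STOP_WORDS)
        return [w for w, n in counts.most_common(how_many)]

    text = ("The USENET user community as a whole is suffering from "
            "information overload; there are more items coursing the paths "
            "of the network than any single individual can read.")
    print(keywords(text))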

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.BERKELEY.EDU

fair@ucbarpa.berkeley.edu (Erik E. Fair) (10/05/85)

In article <126@sdencore.UUCP> mark@sdencore.UUCP (Mark DiVecchio) writes:
>
>One simple step, which has to have been suggested before, is to prohibit
>posting the same message to multiple newsgroups.

No, no, no! What you are suggesting is the removal of useful information.
What I have been suggesting is the addition of useful information to the
headers of netnews articles, and software in the user-interfaces to make
use of the information. One such piece of information is the list of news
groups that an article was posted into.

Besides, multiple posting allows someone to clearly mark an article as
being of interest to more than one interest group. I cross posted the
original article in this chain to net.news and net.news.notes, because
I was discussing interface and information issues that involve both
software systems (and communities).

Further, disallowing cross posting will not discourage the practice; it
will just remove useful filtering information from the header(s), when
people post several separate copies of the same thing to several
newsgroups. In the current scheme, I save disk space (on UNIX systems
running netnews, only one copy of this article is around, although it
appears in two newsgroups), and users who have read it in one
newsgroup, will not have to read it again provided that their
user-interface is sufficiently clever (most of them are, these days).

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.BERKELEY.EDU

dee@cca.UUCP (Donald Eastlake) (10/06/85)

I don't really see why or how you are going to "prohibit" postings of a
message to more than one newsgroup.  (If you really try to stop them
with the obvious software check, people can always post multiple copies
to different groups, etc., so it's really best, if they are going to post
multiply anyway, to get them to put all the groups on one copy.)

What you *do* want is some smarts to handle multiple-newsgroup postings,
so a user doesn't have to read the same message more than once, and can say
things like "I don't want to see anything that *includes* net.flame (or
whatever) in its group list", etc.  Possibly it would make sense to provide
an ordering, so that net.flame dominated net.misc, which dominated
net.general and net.followup, which dominated all others, in terms of how a
message is handled.  Possibly also some subset logic might help.  You could
do something clever about something that was sent simultaneously to various
subsets of x.y, x.y.z, sub-set-of-x.y, and sub-set-of-x.y.z.
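
A small sketch of such a dominance ordering (Python; the ranking is just
the example groups named above, in the order given, with the tie between
net.general and net.followup broken arbitrarily):

    # Highest-ranked group present decides how a cross-posted message is
    # handled; any group not listed ranks below all of these.
    DOMINANCE = ["net.flame", "net.misc", "net.general", "net.followup"]

    def dominant_group(newsgroups):
        for group in DOMINANCE:
            if group in newsgroups:
                return group
        return newsgroups[0]    # none of the ranked groups: take the first

    print(dominant_group(["net.news", "net.flame"]))     # net.flame decides
    print(dominant_group(["net.news", "net.news.adm"]))  # net.news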
  
-- 
	+1 617-492-8860		Donald E. Eastlake, III
	ARPA:  dee@CCA-UNIX	usenet:	{decvax,linus}!cca!dee

lauren@vortex.UUCP (Lauren Weinstein) (10/07/85)

One problem with automatic keyword generation is that you tend
to get LOTS of keywords.  The more keywords, the more "false"
matches (in either the positive or the negative sense).  That is, words
that have been classified as keywords by the system but which
have little or nothing to do with the primary topic of the
article "confuse" the matching system.  Articles that you really
didn't want to see start showing up as matches (since they matched
on "extraneous" keywords), and articles you wanted to see may often be
missed (since search keys that specified the exclusion of articles
containing certain keywords will trigger on all these "extra" keywords as
well!)

I can point at a variety of real-world examples for both of these
keyword error modalities if desired.

--Lauren--

tim@k.cs.cmu.edu.ARPA (Tim Maroney) (10/07/85)

The idea of some means for "reviewing" messages and using others' "reviews"
as a filter will be in the bulletin board system for VICE.  However, since
it will be on VICE, there will be only one copy of the articles for the
whole campus, so it doesn't take broadcasting issues into account.

For a situation like USENET's, the best solution is to designate "reviewers"
for each newsgroup.  A finite set of reviewers would exist for each
newsgroup.  Each would rank each message (either on a Boolean worth
reading/not worth reading scale, or from one to ten), although since this
would be a volunteer activity, everyone would not be expected to rank each
message.  Users would set up combinations of reviewers in a file to be used
together with the current .newsrc file.  For instance, you could specify, "I
want to read everything which Andy and Betty thought was interesting in
net.pets.fishheads, but which Chuck disliked".
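
An illustrative sketch of evaluating that kind of specification (Python;
the reviewers, rankings, and article-IDs are all invented):

    # Boolean rankings per reviewer: True = worth reading.  Reviewing is
    # voluntary, so not every reviewer has ranked every article.
    reviews = {
        "andy":  {"<1@f>": True,  "<2@f>": True,  "<3@f>": False},
        "betty": {"<1@f>": True,  "<3@f>": True},
        "chuck": {"<1@f>": False, "<2@f>": True},
    }

    def wanted(msgid):
        # "Everything which Andy and Betty thought was interesting
        #  in net.pets.fishheads, but which Chuck disliked."
        return (reviews["andy"].get(msgid) is True and
                reviews["betty"].get(msgid) is True and
                reviews["chuck"].get(msgid) is False)

    for msgid in ("<1@f>", "<2@f>", "<3@f>"):
        print(msgid, wanted(msgid))     # only <1@f> qualifies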

The reviews would be propagated by a new sort of control message.  Each
machine would keep a database of articles and reviews.  This could be done
easily in dbm, although some machines would have to fall back on a two-field
text file (in classic slow UN*X style).  Another database would keep track
of reviewers.

Reviewers would be selected by the informal method currently used to select
moderators.  Additional control messages could be used to maintain the
database of reviewers.  There would be little security due to the ease of
putting a message with a false sender address onto USENET (and the
insecurity of requiring a password to be sent around), but I doubt this
would be a serious problem.

Overall, you'd need these pieces of code:

	(1) store/retrieve operations for article-review database
	(2) store/retrieve operations for reviewer database
	(3) code to receive and handle new control messages
	(4) a user interface for reviewers (extension of readnews
	    or rn)
	(5) a program to help users set up their preferences (possibly
	    within readnews or rn)
	(6) code within readnews and rn to determine whether any article
	    fits the user's review criteria
	(7) code to remove the reviewer information of an article when
	    it's cancelled or it expires

This effort could be split up among several people if we can arrive at a
more specific set of specs for each part.  I'd be willing to write a few
parts myself.

The idea could also be elaborated on to assuage the problem of storing the
news.  For instance, a majority of reviewers giving thumbs down on a message
could be equivalent to a cancellation, or "expire" could run every day and
eliminate all articles N days old or older which have M/(number of reviewers
for their groups) or fewer thumbs-up ratings, as well as deleting anything
more than a certain age.
-=-
Tim Maroney, CMU Center for Art and Technology
ARPA:	Tim.Maroney@CMU-CS-K	uucp:	seismo!cmu-cs-k!tim
CompuServe:	74176,1360	audio:	shout "Hey, Tim!"

skip@gitpyr.UUCP (Skip Addison) (10/08/85)

In article <10550@ucbvax.ARPA> fair@ucbarpa.berkeley.edu (Erik E. Fair) writes:
> ...
>The other thing is that people with similar professional interests are
>loosely grouped by site.
> ...
>	Erik E. Fair	ucbvax!fair	fair@ucbarpa.BERKELEY.EDU

Wrongo!!  For instance, relatively few people at Georgia Tech read net.lan.
Since I'm in the Office of Telecommunications and Networking, I consider
that an important group to read.  And the people who read the articles
I write and write the articles I read are almost never at Tech.

The accolade system, if implemented, should work across all sites, but
may be limited to a homogeneous (if there is such a thing) network like
USENET, ARPANET, or whatever.


-- 
"Here I stand, for I can do no other."   -- Martin Luther

Skip Addison
The Office of Telecommunications and Networking
Georgia Tech, Atlanta GA  30332-0348
Southern Bell, AT&T, MCI, etc:   (404) 894-6866
CSNet:	Skip @ GATech		ARPA:	Skip.GATech.CSNet @ CSNet-Relay.ARPA
uucp:  ...!{akgua,allegra,hplabs,ihnp4,linus,seismo,ulysses}!gatech!skip

dhb@rayssd.UUCP (David H. Brierley) (10/08/85)

I think that what was meant by the suggestion to prevent
postings to multiple groups was that the software should
somehow prevent a person from posting an article to several
groups ONE AT A TIME.  Cross-postings should most
assuredly be allowed.  The only way that I can see to prevent
a user from posting the article more than once is to maintain
some sort of database that records what articles the user
has submitted, and then compare the articles.  Perhaps just
a simple comparison of the subject lines would be sufficient.
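
A sketch of that subject comparison (Python; the per-user history of
submitted subjects is the only state assumed, and it is invented here):

    # Subjects this user has already submitted; in practice this would be
    # loaded from some per-user history file kept by the posting software.
    posted_subjects = set()

    def already_posted(subject):
        # A match against an earlier submission suggests the same article
        # is being posted again, one group at a time.
        key = subject.strip().lower()
        if key in posted_subjects:
            return True
        posted_subjects.add(key)
        return False

    print(already_posted("Information Overload"))   # False: first posting
    print(already_posted("information overload"))   # True: looks like a repeat
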
-- 
	Dave Brierley
	Raytheon Co.; Portsmouth RI; (401)-847-8000 x4073
	...!decvax!brunix!rayssd!dhb
	...!allegra!rayssd!dhb
	...!linus!rayssd!dhb