[news.software.b] Future of USENET

xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (06/25/91)

[note crossposts, followup]

 mathews@hadar.cs.Buffalo.EDU (Ryan Mathews) writes:

> But I just have to ask one question:
> Don't you think we're creating too many groups?

No, quite to the contrary, I think we are creating
far too few, and our news maintenance and news
reading software is poorly set up for the growth
which should occur to make up for this.

My reasoning is based on weak data, but to the best
of my ability to determine, in the last five years,
the volume of postings has risen 25-fold, while the
number of newsgroups has risen only tenfold. That
makes each newsgroup 2.5 times as crowded on average
as was the case back when it was still (barely)
possible for a single human being to read the entire
net as a full time occupation.

I attribute this deficiency of organizational
improvements to the cumbersome newsgroup creation
process. Once having determined that the net needs
to get back to a much finer split of newsgroups to
give readers any hope of reading interesting
material without wading through ten times as much
uninteresting material, why should anyone but the
current readers of a newsgroup be involved in the
partition of that group into subgroups?

The rest of the net has only three possible
responses when presented with a vote for an
unfamiliar group: 1) ignore the vote; 2) vote in
ignorance; or 3) vote NO on the "principle", however
misguided, that there are "too many newsgroups".

What is needed instead is a re-examination of this
whole question, and the creation of software and
operating paradigms to satisfy the following poorly
met needs:

1) Index the net, so that groups of interest can be
found by keyword searches; even a full text search
of the entire online news spool, while slow as mud,
would be a help in this direction.

This would actually _lessen_ the need for group
creation, by showing the user that topic X is
already heavily discussed in newsgroup Y, and so
doesn't need a newsgroup of its own to get a
discussion going.
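As a sketch of how crude such an index could be and still be useful, here is a minimal keyword scan over a news spool, assuming one article per file and dotted group names mirrored as directories; the function name and layout are hypothetical, not any existing news software:

```python
import os
import re

def keyword_search(spool_dir, keywords, min_hits=1):
    """Scan every article file under spool_dir and score newsgroups by
    how many of their articles match at least min_hits keywords."""
    scores = {}  # newsgroup name -> count of matching articles
    pats = [re.compile(re.escape(k), re.IGNORECASE) for k in keywords]
    for root, _dirs, files in os.walk(spool_dir):
        # Directory path relative to the spool is the dotted group name.
        group = os.path.relpath(root, spool_dir).replace(os.sep, ".")
        for name in files:
            try:
                with open(os.path.join(root, name), errors="replace") as f:
                    text = f.read()
            except OSError:
                continue
            hits = sum(1 for p in pats if p.search(text))
            if hits >= min_hits:
                scores[group] = scores.get(group, 0) + 1
    # Busiest matching group first.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Slow as mud, as the article says, but it answers "where is topic X already discussed" without creating anything.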

2) Index and automate feed sys file maintenance, so
that, while all groups propagate to those who want to
read them, uninterested sites are omitted from
carrying, and to the extent possible from passing,
unwanted newsgroups. Among other things this will
require a much denser set of interconnections for
the net than now exist, and software to accomplish
the much more complex feed contact protocols and
expiry protocols needed.

This will save scads of spool space and telecomm
charges.

3) Change the news base to a hypertext style, to
limit the actual volume used for passing context
material in followups.

This would save space, and if actually presented by
some news readers as hot buttons, would also
dramatically decrease reading time for a subscriber
following a thread who already has the context in
mind and doesn't need to see it again.

4) Present newsgroup choices hierarchically, to let
the user view the actual newsgroup organization, and
to limit screen painting time for newsgroup
selection; change from a typing to a pointing
interface.

The more I read news, the less satisfied I find
myself with _any_ particular order of presentation
of newsgroups; I tend to read in different orders on
different days or even different hours of the same
day. None of the current interfaces I've seen make
random access to newsgroups easy.

5) Take much more advantage of user-local processing
power; this one is tough because of the wide variety
of news reading hardware, but lots of stuff that I
have to access over slow dial up lines repeatedly
during my session could be downloaded silently to a
database on my local hardware while I read other
articles, and painted on my screen much faster (about
30 times) from local store.

This would actually _decrease_ the communications
load on the host machine.

6) Create an easy to use complement to kill files:
interest files, such that only files that meet some
positive criterion are presented for reading, rather
than negative criteria being avoided.

As an example, show me articles containing at least
five of twenty keywords, or articles starting new
threads. Make a global facility that pulls forward
articles from _anywhere_ I subscribe, or even
anywhere at all, containing ten of twenty
particularly hot keywords, and presents them to me
before I enter any newsgroup, in case that is all I
have time for right now.
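A minimal sketch of such an interest filter, assuming RFC-1036-style articles where a missing References: header marks a new thread; the function name and thresholds are hypothetical:

```python
def interesting(article_text, keywords, threshold=5):
    """Present an article if it matches at least `threshold` of the
    given keywords, or if it starts a new thread (no References:)."""
    header, _, _body = article_text.partition("\n\n")
    new_thread = not any(
        line.lower().startswith("references:")
        for line in header.splitlines()
    )
    text = article_text.lower()
    hits = sum(1 for k in keywords if k.lower() in text)
    return new_thread or hits >= threshold
```

The inverse of a kill file: nothing is shown unless some positive rule fires.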

7) Improve the software to cope gracefully with lots
more newsgroups, with much deeper hierarchies, with
longer, less typable, fully qualified newsgroup
names.

For example, I just found out that one of the two
leaf site packages for my local hardware has a very
hard limit of 30 characters in a fully qualified
newsgroup name, because it makes that a directory
name rather than using a hierarchical directory
structure. Unfortunately, my local site already has
several newsgroups whose fully qualified name is
longer than 30 characters!
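The hierarchical-directory alternative is nearly a one-liner; this sketch (names hypothetical) maps a fully qualified group name onto nested directories so no single path component, rather than the whole name, is what must fit the filesystem limit:

```python
import os

def group_to_path(spool, group):
    """Map a dotted newsgroup name onto a nested directory path, so
    each component stays short even when the full name is long."""
    return os.path.join(spool, *group.split("."))
```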

Naturally typing one of these behemoths in rn or trn
to jump directly to the newsgroup is a royal pain.

There are lots more ideas along the same direction.
The net has become an information overload for any
one person, and even individual newsgroups are such
for many of us. Lacking better access mechanisms,
finer newsgroup partitioning is at least a start.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>

flee@cs.psu.edu (Felix Lee) (06/28/91)

We're currently at 20M of news a day in 1200 newsgroups.  Three years
ago it was 4M/day and 400 newsgroups.

I'd like to propose 1G/day and 100 000 newsgroups as a target for
normal operation in some future news system.  Start designing now.

1G/day is volume of news generated by posters everywhere, not
necessarily volume of news exchanged between two sites.

100 000 newsgroups is a rough measure of breadth.  Perhaps newsgroups
will be replaced by something better.

I just want to inject some real numbers in the fuzzy handwaving of
"doing this will help us cope with greater news volume".
--
Felix Lee	flee@cs.psu.edu

spike@coke.std.com (Joe Ilacqua) (06/28/91)

In article <21eHwfd$@cs.psu.edu> flee@cs.psu.edu (Felix Lee) writes:

>We're currently at 20M of news a day in 1200 newsgroups.  Three years
>ago it was 4M/day and 400 newsgroups.

	And there are more than twice that number of newsgroups
available.

->Spike

xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (06/28/91)

 sksircar@stroke.princeton.edu (Subrata Sircar) writes:
> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:

>> The rest of the net has only three possible
>> responses when presented with a vote for an
>> unfamiliar group: 1) ignore the vote; 2) vote in
>> ignorance; or 3) vote NO on the "principle",
>> however misguided, that there are "too many
>> newsgroups".

> Very few groups ever get rejected, however. The
> opposite argument is more true; that the only
> people who vote are the people in favor of the
> split.

Not quite; there are a steady 45 or so folks who
vote against every group, no matter the merits,
because they think the net has too many newsgroups.
This means you'd better aim for 150 YES votes to
pass a group, not just 100.

>> 1) Index the net, so that groups of interest can
>> be found by keyword searches; even a full text
>> search of the entire online news spool, while
>> slow as mud, would be a help in this direction.

> With proper naming, this is easy; just grep the
> news spool for directory names.

I wish it were that easy; read a few research papers
on the relative success rate of keyword searches
against even full text indexed databases; the
results are pretty sorry. Humans do _not_ have a
good unspoken agreement about what words should be
used to talk about which subjects, so you have to
use lots of keywords against lots of pertinent text
to have a good chance of finding what you seek.

Take a look at another posting in this thread, from
Richard Miller, that bemoans the difficulty of
keeping conversations correctly slotted in a mere
_three_ education newsgroups. Just looking at the
group names isn't nearly enough, though it can of
course help; I use it myself a lot, but less in
looking for a subject than in finding a fully
qualified group name I only remember in part.

>> 3) Change the news base to a hypertext style, to
>> limit the actual volume used for passing context
>> material in followups.

> This is unfortunately extremely difficult, given
> the number of character based interfaces to the
> net. How do you generate hypertext interfaces that
> can be manipulated only through 7-bit ascii codes,
> which is what the majority of the net uses?

Up until someone got a little too clever installing
fascist options in inews, there was a common
agreement on the net that a leading ">" (or several)
indicated included material, so reserving a marker
seems the right thing to do. The Thinker(tm)
hypertext package encloses words which are hypertext
link hotbuttons in "<", ">" pairs. Mix this with a
message id, start byte, end byte contents (which
need not be displayed that way to the user) and you
have the essence of a hypertext link, done in
printable ASCII.  I'd prefer that the hot button as
displayed to the user show the user-id of the author
of the included material, with a level number in case
the thread contains quotes from that author from more
than one prior article.  So what the user sees would
look like "<xanthian-1>", indicating that a most
recent level quote from me had been included by the
present article's author.

We could continue the convention of keeping this
left adjusted on a line alone, or tag it on the end
of the previous paragraph to save space if our news
displayer did real time paragraph flowing and worked
in meaningful (SGML) units of text.
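To make that concrete, here is one possible printable-ASCII encoding of such a link, with the label shown to the user and the message-id plus byte range carried alongside; the delimiter format is invented for illustration, not anything Thinker(tm) or inews actually does:

```python
import re

# Hypothetical wire form for a quote link, per the article's sketch:
# visible label, message-id, then start and end byte of the quoted span.
#   <xanthian-1|<123@zorch.SF-Bay.ORG>|120|340>
LINK = re.compile(
    r"<(?P<label>[^|<>]+)\|<(?P<msgid>[^>]+)>\|(?P<start>\d+)\|(?P<end>\d+)>"
)

def make_link(author, level, msgid, start, end):
    """Build the printable-ASCII link for a quote from `author`."""
    return "<%s-%d|<%s>|%d|%d>" % (author, level, msgid, start, end)

def parse_link(text):
    """Return (label, message-id, start, end) for the first link found,
    or None; a reader would display only the label as a hot button."""
    m = LINK.search(text)
    if not m:
        return None
    return (m.group("label"), m.group("msgid"),
            int(m.group("start")), int(m.group("end")))
```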

>> 6) Create an easy to use complement to kill
>> files: interest files, such that only files that
>> meet some positive criterion are presented for
>> reading, rather than negative criteria being
>> avoided.

> This is possible with rn, I don't know about other
> newsreaders.

The operative word is "possible"; I use this with
alt.flame to pull out napalm aimed at my personal
carcass, but it is an inconvenient side effect of
trn mechanisms meant for other purposes, and quite
clunky. An "interest" filter designed explicitly for
this purpose could be better designed.

>> As an example, show me articles containing at
>> least five of twenty keywords, or articles
>> starting new threads. Make a global facility that
>> pulls forward articles from _anywhere_ I
>> subscribe, or even anywhere at all, containing
>> ten of twenty particularly hot keywords, and
>> presents them to me before I enter any newsgroup,
>> in case that is all I have time for right now.

> The first is conceptually easy; run twenty "mark"
> files on the newsgroup, and only present articles
> which are marked by five or more (storing the
> numbers in separate files, unmarking all when
> done).

Thinking harder about that, what I'd probably want
is "N occurrences of some subset of these M keywords,
with at least R different keywords appearing";
persistent mention is a better clue than casual
mention.
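That rule is simple to state in code; a sketch, with the function name and thresholds as hypothetical parameters:

```python
def persistent_match(text, keywords, total_needed=10, distinct_needed=3):
    """Accept an article when keyword mentions total at least
    total_needed AND at least distinct_needed different keywords
    appear -- persistent mention beats casual mention."""
    low = text.lower()
    counts = {k: low.count(k.lower()) for k in keywords}
    total = sum(counts.values())
    distinct = sum(1 for c in counts.values() if c > 0)
    return total >= total_needed and distinct >= distinct_needed
```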

> The second is conceptually just as easy, but
> tremendously difficult in current practice.

Not conceptually harder, just that our machines are
nowhere near fast enough to do the job; a Connection
Machine wired into the disk drive hardware would be
Just About Right.

>> Naturally typing one of these behemoths in rn or
>> trn to jump directly to the newsgroup is a royal
>> pain.

> This can be done with filename completion, as
> t-shell editing or Mach on the NeXT provide;
> simply type part of the name and hit a hot key and
> it finishes unique extensions.

Yeah, except that hierarchy names by design aren't
unique until you get close to the end. Doing
completion a level at a time would be better, but
a point and click interface would be _much_ better.
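Level-at-a-time completion is straightforward given the active-groups list; a sketch (function name hypothetical):

```python
def complete_level(groups, partial):
    """Complete a newsgroup name one hierarchy level at a time:
    extend `partial` by the next full dotted component of every
    group that matches, and return the distinct possibilities."""
    candidates = set()
    for g in groups:
        if g.startswith(partial):
            rest = g[len(partial):]
            head = rest.split(".", 1)[0]  # next component only
            candidates.add(partial + head)
    return sorted(candidates)
```

Each keystroke of the hot key narrows by one level instead of stalling on the non-unique prefix.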

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>

david@twg.com (David S. Herron) (06/28/91)

In article <21eHwfd$@cs.psu.edu> flee@cs.psu.edu (Felix Lee) writes:
>We're currently at 20M of news a day in 1200 newsgroups.  Three years
>ago it was 4M/day and 400 newsgroups.
>
>I'd like to propose 1G/day and 100 000 newsgroups as a target for
>normal operation in some future news system.  Start designing now.

Hmm.. part of the design has to be a fat enough wire leading
to my home.  A quick calculation 

	| 1 - boomer:david --> bc
	scale=3
	1000000000/24
	41666666.666
	./60
	694444.444
	./60
	11574.074

Says the wire will need to handle over 100Kbits-per-second all
day long.  Hmm.. up till now my TB+ seemed pretty capable of
the job.  Sigh.. if only I were Rob Pike, AT&T would be willing
to pay for T1 lines to the house, oh well. ;-)

ISDN doesn't cut it -- it's only 56Kbaud.

But then it will require 6 doublings of traffic.  In the
past each doubling has required 18 months, up until the
a.s.pictures nonsense hit us.  Reaching 1G/day should then
take somewhere between 3 and 10 years, depending on whether
the volume curve is really increasing.
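The arithmetic behind that estimate, assuming growth stays at one doubling per fixed interval (function name hypothetical):

```python
def time_to_gigabyte(current_mb_per_day=20.0, months_per_doubling=18):
    """How many doublings, and how many years, until news volume
    reaches 1 GB/day under steady exponential growth."""
    volume = current_mb_per_day
    doublings = 0
    while volume < 1000.0:  # 1 GB/day, measured in MB
        volume *= 2
        doublings += 1
    return doublings, doublings * months_per_doubling / 12.0
```

At 18 months per doubling the six doublings take nine years; the 3-to-10-year spread comes from not knowing whether the doubling interval itself is shrinking.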

-- 
<- David Herron, an MMDF & WIN/MHS guy, <david@twg.com>
<-
<-
<- "MS-DOS? Where we're going we don't need MS-DOS." --Back To The Future

amanda@visix.com (Amanda Walker) (06/29/91)

In article <9197@gollum.twg.com> david@twg.com (David S. Herron) writes:

   Says the wire will need to handle over 100Kbits-per-second all day long.

SMDS--no problem :).

I just wanna see what UUNET would be like at that point...
--
Amanda Walker						      amanda@visix.com
Visix Software Inc.					...!uunet!visix!amanda
-- 
"Speak in French when you can't think of the English for a thing--
 turn out your toes as you walk-- and remember who you are!"
		--Lewis Carroll

sef@kithrup.COM (Sean Eric Fagan) (06/29/91)

In article <9197@gollum.twg.com> david@twg.com (David S. Herron) writes:
>ISDN doesn't cut it -- it's only 56Kbaud.

ISDN is 64kbaud.  Two 64kbaud lines and a 16kbaud line (for information and
control).

If and when net traffic gets that high, one of a few things will happen.
The net could split up into a couple of subnets.  Various sites might become
more restrictive in what groups they get.  Telebit might come up with a
T1-capable "modem" requiring multiple ordinary phone lines, or something
similar.  Telecommunication speeds might move over one notch (i.e., home
lines get 56kbaud or higher, businesses start to get t1, universities get
connected via t3 lines, and the really desperate hook up at t11 lines 8-)).
People might start getting daily newsfeeds on laserdisks 8-).

>Reaching 1G/day should then
>take somewhere between 3 and 10 years, depending on whether
>the volume curve is really increasing.

I'd say more like 5 years, as a reasonable estimate (longer if people stop
posting pictures and sounds).  And that should be enough time for technology
to do something.

-- 
Sean Eric Fagan  | "What *does* that 33 do?  I have no idea."
sef@kithrup.COM  |           -- Chris Torek
-----------------+              (torek@ee.lbl.gov)
Any opinions expressed are my own, and generally unpopular with others.

brad@looking.on.ca (Brad Templeton) (06/29/91)

For a medium like USENET there is no need for a fast TWO-way wire into your
home.   After all, your cable TV wire is getting the equivalent of 50 trillion
bytes a day.   Current Ethernet does 108 gigabytes a day, and FDDI rates are
ten times as much.

USENET is broadcast and the technology is easily in place for quite a lot.
In fact, a single gigabit channel could easily handle the entire typing output
of the human race.   Of course, we want to send sound and pictures so the output
level increases.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

wb8foz@mthvax.cs.miami.edu (David Lesher) (06/29/91)

A la Stargate, all we need to do is PCM the non-fiber-optic
source, Sol, and put photocells on the roof.

Of course, we might need a Pierson's Puppeteer and someone lucky to
help with some of the complicated stuff.
-- 
A host is a host from coast to coast.....wb8foz@mthvax.cs.miami.edu 
& no one will talk to a host that's close............(305) 255-RTFM
Unless the host (that isn't close)......................pob 570-335
is busy, hung or dead....................................33257-0335

jtk@weber.UUCP (Joe T. Klein) (06/29/91)

On the future of USENET...

Anybody have a table of the year-by-year growth in traffic?

At some point the curve must level off!! I don't think I can
justify getting a fiber-optic link just to read the news. :-)
I just hope people don't start trading CD length digital
music via USENET. ;-)

-- 
  Life of a UNIX hack is...                            Joseph T. Klein
  /nev/dull                                            jtk@rwmke.lakesys.com
                               (414) 372-4454
  RiverWest Milwaukee Public UNIX ** 808 E. Wright St., Milwaukee, WI  53212  

asp@uunet.uu.net (Andrew Partan) (06/30/91)

In article <1991Jun28.211649.20438@visix.com>, amanda@visix.com (Amanda Walker) writes:
> In article <9197@gollum.twg.com> david@twg.com (David S. Herron) writes:
>    Says the wire will need to handle over 100Kbits-per-second all day long.
> I just wanna see what UUNET would be like at that point...

Well we now have 4 machines handling news - one for all incoming &
outgoing nntp, one for uucp batching, & two to handle compress.  Two for
compress is overkill, but they were just sitting there...

Once I get a news replicator in place (so that I can have the same
article path names on multiple systems), we will be splitting the uucp
batcher into multiple pieces (and getting rid of the uunet!uunet hack in
the Path: that we have now) and then we should be able to scale, at
least as far as news goes.  Scaling uucp (or whatever) is going to take
more work.

	--asp@uunet.uu.net (Andrew Partan)