[alt.hypertext] Hypertext Usenet

gnu@hoptoad.uucp (John Gilmore) (11/07/87)

[I would've cross posted to comp.society.futures but it's a "mod"
group and doesn't allow links to other groups.]

[Apple's Hypercard has more to do with "hype" than with "hypertext".
If you are looking for a hypertext system, look elsewhere.  I think
of hypercard as "shell scripts for the Mac".]

Now to the real topic -- the Hypertexting of Usenet.  People [in comp.
society.futures] have been proposing various strange character
conbinations to indicate hypertext content.  This is pretty silly.

(1)  We haven't figured out what kinds of information we want to
convey, so picking a representation is premature.

(2)  We already have a representation for the major thing we need
-- document to document links.  This is the <messageid@uniquehost>
notation.

Most proposed hypertext systems give the ability to link one piece
of text with another one, down to the character or word level.
Usenet currently only provides this at the article level, but for
the next few years I think that's fine.  Current literary references
(citations, bibliographies, footnotes, etc) typically refer to the
page or section level, which is about the same amount of text as
a Usenet article.

-----

A major problem with turning the Usenet into a hypertext system is
the automated following of links.  Let's say I have an article which
references article <1234@hop.toad.com>.  I don't have a copy of 1234.
(Maybe it expired, maybe I didn't subscribe to it, maybe it got dropped
by somebody 3 feeds away.)  How do I get a copy?

Currently this is all done manually.  Though there are large archives
kept at various places, automated retrieval, even if you know the
unique message-ID of the article, is in an infant stage.

Before we start considering how to build the user interfaces and such,
I think we should shore up the infrastructures so that all the data
which is *somewhere* accessible on the network can be gotten without
human intervention.  *Then* build mechanisms, beyond the current
References: lines and such, for indexing this information so that you
can go from a desire-for-info-on-widgets to a bunch of article-IDs
to the actual articles.

Ideally I'd like to see a distributed database, updated when any user
does an "s" command to save a copy of an article (if that user & site
are willing for other people to be able to get it from them), that
would allow anybody else to locate and retrieve that article.  Hugh
Daniel and Jeff Anton and I sat down and designed a candidate database
setup a month ago, and it may be doable with a year or two of work.

-----

This is not to say that we should abandon user interface work on the
Usenet; far from it!  But the timbers under the net are pretty rotten
for the kind of loads we will want to put on 'em, once we have better
user interfaces.

	John
-- 
{pyramid,ptsfa,amdahl,sun,ihnp4}!hoptoad!gnu			  gnu@toad.com
Love your country but never trust its government.
		      -- from a hand-painted road sign in central Pennsylvania

bryce@hoser.berkeley.edu.UUCP (11/07/87)

In article <3296@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>
>Now to the real topic -- the Hypertexting of Usenet....

The main thing I personally would find useful for Usenet, short of
complete hypertext, is a reader ratings system.  Since I may have only a 
peripheral interrest in certain high volume groups I'd set a ratings
threshold much higher than a group near and dear to my everyday life.

Don't bother asking me about implementation details, I just want to *use*
the thing. :-)

|\ /|  . Ack! (NAK, SOH, EOT)
{o O} . bryce@hoser.berkeley.EDU -or- ucbvax!hoser!bryce
 (")
  U	"Fanitic: One who can't change his mind, and won't change the
	 subject."

webber@brandx.rutgers.edu.UUCP (11/08/87)

In article <3296@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
> ...
> (2)  We already have a representation for the major thing we need
> -- document to document links.  This is the <messageid@uniquehost>
> notation.

Yes, and these documents are generally small enough to not need substructure
addressing.  

> ...
> A major problem with turning the Usenet into a hypertext system is
> the automated following of links.  Let's say I have an article which
> references article <1234@hop.toad.com>.  I don't have a copy of 1234.
> (Maybe it expired, maybe I didn't subscribe to it, maybe it got dropped
> by somebody 3 feeds away.)  How do I get a copy?

The simple thing would be to have each system that participates in the
database archive all messages posted from it (using mag tapes where necessary).
If someone was interested in giving general access to messages from
multiple sites, they could simply request a copy of the tapes
[assuming that the original holders would view the reduced request
handling as being worth the price of a blank tape :-) ].  A catalog of
sites that are keeping archives and what sites they are archiving then
becomes our equivalent to the library Serials List and the whole
system is our ``inter-library loan.''

Minimal technical implementation problems.  As to the other, you will
find the solution on page 290 of Suzette Haden Elgin's The Judas Rose
(DAW Books, Inc., Feb. 1987).  [How was that for a comment in the true
spirit of hypertext :-) ]

> ...
> Ideally I'd like to see a distributed database, updated when any user
> does an "s" command to save a copy of an article (if that user & site
> are willing for other people to be able to get it from them), that
> would allow anybody else to locate and retrieve that article.  Hugh
> Daniel and Jeff Anton and I sat down and designed a candidate database
> setup a month ago, and it may be doable with a year or two of work.

Hmmm.  I guess we would all be interested in hearing more about your
design.  But, from what I have heard of the success of the arpa people
with their  ``named'' distributed database, I would rather try and build a
central computer big enough to support everyone in the world connected
simultaneously instead.  First things first, but easy before hard :-)

-------- BOB (webber@aramis.rutgers.edu ; rutgers!aramis.rutgers.edu!webber)

rsalz@papaya.bbn.com.UUCP (11/08/87)

] Each site archives all its messages on magtape, e.g.  Then others could
] ask them for a copy of that magtape.  "A catalog of sites that are keeping
] archives and what sites they are archiving then becomes our equivalent to
] the library Serials List and the whole system is our ``inter-library loan.''"
Cute idea, but needs more infra-structure.  I could see every junior
hacker sending away to mimsy in order to get the complete words of "Chris
Torek on BSD" -- completely overloading that site with hundreds of
requests, each of which is "reasonable" in and of itself.  As anyone who's
done a non-commercial software distribution can tell you, this kind of
thing rapidly becomes a big hassle.

> ... sat down and designed a candidate database
> setup a month ago, and it may be doable with a year or two of work.

>Hmmm.  I guess we would all be interested in hearing more about your
>design.  But, from what I have heard of the success of the arpa people
>with their  ``named'' distributed database, I would rather try and build a
>central computer big enough to support everyone in the world connected
>simultaneously instead.  First things first, but easy before hard :-)

Ahh, one big machine to serve the world?  You mean like Multics, which was
to serve the entire Boston/Cambridge community?  The problem the
nameserver folks (users and maintainers) are having is that the system is
just not big/fast enough.  Not many people think the basic concept is
broken.  Unless you complete control over growth (does anyone ever have
complete control over growth), you had best lay your groundwork to allow
distributed mechanisms otherwise your system is guaranteed to become
insufficient at some point.

But what about "better" than Usenet hypertext:  Where's Ted Nelson?
Where's the Brown IRIS group?  Doesn't UMichigan have something in the
fire?  What about interactive videodisc?

Comments?
	/r$
-- 
For comp.sources.unix stuff, mail to sources@uunet.uu.net.

webber@brandx.rutgers.edu.UUCP (11/09/87)

In article <234@papaya.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
> ] Each site archives all its messages on magtape, e.g.  Then others could
> ] ask them for a copy of that magtape.  "A catalog of sites that are keeping
> ] archives and what sites they are archiving then becomes our equivalent to
> ] the library Serials List and the whole system is our ``inter-library loan.''"
> Cute idea, but needs more infra-structure.  I could see every junior
> hacker sending away to mimsy in order to get the complete words of "Chris
> Torek on BSD" -- completely overloading that site with hundreds of
> requests, each of which is "reasonable" in and of itself.  As anyone who's
> done a non-commercial software distribution can tell you, this kind of
> thing rapidly becomes a big hassle.

First, in order to request CWOCTOBSD they would have to know all the
message-id's relevant, which should slow down the requests a tad bit :-)

Second, while we sit around designing the ideal system, mimsy is
expiring messages as fast as chris can write them.

Third, clearly requests are handled when convenient for the site.  One could
easily set up a standard form, such as ``send Message-ID: <.....>'' as
subject line (reminescent of netlib).  When enough requests pile up
(or someone has time) a tape gets mounted, the requests sorted by id number
to minimize rewinds and the requests handled automatically.  Bounce-backs 
from ``postmaster'' will probably always be the biggest problem -- I wonder
how the netlib people handle it.

> Ahh, one big machine to serve the world?  You mean like Multics, which was
> to serve the entire Boston/Cambridge community?  The problem the

I didn't say it was easy, only easier.

> nameserver folks (users and maintainers) are having is that the system is
> just not big/fast enough.  Not many people think the basic concept is
> broken.  

Depends on what you think the ``basic concept'' is.  Their current
algorithms corrupt their own database leading to some rather quaint
problems.  Word is they are doing a complete rewrite.

> Unless you complete control over growth (does anyone ever have
> complete control over growth), you had best lay your groundwork to allow
> distributed mechanisms otherwise your system is guaranteed to become
> insufficient at some point.

Perhaps, but it doesn't mean that it is an approach that is currently
technically feasible.  Writing a distributed database is much like
writing an operating system.  Would you want to write an operating system
for a machine whose communication with its disk is as faulty as a
uucp link and whose disks were as varied as Usenet sites (each of
which to be managed by a different person)?

---------- BOB (webber@aramis.rutgers.edu ; rutgers!aramis.rutgers.edu!webber)

paul@umix.UUCP (11/16/87)

In article <234@papaya.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>Doesn't UMichigan have something in the fire?
>

Indeed ... we're not even going to the rosebowl this year -:)

Seriously, part of NSF's EXPRES is being done here, primarily at an
organization called citi.  That is a multimedia mail project, based on
BBN's Diamond, but also incorporating more work.  I could ask someone
there to provide a blurb for this group to read.  In fact, I will.

Along the lines of a shared database ...

Here we have a locally produced mailer running on IBM architecture
mainframes.  It has features that take advantage of the fact that all
messages are stored in one structured file.  So, you can retrieve sent
messages for reediting, retrieve 'deleted' messages, see history chains
of a series of replies, etc.

I am working on something similar for workstations.  My starting base
is an Apollo ring, because they have features I like.  I am especially
using something they call extensible streams.  All messages won't go
into one file -- they will be distributed around the ring -- but there
will be some global database of info for these messages.

So, /usr/spool/mail won't be a directory -- it will be an object that
will have info about the mail messages extant in that file system,
which is kind of what /usr/spool/mail is now.  /usr/spool/mail/paul
will be info about paul's mail.  finally, paul's mail will be in typed
files, with the idea that different message types can be accomodated.

--paul
-- 
Trying everything that whiskey cures in Ann Arbor, Michigan.
Over one billion messages read.