[comp.society.futures] Filtering A Global Hypermedia Network

Wayne@OZ.AI.MIT.EDU (Wayne McGuire) (11/18/87)

[Drexler presented a stimulating talk at the MIT Media Lab yesterday
on hypertext, which quickly developed into a discussion about how to
filter out all the junk which would be attached by billions of people
to trillions of documents, and to zoom in on precisely that small set
of information which is most valuable for one's purposes.  Drexler is
seeking to make an important distinction between micro-hypertext--for
instance, programs like Hypercard whose domain is the information space
of an individual user--and macro-hypertext, whose domain is the
information space of the entire world.  I agree with him that the
latter technology is far more interesting than the former.  (Let me
emphasize that the terms "micro-hypertext" and "macro-hypertext" are my
own invention, and may not be properly descriptive.  Following is a
message about his talk to another list, the context of which should be
obvious. -- WHM]

Drexler's hypertext talk seemed to serve mainly as an occasion to
discuss the problem of filtering the deluge of often trivial
information which the creation of a global hypermedia network will
inevitably exacerbate by many orders of magnitude.  A few thoughts:

The filtering problem might best be solved by regular communication
between two agents: (1) one's intelligent personal assistant which runs
continuously and automatically in the background on one's local
machine, monitoring, analyzing, and weighting one's attentional
patterns, interests, and cognitive styles and capacities, and (2) a
global superintelligence which is stocked full of algorithms which
combine the best insights and rules of thumb from the best minds in all
fields for measuring the worth of and prioritizing new information in
general and within specific domains.

The personal assistant would build an ever-evolving model of its
master's mind, and periodically communicate it to the
global intelligence; the global intelligence, in turn, would recommend
from the set of all information in the world the best set of
information (in its wise estimation) which would satisfy the needs, and
maximize the personal development, of the user, taking into
account the present state of his or her knowledge, resources, etc.
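
As a rough sketch of what that exchange might look like (written here
in Python purely for illustration; the agent names, fields, and the
trivial ranking rule are my own invention, not a description of any
real system):

    # Illustrative only: a personal assistant ships a profile to a
    # global evaluator, which returns ranked recommendations.
    from dataclasses import dataclass, field

    @dataclass
    class UserProfile:
        """What the personal assistant sends to the global intelligence."""
        user_id: str
        interest_weights: dict = field(default_factory=dict)  # topic -> weight

    @dataclass
    class Recommendation:
        """What the global intelligence sends back."""
        document_id: str
        topic: str
        score: float

    def recommend(profile, document_index, limit=10):
        """Score every known document by how well its topic matches
        the user's interests, and return the few best."""
        scored = [
            Recommendation(doc_id, topic,
                           profile.interest_weights.get(topic, 0.0))
            for doc_id, topic in document_index.items()
        ]
        scored.sort(key=lambda r: r.score, reverse=True)
        return scored[:limit]

    if __name__ == "__main__":
        profile = UserProfile("mary", {"multiprocessing": 0.9, "qlisp": 0.6})
        index = {"doc-1": "qlisp", "doc-2": "hypertext",
                 "doc-3": "multiprocessing"}
        for rec in recommend(profile, index, limit=2):
            print(rec.document_id, rec.topic, rec.score)

A real evaluator would of course use far richer models than a flat
topic-to-weight table, but the division of labor is the same.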

This scheme might really not be as far-fetched as it sounds.  Even
now it would be feasible to write a program running as a background
meta-task which would monitor a user's computing activity and
determine, say, that Walter or Mary is more interested in Qlisp than
hypertext, and even more interested in multiprocessing micros than
Qlisp.  A simple analysis of word frequencies would suffice, but one
can imagine even more sophisticated algorithms to fine-tune the
cognitive and attentional profile.  Walter or Mary would probably not
know all the language forms in which information about multiprocessing
micros is expressed, or all the sources of information in the world on
the topic ranked by value, but the global intelligence most certainly
would.
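
A minimal sketch of such a word-frequency profiler (Python, purely
illustrative; the function names and the tiny stopword list are my own
assumptions):

    # Illustrative sketch: a background task could periodically feed
    # whatever text the user reads or writes into a profiler like this.
    from collections import Counter
    import re

    STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on",
                 "is", "for"}

    def update_profile(profile, text):
        """Add one document's word frequencies to a running profile."""
        words = re.findall(r"[a-z]+", text.lower())
        profile.update(w for w in words if w not in STOPWORDS)
        return profile

    def top_interests(profile, n=3):
        return profile.most_common(n)

    if __name__ == "__main__":
        profile = Counter()
        update_profile(profile, "Notes on Qlisp and multiprocessing micros")
        update_profile(profile, "More on multiprocessing micros; less on hypertext")
        print(top_interests(profile))
        # [('multiprocessing', 2), ('micros', 2), ('notes', 1)]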

The basic elements of a global intelligence for information
evaluation are already in place.  Consider, for instance, the mammoth
citation indexes produced by the Institute for Scientific Information
in Philadelphia.  ISI has developed formulas for measuring the citation
frequencies and citation impacts of authors, works, serials and
organizations.  The presumption is that an object with a high citation
impact might be more worthwhile to pay attention to than one with a low
citation impact.  Another approach in citation analysis has been to
automatically uncover networks and families of authors, works, serials,
and organizations through co-citation analysis.  Objects which are
often co-cited with other objects are usually closely connected
conceptually.
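
To make the two measures concrete, here is a toy sketch (Python,
illustrative only; the data and names are invented, not ISI's actual
formulas):

    # Citation counts ("impact") and co-citation counts from a set of
    # bibliographies mapping each citing work to the works it cites.
    from collections import Counter
    from itertools import combinations

    def citation_counts(bibliographies):
        """How often each work is cited: a crude impact measure."""
        counts = Counter()
        for cited in bibliographies.values():
            counts.update(set(cited))
        return counts

    def co_citation_counts(bibliographies):
        """How often each pair of works is cited together by one work."""
        pairs = Counter()
        for cited in bibliographies.values():
            pairs.update(combinations(sorted(set(cited)), 2))
        return pairs

    if __name__ == "__main__":
        bibs = {
            "paper-A": ["smith81", "jones84"],
            "paper-B": ["smith81", "jones84", "lee86"],
            "paper-C": ["smith81"],
        }
        print(citation_counts(bibs).most_common())      # smith81 has the highest impact
        print(co_citation_counts(bibs).most_common(1))  # jones84 and smith81 most co-cited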

Citation analysis is only one of many methods that could be
integrated into a global information evaluator that would make gentle
recommendations to a personal assistant offering it a user profile.

Of course, the privacy issue will be raised by many.  The simple
solution is to turn off your personal assistant, or leave it on but
don't let it talk to the global brain.

The bottom line in all of this is that by maximizing the personal
development of each individual, the full potential of society as a
whole can be fulfilled to the benefit of everyone.

Marvin at Drexler's talk referred to an information retrieval system
(Indexor?), developed by David Waltz's group at Thinking Machines,
which locates other documents out in information space which closely
resemble by certain complex criteria a given document at hand.  Does
this program only run on Connection Machines?  Will a smaller version
of it be developed for personal computers?

Wayne

mt@MEDIA-LAB.MEDIA.MIT.EDU (Michael Travers) (11/19/87)

Intelligent filtering is an idea that's been around for a while.  I'm
sure it will be useful and probably necessary, but as a concept it
seems to bypass the fundamental idea of hypertext.  Filtering implies
a single undifferentiated stream of messages, with a person or a
process picking out the interesting ones using some set of criteria.
This is how e-mail works now--all messages are collected into a serial
ordering in your mailbox, and you provide the filtering (some mail
readers provide some help in this).

But there is no good reason to collapse a hypertext into a stream in
the first place!  A hypertext is a network, and if it is densely
connected, you should rarely have a need to do global searches on it.
Instead, you use your favorite index to find an entry point to the
area you are interested in, and chase references from there.
Browsing, or spreading activation, is a better metaphor than filtering
here.
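
A rough sketch of browsing by spreading activation (Python, purely
illustrative; it assumes the hypertext is nothing more than a table
mapping each node to its neighbors, and the decay constant and names
are invented):

    from collections import deque

    def spread_activation(links, entry, decay=0.5, threshold=0.1):
        """Start at an entry node with activation 1.0, let it decay
        outward along links, and keep whatever stays above threshold."""
        activation = {entry: 1.0}
        queue = deque([entry])
        while queue:
            node = queue.popleft()
            for neighbor in links.get(node, []):
                value = activation[node] * decay
                if value > threshold and value > activation.get(neighbor, 0.0):
                    activation[neighbor] = value
                    queue.append(neighbor)
        return sorted(activation.items(), key=lambda kv: -kv[1])

    if __name__ == "__main__":
        links = {"qlisp": ["parallelism", "lisp"], "parallelism": ["micros"],
                 "lisp": ["hypertext"], "micros": []}
        print(spread_activation(links, "qlisp"))
        # nodes near the entry point stay hot; distant ones fade out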

Local filtering on references might be useful (e.g., if you are doing
some form of cognitive science and believe that neuroscience has
little to offer, you might have a rule that says "be less interested
in any articles if the journal name contains the string 'neuro'.")
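
A sketch of such a local rule (Python, illustrative; the rule format
and field names are assumptions made up for the example):

    def interest_adjustment(article):
        """Return a multiplier for how interested to be in an article."""
        if "neuro" in article.get("journal", "").lower():
            return 0.2   # "be less interested" in neuroscience journals
        return 1.0

    if __name__ == "__main__":
        print(interest_adjustment({"journal": "Journal of Neurophysiology"}))  # 0.2
        print(interest_adjustment({"journal": "Cognitive Science"}))           # 1.0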

But global filtering will be mostly unnecessary if the hypertext is
any good.  It's a method for dealing with non-hypertext-ness of
current media, and not the thing to have foremost in mind when
thinking about hypertext systems.

Wayne@OZ.AI.MIT.EDU (Wayne McGuire) (11/19/87)

Even after a robust global hypermedia network is brought into being
--one which richly interlinks the full semantic and propositional
content of all the texts and digitized audio/visual works in the
world, including all the informal and spontaneous nth levels of
commentary on primary works by anyone and everyone--one will still
require an intelligent filter to make this information manageable, to
access it and use it most productively and not get buried by trivia. 
Perhaps powerful filters, albeit local not global, will be required
_especially then_, even more than now.

Even if one is stationed on a fairly specialized node of hypermedia
knowledge space--let's say parallel processing programming
languages--each day is likely to bring into one's mailbox or dynabook
far more items, and pointers to items, and pointers to pointers to
items, etc. than anyone would be able usefully to sort through and
prioritize with the aim of reading carefully even a small percentage of
the take.  In this situation a global superintelligence and information
evaluator would be helpful in deciding which handful of hundreds or
thousands of links and pointers attached to a given item is, from the
cognitive context of a particular user, worth tracking down in depth. 
Hypermedia will not alter the fundamental human constraint that we read
words and documents serially, one after another, and that while the
volume of new information is exploding, the time in which to select and
meaningfully absorb knowledge from the world remains constant.
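
One crude way such an evaluator might triage the links attached to an
item is to score each link's descriptor terms against the reader's
profile and keep only the best few.  The sketch below (Python) is
purely illustrative; the descriptor format and all names are my own
assumptions:

    import math

    def cosine(profile, descriptors):
        """Cosine similarity between two term-weight dictionaries."""
        common = set(profile) & set(descriptors)
        dot = sum(profile[t] * descriptors[t] for t in common)
        norm = (math.sqrt(sum(v * v for v in profile.values())) *
                math.sqrt(sum(v * v for v in descriptors.values())))
        return dot / norm if norm else 0.0

    def triage_links(profile, links, keep=3):
        """Rank the links attached to an item by fit with the profile."""
        scored = [(cosine(profile, d), target) for target, d in links.items()]
        return sorted(scored, reverse=True)[:keep]

    if __name__ == "__main__":
        profile = {"parallel": 1.0, "lisp": 0.5}
        links = {"item-1": {"parallel": 1.0},
                 "item-2": {"neuroscience": 1.0},
                 "item-3": {"lisp": 1.0, "parallel": 0.3}}
        print(triage_links(profile, links))   # item-1 and item-3 outrank item-2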

Another thought on how a global intelligence might use user models and
profiles assembled by personal assistants: any GM (Global Mind) worth
its salt, and able to learn from its experience, would be able to say,
in the case of Person X: "I've seen nearly 5,000 cases like this one
before; by abstracting all the knowledge from those previous cases,
there is a high probability that X needs or wants Y, but doesn't yet
know it and wouldn't know how to get Y if he or she even knew that Y
was required and available.  I know exactly the best way to open X's
mind to the knowledge that Y is probably what they need to pay
attention to now to get on with the next stage in their development."
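
A toy version of that case-based reasoning (Python, illustrative only;
the similarity measure and every name below are invented for the
example, not a claim about how a real GM would work):

    def similarity(a, b):
        """Overlap between two interest sets (Jaccard measure)."""
        return len(a & b) / len(a | b) if a | b else 0.0

    def recommend_from_peers(person, peers, k=1):
        """Find the k most similar past cases and suggest what they
        valued that this person has not yet seen."""
        ranked = sorted(peers.items(),
                        key=lambda kv: similarity(person["interests"],
                                                  kv[1]["interests"]),
                        reverse=True)
        suggestions = set()
        for _, peer in ranked[:k]:
            suggestions |= peer["valued"] - person["seen"]
        return suggestions

    if __name__ == "__main__":
        x = {"interests": {"parallelism", "lisp"}, "seen": {"doc-1"}}
        peers = {
            "case-1": {"interests": {"parallelism", "lisp"},
                       "valued": {"doc-1", "doc-7"}},
            "case-2": {"interests": {"gardening"}, "valued": {"doc-9"}},
        }
        print(recommend_from_peers(x, peers))   # {'doc-7'}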

Of course, the privacy issue still looms large in all this, and the
potential for abuse (as in the thought recognition research you
earlier alerted us to) is enormous.  One should always have the option
simply to say no to cooperating with a personal assistant which in
turn is cooperating with a global supermind.  In a worst-case scenario
one's micro could become one's figurative jailer, the oppressive agent
of a police state, and not really your good buddy in the quest for
self-realization.

Yet another thought: each day now in many settings, from global
networks to small BBS's, thousands of email messages are being
exchanged, many of which are groping in deep ignorance on sundry
topics.  The person who left a message on a Virginia BBS yesterday
(this actually happened) which reveals a misunderstanding of
cryptographic techniques doesn't know that someone somewhere else in
the world, on a network that is invisible to him, left a message on the
same day and on the same topic that dispels this ignorance.  One can
see possibilities in these situations for an automated Global Referee
(to be turned on or off at will, of course), specializing in spreading
the light and tearing down walls of ignorance.  How about a standard
hypermedia link/property: is-more-authoritative-than?
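
A sketch of what such a typed link might look like as data, and how a
reader's software might consult it (Python, illustrative; the triple
representation and the node names are invented):

    LINKS = [
        # (from_node, link_type, to_node)
        ("msg-crypto-net-456", "is-more-authoritative-than", "msg-virginia-bbs-123"),
    ]

    def more_authoritative_than(node, links=LINKS):
        """Return nodes asserted to be more authoritative than this one."""
        return [src for src, kind, dst in links
                if dst == node and kind == "is-more-authoritative-than"]

    if __name__ == "__main__":
        print(more_authoritative_than("msg-virginia-bbs-123"))
        # ['msg-crypto-net-456']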

Wayne

madd@BUCSF.BU.EDU.UUCP (11/19/87)

[About intelligent filtering]
|But there is no good reason to collapse a hypertext into a stream in
|the first place!  A hypertext is a network, and if it is densely
|connected, you should rarely have a need to do global searches on it.
|Instead, you use your favorite index to find an entry point to the
|area you are interested in, and chase references from there.
[...]
|Local filtering on references might be useful
[...]
|But global filtering will be mostly unnecessary if the hypertext is
|any good.  It's a method for dealing with non-hypertext-ness of
|current media, and not the thing to have foremost in mind when
|thinking about hypertext systems.

I think this is true.  Consider an example:  (relatively) recently the
Grolier Encyclopaedia was placed on CD-ROM.  A large amount of
processing time went into generating a fantastic cross-reference for
the encyclopaedia.  While this isn't hypertext by definition, you
can see how the idea applies; the computer was used to generate the
cross-links that would have been user-generated in a hypertext
environment.

Anyway, the cross-reference ended up being about the size of the
encyclopaedia but made it possible to find even obscure references in
only seconds WITHOUT A GLOBAL SEARCH.  This encyclopaedia dealt with
only a few megabytes (200?  something like that) though; it would be
interesting to see what would happen if you're dealing with several
orders of magnitude more than that.

I suppose the only problem with hypertext is that the user might not
generate the extensive links that you might like, either through
laziness or ignorance.  In any case this problem will become more
serious as your database grows larger, so you'll probably need either
an automatic link-generator or some sort of global search mechanism to
help find items that were not properly linked.  I'd opt for the
automatic link generator since it'd be easier to search a database of
link topics than the entire database!
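
A minimal sketch of such an automatic link generator (Python,
illustrative; the term index and stopword list are invented for the
example): index every document by its significant terms, then propose a
link between any two documents that share one.

    from collections import defaultdict
    from itertools import combinations
    import re

    STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is"}

    def build_term_index(documents):
        """term -> set of documents containing it (the link-topic database)."""
        index = defaultdict(set)
        for doc_id, text in documents.items():
            for word in set(re.findall(r"[a-z]+", text.lower())):
                if word not in STOPWORDS:
                    index[word].add(doc_id)
        return index

    def propose_links(index):
        """Suggest a link between every pair of documents sharing a term."""
        links = set()
        for term, doc_ids in index.items():
            for a, b in combinations(sorted(doc_ids), 2):
                links.add((a, b, term))
        return links

    if __name__ == "__main__":
        docs = {"d1": "Qlisp on multiprocessing micros",
                "d2": "A survey of multiprocessing hardware",
                "d3": "Hypertext browsing"}
        print(propose_links(build_term_index(docs)))
        # {('d1', 'd2', 'multiprocessing')}

Searching the term index is cheap compared with scanning every document
for every query, which is the point being made above.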

jim frost
madd@bucsb.bu.edu

dm@BFLY-VAX.BBN.COM.UUCP (11/19/87)

Perhaps I'm stating the obvious, but these referees sound very much
like two things:

   1) a good reference librarian.
   2) a good editor of an eclectic journal (e.g., Harper's or the Atlantic). 

Indeed, when one is just ``interested in things'' one goes to a
magazine like Harper's and browses (Harper's and the Utne Reader are
particularly well suited for this).  I would expect that hypermedia
will have the equivalent: people who prospect the fields of hypermedia
and leave behind a trail that others can follow to the gold.  Ted
Nelson's hypertext project, Xanadu, devotes a great deal of attention
to royalties in order to encourage this practice.

Until computer programs are interesting companions in their own right,
I'll bet people will be able to do this better.

When one is researching a particular project, the services of a good
research librarian are invaluable.  Research librarians come in two
flavors: generalists (the kind you'll find at the public library) and
specialists (the kind you'll find in a university department's reading
room or library).  These people are experts at gleaning information
from the library.  They spend many years learning their trade and
learning their library.  In designing a hypertext filter, it is the
expertise of these people that you'll want to tap.  

I expect that in the hypertext morass, there will still be people you
go to whose expertise and advice will guide you through the twisty
maze of hypertext links to the valuable information.  Those people
will develop tools to bring to bear to help you find your way to your
goal (there's a REASON it's called library SCIENCE (though it should
probably be called ``library engineering'')).  Now THERE would be
an expert system to develop...

FRUIN@HLERUL5.BITNET (Thomas Fruin) (11/20/87)

 > From:    Wayne McGuire <Wayne%OZ.AI.MIT.EDU@XX.LCS.MIT.EDU>
 > Subject: Filtering A Global Hypermedia Network

What is the rationale for bringing a "global superintelligence" in to solve
the filtering problem for a global hypermedia network?  There are _so_ many
disadvantages of having one centralized body: impracticality due to size,
reliability (what if the thing goes down), and the issue of privacy you
already mentioned.

Of course "turning off your personal assistant" if you are worried about
privacy is no solution at all.  That's like solving the problem of car
accidents by refraining from driving.

-- Thomas Fruin

   fruin@hlerul5.BITNET
   thomas@uvabick.UUCP
   2:500/15 on FidoNet

   Leiden University, Netherlands

Wayne@OZ.AI.MIT.EDU.UUCP (11/20/87)

> Date:     Fri, 20 Nov 87 01:36 N
> From: <FRUIN%HLERUL5.BITNET@BUACCA.BU.EDU> (Thomas Fruin)
> 
> What is the rationale for bringing a "global superintelligence" in to
> solve the filtering problem for a global hypermedia network?  There are
> _so_ many disadvantages of having one centralized body: impracticality
> due to size, reliability (what if the thing goes down), and the issue
> of privacy you already mentioned.

Impracticality due to size: with nanotechnology and Crays that will
fit in pocket watches or teeth?

Reliability: why can't a global mind or global hypermedia advisor
replicate itself each day and be distributed by fiber optic or
superconductive links in multiple copies throughout all the cities in
the world?  If one goes down, just turn on another.

Privacy: yes, a serious problem, but you should realize that we
already leave behind us a large and detailed digital trail which
profiles our most intimate habits of mind.  Many large corporations
and government agencies can access and manipulate that data now.  Your
privacy is already long gone.

So why would one want a global hypermedia advisor?  For the same
reasons, I suppose, that most of us would rather take advantage of the
resources of the Library of Congress or Harvard's Widener Library than
those of our local public library: knowledge and power.  It's a basic
human drive.

Wayne

Wayne@OZ.AI.MIT.EDU.UUCP (11/20/87)

> Date: Thu, 19 Nov 87 09:14:57 est
> From: madd@bucsf.bu.edu (Jim Frost)
> 
> Anyway, the cross-reference ended up being about the size of the
> encyclopaedia but made it possible to find even obscure references in
> only seconds WITHOUT A GLOBAL SEARCH....

The power of the indexing system for the Grolier CD-ROM lies
precisely in the fact that it IS based on a global analysis of the total
text.  An intelligent agent (no doubt a few people armed with computers
and the appropriate software) scanned the entire text for conceptual
links.  Any users of the Grolier CD-ROM are taking advantage of an
indexing scheme built on this global preprocessing.  You don't have to
conduct a global search, because someone has already done it for you,
although no doubt your or my personal global analysis would turn up
radically different links than did Grolier's editors.
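
The point can be put in a few lines (Python, illustrative; the data and
names are invented): one global pass builds the index, and every later
query is a cheap lookup that never touches the full text again.

    def build_index(articles):
        """One global pass over every article: the expensive preprocessing."""
        index = {}
        for title, text in articles.items():
            for word in set(text.lower().split()):
                index.setdefault(word, []).append(title)
        return index

    def lookup(index, word):
        """A later query is just a dictionary lookup, not a global search."""
        return index.get(word.lower(), [])

    if __name__ == "__main__":
        articles = {"Cryptography": "ciphers and keys",
                    "Gardening": "soil and keys to pruning"}
        idx = build_index(articles)
        print(lookup(idx, "keys"))   # ['Cryptography', 'Gardening']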

ISI's citation indexes, which cover the majority of the world
scientific literature, are also based on a global scan, in this case of
millions of documents.  It is impossible to predict what journal in
what domain will refer to a given document, and so it is necessary to
analyze (nearly) all the scientific journals in the world to uncover
citation links.

Never underestimate the necessity for or power of global analysis.
Any local structure is only as robust as its knowledge of the entire
world.  Presumably a global hypermedia advisor would be very robust
indeed.

Wayne

FRUIN@HLERUL5.BITNET.UUCP (11/20/87)

 > Date:    20 Nov 1987  05:18 EST (Fri)
 > From:    Wayne McGuire <Wayne%OZ.AI.MIT.EDU@XX.LCS.MIT.EDU>
 > Subject: Filtering A Global Hypermedia Network

If communications speeds are going to be so much higher, what's the point in
cramming everything into one big hypermedia adviser?  I thought the way of the
future was networking.  A more likely prospect is that each person's advisor
queries several databases around the world and copies whatever relevant
information it finds there.  Big centralized systems will always stay slow,
impractical, and unreliable because with the advancement of technology the
amount of digitized information is growing at an ever faster rate.

 > Privacy: yes, a serious problem, but you should realize that we
 > already leave behind us a large and detailed digital trail which
 > profiles our most intimate habits of mind.  Many large corporations
 > and government agencies can access and manipulate that data now.  Your
 > privacy is already long gone.

You're very cynical here, and maybe you are right.  I want to think there is
still hope, though, and in that case a centralized hypermedia advisor is not
the way to go.  There is a big difference between leaving behind a _public_ digital
trail (like messages in newsgroups) and a trail that "profiles our most
intimate habits of mind".  What do you mean by that?

In Holland a new law will soon take effect regarding databases that store
information about people.  Its basic premise is that a database should have
a GOAL, i.e. to send you your electricity bill or to keep track of your car's
registration number.  It is FORBIDDEN to match or combine any two databases
that don't have the same goal.  You can take anybody to court who does so
anyway. This should make it very hard for corporations and government agencies
to access any information about you.

-- Thomas Fruin

   fruin@hlerul5.BITNET
   thomas@uvabick.UUCP
   2:500/15 on FidoNet

   Leiden University, Netherlands

bzs@BU-CS.BU.EDU (Barry Shein) (11/20/87)

From: <FRUIN%HLERUL5.BITNET@BUACCA.BU.EDU> (Thomas Fruin)
>If communications speeds are going to be so much higher, what's the point in
>cramming everything into one big hypermedia adviser?  I thought the way of the
>future was networking.

I've had some conversations with folks here who are working on large
hypertext projects and some of them in fact do not believe the future
is in networks at all.

One major reason they cite is the inevitable frustration of dealing
with the central organization that would be running the network (and,
of course, varying scepticism about the available bandwidth).

The system of the future they envision would be something more like a
desktop, high-speed multi-processor with CD-ROM readers and a nice
stack of CD-ROMs (not unlike your current CD player.) People would buy
sets of CDs to start collections (not unlike investing in a good
encyclopaedia) and beyond that would either buy them in typical ways
or subscribe to "CD of the month" clubs which might send you all of
the previous month's journals within some field (or popular mags,
whatever).

To be more up to date you might use a network to peruse very current
stuff; it's not either/or, but the network may not be a critical
component.

Another very important point that was raised: how do you make money
on networks?  Connect charges?  Access charges, etc.?  Nuisance
service organizations and open-ended costs, blech. Notice all the
hostility towards the phone company? People will leap at alternatives
like private collections.

You get what you want, when you want it, and the service organization
doesn't have to figure out how to get everything on-line at all times
(analogous to the reason that VCRs sell better than attempts at
pay-per-view cable services).

There's far more money (they claim) to be made in selling everyone
their own copies of the stuff and that's where the "smart" money is
going.

Remember, this is not so much an issue of what is possible (e.g.,
discussing suitably high-speed network technology) as of where the MONEY
is going to go for R&D.  And there is some indication that it prefers
the idea of publishing and sales to building service organizations.
There's a very heavy socio-economic aspect here that cannot be
overlooked.

	-Barry Shein, Boston University

Wayne@OZ.AI.MIT.EDU (Wayne McGuire) (11/22/87)

A global hypermedia advisor doesn't need to be a big centralized
system in the sense of storing the full text of all the documents in
the world, but it should be a supreme index of indexes--a clearinghouse
of pointers to pointers to pointers ad infinitum to all the information
chunks and information chunk types (including the full text of
documents and document elements) on all the networks in the world.  An
analogy might be the Harvard Union Catalog, which stores easily
accessible pointers to all the works in the many libraries in the
Harvard library network.  But a GHA would be much more powerful than,
say, the HUC, since it would embody the best knowledge of the best
experts in the world about the conceptual structures of their
domains.
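
A toy sketch of the index-of-indexes idea (Python, illustrative; the
catalog format, the "index:" prefix, and all names are my own
invention): the clearinghouse holds only pointers, some of which point
at further indexes rather than at documents, and resolution chases the
chain down.

    def resolve(catalog, key, depth=0, max_depth=10):
        """Follow pointer chains until concrete document locations appear."""
        if depth > max_depth:
            return []
        locations = []
        for pointer in catalog.get(key, []):
            if pointer.startswith("index:"):      # pointer to another index
                locations += resolve(catalog, pointer, depth + 1, max_depth)
            else:                                 # pointer to an actual document
                locations.append(pointer)
        return locations

    if __name__ == "__main__":
        catalog = {
            "parallel lisp": ["index:mit-ai-memos", "library-x/doc-42"],
            "index:mit-ai-memos": ["mit/memo-901", "mit/memo-917"],
        }
        print(resolve(catalog, "parallel lisp"))
        # ['mit/memo-901', 'mit/memo-917', 'library-x/doc-42']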

I am not being cynical about privacy, merely realistic.  Regarding
"intimate habits of mind": certainly one's banking and telephone
records, which chronicle in exquisite detail what one buys and with
whom one communicates, provide an in-depth psychological profile to
the eye of an acute analyst.  Holland and other nations may be passing
laws to restrict access to these records in the usual case, but the
security and intelligence establishments of most of these countries can
find loopholes and exceptions in these laws through which to drive
fleets of Mack trucks.

As a general rule, whatever flows through a telecommunications
channel should not be considered private.  James Bamford in _The Puzzle
Palace_ outlines the methods of the NSA for intercepting and analyzing
global telecommunications.  England's Government Communications
Headquarters and the Soviet Union's KGB (or the Soviet equivalent to
the NSA) are engaged in the same activities.  They don't capture
everything, but they get enough.  They probably have as much regard for
the spirit and letter of the public privacy laws as do drivers on the
Massachusetts Turnpike for the 55 mph speed limit.

As far as protecting your privacy from the general public, I assume
that with a global hypermedia advisor one could choose how much of
one's profile to make public, or one could choose not to interact with
the system at all.

Current online database vendors like Dialog and Mead Data Central are
already foreshadowings (albeit extremely primitive) of a GHA.  It is
interesting to recall that under the reign of John Poindexter, of
Irangate fame, the NSC was seeking to gain legal access to the records
of these companies, which store sensitive information about the search
targets and patterns of their users.  As I recall, the NSC was denied
legal access by Congress, but then there is always the problem of
illegal access, which is relatively trivial to accomplish wholesale by
intercepting telecommunications.

Wayne